Ah, I forgot the u-syntax. And yes, I did assume UTF-8 in the Java source.

But there's still an inconsistency: if I parse the program directly via PythonInterpreter.exec(String), it throws "SyntaxError: Illegal character in file '<string>' for encoding 'utf-8'", regardless of whether I use the u-syntax or not. But if I omit the "# coding: utf-8" comment at the beginning of the string, the test passes both with _and without_ the u-syntax.

Granted, PEP 263 says nothing about how source code in _strings_ (instead of files) should be interpreted. But I think it should work consistently unless otherwise specified.

2011/1/19 Philip Jenvey <report@bugs.jython.org>

New submission from Philip Jenvey <pjenvey@underboss.org>:

I can reproduce this issue on OS X, but on this platform, your test case ends up being invalid for a couple of reasons:

1) When I compile this under OS X javac, the "a with umlaut" character is not interpreted correctly, because the OS X file.encoding defaults to MacRoman. Make sure the file.encoding value for your platform is what you think it is (it should be utf-8, since that's how your .java file is encoded). You can force the encoding by specifying -encoding utf8 to javac.

2) When I force javac's encoding to utf8, the generated file.py is correct but the test still fails. That's because it's comparing raw utf-8 encoded bytes (as a Python str) to a Java string. If you instead declare the ae variable as a Python unicode string (change ae = '<ae from java>' to ae = u'<ae from java>'), the test passes.
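
Both points can be reproduced outside Jython. The following is only a rough sketch, written for CPython 3, where bytes and str stand in for Python 2's str and unicode; the names AE, utf8_bytes and decode_as are made up for this illustration and are not part of the original test case.

```python
# Illustration only: CPython 3, not the Jython test case from the report.
# AE, utf8_bytes and decode_as are hypothetical names for this sketch.

AE = "\u00e4"  # LATIN SMALL LETTER A WITH DIAERESIS

# The bytes that a UTF-8 encoded source file contains for this character.
utf8_bytes = AE.encode("utf-8")


def decode_as(raw, encoding):
    """Decode raw bytes the way a compiler would if it assumed `encoding`."""
    return raw.decode(encoding)


# Point 1: reading UTF-8 bytes as MacRoman yields two wrong characters.
misread = decode_as(utf8_bytes, "mac_roman")

# Reading them as UTF-8 (the effect of forcing the encoding) recovers the character.
correct = decode_as(utf8_bytes, "utf-8")

# Point 2: raw encoded bytes never compare equal to the text they encode;
# only the decoded (unicode) value does.
bytes_match_text = utf8_bytes == AE
decoded_matches_text = correct == AE

if __name__ == "__main__":
    print(repr(utf8_bytes), repr(misread), repr(correct))
    print(bytes_match_text, decoded_matches_text)
```

In Python 2 and Jython the same distinction is between '...' (a byte string) and u'...' (a unicode string), which is why adding the u prefix makes the comparison succeed.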

----------
nosy: +pjenvey
resolution:  -> invalid
status: open -> closed

_______________________________________
Jython tracker <report@bugs.jython.org>
<http://bugs.jython.org/issue1696>
_______________________________________