Issue1696
Created on 2011-01-12.08:20:09 by jpulakka, last changed 2018-03-17.19:24:46 by jeff.allen.
File name | Uploaded | Description
JythonEncodingIssue.java | jpulakka, 2011-01-12.08:20:09 | Program to demonstrate the issue
unnamed | jpulakka, 2011-01-19.07:24:53 |
msg6325 (view) |
Author: Philip Jenvey (pjenvey) |
Date: 2011-01-19.01:01:38 |
|
I can reproduce this issue on OS X, but on this platform your test case ends up being invalid for a couple of reasons:
1) When I compile this under OS X javac, the a-with-umlaut character is not interpreted correctly, because the OS X file.encoding defaults to MacRoman. Make sure the file.encoding value for your platform is what you think it is (it should be utf-8, since that's how your .java file is encoded). You can force the encoding by passing -encoding utf8 to javac.
2) When I force javac's encoding to utf8, the generated file.py is correct but the test still fails. That's because it's comparing raw utf-8 encoded bytes (as a Python str) to a Java string. If you instead declare the ae variable as a Python unicode str (change ae = '<ae from java>' to ae = u'<ae from java>'), the test passes.
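Both points can be sketched in a few lines of Python. This is only an illustration, written for CPython 3, where bytes and str play the roles of Python 2's str and unicode; the variable names are made up for the example and nothing here is taken from JythonEncodingIssue.java.

```python
# The character a-with-umlaut as decoded text (Python 2: a unicode object).
text = u"\u00e4"

# What a UTF-8 encoded source file actually stores for that character.
utf8_bytes = text.encode("utf-8")

# Point 1: reading those bytes with the wrong charset (MacRoman here)
# produces two unrelated characters instead of one.
misread = utf8_bytes.decode("mac_roman")

# Point 2: raw encoded bytes do not compare equal to decoded text.
bytes_equal_text = (utf8_bytes == text)

# Decoding with the right charset first makes the comparison succeed.
decoded_equal_text = (utf8_bytes.decode("utf-8") == text)

print(repr(utf8_bytes), repr(misread), bytes_equal_text, decoded_equal_text)
```

The same reasoning applies on the Java side: a Java String is already decoded text, so it corresponds to a Python unicode object rather than to a byte string.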
|
msg6326 (view) |
Author: Joonas Pulakka (jpulakka) |
Date: 2011-01-19.07:24:54 |
|
Ah, I forgot the u-syntax. And yes, I did assume UTF-8 in the Java source.
But there's still an inconsistency: if I parse the program directly via PythonInterpreter.exec(String), it throws "SyntaxError: Illegal character in file '<string>' for encoding 'utf-8'", regardless of whether I use the u-syntax or not. But if I omit the "# coding: utf-8" comment at the beginning of the string, the test passes both with _and without_ the u-syntax.
Granted, PEP 263 says nothing about how source code in _strings_ (instead of files) should be interpreted. But I think it should work consistently unless otherwise specified.
|
msg6327 (view) |
Author: Philip Jenvey (pjenvey) |
Date: 2011-01-20.00:13:24 |
|
Ok, you have encountered a minor issue with PythonInterpreter.exec(String). It should be acting like compile(unicode), which would reject your example with a SyntaxError because it includes a magic encoding comment. From PEP 263:
The builtin compile() API will be enhanced to accept Unicode as
input. 8-bit string input is subject to the standard procedure for
encoding detection as described above.
If a Unicode string with a coding declaration is passed to compile(),
a SyntaxError will be raised.
It should be rejected because it doesn't make sense to tell the parser how to decode source code that's already decoded. The magic encoding comment is only applicable to encoded data (like plain Python strs or Java bytes).
We do raise a SyntaxError here in accordance with the PEP, but I'll leave this issue open because our SyntaxError message doesn't make the problem clear. It should match CPython's:
Python 2.6.4 (r264:75706, Dec 8 2009, 15:56:45)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> compile(u'# coding: utf-8\n', '<file>', 'exec')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<file>", line 0
SyntaxError: encoding declaration in Unicode string
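The rule quoted from PEP 263 can be sketched as a small check. This is a hedged illustration for CPython 3, not Jython's implementation: `check_decoded_source` and `rejects` are hypothetical helpers, and the check is written out by hand because CPython 3's own compile() does not raise for a declaration inside a str. The regular expression is the one PEP 263 gives for the declaration; looking at the first two lines without further conditions is a simplification.

```python
import re

# PEP 263 pattern for a coding declaration.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def check_decoded_source(source):
    """Hypothetical helper: reject decoded text that declares an encoding."""
    if not isinstance(source, str):
        raise TypeError("expected already-decoded text (str)")
    # Simplification: inspect the first two lines only.
    for line in source.splitlines()[:2]:
        if CODING_RE.match(line):
            raise SyntaxError("encoding declaration in Unicode string")
    return source


def rejects(source):
    """Return True if check_decoded_source raises SyntaxError for source."""
    try:
        check_decoded_source(source)
    except SyntaxError:
        return True
    return False


# Contrast: for encoded input the declaration is meaningful, because it
# tells the compiler how to turn the bytes into text.
namespace = {}
exec(compile(b'# coding: latin-1\ns = "\xe4"\n', "<bytes>", "exec"), namespace)
decoded_from_bytes = namespace["s"]

print(rejects("# coding: utf-8\nx = 1\n"), rejects("x = 1\n"))
```

The byte 0xE4 in the last example only becomes a-with-umlaut because the declaration names Latin-1; for text that is already decoded there is nothing left for such a declaration to control.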
|
msg11829 (view) |
Author: Jeff Allen (jeff.allen) |
Date: 2018-03-17.19:24:45 |
|
It matches now, so we should close.
>>> compile(u'# coding: utf-8\n', '<file>', 'exec')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<file>", line 0
SyntaxError: encoding declaration in Unicode string
https://hg.python.org/jython/rev/d2cba6eb8d3d
|
|
Date | User | Action | Args
2018-03-17 19:24:46 | jeff.allen | set | status: open -> closed; resolution: remind -> fixed; messages: + msg11829; nosy: + jeff.allen; versions: + Jython 2.7, - Jython 2.5
2013-02-19 23:38:13 | fwierzbicki | set | versions: + Jython 2.5, - 2.5.2rc
2013-02-19 23:38:00 | fwierzbicki | set | priority: low; nosy: + fwierzbicki; resolution: remind
2011-01-20 00:13:25 | pjenvey | set | status: closed -> open; resolution: invalid -> (no value); messages: + msg6327; title: Module importing doesn't respect encoding (PEP 263) -> PythonInterpreter.exec(String) should reject magic encoding comments like compile(unicode)
2011-01-19 07:24:54 | jpulakka | set | files: + unnamed; messages: + msg6326
2011-01-19 01:01:38 | pjenvey | set | status: open -> closed; resolution: invalid; messages: + msg6325; nosy: + pjenvey
2011-01-12 08:20:10 | jpulakka | create |