Issue1696

classification
Title: PythonInterpreter.exec(String) should reject magic encoding comments like compile(unicode)
Type: behaviour
Severity: normal
Components: Any
Versions: Jython 2.5
process
Status: open
Resolution: remind
Dependencies:
Superseder:
Assigned To:
Nosy List: fwierzbicki, jpulakka, pjenvey
Priority: low
Keywords:

Created on 2011-01-12.08:20:09 by jpulakka, last changed 2013-02-19.23:38:13 by fwierzbicki.

Files
File name Uploaded Description
JythonEncodingIssue.java jpulakka, 2011-01-12.08:20:09 Program to demonstrate the issue
unnamed jpulakka, 2011-01-19.07:24:53
Messages
msg6325 Author: Philip Jenvey (pjenvey) Date: 2011-01-19.01:01:38
I can reproduce this issue on OS X, but on this platform your test case ends up being invalid for a couple of reasons:

1) When I compile this with javac on OS X, the a-with-umlaut character is not interpreted correctly, because file.encoding defaults to MacRoman on OS X. Make sure the file.encoding value for your platform is what you think it is (it should be utf-8, since that's how your .java file is encoded). You can force the encoding by passing -encoding utf8 to javac.

2) When I force javac's encoding to utf8, the generated file.py is correct but the test still fails. That's because it's comparing raw utf-8 encoded bytes (as a Python str) to a Java string. If you instead declare the ae variable as a Python unicode string (change ae = '<ae from java>' to ae = u'<ae from java>'), the test passes.
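
Both points can be illustrated with a short Python sketch. It is not part of the attached test case: the names ae, raw, misread, and same_text are made up for illustration, and u'\xe4' stands in for the a-with-umlaut character. Decoding the UTF-8 bytes as MacRoman mimics what happens when the source is read with the wrong file.encoding, and decoding before comparing is roughly what the u prefix arranges for the literal.

```python
# Hypothetical illustration; none of these names come from the test case.
ae = u'\xe4'                  # a with umlaut, as text (unicode)
raw = ae.encode('utf-8')      # the same character as UTF-8 bytes

# Point 1: reading UTF-8 bytes with the wrong charset gives different text.
misread = raw.decode('mac_roman')


def same_text(encoded, text, encoding='utf-8'):
    """Compare encoded bytes with text by decoding the bytes first."""
    return encoded.decode(encoding) == text


print(len(raw))               # 2: one character, two UTF-8 bytes
print(misread == ae)          # False: wrong charset, wrong text
print(same_text(raw, ae))     # True: decode first, then compare
```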
msg6326 Author: Joonas Pulakka (jpulakka) Date: 2011-01-19.07:24:54
Ah, I forgot the u-syntax. And yes, I did assume UTF-8 in the Java source.

But there's still an inconsistency: If I parse the program directly via
PythonInterpreter.exec(String), then it throws "SyntaxError: Illegal
character in file '<string>' for encoding 'utf-8'", regardless of whether I
use the u-syntax or not. But if I omit the "# coding: utf-8" comment at the
beginning of the string, then the test passes both with _and without_ the
u-syntax.

Granted, PEP 263 says nothing about how source code in _strings_ (instead of
files) should be interpreted. But I think it should work consistently unless
otherwise specified.

msg6327 Author: Philip Jenvey (pjenvey) Date: 2011-01-20.00:13:24
Ok, you have encountered a minor issue with PythonInterpreter.exec(String). It should be acting like compile(unicode), which would reject your example with a SyntaxError because it includes a magic encoding comment. From PEP 263:

    The builtin compile() API will be enhanced to accept Unicode as
    input. 8-bit string input is subject to the standard procedure for
    encoding detection as described above.

    If a Unicode string with a coding declaration is passed to compile(),
    a SyntaxError will be raised.

It should be rejected because it doesn't make sense to tell the parser how to decode source code that's already decoded. The magic encoding comment is only applicable to encoded data (like plain Python strs or Java bytes).

We do raise a SyntaxError here in accordance with the PEP, but I'll leave this issue open because our SyntaxError message doesn't make the problem clear. It should match CPython's:

Python 2.6.4 (r264:75706, Dec  8 2009, 15:56:45) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> compile(u'# coding: utf-8\n', '<file>', 'exec')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<file>", line 0
SyntaxError: encoding declaration in Unicode string
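
The check that PEP 263 asks for can be sketched in a few lines of Python. This is only an illustration, not Jython's or CPython's implementation: CODING_RE, find_coding_declaration, and check_decoded_source are hypothetical names, the pattern is modeled on the one given in PEP 263, and looking at the first two lines unconditionally is a simplification.

```python
import re

# Modeled on the PEP 263 pattern for a coding declaration (assumption:
# simplified, checked against the first two lines unconditionally).
CODING_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')


def find_coding_declaration(source):
    """Return the declared encoding name, or None if there is none."""
    for line in source.split('\n')[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


def check_decoded_source(source):
    """Reject already-decoded source that still declares an encoding."""
    if find_coding_declaration(source) is not None:
        # Message text taken from the CPython 2.6 session above.
        raise SyntaxError('encoding declaration in Unicode string')
    return source


print(find_coding_declaration(u'# coding: utf-8\nae = 1\n'))
print(find_coding_declaration(u'ae = 1\n'))
```

Source without a declaration passes through check_decoded_source unchanged, while source with one is rejected before any parsing happens.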
History
Date User Action Args
2013-02-19 23:38:13 fwierzbicki set versions: + Jython 2.5, - 2.5.2rc
2013-02-19 23:38:00 fwierzbicki set priority: low
    nosy: + fwierzbicki
    resolution: remind
2011-01-20 00:13:25 pjenvey set status: closed -> open
    resolution: invalid -> (no value)
    messages: + msg6327
    title: Module importing doesn't respect encoding (PEP 263) -> PythonInterpreter.exec(String) should reject magic encoding comments like compile(unicode)
2011-01-19 07:24:54 jpulakka set files: + unnamed
    messages: + msg6326
2011-01-19 01:01:38 pjenvey set status: open -> closed
    resolution: invalid
    messages: + msg6325
    nosy: + pjenvey
2011-01-12 08:20:10 jpulakka create