Issue1696

classification

Title:	PythonInterpreter.exec(String) should reject magic encoding comments like compile(unicode)
Type:	behaviour	Severity:	normal
Components:	Any	Versions:	Jython 2.7
		Milestone:

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	fwierzbicki, jeff.allen, jpulakka, pjenvey
Priority:	low	Keywords:

Created on 2011-01-12.08:20:09 by jpulakka, last changed 2018-03-17.19:24:46 by jeff.allen.

Files
File name	Uploaded	Description
JythonEncodingIssue.java	jpulakka, 2011-01-12.08:20:09	Program to demonstrate the issue
unnamed	jpulakka, 2011-01-19.07:24:53

Messages
msg6325	Author: Philip Jenvey (pjenvey)	Date: 2011-01-19.01:01:38
I can reproduce this issue on OS X, but on this platform your test case ends up being invalid for a couple of reasons:

1) When I compile this under OS X javac, the a-with-umlaut character is not interpreted correctly, because the OS X file.encoding defaults to MacRoman. Make sure the file.encoding value for your platform is what you think it is (it should be utf-8, since that's how your .java file is encoded). You can force the encoding by specifying -encoding utf8 to javac.

2) When I force javac's encoding to utf8, the generated file.py is correct but the test still fails. That's because it's comparing raw utf-8 encoded bytes (as a Python str) to a Java string. If you instead declare the ae variable as a Python unicode str (change ae = '<ae from java>' to ae = u'<ae from java>'), the test passes.
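Both points in the message above can be reproduced without javac. The following is an illustrative sketch only, not the attached test case: the variable names are hypothetical, and Python's `mac_roman` codec is assumed here as a stand-in for the MacRoman `file.encoding` default. It is written to run on both Python 2.7 and Python 3.

```python
# Illustration only: these names are hypothetical, not from JythonEncodingIssue.java.
AE = u'\xe4'  # LATIN SMALL LETTER A WITH DIAERESIS

# Point 1: UTF-8 bytes read back with the wrong charset (MacRoman here).
utf8_bytes = AE.encode('utf-8')            # two bytes: 0xC3 0xA4
misread = utf8_bytes.decode('mac_roman')   # two unrelated characters, not AE

# Point 2: encoded bytes do not compare equal to the text they encode.
# Decoding them first (or starting from a u'' literal) makes the comparison work.
bytes_equal_text = (utf8_bytes == AE)
decoded_equal_text = (utf8_bytes.decode('utf-8') == AE)
```

On Python 2.7 the bytes-to-unicode comparison also emits a UnicodeWarning, which is a useful hint that a `u` prefix is missing somewhere.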
msg6326	Author: Joonas Pulakka (jpulakka)	Date: 2011-01-19.07:24:54
Ah, I forgot the u-syntax. And yes, I did assume UTF-8 in the Java source.

But there's still an inconsistency: if I parse the program directly via PythonInterpreter.exec(String), it throws "SyntaxError: Illegal character in file '<string>' for encoding 'utf-8'", regardless of whether I use the u-syntax or not. But if I omit the "# coding: utf-8" comment at the beginning of the string, the test passes both with _and without_ the u-syntax.

Granted, PEP 263 says nothing about how source code in _strings_ (instead of files) should be interpreted. But I think it should work consistently unless otherwise specified.
msg6327	Author: Philip Jenvey (pjenvey)	Date: 2011-01-20.00:13:24
Ok, you have encountered a minor issue with PythonInterpreter.exec(String). It should be acting like compile(unicode), which would reject your example with a SyntaxError because it includes a magic encoding comment. From PEP 263:

    The builtin compile() API will be enhanced to accept Unicode as input. 8-bit string input is subject to the standard procedure for encoding detection as described above.

    If a Unicode string with a coding declaration is passed to compile(), a SyntaxError will be raised.

It should be rejected because it doesn't make sense to tell the parser how to decode source code that's already decoded. The magic encoding comment is only applicable to encoded data (like plain Python strs or Java bytes).

We do raise a SyntaxError here in accordance with the PEP, but I'll leave this issue open because our SyntaxError message doesn't make the issue clear. It should match CPython's:

    Python 2.6.4 (r264:75706, Dec 8 2009, 15:56:45)
    [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> compile(u'# coding: utf-8\n', '<file>', 'exec')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<file>", line 0
    SyntaxError: encoding declaration in Unicode string
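The rule quoted from PEP 263 can be sketched in a few lines. This is a simplified illustration, not Jython's or CPython's implementation: `find_coding_declaration` and `check_decoded_source` are hypothetical helper names, the regular expression is the declaration pattern given in PEP 263, and the check looks at the first two lines without the finer tokenizer rules.

```python
import re

# Declaration pattern from PEP 263; only the first two lines are considered.
CODING_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')


def find_coding_declaration(source):
    """Return the declared encoding name, or None if there is no declaration."""
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


def check_decoded_source(source):
    """Hypothetical helper: reject already-decoded text that declares an encoding."""
    if find_coding_declaration(source) is not None:
        # Same message as the CPython 2.x session quoted in this issue.
        raise SyntaxError('encoding declaration in Unicode string')
    return source
```

Under this sketch, a decoded string that starts with `# coding: utf-8` is refused, while the same program without the comment is passed through unchanged, which mirrors the behaviour requested for PythonInterpreter.exec(String).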
msg11829	Author: Jeff Allen (jeff.allen)	Date: 2018-03-17.19:24:45
It matches now, so we should close.

    >>> compile(u'# coding: utf-8\n', '<file>', 'exec')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<file>", line 0
    SyntaxError: encoding declaration in Unicode string

https://hg.python.org/jython/rev/d2cba6eb8d3d

History
Date	User	Action	Args
2018-03-17 19:24:46	jeff.allen	set	status: open -> closed resolution: remind -> fixed messages: + msg11829 nosy: + jeff.allen versions: + Jython 2.7, - Jython 2.5
2013-02-19 23:38:13	fwierzbicki	set	versions: + Jython 2.5, - 2.5.2rc
2013-02-19 23:38:00	fwierzbicki	set	priority: low nosy: + fwierzbicki resolution: remind
2011-01-20 00:13:25	pjenvey	set	status: closed -> open resolution: invalid -> (no value) messages: + msg6327 title: Module importing doesn't respect encoding (PEP 263) -> PythonInterpreter.exec(String) should reject magic encoding comments like compile(unicode)
2011-01-19 07:24:54	jpulakka	set	files: + unnamed messages: + msg6326
2011-01-19 01:01:38	pjenvey	set	status: open -> closed resolution: invalid messages: + msg6325 nosy: + pjenvey
2011-01-12 08:20:10	jpulakka	create