Issue439688

classification

Title:	Syntax error for non-ascii characters
Type:		Severity:	normal
Components:	Core	Versions:
		Milestone:

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	bckfnn	Nosy List:	bckfnn
Priority:	normal	Keywords:

Created on 2001-07-09.12:05:08 by bckfnn, last changed 2001-07-21.09:24:03 by bckfnn.

Messages
msg341	Author: Finn Bock (bckfnn)	Date: 2001-07-09.12:05:08
When entering a capital O-umlaut (Ö) in a euro-centric Windows command prompt, this happens:

1) Java reads it as 0x99.
2) Jython (wrongly) puts it through the default encoding, which returns Unicode 0x2122.
3) This is passed to the JavaCC parser, which assumes it is only dealing with ASCII and cuts away the top 8 bits.
4) The result is 0x22 (a double quote), which causes a syntax error.

>>> "Ö"
Traceback (innermost last):
  (no code object) at line 0
  File "<console>", line 2
SyntaxError: Lexical error at line 2, column 0.  Encountered: <EOF> after : ""
>>>
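The chain described in this report can be reproduced outside Jython with a small sketch. It assumes the "default encoding" was windows-1252, which the report does not name; it is simply the common Windows code page that maps byte 0x99 to U+2122. The class and method names are made up for illustration and are not taken from the Jython sources.

```java
import java.nio.charset.Charset;

public class TruncationDemo {
    /** Decodes one raw console byte with the named charset (step 2 of the report). */
    static char decode(byte raw, String charsetName) {
        return new String(new byte[] { raw }, Charset.forName(charsetName)).charAt(0);
    }

    /** Keeps only the low 8 bits of a char, as the report says the parser did (step 3). */
    static char truncateToByte(char c) {
        return (char) (c & 0xFF);
    }

    public static void main(String[] args) {
        byte raw = (byte) 0x99;                       // what Java read from the console
        char decoded = decode(raw, "windows-1252");   // assumed default encoding
        char seen = truncateToByte(decoded);          // what the parser ends up with
        System.out.printf("decoded=U+%04X, parser sees=U+%04X%n", (int) decoded, (int) seen);
    }
}
```

Under that assumption the program prints `decoded=U+2122, parser sees=U+0022`, so the input `"Ö"` reaches the lexer as three double quotes, which would explain why it runs into `<EOF>` while still inside a string.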
msg342	Author: Finn Bock (bckfnn)	Date: 2001-07-19.19:18:43
Added as test302
msg343	Author: Finn Bock (bckfnn)	Date: 2001-07-21.09:24:03
Fixed in parser.java: 2.10; python.jjt: 2.15. The parser now operates on the full Unicode input as created by a default Reader. This fixes the syntax error, but using a default Reader is still wrong when reading from the Windows console. See patch #442906 for a way of fixing the encoding part of this bug.
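The remaining encoding problem can be illustrated with a Reader built on an explicit charset. This is only a sketch of the general idea and not the content of patch #442906. It assumes the console code page is IBM850 and the platform default is windows-1252, neither of which is stated in this report, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class ConsoleReaderDemo {
    /** Reads the first character of raw bytes through a Reader using the named charset. */
    static int firstChar(byte[] raw, String charsetName) {
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(raw),
                                              Charset.forName(charsetName))) {
            return r.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] typed = { (byte) 0x99 };  // the byte the console delivered for the umlaut
        // Decoding with the assumed platform default gives the wrong character.
        System.out.printf("windows-1252: U+%04X%n", firstChar(typed, "windows-1252"));
        // IBM850 is an optional charset, so check for it before using it.
        if (Charset.isSupported("IBM850")) {
            System.out.printf("IBM850:       U+%04X%n", firstChar(typed, "IBM850"));
        }
    }
}
```

Where IBM850 is available, the second line should show U+00D6 (Ö), the character that was actually typed. The design point is that the charset has to match the console rather than the platform default.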

History
Date	User	Action	Args
2001-07-09 12:05:08	bckfnn	create