Issue439688

classification
Title: Syntax error for non-ascii characters
Type: Severity: normal
Components: Core Versions:
Milestone:
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: bckfnn Nosy List: bckfnn
Priority: normal Keywords:

Created on 2001-07-09.12:05:08 by bckfnn, last changed 2001-07-21.09:24:03 by bckfnn.

Messages
msg341 Author: Finn Bock (bckfnn) Date: 2001-07-09.12:05:08
When entering a capital O-umlaut (Ö) in a euro-centric Windows
command prompt, this happens:

1) Java reads it as 0x99.
2) Jython (wrongly) puts it through the default
   encoding, which returns Unicode 0x2122.
3) This is passed to the JavaCC parser, which assumes
   it is only dealing with ASCII and cuts away the
   top 8 bits.
4) The result is 0x22 (a double quote), which causes a
   syntax error.
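
The four steps above can be reproduced outside Jython with a minimal sketch. The code pages are assumptions, since the report does not name them: cp850 for the console and cp1252 for the default encoding. Both are consistent with the values 0x99 and 0x2122 quoted above. The helper name `trace_misparse` is hypothetical.

```python
# Assumed code pages (not named in the report): the console delivers cp850
# bytes, while the platform default encoding is cp1252.
CONSOLE_ENCODING = "cp850"
DEFAULT_ENCODING = "cp1252"


def trace_misparse(ch):
    """Follow one character through the steps described in the report."""
    raw = ch.encode(CONSOLE_ENCODING)        # step 1: byte read from the console
    decoded = raw.decode(DEFAULT_ENCODING)   # step 2: decoded with the wrong encoding
    truncated = chr(ord(decoded) & 0xFF)     # step 3: top 8 bits cut away
    return raw, decoded, truncated           # step 4: truncated is what the lexer sees


raw, decoded, truncated = trace_misparse("\u00d6")  # capital O-umlaut
print(raw.hex(), hex(ord(decoded)), repr(truncated))
```

Under these assumptions the last value is a double quote, which is why the lexer sees an unterminated string in the session below.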


>>> "Ö"
Traceback (innermost last):
  (no code object) at line 0
  File "<console>", line 2
SyntaxError: Lexical error at line 2, column 0.  
Encountered: <EOF> after : ""
>>>
msg342 Author: Finn Bock (bckfnn) Date: 2001-07-19.19:18:43
Added as test302
msg343 Author: Finn Bock (bckfnn) Date: 2001-07-21.09:24:03
Fixed in parser.java: 2.10; python.jjt: 2.15.

The parser now operates on the full Unicode input as created
by a default Reader. This fixes the syntax error, but
using a default Reader is still wrong when reading from the
Windows console. See patch #442906 for a way of fixing the
encoding part of this bug.
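
A rough illustration of the remaining encoding problem, again assuming cp850 for the console and cp1252 for the default encoding (neither is stated in this report). It does not show what patch #442906 does. It only shows why keeping the full code point removes the syntax error while the character itself is still wrong.

```python
# Assumption: the console byte for capital O-umlaut is 0x99 (cp850),
# and the default Reader decodes with cp1252.
raw = b"\x99"

as_default = raw.decode("cp1252")  # what a default Reader would produce
as_console = raw.decode("cp850")   # what the user actually typed

# With no truncation the parser sees U+2122, which is not a quote,
# so the string literal parses, but its content is the wrong character.
print(hex(ord(as_default)), hex(ord(as_console)))
```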
History
Date User Action Args
2001-07-09 12:05:08 bckfnn create