Message6768

Author jeff.allen
Recipients fwierzbicki, jeff.allen
Date 2012-02-02.22:14:17
Message-id <1328220858.35.0.872507017817.issue1836@psf.upfronthosting.co.za>
In-reply-to
Content
In the present tip (2b4f725d4d29, dated Tue Jan 03 09:34:18 2012 -0800), the Jython compiler rejects a string literal that contains invalid Unicode characters. This behaviour diverges from CPython. As a result, valid Python programs, including the regression test CPythonLib\test\test_bytes.py, fail to run.
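
The divergence could also be checked from a script rather than at the prompt. The following is only a sketch under stated assumptions: it uses Python 3 syntax (the sessions in this report are Python 2), and the names accepts_source and LITERAL_SOURCE are hypothetical helpers invented for illustration, not part of Jython or CPython. It relies only on the built-in compile() raising SyntaxError for source the compiler rejects.

```python
# Hypothetical helper (an assumption for illustration): reports whether the
# running interpreter's compiler accepts a piece of source text.
def accepts_source(source):
    try:
        compile(source, "<literal-check>", "exec")
    except SyntaxError:
        return False
    return True


# The assignment from this report. The backslashes are doubled so that the
# \n and \uXXXX escapes reach the compiler as escapes, exactly as typed at
# the interactive prompt.
LITERAL_SOURCE = 'a = u"Hello world\\n\\u1234\\u5678\\u9abc\\udef0"\n'

if __name__ == "__main__":
    print(accepts_source(LITERAL_SOURCE))
```

An interpreter that behaves like CPython should report the literal as accepted; an interpreter with the behaviour described above would presumably report it as rejected.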

In interactive mode, Jython seems to miss the end of the string:

Jython 2.6a0+ (, Feb 2 2012, 19:46:58)
[Java HotSpot(TM) 64-Bit Server VM (Sun Microsystems Inc.)] on java1.6.0_26
Type "help", "copyright", "credits" or "license" for more information.
>>> a = u"Hello world\n\u1234\u5678\u9abc\udef0"
...
...
... "
  File "<stdin>", line 4
    "
    ^
SyntaxError: no viable alternative at character '"'
>>>

In the same situation, CPython accepts the literal, although a subsequent attempt to transcode it, for example to print it, may fail at run time.

Python 2.7.2 (default, Jun 12 2011, 14:24:46) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> a = u"Hello world\n\u1234\u5678\u9abc\udef0"
>>> print a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\python\27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 12-15: character maps to <undefined>
>>> for c in a: print "%x" % ord(c)
...
48
65
6c
6c
6f
20
77
6f
72
6c
64
a
1234
5678
9abc
def0
>>>
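
The CPython session above can be reduced to a small script that separates the two observations: the literal is accepted and holds the expected code points, and only the later encoding step fails. This is a sketch in Python 3 syntax (an assumption, since the session is Python 2.7); the names TEXT, code_points and encode_failure are hypothetical, and cp1252 is chosen only because it is the console codec shown in the traceback.

```python
# The string from this report; the last escape, \udef0, lies in the
# surrogate range.
TEXT = u"Hello world\n\u1234\u5678\u9abc\udef0"


def code_points(text):
    """Return each character's code point as lowercase hex, like the loop above."""
    return ["%x" % ord(c) for c in text]


def encode_failure(text, encoding="cp1252"):
    """Return (start, end) of the span the codec refuses to encode, or None.

    The end index is exclusive, following UnicodeEncodeError.end.
    """
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return (exc.start, exc.end)
    return None


if __name__ == "__main__":
    print(code_points(TEXT))
    print(encode_failure(TEXT))
```

The point of the split is that a failure in encode_failure is a run-time codec matter, whereas the Jython failure reported here occurs earlier, when the literal is compiled.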
History
Date                 User        Action  Args
2012-02-02 22:14:18  jeff.allen  set     recipients: + jeff.allen, fwierzbicki
2012-02-02 22:14:18  jeff.allen  set     messageid: <1328220858.35.0.872507017817.issue1836@psf.upfronthosting.co.za>
2012-02-02 22:14:18  jeff.allen  link    issue1836 messages
2012-02-02 22:14:17  jeff.allen  create