Issue2037

Classification
Title: Byte-string containing elements greater than 255
Type: behaviour
Severity: normal
Components: Core
Versions: Jython 2.5

Process
Status: open
Resolution: accepted
Dependencies:
Superseder:
Assigned To:
Nosy List: Dolda2000, fwierzbicki, zyasoft
Priority: high
Keywords:

Created on 2013-04-06.03:02:20 by Dolda2000, last changed 2014-06-18.17:51:53 by zyasoft.

Messages
msg7987 Author: Fredrik Tolf (Dolda2000) Date: 2013-04-06.03:02:19
Byte-strings (str objects) can contain elements that aren't bytes, i.e. characters with code points greater than 255. The problem is easily reproduced, like this:

$ jython
Jython 2.5.2 (Debian:hg/91332231a448, May 8 2012, 09:50:46) 
[OpenJDK 64-Bit Server VM (Sun Microsystems Inc.)] on java1.6.0_27
>>> import java
>>> foo = str(java.lang.String(u"\u1234"))
>>> print foo
?
>>> foo
'\u1234'

I can't say I know what the proper solution to this problem would be, but it seems strange that byte-strings should be able to contain non-byte elements.

It also seems like a bug in itself that the repr() of such an object does not reproduce the same object when eval'ed:

>>> eval(repr(foo))
'\\u1234'
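
For comparison, here is a minimal sketch of the round trip that repr() is normally expected to satisfy for well-formed strings. The helper name round_trips is made up for illustration and is not part of Jython or the standard library:

```python
def round_trips(value):
    """Return True if evaluating repr(value) gives back an equal object.

    Hypothetical helper, used only to illustrate the expected invariant.
    """
    return eval(repr(value)) == value


# Ordinary strings, including ones that need escaping, survive the round trip.
samples = ["plain", "tab\there", "quote's", "back\\slash"]
for sample in samples:
    assert round_trips(sample)
```

The string in the report breaks this invariant because its repr() prints a \u escape, which a byte-string literal does not interpret.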

Such strings are also poison even to codecs like latin1, which should be able to decode any byte-string without choking:

>>> unicode(foo, "latin1")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'latin-1' codec can't decode byte 0x34 in position 0: ordinal not in range(256)
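
As a sketch of why latin1 is expected to accept any genuine byte-string: each of the 256 possible byte values maps to the code point with the same number, so decoding cannot fail. The example uses only struct from the standard library, and the variable names are illustrative:

```python
import struct

# Pack the integers 0..255 into a byte-string holding every possible byte value.
all_bytes = struct.pack("256B", *range(256))

# latin-1 maps byte N to code point N, so decoding real bytes cannot fail.
decoded = all_bytes.decode("latin-1")

assert len(decoded) == 256
assert [ord(char) for char in decoded] == list(range(256))
```

The decode in the report can only fail because the string holds an element outside range(256).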

Perhaps str() should raise an exception when such objects would be created?
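
One way to picture such a check, purely as a sketch: the helper name checked_byte_string is hypothetical and is not part of Jython. It only illustrates rejecting code points above 255 before a byte-string is built:

```python
def checked_byte_string(text):
    """Return text as a byte-string, refusing code points above 255.

    Hypothetical helper for illustration; not Jython's implementation.
    """
    for index, char in enumerate(text):
        if ord(char) > 255:
            raise ValueError(
                "character %r at position %d does not fit in a byte"
                % (char, index))
    # Every code point is now known to be in range(256),
    # so latin-1 maps each character to exactly one byte.
    return text.encode("latin-1")
```

With this, checked_byte_string(u"\u1234") raises ValueError instead of returning a string that holds a non-byte element.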
msg8329 Author: Jim Baker (zyasoft) Date: 2014-05-04.20:17:07
Wrapping a java.lang.String with str should make this check.
msg8460 Author: Jim Baker (zyasoft) Date: 2014-05-21.20:32:34
Target beta 4
History
Date                 User         Action  Args
2014-06-18 17:51:53  zyasoft      set     priority: high
2014-05-21 20:32:34  zyasoft      set     messages: + msg8460
2014-05-04 20:17:07  zyasoft      set     resolution: accepted; messages: + msg8329; nosy: + zyasoft
2013-04-08 17:31:24  fwierzbicki  set     nosy: + fwierzbicki
2013-04-06 03:02:20  Dolda2000    create