Message7987

Author Dolda2000
Recipients Dolda2000
Date 2013-04-06.03:02:19
Message-id <1365217340.57.0.797634890825.issue2037@psf.upfronthosting.co.za>
In-reply-to
Content
In Jython, byte-strings (str objects) can contain elements that aren't bytes, that is, elements with ordinals above 255. The problem is easy to reproduce:

$ jython
Jython 2.5.2 (Debian:hg/91332231a448, May 8 2012, 09:50:46) 
[OpenJDK 64-Bit Server VM (Sun Microsystems Inc.)] on java1.6.0_27
>>> import java
>>> foo = str(java.lang.String(u"\u1234"))
>>> print foo
?
>>> foo
'\u1234'

I can't say I know what the proper solution to this problem would be, but it seems strange that byte-strings should be able to contain non-byte elements.
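
To make the expected invariant concrete, here is a minimal sketch of a check that every element of a string fits in a byte. The helper name non_byte_positions is hypothetical and not part of Jython or CPython; it only uses ord() and enumerate(), so it should run on both Python 2 and Python 3.

```python
def non_byte_positions(s):
    """Return (index, ordinal) pairs for elements of s that do not fit in a byte.

    Hypothetical helper for illustration: a well-formed byte-string
    should always yield an empty list.
    """
    return [(i, ord(ch)) for i, ch in enumerate(s) if ord(ch) > 0xFF]


# A string holding U+1234 has one element outside the 0..255 range.
print(non_byte_positions(u"ab\u1234"))
```

A string whose elements are all within 0..255 gives an empty list, which is what one would expect for any value of type str.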

It also seems like a bug in itself that the repr() of such an object does not reproduce the same object when eval'ed:

>>> eval(repr(foo))
'\\u1234'

It is also worth noting that such strings are poison even to codecs like latin1, which should be able to decode any byte-string without choking:

>>> unicode(foo, "latin1")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'latin-1' codec can't decode byte 0x34 in position 0: ordinal not in range(256)

Perhaps str() should raise an exception when such objects would be created?
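
As a rough sketch of what such a strict conversion could look like, assuming the check is simply "every element must fit in a byte": the function checked_bytestring below is hypothetical and does not reflect how Jython implements str(). It raises ValueError; the real exception type would be a design decision.

```python
def checked_bytestring(text):
    """Hypothetical strict conversion: refuse any element above 0xFF.

    Returns the input unchanged when every element fits in a byte,
    otherwise raises ValueError naming the first offending element.
    """
    for i, ch in enumerate(text):
        if ord(ch) > 0xFF:
            raise ValueError(
                "element %d (U+%04X) does not fit in a byte" % (i, ord(ch))
            )
    return text


try:
    checked_bytestring(u"\u1234")
except ValueError as exc:
    print("rejected: %s" % exc)
```

For comparison, CPython 2 already refuses the analogous conversion: str(u"\u1234") raises UnicodeEncodeError there, because the implicit ASCII encoding fails.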
History
Date                 User       Action  Args
2013-04-06 03:02:20  Dolda2000  set     recipients: + Dolda2000
2013-04-06 03:02:20  Dolda2000  set     messageid: <1365217340.57.0.797634890825.issue2037@psf.upfronthosting.co.za>
2013-04-06 03:02:20  Dolda2000  link    issue2037 messages
2013-04-06 03:02:19  Dolda2000  create