Message11637

Author jeff.allen
Recipients jeff.allen
Date 2017-10-30.07:54:51
Message-id <1509350092.9.0.213398074469.issue2638@psf.upfronthosting.co.za>
Content
Incidental to working on #2632, I noticed that mixed comparisons of unicode and str do not produce the same results in Jython as in CPython.

CPython:

>>> u = u"caf\xe9"
>>> u == u.encode('latin-1')
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False

Jython:

>>> u = u"caf\xe9"
>>> u == u.encode('latin-1')
True

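CPython converts the str (or whatever non-unicode object is on the other side of the operator) to unicode, if it can, using the default encoding. When that conversion fails, as it does above, it issues the UnicodeWarning and treats the operands as unequal. Jython just compares the internal Java strings without reference to the default encoding, which is why the latin-1 bytes above compare equal to the unicode object. The default encoding is normally ASCII, but may be changed with sys.setdefaultencoding('utf-8'), and that is most likely to be done in exactly those applications where the encoding matters.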
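To make the difference concrete, here is a minimal sketch of the two behaviours as I understand them. It is a model only, written so that it also runs on Python 3, where bytes stands in for the Python 2 str. The function names cpython2_style_eq and jython_style_eq are hypothetical, and neither is the actual code in CPython or Jython.

```python
import warnings


def cpython2_style_eq(u, b, default_encoding="ascii"):
    """Model of CPython 2 `unicode == str`: decode the bytes first.

    `default_encoding` stands in for sys.getdefaultencoding().
    """
    try:
        decoded = b.decode(default_encoding)
    except UnicodeDecodeError:
        # CPython 2 warns and then treats the operands as unequal.
        warnings.warn(
            "Unicode equal comparison failed to convert both arguments "
            "to Unicode - interpreting them as being unequal",
            UnicodeWarning,
        )
        return False
    return u == decoded


def jython_style_eq(u, b):
    """Model of the behaviour reported here for Jython.

    Assumption: comparing the internal strings directly amounts to
    treating each byte as the code point of the same value, which is
    what a latin-1 decode does. The default encoding plays no part.
    """
    return u == b.decode("latin-1")
```

With u = u"caf\xe9", cpython2_style_eq(u, u.encode('latin-1')) warns and returns False, while jython_style_eq(u, u.encode('latin-1')) returns True, matching the two sessions above. Passing 'utf-8' as default_encoding models the effect of sys.setdefaultencoding('utf-8').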

We should check all the binary operations of PyUnicode for this problem.

History
Date                 User        Action  Args
2017-10-30 07:54:52  jeff.allen  set     recipients: + jeff.allen
2017-10-30 07:54:52  jeff.allen  set     messageid: <1509350092.9.0.213398074469.issue2638@psf.upfronthosting.co.za>
2017-10-30 07:54:52  jeff.allen  link    issue2638 messages
2017-10-30 07:54:51  jeff.allen  create