Message11669

Author jeff.allen
Recipients jeff.allen, stefan.richthofer, zyasoft
Date 2017-11-21.23:22:50
Message-id <1511306571.01.0.213398074469.issue2638@psf.upfronthosting.co.za>
In-reply-to
Content
We now have fairly complete support for default encoding when mixing bytes and unicode, thanks to these two change sets:
https://hg.python.org/jython/rev/78482073e91f
https://hg.python.org/jython/rev/f71e0b2cfaf7

People who call sys.setdefaultencoding should now have an experience closer to CPython. I'm hoping this will help with #2633 in these circumstances.
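As a rough illustration of why the default encoding matters when bytes and unicode are mixed, here is a minimal sketch. It does not use the Jython change sets; `concat_mixed` is a hypothetical helper that emulates Python 2's implicit coercion by decoding explicitly, so it runs on any Python.

```python
def concat_mixed(raw, text, default_encoding="ascii"):
    """Hypothetical helper: emulate Python 2's implicit str + unicode
    coercion by decoding the byte string with the default encoding."""
    return raw.decode(default_encoding) + text


raw = u"caf\xe9".encode("utf-8")  # non-ASCII bytes: 'caf\xc3\xa9'

# With the stock 'ascii' default, the implicit decode fails.
try:
    concat_mixed(raw, u"!")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# With a 'utf-8' default (the effect sys.setdefaultencoding('utf-8')
# is meant to have in Python 2), the same mix succeeds.
utf8_result = concat_mixed(raw, u"!", default_encoding="utf-8")
```

Here `ascii_ok` ends up False and `utf8_result` is the unicode text u"caf\xe9!".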

Reading test_unicode_jy.DefaultDecodingTestCase gives a pretty good account of where Jython diverges from CPython, which it does by doing more. The test passes for CPython 2.7.14, thanks to a few if-statements testing for Jython. I've made what I think is a reasonable compromise between CPython behaviour and consistency in comparisons and equality.

One loose end: str.find(unicode) returns an index into the decoded (unicode) string, not a byte offset in the original str. I think this is wrong, but it is what CPython does.
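To make the str.find(unicode) point concrete, here is a small sketch, assuming a UTF-8 default encoding. It performs the decode explicitly rather than relying on Python 2's implicit coercion, so the variable names are illustrative only.

```python
raw = u"caf\xe9 x".encode("utf-8")  # 6 characters, but 7 bytes
needle = u"x"

# What the mixed call effectively does: decode first, then search,
# giving a character index in the decoded text.
char_index = raw.decode("utf-8").find(needle)

# What a caller holding bytes might expect: an offset into the bytes.
byte_offset = raw.find(needle.encode("utf-8"))
```

The two results differ (5 versus 6) because the two-byte sequence for é precedes the match, so the returned index cannot safely be used to slice the original byte string.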

Oh, and I may have broken shadowstring ... is there a test? I'd quite like to modify startswith.