Message8163

Author jeff.allen
Recipients jeff.allen
Date 2013-10-27.07:24:57
Message-id <1382858698.71.0.208864105053.issue2100@psf.upfronthosting.co.za>
In-reply-to
Content
Our implementation of the unicode type does not always deal correctly with code points that are represented by surrogate pairs. In the example below, index() reports 'a' at position 2, although indexing finds it at position 1:
>>> s = u"\U00010000a"
>>> s.index('a')
2
>>> s[1]
u'a'
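
The mismatch can be illustrated outside Jython. The following is a minimal sketch, assuming CPython 3.3 or later (where str is indexed by code point); the helper utf16_units is hypothetical and is not part of Jython. It shows that 'a' sits at offset 2 when counting UTF-16 code units, but at index 1 when counting code points, which is presumably where the 2 above comes from.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints (hypothetical helper)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


s = "\U00010000a"
units = utf16_units(s)

# U+10000 needs a surrogate pair, so the string occupies three code units.
print([hex(u) for u in units])        # ['0xd800', '0xdc00', '0x61']

# Position of 'a' counted in UTF-16 code units versus code points.
code_unit_index = units.index(ord("a"))
code_point_index = s.index("a")
print(code_unit_index, code_point_index)   # 2 1
```

A consistent implementation would report the code point index (1), so that s[s.index('a')] is again u'a'.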

This definitely affects the "find" family of methods (find, rfind, index, rindex) in their simplest form. In other cases, the fault is more subtle, being revealed only when a sub-range is the effective target.

>>> s = u"\U00010000hello world"
>>> s.startswith("hell",1)
False
>>> s.startswith("hell",2)
True
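
The sub-range case above is consistent with the start argument being used as a UTF-16 offset rather than a code point index. The sketch below emulates that under stated assumptions: it runs on CPython 3.3 or later, handles only non-negative start values, and every function name in it is hypothetical. It is an illustration of the index translation that appears to be missing, not the Jython implementation or a proposed patch.

```python
def to_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def code_point_to_utf16_offset(text, index):
    """Count the UTF-16 code units occupied by the first `index` code points."""
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text[:index])


def startswith_raw_offset(text, prefix, start):
    """Emulate the reported behaviour: `start` is taken as a UTF-16 offset."""
    units, wanted = to_units(text), to_units(prefix)
    return units[start:start + len(wanted)] == wanted


def startswith_translated(text, prefix, start):
    """Translate `start` from code points to UTF-16 units before comparing."""
    offset = code_point_to_utf16_offset(text, start)
    return startswith_raw_offset(text, prefix, offset)


s = "\U00010000hello world"
print(startswith_raw_offset(s, "hell", 1))   # False (offset 1 is the low surrogate)
print(startswith_raw_offset(s, "hell", 2))   # True
print(startswith_translated(s, "hell", 1))   # True
print(startswith_translated(s, "hell", 2))   # False
```

With the translation applied, the results agree with code point semantics: "hell" begins at index 1, immediately after the single supplementary character.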
History
Date User Action Args
2013-10-27 07:24:58 jeff.allen set recipients: + jeff.allen
2013-10-27 07:24:58 jeff.allen set messageid: <1382858698.71.0.208864105053.issue2100@psf.upfronthosting.co.za>
2013-10-27 07:24:58 jeff.allen link issue2100 messages
2013-10-27 07:24:57 jeff.allen create