Message8163

Author	jeff.allen
Recipients	jeff.allen
Date	2013-10-27.07:24:57
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1382858698.71.0.208864105053.issue2100@psf.upfronthosting.co.za>
In-reply-to

Content

Our implementation of the unicode type does not always deal correctly with code points that are represented by surrogate pairs. For example, index() reports 'a' at position 2, although indexing shows it is at position 1:
>>> s = u"\U00010000a"
>>> s.index('a')
2
>>> s[1]
u'a'
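
The mismatch above is what one would expect if the search ran over UTF-16 code units while indexing counted code points. The following is only a sketch under that assumption, not the actual implementation; the helper names utf16_units and unit_offset_to_code_point_index are hypothetical and used for illustration only.

```python
import struct

def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints (hypothetical helper)."""
    data = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

def unit_offset_to_code_point_index(units, unit_offset):
    """Count the code points that lie before unit_offset (hypothetical helper)."""
    count = 0
    i = 0
    while i < unit_offset:
        # A high surrogate followed by a low surrogate encodes one code point.
        is_pair = (0xD800 <= units[i] <= 0xDBFF
                   and i + 1 < len(units)
                   and 0xDC00 <= units[i + 1] <= 0xDFFF)
        i += 2 if is_pair else 1
        count += 1
    return count

s = u"\U00010000a"
units = utf16_units(s)              # two surrogates, then 'a'
raw = units.index(ord("a"))         # offset in code units: 2
fixed = unit_offset_to_code_point_index(units, raw)  # code point index: 1
```

Here raw matches the 2 returned by s.index('a'), and fixed is the 1 that agrees with s[1].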

This definitely affects the "find" family of methods (find, rfind, index, rindex) in their simplest form. In other cases, the fault is more
subtle, being revealed only when a sub-range is the effective target:

>>> s = u"\U00010000hello world"
>>> s.startswith("hell",1)
False
>>> s.startswith("hell",2)
True
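
For the sub-range case, the expected results (True for start 1, False for start 2) follow if the start argument is translated from a code point index to a code unit offset before comparing. This is a sketch assuming UTF-16 storage; to_units, code_point_index_to_unit_offset and startswith_at are hypothetical names, and negative or out-of-range start values are not handled.

```python
import struct

def to_units(s):
    """Return the UTF-16 code units of s as a list of ints (hypothetical helper)."""
    data = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

def code_point_index_to_unit_offset(units, cp_index):
    """Translate a non-negative code point index into a code unit offset."""
    offset = 0
    for _ in range(cp_index):
        if offset >= len(units):
            break
        # Step over a surrogate pair as a single code point.
        is_pair = (0xD800 <= units[offset] <= 0xDBFF
                   and offset + 1 < len(units)
                   and 0xDC00 <= units[offset + 1] <= 0xDFFF)
        offset += 2 if is_pair else 1
    return offset

def startswith_at(s, prefix, start):
    """Code-point-aware startswith(prefix, start), compared on code units."""
    units = to_units(s)
    p = to_units(prefix)
    off = code_point_index_to_unit_offset(units, start)
    return units[off:off + len(p)] == p

s = u"\U00010000hello world"
at_one = startswith_at(s, u"hell", 1)   # True: 'h' is code point 1
at_two = startswith_at(s, u"hell", 2)   # False: code point 2 is 'e'
```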

History
Date	User	Action	Args
2013-10-27 07:24:58	jeff.allen	set	recipients: + jeff.allen
2013-10-27 07:24:58	jeff.allen	set	messageid: <1382858698.71.0.208864105053.issue2100@psf.upfronthosting.co.za>
2013-10-27 07:24:58	jeff.allen	link	issue2100 messages
2013-10-27 07:24:57	jeff.allen	create