Message8163
Our implementation of the unicode type does not always deal correctly with those codepoints represented by surrogate pairs. For example:
>>> s = u"\U00010000a"
>>> s.index('a')
2
>>> s[1]
u'a'
This definitely affects the "find" family of methods (find, rfind, index, rindex) in their simplest form. In other cases, the fault is more
subtle, being revealed only when a sub-range is the effective target.
>>> s = u"\U00010000hello world"
>>> s.startswith("hell",1)
False
>>> s.startswith("hell",2)
True
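The results above are consistent with some methods counting in UTF-16 code units while indexing counts in code points. The following is only a rough sketch of that mismatch, assuming a CPython 3 interpreter (where str is indexed by code point); the helper names utf16_units and unit_index_to_point_index are hypothetical and are not taken from our implementation.

```python
# Sketch only: models the index mismatch on CPython 3, where str is indexed
# by code point. The helper names are hypothetical, not Jython internals.

def utf16_units(s):
    """Return the number of UTF-16 code units needed to encode s."""
    return len(s.encode("utf-16-le")) // 2


def unit_index_to_point_index(s, unit_index):
    """Convert an offset in UTF-16 code units to an offset in code points."""
    units = 0
    for point_index, ch in enumerate(s):
        if units >= unit_index:
            return point_index
        # Characters above U+FFFF occupy two code units (a surrogate pair).
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(s)


s = "\U00010000a"
point_index = s.index("a")                 # offset counted in code points
unit_index = utf16_units(s[:point_index])  # same position in code units
print(point_index, unit_index)
print(unit_index_to_point_index(s, unit_index))

t = "\U00010000hello world"
print(t.startswith("hell", 1))
print(t.startswith("hell", unit_index_to_point_index(t, 2)))
```

In this model 'a' sits at code-point offset 1 but code-unit offset 2, which matches the pair of values (2 from index, u'a' at s[1]) reported above. A fix would presumably need every start/end argument and every returned index to use the same unit as s[i].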
Date | User | Action | Args
2013-10-27 07:24:58 | jeff.allen | set | recipients: + jeff.allen
2013-10-27 07:24:58 | jeff.allen | set | messageid: <1382858698.71.0.208864105053.issue2100@psf.upfronthosting.co.za>
2013-10-27 07:24:58 | jeff.allen | link | issue2100 messages
2013-10-27 07:24:57 | jeff.allen | create |