Message8180

Author jeff.allen
Recipients jeff.allen
Date 2013-11-25.07:57:10
Message-id <1385366231.04.0.499864771148.issue2100@psf.upfronthosting.co.za>
In-reply-to
Content
Correcting my spelling and adding a solution idea ...

A clean solution might be a general change to the way we index PyUnicode. In general, the UTF-16 representation contains a scattering of surrogate pairs, so the code unit index is offset from the character (code point) index by the number of surrogate pairs that precede it. We could keep a table of offsets for converting one index to the other.

Mulling this over, it seems common string operations (find, replace, etc.) could still use the java.lang.String implementations, and index translation (if necessary at all) would be constant-time.
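
To make the idea concrete, here is a minimal sketch of such an offset table. The class SurrogateIndex and its methods are hypothetical names for illustration, not existing Jython code. It records the code point index of each supplementary character and translates between the two kinds of index. Note that this sketch uses a binary search, so translation costs O(log k) for k surrogate pairs; it is constant-time only when the string has no surrogate pairs at all, and a denser table would be needed to make it constant-time in general.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Jython): translates between
 *  code point indices and UTF-16 code unit indices of a String. */
final class SurrogateIndex {
    private final String s;
    // Code point index of each supplementary character, ascending.
    private final int[] supplementary;

    SurrogateIndex(String s) {
        this.s = s;
        int[] tmp = new int[s.length() / 2]; // at most length/2 pairs
        int count = 0;
        int cp = 0; // running code point index
        for (int i = 0; i < s.length(); cp++) {
            int c = s.codePointAt(i);
            if (Character.isSupplementaryCodePoint(c)) {
                tmp[count++] = cp;
            }
            i += Character.charCount(c);
        }
        supplementary = Arrays.copyOf(tmp, count);
    }

    /** Code point index -> code unit index. */
    int toCodeUnitIndex(int cpIndex) {
        int pos = Arrays.binarySearch(supplementary, cpIndex);
        // Number of supplementary characters strictly before cpIndex.
        int before = pos >= 0 ? pos : -pos - 1;
        return cpIndex + before;
    }

    /** Code unit index -> code point index. */
    int toCodePointIndex(int unitIndex) {
        // The j-th pair starts at code unit supplementary[j] + j.
        // Count the pairs that start before unitIndex.
        int lo = 0, hi = supplementary.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (supplementary[mid] + mid < unitIndex) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return unitIndex - lo;
    }

    /** find() delegating to java.lang.String, result as a code point index. */
    int find(String sub) {
        int u = s.indexOf(sub);
        return u < 0 ? -1 : toCodePointIndex(u);
    }
}
```

The search itself is still done by java.lang.String.indexOf on code units; only the result is translated. When the table is empty (no surrogate pairs), both translations return their argument unchanged.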
History
Date                 User        Action  Args
2013-11-25 07:57:11  jeff.allen  set     messageid: <1385366231.04.0.499864771148.issue2100@psf.upfronthosting.co.za>
2013-11-25 07:57:11  jeff.allen  set     recipients: + jeff.allen
2013-11-25 07:57:10  jeff.allen  link    issue2100 messages
2013-11-25 07:57:10  jeff.allen  create