Message8180

Author	jeff.allen
Recipients	jeff.allen
Date	2013-11-25.07:57:10
Message-id	<1385366231.04.0.499864771148.issue2100@psf.upfronthosting.co.za>
In-reply-to

Content
Correcting my spelling and adding a solution idea ...

A clean solution might be a general change to the way we index PyUnicode. In general, the UTF-16 representation contains a scattering of surrogate pairs, so the code unit index is offset from the character (code point) index. We could keep a table of offsets for converting one index to the other.
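
To make the offset concrete, here is a small illustration using only standard java.lang.String methods. The class name and the sample text are made up for this example; nothing here is taken from the Jython code base.

```java
// Illustrative only: shows how one surrogate pair shifts code unit indices
// relative to character (code point) indices in a java.lang.String.
final class CodeUnitOffsetDemo {
    // 'a', then U+1D538 (stored as a surrogate pair), then 'b'.
    static final String TEXT = "a\uD835\uDD38b";

    public static void main(String[] args) {
        // Four UTF-16 code units, but only three characters.
        System.out.println(TEXT.length());                         // 4
        System.out.println(TEXT.codePointCount(0, TEXT.length())); // 3
        // The character at code point index 2 ('b') sits at code unit index 3.
        System.out.println(TEXT.offsetByCodePoints(0, 2));         // 3
        System.out.println(TEXT.indexOf('b'));                     // 3
    }
}
```

Note that offsetByCodePoints has to scan the string, which is the cost a precomputed table of offsets would be meant to avoid.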

Mulling this over, it seems common string operations (find, replace, etc.) could still use the java.lang.String implementations, and index translation (if necessary at all) would be constant-time.
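
A rough sketch of how such a table might look, purely as an assumption about one possible design: the class SupplementaryIndex and its methods are hypothetical names, not existing PyUnicode API. It records the code point index of each surrogate pair and delegates find to String.indexOf, translating the result afterwards. Translation is trivially constant-time when the string holds no surrogate pairs (the table is empty); with pairs present, this particular sketch uses a binary search, so it is logarithmic in the number of pairs rather than the constant-time scheme hoped for above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jython: translates between code point
// indices and UTF-16 code unit indices using a table of surrogate pairs.
final class SupplementaryIndex {
    private final String s;
    // Ascending code point indices of the characters stored as surrogate pairs.
    private final int[] pairs;

    SupplementaryIndex(String s) {
        this.s = s;
        List<Integer> found = new ArrayList<>();
        int cp = 0;
        for (int u = 0; u < s.length(); cp++) {
            int c = s.codePointAt(u);
            if (Character.isSupplementaryCodePoint(c)) {
                found.add(cp);
            }
            u += Character.charCount(c); // advance 1 or 2 code units
        }
        pairs = found.stream().mapToInt(Integer::intValue).toArray();
    }

    // Code point index -> code unit index: add one for each pair before it.
    int toUnitIndex(int cpIndex) {
        int lo = 0, hi = pairs.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (pairs[mid] < cpIndex) lo = mid + 1; else hi = mid;
        }
        return cpIndex + lo;
    }

    // Code unit index -> code point index. Pair number i starts at code unit
    // pairs[i] + i, so count the pairs that start before unitIndex.
    int toCodePointIndex(int unitIndex) {
        int lo = 0, hi = pairs.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (pairs[mid] + mid < unitIndex) lo = mid + 1; else hi = mid;
        }
        return unitIndex - lo;
    }

    // find() delegates to java.lang.String and translates only the result.
    int find(String needle) {
        int u = s.indexOf(needle);
        return u < 0 ? -1 : toCodePointIndex(u);
    }

    public static void main(String[] args) {
        // Code points: a, U+1D538, b, U+1D538, c
        SupplementaryIndex idx = new SupplementaryIndex("a\uD835\uDD38b\uD835\uDD38c");
        System.out.println(idx.find("c"));      // 4 (code unit index is 6)
        System.out.println(idx.toUnitIndex(4)); // 6
    }
}
```

The search itself runs on the unmodified java.lang.String; only the argument and result indices pass through the table, which is the separation the message suggests.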

History
Date	User	Action	Args
2013-11-25 07:57:11	jeff.allen	set	messageid: <1385366231.04.0.499864771148.issue2100@psf.upfronthosting.co.za>
2013-11-25 07:57:11	jeff.allen	set	recipients: + jeff.allen
2013-11-25 07:57:10	jeff.allen	link	issue2100 messages
2013-11-25 07:57:10	jeff.allen	create