Message9966

Author jeff.allen
Recipients gsnedders, jeff.allen, zyasoft
Date 2015-04-26.00:01:25
Message-id <1430006486.03.0.455334677308.issue2340@psf.upfronthosting.co.za>
In-reply-to
Content
I favour an approach like that of PEP 393: a unicode object is an immutable sequence of Unicode code points. It is based on an int[] array if it contains any *supplementary* code points. Is the PyUCS4 subclass effectively this pluggable representation?
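
To make the idea concrete, here is a minimal sketch of such a representation. It is not Jython's PyUnicode or PyUCS4: the class name CodePointSeq and its methods are hypothetical, and using a String as the narrow backing store is an assumption for illustration only.

```java
/** Hypothetical immutable sequence of code points; not the actual PyUnicode. */
final class CodePointSeq {
    private final String bmp;  // backing store when every code point is in the BMP
    private final int[] wide;  // backing store when any supplementary code point is present

    private CodePointSeq(String bmp, int[] wide) {
        this.bmp = bmp;
        this.wide = wide;
    }

    /** Picks the representation by scanning for supplementary code points. */
    static CodePointSeq of(int... codePoints) {
        boolean supplementary = false;
        for (int cp : codePoints) {
            if (!Character.isValidCodePoint(cp)) {
                throw new IllegalArgumentException("not a code point: " + cp);
            }
            if (Character.isSupplementaryCodePoint(cp)) {
                supplementary = true;
            }
        }
        if (supplementary) {
            return new CodePointSeq(null, codePoints.clone());
        }
        char[] chars = new char[codePoints.length];
        for (int i = 0; i < chars.length; i++) {
            chars[i] = (char) codePoints[i];  // safe: every value is <= 0xFFFF here
        }
        return new CodePointSeq(new String(chars), null);
    }

    boolean isWide() {
        return wide != null;
    }

    /** Length in code points, whichever representation is in use. */
    int length() {
        return wide != null ? wide.length : bmp.length();
    }

    /** Constant-time indexing by code point; charAt never combines surrogates. */
    int codePointAt(int index) {
        return wide != null ? wide[index] : bmp.charAt(index);
    }

    public static void main(String[] args) {
        CodePointSeq s = CodePointSeq.of('a', 0x1F600, 'b');
        System.out.println(s.length() + " code points, wide=" + s.isWide());
    }
}
```

With this shape, indexing is constant time in both cases, and the int[] cost is paid only by strings that actually contain supplementary code points.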

There are several ways to define a conversion to a Java String, or back, depending on how surrogates and supplementary characters are treated. (The most tolerant is not reversible.) We should consciously choose which one we invoke each time.
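
As a rough illustration of why the choice matters, the sketch below contrasts a tolerant and a strict decoding of a Java String. The class and method names are hypothetical, not existing Jython API, and it assumes Java 8 or later for String.codePoints().

```java
/** Hypothetical conversion helpers between Java Strings and code point arrays. */
final class StringConversions {

    /** Tolerant: surrogate pairs are combined, lone surrogates pass through unchanged. */
    static int[] decodeTolerant(String s) {
        return s.codePoints().toArray();
    }

    /** Strict: as above, but a lone surrogate is rejected. */
    static int[] decodeStrict(String s) {
        int[] cps = s.codePoints().toArray();
        for (int cp : cps) {
            if (cp >= 0xD800 && cp <= 0xDFFF) {
                throw new IllegalArgumentException(
                        "lone surrogate: U+" + Integer.toHexString(cp).toUpperCase());
            }
        }
        return cps;
    }

    /** Encodes code points as UTF-16; a surrogate code point becomes a single char. */
    static String encode(int[] codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        // Two surrogate code points, held as separate elements of the sequence.
        int[] original = {0xD83D, 0xDE00};
        int[] roundTrip = decodeTolerant(encode(original));
        // The pair fuses into one supplementary code point on the way back.
        System.out.println(original.length + " -> " + roundTrip.length);
    }
}
```

A sequence holding the two surrogate code points U+D83D and U+DE00 encodes to a two-char String, but decoding that String tolerantly gives the single code point U+1F600, so the round trip does not return the original sequence.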

The methods of PyUnicode would no longer be implemented using java.lang.String, but by our own code. In the end, I think that is no more difficult to get right, as we already have byte versions in bytearray.
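
For example, a find() written directly over code points might look like the sketch below. This is a naive search for illustration only, with hypothetical names; it is not taken from the bytearray code and makes no claim about how PyUnicode would actually implement it.

```java
/** Hypothetical helper: substring search over code point arrays. */
final class CodePointSearch {

    /** Returns the code point index of the first match of needle, or -1 if none. */
    static int find(int[] haystack, int[] needle) {
        int last = haystack.length - needle.length;
        for (int i = 0; i <= last; i++) {
            int j = 0;
            while (j < needle.length && haystack[i + j] == needle[j]) {
                j++;
            }
            if (j == needle.length) {
                return i;  // an empty needle matches at index 0
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] text = {'a', 0x1F600, 'b'};
        System.out.println(find(text, new int[] {'b'}));
    }
}
```

The result is an index in code points, so no adjustment for surrogate pairs is needed, whereas java.lang.String.indexOf on the UTF-16 form of the same text would count the supplementary character as two units.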

I guess I knew that clever indexing trick I provided would eventually be redundant :(
History
Date User Action Args
2015-04-26 00:01:26 jeff.allen set messageid: <1430006486.03.0.455334677308.issue2340@psf.upfronthosting.co.za>
2015-04-26 00:01:26 jeff.allen set recipients: + jeff.allen, zyasoft, gsnedders
2015-04-26 00:01:26 jeff.allen link issue2340 messages
2015-04-26 00:01:25 jeff.allen create