Message7862

Author amak
Recipients amak, fwierzbicki, pekka.klarck
Date 2013-02-28.01:10:50
Message-id <1362013850.58.0.66786974758.issue1865@psf.upfronthosting.co.za>
In-reply-to
Content
The problem with using java.nio.charset.CharsetEncoder and java.nio.charset.CharsetDecoder is that they have no customizable replacement mechanism: they can only substitute one fixed replacement for every character they cannot handle. Python codecs need a per-character replacement in order to implement the 'xmlcharrefreplace' and 'backslashreplace' error handlers.
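
To make the limitation concrete, here is a minimal sketch of what the built-in replacement can do. The class and method names are made up for illustration, and US-ASCII is just an example target charset. Whatever the unencodable character is, the encoder emits the same fixed bytes, so the original code point is lost.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical demo class, not part of Jython or the JDK.
final class FixedReplacementDemo {

    // Encode to US-ASCII and let the encoder substitute its fixed
    // replacement bytes for anything it cannot encode.
    static String encodeWithFixedReplacement(String input) {
        Charset ascii = Charset.forName("US-ASCII");
        CharsetEncoder encoder = ascii.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(new byte[] {'?'}); // one replacement for all characters
        try {
            ByteBuffer encoded = encoder.encode(CharBuffer.wrap(input));
            byte[] bytes = new byte[encoded.remaining()];
            encoded.get(bytes);
            return new String(bytes, ascii);
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE actions; wrapped to keep the sketch simple.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // U+00E9 and U+20AC both collapse to the same '?'.
        System.out.println(encodeWithFixedReplacement("caf\u00e9 \u20ac")); // prints "caf? ?"
    }
}
```

There is no hook here through which the replacement could be computed from the offending character, which is what '&#233;' or '\xe9' style output would require.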

http://docs.python.org/2/library/codecs.html

To support these error handlers, the input has to be processed character by character, checking for each character whether it can be encoded. This approach can be seen in this jsoup code:

https://github.com/jhy/jsoup/blob/master/src/main/java/org/jsoup/nodes/Entities.java

See the "escape" method.
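
A rough sketch of that character-by-character approach follows. It is not the jsoup code and not a proposed Jython implementation; the class, enum and method names are hypothetical. It assumes the replacement text itself (plain ASCII) is encodable in the target charset, and it formats the replacements the way Python's 'xmlcharrefreplace' and 'backslashreplace' handlers do.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical sketch class, not part of Jython, jsoup or the JDK.
final class CharByCharReplaceSketch {

    enum Handler { XMLCHARREFREPLACE, BACKSLASHREPLACE }

    // Walk the input one code point at a time and replace every code point
    // the charset cannot encode. The result contains only encodable text.
    static String replaceUnencodable(String input, Charset charset, Handler handler) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            int codePoint = input.codePointAt(i);
            int length = Character.charCount(codePoint); // 2 for a surrogate pair
            CharSequence piece = input.subSequence(i, i + length);
            if (encoder.canEncode(piece)) {              // the per-character check
                out.append(piece);
            } else if (handler == Handler.XMLCHARREFREPLACE) {
                out.append("&#").append(codePoint).append(';');
            } else {
                out.append(backslashEscape(codePoint));
            }
            i += length;
        }
        return out.toString();
    }

    // Python-style escapes: \xNN, \uNNNN or \UNNNNNNNN depending on magnitude.
    static String backslashEscape(int codePoint) {
        if (codePoint < 0x100) {
            return String.format("\\x%02x", codePoint);
        }
        if (codePoint < 0x10000) {
            return String.format("\\u%04x", codePoint);
        }
        return String.format("\\U%08x", codePoint);
    }

    public static void main(String[] args) {
        Charset ascii = Charset.forName("US-ASCII");
        String text = "caf\u00e9 \u20ac";
        System.out.println(replaceUnencodable(text, ascii, Handler.XMLCHARREFREPLACE)); // caf&#233; &#8364;
        System.out.println(replaceUnencodable(text, ascii, Handler.BACKSLASHREPLACE));  // caf\xe9 \u20ac
    }
}
```

Note that canEncode is called once per code point, so its cost dominates the whole loop.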

The problem with this approach is that the documentation for CharsetEncoder.canEncode says "The default implementation of this method is not very efficient; it should generally be overridden to improve performance."

I have looked at the implementation, and it is indeed very inefficient: calling it for every character of the input would give very poor performance.
History
Date                 User  Action  Args
2013-02-28 01:10:50  amak  set     messageid: <1362013850.58.0.66786974758.issue1865@psf.upfronthosting.co.za>
2013-02-28 01:10:50  amak  set     recipients: + amak, fwierzbicki, pekka.klarck
2013-02-28 01:10:50  amak  link    issue1865 messages
2013-02-28 01:10:50  amak  create