Message7862
The problem with using java.nio.charset.CharsetEncoder and java.nio.charset.CharsetDecoder is that they have no customizable replacement mechanism, which Python codecs require in order to implement the 'xmlcharrefreplace' and 'backslashreplace' error handlers.
http://docs.python.org/2/library/codecs.html
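To illustrate the limitation, here is a minimal sketch of what the built-in mechanism offers: CodingErrorAction.REPLACE substitutes one fixed byte sequence (set with replaceWith) for every unmappable character, so the replacement cannot depend on the character itself. The class and method names are hypothetical and used only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Jython or the JDK.
public class FixedReplacementDemo {

    // Encodes to US-ASCII using the encoder's built-in REPLACE action,
    // then decodes the bytes again so the result is easy to inspect.
    public static String encodeWithReplace(String input) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .onMalformedInput(CodingErrorAction.REPLACE)
                // The replacement is a single fixed byte sequence,
                // identical for every character that cannot be encoded.
                .replaceWith(new byte[] { (byte) '?' });
        try {
            ByteBuffer bytes = encoder.encode(CharBuffer.wrap(input));
            return StandardCharsets.US_ASCII.decode(bytes).toString();
        } catch (CharacterCodingException e) {
            // Not expected here, because both error actions are REPLACE.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Both non-ASCII characters collapse to the same '?'.
        System.out.println(encodeWithReplace("caf\u00e9 \u20ac")); // prints "caf? ?"
    }
}
```

With 'xmlcharrefreplace' the two characters would instead have to become "&#233;" and "&#8364;", which a fixed replacement cannot express.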
To support these error handlers, the input has to be processed character by character, checking whether each character can be encoded. This approach can be seen in this jsoup code:
https://github.com/jhy/jsoup/blob/master/src/main/java/org/jsoup/nodes/Entities.java
See the "escape" method.
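As a rough sketch of that character-by-character approach (not the jsoup code itself, and not a proposed Jython implementation), the following shows how 'xmlcharrefreplace' could be emulated with CharsetEncoder.canEncode. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class XmlCharRefReplace {

    // Returns the input with every code point that the charset cannot
    // encode replaced by a decimal XML character reference (&#N;).
    public static String xmlCharRefReplace(String input, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            // Work on whole code points so surrogate pairs stay together.
            int codePoint = input.codePointAt(i);
            int width = Character.charCount(codePoint);
            CharSequence unit = input.subSequence(i, i + width);
            // One canEncode call per code point: this is the costly step.
            if (encoder.canEncode(unit)) {
                out.append(unit);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            i += width;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String escaped = xmlCharRefReplace("caf\u00e9 \u20ac", StandardCharsets.US_ASCII);
        System.out.println(escaped); // prints "caf&#233; &#8364;"
    }
}
```

The result contains only characters the target charset can represent, so it can then be encoded in a single pass. The cost is one canEncode call for every code point of the input.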
The problem with this is that the documentation for "canEncode" says "The default implementation of this method is not very efficient; it should generally be overridden to improve performance."
Having looked at the implementation, it is indeed very inefficient: performance would be very poor.
Date | User | Action | Args
2013-02-28 01:10:50 | amak | set | messageid: <1362013850.58.0.66786974758.issue1865@psf.upfronthosting.co.za>
2013-02-28 01:10:50 | amak | set | recipients: + amak, fwierzbicki, pekka.klarck
2013-02-28 01:10:50 | amak | link | issue1865 messages
2013-02-28 01:10:50 | amak | create |