Message6769

Author jeff.allen
Recipients fwierzbicki, jeff.allen
Date 2012-02-02.22:44:34
Message-id <1328222674.98.0.0318623945123.issue1836@psf.upfronthosting.co.za>
In-reply-to
Content
Partial analysis ...

The error is effectively raised by org.python.core.PyString.hexescape(), which is trying to translate "\udef0" into a Unicode character. The method accepts a value that controls how it responds to an invalid character code. The options are "ignore" (do not insert the character), "replace" (insert the standard Unicode replacement character), or "strict" (throw this error).

hexescape() is called (indirectly) from the parser at org.python.antlr.GrammarActions.extractToken() to convert the text. That is the place where a "strict" error policy is chosen.

None of the existing policies corresponds to inserting a character "unchecked", which appears to be the CPython policy. Either a fourth policy should be defined, or the behaviour of an existing policy should be changed.
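
To make the choice concrete, here is a minimal sketch in Python of the three existing policies plus a possible fourth. It is not the Jython code: the function name decode_u_escape, the policy name "unchecked", and the exception type are all hypothetical. It also assumes that the code is rejected because it lies in the surrogate range U+D800..U+DFFF (as "\udef0" does), which this analysis has not confirmed.

```python
# Hypothetical sketch only; this is not how PyString.hexescape() is written.
REPLACEMENT = "\ufffd"  # the standard Unicode replacement character


def decode_u_escape(hex_digits, errors="strict"):
    """Translate the hex digits of a \\uXXXX escape into text.

    Assumption: a code counts as invalid when it lies in the
    surrogate range U+D800..U+DFFF, as "\\udef0" does.
    """
    code = int(hex_digits, 16)
    if not 0xD800 <= code <= 0xDFFF:
        return chr(code)  # ordinary code point, no policy needed
    if errors == "strict":
        raise ValueError("illegal Unicode character: \\u" + hex_digits)
    if errors == "ignore":
        return ""  # do not insert the character
    if errors == "replace":
        return REPLACEMENT
    if errors == "unchecked":  # hypothetical fourth policy
        return chr(code)  # insert the code as given
    raise ValueError("unknown error policy: " + errors)
```

Under these assumptions, decode_u_escape("def0", "unchecked") returns the same one-character string that the literal "\udef0" yields in CPython, while the default "strict" call raises.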

This is not the only point at which the error policy may determine behaviour, so the other implications of not being "strict" should be examined.
History
Date User Action Args
2012-02-02 22:44:34 jeff.allen set messageid: <1328222674.98.0.0318623945123.issue1836@psf.upfronthosting.co.za>
2012-02-02 22:44:34 jeff.allen set recipients: + jeff.allen, fwierzbicki
2012-02-02 22:44:34 jeff.allen link issue1836 messages
2012-02-02 22:44:34 jeff.allen create