Issue2190
Created on 2014-08-14.14:07:29 by yecril71pl, last changed 2014-08-15.23:12:11 by alex.gronholm.
msg8920 (view) |
Author: Christopher Yeleighton (yecril71pl) |
Date: 2014-08-14.14:07:28 |
|
>>> unichr (0xD800)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
java.lang.StringIndexOutOfBoundsException: String index out of range: 1
at java.lang.String.charAt(Unknown Source)
at org.python.core.PyString.encode_UnicodeEscape(PyString.java:227)
at org.python.core.PyUnicode.unicode___repr__(PyUnicode.java:245)
at org.python.core.PyUnicode.__repr__(PyUnicode.java:240)
at org.python.core.PySystemState.displayhook(PySystemState.java:1518)
at org.python.core.PySystemStateFunctions.__call__(PySystemState.java:1712)
at org.python.core.PyObject.invoke(PyObject.java:3638)
at org.python.core.Py.printResult(Py.java:1920)
at org.python.pycode._pyx55.f$0(<stdin>:1)
at org.python.pycode._pyx55.call_function(<stdin>)
at org.python.core.PyTableCode.call(PyTableCode.java:165)
at org.python.core.PyCode.call(PyCode.java:18)
at org.python.core.Py.runCode(Py.java:1312)
at org.python.core.Py.exec(Py.java:1356)
at org.python.util.PythonInterpreter.exec(PythonInterpreter.java:215)
at org.python.util.InteractiveInterpreter.runcode(InteractiveInterpreter.java:89)
at org.python.util.InteractiveInterpreter.runsource(InteractiveInterpreter.java:70)
at org.python.util.InteractiveInterpreter.runsource(InteractiveInterpreter.java:46)
at org.python.util.InteractiveConsole.push(InteractiveConsole.java:112)
at org.python.util.InteractiveConsole.interact(InteractiveConsole.java:93)
at org.python.util.jython.run(jython.java:396)
at org.python.util.jython.main(jython.java:145)
java.lang.StringIndexOutOfBoundsException: java.lang.StringIndexOutOfBoundsException: String index out of range: 1
|
msg8921 (view) |
Author: Alex Grönholm (alex.gronholm) |
Date: 2014-08-14.14:12:43 |
|
This is a bug in PyString.encode_UnicodeEscape(). When it encounters a surrogate character, it does not check whether any characters remain in the string before calling charAt() to fetch the next one.
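The missing check can be illustrated with a minimal sketch. This is not the PyString.encode_UnicodeEscape() source; it is a simplified Python model (the function name escape_utf16_units is hypothetical) that escapes a list of UTF-16 code units and only reads the following unit after confirming that one exists.

```python
def escape_utf16_units(units):
    """Escape a list of UTF-16 code units in the style of unicode-escape.

    Hypothetical helper for illustration only; it models the bounds check,
    not the actual Jython implementation.
    """
    out = []
    i = 0
    n = len(units)
    while i < n:
        unit = units[i]
        i += 1
        is_high = 0xD800 <= unit <= 0xDBFF
        # Only look at the next unit if there is one (the check the bug lacked).
        if is_high and i < n and 0xDC00 <= units[i] <= 0xDFFF:
            low = units[i]
            i += 1
            code_point = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
            out.append("\\U%08x" % code_point)
        elif unit == 0x5C:
            out.append("\\\\")  # escape the backslash itself
        elif 0x20 <= unit <= 0x7E:
            out.append(chr(unit))  # printable ASCII passes through
        elif unit < 0x100:
            out.append("\\x%02x" % unit)
        else:
            # Lone surrogates and other BMP units are escaped individually.
            out.append("\\u%04x" % unit)
    return "".join(out)
```

With the single unit 0xD800 as input, the loop falls through to the last branch instead of indexing past the end of the sequence.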
|
msg8922 (view) |
Author: Jim Baker (zyasoft) |
Date: 2014-08-14.14:24:26 |
|
We need to throw an appropriate Python error to complete this fix, given that we cannot support such lone surrogates using Jython's underlying UTF-16 representation of Unicode. I suggest ValueError, following this model:
>>> unichr(-1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: unichr() arg not in range(0x110000)
So maybe the text should be somewhat like the following:
ValueError: unichr() arg is a lone surrogate in range (0xD800, 0xDFFF) (Jython UTF-16 encoding)
Target beta 4
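As a rough sketch of the proposed behavior, assuming the error text suggested above is used unchanged: the function below is a hypothetical stand-in named checked_unichr, written for Python 3 where chr() plays the role of unichr(). It is not the Jython builtin, only a model of the range checks being discussed.

```python
def checked_unichr(code):
    """Hypothetical model of unichr() that rejects lone surrogates."""
    if not 0 <= code < 0x110000:
        # Existing behavior shown in the traceback above.
        raise ValueError("unichr() arg not in range(0x110000)")
    if 0xD800 <= code <= 0xDFFF:
        # Proposed behavior: message text taken from the suggestion above.
        raise ValueError(
            "unichr() arg is a lone surrogate in range (0xD800, 0xDFFF) "
            "(Jython UTF-16 encoding)"
        )
    # chr() is the Python 3 equivalent of Python 2's unichr().
    return chr(code)
```

Under this sketch, checked_unichr(0x41) returns "A", while checked_unichr(0xD800) and checked_unichr(-1) both raise ValueError with different messages.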
|
msg8924 (view) |
Author: Alex Grönholm (alex.gronholm) |
Date: 2014-08-14.15:14:49 |
|
I already committed a fix which harmonized the behavior with CPython. But if you say so, then I shall commit a better fix.
|
msg8925 (view) |
Author: Jim Baker (zyasoft) |
Date: 2014-08-14.15:50:39 |
|
Alex, thanks for making this second change. Sorry about the misleading comment indicating support for lone surrogates in the codebase. (Another fix to be made? Not for this bug fix however.)
|
msg8928 (view) |
Author: Alex Grönholm (alex.gronholm) |
Date: 2014-08-15.23:12:10 |
|
I left the earlier fix in place as it does no harm. However, I replaced the test with one that checks that calling unichr() with a surrogate code point raises a ValueError (as Jim suggested), and verified that the test passes.
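A test of this kind might look roughly like the following. The actual test committed to Jython is not shown in this thread, so the class name, the chosen code points, and the unichr_stand_in helper are all assumptions; under Jython the builtin unichr() would be called directly.

```python
import unittest


def unichr_stand_in(code):
    # Hypothetical stand-in for Jython's unichr(); rejects surrogate code points.
    if 0xD800 <= code <= 0xDFFF:
        raise ValueError("unichr() arg is a lone surrogate")
    return chr(code)


class LoneSurrogateTest(unittest.TestCase):
    def test_surrogates_rejected(self):
        # Boundaries of the high and low surrogate ranges.
        for code in (0xD800, 0xDBFF, 0xDC00, 0xDFFF):
            with self.assertRaises(ValueError):
                unichr_stand_in(code)

    def test_neighbours_accepted(self):
        # Code points just outside the surrogate range are still valid.
        self.assertEqual(unichr_stand_in(0xD7FF), "\ud7ff")
        self.assertEqual(unichr_stand_in(0xE000), "\ue000")
```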
|
|
Date | User | Action | Args
2014-08-15 23:12:11 | alex.gronholm | set | status: open -> closed; resolution: fixed; messages: + msg8928
2014-08-14 15:50:39 | zyasoft | set | messages: + msg8925
2014-08-14 15:14:49 | alex.gronholm | set | messages: + msg8924
2014-08-14 14:24:26 | zyasoft | set | nosy: + zyasoft; messages: + msg8922
2014-08-14 14:12:43 | alex.gronholm | set | priority: high; assignee: alex.gronholm; messages: + msg8921; nosy: + alex.gronholm; versions: + Jython 2.7
2014-08-14 14:07:29 | yecril71pl | create |