Issue2191

classification
Title: re .compile fails with isolated surrogates
Type:
Severity: normal
Components: Library
Versions: Jython 2.7
Milestone:
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: yecril71pl, zyasoft
Priority:
Keywords:

Created on 2014-08-14.14:41:33 by yecril71pl, last changed 2014-10-21.19:20:03 by zyasoft.

Messages
msg8923 Author: Christopher Yeleighton (yecril71pl) Date: 2014-08-14.14:41:32
>>> re.compile(u"%c-\U0001FFFE" % (0xD800))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\jython2.7b2\Lib\re.py", line 190, in compile
    return _compile(pattern, flags)
  File "C:\jython2.7b2\Lib\re.py", line 240, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\jython2.7b2\Lib\sre_compile.py", line 512, in compile
    p = sre_parse.parse(p, flags)
  File "C:\jython2.7b2\Lib\sre_parse.py", line 684, in parse
    p = _parse_sub(source, pattern, 0)
  File "C:\jython2.7b2\Lib\sre_parse.py", line 309, in _parse_sub
    itemsappend(_parse(source, state))
  File "C:\jython2.7b2\Lib\sre_parse.py", line 398, in _parse
    this = sourceget()
  File "C:\jython2.7b2\Lib\sre_parse.py", line 204, in get
    self.__next()
  File "C:\jython2.7b2\Lib\sre_parse.py", line 187, in _Tokenizer__next
    char = self.string[self.index]
java.lang.StringIndexOutOfBoundsException: String index out of range: 4
        at java.lang.String.codePointAt(Unknown Source)
        at org.python.core.PyUnicode.pyget(PyUnicode.java:310)
        at org.python.core.PySequence$DefaultIndexDelegate.getItem(PySequence.java:520)
        at org.python.core.SequenceIndexDelegate.checkIdxAndFindItem(SequenceIndexDelegate.java:88)
        at org.python.core.SequenceIndexDelegate.checkIdxAndFindItem(SequenceIndexDelegate.java:70)
        at org.python.core.SequenceIndexDelegate.checkIdxAndGetItem(SequenceIndexDelegate.java:61)
        at org.python.core.PySequence.seq___getitem__(PySequence.java:364)
        at org.python.core.PySequence.__getitem__(PySequence.java:360)
        at sre_parse$py._Tokenizer__next$19(C:\jython2.7b2\Lib\sre_parse.py:195)

        at sre_parse$py.call_function(C:\jython2.7b2\Lib\sre_parse.py)
        at org.python.core.PyTableCode.call(PyTableCode.java:165)
        at org.python.core.PyBaseCode.call(PyBaseCode.java:134)
        at org.python.core.PyFunction.__call__(PyFunction.java:347)
        at org.python.core.PyMethod.__call__(PyMethod.java:121)
        at sre_parse$py.get$21(C:\jython2.7b2\Lib\sre_parse.py:205)
        at sre_parse$py.call_function(C:\jython2.7b2\Lib\sre_parse.py)
        at org.python.core.PyTableCode.call(PyTableCode.java:165)
        at org.python.core.PyBaseCode.call(PyBaseCode.java:134)
        at org.python.core.PyFunction.__call__(PyFunction.java:347)
        at org.python.core.PyMethod.__call__(PyMethod.java:121)
        at sre_parse$py._parse$31(C:\jython2.7b2\Lib\sre_parse.py:672)
        at sre_parse$py.call_function(C:\jython2.7b2\Lib\sre_parse.py)
        at org.python.core.PyTableCode.call(PyTableCode.java:165)
        at org.python.core.PyBaseCode.call(PyBaseCode.java:149)
        at org.python.core.PyFunction.__call__(PyFunction.java:357)
        at sre_parse$py._parse_sub$29(C:\jython2.7b2\Lib\sre_parse.py:359)
        at sre_parse$py.call_function(C:\jython2.7b2\Lib\sre_parse.py)
        at org.python.core.PyTableCode.call(PyTableCode.java:165)
        at org.python.core.PyBaseCode.call(PyBaseCode.java:166)
        at org.python.core.PyFunction.__call__(PyFunction.java:368)
        at sre_parse$py.parse$32(C:\jython2.7b2\Lib\sre_parse.py:700)
        at sre_parse$py.call_function(C:\jython2.7b2\Lib\sre_parse.py)
        at org.python.core.PyTableCode.call(PyTableCode.java:165)
        at org.python.core.PyBaseCode.call(PyBaseCode.java:301)
        at org.python.core.PyBaseCode.call(PyBaseCode.java:141)
        at org.python.core.PyFunction.__call__(PyFunction.java:357)
        at sre_compile$py.compile$13(C:\jython2.7b2\Lib\sre_compile.py:532)
        at sre_compile$py.call_function(C:\jython2.7b2\Lib\sre_compile.py)
        at org.python.core.PyTableCode.call(PyTableCode.java:165)
        at org.python.core.PyBaseCode.call(PyBaseCode.java:149)
        at org.python.core.PyFunction.__call__(PyFunction.java:357)
        at re$py._compile$12(C:\jython2.7b2\Lib\re.py:246)
        at re$py.call_function(C:\jython2.7b2\Lib\re.py)
        at org.python.core.PyTableCode.call(PyTableCode.java:165)
        at org.python.core.PyBaseCode.call(PyBaseCode.java:301)
        at org.python.core.PyBaseCode.call(PyBaseCode.java:141)
        at org.python.core.PyFunction.__call__(PyFunction.java:357)
        at re$py.compile$8(C:\jython2.7b2\Lib\re.py:190)
        at re$py.call_function(C:\jython2.7b2\Lib\re.py)
        at org.python.core.PyTableCode.call(PyTableCode.java:165)
        at org.python.core.PyBaseCode.call(PyBaseCode.java:301)
        at org.python.core.PyBaseCode.call(PyBaseCode.java:127)
        at org.python.core.PyFunction.__call__(PyFunction.java:347)
        at org.python.pycode._pyx95.f$0(<stdin>:1)
        at org.python.pycode._pyx95.call_function(<stdin>)
        at org.python.core.PyTableCode.call(PyTableCode.java:165)
        at org.python.core.PyCode.call(PyCode.java:18)
        at org.python.core.Py.runCode(Py.java:1312)
        at org.python.core.Py.exec(Py.java:1356)
        at org.python.util.PythonInterpreter.exec(PythonInterpreter.java:215)
        at org.python.util.InteractiveInterpreter.runcode(InteractiveInterpreter.java:89)
        at org.python.util.InteractiveInterpreter.runsource(InteractiveInterpreter.java:70)
        at org.python.util.InteractiveInterpreter.runsource(InteractiveInterpreter.java:46)
        at org.python.util.InteractiveConsole.push(InteractiveConsole.java:112)
        at org.python.util.InteractiveConsole.interact(InteractiveConsole.java:93)
        at org.python.util.jython.run(jython.java:396)
        at org.python.util.jython.main(jython.java:145)

java.lang.StringIndexOutOfBoundsException: java.lang.StringIndexOutOfBoundsException: String index out of range: 4
msg9087 Author: Jim Baker (zyasoft) Date: 2014-10-06.03:01:23
Fixed with the recent Unicode changes in trunk:

>>> import re
>>> re.compile(u"%c-\U0001FFFE" % (0xD800))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: unpaired surrogate 0xd800 at code unit 0
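
For illustration only, here is a minimal sketch of the kind of check the new error message describes: scanning UTF-16 code units for a surrogate that has no partner. It is not Jython's actual implementation, and the helper names `to_utf16_units` and `find_unpaired_surrogate` are hypothetical. It assumes the pattern is stored as UTF-16 code units, as in a Java string.

```python
def to_utf16_units(code_points):
    """Encode code points as UTF-16 code units; lone surrogates pass through unchanged."""
    units = []
    for cp in code_points:
        if cp > 0xFFFF:
            # Supplementary character: split into a high/low surrogate pair.
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))
            units.append(0xDC00 + (cp & 0x3FF))
        else:
            units.append(cp)
    return units


def find_unpaired_surrogate(units):
    """Return the index of the first unpaired surrogate code unit, or None."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed immediately by a low surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return i
        if 0xDC00 <= u <= 0xDFFF:
            # A low surrogate that was not consumed as part of a pair.
            return i
        i += 1
    return None


# The pattern from this report: U+D800, "-", U+1FFFE.
pattern_units = to_utf16_units([0xD800, ord("-"), 0x1FFFE])
print([hex(u) for u in pattern_units])         # ['0xd800', '0x2d', '0xd83f', '0xdffe']
print(find_unpaired_surrogate(pattern_units))  # 0
```

Under these assumptions the pattern occupies four code units, and the isolated surrogate sits at code unit 0, which agrees with the position given in the ValueError above.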
History
Date                 User        Action  Args
2014-10-21 19:20:03  zyasoft     set     status: pending -> closed
2014-10-06 03:01:23  zyasoft     set     status: open -> pending
                                         resolution: fixed
                                         messages: + msg9087
                                         nosy: + zyasoft
2014-08-18 07:29:49  yecril71pl  set     versions: + Jython 2.7
                                         title: re.compare fails with isolated surrogates -> re .compile fails with isolated surrogates
2014-08-14 14:41:33  yecril71pl  create