Message8942

Author jeff.allen
Recipients jeff.allen
Date 2014-08-31.22:16:27
Message-id <1409523389.15.0.647786822512.issue2197@psf.upfronthosting.co.za>
In-reply-to
Content
I found this while working on #2100, but I think it is sufficiently separate to be its own issue. It seems that when PyUnicode.isBasicPlane() is false, count() falls back on the more complex implementation, and that implementation fails here.

>jython
Jython 2.7b3 (default:e81256215fb0, Aug 4 2014, 02:39:51)
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.7.0_60
Type "help", "copyright", "credits" or "license" for more information.
>>> u = u"aaabbc"
>>> v = u"aaa\U00010002bcc"
>>> u.count(u'b')
2
>>> v.count(u'b')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
java.lang.StringIndexOutOfBoundsException: String index out of range: 8
        at java.lang.String.charAt(String.java:658)
        at org.python.core.PyUnicode$SubsequenceIteratorImpl.nextCodePoint(PyUnicode.java:353)
        at org.python.core.PyUnicode$SubsequenceIteratorImpl.next(PyUnicode.java:342)
        at org.python.core.PyUnicode.unicode_count(PyUnicode.java:1031)
        at org.python.core.PyUnicode$unicode_count_exposer.__call__(Unknown Source)
        at org.python.core.PyObject.__call__(PyObject.java:407)
        at org.python.pycode._pyx4.f$0(<stdin>:1)
        at org.python.pycode._pyx4.call_function(<stdin>)
        at org.python.core.PyTableCode.call(PyTableCode.java:166)
        at org.python.core.PyCode.call(PyCode.java:18)
        at org.python.core.Py.runCode(Py.java:1312)
        at org.python.core.Py.exec(Py.java:1356)
        at org.python.util.PythonInterpreter.exec(PythonInterpreter.java:231)
        at org.python.util.InteractiveInterpreter.runcode(InteractiveInterpreter.java:89)
        at org.python.util.InteractiveInterpreter.runsource(InteractiveInterpreter.java:70)
        at org.python.util.InteractiveInterpreter.runsource(InteractiveInterpreter.java:46)
        at org.python.util.InteractiveConsole.push(InteractiveConsole.java:112)
        at org.python.util.InteractiveConsole.interact(InteractiveConsole.java:93)
        at org.python.util.jython.run(jython.java:396)
        at org.python.util.jython.main(jython.java:145)

java.lang.StringIndexOutOfBoundsException: java.lang.StringIndexOutOfBoundsException: String index out of range: 8

Furthermore, this causes an endless loop:

>>> v.count(u'')

At present I'm working on #2100, providing index translation where necessary to deal with supplementary characters. The current implementation uses custom iterators heavily, and mostly successfully, but I wonder whether we could instead use the same implementation as we do for BMP strings, together with the index translation. (I think so, but only if there are no unpaired surrogates.)
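
To make the idea concrete, here is a minimal sketch in CPython 3 (not Jython, and not the actual PyUnicode code) of counting by scanning UTF-16 code units, with code point indices translated to code unit indices first. The helper names utf16_units, utf16_index and count_utf16 are hypothetical, invented for this illustration, and start/end are assumed to be non-negative.

```python
def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints (how a Java String stores it)."""
    data = s.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def utf16_index(s, cp_index):
    """Translate a non-negative code point index into a UTF-16 code unit index."""
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in s[:cp_index])


def count_utf16(s, sub, start=0, end=None):
    """Count non-overlapping occurrences of sub in s[start:end] by scanning code units."""
    end = len(s) if end is None else min(end, len(s))
    if sub == "":
        # One match at every code point boundary in the slice, as CPython reports.
        return max(0, end - start + 1)
    units, target = utf16_units(s), utf16_units(sub)
    # Index translation: the slice bounds arrive as code point indices.
    lo, hi = utf16_index(s, start), utf16_index(s, end)
    n, i = 0, lo
    while i + len(target) <= hi:
        if units[i:i + len(target)] == target:
            n += 1
            i += len(target)
        else:
            i += 1
    return n
```

With v = "aaa\U00010002bcc", this gives count_utf16(v, "b") == 1 and count_utf16(v, "") == 8, matching what CPython 3 reports for v.count. The caveat about surrogates shows up too: count_utf16("\U00010002", "\udc02") returns 1, because the code unit scan matches the low half of the surrogate pair, whereas a search by code point finds nothing.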