Issue2197
Created on 2014-08-31.22:16:28 by jeff.allen, last changed 2014-12-15.20:35:11 by jeff.allen.
Messages | |||
---|---|---|---|
msg8942 (view) | Author: Jeff Allen (jeff.allen) | Date: 2014-08-31.22:16:27 | |
I found this while working on #2100, but I think it is sufficiently separate to be its own issue. It seems that when PyUnicode.isBasicPlane() is false, count() resorts to the more complex implementation, and that implementation fails here:

>jython
Jython 2.7b3 (default:e81256215fb0, Aug 4 2014, 02:39:51)
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.7.0_60
Type "help", "copyright", "credits" or "license" for more information.
>>> u = u"aaabbc"
>>> v = u"aaa\U00010002bcc"
>>> u.count(u'b')
2
>>> v.count(u'b')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
java.lang.StringIndexOutOfBoundsException: String index out of range: 8
    at java.lang.String.charAt(String.java:658)
    at org.python.core.PyUnicode$SubsequenceIteratorImpl.nextCodePoint(PyUnicode.java:353)
    at org.python.core.PyUnicode$SubsequenceIteratorImpl.next(PyUnicode.java:342)
    at org.python.core.PyUnicode.unicode_count(PyUnicode.java:1031)
    at org.python.core.PyUnicode$unicode_count_exposer.__call__(Unknown Source)
    at org.python.core.PyObject.__call__(PyObject.java:407)
    at org.python.pycode._pyx4.f$0(<stdin>:1)
    at org.python.pycode._pyx4.call_function(<stdin>)
    at org.python.core.PyTableCode.call(PyTableCode.java:166)
    at org.python.core.PyCode.call(PyCode.java:18)
    at org.python.core.Py.runCode(Py.java:1312)
    at org.python.core.Py.exec(Py.java:1356)
    at org.python.util.PythonInterpreter.exec(PythonInterpreter.java:231)
    at org.python.util.InteractiveInterpreter.runcode(InteractiveInterpreter.java:89)
    at org.python.util.InteractiveInterpreter.runsource(InteractiveInterpreter.java:70)
    at org.python.util.InteractiveInterpreter.runsource(InteractiveInterpreter.java:46)
    at org.python.util.InteractiveConsole.push(InteractiveConsole.java:112)
    at org.python.util.InteractiveConsole.interact(InteractiveConsole.java:93)
    at org.python.util.jython.run(jython.java:396)
    at org.python.util.jython.main(jython.java:145)
java.lang.StringIndexOutOfBoundsException: java.lang.StringIndexOutOfBoundsException: String index out of range: 8

Furthermore, this causes an endless loop:

>>> v.count(u'')

At present, I'm working on #2100 by providing index translation where necessary to deal with supplementary characters. The current implementation uses custom iterators heavily, and mostly successfully, but I wonder whether we could use the same implementation as we do for BMP strings, with the index translation. (I think so, but only if there are no unpaired surrogates.) |
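For reference, the behaviour one would expect from count() on this string can be sketched in CPython 3, where a str is already a sequence of code points. This is only an illustration of the expected semantics, not Jython's PyUnicode code; the helper names `utf16_length` and `count_by_code_point` are made up for this sketch. Note that `v` has 7 code points but occupies 8 UTF-16 code units, which may be why index 8 appears in the exception above.

```python
# Illustrative sketch only (CPython 3 assumed); not the Jython implementation.

def utf16_length(text):
    """Number of UTF-16 code units needed to encode text."""
    return len(text.encode("utf-16-le")) // 2


def count_by_code_point(text, sub):
    """Count non-overlapping occurrences of sub in text, by code point."""
    if not sub:
        # The empty string matches at every boundary. Returning directly
        # avoids a search loop that never advances.
        return len(text) + 1
    found = 0
    start = 0
    while True:
        start = text.find(sub, start)
        if start < 0:
            return found
        found += 1
        start += len(sub)  # step past the match so the loop terminates


u = u"aaabbc"
v = u"aaa\U00010002bcc"

print(len(v), utf16_length(v))        # 7 8
print(count_by_code_point(u, u"b"))   # 2
print(count_by_code_point(v, u"b"))   # 1
print(count_by_code_point(v, u""))    # 8
```

The empty-pattern branch is the case that looped endlessly in the report: any search that does not advance past a zero-length match has to be special-cased.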
|||
msg8943 (view) | Author: Jim Baker (zyasoft) | Date: 2014-09-01.03:32:44 | |
Thanks for finding this. One thing that I found useful in testing the non-BMP implementation, when it was under development, is running the regrtest with PyUnicode#isBasicPlane always returning false. Sounds good about the progress on index translation. Adding an additional O(n) factor for indexing is not very nice, and certainly surprising to code that assumes it is constant (if possibly expensive). |
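The O(n) cost mentioned here can be illustrated with a naive translation from a code point index to a UTF-16 code unit index. This is a sketch in CPython 3 under the assumption that the underlying storage is UTF-16, as in a Java String; `utf16_index` is a hypothetical helper, not a Jython API.

```python
# Illustrative sketch only (CPython 3 assumed); not the Jython implementation.

def utf16_index(text, cp_index):
    """Translate a code point index into a UTF-16 code unit index.

    Every character before cp_index is visited, so the cost is linear
    in cp_index rather than constant.
    """
    units = 0
    for ch in text[:cp_index]:
        # Supplementary characters (above U+FFFF) need a surrogate pair.
        units += 2 if ord(ch) > 0xFFFF else 1
    return units


v = u"aaa\U00010002bcc"
print(utf16_index(v, 3))  # 3: only BMP characters precede index 3
print(utf16_index(v, 4))  # 5: U+10002 occupies two code units
print(utf16_index(v, 7))  # 8: the end of the string in code units
```

For a string containing only BMP characters the translation is the identity, which is why the simpler implementation can be used whenever isBasicPlane() is true.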
|||
msg8944 (view) | Author: Jeff Allen (jeff.allen) | Date: 2014-09-01.08:07:47 | |
You mean this: http://hg.python.org/jython/file/83cd10f1826d/src/org/python/core/PyUnicode.java#l134 I spotted that, once I needed it, although I'd looked at it many times before without understanding. Turning it on made test_unicode hang. For this report I reproduced the problem with an 'honest' build. Skipping the test of count, I find replace() also fails when the find and target are both ''. |
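As a point of comparison for the replace() case, this is how CPython 3 treats an empty search string. It is shown only as the reference behaviour a corrected implementation would presumably match, not as a record of what Jython did at the time.

```python
# Reference behaviour in CPython 3, shown for comparison only.
v = u"aaa\U00010002bcc"

# Replacing the empty string with the empty string leaves the text unchanged.
unchanged = v.replace(u"", u"")

# A non-empty replacement is inserted at every code point boundary.
dashed = u"ab".replace(u"", u"-")

print(unchanged == v)  # True
print(dashed)          # -a-b-
```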
|||
msg8954 (view) | Author: Jeff Allen (jeff.allen) | Date: 2014-09-06.17:39:54 | |
Seems necessary to take this as part of #2100 after all. |
|||
msg9239 (view) | Author: Jim Baker (zyasoft) | Date: 2014-12-15.18:39:53 | |
Looks like this is now fixed with all the recent improvements in str/unicode handling |
|||
msg9242 (view) | Author: Jeff Allen (jeff.allen) | Date: 2014-12-15.20:35:10 | |
>>> u = u"aaabbc"
>>> v = u"aaa\U00010002bcc"
>>> u.count(u'b')
2
>>> v.count(u'b')
1
>>> u.count(u'')
7

I agree. It is hard to say quite where it was fixed, but it was somewhere in the changes leading up to this merge: https://hg.python.org/jython/rev/776cae0189ed |
History | |||
---|---|---|---|
Date | User | Action | Args |
2014-12-15 20:35:11 | jeff.allen | set | status: open -> closed messages: + msg9242 |
2014-12-15 18:39:54 | zyasoft | set | messages: + msg9239 |
2014-09-06 17:39:54 | jeff.allen | set | assignee: jeff.allen dependencies: + Deficiencies in PyUnicode beyond the BMP messages: + msg8954 |
2014-09-01 08:07:47 | jeff.allen | set | messages: + msg8944 |
2014-09-01 03:32:44 | zyasoft | set | nosy: + zyasoft messages: + msg8943 |
2014-08-31 22:16:29 | jeff.allen | create |