Issue2226
Created on 2014-11-05.03:45:32 by thatch, last changed 2015-03-20.18:28:38 by zyasoft.
File name | Uploaded | Description |
uni.diff | thatch, 2014-11-05.03:45:31 | Other examples of codepoints that don't match as whitespace |
msg9189 (view) |
Author: Tim Hatch (thatch) |
Date: 2014-11-05.03:45:31 |
|
The codepoint 0x00a0 (160 decimal, nbsp) is not considered to match as whitespace in Jython re, but is in CPython.
>>> re.compile('\\s', re.UNICODE).match(u'\u00a0')
There are a few other diffs, so attaching a list. 's' and 'su' refer to whether it's \s with default flags, or \s + re.UNICODE flag.
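For anyone wanting to regenerate a list like the attached one, here is a minimal sketch rather than the script used for uni.diff. It assumes Python 3, where re.ASCII stands in for the Python 2 default-flag behaviour of \s; the helper name matching_codepoints is made up for illustration.

```python
import re

# Python 3 str patterns are already Unicode-aware; re.UNICODE is kept
# explicit to mirror the report. re.ASCII approximates Python 2's default \s.
UNI_SPACE = re.compile(r'\s', re.UNICODE)      # the "su" case
ASCII_SPACE = re.compile(r'\s', re.ASCII)      # the "s" case

def matching_codepoints(pattern, limit=0x10000):
    """Return the codepoints below `limit` whose character matches `pattern`."""
    return [cp for cp in range(limit) if pattern.match(chr(cp))]

su = matching_codepoints(UNI_SPACE)
s = matching_codepoints(ASCII_SPACE)

# Codepoints that count as whitespace only under the Unicode rules.
print('only with re.UNICODE:', [hex(cp) for cp in su if cp not in s])
```

Running the same loop under Jython (with unichr and u'' literals on 2.x) and diffing the two outputs should show the mismatches.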
|
msg9190 (view) |
Author: Tim Hatch (thatch) |
Date: 2014-11-05.06:03:44 |
|
Found that the SRE_STATE.java handling of SRE_CATEGORY_UNI_SPACE just delegates to Character.isWhitespace, which has a note specifically about 0x00a0.
http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isWhitespace(char) calls out the few exceptions. isSpaceChar looks better in the docs, but fails to note decimal 9, 10, 11, 12, 28, 29, 30, 31.
CPython also handles 133 (0x0085) specially. Its unicode category is Cc (same as 10, 13, etc).
Would you accept a patch that makes Jython agree with the CPython implementation here?
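To make the mismatch concrete, here is a rough Python model of the rule described in the Character.isWhitespace Javadoc, compared against CPython's str.isspace. This is only a sketch: java_like_is_whitespace and disagreements are hypothetical names, the model is not Java's or Jython's actual code, and the result depends on the Unicode data shipped with the interpreter.

```python
import unicodedata

# Assumption: these sets follow the Character.isWhitespace Javadoc.
# Separators are whitespace except three non-breaking spaces, plus a
# handful of control characters.
JAVA_NON_BREAKING = {0x00A0, 0x2007, 0x202F}
JAVA_EXTRA_CONTROLS = set(range(0x09, 0x0E)) | set(range(0x1C, 0x20))

def java_like_is_whitespace(ch):
    """Approximate Character.isWhitespace for a single character."""
    cp = ord(ch)
    if cp in JAVA_EXTRA_CONTROLS:
        return True
    is_separator = unicodedata.category(ch) in ('Zs', 'Zl', 'Zp')
    return is_separator and cp not in JAVA_NON_BREAKING

def disagreements(limit=0x10000):
    """Codepoints where str.isspace and the Java-like model differ."""
    return [cp for cp in range(limit)
            if chr(cp).isspace() != java_like_is_whitespace(chr(cp))]

print([hex(cp) for cp in disagreements()])
```

The codepoints printed are the candidates a patch would need to special-case so that \s with re.UNICODE behaves as it does in CPython.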
|
msg9394 (view) |
Author: Jim Baker (zyasoft) |
Date: 2015-01-14.19:52:09 |
|
Seems quite reasonable, especially if you can provide a patch. If you do so, it's best to add an appropriate unit test as well in Lib/test/test_unicode_jy.py
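A unit test along these lines might look like the sketch below. It is a hypothetical test case, not the one committed with the fix; the class and method names are invented, and the characters checked are 0x00a0 and 0x0085 from this report plus the two other non-breaking spaces that the Character.isWhitespace Javadoc excludes.

```python
import re
import unittest

class UnicodeWhitespaceRegexTest(unittest.TestCase):
    """Hypothetical regression test for \\s with re.UNICODE."""

    def test_nonbreaking_and_nel_match(self):
        pattern = re.compile(u'\\s', re.UNICODE)
        # NBSP, NEL, FIGURE SPACE, NARROW NO-BREAK SPACE
        for ch in (u'\u00a0', u'\u0085', u'\u2007', u'\u202f'):
            self.assertTrue(pattern.match(ch), repr(ch))

    def test_letter_does_not_match(self):
        pattern = re.compile(u'\\s', re.UNICODE)
        self.assertIsNone(pattern.match(u'a'))
```

It passes on CPython and should fail on an unpatched Jython, which is what makes it useful as a regression test.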
|
msg9464 (view) |
Author: Jim Baker (zyasoft) |
Date: 2015-01-28.19:18:02 |
|
It would be nice to get this fixed so it agrees with CPython; it should be trivial to do so.
|
msg9653 (view) |
Author: Jim Baker (zyasoft) |
Date: 2015-03-14.13:44:45 |
|
Fixed as of https://hg.python.org/jython/rev/3ee1feff962d
|
|
Date | User | Action | Args |
2015-03-20 18:28:38 | zyasoft | set | status: pending -> closed |
2015-03-14 13:44:45 | zyasoft | set | status: open -> pending; resolution: fixed; messages: + msg9653; versions: - Jython 2.5 |
2015-01-28 19:18:02 | zyasoft | set | priority: high; messages: + msg9464 |
2015-01-14 19:52:10 | zyasoft | set | messages: + msg9394 |
2015-01-14 17:17:19 | zyasoft | set | assignee: zyasoft; nosy: + zyasoft |
2014-11-05 06:03:44 | thatch | set | messages: + msg9190 |
2014-11-05 03:45:32 | thatch | create | |
|