Issue2226

classification
Title: Differences in re unicode 'whitespace' matching
Type:
Severity: normal
Components: Core
Versions: Jython 2.7
Milestone:
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: zyasoft
Nosy List: thatch, zyasoft
Priority: high
Keywords: patch

Created on 2014-11-05.03:45:32 by thatch, last changed 2015-03-20.18:28:38 by zyasoft.

Files
File name Uploaded Description
uni.diff thatch, 2014-11-05.03:45:31 Other examples of codepoints that don't match as whitespace
Messages
msg9189 Author: Tim Hatch (thatch) Date: 2014-11-05.03:45:31
The codepoint 0x00a0 (160 decimal, nbsp) is not matched as whitespace by Jython's re, but it is by CPython's.

>>> re.compile('\\s', re.UNICODE).match(u'\u00a0')

There are a few other differences, so I'm attaching a list. In it, 's' means \s with default flags and 'su' means \s with the re.UNICODE flag.
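A list like the attached one could be produced with a short script along these lines. This is only a sketch: the helper name unicode_whitespace_codepoints and the 0x10000 limit are assumptions for illustration, not taken from uni.diff. Running it under both interpreters and comparing the output would show where they disagree.

```python
import re

try:
    unichr  # Python 2 / Jython 2.7
except NameError:
    unichr = chr  # Python 3 has no unichr

# \s compiled with the re.UNICODE flag (the 'su' case)
UNICODE_SPACE = re.compile(u'\\s', re.UNICODE)


def unicode_whitespace_codepoints(limit=0x10000):
    # Hypothetical helper: collect every code point below `limit`
    # that the pattern matches as whitespace.
    return [cp for cp in range(limit) if UNICODE_SPACE.match(unichr(cp))]


matched = unicode_whitespace_codepoints()
print(' '.join('%04x' % cp for cp in matched))
```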
msg9190 Author: Tim Hatch (thatch) Date: 2014-11-05.06:03:44
I found that SRE_STATE.java handles SRE_CATEGORY_UNI_SPACE by simply delegating to Character.isWhitespace, whose documentation has a note specifically about 0x00a0.

http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isWhitespace(char) calls out the few exceptions.  isSpaceChar looks better in the docs, but its documentation does not mention decimal 9, 10, 11, 12, 28, 29, 30, 31.

CPython also handles 133 (0x0085) specially.  Its Unicode category is Cc (the same as 10, 13, etc.).

Would you accept a patch that makes Jython agree with the CPython implementation here?
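For reference, the CPython rule can be sketched in Python terms. The str.isspace documentation describes a whitespace character as one whose general category is Zs or whose bidirectional class is WS, B, or S, and as far as I can tell \s under re.UNICODE follows the same rule. The helper below (is_cpython_space, a hypothetical name) applies that rule through unicodedata. It only illustrates the target behaviour; it is not the Java change that would go into SRE_STATE.java.

```python
import unicodedata


def is_cpython_space(ch):
    # Hypothetical helper: whitespace as described in the str.isspace docs,
    # i.e. general category Zs, or bidirectional class WS, B, or S.
    return (unicodedata.category(ch) == 'Zs'
            or unicodedata.bidirectional(ch) in ('WS', 'B', 'S'))


# The code points discussed above: nbsp, 0x85, and the control characters.
for cp in (0x00a0, 0x0085, 9, 10, 11, 12, 13, 28, 29, 30, 31):
    ch = (u'\\u%04x' % cp).encode('ascii').decode('unicode_escape')
    print('%04x %s' % (cp, is_cpython_space(ch)))
```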
msg9394 Author: Jim Baker (zyasoft) Date: 2015-01-14.19:52:09
Seems quite reasonable, especially if you can provide a patch. If you do so, it's best to add an appropriate unit test as well in Lib/test/test_unicode_jy.py.
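A unit test for this might look roughly like the sketch below. The class and method names are made up for illustration and are not what ended up in Lib/test/test_unicode_jy.py; the sample characters are the ones mentioned earlier in this issue.

```python
import re
import unittest


class UnicodeSpaceMatchTest(unittest.TestCase):  # hypothetical test class
    # \s compiled with the re.UNICODE flag, as in the original report
    pattern = re.compile(u'\\s', re.UNICODE)

    def test_matches_cpython_whitespace(self):
        # nbsp, 0x85, the 28-31 separators, plus ordinary tab and space
        for ch in (u'\u00a0', u'\u0085', u'\u001c', u'\u001d',
                   u'\u001e', u'\u001f', u'\t', u' '):
            self.assertTrue(self.pattern.match(ch), repr(ch))

    def test_rejects_non_whitespace(self):
        for ch in (u'a', u'0'):
            self.assertFalse(self.pattern.match(ch), repr(ch))
```

It can be run with the usual unittest runner, for example by loading the class with unittest.defaultTestLoader.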
msg9464 Author: Jim Baker (zyasoft) Date: 2015-01-28.19:18:02
It would be nice to get this fixed so that it agrees with CPython; it would be trivial to do so.
msg9653 Author: Jim Baker (zyasoft) Date: 2015-03-14.13:44:45
Fixed as of https://hg.python.org/jython/rev/3ee1feff962d
History
Date User Action Args
2015-03-20 18:28:38 zyasoft set status: pending -> closed
2015-03-14 13:44:45 zyasoft set status: open -> pending
resolution: fixed
messages: + msg9653
versions: - Jython 2.5
2015-01-28 19:18:02 zyasoft set priority: high
messages: + msg9464
2015-01-14 19:52:10 zyasoft set messages: + msg9394
2015-01-14 17:17:19 zyasoft set assignee: zyasoft
nosy: + zyasoft
2014-11-05 06:03:44 thatch set messages: + msg9190
2014-11-05 03:45:32 thatch create