Issue2772
Created on 2019-05-13.14:55:32 by pekka.klarck, last changed 2019-05-14.20:02:06 by pekka.klarck.
msg12516 | Author: Pekka Klärck (pekka.klarck) | Date: 2019-05-13.14:55:31
On Jython, no-break space (u'\xa0'), figure space (u'\u2007') and narrow no-break space (u'\u202F') are not considered to be space characters by `isspace()`. The other space characters listed at https://www.compart.com/en/unicode/category/Zs are.
This also affects string methods like `strip()` and `split()`, but the `re` module doesn't seem to be affected.
Jython 2.7.0 (default:9987c746f838, Apr 29 2015, 02:25:11)
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.8.0_201
Type "help", "copyright", "credits" or "license" for more information.
>>> for ordinal in '0020 00A0 1680 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 200A 202F 205F 3000'.split():
...     char = unichr(int(ordinal, 16))
...     if not char.isspace():
...         print '%s is not space' % ordinal
...
00A0 is not space
2007 is not space
202F is not space
>>>
>>> u'\xa0...\u1680'.strip()
u'\xa0...'
>>> u'.\xa0.'.split()
[u'.\xa0.']
>>> import re
>>> re.split(r'\s+', u'.\xa0.', flags=re.UNICODE)
[u'.', u'.']
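A possible workaround on affected versions, as a minimal sketch only: since `re` with `re.UNICODE` seems to handle these characters correctly, splitting and stripping could go through it instead of the string methods. The helper names `unicode_split` and `unicode_strip` are hypothetical and not part of Jython or the standard library.

```python
import re

# Hypothetical helpers, not part of Jython or the standard library.
# They rely on the observation above that `\s` with re.UNICODE matches
# no-break space, figure space and narrow no-break space.
_WS_RUN = re.compile(u'\\s+', re.UNICODE)
_WS_EDGES = re.compile(u'\\A\\s+|\\s+\\Z', re.UNICODE)


def unicode_split(text):
    """Split on runs of Unicode whitespace, dropping empty parts."""
    return [part for part in _WS_RUN.split(text) if part]


def unicode_strip(text):
    """Remove leading and trailing Unicode whitespace."""
    return _WS_EDGES.sub(u'', text)
```

With these, `unicode_split(u'.\xa0.')` is expected to give two parts and `unicode_strip(u'\xa0...\u1680')` to give `u'...'`, matching what `split()` and `strip()` ought to return.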
|
msg12518 | Author: Jeff Allen (jeff.allen) | Date: 2019-05-14.06:08:07
I'm happy to report that this works in the development tip, thanks to: https://hg.python.org/jython/rev/a1f68d091a1c .
Jython 2.7.2a1+ (default:a1ae652df5e3+, May 12 2019, 09:17:21)
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.7.0_80
Type "help", "copyright", "credits" or "license" for more information.
>>> for ordinal in '0020 00A0 1680 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 200A 202F 205F 3000'.split():
...     char = unichr(int(ordinal, 16))
...     if not char.isspace():
...         print '%s is not space' % ordinal
...
>>> u'\xa0...\u1680'.strip()
u'...'
>>> u'.\xa0.'.split()
[u'.', u'.']
>>> import re
>>> re.split(r'\s+', u'.\xa0.', flags=re.UNICODE)
[u'.', u'.']
|
msg12520 | Author: Pekka Klärck (pekka.klarck) | Date: 2019-05-14.20:02:06
Awesome, thanks Jeff!
|
Date | User | Action | Args
2019-05-14 20:02:06 | pekka.klarck | set | messages: + msg12520
2019-05-14 06:08:07 | jeff.allen | set | status: open -> closed, nosy: + jeff.allen, messages: + msg12518, resolution: out of date, milestone: Jython 2.7.2, type: behaviour
2019-05-13 14:55:32 | pekka.klarck | create |