Message12516

Author pekka.klarck
Recipients pekka.klarck
Date 2019-05-13.14:55:31
Message-id <1557759332.16.0.898950359564.issue2772@roundup.psfhosted.org>
In-reply-to
Content
On Jython, the no-break space (u'\xa0'), figure space (u'\u2007') and narrow no-break space (u'\u202F') are not considered to be space characters. The other space characters listed at https://www.compart.com/en/unicode/category/Zs are.

This also affects string methods like `strip()` and `split()`, but the `re` module doesn't seem to be affected.

Jython 2.7.0 (default:9987c746f838, Apr 29 2015, 02:25:11) 
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.8.0_201
Type "help", "copyright", "credits" or "license" for more information.
>>> for ordinal in '0020 00A0 1680 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 200A 202F 205F 3000'.split():
...     char = unichr(int(ordinal, 16))
...     if not char.isspace():
...         print '%s is not space' % ordinal
... 
00A0 is not space
2007 is not space
202F is not space
>>> 
>>> u'\xa0...\u1680'.strip()
u'\xa0...'
>>> u'.\xa0.'.split()
[u'.\xa0.']
>>> import re
>>> re.split(r'\s+', u'.\xa0.', flags=re.UNICODE)
[u'.', u'.']
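
Since `re` appears to be unaffected, a possible workaround is to do whitespace-aware splitting and stripping through a regular expression instead of the string methods. The following is only a minimal sketch: `unicode_split` and `unicode_strip` are hypothetical helper names, not part of Jython or the standard library, and only u'\xa0' was checked with `re` in the session above, so the behaviour for u'\u2007' and u'\u202F' on Jython is an assumption.

```python
import re

# Hypothetical helpers, not part of Jython or the standard library.
# Assumption: with re.UNICODE, \s matches all Unicode space characters,
# including the three that isspace() misses on Jython.
_SPACES = re.compile(r'\s+', re.UNICODE)
_EDGE_SPACES = re.compile(r'\A\s+|\s+\Z', re.UNICODE)


def unicode_split(text):
    """Split on runs of whitespace, like text.split() without arguments."""
    # re.split() yields empty strings at the edges; drop them like split() does.
    return [part for part in _SPACES.split(text) if part]


def unicode_strip(text):
    """Remove leading and trailing whitespace, like text.strip()."""
    return _EDGE_SPACES.sub(u'', text)
```

With these helpers, `unicode_split(u'.\xa0.')` is expected to give the same two-item result as the `re.split` call above, and `unicode_strip(u'\xa0...\u1680')` should remove both the leading no-break space and the trailing ogham space mark.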
History
Date User Action Args
2019-05-13 14:55:32 pekka.klarck set recipients: + pekka.klarck
2019-05-13 14:55:32 pekka.klarck set messageid: <1557759332.16.0.898950359564.issue2772@roundup.psfhosted.org>
2019-05-13 14:55:32 pekka.klarck link issue2772 messages
2019-05-13 14:55:31 pekka.klarck create