Issue2364

classification
Title: bytearray isalpha behaviour different from Python 2
Type: behaviour Severity: normal
Components: Core Versions: Jython 2.7
Milestone:
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: jeff.allen Nosy List: jeff.allen, ztane, zyasoft
Priority: normal Keywords:

Created on 2015-05-31.11:30:16 by ztane, last changed 2015-09-22.17:28:40 by zyasoft.

Messages
msg10092 (view) Author: Antti Haapala (ztane) Date: 2015-05-31.11:30:16
Jython's bytearray implements its various bytearray.isxxx methods (isalpha and so on) with Java's `Character.is*` methods. This is not compatible with CPython behaviour: the Jython tests effectively treat each byte as a Latin-1 character, whereas CPython tests strictly against 7-bit ASCII.

CPython 2.7.9:

    >>> bytearray('\xc0').isalpha()
    False

and Jython:

    >>> bytearray('\xc0').isalpha()
    True
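
To make the two interpretations concrete, here is a minimal Python 3 sketch. The helper names `isalpha_latin1` and `isalpha_ascii` are hypothetical and come from neither Jython nor CPython. The first only approximates what a `Character.is*`-style test would give when each byte is read as a Latin-1 character.

```python
# Python 3 sketch; the helper names are hypothetical illustrations,
# not Jython or CPython APIs.

def isalpha_latin1(data):
    """True if data is non-empty and every byte is a letter when read as Latin-1."""
    # chr(b) maps byte value b to the Unicode code point with the same number,
    # which is exactly the Latin-1 decoding of that byte.
    return len(data) > 0 and all(chr(b).isalpha() for b in data)


def isalpha_ascii(data):
    """True if data is non-empty and every byte is in A-Z or a-z."""
    return len(data) > 0 and all(
        0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A for b in data
    )


sample = bytearray(b"\xc0")      # U+00C0 when read as Latin-1
print(isalpha_latin1(sample))    # True  (Latin-1 reading)
print(isalpha_ascii(sample))     # False (7-bit ASCII reading)
```

Both helpers agree on pure ASCII input such as b"abc"; they only diverge for bytes in the range 0x80 to 0xFF.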
msg10094 (view) Author: Jeff Allen (jeff.allen) Date: 2015-06-01.09:05:27
The docs say it's locale-dependent:
https://docs.python.org/2/library/stdtypes.html#str.isalpha
Jython's locale support is weak, and as a rule our code falls back on Latin-1, as you can see in the source. However, I agree that, on examination, CPython seems to have an ASCII interpretation hard-wired.

I guess they forgot to update the docs when dealing with: http://bugs.python.org/issue5793


The policy has been made explicit in Python 3.5 docs:
https://docs.python.org/3.5/library/stdtypes.html#bytearray.isalpha
In Python 3:

    >>> '\xc0'.isalpha()
    True
    >>> b'\xc0'.isalpha()
    False
    >>> bytearray(b'\xc0').isalpha()
    False
    >>> (u'\xc0').isalpha()
    True

I think consistency with Python 3 is sensible. (Differing views?)
msg10097 (view) Author: Jim Baker (zyasoft) Date: 2015-06-03.17:34:13
Jeff, +1. bytearray was backported from Python 3, so its behavior on CPython 3 must be considered as canonical. Good to see that is actually documented now.
msg10249 (view) Author: Jeff Allen (jeff.allen) Date: 2015-09-10.23:05:03
In order to fix this I have:
1. implemented character classifiers (isalpha, etc.) in BaseBytes.
2. re-implemented the BaseBytes methods using these classifiers.
3. made PyUnicode not depend on PyString for these operations.
4. given PyString implementations that use BaseBytes.isalpha, etc.
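
As a rough illustration of what an ASCII-only byte classifier like the ones in step 1 might look like, here is a table-driven sketch in Python 3. This is an assumption about the general technique, not the code in BaseBytes; every name in it (`_CTYPE`, `ALPHA`, `_all_in_class`, and so on) is hypothetical.

```python
# Python 3 sketch of table-driven ASCII classification. All names are
# hypothetical and are not taken from Jython's BaseBytes source.

ALPHA, DIGIT, SPACE = 1, 2, 4   # bit flags, one per character class


def _build_table():
    """Return a 256-entry list of class flags, one entry per byte value."""
    table = [0] * 256
    for b in range(256):
        if 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A:   # A-Z, a-z
            table[b] |= ALPHA
        if 0x30 <= b <= 0x39:                        # 0-9
            table[b] |= DIGIT
        if b == 0x20 or 0x09 <= b <= 0x0D:           # space, \t \n \v \f \r
            table[b] |= SPACE
    return table


_CTYPE = _build_table()


def _all_in_class(data, mask):
    """True if data is non-empty and every byte has at least one flag in mask."""
    return len(data) > 0 and all(_CTYPE[b] & mask for b in data)


def isalpha(data):
    return _all_in_class(data, ALPHA)


def isdigit(data):
    return _all_in_class(data, DIGIT)


def isalnum(data):
    return _all_in_class(data, ALPHA | DIGIT)


def isspace(data):
    return _all_in_class(data, SPACE)


print(isalpha(bytearray(b"\xc0")))   # False: 0xC0 is outside 7-bit ASCII
print(isalnum(bytearray(b"abc123")))  # True
```

Because the table never marks a byte above 0x7F, no locale or Latin-1 assumption can leak into the result.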

Benchmarks show the new PyString methods to be a little quicker than the old ones (as you might hope, given the simplification). Change sets:
https://hg.python.org/jython/rev/50082331db8d
and successors address this.

There are still parts of PyString that use Character.is* methods, for example the transformation methods lower, upper, title.
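
For comparison only, this is how the same distinction shows up for a transformation method in Python 3, where bytes.lower changes ASCII letters only while str.lower follows Unicode rules. It says nothing about what Jython's PyString currently does; the variable names are just for illustration.

```python
# Python 3 comparison of byte-oriented and text-oriented lower().
raw = b"\xc0BC"                 # 0xC0 followed by ASCII "BC"
text = raw.decode("latin-1")    # the same data read as Latin-1 text

print(raw.lower())              # b'\xc0bc'  (0xC0 is left untouched)
print(text.lower() == "\xe0bc")  # True      (U+00C0 becomes U+00E0)
```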
History
Date                 User        Action  Args
2015-09-22 17:28:40  zyasoft     set     status: pending -> closed
2015-09-10 23:05:03  jeff.allen  set     status: open -> pending; resolution: fixed; messages: + msg10249
2015-06-03 17:34:13  zyasoft     set     messages: + msg10097
2015-06-01 09:05:28  jeff.allen  set     priority: normal; assignee: jeff.allen; type: behaviour; messages: + msg10094; nosy: + jeff.allen, zyasoft
2015-05-31 11:30:16  ztane       create