Title: bytearray isalpha behaviour different from Python 2
Type: behaviour Severity: normal
Components: Core Versions: Jython 2.7
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: jeff.allen Nosy List: jeff.allen, ztane, zyasoft
Priority: normal Keywords:

Created on 2015-05-31.11:30:16 by ztane, last changed 2015-09-22.17:28:40 by zyasoft.

msg10092 (view) Author: Antti Haapala (ztane) Date: 2015-05-31.11:30:16
bytearray uses `*` methods to implement the various bytearray.isxxx methods. This is incompatible with CPython's behaviour: the Jython bytearray tests imply a Latin-1 character encoding, whereas CPython tests strictly against 7-bit ASCII.

CPython 2.7.9:

    >>> bytearray('\xc0').isalpha()

and Jython:

    >>> bytearray('\xc0').isalpha()
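
To make the two interpretations concrete, here is a minimal sketch that classifies every byte value both ways. It assumes Python 3 (for `bytes([value])`), and the helper names `is_alpha_ascii` and `is_alpha_latin1` are made up for illustration; they are not taken from the Jython or CPython sources.

```python
# Compare two readings of a single byte: strict 7-bit ASCII versus Latin-1.
# Both helpers are hypothetical illustrations, not library functions.

def is_alpha_ascii(value):
    """True if the byte value is an ASCII letter (A-Z or a-z)."""
    return 0x41 <= value <= 0x5A or 0x61 <= value <= 0x7A


def is_alpha_latin1(value):
    """True if the byte, decoded as Latin-1, is a Unicode letter."""
    return bytes([value]).decode("latin-1").isalpha()


ascii_letters = {v for v in range(256) if is_alpha_ascii(v)}
latin1_letters = {v for v in range(256) if is_alpha_latin1(v)}

# Byte values that only the Latin-1 reading treats as alphabetic.
only_latin1 = sorted(latin1_letters - ascii_letters)

print(len(ascii_letters))   # 52
print(0xC0 in only_latin1)  # True
```

Every byte on which the two readings disagree lies in the range 0x80-0xFF, which is why a test on '\xc0' exposes the difference.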
msg10094 (view) Author: Jeff Allen (jeff.allen) Date: 2015-06-01.09:05:27
The docs say it's locale-dependent:
Jython's locale support is weak, and in our code you can see us fall back on Latin-1, as a rule. However, I agree that, on examination, CPython seems to have an ASCII interpretation hard-wired.

I guess they forgot the docs when dealing with:

The policy has been made explicit in Python 3.5 docs:
In Python 3:

    >>> '\xc0'.isalpha()
    >>> b'\xc0'.isalpha()
    >>> bytearray(b'\xc0').isalpha()
    >>> (u'\xc0').isalpha()

I think consistency with Python 3 is sensible. (Differing views?)
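
As a rough check of the Python 3 behaviour being proposed as the target, the sketch below probes every byte value from 0x80 to 0xFF with each classification method on both bytes and bytearray. It assumes a CPython 3 interpreter; `METHODS` and `high_byte_hits` are names invented for this example.

```python
# Probe the Python 3 policy: bytes and bytearray classify by ASCII only,
# so no single non-ASCII byte should satisfy any is* method.
# METHODS and high_byte_hits are illustrative names, not part of any API.

METHODS = ("isalnum", "isalpha", "isdigit", "islower",
           "isspace", "istitle", "isupper")


def high_byte_hits(kind):
    """Return (value, method) pairs where a byte >= 0x80 tests true."""
    hits = []
    for value in range(0x80, 0x100):
        obj = kind([value])  # bytes([v]) or bytearray([v])
        for name in METHODS:
            if getattr(obj, name)():
                hits.append((value, name))
    return hits


print(high_byte_hits(bytes))      # []
print(high_byte_hits(bytearray))  # []
print("\xc0".isalpha())           # True: str uses Unicode semantics
```

On this reading, Unicode-aware classification is a property of str only, and a caller who wants it for byte data has to decode first.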
msg10097 (view) Author: Jim Baker (zyasoft) Date: 2015-06-03.17:34:13
Jeff, +1. bytearray was backported from Python 3, so its behavior on CPython 3 must be considered as canonical. Good to see that is actually documented now.
msg10249 (view) Author: Jeff Allen (jeff.allen) Date: 2015-09-10.23:05:03
In order to fix this I have:
1. implemented character classifiers (isalpha, etc.) in BaseBytes.
2. re-implemented the BaseBytes methods using these classifiers.
3. made PyUnicode not depend on PyString for these operations.
4. given PyString implementations that use the BaseBytes.isalpha, etc.
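
The shape of steps 1 and 2 can be sketched in Python as follows. This is only an illustration of the approach (per-byte ASCII classifiers, with the string-level methods built on top of them), not the Java code in BaseBytes; all function names here are hypothetical, and the empty-input rule is assumed to follow CPython, where the methods return False for an empty sequence.

```python
# Illustrative sketch only: per-byte ASCII classifiers plus sequence-level
# methods built on them. Names are hypothetical, not Jython's actual API.

def byte_isalpha(b):
    """ASCII letter test on one byte value (0-255)."""
    return 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A


def byte_isdigit(b):
    """ASCII decimal digit test on one byte value."""
    return 0x30 <= b <= 0x39


def byte_isalnum(b):
    """ASCII letter-or-digit test on one byte value."""
    return byte_isalpha(b) or byte_isdigit(b)


def _all_bytes(data, classifier):
    """True if data is non-empty and every byte passes the classifier."""
    return len(data) > 0 and all(classifier(b) for b in bytearray(data))


def isalpha(data):
    return _all_bytes(data, byte_isalpha)


def isdigit(data):
    return _all_bytes(data, byte_isdigit)


def isalnum(data):
    return _all_bytes(data, byte_isalnum)


print(isalpha(b"abc"))       # True
print(isalpha(b"ab\xc0"))    # False: 0xC0 is not an ASCII letter
print(isalnum(b""))          # False: empty input
```

Keeping the classifiers at the byte level means the same functions can serve any byte-oriented type, which is the kind of sharing that steps 2 and 4 describe.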

Benchmarks show the new PyString methods to be a little quicker than the old ones (as you might hope, given the simplification). Change sets:
and successors address this.

There are still parts of PyString that use `*` methods, for example the transformation methods lower, upper and title.
Date                 User        Action  Args
2015-09-22 17:28:40  zyasoft     set     status: pending -> closed
2015-09-10 23:05:03  jeff.allen  set     status: open -> pending; resolution: fixed; messages: + msg10249
2015-06-03 17:34:13  zyasoft     set     messages: + msg10097
2015-06-01 09:05:28  jeff.allen  set     priority: normal; assignee: jeff.allen; type: behaviour; messages: + msg10094; nosy: + jeff.allen, zyasoft
2015-05-31 11:30:16  ztane       create