Issue2364

classification

Title:	bytearray isalpha behaviour different from Python 2
Type:	behaviour	Severity:	normal
Components:	Core	Versions:	Jython 2.7
		Milestone:

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	jeff.allen	Nosy List:	jeff.allen, ztane, zyasoft
Priority:	normal	Keywords:

Created on 2015-05-31.11:30:16 by ztane, last changed 2015-09-22.17:28:40 by zyasoft.

Messages
msg10092 (view)	Author: Antti Haapala (ztane)	Date: 2015-05-31.11:30:16
bytearray uses the `Character.is*` methods to implement the various bytearray.isxxx methods. This is not compatible with CPython behaviour: the Jython bytearray tests imply a Latin-1 character encoding, whereas CPython tests strictly against 7-bit ASCII.

CPython 2.7.9:
>>> bytearray('\xc0').isalpha()
False

Jython:
>>> bytearray('\xc0').isalpha()
True
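To make the two interpretations concrete, here is a minimal Python 3 sketch that contrasts them. The helper names `isalpha_ascii` and `isalpha_latin1` are hypothetical and exist only for illustration; the Latin-1 variant approximates the reported Jython behaviour by decoding the bytes and asking Python's Unicode `str.isalpha`, which is an assumption rather than a reproduction of Java's `Character.is*` methods.

```python
def isalpha_ascii(data):
    """ASCII-only rule: every byte must be A-Z or a-z, and data must be non-empty."""
    return len(data) > 0 and all(
        0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A for b in data
    )


def isalpha_latin1(data):
    """Latin-1 rule (assumed): treat each byte as a code point, then ask Unicode."""
    return bytes(data).decode("latin-1").isalpha()


sample = b"\xc0"  # U+00C0 (LATIN CAPITAL LETTER A WITH GRAVE) when read as Latin-1

print(isalpha_ascii(sample))        # False: 0xC0 is outside 7-bit ASCII
print(isalpha_latin1(sample))       # True: U+00C0 is a Unicode letter
print(bytearray(sample).isalpha())  # False on CPython 3, which uses the ASCII rule
```

Both helpers agree on pure ASCII input, so the difference only shows up for bytes in the range 0x80 to 0xFF.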
msg10094 (view)	Author: Jeff Allen (jeff.allen)	Date: 2015-06-01.09:05:27
The docs say it's locale-dependent: https://docs.python.org/2/library/stdtypes.html#str.isalpha

Jython's locale support is weak, and in our code you can see us fall back on Latin-1, as a rule. However, I agree that on examination CPython seems to have an ASCII interpretation hard-wired. I guess they forgot the docs when dealing with: http://bugs.python.org/issue5793

The policy has been made explicit in the Python 3.5 docs: https://docs.python.org/3.5/library/stdtypes.html#bytearray.isalpha

In Python 3:
>>> '\xc0'.isalpha()
True
>>> b'\xc0'.isalpha()
False
>>> bytearray(b'\xc0').isalpha()
False
>>> (u'\xc0').isalpha()
True

I think consistency with Python 3 is sensible. (Differing views?)
msg10097 (view)	Author: Jim Baker (zyasoft)	Date: 2015-06-03.17:34:13
Jeff, +1. bytearray was backported from Python 3, so its behavior on CPython 3 must be considered as canonical. Good to see that is actually documented now.
msg10249 (view)	Author: Jeff Allen (jeff.allen)	Date: 2015-09-10.23:05:03
In order to fix this I have:
1. implemented character classifiers (isalpha, etc.) in BaseBytes.
2. re-implemented the BaseBytes methods using these classifiers.
3. made PyUnicode not depend on PyString for these operations.
4. given PyString implementations that use the BaseBytes.isalpha, etc.

Benchmarks show the new PyString methods to be a little quicker than the old ones (as you might hope, given the simplification).

Change sets: https://hg.python.org/jython/rev/50082331db8d and successors address this.

There are still parts of PyString that use Character.is* methods, for example the transformation methods lower, upper, title.
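As an illustration of what ASCII-only byte classifiers can look like, here is a small Python 3 sketch built on a 256-entry lookup table. It is not the Jython code (which is Java, in BaseBytes), and a lookup table is an assumed technique, not necessarily the one used in the change sets. The flag names and the `CTYPE` table are hypothetical.

```python
# Hypothetical flag bits; the names are illustrative, not taken from Jython.
UPPER, LOWER, DIGIT, SPACE = 1, 2, 4, 8


def _build_table():
    """Build a 256-entry table; bytes 0x80-0xFF keep the value 0 (no class)."""
    table = bytearray(256)
    for b in range(ord("A"), ord("Z") + 1):
        table[b] |= UPPER
    for b in range(ord("a"), ord("z") + 1):
        table[b] |= LOWER
    for b in range(ord("0"), ord("9") + 1):
        table[b] |= DIGIT
    for b in b" \t\n\r\x0b\x0c":  # ASCII whitespace
        table[b] |= SPACE
    return bytes(table)


CTYPE = _build_table()


def _all_match(data, mask):
    """True if data is non-empty and every byte has at least one flag in mask."""
    return len(data) > 0 and all(CTYPE[b] & mask for b in data)


def isalpha(data):
    return _all_match(data, UPPER | LOWER)


def isdigit(data):
    return _all_match(data, DIGIT)


def isalnum(data):
    return _all_match(data, UPPER | LOWER | DIGIT)


def isspace(data):
    return _all_match(data, SPACE)


print(isalpha(bytearray(b"\xc0")))  # False: high bytes belong to no class
print(isalpha(b"Jython"))           # True
```

Because every byte above 0x7F maps to zero, the result cannot depend on locale or on any Latin-1 interpretation, which is the behaviour the issue asks for.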

History
Date	User	Action	Args
2015-09-22 17:28:40	zyasoft	set	status: pending -> closed
2015-09-10 23:05:03	jeff.allen	set	status: open -> pending resolution: fixed messages: + msg10249
2015-06-03 17:34:13	zyasoft	set	messages: + msg10097
2015-06-01 09:05:28	jeff.allen	set	priority: normal assignee: jeff.allen type: behaviour messages: + msg10094 nosy: + jeff.allen, zyasoft
2015-05-31 11:30:16	ztane	create