Issue1612

classification

Title:	array.array should use specialized bulk operations to initialize from an input source, such as a string
Type:	behaviour	Severity:	minor
Components:	Core	Versions:	Jython 2.7
		Milestone:

process

Status:	open	Resolution:	remind
Dependencies:		Superseder:
Assigned To:		Nosy List:	akong, doublep, fwierzbicki, mcieslik, santa4nt, zyasoft
Priority:	low	Keywords:	patch

Created on 2010-05-18.18:33:04 by mcieslik, last changed 2015-01-14.00:48:23 by santa4nt.

Files
File name	Uploaded	Description
profile_array.py	mcieslik, 2010-05-18.18:33:03
issue1612.patch	santa4nt, 2015-01-14.00:48:21

Messages
msg5769	Author: Marcin (mcieslik)	Date: 2010-05-18.18:33:03
It takes ~300x longer to create instances of array.array in Jython 2.5.1 vs Python 2.6 and Python 3.1, e.g. the following:

from array import array
array('b', large_string)

$ python2.6 profile_array.py
0.0104711055756
$ python3.1 profile_array.py
0.00699281692505
$ jython profile_array.py
3.00600004196
$ jython --version
Jython 2.5.1
msg5770	Author: (doublep)	Date: 2010-05-19.11:21:16
Did you measure total program time?
msg5771	Author: Marcin (mcieslik)	Date: 2010-05-19.12:09:26
The 3 s reported by "jython profile_array.py" does NOT include the JVM start-up time; it is the wall-clock time of the loop only. This is what is in the attached script:

start = time()
for i in range(10000):
    array('b', large_string)
stop = time()
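For anyone who wants to reproduce the measurement without the attachment, here is a minimal self-contained sketch of the timing loop described above. It is not the attached profile_array.py: the function name profile_array, the repeats parameter, and the 10000-byte size of large_string are assumptions made for illustration.

```python
from array import array
from time import time


def profile_array(data, repeats=10000):
    """Return the wall-clock seconds spent building `repeats` arrays from `data`."""
    start = time()
    for _ in range(repeats):
        array('b', data)
    stop = time()
    return stop - start


if __name__ == "__main__":
    # Assumed input size; the size used in the original report is not stated.
    large_string = b"x" * 10000
    print(profile_array(large_string))
```

Because the timer starts inside the interpreter, interpreter or JVM start-up is excluded from the reported number.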
msg6186	Author: Jim Baker (zyasoft)	Date: 2010-10-17.17:21:48
The problem here is that we copy the string. In 2.6 this can be avoided by supporting a string to back an array. This can (and should) be part of general support for memoryview.
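To make the copy-versus-backing distinction concrete, here is a small sketch written for CPython 3 (an assumption; it does not show Jython internals). The function name copy_vs_view is hypothetical. array.array takes its own copy of the input bytes, while a memoryview shares the source buffer.

```python
from array import array


def copy_vs_view():
    """Show that array() copies its input while memoryview shares it."""
    source = bytearray(b"abc")
    copied = array('b', bytes(source))  # independent copy of the data
    view = memoryview(source)           # no copy: backed by `source`
    source[0] = ord("z")                # mutate the original buffer
    return copied[0], view[0]
```

The copied array still holds the old first byte, while the view reflects the change.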
msg6187	Author: Jim Baker (zyasoft)	Date: 2010-10-17.17:24:14
better title - "Jython ____" is just noise here
msg9375	Author: Jim Baker (zyasoft)	Date: 2015-01-12.16:10:34
The reported performance problem is still seen in 2.7.0 beta 4. In reviewing CPython 2.7's arraymodule.c, I don't see any support for copy-on-write semantics to do this speedup. Instead it's just a straightforward memcpy in the frombytes function.
msg9376	Author: Jim Baker (zyasoft)	Date: 2015-01-12.17:35:18
So the additional overhead here has a simple root cause: unlike CPython, Jython uses the same method, PyArray.fromStream, to read from an input stream into a given array. Although the read should be reasonably fast/inlineable (but more overhead than simply looping through the string), the write performance into the array is very slow since it uses java.lang.reflect.Array, in this case java.lang.reflect.Array#setByte. Some simple specialization would speed things up considerably, much as was done with CPython. Changing misleading title! (Copy-on-write would still be interesting, and perhaps more feasible on Jython.)
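The difference between per-element writes and a specialized bulk operation can be sketched in plain Python. This is only an analogy, not Jython's PyArray code and not the attached patch: fill_per_element and fill_bulk are hypothetical names, and the per-element loop merely stands in for a setter being called once per byte.

```python
from array import array


def fill_per_element(data):
    """Build the array with one call per byte (the slow, generic path)."""
    result = array('b')
    for value in bytearray(data):  # yields ints in 0..255
        # Typecode 'b' is signed, so map 128..255 onto -128..-1.
        result.append(value - 256 if value > 127 else value)
    return result


def fill_bulk(data):
    """Build the array with a single call that copies the whole buffer."""
    return array('b', data)
```

Both functions produce the same array; the bulk version avoids the per-element call overhead, which is the kind of specialization suggested above.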
msg9381	Author: Santoso Wijaya (santa4nt)	Date: 2015-01-14.00:48:21
@zyasoft Something like the patch I have in mind? I can get a better profile number with this naive "bulk" put() implementation sans-copy-on-write optimization, but it's modest at best.

History
Date	User	Action	Args
2015-01-14 00:48:23	santa4nt	set	files: + issue1612.patch keywords: + patch messages: + msg9381
2015-01-13 19:15:58	santa4nt	set	nosy: + santa4nt type: behaviour
2015-01-12 17:35:18	zyasoft	set	messages: + msg9376 title: array.array copies strings instead of using them to back the new array -> array.array should use specialized bulk operations to initialize from an input source, such as a string
2015-01-12 16:10:35	zyasoft	set	messages: + msg9375
2015-01-12 07:36:56	zyasoft	set	resolution: remind
2013-02-26 17:33:07	fwierzbicki	set	nosy: + fwierzbicki
2013-02-25 19:04:22	fwierzbicki	set	versions: + Jython 2.7, - 2.5.1
2010-10-17 17:24:15	zyasoft	set	messages: + msg6187 title: Jython copies strings instead of using them to back an array -> array.array copies strings instead of using them to back the new array
2010-10-17 17:21:49	zyasoft	set	priority: low nosy: + zyasoft messages: + msg6186 title: Jython ~300x slower on array.array instance creation -> Jython copies strings instead of using them to back an array
2010-05-23 00:10:01	akong	set	nosy: + akong
2010-05-19 12:09:26	mcieslik	set	messages: + msg5771
2010-05-19 11:21:17	doublep	set	nosy: + doublep messages: + msg5770
2010-05-18 18:33:04	mcieslik	create