Issue1612

classification
Title: array.array should use specialized bulk operations to initialize from an input source, such as a string
Type: behaviour
Severity: minor
Components: Core
Versions: Jython 2.7
Milestone:
process
Status: open
Resolution: remind
Dependencies:
Superseder:
Assigned To:
Nosy List: akong, doublep, fwierzbicki, mcieslik, santa4nt, zyasoft
Priority: low
Keywords: patch

Created on 2010-05-18.18:33:04 by mcieslik, last changed 2015-01-14.00:48:23 by santa4nt.

Files
File name Uploaded Description
profile_array.py mcieslik, 2010-05-18.18:33:03
issue1612.patch santa4nt, 2015-01-14.00:48:21
Messages
msg5769 Author: Marcin (mcieslik) Date: 2010-05-18.18:33:03
It takes ~300x longer to create instances of array.array in Jython 2.5.1 than in Python 2.6 or Python 3.1.

e.g. the following: 
from array import array
array('b', large_string)

$ python2.6 profile_array.py 
0.0104711055756
$ python3.1 profile_array.py 
0.00699281692505
$ jython profile_array.py 
3.00600004196
$ jython --version
Jython 2.5.1
msg5770 Author: (doublep) Date: 2010-05-19.11:21:16
Did you measure total program time?
msg5771 Author: Marcin (mcieslik) Date: 2010-05-19.12:09:26
The 3 s reported for jython profile_array.py does **NOT** include the JVM start-up time; it is the 'wall-clock' time of the loop alone.

This is what is in the attached script:
start = time()
for i in range(10000):
    array('b', large_string)
stop = time()
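
The excerpt above omits its imports and the definition of large_string. A self-contained sketch of the same measurement might look like the following. The payload and its size are assumptions (the contents of the attached profile_array.py are not shown in this report), and the helper name profile is hypothetical. It is written for Python 3, where the initializer is a bytes object.

```python
from array import array
from time import time

# Assumption: the real large_string is not shown in the report,
# so a 1 KiB placeholder payload is used here.
large_string = b"x" * 1024


def profile(iterations=10000):
    """Return the wall-clock seconds spent building `iterations` arrays."""
    start = time()
    for _ in range(iterations):
        array('b', large_string)
    stop = time()
    return stop - start


if __name__ == "__main__":
    print(profile())
```

Because the timer starts inside the running interpreter, interpreter or JVM start-up is excluded from the result, which matches the clarification in this message.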
msg6186 Author: Jim Baker (zyasoft) Date: 2010-10-17.17:21:48
The problem here is that we copy the string. In 2.6 this can be avoided by letting a string back the array directly. This can (and should) be part of general support for memoryview.
msg6187 Author: Jim Baker (zyasoft) Date: 2010-10-17.17:24:14
better title - "Jython ____" is just noise here
msg9375 Author: Jim Baker (zyasoft) Date: 2015-01-12.16:10:34
The reported performance problem is still seen in 2.7.0 beta 4.

In reviewing CPython 2.7's arraymodule.c, I don't see any support for copy-on-write semantics to do this speedup. Instead it's just a straightforward memcpy in the frombytes function.
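
The difference between copying the input and sharing its storage can be observed from Python itself. The sketch below is only an illustration under Python 3, not code from either implementation: array.array takes its own copy of the initializer, while a memoryview over a bytearray shares the underlying buffer.

```python
from array import array

source = bytearray(b"abc")

copied = array('b', bytes(source))  # array.array copies the initializer's bytes
view = memoryview(source)           # memoryview shares the bytearray's storage

source[0] = ord("z")                # mutate the original buffer in place

print(copied[0] == ord("a"))  # True: the copy still holds the old value
print(view[0] == ord("z"))    # True: the view sees the change
```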
msg9376 Author: Jim Baker (zyasoft) Date: 2015-01-12.17:35:18
So the additional overhead here has a simple root cause: unlike CPython, Jython uses the same method, PyArray.fromStream, to read from an input stream into a given array. Although the read should be reasonably fast/inlineable (albeit with more overhead than simply looping through the string), the write into the array is very slow because it goes through java.lang.reflect.Array, in this case java.lang.reflect.Array#setByte.

Some simple specialization would speed things up considerably, much as was done with CPython.

Changing misleading title! (Copy-on-write would still be interesting, and perhaps more feasible on Jython.)
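
The contrast drawn in this message, one generic call per element versus a single type-specific bulk operation, can be illustrated by analogy in Python 3. This is not Jython's Java code and does not involve reflection; the function names fill_per_element and fill_bulk are hypothetical and exist only for this sketch.

```python
from array import array


def fill_per_element(data):
    """Build a signed-byte array with one call per input byte."""
    out = array('b')
    for byte in bytearray(data):
        # Convert the unsigned value 0..255 to the signed range of 'b'.
        out.append(byte - 256 if byte > 127 else byte)
    return out


def fill_bulk(data):
    """Build the same array with a single bulk copy of the input bytes."""
    out = array('b')
    out.frombytes(data)
    return out


sample = b"\x00\x7f\x80\xff"
print(fill_per_element(sample) == fill_bulk(sample))  # True
```

Both functions produce the same array; the bulk form simply avoids the per-element call overhead, which is the kind of specialization the message proposes.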
msg9381 Author: Santoso Wijaya (santa4nt) Date: 2015-01-14.00:48:21
@zyasoft Something like the attached patch? I can get a better profile number with this naive "bulk" put() implementation (without the copy-on-write optimization), but the improvement is modest at best.
History
Date  User  Action  Args
2015-01-14 00:48:23  santa4nt  set  files: + issue1612.patch
    keywords: + patch
    messages: + msg9381
2015-01-13 19:15:58  santa4nt  set  nosy: + santa4nt
    type: behaviour
2015-01-12 17:35:18  zyasoft  set  messages: + msg9376
    title: array.array copies strings instead of using them to back the new array -> array.array should use specialized bulk operations to initialize from an input source, such as a string
2015-01-12 16:10:35  zyasoft  set  messages: + msg9375
2015-01-12 07:36:56  zyasoft  set  resolution: remind
2013-02-26 17:33:07  fwierzbicki  set  nosy: + fwierzbicki
2013-02-25 19:04:22  fwierzbicki  set  versions: + Jython 2.7, - 2.5.1
2010-10-17 17:24:15  zyasoft  set  messages: + msg6187
    title: Jython copies strings instead of using them to back an array -> array.array copies strings instead of using them to back the new array
2010-10-17 17:21:49  zyasoft  set  priority: low
    nosy: + zyasoft
    messages: + msg6186
    title: Jython ~300x slower on array.array instance creation -> Jython copies strings instead of using them to back an array
2010-05-23 00:10:01  akong  set  nosy: + akong
2010-05-19 12:09:26  mcieslik  set  messages: + msg5771
2010-05-19 11:21:17  doublep  set  nosy: + doublep
    messages: + msg5770
2010-05-18 18:33:04  mcieslik  create