Issue1709162

classification
Title: String performance problem
Type: Severity: normal
Components: Core Versions:
Milestone:
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: dushakov, fwierzbicki, iceslice, pedronis
Priority: normal Keywords:

Created on 2007-04-28.09:58:21 by dushakov, last changed 2007-04-28.13:22:58 by fwierzbicki.

Files
File name Uploaded Description Edit Remove
test.zip dushakov, 2007-04-28.09:58:21 Simple testcase
Messages
msg1575 (view) Author: Dennis Ushakov (dushakov) Date: 2007-04-28.09:58:21
Adding string to another like this
for line in lines:
    xml = xml + line
has extremely low performance in comparison with CPython

Simple testcase is attached. 
generator.py generates file of given size (first argument) with approximate line count (second argument)

test.py reads file from generated file, collecting all lines in one, and prints time of this operation

For input:
python generator.py 21 90000
(that is 2M file with approx. 90K lines)

python takes 1882 ms
jython fails to finish in more than 10 minutes
msg1576 (view) Author: Dennis Ushakov (dushakov) Date: 2007-04-28.10:02:50
This happens because on every addition a new Java String is created. 
Something more like StringBuilder should be used for PyString.
msg1577 (view) Author: Dennis Ushakov (dushakov) Date: 2007-04-28.10:12:23
This happens because on every addition a new Java String is created. 
Something more like StringBuilder should be used for PyString.
msg1578 (view) Author: Samuele Pedroni (pedronis) Date: 2007-04-28.11:55:12
that code should be written as:

xml = ''.join(lines)

this is  quality implementation issue. CPython uses ref counting based optimisations to avoid the quadratic perfomance,
is nevertheless the case that the ''.join idiom is the right way to concatenate many strings that works across Python implementations.

Given that Jython doesn't uses ref counting is unlikely that somethigng reasonably simple can be done to change this.

Java simple switching to use StringBuilder is possible because types are known at compile time.
msg1579 (view) Author: Sergey Salishev (iceslice) Date: 2007-04-28.13:07:27
Of course the 'join' workaround is more correct in this particular case. But this inefficiency can affect other existing code which runs well on CPython. So in my opinion it should be considered as bug and not as implementation detail.

Actually the reference counting isn't needed as it's done by Java itself. What's needed optimize append operations:

1. Separate PyString and string storage. The storage will keep the char vector and PyString will be wrapper keeping (storage, start, length). The PyString will be still immutable while the storage can be mutable.
2. Use StringBuilder as the storage. This way append to the end of storage would be very cheap.
3. Copy on Write the storage when need to modify the internal characters.

These changes require approximately <100 lines of code and are localized in PyString.

As this doesn't add an additional indirection compared to current implementation probably the performance of other string uses will not be affected.
msg1580 (view) Author: Samuele Pedroni (pedronis) Date: 2007-04-28.13:14:48
there are synchronisation issues, also what is the cost of charAt or indexOf for a StringBuilder etc

we are disagreeing of what counts as reasonably simple, in particular I'm very much not to eager to add more synchronisation code.
msg1581 (view) Author: Frank Wierzbicki (fwierzbicki) Date: 2007-04-28.13:17:58
iceslice,

We will be happy to review a patch if you wish to submit one based on your proposal.  Because we are getting so close to 2.2's release, it stands a good chance of missing this release.  If StringBuilder is used it will definitely need to wait for the next release, since StringBuilder is a JDK 5 feature.

msg1582 (view) Author: Frank Wierzbicki (fwierzbicki) Date: 2007-04-28.13:22:58
This discussion is getting long enough that we should probably move any further discussion to jython-dev so others can benefit from any decisions we make.  iceslice, any patch would need to address the issues that pedronis brings up.
History
Date User Action Args
2007-04-28 09:58:21dushakovcreate