Issue1571

classification
Title: int() conversion to integer needless overhead: BigInteger in implementation
Type: behaviour Severity: normal
Components: Core Versions: 2.5.1
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: zyasoft Nosy List: pjenvey, pr3d4t0r, zyasoft
Priority: normal Keywords:

Created on 2010-03-15.01:22:49 by pr3d4t0r, last changed 2010-08-15.17:14:01 by zyasoft.

Messages
msg5565 (view) Author: Eugene Ciurana (pr3d4t0r) Date: 2010-03-15.01:22:48
Issuing a million or more calls to int(string) incurs a huge performance hit because it uses the BigInteger parser instead of Long's or Integer's. This makes the code 2x to 10x slower than the same code running on an equivalent version of CPython.
msg5567 (view) Author: Jim Baker (zyasoft) Date: 2010-03-15.19:45:25
The relevant code is PyInteger#int_new, which calls PyString#atoi; if that raises an OverflowException, it then tries to build a PyLong instead.

Instead, we can simply catch the NumberFormatException from java.lang.Integer#parseInt. This should reduce the overhead accordingly.
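
A minimal sketch of that fast path, not Jython's actual code: the names IntParseSketch and parse are hypothetical, and the real PyInteger#int_new also has to deal with bases, Python's whitespace rules, and Python-level exceptions. The idea is only to try Integer.parseInt first and pay for BigInteger when the primitive parse fails.

```java
import java.math.BigInteger;

public class IntParseSketch {
    // Hypothetical helper: returns an Integer when the value fits in a
    // Java int, otherwise a BigInteger.
    public static Number parse(String s) {
        String t = s.trim();
        try {
            // Cheap path: no BigInteger allocation for ordinary values.
            return Integer.parseInt(t);
        } catch (NumberFormatException e) {
            // Integer.parseInt reports both overflow and malformed input
            // this way. BigInteger throws NumberFormatException again if
            // the input really is malformed.
            return new BigInteger(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("12345"));
        System.out.println(parse("123456789012345678901234567890"));
    }
}
```

One trade-off of this shape is that values too large for an int are parsed twice, so it only pays off when small values dominate, as in the reported workload.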
msg5584 (view) Author: Jim Baker (zyasoft) Date: 2010-03-21.22:08:30
Test results for Jython 2.5.1 show that HotSpot has to warm up first; once it does, Jython eventually outperforms CPython 2.6.4:

jimbaker:~ jbaker$ released-jython2.5.1/bin/jython -m timeit -n 1000 "int('12345')"
1000 loops, best of 3: 12 usec per loop
jimbaker:~ jbaker$ released-jython2.5.1/bin/jython -m timeit -n 10000 "int('12345')"
10000 loops, best of 3: 3.3 usec per loop
jimbaker:~ jbaker$ released-jython2.5.1/bin/jython -m timeit -n 100000 "int('12345')"
100000 loops, best of 3: 0.53 usec per loop
jimbaker:~ jbaker$ released-jython2.5.1/bin/jython -m timeit -n 1000000 "int('12345')"
1000000 loops, best of 3: 0.416 usec per loop
jimbaker:~ jbaker$ released-jython2.5.1/bin/jython -m timeit -n 10000000 "int('12345')"
10000000 loops, best of 3: 0.402 usec per loop

vs CPython 2.6.4:

jimbaker:~ jbaker$ python -m timeit -n 10000 "int('12345')"
10000 loops, best of 3: 0.731 usec per loop

I doubt there's any caching behavior here on the string itself. A naive variant I wrote is not nearly as fast for a large number of iterations, almost certainly because it doesn't inline as well:

jimbaker:jython jbaker$ dist/bin/jython -m timeit -n 10000 "int('12345')"
10000 loops, best of 3: 8.7 usec per loop
jimbaker:jython jbaker$ dist/bin/jython -m timeit -n 100000 "int('12345')"
100000 loops, best of 3: 8.69 usec per loop

I believe this is because HotSpot doesn't inline it as aggressively. We can do better by removing all conditional logic from the inner loop, such as the base calculations, but that requires more work for this specialization.
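
As a rough illustration of what such a specialization might look like (a sketch only, with the hypothetical names Base10Sketch and parseBase10, not the variant timed above): the base is fixed at 10, and the overflow check is hoisted out of the loop by accepting at most 9 digits, which always fit in an int. Only the digit validity check remains inside the loop. Whether HotSpot actually inlines this better has not been measured here.

```java
public class Base10Sketch {
    // Hypothetical base-10 fast path. Throws NumberFormatException for
    // anything it does not handle (empty input, non-digits, more than
    // 9 digits) so a caller could fall back to a general parser.
    public static int parseBase10(String s) {
        int len = s.length();
        int i = 0;
        boolean negative = false;
        if (len > 0 && (s.charAt(0) == '-' || s.charAt(0) == '+')) {
            negative = s.charAt(0) == '-';
            i = 1;
        }
        // At most 9 digits: 999999999 < Integer.MAX_VALUE, so the loop
        // below cannot overflow and needs no per-digit overflow test.
        if (i == len || len - i > 9) {
            throw new NumberFormatException(s);
        }
        int result = 0;
        for (; i < len; i++) {
            int d = s.charAt(i) - '0';
            if (d < 0 || d > 9) {
                throw new NumberFormatException(s);
            }
            result = result * 10 + d;
        }
        return negative ? -result : result;
    }

    public static void main(String[] args) {
        System.out.println(parseBase10("12345"));
        System.out.println(parseBase10("-42"));
    }
}
```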
msg5585 (view) Author: Philip Jenvey (pjenvey) Date: 2010-03-21.22:31:29
I think part of the call path down to atoi/atol is unnecessary now too. I'm not sure we even need separate atoi/atol methods anymore: I don't think we ever need a Java primitive int or long from a string object; what we always want is a Python number object from a string. This is especially true with int and long converging.

In that case we could combine the two functions and remove the OverflowException.

Not to mention that PyString implementing __int/long/float__ is annoying. We still have an occasional case where you can pass a '1' in as a numeric argument and it's treated as valid.
msg5971 (view) Author: Jim Baker (zyasoft) Date: 2010-08-15.17:14:01
According to #jython, pjenvey fixed this by shortening the call path, so closing for now.
History
Date User Action Args
2010-08-15 17:14:01 zyasoft set status: open -> closed
messages: + msg5971
2010-03-21 22:31:29 pjenvey set nosy: + pjenvey
messages: + msg5585
2010-03-21 22:08:31 zyasoft set messages: + msg5584
2010-03-15 19:45:26 zyasoft set priority: normal
assignee: zyasoft
resolution: accepted
messages: + msg5567
nosy: + zyasoft
2010-03-15 01:22:50 pr3d4t0r create