Issue2013

classification
Title: %x hex formatting takes O(N^2) time.
Type: behaviour Severity: major
Components: Core Versions: Jython 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: jeff.allen Nosy List: amak, fwierzbicki, jeff.allen, santa4nt
Priority: high Keywords: patch

Created on 2013-02-11.03:54:17 by amak, last changed 2013-12-22.18:00:01 by jeff.allen.

Files
File name Uploaded Description Edit Remove
long___hex__.patch santa4nt, 2013-03-19.05:29:12 Patch long.__hex__ with a more optimal algorithm.
optimize_hexify.patch santa4nt, 2013-03-19.06:04:32 Patch *.formatIntOrLong and long.__hex__
optimize.patch santa4nt, 2013-03-20.03:47:45 Optimize hex(), bin(), and oct()
issue2075_and_2013.patch santa4nt, 2013-12-19.05:24:48 This patch is relevant for both this issue and issue #2075.
issue2075_and_2013.patch santa4nt, 2013-12-19.22:25:33
Messages
msg7660 (view) Author: Alan Kennedy (amak) Date: 2013-02-11.03:54:17
The following code, which formats an increasingly large number as hexadecimal

###############

import time

for power in range(0, 23):
    start = time.time()
    s = "%x" % (2**(2**power))
    print "%d (%d digits) took %1.3lf seconds" % (power, len(s), (time.time() - start))

###############

gives the following output

0 (1 digits) took 0.000 seconds
1 (1 digits) took 0.000 seconds
2 (2 digits) took 0.000 seconds
3 (3 digits) took 0.000 seconds
4 (5 digits) took 0.000 seconds
5 (9 digits) took 0.000 seconds
6 (17 digits) took 0.000 seconds
7 (33 digits) took 0.000 seconds
8 (65 digits) took 0.000 seconds
9 (129 digits) took 0.000 seconds
10 (257 digits) took 0.015 seconds
11 (513 digits) took 0.000 seconds
12 (1025 digits) took 0.000 seconds
13 (2049 digits) took 0.000 seconds
14 (4097 digits) took 0.016 seconds
15 (8193 digits) took 0.047 seconds
16 (16385 digits) took 0.125 seconds
17 (32769 digits) took 0.531 seconds
18 (65537 digits) took 2.031 seconds
19 (131073 digits) took 8.188 seconds
20 (262145 digits) took 33.328 seconds
21 (524289 digits) took 138.281 seconds
22 (1048577 digits) took 625.562 seconds

This appears to be O(N^2)
msg7661 (view) Author: Alan Kennedy (amak) Date: 2013-02-11.03:57:04
This appears to be because of java.math.BigInteger().toString(16)

The following code

// -=-=-=-=-=-=-=-=-=-=-=-=-=-

import java.math.BigInteger;
import java.util.Random;

public class BigIntTest {
     
    public static void main (String args[]) {
        String numString = "1";
        for (int power = 1 ; power < 23 ; power++) {
            BigInteger b = new BigInteger(numString);
            long start = System.currentTimeMillis();
            String s = b.toString(16);
            System.out.println(power+"("+numString.length()+" digits): took " + ((System.currentTimeMillis() - start)/1000) + " seconds");
            numString = numString + numString;
        }
    }
}

// -=-=-=-=-=-=-=-=-=-=-=-=-=-

Outputs the following (I didn't let the code finish, it was taking too long)

1(1 digits): took 0 seconds
2(2 digits): took 0 seconds
3(4 digits): took 0 seconds
4(8 digits): took 0 seconds
5(16 digits): took 0 seconds
6(32 digits): took 0 seconds
7(64 digits): took 0 seconds
8(128 digits): took 0 seconds
9(256 digits): took 0 seconds
10(512 digits): took 0 seconds
11(1024 digits): took 0 seconds
12(2048 digits): took 0 seconds
13(4096 digits): took 0 seconds
14(8192 digits): took 0 seconds
15(16384 digits): took 0 seconds
16(32768 digits): took 0 seconds
17(65536 digits): took 2 seconds
18(131072 digits): took 9 seconds
19(262144 digits): took 37 seconds
20(524288 digits): took 150 seconds
msg7940 (view) Author: Santoso Wijaya (santa4nt) Date: 2013-03-19.05:29:12
Attaching a possible approach for this. BigInteger.toString() is indeed known to be implemented with a sub-optimal algorithm.

Note that `'{:x}'.format(num)` is also affected. My patch only addresses `'%x' % num` and hex(num) for large numbers just to see what you guys think.

Also, hello!
msg7941 (view) Author: Santoso Wijaya (santa4nt) Date: 2013-03-19.06:04:32
Address '{:x}'.format() as well.

Still suffers from computationally expensive runtime for other radixes, though.
msg7944 (view) Author: Santoso Wijaya (santa4nt) Date: 2013-03-20.03:47:45
Optimizing all three hex(), bin(), and oct() while I'm at it.
msg7947 (view) Author: Frank Wierzbicki (fwierzbicki) Date: 2013-03-20.16:46:40
Hi Santoso Wijaya! I'm really excited that you are having a look at these issues. These patches are big enough that we would really appreciate it if you could sign a code contributor agreement with the PSF. The rationale for contributor agreements is at http://www.python.org/psf/contrib/ and the actual (now electronic) agreement form is here: http://www.python.org/psf/contrib/contrib-form/

Thanks!
msg7948 (view) Author: Santoso Wijaya (santa4nt) Date: 2013-03-20.22:23:15
Hi Frank,

I have signed that agreement previously, since I already submitted a few
patches to cpython in the past.

Is there a way to confirm that?

Thanks,
Santoso.
msg7949 (view) Author: Frank Wierzbicki (fwierzbicki) Date: 2013-03-20.22:37:45
Ah I see Santoso Wijaya/santa4nt on bugs.python.org must be you, and you have the contributor form checked. I'm in the process of updating this tracker so that it will have the same info (maybe even find a way to sync the contributor agreements, but that would come after). Not sure how long that will be. We don't need anything further - now either Alan or I need to evaluate your patches. Thanks again!
msg8195 (view) Author: Santoso Wijaya (santa4nt) Date: 2013-12-12.22:25:58
ping!
msg8196 (view) Author: Frank Wierzbicki (fwierzbicki) Date: 2013-12-14.19:05:13
Yep - I let this one fall through the cracks. I'll try hard to look at this this week. Thanks for your patience and sorry!
msg8197 (view) Author: Jeff Allen (jeff.allen) Date: 2013-12-14.21:29:48
Santoso:

On #2075 you've provided a patch that seems to include your optimised code for this problem too. Is that the one we should now be considering?

The techniques look good to me, although there are a few coding quirks I ought to change. (There's a standard on the Wiki.) But let's start with the right patch.

Jeff
msg8199 (view) Author: Santoso Wijaya (santa4nt) Date: 2013-12-15.00:08:40
Jeff,

That's right. I built my patch for #2075 on top of this one.
msg8201 (view) Author: Santoso Wijaya (santa4nt) Date: 2013-12-17.16:21:14
Hi Jeff,

Thanks for the feedback. I can't fault any of them. I'll have another iteration of the patch in soon.

It's been indeed fun working on these patches!

Regards,
Santoso.
msg8202 (view) Author: Frank Wierzbicki (fwierzbicki) Date: 2013-12-17.17:56:42
Stepping down on this one as Jeff is way more on top of it. Thanks Jeff!
msg8203 (view) Author: Santoso Wijaya (santa4nt) Date: 2013-12-19.05:24:47
Attaching a revised patch that addresses some test regression. Not yet optimized for string usage. Submitted for correctness review. More to follow...
msg8204 (view) Author: Santoso Wijaya (santa4nt) Date: 2013-12-19.05:25:31
Oops, pardon the project file changes noise in that patch.
msg8206 (view) Author: Santoso Wijaya (santa4nt) Date: 2013-12-19.22:25:33
Trimmed patch file.
msg8212 (view) Author: Jeff Allen (jeff.allen) Date: 2013-12-22.18:00:01
This and #2075 resolved by http://hg.python.org/jython/rev/3bfef9e72808 .
History
Date User Action Args
2013-12-22 18:00:01jeff.allensetstatus: open -> closed
resolution: fixed
messages: + msg8212
2013-12-19 22:25:34santa4ntsetfiles: + issue2075_and_2013.patch
messages: + msg8206
2013-12-19 05:25:31santa4ntsetmessages: + msg8204
2013-12-19 05:24:49santa4ntsetfiles: + issue2075_and_2013.patch
messages: + msg8203
2013-12-17 17:56:43fwierzbickisetassignee: fwierzbicki -> jeff.allen
messages: + msg8202
2013-12-17 17:52:57jeff.allensetmessages: - msg8200
2013-12-17 16:21:15santa4ntsetmessages: + msg8201
2013-12-17 09:27:11jeff.allensetmessages: + msg8200
2013-12-15 00:08:40santa4ntsetmessages: + msg8199
2013-12-14 21:29:48jeff.allensetnosy: + jeff.allen
messages: + msg8197
2013-12-14 19:05:36fwierzbickisetversions: + Jython 2.7
2013-12-14 19:05:14fwierzbickisetmessages: + msg8196
2013-12-12 22:25:58santa4ntsetmessages: + msg8195
2013-03-21 02:24:43fwierzbickisetassignee: fwierzbicki
2013-03-21 00:56:00santa4ntsetfiles: - unnamed
2013-03-20 22:37:45fwierzbickisetmessages: + msg7949
2013-03-20 22:23:15santa4ntsetfiles: + unnamed
messages: + msg7948
2013-03-20 16:46:40fwierzbickisetmessages: + msg7947
2013-03-20 03:47:46santa4ntsetfiles: + optimize.patch
messages: + msg7944
2013-03-19 06:04:32santa4ntsetfiles: + optimize_hexify.patch
messages: + msg7941
2013-03-19 05:29:13santa4ntsetfiles: + long___hex__.patch
keywords: + patch
messages: + msg7940
2013-03-18 20:09:45santa4ntsetnosy: + santa4nt
2013-02-26 18:39:52fwierzbickisetpriority: high
2013-02-26 18:39:43fwierzbickisetnosy: + fwierzbicki
2013-02-11 04:17:21amaksetpriority: high -> (no value)
2013-02-11 03:57:04amaksetmessages: + msg7661
2013-02-11 03:54:17amakcreate