Issue1595

classification
Title: [patch] CachedJarsPackageManager cannot write cache for packages in jar over 64k
Type: behaviour Severity: normal
Components: Core Versions: 2.5.1
Milestone:
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: bpedman, pjenvey
Priority: Keywords: patch

Created on 2010-04-12.15:30:13 by bpedman, last changed 2010-06-03.03:54:56 by pjenvey.

Files
File name Uploaded Description Edit Remove
cached-jar-over64k.patch bpedman, 2010-04-12.15:30:10 patch for CachedJarsPackageManager.java
cached-jar-over64k.patch bpedman, 2010-04-13.18:05:56 updated patch
CachedJarsOver64k.java bpedman, 2010-04-13.18:08:44 junit testcase
vim25.jar bpedman, 2010-04-13.18:10:32 offending jar file for test
cached-jar-over64k-v2.diff pjenvey, 2010-04-17.17:56:18
cached-jar-over64k-v3.patch bpedman, 2010-04-27.20:06:31 patch to support reading multiple chunks
CachedJarsOver64kTest.java bpedman, 2010-04-27.20:08:32 updated test to catch invalid cache read after writing multiple chunks
Messages
msg5681 (view) Author: Brandon (bpedman) Date: 2010-04-12.15:30:10
The DataOutputStream in java cannot write UTF strings that are over 64k. This is an issue when a jar is over 64k because the CachedJarsPackageManager will try to write the whole thing out to the cache which will cause an exception to be thrown and the jar package will not be processed. 

To fix this the jar file needs to be split up into 64k chunks and each chunk needs to be written to the output stream.

I am submitting a patch to fix this issue. I have tested it with a jar file I was using which was causing the issue and it works correctly for me. It also works for jars that are under 64k with no problem.
msg5690 (view) Author: Philip Jenvey (pjenvey) Date: 2010-04-13.04:13:31
Can you create an accompanying test for this code, or at least post an offending jar?

And we'd appreciate it if the syntax in the patch adhered to our coding standards, listed here: http://wiki.python.org/jython/CodingStandards
msg5693 (view) Author: Brandon (bpedman) Date: 2010-04-13.18:05:56
Ok, so I modified the code to match the standards (I think...let me know if anything else is wrong). Here is the modified patch

Also, I realized it wasn't that the actual jar file that is over 64k but the packages (or whatever is returned from getZipPackages) is over 64k. I will attach a testcase and the jar file to go with it
msg5694 (view) Author: Brandon (bpedman) Date: 2010-04-13.18:08:19
Attaching the junit testcase I created
msg5695 (view) Author: Brandon (bpedman) Date: 2010-04-13.18:10:32
Attaching the jar file that causes the error. Taken from the VMware vsphere sdk
msg5729 (view) Author: Philip Jenvey (pjenvey) Date: 2010-04-17.17:56:17
This change writes potentially multiple strings out for the listing of classes, but there's no change to read more than one string in from the cache file

So with this change and the vim25.jar cached what happens is only its first chunk of classes is read, then the second chunk is only read when trying to read the next package name. In this jar's case the next read is EOF so the bad package name is discarded

What needs to happen is the on disk format should basically be package name followed by the number of classes chunks for the package, then the chunks themselves

With this change we'd also need to either handle the old format of the packagecache, or somehow migrate users over to the new version. I don't know enough about the packagecache code yet to know the best way to gracefully deal with a change of the on disk format
msg5730 (view) Author: Philip Jenvey (pjenvey) Date: 2010-04-17.17:57:20
I've attached a new patch that integrates the test case into our "ant javatest", and some other minor changes
msg5731 (view) Author: Philip Jenvey (pjenvey) Date: 2010-04-17.18:03:20
Correct me if I'm wrong... but one possible hack^H^H^H^Hsolution might be to just emit another package name with the same name for each chunk. This data is mapped to a java Map when read, so the reader code would just have to check if the package name already exists in the Map, and if so append the classes to what's already there
msg5753 (view) Author: Brandon (bpedman) Date: 2010-04-27.20:06:31
You are right...it wasn't reading from the cache properly. I agree, it would be proper to add the number of chunks to the cache format but I implemented your second solution. 

While there is some performance degradation here as the number of chunks increases, this would probably(tm) never become an issue because this situation is rare (and chances of the number of chunks getting very large is even smaller)

Either way, I am attaching both a modified test to catch the invalid cache read and the patch for what I am going to use in the package manager
msg5795 (view) Author: Philip Jenvey (pjenvey) Date: 2010-06-03.03:54:56
applied in r7063 with a couple changes:

o slightly cleaned up splitString method from my previous patch
o reduced the vim25 test jar size
o the test fails if any warning is emitted instead of relying on the warning message (which could easily change)

thanks for the patch!
History
Date User Action Args
2010-06-03 03:54:56pjenveysetstatus: open -> closed
resolution: fixed
messages: + msg5795
2010-04-27 20:08:32bpedmansetfiles: + CachedJarsOver64kTest.java
2010-04-27 20:06:33bpedmansetfiles: + cached-jar-over64k-v3.patch
messages: + msg5753
2010-04-17 18:03:21pjenveysetmessages: + msg5731
2010-04-17 17:57:20pjenveysetmessages: + msg5730
2010-04-17 17:56:21pjenveysetfiles: + cached-jar-over64k-v2.diff
messages: + msg5729
2010-04-13 18:10:32bpedmansetfiles: + vim25.jar
messages: + msg5695
2010-04-13 18:08:44bpedmansetfiles: + CachedJarsOver64k.java
2010-04-13 18:08:19bpedmansetmessages: + msg5694
2010-04-13 18:05:57bpedmansetfiles: + cached-jar-over64k.patch
messages: + msg5693
title: [patch] CachedJarsPackageManager cannot write cache for jar over 64k -> [patch] CachedJarsPackageManager cannot write cache for packages in jar over 64k
2010-04-13 04:13:31pjenveysetnosy: + pjenvey
messages: + msg5690
2010-04-12 15:30:13bpedmancreate