Issue2176

classification
Title: Extracting from bz2 compressed tarfile may result in missing chunks of files
Type: Severity: normal
Components: Versions: Jython 2.7
Milestone:
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: zyasoft Nosy List: zyasoft
Priority: Keywords:

Created on 2014-07-01.03:57:52 by zyasoft, last changed 2014-07-09.23:59:46 by zyasoft.

Messages
msg8855 Author: Jim Baker (zyasoft) Date: 2014-07-01.03:57:51
It's trivial to reproduce this bug:

wget https://pypi.python.org/packages/source/P/PrettyTable/prettytable-0.7.2.tar.bz2#md5=760dc900590ac3c46736167e09fa463a

import tarfile

with tarfile.open(name='prettytable-0.7.2.tar.bz2', mode='r') as t:
    t.extractall()

will reliably produce output files with chunks of data missing, as can readily be verified with diff -r

Note that using the bz2 module to decompress the archive produces a valid tar, so there's likely some interaction between streaming decompression and the output of a given block to the untarred directory.
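The check described above (decompress with bz2 first, then read the plain tar) can be sketched roughly as follows. This is only an illustration: the helper name `read_members_via_full_decompress` is hypothetical, and it assumes the archive is small enough to hold in memory.

```python
import bz2
import io
import tarfile


def read_members_via_full_decompress(archive_bytes):
    """Return {member name: file contents} for a .tar.bz2 held in memory.

    Hypothetical helper: it decompresses the whole archive with bz2 first,
    then parses the resulting plain tar, so tarfile never has to read
    through a streaming BZ2File object.
    """
    raw_tar = bz2.decompress(archive_bytes)
    contents = {}
    # mode='r:' means "plain tar, no compression".
    with tarfile.open(fileobj=io.BytesIO(raw_tar), mode='r:') as t:
        for member in t.getmembers():
            if member.isfile():
                contents[member.name] = t.extractfile(member).read()
    return contents
```

Reading the bytes of prettytable-0.7.2.tar.bz2 and passing them to this helper gives per-file contents that can be compared against the files written by extractall().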
msg8862 Author: Jim Baker (zyasoft) Date: 2014-07-01.22:46:14
bz2.BZ2File.read would not read all of the bytes requested. In particular, the tarfile module by default would request 16384 bytes at a time, but the read would return no more than 8192 bytes. The tarfile module would then seek forward, assuming it had read 16384 bytes, thereby missing chunks of files.

Fixed as of http://hg.python.org/jython/rev/91b39451dc89
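
To illustrate the short-read behaviour described in this message, here is a minimal sketch. It is not the change made in the linked revision: `ShortReader` and `read_fully` are hypothetical names, and the 8192-byte cap is simply the figure quoted above.

```python
import io


class ShortReader(object):
    """Hypothetical wrapper that returns at most `cap` bytes per read()."""

    def __init__(self, fileobj, cap=8192):
        self._fileobj = fileobj
        self._cap = cap

    def read(self, size):
        # Mimics the reported bug: never hand back more than `cap` bytes,
        # even when the caller asked for more and more data is available.
        return self._fileobj.read(min(size, self._cap))


def read_fully(fileobj, size):
    """Keep calling read() until `size` bytes arrive or EOF is reached."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = fileobj.read(remaining)
        if not chunk:  # empty result signals EOF
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)
```

A caller that asks `ShortReader` for 16384 bytes and then assumes it received all of them ends up skipping data, whereas a loop such as `read_fully` only advances by the number of bytes actually returned.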
History
Date User Action Args
2014-07-09 23:59:46  zyasoft  set  status: pending -> closed
2014-07-01 22:46:15  zyasoft  set  status: open -> pending; resolution: fixed; messages: + msg8862
2014-07-01 03:59:15  zyasoft  set  title: bz2 compressed tarfile can see a corrupted read -> Extracting from bz2 compressed tarfile may result in missing chunks of files
2014-07-01 03:57:52  zyasoft  create