Issue1523

classification
Title: xml.sax parser receiving inconsistent read() buffer sizes
Type: behaviour
Severity: normal
Components:
Versions: 2.5.1
process
Status: closed
Resolution: invalid
Dependencies:
Superseder:
Assigned To:
Nosy List: kurtmckee, pjenvey
Priority:
Keywords:

Created on 2009-12-14.05:07:58 by kurtmckee, last changed 2009-12-15.03:55:05 by pjenvey.

Messages
msg5376 Author: Kurt McKee (kurtmckee) Date: 2009-12-14.05:07:57
The xml.sax parser calls read() with a different buffer size on almost
every call. This is in contrast to the CPython implementation, which
appears to call read() with a consistent buffer size. The following code
demonstrates the issue (I can't seem to create attachments, sorry):

import xml.sax
import StringIO

class Catcher(StringIO.StringIO):
    # Log the size the parser requests on each read() call.
    def read(self, size):
        print size
        return StringIO.StringIO.read(self, size)

s = """<?xml version="1.0"?>\n<root version="2.0"/>"""

handler = xml.sax.handler.ContentHandler()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)
parser.parse(Catcher(s))

Python 2.6 prints

"""
65516
65516
"""

Jython 2.5.1 prints

"""
1
1
1
1
28
8188
8192
"""

I read through the Jython xml.sax module source code but couldn't figure
out why this was happening.
msg5377 Author: Philip Jenvey (pjenvey) Date: 2009-12-14.06:38:16
Why is this a problem?
msg5384 Author: Kurt McKee (kurtmckee) Date: 2009-12-15.03:26:11
I ran across this because of some code I wrote that injects a DOCTYPE
into otherwise-invalid XML. I had written it assuming that every read()
call would request 2**16 bytes, as in the IncrementalParser code, or
2**16-20 bytes, as in the CPython expatreader.py file.

It may be that this isn't a bug in Jython at all; this may be a
perfectly valid parser-specific difference, but I wouldn't know. It's
your call. :)
msg5385 Author: Philip Jenvey (pjenvey) Date: 2009-12-15.03:55:05
It's not a bug. CPython only says that it requires a file (or, I guess, a
file-like) object; it doesn't make any guarantees about the size of the
reads it makes. They're just an implementation detail.

Jython backs these xml libraries with Java XML libs, which do their reads
differently. It doesn't use the expatreader.py module at all.
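
Since read sizes are an implementation detail, code that injects a DOCTYPE can avoid depending on them by buffering internally. The following is a minimal sketch, not the reporter's actual code: the names DoctypeInjector, ElementNames and parse_with_doctype are hypothetical, it assumes a text (not byte) stream, and it was written against CPython's xml.sax. Whether Jython's Java-backed parser accepts such a wrapper unchanged is an assumption that would need checking.

```python
import xml.sax
import xml.sax.handler

try:
    from StringIO import StringIO  # Python 2 / Jython 2.5
except ImportError:
    from io import StringIO  # Python 3

DOC = '<?xml version="1.0"?>\n<root version="2.0"/>'


class DoctypeInjector(object):
    """Hypothetical file-like wrapper (not part of xml.sax).

    Splices a DOCTYPE in after the XML declaration, or at the very start
    if there is no declaration, and serves read() from an internal
    buffer, so the result does not depend on the requested size.
    """

    def __init__(self, stream, doctype):
        self._stream = stream
        self._doctype = doctype
        self._buffer = ""
        self._injected = False

    def _inject(self):
        # Peek at the start of the stream to detect an XML declaration.
        head = self._stream.read(6)
        cut = 0
        if head[:5] == "<?xml" and head[5:6].isspace():
            # Keep reading until the declaration's closing "?>" arrives.
            while "?>" not in head:
                chunk = self._stream.read(64)
                if not chunk:
                    break
                head += chunk
            end = head.find("?>")
            if end >= 0:
                cut = end + 2
        self._buffer = head[:cut] + self._doctype + head[cut:]
        self._injected = True

    def read(self, size=-1):
        if not self._injected:
            self._inject()
        if size is None or size < 0:
            data = self._buffer + self._stream.read()
            self._buffer = ""
            return data
        # Top up the buffer until it can satisfy the request or EOF.
        while len(self._buffer) < size:
            chunk = self._stream.read(size - len(self._buffer))
            if not chunk:
                break
            self._buffer += chunk
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data

    def close(self):
        # Newer CPython versions close the source when parsing finishes.
        self._stream.close()


class ElementNames(xml.sax.handler.ContentHandler):
    """Hypothetical handler that records element names as they start."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_with_doctype(text, doctype):
    handler = ElementNames()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(DoctypeInjector(StringIO(text), doctype))
    return handler.names
```

Calling parse_with_doctype(DOC, "<!DOCTYPE root>") parses the sample document from the original report with the DOCTYPE spliced in. Because the wrapper never inspects the size argument beyond honouring it, a parser that reads 1 character at a time and one that reads 65516 at a time should see the same stream.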
History
Date                 User       Action  Args
2009-12-15 03:55:05  pjenvey    set     status: open -> closed
                                        resolution: invalid
                                        messages: + msg5385
2009-12-15 03:26:11  kurtmckee  set     messages: + msg5384
2009-12-14 06:38:17  pjenvey    set     nosy: + pjenvey
                                        messages: + msg5377
2009-12-14 05:08:14  kurtmckee  set     versions: + 2.5.1
2009-12-14 05:07:58  kurtmckee  create