Issue1523
Created on 2009-12-14.05:07:58 by kurtmckee, last changed 2009-12-15.03:55:05 by pjenvey.
msg5376 |
Author: Kurt McKee (kurtmckee) |
Date: 2009-12-14.05:07:57 |
|
Jython's xml.sax parser calls read() with a different buffer size on
each call. This is in contrast to the CPython implementation, which
appears to call read() with a consistent buffer size. The following code
demonstrates the issue (I can't seem to create attachments, sorry):
import xml.sax
import StringIO

class Catcher(StringIO.StringIO):
    # Print the size requested by each read() call.
    def read(self, size):
        print size
        return StringIO.StringIO.read(self, size)

s = """<?xml version="1.0"?>\n<root version="2.0"/>"""
handler = xml.sax.handler.ContentHandler()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)
parser.parse(Catcher(s))
Python 2.6 prints
"""
65516
65516
"""
Jython 2.5.1 prints
"""
1
1
1
1
28
8188
8192
"""
I read through the Jython xml.sax module source code but couldn't figure
out why this was happening.
|
msg5377 |
Author: Philip Jenvey (pjenvey) |
Date: 2009-12-14.06:38:16 |
|
Why is this a problem?
|
msg5384 |
Author: Kurt McKee (kurtmckee) |
Date: 2009-12-15.03:26:11 |
|
I ran across this because of some code I wrote that injects a DOCTYPE
into otherwise-invalid XML. I had written it assuming that read() would
always be called with a size of 2**16, as in the IncrementalParser code,
or 2**16-20, as in the CPython expatreader.py file.
It may be that this isn't a bug in Jython at all; this may be a
perfectly valid parser-specific difference, but I wouldn't know. It's
your call. :)
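A wrapper of this kind does not need to know the read size. The following is a minimal Python 3 sketch, not the code referred to above: `DoctypeInjector`, `drain`, and their parameters are hypothetical names, and it assumes a byte stream whose optional XML declaration ends with `?>`. It buffers the head of the stream once, splices the DOCTYPE in after the declaration (or at the very start if there is none), and then serves read() requests of any size from that buffer.

```python
import io


class DoctypeInjector(object):
    """Hypothetical file-like wrapper: inserts `doctype` after the XML
    declaration, independent of the sizes later passed to read()."""

    def __init__(self, stream, doctype):
        self._stream = stream      # underlying binary file-like object
        self._doctype = doctype    # bytes, e.g. b"<!DOCTYPE root>"
        self._pending = None       # None until the head has been processed

    def _read_head(self):
        # Read until we can tell whether an XML declaration is present and,
        # if so, until its closing "?>" has been buffered (or EOF is hit).
        head = b""
        while True:
            chunk = self._stream.read(1024)
            head += chunk
            if not chunk:
                break
            if len(head) >= 5 and (not head.startswith(b"<?xml") or b"?>" in head):
                break
        return head

    def _prepare(self):
        head = self._read_head()
        cut = 0
        if head.startswith(b"<?xml"):
            end = head.find(b"?>")
            if end != -1:
                cut = end + 2      # position just past the declaration
        self._pending = head[:cut] + self._doctype + head[cut:]

    def read(self, size=-1):
        if self._pending is None:
            self._prepare()
        if size is None or size < 0:
            data = self._pending + self._stream.read()
            self._pending = b""
            return data
        # Top up the buffer until the request can be satisfied or EOF.
        while len(self._pending) < size:
            chunk = self._stream.read(size - len(self._pending))
            if not chunk:
                break
            self._pending += chunk
        data, self._pending = self._pending[:size], self._pending[size:]
        return data

    def close(self):
        self._stream.close()


def drain(source, size):
    """Read `source` to EOF in chunks of at most `size` bytes."""
    parts = []
    while True:
        chunk = source.read(size)
        if not chunk:
            return b"".join(parts)
        parts.append(chunk)


DOC = b'<?xml version="1.0"?>\n<root version="2.0"/>'
for size in (1, 28, 8192, 65516):
    # The reassembled bytes are the same for every read size.
    print(size, drain(DoctypeInjector(io.BytesIO(DOC), b"<!DOCTYPE root>"), size))
```

Because the splice happens in the wrapper's own buffer, the same object should behave identically whether the consumer asks for 1 byte or 65516 bytes at a time.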
|
msg5385 |
Author: Philip Jenvey (pjenvey) |
Date: 2009-12-15.03:55:05 |
|
It's not a bug. CPython only says that it requires a file (or, I guess,
a file-like) object; it doesn't make any guarantees about the size of
the reads it makes. They're just an implementation detail.
Jython backs these xml libraries with Java XML libs, which do their
reads differently. It doesn't use the expatreader.py module at all.
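As a rough illustration that chunking is invisible to the handler, here is a small Python 3 sketch run against CPython's own parser rather than Jython's Java-backed one. `CappedReads`, `ElementLog`, and `parse_with_cap` are made-up names for this example. The stream hands back at most `cap` bytes per read(), and the handler records the elements it sees, so the results for different caps can be compared.

```python
import io
import xml.sax


class CappedReads(io.BytesIO):
    """Hypothetical stream that returns at most `cap` bytes per read()."""

    def __init__(self, data, cap):
        super().__init__(data)
        self.cap = cap

    def read(self, size=-1):
        # Ignore larger (or unbounded) requests and hand back short reads.
        if size is None or size < 0 or size > self.cap:
            size = self.cap
        return super().read(size)


class ElementLog(xml.sax.handler.ContentHandler):
    """Records (name, attributes) for every start tag."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append((name, dict(attrs.items())))


def parse_with_cap(data, cap):
    handler = ElementLog()
    xml.sax.parse(CappedReads(data, cap), handler)
    return handler.events


DOC = b'<?xml version="1.0"?>\n<root version="2.0"/>'
# Same document, very different chunk sizes.
print(parse_with_cap(DOC, 1) == parse_with_cap(DOC, 65516))
```

Code that sits between the file and the parser is safest when it makes the same assumption: any positive size may be requested, and only an empty result means end of input.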
|
|
Date | User | Action | Args
2009-12-15 03:55:05 | pjenvey | set | status: open -> closed; resolution: invalid; messages: + msg5385
2009-12-15 03:26:11 | kurtmckee | set | messages: + msg5384
2009-12-14 06:38:17 | pjenvey | set | nosy: + pjenvey; messages: + msg5377
2009-12-14 05:08:14 | kurtmckee | set | versions: + 2.5.1
2009-12-14 05:07:58 | kurtmckee | create |