Issue1523

classification

Title:	xml.sax parser receiving inconsistent read() buffer sizes
Type:	behaviour	Severity:	normal
Components:		Versions:	2.5.1
		Milestone:

process

Status:	closed	Resolution:	invalid
Dependencies:		Superseder:
Assigned To:		Nosy List:	kurtmckee, pjenvey
Priority:		Keywords:

Created on 2009-12-14.05:07:58 by kurtmckee, last changed 2009-12-15.03:55:05 by pjenvey.

Messages
msg5376	Author: Kurt McKee (kurtmckee)	Date: 2009-12-14.05:07:57
The xml.sax parser is receiving different read() buffer sizes with each call. This is in contrast to the CPython implementation, which appears to call read() with consistent buffer sizes. The following code demonstrates the issue (I can't seem to create attachments, sorry):

    import xml.sax
    import StringIO

    class Catcher(StringIO.StringIO):
        # Print every size the parser passes to read().
        def read(self, size=-1):
            print size
            return StringIO.StringIO.read(self, size)

    s = """<?xml version="1.0"?>\n<root version="2.0"/>"""
    handler = xml.sax.handler.ContentHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(Catcher(s))

Python 2.6 prints:

    65516
    65516

Jython 2.5.1 prints:

    1
    1
    1
    1
    28
    8188
    8192

I read through the Jython xml.sax module source code but couldn't figure out why this was happening.
msg5377	Author: Philip Jenvey (pjenvey)	Date: 2009-12-14.06:38:16
Why is this a problem?
msg5384	Author: Kurt McKee (kurtmckee)	Date: 2009-12-15.03:26:11
I ran across this because of some code I wrote that injects a DOCTYPE into otherwise-invalid XML. I had written it assuming that read() would always be called with a size of 2**16 (65536), as in the IncrementalParser code, or 2**16-20 (65516), as in CPython's expatreader.py. It may be that this isn't a bug in Jython at all; this may be a perfectly valid parser-specific difference, but I wouldn't know. It's your call. :)
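One way to drop the fixed-size assumption described above is to make the injecting wrapper indifferent to the size passed to read(). This is a minimal sketch, assuming str input with no BOM or leading whitespace: the hypothetical DoctypeInjector buffers just enough of the stream to find the end of the XML declaration, splices the DOCTYPE in after it, and then answers read(size) for any size. The class names, the chunk parameter and the entity-declaring DOCTYPE in the usage lines are illustrative assumptions, not the reporter's code. The classes avoid Python 3-only syntax, but the usage lines use io.StringIO as in Python 3.

```python
import io
import xml.sax
import xml.sax.handler


class DoctypeInjector(object):
    """Hypothetical file-like wrapper: inserts a DOCTYPE after the XML
    declaration (or at the very start if there is none) and answers
    read(size) correctly for any size the parser chooses."""

    def __init__(self, stream, doctype, chunk=1024):
        self._stream = stream
        # Text up to and including the injected DOCTYPE is buffered here.
        self._pending = self._build_head(doctype + '\n', chunk)

    def _build_head(self, doctype, chunk):
        head = ''
        while True:
            if head.startswith('<?xml'):
                end = head.find('?>')
                if end != -1:
                    cut = end + 2
                    return head[:cut] + '\n' + doctype + head[cut:]
            elif not '<?xml'.startswith(head):
                # The text cannot begin with a declaration: prepend.
                return doctype + head
            data = self._stream.read(chunk)
            if not data:
                # EOF while still inside a possible declaration: prepend as a fallback.
                return doctype + head
            head += data

    def read(self, size=-1):
        if size is None or size < 0:
            data = self._pending + self._stream.read()
            self._pending = ''
            return data
        # Serve the buffered head first, then top up from the real stream.
        data = self._pending[:size]
        self._pending = self._pending[size:]
        if len(data) < size:
            data += self._stream.read(size - len(data))
        return data

    def close(self):
        # Recent CPython 3 releases close the input stream after parsing.
        self._stream.close()


class TextCollector(xml.sax.handler.ContentHandler):
    """Collects character data so the parse result can be inspected."""

    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.parts = []

    def characters(self, content):
        self.parts.append(content)


# Usage: declare an entity that the original document uses but never declares.
src = '<?xml version="1.0"?>\n<root>a&nbsp;b</root>'
dt = '<!DOCTYPE root [<!ENTITY nbsp "&#160;">]>'
handler = TextCollector()
xml.sax.parse(DoctypeInjector(io.StringIO(src), dt), handler)
print(repr(''.join(handler.parts)))
```

Because the spliced prefix is buffered up front, a reader asking for 1 character at a time (the start of Jython 2.5.1's pattern) and one asking for 65516 (CPython 2.6's) see exactly the same text.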
msg5385	Author: Philip Jenvey (pjenvey)	Date: 2009-12-15.03:55:05
It's not a bug. CPython only says that it requires a file (or, I guess, a file-like) object; it doesn't make any guarantees about the size of the reads it makes. Those are just an implementation detail. Jython backs these XML libraries with Java XML libraries, which do their reads differently. It doesn't use the expatreader.py module at all.
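Since read sizes are an implementation detail, a file-like object handed to xml.sax can be checked by replaying the read patterns reported in msg5376 against it. The sketch below assumes Python 3's io.StringIO (under Python 2 or Jython 2.5, StringIO.StringIO plays that role). drain() and NaiveInjector are hypothetical helpers; NaiveInjector only illustrates the kind of size assumption msg5384 describes and is not the reporter's actual code.

```python
import io

# Read-size patterns reported in msg5376 for the same document.
JYTHON_251_SIZES = [1, 1, 1, 1, 28, 8188, 8192]
CPYTHON_26_SIZES = [65516]


def drain(f, sizes):
    """Read f to EOF, requesting sizes[0], sizes[1], ... and then repeating
    the last size (a hypothetical stand-in for a parser's read loop)."""
    out = []
    i = 0
    while True:
        data = f.read(sizes[min(i, len(sizes) - 1)])
        if not data:
            return ''.join(out)
        out.append(data)
        i += 1


class NaiveInjector(object):
    """Hypothetical wrapper that looks for the XML declaration only in the
    first chunk, i.e. it silently relies on a large first read()."""

    def __init__(self, stream, doctype):
        self._stream = stream
        self._doctype = doctype
        self._first = True

    def read(self, size=-1):
        data = self._stream.read(size)
        if self._first:
            self._first = False
            end = data.find('?>')
            if end != -1:
                data = data[:end + 2] + '\n' + self._doctype + data[end + 2:]
        return data


src = '<?xml version="1.0"?>\n<root version="2.0"/>'
for sizes in (CPYTHON_26_SIZES, JYTHON_251_SIZES):
    # A plain in-memory file returns the same text under either pattern...
    assert drain(io.StringIO(src), sizes) == src
    # ...but the naive wrapper's output depends on the pattern.
    print('<!DOCTYPE root>' in drain(NaiveInjector(io.StringIO(src), '<!DOCTYPE root>'), sizes))
```

Run as-is, the loop prints True for the CPython 2.6 pattern and False for the Jython 2.5.1 pattern: the first 1-character read never contains "?>", so the naive wrapper silently skips the injection.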

History
Date	User	Action	Args
2009-12-15 03:55:05	pjenvey	set	status: open -> closed resolution: invalid messages: + msg5385
2009-12-15 03:26:11	kurtmckee	set	messages: + msg5384
2009-12-14 06:38:17	pjenvey	set	nosy: + pjenvey messages: + msg5377
2009-12-14 05:08:14	kurtmckee	set	versions: + 2.5.1
2009-12-14 05:07:58	kurtmckee	create