Created on 2011-07-21.15:09:12 by pjac, last changed 2012-03-17.22:39:43 by amak.
|msg6574 (view)||Author: Peter (pjac)||Date: 2011-07-21.15:09:12|
Test case:

    import sys
    print sys.version
    from StringIO import StringIO
    from xml.dom import pulldom
    from xml.sax import SAXParseException
    handle = StringIO()  # simulate empty file
    try:
        for event, node in pulldom.parse(handle):
            print event
    except SAXParseException, e:
        print repr(e)
        print "Line number", e.getLineNumber()
        print "Column number", e.getColumnNumber()
    print "Done"

Reference output from (C) Python on Linux,

    $ python2.5 sax_empty_xml.py
    2.5.5 (r255:77872, Jan 14 2011, 17:09:55)
    [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)]
    SAXParseException('no element found',)
    Line number 1
    Column number 0
    Done

    $ python2.6 sax_empty_xml.py
    2.6.6 (r266:84292, Aug 31 2010, 16:21:14)
    [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)]
    SAXParseException('no element found',)
    Line number 1
    Column number 0
    Done

    $ python2.7 sax_empty_xml.py
    2.7 (r27:82500, Jul 13 2010, 14:02:41)
    [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)]
    SAXParseException('no element found',)
    Line number 1
    Column number 0
    Done

Inconsistent output from Jython,

    $ jython sax_empty_xml.py
    2.5.2 (Release_2_5_2:7206, Mar 2 2011, 23:12:06)
    [OpenJDK 64-Bit Server VM (Sun Microsystems Inc.)]
    SAXParseException(u'Premature end of file.',)
    Line number 1
    Column number 1
    Done

Notice (a) different exception description, (b) different column number.

This bug was identified from a failing Biopython unit test, see: https://redmine.open-bio.org/issues/3267
|msg6591 (view)||Author: Alan Kennedy (amak)||Date: 2011-07-30.00:32:50|
This is fundamentally an interpretation issue. Does one interpret an empty document as a failure to provide parsable tokens from the input stream (the Java interpretation, i.e. the tokenizer raises the error), or as a stream of tokens that is empty (the Python interpretation, i.e. the parser raises the error)?

Is there an XML declaration present in the file, i.e. does the stream contain something like <?xml version="x.y" encoding="blah_encoding"?>

Or is the input stream completely empty, i.e. does it contain no characters other than whitespace?

If the latter, i.e. the document is pure whitespace, then I recommend a pragmatic solution:

    document = document.strip()
    if document:
        xml_parse(document)
    else:
        raise MyException("A whitespace-only document is meaningless, no matter what its file extension is")

In the meantime, I will investigate whether an empty file or a file full of whitespace can meaningfully be described as an XML file.
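For anyone wanting to apply the whitespace check with pulldom directly, here is a minimal sketch. It is written in Python 3 syntax (the test case in this report is Python 2), and the names `EmptyDocumentError` and `parse_events` are hypothetical, chosen only for illustration. It rejects blank input before any parser sees it, so the result does not depend on how a given parser reports an empty stream.

```python
from xml.dom import pulldom


class EmptyDocumentError(ValueError):
    """Hypothetical exception for input that has no content at all."""


def parse_events(text):
    """Return the pulldom event names for text, rejecting blank input.

    The blank check runs before the parser is involved, so the outcome
    for empty or whitespace-only input is the same on every platform.
    """
    if not text.strip():
        raise EmptyDocumentError("document is empty or whitespace only")
    # pulldom.parseString wraps the string in a stream and pulls events from it.
    return [event for event, node in pulldom.parseString(text)]
```

Callers can then catch `EmptyDocumentError` separately from `SAXParseException`, which keeps the "no input at all" case distinct from genuinely malformed XML.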
|msg6811 (view)||Author: Alan Kennedy (amak)||Date: 2012-03-17.22:39:43|
Having thought about this, I think that this is not a valid bug. The correct exception is raised. The textual description contained in the exception cannot be expected to be identical across platforms. As for the column number issue, it looks like expat (which CPython uses) is counting from zero, whereas the Java parser is counting from column 1. Both are valid interpretations, but I consider the Java one more appropriate, particularly on Jython, since there is no column zero when you open the file in a text editor, for example.
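Code that needs comparable positions on both platforms could normalize the column itself rather than rely on the parser's convention. The following is only a sketch in Python 3 syntax: `error_position` and its `one_based_columns` flag are hypothetical names, and the offset assumes the underlying parser counts columns from zero, as expat appears to do on CPython. On Jython, where the Java parser already counts from 1, the flag would need to be turned off.

```python
import xml.sax
from xml.sax import SAXParseException


def error_position(data, one_based_columns=True):
    """Return (line, column) of the first parse error in data, or None.

    Assumption: the parser reports zero-based columns, so 1 is added
    when one_based_columns is true to match what a text editor shows.
    """
    try:
        xml.sax.parseString(data, xml.sax.ContentHandler())
    except SAXParseException as exc:
        column = exc.getColumnNumber()
        if one_based_columns and column is not None:
            column += 1
        return exc.getLineNumber(), column
    return None  # the document parsed without error
```

Tests that compare error positions across implementations are probably better off asserting only on the exception type and line number, since the message text and column convention are parser specific.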
|2012-03-17 22:39:43||amak||set||status: open -> closed|
resolution: wont fix
messages: + msg6811
|2011-07-30 00:32:51||amak||set||assignee: amak|
messages: + msg6591
nosy: + amak