Issue1268

classification
Title: SAX parser wants to load external DTDs, causing an exception
Type: behaviour Severity: normal
Components: Library Versions: 2.5b1
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: amak Nosy List: amak, fdb, fwierzbicki, lukasz.heldt, ssteiner
Priority: normal Keywords:

Created on 2009-03-06.11:12:18 by fdb, last changed 2010-04-16.18:18:57 by amak.

Files
File name Uploaded Description
do-not-load-external-dtds.patch.txt fdb, 2009-03-06.11:12:17 Patch to turn off loading of external DTDs
test_parseString.py fdb, 2009-03-06.11:14:59
do-not-load-external-dtds.patch.txt fdb, 2009-03-06.11:16:38
Messages
msg4179 (view) Author: Frederik De Bleser (fdb) Date: 2009-03-06.11:12:16
The following patch prevents SAX from loading external DTDs. These
caused Jython to crash when parsing XML documents with Doctypes (such as
XHTML or SVG).
msg4180 (view) Author: Frederik De Bleser (fdb) Date: 2009-03-06.11:14:59
Included is a simple testcase that fails in the latest SVN versions.
Applying the patch fixes the error. The following error is displayed:

Traceback (most recent call last):
  File "test_parseString.py", line 6, in <module>
    xml = parseString(data)
  File "/Users/fdb/Java/jython-svn/dist/Lib/xml/dom/minidom.py", line 1933, in parseString
    return _do_pulldom_parse(pulldom.parseString, (string,),
  File "/Users/fdb/Java/jython-svn/dist/Lib/xml/dom/minidom.py", line 1908, in _do_pulldom_parse
    toktype, rootNode = events.getEvent()
  File "/Users/fdb/Java/jython-svn/dist/Lib/xml/dom/pulldom.py", line 275, in _slurp
    self.parser.parse(self.stream)
  File "/Users/fdb/Java/jython-svn/dist/Lib/xml/sax/drivers2/drv_javasax.py", line 143, in parse
    self._parser.parse(JyInputSourceWrapper(source))
  File "/Users/fdb/Java/jython-svn/dist/Lib/xml/sax/drivers2/drv_javasax.py", line 92, in resolveEntity
    return JyInputSourceWrapper(self._resolver.resolveEntity(pubId, sysId))
  File "/Users/fdb/Java/jython-svn/dist/Lib/xml/sax/drivers2/drv_javasax.py", line 77, in __init__
    if source.getByteStream():
AttributeError: 'unicode' object has no attribute 'getByteStream'
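For readers without the attachment, a reproduction along these lines should exercise the same code path. This is a minimal sketch, not the contents of test_parseString.py: the XHTML document and the name DATA are assumptions. On CPython the parse completes without fetching the DTD; on the affected Jython builds it would be expected to fail with the traceback above.

```python
from xml.dom.minidom import parseString

# Hypothetical input: any document whose DOCTYPE names an external DTD
# should do. This is NOT the data from the attached test file.
DATA = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>test</title></head>
  <body><p>hello</p></body>
</html>
"""

# minidom records the DOCTYPE identifiers but does not retrieve the DTD.
doc = parseString(DATA)
root_name = doc.documentElement.tagName
system_id = doc.doctype.systemId
```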
msg4181 (view) Author: Frederik De Bleser (fdb) Date: 2009-03-06.11:16:38
The original patch contained an oversight where the actual changed line
was commented out for testing purposes. The new patch fixes the problem.
msg5323 (view) Author: Lukasz Heldt (lukasz.heldt) Date: 2009-11-24.13:06:13
I came across the same bug when trying to actually use the external
DTD functionality. After adding the following line to my XML file:

<!DOCTYPE common SYSTEM "common.dtd">

I got the error mentioned by Frederik. Luckily there is an easy fix for
this issue. The following line in JyInputSourceWrapper needs to be changed from:
        if isinstance(source, str):
            javasax.InputSource.__init__(self, source)
to:
        if isinstance(source, str) or isinstance(source, unicode):
            javasax.InputSource.__init__(self, source)
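The effect of that check can be sketched outside Jython. The classes below are hypothetical stand-ins, not the code in drv_javasax.py: FakeInputSource replaces the Java InputSource class, and the else branch is an assumption based only on the getByteStream() call visible in the traceback. The string_types shim exists only so the sketch also runs on Python 3, which has no separate unicode type.

```python
# Python 2 (and Jython 2.5) has distinct `str` and `unicode` types;
# Python 3 only has `str`. This shim lets the sketch run on both.
try:
    string_types = (str, unicode)  # Python 2 / Jython
except NameError:
    string_types = (str,)          # Python 3


class FakeInputSource(object):
    """Hypothetical stand-in for the Java InputSource class."""

    def __init__(self, system_id=None):
        self.system_id = system_id
        self.byte_stream = None


class InputSourceWrapper(FakeInputSource):
    """Hypothetical analogue of JyInputSourceWrapper."""

    def __init__(self, source):
        if isinstance(source, string_types):
            # Byte strings and unicode strings are both system identifiers.
            FakeInputSource.__init__(self, source)
        else:
            # Anything else must behave like an InputSource. A unicode
            # string that reached this branch would raise AttributeError,
            # which is the failure reported in this issue.
            FakeInputSource.__init__(self)
            self.byte_stream = source.getByteStream()


wrapped = InputSourceWrapper(u"common.dtd")
```

With a check that only tests for str, the unicode system identifier falls through to the else branch on Python 2 and fails there.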
msg5347 (view) Author: simon steiner (ssteiner) Date: 2009-12-04.16:03:56
After applying Lukasz Heldt's fix I get the following (I don't get this on CPython):

java.io.FileNotFoundException: java.io.FileNotFoundException:
C:\sysdef_1_4_0.dtd (The system cannot find the file specified)
msg5348 (view) Author: simon steiner (ssteiner) Date: 2009-12-04.16:27:21
I applied the patch do-not-load-external-dtds.patch.txt and it works.
msg5352 (view) Author: Frederik De Bleser (fdb) Date: 2009-12-05.12:34:52
Turning off DTD validation seems the cleanest solution and conforms
to the Python parsing API, which does not validate external DTDs.
msg5726 (view) Author: Alan Kennedy (amak) Date: 2010-04-16.18:16:39
Lukasz Heldt is right: the correct fix for this is to have JyInputSourceWrapper check for unicode as well as ordinary str types.

Fix checked in at revision 7028.
msg5727 (view) Author: Alan Kennedy (amak) Date: 2010-04-16.18:18:57
However, the fix mentioned above gives rise to a consequent error: any attempt to retrieve the actual DTD file for XHTML Strict from the w3.org web site gives an HTTP 503 - Service Unavailable error.

The only way to solve that problem is to set the "load-external-dtds" feature to False.

I'll see if I can find a simple way to make this easily configurable for the end user.
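For comparison, here is how such a switch can be expressed with CPython's xml.sax API. This is a sketch under stated assumptions, not the Jython change: parse_bytes and its load_external_dtds flag are hypothetical names, and the flag is mapped onto the standard SAX feature_external_ges feature, which CPython's expat driver consults before resolving external entities, including the external DTD subset.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class ElementNames(ContentHandler):
    """Collects element names so the caller can see that parsing succeeded."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_bytes(data, load_external_dtds=False):
    """Parse `data` (bytes) and return the element names encountered.

    `load_external_dtds` is a hypothetical user-facing switch. It is passed
    straight to the SAX feature for external general entities.
    """
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, load_external_dtds)
    handler = ElementNames()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.names


XHTML = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head><body/></html>
"""

# With the switch off, no request is made to w3.org for the DTD.
names = parse_bytes(XHTML)
```

Defaulting the switch to False matches the behaviour requested in this issue, while callers who need entity definitions from the DTD can still opt in.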
History
Date User Action Args
2010-04-16 18:18:57 amak set messages: + msg5727
2010-04-16 18:16:40 amak set assignee: fwierzbicki -> amak; messages: + msg5726; nosy: + amak
2009-12-05 12:34:52 fdb set messages: + msg5352
2009-12-04 16:27:21 ssteiner set messages: + msg5348
2009-12-04 16:03:57 ssteiner set messages: + msg5347
2009-11-24 13:06:13 lukasz.heldt set nosy: + lukasz.heldt; messages: + msg5323
2009-10-22 07:34:07 ssteiner set nosy: + ssteiner
2009-03-14 14:59:20 fwierzbicki set priority: normal; assignee: fwierzbicki
2009-03-06 18:31:37 fwierzbicki set nosy: + fwierzbicki
2009-03-06 11:16:38 fdb set files: + do-not-load-external-dtds.patch.txt; messages: + msg4181
2009-03-06 11:15:00 fdb set files: + test_parseString.py; messages: + msg4180
2009-03-06 11:12:18 fdb create