Issue1268

classification
Title: SAX parser wants to load external DTDs, causing an exception
Type: behaviour Severity: normal
Components: Library Versions: 2.5.2b1
Milestone:
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: amak Nosy List: amak, dregan, fdb, fwierzbicki, lukasz.heldt, ssteiner
Priority: normal Keywords: patch

Created on 2009-03-06.11:12:18 by fdb, last changed 2012-04-01.17:05:33 by amak.

Files
File name Uploaded Description
do-not-load-external-dtds.patch.txt fdb, 2009-03-06.11:12:17 Patch to turn off loading of external DTDs
test_parseString.py fdb, 2009-03-06.11:14:59
do-not-load-external-dtds.patch.txt fdb, 2009-03-06.11:16:38
do_not_load_external_dtds.patch dregan, 2010-09-06.06:33:55 Patch for bug 1268, do not load external DTDs.
do_not_load_external_dtds.patch dregan, 2010-09-06.06:50:09 Patch for bug 1268, do not load external DTDs.
do_not_load_external_dtds.patch dregan, 2010-09-06.06:55:10 Patch for bug 1268, do not load external DTDs.
Messages
msg4179 Author: Frederik De Bleser (fdb) Date: 2009-03-06.11:12:16
The following patch prevents SAX from loading external DTDs. These
caused Jython to crash when parsing XML documents with Doctypes (such as
XHTML or SVG).
msg4180 Author: Frederik De Bleser (fdb) Date: 2009-03-06.11:14:59
Included is a simple testcase that fails in the latest SVN versions.
Applying the patch fixes the error. The following error is displayed:

Traceback (most recent call last):
  File "test_parseString.py", line 6, in <module>
    xml = parseString(data)
  File "/Users/fdb/Java/jython-svn/dist/Lib/xml/dom/minidom.py", line 1933, in parseString
    return _do_pulldom_parse(pulldom.parseString, (string,),
  File "/Users/fdb/Java/jython-svn/dist/Lib/xml/dom/minidom.py", line 1908, in _do_pulldom_parse
    toktype, rootNode = events.getEvent()
  File "/Users/fdb/Java/jython-svn/dist/Lib/xml/dom/pulldom.py", line 275, in _slurp
    self.parser.parse(self.stream)
  File "/Users/fdb/Java/jython-svn/dist/Lib/xml/sax/drivers2/drv_javasax.py", line 143, in parse
    self._parser.parse(JyInputSourceWrapper(source))
  File "/Users/fdb/Java/jython-svn/dist/Lib/xml/sax/drivers2/drv_javasax.py", line 92, in resolveEntity
    return JyInputSourceWrapper(self._resolver.resolveEntity(pubId, sysId))
  File "/Users/fdb/Java/jython-svn/dist/Lib/xml/sax/drivers2/drv_javasax.py", line 77, in __init__
    if source.getByteStream():
AttributeError: 'unicode' object has no attribute 'getByteStream'
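
The attached test_parseString.py is not reproduced in this thread. A minimal sketch of a test of this kind, assuming it passes a document with an external DOCTYPE to xml.dom.minidom.parseString (the sample document and the function name below are hypothetical), might look like this:

```python
from xml.dom.minidom import parseString

# Hypothetical sample input: any document whose DOCTYPE points at an
# external DTD (XHTML, SVG, ...) should exercise the same code path.
DATA = """<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>test</title></head>
  <body><p>hello</p></body>
</html>"""


def parse_sample():
    # On the affected Jython builds, this call is where the
    # AttributeError from the traceback would be expected to surface.
    return parseString(DATA)
```

On CPython the same call succeeds without network access, because the expat-based minidom builder does not fetch the external DTD.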
msg4181 Author: Frederik De Bleser (fdb) Date: 2009-03-06.11:16:38
The original patch contained an oversight where the actual changed line
was commented out for testing purposes. The new patch fixes the problem.
msg5323 Author: Lukasz Heldt (lukasz.heldt) Date: 2009-11-24.13:06:13
I came across the same bug when trying to actually use the external
DTD functionality. After adding the following line to my XML file:

<!DOCTYPE common SYSTEM "common.dtd">

I got the error mentioned by Frederik. Luckily there is an easy fix for
this issue. The following lines in JyInputSourceWrapper need to be changed from:
        if isinstance(source, str):
            javasax.InputSource.__init__(self, source)
to:
        if isinstance(source, str) or isinstance(source, unicode):
            javasax.InputSource.__init__(self, source)
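
The type check proposed above can be tried in isolation. The sketch below is not the Jython source: is_system_id and string_types are hypothetical names, and the try/except exists only so the snippet also runs on Python 3, where unicode is not defined.

```python
# Hypothetical stand-alone version of the check; not taken from drv_javasax.py.
try:
    string_types = (str, unicode)  # Python 2 / Jython 2.5: byte and unicode strings
except NameError:
    string_types = (str,)  # Python 3: str is already a unicode string


def is_system_id(source):
    """Return True when source is a plain string, i.e. a system id or URL."""
    # isinstance accepts a tuple of types, so one call covers both cases.
    return isinstance(source, string_types)
```

On Python 2 and Jython 2.5, isinstance(source, basestring) is an equivalent single check, since str and unicode both derive from basestring.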
msg5347 Author: simon steiner (ssteiner) Date: 2009-12-04.16:03:56
After applying Lukasz Heldt's fix I get the following (I don't get this on CPython):

java.io.FileNotFoundException: java.io.FileNotFoundException: C:\sysdef_1_4_0.dtd (The system cannot find the file specified)
msg5348 Author: simon steiner (ssteiner) Date: 2009-12-04.16:27:21
I applied the patch do-not-load-external-dtds.patch.txt and it is OK.
msg5352 Author: Frederik De Bleser (fdb) Date: 2009-12-05.12:34:52
Turning off DTD validation seems the cleanest solution and conforms
to the Python parsing API, which does not validate against external DTDs.
msg5726 Author: Alan Kennedy (amak) Date: 2010-04-16.18:16:39
Lukasz Heldt is right, the correct fix for this is to have JyInputSource check for unicode as well as ordinary str types.

Fix checked in at revision 7028.
msg5727 Author: Alan Kennedy (amak) Date: 2010-04-16.18:18:57
However, the fix mentioned above gives rise to a consequent error: any attempt to retrieve the actual DTD file from the w3.org web site for XHTML Strict gives an HTTP 503 - Service Unavailable error.

The only way to solve that problem is to set the "load-external-dtds" feature to False.

I'll see if I can find a simple way to make this easily configurable for the end user.
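
As a rough illustration of what such a user-level switch could look like with the standard xml.sax API, the sketch below builds a parser with the two standard external-entity features turned off and hands it to minidom.parseString. It is an assumption that the Java parser behind Jython's SAX driver honours these features in the same way; the sample document and the function name are hypothetical.

```python
import xml.sax
from xml.dom import minidom
from xml.sax import handler

# Hypothetical sample input with an external DOCTYPE.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><body><p>hello</p></body></html>"""


def parse_string_without_external_entities(data):
    """Parse data into a DOM using a SAX parser that skips external entities."""
    parser = xml.sax.make_parser()
    # Standard SAX2 feature names; setting them to False asks the parser
    # not to fetch external general or parameter entities (which includes
    # the external DTD subset).
    parser.setFeature(handler.feature_external_ges, False)
    parser.setFeature(handler.feature_external_pes, False)
    # Passing an explicit parser routes minidom through pulldom, the same
    # path that appears in the traceback earlier in this issue.
    return minidom.parseString(data, parser)
```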
msg6038 Author: dan regan (dregan) Date: 2010-09-06.06:33:54
Here is a patch to fix bug 1268 (don't load external DTDs), which is based on Frederik De Bleser's patch, but I made it work whether or not JAXP was used, and attempted to make it work whether or not Xerces is the SAX parser.  Unfortunately, there isn't a truly parser-agnostic way to disable DTD parsing, but I made attempts to do so if Xerces is not the parser (as far as I know, it always is).
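
The patch itself is not shown in this thread. The general best-effort idea described above (try the Xerces-specific switch, then fall back to the standard SAX features, ignoring whatever a given parser rejects) can be sketched as follows. The helper name and the returned dictionary are hypothetical, and which features belong in the list is a judgment call.

```python
import xml.sax
from xml.sax import SAXNotRecognizedException, SAXNotSupportedException, handler

# Xerces-specific feature that controls loading of the external DTD.
XERCES_LOAD_EXTERNAL_DTD = (
    "http://apache.org/xml/features/nonvalidating/load-external-dtd"
)

# Features to try, most specific first; the last two are standard SAX2 names.
CANDIDATE_FEATURES = (
    XERCES_LOAD_EXTERNAL_DTD,
    handler.feature_external_ges,
    handler.feature_external_pes,
)


def disable_external_dtd_loading(parser):
    """Switch off every candidate feature the parser accepts.

    Returns a dict mapping each feature name to True if it was set,
    or False if the parser did not recognise or support it.
    """
    outcome = {}
    for name in CANDIDATE_FEATURES:
        try:
            parser.setFeature(name, False)
            outcome[name] = True
        except (SAXNotRecognizedException, SAXNotSupportedException):
            # This parser has no such switch; move on to the next one.
            outcome[name] = False
    return outcome
```

A caller would run disable_external_dtd_loading(xml.sax.make_parser()) before parsing and could inspect the result to see which switches actually took effect.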
msg6039 Author: dan regan (dregan) Date: 2010-09-06.06:36:05
This is my first patch, by the way.  Please let me know if I have not done something correctly.

I have run into this issue, and it's annoying.  I wrote an editor for an XML preference file that contains a DTD with a bogus URL, and to do so, I had to remove the DTD before parsing the pref file, and add it back after writing the file.
msg6040 Author: dan regan (dregan) Date: 2010-09-06.06:49:09
Forgot to import some feature constants from xml/sax/handler.py.
msg6041 Author: dan regan (dregan) Date: 2010-09-06.06:50:09
Here is a patch to fix bug 1268 (don't load external DTDs), which is based on Frederik De Bleser's patch, but I made it work whether or not JAXP was used, and attempted to make it work whether or not Xerces is the SAX parser.  Unfortunately, there isn't a truly parser-agnostic way to disable DTD parsing, but I made attempts to do so if Xerces is not the parser (as far as I know, it always is).
msg6042 Author: dan regan (dregan) Date: 2010-09-06.06:55:10
Oops, it's right this time.
msg6154 Author: Alan Kennedy (amak) Date: 2010-10-07.19:58:56
Hi Dan,

Patch looks good, except that I'm not too sure about the disabling of entities, through disabling the following features.

feature_external_ges
feature_external_pes

If you can provide good reasoning for these, I will apply the patch as is. Otherwise, I'd prefer to make minimal changes, and thus remove the code that disables these features.
msg6431 Author: Alan Kennedy (amak) Date: 2011-03-12.14:00:18
Ping.

Dan, please can you address the questions in my most recent message?

Thanks,

Alan.
msg7011 Author: Alan Kennedy (amak) Date: 2012-04-01.17:05:33
Closing this bug as fixed.

The original bug is fixed, i.e. the bug relating to not recognising unicode parameters.

The consequent issue, where w3.org was refusing service for DTD, instead returning a 503, is a w3 issue, not a jython one. Also, the DTD is now resolving, so w3.org have changed their policy for this.

There are necessarily differences between cpython and jython, and XML processing has many, because of the fundamentally different parsers used.

For future users who may wish to disable the DTD-related features, it is worth noting that dregan's patch of 2010-09-06.06:55:10 works.
History
Date User Action Args
2012-04-01 17:05:33 amak set status: open -> closed; resolution: fixed; messages: + msg7011
2011-03-12 14:00:18 amak set messages: + msg6431
2010-10-07 19:58:57 amak set messages: + msg6154
2010-09-06 06:55:11 dregan set files: + do_not_load_external_dtds.patch; messages: + msg6042
2010-09-06 06:50:09 dregan set files: + do_not_load_external_dtds.patch; messages: + msg6041
2010-09-06 06:49:09 dregan set messages: + msg6040
2010-09-06 06:36:05 dregan set messages: + msg6039
2010-09-06 06:33:57 dregan set files: + do_not_load_external_dtds.patch; keywords: + patch; messages: + msg6038; nosy: + dregan; versions: + 2.5.2b1, - 2.5b1
2010-04-16 18:18:57 amak set messages: + msg5727
2010-04-16 18:16:40 amak set assignee: fwierzbicki -> amak; messages: + msg5726; nosy: + amak
2009-12-05 12:34:52 fdb set messages: + msg5352
2009-12-04 16:27:21 ssteiner set messages: + msg5348
2009-12-04 16:03:57 ssteiner set messages: + msg5347
2009-11-24 13:06:13 lukasz.heldt set nosy: + lukasz.heldt; messages: + msg5323
2009-10-22 07:34:07 ssteiner set nosy: + ssteiner
2009-03-14 14:59:20 fwierzbicki set priority: normal; assignee: fwierzbicki
2009-03-06 18:31:37 fwierzbicki set nosy: + fwierzbicki
2009-03-06 11:16:38 fdb set files: + do-not-load-external-dtds.patch.txt; messages: + msg4181
2009-03-06 11:15:00 fdb set files: + test_parseString.py; messages: + msg4180
2009-03-06 11:12:18 fdb create