Issue1268
Created on 2009-03-06.11:12:18 by fdb, last changed 2012-04-01.17:05:33 by amak.

msg4179
Author: Frederik De Bleser (fdb)
Date: 2009-03-06.11:12:16

The following patch prevents SAX from loading external DTDs. Loading them
caused Jython to crash when parsing XML documents with DOCTYPE declarations
(such as XHTML or SVG).
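A minimal sketch of the same idea using the portable xml.sax API. It runs against CPython's expat driver, where (as far as I can tell) the external DTD subset goes through the same external-entity hook that feature_external_ges controls. Whether Jython's drv_javasax forwards this feature to the Java parser in the same way is not shown here; the document and the TagCollector class are made up for illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# The DOCTYPE points at a DTD that does not exist locally; nothing below
# should try to fetch it.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE svg SYSTEM "missing-svg.dtd">
<svg><title>hello</title></svg>"""


class TagCollector(ContentHandler):
    """Record element names in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


def parse_without_external_dtd(data):
    parser = xml.sax.make_parser()
    # Ask the driver not to resolve external entities, which in CPython's
    # expat driver also means the external DTD subset is not loaded.
    parser.setFeature(feature_external_ges, False)
    handler = TagCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.tags


print(parse_without_external_dtd(DOC))  # ['svg', 'title']
```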

msg4180
Author: Frederik De Bleser (fdb)
Date: 2009-03-06.11:14:59

Included is a simple test case that fails in the latest SVN version.
Applying the patch fixes the error. Without the patch, the following error is displayed:
Traceback (most recent call last):
  File "test_parseString.py", line 6, in <module>
    xml = parseString(data)
  File "/Users/fdb/Java/jython-svn/dist/Lib/xml/dom/minidom.py", line 1933, in parseString
    return _do_pulldom_parse(pulldom.parseString, (string,),
  File "/Users/fdb/Java/jython-svn/dist/Lib/xml/dom/minidom.py", line 1908, in _do_pulldom_parse
    toktype, rootNode = events.getEvent()
  File "/Users/fdb/Java/jython-svn/dist/Lib/xml/dom/pulldom.py", line 275, in _slurp
    self.parser.parse(self.stream)
  File "/Users/fdb/Java/jython-svn/dist/Lib/xml/sax/drivers2/drv_javasax.py", line 143, in parse
    self._parser.parse(JyInputSourceWrapper(source))
  File "/Users/fdb/Java/jython-svn/dist/Lib/xml/sax/drivers2/drv_javasax.py", line 92, in resolveEntity
    return JyInputSourceWrapper(self._resolver.resolveEntity(pubId, sysId))
  File "/Users/fdb/Java/jython-svn/dist/Lib/xml/sax/drivers2/drv_javasax.py", line 77, in __init__
    if source.getByteStream():
AttributeError: 'unicode' object has no attribute 'getByteStream'
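The last two frames suggest why a string reached JyInputSourceWrapper: the value returned by self._resolver.resolveEntity(pubId, sysId) was a unicode system id rather than an InputSource. Assuming _resolver is (or falls back to) the default EntityResolver from xml.sax.handler, that is expected, since the stock resolveEntity simply returns the system identifier. A quick check on CPython 3, where str plays the role of Jython's unicode (the XHTML identifiers are just sample input):

```python
from xml.sax.handler import EntityResolver

# The stock resolver from the Python SAX API does not build an InputSource;
# it hands the system identifier straight back as a plain string.
resolver = EntityResolver()
public_id = "-//W3C//DTD XHTML 1.0 Strict//EN"
system_id = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
result = resolver.resolveEntity(public_id, system_id)

print(type(result).__name__, result)
# prints: str http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
```

So any wrapper that receives the resolver's result has to accept text strings as well as InputSource objects.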

msg4181
Author: Frederik De Bleser (fdb)
Date: 2009-03-06.11:16:38

The original patch contained an oversight where the actual changed line
was commented out for testing purposes. The new patch fixes the problem.

msg5323
Author: Lukasz Heldt (lukasz.heldt)
Date: 2009-11-24.13:06:13

I have come across the same bug when trying to actually use the external
DTD functionality. After adding the following line to my XML file:
    <!DOCTYPE common SYSTEM "common.dtd">
I got the error mentioned by Frederik. Luckily there is an easy fix for
this issue. The following line in JyInputSourceWrapper needs to be changed from:
    if isinstance(source, str):
        javasax.InputSource.__init__(self, source)
to:
    if isinstance(source, str) or isinstance(source, unicode):
        javasax.InputSource.__init__(self, source)
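A self-contained sketch of the type dispatch being discussed. FakeInputSource and wrap are hypothetical stand-ins, not the real drv_javasax code, and because the sketch runs on Python 3, Python 2's str and unicode are played by bytes and str respectively.

```python
class FakeInputSource:
    """Hypothetical stand-in for org.xml.sax.InputSource."""

    def __init__(self, system_id=None):
        self.system_id = system_id

    def getByteStream(self):
        return None


def wrap(source, string_types):
    """Sketch of the dispatch in JyInputSourceWrapper.__init__ (not the real code)."""
    if isinstance(source, string_types):
        # Strings are treated as system ids and wrapped in a new InputSource.
        return FakeInputSource(source)
    # Anything else is assumed to already be an InputSource-like object.
    source.getByteStream()
    return source


# Python 3 spelling: bytes ~ Python 2 str, str ~ Python 2 unicode.
OLD_CHECK = (bytes,)      # like isinstance(source, str) in Jython 2.5
NEW_CHECK = (bytes, str)  # like the str-or-unicode check above

try:
    wrap("common.dtd", OLD_CHECK)
except AttributeError as exc:
    # Same failure mode as the traceback: a text string is not an InputSource.
    print("old check:", exc)

print("new check:", wrap("common.dtd", NEW_CHECK).system_id)  # common.dtd
```

In Python 2 (and Jython 2.5) the two isinstance calls can also be collapsed into isinstance(source, basestring).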

msg5347
Author: simon steiner (ssteiner)
Date: 2009-12-04.16:03:56

After applying Lukasz Heldt's fix I get the following error (I don't get this on CPython):
java.io.FileNotFoundException: java.io.FileNotFoundException: C:\sysdef_1_4_0.dtd (The system cannot find the file specified)

msg5348
Author: simon steiner (ssteiner)
Date: 2009-12-04.16:27:21

I applied the patch do-not-load-external-dtds.patch.txt and it works.

msg5352
Author: Frederik De Bleser (fdb)
Date: 2009-12-05.12:34:52

Turning off DTD validation seems the cleanest solution and conforms to
the Python parsing API, which does not validate external DTDs.

msg5726
Author: Alan Kennedy (amak)
Date: 2010-04-16.18:16:39

Lukasz Heldt is right: the correct fix for this is to have JyInputSourceWrapper check for unicode as well as ordinary str types.
Fix checked in at revision 7028.

msg5727
Author: Alan Kennedy (amak)
Date: 2010-04-16.18:18:57

However, the fix mentioned above gives rise to a consequent error: any attempt to retrieve the actual DTD file for XHTML Strict from the w3.org web site returns an HTTP 503 - Service Unavailable error.
The only way to solve that problem is to set the "load-external-dtds" feature to False.
I'll see if I can find a simple way to make this easily configurable for the end user.

msg6038
Author: dan regan (dregan)
Date: 2010-09-06.06:33:54

Here is a patch to fix bug 1268 (don't load external DTDs). It is based on Frederik De Bleser's patch, but I made it work whether or not JAXP is used, and attempted to make it work whether or not Xerces is the SAX parser. Unfortunately, there isn't a truly parser-agnostic way to disable DTD parsing, but I made attempts to do so if Xerces is not the parser (as far as I know, it always is).
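A sketch of the "try several switches, ignore the ones the parser rejects" approach described above, written against the standard xml.sax API rather than taken from the attached patch. The Xerces URI is the Xerces-J feature for skipping external DTDs in non-validating mode; the helper name disable_external_dtds and the candidate list are illustrative assumptions. On CPython's expat driver the Xerces feature is simply reported as unrecognized and skipped.

```python
import xml.sax
from xml.sax import SAXNotRecognizedException, SAXNotSupportedException
from xml.sax.handler import feature_external_ges, feature_external_pes

# Xerces-specific switch; non-Xerces parsers are expected to reject it.
XERCES_LOAD_EXTERNAL_DTD = (
    "http://apache.org/xml/features/nonvalidating/load-external-dtd"
)

# Hypothetical candidate list, most specific first.
CANDIDATES = [XERCES_LOAD_EXTERNAL_DTD, feature_external_ges, feature_external_pes]


def disable_external_dtds(parser, candidates=CANDIDATES):
    """Turn off each feature the parser accepts; return (applied, skipped)."""
    applied, skipped = [], []
    for name in candidates:
        try:
            parser.setFeature(name, False)
        except (SAXNotRecognizedException, SAXNotSupportedException):
            # This parser does not know (or cannot change) the feature.
            skipped.append(name)
        else:
            applied.append(name)
    return applied, skipped


parser = xml.sax.make_parser()
applied, skipped = disable_external_dtds(parser)
print("applied:", applied)
print("skipped:", skipped)
```

Catching both SAX exceptions lets the same call degrade gracefully on parsers that do not understand a given feature, instead of failing outright.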

msg6039
Author: dan regan (dregan)
Date: 2010-09-06.06:36:05

This is my first patch, by the way. Please let me know if I have not done something correctly.
I have run into this issue, and it's annoying. I wrote an editor for an XML preference file that contains a DTD with a bogus URL, and to do so, I had to remove the DTD before parsing the preference file and add it back after writing it out again.

msg6040
Author: dan regan (dregan)
Date: 2010-09-06.06:49:09

Forgot to import some feature constants from xml/sax/handler.py.

msg6041
Author: dan regan (dregan)
Date: 2010-09-06.06:50:09

Here is a patch to fix bug 1268 (don't load external DTDs). It is based on Frederik De Bleser's patch, but I made it work whether or not JAXP is used, and attempted to make it work whether or not Xerces is the SAX parser. Unfortunately, there isn't a truly parser-agnostic way to disable DTD parsing, but I made attempts to do so if Xerces is not the parser (as far as I know, it always is).

msg6042
Author: dan regan (dregan)
Date: 2010-09-06.06:55:10

Oops, it's right this time.

msg6154
Author: Alan Kennedy (amak)
Date: 2010-10-07.19:58:56

Hi Dan,
Patch looks good, except that I'm not so sure about disabling entities by turning off the following features:
feature_external_ges
feature_external_pes
If you can provide good reasoning for these, I will apply the patch as is. Otherwise, I'd prefer to make minimal changes, and thus remove the code that disables these features.
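For context on the question above: in the standard SAX API, feature_external_ges governs external general entities (references such as &ext; in element content) and feature_external_pes governs external parameter entities (used inside DTDs). A small CPython sketch of what switching the first one off means in practice: the declared external entity below is dropped instead of being fetched, so disabling it can change the parsed content, not just skip the DTD. The document and TextCollector class are made up for illustration, and this exercises CPython's expat driver, not Jython's Java parser.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# The internal subset declares an external general entity whose file does not exist.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE r [<!ENTITY ext SYSTEM "does-not-exist.txt">]>
<r>before&ext;after</r>"""


class TextCollector(ContentHandler):
    """Accumulate all character data inside the root element."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def characters(self, content):
        self.parts.append(content)


def text_with_external_ges(enabled):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, enabled)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(DOC))
    return "".join(handler.parts)


# With the feature off, &ext; contributes nothing to the text.
print(repr(text_with_external_ges(False)))  # 'beforeafter'
```

With the feature left on, the parser would instead try to open does-not-exist.txt, which is the same kind of external fetch that caused the DTD problems in this issue.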

msg6431
Author: Alan Kennedy (amak)
Date: 2011-03-12.14:00:18

Ping.
Dan, please can you address the questions in my most recent message?
Thanks,
Alan.

msg7011
Author: Alan Kennedy (amak)
Date: 2012-04-01.17:05:33

Closing this bug as fixed.
The original bug is fixed, i.e. the bug relating to not recognising unicode parameters.
The consequent issue, where w3.org refused to serve the DTD and returned a 503 instead, is a W3C issue, not a Jython one. Also, the DTD now resolves, so w3.org has changed its policy on this.
There are necessarily differences between CPython and Jython, and XML processing has many of them, because the underlying parsers are fundamentally different.
For future users that may wish to disable the DTD related features, it is worth noting that dregan's patch of 2010-09-06.06:55:10 works.

Date | User | Action | Args
2012-04-01 17:05:33 | amak | set | status: open -> closed; resolution: fixed; messages: + msg7011
2011-03-12 14:00:18 | amak | set | messages: + msg6431
2010-10-07 19:58:57 | amak | set | messages: + msg6154
2010-09-06 06:55:11 | dregan | set | files: + do_not_load_external_dtds.patch; messages: + msg6042
2010-09-06 06:50:09 | dregan | set | files: + do_not_load_external_dtds.patch; messages: + msg6041
2010-09-06 06:49:09 | dregan | set | messages: + msg6040
2010-09-06 06:36:05 | dregan | set | messages: + msg6039
2010-09-06 06:33:57 | dregan | set | files: + do_not_load_external_dtds.patch; keywords: + patch; messages: + msg6038; nosy: + dregan; versions: + 2.5.2b1, - 2.5b1
2010-04-16 18:18:57 | amak | set | messages: + msg5727
2010-04-16 18:16:40 | amak | set | assignee: fwierzbicki -> amak; messages: + msg5726; nosy: + amak
2009-12-05 12:34:52 | fdb | set | messages: + msg5352
2009-12-04 16:27:21 | ssteiner | set | messages: + msg5348
2009-12-04 16:03:57 | ssteiner | set | messages: + msg5347
2009-11-24 13:06:13 | lukasz.heldt | set | nosy: + lukasz.heldt; messages: + msg5323
2009-10-22 07:34:07 | ssteiner | set | nosy: + ssteiner
2009-03-14 14:59:20 | fwierzbicki | set | priority: normal; assignee: fwierzbicki
2009-03-06 18:31:37 | fwierzbicki | set | nosy: + fwierzbicki
2009-03-06 11:16:38 | fdb | set | files: + do-not-load-external-dtds.patch.txt; messages: + msg4181
2009-03-06 11:15:00 | fdb | set | files: + test_parseString.py; messages: + msg4180
2009-03-06 11:12:18 | fdb | create |