Issue1757419

classification

Title:	xml handling in jython and in python works different
Type:		Severity:	normal
Components:	None	Versions:
		Milestone:

process

Status:	closed	Resolution:	invalid
Dependencies:		Superseder:
Assigned To:		Nosy List:	amak, cgroves, pekka.klarck, pillesoft
Priority:	normal	Keywords:

Created on 2007-07-20.10:03:31 by pillesoft, last changed 2007-07-20.18:33:06 by amak.

Files
File name	Uploaded	Description
xml_test.ZIP	pillesoft, 2007-07-20.10:03:32

Messages
msg1743	Author: pillesoft (pillesoft)	Date: 2007-07-20.10:03:31
Dear all, I have a strange problem that I don't understand. I need to parse an XML string, and the result differs between Python 2.4.2 and Jython 2.2rc2. Please check the attached files and run the .py code in both Python and Jython 2.2rc2: you will see that in Jython some nodes contain more than one text node, while in Python every node has only one text node. If there is a problem in my code, could you give me some help? Thank you, Ivan
msg1744	Author: Pekka Klärck (pekka.klarck)	Date: 2007-07-20.10:46:59
Could you please create the simplest possible example where something goes wrong? The XML file you attached is huge; I'm sure you can create a much smaller one that shows the same problem. Your Python code is otherwise OK, but it contains a hard-coded path (d:\temp\pms_act.xml), so it won't run without editing.

Probably the best would be to create a single .py file that contains the XML as a string. Then you could attach that file directly (instead of a zip) to the bug report, which makes it easier to view and download the code. Your example could look something like this:

    from xml.dom import minidom
    from StringIO import StringIO

    myxml = """<hello> world </hello>"""
    mydom = minidom.parse(StringIO(myxml))
    # do something demonstrating the problem here
    ...

If you make the example simple, it is more likely that someone will actually take a look at it. I'm doing all my little Jython contributions entirely on my own time. I know something about XML and could investigate this out of my own interest, but I'm not planning to spend time getting the example running first.
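Following Pekka's template, a self-contained test case could look like the sketch below. It embeds the XML as a string and prints, for every element, how many direct text nodes it has, so the same file can be run unchanged under CPython and Jython and the two outputs compared. The sample XML and the function name `text_node_counts` are hypothetical, not taken from the attached files, and it uses `minidom.parseString` rather than the `StringIO` route (both are standard minidom entry points).

```python
from xml.dom import minidom

# Hypothetical sample document: replace it with the smallest XML that still
# shows the difference between CPython and Jython.
MYXML = "<root><item>hello world</item><item>second &amp; last</item></root>"


def text_node_counts(node, path=""):
    """Return a list of (element path, number of direct Text children)."""
    result = []
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            child_path = path + "/" + child.tagName
            n_text = len([c for c in child.childNodes
                          if c.nodeType == c.TEXT_NODE])
            result.append((child_path, n_text))
            # Recurse into the element's own children.
            result.extend(text_node_counts(child, child_path))
    return result


if __name__ == "__main__":
    dom = minidom.parseString(MYXML)
    # Print every element, so a successful run never looks like "no output".
    for path, n_text in text_node_counts(dom):
        print("%s: %d text node(s)" % (path, n_text))
```

Any element reporting more than one text node under one interpreter but not the other is the behaviour being reported.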
msg1745	Author: Alan Kennedy (amak)	Date: 2007-07-20.11:18:45
Hmm. I don't have CPython 2.4.2; I have CPython 2.4.4. When I run your code on CPython 2.4.4, I get nothing at all: no output, zippo. When I run it on Jython 2.2rc2, I get some output, but I can't tell whether it's correct, because I can't look at your XML file; I can't find a text editor that can handle 760,000 bytes all on the same line.

I suggest you put some newlines into your XML file. At least then we'll be able to see what's inside it. It's quite possible that the extra-long line is the source of your problems. I don't have a specific mechanism in mind, but remember that files opened in text mode undergo line-ending translation, which means they pass through internal buffers and processing at some stage. It is unlikely that these buffers are big enough to hold your entire 760,000-byte string at once. (They should still be able to process it correctly, but perhaps there's a bug in there, caused by the sheer size of the string, that has nothing to do with XML at all.) So as a first step I'd tidy up your XML, i.e. put a lot more newlines in it.
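One way to follow Alan's suggestion without hand-editing the file is to let minidom write an indented copy. This is a sketch that assumes the XML is already in memory as a string; the function name `pretty_copy` is hypothetical. Because the added indentation becomes extra whitespace text nodes if the copy is parsed again, the copy is meant only for reading in an editor, not as input to the real application.

```python
from xml.dom import minidom


def pretty_copy(xml_text):
    """Return an indented, multi-line version of xml_text for human reading.

    The indentation adds whitespace between elements, so the result should be
    used for inspection only, never fed back into the real parsing code.
    """
    dom = minidom.parseString(xml_text)
    return dom.toprettyxml(indent="  ")


if __name__ == "__main__":
    one_line = "<root><a>1</a><b><c>2</c></b></root>"
    print(pretty_copy(one_line))
```

For the attached file, something like `pretty_copy(open(path, "rb").read())` written out to a separate file would do; reading in binary mode also avoids the text-mode line-ending translation discussed above.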
msg1746	Author: pillesoft (pillesoft)	Date: 2007-07-20.11:30:22
Dear all, thank you for dealing with my problem. I tried to create a smaller XML file, but then I don't see any difference between Python and Jython, so I think the problem really is the size of the string. On my XP machine I use NoteTab Light as a simple text editor. The application I'm developing is an interface between a VFP-based COM component and the Primavera Java API; the XML is used to exchange information between the two systems, so it will probably need to handle even bigger strings.
msg1747	Author: Charlie Groves (cgroves)	Date: 2007-07-20.17:50:11
It actually looks like this script isn't supposed to produce any output if it runs successfully. If I understand your complaint correctly, it's that Jython returns multiple text nodes for a single, contiguous piece of text in the XML? This is just a property of the underlying DOM created by the Java XML libraries: they create multiple text nodes for that text, so minidom exposes the multiple text nodes. It isn't wrong that CPython makes a single node, and it isn't wrong that Jython makes multiple nodes; that behaviour isn't specified by DOM.

If you know all the childNodes of a node are going to be text nodes, you can just use something like ''.join([subnode.nodeValue for subnode in node.childNodes]) to get it all as a single string. This will work with a single text node in CPython and with multiple nodes in Jython.

To add a little to what Alan and Pekka said, it's really helpful if you can get your report down to the smallest possible thing that causes the problem. Clearly say what happens, what you expect to happen, and how to make that happen with your test case. Here it wasn't clear that the code didn't print anything at all if it ran successfully.
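A slightly more defensive version of Charlie's one-liner is sketched below. The helper name `node_text` is hypothetical; unlike the original expression, it collects only Text and CDATA children and skips child elements and comments, so it still works when not every child is a text node. The demo builds two adjacent text nodes by hand to imitate what the Java-backed DOM produced under Jython.

```python
from xml.dom import minidom


def node_text(node):
    """Concatenate the data of all direct Text/CDATA children of node.

    Gives the same string whether the parser produced one text node
    or split the same run of text into several adjacent ones.
    """
    parts = []
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.nodeValue)
    return "".join(parts)


if __name__ == "__main__":
    doc = minidom.Document()
    elem = doc.createElement("name")
    # Simulate a parser that split one run of text into two nodes.
    elem.appendChild(doc.createTextNode("hello "))
    elem.appendChild(doc.createTextNode("world"))
    print(len(elem.childNodes), repr(node_text(elem)))
```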
msg1748	Author: Alan Kennedy (amak)	Date: 2007-07-20.18:33:06
To comment further on DOMs having multiple nodes for a single section of text: this is the way DOM is designed, and it is essentially caused by SAX, which does not guarantee to deliver all text in a single event call, primarily for reasons of buffering. See the DOM method Node.normalize():

""" Puts all Text nodes in the full depth of the sub-tree underneath this Node, including attribute nodes, into a "normal" form where only structure (e.g., elements, comments, processing instructions, CDATA sections, and entity references) separates Text nodes, i.e., there are neither adjacent Text nodes nor empty Text nodes. """

http://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-1950641247
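The effect of `normalize()` can be seen with CPython's own minidom, which implements it. The sketch builds an element whose text is split by hand into adjacent and empty Text nodes (the helper name `split_then_normalize` is hypothetical), then normalizes the document. Calling `dom.normalize()` once after parsing should give the same one-node-per-run shape under Jython, assuming its minidom exposes the standard DOM method; that part is untested here.

```python
from xml.dom import minidom


def split_then_normalize():
    """Build split Text nodes, normalize, and report (before, after, text)."""
    doc = minidom.Document()
    elem = doc.createElement("name")
    doc.appendChild(elem)
    # Adjacent pieces plus one empty Text node, as a splitting parser might leave.
    for piece in ("hello ", "", "world"):
        elem.appendChild(doc.createTextNode(piece))
    before = len(elem.childNodes)
    doc.normalize()  # merges adjacent Text nodes and drops empty ones
    after = len(elem.childNodes)
    return before, after, elem.firstChild.data


if __name__ == "__main__":
    print(split_then_normalize())  # (3, 1, 'hello world')
```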

History
Date	User	Action	Args
2007-07-20 10:03:31	pillesoft	create