Issue1537

classification
Title: expat: org.python.apache.xerces.parsers.SAXParser
Type: behaviour Severity: minor
Components: Library Versions: 2.5.1
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: amak Nosy List: amak, kwatford
Priority: Keywords:

Created on 2010-01-11.20:37:25 by kwatford, last changed 2012-04-01.17:35:17 by amak.

Files
File name Uploaded Description
stacktrace.txt kwatford, 2012-03-20.22:34:40 Stack trace from ClassNotFoundException
Messages
msg5423 (view) Author: Ken Watford (kwatford) Date: 2010-01-11.20:37:24
I'm using Jython 2.5.1 in MATLAB's JVM. Upon trying to parse an XML file with ElementTree, XMLParser.__init__ tries to get XMLReaderFactory to find "org.python.apache.xerces.parsers.SAXParser". Probably due to limitations in MATLAB's stupid classloader, this does not work (it throws a SAXException wrapping a ClassNotFoundException).

The only relevant comment on why this weird name is used is expat.py line 51: "Name mangled by jarjar?". 

Manually setting expat._xerces_parser to "org.apache.xerces.parsers.SAXParser" resolves the issue.

Perhaps a try block around the XMLReaderFactory call?
msg5720 (view) Author: Alan Kennedy (amak) Date: 2010-04-16.16:52:16
Hmm, that's odd.

If the class org.python.apache.xerces.parsers.SAXParser is not found, then it should throw an ImportError.

Please can you try these statements on a jython command line and report the results?

>>> import org.python.apache.xerces.parsers.SAXParser
>>> import org.apache.xerces.parsers.SAXParser

Thanks.
msg5721 (view) Author: Ken Watford (kwatford) Date: 2010-04-16.17:24:55
Due to the way MATLAB manhandles its filehandles I can't use a real Jython prompt, but I can push to an InteractiveConsole just fine. Here are the results, plus the results with my workaround:

>> ic = org.python.util.InteractiveConsole();
>> ic.push('import org.python.apache.xerces.parsers.SAXParser');
>> ic.push('import org.apache.xerces.parsers.SAXParser');
>> ic.push('import xml.etree.ElementTree as etree');
>> ic.push('x = etree.parse("test.xml")');
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/amd/ss01/export/homes16/kwatford/TUF/jlib/jython-2.5.1.jar/Lib/xml/etree/ElementTree.py", line 862, in parse
  File "/amd/ss01/export/homes16/kwatford/TUF/jlib/jython-2.5.1.jar/Lib/xml/etree/ElementTree.py", line 581, in parse
  File "/amd/ss01/export/homes16/kwatford/TUF/jlib/jython-2.5.1.jar/Lib/xml/etree/ElementTree.py", line 1120, in __init__
  File "/amd/ss01/export/homes16/kwatford/TUF/jlib/jython-2.5.1.jar/Lib/xml/parsers/expat.py", line 63, in ParserCreate
  File "/amd/ss01/export/homes16/kwatford/TUF/jlib/jython-2.5.1.jar/Lib/xml/parsers/expat.py", line 91, in __init__
	at org.xml.sax.helpers.XMLReaderFactory.loadClass(Unknown Source)
	at org.xml.sax.helpers.XMLReaderFactory.createXMLReader(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.lang.reflect.Method.invoke(Unknown Source)

org.xml.sax.SAXException: java.lang.ClassNotFoundException: org.python.apache.xerces.parsers.SAXParser
>> ic.push('import xml.parsers.expat');
>> ic.push('xml.parsers.expat._xerces_parser = "org.apache.xerces.parsers.SAXParser"');
>> ic.push('x = etree.parse("test.xml")');
>> ic.push('print x');
<xml.etree.ElementTree.ElementTree instance at 0x2>
msg5723 (view) Author: Alan Kennedy (amak) Date: 2010-04-16.17:55:11
Please can you do just this bit?

>> ic.push('import org.python.apache.xerces.parsers.SAXParser');

I want to see what it generates. It should generate an ImportError.
msg5724 (view) Author: Ken Watford (kwatford) Date: 2010-04-16.18:01:27
As you can see, that's the first thing I did, and no exception was thrown. I just tried it again in a fresh session, and still no exception. They both import just fine.
msg5735 (view) Author: Alan Kennedy (amak) Date: 2010-04-18.17:09:23
I'm not sure what to do about this problem.

If 

>>> import org.python.apache.xerces.parsers.SAXParser

succeeds, then the class should be loadable.

But an attempt to instantiate the class later fails with a java.lang.ClassNotFoundException: org.python.apache.xerces.parsers.SAXParser

I can only recommend that you make use of the fix you have already identified, i.e. explicitly set the value of _xerces_parser.

I don't see what benefit a try block around the XMLReaderFactory call would bring.
msg6846 (view) Author: Alan Kennedy (amak) Date: 2012-03-19.19:56:30
Any further information on this issue? Is it still happening on the latest MATLAB JVM?
msg6862 (view) Author: Ken Watford (kwatford) Date: 2012-03-19.20:38:03
Just tested with 2.5.3b1 and MATLAB 2012a. Same behavior, same manual fix.
msg6934 (view) Author: Alan Kennedy (amak) Date: 2012-03-20.21:18:44
Checking MATLAB docs, I see that they're using OpenJDK. Can you let us know which version you're using?

The name mangling thing is because some JVMs include their own copy of Xerces, which conflicts with jython, whose code is written against a specific version of Xerces.

The name mangling does indeed intentionally take place during the jython build process.
msg6935 (view) Author: Alan Kennedy (amak) Date: 2012-03-20.21:21:12
Also, can you change the import in expat.py to read as follows, so that we can see whether the try ... except ImportError is failing?

try:
    raise ImportError # <--- add this
    # Name mangled by jarjar?
    import org.python.apache.xerces.parsers.SAXParser
    _xerces_parser = "org.python.apache.xerces.parsers.SAXParser"
except ImportError:
    _xerces_parser = "org.apache.xerces.parsers.SAXParser"
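
The selection logic in this snippet amounts to "use the first name that imports". A minimal sketch of that pattern follows, written for current Python rather than Jython 2.5. `first_importable` is a hypothetical helper, not part of expat.py, and the module names in the usage line are stand-ins chosen so that the sketch runs without Xerces on the path.

```python
def first_importable(names):
    """Return the first dotted name in `names` that imports without error.

    Hypothetical helper for illustration only; expat.py uses an inline
    try/except ImportError instead.
    """
    for name in names:
        try:
            __import__(name)
        except ImportError:
            continue  # not importable here, try the next candidate
        return name
    raise ImportError("none of %r could be imported" % (names,))


# Stand-in names: the first does not exist, the second is in the stdlib.
chosen = first_importable(["no_such_pkg.parsers.SAXParser", "xml.sax"])
```

Note that a check like this only exercises the import machinery of the running interpreter; it says nothing about what other code in the same JVM can load.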
msg6940 (view) Author: Ken Watford (kwatford) Date: 2012-03-20.22:34:40
Java version on the machine I'm currently on is:

Java 1.6.0_29-b11-402-11M3527 with Apple Inc. Java HotSpot(TM) 64-Bit Server VM mixed mode

I added the unconditional ImportError and it was, as one would expect, caught, resulting in the non-mangled name being used and everything working.

I also tried instantiating the SAXParser via the org.python...SAXParser name. The import is clearly working, as I'm able to do this from the Jython console without incident. I can also instantiate it via that name from the MATLAB console, so MATLAB doesn't seem to mind it either.

My guess is that the XMLReaderFactory mentioned in the stack trace is using a different classloader. Jython's classes are being loaded by whatever classloader MATLAB is using. So it's likely that either MATLAB isn't making its classloader sufficiently "default" or XMLReaderFactory is picking a specific classloader that isn't correct in this case.

Here's my new fix. It goes in __init__.

try:
    self._reader = XMLReaderFactory.createXMLReader(_xerces_parser)
except:
    self._reader = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser")

I attempted to make the except block only catch the specific exception (a SAXException containing a ClassNotFoundException) but I was unable to catch it if I tried to do any class matching. I'm uncertain as to why, but I would guess it's more classloader issues. I was able to get the ClassNotFoundException's stacktrace, though, and I've attached it.
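
One way to narrow the fallback without relying on class identity is to match on the exception's text instead of its type. The sketch below is an illustration under stated assumptions, not the fix that was committed: `create_reader`, `FakeSAXException` and `fake_factory` are hypothetical stand-ins for XMLReaderFactory.createXMLReader and the SAXException it throws, so that the pattern can run without a JVM. It is written for current Python, and how it maps onto Jython's handling of Java exceptions is left open.

```python
MANGLED = "org.python.apache.xerces.parsers.SAXParser"
PLAIN = "org.apache.xerces.parsers.SAXParser"


class FakeSAXException(Exception):
    """Stand-in for org.xml.sax.SAXException (hypothetical)."""


def fake_factory(class_name):
    """Stand-in for XMLReaderFactory.createXMLReader (hypothetical).

    Pretends that only the unmangled name is visible to the factory.
    """
    if class_name != PLAIN:
        raise FakeSAXException(
            "java.lang.ClassNotFoundException: " + class_name)
    return "reader for " + class_name


def create_reader(factory, preferred=MANGLED, fallback=PLAIN):
    """Try `preferred`; fall back only on a class-not-found failure."""
    try:
        return factory(preferred)
    except Exception as exc:
        # Match on the message text rather than the exception type, since
        # type matching was reported as unreliable in this thread.
        if "ClassNotFoundException" not in str(exc):
            raise  # unrelated failure: do not mask it
        return factory(fallback)


reader = create_reader(fake_factory)
```

Any failure that does not mention ClassNotFoundException is re-raised, which is the main difference from a bare except.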
msg6941 (view) Author: Alan Kennedy (amak) Date: 2012-03-20.22:45:10
> I added the unconditional ImportError and it was, as one would expect, 
> caught, resulting in the non-mangled name being used and everything working.

As I thought.

Let's focus on trying to get the import working correctly.

How about this?

#######################

import java

try:
    # Name mangled by jarjar?
    import org.python.apache.xerces.parsers.SAXParser
    _xerces_parser = "org.python.apache.xerces.parsers.SAXParser"
except ImportError:
    _xerces_parser = "org.apache.xerces.parsers.SAXParser"
except java.lang.NoClassDefFoundError:
    _xerces_parser = "org.apache.xerces.parsers.SAXParser"

#######################

We can tidy it up later if it works.
msg6942 (view) Author: Ken Watford (kwatford) Date: 2012-03-20.22:53:47
Perhaps my explanation wasn't clear.

There was no error or exception to catch at that time (other than the artificial one) because the class loaded and imported correctly. As far as Jython is concerned, there is no problem with that class.

The exception occurs at a later time because a classloader external to Jython can't load it. 

The problem could be detected at import time, but only by attempting to use the XMLReaderFactory and catching the exception that it produces.
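
The situation described here can be pictured with a toy model. This is only a sketch: `Loader`, `host_loader`, `jython_loader` and `pick_parser_name` are hypothetical names, and which classloader XMLReaderFactory actually consults under MATLAB was never confirmed in this thread. The assumption built into the model is that the loader used by the factory sees only an unmangled Xerces, while the loader holding the Jython jar also sees the mangled copy.

```python
class Loader(object):
    """Toy model of a classloader: a name table plus an optional parent."""

    def __init__(self, names, parent=None):
        self.names = set(names)
        self.parent = parent

    def load(self, name):
        # Delegate to the parent first, as Java classloaders conventionally do.
        if self.parent is not None:
            try:
                return self.parent.load(name)
            except LookupError:
                pass
        if name in self.names:
            return name
        raise LookupError("ClassNotFoundException: " + name)


MANGLED = "org.python.apache.xerces.parsers.SAXParser"
PLAIN = "org.apache.xerces.parsers.SAXParser"

# Assumption: the host loader knows only the unmangled name; the child
# loader (standing in for the one that loaded Jython) adds the mangled one.
host_loader = Loader([PLAIN])
jython_loader = Loader([MANGLED], parent=host_loader)


def pick_parser_name(factory_loader, candidates):
    """Probe with the loader the factory will use, not the importer's."""
    for name in candidates:
        try:
            factory_loader.load(name)
        except LookupError:
            continue
        return name
    raise LookupError("no candidate is loadable: %r" % (candidates,))


# The "import" succeeds through the child loader...
import_ok = jython_loader.load(MANGLED) == MANGLED
# ...but the factory's loader can only resolve the unmangled name.
chosen = pick_parser_name(host_loader, [MANGLED, PLAIN])
```

In this model the import check passes while the factory lookup fails, so only a probe made through the factory's own loader selects a usable name.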
msg6943 (view) Author: Alan Kennedy (amak) Date: 2012-03-20.22:59:17
Sorry, I wasn't clear myself. I understand that the XMLReader seems to be using a different classloader.

As for your solution

try:
    self._reader = XMLReaderFactory.createXMLReader(_xerces_parser)
except:
    self._reader = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser")

I'm glad that it works. However, I'd prefer not to use a bare except clause.

I need to think about the appropriate solution to this.

Thanks for taking the time to dig into it.
msg6982 (view) Author: Alan Kennedy (amak) Date: 2012-03-29.20:03:06
Given the classloader issues, I think your solution is a good one.

I'd prefer not to use the bare except clause. But if no other solution presents itself, I will commit your most recent fix as-is in the next few days.
msg7012 (view) Author: Alan Kennedy (amak) Date: 2012-04-01.17:35:17
Fix checked in at

2.5:  http://hg.python.org/jython/rev/a972112ac1b1
head: http://hg.python.org/jython/rev/491a9451d21d

Thanks to Ken Watford for the analysis and the fix.
History
Date User Action Args
2012-04-01 17:35:17 amak set status: open -> closed; resolution: fixed; messages: + msg7012
2012-03-29 20:03:06 amak set messages: + msg6982
2012-03-20 22:59:17 amak set messages: + msg6943
2012-03-20 22:53:47 kwatford set messages: + msg6942
2012-03-20 22:45:10 amak set messages: + msg6941
2012-03-20 22:34:41 kwatford set files: + stacktrace.txt; messages: + msg6940
2012-03-20 21:21:12 amak set messages: + msg6935
2012-03-20 21:18:44 amak set messages: + msg6934
2012-03-19 20:38:03 kwatford set messages: + msg6862
2012-03-19 19:56:30 amak set assignee: amak; messages: + msg6846
2010-04-18 17:16:23 amak set messages: + msg5736
2010-04-18 17:12:38 amak set messages: + msg5735
2010-04-16 18:01:27 kwatford set messages: + msg5724
2010-04-16 17:55:12 amak set messages: + msg5723
2010-04-16 17:24:56 kwatford set messages: + msg5721
2010-04-16 16:52:16 amak set nosy: + amak; messages: + msg5720
2010-01-11 20:37:25 kwatford create