Issue2848

classification
Title: Inconsistent results under concurrency in org.w3c.dom
Type: behaviour Severity: normal
Components: Core Versions:
Milestone: Jython 2.7.2
process
Status: closed Resolution: invalid
Dependencies: Superseder:
Assigned To: Nosy List: gbach, jeff.allen
Priority: normal Keywords:

Created on 2019-12-14.14:38:21 by jeff.allen, last changed 2020-02-01.09:17:37 by jeff.allen.

Files
File name Uploaded Description
issue2848.tar.gz jeff.allen, 2019-12-14.17:54:30
Messages
msg12836 Author: Jeff Allen (jeff.allen) Date: 2019-12-14.14:38:21
Raised by Gunter (gbach) as a possible recurrence of #2487, but I think it is something else. For now, see the details there to reproduce it. (We should transfer the key information here.)
msg12840 Author: Jeff Allen (jeff.allen) Date: 2019-12-14.17:54:30
Adding the file from #2487 (name changed only) that should reproduce the error. Working on Windows, I have so far seen it once with my own variant of the test, quite early in the run:

PS issue2848> inst\bin\jython test2848.py
17:34:26 GMT
17:34:27 GMT
Unhandled exception in thread started by <function do_something at 0x2>
Traceback (most recent call last):
  File "test2848.py", line 52, in do_something
    class_name = attr.getNamedItem("class").getValue()
17:34:27 GMT

--------------------------------------------------------------------
The relevant text from Gunter on #2487 runs as follows:

"""
I made a short script (test.py) that is able to reproduce this error.
I have attached the files it needs.
test.py contains two links for obtaining the required JAR files (I reproduced the error with an older version of SAXParser, which is the context in which the error occurs on my production system).
I also created two shell scripts to start the tests.
You must adapt the paths, but this should be straightforward.

...

test272b2 yields another one, "AttributeError: 'NoneType' object has no attribute 'getValue'" (but maybe the root cause is the old issue), after several hours.
Interestingly enough, the error either occurs "en masse" or not at all.
Also, if you do not send stdout somewhere else (e.g. /dev/null), the error seems not to occur in this context.
"""
msg12842 Author: Jeff Allen (jeff.allen) Date: 2019-12-14.20:54:57
I haven't seen the NoneType error again, but I've seen several NPEs like this:

Unhandled exception in thread started by <function do_something at 0x2>
Traceback (most recent call last):
  File "test2848.py", line 49, in do_something
    for idx in range(els.getLength()):
        at org.apache.xerces.dom.DeepNodeListImpl.nextMatchingElementAfter(Unknown Source)
        at org.apache.xerces.dom.DeepNodeListImpl.item(Unknown Source)
        at org.apache.xerces.dom.DeepNodeListImpl.getLength(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
java.lang.NullPointerException: java.lang.NullPointerException

Notice this is deep inside Xerces, navigating its internal data structures. Research turns up that the Xerces DOM implementation is not thread-safe. You might think that merely reading the parse result would be OK, but the DOM is constructed lazily as clients navigate it, so even read-only calls can modify state shared between threads.

I suggest that this unsafe use of Xerces explains the observations, and that Jython is in the clear on this one.
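
For anyone hitting the same symptoms, one possible workaround is to serialise all access to the shared document behind a single lock. The sketch below is illustrative only and is not taken from the attached test: it uses the standard-library xml.dom.minidom as a stand-in for the Xerces document so that it runs without the JAR files, and the names XML, doc_lock, class_names and run_workers are made up for this example. In the Jython script, the same pattern would wrap the org.w3c.dom calls (getLength(), getNamedItem(...).getValue()).

```python
import threading
from xml.dom import minidom

# Hypothetical sample data; the real test parses its own input.
XML = '<root><item class="a"/><item class="b"/><item/></root>'

# One document shared by all worker threads, guarded by a single lock.
doc = minidom.parseString(XML)
doc_lock = threading.Lock()


def class_names(document, lock):
    """Collect the "class" attribute of every <item>, holding the lock throughout."""
    names = []
    with lock:
        els = document.getElementsByTagName("item")
        for idx in range(els.length):
            attr = els.item(idx).attributes.getNamedItem("class")
            # getNamedItem returns None when the attribute is absent.
            if attr is not None:
                names.append(attr.value)
    return names


def run_workers(count=8):
    """Start `count` threads that all read the shared document under the lock."""
    results = [None] * count

    def worker(i):
        results[i] = class_names(doc, doc_lock)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Holding the lock for the whole traversal, rather than per call, matters here because a node list may keep internal position state between calls. An alternative with no shared state at all is to parse a separate document in each thread, at the cost of extra memory and parse time.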
History
Date User Action Args
2020-02-01 09:17:37 jeff.allen set status: pending -> closed
2019-12-14 20:54:57 jeff.allen set status: open -> pending
resolution: accepted -> invalid
messages: + msg12842
2019-12-14 17:54:30 jeff.allen set priority: high -> normal
files: + issue2848.tar.gz
resolution: accepted
messages: + msg12840
2019-12-14 14:38:21 jeff.allen create