Issue2392

classification
Title: Intermittent errors from sre_compile.py - ValueError('unsupported operand type', 'subpattern')
Type: Severity: normal
Components: Library Versions: Jython 2.7
Milestone: Jython 2.7.1
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: zyasoft Nosy List: asalabaev, fwierzbicki, helix84, public.marvin, zyasoft
Priority: urgent Keywords:

Created on 2015-09-09.20:18:28 by public.marvin, last changed 2015-12-28.17:34:33 by zyasoft.

Messages
msg10239 (view) Author: marvin greenberg (public.marvin) Date: 2015-09-09.20:18:27
This error occurs intermittently.  We are using some existing python modules from under java using jython.  When we switched from 2.5.2 to 2.7, these errors starting occurring intermittently.  The errors do not seem to be related to the pattern being compiled.  Identical executions may run successfully several times but then fail.  Also have seen 'unsupported operand type','in'.
msg10274 (view) Author: Jim Baker (zyasoft) Date: 2015-09-22.17:50:01
Very odd, I know the regrtest does a fair amount of re compilation and we haven't seen these issues. Perhaps a stress test is in order?
msg10275 (view) Author: Jim Baker (zyasoft) Date: 2015-09-22.18:24:19
There's a minor difference between CPython 2.7's sre_compile.py and our patched version:

$ diff Lib/sre_compile.py lib-python/2.7/sre_compile.py
14c14
<
---
> import sre_parse
15a16
> from _sre import MAXREPEAT
27,32d27
< def set(seq):
<     s = {}
<     for elem in seq:
<         s[elem] = 1
<     return s
<
308,310d302
<     # problems with optimization in Jython, forget about it for now
<     return charset
<
353,355c345
<         # change this for Jython from 'I', since that will expand to
<         # long, and cause needless complexity (or so it seems)
<         code = 'i'
---
>         code = 'I'
510d499
<         import sre_parse

The latter change in the diff is not even used - optimization as a whole for Unicode is skipped. I don't see this as an issue.

Because re contains a cache of compile REs, it's possible there is a chance of reentrancy issues. However my simple stress test running 20 threads for 15 minutes didn't see this issue. We have the same re.py as in CPython 2.7, but there's a possibility that we need to update.

Maybe you can explicitly list REs that this problem arises on? Right now, it works for me.
msg10373 (view) Author: helix84 (helix84) Date: 2015-10-23.08:38:47
I also got this error. I just upgraded from jython 2.5.2 to 2.7.0. The script ran successfully (with the same input) the first time, but threw this error after the servlet container was restarted.

  File "...", line 15, in doPost
    pattern = re.compile(r"^[0-9Xx]{8}$")
...
  File "/.../WEB-INF/lib/Lib/sre_compile.py", line 182, in _compile
    raise ValueError, ("unsupported operand type", op)
ValueError: ('unsupported operand type', 'at')


Debian Jessie
Tomcat 8.0.14-1
OpenJDK 7u79-2.5.6-1~deb8u1
msg10374 (view) Author: helix84 (helix84) Date: 2015-10-23.08:53:52
Wow, my script was missing "import re". After I restarted the servlet container again, I got "NameError: global name 're' is not defined", as I should. This script works (always has) with 2.5.2 and once with 2.7.0, as I wrote.
msg10401 (view) Author: Jim Baker (zyasoft) Date: 2015-10-29.02:09:30
helix84, to clarify: you still get this problem, correct?
msg10402 (view) Author: helix84 (helix84) Date: 2015-10-29.08:57:16
Sorry for being unclear.

pattern = re.compile(r"^[0-9Xx]{8}$")

2.5.2 without import re: works (!)
2.5.2 with import re: works
2.7.0 without import re: intermittent "ValueError: ('unsupported operand type', 'at')"
2.7.0 with import re: works
msg10403 (view) Author: helix84 (helix84) Date: 2015-10-29.09:03:18
In case it helps, here are the complete imports of this script:

import sys
import httplib
import re
from javax.servlet.http import HttpServlet

Is it possible that "import re" somehow "leaks in" from some other part of jython / libs?
msg10411 (view) Author: Jim Baker (zyasoft) Date: 2015-10-29.22:38:45
I'm thinking this is a PySystemState lifecycle issue, per #2321. I'm glad that correct code works correctly however.
msg10415 (view) Author: marvin greenberg (public.marvin) Date: 2015-10-30.02:11:21
We were correctly importing re module, and were seeing intermittent failures.  Or code never omitted the 'import re' (which would be a strange thing to do)
msg10420 (view) Author: Jim Baker (zyasoft) Date: 2015-10-30.21:32:28
Marvin, I need a small test case that reproduces this problem.
msg10435 (view) Author: helix84 (helix84) Date: 2015-11-06.09:16:33
Sorry, "import re" turned out to be a red herring (the fact that in 2.5.2  and even in 2.7.0 calling re without importing it works when it shouldn't remains, but it seems to be a separate issue).

This problem just occurred again. After tomcat restart, first call of the script worked, later calls didn't.

To make the script work again, it was actually enough to save the .py file again unchanged (that's why adding "import re" seemingly helped the first time - I saved the file).

I'm using standard stuff in web.xml:
        <servlet>
                <servlet-name>PyServlet</servlet-name>
                <servlet-class>org.python.util.PyServlet</servlet-class>
                <load-on-startup>1</load-on-startup>
        </servlet>
        <servlet-mapping>
                <servlet-name>PyServlet</servlet-name>
                <url-pattern>*.py</url-pattern>
        </servlet-mapping>
msg10438 (view) Author: Artem (asalabaev) Date: 2015-11-07.20:31:06
We are seeing it on 2.5.3 very occasionally but it happens from time to time.
msg10439 (view) Author: Artem (asalabaev) Date: 2015-11-07.20:36:00
The code:
startservermatch = re.match('\[(.*)\]', startserverstring)


  File "/opt/IBM/WebSphere/AppServer/profiles/AppSrv02/jython2.5.3/Lib/re.py", line 137, in match
    return _compile(pattern, flags).match(string)
  File "/opt/IBM/WebSphere/AppServer/profiles/AppSrv02/jython2.5.3/Lib/re.py", line 239, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/opt/IBM/WebSphere/AppServer/profiles/AppSrv02/jython2.5.3/Lib/re.py", line 239, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/opt/IBM/WebSphere/AppServer/profiles/AppSrv02/jython2.5.3/Lib/sre_compile.py", line 516, in compile
    code = _code(p, flags)
  File "/opt/IBM/WebSphere/AppServer/profiles/AppSrv02/jython2.5.3/Lib/sre_compile.py", line 500, in _code
    _compile(code, p.data, flags)
  File "/opt/IBM/WebSphere/AppServer/profiles/AppSrv02/jython2.5.3/Lib/sre_compile.py", line 182, in _compile
    raise ValueError, ("unsupported operand type", op)
ValueError: ('unsupported operand type', 'subpattern')

On the other site I found http://grinder.sourceforge.net/faq.html#re-problems which suggests import re everywhere but it obviously does not help
msg10442 (view) Author: Artem (asalabaev) Date: 2015-11-09.13:02:26
And (sorry for spamming) today the error occurred from another python code:

  File "/opt/IBM/WebSphere/AppServer/profiles/AppSrv02/jython2.5.3/Lib/urllib.py", line 89, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "/opt/IBM/WebSphere/AppServer/profiles/AppSrv02/jython2.5.3/Lib/urllib.py", line 222, in retrieve
    fp = self.open(url, data)
  File "/opt/IBM/WebSphere/AppServer/profiles/AppSrv02/jython2.5.3/Lib/urllib.py", line 190, in open
    return getattr(self, name)(url)
  File "/opt/IBM/WebSphere/AppServer/profiles/AppSrv02/jython2.5.3/Lib/urllib.py", line 190, in open
    return getattr(self, name)(url)
  File "/opt/IBM/WebSphere/AppServer/profiles/AppSrv02/jython2.5.3/Lib/urllib.py", line 278, in open_http
    user_passwd, host = splituser(host)
  File "/opt/IBM/WebSphere/AppServer/profiles/AppSrv02/jython2.5.3/Lib/urllib.py", line 1047, in splituser
    _userprog = re.compile('^(.*)@(.*)$')
  File "/opt/IBM/WebSphere/AppServer/profiles/AppSrv02/jython2.5.3/Lib/re.py", line 188, in compile
    return _compile(pattern, flags)
  File "/opt/IBM/WebSphere/AppServer/profiles/AppSrv02/jython2.5.3/Lib/re.py", line 239, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/opt/IBM/WebSphere/AppServer/profiles/AppSrv02/jython2.5.3/Lib/re.py", line 239, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/opt/IBM/WebSphere/AppServer/profiles/AppSrv02/jython2.5.3/Lib/sre_compile.py", line 516, in compile
    code = _code(p, flags)
  File "/opt/IBM/WebSphere/AppServer/profiles/AppSrv02/jython2.5.3/Lib/sre_compile.py", line 500, in _code
    _compile(code, p.data, flags)
  File "/opt/IBM/WebSphere/AppServer/profiles/AppSrv02/jython2.5.3/Lib/sre_compile.py", line 182, in _compile
    raise ValueError, ("unsupported operand type", op)
ValueError: ('unsupported operand type', 'at')

Hope this helps. The jython is a so great tool, there's one thing that worries. I wonder if it is possible to rewrite the re with java regex package and if it can be more stable. Does it make any sense? I can offer my help in whatever you find useful to resolve this annoying issue
msg10443 (view) Author: Jim Baker (zyasoft) Date: 2015-11-09.17:48:05
Artem, this is definitely not spamming - it's extremely helpful in pinning down the issue. So we know the following:

* This problem happens on both 2.5.3 and 2.7.0. So this rules out the changes in PySystemState lifecycle to prevent non GC-able class loaders (#2321) as being the root cause. (But it is still possible that this change makes the problem surface more frequently.)

* We see this in occurring in simple regular expressions.

* Reloads of class code trigger this problem. It is possibly related to the PyServlet implementation.

* The advice on the Grinder docs (http://grinder.sourceforge.net/faq.html#re-problems) is very interesting to me. It is certainly unusual to use imports like this at a thread level. I even discuss this in the Jython book as a significant source of performance degradation due to Python's module import lock. (http://www.jython.org/jythonbook/en/1.0/Concurrency.html#module-import-lock) Is it possible that the `import re` ensures a strong reference to the SRE engine, and how it interacts with servlet (and thus class) reloading?

Jython users can certainly use Java's built-in regex support (http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html), but it's not possible for us to use as the basis of Jython's re engine, because of somewhat different semantics. There is an outstanding bug (#2408) to redo the SRE port, but purely for performance issues.
msg10444 (view) Author: Jim Baker (zyasoft) Date: 2015-11-09.21:50:24
Another thought: maybe acquiring the module import lock is masking a multithreaded issue?

Let's look at what happens with `import re`:

re.py -> sre_compile.py -> _sre.java, sre_constants.py (which also imports from _sre.java for MAXREPEAT). Except for _sre.java and its supporting Java classes, the remainder are pure Python and these Python modules in all cases except what I mentioned in msg10275, identical source with CPython.

Some things that I note:

* The specific constants that are missing (as reported by ValueError("unsupported operand type", op)) are not seen in the failed compilations in the _compile function in sre_compile.py are defined in pure Python: sre_constants.AT (= "at"), sre_constants.IN (= "in")

* sre_compile._compile function does a conditional import of sre_parse.py.

* constants like sre_constants.IN, etc, are compared using identity with the `is` operator.

So this begs the question: is it possible that a parse of a regex pattern occurs with one set of constants from sre_constants.py, using one version of sre_parse.py; unloading of sre_constants.py occurs; and the compilation is attempted with a reloaded version of sre_constants.py?

This possible interleaving seems to be crazy, but could explain why the module import lock acquistion can help, as seen in The Grinder docs.

One possible way to test whether this is occurring as I suggest is to change lines like the following in sre_compile._compile from, for example:

        elif op is AT:

to

        elif op == AT:

etc.

Users who want to try this out can make the suggested code changes in a test environment and see if this in any way helps.
msg10446 (view) Author: Artem (asalabaev) Date: 2015-11-10.05:44:37
Jim, 
That was quite fast..

I followed your suggestion and modified the sre_compile code like as follows (a few cases shown only):

        elif op is IN or op == IN:
            print "op is IN or op == IN %s %s" % (op is IN, op == IN)


        elif op is ANY or op == ANY:
            print "op is ANY or op == ANY %s %s " %(op is ANY, op == ANY) 


        elif op is SUBPATTERN or op == SUBPATTERN:
            print "op is SUBPATTERN or op == SUBPATTERN %s %s " % (op is SUBPATTERN, op == SUBPATTERN)


        elif op is AT or op == AT:
            print "op is AT or op == AT %s %s" % (op is AT,  op == AT) 


At the first test I got in the system out:

[11/10/15 8:22:51:307 MSK] 00000033 SystemOut     O op is BRANCH or op == BRANCH True True
[11/10/15 8:22:51:307 MSK] 00000033 SystemOut     O op is SUBPATTERN or op == SUBPATTERN True True
[11/10/15 8:22:51:307 MSK] 00000033 SystemOut     O op is SUBPATTERN or op == SUBPATTERN True True
[11/10/15 8:22:51:307 MSK] 00000033 SystemOut     O op is IN or op == IN True True
[11/10/15 8:22:51:308 MSK] 00000033 SystemOut     O op is IN or op == IN True True
[11/10/15 8:22:51:310 MSK] 00000033 SystemOut     O op is SUBPATTERN or op == SUBPATTERN True True
[11/10/15 8:22:51:310 MSK] 00000033 SystemOut     O op is IN or op == IN True True
[11/10/15 8:22:51:311 MSK] 00000033 SystemOut     O op is IN or op == IN True True
[11/10/15 8:22:51:313 MSK] 00000033 SystemOut     O op is SUBPATTERN or op == SUBPATTERN True True
[11/10/15 8:22:53:456 MSK] 00000033 SystemOut     O op is SUBPATTERN or op == SUBPATTERN True True
[11/10/15 8:22:53:457 MSK] 00000033 SystemOut     O op is ANY or op == ANY True True
[11/10/15 8:22:53:741 MSK] 00000036 SystemOut     O op is SUBPATTERN or op == SUBPATTERN True True
[11/10/15 8:22:53:741 MSK] 00000036 SystemOut     O op is BRANCH or op == BRANCH True True
[11/10/15 8:22:53:741 MSK] 00000036 SystemOut     O op is SUBPATTERN or op == SUBPATTERN True True
[11/10/15 8:22:53:742 MSK] 00000036 SystemOut     O op is SUBPATTERN or op == SUBPATTERN True True
[11/10/15 8:22:53:742 MSK] 00000036 SystemOut     O op is IN or op == IN True True
[11/10/15 8:22:53:742 MSK] 00000036 SystemOut     O op is IN or op == IN True True
[11/10/15 8:22:53:744 MSK] 00000036 SystemOut     O op is SUBPATTERN or op == SUBPATTERN True True
[11/10/15 8:22:53:744 MSK] 00000036 SystemOut     O op is IN or op == IN True True
[11/10/15 8:22:53:744 MSK] 00000036 SystemOut     O op is IN or op == IN True True
[11/10/15 8:22:53:746 MSK] 00000036 SystemOut     O op is SUBPATTERN or op == SUBPATTERN True True
[11/10/15 8:22:54:094 MSK] 00000032 SystemOut     O op is IN or op == IN True True
[11/10/15 8:22:55:458 MSK] 00000032 SystemOut     O op is IN or op == IN True True
[11/10/15 8:22:55:458 MSK] 00000032 SystemOut     O op is AT or op == AT True True
[11/10/15 8:22:55:460 MSK] 00000032 SystemOut     O op is SUBPATTERN or op == SUBPATTERN True True
[11/10/15 8:22:55:460 MSK] 00000032 SystemOut     O op is ANY or op == ANY True True
[11/10/15 8:22:55:461 MSK] 00000032 SystemOut     O op is IN or op == IN True True
[11/10/15 8:22:55:461 MSK] 00000032 SystemOut     O op is SUBPATTERN or op == SUBPATTERN True True
[11/10/15 8:22:55:461 MSK] 00000032 SystemOut     O op is IN or op == IN True True
[11/10/15 8:22:55:462 MSK] 00000032 SystemOut     O op is IN or op == IN True True
[11/10/15 8:22:55:462 MSK] 00000032 SystemOut     O op is SUBPATTERN or op == SUBPATTERN True True
[11/10/15 8:22:56:721 MSK] 00000036 SystemOut     O op is SUBPATTERN or op == SUBPATTERN False True
[11/10/15 8:22:56:721 MSK] 00000036 SystemOut     O op is SUBPATTERN or op == SUBPATTERN False True
[11/10/15 8:22:56:721 MSK] 00000036 SystemOut     O op is ANY or op == ANY False True
[11/10/15 8:22:57:038 MSK] 00000036 SystemOut     O op is SUBPATTERN or op == SUBPATTERN False True
[11/10/15 8:22:57:038 MSK] 00000036 SystemOut     O op is SUBPATTERN or op == SUBPATTERN False True
[11/10/15 8:22:57:039 MSK] 00000036 SystemOut     O op is ANY or op == ANY False True

so all of a sudden  "op is SUBPATTERN" and "op is ANY" (actually everything) started evaluating to False, while still being  ==. this is a full log I got at that time. Those log lines could come from different threads, so then I added

import threading 

before the def _compile(

and the threading.currentThread() in the the printed output and as you can imagine, the issue since then did not reproduce. I am happy to spend more time attempting getting a "False" if you think it might be helpful to you.

I tried searching the difference between is and == and could not find anything yet, could you provide a hint where I can educate myself?

and the more important questions:
- Is this solution is robust enough? 
- if yes,  should / can I use the modified the sre_compile version? The issue I am having is that the I am limited with the java version enforced by websphere so I cannot switch to the 2.7, any chance it will be fixed in 2.5.X ?

Thank you very much anyways!
msg10447 (view) Author: Artem (asalabaev) Date: 2015-11-10.06:20:56
Sorry, found the answer myself in http://eev.ee/blog/2012/03/24/python-faq-equality/

So I am guessing the original sre_compile code compared the PyString object references since they are supposed to be static constants for the sake of performance rather than the their values? Your idea about multi threading with classes reloads seems very correct. In my particular test I do reloads of imports (literally reload(__import__(module) ). In our implementation of Jython we automatically reload classes as soon as the jython code files change (pretty much like JSPs) so it all looks like you are just right about the real cause of an issue in at least our environment.
msg10449 (view) Author: Jim Baker (zyasoft) Date: 2015-11-10.16:25:40
Artem, I'm very glad you could conduct that test! This is very helpful.

Using `is` to compare against singletons is a standard example of when identity comparisons are appropriate in Python (or other languages). But as we see, sre_constants.AT, etc, are not singletons when we consider module reloading.

The minor possible speedup of comparing references in regular expression compilation instead of values is a good example of premature optimization.

I'm not certain about being able to backport 2.7.x changes to 2.5.x There is an stalled release candidate process for 2.5.4 that has been that way for a number of years. It is possible that this bug might be worth fixing in 2.5.4 because of its simplicity: we know that value equality should always work in this compilation; and now we know that identity comparison does not. Also I don't believe there has been any changes in this particular code since 2.5. And by fixing this bug, we might get over the hump of getting 2.5.4 really out.
msg10489 (view) Author: Jim Baker (zyasoft) Date: 2015-11-21.01:19:47
Let's get this into 2.7.1 as we close it up
msg10493 (view) Author: Jim Baker (zyasoft) Date: 2015-11-21.16:59:57
Fixed as of https://hg.python.org/jython/rev/f048badad7f7

Goes nicely with the other re fixes that just landed
msg10494 (view) Author: helix84 (helix84) Date: 2015-11-23.09:56:53
Thank you for the fix. I'd love to test it before release, but I still don't know how to reliably reproduce the problem. I tried:
import re
reload(__import__('re'))
msg10538 (view) Author: Jim Baker (zyasoft) Date: 2015-12-15.03:42:59
@helix84 - given this is almost certainly a classloader issue, it is certainly going to be subtle to reproduce outside of deploying and testing in a similar environment. However, the changes I made are certainly safe in that they preserve the desired equality. Most likely we will see real testing as part of the planned 2.7.1 beta 3.
History
Date User Action Args
2015-12-28 17:34:33zyasoftsetstatus: pending -> closed
2015-12-15 03:43:00zyasoftsetmessages: + msg10538
2015-11-23 09:56:54helix84setmessages: + msg10494
2015-11-21 16:59:57zyasoftsetstatus: open -> pending
resolution: accepted -> fixed
messages: + msg10493
2015-11-21 01:19:48zyasoftsetpriority: urgent
messages: + msg10489
2015-11-12 04:47:01zyasoftsetassignee: zyasoft
milestone: Jython 2.7.1
2015-11-10 16:25:41zyasoftsetnosy: + fwierzbicki
messages: + msg10449
2015-11-10 06:20:57asalabaevsetmessages: + msg10447
2015-11-10 05:44:39asalabaevsetmessages: + msg10446
2015-11-09 21:50:24zyasoftsetmessages: + msg10444
2015-11-09 17:48:23zyasoftsetresolution: works for me -> accepted
2015-11-09 17:48:06zyasoftsetmessages: + msg10443
2015-11-09 13:02:28asalabaevsetmessages: + msg10442
2015-11-07 20:36:00asalabaevsetmessages: + msg10439
2015-11-07 20:31:07asalabaevsetnosy: + asalabaev
messages: + msg10438
2015-11-06 09:16:35helix84setmessages: + msg10435
2015-10-30 21:32:28zyasoftsetmessages: + msg10420
2015-10-30 02:11:22public.marvinsetmessages: + msg10415
2015-10-29 22:38:46zyasoftsetmessages: + msg10411
2015-10-29 09:03:18helix84setmessages: + msg10403
2015-10-29 08:57:16helix84setmessages: + msg10402
2015-10-29 02:09:30zyasoftsetmessages: + msg10401
2015-10-23 08:53:52helix84setmessages: + msg10374
2015-10-23 08:38:48helix84setnosy: + helix84
messages: + msg10373
2015-09-22 18:24:20zyasoftsetresolution: works for me
messages: + msg10275
2015-09-22 17:50:01zyasoftsetnosy: + zyasoft
messages: + msg10274
2015-09-09 20:18:28public.marvincreate