Issue222815

classification
Title: Bug in MatchObject.group(PyString)
Type:
Severity: normal
Components: Library
Versions:
Milestone:
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: bckfnn
Priority: low
Keywords:

Created on 2000-11-18.19:13:24 by bckfnn, last changed 2000-12-11.18:49:25 by bckfnn.

Messages
msg68 Author: Finn Bock (bckfnn) Date: 2000-11-18.19:13:24
There appears to be a bug either in MatchObject.java or in the regular
expression package (I've only tested using a distribution WITH
OROMatcher so far).  For more complex regular expressions, referencing
a group in a match object by its name can raise an IndexError even
though the group is defined.  The following code snippet demonstrates
the problem.  I haven't been able to determine exactly why Case 3 works
whereas Case 1 fails even though its pattern defines 'slash'.  Case 2
fails as expected, since its pattern has no 'slash' group.

I have tested under all combinations of JDK 1.1/JDK 1.2 and JPython
1.1beta2/1.1beta3.  The code snippet behaves as expected in
CPython.

-------------------------begin-----------------------------------
import re

# Regular expressions used for parsing

_S = '[ \t\r\n]+'                       # white space
_opS = '[ \t\r\n]*'                     # optional white space
_Name = '[a-zA-Z_:][-a-zA-Z0-9._:]*'    # valid XML name
_QStr = "(?:'[^']*'|\"[^\"]*\")"        # quoted XML string

attrfind = re.compile(
    _S + '(?P<name>' + _Name + ')'
    '(' + _opS + '=' + _opS +
    '(?P<value>'+_QStr+'|[-a-zA-Z0-9.:+*%?!()_#=~]+))?')
starttagend = re.compile(_opS + '(?P<slash>/?)>')


## Case 1
starttagmatch = re.compile('<(?P<tagname>'+_Name+')'
                           '(?P<attrs>(?:'+attrfind.pattern+')*)'+
                           starttagend.pattern)
m = starttagmatch.match('<foo>')
try:
    print '|%s|' % m.group('slash')
except IndexError, e:
    print e


## Case 2
starttagmatch = re.compile('<(?P<tagname>'+_Name+')'
                           '(?P<attrs>(?:'+attrfind.pattern+')*)')
m = starttagmatch.match('<foo>')
try:
    print '|%s|' % m.group('slash')
except IndexError, e:
    print e


## Case 3
r = re.compile('<(?P<tagname>' + _Name + ')'
               + _opS + '(?P<slash>/?)>')
m = r.match('<foo>')
print '|%s|' % m.group('slash')
---------------------------end-----------------------------------

JPython output:
---------------
group 7 is undefined
group 'slash' is undefined
||

CPython output:
---------------
||
group 'slash' is undefined
||
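
One way to narrow this down is to look at how the compiled pattern
numbers its groups.  The sketch below rebuilds the Case 1 pattern and
prints the name-to-index mapping using the standard `groups` and
`groupindex` attributes of a compiled pattern.  It is written for a
current CPython; whether JPython 1.1 exposes these attributes is an
assumption that has not been checked here.  On CPython, 'slash' maps to
group 6 of 6, so the "group 7" in the JPython message hints at a
numbering mismatch, though the report does not confirm the cause.

```python
import re

# Same building blocks as in the report.
_S = '[ \t\r\n]+'                       # white space
_opS = '[ \t\r\n]*'                     # optional white space
_Name = '[a-zA-Z_:][-a-zA-Z0-9._:]*'    # valid XML name
_QStr = "(?:'[^']*'|\"[^\"]*\")"        # quoted XML string

attrfind_pattern = (
    _S + '(?P<name>' + _Name + ')' +
    '(' + _opS + '=' + _opS +
    '(?P<value>' + _QStr + '|[-a-zA-Z0-9.:+*%?!()_#=~]+))?')

# Case 1 pattern from the report.
starttagmatch = re.compile(
    '<(?P<tagname>' + _Name + ')' +
    '(?P<attrs>(?:' + attrfind_pattern + ')*)' +
    _opS + '(?P<slash>/?)>')

# Total number of capturing groups; (?:...) groups are not counted.
print('capturing groups: %d' % starttagmatch.groups)

# Name -> index mapping, listed in index order.
for name, index in sorted(starttagmatch.groupindex.items(),
                          key=lambda item: item[1]):
    print('%-8s -> group %d' % (name, index))

# Look the group up by name and by its numeric index; both should agree.
m = starttagmatch.match('<foo>')
slash_index = starttagmatch.groupindex['slash']
print('|%s| |%s|' % (m.group('slash'), m.group(slash_index)))
```

If the index reported for 'slash' is larger than the total group count
on a given implementation, the name table and the matcher disagree
about how groups are numbered.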
msg69 Author: Finn Bock (bckfnn) Date: 2000-11-19.16:25:53
This works when using the "sre" module instead of "re".
msg70 Author: Finn Bock (bckfnn) Date: 2000-12-11.18:49:25
Closed when we switched to sre, where the bug is fixed.
History
Date                 User    Action  Args
2000-11-18 19:13:24  bckfnn  create