Message270

Author sjprocter
Date 2001-01-23.01:47:45
Content
Under Jython 2.0, re.findall throws an exception when processing some strings.

The following code fragment exhibits this.

The only difference between the two strings in the example, okinput and badinput,
is the space between the HTML tags in badinput.  Calling findall on okinput works;
calling it on badinput raises an exception.

When run under CPython, both strings are processed without exception.

                 - Steven

--- cut here ---

import re

def test():
	imagepattern = '(?P<img><[ \t\n]*img[^>]*>)'
	framepattern = '(?P<frame><[ \t\n]*frame[^>]*>)'

	# embed directives that use a src= construction
	extractSrcTags = '(' + imagepattern + '|' + framepattern + ')'

	okinput = """<img src="foo bar"><frame src=baz>"""
	badinput = """<img src="foo bar"> <frame src=baz>"""
	
	print re.findall(extractSrcTags, okinput)
	print re.findall(extractSrcTags, badinput)

--- cut here ---

Here is the first part of the traceback:

Traceback (innermost last):
  File "<console>", line 1, in ?
  File "E:\jakarta-tomcat\webapps\python\WEB-INF\source\bug.py", line 14, in test
  File "e:\jython-2.0\Lib\sre.py", line 59, in findall
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
        at java.lang.String.substring(Unknown Source)
        at org.python.modules.sre.SRE_STATE.getslice(SRE_STATE.java:1128)
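
For comparison, here is a minimal sketch of the same check written for a current CPython (Python 3 syntax, which is an assumption; the report itself targets Jython 2.0). The helper name find_src_tags and the upper-case constant names are illustrative and not part of the original fragment. Because the pattern contains three groups, findall returns one tuple per match, and under CPython the group that did not take part in a match comes back as an empty string.

```python
import re

# Same patterns as in the report, written as raw strings so the regex
# engine (not the string literal) interprets \t and \n.
IMAGE_PATTERN = r'(?P<img><[ \t\n]*img[^>]*>)'
FRAME_PATTERN = r'(?P<frame><[ \t\n]*frame[^>]*>)'

# Embed directives that use a src= construction.
EXTRACT_SRC_TAGS = '(' + IMAGE_PATTERN + '|' + FRAME_PATTERN + ')'


def find_src_tags(text):
    """Return one (outer, img, frame) tuple per tag found in text."""
    return re.findall(EXTRACT_SRC_TAGS, text)


okinput = '<img src="foo bar"><frame src=baz>'
badinput = '<img src="foo bar"> <frame src=baz>'

ok_result = find_src_tags(okinput)
bad_result = find_src_tags(badinput)

print(ok_result)
print(bad_result)
```

Under CPython the two printed lists should be identical, since the extra space in badinput falls between matches and is simply skipped.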
History
Date User Action Args
2008-02-20 17:16:48  admin  link    issue229746 messages
2008-02-20 17:16:48  admin  create