Message270
re.findall throws an exception when processing some strings.
Following is a code fragment which exhibits this.
The only difference between the two strings in the example, okinput and badinput,
is a space between the HTML tags in badinput. Calling findall on okinput works;
calling it on badinput raises an exception.
When run under CPython, both strings are processed without an exception.
- Steven
--- cut here ---
import re

def test():
    imagepattern = '(?P<img><[ \t\n]*img[^>]*>)'
    framepattern = '(?P<frame><[ \t\n]*frame[^>]*>)'
    # embed directives that use a src= construction
    extractSrcTags = '(' + imagepattern + '|' + framepattern + ')'
    okinput = """<img src="foo bar"><frame src=baz>"""
    badinput = """<img src="foo bar"> <frame src=baz>"""
    print re.findall(extractSrcTags, okinput)
    print re.findall(extractSrcTags, badinput)
--- cut here ---
Here is the first bit of the exception:
Traceback (innermost last):
  File "<console>", line 1, in ?
  File "E:\jakarta-tomcat\webapps\python\WEB-INF\source\bug.py", line 14, in test
  File "e:\jython-2.0\Lib\sre.py", line 59, in findall
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
        at java.lang.String.substring(Unknown Source)
        at org.python.modules.sre.SRE_STATE.getslice(SRE_STATE.java:1128)
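
For comparison, here is a minimal sketch of the same check written so that it also runs on a current CPython. It assumes Python 3 print syntax, and the helper name find_src_tags and the upper-case constant names are hypothetical, not part of the original fragment. On CPython both calls should return the same list, since the extra space in badinput lies between the two matches and is simply skipped.

```python
import re

# Same patterns as in the report; raw strings are used here so the
# regex engine, not the Python parser, interprets \t and \n.
IMAGE_PATTERN = r'(?P<img><[ \t\n]*img[^>]*>)'
FRAME_PATTERN = r'(?P<frame><[ \t\n]*frame[^>]*>)'
EXTRACT_SRC_TAGS = '(' + IMAGE_PATTERN + '|' + FRAME_PATTERN + ')'


def find_src_tags(text):
    """Hypothetical helper: return one (outer, img, frame) tuple per tag found."""
    return re.findall(EXTRACT_SRC_TAGS, text)


okinput = '<img src="foo bar"><frame src=baz>'
badinput = '<img src="foo bar"> <frame src=baz>'

# Both inputs contain the same two tags, so the results should be equal.
print(find_src_tags(okinput))
print(find_src_tags(badinput))
```

Because the pattern has three groups, findall returns a tuple per match, with an empty string for whichever of the img or frame groups did not take part in that match.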
Date | User | Action | Args
2008-02-20 17:16:48 | admin | link | issue229746 messages
2008-02-20 17:16:48 | admin | create |