Issue1483

classification
Title: optparse std module dies on non-ASCII unicode data
Type: behaviour Severity: normal
Components: Library Versions: 2.5.1
Milestone:
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: dvska, pjenvey
Priority: Keywords:

Created on 2009-10-01.16:50:56 by dvska, last changed 2010-04-11.17:37:48 by pjenvey.

Files
File name Uploaded Description
optparse_index_out_of_range_0_test.py dvska, 2009-10-01.16:50:55 tiny test
Messages
msg5215 Author: dvska (dvska) Date: 2009-10-01.16:50:55
Please run the attached file. The result is:

Traceback (most recent call last):
  File "optparse_index_out_of_range_0_test.py", line 19, in <module>
    parser.print_help()
  File "C:\jython25\Lib\optparse.py", line 1657, in print_help
    file.write(self.format_help().encode(encoding, "replace"))
  File "C:\jython25\Lib\optparse.py", line 1637, in format_help
    result.append(self.format_option_help(formatter))
  File "C:\jython25\Lib\optparse.py", line 1617, in format_option_help
    result.append(OptionContainer.format_option_help(self, formatter))
  File "C:\jython25\Lib\optparse.py", line 1066, in format_option_help
    result.append(formatter.format_option(option))
  File "C:\jython25\Lib\optparse.py", line 303, in format_option
    result.append("%*s%s\n" % (indent_first, "", help_lines[0]))
IndexError: index out of range: 0
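The attached test file is not reproduced in this report, so the following is only a guess at the kind of script that triggers the traceback: an OptionParser with one option whose help string is non-ASCII unicode. The option name `--reason`, the usage string, and the help text are assumptions for illustration. The explicit formatter width of 78 is chosen so that the help column comes out 54 characters wide, and the script calls format_help() rather than print_help() to avoid depending on the console encoding.

```python
# -*- coding: utf-8 -*-
# Hypothetical reproduction sketch; NOT the attached
# optparse_index_out_of_range_0_test.py, whose contents are not shown here.
from optparse import OptionParser, IndentedHelpFormatter

# Assumed non-ASCII help text (Cyrillic), written with escapes so the
# source file itself stays ASCII.
help_text = (u'\u0443\u043a\u0430\u0437\u0430\u0442\u044c '
             u'\u043f\u0440\u0438\u0447\u0438\u043d\u0443 '
             u'\u0438\u0437\u043c\u0435\u043d\u0435\u043d\u0438\u044f')

# Fixed width so the result does not depend on the COLUMNS environment variable.
formatter = IndentedHelpFormatter(width=78)
parser = OptionParser(usage=u'%prog [options]', formatter=formatter)

# '--reason' is an illustrative option name, not taken from the report.
parser.add_option('-r', '--reason', dest='reason', help=help_text)

# format_help() wraps each option's help text with textwrap; on an affected
# Jython build the wrapped help is empty and help_lines[0] raises IndexError.
rendered = parser.format_help()
```

On a working interpreter `rendered` contains the help text on its own line under the `--reason` option.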
msg5629 Author: Philip Jenvey (pjenvey) Date: 2010-04-04.18:38:03
The problem here is that textwrap.wrap doesn't handle the unicode input correctly on Jython:

Python 2.5.4 (r254:67916, Jul  7 2009, 23:51:24) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import textwrap
>>> text = u'\u0443\u043a\u0430\u0437\u0430\u0442\u044c \u043f\u0440\u0438\u0447\u0438\u043d\u0443 \u0438\u0437\u043c\u0435\u043d\u0435\u043d\u0438\u044f'
>>> textwrap.wrap(text, 54)
[u'\u0443\u043a\u0430\u0437\u0430\u0442\u044c \u043f\u0440\u0438\u0447\u0438\u043d\u0443 \u0438\u0437\u043c\u0435\u043d\u0435\u043d\u0438\u044f']

Jython 2.5.1+ (trunk:6995:6999M, Apr 4 2010, 11:01:22) 
[Java HotSpot(TM) 64-Bit Server VM (Apple Inc.)] on java1.6.0_17
Type "help", "copyright", "credits" or "license" for more information.
>>> import textwrap
>>> text = u'\u0443\u043a\u0430\u0437\u0430\u0442\u044c \u043f\u0440\u0438\u0447\u0438\u043d\u0443 \u0438\u0437\u043c\u0435\u043d\u0435\u043d\u0438\u044f'
>>> textwrap.wrap(text, 54)
[]

textwrap relies heavily on regular expressions, so my guess is that this is a bug in how the re module deals with unicode input.
msg5660 Author: Philip Jenvey (pjenvey) Date: 2010-04-11.17:37:48
The problem was actually in the unicode.translate method. This was fixed in r7017, thanks.
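For anyone wanting to check an interpreter for this, a minimal sketch follows. It assumes that textwrap normalizes whitespace in unicode input by calling translate with a dict that maps whitespace code points to a space, which is how the CPython 2.5 textwrap module does it; the name `whitespace_map` is made up for this example.

```python
# -*- coding: utf-8 -*-
# Sketch of a check for the translate-based whitespace step, under the
# assumption stated above. Runs on Python 2.5+ and Python 3.3+.
import string
import textwrap

text = (u'\u0443\u043a\u0430\u0437\u0430\u0442\u044c '
        u'\u043f\u0440\u0438\u0447\u0438\u043d\u0443 '
        u'\u0438\u0437\u043c\u0435\u043d\u0435\u043d\u0438\u044f')

# Hypothetical stand-in for textwrap's internal table: every ASCII
# whitespace code point maps to the code point of a plain space.
whitespace_map = dict((ord(ch), ord(u' ')) for ch in string.whitespace)

# The text contains no whitespace other than spaces, so a correct
# translate must return it unchanged.
translated = text.translate(whitespace_map)

# A correct wrap keeps the 25-character text on a single line at width 54.
wrapped = textwrap.wrap(text, 54)
```

If `translated` differs from `text`, or `wrapped` comes back empty, the interpreter is showing the behaviour described in this issue.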
History
Date User Action Args
2010-04-11 17:37:49 pjenvey set status: open -> closed; resolution: fixed; messages: + msg5660
2010-04-04 18:38:03 pjenvey set nosy: + pjenvey; messages: + msg5629
2009-10-01 16:50:56 dvska create