Message5629

Author pjenvey
Recipients dvska, pjenvey
Date 2010-04-04.18:38:03
Message-id <1270406283.85.0.801758770522.issue1483@psf.upfronthosting.co.za>
In-reply-to
Content
The problem here is that textwrap.wrap doesn't handle unicode input correctly on Jython. CPython returns the text as a single line, while Jython returns an empty list:

Python 2.5.4 (r254:67916, Jul  7 2009, 23:51:24) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import textwrap
>>> text = u'\u0443\u043a\u0430\u0437\u0430\u0442\u044c \u043f\u0440\u0438\u0447\u0438\u043d\u0443 \u0438\u0437\u043c\u0435\u043d\u0435\u043d\u0438\u044f'
>>> textwrap.wrap(text, 54)
[u'\u0443\u043a\u0430\u0437\u0430\u0442\u044c \u043f\u0440\u0438\u0447\u0438\u043d\u0443 \u0438\u0437\u043c\u0435\u043d\u0435\u043d\u0438\u044f']

Jython 2.5.1+ (trunk:6995:6999M, Apr 4 2010, 11:01:22) 
[Java HotSpot(TM) 64-Bit Server VM (Apple Inc.)] on java1.6.0_17
Type "help", "copyright", "credits" or "license" for more information.
>>> import textwrap
>>> text = u'\u0443\u043a\u0430\u0437\u0430\u0442\u044c \u043f\u0440\u0438\u0447\u0438\u043d\u0443 \u0438\u0437\u043c\u0435\u043d\u0435\u043d\u0438\u044f'
>>> textwrap.wrap(text, 54)
[]

textwrap relies heavily on regular expressions, so my guess is that this is a bug in how Jython's re module handles unicode input.
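
One way to test that guess might be to run the regex split on its own, outside textwrap. The following is only a sketch: simple_sep is a simplified stand-in for textwrap's internal word-separator pattern (the real one is more elaborate), and split_chunks is a hypothetical helper, not part of textwrap. If the chunk list comes back empty or incomplete on Jython, that would point at re rather than at textwrap itself.

```python
import re
import textwrap

text = u'\u0443\u043a\u0430\u0437\u0430\u0442\u044c \u043f\u0440\u0438\u0447\u0438\u043d\u0443 \u0438\u0437\u043c\u0435\u043d\u0435\u043d\u0438\u044f'

# Assumption: a simplified stand-in for textwrap's word-separator regex.
# The real pattern also deals with hyphenated words and dashes.
simple_sep = re.compile(r'(\s+)')


def split_chunks(s):
    """Split s into word and whitespace chunks, dropping empty strings."""
    return [chunk for chunk in simple_sep.split(s) if chunk]


# Step 1: the regex split alone. A correct re module keeps every character.
chunks = split_chunks(text)
print(repr(chunks))

# Step 2: the full wrap. The text is shorter than the width, so a correct
# implementation returns it unchanged as a single line.
wrapped = textwrap.wrap(text, 54)
print(repr(wrapped))
```

Comparing the two printed values on CPython and Jython should show whether the text is already lost at the split step or only later in the wrapping logic.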
History
Date                 User     Action  Args
2010-04-04 18:38:03  pjenvey  set     messageid: <1270406283.85.0.801758770522.issue1483@psf.upfronthosting.co.za>
2010-04-04 18:38:03  pjenvey  set     recipients: + pjenvey, dvska
2010-04-04 18:38:03  pjenvey  link    issue1483 messages
2010-04-04 18:38:03  pjenvey  create