Message5629

Author	pjenvey
Recipients	dvska, pjenvey
Date	2010-04-04.18:38:03
Message-id	<1270406283.85.0.801758770522.issue1483@psf.upfronthosting.co.za>
In-reply-to

Content

The problem here is that textwrap.wrap doesn't handle the unicode input correctly on Jython. CPython returns the text as a single line, while Jython returns an empty list:

Python 2.5.4 (r254:67916, Jul  7 2009, 23:51:24) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import textwrap
>>> text = u'\u0443\u043a\u0430\u0437\u0430\u0442\u044c \u043f\u0440\u0438\u0447\u0438\u043d\u0443 \u0438\u0437\u043c\u0435\u043d\u0435\u043d\u0438\u044f'
>>> textwrap.wrap(text, 54)
[u'\u0443\u043a\u0430\u0437\u0430\u0442\u044c \u043f\u0440\u0438\u0447\u0438\u043d\u0443 \u0438\u0437\u043c\u0435\u043d\u0435\u043d\u0438\u044f']

Jython 2.5.1+ (trunk:6995:6999M, Apr 4 2010, 11:01:22) 
[Java HotSpot(TM) 64-Bit Server VM (Apple Inc.)] on java1.6.0_17
Type "help", "copyright", "credits" or "license" for more information.
>>> import textwrap
>>> text = u'\u0443\u043a\u0430\u0437\u0430\u0442\u044c \u043f\u0440\u0438\u0447\u0438\u043d\u0443 \u0438\u0437\u043c\u0435\u043d\u0435\u043d\u0438\u044f'
>>> textwrap.wrap(text, 54)
[]

textwrap relies heavily on regular expressions, so my guess is that this is a bug in how Jython's re module handles unicode input.
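One way to test that guess is to run a regular expression split on the same text directly and compare it with what textwrap.wrap returns. The following is only a rough sketch: simple_sep and split_chunks are hypothetical names, and the whitespace-only pattern is a simplified stand-in, not the pattern textwrap actually uses. If the chunks come back intact but the wrap result is empty, the fault probably lies in something more specific than plain whitespace splitting.

```python
import re
import textwrap

text = (u'\u0443\u043a\u0430\u0437\u0430\u0442\u044c '
        u'\u043f\u0440\u0438\u0447\u0438\u043d\u0443 '
        u'\u0438\u0437\u043c\u0435\u043d\u0435\u043d\u0438\u044f')

# Hypothetical, simplified splitter: whitespace only.
# This is an assumption for illustration, not textwrap's real pattern.
simple_sep = re.compile(r'(\s+)')


def split_chunks(pattern, s):
    """Split s on pattern, keeping separators and dropping empty strings."""
    return [chunk for chunk in pattern.split(s) if chunk]


# Three words plus the two single-space separators between them.
chunks = split_chunks(simple_sep, text)

# The text is 25 characters long, so a width of 54 should give one line.
wrapped = textwrap.wrap(text, 54)

print(repr(chunks))
print(repr(wrapped))
```

Running the same script under both CPython and Jython and comparing the two printed lines should show whether the re module already misbehaves at the split step.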

History
Date	User	Action	Args
2010-04-04 18:38:03	pjenvey	set	messageid: <1270406283.85.0.801758770522.issue1483@psf.upfronthosting.co.za>
2010-04-04 18:38:03	pjenvey	set	recipients: + pjenvey, dvska
2010-04-04 18:38:03	pjenvey	link	issue1483 messages
2010-04-04 18:38:03	pjenvey	create