Issue2028

classification

Title:	unicode lacks a _formatter_parser()
Type:	behaviour	Severity:	normal
Components:	Library	Versions:	Jython 2.7
		Milestone:

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	jeff.allen	Nosy List:	Arfrever, fwierzbicki, jeff.allen, santa4nt, seletz, zyasoft
Priority:	normal	Keywords:

Created on 2013-03-23.11:39:12 by seletz, last changed 2014-06-29.15:44:18 by jeff.allen.

Messages
msg7964 (view)	Author: Stefan Eletzhofer (seletz)	Date: 2013-03-23.11:43:27
The parse() function in the string.py module (line 620) tries to return a _formatter_parser from a unicode object. The _formatter_parser() method is available on str, but not on unicode, in the latest Jython.
msg7969 (view)	Author: Frank Wierzbicki (fwierzbicki)	Date: 2013-03-23.20:03:23
Could you provide a small example that shows the bug that results, or the motivating use case? It would be even better if there is a test in the standard lib that is failing or skipped because of this lack if you happen to know of one. That way we can know when we've properly fixed it.
msg7979 (view)	Author: Stefan Eletzhofer (seletz)	Date: 2013-03-25.07:56:31
I just looked at the "string.py" module in Lib -- there's a comment about this (line 533ff)::

########################################################################
# the Formatter class
# see PEP 3101 for details and purpose of this class

# The hard parts are reused from the C implementation. They're exposed as "_"
# prefixed methods of str and unicode.

# The overall parser is implemented in str._formatter_parser.
# The field name parser is implemented in str._formatter_field_name_split

And I looked at the output of dir("foo") and dir(u"foo"). The former does indeed have _formatter_parser() and _formatter_field_name_split() methods; the latter has neither.

Where are the tests located in a checked-out Jython tree?
msg7980 (view)	Author: Stefan Eletzhofer (seletz)	Date: 2013-03-25.08:01:06
Ah, the use case -- I tried to run IPython (master) using the latest Jython. IPython seems to use this method somehow to parse/create its fancy prompts for its REPL. They contain colour escapes and are unicode strings.

I hacked around this issue by force-coercing to string -- this is surely not the correct way, though.

--- string.py.orig	2013-03-25 08:59:17.000000000 +0100
+++ string.py	2013-03-25 08:59:29.000000000 +0100
@@ -618,7 +618,7 @@
     # if field_name is not None, it is looked up, formatted
     #  with format_spec and conversion and then used
     def parse(self, format_string):
-        return format_string._formatter_parser()
+        return str(format_string)._formatter_parser()
 
 
     # given a field_name, find the object it references.
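The same coercion can be tried without editing string.py by overriding parse() in a subclass. This is only a sketch of the workaround above, not a fix. CoercingFormatter is a hypothetical name, and the code assumes an ASCII-only format string: on Python 2, str() raises UnicodeEncodeError for non-ASCII unicode, while on Python 3 the coercion is a no-op.

```python
import string


class CoercingFormatter(string.Formatter):
    """Hypothetical workaround class; not part of Jython or IPython."""

    def parse(self, format_string):
        # Coerce to str before parsing, as in the patch above.
        # Assumption: the format string is pure ASCII. On Python 2 a
        # non-ASCII unicode string makes str() raise UnicodeEncodeError.
        return string.Formatter.parse(self, str(format_string))


result = CoercingFormatter().format(u"{a}", a=u"b")
print(repr(result))
```

Because the format string is coerced to a byte string, the parsed pieces are byte strings too on Python 2, which is why this only hides the missing method rather than providing proper unicode support.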
msg8301 (view)	Author: Arfrever Frehtes Taifersar Arahesis (Arfrever)	Date: 2014-04-23.07:52:55
Example in CPython 2.7:

>>> import string
>>> string.Formatter().format(u"{a}", a=u"b")
u'b'
>>>

Example in Jython 2.7:

>>> import string
>>> string.Formatter().format(u"{a}", a=u"b")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/share/jython-2.7/Lib/string.py", line 545, in format
    return self.vformat(format_string, args, kwargs)
  File "/usr/share/jython-2.7/Lib/string.py", line 549, in vformat
    result = self._vformat(format_string, args, kwargs, used_args, 2)
  File "/usr/share/jython-2.7/Lib/string.py", line 557, in _vformat
    for literal_text, field_name, format_spec, conversion in \
  File "/usr/share/jython-2.7/Lib/string.py", line 621, in parse
    return format_string._formatter_parser()
AttributeError: 'unicode' object has no attribute '_formatter_parser'
>>>
msg8534 (view)	Author: Jim Baker (zyasoft)	Date: 2014-05-22.01:11:12
Target beta 4, especially given the ipython dependency.

Adding Jeff to the nosy list, he probably has some insight here.
msg8574 (view)	Author: Jeff Allen (jeff.allen)	Date: 2014-05-24.11:59:42
It's not quite as simple as exposing PyString._formatter_parser in PyUnicode too. That would continue to produce byte strings. The formatting in Jython supports Unicode here:

>>> x = 3405691582
>>> format(x, "#x")
'0xcafebabe'
>>> format(x, u"#x")
u'0xcafebabe'

So the apparatus is there in the elementary formatters. The thing that would drive it actually to produce Unicode answers from string.Formatter() is for (a newly exposed) unicode._formatter_parser to tell (sub-class?) its stringlib.MarkupIterator to make PyUnicode elements, instead of PyString elements. Compare CPython:

>>> fs = "{:#x}"
>>> fp = fs._formatter_parser()
>>> list(fp)
[('', '', '#x', None)]
>>> fu = u"{:#x}"
>>> fp = fu._formatter_parser()
>>> list(fp)
[(u'', u'', u'#x', None)]

I think unicode elements here in Jython too would probably do the trick. My first idea is a PyObject asElement(String s) to return either PyString or PyUnicode, and used here:
http://hg.python.org/jython/file/5688f9c2b743/src/org/python/core/stringlib/MarkupIterator.java#l62
Maybe absorb the ugly conditional expressions too, if the logic allows.
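One way to check the behaviour described here is to collect the types of the elements the parser yields. A minimal sketch, using only the public string.Formatter().parse() wrapper; element_types is a hypothetical helper name, not something in Jython or CPython. On CPython 2.7 a unicode format string should give only unicode elements, whereas a parser that always builds byte strings would report str.

```python
import string


def element_types(format_string):
    """Return the set of types of the non-None parser elements."""
    types = set()
    for parsed in string.Formatter().parse(format_string):
        # Each item is (literal_text, field_name, format_spec, conversion).
        for element in parsed:
            if element is not None:
                types.add(type(element))
    return types


text_type = type(u"")  # unicode on Python 2, str on Python 3
print(element_types(u"{:#x}") == {text_type})
```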
msg8827 (view)	Author: Jeff Allen (jeff.allen)	Date: 2014-06-27.13:03:33
I'm taking my usual approach of inspecting and commenting what we've got, ahead of any change. I notice we are also divergent in our handling of field numbering.

CPython:
>>> list("hello {!r:#12x} and {!s:8.3f} world!"._formatter_parser())
[('hello ', '', '#12x', 'r'), (' and ', '', '8.3f', 's'), (' world!', None, None, None)]
>>>

Jython:
>>> list("hello {!r:#12x} and {!s:8.3f} world!"._formatter_parser())
[('hello ', '0', '#12x', 'r'), (' and ', '1', '8.3f', 's'), (' world!', None, None, None)]
>>>

Not the cause of this bug, I know.
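The divergence in field numbering can be turned into a small check. This is a sketch that takes CPython's behaviour (empty field names for auto-numbered fields) as the reference; field_names is a hypothetical helper, and it goes through the public string.Formatter().parse() rather than the underscore-prefixed method.

```python
import string


def field_names(format_string):
    """Return the field_name of each replacement field, in order."""
    return [
        field_name
        for _literal, field_name, _spec, _conversion
        in string.Formatter().parse(format_string)
        if field_name is not None  # trailing literal text has no field
    ]


names = field_names("hello {!r:#12x} and {!s:8.3f} world!")
print(names)
```

On CPython both names are empty strings; with the Jython output quoted above, the same call would give '0' and '1', because the parser fills in the automatic numbers itself.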
msg8848 (view)	Author: Jeff Allen (jeff.allen)	Date: 2014-06-29.15:44:17
This is now fixed in http://hg.python.org/jython/rev/6cffc2f6a643 (I claim). unicode._formatter_field_name_split is needed as well.

Jython 2.7b3+ (default:5efdcedc9817+6cffc2f6a643+, Jun 29 2014, 15:46:14)
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.7.0_60
Type "help", "copyright", "credits" or "license" for more information.
>>> list(u"hello {:{0}d} and {}"._formatter_parser())
[(u'hello ', u'0', u'{0}d', None), (u' and ', u'1', u'', None)]
>>> first, rest = u"a.b[2]"._formatter_field_name_split()
>>> first, list(rest)
(u'a', [(True, u'b'), (False, 2)])
>>> import string
>>> string.Formatter().format(u"{} {:>12d} {:{width}.{prec}f}", 10, 20, 30, width=8, prec=3)
u'10 *********20 30.000'
>>> string.Formatter().format(u"{0}", 10)
'10'
>>>

The last one is interesting because the problem is in string.py (it produces a str). It happens only if the format contains no literal text at all (just replacement fields) and no unicode arguments:

>>> string.Formatter().format(u"{a}", a=u"b")
u'b'
>>> string.Formatter().format(u"{a}", a="b")
'b'

This appears to be a "won't fix" for CPython. See http://bugs.python.org/issue15951 .

History
Date	User	Action	Args
2014-06-29 15:44:18	jeff.allen	set	status: open -> closed resolution: accepted -> fixed messages: + msg8848
2014-06-27 13:03:33	jeff.allen	set	messages: + msg8827
2014-06-16 21:12:02	jeff.allen	set	assignee: jeff.allen
2014-05-24 11:59:43	jeff.allen	set	messages: + msg8574
2014-05-22 01:11:13	zyasoft	set	assignee: fwierzbicki -> (no value) messages: + msg8534 nosy: + jeff.allen, zyasoft
2014-04-23 17:54:35	santa4nt	set	nosy: + santa4nt type: behaviour
2014-04-23 15:03:23	zyasoft	set	resolution: accepted
2014-04-23 07:52:56	Arfrever	set	nosy: + Arfrever messages: + msg8301
2013-03-25 08:01:06	seletz	set	messages: + msg7980
2013-03-25 07:56:31	seletz	set	messages: + msg7979
2013-03-23 20:03:43	fwierzbicki	set	priority: normal assignee: fwierzbicki
2013-03-23 20:03:23	fwierzbicki	set	nosy: + fwierzbicki messages: + msg7969
2013-03-23 11:43:28	seletz	set	messages: + msg7964
2013-03-23 11:39:44	seletz	set	title: unicude lacks a _formatter_parser() -> unicode lacks a _formatter_parser()
2013-03-23 11:39:12	seletz	create