Issue2028

classification
Title: unicode lacks a _formatter_parser()
Type: behaviour
Severity: normal
Components: Library
Versions: Jython 2.7
Milestone:
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: jeff.allen
Nosy List: Arfrever, fwierzbicki, jeff.allen, santa4nt, seletz, zyasoft
Priority: normal
Keywords:

Created on 2013-03-23.11:39:12 by seletz, last changed 2014-06-29.15:44:18 by jeff.allen.

Messages
msg7964 Author: Stefan Eletzhofer (seletz) Date: 2013-03-23.11:43:27
The parse() function in the string.py module (line 620) tries
to return a _formatter_parser from a unicode object.

The _formatter_parser() is available on str, but not on unicode in
the latest jython.
msg7969 Author: Frank Wierzbicki (fwierzbicki) Date: 2013-03-23.20:03:23
Could you provide a small example that shows the bug that results, or the motivating use case? It would be even better if there is a test in the standard lib that is failing or skipped because of this lack if you happen to know of one. That way we can know when we've properly fixed it.
msg7979 Author: Stefan Eletzhofer (seletz) Date: 2013-03-25.07:56:31
I just looked at the "string.py" module in Lib -- there's a comment
about this (line 533ff)::

    ########################################################################
    # the Formatter class
    # see PEP 3101 for details and purpose of this class
    
    # The hard parts are reused from the C implementation.  They're exposed as "_"
    # prefixed methods of str and unicode.
    
    # The overall parser is implemented in str._formatter_parser.
    # The field name parser is implemented in str._formatter_field_name_split

And I looked at the output of dir("foo")  and dir(u"foo").  The former does indeed
have a _formatter_parser() and _formatter_field_name_split() method.

Where are the tests located in a checked-out Jython tree?
msg7980 Author: Stefan Eletzhofer (seletz) Date: 2013-03-25.08:01:06
Ah, the use case -- I tried to run IPython (master) using the latest Jython.  IPython seems to use this method somehow to parse/create its
fancy prompts for its REPL.  They contain colour escapes and are
unicode strings.  I hacked around this issue by force-coercing to
string -- this is surely not the correct way, tho.

--- string.py.orig	2013-03-25 08:59:17.000000000 +0100
+++ string.py	2013-03-25 08:59:29.000000000 +0100
@@ -618,7 +618,7 @@
     # if field_name is not None, it is looked up, formatted
     #  with format_spec and conversion and then used
     def parse(self, format_string):
-        return format_string._formatter_parser()
+        return str(format_string)._formatter_parser()


     # given a field_name, find the object it references.
msg8301 Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) Date: 2014-04-23.07:52:55
Example in CPython 2.7:

>>> import string
>>> string.Formatter().format(u"{a}", a=u"b")
u'b'
>>>

Example in Jython 2.7:

>>> import string
>>> string.Formatter().format(u"{a}", a=u"b")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/share/jython-2.7/Lib/string.py", line 545, in format
    return self.vformat(format_string, args, kwargs)
  File "/usr/share/jython-2.7/Lib/string.py", line 549, in vformat
    result = self._vformat(format_string, args, kwargs, used_args, 2)
  File "/usr/share/jython-2.7/Lib/string.py", line 557, in _vformat
    for literal_text, field_name, format_spec, conversion in \
  File "/usr/share/jython-2.7/Lib/string.py", line 621, in parse
    return format_string._formatter_parser()
AttributeError: 'unicode' object has no attribute '_formatter_parser'
>>>
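
A check along the following lines could serve as the small regression test asked for earlier in this thread. It is only a sketch: the function name `unicode_format_works` is made up for illustration and is not taken from the Jython or CPython test suites.

```python
import string


def unicode_format_works():
    """Return True if string.Formatter can format a unicode template."""
    try:
        result = string.Formatter().format(u"{a}", a=u"b")
    except AttributeError:
        # Raised when the unicode type lacks _formatter_parser().
        return False
    return result == u"b"


print(unicode_format_works())
```

On an interpreter showing the traceback above, the function is expected to return False rather than raise.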
msg8534 Author: Jim Baker (zyasoft) Date: 2014-05-22.01:11:12
Target beta 4, especially given the ipython dependency

Adding Jeff to the nosy list, he probably has some insight here.
msg8574 Author: Jeff Allen (jeff.allen) Date: 2014-05-24.11:59:42
It's not quite as simple as exposing PyString._formatter_parser in PyUnicode too. That would continue to produce byte strings.

The formatting in Jython supports Unicode here:
>>> x = 3405691582
>>> format(x, "#x")
'0xcafebabe'
>>> format(x, u"#x")
u'0xcafebabe'

So the apparatus is there in the elementary formatters. What would actually drive string.Formatter() to produce Unicode answers is for a newly exposed unicode._formatter_parser to tell its stringlib.MarkupIterator (or a sub-class of it?) to make PyUnicode elements instead of PyString elements. Compare CPython:

>>> fs = "{:#x}"
>>> fp = fs._formatter_parser()
>>> list(fp)
[('', '', '#x', None)]

>>> fu = u"{:#x}"
>>> fp = fu._formatter_parser()
>>> list(fp)
[(u'', u'', u'#x', None)]

I think producing unicode elements here in Jython too would probably do the trick.

My first idea is a method PyObject asElement(String s) that returns either a PyString or a PyUnicode, to be used here:
http://hg.python.org/jython/file/5688f9c2b743/src/org/python/core/stringlib/MarkupIterator.java#l62
Maybe absorb the ugly conditional expressions too, if the logic allows.
msg8827 Author: Jeff Allen (jeff.allen) Date: 2014-06-27.13:03:33
I'm taking my usual approach of inspecting and commenting what we've got, ahead of any change. I notice we are also divergent in our handling of field numbering. CPython:
>>> list("hello {!r:#12x} and {!s:8.3f} world!"._formatter_parser())
[('hello ', '', '#12x', 'r'), (' and ', '', '8.3f', 's'), (' world!', None, None, None)]
>>>

Jython:
>>> list("hello {!r:#12x} and {!s:8.3f} world!"._formatter_parser())
[('hello ', '0', '#12x', 'r'), (' and ', '1', '8.3f', 's'), (' world!', None, None, None)]
>>>

Not the cause of this bug, I know.
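
To make the difference concrete, the Jython-style numbering can be reproduced from CPython-style parser output by filling in empty field names with a running index. This is a minimal sketch, not the MarkupIterator code: `auto_number` is a hypothetical helper, and it only numbers top-level fields (nested fields inside a format_spec are left alone).

```python
import string


def auto_number(parsed):
    """Replace empty field names with '0', '1', ... in parser output.

    `parsed` is an iterable of (literal_text, field_name, format_spec,
    conversion) tuples, as yielded by string.Formatter().parse().
    """
    index = 0
    for literal_text, field_name, format_spec, conversion in parsed:
        if field_name == "":
            # An empty name means "next positional argument".
            field_name = str(index)
            index += 1
        yield literal_text, field_name, format_spec, conversion


fmt = "hello {!r:#12x} and {!s:8.3f} world!"
numbered = list(auto_number(string.Formatter().parse(fmt)))
print(numbered)
```

A trailing chunk of literal text has a field name of None rather than an empty string, so it passes through unchanged.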
msg8848 Author: Jeff Allen (jeff.allen) Date: 2014-06-29.15:44:17
This is now fixed in http://hg.python.org/jython/rev/6cffc2f6a643 (I claim). unicode._formatter_field_name_split is needed as well.
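
The second method matters because, as far as I can tell from string.py, Formatter.get_field() relies on the field-name splitter to resolve compound names such as u"a.b[2]". The sketch below exercises that path through the public API only; the `Record` class is a hypothetical container invented for the illustration.

```python
import string


class Record(object):
    """Hypothetical container used only for this illustration."""

    def __init__(self, b):
        self.b = b


formatter = string.Formatter()
# get_field() splits u"a.b[2]" into the first name u"a", then follows
# attribute "b" and index 2 on the object looked up under that name.
value, first = formatter.get_field(u"a.b[2]", (), {u"a": Record([10, 20, 30])})
print(value)
print(first)
```

With a unicode field name, this call can only work if the unicode type provides the splitter as well.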

Jython 2.7b3+ (default:5efdcedc9817+6cffc2f6a643+, Jun 29 2014, 15:46:14)
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.7.0_60
Type "help", "copyright", "credits" or "license" for more information.
>>> list(u"hello {:{0}d} and {}"._formatter_parser())
[(u'hello ', u'0', u'{0}d', None), (u' and ', u'1', u'', None)]
>>> first, rest = u"a.b[2]"._formatter_field_name_split()
>>> first, list(rest)
(u'a', [(True, u'b'), (False, 2)])
>>> import string
>>> string.Formatter().format(u"{} {:*>12d} {:{width}.{prec}f}", 10, 20, 30, width=8, prec=3)
u'10 **********20   30.000'
>>> string.Formatter().format(u"{0}", 10)
'10'
>>>

The last one is interesting because the problem is in string.py (it produces a str). It happens only if the format contains no literal text at all (just replacement fields) and no unicode arguments:
>>> string.Formatter().format(u"{a}", a=u"b")
u'b'
>>> string.Formatter().format(u"{a}", a="b")
'b'
This appears to be a "won't fix" for CPython. See http://bugs.python.org/issue15951 .
History
Date                 User         Action  Args
2014-06-29 15:44:18  jeff.allen   set     status: open -> closed; resolution: accepted -> fixed; messages: + msg8848
2014-06-27 13:03:33  jeff.allen   set     messages: + msg8827
2014-06-16 21:12:02  jeff.allen   set     assignee: jeff.allen
2014-05-24 11:59:43  jeff.allen   set     messages: + msg8574
2014-05-22 01:11:13  zyasoft      set     assignee: fwierzbicki -> (no value); messages: + msg8534; nosy: + jeff.allen, zyasoft
2014-04-23 17:54:35  santa4nt     set     nosy: + santa4nt; type: behaviour
2014-04-23 15:03:23  zyasoft      set     resolution: accepted
2014-04-23 07:52:56  Arfrever     set     nosy: + Arfrever; messages: + msg8301
2013-03-25 08:01:06  seletz       set     messages: + msg7980
2013-03-25 07:56:31  seletz       set     messages: + msg7979
2013-03-23 20:03:43  fwierzbicki  set     priority: normal; assignee: fwierzbicki
2013-03-23 20:03:23  fwierzbicki  set     nosy: + fwierzbicki; messages: + msg7969
2013-03-23 11:43:28  seletz       set     messages: + msg7964
2013-03-23 11:39:44  seletz       set     title: unicude lacks a _formatter_parser() -> unicode lacks a _formatter_parser()
2013-03-23 11:39:12  seletz       create