Message8574

Author	jeff.allen
Recipients	Arfrever, fwierzbicki, jeff.allen, santa4nt, seletz, zyasoft
Date	2014-05-24.11:59:42
Message-id	<1400932783.71.0.396638759113.issue2028@psf.upfronthosting.co.za>
In-reply-to

Content

It's not quite as simple as exposing PyString._formatter_parser in PyUnicode too. That would continue to produce byte strings.

The formatting in Jython supports Unicode here:
>>> x = 3405691582
>>> format(x, "#x")
'0xcafebabe'
>>> format(x, u"#x")
u'0xcafebabe'

So the apparatus is there in the elementary formatters. What would actually drive string.Formatter() to produce Unicode results is for a newly exposed unicode._formatter_parser to tell its stringlib.MarkupIterator (perhaps via a subclass?) to make PyUnicode elements instead of PyString elements. Compare CPython:

>>> fs = "{:#x}"
>>> fp = fs._formatter_parser()
>>> list(fp)
[('', '', '#x', None)]

>>> fu = u"{:#x}"
>>> fp = fu._formatter_parser()
>>> list(fp)
[(u'', u'', u'#x', None)]
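For reference, string.Formatter reaches this parser through its public parse() method. As far as I recall, in CPython 2.7 Formatter.parse simply returns format_string._formatter_parser(), so the element type follows the type of the format string. A quick check of the same behaviour in CPython 3 (where str is the Unicode type), offered only as a sketch:

```python
import string

fmt = string.Formatter()

# parse() hands the format string to the built-in markup parser and
# yields (literal_text, field_name, format_spec, conversion) tuples.
print(list(fmt.parse("{:#x}")))         # [('', '', '#x', None)]

# format() drives parse() and then formats each field value.
print(fmt.format("{:#x}", 3405691582))  # 0xcafebabe
```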

I think having unicode elements here in Jython too would probably do the trick.

My first idea is a method PyObject asElement(String s) that returns either a PyString or a PyUnicode, used here:
http://hg.python.org/jython/file/5688f9c2b743/src/org/python/core/stringlib/MarkupIterator.java#l62
It might also absorb the ugly conditional expressions there, if the logic allows.
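To illustrate the asElement idea, here is a toy parser in Python 3, not Jython's actual MarkupIterator. It is a minimal sketch under stated assumptions: bytes stands in for PyString and str for PyUnicode, and the class and the as_element method are hypothetical names. The iterator remembers the type of the format string it was given, and a single as_element method decides the type of every element it yields, instead of a conditional at each construction site. Escaped braces, nested fields and error handling are deliberately left out.

```python
class MarkupIterator:
    """Hypothetical toy parser; bytes ~ PyString, str ~ PyUnicode."""

    def __init__(self, markup):
        # Remember whether the format string was bytes or text.
        self.is_bytes = isinstance(markup, bytes)
        self.text = markup.decode("latin-1") if self.is_bytes else markup
        self.pos = 0

    def as_element(self, s):
        # The one place that chooses the element type (cf. asElement).
        return s.encode("latin-1") if self.is_bytes else s

    def __iter__(self):
        return self

    def __next__(self):
        text, start = self.text, self.pos
        if start >= len(text):
            raise StopIteration
        brace = text.find("{", start)
        if brace < 0:
            # Trailing literal text with no replacement field.
            self.pos = len(text)
            return (self.as_element(text[start:]), None, None, None)
        close = text.index("}", brace)
        self.pos = close + 1
        # Split "name!conv:spec" into its parts.
        field, _, spec = text[brace + 1:close].partition(":")
        name, bang, conv = field.partition("!")
        return (self.as_element(text[start:brace]),
                self.as_element(name),
                self.as_element(spec),
                self.as_element(conv) if bang else None)
```

With this, list(MarkupIterator("{:#x}")) gives [('', '', '#x', None)] and list(MarkupIterator(b"{:#x}")) gives [(b'', b'', b'#x', None)], mirroring the str/unicode pair from CPython above.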

History
Date	User	Action	Args
2014-05-24 11:59:43	jeff.allen	set	messageid: <1400932783.71.0.396638759113.issue2028@psf.upfronthosting.co.za>
2014-05-24 11:59:43	jeff.allen	set	recipients: + jeff.allen, fwierzbicki, zyasoft, Arfrever, santa4nt, seletz
2014-05-24 11:59:43	jeff.allen	link	issue2028 messages
2014-05-24 11:59:42	jeff.allen	create