Issue1671134

classification
Title: '%s' % u'x' returns str object
Type: Severity: normal
Components: Core Versions:
Milestone:
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: Nosy List: cgroves, pekka.klarck
Priority: normal Keywords:

Created on 2007-02-28.18:57:49 by cgroves, last changed 2007-03-09.17:21:21 by cgroves.

Messages
msg1508 Author: Charlie Groves (cgroves) Date: 2007-02-28.18:57:49
This is broken out of a comment Pekka made on http://jython.org/bugs/1659819

The same problem also appears if you create a string using a pattern like
'Something %s' and the substituted string is unicode. The examples below
demonstrate this.


Jython 2.2b1 on java1.5.0_10 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> x = 'Good %s' % u'Hyv\u00E4'
>>> type(x)
<type 'str'>
>>> x
'Good Hyv\xE4'
>>> unicode(x)
Traceback (innermost last):
  File "<console>", line 1, in ?
UnicodeError: ascii decoding error: ordinal not in range(128)
>>>

Python 2.4.3 (#1, May 18 2006, 07:40:45) 
[GCC 3.3.3 (cygwin special)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> x = 'Good %s' % u'Hyv\u00E4'
>>> type(x)
<type 'unicode'>
>>> x
u'Good Hyv\xe4'
>>> unicode(x)
u'Good Hyv\xe4'
>>> 
msg1509 Author: Charlie Groves (cgroves) Date: 2007-02-28.19:11:10
In fixing the original bug I added Lib/test/test_str2unicode.py, which would be a good place to add tests for this formatting stuff.
msg1510 Author: Pekka Klärck (pekka.klarck) Date: 2007-02-28.22:58:08
After figuring out that the affected method in this case is __mod__, fixing the problem was pretty easy. The patch is available at http://jython.org/patches/1671304 and it includes the following test, added to Lib/test/test_str2unicode.py.

def test_string_formatting(self):
    self.assertEquals(unicode, type('%s' % u'x'))
    self.assertEquals(unicode, type('%s %s' % (u'x', 'y')))

(I also replaced a few tabs with spaces in Lib/test/test_str2unicode.py. Hope that's OK. If not, I won't touch them in the future.)
msg1511 Author: Pekka Klärck (pekka.klarck) Date: 2007-02-28.23:48:53
Darn, I noticed one more scenario that needs to be taken care of. The patch mentioned in the previous comment doesn't fix this.

>>> '%(x)s' % { 'x' : u'xxx' }
'xxx'

This got me thinking that this issue probably ought to be fixed in StringFormatter.format instead of PyString.str___mod__ (as in the current patch). SF.format could keep track of the given format items and return PyString or PyUnicode as needed. That would of course require SF.format's return type to be changed from String to PyString. At the same time it would probably make sense to change the SF.formatXXX methods (formatLong, formatInteger, ...) from public to private to make it clearer that they are only helper methods of SF.format (at least they seem to be) and can thus still return String. These changes seem pretty safe because I could not find StringFormatter used anywhere other than in PyString.str___mod__, and the only method it uses is SF.format.
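
As a rough illustration of the idea above (not the actual Jython code), the result type can be decided by looking at the format string and the substituted items while formatting. The sketch below is a toy model in modern Python: `PyString` and `PyUnicode` are hypothetical stand-ins for the Jython classes, and `format_items` is an assumed name for the role SF.format would play.

```python
class PyString(str):
    """Toy stand-in for Jython's byte-string class (hypothetical)."""


class PyUnicode(str):
    """Toy stand-in for Jython's unicode class (hypothetical)."""


def format_items(template, args):
    """Format `template` with `args` and pick the result type.

    The result is PyUnicode if the template or any substituted item is
    PyUnicode, otherwise PyString.
    """
    # Simplification: for a mapping, every value is inspected, even ones
    # the template never references. A real formatter would only look at
    # the items it actually consumes.
    if isinstance(args, dict):
        items = list(args.values())
    elif isinstance(args, tuple):
        items = list(args)
    else:
        items = [args]

    needs_unicode = isinstance(template, PyUnicode) or any(
        isinstance(item, PyUnicode) for item in items
    )
    # Use the base str implementation so the toy classes only decide the
    # result type, not the formatting itself.
    text = str.__mod__(template, args)
    return PyUnicode(text) if needs_unicode else PyString(text)
```

Under these assumptions, the single-item, tuple and mapping cases reported in this issue all go through the same type decision.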

Comments about the implementation ideas above are welcome. If the approach seems reasonable I can make a new patch.
msg1512 Author: Pekka Klärck (pekka.klarck) Date: 2007-02-28.23:58:05
The first patch also introduced this bug. 

>>> u'%s' % 's'
's'

It seems PyUnicode inherits str___mod__ from PyString directly, and the old implementation used createInstance, which is overridden in PyUnicode.

I need to add a test for this one. It would probably be a good idea to also add tests for unicode join and replace (e.g. "u''.join([])") to make sure similar issues don't appear there.

A fix for this could be overriding str___mod__ in PyUnicode so that the implementation in PyString is called first and then its result is cast to PyUnicode.
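
The override described above might look roughly like the following. This is only a toy model in modern Python, not Jython's Java code: the `PyString` and `PyUnicode` classes here are hypothetical stand-ins, and the conversion is done by constructing a new object rather than by a Java cast.

```python
class PyString(str):
    """Toy stand-in for Jython's PyString (hypothetical)."""

    def __mod__(self, other):
        # Shared formatting logic. It always builds a PyString, which is
        # what a subclass would get if it inherited this method unchanged.
        return PyString(str.__mod__(self, other))


class PyUnicode(PyString):
    """Toy stand-in for Jython's PyUnicode (hypothetical)."""

    def __mod__(self, other):
        # Reuse the PyString implementation, then convert its result so
        # that formatting a unicode template yields unicode again.
        return PyUnicode(PyString.__mod__(self, other))
```

Without the override in the subclass, `PyUnicode('%s') % 's'` in this model would return a `PyString`, which mirrors the regression shown above.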


msg1513 Author: Pekka Klärck (pekka.klarck) Date: 2007-03-01.10:50:44
A new patch uploaded to http://jython.org/patches/1671304. It is implemented according to my earlier comment at 2007-03-01 01:48.

The patch also contains new tests for string formatting with both str and unicode and join/replace with unicode. I decided to put TestUnicodeReturnsUnicode into Lib/test/test_str2unicode.py to keep it close to other similar tests even though the file name is a bit incorrect in this case.
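
For readers without access to the patch, checks of the join/replace kind mentioned above could be sketched as follows. This is an assumption about their general shape, not the contents of the patch; the class name `UnicodeJoinReplaceSketch` and the `text_type` helper are made up for illustration.

```python
import unittest

# unicode on Python 2 / Jython 2.x, str on Python 3
text_type = type(u'')


class UnicodeJoinReplaceSketch(unittest.TestCase):
    """Hypothetical checks that unicode operations return unicode."""

    def test_join(self):
        # Joining with a unicode separator should give unicode, even for
        # an empty list or a list of plain strings.
        self.assertEqual(text_type, type(u''.join([])))
        self.assertEqual(text_type, type(u''.join(['a', 'b'])))

    def test_replace(self):
        # Replacing inside a unicode string should keep the unicode type.
        self.assertEqual(text_type, type(u'abc'.replace('b', 'x')))

    def test_formatting(self):
        # A unicode template formatted with a plain string stays unicode.
        self.assertEqual(text_type, type(u'%s' % 's'))
```

The class can be run with the standard unittest runner, e.g. `python -m unittest` on the module that contains it.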
msg1514 Author: Charlie Groves (cgroves) Date: 2007-03-09.17:21:21
Committed in r3135.
History
Date User Action Args
2007-02-28 18:57:49 cgroves create