Issue1693
Created on 2011-01-03.23:31:15 by alex.gronholm, last changed 2011-01-07.01:15:58 by pjenvey.
msg6309
Author: Alex Grönholm (alex.gronholm)
Date: 2011-01-03.23:31:15

Importing fails when sys.path contains unicode elements, specifically ones with non-ASCII characters in them. Whether the module you're trying to import exists or not is irrelevant.
CPython 2.5.5:
>>> sys.path.append(u'/home/alex/t/töö')
>>> import ttt
>>>
Jython 2.5.2rc2:
>>> sys.path.append(u'/home/alex/t/töö')
>>> import ttt
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/alex/libs/jython2.5.2rc2/Lib/encodings/__init__.py", line 31, in <module>
    import codecs, types
UnicodeEncodeError: 'ascii' codec can't encode characters in position 14-15: ordinal not in range(128)
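The reported positions 14-15 are exactly the indices of the two `ö` characters in the appended path, which is consistent with an ASCII encoding step being applied to the sys.path entry. A minimal Python 3 sketch of just that step (in Python 3, `str` plays the role of Python 2's `unicode`; the helper `first_encode_error` is hypothetical and not part of Jython):

```python
def first_encode_error(text, codec):
    """Return the UnicodeEncodeError raised by text.encode(codec), or None."""
    try:
        text.encode(codec)
    except UnicodeEncodeError as exc:
        return exc
    return None


path = '/home/alex/t/t\u00f6\u00f6'  # same as '/home/alex/t/töö' from the report
err = first_encode_error(path, 'ascii')
# err.start/err.end delimit the unencodable run; end is exclusive, so 14..15
print(err.start, err.end - 1)  # 14 15
print(err)  # 'ascii' codec can't encode characters in position 14-15: ordinal not in range(128)
```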

msg6311
Author: Oti Humbel (otmarhumbel)
Date: 2011-01-04.22:57:45

The unit test in the attached file test_sys2_jy.py fails regardless of whether the first line
# coding=latin2
is present.
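A hypothetical regression test in the same spirit, written against Python 3's `importlib` and `unittest`. The contents of test_sys2_jy.py are not shown here, so the directory, module name and assertion below are assumptions; the point is only that a non-ASCII sys.path entry must not turn a missing module into a UnicodeError:

```python
import importlib
import sys
import unittest

# Hypothetical non-ASCII sys.path entry; the directory does not need to exist.
NON_ASCII_ENTRY = '/nonexistent/t\u00f6\u00f6'
# Hypothetical module name that should not be importable from anywhere.
MISSING_MODULE = 'hypothetical_missing_module_1693'


class NonAsciiSysPathTest(unittest.TestCase):

    def setUp(self):
        self.saved_path = list(sys.path)

    def tearDown(self):
        # Restore sys.path and drop any finder cached for the fake entry.
        sys.path[:] = self.saved_path
        sys.path_importer_cache.pop(NON_ASCII_ENTRY, None)

    def test_import_with_non_ascii_path_entry(self):
        sys.path.append(NON_ASCII_ENTRY)
        # A missing module must surface as ImportError, not as a UnicodeError
        # raised while the import machinery processes the path entry.
        with self.assertRaises(ImportError):
            importlib.import_module(MISSING_MODULE)
```

Run it with `python -m unittest` against the file that contains it.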

msg6313
Author: Oti Humbel (otmarhumbel)
Date: 2011-01-06.21:23:59

The changes in 1693-patch.txt solve the problem.
The fix is to encode a unicode string with Latin-1 instead of plain ASCII (in __str__()).
All javatests and regrtests pass.
pjenvey: could you please review this? Thanks!
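The encoding step this patch changes can be modelled in Python 3 as a sketch. The actual change is Java code inside Jython's PyUnicode and is not shown here; `latin1_str` is a hypothetical stand-in that only illustrates why Latin-1 accepts the reported path:

```python
def latin1_str(item):
    """Model of the 1693-patch.txt idea: encode unicode with Latin-1 rather than ASCII."""
    # Latin-1 maps each code point U+0000..U+00FF to the single byte of the same value.
    return item.encode('latin-1')


path = '/home/alex/t/t\u00f6\u00f6'  # '/home/alex/t/töö' from the report
raw = latin1_str(path)
print(raw)                            # b'/home/alex/t/t\xf6\xf6'
print(list(raw[-2:]))                 # [246, 246], since ord('ö') == 0xF6
print(raw.decode('latin-1') == path)  # True: lossless for this particular path
```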

msg6314
Author: Alex Grönholm (alex.gronholm)
Date: 2011-01-06.23:46:27

I'm sorry to say that this patch doesn't cut it, not by a long shot.
There are two reasons. First, using Latin-1 encoding in PyUnicode breaks CPython compatibility: a UnicodeError should be raised when u'åäö' is converted to str. Second, you'd still get a UnicodeError when adding a path element with, say, Chinese characters. Why are sys.path elements being converted to bytestrings anyway?
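The second objection follows from Latin-1 covering only U+0000..U+00FF. A small Python 3 check under that framing (the Chinese path is a hypothetical example; the first objection, about CPython 2's `str(u'åäö')` raising under its default ASCII codec, is specific to Python 2 and is not reproduced here):

```python
def encodable(text, codec):
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


report_path = '/home/alex/t/t\u00f6\u00f6'  # the path from the report ('töö')
cjk_path = '/home/alex/t/\u4e2d\u6587'       # hypothetical path with Chinese characters

for label, p in (('report', report_path), ('cjk', cjk_path)):
    print(label, [encodable(p, c) for c in ('ascii', 'latin-1', 'utf-8')])
# report [False, True, True]
# cjk [False, False, True]
```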

msg6315
Author: Philip Jenvey (pjenvey)
Date: 2011-01-07.00:19:23

Alex is right. Our import system converts sys.path items to Java Strings via item.__str__().toString(). In this case CPython converts unicode to str by encoding it with the filesystem encoding.
We don't support a filesystem encoding on Jython (at this point, anyway). Instead, we've just been "passing through" unicode when it's requested (e.g. os.listdir).
That is technically broken (you could end up with a plain str where ord(somestr[0]) > 255), but I think we'll keep getting away with this strategy until 2.6. This is one of the few leftover bits of str/unicode weirdness carried over from 2.2, where unicode and str were pretty much the same object.
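The two conversion strategies described above can be contrasted in a Python 3 sketch. `encode_like_cpython` stands for CPython's encode-with-the-filesystem-encoding approach using the real `os.fsencode`; `pass_through` is a hypothetical model of handing unicode straight to Java without an encoding step, and treating a byte string as one character per byte is an assumption about Jython 2.5's `str`, not code from the attached diff:

```python
import os
import sys


def encode_like_cpython(item):
    """CPython-style: encode a text sys.path item with the filesystem encoding.

    os.fsencode() uses sys.getfilesystemencoding() together with the error
    handler returned by sys.getfilesystemencodeerrors().
    """
    return os.fsencode(item)


def pass_through(item):
    """Hypothetical model of the pass-through strategy: keep text as text."""
    if isinstance(item, str):
        # No encoding step, so any character (including CJK) survives unchanged.
        return item
    if isinstance(item, bytes):
        # Assumption: model a Jython 2.5 byte string as one char per byte (0-255).
        return item.decode('latin-1')
    # This sketch rejects anything else; real import machinery may differ.
    raise TypeError('unsupported sys.path item: %r' % (item,))


print(sys.getfilesystemencoding())
print(encode_like_cpython('/home/alex/t/ttt'))         # b'/home/alex/t/ttt'
cjk = '/home/alex/t/\u4e2d\u6587'                     # hypothetical Chinese path
print(pass_through(cjk) == cjk)                        # True
print(ascii(pass_through(b'/home/alex/t/t\xf6\xf6')))  # '/home/alex/t/t\xf6\xf6'
```

The pass-through model never fails on text, which is why it sidesteps both of Alex's objections; the cost, as noted above, is that text and byte strings are no longer cleanly separated.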

msg6316
Author: Philip Jenvey (pjenvey)
Date: 2011-01-07.00:21:54

Something like this (attached: imp_unicode_fix.diff).

msg6317
Author: Philip Jenvey (pjenvey)
Date: 2011-01-07.01:15:58

Applied that patch and Oti's test in r7182.

Date | User | Action | Args
2011-01-07 01:15:58 | pjenvey | set | status: open -> closed; resolution: fixed; messages: + msg6317
2011-01-07 00:21:54 | pjenvey | set | files: + imp_unicode_fix.diff; keywords: + patch; messages: + msg6316
2011-01-07 00:19:23 | pjenvey | set | messages: + msg6315
2011-01-06 23:46:27 | alex.gronholm | set | messages: + msg6314
2011-01-06 21:24:00 | otmarhumbel | set | files: + 1693-patch.txt; assignee: otmarhumbel; messages: + msg6313; nosy: + pjenvey
2011-01-04 22:57:45 | otmarhumbel | set | files: + test_sys2_jy.py; nosy: + otmarhumbel; messages: + msg6311
2011-01-03 23:31:15 | alex.gronholm | create |