Issue1128

classification
Title: java.lang.String should be mapped to PyUnicode, not PyString
Type:
Severity: normal
Components: Core
Versions:
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: leosoto
Nosy List: fwierzbicki, leosoto, robheittman
Priority:
Keywords:

Created on 2008-09-13.19:57:06 by leosoto, last changed 2008-10-14.03:21:38 by leosoto.

Files
File name Uploaded Description
jstring_pyunicode_mapping.diff leosoto, 2008-10-02.16:19:02
Messages
msg3531 Author: Leonardo Soto (leosoto) Date: 2008-09-13.19:57:05
...mainly because it is the Right Thing.

PyString expects its backing String to contain no code point above
255, but does not actively enforce this. So any character above that
limit is potentially lost when mapping Strings to PyString.
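
To make the 255 limit concrete, here is a minimal sketch in plain Python 3 (not Jython's actual PyString code). The helper names `fits_in_byte_string` and `lossy_narrow` are made up for illustration, and replacing out-of-range characters with `?` is just one possible way the loss could show up, assumed here so the effect is visible:

```python
def fits_in_byte_string(text: str) -> bool:
    """Return True if every code point is <= 255 (one byte per character)."""
    return all(ord(ch) <= 0xFF for ch in text)


def lossy_narrow(text: str) -> bytes:
    """Narrow text to one byte per character.

    Code points above 255 cannot be represented and are replaced by b'?'.
    This replacement policy is an assumption made for the illustration.
    """
    return text.encode("latin-1", errors="replace")


if __name__ == "__main__":
    for sample in ["caf\u00e9", "\u20ac5", "\u65e5\u672c"]:
        print(repr(sample), fits_in_byte_string(sample), lossy_narrow(sample))
```

Only the first sample survives narrowing intact; the euro sign and the CJK characters are above 255 and cannot be recovered afterwards.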

Also, it is causing other practical problems, such as
unicodedata.normalize returning str instead of unicode.
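
A small sketch of why the return type of unicodedata.normalize matters. It is written for Python 3, where `str` is already a Unicode type, so it only illustrates the data involved and is not a reproduction of the Jython behavior: normalization routinely produces code points above 255, which a byte-oriented string could not hold.

```python
import unicodedata

# U+00E9 (e with acute accent) decomposes under NFD into 'e' + U+0301.
precomposed = "\u00e9"
decomposed = unicodedata.normalize("NFD", precomposed)

# The combining accent U+0301 is above 255, so the result needs a Unicode type.
highest = max(ord(ch) for ch in decomposed)

if __name__ == "__main__":
    print([hex(ord(ch)) for ch in decomposed])
    print("highest code point:", hex(highest))
```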
msg3557 Author: Leonardo Soto (leosoto) Date: 2008-09-13.23:44:48
Here is a patch which doesn't cause any regressions on my system:

http://codereview.appspot.com/5655

[CC-ing Frank because it is important to know if we are going to push
this for 2.5 or not]
msg3625 Author: Leonardo Soto (leosoto) Date: 2008-09-29.20:07:53
Not sure if it will be relevant for this issue, but here is a pointer to
how CPython is solving the issue of unicode vs bytestrings on file names
and path handling: http://bugs.python.org/issue3187
msg3627 Author: Frank Wierzbicki (fwierzbicki) Date: 2008-09-30.20:27:01
Does this fix test_doctest?  Looking at this patch is high on my list
but that would put it to the top :)
msg3628 Author: Leonardo Soto (leosoto) Date: 2008-09-30.20:35:17
No, it doesn't. But perhaps it can help once we fix the problems at the
parsing level. See
http://article.gmane.org/gmane.comp.lang.jython.devel/4565
msg3634 Author: Rob Heittman (robheittman) Date: 2008-10-02.06:18:33
The proposed patch transparently fixes the lossy behavior I see in inline 
web page scripts that consume Unicode from Java code and then send results 
back to Java code.  Without the patch, script authors need to explicitly 
understand and do str/unicode conversions, which they needn't with other 
script engines like Rhino and JRuby.  Very worthwhile proposal!
msg3635 Author: Leonardo Soto (leosoto) Date: 2008-10-02.16:19:00
I had a revised patch a few days back, but wanted to post it to
rietveld. Unfortunately, right now I don't have the time to clean it up
(e.g., removing whitespace-only changes) and adapt it to rietveld
(adding a leading "jython/" to paths inside the diff).
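
For reference, the path adaptation mentioned above could be scripted roughly like this. This is only a sketch, assuming an svn-style unified diff with `Index:`, `---` and `+++` header lines; the function name `prefix_diff_paths` and the sample file path are hypothetical, and this is not the procedure actually used for the patch:

```python
def prefix_diff_paths(diff_text: str, prefix: str = "jython/") -> str:
    """Prepend `prefix` to file paths in the header lines of a unified diff."""
    lines = diff_text.splitlines()
    out = list(lines)
    for i, line in enumerate(lines):
        if line.startswith("Index: "):
            # Hunk content always starts with ' ', '+', '-' or '\', so a line
            # starting with "Index: " can only be a header line.
            out[i] = "Index: " + prefix + line[len("Index: "):]
        elif (line.startswith("--- ")
              and i + 2 < len(lines)
              and lines[i + 1].startswith("+++ ")
              and lines[i + 2].startswith("@@")):
            # Treat "---" as a file header only when it is followed by "+++"
            # and a hunk marker, so removed lines are not rewritten by mistake.
            out[i] = "--- " + prefix + line[4:]
            out[i + 1] = "+++ " + prefix + lines[i + 1][4:]
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical sample diff, for illustration only.
    sample = (
        "Index: src/Example.java\n"
        "--- src/Example.java\t(revision 1)\n"
        "+++ src/Example.java\t(working copy)\n"
        "@@ -1,1 +1,1 @@\n"
        "-old\n"
        "+new\n"
    )
    print(prefix_diff_paths(sample), end="")
```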

As I've realized now that I won't have the time until at least the
weekend, I'm attaching the raw patch here...
msg3678 Author: Leonardo Soto (leosoto) Date: 2008-10-14.03:21:37
The last patch was applied in r5390.
History
Date User Action Args
2008-10-14 03:21:38 leosoto set status: open -> closed
resolution: fixed
messages: + msg3678
2008-10-02 16:19:02 leosoto set files: + jstring_pyunicode_mapping.diff
messages: + msg3635
2008-10-02 06:18:33 robheittman set messages: + msg3634
2008-10-02 06:04:06 robheittman set nosy: + robheittman
2008-09-30 20:35:18 leosoto set messages: + msg3628
2008-09-30 20:27:01 fwierzbicki set messages: + msg3627
2008-09-29 20:07:54 leosoto set messages: + msg3625
2008-09-13 23:44:48 leosoto set nosy: + fwierzbicki
messages: + msg3557
2008-09-13 19:57:06 leosoto create