Issue1802339
Created on 2007-09-25.23:13:24 by pekka.klarck, last changed 2009-10-28.18:41:58 by pjenvey.
Files | ||
---|---|---|
File name | Uploaded | Description |
unic.patch | pekka.klarck, 2007-09-27.22:52:42 |
Messages | |||
---|---|---|---|
msg1933 (view) | Author: Pekka Klärck (pekka.klarck) | Date: 2007-09-25.23:13:24 | |
Running the following code fails when using Jython 2.2.1 rc 1 but succeeds with Jython 2.2 (and earlier alphas/betas/rcs) and Python 2.3/2.4/2.5.
- - - - - - - - - -
import sys
from StringIO import StringIO
msg = u'Circle is 360\u00B0'
sys.stdout = StringIO()
print msg
assert sys.stdout.getvalue() == msg + '\n'
- - - - - - - - - -
The traceback is below the code and shows that printing a unicode string fails even though in this case stdout has been intercepted.
- - - - - - - - - -
Traceback (innermost last):
  File "unictest.py", line 7, in ?
UnicodeError: ascii encoding error: ordinal not in range(128)
- - - - - - - - - -
Being able to print unicode strings like this is crucial in our case. We've been implementing a test automation framework that runs on Python and Jython, and it can be extended using so-called test libraries, which can write messages to a common test log simply by writing to stdout. This way the API between the framework and libraries is pretty simple, and it works the same way both when a lib is written in Python and when it's written in Java (we intercept java.lang.System.out too). |
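For readers who want to see the log-capture arrangement in isolation, here is a minimal sketch of it. It assumes a Python 3 interpreter (not the Jython 2.2 / Python 2 interpreters discussed in this issue), and `capture_stdout` and `library_keyword` are hypothetical names invented for the illustration, not part of the framework.

```python
import io
import sys


def capture_stdout(func):
    """Run func() with sys.stdout temporarily replaced by an in-memory buffer.

    Hypothetical helper: it only illustrates the interception pattern
    described in the report.
    """
    original = sys.stdout
    buffer = io.StringIO()
    sys.stdout = buffer
    try:
        func()
    finally:
        # Always restore the previous stream, even if func() raises.
        sys.stdout = original
    return buffer.getvalue()


def library_keyword():
    # A test library reports to the log simply by printing.
    print(u'Circle is 360\u00B0')


captured = capture_stdout(library_keyword)
```

On Python 3 the captured value is a text string, so no implicit encoding step is involved. The failure reported here comes from the implicit unicode-to-str conversion that the Jython 2.2.1 rc 1 print path performs.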
|||
msg1934 (view) | Author: Philip Jenvey (pjenvey) | Date: 2007-09-26.04:17:08 | |
This one actually fails on CPython 2.2, though CPython > 2.2 calls PyObject_Str on anything printed. Jython doesn't have an equivalent function; in this case it just calls __str__ (in StdoutWrapper) on any object printed. PyObject_Str looks like a safer version of __str__ for situations like these: it specially handles unicode objects, returning PyUnicode_AsEncodedString (which is like our encode_UnicodeEscape). We could special-case unicode objects in StdoutWrapper, but I see PyObject_Str used in a few places in CPython, so patching StdoutWrapper might miss other cases where this is a problem.
$ grep -r PyObject_Str\( * | grep \.c:
Modules/_csv.c: str = PyObject_Str(field);
Modules/_tkinter.c: PyObject *v = PyObject_Str(value);
Modules/_tkinter.c: PyObject *v = PyObject_Str(value);
Objects/descrobject.c: return PyObject_Str(pp->dict);
Objects/fileobject.c: value = PyObject_Str(v);
Objects/object.c: s = PyObject_Str(op);
Objects/object.c:PyObject_Str(PyObject *v)
Objects/stringobject.c: op = (PyStringObject *) PyObject_Str((PyObject *)op);
Objects/stringobject.c: return PyObject_Str(x);
Objects/stringobject.c: temp = PyObject_Str(v);
Objects/stringobject.c: PyObject_Str() assure this */
Objects/unicodeobject.c: temp = PyObject_Str(v);
Objects/unicodeobject.c: PyObject_Repr() and PyObject_Str() assure
Python/bltinmodule.c: po = PyObject_Str(v);
Python/codecs.c: PyObject *string = PyObject_Str(name);
Python/errors.c: tmp = PyObject_Str(v);
Python/exceptions.c: out = PyObject_Str(tmp);
Python/exceptions.c: out = PyObject_Str(args);
Python/exceptions.c: str = PyObject_Str(msg);
Python/pythonrun.c: v = PyObject_Str(v);
Python/pythonrun.c: w = PyObject_Str(w);
Python/pythonrun.c: PyObject *s = PyObject_Str(value); |
|||
msg1935 (view) | Author: Pekka Klärck (pekka.klarck) | Date: 2007-09-26.07:34:24 | |
This might be a bit involved for me to investigate and fix but if nobody else is doing it I can try. Getting the original example working would be a big step forward and even if other places were missed that would be better than nothing. I hope that this failing on CPython 2.2 doesn't mean that it won't be fixed in Jython 2.2. At least for us that would be really inconvenient because it'll take some time before Jython 2.3 (or whatever the version will be) is released. We can of course instruct people needing to use unicode to stick with 2.2 but then they won't get any other fixes/features in 2.2.x releases. |
|||
msg1936 (view) | Author: Charlie Groves (cgroves) | Date: 2007-09-27.04:42:28 | |
Without a patch in hand and a good understanding of the problem, I think this is too big of a change to attempt between release candidates. Even Philip's explanation below isn't complete, because if CPython were just using unicode_escape on the printed objects, your final assert would fail: sys.stdout.getvalue() would have a str object in it, which isn't equal to the unicode object from above. It definitely passes though. While 2.2.1 is too far along to fix this, I wouldn't mind making a 2.2.2 for this and whatever else comes up. That said, as long as you're not relying on unicode objects coming out of getvalue (which I don't think could be the case, since that wouldn't have happened under 2.2 either), you might be able to get around this by setting the default encoding. The reason it's complaining about ascii is that ascii is the default default encoding. You can change that to any encoding supported by Jython in your site.py, and then whenever Jython attempts to turn a unicode object into a str without an explicit encoding, it'll use that encoding to do the work. It works the same in the opposite direction when decoding a str into a unicode object without an explicit encoding. |
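The effect of the default encoding can be illustrated without touching interpreter settings. The sketch below is only a model of the behaviour described above: `implicit_str` is a hypothetical stand-in for the implicit unicode-to-str conversion, not a Jython or CPython function, and its `default_encoding` parameter plays the role of the configured default.

```python
def implicit_str(text, default_encoding='ascii'):
    """Model an implicit unicode -> byte string conversion (hypothetical helper)."""
    return text.encode(default_encoding)


msg = u'Circle is 360\u00B0'

# With the 'ascii' default, the degree sign (U+00B0) cannot be encoded.
try:
    implicit_str(msg)
    ascii_ok = True
except UnicodeError:
    ascii_ok = False

# With 'utf-8' as the default, the same conversion succeeds.
utf8_bytes = implicit_str(msg, 'utf-8')

# The opposite direction: decoding the bytes with the same encoding
# recovers the original text.
round_trip = utf8_bytes.decode('utf-8')
```

The result of the successful conversion is a byte string, not a unicode object, so comparing it directly with the original unicode text depends on how the interpreter coerces the two types.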
|||
msg1937 (view) | Author: Pekka Klärck (pekka.klarck) | Date: 2007-09-27.22:52:42 | |
Philip pointed me to StdoutWrapper, and after playing with it a little bit I was able to come up with a simple patch (attached) that makes the original example pass. I ran dist/Lib/test/regrtest.py on the 2.2 maint branch both with and without the patch and got the same failures, so it doesn't break everything. I have to confess that I don't really know the code in StdoutWrapper nor the code using it, so I may very well be missing something totally obvious. The patch is rather ugly (catching Throwable is probably not the best idea) and should be taken as a prototype at this phase. File Added: unic.patch |
|||
msg1938 (view) | Author: Charlie Groves (cgroves) | Date: 2007-09-30.01:49:39 | |
I don't think this patch is going in the right direction. Rather than slipping in a quick fix for this particular case, we need to figure out exactly what CPython was doing in 2.2 and what CPython is doing currently. If the current behavior won't break 2.2's expectations in a horrible way, we can add it to our 2.2. Just shoehorning a fix in for this one case could lead to weirdly inconsistent behavior in different parts of the code, which I really want to avoid. Did you try setting the default encoding? You can do it from java with org.python.core.codecs.setDefaultEncoding. |
|||
msg1939 (view) | Author: Pekka Klärck (pekka.klarck) | Date: 2007-09-30.22:21:49 | |
I totally agree that fixing this issue with a hack that just seems to solve the problems is not the right thing to do. My patch was just an example showing that somehow modifying StdoutWrapper might be a part of the solution. Unfortunately I don't understand Jython (nor CPython) internals well enough to be able to figure out a real fix. =/
Thanks for mentioning org.python.core.codecs.setDefaultEncoding. I played with it a little and it seems that we could even have a workaround for the problem in our system. I changed my original example slightly and was able to get "print <unicode>" working. There are still some differences between different Jython versions and CPython but we should be able to handle them. Here's the new code:
- - - - - - - - - -
import sys
import os
from StringIO import StringIO
if os.name == 'java':
    from org.python.core import codecs
    codecs.setDefaultEncoding('utf-8')
    print 'Jython', sys.version
else:
    print 'Python', sys.version
sys.stdout = StringIO()
msg = u'Circle is 360\u00B0'
print msg
out = sys.stdout.getvalue()
sys.stdout = sys.__stdout__
print out, type(out)
print msg, type(msg)
assert out == msg + '\n'
- - - - - - - - - -
And here are the outputs using a few different interpreters:
- - - - - - - - - -
Jython 2.2rc3
Circle is 360° <type 'str'>
Circle is 360° <type 'unicode'>
- - - - - - - - - -
Jython 2.2.1rc1
Circle is 360° <type 'str'>
Circle is 360° <type 'unicode'>
Traceback (innermost last):
  File "unictest.py", line 21, in ?
AssertionError:
- - - - - - - - - -
Python 2.5.1 (r251:54863, May 2 2007, 16:56:35)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)]
Circle is 360° <type 'unicode'>
Circle is 360° <type 'unicode'> |
|||
msg3176 (view) | Author: Pekka Klärck (pekka.klarck) | Date: 2008-05-02.20:06:17 | |
This issue might be related to http://bugs.jython.org/issue1032 |
|||
msg3178 (view) | Author: Pekka Klärck (pekka.klarck) | Date: 2008-05-02.20:13:55 | |
While investigating #1032 I also noticed that the example in msg1939 passes on Jython 2.2.1 if setDefaultEncoding('utf-8') is changed to setDefaultEncoding('iso-8859-1'). Unfortunately it only works if the printed string is ISO-8859-1 like in the example -- if it's something else the familiar UnicodeError reappears. |
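Why the ISO-8859-1 workaround only covers some strings can be checked directly with the codecs. This is a small sketch rather than anything from the Jython code base; `encodable` is a hypothetical helper, and the euro string is an example chosen here because U+20AC is not part of ISO-8859-1.

```python
def encodable(text, encoding):
    """Return True if text can be encoded with the given codec (hypothetical helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


degree = u'Circle is 360\u00B0'  # U+00B0 exists in ISO-8859-1
euro = u'Price is 5\u20AC'       # U+20AC does not exist in ISO-8859-1

latin1_degree = encodable(degree, 'iso-8859-1')
latin1_euro = encodable(euro, 'iso-8859-1')
utf8_euro = encodable(euro, 'utf-8')
```

Only the first and last checks succeed, which matches the observation that the error reappears for text outside ISO-8859-1.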
|||
msg4832 (view) | Author: Philip Jenvey (pjenvey) | Date: 2009-06-21.21:49:59 | |
This is still present on 2.5 |
|||
msg4850 (view) | Author: Pekka Klärck (pekka.klarck) | Date: 2009-06-22.02:58:09 | |
pjenvey, if you need help testing this let me know. I'd like to get Unicode working fully with Robot Framework (http://robotframework.org) also on Jython. |
|||
msg4894 (view) | Author: Philip Jenvey (pjenvey) | Date: 2009-07-11.23:23:02 | |
Fixed in r6529. Hope this helps, Pekka. |
|||
msg4895 (view) | Author: Pekka Klärck (pekka.klarck) | Date: 2009-07-11.23:46:32 | |
Awesome! I'm currently on holiday trying to avoid work related tasks, but I added verifying this behavior to our Jython 2.5(.1) compatibility issue (http://code.google.com/p/robotframework/issues/detail?id=198). |
|||
msg5281 (view) | Author: Richard Woolliscroft (richardwoolliscroft) | Date: 2009-10-28.09:32:49 | |
The fix does not solve the case where the stdout is a PyFileWriter which uses encoding. |
|||
msg5282 (view) | Author: Richard Woolliscroft (richardwoolliscroft) | Date: 2009-10-28.09:54:39 | |
Actually, I'm not sure a change to StdoutWrapper would fix this. The problem is that displayhook in PySystemState calls __repr__ on everything to be printed to stdout. If the object is a PyUnicode, its __repr__ method returns a PyString, which is passed into Py.stdout.println, so StdoutWrapper cannot distinguish between something that was originally a PyUnicode and something that was a PyString. So any unicode string would always come out with a u at the start and not be properly encoded. |
|||
msg5288 (view) | Author: Philip Jenvey (pjenvey) | Date: 2009-10-28.18:33:02 | |
That issue should really be a new ticket. It also needs a test |
History | |||
---|---|---|---|
Date | User | Action | Args |
2009-10-28 18:41:58 | pjenvey | set | nosy: + nriley |
2009-10-28 18:33:03 | pjenvey | set | messages: + msg5288 |
2009-10-28 09:54:40 | richardwoolliscroft | set | messages: + msg5282 |
2009-10-28 09:32:49 | richardwoolliscroft | set | nosy: + richardwoolliscroft messages: + msg5281 |
2009-07-11 23:46:32 | pekka.klarck | set | messages: + msg4895 |
2009-07-11 23:23:03 | pjenvey | set | status: open -> closed resolution: fixed messages: + msg4894 |
2009-06-22 02:58:09 | pekka.klarck | set | messages: + msg4850 title: [221rc1] Problem printing unicode when stdout intercepted -> Problem printing unicode when stdout intercepted |
2009-06-21 23:38:45 | pjenvey | set | assignee: pjenvey |
2009-06-21 21:49:59 | pjenvey | set | messages: + msg4832 versions: + 2.5.1, - 2.2.2 |
2009-03-14 03:02:28 | fwierzbicki | set | versions: + 2.2.2 |
2008-12-15 17:09:52 | fwierzbicki | set | components: + Core, - None |
2008-05-02 20:13:55 | pekka.klarck | set | messages: + msg3178 |
2008-05-02 20:06:17 | pekka.klarck | set | messages: + msg3176 |
2007-09-25 23:13:24 | pekka.klarck | create |