Title: Encoding bug in IO
Type: behaviour Severity: major
Components: Core Versions: Jython 2.7
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: beachmachine, fwierzbicki, jeff.allen
Priority: Keywords:

Created on 2013-03-31.07:13:49 by beachmachine, last changed 2014-06-15.07:22:05 by jeff.allen.

File name Uploaded Description
jython_utf_bug.png beachmachine, 2013-03-31.07:13:48 Screenshot showing the bug
msg7983 Author: Andreas Stocker (beachmachine) Date: 2013-03-31.07:13:48
There is a problem handling Unicode characters in the interactive console. It also affects the values of GET and POST parameters in Django, so I suspect it is related to buffers and streams in general.

When I start the interactive console, I get the following output (German locale):
> M�rz 31, 2013 - 09:04:29
where I should get:
> März 31, 2013 - 09:04:29

If I type an umlaut on the keyboard (e.g. ö), � is printed on the screen.
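One common way a replacement character like � appears (offered here as an illustration, not as the diagnosed cause of this issue) is when text is encoded as ISO-8859-1 by the program but decoded as UTF-8 by the terminal. A minimal Python sketch that reproduces the symptom; the helper name `mangle` is hypothetical:

```python
def mangle(text, wrote_as="latin-1", read_as="utf-8"):
    """Encode text with one charset and decode the bytes with another.

    This mimics a program and its terminal disagreeing about the encoding.
    Bytes that are invalid in `read_as` become U+FFFD (the replacement
    character), which a terminal usually shows as a placeholder glyph.
    """
    return text.encode(wrote_as).decode(read_as, "replace")


original = u"M\u00e4rz 31, 2013"  # "März 31, 2013", written with an escape
shown = mangle(original)          # Latin-1 bytes read back as UTF-8

# The single byte 0xE4 for "ä" is not valid UTF-8 on its own here,
# so it decodes to U+FFFD while the ASCII characters survive.
print(shown == u"M\ufffdrz 31, 2013")

# When both sides agree on the encoding, the text round-trips unchanged.
print(mangle(original, "utf-8", "utf-8") == original)
```

Both comparisons print True: the mismatch loses only the non-ASCII character, which matches the shape of the banner output above.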

All of this works fine in Jython 2.5.x on the same system. I tried various encoding-related parameters, with no effect.

My system:
* OS: Debian wheezy
* Java: OpenJDK Server VM (Oracle Corporation) on java1.7.0_03

I have attached a screenshot showing the problem.
msg7984 Author: Andreas Stocker (beachmachine) Date: 2013-03-31.07:34:41
This bug has existed since Jython 2.7b1; the previous version works fine.
msg8062 Author: Jeff Allen (jeff.allen) Date: 2013-07-12.20:45:48
Hi Andreas:

There are shortcomings in the default JLine library that I've been working to understand. An alternative Java Readline console is available. One may also use a plain console and do without line editing.

I have worked through a number of console encoding and library combinations in order to catalogue their behaviour. The only one I have found to work with Unicode on (my) Linux system is the Readline console with GNU readline as the supporting library. I had to install the libreadline-java package for this alternative to work.

You could select the Readline console at run time with:
jython -Dpython.console=org.python.util.ReadlineConsole -Dpython.console.readlinelib=GnuReadline

Please let us know if this works for you. If it gives you what you want, you can set these values as defaults in the Jython registry.

I tested this on Linux Mint 14 with OpenJDK 1.7.0_21, in a GNOME Terminal window set to UTF-8 encoding, running bash with LC_ALL=en_GB.UTF-8. raw_input() and sys.stdin.readline() both returned correctly encoded accented characters typed using the Extended Winkeys keyboard, but I would expect a genuine German keyboard to work too. (Setting a Greek keyboard certainly works.)
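When checking what the console actually delivered, as in the test above, it can help to look at code points rather than at how the terminal renders them. A minimal sketch, assuming byte strings should be decoded as UTF-8 (to match the LC_ALL setting above); the helper name `code_points` is hypothetical:

```python
def code_points(line, encoding="utf-8"):
    """Return the code points of a console line as 'U+XXXX' strings.

    Byte strings are decoded with `encoding` (assumed to be the terminal's
    encoding); invalid bytes become U+FFFD so mangled input is visible.
    The trailing newline, if any, is ignored.
    """
    if isinstance(line, bytes):
        line = line.decode(encoding, "replace")
    return ["U+%04X" % ord(ch) for ch in line.rstrip(u"\r\n")]


# Correct UTF-8 input for "ö" yields U+00F6; a lone Latin-1 byte yields U+FFFD.
print(code_points(b"\xc3\xb6"))
print(code_points(b"\xf6"))
```

At a Jython 2.7 prompt one might call `code_points(raw_input())` or `code_points(sys.stdin.readline())` and type ö: U+00F6 in the result suggests the input arrived intact, while U+FFFD suggests it was mangled on the way in.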

I'm working on console integration at the moment, so this advice may change, but my aim at present is to keep both JLine and Readline usable, and selectable in the same way, since neither seems adequate in all circumstances. (Also, Java Readline seems not to be actively maintained.)

msg8645 Author: Andreas Stocker (beachmachine) Date: 2014-06-15.06:34:27
It seems this bug is fixed in 2.7b2, both for the interactive console and for Django's GET and POST parameter handling!

I think the issue can be closed now. Thanks for your great work!
msg8646 Author: Jeff Allen (jeff.allen) Date: 2014-06-15.07:22:04
Thanks for reporting back. The console continues to bring up interesting problems. I believe I fixed the basic fault you report, but it's good to have confirmation from real use.

Date                 User           Action  Args
2014-06-15 07:22:05  jeff.allen     set     status: open -> closed; resolution: fixed; messages: + msg8646
2014-06-15 06:34:27  beachmachine   set     messages: + msg8645
2013-07-12 20:45:48  jeff.allen     set     nosy: + jeff.allen; messages: + msg8062
2013-04-08 17:31:50  fwierzbicki    set     nosy: + fwierzbicki
2013-03-31 07:34:41  beachmachine   set     messages: + msg7984
2013-03-31 07:13:49  beachmachine   create