Issue1841

classification
Title: Non-ASCII environment variables are encoded incorrectly in os.environ
Type: Severity: normal
Components: Versions:
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: amak, fwierzbicki, pekka.klarck
Priority: Keywords:

Created on 2012-02-17.22:45:43 by pekka.klarck, last changed 2013-02-26.22:00:11 by fwierzbicki.

Messages
msg6782 Author: Pekka Klärck (pekka.klarck) Date: 2012-02-17.22:45:42
On my Linux machine with UTF-8 system encoding I got the following:

$ a=ä python
Python 2.6.6 (r266:84292, Sep 15 2010, 15:52:39) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ['a']
'\xc3\xa4'
>>> _.decode('UTF-8')
u'\xe4'

$ a=ä jython
Jython 2.5.2 (Release_2_5_2:7206, Mar 2 2011, 23:12:06) 
[Java HotSpot(TM) Server VM (Sun Microsystems Inc.)] on java1.6.0_21
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ['a']
'\xe4'
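The two sessions differ only in how U+00E4 (ä) ends up as bytes. CPython returns the raw UTF-8 bytes it inherited from the environment. Jython's '\xe4' appears to match the Latin-1 (ISO-8859-1) encoding of the same character. The following is a small Python 3 sketch, written for illustration and not part of Jython, that shows the two byte representations side by side:

```python
# Compare the byte representations of U+00E4 (LATIN SMALL LETTER A WITH DIAERESIS).
ch = "\u00e4"

utf8_bytes = ch.encode("utf-8")      # matches what CPython 2 returned from os.environ
latin1_bytes = ch.encode("latin-1")  # matches the '\xe4' that Jython returned

print(repr(utf8_bytes))    # b'\xc3\xa4'
print(repr(latin1_bytes))  # b'\xe4'
```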

I have seen Jython return similarly wrong bytes before (e.g. #1592 and #1593), and I know that I can decode them using this hack:

>>> from java.lang import String
>>> String(os.environ['a']).toString()
u'\xe4'

The problem is that if I set environment variables myself and encode them correctly, the hack doesn't work:

>>> os.environ['b'] = u'\xe4'.encode('UTF-8')
>>> String(os.environ['b']).toString()
u'\xc3\xa4'
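Both results fit the hack behaving like a Latin-1 decode, where each byte value N becomes code point U+00NN. That turns Jython's '\xe4' into u'\xe4', but it turns correctly encoded UTF-8 into the two characters u'\xc3\xa4'. This interpretation comes from the transcript and has not been checked against Jython's source. A Python 3 sketch of the same effect, using a hypothetical `latin1_view` helper:

```python
# Hypothetical model of String(s).toString() on a byte string:
# a Latin-1 decode, i.e. byte N becomes code point U+00NN.
def latin1_view(raw: bytes) -> str:
    return raw.decode("latin-1")

inherited = b"\xe4"                         # value as Jython exposed the inherited variable 'a'
set_at_runtime = "\u00e4".encode("utf-8")   # value stored via u'\xe4'.encode('UTF-8')

print(ascii(latin1_view(inherited)))       # '\xe4'      (correct)
print(ascii(latin1_view(set_at_runtime)))  # '\xc3\xa4'  (mojibake)
```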

In other words, I need to know whether the value was set before or during execution. It turns out that I can find this out by using java.lang.System.getenv, which only knows about the former:

>>> from java.lang.System import getenv
>>> getenv('a')
u'\xe4'
>>> getenv('b') is None
True

Notice also how getenv above returned the correct value as Unicode.
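When the origin of a value is unknown, one possible workaround is a heuristic of our own; Jython does not provide it. The heuristic works as follows. If a value survives a Latin-1 to UTF-8 round trip, treat it as UTF-8 bytes that were read as Latin-1. Otherwise, keep the value unchanged. The helper name `normalize_env_value` is hypothetical:

```python
def normalize_env_value(value: str) -> str:
    """Hypothetical helper: undo a UTF-8-read-as-Latin-1 mix-up if one is detected.

    If every code point fits in one byte and those bytes form valid UTF-8,
    assume the value is mis-decoded UTF-8; otherwise return it unchanged.
    """
    try:
        return value.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # A code point above U+00FF, or bytes that are not valid UTF-8.
        return value

# Both representations from this report end up as U+00E4.
print(ascii(normalize_env_value("\xe4")))      # '\xe4'
print(ascii(normalize_env_value("\xc3\xa4")))  # '\xe4'
```

The heuristic has a limit. A value that genuinely contains Latin-1 characters which happen to form valid UTF-8 (for example the literal text "Ã¤") would be "repaired" incorrectly. For that reason, the getenv-based check above is the more reliable signal.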
msg6830 Author: Alan Kennedy (amak) Date: 2012-03-19.18:18:19
What is the setting of "python.console.encoding" in your registry file?

Is it set to the actual encoding of your shell?

Note also that you should really be passing an encoding to the String constructor when decoding from bytes, i.e.

>>> os.environ['b'] = u'\xe4'.encode('UTF-8')
>>> String(os.environ['b'], "UTF-8").toString()

If you don't specify an encoding, the bytes are unlikely to be decoded properly.
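The same advice applies outside Jython. If you decode bytes without naming a codec, the result depends on whatever default happens to be in effect. The following is a Python 3 sketch using a hypothetical `decode_env_bytes` helper. When no encoding is given, it falls back to the locale's preferred encoding. That fallback is assumed here to stand in for a configured console encoding such as python.console.encoding:

```python
import locale
from typing import Optional

def decode_env_bytes(raw: bytes, encoding: Optional[str] = None) -> str:
    # Use the explicit codec if one is given; otherwise fall back to the
    # locale's preferred encoding instead of an arbitrary built-in default.
    codec = encoding or locale.getpreferredencoding(False)
    return raw.decode(codec)

raw = "\u00e4".encode("utf-8")
print(ascii(decode_env_bytes(raw, "utf-8")))    # '\xe4'      (matching codec)
print(ascii(decode_env_bytes(raw, "latin-1")))  # '\xc3\xa4'  (wrong codec)
```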
msg7799 Author: Frank Wierzbicki (fwierzbicki) Date: 2013-02-26.18:27:45
No answer in a long time, closing as out of date.
msg7815 Author: Pekka Klärck (pekka.klarck) Date: 2013-02-26.21:29:35
Sorry, hadn't noticed Alan's question. Where is the registry file stored? I certainly haven't touched it.
msg7816 Author: Pekka Klärck (pekka.klarck) Date: 2013-02-26.21:32:24
Found the registry. "python.console.encoding" is commented out.
msg7817 Author: Frank Wierzbicki (fwierzbicki) Date: 2013-02-26.22:00:11
Pekka: by default it is commented out, I think Alan is suggesting that you specify an encoding. Opening back up.
History
Date                 User          Action  Args
2013-02-26 22:00:11  fwierzbicki   set     status: closed -> open
                                           resolution: out of date ->
                                           messages: + msg7817
2013-02-26 21:32:24  pekka.klarck  set     messages: + msg7816
2013-02-26 21:29:35  pekka.klarck  set     messages: + msg7815
2013-02-26 18:27:45  fwierzbicki   set     status: open -> closed
                                           resolution: out of date
                                           messages: + msg7799
                                           nosy: + fwierzbicki
2012-03-19 18:18:19  amak          set     messages: + msg6830
2012-03-19 17:50:35  amak          set     nosy: + amak
2012-02-17 22:45:43  pekka.klarck  create