Message6782

Author pekka.klarck
Recipients pekka.klarck
Date 2012-02-17.22:45:42
Message-id <1329518743.81.0.683448071667.issue1841@psf.upfronthosting.co.za>
In-reply-to
Content
On my Linux machine, with UTF-8 as the system encoding, I got the following:

$ a=ä python
Python 2.6.6 (r266:84292, Sep 15 2010, 15:52:39) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ['a']
'\xc3\xa4'
>>> _.decode('UTF-8')
u'\xe4'

$ a=ä jython
Jython 2.5.2 (Release_2_5_2:7206, Mar 2 2011, 23:12:06) 
[Java HotSpot(TM) Server VM (Sun Microsystems Inc.)] on java1.6.0_21
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ['a']
'\xe4'
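To make the difference concrete, here is a small CPython 3 sketch (not Jython) that compares the two values above as byte strings. CPython returns the UTF-8 encoding of "ä", while Jython's value matches the Latin-1 encoding, i.e. one byte per code point. The variable names are illustrative only.

```python
# os.environ['a'] under CPython 2.6 on a UTF-8 system: the raw UTF-8 bytes.
cpython_value = b'\xc3\xa4'
# os.environ['a'] under Jython 2.5.2: a single byte equal to code point U+00E4.
jython_value = b'\xe4'

# Both stand for the same character, but each needs a different codec.
assert cpython_value.decode('utf-8') == '\u00e4'
assert jython_value.decode('latin-1') == '\u00e4'

# Decoding Jython's value as UTF-8, as one would on CPython, fails.
try:
    jython_value.decode('utf-8')
except UnicodeDecodeError as err:
    print('not valid UTF-8:', err.reason)
```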

I have seen Jython return similarly wrong bytes before (e.g. #1592 and #1593), and I know that I can decode them using this hack:

>>> from java.lang import String
>>> String(os.environ['a']).toString()
u'\xe4'

The problem is that if I set environment variables myself and encode them correctly, the hack no longer works:

>>> os.environ['b'] = u'\xe4'.encode('UTF-8')
>>> String(os.environ['b']).toString()
u'\xc3\xa4'
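Judging from the two transcripts above, the `String(...).toString()` hack behaves like decoding as Latin-1: each byte becomes the code point with the same value. That is an interpretation of the observed output, not documented Jython behaviour. Under that assumption, this CPython 3 sketch reproduces both results with a hypothetical `string_hack` function:

```python
def string_hack(value: bytes) -> str:
    """Hypothetical stand-in for String(value).toString() in Jython 2.5.2.

    Assumption: every byte is mapped to the code point with the same value,
    which is exactly what the Latin-1 codec does.
    """
    return value.decode('latin-1')


# Variable inherited from the shell: the hack recovers the right text.
print(ascii(string_hack(b'\xe4')))                    # '\xe4'
# Variable set during execution as proper UTF-8: the hack yields mojibake.
print(ascii(string_hack('\u00e4'.encode('utf-8'))))   # '\xc3\xa4'
```

Because the same operation is applied in both cases, the hack cannot tell the two kinds of values apart; something else has to say where a value came from.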

In other words, I need to know whether the value was set before or during execution. It turns out that I can determine that using java.lang.System.getenv, which only knows about the former:

>>> from java.lang.System import getenv
>>> getenv('a')
u'\xe4'
>>> getenv('b') is None
True

Notice also how getenv above returned the correct value as Unicode.
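The workaround described above can be sketched as a helper that prefers the startup value when the variable has not changed and otherwise decodes the current bytes with the system encoding. This is a CPython 3 simulation, not Jython code: `env_as_unicode` is a hypothetical name, the `environ` dict stands in for Jython's `os.environ`, and a startup dict's `.get` stands in for `java.lang.System.getenv`. It also assumes startup values are representable in Latin-1, matching the single-byte form Jython returned above.

```python
from typing import Callable, Mapping, Optional


def env_as_unicode(name: str,
                   environ: Mapping[str, bytes],
                   startup_getenv: Callable[[str], Optional[str]],
                   encoding: str = 'utf-8') -> str:
    """Return environment variable `name` as text (hypothetical helper)."""
    current = environ[name]
    # Stand-in for java.lang.System.getenv: only sees startup variables.
    startup = startup_getenv(name)
    # Assumption: Jython exposed startup values one byte per code point,
    # i.e. the Latin-1 encoding of the correct text ('?' if not encodable).
    if startup is not None and startup.encode('latin-1', 'replace') == current:
        return startup  # unchanged since startup, so getenv has the right text
    # Set or modified during execution: assume it was encoded correctly.
    return current.decode(encoding)


# Simulated state matching the transcripts above.
startup = {'a': '\u00e4'}
environ = {'a': b'\xe4', 'b': '\u00e4'.encode('utf-8')}
print(ascii(env_as_unicode('a', environ, startup.get)))  # '\xe4'
print(ascii(env_as_unicode('b', environ, startup.get)))  # '\xe4'
```

Comparing against the startup value, rather than only checking whether getenv returns None, also covers a variable that existed at startup but was overwritten during execution.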
History
Date                 User          Action  Args
2012-02-17 22:45:43  pekka.klarck  set     recipients: + pekka.klarck
2012-02-17 22:45:43  pekka.klarck  set     messageid: <1329518743.81.0.683448071667.issue1841@psf.upfronthosting.co.za>
2012-02-17 22:45:43  pekka.klarck  link    issue1841 messages
2012-02-17 22:45:43  pekka.klarck  create