Title: Jython 2.1 converts string to unicode
Type: Severity: critical
Components: None Versions: 2.1a3
Status: closed Resolution: invalid
Dependencies: Superseder:
Assigned To: Nosy List: amak, amylin, fwierzbicki
Priority: Keywords:

Created on 2008-10-29.21:02:43 by amylin, last changed 2008-12-15.16:05:37 by fwierzbicki.

msg3723 Author: (amylin) Date: 2008-10-29.21:02:42
When we issue a Jython command in the WebSphere wsadmin scripting tool
that invokes a WebSphere MBean operation, and the MBean method returns
a string containing NLS-translated characters such as the French
"ceci a été fait déjà", Jython automatically converts the string to a
Python unicode string and returns that to wsadmin.

Here is an example:

The Java method "mbeanTest" returns some French characters:

public String mbeanTest() {
    return "ceci a été fait déjà";
}

Invoke the method in wsadmin interactive mode using Jython, and the
command returns a unicode string to wsadmin. It appears to be a bug in
Jython that mangles the French string and converts it to Python unicode:

C:\WebSphere\AppServer\profiles\Dmgr01\bin>wsadmin -lang jython
*sys-package-mgr*: processing modified jar,
WASX7209I: Connected to process "dmgr" on node AMYLIN4CellManager06 
using SOAP connector; The type of process is: DeploymentManager
WASX7031I: For help, enter: "print"

wsadmin>cs = AdminControl.queryNames("type=ConfigService,*")
wsadmin>AdminControl.invoke(cs, 'mbeanTest')
'ceci a \xE9t\xE9 fait d\xE9j\xE0'

wsadmin itself does not convert the French string to a unicode string.
wsadmin invokes BSFManager.iexec(lang, command, 0, 0, command), and
BSFManager takes the input command and finally invokes Jython's
PyInterpreter.runcode(). Somewhere in the Jython code the French string
is converted to a unicode string, and that unicode string is returned
to wsadmin.
If we issue the Jython "print" command, it returns the correct string.
We added System.out.println() statements to show that wsadmin has not
converted the string, and that Jython converts it to a unicode string
and returns it to wsadmin.

wsadmin>cs = AdminControl.queryNames("WebSphere:type=ConfigService,*")
********* invoke BSFManager.iexec(lang, command, 0, 0, command)
********* done
wsadmin>AdminControl.invoke(cs, 'mbeanTest')
********* invoke BSFManager.iexec(lang, command, 0, 0, command)
********** return from AdminControl.invoke() ceci a été fait déjà
'ceci a \xE9t\xE9 fait d\xE9j\xE0'

When we issue the "print" command, it displays the correct French string:

wsadmin>s = AdminControl.invoke(cs, 'mbeanTest')
wsadmin>print s
ceci a été fait déjà
msg3853 Author: Alan Kennedy (amak) Date: 2008-11-26.12:55:48
This is not a bug.

When you ask for a string object to be dumped on the console, you get
the repr of the string, which in this case is correctly:
'ceci a \xE9t\xE9 fait d\xE9j\xE0'

For details, see the Python documentation for the built-in repr(object).

Note that all of those character codes are correct according to the
ISO-8859-1 character encoding (which is presumably the encoding of your
original Java source file).
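As a rough illustration, here is a minimal sketch in modern CPython 3 rather than Jython 2.1. In Python 3, repr() no longer escapes printable non-ASCII characters, so ascii() is used here as a stand-in for the Python 2-era repr shown in the transcript (CPython prints lowercase hex where Jython 2.1 printed uppercase). It checks that each \x escape is simply the ISO-8859-1 (Latin-1) code point of the accented letter.

```python
# Sketch in CPython 3 (not Jython 2.1). In Python 3, repr() keeps printable
# non-ASCII characters as-is, so ascii() stands in for the Python 2 repr.
s = "ceci a été fait déjà"

# ascii() escapes every non-ASCII character; code points below 256 use \xNN.
escaped = ascii(s)
print(escaped)  # 'ceci a \xe9t\xe9 fait d\xe9j\xe0'

# Each escape is the character's code point, which for these letters equals
# its single-byte value in ISO-8859-1 (Latin-1).
for ch in "éà":
    print(ascii(ch), hex(ord(ch)), ch.encode("latin-1"))
```

Nothing in the string content changes; only the way it is displayed differs.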

You get the correct representation when you print because Jython sends
the actual characters (not the repr) to the terminal, and your terminal
understands them (presumably because it is an ISO-8859-1-aware terminal).

If you were working in a Windows terminal, you would have to encode the
string as something like "cp850" to get the glyphs displayed correctly
on your terminal, i.e. you would have to do:

>>> print s.encode("cp850")
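A minimal sketch of why the encoding matters, again in CPython 3 rather than Jython 2.1 and assuming a Windows console set to code page 850: the same text produces different bytes under ISO-8859-1 and cp850, so bytes meant for one terminal show the wrong glyphs on the other.

```python
# Sketch in CPython 3: the same text encoded for two different terminals.
s = "ceci a été fait déjà"

latin1_bytes = s.encode("latin-1")  # what an ISO-8859-1 terminal expects
cp850_bytes = s.encode("cp850")     # what a code page 850 console expects

# 'é' is 0xE9 in ISO-8859-1 but 0x82 in cp850; 'à' is 0xE0 vs 0x85.
print(latin1_bytes)
print(cp850_bytes)

# Reading Latin-1 bytes as cp850 yields different characters, which is why
# the bytes sent to the console must match its code page.
misread = latin1_bytes.decode("cp850")
print(misread == s)  # False
```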

Python repr exists to render a *printable* representation that will
display the actual content of the string, regardless of the character
encoding of the terminal you print it to.
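To make that point concrete, here is a small sketch in CPython 3 (with ascii() again standing in for the Python 2 repr; this is not Jython 2.1 itself). The escaped form is pure ASCII, so any terminal can show it, and it is a valid string literal that evaluates back to the original text.

```python
import ast

# Sketch in CPython 3; ascii() plays the role of the Python 2 repr here.
s = "ceci a été fait déjà"
shown = ascii(s)

# The escaped form contains only ASCII characters, so any terminal can
# display it without knowing which encoding the original text used.
print(all(ord(c) < 128 for c in shown))  # True

# It is also a valid string literal that evaluates back to the same text.
print(ast.literal_eval(shown) == s)  # True
```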

I recommend that we close this as "not a bug".
msg3924 Author: Frank Wierzbicki (fwierzbicki) Date: 2008-12-15.16:05:37
Agree with amak.  Closing.
Date                 User         Action  Args
2008-12-15 16:05:37  fwierzbicki  set     status: open -> closed
                                          nosy: + fwierzbicki
                                          resolution: invalid
                                          messages: + msg3924
2008-11-26 12:55:50  amak         set     nosy: + amak
                                          messages: + msg3853
2008-10-29 21:02:43  amylin       create