Issue1792

classification
Title: Jython returns unicode string when using Jython 2.5.2 in WebSphere Application Server
Type: behaviour Severity: normal
Components: None Versions: 2.5.2
Milestone:
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: amak Nosy List: amak, amyhlin, amylin
Priority: Keywords:

Created on 2011-08-22.21:00:16 by amyhlin, last changed 2011-10-15.10:49:30 by amak.

Files
File name | Uploaded | Description
unnamed | amylin, 2011-08-26.21:32:21 |
Messages
msg6610 Author: (amyhlin) Date: 2011-08-22.21:00:15
When issuing a WebSphere Application Server wsadmin command that displays a list of WebSphere configuration object IDs, such as node IDs, Jython 2.5.2 returns a unicode string (u'xxxx') instead of a regular string. Did the behavior change in v2.5.2? Is there any way to prevent this?

For example, when issuing the wsadmin AdminConfig.list('Node') command on Windows, it returns the list of WebSphere node configuration object IDs in Jython unicode format (u'<config id string>'):

Using Jython 2.5.2:

c:\WebSphere\AppServer\profiles\Dmgr01\bin>wsadmin -lang jython
WASX7031I: For help, enter: "print Help.help()"
wsadmin>AdminConfig.list('Node')
u'AMYLINCellManager01(cells/AMYLINCell01/nodes/AMYLINCellManager01|node.xml#Node_1)\r\nAMYLINNode01(cells/AMYLINCell01/nodes/AMYLINNode01|node.xml#Node_1)'

The output displays correctly if we use the Jython print statement, either directly or after assigning the output to a variable:

wsadmin>print AdminConfig.list('Node')
AMYLINCellManager01(cells/AMYLINCell01/nodes/AMYLINCellManager01|node.xml#Node_1)AMYLINNode01(cells/AMYLINCell01/nodes/AMYLINNode01|node.xml#Node_1)

wsadmin>nodes = AdminConfig.list('Node')
wsadmin>print nodes
AMYLINCellManager01(cells/AMYLINCell01/nodes/AMYLINCellManager01|node.xml#Node_1)AMYLINNode01(cells/AMYLINCell01/nodes/AMYLINNode01|node.xml#Node_1)

It displays normally (non-unicode) when the same command is issued in WebSphere Application Server V6.1 and V7.0, which use Jython version 2.1.

Using Jython 2.1:

wsadmin>AdminConfig.list('Node')
'AMYLINCellManager01(cells/AMYLINCell01/nodes/AMYLINCellManager01|node.xml#Node_1)\r\nAMYLINNode01(cells/AMYLINCell01/nodes/AMYLINNode01|node.xml#Node_1)'
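The two transcripts differ in how the result is displayed: the interactive prompt echoes repr() of a value, while print writes str() of it. A minimal sketch of that difference, written in a Python subset assumed to run on both Jython 2.5 and CPython 3 (not tested against wsadmin itself); the sample value and the names text_type and node_ids are hypothetical. Only the repr() form carries the u'' prefix (under Python 2 / Jython 2.5) and shows the line break escaped as \r\n; the printed characters are the same either way.

```python
# Sketch: the interactive prompt echoes repr(value); print writes str(value).
try:
    text_type = unicode  # Python 2 / Jython 2.5: separate unicode type
except NameError:
    text_type = str      # Python 3: str is already unicode text

# Hypothetical value shaped like AdminConfig.list('Node') output
node_ids = text_type("NodeA(cells/c/nodes/NodeA|node.xml#Node_1)\r\n"
                     "NodeB(cells/c/nodes/NodeB|node.xml#Node_1)")

echoed = repr(node_ids)   # like 'wsadmin>AdminConfig.list(...)': u'...' on Python 2
printed = str(node_ids)   # like 'print AdminConfig.list(...)': plain characters

print(echoed)
print(printed)
```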
msg6615 Author: Alan Kennedy (amak) Date: 2011-08-26.20:33:56
> When issuing WebSphere Application Server wsadmin command to display a list of 
> WebSphere configuration object identification such as node ID, it returns the 
> jython unicode string (u'xxxx) instead of regular string in jython 2.5.2. 
> Does the behavior change in v2.5.2?  

Yes, jython string handling has changed between 2.1 and 2.5, in that unicode strings are now the default string type.

> Is any way to prevent this? 

I believe that both the problem and the solution lie in IBM's wsadmin code, which I don't think is open source, so we can't examine or change it. You could try contacting IBM about this.

However, why would you want to prevent it? What problems is it causing for you?

There are simple ways to work around this kind of issue. Can you give us an example of the problem you face?

For example, if you want to restrict your processing to iso-8859-1 strings (which is possibly what you're expecting), then run this operation on every string before you process it.

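# ws_unicode_string: a unicode value returned by wsadmin, e.g. by AdminConfig.list('Node')
try:
    # Encode to a byte string; fails if any character is outside ISO-8859-1
    my_string = ws_unicode_string.encode('iso-8859-1')
except UnicodeEncodeError:
    print "Ouch! That string contained funny characters!"
    raise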
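One way to apply this check to every value is to wrap it in a small function. This is a sketch only, assuming a Python subset that runs on both Jython 2.5 and CPython 3; the function name to_latin1_bytes is hypothetical, and values that are already byte strings are passed through unchanged.

```python
try:
    text_type = unicode  # Python 2 / Jython 2.5
except NameError:
    text_type = str      # Python 3

def to_latin1_bytes(value):
    """Return value encoded as ISO-8859-1 bytes.

    Unicode text is encoded strictly, so a character outside Latin-1
    raises UnicodeEncodeError; anything else is returned unchanged.
    """
    if isinstance(value, text_type):
        return value.encode('iso-8859-1')
    return value

latin1 = to_latin1_bytes(u"caf\xe9")
```

Passing 'replace' as the second argument to encode() would substitute '?' for unencodable characters instead of raising, at the cost of silently altering the data.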
msg6616 Author: (amylin) Date: 2011-08-26.21:32:22
Hi, Alan,

Thanks for your quick answer. We are about to upgrade Jython from 
2.1 to 2.5.2 in WebSphere version 8.5, and I am investigating any breaking 
changes (behavior changes) in the upgrade to v2.5.2. I think this is 
just one of the behavior changes. We want to prevent it, since customers 
may complain about the change in output type, and it may also break customer 
scripts that parse the output string. It can be resolved in the wsadmin code, 
but do you know of any other behavior or breaking changes in Jython 2.5.2, 
such as built-in function or namespace changes? 

I would also like to confirm something about the Jython cache. I chatted 
with Frank Wierzbicki a while ago, and he told me that Jython 2.2 and higher 
versions do not require building the cachedir. We have received complaints 
about wsadmin startup performance and about the Jython *sys-package-mgr* 
messages shown in the console the first time Jython is used in wsadmin, 
because it takes time for Jython to process all packages/jars into the 
cachedir. Can I simply set the "python.cachedir.skip" property in the 
wsadmin code to avoid building the cache? Will skipping the cache build 
during initialization cause any problems, for example, if I want to import 
some Java or WebSphere packages/classes in Jython? 

Thanks.

Amy Lin
WebSphere Scripting/ConfigService Development lead
amylin@us.ibm.com
Phone: 286-7245, T/L: 363-7245

_______________________________________
Jython tracker <report@bugs.jython.org>
<http://bugs.jython.org/issue1792>
_______________________________________
msg6617 Author: Alan Kennedy (amak) Date: 2011-08-26.22:21:16
> Thanks for your quick answer.  We are about to upgrade jython version from 
> 2.1 to 2.5.2 in WebSphere version 8.5 and I am investigating any breaking 
> change (behavior change) when upgrade to v2.5.2.  I think that this is 
> just one of behaviors change.   We want to prevent this since customers 
> may complain the output type change and it may also break customer scripts 
> if they parse the output string .   It can be resolved in wsadmin code, 
> but do you know any other behavior/breaking change in jython 2.5.2 such as 
> built-in function or name space change? 

OK.

I think the string type change is the only one you need to worry about. But I also think you should post a question about other potential issues to the jython-dev list, to be certain. I'll post that question for you if you wish.

Here is some information about the string changes.

Since python/jython is a "duck typing" language, users who carry out string operations on string values will not notice a difference, because all operations on PyString should work exactly the same on PyUnicode.

e.g.

>>> s = u"hello world"
>>> t = "hello world"
>>> s.encode("iso-8859-1")
'hello world'
>>> t.encode("iso-8859-1")
'hello world'
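The same duck typing covers the parsing that customer scripts are likely to do on wsadmin output. A minimal sketch, with a hypothetical helper name and sample data: splitting an AdminConfig.list-style result into individual configuration IDs uses only splitlines() and strip(), which exist on both 'str' and 'unicode', so the code does not need to know which type it received.

```python
def split_config_ids(listing):
    """Split newline-separated configuration IDs into a list.

    Uses only methods shared by 'str' and 'unicode', so it behaves the
    same whichever type the caller passes in. splitlines() treats a
    CR LF pair as a single line break.
    """
    ids = []
    for line in listing.splitlines():
        line = line.strip()
        if line:  # skip blank lines, e.g. from a trailing newline
            ids.append(line)
    return ids

# Hypothetical sample shaped like AdminConfig.list('Node') output
sample = u"NodeA(cells/c/nodes/NodeA|node.xml#Node_1)\r\nNodeB(cells/c/nodes/NodeB|node.xml#Node_1)"
node_list = split_config_ids(sample)
```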

However, users who are carrying out type checking will have code breakage, e.g.

>>> s = u"hello world"
>>> isinstance(s, str)
False
>>> isinstance(s, unicode)
True

One way for them to code around this is as follows:

>>> isinstance(s, (str, unicode))
True

Also, they will have breakage if they do this

>>> import types
>>> type (s) is types.StringType
False
>>> type (s) is types.UnicodeType
True

And they should change their code to this

>>> type (s) in types.StringTypes
True

>>> isinstance(s, types.StringTypes)
True
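Scripts that may also need to run outside Jython 2.5 can define the accepted string types once: types.StringTypes exists in Python 2 and Jython 2.5 but was removed in Python 3, where 'str' is the only text type. A sketch, with the names string_types and is_string chosen for illustration:

```python
try:
    string_types = (str, unicode)  # Python 2 / Jython 2.5
except NameError:
    string_types = (str,)          # Python 3: no separate unicode type

def is_string(value):
    """True for 'str' values and, where the type exists, 'unicode' values."""
    return isinstance(value, string_types)
```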

The fundamental problem is CPython compatibility. Jython has *always* used unicode strings internally, because Java strings are unicode. But from a typing point of view, we had to be compatible with CPython. That's the only reason we have separate 'str' and 'unicode' types in Jython.

Since I see you work for IBM, you can either:

A: Prevent code breakage by converting everything to 'str' before you return it, e.g.

>>> s = u"hello"
>>> t = str(s)
>>> t
'hello'
>>> isinstance(t, str)
True
>>> isinstance(t, unicode)
False

The 'str' type has all the same capabilities as the 'unicode' type

>>> t.encode('iso-8859-1')
'hello'
>>> dir (s)
['__add__', '__class__', '__cmp__', '__contains__', '__delattr__', '__doc__',
 '__eq__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__',
 '__hash__', '__init__', '__len__', '__mod__', '__mul__', '__ne__', '__new__',
 '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__str__',
 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs',
 'find', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'islower',
 'isnumeric', 'isspace', 'istitle', 'isunicode', 'isupper', 'join', 'ljust',
 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust',
 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip',
 'swapcase', 'title', 'translate', 'upper', 'zfill']
>>> dir (t)
['__add__', '__class__', '__cmp__', '__contains__', '__delattr__', '__doc__',
 '__eq__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__',
 '__getslice__', '__gt__', '__hash__', '__init__', '__le__', '__len__', '__lt__',
 '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
 '__repr__', '__rmul__', '__setattr__', '__str__', 'capitalize', 'center',
 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'index',
 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'islower', 'isnumeric',
 'isspace', 'istitle', 'isunicode', 'isupper', 'join', 'ljust', 'lower',
 'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition',
 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase',
 'title', 'translate', 'upper', 'zfill']

B: Force your users to update their code to reflect the new 'unicode' type name. This is a simple code change, and will be easy for them to carry out, if they have adequate unit tests ;-) This is the option to choose if you also want their code to run correctly under modern CPython, IronPython or PyPy. (websphere.Net anybody?)

> I also like to confirm about the jython cache.  I have chatted with Frank 
> Wierzbicki a while ago and he told me that jython 2.2 or higher version 
> does not require to build the cachedir.   We have been complained the 
> wsadmin startup performance and jython *sys-package-mgr* messages shown in 
> console when first use of jython in wsadmin because it takes time for 
> jython to create all packages/jars to cachedir.   Can I simply set 
> "python.cachedir.skip" property in wsadmin code to get rid of building the 
> cache?   Will it cause any problem without building cache during 
> initialization?  For example, if I like to import some java or Websphere 
> package/class in jython. 

The package cache is literally that: a cache. 

Jython must build metadata structures for all Java packages that will be used with it. This information *must* be available for Jython to be able to use the packages.

Because this can be a time-consuming process, taking up to 10 or 20 seconds depending on the number of packages in the CLASSPATH, the information is cached to speed up future Jython invocations.

If the caching is disabled, it will just mean slower invocations *every* time, because all of the packages will have to be scanned on *every* startup.

But it will still operate correctly: the only thing that will suffer is startup time.

If the caching is enabled, the scanning takes place once: every future invocation will be quicker because of the caching.

We're straying off your original bug report now: if you have any further questions, please post them to jython-users or jython-dev.

Alan.
msg6677 Author: Alan Kennedy (amak) Date: 2011-10-15.10:49:30
This is a websphere issue, not a jython one.
History
Date | User | Action | Args
2011-10-15 10:49:30 | amak | set | status: open -> closed; assignee: amak; resolution: wont fix; messages: + msg6677
2011-08-26 22:21:16 | amak | set | messages: + msg6617
2011-08-26 21:32:22 | amylin | set | files: + unnamed; nosy: + amylin; messages: + msg6616
2011-08-26 20:33:57 | amak | set | nosy: + amak; messages: + msg6615
2011-08-22 21:00:16 | amyhlin | create |