Title: PyStringMap does not fully behave as a dict
Type: Severity: normal
Components: Core Versions: Jython 2.7
Milestone: Jython 2.7.2
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: amak, fwierzbicki, kzuberi, leouserz, pedronis, qny31541, zyasoft
Priority: high Keywords:

Created on 2005-02-26.22:17:26 by anonymous, last changed 2017-09-24.16:13:50 by zyasoft.

msg952 (view) Author: Nobody/Anonymous (nobody) Date: 2005-02-26.22:17:26
When getting an object's __dict__, the type() of the  
dictionary object returns PyStringMap. This causes a  
problem because types.DictType does not match  
PyStringMap. Some existing Marshallers (in my case,  
xmlrpclib) expect an Instance's __dict__ to be a DictType  
when marshalling an Instance (such as an Exception).  
It looks like types.DictType should match  
org.python.core.PyStringMap. When getting the __dict__  
of an Instance in CPython, it returns a type of DictType.  
msg953 (view) Author: Deleted User leouserz (leouserz) Date: 2006-12-22.14:24:31
its possible that this could be fixed by just ditching the PyStringMap used internally and switching over to PyDictionary.  From experimenting with gutting PyStringMap and replacing its internal arrays and hashing with a HashMap, I was able to get an increase in performance.  Given that the Dictionary appears remarkable similiar to that implementation---> forwarding to its Hashtable(yuck), there may not be a performance reason to stick with the PyStringMap(assuming that is the reason that there is a PyStringMap).

msg954 (view) Author: Khalid Zuberi (kzuberi) Date: 2006-12-22.19:10:26
The only (little) help i can add is to note Samuel's recent reference to performance & PyStringMap:

- kz
msg955 (view) Author: Deleted User leouserz (leouserz) Date: 2006-12-22.19:30:25
hmmm, speed wise Im not sure, I guess it depends upon how quickly the PyString is going to return a hashCode call.  From gutting PyStringMap and replacing it with a Map that used the interned strings I saw a boost in performance on the test I was running.  So from that angle PyStringMap didn't seem that speedy. 

I would suspect that PyString would return as quickly as the String.  Its hashCode, hashes the internal string and caches the value.  So I would expect equivilent behavior between the two.  Also, it looks like it should have a speedy equals method.  As long as the string is interned.  So I don't see any terrible issues using it as a key.

I think using a PyDictionary would make the instances more compliant with Python.  Given that I can take the dict from a Python instance and use non-strings for keys.

msg956 (view) Author: Samuele Pedroni (pedronis) Date: 2006-12-22.19:36:40
the issue is all the places that have an already interned String, not a PyString. String to PyString involve an allocation. Allocations are still costly.

Whether using hashCode vs. identityHashCode, it is well possible that the performace trade offs of the two have changed over time since 1.1. Implementing identityHashCode is not straightforward on moving gcs.
msg957 (view) Author: Deleted User leouserz (leouserz) Date: 2006-12-22.19:54:40
hmm, I was thinking about having a PyString cache.  Instead of calling new PyString("STRING OF SOMETHING") pass the call off to a factory method and have it return a cached PyString.  I was under the impression yesterday that PyStringMap was getting PyStrings anyway and that they were passing on their Strings.  So Im not sure how switching to PyDictionary is going to add any costs in this regard.

Yes, hashCode should be faster than System.identityHashCode().  Native methods add overhead that you won't ever see with a simple accessor method.  String just returns a newly calculated hashCode or a cached one.

The performance difference I saw yesterday may not even be centered around the difference between identityHashCode and hashCode, it may just be that the HashMap is more efficient in how it stores and retrieves things than PyStringMap.

msg958 (view) Author: Deleted User leouserz (leouserz) Date: 2006-12-22.20:38:33
yup, I did some fiddling with PyJavaClass so that it used a PyDictionary instead of a PyStringMap.   Performance wise, it improved but not to the degree that it improved with PyStringMap.  Even having the Strings interned in the PyDictionary did not give us as big a boost as PyStringMap did.  This may just mean that PyDictionary could use some additional tweaking.  Swapping in a HashMap will help a little as there will be less lock acquisition going on.  But I can't believe that is the key to the better performance I was seeing.

msg959 (view) Author: Deleted User leouserz (leouserz) Date: 2006-12-22.20:48:21
aha, PyStringMap does have a magic method,
__finditem__(String data)

this gets invoked first, and if we go directly to the table in PyDictionary we see a pretty good boost in performance there.  I guess the default __finditem__ method of PyStringMap is less performant than PyDictionary's __finditem__ chain.

msg960 (view) Author: Samuele Pedroni (pedronis) Date: 2006-12-22.20:58:24
notice that we want the synchronization. Although it makes no strong promises about what happens
with implementations withouh the GIL, Python style is influenced by the presence of the GIL in CPython
this means that builtin types should have an "atomic" behavior.

Now the are dissenting opionions ( on this but the bottom line (because it has happened) is that if we
miss some synchronized related to some of the listed ops, someone will ends up filing a bug because some code is not behaving like on CPython. We had this kind of reports from experienced Pythoneers (for example some twisted contributors), and telling them to add more locks themself doesn't really work or scale in practice, because is too
annoying especially if the code is to run on top CPython primarely.
msg961 (view) Author: Deleted User leouserz (leouserz) Date: 2006-12-22.21:16:04
I think we may be able to get away with a ConcurrentHashMap.  It should offer better scalability than the one lock per table that comes with the Hashtable and is safer than the HashMap which doesn't have any syncronization.  Im seeing roughly the same golden times I was seeing yesterday with a PyDictionary that has had its __finditem__(String) overriden and its Hashtable replaced with a ConcurrentHashMap.

msg962 (view) Author: Deleted User leouserz (leouserz) Date: 2006-12-22.21:36:27
it may be more scalable with threads but it is not more scalable with memory.  ConcurrentHashMap appears to be a hog in terms of what it consumes.  That nixes is for general mass usage.

msg963 (view) Author: Alan Kennedy (amak) Date: 2007-05-19.18:24:33
Could solving this problem be as simple as changing the org/python/modules/ to read like this

dict.__setitem__("DictType", new PyTuple(new PyObject[] {

The isinstance operator takes a tuple as a parameter, as can be seen in the definition for StringTypes.
msg964 (view) Author: Frank Wierzbicki (fwierzbicki) Date: 2007-05-19.19:56:10
I don't think making isinstance of PyStringMap return DictType makes sense, since it does not implement the full interface of a DictType, for example, it cannot take non-strings as keys while one would expect something of type DictType to behave that way.
msg4191 (view) Author: Jim Baker (zyasoft) Date: 2009-03-08.05:15:03
Given that PyStringMap proxies ConcurrentMap<Object, PyObject> (in order
to handle both interned String and PyObject instances), we should be
able to implement the ConcurrentMap interface just like PyDictionary.
Then we can apply Alan's fix.

Here are the missing methods from ConcurrentMap. We may also have some
methods necessary for dict usage, although I didn't see any (what tests
specifically test all of __dict__ ?)

    public Object putIfAbsent(Object key, Object value) 
    public boolean remove(Object key, Object value) {
    public boolean replace(Object key, Object oldValue, Object     value)
    public Object replace(Object key, Object value) 
    public int size()
    public boolean isEmpty()
    public boolean containsKey(Object key)
    public boolean containsValue(Object value)
    public Object get(Object key)
    public Object put(Object key, Object value)
    public Object remove(Object key)
    public void putAll(Map t)
    public Set keySet()
    public Set entrySet()
msg8859 (view) Author: Jim Baker (zyasoft) Date: 2014-07-01.17:40:11
PyStringMap now pretends to be a dict, however, there are a number of missing methods compared to a standard dict. This does not appear to impact xmlrpclib, which was the original reason this bug was posted.
msg9026 (view) Author: Jim Baker (zyasoft) Date: 2014-09-24.18:18:46
Target beta 4 - we need to close this ancient bug out
msg9027 (view) Author: Jim Baker (zyasoft) Date: 2014-09-24.18:26:45
In particular, we should employ the same strategy we did with #1631 and use much more robust testing from test_dict. (I don't think CPython did this for __dict__ specifically because they are the same underlying implementation, unlike our almost the same now, but not quite PyStringMap vs PyDictionary.)

$ python
Python 2.7.5 (default, Mar  9 2014, 22:15:05)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class Foo(object): pass
>>> Foo.__dict__.viewvalues()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'dictproxy' object has no attribute 'viewvalues'
>>> Foo().__dict__.viewvalues()
>>> ^D
$ jython27
Jython 2.7b3+ (default:0fdc65448925, Sep 24 2014, 10:30:07)
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.7.0_21
Type "help", "copyright", "credits" or "license" for more information.
>>> class Foo(object): pass
>>> Foo().__dict__.viewvalues()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'stringmap' object has no attribute 'viewvalues'
msg9036 (view) Author: Jim Baker (zyasoft) Date: 2014-09-26.04:59:04
Changing to high given the possibility of impacting metaprogramming support
msg9356 (view) Author: Jim Baker (zyasoft) Date: 2015-01-08.04:22:02
In particular, this is just the sort of bug that might impact projects like Twisted, which is said to work now in some form on Jython, but we cannot currently run its test suite due to some problem with trial
msg10391 (view) Author: Jim Baker (zyasoft) Date: 2015-10-27.22:48:08
With now implementing viewkeys/viewitems/viewvalues for __dict__, we are very close to closing out this bug. But first we should implement comprehensive testing, per msg9027
msg10573 (view) Author: Jim Baker (zyasoft) Date: 2015-12-29.23:56:44
Deferring such tests for the time being
msg11582 (view) Author: Jim Baker (zyasoft) Date: 2017-09-09.22:40:33
We can always write more tests, but so far everything looks like AbstractDict works well, as seen in the latest JSON support for __dict__. Closing out.
Date User Action Args
2017-09-24 16:13:50zyasoftsetstatus: pending -> closed
2017-09-09 22:40:34zyasoftsetstatus: open -> pending
resolution: fixed
messages: + msg11582
2015-12-29 23:56:44zyasoftsetmessages: + msg10573
milestone: Jython 2.7.1 -> Jython 2.7.2
2015-10-27 22:48:09zyasoftsetmessages: + msg10391
2015-04-15 20:42:08zyasoftsetassignee: zyasoft ->
milestone: Jython 2.7.1
2015-03-25 21:42:20qny31541setnosy: + qny31541
2015-01-08 04:22:02zyasoftsetmessages: + msg9356
2014-09-26 04:59:04zyasoftsetpriority: normal -> high
assignee: zyasoft
messages: + msg9036
2014-09-24 18:29:42zyasoftsetpriority: low -> normal
2014-09-24 18:28:56zyasoftsettitle: vars(obj) returns PyStringMap instead of DictType -> PyStringMap does not fully behave as a dict
2014-09-24 18:26:46zyasoftsetmessages: + msg9027
2014-09-24 18:18:46zyasoftsetmessages: + msg9026
2014-07-01 17:40:11zyasoftsetmessages: + msg8859
2013-02-26 21:52:19fwierzbickisetversions: + Jython 2.7
2009-03-08 05:15:04zyasoftsetnosy: + zyasoft
messages: + msg4191
2005-02-26 22:17:26anonymouscreate