Issue1031

classification
Title: PyDictionaryDerived.__cmp__ is not fully CPython-compatible
Type: behaviour Severity: normal
Components: Core Versions:
Milestone:
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: fwierzbicki Nosy List: fwierzbicki, leosoto, pjenvey, zyasoft
Priority: low Keywords:

Created on 2008-05-02.16:07:30 by leosoto, last changed 2015-04-19.22:32:25 by zyasoft.

Files
File name Uploaded Description Edit Remove
derived_cmp_fix.patch leosoto, 2008-05-19.03:27:39
test_bug1031.patch leosoto, 2008-05-19.03:29:36
Messages
msg3173 (view) Author: Leonardo Soto (leosoto) Date: 2008-05-02.16:07:30
Basically, dict-derived classes shouldn't raise TypeError when being
compared with non-dictionary instances:

>>> class dictderived(dict): pass
...
>>> dictderived() == ''

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: dict.__cmp__(x,y) requires y to be 'dict', not a 'str'

The compatible behavior in this case is to check if __cmp__ has been
overridden by the dict subclass. If not, the "default comparison" (by
class name) should be done. 

[A more complete description of my findings is on
http://blog.leosoto.com/2008/04/python-comparison-weirdness.html, but I
think that creating a TestCase reflecting what's written there is better
than pasting it here]
msg3174 (view) Author: Leonardo Soto (leosoto) Date: 2008-05-02.16:21:52
BTW: The following issues are closely related:

- #1889394 : PyUnicodeDerived.__cmp__ seems to have the same problem
- #1869347 : PyLongDerived.__cmp__ too.

The PyLongDerived case seems equivalent to the PyDictionaryDerived one.
But the PyUnicodeDerived case is special, because on CPython unicode has
no __cmp__. 

We also have #1804011, which reports a more specific problem with
dictderived() == {}. But it seems already solved on trunk.
msg3181 (view) Author: Leonardo Soto (leosoto) Date: 2008-05-03.02:15:14
OK, here is a test case, meant for including it on the test_dict_jy.py:

class DictCmpTest(unittest.TestCase):
    def testDictCmp(self):
        # 'Implicit' comparision of dicts against other types instances
        # shouldn't raise exception:
        self.assertNotEqual({}, '')
        # The same, but explicitly calling __cmp__ should raise TypeError:
        self.assertRaises(TypeError, {}.__cmp__, '')
    def testDictDerivedCmp(self):
        # With derived classes that doesn't override __cmp__, the behaviour
        # should be the same that with dicts:
        class derived_dict(dict): pass
        self.assertNotEqual(derived_dict(), '')
        self.assertRaises(TypeError, derived_dict().__cmp__, '')
        # But, if they *override* __cmp__ and raise TypeError from there, we
        # have exception raised when checking for equality...
        class non_comparable_dict(dict):
            def __cmp__(self, other):
                raise TypeError, "I always raise TypeError"
        self.assertRaises(TypeError, lambda: non_comparable_dict() == '')
        self.assertRaises(TypeError, non_comparable_dict().__cmp__, '')

        # The same happens even if the overridden __cmp__ doesn't
nothing apart
        # from calling super:
        class dummy_dict_with_cmp(dict):
            def __cmp__(self, other):
                return super(dummy_dict_with_cmp, self).__cmp__(other)

        self.assertEqual(dummy_dict_with_cmp(), {})
        # But TypeError is raised when comparing against other types
        self.assertRaises(TypeError, lambda: dummy_dict_with_cmp() == '')
        self.assertRaises(TypeError, dummy_dict_with_cmp().__cmp__, '')
        # Finally, the Python implementation shouldn't be tricked by not
        # implementing __cmp__ on the actual type of the dict-derived
instance,
        # but implementing it on a superclass.
        class derived_dict_with_custom_cmp(dict):
            def __cmp__(self, other):
                return 0
        class yet_another_dict(derived_dict_with_custom_cmp): pass
        self.assertEqual(derived_dict_with_custom_cmp(), '')
        self.assertEqual(yet_another_dict(), '')
msg3182 (view) Author: Leonardo Soto (leosoto) Date: 2008-05-03.02:42:35
And here is a change to the PyDerivedDict#__cmp___ that make us pass the
TestSuite. 

    public int __cmp__(PyObject other) {
        PyType self_type=getType();
        PyType[] where_type = new PyType[1];
        PyObject impl = self_type.lookup_where("__cmp__", where_type);
        // Full Compatibility with CPython __cmp__:
        // If the derived type don't override __cmp__, the 
        // *internal* PyDictionary#__cmp__ should be called, not the
exposed 
        //  one. The difference is that the exposed dict.__cmp__ throws a 
        // TypeError if the argument is an instance of dict.   
        //  
        // BTW, impl should never be null, because, if __cmp__ has not been 
        // overidden,  it should be the *exposed*
PyDictionary#dict___cmp__. 
        // But checking for that won't hurt.
        if (impl == null || where_type[0] == PyDictionary.TYPE) {
        	return super.__cmp__(other);
        }
        PyObject res=impl.__get__(this,self_type).__call__(other);
        if (res instanceof PyInteger) {
            int v=((PyInteger)res).getValue();
            return v<0?-1:v>0?1:0;
        }
        throw Py.TypeError("__cmp__ should return a int");
    }

That was before I was told that the method was generated by gderived.py.
Now I don't know if I should adapt the object.derived template to
generate similar code, or not. Thoughts?
msg3183 (view) Author: Frank Wierzbicki (fwierzbicki) Date: 2008-05-03.03:27:12
I think before we modify object.derived, we'd want to know how all of
the *Derived objects should behave.  Seems like it is a little ugly
already... but once we have an idea about the rest -- maybe that would
be the best place to fix these problems.  If it turns out that many of
the cases are different, we could consider removing __cmp__ from
object.derived and hand crafting them for each of the .derived templates
- but I hope it doesn't come to that.  It would be nice if we only
needed to change object.derived.
msg3188 (view) Author: Leonardo Soto (leosoto) Date: 2008-05-12.16:16:05
OK. Here is the CPython behaviour for cmp(v, w), from
Objects/object.c:do_cmp. Here tp_compare is the C function that
implements three-way comparison, similar to our PyObject#__cmp__. It is
a series of strategies, tried one after the other:

  - if the types are the same, use the tp_compare function of the type
  - use (and adapt) rich comparison if possible (details later)
  - use 3-way comparison if posible (details later)
  - use the "default" 3-way comparison (pointer address if same type,
class name if different types, considering numbers as are always smaller
than objects of other types, and None as the smallest thing ever)

The second attempt there was using rich comparison. PEP 207 have the
details of the adaptation to end with a series of single rich comparison
operation (such as __eq__, __lt__, ...). Each of these single operations
 proceed as the following (from objects.c:try_rich_compare):

  - if the second argument is a subtype of the first, and have
tp_richcompare, call it (swapping the arguments), and use the result
unless it is NotImplemented.
  - if the first argument have tp_richcompare, call it and use the
result, unless it is NotImplemented.
  - if the second argument have tp_richcompare, call it (swapping the
arguments) and use the result, unless it is NotImplemented.
  - return NotImplemented.

Finally, three way comparison (the third trial of the first recipe) acts
as the following (from objects.c:try_3way_compare):

  - use the __cmp__ of the first argument that is an old-style instance
and implements __cmp__.
  - if both tp_compares are the same, use it.
  - use the __cmp__ of the first argument that is a new-style instance
and implements[*] __cmp__.
  - coerce the arguments and, if the coerced types have the same
tp_compare, use it.

Once we already have all this mess in place, it is easy to show how an
externally called rich comparison works. That is, when the comparison
operators (==, !=, <, ...) are used:

 - Try rich comparison, as detailed on the second recipe, and use its
return value unless it is NotImplemented.
 - Use three way comparison, as detailed on the third recipe, and adapt
the result.

That's from objects.c:do_richcmp.

Now I'm going to take a close look at the Jython implementation hoping
to spot what's different there and how these differences can lead to
incompatibilities in practice.
msg3192 (view) Author: Leonardo Soto (leosoto) Date: 2008-05-19.03:27:38
As I'm going to focus on other things for the following weeks, I will
postpone the deep look on jython internals (which aimed at finding other
incompatibilities and/or corner cases, and a better way to fix the
observed problem).

By now, I have the attached patch, which (after executing gderived.py
and doing the manual fixing of files outside org.python.core) make the
dict test pass on trunk, and doesn't break any other test. 

The patch only modifies object.derived, because __cmp__ behavior is the
same on every type.
msg3239 (view) Author: Philip Jenvey (pjenvey) Date: 2008-06-08.00:08:54
#1869347 appears to be this exact same issue
msg3240 (view) Author: Philip Jenvey (pjenvey) Date: 2008-06-08.00:11:28
#1804011 is also this, I believe
msg3318 (view) Author: Leonardo Soto (leosoto) Date: 2008-07-02.21:43:29
Patch applied on r4841 (asm branch)
msg3398 (view) Author: Leonardo Soto (leosoto) Date: 2008-08-04.22:53:05
For completeness, I've followed the __cmp__ code path on Jython and
found that it has roughly the same structure as CPython, ie:

 - First, rich comparison is tried (on PyObject#_cmp itself)
 - Then, 3-way comparison (on PyObject#_cmp_unsafe)
 - Finally, the old "default" comparison (PyObject#_default_cmp)

Where the main difference being that we start directly by rich
comparison functions, instead of the old 3-way comparison for same
types. This means that the following gives different results on CPython
and Jython:

>>> class A(object):
...   def __eq__(self, other): return True
...   def __cmp__(self, other): return 1
... 
>>> cmp(A(), A())

CPython gives 1, while Jython gives 0. Interestingly, Jython is
following http://docs.python.org/ref/customization.html closely here.

Now, inside rich comparison we don't do the subtype check (first step of
this particular recipe on CPython), and here we have another subtle
difference:

>>> class A(object):
...  def __lt__(self, other): return True
...  def __gt__(self, other): return False
...  def __eq__(self, other): return False
...
>>> class B(A): pass
...
>>> cmp(A(), B())

Which outputs 1 on CPython and -1 on Jython.

On the 3-way comparison step (PyObject#_cmp_unsafe -- no idea why the
"unsafe" name), we don't check first for old-style instances, so this is
also different:

class A(object):
    def __cmp__(self, other): return 1

class B:
    def __cmp__(self, other): return 1

print cmp(A(), B())

[CPython: -1, Jython: 1]

Finally, and also on PyObject#_cmp_unsafe, we don't try to coerce the
arguments. This must be fixed ASAP, as it is making test_coercion fail
on the asm branch. I don't see yet how we are going to mimic CPython
here, because we can't easily check that two PyObjects have the same
__cmp__ implementation. 

Maybe I just need to understand why CPython puts that restriction
instead of trying coerced_v.__cmp__(coerced_w) and then
-coerced_w.__cmp__(coerced_v) if the first attempt doesn't work.
msg3399 (view) Author: Philip Jenvey (pjenvey) Date: 2008-08-04.23:21:04
we match pypy's behavior for the latter 2 tests

So example 1:
CPython 1 Jython 0 Pypy 1

2:
CPython 1 Jython -1 Pypy -1

3:
CPython -1 Jython 1 Pypy 1
msg3404 (view) Author: Leonardo Soto (leosoto) Date: 2008-08-05.20:00:54
Looks like the "subtype checking to start rich-cmp with the second arg"
[*] behavior can be justified as being consistent with what
http://docs.python.org/ref/coercion-rules.html says about binary operators:

"Exception to the previous item: if the left operand is an instance of a
built-in type or a new-style class, and the right operand is an instance
of a proper subclass of that type or class and overrides the base's
__rop__() method, the right operand's __rop__() method is tried before
the left operand's __op__() method. "

"Interestingly", CPython only does this for rich-cmp, but not for old
__cmp__. 

[*] To avoid refering the examples by meaningless ordinals, I'm going to
call the three previous examples from now on as:

- first example == "__cmp__ first on the same type"
- second example == "subtype checking to start rich-cmp with the second arg"
- third example == "start with old-style instances __cmp__"
msg3413 (view) Author: Leonardo Soto (leosoto) Date: 2008-08-10.20:53:33
Coercion was addded to cmp() on r5133.
msg3956 (view) Author: Frank Wierzbicki (fwierzbicki) Date: 2008-12-17.19:48:28
Can this be closed or do issues remain?
msg3972 (view) Author: Leonardo Soto (leosoto) Date: 2008-12-21.00:39:10
The three differences I pointed out on msg3398 still exist. Only
coercion was fixed. 

OTOH, I don't know of any real code which is being hit by this problem
on Jython. Maybe we should open a new issue for each case and try to fix
them individually if they are worth the effort.
msg9897 (view) Author: Jim Baker (zyasoft) Date: 2015-04-19.22:32:25
Closing this out. The original problem in msg3173 is fixed, and we are arguably correct in msg3398
History
Date User Action Args
2015-04-19 22:32:25zyasoftsetstatus: open -> closed
resolution: remind -> out of date
messages: + msg9897
nosy: + zyasoft
2013-02-27 17:19:46fwierzbickisetpriority: normal -> low
assignee: leosoto -> fwierzbicki
resolution: remind
2009-03-14 14:25:37fwierzbickisetpriority: normal
2008-12-21 00:39:11leosotosetmessages: + msg3972
2008-12-17 19:48:28fwierzbickisetmessages: + msg3956
2008-08-10 20:53:34leosotosetmessages: + msg3413
2008-08-05 20:00:56leosotosetmessages: + msg3404
2008-08-04 23:21:05pjenveysetmessages: + msg3399
2008-08-04 22:53:06leosotosetassignee: leosoto
messages: + msg3398
2008-07-02 21:43:30leosotosetmessages: + msg3318
2008-06-08 00:11:28pjenveysetmessages: + msg3240
2008-06-08 00:08:54pjenveysetmessages: + msg3239
2008-06-08 00:08:32pjenveylinkissue1869347 dependencies
2008-05-19 03:29:36leosotosetfiles: + test_bug1031.patch
2008-05-19 03:27:39leosotosetfiles: + derived_cmp_fix.patch
messages: + msg3192
2008-05-12 16:16:06leosotosetmessages: + msg3188
2008-05-05 05:02:52pjenveysetnosy: + pjenvey
2008-05-03 03:27:13fwierzbickisetmessages: + msg3183
2008-05-03 02:42:36leosotosetmessages: + msg3182
2008-05-03 02:15:17leosotosetmessages: + msg3181
2008-05-02 20:22:03fwierzbickisetnosy: + fwierzbicki
2008-05-02 16:21:52leosotosetmessages: + msg3174
2008-05-02 16:07:30leosotocreate