Issue1031

classification

Title:	PyDictionaryDerived.__cmp__ is not fully CPython-compatible
Type:	behaviour	Severity:	normal
Components:	Core	Versions:
		Milestone:

process

Status:	closed	Resolution:	out of date
Dependencies:		Superseder:
Assigned To:	fwierzbicki	Nosy List:	fwierzbicki, leosoto, pjenvey, zyasoft
Priority:	low	Keywords:

Created on 2008-05-02.16:07:30 by leosoto, last changed 2015-04-19.22:32:25 by zyasoft.

Files
File name	Uploaded	Description	Edit	Remove
derived_cmp_fix.patch	leosoto, 2008-05-19.03:27:39
test_bug1031.patch	leosoto, 2008-05-19.03:29:36

Messages
msg3173 (view)	Author: Leonardo Soto (leosoto)	Date: 2008-05-02.16:07:30
Basically, dict-derived classes shouldn't raise TypeError when being compared with non-dictionary instances: >>> class dictderived(dict): pass ... >>> dictderived() == '' Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: dict.__cmp__(x,y) requires y to be 'dict', not a 'str' The compatible behavior in this case is to check if __cmp__ has been overridden by the dict subclass. If not, the "default comparison" (by class name) should be done. [A more complete description of my findings is on http://blog.leosoto.com/2008/04/python-comparison-weirdness.html, but I think that creating a TestCase reflecting what's written there is better than pasting it here]
msg3174 (view)	Author: Leonardo Soto (leosoto)	Date: 2008-05-02.16:21:52
BTW: The following issues are closely related: - #1889394 : PyUnicodeDerived.__cmp__ seems to have the same problem - #1869347 : PyLongDerived.__cmp__ too. The PyLongDerived case seems equivalent to the PyDictionaryDerived one. But the PyUnicodeDerived case is special, because on CPython unicode has no __cmp__. We also have #1804011, which reports a more specific problem with dictderived() == {}. But it seems already solved on trunk.
msg3181 (view)	Author: Leonardo Soto (leosoto)	Date: 2008-05-03.02:15:14
OK, here is a test case, meant for including it on the test_dict_jy.py: class DictCmpTest(unittest.TestCase): def testDictCmp(self): # 'Implicit' comparision of dicts against other types instances # shouldn't raise exception: self.assertNotEqual({}, '') # The same, but explicitly calling __cmp__ should raise TypeError: self.assertRaises(TypeError, {}.__cmp__, '') def testDictDerivedCmp(self): # With derived classes that doesn't override __cmp__, the behaviour # should be the same that with dicts: class derived_dict(dict): pass self.assertNotEqual(derived_dict(), '') self.assertRaises(TypeError, derived_dict().__cmp__, '') # But, if they override __cmp__ and raise TypeError from there, we # have exception raised when checking for equality... class non_comparable_dict(dict): def __cmp__(self, other): raise TypeError, "I always raise TypeError" self.assertRaises(TypeError, lambda: non_comparable_dict() == '') self.assertRaises(TypeError, non_comparable_dict().__cmp__, '') # The same happens even if the overridden __cmp__ doesn't nothing apart # from calling super: class dummy_dict_with_cmp(dict): def __cmp__(self, other): return super(dummy_dict_with_cmp, self).__cmp__(other) self.assertEqual(dummy_dict_with_cmp(), {}) # But TypeError is raised when comparing against other types self.assertRaises(TypeError, lambda: dummy_dict_with_cmp() == '') self.assertRaises(TypeError, dummy_dict_with_cmp().__cmp__, '') # Finally, the Python implementation shouldn't be tricked by not # implementing __cmp__ on the actual type of the dict-derived instance, # but implementing it on a superclass. class derived_dict_with_custom_cmp(dict): def __cmp__(self, other): return 0 class yet_another_dict(derived_dict_with_custom_cmp): pass self.assertEqual(derived_dict_with_custom_cmp(), '') self.assertEqual(yet_another_dict(), '')
msg3182 (view)	Author: Leonardo Soto (leosoto)	Date: 2008-05-03.02:42:35
And here is a change to the PyDerivedDict#__cmp___ that make us pass the TestSuite. public int __cmp__(PyObject other) { PyType self_type=getType(); PyType[] where_type = new PyType[1]; PyObject impl = self_type.lookup_where("__cmp__", where_type); // Full Compatibility with CPython __cmp__: // If the derived type don't override __cmp__, the // internal PyDictionary#__cmp__ should be called, not the exposed // one. The difference is that the exposed dict.__cmp__ throws a // TypeError if the argument is an instance of dict. // // BTW, impl should never be null, because, if __cmp__ has not been // overidden, it should be the exposed PyDictionary#dict___cmp__. // But checking for that won't hurt. if (impl == null \|\| where_type[0] == PyDictionary.TYPE) { return super.__cmp__(other); } PyObject res=impl.__get__(this,self_type).__call__(other); if (res instanceof PyInteger) { int v=((PyInteger)res).getValue(); return v<0?-1:v>0?1:0; } throw Py.TypeError("__cmp__ should return a int"); } That was before I was told that the method was generated by gderived.py. Now I don't know if I should adapt the object.derived template to generate similar code, or not. Thoughts?
msg3183 (view)	Author: Frank Wierzbicki (fwierzbicki)	Date: 2008-05-03.03:27:12
I think before we modify object.derived, we'd want to know how all of the *Derived objects should behave. Seems like it is a little ugly already... but once we have an idea about the rest -- maybe that would be the best place to fix these problems. If it turns out that many of the cases are different, we could consider removing __cmp__ from object.derived and hand crafting them for each of the .derived templates - but I hope it doesn't come to that. It would be nice if we only needed to change object.derived.
msg3188 (view)	Author: Leonardo Soto (leosoto)	Date: 2008-05-12.16:16:05
OK. Here is the CPython behaviour for cmp(v, w), from Objects/object.c:do_cmp. Here tp_compare is the C function that implements three-way comparison, similar to our PyObject#__cmp__. It is a series of strategies, tried one after the other: - if the types are the same, use the tp_compare function of the type - use (and adapt) rich comparison if possible (details later) - use 3-way comparison if posible (details later) - use the "default" 3-way comparison (pointer address if same type, class name if different types, considering numbers as are always smaller than objects of other types, and None as the smallest thing ever) The second attempt there was using rich comparison. PEP 207 have the details of the adaptation to end with a series of single rich comparison operation (such as __eq__, __lt__, ...). Each of these single operations proceed as the following (from objects.c:try_rich_compare): - if the second argument is a subtype of the first, and have tp_richcompare, call it (swapping the arguments), and use the result unless it is NotImplemented. - if the first argument have tp_richcompare, call it and use the result, unless it is NotImplemented. - if the second argument have tp_richcompare, call it (swapping the arguments) and use the result, unless it is NotImplemented. - return NotImplemented. Finally, three way comparison (the third trial of the first recipe) acts as the following (from objects.c:try_3way_compare): - use the __cmp__ of the first argument that is an old-style instance and implements __cmp__. - if both tp_compares are the same, use it. - use the __cmp__ of the first argument that is a new-style instance and implements[*] __cmp__. - coerce the arguments and, if the coerced types have the same tp_compare, use it. Once we already have all this mess in place, it is easy to show how an externally called rich comparison works. That is, when the comparison operators (==, !=, <, ...) are used: - Try rich comparison, as detailed on the second recipe, and use its return value unless it is NotImplemented. - Use three way comparison, as detailed on the third recipe, and adapt the result. That's from objects.c:do_richcmp. Now I'm going to take a close look at the Jython implementation hoping to spot what's different there and how these differences can lead to incompatibilities in practice.
msg3192 (view)	Author: Leonardo Soto (leosoto)	Date: 2008-05-19.03:27:38
As I'm going to focus on other things for the following weeks, I will postpone the deep look on jython internals (which aimed at finding other incompatibilities and/or corner cases, and a better way to fix the observed problem). By now, I have the attached patch, which (after executing gderived.py and doing the manual fixing of files outside org.python.core) make the dict test pass on trunk, and doesn't break any other test. The patch only modifies object.derived, because __cmp__ behavior is the same on every type.
msg3239 (view)	Author: Philip Jenvey (pjenvey)	Date: 2008-06-08.00:08:54
#1869347 appears to be this exact same issue
msg3240 (view)	Author: Philip Jenvey (pjenvey)	Date: 2008-06-08.00:11:28
#1804011 is also this, I believe
msg3318 (view)	Author: Leonardo Soto (leosoto)	Date: 2008-07-02.21:43:29
Patch applied on r4841 (asm branch)
msg3398 (view)	Author: Leonardo Soto (leosoto)	Date: 2008-08-04.22:53:05
For completeness, I've followed the __cmp__ code path on Jython and found that it has roughly the same structure as CPython, ie: - First, rich comparison is tried (on PyObject#_cmp itself) - Then, 3-way comparison (on PyObject#_cmp_unsafe) - Finally, the old "default" comparison (PyObject#_default_cmp) Where the main difference being that we start directly by rich comparison functions, instead of the old 3-way comparison for same types. This means that the following gives different results on CPython and Jython: >>> class A(object): ... def __eq__(self, other): return True ... def __cmp__(self, other): return 1 ... >>> cmp(A(), A()) CPython gives 1, while Jython gives 0. Interestingly, Jython is following http://docs.python.org/ref/customization.html closely here. Now, inside rich comparison we don't do the subtype check (first step of this particular recipe on CPython), and here we have another subtle difference: >>> class A(object): ... def __lt__(self, other): return True ... def __gt__(self, other): return False ... def __eq__(self, other): return False ... >>> class B(A): pass ... >>> cmp(A(), B()) Which outputs 1 on CPython and -1 on Jython. On the 3-way comparison step (PyObject#_cmp_unsafe -- no idea why the "unsafe" name), we don't check first for old-style instances, so this is also different: class A(object): def __cmp__(self, other): return 1 class B: def __cmp__(self, other): return 1 print cmp(A(), B()) [CPython: -1, Jython: 1] Finally, and also on PyObject#_cmp_unsafe, we don't try to coerce the arguments. This must be fixed ASAP, as it is making test_coercion fail on the asm branch. I don't see yet how we are going to mimic CPython here, because we can't easily check that two PyObjects have the same __cmp__ implementation. Maybe I just need to understand why CPython puts that restriction instead of trying coerced_v.__cmp__(coerced_w) and then -coerced_w.__cmp__(coerced_v) if the first attempt doesn't work.
msg3399 (view)	Author: Philip Jenvey (pjenvey)	Date: 2008-08-04.23:21:04
we match pypy's behavior for the latter 2 tests So example 1: CPython 1 Jython 0 Pypy 1 2: CPython 1 Jython -1 Pypy -1 3: CPython -1 Jython 1 Pypy 1
msg3404 (view)	Author: Leonardo Soto (leosoto)	Date: 2008-08-05.20:00:54
Looks like the "subtype checking to start rich-cmp with the second arg" [] behavior can be justified as being consistent with what http://docs.python.org/ref/coercion-rules.html says about binary operators: "Exception to the previous item: if the left operand is an instance of a built-in type or a new-style class, and the right operand is an instance of a proper subclass of that type or class and overrides the base's __rop__() method, the right operand's __rop__() method is tried before the left operand's __op__() method. " "Interestingly", CPython only does this for rich-cmp, but not for old __cmp__. [] To avoid refering the examples by meaningless ordinals, I'm going to call the three previous examples from now on as: - first example == "__cmp__ first on the same type" - second example == "subtype checking to start rich-cmp with the second arg" - third example == "start with old-style instances __cmp__"
msg3413 (view)	Author: Leonardo Soto (leosoto)	Date: 2008-08-10.20:53:33
Coercion was addded to cmp() on r5133.
msg3956 (view)	Author: Frank Wierzbicki (fwierzbicki)	Date: 2008-12-17.19:48:28
Can this be closed or do issues remain?
msg3972 (view)	Author: Leonardo Soto (leosoto)	Date: 2008-12-21.00:39:10
The three differences I pointed out on msg3398 still exist. Only coercion was fixed. OTOH, I don't know of any real code which is being hit by this problem on Jython. Maybe we should open a new issue for each case and try to fix them individually if they are worth the effort.
msg9897 (view)	Author: Jim Baker (zyasoft)	Date: 2015-04-19.22:32:25
Closing this out. The original problem in msg3173 is fixed, and we are arguably correct in msg3398

History
Date	User	Action	Args
2015-04-19 22:32:25	zyasoft	set	status: open -> closed resolution: remind -> out of date messages: + msg9897 nosy: + zyasoft
2013-02-27 17:19:46	fwierzbicki	set	priority: normal -> low assignee: leosoto -> fwierzbicki resolution: remind
2009-03-14 14:25:37	fwierzbicki	set	priority: normal
2008-12-21 00:39:11	leosoto	set	messages: + msg3972
2008-12-17 19:48:28	fwierzbicki	set	messages: + msg3956
2008-08-10 20:53:34	leosoto	set	messages: + msg3413
2008-08-05 20:00:56	leosoto	set	messages: + msg3404
2008-08-04 23:21:05	pjenvey	set	messages: + msg3399
2008-08-04 22:53:06	leosoto	set	assignee: leosoto messages: + msg3398
2008-07-02 21:43:30	leosoto	set	messages: + msg3318
2008-06-08 00:11:28	pjenvey	set	messages: + msg3240
2008-06-08 00:08:54	pjenvey	set	messages: + msg3239
2008-06-08 00:08:32	pjenvey	link	issue1869347 dependencies
2008-05-19 03:29:36	leosoto	set	files: + test_bug1031.patch
2008-05-19 03:27:39	leosoto	set	files: + derived_cmp_fix.patch messages: + msg3192
2008-05-12 16:16:06	leosoto	set	messages: + msg3188
2008-05-05 05:02:52	pjenvey	set	nosy: + pjenvey
2008-05-03 03:27:13	fwierzbicki	set	messages: + msg3183
2008-05-03 02:42:36	leosoto	set	messages: + msg3182
2008-05-03 02:15:17	leosoto	set	messages: + msg3181
2008-05-02 20:22:03	fwierzbicki	set	nosy: + fwierzbicki
2008-05-02 16:21:52	leosoto	set	messages: + msg3174
2008-05-02 16:07:30	leosoto	create