Issue1767

classification
Title: Inconsistency between python and jython
Type: behaviour Severity: normal
Components: Core Versions: Jython 2.7
process
Status: open Resolution: remind
Dependencies: Superseder:
Assigned To: fwierzbicki Nosy List: amak, egg, fwierzbicki, jeff.allen, juneau001, santa4nt, zyasoft
Priority: normal Keywords: patch

Created on 2011-07-08.19:24:58 by egg, last changed 2014-06-19.16:01:12 by santa4nt.

Files
File name Uploaded Description Edit Remove
IntDemo.py jeff.allen, 2011-08-27.14:28:55 Demonstrate issue 1767 with integer data.
Issue_1767_sort_inconsistent.patch jeff.allen, 2011-08-28.00:00:43 Patch addressing Issue1767
Messages
msg6564 (view) Author: Egg (egg) Date: 2011-07-08.19:25:36
'''
Demonstrates an apparent jython bug. If the __cmp__ method and the __eq__ method use different criteria
for equality (generally a bad idea, admittedly), the sorted results follow neither criterion, but
appear to be a mix of the two.

Under python these sort as intended, like:
    [1-03, 1-04, 1-06, 1-06, 1-07, 1-08, 1-08, 2-01, 2-01, 2-02, 2-02, 2-03, 2-07, 2-08, 2-08, 2-08, 3-04, 3-05, 3-07, 3-08, 3-08, 3-08, 3-09, 3-09, 3-10]

Under jython they sort in an apparently nonsensical way:
    [1-02, 1-03, 1-06, 1-07, 1-08, 1-08, 1-09, 1-09, 2-04, 2-10, 3-01, 3-03, 3-06, 3-06, 3-07, 3-08, 3-08, 2-08, 3-08, 1-08, 3-06, 1-06, 2-05, 3-09, 2-09]

'''
from random import randint

class tricky_object:
        
    def __init__(self):
        self.keyfield1 = "%1d" % (randint(1,3))
        self.keyfield2 = "%02d" % (randint(0,10))
        
    def __cmp__(self,other):
        if self.keyfield1 != other.keyfield1:
            return cmp(self.keyfield1,other.keyfield1)
        return cmp(self.keyfield2,other.keyfield2)
        
    def __eq__(self,other):
        return (self.keyfield2 == other.keyfield2)

    def __repr__(self):
        return "%s-%s" %(self.keyfield1,self.keyfield2)

tos = [tricky_object() for i in range(25)]
tos.sort()
print tos
msg6565 (view) Author: Egg (egg) Date: 2011-07-08.19:26:18
'''
Demonstrates an apparent jython bug. If the __cmp__ method and the __eq__ method use different criteria
for equality (generally a bad idea, admittedly), the sorted results follow neither criterion, but
appear to be a mix of the two.

Under python 2.5 these sort as intended, like:
    [1-03, 1-04, 1-06, 1-06, 1-07, 1-08, 1-08, 2-01, 2-01, 2-02, 2-02, 2-03, 2-07, 2-08, 2-08, 2-08, 3-04, 3-05, 3-07, 3-08, 3-08, 3-08, 3-09, 3-09, 3-10]

Under jython they sort in an apparently nonsensical way:
    [1-02, 1-03, 1-06, 1-07, 1-08, 1-08, 1-09, 1-09, 2-04, 2-10, 3-01, 3-03, 3-06, 3-06, 3-07, 3-08, 3-08, 2-08, 3-08, 1-08, 3-06, 1-06, 2-05, 3-09, 2-09]

'''
from random import randint

class tricky_object:
        
    def __init__(self):
        self.keyfield1 = "%1d" % (randint(1,3))
        self.keyfield2 = "%02d" % (randint(0,10))
        
    def __cmp__(self,other):
        if self.keyfield1 != other.keyfield1:
            return cmp(self.keyfield1,other.keyfield1)
        return cmp(self.keyfield2,other.keyfield2)
        
    def __eq__(self,other):
        return (self.keyfield2 == other.keyfield2)

    def __repr__(self):
        return "%s-%s" %(self.keyfield1,self.keyfield2)

tos = [tricky_object() for i in range(25)]
tos.sort()
print tos
msg6618 (view) Author: Jeff Allen (jeff.allen) Date: 2011-08-27.14:28:55
I can offer the following analysis to the project.

The attached file is my variation on the demonstration supplied, using fixed pairs of integers in place of the random strings, for simplicity and reproducibility.

The inconsistency between __eq__() and __cmp__() is a risky choice, but allowable within Python as I read it.

PEP 207 is clear that sorting in Python should only use the less-than operation. If a less-than operation is not explicitly defined by __lt__(), Python will define it implicitly. Its strategy is quite complex, but in the present case cPython does that via the user-defined __cmp__().

Jython makes use of java.util.Collections.sort in its implementation of PyList.sort, which (via Arrays.sort) applies a Comparator object. In the circumstances of the demonstration code, Jython supplies a custom Comparator object based on PyObject._cmp().

The implementation of _cmp() resorts first to __eq__(), returning zero if the result is True (non-zero). The _cmp() function then tries __lt__() and __gt__() including their reverse counterparts. Only if it still has no answer does it find its way to the user-defined __cmp__(). As the user has defined __eq__() in the class, this is the point at which things go wrong for those pairs of values, where key2 is the same.

Suppose we accept that the semantics of sorting in Jython should be exactly those of Python. The root of the problem is the use of Collections.sort, with different semantics. Two possible solutions are:
1. implement a sort utility distinct from the Java library.
2. define a (Java) Comparator via __lt__() rather than _cmp(). 

The logic of _cmp() and the other comparison operators in Jython is terribly complex, essentially undocumented, and has variants in a number of built-in types. Perhaps the complexity is necessary for the semantics of Python. It makes me wary of trying, but if the analysis is correct, the necessary change is localised. So I will give solution 2 some thought. A concern is that PyList.sort may not be the only sort affected by the issue.
msg6619 (view) Author: Jeff Allen (jeff.allen) Date: 2011-08-28.00:00:43
Here's a patch for you to consider.

I created this against the v2.6 tip (and "hg diff") but I have applied it successfully to a clean checkout of tag v2.5.2.

I have run the regression tests (ant regrtest), but even on a clean checkout, I get around 9 failures on my machine, related mostly to file i/o. The patched code does not perform any worse. All the tests obviously related to list and sorting run without error. I added a test to test_sort_jy.py based on the bug demonstration, which v2.5.2 fails and the patched code passes.
msg6629 (view) Author: Alan Kennedy (amak) Date: 2011-09-03.02:11:37
C:\>C:\jython252\jython.bat
Jython 2.5.2 (Release_2_5_2:7206, Mar 2 2011, 23:12:06)
[Java HotSpot(TM) Client VM (Sun Microsystems Inc.)] on java1.5.0_21
Type "help", "copyright", "credits" or "license" for more information.
>>> import this
The Zen of Python, by Tim Peters

[snip]
> Explicit is better than implicit.
[snip]
> Special cases aren't special enough to break the rules.
> Although practicality beats purity.
[snip]
> In the face of ambiguity, refuse the temptation to guess.

Jeff, you've carried out a detailed analysis of this problem. Is the users use case and assertion sufficient justification to modify jython?

"If the __cmp__ method and the __eq__ method us different criteria for equality (generally a bad idea, admittedly)"

Call it.

Alan.
msg6638 (view) Author: Jeff Allen (jeff.allen) Date: 2011-09-04.08:46:46
Thanks for the razors.

>> Explicit is better than implicit.
We're dealing with a built-in where the sort criterion is implicit in Python (at the call site). The required behavour is explicit in the documentation of Python, if somewhat scattered throughout.

>> Special cases aren't special enough to break the rules.
Rare is not the same as special. It is rare that one would have a reason to define __cmp__ inconsistently with the rich comparison operations including __eq__. But Python allows for it and specifies a behaviour. I see no case that Jython sorting should, as a special case, be similar to Java sorting instead, __cmp__ being somehow analagous to implementing Comparable.

>> Although practicality beats purity.
The proposed implementation seems as practical as the existing one.

>> In the face of ambiguity, refuse the temptation to guess.
It is unambiguous that sort should be based on "less-than", so I'm calling _lt in the code. We have other possible definitions of "less-than": __lt__(), _cmp()==-1 and __cmp__()==-1. I looked at the way they delegate to each other and I'm 95% sure I chose correctly. Is there a language lawyer in the house?

Python 2.x may have departed from its own Zen in this area. Python 3.x is back on track as __cmp__ and friends are withdrawn, probably in the interests of having one obvious way to do things.

>> Is the users use case and assertion sufficient justification to modify jython?

Jython promises to implement Python (with explicit derogations concerning threads and garbage collection). Isn't Egg's application, wise or not, just the means by which we discovered Jython does not keep this promise when it comes to sorting? So it's not a special plea to change the definition of Jython, rather a bug fix.

On the downside, there would be a risk of breaking existing code, but only code that defines __cmp__ and rich comparison inconsistently (rare), and depends on the current (unintended) behaviour. Since the behaviour you get is "apparently nonsensical", I doubt anyone has done that.

So I call "do it", but I'm open to:
1. learning that delegating to _lt() was the wrong way after all; and
2. feedback on the mechanics, as I'm new to contributing patches, etc..

Jeff
msg6737 (view) Author: Frank Wierzbicki (fwierzbicki) Date: 2011-12-01.01:13:54
I tend to prefer obsessive compatibility with CPython, so I'm inclined to call this a bug and get it fixed. Python 3 actually requires the use of __lt__ for sorting and actually bans the use of __cmp__ and so this will put us on the right track for the future. I still need to carefully read PEP 207 to make sure I get it fully.
msg6738 (view) Author: Frank Wierzbicki (fwierzbicki) Date: 2011-12-01.02:15:02
OK so this is going to take some time to evaluate fully, now that I've had a look at the comparison code, I'm reminded as to how difficult it is to follow. Python 2.x comparisons are painfully convoluted to say the least. I think this should still get fixed - but not in 2.5.x since this will change the current behavior and will need testing in a beta period to be sure that it doesn't have unintended side effects. I am still in the process of evaluating this.
msg7680 (view) Author: Frank Wierzbicki (fwierzbicki) Date: 2013-02-14.02:57:52
OK, I found a much simpler demonstration of this problem while trying to figure out why test_pprint wasn't working.

Python 2.7.3 (default, Aug  1 2012, 05:14:39) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = frozenset([frozenset([1,2]), frozenset([3,4])])
>>> sorted(x)
[frozenset([1, 2]), frozenset([3, 4])]

Jython 2.7b1 (default:d77503b7a755+, Feb 8 2013, 17:23:40) 
[OpenJDK 64-Bit Server VM (Oracle Corporation)] on java1.7.0_09
Type "help", "copyright", "credits" or "license" for more information.
>>> x = frozenset([frozenset([1,2]), frozenset([3,4])])
>>> sorted(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot compare sets using cmp()
msg8733 (view) Author: Jim Baker (zyasoft) Date: 2014-06-19.06:55:49
We should revisit Jeff's patch.

Incidentally I removed the implementation that Jython used to have of its own sort function, as ported from CPython. I do not recommend going back to that - we have enough bugs to fix ;)
History
Date User Action Args
2014-06-19 16:01:12santa4ntsetnosy: + santa4nt
2014-06-19 06:55:49zyasoftsetnosy: + zyasoft
messages: + msg8733
2013-02-20 00:07:59fwierzbickisetversions: + Jython 2.7, - 2.5.2rc
2013-02-14 02:57:52fwierzbickisetmessages: + msg7680
2012-08-10 21:12:22fwierzbickisetpriority: normal
2011-12-01 02:15:02fwierzbickisetresolution: accepted -> remind
messages: + msg6738
2011-12-01 01:13:55fwierzbickisetassignee: fwierzbicki
resolution: accepted
messages: + msg6737
2011-09-04 08:46:47jeff.allensetmessages: + msg6638
2011-09-03 02:11:38amaksetnosy: + amak
messages: + msg6629
2011-08-28 00:00:44jeff.allensetfiles: + Issue_1767_sort_inconsistent.patch
keywords: + patch
messages: + msg6619
2011-08-27 14:28:56jeff.allensetfiles: + IntDemo.py
nosy: + jeff.allen, fwierzbicki, juneau001
messages: + msg6618
2011-07-08 19:26:18eggsetmessages: + msg6565
2011-07-08 19:25:36eggsetmessages: + msg6564
2011-07-08 19:24:58eggcreate