Message6618

Author jeff.allen
Recipients egg, fwierzbicki, jeff.allen, juneau001
Date 2011-08-27.14:28:55
Message-id <1314455336.53.0.236785190316.issue1767@psf.upfronthosting.co.za>
In-reply-to
Content
I can offer the following analysis to the project.

The attached file is my variation on the demonstration supplied, using fixed pairs of integers in place of the random strings, for simplicity and reproducibility.

The inconsistency between __eq__() and __cmp__() is a risky choice, but allowable within Python as I read it.

PEP 207 is clear that sorting in Python should only use the less-than operation. If a less-than operation is not explicitly defined by __lt__(), Python will define it implicitly. The strategy for doing so is quite complex, but in the present case CPython derives it from the user-defined __cmp__().

Jython makes use of java.util.Collections.sort in its implementation of PyList.sort, which (via Arrays.sort) applies a Comparator object. In the circumstances of the demonstration code, Jython supplies a custom Comparator object based on PyObject._cmp().

The implementation of _cmp() resorts first to __eq__(), returning zero if the result is True (non-zero). It then tries __lt__() and __gt__(), including their reverse counterparts. Only if it still has no answer does it find its way to the user-defined __cmp__(). As the user has defined __eq__() in the class, this is the point at which things go wrong for those pairs of values where key2 is the same: they are reported as equal before __cmp__() is ever consulted.
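
The difference in lookup order can be sketched in plain Python 3. This is only an illustration under assumptions, not Jython's code: the class Pair, the method user_cmp() (standing in for __cmp__(), which Python 3 does not have), and the choice that equality looks at key2 while ordering looks at key1 are all hypothetical, since the attached demonstration is not reproduced here.

```python
from functools import cmp_to_key


class Pair:
    """Hypothetical stand-in for the class in the demonstration."""

    def __init__(self, key1, key2):
        self.key1 = key1
        self.key2 = key2

    def __eq__(self, other):
        # Assumed: equality looks only at key2.
        return self.key2 == other.key2

    def user_cmp(self, other):
        # Plays the role of the user-defined __cmp__(); assumed to order by key1.
        return (self.key1 > other.key1) - (self.key1 < other.key1)

    def __repr__(self):
        return "Pair(%d, %d)" % (self.key1, self.key2)


def cmp_only(a, b):
    # What the analysis says CPython effectively relies on: __cmp__() alone.
    return a.user_cmp(b)


def eq_first(a, b):
    # Order described for _cmp(): try __eq__() first, fall back to __cmp__().
    # The __lt__()/__gt__() step is skipped because Pair does not define them.
    if a == b:
        return 0
    return a.user_cmp(b)


a, b, c = Pair(3, 0), Pair(1, 0), Pair(2, 1)

print(cmp_only(a, b))  # positive: a sorts after b
print(eq_first(a, b))  # zero: a and b look equal because key2 matches

by_cmp = sorted([a, b, c], key=cmp_to_key(cmp_only))
by_eq_first = sorted([a, b, c], key=cmp_to_key(eq_first))
print(by_cmp)
print(by_eq_first)
```

With the eq-first comparator, a and b compare as equal even though user_cmp() orders them, so the sort has no reason to move them and the result need not be ordered by key1.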

Suppose we accept that the semantics of sorting in Jython should be exactly those of Python. The root of the problem is the use of Collections.sort, with different semantics. Two possible solutions are:
1. implement a sort utility distinct from the Java library.
2. define a (Java) Comparator via __lt__() rather than _cmp(). 
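
The idea behind solution 2 can be sketched in Python rather than Java: build the three-way comparator from nothing but a less-than predicate, so that equality is never consulted. The names comparator_from_lt and less_than, and the (key1, key2) tuples, are made up for this illustration; the real change would be a Java Comparator inside Jython.

```python
from functools import cmp_to_key


def comparator_from_lt(lt):
    """Build a three-way comparator that consults only a less-than predicate."""
    def compare(a, b):
        if lt(a, b):
            return -1
        if lt(b, a):
            return 1
        # Neither is less than the other: treat them as equivalent for sorting.
        return 0
    return compare


def less_than(a, b):
    # Hypothetical ordering: compare (key1, key2) tuples by key1 only.
    return a[0] < b[0]


pairs = [(3, 0), (1, 0), (2, 1)]
ordered = sorted(pairs, key=cmp_to_key(comparator_from_lt(less_than)))
print(ordered)  # [(1, 0), (2, 1), (3, 0)]
```

Because the comparator returns zero only when neither element is less than the other, it stays consistent with the less-than ordering whatever __eq__() says.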

The logic of _cmp() and the other comparison operators in Jython is terribly complex, essentially undocumented, and has variants in a number of built-in types. Perhaps the complexity is necessary for the semantics of Python. It makes me wary of trying, but if the analysis is correct, the necessary change is localised. So I will give solution 2 some thought. A concern is that PyList.sort may not be the only sort affected by the issue.