Title: Object comparison of strings fails, "a" is "a" gives false
Type: Severity: normal
Components: Versions: Jython 2.7
Status: closed Resolution: invalid
Dependencies: Superseder:
Assigned To: Nosy List: pjac, zyasoft
Priority: Keywords:

Created on 2014-05-07.17:11:05 by pjac, last changed 2014-05-10.05:45:34 by zyasoft.

msg8342 (view) Author: Peter (pjac) Date: 2014-05-07.17:11:04
Object comparison of strings (using "is") is arguably implementation dependent, but this could be viewed as a regression from Jython 2.5:

CPython:

$ python
Python 2.7.5 (default, Aug 25 2013, 00:04:04) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> "a" is "a"
>>> quit()

$ pypy2.2
Python 2.7.3 (87aa9de10f9c, Nov 24 2013, 20:57:21)
[PyPy 2.2.1 with GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``the zen attitude to programming:
reducing the oopses in your life''
>>>> "a" is "a"
>>>> quit()

Jython 2.5:

$ jython2.5
Jython 2.5.2 (Release_2_5_2:7206, Mar 2 2011, 23:12:06) 
[Java HotSpot(TM) 64-Bit Server VM (Apple Inc.)] on java1.6.0_65
Type "help", "copyright", "credits" or "license" for more information.
>>> "a" is "a"
>>> quit()

Jython 2.7 beta 2:

$ jython2.7
Jython 2.7b2 (default:a5bc0032cf79+, Apr 22 2014, 21:20:17) 
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.7.0_55
Type "help", "copyright", "credits" or "license" for more information.
>>> "a" is "a"
>>> 4 is 4
>>> quit()

All tests shown above were run on Mac OS X.
msg8343 (view) Author: Jim Baker (zyasoft) Date: 2014-05-07.19:20:28
So this is a question of whether string literals like "a" should be automatically interned or not.

You can guarantee identity with the intern() builtin function. It works on any Python implementation, regardless of how the string is constructed, literal or not:

intern("a") is intern("a")
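
A minimal sketch of that guarantee, using strings built at run time rather than literals. The try/except shim is an assumption added so the snippet also runs on Python 3, where intern() lives in the sys module; on CPython 2.7 and Jython 2.7 the builtin is used directly. The names left and right are illustrative only.

```python
try:
    intern  # builtin on Python 2 (CPython 2.7, Jython 2.7)
except NameError:
    from sys import intern  # Python 3 moved it into the sys module

# Build two equal strings at run time so that no literal sharing is involved.
left = "".join(["he", "llo"])
right = "".join(["hel", "lo"])

print(left == right)                  # equality does not depend on interning
print(intern(left) is intern(right))  # identity is guaranteed once both are interned
```

Whether left is right holds without the intern() calls is exactly the implementation-dependent part, so the sketch does not rely on it.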

My first inclination is to agree with the submitter and say "implementation dependent" and not to support such automatic interning of string literals, even if it occasionally will accidentally work or has worked in the past.

Consider working with ints:

>>> 42 is 42

For Jython, this is True because of Jython's internal caching of ints in the range [-100, 900]. So on Jython, asking if 60000 is 60000 will return False, for example.

CPython takes a different approach, and it's somewhat related to reference counting and probably arenas. But we can investigate a bit further:

>>> a = 60000; b = 60000; a is b # True


>>> a = 60000
>>> b = 60000
>>> a is b  # False

I wrote a test program to see what CPython's cached range actually is. It is range(-5, 257):

for i in range(-1000, 1000):
    a = i
    b = i * 2 / 2  # Trivial calculation so b is a newly computed int, not another reference to a
    print i, a is b
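
A version-neutral variant of the same probe, as a sketch only. It assumes that parsing the number from text with int(str(i)) yields a run-time value the compiler cannot share, and that any cached values form one contiguous run; the function name cached_int_range is hypothetical.

```python
def cached_int_range(lo=-1000, hi=1000):
    """Return (first, last) of the ints in [lo, hi) that come back as one shared object, or None."""
    shared = []
    for i in range(lo, hi):
        a = int(str(i))  # parsed at run time, so no compile-time constant is involved
        b = int(str(i))
        if a is b:       # same object both times means the implementation cached it
            shared.append(i)
    if not shared:
        return None
    return shared[0], shared[-1]

print(cached_int_range())
```

On CPython the result would be expected to agree with the range(-5, 257) reported above, and on Jython with its own cache bounds, but neither is guaranteed by the language.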
msg8344 (view) Author: Jim Baker (zyasoft) Date: 2014-05-07.20:41:13
Here's one more interesting detail I found re CPython:

> And to make it doubly clear: there is not interning going on here at all. Immutable literals are instead stored as constants with the bytecode. Interning does take place for names used in code, but not for string values created by the program unless specifically interned by the intern() function. –  Martijn Pieters
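
One way to observe the "constants stored with the bytecode" behaviour described in the quote, as a sketch. It assumes CPython, where code objects expose their constants as co_consts; other implementations, Jython included, may not expose the same attribute or may share constants differently.

```python
# Compile both assignments as one unit, as happens for a single interactive line.
source = "a = 60000\nb = 60000\n"
code = compile(source, "<example>", "exec")

# The literal 60000 lives in the code object's constants table.
print(code.co_consts)

namespace = {}
exec(code, namespace)

# Both names were loaded from the same compiled unit, so they can refer to one object.
print(namespace["a"] is namespace["b"])
```

This would also account for the difference seen earlier: entering the two assignments on separate interactive lines compiles them as separate units, each with its own constant.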
Date                 User     Action  Args
2014-05-10 05:45:34  zyasoft  set     status: open -> closed
                                      resolution: invalid
2014-05-07 20:41:13  zyasoft  set     messages: + msg8344
2014-05-07 19:20:29  zyasoft  set     nosy: + zyasoft
                                      messages: + msg8343
2014-05-07 17:11:17  pjac     set     versions: + Jython 2.7
2014-05-07 17:11:05  pjac     create