Issue1189

classification
Title: MD5 hash is incorrectly calculated when the string contains non-Latin chars and the Python md5 lib is used
Type:
Severity: major
Components: Core
Versions: Deferred
Milestone:
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: pjenvey
Nosy List: noelob, pjenvey, zyasoft
Priority: low
Keywords:

Created on 2008-12-03.15:54:39 by noelob, last changed 2009-07-07.03:42:39 by pjenvey.

Messages
msg3869 Author: Noel O'Brien (noelob) Date: 2008-12-03.15:54:38
When using Python's md5 library to create a hex digest of a string
containing non-Latin characters, an incorrect hash is returned. The
following code shows the difference:

# encoding: utf-8

from java.math import BigInteger
from java.security import MessageDigest
from java.lang import String
import md5

a = u"Gráin amháiñ ©ðƒ©óíßðƒóíıßðƒ‚íó©ı"
#a = "A lovely string to encrypt!"  # ASCII-only alternative for comparison
b = String(a)

digest = MessageDigest.getInstance("MD5")
# getBytes() with no argument encodes with the JVM's platform default charset
data = b.getBytes()
digest.reset()
# BigInteger.toString(16) drops leading zeros, so pad back to 32 hex digits
java_hash = BigInteger(1, digest.digest(data)).toString(16).zfill(32)
print "Hash using Java:\t", java_hash

# md5.new() is handed the unicode object directly
print "Hash in Python:\t\t", md5.new(a).hexdigest()
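One way to see why the two printed hashes can disagree: MD5 is defined over bytes, not characters, so the digest depends entirely on how the string is turned into bytes. `String.getBytes()` with no argument uses the JVM's platform default charset, and `BigInteger(1, ...).toString(16)` also omits leading zero digits. The sketch below is a plain CPython illustration, not Jython's internals: it uses `hashlib` instead of the `md5` module, picks UTF-8 and UTF-16-LE purely as two example encodings, and `java_style_hex` is a hypothetical helper that imitates the BigInteger hex conversion.

```python
# -*- coding: utf-8 -*-
import binascii
import hashlib

text = u"Gráin amháiñ ©ðƒ©óíßðƒóíıßðƒ‚íó©ı"  # sample string from the report

def java_style_hex(digest):
    """Hypothetical helper mimicking BigInteger(1, digest).toString(16).

    Like the Java call, it drops leading zero hex digits.
    """
    return "%x" % int(binascii.hexlify(digest), 16)

# MD5 hashes bytes, so each encoding of the same text yields its own digest.
for encoding in ("utf-8", "utf-16-le"):
    digest = hashlib.md5(text.encode(encoding)).digest()
    padded = java_style_hex(digest).zfill(32)  # restore the fixed 32-digit width
    print("%-10s %s" % (encoding, padded))
```

Only the encoding changes between the two iterations, which is enough to produce two unrelated digests for identical text.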
msg3873 Author: Jim Baker (zyasoft) Date: 2008-12-05.18:23:17
CPython 2.5 takes str (byte strings), automatically coercing a unicode
string by encoding it to ASCII as necessary.

So this is not even legal in CPython, since it throws
UnicodeEncodeError: 'ascii' codec... Nonetheless, we need more
investigation.
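The coercion described here can be made explicit. A minimal sketch, assuming the default encoding is ASCII (the usual CPython 2 setting); `coerce_like_cpython2` is a hypothetical helper written for illustration, since CPython 2 performs this step internally rather than through any public function. It uses `hashlib` so that it also runs on current CPython.

```python
# -*- coding: utf-8 -*-
import hashlib

def coerce_like_cpython2(value):
    """Hypothetical helper: mimic CPython 2's implicit unicode -> str coercion.

    Byte strings pass through unchanged; text is encoded with ASCII,
    the usual default encoding, which fails on any non-ASCII character.
    """
    if isinstance(value, bytes):
        return value
    return value.encode("ascii")  # raises UnicodeEncodeError on non-ASCII text

ascii_text = u"A lovely string to encrypt!"
non_latin_text = u"Gráin amháiñ ©ðƒ©óíßðƒóíıßðƒ‚íó©ı"

# ASCII-only input coerces cleanly and can be hashed.
print(hashlib.md5(coerce_like_cpython2(ascii_text)).hexdigest())

# Non-ASCII input fails at the coercion step, before any hashing happens.
try:
    coerce_like_cpython2(non_latin_text)
except UnicodeEncodeError as exc:
    print("UnicodeEncodeError: %s" % exc)
```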
msg3874 Author: Jim Baker (zyasoft) Date: 2008-12-05.18:25:15
To correct some ambiguity in my last message: "CPython's md5" takes...
msg4886 Author: Philip Jenvey (pjenvey) Date: 2009-07-07.03:42:39
Jython now behaves like CPython as of r6517. You'll need to convert
input that is not encodable with the default encoding (i.e. non-ASCII
unicode) to raw strs first, just as you convert a String to bytes in
Java, or a UnicodeEncodeError will be raised.
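Following that advice, a caller encodes text explicitly and hashes the resulting bytes. A minimal sketch, assuming UTF-8 is the encoding both sides agree on; `md5_hexdigest` is a hypothetical helper name, and the code uses `hashlib` rather than the older `md5` module. On the Java side the matching call would be `getBytes("UTF-8")` instead of the no-argument `getBytes()`.

```python
# -*- coding: utf-8 -*-
import hashlib

def md5_hexdigest(value, encoding="utf-8"):
    """Hypothetical helper: hash text after encoding it explicitly.

    Byte strings are hashed as-is; text is first encoded with `encoding`,
    so the result no longer depends on an implicit default encoding.
    """
    if not isinstance(value, bytes):
        value = value.encode(encoding)
    return hashlib.md5(value).hexdigest()

text = u"Gráin amháiñ ©ðƒ©óíßðƒóíıßðƒ‚íó©ı"
print(md5_hexdigest(text))                  # text encoded to UTF-8, then hashed
print(md5_hexdigest(text.encode("utf-8")))  # same bytes, same digest
```

Choosing the encoding explicitly is the point of the design: once both sides hash the same byte sequence, MD5 returns the same digest regardless of platform defaults.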
History
Date                 User          Action  Args
2009-07-07 03:42:39  pjenvey       set     status: open -> closed
                                           resolution: fixed
                                           messages: + msg4886
2009-06-21 23:03:55  pjenvey       set     assignee: zyasoft -> pjenvey
                                           nosy: + pjenvey
2009-03-14 02:39:12  fwierzbicki   set     priority: low
2008-12-05 18:25:15  zyasoft       set     messages: + msg3874
2008-12-05 18:23:17  zyasoft       set     messages: + msg3873
2008-12-05 18:11:43  zyasoft       set     assignee: zyasoft
                                           nosy: + zyasoft
2008-12-03 15:54:39  noelob        create