Issue1722306

classification
Title: OverflowError in UDP Socket Implementation
Type:
Severity: normal
Components: Library
Versions:
Milestone:
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: amak
Nosy List: additv, amak, cgroves, pekka.klarck
Priority: normal
Keywords:

Created on 2007-05-20.20:24:00 by additv, last changed 2007-06-18.06:28:13 by cgroves.

Messages
msg1611 (view) Author: additv (additv) Date: 2007-05-20.20:24:00
When using UDP sockets in Jython, calling sendto (socket.py:356) will raise "OverflowError: value too large for byte"
The problematic line is 361:
bytes = jarray.array(map(ord, data), 'b')

This seems to be due to a mismatch between the result of a call to "ord" and the Java definition of a byte.
ord returns values between 0 and 255, while a Java byte holds a value between -128 and 127.

Thus, if a character has a code greater than 127, this error occurs.
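
The range mismatch described above can be reproduced outside Jython. The following is a minimal sketch that assumes CPython's standard array module as a stand-in for jarray: typecode 'b' is also a signed 8-bit integer there. The helper names to_signed_bytes_naive and try_convert are hypothetical and are not part of socket.py.

```python
import array


def to_signed_bytes_naive(data):
    """Mimic jarray.array(map(ord, data), 'b') with CPython's array module.

    Assumption: array typecode 'b' (signed char, -128..127) behaves like
    a Java byte array for the purpose of this range check.
    """
    return array.array('b', [ord(ch) for ch in data])


def try_convert(data):
    """Return 'ok' if the naive conversion succeeds, else the error name."""
    try:
        to_signed_bytes_naive(data)
        return "ok"
    except OverflowError:
        return "OverflowError"


print(try_convert("abc"))    # every code is below 128
print(try_convert("\x80"))   # code 128 does not fit in a signed byte
```

Any payload that contains a character with a code of 128 or more takes the OverflowError branch, which matches the behaviour reported for sendto.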
msg1612 (view) Author: Pekka Klärck (pekka.klarck) Date: 2007-05-21.07:13:14
Assuming that the description of the problem in the summary is correct, it should be pretty simple to fix this. All that is needed is changing the mentioned line to the following.

bytes = jarray.array(map(lambda d : ord(d)-128, data), 'b') 

Could you try this with your original data? If it works this could perhaps be fixed already for 2.2.
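
As a point of comparison, here is a small sketch of two ways to squeeze 0..255 into the signed range -128..127. Subtracting 128 keeps every value in range but changes which bit pattern each character maps to, while a two's-complement reinterpretation keeps the bit pattern. The function names are hypothetical and used only for illustration.

```python
def shift_by_128(value):
    """The mapping from the suggested line: ord(d) - 128."""
    return value - 128


def twos_complement(value):
    """Reinterpret an unsigned 0..255 value as a signed 8-bit value."""
    return value - 256 if value > 127 else value


def back_to_unsigned(signed):
    """What a receiver sees when it reads the signed byte as unsigned."""
    return signed & 0xFF


for code in (65, 128, 200):
    print(code,
          back_to_unsigned(shift_by_128(code)),
          back_to_unsigned(twos_complement(code)))
```

With the shift, the letter 'A' (65) would go out on the wire as 193, so the receiver would see different data even though no exception is raised. The two's-complement mapping round-trips all 256 values unchanged.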
msg1613 (view) Author: additv (additv) Date: 2007-05-21.16:18:47
For some reason the recommended change gives me errors in the data.  
I have found that adding:

import java.lang.String

at the beginning of the file, and changing the line to:

bytes = java.lang.String(data).getBytes()

seems to fix the problem however.
msg1614 (view) Author: additv (additv) Date: 2007-05-21.16:59:47
Update: Using java.lang.String mangles characters > 127

if you run:
print java.lang.String('\x80')
it returns 63

running:
ord('\x80')
returns 128 (as it should)

running
print java.lang.String('\x81')


It looks like using:
bytes = java.lang.String(data).getBytes("ISO-8859-1")
works, though.
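
The difference between the two getBytes calls can be illustrated with CPython's str.encode. This is only a sketch: the platform default charset in the report is not stated, so ASCII with replacement is an assumption chosen here because it turns unmappable characters into '?' (code 63). The helper names are hypothetical.

```python
def encode_lossy_ascii(text):
    """Stand-in for getBytes() with a default charset that cannot
    represent characters above 127 (assumption: ASCII, replace with '?')."""
    return text.encode("ascii", errors="replace")


def encode_latin1(text):
    """Stand-in for getBytes("ISO-8859-1"): code point N becomes byte N."""
    return text.encode("iso-8859-1")


print(list(encode_lossy_ascii("\x80")))  # [63], the code of '?'
print(list(encode_latin1("\x80")))       # [128]
```

ISO-8859-1 maps each code point from 0 to 255 to the single byte with the same value, which is why naming it explicitly avoids the mangling.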

msg1615 (view) Author: additv (additv) Date: 2007-05-21.17:10:00
Here's the patch:

13a14
> import java.lang.String
361c362
<         bytes = jarray.array(map(ord, data), 'b')
---
>         bytes = java.lang.String(data).getBytes("ISO-8859-1")
msg1616 (view) Author: Charlie Groves (cgroves) Date: 2007-05-21.17:20:48
We're replacing socket with a new implementation based on java.nio in 2.2rc1.  It'd be quite helpful if you tested your code with the new version.  You can drop it in as explained in http://wiki.python.org/jython/NewSocketModule or just wait for rc1 to come out.  It should be in a week or two.
msg1617 (view) Author: additv (additv) Date: 2007-05-21.17:42:57
The same bug is in the new socket code, and the same fix can fix it.  Here's the patch for the updated socket code:

28a29
> import java.lang.String
656c657
<         bytes = jarray.array(map(ord, data), 'b')
---
>         bytes = java.lang.String(data).getBytes("ISO-8859-1")
msg1618 (view) Author: Alan Kennedy (amak) Date: 2007-05-22.17:47:37
The error report is correct.

The proposed fix is also correct; java.lang.String(data).getBytes('iso-8859-1') should always return the correct bytes, as long as 0 <= ord(ch) <= 255 for every ch in data.

Fix applied at revision 3237. Also, extra unit test case added to check for eight-bit safety.

Thanks for the report.
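
The unit test added in that revision is not shown in this thread. As a rough idea of what an eight-bit safety check can look like, here is a sketch in plain CPython that assumes the property under test is "every value 0..255 survives encoding and decoding unchanged". The function is_eight_bit_safe is hypothetical.

```python
def is_eight_bit_safe(encode, decode):
    """Check that all 256 character codes round-trip as single bytes."""
    original = "".join(chr(i) for i in range(256))
    encoded = encode(original)
    # Each character must become exactly one byte with the same value,
    # and decoding must give back the original text.
    return list(encoded) == list(range(256)) and decode(encoded) == original


latin1_ok = is_eight_bit_safe(
    lambda s: s.encode("iso-8859-1"),
    lambda b: b.decode("iso-8859-1"),
)
ascii_ok = is_eight_bit_safe(
    lambda s: s.encode("ascii", errors="replace"),
    lambda b: b.decode("ascii"),
)
print(latin1_ok, ascii_ok)
```

ISO-8859-1 passes the check, while a lossy encoding such as ASCII with replacement fails it.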
msg1619 (view) Author: Charlie Groves (cgroves) Date: 2007-05-30.05:16:14
We need to actually get the socket stuff into trunk, then we can close this.
msg1620 (view) Author: Charlie Groves (cgroves) Date: 2007-06-18.06:28:13
Pulled over to trunk in r3256.
History
Date User Action Args
2007-05-20 20:24:00 additv create