Issue2164
Created on 2014-06-10.20:57:28 by zyasoft, last changed 2018-03-16.22:54:51 by jeff.allen.
msg8624 |
Author: Jim Baker (zyasoft) |
Date: 2014-06-10.20:57:27 |
|
Difference between CPython and Jython seen with this example:
# -*- coding: utf-8 -*-
import codecs
data = memoryview(b"中文")
text, decoded_bytes = codecs.utf_8_decode(data)
assert text == u"中文"
assert type(text) is unicode
assert decoded_bytes == 6
This works fine on CPython. On Jython, it fails with TypeError: utf_8_decode(): 1st arg can't be coerced to String
Current workaround is to use tobytes on the memoryview object:
text, decoded_bytes = codecs.utf_8_decode(data.tobytes())
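Until the codecs accept buffers directly, the workaround can be wrapped once so call sites stay the same. This is only a sketch: the helper name utf_8_decode_compat is made up for illustration and is not part of Jython or CPython.

```python
import codecs

def utf_8_decode_compat(data):
    """Hypothetical helper: decode UTF-8 from bytes or a memoryview."""
    # Copy a memoryview out to a plain byte string first, which is the
    # workaround described above; other inputs are passed through untouched.
    if isinstance(data, memoryview):
        data = data.tobytes()
    return codecs.utf_8_decode(data)

# The same data as in the report, written with escapes so the snippet
# does not depend on the source file encoding.
raw = u"\u4e2d\u6587".encode("utf-8")
text, decoded_bytes = utf_8_decode_compat(memoryview(raw))
```

The cost is one extra copy of the data per call, which is what a proper fix in _codecs would avoid.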
|
msg8625 |
Author: Jim Baker (zyasoft) |
Date: 2014-06-10.20:57:38 |
|
Target beta 4
|
msg8633 |
Author: Jeff Allen (jeff.allen) |
Date: 2014-06-12.19:20:08 |
|
I'd happily take this on unless someone is itching to get to know the buffer interface better.
|
msg8638 |
Author: Santoso Wijaya (santa4nt) |
Date: 2014-06-13.18:04:19 |
|
Sounds interesting to me. Any tips?
|
msg8639 |
Author: Jeff Allen (jeff.allen) |
Date: 2014-06-13.20:38:07 |
|
I decided step 1 was to make PyBuffer extend AutoCloseable, because this work by Indra Talip would have been neater:
http://hg.python.org/jython/rev/355bb70327e0
Been meaning to since Java 7. So I've done that (testing now, maybe push tonight). You can take over from there if you like.
This article is about the buffer protocol: https://wiki.python.org/jython/BufferProtocol , but it needs to be updated with the change I just made.
If you look into how some choice codecs work, at the bottom they all seem to depend on entry points in modules/_codecs.java, so it's those that need changing. For a start, accept a PyObject obytes argument, then something like:
if (obytes instanceof BufferProtocol) {
    try (PyBuffer bytes = ((BufferProtocol)obytes).getBuffer(PyBUF.SIMPLE)) {
        ...
    }
} else {
    throw Py.TypeError("must be string or buffer, not " ... )
}
You should then find the existing code bytes.charAt() still works, or it might be better to say this stuff really is bytes now. The soft option is to ask for it as a String again, but IMO that's perpetuating a misdemeanor.
My worry was that a lot of helper methods, and maybe some clients of these methods, would have to change signature, so it would end up really quite extensive. Maybe they should anyway.
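For anyone more at home in Python than Java, the shape of that check can be sketched as below. It is an analogue only, not the Jython implementation: decode_from_buffer is a hypothetical name, and memoryview() stands in for the BufferProtocol test and getBuffer() call.

```python
import codecs

def decode_from_buffer(obytes, name="utf-8"):
    """Hypothetical Python analogue of the proposed entry-point check."""
    try:
        # Succeeds only for objects that export a buffer.
        view = memoryview(obytes)
    except TypeError:
        raise TypeError("must be string or buffer, not %s"
                        % type(obytes).__name__)
    try:
        # Hand the decoder real bytes rather than a String.
        return codecs.getdecoder(name)(view.tobytes())
    finally:
        # Release the buffer where the runtime supports it, mirroring
        # what try-with-resources does for an AutoCloseable PyBuffer.
        release = getattr(view, "release", None)
        if release is not None:
            release()
```

Calling decode_from_buffer(bytearray(b"abc")) gives the decoded text and the byte count, while passing an int raises the TypeError.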
I couldn't find a test that exposes this problem, so I was going to add to test_codecs_jy.py, something like:
def round_trip(self, u, name):
    s = u.encode(name)
    dec = codecs.getdecoder(name)
    for B in (buffer, memoryview, bytearray):
        self.assertEqual(u, dec(B(s))[0])
(I think that's correct.) Then call it with a variety of unicode strings and codec names.
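Filled out into something that runs on its own, the test might look like the sketch below. The class name, the sample strings and the codec list are placeholders chosen for illustration, not what would necessarily go into test_codecs_jy.py. The buffer type exists only on Python 2 and Jython 2.7, so it is looked up defensively.

```python
import codecs
import unittest

# Buffer-like wrappers to try around the encoded bytes.
BUFFER_TYPES = [memoryview, bytearray]
try:
    BUFFER_TYPES.append(buffer)  # Python 2 / Jython 2.7 only
except NameError:
    pass

class BufferDecodeTest(unittest.TestCase):  # hypothetical test class

    def round_trip(self, u, name):
        # Encode, wrap the bytes in each buffer type, decode, compare.
        s = u.encode(name)
        dec = codecs.getdecoder(name)
        for B in BUFFER_TYPES:
            self.assertEqual(u, dec(B(s))[0])

    def test_round_trip(self):
        # Sample strings and codecs are arbitrary choices for the sketch.
        for u in (u"", u"abc", u"\u4e2d\u6587"):
            for name in ("utf-8", "utf-16", "utf-32"):
                self.round_trip(u, name)
```

Without the fix, the memoryview case is the one expected to fail on Jython with the TypeError from the original report.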
|
msg8642 |
Author: Jeff Allen (jeff.allen) |
Date: 2014-06-14.14:40:54 |
|
Ok, I committed the helpful change to PyBuffer and made the Wiki change.
|
msg8688 |
Author: Jim Baker (zyasoft) |
Date: 2014-06-19.00:34:58 |
|
Jeff, thanks, sounds like a reasonable set of changes that we need to propagate through the codecs implementation.
|
msg9006 |
Author: Jim Baker (zyasoft) |
Date: 2014-09-18.02:33:02 |
|
Target beta 4
|
msg11812 |
Author: Jeff Allen (jeff.allen) |
Date: 2018-03-16.22:54:50 |
|
Guess I'll take it on then.
|
|
Date | User | Action | Args
2019-07-21 07:25:12 | jeff.allen | link | issue2788 dependencies
2018-03-16 22:54:51 | jeff.allen | set | priority: normal; assignee: jeff.allen; messages: + msg11812
2014-09-18 02:33:02 | zyasoft | set | resolution: remind; messages: + msg9006
2014-06-19 00:34:58 | zyasoft | set | messages: + msg8688
2014-06-14 14:40:54 | jeff.allen | set | messages: + msg8642
2014-06-13 20:38:08 | jeff.allen | set | messages: + msg8639
2014-06-13 18:04:19 | santa4nt | set | messages: + msg8638
2014-06-12 19:20:08 | jeff.allen | set | nosy: + jeff.allen; messages: + msg8633
2014-06-11 01:51:44 | santa4nt | set | type: behaviour
2014-06-11 01:51:37 | santa4nt | set | nosy: + santa4nt; components: + Core; versions: + Jython 2.7
2014-06-10 20:57:39 | zyasoft | set | messages: + msg8625
2014-06-10 20:57:28 | zyasoft | create |