Issue2164

classification

Title:	codecs do not accept memoryview objects for decoding
Type:	behaviour	Severity:	normal
Components:	Core	Versions:	Jython 2.7
		Milestone:

process

Status:	open	Resolution:	remind
Dependencies:		Superseder:
Assigned To:	jeff.allen	Nosy List:	jeff.allen, santa4nt, zyasoft
Priority:	normal	Keywords:

Created on 2014-06-10.20:57:28 by zyasoft, last changed 2018-03-16.22:54:51 by jeff.allen.

Messages
msg8624	Author: Jim Baker (zyasoft)	Date: 2014-06-10.20:57:27
Difference between CPython and Jython seen with this example:

    # -- coding: utf-8 --
    import codecs
    data = memoryview(b"中文")
    text, decoded_bytes = codecs.utf_8_decode(data)
    assert text == u"中文"
    assert type(text) is unicode
    assert decoded_bytes == 6

This works fine on CPython. On Jython, it fails with:

    TypeError: utf_8_decode(): 1st arg can't be coerced to String

Current workaround is to use tobytes on the memoryview object:

    text, decoded_bytes = codecs.utf_8_decode(data.tobytes())
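Until the codecs accept buffer objects directly, the tobytes workaround can be wrapped in a small helper so that call sites do not need to care which kind of object they hold. The following is only a sketch: the name utf8_decode_compat is hypothetical and not part of Jython or CPython, and it assumes the input is either a byte string or a memoryview.

```python
import codecs


def utf8_decode_compat(data):
    """Decode UTF-8 input, copying a memoryview to a byte string first.

    Hypothetical helper illustrating the workaround from this report;
    it is not part of the codecs module.
    """
    if isinstance(data, memoryview):
        # Copying avoids passing the memoryview itself to the codec.
        data = data.tobytes()
    return codecs.utf_8_decode(data)


# Same data as in the report, written with escapes to avoid
# depending on the source file encoding.
payload = u"\u4e2d\u6587".encode("utf-8")
text, decoded_bytes = utf8_decode_compat(memoryview(payload))
```

The copy made by tobytes costs memory proportional to the input, which is why it is a workaround rather than a fix.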
msg8625	Author: Jim Baker (zyasoft)	Date: 2014-06-10.20:57:38
Target beta 4
msg8633	Author: Jeff Allen (jeff.allen)	Date: 2014-06-12.19:20:08
I'd happily take this on unless someone is itching to get to know the buffer interface better.
msg8638	Author: Santoso Wijaya (santa4nt)	Date: 2014-06-13.18:04:19
Sounds interesting to me. Any tips?
msg8639	Author: Jeff Allen (jeff.allen)	Date: 2014-06-13.20:38:07
I decided step 1 was to make PyBuffer extend AutoCloseable, because this work by Indra Talip would have been neater: http://hg.python.org/jython/rev/355bb70327e0
I have been meaning to do this since Java 7. So I've done that (testing now, maybe push tonight). You can take over from there if you like.

This article is about the buffer protocol: https://wiki.python.org/jython/BufferProtocol , but it needs to be updated with the change I just made.

If you look into how some choice codecs work, at the bottom they all seem to depend on entry points in modules/_codecs.java, so it's those that need changing. For a start, accept a PyObject obytes argument, then something like:

    if (obytes instanceof BufferProtocol) {
        try (PyBuffer bytes = ((BufferProtocol)obytes).getBuffer(PyBUF.SIMPLE)) {
            ...
        }
    } else {
        throw Py.TypeError("must be string or buffer, not " ... )
    }

You should then find the existing code bytes.charAt() still works, or it might be better to say this stuff really is bytes now. The soft option is to ask for it as a String again, but IMO that's perpetuating a misdemeanor.

My worry was that a lot of helper methods, and maybe some clients of these methods, would have to change signature, so it would end up really quite extensive. Maybe they should anyway.

I couldn't find a test that exposes this problem, so I was going to add to test_codecs_jy.py, something like:

    def round_trip(self, u, name):
        s = u.encode(name)
        dec = codecs.getdecoder(name)
        for B in (buffer, memoryview, bytearray):
            self.assertEqual(u, dec(B(s))[0])

(I think that's correct.) Then call it with a variety of unicode strings and codec names.
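The round-trip test proposed above could be fleshed out roughly as follows. This is a sketch under assumptions, not the test that was committed: the class name, the test method name, and the choice of sample strings and codec names are all illustrative. The buffer type only exists on Python 2 (and Jython 2.7), so it is included only when available.

```python
import codecs
import unittest

# buffer exists on Python 2 / Jython 2.7 only; fall back without it elsewhere.
try:
    BUFFER_TYPES = (buffer, memoryview, bytearray)
except NameError:
    BUFFER_TYPES = (memoryview, bytearray)


class DecodeBufferTest(unittest.TestCase):
    """Hypothetical test case: decoders should accept buffer-like objects."""

    def round_trip(self, u, name):
        s = u.encode(name)
        dec = codecs.getdecoder(name)
        for B in BUFFER_TYPES:
            # Wrap the encoded bytes and check they decode to the original.
            self.assertEqual(u, dec(B(s))[0])

    def test_round_trip(self):
        # Illustrative selection of codecs and strings, not an exhaustive list.
        for name in ("utf-8", "utf-16", "latin-1"):
            for u in (u"", u"abc", u"caf\u00e9"):
                self.round_trip(u, name)


# Run the case programmatically so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DecodeBufferTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

On an implementation with the behaviour reported in this issue, the memoryview case would be expected to raise the TypeError quoted in the first message.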
msg8642	Author: Jeff Allen (jeff.allen)	Date: 2014-06-14.14:40:54
Ok, I committed the helpful change to PyBuffer and made the Wiki change.
msg8688	Author: Jim Baker (zyasoft)	Date: 2014-06-19.00:34:58
Jeff, thanks, sounds like a reasonable set of changes that we need to propagate through the codecs implementation.
msg9006	Author: Jim Baker (zyasoft)	Date: 2014-09-18.02:33:02
Target beta 4
msg11812	Author: Jeff Allen (jeff.allen)	Date: 2018-03-16.22:54:50
Guess I'll take it on then.

History
Date	User	Action	Args
2019-07-21 07:25:12	jeff.allen	link	issue2788 dependencies
2018-03-16 22:54:51	jeff.allen	set	priority: normal assignee: jeff.allen messages: + msg11812
2014-09-18 02:33:02	zyasoft	set	resolution: remind messages: + msg9006
2014-06-19 00:34:58	zyasoft	set	messages: + msg8688
2014-06-14 14:40:54	jeff.allen	set	messages: + msg8642
2014-06-13 20:38:08	jeff.allen	set	messages: + msg8639
2014-06-13 18:04:19	santa4nt	set	messages: + msg8638
2014-06-12 19:20:08	jeff.allen	set	nosy: + jeff.allen messages: + msg8633
2014-06-11 01:51:44	santa4nt	set	type: behaviour
2014-06-11 01:51:37	santa4nt	set	nosy: + santa4nt components: + Core versions: + Jython 2.7
2014-06-10 20:57:39	zyasoft	set	messages: + msg8625
2014-06-10 20:57:28	zyasoft	create