Created on 2018-11-03.11:57:09 by jeff.allen, last changed 2018-12-01.19:59:39 by jeff.allen.
|msg12167 (view)||Author: Jeff Allen (jeff.allen)||Date: 2018-11-03.11:57:08|
Jython array.array supports the buffer protocol experimentally. However, it only creates byte views of the data, which is incorrect (and has to be read-only) for anything that is not actually byte data. Support for plain byte data is essentially the same as for bytearray, and is reliable. Beyond 'b', I suggest it should claim not to support the protocol. This was first observed in https://github.com/jythontools/jython/pull/119, alongside a problem with unicode, but the solution offered there is not satisfactory for the array.array issue.
|msg12189 (view)||Author: Jeff Allen (jeff.allen)||Date: 2018-11-28.21:21:02|
Actually, this isn't quite the problem I thought. PyArray supports the buffer interface because in CPython you can write:

>>> ab = array.array('b', [65, 66, 67])
>>> list(buffer(ab))
['A', 'B', 'C']
>>> ai = array.array('i', [65, 66, 67])
>>> list(buffer(ai))
['A', '\x00', '\x00', '\x00', 'B', '\x00', '\x00', '\x00', 'C', '\x00', '\x00', '\x00']

(Little-endian machine: YMMV, and Java is big-endian.)

In Jython we have only one (internal) BufferProtocol so the side-effect of supporting buffer() as we should is that array.array also responds to memoryview(). However, if array.array were to respond to memoryview(), as it does in Python 3, it should do so with a properly structured buffer, that is:

>>> mi = memoryview(ai)
>>> mi.itemsize, mi.format, mi.shape
(4, 'i', (3,))

For compatibility with CPython 2, it probably shouldn't try.
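For comparison, here is a small sketch of the two kinds of view in CPython 3, where array.array does export a structured buffer. It is only an illustration of the distinction discussed above, not Jython code; the item size of 'i' is platform-dependent, so the sketch compares against ai.itemsize rather than assuming 4.

```python
import array

ai = array.array('i', [65, 66, 67])

# Structured view: one item per array element, with format and shape.
mi = memoryview(ai)
print(mi.format, mi.shape, mi.itemsize == ai.itemsize)
print(mi.tolist())

# Flat byte view of the same storage, comparable in spirit to what
# buffer(ai) exposes in Python 2 (byte order depends on the machine).
flat = mi.cast('B')
print(flat.format, len(flat) == len(ai) * ai.itemsize)
print(flat.tobytes() == ai.tobytes())
```

The structured view navigates items; the flat view exposes only the underlying bytes, which is all that a buffer()-style consumer needs.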
|msg12190 (view)||Author: Jeff Allen (jeff.allen)||Date: 2018-12-01.19:59:37|
The fact that we have only one (internal) BufferProtocol makes this tricky to solve. We cannot remove it from PyArray, or it ceases to support buffer(). And not just buffer(), since an array.array is accessed as bytes in several other places in the core.

After a bit of deep thinking, I realised that the tool for this is the flags argument passed to getBuffer(). memoryview asks for a structured view (FULL) because it can understand the navigational arrays, while buffer insists on flat bytes (SIMPLE). So we make PyArray.getBuffer reject requests for structured data. However, this implies that wherever we use the buffer API to access bytes (except in PyMemoryview) we need to ask for flat bytes, or it will stop working for array.array. We haven't always done that. (I always found it difficult to decide.)

I've gone for throwing a ClassCastException from getBuffer() in those classes that implement BufferProtocol as far as Java is concerned, but don't *really* implement it. I thought this an odd choice, but it turns out to work quite well. I tried raising a TypeError directly, but one cannot give it the right message at that point, and it is awkward to catch in Java. Clients of the buffer interface already create the right kind of message, and just need to do so as well when a ClassCastException is thrown.

We can use the same approach for disabling PyUnicode.getBuffer. (I have.) However, there are places where the buffer protocol is used to retrieve the characters of a PyUnicode on the (unconscious) assumption that they are bytes, and these have to be reworked to accept PyUnicode consciously and do the right thing. I'm not sure I've found all the right places: the regression tests don't cover it very well.

All this adds up to a sprawling change set, but often the change to any one file is not large. https://hg.python.org/jython/rev/616f1bbd4ec9
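The flag-based approach described above can be modelled roughly as follows. This is a Python sketch of the idea only, not the Jython implementation: ArrayModel, ClassCastError, get_buffer, as_flat_bytes and as_structured_view are all hypothetical names, and the flags are reduced to two plain constants, whereas real buffer flags are bit masks.

```python
# Hypothetical flag values, for illustration only.
SIMPLE = 0  # flat, contiguous bytes
FULL = 1    # structured view (format, shape, strides)


class ClassCastError(Exception):
    """Stand-in for Java's ClassCastException in this model."""


class ArrayModel:
    """Models an object with a single buffer entry point."""

    def __init__(self, raw):
        self._raw = bytes(raw)

    def get_buffer(self, flags):
        # Reject anything other than a request for flat bytes.
        if flags != SIMPLE:
            raise ClassCastError("structured view not supported")
        return self._raw


def as_flat_bytes(obj):
    """A buffer()-like client: asks for flat bytes only."""
    try:
        return obj.get_buffer(SIMPLE)
    except (AttributeError, ClassCastError):
        # The client, not the exporter, composes the user-facing message.
        name = type(obj).__name__
        raise TypeError("%s does not provide bytes" % name) from None


def as_structured_view(obj):
    """A memoryview-like client: asks for a structured view."""
    try:
        return obj.get_buffer(FULL)
    except (AttributeError, ClassCastError):
        name = type(obj).__name__
        raise TypeError("%s does not provide a structured view" % name) from None


a = ArrayModel(b"ABC")
print(as_flat_bytes(a))
try:
    as_structured_view(a)
except TypeError as e:
    print("TypeError:", e)
```

In this model the exporter signals refusal with a low-level exception and each client translates it into a TypeError with a message suited to its own context, which mirrors the division of labour described in the message above.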
|2018-12-01 19:59:39||jeff.allen||set||status: open -> pending|
messages: + msg12190
title: Over-stretched array.array support for buffer protocol -> Restrict array.array support for buffer protocol
|2018-11-28 21:21:02||jeff.allen||set||messages: + msg12189|