Issue1854873

classification
Title: Patch for [ 1841644 ] Mmap module is not in jython
Type: rfe Severity: normal
Components: Library Versions: Jython 2.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: cgroves, fwierzbicki, mehendran, pjenvey, zyasoft
Priority: normal Keywords: patch

Created on 2007-12-20.13:36:40 by mehendran, last changed 2014-06-19.06:07:22 by zyasoft.

Files
File name Uploaded Description Edit Remove
mmap_module_v1.diff mehendran, 2007-12-20.13:36:40 mmap module
mmap_doc.txt mehendran, 2007-12-20.13:42:19 implementation doc
Messages
msg2971 (view) Author: Mehendran (mehendran) Date: 2007-12-20.13:36:40
I have uploaded the patch for mmap module. 
Please send your valuable comments.
Thanks,
Mehendran
msg2972 (view) Author: Mehendran (mehendran) Date: 2007-12-20.13:42:20
I have attached a doc that explains about issues, differences
and implementation details of mmap module.
Suggestions and Comments are highly expected. 
File Added: mmap_doc.txt
msg2973 (view) Author: Charlie Groves (cgroves) Date: 2007-12-24.06:09:36
The javadocs for FileChannel.map say that the mapped buffer isn't tied to the channel that created it, so that even if the channel is closed, the buffer is still available.  Why not just require that the fileno that comes in be an instance of FileChannel, since that's what we return for fileno in instances of files?  Changing the type of object passed into mmap.mmap makes this module different enough from the CPython version that sharing a name with it is probably more harmful than good.  Definitely any CPython code that uses mmap will have to be special cased to work with Jython.

I don't understand why PyMmap needs to extend PyString.  The mmap type in CPython is not an instance of str.  However in Jython, extending PyString and keeping a String copy of the mmapped data breaks the general use case of mmapping.  From my understanding, when you mmap a file you're expecting the OS to map it into memory as you're reading it, and then to page it out as parts of it are no longer being read.  Since this patch reads the whole file into a Java String, it's actually far worse than reading the file into a single byte array and holding onto it: it'll use twice the memory of the file for chars instead of bytes.  This difference is going to make mmap useless for anyone mapping a large file.  Where all is it expected to be an actual instance of str?  I see re.search checking that the pattern is a str, but not the object to be searched.

What do you mean "a doubt about implementing the buffer interface"?  I'm not sure what the buffer interface is.

Your check for os.name starting with "Lin" as a check for unix is a pretty bad idea since there are several unix variants out there that have a JVM that don't start with "Lin".  Is there any reason to actually support an os distinction on Jython's version of mmap?  Since it's the same across all platforms, why not just use the access keyword and raise errors for flag, prot and tagname?  Platform independent mmapping has to use access already.

I'd really like to avoid using internal Sun packages in Jython if at all possible: if we use them, we can't run on all JVMs.  Is there any reason for the close method to actually "close" the mmaped buffer?  Why not just null out the reference to it and let the garbage collector get it when it will?  Then you can just check for the buffer being null on any operation on the mmap type and raise an exception if it happens.

These are the major issues I noticed in a quick check of the patch.  There may be smaller issues turned up by a closer examination, but I figured it was a good idea to get these out of the way before doing that.
msg2974 (view) Author: Mehendran (mehendran) Date: 2008-01-31.15:06:35
For the point no 3:

In jython, the function re.search() requires both the pattern and the object to be searched to be a string instance.In python, the object to be searched can be either a string or a buffer object. 
Python provides support for "buffer interface"(please see http://docs.python.org/api/bufferObjects.html) which jython does not.Thus in python, array & mmap objects(which implement the buffer interface) along with strings can be used as arguments to functions like file.write(),re.search() etc. which accept buffer objects/strings as arguments.In jython, due to lack of buffer support, functions like file.write() and re.search() accept only a string instance as argument.As per Python's implementation of mmap, mmap is required to expose the buffer interface so that it can be used as argument to re.search()/file.write()....and this is precisely what is tested in the test-file.
Instead of deriving mmap from PyString, the code for re.search() can be modified to deal with mmap object as a special case(for the sole purpose of making the tests pass!).Another way to do it would be to subclass PyString,but instead of maintaining a string that is a duplicate of the mmap contents,the string would be simply maintaind as an empty string.However,I guess it will be a better idea to skip the test where a mmap instance is checked to see whether it has buffer-like behaviour.Even Jython's array implementation has no buffer support.   
Is there any specific reason why jython does not have buffer support?

Please clarify...
msg2975 (view) Author: Charlie Groves (cgroves) Date: 2008-01-31.17:01:49
Ahh, the buffer interface doesn't exist because no one has needed it yet.  If re.search and file.write can be made to use it, I'd suggest adding it to Jython and having mmap use it.  You're right that special-casing re.search just to support mmap seems like a bad idea.  Oddly, I ran into the buffer type being used directly to provide a sliced view of a str, so adding it to Jython seems like a good idea especially if it can be added to array.

That said, I seem to recollect removing extends PyString from PyMmmap and not having a problem with that when initially evaluating this patch, but it was a while ago.  You might try that first.
msg7368 (view) Author: Frank Wierzbicki (fwierzbicki) Date: 2012-08-10.18:28:51
Hi Mehendran - I'm sorry that this patch has lingered so long - the way we implement Python classes has changed considerably and 2.7 is actually growing a buffer API. Are you still around and interested in pushing this forward? Sorry again for the way this patch has been neglected.
msg8720 (view) Author: Jim Baker (zyasoft) Date: 2014-06-19.06:07:22
Could revisit given our support for the buffer protocol with Jeff Allen's work.

Conceptually this seems straightforward. The actual implementation will have to be significantly different than the proposed patch, per what Charlie already mentioned.
History
Date User Action Args
2014-06-19 06:07:22zyasoftsetnosy: + zyasoft
messages: + msg8720
2013-02-25 18:50:18fwierzbickisetversions: + Jython 2.5, - 2.5.1
2012-08-10 18:28:51fwierzbickisetnosy: + fwierzbicki
messages: + msg7368
2010-04-20 21:31:50pjenveysetnosy: + pjenvey
2009-03-14 02:57:02fwierzbickisettype: rfe
versions: + 2.5.1
2008-12-15 16:01:36fwierzbickisetcomponents: + Library, - None
2007-12-20 13:36:40mehendrancreate