Issue2699

classification
Title: Implement PyFile_IncUseCount/PyFile_DecUseCount (?)
Type: rfe Severity: normal
Components: Versions: Jython 2.7
Milestone: Jython 2.7.2
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: stefan.richthofer Nosy List: jeff.allen, stefan.richthofer, zyasoft
Priority: normal Keywords:

Created on 2018-08-03.22:56:26 by stefan.richthofer, last changed 2018-08-05.10:56:04 by stefan.richthofer.

Messages
msg12064 (view) Author: Stefan Richthofer (stefan.richthofer) Date: 2018-08-03.22:56:24
Via FileIO.__int__() Jython offers access to the file handle which a user could use in C code (e.g. by JNI or JNR or so independently from JyNI) to access the file directly. However according to PyFile_IncUseCount doc (https://docs.python.org/2/c-api/file.html#c.PyFile_IncUseCount) one would have to call PyFile_IncUseCount/PyFile_DecUseCount API around such an access. Since this API is missing in Jython a user currently does not have a chance to do it right. "Who would directly use the file handle to access the file?" one might ask, which I guess is why it was never implemented. Now there is an answer: C-extensions!

The actual reason I come up with is that we are currently implementing PyFile API in JyNI, see https://github.com/Stewori/JyNI/issues/11#issuecomment-410309767. FileIO.__int__ giving public access to the handle also in Jython made me realize that this API also has a use case in Jython itself, an exotic one though. Not so exotic for C extensions via JyNI.

How to implement it in FileIO is straight forward. A use counter would be maintained (AtomicInt I suppose) and close() would fail if called while the counter is >0. Main question is whether the API should be actually in PyFile rather than or in addition to FileIO. In that case it would simply forward the counter calls to FileIO if the underlying IO object (private TextIOBase file;) is actually backed by a FileIO.

If we can agree this should be added I would like to get it into 2.7.2, so setting the milestone accordingly. Otherwise it would block the next JyNI release potentially for a long time.

Without this implementation JyNI would have to monkeypatch FileIO (e.g. wrap it into a JyNIFileIO). While possible this would be a hassle.
Please tell me if there is already some way to prevent FileIO from closing (in a concurrent access setting) or if I overlook some simpler solution. I looked into FileChannelAPI but it seems that a close is never prevented. Instead, the other accessing threads would be interrupted with a AsynchronousCloseException, see https://docs.oracle.com/javase/7/docs/api/java/nio/channels/InterruptibleChannel.html.
msg12065 (view) Author: Jeff Allen (jeff.allen) Date: 2018-08-04.09:54:12
One of the joys of Jython (USPs if you like) is that it leaves reference counting to the JVM. Another is that it runs on the JVM, which has certain security/safety features.

When jnr-posix etc. reach past the JVM with an integer file handle or a FILE* it breaks this, yet we seem to have to do that because Python libraries permit this direct engagement with C libraries behind CPython, for those who want to do low-level things. I have some discomfort about this.

Java wants to protect us from this. Or is that rather protect its JVM from us? You can see this in one of the things blocking Java 9: improved modularisation stops us getting an integer file handle. It never worked on Windows and now it doesn't on Unix unless you go to some lengths to disable the safety features.

If you're running C, I guess you already circumvented the safety features, and the convenience of not reference-counting since you must sometimes have to lock objects down. We must be careful not to create (I mean we must recover from) the situation in which Jython *only* runs with the safety turned off. I think this leads us to one or more runtime options that say "it's ok, turn off protection X", with which certain low-level features are enabled. But also it means we echew these unsafe features elsewhere in Jython (and test that we have). That's if future versions of the JVM give us the option at all.

I'm slightly foxed by the stated motivation for PyFile_IncUseCount. If I open a file in one thread and use it, meanwhile another thread might close it, don't I deserve the mess I get into? :)

In CPython 2.7 source, I only see PyFile_(Inc|Dec)UseCount used as brackets around a place where the GIL is released, with no hint of threads racing to close a file. I expecetd it to work as a deferred close, but actually it just seems to raise an error if you get it wrong. I notice this feature has disappeared by Python 3.6, the library relying instead on C stdio (https://www.commandlinux.com/man-page/man3/flockfile.3.html) to make a thread wait, or a no-op where that's not available.

In Jython, the GIL is permanently released. I feel I haven't entirely grasped the implications of that and may not be alone.

Would it really affect C extensions if these were no-ops?
Does it actually protect against anything (in a sensible application) given the way threads work in Jython?
msg12066 (view) Author: Stefan Richthofer (stefan.richthofer) Date: 2018-08-04.13:18:49
I did not invent this API, just trying to get a tight support now that we are working on this part.

Indeed Java 9+ will make this more complicated. However that concerns FileIO.__int__ itself, not an accompanied use counter. FileIO.__int__ is already present - I did not invent it, we're just using it. What would you suggest to do with it in Java 9?
Remove it? Does actually something depend on it? In JyNI it is already confirmed that matplotlib needs it, but for JyNI it's enough to expose the FileDescriptor. We can move the black magic to JyNI then, it's already so packed with such magic that this won't be of much notice :)

So far access by JNI to private stuff is not restricted AFAICT. I was not aware it does not work on Windows. I assumed the field would then just contain the Windows file handle. A quick search lead to http://www.docjar.com/docs/api/sun/nio/ch/IOUtil.html IOUtil.fdVal. It's still not nice but gives the hope to work on Windows. https://www.linkedin.com/pulse/getting-unix-file-descriptor-java-greg-banks also mentions sun.misc.sharedSecrets, which is still present in Java 9 in jdc.internal.misc. This will require --add-exports java.base/jdk.internal.misc=your.module according to https://stackoverflow.com/questions/46612648/javalangaccess-and-sharedsecrets-in-java-9.

This might be a viable way to make FileIO.__int__ a bit more Java 9 proof. I hope --add-exports is a bit more sustainable than --illegal-access.

> Would it really affect C extensions if these were no-ops?

That's no dead functionality in CPython, so in theory yes. In fact I don't know. We can implement the functions to do nothing except raising warnings. Then revisit if a C extension shows up that really triggers these warnings. On the other hand I thought adding the use counter is not a big deal and then we would be save. The concept would also be more complete in Jython. Indeed in Jython the main issue is to support FileIO.__int__, not an accompanied use counter. The counter is a trivial enhancement compared to that.

Python 3 is a different story, that will require extensive change anyway.
msg12069 (view) Author: Jeff Allen (jeff.allen) Date: 2018-08-05.07:03:00
Tackling your points in reverse, when something is removed in a later version I think it's a hint it wasn't the right idea. But as Guido points out (https://bugs.python.org/issue595601#msg59340), the whole i/o system is different in 3.

When you look at what PyFile_IncUseCount does (just a tripwire), it's not the lock I expected from the documentation. But in the source code and original issue it becomes clear that the concern was that operations after close, called by a *badly-written* program, may result in a C-level crash and this trick turns it into a Python IOError.

In CPython, macros count up/down immediately before/after you release the GIL. So uses (of FILE_BEGIN_ALLOW_THREADS and FILE_END_ALLOW_THREADS) define regions where actions on a file may happen concurrently with taking the GIL and closing the file (an error). The count is equal to the number of threads using the file but not holding the GIL. We get an IOError if close() is called while this count is non-zero, which will occur outside any of these regions, by the one thread that at that point holds the GIL. But we don't hava a GIL, so bracketing all uses of the underlying file with atomic up and down operations doesn't provide the benefit you seek in Jython.

We check the file is still open before every i/o operation but we don't synchronise the methods to make this effective in the face of concurrency. We should, if an unhandled Java exception in *badly-written* programs a concern. (I think the original author decided not, but synchronisation is not as expensive as it used to be.) And then the macros we're talking about, when they appear in CPython/JyNI extensions, should mean take and release the inherent lock on the FileIO.

FileIO.__int__() is troublesome but not really the topic here. JyNI and some low-level Python will need to reach the C-level fd or FILE*, I accept. Java has taken away the reflective access to it unless you run with an --add-exports, and this will be unacceptable to some users, but probably only where C extensions are too. (Or they'll be sophisticated enough to manage the risks of both.) My point is only that we can't casually make Jython as a whole depend on --add-exports, as we do now (#2656).
msg12070 (view) Author: Stefan Richthofer (stefan.richthofer) Date: 2018-08-05.10:56:03
Exactly because Jython has no GIL, PyFile_IncUseCount/PyFile_DecUseCount should surround every direct file access via fd pointer. Correct me if I'm wrong, but I think the JVM cannot detect and guard external file access by this pointer. That's why it is internal API.

I also considered to forward calls of PyFile_IncUseCount/PyFile_DecUseCount to file channel lock/release. However that's not correct, because that would also lock file access. Use count is only supposed to prevent concurrent closing and there seems to be no way to achieve this directly in FileChannel.

Anyway. Implementing the monkeypatch seems to be easier than convincing you. Also regarding the perspective that we might have to get rid of FileIO.__int__. For now I will just observe actual use of PyFile_IncUseCount/PyFile_DecUseCount by a dummy implementation emitting warnings.

Agreed that --add-exports should not be a requirement in Jython. I still think we even depend on --illegal-access which is even worse I guess. We can move this critical stuff to JyNI if it comes to that.
History
Date User Action Args
2018-08-05 10:56:04stefan.richthofersetmessages: + msg12070
2018-08-05 07:03:02jeff.allensetmessages: + msg12069
2018-08-04 13:18:50stefan.richthofersetmessages: + msg12066
2018-08-04 09:54:14jeff.allensettype: behaviour -> rfe
messages: + msg12065
2018-08-03 22:56:26stefan.richthofercreate