Issue2410

classification
Title: Regression in PySystemStateTest (leading slash)
Type: behaviour Severity: normal
Components: Core Versions: Jython 2.7
Milestone:
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: jeff.allen Nosy List: jeff.allen, otmarhumbel, stefan.richthofer
Priority: normal Keywords: test failure causes

Created on 2015-10-10.10:05:00 by jeff.allen, last changed 2018-11-25.08:00:17 by jeff.allen.

Messages
msg10343 (view) Author: Jeff Allen (jeff.allen) Date: 2015-10-10.10:04:58
I notice that I get a regression test failure from:
ant -Dtest=PySystemStateTest singlejavatest

I think the test might be right and the code wrong: in the failing case, don't we need the slash?

I admit I don't run "ant javatest" anything like as often as "ant regrtest": mending my ways belatedly, I have traced the problem to https://hg.python.org/jython/rev/850f5980a2e3 .

I searched a bit and found: http://stackoverflow.com/questions/402683, which steers us towards what the Java platform provides, rather than wrestling as we do with the OS-specifics.
msg11894 (view) Author: Stefan Richthofer (stefan.richthofer) Date: 2018-04-12.13:19:08
Also see http://bugs.jython.org/issue2386 for the reasoning of making getJarFileName() public. Since it states "FileName" I suppose it should return a string that is valid as a file name, e.g. when used to construct a java.io.File object. A leading slash would break this assumption/promise (on Windows) as it is a malformed file name.

What solution would you suggest? What is the exact failure?
msg11896 (view) Author: Jeff Allen (jeff.allen) Date: 2018-04-12.20:22:09
Failure in more detail:

singlejavatest:
    [junit] Testsuite: org.python.core.PySystemStateTest
    [junit] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.313 sec
    [junit]
    [junit] Testcase: testGetJarFileNameFromURL(org.python.core.PySystemStateTest):     FAILED
    [junit] expected:<[/]some_dir/some.jar> but was:<[]some_dir/some.jar>
    [junit] junit.framework.ComparisonFailure: expected:<[/]some_dir/some.jar> but was:<[]some_dir/some.jar>
    [junit]     at org.python.core.PySystemStateTest.testGetJarFileNameFromURL(PySystemStateTest.java:21)

There isn't a drive letter involved here, but I appreciate the existence of drive letters invalidates otherwise correct manipulation in a lot of contexts. I've done some of that myself in the code base. (I never made getcwd() work correctly on the D-drive.)

Ideally we would depend on the nice people at Oracle/OpenJDK to get this stuff right. I'd love to re-work all this guff using java.nio.Paths one day. For now, I think java.io.File.toURI is probably the right answer.

>>> from java.io import File
>>> from java.util.jar import JarFile
>>> from java.net import URL
>>> f = File(r"dist\lib\test\blob.jar")
>>> f.getAbsoluteFile()
C:\Users\Jeff\Documents\Eclipse\jython-trunk\dist\lib\test\blob.jar
>>> f.toURI()
file:/C:/Users/Jeff/Documents/Eclipse/jython-trunk/dist/lib/test/blob.jar
>>> j = JarFile(r"dist\lib\test\blob.jar")
>>> list(j.entries())
[Blob.class, Blob.java]
>>> u = URL("jar:" + str(f.toURI()) + "!/Blob.java")
>>> u
jar:file:/C:/Users/Jeff/Documents/Eclipse/jython-trunk/dist/lib/test/blob.jar!/Blob.java
>>> import os
>>> os.path.relpath(u.getPath().split(':',1)[1].split('!')[0][1:])
u'dist\\lib\\test\\blob.jar'


To be honest, I've lightly lost the thread of #2386, and why we ended up with this test failure.:/ Does something in the round-trip I just made suggest a simple, cross-platform and correct version of getJarFileNameFromURL or getJarFileName ?
msg11897 (view) Author: Jeff Allen (jeff.allen) Date: 2018-04-12.20:50:44
Ah, is this what we want to reproduce URL -> file?

>>> u = URL("jar:file:/some_dir/some.jar!/a/package/with/A.class")
>>> c = u.openConnection()
>>> File(c.getJarFileURL().toURI())
\some_dir\some.jar

And with that pesky drive:

>>> u = URL("jar:file:/C:/some_dir/some.jar!/a/package/with/A.class")
>>> c = u.openConnection()
>>> File(c.getJarFileURL().toURI())
C:\some_dir\some.jar
msg11900 (view) Author: Stefan Richthofer (stefan.richthofer) Date: 2018-04-13.17:15:30
Oh, sorry; in https://hg.python.org/jython/rev/850f5980a2e3 I overlooked that `some_path` can have different meaning than `/some_path`. However fixing paths like `/C:/some_path` was still important. Apart from that I did not touch getJarFileNameFromURL implementation. It stems from older work. As part of #2386 I just moved it from another module to public API. Changing it to make better use of Java API is the best solution I suppose. Also simplifies its code a lot.

E.g.
java.io.File(
URL("jar:file:/C:/some_dir/some.jar!/a/package/with/A.class")
.openConnection().getJarFileURL().toURI())).getAbsolutePath()
should do it...
msg12054 (view) Author: Jeff Allen (jeff.allen) Date: 2018-07-14.22:19:13
Hmmm ... I've learned a lot about URLs and a bit about JBoss (or WildFly now) trying to find a good answer to this. One thing I definitely understand is that one should not use a URLDecoder on a URL. (https://stackoverflow.com/questions/6519746/url-decode-in-java-6#6603584)

The best way to derive a file path from a URL is surely to use the facilities Java offers: it knows what platform we're on and is more likely to get the parsing right (UNC paths anyone?). And it's satisfyingly brief. But precisely because Java does this properly, I end up with the platform-specific separator character in my file path.

The behaviour I'm not totally sure about is the one for the JBoss protocols. I've gone for the same solution here, the one that produces the platform-specific slash. Essentially one simply replaces the vfs: or vfszip: with file: and assumes the URL is valid. It is difficult to know whether this is right: I'm fairly sure the current code only works in particular circumstances anyway. In general, these VFS URLs cannot be translated to a file path, but when they can, it makes sense to me that the path should be properly slashed, not as "required" by the test.

I'm going to assume that the forward-slash we currently end up with universally (and the test enforces) is just a consequence of the home-cooked parser, and of relying on the tolerance of java.io on Windows. As indeed is the pesky leading slash: #msg6157.

It would be a problem if some application (on Windows) depended on the forward slash, for example to re-compose a URL into VFS, but I've no evidence of such a use.

Now here:
https://hg.python.org/jython/rev/d83a36fe9f8e
Except I've credited it to Richie Bendall :( Not sure I can fix that now.
msg12055 (view) Author: Stefan Richthofer (stefan.richthofer) Date: 2018-07-15.01:15:16
Do we get now the slash before the drive letter on Windows?
I do not remember why it exactly caused issues, but there was something.
Regarding #2386 checking that os.path.dirname(os.path.realpath(sys.executable))
works as expected, on Windows, starting via java, bypassing the usual launcher. Sorry that I never added a test for #2386. On the other hand, how would you test a launcher-specific issue?

It looks like you forgot to write an entry in NEWS (sorry if I overlooked it).
msg12056 (view) Author: Jeff Allen (jeff.allen) Date: 2018-07-15.06:35:09
Thanks for checking that. I'll add a NEWS entry. Maybe in my mind it isn't really fixed until others have a chance to try it! (Ideally that would include trying it on JBoss/WildFly.)

I'll look into the C: prefix. The test shows that the file path is clean of spurious slashes for JBoss "C:\\some+dir...", but probably the JAR test should too.
msg12057 (view) Author: Stefan Richthofer (stefan.richthofer) Date: 2018-07-15.15:10:54
Sorry, I somehow messed up a sentence im my last message.
I wanted to say that checking os.path.dirname(os.path.realpath(sys.executable))
for workability with the new approach needs to be done (when launching via java on Windows). Maybe I can check that, but currently have other priorities.
msg12058 (view) Author: Jeff Allen (jeff.allen) Date: 2018-07-18.05:45:50
@Stefan: Ah sorry. It takes no time for me:
>>> import os, sys
>>> os.path.dirname(os.path.realpath(sys.executable))
'C:\\Users\\Jeff\\Documents\\Eclipse\\jython-trunk\\dist\\bin'
Works ok.

I know you worked on this to get distutils working. I'll check as well as I can I haven't broken it again, but if you spot me doing something that does, please say.

In response to the question, I beefed up the test to include URLs like jar:file:\C:\some\jy.jar!\A.class, and it passed (gives C:\some\jy.jar).

Then I threw in a couple of UNC paths for good measure, but here it failed. This seems to be a "won't fix" bug in Java. I have worked around it, but there may be no point because Jython simply does not work from a UNC path. :( This is because of a disagreement internal to java.net.URLClassLoader.getResourceAsStream(String name) about the acceptable form of a URL that bites us as soon as we try to read version.properties. The URL produced by java.lang.ClassLoader.getResource(String name) is rejected by java.net.URLConnection.getInputStream(). As far as I can find out, Java has embedded some early interpretation of the URL idea that is now contrary to standards it follows elsewhere, and (surprisingly) has not found its way out of the muddle.
msg12059 (view) Author: Jeff Allen (jeff.allen) Date: 2018-07-18.19:47:41
Claiming this fixed at https://hg.python.org/jython/rev/c0b0894950ff . 

Honestly I thought this was going to be half an hour's work. Educational though.
History
Date User Action Args
2018-11-25 08:00:17jeff.allensetstatus: pending -> closed
2018-07-18 19:47:42jeff.allensetstatus: open -> pending
resolution: accepted -> fixed
messages: + msg12059
2018-07-18 05:45:52jeff.allensetmessages: + msg12058
2018-07-15 15:10:54stefan.richthofersetmessages: + msg12057
2018-07-15 06:35:09jeff.allensetmessages: + msg12056
2018-07-15 01:15:17stefan.richthofersetmessages: + msg12055
2018-07-14 22:19:14jeff.allensetkeywords: - easy
nosy: + otmarhumbel
messages: + msg12054
2018-07-12 13:12:38jeff.allensetkeywords: + test failure causes
assignee: jeff.allen
resolution: accepted
2018-04-13 17:15:31stefan.richthofersetmessages: + msg11900
2018-04-12 20:50:45jeff.allensetmessages: + msg11897
2018-04-12 20:22:10jeff.allensetmessages: + msg11896
2018-04-12 13:19:09stefan.richthofersetmessages: + msg11894
2018-04-10 19:45:52jeff.allensetkeywords: + easy
priority: normal
2015-10-10 10:05:00jeff.allencreate