Issue2363
Created on 2015-05-28.21:49:32 by amit, last changed 2018-04-21.13:26:33 by psykiatris.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | Remove |
test.py | amit, 2015-05-28.21:49:31 | test file to reproduce the problem | ||
test_fileseek.py | psykiatris, 2018-04-06.03:59:10 | Expanded test file. | ||
test_fileseek.py | psykiatris, 2018-04-13.23:47:06 | Updated test | ||
test_fileio1.patch | psykiatris, 2018-04-19.05:26:52 | |||
test_file_jy1.patch | psykiatris, 2018-04-21.13:26:32 | Patch amending SeekTellTest to test_file_jy |
Messages | |||
---|---|---|---|
msg10091 (view) | Author: Amit Srivastava (amit) | Date: 2015-05-28.21:49:31 | |
There seems to be an inconsistency after readline due to the readahead. The attached test script (runs fine for CPython) reproduces the problem. This could be related to issue1793, but since that issue was closed and am creating a new one. |
|||
msg10655 (view) | Author: Jim Baker (zyasoft) | Date: 2016-01-15.15:33:38 | |
My limited experience in looking at this problem is that is one of more undefined aspects of CPython. (To see this, just try varying the provided test script a bit.) We should investigate more and see what standard behavior we can extract. |
|||
msg11877 (view) | Author: Patrick Palczewski (psykiatris) | Date: 2018-04-05.16:11:45 | |
I wanted to take a look at this, so I ran this on 2.7.2, It ws consistently 100 bytes off, so I put -50 for the seek_back. The file.tell() returned a position higher than the beginning current position, still 100 bytes off. ============== Current fd position: 4202 seeking back: 50 fd position after seek: 4252 Traceback (most recent call last): File "/home/patrick/jython/dist/Lib/test/bugs/issue_2363.py", line 13, in <module> assert file.tell()==4152 AssertionError ============= Does this mean that file.tell() doesn't give the absolute position, or it doesn't indicate the last position in the file? Populating the test.log file with different characters produce differrent results, obviously. |
|||
msg11879 (view) | Author: Patrick Palczewski (psykiatris) | Date: 2018-04-05.20:37:31 | |
Never mind. I was wrong. On Thu, Apr 5, 2018 at 9:11 AM Patrick Palczewski <report@bugs.jython.org> wrote: > > Patrick Palczewski <psykiatris@gmail.com> added the comment: > > I wanted to take a look at this, so I ran this on 2.7.2, It ws > consistently 100 bytes off, so I put -50 for the seek_back. The file.tell() > returned a position higher than the beginning current position, still 100 > bytes off. > > ============== > > Current fd position: 4202 > seeking back: 50 > fd position after seek: 4252 > Traceback (most recent call last): > File "/home/patrick/jython/dist/Lib/test/bugs/issue_2363.py", line 13, > in <module> > assert file.tell()==4152 > AssertionError > > ============= > > Does this mean that file.tell() doesn't give the absolute position, or it > doesn't indicate the last position in the file? > > Populating the test.log file with different characters produce differrent > results, obviously. > > ---------- > nosy: +psykiatris > > _______________________________________ > Jython tracker <report@bugs.jython.org> > <http://bugs.jython.org/issue2363> > _______________________________________ > -- Patrick Palczewski, Esq. |
|||
msg11880 (view) | Author: Patrick Palczewski (psykiatris) | Date: 2018-04-06.03:59:10 | |
I developed a more complete test to see what happens for each process. readline() by itself returns incorrect file positions, but will pass if changed to readlines() Only changes I made were that I did not assign to the data variable, and I created variables for the tell() for easier printing. Other than that, it's essentially the same code as the original test. Attached test_fileseek.py Patrick |
|||
msg11882 (view) | Author: Patrick Palczewski (psykiatris) | Date: 2018-04-06.13:51:08 | |
I hate it when I goof up. But, readline() does not work correctly when seek is used. I updated my test script to better display the info, and to allow people to pass size and the seek amount to the function. it ended up clearly showing Iin the seekboth function, this: Output: File position from read(): 1024 File position from readline(): 4202 Amount to seek back: -1024 New position in file: 3278 Wrong position The function returns the correct position in Python 2.7.12 But, really, readline() *should* only read the line to the newline, which is 101 (test.log shows the first line as 100 '.' with a newline.) This is verified by the seekreadline call in the test file. |
|||
msg11890 (view) | Author: Patrick Palczewski (psykiatris) | Date: 2018-04-11.05:24:00 | |
I have added a patch for the test_fileseek.py to add a test reversing the readline/red. What I relized is while the read buffer may be larger, the readline reads to the newline. Since the newline is at the 101st position in the test.log, it apparently skewing the results. if you reverse it, run readline() first, then read(), then the results are correct. Here is my output for both: 2nd patch fixes results. I had a couple of prints backwards. But the final result is below: ======== --Result of read-- File position: 1024 <-- read limit Amount to seek back: -200 New position: 824 Correct position! --- Result of readline() --- File position: 101 <-- includes NEWLINE Amount to seek back: -100 New position in file: 200 Wrong position --- Result of both() --- File position from read(): 4202 File position from readline(): 101 Amount to seek back: -100 New position in file: 4201 Wrong position --- Result of both_reversed() --- File position from readline(): 101 File position from read(): 1125 Amount to seek back: -100 New position in file: 1025 Correct position On 04/06/2018 06:51 AM, Patrick Palczewski wrote: > Patrick Palczewski <psykiatris@gmail.com> added the comment: > |
|||
msg11899 (view) | Author: Patrick Palczewski (psykiatris) | Date: 2018-04-13.15:37:56 | |
Uploading a final patch for test_fileseek. Cleaned up code for clarity and readability. |
|||
msg11902 (view) | Author: Jeff Allen (jeff.allen) | Date: 2018-04-13.19:54:20 | |
Now we have a failing test (based on certain expectations) but: 1. Does CPython also fail this test? 2. Can you develop a change to Jython that makes it pass? If the answer to #1 is yes, we ought to think seriously whether it matters that Jython fails. Are our expectations correct/meaningful? The question in my mind is whether we should really expect a buffered text file with encoding, soft-spacing and universal line-end processsing to seek accurately. The streams from the io module support seek, but not arithmetically in the obvious way: the argument must be a sort of opaque "cookie" or "marker" returned by tell(). The built-in file is not from the io module, and is likely less sophisticated. Patrick: This will be what Jim is getting at in his comment #msg10655. If you simply prove that the behaviour is undefined in these cases, that's useful work we've been neglecting and you will have closed another issue. Assuming there is something to fix here, a useful patch would consist of: 1. an addition to the test suite that fails in the present version. 2. a change to Jython that makes the test pass. 3. no accidental inclusions of other files. We won't want your src/test_fileseek.py as it stands in the codebase, but you're probably now in a position to add the failing test(s) to (most likely) Lib/test/test_file_jy.py, based on it. If you look in there, and other tests modules nearby, you'll see how unit tests work. If you haven't used unittest before it is a useful module to learn about. There are tests for file.seek in Lib/test/test_file.py: why don't those fail? Could you attach just the latest version of test_fileseek.py as an entire file (not patches)? We can do without the old versions: if you can't remove them I can. |
|||
msg11903 (view) | Author: Patrick Palczewski (psykiatris) | Date: 2018-04-13.23:47:06 | |
I'm glad I don't sew. I'm terrible at patching... 1. Yes, all the test functions in test_fileseek.py passes in Python 2.7.12. In all Jython version I have (2.5.3, 2.7.0 and 2.7.2), the seekreadline and seekboth test functions fail. I will look at the test_file.py and see what goes on. Those tests probably pass because the test file used is different. So i will look and copy it and try with test_fileseek.py I'm attaching the final test_fileseek.py. No more, no more! Please delete all old version. :/ I appreciate your guidance. |
|||
msg11904 (view) | Author: Patrick Palczewski (psykiatris) | Date: 2018-04-14.13:59:39 | |
Jeff, I haven't delved into the unittest module, but I did create a shell script to manually run a test from the command-line. Now that I'm in a position of providing more help to the Jython development, I will study the unittest and improve my own test files. That being said, I looked in both test_file.py and test_file_jy.py and from what I can see, there a[[ears tp be NO tests for file seeking. I used find and looked for the word .seek. test-file had several references to sys.stdin.seek. Test_file_jy had one, fp.seek(0) which didn't do anything. I'll look in the io tests to see if I can track it down. On 04/13/2018 12:54 PM, Jeff Allen wrote: > Jeff Allen <ja.py@farowl.co.uk> added the comment: > > Now we have a failing test (based on certain expectations) but: > 1. Does CPython also fail this test? > 2. Can you develop a change to Jython that makes it pass? |
|||
msg11906 (view) | Author: Patrick Palczewski (psykiatris) | Date: 2018-04-14.16:12:26 | |
While the fileseek tested "passed" in Python 2.7.12, here is the output: ========== readline() position-> 101 Seek back amt-------> -100 New position--------> 1 Correct position --- Result of seekboth() --- read() position------> 1024 readline() position--> 4202 Seek back amt--------> -100 New position---------> 4102 Correct position ======== If you look closely, the readline() test is correct. Readline() reads to the newline character. In the seekboth, however, readline() does NOT read to the newline, reading the whole file. Therefore, while it seems to pass, it actually fails because the "behavior" of readline() is to 'read the line from the file and retain the newline'. Correct? Now, there's a question of whether this is important, and maybe there's no application in the real world for its use, except looking for "marker" as you mentioned. I'm expanding my fileseek.py to include seeking forward, but I will NOT upload it... It's at the ad nauseum stage at htis point. |
|||
msg11907 (view) | Author: Patrick Palczewski (psykiatris) | Date: 2018-04-15.13:11:22 | |
I apologize that this issue is becoming a very long thread. It dawned on me before I woke up this morning realizing that we may be dealing with the behavior of readline() and not the arithmetic, maybe. I did find a SeekTellTest in test_fileio, which helped me in drastically reducing my code for the read/readline seek test. Now to clarify my point: The test.log file consists of 3 lines, ending with a '\n': Line 1: 101 bytes Line 2: 4101 bytes : 1 & 2 = 4202 Line 3: 101 bytes : all lines = 4303 Now, read() reads all, so tell() will return 4303. But, since we limit it to 1024, it will mark position at 1024. When readline() is called, what happens? It read in lines 1 & 2 and produced a position of 4202 when we called tell(). When we did the seek, it produced an incorrect result in Jython because line 1 (101) was omitted from the calculation. Python passed because line 1 was included. We have 3 issues: 1. The math is incorrect. (the math passes in Python, fails in Jython) 2. Why did readline() read 2 lines, both of which ends with a newline? (fails in both Python and Jython) 3. Is it important? My output from Jython: --- Result of seekreadline() --- readline() position-> 101 Seek back amt-------> -100 New position--------> 200 Wrong position --- Result of seekboth() --- read() position------> 1024 readline() position--> 4202 Seek back amt--------> -100 New position---------> 4202 Wrong position |
|||
msg11908 (view) | Author: Jeff Allen (jeff.allen) | Date: 2018-04-15.17:03:12 | |
You're right, SeekTellTest in test_fileio is a good place to start. seek() calls I found in test_file are incidental. It sounds like you're homing in of some invalid behaviour where tell() is maybe returning only partial counts. To answer an earlier question, yes, readline() consumes the end of line, if there is one, and appends it to the returned bytes. I put it that way carefully, because an end of line is not necessarily an end-of-line *character*. Trying this in CPython 2.7.14: >>> binary = open('macbeth.txt', 'rb') >>> while "wood" not in binary.readline(): p = binary.tell() ... >>> p 55502L >>> binary.seek(p) >>> binary.readline() ' Makes wing to the rooky wood:\r\n' >>> text = open('macbeth.txt', 'r') >>> while "wood" not in text.readline(): p = text.tell() ... >>> p 55502L >>> text.seek(p) >>> text.readline() ' Makes wing to the rooky wood:\n' However, tell and seek are working in terms of the byte position at which the line was found, not a "character" offset in a stream where line endings have been converted. |
|||
msg11914 (view) | Author: Patrick Palczewski (psykiatris) | Date: 2018-04-18.18:45:41 | |
I understand what youre saying. I'm working on the SeekTellTest in test_fileio.py to add the results of read() and readline(). Patch will be coming. On 04/15/2018 10:03 AM, Jeff Allen wrote: > Jeff Allen <ja.py@farowl.co.uk> added the comment: > > You're right, SeekTellTest in test_fileio is a good place to start. seek() calls I found in test_file are incidental. > > It sounds like you're homing in of some invalid behaviour where tell() is maybe returning only partial counts. > > |
|||
msg11915 (view) | Author: Patrick Palczewski (psykiatris) | Date: 2018-04-19.05:26:52 | |
I'm attaching a patch file appending read() and readline() tests to the SeekTellTest in test_fileio.py. I had some very interesting results. |
|||
msg11916 (view) | Author: Jeff Allen (jeff.allen) | Date: 2018-04-19.06:56:54 | |
This would be better as a distinct test, and maybe in test_file_jy. We always aim *not* to have a special Jython variant of a CPython test (under the same file name). We do here already, but it would be nice not to make it diverge further. I think that earlier you demonstrated the bug only when the start position of the readline was not the start of the file. Using essentially your code and CPython 2.7.14 I get: testSeekTell (__main__.AutoFileTests2) ... FAIL ====================================================================== FAIL: testSeekTell (__main__.AutoFileTests2) ---------------------------------------------------------------------- Traceback (most recent call last): File "dist\Lib\test\test_fileio.py", line 63, in testSeekTell self.assertEqual(self.f.tell(), 31) AssertionError: 10L != 31 The first line of the file is terminated by the \x0a byte, that was inserted by self.f.write(bytes(range(20))). If we keep the file, instead of tidying up, we can see (CPython): >>> f = open('$test_1_tmp', 'r') >>> f.readline() '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n' >>> f.readline() '\x0b\x0c\r\x0eJython is great\n' >>> f.close() I was surprised the \r is not also a line end, but that's a thing you have to ask for explicitly: >>> f = open('$test_1_tmp', 'rU') # U = universal newlines mode >>> f.readline() '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n' >>> f.readline() '\x0b\x0c\n' >>> f.readline() '\x0eJython is great\n' >>> f.close() We perhaps have a bone to pick with CPython. Since this byte is at index 10 in the file, and the return from readline includes that byte, I think Jython is correct to return 11 and CPython is off by one! |
|||
msg11917 (view) | Author: Patrick Palczewski (psykiatris) | Date: 2018-04-19.14:17:23 | |
Ah, so the initial readline() call returned the correct byte position, and did not include the text string I added.So readline() works, however, the seek calls afterwards return incorrect positions and will fail. I will look at test_file_jy and see about moving the test there. |
|||
msg11918 (view) | Author: Patrick Palczewski (psykiatris) | Date: 2018-04-19.15:35:29 | |
I made the test pass, if I called seek(20) (to read in the text I appended). All other calculations were correct and the test passed. So, I started Jython and saw that if I called readline(), it read the first line of the file. Seeking realitively from what I thought the position would be, it actually started the seek from the END of the file, and not at the byte position of readline(). =============================== >>> f = open('seektest.log', 'wb') >>> f.write(b'To seek or not to seek,\nThat is the question\n') >>> f.close() >>> f = open('seektest.log', 'rb') >>> f.read() 'To seek or not to seek,\nThat is the question\n' >>> f.tell() 45L <--------- length of file >>> f.readline() '' >>> f.seek(0) >>> f.readline() 'To seek or not to seek,\n' >>> f.tell() 24L <-------- byte position of first line >>> f.seek(-10, 1) >>> f.tell() 35L <---- started seek from the end of the file, position 45 - 10 >>> f.seek(5, 1) >>> f.tell() 40L If I set the position at the start of line 2, it calculates correctly: >>> f.seek(24) >>> f.readline() 'That is the question\n' >>> f.tell() 45L >>> f.seek(-10, 1) >>> f.tell() 35L have I muddied up the issue, or what? ;/ |
|||
msg11919 (view) | Author: Patrick Palczewski (psykiatris) | Date: 2018-04-19.17:53:16 | |
More fun: =========== Jython 2.7.2a1+ (default:623eaa3d7834, Apr 18 2018, 07:30:30) [Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.8.0_162 Type "help", "copyright", "credits" or "license" for more information. >>> f = open('testseek.log', 'rb') >>> f.read() 'To seek or not to seek,\nThat is the quewtion.\n' >>> f.tell() 46L >>> f.seek(-25, 1) >>> f.tell() 21L >>> f.read() 'k,\nThat is the quewtion.\n' >>> f.seek(0) >>> f.tell() 0L >>> f.readline() 'To seek or not to seek,\n' >>> f.tell() 24L >>> f.seek(10) >>> f.readline() ' not to seek,\n' >>> f.seek(-5, 1) >>> f.tell() 41L >>> f.readline() 'ion.\n' >>> f.readline(40) '' >>> f.tell() 46L >>> f.seek(0) >>> f.tell() 0L >>> f.readline(40) 'To seek or not to seek,\n' >>> f.tell() 24L >>> f.seek(-5, 1) >>> f.tell() 41L >>> yes, I've had too much coffee.... |
|||
msg11920 (view) | Author: Patrick Palczewski (psykiatris) | Date: 2018-04-21.13:26:32 | |
Uploading patch for test_file_jy. It's clear that when readline is called, the tell() shows the byte position, however, when seek() is called, it begins at the end of the file and goes from there. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2018-04-21 13:26:33 | psykiatris | set | files:
+ test_file_jy1.patch messages: + msg11920 |
2018-04-19 17:53:17 | psykiatris | set | messages: + msg11919 |
2018-04-19 15:35:29 | psykiatris | set | messages: + msg11918 |
2018-04-19 14:17:24 | psykiatris | set | messages: + msg11917 |
2018-04-19 06:56:54 | jeff.allen | set | messages: + msg11916 |
2018-04-19 05:35:06 | jeff.allen | set | files: - unnamed |
2018-04-19 05:26:54 | psykiatris | set | files:
+ test_fileio1.patch messages: + msg11915 |
2018-04-18 18:45:41 | psykiatris | set | files:
+ unnamed messages: + msg11914 |
2018-04-15 17:03:13 | jeff.allen | set | messages: + msg11908 |
2018-04-15 15:36:43 | jeff.allen | set | files: - test_fileseek.py |
2018-04-15 15:35:38 | jeff.allen | set | files: - unnamed |
2018-04-15 15:34:58 | jeff.allen | set | files: - test_fileseek_final.patch |
2018-04-15 15:34:47 | jeff.allen | set | files: - test_fileseek2.patch |
2018-04-15 15:34:40 | jeff.allen | set | files: - test_fileseek1.patch |
2018-04-15 13:11:23 | psykiatris | set | messages: + msg11907 |
2018-04-14 16:12:26 | psykiatris | set | messages: + msg11906 |
2018-04-14 13:59:39 | psykiatris | set | files:
+ unnamed messages: + msg11904 |
2018-04-13 23:47:07 | psykiatris | set | files:
+ test_fileseek.py messages: + msg11903 |
2018-04-13 19:54:21 | jeff.allen | set | messages: + msg11902 |
2018-04-13 17:29:12 | tuck182 | set | nosy: - tuck182 |
2018-04-13 15:37:57 | psykiatris | set | files:
+ test_fileseek_final.patch messages: + msg11899 |
2018-04-11 18:14:36 | jeff.allen | set | nosy: + jeff.allen |
2018-04-11 18:14:00 | jeff.allen | set | files: - unnamed |
2018-04-11 18:13:48 | jeff.allen | set | files: - unnamed |
2018-04-11 05:24:00 | psykiatris | set | files:
+ unnamed, test_fileseek1.patch, test_fileseek2.patch keywords: + patch messages: + msg11890 |
2018-04-06 13:51:09 | psykiatris | set | files:
+ test_fileseek.py messages: + msg11882 |
2018-04-06 03:59:12 | psykiatris | set | files:
+ test_fileseek.py messages: + msg11880 |
2018-04-05 20:37:31 | psykiatris | set | files:
+ unnamed messages: + msg11879 |
2018-04-05 16:11:45 | psykiatris | set | nosy:
+ psykiatris messages: + msg11877 |
2016-01-15 15:33:38 | zyasoft | set | messages: + msg10655 |
2015-05-28 21:49:32 | amit | create |
Supported by Python Software Foundation,
Powered by Roundup