Issue2363

classification
Title: relative seeks works incorrectly after readline
Type: behaviour Severity: major
Components: Core Versions: Jython 2.7, Jython 2.5
Milestone:
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: amit, fwierzbicki, jeff.allen, psykiatris, santa4nt, zyasoft
Priority: Keywords: patch

Created on 2015-05-28.21:49:32 by amit, last changed 2018-04-21.13:26:33 by psykiatris.

Files
File name Uploaded Description Edit Remove
test.py amit, 2015-05-28.21:49:31 test file to reproduce the problem
test_fileseek.py psykiatris, 2018-04-06.03:59:10 Expanded test file.
test_fileseek.py psykiatris, 2018-04-13.23:47:06 Updated test
test_fileio1.patch psykiatris, 2018-04-19.05:26:52
test_file_jy1.patch psykiatris, 2018-04-21.13:26:32 Patch amending SeekTellTest to test_file_jy
Messages
msg10091 (view) Author: Amit Srivastava (amit) Date: 2015-05-28.21:49:31
There seems to be an inconsistency after readline due to the readahead. The attached test script (runs fine for CPython) reproduces the problem.

This could be related to issue1793, but since that issue was closed and am creating a new one.
msg10655 (view) Author: Jim Baker (zyasoft) Date: 2016-01-15.15:33:38
My limited experience in looking at this problem is that is one of more undefined aspects of CPython. (To see this, just try varying the provided test script a bit.) We should investigate more and see what standard behavior we can extract.
msg11877 (view) Author: Patrick Palczewski (psykiatris) Date: 2018-04-05.16:11:45
I wanted to take a look at this, so I ran this on 2.7.2, It ws consistently 100 bytes off, so I put -50 for the seek_back. The file.tell() returned a position higher than the beginning current position, still 100 bytes off.

==============

Current fd position:    4202
seeking back:           50
fd position after seek: 4252
Traceback (most recent call last):
  File "/home/patrick/jython/dist/Lib/test/bugs/issue_2363.py", line 13, in <module>
    assert file.tell()==4152
AssertionError

=============

Does this mean that file.tell() doesn't give the absolute position, or it doesn't indicate the last position in the file?

Populating the test.log file with different characters produce differrent results, obviously.
msg11879 (view) Author: Patrick Palczewski (psykiatris) Date: 2018-04-05.20:37:31
Never mind. I was wrong.

On Thu, Apr 5, 2018 at 9:11 AM Patrick Palczewski <report@bugs.jython.org>
wrote:

>
> Patrick Palczewski <psykiatris@gmail.com> added the comment:
>
> I wanted to take a look at this, so I ran this on 2.7.2, It ws
> consistently 100 bytes off, so I put -50 for the seek_back. The file.tell()
> returned a position higher than the beginning current position, still 100
> bytes off.
>
> ==============
>
> Current fd position:    4202
> seeking back:           50
> fd position after seek: 4252
> Traceback (most recent call last):
>   File "/home/patrick/jython/dist/Lib/test/bugs/issue_2363.py", line 13,
> in <module>
>     assert file.tell()==4152
> AssertionError
>
> =============
>
> Does this mean that file.tell() doesn't give the absolute position, or it
> doesn't indicate the last position in the file?
>
> Populating the test.log file with different characters produce differrent
> results, obviously.
>
> ----------
> nosy: +psykiatris
>
> _______________________________________
> Jython tracker <report@bugs.jython.org>
> <http://bugs.jython.org/issue2363>
> _______________________________________
>
-- 
Patrick Palczewski, Esq.
msg11880 (view) Author: Patrick Palczewski (psykiatris) Date: 2018-04-06.03:59:10
I developed a more complete test to see what happens for each process. readline() by itself returns incorrect file positions, but will pass if changed to readlines()

Only changes I made were that I did not assign to the data variable, and I created variables for the tell() for easier printing. Other than that, it's essentially the same code as the original test.

Attached test_fileseek.py

Patrick
msg11882 (view) Author: Patrick Palczewski (psykiatris) Date: 2018-04-06.13:51:08
I hate it when I goof up. But, readline() does not work correctly when seek is used.

I updated my test script to better display the info, and to allow people to pass size and the seek amount to the function. it ended up clearly showing Iin the seekboth function, this:

Output:
File position from read():      1024
File position from readline():  4202
Amount to seek back:           -1024
New position in file:           3278
Wrong position

The function returns the correct position in Python 2.7.12

But, really, readline() *should* only read the line to the newline, which is 101 (test.log shows the first line as 100 '.' with a newline.) This is verified by the seekreadline call in the test file.
msg11890 (view) Author: Patrick Palczewski (psykiatris) Date: 2018-04-11.05:24:00
I have added a patch for the test_fileseek.py to add a test reversing 
the readline/red.

What I relized is while the read buffer may be larger, the readline 
reads to the newline. Since the newline is at the 101st position in the 
test.log, it apparently skewing the results. if you reverse it, run 
readline() first, then read(), then the results are correct. Here is my 
output for both:

2nd patch fixes results. I had a couple of prints backwards. But the 
final result is below:

========

--Result of read--
File position:        1024 <-- read limit
Amount to seek back:  -200
New position:          824
Correct position!
--- Result of readline() ---
File position:         101 <-- includes NEWLINE
Amount to seek back:  -100
New position in file:  200
Wrong position
--- Result of both() ---
File position from read():      4202
File position from readline():   101
Amount to seek back:            -100
New position in file:           4201
Wrong position
--- Result of both_reversed() ---
File position from readline():   101
File position from read():      1125
Amount to seek back:            -100
New position in file:           1025
Correct position

On 04/06/2018 06:51 AM, Patrick Palczewski wrote:
> Patrick Palczewski <psykiatris@gmail.com> added the comment:
>
msg11899 (view) Author: Patrick Palczewski (psykiatris) Date: 2018-04-13.15:37:56
Uploading a final patch for test_fileseek. Cleaned up code for clarity and readability.
msg11902 (view) Author: Jeff Allen (jeff.allen) Date: 2018-04-13.19:54:20
Now we have a failing test (based on certain expectations) but:
1. Does CPython also fail this test?
2. Can you develop a change to Jython that makes it pass?

If the answer to #1 is yes, we ought to think seriously whether it matters that Jython fails. Are our expectations correct/meaningful?

The question in my mind is whether we should really expect a buffered text file with encoding, soft-spacing and universal line-end processsing to seek accurately. The streams from the io module support seek, but not arithmetically in the obvious way: the argument must be a sort of opaque "cookie" or "marker" returned by tell(). The built-in file is not from the io module, and is likely less sophisticated.


Patrick:

This will be what Jim is getting at in his comment #msg10655. If you simply prove that the behaviour is undefined in these cases, that's useful work we've been neglecting and you will have closed another issue.

Assuming there is something to fix here, a useful patch would consist of:
1. an addition to the test suite that fails in the present version.
2. a change to Jython that makes the test pass.
3. no accidental inclusions of other files.

We won't want your src/test_fileseek.py as it stands in the codebase, but you're probably now in a position to add the failing test(s) to (most likely) Lib/test/test_file_jy.py, based on it. If you look in there, and other tests modules nearby, you'll see how unit tests work. If you haven't used unittest before it is a useful module to learn about.

There are tests for file.seek in Lib/test/test_file.py: why don't those fail?

Could you attach just the latest version of test_fileseek.py as an entire file (not patches)? We can do without the old versions: if you can't remove them I can.
msg11903 (view) Author: Patrick Palczewski (psykiatris) Date: 2018-04-13.23:47:06
I'm glad I don't sew. I'm terrible at patching...

1. Yes, all the test functions in test_fileseek.py passes in Python 2.7.12. In all Jython version I have (2.5.3, 2.7.0 and 2.7.2), the seekreadline and seekboth test functions fail.

I will look at the test_file.py and see what goes on. Those tests probably pass because the test file used is different. So i will look and copy it and try with test_fileseek.py


I'm attaching the final test_fileseek.py. No more, no more! Please delete all old version. :/

I appreciate your guidance.
msg11904 (view) Author: Patrick Palczewski (psykiatris) Date: 2018-04-14.13:59:39
Jeff,

I haven't delved into the unittest module, but I did create a shell 
script to manually run a test from the command-line. Now that I'm in a 
position of providing more help to the Jython development, I will study 
the unittest and improve my own test files.

That being said, I looked in both test_file.py and test_file_jy.py and 
from what I can see, there a[[ears tp be NO tests for file seeking. I 
used find and looked for the word .seek. test-file had several 
references to sys.stdin.seek. Test_file_jy had one, fp.seek(0) which 
didn't do anything.

I'll look in the io tests to see if I can track it down.

On 04/13/2018 12:54 PM, Jeff Allen wrote:
> Jeff Allen <ja.py@farowl.co.uk> added the comment:
>
> Now we have a failing test (based on certain expectations) but:
> 1. Does CPython also fail this test?
> 2. Can you develop a change to Jython that makes it pass?
msg11906 (view) Author: Patrick Palczewski (psykiatris) Date: 2018-04-14.16:12:26
While the fileseek tested "passed" in Python 2.7.12, here is the output:
==========
readline() position->  101
Seek back amt-------> -100
New position-------->    1
Correct position

--- Result of seekboth() ---

read() position------> 1024
readline() position--> 4202
Seek back amt--------> -100
New position---------> 4102
Correct position
========

If you look closely, the readline() test is correct. Readline() reads to the newline character. In the seekboth, however, readline() does NOT read to the newline, reading the whole file. Therefore, while it seems to pass, it actually fails because the "behavior" of readline() is to 'read the line from the file and retain the newline'. Correct?

Now, there's a question of whether this is important, and maybe there's no application in the real world for its use, except looking for "marker" as you mentioned.

I'm expanding my fileseek.py to include seeking forward, but I will NOT upload it... It's at the ad nauseum stage at htis point.
msg11907 (view) Author: Patrick Palczewski (psykiatris) Date: 2018-04-15.13:11:22
I apologize that this issue is becoming a very long thread. It dawned on me before I woke up this morning realizing that we may be dealing with the behavior of readline() and not the arithmetic, maybe. 

I did find a SeekTellTest in test_fileio, which helped me in drastically reducing my code for the read/readline seek test.

Now to clarify my point: The test.log file consists of 3 lines, ending with a '\n':

Line 1:  101 bytes
Line 2: 4101 bytes : 1 & 2 = 4202
Line 3: 101 bytes  : all lines = 4303

Now, read() reads all, so tell() will return 4303. But, since we limit it to 1024, it will mark position at 1024.

When readline() is called, what happens? It read in lines 1 & 2 and produced a position of 4202 when we called tell(). When we did the seek, it produced an incorrect result in Jython because line 1 (101) was omitted from the calculation. Python passed because line 1 was included.

We have 3 issues:
1. The math is incorrect. (the math passes in Python, fails in Jython)
2. Why did readline() read 2 lines, both of which ends with a newline? (fails in both Python and Jython)
3. Is it important?
My output from Jython:

--- Result of seekreadline() ---

readline() position->  101
Seek back amt-------> -100
New position-------->  200
Wrong position

--- Result of seekboth() ---

read() position------> 1024
readline() position--> 4202
Seek back amt--------> -100
New position---------> 4202
Wrong position
msg11908 (view) Author: Jeff Allen (jeff.allen) Date: 2018-04-15.17:03:12
You're right, SeekTellTest in test_fileio is a good place to start. seek() calls I found in test_file are incidental.

It sounds like you're homing in of some invalid behaviour where tell() is maybe returning only partial counts.

To answer an earlier question, yes, readline() consumes the end of line, if there is one, and appends it to the returned bytes. I put it that way carefully, because an end of line is not necessarily an end-of-line *character*. Trying this in CPython 2.7.14:

>>> binary = open('macbeth.txt', 'rb')
>>> while "wood" not in binary.readline(): p = binary.tell()
...
>>> p
55502L
>>> binary.seek(p)
>>> binary.readline()
'    Makes wing to the rooky wood:\r\n'


>>> text = open('macbeth.txt', 'r')
>>> while "wood" not in text.readline(): p = text.tell()
...
>>> p
55502L
>>> text.seek(p)
>>> text.readline()
'    Makes wing to the rooky wood:\n'


However, tell and seek are working in terms of the byte position at which the line was found, not a "character" offset in a stream where line endings have been converted.
msg11914 (view) Author: Patrick Palczewski (psykiatris) Date: 2018-04-18.18:45:41
I understand what youre saying.

I'm working on the SeekTellTest in test_fileio.py to add the results of 
read() and readline().  Patch will be coming.

On 04/15/2018 10:03 AM, Jeff Allen wrote:
> Jeff Allen <ja.py@farowl.co.uk> added the comment:
>
> You're right, SeekTellTest in test_fileio is a good place to start. seek() calls I found in test_file are incidental.
>
> It sounds like you're homing in of some invalid behaviour where tell() is maybe returning only partial counts.
>
>
msg11915 (view) Author: Patrick Palczewski (psykiatris) Date: 2018-04-19.05:26:52
I'm attaching a patch file appending read() and readline() tests to the SeekTellTest in test_fileio.py.

I had some very interesting results.
msg11916 (view) Author: Jeff Allen (jeff.allen) Date: 2018-04-19.06:56:54
This would be better as a distinct test, and maybe in test_file_jy. We always aim *not* to have a special Jython variant of a CPython test (under the same file name). We do here already, but it would be nice not to make it diverge further.

I think that earlier you demonstrated the bug only when the start position of the readline was not the start of the file. Using essentially your code and CPython 2.7.14 I get:

testSeekTell (__main__.AutoFileTests2) ... FAIL

======================================================================
FAIL: testSeekTell (__main__.AutoFileTests2)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "dist\Lib\test\test_fileio.py", line 63, in testSeekTell
    self.assertEqual(self.f.tell(), 31)
AssertionError: 10L != 31

The first line of the file is terminated by the \x0a byte, that was inserted by self.f.write(bytes(range(20))). If we keep the file, instead of tidying up, we can see (CPython):

>>> f = open('$test_1_tmp', 'r')
>>> f.readline()
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n'
>>> f.readline()
'\x0b\x0c\r\x0eJython is great\n'
>>> f.close()

I was surprised the \r is not also a line end, but that's a thing you have to ask for explicitly:
>>> f = open('$test_1_tmp', 'rU') # U = universal newlines mode
>>> f.readline()
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n'
>>> f.readline()
'\x0b\x0c\n'
>>> f.readline()
'\x0eJython is great\n'
>>> f.close()

We perhaps have a bone to pick with CPython. Since this byte is at index 10 in the file, and the return from readline includes that byte, I think Jython is correct to return 11 and CPython is off by one!
msg11917 (view) Author: Patrick Palczewski (psykiatris) Date: 2018-04-19.14:17:23
Ah, so the initial readline() call returned the correct byte position, and did not include the text string I added.So readline() works, however, the seek calls afterwards return incorrect positions and will fail.

I will look at test_file_jy and see about moving the test there.
msg11918 (view) Author: Patrick Palczewski (psykiatris) Date: 2018-04-19.15:35:29
I made the test pass, if I called seek(20) (to read in the text I appended). All other calculations were correct and the test passed. So, I started Jython and saw that if I called readline(), it read the first line of the file. Seeking realitively from what I thought the position would be, it actually started the seek from the END of the file, and not at the byte position of readline().

===============================
>>> f = open('seektest.log', 'wb')
>>> f.write(b'To seek or not to seek,\nThat is the question\n')
>>> f.close()
>>> f = open('seektest.log', 'rb')
>>> f.read()
'To seek or not to seek,\nThat is the question\n'
>>> f.tell()
45L            <--------- length of file
>>> f.readline()
''
>>> f.seek(0)
>>> f.readline()
'To seek or not to seek,\n'
>>> f.tell()
24L       <-------- byte position of first line
>>> f.seek(-10, 1)
>>> f.tell()
35L      <---- started seek from the end of the file, position 45 - 10
>>> f.seek(5, 1)
>>> f.tell()
40L

If I set the position at the start of line 2, it calculates correctly:

>>> f.seek(24)
>>> f.readline()
'That is the question\n'
>>> f.tell()
45L
>>> f.seek(-10, 1)
>>> f.tell()
35L

have I muddied up the issue, or what? ;/
msg11919 (view) Author: Patrick Palczewski (psykiatris) Date: 2018-04-19.17:53:16
More fun:

===========
Jython 2.7.2a1+ (default:623eaa3d7834, Apr 18 2018, 07:30:30) 
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.8.0_162
Type "help", "copyright", "credits" or "license" for more information.
>>> f = open('testseek.log', 'rb')
>>> f.read()
'To seek or not to seek,\nThat is the quewtion.\n'
>>> f.tell()
46L
>>> f.seek(-25, 1)
>>> f.tell()
21L
>>> f.read()
'k,\nThat is the quewtion.\n'
>>> f.seek(0)
>>> f.tell()
0L
>>> f.readline()
'To seek or not to seek,\n'
>>> f.tell()
24L
>>> f.seek(10)
>>> f.readline()
' not to seek,\n'
>>> f.seek(-5, 1)
>>> f.tell()
41L
>>> f.readline()
'ion.\n'
>>> f.readline(40)
''
>>> f.tell()
46L
>>> f.seek(0)
>>> f.tell()
0L
>>> f.readline(40)
'To seek or not to seek,\n'
>>> f.tell()
24L
>>> f.seek(-5, 1)
>>> f.tell()
41L
>>> 

yes, I've had too much coffee....
msg11920 (view) Author: Patrick Palczewski (psykiatris) Date: 2018-04-21.13:26:32
Uploading patch for test_file_jy.

It's clear that when readline is called, the tell() shows the byte position, however, when seek() is called, it begins at the end of the file and goes from there.
History
Date User Action Args
2018-04-21 13:26:33psykiatrissetfiles: + test_file_jy1.patch
messages: + msg11920
2018-04-19 17:53:17psykiatrissetmessages: + msg11919
2018-04-19 15:35:29psykiatrissetmessages: + msg11918
2018-04-19 14:17:24psykiatrissetmessages: + msg11917
2018-04-19 06:56:54jeff.allensetmessages: + msg11916
2018-04-19 05:35:06jeff.allensetfiles: - unnamed
2018-04-19 05:26:54psykiatrissetfiles: + test_fileio1.patch
messages: + msg11915
2018-04-18 18:45:41psykiatrissetfiles: + unnamed
messages: + msg11914
2018-04-15 17:03:13jeff.allensetmessages: + msg11908
2018-04-15 15:36:43jeff.allensetfiles: - test_fileseek.py
2018-04-15 15:35:38jeff.allensetfiles: - unnamed
2018-04-15 15:34:58jeff.allensetfiles: - test_fileseek_final.patch
2018-04-15 15:34:47jeff.allensetfiles: - test_fileseek2.patch
2018-04-15 15:34:40jeff.allensetfiles: - test_fileseek1.patch
2018-04-15 13:11:23psykiatrissetmessages: + msg11907
2018-04-14 16:12:26psykiatrissetmessages: + msg11906
2018-04-14 13:59:39psykiatrissetfiles: + unnamed
messages: + msg11904
2018-04-13 23:47:07psykiatrissetfiles: + test_fileseek.py
messages: + msg11903
2018-04-13 19:54:21jeff.allensetmessages: + msg11902
2018-04-13 17:29:12tuck182setnosy: - tuck182
2018-04-13 15:37:57psykiatrissetfiles: + test_fileseek_final.patch
messages: + msg11899
2018-04-11 18:14:36jeff.allensetnosy: + jeff.allen
2018-04-11 18:14:00jeff.allensetfiles: - unnamed
2018-04-11 18:13:48jeff.allensetfiles: - unnamed
2018-04-11 05:24:00psykiatrissetfiles: + unnamed, test_fileseek1.patch, test_fileseek2.patch
keywords: + patch
messages: + msg11890
2018-04-06 13:51:09psykiatrissetfiles: + test_fileseek.py
messages: + msg11882
2018-04-06 03:59:12psykiatrissetfiles: + test_fileseek.py
messages: + msg11880
2018-04-05 20:37:31psykiatrissetfiles: + unnamed
messages: + msg11879
2018-04-05 16:11:45psykiatrissetnosy: + psykiatris
messages: + msg11877
2016-01-15 15:33:38zyasoftsetmessages: + msg10655
2015-05-28 21:49:32amitcreate