Python script appends to an existing file but stops after 8192 bytes

2018-06-10 23:09:37

I have a problem with a Python program. It's a simple program that writes prime numbers to a file. I wrote it to practice Python. Everything goes fine until several hundred lines are written. Then the writing stops, although the program continues to run until the highest number is reached.

I debugged the program and found that the write_prime function is called for each prime number, but it doesn't write to the file. I tested the program under Linux and under Windows 7, and the same problem occurs on both systems, although on Windows it writes fewer lines. The reason seems to be that writing stops after 8192 characters (1 block), and Windows uses two characters for end of line, so it takes fewer lines to fill a block.

I wrote a similar program, but in that version I only write to the file without reading it (I use a list to store the prime numbers and loop through the list instead of through the file). In that program I don't have this problem, so I think it has something to do with the fact that the program reads from and writes to the same file.

Who can help me?

Here is the output on both Linux and Windows:

Under Linux (Ubuntu 14.10) with 8000 as the highest number:

1;2;0

2;3;1

3;5;2

4;7;2

…

750;5693;4

751;5701;8

752;51007;7993;30

The block of 8192 bytes ends after position 5 at line 752. After that we see one more prime number: the 1007th prime number and the number itself is 7993.

Under Windows 7 with 8000 as the highest number:

The file starts with the same numbers 2, 3, 5, 7 etc. and ends with

…

689;5171;4

690;5179;8

691;511007;7993;30

So the file is about 60 lines shorter. It is 8,206 bytes. I think this is because Windows ends each line with '\r\n' (2 characters) and Linux ends each line with '\n' (1 character).

So in both cases writing ends after one block.

Here is the full program:

"""  
Calculate prime numbers  

written and tested with python 3.4.2  

- enter the highest number   
- calculate all prime numbers up and until the highest number  

It is possible to run the program more often. It checks out the highest prime  
number in file primenumber.lst and continues from there.  

The file primenumbers.lst consists of lines with 3 fields separated by ';':  
- counter  
- prime number  
- distance to the previous prime number  

"""

from os.path import isfile  

def write_prime(counter, prime, distance):  
    f.write('%d;%d;%d\n' % (counter, prime, distance))  
    return  

"""  
position the file at the last position with seek(0,2) (2 = last position  
in file) and go back with read(1) and seek(position,0) until you found the \n of  
the previous record. Then read the next line (the last line of the file) with  
readline() and split the line into the three fields.  
If the file was not found, then the program runs for the first time.  
In that case you write the prime numbers 2, 3 and 5 to the file.  
You write these three prime number, so we can skip the other numbers that end  
with 5 to save time, for those are not prime numbers.  
"""  

if isfile("primenumber.lst"):  
    f = open("primenumber.lst", "r")  
    f.seek(0,2)  
    position = f.tell() - 2  
    f.seek(position, 0)  

    while f.read(1) != '\n':  
        position -= 1  
        f.seek(position,0)  
    line = str(f.readline()).split(';')  
    print(line)  
    counter = int(line[0])  
    previous = int(line[1])  
else:  
    f = open("primenumber.lst", "w")  
    write_prime(1, 2, 0)  
    write_prime(2, 3, 1)  
    write_prime(3, 5, 2)  
    counter = 3  
    previous = 5  

f.close()  

print('The highest prime number so far is %s' % str(previous))  
startnumber = previous + 1  
highest = int(input("Highest number: "))  

"""  
Walk through all the numbers until the highest number (entered at the screen).  
Skip the numbers that end with 0, 2, 4, 5, 6 or 8, for those are no primes.  

Divide each number by all the prime numbers you found before, until you reach  
the root of the number.  
If the modulo of one division is 0, the number is no prime number and you  
break out of the 'for line in f' loop.  
If it is a prime number write it to the file and break out of the loop.  
"""  

f = open("primenumber.lst", "r+")  

for i in range(startnumber,highest + 1):      # loop to the highest number  
    if str(i)[-1] in ('1', '3', '7', '9'):  
        f.seek(0)  
        root = int(i ** 0.5)  
        for line in f:                       # read all primes in the file  
            line_str = line.split(';')  
            x = int(line_str[1])  

            if i % x == 0:   
                break                   
            if x > (root):   
                counter += 1  
                distance = i - previous  
                previous = i  
                write_prime(counter, i, distance)  
                f.flush()  
                break                 

f.close()

f = open("primenumber.lst", "r+")  

for i in range(startnumber,highest + 1):      # loop to the highest number  
    if str(i)[-1] in ('1', '3', '7', '9'):  
        f.seek(0)  
        root = int(i ** 0.5)  
        for line in f:                       # read *some* primes in the file  
            line_str = line.split(';')  
            x = int(line_str[1])  

            if i % x == 0:   
                break                   
            if x > (root):                   # shortcut when we reach the root
                counter += 1  
                distance = i - previous  
                previous = i  
                write_prime(counter, i, distance)  # No seek before write!
                f.flush()  
                break

This will start reading from the beginning of the file, until it encounters a line that either is a divisor (thus proving the new number is not prime) or is larger than the approximate (floating point) square root of the new number. In the latter case, it immediately writes the number.

Note that this means each number will be written not at the end of the file, but somewhere just past its square root. And once one of these overwritten lines parses as a large number, the same thing happens on every later iteration and the write position stops moving; you're only writing in the middle of the list.

In addition, I'm not sure you're even writing at a proper line position: you're reading with buffered text I/O, and you neither reach the end of the file nor seek before you switch to writing. So while the line iterator (for line in f) might logically be on line N, the actual file pointer might have been advanced to the end of a read-ahead buffer, at a multiple of the buffer size such as 4096 (or 8192, which would match the block size you observed).

The combination can explain your last line: 691;511007;7993;30 is actually a partially overwritten line, which is why it has too many fields. After 5179 we expect to see 5189, but only the portion up to 691;51 survives; the remainder, 1007;7993;30, comes from a much later iteration. While 7993 is indeed prime, many numbers have been overwritten, and eventually this line would be used to assume that any number under 511007**2 is also prime. At that point 511007 might be overwritten with an even larger number, and the file size will abruptly grow, but with incorrectly checked numbers.

The open() documentation suggests that even append mode ('a') does not behave identically on every platform, so explicitly seeking to the end of the file with SEEK_END before each write is probably the way to go.
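A minimal sketch of that seek-before-write idea, assuming the file is opened in "r+" mode as in the question. The helper name append_line and the throwaway file demo.lst are made up for illustration; they are not part of the original program.

```python
import os
import tempfile


def append_line(f, text):
    """Jump to the end of the file, then write one line there."""
    f.seek(0, os.SEEK_END)  # equivalent to f.seek(0, 2)
    f.write(text)
    f.flush()


# Throwaway file with three records in the question's format (illustrative data).
path = os.path.join(tempfile.mkdtemp(), "demo.lst")
with open(path, "w") as f:
    f.write("1;2;0\n2;3;1\n3;5;2\n")

with open(path, "r+") as f:
    f.seek(0)
    first = f.readline()       # read part of the file, as the prime check does
    append_line(f, "4;7;2\n")  # reposition explicitly before switching to writing

with open(path) as f:
    result = f.read().splitlines()

print(result)
```

The point of the helper is that the write position never depends on how far the buffered reader happened to read ahead.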

As a final flaw, what happens if the square root of the candidate number is higher than the last number in the file? (Admittedly, this should not occur as the square root is lower than the prior prime, per Bertrand's postulate.)

The structure of your program makes it horribly hard to read and understand. But I think the problem is that you are jumping around in the file and overwriting old answers. I can't really tell exactly where it happens (it is too hard to follow), but it seems to be caused by your f.seek(0) together with f.flush() in the loop that finds new primes. On top of this, your program is faulty when the file already exists: the file search you do does not work.

After restructuring your program to use the 'a' mode for appending, and to read the primes found so far correctly before starting to look for new ones, I get a working program that finds all the primes and writes them correctly to the file. It still needs a lot more work structure-wise, in my opinion.

So here goes

First off, your code that checks whether the prime file exists and, if so, sets the values of counter and previous is unnecessarily complex with its seek and tell calls, and it does not work. Just read all the lines in the file and take the values from the last one; there is no need to move the read pointer around.

if isfile("primenumber.lst"):
    # Read every line, then take counter and prime from the last one
    with open("primenumber.lst", "r") as f:
        lines = f.readlines()
    last = lines[-1].split(';')
    counter = int(last[0])
    previous = int(last[1])

Secondly, instead of reading the found primes back from the file on each iteration, read them in once and store them in a variable:

## Get found primes so far
f = open("primenumber.lst", "r")  
found_primes = []
for line in f:
    line_str = line.split(';')
    found_primes.append(int(line_str[1]))
f.close()

Then, using 'a' mode, start finding new primes and writing them to the file, updating the found_primes list as we go along:

f = open("primenumber.lst", "a")
for i in range(startnumber,highest + 1):
    if str(i)[-1] in ('1', '3', '7', '9'):
        root = int(i ** 0.5)
        for prime in found_primes:
            if i % prime == 0:
                break
            if prime > (root):
                counter += 1
                distance = i - previous
                previous = i
                write_prime(counter, i, distance)
                found_primes.append(i)
                break
f.close()

With these modifications I found all primes up to 20000, and they are written correctly to the file.
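For reference, the pieces above could be put together roughly as follows. This is only a sketch under a few assumptions: the function names load_primes, is_prime and extend_primes are invented here, the file path is a parameter instead of the hard-coded primenumber.lst, the integer test p * p > n replaces the floating-point square root, and every number is tested rather than skipping by last digit.

```python
import os
import tempfile
from os.path import isfile


def load_primes(path):
    """Return the primes already stored in the file (second field of each line)."""
    if not isfile(path):
        return []
    with open(path, "r") as f:
        return [int(line.split(';')[1]) for line in f if line.strip()]


def is_prime(n, primes):
    """Trial division of n by the known primes, stopping past sqrt(n)."""
    for p in primes:
        if p * p > n:
            return True
        if n % p == 0:
            return False
    return True  # no known prime divides n (also covers the empty list)


def extend_primes(path, highest):
    """Append all primes up to 'highest' that are not yet in the file."""
    primes = load_primes(path)
    start = primes[-1] + 1 if primes else 2
    with open(path, "a") as f:
        for n in range(start, highest + 1):
            if is_prime(n, primes):
                distance = n - primes[-1] if primes else 0
                primes.append(n)
                f.write('%d;%d;%d\n' % (len(primes), n, distance))
    return primes


# Two runs against a throwaway file: the second continues where the first stopped.
demo_path = os.path.join(tempfile.mkdtemp(), "primenumber.lst")
first_run = extend_primes(demo_path, 30)
second_run = extend_primes(demo_path, 50)
with open(demo_path) as f:
    demo_lines = f.read().splitlines()
print(demo_lines[-1])
```

Because the file is only ever read in full at startup and then appended to, reads and writes never share a file position, which is what caused the overwriting in the original.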
