Python script appends to an existing file but stops after 8192 bytes

2018-06-10 23:09:36

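I have a problem with a Python program. It is a simple program that writes prime numbers to a file; I wrote it to practice Python. Everything goes well until a few hundred lines have been written. After that, writing stops, although the program keeps running until it reaches the highest number.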

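I debugged the program and found that the write_prime function is called for every prime number, but nothing more is written to the file. I tested the program under Linux and under Windows 7, and the same problem occurs on both systems, although on Windows fewer lines are written. I think the reason is that writing stops after 8192 characters (one block), and Windows uses two characters for each line ending, so fewer lines are needed to fill the block.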

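I wrote a similar program, but in that version I only write to the file and never read from it (I use a list to store the primes and loop over the list instead of over the file). That program does not have this problem, so I think it has to do with the fact that this program reads from and writes to the same file.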

Can anyone help me?

Here is the output under Linux and under Windows:

Under Linux (Ubuntu 14.10), with 8000 as the highest number:

1;2;0
2;3;1
3;5;2
4;7;2
...
750;5693;4
751;5701;8
752;51007;7993;30

The 8192-byte block ends after position 5 of line 752. After that we see one more prime: the 1007th prime, and the number itself is 7993.

Under Windows 7, with 8000 as the highest number:

The file starts with the same numbers 2, 3, 5, 7, etc. and ends with:

...
689;5171;4
690;5179;8
691;511007;7993;30

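So this file is about 60 lines shorter. It is 8,206 bytes. I think this is because Windows ends each line with '\r\n' (2 characters), while Linux ends each line with '\n' (1 character).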

So in both cases, writing ends after one block.
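The "one block" hypothesis can be checked with a little arithmetic. The sketch below regenerates the lines the program should have written (using a plain trial-division generator, not the question's file-based loop) and reports on which line, and at which column, byte offset 8192 falls for each newline convention. The helper names `prime_lines` and `byte_position` are made up, and the sketch assumes the file is plain ASCII, so one character equals one byte.

```python
def prime_lines(highest):
    """Yield the 'counter;prime;distance' lines for all primes <= highest."""
    primes = []
    for n in range(2, highest + 1):
        is_prime = True
        for p in primes:
            if p * p > n:          # no divisor up to sqrt(n)
                break
            if n % p == 0:
                is_prime = False
                break
        if is_prime:
            distance = n - primes[-1] if primes else 0
            primes.append(n)
            yield "%d;%d;%d" % (len(primes), n, distance)


def byte_position(lines, offset=8192, newline="\n"):
    """Return (1-based line number, 0-based column) of the byte at `offset`.

    Assumes ASCII text, so len() of a line equals its size in bytes.
    Returns None if the text has no byte at `offset`.
    """
    start = 0
    for number, text in enumerate(lines, start=1):
        size = len(text) + len(newline)
        if start + size > offset:
            return number, offset - start
        start += size
    return None


lines = list(prime_lines(8000))
print("LF line endings:  ", byte_position(lines, newline="\n"))
print("CRLF line endings:", byte_position(lines, newline="\r\n"))
```

If the hypothesis holds, the first result should point into line 752 and the second into line 691, where the output above shows the damaged lines.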

Here is the complete program:

"""  
Calculate prime numbers  

written and tested with python 3.4.2  

- enter the highest number   
- calculate all prime numbers up to and including the highest number  

It is possible to run the program more than once. It looks up the highest prime  
number in the file primenumber.lst and continues from there.  

The file primenumber.lst consists of lines with 3 fields separated by ';':  
- counter  
- prime number  
- distance to the previous prime number  

"""

from os.path import isfile  

def write_prime(counter, prime, distance):  
    f.write('%d;%d;%d\n' % (counter, prime, distance))  
    return  

"""  
position the file at the last position with seek(0,2) (2 = last position  
in file) and go back with read(1) and seek(position,0) until you find the \n of  
the previous record. Then read the next line (the last line of the file) with  
readline() and split the line into the three fields.  
If the file was not found, then the program runs for the first time.  
In that case you write the prime numbers 2, 3 and 5 to the file.  
You write these three prime numbers so that the other numbers that end  
with 5 can be skipped to save time, since those are not prime numbers.  
"""  

if isfile("primenumber.lst"):  
    f = open("primenumber.lst", "r")  
    f.seek(0,2)  
    position = f.tell() - 2  
    f.seek(position, 0)  

    while f.read(1) != '\n':  
        position -= 1  
        f.seek(position,0)  
    line = str(f.readline()).split(';')  
    print(line)  
    counter = int(line[0])  
    previous = int(line[1])  
else:  
    f = open("primenumber.lst", "w")  
    write_prime(1, 2, 0)  
    write_prime(2, 3, 1)  
    write_prime(3, 5, 2)  
    counter = 3  
    previous = 5  

f.close()  

print('The highest prime number so far is %s' % str(previous))  
startnumber = previous + 1  
highest = int(input("Highest number: "))  

"""  
Walk through all the numbers until the highest number (entered at the screen).  
Skip the numbers that end with 0, 2, 4, 5, 6 or 8, since those are not primes.  

Divide each number by all the prime numbers you found before, until you reach  
the root of the number.  
If the remainder of one division is 0, the number is not a prime number and you  
break out of the 'for line in f' loop.  
If it is a prime number, write it to the file and break out of the loop.  
"""  

f = open("primenumber.lst", "r+")  

for i in range(startnumber,highest + 1):      # loop to the highest number  
    if str(i)[-1] in ('1', '3', '7', '9'):  
        f.seek(0)  
        root = int(i ** 0.5)  
        for line in f:                       # read all primes in the file  
            line_str = line.split(';')  
            x = int(line_str[1])  

            if i % x == 0:   
                break                   
            if x > (root):   
                counter += 1  
                distance = i - previous  
                previous = i  
                write_prime(counter, i, distance)  
                f.flush()  
                break                 

f.close()

Answer:

f = open("primenumber.lst", "r+")  

for i in range(startnumber,highest + 1):      # loop to the highest number  
    if str(i)[-1] in ('1', '3', '7', '9'):  
        f.seek(0)  
        root = int(i ** 0.5)  
        for line in f:                       # read *some* primes in the file  
            line_str = line.split(';')  
            x = int(line_str[1])  

            if i % x == 0:   
                break                   
            if x > (root):                   # shortcut when we reach the root
                counter += 1  
                distance = i - previous  
                previous = i  
                write_prime(counter, i, distance)  # No seek before write!
                f.flush()  
                break

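This reads from the start of the file until it meets a line that is either a divisor (proving that the new number is not prime) or greater than the approximate (floating-point) square root of the new number. In the latter case it writes the number immediately.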

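Note that this means each number is not written at the end, but somewhere just past its square root. Once one of those written numbers can be parsed as a large number, this behavior recurs and the position stops moving; you just keep writing into the middle of the list.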

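Also, I'm not sure you are even writing at the right position. You are reading buffered text I/O, and you neither reach the end nor seek before switching to writing. So while the current f.next() position may be at line N, the actual file pointer may have been rounded up to the read-ahead buffer size, such as 4096.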
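The read-ahead effect can be reproduced in isolation. This sketch (the file name and its contents are made up for the demonstration) reads one line from a file opened in "r+" mode by iterating over it, then writes a marker without seeking first, and reports the byte offset where the marker landed. In CPython the text layer reads ahead in chunks (8192 bytes by default, an implementation detail), so on CPython the marker typically lands at byte 8192 rather than right after the 10-byte first line.

```python
import os
import tempfile

# Hypothetical demo file: 2000 lines of 10 bytes each ("line 0000\n", ...).
path = os.path.join(tempfile.mkdtemp(), "demo.lst")
with open(path, "wb") as out:
    for n in range(2000):
        out.write(("line %04d\n" % n).encode("ascii"))

# newline="" turns off newline translation, so offsets are plain byte counts.
with open(path, "r+", newline="") as f:
    first = next(f)       # read one line, as the 'for line in f' loop does
    f.write("X\n")        # switch to writing without seeking first

with open(path, "rb") as check:
    data = check.read()

print("first line ends at byte:", len(first))
print("marker written at byte: ", data.index(b"X"))
```

The marker overwrites existing bytes instead of following the line that was just read, which is the same kind of damage seen in the question's output.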

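This combination could explain your last line: 691;511007;7993;30 is actually a partially overwritten line, which is why it has too many fields. After 5179 we would expect 5189, but only the beginning of that line, 691;51, survived; the remaining 1007;7993;30 comes from a much later iteration. Although 7993 is indeed prime, many numbers have been overwritten, and eventually this line will be used to conclude that any number below 511007**2 that is not divisible by the primes before it is also prime. At that point 511007 may in turn be overwritten by a larger number, and the file size will suddenly grow, but with incorrect numbers.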
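The splice and its consequence can be reproduced with plain strings. The sketch below glues the surviving start of line 691 to the last line written, then runs a copy of the question's inner loop (wrapped in a made-up function `loop_verdict`) against a tiny hypothetical file in which the corrupted line comes right after 2, 3, 5 and 7. The inflated second field makes the loop accept 121 = 11 * 11 as a prime.

```python
def loop_verdict(i, lines):
    """Mirror of the question's inner loop over the lines of the file."""
    root = int(i ** 0.5)
    for line in lines:
        x = int(line.split(';')[1])   # only the second field is used
        if i % x == 0:
            return "composite"
        if x > root:
            return "prime"            # the question writes i to the file here
    return "undecided"                # loop ran out: nothing is written

# Surviving start of line 691 followed by the last line written.
spliced = "691;51" + "1007;7993;30"
print(spliced.split(';'))             # -> ['691', '511007', '7993', '30']

# Hypothetical tiny file: the line that should hold 11 has been overwritten.
corrupted = ["1;2;0", "2;3;1", "3;5;2", "4;7;2", spliced]
print(loop_verdict(121, corrupted))   # 121 == 11 * 11, yet reported as prime
```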

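Even append mode is described as not portable in the open() documentation (on some Unix systems, all writes go to the end of the file regardless of the current seek position), so seeking to SEEK_END before writing may be the way to go.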
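A minimal sketch of the SEEK_END approach, assuming the file stays open in "r+" mode as in the question: seek to the end right before each write, so the write position no longer depends on how far the text layer has read ahead. The helper `append_line` and the demo file are hypothetical.

```python
import os
import tempfile

def append_line(f, text):
    """Hypothetical helper: append one line to a file opened in 'r+' mode."""
    f.seek(0, os.SEEK_END)   # seek(0, SEEK_END) is allowed on text files
    f.write(text + "\n")
    f.flush()

# Hypothetical demo file: 2000 lines of 10 bytes each.
path = os.path.join(tempfile.mkdtemp(), "demo.lst")
with open(path, "w", newline="") as out:
    out.writelines("line %04d\n" % n for n in range(2000))

with open(path, "r+", newline="") as f:
    for line in f:           # start reading, as the question's loop does
        break                # ...and stop early, leaving read-ahead data behind
    append_line(f, "X")

with open(path, "rb") as check:
    data = check.read()
print(len(data), data[-12:])
```

The next `f.seek(0)` in the question's outer loop would still restart the reads from the beginning of the file, so only the write path changes.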

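As a final flaw: what happens if the square root of a candidate number is higher than the last number in the file? In that case the inner loop runs out of lines without writing anything, so the candidate is silently skipped. (Admittedly this shouldn't happen, because by Bertrand's postulate the square root is below the previous prime.)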
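The parenthetical argument can be checked numerically. If P is the largest prime in the file, any candidate i below P*P meets either a divisor or a prime larger than sqrt(i) before the lines run out, and by Bertrand's postulate the next prime q after P satisfies q < 2P <= P*P for P >= 2. The sketch below (a standard sieve of Eratosthenes, with a made-up helper name `primes_up_to`) confirms q < P*P for every pair of consecutive primes up to 100000.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            # Cross out multiples of p, starting at p * p.
            sieve[p * p::p] = [False] * len(range(p * p, n + 1, p))
    return [p for p, is_p in enumerate(sieve) if is_p]

primes = primes_up_to(100000)
pairs = list(zip(primes, primes[1:]))

# Bertrand's postulate in the form used above: the next prime q is below 2P.
assert all(q < 2 * p for p, q in pairs)
# Hence every candidate up to q is below P*P, so the loop never runs out.
assert all(q < p * p for p, q in pairs)
print("checked", len(pairs), "pairs of consecutive primes")
```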

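Another answer: The structure of the program makes it hard to read and to spot problems. But I think the issue is that you jump around in the file and overwrite old answers. I can't say exactly where it happens (it is too hard to follow), but it seems to be the behavior of f.seek(0) and f.flush() inside the loop that finds new primes. Also, the file seeking you do when the file already exists does not work.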

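Refactoring the program to append to the file with 'a' mode, and reading the file properly to get the primes found so far before starting to look for new ones, gives a nicely working program that finds all primes and writes them correctly to the file, although it still needs more work architecture-wise.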

So here goes:

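First, the code that checks whether the primes file exists and, if so, sets counter and previous is needlessly complicated with its seek and tell calls, and it doesn't work. Just read all lines of the file and take the values from the last line; there is no need to move the read pointer around.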

if isfile("primenumber.lst"):  
    with open("primenumber.lst", "r") as f:   # closes the file when done
        lines = f.readlines()
    last = lines[-1].split(';')                # fields of the last line
    counter = int(last[0])
    previous = int(last[1])

Second, instead of reading the found primes back from the file on every iteration, read them once and store them in a variable:

## Get found primes so far
f = open("primenumber.lst", "r")  
found_primes = []
for line in f:
    line_str = line.split(';')
    found_primes.append(int(line_str[1]))
f.close()

Then open the file in 'a' mode, look for new primes and write them to the file, updating the found_primes variable as we go:

f = open("primenumber.lst", "a")
for i in range(startnumber,highest + 1):
    if str(i)[-1] in ('1', '3', '7', '9'):
        root = int(i ** 0.5)
        for prime in found_primes:
            if i % prime == 0:
                break
            if prime > (root):
                counter += 1
                distance = i - previous
                previous = i
                write_prime(counter, i, distance)
                found_primes.append(i)
                break
f.close()
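For reference, the three pieces above can be combined into one self-contained function. This is a sketch under a few assumptions: the name `extend_prime_file` and its parameters are made up, the output file is used directly instead of through a global `f` in write_prime, and it keeps the original skip of candidates ending in 0, 2, 4, 5, 6 or 8.

```python
import os

def extend_prime_file(path, highest):
    """Append all primes up to `highest` to `path` as 'counter;prime;distance' lines."""
    rows = []
    if os.path.isfile(path):
        # Read the primes found so far once, instead of on every iteration.
        with open(path) as f:
            rows = [line.split(';') for line in f if line.strip()]
    found_primes = [int(row[1]) for row in rows]

    with open(path, "a") as out:               # append mode: writes go to the end
        if rows:
            counter, previous = int(rows[-1][0]), int(rows[-1][1])
        else:
            # First run: write 2, 3 and 5 so candidates ending in 5 can be skipped.
            out.write("1;2;0\n2;3;1\n3;5;2\n")
            found_primes, counter, previous = [2, 3, 5], 3, 5

        for i in range(previous + 1, highest + 1):
            if str(i)[-1] not in "1379":
                continue
            root = int(i ** 0.5)
            for prime in found_primes:
                if i % prime == 0:
                    break                      # composite: stop dividing
                if prime > root:
                    counter += 1
                    out.write("%d;%d;%d\n" % (counter, i, i - previous))
                    previous = i
                    found_primes.append(i)     # safe: we break right after
                    break
```

Calling it first with 100 and then with 200 should leave one line per prime up to 200 in the file, with the second call continuing where the first one stopped.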

With these modifications I found all primes up to 20000, and they were written correctly to the file.

Link: http://www.djcxy.com/p/31545.html
