Open/read command in Tcl 8.5 for large files
Sorry if the title doesn't match my question well; I'm still not sure how to put it.
Anyway, I've been using Tcl/Tk on Windows (wish) for a while now and hadn't run into any problems with the script I wrote until recently. The script is supposed to break a large text file down into smaller files that can be imported into Excel (I'm talking about a file with maybe 25M lines, which comes to around 2.55 GB).
My current script is something like this:
set data [open "file.txt" r]
set data1 [open "File Part1.txt" w]
set data2 [open "File Part2.txt" w]
set data3 [open "File Part3.txt" w]
set data4 [open "File Part4.txt" w]
set data5 [open "File Part5.txt" w]
set count 0
while {[gets $data line] != -1} {
    if {$count > 4000000} {
        puts $data5 $line
    } elseif {$count > 3000000} {
        puts $data4 $line
    } elseif {$count > 2000000} {
        puts $data3 $line
    } elseif {$count > 1000000} {
        puts $data2 $line
    } else {
        puts $data1 $line
    }
    incr count
}
close $data
close $data1
close $data2
close $data3
close $data4
close $data5
And I alter the numbers within the if to get the desired number of lines per file, or add/remove any elseif where required.
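For illustration, the hand-edited if/elseif chain could be replaced by a loop that opens a new part file every N lines. This is only a minimal sketch: the proc name splitFile, its parameters and the output naming scheme are assumptions made for the example, not part of the original script.

```tcl
# Hypothetical helper (not from the original script): copy inFile into
# numbered part files, each holding at most linesPerPart lines.
proc splitFile {inFile prefix linesPerPart} {
    set in [open $inFile r]
    set part 0
    set count 0
    set out ""
    while {[gets $in line] >= 0} {
        # Start a new part file every linesPerPart lines.
        if {$count % $linesPerPart == 0} {
            if {$out ne ""} { close $out }
            incr part
            set out [open "$prefix$part.txt" w]
        }
        puts $out $line
        incr count
    }
    if {$out ne ""} { close $out }
    close $in
    # Return how many part files were written.
    return $part
}
```

Under these assumptions, a call such as splitFile "file.txt" "File Part" 1000000 would write "File Part1.txt", "File Part2.txt" and so on, without fixing the number of output files in advance.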
The problem is that with the latest file I got, I end up with only about half the data (1.22 GB instead of 2.55 GB), and I was wondering whether there is a setting that tells Tcl to ignore whatever limit it has on how much it can read. I tried to look for one, but I didn't find anything (or anything that I could understand well; I'm still quite the amateur at Tcl ^^;). Can anyone help me?
EDIT (update): I found a program to open large text files and managed to get a preview of the contents of the file directly. There are actually 16,756,263 lines. I changed the script to:
set data [open "file.txt" r]
set data1 [open "File Part1.txt" w]
set count 0
while {[gets $data line] != -1} {
    incr count
}
puts $data1 $count
close $data
close $data1
to find out where the script stops, and it stopped here:
There's a character in the middle line that the text editor doesn't recognise, showing as a little square. I tried to use fconfigure like evil otto suggested, but I'm afraid I don't quite understand how the channelID, name or value work exactly to escape that character. Um... help?
reEDIT: I managed to find out how fconfigure worked! Thanks evil otto! Um, I'm not sure how I can 'choose' your answer since it's a comment instead of a proper answer...
Is it possible there is any binary data in "file.txt"? Under Windows, Tcl will flag EOF if it reads a ^Z (the default eofchar) in a file. You can turn this off with fconfigure:
fconfigure $data -eofchar {}
See the docs for full details.
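To make the effect concrete, here is a small sketch that writes a test file containing a ^Z byte (\x1a) and counts its lines twice, once with ^Z set explicitly as the end-of-file character and once with it disabled. The file name and the countLines helper are made up for the example; setting -eofchar explicitly is just a way to reproduce the Windows behaviour described above on any platform.

```tcl
# Write a small test file with a ^Z (\x1a) byte between two lines.
set path "eofchar_demo.txt"   ;# hypothetical file name
set out [open $path w]
fconfigure $out -translation binary
puts -nonewline $out "first\n\x1a\nlast\n"
close $out

# Hypothetical helper: count the lines gets can read from path,
# using the given end-of-file character ({} means none).
proc countLines {path eofchar} {
    set in [open $path r]
    fconfigure $in -eofchar $eofchar
    set count 0
    while {[gets $in line] >= 0} {
        incr count
    }
    close $in
    return $count
}

set truncated [countLines $path "\x1a"]  ;# reading stops at the ^Z
set complete  [countLines $path {}]      ;# ^Z is read as ordinary data
puts "with ^Z as eofchar: $truncated lines, with eofchar disabled: $complete lines"
```

With ^Z as the end-of-file character, the loop should stop after the first line, which matches the symptom of a split that silently ends partway through the input.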
I ran your script on a Mac, which is Unix-based, and noticed the following:
incr count should be at the beginning of the loop, a minor point.