In relative terms, how fast should TCL on Windows 10 be?
I have the latest TCL build from ActiveState installed on a desktop and a laptop, both running Windows 10. I'm new to TCL and a novice developer; my reason for learning TCL is to enhance my value on the F5 platform. I figured a good first step would be to take the occasional work I do in VBScript and port it to TCL. Learning the language itself is coming along alright, but I'm worried my project isn't viable due to performance: my VBScripts absolutely destroy my TCL scripts. I didn't expect that outcome, as my understanding was that TCL is "fast" and that's why F5 chose it for iRules and so on.
So the question is, am I doing something wrong? Is the port for Windows just not quite there? Perhaps I misunderstood the way in which TCL is fast and it's not fast for file parsing applications?
My test application is a firewall log parser: take a log with 6 million hits, find the unique src/dst/port/policy entries and count them, split into accept and deny. Opening the file and reading the lines is fine; TCL processes 18k lines/second while VBScript does 11k. As soon as I do anything with the data, the tide turns. I need to break the four pieces of data noted above out of each line read and put them in an array. I've tried "split" on the line followed by a for-next loop to read and match each part; that's the slowest. I've tried a regexp with subvariables that extracts all four elements in a single command, and that's much faster, but still twice as slow as doing four regexps with a single variable each and then cleaning the excess data from the match away with trims. But even that method is four times slower than VBScript with ad-hoc splits/for-next matching and trims. On my desktop, I get 7k lines/second with TCL and 25k with VBScript.
Then there's the array. I assume that because my "3-dimensional" array isn't a real array, searching through 3x as many entries is slowing it down, so I may try breaking the array up so each search only looks through a third of the current data. But the truth is, by the time the script gets to the point where there are a couple hundred entries in the array, it has dropped from processing 7k lines/second to less than 2k. My VBScript only drops from about 25k lines/second to 22k. So I don't see much hope.
I guess what I'm looking for in an answer, from those with TCL experience and general programming experience, is this: is TCL natively slower than VBScript and other scripting languages for what I'm doing? Is it the Windows port that's slowing it down? What kinds of applications is TCL "fast" at, or good at? If I need to try a different kind of project than reading and manipulating data from files, I'm open to that.
edited to add code examples as requested:
while { [gets $infile line] >= 0 } {
    # ... some other commands cut out for the sake of space; they don't contribute to the slowness ...

    # this combined regexp was unexpectedly slow:
    regexp {srcip=(.*)srcport.*dstip=(.*)dstport=(.*)dstint.*policyid=(.*)dstcount} $line -> srcip dstip dstport policyid

    # the fastest way to extract the data I've found so far:
    regexp {srcip=(.*)srcport} $line srcip
    set srcip [string trim $srcip "cdiloprsty="]
    regexp {dstip=(.*)dstport} $line dstip
    set dstip [string trim $dstip "cdiloprsty="]
    regexp {dstport=(.*)dstint} $line dstport
    set dstport [string trim $dstport "cdiloprsty="]
    regexp {policyid=(.*)dstcount} $line a policyid
    set policyid [string trim $policyid "cdiloprsty="]
Here is the array search that really bogs down after a while:
set start [array startsearch uList]
while {[array anymore uList $start]} {
    incr f
    # "key" is the NAME of the association and uList(key) the VALUE associated with that name
    set key [array nextelement uList $start]
    if {$uCheck == $uList($key)} {
        ##puts "$key CONDITION MET"
        set flag true
        adduList $uCheck $key $flag2
        set flag2 false
        break
    }
}
Your question is still a bit broad in scope.
F5 has published some comments on why they chose Tcl and how it is fast for their specific use cases. That is actually quite different from a log-parsing use case: they do all the heavy lifting in C code (via custom commands) and use Tcl mostly as a fast dispatcher and for a bit of flow control. Tcl is really good at that compared to various other languages.
For things like log parsing, Tcl is often beaten in performance by languages like Python and Perl in simple benchmarks. There are a variety of reasons for that; one of them is that Tcl's channel layer does quite a bit of work on every gets (encoding conversion, end-of-line translation, buffering), where other languages often just work on raw bytes. Some of that can be tuned per channel with fconfigure.
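A minimal sketch of that tuning (the file name, encoding and buffer size here are assumptions about your log, adjust them to match it):

set infile [open "firewall.log" r]
# an explicit single-byte encoding and a larger buffer mean less conversion
# work and fewer system reads per [gets]
fconfigure $infile -encoding iso8859-1 -buffering full -buffersize 262144
while {[gets $infile line] >= 0} {
    # ... parse $line ...
}
close $infile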
So how to get your code fast(er)?

- .* patterns are bad for performance; use string commands instead of regexp where you can. A couple of string first calls followed by string range could be faster than a regexp for these simple patterns.
- Use a dict (or some form of nested list) instead of the array you are scanning entry by entry.
- Put the code in a proc, do not put it all in a toplevel script, and use local variables instead of globals to make the bytecode faster.
- If you want, use one thread for reading lines from the file and multiple threads for extracting the data, like a typical producer-consumer pattern.

A rough sketch combining the string-command and dict points is below.
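This sketch is only an illustration, not code from the original answer. It assumes each log line carries space-separated srcip=, dstip=, dstport= and policyid= fields (as the regexps above suggest), and the proc names are made up:

proc getfield {line name} {
    # return the value of "name=value" in $line, or "" if the field is missing
    set start [string first "$name=" $line]
    if {$start < 0} { return "" }
    set start [expr {$start + [string length "$name="]}]
    set end [string first " " $line $start]
    if {$end < 0} { set end [string length $line] }
    return [string range $line $start [expr {$end - 1}]]
}

proc parselog {path} {
    set counts [dict create]    ;# key: {src dst port policy}, value: hit count
    set infile [open $path r]
    while {[gets $infile line] >= 0} {
        set srcip    [getfield $line srcip]
        set dstip    [getfield $line dstip]
        set dstport  [getfield $line dstport]
        set policyid [getfield $line policyid]
        # dict incr is a hash update, so there is no linear search through existing entries
        dict incr counts [list $srcip $dstip $dstport $policyid]
    }
    close $infile
    return $counts
}

Because the dict lookup is a hash operation, the per-line cost stays roughly flat as the number of unique src/dst/port/policy combinations grows, instead of degrading the way the array startsearch loop does. To keep the accept/deny split you could keep two dicts, or add the action as a fifth element of the key.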