Why was this regex calling substcont an excessive number of times?
This is more out of curiosity than anything else, as I'm failing to find any useful info on Google about this function (CORE::substcont)
In profiling and optimising some old, slow, XML parsing code I've found that the following regex is calling substcont 31 times for each time the line is executed, and taking a huge amount of time:
Calls: 10000 Time: 2.65s Sub calls: 320000 Time in subs: 1.15s
$handle =~s/(>)\s*(<)/$1\n$2/g;
# spent 1.09s making 310000 calls to main::CORE:substcont, avg 4µs/call
# spent 58.8ms making 10000 calls to main::CORE:subst, avg 6µs/call
Compared to the immediately preceding line:
Calls: 10000 Time: 371ms Sub calls: 30000 Time in subs: 221ms
$handle =~s/(.*)\s*(<\?)/$1\n$2/g;
# spent 136ms making 10000 calls to main::CORE:subst, avg 14µs/call
# spent 84.6ms making 20000 calls to main::CORE:substcont, avg 4µs/call
The number of substcont calls is quite surprising, especially seeing as I would've thought that the second regex would be more expensive. This is, obviously, why profiling is a Good Thing ;-)
I've subsequently changed both of these lines to remove the unnecessary backrefs, with dramatic results for the badly behaved line:
Calls:10000 Time: 393ms Sub calls: 10000 Time in subs: 341ms
$handle =~s/>\s*</>\n</g;
# spent 341ms making 10000 calls to main::CORE:subst, avg 34µs/call
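For anyone who wants to reproduce the comparison without a profiler, here is a minimal sketch. The sample input and the helper names with_captures and without_captures are made up for illustration, and the patterns assume the substitutions were meant to read \s* and \n. It checks that both forms produce the same output and then times 10000 runs of each; the absolute numbers will differ from the profile above.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical sample: 31 elements, so 30 whitespace-separated tag boundaries.
my $sample = join ' ', ('<a>x</a>') x 31;

# Original form: capture groups plus $1/$2 in the replacement.
sub with_captures {
    my ($s) = @_;
    $s =~ s/(>)\s*(<)/$1\n$2/g;
    return $s;
}

# Rewritten form: constant replacement, no capture variables.
sub without_captures {
    my ($s) = @_;
    $s =~ s/>\s*</>\n</g;
    return $s;
}

# Both forms should yield identical text.
die "outputs differ"
    unless with_captures($sample) eq without_captures($sample);

# Time 10000 executions of each form.
for my $case ([captures => \&with_captures], [constant => \&without_captures]) {
    my ($name, $code) = @$case;
    my $start = time;
    $code->($sample) for 1 .. 10_000;
    printf "%-8s %.3fs\n", $name, time - $start;
}
```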
substcont is Perl's internal name for the "substitution iterator", which has something to do with s///. Based on what little information I have, it seems substcont is triggered when doing a backref, that is, when $1 is present in the replacement. You can play with it a bit using B::Concise.
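The profile in the question shows 31 substcont calls per execution (310000 calls over 10000 executions). One plausible reading, which is an inference from those numbers rather than anything documented, is one call per match plus one extra call. The sketch below only counts matches in a made-up input so that the figure can be compared against a profiler run; count_matches and the sample string are hypothetical.

```perl
use strict;
use warnings;

# Count how many times the pattern matches under /g.
sub count_matches {
    my ($s) = @_;
    my $n = () = $s =~ />\s*</g;
    return $n;
}

# Hypothetical input: 31 elements, so 30 tag boundaries.
my $sample = join "\n", ('<item/>') x 31;
print count_matches($sample), " matches\n";
```

If a profiler reports 31 substcont calls for a string like this one, that would support the "matches plus one" reading.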
Here are the opcodes for a simple substitution without a backref.
$ perl -MO=Concise,-exec -we'$foo = "foo"; $foo =~ s/(foo)/bar/ig'
1 <0> enter
2 <;> nextstate(main 1 -e:1) v:{
3 <$> const[PV "foo"] s
4 <#> gvsv[*foo] s
5 <2> sassign vKS/2
6 <;> nextstate(main 1 -e:1) v:{
7 <#> gvsv[*foo] s
8 <$> const[PV "bar"] s
9 </> subst(/"(foo)"/) vKS
a <@> leave[1 ref] vKP/REFC
-e syntax OK
And one with.
$ perl -MO=Concise,-exec -we'$foo = "foo"; $foo =~ s/(foo)/$1/ig'
1 <0> enter
2 <;> nextstate(main 1 -e:1) v:{
3 <$> const[PV "foo"] s
4 <#> gvsv[*foo] s
5 <2> sassign vKS/2
6 <;> nextstate(main 1 -e:1) v:{
7 <#> gvsv[*foo] s
8 </> subst(/"(foo)"/ replstart->9) vKS
9 <#> gvsv[*1] s
a <|> substcont(other->8) sK/1
b <@> leave[1 ref] vKP/REFC
-e syntax OK
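The same comparison can be scripted. This is a rough sketch that uses B::Concise's compile and walk_output functions to render a sub's ops into a string and look for substcont; uses_substcont is a hypothetical helper, and the exact op listing may vary between Perl versions.

```perl
use strict;
use warnings;
use B::Concise ();

# Render the ops of a code ref in execution order and report
# whether a substcont op appears among them.
sub uses_substcont {
    my ($code) = @_;
    my $listing = '';
    B::Concise::walk_output(\$listing);       # capture the listing in a string
    B::Concise::compile('-exec', $code)->();  # render the ops
    return $listing =~ /\bsubstcont\b/ ? 1 : 0;
}

# Same two substitutions as above, wrapped in anonymous subs.
my $constant = sub { my $s = shift; $s =~ s/(foo)/bar/ig; $s };
my $backref  = sub { my $s = shift; $s =~ s/(foo)/$1/ig;  $s };

printf "constant replacement: %d\n", uses_substcont($constant);
printf "replacement with \$1:  %d\n", uses_substcont($backref);
```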
That's all I can offer. You may want to try Rx, mjd's old regex debugger.