Repeating numbered capture groups in Perl
Imagine I'm trying to parse the following html using Perl regex:
<h4>test</h4> <p>num1</p> <p>num2</p> <p>num3</p>
<h4>test</h4> <p>num1</p> <p>num2</p> <p>num3</p> <p>num4</p>
using the following regular expression:
<h4>([\w\s]*)</h4>(?:<p>([\w\s]+)</p>)+
How would the numbered groups be structured in Perl? $1 would obviously contain the <h4> tag text, but when the capture groups repeat, are the captured <p> tags then sent to $2, $3, and $4? Is there a good way to capture all the <p> tags in an array? Is this even something Perl supports? Or am I forced to write a single regex for the <h4>, then another for the <p>'s?
(I'm aware I could use HTML::Tree or something similar to parse the HTML, but this is just a simplified example I'm using to help describe the question. I'm really only interested in how repeated numbered capture groups work in Perl.)
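When you repeat a capturing group, only the text matched by its last iteration is stored for that group. The group keeps its single number, so the earlier iterations are overwritten rather than sent to new variables.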
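A minimal sketch of this behavior, using the first sample line from the question. The spaces between tags are removed here so that the question's pattern matches as written, and the names `$html` and `$re` are only illustrative:

```perl
use strict;
use warnings;

# Sample input from the question, with inter-tag whitespace removed.
my $html = '<h4>test</h4><p>num1</p><p>num2</p><p>num3</p>';

# The pattern from the question: group 2 sits inside a repeated group.
my $re = qr{<h4>([\w\s]*)</h4>(?:<p>([\w\s]+)</p>)+};

if ($html =~ $re) {
    print "\$1 = $1\n";   # text of the <h4> tag
    print "\$2 = $2\n";   # only the final repetition of group 2 survives
}
```

The pattern has only two capturing groups, so $3 and $4 are never set no matter how many <p> tags are matched.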
If you want to get each match from a repeating group, you could use a global substitution with a callback (s///ge in Perl) or iterate through the matches one by one.
Most languages also have a "match all" operation; in Perl this is a match with the /g modifier in list context. It usually stores all matches in an array for you, but a repeated group inside the pattern still holds only its last match.
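One possible way to get every <p> into an array is a two-step match: capture the heading and the whole run of paragraphs first, then pull the individual paragraphs out of that run with a /g match in list context. This is only a sketch, assuming each heading is followed directly by its paragraphs. The names `$html`, `$heading`, `$body`, and `@paragraphs` are illustrative, and `\s*` is added to tolerate the spaces between tags in the sample input:

```perl
use strict;
use warnings;

# Second sample line from the question, unchanged.
my $html = '<h4>test</h4> <p>num1</p> <p>num2</p> <p>num3</p> <p>num4</p>';

# Step 1: capture the heading text and the entire run of <p> elements.
my ($heading, $body) =
    $html =~ m{<h4>([\w\s]*)</h4>((?:\s*<p>[\w\s]+</p>)+)};

# Step 2: /g in list context returns the capture from every match.
my @paragraphs = defined $body ? $body =~ m{<p>([\w\s]+)</p>}g : ();

print "$heading: @paragraphs\n" if defined $heading;
```

To handle the matches one at a time instead, the same pattern can be used in scalar context, as in `while ($body =~ m{<p>([\w\s]+)</p>}g) { ... }`, where $1 holds the current paragraph on each pass.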