Repeating numbered capture groups in Perl

Imagine I'm trying to parse the following HTML using a Perl regex:

<h4>test</h4> <p>num1</p> <p>num2</p> <p>num3</p>
<h4>test</h4> <p>num1</p> <p>num2</p> <p>num3</p> <p>num4</p>

using the following regular expression:

<h4>([\w\s]*)</h4>(?:<p>([\w\s]+)</p>)+

How would the numbered groups be structured in Perl? $1 would obviously contain the <h4> tag text, but when the capture group repeats, are the captured <p> tags then sent to $2, $3, and $4? Is there a good way to capture all the <p> tags in an array? Is this even something Perl supports? Or am I forced to write one regex for the <h4>, then another for the <p> tags?

(I'm aware I could use HTML::Tree or something similar to parse the HTML, but this is just a simplified example to help describe the question. I'm really only interested in how repeated numbered capture groups work in Perl.)


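When you repeat a capturing group, only the last iteration's match is kept. With the pattern above, $2 ends up holding just the text of the final <p> tag; $3 and $4 are never set, because the pattern contains only two capturing groups.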
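To make that concrete, here is a minimal sketch that runs the question's pattern against the first sample line. It assumes the spaces between the tags have been removed so the pattern can match as written; the variable names are only for illustration.

```perl
use strict;
use warnings;

# Sample input from the question, with the spaces between tags removed
# (an assumption, so that the question's pattern matches as written).
my $html = '<h4>test</h4><p>num1</p><p>num2</p><p>num3</p>';

my ($heading, $last_p);
if ($html =~ m{<h4>([\w\s]*)</h4>(?:<p>([\w\s]+)</p>)+}) {
    # Group 2 is overwritten on every repetition; only the final value survives.
    ($heading, $last_p) = ($1, $2);
    print "heading: $heading\n";
    print "last p:  $last_p\n";
}
```

Here $heading is "test" and $last_p is "num3"; "num1" and "num2" matched but were not kept.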

If you want to get each match from a repeating group, you could use a substitution with a callback (in Perl, s///e with the /g modifier) or iterate through the matches one by one (for example, a while loop over a /g match).
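One way to iterate, sketched under a few assumptions: capture the whole run of <p> elements as a single group first, then loop over that substring with a /g match. The \s* between tags is added to tolerate the spaces in the sample data, and $title, $p_run and @paragraphs are illustrative names.

```perl
use strict;
use warnings;

my $html = '<h4>test</h4> <p>num1</p> <p>num2</p> <p>num3</p>';

my ($title, @paragraphs);
# Step 1: capture the heading and the entire run of <p> elements as one group.
if ($html =~ m{<h4>([\w\s]*)</h4>((?:\s*<p>[\w\s]+</p>)+)}) {
    $title = $1;
    my $p_run = $2;
    # Step 2: in scalar context, each /g match resumes where the previous one
    # ended, so the loop visits every <p> element in turn.
    while ($p_run =~ m{<p>([\w\s]+)</p>}g) {
        push @paragraphs, $1;
    }
}
print "$title: @paragraphs\n";
```

Because the second pass only looks at the text captured by the first, the <p> tags collected in @paragraphs stay tied to their own <h4> heading.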

Most languages also have a "match all". In Perl, that is a /g match evaluated in list context, which returns all matches as a list you can assign to an array. A repeated group inside a single match is still stored only as its last matched iteration.
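In Perl, a "match all" is usually written as a /g match in list context. A small sketch using the second sample line (the array name @all_p is illustrative):

```perl
use strict;
use warnings;

my $html = '<h4>test</h4> <p>num1</p> <p>num2</p> <p>num3</p> <p>num4</p>';

# In list context, a /g match returns the captures from every match it finds.
my @all_p = $html =~ m{<p>([\w\s]+)</p>}g;
print scalar(@all_p), " paragraphs: @all_p\n";
```

Note that this scans the whole string, so with several <h4> blocks in the input it would collect every <p> tag regardless of which heading it follows.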
