line regex greedy group
I'm attempting to parse the following example text in Python:
Foo 1
foo1Text
Bar
bar1Text
Baz
baz1Text
Foo 2
foo2Text
Bar
bar2Text
Baz
baz2Text
# and so on up to Foo/Bar/Baz N
Now, the regex I'm using is:
([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*?)
Now, what I'd like to do is lift out the text relevant to foo / bar / baz. However, with the lazy quantifier (?) on the end of the regex, the expression stops short and misses the baz2Text. Conversely, making it greedy matches everything else as part of the last group.
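To see both failure modes concretely, here is a small Python sketch. It assumes the sample text trimmed to two blocks and the pattern above with its backslashes in place; the variable names are just illustrative.

```python
import re

# Sample input: two Foo/Bar/Baz blocks, as in the question.
text = """Foo 1
foo1Text
Bar
bar1Text
Baz
baz1Text
Foo 2
foo2Text
Bar
bar2Text
Baz
baz2Text"""

# The question's pattern with a lazy final group, and the same pattern made greedy.
lazy = re.compile(r"([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*?)")
greedy = re.compile(r"([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*)")

# Lazy: nothing follows the last group, so the empty string satisfies it.
lazy_tails = [m.group(5) for m in lazy.finditer(text)]

# Greedy: the first match's last group runs to the end of the input,
# swallowing the whole second block, so there is only one match.
greedy_tails = [m.group(5) for m in greedy.finditer(text)]

print(lazy_tails)    # ['', '']
print(greedy_tails)  # ['\nbaz1Text\nFoo 2\nfoo2Text\nBar\nbar2Text\nBaz\nbaz2Text']
```

A side effect of the lazy version: because its first match ends right after the first Baz, the second match starts at baz1Text, so group 1 of that match is 'baz1Text' rather than 'Foo 2'.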
I'd prefer not to use a numeric quantifier if possible, and to broadly match things based on:
{title}
{stuff about title}
Bar
{stuff about Bar}
Baz
{stuff about Baz}
That way I can iterate through each match and extract groups accordingly. Please note, I've not phrased this around extracting concrete output. I'm mostly interested in getting the regex groups to represent: {title}, {stuff about title}, {stuff about Bar}, {stuff about Baz}.
I've been putzing around with regex101 trying to find the right incantation, to no avail.
This is one of those problems where it's easy enough to do manually, but then I wouldn't learn anything! :) I'd love to know if there's a cleaner mechanism or strategy I should be using here.
Thanks much
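If you know that Foo is the next group after Baz, then what you need is a lookahead:
([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*?)(?=Foo)
Note that the last block has no Foo after it, so this pattern won't match it; ending with (?=Foo|\Z) instead also accepts the end of the input.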
A lookahead is a zero-width assertion: it checks that Foo comes next without consuming it or moving the current position, so the lazy group stops right before Foo and the next match can start there.
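Here is a minimal Python sketch of the lookahead approach, assuming every block's title line starts with the literal Foo. It rewrites the pattern with named groups (the names title, title_text, bar_text and baz_text are illustrative, and the newline run after the title is no longer captured), ends with (?=Foo|\Z) so the final block is matched too, and compares against the (?=Foo)-only version.

```python
import re

# Sample input: two Foo/Bar/Baz blocks, as in the question.
text = """Foo 1
foo1Text
Bar
bar1Text
Baz
baz1Text
Foo 2
foo2Text
Bar
bar2Text
Baz
baz2Text"""

# The answer's pattern, rewritten with named groups (the names are illustrative).
# The lazy baz_text group grows until the zero-width lookahead sees "Foo" or the
# end of the input (\Z); the "Foo" itself is left for the next match.
pattern = re.compile(
    r"(?P<title>[\S ]+)\n*"
    r"(?P<title_text>[\s\S]*?)Bar"
    r"(?P<bar_text>[\s\S]*?)Baz"
    r"(?P<baz_text>[\s\S]*?)(?=Foo|\Z)"
)

blocks = []
for m in pattern.finditer(text):
    # strip() removes the newlines surrounding each captured chunk.
    blocks.append({name: value.strip() for name, value in m.groupdict().items()})

for block in blocks:
    print(block)

# For comparison: with only (?=Foo), the final block has no "Foo" after it,
# so it is never matched.
only_foo = re.compile(r"([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*?)(?=Foo)")
print(len(only_foo.findall(text)))  # 1
```

On the two-block sample this yields two dicts, the second being {'title': 'Foo 2', 'title_text': 'foo2Text', 'bar_text': 'bar2Text', 'baz_text': 'baz2Text'}. If titles don't always start with Foo, the lookahead has to describe whatever does start a block.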