line regex greedy group
I'm attempting to parse the following example text in Python:
Foo 1
foo1Text
Bar
bar1Text
Baz
baz1Text
Foo 2
foo2Text
Bar
bar2Text
Baz
baz2Text
# and so on up to Foo/Bar/Baz N
Now, the regex I'm using is:
([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*?)
What I'd like to do is lift out the text relevant to Foo / Bar / Baz. However, with the lazy quantifier ? on the end of the regex, the final group matches as little as possible, so the expression stops short and misses baz2Text. Conversely, making it greedy matches everything else as part of the last group.
I'd prefer not to use a numeric quantifier if possible, and instead broadly match things based on:
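To make the two failure modes concrete, here is a minimal sketch. It assumes the pattern is meant to use the escapes \S, \n and \s, and it uses a two-record sample; the variable names are only illustrative.

```python
import re

# Two records in the layout shown in the question.
text = (
    "Foo 1\nfoo1Text\nBar\nbar1Text\nBaz\nbaz1Text\n"
    "Foo 2\nfoo2Text\nBar\nbar2Text\nBaz\nbaz2Text\n"
)

# Trailing group is lazy: it is allowed to match nothing, so it does.
lazy = re.compile(r"([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*?)")
# Trailing group is greedy: it runs to the end of the input.
greedy = re.compile(r"([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*)")

lazy_matches = lazy.findall(text)
greedy_matches = greedy.findall(text)

# Every lazy match ends right after "Baz", so the last group is empty.
print([m[4] for m in lazy_matches])
# The single greedy match swallows the whole second record.
print(len(greedy_matches), "Foo 2" in greedy_matches[0][4])
```

Neither variant has anything that tells the last group where a record ends, which is the missing piece.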
{title}
{stuff about title}
Bar
{stuff about Bar}
Baz
{stuff about Baz}
So I can iterate through each match and extract the groups accordingly. Please note, I've not phrased this around extracting concrete output. I'm mostly interested in getting the regex groups to represent: {title}, {stuff about title}, {stuff about Bar}, {stuff about Baz}
I was putzing around with regex101 to see if I could determine the right incantation, to no avail.
This is one of those problems where it's easy enough to do manually. But then I wouldn't learn anything! :) I'd love to know if there's some cleaner mechanism / strategy I should be using here.
Thanks much
If you know that Foo is the next group after Baz, then what you need is a lookahead: ([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*?)(?=Foo)
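Lookaheads are zero-width assertions: they require that the given pattern matches immediately after the current position, but they do not consume any input, so the next match can start at that Foo. Note that the final record has no Foo after it, so you will probably also want to accept the end of the input, for example with (?=Foo|\Z).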
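A minimal Python sketch of the lookahead approach follows. The sample text and the dictionary keys are illustrative, and the |\Z alternative is an addition here so that the final record, which has no Foo after it, still matches. It also assumes the literal word Foo never appears inside the body text.

```python
import re

# Two records in the layout shown in the question.
text = (
    "Foo 1\nfoo1Text\nBar\nbar1Text\nBaz\nbaz1Text\n"
    "Foo 2\nfoo2Text\nBar\nbar2Text\nBaz\nbaz2Text\n"
)

# The lazy last group now has to grow until the lookahead succeeds,
# i.e. until the next "Foo" or the end of the string (\Z).
pattern = re.compile(
    r"([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*?)(?=Foo|\Z)"
)

records = [
    {
        "title": m.group(1),
        "title_text": m.group(3).strip(),  # strip the surrounding newlines
        "bar_text": m.group(4).strip(),
        "baz_text": m.group(5).strip(),
    }
    for m in pattern.finditer(text)
]

for record in records:
    print(record)
```

Because the lookahead consumes nothing, each match stops just before the next title, and finditer picks up the following record from there.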
