line regex greedy group

2018-06-26 17:42:57

I'm attempting to parse the following example text in Python:

Foo 1
foo1Text

Bar 
bar1Text

Baz 
baz1Text

Foo 2
foo2Text

Bar 
bar2Text

Baz 
baz2Text

# and so on up to Foo/Bar/Baz N

Now, the regex I'm using is:

([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*?)

What I'd like to do is lift out the text relevant to foo / bar / baz. However, with the lazy quantifier ? on the final group, the expression stops short and misses baz2Text. Conversely, making that group greedy pulls everything else into the last group.
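To make the two failure modes concrete, here is a minimal sketch that runs both variants over a two-block sample. The variable names (`text`, `lazy`, `greedy`) are illustrative only, and the sample is a cut-down version of the text above.

```python
import re

# Two-block sample in the same shape as the example text above.
text = (
    "Foo 1\nfoo1Text\n\nBar \nbar1Text\n\nBaz \nbaz1Text\n\n"
    "Foo 2\nfoo2Text\n\nBar \nbar2Text\n\nBaz \nbaz2Text\n"
)

# Final group lazy: it is allowed to match nothing, so it does.
lazy = re.compile(r"([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*?)")
# Final group greedy: it runs to the end of the input.
greedy = re.compile(r"([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*)")

lazy_matches = list(lazy.finditer(text))
greedy_matches = list(greedy.finditer(text))

# The last group of every lazy match is the empty string.
print([m.group(5) for m in lazy_matches])
# The single greedy match swallows the whole second block in its last group.
print(len(greedy_matches), repr(greedy_matches[0].group(5)))
```

Nothing after the final lazy group forces it to expand, so it settles for the empty string; the greedy version has nothing telling it where to stop.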

I'd prefer not to use a numeric quantifier if possible, and instead broadly match things based on:

{title}
{stuff about title}

Bar
{stuff about Bar}

Baz
{stuff about Baz}

That way I can iterate through each match and extract the groups accordingly. Please note, I've not phrased this around extracting concrete output. I'm mostly interested in getting the regex groups to represent: {title}, {stuff about title}, {stuff about Bar}, {stuff about Baz}

I was putzing around with regex101 to find the right incantation, to no avail.

This is one of those problems where it's easy enough to do manually. But then I wouldn't learn anything! :) I'd love to know if there's some cleaner mechanism / strategy I should be using here.

Thanks much

If you know that Foo is the next group after Baz, then what you need is a lookahead: ([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*?)(?=Foo) .

Lookaheads are zero-width assertions: they require that the given pattern matches immediately after the current position, but they do not consume any characters, so the next match can start at that same Foo.
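As a rough sketch of the lookahead in use with Python's re module: the sample below is a two-block version of the question's text, and the names `pattern` and `records` are illustrative. One caveat: the last block has no Foo after it, so this sketch extends the lookahead to (?=Foo|\Z) to accept end of input as well. That alternative is an assumption added here, not part of the pattern given above.

```python
import re

# Two-block sample in the same shape as the question's text.
text = (
    "Foo 1\nfoo1Text\n\nBar \nbar1Text\n\nBaz \nbaz1Text\n\n"
    "Foo 2\nfoo2Text\n\nBar \nbar2Text\n\nBaz \nbaz2Text\n"
)

# The lazy final group now has to expand until the lookahead succeeds.
# Assumption: |\Z is added so the final block (no following Foo) also matches.
pattern = re.compile(
    r"([\S ]+)(\n*)([\s\S]*?)Bar([\s\S]*?)Baz([\s\S]*?)(?=Foo|\Z)"
)

# Groups 1, 3, 4, 5 hold title, title text, Bar text, Baz text;
# group 2 only captures the newlines after the title line.
records = [
    tuple(m.group(i).strip() for i in (1, 3, 4, 5))
    for m in pattern.finditer(text)
]

for title, about_title, about_bar, about_baz in records:
    print(title, about_title, about_bar, about_baz, sep=" | ")
```

This relies on the literal words Foo, Bar and Baz never appearing inside the free text, since matching is case-sensitive and the lazy groups stop at the first occurrence. If they can appear there, anchoring the markers to line starts with re.MULTILINE and ^ would be a safer variation.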
