Regex to find named capturing groups with the Go programming language
I'm looking for a regex to find named capturing groups in (other) regex strings.
Example: I want to find (?P<country>m((a|b).+)n), (?P<city>.+) and (?P<street>(5|6). .+) in the following regex:
/(?P<country>m((a|b).+)n)/(?P<city>.+)/(?P<street>(5|6). .+)
I tried the following regex to find the named capturing groups:
import "regexp"

// An optional run of literal parenthesized sub-groups such as "(a|b)".
var subGroups string = `(\(.+\))*?`
var prefixedSubGroups string = `.+` + subGroups
var postfixedSubGroups string = subGroups + `.+`
var surroundedSubGroups string = `.+` + subGroups + `.+`

// Matches a literal "(?P<name>", then one of the three shapes above, then a closing ")".
var capturingGroupNameRegex *regexp.Regexp = regexp.MustCompile(
	`(?U)` +
		`\(\?P<.+>` +
		`(` + prefixedSubGroups + `|` + postfixedSubGroups + `|` + surroundedSubGroups + `)` +
		`\)`)
(?U) makes greedy quantifiers (+ and *) non-greedy, and non-greedy quantifiers (such as *?) greedy. Details are in the Go regexp documentation.
But it doesn't work because the parentheses are not matched correctly.
Matching arbitrarily nested parentheses correctly is not possible with regular expressions because arbitrary (recursive) nesting cannot be described by a regular language.
Some modern regex flavors do support recursion (Perl, PCRE) or balanced matching (.NET), but Go is not one of them. Go's regexp package uses the syntax accepted by RE2, and the RE2 syntax documentation explicitly lists Perl's (?R) construct as not supported. You need to build a recursive descent parser, not a regex.
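A minimal sketch of such a parser, assuming the (?P<name>...) group syntax used in the question. The names FindNamedGroups, NamedGroup and parser are hypothetical, chosen for this illustration only. It tracks just what affects parenthesis balance (backslash escapes, [...] character classes and groups), assumes the input is a valid pattern, recognizes only the (?P<name>...) form, and does not handle \Q...\E quoting.

```go
package main

import (
	"fmt"
	"strings"
)

// NamedGroup holds a group's name (e.g. "city") and full text (e.g. "(?P<city>.+)").
type NamedGroup struct{ Name, Text string }

// parser balances parentheses, skipping escapes and character classes; it assumes a valid pattern.
type parser struct {
	src    string
	pos    int
	groups []NamedGroup
}

// parseSeq consumes pattern text until an unmatched ')' or the end of input.
func (p *parser) parseSeq() error {
	for p.pos < len(p.src) {
		switch p.src[p.pos] {
		case '\\':
			p.pos += 2 // skip the backslash and the escaped character
		case '[':
			p.skipClass()
		case '(':
			if err := p.parseGroup(); err != nil {
				return err
			}
		case ')':
			return nil // the enclosing parseGroup consumes it
		default:
			p.pos++
		}
	}
	return nil
}

// skipClass steps over a character class so parentheses inside it, as in [()], are ignored.
func (p *parser) skipClass() {
	p.pos++ // consume '['
	if strings.HasPrefix(p.src[p.pos:], "^") {
		p.pos++
	}
	if strings.HasPrefix(p.src[p.pos:], "]") {
		p.pos++ // a ']' right after '[' or '[^' is a literal
	}
	for p.pos < len(p.src) && p.src[p.pos] != ']' {
		if p.src[p.pos] == '\\' {
			p.pos++ // also skip the escaped character
		}
		p.pos++
	}
	p.pos++ // consume the closing ']'
}

// parseGroup parses "(...)" recursively and records it if it is "(?P<name>...)".
func (p *parser) parseGroup() error {
	start := p.pos
	p.pos++ // consume '('
	var name string
	if rest := p.src[p.pos:]; strings.HasPrefix(rest, "?P<") {
		if end := strings.IndexByte(rest, '>'); end >= 0 {
			name = rest[3:end]
			p.pos += end + 1
		}
	}
	if err := p.parseSeq(); err != nil {
		return err
	}
	if p.pos >= len(p.src) {
		return fmt.Errorf("missing ')' for group at offset %d", start)
	}
	p.pos++ // consume ')'
	if name != "" {
		p.groups = append(p.groups, NamedGroup{Name: name, Text: p.src[start:p.pos]})
	}
	return nil
}

// FindNamedGroups returns the named groups in pattern; nested ones come before their parent.
func FindNamedGroups(pattern string) ([]NamedGroup, error) {
	p := &parser{src: pattern}
	err := p.parseSeq()
	if err == nil && p.pos < len(p.src) {
		err = fmt.Errorf("unmatched ')' at offset %d", p.pos)
	}
	return p.groups, err
}

func main() {
	groups, err := FindNamedGroups(`/(?P<country>m((a|b).+)n)/(?P<city>.+)/(?P<street>(5|6). .+)`)
	if err != nil {
		panic(err)
	}
	for _, g := range groups {
		fmt.Printf("%s: %s\n", g.Name, g.Text)
	}
}
```

For the example regex this reports the country, city and street groups with their full text, including the nested (a|b) and (5|6) parts. The recursion in parseGroup is what tracks the nesting depth that a single Go regex cannot express. Because a group is recorded when its closing parenthesis is reached, a named group nested inside another named group is listed before its parent.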