Match several occurrences or zero (in this order) using regular expressions
I want to match Roman numerals using Groovy regular expressions (I have not tried this in Java, but it should behave the same). I found an answer on this website in which someone suggested the following regex:
/M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})/
The problem is that an expression like /V?I{0,3}/
does not seem to be greedy in Groovy. So for a string like "Book number VII" the matcher /V?I{0,3}/
returns "V" and not "VII", which is the match I want.
Obviously if we use the pattern /VI+/
then we DO get the match "VII"... but this solution is not valid if the string is something like "Book number V" as we will get no matches...
I tried to force the longest possible match by using a possessive quantifier /VI{0,3}+/
or even /VI*+/
but I still get the match "V" instead of "VII".
Any ideas?
Why not just (IX|IV|V?I{1,3}|V)?
I found my mistake. Patterns like /V?I{0,3}/
or /V?I*/
also match the EMPTY string, so for a string like "Book VII" the matcher returns the following matches:
Result[0] --> ''
Result[1] --> ''
Result[2] --> ''
Result[3] --> ''
Result[4] --> ''
Result[5] --> 'VII'
Result[6] --> ''
The greedy result is there all right (Result[5]). My problem was that I was always picking the first match (Result[0]), which only works if the pattern cannot match the empty string.
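The list of matches above can be reproduced with a small sketch. It is written in Java rather than Groovy, on the basis that Groovy patterns are compiled by java.util.regex; the class and method names (EmptyMatchDemo, allMatches, firstNonEmpty) are made up for illustration and are not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyMatchDemo {
    // Collects every match that Matcher.find() reports, empty ones included.
    static List<String> allMatches(String regex, String input) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            results.add(m.group());
        }
        return results;
    }

    // Returns the first match that is not empty, or null if there is none.
    static String firstNonEmpty(String regex, String input) {
        for (String match : allMatches(regex, input)) {
            if (!match.isEmpty()) {
                return match;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The pattern succeeds at every position, mostly with an empty match.
        List<String> results = allMatches("V?I{0,3}", "Book VII");
        for (int i = 0; i < results.size(); i++) {
            System.out.println("Result[" + i + "] --> '" + results.get(i) + "'");
        }
        // Skipping the empty matches recovers the greedy one.
        System.out.println(firstNonEmpty("V?I{0,3}", "Book VII"));
    }
}
```

Filtering out empty matches is one possible workaround; rewriting the pattern so that it cannot match the empty string avoids the issue altogether.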
For instance, the suggested pattern /V?I{1,3}|V/
returns only one match, so picking the first match is fine:
Result[0] --> 'VII'
... This is because the pattern cannot match the empty string.
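As a rough check of the pattern that cannot match the empty string, here is a minimal Java sketch (again assuming the java.util.regex behaviour that Groovy relies on). The names NonEmptyPatternDemo, UNITS and firstMatch are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonEmptyPatternDemo {
    // Every alternative consumes at least one character,
    // so the pattern never succeeds with an empty match.
    static final Pattern UNITS = Pattern.compile("V?I{1,3}|V");

    // Returns the first match in the input, or null when there is none.
    static String firstMatch(String input) {
        Matcher m = UNITS.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("Book number VII"));
        System.out.println(firstMatch("Book number V"));
        System.out.println(firstMatch("Book number"));
    }
}
```

Note that this shortened pattern only covers V and I; for numerals such as IV or IX, the IX and IV alternatives from the suggestion above need to come first.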
Hope this helps others