Indices of all matches of a regex
I am trying to match all the occurrences of a regex and get the indices as a result. The example from Real World Haskell says I can do
string =~ regex :: [(Int, Int)]
However, this no longer works because the regex library has changed since RWH was published. (See All matches of regex in Haskell and "=~" raise "No instance for (RegexContext Regex [Char] [String])"). What is the correct way to do this?
Update:
I found matchAll which might give me what I want. I have no idea how to use it, though.
The key to using matchAll is to add the type annotation :: Regex when creating the regex:
import Text.Regex
import Text.Regex.Base
re = makeRegex "[^aeiou]" :: Regex
test = matchAll re "the quick brown fox"
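This returns a list of arrays, one per match. In each array, index 0 holds the (offset,length) pair of the whole match and higher indices hold the subexpressions. To get a list of (offset,length) pairs for the whole matches, take the element at index 0 of each array: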
import Data.Array ((!))
matches = map (! 0) $ matchAll re "the quick brown fox"
-- [(0,1),(1,1),(3,1),(4,1),(7,1),(8,1),(9,1),(10,1),(11,1),(13,1),(14,1),(15,1),(16,1),(18,1)]
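Once you have the (offset,length) pairs, you can cut the matched text back out of the input. The following is only a minimal sketch: the names matchSpans, slice and matchedTexts are made up for illustration, and it imports from Text.Regex.Posix (which re-exports Text.Regex.Base) rather than Text.Regex.

```haskell
import Data.Array ((!))
import Text.Regex.Posix (Regex, makeRegex, matchAll)

-- Hypothetical helper: (offset, length) of every whole match.
-- Index 0 of each MatchArray is the whole match.
matchSpans :: String -> String -> [(Int, Int)]
matchSpans pat = map (! 0) . matchAll (makeRegex pat :: Regex)

-- Hypothetical helper: the substring described by an (offset, length) pair.
slice :: String -> (Int, Int) -> String
slice s (off, len) = take len (drop off s)

-- Hypothetical helper: the text of every match, recovered from the spans.
matchedTexts :: String -> String -> [String]
matchedTexts pat s = map (slice s) (matchSpans pat s)
```

For example, matchSpans "[aeiou]+" "the quick brown fox" gives [(2,1),(5,2),(12,1),(17,1)], and matchedTexts with the same arguments gives ["e","ui","o","o"].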
To use the =~ operator instead, note that things have changed since RWH. You should use the predefined types MatchOffset and MatchLength together with the special type constructor AllMatches:
import Text.Regex.Posix
re = "[^aeiou]"
text = "the quick brown fox"
test1 = text =~ re :: Bool
-- True
test2 = text =~ re :: String
-- "t"
test3 = text =~ re :: (MatchOffset,MatchLength)
-- (0,1)
test4 = text =~ re :: AllMatches [] (MatchOffset, MatchLength)
-- (not showable)
test4' = getAllMatches $ (text =~ re :: AllMatches [] (MatchOffset, MatchLength))
-- [(0,1),(1,1),(3,1),(4,1),(7,1),(8,1),(9,1),(10,1),(11,1),(13,1),(14,1),(15,1),(16,1),(18,1)]
See the docs for Text.Regex.Base.Context for more details on what contexts are available.
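As a rough illustration of how the result type selects the context, here is a small sketch using a few of the other instances. The binding names are made up for this example, and each one carries an explicit type signature so that =~ knows which context to use.

```haskell
import Text.Regex.Posix

text, re :: String
text = "the quick brown fox"
re   = "[^aeiou]"

-- Int context: the number of matches.
matchCount :: Int
matchCount = text =~ re

-- Triple context: text before the first match, the match, text after it.
beforeMatchAfter :: (String, String, String)
beforeMatchAfter = text =~ re

-- AllMatches context, unwrapped: (offset, length) of every match.
allSpans :: [(MatchOffset, MatchLength)]
allSpans = getAllMatches (text =~ re)

-- AllTextMatches context, unwrapped: the text of every match.
allTexts :: [String]
allTexts = getAllTextMatches (text =~ re)
```

Here matchCount is 14 and beforeMatchAfter is ("","t","he quick brown fox").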
UPDATE: I believe the type constructor AllMatches was introduced to resolve the ambiguity that arises when a regex has subexpressions, e.g.:
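str = "axx ayy"
pat = "a(.)([^a])"
-- str and pat are separate bindings so the same match can be requested at
-- two different result types (a single untyped `foo = str =~ pat` binding
-- would be forced to one type by the monomorphism restriction in a source file)
test1 = getAllMatches (str =~ pat :: AllMatches [] (MatchOffset, MatchLength))
-- [(0,3),(4,3)]
-- returns the locations of "axx" and "ayy" but no subexpression info
test2 = str =~ pat :: MatchArray
-- array (0,2) [(0,(0,3)),(1,(1,1)),(2,(2,1))]
-- returns only the first match, "axx", together with its two subexpressions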
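Both are essentially a list of offset-length pairs, but they mean different things: the first has one pair per match, while the second has one pair per subexpression of the first match (with the whole match at index 0).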
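If you want both at once, that is every match together with its subexpressions, one option is to go back to matchAll and keep the whole array instead of only index 0. This is just a sketch, and allGroups is a made-up name:

```haskell
import Data.Array (elems)
import Text.Regex.Posix (Regex, makeRegex, matchAll)

-- Hypothetical helper: for every match, the (offset, length) of the whole
-- match followed by the (offset, length) of each subexpression.
allGroups :: String -> String -> [[(Int, Int)]]
allGroups pat = map elems . matchAll (makeRegex pat :: Regex)
```

With the example above, allGroups "a(.)([^a])" "axx ayy" gives [[(0,3),(1,1),(2,1)],[(4,3),(5,1),(6,1)]].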