Extracting patterns from text in R
My data looks like this:
t <- "The data is like hi hi hi hi and hi hi end"
My regular expression is:
grammer <- "[[:space:]]*(hi)+[[:space:]]"
After running the following two lines:
res <- gregexpr(grammer, t)
regmatches(t, res)
I get this output:
[[1]]
[1] " hi " "hi " "hi " "hi " " hi " "hi "
However, what I want is " hi hi hi hi " and " hi hi ".
You can do it like this:
> t<-"The data is like hi hi hi hi and hi hi end"
> grammer<-"[[:space:]]*(hi[[:space:]])+[[:space:]]*"
> res<-gregexpr(grammer, t)
> regmatches(t, res)
[[1]]
[1] " hi hi hi hi " " hi hi "
or
> grammer<-"[[:space:]]*(hi[[:space:]])+"
> res<-gregexpr(grammer, t)
> regmatches(t, res)
[[1]]
[1] " hi hi hi hi " " hi hi "
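Or, to also match a final hi that has no whitespace after it, use the end-of-word anchor \> (written \\> inside an R string) and make the whitespace optional:
> t <- "The data is like hi hi hi hi and hi hi end hi"
> grammer<-"[[:space:]]*(hi\\>[[:space:]]?)+"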
> res<-gregexpr(grammer, t)
> regmatches(t, res)
[[1]]
[1] " hi hi hi hi " " hi hi " " hi"
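If hi can also occur inside longer words, a boundary on both sides of the word may be safer. The following is only a sketch, not part of the original answer: it assumes the PCRE engine (perl = TRUE), where \b marks a word boundary, and the sample string txt is made up for illustration.

```r
# Made-up sample: "chi" and "this" contain the letters "hi"
# but are not the word "hi".
txt <- "chi hi hi and this hi"

# \\b on both sides keeps "hi" from matching inside a longer word;
# \\s* and \\s? stand in for [[:space:]]* and [[:space:]]?.
pattern <- "\\s*(\\bhi\\b\\s?)+"
res <- regmatches(txt, gregexpr(pattern, txt, perl = TRUE))[[1]]
print(res)
# [1] " hi hi " " hi"
```

Neither the hi in chi nor the hi in this is picked up, because no word boundary sits directly before those letters.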
Without leading or trailing spaces:
> t <- "The data is like hi hi hi hi and hi hi end hi"
> grammer<-"hi\\>([[:space:]]hi)*"
> res<-gregexpr(grammer, t)
> regmatches(t, res)
[[1]]
[1] "hi hi hi hi" "hi hi" "hi"
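Another way to drop the padding, sketched here as an alternative rather than as part of the original answer, is to keep one of the whitespace-inclusive patterns and strip the result afterwards with base R's trimws(). The variable names txt, matches and trimmed are only for illustration.

```r
txt <- "The data is like hi hi hi hi and hi hi end"
grammer <- "[[:space:]]*(hi[[:space:]])+"

# Extract the runs, including their surrounding whitespace.
matches <- regmatches(txt, gregexpr(grammer, txt))[[1]]

# Remove leading and trailing whitespace from each match.
trimmed <- trimws(matches)
print(trimmed)
# [1] "hi hi hi hi" "hi hi"
```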
Explanation:
[[:space:]]* matches a whitespace character zero or more times.
(hi[[:space:]])+ matches the string hi and the whitespace character that follows it, one or more times.
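To see why the grouping matters, the question's pattern and the corrected one can be run side by side. This is a small sketch that reuses the question's string; the variable names txt, original and fixed are only for illustration.

```r
txt <- "The data is like hi hi hi hi and hi hi end"

# (hi)+ repeats only the letters "hi" (it would match "hihi"), so the single
# whitespace character after it ends the match and each "hi" comes back alone.
original <- regmatches(txt, gregexpr("[[:space:]]*(hi)+[[:space:]]", txt))[[1]]

# With the whitespace inside the group, + repeats the unit "hi " and the
# whole run is returned as one match.
fixed <- regmatches(txt, gregexpr("[[:space:]]*(hi[[:space:]])+", txt))[[1]]

print(original)
# [1] " hi " "hi "  "hi "  "hi "  " hi " "hi "
print(fixed)
# [1] " hi hi hi hi " " hi hi "
```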