Extracting patterns from text in R

2018-06-29 09:42:55

My data looks like this:

t <- "The data is like hi hi hi hi  and hi hi end"

My regular expression is:

grammer <- "[[:space:]]*(hi)+[[:space:]]"

After running the following two lines:

res <- gregexpr(grammer, t)
regmatches(t, res)

I get this output:

 [[1]]
 [1] " hi " "hi "  "hi "  "hi "  " hi " "hi "

However, what I want is " hi hi hi hi " and " hi hi ".

You can do it like this:

> t<-"The data is like hi hi hi hi  and hi hi end"
> grammer<-"[[:space:]]*(hi[[:space:]])+[[:space:]]*"
> res<-gregexpr(grammer, t)
> regmatches(t, res)
[[1]]
[1] " hi hi hi hi  " " hi hi "

Or:

> grammer<-"[[:space:]]*(hi[[:space:]])+"
> res<-gregexpr(grammer, t)
> regmatches(t, res)
[[1]]
[1] " hi hi hi hi " " hi hi "

Or, using the end-of-word anchor \\> so that a final hi with no following space is matched as well:

> t <- "The data is like hi hi hi hi and hi hi end hi"
> grammer<-"[[:space:]]*(hi\\>[[:space:]]?)+"
> res<-gregexpr(grammer, t)
> regmatches(t, res)
[[1]]
[1] " hi hi hi hi " " hi hi "       " hi"

Without leading or trailing spaces:

> t <- "The data is like hi hi hi hi and hi hi end hi"
> grammer<-"hi\\>([[:space:]]hi)*"
> res<-gregexpr(grammer, t)
> regmatches(t, res)
[[1]]
[1] "hi hi hi hi" "hi hi"       "hi"
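
To match hi only as a whole word, R's default regex engine provides the end-of-word anchor \> (written \\> inside an R string). The sketch below illustrates its effect on a made-up input; the string x and the variable names are illustrative assumptions, not part of the original question.

```r
# Hypothetical input: "hi" also occurs inside "hit" and "this".
x <- "hi hit this hi"

# Without an anchor, every occurrence of the letters "hi" is matched.
loose <- regmatches(x, gregexpr("hi", x))[[1]]

# With \\>, a match must be followed by the end of a word,
# so the "hi" inside "hit" and "this" is skipped.
strict <- regmatches(x, gregexpr("hi\\>", x))[[1]]

print(length(loose))
print(length(strict))
```

The anchor only constrains the right-hand edge of the match; the counterpart \\< does the same for the start of a word.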

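Explanation:

[[:space:]]* matches zero or more whitespace characters.

(hi[[:space:]])+ matches the string hi followed by one whitespace character, repeated one or more times.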
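
As a side-by-side sketch of why the position of the group matters, the snippet below runs the question's pattern and one of the suggested patterns on the same input. The variable names split_runs and whole_runs are introduced here only for illustration.

```r
t <- "The data is like hi hi hi hi  and hi hi end"

# (hi)+ repeats only the letters "hi", so each match ends
# after the first whitespace character that follows one "hi".
split_runs <- regmatches(t, gregexpr("[[:space:]]*(hi)+[[:space:]]", t))[[1]]

# (hi[[:space:]])+ repeats "hi" together with its following space,
# so a whole run of "hi " tokens is returned as one match.
whole_runs <- regmatches(t, gregexpr("[[:space:]]*(hi[[:space:]])+", t))[[1]]

print(split_runs)
print(whole_runs)
```

Moving [[:space:]] inside the repeated group is what lets the quantifier consume consecutive tokens instead of stopping after the first one.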
