Extracting patterns from text in R

2018-06-29 09:42:56

My data looks like this:

t <- "The data is like hi hi hi hi  and hi hi end"

and my regular expression is:

grammer <- "[[:space:]]*(hi)+[[:space:]]"

After running the following two lines:

res <- gregexpr(grammer, t)
regmatches(t, res)

I got this output:

 [[1]]
 [1] " hi " "hi "  "hi "  "hi "  " hi " "hi "

However, I want each run of consecutive words returned as a single match, something like " hi hi hi hi " and " hi hi ".

You could do it like this:

> t<-"The data is like hi hi hi hi  and hi hi end"
> grammer<-"[[:space:]]*(hi[[:space:]])+[[:space:]]*"
> res<-gregexpr(grammer, t)
> regmatches(t, res)
[[1]]
[1] " hi hi hi hi  " " hi hi "

> grammer<-"[[:space:]]*(hi[[:space:]])+"
> res<-gregexpr(grammer, t)
> regmatches(t, res)
[[1]]
[1] " hi hi hi hi " " hi hi "

> t <- "The data is like hi hi hi hi and hi hi end hi"
> # \\> matches the end of a word, so a final "hi" with no trailing space still matches
> grammer<-"[[:space:]]*(hi\\>[[:space:]]?)+"
> res<-gregexpr(grammer, t)
> regmatches(t, res)
[[1]]
[1] " hi hi hi hi " " hi hi "       " hi"

To get the matches without leading or trailing spaces:

> t <- "The data is like hi hi hi hi and hi hi end hi"
> grammer<-"hi\\>([[:space:]]hi)*"
> res<-gregexpr(grammer, t)
> regmatches(t, res)
[[1]]
[1] "hi hi hi hi" "hi hi"       "hi"
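
Another option, sketched here as an assumption rather than as part of the answer above, is to keep a pattern that tolerates surrounding whitespace and strip it afterwards with base R's trimws(). The helper name extract_runs and its word argument are hypothetical, introduced only for illustration.

```r
# Hypothetical helper (not from the original answer): find runs of a
# repeated word, then strip leading and trailing whitespace from each match.
extract_runs <- function(text, word = "hi") {
  # \\> is the end-of-word boundary, so the last word in the string
  # still matches even when no whitespace follows it.
  pattern <- paste0("[[:space:]]*(", word, "\\>[[:space:]]?)+")
  matches <- regmatches(text, gregexpr(pattern, text))[[1]]
  trimws(matches)
}

t <- "The data is like hi hi hi hi and hi hi end hi"
result <- extract_runs(t)
print(result)
```

This relies on word containing no regex metacharacters; a word such as "a.b" would need escaping before being pasted into the pattern.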

Explanation:

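[[:space:]]* matches zero or more whitespace characters (spaces, tabs, newlines).

(hi[[:space:]])+ matches the string hi followed by one whitespace character, one or more times. Because the whitespace is inside the group, a whole run of consecutive hi words is returned as a single match.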
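
The pieces can be tried separately to see why the placement of the whitespace class matters. This is a small sketch on a made-up input string x, which is not from the original post.

```r
# Illustrative input, not from the original post.
x <- "a hi hi  b"

# [[:space:]] matches any whitespace character, not only a literal space.
is_ws <- grepl("[[:space:]]", c(" ", "\t", "\n", "a"))

# (hi[[:space:]])+ treats "hi" plus one whitespace character as a unit,
# so consecutive units come back as a single match.
runs <- regmatches(x, gregexpr("(hi[[:space:]])+", x))[[1]]

# (hi)+ repeats "hi" with nothing in between, so each word is
# matched on its own.
singles <- regmatches(x, gregexpr("(hi)+", x))[[1]]

print(is_ws)
print(runs)
print(singles)
```

The last case is what happened with the pattern in the question: the whitespace sat outside the repeated group, so every hi became a separate match.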
