Extracting patterns from text in R
My data looks like this:
t <- "The data is like hi hi hi hi and hi hi end"
and my regular expression is:
grammer <- "[[:space:]]*(hi)+[[:space:]]"
After running the following two lines:
res <- gregexpr(grammer, t)
regmatches(t, res)
I get this output:
[[1]]
[1] " hi " "hi " "hi " "hi " " hi " "hi "
However, I want each run of consecutive hi matches returned as a single string, like " hi hi hi hi " and " hi hi ".
You could do it like this, moving the space inside the repeated group:
> t<-"The data is like hi hi hi hi and hi hi end"
> grammer<-"[[:space:]]*(hi[[:space:]])+[[:space:]]*"
> res<-gregexpr(grammer, t)
> regmatches(t, res)
[[1]]
[1] " hi hi hi hi " " hi hi "
Or, without the trailing [[:space:]]*:
> grammer<-"[[:space:]]*(hi[[:space:]])+"
> res<-gregexpr(grammer, t)
> regmatches(t, res)
[[1]]
[1] " hi hi hi hi " " hi hi "
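Or, to also match a final hi that has no space after it, make the space optional and add the end-of-word anchor \\> so that hi is not matched inside a longer word:
> t <- "The data is like hi hi hi hi and hi hi end hi"
> grammer<-"[[:space:]]*(hi\\>[[:space:]]?)+"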
> res<-gregexpr(grammer, t)
> regmatches(t, res)
[[1]]
[1] " hi hi hi hi " " hi hi " " hi"
To get the matches without leading or trailing spaces:
> t <- "The data is like hi hi hi hi and hi hi end hi"
> grammer<-"hi\\>([[:space:]]hi)*"
> res<-gregexpr(grammer, t)
> regmatches(t, res)
[[1]]
[1] "hi hi hi hi" "hi hi" "hi"
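If you prefer Perl-style syntax, the same idea can probably be written with perl = TRUE, using \\b for word boundaries and \\s+ for the whitespace between repeats. This is only a sketch of an alternative, not part of the approach above; the names x, pattern and runs are chosen here for illustration, and x is used instead of t to avoid masking base R's t() function.

```r
# Sketch: extract runs of "hi" with Perl-compatible regex syntax.
x <- "The data is like hi hi hi hi and hi hi end hi"

# \\b marks a word boundary, so "hi" inside a longer word is skipped.
# (\\s+hi\\b)* extends the match over any further whitespace-separated "hi".
pattern <- "\\bhi\\b(\\s+hi\\b)*"

# gregexpr() finds every match; regmatches() returns a list with one
# element per input string, so [[1]] picks the matches for x.
runs <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
print(runs)
```

Because the boundaries are part of the pattern, the results carry no surrounding spaces, and a hi at the very end of the string is still picked up.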
Explanation:
[[:space:]]* matches a whitespace character zero or more times.
(hi[[:space:]])+ matches the string hi and the whitespace character after it, one or more times.
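To see why the pattern from the question splits each run, it may help to run both patterns side by side. The following is a minimal sketch using the question's input; the variable names (x, split_pattern, run_pattern and so on) are made up for this comparison.

```r
# Sketch: compare the question's pattern with the revised one.
x <- "The data is like hi hi hi hi and hi hi end"

# Question's pattern: the group repeats only "hi", so (hi)+ cannot step
# over the space between two words and each match stops after one "hi ".
split_pattern <- "[[:space:]]*(hi)+[[:space:]]"

# Revised pattern: the space is inside the group, so "hi " as a whole
# can repeat and a full run is consumed by a single match.
run_pattern <- "[[:space:]]*(hi[[:space:]])+"

split_matches <- regmatches(x, gregexpr(split_pattern, x))[[1]]
run_matches   <- regmatches(x, gregexpr(run_pattern, x))[[1]]

length(split_matches)  # one match per "hi"
length(run_matches)    # one match per run
```

With this input the first pattern should give six separate matches and the second should give two, one for each run.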