R regex gsub separate letters and numbers

I have a string that's mixed letters and numbers:

"The sample is 22mg"

I'd like to split strings where a number is immediately followed by a letter, like this:

"The sample is 22 mg"

I've tried this:

gsub('[0-9]+[[aA-zZ]]', '[0-9]+ [[aA-zZ]]', 'This is a test 22mg')

but am not getting the desired results.

Any suggestions?


You need to use capturing parentheses in the regular expression and group references in the replacement. For example:

gsub('([0-9])([[:alpha:]])', '\\1 \\2', 'This is a test 22mg')

There's nothing R-specific here; the R help for regex and gsub should be of some use.
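
As a rough sketch of how this behaves on more than one string: gsub is vectorised over its input, so the same pattern and replacement can be applied to a whole character vector. The vector x below is made up for illustration and is not from the question.

```r
# Hypothetical sample inputs (not from the original question)
x <- c("The sample is 22mg", "Dose: 5ml twice", "no digits here")

# ([0-9]) captures the last digit, ([[:alpha:]]) the first letter after it;
# "\\1 \\2" writes both back with a space in between.
out <- gsub("([0-9])([[:alpha:]])", "\\1 \\2", x)
print(out)
```

Strings that contain no digit directly followed by a letter come back unchanged.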


You need backreferencing:

> test <- "The sample is 22mg"
> gsub("([0-9])([a-zA-Z])", "\\1 \\2", test)
[1] "The sample is 22 mg"

Anything in parentheses gets remembered. The captured groups are then accessed in the replacement by \\1 (for the first entity in parens), \\2, etc. The first backslash escapes the second one in the R string literal, so that a single backslash followed by the digit gets passed to the regular expression parser.
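
To see what the doubled backslash actually does, it may help to inspect the replacement string itself. This is only a small sketch; the variable names replacement and result are mine, not from the answer.

```r
test <- "The sample is 22mg"

# Typed with doubled backslashes, but the string R stores holds single ones.
replacement <- "\\1 \\2"
n <- nchar(replacement)   # counts the stored characters: \ 1 space \ 2
cat(replacement, "\n")    # cat shows the string as the regex engine receives it

result <- gsub("([0-9])([a-zA-Z])", replacement, test)
print(result)
```

With a plain "1 2" as the replacement, gsub would insert the literal characters 1 and 2 instead of the captured groups.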
