Inconsistent behavior between str_split and strsplit

The documentation for str_split in the stringr package states that for the pattern argument:

If "" splits into individual characters.

which suggests it behaves the same as strsplit in this regard. However,

library(stringr)
str_split("abcab","")
[[1]]
[1] ""  "a" "b" "c" "a" "b"

with a leading empty string. This compares with,

strsplit("abcab","")
[[1]]
[1] "a" "b" "c" "a" "b"

A leading empty string seems to be normal behavior when splitting on non-empty strings,

strsplit("abcab","ab")
[[1]]
[1] ""  "c"

but even then, str_split generates an 'extra' trailing empty string:

str_split("abcab","ab")
[[1]]
[1] ""  "c" "" 

Is this discrepancy a bug, a feature, an error in the documentation, or just a different notion of what's 'expected behavior'?


If you use commas as delimiters, the "expected" (your mileage may vary) result is more obvious:

# expect "" "2" "3" "4" ""

strsplit(",2,3,4,", ",")
# [[1]]
# [1] ""  "2" "3" "4"

str_split(",2,3,4,", ",")
# [[1]]
# [1] ""  "2" "3" "4" "" 

If I have n commas then I expect (n+1) elements to be returned, so I prefer the results from str_split. However, I wouldn't necessarily call this a bug in strsplit, since it performs as advertised:
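To make the n versus (n+1) expectation concrete, here is a small base R check. It is only a sketch: the variable names are mine, and it assumes a literal (fixed) delimiter rather than a regular expression.

```r
x <- ",2,3,4,"

# Count the literal commas in x (fixed = TRUE disables regex matching)
n_commas <- lengths(regmatches(x, gregexpr(",", x, fixed = TRUE)))

# Split with base R and count the pieces
pieces <- strsplit(x, ",", fixed = TRUE)[[1]]

n_commas        # 4
length(pieces)  # 4, one fewer than the n_commas + 1 I would expect
```

The missing piece is the empty field after the final comma, which strsplit drops.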

(from ?strsplit) Note that this means that if there is a match at the beginning of a (non-empty) string, the first element of the output is '""', but if there is a match at the end of the string, the output is the same as with the match removed.
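If you want the trailing empty field but need to stay in base R, one possible workaround is to append one extra delimiter before splitting, so that the match strsplit discards is the one you added. The helper name split_keep_trailing below is hypothetical (not part of base R or stringr), and the sketch assumes a single input string and a literal delimiter.

```r
# Hypothetical helper: split one string on a literal delimiter and keep
# a trailing empty field. Appending the delimiter once means the final
# match that strsplit drops is the appended one, not an original one.
split_keep_trailing <- function(x, delim) {
  strsplit(paste0(x, delim), delim, fixed = TRUE)[[1]]
}

split_keep_trailing(",2,3,4,", ",")
# [1] ""  "2" "3" "4" ""

split_keep_trailing("a,b", ",")
# [1] "a" "b"
```

This does not help with the "" pattern, which is a separate case.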

"" is trickier, as there is no way to count the number of times "" appears in a string. Therefore treating it as a special case seems justified.

(from ?str_split) If '""' splits into individual characters.

Based on this I suggest you have found a bug and should take hadley's advice and report it!
