When should I use setDT() instead of data.table() to create a data.table?
I am having difficulty grasping the essence of the setDT() function. As I read code on SO, I frequently come across the use of setDT() to create a data.table. Of course, the use of data.table() is ubiquitous. I feel like I solidly comprehend the nature of data.table(), yet the relevance of setDT() eludes me. ?setDT tells me this:
setDT converts lists (both named and unnamed) and data.frames to data.tables by reference.
as well as:
In data.table parlance, all set* functions change their input by reference. That is, no copy is made at all, other than temporary working memory, which is as large as one column.
So this makes me think I should only use setDT() to make a data.table, right? Is setDT() simply a list to data.table converter?
library(data.table)
a <- letters[c(19,20,1,3,11,15,22,5,18,6,12,15,23)]
b <- seq(1,41,pi)
ab <- data.frame(a,b)
d <- data.table(ab)
e <- setDT(ab)
str(d)
#Classes ‘data.table’ and 'data.frame': 13 obs. of 2 variables:
# $ a: Factor w/ 12 levels "a","c","e","f",..: 9 10 1 2 5 7 11 3 8 4 ...
# $ b: num 1 4.14 7.28 10.42 13.57 ...
# - attr(*, ".internal.selfref")=<externalptr>
str(e)
#Classes ‘data.table’ and 'data.frame': 13 obs. of 2 variables:
# $ a: Factor w/ 12 levels "a","c","e","f",..: 9 10 1 2 5 7 11 3 8 4 ...
# $ b: num 1 4.14 7.28 10.42 13.57 ...
# - attr(*, ".internal.selfref")=<externalptr>
Seemingly no difference in this instance. In another instance the difference is evident:
ba <- list(a,b)
f <- data.table(ba)
g <- setDT(ba)
str(f)
#Classes ‘data.table’ and 'data.frame': 2 obs. of 1 variable:
# $ ba:List of 2
# ..$ : chr "s" "t" "a" "c" ...
# ..$ : num 1 4.14 7.28 10.42 13.57 ...
# - attr(*, ".internal.selfref")=<externalptr>
str(g)
#Classes ‘data.table’ and 'data.frame': 13 obs. of 2 variables:
# $ V1: chr "s" "t" "a" "c" ...
# $ V2: num 1 4.14 7.28 10.42 13.57 ...
# - attr(*, ".internal.selfref")=<externalptr>
When should I use setDT()? What makes setDT() relevant? Why not just make the original data.table() function capable of doing what setDT() is able to do?
setDT() is not a replacement for data.table(). It's a more efficient replacement for as.data.table() that works only on certain types of objects, namely lists and data.frames.
mydata <- as.data.table(mydata) will copy the object behind mydata, convert the copy to a data.table, then change the mydata symbol to point to the copy. setDT(mydata) will change the object behind mydata to a data.table. No copying is done.

So what's a realistic situation to use setDT()? When you can't control the class of the original data. For instance, most packages for working with databases give data.frame output. In that case, your code would be something like
mydata <- dbGetQuery(conn, "SELECT * FROM mytable") # Returns a data.frame
setDT(mydata) # Make it a data.table
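The difference between the two calls can be checked without a database. The following is a minimal sketch; df1, df2, and dt1 are made-up names used only for illustration. It shows that as.data.table() leaves its input alone and returns a new object, while setDT() converts the object it is given in place.

```r
library(data.table)

# Two identical data.frames (hypothetical example data)
df1 <- data.frame(x = 1:3, y = c("a", "b", "c"))
df2 <- data.frame(x = 1:3, y = c("a", "b", "c"))

# as.data.table() returns a new data.table; df1 itself is unchanged
dt1 <- as.data.table(df1)
is.data.table(df1)  # FALSE
is.data.table(dt1)  # TRUE

# setDT() converts df2 itself; no assignment is needed
setDT(df2)
is.data.table(df2)  # TRUE
```

Because setDT() modifies its argument, any code that relies on df2 staying a plain data.frame would need the as.data.table() form instead.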
When should you use as.data.table(x)? Whenever x isn't a list or data.frame. The most common use is for matrices.
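As a small sketch of the matrix case (m, dt, and err are illustrative names, not from the original answer): as.data.table() turns each matrix column into a data.table column, whereas setDT() refuses a matrix because it only accepts lists, data.frames, and data.tables.

```r
library(data.table)

# A small matrix with named columns (hypothetical example data)
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))

# as.data.table() handles matrices: one data.table column per matrix column
dt <- as.data.table(m)
is.data.table(dt)  # TRUE
names(dt)          # "a" "b" "c"

# setDT() raises an error for a matrix; capture it instead of stopping
err <- tryCatch(setDT(m), error = function(e) e)
inherits(err, "error")  # TRUE
```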