How to do a data.table merge operation

Note: this question and the following answers refer to data.table versions < 1.5.3; v1.5.3 was released in Feb 2011 to resolve this issue. See the more recent treatment (03-2012): Translating SQL joins on foreign keys to R data.table syntax. I've been digging through the documentation for the data.table package (a replacement for data.frame that's much more efficient for certain operations), including Josh Reich's presentation on SQL and data.table at the NYC R Meetup (pdf), but can't figure this totally trivial operation out. > x <- DT(a=1:3, b=2:4, key='a') > x a

What is the difference between a data frame and a list in R?

What is the difference between a data frame and a list in R? Which one should be used when? Which is easier to loop over? Exact problem: I have to first store 3 string elements like "a", "b", "c". Later, for each of these, I need to append 3 more elements; for instance, for "a" I have to add "a1", "a2", "a3". Later I have to use nested for loops to access these elements. So I am confused about whether to use a data frame, a list, or some other data type in which I could first store and then append (roughly, for each column). Currently I am getting errors like "number of items to replace is not a multiple of replacement length"
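The store-then-append pattern described in the question can be sketched with a named list, since list elements can grow independently of one another. This is only a minimal illustration; the names `keys` and `items` are hypothetical and not from the question.

```r
# Hypothetical names: `keys` and `items` are illustrative only.
keys <- c("a", "b", "c")

# A named list with one (initially empty) slot per key.
items <- setNames(vector("list", length(keys)), keys)

# Append three elements to each slot; each slot grows on its own.
for (k in keys) {
  items[[k]] <- c(items[[k]], paste0(k, 1:3))
}

# Nested loops: outer over the keys, inner over the values stored per key.
for (k in names(items)) {
  for (v in items[[k]]) {
    cat(k, "->", v, "\n")
  }
}
```

A data frame would instead require every column to have the same length, which is why a list is often the more forgiving choice while the data are still being built up.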

break/exit script

I have a program that does some data analysis and is a few hundred lines long. Very early on in the program, I want to do some quality control, and if there is not enough data, I want the program to terminate and return to the R console. Otherwise, I want the rest of the code to execute. I've tried break, browser, and quit, and none of them stop the execution of the rest of the program (and quit stops the execution as well as completely quitting R, which is not something I want to happen). My last resort is creating an if-else statement as below: if(n < 500){} else{*insert rest of program here*} but that seems like bad
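One commonly used way to end early without leaving the R session is `stop()`, which raises an error and returns control to the console. The sketch below is an assumption about what would suit the question, not an answer taken from it; `run_analysis` is a hypothetical wrapper, and the threshold mirrors the `n < 500` check above.

```r
# Hypothetical wrapper around the analysis; only the n < 500 check is from the question.
run_analysis <- function(n) {
  if (n < 500) {
    # stop() raises an error: the function (or a source()d script) ends here
    # and control returns to the console, but the R session stays open.
    stop("not enough data: n = ", n)
  }
  "analysis finished"  # stands in for the rest of the program
}

# tryCatch is used only so that this demonstration keeps running after the error.
result_small <- tryCatch(run_analysis(100), error = function(e) conditionMessage(e))
result_large <- run_analysis(1000)
```

Unlike `quit()`, this leaves the workspace intact, so the data that failed the check can still be inspected afterwards.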

How to sort a data frame by date

I need to sort a data frame by date in R. The dates are all in the form "dd/mm/yyyy". The dates are in the 3rd column, whose header is V3. I have seen how to sort a data frame by column, and I have seen how to convert a string into a date value, but I can't combine the two in order to sort the data frame by date. Assuming your data frame is named d: d[order(as.Date(d$V3, format="%d/%m/%Y")),] Read my blog post, Sorting a data frame by the contents of a column, if that doesn't make sense. If you want to sort dates in descending order,
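A small worked example of the `order(as.Date(...))` idiom may help. The data below are made up for illustration; only the column name `V3` and the dd/mm/yyyy format come from the question. The descending variant uses the `decreasing` argument of `order()`, which is one possible way to do it.

```r
# Illustrative data; only the column name V3 and the date format are from the question.
d <- data.frame(V1 = 1:3,
                V2 = c("x", "y", "z"),
                V3 = c("15/03/2011", "02/01/2012", "28/11/2010"),
                stringsAsFactors = FALSE)

# Parse the strings once, then reuse the Date vector for both orderings.
dates <- as.Date(d$V3, format = "%d/%m/%Y")

d_sorted <- d[order(dates), ]                     # oldest first
d_desc   <- d[order(dates, decreasing = TRUE), ]  # newest first
```

Sorting the raw strings would compare them character by character, so the conversion to `Date` is what makes the ordering chronological.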

How to delete columns that contain ONLY NAs?

I have a data.frame containing some columns with all NA values. How can I delete them from the data.frame? Can I use the function na.omit(...), specifying some additional arguments? One way of doing it: df[, colSums(is.na(df)) != nrow(df)] If the count of NAs in a column is equal to the number of rows, it must be entirely NA. Or similarly: df[colSums(!is.na(df)) > 0] It seems like you want to remove ONLY columns with ALL NAs, leaving columns in which only some rows have NAs. I would do this (but I am sure there is an efficient vectorised solution): #set seed for
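To see the `colSums(is.na(df))` test at work, here is a minimal sketch on made-up data. The column names and the `drop = FALSE` argument are additions for illustration; `drop = FALSE` simply keeps the result a data frame even if only one column survives.

```r
# Illustrative data: column b is entirely NA, columns a and c only partly.
df <- data.frame(a = c(1, NA, 3),
                 b = c(NA, NA, NA),
                 c = c("x", NA, "z"),
                 stringsAsFactors = FALSE)

# A column is all-NA exactly when its NA count equals the number of rows.
keep <- colSums(is.na(df)) != nrow(df)

# drop = FALSE keeps a data frame even when a single column remains.
cleaned <- df[, keep, drop = FALSE]

# For comparison: na.omit() works on rows, so here it removes every row.
rows_left <- nrow(na.omit(df))
```

The comparison at the end suggests why `na.omit()` is not the right tool for this task: it drops incomplete rows rather than empty columns.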

Learning R. Where does one Start?

I've been using R for a little over a year now and it's been a successful venture. But all too often, I find that there is something that I can't figure out for lack of knowing how to find it or an example of it. Stack Overflow, could you recommend a pathway for learning R in a manner that provides one with a toolset at their disposal to solve problems of a statistical nature? There's a lot of knowledge on the internet, between the r-project site and the mailing lists, but it seems to be "everywhere" and yet not there when you're actually looking for it. For instance, when I first started using R, I went through "Intro to R". Then I read the

Understanding the order() function

I'm trying to understand how the order() function works. I was under the impression that it returned a permutation of indices, which when sorted, would sort the original vector. For instance, > a <- c(45,50,10,96) > order(a) [1] 3 1 2 4 I would have expected this to return c(2, 3, 1, 4), since the sorted list would be 10 45 50 96. Can someone help me understand the return value of this function? This seems to explain it. The definition of order is that a[order(a)] is in increasing order. This works with your example, where the correct order is the third, first, second, then fourth element.
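The distinction can be checked directly on the vector from the question: `order()` lists the positions to pick from, in sorted order, whereas the result the asker expected is what `rank()` returns. A short sketch:

```r
a <- c(45, 50, 10, 96)

# order(a): which positions of a to take, in turn, to get ascending values.
o <- order(a)        # 3 1 2 4
sorted <- a[o]       # 10 45 50 96

# rank(a): where each element of a lands in the sorted result.
r <- rank(a)         # 2 3 1 4
```

In other words, `order()` answers "which element comes first?", while `rank()` answers "where does this element go?".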

`levels<-`( What sorcery is this?

In an answer to another question, @Marek posted the following solution: https://stackoverflow.com/a/10432263/636656 dat <- structure(list(product = c(11L, 11L, 9L, 9L, 6L, 1L, 11L, 5L, 7L, 11L, 5L, 11L, 4L, 3L, 10L, 7L, 10L, 5L, 9L, 8L)), .Names = "product", row.names = c(NA, -20L), class = "data.frame") `levels<-`( factor(dat$product), list(Tylenol=1:3, Advil=4:6, Bayer=7:9
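The construct in the quoted solution calls the replacement function `levels<-` as an ordinary function: it takes a factor and a new levels specification, and returns a modified copy. The sketch below uses a small made-up factor and hypothetical level names (`low`, `high`) rather than the truncated list from the post.

```r
# Illustrative factor; the level names low/high are hypothetical.
f <- factor(c(1, 2, 3, 1))

# Calling the replacement function directly returns a new factor.
# A named list merges old levels: 1 and 2 become "low", 3 becomes "high".
g <- `levels<-`(f, list(low = 1:2, high = 3))

# The usual assignment form does the same job, but modifies its target.
h <- f
levels(h) <- list(low = 1:2, high = 3)
```

Here `f` keeps its original levels, while `g` and `h` end up equal, which is why the functional form can be used inline without creating a temporary variable first.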

How to make good reproducible pandas examples

Having spent a decent amount of time watching both the r and pandas tags on SO, the impression that I get is that pandas questions are less likely to contain reproducible data. This is something that the R community has been pretty good about encouraging, and thanks to guides like this, newcomers are able to get some help on putting together these examples. People who are able to read these guides and come back with reproducible data will often have much better luck getting answers to their questions. How can we create good reproducible examples for pandas questions? Simple dataframes can be put together, e.g.: import pandas as pd df = pd.DataFrame({'user':

How do I replace NA values with zeros in an R dataframe?

I have a data.frame and some columns have NA values. I want to replace the NAs with zeros. How do I do this? See my comment on @gsk3's answer. A simple example: > m <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10) > d <- as.data.frame(m) V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 1 4 3 NA 3 7 6 6 10 6 5 2 9 8 9 5 10 NA 2 1 7 2 3 1 1 6 3 6 NA 1 4 1 6 4 NA 4 NA 7 10 2 NA 4 1 8 5 1 2
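Building on the example data from the question, one widely used idiom is logical-matrix indexing with `is.na()`. The seed below is an assumption added only to make the sketch repeatable, and `na_before` is a hypothetical helper variable.

```r
set.seed(1)  # assumption: seed added only for repeatability
m <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10)
d <- as.data.frame(m)

na_before <- sum(is.na(d))  # how many NAs the sample produced

# is.na(d) is a logical matrix; assigning through it replaces every NA cell.
d[is.na(d)] <- 0
```

This touches every column at once, so it suits data frames whose columns are all numeric, as in this example.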