aggregate over 2 groups
I'm trying to understand how to aggregate my output. I've created some dummy data that approximates my actual data, which has hundreds of group1 values, 3 levels of group2, and several dozen validation logicals. Apologies if this seems simple. I've hunted and pecked a lot, and as a newbie to R I have to say that the huge variety of tools out there (the apply family, ddply, aggregate, table, reshape, etc.) is both wonderful and a bit scary :)
#create data
group1 <- paste("Group", LETTERS[1:7])
group2 <- c("UNC", "UNC", "SS", "LS", "LS", "SS", "UNC")
valid1 <- c("Y", "N", NA, "N", "Y", "Y", "N")
valid2 <- c("N", "N", "Y", "N", "N", "Y", "N")
valid3 <- c(1.4, 1.2, NA, 0.7, 0.3, NA, 1.7)
valid4 <- c(0.4, 0.3, 0.53, 0.66, 0.3, 0.3, 0.71)
valid5 <- c(8.5, 11.2,NA, NA, 8.3, NA, 11.7)
testdata <- data.frame(cbind(group1, group2, valid1, valid2, valid3, valid4, valid5))
# recode each validation as 1 (pass) or 0 (fail); NA stays NA
# note: the body reads the global vectors, not the columns of its argument
valid <- function(testdata){
  val1 <- ifelse(valid1=="Y", 1,0)
  val2 <- ifelse(valid2=="Y", 1,0)
  val3 <- ifelse(valid3>=1.0, 1,0)
  val4 <- ifelse(valid4<=0.5, 1,0)
  val5 <- ifelse(valid5>=10.0, 1,0)
  test.out <- data.frame(cbind(group1,group2, val1, val2, val3, val4, val5))
  test.out
}
validtry <- valid(testdata)
Then, I need to turn these logicals into numeric so they can be summed:
#make validations numeric
# why doesn't this work:
# validtry[,3:7] <- as.numeric(validtry[,3:7])
#but these do
validtry[,3] <- as.numeric(validtry[,3])
validtry[,4] <- as.numeric(validtry[,4])
validtry[,5] <- as.numeric(validtry[,5])
validtry[,6] <- as.numeric(validtry[,6])
validtry[,7] <- as.numeric(validtry[,7])
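On the commented-out line: `as.numeric()` expects an atomic vector, while `validtry[,3:7]` is a data frame (a list of columns), so R cannot coerce it in one call. Below is a minimal sketch of a column-by-column conversion. The `flags` data frame is a hypothetical stand-in for `validtry`, not part of the original post.

```r
# Hypothetical stand-in for validtry: 0/1 flags stored as text, as cbind() produces
flags <- data.frame(val1 = c("1", "0", NA),
                    val2 = c("0", "0", "1"),
                    stringsAsFactors = FALSE)

# lapply() converts one column at a time; assigning into flags[] keeps the
# data-frame shape. as.character() guards against factor columns, whose
# as.numeric() value would be the internal level codes, not the labels.
flags[] <- lapply(flags, function(x) as.numeric(as.character(x)))

str(flags)
```

Applied to the data above, the same idea would read `validtry[3:7] <- lapply(validtry[3:7], function(x) as.numeric(as.character(x)))`.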
######
#summarize validtry
#sum on both groups
aggregate(validtry[,3:7], by=list(validtry$group1, validtry$group2), sum, na.rm=TRUE)
#sum on one group
aggregate(validtry[,3:7], by=list(validtry$group2), sum, na.rm=TRUE)
So, these last two get me close, but I think I need something different? I'm trying to sum across both rows and columns for the two groups. I'm familiar with tapply, but that doesn't seem to get it.
thanks in advance!!
The expected output is not clear. My guess is:
testdata <- data.frame(group1, group2, valid1, valid2, valid3, valid4, valid5)
str1 <- c("valid1=='Y'", "valid2=='Y'", "valid3>=1.0", "valid4 <=0.5", "valid5>=10.0")
validtry <- testdata
# eval(parse(...)) works here, but it is generally not recommended
validtry[,-(1:2)] <- lapply(str1, function(x) 1*with(testdata, eval(parse(text=x))))
library(reshape2)
# one group1 x group2 table of sums per validation column
lst <- lapply(validtry[3:7], function(x)
  dcast(data.frame(validtry[1:2], x), group1~group2, value.var="x", sum, na.rm=TRUE))
lst[[1]]
#   group1 LS SS UNC
#1 Group A  0  0   1
#2 Group B  0  0   0
#3 Group C  0  0   0
#4 Group D  0  0   0
#5 Group E  1  0   0
#6 Group F  0  1   0
#7 Group G  0  0   0
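If the goal is a single total across all validation columns for each group1/group2 combination (an assumption, since the question does not show the expected output), the following base R sketch avoids both `eval(parse(...))` and reshape2. The names `rules`, `flags`, `passed` and `totals` are hypothetical and used only for illustration.

```r
# Data as in the question
group1 <- paste("Group", LETTERS[1:7])
group2 <- c("UNC", "UNC", "SS", "LS", "LS", "SS", "UNC")
testdata <- data.frame(
  group1, group2,
  valid1 = c("Y", "N", NA, "N", "Y", "Y", "N"),
  valid2 = c("N", "N", "Y", "N", "N", "Y", "N"),
  valid3 = c(1.4, 1.2, NA, 0.7, 0.3, NA, 1.7),
  valid4 = c(0.4, 0.3, 0.53, 0.66, 0.3, 0.3, 0.71),
  valid5 = c(8.5, 11.2, NA, NA, 8.3, NA, 11.7)
)

# One predicate per validation column (list names match the column names)
rules <- list(
  valid1 = function(x) x == "Y",
  valid2 = function(x) x == "Y",
  valid3 = function(x) x >= 1.0,
  valid4 = function(x) x <= 0.5,
  valid5 = function(x) x >= 10.0
)

# Apply each rule to its column; as.integer() maps TRUE/FALSE/NA to 1/0/NA
flags <- testdata[c("group1", "group2")]
flags[names(rules)] <- Map(function(f, col) as.integer(f(col)),
                           rules, testdata[names(rules)])

# Assumed goal: passed checks per row, then summed per group1 x group2 cell
flags$passed <- rowSums(flags[names(rules)], na.rm = TRUE)
totals <- xtabs(passed ~ group1 + group2, data = flags)
totals
```

Storing the rules as functions keeps each condition next to the column it applies to, and `colSums(totals)` or `rowSums(totals)` then give the per-group2 and per-group1 totals.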