R: cor.test by group with ddply
I am trying to calculate the correlation between two numeric columns in a data frame for each level of a factor. Here is an example data frame:
concentration <- c(3, 8, 4, 7, 3, 1, 3, 3, 8, 6)
area <- c(0.5, 0.9, 0.3, 0.4, 0.5, 0.8, 0.9, 0.2, 0.7, 0.7)
area_type <- c("A", "B", "A", "B", "A", "B", "A", "B", "A", "B")
data_frame <- data.frame(concentration, area, area_type)
In this example, I want to calculate the correlation between concentration and area for each level of area_type. I want to use cor.test rather than cor because I want p-values and Kendall's tau values. I have tried to do this using ddply:
ddply(data_frame, "area_type", summarise,
corr=(cor.test(data_frame$area, data_frame$concentration,
alternative="two.sided", method="kendall") ) )
However, I am having a problem with the output: it is organized differently from the normal Kendall cor.test output, which states z value, p-value, alternative hypothesis, and tau estimate. Instead of that, I get the output below. I don't know what each row of the output indicates. In addition, the output values are the same for each level of area_type.
area_type corr
1 A 0.3766218
2 A NULL
3 A 0.7064547
4 A 0.1001252
5 A 0
6 A two.sided
7 A Kendall's rank correlation tau
8 A data_frame$area and data_frame$concentration
9 B 0.3766218
10 B NULL
11 B 0.7064547
12 B 0.1001252
13 B 0
14 B two.sided
15 B Kendall's rank correlation tau
16 B data_frame$area and data_frame$concentration
What am I doing wrong with ddply? Or are there other ways of doing this? Thanks.
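You can add an additional column with the names of corr. Also, your syntax is slightly incorrect. The .() wrapper quotes the variable name so that it is looked up in the data frame you've specified (a character string such as "area_type" also works). The important fix is to remove the data_frame$ prefix, or else cor.test will use the entire data frame instead of each group's subset: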
library(plyr)
ddply(data_frame, .(area_type), summarise,
      corr = cor.test(area, concentration,
                      alternative = "two.sided", method = "kendall"),
      name = names(corr))
Which gives:
area_type corr name
1 A -0.285133 statistic
2 A NULL parameter
3 A 0.7755423 p.value
4 A -0.1259882 estimate
5 A 0 null.value
6 A two.sided alternative
7 A Kendall's rank correlation tau method
8 A area and concentration data.name
9 B 6 statistic
10 B NULL parameter
11 B 0.8166667 p.value
12 B 0.2 estimate
13 B 0 null.value
14 B two.sided alternative
15 B Kendall's rank correlation tau method
16 B area and concentration data.name
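statistic is the test statistic and estimate is the tau estimate. Note that the statistic is a z-value only when the normal approximation is used, as for group A, which has ties. For group B, which has no ties, cor.test runs the exact test and reports the T statistic instead, which is why the value is 6.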
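The labels in the name column are the component names of the object that cor.test returns. A minimal sketch, assuming the example data from the question (rebuilt here so the snippet runs on its own), shows those names and the difference between testing the full columns and testing one group's subset:

```r
# Rebuild the example data from the question
data_frame <- data.frame(
  concentration = c(3, 8, 4, 7, 3, 1, 3, 3, 8, 6),
  area          = c(0.5, 0.9, 0.3, 0.4, 0.5, 0.8, 0.9, 0.2, 0.7, 0.7),
  area_type     = rep(c("A", "B"), times = 5)
)

# Test on the full columns: this is what data_frame$area refers to,
# regardless of which group is currently being processed.
# Ties make the p-value approximate, so R's warning is silenced here.
whole <- suppressWarnings(
  cor.test(data_frame$area, data_frame$concentration, method = "kendall")
)

# The result is a list of class "htest"; its component names are the
# labels that appear in the name column
print(class(whole))
print(names(whole))

# Test on group B only (this subset has no ties)
group_b <- subset(data_frame, area_type == "B")
only_b  <- cor.test(group_b$area, group_b$concentration, method = "kendall")

print(whole$estimate)   # tau for all ten rows
print(only_b$estimate)  # tau for group B only
```

Individual values can then be pulled out with $, for example only_b$p.value.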
EDIT: You can also do it like this to only pull what you want:
# Wrapper that runs a two-sided Kendall test and returns the htest object
corfun <- function(x, y) {
  cor.test(x, y, alternative = "two.sided", method = "kendall")
}

# One row per area_type, keeping only the fields of interest
ddply(data_frame, .(area_type), summarise,
      z       = corfun(area, concentration)$statistic,
      pval    = corfun(area, concentration)$p.value,
      tau.est = corfun(area, concentration)$estimate,
      alt     = corfun(area, concentration)$alternative)
Which gives:
  area_type         z      pval    tau.est       alt
1         A -0.285133 0.7755423 -0.1259882 two.sided
2         B  6.000000 0.8166667  0.2000000 two.sided
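The version above runs cor.test four times per group. One possible alternative, sketched here with base R only and a hypothetical helper named kendall_row (not part of plyr or of the original answer), runs the test once per group and returns a one-row data frame:

```r
# Rebuild the example data from the question
data_frame <- data.frame(
  concentration = c(3, 8, 4, 7, 3, 1, 3, 3, 8, 6),
  area          = c(0.5, 0.9, 0.3, 0.4, 0.5, 0.8, 0.9, 0.2, 0.7, 0.7),
  area_type     = rep(c("A", "B"), times = 5)
)

# Hypothetical helper: takes one group's rows, runs the test once,
# and keeps only the fields of interest
kendall_row <- function(d) {
  # Groups with ties trigger a warning about the exact p-value
  res <- suppressWarnings(
    cor.test(d$area, d$concentration,
             alternative = "two.sided", method = "kendall")
  )
  data.frame(
    statistic = unname(res$statistic),
    pval      = res$p.value,
    tau.est   = unname(res$estimate),
    alt       = res$alternative
  )
}

# Split by group, apply the helper, and stack the resulting rows
pieces <- lapply(split(data_frame, data_frame$area_type), kendall_row)
result <- do.call(rbind, pieces)

# Turn the group labels (carried as row names) into a regular column
result$area_type <- rownames(result)
rownames(result) <- NULL
print(result)
```

Because kendall_row takes a data frame and returns a data frame, it should also be usable directly as the function argument to ddply, as in ddply(data_frame, .(area_type), kendall_row).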
Part of the reason this is not working is that cor.test returns a test object, which prints like this:
Pearson's product-moment correlation
data: data_frame$concentration and data_frame$area
t = 0.5047, df = 8, p-value = 0.6274
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.5104148 0.7250936
sample estimates:
cor
0.1756652
This information cannot be put directly into a data.frame (which is what ddply returns) without further complicating the code. If you can say exactly which values you need, I can provide further assistance. I would look at just using
corrTest <- ddply(.data = data_frame,
                  .variables = .(area_type),
                  .fun = summarise,
                  tau = cor(concentration, area, method = "kendall"))
I haven't tested this code, but this is the route I would take initially and work from there.
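If only the tau estimates are needed, without p-values, cor on its own is enough. A small base R sketch of that route, assuming the same example data (rebuilt here so the snippet is self-contained):

```r
# Rebuild the example data from the question
data_frame <- data.frame(
  concentration = c(3, 8, 4, 7, 3, 1, 3, 3, 8, 6),
  area          = c(0.5, 0.9, 0.3, 0.4, 0.5, 0.8, 0.9, 0.2, 0.7, 0.7),
  area_type     = rep(c("A", "B"), times = 5)
)

# Kendall's tau per level of area_type; returns a named numeric vector
tau_by_group <- sapply(
  split(data_frame, data_frame$area_type),
  function(d) cor(d$concentration, d$area, method = "kendall")
)
print(tau_by_group)
```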