R: cor.test by group with ddply

2018-06-20 12:55:21

I am trying to calculate the correlation between two numeric columns in a data frame for each level of a factor. Here is an example data frame:

concentration <- c(3, 8, 4, 7, 3, 1, 3, 3, 8, 6)
area <- c(0.5, 0.9, 0.3, 0.4, 0.5, 0.8, 0.9, 0.2, 0.7, 0.7)
area_type <- c("A", "B", "A", "B", "A", "B", "A", "B", "A", "B")
data_frame <- data.frame(concentration, area, area_type)

In this example, I want to calculate the correlation between concentration and area for each level of area_type. I want to use cor.test rather than cor because I need p-values and Kendall's tau estimates. I have tried to do this using ddply:

ddply(data_frame, "area_type", summarise,
  corr=(cor.test(data_frame$area, data_frame$concentration,
                 alternative="two.sided", method="kendall") ) )

However, I am having a problem with the output: it is organized differently from the normal Kendall cor.test output, which states z value, p-value, alternative hypothesis, and tau estimate. Instead of that, I get the output below. I don't know what each row of the output indicates. In addition, the output values are the same for each level of area_type.

  area_type                                         corr
1          A                                    0.3766218
2          A                                         NULL
3          A                                    0.7064547
4          A                                    0.1001252
5          A                                            0
6          A                                    two.sided
7          A               Kendall's rank correlation tau
8          A data_frame$area and data_frame$concentration
9          B                                    0.3766218
10         B                                         NULL
11         B                                    0.7064547
12         B                                    0.1001252
13         B                                            0
14         B                                    two.sided
15         B               Kendall's rank correlation tau
16         B data_frame$area and data_frame$concentration

What am I doing wrong with ddply? Or are there other ways of doing this? Thanks.

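You can add an additional column holding the names of the elements of corr. Your syntax is also slightly off: .(area_type) is plyr's way of quoting the grouping variable, and inside summarise you should refer to columns by their bare names. Drop the data_frame$ prefix, otherwise cor.test is run on the entire data frame for every group, which is why both groups show identical values: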

ddply(data_frame, .(area_type), summarise,
      corr = cor.test(area, concentration,
                      alternative = "two.sided", method = "kendall"),
      name = names(corr))

Which gives:

   area_type                           corr        name
1          A                      -0.285133   statistic
2          A                           NULL   parameter
3          A                      0.7755423     p.value
4          A                     -0.1259882    estimate
5          A                              0  null.value
6          A                      two.sided alternative
7          A Kendall's rank correlation tau      method
8          A         area and concentration   data.name
9          B                              6   statistic
10         B                           NULL   parameter
11         B                      0.8166667     p.value
12         B                            0.2    estimate
13         B                              0  null.value
14         B                      two.sided alternative
15         B Kendall's rank correlation tau      method
16         B         area and concentration   data.name

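statistic is the test statistic and estimate is the tau estimate. Group A contains ties, so cor.test falls back to a normal approximation and the statistic is a z value; group B has no ties, so an exact test is used and the statistic is T (here 6) rather than z.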

EDIT: You can also do it like this to only pull what you want:

corfun <- function(x, y) {
  # Return the full htest object so individual components can be extracted with $
  cor.test(x, y, alternative = "two.sided", method = "kendall")
}

ddply(data_frame, .(area_type), summarise,
      z = corfun(area, concentration)$statistic,
      pval = corfun(area, concentration)$p.value,
      tau.est = corfun(area, concentration)$estimate,
      alt = corfun(area, concentration)$alternative)

Which gives:

  area_type         z      pval    tau.est       alt
1         A -0.285133 0.7755423 -0.1259882 two.sided
2         B  6.000000 0.8166667  0.2000000 two.sided
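The version above calls corfun four times per group. A minimal sketch of an alternative, assuming plyr is installed, passes ddply a function that runs cor.test once per group and returns a one-row data frame. The helper name kendall_row and the column name statistic are hypothetical and not part of the original answer:

```r
library(plyr)

data_frame <- data.frame(
  concentration = c(3, 8, 4, 7, 3, 1, 3, 3, 8, 6),
  area          = c(0.5, 0.9, 0.3, 0.4, 0.5, 0.8, 0.9, 0.2, 0.7, 0.7),
  area_type     = c("A", "B", "A", "B", "A", "B", "A", "B", "A", "B")
)

# Hypothetical helper: run cor.test once on one group's rows and
# return the components of interest as a one-row data frame.
kendall_row <- function(d) {
  res <- cor.test(d$area, d$concentration,
                  alternative = "two.sided", method = "kendall")
  data.frame(statistic = unname(res$statistic),
             pval      = res$p.value,
             tau.est   = unname(res$estimate),
             alt       = res$alternative)
}

# ddply splits by area_type, applies kendall_row to each piece,
# and row-binds the results with the grouping column prepended.
result <- ddply(data_frame, .(area_type), kendall_row)
print(result)
```

Expect a warning about ties for group A; it only indicates that an exact p-value could not be computed for that group.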

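Part of the reason this is not working is that cor.test returns a structured result rather than a single number. For example, the default (Pearson) test on the whole data frame prints: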

Pearson's product-moment correlation

data:  data_frame$concentration and data_frame$area
t = 0.5047, df = 8, p-value = 0.6274
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.5104148  0.7250936
sample estimates:
  cor 
  0.1756652

This information cannot be put into a data.frame (which is what ddply returns) without further complicating the code. If you can tell me exactly which values you need, I can provide further assistance. I would start by just using cor:

corrTest <- ddply(.data = data_frame,
                  .variables = .(area_type),
                  # .fun must be a function of each group's rows
                  .fun = function(d) data.frame(
                    tau = cor(d$concentration, d$area, method = "kendall")))

I haven't tested this code, but this is the route I would take initially and work from there.
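To make the point about the return value concrete: cor.test returns a list of class "htest", which is why it does not fit into a single data frame column as-is. A rough base R sketch (no plyr) that inspects the object and then extracts two components per group is shown here; the names pieces, rows, and by_group are illustrative only:

```r
data_frame <- data.frame(
  concentration = c(3, 8, 4, 7, 3, 1, 3, 3, 8, 6),
  area          = c(0.5, 0.9, 0.3, 0.4, 0.5, 0.8, 0.9, 0.2, 0.7, 0.7),
  area_type     = c("A", "B", "A", "B", "A", "B", "A", "B", "A", "B")
)

# cor.test returns a list of class "htest"; components are accessed by name.
res <- cor.test(data_frame$area, data_frame$concentration, method = "kendall")
print(class(res))
print(names(res))

# Split the rows by area_type, test each piece, and keep only scalar components.
pieces <- split(data_frame, data_frame$area_type)
rows <- lapply(names(pieces), function(g) {
  r <- cor.test(pieces[[g]]$area, pieces[[g]]$concentration, method = "kendall")
  data.frame(area_type = g, tau = unname(r$estimate), pval = r$p.value)
})

# Bind the one-row data frames into a single result.
by_group <- do.call(rbind, rows)
print(by_group)
```

Keeping only scalar components such as the estimate and p-value is what lets the per-group results stack into ordinary data frame rows.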
