Export summary of multiple regressions from list

2018-06-10 11:06:37

I have a list of multiple regressions, fitted with the code below on the standard dataset mtcars.

models <- lapply(paste("mpg", names(mtcars)[-1], sep = "~"), formula)
res.models <- lapply(models, FUN = function(x) {summary(lm(formula = x, data = mtcars))})
names(res.models) <- paste("mpg", names(mtcars)[-1], sep = "~")

This gives me a list of regressions of the first column, "mpg", against each of the other columns. From here I am trying to export certain summary statistics, such as the intercept, coefficient and r.squared.

I have tried using a loop which I've included below.

for (i in 1:length(res.models))
{
  res <- res.models[[i]]
  res_bound <- NULL
  intercept <- res$coefficients[1]
  coef <- res$coefficients[2]
  r <- res$r.squared
  res_bound <- cbind(intercept, coef, r)
}

Although this gets me a dataframe, it only includes the results from the last regression model: a 1 row by 3 column dataframe. Furthermore, I would like to have the "terms" of each regression in the table, so I can tell which model I am looking at (e.g. mpg vs cyl or mpg vs hp).

Am I simply missing a step in my loop? The ultimate goal is to write.csv the final dataframe.

If you want to do it in base R:

res <- lapply(seq_along(res.models), function(i) {

  data.frame(model = names(res.models)[i],
             intercept = res.models[[i]]$coefficients[1],
             coef = res.models[[i]]$coefficients[2],
             r = res.models[[i]]$r.squared,
             stringsAsFactors = FALSE)

})

do.call(rbind, res)

Output:

      model intercept        coef         r
1   mpg~cyl 37.884576 -2.87579014 0.7261800
2  mpg~disp 29.599855 -0.04121512 0.7183433
3    mpg~hp 30.098861 -0.06822828 0.6024373
4  mpg~drat -7.524618  7.67823260 0.4639952
5    mpg~wt 37.285126 -5.34447157 0.7528328
6  mpg~qsec -5.114038  1.41212484 0.1752963
7    mpg~vs 16.616667  7.94047619 0.4409477
8    mpg~am 17.147368  7.24493927 0.3597989
9  mpg~gear  5.623333  3.92333333 0.2306734
10 mpg~carb 25.872334 -2.05571870 0.3035184

The reason for iterating over seq_along(res.models) instead of res.models itself is that the index also lets us grab the name of the associated slot in the list and drop it into the data frame you're making.
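
The loop in the question can also be repaired directly. It ends up with a single row because res_bound is reset to NULL and then overwritten on every pass. Below is a minimal sketch of one possible fix, assuming the res.models list from the question; the names rows, res_table and out_file are made up for illustration, and a temporary file stands in for wherever the CSV should really go.

```r
# Rebuild the list of model summaries from the question
models <- lapply(paste("mpg", names(mtcars)[-1], sep = "~"), formula)
res.models <- lapply(models, function(x) summary(lm(formula = x, data = mtcars)))
names(res.models) <- paste("mpg", names(mtcars)[-1], sep = "~")

# One slot per model, so no iteration overwrites another
rows <- vector("list", length(res.models))

for (i in seq_along(res.models)) {
  res <- res.models[[i]]
  rows[[i]] <- data.frame(model = names(res.models)[i],
                          intercept = res$coefficients[1, "Estimate"],
                          coef = res$coefficients[2, "Estimate"],
                          r = res$r.squared,
                          stringsAsFactors = FALSE)
}

# Stack the one-row data frames into a single table
res_table <- do.call(rbind, rows)

# Hypothetical output path: replace tempfile() with a real file name
out_file <- tempfile(fileext = ".csv")
write.csv(res_table, out_file, row.names = FALSE)
```

Indexing the coefficient matrix by row and column name of the estimate, rather than by position alone, is a stylistic choice here; it should give the same numbers as the lapply version above.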

You can use purrr::map_df to apply broom::glance to each model and then collect the results into a data.frame:

purrr::map_df(res.models, broom::glance, .id = 'formula')
#>     formula r.squared adj.r.squared    sigma statistic      p.value df
#> 1   mpg~cyl 0.7261800     0.7170527 3.205902 79.561028 6.112687e-10  2
#> 2  mpg~disp 0.7183433     0.7089548 3.251454 76.512660 9.380327e-10  2
#> 3    mpg~hp 0.6024373     0.5891853 3.862962 45.459803 1.787835e-07  2
#> 4  mpg~drat 0.4639952     0.4461283 4.485409 25.969645 1.776240e-05  2
#> 5    mpg~wt 0.7528328     0.7445939 3.045882 91.375325 1.293959e-10  2
#> 6  mpg~qsec 0.1752963     0.1478062 5.563738  6.376702 1.708199e-02  2
#> 7    mpg~vs 0.4409477     0.4223126 4.580827 23.662241 3.415937e-05  2
#> 8    mpg~am 0.3597989     0.3384589 4.902029 16.860279 2.850207e-04  2
#> 9  mpg~gear 0.2306734     0.2050292 5.373695  8.995144 5.400948e-03  2
#> 10 mpg~carb 0.3035184     0.2803024 5.112961 13.073646 1.084446e-03  2

You could do something similar with broom::tidy for the coefficients, or broom::augment for the residuals. Note that broom functions are intended to be called on the models themselves, not the summaries, but you can keep the whole thing in a pipeline, if you like:

library(purrr)

names(mtcars)[-1] %>% 
    paste('mpg ~', .) %>%    # or start with `models` at this point
    map(lm, data = mtcars) %>% 
    map_df(broom::glance, .id = 'formula')
#>    formula r.squared adj.r.squared    sigma statistic      p.value df
#> 1        1 0.7261800     0.7170527 3.205902 79.561028 6.112687e-10  2
#> 2        2 0.7183433     0.7089548 3.251454 76.512660 9.380327e-10  2
#> 3        3 0.6024373     0.5891853 3.862962 45.459803 1.787835e-07  2
#> 4        4 0.4639952     0.4461283 4.485409 25.969645 1.776240e-05  2
#> 5        5 0.7528328     0.7445939 3.045882 91.375325 1.293959e-10  2
#> 6        6 0.1752963     0.1478062 5.563738  6.376702 1.708199e-02  2
#> 7        7 0.4409477     0.4223126 4.580827 23.662241 3.415937e-05  2
#> 8        8 0.3597989     0.3384589 4.902029 16.860279 2.850207e-04  2
#> 9        9 0.2306734     0.2050292 5.373695  8.995144 5.400948e-03  2
#> 10      10 0.3035184     0.2803024 5.112961 13.073646 1.084446e-03  2
#>       logLik      AIC      BIC deviance df.residual
#> 1  -81.65321 169.3064 173.7036 308.3342          30
#> 2  -82.10469 170.2094 174.6066 317.1587          30
#> 3  -87.61931 181.2386 185.6358 447.6743          30
#> 4  -92.39996 190.7999 195.1971 603.5667          30
#> 5  -80.01471 166.0294 170.4266 278.3219          30
#> 6  -99.29406 204.5881 208.9853 928.6553          30
#> 7  -93.07356 192.1471 196.5443 629.5193          30
#> 8  -95.24219 196.4844 200.8816 720.8966          30
#> 9  -98.18192 202.3638 206.7611 866.2980          30
#> 10 -96.59033 199.1807 203.5779 784.2711          30

Note that you get a few extra variables that aren't contained in the summary.
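
The formula column in the pipeline output above holds 1 to 10 because the list passed to map_df has no names. Here is a sketch of one way to keep the formulas as labels and to pull the coefficients with broom::tidy. It assumes the purrr, broom and dplyr packages are installed (map_df relies on dplyr to bind the rows); the object names formulas, fits, fit_stats and coef_table are illustrative only.

```r
library(purrr)

# Character formulas, named by themselves so `.id` can pick the names up
formulas <- set_names(paste("mpg ~", names(mtcars)[-1]))

# Fit the models themselves (not their summaries); map() keeps the names
fits <- map(formulas, function(f) lm(as.formula(f), data = mtcars))

# One row of fit statistics per model, labelled by formula
fit_stats <- map_df(fits, broom::glance, .id = "formula")

# One row per coefficient (intercept and slope) per model
coef_table <- map_df(fits, broom::tidy, .id = "formula")
```

Either table can then be passed to write.csv in the same way as the base R result.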
