Export summary of multiple regressions from list
I have a list of multiple regressions, created with the following code on the standard dataset mtcars:
models <- lapply(paste("mpg", names(mtcars)[-1], sep = "~"), formula)
res.models <- lapply(models, FUN = function(x) {summary(lm(formula = x, data = mtcars))})
names(res.models) <- paste("mpg", names(mtcars)[-1], sep = "~")
I now have a list of regressions of the first column, "mpg", against each of the other columns. From here I am trying to export certain summary statistics for each model: the intercept, the coefficient, and r.squared.
I have tried using a loop which I've included below.
for (i in 1:length(res.models))
{
res <- res.models[[i]]
res_bound <- NULL
intercept <- res$coefficients[1]
coef <- res$coefficients[2]
r <- res$r.squared
res_bound <- cbind(intercept, coef, r)
}
Although this gets me a dataframe, it only includes the results from the last regression model (a 1 row by 3 column dataframe). Furthermore, I would like to include the "terms" of each regression in the table, to distinguish which model I am looking at (e.g. mpg vs cyl or mpg vs hp).
Am I simply missing a step in my loop? The ultimate goal is to write.csv the final dataframe.
If you want to do it in base R:
res <- lapply(seq_along(res.models), function(i) {
data.frame(model = names(res.models)[i],
intercept = res.models[[i]]$coefficients[1],
coef = res.models[[i]]$coefficients[2],
r = res.models[[i]]$r.squared,
stringsAsFactors = FALSE)
})
do.call(rbind, res)
Output:
model intercept coef r
1 mpg~cyl 37.884576 -2.87579014 0.7261800
2 mpg~disp 29.599855 -0.04121512 0.7183433
3 mpg~hp 30.098861 -0.06822828 0.6024373
4 mpg~drat -7.524618 7.67823260 0.4639952
5 mpg~wt 37.285126 -5.34447157 0.7528328
6 mpg~qsec -5.114038 1.41212484 0.1752963
7 mpg~vs 16.616667 7.94047619 0.4409477
8 mpg~am 17.147368 7.24493927 0.3597989
9 mpg~gear 5.623333 3.92333333 0.2306734
10 mpg~carb 25.872334 -2.05571870 0.3035184
The reason for seq_along(res.models) instead of just res.models is so we can also grab the name for the associated slot in the list and drop it into the data frame you're making.
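As for the loop in the question: it resets res_bound to NULL on every pass and then overwrites it, which is why only the last model survives. A minimal sketch of a repaired loop follows, assuming the same res.models list as in the question. The names row and out_file are made up for this illustration, and the output path is an arbitrary choice.

```r
# Rebuild the summaries exactly as in the question.
models <- lapply(paste("mpg", names(mtcars)[-1], sep = "~"), formula)
res.models <- lapply(models, function(x) summary(lm(formula = x, data = mtcars)))
names(res.models) <- paste("mpg", names(mtcars)[-1], sep = "~")

# Create the accumulator once, before the loop, so it is not reset each pass.
res_bound <- NULL
for (i in seq_along(res.models)) {
  res <- res.models[[i]]
  # One row per model, labelled with the formula it came from.
  row <- data.frame(model     = names(res.models)[i],
                    intercept = res$coefficients[1],
                    coef      = res$coefficients[2],
                    r         = res$r.squared,
                    stringsAsFactors = FALSE)
  res_bound <- rbind(res_bound, row)  # append instead of overwrite
}

# `out_file` is a hypothetical path chosen for this sketch.
out_file <- file.path(tempdir(), "regression_summary.csv")
write.csv(res_bound, out_file, row.names = FALSE)
```

Growing a data frame with rbind inside a loop is fine for ten models; for long lists the lapply plus do.call(rbind, ...) version is usually the better choice.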
You can use purrr::map_df to apply broom::glance to each model and then collect the results into a data.frame:
purrr::map_df(res.models, broom::glance, .id = 'formula')
#> formula r.squared adj.r.squared sigma statistic p.value df
#> 1 mpg~cyl 0.7261800 0.7170527 3.205902 79.561028 6.112687e-10 2
#> 2 mpg~disp 0.7183433 0.7089548 3.251454 76.512660 9.380327e-10 2
#> 3 mpg~hp 0.6024373 0.5891853 3.862962 45.459803 1.787835e-07 2
#> 4 mpg~drat 0.4639952 0.4461283 4.485409 25.969645 1.776240e-05 2
#> 5 mpg~wt 0.7528328 0.7445939 3.045882 91.375325 1.293959e-10 2
#> 6 mpg~qsec 0.1752963 0.1478062 5.563738 6.376702 1.708199e-02 2
#> 7 mpg~vs 0.4409477 0.4223126 4.580827 23.662241 3.415937e-05 2
#> 8 mpg~am 0.3597989 0.3384589 4.902029 16.860279 2.850207e-04 2
#> 9 mpg~gear 0.2306734 0.2050292 5.373695 8.995144 5.400948e-03 2
#> 10 mpg~carb 0.3035184 0.2803024 5.112961 13.073646 1.084446e-03 2
You could do something similar with broom::tidy for the coefficients, or broom::augment for the residuals. Note that broom functions are intended to be called on the models themselves, not the summaries, but you can keep the whole thing in a pipeline, if you like:
library(purrr)
names(mtcars)[-1] %>%
paste('mpg ~', .) %>% # or start with `models` at this point
map(lm, data = mtcars) %>%
map_df(broom::glance, .id = 'formula')
#> formula r.squared adj.r.squared sigma statistic p.value df
#> 1 1 0.7261800 0.7170527 3.205902 79.561028 6.112687e-10 2
#> 2 2 0.7183433 0.7089548 3.251454 76.512660 9.380327e-10 2
#> 3 3 0.6024373 0.5891853 3.862962 45.459803 1.787835e-07 2
#> 4 4 0.4639952 0.4461283 4.485409 25.969645 1.776240e-05 2
#> 5 5 0.7528328 0.7445939 3.045882 91.375325 1.293959e-10 2
#> 6 6 0.1752963 0.1478062 5.563738 6.376702 1.708199e-02 2
#> 7 7 0.4409477 0.4223126 4.580827 23.662241 3.415937e-05 2
#> 8 8 0.3597989 0.3384589 4.902029 16.860279 2.850207e-04 2
#> 9 9 0.2306734 0.2050292 5.373695 8.995144 5.400948e-03 2
#> 10 10 0.3035184 0.2803024 5.112961 13.073646 1.084446e-03 2
#> logLik AIC BIC deviance df.residual
#> 1 -81.65321 169.3064 173.7036 308.3342 30
#> 2 -82.10469 170.2094 174.6066 317.1587 30
#> 3 -87.61931 181.2386 185.6358 447.6743 30
#> 4 -92.39996 190.7999 195.1971 603.5667 30
#> 5 -80.01471 166.0294 170.4266 278.3219 30
#> 6 -99.29406 204.5881 208.9853 928.6553 30
#> 7 -93.07356 192.1471 196.5443 629.5193 30
#> 8 -95.24219 196.4844 200.8816 720.8966 30
#> 9 -98.18192 202.3638 206.7611 866.2980 30
#> 10 -96.59033 199.1807 203.5779 784.2711 30
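Note that you get a few extra variables that aren't contained in the summary. Also, because the list of models in this pipeline is unnamed, the formula column holds the list positions 1 to 10 rather than the formula text.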
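To get the intercept, coefficient and r.squared into one table with the formula text as the label, one option is to name the list before fitting and combine broom::tidy with broom::glance. This is only a sketch: it assumes purrr, broom and dplyr (which map_dfr needs for row binding) are installed, and the names formulas, fits, result and out_file are made up for this illustration.

```r
library(purrr)

# Name each formula string after itself so that .id carries the formula text.
formulas <- set_names(paste("mpg ~", names(mtcars)[-1]))

# Fit the models themselves (not their summaries), as broom expects.
fits <- map(formulas, function(f) lm(as.formula(f), data = mtcars))

# One row per coefficient per model, and one row of fit statistics per model.
coefs <- map_dfr(fits, broom::tidy, .id = "formula")
stats <- map_dfr(fits, broom::glance, .id = "formula")

# Each model has exactly two terms, in list order, so the subsets line up
# row for row with `stats`.
result <- data.frame(
  formula   = stats$formula,
  intercept = coefs$estimate[coefs$term == "(Intercept)"],
  coef      = coefs$estimate[coefs$term != "(Intercept)"],
  r.squared = stats$r.squared
)

# `out_file` is a hypothetical path chosen for this sketch.
out_file <- file.path(tempdir(), "regression_summary.csv")
write.csv(result, out_file, row.names = FALSE)
```

The row-for-row alignment relies on every model having a single predictor; with more terms per model, the coefficients would need reshaping by term instead.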