How to reproduce $resample and $results of a 'train' object in caret?

I'm new to the amazing caret package and am trying to reproduce some of the objects in the train() output for an lm model fitted with resampling method = "timeslice".

  • Why do $results$RMSE and $results$Rsquared in my example differ from what defaultSummary() returns for $pred$pred and $pred$obs?
  • What data is used to calculate RMSE, Rsquared, MAE in $resample?

    library(caret)
    library(ggplot2)     # economics data set
    library(doParallel)  # also attaches 'parallel' (detectCores, makeCluster)

    no_cores <- detectCores() - 1
    cls <- makeCluster(no_cores)
    registerDoParallel(cls)  # register the cluster so allowParallel = TRUE has an effect

    # df, ctrl and fit are placeholder names (the original object names were lost)
    df <- data.frame(economics[, -1])  # drop 'date' column

    # trainControl() with parallel processing and 1-step forecasts by time slices
    samplesCount  <- nrow(df)
    initialWindow <- 10
    h <- 1
    s <- 0
    M <- 1  # no. of models evaluated during each resample (tuning parameters)
    resamplesCount <- length(createTimeSlices(1:samplesCount, initialWindow, horizon = h,
                                              fixedWindow = TRUE, skip = s)$test)
    seeds <- vector(mode = "list", length = resamplesCount + 1)  # length = B+1, B = number of resamples
    # The first B elements must be integer vectors of length >= M, M = models evaluated per resample
    # (seed values drawn with sample.int() as a placeholder)
    for (i in 1:resamplesCount) seeds[[i]] <- sample.int(1000, M)
    # The last element only needs to be a single integer (for the final model)
    seeds[[resamplesCount + 1]] <- sample.int(1000, 1)

    ctrl <- trainControl(
      method = "timeslice", initialWindow = initialWindow, horizon = h, skip = s,  # data splitting
      returnResamp = "all",
      savePredictions = "all",
      seeds = seeds,
      allowParallel = TRUE)
    fit <- train(unemploy ~ ., data = df,
                 method = "lm",
                 trControl = ctrl)
    stopCluster(cls)
  • Output:

    Linear Regression 
    574 samples
      4 predictor
    No pre-processing
    Resampling: Rolling Forecasting Origin Resampling (1 held-out with a fixed window) 
    Summary of sample sizes: 10, 10, 10, 10, 10, 10, ... 
    Resampling results:
      RMSE     Rsquared  MAE    
      250.072  NaN       250.072
    Tuning parameter 'intercept' was held constant at a value of TRUE
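The resampling summary above (a fixed window of 10 training rows, 1 row held out) can be mimicked in base R to see which rows each resample uses. The helper `rolling_slices()` is hypothetical and not part of caret; it is a sketch that assumes the splits match `createTimeSlices(..., fixedWindow = TRUE, skip = 0)` and that each resample is named after its last training row, which agrees with the `Training010` holdout having `rowIndex` 11.

```r
# Hypothetical helper (not part of caret): fixed-window rolling-origin splits
rolling_slices <- function(n, initial_window, horizon = 1) {
  stopifnot(n - horizon >= initial_window)
  # Last training row of each split; the final split must leave 'horizon' rows to test
  ends <- seq(initial_window, n - horizon)
  # Fixed window: always the 'initial_window' rows ending at 'e'
  train <- lapply(ends, function(e) (e - initial_window + 1):e)
  # Holdout: the next 'horizon' rows after the training window
  test <- lapply(ends, function(e) (e + 1):(e + horizon))
  # Name each split after its last training row, e.g. "Training010" (assumed naming)
  names(train) <- sprintf("Training%03d", ends)
  names(test) <- names(train)
  list(train = train, test = test)
}

# Settings from the question: 574 rows, initialWindow = 10, horizon = 1
slices <- rolling_slices(n = 574, initial_window = 10, horizon = 1)
length(slices$test)            # number of resamples
slices$train[["Training010"]]  # training rows for the first resample
slices$test[["Training010"]]   # the single held-out row
```

With 574 rows the sketch yields 564 one-row holdouts, each preceded by exactly 10 training rows.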

Why isn't the output for RMSE and Rsquared the same as when calculated with defaultSummary()?

    dat <- data.frame(fit$pred$pred, fit$pred$obs)  # fit = the train() object
    colnames(dat) <- c("pred", "obs")
    > defaultSummary(dat)
          RMSE   Rsquared        MAE 
    394.440680   0.978365 250.072031 

How can I reproduce the results in $resample?

    > head(fit$resample)  # fit = the train() object
           RMSE Rsquared       MAE intercept    Resample
    1  16.33273       NA  16.33273      TRUE Training010
    2 232.16184       NA 232.16184      TRUE Training011
    3 197.65143       NA 197.65143      TRUE Training012
    4 393.29469       NA 393.29469      TRUE Training013
    5 129.99157       NA 129.99157      TRUE Training014
    6  60.95649       NA  60.95649      TRUE Training015

Session info:

    > sessionInfo()
    R version 3.4.2 (2017-09-28)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows >= 8 x64 (build 9200)
    Matrix products: default
    [1] LC_COLLATE=Swedish_Sweden.1252  LC_CTYPE=Swedish_Sweden.1252    LC_MONETARY=Swedish_Sweden.1252
    [4] LC_NUMERIC=C                    LC_TIME=Swedish_Sweden.1252    
    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods   base     
    other attached packages:
     [1] fpp_0.5             tseries_0.10-42     lmtest_0.9-35       zoo_1.8-0          
     [5] expsmooth_2.3       fma_2.3             forecast_8.2        mlbench_2.1-1      
     [9] spikeslab_1.1.5     randomForest_4.6-12 lars_1.2            doParallel_1.0.11  
    [13] iterators_1.0.8     foreach_1.4.3       caret_6.0-77.9000   ggplot2_2.2.1      
    [17] lattice_0.20-35 

I found the answer to my questions here:

Q1. Why do $results$RMSE and $results$Rsquared in my example differ from what defaultSummary() returns for $pred$pred and $pred$obs?

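A: train() reports the average of the per-holdout metrics stored in $resample, not a metric computed on all pooled predictions. Because every holdout here contains a single row, each per-holdout RMSE equals that row's absolute error, so the averaged RMSE and MAE are identical (250.072) and match the pooled MAE from defaultSummary(), while the pooled RMSE (394.44) is larger. In my example: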

        # The reported output is the mean over the rows of $resample (fit = the train() object)
        mean(fit$resample$RMSE)  # = 250.072
        mean(fit$resample$MAE)   # = 250.072
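A toy example with made-up errors (not the economics fit) shows the two aggregations side by side: averaging one-row metrics, which is what train() is described above as doing, versus pooling all predictions first, which is what defaultSummary() does on the $pred data.

```r
# Toy 1-step-ahead errors, one per resample (made-up numbers)
pred <- c(110, 95, 130, 100)
obs  <- c(100, 100, 100, 100)
err  <- pred - obs                    # 10, -5, 30, 0

# Averaging one-row metrics: the RMSE of a single row is just |error|
per_resample_rmse <- sqrt(err^2)
per_resample_mae  <- abs(err)
mean(per_resample_rmse)               # 11.25
mean(per_resample_mae)                # 11.25, identical to the averaged RMSE

# Pooling all predictions first
sqrt(mean(err^2))                     # pooled RMSE, about 16.01
mean(abs(err))                        # pooled MAE, 11.25
```

The averaged RMSE coincides with the pooled MAE while the pooled RMSE is larger, the same pattern as 250.072 versus 394.44 in the question.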

Q2. What data is used to calculate RMSE, Rsquared and MAE in $resample?

    > head(fit$resample)  # fit = the train() object
           RMSE Rsquared       MAE intercept    Resample
    1  16.33273       NA  16.33273      TRUE Training010
    2 232.16184       NA 232.16184      TRUE Training011
    3 197.65143       NA 197.65143      TRUE Training012
    4 393.29469       NA 393.29469      TRUE Training013
    5 129.99157       NA 129.99157      TRUE Training014
    6  60.95649       NA  60.95649      TRUE Training015
    first_holdout <- subset(fit$pred, Resample == "Training010")  # fit = the train() object
    > first_holdout
          pred  obs rowIndex intercept    Resample
    1 2756.333 2740       11      TRUE Training010  # only 1 row because the forecast horizon is 1 step
    # Calculate RMSE, Rsquared and MAE for the holdout set
    > postResample(first_holdout$pred, first_holdout$obs)
        RMSE Rsquared      MAE 
    16.33273       NA 16.33273 
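The same three numbers can be recomputed by hand. `holdout_metrics()` is a hypothetical helper, not caret's `postResample()`; it assumes R² is the squared correlation between predictions and observations (as far as I know, caret's default) and returns NA when either vector has fewer than two distinct values. The prediction 2756.33273 is inferred from the printed RMSE, since the table rounds it to 2756.333.

```r
# Hypothetical helper, not caret's postResample(): metrics for one holdout set
holdout_metrics <- function(pred, obs) {
  # Correlation needs variation in both vectors; a single row has none
  rsq <- if (length(unique(pred)) < 2 || length(unique(obs)) < 2) {
    NA_real_
  } else {
    cor(pred, obs)^2
  }
  c(RMSE = sqrt(mean((pred - obs)^2)),
    Rsquared = rsq,
    MAE = mean(abs(pred - obs)))
}

# First holdout (Training010, rowIndex 11): RMSE and MAE both about 16.33273, Rsquared NA
holdout_metrics(pred = 2756.33273, obs = 2740)
```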

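My confusion here was mainly caused by the fact that Rsquared was NA. But since the forecast horizon was 1 step, every holdout sample has only one row, so no Rsquared can be calculated for it. That is presumably also why the summary printed by train() shows NaN: it averages per-resample values that are all NA.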
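Whether an all-NA column averages to NA or NaN depends on how missing values are handled. This base-R sketch assumes train() averages the per-resample metrics with `na.rm = TRUE`, which would produce the NaN in the printed summary rather than NA.

```r
# Per-resample R-squared values: every one-row holdout gives NA (five shown here)
rsq_per_resample <- rep(NA_real_, 5)

mean(rsq_per_resample)                 # NA: missing values propagate
mean(rsq_per_resample, na.rm = TRUE)   # NaN: nothing is left to average (0/0)
colMeans(data.frame(Rsquared = rsq_per_resample), na.rm = TRUE)  # Rsquared is NaN
```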

