Using R: Time Series Forecasting with ARIMA, Grey Model GM(1,1), Neural Networks, and Forecast Combination (Part 2)

In the previous article, "R for Everything: Time Series Forecasting with ARIMA, Grey Model GM(1,1), Neural Networks, and Forecast Combination (Part 1)", we used ARIMA, GM(1,1), and a time-series neural network to forecast 20 new records of the Internet Usage per Minute dataset. The essential code is as follows:

set.seed(1234)
data("WWWusage")

# Select the first 80 records as the train dataset
WWWusage_train <-
  WWWusage[1:80]

# Select the last 20 records as the test dataset
WWWusage_test <-
  WWWusage[81:100]

library("forecast")
# ARIMA model with the best (p, d, q) combination selected by AICc
mod_arima <-
  auto.arima(WWWusage_train)

# forecast another 20 records
forecast_arima <-
  forecast(mod_arima, h = 20)

# GM(1,1) model
library("greyforecasting")

# `term = 20` means forecasting another 20 records
mod_gm <-
  gm(WWWusage_train, term = 20)

# NNet model
mod_nnet <-
  nnetar(WWWusage_train)

# forecast another 20 records
forecast_nnet <-
  forecast(mod_nnet, h = 20)

As we noted at the end of the previous article, the neural network model achieved the lowest RMSE, suggesting that, to some extent, it fits our example data reasonably well. However, Hansen (2005) argues that replacing a single model with an optimal combination of models can produce better results.

This is where forecast combination models come in. As the ForecastComb documentation puts it, such models pool a set of individual forecasts into a single forecast using geometric- or regression-based methods:

The R package ForecastComb presents functions to pool individual model forecasts using geometric- and regression-based forecast combination methods. ForecastComb combines the functionality of the packages ForecastCombinations and GeomComb under a unified user interface and convenience functions.
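In the simplest case, a regression-based combination is just a weighted sum of the individual forecasts. A minimal schematic in R (the forecasts and weights below are made up for illustration; in practice the weights are estimated from the training data):

# Schematic: a linear forecast combination with made-up weights
f_arima <- c(120, 125, 130)
f_gm    <- c(118, 119, 120)
f_nnet  <- c(122, 128, 131)
w <- c(0.6, 0.1, 0.3)  # hypothetical weights
combined <- w[1] * f_arima + w[2] * f_gm + w[3] * f_nnet
combined
## [1] 120.4 125.3 129.3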

For a detailed guide to ForecastComb, see Weiss, Raviv, and Roetzer (2019).

ForecastComb 101

Data Formatting

Specifically, ForecastComb supports three dataset configurations:

  1. training set only;
  2. training set + future forecasts;
  3. full training + test set.

Before running a combined forecast, we first need to convert the data into a foreccomb-class object with foreccomb.

Its arguments are as follows:

foreccomb(observed_vector, prediction_matrix, 
          newobs = NULL, newpreds = NULL, 
          byrow = FALSE, na.impute = TRUE, criterion = "RMSE")

To format the data with foreccomb, we must supply at least a vector of observed values (observed_vector) and a matrix of model predictions (prediction_matrix), in which each column holds the predictions of one model.
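For reference, the three configurations listed above correspond to foreccomb calls along these lines (a sketch; train_obs, train_preds, test_obs, and test_preds are placeholder objects, not variables from this article):

# 1. Training set only
fc1 <- foreccomb(train_obs, train_preds)

# 2. Training set + future forecasts (no test observations yet)
fc2 <- foreccomb(train_obs, train_preds, newpreds = test_preds)

# 3. Full training + test set
fc3 <- foreccomb(train_obs, train_preds,
                 newobs = test_obs, newpreds = test_preds)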

For our example, we use WWWusage_train as observed_vector, and extract the fitted values of mod_arima, mod_gm, and mod_nnet into a data.frame that serves as prediction_matrix:

df_fitted_by_mods <-
  data.frame(
    arima = as.numeric(mod_arima$fitted),
    gm = as.numeric(mod_gm$fitted),
    nnet = as.numeric(mod_nnet$fitted)
)

knitr::kable(df_fitted_by_mods)
| arima | gm | nnet |
|------:|------:|------:|
| 87.91200 | 88.0000 | NA |
| 86.30366 | 127.4155 | NA |
| 81.15244 | 127.3598 | 83.14060 |
| 87.32344 | 127.3042 | 86.89844 |
| 83.88979 | 127.2485 | 86.22192 |
| 83.43081 | 127.1929 | 84.84035 |
| 86.38455 | 127.1373 | 86.89844 |
| 80.10926 | 127.0818 | 83.55223 |
| 88.62262 | 127.0263 | 87.60547 |
| 89.56923 | 126.9707 | 90.95200 |
| 89.34744 | 126.9153 | 90.25237 |
| 93.05023 | 126.8598 | 92.91612 |
| 106.88018 | 126.8044 | 107.52216 |
| 105.72064 | 126.7490 | 110.16199 |
| 120.04024 | 126.6936 | 122.12901 |
| 137.62819 | 126.6382 | 139.03600 |
| 145.66687 | 126.5829 | 147.08846 |
| 151.15265 | 126.5276 | 151.16116 |
| 154.04517 | 126.4723 | 153.49332 |
| 147.41174 | 126.4170 | 147.21139 |
| 147.03795 | 126.3618 | 144.62680 |
| 146.35771 | 126.3065 | 144.56693 |
| 151.53086 | 126.2513 | 148.83363 |
| 135.11412 | 126.1962 | 137.58690 |
| 123.62527 | 126.1410 | 124.83029 |
| 133.95742 | 126.0859 | 131.90867 |
| 146.43963 | 126.0308 | 144.79762 |
| 152.26300 | 125.9757 | 152.15441 |
| 150.77263 | 125.9207 | 150.66859 |
| 145.40585 | 125.8657 | 144.62680 |
| 142.93133 | 125.8107 | 141.36986 |
| 135.45720 | 125.7557 | 136.01915 |
| 129.54920 | 125.7007 | 130.56519 |
| 129.83295 | 125.6458 | 130.38460 |
| 131.56674 | 125.5909 | 132.65891 |
| 126.50583 | 125.5360 | 129.47603 |
| 123.88278 | 125.4812 | 126.00064 |
| 127.02816 | 125.4263 | 128.47001 |
| 138.15757 | 125.3715 | 137.80578 |
| 139.55717 | 125.3167 | 140.93649 |
| 142.08663 | 125.2620 | 141.84314 |
| 143.20565 | 125.2072 | 142.71139 |
| 158.29032 | 125.1525 | 155.24808 |
| 164.95937 | 125.0978 | 165.80278 |
| 172.98184 | 125.0432 | 172.64921 |
| 170.42355 | 124.9885 | 171.46607 |
| 171.90379 | 124.9339 | 170.73737 |
| 172.67058 | 124.8793 | 171.67232 |
| 171.67436 | 124.8247 | 170.82890 |
| 176.37709 | 124.7702 | 174.26492 |
| 174.95512 | 124.7157 | 174.36043 |
| 168.69337 | 124.6612 | 168.13930 |
| 173.60575 | 124.6067 | 170.82890 |
| 175.43917 | 124.5522 | 174.26492 |
| 173.30112 | 124.4978 | 172.67612 |
| 163.79201 | 124.4434 | 163.04148 |
| 163.09118 | 124.3890 | 159.70846 |
| 146.94167 | 124.3347 | 145.80884 |
| 130.86621 | 124.2803 | 130.90889 |
| 124.20251 | 124.2260 | 123.90240 |
| 113.20621 | 124.1718 | 114.57303 |
| 105.79951 | 124.1175 | 105.42121 |
| 98.13525 | 124.0633 | 97.56579 |
| 102.62906 | 124.0090 | 100.78698 |
| 95.36609 | 123.9549 | 96.62393 |
| 100.76468 | 123.9007 | 99.35709 |
| 89.70514 | 123.8465 | 91.84973 |
| 82.80494 | 123.7924 | 84.28110 |
| 82.08490 | 123.7383 | 83.14060 |
| 84.93000 | 123.6843 | 85.46564 |
| 89.87680 | 123.6302 | 90.05750 |
| 89.82193 | 123.5762 | 91.04678 |
| 86.49138 | 123.5222 | 87.92357 |
| 82.40418 | 123.4682 | 84.37283 |
| 88.37005 | 123.4143 | 87.69998 |
| 91.17750 | 123.3603 | 91.87126 |
| 92.16153 | 123.3064 | 92.91612 |
| 90.43594 | 123.2525 | 91.27047 |
| 97.60234 | 123.1987 | 96.82519 |
| 107.01698 | 123.1448 | 108.77635 |
# The NNet model often generates NA values;
# carefully consider how to handle them
# and how to explain that choice

# In this case, we use `na.impute = TRUE`
# to impute the NA values,
# but it is not always the best practice

library("ForecastComb")

fc_dat <-
  foreccomb(observed_vector = WWWusage_train,
            prediction_matrix = df_fitted_by_mods,
            na.impute = TRUE)
## A subset of the individual forecasts included NA values and has been imputed.
fc_dat
## $Actual_Train
## Time Series:
## Start = 1 
## End = 80 
## Frequency = 1 
##  [1]  88  84  85  85  84  85  83  85  88  89  91  99 104 112 126 138 146 151 150
## [20] 148 147 149 143 132 131 139 147 150 148 145 140 134 131 131 129 126 126 132
## [39] 137 140 142 150 159 167 170 171 172 172 174 175 172 172 174 174 169 165 156
## [58] 142 131 121 112 104 102  99  99  95  88  84  84  87  89  88  85  86  89  91
## [77]  91  94 101 110
## 
## $Forecasts_Train
##        arima       gm      nnet
## 1   87.91200  88.0000 119.21442
## 2   86.30366 127.4155 100.40488
## 3   81.15244 127.3598  83.14060
## 4   87.32344 127.3042  86.89844
## 5   83.88979 127.2485  86.22192
## 6   83.43081 127.1929  84.84035
## 7   86.38455 127.1373  86.89844
## 8   80.10926 127.0818  83.55223
## 9   88.62262 127.0263  87.60547
## 10  89.56923 126.9707  90.95200
## 11  89.34744 126.9153  90.25237
## 12  93.05023 126.8598  92.91612
## 13 106.88018 126.8044 107.52216
## 14 105.72064 126.7490 110.16199
## 15 120.04024 126.6936 122.12901
## 16 137.62819 126.6382 139.03600
## 17 145.66687 126.5829 147.08846
## 18 151.15265 126.5276 151.16116
## 19 154.04517 126.4723 153.49332
## 20 147.41174 126.4170 147.21139
## 21 147.03795 126.3618 144.62680
## 22 146.35771 126.3065 144.56693
## 23 151.53086 126.2513 148.83363
## 24 135.11412 126.1962 137.58690
## 25 123.62527 126.1410 124.83029
## 26 133.95742 126.0859 131.90867
## 27 146.43963 126.0308 144.79762
## 28 152.26300 125.9757 152.15441
## 29 150.77263 125.9207 150.66859
## 30 145.40585 125.8657 144.62680
## 31 142.93133 125.8107 141.36986
## 32 135.45720 125.7557 136.01915
## 33 129.54920 125.7007 130.56519
## 34 129.83295 125.6458 130.38460
## 35 131.56674 125.5909 132.65891
## 36 126.50583 125.5360 129.47603
## 37 123.88278 125.4812 126.00064
## 38 127.02816 125.4263 128.47001
## 39 138.15757 125.3715 137.80578
## 40 139.55717 125.3167 140.93649
## 41 142.08663 125.2620 141.84314
## 42 143.20565 125.2072 142.71139
## 43 158.29032 125.1525 155.24808
## 44 164.95937 125.0978 165.80278
## 45 172.98184 125.0432 172.64921
## 46 170.42355 124.9885 171.46607
## 47 171.90379 124.9339 170.73737
## 48 172.67058 124.8793 171.67232
## 49 171.67436 124.8247 170.82890
## 50 176.37709 124.7702 174.26492
## 51 174.95512 124.7157 174.36043
## 52 168.69337 124.6612 168.13930
## 53 173.60575 124.6067 170.82890
## 54 175.43917 124.5522 174.26492
## 55 173.30112 124.4978 172.67612
## 56 163.79201 124.4434 163.04148
## 57 163.09118 124.3890 159.70846
## 58 146.94167 124.3347 145.80884
## 59 130.86621 124.2803 130.90889
## 60 124.20251 124.2260 123.90240
## 61 113.20621 124.1718 114.57303
## 62 105.79951 124.1175 105.42121
## 63  98.13525 124.0633  97.56579
## 64 102.62906 124.0090 100.78698
## 65  95.36609 123.9549  96.62393
## 66 100.76468 123.9007  99.35709
## 67  89.70514 123.8465  91.84973
## 68  82.80494 123.7924  84.28110
## 69  82.08490 123.7383  83.14060
## 70  84.93000 123.6843  85.46564
## 71  89.87680 123.6302  90.05750
## 72  89.82193 123.5762  91.04678
## 73  86.49138 123.5222  87.92357
## 74  82.40418 123.4682  84.37283
## 75  88.37005 123.4143  87.69998
## 76  91.17750 123.3603  91.87126
## 77  92.16153 123.3064  92.91612
## 78  90.43594 123.2525  91.27047
## 79  97.60234 123.1987  96.82519
## 80 107.01698 123.1448 108.77635
## 
## $nmodels
## [1] 3
## 
## $modelnames
## [1] "arima" "gm"    "nnet" 
## 
## attr(,"class")
## [1] "foreccomb"

The step above produced fc_dat, a dataset ready for training the combined model, but note the messages the system raised:

non-finite value inf; using BIG value.

A subset of the individual forecasts included NA values and has been imputed.

Moreover, looking at fc_dat, it is easy to notice that the imputed values replacing nnet's first missing entries are markedly larger than the corresponding arima and gm values, which may hurt the accuracy of later forecasts. For now, let us press on and deal with this afterwards.
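A quick way to see where the NA values sit (the two missing entries are nnet's first fitted values, which nnetar() cannot produce because their lagged inputs do not exist):

# Count the missing fitted values per model
colSums(is.na(df_fitted_by_mods))
## arima    gm  nnet 
##     0     0     2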

Model Fitting

Combined forecasting is carried out with auto_combine, which currently supports choosing the optimal combination according to RMSE, MAE, or MAPE.
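For example, assuming auto_combine accepts the same criterion argument that appears in foreccomb's signature above, selection by MAE would be a one-argument change (a sketch; with our current fc_dat it will run into the same error shown below):

# Sketch: pick the best combination method by MAE instead of RMSE
auto_combine(fc_dat, criterion = "MAE")

With the default RMSE criterion, the call is simply: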

auto_combine(fc_dat)

In all likelihood, this step will not run; instead it reports:

Error in `-.default`(observed_vector, prediction_matrix) : 
  time-series/vector length mismatch

This is because imputing the missing values with na.impute = TRUE in the foreccomb step distorted the data trend. Instead, let us fill nnet's missing fitted values with the mean of the arima and gm fitted values:

library("dplyr")
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:greyforecasting':
## 
##     combine
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Use the mean of arima and gm
# as the replacement for nnet's missing values,
# though this may not be a good practice

df_fitted_by_mods_miss_val_fix <-
  mutate(df_fitted_by_mods,
         nnet = if_else(is.na(nnet),
                        (arima+gm)/2, nnet))

fc_dat_miss_val_fix <-
  foreccomb(observed_vector = WWWusage_train,
            prediction_matrix = df_fitted_by_mods_miss_val_fix,
            na.impute = FALSE)

fc_dat_miss_val_fix
## $Actual_Train
## Time Series:
## Start = 1 
## End = 80 
## Frequency = 1 
##  [1]  88  84  85  85  84  85  83  85  88  89  91  99 104 112 126 138 146 151 150
## [20] 148 147 149 143 132 131 139 147 150 148 145 140 134 131 131 129 126 126 132
## [39] 137 140 142 150 159 167 170 171 172 172 174 175 172 172 174 174 169 165 156
## [58] 142 131 121 112 104 102  99  99  95  88  84  84  87  89  88  85  86  89  91
## [77]  91  94 101 110
## 
## $Forecasts_Train
## Time Series:
## Start = 1 
## End = 80 
## Frequency = 1 
##        arima       gm      nnet
##  1  87.91200  88.0000  87.95600
##  2  86.30366 127.4155 106.85958
##  3  81.15244 127.3598  83.14060
##  4  87.32344 127.3042  86.89844
##  5  83.88979 127.2485  86.22192
##  6  83.43081 127.1929  84.84035
##  7  86.38455 127.1373  86.89844
##  8  80.10926 127.0818  83.55223
##  9  88.62262 127.0263  87.60547
## 10  89.56923 126.9707  90.95200
## 11  89.34744 126.9153  90.25237
## 12  93.05023 126.8598  92.91612
## 13 106.88018 126.8044 107.52216
## 14 105.72064 126.7490 110.16199
## 15 120.04024 126.6936 122.12901
## 16 137.62819 126.6382 139.03600
## 17 145.66687 126.5829 147.08846
## 18 151.15265 126.5276 151.16116
## 19 154.04517 126.4723 153.49332
## 20 147.41174 126.4170 147.21139
## 21 147.03795 126.3618 144.62680
## 22 146.35771 126.3065 144.56693
## 23 151.53086 126.2513 148.83363
## 24 135.11412 126.1962 137.58690
## 25 123.62527 126.1410 124.83029
## 26 133.95742 126.0859 131.90867
## 27 146.43963 126.0308 144.79762
## 28 152.26300 125.9757 152.15441
## 29 150.77263 125.9207 150.66859
## 30 145.40585 125.8657 144.62680
## 31 142.93133 125.8107 141.36986
## 32 135.45720 125.7557 136.01915
## 33 129.54920 125.7007 130.56519
## 34 129.83295 125.6458 130.38460
## 35 131.56674 125.5909 132.65891
## 36 126.50583 125.5360 129.47603
## 37 123.88278 125.4812 126.00064
## 38 127.02816 125.4263 128.47001
## 39 138.15757 125.3715 137.80578
## 40 139.55717 125.3167 140.93649
## 41 142.08663 125.2620 141.84314
## 42 143.20565 125.2072 142.71139
## 43 158.29032 125.1525 155.24808
## 44 164.95937 125.0978 165.80278
## 45 172.98184 125.0432 172.64921
## 46 170.42355 124.9885 171.46607
## 47 171.90379 124.9339 170.73737
## 48 172.67058 124.8793 171.67232
## 49 171.67436 124.8247 170.82890
## 50 176.37709 124.7702 174.26492
## 51 174.95512 124.7157 174.36043
## 52 168.69337 124.6612 168.13930
## 53 173.60575 124.6067 170.82890
## 54 175.43917 124.5522 174.26492
## 55 173.30112 124.4978 172.67612
## 56 163.79201 124.4434 163.04148
## 57 163.09118 124.3890 159.70846
## 58 146.94167 124.3347 145.80884
## 59 130.86621 124.2803 130.90889
## 60 124.20251 124.2260 123.90240
## 61 113.20621 124.1718 114.57303
## 62 105.79951 124.1175 105.42121
## 63  98.13525 124.0633  97.56579
## 64 102.62906 124.0090 100.78698
## 65  95.36609 123.9549  96.62393
## 66 100.76468 123.9007  99.35709
## 67  89.70514 123.8465  91.84973
## 68  82.80494 123.7924  84.28110
## 69  82.08490 123.7383  83.14060
## 70  84.93000 123.6843  85.46564
## 71  89.87680 123.6302  90.05750
## 72  89.82193 123.5762  91.04678
## 73  86.49138 123.5222  87.92357
## 74  82.40418 123.4682  84.37283
## 75  88.37005 123.4143  87.69998
## 76  91.17750 123.3603  91.87126
## 77  92.16153 123.3064  92.91612
## 78  90.43594 123.2525  91.27047
## 79  97.60234 123.1987  96.82519
## 80 107.01698 123.1448 108.77635
## 
## $nmodels
## [1] 3
## 
## $modelnames
## [1] "arima" "gm"    "nnet" 
## 
## attr(,"class")
## [1] "foreccomb"

Now run the combined forecast again:

mod_comb <-
  auto_combine(fc_dat_miss_val_fix)
## Optimization algorithm chooses number of retained models for trimmed eigenvector approach...
## Algorithm finished. Optimized number of retained models: 1
## Optimization algorithm chooses number of retained models for trimmed bias-corrected eigenvector approach...
## Algorithm finished. Optimized number of retained models: 1
## Optimization algorithm chooses trim factor for trimmed mean approach...
## Algorithm finished. Optimized trim factor: 0.34
## Optimization algorithm chooses trim factor for winsorized mean approach...
## Algorithm finished. Optimized trim factor: 0.5
mod_comb
## $Method
## [1] "Ordinary Least Squares Regression"
## 
## $Models
## [1] "arima" "gm"    "nnet" 
## 
## $Fitted
## Time Series:
## Start = 1 
## End = 80 
## Frequency = 1 
##  [1]  87.60187  88.64077  82.35508  88.23902  85.05656  84.54363  87.37633
##  [8]  81.42104  89.46373  90.54932  90.29842  93.85551 107.45621 106.57138
## [15] 120.44347 137.62928 145.50485 150.78402 153.57908 147.10121 146.58597
## [22] 145.95903 150.96563 135.22288 123.87977 133.78489 146.04048 151.84668
## [29] 150.38495 145.07980 142.60139 135.41781 129.65758 129.90293 131.63598
## [36] 126.80058 124.17198 127.20710 137.99064 139.47555 141.84413 142.92213
## [43] 157.53086 164.32235 172.10282 169.68603 170.98753 171.74831 170.78065
## [50] 175.30235 174.00844 167.87406 172.53751 174.43893 172.37890 163.05166
## [57] 162.18794 146.51256 130.83818 124.28445 113.61943 106.24429  98.72029
## [64] 103.03691  96.12530 101.23568  90.63426  83.82711  83.09189  85.84321
## [71]  90.66480  90.67887  87.42765  83.45698  89.12495  91.96486  92.93137
## [78]  91.24424  98.15692 107.54875
## 
## $Accuracy_Train
##                     ME    RMSE      MAE        MPE     MAPE       ACF1
## Test set -1.776303e-16 3.02758 2.402171 -0.0741498 2.056542 -0.0140366
##          Theil's U
## Test set 0.6035688
## 
## $Input_Data
## $Input_Data$Actual_Train
## Time Series:
## Start = 1 
## End = 80 
## Frequency = 1 
##  [1]  88  84  85  85  84  85  83  85  88  89  91  99 104 112 126 138 146 151 150
## [20] 148 147 149 143 132 131 139 147 150 148 145 140 134 131 131 129 126 126 132
## [39] 137 140 142 150 159 167 170 171 172 172 174 175 172 172 174 174 169 165 156
## [58] 142 131 121 112 104 102  99  99  95  88  84  84  87  89  88  85  86  89  91
## [77]  91  94 101 110
## 
## $Input_Data$Forecasts_Train
## Time Series:
## Start = 1 
## End = 80 
## Frequency = 1 
##        arima       gm      nnet
##  1  87.91200  88.0000  87.95600
##  2  86.30366 127.4155 106.85958
##  3  81.15244 127.3598  83.14060
##  4  87.32344 127.3042  86.89844
##  5  83.88979 127.2485  86.22192
##  6  83.43081 127.1929  84.84035
##  7  86.38455 127.1373  86.89844
##  8  80.10926 127.0818  83.55223
##  9  88.62262 127.0263  87.60547
## 10  89.56923 126.9707  90.95200
## 11  89.34744 126.9153  90.25237
## 12  93.05023 126.8598  92.91612
## 13 106.88018 126.8044 107.52216
## 14 105.72064 126.7490 110.16199
## 15 120.04024 126.6936 122.12901
## 16 137.62819 126.6382 139.03600
## 17 145.66687 126.5829 147.08846
## 18 151.15265 126.5276 151.16116
## 19 154.04517 126.4723 153.49332
## 20 147.41174 126.4170 147.21139
## 21 147.03795 126.3618 144.62680
## 22 146.35771 126.3065 144.56693
## 23 151.53086 126.2513 148.83363
## 24 135.11412 126.1962 137.58690
## 25 123.62527 126.1410 124.83029
## 26 133.95742 126.0859 131.90867
## 27 146.43963 126.0308 144.79762
## 28 152.26300 125.9757 152.15441
## 29 150.77263 125.9207 150.66859
## 30 145.40585 125.8657 144.62680
## 31 142.93133 125.8107 141.36986
## 32 135.45720 125.7557 136.01915
## 33 129.54920 125.7007 130.56519
## 34 129.83295 125.6458 130.38460
## 35 131.56674 125.5909 132.65891
## 36 126.50583 125.5360 129.47603
## 37 123.88278 125.4812 126.00064
## 38 127.02816 125.4263 128.47001
## 39 138.15757 125.3715 137.80578
## 40 139.55717 125.3167 140.93649
## 41 142.08663 125.2620 141.84314
## 42 143.20565 125.2072 142.71139
## 43 158.29032 125.1525 155.24808
## 44 164.95937 125.0978 165.80278
## 45 172.98184 125.0432 172.64921
## 46 170.42355 124.9885 171.46607
## 47 171.90379 124.9339 170.73737
## 48 172.67058 124.8793 171.67232
## 49 171.67436 124.8247 170.82890
## 50 176.37709 124.7702 174.26492
## 51 174.95512 124.7157 174.36043
## 52 168.69337 124.6612 168.13930
## 53 173.60575 124.6067 170.82890
## 54 175.43917 124.5522 174.26492
## 55 173.30112 124.4978 172.67612
## 56 163.79201 124.4434 163.04148
## 57 163.09118 124.3890 159.70846
## 58 146.94167 124.3347 145.80884
## 59 130.86621 124.2803 130.90889
## 60 124.20251 124.2260 123.90240
## 61 113.20621 124.1718 114.57303
## 62 105.79951 124.1175 105.42121
## 63  98.13525 124.0633  97.56579
## 64 102.62906 124.0090 100.78698
## 65  95.36609 123.9549  96.62393
## 66 100.76468 123.9007  99.35709
## 67  89.70514 123.8465  91.84973
## 68  82.80494 123.7924  84.28110
## 69  82.08490 123.7383  83.14060
## 70  84.93000 123.6843  85.46564
## 71  89.87680 123.6302  90.05750
## 72  89.82193 123.5762  91.04678
## 73  86.49138 123.5222  87.92357
## 74  82.40418 123.4682  84.37283
## 75  88.37005 123.4143  87.69998
## 76  91.17750 123.3603  91.87126
## 77  92.16153 123.3064  92.91612
## 78  90.43594 123.2525  91.27047
## 79  97.60234 123.1987  96.82519
## 80 107.01698 123.1448 108.77635
## 
## 
## $Predict
## function (object, newpreds) 
## {
##     coef <- c(object$Intercept, object$Weights)
##     pred <- as.vector(coef %*% t(cbind(1, newpreds)))
##     return(pred)
## }
## <bytecode: 0x7f8cd87d7768>
## <environment: namespace:ForecastComb>
## 
## $Intercept
## [1] -1.326066
## 
## $Weights
## [1] 0.91320942 0.03167767 0.06660417
## 
## attr(,"class")
## [1] "foreccomb_res"

The results show that Ordinary Least Squares (OLS) regression yields the combined forecast with the lowest RMSE; the weights of the arima, gm, and nnet models are 0.9132094, 0.0316777, and 0.0666042, respectively. Still, we should validate the model against the test set before using it for further forecasting.
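As a sanity check, the first combined fitted value can be reproduced by hand from the reported intercept and weights, using the three models' fitted values at t = 1:

# Reproduce mod_comb$Fitted[1] from the OLS intercept and weights
-1.326066 +
  0.91320942 * 87.91200 +  # arima fitted value at t = 1
  0.03167767 * 88.00000 +  # gm fitted value at t = 1
  0.06660417 * 87.95600    # nnet fitted value at t = 1
## [1] 87.60187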

Model Validation

The Automatic Way

Model validation can be completed together with the data preprocessing that precedes auto_combine:

df_pred_by_mods <-
  data.frame(
    arima = as.numeric(forecast_arima$mean),
    gm = as.numeric(mod_gm$forecasts),
    nnet = as.numeric(forecast_nnet$mean)
)

fc_train_n_test <-
  foreccomb(observed_vector = WWWusage_train,
            prediction_matrix = df_fitted_by_mods_miss_val_fix,
            newobs = WWWusage_test,
            newpreds = as.matrix(df_pred_by_mods),
            na.impute = FALSE)

mod_train_n_test_comb <-
  auto_combine(fc_train_n_test)
## Optimization algorithm chooses number of retained models for trimmed eigenvector approach...
## Algorithm finished. Optimized number of retained models: 1
## Optimization algorithm chooses number of retained models for trimmed bias-corrected eigenvector approach...
## Algorithm finished. Optimized number of retained models: 1
## Optimization algorithm chooses trim factor for trimmed mean approach...
## Algorithm finished. Optimized trim factor: 0.34
## Optimization algorithm chooses trim factor for winsorized mean approach...
## Algorithm finished. Optimized trim factor: 0.5
fc_train_n_test
## $Actual_Train
## Time Series:
## Start = 1 
## End = 80 
## Frequency = 1 
##  [1]  88  84  85  85  84  85  83  85  88  89  91  99 104 112 126 138 146 151 150
## [20] 148 147 149 143 132 131 139 147 150 148 145 140 134 131 131 129 126 126 132
## [39] 137 140 142 150 159 167 170 171 172 172 174 175 172 172 174 174 169 165 156
## [58] 142 131 121 112 104 102  99  99  95  88  84  84  87  89  88  85  86  89  91
## [77]  91  94 101 110
## 
## $Forecasts_Train
## Time Series:
## Start = 1 
## End = 80 
## Frequency = 1 
##        arima       gm      nnet
##  1  87.91200  88.0000  87.95600
##  2  86.30366 127.4155 106.85958
##  3  81.15244 127.3598  83.14060
##  4  87.32344 127.3042  86.89844
##  5  83.88979 127.2485  86.22192
##  6  83.43081 127.1929  84.84035
##  7  86.38455 127.1373  86.89844
##  8  80.10926 127.0818  83.55223
##  9  88.62262 127.0263  87.60547
## 10  89.56923 126.9707  90.95200
## 11  89.34744 126.9153  90.25237
## 12  93.05023 126.8598  92.91612
## 13 106.88018 126.8044 107.52216
## 14 105.72064 126.7490 110.16199
## 15 120.04024 126.6936 122.12901
## 16 137.62819 126.6382 139.03600
## 17 145.66687 126.5829 147.08846
## 18 151.15265 126.5276 151.16116
## 19 154.04517 126.4723 153.49332
## 20 147.41174 126.4170 147.21139
## 21 147.03795 126.3618 144.62680
## 22 146.35771 126.3065 144.56693
## 23 151.53086 126.2513 148.83363
## 24 135.11412 126.1962 137.58690
## 25 123.62527 126.1410 124.83029
## 26 133.95742 126.0859 131.90867
## 27 146.43963 126.0308 144.79762
## 28 152.26300 125.9757 152.15441
## 29 150.77263 125.9207 150.66859
## 30 145.40585 125.8657 144.62680
## 31 142.93133 125.8107 141.36986
## 32 135.45720 125.7557 136.01915
## 33 129.54920 125.7007 130.56519
## 34 129.83295 125.6458 130.38460
## 35 131.56674 125.5909 132.65891
## 36 126.50583 125.5360 129.47603
## 37 123.88278 125.4812 126.00064
## 38 127.02816 125.4263 128.47001
## 39 138.15757 125.3715 137.80578
## 40 139.55717 125.3167 140.93649
## 41 142.08663 125.2620 141.84314
## 42 143.20565 125.2072 142.71139
## 43 158.29032 125.1525 155.24808
## 44 164.95937 125.0978 165.80278
## 45 172.98184 125.0432 172.64921
## 46 170.42355 124.9885 171.46607
## 47 171.90379 124.9339 170.73737
## 48 172.67058 124.8793 171.67232
## 49 171.67436 124.8247 170.82890
## 50 176.37709 124.7702 174.26492
## 51 174.95512 124.7157 174.36043
## 52 168.69337 124.6612 168.13930
## 53 173.60575 124.6067 170.82890
## 54 175.43917 124.5522 174.26492
## 55 173.30112 124.4978 172.67612
## 56 163.79201 124.4434 163.04148
## 57 163.09118 124.3890 159.70846
## 58 146.94167 124.3347 145.80884
## 59 130.86621 124.2803 130.90889
## 60 124.20251 124.2260 123.90240
## 61 113.20621 124.1718 114.57303
## 62 105.79951 124.1175 105.42121
## 63  98.13525 124.0633  97.56579
## 64 102.62906 124.0090 100.78698
## 65  95.36609 123.9549  96.62393
## 66 100.76468 123.9007  99.35709
## 67  89.70514 123.8465  91.84973
## 68  82.80494 123.7924  84.28110
## 69  82.08490 123.7383  83.14060
## 70  84.93000 123.6843  85.46564
## 71  89.87680 123.6302  90.05750
## 72  89.82193 123.5762  91.04678
## 73  86.49138 123.5222  87.92357
## 74  82.40418 123.4682  84.37283
## 75  88.37005 123.4143  87.69998
## 76  91.17750 123.3603  91.87126
## 77  92.16153 123.3064  92.91612
## 78  90.43594 123.2525  91.27047
## 79  97.60234 123.1987  96.82519
## 80 107.01698 123.1448 108.77635
## 
## $Actual_Test
##  [1] 121 135 145 149 156 165 171 175 177 182 193 204 208 210 215 222 228 226 222
## [20] 220
## 
## $Forecasts_Test
##          arima       gm     nnet
##  [1,] 117.0633 123.0910 121.0024
##  [2,] 121.4699 123.0372 132.9642
##  [3,] 124.2189 122.9835 142.8745
##  [4,] 125.9340 122.9297 149.8028
##  [5,] 127.0039 122.8760 154.0405
##  [6,] 127.6714 122.8223 155.9769
##  [7,] 128.0878 122.7686 155.8329
##  [8,] 128.3476 122.7150 153.6791
##  [9,] 128.5097 122.6614 149.6578
## [10,] 128.6108 122.6078 144.4252
## [11,] 128.6739 122.5542 139.2503
## [12,] 128.7132 122.5006 135.3176
## [13,] 128.7378 122.4471 133.1738
## [14,] 128.7531 122.3936 132.7999
## [15,] 128.7626 122.3401 133.8105
## [16,] 128.7686 122.2867 135.6047
## [17,] 128.7723 122.2332 137.5464
## [18,] 128.7746 122.1798 139.1455
## [19,] 128.7761 122.1264 140.1396
## [20,] 128.7770 122.0731 140.4769
## 
## $nmodels
## [1] 3
## 
## $modelnames
## [1] "arima" "gm"    "nnet" 
## 
## attr(,"class")
## [1] "foreccomb"

At this point, the combined forecasts for the test period are available via mod_train_n_test_comb$Forecasts_Test, and the accuracy metrics, RMSE among them, via mod_train_n_test_comb$Accuracy_Test.
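In code, that is simply (a sketch of the two accessors):

mod_train_n_test_comb$Forecasts_Test  # combined forecasts for the test period
mod_train_n_test_comb$Accuracy_Test   # ME, RMSE, MAE, ... on the test set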

The Manual Way

In some situations, however, auto_combine will instead hit an error in this example:

Error in newpred_matrix %*% weights : 
  requires numeric/complex matrix/vector arguments

Fortunately, the method chosen here is OLS, that is, linear regression, so the combined forecast is easy to compute manually from the intercept and the slopes (the model weights):

library("purrr")
forecast_comb <-
  mod_comb$Intercept +
  reduce(map2(mod_comb$Weights, df_pred_by_mods, `*`),`+`)

forecast_comb
##  [1] 117.5358 122.3549 125.5237 127.5496 128.8073 129.5441 129.9131 130.0052
##  [9] 129.8836 129.6258 129.3370 129.1093 128.9873 128.9746 129.0490 129.1722
## [17] 129.3033 129.4102 129.4760 129.4976
# RMSE of the combined forecast on the test set
rmse_comb <-
  sqrt(mean((as.numeric(forecast_comb) - as.numeric(WWWusage_test))^2, na.rm = TRUE))

rmse_comb
## [1] 65.56671
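Equivalently, mirroring the package's internal $Predict function printed in mod_comb above, the same combination can be written as a single matrix product:

# Same combination via matrix algebra, as in mod_comb$Predict
coef <- c(mod_comb$Intercept, mod_comb$Weights)
forecast_comb_mat <-
  as.vector(coef %*% t(cbind(1, as.matrix(df_pred_by_mods))))
all.equal(as.numeric(forecast_comb), forecast_comb_mat)
## [1] TRUE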

The combined model's RMSE = 65.5667112, a performance similar to the ARIMA model from the previous article. In fact, given the arima, gm, and nnet weights of 0.9132094, 0.0316777, and 0.0666042, it is easy to guess that in this case the combined result should be quite close to the ARIMA forecast.

As for which model to choose, that calls for further consideration of the actual application.

Epilogue

As of May 2021, nearly three years have passed since ForecastComb's last commit, so bumps along the way are inevitable, such as the auto_combine error we mentioned. Moreover, not every scenario will select OLS as the combination method; but after reading the R scripts in the ForecastComb source code, implementing the prediction manually is not difficult either.


Feel free to reach me via email, Weibo, Twitter, or Zhihu, and to follow my blog. If you happen to be interested in my GitHub as well, nothing could be more welcome!

Hansen, Bruce E. 2005. "Challenges for Econometric Model Selection." Econometric Theory 21 (1). https://doi.org/10.1017/S0266466605050048.
Weiss, Christoph E., Eran Raviv, and Gernot Roetzer. 2019. "Forecast Combinations in R Using the ForecastComb Package." The R Journal 10 (2): 262. https://doi.org/10.32614/RJ-2018-052.