In the previous post, *R for Everything: Time Series Forecasting with ARIMA, Grey Model GM(1,1), Neural Networks, and Forecast Combination (Part 1)*, we forecast 20 new records of the Internet Usage per Minute dataset with ARIMA, GM(1,1), and a time-series neural network. The code, in brief:
set.seed(1234)
data("WWWusage")
# Select the first 80 records as the train dataset
WWWusage_train <-
WWWusage[1:80]
# Select the last 20 records as the test dataset
WWWusage_test <-
WWWusage[81:100]
library("forecast")
# ARIMA model with the best (p,d,q) combination selected by AICc
mod_arima <-
auto.arima(WWWusage_train)
# forecast another 20 records
forecast_arima <-
forecast(mod_arima, h = 20)
# GM(1,1) model
library("greyforecasting")
# `term = 20` means forecasting another 20 records
mod_gm <-
gm(WWWusage_train, term = 20)
# NNet model
mod_nnet <-
nnetar(WWWusage_train)
# forecast another 20 records
forecast_nnet <-
forecast(mod_nnet, h = 20)
As we noted at the end of the previous post, the neural network model had the lowest RMSE, suggesting that it fits our example data reasonably well. However, Hansen (2005) argues that replacing a single model with an optimal combination of models can produce better results.
Hence forecast combination models. According to the ForecastComb documentation, these models pool a set of individual forecasts into a single forecast through geometric- or regression-based methods:
The R package ForecastComb presents functions to pool individual model forecasts using geometric- and regression-based forecast combination methods. ForecastComb combines the functionality of the packages ForecastCombinations and GeomComb under a unified user interface and convenience functions.
For a detailed guide to ForecastComb, see Christoph, Raviv, and Roetzer (2019).
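As a warm-up, the simplest possible combination is an unweighted average of the individual forecasts, which we can compute by hand before reaching for the package (a minimal sketch reusing the objects from the code above):
# A naive equal-weight combination: average the
# three 20-step-ahead forecasts element by element
naive_comb <-
  (as.numeric(forecast_arima$mean) +
     as.numeric(mod_gm$forecasts) +
     as.numeric(forecast_nnet$mean)) / 3
# Its RMSE against the held-out test records
sqrt(mean((naive_comb - as.numeric(WWWusage_test))^2))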
Forecast Comb 101
Data Formatting
Specifically, ForecastComb handles three kinds of data setups:
- Only training set
- Training set + future forecasts
- Full training + test set
Before running a combined forecast, we first need to convert the dataset into a `foreccomb`-class object with `foreccomb`.
Its parameters, for reference:
foreccomb(observed_vector, prediction_matrix,
newobs = NULL, newpreds = NULL,
byrow = FALSE, na.impute = TRUE, criterion = "RMSE")
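Corresponding to the three supported setups, the calls differ only in which arguments are supplied (a sketch with placeholder names, not output from this session):
# 1. Only training set: actuals plus in-sample fits
fc <- foreccomb(obs_train, fit_matrix)
# 2. Training set + future forecasts: add out-of-sample predictions
fc <- foreccomb(obs_train, fit_matrix, newpreds = pred_matrix)
# 3. Full training + test set: also supply the test actuals
fc <- foreccomb(obs_train, fit_matrix,
                newobs = obs_test, newpreds = pred_matrix)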
To format the data with `foreccomb`, we need to supply at least a vector of observed values (`observed_vector`) and a matrix of individual-model predictions (`prediction_matrix`), in which each column holds one model's predictions. For our example, that means setting `WWWusage_train` as the `observed_vector`, and extracting the fitted values from `mod_arima`, `mod_gm`, and `mod_nnet` into a `data.frame` to serve as the `prediction_matrix`.
df_fitted_by_mods <-
data.frame(
arima = as.numeric(mod_arima$fitted),
gm = as.numeric(mod_gm$fitted),
nnet = as.numeric(mod_nnet$fitted)
)
knitr::kable(df_fitted_by_mods)
arima | gm | nnet |
---|---|---|
87.91200 | 88.0000 | NA |
86.30366 | 127.4155 | NA |
81.15244 | 127.3598 | 83.14060 |
87.32344 | 127.3042 | 86.89844 |
83.88979 | 127.2485 | 86.22192 |
83.43081 | 127.1929 | 84.84035 |
86.38455 | 127.1373 | 86.89844 |
80.10926 | 127.0818 | 83.55223 |
88.62262 | 127.0263 | 87.60547 |
89.56923 | 126.9707 | 90.95200 |
89.34744 | 126.9153 | 90.25237 |
93.05023 | 126.8598 | 92.91612 |
106.88018 | 126.8044 | 107.52216 |
105.72064 | 126.7490 | 110.16199 |
120.04024 | 126.6936 | 122.12901 |
137.62819 | 126.6382 | 139.03600 |
145.66687 | 126.5829 | 147.08846 |
151.15265 | 126.5276 | 151.16116 |
154.04517 | 126.4723 | 153.49332 |
147.41174 | 126.4170 | 147.21139 |
147.03795 | 126.3618 | 144.62680 |
146.35771 | 126.3065 | 144.56693 |
151.53086 | 126.2513 | 148.83363 |
135.11412 | 126.1962 | 137.58690 |
123.62527 | 126.1410 | 124.83029 |
133.95742 | 126.0859 | 131.90867 |
146.43963 | 126.0308 | 144.79762 |
152.26300 | 125.9757 | 152.15441 |
150.77263 | 125.9207 | 150.66859 |
145.40585 | 125.8657 | 144.62680 |
142.93133 | 125.8107 | 141.36986 |
135.45720 | 125.7557 | 136.01915 |
129.54920 | 125.7007 | 130.56519 |
129.83295 | 125.6458 | 130.38460 |
131.56674 | 125.5909 | 132.65891 |
126.50583 | 125.5360 | 129.47603 |
123.88278 | 125.4812 | 126.00064 |
127.02816 | 125.4263 | 128.47001 |
138.15757 | 125.3715 | 137.80578 |
139.55717 | 125.3167 | 140.93649 |
142.08663 | 125.2620 | 141.84314 |
143.20565 | 125.2072 | 142.71139 |
158.29032 | 125.1525 | 155.24808 |
164.95937 | 125.0978 | 165.80278 |
172.98184 | 125.0432 | 172.64921 |
170.42355 | 124.9885 | 171.46607 |
171.90379 | 124.9339 | 170.73737 |
172.67058 | 124.8793 | 171.67232 |
171.67436 | 124.8247 | 170.82890 |
176.37709 | 124.7702 | 174.26492 |
174.95512 | 124.7157 | 174.36043 |
168.69337 | 124.6612 | 168.13930 |
173.60575 | 124.6067 | 170.82890 |
175.43917 | 124.5522 | 174.26492 |
173.30112 | 124.4978 | 172.67612 |
163.79201 | 124.4434 | 163.04148 |
163.09118 | 124.3890 | 159.70846 |
146.94167 | 124.3347 | 145.80884 |
130.86621 | 124.2803 | 130.90889 |
124.20251 | 124.2260 | 123.90240 |
113.20621 | 124.1718 | 114.57303 |
105.79951 | 124.1175 | 105.42121 |
98.13525 | 124.0633 | 97.56579 |
102.62906 | 124.0090 | 100.78698 |
95.36609 | 123.9549 | 96.62393 |
100.76468 | 123.9007 | 99.35709 |
89.70514 | 123.8465 | 91.84973 |
82.80494 | 123.7924 | 84.28110 |
82.08490 | 123.7383 | 83.14060 |
84.93000 | 123.6843 | 85.46564 |
89.87680 | 123.6302 | 90.05750 |
89.82193 | 123.5762 | 91.04678 |
86.49138 | 123.5222 | 87.92357 |
82.40418 | 123.4682 | 84.37283 |
88.37005 | 123.4143 | 87.69998 |
91.17750 | 123.3603 | 91.87126 |
92.16153 | 123.3064 | 92.91612 |
90.43594 | 123.2525 | 91.27047 |
97.60234 | 123.1987 | 96.82519 |
107.01698 | 123.1448 | 108.77635 |
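Before formatting, it is worth pinning down exactly where the missing values sit (a quick sketch):
# Count NAs per model column: nnet lacks its first
# two fitted values, arima and gm are complete
sapply(df_fitted_by_mods, function(x) sum(is.na(x)))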
# The NNet model often generates NA values;
# carefully consider how to handle them
# and how to explain that choice.
# In this case we use `na.impute = TRUE`
# to fill the NAs, but it is not always
# the best practice.
library("ForecastComb")
fc_dat <-
foreccomb(observed_vector = WWWusage_train,
prediction_matrix = df_fitted_by_mods,
na.impute = TRUE)
## A subset of the individual forecasts included NA values and has been imputed.
fc_dat
## $Actual_Train
## Time Series:
## Start = 1
## End = 80
## Frequency = 1
## [1] 88 84 85 85 84 85 83 85 88 89 91 99 104 112 126 138 146 151 150
## [20] 148 147 149 143 132 131 139 147 150 148 145 140 134 131 131 129 126 126 132
## [39] 137 140 142 150 159 167 170 171 172 172 174 175 172 172 174 174 169 165 156
## [58] 142 131 121 112 104 102 99 99 95 88 84 84 87 89 88 85 86 89 91
## [77] 91 94 101 110
##
## $Forecasts_Train
## arima gm nnet
## 1 87.91200 88.0000 119.21442
## 2 86.30366 127.4155 100.40488
## 3 81.15244 127.3598 83.14060
## 4 87.32344 127.3042 86.89844
## 5 83.88979 127.2485 86.22192
## 6 83.43081 127.1929 84.84035
## 7 86.38455 127.1373 86.89844
## 8 80.10926 127.0818 83.55223
## 9 88.62262 127.0263 87.60547
## 10 89.56923 126.9707 90.95200
## 11 89.34744 126.9153 90.25237
## 12 93.05023 126.8598 92.91612
## 13 106.88018 126.8044 107.52216
## 14 105.72064 126.7490 110.16199
## 15 120.04024 126.6936 122.12901
## 16 137.62819 126.6382 139.03600
## 17 145.66687 126.5829 147.08846
## 18 151.15265 126.5276 151.16116
## 19 154.04517 126.4723 153.49332
## 20 147.41174 126.4170 147.21139
## 21 147.03795 126.3618 144.62680
## 22 146.35771 126.3065 144.56693
## 23 151.53086 126.2513 148.83363
## 24 135.11412 126.1962 137.58690
## 25 123.62527 126.1410 124.83029
## 26 133.95742 126.0859 131.90867
## 27 146.43963 126.0308 144.79762
## 28 152.26300 125.9757 152.15441
## 29 150.77263 125.9207 150.66859
## 30 145.40585 125.8657 144.62680
## 31 142.93133 125.8107 141.36986
## 32 135.45720 125.7557 136.01915
## 33 129.54920 125.7007 130.56519
## 34 129.83295 125.6458 130.38460
## 35 131.56674 125.5909 132.65891
## 36 126.50583 125.5360 129.47603
## 37 123.88278 125.4812 126.00064
## 38 127.02816 125.4263 128.47001
## 39 138.15757 125.3715 137.80578
## 40 139.55717 125.3167 140.93649
## 41 142.08663 125.2620 141.84314
## 42 143.20565 125.2072 142.71139
## 43 158.29032 125.1525 155.24808
## 44 164.95937 125.0978 165.80278
## 45 172.98184 125.0432 172.64921
## 46 170.42355 124.9885 171.46607
## 47 171.90379 124.9339 170.73737
## 48 172.67058 124.8793 171.67232
## 49 171.67436 124.8247 170.82890
## 50 176.37709 124.7702 174.26492
## 51 174.95512 124.7157 174.36043
## 52 168.69337 124.6612 168.13930
## 53 173.60575 124.6067 170.82890
## 54 175.43917 124.5522 174.26492
## 55 173.30112 124.4978 172.67612
## 56 163.79201 124.4434 163.04148
## 57 163.09118 124.3890 159.70846
## 58 146.94167 124.3347 145.80884
## 59 130.86621 124.2803 130.90889
## 60 124.20251 124.2260 123.90240
## 61 113.20621 124.1718 114.57303
## 62 105.79951 124.1175 105.42121
## 63 98.13525 124.0633 97.56579
## 64 102.62906 124.0090 100.78698
## 65 95.36609 123.9549 96.62393
## 66 100.76468 123.9007 99.35709
## 67 89.70514 123.8465 91.84973
## 68 82.80494 123.7924 84.28110
## 69 82.08490 123.7383 83.14060
## 70 84.93000 123.6843 85.46564
## 71 89.87680 123.6302 90.05750
## 72 89.82193 123.5762 91.04678
## 73 86.49138 123.5222 87.92357
## 74 82.40418 123.4682 84.37283
## 75 88.37005 123.4143 87.69998
## 76 91.17750 123.3603 91.87126
## 77 92.16153 123.3064 92.91612
## 78 90.43594 123.2525 91.27047
## 79 97.60234 123.1987 96.82519
## 80 107.01698 123.1448 108.77635
##
## $nmodels
## [1] 3
##
## $modelnames
## [1] "arima" "gm" "nnet"
##
## attr(,"class")
## [1] "foreccomb"
The steps above produced `fc_dat`, a dataset ready for training the combined model, but note the warnings the system raised:
non-finite value inf; using BIG value.
A subset of the individual forecasts included NA values and has been imputed.
Also, looking through `fc_dat`, it is hard to miss that the imputed value for `nnet`'s first missing entry is noticeably larger than the corresponding `arima` and `gm` values, which may well degrade the later forecasts. We will come back to this shortly.
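To see exactly what the imputation produced, we can set the first few rows of `fc_dat` beside the original fitted values (a quick sketch):
# Compare the imputed nnet values (rows 1-2)
# with the original NA-containing fits
head(fc_dat$Forecasts_Train, 3)
head(df_fitted_by_mods, 3)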
Model Fitting
Combined forecasting is done with `auto_combine`, which currently supports picking the optimal combination by RMSE, MAE, or MAPE.
auto_combine(fc_dat)
If nothing unexpected happens, this step fails with:
Error in `-.default`(observed_vector, prediction_matrix) :
time-series/vector length mismatch
This is because imputing the missing values with `na.impute = TRUE` in the `foreccomb` step distorted the trend of the data. So let us instead fill `nnet`'s missing fitted values with the mean of the corresponding `arima` and `gm` fitted values:
library("dplyr")
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:greyforecasting':
##
## combine
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Use the mean value of arima and gm
# as the missing value of nnet,
# but this may not be good practice either
df_fitted_by_mods_miss_val_fix <-
mutate(df_fitted_by_mods,
nnet = if_else(is.na(nnet),
(arima+gm)/2, nnet))
fc_dat_miss_val_fix <-
foreccomb(observed_vector = WWWusage_train,
prediction_matrix = df_fitted_by_mods_miss_val_fix,
na.impute = FALSE)
fc_dat_miss_val_fix
## $Actual_Train
## Time Series:
## Start = 1
## End = 80
## Frequency = 1
## [1] 88 84 85 85 84 85 83 85 88 89 91 99 104 112 126 138 146 151 150
## [20] 148 147 149 143 132 131 139 147 150 148 145 140 134 131 131 129 126 126 132
## [39] 137 140 142 150 159 167 170 171 172 172 174 175 172 172 174 174 169 165 156
## [58] 142 131 121 112 104 102 99 99 95 88 84 84 87 89 88 85 86 89 91
## [77] 91 94 101 110
##
## $Forecasts_Train
## Time Series:
## Start = 1
## End = 80
## Frequency = 1
## arima gm nnet
## 1 87.91200 88.0000 87.95600
## 2 86.30366 127.4155 106.85958
## 3 81.15244 127.3598 83.14060
## 4 87.32344 127.3042 86.89844
## 5 83.88979 127.2485 86.22192
## 6 83.43081 127.1929 84.84035
## 7 86.38455 127.1373 86.89844
## 8 80.10926 127.0818 83.55223
## 9 88.62262 127.0263 87.60547
## 10 89.56923 126.9707 90.95200
## 11 89.34744 126.9153 90.25237
## 12 93.05023 126.8598 92.91612
## 13 106.88018 126.8044 107.52216
## 14 105.72064 126.7490 110.16199
## 15 120.04024 126.6936 122.12901
## 16 137.62819 126.6382 139.03600
## 17 145.66687 126.5829 147.08846
## 18 151.15265 126.5276 151.16116
## 19 154.04517 126.4723 153.49332
## 20 147.41174 126.4170 147.21139
## 21 147.03795 126.3618 144.62680
## 22 146.35771 126.3065 144.56693
## 23 151.53086 126.2513 148.83363
## 24 135.11412 126.1962 137.58690
## 25 123.62527 126.1410 124.83029
## 26 133.95742 126.0859 131.90867
## 27 146.43963 126.0308 144.79762
## 28 152.26300 125.9757 152.15441
## 29 150.77263 125.9207 150.66859
## 30 145.40585 125.8657 144.62680
## 31 142.93133 125.8107 141.36986
## 32 135.45720 125.7557 136.01915
## 33 129.54920 125.7007 130.56519
## 34 129.83295 125.6458 130.38460
## 35 131.56674 125.5909 132.65891
## 36 126.50583 125.5360 129.47603
## 37 123.88278 125.4812 126.00064
## 38 127.02816 125.4263 128.47001
## 39 138.15757 125.3715 137.80578
## 40 139.55717 125.3167 140.93649
## 41 142.08663 125.2620 141.84314
## 42 143.20565 125.2072 142.71139
## 43 158.29032 125.1525 155.24808
## 44 164.95937 125.0978 165.80278
## 45 172.98184 125.0432 172.64921
## 46 170.42355 124.9885 171.46607
## 47 171.90379 124.9339 170.73737
## 48 172.67058 124.8793 171.67232
## 49 171.67436 124.8247 170.82890
## 50 176.37709 124.7702 174.26492
## 51 174.95512 124.7157 174.36043
## 52 168.69337 124.6612 168.13930
## 53 173.60575 124.6067 170.82890
## 54 175.43917 124.5522 174.26492
## 55 173.30112 124.4978 172.67612
## 56 163.79201 124.4434 163.04148
## 57 163.09118 124.3890 159.70846
## 58 146.94167 124.3347 145.80884
## 59 130.86621 124.2803 130.90889
## 60 124.20251 124.2260 123.90240
## 61 113.20621 124.1718 114.57303
## 62 105.79951 124.1175 105.42121
## 63 98.13525 124.0633 97.56579
## 64 102.62906 124.0090 100.78698
## 65 95.36609 123.9549 96.62393
## 66 100.76468 123.9007 99.35709
## 67 89.70514 123.8465 91.84973
## 68 82.80494 123.7924 84.28110
## 69 82.08490 123.7383 83.14060
## 70 84.93000 123.6843 85.46564
## 71 89.87680 123.6302 90.05750
## 72 89.82193 123.5762 91.04678
## 73 86.49138 123.5222 87.92357
## 74 82.40418 123.4682 84.37283
## 75 88.37005 123.4143 87.69998
## 76 91.17750 123.3603 91.87126
## 77 92.16153 123.3064 92.91612
## 78 90.43594 123.2525 91.27047
## 79 97.60234 123.1987 96.82519
## 80 107.01698 123.1448 108.77635
##
## $nmodels
## [1] 3
##
## $modelnames
## [1] "arima" "gm" "nnet"
##
## attr(,"class")
## [1] "foreccomb"
Now run the combined forecast again:
mod_comb <-
auto_combine(fc_dat_miss_val_fix)
## Optimization algorithm chooses number of retained models for trimmed eigenvector approach...
## Algorithm finished. Optimized number of retained models: 1
## Optimization algorithm chooses number of retained models for trimmed bias-corrected eigenvector approach...
## Algorithm finished. Optimized number of retained models: 1
## Optimization algorithm chooses trim factor for trimmed mean approach...
## Algorithm finished. Optimized trim factor: 0.34
## Optimization algorithm chooses trim factor for winsorized mean approach...
## Algorithm finished. Optimized trim factor: 0.5
mod_comb
## $Method
## [1] "Ordinary Least Squares Regression"
##
## $Models
## [1] "arima" "gm" "nnet"
##
## $Fitted
## Time Series:
## Start = 1
## End = 80
## Frequency = 1
## [1] 87.60187 88.64077 82.35508 88.23902 85.05656 84.54363 87.37633
## [8] 81.42104 89.46373 90.54932 90.29842 93.85551 107.45621 106.57138
## [15] 120.44347 137.62928 145.50485 150.78402 153.57908 147.10121 146.58597
## [22] 145.95903 150.96563 135.22288 123.87977 133.78489 146.04048 151.84668
## [29] 150.38495 145.07980 142.60139 135.41781 129.65758 129.90293 131.63598
## [36] 126.80058 124.17198 127.20710 137.99064 139.47555 141.84413 142.92213
## [43] 157.53086 164.32235 172.10282 169.68603 170.98753 171.74831 170.78065
## [50] 175.30235 174.00844 167.87406 172.53751 174.43893 172.37890 163.05166
## [57] 162.18794 146.51256 130.83818 124.28445 113.61943 106.24429 98.72029
## [64] 103.03691 96.12530 101.23568 90.63426 83.82711 83.09189 85.84321
## [71] 90.66480 90.67887 87.42765 83.45698 89.12495 91.96486 92.93137
## [78] 91.24424 98.15692 107.54875
##
## $Accuracy_Train
## ME RMSE MAE MPE MAPE ACF1
## Test set -1.776303e-16 3.02758 2.402171 -0.0741498 2.056542 -0.0140366
## Theil's U
## Test set 0.6035688
##
## $Input_Data
## $Input_Data$Actual_Train
## Time Series:
## Start = 1
## End = 80
## Frequency = 1
## [1] 88 84 85 85 84 85 83 85 88 89 91 99 104 112 126 138 146 151 150
## [20] 148 147 149 143 132 131 139 147 150 148 145 140 134 131 131 129 126 126 132
## [39] 137 140 142 150 159 167 170 171 172 172 174 175 172 172 174 174 169 165 156
## [58] 142 131 121 112 104 102 99 99 95 88 84 84 87 89 88 85 86 89 91
## [77] 91 94 101 110
##
## $Input_Data$Forecasts_Train
## Time Series:
## Start = 1
## End = 80
## Frequency = 1
## arima gm nnet
## 1 87.91200 88.0000 87.95600
## 2 86.30366 127.4155 106.85958
## 3 81.15244 127.3598 83.14060
## 4 87.32344 127.3042 86.89844
## 5 83.88979 127.2485 86.22192
## 6 83.43081 127.1929 84.84035
## 7 86.38455 127.1373 86.89844
## 8 80.10926 127.0818 83.55223
## 9 88.62262 127.0263 87.60547
## 10 89.56923 126.9707 90.95200
## 11 89.34744 126.9153 90.25237
## 12 93.05023 126.8598 92.91612
## 13 106.88018 126.8044 107.52216
## 14 105.72064 126.7490 110.16199
## 15 120.04024 126.6936 122.12901
## 16 137.62819 126.6382 139.03600
## 17 145.66687 126.5829 147.08846
## 18 151.15265 126.5276 151.16116
## 19 154.04517 126.4723 153.49332
## 20 147.41174 126.4170 147.21139
## 21 147.03795 126.3618 144.62680
## 22 146.35771 126.3065 144.56693
## 23 151.53086 126.2513 148.83363
## 24 135.11412 126.1962 137.58690
## 25 123.62527 126.1410 124.83029
## 26 133.95742 126.0859 131.90867
## 27 146.43963 126.0308 144.79762
## 28 152.26300 125.9757 152.15441
## 29 150.77263 125.9207 150.66859
## 30 145.40585 125.8657 144.62680
## 31 142.93133 125.8107 141.36986
## 32 135.45720 125.7557 136.01915
## 33 129.54920 125.7007 130.56519
## 34 129.83295 125.6458 130.38460
## 35 131.56674 125.5909 132.65891
## 36 126.50583 125.5360 129.47603
## 37 123.88278 125.4812 126.00064
## 38 127.02816 125.4263 128.47001
## 39 138.15757 125.3715 137.80578
## 40 139.55717 125.3167 140.93649
## 41 142.08663 125.2620 141.84314
## 42 143.20565 125.2072 142.71139
## 43 158.29032 125.1525 155.24808
## 44 164.95937 125.0978 165.80278
## 45 172.98184 125.0432 172.64921
## 46 170.42355 124.9885 171.46607
## 47 171.90379 124.9339 170.73737
## 48 172.67058 124.8793 171.67232
## 49 171.67436 124.8247 170.82890
## 50 176.37709 124.7702 174.26492
## 51 174.95512 124.7157 174.36043
## 52 168.69337 124.6612 168.13930
## 53 173.60575 124.6067 170.82890
## 54 175.43917 124.5522 174.26492
## 55 173.30112 124.4978 172.67612
## 56 163.79201 124.4434 163.04148
## 57 163.09118 124.3890 159.70846
## 58 146.94167 124.3347 145.80884
## 59 130.86621 124.2803 130.90889
## 60 124.20251 124.2260 123.90240
## 61 113.20621 124.1718 114.57303
## 62 105.79951 124.1175 105.42121
## 63 98.13525 124.0633 97.56579
## 64 102.62906 124.0090 100.78698
## 65 95.36609 123.9549 96.62393
## 66 100.76468 123.9007 99.35709
## 67 89.70514 123.8465 91.84973
## 68 82.80494 123.7924 84.28110
## 69 82.08490 123.7383 83.14060
## 70 84.93000 123.6843 85.46564
## 71 89.87680 123.6302 90.05750
## 72 89.82193 123.5762 91.04678
## 73 86.49138 123.5222 87.92357
## 74 82.40418 123.4682 84.37283
## 75 88.37005 123.4143 87.69998
## 76 91.17750 123.3603 91.87126
## 77 92.16153 123.3064 92.91612
## 78 90.43594 123.2525 91.27047
## 79 97.60234 123.1987 96.82519
## 80 107.01698 123.1448 108.77635
##
##
## $Predict
## function (object, newpreds)
## {
## coef <- c(object$Intercept, object$Weights)
## pred <- as.vector(coef %*% t(cbind(1, newpreds)))
## return(pred)
## }
## <bytecode: 0x7f8cd87d7768>
## <environment: namespace:ForecastComb>
##
## $Intercept
## [1] -1.326066
##
## $Weights
## [1] 0.91320942 0.03167767 0.06660417
##
## attr(,"class")
## [1] "foreccomb_res"
The results show that combining via Ordinary Least Squares (OLS) regression yields the smallest RMSE, with weights of 0.9132094, 0.0316777, and 0.0666042 for the arima, gm, and nnet models respectively. Even so, we should validate the model against the test set before putting it to further use.
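Note that RMSE is only the default selection criterion; to pick the combination by MAE or MAPE instead, pass `criterion` explicitly (a sketch based on the documented signature, not run here):
# Select the combination method with the lowest MAE
mod_comb_mae <-
  auto_combine(fc_dat_miss_val_fix, criterion = "MAE")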
Model Validation
The Automatic Way
Model validation can be folded into the same data-preparation step that precedes `auto_combine`:
df_pred_by_mods <-
data.frame(
arima = as.numeric(forecast_arima$mean),
gm = as.numeric(mod_gm$forecasts),
nnet = as.numeric(forecast_nnet$mean)
)
fc_train_n_test <-
foreccomb(observed_vector = WWWusage_train,
prediction_matrix = df_fitted_by_mods_miss_val_fix,
newobs = WWWusage_test,
newpreds = as.matrix(df_pred_by_mods),
na.impute = FALSE)
mod_train_n_test_comb <-
auto_combine(fc_train_n_test)
## Optimization algorithm chooses number of retained models for trimmed eigenvector approach...
## Algorithm finished. Optimized number of retained models: 1
## Optimization algorithm chooses number of retained models for trimmed bias-corrected eigenvector approach...
## Algorithm finished. Optimized number of retained models: 1
## Optimization algorithm chooses trim factor for trimmed mean approach...
## Algorithm finished. Optimized trim factor: 0.34
## Optimization algorithm chooses trim factor for winsorized mean approach...
## Algorithm finished. Optimized trim factor: 0.5
fc_train_n_test
## $Actual_Train
## Time Series:
## Start = 1
## End = 80
## Frequency = 1
## [1] 88 84 85 85 84 85 83 85 88 89 91 99 104 112 126 138 146 151 150
## [20] 148 147 149 143 132 131 139 147 150 148 145 140 134 131 131 129 126 126 132
## [39] 137 140 142 150 159 167 170 171 172 172 174 175 172 172 174 174 169 165 156
## [58] 142 131 121 112 104 102 99 99 95 88 84 84 87 89 88 85 86 89 91
## [77] 91 94 101 110
##
## $Forecasts_Train
## Time Series:
## Start = 1
## End = 80
## Frequency = 1
## arima gm nnet
## 1 87.91200 88.0000 87.95600
## 2 86.30366 127.4155 106.85958
## 3 81.15244 127.3598 83.14060
## 4 87.32344 127.3042 86.89844
## 5 83.88979 127.2485 86.22192
## 6 83.43081 127.1929 84.84035
## 7 86.38455 127.1373 86.89844
## 8 80.10926 127.0818 83.55223
## 9 88.62262 127.0263 87.60547
## 10 89.56923 126.9707 90.95200
## 11 89.34744 126.9153 90.25237
## 12 93.05023 126.8598 92.91612
## 13 106.88018 126.8044 107.52216
## 14 105.72064 126.7490 110.16199
## 15 120.04024 126.6936 122.12901
## 16 137.62819 126.6382 139.03600
## 17 145.66687 126.5829 147.08846
## 18 151.15265 126.5276 151.16116
## 19 154.04517 126.4723 153.49332
## 20 147.41174 126.4170 147.21139
## 21 147.03795 126.3618 144.62680
## 22 146.35771 126.3065 144.56693
## 23 151.53086 126.2513 148.83363
## 24 135.11412 126.1962 137.58690
## 25 123.62527 126.1410 124.83029
## 26 133.95742 126.0859 131.90867
## 27 146.43963 126.0308 144.79762
## 28 152.26300 125.9757 152.15441
## 29 150.77263 125.9207 150.66859
## 30 145.40585 125.8657 144.62680
## 31 142.93133 125.8107 141.36986
## 32 135.45720 125.7557 136.01915
## 33 129.54920 125.7007 130.56519
## 34 129.83295 125.6458 130.38460
## 35 131.56674 125.5909 132.65891
## 36 126.50583 125.5360 129.47603
## 37 123.88278 125.4812 126.00064
## 38 127.02816 125.4263 128.47001
## 39 138.15757 125.3715 137.80578
## 40 139.55717 125.3167 140.93649
## 41 142.08663 125.2620 141.84314
## 42 143.20565 125.2072 142.71139
## 43 158.29032 125.1525 155.24808
## 44 164.95937 125.0978 165.80278
## 45 172.98184 125.0432 172.64921
## 46 170.42355 124.9885 171.46607
## 47 171.90379 124.9339 170.73737
## 48 172.67058 124.8793 171.67232
## 49 171.67436 124.8247 170.82890
## 50 176.37709 124.7702 174.26492
## 51 174.95512 124.7157 174.36043
## 52 168.69337 124.6612 168.13930
## 53 173.60575 124.6067 170.82890
## 54 175.43917 124.5522 174.26492
## 55 173.30112 124.4978 172.67612
## 56 163.79201 124.4434 163.04148
## 57 163.09118 124.3890 159.70846
## 58 146.94167 124.3347 145.80884
## 59 130.86621 124.2803 130.90889
## 60 124.20251 124.2260 123.90240
## 61 113.20621 124.1718 114.57303
## 62 105.79951 124.1175 105.42121
## 63 98.13525 124.0633 97.56579
## 64 102.62906 124.0090 100.78698
## 65 95.36609 123.9549 96.62393
## 66 100.76468 123.9007 99.35709
## 67 89.70514 123.8465 91.84973
## 68 82.80494 123.7924 84.28110
## 69 82.08490 123.7383 83.14060
## 70 84.93000 123.6843 85.46564
## 71 89.87680 123.6302 90.05750
## 72 89.82193 123.5762 91.04678
## 73 86.49138 123.5222 87.92357
## 74 82.40418 123.4682 84.37283
## 75 88.37005 123.4143 87.69998
## 76 91.17750 123.3603 91.87126
## 77 92.16153 123.3064 92.91612
## 78 90.43594 123.2525 91.27047
## 79 97.60234 123.1987 96.82519
## 80 107.01698 123.1448 108.77635
##
## $Actual_Test
## [1] 121 135 145 149 156 165 171 175 177 182 193 204 208 210 215 222 228 226 222
## [20] 220
##
## $Forecasts_Test
## arima gm nnet
## [1,] 117.0633 123.0910 121.0024
## [2,] 121.4699 123.0372 132.9642
## [3,] 124.2189 122.9835 142.8745
## [4,] 125.9340 122.9297 149.8028
## [5,] 127.0039 122.8760 154.0405
## [6,] 127.6714 122.8223 155.9769
## [7,] 128.0878 122.7686 155.8329
## [8,] 128.3476 122.7150 153.6791
## [9,] 128.5097 122.6614 149.6578
## [10,] 128.6108 122.6078 144.4252
## [11,] 128.6739 122.5542 139.2503
## [12,] 128.7132 122.5006 135.3176
## [13,] 128.7378 122.4471 133.1738
## [14,] 128.7531 122.3936 132.7999
## [15,] 128.7626 122.3401 133.8105
## [16,] 128.7686 122.2867 135.6047
## [17,] 128.7723 122.2332 137.5464
## [18,] 128.7746 122.1798 139.1455
## [19,] 128.7761 122.1264 140.1396
## [20,] 128.7770 122.0731 140.4769
##
## $nmodels
## [1] 3
##
## $modelnames
## [1] "arima" "gm" "nnet"
##
## attr(,"class")
## [1] "foreccomb"
The model's test-set predictions are then available via `mod_train_n_test_comb$Forecasts_Test`, and the accuracy metrics (including RMSE) via `mod_train_n_test_comb$Accuracy_Test`.
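When the call succeeds, those two accessors are all we need (a short sketch):
# 20 combined forecasts for the hold-out period
mod_train_n_test_comb$Forecasts_Test
# Test-set accuracy metrics, RMSE included
mod_train_n_test_comb$Accuracy_Test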
The Manual Way
However, in some cases (including ours) `auto_combine` runs into an error:
Error in newpred_matrix %*% weights :
requires numeric/complex matrix/vector arguments
Fortunately, the selected method here is ordinary least squares, i.e. linear regression, so it is easy to combine the forecasts by hand from the intercept and the slopes (the variables' weights):
library("purrr")
forecast_comb <-
mod_comb$Intercept +
reduce(map2(mod_comb$Weights, df_pred_by_mods, `*`),`+`)
forecast_comb
## [1] 117.5358 122.3549 125.5237 127.5496 128.8073 129.5441 129.9131 130.0052
## [9] 129.8836 129.6258 129.3370 129.1093 128.9873 128.9746 129.0490 129.1722
## [17] 129.3033 129.4102 129.4760 129.4976
rmse_comb <-
sqrt(mean((as.numeric(forecast_comb) - as.numeric(WWWusage_test))^2, na.rm = TRUE))
rmse_comb
## [1] 65.56671
The combined model's RMSE = 65.5667112, similar to the ARIMA model's performance in the previous post. Indeed, given the arima, gm, and nnet weights of 0.9132094, 0.0316777, and 0.0666042, it is easy to guess that the combined forecast in this case would land quite close to ARIMA's.
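We can make that comparison concrete by computing the same RMSE for each individual model's forecasts (a sketch reusing `df_pred_by_mods`):
# RMSE of each single model on the test set,
# computed the same way as rmse_comb above
sapply(df_pred_by_mods, function(p)
  sqrt(mean((p - as.numeric(WWWusage_test))^2)))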
As for which model to choose, that still depends on the actual application.
Epilogue
As of May 2021, nearly three years have passed since ForecastComb's last commit, so the occasional hiccup is inevitable, like the `auto_combine` error we mentioned. Moreover, not every scenario will end up selecting OLS as the combination method; but after reading the R scripts in ForecastComb's source code, implementing the forecast by hand is not hard either.
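For example, the OLS combination is nothing more than a linear regression of the actuals on the individual forecasts, so it can be reproduced with base R's `lm()` (a sketch; the coefficients should closely match `mod_comb$Intercept` and `mod_comb$Weights`):
# Refit the OLS combination by hand: regress the
# training actuals on the three models' fitted values
ols_comb <-
  lm(actual ~ arima + gm + nnet,
     data = cbind(actual = as.numeric(WWWusage_train),
                  df_fitted_by_mods_miss_val_fix))
coef(ols_comb)
# Combined forecasts for the 20 test records
predict(ols_comb, newdata = df_pred_by_mods)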
Feel free to reach me via email, Weibo, Twitter, or Zhihu. You are also welcome to follow my blog. And if my GitHub catches your interest, all the better!