In the previous post, *R for Everything: Time Series Forecasting with ARIMA, Grey Model GM(1,1), Neural Networks, and Forecast Combination (Part 1)*, we forecast 20 new records of the Internet Usage per Minute dataset with ARIMA, GM(1,1), and a time-series neural network. The code, in brief:
set.seed(1234)
data("WWWusage")
# Select the first 80 records as the train dataset
WWWusage_train <-
WWWusage[1:80]
# Select the last 20 records as the test dataset
WWWusage_test <-
WWWusage[81:100]
library("forecast")
# ARIMA model with the best (p,d,q) combination selected by AICc
mod_arima <-
auto.arima(WWWusage_train)
# forecast another 20 records
forecast_arima <-
forecast(mod_arima, h = 20)
# GM(1,1) model
library("greyforecasting")
# `term = 20` means forecasting another 20 records
mod_gm <-
gm(WWWusage_train, term = 20)
# NNet model
mod_nnet <-
nnetar(WWWusage_train)
# forecast another 20 records
forecast_nnet <-
forecast(mod_nnet, h = 20)
As we noted at the end of the previous post, the neural network model had the lowest RMSE, suggesting that it fits our example data reasonably well. However, Hansen (2005) argues that replacing a single model with an optimal combination of models can produce better results.
Hence forecast combination models. According to the ForecastComb documentation, these models pool a set of individual forecasts into a single forecast through geometric- or regression-based methods:
The R package ForecastComb presents functions to pool individual model forecasts using geometric- and regression-based forecast combination methods. ForecastComb combines the functionality of the packages ForecastCombinations and GeomComb under a unified user interface and convenience functions.
For a detailed guide to ForecastComb, see Christoph, Raviv, and Roetzer (2019).
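As a warm-up, the simplest possible combination is an unweighted average of the individual forecasts, which we can compute by hand before reaching for the package (a minimal sketch reusing the objects from the code above):
# A naive equal-weight combination: average the
# three 20-step-ahead forecasts element by element
naive_comb <-
  (as.numeric(forecast_arima$mean) +
     as.numeric(mod_gm$forecasts) +
     as.numeric(forecast_nnet$mean)) / 3
# Its RMSE against the held-out test records
sqrt(mean((naive_comb - as.numeric(WWWusage_test))^2))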
Forecast Comb 101
Data Formatting
Specifically, ForecastComb handles three kinds of data setups:
- Only training set
- Training set + future forecasts
- Full training + test set
Before running a combined forecast, we first need to convert the dataset into a `foreccomb`-class object with `foreccomb`.
Its parameters, for reference:
foreccomb(observed_vector, prediction_matrix,
newobs = NULL, newpreds = NULL,
byrow = FALSE, na.impute = TRUE, criterion = "RMSE")
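Corresponding to the three supported setups, the calls differ only in which arguments are supplied (a sketch with placeholder names, not output from this session):
# 1. Only training set: actuals plus in-sample fits
fc <- foreccomb(obs_train, fit_matrix)
# 2. Training set + future forecasts: add out-of-sample predictions
fc <- foreccomb(obs_train, fit_matrix, newpreds = pred_matrix)
# 3. Full training + test set: also supply the test actuals
fc <- foreccomb(obs_train, fit_matrix,
                newobs = obs_test, newpreds = pred_matrix)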
To format the data with `foreccomb`, we need to supply at least a vector of observed values (`observed_vector`) and a matrix of individual-model predictions (`prediction_matrix`), in which each column holds one model's predictions. For our example, that means setting `WWWusage_train` as the `observed_vector`, and extracting the fitted values from `mod_arima`, `mod_gm`, and `mod_nnet` into a `data.frame` to serve as the `prediction_matrix`.
df_fitted_by_mods <-
data.frame(
arima = as.numeric(mod_arima$fitted),
gm = as.numeric(mod_gm$fitted),
nnet = as.numeric(mod_nnet$fitted)
)
knitr::kable(df_fitted_by_mods)
arima | gm | nnet |
---|---|---|
87.91200 | 88.0000 | NA |
86.30366 | 127.4155 | NA |
81.15244 | 127.3598 | 83.14060 |
87.32344 | 127.3042 | 86.89844 |
83.88979 | 127.2485 | 86.22192 |
83.43081 | 127.1929 | 84.84035 |
86.38455 | 127.1373 | 86.89844 |
80.10926 | 127.0818 | 83.55223 |
88.62262 | 127.0263 | 87.60547 |
89.56923 | 126.9707 | 90.95200 |
89.34744 | 126.9153 | 90.25237 |
93.05023 | 126.8598 | 92.91612 |
106.88018 | 126.8044 | 107.52216 |
105.72064 | 126.7490 | 110.16199 |
120.04024 | 126.6936 | 122.12901 |
137.62819 | 126.6382 | 139.03600 |
145.66687 | 126.5829 | 147.08846 |
151.15265 | 126.5276 | 151.16116 |
154.04517 | 126.4723 | 153.49332 |
147.41174 | 126.4170 | 147.21139 |
147.03795 | 126.3618 | 144.62680 |
146.35771 | 126.3065 | 144.56693 |
151.53086 | 126.2513 | 148.83363 |
135.11412 | 126.1962 | 137.58690 |
123.62527 | 126.1410 | 124.83029 |
133.95742 | 126.0859 | 131.90867 |
146.43963 | 126.0308 | 144.79762 |
152.26300 | 125.9757 | 152.15441 |
150.77263 | 125.9207 | 150.66859 |
145.40585 | 125.8657 | 144.62680 |
142.93133 | 125.8107 | 141.36986 |
135.45720 | 125.7557 | 136.01915 |
129.54920 | 125.7007 | 130.56519 |
129.83295 | 125.6458 | 130.38460 |
131.56674 | 125.5909 | 132.65891 |
126.50583 | 125.5360 | 129.47603 |
123.88278 | 125.4812 | 126.00064 |
127.02816 | 125.4263 | 128.47001 |
138.15757 | 125.3715 | 137.80578 |
139.55717 | 125.3167 | 140.93649 |
142.08663 | 125.2620 | 141.84314 |
143.20565 | 125.2072 | 142.71139 |
158.29032 | 125.1525 | 155.24808 |
164.95937 | 125.0978 | 165.80278 |
172.98184 | 125.0432 | 172.64921 |
170.42355 | 124.9885 | 171.46607 |
171.90379 | 124.9339 | 170.73737 |
172.67058 | 124.8793 | 171.67232 |
171.67436 | 124.8247 | 170.82890 |
176.37709 | 124.7702 | 174.26492 |
174.95512 | 124.7157 | 174.36043 |
168.69337 | 124.6612 | 168.13930 |
173.60575 | 124.6067 | 170.82890 |
175.43917 | 124.5522 | 174.26492 |
173.30112 | 124.4978 | 172.67612 |
163.79201 | 124.4434 | 163.04148 |
163.09118 | 124.3890 | 159.70846 |
146.94167 | 124.3347 | 145.80884 |
130.86621 | 124.2803 | 130.90889 |
124.20251 | 124.2260 | 123.90240 |
113.20621 | 124.1718 | 114.57303 |
105.79951 | 124.1175 | 105.42121 |
98.13525 | 124.0633 | 97.56579 |
102.62906 | 124.0090 | 100.78698 |
95.36609 | 123.9549 | 96.62393 |
100.76468 | 123.9007 | 99.35709 |
89.70514 | 123.8465 | 91.84973 |
82.80494 | 123.7924 | 84.28110 |
82.08490 | 123.7383 | 83.14060 |
84.93000 | 123.6843 | 85.46564 |
89.87680 | 123.6302 | 90.05750 |
89.82193 | 123.5762 | 91.04678 |
86.49138 | 123.5222 | 87.92357 |
82.40418 | 123.4682 | 84.37283 |
88.37005 | 123.4143 | 87.69998 |
91.17750 | 123.3603 | 91.87126 |
92.16153 | 123.3064 | 92.91612 |
90.43594 | 123.2525 | 91.27047 |
97.60234 | 123.1987 | 96.82519 |
107.01698 | 123.1448 | 108.77635 |
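Before formatting, it is worth pinning down exactly where the missing values sit (a quick sketch):
# Count NAs per model column: nnet lacks its first
# two fitted values, arima and gm are complete
sapply(df_fitted_by_mods, function(x) sum(is.na(x)))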
# The NNet model often generates NA values;
# carefully consider how to handle them
# and how to explain that choice.
# In this case we use `na.impute = TRUE`
# to fill the NAs, but it is not always
# the best practice.
library("ForecastComb")
fc_dat <-
foreccomb(observed_vector = WWWusage_train,
prediction_matrix = df_fitted_by_mods,
na.impute = TRUE)
## A subset of the individual forecasts included NA values and has been imputed.
fc_dat
## $Actual_Train
## Time Series:
## Start = 1
## End = 80
## Frequency = 1
## [1] 88 84 85 85 84 85 83 85 88 89 91 99 104 112 126 138 146 151 150
## [20] 148 147 149 143 132 131 139 147 150 148 145 140 134 131 131 129 126 126 132
## [39] 137 140 142 150 159 167 170 171 172 172 174 175 172 172 174 174 169 165 156
## [58] 142 131 121 112 104 102 99 99 95 88 84 84 87 89 88 85 86 89 91
## [77] 91 94 101 110
##
## $Forecasts_Train
## arima gm nnet
## 1 87.91200 88.0000 119.21442
## 2 86.30366 127.4155 100.40488
## 3 81.15244 127.3598 83.14060
## 4 87.32344 127.3042 86.89844
## 5 83.88979 127.2485 86.22192
## 6 83.43081 127.1929 84.84035
## 7 86.38455 127.1373 86.89844
## 8 80.10926 127.0818 83.55223
## 9 88.62262 127.0263 87.60547
## 10 89.56923 126.9707 90.95200
## 11 89.34744 126.9153 90.25237
## 12 93.05023 126.8598 92.91612
## 13 106.88018 126.8044 107.52216
## 14 105.72064 126.7490 110.16199
## 15 120.04024 126.6936 122.12901
## 16 137.62819 126.6382 139.03600
## 17 145.66687 126.5829 147.08846
## 18 151.15265 126.5276 151.16116
## 19 154.04517 126.4723 153.49332
## 20 147.41174 126.4170 147.21139
## 21 147.03795 126.3618 144.62680
## 22 146.35771 126.3065 144.56693
## 23 151.53086 126.2513 148.83363
## 24 135.11412 126.1962 137.58690
## 25 123.62527 126.1410 124.83029
## 26 133.95742 126.0859 131.90867
## 27 146.43963 126.0308 144.79762
## 28 152.26300 125.9757 152.15441
## 29 150.77263 125.9207 150.66859
## 30 145.40585 125.8657 144.62680
## 31 142.93133 125.8107 141.36986
## 32 135.45720 125.7557 136.01915
## 33 129.54920 125.7007 130.56519
## 34 129.83295 125.6458 130.38460
## 35 131.56674 125.5909 132.65891
## 36 126.50583 125.5360 129.47603
## 37 123.88278 125.4812 126.00064
## 38 127.02816 125.4263 128.47001
## 39 138.15757 125.3715 137.80578
## 40 139.55717 125.3167 140.93649
## 41 142.08663 125.2620 141.84314
## 42 143.20565 125.2072 142.71139
## 43 158.29032 125.1525 155.24808
## 44 164.95937 125.0978 165.80278
## 45 172.98184 125.0432 172.64921
## 46 170.42355 124.9885 171.46607
## 47 171.90379 124.9339 170.73737
## 48 172.67058 124.8793 171.67232
## 49 171.67436 124.8247 170.82890
## 50 176.37709 124.7702 174.26492
## 51 174.95512 124.7157 174.36043
## 52 168.69337 124.6612 168.13930
## 53 173.60575 124.6067 170.82890
## 54 175.43917 124.5522 174.26492
## 55 173.30112 124.4978 172.67612
## 56 163.79201 124.4434 163.04148
## 57 163.09118 124.3890 159.70846
## 58 146.94167 124.3347 145.80884
## 59 130.86621 124.2803 130.90889
## 60 124.20251 124.2260 123.90240
## 61 113.20621 124.1718 114.57303
## 62 105.79951 124.1175 105.42121
## 63 98.13525 124.0633 97.56579
## 64 102.62906 124.0090 100.78698
## 65 95.36609 123.9549 96.62393
## 66 100.76468 123.9007 99.35709
## 67 89.70514 123.8465 91.84973
## 68 82.80494 123.7924 84.28110
## 69 82.08490 123.7383 83.14060
## 70 84.93000 123.6843 85.46564
## 71 89.87680 123.6302 90.05750
## 72 89.82193 123.5762 91.04678
## 73 86.49138 123.5222 87.92357
## 74 82.40418 123.4682 84.37283
## 75 88.37005 123.4143 87.69998
## 76 91.17750 123.3603 91.87126
## 77 92.16153 123.3064 92.91612
## 78 90.43594 123.2525 91.27047
## 79 97.60234 123.1987 96.82519
## 80 107.01698 123.1448 108.77635
##
## $nmodels
## [1] 3
##
## $modelnames
## [1] "arima" "gm" "nnet"
##
## attr(,"class")
## [1] "foreccomb"
The steps above produced `fc_dat`, a dataset ready for training the combined model, but note the warnings the system raised:
non-finite value inf; using BIG value.
A subset of the individual forecasts included NA values and has been imputed.
Also, looking through `fc_dat`, it is hard to miss that the imputed value for `nnet`'s first missing entry is noticeably larger than the corresponding `arima` and `gm` values, which may well degrade the later forecasts. We will come back to this shortly.
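To see exactly what the imputation produced, we can set the first few rows of `fc_dat` beside the original fitted values (a quick sketch):
# Compare the imputed nnet values (rows 1-2)
# with the original NA-containing fits
head(fc_dat$Forecasts_Train, 3)
head(df_fitted_by_mods, 3)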
Model Fitting
Combined forecasting is done with `auto_combine`, which currently supports picking the optimal combination by RMSE, MAE, or MAPE.
auto_combine(fc_dat)
If nothing unexpected happens, this step fails with:
Error in `-.default`(observed_vector, prediction_matrix) :
time-series/vector length mismatch
This is because imputing the missing values with `na.impute = TRUE` in the `foreccomb` step distorted the trend of the data. So let us instead fill `nnet`'s missing fitted values with the mean of the corresponding `arima` and `gm` fitted values:
library("dplyr")
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:greyforecasting':
##
## combine
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Use the mean value of arima and gm
# as the missing value of nnet,
# but this may not be good practice either
df_fitted_by_mods_miss_val_fix <-
mutate(df_fitted_by_mods,
nnet = if_else(is.na(nnet),
(arima+gm)/2, nnet))
fc_dat_miss_val_fix <-
foreccomb(observed_vector = WWWusage_train,
prediction_matrix = df_fitted_by_mods_miss_val_fix,
na.impute = FALSE)
fc_dat_miss_val_fix
## $Actual_Train
## Time Series:
## Start = 1
## End = 80
## Frequency = 1
## [1] 88 84 85 85 84 85 83 85 88 89 91 99 104 112 126 138 146 151 150
## [20] 148 147 149 143 132 131 139 147 150 148 145 140 134 131 131 129 126 126 132
## [39] 137 140 142 150 159 167 170 171 172 172 174 175 172 172 174 174 169 165 156
## [58] 142 131 121 112 104 102 99 99 95 88 84 84 87 89 88 85 86 89 91
## [77] 91 94 101 110
##
## $Forecasts_Train
## Time Series:
## Start = 1
## End = 80
## Frequency = 1
## arima gm nnet
## 1 87.91200 88.0000 87.95600
## 2 86.30366 127.4155 106.85958
## 3 81.15244 127.3598 83.14060
## 4 87.32344 127.3042 86.89844
## 5 83.88979 127.2485 86.22192
## 6 83.43081 127.1929 84.84035
## 7 86.38455 127.1373 86.89844
## 8 80.10926 127.0818 83.55223
## 9 88.62262 127.0263 87.60547
## 10 89.56923 126.9707 90.95200
## 11 89.34744 126.9153 90.25237
## 12 93.05023 126.8598 92.91612
## 13 106.88018 126.8044 107.52216
## 14 105.72064 126.7490 110.16199
## 15 120.04024 126.6936 122.12901
## 16 137.62819 126.6382 139.03600
## 17 145.66687 126.5829 147.08846
## 18 151.15265 126.5276 151.16116
## 19 154.04517 126.4723 153.49332
## 20 147.41174 126.4170 147.21139
## 21 147.03795 126.3618 144.62680
## 22 146.35771 126.3065 144.56693
## 23 151.53086 126.2513 148.83363
## 24 135.11412 126.1962 137.58690
## 25 123.62527 126.1410 124.83029
## 26 133.95742 126.0859 131.90867
## 27 146.43963 126.0308 144.79762
## 28 152.26300 125.9757 152.15441
## 29 150.77263 125.9207 150.66859
## 30 145.40585 125.8657 144.62680
## 31 142.93133 125.8107 141.36986
## 32 135.45720 125.7557 136.01915
## 33 129.54920 125.7007 130.56519
## 34 129.83295 125.6458 130.38460
## 35 131.56674 125.5909 132.65891
## 36 126.50583 125.5360 129.47603
## 37 123.88278 125.4812 126.00064
## 38 127.02816 125.4263 128.47001
## 39 138.15757 125.3715 137.80578
## 40 139.55717 125.3167 140.93649
## 41 142.08663 125.2620 141.84314
## 42 143.20565 125.2072 142.71139
## 43 158.29032 125.1525 155.24808
## 44 164.95937 125.0978 165.80278
## 45 172.98184 125.0432 172.64921
## 46 170.42355 124.9885 171.46607
## 47 171.90379 124.9339 170.73737
## 48 172.67058 124.8793 171.67232
## 49 171.67436 124.8247 170.82890
## 50 176.37709 124.7702 174.26492
## 51 174.95512 124.7157 174.36043
## 52 168.69337 124.6612 168.13930
## 53 173.60575 124.6067 170.82890
## 54 175.43917 124.5522 174.26492
## 55 173.30112 124.4978 172.67612
## 56 163.79201 124.4434 163.04148
## 57 163.09118 124.3890 159.70846
## 58 146.94167 124.3347 145.80884
## 59 130.86621 124.2803 130.90889
## 60 124.20251 124.2260 123.90240
## 61 113.20621 124.1718 114.57303
## 62 105.79951 124.1175 105.42121
## 63 98.13525 124.0633 97.56579
## 64 102.62906 124.0090 100.78698
## 65 95.36609 123.9549 96.62393
## 66 100.76468 123.9007 99.35709
## 67 89.70514 123.8465 91.84973
## 68 82.80494 123.7924 84.28110
## 69 82.08490 123.7383 83.14060
## 70 84.93000 123.6843 85.46564
## 71 89.87680 123.6302 90.05750
## 72 89.82193 123.5762 91.04678
## 73 86.49138 123.5222 87.92357
## 74 82.40418 123.4682 84.37283
## 75 88.37005 123.4143 87.69998
## 76 91.17750 123.3603 91.87126
## 77 92.16153 123.3064 92.91612
## 78 90.43594 123.2525 91.27047
## 79 97.60234 123.1987 96.82519
## 80 107.01698 123.1448 108.77635
##
## $nmodels
## [1] 3
##
## $modelnames
## [1] "arima" "gm" "nnet"
##
## attr(,"class")
## [1] "foreccomb"
Now run the combined forecast again:
mod_comb <-
auto_combine(fc_dat_miss_val_fix)
## Optimization algorithm chooses number of retained models for trimmed eigenvector approach...
## Algorithm finished. Optimized number of retained models: 1
## Optimization algorithm chooses number of retained models for trimmed bias-corrected eigenvector approach...
## Algorithm finished. Optimized number of retained models: 1
## Optimization algorithm chooses trim factor for trimmed mean approach...
## Algorithm finished. Optimized trim factor: 0.34
## Optimization algorithm chooses trim factor for winsorized mean approach...
## Algorithm finished. Optimized trim factor: 0.5
mod_comb
## $Method
## [1] "Ordinary Least Squares Regression"
##
## $Models
## [1] "arima" "gm" "nnet"
##
## $Fitted
## Time Series:
## Start = 1
## End = 80
## Frequency = 1
## [1] 87.60187 88.64077 82.35508 88.23902 85.05656 84.54363 87.37633
## [8] 81.42104 89.46373 90.54932 90.29842 93.85551 107.45621 106.57138
## [15] 120.44347 137.62928 145.50485 150.78402 153.57908 147.10121 146.58597
## [22] 145.95903 150.96563 135.22288 123.87977 133.78489 146.04048 151.84668
## [29] 150.38495 145.07980 142.60139 135.41781 129.65758 129.90293 131.63598
## [36] 126.80058 124.17198 127.20710 137.99064 139.47555 141.84413 142.92213
## [43] 157.53086 164.32235 172.10282 169.68603 170.98753 171.74831 170.78065
## [50] 175.30235 174.00844 167.87406 172.53751 174.43893 172.37890 163.05166
## [57] 162.18794 146.51256 130.83818 124.28445 113.61943 106.24429 98.72029
## [64] 103.03691 96.12530 101.23568 90.63426 83.82711 83.09189 85.84321
## [71] 90.66480 90.67887 87.42765 83.45698 89.12495 91.96486 92.93137
## [78] 91.24424 98.15692 107.54875
##
## $Accuracy_Train
## ME RMSE MAE MPE MAPE ACF1
## Test set -1.776303e-16 3.02758 2.402171 -0.0741498 2.056542 -0.0140366
## Theil's U
## Test set 0.6035688
##
## $Input_Data
## $Input_Data$Actual_Train
## Time Series:
## Start = 1
## End = 80
## Frequency = 1
## [1] 88 84 85 85 84 85 83 85 88 89 91 99 104 112 126 138 146 151 150
## [20] 148 147 149 143 132 131 139 147 150 148 145 140 134 131 131 129 126 126 132
## [39] 137 140 142 150 159 167 170 171 172 172 174 175 172 172 174 174 169 165 156
## [58] 142 131 121 112 104 102 99 99 95 88 84 84 87 89 88 85 86 89 91
## [77] 91 94 101 110
##
## $Input_Data$Forecasts_Train
## Time Series:
## Start = 1
## End = 80
## Frequency = 1
## arima gm nnet
## 1 87.91200 88.0000 87.95600
## 2 86.30366 127.4155 106.85958
## 3 81.15244 127.3598 83.14060
## 4 87.32344 127.3042 86.89844
## 5 83.88979 127.2485 86.22192
## 6 83.43081 127.1929 84.84035
## 7 86.38455 127.1373 86.89844
## 8 80.10926 127.0818 83.55223
## 9 88.62262 127.0263 87.60547
## 10 89.56923 126.9707 90.95200
## 11 89.34744 126.9153 90.25237
## 12 93.05023 126.8598 92.91612
## 13 106.88018 126.8044 107.52216
## 14 105.72064 126.7490 110.16199
## 15 120.04024 126.6936 122.12901
## 16 137.62819 126.6382 139.03600
## 17 145.66687 126.5829 147.08846
## 18 151.15265 126.5276 151.16116
## 19 154.04517 126.4723 153.49332
## 20 147.41174 126.4170 147.21139
## 21 147.03795 126.3618 144.62680
## 22 146.35771 126.3065 144.56693
## 23 151.53086 126.2513 148.83363
## 24 135.11412 126.1962 137.58690
## 25 123.62527 126.1410 124.83029
## 26 133.95742 126.0859 131.90867
## 27 146.43963 126.0308 144.79762
## 28 152.26300 125.9757 152.15441
## 29 150.77263 125.9207 150.66859
## 30 145.40585 125.8657 144.62680
## 31 142.93133 125.8107 141.36986
## 32 135.45720 125.7557 136.01915
## 33 129.54920 125.7007 130.56519
## 34 129.83295 125.6458 130.38460
## 35 131.56674 125.5909 132.65891
## 36 126.50583 125.5360 129.47603
## 37 123.88278 125.4812 126.00064
## 38 127.02816 125.4263 128.47001
## 39 138.15757 125.3715 137.80578
## 40 139.55717 125.3167 140.93649
## 41 142.08663 125.2620 141.84314
## 42 143.20565 125.2072 142.71139
## 43 158.29032 125.1525 155.24808
## 44 164.95937 125.0978 165.80278
## 45 172.98184 125.0432 172.64921
## 46 170.42355 124.9885 171.46607
## 47 171.90379 124.9339 170.73737
## 48 172.67058 124.8793 171.67232
## 49 171.67436 124.8247 170.82890
## 50 176.37709 124.7702 174.26492
## 51 174.95512 124.7157 174.36043
## 52 168.69337 124.6612 168.13930
## 53 173.60575 124.6067 170.82890
## 54 175.43917 124.5522 174.26492
## 55 173.30112 124.4978 172.67612
## 56 163.79201 124.4434 163.04148
## 57 163.09118 124.3890 159.70846
## 58 146.94167 124.3347 145.80884
## 59 130.86621 124.2803 130.90889
## 60 124.20251 124.2260 123.90240
## 61 113.20621 124.1718 114.57303
## 62 105.79951 124.1175 105.42121
## 63 98.13525 124.0633 97.56579
## 64 102.62906 124.0090 100.78698
## 65 95.36609 123.9549 96.62393
## 66 100.76468 123.9007 99.35709
## 67 89.70514 123.8465 91.84973
## 68 82.80494 123.7924 84.28110
## 69 82.08490 123.7383 83.14060
## 70 84.93000 123.6843 85.46564
## 71 89.87680 123.6302 90.05750
## 72 89.82193 123.5762 91.04678
## 73 86.49138 123.5222 87.92357
## 74 82.40418 123.4682 84.37283
## 75 88.37005 123.4143 87.69998
## 76 91.17750 123.3603 91.87126
## 77 92.16153 123.3064 92.91612
## 78 90.43594 123.2525 91.27047
## 79 97.60234 123.1987 96.82519
## 80 107.01698 123.1448 108.77635
##
##
## $Predict
## function (object, newpreds)
## {
## coef <- c(object$Intercept, object$Weights)
## pred <- as.vector(coef %*% t(cbind(1, newpreds)))
## return(pred)
## }
## <bytecode: 0x7f8cd87d7768>
## <environment: namespace:ForecastComb>
##
## $Intercept
## [1] -1.326066
##
## $Weights
## [1] 0.91320942 0.03167767 0.06660417
##
## attr(,"class")
## [1] "foreccomb_res"
The results show that combining via Ordinary Least Squares (OLS) regression yields the smallest RMSE, with weights of 0.9132094, 0.0316777, and 0.0666042 for the arima, gm, and nnet models respectively. Even so, we should validate the model against the test set before putting it to further use.
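Note that RMSE is only the default selection criterion; to pick the combination by MAE or MAPE instead, pass `criterion` explicitly (a sketch based on the documented signature, not run here):
# Select the combination method with the lowest MAE
mod_comb_mae <-
  auto_combine(fc_dat_miss_val_fix, criterion = "MAE")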
Model Validation
The Automatic Way
Model validation can be folded into the same data-preparation step that precedes `auto_combine`:
df_pred_by_mods <-
data.frame(
arima = as.numeric(forecast_arima$mean),
gm = as.numeric(mod_gm$forecasts),
nnet = as.numeric(forecast_nnet$mean)
)
fc_train_n_test <-
foreccomb(observed_vector = WWWusage_train,
prediction_matrix = df_fitted_by_mods_miss_val_fix,
newobs = WWWusage_test,
newpreds = as.matrix(df_pred_by_mods),
na.impute = FALSE)
mod_train_n_test_comb <-
auto_combine(fc_train_n_test)
## Optimization algorithm chooses number of retained models for trimmed eigenvector approach...
## Algorithm finished. Optimized number of retained models: 1
## Optimization algorithm chooses number of retained models for trimmed bias-corrected eigenvector approach...
## Algorithm finished. Optimized number of retained models: 1
## Optimization algorithm chooses trim factor for trimmed mean approach...
## Algorithm finished. Optimized trim factor: 0.34
## Optimization algorithm chooses trim factor for winsorized mean approach...
## Algorithm finished. Optimized trim factor: 0.5
fc_train_n_test
## $Actual_Train
## Time Series:
## Start = 1
## End = 80
## Frequency = 1
## [1] 88 84 85 85 84 85 83 85 88 89 91 99 104 112 126 138 146 151 150
## [20] 148 147 149 143 132 131 139 147 150 148 145 140 134 131 131 129 126 126 132
## [39] 137 140 142 150 159 167 170 171 172 172 174 175 172 172 174 174 169 165 156
## [58] 142 131 121 112 104 102 99 99 95 88 84 84 87 89 88 85 86 89 91
## [77] 91 94 101 110
##
## $Forecasts_Train
## Time Series:
## Start = 1
## End = 80
## Frequency = 1
## arima gm nnet
## 1 87.91200 88.0000 87.95600
## 2 86.30366 127.4155 106.85958
## 3 81.15244 127.3598 83.14060
## 4 87.32344 127.3042 86.89844
## 5 83.88979 127.2485 86.22192
## 6 83.43081 127.1929 84.84035
## 7 86.38455 127.1373 86.89844
## 8 80.10926 127.0818 83.55223
## 9 88.62262 127.0263 87.60547
## 10 89.56923 126.9707 90.95200
## 11 89.34744 126.9153 90.25237
## 12 93.05023 126.8598 92.91612
## 13 106.88018 126.8044 107.52216
## 14 105.72064 126.7490 110.16199
## 15 120.04024 126.6936 122.12901
## 16 137.62819 126.6382 139.03600
## 17 145.66687 126.5829 147.08846
## 18 151.15265 126.5276 151.16116
## 19 154.04517 126.4723 153.49332
## 20 147.41174 126.4170 147.21139
## 21 147.03795 126.3618 144.62680
## 22 146.35771 126.3065 144.56693
## 23 151.53086 126.2513 148.83363
## 24 135.11412 126.1962 137.58690
## 25 123.62527 126.1410 124.83029
## 26 133.95742 126.0859 131.90867
## 27 146.43963 126.0308 144.79762
## 28 152.26300 125.9757 152.15441
## 29 150.77263 125.9207 150.66859
## 30 145.40585 125.8657 144.62680
## 31 142.93133 125.8107 141.36986
## 32 135.45720 125.7557 136.01915
## 33 129.54920 125.7007 130.56519
## 34 129.83295 125.6458 130.38460
## 35 131.56674 125.5909 132.65891
## 36 126.50583 125.5360 129.47603
## 37 123.88278 125.4812 126.00064
## 38 127.02816 125.4263 128.47001
## 39 138.15757 125.3715 137.80578
## 40 139.55717 125.3167 140.93649
## 41 142.08663 125.2620 141.84314
## 42 143.20565 125.2072 142.71139
## 43 158.29032 125.1525 155.24808
## 44 164.95937 125.0978 165.80278
## 45 172.98184 125.0432 172.64921
## 46 170.42355 124.9885 171.46607
## 47 171.90379 124.9339 170.73737
## 48 172.67058 124.8793 171.67232
## 49 171.67436 124.8247 170.82890
## 50 176.37709 124.7702 174.26492
## 51 174.95512 124.7157 174.36043
## 52 168.69337 124.6612 168.13930
## 53 173.60575 124.6067 170.82890
## 54 175.43917 124.5522 174.26492
## 55 173.30112 124.4978 172.67612
## 56 163.79201 124.4434 163.04148
## 57 163.09118 124.3890 159.70846
## 58 146.94167 124.3347 145.80884
## 59 130.86621 124.2803 130.90889
## 60 124.20251 124.2260 123.90240
## 61 113.20621 124.1718 114.57303
## 62 105.79951 124.1175 105.42121
## 63 98.13525 124.0633 97.56579
## 64 102.62906 124.0090 100.78698
## 65 95.36609 123.9549 96.62393
## 66 100.76468 123.9007 99.35709
## 67 89.70514 123.8465 91.84973
## 68 82.80494 123.7924 84.28110
## 69 82.08490 123.7383 83.14060
## 70 84.93000 123.6843 85.46564
## 71 89.87680 123.6302 90.05750
## 72 89.82193 123.5762 91.04678
## 73 86.49138 123.5222 87.92357
## 74 82.40418 123.4682 84.37283
## 75 88.37005 123.4143 87.69998
## 76 91.17750 123.3603 91.87126
## 77 92.16153 123.3064 92.91612
## 78 90.43594 123.2525 91.27047
## 79 97.60234 123.1987 96.82519
## 80 107.01698 123.1448 108.77635
##
## $Actual_Test
## [1] 121 135 145 149 156 165 171 175 177 182 193 204 208 210 215 222 228 226 222
## [20] 220
##
## $Forecasts_Test
## arima gm nnet
## [1,] 117.0633 123.0910 121.0024
## [2,] 121.4699 123.0372 132.9642
## [3,] 124.2189 122.9835 142.8745
## [4,] 125.9340 122.9297 149.8028
## [5,] 127.0039 122.8760 154.0405
## [6,] 127.6714 122.8223 155.9769
## [7,] 128.0878 122.7686 155.8329
## [8,] 128.3476 122.7150 153.6791
## [9,] 128.5097 122.6614 149.6578
## [10,] 128.6108 122.6078 144.4252
## [11,] 128.6739 122.5542 139.2503
## [12,] 128.7132 122.5006 135.3176
## [13,] 128.7378 122.4471 133.1738
## [14,] 128.7531 122.3936 132.7999
## [15,] 128.7626 122.3401 133.8105
## [16,] 128.7686 122.2867 135.6047
## [17,] 128.7723 122.2332 137.5464
## [18,] 128.7746 122.1798 139.1455
## [19,] 128.7761 122.1264 140.1396
## [20,] 128.7770 122.0731 140.4769
##
## $nmodels
## [1] 3
##
## $modelnames
## [1] "arima" "gm" "nnet"
##
## attr(,"class")
## [1] "foreccomb"
The model's test-set predictions are then available via `mod_train_n_test_comb$Forecasts_Test`, and the accuracy metrics (including RMSE) via `mod_train_n_test_comb$Accuracy_Test`.
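When the call succeeds, those two accessors are all we need (a short sketch):
# 20 combined forecasts for the hold-out period
mod_train_n_test_comb$Forecasts_Test
# Test-set accuracy metrics, RMSE included
mod_train_n_test_comb$Accuracy_Test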
The Manual Way
However, in some cases (including ours) `auto_combine` runs into an error:
Error in newpred_matrix %*% weights :
requires numeric/complex matrix/vector arguments
Fortunately, the selected method here is ordinary least squares, i.e. linear regression, so it is easy to combine the forecasts by hand from the intercept and the slopes (the variables' weights):
library("purrr")
forecast_comb <-
mod_comb$Intercept +
reduce(map2(mod_comb$Weights, df_pred_by_mods, `*`),`+`)
forecast_comb
## [1] 117.5358 122.3549 125.5237 127.5496 128.8073 129.5441 129.9131 130.0052
## [9] 129.8836 129.6258 129.3370 129.1093 128.9873 128.9746 129.0490 129.1722
## [17] 129.3033 129.4102 129.4760 129.4976
rmse_comb <-
sqrt(mean((as.numeric(forecast_comb) - as.numeric(WWWusage_test))^2, na.rm = TRUE))
rmse_comb
## [1] 65.56671
The combined model's RMSE = 65.5667112, similar to the ARIMA model's performance in the previous post. Indeed, given the arima, gm, and nnet weights of 0.9132094, 0.0316777, and 0.0666042, it is easy to guess that the combined forecast in this case would land quite close to ARIMA's.
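We can make that comparison concrete by computing the same RMSE for each individual model's forecasts (a sketch reusing `df_pred_by_mods`):
# RMSE of each single model on the test set,
# computed the same way as rmse_comb above
sapply(df_pred_by_mods, function(p)
  sqrt(mean((p - as.numeric(WWWusage_test))^2)))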
As for which model to choose, that still depends on the actual application.
Epilogue
As of May 2021, nearly three years have passed since ForecastComb's last commit, so the occasional hiccup is inevitable, like the `auto_combine` error we mentioned. Moreover, not every scenario will end up selecting OLS as the combination method; but after reading the R scripts in ForecastComb's source code, implementing the forecast by hand is not hard either.
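For example, the OLS combination is nothing more than a linear regression of the actuals on the individual forecasts, so it can be reproduced with base R's `lm()` (a sketch; the coefficients should closely match `mod_comb$Intercept` and `mod_comb$Weights`):
# Refit the OLS combination by hand: regress the
# training actuals on the three models' fitted values
ols_comb <-
  lm(actual ~ arima + gm + nnet,
     data = cbind(actual = as.numeric(WWWusage_train),
                  df_fitted_by_mods_miss_val_fix))
coef(ols_comb)
# Combined forecasts for the 20 test records
predict(ols_comb, newdata = df_pred_by_mods)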
Feel free to reach me via email, Weibo, Twitter, or Zhihu. You are also welcome to follow my blog. And if my GitHub catches your interest, all the better!