This post is part of a series on extending the Prophet forecasting model so that it can do other things. You should read part one on the default Prophet forecasting model first.
For me, one of the most useful features of Prophet is how easily you can add regression features to the model. This enables you to provide information to the machine learning model about things like the dates when email campaigns were sent or when your PPC budget increased; and you can use your marketing calendar to add information about when these things will happen in the future too.
Being able to do this in a spreadsheet is one of the main things that makes Forecast Forge great.
This works great when you can know the future values of your regressors. But what if you don’t? Can you make a forecast for a metric and then use the output of this forecast as a regressor for another forecast? Kind of…
One can also use as a regressor another time series that has been forecasted with a time series model, such as Prophet. For instance, if r(t) is included as a regressor for y(t), Prophet can be used to forecast r(t) and then that forecast can be plugged in as the future values when forecasting y(t). A note of caution around this approach: This will probably not be useful unless r(t) is somehow easier to forecast then y(t). This is because error in the forecast of r(t) will produce error in the forecast of y(t) – The prophet documentation
You can use the point estimate from one forecast as the input to another but there is no way to propagate the uncertainty. This is what I will show you how to do in this post.
Recall the default Stan model from last time:
y ~ normal(
trend
.* (1 + X * (beta .* s_m))
+ X * (beta .* s_a),
sigma_obs
);
X
is a matrix of regressor columns; we need to somehow extend this to include
the output and uncertainties from another forecast.
To make things clearer, let’s have y_1
as the metric for the first forecast
that doesn’t depend on the results of any other forecast and then y_2
as the
metric which depends on y_1
as a regressor.
The model for y_1
is just the same as before:
y_1 ~ normal(
trend_1
.* (1 + X_1 * (beta_1 .* s_m_1))
+ X_1 * (beta_1 .* s_a_1),
sigma_obs_1
);
Notice how we also have specific X
, beta
, sigma_obs
etc. for this
timeseries.
Thanks to the magic of Stan we don’t need to worry about uncertainty or anything
like that (Stan will figure that out for us); all that needs to happen is to add
y_1
as an extra column in X_2
.
y_2 ~ normal(
trend_2
.* (1 + append_col(X_2,y_1) * (beta_2 .* s_m_2))
+ append_col(X_2,y_1) * (beta_2 .* s_a_2),
sigma_obs_2
);
This means that the variables beta_2
, s_m_2
and s_a_2
all need to be one
row larger than they are by default. The extra row in beta_2
is the regression
coefficient for y_1
on y_2
and the extra row in s_m_2
and s_a_2
tells
the model whether this is an additive or multiplicative regression.
The model is the key part; the rest (apart from one part that we’ll get to
later) is book keeping. The only tricky part is knowing which parameters need
extra rows because there is an extra y_1
in a matrix multiplication and which
do not.
Here is the model:
functions {
matrix get_changepoint_matrix(vector t, vector t_change, int T, int S) {
// Assumes t and t_change are sorted.
matrix[T, S] A;
row_vector[S] a_row;
int cp_idx;
// Start with an empty matrix.
A = rep_matrix(0, T, S);
a_row = rep_row_vector(0, S);
cp_idx = 1;
// Fill in each row of A.
for (i in 1:T) {
while ((cp_idx <= S) && (t[i] >= t_change[cp_idx])) {
a_row[cp_idx] = 1;
cp_idx = cp_idx + 1;
}
A[i] = a_row;
}
return A;
}
// Logistic trend functions
vector logistic_gamma(real k, real m, vector delta, vector t_change, int S) {
vector[S] gamma; // adjusted offsets, for piecewise continuity
vector[S + 1] k_s; // actual rate in each segment
real m_pr;
// Compute the rate in each segment
k_s = append_row(k, k + cumulative_sum(delta));
// Piecewise offsets
m_pr = m; // The offset in the previous segment
for (i in 1:S) {
gamma[i] = (t_change[i] - m_pr) * (1 - k_s[i] / k_s[i + 1]);
m_pr = m_pr + gamma[i]; // update for the next segment
}
return gamma;
}
vector logistic_trend(
real k,
real m,
vector delta,
vector t,
vector cap,
matrix A,
vector t_change,
int S
) {
vector[S] gamma;
gamma = logistic_gamma(k, m, delta, t_change, S);
return cap .* inv_logit((k + A * delta) .* (t - (m + A * gamma)));
}
// Linear trend function
vector linear_trend(
real k,
real m,
vector delta,
vector t,
matrix A,
vector t_change
) {
return (k + A * delta) .* t + (m + A * (-t_change .* delta));
}
// Flat trend function
vector flat_trend(
real m,
int T
) {
return rep_vector(m, T);
}
}
data {
int T; // Number of time periods
int<lower=1> K_1; // Number of regressors
int<lower=1> K_2;
vector[T] t; // Time
vector[T] cap_1; // Capacities for logistic trend
vector[T] cap_2;
vector[T] y_1; // Time series
vector[T] y_2;
int S_1; // Number of changepoints
int S_2;
vector[S_1] t_change_1; // Times of trend changepoints
vector[S_2] t_change_2;
matrix[T,K_1] X_1; // Regressors
matrix[T,K_2] X_2;
vector[K_1] sigmas_1; // Scale on seasonality prior
vector[K_2+1] sigmas_2;
real<lower=0> tau_1; // Scale on changepoints prior
real<lower=0> tau_2;
int trend_indicator_1; // 0 for linear, 1 for logistic, 2 for flat
int trend_indicator_2;
vector[K_1] s_a_1; // Indicator of additive features
vector[K_2+1] s_a_2;
vector[K_1] s_m_1; // Indicator of multiplicative features
vector[K_2+1] s_m_2;
}
transformed data {
matrix[T, S_1] A_1;
matrix[T, S_2] A_2;
A_1 = get_changepoint_matrix(t, t_change_1, T, S_1);
A_2 = get_changepoint_matrix(t, t_change_2, T, S_2);
}
parameters {
real k_1; // Base trend growth rate
real k_2;
real m_1; // Trend offset
real m_2;
vector[S_1] delta_1; // Trend rate adjustments
vector[S_2] delta_2;
real<lower=0> sigma_obs_1;// Observation noise
real<lower=0> sigma_obs_2;
vector[K_1] beta_1; // Regressor coefficients
vector[K_2+1] beta_2;
}
transformed parameters {
vector[T] trend_1;
vector[T] trend_2;
if (trend_indicator_1 == 0) {
trend_1 = linear_trend(k_1, m_1, delta_1, t, A_1, t_change_1);
} else if (trend_indicator_1 == 1) {
trend_1 = logistic_trend(k_1, m_1, delta_1, t, cap_1, A_1, t_change_1, S_1);
} else if (trend_indicator_1 == 2) {
trend_1 = flat_trend(m_1, T);
}
// same for trend_2
if (trend_indicator_2 == 0) {
trend_2 = linear_trend(k_2, m_2, delta_2, t, A_2, t_change_2);
} else if (trend_indicator_2 == 1) {
trend_2 = logistic_trend(k_2, m_2, delta_2, t, cap_2, A_2, t_change_2, S_2);
} else if (trend_indicator_1 == 2) {
trend_2 = flat_trend(m_2, T);
}
}
model {
//priors
k_1 ~ normal(0, 5);
m_1 ~ normal(0, 5);
delta_1 ~ double_exponential(0, tau_1);
sigma_obs_1 ~ normal(0, 0.5);
beta_1 ~ normal(0, sigmas_1);
k_2 ~ normal(0, 5);
m_2 ~ normal(0, 5);
delta_2 ~ double_exponential(0, tau_2);
sigma_obs_2 ~ normal(0, 0.5);
beta_2 ~ normal(0, sigmas_2);
// Likelihood
y_1 ~ normal(
trend_1
.* (1 + X_1 * (beta_1 .* s_m_1))
+ X_1 * (beta_1 .* s_a_1),
sigma_obs_1
);
y_2 ~ normal(
trend_2
.* (1 + append_col(X_2,y_1) * (beta_2 .* s_m_2))
+ append_col(X_2,y_1) * (beta_2 .* s_a_2),
sigma_obs_2
);
}
This model will compile and fit returning the parameter estimates for both the
y_1
and the y_2
timeseries. But it won’t actually make the forecast for
either of them. Instead Prophet does this bit in Python or R rather than in
Stan.
We’ve gone completely off piste with the Stan model here so the python/R code that Prophet wraps around Stan won’t work.
You could take the approach Prophet
uses
and do the sampling in Python (or
R)
but I think it is simpler to keep everything in Stan by using a generated quantities
block. Prophet don’t take this approach by default because of a bug
in Rstan which causes it to take a very long time to load predictions for large
models back into R. I don’t fit particularly large models and I know Stan a lot
better than I know Python so the generated quantities
approach is better for
me.
This means extra variables must be provided to Stan:
X_pred
containing the future values of regressorscap_pred
for the future capacities (you can miss this out if you aren’t
using a logistic trend in Stan)S_pred
is an upper bound on the number of future changepoints. If you have
spaced changepoints evenly though the training data then you can do the same
here.T_pred
to tell the model how many predictions you want to make.n_samp
is the number of samples to generate.Generating the output breaks down into two parts:
y_hat
I’m not sure why the original Prophet code doesn’t just do the sampling and then
calculate y_hat
from there. But I’ll follow the same approach here.
As with most things in Stan there is a lot of declaration before you get to the
meat of the generated quantities
block:
generated quantities {
vector[T_pred] y_hat_2;
vector[T_pred] trend_hat_2;
matrix[T_pred, S_2] A_pred_2;
matrix[T_pred, n_samp] trend_samples_2;
matrix[T_pred, n_samp] y_pred_2;
vector[S_1 + S_pred_2] t_change_sim_2;
vector[S_1 + S_pred_2] delta_sim_2;
real lambda_2;
matrix[T_pred, S_2 + S_pred_2] A_sim_2;
vector[T_pred] y_hat_1;
vector[T_pred] trend_hat_1;
matrix[T_pred, S_1] A_pred_1;
matrix[T_pred, n_samp] trend_samples_1;
matrix[T_pred, n_samp] y_pred_1;
vector[S_1 + S_pred_1] t_change_sim_1;
vector[S_1 + S_pred_1] delta_sim_1;
real lambda_1;
matrix[T_pred, S_1 + S_pred_1] A_sim_1;
if(T_pred > 0) { // need to make predictions
A_pred_1 = get_changepoint_matrix(t_pred, t_change_1, T_pred, S_1);
trend_hat_1 = get_trend(
k_1, offset_1, delta_1, t_pred, cap_pred_1, A_pred_1,
t_change_1, S_1, trend_indicator_1, T_pred
);
// calculate the y_hat
y_hat_1 = trend_hat_1 .* (1 + X_pred_1 * (beta_1 .* s_m_1))
+ X_pred_1 * (beta_1 .* s_a_1);
// bookeeping for the simulation
for (i in 1:S_1) {
t_change_sim_1[i] = t_change_1[i];
delta_sim_1[i] = delta_1[i];
}
lambda_1 = mean(fabs(delta_1)) + 1e-8;
A_pred_2 = get_changepoint_matrix(t_pred, t_change_2, T_pred, S_2);
trend_hat_2 = get_trend(
k_2, offset_2, delta_2, t_pred, cap_pred_2, A_pred_2,
t_change_2, S_2, trend_indicator_2, T_pred
);
y_hat_2 = trend_hat_2 .* (1 + append_col(X_pred_2,y_hat_1) * (beta_2 .* s_m_2))
+ append_col(X_pred_2,y_hat_1) * (beta_2 .* s_a_2);
for (i in 1:S_2) {
t_change_sim_2[i] = t_change_2[i];
delta_sim_2[i] = delta_2[i];
}
lambda_2 = mean(fabs(delta_2)) + 1e-8;
// Now do the sampling
for (i in 1:n_samp) {
if (S_pred_1 > 0) {
//Sample new changepoints from a Poisson process with rate S
//Sample changepoint deltas from Laplace(lambda)
t_change_sim_1[S_1 + 1] = 1 + exponential_rng(S_1);
for (j in (S_1 + 2):(S_1 + S_pred_1)) {
t_change_sim_1[j] = t_change_sim_1[j - 1] + exponential_rng(S_1);
}
for (j in (S_1 + 1): (S_1 + S_pred_1)) {
delta_sim_1[j] = double_exponential_rng(0, lambda_1);
}
}
// Compute trend with these changepoints
A_sim_1 = get_changepoint_matrix(t_pred, t_change_sim_1, T_pred, S_1 + S_pred_1);
trend_samples_1[:, i] = get_trend(
k_1, offset_1, delta_sim_1, t_pred, cap_pred_1, A_sim_1,
t_change_sim_1, S_1 + S_pred_1,
trend_indicator_1, T_pred
);
y_pred_1[:,i] = trend_samples_1[:, i] .* (1 + X_pred_1 * (beta_1 .* s_m_1))
+ X_pred_1 * (beta_1 .* s_a_1);
if (S_pred_2 > 0) {
//Sample new changepoints from a Poisson process with rate S
//Sample changepoint deltas from Laplace(lambda)
t_change_sim_2[S_2 + 1] = 1 + exponential_rng(S_2);
for (j in (S_2 + 2):(S_2 + S_pred_2)) {
t_change_sim_2[j] = t_change_sim_2[j - 1] + exponential_rng(S_2);
}
for (j in (S_2 + 1): (S_2 + S_pred_2)) {
delta_sim_2[j] = double_exponential_rng(0, lambda_2);
}
}
// Compute trend with these changepoints
A_sim_2 = get_changepoint_matrix(t_pred, t_change_sim_2, T_pred, S_2 + S_pred_2);
trend_samples_2[:, i] = get_trend(
k_2, offset_2, delta_sim_2, t_pred, cap_pred_2, A_sim_2,
t_change_sim_2, S_2 + S_pred_2,
trend_indicator_2, T_pred
);
y_pred_2[:,i] = trend_samples_2[:, i] .* (1 + append_col(X_pred_2,y_pred_1[:,i]) * (beta_2 .* s_m_2))
+ append_col(X_pred_2,y_pred_1[:,i]) * (beta_2 .* s_a_2);
}
}
}
This block will generate estimates for y_hat_1
and y_hat_2
and return
samples as y_pred_1
and y_pred_2
.
You can download the full Stan file here.
This method allows you to use the forecast for one timeseries and a regressor
for the forecast of another. Next time I will show you how to model functional
constraints; for example y_3 = y_1 + y_2
.
If you've read this far and understood everything then you might not be in the target audience for the Forecast Forge Google Sheets Addon.
However, you probably have collegues who are! I would like to talk to them so please consider hooking me up with an introduction (fergie@forecastforge.com).
And feel free to email me with any other questions/comments you may have; I'd love to hear from you.