Recently I attempted to answer a fairly common question: Given this historical data, when should we expect to reach the next milestone?
One of the companies I support has a product impacting a lot of people. Each day this product reaches roughly 100,000 new users. I was given the past two years of data and asked "when will we reach 300 million users?" My instinct was to answer the question using a spreadsheet. So I started with Excel:
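The spreadsheet itself is not reproduced here, but the kind of straight-line estimate it gives can be sketched in a few lines of R. This is only an illustration: `current_total` and `as_of` are placeholder values I made up (the post does not state them), and `days_to_milestone` is a hypothetical helper, not part of any library.

```r
# Straight-line estimate: how many days until the target at a constant daily rate?
days_to_milestone <- function(current_total, daily_growth, target) {
  ceiling((target - current_total) / daily_growth)
}

current_total <- 200e6             # ASSUMED starting total, not from the post
as_of <- as.Date("2021-01-01")     # ASSUMED "today", not from the post
daily_growth <- 100000             # roughly 100,000 new users per day
target <- 300e6                    # the 300 million milestone

n_days <- days_to_milestone(current_total, daily_growth, target)
milestone_date <- as_of + n_days   # Date arithmetic in R is in whole days

print(n_days)
print(milestone_date)
```

A constant daily rate is the simplest possible assumption; it ignores any trend or seasonality in the history, which is exactly what a forecasting model tries to capture.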
I could have stopped here. This guess was likely sufficient. However, I wanted to try answering the same question with a machine learning model, for a couple of reasons:
I spent some time looking around for an approach that would fit this type of data & question. I landed on Prophet, a tool built for forecasting time series data where seasonal effects (yearly, weekly, daily, holidays, etc.) are factored into the trends and predictions. Perfect.
Prophet models can be built in either Python or R. I opted to give R a shot (I have always wanted a reason to try R). So I followed the Prophet R API Documentation:
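The original setup code is not shown, so here is a minimal sketch of what fitting a model looks like, assuming the history is a daily cumulative user count. The data below is synthetic (random numbers around 100,000 new users per day) and stands in for the real two years of data; only the `ds` and `y` column names and the `prophet()` call come from Prophet's R API.

```r
# Synthetic stand-in for the real history (ASSUMPTION: daily cumulative totals)
set.seed(1)
dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 730)
daily_new <- round(rnorm(730, mean = 100000, sd = 5000))

# Prophet expects a data frame with a date column `ds` and a numeric column `y`
df <- data.frame(ds = dates, y = cumsum(daily_new))

# Fit the model only if the prophet package is installed
if (requireNamespace("prophet", quietly = TRUE)) {
  library(prophet)
  m <- prophet(df)
}
```

With real data, `df` would usually come from `read.csv()` or a database query, with the columns renamed to `ds` and `y`.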
From this point on I told Prophet what to forecast:
future <- make_future_dataframe(m, periods = 1461)
I decided to predict 1461 days, or 4 years, into the future. We should definitely have reached 300 million users by then.
After a few seconds, Prophet apparently ran over a thousand different potential outcomes and chose its best forecast (fascinating!). In Prophet's R API this is the `forecast <- predict(m, future)` step. Then I asked R for a plot of the forecast.
plot(m, forecast)
The dark blue line is Prophet's best guess at the total number of users impacted over time. The light blue band is the uncertainty interval around that guess (80% by default). The chart is somewhat difficult to read, but it appears to predict 300 million users near the end of 2024. I ran this command to see the raw forecast:
(forecast[c('ds', 'yhat', 'yhat_lower', 'yhat_upper')])
So, according to this forecast model, we reach 300 million users on December 24, 2024. Interesting! This tells me the model must have noticed a slight downward trend over the past two years, and I assume it is projecting that trend into the future. Prophet can show us the seasonal components it picked up in the data with this command:
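Rather than scanning the printed rows by eye, the milestone date can be pulled out of the forecast table directly. The sketch below uses a hypothetical helper, `first_crossing`, and a tiny toy table with made-up numbers in place of a real Prophet forecast; the column names `ds`, `yhat`, `yhat_lower`, and `yhat_upper` are the ones Prophet returns.

```r
# Return the first date on which the chosen forecast column reaches the target
first_crossing <- function(forecast, target, column = "yhat") {
  hit <- which(forecast[[column]] >= target)
  if (length(hit) == 0) {
    return(as.Date(NA))            # target never reached within the horizon
  }
  as.Date(forecast$ds[min(hit)])
}

# Toy forecast with MADE-UP values, shaped like Prophet's output
toy <- data.frame(
  ds         = as.Date("2030-01-01") + 0:4,
  yhat       = c(299.7e6, 299.9e6, 300.1e6, 300.2e6, 300.3e6),
  yhat_lower = c(299.0e6, 299.2e6, 299.4e6, 299.6e6, 300.5e6),
  yhat_upper = c(300.4e6, 300.6e6, 300.8e6, 301.0e6, 301.2e6)
)

best_guess <- first_crossing(toy, 300e6)                 # central estimate
earliest   <- first_crossing(toy, 300e6, "yhat_upper")   # optimistic bound
latest     <- first_crossing(toy, 300e6, "yhat_lower")   # pessimistic bound
```

Checking `yhat_upper` and `yhat_lower` as well gives a rough window around the single date, which is often more honest than quoting one day.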
prophet_plot_components(m, forecast)
The model is definitely picking up differences across the months and days of the week, in addition to the overall trend. At this point, a data scientist could fine-tune the prediction. However, for my purposes, this "out of the box" answer was sufficient.
So, back to the original question: