Hey guys, I need your help. I want to predict rice production in India using a simple regression. For this I have a dataset with the yield and production data for the last 40 years. As explanatory variables I have daily data on rainfall, temperature etc. Now to my problem. Obviously the number of observations of the y-variable (40) does not match the number of observations of the x-variables (about 15,000), so a regression is not feasible as the data stands. What is the best way to proceed? So far I see two options:
Average the weather data over the year and thus estimate the y-variable, i.e. a kind of undersampling of the x-variable. Of course, this means that important data such as outliers are lost.
Assign the annual production value to every weather entry in the corresponding year. This would give us the same y value 365 times. Doesn't sound reasonable to me either.
What other ideas do you guys have? If interested, I'll be happy to attach the datasets as well.
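A possible middle ground between the two options above is to aggregate the daily weather to one row per year, but with several summaries per variable (totals, maxima, counts of extreme days) instead of a single mean, so the outliers are not averaged away. Below is a minimal sketch of that idea. The data is synthetic, and the column names (`rain_mm`, `temp_c`), the thresholds (50 mm, 35 °C) and the production formula are all placeholders I made up for illustration, not anything from the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: 40 years of daily weather (assumption).
rng = np.random.default_rng(0)
days = pd.date_range("1980-01-01", "2019-12-31", freq="D")
doy = days.dayofyear.to_numpy()
weather = pd.DataFrame(
    {
        "rain_mm": rng.gamma(shape=0.5, scale=8.0, size=len(days)),
        "temp_c": 25 + 8 * np.sin(2 * np.pi * doy / 365.25)
        + rng.normal(0, 2, len(days)),
    },
    index=days,
)

# One row per year, several summaries per variable so extremes survive.
by_year = weather.groupby(weather.index.year)
features = pd.DataFrame(
    {
        "rain_total": by_year["rain_mm"].sum(),
        "rain_max_day": by_year["rain_mm"].max(),
        "heavy_rain_days": by_year["rain_mm"].apply(lambda s: (s > 50).sum()),
        "temp_mean": by_year["temp_c"].mean(),
        "hot_days": by_year["temp_c"].apply(lambda s: (s > 35).sum()),
    }
)

# Made-up annual production, only so the regression below has a target.
production = 80 + 0.02 * features["rain_total"] + rng.normal(0, 3, len(features))

# Ordinary least squares on the 40 yearly rows (intercept + 5 features).
X = np.column_stack([np.ones(len(features)), features.to_numpy(dtype=float)])
coef, *_ = np.linalg.lstsq(X, production.to_numpy(), rcond=None)
```

With only 40 annual observations it is probably wise to keep the number of yearly features small, otherwise the regression will overfit.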
I've been trying to use FBProphet to train a model with historical hourly data and forecast at an hourly frequency for the next 5 days. I see that the trend is captured somewhat, but the hourly data variation is not at all captured. Could anyone please advise what settings may help capture this variation?
Training and test data with the corresponding predicted values are as seen.
How can I train an LSTM model on multiple time series where each series has a different number of timesteps/lags?
Use case: I have daily transactions (withdrawals) for 100 ATMs over the last 5 years. I need to forecast the next withdrawals for each ATM based on its previous timesteps.
How do I use an embedding (a latent representation for each ATM ID) if I pass the ATM ID as a feature and train on all the data at once?
How do I pass all the time series to the LSTM at once, and how will the LSTM distinguish one time series from another?
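One common way to frame this is to cut each ATM's series into fixed-length windows separately, so that no window mixes values from two ATMs, and to carry an integer ATM index alongside each window. The window array would go to the LSTM, and the integer index is the kind of input an embedding layer typically consumes. The sketch below only shows the data preparation with NumPy; the function name `make_windows`, the ATM names and the random data are hypothetical, and the model itself is not shown.

```python
import numpy as np

def make_windows(series_by_atm, n_lags):
    """Build (window, atm_index, target) samples without crossing ATM boundaries."""
    windows, atm_index, targets = [], [], []
    # Sort the IDs so each ATM always maps to the same integer index.
    for idx, (atm_id, values) in enumerate(sorted(series_by_atm.items())):
        values = np.asarray(values, dtype=float)
        for t in range(n_lags, len(values)):
            windows.append(values[t - n_lags:t])  # the previous n_lags days
            atm_index.append(idx)                 # which series the window came from
            targets.append(values[t])             # the next day's withdrawal
    X = np.array(windows)[..., np.newaxis]        # shape: (samples, n_lags, 1)
    return X, np.array(atm_index), np.array(targets)

# Toy data: three ATMs with series of different lengths (made up).
rng = np.random.default_rng(0)
series_by_atm = {
    "ATM_A": rng.integers(100, 500, size=30),
    "ATM_B": rng.integers(100, 500, size=45),
    "ATM_C": rng.integers(100, 500, size=20),
}

X, atm_idx, y = make_windows(series_by_atm, n_lags=7)
print(X.shape, atm_idx.shape, y.shape)
```

Series of different lengths simply contribute different numbers of windows. The window length itself is shared here, which is an assumption of this sketch.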
I have a question about how to calculate features for future time frames. Consider the dataset below and assume today is 2019-11-11. I have the last 2 years of daily data; these are the last 6 rows:
Date, Temperature, Sales
2019-11-06, 25.5, 500000
2019-11-07, 24.2, 550000
2019-11-08, 25.1, 560000
2019-11-09, 22.6, 510000
2019-11-10, 22.3, 520000
2019-11-11, 24.4, 535000
Now I have to predict Sales for 2019-11-12, 2019-11-13 and 2019-11-14. In order to predict sales for those dates, I have to provide the following test data to the trained machine learning model:
Date, Temperature
2019-11-12, temperatureX
2019-11-13, temperatureY
2019-11-14, temperatureZ
What should the values of temperatureX, temperatureY and temperatureZ be, since these values also lie in the future?
There are different solutions.
I suggest you start with "Time-series Forecasting using Azure AutoML" or, to dig deeper, "Auto-train a time-series forecast model".
If you need an interpretable model, you could train a linear model (LM) or another regression model in R or Python. It might make sense to derive some features from the date, such as month, day of month or season. This approach has the benefit that you can calculate the confidence interval.
As it is a multivariate time series, also have a look at Vector Auto Regression; see "A Multivariate Time Series Guide to Forecasting and Modeling (with Python codes)".
If you are just interested in predicting values, you can try a recurrent neural network or LSTM. See GitHub Azure/DeepLearningForTimeSeriesForecasting
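To illustrate the linear-model suggestion with date-derived features, here is a minimal sketch using pandas for the calendar features and NumPy for the least-squares fit. The two years of data are synthetic, the helper `design_matrix` is a hypothetical name, and the choice of features (day-of-week dummies plus a sine/cosine encoding of the month) is just one reasonable option, not the only one.

```python
import numpy as np
import pandas as pd

# Synthetic two years of daily data standing in for the real series (assumption).
rng = np.random.default_rng(1)
dates = pd.date_range("2017-11-12", "2019-11-11", freq="D")
temperature = 24 + rng.normal(0, 1.5, len(dates))
is_weekend = (dates.dayofweek.to_numpy() >= 5).astype(float)
sales = (500_000 + 4_000 * temperature + 20_000 * is_weekend
         + rng.normal(0, 10_000, len(dates)))

def design_matrix(dates, temperature):
    """Intercept, temperature, day-of-week dummies and a cyclic month encoding."""
    dow = np.eye(7)[dates.dayofweek.to_numpy()][:, 1:]  # Monday is the baseline
    month_angle = 2 * np.pi * (dates.month.to_numpy() - 1) / 12
    return np.column_stack([
        np.ones(len(dates)),
        temperature,
        dow,
        np.sin(month_angle),
        np.cos(month_angle),
    ])

X = design_matrix(dates, temperature)
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)

# In-sample fit quality: standard deviation of the residuals.
residual_std = np.std(sales - X @ coef)
print("temperature coefficient:", round(coef[1]))
print("residual std:", round(residual_std))
```

Because the one-hot encoding is built by hand, the same `design_matrix` call produces identical columns for future dates, which is what you need at prediction time.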
Easy answer? You can't predict if you don't have the independent variables that explain your target at prediction time.
That being said, you can usually get a weather forecast for at least a week ahead of any date through a simple web search. So if you don't require a very large max horizon, you can use predicted weather forecasts for your temperature values (x, y, and z). Your retraining period would then become weekly, or however far out you're able to find existing weather forecasts.
Ref: https://datascience.stackexchange.com/questions/27171/what-to-give-as-predictors-to-predict-future-values
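As a sketch of what that looks like in practice: fit the model on historical temperature and sales, then feed it forecasted temperatures for the future dates. The six historical rows below are the ones from the question. The three forecast temperatures are made-up placeholders (in reality they would come from a weather forecast), and a straight-line fit on six points is only there to show the mechanics, not a model to rely on.

```python
import numpy as np

# The last six rows from the question.
temperature = np.array([25.5, 24.2, 25.1, 22.6, 22.3, 24.4])
sales = np.array([500000, 550000, 560000, 510000, 520000, 535000], dtype=float)

# Simple linear fit: sales ~ slope * temperature + intercept.
slope, intercept = np.polyfit(temperature, sales, deg=1)

# Hypothetical forecast temperatures for the three future dates (placeholders).
forecast_temperature = {
    "2019-11-12": 23.8,
    "2019-11-13": 24.9,
    "2019-11-14": 25.3,
}

# Plug the forecasted temperatures into the fitted model.
predicted_sales = {
    date: slope * temp + intercept
    for date, temp in forecast_temperature.items()
}
for date, value in predicted_sales.items():
    print(date, round(value))
```

Note that any error in the weather forecast carries over into the sales prediction, so the usable horizon is limited by how far ahead the temperature forecast is trustworthy.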
I have an AR time series model, built with statsmodels, that models monthly sales going back 10 years. Right now, my model predicts values for existing months (the independent variable), but I'm having difficulty finding a way to build on my model to forecast future sales for the next 2-3 years (2019-2022).
To provide context, here is the code I used to compare my predicted values against existing time periods vs. the actual values under the same time period.
import pandas as pd

# make predictions for the test period
predictions = model_fitted.predict(
    start=len(train_data),
    end=len(train_data) + len(test_data) - 1,
    dynamic=False)

# create a comparison dataframe
compare_df = pd.concat(
    [df['stationary'].tail(12), predictions],
    axis=1).rename(columns={'stationary': 'actual', 0: 'predicted'})

# plot the two values
compare_df.plot()
Below is the output of the code above.
Any tips would be much appreciated!