I need some help with my school assignment using Excel.
How can I prepare a dataset in 10 steps or fewer so that time series forecasting of each employee's overall performance over the next 4 quarters can be carried out?
I tried and I can't think of other ways.
Related
Hey guys, I need your help. I want to predict rice production in India using a simple regression. For this I have a dataset with the yield and production data for the last 40 years. As explanatory variables I have daily data on rainfall, temperature etc. Now to my problem: the number of observations of the y-variable (40) obviously does not match the number of observations of the x-variables (about 15,000), so a regression is not feasible as the data stands. What is the best way to proceed? I see two options so far:
Average the weather data over the year and thus estimate the y-variable, i.e. a kind of undersampling of the x-variable. Of course, this means that important data such as outliers are lost.
Add the annual production values for each weather entry in the associated year. This would give us the same y value 365 times. Doesn't sound reasonable to me either.
What other ideas do you guys have? If interested, I'll be happy to attach the datasets as well.
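One variation on the first option is to keep several summaries per year instead of a single average, so that extremes are not lost entirely. The sketch below is only an illustration in plain Python: the tuple layout, the feature names and the `heavy_rain_mm` threshold are assumptions, and the data is synthetic.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def yearly_weather_features(daily, heavy_rain_mm=50.0):
    """Collapse daily records into one feature row per year.

    daily: iterable of (date, rainfall_mm, temperature_c) tuples.
    heavy_rain_mm: hypothetical threshold used to count extreme days.
    """
    by_year = defaultdict(list)
    for day, rain, temp in daily:
        by_year[day.year].append((rain, temp))

    features = {}
    for year, rows in sorted(by_year.items()):
        rains = [r for r, _ in rows]
        temps = [t for _, t in rows]
        features[year] = {
            "rain_total": sum(rains),
            "rain_max": max(rains),  # keeps the wettest day
            "heavy_rain_days": sum(1 for r in rains if r > heavy_rain_mm),
            "temp_mean": mean(temps),
            "temp_max": max(temps),  # keeps the hottest day
        }
    return features


# Synthetic placeholder data (not real weather): 2020 and 2021, daily.
start = date(2020, 1, 1)
daily = []
for i in range(731):  # 366 days in 2020 plus 365 days in 2021
    day = start + timedelta(days=i)
    rain = 60.0 if i % 100 == 0 else float(i % 7)
    temp = 20.0 + (i % 10)
    daily.append((day, rain, temp))

features = yearly_weather_features(daily)
```

Each year then contributes exactly one row of x-values, which lines up with the one production value per year.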
I'm a beginner and trying to use Azure Machine Learning Studio to run a forecasting task automatically, but I'm very confused about how to determine the parameter - Forecast Horizon.
I have found some explanations on the official website.
forecast horizon: Indicate how many time units (minutes/hours/days/weeks/months/years) will the model be able to predict to the future. The further the model is required to predict into the future, the less accurate it becomes.
My data increments each hour and I want to predict the values of the next 24 hours one time (namely, multistep-ahead forecasting). Should I set Forecast Horizon to be 24?
In addition, I ran the forecasting experiment twice with the same settings except for Forecast Horizon: once with 1 and once with 24. I expected the predictions to be more accurate with Forecast Horizon = 1 (can I think of it as one-step forecasting?), but they are worse, so I suspect my understanding of Forecast Horizon is wrong.
Here are images of my prediction results with different forecast horizons:
The test set includes all data of a year (24*365 points).
For forecasting tasks, the time_column_name and forecast_horizon parameters are required to configure your experiment.
Forecast Horizon is the number of periods forward you want to forecast. The horizon is measured in units of the time series frequency, so the forecaster predicts that many units ahead at the granularity of your training data, such as monthly or weekly.
In your case you can set your Frequency to hourly forecast and Forecast horizon to 24.
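To make the meaning of the horizon concrete, here is a minimal sketch in plain Python. It is not what Azure AutoML does internally; it uses a seasonal-naive placeholder model (repeat the last 24 hours) purely to show that a horizon of 24 on hourly data yields 24 predicted points from one forecast origin. The function and variable names are assumptions.

```python
def seasonal_naive_forecast(history, horizon=24, season_length=24):
    """Predict the next `horizon` points by repeating the last season."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


# Synthetic hourly series covering three days (values are placeholders).
history = [float(hour % 24) for hour in range(72)]

next_day = seasonal_naive_forecast(history, horizon=24)   # 24 steps ahead
next_hour = seasonal_naive_forecast(history, horizon=1)   # one step ahead
```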
REFERENCES:
Forecast demand with no-code automated machine learning
I'm trying to build an Excel sheet that calculates synthetic option prices and Greeks for time series data to model intraday options pricing. The input is simply intraday price data, anywhere from tick level to 5-minute intervals. I found this https://www.thebiccountant.com/2021/12/28/black-scholes-option-pricing-with-power-query-in-power-bi/ which covers Power BI and Black-Scholes, but possibly not very accurately. I prefer the binomial method. I have used this excellent tutorial to build a manual version for a large number of strikes, but it takes a long time to calculate, is very complex, and is also inaccurate because I can't calculate many steps before hitting Excel's limits: https://www.macroption.com/binomial-option-pricing-excel/
Does anyone know whether it is possible to create an entire column in Power Query that calculates binomially derived option prices using more than 100, or even up to 1000, steps? The reason is that intraday pricing with high-resolution data (5-minute, 1-minute, seconds and tick) needs, I think, a large number of steps to converge properly. The goal is just a good-enough model that can be used for visualising the progress of a trade on a given day.
Any pointers on how this could be done and calculated using M Language would be much appreciated and useful!
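For reference, the backward induction behind the binomial method only needs one list of node values that is overwritten at each step, so 1000 steps is not a problem in itself. The sketch below is in Python rather than M and assumes the Cox-Ross-Rubinstein parameterisation with no dividends; the function name, arguments and example inputs are illustrative, not taken from either linked tutorial.

```python
import math


def crr_price(spot, strike, rate, sigma, years, steps=1000,
              is_call=True, american=False):
    """Cox-Ross-Rubinstein binomial price (no dividends assumed)."""
    dt = years / steps
    up = math.exp(sigma * math.sqrt(dt))
    down = 1.0 / up
    disc = math.exp(-rate * dt)
    p = (math.exp(rate * dt) - down) / (up - down)  # risk-neutral up probability
    sign = 1.0 if is_call else -1.0

    # Payoffs at expiry: node j has j up-moves and (steps - j) down-moves.
    values = [
        max(sign * (spot * up ** j * down ** (steps - j) - strike), 0.0)
        for j in range(steps + 1)
    ]

    # Roll back through the tree, reusing the same list in place.
    for step in range(steps - 1, -1, -1):
        for j in range(step + 1):
            value = disc * (p * values[j + 1] + (1.0 - p) * values[j])
            if american:
                exercise = sign * (spot * up ** j * down ** (step - j) - strike)
                value = max(value, exercise)
            values[j] = value
    return values[0]


# Arbitrary illustration inputs: at-the-money call, one year to expiry.
price = crr_price(100.0, 100.0, 0.05, 0.2, 1.0, steps=200)
```

The work grows with the square of the number of steps, so pricing every row of a tick-level series at 1000 steps may still be slow; memory is only one list of `steps + 1` values per option.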
I have a question about how to calculate features for future time frames. Consider the below dataset and consider today is: 2019-11-11. I have last 2 years of daily data and below is last 6 rows:
Date, Temperature, Sales
2019-11-06, 25.5, 500000
2019-11-07, 24.2, 550000
2019-11-08, 25.1, 560000
2019-11-09, 22.6, 510000
2019-11-10, 22.3, 520000
2019-11-11, 24.4, 535000
Now I have to predict Sales for 2019-11-12, 2019-11-13, 2019-11-14. In order to predict sales for those dates, I have to provide below test data to the machine learning trained model:
Date, Temperature
2019-11-12, temperatureX
2019-11-13, temperatureY
2019-11-14, temperatureZ
What will be values for temperatureX, temperatureY and temperatureZ since these values will be coming from future as well?
There are different solutions.
I suggest you start with Time-series Forecasting using Azure AutoML or, to dig deeper, Auto-train a time-series forecast model
If you need an interpretable model you could train a Linear Model (LM) or another regression model in R or Python. It might make sense to derive some features from the date, such as month, day of month or season. This approach has the benefit that you can calculate the confidence interval.
As it is a multivariate time series, also have a look at Vector Auto Regression, see A Multivariate Time Series Guide to Forecasting and Modeling (with Python codes)
If you are just interested in predicting values, you can try a recurrent neural network or LSTM. See GitHub Azure/DeepLearningForTimeSeriesForecasting
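Date-derived features like the ones mentioned above have the advantage that they are known for any future date. A minimal sketch in plain Python, where the feature names and the northern-hemisphere meteorological season mapping are assumptions:

```python
from datetime import date

SEASONS = ("winter", "spring", "summer", "autumn")


def date_features(day):
    """Features that can be computed for any date, past or future."""
    return {
        "month": day.month,
        "day_of_month": day.day,
        "weekday": day.weekday(),  # Monday is 0
        # Dec-Feb winter, Mar-May spring, Jun-Aug summer, Sep-Nov autumn
        "season": SEASONS[(day.month % 12) // 3],
    }


# The three dates to be predicted in the question.
future_days = [date(2019, 11, 12), date(2019, 11, 13), date(2019, 11, 14)]
future_rows = [date_features(d) for d in future_days]
```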
Easy answer? You can't predict if you don't have the independent variables that explain your target at prediction time.
That being said, you can usually get a weather forecast for at least a week ahead of any date through a simple web search. So if you don't require a very large max horizon, you can use predicted weather forecasts for your temperature values (x, y, and z). Your retraining period would then become weekly, or however far out you're able to find existing weather forecasts.
Ref: https://datascience.stackexchange.com/questions/27171/what-to-give-as-predictors-to-predict-future-values
I'm wondering if there is a way to automatically select the amount of past data when calculating features.
For example, I might want to predict when a customer is going to make their next purchase, so it would be good to know a count of purchases or average purchase price by different date cutoffs. e.g. Purchases in the last 12 months, last 3 months, 7 days etc.
What is the best way to approach this with featuretools?
You can create a feature matrix that uses only a certain amount of historical data using the training_window parameter in featuretools.dfs. When training_window is set, Featuretools will use the historical data between cutoff_time - training_window and the cutoff time. Here's the example from the documentation:
window_fm, window_features = ft.dfs(entityset=es,
                                    target_entity="customers",
                                    cutoff_time=cutoff_times,
                                    cutoff_time_in_index=True,
                                    training_window="1 hour")
When determining which data is valid for use, the training window will check if the time in the time_index column is within the training window.
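To show what such a window means for the purchase example in the question, here is a sketch in plain Python without Featuretools. It counts purchases and averages their amounts over several look-back windows before one cutoff. The data, the names and the boundary handling (start excluded, cutoff included) are assumptions, so check the Featuretools documentation for the exact rule it applies.

```python
from datetime import datetime, timedelta


def window_features(purchases, cutoff, windows):
    """Count and average purchase amounts per look-back window.

    purchases: list of (timestamp, amount) tuples for one customer.
    windows: dict mapping a label to a timedelta.
    """
    features = {}
    for label, window in windows.items():
        start = cutoff - window
        # Assumed boundary rule: start < timestamp <= cutoff
        amounts = [amt for ts, amt in purchases if start < ts <= cutoff]
        features[f"count_{label}"] = len(amounts)
        features[f"mean_amount_{label}"] = (
            sum(amounts) / len(amounts) if amounts else None
        )
    return features


# Made-up purchases for one customer; the last one is after the cutoff.
purchases = [
    (datetime(2018, 6, 1), 100.0),
    (datetime(2019, 1, 15), 20.0),
    (datetime(2019, 9, 20), 50.0),
    (datetime(2019, 11, 8), 30.0),
    (datetime(2019, 11, 20), 999.0),
]
windows = {
    "7d": timedelta(days=7),
    "3m": timedelta(days=90),    # 3 months approximated as 90 days
    "12m": timedelta(days=365),  # 12 months approximated as 365 days
}
features = window_features(purchases, datetime(2019, 11, 11), windows)
```

With Featuretools itself, one way to get the same effect would be to call dfs once per training_window value and join the resulting feature matrices.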