How to predict something along with dates in python? - python-3.x

I have time series data with two columns: traffic density and date. I wish to predict the density for the next 7 days.
I am using ARIMA time series forecasting. I am able to forecast the density, but I want each forecast value paired with its date. How can it be done?

Go with an RNN (LSTM) or FBProphet.
Here's a good piece of work for FBProphet:
https://towardsdatascience.com/a-quick-start-of-time-series-forecasting-with-a-practical-example-using-fb-prophet-31c4447a2274
Here's a good piece of work for LSTM:
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
However, you can also look into ARIMA variants.
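To get dates alongside the forecast values, one option is to generate the next 7 dates from the last observed date and pair them with the model output. This is a minimal sketch using only the standard library; the `last_observed` date and the `forecast` values are made-up placeholders standing in for your own data and ARIMA output, and daily spacing is assumed:

```python
from datetime import date, timedelta

# Hypothetical inputs: last date in the training data and a 7-step forecast
last_observed = date(2020, 1, 31)
forecast = [41.2, 39.8, 40.5, 42.1, 43.0, 38.7, 37.9]

# Build the dates that follow the last observation, one per forecast step
future_dates = [
    last_observed + timedelta(days=step)
    for step in range(1, len(forecast) + 1)
]

# Pair each date with its forecast value
dated_forecast = list(zip(future_dates, forecast))

for day, density in dated_forecast:
    print(day.isoformat(), density)
```

If the series is held in pandas with a date index that has a frequency, fitted statsmodels models typically return forecasts already indexed by date, so this step may not be needed.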

Related

How to use excel data to find period

I have three Excel columns of data from an experiment with a pendulum: time, angular displacement, and angular velocity. I was wondering if there is a way in Excel to calculate and then graph the period (and, if possible, display the function for the graph)... I realize it's kinda a dumb question. I'm still new at Excel.
Thanks for any pointers you can give!
If the Analysis ToolPak is installed, you can use Tools -> Data Analysis -> Fourier Analysis. If the data is a superposition of harmonic functions (sin, cos), the corresponding frequencies (the inverses of the periods) will appear as peaks in the Fourier analysis.
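The peak-finding idea can also be sketched outside Excel. The Python illustration below computes a plain discrete Fourier transform and reads the period off the strongest peak. The signal is synthetic: the 1.6 s period, the 0.05 s sample spacing, and the function name `dominant_period` are all assumptions made for the example, not values from the question.

```python
import cmath
import math

def dominant_period(samples, dt):
    """Return the period of the strongest frequency component."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]  # drop the constant offset
    best_k, best_mag = 1, 0.0
    # Scan the positive-frequency bins and keep the largest magnitude
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(centered)
        )
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    # Bin k corresponds to k cycles over the whole record of length n * dt
    return n * dt / best_k

dt = 0.05           # assumed sample spacing in seconds
true_period = 1.6   # assumed pendulum period in seconds
angles = [0.2 * math.cos(2 * math.pi * (i * dt) / true_period) for i in range(128)]

period = dominant_period(angles, dt)
print(round(period, 3))
```

With real measurements the record rarely contains a whole number of cycles, so the estimate is only as fine as the frequency resolution 1 / (n * dt); a longer recording gives a sharper peak.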

Statsmodels seasonal decomposition - Trend not a straight line

This query refers to the decomposition of the classic Airline passengers data into Trend, Seasonal and Residual components. We expect a linear trend to be a straight line, but the result is not. I wonder what the logic behind the extraction of the Trend is. Can you please shed some light?
from statsmodels.tsa.seasonal import seasonal_decompose
result = seasonal_decompose(airline['Thousands of Passengers'], model='additive')
result.plot();
Two things to clarify:
1) Not all trends are linear
2) Even linear trends can be subject to some variation depending on the time series in question.
For instance, let's consider the trend for maximum air temperature in Dublin, Ireland over a number of years (modelled using statsmodels):
In this example, you can see that the trend both ascends and descends - given that air temperature is subject to changing seasons we would expect this.
In terms of the airline dataset, we can see that the trend is being observed over a number of years. Even after the seasonal and residual components have been separated out, the trend itself will be subject to shifts over time.
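As far as I know, `seasonal_decompose` estimates the trend with a centered moving average over one seasonal period rather than by fitting a line, which is why the trend follows whatever curvature the data has. The sketch below imitates that idea in plain Python on a synthetic monthly series; the function name `centered_moving_average` and the series itself are assumptions for illustration, and the real library handles edge cases differently (for example, it marks the ends with NaN).

```python
import math

def centered_moving_average(series, period):
    """Smooth a series with a centered window spanning one seasonal period."""
    half = period // 2
    if period % 2 == 0:
        # Even period: half weight on both end points keeps the window centered
        weights = [0.5] + [1.0] * (period - 1) + [0.5]
    else:
        weights = [1.0] * period
    trend = [None] * len(series)  # no estimate near the edges
    for t in range(half, len(series) - half):
        window = series[t - half : t + half + 1]
        trend[t] = sum(w * x for w, x in zip(weights, window)) / period
    return trend

# Synthetic monthly data: curved growth plus a 12-month seasonal cycle
series = [
    100 + 2 * t + 0.05 * t ** 2 + 10 * math.sin(2 * math.pi * t / 12)
    for t in range(48)
]
trend = centered_moving_average(series, 12)

# The month-to-month change in the trend grows over time, so it is not a line
early_step = trend[20] - trend[19]
late_step = trend[30] - trend[29]
print(round(early_step, 2), round(late_step, 2))
```

The seasonal cycle averages out over the window, but the curvature in the underlying growth survives, so the extracted trend bends with the data.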

I have a .CSV file that contains dates and gms value for those dates. Is it possible to apply Linear Regression to this?

Suppose I have a .csv file with two attributes: dates and the gms/revenue value for each date. Is it possible to apply linear regression to predict the gms value for a particular date, or does this come under time series regression analysis? I'm new to machine learning, so any help would be appreciated. Thank you. The CSV file has around 1800 records. Dates are continuous.
Time series models will be able to find patterns in the revenue over time.
Regression models will predict the revenue given an input set of variables.
Since you have time as the only predictor variable, it would come under time-series analysis. You can still try to solve it using regression, but your prediction will just be a line (or some other polynomial curve) increasing or decreasing with time. It will not be able to capture the seasonality and lag-dependent trends which are common in time-series data.
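To make the limitation concrete, here is a small sketch that fits an ordinary least-squares line to revenue against a day index. The data is synthetic (a steady rise plus a weekend bump), and `fit_line` and `predict` are hypothetical helper names, not part of any library.

```python
from datetime import date, timedelta

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic data: four weeks of revenue with higher values on weekends
start = date(2024, 1, 1)  # a Monday
dates = [start + timedelta(days=i) for i in range(28)]
revenue = [100 + 2 * i + (20 if d.weekday() >= 5 else 0) for i, d in enumerate(dates)]

# Dates become a numeric predictor: days since the first record
xs = [(d - start).days for d in dates]
slope, intercept = fit_line(xs, revenue)

def predict(day):
    """Predicted revenue for any date, on the fitted line."""
    return intercept + slope * (day - start).days

# What the line leaves unexplained: the weekly pattern shows up here
residuals = [y - (intercept + slope * x) for x, y in zip(xs, revenue)]
print(round(slope, 3), round(max(residuals), 1), round(min(residuals), 1))
```

The fitted line recovers the overall rise, while the weekend bump stays in the residuals as a repeating pattern, which is the part a time-series model is meant to capture.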

Convert GMM-UBM scores to equivalent accuracy percent

I have constructed a GMM-UBM model for speaker recognition. The output of the models adapted for each speaker is a set of scores calculated as log-likelihood ratios. Now I want to convert these likelihood scores to an equivalent number between 0 and 100. Can anybody guide me, please?
There is no straightforward formula. You can do simple things like
prob = exp(logratio_score)
but those might not reflect the true distribution of your data. The computed probability percentage of your samples will not be uniformly distributed.
Ideally you need to take a large dataset and collect statistics on what acceptance/rejection rate you get at each score. Then, once you have built a histogram, you can normalize the score difference by that histogram to make sure that, for example, 30% of your subjects are accepted when you see a certain score difference. That normalization will allow you to create uniformly distributed probability percentages. See for example How to calculate the confidence intervals for likelihood ratios from a 2x2 table in the presence of cells with zeroes
This problem is rarely solved in speaker identification systems because confidence intervals are not what you actually want to display. You need a simple accept/reject decision, and for that you need to know the false reject and false accept rates. So it is enough to find just a threshold, not build the whole distribution.
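A threshold search of that kind can be sketched as follows. The scores are invented for illustration, `error_rates` and `pick_threshold` are hypothetical helper names, and choosing the point where the two error rates are closest is just one common convention, not the only way to set a threshold.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false accept rate, false reject rate) at a given threshold."""
    false_accepts = sum(1 for s in impostor if s >= threshold)
    false_rejects = sum(1 for s in genuine if s < threshold)
    return false_accepts / len(impostor), false_rejects / len(genuine)

def pick_threshold(genuine, impostor):
    """Pick the observed score where the two error rates are closest."""
    candidates = sorted(genuine + impostor)

    def gap(t):
        far, frr = error_rates(genuine, impostor, t)
        return abs(far - frr)

    return min(candidates, key=gap)

# Made-up log-likelihood-ratio scores from labeled trials
genuine = [2.1, 1.5, 0.9, 3.0, 0.2]      # true speaker
impostor = [-1.0, 0.1, -0.5, 0.4, -2.0]  # other speakers

threshold = pick_threshold(genuine, impostor)
far, frr = error_rates(genuine, impostor, threshold)
print(threshold, far, frr)
```

In practice the labeled set should be much larger than this, and the threshold should be checked on trials that were not used to choose it.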

From one histogram, create a new histogram from just a mean or median?

Suppose I have a list of values that I can histogram and calculate descriptive statistics on such as mean, average, max, standard deviation, etc. Perhaps this histogram is bimodal or right skewed. Let’s call this group of data “DataSet1”.
Suppose I had just a mean or median of another set of data. Let's call that DataSet2. I do not have all the raw data for DataSet2, just the median or mean. There is a strong belief that DataSet1 and DataSet2 would show the same variability in values.
If I knew just a single value of either mean or median, can I apply the descriptive statistics from DataSet1 to create a new histogram that mirrors the bimodal or right-skewed behavior of DataSet1?
Thanks
Dan
Alternative intent:
I have 3 years of historical data, and the data definitely has a "day of week" pattern to it. I am using a Python API to apply seasonal ARIMA to forecast the next 7 days from the 3 years of historical data. The predicted value is great, but it is only one value. I would like to use that predicted value as the "mean" and create a histogram from the variability of values shown to exist historically by day of week.
So, today is Thursday. Let's say I predict tomorrow to have a value of 78.6.
I want to sample potential values for tomorrow based upon a mean of 78.6, but with variability similar to that shown to exist on all historical Fridays.
If I look at historical Fridays, perhaps they show left-skewed behavior.
So when I sample with a mean of 78.6, if I sampled 100 times, the sampled values, if plotted in a histogram, would also skew to the left.
Hope that helps..
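One possible way to do this, offered as a sketch rather than an established answer, is to resample the historical deviations from their own mean and re-center them on the forecast. This keeps the shape (including any skew) of the historical values. The Friday values below are invented, and `sample_around_forecast` is a hypothetical helper name.

```python
import random
from statistics import mean

def sample_around_forecast(historical, forecast_mean, n_samples, seed=None):
    """Resample historical deviations and shift them onto the forecast value."""
    rng = random.Random(seed)
    center = mean(historical)
    # How far each historical value sat from the historical mean
    deviations = [x - center for x in historical]
    # Draw deviations with replacement and add them to the forecast
    return [forecast_mean + rng.choice(deviations) for _ in range(n_samples)]

# Made-up historical Friday values with a tail on the low side
fridays = [80.1, 79.5, 81.0, 78.8, 80.4, 70.2, 79.9, 65.3]

samples = sample_around_forecast(fridays, 78.6, 100, seed=0)
print(min(samples), max(samples))
```

The samples average to the forecast only in expectation, and this assumes the spread does not change with the level of the series; if variability grows with the value, resampling ratios instead of differences may fit better.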
