I am comparing two models, one with exponential smoothing and one with ARIMA.
For this specific assignment, it's enough that I compare the MSE of the two models.
So how do I compute the MSE of the ARIMA procedure?
This is the last assignment on this grueling course, help would be greatly appreciated!
proc arima does not specifically output the MSE, but proc model does. You can recreate the ARIMA model using proc model and the %AR and %MA macros.
proc model data=have;
endo y;
id date;
y = mu;
%AR(AR, 1, y, m=ML);
%MA(MA, 1, y, m=ML);
fit y;
run;
This specifies an ML-estimated ARIMA(1,0,1) model, that is, an ARMA(1,1) model, with an intercept, mu.
proc model will then output the MSE of your model. Note that %MA must come after %AR, and both the %AR and %MA macros must come after the equation.
If you need a more complicated lag structure, you can specify additional options in either macro:
%AR(AR, 3, y, 1 3, M=ML)
This creates ML-estimated AR variables of order 3 whose variable prefix is AR using a subset of lags 1 and 3.
Here is an example of output using sashelp.air with the %AR macro:
Note that any differencing must be performed in a data step before entering proc model.
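For reference, the quantity being compared can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not what either procedure runs internally: the function name `mse` and the numbers are made up, and dividing by the error degrees of freedom (observations minus estimated parameters) is an assumption about the convention you want. Software packages differ on this, so use the same denominator for both models when comparing them.

```python
def mse(actual, fitted, n_params=0):
    """Mean squared error of a fitted series.

    n_params is the number of estimated parameters; the sum of squared
    residuals is divided by len(actual) - n_params (assumed convention).
    """
    if len(actual) != len(fitted):
        raise ValueError("actual and fitted must have the same length")
    dof = len(actual) - n_params
    if dof <= 0:
        raise ValueError("not enough observations for the parameter count")
    sse = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    return sse / dof


# Hypothetical observed values and one-step-ahead fitted values
actual = [10.0, 12.0, 11.0, 13.0, 12.0]
fitted = [9.0, 12.0, 12.0, 13.0, 14.0]

plain_mse = mse(actual, fitted)                 # divides by n = 5
adjusted_mse = mse(actual, fitted, n_params=3)  # divides by n - 3 = 2
print(plain_mse, adjusted_mse)
```

With n_params=0 this is the plain mean of squared residuals; a positive n_params gives the degrees-of-freedom-adjusted version.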
I've been trying to understand how a model trained with support vector machines for regression predicts values. I have trained a model with sklearn.svm.SVR, and now I'm wondering how to "manually" predict the outcome for an input.
Some background - the model is trained with kernel SVR, with RBF function and uses the dual formulation. So now I have arrays of the dual coefficients, the indexes of the support vectors, and the support vectors themselves.
I found the function which is used to fit the hyperplane but I've been unsuccessful in applying that to "manually" predict outcomes without the function .predict.
The few things I tried all include the dot products of the input (features) array, and all the support vectors.
If anyone ever needs this, I've managed to understand the equation and code it in Python.
The following equation is used for the dual formulation:
f(x) = \sum_{i=1}^{N} \alpha_i y_i x_i^T x + b
where N is the number of observations, and \alpha_i multiplied by y_i are the dual coefficients, found in the model's attribute model.dual_coef_. The x_i^T are some of the observations used for training (the support vectors), accessed through the attribute model.support_vectors_ (transposed to allow multiplication of the two matrices), x is the input vector containing a value for each feature (it's the one observation for which we want a prediction), and b is the intercept, accessed through model.intercept_.
The x_i^T and x, however, are the observations transformed into a higher-dimensional space, as explained by mery in this post.
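Written with a kernel in place of the inner product between x_i and x, the prediction described above takes roughly the following form. This is a reconstruction from the definitions in the text, not a quoted formula: K is assumed to be the RBF kernel, and \gamma stands for the model's gamma parameter.

```latex
f(x) = \sum_{i=1}^{N} \alpha_i y_i \, K(x_i, x) + b,
\qquad
K(x_i, x) = \exp\!\left(-\gamma \, \lVert x_i - x \rVert^{2}\right)
```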
The RBF transformation can be computed either manually, step by step, or with sklearn.metrics.pairwise.rbf_kernel.
With the latter, the code would look like this (my case shows I have 589 support vectors, and 40 features).
First we access the coefficients and vectors:
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

support_vectors = model.support_vectors_
dual_coefs = model.dual_coef_[0]
Then:
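# gamma must be numeric here: with gamma='scale' or 'auto', get_params() returns the string
gamma = model.get_params()['gamma']
# (589, 1) column of kernel values between each support vector and the input
kernel_values = rbf_kernel(support_vectors.reshape(589, 40),
                           Y=input_array.reshape(1, 40),
                           gamma=gamma)
pred = np.matmul(dual_coefs.reshape(1, 589), kernel_values) + model.intercept_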
If the RBF function needs to be applied manually, step by step, then:
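# Difference between every support vector and the input, shape (589, 40)
vrbf = support_vectors.reshape(589, 40) - input_array.reshape(1, 40)
# The diagonal of vrbf @ vrbf.T holds the squared distance for each support vector
pred = (np.matmul(dual_coefs.reshape(1, 589),
                  np.diag(np.exp(-model.get_params()['gamma'] *
                                 np.matmul(vrbf, vrbf.T))
                          ).reshape(589, 1))
        + model.intercept_)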
I placed the .reshape() function even where it is not necessary, just to emphasize the shapes for the matrix operations.
These both give the same results as model.predict(input_array)
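The same computation can be sketched without any libraries, which makes the individual steps easier to follow. This is a minimal illustration with made-up numbers, not scikit-learn's implementation: the helper names `rbf` and `svr_predict` are hypothetical, and the small lists stand in for model.support_vectors_, model.dual_coef_[0], model.intercept_[0] and a numeric gamma.

```python
import math


def rbf(u, v, gamma):
    """RBF kernel exp(-gamma * ||u - v||^2) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, dual_coefs, intercept, gamma):
    """Weighted sum of kernel values between x and each support vector, plus intercept."""
    return sum(
        coef * rbf(sv, x, gamma)
        for coef, sv in zip(dual_coefs, support_vectors)
    ) + intercept


# Hypothetical values standing in for the fitted model's attributes
support_vectors = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
dual_coefs = [0.5, -1.0, 0.25]
intercept = 0.1
gamma = 0.5

x = [1.0, 0.0]
pred = svr_predict(x, support_vectors, dual_coefs, intercept, gamma)
print(pred)
```

Computing one squared distance per support vector also avoids building the full 589 x 589 product matrix just to read off its diagonal.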
I used log-transformed data (dependent variable = count) in my generalised additive model (using mgcv) and tried to plot the response using "trans=plogis", as for logistic GAMs, but the results don't seem right. Am I forgetting something here? When I first used linear models for my data, I plotted the least-squares means. Any idea how I could plot the output of my GAMs in a more interpretable way than on the log scale?
Cheers
Are you running a logistic regression for count data? Logistic regression is normally used for a binary outcome or a proportion of binary outcomes.
That being said, the real question here is that you want to backtransform a variable that was fit on the log scale back to the original scale for plotting. That can be easily done using the itsadug package. I've simulated some silly data here just to show the code required.
With itsadug, you can visually inspect many aspects of GAM models. I'd encourage you to look at this: https://cran.r-project.org/web/packages/itsadug/vignettes/inspect.html
The transform argument of plot_smooth() can also be used with custom functions written in R. This can be useful if you have both centred and logged a dependent variable.
library(mgcv)
library(itsadug)
# Setting seed so it's reproducible
set.seed(123)
# Generating 50 samples from a uniform distribution
x <- runif(50, min = 20, max = 50)
# Taking the sin of x to create a dependent variable
y <- sin(x)
# Binding them to a dataframe
d <- data.frame(x, y)
# Logging the dependent variable after adding a constant to prevent negative values
d$log_y <- log(d$y + 1)
# Fitting a GAM to the transformed dependent variable
model_fit <- gam(log_y ~ s(x),
data = d)
# Using the plot_smooth function from itsadug to backtransform to the original y scale
# (exp undoes the log, subtracting 1 undoes the added constant)
plot_smooth(model_fit,
            view = "x",
            transform = function(v) exp(v) - 1)
You can specify the trans function for back-transforming as trans = function(x){exp(coef(gam)[1]+x)}, where gam is your fitted model and coef(gam)[1] is the intercept.
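When a constant was added before taking logs, the back-transformation has to undo both steps: exponentiate, then subtract the constant. A minimal sketch of that arithmetic in Python follows. The function names and the shift of 1 are illustrative assumptions, not part of mgcv or itsadug.

```python
import math


def forward(y, shift=1.0):
    """Transform applied before fitting: log(y + shift)."""
    return math.log(y + shift)


def back_transform(fitted, shift=1.0):
    """Inverse of forward(): exp(fitted) - shift."""
    return math.exp(fitted) - shift


y_values = [-0.5, 0.0, 0.8]  # hypothetical values on the original scale
on_log_scale = [forward(y) for y in y_values]
recovered = [back_transform(v) for v in on_log_scale]
print(recovered)
```

Applying exp alone would return values on the y + shift scale, not the original y scale.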
I'm training a CNN for image classification. The same object (and hence the same label) appears twice in the test set, for example from two viewpoints. I'd like to take advantage of this when predicting the class.
Right now the final layer is a Linear layer (PyTorch) and I'm using cross-entropy as loss function. I was wondering what is the best way to take the most confident prediction for each object. Should I first compute the LogSoftMax and take the class with the highest probability (among both arrays of predictions), or should I take the logits directly?
Since LogSoftmax is monotonically increasing, within a single prediction vector the largest logit always corresponds to the highest probability. There is therefore no need to apply it if all you want is the index of the most confident class.
Probably the easiest way to get the index of the most confident class is by using torch.argmax.
e.g.
import torch

batch_size = 5
num_logits = 10
y = torch.randn(batch_size, num_logits)
preds = torch.argmax(y, dim=1)
which in this case results in
>>> print(preds)
tensor([9, 7, 2, 4, 6])
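For the two-view case in the question, one possible approach is to turn each view's logits into probabilities and keep the prediction from the more confident view. The sketch below uses plain Python so it runs without PyTorch. The function names and the logits are made up, and averaging the two probability vectors would be an equally reasonable alternative. Note that softmax normalises each view separately, so the view with the largest raw logit is not necessarily the one with the highest probability.

```python
import math


def softmax(logits):
    """Numerically stable softmax of a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def most_confident(view_a, view_b):
    """Return (class_index, probability) from whichever view is more confident."""
    best = None
    for logits in (view_a, view_b):
        probs = softmax(logits)
        idx = max(range(len(probs)), key=probs.__getitem__)
        if best is None or probs[idx] > best[1]:
            best = (idx, probs[idx])
    return best


# Hypothetical logits for the same object seen from two viewpoints
view_a = [5.0, 4.9, 4.8]  # largest raw logit, but nearly tied classes
view_b = [0.0, 3.0, 0.0]  # smaller logits, but one class clearly dominates

cls, prob = most_confident(view_a, view_b)
print(cls, round(prob, 3))
```

Here the second view wins even though the first view contains the largest raw logit.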
I've fit a SARIMAX model using statsmodels as follows
import statsmodels.api as sm

mod = sm.tsa.statespace.SARIMAX(ratingCountsRSint, order=(2,0,0), seasonal_order=(1,0,0,52), enforce_stationarity=False, enforce_invertibility=False, freq='W')
results = mod.fit()
print(results.summary().tables[1])
In the results table I have a coefficient ar.S.L52 that shows as 0.0163. When I try to retrieve the coefficient using
seasonalAR=results.polynomial_seasonal_ar[52]
I get -0.0163. I'm wondering why the sign has turned around. The same thing happens with polynomial_ar. In the documentation it says that polynomial_seasonal_ar gives the "array containing seasonal autoregressive lag polynomial coefficients". I would have guessed that I should get exactly the same as in the summary table. Could someone clarify how that comes about and whether the actual coefficient of the lag is positive or negative?
I'll use an AR(1) model as an example, but the same principle applies to a seasonal model.
We typically write the AR(1) model as:
y_t = \phi_1 y_{t-1} + \varepsilon_t
The parameter estimated by statsmodels is \phi_1, and that is what is presented in the summary table.
When writing the AR(1) model in lag-polynomial form, we usually write it like:
\phi(L) y_t = \varepsilon_t
where \phi(L) = 1 - \phi_1 L, and L is the lag operator. The coefficients of this lag polynomial are (1, -\phi_1). These coefficients are what the polynomial attributes of the results object contain, so the actual coefficient on the lag is the positive value shown in the summary table.
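The sign convention can be illustrated with a short sketch that builds the lag-polynomial coefficients from the estimated parameters. The helper names are hypothetical and this is not statsmodels' internal code; it only mirrors the convention described above, using the value 0.0163 from the question.

```python
def ar_lag_polynomial(ar_params):
    """Coefficients of phi(L) = 1 - phi_1 L - ... - phi_p L^p, lowest power first."""
    return [1.0] + [-phi for phi in ar_params]


def seasonal_ar_lag_polynomial(seasonal_params, period):
    """Same convention for a seasonal AR term: non-zero only at multiples of the period."""
    coefs = [0.0] * (len(seasonal_params) * period + 1)
    coefs[0] = 1.0
    for k, phi in enumerate(seasonal_params, start=1):
        coefs[k * period] = -phi
    return coefs


poly = ar_lag_polynomial([0.5])                      # AR(1) with phi_1 = 0.5
seasonal = seasonal_ar_lag_polynomial([0.0163], 52)  # value from the question
print(poly, seasonal[52])
```

Indexing the seasonal polynomial at position 52 returns the negated parameter, which matches what the question observed.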
I would like to average the scores of two different SVMs trained on different samples but the same classes
# The data have the same labels: x_1[1] has y_1[1] and x_2[1] has y_2[1]
# where y_2[1] == y_1[1]
Dataset_1=(x_1,y)
Dataset_2=(x_2,y)
test_data=(test_sample,test_labels)
We have 50 classes, the same ones for dataset_1 and dataset_2:
set(y_1) == set(y_2)
What I have tried:
from sklearn.svm import SVC
clf_1 = SVC(kernel='linear', random_state=42).fit(x_1, y)
clf_2 = SVC(kernel='linear', random_state=42).fit(x_2, y)
What I would like to do: how can I average the scores of clf_1 and clf_2 before calling predict(test_sample)?
Not sure I understand your question; to simply average the scores as in a typical ensemble, you should first get prediction probabilities from each model separately, and then just take their average:
pred1 = clf_1.predict_proba(test_sample)
pred2 = clf_2.predict_proba(test_sample)
pred = (pred1 + pred2)/2
In order to get prediction probabilities instead of hard classes, you should initialize the SVC using the additional argument probability=True.
Each row of pred will be an array of length 50, as many as your classes, with each element representing the probability that the sample belongs to the respective class.
After averaging, simply take the argmax of pred - just be sure that the order of the returned probabilities is OK; according to the docs:
The columns correspond to the classes in sorted order, as they appear in the attribute classes_
As I am not exactly sure what this means, run some checks with predictions on your training set, to be sure that the order is correct.
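The quoted sentence means that column j of the predict_proba output holds the probability of the label stored at classes_[j], so the argmax index should be mapped back through classes_ to recover the label. A minimal sketch of the averaging and the mapping, in plain Python with made-up numbers: the function name is hypothetical, and the lists stand in for the outputs of clf_1.predict_proba, clf_2.predict_proba and clf_1.classes_. It assumes both classifiers were trained on the same set of labels, so that their columns line up.

```python
def average_and_predict(proba_1, proba_2, classes):
    """Average two probability matrices row by row and return the predicted labels.

    Column j of each matrix is assumed to hold the probability of classes[j].
    """
    labels = []
    for row_1, row_2 in zip(proba_1, proba_2):
        avg = [(p + q) / 2 for p, q in zip(row_1, row_2)]
        best = max(range(len(avg)), key=avg.__getitem__)
        labels.append(classes[best])
    return labels


# Hypothetical probabilities for two test samples and three classes
classes = ["bird", "cat", "dog"]  # stands in for clf_1.classes_
proba_1 = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
proba_2 = [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]

labels = average_and_predict(proba_1, proba_2, classes)
print(labels)
```

In the first row the two models disagree on their own, and the averaged probabilities settle on the second class.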