I've fit a SARIMAX model using statsmodels as follows
mod = sm.tsa.statespace.SARIMAX(ratingCountsRSint,order=(2,0,0),seasonal_order=(1,0,0,52),enforce_stationarity=False,enforce_invertibility=False, freq='W')
results = mod.fit()
print(results.summary().tables[1])
In the results table I have a coefficient ar.S.L52 that shows as 0.0163. When I try to retrieve the coefficient using
seasonalAR=results.polynomial_seasonal_ar[52]
I get -0.0163. I'm wondering why the sign has turned around. The same thing happens with polynomial_ar. In the documentation it says that polynomial_seasonal_ar gives the "array containing seasonal autoregressive lag polynomial coefficients". I would have guessed that I should get exactly the same as in the summary table. Could someone clarify how that comes about and whether the actual coefficient of the lag is positive or negative?
I'll use an AR(1) model as an example, but the same principle applies to a seasonal model.
We typically write the AR(1) model as:
y_t = \phi_1 y_{t-1} + \varepsilon_t
The parameter estimated by Statsmodels is \phi_1, and that is what is presented in the summary table.
When writing the AR(1) model in lag-polynomial form, we usually write it like:
\phi(L) y_t = \varepsilon_t
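where \phi(L) = 1 - \phi_1 L, and L is the lag operator. The coefficients of this lag polynomial are (1, -\phi_1). These coefficients are what the polynomial attributes of the results object contain, which is why the sign is flipped relative to the summary table. The actual coefficient on the lag in the model equation is the one in the summary table (positive in your case).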
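To make the sign convention concrete, here is a minimal NumPy sketch. It is not statsmodels code: `lag_polynomial` is a hypothetical helper written only for illustration, and it assumes the polynomial is indexed by the power of L, which matches the `polynomial_seasonal_ar[52]` lookup in the question.

```python
import numpy as np

def lag_polynomial(ar_params, lag_step=1):
    """Coefficients of 1 - sum_k phi_k L^(k * lag_step), indexed by power of L.

    Hypothetical helper for illustration; lag_step=52 mimics a seasonal AR term.
    """
    ar_params = np.asarray(ar_params, dtype=float)
    poly = np.zeros(len(ar_params) * lag_step + 1)
    poly[0] = 1.0                          # the leading "1" of the polynomial
    poly[lag_step::lag_step] = -ar_params  # parameters enter with a minus sign
    return poly

# Seasonal AR(1) with period 52 and the estimate from the summary table.
seasonal_poly = lag_polynomial([0.0163], lag_step=52)

# Negating the polynomial entry recovers the parameter from the summary table.
seasonal_ar_coef = -seasonal_poly[52]
```

Here `seasonal_poly[52]` is the negative value seen in the question, while `seasonal_ar_coef` is the estimated parameter that multiplies the lagged value in the model equation.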
I've been trying to understand how a model trained with support vector machines for regression predicts values. I have trained a model with sklearn.svm.SVR, and now I'm wondering how to "manually" predict the outcome of an input.
Some background - the model is trained with kernel SVR, with RBF function and uses the dual formulation. So now I have arrays of the dual coefficients, the indexes of the support vectors, and the support vectors themselves.
I found the function which is used to fit the hyperplane but I've been unsuccessful in applying that to "manually" predict outcomes without the function .predict.
The few things I tried all include the dot products of the input (features) array, and all the support vectors.
If anyone ever needs this, I've managed to understand the equation and code it in Python.
The following is the equation used for the dual formulation:
f(x) = \sum_{i=1}^{N} \alpha_i y_i x_i^T x + b
where N is the number of observations, and αi multiplied by yi are the dual coefficients found in the model's attribute model.dual_coef_. The xiT are some of the observations used for training (the support vectors), accessed by the attribute model.support_vectors_ (transposed to allow multiplication of the two matrices), x is the input vector containing a value for each feature (it is the one observation for which we want a prediction), and b is the intercept accessed by model.intercept_.
The xiT and x, however, are the observations transformed in a higher-dimensional space, as explained by mery in this post.
The transformation by RBF can either be computed manually, step by step, or with sklearn.metrics.pairwise.rbf_kernel.
With the latter, the code would look like this (my case shows I have 589 support vectors, and 40 features).
First we access the coefficients and vectors:
support_vectors = model.support_vectors_
dual_coefs = model.dual_coef_[0]
Then:
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

pred = (np.matmul(dual_coefs.reshape(1,589),
                  rbf_kernel(support_vectors.reshape(589,40),
                             Y=input_array.reshape(1,40),
                             gamma=model.get_params()['gamma']
                             )
                  )
        + model.intercept_
        )
If the RBF function needs to be applied manually, step by step, then:
vrbf = support_vectors.reshape(589,40) - input_array.reshape(1,40)
pred = (np.matmul(dual_coefs.reshape(1,589),
                  np.diag(np.exp(-model.get_params()['gamma'] *
                                 np.matmul(vrbf, vrbf.T)
                                 )
                          ).reshape(589,1)
                  )
        + model.intercept_
        )
I placed the .reshape() function even where it is not necessary, just to emphasize the shapes for the matrix operations.
These both give the same results as model.predict(input_array)
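As a smaller, self-contained sketch of the same calculation, the version below uses only NumPy and made-up arrays in place of a fitted model. The names `rbf_kernel_column` and `manual_svr_predict` are hypothetical, and `gamma` is assumed to be a plain number. It computes the squared distances row by row instead of building the full N x N product, and compares the result with the `np.diag(np.matmul(...))` form used above.

```python
import numpy as np

def rbf_kernel_column(support_vectors, x, gamma):
    """RBF kernel between every support vector and one input vector x."""
    diff = support_vectors - x.reshape(1, -1)
    sq_dist = np.sum(diff ** 2, axis=1)  # squared Euclidean distance per row
    return np.exp(-gamma * sq_dist)

def manual_svr_predict(dual_coefs, support_vectors, intercept, x, gamma):
    """Dual-form prediction: sum_i coef_i * K(sv_i, x) + intercept."""
    k = rbf_kernel_column(support_vectors, x, gamma)
    return float(np.dot(dual_coefs, k) + intercept)

# Made-up stand-ins for model.support_vectors_, model.dual_coef_[0] and
# model.intercept_ (5 support vectors, 3 features); gamma is assumed numeric.
rng = np.random.default_rng(0)
support_vectors = rng.normal(size=(5, 3))
dual_coefs = rng.normal(size=5)
intercept = 0.25
gamma = 0.1
x = rng.normal(size=3)

pred = manual_svr_predict(dual_coefs, support_vectors, intercept, x, gamma)

# Same quantity via the diagonal of the full matrix product, as in the answer.
vrbf = support_vectors - x.reshape(1, 3)
pred_diag = float(dual_coefs @ np.diag(np.exp(-gamma * (vrbf @ vrbf.T))) + intercept)
```

One caveat when applying this to a real model: `model.get_params()['gamma']` returns the string `'scale'` or `'auto'` if the model was created with one of those settings, so the numeric value would have to be obtained separately in that case.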
I am doing a project on multiclass semantic segmentation. I have built a model that outputs pretty decent segmented images by decreasing the loss value. However, I cannot evaluate the model's performance with metrics such as mean IoU or the Dice coefficient.
In case of binary semantic segmentation it was easy just to set the threshold of 0.5, to classify the outputs as an object or background, but it does not work in the case of multiclass semantic segmentation. Could you please tell me how to obtain model performance on the aforementioned metrics? Any help will be highly appreciated!
By the way, I am using PyTorch framework and CamVid dataset.
If anyone is interested in this answer, please also look at this issue. The author of the issue points out that mIoU can be computed in a different way (and that method is more accepted in literature). So, consider that before using the implementation for any formal publication.
Basically, the other method suggested by the issue-poster is to separately accumulate the intersections and unions over the entire dataset and divide them at the final step. The method in the below original answer computes intersection and union for a batch of images, then divides them to get IoU for the current batch, and then takes a mean of the IoUs over the entire dataset.
However, the original method given below is problematic because the final mean IoU varies with the batch size. With the method mentioned in the issue, on the other hand, the mIoU does not vary with the batch size, because the separate accumulation makes the batch size irrelevant (though a larger batch size can definitely help speed up the evaluation).
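A minimal sketch of the dataset-level accumulation described above is given here. It uses NumPy arrays of class indices rather than PyTorch tensors to stay self-contained, and `IoUAccumulator` is a hypothetical name, not something taken from the linked issue. One difference from the original answer is labelled in the code: a class counts as present if it appears in either the targets or the predictions.

```python
import numpy as np

class IoUAccumulator:
    """Accumulates per-class intersections and unions over a whole dataset."""

    def __init__(self, num_classes):
        self.intersection = np.zeros(num_classes, dtype=np.int64)
        self.union = np.zeros(num_classes, dtype=np.int64)

    def update(self, label, pred):
        # label and pred hold class indices (pred is already argmax-ed).
        label = np.asarray(label).ravel()
        pred = np.asarray(pred).ravel()
        for sem_class in range(len(self.intersection)):
            pred_inds = pred == sem_class
            target_inds = label == sem_class
            self.intersection[sem_class] += np.sum(pred_inds & target_inds)
            self.union[sem_class] += np.sum(pred_inds | target_inds)

    def mean_iou(self):
        # Assumption: classes seen in neither targets nor predictions are skipped.
        present = self.union > 0
        return float(np.mean(self.intersection[present] / self.union[present]))

label = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 1, 1, 1, 2, 0])

# Everything in one batch.
whole = IoUAccumulator(num_classes=3)
whole.update(label, pred)

# The same data split into two batches of different sizes.
split = IoUAccumulator(num_classes=3)
split.update(label[:2], pred[:2])
split.update(label[2:], pred[2:])

miou_whole = whole.mean_iou()
miou_split = split.mean_iou()
```

Because the division happens only once, in `mean_iou`, the two results agree regardless of how the data is split into batches.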
Original answer:
Given below is an implementation of mean IoU (Intersection over Union) in PyTorch.
import numpy as np
import torch
import torch.nn.functional as F

def mIOU(label, pred, num_classes=19):
    pred = F.softmax(pred, dim=1)
    pred = torch.argmax(pred, dim=1).squeeze(1)
    iou_list = list()
    present_iou_list = list()
    pred = pred.view(-1)
    label = label.view(-1)
    # Note: Following for loop goes from 0 to (num_classes-1)
    # and ignore_index is num_classes, thus ignore_index is
    # not considered in computation of IoU.
    for sem_class in range(num_classes):
        pred_inds = (pred == sem_class)
        target_inds = (label == sem_class)
        if target_inds.long().sum().item() == 0:
            iou_now = float('nan')
        else:
            intersection_now = (pred_inds[target_inds]).long().sum().item()
            union_now = pred_inds.long().sum().item() + target_inds.long().sum().item() - intersection_now
            iou_now = float(intersection_now) / float(union_now)
            present_iou_list.append(iou_now)
        iou_list.append(iou_now)
    return np.mean(present_iou_list)
The prediction of your model will have one score per class at each pixel, so first take softmax (if your model doesn't already) followed by argmax to get the index of the class with the highest probability at each pixel. Then, we calculate IoU for each class (and take the mean over the classes at the end).
We can reshape both the prediction and the label as 1-D vectors (I read that it makes the computation faster). For each class, we first identify the indices of that class using pred_inds = (pred == sem_class) and target_inds = (label == sem_class). The resulting pred_inds and target_inds will have 1 at pixels labelled as that particular class while 0 for any other class.
Then, there is a possibility that the target does not contain that particular class at all. This will make that class's IoU calculation invalid as it is not present in the target. So, you assign such classes a NaN IoU (so you can identify them later) and not involve them in the calculation of the mean.
If the particular class is present in the target, then pred_inds[target_inds] will give a vector of 1s and 0s where indices with 1 are those where prediction and target are equal and zero otherwise. Taking the sum of all elements of this will give us the intersection.
If we add all the elements of pred_inds and target_inds, we'll get the union + intersection of pixels of that particular class. So, we subtract the already calculated intersection to get the union. Then, we can divide the intersection and union to get the IoU of that particular class and add it to a list of valid IoUs.
At the end, you take the mean of the entire list to get the mIoU. If you want the Dice Coefficient, you can calculate it in a similar fashion.
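As a rough sketch of the "similar fashion" mentioned above, the Dice coefficient of a class is 2 * intersection / (number of predicted pixels + number of target pixels). The version below uses NumPy arrays of class indices instead of PyTorch tensors to stay self-contained; `mean_dice` is a hypothetical helper, and like the IoU function it skips classes that do not occur in the target. It assumes at least one class is present in the target.

```python
import numpy as np

def mean_dice(label, pred, num_classes):
    """Mean Dice coefficient over the classes that occur in the target."""
    label = np.asarray(label).ravel()
    pred = np.asarray(pred).ravel()
    dice_per_class = []
    for sem_class in range(num_classes):
        pred_inds = pred == sem_class
        target_inds = label == sem_class
        if target_inds.sum() == 0:
            continue  # class absent from the target: leave it out of the mean
        intersection = np.sum(pred_inds & target_inds)
        dice_per_class.append(
            2.0 * intersection / (pred_inds.sum() + target_inds.sum())
        )
    return float(np.mean(dice_per_class))

label = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 1, 1, 1, 2, 0])
score = mean_dice(label, pred, num_classes=3)
```

The same caveat about batch size applies here as for the IoU: averaging per-batch values is not the same as accumulating the counts over the whole dataset.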
I'm using statsmodels to do simple and multiple linear regression and I'm getting bad R^2 values from the summary. The coefficients look to be calculated correctly, but I get an R^2 of 1.000, which is impossible for my data. I graphed it in Excel and I should be getting around 0.93, not 1.
I'm using a mask to filter the data sent into the model and I'm wondering if that could be the issue, but to me the data looks fine. I am fairly new to Python and statsmodels, so maybe I'm missing something here.
import statsmodels.api as sm

for i, df in enumerate(fallwy_xy): # Iterate through list of dataframes
    if len(df.index) > 0: # Check if frame is empty or not
        mask3 = (df['fnu'] >= low) # Mask data below 'low' variable
        valid3 = df[mask3]
        if len(valid3) > 0: # Check if there is data in range of mask3
            X = valid3[['logfnu', 'logdischarge']]
            y = valid3[['logssc']]
            estm = sm.OLS(y, X).fit()
            X = valid3[['logfnu']]
            y = valid3[['logssc']]
            ests = sm.OLS(y, X).fit()
I finally found out what was going on. By default, Statsmodels does not include a constant (intercept) in its OLS regression equation; you have to add it explicitly with
X = sm.add_constant(X)
The constant matters because without it, Statsmodels calculates R-squared differently: it uses the uncentered version. If you do add a constant, R-squared is calculated the way most people calculate it, which is the centered version. Excel does not change the way it calculates R-squared depending on whether a constant is given, which is why the R-squared that Statsmodels reported without a constant was so different from Excel's. The OLS regression summary from Statsmodels actually points out the calculation method when it uses the uncentered, no-constant calculation, by showing R-squared (uncentered): where the R-squared appears in the summary table. The links below helped me figure this out.
add hasconstant indicator for R-squared and df calculations
Same model coeffs, different R^2 with statsmodels OLS and sci-kit learn linearregression
Warning : Rod Made a Mistake!
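To illustrate the difference between the two definitions, here is a small NumPy-only sketch on synthetic data (the data and variable names are made up; it does not call Statsmodels). The uncentered R-squared compares the residual sum of squares with the sum of y squared, while the centered one compares it with the sum of squared deviations of y from its mean.

```python
import numpy as np

# Synthetic data whose mean is far from zero, as is typical for log-transformed values.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 2.0, size=200)
y = 5.0 + 0.5 * x + rng.normal(scale=0.2, size=200)

# Fit without a constant: y = b * x
X_no_const = x.reshape(-1, 1)
beta_no_const, *_ = np.linalg.lstsq(X_no_const, y, rcond=None)
resid_no_const = y - X_no_const @ beta_no_const
# Uncentered R-squared: the total sum of squares is taken around zero.
r2_uncentered = 1.0 - (resid_no_const @ resid_no_const) / (y @ y)

# Fit with a constant: y = a + b * x
X_const = np.column_stack([np.ones_like(x), x])
beta_const, *_ = np.linalg.lstsq(X_const, y, rcond=None)
resid_const = y - X_const @ beta_const
# Centered R-squared: the total sum of squares is taken around the mean of y.
r2_centered = 1.0 - (resid_const @ resid_const) / np.sum((y - y.mean()) ** 2)
```

Because y is far from zero, the sum of y squared is large compared with the residuals, so the uncentered value lands close to 1 even though the model without a constant fits worse.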
In WinBUGS, I am specifying a model with a multinomial likelihood function, and I need to make sure that the multinomial probabilities are all between 0 and 1 and sum to 1.
Here is the part of the code specifying the likelihood:
e[k,i,1:9] ~ dmulti(P[k,i,1:9],n[i,k])
Here, the array P[] specifies the probabilities for the multinomial distribution.
These probabilities are to be estimated from my data (the matrix e[]) using multiple linear regressions on a series of fixed and random effects. For instance, here is the multiple linear regression used to predict one of the elements of P[]:
P[k,1,2] <- intercept[1,2] + Slope1[1,2]*Covariate1[k] +
Slope2[1,2]*Covariate2[k] + Slope3[1,2]*Covariate3[k]
+ Slope4[1,2]*Covariate4[k] + RandomEffect1[group[k]] +
RandomEffect2[k]
At compiling, the model produces an error:
elements of proportion vector of multinomial e[1,1,1] must be between zero and one
If I understand this correctly, this means that the elements of the vector P[k,i,1:9] (the probability vector in the multinomial likelihood function above) may be very large (or small) numbers. In reality, they all need to be between 0 and 1, and sum to 1.
I am new to WinBUGS, but from reading around it seems that somehow using a beta regression rather than multiple linear regressions might be the way forward. However, although this would allow each element to be between 0 and 1, it doesn't seem to get to the heart of the problem, which is that all the elements of P[k,i,1:9] must be positive and sum to 1.
It may be that the response variable can very simply be transformed to be a proportion. I have tried this by trying to divide each element by the sum of P[k,i,1:9], but so far no success.
Any tips would be very gratefully appreciated!
(I have supplied the problematic sections of the model; the whole thing is fairly long.)
The usual way to do this would be to use the multinomial equivalent of a logit link to constrain the transformed probabilities to the interval (0,1). For example (for a single predictor but it is the same principle for as many predictors as you need):
Response[i, 1:Categories] ~ dmulti(prob[i, 1:Categories], Trials[i])
phi[i,1] <- 1
prob[i,1] <- 1 / sum(phi[i, 1:Categories])
for(c in 2:Categories){
    log(phi[i,c]) <- intercept[c] + slope1[c] * Covariate1[i]
    prob[i,c] <- phi[i,c] / sum(phi[i, 1:Categories])
}
For identifiability the value of phi[i,1] is set to 1, but the other values of intercept and slope1 are estimated independently. When the number of Categories is equal to 2, this collapses to the usual logistic regression but coded for a multinomial response:
log(phi[i,2]) <- intercept[2] + slope1[2] * Covariate1[i]
prob[i,2] <- phi[i, 2] / (1 + phi[i, 2])
prob[i,1] <- 1 / (1 + phi[i, 2])
i.e.:
logit(prob[i,2]) <- intercept[2] + slope1[2] * Covariate1[i]
prob[i,1] <- 1 - prob[i,2]
In this model I have indexed slope1 by the category, meaning that each level of the outcome has an independent relationship with the predictor. If you have an ordinal response and want to assume that the odds ratio associated with the covariate is consistent between successive levels of the response, then you can drop the index on slope1 (and reformulate the code slightly so that phi is cumulative) to get a proportional odds logistic regression (POLR).
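The effect of this link can be checked numerically outside of BUGS. The following NumPy sketch mirrors the model code above for a single observation; `multinomial_logit_probs` is a hypothetical helper and the parameter values are made up. It shows that the resulting probabilities lie strictly between 0 and 1 and sum to 1 whatever the linear predictors are.

```python
import numpy as np

def multinomial_logit_probs(intercept, slope1, covariate1):
    """Category probabilities for one observation under the multinomial logit link.

    intercept and slope1 hold the parameters of categories 2..C; category 1 is
    the reference, so its phi is fixed at 1 (a linear predictor of 0).
    """
    linear_predictor = np.concatenate(([0.0], intercept + slope1 * covariate1))
    phi = np.exp(linear_predictor)  # always positive
    return phi / phi.sum()          # normalised, so the probabilities sum to 1

# Three categories with made-up parameter values.
probs = multinomial_logit_probs(
    intercept=np.array([0.3, -0.1]),
    slope1=np.array([-1.2, 0.4]),
    covariate1=0.5,
)

# With two categories this reduces to ordinary logistic regression.
two_cat = multinomial_logit_probs(np.array([0.3]), np.array([-1.2]), 0.5)
logistic = 1.0 / (1.0 + np.exp(-(0.3 - 1.2 * 0.5)))
```

Here `two_cat[1]` agrees with `logistic`, which corresponds to the two-category special case written out above.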
Edit
Here is a link to some example code covering logistic regression, multinomial regression and POLR from a course I teach:
http://runjags.sourceforge.net/examples/squirrels.R
Note that it uses JAGS (rather than WinBUGS) but as far as I know there are no differences in model syntax for these types of models. If you want to quickly get started with runjags & JAGS from a WinBUGS background then you could follow this vignette:
http://runjags.sourceforge.net/quickjags.html
I am trying to implement the Expectation Maximization algorithm (Gaussian Mixture Model) on a data set data=[[x,y],...]. I am using the mv_norm.pdf(data, mean, cov) function to calculate cluster responsibilities. But after 6-7 iterations of recalculating the covariance (cov matrix), the cov matrix becomes singular, i.e. the determinant of cov is 0 (a very small value), and hence it gives the errors
ValueError: the input matrix must be positive semidefinite
and
raise np.linalg.LinAlgError('singular matrix')
Can someone suggest any solution for this?
import copy
from scipy.stats import multivariate_normal as mv_norm  # assumed source of mv_norm

#E-step: Compute cluster responsibilities, given cluster parameters
def calculate_cluster_responsibility(data,centroids,cov_m):
    pdfmain=[[] for i in range(0,len(data))]
    for i in range(0,len(data)):
        sum1=0
        pdfeach=[[] for m in range(0,len(centroids))]
        pdfeach[0]=1/3.*mv_norm.pdf(data[i], mean=centroids[0],cov=[[cov_m[0][0][0],cov_m[0][0][1]],[cov_m[0][1][0],cov_m[0][1][1]]])
        # Fixed: the last entry used cov_m[0][1][1], mixing in cluster 0's covariance
        pdfeach[1]=1/3.*mv_norm.pdf(data[i], mean=centroids[1],cov=[[cov_m[1][0][0],cov_m[1][0][1]],[cov_m[1][1][0],cov_m[1][1][1]]])
        pdfeach[2]=1/3.*mv_norm.pdf(data[i], mean=centroids[2],cov=[[cov_m[2][0][0],cov_m[2][0][1]],[cov_m[2][1][0],cov_m[2][1][1]]])
        sum1+=pdfeach[0]+pdfeach[1]+pdfeach[2]
        pdfeach[:] = [x / sum1 for x in pdfeach]
        pdfmain[i]=pdfeach
    global old_pdfmain
    if old_pdfmain==pdfmain:
        return
    old_pdfmain=copy.deepcopy(pdfmain)
    softcounts=[sum(i) for i in zip(*pdfmain)]
    calculate_cluster_weights(data,centroids,pdfmain,softcounts)
Initially, I've passed [[3,0],[0,3]] for each cluster covariance since expected number of clusters is 3.
The problem is that your data lies in some manifold of dimension strictly smaller than that of the input space. For example, your data lies on a circle while being 3-dimensional. As a consequence, when your method tries to estimate a 3-dimensional ellipsoid (the covariance matrix) that fits your data, it fails because the optimal one is a 2-dimensional ellipse (its extent along the third dimension is 0), which makes the covariance matrix singular.
How to fix it? You will need some regularization of your covariance estimator. There are many possible solutions, all of them in the M step rather than the E step, because the problem is in computing the covariance:
Simple solution: instead of doing something like cov = np.cov(X, rowvar=False), add a regularizing term, like cov = np.cov(X, rowvar=False) + eps * np.identity(X.shape[1]) with a small eps (rowvar=False is needed when the samples are the rows of X, which is what the X.shape[1] identity assumes).
Use a nicer estimator, like the LedoitWolf estimator from scikit-learn.
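A minimal sketch of the first option inside an M step could look as follows. It is not the asker's code: `weighted_covariance` is a hypothetical helper, the responsibilities are passed in as a 1-D array for a single cluster, and the value of `eps` is an arbitrary choice that should be tuned to the scale of the data.

```python
import numpy as np

def weighted_covariance(data, resp, mean, eps=1e-6):
    """Responsibility-weighted covariance of one cluster plus eps * identity."""
    data = np.asarray(data, dtype=float)
    resp = np.asarray(resp, dtype=float)
    diff = data - mean
    cov = (resp[:, None] * diff).T @ diff / resp.sum()
    # Adding eps to the diagonal keeps every eigenvalue at least eps above the
    # unregularized one, so the matrix stays invertible.
    return cov + eps * np.identity(data.shape[1])

# Degenerate example: 2-D points that all lie on one line.
data = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
resp = np.ones(len(data))
mean = resp @ data / resp.sum()

raw_cov = weighted_covariance(data, resp, mean, eps=0.0)  # singular
reg_cov = weighted_covariance(data, resp, mean)           # regularized
```

With `eps=0.0` the determinant is zero for this data, while the regularized matrix is positive definite and can be passed to a density function or factorized with `np.linalg.cholesky`.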
Initially, I've passed [[3,0],[0,3]] for each cluster covariance since expected number of clusters is 3.
This makes no sense: the values of the covariance matrix have nothing to do with the number of clusters. You can initialize it with anything more or less reasonable.