Beta coefficients in a panel data random effects regression model - statistics

I wanted to get beta coefficients in my panel data random effects regression model in Stata. But then I noticed that the option "beta" is not allowed in the xtreg command.
This made me wonder: is it wrong to want standardised coefficients in a random effects model?
My model looks something like this:
xtreg y x##z, re

You can get standardized coefficients manually by standardizing your variables (mean 0, standard deviation 1) before running the command:
foreach v of varlist x y z {
    qui sum `v'
    replace `v' = (`v' - `r(mean)') / `r(sd)'
}
* c. marks x and z as continuous; without a prefix, ## treats them as categorical
xtreg y c.x##c.z, re
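To show why standardizing first yields beta coefficients, here is a small sketch in Python rather than Stata. It makes simplifying assumptions: plain OLS instead of a random-effects model, no interaction term, and simulated data. It checks that the slopes estimated on z-scored variables equal the raw slopes multiplied by sd(x)/sd(y).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Simulated predictors on different scales (illustrative data only)
X = np.column_stack([rng.normal(5, 2, n), rng.normal(-3, 10, n)])
y = 1.5 + 0.8 * X[:, 0] - 0.1 * X[:, 1] + rng.normal(0, 1, n)


def ols(X, y):
    """OLS with an intercept column; returns [intercept, slope_1, slope_2, ...]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


# Fit on the raw scale
b_raw = ols(X, y)

# z-score every variable, as the Stata loop does (ddof=1 matches Stata's r(sd))
Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
yz = (y - y.mean()) / y.std(ddof=1)
b_std = ols(Xz, yz)

# Standardized slopes should equal the raw slopes rescaled by sd(x) / sd(y)
beta_from_raw = b_raw[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)
print(b_std[1:])
print(beta_from_raw)
```

With an intercept in the model, the intercept of the standardized fit is zero, so only the slopes carry information.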

Related

How does a trained SVR model predict values?

I've been trying to understand how a model trained with support vector machines for regression predicts values. I have trained a model with sklearn.svm.SVR, and now I'm wondering how to "manually" predict the outcome for an input.
Some background: the model is trained with kernel SVR, using an RBF kernel and the dual formulation. So now I have arrays of the dual coefficients, the indices of the support vectors, and the support vectors themselves.
I found the function that is used to fit the hyperplane, but I've been unsuccessful in applying it to "manually" predict outcomes without the .predict function.
The few things I tried all involved dot products between the input (feature) array and all the support vectors.
If anyone ever needs this, I've managed to understand the equation and code it in python.
The following is the equation used for the dual formulation:

$$f(x) = \sum_{i=1}^{N} \alpha_i y_i \, x_i^{T} x + b$$
where N is the number of observations, and α_i multiplied by y_i are the dual coefficients, available in the model attribute model.dual_coef_ (strictly, for SVR scikit-learn stores the difference α_i − α_i* there, which plays the same role). Only support vectors have non-zero coefficients, so in practice the sum runs over them. The x_i^T are the training observations that became support vectors, accessed through the attribute model.support_vectors_ (transposed to allow multiplication of the two matrices); x is the input vector containing a value for each feature (the single observation we want a prediction for); and b is the intercept, accessed through model.intercept_.
The x_i^T and x, however, are the observations transformed into a higher-dimensional space, as explained by mery in this post.
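Concretely, replacing the dot product x_i^T x with the RBF kernel K(x_i, x) = exp(−γ‖x_i − x‖²), where γ is the model's gamma parameter, gives the expression that the code computes. Here c_i is notation introduced only for this sketch and stands for the entries of model.dual_coef_:

```latex
f(x) = \sum_{i \in \mathrm{SV}} c_i \exp\!\left(-\gamma \lVert x_i - x \rVert^2\right) + b
```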
The RBF transformation can be computed either manually, step by step, or with sklearn.metrics.pairwise.rbf_kernel.
With the latter, the code looks like this (in my case there are 589 support vectors and 40 features).
First we access the coefficients and vectors:
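import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

support_vectors = model.support_vectors_   # shape (589, 40)
dual_coefs = model.dual_coef_[0]           # shape (589,)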
Then:
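# Assumes gamma was set to a number when the model was created; with
# gamma='scale' or 'auto', get_params() returns that string, and the fitted
# value is held in the private attribute model._gamma instead.
pred = (np.matmul(dual_coefs.reshape(1, 589),
                  rbf_kernel(support_vectors.reshape(589, 40),
                             Y=input_array.reshape(1, 40),
                             gamma=model.get_params()['gamma'])
                  )
        + model.intercept_
        )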
If the RBF function needs to be applied manually, step by step, then:
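# Differences between each support vector and the input, shape (589, 40)
vrbf = support_vectors.reshape(589, 40) - input_array.reshape(1, 40)
# The diagonal of vrbf @ vrbf.T holds the squared distances ||x_i - x||^2;
# exp is elementwise, so taking the diagonal afterwards gives exp(-gamma * ||x_i - x||^2)
pred = (np.matmul(dual_coefs.reshape(1, 589),
                  np.diag(np.exp(-model.get_params()['gamma'] *
                                 np.matmul(vrbf, vrbf.T)
                                 )
                          ).reshape(589, 1)
                  )
        + model.intercept_
        )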
I used .reshape() even where it is not necessary, just to make the shapes in the matrix operations explicit.
Both give the same result as model.predict(input_array), with input_array of shape (1, 40).
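A self-contained way to check this equivalence is sketched below on synthetic data. The data, the 5 features and gamma=0.2 are illustrative assumptions, not the original 589-support-vector model; gamma is set explicitly so that get_params()['gamma'] returns a number. The sketch computes the prediction with rbf_kernel and by hand, and compares both with model.predict.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVR

rng = np.random.default_rng(42)
X_train = rng.normal(size=(100, 5))
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.normal(size=100)

# gamma given as a number so get_params()['gamma'] is usable directly
model = SVR(kernel="rbf", gamma=0.2, C=1.0).fit(X_train, y_train)

x_new = rng.normal(size=(1, 5))  # one observation to predict

sv = model.support_vectors_         # (n_SV, n_features)
coef = model.dual_coef_[0]          # (n_SV,)
gamma = model.get_params()["gamma"]
b = model.intercept_[0]

# 1) Kernel values from the sklearn helper, shape (n_SV,)
k = rbf_kernel(sv, x_new, gamma=gamma)[:, 0]
pred_helper = coef @ k + b

# 2) Kernel values by hand: exp(-gamma * ||x_i - x||^2)
sq_dist = np.sum((sv - x_new) ** 2, axis=1)
pred_manual = coef @ np.exp(-gamma * sq_dist) + b

pred_sklearn = model.predict(x_new)[0]
print(pred_helper, pred_manual, pred_sklearn)
```

Computing the squared distances with a row-wise sum avoids building the full 589×589 matrix that the diagonal-based version creates.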

Log transformed data in GAM, how to plot response?

I used log-transformed data (the dependent variable is a count) in my generalised additive model (using mgcv) and tried to plot the response using "trans=plogis", as for logistic GAMs, but the results don't seem right. Am I forgetting something here? When I first used linear models for my data, I plotted the least-squares means. Any idea how I could plot the output of my GAMs in a more interpretable way than on the log scale?
Cheers
Are you running a logistic regression for count data? Logistic regression is normally used for a binary outcome or a proportion of binary outcomes.
That being said, the real question here is how to back-transform a variable that was fit on the log scale to the original scale for plotting. This is easy to do with the itsadug package. I've simulated some silly data here just to show the code required.
With itsadug, you can visually inspect many aspects of GAM models. I'd encourage you to look at this: https://cran.r-project.org/web/packages/itsadug/vignettes/inspect.html
The transform argument of plot_smooth() can also be used with custom functions written in R. This can be useful if you have both centred and logged a dependent variable.
library(mgcv)
library(itsadug)
# Setting seed so it's reproducible
set.seed(123)
# Generating 50 samples from a uniform distribution
x <- runif(50, min = 20, max = 50)
# Taking the sin of x to create a dependent variable
y <- sin(x)
# Binding them to a dataframe
d <- data.frame(x, y)
# Logging the dependent variable after adding 1, so that the values passed to log() are positive
d$log_y <- log(d$y + 1)
# Fitting a GAM to the transformed dependent variable
model_fit <- gam(log_y ~ s(x),
                 data = d)
# Using plot_smooth from itsadug to back-transform to the original y scale:
# exp() undoes the log, and subtracting 1 undoes the added constant
plot_smooth(model_fit,
            view = "x",
            transform = function(v) exp(v) - 1)
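Alternatively, with mgcv's own plot.gam you can pass a back-transforming function through its trans argument, e.g. trans = function(x){exp(coef(gam)[1] + x)}, where gam is your fitted model and coef(gam)[1] is its intercept. The intercept is added because plot.gam draws each smooth centred on zero, without the intercept.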

Why is statsmodels.api producing R^2 of 1.000?

I'm using statsmodels to do simple and multiple linear regression, and I'm getting bad R^2 values from the summary. The coefficients look to be calculated correctly, but I get an R^2 of 1.000, which is impossible for my data. I graphed it in Excel and I should be getting around 0.93, not 1.
I'm using a mask to filter the data I send into the model and I'm wondering if that could be the issue, but to me the data look fine. I am fairly new to Python and statsmodels, so maybe I'm missing something here.
import statsmodels.api as sm

for i, df in enumerate(fallwy_xy):  # Iterate through list of dataframes
    if len(df.index) > 0:  # Check if frame is empty or not
        mask3 = (df['fnu'] >= low)  # Mask data below 'low' variable
        valid3 = df[mask3]
        if len(valid3) > 0:  # Check if there is data in range of mask3
            X = valid3[['logfnu', 'logdischarge']]
            y = valid3[['logssc']]
            estm = sm.OLS(y, X).fit()
            X = valid3[['logfnu']]
            y = valid3[['logssc']]
            ests = sm.OLS(y, X).fit()
I finally found out what was going on. By default, statsmodels does not include a constant (intercept) in its OLS regression equation; you have to add it explicitly with
X = sm.add_constant(X)
The constant matters because without it statsmodels calculates R-squared differently: it uses the uncentered version, 1 − SSR / Σ y_i², instead of the centered version, 1 − SSR / Σ (y_i − ȳ)², which is the way most people calculate R-squared. If you add a constant, statsmodels switches to the centered version. Excel does not change the way it calculates R-squared depending on whether a constant is included, which is why the R-squared statsmodels reported for my no-constant model was so different from Excel's. The statsmodels OLS regression summary actually indicates when it uses the uncentered, no-constant calculation: the label reads R-squared (uncentered): where R-squared appears in the summary table. The links below helped me figure this out.
add hasconstant indicator for R-squared and df calculations
Same model coeffs, different R^2 with statsmodels OLS and sci-kit learn linearregression
Warning : Rod Made a Mistake!

sklearn customized standarization of data

Suppose I have a 2D numpy array:
X = np.array([
    [..., ...],
    [..., ...]])
And I want to standardize the data either with:
X = StandardScaler().fit_transform(X)
or:
X = (X - X.mean())/X.std()
The results are different. Why are they different?
Assume X is a feature matrix of shape (n x m) (n instances and m features). We want to scale each feature so that its instances have a mean of zero and unit variance.
To do this you need to calculate the mean and standard deviation of each feature (each column of X) over the provided instances, and then compute the scaled feature vectors. Currently you are calculating the mean and standard deviation of the whole dataset and scaling the data with those values: this gives meaningless results in all but a few special cases (e.g. when every column already has the same mean and the same standard deviation).
In practice, to calculate these statistics for each feature you need to set the axis parameter of the .mean() and .std() methods to 0. This performs the calculations along the columns and returns a (1 x m) shaped array (actually an (m,) array, but that's another story), where each value is the mean or standard deviation of the corresponding column. You can then use NumPy broadcasting to scale the feature vectors correctly.
The below example shows how you can correctly implement it manually. x1 and x2 are 2 features with 100 training instances. We store them in a feature matrix X.
import numpy as np
from sklearn.preprocessing import StandardScaler

x1 = np.linspace(0, 100, 100)
x2 = 10 * np.random.normal(size=100)
X = np.c_[x1, x2]
# scale the data using the sklearn implementation
X_scaled = StandardScaler().fit_transform(X)
# scale the data taking mean and std along columns
X_scaled_manual = (X - X.mean(axis=0)) / X.std(axis=0)
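If you print the two, you will see that they match up to floating-point rounding; explicitly:
print(np.allclose(X_scaled, X_scaled_manual))
prints True. Both use the population standard deviation (ddof=0, NumPy's default), which is why they agree.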
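To see concretely why the whole-matrix version goes wrong, here is a small sketch with two simulated features on very different scales (the data are illustrative assumptions). After scaling with global statistics, the individual columns are neither centred nor unit-variance; after per-column scaling they are.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two features on very different scales
X = np.column_stack([rng.normal(0, 1, 500), rng.normal(1000, 50, 500)])

# Global statistics: one mean and one SD for the whole matrix
X_global = (X - X.mean()) / X.std()
# Per-feature statistics: one mean and one SD per column
X_cols = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_global.mean(axis=0), X_global.std(axis=0))  # columns neither centred nor unit-variance
print(X_cols.mean(axis=0), X_cols.std(axis=0))      # approximately [0, 0] and [1, 1]
```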

Defining new kernels to use in approximate_kernel.py

I am trying to test a new kernel method in Kernel Ridge Regression and want to do this by implementing the Fastfood transformation (https://arxiv.org/abs/1408.3060). I can write a function which computes this transform but it isn't playing nicely with the kernel ridge regression function in sklearn. As a result I have gone to the source code for sklearn kernel ridge regression (https://insight.io/github.com/scikit-learn/scikit-learn/blob/master/sklearn/kernel_ridge.py) and approximate_kernel.py (https://insight.io/github.com/scikit-learn/scikit-learn/blob/master/sklearn/kernel_approximation.py) in order to try and define this new kernel as a class definition in approximate_kernel.py. The problem is that I have no idea how to convert my construction to something which will work in the approximate_kernel KernelRidge programs. Would anybody be able to advise how best to do this please?
My construction for the fastfood transform is:
import numpy as np
from scipy.linalg import hadamard

def fastfood_product(d):
    '''
    Constructs the fastfood matrix composition V = const*S*H*G*Pi*H*B where
    S is a scaling matrix
    H is Hadamard transform
    G is a diagonal random Gaussian
    Pi is a permutation matrix
    B is a diagonal Rademacher matrix.
    Inputs: d - dimensionality of the feature vectors for the kernel.
            Must be a power of two (required by hadamard). If not, the data
            can be padded with zeros, but for simplicity assume this
            condition is always met.
    Output: V'''
    S = np.zeros(shape=(d, d))
    G = np.zeros_like(S)
    B = np.zeros_like(S)
    H = hadamard(d)
    Pi = np.eye(d)
    np.random.shuffle(Pi)  # Permutation matrix (rows of the identity shuffled)
    # Construct the simple matrices
    np.fill_diagonal(B, 2*np.random.randint(low=0, high=2, size=d) - 1)  # random +1/-1 entries
    np.fill_diagonal(G, np.random.randn(d))  # May want to change standard normal to arbitrary, which will affect the scaling for V
    np.fill_diagonal(S, np.linalg.norm(G, 'fro')**(-0.5))
    #print('Shapes of B {}, S {}, G {}, H{}, Pi {}'.format(B.shape, S.shape, G.shape, H.shape, Pi.shape))
    V = d**(-0.5)*S.dot(H).dot(G).dot(Pi).dot(H).dot(B)
    return V

def fastfood_feature_map(X, n):
    '''Given a matrix X of data, compute the fastfood transformation and feature mapping.
    Input: X data of dimension d by m, n = the number of nonlinear basis functions to choose (power of 2)
    Outputs: Phi - matrix of random features for fastfood kernel approximation.
    Usage: Phi must be transposed for computation in the kernel ridge regression,
           i.e. solve ||Phi.T * w - b|| + regulariser
    Comments: This only uses a standard normal distribution, but this could
              be altered with different hyperparameters. Only one d x d block
              is built, so n changes the scaling but not the number of features.'''
    d, m = X.shape
    V = fastfood_product(d)
    Phi = n**(-0.5)*np.exp(1j*np.dot(V, X))
    return Phi
The code above needs the imports import numpy as np and from scipy.linalg import hadamard (hadamard lives in scipy.linalg).
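One practical point about the construction above: fastfood_product builds H as a dense d×d matrix with hadamard(d), so applying V costs O(d²). Fastfood's speed-up comes from applying H with a fast Walsh-Hadamard transform in O(d log d) instead. A minimal sketch of that transform follows; fwht is a hypothetical helper name, not a SciPy or scikit-learn function, and it is checked against scipy.linalg.hadamard.

```python
import numpy as np
from scipy.linalg import hadamard


def fwht(x):
    """Fast Walsh-Hadamard transform: returns hadamard(d) @ x in O(d log d).

    x is a 1-D array whose length d is a power of two. The result is
    unnormalised and uses the Sylvester ordering of scipy.linalg.hadamard.
    """
    x = np.array(x, dtype=float)  # work on a copy so the input is untouched
    d = x.shape[0]
    if d & (d - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < d:
        for start in range(0, d, 2 * h):
            a = x[start:start + h].copy()
            b = x[start + h:start + 2 * h].copy()
            x[start:start + h] = a + b          # butterfly: sum
            x[start + h:start + 2 * h] = a - b  # butterfly: difference
        h *= 2
    return x


v = np.random.default_rng(0).normal(size=16)
print(np.allclose(fwht(v), hadamard(16) @ v))  # True
```

In a Fastfood implementation the diagonal matrices S, G and B reduce to elementwise multiplications and Pi to an index permutation, so V can be applied to a vector without ever forming it.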
