How can I include a regularization in a PuLP optimization? - Monte Carlo

I wanted to know if there are ways to include a regularization term (e.g. a Tikhonov regularization) in an optimization with PuLP.
And if this is not easy, is there another smart way to include uncertainties in the function that should be minimized? If I have an objective like min(price1 * x1 + price2 * x2) and I want to include a range for price1 and price2 (x1 and x2 are the decision variables to be found), what is a smart way to do this? My first thought was to run a set of simulations and randomize price1 and price2, but I guess this does not have the same effect as a regularization.
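To illustrate the simulation idea, here is a rough sketch of what I had in mind (the price ranges, the demand constraint, and the number of trials are all made up). As far as I understand, a true Tikhonov (quadratic) penalty cannot go directly into the objective, since PuLP only models linear (and integer) programs, which is why I am looking for an alternative:

import random
import pulp

# Made-up price ranges for the Monte Carlo idea.
price1_range = (8.0, 12.0)
price2_range = (4.0, 6.0)

solutions = []
for _ in range(100):
    # Draw one random price scenario.
    price1 = random.uniform(*price1_range)
    price2 = random.uniform(*price2_range)

    prob = pulp.LpProblem("price_scenario", pulp.LpMinimize)
    x1 = pulp.LpVariable("x1", lowBound=0)
    x2 = pulp.LpVariable("x2", lowBound=0)

    prob += price1 * x1 + price2 * x2   # objective
    prob += x1 + x2 >= 10               # made-up demand constraint
    prob.solve(pulp.PULP_CBC_CMD(msg=False))

    solutions.append((x1.value(), x2.value()))

# See how much the optimal (x1, x2) moves around across price scenarios.
print(solutions[:5])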
Thanks in advance!

Related

Implications of regressing Y = f(x1, x2) where Y = x1 + x2 + x3

In various papers I have seen regressions of the sort Y = f(x1, x2), where f() is usually a simple OLS and, importantly, Y = x1 + x2 + x3. In other words, the regressors are exactly a part of Y.
These papers used the regressors as a way to describe the data rather than isolating a causal effect between X and Y. I was wondering what the implications of the above strategy are. To begin with, do the numbers / significance tests make any sense at all? Thanks.
I understand that the approach mechanically fails if the regressors included in the analysis completely describe Y, for obvious reasons (perfect collinearity). However, I would like to better understand the implications of only including some of the x's.
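A tiny simulation of the setup (made-up data, not from the papers), just to make the mechanics concrete:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1, x2, x3 = rng.normal(size=(3, 500))
Y = x1 + x2 + x3                      # Y is mechanically the sum of its parts

# Regress Y on only two of its components (x3 is left out).
X = sm.add_constant(np.column_stack([x1, x2]))
# Coefficients on x1 and x2 come out near 1; the omitted x3 plays the role of the error term.
print(sm.OLS(Y, X).fit().summary())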

Multiple regression correlation effect

I would like to investigate the effects of two independent variables on a dependent variable. Suppose we have X1, X2 independent variables, and Y dependent variable.
I use two different approaches. In the first approach, to eliminate the effect of X1 on Y, I generate the conditional distribution of Y|X1 and perform a regression using the second variable X2. When I check the correlation between X2 and Y|X1, I obtain a relatively high correlation (R2 > 0.50). However, when I perform multiple regression over a wide range of data (X1 and X2), the effect of X2 on Y decreases and becomes insignificant. Why do these approaches give conflicting results? What is the most appropriate approach to determine the effect of X2 on Y for a given X1 value? Thanks.
It could be good to see the code or the above in mathematical notation.
For instance: did you include the constant terms?
What do you see when:
Y = B0 + B1X1 + B2X2
That will be the easiest to check, and B2 will probably give you what you want.
That model is still simple, you could explore something like:
Y = B0 + B1X1 + B2X2 + B3X1X2
or
Y = B0 + B1X1 + B2X2 + B3X1X2 + B4X1^2 + B5X2^2
And see if there are changes in the coefficients and if there are new significant coefficients.
You could go further and explore Structural Equation Models
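Going back to the nested models above, a minimal sketch of fitting them in Python with statsmodels; the data frame and the coefficient values are made up, so substitute your own X1, X2 and Y:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up data standing in for the real X1, X2, Y.
rng = np.random.default_rng(1)
df = pd.DataFrame({"X1": rng.normal(size=200), "X2": rng.normal(size=200)})
df["Y"] = 2 + 1.5 * df["X1"] + 0.5 * df["X2"] + rng.normal(size=200)

m1 = smf.ols("Y ~ X1 + X2", data=df).fit()                        # B0 + B1*X1 + B2*X2
m2 = smf.ols("Y ~ X1 * X2", data=df).fit()                        # adds the X1:X2 interaction
m3 = smf.ols("Y ~ X1 * X2 + I(X1**2) + I(X2**2)", data=df).fit()  # adds quadratic terms

# Compare coefficients and their significance across the nested models.
for m in (m1, m2, m3):
    print(m.summary())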

Lasso regression, no variable was dropped

I am performing lasso regression in R for a binary response variable.
I am using cv.glmnet to find the best lambda and glmnet to check the coefficients at that best lambda. When calling both functions, I specify standardize = TRUE and alpha = 1.
I have about 40 variables, and I am sure some of them are strongly correlated with each other, judging from scatterplots and VIF (from when I was performing logistic regression on the same data).
The best lambda I got from my lasso regression is < 0.001, and no variable is dropped in the best model (with lambda = best lambda).
I am wondering why no variable was dropped.
Basically it's because your lambda value is too small. lambda < 0.001 means that your penalty is so small that it really doesn't matter at all. Look at this "stupid" example:
Let's generate some random sample data. Note that the variables z and z1 are strongly correlated.
library(glmnet)
z<-rnorm(100)
data<-data.frame(y=3+rnorm(100),x1=rnorm(100),x2=rnorm(100),x3=rnorm(100),x4=rnorm(100),x5=rnorm(100),
x6=rnorm(100),x7=rnorm(100),x8=rnorm(100),x9=rnorm(100),x10=rnorm(100),z=z,z1=z+rnorm(100,0,0.3))
Now run some models:
gl<-glmnet(y=data$y,x=as.matrix(data[,-1]),alpha = 1)
plot(gl,xvar="lambda")
lambda equal to 0.001 means log(lambda) = -6.907755, and even in this "stupid" example, where we would expect the coefficients not to be significant (so their values should be shrunk to 0), we get small but nonzero values (as in the plot).
The coefficients from glmnet with lambda = 0.001 are very similar to those from glm (like I said, a small lambda means essentially no penalty on the log-likelihood):
gl1<-glmnet(y=data$y,x=as.matrix(data[,-1]),alpha = 1,lambda=0.001)
gl2<-glm(data=data,formula=y~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+z+z1)
gl1$beta
# x1 -0.10985215
# x2 -0.12337595
# x3 0.06088970
# x4 -0.12714515
# x5 -0.12237959
# x6 -0.01439966
# x7 0.02037826
# x8 0.22288055
# x9 -0.10131195
# x10 -0.04268274
# z -0.04526606
# z1 0.04628616
gl2$coefficients
(Intercept) x1 x2 x3 x4 x5 x6
2.98542594 -0.11104062 -0.12478162 0.06293879 -0.12833484 -0.12385855 -0.01556657
x7 x8 x9 x10 z z1
0.02071605 0.22408006 -0.10195640 -0.04419441 -0.04602251 0.04513612
Now look at the difference between the coefficients from the two methods:
as.vector(gl1$beta)-as.vector(gl2$coefficients)[-1]
# [1] 0.0011884697 0.0014056731 -0.0020490872 0.0011896872 0.0014789566 0.0011669064
# [7] -0.0003377824 -0.0011995019 0.0006444471 0.0015116774 0.0007564556 0.00115004

Clustering objects with weighted attributes

I want to cluster a set of objects which have multiple attributes, where some attributes are more important than others.
Is there a simple way to give these specific attributes a heavier weight, so that they get more importance than the others?
Look - every object in your set can be represented as a multidimensional vector (each attribute of the object is one component of the vector). So you can use distance-based clustering (the distance between similar vectors is very small), such as k-means, with your own distance function between vectors.
For example, suppose your objects have 3 attributes (X, Y, Z) and each attribute has its own weight (importance) (wx, wy, wz).
Accordingly, you could for example define the distance function between two vectors (X1, Y1, Z1) and (X2, Y2, Z2) in the following way (a weighted cosine):
dist = (wx^2*X1*X2 + wy^2*Y1*Y2 + wz^2*Z1*Z2) / sqrt((wx^2*X1^2 + wy^2*Y1^2 + wz^2*Z1^2) * (wx^2*X2^2 + wy^2*Y2^2 + wz^2*Z2^2))
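Strictly speaking that formula is a cosine similarity (1 for identical directions), so 1 - dist would serve as the distance. In practice you can get the same weighting effect by simply multiplying each attribute column by its weight and running an ordinary Euclidean k-means. A minimal sketch with scikit-learn, using made-up data and weights:

import numpy as np
from sklearn.cluster import KMeans

# Made-up data: 6 objects with 3 attributes (X, Y, Z).
objects = np.array([
    [1.0, 200.0, 3.0],
    [1.2, 210.0, 2.8],
    [0.9, 195.0, 3.1],
    [5.0, 40.0, 9.0],
    [5.5, 35.0, 8.7],
    [4.8, 42.0, 9.3],
])

# Hypothetical importance weights for the attributes (wx, wy, wz).
weights = np.array([3.0, 1.0, 0.5])

# Scaling each column by its weight means important attributes
# contribute more to the distances used by k-means.
weighted = objects * weights

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(weighted)
print(labels)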

Given a set of points, how do I approximate the major axis of its shape?

Given a "shape" drawn by the user, I would like to "normalize" it so they all have similar size and orientation. What we have is a set of points. I can approximate the size using bounding box or circle, but the orientation is a bit more tricky.
The right way to do it, I think, is to calculate the majoraxis of its bounding ellipse. To do that you need to calculate the eigenvector of the covariance matrix. Doing so likely will be way too complicated for my need, since I am looking for some good-enough estimate. Picking min, max, and 20 random points could be some starter. Is there an easy way to approximate this?
Edit:
I found the power method for iteratively approximating an eigenvector (Wikipedia article).
So far I am liking David's answer.
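A quick sketch of that power iteration on the 2x2 covariance matrix (NumPy; the function name and iteration count are my own):

import numpy as np

def power_method_axis(points, iters=50):
    # Dominant eigenvector of the covariance matrix ~ major-axis direction.
    pts = points - points.mean(axis=0)   # center the points first
    C = np.cov(pts.T)                    # 2x2 covariance matrix
    v = np.array([1.0, 0.1])             # arbitrary starting vector
    for _ in range(iters):
        v = C @ v
        v = v / np.linalg.norm(v)        # renormalize after each multiplication
    return v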
You'd be calculating the eigenvectors of a 2x2 matrix, which can be done with a few simple formulas, so it's not that complicated. In pseudocode:
// sums are over all points
b = (sum(x * x) - sum(y * y)) / (2 * sum(x * y))
evec1_x = b + sqrt(b ** 2 + 1)
evec1_y = 1
evec2_x = b - sqrt(b ** 2 + 1)
evec2_y = 1
You could even do this by summing over only some of the points to get an estimate, if you expect that your chosen subset of points would be representative of the full set.
Edit: I think x and y must be translated to zero mean first, i.e. subtract the mean from all x and y before computing the sums (eed3si9n).
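A small runnable version of the pseudocode above (NumPy only; the helper name, the tie-breaking for axis-aligned shapes, and the way the major direction is picked are my own):

import numpy as np

def major_axis_estimate(points):
    """Estimate the major-axis direction of an (n, 2) array of points."""
    pts = points - points.mean(axis=0)          # zero-mean, as noted in the edit
    sxx = np.sum(pts[:, 0] * pts[:, 0])
    syy = np.sum(pts[:, 1] * pts[:, 1])
    sxy = np.sum(pts[:, 0] * pts[:, 1])
    if np.isclose(sxy, 0.0):                    # axis-aligned shape
        return np.array([1.0, 0.0]) if sxx >= syy else np.array([0.0, 1.0])
    b = (sxx - syy) / (2.0 * sxy)
    # The two eigenvector directions; keep the one with the larger spread
    # of the points along it (i.e. the larger eigenvalue).
    candidates = [np.array([b + np.sqrt(b * b + 1.0), 1.0]),
                  np.array([b - np.sqrt(b * b + 1.0), 1.0])]
    spreads = [np.sum((pts @ v / np.linalg.norm(v)) ** 2) for v in candidates]
    v = candidates[int(np.argmax(spreads))]
    return v / np.linalg.norm(v)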
Here's a thought... What if you performed a linear regression on the points and used the slope of the resulting line? If not all of the points, at least a sample of them.
The r^2 value would also give you information about the general shape. The closer to 0, the more circular/uniform the shape is (circle/square). The closer to 1, the more stretched out the shape is (oval/rectangle).
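A quick sketch of that idea with NumPy (the example points are made up; note that an ordinary regression fit breaks down for near-vertical shapes):

import numpy as np

# Made-up example points roughly along a diagonal stroke.
pts = np.array([[0.0, 0.1], [1.0, 0.9], [2.0, 2.2], [3.0, 2.9], [4.0, 4.1]])

slope, intercept = np.polyfit(pts[:, 0], pts[:, 1], 1)
angle = np.arctan(slope)                          # approximate orientation
r = np.corrcoef(pts[:, 0], pts[:, 1])[0, 1]       # r^2 near 1 -> elongated shape
print(angle, r ** 2)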
The ultimate solution to this problem is running PCA
I wish I could find a nice little implementation for you to refer to...
Here you go! (assuming x is an n-by-2 NumPy array)
import numpy as np

def majAxis(x):
    e, v = np.linalg.eig(np.cov(x.T))   # eigen-decompose the 2x2 covariance matrix
    return v[:, np.argmax(e)]           # eigenvector with the largest eigenvalue = major axis
