Intercept in linear regression - scikit-learn

I am new to machine learning and I am confused about what the intercept parameter of linear regression does.
When I set fit_intercept=False, I get a .coef_ value of 287.986236; however, when I set fit_intercept=True, I get a .coef_ value of 225.81285046.
Why is there a difference? I am also not sure how to interpret the results and compare these values.
lm = LinearRegression(fit_intercept=False).fit(REStaten_[['GROSS_SQUARE_FEET']], REStaten_['SALE_PRICE'])
lm.coef_
# 287.986236
lm = LinearRegression(fit_intercept=True).fit(REStaten_[['GROSS_SQUARE_FEET']], REStaten_['SALE_PRICE'])
lm.coef_
# 225.81285046

Slope and intercept are two central concepts in linear regression.
The slope indicates the steepness of the line, and the intercept indicates where the line crosses the y-axis.
If you set fit_intercept=False, no intercept is used in the calculation (the data is expected to be already centered), so the fitted line is forced through the origin.
When you fit a linear regression model to a dataset, it looks for the "line of best fit" by adjusting the slope and intercept values.
You are getting different .coef_ values because the intercept is disabled in your first attempt and enabled in your second: without an intercept, the slope alone has to account for the whole fit, so its value changes.
Hope this helps. For more information, see the scikit-learn documentation:
Sk Learn Linear regression
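To see the effect on made-up data, here is a minimal sketch. The synthetic numbers (a true slope of 200 and a true intercept of 100,000) are assumptions chosen for illustration; they are not the asker's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumed for illustration): price = 100000 + 200 * area + noise
rng = np.random.default_rng(0)
X = rng.uniform(500, 3000, size=(200, 1))
y = 100_000 + 200 * X[:, 0] + rng.normal(0, 10_000, size=200)

# With an intercept, the slope only has to explain how price changes with area.
with_intercept = LinearRegression(fit_intercept=True).fit(X, y)

# Without an intercept, the line must pass through (0, 0), so the slope
# is pulled upward to make up for the missing constant term.
through_origin = LinearRegression(fit_intercept=False).fit(X, y)

print("slope with intercept:   ", with_intercept.coef_[0])
print("fitted intercept:       ", with_intercept.intercept_)
print("slope without intercept:", through_origin.coef_[0])
```

The two slopes answer different questions, so they are not directly comparable: one is the slope of the best line overall, the other is the slope of the best line that is forced through the origin.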

Related

how to calculate BIC, score... WITHOUT fit?

I know that with scikit-learn we can easily calculate the BIC or score for a Gaussian mixture model, as shown below.
clf.fit(data)
bic=clf.bic(data)
score=clf.score(data)
http://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html
But my question is: how can I calculate the BIC or score WITHOUT using the fit method, when I already have the weights, means, covariances and data?
I could set as
clf = mixture.GaussianMixture(n_components=3, covariance_type='full')
clf.weights_=weights_list
clf.means_=means_list
clf.covariances_=covariances_list
or
clf.weights_init=weights_list
clf.means_init=means_list
clf.precisions_init =np.linalg.inv(covariances_list)
but when I try to get bic,
bic=clf.bic(data)
I get error message saying
sklearn.exceptions.NotFittedError: This GaussianMixture instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.
I don't want to run fit, because it will change the given weights, means and covariances.
What can I do?
Thanks
You need to set these three attributes to pass the check_is_fitted test: 'weights_', 'means_' and 'precisions_cholesky_'. You are already setting 'weights_' and 'means_' correctly. To calculate 'precisions_cholesky_' you need the covariances, which you do have.
So, just calculate it using this helper:
# Private helper: in older scikit-learn releases the module was sklearn.mixture.gaussian_mixture
from sklearn.mixture._gaussian_mixture import _compute_precision_cholesky
precisions_cholesky = _compute_precision_cholesky(covariances_list, 'full')
Change 'full' to the appropriate covariance type and then set the result on clf using
clf.precisions_cholesky_ = precisions_cholesky
Make sure the shapes of all these variables match your data.
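If you would rather not import a private helper, the same quantity can be computed with NumPy for the 'full' covariance type. The sketch below is a hedged illustration: the parameters are taken from a reference fit only so that the result can be checked, and it relies on GaussianMixture reading these attributes internally, which is an assumption that may not hold in every scikit-learn version.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy data (assumed for illustration): two well-separated 2-D blobs
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(-3, 1, size=(100, 2)),
                  rng.normal(3, 1, size=(100, 2))])

# Stand-in for parameters you already have; a reference fit is used here
# only to obtain plausible values and to verify the result afterwards.
reference = GaussianMixture(n_components=2, covariance_type="full",
                            random_state=0).fit(data)
weights_list = reference.weights_
means_list = reference.means_
covariances_list = reference.covariances_

# Build an unfitted model and set the fitted attributes by hand.
clf = GaussianMixture(n_components=2, covariance_type="full")
clf.weights_ = weights_list
clf.means_ = means_list
clf.covariances_ = covariances_list
# For each covariance C = L L^T (L lower triangular), store inv(L)^T.
clf.precisions_cholesky_ = np.array(
    [np.linalg.inv(np.linalg.cholesky(c)).T for c in covariances_list]
)

bic = clf.bic(data)
score = clf.score(data)
print(bic, score)
```

Because fit is never called on clf, the weights, means and covariances you supplied are left unchanged.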

Improving linear regression model by taking absolute value of predicted output?

I have a particular regression problem that I was able to improve using Python's abs() function. I am still somewhat new when it comes to machine learning, and I wanted to know if what I am doing is actually "allowed," so to speak, for improving a regression problem. The following lines describe my method:
lr = linear_model.LinearRegression()
predicted = abs(cross_val_predict(lr, features, labels_postop_IS, cv=10))
I attempted this solution because linear regression can sometimes produce negative predicted values, even though in my particular case these predictions should never be negative, as they are a physical quantity.
Using the abs() function, my predictions produce a better fit for the data.
Is this allowed?
Why would it not be "allowed"? I mean, if you want to make certain statistical statements (like a 95% CI, for example) you need to be careful. However, most ML practitioners do not care too much about underlying statistical assumptions and just want a black-box model that can be evaluated based on accuracy or some other performance metric. So basically everything is allowed in ML; you just have to be careful not to overfit. Maybe a more sensible solution to your problem would be to use a function that truncates at 0, like f(x) = x if x > 0 else 0. This way large negative values don't suddenly become large positive ones.
On a side note, you should probably try some other models with more parameters as well, like an SVR with a non-linear kernel. The thing is that an LR fits a line, and if this line is not parallel to your x-axis (thinking of the single-variable case) it will inevitably produce negative values at some point on the line. That's one reason why it is often advised not to use LRs for predictions outside the "fitted" data.
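A small sketch of the difference between abs() and truncating at 0, using made-up prediction values (the numbers are assumptions for illustration only):

```python
import numpy as np

# Hypothetical raw predictions from a linear model, some of them negative
raw = np.array([-5.0, -0.5, 0.0, 2.0])

# abs() mirrors negative predictions: -5.0 becomes a large positive 5.0
abs_pred = np.abs(raw)

# Truncation maps every negative prediction to 0 and leaves the rest alone
clipped = np.maximum(raw, 0.0)

print(abs_pred)
print(clipped)
```

With truncation, a strongly negative prediction ends up at the nearest physically valid value instead of being turned into a strongly positive one.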
A straight line y = a + bx will predict a negative y for some x unless b = 0 and a >= 0. Using a logarithmic scale for y seems like a natural way to fix this.
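Here is a minimal sketch of the logarithmic-scale idea with NumPy. The data points are made up (an assumption for illustration): a strictly positive quantity that halves at each step.

```python
import numpy as np

# Made-up positive, decaying data: y = 16 * 0.5**x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([8.0, 4.0, 2.0, 1.0, 0.5])
x_new = np.array([6.0, 8.0])

# Ordinary straight-line fit: y = a + b*x
b, a = np.polyfit(x, y, 1)
linear_pred = a + b * x_new  # goes negative beyond the data

# Fit the line to log(y) instead and transform back with exp()
b_log, a_log = np.polyfit(x, np.log(y), 1)
log_pred = np.exp(a_log + b_log * x_new)  # exp() is always positive

print(linear_pred)
print(log_pred)
```

This only works when every observed y is strictly positive, since log(0) is undefined.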
In the case of linear regression, there is no restriction on your outputs.
If your data is non-negative (as in your case, where the values are physical quantities and cannot be negative), you could use a generalized linear model (GLM) with a log link function. With a Poisson response distribution this is known as Poisson regression, and it is most natural when the target is a discrete non-negative count. The Poisson distribution is parameterized by a single value λ, which is both the expected value and the variance of the distribution.
With a log link, you are fitting a linear model to the log of the expected value of your observations, so the predictions are always positive.
I cannot say your approach is wrong, but the method above is a better way to go.
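As a rough sketch of this approach, scikit-learn provides PoissonRegressor, a GLM with a log link. The count data below is synthetic and the coefficients used to generate it are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

# Synthetic counts (assumed): expected value decays as exp(0.5 - 0.6 * x)
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(200, 1))
y = rng.poisson(np.exp(0.5 - 0.6 * X[:, 0]))

# alpha=0.0 switches off the default L2 penalty
model = PoissonRegressor(alpha=0.0).fit(X, y)

# Predictions are exp(linear function of X), so they can never be negative,
# even far outside the range of the training data.
X_new = np.array([[6.0], [10.0]])
pred = model.predict(X_new)
print(pred)
```

Whether a Poisson model is appropriate depends on the target; for a continuous physical quantity, a different GLM family or the log-transform shown earlier may fit better.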

How to get started with Tensorflow

I am pretty new to Tensorflow, and I am currently learning it through this website: https://www.tensorflow.org/get_started/get_started
The manual says:
We've created a model, but we don't know how good it is yet. To evaluate the model on training data, we need a y placeholder to provide the desired values, and we need to write a loss function.
A loss function measures how far apart the current model is from the provided data. We'll use a standard loss model for linear regression, which sums the squares of the deltas between the current model and the provided data. linear_model - y creates a vector where each element is the corresponding example's error delta. We call tf.square to square that error. Then, we sum all the squared errors to create a single scalar that abstracts the error of all examples using tf.reduce_sum:
q1. "We don't know how good it is yet." I didn't understand this quote. The model created is a simple slope equation, so what should it train for? Does it require a perfect slope, or what? Why am I training that model, and for what?
q2. What is a loss function? Is the loss function used to determine the accuracy of the model? Why is it required?
q3. I didn't understand "sums the squares of the deltas between the current model and the provided data."
q4. I didn't understand this part of the code: "squared_deltas = tf.square(linear_model - y)"
This is the code:
y = tf.placeholder(tf.float32)
squared_deltas = tf.square(linear_model - y)
loss = tf.reduce_sum(squared_deltas)
print(sess.run(loss, {x:[1,2,3,4], y:[0,-1,-2,-3]}))
These may be simple questions, but I am a beginner with Tensorflow and am having a hard time understanding it.
1) You're kind of right to ask "Why should we train for a simple problem?", but this is just an introductory piece. With any machine learning task you need to evaluate your model to see how good it is. In this case you are just training to find the coefficients for the line of best fit.
2) A loss function in any machine learning context represents the error of your model. This usually means a function of the "distance" between your calculated value and the ground truth value. Think of it as an internal evaluation score. You want to minimise your loss, so the gradients and parameter changes are based on your loss.
3/4) Your question here has more to do with least squares regression. It's a statistical method for creating lines of best fit between points. The deltas represent the differences between your calculated values and the true values. The aim is to minimise the total area of the squares and hence minimise the error and get a better line of best fit.
What you are doing in this Tensorflow example is creating a machine learning model that will learn the coefficients for the line of best fit automatically, using a least-squares-based system.
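The loss calculation can be reproduced in plain NumPy to see what each step does. The x and y values are the ones from the question; the starting values W = 0.3 and b = -0.3 are assumptions here, since the question does not show how linear_model was defined.

```python
import numpy as np

# Data fed into the placeholders in the question
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, -1.0, -2.0, -3.0])

def sum_squared_deltas(W, b):
    linear_model = W * x + b           # the model's prediction for each x
    deltas = linear_model - y          # error of each example
    squared_deltas = deltas ** 2       # what tf.square does
    return np.sum(squared_deltas)      # what tf.reduce_sum does

# Assumed starting parameters (hypothetical, not given in the question)
loss_start = sum_squared_deltas(0.3, -0.3)   # about 23.66

# W = -1, b = 1 reproduces y exactly, so every delta is 0
loss_perfect = sum_squared_deltas(-1.0, 1.0)

print(loss_start, loss_perfect)
```

A large loss means the line is far from the data, and a loss of 0 means the line passes through every point; training is the search for the W and b that make this number as small as possible.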
Pretty much all of your questions have to do with the loss function.
The loss function is a function that determines how far apart your outputs are from the expected (correct) outputs.
It has two uses:
Help the algorithm determine whether tweaking the weights is moving in a good or a bad direction
Measure how well the model is doing (related to accuracy, i.e. roughly how often your system guesses the correct answer)
Here the loss function is the sum of the squared deltas, where a delta is the difference between the actual output and the expected output.
I think it is squared to magnify the errors the algorithm makes; squaring also keeps positive and negative deltas from cancelling each other out.
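To illustrate the first use (telling the algorithm which direction to tweak the weights), here is a hedged sketch of plain gradient descent in NumPy on the data from the question. The starting values, learning rate and number of steps are assumptions for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, -1.0, -2.0, -3.0])

# Assumed starting point and hyperparameters (hypothetical)
W, b = 0.3, -0.3
learning_rate = 0.01
losses = []

for _ in range(1000):
    deltas = W * x + b - y
    losses.append(float(np.sum(deltas ** 2)))
    # Gradient of sum(deltas**2) with respect to W and b
    grad_W = 2 * np.sum(deltas * x)
    grad_b = 2 * np.sum(deltas)
    # Step against the gradient, i.e. in the direction that lowers the loss
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b

print(W, b, losses[0], losses[-1])
```

The loss shrinks as W and b approach -1 and 1, the line that fits the four points exactly.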

Can some coefficients be held constant during regression training in PySpark?

Is it possible to specify that certain coefficients should be held constant (at a pre-determined value) during the training of a regression model in PySpark?
For example, if I have the simple, single-feature data shown below, I can fit a straight line to it with linear regression and allow both coefficients to be fit. Then I get the green line.
However, if I somehow know the slope is 2.3, I can fix that coefficient to 2.3 and fit the intercept, which is the blue line.
This is a trivial example, but is there a means to do this in Spark (PySpark especially)?
Or is there a hook to add a custom cost function? (Then I could make the cost extremely large if certain coefficients are far from the expected value.)
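To make the fixed-slope idea concrete, here is a small NumPy sketch; it is not a PySpark API, and the data is made up for illustration. With the slope held at a known value, the least-squares intercept is simply the mean of y minus slope times x.

```python
import numpy as np

# Made-up single-feature data (assumed): true slope 2.3, true intercept 1.5
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.3 * x + 1.5 + np.array([0.2, -0.1, 0.0, 0.1, -0.2])

# Free fit: both coefficients are estimated
free_slope, free_intercept = np.polyfit(x, y, 1)

# Constrained fit: slope fixed at 2.3, only the intercept is estimated
fixed_slope = 2.3
fixed_intercept = np.mean(y - fixed_slope * x)

print(free_slope, free_intercept)
print(fixed_slope, fixed_intercept)
```

In principle the same trick carries over to any framework: subtract the known term from the label and fit the remaining coefficients on what is left.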

sklearn: AUC score for LinearSVC and OneSVM

One option of the SVM classifier (SVC) is probability, which is False by default. The documentation does not say what it does. Looking at the libsvm source code, it seems to do some sort of cross-validation.
This option does not exist for LinearSVC or OneClassSVM.
I need to calculate AUC scores for several SVM models, including these last two. Should I calculate the AUC score using the values of decision_function(X) as the thresholds?
Answering my own question.
Firstly, it is a common "myth" that you need probabilities to draw the ROC curve. No, you need some kind of threshold in your model that you can change. The ROC curve is then drawn by changing this threshold. The point of the ROC curve being, of course, to see how well your model is reproducing the hypothesis by seeing how well it is ordering the observations.
In the case of SVM, there are two ways I see people drawing ROC curves:
1) using the distance to the decision boundary, as I mentioned in my own question
2) using the bias term as your threshold in the SVM: http://researchgate.net/post/How_can_I_plot_determine_ROC_AUC_for_SVM. In fact, if you use SVC(probability=True), probabilities will be calculated for you using CV, and you can then use them to draw the ROC curve. But as mentioned in the link I provide, it is much faster to draw the ROC curve directly by varying the bias.
I think #2 is the same as #1 if we are using a linear kernel, as in my own case, because varying the bias is varying the distance in this particular case.
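A minimal sketch of option #1 with scikit-learn, on a synthetic dataset (the data and the train/test split are assumptions for illustration). roc_auc_score accepts non-thresholded decision values, so the output of decision_function can be passed in directly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic binary classification data (assumed for illustration)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LinearSVC(max_iter=10000).fit(X_train, y_train)

# Signed distance to the decision boundary, used as the ranking score
scores = clf.decision_function(X_test)
auc = roc_auc_score(y_test, scores)

# Rescaling the scores to [0, 1] keeps their order, so the AUC is unchanged
scaled = (scores - scores.min()) / (scores.max() - scores.min())
auc_scaled = roc_auc_score(y_test, scaled)

print(auc, auc_scaled)
```

The AUC depends only on how the scores order the observations, which is why no probabilities are needed.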
To calculate AUC using sklearn, you need a continuous score for each observation. One source is a predict_proba method on your classifier; this is what the probability parameter on SVC enables (you are correct that it is calculated using cross-validation). From the docs:
probability : boolean, optional (default=False)
Whether to enable probability estimates. This must be enabled prior to calling fit, and will slow down that method.
For models without predict_proba, you can use the decision function instead: roc_auc_score accepts non-thresholded decision values as well as probabilities. The decision function is not a probability, but AUC only depends on how the scores rank the observations, so scaling it to take values in the range [0,1] is not necessary and does not change the result.

Resources