I need to run a PyMC3 model in a loop to estimate/make predictions every month. How do you reset the Theano graph? I'm familiar with Tensorflow and I know this can be done, but googling doesn't seem to lead to any solutions. Alternatively, how are you meant to run a PyMC3 model in a loop?
I am not an expert on logistic regression, but I thought that solving it with lbfgs meant doing optimization, finding a local minimum of the objective function. But every time I run it using scikit-learn, it returns the same results, even when I feed it a different random state.
Below is code that reproduces my issue.
First set up the problem by generating data
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn import datasets
# generate data
X, y = datasets.make_classification(n_samples=1000,
                                    n_features=10,
                                    n_redundant=4,
                                    n_clusters_per_class=1,
                                    random_state=42)
# Set up the test/training data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
Second, train the model and inspect results
# Set up a different random state each time
rand_state = np.random.randint(1000)
print(rand_state)
model = LogisticRegression(max_iter=1000,
                           solver='lbfgs',
                           random_state=rand_state)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
conf_mat = metrics.confusion_matrix(y_test, y_pred)
print(y_pred[:20], "\n", conf_mat)
I get the same y_pred (and obviously confusion matrix) every time I run this even though I'm using the lbfgs solver with a different random state each run. I'm confused, as I thought this was a stochastic solver that was traveling down a gradient into a local minimum.
Maybe I'm not properly randomizing the initial state? I haven't been able to figure it out from the documentation.
Discussion of Related Question
There is a related question, which I didn't find during my research:
Does logistic regression always find global optimum, assuming that the optimisation converges?
The answer there is that the cost function is convex, so if the numerical solution is well-behaved, it will find a global minimum. That is, there aren't a bunch of local minima that your optimization algorithm will get stuck in: it will reach the same (global) minimum each time (perhaps depending on the solver you choose?).
However, in the comments someone pointed out that, depending on which solver you choose, there are cases where you will not reach the same solution, and that the result depends on the random_state parameter. At the very least, I think it would be helpful to resolve this.
First, let me put in the answer what got this closed as duplicate earlier: a logistic regression problem (without perfect separation) has a global optimum, and so there are no local optima to get stuck in with different random seeds. If the solver converges satisfactorily, it will do so on the global optimum. So the only time random_state can have any effect is when the solver fails to converge.
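For reference, here is a quick sketch of the convexity argument (assuming the default L2 penalty and labels recoded as $y_i \in \{-1, +1\}$). LogisticRegression minimizes, roughly,
$$\min_{w,\,c}\ \tfrac{1}{2}\,w^\top w \;+\; C \sum_{i=1}^{n} \log\bigl(1 + \exp\bigl(-y_i\,(x_i^\top w + c)\bigr)\bigr).$$
The penalty is a convex quadratic and each log-loss term is convex in $(w, c)$ (log-sum-exp composed with an affine map), so the whole objective is convex and any minimum the solver reaches is a global one.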
Now, the documentation for LogisticRegression's parameter random_state states:
Used when solver == ‘sag’, ‘saga’ or ‘liblinear’ to shuffle the data. [...]
So for your code, with solver='lbfgs', indeed there is no expected effect.
It's not too hard to make sag and saga fail to converge and, with different random_states, end up at different solutions; to make it easier, set max_iter=1. liblinear apparently does not use the random_state unless it is solving the dual problem, so also setting dual=True admits different solutions. I found that thanks to this comment on a GitHub issue (the rest of the issue may be worth reading for more background).
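To illustrate, here is a minimal sketch that reuses X_train and y_train from the question's code and deliberately cuts the solver off after one epoch; the exact coefficients are not meaningful, only the fact that they can differ between seeds:
# sketch: an under-iterated saga fit can land at different points for different seeds
import warnings
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
coefs = []
for seed in (0, 1):
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", ConvergenceWarning)  # max_iter=1 will not converge
        m = LogisticRegression(solver='saga', max_iter=1, random_state=seed)
        m.fit(X_train, y_train)
    coefs.append(m.coef_.copy())
print(coefs[0] - coefs[1])  # typically nonzero, since neither run converged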
I am new to XGBoost and I am currently working on a project where we have built an XGBoost classifier. Now we want to run some feature selection techniques. Is the backward elimination method a good idea for this? I have used it in regression, but I am not sure if/how to use it in a classification problem. Any leads will be greatly appreciated.
Note: I have already tried permutation importance and it has yielded good results! I'm looking for another method to evaluate the features in the model.
Consider asking your question on Cross Validated since feature selection is more about theory/practice than code.
What is your concern? Removing "noisy" features that drag down your results, or obtaining a sparse model? Backward selection is one way to do it, of course. That being said, in case you are not aware of it, XGBoost computes its own "variable importance" values.
# plot feature importance using built-in function
from xgboost import XGBClassifier
from xgboost import plot_importance
from matplotlib import pyplot
model = XGBClassifier()
model.fit(X, y)
# plot feature importance
plot_importance(model)
pyplot.show()
Something like this. This importance is based on how many times a feature is used to make a split. You can then, for instance, define a threshold below which you do not keep a variable (sketched below). However, do not forget that:
This variable importance has been obtained on the training data only.
Removing a variable with high importance may not affect your prediction error, e.g. if it is correlated with another highly important variable. Other tricks such as this one may exist.
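If you want to automate that thresholding step, here is a minimal sketch using scikit-learn's SelectFromModel; X and y are the same placeholders as in the snippet above, and the 0.01 cutoff is an arbitrary assumption that should be tuned and validated:
# sketch: keep only the features whose importance is at or above a chosen cutoff
from sklearn.feature_selection import SelectFromModel
from xgboost import XGBClassifier
model = XGBClassifier()
model.fit(X, y)
selector = SelectFromModel(model, threshold=0.01, prefit=True)
X_reduced = selector.transform(X)  # columns whose importance meets the cutoff
print(X.shape, "->", X_reduced.shape)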
I'm currently trying to implement YOLOv3 in TensorFlow, using the Estimator API. However, I'm stuck at the loss function. YOLOv3 makes predictions at three scales and I can't figure out how to calculate the loss for all of them. I've already looked at the paper and also tried to find the loss function in the darknet source code, but I can't figure it out. I've also looked at the loss function of another YOLOv3 TensorFlow implementation, but that hasn't really helped me understand the calculation of the loss either.
Can someone explain how exactly the loss for training is calculated while taking into account the predictions of all three scales?
Need a suggestion
I am trying to design a model to predict facial keypoints. It's part of a Kaggle competition (https://www.kaggle.com/c/facial-keypoints-detection).
In this solution, I am designing a CNN (using the Keras library) as a multi-variable regression model to predict the coordinates of the facial keypoints.
Issue faced --> I am getting the loss as "nan"
Solutions tried --
1. Tried optimizers: Adam and SGD
2. Tested learning rates from 0.01 down to 0.00001
3. Tried various batch sizes
Can anyone suggest if I am missing something? The code is at the link below:
https://www.kaggle.com/saurabhrathor/facialpoints-practice
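For concreteness, the sweep described in points 1 and 2 might look roughly like this (a sketch only, assuming tf.keras 2.x; build_model stands in for the CNN defined in the linked notebook, and X_train/y_train for its training arrays):
# sketch: try Adam and SGD across a range of learning rates and compare the final loss
from tensorflow.keras.optimizers import Adam, SGD
for opt_cls in (Adam, SGD):
    for lr in (1e-2, 1e-3, 1e-4, 1e-5):
        model = build_model()  # hypothetical builder returning a fresh, uncompiled CNN
        model.compile(optimizer=opt_cls(learning_rate=lr), loss='mse')
        history = model.fit(X_train, y_train, epochs=5, batch_size=32, verbose=0)
        print(opt_cls.__name__, lr, history.history['loss'][-1])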
I am using sklearn to train a model. The training dataset has about 3000k samples, so I use SGDClassifier. The features are not very good, so I know it may not converge. But I want SGDClassifier to stop early according to a setting of mine, something like max_iter = 1000. As far as I can tell, SGDClassifier has no parameter like max_iter. How can I do it?
This is the code.
This is the print information.
Any help will be appreciated...
This is weird: by default in scikit-learn 0.18.2, n_iter is set to 5 epochs. Can you please update your question with a script that makes it possible to reproduce the behavior using a toy dataset (for instance one generated with numpy.random.randn or similar)?
Note that in scikit-learn master and 0.19 once released, n_iter will be deprecated and replaced by max_iter and a tol (for instance set to 1e-3) to automatically stop when the objective function is no longer making progress.
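As a sketch of what that looks like once max_iter and tol are available (the toy data here is only for illustration):
# sketch: cap the number of epochs and stop early when the loss stops improving
import numpy as np
from sklearn.linear_model import SGDClassifier
X = np.random.randn(1000, 20)
y = (X[:, 0] + 0.1 * np.random.randn(1000) > 0).astype(int)
clf = SGDClassifier(max_iter=1000, tol=1e-3)  # stops once per-epoch improvement falls below tol
clf.fit(X, y)
print(clf.n_iter_)  # number of epochs actually run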
The 20 hours of running might not be so strange, since you have a dataset of 3000k samples and SGDClassifier is slow. What processor do you have?
Try stopping it with CTRL+C if you are on Windows. Then use n_iter to control the number of iterations you want. The default is 5, however.
Finally, if you want to save a model, see here:
Save and Load Machine Learning Models in Python with scikit-learn
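For completeness, a minimal sketch of persisting a fitted model with joblib (the filename is arbitrary, and clf stands for the fitted SGDClassifier from the sketch above):
# sketch: save a fitted scikit-learn model to disk and load it back later
import joblib
joblib.dump(clf, 'sgd_model.joblib')
restored = joblib.load('sgd_model.joblib')
print(restored.predict(X[:5]))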