I have a TensorFlow model that uses:
metrics = [tf.keras.metrics.SensitivityAtSpecificity(xx)]
I'm trying to build a scikit-learn model whose scoring is sensitivity or specificity, but if I set
scoring='sensitivity'
scikit-learn complains, and if I check sorted(sklearn.metrics.SCORERS.keys()) I see that there is no sensitivity or specificity scorer.
So how do I train a model with these two metrics?
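For binary labels, sensitivity is the recall of the positive class and specificity is the recall of the negative class, so both can be written as custom scorers with make_scorer and recall_score. The sketch below assumes labels 0 (negative) and 1 (positive) and uses toy data and a LogisticRegression grid that are illustrative only. Note that these scorers evaluate predictions at the model's default decision threshold, which is not the same quantity as Keras' SensitivityAtSpecificity (sensitivity at a fixed specificity).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV

# Sensitivity = recall of the positive class (assumed to be label 1)
sensitivity = make_scorer(recall_score, pos_label=1)
# Specificity = recall of the negative class (assumed to be label 0)
specificity = make_scorer(recall_score, pos_label=0)

# Toy binary data (labels 0/1), only for illustration
X, y = make_classification(n_samples=300, random_state=0)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    scoring={"sensitivity": sensitivity, "specificity": specificity},
    refit="sensitivity",  # choose the best C by mean CV sensitivity
    cv=5,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
print(search.cv_results_["mean_test_specificity"])
```

Both metrics are reported in cv_results_; refit decides which one selects the final model.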
I want to build a stacking CV regressor that uses random forest, lasso and support vector regressor models as the base models and random forest as the meta-regressor, and I want to know how to do the same in PySpark. The code below is in Python (scikit-learn plus mlxtend) and I want to convert it to PySpark.
I couldn't find a stacking CV regressor in the PySpark MLlib library.
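from sklearn.svm import SVR
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor
from mlxtend.regressor import StackingCVRegressor
RANDOM_SEED = 42  # example value; any fixed integer works
svr = SVR(kernel='linear')
lasso = Lasso()
rf = RandomForestRegressor(n_estimators=5, random_state=RANDOM_SEED)
# Starting from v0.16.0, StackingCVRegressor supports
# `random_state` to get deterministic result.
stack = StackingCVRegressor(regressors=(svr, lasso, rf), meta_regressor=rf, random_state=RANDOM_SEED)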
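Porting this to PySpark means reproducing the stacking steps by hand. The sketch below spells those steps out in scikit-learn (toy data from make_regression, not from the question): out-of-fold predictions of each base model become the meta-features, the meta-regressor is fit on them, and the base models are refit on all the data. This is the general cross-validated stacking scheme and is only an approximation of what StackingCVRegressor does (mlxtend has extra options not shown here). In PySpark each step would have to be rebuilt with pyspark.ml estimators and a manual k-fold split; for example, a lasso-like model is LinearRegression with elasticNetParam=1.0, while an SVR equivalent may not exist there, so treat any mapping as an assumption to check.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.svm import SVR

RANDOM_SEED = 42  # example value
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=RANDOM_SEED)

base_models = [
    SVR(kernel="linear"),
    Lasso(),
    RandomForestRegressor(n_estimators=5, random_state=RANDOM_SEED),
]
cv = KFold(n_splits=5, shuffle=True, random_state=RANDOM_SEED)

# Step 1: out-of-fold predictions of each base model become the meta-features
meta_X = np.column_stack([cross_val_predict(m, X, y, cv=cv) for m in base_models])

# Step 2: fit the meta-regressor on the out-of-fold predictions
meta_model = RandomForestRegressor(n_estimators=5, random_state=RANDOM_SEED).fit(meta_X, y)

# Step 3: refit every base model on the full data for prediction time
for m in base_models:
    m.fit(X, y)


def stack_predict(X_new):
    """Base-model predictions feed the meta-regressor."""
    features = np.column_stack([m.predict(X_new) for m in base_models])
    return meta_model.predict(features)


preds = stack_predict(X[:10])
print(preds)
```

Using out-of-fold rather than in-sample predictions in step 1 keeps the meta-regressor from learning on base-model outputs that have already seen the targets.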
This is about hyperparameter tuning on GCP.
With Estimators I can easily set hyperparameterMetric to the proper metric on the evaluation data, but I don't see how to do that for a Keras (tf.keras or keras) model.
Where do I "assign" the right metric? I need hyperparameterMetric to be a metric computed on the evaluation data.
Edit:
model.fit returns a History object whose history attribute is a dict like:
{'acc': [0.9843952109499714],
'loss': [0.050826362343496051],
'val_acc': [0.98403786838658314],
'val_loss': [0.0502210383056177]
}
Does GCP work now if I just set my desired validation metric to 'val_acc' in the config file without doing anything else?
You must use the Keras TensorBoard callback.
Use the prefix "epoch_" in the tag: hyperparameterMetricTag=epoch_val_acc for validation accuracy and hyperparameterMetricTag=epoch_acc for training accuracy.
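A minimal sketch of attaching the TensorBoard callback so that per-epoch training and validation metrics are written as summary events. The toy model, random data and temporary log_dir are illustrative assumptions; on GCP the log directory would be a job output path. The exact tag names written (epoch_val_acc, epoch_accuracy under a validation subdirectory, and so on) depend on the TensorFlow/Keras version and on the metric name passed to compile, so inspect the event files in TensorBoard before putting a tag into the config.

```python
import tempfile

import numpy as np
import tensorflow as tf

# Assumption: a local temp dir stands in for the job's output directory on GCP
log_dir = tempfile.mkdtemp()

# Toy binary-classification data, only for illustration
x = np.random.rand(256, 8).astype("float32")
y = (x.sum(axis=1) > 4).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
# Newer versions name the metric "accuracy"/"val_accuracy"; older Keras used "acc"/"val_acc"
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

history = model.fit(
    x, y,
    validation_split=0.25,  # produces the val_* metrics
    epochs=2,
    verbose=0,
    callbacks=[tf.keras.callbacks.TensorBoard(log_dir=log_dir)],
)
print(sorted(history.history))  # metric names as Keras reports them
```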
What is the difference between SGDClassifier and SGDRegressor in Python's scikit-learn? Also, can we set a batch size in them for faster performance?
Well, it's in the name. SGDClassifier is a model optimized (trained) with SGD, which takes the gradient of the loss on one sample at a time and updates the model along the way, for classification problems. It can represent a variety of linear classification models (SVM, logistic regression, ...), selected with the loss parameter; by default it is a linear SVM. SGDRegressor is a model optimized with SGD for regression tasks. It's basically a linear model that is updated along the way with a decaying learning rate.
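A small sketch on synthetic data (generated here, not from the question) showing the practical difference: the same feature matrix, a discrete target for SGDClassifier and a continuous one for SGDRegressor. The default loss of SGDClassifier ("hinge", a linear SVM) and the default decaying learning-rate schedule of SGDRegressor ("invscaling") are printed so they can be checked against your installed version.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier, SGDRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

# Classification: discrete labels 0/1
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = SGDClassifier(random_state=0).fit(X, y_class)  # default loss="hinge" (linear SVM)

# Regression: continuous target
y_reg = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=200)
reg = SGDRegressor(random_state=0).fit(X, y_reg)  # squared-error loss, decaying learning rate

print(clf.predict(X[:3]))  # class labels from {0, 1}
print(reg.predict(X[:3]))  # real-valued predictions
print(clf.loss, reg.learning_rate)
```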
SGD (Stochastic Gradient Descent) is an optimization method that machine learning algorithms or models use to minimize a loss function.
scikit-learn names these models SGDClassifier and SGDRegressor, which might make you think that SGD itself is a classifier or a regressor.
But that's not the case.
SGDClassifier - it is a classifier optimized by SGD
SGDRegressor - it is a regressor optimized by SGD.
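Stochastic gradient descent updates the model using a single training example at a time, unlike (batch) gradient descent, which uses the whole training set for each update. Neither estimator has a batch-size parameter, but you can feed the data in chunks with partial_fit; each call still performs per-sample SGD updates over its chunk, so this mainly helps with memory and streaming data rather than turning it into mini-batch gradient descent.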
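Example using scikit-learn's partial_fit, feeding the data in chunks of 10,000 samples:
from sklearn.linear_model import SGDClassifier
import numpy as np
# X, Y: your training data as NumPy arrays (assumed to exist)
# loss='log_loss' is logistic regression (named 'log' before scikit-learn 1.1)
clf2 = SGDClassifier(loss='log_loss')  # shuffle=True is useless here; we shuffle manually
classes = np.unique(Y)  # partial_fit must be told every class up front
rng = np.random.default_rng(0)
batch_size = 10000
n_iter = 5
for n in range(n_iter):
    order = rng.permutation(len(X))  # reshuffle on every pass
    shuffledX, shuffledY = X[order], Y[order]
    for start in range(0, len(shuffledX), batch_size):
        clf2.partial_fit(shuffledX[start:start + batch_size],
                         shuffledY[start:start + batch_size],
                         classes=classes)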
A classifier predicts which class some data belongs to:
this picture is a cat (not a dog)
Many classifiers can also give a probability for each class:
this picture is a cat with 99% probability
A regressor instead predicts a continuous value, e.g. the weight of the animal in the picture.
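A short sketch showing that the "99% cat" kind of output comes from a classifier's predict_proba, alongside its hard predict label. The iris dataset and LogisticRegression are only illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

label = clf.predict(X[:1])        # hard class label
proba = clf.predict_proba(X[:1])  # one probability per class, summing to 1
print(label, proba)
```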
According to the Spark ML docs, random forests and gradient-boosted trees can be used for both classification and regression problems:
https://spark.apache.org/docs/latest/ml-classification-regression.html#gradient-boosted-tree-regression
Suppose my "label" takes integer values from 0..n and I want to train these models for a regression problem, predicting a continuous value for the label field. However, I don't see in the documentation how these models should be configured for this problem, and I don't see any class parameter that distinguishes regression from classification. How should they be configured for regression problems, then?
There is no such configuration involved, simply because regression and classification problems are handled by different submodules and classes in Spark ML; i.e. for classification you should use (assuming PySpark):
from pyspark.ml.classification import GBTClassifier # GBT
from pyspark.ml.classification import RandomForestClassifier # RF
while for regression you should use, respectively:
from pyspark.ml.regression import GBTRegressor # GBT
from pyspark.ml.regression import RandomForestRegressor # RF
Check the Classification and regression overview in the docs for more details.
I'm trying to use Caffe to simulate the SGDClassifier and LogisticRegression linear models in scikit-learn. As we all know, in Caffe one "InnerProduct" layer plus one "SoftmaxWithLoss" layer represents a (multinomial) logistic regression, Y = softmax(WX + b).
I'm using the digits dataset from the sklearn.datasets package, split into a training set (5/6 of the data-label pairs) and a testing set (the remaining 1/6). However, while the accuracy obtained by SGDClassifier() or LogisticRegression() reaches nearly 90%, the accuracy obtained by the two-layer neural network cannot exceed 30% after training. Is this because of the parameter settings or something else? The gap between them seems far too large.
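One way to check that the architecture itself is not the problem is to train the same model (a single inner-product layer followed by softmax cross-entropy) outside Caffe. The sketch below is a NumPy reimplementation, not the Caffe net: it uses full-batch gradient descent, standardized inputs, and a learning rate and epoch count chosen for illustration. If a setup like this reaches accuracy comparable to scikit-learn, input scaling and learning rate in the Caffe solver are plausible things to compare, though that is a hypothesis rather than a diagnosis.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
# 5/6 training, 1/6 testing, as in the question
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1 / 6, random_state=0)

# Standardize pixel values (raw digits features range from 0 to 16)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

n_classes = 10
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(X_train.shape[1], n_classes))  # "InnerProduct" weights
b = np.zeros(n_classes)
Y_onehot = np.eye(n_classes)[y_train]
lr = 0.1  # assumed learning rate

for epoch in range(500):
    logits = X_train @ W + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)    # softmax
    grad = (probs - Y_onehot) / len(X_train)     # gradient of mean cross-entropy w.r.t. logits
    W -= lr * X_train.T @ grad
    b -= lr * grad.sum(axis=0)

accuracy = np.mean(np.argmax(X_test @ W + b, axis=1) == y_test)
print(f"softmax regression test accuracy: {accuracy:.3f}")
```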