Log loss metric in fastai - PyTorch

I'm doing a competition on the Zindi platform, and the evaluation metric for the challenge is log loss.
I'm working with the fastai library and I want log loss as a metric, but I couldn't find a LogLoss metric in the library!
I tried a few things, such as the function provided by sklearn (from sklearn.metrics import log_loss), but it didn't work.
Link to the competition: https://zindi.africa/competitions/basic-needs-basic-rights-kenya-tech4mentalhealth

If you need it as a metric (it is typically used as a loss, but it works as a metric too), you should be able to use the cross_entropy function from PyTorch, which computes log loss directly on the raw model outputs:

import torch.nn.functional as F

metrics = [F.cross_entropy]  # add other metrics here if needed
learn = cnn_learner(data, model, metrics=metrics)
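
If you specifically want sklearn's log_loss reported, a minimal sketch of a custom metric is below; it assumes a fastai v1-style metric signature (batch predictions and targets passed as tensors) and a model that outputs raw logits, and sklearn_log_loss is a hypothetical name:

import torch
import torch.nn.functional as F
from sklearn.metrics import log_loss

def sklearn_log_loss(preds, targs):
    # preds are raw logits; convert to probabilities for sklearn
    probs = F.softmax(preds, dim=1).detach().cpu().numpy()
    # pass labels explicitly in case a batch does not contain every class
    return torch.tensor(log_loss(targs.cpu().numpy(), probs,
                                 labels=list(range(probs.shape[1]))))

learn = cnn_learner(data, model, metrics=[sklearn_log_loss])

Note that F.cross_entropy applied to logits computes the same quantity, so the wrapper only matters if you want sklearn's exact implementation.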

Related

Intel daal4py classifiers with scikit-learn

I am testing the sklearn-compatible wrappers for the latest version of the Intel daal4py classifiers. The Intel k-nearest classifier works fine with sklearn's cross_val_score() and GridSearchCV. The performance boost from the Intel classifier is significant, and the Intel and sklearn models provide generally comparable results across 10 different large public datasets and some simulated datasets.
The sklearn-compatible wrapper for the Intel random forest classifier seems to be completely broken. The score() method does not work, so I cannot proceed further with the Intel random forest wrapper class.
I posted this at the Intel AI Developer Forum, but I was wondering if anyone here has gotten the Intel sklearn-compatible random forest classifier to work.
My next step is to test the native daal4py random forest object and possibly write my own wrapper, because the native daal4py API is so different from sklearn's. I was hoping to avoid this.
There seems to be some confusion on the Intel site regarding the names of the wrapper classes. I am using:
For k-nearest: daal4py.sklearn.neighbors.kdtree_knn_classifier (this works fine)
For random forest: daal4py.sklearn.ensemble.decision_forest.RandomForestClassifier
The failure in the Intel RandomForestClassifier is in forest.py: n_classes_ is an int, but the code below indexes it as if it were a sequence. n_classes_ matches the number of classes of the label variable that is passed, and the label variable is an integer.
predictions = [np.zeros((n_samples, n_classes_[k]))  # fails: n_classes_ is an int, not subscriptable
               for k in range(self.n_outputs_)]
Please find below the steps we used to compute scores for the daal4py RandomForestClassifier.
(i) For cross_val_score:

from daal4py.sklearn.ensemble.decision_forest import RandomForestClassifier
from sklearn.model_selection import cross_val_score

clf = RandomForestClassifier()
scores = cross_val_score(clf, train_data, train_labels, cv=3)
print(scores)

(ii) For GridSearchCV:

from sklearn.model_selection import GridSearchCV
from daal4py.sklearn.ensemble.decision_forest import RandomForestClassifier

param_grid = {
    'n_estimators': [200, 700],
    'max_features': ['auto', 'sqrt', 'log2'],
}
clf = RandomForestClassifier()
CV_rfc = GridSearchCV(estimator=clf, param_grid=param_grid, cv=5)
CV_rfc.fit(train_data, train_labels)
score = CV_rfc.score(train_data, train_labels)
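
To make the snippets above self-contained, train_data and train_labels need to be defined first; a minimal sketch using a toy sklearn dataset (the dataset choice is just an assumption, the variable names match the snippets):

from sklearn.datasets import load_digits
train_data, train_labels = load_digits(return_X_y=True)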

Check for per-class misclassification

I have split my data using sklearn's train_test_split function, and I am using model.fit in Keras to train.
During training, it prints the training and validation statistics to the terminal.
What I am interested in, though, is this: when it prints the validation stats like accuracy and loss, I want to see a per-class misclassification count. Is that possible? I am training a binary classifier.
Thanks in advance.
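
One way to get this is a custom Keras callback that computes a confusion matrix on the validation set at the end of each epoch. A minimal sketch, assuming a binary classifier with a single sigmoid output; x_val, y_val and the class name are placeholders:

from sklearn.metrics import confusion_matrix
from tensorflow import keras

class PerClassMisclassification(keras.callbacks.Callback):
    def __init__(self, x_val, y_val):
        super().__init__()
        self.x_val = x_val
        self.y_val = y_val

    def on_epoch_end(self, epoch, logs=None):
        # predicted probabilities -> hard labels at the 0.5 threshold
        preds = (self.model.predict(self.x_val) > 0.5).astype(int).ravel()
        cm = confusion_matrix(self.y_val, preds)
        # off-diagonal entries of the confusion matrix are the per-class misclassifications
        print(f"epoch {epoch}: class 0 misclassified: {cm[0, 1]}, "
              f"class 1 misclassified: {cm[1, 0]}")

model.fit(x_train, y_train, validation_data=(x_val, y_val),
          callbacks=[PerClassMisclassification(x_val, y_val)])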

Stratified cross-validation with PyTorch

My goal is to do binary classification using a neural network.
The problem is that the dataset is unbalanced: I have 90% of class 1 and 10% of class 0.
To deal with this I want to use stratified cross-validation.
The problem is that I am working with PyTorch; I can't find any example, the documentation doesn't cover it, and I'm a student, quite new to neural networks.
Can anybody help?
Thank you!
The easiest way I've found is to do your stratified splits before passing your data to a PyTorch Dataset and DataLoader, as in the sketch below. That lets you avoid having to port all your code to skorch (suggested in the other answer), which can break compatibility with some cluster computing frameworks.
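A minimal sketch of that approach, assuming the features and labels are already in arrays X and y (the names and batch size are placeholders):

import torch
from sklearn.model_selection import StratifiedKFold
from torch.utils.data import DataLoader, Subset, TensorDataset

dataset = TensorDataset(torch.as_tensor(X, dtype=torch.float32),
                        torch.as_tensor(y, dtype=torch.long))

# StratifiedKFold keeps the 90/10 class ratio in every fold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, val_idx in skf.split(X, y):
    train_loader = DataLoader(Subset(dataset, list(train_idx)), batch_size=64, shuffle=True)
    val_loader = DataLoader(Subset(dataset, list(val_idx)), batch_size=64)
    # train a fresh model on train_loader, evaluate it on val_loader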
Have a look at skorch. It's a scikit-learn compatible neural network library that wraps PyTorch. It has a CVSplit helper for cross-validation, or you can use sklearn directly.
From the docs:
from skorch import NeuralNetClassifier

net = NeuralNetClassifier(
    module=MyModule,
    train_split=None,
)

from sklearn.model_selection import cross_val_predict
y_pred = cross_val_predict(net, X, y, cv=5)
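Setting train_split=None disables skorch's internal validation split, so the folds are controlled entirely by cross_val_predict.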

Keras - Callback to calculate variance

Is there a way to calculate the variance of the target values for each batch using a Keras custom callback?
Maybe in a similar manner to Create keras callback to save model predictions and targets for each batch during training?
I'm using Keras with the TensorFlow backend and Python 3.
Cheers,
Maks
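
Callbacks don't receive the batch targets directly, but a custom metric function does, and Keras evaluates metrics on every batch. A minimal sketch of that workaround (batch_target_variance is a hypothetical name; the compile arguments are placeholders):

import tensorflow as tf

def batch_target_variance(y_true, y_pred):
    # metric functions are called with each batch's targets
    var = tf.math.reduce_variance(tf.cast(y_true, tf.float32))
    # the progress bar shows a running average over the epoch, so print the raw per-batch value
    tf.print("batch target variance:", var)
    return var

model.compile(optimizer='adam', loss='mse',
              metrics=[batch_target_variance])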

Use Caffe to simulate the SGDClassifier or LogisticRegression linear models in sklearn

I'm trying to use Caffe to simulate the SGDClassifier and LogisticRegression linear models in sklearn. As we all know, in Caffe one "InnerProduct" layer plus one "SoftmaxWithLoss" layer represents a logistic regression, Y = Logit(WX + b).
I'm using the digits dataset from the sklearn datasets package, with 5/6 of the data-label pairs as the training set and the remaining 1/6 as the test set. However, while the accuracy obtained by SGDClassifier() or LogisticRegression() reaches nearly 90%, the accuracy obtained by the two-layer neural network cannot exceed 30% after training. Is this because of the parameter settings or something else? The gap between them seems far too large.
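
For reference, a minimal sketch of the sklearn baseline described above (the 5/6 train / 1/6 test split on the digits dataset; the poster's exact hyperparameters are not given, so the defaults here are an assumption):

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/6, random_state=0)

# loss="log_loss" makes SGDClassifier fit a logistic regression ("log" in older sklearn)
for clf in (LogisticRegression(max_iter=1000), SGDClassifier(loss="log_loss")):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, clf.score(X_test, y_test))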
