shap.force_plot() raises Exception: In v0.20 force_plot now requires the base value as the first parameter - decision-tree

I'm using Catboost and would like to visualize shap_values:
import shap
from catboost import CatBoostClassifier, Pool
model = CatBoostClassifier(iterations=300)
model.fit(X, y, cat_features=cat_features)
pool1 = Pool(data=X, label=y, cat_features=cat_features)
shap_values = model.get_feature_importance(data=pool1, fstr_type='ShapValues', verbose=10000)
shap_values.shape
Output: (32769, 10)
X.shape
Output: (32769, 9)
Then I do the following and an exception is raised:
shap.initjs()
shap.force_plot(shap_values[0,:-1], X.iloc[0,:])
Exception: In v0.20 force_plot now requires the base value as the first parameter! Try shap.force_plot(explainer.expected_value, shap_values) or for multi-output models try shap.force_plot(explainer.expected_value[0], shap_values[0]).
The following works, but I would like to make force_plot() work:
shap.initjs()
shap.summary_plot(shap_values[:,:-1], X)
I read the documentation but can't make sense of the explainer. I tried:
explainer = shap.TreeExplainer(model,data=pool1)
#Also tried:
explainer = shap.TreeExplainer(model,data=X)
but I get: TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Can anyone point me in the right direction? THX

I had the same error as below-
Exception: In v0.20 force_plot now requires the base value as the
first parameter! Try shap.force_plot(explainer.expected_value,
shap_values) or for multi-output models try
shap.force_plot(explainer.expected_value[0], shap_values[0]).
This helped me resolve the issue-
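import shap
explainer = shap.TreeExplainer(model, data=X)
# Compute the SHAP values first; force_plot takes (base value, SHAP values, features)
shap_values = explainer.shap_values(X)
shap.initjs()
# Single-output case assumed; drop the [0] if expected_value is a plain scalar
shap.force_plot(explainer.expected_value[0], shap_values[0], X.iloc[0,:])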
Also for the below issue -
TypeError: ufunc 'isnan' not supported for the input types, and the
inputs could not be safely coerced to any supported types according to
the casting rule ''safe''
Check whether your data contains any NaNs or missing values.
Hope this helps!

try this:
shap.force_plot(explainer.expected_value, shap_values.values[0, :], X.iloc[0, :])

Building on @Sparsha's answer, since I was still getting errors, what worked for me was:
explainer = shap.TreeExplainer(model, data = X)
shap_values = explainer.shap_values(X_train)
shap.initjs()
shap.force_plot(explainer.expected_value, shap_values[0], feature_names = explainer.data_feature_names)
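For the CatBoost array in the question, one detail may help: get_feature_importance(..., fstr_type='ShapValues') returns one extra column (10 instead of 9), and per the CatBoost documentation that last column holds the expected value, which is the base value force_plot asks for. The sketch below uses a synthetic NumPy array standing in for the real output, so the numbers are made up and only the slicing is the point.

```python
import numpy as np

# Synthetic stand-in for CatBoost's ShapValues output:
# 5 rows, 9 feature columns plus 1 trailing base-value column.
rng = np.random.default_rng(0)
n_rows, n_features = 5, 9
catboost_shap = rng.normal(size=(n_rows, n_features + 1))
catboost_shap[:, -1] = 0.25  # the expected value is the same for every row

# Split row 0 into the two pieces force_plot wants.
base_value = catboost_shap[0, -1]        # scalar base value
row_shap_values = catboost_shap[0, :-1]  # per-feature contributions for row 0

# Base value plus contributions reconstructs the raw model output for that row.
raw_prediction = base_value + row_shap_values.sum()

print(base_value, row_shap_values.shape, raw_prediction)

# With the real objects from the question, the call would then be (not run here):
# shap.force_plot(shap_values[0, -1], shap_values[0, :-1], X.iloc[0, :])
```

This avoids building a TreeExplainer at all, which may sidestep the isnan TypeError, assuming the installed CatBoost version follows the documented column layout.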

Related

Pytorch VGG16 throwing a matrix multiplication RuntimeError during inference

I'm trying to extract VGG16 features of images as part of a project. However, at the time of extracting the features, I am met with a RuntimeError: mat1 and mat2 shapes cannot be multiplied (512x49 and 25088x4096). The error is triggered at line 69 of vgg.py, at the x = self.classifier(x) instruction.
The simplest bit of code that I have found to reproduce the bug is the following:
import torchvision, torch
feature_extractor = torchvision.models.vgg16()
im_size = 224
a = torch.rand([3, im_size, im_size])
feature_extractor(a)
I don't think that the problem is the shape of the input tensor, since the error is raised pretty late in the forward function of VGG16. I can't think of a way to solve this. Does anybody know what I'm missing?
I'm not sure why the error appears so late, or why the documentation doesn't cover it, but the problem is indeed the tensor shape. The model expects one more leading dimension, before all the others, representing the mini-batch. Therefore, the following code does not throw an error:
import torchvision, torch
feature_extractor = torchvision.models.vgg16()
im_size = 224
a = torch.rand([1, 3, im_size, im_size])
feature_extractor(a)
If you want to apply the model to a single image, use unsqueeze to add an extra leading dimension:
a = torch.rand([3, im_size, im_size])
a = torch.unsqueeze(a,0)
feature_extractor(a)
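As for why the error surfaces so late: as far as I can tell, the convolutional layers in recent PyTorch versions accept an unbatched (C, H, W) tensor, so nothing fails until VGG flattens the feature map with torch.flatten(x, 1) just before the classifier. Without a batch dimension, that flatten keeps the 512 channels as rows and yields 512x49 instead of 1x25088. Below is a NumPy-only sketch of the shape arithmetic; using NumPy's reshape as a stand-in for torch.flatten is an assumption of this illustration, and flatten_from_dim1 is a hypothetical helper name.

```python
import numpy as np

def flatten_from_dim1(x):
    """Mimic torch.flatten(x, 1): keep axis 0, merge every axis after it."""
    return x.reshape(x.shape[0], -1)

# Shape of the VGG16 feature map after the conv stack and average pooling.
unbatched = np.zeros((512, 7, 7))     # input had no batch dimension
batched = np.zeros((1, 512, 7, 7))    # input had a leading batch dimension of 1

print(flatten_from_dim1(unbatched).shape)  # (512, 49)
print(flatten_from_dim1(batched).shape)    # (1, 25088)

# The first classifier layer expects 25088 input features (512 * 7 * 7).
in_features = 512 * 7 * 7
```

The (512, 49) result matches the "512x49" in the error message, and 25088 matches the first dimension of the classifier weight.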

sklearn.metrics.ConfusionMatrixDisplay using scientific notation

I am generating a confusion matrix as follows:
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
cm = confusion_matrix(truth_labels, predicted_labels, labels=n_classes)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp = disp.plot(cmap="Blues")
plt.show()
However, some of my values for True Positive, True Negative, etc. are over 30,000, and they are being displayed in scientific format (3e+04). I want to show all digits and have found the values_format parameter in the ConfusionMatrixDisplay documentation. I have tried using it like this:
disp = ConfusionMatrixDisplay(confusion_matrix=cm, values_format='')
But I get a type error:
TypeError: __init__() got an unexpected keyword argument 'values_format'.
What am I doing wrong? Thanks in advance!
In case somebody runs into the same problem, I just found the answer. The values_format argument has to be passed to disp.plot, not to the ConfusionMatrixDisplay constructor, like this:
disp.plot(cmap="Blues", values_format='')
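The values_format string is an ordinary Python format specification, so its effect can be checked without plotting anything. A small sketch follows; the value 30000 is just an example count, and the idea that the default rendering behaves like '.2g' here is inferred from the 3e+04 shown in the question.

```python
count = 30000  # example cell value from a confusion matrix

as_general = format(count, ".2g")  # two significant digits, switches to exponent form
as_integer = format(count, "d")    # plain integer digits
as_default = format(count, "")     # an empty spec behaves like str()

print(as_general)  # 3e+04
print(as_integer)  # 30000
print(as_default)  # 30000
```

For integer counts, passing values_format='d' to disp.plot should be an equally explicit way to get all digits.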

padding in tf.data.Dataset in tensorflow

Code:
a=training_dataset.map(lambda x,y: (tf.pad(x,tf.constant([[13-int(tf.shape(x)[0]),0],[0,0]])),y))
gives the following error:
TypeError: in user code:
<ipython-input-32-b25101c2110a>:1 None *
a=training_dataset.map(lambda x,y: (tf.pad(tensor=x,paddings=tf.constant([[13-int(tf.shape(x)[0]),0],[0,0]]),mode="CONSTANT"),y))
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py:264 constant **
allow_broadcast=True)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py:282 _constant_impl
allow_broadcast=allow_broadcast))
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_util.py:456 make_tensor_proto
_AssertCompatible(values, dtype)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_util.py:333 _AssertCompatible
raise TypeError("Expected any non-tensor type, got a tensor instead.")
TypeError: Expected any non-tensor type, got a tensor instead.
However, when I use:
a=training_dataset.map(lambda x,y: (tf.pad(x,tf.constant([[1,0],[0,0]])),y))
The above code works fine.
This leads me to conclude that something is wrong with 13-tf.shape(x)[0], but I cannot understand what.
I tried converting tf.shape(x)[0] to int(tf.shape(x)[0]) and still got the same error.
What I want the code to do:
I have a tf.data.Dataset object holding variable-length sequences of shape (None, 128), where the first dimension (None) is less than 13. I want to pad the sequences so that every element has shape (13, 128).
Is there any alternate way (if the above problem cannot be solved)?
A solution that works for me is to build the paddings with tf.concat:
paddings = tf.concat(([[13-tf.shape(x)[0],0]], [[0,0]]), axis=0)
instead of tf.constant:
paddings = tf.constant([[13-tf.shape(x)[0],0],[0,0]])
However, I still cannot figure out why the latter did not work.
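A likely explanation for the difference: Dataset.map traces the lambda as a graph, so tf.shape(x)[0] is a symbolic tensor there, and tf.constant only accepts plain Python or NumPy values, which matches the "Expected any non-tensor type, got a tensor instead" message. tf.concat builds the paddings out of tensors, so it works. The paddings layout itself (one [before, after] pair per axis) can be sketched with NumPy's np.pad, which uses the same convention. This is an analogue for illustration, not the TensorFlow code, and pad_to_length is a hypothetical helper name.

```python
import numpy as np

def pad_to_length(seq, target_len=13):
    """Zero-pad a (length, features) array at the front so it has target_len rows."""
    missing = target_len - seq.shape[0]
    # One [before, after] pair per axis: pad rows at the front, leave columns alone.
    return np.pad(seq, [[missing, 0], [0, 0]], mode="constant")

short = np.ones((5, 128))
padded = pad_to_length(short)

print(padded.shape)      # (13, 128)
print(padded[:8].sum())  # 0.0, the eight padded rows are all zeros
```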

How to resolve KeyError: 'val_mean_absolute_error' Keras 2.3.1 and TensorFlow 2.0 From Chollet Deep Learning with Python

I am on section 3.7 of Chollet's book Deep Learning with Python.
The project is to find the median price of homes in a given Boston suburb in the 1970s.
https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/3.7-predicting-house-prices.ipynb
At section "Validating our approach using K-fold validation" I try to run this block of code:
num_epochs = 500
all_mae_histories = []
for i in range(k):
    print('processing fold #', i)
    # Prepare the validation data: data from partition # k
    val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]
    val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples]
    # Prepare the training data: data from all other partitions
    partial_train_data = np.concatenate(
        [train_data[:i * num_val_samples],
         train_data[(i + 1) * num_val_samples:]],
        axis=0)
    partial_train_targets = np.concatenate(
        [train_targets[:i * num_val_samples],
         train_targets[(i + 1) * num_val_samples:]],
        axis=0)
    # Build the Keras model (already compiled)
    model = build_model()
    # Train the model (in silent mode, verbose=0)
    history = model.fit(partial_train_data, partial_train_targets,
                        validation_data=(val_data, val_targets),
                        epochs=num_epochs, batch_size=1, verbose=0)
    mae_history = history.history['val_mean_absolute_error']
    all_mae_histories.append(mae_history)
I get an error KeyError: 'val_mean_absolute_error'
mae_history = history.history['val_mean_absolute_error']
I am guessing the solution is to figure out the correct key to replace val_mean_absolute_error. I've tried looking into some Keras documentation for what the correct key would be. Does anyone know the correct key?
The problem in your code is that, when you compile your model, you do not add the specific 'mae' metric.
To add the 'mae' metric, compile the model in one of these two ways:
# Option 1
model.compile('sgd', metrics=[tf.keras.metrics.MeanAbsoluteError()])
# Option 2
model.compile('sgd', metrics=['mean_absolute_error'])
After this step, check whether the correct key is val_mean_absolute_error or val_mae. Most likely, if you compile your model as in option 2, your code will work with "val_mean_absolute_error".
Also, you should include the code snippet where you compile your model (i.e. the build_model() function); it is missing from the question.
I replaced 'val_mean_absolute_error' with 'val_mae' and it worked for me
FYI, I had the same problem, and it persisted even after changing the line to history.history['val_mae'] as described in the answer.
In my case, for the val_mae key to be present in history.history, I needed to make sure the model.fit() call included the validation_data=(val_data, val_targets) argument. I neglected to do this initially.
I fixed it with the following line:
mae_history = history.history["mae"]
The History object should contain keys matching the metric names you pass to compile.
For example:
mean_absolute_error gives val_mean_absolute_error
mae gives val_mae
accuracy gives val_accuracy
acc gives val_acc
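Since the key depends on the metric name used at compile time, one defensive option is to look the key up instead of hard-coding it. Below is a minimal sketch on a plain dict shaped like history.history; find_val_mae_key is a hypothetical helper and the numbers are placeholders, not real training results.

```python
def find_val_mae_key(history_dict):
    """Return whichever validation-MAE key is present in a Keras-style history dict."""
    for key in ("val_mae", "val_mean_absolute_error"):
        if key in history_dict:
            return key
    raise KeyError(
        "no validation MAE key found; available keys: " + ", ".join(sorted(history_dict))
    )

# Placeholder dict standing in for history.history after model.fit(...).
fake_history = {"loss": [9.1, 7.4], "mae": [2.5, 2.1],
                "val_loss": [8.8, 7.9], "val_mae": [2.4, 2.2]}

key = find_val_mae_key(fake_history)
mae_history = fake_history[key]
print(key, mae_history)
```

If neither key exists, the error message lists what is available, which also makes a missing validation_data argument easy to spot.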

How can I generate classification report by removing this error?

I want to generate a classification report for the movie_reviews dataset from the corpus, which already has the target names [pos, neg], but I get an error.
Code:
movie_train_clf = Pipeline([('vect',CountVectorizer(stop_words='english')),('tfidf',TfidfTransformer()),('clas',BernoulliNB(fit_prior=True))])
movie_train_clas = movie_train_clf.fit(movie_train.data ,movie_train.target)
predict = movie_train_clas.predict(movie_train.data)
np.mean(predict==movie_train.target)
Now I use classification report
from sklearn.metrics import classification_report
print(classification_report(predict, movie_train_clas,target_names==target_names))
Error:
TypeError: iteration over a 0-d array.
Please help me with the correct syntax.
There are multiple errors in your code:
1) You have the wrong order of arguments in classification_report. As per the documentation:
classification_report(y_true, y_pred, ...
First argument is the true labels and second one is the predicted labels.
2) You are using movie_train_clas in place of the true labels. As per your code, movie_train_clas is the return value of movie_train_clf.fit(), so it's movie_train_clf itself. fit() returns the estimator, so you cannot use that in place of the ground truth labels.
3) As @AmiTavory spotted, the current error is due to the comparison operator (==) being used in place of assignment (=). The correct call to classification_report should be:
classification_report(movie_train.target, predict, target_names=target_names)
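To see how the stray == could plausibly lead to this particular message: comparing a list with itself yields a single bool, and NumPy turns a single bool into a 0-d array, which cannot be iterated. Whether this is the exact code path inside scikit-learn is an assumption; the sketch only reproduces the message with NumPy, using made-up target names.

```python
import numpy as np

target_names = ["neg", "pos"]

# `target_names == target_names` is a comparison, so it evaluates to True.
accidental_arg = (target_names == target_names)

as_array = np.asarray(accidental_arg)  # a 0-d boolean array
print(as_array.ndim)  # 0

try:
    iter(as_array)
except TypeError as err:
    message = str(err)
    print(message)  # iteration over a 0-d array
```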
