I have trained resnet50_v1b_voc for two different objects and created a params file from each training run. Each params file works absolutely fine when I do prediction with only that one loaded throughout, but I get bad results if I load both of them together.
import mxnet as mx
from gluoncv import model_zoo, data, utils
from matplotlib import pyplot as plt
net = model_zoo.get_model('faster_rcnn_resnet50_v1b_voc', pretrained=True, ctx=mx.gpu())
net.load_parameters('/home/ubuntu/abc/faster_rcnn_resnet50_v1b_voc_best.params', ctx=mx.gpu())
net.load_parameters('/home/ubuntu/xyz/faster_rcnn_resnet50_v1b_voc_best.params', ctx=mx.gpu())
net.reset_class(['abc'], reuse_weights={'abc': 'abc'})
class_IDs, scores, bounding_boxs = net(x)
# This works fine if I predict the abc object
# But when I do this the confidence score is too low for the xyz object
# Individually if I perform this task with xyz params the results are perfect
net.reset_class(['xyz'], reuse_weights={'xyz': 'xyz'})
class_IDs, scores, bounding_boxs = net(x)
Not sure what I am doing wrong here.
The probable cause of the issue is the reset_class() call:
reset_class(classes, reuse_weights=None): Resets class categories and class predictors.
The pre-trained model supports all the classes in its training dataset.
After calling net.reset_class(['abc'], ...), the net's output is changed to support only one class: 'abc'.
So it will perform worse when predicting 'xyz'.
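Note, separately, that in the snippet above the second load_parameters call overwrites the weights loaded by the first, so a single net only ever holds one set of weights at a time. A minimal sketch of a workaround, keeping one network instance per params file and mirroring the load order from the question (x is assumed to be the preprocessed input image; this is a guess at the intent, not a verified fix):

import mxnet as mx
from gluoncv import model_zoo

ctx = mx.gpu()

# One detector per trained object, each holding its own weights.
net_abc = model_zoo.get_model('faster_rcnn_resnet50_v1b_voc', pretrained=True, ctx=ctx)
net_abc.load_parameters('/home/ubuntu/abc/faster_rcnn_resnet50_v1b_voc_best.params', ctx=ctx)
net_abc.reset_class(['abc'], reuse_weights={'abc': 'abc'})

net_xyz = model_zoo.get_model('faster_rcnn_resnet50_v1b_voc', pretrained=True, ctx=ctx)
net_xyz.load_parameters('/home/ubuntu/xyz/faster_rcnn_resnet50_v1b_voc_best.params', ctx=ctx)
net_xyz.reset_class(['xyz'], reuse_weights={'xyz': 'xyz'})

# Run both detectors on the same preprocessed image x.
class_ids_abc, scores_abc, boxes_abc = net_abc(x)
class_ids_xyz, scores_xyz, boxes_xyz = net_xyz(x)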
If you run cross_val_score() or cross_validate() on a dataset, is the estimator trained using all the folds at the end of the run?
I read somewhere that cross_val_score takes a copy of the estimator, whereas I thought this was how you train a model using k-fold.
Or, at the end of cross_validate() or cross_val_score(), do you have a single estimator that you then use for predict()?
Is my thinking correct?
You can refer to the scikit-learn documentation on cross-validation.
If you do 3-fold cross-validation,
sklearn will split your dataset into 3 parts (for example, the 1st part contains the 1st-3rd rows, the 2nd part contains the 4th-6th rows, and so on), and
sklearn will iterate, training a new model 3 times with a different training set and validation set each time:
In the first round, it combines the 1st and 2nd parts as the training set and tests the model with the 3rd part.
In the second round, it combines the 1st and 3rd parts as the training set and tests the model with the 2nd part.
And so on.
So, after using cross_validate, you will get three models. If you want the model object from each round, you can add the parameter return_estimator=True. The result, which is a dictionary, will then have another key named estimator containing the list of estimators from each training round.
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_validate
diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
lasso = linear_model.Lasso()
cv_results = cross_validate(lasso, X, y, cv=3, return_estimator=True)
print(sorted(cv_results.keys()))
#Output: ['estimator', 'fit_time', 'score_time', 'test_score']
cv_results['estimator']
#Output: [Lasso(), Lasso(), Lasso()]
However, in practice, cross-validation is mainly used to evaluate the model. Once you have found a good model and parameter setting that gives a high cross-validation score, it is better to refit the model on the whole training set and then test it on a held-out test set.
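Continuing the example above, a minimal sketch of that final refit (the train/test split is added here purely for illustration):

from sklearn import datasets, linear_model
from sklearn.model_selection import cross_validate, train_test_split

diabetes = datasets.load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.2, random_state=0)

lasso = linear_model.Lasso()

# Use cross-validation only to estimate how well this model/parameter choice works.
cv_results = cross_validate(lasso, X_train, y_train, cv=3)
print(cv_results['test_score'])

# Then refit on the whole training set and evaluate once on the held-out test set.
lasso.fit(X_train, y_train)
print(lasso.score(X_test, y_test))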
I have been trying to follow François Chollet's example of a binary image classifier for cats and dogs. I have attempted to follow his example on a similar dataset on Kaggle (https://www.kaggle.com/playlist/men-women-classification), and I want to achieve the following:
Visualise the predictions that are wrong
Come out with the classification report
I already have a model with around 85% accuracy on the validation set, but I want to know roughly what kind of images my model is getting wrong, as well as produce a report with sklearn.metrics' classification_report.
However, I do not know how the image generator works, and I am having a hard time figuring out how to pair the predictions with the labels of the test images.
from sklearn.metrics import classification_report

new_test_datagen = test_datagen.flow_from_directory(
    directory=test_dir,
    target_size=(150, 150),
    batch_size=1,
    class_mode='binary',
    seed=42,
)
train_image = new_train_generator.next()
plt.imshow(train_image[0].reshape(150,150,-1))
print(train_image[1])
# I want to output images, but I am not sure if this is the most efficient way of doing it
predictions = model.predict(test_generator)
predictions.shape
# predictions is a numpy array of length 476, but I do not know the 'correct' labels in my test set to validate this output against.
model.evaluate(test_generator)
# [0.3109202980995178, 0.8886554837226868]
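No answer appears in the thread for this one, but a common approach is the following sketch: with shuffle=False, a Keras DirectoryIterator yields batches in a fixed file order, and its classes attribute holds the true label indices in that same order, so predictions and labels can be paired by position (test_datagen, test_dir, and model are names from the question; the rest is an assumption):

import numpy as np
from sklearn.metrics import classification_report

# shuffle=False keeps the file order fixed so predictions line up with labels.
eval_generator = test_datagen.flow_from_directory(
    directory=test_dir,
    target_size=(150, 150),
    batch_size=1,
    class_mode='binary',
    shuffle=False,
)

probs = model.predict(eval_generator).ravel()  # sigmoid outputs in [0, 1]
y_pred = (probs > 0.5).astype(int)             # threshold to class indices
y_true = eval_generator.classes                # true labels, same order as the files

print(classification_report(y_true, y_pred,
                            target_names=list(eval_generator.class_indices)))

# Indices of the misclassified images; look them up in eval_generator.filenames
wrong = np.where(y_pred != y_true)[0]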
# train the model (training code omitted in the question)
from tensorflow.keras.models import load_model  # assuming tf.keras; load_model is used below
#here is one sample
sample = validation_X[0].reshape(1, -1)
#print the sample for reference
print(sample)
#show the weights for reference
print(model.get_weights())
#show prediction
print(model.predict(sample))
#another prediction that is the same as above
print(model.predict(sample))
#save model
model.save('mymodel.h5')
#reload model
model = load_model('mymodel.h5')
#sample looks to be the same as above
print(sample)
#weights also look to be the same as above
print(model.get_weights())
#prediction is different here?
print(model.predict(sample))
Why is my model predicting a different value after reloading it? I checked, and the sample is obviously the same, and from an eye test the weights look to be the same too. What could be causing the model to produce a different prediction here?
If the model is used in two different instances, then you must always save the model weights and reload them explicitly. Printed weights can look the same even when they differ, because printing truncates the small real numbers, so you need to save and reload the weights to make sure the learned weights actually match.
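A minimal sketch of saving and reloading the weights explicitly, as the answer suggests (assuming tf.keras and the model and file names from the question; the weights file name is made up for illustration):

import numpy as np

# After training: save the learned weights explicitly, in addition to the model.
model.save_weights('mymodel_weights.h5')
weights_before = model.get_weights()

# In a fresh instance: reload the model, then load the saved weights into it.
model = load_model('mymodel.h5')
model.load_weights('mymodel_weights.h5')

# Compare numerically rather than by eye: printing truncates floating-point values.
for w_before, w_after in zip(weights_before, model.get_weights()):
    print(np.allclose(w_before, w_after))  # should all print True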
I have been trying to build a machine learning model using Keras which predicts the radiation dose based on pre-treatment parameters. My dataset has approximately 2200 samples of which 20% goes into validation and testing.
The problem with the target variable is that it is very skewed since large radiation doses are much more rare than the small ones. Hence, I suspect that my regression model fails to predict the large values at all, and predicts everything around the mean, which is apparent from the figure. I have tried to log-normalise the target variable to make it more normally distributed, but it has had no effect.
Any suggestions on how to fix this?
[Figure: target variable distribution]
[Figure: regression predictions]
Computing individual sample weights based on 10 histogram bins helped in my case. See the code below:
import numpy as np
import pandas as pd
from sklearn.utils.class_weight import compute_sample_weight

# Bin the continuous targets into 10 histogram bins.
hist, bin_edges = np.histogram(training_targets, bins=10)
# Map each target value to its bin index, 0-9 (assuming training_targets is a pandas Series).
classes = pd.cut(training_targets, bin_edges, labels=False, include_lowest=True)
# Weight each sample inversely to the frequency of its bin.
sample_weights = compute_sample_weight('balanced', classes)
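These weights can then be passed to the fit call; a minimal sketch, assuming a compiled Keras regression model named model and an input array training_inputs (neither shown in the answer):

# Rare, large-dose samples now contribute more to the loss.
model.fit(training_inputs, training_targets,
          sample_weight=sample_weights,
          epochs=100, batch_size=32,
          validation_split=0.2)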
So, I can get sklearn.linear_model.LinearRegression to process my data, or at least to run the script without raising any exceptions or warnings. The only issue is that I am not trying to plot the results with matplotlib; instead, I want to see the estimates and diagnostic statistics for the model.
How can I get a model summary, such as the slope and intercept (B0, B1), the adjusted R-squared, etc., to display in the console or be stored in a variable, instead of plotting this?
This is a generic copy of the script I ran:
import pandas as pd
from sklearn import linear_model

z = pd.DataFrame(
    {'a': [1, 2, 3, 4, 5, 6, 7, 8, 9],
     'b': [9, 8, 7, 6, 5, 4, 3, 2, 1]})
a2 = z['a'].values.reshape(9, 1)
b2 = z['b'].values.reshape(9, 1)
reg = linear_model.LinearRegression(fit_intercept=True)
reg.fit(a2,b2)
# print(reg.get_params(deep=True))  # I tried this and it didn't print out the information I wanted
# print(reg)  # I tried this too
This ran without errors, but no output other than this appeared in the console:
{'n_jobs': 1, 'fit_intercept': True, 'copy_X': True, 'normalize': False}
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
Thanks for any info on how to get this to print a summary of the model.
sklearn's API is designed around fitting training data and then generating predictions on test data, without exposing much, if any, information about how the model was fit. While you can find the estimated parameters of a fitted model by accessing its coef_ attribute, you won't find much in the way of parameter-description functionality. This is because there may be no way to provide this information in a uniform way: the API is designed to let you treat a linear regression the same as a random forest.
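For the script above, a minimal sketch of what is available directly from the fitted sklearn model (coef_, intercept_, and the R-squared from score() are standard; nothing like an adjusted R-squared is built in):

print(reg.coef_)          # slope(s), B1
print(reg.intercept_)     # intercept, B0
print(reg.score(a2, b2))  # R-squared on the given data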
Since you are interested in a linear model, you can get the information you're looking for, including confidence intervals, goodness-of-fit statistics, and the like from the statsmodels library. See their OLS example: http://statsmodels.sourceforge.net/devel/examples/notebooks/generated/ols.html for details.
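And a minimal sketch of the statsmodels route for the same toy data (statsmodels.api.OLS and add_constant are the standard entry points; the exact summary layout may vary by version):

import statsmodels.api as sm

# add_constant inserts the intercept column that OLS does not add by itself.
results = sm.OLS(b2, sm.add_constant(a2)).fit()
print(results.summary())  # coefficients, R-squared, adjusted R-squared, p-values, etc.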