How to apply a model developed in fast.ai / PyTorch?

I've trained a model which I'm trying to apply to new data. I'm totally new to fast.ai.
I'm creating my DataBunch as below (ds being the data I want to score):
bs = 64
data_lm = (TextList.from_df(df, path, cols='comment_text')
           .split_by_rand_pct(0.1)
           .label_for_lm()
           .databunch(bs=bs))
The problem is that I cannot omit the .split_by_rand_pct(0.1), so I cannot score the whole dataset.
I then load and apply the model as below:
data_clas = load_data(path, 'data_clas.pkl', bs=bs)
learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn.load_encoder('fine_tuned_enc')
learn.load('third');
preds, target = learn.get_preds(DatasetType.Test, ordered=True)
labels = preds.numpy()
But the problem is that I'm only scoring a fraction of my data, because the DataBunch created in the first snippet is not right. I want to apply the saved/loaded model to the whole DataFrame.
Many thanks in advance

A colleague of mine actually provided me with the solution; I'm posting it here in case it's useful to anyone.
learn.data.add_test(df['Contact_Text'])
preds, y = learn.get_preds(ds_type=DatasetType.Test)
preds
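For completeness, here is a minimal sketch of the full scoring flow under the fastai v1 API, putting the pieces above together. The column name follows the question's DataFrame, and the final argmax is just one way to turn the predicted probabilities into class labels:

from fastai.text import *

data_clas = load_data(path, 'data_clas.pkl', bs=64)
learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn.load_encoder('fine_tuned_enc')
learn.load('third')

learn.data.add_test(df['comment_text'])                          # attach ALL rows, no split needed
preds, _ = learn.get_preds(ds_type=DatasetType.Test, ordered=True)
labels = preds.argmax(dim=1).numpy()                             # predicted class per row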

Related

MaskRCNN should find exactly one element

I trained a Mask R-CNN model (the Matterport implementation) with one class to detect, and it worked.
Now I want to run predictions on some unseen images. I know the object is present in each image and that it appears only once per image. How do I use my model to do that?
A possibility that came to my mind was:
num_results = 0
while num_results == 0:
    model = mrcnn.model.MaskRCNN(mode='inference', config=pred_config)
    model.load_weights('weight/path')
    results = model.detect([img], verbose=1)
    num_results = compute_num_of(results)
    # lower DETECTION_MIN_CONFIDENCE in pred_config and try again
But I think this is very time-consuming, because I reload the model and its weights at every step. What would be best practice here?
Thanks
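One alternative (a sketch, assuming the Matterport Mask R-CNN API): build the model once, set a low DETECTION_MIN_CONFIDENCE in the inference config so at least one candidate usually survives, run detection a single time, and keep only the highest-scoring instance. The model_dir argument and the 0.1 threshold below are illustrative values.

import numpy as np
import mrcnn.model

pred_config.DETECTION_MIN_CONFIDENCE = 0.1                       # low threshold; illustrative value
model = mrcnn.model.MaskRCNN(mode='inference', model_dir='.', config=pred_config)
model.load_weights('weight/path', by_name=True)                  # load the weights once

r = model.detect([img], verbose=1)[0]                            # dict with 'rois', 'class_ids', 'scores', 'masks'
if len(r['scores']) > 0:
    best = int(np.argmax(r['scores']))                           # the single most confident detection
    best_box = r['rois'][best]
    best_mask = r['masks'][:, :, best]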

Can't get Keras Code Example #1 to work with multi-label dataset

Apologies in advance.
I am attempting to recreate this CNN (from the Keras Code Examples), with another dataset.
https://keras.io/examples/vision/image_classification_from_scratch/
The dataset I am using is one of retinal scans, and it classifies images on a scale from 0-4. So it's really multi-class image classification (one of five grades per image).
The Keras example used is binary classification (cats v dogs), though I would have hoped it wouldn't make much difference (maybe this is a big assumption on my part).
I skipped the 'image augmentation' part of the walkthrough. So, I have not created the
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
    ]
)
part. So, instead of:
def make_model(input_shape, num_classes):
    inputs = keras.Input(shape=input_shape)
    # Image augmentation block
    x = data_augmentation(inputs)
    # Entry block
    x = layers.Rescaling(1.0 / 255)(x)
    .......
at the beginning of the model, I have:
def make_model(input_shape, num_classes):
    inputs = keras.Input(shape=input_shape)
    # Image augmentation block
    x = keras.Sequential(inputs)
    # Entry block
    x = layers.Rescaling(1.0 / 255)(x)
    .......
However, I keep getting different errors no matter how much I change things around, such as "TypeError: Keras symbolic inputs/outputs do not implement __len__." or "ValueError: Exception encountered when calling layer "rescaling_3" (type Rescaling).".
What am I missing here?
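The errors point at the x = keras.Sequential(inputs) line: keras.Sequential expects a list of layers, not a symbolic input tensor. Below is a minimal sketch of the entry block with the augmentation step simply removed (assuming TensorFlow/Keras 2.x); the GlobalAveragePooling2D head and the softmax output are illustrative stand-ins for the rest of the example's architecture:

from tensorflow import keras
from tensorflow.keras import layers

def make_model(input_shape, num_classes):
    inputs = keras.Input(shape=input_shape)
    # Entry block, no augmentation: feed the input straight into Rescaling
    x = layers.Rescaling(1.0 / 255)(inputs)
    x = layers.Conv2D(32, 3, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    # ... rest of the architecture from the Keras example ...
    x = layers.GlobalAveragePooling2D()(x)
    # Multi-class head for the 0-4 grades (the original example uses a single
    # sigmoid unit because cats-vs-dogs is binary)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)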

Extract object features from Detectron2

I am really new to object detection, so sorry if this question seems too obvious.
I have a Faster R-CNN model trained with Detectron2 that detects objects, and I am trying to extract the features of each detected object from my model's predictions. The tutorials available seem to generate region proposals and make new predictions, along with their boxes and features. I already have the boxes from inference; I just need to extract the features of each box. I added the code I've been working with below. Thank you
# Preprocessing: `inputs` is the usual list of {"image": tensor, "height": h, "width": w} dicts
images = predictor.model.preprocess_image(inputs)  # don't forget to preprocess

# Run the backbone (Res1-Res4) to get the feature maps
features = predictor.model.backbone(images.tensor)  # dict of CNN feature maps

# Get proposed boxes + RoIs + features + predictions
proposals, _ = predictor.model.proposal_generator(images, features)

# Run the RoI head for each proposal (RoI pooling + Res5)
proposal_boxes = [x.proposal_boxes for x in proposals]
features_list = [features[f] for f in predictor.model.roi_heads.in_features]
proposal_rois = predictor.model.roi_heads.box_pooler(features_list, proposal_boxes)
box_features = predictor.model.roi_heads.box_head(proposal_rois)
predictions = predictor.model.roi_heads.box_predictor(box_features)
# found here: https://detectron2.readthedocs.io/_modules/detectron2/modeling/roi_heads/roi_heads.html

# Keep only the proposals that survive score thresholding / NMS
pred_instances, pred_inds = predictor.model.roi_heads.box_predictor.inference(predictions, proposals)
pred_instances = predictor.model.roi_heads.forward_with_given_boxes(features, pred_instances)

# Output boxes, masks, scores, etc., scaled back to the original image size
pred_instances = predictor.model._postprocess(pred_instances, inputs, images.image_sizes)

# Features of the kept boxes
feats = box_features[pred_inds]
proposal_boxes = proposals[0].proposal_boxes[pred_inds]
This question has already been well discussed in this issue: https://github.com/facebookresearch/detectron2/issues/5.
The documentation also explains how to achieve this.

Machine Learning: Predict New Values

I'm really new to ML. I trained a model on my dataset and saved it with pickle. My training dataset has text and a value. I'm trying to get predictions for my new dataset, which has only text.
However, when I try to predict new values with my trained model, I get an error that says:
ValueError: Number of features of the model must match the input. Model n_features is 17804 and input n_features is 24635
You can check my code below. What do I have to do at this point?
with open('trained.pickle', 'rb') as read_pickle:
    loaded = pickle.load(read_pickle)

dataset2 = pandas.read_csv('/root/Desktop/predict.csv', encoding='cp1252')
X2_train = dataset2['text']
train_tfIdf = vectorizer_tfidf.fit_transform(X2_train.values.astype('U'))
x = loaded.predict(train_tfIdf)
print(x)
fit_transform fits to the data and then transforms it, which you don't want to do at prediction time; it is like retraining the TF-IDF vectorizer. So, for prediction, I would suggest simply using the transform method, with the vectorizer that was fitted on the training data.
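A minimal sketch of that fix, assuming the TfidfVectorizer fitted at training time was also pickled (the file name vectorizer.pickle here is hypothetical):

import pickle
import pandas

with open('trained.pickle', 'rb') as read_pickle:
    loaded = pickle.load(read_pickle)
with open('vectorizer.pickle', 'rb') as read_vec:                # the *fitted* TfidfVectorizer
    vectorizer_tfidf = pickle.load(read_vec)

dataset2 = pandas.read_csv('/root/Desktop/predict.csv', encoding='cp1252')
X2_new = dataset2['text']

# transform (not fit_transform): reuse the training vocabulary so the feature
# count matches what the model was trained on
new_tfidf = vectorizer_tfidf.transform(X2_new.values.astype('U'))
print(loaded.predict(new_tfidf))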

How to apply random forest properly?

I am new to machine learning and Python. I am trying to apply a random forest to predict a binary target. My data has 24 predictors (1,000 observations); one of them is categorical (gender) and all the others are numerical. Among the numerical ones there are two kinds of values: volumes of money in euros (very skewed and on a large scale) and counts (number of transactions from an ATM). I transformed the large-scale features and did the imputation. Finally, I checked correlation and collinearity and removed some features based on that (as a result I had 24 features). Now, when I fit the random forest it is always perfect on the training set, while the cross-validation scores are not so good, and applying it to the test set gives very low recall values. How should I remedy this?
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold   # current API; replaces the old KFold(n, n_folds=...)
import numpy as np
import pandas as pd

def classification_model(model, data, predictors, outcome):
    # Fit the model:
    model.fit(data[predictors], data[outcome])
    # Make predictions on the training set:
    predictions = model.predict(data[predictors])
    # Print accuracy
    accuracy = metrics.accuracy_score(data[outcome], predictions)
    print("Accuracy : %s" % "{0:.3%}".format(accuracy))

    # Perform k-fold cross-validation with 5 folds
    kf = KFold(n_splits=5)
    error = []
    for train_idx, test_idx in kf.split(data):
        # Filter training data
        train_predictors = data[predictors].iloc[train_idx, :]
        # The target we're using to train the algorithm.
        train_target = data[outcome].iloc[train_idx]
        # Training the algorithm using the predictors and target.
        model.fit(train_predictors, train_target)
        # Record the score from each cross-validation run
        error.append(model.score(data[predictors].iloc[test_idx, :], data[outcome].iloc[test_idx]))
    print("Cross-Validation Score : %s" % "{0:.3%}".format(np.mean(error)))

    # Fit the model again so that it can be referred to outside the function:
    model.fit(data[predictors], data[outcome])

outcome_var = 'Sold'
model = RandomForestClassifier(n_estimators=20)
predictor_var = train.drop('Sold', axis=1).columns.values
classification_model(model, train, predictor_var, outcome_var)

# Create a series with feature importances:
featimp = pd.Series(model.feature_importances_, index=predictor_var).sort_values(ascending=False)
print(featimp)

outcome_var = 'Sold'
model = RandomForestClassifier(n_estimators=20, max_depth=20, oob_score=True)
predictor_var = ['fet1', 'fet2', 'fet3', 'fet4']
classification_model(model, train, predictor_var, outcome_var)
With random forests it is very easy to overfit, and that is what is happening here. To resolve it, you need to do the parameter search a little more rigorously to find the best parameters to use. Here is the scikit-learn example on how to do this: http://scikit-learn.org/stable/auto_examples/model_selection/randomized_search.html. It shows both grid search and randomized search for hyperparameter estimation (a short sketch follows below).
It may also be fun to go through this MIT Artificial Intelligence lecture to get a deeper theoretical orientation: https://www.youtube.com/watch?v=UHBmv7qCey4&t=318s.
Hope this helps!
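For concreteness, here is a short sketch of such a search using scikit-learn's RandomizedSearchCV, reusing the train, predictor_var and outcome_var names from the question; the parameter ranges and the recall scoring are illustrative choices, not tuned values:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

param_dist = {
    'n_estimators': [100, 200, 500],
    'max_depth': [3, 5, 10, None],
    'min_samples_leaf': [1, 5, 10, 25],
    'max_features': ['sqrt', 0.3, 0.5],
}

search = RandomizedSearchCV(
    RandomForestClassifier(oob_score=True, random_state=0),
    param_distributions=param_dist,
    n_iter=25,
    cv=5,
    scoring='recall',        # the question cares about recall on the positive class
    random_state=0,
)
search.fit(train[predictor_var], train[outcome_var])
print(search.best_params_)
print(search.best_score_)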
