Extract object features from Detectron2

I am really new to object detection, so sorry if this question seems too obvious.
I have a trained Faster R-CNN model in Detectron2 that detects objects, and I am trying to extract the features of each detected object from the output of my model's prediction. The available tutorials seem to generate new region proposals and predictions, along with their boxes and features. I already have the boxes from inference; I just need to extract the features of each box. I added the code I've been working with below. Thank you.
# Preprocessing
images = predictor.model.preprocess_image(inputs)  # don't forget to preprocess
# Run the backbone (Res1-Res4) to get a dict of CNN feature maps
features = predictor.model.backbone(images.tensor)
# Generate region proposals with the RPN
proposals, _ = predictor.model.proposal_generator(images, features, None)
# Run the RoI head for each proposal (RoI pooling + Res5)
proposal_boxes = [x.proposal_boxes for x in proposals]
features_list = [features[f] for f in predictor.model.roi_heads.in_features]
proposal_rois = predictor.model.roi_heads.box_pooler(features_list, proposal_boxes)
box_features = predictor.model.roi_heads.box_head(proposal_rois)
# Per-class scores and box deltas; found here:
# https://detectron2.readthedocs.io/_modules/detectron2/modeling/roi_heads/roi_heads.html
predictions = predictor.model.roi_heads.box_predictor(box_features)
pred_instances, pred_inds = predictor.model.roi_heads.box_predictor.inference(predictions, proposals)
pred_instances = predictor.model.roi_heads.forward_with_given_boxes(features, pred_instances)
# Output boxes, masks, scores, etc.
pred_instances = predictor.model._postprocess(pred_instances, inputs, images.image_sizes)  # scale boxes back to the original image size
# Features of the boxes kept by inference
feats = box_features[pred_inds]
proposal_boxes = proposals[0].proposal_boxes[pred_inds]

This question has already been well discussed in this issue: https://github.com/facebookresearch/detectron2/issues/5.
The documentation also explains how to achieve this.
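For reference, a minimal sketch of how the snippet above can be driven end to end (the image path is a placeholder of mine; a DefaultPredictor-style model expects a BGR float32 CHW tensor, and inference should run without gradients):

import cv2
import torch

raw = cv2.imread("input.jpg")  # placeholder path; detectron2 models expect BGR by default
height, width = raw.shape[:2]
image = torch.as_tensor(raw.astype("float32").transpose(2, 0, 1))
inputs = [{"image": image, "height": height, "width": width}]

with torch.no_grad():
    # ... run the feature-extraction snippet above ...
    pass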

Related

Can't get Keras Code Example #1 to work with multi-label dataset

Apologies in advance.
I am attempting to recreate this CNN (from the Keras Code Examples) with another dataset:
https://keras.io/examples/vision/image_classification_from_scratch/
The dataset I am using is one of retinal scans, and it rates images on a scale from 0-4, so it is really a multi-class (five-class) image classification problem.
The Keras example is binary classification (cats vs. dogs), though I had hoped that wouldn't make much difference (maybe this is a big assumption on my part).
I skipped the 'image augmentation' part of the walkthrough. So, I have not created the
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
    ]
)
part. So, instead of:
def make_model(input_shape, num_classes):
    inputs = keras.Input(shape=input_shape)
    # Image augmentation block
    x = data_augmentation(inputs)
    # Entry block
    x = layers.Rescaling(1.0 / 255)(x)
    .......
at the beginning of the model, I have:
def make_model(input_shape, num_classes):
    inputs = keras.Input(shape=input_shape)
    # Image augmentation block
    x = keras.Sequential(inputs)
    # Entry block
    x = layers.Rescaling(1.0 / 255)(x)
    .......
However, I keep getting different errors no matter how much I change things around, such as "TypeError: Keras symbolic inputs/outputs do not implement __len__." or "ValueError: Exception encountered when calling layer "rescaling_3" (type Rescaling).".
What am I missing here?
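For what it's worth, a minimal sketch of the likely fix: keras.Sequential(inputs) treats the symbolic input tensor as a list of layers, which is the likely source of the __len__ error. If you skip augmentation, feed the inputs straight into the entry block instead (hedged; the conv/pooling lines are stand-ins for the body of the Keras example, and with five classes the binary head also needs changing):

from tensorflow import keras
from tensorflow.keras import layers

def make_model(input_shape, num_classes):
    inputs = keras.Input(shape=input_shape)
    # No augmentation block: pass the symbolic inputs directly into the entry block
    x = layers.Rescaling(1.0 / 255)(inputs)
    # ... body unchanged from the Keras example; stand-in layers shown here ...
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    # Five classes, so softmax over num_classes rather than a single sigmoid unit
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)

With a softmax head you would compile with loss="sparse_categorical_crossentropy" (for integer 0-4 labels) instead of the binary cross-entropy used in the example.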

How to apply a model developed in fast.ai / PyTorch?

I've trained a model which I'm trying to apply to new data. I'm totally new to fast.ai.
I'm creating my databunch as below (df being the data I want to score):
bs = 64
data_lm = (TextList.from_df(df, path, cols='comment_text')
           .split_by_rand_pct(0.1)
           .label_for_lm()
           .databunch(bs=bs))
The problem is that I cannot omit the .split_by_rand_pct(0.1), so I cannot score the whole dataset.
I then load and apply the model as below:
data_clas = load_data(path, 'data_clas.pkl', bs=bs)
learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn.load_encoder('fine_tuned_enc')
learn.load('third');
preds, target = learn.get_preds(DatasetType.Test, ordered=True)
labels = preds.numpy()
But the problem is I'm only scoring 10% of my data, as the first piece of code where I create the databunch is not correct... I want to apply the saved/loaded model to the overall DataFrame.
Many thanks in advance
A colleague of mine actually provided me with the solution; I'm posting it here in case it's useful to anyone.
learn.data.add_test(df['Contact_Text'])
preds, y = learn.get_preds(ds_type=DatasetType.Test)
preds
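A small follow-up sketch for reading off the predicted classes (hedged; assumes a fastai v1 learner whose learn.data.classes holds the label names):

# get_preds returns one probability per class; map each row to its most likely label
pred_idx = preds.argmax(dim=1)
pred_labels = [learn.data.classes[i] for i in pred_idx]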

How to output the loss gradient backpropagation path through a PyTorch computational graph

I have implemented a new loss function in PyTorch.
# model_1 needs to be trained
outputs = model_1(input)
loss = myloss(outputs, labels)
# outputs: how much to resize an image
# labels: the image file index
# Below is what myloss() does internally:
org_file_name = pic_ + str(labels[0]) + ".png"
new_image = compress(org_image, outputs)
accuracy_loss = run_pretrained_yolov3(org_image, new_image)
# The next two lines modify the same DAG
prev_loss = torch.mean((outputs - labels) ** 2)
new_loss = (accuracy_loss / prev_loss.item()) * prev_loss
new_loss.backward()
Can anyone suggest how I can see how the loss gradient backpropagates through the computational graph?
[That is: inside the myloss() function I use another pre-trained model, applied in testing mode, to get the difference / final loss value.] Now I want to know whether new_loss backpropagates only through model_1, or first through yolov3 and then through model_1. The pretrained yolov3 is used in testing mode only.
I have tried TensorBoard, but it does not provide that option. Any suggestions would be highly helpful.
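One way to answer this yourself, as a minimal sketch: every tensor produced by a differentiable operation carries a grad_fn, and walking .next_functions backwards from the loss prints exactly which operations gradients will flow through. If run_pretrained_yolov3 ran under torch.no_grad() (or its output was detached), its operations simply will not appear in this graph:

# Minimal sketch: print the autograd graph backwards from the loss
def walk_graph(fn, depth=0, max_depth=8):
    if fn is None or depth > max_depth:
        return
    print("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        walk_graph(next_fn, depth + 1, max_depth)

walk_graph(new_loss.grad_fn)

Note also that prev_loss.item() is a plain Python number, so in new_loss = (accuracy_loss / prev_loss.item()) * prev_loss, gradients reach yolov3 only if accuracy_loss is still attached to the graph; otherwise backpropagation goes through prev_loss and model_1 alone.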

Cross-validation in sklearn using a custom CV

I am dealing with a binary classification problem.
I have two lists of indices, listTrain and listTest, which are partitions of the training set (the actual test set will be used only later). I would like to use the samples associated with listTrain to estimate the parameters, and the samples associated with listTest to evaluate the error in a cross-validation process (hold-out set approach).
However, I have not been able to find the correct way to pass this to sklearn's GridSearchCV.
The documentation says that I should create "An iterable yielding (train, test) splits as arrays of indices", but I do not know how to create this.
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=custom_cv,
                           n_jobs=-1, verbose=0, scoring=errorType)
So, my question is: how do I create custom_cv based on these indices, to be used in this method?
X and y are, respectively, the feature matrix and the vector of labels.
Example: suppose that I only have one hyperparameter alpha, which belongs to the set {1, 2, 3}. I would like to set alpha=1, estimate the parameters of the model (for instance the coefficients of a regression) using the samples associated with listTrain, and evaluate the error using the samples associated with listTest. Then I repeat the process for alpha=2 and finally for alpha=3, and choose the alpha that minimizes the error.
EDIT: Actual answer to the question. Try passing the cv argument a generator of the indices:
def index_gen(listTrain, listTest):
    yield listTrain, listTest

grid_search = GridSearchCV(estimator=model, param_grid=param_grid,
                           cv=index_gen(listTrain, listTest),
                           n_jobs=-1, verbose=0, scoring=errorType)
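As an aside, scikit-learn also ships a helper for exactly this single-split case, which can serve as the custom_cv above; a minimal sketch (assuming listTrain and listTest together cover all rows of X):

import numpy as np
from sklearn.model_selection import PredefinedSplit

# -1 marks rows that always stay in training; fold 0 is the held-out set
test_fold = np.full(len(X), -1)
test_fold[listTest] = 0
custom_cv = PredefinedSplit(test_fold)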
EDIT: Original answer, before the edits above:
As mentioned in the comment by desertnaut, what you are trying to do is bad ML practice, and you will end up with a biased estimate of the generalisation performance of the final model. Using the test set in the manner you're proposing effectively leaks test-set information into the training stage, and gives you an overestimate of the model's ability to classify unseen data. What I suggest in your case:
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=5,
                           n_jobs=-1, verbose=0, scoring=errorType)
grid_search.fit(x[listTrain], y[listTrain])
Now your training set will be split into 5 folds (you can choose the number here); for each hyperparameter setting, the model is trained on 4 of those folds and tested on the one left out. This is repeated 5 times, until all of your training examples have been part of a left-out fold. The whole procedure is done for each hyperparameter setting you are testing (5x3 = 15 fits in this case).
grid_search.best_params_ will give you a dictionary of the parameters that performed the best over all 5 folds. These are the parameters that you use to train your final classifier, using again only the training set:
clf = LogisticRegression(**grid_search.best_params_).fit(x[listTrain], y[listTrain])
Now, finally, your classifier is tested on the test set, giving an unbiased estimate of its generalisation performance:
predictions = clf.predict(x[listTest])

How to apply random forest properly?

I am new to machine learning and Python. I am trying to apply a random forest to predict the binary outcome of a target. In my data I have 24 predictors (1000 observations), one of which is categorical (gender) and all the others numerical. Among the numerical ones there are two types of values: volumes of money in euros (very skewed and widely scaled) and counts (number of transactions from an ATM). I have transformed the large-scale features and done the imputation. Finally, I checked correlation and collinearity and removed some features based on that (as a result I had 24 features). Now when I run the random forest it is always perfect on the training set, while the scores are not so good under cross-validation, and applying it to the test set gives very, very low recall values. How should I remedy this?
def classification_model(model, data, predictors, outcome):
    # Fit the model:
    model.fit(data[predictors], data[outcome])
    # Make predictions on the training set:
    predictions = model.predict(data[predictors])
    # Print accuracy
    accuracy = metrics.accuracy_score(predictions, data[outcome])
    print("Accuracy : %s" % "{0:.3%}".format(accuracy))
    # Perform k-fold cross-validation with 5 folds
    kf = KFold(data.shape[0], n_folds=5)
    error = []
    for train, test in kf:
        # Filter training data
        train_predictors = data[predictors].iloc[train, :]
        # The target we're using to train the algorithm.
        train_target = data[outcome].iloc[train]
        # Training the algorithm using the predictors and target.
        model.fit(train_predictors, train_target)
        # Record error from each cross-validation run
        error.append(model.score(data[predictors].iloc[test, :], data[outcome].iloc[test]))
    print("Cross-Validation Score : %s" % "{0:.3%}".format(np.mean(error)))
    # Fit the model again so that it can be referred to outside the function:
    model.fit(data[predictors], data[outcome])
outcome_var = 'Sold'
model = RandomForestClassifier(n_estimators=20)
predictor_var = train.drop('Sold', axis=1).columns.values
classification_model(model, train, predictor_var, outcome_var)
# Create a series with feature importances:
featimp = pd.Series(model.feature_importances_, index=predictor_var).sort_values(ascending=False)
print(featimp)
outcome_var = 'Sold'
model = RandomForestClassifier(n_estimators=20, max_depth=20, oob_score=True)
predictor_var = ['fet1', 'fet2', 'fet3', 'fet4']
classification_model(model, train, predictor_var, outcome_var)
With a random forest it is very easy to overfit. To resolve this you need to do the parameter search a little more rigorously to find the best parameters to use. Here is the link on how to do this, from the scikit-learn docs: http://scikit-learn.org/stable/auto_examples/model_selection/randomized_search.html
Your model is overfitting, and you need to search for the parameters that work best. The link provides implementations of both grid search and randomized search for hyperparameter estimation.
It will also be fun to go through this MIT Artificial Intelligence lecture to get a deep theoretical orientation: https://www.youtube.com/watch?v=UHBmv7qCey4&t=318s
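To make the parameter search concrete, here is a minimal sketch of a randomized search over the forest's main capacity-controlling parameters, in the spirit of the linked example (the ranges are illustrative only; train, predictor_var, and outcome_var are the names from the question, and scoring is set to recall since that is the metric suffering here):

from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

param_dist = {
    "n_estimators": randint(50, 500),
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": randint(1, 20),
    "max_features": ["sqrt", "log2", None],
}
search = RandomizedSearchCV(RandomForestClassifier(), param_dist, n_iter=20,
                            cv=5, scoring="recall", n_jobs=-1, random_state=0)
search.fit(train[predictor_var], train[outcome_var])
print(search.best_params_)
print(search.best_score_)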
Hope this helps!
