TensorFlow Object Detection API (detect small object in 720x1280 image)

Problem: I can't detect a small object (a flying drone) in a 720x1280 frame.
Library: TensorFlow Object Detection API
Model: SSD MobileNet V1 COCO
Dataset: 260,000 Train | 10,000 Test (both 720x1280)
The object's size ranges from 25x25 to 60x60 pixels, and there is only one class.
Question 1: SSD_MobileNet_V1_COCO has an image_resizer parameter set to 300x300. Does that mean the 720x1280 frame is resized before being fed into the convolutional layers?
Question 2: Would slicing the dataset into smaller regions that contain the object at randomized locations, and feeding those slices as the train/test set, help with detecting the small object?
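For Question 1: yes, with the stock SSD_MobileNet_V1_COCO pipeline the fixed_shape_resizer squeezes every 720x1280 frame down to 300x300 before the convolutional layers, so a 25x25 drone ends up roughly 6x10 pixels, near the limit of what the network can detect. For Question 2, slicing is a common workaround; here is a minimal sketch of such a random-position slicer, assuming ground-truth boxes in pixel coordinates (the names frame, box and crop_size are illustrative):

import numpy as np

def random_crop_around_box(frame, box, crop_size=300, rng=np.random):
    # frame: HxWxC array (e.g. 720x1280x3)
    # box:   (ymin, xmin, ymax, xmax) of the drone, in pixel coordinates
    h, w = frame.shape[:2]
    ymin, xmin, ymax, xmax = box
    # Valid top-left corners are those that keep the whole box inside the crop.
    y0 = rng.randint(max(0, ymax - crop_size), min(ymin, h - crop_size) + 1)
    x0 = rng.randint(max(0, xmax - crop_size), min(xmin, w - crop_size) + 1)
    crop = frame[y0:y0 + crop_size, x0:x0 + crop_size]
    # Shift the annotation into the crop's coordinate system.
    new_box = (ymin - y0, xmin - x0, ymax - y0, xmax - x0)
    return crop, new_box

In a 300x300 slice the drone keeps its native 25x25 to 60x60 size instead of being shrunk by the resizer. At inference time you would tile each full frame into overlapping 300x300 windows and merge the detections.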

Related

Multi-Class imbalance with a 3D array

I have created a dataset for human activity recognition with accelerometer data (x, y, z) and heart rate (bpm). I have segmented the data into 2.5-second windows (20 Hz, i.e. 50 samples), but there is a class imbalance that I would like to correct with a method such as SMOTE. The problem is that I have not found a way to do this without corrupting the samples.
The shape is (Y, 50, 4), where Y is of arbitrary length.
That means that any set of x resampled samples has to keep the same (x, 50, 4) shape.
The dataset will be used to train a CNN-LSTM model.
I can't apply reshape(-1, 4) beforehand, over- and under-sample, and then reshape back to the original shape, because that would corrupt the segments of length 50.
Any idea how this can be done, preferably with established libraries such as scikit-learn or imbalanced-learn?
Or is the best approach to use class_weight when training the model in Keras?
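One way to resample without breaking the windows is to flatten each (50, 4) segment into a single 200-dimensional vector, apply SMOTE, and fold the result back: the interpolation then happens between whole windows of the same class rather than between individual timesteps. A minimal sketch with imbalanced-learn, assuming X has shape (Y, 50, 4) and y holds the (Y,) class labels:

import numpy as np
from imblearn.over_sampling import SMOTE

def smote_windows(X, y, random_state=0):
    n, steps, feats = X.shape                      # (Y, 50, 4)
    X_flat = X.reshape(n, steps * feats)           # one 200-dim vector per window
    X_res, y_res = SMOTE(random_state=random_state).fit_resample(X_flat, y)
    return X_res.reshape(-1, steps, feats), y_res  # back to (Y', 50, 4)

Whether linearly interpolated accelerometer windows are physically plausible is debatable, so it is worth comparing this against simply passing a class_weight dictionary to model.fit in Keras.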

Model unable to identify distant objects

I have made an object recognition and detection model using TensorFlow. It identifies objects that are clearly visible, but it is unable to identify the same object at a large distance. I am using a Faster R-CNN model: it recognizes the object when it is close, but not when it is far away, even though it has already been trained on that object. How can I make the model identify objects at a distance?
You can resize and pad images containing clearly visible objects, using data augmentation, so that the objects look as if they are at a large distance, and then continue training your model on those images.
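A minimal sketch of that shrink-and-pad augmentation, assuming OpenCV-style HxWxC images (the scale value is an arbitrary choice, and the ground-truth boxes must be scaled and shifted the same way):

import cv2
import numpy as np

def simulate_distance(image, scale=0.3):
    # Downscale the image, then pad it back to its original size,
    # so objects look as if they were photographed from far away.
    h, w = image.shape[:2]
    small = cv2.resize(image, (int(w * scale), int(h * scale)))
    canvas = np.zeros_like(image)  # black padding around the shrunken image
    y0 = (h - small.shape[0]) // 2
    x0 = (w - small.shape[1]) // 2
    canvas[y0:y0 + small.shape[0], x0:x0 + small.shape[1]] = small
    return canvas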

mAP using the TensorFlow Object Detection API

After training my object detector (to detect only cars) with the TensorFlow Object Detection API,
I get an mAP value of around 0.32 when running the eval.py script.
However, on the TensorFlow detection model zoo page (https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md), the mAPs for the various models are 100 times the value I obtained, yet when I look at my model's predictions on the test set, they are actually pretty good.
Should I simply multiply the value I obtained by 100, or is there something wrong with the model I have trained?

Which is the most suitable method for training: model.fit(), model.train_on_batch(), or model.fit_generator()?

I have a training dataset of 600 images of resolution 512x512x1, categorized into 2 classes (300 images per class). Using some augmentation techniques, I have increased the dataset to 10,000 images. After the following preprocessing steps:
all_images = np.array(all_images) / 255.0         # scale pixel values to [0, 1]
all_images = all_images.astype('float16')         # halve the memory footprint
all_images = all_images.reshape(-1, 512, 512, 1)  # add the single channel dimension
I saved these images to an H5 file.
I am using an AlexNet-style architecture for the classification, with 3 convolutional layers and 3 overlapping max-pool layers.
I want to know which of the following cases will be best for training on Google Colab, where memory is limited to 12 GB.
1. model.fit(x,y,validation_split=0.2)
# For this I have to load all the data into memory, and applying AlexNet to that much data simply causes a resource-exhausted error.
2. model.train_on_batch(x,y)
# For this I have written a script that randomly loads batches from the H5 file into memory and trains on them. I am confused by the property of train_on_batch(), i.e. a single gradient update: will this affect my training procedure, or will it be the same as model.fit()?
3. model.fit_generator()
# Giving the original image directory to its data generator, which augments the data automatically, and then training with model.fit_generator(). I haven't tried this yet.
Please guide me on which of these methods is best in my case. I have read many answers here, here, and here about model.fit(), model.train_on_batch(), and model.fit_generator(), but I am still confused.
model.fit - suitable if you can load the data as a NumPy array and train without augmentation.
model.fit_generator - suitable if your dataset is too big to fit in memory and/or you want to apply augmentation on the fly.
model.train_on_batch - less common; usually used when training more than one model at a time (a GAN, for example). A generator-based setup for your H5 file is sketched below.
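Note that model.fit also performs exactly one gradient update per batch internally, so looping model.train_on_batch over well-shuffled batches is step-for-step equivalent to model.fit; what you lose is the built-in shuffling, callbacks, and epoch bookkeeping. For the 10,000-image H5 file on a 12 GB Colab machine, a generator that reads batches straight from disk keeps memory use flat. A sketch assuming h5py and datasets named 'images' and 'labels' (both dataset names, and the path 'dataset.h5', are assumptions):

import h5py
import numpy as np

def h5_batch_generator(path, batch_size=16):
    # Yield (x, y) batches from the HDF5 file without loading it all into memory.
    with h5py.File(path, 'r') as f:
        n = f['images'].shape[0]
        while True:                                   # Keras generators must loop forever
            order = np.random.permutation(n)
            for start in range(0, n, batch_size):
                idx = np.sort(order[start:start + batch_size])  # h5py wants sorted indices
                yield f['images'][idx], f['labels'][idx]

# Usage with Keras (steps_per_epoch tells it how many batches make one epoch):
# model.fit_generator(h5_batch_generator('dataset.h5'), steps_per_epoch=10000 // 16, epochs=20)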

predict() returns image similarities with SVM in scikit learn

A silly question: after I train my SVM in scikit-learn, do I use the predict function predict(X) to predict which class a sample belongs to? (http://scikit-learn.org/dev/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC.predict)
Is the X parameter the image feature vector?
In case I give it an image it was not trained on (not trained because the SVM asks for at least 3 samples per class), what does it return?
First remark: "predict() returns image similarities with SVM in scikit learn" is not a question. Please put a question in the header of Stack Overflow entries.
Second remark: the predict method of the SVC class in sklearn does not return "image similarities" but a class assignment prediction. Read the http://scikit-learn.org documentation and tutorials to understand what we mean by classification and prediction in machine learning.
Is the X parameter the image feature vector?
No, X is not "the image feature vector": it is a set of image feature vectors with shape (n_samples, n_features), as explained in the documentation you refer to. In your case a sample is an image, hence the expected shape would be (n_images, n_features). The predict API was designed to compute many predictions at once for efficiency reasons. If you want to compute a single prediction, you will have to wrap your single feature vector in an array of shape (1, n_features).
For instance, if you have a single (1D) feature vector called my_single_image_features with shape (n_features,), you can call predict with:
predictions = clf.predict([my_single_image_features])
my_single_prediction = predictions[0]
Please note the [] signs around the my_single_image_features variable to turn it into a 2D array.
my_single_prediction will be an integer whose meaning depends on the integer labels you provided when calling the clf.fit(X_train, y_train) method in the first place.
In case I give it an image it was not trained on (not trained because the SVM asks for at least 3 samples per class), what does it return?
An image is not "trained"; only the model is trained. Of course you can pass samples/images that are not part of the training set to the predict method. This is the whole purpose of machine learning: making predictions on new, unseen data based on the statistical regularities learned from the past training data.
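A minimal end-to-end sketch of the flow described above, with made-up 2D feature vectors standing in for real image features:

import numpy as np
from sklearn.svm import SVC

# Toy training set: six feature vectors, two classes labelled 0 and 1.
X_train = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                    [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

clf = SVC()
clf.fit(X_train, y_train)

# A single unseen vector is wrapped into shape (1, n_features) before predict.
my_single_image_features = np.array([0.95, 1.05])
my_single_prediction = clf.predict([my_single_image_features])[0]
print(my_single_prediction)  # prints one of the labels passed to fit, here 1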
