Preparing image data to input into pre-built CNN - python-3.x

Hello everyone.
I am trying to create a CNN which, upon being fed images, classifies which part of the image to focus on. For that purpose, I collected data by recording human gaze on a given video and dividing each video frame into 9 different areas. With the actual gaze data acting as the supervisory signal, I am trying to make my system learn to mimic a human's eye gaze.
For starters, I am using a pre-built CNN for classifying the MNIST dataset with TensorFlow. I am currently trying to make my dataset follow the format of the MNIST dataset (keras.datasets.mnist). I have video frames in .jpg format and the corresponding grid areas as a NumPy array.
I am stuck on how to correctly label and format my images so that I can feed them directly into the pre-built CNN. I am using TensorFlow 2.7.0 and Python 3.9.7 via conda.
Any help is much appreciated.
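One way to approach this, as a minimal sketch: stack the grayscale frames and the per-frame grid labels into the same `(num_samples, height, width)` / `(num_samples,)` layout that `keras.datasets.mnist.load_data()` returns. The helper name `to_mnist_layout` is hypothetical, and the actual `.jpg` loading (e.g. with PIL or `tf.io`) is assumed to have happened already:

```python
import numpy as np

def to_mnist_layout(frames, grid_labels):
    """frames: list of 2-D uint8 arrays (grayscale frames, all the same size).
    grid_labels: list of ints in 0..8 (the gaze grid cell for each frame)."""
    x = np.stack(frames).astype("uint8")        # (N, H, W), like MNIST x_train
    y = np.asarray(grid_labels, dtype="uint8")  # (N,),      like MNIST y_train
    if x.shape[0] != y.shape[0]:
        raise ValueError("one label per frame is required")
    return x, y

# Tiny synthetic demo: 3 fake 28x28 frames labeled with grid cells 0, 4, 8.
frames = [np.zeros((28, 28), dtype="uint8") for _ in range(3)]
x, y = to_mnist_layout(frames, [0, 4, 8])
print(x.shape, y.shape)  # (3, 28, 28) (3,)
```

Once the arrays have this layout, they can be passed to the pre-built MNIST CNN exactly where `(x_train, y_train)` went before (resizing the frames to whatever input size that CNN expects).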

Related

Pytorch Geometric: How to create a small temporary dataset in a colab notebook from a list of pytorch geometric data objects

I am working in a Colab notebook and have generated a Python list of PyTorch Geometric data objects. I now want to turn them into a dataset for use just within this notebook. How can I do this? The existing documentation seems geared towards long-lived datasets.
When I worked with standard PyTorch, I used a combination of torch.FloatTensor() and TensorDataset() to create my own dataset for use with random_split.
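For quick, notebook-only use, a full Dataset subclass may be unnecessary: torch_geometric.loader.DataLoader also accepts a plain Python list of Data objects directly. If you do want a dataset wrapper, a minimal in-memory one only needs `__len__` and `__getitem__`. The sketch below uses strings as stand-ins for the actual Data objects, and the name `ListDataset` is hypothetical:

```python
# Minimal in-memory dataset wrapper following the torch.utils.data.Dataset
# protocol: anything with __len__ and __getitem__ works with DataLoader
# and random_split, so wrapping a Python list is enough for one notebook.
class ListDataset:
    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]

# Stand-ins for torch_geometric Data objects.
ds = ListDataset(["graph0", "graph1", "graph2"])
print(len(ds), ds[1])  # 3 graph1
```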

How can I create labels for an LSTM?

I am working on a video classification problem, so I converted my videos into images: every video was converted into 7 images, which I then passed through VGG16 to extract features, and I use an LSTM to learn the sequence of images for every video.
When I feed the VGG16 results to my LSTM, I need to give each group of 7 images one label, because I am dealing with sequences. So I need to create these labels myself; how can I create the labels in Python?
By the way, it is a binary classification problem.
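A minimal sketch of the labeling (assuming the VGG16 features are already grouped into sequences of 7 frames per video; the shapes and label values here are made up): since each 7-frame sequence is one sample, you need one label per video, not per frame:

```python
import numpy as np

# Placeholder VGG16 feature sequences: (num_videos, frames_per_video, feat_dim)
num_videos = 4
features = np.zeros((num_videos, 7, 512))

# One binary label per video/sequence, e.g. first two videos are class 0.
labels = np.array([0, 0, 1, 1], dtype="int64")

# Sanity check: exactly one label per sequence.
assert features.shape[0] == labels.shape[0]
print(features.shape, labels.shape)  # (4, 7, 512) (4,)
```

Pairs shaped this way can be fed to an LSTM that consumes `(batch, timesteps, features)` input.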

Applying a pytorch CNN to video?

I am looking for some advice on how to apply a pytorch CNN to a video as opposed to an image.
Picture a drone flying over an area and using video to capture some objects below. I have a CNN trained on images of objects, and want to count the objects in the video.
Currently my strategy has been to convert the video to frames as PNGs and run the CNN on those PNGs. This seems inefficient, and I am struggling with how to count the objects without duplicates (frame 1 and frame 1+n will overlap).
It would be appreciated if someone had some advice, or a suggested tutorial/code set that did this. Thanks in advance.
PyTorch at the moment doesn't have built-in support for detecting and tracking objects in a video; you would need to create your own logic for that.
The built-in support is limited to reading video and audio from a file, reading frames and timestamps, and writing video (read more here).
What you will basically need to do is create your own frame-by-frame object tracking: keep each detection's bounding-box position and decide from the overlap between frames whether it is the same object or not.
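A minimal sketch of that frame-to-frame matching idea (the 0.5 IoU threshold is an assumption to tune): treat a detection in the new frame as the same object as one in the previous frame when their bounding boxes overlap enough, and only count detections that match nothing:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def count_new_objects(prev_boxes, cur_boxes, thr=0.5):
    """Count detections in the current frame that overlap nothing in the
    previous frame, so overlapping frames don't double-count objects."""
    return sum(
        all(iou(c, p) < thr for p in prev_boxes) for c in cur_boxes
    )

prev = [(0, 0, 10, 10)]
cur = [(1, 1, 11, 11), (50, 50, 60, 60)]
print(count_new_objects(prev, cur))  # 1 (only the second box is new)
```

This naive scheme breaks down with fast camera motion; compensating for the drone's movement between frames (or using a proper tracker) would be the next step.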
If you have a drone flying and inspecting people, you may check the Kinetics-based video models for detecting human actions:
ResNet 3D 18
ResNet MC 18
ResNet (2+1)D
All are based on Kinetics-400; the newer dataset is Kinetics-700.
You can also try using torchvision and torch to recognize objects in a YouTube video:
https://dida.do/blog/how-to-recognise-objects-in-videos-with-pytorch

How can I load my own image into a network in Python?

I have made a convolutional neural network for the MNIST data. Now I want to change the input to my own images. How can I do it? Do I need to save the pictures in a specific format? In addition, how do I save all the pictures and train on them one after the other? I am using TensorFlow with Python.
TensorFlow has support for BMP, GIF, JPEG, and PNG out of the box.
So load the data (read the file into memory as a 0-D tensor of type string), then pass it to tf.image.decode_image, or one of the specialized functions if that doesn't work for some reason.
You should get back the image as a tensor of shape [height, width, channels] (the channels dimension might be missing if you have a single-channel image, like grayscale).
To make this work nicely, you should have all the images in the same format. If you can load all the images into RAM and pass them in bulk, go for it, since it's probably the easiest thing to do. The next easiest thing would be to copy the images into tf.train.Example records and use tf.TFRecordReader to do the shuffling and batching. If all else fails, I think you can set up the input functions to read the images on demand and pipe them through the batching mechanism, but I'm not sure how I would do that.
Here's a link to the tensorflow documentation related to images.
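A small sketch of that load-and-decode step in TF 2.x syntax (the tiny image here is encoded in memory purely for illustration; with a real file you would use `tf.io.read_file("photo.jpg")` to get the string tensor instead):

```python
import tensorflow as tf

# Encode a dummy 8x8 RGB image to PNG bytes in memory (stand-in for a file
# on disk that tf.io.read_file would give you as a string tensor).
raw = tf.image.encode_png(tf.zeros((8, 8, 3), dtype=tf.uint8))

# Decode the bytes into an image tensor of shape [height, width, channels].
img = tf.io.decode_image(raw, channels=3)
print(img.shape, img.dtype)  # (8, 8, 3) uint8
```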

Query on Machine Learning with Scikit-Learn data uploading

I am trying to develop a machine-learning-based image classification system using Scikit-Learn; what I am trying to do is multi-class classification. The biggest problem I am facing with Scikit-Learn is how to load the data. Then I came across one of the examples, face_recognition.py, which uses fetch_lfw_people to fetch data from the internet. I could see this example actually does multi-class classification, but I was unable to find documentation on it. I have some questions here: what does fetch_lfw_people do, and what does this function load into lfw_people? Also, I saw some text files in the data folder; is the code reading those text files? My main intention is to load my own set of image data, but I am unable to do it with fetch_lfw_people: when I change the path to my image folder via data_home and set funneled=False, I get errors. I hope I get some answers here.
First things first: you can't directly give images as input to your classifier. You have to extract some features from your images, or you can load each image with OpenCV and use the resulting NumPy array as input to your classifier.
I would suggest you read some basics of image classification first, such as how a classifier is trained.
Coming to your question about the fetch_lfw_people function: it downloads pre-processed image data (the Labeled Faces in the Wild dataset). If you are training on your own images, you first have to convert your image data into numerical features.
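As a minimal sketch of that last point (assuming the images have already been loaded into equally sized NumPy arrays; the loading itself could be done with PIL, OpenCV, or imageio): scikit-learn estimators expect a 2-D array of shape `(n_samples, n_features)`, so each image is flattened into one row:

```python
import numpy as np

# Five placeholder 32x32 grayscale images (real ones would come from disk).
images = [np.zeros((32, 32), dtype="uint8") for _ in range(5)]

# Flatten each image into one feature row: (n_samples, n_features).
X = np.stack(images).reshape(len(images), -1)

# One class label per image (multi-class, like the LFW example).
y = np.array([0, 1, 2, 0, 1])
print(X.shape, y.shape)  # (5, 1024) (5,)
```

An `(X, y)` pair in this shape can be passed straight to any scikit-learn classifier's `fit(X, y)`, which is what fetch_lfw_people prepares for you behind the scenes.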