I am working on a video classification problem. I converted each video into 7 images, pass those images to VGG16 to extract features, and then use an LSTM to learn the sequence of images for every video.
When I feed the VGG16 results to my LSTM, I need to give every group of 7 images a single label, because I am dealing with sequences. I have to create these labels myself, so how can I create them in Python?
By the way, it is a binary classification problem.
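A minimal sketch of one way to build the labels, assuming the per-frame VGG16 features are stacked in a 2-D array ordered video by video (7 consecutive rows per video) and that you already know the class of each video. The function name `build_sequences` and the example label values are hypothetical.

```python
import numpy as np

FRAMES_PER_VIDEO = 7  # from the question: every video was converted to 7 images


def build_sequences(frame_features, video_labels, frames_per_video=FRAMES_PER_VIDEO):
    """Group per-frame features into per-video sequences, one label per video.

    frame_features: array of shape (num_videos * frames_per_video, feature_dim),
                    assumed to be ordered video by video.
    video_labels:   one 0/1 label per video, in the same video order.
    """
    frame_features = np.asarray(frame_features)
    video_labels = np.asarray(video_labels, dtype="float32")

    num_frames, feature_dim = frame_features.shape
    if num_frames % frames_per_video != 0:
        raise ValueError("number of frames is not a multiple of frames_per_video")

    num_videos = num_frames // frames_per_video
    if len(video_labels) != num_videos:
        raise ValueError("expected exactly one label per video")

    # Shape expected by an LSTM: (samples, timesteps, features)
    sequences = frame_features.reshape(num_videos, frames_per_video, feature_dim)
    return sequences, video_labels


# Example: 3 videos, 7 frames each, 4 features per frame (tiny numbers for illustration)
features = np.arange(3 * 7 * 4, dtype="float32").reshape(21, 4)
X, y = build_sequences(features, [0, 1, 1])
```

Here `X` has one row per video and `y` has one label per video, so the two can be passed together to the LSTM's training call.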
Related
Hello everyone.
I am trying to create a CNN that, given an input image, classifies which part of the image to focus on. For that purpose, I have collected gaze data from humans watching a given video and divided each video frame into 9 different areas. With the actual gaze data acting as the supervisory signal, I am trying to make my system learn to mimic a human's eye gaze.
For starters, I am using a pre-built CNN for classifying the MNIST dataset with TensorFlow. I am currently trying to make my dataset follow the format of the MNIST dataset in keras.datasets.mnist. I have the video frames in .jpg format and the corresponding grid area as a NumPy array.
I am stuck on how to correctly label and format my images so that I can feed them directly into the pre-built CNN. System: I am using TensorFlow 2.7.0 and Python 3.9.7 via conda.
Any help is much appreciated.
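A hedged sketch of the formatting step: `keras.datasets.mnist.load_data()` returns `(x_train, y_train), (x_test, y_test)`, with the images as a uint8 array and the labels as a 1-D integer array. The helper below, whose name and parameters are hypothetical, arranges already-decoded frames and their grid indices (assumed to be integers 0 to 8) in the same way.

```python
import numpy as np


def make_mnist_style_dataset(frames, grid_labels, test_fraction=0.2, seed=0):
    """Return ((x_train, y_train), (x_test, y_test)) like keras.datasets.mnist.load_data().

    frames:      sequence of equally sized image arrays (already decoded from .jpg).
    grid_labels: one grid-cell index per frame, assumed to be an integer in 0..8.
    """
    x = np.stack([np.asarray(frame, dtype="uint8") for frame in frames])
    y = np.asarray(grid_labels).reshape(-1).astype("int64")

    if len(x) != len(y):
        raise ValueError("need exactly one label per frame")
    if y.min() < 0 or y.max() > 8:
        raise ValueError("grid labels must lie in 0..8")
    y = y.astype("uint8")

    # Shuffle once, then hold out the first part as the test set
    order = np.random.default_rng(seed).permutation(len(x))
    n_test = int(round(len(x) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (x[train_idx], y[train_idx]), (x[test_idx], y[test_idx])


# Example with 10 blank 28x28 grayscale frames standing in for real video frames
demo_frames = [np.zeros((28, 28), dtype="uint8") for _ in range(10)]
demo_labels = np.arange(10) % 9
(x_train, y_train), (x_test, y_test) = make_mnist_style_dataset(demo_frames, demo_labels)
```

Because consecutive video frames are strongly correlated, splitting by video rather than by individual frame may give a more honest test set than the random split used here.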
I have the following problem:
Input: a set of 6 images
Output: a probability for each image determining whether the image is the correct one out of the 6 images
I know how to create a CNN with Keras, but not how to feed it multiple images as a single input.
How would one solve this problem?
One way I can think of is to use a pre-trained model (VGG16 etc.) and extract the feature vectors from some intermediate layer, then concatenate the 6 vectors and feed the result into a neural network (or some other classification model), training it as a multiclass classification task.
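The data layout for that approach could look roughly like the sketch below. It assumes each of the 6 images has already been turned into a feature vector of length D; the function names are hypothetical, and the softmax shown is only the final step that turns 6 classifier scores into one probability per image.

```python
import numpy as np


def make_six_way_example(image_features, correct_index):
    """Build one training example from the features of the 6 candidate images.

    image_features: array of shape (6, D), one feature vector per image.
    correct_index:  position (0..5) of the correct image.
    """
    image_features = np.asarray(image_features, dtype="float32")
    x = image_features.reshape(-1)  # concatenation of the 6 vectors, shape (6 * D,)
    y = np.zeros(len(image_features), dtype="float32")
    y[correct_index] = 1.0  # one-hot target over the 6 images
    return x, y


def softmax(scores):
    """Turn 6 raw scores into probabilities that sum to 1."""
    scores = np.asarray(scores, dtype="float64")
    exp = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return exp / exp.sum()


# Example: 6 images with 4096-dimensional features, the third image is the correct one
x, y = make_six_way_example(np.ones((6, 4096)), correct_index=2)
probabilities = softmax([0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
```

One caveat of plain concatenation is that the classifier becomes sensitive to the order in which the 6 images are presented, so shuffling that order during training may help.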
You can also use an Autoencoder and take the anomaly detection approach.
I have 10 different types of images in a folder. After running predictions on those images with the VGG16 model, I got some labels for them. How can I match those labels to the images in my folder, and how can I separate each type of image into its own folder?
Not getting anything.
('n04536866', 'violin', 0.98542005),
('n03028079', 'church', 0.35847503),
('n02690373', 'airliner', 0.945028),
('n03642806', 'laptop', 0.52074945),
I'm getting predictions like this. Now I want to match these labels with my images and move each kind of image into its own folder.
Please read some basics about neural networks and image classification. The result of your prediction is an n-dimensional vector, where n is the number of classes, and the components of the vector are the probabilities for each class. So, from the example above, the neural network assumes that the input image used for this prediction has a probability of 98.54% of showing a violin.
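As a rough sketch of the matching step: if the list of file paths is kept in the same order as the batch passed to the model, the i-th prediction belongs to the i-th file. The helper below is hypothetical; it assumes one top-1 tuple per image in the `(id, label, probability)` form shown above (with Keras, `decode_predictions(preds, top=1)[i][0]` has that form) and copies each file into a folder named after its label. The `min_prob` threshold and the `uncertain` folder are illustrative choices.

```python
import shutil
from pathlib import Path


def sort_images_by_label(image_paths, predictions, out_dir, min_prob=0.5):
    """Copy every image into out_dir/<label>/, or out_dir/uncertain/ if the score is low.

    image_paths: file paths, in the same order as the images given to the model.
    predictions: one (class_id, label, probability) tuple per image (top-1 prediction).
    """
    out_dir = Path(out_dir)
    placed = {}
    for path, (_class_id, label, prob) in zip(image_paths, predictions):
        folder = label if prob >= min_prob else "uncertain"
        target_dir = out_dir / folder
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target_dir / Path(path).name)  # copy, keep the original
        placed[str(path)] = folder
    return placed
```

With the sample output above, an image predicted as `('n04536866', 'violin', 0.98542005)` would land in `violin/`, while the `church` prediction at 0.358 would fall below the assumed threshold and go to `uncertain/`.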
I have a VGG16 classifier and I'm feeding it the crops of an image obtained by dividing the image into a 3x3 grid. So I input, in sequence, 9 crops of the same size (224x224), and for each crop I collect the output of fc7, of shape (4096,), and store it.
Then I concatenate the 9 outputs into a tensor of shape (9, 4096) and feed it into an LSTM that treats the 9 crops as timesteps.
This is the flow for a single image, but of course I have multiple images. So I have to process the 9 crops per image, store all the feature vectors, and then create a new dataset that consists of all the concatenations.
Is there a way to make it end-to-end? Basically, process the 9 crops at once and input a tensor of shape (9, 4096) directly into the second part of the network?
I'm using Keras with Tensorflow backend.
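One possible way to make this end-to-end, sketched under assumptions: wrap the CNN in Keras's `TimeDistributed` layer so that it is applied to each of the 9 crops inside a single model, and feed the resulting sequence straight into the LSTM. The small `crop_encoder` below is a hypothetical stand-in for VGG16 up to fc7 (to keep the example light); the crop size, layer widths, and two-class output are illustrative choices, not values from the question.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_crop_sequence_model(num_crops=9, crop_size=64, feature_dim=128, num_classes=2):
    """CNN applied to every crop, followed by an LSTM over the crops, in one model."""
    # Hypothetical stand-in for "VGG16 up to fc7": maps one crop to one feature vector
    crop_encoder = keras.Sequential(
        [
            keras.Input(shape=(crop_size, crop_size, 3)),
            layers.Conv2D(8, 3, activation="relu"),
            layers.MaxPooling2D(4),
            layers.Flatten(),
            layers.Dense(feature_dim, activation="relu"),
        ],
        name="crop_encoder",
    )

    # One sample = the 9 crops of one image
    inputs = keras.Input(shape=(num_crops, crop_size, crop_size, 3))
    # TimeDistributed applies the same encoder to each crop: (batch, 9, feature_dim)
    features = layers.TimeDistributed(crop_encoder)(inputs)
    # The LSTM treats the 9 crops as timesteps
    sequence_summary = layers.LSTM(32)(features)
    outputs = layers.Dense(num_classes, activation="softmax")(sequence_summary)
    return keras.Model(inputs, outputs)


model = build_crop_sequence_model()
dummy_batch = np.zeros((2, 9, 64, 64, 3), dtype="float32")  # 2 images, 9 crops each
predictions = model.predict(dummy_batch, verbose=0)
```

With this layout there is no intermediate dataset of stored features: the input is a batch of shape (batch, 9, height, width, 3), and gradients can flow from the LSTM back into the CNN if its layers are left trainable.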
I currently have medical images from two sources. One source provides JPEG files and the other provides TIF files. TIF is lossless while JPEG is lossy, so if I convert TIF to JPEG there is a chance of data loss. Can I instead mix both formats together and use them for training the CNN?
Using Keras with Tensorflow backend.
Neural networks, and machine learning models in general, do not take specific file formats as input; they expect matrices/tensors of real numbers. For RGB images this means a tensor with dimensions (height, width, 3). When an image is read from a file, it is automatically transformed into such a tensor, so it does not matter which file format you use.
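A small sketch that illustrates the point, assuming Pillow and NumPy are available: the same synthetic image is saved once as TIF and once as JPEG, and both files are read back into arrays. The helper `load_as_array` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path

import numpy as np
from PIL import Image


def load_as_array(path):
    """Read an image file of any supported format into a (height, width, 3) uint8 array."""
    with Image.open(path) as img:
        return np.asarray(img.convert("RGB"))


# Synthetic 32x48 RGB image standing in for a real scan
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(32, 48, 3), dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp:
    tif_path = Path(tmp) / "scan.tif"
    jpg_path = Path(tmp) / "scan.jpg"
    Image.fromarray(original).save(tif_path)              # lossless
    Image.fromarray(original).save(jpg_path, quality=90)  # lossy
    from_tif = load_as_array(tif_path)
    from_jpg = load_as_array(jpg_path)

# Mean absolute pixel difference introduced by JPEG compression
jpeg_error = np.abs(from_jpg.astype("int16") - original.astype("int16")).mean()
```

Both arrays have the same shape and dtype, so the network accepts either source. The TIF round trip reproduces the pixels exactly, while the JPEG values differ slightly because of compression, so converting the TIF files to JPEG would discard information without making the two sources any easier to mix.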