I have built a convolutional neural network for the MNIST data. Now I want to feed it my own images instead. How can I do that? Do I need to save the pictures in a specific format? In addition, how do I save all the pictures and train on them one after the other? I am using TensorFlow with Python.
TensorFlow has support for BMP, GIF, JPEG and PNG out of the box.
So load the data (read the file into memory as a 0-D string tensor), then pass it to tf.image.decode_image, or to one of the specialized decode functions if that doesn't work for some reason.
You should get back the image as a tensor of shape [height, width, channels] (the channels dimension might be missing if you only have a single-channel image, such as grayscale).
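For example, a minimal sketch of loading and decoding a single file with the tf.io / tf.image API, assuming TF 2.x in eager mode; the file path and the MNIST-sized target shape are placeholders:

```python
import tensorflow as tf

# Read the raw bytes of the file as a scalar (0-D) string tensor.
raw = tf.io.read_file("my_image.png")  # placeholder path

# Decode into a [height, width, channels] uint8 tensor.
image = tf.image.decode_image(raw, channels=1)  # channels=1 for grayscale, 3 for RGB

# Optionally convert to floats in [0, 1] and resize to the network's input size.
image = tf.image.convert_image_dtype(image, tf.float32)
image = tf.image.resize(image, [28, 28])  # assuming an MNIST-sized input
```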
To make this work nicely you should have all the images in the same format. If you can load all the images into RAM and pass them in bulk, go for it, since it's probably the easiest thing to do. The next easiest thing would be to copy the images into tf.train.Example protos stored in a TFRecord file and use tf.TFRecordReader to do the shuffling and batching. If all else fails, I think you can set up the input functions to read the images on demand and pipe them through the batching mechanism, but I'm not sure how I would do that.
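If you go the batching route, a hedged sketch with the tf.data API (assuming a recent TF 2.x; the file pattern, image size and batch size are assumptions, and labels would still need to be paired in) might look like:

```python
import tensorflow as tf

def load_and_preprocess(path):
    raw = tf.io.read_file(path)
    image = tf.image.decode_png(raw, channels=1)          # assuming grayscale PNGs
    image = tf.image.convert_image_dtype(image, tf.float32)
    return tf.image.resize(image, [28, 28])               # assumed network input size

files = tf.data.Dataset.list_files("images/*.png")        # placeholder file pattern
dataset = (files
           .map(load_and_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
           .shuffle(buffer_size=1000)
           .batch(32)
           .prefetch(tf.data.AUTOTUNE))
```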
Here's a link to the TensorFlow documentation related to images.
Related
I'm currently trying to train a network with dozens of 3D images. However, these images can't fit in memory at once, which is why I chose to use a pipeline (tf.keras.utils.Sequence). This pipeline helps me load the images sequentially. What I want to do next is to patchify these images into patches of size 128x128x128, and I want to do it on the fly, meaning that I want to load an image, divide it into patches and then feed it to the model.
Is there a way in Keras to do this?
PS: I don't want to write the patches to disk, I want to do the patchifying on the fly.
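One possible approach (not from the original thread) is to do the slicing inside the Sequence's __getitem__; a rough sketch, assuming the volume dimensions are multiples of 128 and that a load_volume helper exists for your file format:

```python
import numpy as np
import tensorflow as tf

PATCH = 128  # assumed patch size

def extract_patches(volume):
    """Split a 3D volume into non-overlapping PATCH^3 blocks (assumes divisible dims)."""
    d, h, w = volume.shape
    patches = []
    for z in range(0, d, PATCH):
        for y in range(0, h, PATCH):
            for x in range(0, w, PATCH):
                patches.append(volume[z:z+PATCH, y:y+PATCH, x:x+PATCH])
    return np.stack(patches)

class PatchSequence(tf.keras.utils.Sequence):
    def __init__(self, file_paths):
        self.file_paths = file_paths

    def __len__(self):
        return len(self.file_paths)

    def __getitem__(self, idx):
        volume = load_volume(self.file_paths[idx])           # hypothetical loader
        patches = extract_patches(volume)[..., np.newaxis]   # add a channel axis
        return patches, patches  # placeholder targets; adapt the labels to your task
```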
When saving a TensorFlow Lite model, does extra information get saved along with the model architecture? For example, the image size, the range of pixel values, etc.
Suppose that my model was trained on normalized RGB images that were 100x100 pixels in size.
Do I need to explicitly preprocess the images before performing inference in Android Studio / Flutter? Or does the tflite plugin automatically do this?
Thanks :-)
Yes, you need to preprocess the images on the app side into the same format as the images used for training before passing them to the TFLite model.
An example of this preprocessing step is explained here:
https://github.com/tensorflow/examples/blob/master/lite/examples/image_classification/android/EXPLORE_THE_CODE.md#pre-process-bitmap-image
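For reference, a hedged Python sketch of the same idea using the tf.lite.Interpreter: resize to the assumed 100x100 RGB training size and normalize before invoking the model. The model path, test image and [0, 1] normalization are assumptions and must match whatever was used during training:

```python
import numpy as np
import tensorflow as tf
from PIL import Image

interpreter = tf.lite.Interpreter(model_path="model.tflite")  # placeholder path
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()

# Preprocess exactly as during training: resize to 100x100 RGB, normalize to [0, 1].
image = Image.open("test.jpg").convert("RGB").resize((100, 100))
x = np.asarray(image, dtype=np.float32)[np.newaxis, ...] / 255.0  # assumed normalization

interpreter.set_tensor(input_details[0]["index"], x)
interpreter.invoke()
output = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
```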
Can anyone tell me in which situations the above functions (Resize, CenterCrop, RandomCrop, RandomResizedCrop) are used and how they affect the image size?
I want to resize the Cats vs. Dogs images and I am a bit confused about how to use them.
There are actually lots of details in the TorchVision documentation.
The typical use case is for object detection or image segmentation tasks, but other uses could exist.
Here is a non-exhaustive list of uses:
Resize is used with convolutional neural networks to adapt the input image to the network's input shape; in this case it is not data augmentation but just pre-processing. It can also be used with fully convolutional networks to emulate different scales for an input image, which is data augmentation.
CenterCrop, RandomCrop and RandomResizedCrop are used in segmentation tasks to train a network on fine details without imposing too much of a burden during training. For example, with a database of 2048x2048 images you can train on 512x512 sub-images and then, at test time, infer on full-resolution images. They are also used in object detection networks as data augmentation. The resized variant (RandomResizedCrop) lets you combine the crop with the resize operation described above.
All of them potentially change the image resolution.
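A short usage sketch with torchvision.transforms; the sizes and the image file are just examples (the 512x512 crops assume the source image is at least that large, as in the 2048x2048 case above):

```python
from PIL import Image
from torchvision import transforms

img = Image.open("cat.jpg")  # placeholder image

# Pre-processing: force every image to the network's input shape.
resized = transforms.Resize((224, 224))(img)      # example input size

# Crops, typically used as data augmentation or to train on sub-images:
center = transforms.CenterCrop(512)(img)          # deterministic 512x512 crop
random = transforms.RandomCrop(512)(img)          # random 512x512 crop
rrc = transforms.RandomResizedCrop(224)(img)      # random crop, then resize to 224x224
```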
I am looking for some advice on how to apply a PyTorch CNN to a video as opposed to an image.
Picture a drone flying over an area and using video to capture some objects below. I have a CNN trained on images of objects, and want to count the objects in the video.
Currently my strategy has been to convert the video to frames saved as PNGs and to run the CNN on those PNGs. This seems inefficient, and I am struggling with how to count the objects without duplicating them (frame 1 and frame 1+n will overlap).
It would be appreciated if someone had some advice, or a suggested tutorial/code set that did this. Thanks in advance.
PyTorch at the moment doesn't have built-in support for detecting and tracking objects in a video.
You would need to create your own logic for that.
The support is limited to reading the video and audio from a file, reading frames and timestamps, and writing the video; read more here.
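For example, reading the frames and metadata looks roughly like this (the file name is a placeholder):

```python
import torchvision

# Returns the video frames as a tensor of shape [T, H, W, C], the audio frames,
# and a dict with metadata such as the frame rate.
video, audio, info = torchvision.io.read_video("drone.mp4", pts_unit="sec")
print(video.shape, info)  # e.g. torch.Size([300, 720, 1280, 3]) {'video_fps': 30.0, ...}
```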
What you will basically need to do is create object tracking yourself: go frame by frame, keep the detected objects together with their bounding-box positions, and based on that decide whether a detection is the same object or a new one.
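A very rough sketch of that idea: match detections across consecutive frames by box overlap (IoU), and count a detection as a new object only when it doesn't match anything in the previous frame. The (x1, y1, x2, y2) box format and the 0.5 threshold are assumptions:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def count_new_objects(prev_boxes, curr_boxes, threshold=0.5):
    """Count boxes in the current frame that overlap no box from the previous frame."""
    return sum(
        all(iou(curr, prev) < threshold for prev in prev_boxes)
        for curr in curr_boxes
    )
```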
If you have a drone flying and inspecting people, you may check the Kinetics-based video models to detect human actions:
ResNet 3D 18
ResNet MC 18
ResNet (2+1)D
All based on Kinetics-400
But the newer one is Kinetics-700.
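Those pretrained backbones can be loaded directly from torchvision; a minimal sketch (the clip shape is just an example, and depending on your torchvision version the pretrained= argument may be replaced by weights=):

```python
import torch
from torchvision.models.video import r3d_18, mc3_18, r2plus1d_18

model = r3d_18(pretrained=True)   # ResNet 3D 18, trained on Kinetics-400
model.eval()

# Video models expect a clip of shape [batch, channels, frames, height, width].
clip = torch.randn(1, 3, 16, 112, 112)
with torch.no_grad():
    scores = model(clip)          # scores over the 400 Kinetics classes
print(scores.shape)               # torch.Size([1, 400])
```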
Try using torchvision and torch to recognize objects in a YouTube video:
https://dida.do/blog/how-to-recognise-objects-in-videos-with-pytorch
In the article on creating a dataset for the TF object detection API [link], users are asked to store an object mask as:
a repeated list of single-channel encoded PNG strings, or a single dense 3D binary tensor where masks corresponding to each object are stacked along the first dimension
Since the article strongly suggests using a repeated list of single-channel encoded PNG strings, I would particularly be interested in knowing how to encode this. My annotations typically come from CSV files, from which I have no problem generating the TFRecord file. Are there any instructions somewhere on how to make this conversion?
I made it work with the pet dataset. In TensorFlow you have two ways: with the COCO-dataset TFRecord script and with pet_tfrecord.
The first takes a JSON file.
The second takes XML and PNG.
There is one application, VGG, that can make annotations in PNG or in JSON; then you use the directory tree that is needed. I used the pet dataset example, but in the end the mask is not displayed, even with the example dataset...
Rather than the array of PNGs, I ended up using a dense tensor, where each pixel value represents a class.
Note: I've seen many other people who didn't use sequential numbers and ended up having problems later. The idea makes sense: if I have 2 classes, label one as 0 and the other as 255. The rationale is that when you view the mask in grayscale it is obvious what is labeled 0 or 255, which is not the case when you use 0, 1, 2, ... However, this violates a lot of assumptions in downstream code (e.g. DeepLab).
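A minimal sketch of that dense-tensor labelling, using PIL; the mask shape and the two object regions are placeholders:

```python
import numpy as np
from PIL import Image

# Dense label mask: one single-channel uint8 image where the pixel value is the class index.
# Use sequential ids (0 = background, 1 = class A, 2 = class B, ...) rather than
# "pretty" values like 0/255, so downstream code (e.g. DeepLab) reads them correctly.
mask = np.zeros((480, 640), dtype=np.uint8)   # placeholder annotation
mask[100:200, 100:200] = 1                    # object of class 1
mask[300:400, 300:400] = 2                    # object of class 2
Image.fromarray(mask, mode="L").save("label_mask.png")
```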