In the article on creating a dataset for the TF object detection API [link], users are asked to store an object mask as:
a repeated list of single-channel encoded PNG strings, or a single dense 3D binary tensor where masks corresponding to each object are stacked along the first dimension
Since the article strongly suggests using a repeated list of single-channel encoded PNG strings, I would be particularly interested in knowing how to do this encoding. My annotations typically come from CSV files, from which I have no problem generating the TFRecords file. Are there any instructions somewhere on how to make this conversion?
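For reference, my current understanding is that the encoding would look something like this (untested sketch; the `'image/object/mask'` feature key is what the pet-dataset example uses, so double-check it against your API version):

```python
# Sketch: encode one binary instance mask as a single-channel PNG string.
# One such string goes into the repeated bytes feature per object instance.
import io
import numpy as np
from PIL import Image

def mask_to_png_string(mask):
    """mask: 2-D numpy array of 0/1 values for one object instance."""
    img = Image.fromarray(mask.astype(np.uint8), mode='L')  # single channel
    buf = io.BytesIO()
    img.save(buf, format='PNG')
    return buf.getvalue()  # bytes, ready for a tf.train.BytesList

# toy mask: a single rectangular object
mask = np.zeros((100, 100), dtype=np.uint8)
mask[20:40, 30:60] = 1
png_string = mask_to_png_string(mask)
```

The resulting strings would then be collected into something like `tf.train.Feature(bytes_list=tf.train.BytesList(value=encoded_masks))` under the mask key when building each `tf.train.Example`.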
I made it work with the pet dataset. In TensorFlow there are two ways to build the TFRecord: with the COCO dataset script and with pet_tfrecord.
The first takes a JSON file.
The second takes XML and PNG files.
There is one application, VGG, that can produce annotations in PNG or in JSON; then you use the required directory tree. I used the pet dataset example, but in the end the mask is not displayed, even with the example dataset...
Rather than the array of PNGs, I ended up using a dense tensor, where each pixel value represents a class.
Note: I've seen many other people who didn't use sequential numbers and ended up having problems later. The idea makes sense at first: if I have 2 classes, label one as 0 and the other as 255, the rationale being that when you view the mask in grayscale it is obvious what is labeled 0 or 255, which is impossible when you use 0, 1, 2, ... However, this violates a lot of assumptions in downstream code (e.g. deeplab).
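For example, a background-plus-2-classes mask stored with sequential indices looks like this (sketch with NumPy/PIL; the shapes and class layout are made up):

```python
# Sketch: write a label mask with sequential class indices (0 = background,
# 1, 2, ... = classes) rather than "visually distinct" values like 0 and 255,
# which break downstream assumptions (e.g. deeplab expects small indices).
import io
import numpy as np
from PIL import Image

num_classes = 3  # background + 2 object classes
mask = np.zeros((100, 100), dtype=np.uint8)
mask[10:30, 10:30] = 1   # pixels of class 1
mask[50:80, 40:90] = 2   # pixels of class 2

buf = io.BytesIO()
Image.fromarray(mask, mode='L').save(buf, format='PNG')  # lossless storage

# To *inspect* the mask, rescale for display only -- never store it this way:
preview = (mask * (255 // (num_classes - 1))).astype(np.uint8)
```

The stored mask stays almost black when viewed directly, which is exactly why people are tempted to use 255; the rescaled preview solves the viewing problem without corrupting the labels.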
I have 5 folders (which represent 5 classes, each containing about 200 colour images), and I want to use "Principal Component Analysis" for image classification.
Previously I used ResNet to predict which class each image belongs to, but now I want to use PCA.
I am trying to implement this in code; any help, please?
Previously I used ResNet to predict which class each image belongs to, but now I want to use PCA.
PCA is not a method for classification. It is a dimensionality reduction method that is sometimes used as a preprocessing step.
Take a look at this CrossValidated post for some more explanation. It has an example.
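A minimal sketch of PCA as a preprocessing step in front of an actual classifier (scikit-learn, with synthetic arrays standing in for your flattened images):

```python
# Sketch: PCA compresses flattened images to a few components; a real
# classifier (here logistic regression) then does the classification.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64 * 64))   # 1000 flattened 64x64 "images"
y = rng.integers(0, 5, size=1000)      # 5 classes, as in the question

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 4096 pixels -> 50 principal components -> classifier
clf = make_pipeline(PCA(n_components=50), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)
```

On real images you would load each file, flatten it, and stack the rows into `X`; with random data the accuracy is of course near chance.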
(FYI, saw this because you pinged me via the MATLAB Answers forum.)
I am using the COBRE brain MRI dataset, which contains NIfTI files. I can visualize them, but I cannot figure out how to feed them to deep learning models in the correct format. I have read the Nilearn documentation, but it only uses a single .nii file for one subject. The question is: how do I give 100 .nii files to a CNN?
The second question is how to determine which slice of each file should be used. Should it be the middle one? Each subject's NIfTI file consists of 150 slices of the brain.
The third question is how to provide the model with labels. The dataset doesn't contain any masks. How do I give the model a specific label for a specific file? Should I create a CSV file with the paths of the .nii files and their associated labels?
Please explain, or suggest some resources on this.
Hi, I recently got into processing .nii files for one of my projects. I have made some progress at the preprocessing level, but not yet at the model level.
For your second question: usually an expert visualizes the NIfTI files and provides the location(s) of the ROI (region of interest).
I am currently parsing the .nii files into CSV format with labels. So the answer to your third question is: we label the coordinates (x, y, z, c, t) according to the ROI locations. (I may need to correct this understanding as I advance, but for now this is the approach I am going to follow to feed the dataset to the model.)
I have a set of images like this
And I'm trying to train TensorFlow in Python to read the numbers in the images.
I'm new to machine learning, and in my research I found a solution to a similar problem that uses CTC to train on and predict variable-length data in an image.
I'm trying to figure out whether I should use CTC or find a way to create a new image for every digit in the images I already have.
For example, if the number in my image is 213, I would create 3 new images to train the model, containing the digits 2, 1, 3 and also using those as labels. I'm looking for tutorials, or even TensorFlow documentation, that can help me with this.
In the case of text, CTC absolutely makes sense: you don't want to split a text (like "213") into "2", "1", "3" manually, because it is often difficult to segment the text into individual characters.
CTC, on the other hand, just needs images and the corresponding ground-truth texts as input for training. You don't have to manually take care of things like alignment of chars, width of chars, number of chars. CTC handles that for you.
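To make that concrete, here is a sketch of what CTC consumes at training time, using `tf.keras.backend.ctc_batch_cost` on dummy data (the shapes and the 10-digits-plus-blank alphabet are assumptions for illustration):

```python
# Sketch: CTC needs per-time-step predictions plus the unsegmented
# ground-truth label sequence -- no per-character alignment anywhere.
import numpy as np
import tensorflow as tf

batch, time_steps, num_classes = 1, 12, 11   # digits 0-9 + CTC blank
labels = np.array([[2, 1, 3]])               # ground truth "213", unaligned
# fake per-time-step softmax output, as a recognition model would produce
y_pred = tf.nn.softmax(tf.random.normal((batch, time_steps, num_classes)))

loss = tf.keras.backend.ctc_batch_cost(
    y_true=labels,
    y_pred=y_pred,
    input_length=np.array([[time_steps]]),   # prediction length per sample
    label_length=np.array([[3]]),            # "213" has 3 characters
)
```

Minimizing this loss over real image/text pairs is all the supervision a CTC model needs; the alignment between time steps and characters is marginalized out internally.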
I don't want to repeat myself here, so I just point you to the tutorials I've written about text recognition and to the source code:
Build a Handwritten Text Recognition System using TensorFlow
SimpleHTR: a TensorFlow model for text-recognition
You can use the SimpleHTR model as a starting point. To get good results, you will have to generate training data (e.g. write a rendering tool which renders realistic-looking examples) and train the model from scratch with that data (more details on training can be found in the README).
I know this might be a dumb question, but I have searched everywhere for an answer and could not find one.
Okay, first, to explain my question properly:
When I was learning about CNNs, I was told that kernels, filters, or activation maps represent a feature of the image.
To be specific, take cat image identification: a feature map might represent a "whisker",
and in images where the activation of this feature map is high, it is inferred that a whisker is present, and so the image is of a cat. (Correct me if I am wrong.)
Well, now, when I built a Keras ConvNet, I saved the model, then loaded it and saved all the filters as PNG images.
What I saw were 3x3 px images where each pixel was a different colour (green, blue, various shades of these, and so on).
So how do these 3x3 px random-colour-pattern kernel images represent the "whisker" or any other feature of a cat?
And how could I tell which PNG image is which feature, i.e. which one is the whisker-detector filter, etc.?
I am asking this because I might be asked about it in an oral examination by my teacher.
Sorry for the length of the answer (but I had to make it this long to explain properly).
You need to take a further look into how convolutional neural networks operate, the main topic being the convolution itself. The convolution is applied between the input image and the filters/kernels to produce feature maps. A feature map is what may highlight important features.
The filters/kernels do not know anything about the input data, so when you save them you are only going to see pseudo-random images.
Put simply, where * is the convolution operator,
input_image * filter = feature map
What you want to save, if you want to visualize what is occurring during convolution, are the feature maps. This website gives a very detailed account of how to do so, and it is the method I have used in the past.
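In Keras, the usual trick is to build a second model that stops at the convolutional layer you care about, so that `predict()` returns the feature maps themselves (sketch on a toy model; the layer name `conv1` and the random input are assumptions):

```python
# Sketch: extract feature maps (not the raw 3x3 kernels) from a conv layer.
import numpy as np
import tensorflow as tf

inputs = tf.keras.Input(shape=(32, 32, 3))
x = tf.keras.layers.Conv2D(8, 3, activation='relu', name='conv1')(inputs)
x = tf.keras.layers.MaxPooling2D()(x)
model = tf.keras.Model(inputs, x)

# second model whose output IS the conv1 activations
feature_extractor = tf.keras.Model(
    inputs=model.input,
    outputs=model.get_layer('conv1').output,
)

image = np.random.rand(1, 32, 32, 3).astype(np.float32)
feature_maps = feature_extractor.predict(image)   # (1, 30, 30, 8)
# each of the 8 channels is one feature map, ready to plot with imshow
```

Unlike the 3x3 kernel images, these maps are the size of the (convolved) input, so you can actually see which regions of a cat photo light up for each filter.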
I've noticed that for any tutorial or example of a Keras CNN that I've seen, the input images are numbered, e.g.:
dog0001.jpg
dog0002.jpg
dog0003.jpg
...
Is this necessary?
I'm working with an image dataset with fairly random filenames (the classes come from the directory names), e.g.:
picture_A2.jpg
image41110.jpg
cellofinterest9A.jpg
I actually want to keep the filenames because they mean something to me, but do I need to append sequential numbers to my image files?
No, they can have different names; it really depends on how you load your data. In your case, you can use flow_from_directory to generate the training data, and indeed the directory will be the associated class; this is part of ImageDataGenerator.
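A quick sketch showing that only the directory names matter (it builds a tiny throwaway dataset with arbitrary filenames; the `cats`/`dogs` folders and image sizes are made up for the example):

```python
# Sketch: flow_from_directory maps directories -> classes; filenames are
# ignored, so sequential numbering is unnecessary.
import os
import numpy as np
from PIL import Image
import tensorflow as tf

root = 'tiny_dataset'
for cls, fname in [('cats', 'picture_A2.jpg'), ('dogs', 'image41110.jpg')]:
    os.makedirs(os.path.join(root, cls), exist_ok=True)
    Image.fromarray(
        np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
    ).save(os.path.join(root, cls, fname))

gen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1.0 / 255)
train = gen.flow_from_directory(root, target_size=(64, 64), batch_size=2)
# train.class_indices shows the directory-name -> class-index mapping
```

So you can keep your meaningful filenames as-is; renaming to `dog0001.jpg`-style sequences buys you nothing here.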