How can I train TensorFlow to read variable length numbers on an image? - python-3.x

I have a set of images like this
And I'm trying to train TensorFlow in Python to read the numbers on the images.
I'm new to machine learning, and in my research I found a solution to a similar problem that uses CTC to train on and predict variable-length data in an image.
I'm trying to figure out whether I should use CTC or find a way to create a new image for every digit of the images I already have.
For example, if the number in my image is 213, then I would create 3 new images to train the model, containing the respective digits 2, 1, and 3, and also use those digits as labels. I'm looking for tutorials or even TensorFlow documentation that can help me with that.

In the case of text, CTC absolutely makes sense: you don't want to split a text (like "213") into "2", "1", "3" manually, because it is often difficult to segment the text into individual characters.
CTC, on the other hand, just needs images and the corresponding ground-truth texts as input for training. You don't have to manually take care of things like alignment of chars, width of chars, number of chars. CTC handles that for you.
I don't want to repeat myself here, so I just point you to the tutorials I've written about text recognition and to the source code:
Build a Handwritten Text Recognition System using TensorFlow
SimpleHTR: a TensorFlow model for text-recognition
You can use the SimpleHTR model as a starting point. To get good results, you will have to generate training data (e.g. write a rendering tool which renders realistic-looking examples) and train the model from scratch with that data (more details on training can be found in the README).
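To make the alignment point above concrete, here is a minimal, framework-free sketch (plain Python; the helper name `ctc_greedy_decode` is just for illustration) of the label collapsing that CTC decoding performs on a per-frame best path: merge consecutive repeats, then drop blanks. In TensorFlow itself, `tf.nn.ctc_greedy_decoder` does this for you.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a per-frame best-path labeling the way CTC decoding does:
    first merge consecutive repeated labels, then drop the blank symbol."""
    decoded = []
    prev = None
    for label in frame_labels:
        if label != prev:          # merge consecutive repeats
            if label != blank:     # drop the blank symbol
                decoded.append(label)
        prev = label
    return decoded

# Frames predicting 2, 2, blank, 1, 1, 3 collapse to the text "213"
print(ctc_greedy_decode([2, 2, 0, 1, 1, 3]))  # -> [2, 1, 3]
```

Note how the blank lets CTC represent genuinely repeated characters: the frame sequence [1, 0, 1] decodes to [1, 1], while [1, 1] alone decodes to just [1].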

Related

I have 5 folders (each contain about 200 RGB images), I want to use "Principal Component Analysis" for image classification

I have 5 folders (which represent 5 classes, each containing about 200 colored images), and I want to use "Principal Component Analysis" for image classification.
Previously I used ResNet to predict which class each image belongs to, but now I want to use PCA.
I am trying to implement this in code; any help, please?
Previously I used ResNet to predict which class each image belongs to, but now I want to use PCA.
PCA is not a method for classification. It is a dimensionality-reduction method that is sometimes used as a preprocessing step.
Take a look at this CrossValidated post for some more explanation. It has an example.
(FYI, saw this because you pinged me via the MATLAB Answers forum.)
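To illustrate that preprocessing role, here is a minimal NumPy sketch (the function name `pca_fit_transform` is just for illustration) of PCA as a dimensionality-reduction step. In practice you would use `sklearn.decomposition.PCA` and feed the reduced features to a separate classifier.

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Project the rows of X onto their top principal components.
    PCA only reduces dimensionality; a separate classifier is still
    needed to actually predict classes."""
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal axes
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

# e.g. flatten each image to a vector, then reduce before classifying
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))      # 200 "images" of 64 pixels each
X_reduced = pca_fit_transform(X, n_components=10)
print(X_reduced.shape)              # (200, 10)
```

The reduced array `X_reduced` is what you would then pass to a classifier such as the ResNet replacement you have in mind (or an SVM, k-NN, etc.).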

Training/Predicting with CNN / ResNet on all classes each iteration - concatenation of input data + Hungarian algorithm

So I've got a simple pytorch example of how to train a ResNet CNN to learn MNIST labeling from this link:
https://zablo.net/blog/post/using-resnet-for-mnist-in-pytorch-tutorial/index.html
It's working great, but I want to hack it a bit so that it does 2 things. First, instead of predicting digits, it predicts animal shapes/colors for a project I'm working on. That part is already working quite well, and I'm happy with it.
Second, I'd like to hack the training (and possibly the layers) so that prediction is done in parallel on multiple images at a time. In the MNIST example, prediction (or output) would basically be done on an image containing 10 digits at a time, concatenated by me. For clarity, each 10-image input will contain each of the digits 0-9 exactly once. The key here is that each of the 10 digits gets a unique class/label from the CNN/ResNet, each class gets assigned exactly once, and digits predicted with high confidence prevent other digits with lower confidence from using that label (a Hungarian-algorithm type of approach).
So in my use case I want to train on concatenated images (not single images), as in Fig A below, and force the classifier to learn to predict the best unique label for each of the concatenated images, all at once. Such an approach should outperform single-image classification, and it's particularly useful for my animal classification because otherwise the CNN can sometimes return the same ID for multiple animals, which is impossible in my application.
I can already predict in series, as in Fig B below. And indeed, looking at the confidence of each prediction, I am able to implement a Hungarian-algorithm-like approach post-prediction to assign the best (most confident) unique IDs in each batch of 4 animals. But this doesn't always work, and I'm wondering if ResNet can try to learn the greedy Hungarian assignment as well.
In particular, it's not clear that simply augmenting the input data and labels in the training set will implement A automatically, because I don't know how to penalize or disallow returning the same label twice within each group of images. For now I can inspect the training data like this:
print(train_loader.dataset.data.shape)     # torch.Size([60000, 28, 28])
print(train_loader.dataset.targets.shape)  # torch.Size([60000])
And I guess I would want the targets to be [60000, 10], and each input image to be [1, 28, 28, 10]? But I'm not sure what the correct approach would be.
Any advice or available links?
I think this is a specific type of training, but I forgot the name.
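For the post-prediction assignment step mentioned above, here is a minimal brute-force sketch in plain Python (the helper name is hypothetical): given a confidence score for every image/label pair, it picks the joint assignment that maximizes total confidence while using each label at most once. For larger batches, `scipy.optimize.linear_sum_assignment` implements the Hungarian algorithm efficiently.

```python
from itertools import permutations

def best_unique_assignment(confidences):
    """confidences[i][j] = model confidence that image i has label j.
    Return one label per image such that every label is used at most
    once and the total confidence is maximal (brute-force stand-in for
    the Hungarian algorithm; fine for small batches like 4 animals)."""
    n = len(confidences)
    n_labels = len(confidences[0])
    best_score, best_labels = float("-inf"), None
    for labels in permutations(range(n_labels), n):
        score = sum(confidences[i][labels[i]] for i in range(n))
        if score > best_score:
            best_score, best_labels = score, labels
    return list(best_labels)

# Two images both most confident about label 0: a greedy per-image
# argmax would pick [0, 0]; the joint assignment resolves the clash.
conf = [[0.9, 0.8],
        [0.7, 0.1]]
print(best_unique_assignment(conf))  # -> [1, 0]
```

This is only the decoding side; making the network itself learn unique assignments during training would additionally require a set-prediction style loss, which is a different (and harder) change.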

Is there any way to classify text based on some given keywords using python?

I've been trying to learn a bit of machine learning for a project I'm working on. At the moment I've managed to classify text using an SVM with sklearn and spaCy, with some good results, but I want to not only classify the text with the SVM; I also want it to be classified based on a list of keywords that I have. For example: if the sentence has the word "fast" or "seconds", I would like it to be classified as "performance".
I'm really new to machine learning and I would really appreciate any advice.
I assume that you are already taking a portion of your data, classifying it manually and then using the result as your training data for the SVM algorithm.
If yes, then you could just append your list of keywords (features) and desired classifications (labels) to your training data. If you are not doing it already, I'd recommend using the SnowballStemmer on your training data features.
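An alternative to folding the keywords into the training data is a simple rule-first pipeline: check the keyword list, and fall back to the trained model only when nothing matches. A minimal sketch, assuming a token-level match (the names `classify_with_keywords` and `svm_stub` are illustrative; the stub stands in for your trained SVM):

```python
def classify_with_keywords(text, keyword_rules, fallback_classifier):
    """Return the label of the first matching keyword; otherwise
    defer to the trained classifier (e.g. the sklearn SVM)."""
    tokens = text.lower().split()
    for keyword, label in keyword_rules.items():
        if keyword in tokens:
            return label
    return fallback_classifier(text)

rules = {"fast": "performance", "seconds": "performance"}
svm_stub = lambda text: "other"   # stand-in for the trained SVM
print(classify_with_keywords("it loads in two seconds", rules, svm_stub))
# -> performance
```

In practice you would also stem or lemmatize the tokens (e.g. with the SnowballStemmer mentioned above) so that "second" and "seconds" match the same rule.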

Do images need to be numbered for Keras CNN training/testing?

I've noticed that for any tutorial or example of a Keras CNN that I've seen, the input images are numbered, e.g.:
dog0001.jpg
dog0002.jpg
dog0003.jpg
...
Is this necessary?
I'm working with an image dataset with fairly random filenames (the classes come from the directory name), e.g.:
picture_A2.jpg
image41110.jpg
cellofinterest9A.jpg
I actually want to keep the filenames because they mean something to me, but do I need to append sequential numbers to my image files?
No, they can have any names; it really depends on how you load your data. In your case, you can use flow_from_directory to generate the training data, and indeed the directory name will be the associated class; this is part of ImageDataGenerator.
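As a sketch of why the filenames don't matter, here is a plain-Python directory walk (the helper name is illustrative) that pairs arbitrary filenames with the class taken from their parent directory; `flow_from_directory` does the equivalent internally.

```python
import os
import tempfile

def label_images_by_directory(root):
    """Pair each image path with its class, taken from the parent
    directory name; the filename itself can be anything."""
    samples = []
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if not os.path.isdir(class_dir):
            continue
        for fname in sorted(os.listdir(class_dir)):
            samples.append((os.path.join(class_dir, fname), class_name))
    return samples

# Demo: two class directories with completely arbitrary filenames
root = tempfile.mkdtemp()
for cls, fname in [("cells", "cellofinterest9A.jpg"),
                   ("misc", "image41110.jpg")]:
    os.makedirs(os.path.join(root, cls), exist_ok=True)
    open(os.path.join(root, cls, fname), "w").close()
print(label_images_by_directory(root))
```

Each sample is labeled "cells" or "misc" purely from its directory, so no sequential numbering is ever needed.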

Making mask files for Tensorflow segmentation/object detection API

In the article on creating a dataset for the TF object detection API [link], users are asked to store an object mask as:
a repeated list of single-channel encoded PNG strings, or a single dense 3D binary tensor where masks corresponding to each object are stacked along the first dimension
Since the article strongly suggests using a repeated list of single-channel encoded PNG strings, I would be particularly interested in knowing how to encode this. My annotations typically come from CSV files, from which I have no problem generating the TFRecords file. Are there any instructions somewhere on how to make this conversion?
I made it work with the pet dataset. In TensorFlow you have two ways: with the COCO-dataset TFRecord and with pet_tfrecord.
The first takes a JSON file;
the second takes XML and PNG.
There is one application, VGG, that can produce annotations in PNG or in JSON; then you use the directory tree needed. I used the pet dataset example, but in the end the mask is not displayed, even with the example dataset...
Rather than the array of PNGs, I ended up using a dense tensor, where each pixel value represents a class.
Note: I've seen many other people who didn't use sequential numbers and ended up having problems later. The idea makes sense: if I have 2 classes, label one as 0 and the other as 255. The rationale is that when you view the mask in grayscale it is obvious what is labeled 0 or 255, which is impossible when you use 0, 1, 2, ... However, this violates a lot of assumptions in downstream code (e.g. DeepLab).
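For the dense-tensor option, here is a minimal NumPy sketch (the helper name is illustrative) that turns a single-channel label mask, where each pixel value is an object id, into the "single dense 3D binary tensor" form the article describes: one binary mask per object, stacked along the first dimension.

```python
import numpy as np

def dense_mask_to_stacked(label_mask, object_ids):
    """Turn a single-channel mask (pixel value = object id) into a
    stack of binary masks, one per object, along the first axis."""
    return np.stack([(label_mask == oid).astype(np.uint8)
                     for oid in object_ids])

# Two objects labeled 1 and 2 on a 3x3 mask (0 = background)
mask = np.array([[0, 1, 1],
                 [0, 1, 0],
                 [2, 2, 0]])
stacked = dense_mask_to_stacked(mask, object_ids=[1, 2])
print(stacked.shape)  # (2, 3, 3)
```

Going the other way (stacked binary masks back to a single labeled mask) is a per-pixel argmax over the first axis plus a background check.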
