Handwriting Recognition of Floats using Convolutional Neural Network - decimal

I'm looking to recognise a set of handwritten numbers, starting from 0 and increasing by 0.5 each time, i.e. 0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5 ... 10. I tried searching online, but to no avail; MNIST does not seem fully useful either, given that it only contains single whole digits.
Right now the plan is to augment MNIST's dataset with a few thousand images of the n.5 numbers, written by a few people in various styles. The trained model would be used to recognise the same numbers written by other people. My concern is that accuracy could be low given how unique every person's handwriting is. Are there any alternatives that are more efficient or that could produce a higher accuracy rate? Thanks a bunch.
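One way to frame the plan above, sketched here as an assumption rather than anything the question specifies, is to treat each of the 21 possible values (0, 0.5, ..., 10) as its own class, so the network has 21 outputs and "2.5" or "10" is recognised as a whole rather than digit by digit. The helper names below are hypothetical; they only fix the mapping between values and class indices:

```python
# Hypothetical helpers: each of the 21 values 0, 0.5, ..., 10 is one class.
VALUES = [i * 0.5 for i in range(21)]  # 0.0, 0.5, ..., 10.0


def value_to_class(value):
    """Map a value such as 2.5 to its class index (2.5 -> 5)."""
    index = round(value * 2)
    # Reject anything outside 0..10 or not a multiple of 0.5.
    if not 0 <= index < len(VALUES) or abs(index / 2 - value) > 1e-9:
        raise ValueError(f"{value} is not one of 0, 0.5, ..., 10")
    return index


def class_to_value(index):
    """Inverse mapping: turn a predicted class index back into a number."""
    return VALUES[index]


print(value_to_class(2.5))  # 5
print(class_to_value(20))   # 10.0
```

Under this framing, an MNIST image of the digit d would be relabeled as class value_to_class(d). MNIST has no images of "10", so that class would need its own handwritten samples, just like the n.5 numbers.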

Related

Training/Predicting with CNN / ResNet on all classes each iteration - concatenation of input data + Hungarian algorithm

So I've got a simple PyTorch example of how to train a ResNet CNN to learn MNIST labeling, from this link:
https://zablo.net/blog/post/using-resnet-for-mnist-in-pytorch-tutorial/index.html
It's working great, but I want to hack it a bit so that it does 2 things. First, instead of predicting digits, it predicts animal shapes/colors for a project I'm working on. That part is already working quite well, and I'm happy with it.
Second, I'd like to hack the training (and possibly the layers) so that predictions are made in parallel on multiple images at a time. In the MNIST example, prediction (or output) would be done on an input made of 10 digit images that I concatenate. For clarity, each 10-image input contains each of the digits 0-9 exactly once. The key here is that each of the 10 digits gets a unique class/label from the CNN/ResNet, and each class is assigned exactly once. Digits predicted with high confidence should prevent digits with lower confidence from using the same label (a Hungarian-algorithm type of approach).
So in my use case I want to train on concatenated images (not single images), as in Fig A below, and force the classifier to learn to predict the best unique label for each of the concatenated images, all at once. Such an approach should outperform single-image classification, and it's particularly useful for my animal classification because otherwise the CNN can sometimes return the same ID for multiple animals, which is impossible in my application.
I can already predict in series, as in Fig B below. And indeed, using the confidence of each prediction, I am able to apply a Hungarian-algorithm-like step after prediction to assign the best (most confident) unique IDs in each batch of 4 animals. But this doesn't always work, and I'm wondering whether ResNet can learn the greedy Hungarian assignment as well.
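The post-prediction step described above can be sketched with SciPy's linear_sum_assignment, which solves the same assignment problem the Hungarian algorithm solves. This is a minimal sketch assuming each image in a group has a softmax vector over the classes; assign_unique_labels is a hypothetical helper. Note that it finds the optimal joint assignment rather than a greedy one:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def assign_unique_labels(probs, eps=1e-12):
    """Give every image in a group a different label.

    probs: (n_images, n_classes) softmax outputs for one group, n_images <= n_classes.
    Returns the labels that maximize the summed log-probability of the group.
    """
    cost = -np.log(np.clip(probs, eps, 1.0))  # high confidence -> low cost
    rows, cols = linear_sum_assignment(cost)
    labels = np.empty(probs.shape[0], dtype=int)
    labels[rows] = cols
    return labels


probs = np.array([[0.6, 0.4],
                  [0.7, 0.3]])
print(probs.argmax(axis=1))         # independent argmax: [0 0], a duplicate
print(assign_unique_labels(probs))  # unique assignment: [1 0]
```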
In particular, it's not clear to me that implementing A simply by augmenting the input data and labels in the training set will do this automatically, because I don't know how to penalize or disallow returning the same label twice within a group of images. So for now I can generate these training datasets like this:
print(train_loader.dataset.data.shape)     # torch.Size([60000, 28, 28])
print(train_loader.dataset.targets.shape)  # torch.Size([60000])
And I guess I would want the targets to be [60000, 10]. And each input image would be [1, 28, 28, 10]? But I'm not sure what the correct approach would be.
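A minimal sketch of building the grouped training set, assuming the images and targets are available as NumPy arrays (for torchvision's MNIST, `train_loader.dataset.data.numpy()` and `train_loader.dataset.targets.numpy()`). make_digit_groups is a hypothetical helper: it forms groups in which every digit appears exactly once, in shuffled order, giving inputs of shape [groups, 10, 28, 28] and targets of shape [groups, 10]:

```python
import numpy as np


def make_digit_groups(images, labels, num_classes=10, seed=0):
    """Build groups of num_classes images in which every class appears exactly once."""
    rng = np.random.default_rng(seed)
    # Shuffle the indices of each class separately.
    per_class = [rng.permutation(np.flatnonzero(labels == c)) for c in range(num_classes)]
    # The rarest class limits how many complete groups can be formed.
    num_groups = min(len(idx) for idx in per_class)
    groups = np.stack([idx[:num_groups] for idx in per_class], axis=1)  # (G, num_classes)
    # Shuffle the position of each class inside every group so order carries no information.
    order = np.argsort(rng.random(groups.shape), axis=1)
    groups = np.take_along_axis(groups, order, axis=1)
    return images[groups], labels[groups]  # (G, num_classes, 28, 28), (G, num_classes)
```

The number of groups is limited by the least frequent digit. The result can be turned back into tensors with torch.from_numpy, and a per-image CNN can still process a batch of groups by reshaping it to [groups * 10, 1, 28, 28].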
Any advice or available links?
I think this is a specific type of training, but I forgot the name.

How to choose the right neural network in a binary classification problem with unbalanced data?

I am using a Keras Sequential model for binary classification, but my data is unbalanced. I have 2 feature columns and 1 output column (1/0), with 10,000 rows of data. Among them, only 20 have output 1; all others are 0. I then extended the data to 40,000 rows, but still only 20 have output 1 and all others are 0. Since the data is unbalanced (0 dominates 1), which neural network would be better for correct prediction?
First of all, two features is a really small number. Neural networks are highly non-linear models with a very large number of degrees of freedom, so if you try to train a network with more than just a couple of neurons it will overfit, even with balanced classes. You can find models better suited to low dimensionality, like Support Vector Machines, in the scikit-learn library.
Now, about unbalanced data: the most common techniques are undersampling and oversampling. Undersampling basically means training your model several times, each time on a fraction of the dataset that contains the non-dominant class plus a random sample of the dominant class, so that the ratio is acceptable, whereas oversampling consists of generating artificial data to balance the classes. In most cases undersampling works better.
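The undersampling scheme described above (every minority sample plus a fresh random sample of the majority, repeated several times) can be sketched in NumPy. undersample_indices and its parameters are hypothetical names, and labels are assumed to be 0/1 with 1 the rare class:

```python
import numpy as np


def undersample_indices(y, ratio=1.0, n_rounds=5, seed=0):
    """Yield index arrays keeping every minority (y == 1) sample plus a random
    subset of ratio * n_minority majority (y == 0) samples, once per round."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    n_major = min(len(majority), int(round(ratio * len(minority))))
    for _ in range(n_rounds):
        # A different random sample of the dominant class each round.
        sampled = rng.choice(majority, size=n_major, replace=False)
        yield np.concatenate([minority, sampled])


y = np.zeros(10000, dtype=int)
y[:20] = 1  # 20 positives, as in the question
for idx in undersample_indices(y):
    print(len(idx), y[idx].sum())  # 40 rows, 20 of them positive
```

Each index array could then train one model (e.g. scikit-learn's `SVC().fit(X[idx], y[idx])`), with the models' predictions combined by majority vote; that combination step is an assumption, not something the answer prescribes.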
Also, when working with unbalanced data it's quite important to choose the right metric based on what matters most for the problem (e.g., whether minimizing false positives is more important than minimizing false negatives).
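To see why the metric matters with the numbers from the question (10,000 rows, 20 positives), here is a small sketch that computes accuracy, precision and recall by hand; binary_metrics is a hypothetical helper. A model that always predicts 0 reaches 99.8% accuracy while finding none of the positives:

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Confusion counts plus accuracy, precision and recall for 0/1 labels."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "fp": fp, "fn": fn}


y_true = np.zeros(10000, dtype=int)
y_true[:20] = 1
print(binary_metrics(y_true, np.zeros_like(y_true)))  # accuracy 0.998, recall 0.0
```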

Neural network cost function implementation

I am implementing a neural network in Python to recognize handwritten digits. Following is the cost function:
In log(1 - h(x)), if h(x) is 1, this becomes log(1 - 1), i.e. log(0), so I'm getting a math error.
I'm initializing the weights randomly between 10 and 60. I'm not sure what I have to change, or where!
In this formula, h(x) is usually a sigmoid: h(x)=sigmoid(x), so it's never exactly 1.0, unless the activations in the network are too large (which is bad and will cause problems anyway). The same problem is possible with log(h(x)) when h(x)=0, i.e., when x is a large negative number.
If you don't want to worry about numerical issues, simply add a small number before computing the log: log(h(x) + 1e-10).
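A minimal NumPy sketch of the cost with the small constant added, assuming h(x) is a sigmoid output and y is the 0/1 target (the helper names are illustrative). It also shows that in float64 a sigmoid input of only 40 already rounds to exactly 1.0, which is where log(1 - h(x)) = log(0) comes from:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def binary_cross_entropy(h, y, eps=1e-10):
    """Mean binary cross-entropy; eps keeps both log arguments strictly positive."""
    h = np.asarray(h, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))


h = sigmoid(np.array([40.0]))          # rounds to exactly 1.0 in float64
print(h[0] == 1.0)                     # True: log(1 - h) alone would be log(0)
print(binary_cross_entropy(h, [0.0]))  # large but finite, about 23
```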
Other issues:
Weight initialization in the range [10, 60] doesn't look right; the weights should rather be small random numbers, e.g., from [-0.01, 0.01].
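To illustrate why the [10, 60] range causes the log(0) problem, the sketch below feeds one random 784-pixel input through a hypothetical 25-unit sigmoid layer with both initializations; the layer size and random input are assumptions chosen only for illustration:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)
x = rng.random(784)  # one flattened 28x28 image, pixel values in [0, 1)

# Hypothetical hidden layer with 25 units, initialized two ways.
w_bad = rng.uniform(10, 60, size=(25, 784))        # range from the question
w_good = rng.uniform(-0.01, 0.01, size=(25, 784))  # small random weights

print(sigmoid(w_bad @ x).min())   # 1.0: every unit saturated
print(sigmoid(w_good @ x).max())  # below 1.0: units still in their sensitive range
```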
The formula above is computing binary cross-entropy loss. If you're working with MNIST, it has 10 classes, so the loss must be multi-class cross-entropy. See this question for details.
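For the 10-class case, a common choice is a softmax output with the multi-class cross-entropy, -(1/m) Σ log p_y, where p_y is the predicted probability of the true class. A minimal NumPy sketch, assuming integer labels 0-9; subtracting the row maximum before exponentiating keeps the computation finite, similar in spirit to the small constant above:

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """logits: (batch, 10) raw scores; labels: (batch,) integer digits 0-9."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # avoids overflow in exp
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for each example.
    return -log_probs[np.arange(len(labels)), labels].mean()


print(softmax_cross_entropy(np.zeros((2, 10)), np.array([3, 7])))  # log(10), about 2.3026
```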

Quantifying Text Keywords for Neural Network Analysis

I am working on a small research project. I am looking to write a program that
a) Takes a large number of short texts (~100 words each, several thousand texts)
b) Identifies keywords in the texts
c) Presents the keywords to a group of users, who indicate whether they find them interesting or not
d) Learns which keywords or combinations are likely to be preferred. Let's assume that the target group is uniform for this example.
Now, there are two main challenges. The first one I have an answer to, the second one I am looking for help with.
1) Keyword identification.
Reverse frequency analysis seems to be the way to go here: identify the words that occur proportionally more often in a given text than in all the others. This has some drawbacks, though; for example, very common keywords may be overlooked.
2) How to prepare the data set to be numeric. I could map keywords to input neurons and then adjust the value based on their relative frequency, but that limits the model and makes it hard to add new keywords. It also quickly becomes computationally expensive if we want to scale beyond a few dozen keywords.
How would this problem commonly be addressed?
Here is one way to start:
Clean your input text (remove special tokens, etc.).
Use n-grams as features (you can start with just unigrams).
Treat the users' feedback "preferable or not" as a binary label.
Learn a binary classifier (any model is fine, e.g. naive Bayes or logistic regression).
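The four steps above can be sketched in plain Python with a small multinomial naive Bayes (one of the models the answer mentions). The cleaning rule, the class name NaiveBayes and the example texts are assumptions for illustration; for real data a library implementation would be the more practical choice:

```python
import math
import re
from collections import Counter


def clean(text):
    """Lower-case and replace anything but letters, digits and spaces (assumed cleaning rule)."""
    return re.sub(r"[^a-z0-9 ]+", " ", text.lower())


def ngrams(text, n=1):
    """Return the n-grams of a cleaned text as space-joined strings."""
    tokens = clean(text).split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing for labels 0 and 1."""

    def __init__(self, n=1):
        self.n = n

    def fit(self, texts, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.docs = Counter(labels)  # number of texts per label, used for the priors
        for text, label in zip(texts, labels):
            self.counts[label].update(ngrams(text, self.n))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def predict(self, text):
        scores = {}
        n_docs = sum(self.docs.values())
        for label in (0, 1):
            total = sum(self.counts[label].values())
            score = math.log(self.docs[label] / n_docs)  # log prior
            for gram in ngrams(text, self.n):
                # Add-one smoothing keeps unseen n-grams from zeroing the product.
                score += math.log((self.counts[label][gram] + 1) / (total + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


texts = ["great movie, in stock", "love this movie", "boring bar", "terrible boring foo"]
labels = [1, 1, 0, 0]  # 1 = users found it preferable (made-up feedback)
model = NaiveBayes(n=1).fit(texts, labels)
print(model.predict("great movie"))  # 1
```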
1) Keyword identification. Reverse frequency analysis seems to be the way to go here: identify the words that occur proportionally more often in a given text than in all the others. This has some drawbacks, though; for example, very common keywords may be overlooked.
You can skip this part in the first model you build. Treat each text as a bag of words (n-grams) to keep the first working model simple. If you want, you can add keyword scores as feature weights later.
2) How to prepare the data set to be numeric. I could map keywords to input neurons and then adjust the value based on their relative frequency, but that limits the model and makes it hard to add new keywords. It also quickly becomes computationally expensive if we want to scale beyond a few dozen keywords.
You can just use a dictionary mapping n-grams to integer ids. The features of each training example are then sparse, so your training examples look like this:
34, 68, 79293, 23232 -> 0 (negative label)
340, 608, 3, 232 -> 1 (positive label)
Imagine you have a dictionary (or vocabulary) mapping:
3: foo
34: movie
68: in-stock
232: bar
340: barz
...
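A minimal sketch of building such a dictionary and encoding texts with it, assuming the texts are already split into n-grams; build_vocab and encode are hypothetical helpers. Unknown n-grams at prediction time are simply skipped here, which is one possible choice:

```python
def build_vocab(docs):
    """Assign consecutive integer ids to n-grams in order of first appearance."""
    vocab = {}
    for grams in docs:
        for gram in grams:
            if gram not in vocab:
                vocab[gram] = len(vocab)
    return vocab


def encode(grams, vocab):
    """Map n-grams to ids using a fixed vocabulary, skipping n-grams never seen in training."""
    return [vocab[g] for g in grams if g in vocab]


vocab = build_vocab([["movie", "in-stock"], ["foo", "movie", "bar"]])
print(vocab)                                      # {'movie': 0, 'in-stock': 1, 'foo': 2, 'bar': 3}
print(encode(["bar", "unseen", "movie"], vocab))  # [3, 0]
```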
To use neural networks, you will need an embedding layer that turns the sparse features into dense features by aggregating (for instance, averaging) the embedding vectors of all features.
Using the same example as above, suppose we use 4-dimensional embeddings:
34 -> [0.1, 0.2, -0.3, 0]
68 -> [0, 0.1, -0.1, 0.2]
79293 -> [0.3, 0.0, 0.12, 0]
23232 -> [0.4, 0.0, 0.0, 0]
------------------------------- sum
sum -> [0.8, 0.3, -0.28, 0.2]
------------------------------- L1-normalize
l1 -> [0.8, 0.3, -0.28, 0.2] ./ (0.8 + 0.3 + 0.28 + 0.2)
-> [0.51,0.19,-0.18,0.13]
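The worked example above can be reproduced with NumPy. The embedding table below just holds the four hand-picked vectors from the example, whereas in a real network it would be a trainable layer:

```python
import numpy as np

# Hand-picked vectors from the example; a real model would learn these.
embedding = {
    34: np.array([0.1, 0.2, -0.3, 0.0]),
    68: np.array([0.0, 0.1, -0.1, 0.2]),
    79293: np.array([0.3, 0.0, 0.12, 0.0]),
    23232: np.array([0.4, 0.0, 0.0, 0.0]),
}


def embed(ids, table):
    """Sum the embedding vectors of a sparse example, then L1-normalize the result."""
    total = np.sum([table[i] for i in ids], axis=0)
    return total / np.abs(total).sum()


print(np.round(embed([34, 68, 79293, 23232], embedding), 2))  # values 0.51, 0.19, -0.18, 0.13
```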
At prediction time, you will need to use the dictionary and the same way of feature extraction (cleanup/n-gram generation/mapping n-gram to ids) so that your model understands the input.
You can simply use sklearn to learn a TF-IDF bag-of-words model of your texts, which returns a sparse matrix of shape n_samples x n_features, like this:
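from sklearn.feature_extraction.text import TfidfVectorizer
# TfidfVectorizer tokenizes raw strings; TfidfTransformer would instead expect a matrix of counts
vectorizer = TfidfVectorizer(smooth_idf=False)
X_train = vectorizer.fit_transform(list_of_texts)  # list_of_texts: a list of strings
print(X_train.shape)  # (n_samples, n_features)
# at prediction time, reuse the fitted vocabulary: X_new = vectorizer.transform(new_texts)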
X_train is a SciPy CSR sparse matrix. If your NN implementation doesn't support sparse matrices, you can convert it to a dense NumPy array (X_train.toarray()), but it might fill your RAM; it's better to use an implementation that supports sparse input (e.g., I know Lasagne/Theano does).
After training, you can use the parameters of the NN to find out which features have a high/low weight and so are more/less important for the particular label.
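Assuming a model whose output is linear in the features (e.g., logistic regression or a network without hidden layers), so that each feature has exactly one weight, the most and least important n-grams can be read off by sorting the weights and mapping ids back through the dictionary. For a network with hidden layers the mapping is less direct. top_features and the example weights are hypothetical:

```python
import numpy as np


def top_features(weights, vocab, k=3):
    """Return the k n-grams with the highest and the k with the lowest weight."""
    id_to_gram = {i: g for g, i in vocab.items()}
    order = np.argsort(weights)  # ascending
    highest = [id_to_gram[int(i)] for i in order[::-1][:k]]
    lowest = [id_to_gram[int(i)] for i in order[:k]]
    return highest, lowest


vocab = {"foo": 0, "movie": 1, "bar": 2, "in-stock": 3}
weights = np.array([-1.0, 2.0, 0.5, -3.0])  # made-up learned weights
print(top_features(weights, vocab, k=2))  # (['movie', 'bar'], ['in-stock', 'foo'])
```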

SVM integer features

I'm using the SVM classifier in the scikit-learn machine learning package for Python.
My features are integers. When I call the fit function, I get the user warning "Scaler assumes floating point values as input, got int32". The SVM still returns its predictions, and I calculate the confusion matrix (I have 2 classes) and the prediction accuracy.
To avoid the warning, I saved the features as floats. The warning indeed disappeared, but I got a completely different confusion matrix and prediction accuracy (surprisingly, much less accurate).
Does anyone know why this happens? Which is preferable: passing the features as floats or as integers?
Thanks!
You should convert them to floats, but the way to do it depends on what the integer features actually represent.
What is the meaning of your integers? Are they category membership indicators (for instance: 1 == sport, 2 == business, 3 == media, 4 == people...) or numerical measures with an order relationship (3 is larger than 2, which is in turn larger than 1)? For categories, you cannot say that "people" is larger than "media", for instance. That is meaningless, and giving the machine learning algorithm this assumption would confuse it.
Categorical features should hence be expanded into several boolean features (with value 0.0 or 1.0), one for each possible category. Have a look at the DictVectorizer class in scikit-learn to better understand what I mean by categorical features.
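The expansion described above amounts to one-hot encoding. A minimal NumPy sketch, where one_hot is a hypothetical helper and the codes follow the example above (1 == sport, 2 == business, 3 == media, 4 == people):

```python
import numpy as np


def one_hot(codes, categories):
    """Expand integer category codes into one 0.0/1.0 column per category."""
    codes = np.asarray(codes)
    # Compare every code against every category; True where they match.
    return (codes[:, None] == np.asarray(categories)[None, :]).astype(np.float64)


print(one_hot([1, 3, 2], [1, 2, 3, 4]))
# rows: sport -> [1 0 0 0], media -> [0 0 1 0], business -> [0 1 0 0]
```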
If they are numerical values, just convert them to floats and maybe use the Scaler to put them loosely in the range [-1, 1]. If they span several orders of magnitude (e.g., counts of word occurrences), then taking the logarithm of the counts might yield better results. More documentation and examples on feature preprocessing are in this section of the documentation: http://scikit-learn.org/stable/modules/preprocessing.html
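A minimal NumPy sketch of the numerical case: np.log1p (log(1 + x), so zero counts stay finite) for counts spanning several orders of magnitude, followed by per-column standardization. standardize is a hypothetical helper mirroring what StandardScaler, the current scikit-learn name for the Scaler mentioned above, does by default; the counts are made up:

```python
import numpy as np


def standardize(X):
    """Zero mean and unit variance per column (population std, as StandardScaler uses)."""
    X = np.asarray(X, dtype=np.float64)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return (X - mean) / std


counts = np.array([[0, 10], [9, 1000], [99, 100000]], dtype=float)  # made-up word counts
Z = standardize(np.log1p(counts))
print(Z.mean(axis=0), Z.std(axis=0))  # roughly 0 and 1 per column
```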
Edit: also read this guide that has many more details for features representation and preprocessing: http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf

Resources