How can I add the decode_batch_predictions() method into the Keras Captcha OCR model?

The current Keras Captcha OCR model returns a CTC-encoded output, so a decoding utility function has to be run as a separate step after inference:
preds = prediction_model.predict(batch_images)
pred_texts = decode_batch_predictions(preds)
The decoding utility function uses keras.backend.ctc_decode, which in turn uses either a greedy or a beam search decoder.
# A utility function to decode the output of the network
def decode_batch_predictions(pred):
    input_len = np.ones(pred.shape[0]) * pred.shape[1]
    # Use greedy search. For complex tasks, you can use beam search
    results = keras.backend.ctc_decode(pred, input_length=input_len, greedy=True)[0][0][
        :, :max_length
    ]
    # Iterate over the results and get back the text
    output_text = []
    for res in results:
        res = tf.strings.reduce_join(num_to_char(res)).numpy().decode("utf-8")
        output_text.append(res)
    return output_text
I would like to train a Captcha OCR model using Keras that returns the CTC-decoded text as its output, without requiring an additional decoding step after inference.
How would I achieve this?

The most robust way to achieve this is by adding a decoding step that is applied as part of the model definition:
def CTCDecoder():
    def decoder(y_pred):
        input_shape = tf.keras.backend.shape(y_pred)
        input_length = tf.ones(shape=input_shape[0]) * tf.keras.backend.cast(
            input_shape[1], 'float32')
        unpadded = tf.keras.backend.ctc_decode(y_pred, input_length)[0][0]
        unpadded_shape = tf.keras.backend.shape(unpadded)
        padded = tf.pad(unpadded,
                        paddings=[[0, 0], [0, input_shape[1] - unpadded_shape[1]]],
                        constant_values=-1)
        return padded
    return tf.keras.layers.Lambda(decoder, name='decode')
Then define the prediction model as follows:
prediction_model = keras.models.Model(inputs=inputs, outputs=CTCDecoder()(model.output))
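The decoded output is a batch of label-index sequences padded with -1, so mapping it back to text is straightforward. A minimal usage sketch, assuming the num_to_char lookup from the Keras OCR tutorial is in scope:

decoded = prediction_model.predict(batch_images)  # shape (batch, time), padded with -1
pred_texts = []
for seq in decoded:
    seq = seq[seq != -1]  # strip the -1 padding added by CTCDecoder
    pred_texts.append(tf.strings.reduce_join(num_to_char(seq)).numpy().decode("utf-8"))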
Credit for the CTCDecoder implementation goes to tulasiram58827.
This implementation supports exporting to TFLite, but only in float32. Quantized (int8) TFLite export still throws an error and is an open issue with the TF team.
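For reference, a minimal float32 export sketch (assumptions not stated in the original answer: prediction_model is the decoder-wrapped model above, and Flex ops are enabled because ctc_decode is not a TFLite builtin):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(prediction_model)
# fall back to TF (Flex) ops for anything the TFLite builtins cannot cover
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
tflite_model = converter.convert()  # float32 by default

with open("captcha_ocr_decoded.tflite", "wb") as f:
    f.write(tflite_model)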

Your question can be interpreted in two ways. One is: you want a neural network that has learned to solve the problem with the CTC decoding step baked into the network itself. The other is that you want a Model class that does the CTC decoding inside of it, without relying on an external utility function.
I don't know the answer to the first question, and I cannot even tell whether it's feasible. In any case, it sounds like a difficult theoretical problem, and if you don't have luck here, you might want to try posting it on datascience.stackexchange.com, which is a more theory-oriented community.
Now, if what you are trying to solve is the second, engineering version of the problem, that's something I can help you with. The solution for that problem is the following:
You need to subclass keras.models.Model with a class that has the method you want. I went over the tutorial in the link you posted and came up with the following class:
class ModifiedModel(keras.models.Model):

    # A utility function to decode the output of the network
    def decode_batch_predictions(self, pred):
        input_len = np.ones(pred.shape[0]) * pred.shape[1]
        # Use greedy search. For complex tasks, you can use beam search
        results = keras.backend.ctc_decode(pred, input_length=input_len, greedy=True)[0][0][
            :, :max_length
        ]
        # Iterate over the results and get back the text
        output_text = []
        for res in results:
            res = tf.strings.reduce_join(num_to_char(res)).numpy().decode("utf-8")
            output_text.append(res)
        return output_text

    def predict_texts(self, batch_images):
        preds = self.predict(batch_images)
        return self.decode_batch_predictions(preds)
You can give it any name you want; it's just for illustration purposes.
With this class defined, you would replace the lines
# Get the prediction model by extracting layers till the output layer
prediction_model = keras.models.Model(
    model.get_layer(name="image").input, model.get_layer(name="dense2").output
)
with
prediction_model = ModifiedModel(
    model.get_layer(name="image").input, model.get_layer(name="dense2").output
)
And then you can replace the lines
preds = prediction_model.predict(batch_images)
pred_texts = decode_batch_predictions(preds)
with
pred_texts = prediction_model.predict_texts(batch_images)

Related

Keras Realtime Augmentation adding SaltandPepper and Gaussian Noise

I am having trouble modifying Keras' ImageDataGenerator in a custom way so that I can perform, say, SaltAndPepper noise and Gaussian blur (which it does not offer). I know this type of question has been asked many times before, and I have read almost every link possible below:
But due to my inability to understand the full source code, or my lack of Python knowledge, I am struggling to implement these two additional types of augmentation in ImageDataGenerator as custom ones. I would very much appreciate it if someone could point me in the right direction on how to modify the source code, or suggest any other way.
Use a generator for Keras model.fit_generator
Custom Keras Data Generator with yield
Keras Realtime Augmentation adding Noise and Contrast
Data Augmentation Image Data Generator Keras Semantic Segmentation
https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly
https://github.com/keras-team/keras/issues/3338
https://towardsdatascience.com/image-augmentation-14a0aafd0498
https://towardsdatascience.com/image-augmentation-for-deep-learning-using-keras-and-histogram-equalization-9329f6ae5085
An example of SaltAndPepper noise is as follows and I wish to add more types of augmentations into ImageDataGenerator:
class SaltAndPepperNoise:
    def __init__(self, replace_probs=0.1, pepper=0, salt=255, noise_type="RGB"):
        """
        It is important to know that the replace_probs here is the
        probability of replacing a "pixel" with salt and pepper noise.
        """
        self.replace_probs = replace_probs
        self.pepper = pepper
        self.salt = salt
        self.noise_type = noise_type

    def get_aug(self, img, bboxes):
        if self.noise_type == "SnP":
            random_matrix = np.random.rand(img.shape[0], img.shape[1])
            img[random_matrix >= (1 - self.replace_probs)] = self.salt
            img[random_matrix <= self.replace_probs] = self.pepper
        elif self.noise_type == "RGB":
            random_matrix = np.random.rand(img.shape[0], img.shape[1], img.shape[2])
            img[random_matrix >= (1 - self.replace_probs)] = self.salt
            img[random_matrix <= self.replace_probs] = self.pepper
        return img, bboxes
I want to do a similar thing in my code.

I am reading the documentation here; see the parameter preprocessing_function. You can implement a function and then pass it to ImageDataGenerator through this parameter.
I have edited my answer to show you a practical example:
def my_func(img):
    return img / 255

train_datagen = ImageDataGenerator(preprocessing_function=my_func)
Here I just implemented a short function that rescales your data, but you can implement noise and so on.
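For instance, a rough sketch (not from the original answer) of how the asker's salt-and-pepper idea could be wired in through preprocessing_function; the probabilities and pixel values are only illustrative:

import numpy as np
from keras.preprocessing.image import ImageDataGenerator

def salt_and_pepper(img, replace_probs=0.1, pepper=0, salt=255):
    # preprocessing_function receives one image at a time as a NumPy array
    random_matrix = np.random.rand(*img.shape)
    img = img.copy()
    img[random_matrix >= (1 - replace_probs)] = salt
    img[random_matrix <= replace_probs] = pepper
    return img

train_datagen = ImageDataGenerator(preprocessing_function=salt_and_pepper)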

custom loss function in Keras combining multiple outputs

I did a lot of searching and am still unable to figure out how to write a custom loss function with multiple outputs where they interact.
I have a Neural Network defined as :
def NeuralNetwork():
    inLayer = Input((2,))
    layers = [Dense(numNeuronsPerLayer, activation='relu')(inLayer)]
    for i in range(10):
        hiddenLyr = Dense(5, activation='tanh', name="layer" + str(i + 1))(layers[i])
        layers.append(hiddenLyr)
    out_u = Dense(1, activation='linear', name="out_u")(layers[i])
    out_k = Dense(1, activation='linear', name="out_k")(layers[i])
    outLayer = Concatenate(axis=-1)([out_u, out_k])
    model = Model(inputs=[inLayer], outputs=outLayer)
    return model
I am now trying to define a custom loss function as follows:
def computeLoss(true, prediction):
    u_pred = prediction[:, 0]
    k_pred = prediction[:, 1]
    loss = f(u_pred) * k_pred
    return loss
Here f(u_pred) is some manipulation of u_pred. The code seems to work correctly and produce correct results when I use only u_pred (i.e., a single output from the neural network). However, the moment I try to include another output for k_pred and slice my prediction tensor in the loss function, I start getting wrong results. I feel I am doing something wrong in handling multiple outputs in Keras but am not sure where my mistake lies. Any help on how I may proceed is welcome.
I figured out that you can't just use indexing (i.e., [:, 0] or [:, 1]) to slice tensors in TF; the operation doesn't seem to work. Instead, use the built-in TensorFlow function tf.split, as detailed in https://www.tensorflow.org/api_docs/python/tf/split?version=stable
So the code that worked was:
(u_pred, k_pred) = tf.split(prediction, num_or_size_splits=2, axis=1)
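Put together, a minimal sketch of how the loss could be wired up (assuming the NeuralNetwork() from the question is in scope; the squared term is only a stand-in for whatever f(u_pred) you actually need):

import tensorflow as tf

def computeLoss(true, prediction):
    u_pred, k_pred = tf.split(prediction, num_or_size_splits=2, axis=1)
    # stand-in for f(u_pred); replace with your own manipulation
    loss = tf.reduce_mean(tf.square(u_pred) * k_pred)
    return loss

model = NeuralNetwork()
model.compile(optimizer='adam', loss=computeLoss)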

Using Keras like TensorFlow for gpu computing

I would like to know if Keras can be used as an interface to TensorFlow for only doing computation on my GPU.
I tested TF directly on my GPU. But for ML purposes, I started using Keras, including the backend. I would find it 'comfortable' to do all my stuff in Keras instead of using two tools.
This is also a matter of curiosity.
I found some examples like this one:
http://christopher5106.github.io/deep/learning/2018/10/28/understand-batch-matrix-multiplication.html
However, this example does not actually do the calculation, and it does not take any input data.
I duplicate the snippet here:
from keras import backend as K

a = K.ones((3, 4))
b = K.ones((4, 5))
c = K.dot(a, b)
print(c.shape)
I would simply like to know if I can get the result numbers from this snippet above, and how?
Thanks,
Michel
Keras doesn't have an eager mode like Tensorflow, and it depends on models or functions with "placeholders" to receive and output data.
So, it's a little more complicated than Tensorflow to do basic calculations like this.
So, the most user-friendly solution would be creating a dummy model with one Lambda layer. (And be careful with the first dimension, which Keras will insist on treating as a batch dimension, requiring input and output to have the same batch size.)
def your_function_here(inputs):
    # if you have more than one tensor for the inputs, it's a list:
    input1, input2, input3 = inputs

    # if you don't have a batch, you should probably have a first dimension = 1 and get
    input1 = input1[0]

    # do your calculations here

    # if you used the batch_size=1 workaround as above, add this dimension again:
    output = K.expand_dims(output, 0)
    return output
Create your model:
inputs = Input(input_shape)
#maybe inputs2 ....
outputs = Lambda(your_function_here)(list_of_inputs)
#maybe outputs2
model = Model(inputs, outputs)
And use it to predict the result:
print(model.predict(input_data))
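As a concrete sketch of the idea applied to the snippet from the question (this assumes the old graph-mode Keras the question refers to, and uses K.batch_dot so the batch dimension is handled explicitly):

import numpy as np
from keras import backend as K
from keras.layers import Input, Lambda
from keras.models import Model

a_in = Input((3, 4))
b_in = Input((4, 5))
# multiply the two tensors sample by sample
c_out = Lambda(lambda t: K.batch_dot(t[0], t[1]))([a_in, b_in])
model = Model([a_in, b_in], c_out)

a = np.ones((1, 3, 4))  # batch size 1
b = np.ones((1, 4, 5))
print(model.predict([a, b]))  # (1, 3, 5) array filled with 4.0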

Feeding list of tf.truncated_normal() or list of dictionaries into a Tensorflow Model

I am new to tensorflow and I am trying to learn how to use the tool efficiently.
I expand on the question below but here is the tldr:
I am wondering what is the best way to feed the following weights and biases into my model with feed_dict:
def generate_initial_population(my_population_size):
    my_weights = []
    my_biases = []
    for _ in range(my_population_size):
        my_weights.append({
            'h1': tf.Variable(tf.truncated_normal([n_inputs, n_hidden_1])),
            'h2': tf.Variable(tf.truncated_normal([n_hidden_1, n_hidden_2])),
            'out': tf.Variable(tf.truncated_normal([n_hidden_2, n_class]))
        })
        my_biases.append({
            'b1': tf.Variable(tf.truncated_normal([n_hidden_1])),
            'b2': tf.Variable(tf.truncated_normal([n_hidden_2])),
            'out': tf.Variable(tf.truncated_normal([n_class]))
        })
    return my_weights, my_biases

weights, biases = generate_initial_population(population_size)
weights, biases = generate_initial_population(population_size)
I cannot simply use feed_dict={weights_ph: weights} because it will generate errors, and I do not know how to deal with this problem efficiently.
Examining the code at the end might help with understanding what I am talking about.
I am wondering if there is any way I could feed a list containing tf.truncated_normals to my model.
I get the error ValueError: setting an array element with a sequence, because I believe it is trying to convert the list to an np.array but has issues with the dimensions.
I have found an easy workaround where I figure out the values of all the tensors first with session.run and then feed those into my model.
I am just not sure whether this is the right solution, since I would be inclined to believe it is slower because you have to run the session twice.
This solution also doesn't work, however, if my original list does not have a regular shape, like [[1, [1, 2]]], or when my truncated_normals do not have the same shapes.
I was thinking I would just feed my oddly shaped list into my model and then use tf.gather to get the specific indices I want to work on.
Since I cannot do that, is my solution the proper way to deal with this: simply calculating the truncated_normals first, feeding them into the model, and then reshaping the list inside the model if needed?
I am also having a very similar problem because I want to feed a list of dictionaries into the model as well. Is the proper way of dealing with that to extract the data from the dictionaries and then just feed in each value from each key separately?
I am trying to learn, and I couldn't find this information elsewhere.
Here is a code snippet I designed to fail, to explain what I mean:
import tensorflow as tf

list_ph = tf.placeholder(dtype=tf.float32)
index_ph = tf.placeholder(dtype=tf.int32)

def model(my_list, index):
    value = tf.gather(my_list, index, axis=0)
    return value

my_model = model(list_ph, index_ph)

with tf.Session() as sess:
    var_list = []
    truncated_normal = tf.Variable(tf.truncated_normal(shape=[5, 3]))
    for i in range(4):
        var_list.append(truncated_normal)
    # for i in range(4):
    #     var_list.append({i: i*2})

    sess.run(tf.global_variables_initializer())

    # will work but will not work for dictionaries
    val = sess.run(var_list)

    # will not work, but will work if you feed val
    var = sess.run(my_model, feed_dict={list_ph: var_list, index_ph: 1})
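For what it's worth, a minimal sketch of the workaround described above (run the session once to materialise the tensors, then feed plain NumPy arrays); it assumes the same TF 1.x placeholders as the failing snippet:

import numpy as np
import tensorflow as tf

list_ph = tf.placeholder(dtype=tf.float32)
index_ph = tf.placeholder(dtype=tf.int32)
my_model = tf.gather(list_ph, index_ph, axis=0)

var_list = [tf.Variable(tf.truncated_normal(shape=[5, 3])) for _ in range(4)]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # first session call: evaluate the variables to NumPy arrays
    values = sess.run(var_list)  # list of four (5, 3) arrays
    # second session call: feed_dict accepts plain arrays, so this works
    out = sess.run(my_model, feed_dict={list_ph: np.array(values), index_ph: 1})
    print(out.shape)  # (5, 3)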

Seq2seq for non-sentence, float data; stuck configuring the decoder

I am trying to apply sequence-to-sequence modelling to EEG data. The encoding works just fine, but getting the decoding to work is proving problematic. The input-data has the shape None-by-3000-by-31, where the second dimension is the sequence-length.
The encoder looks like this:
initial_state = lstm_sequence_encoder.zero_state(batchsize, dtype=self.model_precision)

encoder_output, state = dynamic_rnn(
    cell=LSTMCell(32),
    inputs=lstm_input,              # shape=(None, 3000, 32)
    initial_state=initial_state,    # zeroes
    dtype=lstm_input.dtype          # tf.float32
)
I use the final state of the RNN as the initial state of the decoder. For training, I use the TrainingHelper:
training_helper = TrainingHelper(target_input, [self.sequence_length])

training_decoder = BasicDecoder(
    cell=lstm_sequence_decoder,
    helper=training_helper,
    initial_state=thought_vector
)

output, _, _ = dynamic_decode(
    decoder=training_decoder,
    maximum_iterations=3000
)
My troubles start when I try to implement inference. Since I am using non-sentence data, I do not need to tokenize or embed, because the data is essentially embedded already. The InferenceHelper class seemed the best way to achieve my goal. So this is what I use. I'll give my code then explain my problem.
def _sample_fn(decoder_outputs):
    return decoder_outputs

def _end_fn(_):
    return tf.tile([False], [self.lstm_layersize])  # Batch-size is sequence-length because of time major

inference_helper = InferenceHelper(
    sample_fn=_sample_fn,
    sample_shape=[32],
    sample_dtype=target_input.dtype,
    start_inputs=tf.zeros(batchsize_placeholder, 32),  # the batchsize varies
    end_fn=_end_fn
)

inference_decoder = BasicDecoder(
    cell=lstm_sequence_decoder,
    helper=inference_helper,
    initial_state=thought_vector
)

output, _, _ = dynamic_decode(
    decoder=inference_decoder,
    maximum_iterations=3000
)
The Problem
I don't know what the shape of the inputs should be. I know the start-inputs should be zero because it is the first time-step. But this throws errors; it expects the input to be (1,32).
I also thought I should pass the output of each time-step unchanged to the next. However, this raises problems at run-time: the batch-size varies, so the shape is partial. The library throws an exception at this as it tries to convert the start_input to a tensor:
...
self._start_inputs = ops.convert_to_tensor(
    start_inputs, name='start_inputs')
Any ideas?
This is a lesson in poor documentation.
I fixed my problem, but failed to address the variable batch-size problem.
The _end_fn was causing problems I was unaware of. I also managed to work out what the appropriate fields are for the InferenceHelper. I've given the field names in case anyone needs guidance in the future:
def _end_fn(_):
    return tf.tile([False], [batchsize])

inference_helper = InferenceHelper(
    sample_fn=_sample_fn,
    sample_shape=[lstm_number_of_units],  # In my case, 32
    sample_dtype=tf.float32,              # Depends on the data
    start_inputs=tf.zeros((batchsize, lstm_number_of_units)),
    end_fn=_end_fn
)
As for the batch-size problem, there are two things I'm considering:
Changing the internal state of my model object. My TensorFlow computation graph is built inside a class. A class-field records the batch-size. Changing this during training may work. Or:
Pad the batches so that they are 200 sequences long. This will waste time.
Preferably I'd like a way to dynamically manage the batch-sizes.
EDIT: I found a way. It involves simply substituting square-brackets for parentheses:
inference_helper = InferenceHelper(
    sample_fn=_sample_fn,
    sample_shape=[self.lstm_layersize],
    sample_dtype=target_input.dtype,
    start_inputs=tf.zeros([batchsize, self.lstm_layersize]),
    end_fn=_end_fn
)
