BERT error: Cannot squeeze a dimension whose value is not 1 - pytorch

I have an mlmodel based on pytorch-pretrained-BERT, exported via ONNX to CoreML. That process was pretty smooth, so now I'm trying to do some (very) basic testing—i.e., just to make some kind of prediction, and get a rough idea of what performance problems we might encounter.
However, when I try to run prediction, I get the following error:
[espresso] [Espresso::handle_ex_plan] exception=Espresso exception: "Invalid state": Cannot squeeze a dimension whose value is not 1: shape[1]=128 stat
2020-02-16 11:36:05.959261-0800 Xxxxx[6725:2140794] [coreml] Error computing NN outputs -5
Is this error indicating a problem with the model itself (i.e., from model conversion), or is there something in Swift/CoreML that I'm doing wrong? My prediction function looks like this:
public func prediction(withInput input: String) -> MLMultiArray? {
    var predictions: MLMultiArray? = nil
    if let bert = bertMLModel {
        var ids = tokenizer.tokenizeToIds(text: input, includeWordpiece: true)
        print("ids: \(ids)")
        while ids.count < 256 {
            ids.append(0)
        }
        let inputMLArray = MLMultiArray.from(ids, dims: 2)
        let modelInput = bert_256_FP16Input(input_ids: inputMLArray)
        var modelOutput: bert_256_FP16Output? = nil
        do {
            modelOutput = try bert.prediction(input: modelInput)
        } catch {
            print("Error running prediction: \(error)")
        }
        if let modelOutput = modelOutput {
            predictions = modelOutput.output_weights
        }
    }
    return predictions
}
I'm not trying to do anything meaningful, at this stage, only to get it running.
I used the pytorch-pretrained-BERT repo because I was able to find a ground-up pretraining example for that. I have since noticed that HuggingFace has released a "from scratch" training option, but there are still some issues with the tutorial that are being sorted out. So I would like to at least understand what might be going wrong with my current model/approach. However, if the problem is definitely in the PyTorch->ONNX->CoreML conversion, then I don't really want to fight that fight and will just dig into what HuggingFace is offering.
Any thoughts appreciated.
UPDATE: On Matthijs' advice, I'm trying to predict from the model in python:
from coremltools.models import *
import numpy as np

# Just a "dummy" input, but it is a valid series of tokens for my data.
tokens = [3, 68, 45, 68, 45, 5, 45, 68, 45, 4]
tokens_tensor = np.zeros((1, 128))
for i in range(0, 10):
    tokens_tensor[0, i] = tokens[i]
# I'm doing masked token prediction, so one segment.
segments_tensor = np.zeros((1, 128))

mlmodel = 'bert_fp16.mlmodel'
model = MLModel(mlmodel)
spec = model.get_spec()
print("spec: ", spec.description)

predictions = model.predict({'input.1': np.asarray(tokens_tensor, dtype=np.int32), 'input.3': np.asarray(segments_tensor, dtype=np.int32)})
I admit I haven't run an mlmodel from python before, but I think the inputs are correct. The spec indicates:
input {
  name: "input.1"
  type {
    multiArrayType {
      shape: 1
      shape: 128
      dataType: INT32
    }
  }
}
input {
  name: "input.3"
  type {
    multiArrayType {
      shape: 1
      shape: 128
      dataType: INT32
    }
  }
}
...
for my inputs.
For this case, I don't get the cannot squeeze message, or the error code (-5), but it does fail with Error computing NN outputs. So something is definitely wrong. I'm just not at all sure how to go about debugging it.
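One thing I can try next (just a sketch of a debugging step, not something I've confirmed helps) is to dump the layer list from the converted spec with coremltools, to see what type of layer sits where the failure occurs and what feeds it:
import coremltools

# Sketch of a debugging step: list layer types, inputs and outputs from the spec.
spec = coremltools.models.MLModel('bert_fp16.mlmodel').get_spec()
nn = spec.neuralNetwork  # assuming a plain neuralNetwork spec (not a classifier/regressor)

for i, layer in enumerate(nn.layers):
    # WhichOneof('layer') gives the concrete layer type, e.g. 'squeeze' or 'innerProduct'
    print(i, layer.name, layer.WhichOneof('layer'), list(layer.input), '->', list(layer.output))

# Depending on the coremltools version, model.predict(inputs, useCPUOnly=True)
# may also produce a more informative error than the GPU/ANE path.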
J.
UPDATE: For comparison I've trained/converted the HuggingFace BERT Model (actually, DistilBert — I've updated the code above accordingly) and have the same error. Looking at the log from onnx, I see that there is one Squeeze added (also clear from the onnx-coreml log, of course), but the only squeeze in the PyTorch code is in BertForQuestionAnswering, not BertForMaskedLM. Perhaps onnx is building the question answering model, not the mlm one (or the question answering one is being saved in the checkpoint)?
Looking at the swift-coreml-transformers sample code I can see that distilBert's input is just let input_ids = MLMultiArray.from(allTokens, dims: 2), which is exactly how I've defined it. So I'm at a dead-end, I think. Has anybody managed to run MLM with Bert/DistilBert in CoreML (via onnx)? If so, an example would be super helpful.
I'm training using the latest run_language_modeling.py from HuggingFace, btw.
Looking at the mlmodel in Netron, I can see the offending squeeze. I confess that I don't know what that branch of the input is for, but I'm guessing it's a mask (and a further guess might be that it has to do with question answering). Could I just remove that somehow?
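If it really is safe to drop, one option (a rough, untested sketch; it assumes the Squeeze has a single input and output, and that the network still makes sense without it) would be to splice the layer out of the spec with coremltools and re-save the model:
import coremltools

# Rough sketch, not a verified fix: remove the squeeze layer from the spec by
# rewiring its consumers to its input, then deleting it.
model = coremltools.models.MLModel('bert_fp16.mlmodel')
spec = model.get_spec()
nn = spec.neuralNetwork

squeeze_idx = next(i for i, l in enumerate(nn.layers) if l.WhichOneof('layer') == 'squeeze')
squeeze_in = nn.layers[squeeze_idx].input[0]
squeeze_out = nn.layers[squeeze_idx].output[0]

# Point every layer that consumed the squeeze's output at its input instead.
for layer in nn.layers:
    for j, name in enumerate(layer.input):
        if name == squeeze_out:
            layer.input[j] = squeeze_in

del nn.layers[squeeze_idx]
coremltools.models.MLModel(spec).save('bert_fp16_nosqueeze.mlmodel')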

Related

How can I add the decode_batch_predictions() method into the Keras Captcha OCR model?

The current Keras Captcha OCR model returns a CTC encoded output, which requires decoding after inference.
To decode this, one needs to run a decoding utility function after inference as a separate step.
preds = prediction_model.predict(batch_images)
pred_texts = decode_batch_predictions(preds)
The decoding utility function uses keras.backend.ctc_decode, which in turn uses either a greedy or beam search decoder.
# A utility function to decode the output of the network
def decode_batch_predictions(pred):
    input_len = np.ones(pred.shape[0]) * pred.shape[1]
    # Use greedy search. For complex tasks, you can use beam search
    results = keras.backend.ctc_decode(pred, input_length=input_len, greedy=True)[0][0][
        :, :max_length
    ]
    # Iterate over the results and get back the text
    output_text = []
    for res in results:
        res = tf.strings.reduce_join(num_to_char(res)).numpy().decode("utf-8")
        output_text.append(res)
    return output_text
I would like to train a Captcha OCR model using Keras that returns the CTC decoded as an output, without requiring an additional decoding step after inference.
How would I achieve this?
The most robust way to achieve this is by adding a decoding step that is called as part of the model definition:
def CTCDecoder():
    def decoder(y_pred):
        input_shape = tf.keras.backend.shape(y_pred)
        input_length = tf.ones(shape=input_shape[0]) * tf.keras.backend.cast(
            input_shape[1], 'float32')
        unpadded = tf.keras.backend.ctc_decode(y_pred, input_length)[0][0]
        unpadded_shape = tf.keras.backend.shape(unpadded)
        padded = tf.pad(unpadded,
                        paddings=[[0, 0], [0, input_shape[1] - unpadded_shape[1]]],
                        constant_values=-1)
        return padded

    return tf.keras.layers.Lambda(decoder, name='decode')
Then define the model as follows:
prediction_model = keras.models.Model(inputs=inputs, outputs=CTCDecoder()(model.output))
Credit goes to tulasiram58827.
This implementation supports exporting to TFLite, but only in float32. Quantized (int8) TFLite export still throws an error and is an open ticket with the TF team.
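As a rough usage sketch (num_to_char here is the StringLookup layer from the captcha tutorial, and the output dtype may need a cast depending on your setup), the only post-processing left after predict is mapping indices back to characters:
decoded = prediction_model.predict(batch_images)  # decoded label indices, padded with -1

pred_texts = []
for seq in decoded:
    seq = seq[seq != -1]  # strip the -1 padding added by CTCDecoder
    pred_texts.append(tf.strings.reduce_join(num_to_char(seq)).numpy().decode("utf-8"))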
Your question can be interpreted in two ways. One is: I want a neural network that solves a problem where the CTC decoding step is already part of what the network learned. The other is that you want a Model class that does this CTC decoding inside of it, without relying on an external utility function.
I don't know the answer to the first question, and I cannot even tell whether it's feasible. In any case, it sounds like a difficult theoretical problem, and if you don't have luck here, you might want to try posting it on datascience.stackexchange.com, which is a more theory-oriented community.
Now, if what you are trying to solve is the second, engineering version of the problem, that's something I can help you with. The solution for that problem is the following:
You need to subclass keras.models.Model with a class that has the method you want. I went over the tutorial in the link you posted and came up with the following class:
class ModifiedModel(keras.models.Model):

    # A utility function to decode the output of the network
    def decode_batch_predictions(self, pred):
        input_len = np.ones(pred.shape[0]) * pred.shape[1]
        # Use greedy search. For complex tasks, you can use beam search
        results = keras.backend.ctc_decode(pred, input_length=input_len, greedy=True)[0][0][
            :, :max_length
        ]
        # Iterate over the results and get back the text
        output_text = []
        for res in results:
            res = tf.strings.reduce_join(num_to_char(res)).numpy().decode("utf-8")
            output_text.append(res)
        return output_text

    def predict_texts(self, batch_images):
        preds = self.predict(batch_images)
        return self.decode_batch_predictions(preds)
You can give it whatever name you want; it's just for illustration purposes.
With this class defined, you would replace the line
# Get the prediction model by extracting layers till the output layer
prediction_model = keras.models.Model(
    model.get_layer(name="image").input, model.get_layer(name="dense2").output
)
with
prediction_model = ModifiedModel(
    model.get_layer(name="image").input, model.get_layer(name="dense2").output
)
And then you can replace the lines
preds = prediction_model.predict(batch_images)
pred_texts = decode_batch_predictions(preds)
with
pred_texts = prediction_model.predict_texts(batch_images)

Python Surprise package gives different predictions for predict method vs manual compute using latent factors

I am using the surprise package for matrix factorization. Below is the code from the tutorial:
from surprise import SVD
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import train_test_split
# Load the movielens-100k dataset (download it if needed),
data = Dataset.load_builtin('ml-100k')
trainset = data.build_full_trainset()
algo = SVD()
algo.fit(trainset)
algo.predict(str(196), str(302))
Out:
Prediction(uid='196', iid='301', r_ui=4, est=3.0740854315737174, details={'was_impossible': False})
However, when I use the SVD equation from its documentation and source code to manually compute the r_hat (r prediction):
algo.trainset.global_mean + algo.bi[301] + algo.bu[196] + np.dot(algo.qi[301], algo.pu[196])
Out:
2.817335384596893
The predictions do not match at all. Am I doing anything wrong or missing something?
I managed to figure it out. There's a difference between raw users/items and inner users/items. The former refers to the actual names of the users and items (e.g., user = John or a number like 10; items = Avengers or a number like 20), while the latter I take to be the label-encoded values assigned to the original users/items.
The hidden attributes of the trainset contain 4 attributes, _inner2raw_id_items, _inner2raw_id_users, _raw2inner_id_items, _raw2inner_id_users, which are dicts containing the conversion from one to the other.
If we call trainset._raw2inner_id_users and trainset._raw2inner_id_items, we get:
_raw2inner_id_users
{'196': 0,
'186': 1,
'22': 2, ...}
_raw2inner_id_items
{'242': 0,
'302': 1,
'377': 2, ...
'301': 404, ...}
Therefore, when we call:
algo.predict(str(196), str(302))
Out:
# different from original post as the prediction changes from run to run
Prediction(uid='196', iid='301', r_ui=None, est=3.2072618383879736, details={'was_impossible': False})
We are actually referring to inner user 0 and inner item 404 (the inner id of raw item '301'). So when we do the manual computation using the latent factors, biases, and global mean according to the SVD equation, we should use these inner ids instead:
algo.trainset.global_mean + algo.bi[404] + algo.bu[0] + np.dot(algo.qi[404], algo.pu[0])
Output:
3.2072618383879736
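Rather than hard-coding the inner ids, a tidier version of the same check (a small sketch using the trainset's to_inner_uid/to_inner_iid helpers) is:
import numpy as np

# Sketch: reproduce the est value by mapping raw ids to inner ids with the
# trainset helpers instead of reading the private _raw2inner dicts.
iuid = algo.trainset.to_inner_uid('196')  # raw user id -> inner user id
iiid = algo.trainset.to_inner_iid('301')  # raw item id -> inner item id

est = (algo.trainset.global_mean
       + algo.bu[iuid]
       + algo.bi[iiid]
       + np.dot(algo.qi[iiid], algo.pu[iuid]))
print(est)  # should match the est field from algo.predict()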

Keras. How to concatenate intermediate layers of two different models into a third model

I have two sequential models that both do a pretty good job of classifying audio. One uses MFCCs and the other waveforms. I am now trying to combine them into a third functional-API model using one of the later Dense layers from each of the MFCC and waveform models. The example in the Keras FAQ about obtaining the output of an intermediate layer is not working for me (https://keras.io/getting-started/faq/#how-can-i-obtain-the-output-of-an-intermediate-layer).
Here is my code:
mfcc_model = load_model(S01_model_local_loc)
waveform_model = load_model(T01_model_local_loc)

mfcc_input = Input(shape=(79,30,1))
mfcc_model_as_layer = Model(inputs=mfcc_model.input,
                            outputs=mfcc_model.get_layer(name='dense_9').output)

waveform_input = Input(shape=(40000,1))
waveform_model_as_layer = Model(inputs=waveform_model.input,
                                outputs=waveform_model.get_layer(name='dense_2').output)

concatenated_1024 = concatenate([mfcc_model_as_layer, waveform_model_as_layer])
model_pred = layers.Dense(2, activation='sigmoid')(concatenated_1024)
uber_model = Model(inputs=[mfcc_input, waveform_input], outputs=model_pred)
This throws the error:
AttributeError: Layer sequential_5 has multiple inbound nodes, hence the notion of "layer input" is ill-defined. Use get_input_at(node_index) instead.
Changing the inputs of the first two Model statements to inputs=mfcc_model.get_input_at(1) and inputs=waveform_model.get_input_at(1) gets past that error, but I then get this error message:
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("dropout_21_input:0", shape=(?, 79, 30, 1), dtype=float32) at layer "dropout_21_input". The following previous layers were accessed without issue: []
If I remove the .get_layer statements and just take the final output of the model the graph connects nicely.
What do I need to do to just get the output of the Dense layers that I want?
Update: I found a really hacky way of getting what I want. I popped off the layers of the mfcc and waveform models until the output layers were what I wanted. Then the code below seems to work. I'd love to know the right way to do this!
mfcc_input = Input(shape=(79,30,1))
waveform_input = Input(shape=(40000,1))
mfcc_model_as_layer = mfcc_model(mfcc_input)
waveform_model_as_layer = waveform_model(waveform_input)
concatenated_1024 = concatenate([mfcc_model_as_layer, waveform_model_as_layer])
model_pred = layers.Dense(2, activation='sigmoid')(concatenated_1024)
test_model = Model(inputs=[mfcc_input,waveform_input], outputs=model_pred)

custom loss function in Keras combining multiple outputs

I did a lot of searching and am still unable to figure out how to write a custom loss function with multiple outputs that interact.
I have a Neural Network defined as :
def NeuralNetwork():
    inLayer = Input((2,));
    layers = [Dense(numNeuronsPerLayer, activation='relu')(inLayer)];
    for i in range(10):
        hiddenLyr = Dense(5, activation='tanh', name="layer" + str(i+1))(layers[i]);
        layers.append(hiddenLyr);
    out_u = Dense(1, activation='linear', name="out_u")(layers[i]);
    out_k = Dense(1, activation='linear', name="out_k")(layers[i]);
    outLayer = Concatenate(axis=-1)([out_u, out_k]);
    model = Model(inputs=[inLayer], outputs=outLayer);
    return model
I am now trying to define a custom loss function as follows :
def computeLoss(true, prediction):
    u_pred = prediction[:,0];
    k_pred = prediction[:,1];
    loss = f(u_pred)*k_pred;
    return loss;
Where f(u_pred) is some manipulation of u_pred. The code seems to work correctly and produce correct results when I use only u_pred (i.e., a single output from the neural network). However, the moment I try to include another output for k_pred and slice my prediction tensor in the loss function, I start getting wrong results. I feel I am doing something wrong in handling multiple outputs in Keras but am not sure where my mistake lies. Any help on how I may proceed is welcome.
I figured out that you can't just use indexing (i.e., [:,0] or [:,1]) to slice tensors in tf; the operation doesn't seem to work. Instead, use the built-in function in TensorFlow, as detailed in https://www.tensorflow.org/api_docs/python/tf/split?version=stable.
So the code that worked was:
(u_pred, k_pred) = tf.split(prediction, num_or_size_splits=2, axis=1)
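Putting it together, a minimal sketch of the working loss (with a placeholder f, here just a square, standing in for the real manipulation of u_pred) looks like this:
import tensorflow as tf

# Minimal sketch; tf.square is only a stand-in for the actual f(u_pred).
def computeLoss(true, prediction):
    u_pred, k_pred = tf.split(prediction, num_or_size_splits=2, axis=1)
    loss = tf.square(u_pred) * k_pred
    return tf.reduce_mean(loss)

# model.compile(optimizer='adam', loss=computeLoss)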

Seq2seq for non-sentence, float data; stuck configuring the decoder

I am trying to apply sequence-to-sequence modelling to EEG data. The encoding works just fine, but getting the decoding to work is proving problematic. The input-data has the shape None-by-3000-by-31, where the second dimension is the sequence-length.
The encoder looks like this:
initial_state = lstm_sequence_encoder.zero_state(batchsize, dtype=self.model_precision)

encoder_output, state = dynamic_rnn(
    cell=LSTMCell(32),
    inputs=lstm_input,             # shape=(None,3000,32)
    initial_state=initial_state,   # zeroes
    dtype=lstm_input.dtype         # tf.float32
)
I use the final state of the RNN as the initial state of the decoder. For training, I use the TrainingHelper:
training_helper = TrainingHelper(target_input, [self.sequence_length])
training_decoder = BasicDecoder(
    cell=lstm_sequence_decoder,
    helper=training_helper,
    initial_state=thought_vector
)
output, _, _ = dynamic_decode(
    decoder=training_decoder,
    maximum_iterations=3000
)
My troubles start when I try to implement inference. Since I am using non-sentence data, I do not need to tokenize or embed, because the data is essentially embedded already. The InferenceHelper class seemed the best way to achieve my goal. So this is what I use. I'll give my code then explain my problem.
def _sample_fn(decoder_outputs):
    return decoder_outputs

def _end_fn(_):
    return tf.tile([False], [self.lstm_layersize])  # Batch-size is sequence-length because of time major

inference_helper = InferenceHelper(
    sample_fn=_sample_fn,
    sample_shape=[32],
    sample_dtype=target_input.dtype,
    start_inputs=tf.zeros(batchsize_placeholder, 32),  # the batchsize varies
    end_fn=_end_fn
)
inference_decoder = BasicDecoder(
    cell=lstm_sequence_decoder,
    helper=inference_helper,
    initial_state=thought_vector
)
output, _, _ = dynamic_decode(
    decoder=inference_decoder,
    maximum_iterations=3000
)
The Problem
I don't know what the shape of the inputs should be. I know the start-inputs should be zero because it is the first time-step. But this throws errors; it expects the input to be (1,32).
I also thought I should pass the output of each time-step unchanged to the next. However, this raises problems at run-time: the batch-size varies, so the shape is partial. The library throws an exception at this as it tries to convert the start_input to a tensor:
...
self._start_inputs = ops.convert_to_tensor(
start_inputs, name='start_inputs')
Any ideas?
This is a lesson in poor documentation.
I fixed my problem, but failed to address the variable batch-size problem.
The _end_fn was causing problems I was unaware of. I also managed to work out what the appropriate fields are for the InferenceHelper. I've given the field names in case anyone needs guidance in the future:
def _end_fn(_):
    return tf.tile([False], [batchsize])

inference_helper = InferenceHelper(
    sample_fn=_sample_fn,
    sample_shape=[lstm_number_of_units],  # In my case, 32
    sample_dtype=tf.float32,              # Depends on the data
    start_inputs=tf.zeros((batchsize, lstm_number_of_units)),
    end_fn=_end_fn
)
As for the batch-size problem, there are two things I'm considering:
1. Changing the internal state of my model object. My TensorFlow computation graph is built inside a class, and a class field records the batch-size. Changing this during training may work.
2. Padding the batches so that they are 200 sequences long. This will waste time.
Preferably I'd like a way to dynamically manage the batch-sizes.
EDIT: I found a way. It involves simply substituting square-brackets for parentheses:
inference_helper = InferenceHelper(
    sample_fn=_sample_fn,
    sample_shape=[self.lstm_layersize],
    sample_dtype=target_input.dtype,
    start_inputs=tf.zeros([batchsize, self.lstm_layersize]),
    end_fn=_end_fn
)
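For what it's worth, one further option for the variable batch-size (a sketch on my part, not something I've verified in this model) is to read the batch dimension off the encoder input at graph-construction time rather than storing it in a class field:
import tensorflow as tf

# Sketch: derive the batch size dynamically from the encoder input tensor
# (assumed shape [batch, time, features]) so no fixed class field is needed.
batchsize = tf.shape(lstm_input)[0]
start_inputs = tf.zeros([batchsize, lstm_number_of_units], dtype=tf.float32)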
