I noticed that https://blog.keras.io/keras-as-a-simplified-interface-to-tensorflow-tutorial.html says we can set a TensorFlow op as the input of a Keras model, like first_layer.set_input(my_input_tensor). But I find that Keras does not have a set_input function:
first_layer = Dense(32, activation='relu', input_dim=784)
first_layer.set_input(my_input_tensor)
Running this, I get:
AttributeError: 'Dense' object has no attribute 'set_input'.
What may be the problem?
I guess the set_input() method has been removed in recent versions of Keras. In the old Keras documentation there is a set_input() method on the keras.layers.containers.Sequential class, but its source code is no longer available on GitHub.
If you look at the source code of the Dense layer class in Keras, you will see that there is no method called set_input() there either, nor in the abstract Layer class that Dense inherits from.
So we can conclude that the set_input() method is probably no longer available in Keras.
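For what it's worth, the workflow from that blog post is now covered by the tensor argument of the Input layer. A minimal sketch in Keras 2, assuming my_input_tensor is a TensorFlow tensor of shape (None, 784):
from keras.layers import Input, Dense
from keras.models import Model

# wrap the external TensorFlow tensor as the model's input
inputs = Input(tensor=my_input_tensor)
outputs = Dense(32, activation='relu')(inputs)
model = Model(inputs, outputs)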
I want to export a roberta-base based language model to the ONNX format. The model uses RoBERTa embeddings and performs a text classification task.
import os
from typing import List

from torch import nn
import torch
import torch.onnx
import transformers
import onnx
import onnxruntime
from onnxruntime import InferenceSession
From the logs:
pytorch: 1.10.2+cu113
CUDA: False
device: cpu
onnxruntime: 1.10.0
onnx: 1.11.0
PyTorch export
batch_size = 3
model_input = {
'input_ids': torch.empty(batch_size, 256, dtype=torch.int).random_(32000),
'attention_mask': torch.empty(batch_size, 256, dtype=torch.int).random_(2),
'seq_len': torch.empty(batch_size, 1, dtype=torch.int).random_(256)
}
model_file_path = os.path.join("checkpoints", 'model.onnx')
torch.onnx.export(da_inference.model, # model being run
model_input, # model input (or a tuple for multiple inputs)
model_file_path, # where to save the model (can be a file or file-like object)
export_params=True, # store the trained parameter weights inside the model file
opset_version=11, # the ONNX version to export the model to
operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK,
do_constant_folding=True, # whether to execute constant folding for optimization
input_names = ['input_ids', 'attention_mask', 'seq_len'], # the model's input names
output_names = ['output'], # the model's output names
dynamic_axes={'input_ids': {0 : 'batch_size'},
'attention_mask': {0 : 'batch_size'},
'seq_len': {0 : 'batch_size'},
'output' : {0 : 'batch_size'}},
verbose=True)
I know there may be problems converting some operators from ATen (A Tensor Library for C++11) if they are included in the model architecture; see PyTorch Model Export to ONNX Failed Due to ATen.
The export succeeds if I set the parameter operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK, which means 'leave ATen operators as-is if they are not supported in ONNX'.
The PyTorch export function gives me the following warnings:
Warning: Unsupported operator ATen. No schema registered for this operator.
Warning: Shape inference does not support models with experimental operators: ATen
It looks like the only ATen operators in the model that are not converted to ONNX are the ones involving the LayerNorm.weight and LayerNorm.bias parameters (I have several layers like that):
%1266 : Float(3, 256, 768, strides=[196608, 768, 1], requires_grad=0, device=cpu) =
onnx::ATen[cudnn_enable=1, eps=1.0000000000000001e-05, normalized_shape=[768], operator="layer_norm"]
(%1265, %model.utterance_rnn.base.encoder.layer.11.output.LayerNorm.weight,
%model.utterance_rnn.base.encoder.layer.11.output.LayerNorm.bias)
# /opt/conda/lib/python3.9/site-packages/torch/nn/functional.py:2347:0
The model check then passes OK:
model = onnx.load(model_file_path)
# Check that the model is well formed
onnx.checker.check_model(model)
# Print a human readable representation of the graph
print(onnx.helper.printable_graph(model.graph))
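To confirm programmatically which ops were left as ATen fallbacks, you can also scan the graph nodes; a small sketch reusing the model loaded above:
# collect the nodes that remained as ATen fallbacks
aten_nodes = [node for node in model.graph.node if node.op_type == "ATen"]
for node in aten_nodes:
    # the "operator" attribute holds the original ATen op name, e.g. b'layer_norm'
    print([attr.s for attr in node.attribute if attr.name == "operator"])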
I can also visualize the computation graph using Netron.
But when I try to perform inference using the exported ONNX model, it stalls with no logs or stdout. This code hangs the system:
model_file_path = os.path.join("checkpoints", "model.onnx")
sess_options = onnxruntime.SessionOptions()
sess_options.log_severity_level = 0
ort_providers: List[str] = ["CUDAExecutionProvider"] if use_gpu else ['CPUExecutionProvider']
session = InferenceSession(model_file_path, providers=ort_providers, sess_options=sess_options)
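For completeness, the inference call that is never reached would look something like this (input names matching the export above):
outputs = session.run(['output'], {
    'input_ids': model_input['input_ids'].numpy(),
    'attention_mask': model_input['attention_mask'].numpy(),
    'seq_len': model_input['seq_len'].numpy(),
})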
Are there any suggestions to overcome this problem? From the official documentation I see that models exported this way by torch.onnx are probably runnable only by Caffe2.
These layers are not inside the frozen base RoBERTa model; they are additional layers that I added myself. Is it possible to substitute the offending layers with similar ones and retrain the model?
Or is Caffe2 the best choice here, and onnxruntime will not do the inference?
Update: I retrained the model on the basis of BERT cased embeddings, but the problem persists. The same ATen operators are not converted to ONNX.
It looks like the LayerNorm.weight and LayerNorm.bias parameters are only in the layers I added on top of BERT. So, what are your suggestions for changing these layers to enable ONNX export?
Have you tried exporting after defining the operator for ONNX? Something along the lines of the following code by Huawei.
On another note, when loading a model, you can technically override anything you want: setting a specific layer to a modified class that inherits from the original keeps the same behavior (same inputs and outputs), while its execution can be changed.
You can try to use this to save the model with the problematic operators changed, transform it to ONNX, and fine-tune it in that form (or even in PyTorch).
This generally seems best solved by the ONNX team, so a long-term solution might be to post a request for that specific operator on their GitHub issues page (but that will probably be slow).
The best way to go is to rewrite the place in the model that uses these operators in a way that will convert; look at this for reference.
If, for example, the issue is layer norm, you can write it yourself, as sketched below. Another thing that sometimes helps is not setting the axes as dynamic, since some ops don't support that yet.
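For instance, a layer norm built only from primitive tensor ops exports without the ATen fallback. A minimal sketch (the class and helper names are mine, not from any library) that also copies the trained weights over:
import torch
from torch import nn

class ExportableLayerNorm(nn.Module):
    # same math as nn.LayerNorm over the last dimension, but composed
    # of primitive ops that all have standard ONNX mappings
    def __init__(self, normalized_shape, eps=1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(normalized_shape))
        self.bias = nn.Parameter(torch.zeros(normalized_shape))
        self.eps = eps

    def forward(self, x):
        mean = x.mean(dim=-1, keepdim=True)
        var = (x - mean).pow(2).mean(dim=-1, keepdim=True)
        return (x - mean) / torch.sqrt(var + self.eps) * self.weight + self.bias

def swap_layer_norms(module):
    # recursively replace every nn.LayerNorm with the exportable clone
    for name, child in module.named_children():
        if isinstance(child, nn.LayerNorm):
            replacement = ExportableLayerNorm(child.normalized_shape, child.eps)
            replacement.weight.data.copy_(child.weight.data)
            replacement.bias.data.copy_(child.bias.data)
            setattr(module, name, replacement)
        else:
            swap_layer_norms(child)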
Yes, I've read everywhere that keras and tf.keras aren't compatible. But you can pass tf.keras.layers into a keras model, and it does work. When I try to do that with my own models, it does not work!
If you examine the ResNet source code in resnet50.py, you see they build models like
input = layers.Input(shape=input_shape)
x = layers.Dense(1000)(input)
model = Model(input, x)
and it works fine whether you pass in layers=tf.keras.layers or layers=keras.layers
demonstration code:
import tensorflow as tf
import keras
# THIS WORKS!
input_shape = (224,224,3)
base_model = keras.applications.ResNet50(layers=tf.keras.layers,
                                         weights='imagenet', include_top=False,
                                         pooling=None, input_shape=input_shape,
                                         classes=1000)
# this fails!!
input = tf.keras.layers.Input(shape=input_shape)
x = tf.keras.layers.Dense(1000,activation='relu')(input)
model = keras.Model(input, x)
My code produces this error:
TypeError: object of type 'Dense' has no len()
How can I make this work? Apparently there is a way, because the keras.applications prebuilt models do seem to support it and work fine.
I want to use tf.keras.layers because their batch normalization layer works differently. This would potentially be the easiest way to drop it into our massive existing code base.
I do see this related stackoverflow post with the same error: Object of Type 'Dense' has no len()
They correctly mention it's due to tf.keras and keras not being compatible. But again, I've confirmed that passing tf.keras.layers into keras.applications.ResNet50 does return a keras model with the correct layers. Somehow.
You have reached the wrong conclusion. keras.applications is a module that supports both the keras and tf.keras packages: it detects whether you are using tf.keras or keras and gets the corresponding modules (models, layers, and so on), so the code is agnostic to the actual Keras implementation.
keras.applications is not mixing keras and tf.keras; it just supports both.
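For your own code, the safe fix is to stay within one package end to end. For example, the failing snippet rewritten purely with tf.keras:
import tensorflow as tf

input_shape = (224, 224, 3)
inputs = tf.keras.layers.Input(shape=input_shape)
x = tf.keras.layers.Dense(1000, activation='relu')(inputs)
model = tf.keras.Model(inputs, x)  # tf.keras.Model instead of keras.Model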
I am trying to implement a DQN agent using Keras-rl. The problem is that when I define my model I need to use an LSTM layer in the architecture:
model = Sequential()
model.add(Flatten(input_shape=(1, 8000)))
model.add(Reshape(target_shape=(200, 40)))
model.add(LSTM(20))
model.add(Dense(3, activation='softmax'))
return model
When executing the RL agent, I obtain the following error:
RuntimeError: Attempting to capture an EagerTensor without building a function.
This is related to the use of the LSTM together with the following line of code:
tf.compat.v1.disable_eager_execution()
Using a Dense layer instead of an LSTM:
model = Sequential()
model.add(Flatten(input_shape=(1, 8000)))
model.add(Dense(20))
model.add(Dense(3, activation='softmax'))
return model
and keeping eager execution disabled, I don't get the previously reported error. If I remove the line disabling eager execution while using the LSTM layer, I get other errors.
Can anyone help me understand the reason for the error?
The keras-rl library does not have explicit support for TensorFlow 2.0, so it will not work with that version of TensorFlow. The library is sparsely updated and its last release is around two years old (from 2018), so if you want to use it you should use TensorFlow 1.x.
Install keras-rl2 (from GitHub or PyPI, e.g. pip install keras-rl2); it supports TensorFlow 2.x.
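For reference, keras-rl2 keeps the same rl.* namespace as keras-rl, so the agent wiring typically carries over unchanged. A minimal sketch, assuming a Gym environment env with 3 discrete actions and the model defined above:
from rl.agents.dqn import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import EpsGreedyQPolicy
from tensorflow.keras.optimizers import Adam

memory = SequentialMemory(limit=50000, window_length=1)
policy = EpsGreedyQPolicy()
dqn = DQNAgent(model=model, nb_actions=3, memory=memory,
               nb_steps_warmup=100, policy=policy)
dqn.compile(Adam(learning_rate=1e-3), metrics=['mae'])
dqn.fit(env, nb_steps=10000, verbose=1)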
It may be possible to migrate the keras-rl code to use eager execution and therefore an LSTM. However, LSTMs need to be updated with a whole episode of learning to be accurate, which keras-rl does not support. See more here: https://github.com/keras-rl/keras-rl/issues/41
I would like to modify the back-propagation for the embedding layer but I don't understand where the definition is.
In the definition available at https://pytorch.org/docs/stable/_modules/torch/nn/functional.html, the embedding function calls torch.embedding, and that should be where the weight updates are defined.
So my question is:
Where can I find the documentation of torch.embedding?
It calls an underlying C function; in my torch build (version 4) it is this file.
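If the goal is to change how the gradients flow rather than to read the C source, one option is to wrap the lookup in a custom torch.autograd.Function and write the backward pass yourself. A minimal sketch (the class name is mine, and the 0.5 scaling is just a placeholder for whatever modification you want):
import torch

class ModifiedEmbedding(torch.autograd.Function):
    @staticmethod
    def forward(ctx, weight, indices):
        ctx.save_for_backward(weight, indices)
        return weight[indices]

    @staticmethod
    def backward(ctx, grad_output):
        weight, indices = ctx.saved_tensors
        grad_weight = torch.zeros_like(weight)
        # the standard embedding backward is a scatter-add of the output
        # gradients into the looked-up rows; modify this step as needed
        grad_weight.index_add_(0, indices.reshape(-1),
                               0.5 * grad_output.reshape(-1, weight.shape[1]))
        return grad_weight, None  # no gradient for the integer indices

weight = torch.randn(100, 16, requires_grad=True)
out = ModifiedEmbedding.apply(weight, torch.tensor([1, 4, 7]))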
How is it possible to use leaky ReLUs in the newest version of keras?
The relu() function accepts an optional parameter 'alpha', which controls the negative slope, but I cannot figure out how to pass this parameter when constructing a layer.
This is how I tried to do it:
model.add(Activation(relu(alpha=0.1))
but then I get the error
TypeError: relu() missing 1 required positional argument: 'x'
How can I use a leaky ReLU, or any other activation function with some parameter?
relu is a function, not a class, and it takes the input to the activation function as the parameter x. The Activation layer takes a function as its argument, so you can wrap relu in a lambda over the input x, for example:
model.add(Activation(lambda x: relu(x, alpha=0.1)))
Well, from this source (the Keras docs) and this GitHub question, you use a linear activation and then put the leaky ReLU as another layer right after it.
from keras.layers.advanced_activations import LeakyReLU
model.add(Dense(512, activation='linear'))  # add any layer, with an identity/linear activation (no squashing)
model.add(LeakyReLU(alpha=.001)) # add an advanced activation
does that help?
You can build a wrapper for parameterized activation functions. I've found this useful and more intuitive.
class activation_wrapper(object):
def __init__(self, func):
self.func = func
def __call__(self, *args, **kwargs):
def _func(x):
return self.func(x, *args, **kwargs)
return _func
Of course I could have used a lambda expression in __call__.
Then
wrapped_relu = activation_wrapper(relu)
Then use it as you have above:
model.add(Activation(wrapped_relu(alpha=0.1)))
You can also use it as part of a layer:
model.add(Dense(64, activation=wrapped_relu(alpha=0.1)))
While this solution is a little more complicated than the one offered by @Thomas Jungblut, the wrapper class can be reused for any parameterized activation function. In fact, I use it whenever I have a family of activation functions that are parameterized.
Keras defines separate activation layers for the most common use cases, including LeakyReLU, ThresholdedReLU, and ReLU (a generic version that supports all ReLU parameters), among others. See the full documentation here: https://keras.io/api/layers/activation_layers
Example usage with the Sequential model:
import tensorflow as tf
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.InputLayer(input_shape=(10,)))
model.add(tf.keras.layers.Dense(16))
model.add(tf.keras.layers.LeakyReLU(0.2))
model.add(tf.keras.layers.Dense(1))
model.add(tf.keras.layers.Activation(tf.keras.activations.sigmoid))
model.compile('adam', 'binary_crossentropy')
If the activation parameter you want to use is unavailable as a predefined class, you could use a plain lambda expression as suggested by @Thomas Jungblut:
from tensorflow.keras.layers import Activation
model.add(Activation(lambda x: tf.keras.activations.relu(x, alpha=0.2)))
However, as noted by @leenremm in the comments, this fails when trying to save or load the model. As suggested you could use the Lambda layer as follows:
from tensorflow.keras.layers import Activation, Lambda
model.add(Activation(Lambda(lambda x: tf.keras.activations.relu(x, alpha=0.2))))
However, the Lambda documentation includes the following warning:
WARNING: tf.keras.layers.Lambda layers have (de)serialization limitations!
The main reason to subclass tf.keras.layers.Layer instead of using a Lambda layer is saving and inspecting a Model. Lambda layers are saved by serializing the Python bytecode, which is fundamentally non-portable. They should only be loaded in the same environment where they were saved. Subclassed layers can be saved in a more portable way by overriding their get_config method. Models that rely on subclassed Layers are also often easier to visualize and reason about.
As such, the best method for activations not already provided by a layer is to subclass tf.keras.layers.Layer instead. This should not be confused with subclassing object and overriding __call__ as done in @Anonymous Geometer's answer, which is the same as using a lambda without the Lambda layer.
Since my use case is covered by the provided layer classes, I'll leave it up to the reader to implement this method. I am making this answer a community wiki in the event anyone would like to provide an example below.
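For completeness, here is a minimal sketch of such a subclass (the class name is mine), with get_config overridden so that saving and loading round-trip correctly:
import tensorflow as tf

class ParamLeakyReLU(tf.keras.layers.Layer):
    def __init__(self, alpha=0.2, **kwargs):
        super().__init__(**kwargs)
        self.alpha = alpha

    def call(self, inputs):
        # same computation as the lambda, but serializable
        return tf.keras.activations.relu(inputs, alpha=self.alpha)

    def get_config(self):
        # record alpha so the layer can be reconstructed on load
        config = super().get_config()
        config.update({'alpha': self.alpha})
        return config

model.add(ParamLeakyReLU(alpha=0.2))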