How to use strategy.scope() in RoBERTa?

I built a model with BERT for an NLI problem, and it ran without problems. However, when I adapted it to RoBERTa and used strategy.scope(), it raised an error that I don't know how to solve. I would appreciate any pointers.
max_len1 = 515  # 128*4 for the premise plus 128*4 for the hypothesis
def build_model1():
    input_word_ids = tf.keras.Input(shape=(max_len1,), dtype=tf.int32, name="input_word_ids")
    input_mask = tf.keras.Input(shape=(max_len1,), dtype=tf.int32, name="input_mask")
    input_type_ids = tf.keras.Input(shape=(max_len1,), dtype=tf.int32, name="input_type_ids")
    # `model` is the pretrained encoder, loaded elsewhere (not shown in the post)
    embedding = model([input_word_ids, input_mask, input_type_ids])[0]
    output = tf.keras.layers.Dense(3, activation='softmax')(embedding[:, 0, :])
    model3 = tf.keras.Model(inputs=[input_word_ids, input_mask, input_type_ids], outputs=output)
    # The optimizer argument of this compile call is missing from the post
    model3.compile(loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model3
with strategy.scope():
    model3 = build_model1()
WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:AutoGraph could not transform <bound method Socket.send of <zmq.sugar.socket.Socket object at 0x7f2425631d00>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <bound method Socket.send of <zmq.sugar.socket.Socket object at 0x7f2425631d00>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <bound method Socket.send of <zmq.sugar.socket.Socket object at 0x7f2425631d00>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:AutoGraph could not transform <function wrap at 0x7f243c214d40> and will run it as-is.
Cause: while/else statement not yet supported
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform <function wrap at 0x7f243c214d40> and will run it as-is.
Cause: while/else statement not yet supported
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
WARNING: AutoGraph could not transform <function wrap at 0x7f243c214d40> and will run it as-is.
Cause: while/else statement not yet supported
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
ValueError Traceback (most recent call last)
<ipython-input-24-e91a2e7e4b41> in <module>()
1 with strategy.scope():
----> 2 model3 = build_model1()
3 model3.summary()
2 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/ in _validate_compile(self, optimizer, metrics, **kwargs)
2533 'with strategy.scope():\n'
2534 ' model=_create_model()\n'
-> 2535 ' model.compile(...)' % (v, strategy))
2537 # Model metrics must be created in the same distribution strategy scope
ValueError: Variable (<tf.Variable 'tfxlm_roberta_model/roberta/encoder/layer_._0/attention/self/query/kernel:0' shape=(1024, 1024) dtype=float32, numpy=
array([[-0.00294119, -0.00129846, 0.00517603, ..., 0.03835522,
0.0218797 , 0.02100084],
[-0.00933813, -0.05062149, 0.01634834, ..., -0.02387142,
0.0113477 , -0.02262339],
[-0.02023344, -0.04181184, -0.00581416, ..., -0.00609464,
0.00801133, 0.00512151],
[-0.02129102, -0.03157991, -0.04071935, ..., 0.04682101,
0.01948426, 0.00312433],
[-0.04902648, -0.01055507, 0.01377375, ..., 0.00845209,
0.01616496, -0.01041171],
[ 0.00759454, -0.00162496, -0.00215843, ..., -0.03199947,
-0.03871808, 0.04949447]], dtype=float32)>) was not created in the distribution strategy scope
of (<tensorflow.python.distribute.tpu_strategy.TPUStrategy object at 0x7f21fcbbb210>). It is most
likely due to not all layers or the model or optimizer being created outside the distribution
strategy scope. Try to make sure your code looks similar to the following.
with strategy.scope():
As I said above, the same code works perfectly for BERT. For RoBERTa, I only changed the tokenizer and the loading of the model.

I managed to solve it. After investigating, I found that the RoBERTa implementation needs more than just calling the model.
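The ValueError in the question says that a variable of the pretrained encoder was not created in the distribution strategy scope. One common way to end up there is to load the encoder (the global `model` used inside build_model1) before entering strategy.scope(). The following is only a minimal sketch of the pattern the error message asks for, not the poster's actual fix. It makes two assumptions so that it runs anywhere: the default strategy returned by tf.distribute.get_strategy() stands in for the TPUStrategy, and a small Embedding layer stands in for the pretrained RoBERTa encoder. The names MAX_LEN, build_classifier and encoder are hypothetical.

```python
import tensorflow as tf

# Assumption: the default (no-op) strategy stands in for the TPUStrategy
# used in the question.
strategy = tf.distribute.get_strategy()

MAX_LEN = 16  # small placeholder sequence length for this sketch


def build_classifier(encoder):
    """Put a 3-class softmax head on top of the first token of `encoder`."""
    input_word_ids = tf.keras.Input(shape=(MAX_LEN,), dtype=tf.int32, name="input_word_ids")
    hidden = encoder(input_word_ids)  # shape: (batch, MAX_LEN, hidden_size)
    output = tf.keras.layers.Dense(3, activation="softmax")(hidden[:, 0, :])
    model = tf.keras.Model(inputs=input_word_ids, outputs=output)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


with strategy.scope():
    # Assumption: this layer is a stand-in for the pretrained encoder.
    # The point is that it is created INSIDE the scope, together with
    # the classification head and the compile() call.
    encoder = tf.keras.layers.Embedding(input_dim=100, output_dim=8)
    classifier = build_classifier(encoder)

print(classifier.output_shape)
```

With a real pretrained encoder, the loading call would take the place of the Embedding line, so that its variables are also created under the strategy.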


Why is the use of "color" considered an error?

I have been running some manim code, and it raises: "TypeError: Mobject.__getattr__.<locals>.getter() got an unexpected keyword argument 'color'"
This is the code I am trying to run:
from manim import *
class Function(Scene):
    def construct(self):
        ax=Axes(x_range=[-5,5,0.5], y_range=[-3,3,0.5],
                x_axis_config={"numbers_to_include": np.arange(-5,5,1)},
                y_axis_config={"numbers_to_include": [1]})
        #ax_labels=ax.get_axis_labels(x_label="Time (t)", y_label=Tex(r"y=sin(x)"))
        sin_graph=ax.get_graph(lambda x: np.sin(2*x), color=BLUE)
        cos_graph = ax.get_graph(lambda x: np.cos(2*x), color=RED_B)
        sin_label=ax.get_graph_label(sin_graph, label="\\sin(x)",
                                     x_val=-4.5, direction=UP*4)
        cos_label=ax.get_graph_label(cos_graph, label="\\cos(x)",
                                     x_val=4.5, direction=DOWN*2)
        ax_group=VGroup(ax, ax_labels)
#labels=VGroup(sin_label, cos_label), run_time=6)
self.wait(), run_time=2)
self.wait(), run_time=2)
I was expecting an animation of a sine graph, but it failed with the error above.
The error message from get_graph is unfortunately not very helpful. In practice, get_graph was renamed to plot a while ago. See the documentation of Axes.plot.

How to use resnet by reading the .ckpt file of my learned weights with pytorch

In pytorch, how can I write the code that loads my .ckpt file instead of
model = torchvision.models.resnet50(pretrained=True)
Here is my attempt below
model = torchvision.models.resnet50(pretrained=False)
PATH = "/content/drive/MyDrive/Colab Notebooks/mlearning2/multi_logs/resnet_2/version_0/checkpoints/epoch=1-step=2543.ckpt"
model.load_state_dict(torch.load(PATH, map_location=torch.device('cpu')))
But it did not work, and the following error appeared.
RuntimeError: Error(s) in loading state_dict for ResNet:
Missing key(s) in state_dict: "conv1.weight", "bn1.weight", "bn1.bias", "bn1.running_mean", "bn1.running_var", "layer1.0.conv1.weight", "layer1.0.bn1.weight", "layer1.0.bn1.bias", "layer1.0.bn1.running_mean", "layer1.0.bn1.running_var", "layer1.0.conv2.weight", "layer1.0.bn2.weight", "layer1.0.bn2.bias", "layer1.0.bn2.running_mean", "layer1.0.bn2.running_var", "layer1.0.conv3.weight", "layer1.0.bn3.weight", "layer1.0.bn3.bias", "layer1.0.bn3.running_mean", "layer1.0.bn3.running_var", "layer1.0.downsample.0.weight", "layer1.0.downsample.1.weight", "layer1.0.downsample.1.bias", "layer1.0.downsample.1.running_mean", "layer1.0.downsample.1.running_var", "layer1.1.conv1.weight", "layer1.1.bn1.weight", "layer1.1.bn1.bias", "layer1.1.bn1.running_mean", "layer1.1.bn1.running_var", "layer1.1.conv2.weight", "layer1.1.bn2.weight", "layer1.1.bn2.bias", "layer1.1.bn2.running_mean", "layer1.1.bn2.running_var", "layer1.1.conv3.weight", "layer1.1.bn3.weight", "layer1.1.bn3.bias", "layer1.1.bn3.running_mean", "layer1.1.bn3.running_var", "layer1.2.conv1.weight", "layer1.2.bn1.weight", "layer1.2.bn1.bias", "layer1.2.bn1.running_mean", "layer1.2.bn1.running_var", "layer1.2.conv2.weight", "layer1.2.bn2.weight", "layer1.2.bn2.bias", "layer1.2.bn2.running_mean", "layer1.2.bn2.running_var", "layer1.2.conv3.weight", "layer1.2.bn3.weight", "layer1.2.bn3.bias", "layer1.2.bn3.running_mean", "layer1.2.bn3.running_var", "layer2.0.conv1.weight", "layer2.0.bn1.weight", "layer2.0.bn1.bias", "layer2.0.bn1...
Unexpected key(s) in state_dict: "epoch", "global_step", "pytorch-lightning_version", "state_dict", "callbacks", "optimizer_states", "lr_schedulers", "hparams_name", "hyper_parameters".
How can I do it?
@Shai I tried running
model.load_state_dict(torch.load(PATH, map_location=torch.device('cpu'))['state_dict'])
but got the following error.
Your saved checkpoint contains not only a snapshot of the trained weights of the model but some other useful information on the state of the training (e.g., the state of the optimizer etc.).
Try selecting only the relevant part of the saved checkpoint:
model.load_state_dict(torch.load(PATH, map_location=torch.device('cpu'))['state_dict'])
Based on the modification you made and the new error you received, it seems like the model that was saved is model.backbone = torchvision.models.resnet50().
You need to instantiate your model in the same manner as done during training.
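If re-creating the training wrapper is not convenient, another option is to keep only the weights that belong to the wrapped ResNet and drop the attribute prefix from their names. The sketch below shows this on a plain dictionary. It assumes the checkpoint keys look like "backbone.conv1.weight", which is a guess based on the model.backbone attribute mentioned above. The helper name extract_submodule_state, the "head.weight" entry and the toy values are all hypothetical.

```python
from collections import OrderedDict


def extract_submodule_state(checkpoint, prefix="backbone."):
    """Return the entries of checkpoint['state_dict'] that start with
    `prefix`, with the prefix removed from each key."""
    state = checkpoint["state_dict"]
    return OrderedDict(
        (key[len(prefix):], value)
        for key, value in state.items()
        if key.startswith(prefix)
    )


# Toy stand-in for torch.load(PATH, map_location="cpu"); real values are tensors
toy_checkpoint = {
    "epoch": 1,
    "global_step": 2543,
    "state_dict": {
        "backbone.conv1.weight": "w_conv1",
        "backbone.fc.bias": "b_fc",
        "head.weight": "w_head",  # hypothetical extra layer, filtered out
    },
}

backbone_state = extract_submodule_state(toy_checkpoint)
print(list(backbone_state))
```

With a real checkpoint, the result would be passed to model.load_state_dict(backbone_state) on a plain torchvision.models.resnet50() instance.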

Why does sklearn pipeline.set_params() not work?

I have the following pipeline:
from sklearn.pipeline import Pipeline
import lightgbm as lgb
steps_lgb = [('lgb', lgb.LGBMClassifier())]
# Create the pipeline: composed of preprocessing steps and estimators
pipe = Pipeline(steps_lgb)
Now I want to set the parameters of the classifier using the following command:
best_params = {'boosting_type': 'dart',
               'colsample_bytree': 0.7332216010898506,
               'feature_fraction': 0.922329814019706,
               'learning_rate': 0.046566283755421566,
               'max_depth': 7,
               'metric': 'auc',
               'min_data_in_leaf': 210,
               'num_leaves': 61,
               'objective': 'binary',
               'reg_lambda': 0.5185517505019249,
               'subsample': 0.5026815575448366}
pipe.set_params(**best_params)
This however raises an error:
ValueError: Invalid parameter boosting_type for estimator Pipeline(steps=[('estimator', LGBMClassifier())]). Check the list of available parameters with `estimator.get_params().keys()`.
boosting_type is definitely a core parameter of the lightgbm framework. If I remove it from best_params, however, the other parameters raise the same ValueError.
So, what I want is to set the parameters of the classifier after a pipeline is created.
When using pipelines, you need to prefix each parameter with the name of the pipeline component it refers to (here lgb), followed by a double underscore (lgb__). The fact that your pipeline consists of only a single element does not change this requirement.
So, your parameters should be like (only the first 2 elements shown):
best_params = {'lgb__boosting_type': 'dart',
               'lgb__colsample_bytree': 0.7332216010898506,
               # ... remaining parameters prefixed in the same way
               }
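Rather than renaming every key by hand, the prefix can be added programmatically. This is a small sketch on plain dictionaries. The helper name prefix_params is hypothetical, and only three of the original parameters are shown.

```python
def prefix_params(step_name, params):
    """Prefix every key with '<step_name>__', the form that
    Pipeline.set_params expects for parameters of a named step."""
    return {f"{step_name}__{key}": value for key, value in params.items()}


best_params = {"boosting_type": "dart", "max_depth": 7, "num_leaves": 61}
pipeline_params = prefix_params("lgb", best_params)
print(pipeline_params)

# With the pipeline from the question, the call would then be:
# pipe.set_params(**pipeline_params)
```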
You would have discovered this yourself if you had followed the advice clearly offered in your error message:
Check the list of available parameters with `estimator.get_params().keys()`.
In your case,

Input 'input_image' of layer '63' not found in any of the outputs of the preceeding layers

Update #1 (original question and details below):
As per the suggestion of @MatthijsHollemans below, I tried removing dynamic_axes from the initial create_onnx step below. This removed both of the following errors:
Description of image feature 'input_image' has missing or non-positive width 0.
Input 'input_image' of layer '63' not found in any of the outputs of the preceeding layers.
Unfortunately this opens up two sub-questions:
I still want to have a functional ONNX model. Is there a more appropriate way to make H and W dynamic? Or should I be saving two versions of the ONNX model, one without dynamic_axes for the CoreML conversion, and one with for use as a valid ONNX model?
Although this solves the compilation error in xcode (specified below) it introduces the following runtime issues:
Finalizing CVPixelBuffer 0x282f4c5a0 while lock count is 1.
[espresso] [Espresso::handle_ex_plan] exception=Invalid X-dimension 1/480 status=-7
[coreml] Error binding image input buffer input_image: -7
[coreml] Failure in bindInputsAndOutputs.
I am calling this the same way I was calling the fixed size model, which does still work fine. The image dimensions are 640 x 480.
As specified below, the model should accept any image of 64x64 or larger.
For flexible shape models, do I need to provide an input differently in xcode?
Original Question (parts still relevant)
I have been slowly working on converting a style transfer model from pytorch > onnx > coreml. One of the issues that has been a struggle is flexible/dynamic input + output shape.
This method (besides i/o renaming) has worked well on iOS 12 & 13 when using a static input shape.
I am using the following code to do the onnx > coreml conversion:
def create_coreml(name):
    mlmodel = convert(
        model="onnx/" + name + ".onnx",
        preprocessing_args={'is_bgr': True},
        deprocessing_args={'is_bgr': True},
        # any further arguments are missing from the post
    )
    spec = mlmodel.get_spec()
    img_size_ranges = flexible_shape_utils.NeuralNetworkImageSizeRange()
    img_size_ranges.add_height_range((64, -1))
    img_size_ranges.add_width_range((64, -1))
    # the step that applies img_size_ranges to spec is missing from the post
    mlmodel = coremltools.models.MLModel(spec)
    mlmodel.save("mlmodel/" + name + ".mlmodel")
Although the conversion 'succeeds' there are a couple of warnings (spaces added for readability):
Translation to CoreML spec completed. Now compiling the CoreML model.
RuntimeWarning: You will not be able to run predict() on this Core ML model. Underlying exception message was:
Error compiling model:
"Error reading protobuf spec. validator error: Description of image feature 'input_image' has missing or non-positive width 0.".
Model Compilation done.
RuntimeWarning: You will not be able to run predict() on this Core ML model. Underlying exception message was:
Error compiling model:
"compiler error: Input 'input_image' of layer '63' not found in any of the outputs of the preceeding layers.
If I ignore these warnings and try to compile the model for latest targets (13.0) I get the following error in xcode:
coremlc: Error: compiler error: Input 'input_image' of layer '63' not found in any of the outputs of the preceeding layers.
Here is what the problematic area appears to look like in netron:
My main question is how can I get these two warnings out of the way?
Happy to provide any other details.
Thanks for any advice!
Below is my pytorch > onnx conversion:
def create_onnx(name):
    prior = torch.load("pth/" + name + ".pth")
    model = transformer.TransformerNetwork()
    dummy_input = torch.zeros(1, 3, 64, 64) # I wasn't sure what I would set the H W to here?
    torch.onnx.export(model, dummy_input, "onnx/" + name + ".onnx",
                      input_names=["input_image"], # These are being renamed from garbled originals.
                      output_names=["stylized_image"], # ^
                      dynamic_axes={"input_image": {2: 'height', 3: 'width'},
                                    "stylized_image": {2: 'height', 3: 'width'}})
    onnx.save_model(original_model, "onnx/" + name + ".onnx")

Pytorch, Unable to get repr for <class 'torch.Tensor'>

I'm implementing some RL in PyTorch and had to write my own mse_loss function (which I found on Stackoverflow ;) ).
The loss function is:
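def mse_loss(input_, target_):
    # The denominator is cut off in the post; dividing by the element
    # count (the usual mean in MSE) is an assumption.
    return torch.sum(
        (input_ - target_) * (input_ - target_)) / input_.numel()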
Now, in my training loop, the first input is something like:
tensor([-1.7610e+10]), tensor([-6.5097e+10])
With this input I'll get the error:
Unable to get repr for <class 'torch.Tensor'>
Computing a = (input_ - target_) works fine, while b = a * a and b = torch.pow(a, 2) both fail with the error mentioned above.
Does anyone know a fix for this?
Thanks a lot!
I just tried torch.nn.functional.mse_loss, which results in the same error.
I had the same error when I used the code below:
criterion = torch.nn.CrossEntropyLoss().cuda()
loss=criterion(output, target)
but I finally found my mistake: output is like tensor([[0.5746,0.4254]]) and target is like tensor([2]), so the class index 2 is out of range for an output with two classes.
When I do not use the GPU, the error message is:
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /opt/conda/conda-bld/pytorch-nightly_1547458468907/work/aten/src/THNN/generic/ClassNLLCriterion.c:93
Are you using a GPU?
I had a similar problem (but I was using gather operations), and when I moved my tensors to the CPU I got a proper error message. I fixed the error, switched back to the GPU, and everything worked.
Maybe PyTorch has trouble reporting the correct error when it originates on the GPU.
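For the out-of-range target case described above, the problem can also be caught before the loss is computed, independently of the device. The following is a minimal sketch on plain Python lists. The helper name check_targets is hypothetical. With tensors, one would pass something like target.tolist() and the number of columns of output.

```python
def check_targets(targets, n_classes):
    """Raise a readable error if any class index lies outside [0, n_classes)."""
    bad = [t for t in targets if not 0 <= t < n_classes]
    if bad:
        raise ValueError(
            f"target indices {bad} are outside the valid range [0, {n_classes - 1}]"
        )


# An output like tensor([[0.5746, 0.4254]]) has two classes,
# so a target of 2 is invalid while 0 and 1 are fine.
n_classes = 2
check_targets([0, 1], n_classes)  # passes silently

try:
    check_targets([2], n_classes)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```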
