When running tensorflow-federated v0.20.0 I get the error "tensorflow_federated.python.learning.models" has no attribute ModelWeights

I tried running the tensorflow-federated tutorial "Federated Learning for Text Generation" but keep getting AttributeErrors on tff.learning.models. I wonder if this is because my version of tensorflow-federated, 0.20.0, is outdated. This is the code I tried running:
NUM_ROUNDS = 5
# The state of the FL server, containing the model and optimization state.
state = fed_avg.initialize()
# Load our pre-trained Keras model weights into the global model state.
pre_trained_weights = tff.learning.models.ModelWeights(
    trainable=[v.numpy() for v in keras_model.trainable_weights],
    non_trainable=[v.numpy() for v in keras_model.non_trainable_weights]
)
state = fed_avg.set_model_weights(state, pre_trained_weights)
def keras_evaluate(state, round_num):
  # Take our global model weights and push them back into a Keras model to
  # use its standard `.evaluate()` method.
  keras_model = load_model(batch_size=BATCH_SIZE)
  keras_model.compile(
      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
      metrics=[FlattenedCategoricalAccuracy()])
  model_weights = fed_avg.get_model_weights(state)
  model_weights.assign_weights_to(keras_model)
  loss, accuracy = keras_model.evaluate(example_dataset, steps=2, verbose=0)
  print('\tEval: loss={l:.3f}, accuracy={a:.3f}'.format(l=loss, a=accuracy))
for round_num in range(NUM_ROUNDS):
  print('Round {r}'.format(r=round_num))
  keras_evaluate(state, round_num)
  result = fed_avg.next(state, train_datasets)
  state = result.state
  train_metrics = result.metrics['client_work']['train']
  print('\tTrain: loss={l:.3f}, accuracy={a:.3f}'.format(
      l=train_metrics['loss'], a=train_metrics['accuracy']))
print('Final evaluation')
keras_evaluate(state, NUM_ROUNDS + 1)
The reason I am running v0.20.0 is that it is the only version I got working in my conda environment on my Linux system. Please comment on what I have done wrong; I am very new to tensorflow-federated. Also, since this tutorial is from TensorFlow, I am fairly sure the code should run if used correctly.
I tried to find documentation on tff.learning.models, and it says there should be an attribute called ModelWeights.
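A plausible fix, sketched below on the assumption that the class simply moved between releases: in TFF 0.20.0 this class was exposed as tff.learning.ModelWeights, and the tff.learning.models namespace only appeared in later releases, so a small compatibility guard lets this part of the tutorial run on either layout (other parts of the tutorial's API surface may also have shifted, so upgrading TFF is the more robust fix):
import tensorflow_federated as tff
# Assumption: ModelWeights moved from tff.learning to tff.learning.models
# in releases after 0.20.0; fall back to the old location if needed.
try:
    ModelWeights = tff.learning.models.ModelWeights
except AttributeError:
    ModelWeights = tff.learning.ModelWeights
pre_trained_weights = ModelWeights(
    trainable=[v.numpy() for v in keras_model.trainable_weights],
    non_trainable=[v.numpy() for v in keras_model.non_trainable_weights])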

Related

pytorch on colab "RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)"

I've been running a resnet50 pytorch script on colab for nine months. I hadn't run the script for about three weeks, and now I get the following error: RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
My script barfs on this colab cell:
# Train the model
if IS_TRAINING:
    # Create the model
    mymodel = torchvision.models.resnet50(weights='ResNet50_Weights.DEFAULT')
    n_features = mymodel.fc.in_features
    # Replace the last layer by our own Linear layer
    mymodel.fc = DisMaxLossFirstPart(n_features, len(class_names))
    mymodel = mymodel.to(device)
    criterion = DisMaxLossSecondPart(mymodel.fc)
    optimizer_conv = torch.optim.Adam(mymodel.parameters(), lr=1e-4)
    mymodel, train_acc_1, train_loss_1, val_acc_1, val_loss_1 = train_model(
        mymodel, criterion=criterion, optimizer=optimizer_conv, scheduler=None, num_epochs=TRAIN_EPOCHS
    )
With this error:
Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
100%
97.8M/97.8M [00:00<00:00, 308MB/s]
Epoch 0/3
----------
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-22-01c27b985121> in <module>
12 optimizer_conv = torch.optim.Adam(mymodel.parameters(), lr=1e-4)
13
---> 14 mymodel, train_acc_1, train_loss_1, val_acc_1, val_loss_1 = train_model(
15 mymodel, criterion=criterion, optimizer=optimizer_conv, scheduler=None, num_epochs=TRAIN_EPOCHS
16 )
2 frames
<ipython-input-19-c213dabd46bb> in forward(self, logits, targets, debug, precompute_thresholds)
87 num_classes = logits.size(1)
88 half_batch_size = batch_size//2
---> 89 targets_one_hot = torch.eye(num_classes)[targets].long().cuda()
90
91 if self.model_classifier.training:
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
As described above, I am running a resnet50 saved model using pytorch on colab. The script used to run without a problem. All I have done today is change the input set of images. I've also verified this by running the last version of my script that worked for me and it barfs in the same way.
I am specifying the device to be used as follows:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
Shouldn't this be forcing everything on colab to be on the GPU?
I saw that someone else had a similar problem with YOLOv7, but I don't have any code like "from_which_layer.append((torch.ones(size=(len(b),)) * i))".
As jhso said, I should enter the device in this line:
targets_one_hot = torch.eye(num_classes)[targets].long().cuda()
so I changed it to this:
targets_one_hot = torch.eye(num_classes, device="cuda")[targets].long().cuda()
And that works.
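A slightly more portable variant, assuming targets already lives on the same device as the model, avoids hard-coding "cuda" so the same cell also runs on a CPU-only runtime:
# Build the identity matrix on whatever device the indices live on,
# so the indexing never crosses devices.
targets_one_hot = torch.eye(num_classes, device=targets.device)[targets].long()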

Pytorch + BERT+ batch_encode_plus() Code running fine in Colab but producing problems with Kaggle in mismatch input shapes

I tried to use a notebook initialised on Google Colab in Kaggle and found some strange behaviour; it gave me something like:
16 # text2tensor
---> 17 train_seq,train_mask,train_y = textToTensor(train_text,train_labels,pad_len)
18 val_seq,val_mask,val_y = textToTensor(val_text,val_labels,pad_len)
19
<ipython-input-9-ee85c4607a30> in textToTensor(text, labels, max_len)
4 tokens = tokenizer.batch_encode_plus(text.tolist(), max_length=max_len, padding='max_length', truncation=True)
5
----> 6 text_seq = torch.tensor(tokens['input_ids'])
7 text_mask = torch.tensor(tokens['attention_mask'])
8
ValueError: expected sequence of length 38 at dim 1 (got 13)
The error came from the code below:
def textToTensor(text, labels=None, max_len=38):  # max_len is 38
    tokens = tokenizer.batch_encode_plus(text.tolist(), max_length=max_len, padding='max_length', truncation=True)
    text_seq = torch.tensor(tokens['input_ids'])  # ERROR CAME FROM HERE
    text_mask = torch.tensor(tokens['attention_mask'])
    text_y = None
    if isinstance(labels, np.ndarray):
        text_y = torch.tensor(labels.tolist())
    return text_seq, text_mask, text_y

train_seq, train_mask, train_y = textToTensor(train_text, train_labels, pad_len)
train_data = TensorDataset(train_seq, train_mask, train_y)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)
I ran this code again on Colab and it ran smoothly. Could it be caused by the version differences? Can someone please explain this?
Kaggle Configs:
transformers: '2.11.0'
torch: '1.5.1'
python: 3.7.6
Colab Configs:
torch: 1.7.0+cu101
transformers: 3.5.1
python: 3.6.9
EDIT:
My train_text is a numpy array of texts and train_labels is a 1-d numerical array with 4 classes ranging from 0 to 3.
Also: I initialized my tokenizer as:
from transformers import BertTokenizerFast
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
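A hedged explanation, given the version gap above: the unified padding='max_length' argument only arrived with transformers 3.x, so on Kaggle's transformers 2.11.0 it falls through to **kwargs and is ignored, leaving each sequence at its natural length (hence "expected sequence of length 38 ... (got 13)" when building the tensor). A version guard along these lines should pad on both environments; in 2.x padding was requested with pad_to_max_length=True, and truncation to max_length was the default behaviour:
import transformers

def pad_encode(tokenizer, texts, max_len):
    # transformers 2.x spelled padding as pad_to_max_length=True
    if transformers.__version__.startswith('2.'):
        return tokenizer.batch_encode_plus(
            texts, max_length=max_len, pad_to_max_length=True)
    # transformers 3.x+ uses the unified padding/truncation API
    return tokenizer.batch_encode_plus(
        texts, max_length=max_len, padding='max_length', truncation=True)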

RuntimeError: Unknown device when trying to run AlbertForMaskedLM on colab tpu

I am running the following code on colab taken from the example here: https://huggingface.co/transformers/model_doc/albert.html#albertformaskedlm
import os
import torch
import torch_xla
import torch_xla.core.xla_model as xm
assert os.environ['COLAB_TPU_ADDR']
dev = xm.xla_device()
from transformers import AlbertTokenizer, AlbertForMaskedLM
import torch
tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2')
model = AlbertForMaskedLM.from_pretrained('albert-base-v2').to(dev)
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0) # Batch size 1
data = input_ids.to(dev)
outputs = model(data, masked_lm_labels=data)
loss, prediction_scores = outputs[:2]
I haven't done anything to the example code except move input_ids and the model onto the TPU device using .to(dev). Everything seems to move to the TPU without a problem; when I print data I get the following output: tensor([[ 2, 10975, 15, 51, 1952, 25, 10901, 3]], device='xla:1')
However when I run this code I get the following error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-5-f756487db8f7> in <module>()
1
----> 2 outputs = model(data, masked_lm_labels=data)
3 loss, prediction_scores = outputs[:2]
9 frames
/usr/local/lib/python3.6/dist-packages/transformers/modeling_albert.py in forward(self, hidden_states, attention_mask, head_mask)
277 attention_output = self.attention(hidden_states, attention_mask, head_mask)
278 ffn_output = self.ffn(attention_output[0])
--> 279 ffn_output = self.activation(ffn_output)
280 ffn_output = self.ffn_output(ffn_output)
281 hidden_states = self.full_layer_layer_norm(ffn_output + attention_output[0])
RuntimeError: Unknown device
Anyone know what's going on?
Solution is here: https://github.com/pytorch/xla/issues/1909
Before calling model.to(dev), you need to call xm.send_cpu_data_to_device(model, xm.xla_device()):
model = AlbertForMaskedLM.from_pretrained('albert-base-v2')
model = xm.send_cpu_data_to_device(model, dev)
model = model.to(dev)
There are also some issues with getting the gelu activation function ALBERT uses to work on the TPU, so you need to use the following branch of transformers when working on TPU: https://github.com/huggingface/transformers/tree/fix-jit-tpu
See the following colab notebook (by https://github.com/jysohn23) for full solution: https://colab.research.google.com/gist/jysohn23/68d620cda395eab66289115169f43900/getting-started-with-pytorch-on-cloud-tpus.ipynb

How can I do simple matmul on edge tpu?

I can't work out how to invoke my .tflite model that does matmul on the coral accelerator using the python api.
The .tflite model is generated from some example code here. It works well using the tf.lite.Interpreter() class, but I don't know how to transform it to work with the edgetpu class. I have tried edgetpu.basic.basic_engine.BasicEngine() after changing the model's datatype from numpy.float32 to numpy.uint8, but that did not help. I am a complete beginner with TensorFlow and just want to use my TPU for matmul.
import numpy
import tensorflow as tf
import edgetpu
from edgetpu.basic.basic_engine import BasicEngine

def export_tflite_from_session(session, input_nodes, output_nodes, tflite_filename):
    print("Converting to tflite...")
    converter = tf.lite.TFLiteConverter.from_session(session, input_nodes, output_nodes)
    tflite_model = converter.convert()
    with open(tflite_filename, "wb") as f:
        f.write(tflite_model)
    print("Converted %s." % tflite_filename)

# This does matmul just fine but does not use the TPU
def test_tflite_model(tflite_filename, examples):
    print("Loading TFLite interpreter for %s..." % tflite_filename)
    interpreter = tf.lite.Interpreter(model_path=tflite_filename)
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    print("input details: %s" % input_details)
    print("output details: %s" % output_details)
    for i, input_tensor in enumerate(input_details):
        interpreter.set_tensor(input_tensor['index'], examples[i])
    interpreter.invoke()
    model_output = []
    for i, output_tensor in enumerate(output_details):
        model_output.append(interpreter.get_tensor(output_tensor['index']))
    return model_output

# This should use the TPU, but I don't know how to run the model or if it needs
# further processing. One matrix can be constant for my use case.
def test_tpu(tflite_filename, examples):
    print("Loading TFLite interpreter for %s..." % tflite_filename)
    # TODO edgetpu.basic
    interpreter = BasicEngine(tflite_filename)
    interpreter.allocate_tensors()  # does not work...

def main():
    tflite_filename = "model.tflite"
    shape_a = (2, 2)
    shape_b = (2, 2)
    a = tf.placeholder(dtype=tf.float32, shape=shape_a, name="A")
    b = tf.placeholder(dtype=tf.float32, shape=shape_b, name="B")
    c = tf.matmul(a, b, name="output")
    numpy.random.seed(1234)
    a_ = numpy.random.rand(*shape_a).astype(numpy.float32)
    b_ = numpy.random.rand(*shape_b).astype(numpy.float32)
    with tf.Session() as session:
        session_output = session.run(c, feed_dict={a: a_, b: b_})
        export_tflite_from_session(session, [a, b], [c], tflite_filename)
    tflite_output = test_tflite_model(tflite_filename, [a_, b_])
    tflite_output = tflite_output[0]
    # test the TPU
    tflite_output = test_tpu(tflite_filename, [a_, b_])
    print("Input example:")
    print(a_)
    print(a_.shape)
    print(b_)
    print(b_.shape)
    print("Session output:")
    print(session_output)
    print(session_output.shape)
    print("TFLite output:")
    print(tflite_output)
    print(tflite_output.shape)
    print(numpy.allclose(session_output, tflite_output))

if __name__ == '__main__':
    main()
You're only converting your model once, to a regular .tflite; it still needs a second pass through the Edge TPU compiler, so as it stands it is not compiled for the Edge TPU at all. From the docs:
At the first point in the model graph where an unsupported operation occurs, the compiler partitions the graph into two parts. The first part of the graph that contains only supported operations is compiled into a custom operation that executes on the Edge TPU, and everything else executes on the CPU
There are several specific requirements that the model must meet:
quantization-aware training
constant tensor sizes and model parameters at compile time
tensors that are 3-dimensional or smaller
only operations supported by the Edge TPU
There is an online compiler as well as a CLI version that is useful for translating .tflite models into Edge TPU compatible .tflite models.
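For reference, the CLI invocation is a single command (assuming the edgetpu_compiler package is installed); it writes an Edge TPU-compatible model such as model_edgetpu.tflite next to the input:
edgetpu_compiler model.tflite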
Your code is also incomplete. You've passed your model to the engine class here:
interpreter = BasicEngine(tflite_filename)
but you're missing the step of actually running inference on an input tensor, via the engine's run_inference method (BasicEngine has no allocate_tensors):
latency, output = interpreter.run_inference(input_data)
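Putting it together, a minimal sketch of the TPU path, assuming (as the asker notes is possible) that matrix B has been frozen into the graph as a constant so the compiled model has a single quantized input, and that model_edgetpu.tflite is the Edge TPU compiler's output (names are illustrative):
import numpy
from edgetpu.basic.basic_engine import BasicEngine

engine = BasicEngine('model_edgetpu.tflite')
# run_inference takes a flattened 1-D uint8 array and returns a
# (latency in ms, raw output array) pair.
a_quantized = a_.flatten().astype(numpy.uint8)
latency_ms, raw_output = engine.run_inference(a_quantized)
print(latency_ms, raw_output)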

AttributeError: Can't pickle local object 'Op.make_py_thunk.<locals>.rval' while running PyMC3 code

I am new to PyMC3. While learning PyMC3, I re-created a sample program from a blog (which presumably ran for the author) and ran into the following error: AttributeError: Can't pickle local object 'Op.make_py_thunk.<locals>.rval'. I am completely stuck and any guidance would be appreciated. The relevant code follows:
log_dose = np.array([-.86, -.3, -.05, .73])
log_dose_shared = shared(log_dose)
n = 5 * np.ones(4, dtype=int)
n_shared = shared(n)
deaths = np.array([0, 1, 3, 5])

with Model() as bioassay_model:
    # Logit model parameters
    alpha = Normal('alpha', 0, sd=100)
    beta = Normal('beta', 0, sd=100)
    # Calculate probabilities of death
    theta = invlogit(alpha + beta * log_dose_shared)
    # Data likelihood
    obs_death = Binomial('obs_death', n=n_shared, p=theta, observed=deaths)

with bioassay_model:
    # Obtain starting values via MAP
    start = find_MAP(model=bioassay_model)
    # Instantiate sampler
    step = pm.Metropolis()
    # Draw 50000 posterior samples
    bioassay_trace = sample(50000, step=step, start=start)
logp = -13.034, ||grad|| = 0.00043389: 100%|██████████████████████████████████████████| 14/14 [00:00<00:00, 398.61it/s]
Multiprocess sampling (4 chains in 4 jobs)
CompoundStep
Metropolis: [beta]
Metropolis: [alpha]
_RemoteTraceback Traceback (most recent call last)
_RemoteTraceback:
Traceback (most recent call last):
File "C:\Users\bikim\AppData\Local\conda\conda\envs\pymc3p36\lib\site-packages\joblib\externals\loky\backend\queues.py", line 151, in _feed
obj, reducers=reducers)
File "C:\Users\bikim\AppData\Local\conda\conda\envs\pymc3p36\lib\site-packages\joblib\externals\loky\backend\reduction.py", line 145, in dumps
p.dump(obj)
AttributeError: Can't pickle local object 'Op.make_py_thunk.<locals>.rval'
For completeness, here is my environment:
PyMC3 installed using conda
Python: 3.6.5 |Anaconda custom (64-bit)| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)]
PyMC3 version: 3.4.1
Theano version: 1.0.2
Windows 10
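A hedged workaround sketch, based on what the traceback shows: joblib spawns four worker processes on Windows and fails to pickle a Theano thunk, so forcing single-process sampling sidesteps the pickling entirely (PyMC3 3.4 spells this option njobs; later releases renamed it to cores):
with bioassay_model:
    step = pm.Metropolis()
    # njobs=1 keeps sampling in the parent process, so nothing is pickled
    bioassay_trace = sample(50000, step=step, start=start, njobs=1)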
