PyTorch: 'collections.OrderedDict' object has no attribute 'to'

This is my main code, but I don't know how to fix the problem:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = torch.load('./checkpoints/fcn_model_5.pth')  # load the model
model = model.to(device)

You are loading the checkpoint as a state dict; it is not an nn.Module object.
checkpoint = './checkpoints/fcn_model_5.pth'
model = your_model()  # a torch.nn.Module object
model.load_state_dict(torch.load(checkpoint))
model = model.to(device)

The source of your problem is simply that you are loading your model as a dict instead of an nn.Module. Here is another approach you can employ without converting to an nn.Module, adapted from here:
for k, v in model.items():
    model[k] = v.to(device)
Now you have an ordered dict with every tensor on the correct device.
Please note that you will still have an ordered dict instead of an nn.Module, so you will not be able to run a forward pass with it.
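Putting it together, a minimal sketch of the usual fix; the FCN model class name here is hypothetical, so substitute whatever architecture produced the checkpoint:
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Recreate the architecture first; the checkpoint only stores tensors, not code.
model = FCN()  # hypothetical nn.Module class matching the saved weights

# map_location loads the tensors straight onto the target device.
state_dict = torch.load('./checkpoints/fcn_model_5.pth', map_location=device)
model.load_state_dict(state_dict)
model = model.to(device)
model.eval()  # switch to inference mode before running forward passes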

Related

model = model.add(MaxPooling2D(pool_size=(2,2),input_shape=(48,48,1))) AttributeError: 'NoneType' object has no attribute 'add'

This error occurs in the MaxPooling stage while I train my CNN model.
Error: AttributeError: 'NoneType' object has no attribute 'current'. Please help.
model = model.add(MaxPooling2D(pool_size=(2,2), input_shape=(48,48,1)))
The question is missing some info, but I think I can see what's going on.
Assuming that model was at some point a tf.keras.models.Sequential(), I guess you did something like:
model = models.Sequential()
model = model.add(...)
model = model.add(MaxPooling2D(pool_size=(2,2),input_shape=(48,48,1)))
However, that's not quite how model.add(...) works. Instead of returning a new model, it modifies the existing model in place and returns None, which is why the next call on model fails.
Instead you should do something like:
model = models.Sequential() # create a first model
model.add(...) # add things to the existing model
model.add(MaxPooling2D(pool_size=(2,2),input_shape=(48,48,1)))
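A minimal runnable sketch of the correct pattern (the layer choices are just an illustration, not the asker's full network):
from tensorflow.keras import models, layers

model = models.Sequential()
# add() mutates the model and returns None, so never reassign its result
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(48, 48, 1)))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(7, activation='softmax'))

model.summary()  # model is still the Sequential object, not None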

TFBertMainLayer gets less accuracy compared to TFBertModel

I had a problem with saving the weights of a TFBertModel wrapped in Keras. The problem is described here in a GitHub issue and here on Stack Overflow. The solution proposed in both cases is to use
config = BertConfig.from_pretrained(transformer_model_name)
bert = TFBertMainLayer(config=config,trainable=False)
instead of
bert = TFBertModel.from_pretrained(transformer_model_name, trainable=False)
The problem is that when I change my model to the former code, the accuracy decreases by 10 percent, while the parameter counts in both cases are the same. I wonder what the reason is and how it can be prevented.
It seems like the performance regression in the code snippet that instantiates MainLayer directly occurs because the pre-trained weights are not being loaded. You can load the weights by either:
Calling TFBertModel.from_pretrained and grabbing the MainLayer from the loaded TFBertModel
Creating the MainLayer directly, then loading the weights in a similar way to from_pretrained
Why This Happens
When you call TFBertModel.from_pretrained, it uses the function TFPreTrainedModel.from_pretrained (via inheritance) which handles a few things, including downloading, caching, and loading the model weights.
class TFPreTrainedModel(tf.keras.Model, TFModelUtilsMixin, TFGenerationMixin):
    ...
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs):
        ...
        # Load model
        if pretrained_model_name_or_path is not None:
            if os.path.isfile(os.path.join(pretrained_model_name_or_path, TF2_WEIGHTS_NAME)):
                # Load from a TF 2.0 checkpoint
                archive_file = os.path.join(pretrained_model_name_or_path, TF2_WEIGHTS_NAME)
            ...
            resolved_archive_file = cached_path(
                archive_file,
                cache_dir=cache_dir,
                force_download=force_download,
                proxies=proxies,
                resume_download=resume_download,
                local_files_only=local_files_only,
            )
            ...
        model.load_weights(resolved_archive_file, by_name=True)
(If you read the actual code, a lot has been ...'ed out above).
However, when you instantiate TFBertMainLayer directly, it doesn't do any of this set up work.
@keras_serializable
class TFBertMainLayer(tf.keras.layers.Layer):
    config_class = BertConfig

    def __init__(self, config, **kwargs):
        super().__init__(**kwargs)
        self.num_hidden_layers = config.num_hidden_layers
        self.initializer_range = config.initializer_range
        self.output_attentions = config.output_attentions
        self.output_hidden_states = config.output_hidden_states
        self.return_dict = config.use_return_dict
        self.embeddings = TFBertEmbeddings(config, name="embeddings")
        self.encoder = TFBertEncoder(config, name="encoder")
        self.pooler = TFBertPooler(config, name="pooler")
        # ... rest of the class
Essentially, you need to make sure these weights are being loaded.
Solutions
(1) Using TFAutoModel.from_pretrained
You can rely on transformers.TFAutoModel.from_pretrained to load the model, then just grab the MainLayer field from the specific subclass of TFPreTrainedModel. For example, if you wanted to access a distilbert main layer, it would look like:
model = transformers.TFAutoModel.from_pretrained('distilbert-base-uncased')
assert isinstance(model, TFDistilBertModel)
main_layer = model.distilbert
You can see in modeling_tf_distilbert.html that the MainLayer is a field of the model.
This is less code and less duplication, but it has a few disadvantages. It's harder to change the pre-trained model you're going to use, because you now depend on the field name; if you change the model type, you have to change the field name as well (for example, in TFAlbertModel the MainLayer field is called albert). In addition, this doesn't seem to be the intended way to use huggingface, so it could change under your nose, and your code could break with huggingface updates.
class TFDistilBertModel(TFDistilBertPreTrainedModel):
    def __init__(self, config, *inputs, **kwargs):
        super().__init__(config, *inputs, **kwargs)
        self.distilbert = TFDistilBertMainLayer(config, name="distilbert")  # Embeddings

    @add_start_docstrings_to_callable(DISTILBERT_INPUTS_DOCSTRING)
    @add_code_sample_docstrings(
        tokenizer_class=_TOKENIZER_FOR_DOC,
        checkpoint="distilbert-base-uncased",
        output_type=TFBaseModelOutput,
        config_class=_CONFIG_FOR_DOC,
    )
    def call(self, inputs, **kwargs):
        outputs = self.distilbert(inputs, **kwargs)
        return outputs
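Applied to BERT (the case in the question), a sketch of option (1) could look like the following; it assumes the MainLayer field on TFBertModel is named bert, and the classification head is only illustrative:
import tensorflow as tf
from transformers import TFBertModel

# Load the full pre-trained model so the weights get downloaded and loaded...
pretrained = TFBertModel.from_pretrained('bert-base-uncased')
# ...then pull out the MainLayer, which now carries the pre-trained weights.
bert_main_layer = pretrained.bert
bert_main_layer.trainable = False

# Use the main layer inside your own Keras model as before.
input_ids = tf.keras.Input(shape=(128,), dtype=tf.int32, name='input_ids')
sequence_output = bert_main_layer(input_ids)[0]
cls_token = sequence_output[:, 0, :]
outputs = tf.keras.layers.Dense(2, activation='softmax')(cls_token)
model = tf.keras.Model(inputs=input_ids, outputs=outputs)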
(2) Re-implementing the weight loading logic from from_pretrained
You can do this by essentially copy/pasting the parts of from_pretrained that are relevant to loading weights. This also has some serious disadvantages: you'll be duplicating logic that can fall out of sync with the huggingface libraries. Though you could likely write it in a way that is more flexible and robust to underlying model name changes.
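A rough sketch of what that re-implementation could look like, under some assumptions: the checkpoint filename tf_model.h5 is the TF2_WEIGHTS_NAME used by transformers, hf_hub_download is used in place of cached_path, and the classification head is purely illustrative:
import tensorflow as tf
from huggingface_hub import hf_hub_download
from transformers import BertConfig, TFBertMainLayer

model_name = 'bert-base-uncased'
config = BertConfig.from_pretrained(model_name)

# The layer must be named "bert" so load_weights(..., by_name=True) can match
# it against the layer names stored in the checkpoint.
bert = TFBertMainLayer(config, name='bert', trainable=False)

input_ids = tf.keras.Input(shape=(128,), dtype=tf.int32, name='input_ids')
sequence_output = bert(input_ids)[0]
outputs = tf.keras.layers.Dense(2, activation='softmax')(sequence_output[:, 0, :])
model = tf.keras.Model(inputs=input_ids, outputs=outputs)

# Download the TF 2.0 checkpoint and load the matching weights by layer name,
# mirroring what from_pretrained does internally.
weights_path = hf_hub_download(repo_id=model_name, filename='tf_model.h5')
model.load_weights(weights_path, by_name=True)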
Conclusion
Ideally this is something that will get fixed internally by the huggingface team, either by providing a standard function to create a MainLayer, wrapping the weight loading logic into its own function that can be called, or by supporting serialization on the model class.

Why do we need state_dict = state_dict.copy()

I want to load the weights of a pre-trained model into my local model. I don't understand why state_dict = state_dict.copy() is necessary if the two networks have state_dicts with the same names.
# copy state_dict so _load_from_state_dict can modify it
metadata = getattr(state_dict, '_metadata', None)
state_dict = state_dict.copy()
if metadata is not None:
    state_dict._metadata = metadata

def load(module, prefix=''):
    local_metadata = {} if metadata is None else metadata.get(prefix[:-1], {})
    module._load_from_state_dict(
        state_dict, prefix, local_metadata, True, missing_keys, unexpected_keys, error_msgs)
    for name, child in module._modules.items():
        if child is not None:
            load(child, prefix + name + '.')

start_prefix = ''
# print("hasattr(model, 'bert')", hasattr(model, 'bert'))  # False
if not hasattr(model, 'bert') and any(s.startswith('bert.') for s in state_dict.keys()):
    start_prefix = 'bert.'
load(model, prefix=start_prefix)
Note: the above code is from Hugging Face.
state_dict = state_dict.copy()
does exactly what you tell it to do: it copies the state_dict. A state dict holds all the parameters of your model, and copying it makes the dict the loading code works on independent of the one the caller passed in, so _load_from_state_dict can modify it (as the comment says) without side effects for the caller. One should be careful whether you need a copy or a deepcopy, though: .copy() is shallow, so the parameter tensors themselves are still shared.
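A small sketch of that shallow-copy behaviour (toy tensors, not real model weights):
from collections import OrderedDict
import torch

state_dict = OrderedDict(weight=torch.ones(2, 2), bias=torch.zeros(2))

copied = state_dict.copy()       # shallow copy: new dict, same tensor objects
copied['extra'] = torch.ones(1)  # the loading code can add/remove keys freely

print('extra' in state_dict)                     # False - the caller's dict is untouched
print(copied['weight'] is state_dict['weight'])  # True  - tensors are still shared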

How do I know whether an instance is stored on GPU with PyTorch?

I've been learning PyTorch recently, and this question came up.
For example, suppose I have a net inheriting from torch.nn.Module:
class Net(torch.nn.Module):
    def __init__(self, something):
        super(Net, self).__init__()
        self.p1 = something

    def forward(self, x):
        pass

net1 = Net(123)
net1.cuda()  # Here I can't see what is changed.
Then how can I know whether net1 (and that something) is stored on the GPU?
I've read how *.cuda() works; it seems to make all the "children" run *.cuda(). I tried to see what the "children" are, and it seems the net1 above has no children.
To check a simple tensor, you can check the is_cuda attribute. For example:
x = torch.zeros(100).cuda()
y = torch.zeros(100)
print(x.is_cuda) # True
print(y.is_cuda) # False
To check a model, I think the easiest way is using the parameters() method, which returns all trainable parameters of your model:
next(model.parameters()).is_cuda
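A short sketch putting both checks together (the tiny module is just illustrative):
import torch

net = torch.nn.Linear(4, 2)          # any nn.Module
if torch.cuda.is_available():
    net.cuda()                       # moves all parameters and buffers to the GPU

first_param = next(net.parameters())
print(first_param.is_cuda)           # True if the model lives on the GPU
print(first_param.device)            # e.g. cuda:0 or cpu, more informative than is_cuda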

tf.global_variables_initializer() does not work

Hello TensorFlow users/developers,
Even though I call the initializer function, the reporter tells me that none of my variables are initialized. I created them using tf.get_variable(). Here is where my session and graph objects are created:
with tf.Graph().as_default():
    # Store all scores (each score is a loss-per-episode)
    init = tf.global_variables_initializer()
    all_scores, scores = [], []
    # Build common tensors used throughout entire session
    nn.build(seq_len)
    # Generate inference and loss models
    [loss, train_op] = nn.generate_models()
    with tf.Session() as sess:
        try:
            st = time.time()
            # Initialize all variables (Note that! not operation tensors; but variable tensors)
            print('Initializing variables...')
            sess.run(init)
            print('Training starts...')
            for e, (input_, target) in sample_generator:
                feed_dict = nn.prepare_dict(input_, target)
                # Run one step of the model. The return values are the activations
                # from the `train_op` (which is discarded) and the `loss` Op.
                x = sess.run(tf.report_uninitialized_variables(tf.global_variables()))
                print(x)
                _, score = sess.run([train_op, loss],
                                    feed_dict=feed_dict)
                all_scores.append(score)
                scores.append(score)
                # Assess your predictions against target
                if e > 0 and not (e % 100):
                    print('Episode %05d: %.6f' % (e, np.mean(scores).tolist()[0]))
                    scores.clear()
        except KeyboardInterrupt:
            print('Elapsed time: %ld' % (time.time() - st))
            pass
I've called this method millions of times before and it had always worked perfectly, but right now it is leaving me in the lurch. What do you think the cause might be? Any suggestion would really be appreciated.
P.S. I tried calling tf.local_variables_initializer() too, though the reporter told me that I don't have any local variables at all.
Thanks in advance.
Thanks for the reply.
Well, I've figured it out. I shouldn't have executed the following assignment before building my model:
init = tf.global_variables_initializer()
For anyone's information: you may think, "I'll execute and get the result of this operation called init when I run it in a Session, so it doesn't matter where I do the assignment above."
No! That is not true. TensorFlow decides which variables the op will initialize at the moment this assignment is executed, so it only covers the variables that already exist in the graph. Thus, call it after you build your entire model.
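A minimal sketch of the corrected ordering (TF1-style graph code, matching the question; the placeholder and variable shapes are just an example):
import tensorflow as tf

with tf.Graph().as_default():
    # 1. Build the whole model first so the variables exist in the graph.
    x = tf.placeholder(tf.float32, shape=[None, 10])
    w = tf.get_variable('w', shape=[10, 1])
    y = tf.matmul(x, w)

    # 2. Only now create the init op; it snapshots the variables defined so far.
    init = tf.global_variables_initializer()

    with tf.Session() as sess:
        sess.run(init)
        print(sess.run(tf.report_uninitialized_variables()))  # should print an empty list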
If it does not exist, I suspect you accidentally downgraded your TensorFlow version.
Can you try tf.initialize_all_variables?
If this does not work, can you post what version you are using?
I got the same error. However, this is my solution: just skip the separate init = tf.global_variables_initializer() assignment and use:
sess = tf.Session()
sess.run(tf.global_variables_initializer())
