I want to load the weights of a pre-trained model into my local model. I don't understand why state_dict = state_dict.copy() is necessary if the two networks have state_dicts with the same names.
# copy state_dict so _load_from_state_dict can modify it
metadata = getattr(state_dict, '_metadata', None)
state_dict = state_dict.copy()
if metadata is not None:
    state_dict._metadata = metadata

def load(module, prefix=''):
    local_metadata = {} if metadata is None else metadata.get(prefix[:-1], {})
    module._load_from_state_dict(
        state_dict, prefix, local_metadata, True, missing_keys, unexpected_keys, error_msgs)
    for name, child in module._modules.items():
        if child is not None:
            load(child, prefix + name + '.')

start_prefix = ''
# print("hasattr(model, 'bert')", hasattr(model, 'bert'))  -> prints False
if not hasattr(model, 'bert') and any(s.startswith('bert.') for s in state_dict.keys()):
    start_prefix = 'bert.'
load(model, prefix=start_prefix)
Note: the above code is from Hugging Face.
state_dict = state_dict.copy()
does exactly what you tell it to do: it makes a copy of the state_dict. The state dict holds all the parameters of your model, and copying it keeps the caller's dict independent of whatever the loading code does to its own copy. One should be careful about whether a copy or a deepcopy is needed, though!
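To make that concrete, here is a minimal sketch of my own (not the Hugging Face code): the copy is shallow, so the loading code can pop keys from its copy without touching the caller's dict, while the tensors themselves stay shared and no weights are duplicated in memory.

import torch.nn as nn

model = nn.Linear(4, 2)
original = model.state_dict()

copied = original.copy()   # shallow copy: a new dict, but the same tensor objects
copied.pop('bias')         # the loading code can mutate its copy freely

print('bias' in original)                      # True: the caller's dict is untouched
print(copied['weight'] is original['weight'])  # True: tensors are shared, not deep-copied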
Related
I'm interested in how I'd go about combining multiple DataLoaders sequentially for training. I understand I can use ConcatDataset to combine datasets first, but this does not work for my use case. I have a custom collate_fn that is passed to each dataloader, and this function depends on an attribute of the underlying Dataset. So, I'll have a set of custom DataLoaders like the following:
import functools
import torch

def custom_collate(sample, ref):
    data = clean_sample(torch.stack([x[0] for x in sample]), ref)
    labels = torch.tensor([x[1] for x in sample])
    return data, labels

class CollateLoader(torch.utils.data.DataLoader):
    def __init__(self, ref, *args, **kwargs):
        collate_fn = functools.partial(custom_collate, ref=ref)
        super().__init__(collate_fn=collate_fn, *args, **kwargs)
Where ref is a property of the custom Dataset class and is passed on initialization of a CollateLoader. Also, I know transforms can be applied in the Dataset, but in my case it must be done batch-wise.
So, how would I go about combining multiple DataLoaders? In the PyTorch-Lightning LightningDataModule, we can do something like
def train_dataloader(self):
return [data_loader_1, data_loader_2]
But this will return a list of batches, not the batches sequentially.
I ran into the same problem and found a workaround. I overrode the epoch training loop using the Loops API from PyTorch Lightning, defining a class CustomLoop which inherits from pytorch_lightning.loops.TrainingEpochLoop, and overrode the advance() method. I copy-pasted the source code from pytorch_lightning and replaced these lines with:
if not hasattr(self, 'dataloader_idx'):
    self.dataloader_idx = 0
if not isinstance(data_fetcher, DataLoaderIterDataFetcher):
    batch_idx = self.batch_idx + 1
    batch = next(data_fetcher.dataloader.loaders[self.dataloader_idx])
    self.dataloader_idx += 1
    if self.dataloader_idx == len(data_fetcher.dataloader.loaders):
        self.dataloader_idx = 0
else:
    batch_idx, batch = next(data_fetcher)
That way, instead of iterating over the CombinedLoader, I make it iterate over one dataloader at a time.
Then, to make use of this custom loop you have to replace the default loop in the Trainer:
trainer.fit_loop.replace(epoch_loop=CustomLoop)
trainer.fit(my_model)
You can return [train_dataloader, train_2_dataloader]; then at each step you get one batch per dataloader, so you can loop over them with a for and sum the losses.
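A rough sketch of that idea, assuming a classification-style LightningModule where each batch is a (data, labels) pair; the class and loader names here are placeholders, not a definitive implementation:

import torch.nn.functional as F
import pytorch_lightning as pl

class MultiLoaderModel(pl.LightningModule):
    def train_dataloader(self):
        # Lightning wraps the list in a CombinedLoader, so each training step
        # receives a list with one batch per dataloader.
        return [train_dataloader, train_2_dataloader]

    def training_step(self, batch, batch_idx):
        total_loss = 0.0
        for data, labels in batch:   # one (data, labels) batch per dataloader
            logits = self(data)
            total_loss = total_loss + F.cross_entropy(logits, labels)
        return total_loss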
This is my main code, but I don't know how to fix the problem:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = torch.load('./checkpoints/fcn_model_5.pth')  # load the model
model = model.to(device)
You are loading the checkpoint as a state dict; it is not an nn.Module object.
checkpoint = './checkpoints/fcn_model_5.pth'
model = your_model() # a torch.nn.Module object
model.load_state_dict(torch.load(checkpoint))
model = model.to(device)
The source of your problem is simply that you are loading your model as a dict instead of an nn.Module. Here is another approach, adopted from here, that you can use without the nn.Module bloat:
for k, v in model.items():
    model[k] = v.to(device)
Now you have an ordered dict with each item on the correct device.
Please note that you will still have an ordered dict instead of an nn.Module. You will not be able to run a forward pass with an ordered dict.
I had a problem with saving the weights of a TFBertModel wrapped in Keras. The problem is described here in a GitHub issue and here on Stack Overflow. The solution proposed in both cases is to use
config = BertConfig.from_pretrained(transformer_model_name)
bert = TFBertMainLayer(config=config,trainable=False)
instead of
bert = TFBertModel.from_pretrained(transformer_model_name, trainable=False)
The problem is that when I change my model to the former code, the accuracy decreases by 10 percent, while the parameter counts in both cases are the same. I wonder what the reason is and how it can be prevented.
It seems like the performance regression in the code snippet that instantiates MainLayer directly occurs because the pre-trained weights are not being loaded. You can load the weights by either:
Calling TFBertModel.from_pretrained and grabbing the MainLayer from the loaded TFBertModel
Creating the MainLayer directly, then loading the weights in a similar way to from_pretrained
Why This Happens
When you call TFBertModel.from_pretrained, it uses the function TFPreTrainedModel.from_pretrained (via inheritance) which handles a few things, including downloading, caching, and loading the model weights.
class TFPreTrainedModel(tf.keras.Model, TFModelUtilsMixin, TFGenerationMixin):
    ...

    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs):
        ...
        # Load model
        if pretrained_model_name_or_path is not None:
            if os.path.isfile(os.path.join(pretrained_model_name_or_path, TF2_WEIGHTS_NAME)):
                # Load from a TF 2.0 checkpoint
                archive_file = os.path.join(pretrained_model_name_or_path, TF2_WEIGHTS_NAME)
            ...
            resolved_archive_file = cached_path(
                archive_file,
                cache_dir=cache_dir,
                force_download=force_download,
                proxies=proxies,
                resume_download=resume_download,
                local_files_only=local_files_only,
            )
        ...
        model.load_weights(resolved_archive_file, by_name=True)
(If you read the actual code, a lot has been ...'ed out above).
However, when you instantiate TFBertMainLayer directly, it doesn't do any of this setup work.
@keras_serializable
class TFBertMainLayer(tf.keras.layers.Layer):
    config_class = BertConfig

    def __init__(self, config, **kwargs):
        super().__init__(**kwargs)
        self.num_hidden_layers = config.num_hidden_layers
        self.initializer_range = config.initializer_range
        self.output_attentions = config.output_attentions
        self.output_hidden_states = config.output_hidden_states
        self.return_dict = config.use_return_dict
        self.embeddings = TFBertEmbeddings(config, name="embeddings")
        self.encoder = TFBertEncoder(config, name="encoder")
        self.pooler = TFBertPooler(config, name="pooler")

    # ... rest of the class
Essentially, you need to make sure these weights are being loaded.
Solutions
(1) Using TFAutoModel.from_pretrained
You can rely on transformers.TFAutoModel.from_pretrained to load the model, then just grab the MainLayer field from the specific subclass of TFPreTrainedModel. For example, if you wanted to access a distilbert main layer, it would look like:
model = transformers.TFAutoModel.from_pretrained('distilbert-base-uncased')
assert isinstance(model, TFDistilBertModel)
main_layer = model.distilbert
You can see in modeling_tf_distilbert.html that the MainLayer is a field of the model.
This is less code and less duplication, but it has a few disadvantages. It's harder to change the pre-trained model you're going to use, because now you're depending on the field name; if you change the model type, you'll have to change the field name (for example, in TFAlbertModel the MainLayer field is called albert). In addition, this doesn't seem to be the intended way to use huggingface, so it could change under your nose, and your code could break with huggingface updates.
class TFDistilBertModel(TFDistilBertPreTrainedModel):
    def __init__(self, config, *inputs, **kwargs):
        super().__init__(config, *inputs, **kwargs)
        self.distilbert = TFDistilBertMainLayer(config, name="distilbert")  # Embeddings

    @add_start_docstrings_to_callable(DISTILBERT_INPUTS_DOCSTRING)
    @add_code_sample_docstrings(
        tokenizer_class=_TOKENIZER_FOR_DOC,
        checkpoint="distilbert-base-uncased",
        output_type=TFBaseModelOutput,
        config_class=_CONFIG_FOR_DOC,
    )
    def call(self, inputs, **kwargs):
        outputs = self.distilbert(inputs, **kwargs)
        return outputs
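Once you have main_layer, you can wrap it in Keras exactly as you would have wrapped TFBertMainLayer directly. A rough sketch, where the input shape and the classification head are placeholders of mine, not part of the library:

import tensorflow as tf

# Build a small classifier on top of the pre-trained main layer extracted above.
input_ids = tf.keras.layers.Input(shape=(None,), dtype=tf.int32, name="input_ids")
sequence_output = main_layer(input_ids)[0]   # last hidden states, (batch, seq, hidden)
cls_token = sequence_output[:, 0, :]         # representation of the first token
output = tf.keras.layers.Dense(2, activation="softmax")(cls_token)
classifier = tf.keras.Model(inputs=input_ids, outputs=output)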
(2) Re-implementing the weight loading logic from from_pretrained
You can do this by essentially copy/pasting the parts of from_pretrained that are relevant to loading weights. This also has some serious disadvantages: you'll be duplicating logic that can fall out of sync with the huggingface libraries. Though you could likely write it in a way that is more flexible and robust to underlying model name changes.
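A rough sketch of what that could look like, with several assumptions on my part: the tf_model.h5 filename, the use of huggingface_hub.hf_hub_download to fetch it, and the import path for TFBertMainLayer may all differ across transformers versions. The key point is giving the layer the name "bert" so that load_weights(..., by_name=True) can match it against the checkpoint, mirroring what from_pretrained does.

import tensorflow as tf
from huggingface_hub import hf_hub_download
from transformers import BertConfig
from transformers.models.bert.modeling_tf_bert import TFBertMainLayer

model_name = "bert-base-uncased"
config = BertConfig.from_pretrained(model_name)

# Wrap the main layer in a Keras model; the name "bert" matches the layer name
# used inside TFBertModel, which is what the by_name matching relies on.
input_ids = tf.keras.layers.Input(shape=(None,), dtype=tf.int32, name="input_ids")
bert = TFBertMainLayer(config=config, name="bert", trainable=False)
sequence_output = bert(input_ids)[0]
wrapper = tf.keras.Model(inputs=input_ids, outputs=sequence_output)

# Fetch the same TF checkpoint that from_pretrained would download and load it
# by layer name, just like model.load_weights(resolved_archive_file, by_name=True).
weights_path = hf_hub_download(repo_id=model_name, filename="tf_model.h5")
wrapper.load_weights(weights_path, by_name=True)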
Conclusion
Ideally this is something that will get fixed internally by the huggingface team, either by providing a standard function to create a MainLayer, by wrapping the weight-loading logic into its own function that can be called, or by supporting serialization on the model class.
state_dict(destination=None, prefix='', keep_vars=False)
What does changing keep_vars to True do?
In PyTorch >=0.4, it has no use.
keep_vars was added in the commit: Add keep_vars parameter to state_dict stating that
When keep_vars is true, it returns a Variable for each parameter
(rather than a Tensor).
In the state_dict function, _save_to_state_dict is called internally, which contains the following code:
for name, param in self._parameters.items():
    if param is not None:
        destination[prefix + name] = param if keep_vars else param.data
for name, buf in self._buffers.items():
    if buf is not None:
        destination[prefix + name] = buf if keep_vars else buf.data
The portion param if keep_vars else param.data made a difference prior to PyTorch 0.4.0, when Variable and Tensor were separate, but now that they are merged, keep_vars is probably present only for backward compatibility. Check: Is .data still useful in pytorch?
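A small sketch of what that means in practice on a recent PyTorch: with keep_vars=True you get back the parameters themselves rather than detached tensors, which is about the only visible difference left after the Variable/Tensor merge.

import torch.nn as nn

layer = nn.Linear(3, 1)

plain = layer.state_dict()              # keep_vars=False (the default)
kept = layer.state_dict(keep_vars=True)

print(plain['weight'].requires_grad)    # False: a detached view of the data
print(kept['weight'].requires_grad)     # True
print(kept['weight'] is layer.weight)   # True: the actual Parameter object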
I am reading an XML file to transfer its attributes to another XML file from the same source. However, something is wrong in the loop, as I am overwriting the first element of the dictionary with the newest one.
The XML looks like this:
<sxml locale="en-US" version="1.0" segtype="sentence" options="eol=unix;" createdby="AP 2.24.1" datatype="regexp" targetlocale="DE">
<family blockId="1" mat="33" type="freeOS" seccion="2" datatype="html" subtype="BSD"><section sectionId="1">
<product>FreeBSD</product>
</section></family>
<family blockId="2" mat="32" type="privative" seccion="3" datatype="html" subtype="commercial"><section sectionId="1">
<product>Windows</product><ws> </ws>
</section><section sectionId="2">
<product>Sistema operativo.</product>
</section></family>
</sxml>
And I want to get the attributes: "mat", "seccion", "type" and "subtype".
My code is:
import logging
from lxml import etree as et
from pathlib import Path

def add_attributes(files_path_str, proc_path_str, attributes):
    """
    Adds the attributes to the frequent files.
    """
    product_path = Path(files_path_str)
    proc_files = Path(proc_path_str).glob('*.sxml')
    dict_notes_src = dict()
    list_src_sxml_files = product_path.glob('**/*.sxml')
    for sxml_file in list_src_sxml_files:
        xml_parser = et.parse(str(sxml_file))
        root_xml = xml_parser.getroot()
        print(sxml_file)
        dict_notes_src_temp = __generate_notes_product_dict(root_xml, attributes)
        dict_notes_src = {**dict_notes_src, **dict_notes_src_temp}
    # This is the part where I copy the attributes to the processed files.
    # The bug is not in this part; it is somewhere in the generation of the dictionary.
    #for proc_file in proc_files:
    #    xml_parser = et.parse(str(proc_file))
    #    root_proc_xml = xml_parser.getroot()
    #    tree = __add_notes_to_unk(root_proc_xml, dict_notes_src)
    #    tree.write(str(proc_file), encoding='utf-8', pretty_print=True)

def __generate_notes_product_dict(root_xml, attributes):
    """
    Internal method to process the xml file to get a dictionary product-note.
    """
    translatable_elements = root_xml.xpath('family')
    attrib_product = dict()
    dict_values = dict()
    for element in translatable_elements:
        product_elements = element.xpath('./section/product')
        list_attrib_values = []
        print(element.tag, element.attrib)
        #satt_note = element.attrib['satt_note']
        # List comprehension fails if there is a segment without an expected attribute.
        #list_attrib_values = [element.attrib[attribute] for attribute in attributes]
        # Checks if there are attributes that do not exist in the Full WordFast file.
        # If that is the case, ignores them and adds a None value.
        for attribute in attributes:
            try:
                list_attrib_values.append(element.attrib[attribute])
                print('Reading the attributes. {} : {}'.format(attribute, element.attrib[attribute]))
                logging.debug('__generate_notes_product_dict: Add values of the attributes {}: {}'.format(
                    attribute, element.attrib[attribute]))
            except KeyError:
                list_attrib_values.append(None)
        if len(product_elements) > 0:
            for product_element in product_elements:
                #product_element = element.xpath('./segment/product')[0]
                product_str = str(et.tostring(product_element), 'utf-8')
                # Create the string of the content of the product element.
                product_str = ' '.join(product_str.split())
                if list_attrib_values is not None:
                    if product_str not in attrib_product:
                        # Generate a dictionary with the product text as key.
                        #attrib_product[product_str] = satt_note
                        print(product_str)
                        for attribute in attributes:
                            try:
                                print(element.tag, element.attrib)
                                dict_values[attribute] = element.attrib[attribute]
                            except KeyError:
                                dict_values[attribute] = None
                        #for attribute, value in zip(attributes, list_attrib_values):
                        #    if value is not None:
                        #        print('Adding the values {}: {}'.format(attribute, value))
                        #        dict_values[attribute] = value
                        attrib_product[product_str] = dict_values
    return attrib_product

add_attributes(folder_where_is_stored_xml, folder_where_save_xml, ["mat", "seccion", "type", "subtype"])
It returns a dictionary in which all the products have the attributes of the last family.
I've been debugging the code, and it looks like when I run attrib_product[product_str] = dict_values, it loops through all the values of dict_values and stores only the last one.
Any ideas about what I am doing wrong? I am not able to see why it is happening.
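For reference, here is a minimal, self-contained illustration of the symptom (with simplified, made-up names): when the same dict object is assigned under every key, later mutations of that object show up under all keys, which matches the values of the last family appearing everywhere.

attrib_product = {}
dict_values = {}                               # created once, before the loop

for family_type in ('freeOS', 'privative'):
    dict_values['type'] = family_type          # mutates the single shared dict
    attrib_product[family_type] = dict_values  # every key points to that same object

print(attrib_product)
# {'freeOS': {'type': 'privative'}, 'privative': {'type': 'privative'}}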