Has anyone done training on custom data using AllenNLP for coreference resolution?

I'm trying to train AllenNLP on custom data, instead of using the pre-trained model, for coreference resolution. The instructions are here, but they are very vague and I am not sure how to proceed; in particular, I don't know how to modify the JSONNET file to point at my train, test, and dev CoNLL-2012 files. Has anyone accomplished this before? Thank you very much.

You can specify the path to your data in these lines in the jsonnet config:
"train_data_path": std.extVar("COREF_TRAIN_DATA_PATH"),
"validation_data_path": std.extVar("COREF_DEV_DATA_PATH"),
"test_data_path": std.extVar("COREF_TEST_DATA_PATH"),
You can either update the config to use your paths explicitly, or set these environment variables before running allennlp train.
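For example, here is a minimal sketch that sets the environment variables from Python and then launches training. The data paths, config filename, and output directory are placeholders; adjust them to however your CoNLL-2012 data and config are laid out.

import os
import subprocess

# Placeholder paths to your CoNLL-2012 splits.
os.environ["COREF_TRAIN_DATA_PATH"] = "data/train.english.v4_gold_conll"
os.environ["COREF_DEV_DATA_PATH"] = "data/dev.english.v4_gold_conll"
os.environ["COREF_TEST_DATA_PATH"] = "data/test.english.v4_gold_conll"

# allennlp passes environment variables to jsonnet, where std.extVar(...)
# picks them up when the config is evaluated.
subprocess.run(
    ["allennlp", "train", "coref.jsonnet", "-s", "output/coref_model"],
    check=True,
)

Equivalently, you can export the three variables in your shell and call allennlp train directly.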

Related

How can I use a model I trained to make predictions in the future without retraining whenever I want to use it

I recently finished training a linear regression model, but I don't know how to save it so that I can use it later to make predictions without having to retrain it every time.
Do I save the .py file and call it whenever I need it, create a class, or something else?
I just want to know how I can save a trained model so I can use it in the future.
Depending on how you fit the linear regression, you should be able to obtain the equation of the regression, as well as the values of the coefficients, most likely by inspecting the workspace.
If you explain which module, function, or code you use to do the regression, it will be easier to give a specific solution.
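For instance, if the model happens to be a scikit-learn LinearRegression (that is an assumption; adapt this to whatever library you actually used), a minimal sketch of saving and reloading it with joblib would be:

import numpy as np
import joblib
from sklearn.linear_model import LinearRegression

# Toy regression as a stand-in for your trained model.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
model = LinearRegression().fit(X, y)

# Persist the fitted estimator to disk...
joblib.dump(model, "linear_regression.joblib")

# ...and later, in another script or session, load it back and predict.
restored = joblib.load("linear_regression.joblib")
print(restored.predict(np.array([[5.0]])))  # approximately [10.]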
Furthermore, you can probably use the dill package:
https://pypi.org/project/dill/
I saw the solution here:
https://askdatascience.com/441/anyone-knows-workspace-jupyter-python-variables-functions
The steps proposed for using dill are:
Install dill. If you use conda, the command is conda install -c anaconda dill
To save workspace using dill:
import dill
dill.dump_session('notebook_session.db')
To restore the session:
import dill
dill.load_session('notebook_session.db')
I saw the same package discussed here: How to save all the variables in the current python session?
and I tested it using a model created with the interpretML package, and it worked for me.
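If you only need to persist the trained model itself, rather than the whole session, a smaller sketch (again assuming dill is installed) is to dump just that one object:

import dill

# Stand-in for your trained model: dill can serialize objects that plain
# pickle struggles with, such as lambdas or interactively defined classes.
model = lambda x: 2.0 * x + 1.0

# Persist only that one object instead of the whole workspace.
with open("model.pkl", "wb") as f:
    dill.dump(model, f)

# Later (possibly in a fresh interpreter), load it back and use it.
with open("model.pkl", "rb") as f:
    restored_model = dill.load(f)

print(restored_model(3.0))  # 7.0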

How to get the inference compute graph of a PyTorch model?

I want to hand-write a framework to perform inference of a given neural network. The network is quite complicated, so to make sure my implementation is correct, I need to know exactly how the inference process is executed on the device.
I tried to use torchviz to visualize the network, but what I got seems to be the back-propagation compute graph, which is really hard to understand.
Then I tried to convert the PyTorch model to ONNX format, following the instructions (link), but when I tried to visualize it, it seemed that the original layers of the model had been separated into very small operators.
I just want to get a result like this
How can I get this? Thanks!
Have you tried saving the model with torch.save (https://pytorch.org/tutorials/beginner/saving_loading_models.html) and opening it with Netron? The last view you showed is from the Netron app.
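A minimal sketch of that suggestion (using a torchvision model as a stand-in for your network; replace it with your own):

import torch
import torchvision

# Stand-in for your own network.
model = torchvision.models.resnet18(weights=None)
model.eval()

# Save the full model object (not just the state_dict) so a viewer such
# as Netron has the module structure to display.
torch.save(model, "model.pt")
# Then open model.pt in the Netron app to inspect the graph.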
You can also try the torchview package, which provides several features that are especially useful for large models. For instance, you can set the display depth (the depth in the nested hierarchy of modules).
It is also based on the forward pass.
github repo
Disclaimer: I am the author of the package.
Note: the tool accepts a PyTorch model as input.
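A minimal usage sketch (assuming torchview and graphviz are installed, and again using a torchvision model as a placeholder for yours):

import torchvision
from torchview import draw_graph

# Placeholder model; replace with your own network.
model = torchvision.models.resnet18(weights=None)

# Trace the forward pass and build a visual graph; `depth` controls how
# deeply nested submodules are expanded in the drawing.
model_graph = draw_graph(model, input_size=(1, 3, 224, 224), depth=3)

# In a notebook, model_graph.visual_graph renders the graph; it can also
# be saved to a file.
model_graph.visual_graph.render("resnet18_graph", format="png")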

Delete and Reinitialize pretrained BERT weights / parameters

I tried to fine-tune BERT for a downstream classification task.
Now, when I load the model again, I run into the following warning:
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight']
This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Screenshot: https://i.stack.imgur.com/YJZVc.png
I already deleted and reinstalled transformers==4.6.0, but nothing helped.
I thought passing force_download=True might bring the original weights back, but that did not help either.
Should I just continue and ignore the warning? Is there a way to delete the model checkpoints so that when the model is downloaded again the weights are back to the original?
Thanks in advance!
Best,
Alex
As long as you're fine-tuning a model for a downstream task, this warning can be ignored. The idea is that the pre-training head weights (the cls.* parameters listed in the warning) aren't going to be useful for downstream tasks and the model needs to be fine-tuned.
Hugging Face skips those weights because you're loading bert-base-uncased, which is a BertForPreTraining checkpoint, into a plain BertModel. The warning is there to make sure you understand the difference between using the pretrained model directly and fine-tuning it for a different task.
On that note, if you plan on working on a classification task, I'd recommend using their BertForSequenceClassification class instead.
TL;DR: you can ignore it as long as you're fine-tuning.
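A minimal sketch of that suggestion (assuming transformers with a PyTorch backend is installed; num_labels=2 is just an example):

from transformers import BertForSequenceClassification, BertTokenizer

# Load the pretrained encoder with a fresh classification head on top;
# the head weights are newly initialized and are meant to be fine-tuned.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

inputs = tokenizer("This is a test sentence.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # torch.Size([1, 2])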
Hi, thanks for your answer! I was not very specific in the description. I first fine-tuned BERT for a downstream task, and afterwards, in a different notebook, I just wanted the usual pretrained BERT to work with its embeddings.
I had correlated things that were not related at all. I thought that by fine-tuning the BERT parameters on the downstream task I had changed the parameters for all my 'bert_base_uncased' models, and that this was why I got the warning, even when I just wanted the usual embeddings from the standard pretrained BERT.
I have kind of "solved" the problem, or at least I found a workaround:
One conda environment for the downstream classification task: conda install -c conda-forge transformers
One conda environment for just getting the embeddings: conda install -c conda-forge/label/cf202003 transformers
Maybe this is an Apple/Mac-specific thing; I don't know why I run into this problem but nobody else does ^^
Anyway thanks for your answer!
Best,
Alex

Using AllenNLP Interpret with a HuggingFace model

I would like to use AllenNLP Interpret (code + demo) with a PyTorch classification model trained with HuggingFace (ELECTRA base discriminator). Yet it is not obvious to me how I can convert my model and use it in a local AllenNLP demo server.
How should I proceed?
Thanks in advance
If your task is binary classification, you can look at the BoolQ example in https://github.com/allenai/allennlp-models/blob/main/training_config/classification/boolq_roberta.jsonnet. You can change that configuration to use a different model (such as Electra).
We also just put some new documentation out for the Interpret functionality: https://guide.allennlp.org/interpret
To give you a more specific answer, I'll need to know some more details, like what the task is you're trying to solve, how you trained the original model, etc.

Porting pre-trained Keras models and running them on the IPU

I am trying to port two pre-trained Keras models to the IPU machine. I managed to load and run them using IPUStrategy.scope, but I don't know if I am doing it the right way. I have my pre-trained models in .h5 file format.
I load them this way:
import tensorflow as tf

def first_model():
    model = tf.keras.models.load_model("./model1.h5")
    return model
After searching your ipu.keras.models.py file I couldn't find any load method for my pre-trained models, which is why I used tf.keras.models.load_model().
Then I use this code to run it:
from tensorflow.python import ipu

cfg = ipu.utils.create_ipu_config()
cfg = ipu.utils.auto_select_ipus(cfg, 1)
ipu.utils.configure_ipu_system(cfg)
ipu.utils.move_variable_initialization_to_cpu()

strategy = ipu.ipu_strategy.IPUStrategy()
with strategy.scope():
    model = first_model()
    print('compile attempt\n')
    model.compile("sgd", "categorical_crossentropy", metrics=["accuracy"])
    print('compilation completed\n')
    print('running attempt\n')
    res = model.predict(input_img)[0]  # input_img is defined elsewhere
    print('run completed\n')
You can see the output here: link
So I have some difficulty understanding how, and whether, the system is working properly.
Basically, model.compile won't compile my model, but when I use model.predict the system first compiles and then runs. Why is that happening? Is there another way to run pre-trained Keras models on an IPU chip?
Another question I have is whether it's possible to load a pre-trained Keras model inside an ipu.keras model and then use model.fit/evaluate to further train and evaluate it, and then save it for future use.
One last question I have is about the compilation part of the graph. Is there a way to avoid recompilation of the graph every time I use model.predict() in a different strategy.scope()?
I use the TensorFlow 2.1.2 wheel.
Thank you for your time
To add some context, the Graphcore TensorFlow wheel includes a port of Keras for the IPU, available as tensorflow.python.ipu.keras. You can access the API documentation for IPU Keras at this link. This module contains IPU-specific optimised replacements for the TensorFlow Keras classes Model and Sequential, plus more high-performance, multi-IPU classes, e.g. PipelineModel and PipelineSequential.
As for your specific issue, you are right that there are no IPU-specific ways to load pre-trained Keras models at present. I would encourage you, as you appear to have access to IPUs, to reach out to Graphcore Support. When doing so, please attach your pre-trained Keras model model1.h5 and a self-contained reproducer of your code.
Switching to the recompilation question: using an executable cache prevents recompilation. You can set it up with the environment variable TF_POPLAR_FLAGS='--executable_cache_path=./cache' (see the sketch after the resources below). I'd also recommend taking a look at the following resources:
This tutorial gathers several considerations around recompilation and how to avoid it when using TensorFlow 2 on the IPU.
The Graphcore TensorFlow documentation here explains how to use the pre-compile mode on the IPU.
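As a minimal sketch, the flag can be exported in the shell before launching the script, or set from Python before the IPU system is configured (the cache directory name is just an example):

import os

# Point the Poplar executable cache at a local directory so compiled
# graphs can be reused across runs instead of being recompiled.
os.environ["TF_POPLAR_FLAGS"] = "--executable_cache_path=./cache"

# ...then configure the IPU system and build/run the model as before.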
