How to train Spacy3 project with FP16 mixed precision

The goal is to run python -m spacy train with FP16 mixed precision to enable the use of large transformers (roberta-large, albert-large, etc.) in limited VRAM (RTX 2080ti 11 GB).
The new Spacy3 project.yml approach to training directly uses Huggingface-transformers models loaded via Spacy-transformers v1.0. Huggingface models can be run with mixed precision just by adding the --fp16 flag (as described here).
The spacy config was generated using python -m spacy init config --lang en --pipeline ner --optimize efficiency --gpu -F default.cfg, and checked to be complete by python -m spacy init fill-config default.cfg config.cfg --diff. Yet no FP16 / mixed-precision is to be found.
To reproduce
Use the spaCy Project: Named Entity Recognition (WikiNER) with changed init-config in project.yml to use a GPU and a transformer (roberta-base by default):
commands:
  - name: init-config
    help: "Generate a transformer English NER config"
    script:
      - "python -m spacy init config --lang en --pipeline ner --gpu -F --optimize efficiency -C configs/${vars.config}.cfg"
What was tested
Added --fp16 to python -m spacy project run
Added --fp16 to python -m spacy train
Added fp16 = true to default.cfg in various sections ([components.transformer], [components.transformer.model], [training], [initialize])
The reasoning: this is how Hugging Face transformers are run in FP16:
from transformers import TrainingArguments
# output_dir is required; fp16=True enables mixed-precision training
training_args = TrainingArguments(output_dir="output", fp16=True)
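For completeness: at the time of this question spaCy exposed no FP16 switch, but later spacy-transformers releases (1.1+) added a mixed_precision option to the transformer architecture in the config. A sketch of the relevant fragment, assuming spacy-transformers >= 1.1 (which registers spacy-transformers.TransformerModel.v3):

```ini
[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v3"
name = "roberta-base"
mixed_precision = true
```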
SW stack specifics
- spacy 3.0.3
- spacy-transformers 1.0.1
- transformers 4.2.2
- torch 1.6.0+cu101

Related

Spacy ValueError: [E002] Can't find factory for 'relation_extractor' for language English (en)

I want to train a "relation extractor" component as in this tutorial. I have 3 .spacy files (train.spacy, dev.spacy, test.spacy).
I run:
python3 -m spacy init fill-config config.cfg config.cfg
followed by
python3 -m spacy train --output ./model config.cfg --paths.train train.spacy --paths.dev dev.spacy
Output:
ValueError: [E002] Can't find factory for 'relation_extractor' for language English (en). This usually happens when spaCy calls `nlp.create_pipe` with a custom component name that's not registered on the current language class. If you're using a Transformer, make sure to install 'spacy-transformers'. If you're using a custom component, make sure you've added the decorator `@Language.component` (for function components) or `@Language.factory` (for class components).
Available factories: attribute_ruler, tok2vec, merge_noun_chunks, merge_entities, merge_subtokens, token_splitter, doc_cleaner, parser, beam_parser, lemmatizer, trainable_lemmatizer, entity_linker, ner, beam_ner, entity_ruler, tagger, morphologizer, senter, sentencizer, textcat, spancat, future_entity_ruler, span_ruler, textcat_multilabel, en.lemmatizer
I have tried the two config files here but the output is the same.
To enable Transformers I have installed spacy-transformers downloaded en_core_web_trf via
python3 -m spacy download en_core_web_trf
A similar issue was mentioned on GitHub, but that solution applies to a different context. Somebody also raised the same issue on GitHub with no solution, and it was not solved here either.
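For custom components like relation_extractor, the factory has to be registered in code that spaCy actually imports; otherwise it is not in the available factories list. A minimal sketch of the registration (the component body here is a do-nothing placeholder, not the tutorial's implementation):

```python
# rel_pipe.py -- hypothetical module registering the missing factory.
import spacy
from spacy.language import Language

@Language.factory("relation_extractor")
def create_relation_extractor(nlp, name):
    def relation_extractor(doc):
        # Placeholder: a real component would predict relations and
        # store them on the doc here.
        return doc
    return relation_extractor

nlp = spacy.blank("en")
nlp.add_pipe("relation_extractor")
print(nlp.pipe_names)  # ['relation_extractor']
```

At training time the registration has to run before the config is resolved, which is what the --code flag is for, e.g. python3 -m spacy train config.cfg --code rel_pipe.py.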

How to convert the tensor model into an onnx file

I have a TensorRT engine file, built on a Jetson NX2, but the ONNX file is missing. How can I convert the TensorRT engine to an ONNX file?
e.g.: a.engine file -> a.onnx
Please give me a suggestion, thanks. Everything I can find via search engines covers the opposite direction, from ONNX to a TensorRT engine.
You will need Python 3.7-3.10.
Install tf2onnx using pip:
pip install -U tf2onnx
Then use the following command. You will need to provide the following:
the path to your TensorFlow model (where the model is in saved model format)
a name for the ONNX output file
python -m tf2onnx.convert --saved-model tensorflow-model-path --output model.onnx
The above command uses a default of 13 for the ONNX opset. If you need a newer opset, or want to limit your model to an older opset, you can provide the --opset argument to the command.
python -m tf2onnx.convert --saved-model tensorflow-model-path --opset 17 --output model.onnx
For checkpoint format:
python -m tf2onnx.convert --checkpoint tensorflow-model-meta-file-path --output model.onnx --inputs input0:0,input1:0 --outputs output0:0
See the official tf2onnx repository to learn more: https://github.com/onnx/tensorflow-onnx

Unable to train model using GPU on M1 Mac

How to reproduce the behaviour
Install spaCy for Apple, select the model training option, and follow the on-screen instructions.
Generate the config files for model training.
Declare your training and testing corpora in the base_config file, then auto-fill it to generate the final config file:
python -m spacy init fill-config base_config.cfg config.cfg
python -m spacy train config/config.cfg -g 0 --output trf_model-2
Your Environment
Operating System: MacOS 12.4
Python Version Used: 3.10
spaCy Version Used: 3.4.0
Environment Information: -
spaCy version: 3.4.0
Platform: macOS-12.4-arm64-arm-64bit
Python version: 3.10.5
Pipelines: en_core_web_trf (3.4.0)
While trying to train the model using the GPU on an M1 Mac, I got this error:
RuntimeError: invalid gradient at index 0 - expected device cpu but got mps:0
GPU support for the Apple Silicon M1 processor is not in spaCy yet; we have to wait until the spaCy team provides proper support for Apple Silicon hardware.
As you noted, there's no GPU support for training on Apple Silicon at this time. However, spaCy can leverage some accelerations on Apple Silicon if the thinc-apple-ops module is installed. You can specify it as an option when installing spaCy.
pip install spacy[apple]

After installing Tensorflow 2.0 in a python 3.7.1 env, do I need to install Keras, or does Keras come bundled with TF2.0?

I need to use TensorFlow 2.0 (TF2.0) and Keras, but I don't know if it's necessary to install both separately or just TF2.0 (assuming TF2.0 has Keras bundled inside it). If I only need to install TF2.0, will installing it in a Python 3.7.1 environment be acceptable?
This is for Ubuntu 16.04 64 bit.
In Tensorflow 2.0 there is strong integration between TensorFlow and the Keras API specification (TF ships its own Keras implementation, that respects the Keras standard), therefore you don't have to install Keras separately since Keras already comes with TF in the tf.keras package.
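A quick way to confirm this, assuming TensorFlow 2.x is installed:

```python
import tensorflow as tf

# Keras ships inside TensorFlow as tf.keras; no separate install needed.
print(tf.__version__)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
print(model.count_params())  # 4 kernel weights + 1 bias = 5
```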

Unable to load 'en' from spacy in jupyter notebook

I run the following lines of code in a jupyter notebook:
import spacy
nlp = spacy.load('en')
And get following error:
Warning: no model found for 'en_default'
Only loading the 'en' tokenizer.
I am using python 3.5.3, spacy 1.9.0, and jupyter notebook 5.0.0.
I downloaded spacy using conda install spacy and python3 spacy install en.
I am able to import spacy and load 'en' from my terminal but not from a jupyter notebook.
Based on the answer in your comments, it seems fairly clear that the two Python interpreters for Jupyter and your system Python are not the same, and therefore likely do not have shared libraries between them.
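You can confirm the mismatch from inside the notebook; this stdlib-only check prints the interpreter the kernel is actually running:

```python
import sys

# Compare this path with the output of `which python` in your terminal;
# if they differ, packages installed in one interpreter are invisible
# to the other.
print(sys.executable)
print(sys.version_info[:2])
```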
I would recommend re-running the installation, or specifically installing the en model into the correct spaCy environment. Replace the path below with the full path to your environment's Python, if it differs.
//anaconda/envs/capstone/bin/python -m spacy download en
That should be enough. Let me know if there are any issues.
You can also download en language model in the jupyter notebook:
import sys
!{sys.executable} -m spacy download en
