I have created a custom NER using spaCy and I want to train it with additional data; what do I change in the config.cfg file?

I have created a spaCy NER model for named entity recognition, with tok2vec and ner as the components in the pipeline. Now I want to add some more data to it, so I am using a model-best directory from which I can load my trained model for predictions. If I follow the documentation without changing anything in the config.cfg file, the newly created model-best has no information about its previously trained data.
! python -m spacy convert one.json ./ -t spacy
! python -m spacy init fill-config base_config.cfg config.cfg
! python -m spacy train config.cfg --output ./ --paths.train ./one.spacy --paths.dev ./one.spacy
After running them, two folders were created (model-best and model-last).
Now, to train it with new data, I tried this:
import spacy
from spacy.tokens import DocBin
from tqdm import tqdm
import json
nlp = spacy.load('model-best')
f = open('two.json')
TRAIN_DATA = json.load(f)
db = DocBin()
for text, annot in tqdm(TRAIN_DATA['annotations']):
    doc = nlp.make_doc(text)
    ents = []
    for start, end, label in annot["entities"]:
        span = doc.char_span(start, end, label=label, alignment_mode="contract")
        if span is None:
            print("Skipping entity")
        else:
            ents.append(span)
    doc.ents = ents
    db.add(doc)
db.to_disk("./training_data.spacy")
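Before re-running training, a minimal sketch for sanity-checking the DocBin written above, assuming training_data.spacy is the file created by that code:

# Sketch: load the DocBin back and confirm the entity spans were stored.
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")                      # a blank pipeline is enough to deserialize
db = DocBin().from_disk("./training_data.spacy")
for doc in db.get_docs(nlp.vocab):
    print(doc.text[:60], [(ent.text, ent.label_) for ent in doc.ents])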
! python -m spacy init fill-config base_config.cfg config.cfg
! python -m spacy train config.cfg --output ./ --paths.train ./training_data.spacy --paths.dev ./training_data.spacy
After running them, my model-best folder was replaced with a new one, and the model can only recognise the new data now.
What changes should I make in my config.cfg in order to train it properly, so that it remembers both the old data and the new data?

To train on top of an existing model, you can define the source component in your base_config.cfg:
[components.ner]
source = "<path_to_model-best>"
component = "ner"
This information is available in the spaCy documentation here:
https://spacy.io/usage/processing-pipelines#sourced-components
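For a pipeline with both tok2vec and ner, the relevant sections could look roughly like the sketch below (the path is a placeholder; the keys follow the sourced-components page linked above):

[nlp]
lang = "en"
pipeline = ["tok2vec","ner"]

[components.tok2vec]
source = "<path_to_model-best>"

[components.ner]
source = "<path_to_model-best>"
component = "ner"

Note that sourcing only carries over the existing weights; to keep the model performing on the old entities you generally also need to mix examples of the original data into the new training set, otherwise fine-tuning on the new data alone can make it forget them.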

Related

Convert model: PyTorch -> ONNX -> NCNN

Trying to run this example with my own custom-trained YOLOv8 model. I have a model best.pt and I am following the ncnn exporting instructions.
I export to ONNX format:
pip install ultralytics
yolo mode=export model={HOME}/best.pt format=onnx
Then I simplify the ONNX model:
pip install onnxsim
pip install onnxruntime
python -m onnxsim {HOME}/best.onnx {HOME}/best-sim.onnx
At first this gave
Error: [1] 67272 segmentation fault  python -m onnxsim best.onnx best-sim.onnx
but I found a solution for that.
ONNX to ncnn:
onnx2ncnn best-sim.onnx best.param best.bin
I changed the model path and CLASS_NAME in the example (file yoloV8.cpp):
yolo.load_param("best.param");
yolo.load_model("best.bin");
const char *class_names[] = {"bus", "car", "truck"};
I also changed the layer name in yoloV8.cpp, after looking at the model structure with https://netron.app/:
ex.extract("output0", out);
...
std::vector<int> strides = {32};
When I run a test image, I get "house", even though the model works during training.

Spacy ValueError: [E002] Can't find factory for 'relation_extractor' for language English (en)

I want to train a "relation extractor" component as in this tutorial. I have 3 .spacy files (train.spacy, dev.spacy, test.spacy).
I run:
python3 -m spacy init fill-config config.cfg config.cfg
followed by
python3 -m spacy train --output ./model config.cfg --paths.train train.spacy --paths.dev dev.spacy
Output:
ValueError: [E002] Can't find factory for 'relation_extractor' for language English (en). This usually happens when spaCy calls `nlp.create_pipe` with a custom component name that's not registered on the current language class. If you're using a Transformer, make sure to install 'spacy-transformers'. If you're using a custom component, make sure you've added the decorator `@Language.component` (for function components) or `@Language.factory` (for class components).
Available factories: attribute_ruler, tok2vec, merge_noun_chunks, merge_entities, merge_subtokens, token_splitter, doc_cleaner, parser, beam_parser, lemmatizer, trainable_lemmatizer, entity_linker, ner, beam_ner, entity_ruler, tagger, morphologizer, senter, sentencizer, textcat, spancat, future_entity_ruler, span_ruler, textcat_multilabel, en.lemmatizer
I have tried the two config files here but the output is the same.
To enable Transformers I have installed spacy-transformers and downloaded en_core_web_trf via
python3 -m spacy download en_core_web_trf
A similar issue was mentioned on GitHub, but that solution is for another context. Similarly, somebody raised the same issue on GitHub with no solution. Here, too, it was not solved.
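The error usually means spaCy cannot see the code that registers the relation_extractor factory, so that code has to be passed to the CLI with --code. A hypothetical minimal sketch of such a registration (the tutorial's real component is the trainable pipe defined in its rel_pipe.py and rel_model.py):

# rel_components.py - hypothetical placeholder, only to illustrate registration;
# replace the body with the tutorial's trainable relation extractor.
from spacy.language import Language

@Language.factory("relation_extractor")
def create_relation_extractor(nlp, name):
    def relation_extractor(doc):
        return doc                            # no-op placeholder component
    return relation_extractor

With that file present, the training call would pass it explicitly, e.g. python3 -m spacy train config.cfg --code rel_components.py --output ./model --paths.train train.spacy --paths.dev dev.spacy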

How to convert the tensor model into an onnx file

I have a TensorRT engine file and a builder on a Jetson NX2, but my ONNX file is missing. How can I convert the tensor model back to an ONNX file?
e.g. a.engine file -> a.onnx
Please give me a suggestion, thanks.
All I could find with a search engine covers the other direction, from ONNX to a tensor model.
You will need Python 3.7-3.10.
Install tf2onnx using pip:
pip install -U tf2onnx
Then use the following command. You will need to provide the following:
the path to your TensorFlow model (where the model is in saved model format)
a name for the ONNX output file
python -m tf2onnx.convert --saved-model tensorflow-model-path --output model.onnx
The above command uses a default of 13 for the ONNX opset. If you need a newer opset, or want to limit your model to use an older opset then you can provide the --opset argument to the command.
python -m tf2onnx.convert --saved-model tensorflow-model-path --opset 17 --output model.onnx
For checkpoint format:
python -m tf2onnx.convert --checkpoint tensorflow-model-meta-file-path --output model.onnx --inputs input0:0,input1:0 --outputs output0:0
Follow the official tf2onnx repository to learn more: https://github.com/onnx/tensorflow-onnx
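After the conversion, a quick sanity check of the exported file can look like this sketch (assuming the output is named model.onnx and that onnx and onnxruntime are installed):

# Sketch: confirm model.onnx is well-formed and inspect its inputs/outputs.
import onnx
import onnxruntime as ort

model = onnx.load("model.onnx")
onnx.checker.check_model(model)              # raises if the graph is malformed
sess = ort.InferenceSession("model.onnx")
print([(i.name, i.shape) for i in sess.get_inputs()])
print([(o.name, o.shape) for o in sess.get_outputs()])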

How to train YOLOv7 with 1 class from the COCO dataset

I am working with YOLOv7's train.py file.
I want to use the COCO dataset but take only 1 class for training: person. COCO has 80 classes.
Can I control this from train.py?
train.py has the option
parser.add_argument('--single-cls', action='store_true', help='train multi-class data as single-class')
but I have no idea how to use this option.
Also, the training log says:
tensorboard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
but the page shows nothing.
If you want to use all 80 classes, you have to use detect.py. With train.py you train on your custom dataset to detect your custom objects.
TensorBoard: first, you have to install it with
$ pip install tensorboard
If you have trained with train.py, you can run TensorBoard with this command:
$ tensorboard --logdir=runs/train
and you can see the results at http://localhost:6006/
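As for training only on the person class: --single-cls just collapses all labels into one class, it does not select a subset of COCO. One common approach is to filter the YOLO-format label files beforehand; a rough sketch, assuming COCO's person class has id 0 and using placeholder paths:

# Rough sketch: keep only 'person' boxes (class id 0) in YOLO-format labels.
from pathlib import Path

labels_dir = Path("coco/labels/train2017")   # placeholder path, adapt to your layout
for label_file in labels_dir.glob("*.txt"):
    lines = label_file.read_text().splitlines()
    person_lines = [l for l in lines if l.strip() and l.split()[0] == "0"]
    # images without person boxes end up with an empty label file,
    # which YOLO treats as background
    label_file.write_text("\n".join(person_lines) + ("\n" if person_lines else ""))

You would then point train.py at a data .yaml with nc: 1 and names: ['person'], and repeat the same filtering for the validation labels.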

How to train a spaCy 3 project with FP16 mixed precision

The goal is to run python -m spacy train with FP16 mixed precision to enable the use of large transformers (roberta-large, albert-large, etc.) in limited VRAM (RTX 2080 Ti, 11 GB).
The new spaCy 3 project.yml approach to training directly uses Hugging Face transformers models loaded via spacy-transformers v1.0. Hugging Face models can be run with mixed precision just by adding the --fp16 flag (as described here).
The spaCy config was generated using python -m spacy init config --lang en --pipeline ner --optimize efficiency --gpu -F default.cfg and checked to be complete with python -m spacy init fill-config default.cfg config.cfg --diff, yet no FP16 / mixed-precision option is to be found.
To reproduce
Use the spaCy Project: Named Entity Recognition (WikiNER) with changed init-config in project.yml to use a GPU and a transformer (roberta-base by default):
commands:
  - name: init-config
    help: "Generate a transformer English NER config"
    script:
      - "python -m spacy init config --lang en --pipeline ner --gpu -F --optimize efficiency -C configs/${vars.config}.cfg"
What was tested:
- Added --fp16 to python -m spacy project run
- Added --fp16 to python -m spacy train
- Added fp16 = true to default.cfg in various sections ([components.transformer], [components.transformer.model], [training], [initialize])
The reasoning was that Transformers models run in FP16 like this:
from transformers import TrainingArguments
TrainingArguments(..., fp16=True, ...)
SW stack specifics
- spacy 3.0.3
- spacy-transformers 1.0.1
- transformers 4.2.2
- torch 1.6.0+cu101
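For reference, newer versions of spacy-transformers (1.1 and later, i.e. newer than the stack listed above) expose mixed precision directly in the config; a sketch of the relevant section, assuming the pipeline can be upgraded so that the transformer uses TransformerModel.v3:

[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v3"
name = "roberta-base"
mixed_precision = true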
