Is it possible to build transactions on model and rollback if the dependent model failed validations? - codeigniter-4

I have two models, A and B. Model B has a foreign key that references Model A. Now, I was wondering if I could treat saves through Model A and Model B as a single database transaction and roll back the changes from both if any instance of Model B fails validation.
Something similar to the DB query transactions shown here, except with models.
https://codeigniter.com/user_guide/database/transactions.html

Related

Issue with Relation Annotation (rel.manual) in Spacy's Prodigy tool

I am trying to build a relation extraction model via spacy's prodigy tool.
NOTE: ner.manual, ner.correct, rel.manual are all recipes provided by prodigy.
(ner.manual, ner.correct) The first step involved annotating data and training an NER model that can predict entities (this step is done and the model is obtained).
The next step involves annotating the relations between the entities. This step could be done in two different ways:
i. Label the entities and relations all from scratch
ii. Use the trained NER model to predict the entities in the UI tool and make corrections to it if needed (similar to ner.correct) and label the relations between the entities
The issue I am now facing is that whenever I use the trained model in the recipe (rel.manual), no entities are predicted.
Could someone help me with this?
PS: There is no trailing-whitespace issue; I cross-verified that.

combine multiple spacy textcat_multilabel models into a single textcat_multilabel model

Problem: I have millions of records that need to be transformed using a bunch of spacy textcat_multilabel models.
# pseudocode
for model in models:
    nlp = spacy.load(model)
    for group_of_records in records:  # millions of records
        new_data = nlp.pipe(group_of_records)  # records are processed in bulk
        # process data
        bulk_create_records(new_data)
My current loop is as follows:
load a model
loop through records / transform data using model / save
As you can imagine, the more records I process and the more models I include, the longer this entire process takes. The idea is to build a single model and process my data just once, instead of making (n_records * num_of_models) passes.
Question: is there a way to combine multiple textcat_multilabel models created from the same spacy config, into a single textcat_multilabel model?
There is no basic feature to just combine models, but there are a couple of ways you can do this.
One is to source all your components into the same pipeline. This is very easy to do, see the double NER project for an example. The disadvantage is that this might not save you much processing time, since separately trained models will still have their own tok2vec layers.
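As a sketch of the sourcing approach (the model paths and component names below are placeholders, not from the question), the combined pipeline's config.cfg could pull each trained component out of its original pipeline via the `source` setting:

```ini
# Hypothetical config.cfg fragment: source trained textcat components
# from two separately trained pipelines. Paths are placeholders.
[components.textcat_a]
source = "./model_a"
component = "textcat_multilabel"

[components.textcat_b]
source = "./model_b"
component = "textcat_multilabel"
```

The `component` setting names the component as it appears in the source pipeline, which lets the two sourced textcats coexist under distinct names in the combined pipeline.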
You could combine your training data and train one big model. But if your models are actually separate that would almost certainly cause a reduction in accuracy.
If speed is the primary concern, you could train each of your textcats separately while freezing your tok2vec. That would result in decreased accuracy, though maybe not too bad, and it would allow you to then combine the textcat models in the same pipeline while removing a bunch of tok2vec processing. (This is probably the method I've listed with the best balance of implementation complexity, speed advantage, and accuracy sacrificed.)
One thing that I don't think has been tested is that you could try training separate textcat models at the same time with separate sets of labels by manually specifying the labels to each component in their configs. I am not completely sure that would work but you could try it.

How to determine relationship between two entities when there is more than one relation while creating distant supervision training data?

I understand the concept of distant supervision. As far as I understand it, the process of creating training data goes like this:
Extract named entities from sentences
Find two entities named "e1" and "e2" from each sentence.
Search for these two entities in a knowledge base (Freebase etc.) to find the relationship between them
I got confused at this step. What if there is more than one relation between these two entities (e1 and e2)? If so, which relation should I select?
It depends on the model you're training.
Are you learning a model for one kind of relationship and doing bootstrapping? Then only pay attention to that one relationship and drop the others from your DB.
Are you trying to learn a bunch of relationships? Then use the presence or absence of each as a feature in your model. This is how Universal Schemas work.
(The Universal Schema paper includes a figure of such a feature matrix; the image is not reproduced here.)
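The multi-label option can be sketched as follows: keep every KB relation for an entity pair as a label set instead of choosing one. The KB contents and helper names here are toy data, made up for illustration.

```python
# Toy knowledge base: an entity pair can have several relations.
kb = {
    ("Barack Obama", "Hawaii"): {"born_in", "lived_in"},
    ("Barack Obama", "Michelle Obama"): {"spouse_of"},
}

def make_example(sentence, e1, e2, kb):
    """Build one training example whose labels are ALL KB relations
    for the (e1, e2) pair, not just an arbitrary one."""
    labels = sorted(kb.get((e1, e2), set()))
    return {"sentence": sentence, "pair": (e1, e2), "labels": labels}

ex = make_example("Barack Obama was born in Hawaii.",
                  "Barack Obama", "Hawaii", kb)
# ex["labels"] == ["born_in", "lived_in"]
```

Pairs absent from the KB come back with an empty label list, which is how distant supervision typically generates negative examples.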

Pickle models according to their feature set

I have a REST service that applies a classifier to user data. I subset my feature space and train a model according to which subset of features are available in the user's data. I want to cache/pickle the models once they've been created. How can I store these models so that I can always recover the right one?
One model might be:
['feature1', 'feature2', 'feature10']
and another model might be defined by:
['feature1', 'feature9'].
I'm considering hashing this list, and then pickling the model with:
pickle(filename=hash(set(feature_list)), model=model)
Is this appropriate? Will I always get back the intended model? Would it be better to pickle a dictionary of type Dict[Set[features], model]?
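One wrinkle worth noting in a sketch: Python `set` objects are not hashable, and the built-in `hash()` is salted per process for strings, so `hash(set(feature_list))` would neither run nor give stable filenames across restarts. A sorted, deduplicated digest via `hashlib` is stable and order-independent. All names below are illustrative, not from the question.

```python
import hashlib
import os
import pickle
import tempfile

def feature_key(features):
    # Sort and dedupe so the key depends only on the feature SET,
    # then hash with sha256 for a process-independent filename.
    canonical = "\x1f".join(sorted(set(features)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def save_model(model, features, directory):
    path = os.path.join(directory, feature_key(features) + ".pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path

def load_model(features, directory):
    path = os.path.join(directory, feature_key(features) + ".pkl")
    with open(path, "rb") as f:
        return pickle.load(f)

with tempfile.TemporaryDirectory() as d:
    save_model({"classes": ["a", "b"]}, ["feature1", "feature9"], d)
    # Lookup works regardless of the order features arrive in:
    model = load_model(["feature9", "feature1"], d)
```

If you would rather keep everything in one pickle, a dict keyed by `frozenset(features)` also works, since frozensets (unlike sets) are hashable.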

How to merge new model after training with the old one?

I have a model, en-ner-organization.bin, which I downloaded from the Apache website. It works fine, but I would prefer to train it on my organizations database to improve recognition quality. However, after I trained en-ner-organization.bin on my organization database, the model became smaller than before, so it seems it was overwritten with my data.
I see that there is no way to re-train an existing model, but maybe there is a way to merge models?
If not, I guess I could add my training data to the .train file of the original model, so that the generated model consists of the default data plus my data from the DB. But I can't find such a file on the web.
So, the main question is: how do I keep the existing model data and add new data to the model?
Thanks
As far as I know it's not possible to merge different models, but it is possible to pass multiple model files to the name finder.
From the synopsis:
$ bin/opennlp TokenNameFinder
Usage: opennlp TokenNameFinder model1 model2 ... modelN < sentences