How do I evaluate documents on a trained model in Document Understanding - document

I am trying to extract data from PDFs. I have trained a Document Understanding model (invoices) on my dataset and retrained it multiple times. Now I want to evaluate the model on some unseen documents. What is the procedure for that? This is what I think, but I am not sure:
First import the evaluation docs into data manager as a separate batch and make sure to check the "make this an evaluation set" option.
Then label the docs. I use the predict option present in the data manager to make the labeling faster.
Export the evaluation data. This exported data will contain all the training and evaluation docs.
Run the evaluation pipeline on this dataset.
This is my understanding of the process and I may be wrong. Kindly let me know if I am heading in the right direction!
What I Tried: I tried the above procedure.
What happened: I think that, along with the evaluation docs, it also evaluated the training docs, because the evaluation metrics file contained both the training and the evaluation docs.

Related

Spacy 3.0 Training custom NER --> Validation of this custom NER model

I trained a custom spaCy named entity recognition model to detect biased words in job descriptions. Now that I have trained 8 variations (using different base models, training models, and pipeline settings), I want to evaluate which model performs best.
But I can't find any documentation on the validation of these models.
There are some recall, F1-score and precision numbers in the meta.json file in the output folder, but that is not sufficient.
Does anyone know how to validate, or can anyone link me to the correct documentation? The documentation seems nowhere to be found.
NOTE: Talking about SpaCy V3.x
During training you should provide "evaluation data" that can be used for validation. This will be evaluated periodically during training and appropriate scores will be printed.
Note that there's a lot of different terminology in use, but in spaCy there's "training data" that you actually train on and "evaluation data" which is not trained on and is only used for scoring during the training process. To evaluate on held-out test data you can use the spacy evaluate CLI command.
Take a look at this fashion brands example project to see how "eval" data is configured and used.
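To make the precision, recall and F1 numbers concrete, here is a minimal sketch of exact-match span scoring on held-out data. It is not spaCy's own implementation; the span_prf helper, the "BIAS" label and the toy spans are assumptions made up for illustration. Each document is a list of (start, end, label) tuples, and a predicted span only counts as correct if all three values match a gold span.

```python
def span_prf(gold_docs, pred_docs):
    """Micro-averaged precision, recall and F1 over exact-match entity spans."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        gold_set, pred_set = set(gold), set(pred)
        tp += len(gold_set & pred_set)   # predicted and present in gold
        fp += len(pred_set - gold_set)   # predicted but not in gold
        fn += len(gold_set - pred_set)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"p": precision, "r": recall, "f": f1}


# Hypothetical held-out documents: gold annotations vs. model predictions.
gold = [[(0, 2, "BIAS"), (5, 6, "BIAS")], [(3, 4, "BIAS")]]
pred = [[(0, 2, "BIAS")], [(3, 4, "BIAS"), (7, 8, "BIAS")]]

scores = span_prf(gold, pred)
print(scores)
```

Running the same scoring for each of the 8 variants on the same held-out documents gives numbers that can be compared directly.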

Tuned model with GroupKFold Cross-Validation requires Group parameter when Predicting

I tuned a RandomForest with GroupKFold (to prevent data leakage because some rows came from the same group).
I get a best fit model, but when I go to make a prediction on the test data it says that it needs the group feature.
Does that make sense? It's odd that the group feature is coming up as one of the most important features as well.
I'm just wondering if there is something I could be doing wrong.
Thanks
A search on the scikit-learn Github repo does not reveal a single instance of the string "group feature" or "group_feature" or anything similar, so I will go ahead and assume you have in your data set a feature called "group" that the prediction model requires as input in order to produce an output.
Remember that a prediction model is basically a function that takes an input (the "predictor" variable) and returns an output (the "predicted" variable). If a variable called "group" was defined as input for your prediction model, then it makes sense that scikit-learn would request it.
Does the group appear as a column on the training set? If so, remove it and re-train. It looks like you are just using it to generate splits. If it isn't a part of the input data you need to predict, it shouldn't be in the training set.
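A minimal sketch of that suggestion, assuming a classification task with synthetic data: the group labels are kept in their own array and passed to fit only to build the splits, so they never become a model input. The data, the parameter grid and the variable names are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, GroupKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))            # features only, no group column
y = (X[:, 0] > 0).astype(int)
groups = np.repeat(np.arange(12), 5)    # 12 groups of 5 rows, kept separate from X

search = GridSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=0),
    param_grid={"max_depth": [2, 4]},
    cv=GroupKFold(n_splits=3),
)
# groups is used only by the splitter to keep each group within one fold.
search.fit(X, y, groups=groups)

# Prediction needs the same 3 feature columns and no group information.
X_new = rng.normal(size=(4, 3))
predictions = search.predict(X_new)
print(predictions)
```

Because the group identifier is not a column of X, it cannot show up in the feature importances or be required at prediction time.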

Using my saved ML model to work on a raw and unprocessed dataset

I have created a few ML models and saved them for future use in predicting outcomes. This time I face a scenario that is common but new to me.
I need to provide this model to someone else so they can test it on their dataset.
I removed a few redundant columns from my training data, trained a regression model on it, and saved it after validating it. However, when I give this model to someone to use on their dataset, how do I tell them to drop those columns? I could manually add the column list to the Python file from which the saved model is called, but that does not sound very neat.
What is the best way to do this in general? Kindly share some inputs.
One can simply use the pickle library to save the column list and other things along with the model. In a new session, one can use pickle again to load those things back.
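A minimal sketch of this idea, assuming a scikit-learn regression model: the model and its feature column list are pickled together in one dictionary, and the receiving side uses the stored list to pick the right columns from raw rows. The column names, the toy data and the in-memory payload (instead of a file) are assumptions for illustration.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# Training side: fit on the kept columns only (hypothetical names).
feature_columns = ["area", "rooms"]
X_train = np.array([[50.0, 2], [80.0, 3], [120.0, 4], [65.0, 2]])
y_train = np.array([150.0, 240.0, 360.0, 195.0])
model = LinearRegression().fit(X_train, y_train)

# Bundle the model with the columns it expects; in practice this would be
# written to a file with pickle.dump instead of kept in memory.
payload = pickle.dumps({"model": model, "feature_columns": feature_columns})

# Receiving side: load the bundle and select columns from raw rows.
bundle = pickle.loads(payload)
raw_rows = [{"id": 7, "area": 100.0, "rooms": 3, "notes": "unused"}]
X_new = np.array([[row[col] for col in bundle["feature_columns"]] for row in raw_rows])
predictions = bundle["model"].predict(X_new)
print(predictions)
```

The other person then never has to know which columns were dropped: extra columns such as id and notes are ignored because only the stored names are selected.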

How to calculate evaluation metrics on training data in TensorFlow's Object Detection API?

I have been using the Object Detection API for quite a while now, so training models and using them for inference works fine. Unfortunately, when using TensorBoard to visualize metrics (such as mAP, AR, classification/localization loss), we only get to see those metrics on the validation set. I'd like to also calculate these metrics on the training data during training, so that we can compare train/validation metrics in TensorBoard.
Edit: I've stumbled on this post, which addresses the same concern: "how to check both training/eval performances in tensorflow object_detection".
Anyone got a pointer on how to achieve this?
You can evaluate your model on the training data by adding the arguments --eval_training_data=True --sample_1_of_n_eval_on_train_examples=10 to the arguments of model_main.
By doing so, you instruct it to perform the evaluation on the training data, and you choose how much to dilute the training data sent to evaluation, since usually the amount of training data is very large.
The thing is that I don't think it's currently possible to evaluate on both training and validation data, but I don't think that's too bad, since evaluation on training data is usually only a sanity check and not meant for actual continuous evaluation of the model.
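As a sketch, the full invocation could be assembled like this. The two evaluation flags are the ones named above; the script location and the two paths are placeholders that depend on your checkout and experiment, so treat them as assumptions. The snippet only builds and prints the command line, it does not start training.

```python
import shlex

command = [
    "python",
    "object_detection/model_main.py",                  # location depends on your checkout
    "--pipeline_config_path=path/to/pipeline.config",  # placeholder path
    "--model_dir=path/to/model_dir",                   # placeholder path
    "--eval_training_data=True",                       # evaluate on the training data
    "--sample_1_of_n_eval_on_train_examples=10",       # use every 10th training example
]

command_line = shlex.join(command)
print(command_line)
```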

Aggregate training results to predict

When training a model, the results depend on the sampling. To obtain something better you could repeat the training (on other randomly created training samples, using KFold, StratifiedKFold, ...), somehow aggregate the results, and in this way get a result that is more robust than one created from a single sample alone. Question: is this already implemented in sklearn or similar? Apologies if this is a straightforward question; I haven't seen a simple solution.
I see that there is a function called cross_val_predict; however, my first impression from a quick look at the source code is that it predicts as many times as it trains. I would like to predict only once, so that I can pickle the somehow aggregated results and predict later, instead of repeating the whole training again.
So far I think the best option is the ensemble estimators in sklearn.
I leave here the solution I was using before. I am pretty sure it could be improved (as mentioned above, the ensemble estimators in sklearn are better). I have placed it at https://github.com/rafaelvalero/aggreating_predictions_sklearn, where there is a notebook with an example (using the iris dataset), in case anyone wants to play around and see in detail how it could be done.
That solution trains models (in parallel, using joblib), pickles the trained models (models from sklearn), stores the results (using joblib dump), and later recovers them to create predictions (in parallel, using joblib) that are then aggregated.
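The approach described above can be sketched roughly as follows. This is not the code from the linked repository: it is a simplified, sequential version that assumes logistic regression on the iris data, pickles in memory instead of to disk, and averages predicted class probabilities as the aggregation rule.

```python
import pickle

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

X, y = load_iris(return_X_y=True)

# Train one model per fold and pickle it (in memory here; a file works the same).
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
pickled_models = []
for train_idx, _ in skf.split(X, y):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    pickled_models.append(pickle.dumps(model))

# Later: restore the models and aggregate their predictions without retraining.
models = [pickle.loads(blob) for blob in pickled_models]
X_new = X[:5]
avg_proba = np.mean([m.predict_proba(X_new) for m in models], axis=0)
aggregated = models[0].classes_[avg_proba.argmax(axis=1)]
print(aggregated)
```

Averaging probabilities is one possible aggregation rule; majority voting over the individual predictions is another. scikit-learn's ensemble estimators such as BaggingClassifier and VotingClassifier package similar ideas into a single estimator that can be pickled as one object.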
