Exporting pytorch profiler data to csv - pytorch

I am currently running the PyTorch profiler on my code and I'm not sure how to export this data to a CSV file. The data table comes from profile.key_averages().table(), and profile.key_averages() returns an EventList.
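One possible sketch: the EventList from key_averages() is iterable, and each FunctionEventAvg in it exposes per-op attributes such as key, count, and cpu_time_total, so you can pick columns and write them with the stdlib csv module. The FakeEvent namedtuple below stands in for FunctionEventAvg so the sketch runs without a profiler session; the column names are assumptions to check against your PyTorch version.

```python
import csv
import io
from collections import namedtuple

# Stand-in for torch.autograd.profiler.FunctionEventAvg, which exposes
# attributes like key, count, and cpu_time_total (names may vary by version).
FakeEvent = namedtuple("FakeEvent", ["key", "count", "cpu_time_total"])

def events_to_csv(events, out, columns=("key", "count", "cpu_time_total")):
    """Write one CSV row per averaged profiler event."""
    writer = csv.writer(out)
    writer.writerow(columns)
    for ev in events:
        writer.writerow([getattr(ev, col) for col in columns])

# With a real profiler you would pass prof.key_averages() as `events`.
events = [FakeEvent("aten::mm", 10, 1234.5), FakeEvent("aten::add", 4, 56.7)]
buf = io.StringIO()
events_to_csv(events, buf)
print(buf.getvalue())
```

Writing to a file instead of a StringIO is just `with open("profile.csv", "w", newline="") as f: events_to_csv(prof.key_averages(), f)`.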

Related

Train Spacy model with larger-than-RAM dataset

I asked this question to better understand some of the nuances between training Spacy models with DocBins serialized to disk, versus loading Example instances via a custom data-loading function. The goal was to train a Spacy NER model with more data than can fit into RAM (or at least some way to avoid loading the entire file into RAM). Though the custom data loader seemed like one specific way to accomplish this, I am writing this question to ask more generally:
How can one train a Spacy model without loading the entire training data set file during training?
Your only options are using a custom data loader or setting max_epochs = -1. See the docs.
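The streaming setup can be sketched in the training config; here "my_streaming_reader" is a hypothetical name for a reader you would register yourself via @spacy.registry.readers, and max_epochs = -1 tells spaCy to treat the train corpus as a stream rather than loading it fully:

```ini
[corpora.train]
# hypothetical custom reader registered with @spacy.registry.readers;
# it should yield Example objects lazily from disk
@readers = "my_streaming_reader"
path = ${paths.train}

[training]
# -1 streams the training corpus instead of loading it into memory
max_epochs = -1
```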

Exporting Training Data in DialogFlow ES

I can see how to view training data - in particular the unanswered questions, which are what I need. However, I would like to export this data so that someone else (without access to the system) can review it. Is there a way to export the training data to any format?

Can I combine two ONNX graphs together, passing the output from one as input to another?

I have a model, exported from pytorch, I'll call main_model.onnx. It has an input node I'll call main_input that expects a list of integers. I can load this in onnxruntime and send a list of ints and it works great.
I made another ONNX model I'll call pre_model.onnx with input pre_input and output pre_output. This preprocesses some text so input is the text, and pre_output is a list of ints, exactly as main_model.onnx needs for input.
My goal here is, using the Python onnx.helper tools, to create one uber-model that accepts text as input, runs it through pre_model.onnx, possibly through some connector node (Identity, maybe?), and then through main_model.onnx, all in one big combined.onnx model.
I have tried using pre_model.graph.node + Identity connector + main_model.graph.node as the nodes in a new graph, but the parameters exported from PyTorch are lost this way. Is there a way to keep all those parameters around and export this one even larger combined ONNX model?
This is possible to achieve, albeit a bit tricky. You can explore the Python APIs offered by ONNX (https://github.com/onnx/onnx/blob/master/docs/PythonAPIOverview.md). These let you load models into memory, after which you have to "compose" your combined model using the exposed APIs: combine both GraphProto messages into one (this is easier said than done - you'll have to ensure that you don't violate the ONNX spec while doing this), then store the new GraphProto in a new ModelProto, and you have your combined model. I would also run it through the ONNX checker on completion to ensure the model is valid after its creation.
If you have static-size inputs, the sclblonnx package is an easy solution for merging ONNX models. However, it does not support dynamic-size inputs.
For dynamic-size inputs, one solution is writing your own code using the ONNX API, as stated earlier.
Another solution is converting the two ONNX models to a framework (TensorFlow or PyTorch) using tools like onnx-tensorflow or onnx2pytorch, passing the outputs of one network as inputs of the other network, and exporting the whole network back to ONNX format.

How to read numerical data from CSV in PyTorch?

I'm new to PyTorch; I'm trying to implement a model I developed in TF and compare the results. The model is an autoencoder. The input data is a CSV file containing n samples, each with m features (an n*m numerical matrix in a CSV file). The targets (the labels) are in another CSV file with the same format as the input file. I've been looking online but couldn't find good documentation for reading non-image data from a CSV file with multiple labels. Any idea how I can read my data and iterate over it during training?
Thank you
Might you be looking for something like TabularDataset?
class torchtext.data.TabularDataset(path, format, fields, skip_header=False, csv_reader_params={}, **kwargs)
Defines a Dataset of columns stored in CSV, TSV, or JSON format.
It will take a path to a CSV file and build a dataset from it. You also need to specify the names of the columns which will then become the data fields.
In general, all implementations of torch.Dataset for specific types of data are located outside of PyTorch itself, in the torchvision, torchtext, and torchaudio libraries.
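If the torchtext route feels heavy for a plain numerical matrix, a small custom torch.utils.data.Dataset is another common option. A minimal sketch, assuming both CSVs are purely numeric with one sample per row and no header (the file names in the commented usage line are placeholders):

```python
import csv

import torch
from torch.utils.data import DataLoader, Dataset

class CSVDataset(Dataset):
    """Pairs row i of the feature CSV with row i of the label CSV."""

    def __init__(self, features_path, labels_path):
        self.x = self._load(features_path)
        self.y = self._load(labels_path)
        assert len(self.x) == len(self.y), "feature/label row counts differ"

    @staticmethod
    def _load(path):
        # Parse every row as a list of floats, then stack into one tensor.
        with open(path, newline="") as f:
            rows = [[float(v) for v in row] for row in csv.reader(f)]
        return torch.tensor(rows, dtype=torch.float32)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, i):
        return self.x[i], self.y[i]

# loader = DataLoader(CSVDataset("inputs.csv", "targets.csv"),
#                     batch_size=32, shuffle=True)
```

Iterating over the DataLoader then yields (features, targets) batches for the training loop.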

LIBSVM with large data samples

I am currently looking to use libsvm (or an alternative, if one is suggested; OpenCV also looks like a viable option) to train an SVM. My training data sets are rather large; around 50 binary 128MB files. It appears that to use libsvm I must convert the data to the proper format; however, I was wondering if it is possible to train on the raw binary data itself? Thanks in advance.
No, you cannot use your raw binary (image) data for training nor for testing.
In order to use libsvm you have to convert your binary data files into this format.
See this stackoverflow post for the details of the libsvm data-format.
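For reference, the libsvm data format is one line per sample: a label followed by space-separated index:value pairs, with 1-based indices and zero-valued features omitted. A small sketch of the conversion, which you would apply to whatever numeric feature vectors you extract from your binary files:

```python
# Convert one dense feature vector to a libsvm-format line:
# "<label> <index>:<value> ...", 1-based indices, zeros omitted.
def to_libsvm_line(label, features):
    pairs = [f"{i}:{v}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([str(label)] + pairs)

print(to_libsvm_line(1, [0.5, 0.0, 2.0]))  # prints: 1 1:0.5 3:2.0
```

Writing one such line per sample to a text file produces a training file that svm-train (or sklearn's load_svmlight_file) can consume directly.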
