"strip" onnx graph from its constants (initializers) - onnx

I have an ONNX graph/model with large constants in it, so it takes a long time to load and parse. Can I "strip" the data from the graph so that I can inspect the graph nodes without their data?

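Initializer is one of the fields in GraphProto, so you should be able to clear it with a simple Python script. I haven't tested the following code, but it should be something like this:
import onnx

def clear_initializer(model_path, output_path):
    # Load the full model, then drop every initializer tensor from the graph.
    model = onnx.load_model(model_path)
    model.graph.ClearField('initializer')
    # save_model needs a destination; output_path is a parameter added for that.
    onnx.save_model(model, output_path)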
references:
https://developers.google.com/protocol-buffers/docs/reference/python/google.protobuf.message.Message-class
https://github.com/onnx/onnx/blob/2e7099ee7c37b196c197c9a084a97698a41da232/onnx/init.py

Related

Pre-trained FastText hyperparameters

I'm using the pre-trained model:
import fasttext.util
fasttext.util.download_model('en', if_exists='ignore') # English
ft = fasttext.load_model('cc.en.300.bin')
Where can I find an exhaustive list of the values of the hyperparameters used to train the model?
https://fasttext.cc/docs/en/options.html lists the default values, which differ from the ones actually used: for example, the dimension of the word vectors is 300, not 100 (according to https://fasttext.cc/docs/en/crawl-vectors.html, which doesn't list them all).
From looking at the _FastText Python model class in Facebook's source...
https://github.com/facebookresearch/fastText/blob/a20c0d27cd0ee88a25ea0433b7f03038cd728459/python/fasttext_module/fasttext/FastText.py#L99
...it looks like, at least when creating a model, all the hyperparameters are added as attributes on the object.
Have you checked if that's the case on your loaded model? For example, does ft.dim report 300, and other parameters like ft.minCount report anything interesting?
Update: As that didn't seem to work, it also looks like the _FastText model wraps an internal instance of a native (not-in-Python) FastText model in its .f attribute. (See a few lines up from the source code I pointed to earlier.)
That native instance is set up by the module defined in fasttext_pybind.cc. That code appears to expose a number of read-write attributes associated with the metaparameters; see, for example, starting at:
https://github.com/facebookresearch/fastText/blob/a20c0d27cd0ee88a25ea0433b7f03038cd728459/python/fasttext_module/fasttext/pybind/fasttext_pybind.cc#L88
So: does ft.f.minCount or ft.f.dim return anything useful from a post-loaded model ft?
Citing NVS Abhilash from https://github.com/facebookresearch/fastText/issues/887#issuecomment-649018188 the right code to write is:
args_obj = ft.f.getArgs()
for hparam in dir(args_obj):
    if not hparam.startswith('__'):
        print(f"{hparam} -> {getattr(args_obj, hparam)}")
This will print all the hyperparameters of the trained model!
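If you would rather keep the hyperparameters in a dict than print them, the same dir()/getattr() loop can be wrapped in a small helper. This is only a sketch: the helper name public_attributes is hypothetical, and the stand-in object below uses placeholder values so the snippet runs without downloading the model. With the real model you would pass ft.f.getArgs() instead.

```python
from types import SimpleNamespace


def public_attributes(obj):
    """Collect every attribute whose name does not start with a double underscore."""
    return {
        name: getattr(obj, name)
        for name in dir(obj)
        if not name.startswith("__")
    }


# Stand-in for the object returned by ft.f.getArgs(); the values are placeholders.
fake_args = SimpleNamespace(dim=300, minCount=5)

hparams = public_attributes(fake_args)
print(hparams)
```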

Using paraview filters in Python, Paraview python api

I have been using ParaView to visualize and analyse VTU files, and I find the gradient filter quite useful. Is there a Python API for ParaView that I can use to apply this filter?
I'm looking for something like this.
import paraview as pv
MyFile = "Myfile0001.vtu"
Divergence = pv.filters.GradientOfUnstructuredDataset(MyFile)
ParaView is fully scriptable in Python. Each part of this doc has a 'do it in python' version.
Although API documentation does not necessarily exist for every filter, you can use the Python Trace (in the Tools menu), which records your actions in the GUI and saves them as a Python script.
EDIT
Getting the data back as an array takes some additional steps, because ParaView works in client/server mode. You have to Fetch the data to the client; then you can manipulate the vtkObject, extract the array, and convert it to numpy.
Something like
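from paraview.simple import *
from vtk.numpy_interface import dataset_adapter as dsa

# Read the file and apply the gradient filter on the server side.
gridvtu = XMLUnstructuredGridReader(registrationName='grid', FileName=['grid.vtu'])
gradient = GradientOfUnstructuredDataSet(registrationName='Gradient', Input=gridvtu)
# Bring the filter output to the client as a VTK object, then wrap it for numpy access.
vtk_grid = servermanager.Fetch(gradient)
wrapped_grid = dsa.WrapObject(vtk_grid)
# Assumes the filter was set up to output an array named "Divergence".
divergence_array = wrapped_grid.PointData["Divergence"]
Note that divergence_array is a numpy.ndarray (more precisely, a subclass of it).
You can also write pure VTK code, as in this example on SO.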

How should I process data in JSON/DataFrame format so that it is suitable for Rasa chatbots

I'm new to NLP and the Rasa API. I'm trying to prepare data so that it can be used as training data for intent recognition. The function I'm trying to use is:
from rasa_nlu.training_data import load_data #Import function
train_data_rasa=load_data('/content/data_file.json') #Json file
However, the following error pops up:
AttributeError: 'str' object has no attribute 'get'
The JSON file is the result of calling the pandas to_json() function. The original dataset is the ATIS flight intent DataFrame, which has two columns: the text and the intent.
Here is a preview of the json file:
{"Intent":{"0":"atis_flight","1":"atis_flight_time","2":"atis_airfare","3":"atis_airfare","4":"atis_flight","5":"atis_aircraft","6" ........
I don't really know what is going on, as the dataset seems to be clean. I have also tried alternatives such as a Markdown (md) file, but that does not seem to work either.
Thank you in advance !!
I would suggest trying the rasa data convert command (which converts your training data from JSON to YAML format) and then training on the result (with rasa train from the CLI) to see whether you get the same error. The Training Data Format page in the docs might also be useful, since it explains the types of training data and their expected structure.
Another idea would be to post your question on the Rasa forum as well, where more people may have encountered the same error, like here. That way you might get more ideas on how to solve your issue, or more people will jump in and help.
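One likely cause worth checking: the preview in the question is column-oriented ({"Intent": {"0": ..., "1": ...}}), which is what DataFrame.to_json() produces by default, whereas the legacy Rasa NLU JSON layout, as far as I know, expects a top-level "rasa_nlu_data" object holding a list of "common_examples". Below is a minimal sketch of that reshaping using only the standard library. The helper name to_rasa_nlu_json, the "Text" column name, and the two sample sentences are assumptions for illustration; only the "Intent" column name appears in the question.

```python
import json


def to_rasa_nlu_json(records, text_key="Text", intent_key="Intent"):
    """Reshape row-oriented records into the legacy Rasa NLU JSON layout."""
    examples = [
        {"text": row[text_key], "intent": row[intent_key], "entities": []}
        for row in records
    ]
    return {"rasa_nlu_data": {"common_examples": examples}}


# Made-up rows; with pandas, df.to_dict(orient="records") yields a list like this.
records = [
    {"Text": "show me flights from boston to denver", "Intent": "atis_flight"},
    {"Text": "how much is a ticket to dallas", "Intent": "atis_airfare"},
]

payload = to_rasa_nlu_json(records)
print(json.dumps(payload, indent=2))
```

Writing payload to a file with json.dump and pointing load_data at that file would be the next step. Whether it loads cleanly depends on the installed Rasa version, so treat this as a starting point rather than a confirmed fix.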

CNTK: "inferred dimension cannot be calculated from input and new shape size."

I've set up a model for CIFAR-10 using Pytorch, and saved it as an ONNX file.
But it looks like I can't load it from CNTK.
I've already loaded another ONNX file from the same source code (by mistake), so the dependencies look OK. The problem occurs when I call Function.Load()
var deviceDescriptor = DeviceDescriptor.CPUDevice;
var function = Function.Load(ONNX_PATH, deviceDescriptor, ModelFormat.ONNX);
I get this exception (Unhandled exception):
System.ApplicationException : 'Reshape: inferred dimension cannot be calculated from input and new shape size.
[CALL STACK]
- CNTK::TrainingParameterSchedule:: GetMinibatchSize
- CNTK:: XavierInitializer (x6)
- CNTK::Function::Load
- CSharp_CNTK_Function__Load__SWIG_0
- 00007FFB0C41C307 (SymFromAddr() error: The specified module could not be found.)
It looks like this model can't be loaded in CNTK. CNTK has good support for exporting (saving) to ONNX, but importing (loading) can be problematic for some operations.
CNTK development is frozen; what's your motivation for using it?
The recommended way now is to use ONNX Runtime (https://github.com/microsoft/onnxruntime) for inference, since it has first-class support for ONNX.

"numpy.ndarray' object has no attribute 'get_support" error message after running SelectKBest in Scikit Learn

I ran into a problem related to this older question: The easiest way for getting feature names after running SelectKBest in Scikit Learn
When trying to use "get_support()" to get the selected features, I got the error message:
'numpy.ndarray' object has no attribute 'get_support'
I would greatly appreciate your kind help!
Jeff
You cannot get the support without fitting first. Fit the selector so that it can analyze the data, and then call get_support() on the selector itself, not on the output of fit_transform().
Currently you are doing something like:
selector = SelectKBest()
#fit_transform returns the data after selecting the best features
new_data = selector.fit_transform(old_data, labels)
#so you are trying to access get_support() on new data, which is not possible
new_data.get_support()
After you call fit() or fit_transform(), do this:
# get_support is a method of SelectKBest class
selector.get_support()
I think I found out the reason why I got the errors. I used "get_support()" on the results after fit() or fit_transform(), which led to the error message.
I should have used the "get_support()" on the selector itself (but still need to use selector to do fit() or fit_transform() first).
Thanks!
Jeff
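For completeness, here is a minimal end-to-end sketch of the pattern discussed above: fit the selector, call get_support() on the selector, and use the resulting boolean mask to look up feature names. The synthetic data, the feature names, and the choice of f_classif with k=2 are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data (illustrative only): the label depends on columns 1 and 3.
rng = np.random.default_rng(0)
feature_names = np.array(["a", "b", "c", "d"])
X = rng.normal(size=(100, 4))
y = (X[:, 1] + X[:, 3] > 0).astype(int)

selector = SelectKBest(score_func=f_classif, k=2)
# fit_transform returns a plain ndarray holding only the selected columns.
X_selected = selector.fit_transform(X, y)

# get_support() lives on the fitted selector and returns a boolean mask.
mask = selector.get_support()
selected_names = feature_names[mask]
print(selected_names)
```

Calling selector.get_support(indices=True) returns the integer column indices instead of a mask, which can be more convenient when the feature names live in a pandas Index.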
