Can't load HDF5 in python - nlp

I am following this tutorial: https://github.com/fastai/fastai/tree/master/courses/dl2/imdb_scripts
I downloaded the pre-trained model in part 3b.
I want to open the .h5 files and inspect/use the weights. I tried to do this in Python, but the file will not open.
Here’s the code I used:
import tables
import pandas as pd
filename = "…bwd_wt103.h5"
file = tables.open_file(filename)
Here’s the error:
OSError: HDF5 error back trace
File "C:\ci\hdf5_1525883595717\work\src\H5F.c", line 511, in H5Fopen
unable to open file
File "C:\ci\hdf5_1525883595717\work\src\H5Fint.c", line 1604, in H5F_open
unable to read superblock
File "C:\ci\hdf5_1525883595717\work\src\H5Fsuper.c", line 413, in H5F__super_read
file signature not found
End of HDF5 error back trace
Unable to open/create file 'C:/Users/Rishabh/Documents/School and Work/Classes/8
Fall2019/Senior Design/ULMFiT/Wiki Data/wt103/models/bwd_wt103.h5'
I also tried The HDF Group's HDF Viewer: https://support.hdfgroup.org/products/java/release/download.html
But that didn't work either; it gave an error saying "Failed to open the file… Unsupported format".
Is there a way to load the weights in Python? I ultimately want to access the last layer of the stacked LSTMs to create word embeddings.
Thanks in advance.

That's because, despite the .h5 extension, it's a PyTorch model rather than an HDF5 file (hence the "file signature not found" in the HDF5 backtrace). You can load it on your local machine using torch like so:
>>> import torch
>>> filename = "bwd_wt103.h5"
>>> f = torch.load(filename, map_location=torch.device('cpu'))
Now, let's explore it:
>>> type(f)
OrderedDict
>>> len(f.keys())
15
>>> list(f.keys())
['0.encoder.weight',
'0.encoder_with_dropout.embed.weight',
'0.rnns.0.module.weight_ih_l0',
'0.rnns.0.module.bias_ih_l0',
'0.rnns.0.module.bias_hh_l0',
'0.rnns.0.module.weight_hh_l0_raw',
'0.rnns.1.module.weight_ih_l0',
'0.rnns.1.module.bias_ih_l0',
'0.rnns.1.module.bias_hh_l0',
'0.rnns.1.module.weight_hh_l0_raw',
'0.rnns.2.module.weight_ih_l0',
'0.rnns.2.module.bias_ih_l0',
'0.rnns.2.module.bias_hh_l0',
'0.rnns.2.module.weight_hh_l0_raw',
'1.decoder.weight']
You can access the weights of 0.rnns.2.module.weight_hh_l0_raw like so:
>>> wts = f['0.rnns.2.module.weight_hh_l0_raw']
>>> wts.shape
torch.Size([1600, 400])
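Since the end goal is word embeddings, here's a minimal follow-up sketch (same session as above, just a sketch) that pulls the relevant tensors out as NumPy arrays; whether you want the encoder's embedding matrix or the last LSTM's weights depends on how you plan to build the embeddings:
>>> emb_np = f['0.encoder.weight'].numpy()  # token embedding matrix
>>> last_hh = f['0.rnns.2.module.weight_hh_l0_raw'].numpy()  # last LSTM, hidden-to-hidden weights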

Related

Gensim: Not able to load the id2word file

I am working on topic inference on a new corpus given a previously derived LDA model. I can load the model perfectly, but I am not able to load the id2word file to create the corpora.Dictionary object needed to map the new corpus into numbers: the load method raises a dict AttributeError that I can't explain. Below is the minimal code that replicates the situation, and I have attached the code (and packages used) here.
Thank you in advance for your response...
import numpy as np
import os
import pandas as pd
import gensim
from gensim import corpora
import datetime
import nltk
model_name = "lda_sub_full_35"
dictionary_name = "lda_sub_full_35.id2word"
model_for_inference = gensim.models.LdaModel.load(model_name, mmap='r')
print('Successfully load the model')
lda_dictionary = corpora.Dictionary.load(dictionary_name, mmap='r')
I expect to have both the dictionary and the model loaded, but it turns out that when I load the dictionary, I got the below error:
File "topic_inference.py", line 31, in <module>
lda_dictionary = corpora.Dictionary.load(dictionary_name, mmap='r')
File "/topic_modeling/env/lib/python3.8/site-packages/gensim/utils.py", line 487, in load
obj._load_specials(fname, mmap, compress, subname)
AttributeError: 'dict' object has no attribute '_load_specials'
How were the contents of the lda_sub_full_35.id2word file originally saved?
Only if it was saved by a Gensim corpora.Dictionary object's .save() method should it be loaded as you've tried, with corpora.Dictionary.load().
If, by any chance, it was just a plain Python dict saved via some other method of writing a pickle()-created object, then you would need to load it in a symmetrically-matched way. That might be as simple as:
import pickle

with open(path, 'rb') as f:
    lda_dictionary = pickle.load(f)
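For contrast, here is a minimal sketch of the matched Gensim pair the first case assumes (the toy texts are placeholders):
from gensim.corpora import Dictionary

texts = [["human", "interface", "computer"], ["survey", "user", "system"]]  # placeholder corpus
dictionary = Dictionary(texts)
dictionary.save("lda_sub_full_35.id2word")  # saved with Gensim's own serializer

loaded = Dictionary.load("lda_sub_full_35.id2word", mmap='r')  # the symmetric load now works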

TensorFlow 2.0 and NetCDF4 RuntimeError: HDF error - Possible issue with I/O

While trying to write a NumPy array of floats to a NetCDF4 dataset, I am getting RuntimeError: NetCDF: HDF error. I believe TensorFlow 2.0 is somehow interfering with NetCDF4, but I do need to import both in the same class/function. It is not clear why the order in which the libraries are imported affects I/O on a NetCDF4 file.
Here's a sample script:
## The sequence of imports which doesn't work
import numpy as np
import tensorflow as tf ### <<< if imported here, saving .nc doesn't work
import netCDF4 as nc
#import tensorflow as tf ### <<< if imported here, saving .nc works properly

print("I am TensorFlow ", tf.__version__, " but I have no job here")
print("I would let NetCDF4 ", nc.__version__, " do its job, for now")
Nx = 160 # just another number
outputfile = "outputfile.nc" # just another filename
ArrayField = np.ones((Nx,Nx,1)) # sample array to write
print("Writing field data of shape", ArrayField.shape)
ncfile = nc.Dataset(outputfile, 'w', format='NETCDF4_CLASSIC')
ncfile.createDimension('X',ArrayField.shape[0]) #line is probably okay
newx = ncfile.createVariable('X','d',('X')) #line is probably okay
newx[:] = np.linspace(0.00,1.00,ArrayField.shape[0]) #line is probably okay
velx = ncfile.createVariable('Component_X','d',('X','X')) #line is probably okay
velx[:] = ArrayField[:,:,0].T #line is probably okay
print("Something written to: ", outputfile)
ncfile.close() ###### <<<<<<< Gives error here
print("Data successfully written to: ", outputfile)
Output/Error:
I am TensorFlow 2.0.0 but I have no job here
I would let NetCDF4 1.5.3 do its job, for now
Writing field data of shape (160, 160, 1)
Something written to: outputfile.nc
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-1-1cfdf4070b97> in <module>
22
23 print("Something written to: ", outputfile)
---> 24 ncfile.close() ###### <<<<<<< Gives error here
25 print("Data successfully written to: ", outputfile)
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.close()
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset._close()
netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()
RuntimeError: NetCDF: HDF error
Expected Output:
I am TensorFlow 2.0.0 but I have no job here
I would let NetCDF4 1.5.3 do its job, for now
Writing field data of shape (160, 160, 1)
Something written to: outputfile.nc
Data successfully written to: outputfile.nc
Though I can import TF 2.0 after importing NetCDF4 to make this particular sample work, that doesn't really answer why RuntimeError: NetCDF: HDF error occurs, nor fix the issue for more complex cases. I would also like a way to get more debugging information out of the failure.
Tested with tensorflow==2.1.0 and 2.2.0, gives the same error.
The disk size of the garbage .nc file that gets created is ~204K, where it should be about ~209K for this array sample.
The issue persists on a different machine and in a clean environment with only tensorflow==2.0.0, numpy==1.17.3, and netCDF4==1.5.3 installed.
Just in case, here's a pip freeze list of my clean environment: https://gist.github.com/aakash30jan/9ae0cf3dde8a63d28df5275873cb0f10
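Until the root cause is pinned down, here is a minimal workaround sketch based on the observation above (the assumption, not confirmed here, being that the two wheels bundle incompatible HDF5 builds and whichever loads first wins):
# Import netCDF4 before tensorflow so netCDF4 binds to its own HDF5 build
# (assumption: the HDF error stems from conflicting bundled HDF5 libraries).
import netCDF4 as nc
import tensorflow as tf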

Converting caffe model to ONNX format - problem with coremltools

I wanted to convert my face detection model written in Caffe (https://github.com/adelekuzmiakova/onnx-converter/blob/master/res10_300x300_ssd_iter_140000.caffemodel) to ONNX format. I was following this tutorial: https://github.com/onnx/onnx-docker/blob/master/onnx-ecosystem/converter_scripts/caffe_coreml_onnx.ipynb and here is my code:
import coremltools
import onnxmltools
# Update your input name and path for your caffe model
proto_file = 'no_norm_param.deploy.prototext'
input_caffe_path = 'res10_300x300_ssd_iter_140000.caffemodel'
# Update the output name and path for intermediate coreml model, or leave as is
output_coreml_model = 'model.mlmodel'
# Change this path to the output name and path for the onnx model
output_onnx_model = 'model.onnx'
# Convert Caffe model to CoreML
coreml_model = coremltools.converters.caffe.convert((input_caffe_path, proto_file))
# Save CoreML model
coreml_model.save(output_coreml_model)
# Load a Core ML model
coreml_model = coremltools.utils.load_spec(output_coreml_model)
# Convert the Core ML model into ONNX
onnx_model = onnxmltools.convert_coreml(coreml_model)
# Save as protobuf
onnxmltools.utils.save_model(onnx_model, output_onnx_model)
However, when I run this code, I get the following error message:
[libprotobuf ERROR /Users/zach/builds/peTAVmNC/3/nn-inference/coremltools-build/deps/protobuf/src/google/protobuf/text_format.cc:287] Error parsing text-format caffe.NetParameter: 1010:17: Message type "caffe.LayerParameter" has no field named "permute_param".
Traceback (most recent call last):
File "convert-caffe-onnx.py", line 19, in <module>
coreml_model = coremltools.converters.caffe.convert((input_caffe_path, proto_file))
File "/Users/adele/Desktop/vay-sports/onnx-converter/.env/lib/python3.7/site-packages/coremltools/converters/caffe/_caffe_converter.py", line 192, in convert
predicted_feature_name)
File "/Users/adele/Desktop/vay-sports/onnx-converter/.env/lib/python3.7/site-packages/coremltools/converters/caffe/_caffe_converter.py", line 260, in _export
predicted_feature_name)
RuntimeError: Unable to load caffe network Prototxt file: no_norm_param.deploy.prototext
To me this is a bit strange, because when I look at my prototext file (https://github.com/adelekuzmiakova/onnx-converter/blob/master/no_norm_param.deploy.prototext), there is no permute_param in it. My prototext file, caffe model, and code can all be found here: https://github.com/adelekuzmiakova/onnx-converter
Did anyone else run into this problem? Do you know what might be going on? Or does it have something to do with SSD? Many thanks!
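One hedged diagnostic (a sketch, not a fix): the parser error points at line 1010, column 17 of whatever prototxt it actually read, so printing that region can confirm which file, and which layer definition, the converter is choking on:
# Inspect the region around line 1010 of the prototxt named in the error
# (assumes the file is in the working directory; a shorter file just prints less).
with open('no_norm_param.deploy.prototext') as f:
    lines = f.readlines()
for i, line in enumerate(lines[1004:1014], start=1005):
    print(i, line.rstrip())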

Function not working, Syntax errors and more

The other day I was working on an image-captioning model in Keras. But when I run it, I am facing a host of errors. Note that I am using the Atom editor and a Python virtual environment, running everything from the command line.
train_features = load_photo_features(os.path('C:/Users/neelg/Documents/Atom_projects/Main/features.pkl'), train)
On this line, I am receiving this error:
File "C:\Users\neelg\Documents\Atom_projects\Main\Img_cap.py", line 143
train_features = load_photo_features(os.path('C:/Users/neelg/Documents/Atom_projects/Main/features.pkl'), train)
^
SyntaxError: invalid syntax
I think the syntax of the function call is correct, yet the error persists. So, in a separate file, I copied the function and tried to isolate the problem.
Code for the standalone function:
from pickle import load
import os

def load_photo_features(filename, dataset):
    all_features = load(open(filename, 'rb'))
    features = {k: all_features[k] for k in dataset}
    return features

filename = 'C:/Users/neelg/Documents/Atom_projects/Main/Flickr8k_text/Flickr8k.trainImages.txt'
train_features = load_photo_features(os.path('C:/Users/neelg/Documents/Atom_projects/Main/features.pkl'), train)
Now, a different type of problem crops up:
Traceback (most recent call last):
File "C:\Users\neelg\Documents\Atom_projects\Main\testing.py", line 10, in <module>
train_features = load_photo_features(os.path('C:/Users/neelg/Documents/Atom_projects/Main/features.pkl'), train)
TypeError: 'module' object is not callable
Any help? I am trying to import the Flickr_8k dataset, which contains random pictures, plus another small dataset holding the labels of those photographs...
P.S.: Please test the code in your own editor before sending suggestions, because I suspect there is some core problem arising from the system encoding (as suggested by some others). Also, it is not possible to post the whole code, due to its length and its dependence on multiple files.
This error comes from the fact that you're calling os.path, which is a module, not a function. Just remove it; you don't need it in this use case, since a plain string is enough for the filename in open.
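A minimal sketch of the corrected call (assuming train is the collection of image IDs already loaded earlier in the script):
# Pass the path string directly; open() accepts it as-is
train_features = load_photo_features(
    'C:/Users/neelg/Documents/Atom_projects/Main/features.pkl', train)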
I was about to ask you the same question as @ted: why do you use os.path when you are just trying to load the file?
Normally, I use the following code for loading from pickle:
import pickle

def load_obj(filename):
    with open(filename, "rb") as fp:
        return pickle.load(fp, encoding='bytes')
Furthermore, if I try something like this, it works:
from pickle import load
import os
import pdb

def load_photo_features(filename):
    all_features = load(open(filename, 'rb'))
    pdb.set_trace()
    #features = {k: all_features[k] for k in dataset}
    #return features

train_features = load_photo_features('train.pkl')
I do not know what the dataset input should be in order to go further, but loading the pickle file itself works fine.

Plotting decision tree, graphviz, pydotplus

I'm following the decision tree tutorial in the scikit-learn documentation.
I have pydotplus 2.0.2, but it is telling me that there is no write method - error below. I've been struggling with this for a while now; any ideas, please? Many thanks!
from sklearn import tree
from sklearn.datasets import load_iris
iris = load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)
from IPython.display import Image
dot_data = tree.export_graphviz(clf, out_file=None)
import pydotplus
graph = pydotplus.graphviz.graph_from_dot_data(dot_data)
Image(graph.create_png())
and my error is
/Users/air/anaconda/bin/python /Users/air/PycharmProjects/kiwi/hemr.py
Traceback (most recent call last):
File "/Users/air/PycharmProjects/kiwi/hemr.py", line 10, in <module>
dot_data = tree.export_graphviz(clf, out_file=None)
File "/Users/air/anaconda/lib/python2.7/site-packages/sklearn/tree/export.py", line 375, in export_graphviz
out_file.write('digraph Tree {\n')
AttributeError: 'NoneType' object has no attribute 'write'
Process finished with exit code 1
----- UPDATE -----
Using the fix with out_file, it throws another error:
Traceback (most recent call last):
File "/Users/air/PycharmProjects/kiwi/hemr.py", line 13, in <module>
graph = pydotplus.graphviz.graph_from_dot_data(dot_data)
File "/Users/air/anaconda/lib/python2.7/site-packages/pydotplus/graphviz.py", line 302, in graph_from_dot_data
return parser.parse_dot_data(data)
File "/Users/air/anaconda/lib/python2.7/site-packages/pydotplus/parser.py", line 548, in parse_dot_data
if data.startswith(codecs.BOM_UTF8):
AttributeError: 'NoneType' object has no attribute 'startswith'
---- UPDATE 2 -----
Also, see my own answer below, which solves another problem.
The problem is that you are setting the parameter out_file to None.
If you look at the documentation, when you set it to None the function returns the string directly instead of creating a file. And of course a string does not have a write method.
Therefore, do as follows:
dot_data = tree.export_graphviz(clf)
graph = pydotplus.graphviz.graph_from_dot_data(dot_data)
The graph_from_dot_data() method didn't work for me, even after specifying a proper path for out_file.
Instead, try the graph_from_dot_file method:
graph = pydotplus.graphviz.graph_from_dot_file("iris.dot")
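For that to work, the .dot file has to exist on disk first; a minimal sketch reusing clf from the question:
# Write the dot file to disk, then parse it back from its path
tree.export_graphviz(clf, out_file="iris.dot")
graph = pydotplus.graphviz.graph_from_dot_file("iris.dot")
Image(graph.create_png())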
I met the same error this morning. I use Python 3.x, and here is how I solved the problem.
from sklearn import tree
from sklearn.datasets import load_iris
from IPython.display import Image
import io
iris = load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)
# Let's give dot_data some space so it will not feel nervous any more
dot_data = io.StringIO()
tree.export_graphviz(clf, out_file=dot_data)
import pydotplus
graph = pydotplus.graphviz.graph_from_dot_data(dot_data.getvalue())
# make sure you have graphviz installed and set in path
Image(graph.create_png())
If you use Python 2.x, I believe you need to change "import io" to:
import StringIO
and,
dot_data = StringIO.StringIO()
Hope it helps.
Another problem was my matplotlib backend settings! It is solved nicely here: you just need to look up that settings file and change the backend, or set mpl.use("TkAgg") in the code, as suggested there in the comments. After that I only got an error that pydotplus couldn't find my Graphviz executable, so I reinstalled Graphviz via Homebrew: brew install graphviz. That solved the issue and I can make plots now!!
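For reference, a minimal sketch of the in-code variant mentioned above (the backend has to be selected before pyplot is imported):
import matplotlib as mpl
mpl.use("TkAgg")  # select the backend before importing pyplot
import matplotlib.pyplot as plt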
What really helped me solve the problem: I executed the code as the same user under which Graphviz was installed. Running it as any other user produces this error.
I would suggest avoiding graphviz and using the following alternative approach:
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

plt.figure(figsize=(60, 30))
plot_tree(clf, filled=True)
