Function not working, Syntax errors and more - python-3.x

The other day I was working on an image captioning model in Keras, but when I run it I get a host of errors. Note that I am using the Atom editor and a Python virtual environment, and I run everything from the command line.
train_features = load_photo_features(os.path('C:/Users/neelg/Documents/Atom_projects/Main/features.pkl'), train)
On this line, I get the following error:
File "C:\Users\neelg\Documents\Atom_projects\Main\Img_cap.py", line 143
train_features = load_photo_features(os.path('C:/Users/neelg/Documents/Atom_projects/Main/features.pkl'), train)
^
SyntaxError: invalid syntax
I believe the syntax of the function call is correct, yet the error persists. So I copied the function into a separate file to try to isolate the problem.
Code for the standalone function:
from pickle import load
import os
def load_photo_features(filename, dataset):
    all_features = load(open(filename, 'rb'))
    features = {k: all_features[k] for k in dataset}
    return features
filename = 'C:/Users/neelg/Documents/Atom_projects/Main/Flickr8k_text/Flickr8k.trainImages.txt'
train_features = load_photo_features(os.path('C:/Users/neelg/Documents/Atom_projects/Main/features.pkl'), train)
Now a different kind of problem crops up:
Traceback (most recent call last):
File "C:\Users\neelg\Documents\Atom_projects\Main\testing.py", line 10, in <module>
train_features = load_photo_features(os.path('C:/Users/neelg/Documents/Atom_projects/Main/features.pkl'), train)
TypeError: 'module' object is not callable
Any help? I am trying to import the Flickr_8k dataset, which contains assorted pictures, along with a second, smaller dataset that holds the labels for those photographs...
P.S. Please test the code in your own editor before sending suggestions, because I suspect a core problem caused by the system encoding (as others have suggested). Also, I cannot post the whole code because of its length and its dependence on multiple files.

This error occurs because you are calling os.path, which is a module, not a function. Just remove the call; you don't need it here, since open() accepts a plain string as the filename.
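A minimal sketch of the fix described above, assuming a small stand-in pickle written to a temporary directory in place of the real features.pkl (the image ids and feature values are made up for illustration). It passes the path as a plain string, and it also shows that calling os.path raises the same TypeError as in the question:

```python
import os
import pickle
import tempfile


def load_photo_features(filename, dataset):
    """Load the pickled feature dict and keep only the keys listed in `dataset`."""
    with open(filename, "rb") as fp:
        all_features = pickle.load(fp)
    return {k: all_features[k] for k in dataset}


# Hypothetical stand-in for features.pkl (made-up ids and values).
tmp_dir = tempfile.mkdtemp()
features_path = os.path.join(tmp_dir, "features.pkl")
with open(features_path, "wb") as fp:
    pickle.dump({"img1": [0.1, 0.2], "img2": [0.3, 0.4], "img3": [0.5, 0.6]}, fp)

# `train` is assumed to be a collection of image ids, as in the question.
train = {"img1", "img3"}

# A plain string path is all that open() needs.
train_features = load_photo_features(features_path, train)
print(sorted(train_features))

# os.path is a module, so calling it fails exactly as in the traceback.
os_path_error = None
try:
    os.path(features_path)
except TypeError as exc:
    os_path_error = str(exc)
print(os_path_error)
```

If the path has to be assembled from parts, os.path.join(...) is the function to call; os.path itself only groups such helpers.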

I was about to ask the same question as @ted: why do you use os.path when you are only trying to load a file?
Normally, I use the following code to load from a pickle:
import pickle
def load_obj(filename):
    with open(filename, "rb") as fp:
        return pickle.load(fp, encoding='bytes')
Furthermore, if I try something like this, it works:
from pickle import load
import os
import pdb
def load_photo_features(filename):
    all_features = load(open(filename, 'rb'))
    pdb.set_trace()
    #features = {k: all_features[k] for k in dataset}
    #return features
train_features = load_photo_features('train.pkl')
I do not know what the dataset argument should contain, so I cannot go further, but loading the pickle file itself works fine.

Related

Gensim: Not able to load the id2word file

I am working on topic inference on a new corpus using a previously trained LDA model. I can load the model without problems, but I cannot load the id2word file to create the corpora.Dictionary object needed to map the new corpus to numbers: the load method raises an AttributeError about a dict, and I don't know why. Below is the minimal code that reproduces the situation, including the packages used.
Thank you in advance for your response...
import numpy as np
import os
import pandas as pd
import gensim
from gensim import corpora
import datetime
import nltk
model_name = "lda_sub_full_35"
dictionary_name = "lda_sub_full_35.id2word"
model_for_inference = gensim.models.LdaModel.load(model_name, mmap='r')
print('Successfully load the model')
lda_dictionary = corpora.Dictionary.load(dictionary_name, mmap='r')
I expect both the dictionary and the model to load, but when I load the dictionary I get the error below:
File "topic_inference.py", line 31, in <module>
lda_dictionary = corpora.Dictionary.load(dictionary_name, mmap='r')
File "/topic_modeling/env/lib/python3.8/site-packages/gensim/utils.py", line 487, in load
obj._load_specials(fname, mmap, compress, subname)
AttributeError: 'dict' object has no attribute '_load_specials'
How were the contents of the lda_sub_full_35.id2word file originally saved?
Only if it was saved by a Gensim corpora.Dictionary object's .save() method should it be loaded as you've tried, with corpora.Dictionary.load().
If, by any chance, it was just a plain Python dict written out as a pickle by some other method, then you would need to load it in a symmetrically matched way. That might be as simple as:
import pickle
with open(path, 'rb') as f:
    lda_dictionary = pickle.load(f)
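To find out which case applies, one option is to unpickle the file and look at the type of the object it holds. The sketch below assumes a plain dict written with pickle.dump() to a temporary file as a stand-in for the real id2word file; the helper name load_and_describe is hypothetical. Only unpickle files you trust, since unpickling can execute arbitrary code.

```python
import os
import pickle
import tempfile


def load_and_describe(path):
    """Unpickle `path` and report the type name of the stored object."""
    with open(path, "rb") as f:
        obj = pickle.load(f)
    return type(obj).__name__, obj


# Hypothetical stand-in for an id2word file that was written with pickle.dump().
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.id2word")
with open(path, "wb") as f:
    pickle.dump({0: "topic", 1: "model"}, f)

type_name, id2word = load_and_describe(path)
print(type_name)
```

A result of dict suggests the file was not written by a Dictionary object's .save() method, which would match the AttributeError in the question.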

AttributeError when not trying to use any attribute

Good morning!
I'm using code (Python 3.8) that runs both on a local PC and on a server accessed over SSH. At one point, I load data from a pickle with the following piece of code:
from os.path import exists
import _pickle as pickle
def load_pickle(pickle_file):
    if exists(pickle_file):
        with open(pickle_file, 'rb') as f:
            loaded_dic = pickle.load(f)
        return loaded_dic
    else:
        return 'Pickle not found'
pickle_file is a string with the path of the pickle. If the pickle exists, the function returns a dictionary, while if it doesn't exist, it returns the string 'Pickle not found'.
On my local PC the code works perfectly and loads the dict without problems. On the SSH server, however, the dict is apparently loaded, but if I try to inspect it just by typing loaded_dic, it throws the following error:
AttributeError: 'NoneType' object has no attribute 'axes'
Because of this, the rest of my code fails when it tries to use the variable loaded_dic.
Thank you very much in advance!
I have a similar problem. For me it happens when I store pandas DataFrames in a dictionary and save this dict as a pickle with pandas version '1.1.1'.
When I read the dictionary pickle with pandas version '0.25.3' on another server, I get the same error.
Both have pickle version 4.0 and I do not have a solution yet, other than upgrading to similar pandas versions.
I made a small example; it also happens when I store just a DataFrame. Saving it on one machine:
import pandas as pd
print("Pandas version", pd.__version__)
df = pd.DataFrame([1, 2, 3])
df.to_pickle('df.pkl')
Pandas version 1.1.1
Then loading it on another machine:
import pandas as pd
print("Pandas version", pd.__version__)
df = pd.read_pickle('df.pkl')
print(type(df))
Pandas version 0.25.3
<class 'pandas.core.frame.DataFrame'>
print(len(df))
results in this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-8-9f6b6d8c3cd3> in <module>
----> 1 print(len(df))
/opt/conda/lib/python3.7/site-packages/pandas/core/frame.py in __len__(self)
994 Returns length of info axis, but here we use the index.
995 """
--> 996 return len(self.index)
997
998 def dot(self, other):
/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
5173 or name in self._accessors
5174 ):
-> 5175 return object.__getattribute__(self, name)
5176 else:
5177 if self._info_axis._can_hold_identifiers_and_holds_name(name):
pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__get__()
AttributeError: 'NoneType' object has no attribute 'axes'
Avi's answer helped me. I had pickled with a later version of Pandas and was trying to read the pickle file with an earlier version.
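One way to catch such a mismatch early is to record the writer's library version in the file and compare it before unpickling the data. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, the version strings are passed in by the caller (in practice they could come from pd.__version__), and the payload is a plain dict so the example runs without pandas. It writes two pickles back to back into one file, so the result is not readable with pd.read_pickle.

```python
import os
import pickle
import tempfile


def save_with_version(obj, path, writer_version):
    """Write a small version header first, then the actual payload."""
    with open(path, "wb") as f:
        pickle.dump(writer_version, f)
        pickle.dump(obj, f)


def load_with_version_check(path, reader_version):
    """Read the header, and only unpickle the payload if the versions match."""
    with open(path, "rb") as f:
        writer_version = pickle.load(f)
        if writer_version != reader_version:
            raise RuntimeError(
                f"pickle written with version {writer_version}, "
                f"but reading with version {reader_version}"
            )
        return pickle.load(f)


tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "data.pkl")
save_with_version({"values": [1, 2, 3]}, path, writer_version="1.1.1")

# Same version on both sides: the payload is returned.
same_version_result = load_with_version_check(path, reader_version="1.1.1")

# Different version: fail loudly instead of returning a half-working object.
mismatch_message = None
try:
    load_with_version_check(path, reader_version="0.25.3")
except RuntimeError as exc:
    mismatch_message = str(exc)
print(mismatch_message)
```

Exact equality is a deliberately strict choice for the sketch; a real check might accept any reader version at least as new as the writer.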
This code clearly doesn't work anywhere; it doesn't return loaded_dic, a local variable, so nothing can use it. Change it to:
return pickle.load(f)
and the caller will receive the loaded dict instead of the default return value, None.
Update for edited question: With the return, your code works as written. Your pickle file on that machine must contain the result of pickling None rather than whatever you expected. Or your code is broken in some other place we haven't seen. The loading code is fine and is behaving exactly as it's supposed to.
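The difference the return statement makes can be seen in a small sketch. The two function names below are hypothetical, and the pickle is a throwaway file created in a temporary directory:

```python
import os
import pickle
import tempfile


def load_without_return(path):
    """Loads the pickle into a local variable and then discards it."""
    with open(path, "rb") as f:
        loaded_dic = pickle.load(f)
    # No return statement, so the caller receives None.


def load_with_return(path):
    """Loads the pickle and hands the object back to the caller."""
    with open(path, "rb") as f:
        return pickle.load(f)


tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "data.pkl")
with open(path, "wb") as f:
    pickle.dump({"a": 1}, f)

print(load_without_return(path))
print(load_with_return(path))
```

Letting open() raise FileNotFoundError for a missing file, rather than returning a string such as 'Pickle not found', also keeps the caller from mistaking an error message for data.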

How to read multiple text files as strings from two folders at the same time using readline() in python?

I currently have a version of the following script that uses two simple readline() snippets to read a single-line .txt file from each of two different folders. It runs under Ubuntu 18.04 and Python 3.67 and does not use glob.
I now get a NameError when trying to read multiple text files from the same folders using 'sorted.glob'.
readlines() causes an error because the input from the .txt files must be strings, not lists.
I am new to Python. I have tried online Python formatters, reindent.py, etc., but without success.
I am hoping it's a simple indentation issue, so that it won't be a problem in future scripts.
Current error from code below:
Traceback (most recent call last):
File "v1-ReadFiles.py", line 21, in <module>
context_input = GenerationInput(P1=P1, P3=P3,
NameError: name 'P1' is not defined
Current modified script:
import glob
import os
from src.model_use import TextGeneration
from src.utils import DEFAULT_DECODING_STRATEGY, LARGE
from src.flexible_models.flexible_GPT2 import FlexibleGPT2
from src.torch_loader import GenerationInput
from transformers import GPT2LMHeadModel, GPT2Tokenizer
for name in sorted(glob.glob('P1_files/*.txt')):
    with open(name) as f:
        P1 = f.readline()
for name in sorted(glob.glob('P3_files/*.txt')):
    with open(name) as f:
        P3 = f.readline()
if __name__ == "__main__":
    context_input = GenerationInput(P1=P1, P3=P3,
                                    genre=["mystery"],
                                    persons=["Steve"],
                                    size=LARGE,
                                    summary="detective")
    print("PREDICTION WITH CONTEXT WITH SPECIAL TOKENS")
    model = GPT2LMHeadModel.from_pretrained('models/custom')
    tokenizer = GPT2Tokenizer.from_pretrained('models/custom')
    tokenizer.add_special_tokens(
        {'eos_token': '[EOS]',
         'pad_token': '[PAD]',
         'additional_special_tokens': ['[P1]', '[P2]', '[P3]', '[S]', '[M]', '[L]', '[T]', '[Sum]', '[Ent]']}
    )
    model.resize_token_embeddings(len(tokenizer))
    GPT2_model = FlexibleGPT2(model, tokenizer, DEFAULT_DECODING_STRATEGY)
    text_generator_with_context = TextGeneration(GPT2_model, use_context=True)
    predictions = text_generator_with_context(context_input, nb_samples=1)
    for i, prediction in enumerate(predictions):
        print('prediction n°', i, ': ', prediction)
Thanks to afghanimah here:
Problem with range() function when used with readline() or counter - reads and processes only last line in files
I dropped glob and also moved all the model-loading calls (model = ... etc.) before the 'with open ...' block:
with open("data/test-P1-Multi.txt","r") as f1, open("data/test-P3-Multi.txt","r") as f3:
    for i in range(5):
        P1 = f1.readline()
        P3 = f3.readline()
        context_input = GenerationInput(P1=P1, P3=P3, size=LARGE)
etc.
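If the one-file-per-input layout with glob is still wanted, a possible sketch follows. In the glob version of the script, a pattern that matches no files means the loop body never runs, so P1 is never assigned, which is one likely cause of the NameError; in addition, each pass through the loop overwrites P1, so only the last file would survive. The helper name read_first_lines and the sample texts below are hypothetical; the folder names follow the question.

```python
import glob
import os
import tempfile


def read_first_lines(folder):
    """Return the first line of every .txt file in `folder`, in sorted filename order."""
    lines = []
    for name in sorted(glob.glob(os.path.join(folder, "*.txt"))):
        with open(name) as f:
            lines.append(f.readline().rstrip("\n"))
    return lines


# Build throwaway folders with made-up content so the sketch runs anywhere.
root = tempfile.mkdtemp()
samples = (
    ("P1_files", ["first opening", "second opening"]),
    ("P3_files", ["first ending", "second ending"]),
)
for folder, texts in samples:
    os.makedirs(os.path.join(root, folder))
    for i, text in enumerate(texts):
        with open(os.path.join(root, folder, f"{i}.txt"), "w") as f:
            f.write(text + "\n")
os.makedirs(os.path.join(root, "empty"))

p1_lines = read_first_lines(os.path.join(root, "P1_files"))
p3_lines = read_first_lines(os.path.join(root, "P3_files"))

# Pair the inputs up; each pair would feed one GenerationInput(P1=..., P3=...) call.
pairs = list(zip(p1_lines, p3_lines))
print(pairs)

# A folder with no matching files yields an empty list instead of an undefined name.
no_lines = read_first_lines(os.path.join(root, "empty"))
print(no_lines)
```

Collecting the lines into lists keeps every file's content and makes the empty case visible, so it can be checked before any model code runs.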

Can't load HDF5 in python

I am following this tutorial: https://github.com/fastai/fastai/tree/master/courses/dl2/imdb_scripts
I downloaded the pre-trained model in part 3b.
I want to open the .h5 files and look/use the weights. I tried to use python to do this, but it is not opening.
Here’s the code I used:
import tables
import pandas as pd
filename = "…bwd_wt103.h5"
file = tables.open_file(filename)
Here’s the error:
OSError: HDF5 error back trace
File "C:\ci\hdf5_1525883595717\work\src\H5F.c", line 511, in H5Fopen
unable to open file
File "C:\ci\hdf5_1525883595717\work\src\H5Fint.c", line 1604, in H5F_open
unable to read superblock
File "C:\ci\hdf5_1525883595717\work\src\H5Fsuper.c", line 413, in H5F__super_read
file signature not found
End of HDF5 error back trace
Unable to open/create file 'C:/Users/Rishabh/Documents/School and Work/Classes/8
Fall2019/Senior Design/ULMFiT/Wiki Data/wt103/models/bwd_wt103.h5'
I also used The HDF Group HDF Viewer: https://support.hdfgroup.org/products/java/release/download.html
But that didn’t work either. It gave an error saying “Failed to open the file… Unsupported format”
Is there a way to load the weights in Python? I ultimately want to access the last layer of the stacked LSTMs to create word embeddings.
Thanks in advance.
That's because the file is a torch model rather than a real HDF5 file, despite the .h5 extension. You can load it on your local machine using torch like so:
>>> import torch
>>> filename = "bwd_wt103.h5"
>>> f = torch.load(filename, map_location=torch.device('cpu'))
Now, let's explore it:
>>> type(f)
OrderedDict
>>> len(f.keys())
15
>>> list(f.keys())
['0.encoder.weight',
'0.encoder_with_dropout.embed.weight',
'0.rnns.0.module.weight_ih_l0',
'0.rnns.0.module.bias_ih_l0',
'0.rnns.0.module.bias_hh_l0',
'0.rnns.0.module.weight_hh_l0_raw',
'0.rnns.1.module.weight_ih_l0',
'0.rnns.1.module.bias_ih_l0',
'0.rnns.1.module.bias_hh_l0',
'0.rnns.1.module.weight_hh_l0_raw',
'0.rnns.2.module.weight_ih_l0',
'0.rnns.2.module.bias_ih_l0',
'0.rnns.2.module.bias_hh_l0',
'0.rnns.2.module.weight_hh_l0_raw',
'1.decoder.weight']
You can access the weights of 0.rnns.2.module.weight_hh_l0_raw like so:
>>> wts = f['0.rnns.2.module.weight_hh_l0_raw']
>>> wts.shape
torch.Size([1600, 400])
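The "file signature not found" message in the question can be checked directly: an HDF5 file begins with a fixed 8-byte signature. The sketch below compares the first bytes of a file against it, using two throwaway files in place of the real checkpoint; the helper name looks_like_hdf5 is hypothetical. It assumes the signature sits at offset 0, although the format also permits it at later offsets after a user block, so treat a False result as a hint rather than proof.

```python
import os
import pickle
import tempfile

# The 8-byte signature that opens an HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file starts with the HDF5 signature (offset 0 only)."""
    with open(path, "rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


tmp_dir = tempfile.mkdtemp()

# Stand-in for a real HDF5 file: the signature followed by filler bytes.
hdf5_like = os.path.join(tmp_dir, "real.h5")
with open(hdf5_like, "wb") as f:
    f.write(HDF5_SIGNATURE + b"\x00" * 16)

# Stand-in for a file that merely carries the .h5 extension (here: a pickle).
not_hdf5 = os.path.join(tmp_dir, "weights.h5")
with open(not_hdf5, "wb") as f:
    pickle.dump({"0.encoder.weight": [0.0]}, f)

print(looks_like_hdf5(hdf5_like))
print(looks_like_hdf5(not_hdf5))
```

A file that fails this check but carries a .h5 extension is a candidate for other loaders, such as the torch.load call shown above.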

Plotting decision tree, graphviz, pydotplus

I'm following the decision tree tutorial in the scikit-learn documentation.
I have pydotplus 2.0.2, but I get an error saying that an object has no write method (see below). I've been struggling with this for a while now; any ideas, please? Many thanks!
from sklearn import tree
from sklearn.datasets import load_iris
iris = load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)
from IPython.display import Image
dot_data = tree.export_graphviz(clf, out_file=None)
import pydotplus
graph = pydotplus.graphviz.graph_from_dot_data(dot_data)
Image(graph.create_png())
and my error is
/Users/air/anaconda/bin/python /Users/air/PycharmProjects/kiwi/hemr.py
Traceback (most recent call last):
File "/Users/air/PycharmProjects/kiwi/hemr.py", line 10, in <module>
dot_data = tree.export_graphviz(clf, out_file=None)
File "/Users/air/anaconda/lib/python2.7/site-packages/sklearn/tree/export.py", line 375, in export_graphviz
out_file.write('digraph Tree {\n')
AttributeError: 'NoneType' object has no attribute 'write'
Process finished with exit code 1
----- UPDATE -----
Using the fix with out_file, it throws another error:
Traceback (most recent call last):
File "/Users/air/PycharmProjects/kiwi/hemr.py", line 13, in <module>
graph = pydotplus.graphviz.graph_from_dot_data(dot_data)
File "/Users/air/anaconda/lib/python2.7/site-packages/pydotplus/graphviz.py", line 302, in graph_from_dot_data
return parser.parse_dot_data(data)
File "/Users/air/anaconda/lib/python2.7/site-packages/pydotplus/parser.py", line 548, in parse_dot_data
if data.startswith(codecs.BOM_UTF8):
AttributeError: 'NoneType' object has no attribute 'startswith'
---- UPDATE 2 -----
Also, see my own answer below, which solves another problem.
The problem is that you are setting the parameter out_file to None.
According to the documentation, if you set it to None the function returns the DOT source directly as a string and does not create a file. And of course a string does not have a write method.
Therefore, do as follows:
dot_data = tree.export_graphviz(clf)
graph = pydotplus.graphviz.graph_from_dot_data(dot_data)
The graph_from_dot_data() method didn't work for me even after specifying a proper path for out_file.
Instead, try using the graph_from_dot_file method:
graph = pydotplus.graphviz.graph_from_dot_file("iris.dot")
I ran into the same error this morning. I use Python 3.x, and here is how I solved the problem.
from sklearn import tree
from sklearn.datasets import load_iris
from IPython.display import Image
import io
iris = load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)
# Let's give dot_data some space so it will not feel nervous any more
dot_data = io.StringIO()
tree.export_graphviz(clf, out_file=dot_data)
import pydotplus
graph = pydotplus.graphviz.graph_from_dot_data(dot_data.getvalue())
# make sure you have graphviz installed and set in path
Image(graph.create_png())
If you use Python 2.x, I believe you need to replace "import io" with:
import StringIO
and,
dot_data = StringIO.StringIO()
Hope it helps.
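Why the io.StringIO buffer helps can be shown without scikit-learn. In the sketch below, write_dot is a hypothetical stand-in for any exporter that calls out_file.write(...); it is not the scikit-learn function. Passing a StringIO object gives it something to write into, while passing None reproduces the AttributeError from the question:

```python
import io


def write_dot(out_file):
    """Hypothetical exporter: writes a tiny DOT graph to a file-like object."""
    out_file.write("digraph Tree {\n")
    out_file.write("0 -> 1 ;\n")
    out_file.write("}\n")


# An in-memory text buffer has a write() method, so the exporter is satisfied.
buffer = io.StringIO()
write_dot(buffer)
dot_text = buffer.getvalue()
print(dot_text)

# None has no write() method, which is the error shown in the traceback.
none_error = None
try:
    write_dot(None)
except AttributeError as exc:
    none_error = str(exc)
print(none_error)
```

The string returned by getvalue() is what would then be handed to graph_from_dot_data().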
Another problem was the backend setting used with my Graphviz setup! It is solved nicely here: you just need to look up that settings file and change the backend, or call mpl.use("TkAgg") in the code, as suggested there in the comments. After that, the only remaining error was that pydotplus couldn't find my Graphviz executable, so I reinstalled Graphviz via Homebrew (brew install graphviz), which solved the issue, and I can make plots now!
What really helped me solve the problem was this:
I executed the code as the same user who installed Graphviz. Executing it as any other user would give your error.
I would suggest avoiding graphviz and using the following alternative approach:
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree
# clf is the fitted classifier from the question
plt.figure(figsize=(60,30))
plot_tree(clf, filled=True)
