I am learning TensorBoard. After running the code below, the logs folder was not created. How can I fix this?
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter('logs')
# writer.add_image()
for i in range(100):
    writer.add_scalar("y=x", i, i)
writer.close()
We have a large data-processing program written in Python 3.6.5 that runs inside a Linux Docker container. It works for smaller datasets, but for larger datasets (more than 80K records) it fails with a PicklingError.
Code Structure:
from functools import partial
from multiprocessing import Pool

class workclass(object):
    def f(self, a, b, c):
        # data processing
        pass

def main():
    w = workclass
    sample_dict = {}
    # sample_dict is then loaded from the database
    sample_list = list(sample_dict.keys())
    func = partial(w.f, sample_dict, 'env')
    p = Pool(5)
    result = p.map(func, sample_list)
    print(result)

if __name__ == '__main__':
    main()
It fails in python/reduction.py (save_global, line 922) with the error: picklingerror: cant pickle 'java.lang.Integer'
Any suggestion will be of great help.
Strangely, the same program works fine on Windows but fails on Linux and in the Linux Docker container.
I have tried running my Fashion-MNIST code from Google Colab using this code, but it gets stuck while downloading. I also switched between the hardware accelerators, but nothing changed. Is there any workaround for this problem?
For Google Colab
At the top write !pip install mnist. Use import mnist.
Then simply store the images and labels:
train_images = mnist.train_images()
train_labels = mnist.train_labels()
test_images = mnist.test_images()
test_labels = mnist.test_labels()
That's it!!!
You can download it from the github repository.
Put the downloaded files (from the readme links) in a directory in your current path called data/fashion/, then you can use their loader.
def load_mnist(path, kind='train'):
    """Load MNIST data from `path`"""
    import os
    import gzip
    import numpy as np

    labels_path = os.path.join(path,
                               '%s-labels-idx1-ubyte.gz'
                               % kind)
    images_path = os.path.join(path,
                               '%s-images-idx3-ubyte.gz'
                               % kind)

    # offset=8 skips the label file header
    with gzip.open(labels_path, 'rb') as lbpath:
        labels = np.frombuffer(lbpath.read(), dtype=np.uint8,
                               offset=8)

    # offset=16 skips the image file header; each image is 28x28 = 784 pixels
    with gzip.open(images_path, 'rb') as imgpath:
        images = np.frombuffer(imgpath.read(), dtype=np.uint8,
                               offset=16).reshape(len(labels), 784)

    return images, labels
X_train, y_train = load_mnist('data/fashion', kind='train')
X_test, y_test = load_mnist('data/fashion', kind='t10k')
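The offsets 8 and 16 used in load_mnist come from the IDX file layout: a 4-byte magic number followed by one 4-byte big-endian size per dimension. As a rough sketch, the header can be parsed with the standard library alone. The tiny byte strings below are synthetic stand-ins made up for illustration, not the real dataset files, and read_idx_header is a hypothetical helper name.

```python
import gzip
import struct

def read_idx_header(raw):
    """Return (magic, dims) parsed from the start of an IDX byte string."""
    # The lowest byte of the magic number is the number of dimensions.
    magic, = struct.unpack(">I", raw[:4])
    ndim = magic & 0xFF
    dims = struct.unpack(">" + "I" * ndim, raw[4:4 + 4 * ndim])
    return magic, dims

# Synthetic files (made up for this demo): 2 labels, and 2 blank 28x28 images.
labels_raw = struct.pack(">II", 0x00000801, 2) + bytes([9, 0])
images_raw = struct.pack(">IIII", 0x00000803, 2, 28, 28) + bytes(2 * 28 * 28)

# Round-trip through gzip, since the real files are gzip-compressed.
labels_raw = gzip.decompress(gzip.compress(labels_raw))
images_raw = gzip.decompress(gzip.compress(images_raw))

label_magic, label_dims = read_idx_header(labels_raw)
image_magic, image_dims = read_idx_header(images_raw)

# Header size = 4 bytes of magic + 4 bytes per dimension.
label_header_size = 4 + 4 * len(label_dims)
image_header_size = 4 + 4 * len(image_dims)

labels = list(labels_raw[label_header_size:])
pixel_count = len(images_raw) - image_header_size
```

Reading the sizes from the header instead of hard-coding 784 would also let the same loader handle IDX files with other image dimensions.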
The other option would be to use the torchvision FMNIST dataset.
Edit
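You can also use the following (TensorFlow 1.x only; the tensorflow.examples.tutorials module was removed in TensorFlow 2):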
from tensorflow.examples.tutorials.mnist import input_data
data = input_data.read_data_sets('data/fashion', source_url='http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/')
Edit 2
Here is the code for downloading the files (it can be improved with some try/except error handling):
import os
import requests

path = 'data/fashion'

def download_fmnist(path):
    DEFAULT_SOURCE_URL = 'http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/'
    files = dict(
        TRAIN_IMAGES='train-images-idx3-ubyte.gz',
        TRAIN_LABELS='train-labels-idx1-ubyte.gz',
        TEST_IMAGES='t10k-images-idx3-ubyte.gz',
        TEST_LABELS='t10k-labels-idx1-ubyte.gz')
    # makedirs also creates the parent 'data' directory if it is missing
    os.makedirs(path, exist_ok=True)
    for f in files:
        filepath = os.path.join(path, files[f])
        if not os.path.exists(filepath):
            url = DEFAULT_SOURCE_URL + files[f]
            r = requests.get(url, allow_redirects=True)
            # close the file handle as soon as the content is written
            with open(filepath, 'wb') as fh:
                fh.write(r.content)
            print('Successfully downloaded', f)

download_fmnist(path)
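One possible way to add the error handling mentioned above is sketched here. It is only an illustration under a few assumptions: it uses urllib from the standard library instead of requests, and the names fetch_bytes, download_files, fake_fetch, and the demo file names are all hypothetical. The fetch parameter exists so the function can be tried without network access.

```python
import os
import tempfile
import urllib.request

def fetch_bytes(url, timeout=30):
    """Default fetcher (stdlib only); raises OSError subclasses on failure."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()

def download_files(path, base_url, filenames, fetch=fetch_bytes):
    """Download each file into `path`; return the names that failed."""
    os.makedirs(path, exist_ok=True)  # also creates missing parent directories
    failed = []
    for name in filenames:
        filepath = os.path.join(path, name)
        if os.path.exists(filepath):
            continue  # already downloaded
        try:
            content = fetch(base_url + name)
        except OSError as exc:  # URLError, HTTPError and timeouts are OSErrors
            print('Could not download', name, '-', exc)
            failed.append(name)
            continue
        # Write under a temporary name so an interrupted run leaves no partial file.
        tmp_path = filepath + '.part'
        with open(tmp_path, 'wb') as fh:
            fh.write(content)
        os.replace(tmp_path, filepath)
    return failed

# Offline dry run with a stub fetcher (hypothetical file names, no network).
def fake_fetch(url):
    if url.endswith('missing.gz'):
        raise OSError('simulated network failure')
    return b'payload'

with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, 'data', 'fashion')
    failed = download_files(target, 'http://example.invalid/',
                            ['ok.gz', 'missing.gz'], fetch=fake_fetch)
    saved = sorted(os.listdir(target))
```

Returning the list of failed names, rather than raising on the first error, lets the caller decide whether to retry only the missing files.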
The command keras.datasets.fashion_mnist.load_data() returns two tuples of NumPy arrays: (xtrain, ytrain) and (xtest, ytest).
Called this way, the dataset is not downloaded into your working directory (Keras keeps its copy in its own cache, by default under ~/.keras/datasets). This is why the command cd fashion-mnist/ raises an error: no such directory was created. The Fashion-MNIST dataset was loaded correctly into (xtrain, ytrain) and (xtest, ytest) in your code.
The other day I was working on an image-captioning model in Keras, but when I run it I get a host of errors. Note that I am using the Atom editor and a Python virtual environment, and I run everything from the command line.
train_features = load_photo_features(os.path('C:/Users/neelg/Documents/Atom_projects/Main/features.pkl'), train)
On this line, I receive this error:
File "C:\Users\neelg\Documents\Atom_projects\Main\Img_cap.py", line 143
train_features = load_photo_features(os.path('C:/Users/neelg/Documents/Atom_projects/Main/features.pkl'), train)
^
SyntaxError: invalid syntax
I think the syntax of the function call is correct, yet the error persists. So I copied the function into a separate file to try to isolate the problem.
Code for the standalone function:
from pickle import load
import os

def load_photo_features(filename, dataset):
    all_features = load(open(filename, 'rb'))
    features = {k: all_features[k] for k in dataset}
    return features

filename = 'C:/Users/neelg/Documents/Atom_projects/Main/Flickr8k_text/Flickr8k.trainImages.txt'
train_features = load_photo_features(os.path('C:/Users/neelg/Documents/Atom_projects/Main/features.pkl'), train)
Now a different type of problem crops up:
Traceback (most recent call last):
File "C:\Users\neelg\Documents\Atom_projects\Main\testing.py", line 10, in <module>
train_features = load_photo_features(os.path('C:/Users/neelg/Documents/Atom_projects/Main/features.pkl'), train)
TypeError: 'module' object is not callable
Any help? I am trying to import the Flickr_8k dataset, which contains random pictures and another small dataset which are the labels of those photographs...
P.S. Please test the code in your own editor before sending suggestions, because I suspect there is some core problem caused by the system encoding (as some others have suggested). Also, I cannot post the whole code because of its length and because it requires multiple files.
This error comes from the fact that you are calling os.path, which is a module, not a function. Just remove it: you don't need it here, because a plain string is enough as the filename for open.
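A small sketch of the difference, using only the standard library. The file name and the dictionary contents are made up for this demo and stand in for the real features.pkl.

```python
import os
import pickle
import tempfile
import types

# os.path is a module that contains helper functions such as os.path.join.
is_module = isinstance(os.path, types.ModuleType)

try:
    os.path('features.pkl')  # same call pattern as in the question
except TypeError as exc:
    error_message = str(exc)

# A plain string path is all that open() needs.
with tempfile.TemporaryDirectory() as tmp_dir:
    filename = os.path.join(tmp_dir, 'features.pkl')  # hypothetical demo file
    with open(filename, 'wb') as fh:
        pickle.dump({'img_1': [0.1, 0.2]}, fh)
    with open(filename, 'rb') as fh:
        all_features = pickle.load(fh)
```

Functions inside the module, such as os.path.join or os.path.exists, remain useful for building and checking paths; only calling the module itself is the error.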
I was about to ask the same question as @ted: why do you use os.path when you are just trying to load the file?
I normally use the following code for loading from pickle:
import pickle

def load_obj(filename):
    with open(filename, "rb") as fp:
        return pickle.load(fp, encoding='bytes')
Furthermore, if I try something like this, it works:
from pickle import load
import os
import pdb
def load_photo_features(filename):
    all_features = load(open(filename, 'rb'))
    pdb.set_trace()
    #features = {k: all_features[k] for k in dataset}
    #return features
train_features = load_photo_features('train.pkl')
I do not know what the dataset argument should be, so I cannot go further, but loading the pickle file works fine.
I am trying to load a LightGBM.Booster from a JSON file pointer, and can't find an example online.
import json
import lightgbm
import numpy as np
X_train = np.arange(0, 200).reshape((100, 2))
y_train = np.tile([0, 1], 50)
tr_dataset = lightgbm.Dataset(X_train, label=y_train)
booster = lightgbm.train({}, train_set=tr_dataset)
model_json = booster.dump_model()
with open('model.json', 'w+') as f:
    json.dump(model_json, f, indent=4)
with open('model.json') as f2:
    model_json = json.load(f2)
How can I create a lightGBM booster from f2 or model_json? This snippet only shows dumping to JSON. model_from_string might help but seems to require an instance of the booster, which I won't have before loading.
There is no method for creating a Booster directly from JSON: I found none in the source code or the documentation, and there is no GitHub issue about it either.
Because of that, I just load models from a text file via
gbm.save_model('model.txt') # gbm is trained Booster instance
# ...
bst = lgb.Booster(model_file='model.txt')
or use pickle to dump and load models:
import pickle
with open('model.pkl', 'wb') as f:
    pickle.dump(gbm, f)
# ...
with open('model.pkl', 'rb') as f:
    gbm = pickle.load(f)
Unfortunately, pickle files are not human-readable (or, at least, these files are not very clear). But it's better than nothing.
I wrote a Flask-based web app that takes text from users and returns the probability that it is of a given classification (full script below). The app loads some of the trained models needed to make predictions before any requests are made. I am currently trying to deploy it on Heroku and experiencing some problems.
I am able to run it locally when I execute python ml_app.py. But when I use the Heroku CLI command heroku local web to run it locally as a test before deployment, I get the following error:
AttributeError: module '__main__' has no attribute 'tokenize'
This error is associated with the loading of a text vectorizer called TFIDF found in the line
tfidf_model = joblib.load('models/tfidf_vectorizer_train.pkl')
I have imported the required function at the top of the script to ensure that this is loaded properly (from utils import tokenize). This works given that I can run it when I use python ml_app.py. But for reasons I do not know, it doesn't load when I use heroku local web. It also doesn't work when I use the Flask CLI command flask run when trying to run it locally. Any idea why?
I admit that I do not have a good understanding of what is going on under the hood here (with respect to the web dev./deployment aspect of the code) so any explanation helps.
from flask import Flask, request, render_template
from sklearn.externals import joblib
from utils import tokenize # custom tokenizer required for tfidf model loaded in load_tfidf_model()
app = Flask(__name__)
models_directory = 'models'
@app.before_first_request
def nbsvm_models():
    global tfidf_model
    global logistic_identity_hate_model
    global logistic_insult_model
    global logistic_obscene_model
    global logistic_severe_toxic_model
    global logistic_threat_model
    global logistic_toxic_model
    tfidf_model = joblib.load('models/tfidf_vectorizer_train.pkl')
    logistic_identity_hate_model = joblib.load('models/logistic_identity_hate.pkl')
    logistic_insult_model = joblib.load('models/logistic_insult.pkl')
    logistic_obscene_model = joblib.load('models/logistic_obscene.pkl')
    logistic_severe_toxic_model = joblib.load('models/logistic_severe_toxic.pkl')
    logistic_threat_model = joblib.load('models/logistic_threat.pkl')
    logistic_toxic_model = joblib.load('models/logistic_toxic.pkl')
@app.route('/')
def my_form():
    return render_template('main.html')
@app.route('/', methods=['POST'])
def my_form_post():
    """
    Takes the comment submitted by the user, apply TFIDF trained vectorizer to it, predict using trained models
    """
    text = request.form['text']
    comment_term_doc = tfidf_model.transform([text])
    dict_preds = {}
    dict_preds['pred_identity_hate'] = logistic_identity_hate_model.predict_proba(comment_term_doc)[:, 1][0]
    dict_preds['pred_insult'] = logistic_insult_model.predict_proba(comment_term_doc)[:, 1][0]
    dict_preds['pred_obscene'] = logistic_obscene_model.predict_proba(comment_term_doc)[:, 1][0]
    dict_preds['pred_severe_toxic'] = logistic_severe_toxic_model.predict_proba(comment_term_doc)[:, 1][0]
    dict_preds['pred_threat'] = logistic_threat_model.predict_proba(comment_term_doc)[:, 1][0]
    dict_preds['pred_toxic'] = logistic_toxic_model.predict_proba(comment_term_doc)[:, 1][0]
    for k in dict_preds:
        perc = dict_preds[k] * 100
        dict_preds[k] = "{0:.2f}%".format(perc)
    return render_template('main.html', text=text,
                           pred_identity_hate=dict_preds['pred_identity_hate'],
                           pred_insult=dict_preds['pred_insult'],
                           pred_obscene=dict_preds['pred_obscene'],
                           pred_severe_toxic=dict_preds['pred_severe_toxic'],
                           pred_threat=dict_preds['pred_threat'],
                           pred_toxic=dict_preds['pred_toxic'])
if __name__ == '__main__':
    app.run(debug=True)
Fixed it. It was due to the way I pickled the class instance stored in tfidf_vectorizer_train.pkl. The model was created in an IPython notebook, where one of its attributes depended on a tokenizer function that I defined interactively in the notebook. I soon learned that pickling does not save the code of a function that an instance refers to; it only records a reference to the function (its module and name), which means tfidf_vectorizer_train.pkl does not contain the function I defined in the notebook.
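As a rough illustration of that behaviour, the sketch below pickles an object that holds a function and checks what ends up in the bytes. The dictionary is only a stand-in for a fitted vectorizer, and textwrap.wrap stands in for the custom tokenizer; both are assumptions chosen so the example runs with the standard library alone.

```python
import pickle
import textwrap

# Stand-in for a fitted vectorizer (an assumption for this demo): a plain
# object that keeps a reference to its tokenizer function.
model = {'ngram_range': (1, 2), 'tokenizer': textwrap.wrap}

payload = pickle.dumps(model)

# Only the module name and the function name are stored.
has_reference = b'textwrap' in payload and b'wrap' in payload
# The body of wrap() uses TextWrapper, but none of that code is in the pickle.
has_function_body = b'TextWrapper' in payload

# Loading works here because textwrap can be imported and still defines wrap.
restored = pickle.loads(payload)
same_function = restored['tokenizer'] is textwrap.wrap
```

A function defined interactively in a notebook is recorded under the module name __main__, so unpickling only succeeds in a process whose __main__ module also defines a function with that name. That matches the AttributeError about module '__main__' having no attribute 'tokenize'.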
To fix this, I moved the tokenizer function to a separate utilities python file and imported the function in both the file where I trained and subsequently pickled the model and in the file where I unpickled it.
In code, I did
from utils import tokenize
...
tfidfvectorizer = TfidfVectorizer(ngram_range=(1, 2), tokenizer=tokenize,
                                  min_df=3, max_df=0.9, strip_accents='unicode',
                                  use_idf=1, smooth_idf=True, sublinear_tf=1)
train_term_doc = tfidfvectorizer.fit_transform(train[COMMENT])
joblib.dump(tfidfvectorizer, 'models/tfidf_vectorizer_train.pkl')
...
in the file where I trained the model and
from utils import tokenize
...
@app.before_first_request
def load_models():
    # from utils import tokenize
    global tfidf_model
    tfidf_model = joblib.load('{}/tfidf_vectorizer_train.pkl'.format(models_directory))
...
in the file containing the web app code.