GPU not used on d3rlpy - pytorch

I am new to using d3rlpy for offline RL training, which makes use of PyTorch. I installed the CUDA 11.6 build as recommended by the PyTorch docs: pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116. I then installed d3rlpy and ran the following sample code:
from d3rlpy.algos import BC,DDPG,CRR,PLAS,PLASWithPerturbation,TD3PlusBC,IQL
import d3rlpy
import numpy as np
import glob
import time
# models
continuous_models = {
    "BehaviorCloning": BC,
    "DeepDeterministicPolicyGradients": DDPG,
    "CriticRegularizedRegression": CRR,
    "PolicyLatentActionSpace": PLAS,
    "PolicyLatentActionSpacePerturbation": PLASWithPerturbation,
    "TwinDelayedPlusBehaviorCloning": TD3PlusBC,
    "ImplicitQLearning": IQL,
}
# load dataset; data_batch is the path to a *.h5 file created with d3rlpy
dataset = d3rlpy.dataset.MDPDataset.load(data_batch)
# preprocess
mean = np.mean(dataset.observations, axis=0, keepdims=True)
std = np.std(dataset.observations, axis=0, keepdims=True)
scaler = d3rlpy.preprocessing.StandardScaler(mean=mean, std=std)
# test models
for _model in continuous_models:
    the_model = continuous_models[_model](scaler = scaler)
    the_model.use_gpu = True
    the_model.build_with_dataset(dataset)
    the_model.fit(dataset = dataset.episodes,
                  n_steps_per_epoch = 10800,
                  n_steps = 54000,
                  logdir = './logs',
                  experiment_name = f"{_model}",
                  tensorboard_dir = 'logs',
                  save_interval = 900, # we don't want to save intermediate parameters
                  )
    # save model
    the_timestamp = int(time.time())
    the_model.save_model(f"./models/{_model}/{_model}_{the_timestamp}.pt")
The issue is that none of the models are actually using the GPU, despite being set with use_gpu = True. With some sample PyTorch code that calls torch.cuda.current_device(), I can see that PyTorch is set up properly and detects the GPU. Any idea where to look to solve this issue? I am not sure this is a bug in d3rlpy, so I would rather not create an issue on GitHub yet :)
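One thing worth checking, offered as an assumption about d3rlpy 1.x rather than a confirmed diagnosis: in those releases use_gpu is a constructor argument, so assigning the_model.use_gpu = True afterwards may only create a new attribute that nothing reads. The toy class below is hypothetical (ToyAlgo, build and the device strings are illustrative, not d3rlpy code) and only sketches that pitfall:

```python
class ToyAlgo:
    """Hypothetical stand-in for a library class; this is not d3rlpy code."""

    def __init__(self, use_gpu=False):
        # The constructor argument is stored privately; build() reads only this.
        self._use_gpu = bool(use_gpu)

    def build(self):
        # Choose the device from the private flag set in __init__.
        return "cuda:0" if self._use_gpu else "cpu"


late = ToyAlgo()
late.use_gpu = True            # creates a new attribute that build() never reads
early = ToyAlgo(use_gpu=True)  # flag is passed where the class expects it

device_late = late.build()
device_early = early.build()
print(device_late, device_early)
```

If the installed d3rlpy version behaves the same way, constructing each model with continuous_models[_model](scaler=scaler, use_gpu=True) would be the first thing to try.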

Related

MLflow not tracking anything with pycaret

I have the mlflow UI (v2.1.1) running. This code with pycaret 2.3.10 does not track anything for some reason; only the experiment name shows up.
import pycaret
import mlflow
mlflow.set_tracking_uri("http://localhost:5000")
# import libraries
import pandas as pd
import numpy as np
# read csv data
data = pd.read_csv('https://raw.githubusercontent.com/srees1988/predict-churn-py/main/customer_churn_data.csv')
data = data.sample(1000)
# initialize setup
from pycaret.classification import *
s = setup(data, target = 'Churn', session_id = 123, ignore_features = ['customerID'], log_experiment = True, experiment_name = 'churn1', silent=True)
best_model = compare_models()
model = create_model('rf')
plot_model(model, 'confusion_matrix')
predict_model(model)
finalize_model(model)
save_model(model, 'model')
I would expect the metrics and potentially the images to appear.
Did you try expanding the square button in front of "Session Initialized"? For me, the run is logged under the session. If that does not work, try downgrading mlflow to a version that PyCaret supports.
The current version of PyCaret (3.0.0rc8) only supports mlflow versions from 1.24.0 up to, but not including, 2.0.0.
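The version range mentioned above can be checked with a few lines of plain Python. This is a minimal sketch: parse_version and is_supported are hypothetical helper names, and the bounds are simply the ones quoted in the answer.

```python
def parse_version(text):
    """Turn a string such as '2.1.1' into a tuple of ints, e.g. (2, 1, 1).

    Non-numeric suffixes within a component (such as 'rc8') are ignored.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def is_supported(mlflow_version, low="1.24.0", high="2.0.0"):
    """Return True if low <= mlflow_version < high."""
    return parse_version(low) <= parse_version(mlflow_version) < parse_version(high)


print(is_supported("2.1.1"))   # the version from the question
print(is_supported("1.30.0"))  # a version inside the quoted range
```

In a real session the installed version could be taken from mlflow.__version__ instead of a hard-coded string.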

Trying to use fetch_20newsgroups

I am in the process of learning Python and have the following problem with fetching the 20newsgroups data in this code:
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import *
categories = ['comp_graphics', 'misc_foresale',
              'rec.autos', 'sci.space']
twenty_train = fetch_20newsgroups(subset='train',
                                  categories=categories,
                                  shuffle=True,
                                  random_state=42)
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(
    twenty_train.data)
print(X_train_counts)
print("BOW shape:", X_train_counts.shape)
caltech_idx = count_vect.vocabulary_['caltech']
print('"Caltech": %i' % X_train_counts[0, caltech_idx])
The version of scikit-learn I have is 1.0.2 and the Python version is 3.7.9, although I do have a copy of Python 3.10 installed as well.
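The question does not state the actual error, so the following is only a guess at one likely cause: two of the requested category names ('comp_graphics' and 'misc_foresale') do not match the dataset's dot-separated newsgroup names. A minimal sketch that compares the requested names against the 20 newsgroup names and suggests the closest match; check_categories is a hypothetical helper, not part of scikit-learn:

```python
import difflib

# The 20 newsgroup names used as category labels by the dataset.
KNOWN_CATEGORIES = [
    "alt.atheism", "comp.graphics", "comp.os.ms-windows.misc",
    "comp.sys.ibm.pc.hardware", "comp.sys.mac.hardware", "comp.windows.x",
    "misc.forsale", "rec.autos", "rec.motorcycles", "rec.sport.baseball",
    "rec.sport.hockey", "sci.crypt", "sci.electronics", "sci.med",
    "sci.space", "soc.religion.christian", "talk.politics.guns",
    "talk.politics.mideast", "talk.politics.misc", "talk.religion.misc",
]


def check_categories(requested):
    """Map each unknown category name to a list holding its closest known name."""
    report = {}
    for name in requested:
        if name not in KNOWN_CATEGORIES:
            report[name] = difflib.get_close_matches(name, KNOWN_CATEGORIES, n=1)
    return report


report = check_categories(["comp_graphics", "misc_foresale", "rec.autos", "sci.space"])
for bad, suggestion in report.items():
    print(bad, "->", suggestion)
```

If the names turn out to be the problem, using the suggested spellings in the categories list would be the fix to try.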

How to train an image similarity model on 20 million images (total size 10GB)?

My system is configured with 16GB RAM. I have tried to train an image similarity model on 20 million images (total size 10GB) using VGG19 and KNN's nearest neighbors. When I try to read the images I get a MemoryError. I have even tried to train the model on 200,000 images (total size 770MB), but the issue is the same. How can I read millions of images to train ML models?
Ubuntu 18.04.2 LTS,Core™ i7,Intel® HD Graphics 5500 (Broadwell GT2), 64-bit, 16GB RAM
import os
import skimage.io
import tensorflow as tf
from skimage.transform import resize
import numpy as np
from sklearn.neighbors import NearestNeighbors
import matplotlib.pyplot as plt
from matplotlib import offsetbox
from matplotlib.offsetbox import OffsetImage, AnnotationBbox
from sklearn import manifold
import pickle
skimage.io.use_plugin('matplotlib')
dirPath = 'train_data'
args = [os.path.join(dirPath, filename) for filename in os.listdir(dirPath)]
imgs_train = [skimage.io.imread(arg, as_gray=False) for arg in args]
shape_img = (130, 130, 3)
model = tf.keras.applications.VGG19(weights='imagenet', include_top=False,
                                    input_shape=shape_img)
model.summary()
shape_img_resize = tuple([int(x) for x in model.input.shape[1:]])
input_shape_model = tuple([int(x) for x in model.input.shape[1:]])
output_shape_model = tuple([int(x) for x in model.output.shape[1:]])
n_epochs = None
def resize_img(img, shape_resized):
    img_resized = resize(img, shape_resized,
                         anti_aliasing=True,
                         preserve_range=True)
    assert img_resized.shape == shape_resized
    return img_resized
def normalize_img(img):
    return img / 255.
def transform_img(img, shape_resize):
    img_transformed = resize_img(img, shape_resize)
    img_transformed = normalize_img(img_transformed)
    return img_transformed
def apply_transformer(imgs, shape_resize):
    imgs_transform = [transform_img(img, shape_resize) for img in imgs]
    return imgs_transform
imgs_train_transformed = apply_transformer(imgs_train, shape_img_resize)
X_train = np.array(imgs_train_transformed).reshape((-1,) + input_shape_model)
E_train = model.predict(X_train)
E_train_flatten = E_train.reshape((-1, np.prod(output_shape_model)))
knn = NearestNeighbors(n_neighbors=5, metric="cosine")
knn.fit(E_train_flatten)
Keras works well with generators, so you should consider using one (see a Python generator tutorial and an example of using a generator with Keras).
A generator lets you load your images during training, batch by batch, instead of reading them all into memory first.
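As a rough illustration of the batch-by-batch idea, here is a minimal pure-Python sketch. batched_paths and the file names are hypothetical; in the real pipeline each batch of paths would be read, resized, passed to model.predict, and only the resulting embeddings kept.

```python
def batched_paths(paths, batch_size):
    """Yield successive lists of at most batch_size paths."""
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]


# Hypothetical file names standing in for the contents of train_data.
paths = [f"train_data/img_{i}.jpg" for i in range(10)]

batch_sizes = []
for batch in batched_paths(paths, batch_size=4):
    # Only this batch would need to be held in memory at any one time.
    batch_sizes.append(len(batch))

print(batch_sizes)
```

Because the generator yields one slice at a time, peak memory depends on batch_size rather than on the total number of images.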

Save and load a PyTorch model

I am trying to train a PyTorch model on Colab, then save the model parameters and load them on my local computer.
After training, the model parameters are stored as below:
torch.save(Model.state_dict(),PATH)
loaded as below:
device = torch.device('cpu')
Model.load_state_dict(torch.load(PATH, map_location=device))
error:
AttributeError: 'Sequential' object has no attribute 'copy'
Does anyone know how to solve this issue?
Your question does not provide sufficient details to be answered correctly. If you are trying to save and load your own model and have a class definition for it, see this well-known answer and clarify why that's not sufficient for your use.
If you are loading a torch.nn.Sequential model then, as far as I know, simply loading the model directly and using it should be sufficient. If it's not, post the error you get on the PyTorch forum.
For now, look at my example showcasing loading a sequential model and then using it without error:
# test for saving everything with torch.save
import torch
import torch.nn as nn
from pathlib import Path
from collections import OrderedDict
import numpy as np
import pickle
path = Path('~/data/tmp/').expanduser()
path.mkdir(parents=True, exist_ok=True)
num_samples = 3
Din, Dout = 1, 1
lb, ub = -1, 1
x = torch.distributions.Uniform(low=lb, high=ub).sample((num_samples, Din))
f = nn.Sequential(OrderedDict([
    ('f1', nn.Linear(Din,Dout)),
    ('out', nn.SELU())
]))
y = f(x)
# save data torch to numpy
x_np, y_np = x.detach().cpu().numpy(), y.detach().cpu().numpy()
db2 = {'f': f, 'x': x_np, 'y': y_np}
torch.save(db2, path / 'db_f_x_y')
db3 = torch.load(path / 'db_f_x_y')
f3 = db3['f']
x3 = db3['x']
y3 = db3['y']
xx = torch.tensor(x3)
yy3 = f3(xx)
print(yy3)
There should be an official answer on how to save and load nn.Sequential models (see "How does one save torch.nn.Sequential models in pytorch properly?"), but for now torch.save and torch.load seem to work just fine.

Running Python code consumes GPU. Why?

This is my Python code for a model prediction.
import csv
import numpy as np
np.random.seed(1)
from keras.models import load_model
import tensorflow as tf
import pandas as pd
import time
from timeit import default_timer as timer
output_location='Desktop/result/'
#load model
global graph
graph = tf.get_default_graph()
model = load_model("newmodel.h5")
def Myfun():
    ecg = pd.read_csv('/Downloads/model.csv')
    X = ecg.iloc[:,1:42].values
    y = ecg.iloc[:,42].values
    from sklearn.preprocessing import LabelEncoder
    encoder = LabelEncoder()
    y1 = encoder.fit_transform(y)
    Y = pd.get_dummies(y1).values
    from sklearn.model_selection import train_test_split
    X_train,X_test, y_train,y_test = train_test_split(X,Y,test_size=0.2,random_state=0)
    t1= timer()
    with graph.as_default():
        prediction = model.predict(X_test[0:1])
    diff=timer()-t1
    class_labels_predicted = np.argmax(prediction)
    # i is the loop counter defined at module level below
    filename1=str(i)+"output.txt"
    newfile=output_location+filename1
    with open(str(newfile),'w',encoding = 'utf-8') as file:
        file.write(" takes %f seconds time. predictedclass is %s \n" %(diff,class_labels_predicted))
    return class_labels_predicted
for i in range(1,100):
    Myfun()
My system's GPU has 2GB of memory. While this code is running, nvidia-smi -l 2 shows it consuming 1.8GB of GPU memory, and 100 files are produced as a result. Soon after the task completes, GPU memory usage drops back to 500MB. I have the GPU versions of tensorflow and keras installed on my system. My question is:
Why does this code run on the GPU? Does the complete code use the GPU, or is it only used when importing libraries such as keras-gpu and tensorflow-gpu?
As I can see from your code, you are using Keras and Tensorflow. From the Keras F.A.Q.:
If you are running on the TensorFlow or CNTK backends, your code will automatically run on GPU if any available GPU is detected.
You can force Keras to run on CPU only by hiding the GPUs before Keras or TensorFlow is imported:
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = ""
