I am trying to use two GPUs on my Windows machine, but I keep getting:
raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
I am still new to PyTorch and couldn't find a way of setting the backend to 'gloo'. Is there any way to set backend='gloo' to run two GPUs on Windows?
from torch import distributed as dist
Then, when initializing your training logic:
dist.init_process_group("gloo", rank=rank, world_size=world_size)
Update:
You should use Python multiprocessing (here via torch.multiprocessing) like this:
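import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

class Trainer:
    def __init__(self, rank, world_size):
        self.rank = rank
        self.world_size = world_size
        print(f'[rank {rank}] Initializing distributed')
        # Example address/port of the rank-0 process; every rank must use the same values
        os.environ['MASTER_ADDR'] = 'localhost'
        os.environ['MASTER_PORT'] = '12355'  # environment values must be strings
        # Gloo is available in the Windows builds of PyTorch, NCCL is not
        dist.init_process_group("gloo", rank=rank, world_size=world_size)

if __name__ == '__main__':
    world_size = torch.cuda.device_count()
    # mp.spawn calls Trainer(rank, *args) in each process, with rank = 0 .. nprocs-1
    mp.spawn(
        Trainer,
        nprocs=world_size,
        args=(world_size,),
        join=True)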
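If the same script also has to run on Linux, where NCCL is usually the better choice for GPUs, the backend can be picked at runtime instead of hard-coding "gloo". This is a sketch, not part of the original answer: pick_backend and init_distributed are hypothetical helper names, and port 29500 is only an example value.

```python
import os

import torch
import torch.distributed as dist


def pick_backend() -> str:
    # NCCL needs both a CUDA device and a PyTorch build that includes NCCL
    # (the official Windows builds do not); otherwise fall back to Gloo.
    if torch.cuda.is_available() and dist.is_nccl_available():
        return "nccl"
    return "gloo"


def init_distributed(rank: int, world_size: int) -> str:
    # Example rendezvous settings; every rank must use the same values.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    backend = pick_backend()
    dist.init_process_group(backend, rank=rank, world_size=world_size)
    if torch.cuda.is_available():
        torch.cuda.set_device(rank)  # one GPU per process
    return backend
```

init_distributed(rank, world_size) can then replace the init_process_group call inside Trainer.__init__.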
What is the best pythonic way to check for the presence or absence of fit method in a generic machine learning model knowing its estimator?
Here is a portion of the code:
import os
import errno
from constants import (numerical_columns, categorics, names, ids)
from pycaret.regression import (setup, save_model, get_logs, compare_models, predict_model, plot_model, finalize_model, load_model)

def silentremove(filenames):
    for filename in filenames:
        try:
            os.remove(filename)
        except OSError as e:  # this would be "except OSError, e:" before Python 2.6
            if e.errno != errno.ENOENT:  # errno.ENOENT = no such file or directory
                raise  # re-raise exception if a different error occurred

def ml_modelling(all_data, train, test) -> None:
    for target_var in targets:  # `targets` (list of target columns) is defined elsewhere
        numerical_features = [col for col in numerical_columns if col != target_var]
        X, y = train.loc[:, train.columns != target_var], train[target_var]
        s = setup(
            data=train,
            target=target_var,
            ignore_features=['Series'],
            numeric_features=[el for el in numerical_features if el not in categorics],
            categorical_features=categorics,
            silent=True,
            log_experiment=True,)
        best_model = compare_models()
        exp_logs = get_logs()
        save_model(best_model, 'best_model')
        # Making some plots
        for plot_id, name in zip(ids, names):
            target_name = f'plots/{target_var}/' + name + '.png'
            silentremove([name, target_name])
            try:
                best_model.fit(X, y)
                plot_model(best_model, plot=plot_id, scale=6, save=True)  # expects the estimator, not str()
                os.rename(name, target_name)
            except AttributeError:
                pass  # Code in case model does not have .fit() method
        final_best = finalize_model(best_model)
        loaded_model = load_model('best_model')
        prediction = predict_model(loaded_model, round=0,)

##########################################################
def main() -> None:
    # placeholders: DataFrames consisting of numerics and categoricals
    all_data, train, test = ..., ..., ...
    ml_modelling(all_data, train, test)

if __name__ == "__main__":
    main()
"Ask forgiveness not permission" is generally considered Pythonic, so I suppose an approach like:
try:
    model.fit(Xdata, ydata)
except AttributeError:
    pass  # Code in case model does not have .fit() method
This would be valid in many (but not all) use cases, especially if most of the given estimators come from packages like sklearn, where almost all models have a .fit method: entering a try block is cheap, but actually raising and catching an exception is comparatively slow. Then again, the "premature optimization is the root of all evil" idiom could be relevant here.
Note that this would also catch any AttributeError raised during the execution of .fit() itself, so it could make errors thrown in poorly implemented models harder to debug. If that is a problem in your use case, it may be better to ask for permission.
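Both concerns can be addressed by guarding only the attribute lookup and calling fit outside the try block (equivalently, a try/except/else), so errors raised inside fit() still propagate. Below is a sketch with hypothetical helper names; has_fit shows the "ask permission" variant for comparison.

```python
def fit_if_possible(model, X, y) -> bool:
    """Fit `model` if it has a .fit() method; return whether it was fitted."""
    try:
        fit = model.fit  # only the attribute lookup is guarded
    except AttributeError:
        return False  # the model has no .fit() method
    fit(X, y)  # an AttributeError raised *inside* fit() still propagates
    return True


def has_fit(model) -> bool:
    # "Ask permission" (LBYL) variant: check before calling.
    return callable(getattr(model, "fit", None))
```

In the loop from the question, `if fit_if_possible(best_model, X, y):` would then replace the broad try/except.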
In a compound pytorch network net with different subnetworks, I would like to initialize a specific subnetwork net1 with a random seed seed1, while the rest of the network uses the globally chosen random seed seed.
import torch
import torch.nn as nn

class Net1(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = MLP(...)  # Some MLP network module.

    def forward(self, x):
        return self.mlp(x)

class Net(nn.Module):
    def __init__(self, seed1):
        super().__init__()
        # Other initializations using globally set seed
        # Then I want something like (torch.local_seed is hypothetical):
        with torch.local_seed(seed1):
            self.net1 = Net1()
        # Back to global seed
        ...

if __name__ == '__main__':
    seed = 42
    seed1 = 1
    torch.manual_seed(seed)
    net = Net(seed1)
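There is no torch.local_seed, but torch.random.fork_rng comes close: it saves the RNG state on entry and restores it on exit, so a torch.manual_seed call inside the block only affects what is created there. A sketch, with a small nn.Sequential standing in for the question's MLP(...) and hypothetical layers pre and post standing in for the rest of the network:

```python
import torch
import torch.nn as nn


class Net1(nn.Module):
    def __init__(self):
        super().__init__()
        # Stand-in for the question's MLP(...)
        self.mlp = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

    def forward(self, x):
        return self.mlp(x)


class Net(nn.Module):
    def __init__(self, seed1):
        super().__init__()
        self.pre = nn.Linear(4, 4)  # initialized from the global seed
        # fork_rng saves the CPU (and CUDA) RNG state and restores it on exit
        with torch.random.fork_rng():
            torch.manual_seed(seed1)
            self.net1 = Net1()
        self.post = nn.Linear(2, 1)  # continues the global RNG stream

    def forward(self, x):
        return self.post(self.net1(self.pre(x)))
```

Because the state is restored, the layers created after the block are initialized exactly as if net1 had never consumed any random numbers, and net1 itself depends only on seed1.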
I can't work out how to invoke my .tflite model that does matmul on the Coral accelerator using the Python API.
The .tflite model is generated from some example code here. It works well using the tf.lite.Interpreter() class, but I don't know how to transform it to work with the edgetpu class. I have tried edgetpu.basic.basic_engine.BasicEngine() by changing the model's datatype from numpy.float32 to numpy.uint8, but that did not help. I am a complete beginner with TensorFlow and just want to use my TPU for matmul.
import numpy
import tensorflow as tf
import edgetpu
from edgetpu.basic.basic_engine import BasicEngine

def export_tflite_from_session(session, input_nodes, output_nodes, tflite_filename):
    print("Converting to tflite...")
    converter = tf.lite.TFLiteConverter.from_session(session, input_nodes, output_nodes)
    tflite_model = converter.convert()
    with open(tflite_filename, "wb") as f:
        f.write(tflite_model)
    print("Converted %s." % tflite_filename)

# This does matmul just fine but does not use the TPU
def test_tflite_model(tflite_filename, examples):
    print("Loading TFLite interpreter for %s..." % tflite_filename)
    interpreter = tf.lite.Interpreter(model_path=tflite_filename)
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    print("input details: %s" % input_details)
    print("output details: %s" % output_details)
    for i, input_tensor in enumerate(input_details):
        interpreter.set_tensor(input_tensor['index'], examples[i])
    interpreter.invoke()
    model_output = []
    for i, output_tensor in enumerate(output_details):
        model_output.append(interpreter.get_tensor(output_tensor['index']))
    return model_output

# This should use the TPU, but I don't know how to run the model or if it needs
# further processing. One matrix can be constant for my use case
def test_tpu(tflite_filename, examples):
    print("Loading TFLite interpreter for %s..." % tflite_filename)
    # TODO edgetpu.basic
    interpreter = BasicEngine(tflite_filename)
    interpreter.allocate_tensors()  # does not work...

def main():
    tflite_filename = "model.tflite"
    shape_a = (2, 2)
    shape_b = (2, 2)
    a = tf.placeholder(dtype=tf.float32, shape=shape_a, name="A")
    b = tf.placeholder(dtype=tf.float32, shape=shape_b, name="B")
    c = tf.matmul(a, b, name="output")
    numpy.random.seed(1234)
    a_ = numpy.random.rand(*shape_a).astype(numpy.float32)
    b_ = numpy.random.rand(*shape_b).astype(numpy.float32)
    with tf.Session() as session:
        session_output = session.run(c, feed_dict={a: a_, b: b_})
        export_tflite_from_session(session, [a, b], [c], tflite_filename)
    tflite_output = test_tflite_model(tflite_filename, [a_, b_])
    tflite_output = tflite_output[0]
    # test the TPU
    tflite_output = test_tpu(tflite_filename, [a_, b_])
    print("Input example:")
    print(a_)
    print(a_.shape)
    print(b_)
    print(b_.shape)
    print("Session output:")
    print(session_output)
    print(session_output.shape)
    print("TFLite output:")
    print(tflite_output)
    print(tflite_output.shape)
    print(numpy.allclose(session_output, tflite_output))

if __name__ == '__main__':
    main()
You're only converting your model once (TensorFlow to TFLite); it also has to be compiled for the Edge TPU, and it only runs entirely on the TPU if every operation is supported. From the docs:
At the first point in the model graph where an unsupported operation occurs, the compiler partitions the graph into two parts. The first part of the graph that contains only supported operations is compiled into a custom operation that executes on the Edge TPU, and everything else executes on the CPU.
There are several specific requirements that the model must meet:
tensor parameters quantized to 8 bits (via quantization-aware training or full-integer post-training quantization)
tensor sizes and model parameters that are constant at compile time
tensors that are 3-dimensional or smaller
only operations supported by the Edge TPU
The Edge TPU Compiler, available online as well as in a CLI version (edgetpu_compiler), translates .tflite models into Edge TPU compatible .tflite models.
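The quantization requirement is what the question's float32 model is missing. Below is a sketch of full-integer post-training quantization, assuming TensorFlow 2.x (the question uses the TF1 session API). Following the question's note that one matrix can be constant, B is baked into the graph; a matmul with a constant operand is typically lowered to a FULLY_CONNECTED op, which the Edge TPU supports. ConstMatMul, convert_full_integer, run_quantized and the file name are illustrative names, and the output is only checked on the CPU here.

```python
import numpy as np
import tensorflow as tf

# The constant right-hand matrix
B = np.random.default_rng(1234).random((2, 2), dtype=np.float32)


class ConstMatMul(tf.Module):
    def __init__(self, b):
        super().__init__()
        self.b = tf.constant(b)

    @tf.function(input_signature=[tf.TensorSpec([2, 2], tf.float32)])
    def __call__(self, a):
        return tf.matmul(a, self.b)


def representative_dataset():
    # Calibration samples covering the expected input range [0, 1)
    rng = np.random.default_rng(0)
    for _ in range(100):
        yield [rng.random((2, 2), dtype=np.float32)]


def convert_full_integer(module):
    converter = tf.lite.TFLiteConverter.from_concrete_functions(
        [module.__call__.get_concrete_function()], module)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    # Fail instead of silently keeping float ops
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8
    return converter.convert()


def run_quantized(tflite_model, a):
    # CPU check with the stock interpreter: quantize input, dequantize output
    interpreter = tf.lite.Interpreter(model_content=tflite_model)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    scale, zero = inp["quantization"]
    q = np.clip(np.round(a / scale + zero), 0, 255).astype(np.uint8)
    interpreter.set_tensor(inp["index"], q)
    interpreter.invoke()
    scale, zero = out["quantization"]
    raw = interpreter.get_tensor(out["index"]).astype(np.float32)
    return (raw - zero) * scale


if __name__ == "__main__":
    model = convert_full_integer(ConstMatMul(B))
    with open("model_quant.tflite", "wb") as f:
        f.write(model)
    a = np.array([[0.1, 0.2], [0.3, 0.4]], dtype=np.float32)
    print(run_quantized(model, a))
    print(a @ B)
```

The resulting model_quant.tflite is what would then go through `edgetpu_compiler model_quant.tflite`, which writes a model_quant_edgetpu.tflite next to it. Expect small differences from the float result because of 8-bit rounding.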
Your code is also incomplete. You've passed your model to the class here:
interpreter = BasicEngine(tflite_filename)
but you're missing the step of actually running inference on the input tensor (a flattened uint8 numpy array), which BasicEngine exposes as a method:
latency, output = interpreter.RunInference(input_tensor)
How to use tfrecord with pytorch?
I have downloaded the "YouTube-8M" dataset with video-level features, but it is stored as TFRecord files.
I tried to read some samples from these files, convert them to numpy, and then load them in PyTorch, but it failed.
import tensorflow as tf
# YT8MAggregatedFeatureReader comes from the YouTube-8M starter code (readers.py)

reader = YT8MAggregatedFeatureReader()
files = tf.gfile.Glob("/Data/youtube8m/train*.tfrecord")
filename_queue = tf.train.string_input_producer(
    files, num_epochs=5, shuffle=True)
training_data = [
    reader.prepare_reader(filename_queue) for _ in range(1)
]
unused_video_id, model_input_raw, labels_batch, num_frames = tf.train.shuffle_batch_join(
    training_data,
    batch_size=1024,
    capacity=1024 * 5,
    min_after_dequeue=1024,
    allow_smaller_final_batch=True,
    enqueue_many=True)
with tf.Session() as sess:
    label_numpy = labels_batch.eval()
    print(type(label_numpy))
But this step produces no result; it just hangs for a long time without any response.
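A hang like this is the typical symptom of a TF1 queue-based pipeline whose queue runners were never started: nothing fills the queues, so eval() waits forever. Because num_epochs is set, the local variables must also be initialized. A minimal sketch of the missing steps using tf.compat.v1, with a plain TFRecordReader and a hypothetical two-float feature "x" instead of the YouTube-8M reader:

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()  # queue runners only work in graph mode

FEATURES = {"x": tf1.FixedLenFeature([2], tf.float32)}  # hypothetical feature spec


def read_first_batch(files, batch_size=2):
    # num_epochs is implemented with a *local* variable
    filename_queue = tf1.train.string_input_producer(files, num_epochs=1, shuffle=False)
    _, serialized = tf1.TFRecordReader().read(filename_queue)
    features = tf1.parse_single_example(serialized, FEATURES)
    batch = tf1.train.batch({"x": features["x"]}, batch_size=batch_size)
    with tf1.Session() as sess:
        sess.run([tf1.global_variables_initializer(),
                  tf1.local_variables_initializer()])
        coord = tf1.train.Coordinator()
        # Without this call nothing fills the queues and run()/eval() blocks forever
        threads = tf1.train.start_queue_runners(sess=sess, coord=coord)
        try:
            return sess.run(batch)["x"]
        finally:
            coord.request_stop()
            coord.join(threads)
```

The same two calls (local_variables_initializer and start_queue_runners) are what the question's session block lacks before labels_batch.eval().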
One workaround is to use TensorFlow 1.1x in eager mode, or TensorFlow 2+, to loop through the dataset (so you can use variable-length features and bucketing windows), then just call
torch.as_tensor(val.numpy()).to(device) to use each value in torch.
You can use the NVIDIA DALI library to load TFRecords directly in PyTorch code; their documentation shows how.
Maybe this can help you: TFRecord reader for PyTorch
I cooked up this:
import tensorflow as tf
import torch

class LiTS(torch.utils.data.Dataset):
    def __init__(self, filenames):
        self.filenames = filenames

    def __len__(self):
        return len(self.filenames)

    def __getitem__(self, idx):
        volume, segmentation = None, None
        if idx >= len(self):
            raise IndexError()
        ds = tf.data.TFRecordDataset(self.filenames[idx:idx+1])
        # read_tfrecord is your own parse function: record -> (volume, segmentation)
        # If a file holds several records, only the last one is kept
        for x, y in ds.map(read_tfrecord):
            volume = torch.from_numpy(x.numpy())
            segmentation = torch.from_numpy(y.numpy())
        return volume, segmentation
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn import clone
import multiprocessing
import functools
import numpy as np

def train_model(n_estimators, base_model, X, y):
    model = clone(base_model)
    model.set_params(n_estimators=n_estimators)
    model.fit(X, y)
    return model

class A():
    def __init__(self, random_state, jobs, **kwargs):
        self.model = RandomForestClassifier(oob_score=True, random_state=random_state, **kwargs)
        self.jobs = jobs

    def fit(self, X, y):
        job_pool = multiprocessing.Pool(self.jobs)
        n_estimators = [100]
        for output in job_pool.imap_unordered(functools.partial(train_model,
                                                                base_model=self.model,
                                                                X=X,
                                                                y=y), n_estimators):
            model = output
        job_pool.terminate()
        self.model = model

if __name__ == '__main__':
    np.random.seed(42)
    X, y = make_classification(n_samples=500, n_informative=6, n_redundant=6, flip_y=0.1)
    print("Class A")
    for i in range(5):
        base_model = A(random_state=None, jobs=1)
        base_model.fit(X, y)
        print(base_model.model.oob_score_)
    print("Bare RF")
    base_model = RandomForestClassifier(n_estimators=500, max_features=2, oob_score=True, random_state=None)
    for i in range(5):
        model = clone(base_model)
        model.fit(X, y)
        print(model.oob_score_)
Output on a Windows 7 machine (Python 2.7.13):
(pip freeze : numpy==1.11.0, scikit-image==0.12.3, scikit-learn==0.17, scipy==0.17.0)
Class A
0.82
0.826
0.832
0.822
0.816
Bare RF
0.814
0.81
0.818
0.818
0.818
Output on a Red Hat 4.8.3-9 Linux machine (Python 2.7.5):
(pip freeze: numpy==1.11.0, scikit-learn==0.17, scipy==0.17.0, sklearn==0.0)
Class A
0.818
0.818
0.818
0.818
0.818
Bare RF
0.814
0.81
0.818
0.818
0.818
So, to sum up:
On Linux, "Class A" (which uses multiprocessing) seems to train the exact same model every time, hence the identical scores. I would instead expect the behavior of the "Bare RF" section, where the scores differ (it's a random algorithm). On Windows (PyCharm), the issue cannot be reproduced.
Can you please help?
BIG EDIT: Created a reproducible code example.
The solution is to reseed NumPy's global RNG inside train_model, the function that is executed in the worker processes:
def train_model(n_estimators, base_model, X, y):
    np.random.seed()  # no argument: reseed from OS entropy in each worker
    model = clone(base_model)
    model.set_params(n_estimators=n_estimators)
    model.fit(X, y)
    return model
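The reasoning:
On Unix, multiprocessing starts the worker processes with fork, so every worker inherits the random number generator state of the parent process and generates the identical pseudo-random sequence. Here A.fit creates a new Pool on every call, and since the parent's global NumPy RNG does not advance between calls, each freshly forked worker starts from the same state and trains the same forest. On Windows there is no fork: workers are spawned as fresh interpreters and NumPy seeds each new process from OS entropy, which is why the issue cannot be reproduced there.
It is multiprocessing that launches the worker processes, which is why it matters here; this is not a scikit-learn clone issue.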
I've found the answer here and here
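Reseeding with np.random.seed() fixes the duplication but makes runs non-reproducible. An alternative, not from the original answer, is to derive one seed per task in the parent and pass it explicitly as random_state, so the result no longer depends on what a forked worker inherits. A sketch assuming Python 3 and NumPy >= 1.17 (for SeedSequence); derive_seeds and fit_many are hypothetical helper names.

```python
import functools
import multiprocessing

import numpy as np
from sklearn import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def derive_seeds(n, master_seed=None):
    # SeedSequence.spawn gives n independent child sequences; one 32-bit int each
    children = np.random.SeedSequence(master_seed).spawn(n)
    return [int(child.generate_state(1)[0]) for child in children]


def train_model(seed, base_model, X, y, n_estimators=100):
    # The seed travels with the task instead of living in inherited RNG state
    model = clone(base_model)
    model.set_params(n_estimators=n_estimators, random_state=seed)
    model.fit(X, y)
    return model


def fit_many(base_model, X, y, n_models, jobs=2, master_seed=None):
    task = functools.partial(train_model, base_model=base_model, X=X, y=y)
    with multiprocessing.Pool(jobs) as pool:
        return pool.map(task, derive_seeds(n_models, master_seed))


if __name__ == "__main__":
    X, y = make_classification(n_samples=500, n_informative=6, n_redundant=6,
                               flip_y=0.1, random_state=42)
    base = RandomForestClassifier(oob_score=True)
    for model in fit_many(base, X, y, n_models=5):
        print(model.oob_score_)
```

Passing a fixed master_seed makes the whole batch of models reproducible while keeping the individual models different.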