Can I disable CUDA temporarily in PyTorch? [duplicate] - linux

I want to do some timing comparisons between CPU & GPU as well as some profiling and would like to know if there's a way to tell pytorch to not use the GPU and instead use the CPU only? I realize I could install another CPU-only pytorch, but hoping there's an easier way.

Before running your code, run this shell command to tell torch that there are no GPUs:
export CUDA_VISIBLE_DEVICES=""
This will tell it to use only one GPU (the one with id 0) and so on:
export CUDA_VISIBLE_DEVICES="0"

I just wanted to add that it is also possible to do so within the PyTorch Code:
Here is a small example taken from the PyTorch Migration Guide for 0.4.0:
# at beginning of the script
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
...
# then whenever you get a new Tensor or Module
# this won't copy if they are already on the desired device
input = data.to(device)
model = MyModule(...).to(device)
I think the example is pretty self-explaining. But if there are any questions just ask! One big advantage is when using this syntax like in the example above is, that you can create code which runs on CPU if no GPU is available but also on GPU without changing a single line.
Instead of using the if-statement with torch.cuda.is_available() you can also just set the device to CPU like this:
device = torch.device("cpu")
Further you can create tensors on the desired device using the device flag:
mytensor = torch.rand(5, 5, device=device)
This will create a tensor directly on the device you specified previously.
I want to point out, that you can switch between CPU and GPU using this syntax, but also between different GPUs.
I hope this is helpful!

Simplest way using Python is:
os.environ["CUDA_VISIBLE_DEVICES"]=""

There are multiple ways to force CPU use:
Set default tensor type:
torch.set_default_tensor_type(torch.FloatTensor)
Set device and consistently reference when creating tensors:
(with this you can easily switch between GPU and CPU)
device = 'cpu'
# ...
x = torch.rand(2, 10, device=device)
Hide GPU from view:
import os
os.environ["CUDA_VISIBLE_DEVICES"]=""

General
As previous answers showed you can make your pytorch run on the cpu using:
device = torch.device("cpu")
Comparing Trained Models
I would like to add how you can load a previously trained model on the cpu (examples taken from the pytorch docs).
Note: make sure that all the data inputted into the model also is on the cpu.
Recommended loading
model = TheModelClass(*args, **kwargs)
model.load_state_dict(torch.load(PATH, map_location=torch.device("cpu")))
Loading entire model
model = torch.load(PATH, map_location=torch.device("cpu"))

This is a real world example: original function with gpu, versus new function with cpu.
Source: https://github.com/zllrunning/face-parsing.PyTorch/blob/master/test.py
In my case I have edited these 4 lines of code:
#totally new line of code
device=torch.device("cpu")
#net.cuda()
net.to(device)
#net.load_state_dict(torch.load(cp))
net.load_state_dict(torch.load(cp, map_location=torch.device('cpu')))
#img = img.cuda()
img = img.to(device)
#new_function_with_cpu
def evaluate(image_path='./imgs/116.jpg', cp='cp/79999_iter.pth'):
device=torch.device("cpu")
n_classes = 19
net = BiSeNet(n_classes=n_classes)
#net.cuda()
net.to(device)
#net.load_state_dict(torch.load(cp))
net.load_state_dict(torch.load(cp, map_location=torch.device('cpu')))
net.eval()
to_tensor = transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),])
with torch.no_grad():
img = Image.open(image_path)
image = img.resize((512, 512), Image.BILINEAR)
img = to_tensor(image)
img = torch.unsqueeze(img, 0)
#img = img.cuda()
img = img.to(device)
out = net(img)[0]
parsing = out.squeeze(0).cpu().numpy().argmax(0)
return parsing
#original_function_with_gpu
def evaluate(image_path='./imgs/116.jpg', cp='cp/79999_iter.pth'):
n_classes = 19
net = BiSeNet(n_classes=n_classes)
net.cuda()
net.load_state_dict(torch.load(cp))
net.eval()
to_tensor = transforms.Compose([ transforms.ToTensor(), transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),])
with torch.no_grad():
img = Image.open(image_path)
image = img.resize((512, 512), Image.BILINEAR)
img = to_tensor(image)
img = torch.unsqueeze(img, 0)
img = img.cuda()
out = net(img)[0]
parsing = out.squeeze(0).cpu().numpy().argmax(0)
return parsing

Related

tensorflow only predicts '[UNK]' characters

I am trying to generate text in a certain style using tensorflow, and even when I copy and paste the code from the tensorflow website it only predicts unknown characters even though there is a mask to prevent this. I'm thinking it has something to do with the version of python I'm running (3.9.15) since it doesnt even work with their code and dataset.
Has anyone else run into this issue?
class OneStep(tf.keras.Model):
def __init__(self, model, chars_from_ids, ids_from_chars, temperature=1.0):
super().__init__()
self.temperature = temperature
self.model = model
self.chars_from_ids = chars_from_ids
self.ids_from_chars = ids_from_chars
# Create a mask to prevent "[UNK]" from being generated.
skip_ids = self.ids_from_chars(['[UNK]'])[:, None]
sparse_mask = tf.SparseTensor(
# Put a -inf at each bad index.
values=[-float('inf')]*len(skip_ids),
indices=skip_ids,
# Match the shape to the vocabulary
dense_shape=[len(ids_from_chars.get_vocabulary())])
self.prediction_mask = tf.sparse.to_dense(sparse_mask)
#tf.function
def generate_one_step(self, inputs, states=None):
# Convert strings to token IDs.
input_chars = tf.strings.unicode_split(inputs, 'UTF-8')
input_ids = self.ids_from_chars(input_chars).to_tensor()
# Run the model.
# predicted_logits.shape is [batch, char, next_char_logits]
predicted_logits, states = self.model(inputs=input_ids, states=states,
return_state=True)
# Only use the last prediction.
predicted_logits = predicted_logits[:, -1, :]
predicted_logits = predicted_logits/self.temperature
# Apply the prediction mask: prevent "[UNK]" from being generated.
predicted_logits = predicted_logits + self.prediction_mask
# Sample the output logits to generate token IDs.
predicted_ids = tf.random.categorical(predicted_logits, num_samples=1)
predicted_ids = tf.squeeze(predicted_ids, axis=-1)
# Convert from token ids to characters
predicted_chars = self.chars_from_ids(predicted_ids)
# Return the characters and model state.
return predicted_chars, states
I lifted this straight from their tutorial and tried to run it in my enviroment and it just predicted '[UNK]'
I'm running a mac M1 with the latest version of tensorflow. So, that may also be an issue.
PLEASE HELP!!
Tutorial for reference: https://www.tensorflow.org/text/tutorials/text_generation

How to use multiprocessing in PyTorch?

I'm trying to use PyTorch with complex loss function. In order to accelerate the code, I hope that I can use the PyTorch multiprocessing package.
The first trial, I put 10x1 features into the NN and get 10x4 output.
After that, I want to pass 10x4 parameters into a function to do some calculation. (The calculation will be complex in the future.)
After calculating, the function will return a 10x1 array in total. This array will be set as NN_energy and calculate loss function.
Besides, I also want to know if there is another method to create a backward-able array to store the NN_energy array, instead of using
NN_energy = net(Data_in)[0:10,0]
Thanks a lot.
Full Code:
import torch
import numpy as np
from torch.autograd import Variable
from torch import multiprocessing
def func(msg,BOP):
ans = (BOP[msg][0]+BOP[msg][1]/BOP[msg][2])*BOP[msg][3]
return ans
class Net(torch.nn.Module):
def __init__(self, n_feature, n_hidden_1, n_hidden_2, n_output):
super(Net, self).__init__()
self.hidden_1 = torch.nn.Linear(n_feature , n_hidden_1) # hidden layer
self.hidden_2 = torch.nn.Linear(n_hidden_1, n_hidden_2) # hidden layer
self.predict = torch.nn.Linear(n_hidden_2, n_output ) # output layer
def forward(self, x):
x = torch.tanh(self.hidden_1(x)) # activation function for hidden layer
x = torch.tanh(self.hidden_2(x)) # activation function for hidden layer
x = self.predict(x) # linear output
return x
if __name__ == '__main__': # apply_async
Data_in = Variable( torch.from_numpy( np.asarray(list(range( 0,10))).reshape(10,1) ).float() )
Ground_truth = Variable( torch.from_numpy( np.asarray(list(range(20,30))).reshape(10,1) ).float() )
net = Net( n_feature=1 , n_hidden_1=15 , n_hidden_2=15 , n_output=4 ) # define the network
optimizer = torch.optim.Rprop( net.parameters() )
loss_func = torch.nn.MSELoss() # this is for regression mean squared loss
NN_output = net(Data_in)
args = range(0,10)
pool = multiprocessing.Pool()
return_data = pool.map( func, zip(args, NN_output) )
pool.close()
pool.join()
NN_energy = net(Data_in)[0:10,0]
for i in range(0,10):
NN_energy[i] = return_data[i]
loss = torch.sqrt( loss_func( NN_energy , Ground_truth ) ) # must be (1. nn output, 2. target)
print(loss)
Error messages:
File
"C:\ProgramData\Anaconda3\lib\site-packages\torch\multiprocessing\reductions.py",
line 126, in reduce_tensor
raise RuntimeError("Cowardly refusing to serialize non-leaf tensor which requires_grad, "
RuntimeError: Cowardly refusing to serialize non-leaf tensor which
requires_grad, since autograd does not support crossing process
boundaries. If you just want to transfer the data, call detach() on
the tensor before serializing (e.g., putting it on the queue).
First of all, Torch Variable API is deprecated since a very long time, just don't use it.
Next, torch.from_numpy( np.asarray(list(range( 0,10))).reshape(10,1) ).float() is wrong at many levels: np.asarray of list is useless since a copy will be performed anyway, and np.array takes list as input by design. Then, np.arange is available to return a range as numpy array, and it is also available on Torch. Next, specifying both dimension for reshape is useless and error prone, you could simply do reshape((-1, 1)), or even better unsqueeze(-1).
Here is the simplified expression torch.arange(10, dtype=torch.float32, requires_grad=True).unsqueeze(-1).
Using multiprocessing pool is a bad practice if using batch processing is possible. It will be both way more efficient and readable. Indeed, performing N small algebraic operations in parallel is always slower and a larger single algebraic operation, and even more on GPU. More importantly, computing the gradient is not supported by multiprocessing, hence the error that you get. Yet, this is partially true, because it is supports for tensors on cpu since 1.6.0. Have a lok, to the official release changelog.
Could you post a more representative example of what func method could be to make sure you really need it ?
NB: Distributed autograd as you are looking is now available in Pytorch as an experimental feature available in beta since 1.6.0. Have a look to the official documentation.

Tensorflow variable/graph key missing while reloading the model in Python 3.6

ERROR MESSAGE:
NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
""" PLACEHOLDER """
x=tf.placeholder(tf.float32,shape=[None,784])
y_true=tf.placeholder(tf.float32,shape=[None,4])
Weight=tf.global_variables()
bias=tf.global_variables()
""" LAYERS:"""
"the image input layer 28*28 input"
x_image = tf.reshape(x,[-1,28,28,1])
"""first convolution, using 6*6 filter, and then max pooling 2by2, final
output will have depth 32
here we are calculating 32 features
"""
convo_1 = convolutional_layer(x_image,shape=[6,6,1,32])
convo_1_pooling = max_pool_2by2(convo_1)
"""first convolution, using 6*6 filter, and then max pooling 2by2, final
output will have 64
features hence depth 64
"""
convo_2 = convolutional_layer(convo_1_pooling,shape=[6,6,32,64])
convo_2_pooling = max_pool_2by2(convo_2)
"flattening the output of the last pooling layer to fuly connect it"
convo_2_flat = tf.reshape(convo_2_pooling,[-1,7*7*64])
Wt,bs=normal_full_layer(convo_2_flat,1024)#1024 nodes in the final layer
full_layer_one = tf.nn.relu(tf.matmul(convo_2_flat,Wt)+bs)
# NOTE THE PLACEHOLDER HERE!
hold_prob = tf.placeholder(tf.float32)
full_one_dropout = tf.nn.dropout(full_layer_one,keep_prob=hold_prob)
Weight, bias= normal_full_layer(full_one_dropout,2)
y_pred=tf.matmul(full_one_dropout,Weight)+bias
"""INITIALIZE VARIABLES"""
init = tf.global_variables_initializer()
"""Add ops to save and restore all the variables"""
saver = tf.train.Saver()
"""Later, launch the model, use the saver to restore variables from disk,
and
# do some work with the model"""
with tf.Session() as sess:
# Restore variables from disk.
saver.restore(sess,SAVE_PATH)
print("Model restored.")
If you use the spyder, please Restart kernel! It can work well.

Creating a session in a graph that uses another graph and its session

Versions : I am using tensorflow (version : v1.1.0-13-g8ddd727 1.1.0) in python3 (Python 3.4.3 (default, Nov 17 2016, 01:08:31) [GCC 4.8.4] on linux), it is installed from source and GPU-based (name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate (GHz) 1.076).
Context : Generative adversarial networks (GANs) learn to synthesise new samples from a high-dimensional distribution by passing samples drawn from a latent space through a generative network. When the high-dimensional distribution describes images of a particular data set, the network should learn to generate visually similar image samples for latent variables that are close to each other in the latent space. For tasks such as image retrieval and image classification, it may be useful to exploit the arrangement of the latent space by projecting images into it, and using this as a representation for discriminative tasks.
Context Problem : I am trying to invert a generator (compute L2 norm between an input image in cifar10 and a image g(z) of the generator, where z is a parameter to be trained with stochastic gradient descent in order to minimize this norm and find an approximation of the preimage of the input image).
Technical Issue : Therefore, I am building a new graph in a new session in tensorflow but I need to use a trained gan that was trained in another session, which I cannot import because the two graphs are not the same. That is to say, when I use sess.run(), the variables are not found and therefore there is a Error Message.
The code is
import tensorflow as tf
from data import cifar10, utilities
from . import dcgan
import logging
logger = logging.getLogger("gan.test")
BATCH_SIZE = 1
random_z = tf.get_variable(name='z_to_invert', shape=[BATCH_SIZE, 100], initializer=tf.random_normal_initializer())
#random_z = tf.random_normal([BATCH_SIZE, 100], mean=0.0, stddev=1.0, name='random_z')
# Generate images with generator
generator = dcgan.generator(random_z, is_training=True, name='generator')
# Add summaries to visualise output images
generator_visualisation = tf.cast(((generator / 2.0) + 0.5) * 255.0, tf.uint8)
summary_generator = tf.summary.\
image('summary/generator', generator_visualisation,
max_outputs=8)
#Create one image to test inverting
test_image = map((lambda inp: (inp[0]*2. -1., inp[1])),
utilities.infinite_generator(cifar10.get_train(), BATCH_SIZE))
inp, _ = next(test_image)
summary_inp = tf.summary.image('input_image', inp)
img_summary = tf.summary.merge([summary_generator, summary_inp])
with tf.name_scope('error'):
error = inp - generator #generator = g(z)
# We set axis = None because norm(tensor, ord=ord) is equivalent to norm(reshape(tensor, [-1]), ord=ord)
error_norm = tf.norm(error, ord=2, axis=None, keep_dims=False, name='L2Norm')
summary_error = tf.summary.scalar('error_norm', error_norm)
with tf.name_scope('Optimizing'):
optimizer = tf.train.AdamOptimizer(0.001).minimize(error_norm, var_list=z)
sv = tf.train.Supervisor(logdir="gan/invert_logs/", save_summaries_secs=None, save_model_secs=None)
batch = 0
with sv.managed_session() as sess:
logwriter = tf.summary.FileWriter("gan/invert_logs/", sess.graph)
while not sv.should_stop():
if batch > 0 and batch % 100 == 0:
logger.debug('Step {} '.format(batch))
(_, s) = sess.run((optimizer, summary_error))
logwriter.add_summary(s, batch)
print('step %d: Patiente un peu poto!' % batch)
img = sess.run(img_summary)
logwriter.add_summary(img, batch)
batch += 1
print(batch)
I understood what is the problem, it is actually that I am trying to run a session which is saved in gan/train_logs but the graph does not have those variables I am trying to run.
Therefore, I tried to implement this instead :
graph = tf.Graph()
tf.reset_default_graph()
with tf.Session(graph=graph) as sess:
ckpt = tf.train.get_checkpoint_state('gan/train_logs/')
saver = tf.train.import_meta_graph(ckpt.model_checkpoint_path + '.meta', clear_devices=True)
saver.restore(sess, ckpt.model_checkpoint_path)
logwriter = tf.summary.FileWriter("gan/invert_logs/", sess.graph)
#inp, _ = next(test_image)
BATCH_SIZE = 1
#Create one image to test inverting
test_image = map((lambda inp: (inp[0]*2. -1., inp[1])),
utilities.infinite_generator(cifar10.get_train(), BATCH_SIZE))
inp, _ = next(test_image)
#M_placeholder = tf.placeholder(tf.float32, shape=cifar10.get_shape_input(), name='M_input')
M_placeholder = inp
zmar = tf.summary.image('input_image', inp)
#Create sample noise from random normal distribution
z = tf.get_variable(name='z', shape=[BATCH_SIZE, 100], initializer=tf.random_normal_initializer())
# Function g(z) zhere z is randomly generated
g_z = dcgan.generator(z, is_training=True, name='generator')
generator_visualisation = tf.cast(((g_z / 2.0) + 0.5) * 255.0, tf.uint8)
sum_generator = tf.summary.image('summary/generator', generator_visualisation)
img_summary = tf.summary.merge([sum_generator, zmar])
with tf.name_scope('error'):
error = M_placeholder - g_z
# We set axis = None because norm(tensor, ord=ord) is equivalent to norm(reshape(tensor, [-1]), ord=ord)
error_norm = tf.norm(error, ord=2, axis=None, keep_dims=False, name='L2Norm')
summary_error = tf.summary.scalar('error_norm', error_norm)
with tf.name_scope('Optimizing'):
optimizer = tf.train.AdamOptimizer(0.001).minimize(error_norm, var_list=z)
sess.run(tf.global_variables_initializer())
for i in range(10000):
(_, s) = sess.run((optimizer, summary_error))
logwriter.add_summary(s, i)
print('step %d: Patiente un peu poto!' % i)
img = sess.run(img_summary)
logwriter.add_summary(img, i)
print('Done Training')
This script runs, but I have checked on tensorboard, the generator that is used here does not have the trained weights and it only produces noise.
I think I am trying to run a session in a graph that uses another graph and its trained session. I have read thoroughly the Graphs and Session documentation on tensorflow website https://www.tensorflow.org/versions/r1.3/programmers_guide/graphs, I have found an interesting tf.import_graph_def function :
You can rebind tensors in the imported graph to tf.Tensor objects in the default graph by passing the optional input_map argument. For example, input_map enables you to take import a graph fragment defined in a tf.GraphDef, and statically connect tensors in the graph you are building to tf.placeholder tensors in that fragment.
You can return tf.Tensor or tf.Operation objects from the imported graph by passing their names in the return_elements list.
But I don't know how to use this function, no example is given, and also I only found those two links that may help me :
https://github.com/tensorflow/tensorflow/issues/7508
Tensorflow: How to use a trained model in a application?
It would be really nice to have your help on this topic. This should be straightforward for someone who has already used the tf.import_graph_def function... What I really need is to get the trained generator to apply it to a new variable z which is to be trained in another session.
Thanks

Train your own image with tensorflow?

I have one image ( i don't have dataset ) I want to train a model in tensorflow,
such that I can use that model to recognize the image fast.
I have implemented one such thing, but it doesn't work:
import tensorflow as tf
filenames = ['pic.jpg']
# step 2
filename_queue = tf.train.string_input_producer(filenames)
# step 3: read, decode and resize images
reader = tf.WholeFileReader()
filename, content = reader.read(filename_queue)
image = tf.image.decode_jpeg(content, channels=3)
image = tf.cast(image, tf.float32)
resized_image = tf.image.resize_images(image, [224, 224])
# step 4: Batching
image_batch = tf.train.batch([resized_image], batch_size=8)
Also, how vuforia is able to recognize with only one image so fast?. I want a similar implementation in tensorflow
This is not how machine learning and deep learning works. You can't just grab one element and build a model which explains this one element. If you will check a few NN tutorials, you will see that in order to train a reasonable model people use thousands or even millions of data points.

Resources