tf.global_variables_initializer() does not work - python-3.x

Hello Tensorflow users/developers,
Even though I call the initializer operation, the reporter tells me that none of my variables are initialized. I created them using tf.get_variable(). Here is where my session and graph objects are created:
with tf.Graph().as_default():
    # Store all scores (each score is a loss-per-episode)
    init = tf.global_variables_initializer()
    all_scores, scores = [], []
    # Build common tensors used throughout entire session
    nn.build(seq_len)
    # Generate inference and loss models
    [loss, train_op] = nn.generate_models()
    with tf.Session() as sess:
        try:
            st = time.time()
            # Initialize all variables (note: variable tensors, not operation tensors)
            print('Initializing variables...')
            sess.run(init)
            print('Training starts...')
            for e, (input_, target) in sample_generator:
                feed_dict = nn.prepare_dict(input_, target)
                # Run one step of the model. The return values are the activations
                # from the `train_op` (which is discarded) and the `loss` Op.
                x = sess.run(tf.report_uninitialized_variables(tf.global_variables()))
                print(x)
                _, score = sess.run([train_op, loss],
                                    feed_dict=feed_dict)
                all_scores.append(score)
                scores.append(score)
                # Assess your predictions against target
                if e > 0 and not (e % 100):
                    print('Episode %05d: %.6f' % (e, np.mean(scores).tolist()[0]))
                    scores.clear()
        except KeyboardInterrupt:
            print('Elapsed time: %ld' % (time.time() - st))
            pass
I've called this method millions of times before and it worked perfectly, but right now it is leaving me in the lurch. What do you think the cause might be? Any suggestion would really be appreciated.
P.S. I tried calling tf.local_variables_initializer() too, but the reporter told me that there are no local variables at all.
Thanks in advance.

Thanks for the reply.
Well, I've figured it out. I shouldn't have executed the following assignment before building my model:
init = tf.global_variables_initializer()
For anyone's information: you may think, "I'll execute this 'init' operation and get its result when I run it in a Session, so it doesn't matter where I do the assignment above."
No, that is not true. TensorFlow decides which variables to initialize at the moment this assignment is executed, so call it after you have built your entire model.
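A minimal sketch of the corrected ordering (the variables w and b here are illustrative, not from the original model):
import tensorflow as tf  # TF 1.x API

with tf.Graph().as_default():
    # Build the model first, so every tf.get_variable() call has already run
    w = tf.get_variable('w', shape=[10, 1])
    b = tf.get_variable('b', shape=[1])
    # Only now create the init op; it only covers variables that exist at this point
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        print(sess.run(tf.report_uninitialized_variables()))  # expected: empty array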

If that function does not exist, I suspect you accidentally downgraded your TensorFlow version.
Can you try tf.initialize_all_variables?
If this does not work, can you post which version you are using?
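For reference, the installed version can be printed with:
import tensorflow as tf
print(tf.__version__)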

I got the same error. My solution was to skip the early init = tf.global_variables_initializer() assignment and just run the initializer directly inside the session:
sess = tf.Session()
sess.run(tf.global_variables_initializer())

Related

Calling VGG many times causes an out of memory error

I want to extract the VGG features of a set of images and keep them in memory in a dictionary. The dictionary ends up holding 8091 tensors each of shape (1,4096), but my machine crashes with an out of memory error after about 6% of the way. Does anybody have a clue why this is happening and how to prevent it?
In fact, this seems to be triggered by the call to VGG rather than the memory space, since storing the VGG classification is sufficient to trigger the error.
Below is the simplest code I've found to reproduce the error. Once a helper function is defined:
import torch, torchvision
from tqdm import tqdm

vgg = torchvision.models.vgg16(weights='DEFAULT')

def try_and_crash(gen_data):
    store_out = {}
    for i in tqdm(range(8091)):
        my_output = gen_data(torch.randn(1, 3, 224, 224))
        store_out[i] = my_output
    return store_out
Calling it to quickly produce a large tensor doesn't cause a fuss
just_fine = try_and_crash(lambda x: torch.randn(1,4096))
but calling it to use vgg causes the machine to crash:
will_crash = try_and_crash(vgg)
The problem is that each element of the dictionary store_out[i] also holds onto the autograd graph (and intermediate activations) that led to its computation, and therefore ends up being much larger than a simple 1x4096 tensor.
Running the code under torch.no_grad(), or equivalently with torch.set_grad_enabled(False), solves the issue. We can test it by slightly changing the helper function:
def try_and_crash_grad(gen_data, grad_enabled):
    store_out = {}
    for i in tqdm(range(8091)):
        with torch.set_grad_enabled(grad_enabled):
            my_output = gen_data(torch.randn(1, 3, 224, 224))
            store_out[i] = my_output
    return store_out
Now the following works
works_fine = try_and_crash_grad(vgg, False)
while the following throws an out of memory error
crashes = try_and_crash_grad(vgg, True)
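As a side note (not from the original answer), if gradients must stay enabled elsewhere, detaching each output before storing it also keeps the autograd graph from being retained. A sketch reusing the same vgg and tqdm setup from above:
def try_without_graph(gen_data):
    store_out = {}
    for i in tqdm(range(8091)):
        out = gen_data(torch.randn(1, 3, 224, 224))
        # Detach before storing so the saved tensor no longer references the graph
        store_out[i] = out.detach()
    return store_out

works_too = try_without_graph(vgg)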

Creating tensors on M1 GPU by default on PyTorch using jupyter

Right now, if I want to create a tensor on the GPU, I have to do it manually. For context, I'm sure that GPU support is available since
print(torch.backends.mps.is_available())  # ensures that the current PyTorch installation was built with MPS activated
print(torch.backends.mps.is_built())
both return True.
I've been doing this every time:
device = torch.device("mps")
a = torch.randn((), device=device, dtype=dtype)
Is there a way to specify, for a jupyter notebook, that all my tensors are supposed to be run on the GPU?
The convenient way
There is no convenient way to set default device to MPS as of 2022-12-22, per discussion on this issue.
The inconvenient way
You can accomplish the objective of 'I don't want to specify device= for tensor constructors, just use MPS' by intercepting calls to tensor constructors:
class MPSMode(torch.overrides.TorchFunctionMode):
    def __init__(self):
        # incomplete list; see link above for the full list
        self.constructors = {getattr(torch, x) for x in
                             "empty ones arange eye full fill linspace rand randn randint randperm range zeros tensor as_tensor".split()}

    def __torch_function__(self, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        if func in self.constructors:
            if 'device' not in kwargs:
                kwargs['device'] = 'mps'
        return func(*args, **kwargs)

# sensible usage
with MPSMode():
    print(torch.empty(1).device)  # prints mps:0

# sneaky usage
MPSMode().__enter__()
print(torch.empty(1).device)  # prints mps:0
The recommended way
I would lean towards just putting your device in a config at the top of your notebook and using it explicitly:
class Conf: dev = torch.device("mps")
# ...
a = torch.randn(1, device=Conf.dev)
This requires you to type device=Conf.dev throughout the code. But you can easily switch your code to different devices, and you don't have any implicit global state to worry about.
As of 2023-01-20, with PyTorch 2.0 nightly, you can set the default device to MPS using:
torch.set_default_device("mps")
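A quick check (illustrative, and assuming an MPS-capable machine) that the new default is picked up:
import torch

torch.set_default_device("mps")
print(torch.empty(1).device)  # expected: mps:0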

Why would you call .detach() on a parameter when the code is within with torch.no_grad()?

So I have this code for updating the critic in SAC:
with torch.no_grad():
    _, policy_action, log_pi, _ = self.actor(next_obs)
    target_Q1, target_Q2 = self.critic_target(next_obs, policy_action)
    target_V = torch.min(target_Q1, target_Q2) - self.alpha.detach() * log_pi
    target_Q = reward + (not_done * self.discount * target_V)
This is not my code; it's code I got off GitHub. As we can see, it uses both torch.no_grad() and self.alpha.detach(). Why would you need both? This seems redundant to me: under torch.no_grad(), nothing inside the with statement is added to the computational graph, and .detach() does the same thing for a single tensor. Why would you use torch.no_grad() and .detach() together?
You don't need to detach tensors from the graph while under the torch.no_grad() context manager. However, I suspect this snippet was copied from an inference path where gradients are computed by default; you could verify that by looking at the training loop's source file.
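A small standalone sketch (with made-up tensors, not the SAC code) showing that inside no_grad the result already carries no graph, so the extra .detach() changes nothing:
import torch

alpha = torch.tensor(0.2, requires_grad=True)
log_pi = torch.randn(4)

with torch.no_grad():
    a = alpha * log_pi            # no grad_fn is recorded under no_grad
    b = alpha.detach() * log_pi   # same values; detach is redundant here

print(a.requires_grad, b.requires_grad)  # False False
print(torch.equal(a, b))                 # True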

Disable grad and backward Globally?

How can I disable grad, backward, and any other non-forward() functionality GLOBALLY in Torch?
I see examples of how to do it locally but not globally.
The docs say that what I may be looking for is inference-only mode, but how do I set that globally?
You can use torch.set_grad_enabled(False) to disable gradient propagation globally for the entire thread. Besides, after you call torch.set_grad_enabled(False), calling backward() will raise an exception.
import numpy as np
import torch

a = torch.tensor(np.random.rand(64, 5), dtype=torch.float32)
l = torch.nn.Linear(5, 10)
o = torch.sum(l(a))
print(o.requires_grad)  # True
o.backward()
print(l.weight.grad)  # shows gradients

torch.set_grad_enabled(False)
o = torch.sum(l(a))
print(o.requires_grad)  # False
o.backward()  # RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
print(l.weight.grad)
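Since the question mentions inference-only mode: torch.inference_mode() is the stricter variant, and it can also be applied as a decorator. A minimal sketch (the predict function and model here are illustrative):
import torch

@torch.inference_mode()
def predict(model, x):
    # Everything in here runs without autograd tracking
    return model(x)

model = torch.nn.Linear(5, 10)
out = predict(model, torch.randn(2, 5))
print(out.requires_grad)  # False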

Protocol problem with PyMc3 on jupyter notebook

I am working with the following code, but I get an error
import numpy as np
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    alpha = 1.0 / count_data.mean()  # Recall count_data is the
                                     # variable that holds our txt counts
    lambda_1 = pm.Exponential("lambda_1", alpha)
    lambda_2 = pm.Exponential("lambda_2", alpha)
    tau = pm.DiscreteUniform("tau", lower=0, upper=n_count_data - 1)

with model:
    idx = np.arange(n_count_data)  # Index
    lambda_ = pm.math.switch(tau > idx, lambda_1, lambda_2)

with model:
    observation = pm.Poisson("obs", lambda_, observed=count_data)

with model:
    step = pm.Metropolis()
    trace = pm.sample(10000, tune=5000, step=step)
But I get the error
ValueError: must use protocol 4 or greater to copy this object; since __getnewargs_ex__ returned keyword arguments.
I have Windows 10, Python 3.5.6, PyMC3 3.5, and IPython 6.5.0. Any help is deeply appreciated. Thanks in advance.
It sounds like this exception is being thrown by the joblib library, which uses pickle to send the model to different processes. The easiest fix is to use only a single core, by changing the last line to
trace = pm.sample(10000, tune=5000, step=step, cores=1, chains=4)
It will be hard to diagnose the problem with joblib without more details. Creating a fresh conda environment might help.
The workaround suggested by colcarroll did not work for me. The behavior you are seeing is related to PR#3140 of PyMC3, which you may want to track there. The solution and/or workaround may depend on how you are running theano (with or without GPU support).
