How can I fix this error in my PyTorch scaling optimization code? - pytorch

I want to make a scaling optimization system:
import torch
import torch.nn as nn

class GraphFit:
    def __init__(self):
        self.X = torch.tensor([])
        self.Y = torch.tensor([])
        self.scale_x = torch.Tensor([])
        self.scale_y = torch.Tensor([])
        self.opt = 0

    def scale(self, imagx=1., imagy=1):
        self.scale_x = torch.tensor([1.], requires_grad=True)
        self.scale_y = torch.tensor([1.], requires_grad=True)
        self.X = self.scale_x * self.X
        self.Y = self.scale_y * self.Y

    def optimize(self, lr=1e-03):
        params = [self.scale_x, self.scale_y]
        self.opt = torch.optim.Adam(params, lr)

x = torch.linspace(-10, 10, 100)
yt = 0.5 * torch.sin(0.2 * x)  ## truth
y = torch.sin(x)               ## try

G1 = GraphFit()
G1.X = x
G1.scale()
G1.optimize()

L = nn.MSELoss()
for i in range(1):  ####
    G1.opt.zero_grad()
    G1.Y = torch.sin(G1.X)
    err = L(G1.Y, yt)
    err.backward()
    G1.opt.step()
    if i % 100 == 0:
        print(err)
        print(G1.scale_y.grad, G1.scale_x.grad)
When I use 'for i in range(1)', the output is
tensor(0.1189, grad_fn=<MseLossBackward>)
None tensor([-0.0425])
and when I use 'for i in range(N)' for N > 1, the output is
---> 83 err.backward()
Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
What is the problem and how can I fix it?
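For reference, here is a minimal sketch (my own, not part of the original post) of one way to restructure the loop so that the multiplication by the scale parameters is recomputed on every iteration; this rebuilds the graph each time instead of backpropagating twice through the one created once in scale():
import torch
import torch.nn as nn

scale_x = torch.tensor([1.], requires_grad=True)
scale_y = torch.tensor([1.], requires_grad=True)
opt = torch.optim.Adam([scale_x, scale_y], lr=1e-3)

x = torch.linspace(-10, 10, 100)
yt = 0.5 * torch.sin(0.2 * x)  # truth
L = nn.MSELoss()

for i in range(1000):
    opt.zero_grad()
    Y = scale_y * torch.sin(scale_x * x)  # fresh graph on every iteration
    err = L(Y, yt)
    err.backward()
    opt.step()
    if i % 100 == 0:
        print(err.item(), scale_x.grad, scale_y.grad)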

Related

Sklearn Pipeline: One feature automatically missed out

I created a custom classifier (a dummy classifier). Below is its definition. I also added some print statements and global variables to capture values:
import numpy as np
from sklearn.base import ClassifierMixin

class FeaturePassThroughClassifier(ClassifierMixin):
    def __init__(self):
        pass

    def fit(self, X, y):
        global test_arr1
        self.classes_ = np.unique(y)
        test_arr1 = X
        print("1:", X.shape)
        return self

    def predict(self, X):
        global test_arr2
        test_arr2 = X
        print("2:", X.shape)
        return X

    def predict_proba(self, X):
        global test_arr3
        test_arr3 = X
        print("3:", X.shape)
        return X
Below is the StackingClassifier definition, where the above CustomClassifier is one of the base classifiers. There are 3 more base classifiers (these are fitted estimators). The goal is to get the input training-set variables as-is (which will come out of the CustomClassifier) plus the predictions from base_classifier2, base_classifier3, and base_classifier4. These features will act as input to the meta classifier.
model = StackingClassifier(
    estimators=[
        ('select_features', Pipeline(steps=[
            ("model_feature_selector", ColumnTransformer([('feature_list', 'passthrough', X_train.columns)])),
            ('base(dummy)_classifier1', FeaturePassThroughClassifier())])),
        ('base_classifier2', base_classifier2),
        ('base_classifier3', base_classifier3),
        ('base_classifier4', base_classifier4)
    ],
    final_estimator=Pipeline(
        memory=None,
        steps=[
            ('save_base_estimator_output_data', FunctionTransformer(save_base_estimator_output_data, validate=False)),
            ('final_model', RandomForestClassifier())
        ], verbose=True),
    passthrough=False,
    stack_method='predict_proba')
Below is the output on fitting the model. There are 230 variables:
Here is the problem: there are 230 variables, but the CustomClassifier output shows only 229, which is strange. We can clearly see from the print statements above that 230 variables get passed through the CustomClassifier.
I need to use stack_method = "predict_proba". I am not sure what's going wrong here. The code works fine when stack_method = "predict".
Since this is a binary classifier, the classifier class is expected to return two probability columns in the output matrix - one for the probability of class label "1" and another for "0".
In the output, one of these has been dropped since both are not required; hence 230 columns get reduced to 229. Add a dummy column to solve your problem.
In the Notes section of the documentation:
When predict_proba is used by each estimator (i.e. most of the time for stack_method='auto' or specifically for stack_method='predict_proba'), The first column predicted by each estimator will be dropped in the case of a binary classification problem.
Here's the code that eliminates the first column.
You could add a sacrificial first column in your custom estimator's predict_proba, or switch to decision_function (which will cause differences depending on your real base estimators), or use the passthrough option instead of the custom estimator (doing feature selection in the final_estimator object instead).
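For illustration, here is a hedged sketch (my own, not the answerer's code) of the first option: padding predict_proba with a sacrificial first column, so that the column StackingClassifier drops is the dummy rather than a real feature.
import numpy as np
from sklearn.base import ClassifierMixin

class FeaturePassThroughClassifier(ClassifierMixin):
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        return self

    def predict_proba(self, X):
        X = np.asarray(X)
        dummy = np.zeros((X.shape[0], 1))
        # For a binary problem StackingClassifier keeps only the columns after
        # the first one, so the sacrificial column is dropped instead of a feature.
        return np.hstack([dummy, X])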
Both the above solutions are on point. This is how I implemented the workaround with dummy column:
Declare a custom transformer whose output is the column that gets dropped, due to the reasons explained above:
from sklearn.base import BaseEstimator, TransformerMixin

class add_dummy_column(BaseEstimator, TransformerMixin):
    def __init__(self, key):
        self.key = key

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        print(type(X))
        return X[[self.key]]
Do a feature union where the above custom transformer and the column transformer are called to create the final dataframe. This will duplicate the column that gets dropped. Below is the altered definition of the StackingClassifier with FeatureUnion:
model = StackingClassifier(
    estimators=[
        ('select_features', Pipeline(steps=[
            ('featureunion', FeatureUnion([
                ('add_dummy_column_to_input_dataframe', add_dummy_column(key='FEATURE_THAT_GETS_DROPPED')),
                ("model_feature_selector", ColumnTransformer([('feature_list', 'passthrough', X_train.columns)]))])),
            ('base(dummy)_classifier1', FeaturePassThroughClassifier())])),
        ('base_classifier2', base_classifier2),
        ('base_classifier3', base_classifier3),
        ('base_classifier4', base_classifier4)
    ],
    final_estimator=Pipeline(
        memory=None,
        steps=[
            ('save_base_estimator_output_data', FunctionTransformer(save_base_estimator_output_data, validate=False)),
            ('final_model', RandomForestClassifier())
        ], verbose=True),
    passthrough=False,
    stack_method='predict_proba')

Custom PyTorch optimizer is not working properly and am unable to access gradients

I'm trying to implement the elastic averaging stochastic gradient descent (EASGD) algorithm from the paper Deep Learning with Elastic Averaging SGD and was running into some trouble.
I'm using PyTorch's torch.optim.Optimizer class and referencing the official implementation of SGD and the official implementation of Accelerated SGD in order to start off somewhere.
The code that I have is:
import torch
import torch.optim as optim

class EASGD(optim.Optimizer):
    def __init__(self, params, lr, tau, alpha=0.001):
        self.alpha = alpha
        if lr < 0.0:
            raise ValueError(f"Invalid learning rate {lr}.")
        defaults = dict(lr=lr, alpha=alpha, tau=tau)
        super(EASGD, self).__init__(params, defaults)

    def __setstate__(self, state):
        super(EASGD, self).__setstate__(state)

    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            tau = group['tau']
            for t, p in enumerate(group['params']):
                x_normal = p.clone()
                x_tilde = p.clone()
                if p.grad is None:
                    continue
                if t % tau == 0:
                    p = p - self.alpha * (x_normal - x_tilde)
                    x_tilde = x_tilde + self.alpha * (x_normal - x_tilde)
                d_p = p.grad.data
                p.data.add_(d_p, alpha=-group['lr'])
        return loss
When I run this code, I get the following error:
/home/user/github/test-repo/easgd.py:50: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
Reading this PyTorch discussion helped me understand the difference between leaf and non-leaf variables, but I'm not sure how I should fix my code to make it work properly.
Any tips on what to do or where to look are appreciated. Thanks.
I think the issue is you are copying p on this line:
p = p - self.alpha * (x_normal - x_tilde)
If this line gets executed (which is the case in the first cycle, when t = 0), the following line raises the error because p doesn't have a .grad attribute anymore.
You should use in-place operators instead: add_, mul_, sub_, div_, etc.
for t, p in enumerate(group['params']):
    if p.grad is None:
        continue
    d_p = p.grad.data
    if t % tau == 0:
        d_p.sub_(self.alpha * 0.01)
    p.data.add_(d_p, alpha=-group['lr'])
Above, I have removed x_normal and x_tilde since you didn't give them proper values, but I hope you get the idea: only use in-place operators when working with the data inside the step function.
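If it helps, here is a minimal usage sketch (the model, data, and hyperparameters are made up) showing the simplified optimizer stepping on ordinary leaf parameters:
import torch
import torch.nn as nn

# Toy setup purely to exercise EASGD.step() once; values are arbitrary.
model = nn.Linear(4, 1)
opt = EASGD(model.parameters(), lr=1e-2, tau=2)

x = torch.randn(8, 4)
y = torch.randn(8, 1)

loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()       # parameters are updated in place via p.data.add_
opt.zero_grad()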

How to use multiprocessing in PyTorch?

I'm trying to use PyTorch with a complex loss function. In order to accelerate the code, I hope that I can use the PyTorch multiprocessing package.
In the first trial, I put 10x1 features into the NN and get a 10x4 output.
After that, I want to pass the 10x4 parameters into a function to do some calculation. (The calculation will be complex in the future.)
After calculating, the function will return a 10x1 array in total. This array will be set as NN_energy and used to calculate the loss function.
Besides, I also want to know if there is another method to create a backward-able array to store the NN_energy array, instead of using
NN_energy = net(Data_in)[0:10,0]
Thanks a lot.
Full Code:
import torch
import numpy as np
from torch.autograd import Variable
from torch import multiprocessing

def func(msg, BOP):
    ans = (BOP[msg][0] + BOP[msg][1] / BOP[msg][2]) * BOP[msg][3]
    return ans

class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden_1, n_hidden_2, n_output):
        super(Net, self).__init__()
        self.hidden_1 = torch.nn.Linear(n_feature, n_hidden_1)   # hidden layer
        self.hidden_2 = torch.nn.Linear(n_hidden_1, n_hidden_2)  # hidden layer
        self.predict = torch.nn.Linear(n_hidden_2, n_output)     # output layer

    def forward(self, x):
        x = torch.tanh(self.hidden_1(x))  # activation function for hidden layer
        x = torch.tanh(self.hidden_2(x))  # activation function for hidden layer
        x = self.predict(x)               # linear output
        return x

if __name__ == '__main__':  # apply_async
    Data_in = Variable(torch.from_numpy(np.asarray(list(range(0, 10))).reshape(10, 1)).float())
    Ground_truth = Variable(torch.from_numpy(np.asarray(list(range(20, 30))).reshape(10, 1)).float())
    net = Net(n_feature=1, n_hidden_1=15, n_hidden_2=15, n_output=4)  # define the network
    optimizer = torch.optim.Rprop(net.parameters())
    loss_func = torch.nn.MSELoss()  # this is for regression mean squared loss
    NN_output = net(Data_in)
    args = range(0, 10)
    pool = multiprocessing.Pool()
    return_data = pool.map(func, zip(args, NN_output))
    pool.close()
    pool.join()
    NN_energy = net(Data_in)[0:10, 0]
    for i in range(0, 10):
        NN_energy[i] = return_data[i]
    loss = torch.sqrt(loss_func(NN_energy, Ground_truth))  # must be (1. nn output, 2. target)
    print(loss)
Error messages:
File
"C:\ProgramData\Anaconda3\lib\site-packages\torch\multiprocessing\reductions.py",
line 126, in reduce_tensor
raise RuntimeError("Cowardly refusing to serialize non-leaf tensor which requires_grad, "
RuntimeError: Cowardly refusing to serialize non-leaf tensor which
requires_grad, since autograd does not support crossing process
boundaries. If you just want to transfer the data, call detach() on
the tensor before serializing (e.g., putting it on the queue).
First of all, the Torch Variable API has been deprecated for a very long time; just don't use it.
Next, torch.from_numpy( np.asarray(list(range( 0,10))).reshape(10,1) ).float() is wrong on many levels: np.asarray of a list is useless since a copy will be performed anyway, and np.array takes a list as input by design. Then, np.arange is available to return a range as a numpy array, and it is also available in Torch. Next, specifying both dimensions for reshape is unnecessary and error-prone; you could simply do reshape((-1, 1)), or even better unsqueeze(-1).
Here is the simplified expression: torch.arange(10, dtype=torch.float32, requires_grad=True).unsqueeze(-1).
Using a multiprocessing pool is bad practice if batch processing is possible; batching will be both far more efficient and more readable. Indeed, performing N small algebraic operations in parallel is always slower than a single larger algebraic operation, even more so on GPU. More importantly, computing the gradient is not supported by multiprocessing, hence the error that you get. Yet this is only partially true, because it is supported for tensors on CPU since 1.6.0. Have a look at the official release changelog.
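For illustration, a hedged sketch (my addition, assuming BOP is simply the 10x4 network output shown above) of the batched alternative, which stays inside autograd and removes the need for a pool:
import torch

def func_batched(BOP):
    # Same arithmetic as func, applied to every row at once:
    # (col0 + col1 / col2) * col3, returned as a 10x1 tensor.
    return ((BOP[:, 0] + BOP[:, 1] / BOP[:, 2]) * BOP[:, 3]).unsqueeze(-1)

# Usage sketch: NN_energy = func_batched(net(Data_in))  # differentiable end to end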
Could you post a more representative example of what the func method could be, to make sure you really need it?
NB: distributed autograd, as you are looking for, is now available in PyTorch as an experimental feature, in beta since 1.6.0. Have a look at the official documentation.

Keras - EarlyStopping based on user input

I am wondering if there is an easy way of triggering early stopping in Keras based on user input rather than the monitoring of any particular metric.
I.e., I would like to send a keyboard signal to the process executing the training so that it gets out of the fit_generator function and executes the remaining code.
Any ideas?
EDIT: Based on @AnkurGoel's answer, I wrote this code:
import signal
import logging
from keras.callbacks import Callback

# Monitors the SIGINT (ctrl + C) to safely stop training when it is sent
flag = False

class TerminateOnFlag(Callback):
    """Callback that terminates training when the flag is raised."""
    def on_batch_end(self, batch, logs=None):
        if flag:
            self.model.stop_training = True

def handler(signum, frame):
    logging.info('SIGINT signal received. Training will finish after this epoch')
    global flag
    flag = True

signal.signal(signal.SIGINT, handler)  # We assign a specific handler for the SIGINT signal

terminateOnFlag = TerminateOnFlag()
callbacks.append(terminateOnFlag)
Where callbacks is a list of callbacks I fed into fit_generator.
During training, when I send the SIGINT signal I do indeed get the message "SIGINT signal received. Training will finish after this epoch", but when the epoch ends nothing happens. What is going on?
You can give some thought to the approach below:
Use one global variable, initialized to 0.
Use a signal handler: when a signal (interrupt) is received by the Python process, the variable's value is changed from 0 to 1.
Use a custom callback in Keras to stop the training when this variable's value is changed.
class TerminateOnFlag(Callback):
    """Callback that terminates training when flag=1 is encountered."""
    def on_batch_end(self, batch, logs=None):
        if flag == 1:
            self.model.stop_training = True
Original Callbacks are available at:
https://github.com/keras-team/keras/blob/master/keras/callbacks.py#L251
You still have to check whether it is possible to provide a custom callback to fit_generator, instead of the standard callbacks.
Here is the code for the signal handler:
For Windows:
import signal, os

def handler(signum, frame):
    print('Signal handler called with signal', signum)
    raise OSError("Couldn't open device!")

signal.signal(signal.CTRL_C_EVENT, handler)  # only in Python 3.2+
For Linux:
import signal, os

def handler(signum, frame):
    print('Signal handler called with signal', signum)
    raise OSError("Couldn't open device!")

signal.signal(signal.SIGINT, handler)
A better and safer way is to use the mouse as input for stopping and other internal interactions.
For example, to stop Keras at the end of a batch when the mouse is moved to the left side of the screen (mouse_x < 10):
def queryMousePosition():
    from ctypes import windll, Structure, c_long, byref

    class POINT(Structure):
        _fields_ = [("x", c_long), ("y", c_long)]

    pt = POINT()
    windll.user32.GetCursorPos(byref(pt))
    return pt.x, pt.y  # %timeit queryMousePosition()

class TerminateOnFlag(keras.callbacks.Callback):
    def on_batch_end(self, batch, logs=None):
        mouse_x, mouse_y = queryMousePosition()
        if mouse_x < 10:
            self.model.stop_training = True

callbacks = [keras.callbacks.ReduceLROnPlateau(), TerminateOnFlag()]
model.fit_generator(..., callbacks=callbacks, ...)
Not using a keyboard signal, but when running Keras in a Jupyter notebook I found it easiest to use a callback that stops training on the presence of a particular file.
import os
import tensorflow as tf

TRAINING_POISON_PILL_FILE_NAME = 'stop-training'

class PoisonPillCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs={}):
        if os.path.exists(TRAINING_POISON_PILL_FILE_NAME):
            self.model.stop_training = True
            os.remove(TRAINING_POISON_PILL_FILE_NAME)
            print(f'poison pill file "{TRAINING_POISON_PILL_FILE_NAME}" detected, stopping training')

model.fit(..., callbacks=[PoisonPillCallback(), ...])
Then you can just create an (empty) file with this name in the current directory in the Jupyter UI, and it will stop training after the current epoch.
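Equivalently (a small sketch of my own), the sentinel file can be created from another notebook cell instead of the file browser:
from pathlib import Path

# Creating the file makes the callback stop training at the end of the current epoch.
Path('stop-training').touch()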

Keras: badalloc with custom generator

I am using Keras with a tensorflow-gpu back end on an Ubuntu 17.04 VM.
I have created a custom generator to read inputs and classes from pickle files, but it seems to hit the following error:
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
The code for loading data can be seen here:
def data_gen(self, pklPaths, batch_size=16):
    while True:
        data = []
        labels = []
        for i, pklPath in enumerate(pklPaths):
            # print(pklPath)
            image = pickle.load(open(pklPath, 'rb'))
            for i in range(batch_size):
                # Set a label
                data.append(image[0][0])
                labels.append(image[1][1])
            yield np.array(data), np.array(labels)
Then in the train section I'm using fit_generator:
vm_model.fit_generator(vm.data_gen(pkl_train), validation_data=vm.data_gen(pkl_validate), epochs=15, verbose=2,
                       steps_per_epoch=(5000/16), validation_steps=(1000/16), callbacks=[tb])
The generator should have better memory management than loading everything at once; however, that doesn't seem to be the case! Any ideas?
OK, so I found the issue, so I'm answering my own question.
Basically, the previous version had an unnecessary loop and also kept increasing the size of data and labels, essentially loading the entire dataset into memory. The fixed generator:
def data_gen(self, pklPaths, batch_size=16):
    while True:
        data = []
        labels = []
        for i, pklPath in enumerate(pklPaths):
            # load pickle
            image = pickle.load(open(pklPath, 'rb'))
            # append
            data.append(image[0][0])
            labels.append(image[1])
            # if batch is complete, yield data and labels and reset
            if i % batch_size == 0 and i != 0:
                yield np.array(data), np.array(labels)
                data.clear()
                labels.clear()
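For completeness, a hedged usage sketch (the names pkl_train, pkl_validate, vm, vm_model, and tb are taken from the question) wiring the corrected generator into fit_generator, with step counts derived from the number of pickle files so each epoch covers the dataset roughly once:
batch_size = 16
vm_model.fit_generator(vm.data_gen(pkl_train, batch_size=batch_size),
                       validation_data=vm.data_gen(pkl_validate, batch_size=batch_size),
                       epochs=15, verbose=2,
                       steps_per_epoch=len(pkl_train) // batch_size,
                       validation_steps=len(pkl_validate) // batch_size,
                       callbacks=[tb])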
