I am wondering if there is an easy way to trigger early stopping in Keras based on user input rather than on monitoring any particular metric.
I.e. I would like to send a keyboard signal to the process running the training so that it exits the fit_generator function and executes the remaining code.
Any ideas?
EDIT: Based on AnkurGoel's answer, I wrote this code:
# Monitor SIGINT (Ctrl + C) so training can be stopped safely when it is sent
import signal
import logging
from keras.callbacks import Callback

flag = False

class TerminateOnFlag(Callback):
    """Callback that terminates training when the flag is raised."""
    def on_batch_end(self, batch, logs=None):
        if flag:
            self.model.stop_training = True

def handler(signum, frame):
    global flag
    logging.info('SIGINT signal received. Training will finish after this epoch')
    flag = True

signal.signal(signal.SIGINT, handler)  # Assign a specific handler for the SIGINT signal

terminateOnFlag = TerminateOnFlag()
callbacks.append(terminateOnFlag)
where callbacks is the list of callbacks I feed into fit_generator.
During training, when I send the SIGINT signal I do get the message "SIGINT signal received. Training will finish after this epoch", but when the epoch ends nothing happens. What is going on?
You can consider the approach below:
Use one global variable, initialized to 0.
Use a signal handler: when the interrupt signal is received by the Python process, the variable's value is changed from 0 to 1.
Use a custom callback in Keras to stop the training when this variable's value changes:
class TerminateOnFlag(Callback):
    """Callback that terminates training when flag=1 is encountered."""
    def on_batch_end(self, batch, logs=None):
        if flag == 1:
            self.model.stop_training = True
The original callbacks are available at:
https://github.com/keras-team/keras/blob/master/keras/callbacks.py#L251
You still have to check whether it is possible to pass a custom callback to fit_generator instead of the standard callbacks.
Here is the code for the signal handler:
For Windows:
import signal, os

def handler(signum, frame):
    print('Signal handler called with signal', signum)
    raise OSError("Couldn't open device!")

signal.signal(signal.CTRL_C_EVENT, handler)  # only available from Python 3.2 onwards

For Linux:
import signal, os

def handler(signum, frame):
    print('Signal handler called with signal', signum)
    raise OSError("Couldn't open device!")

signal.signal(signal.SIGINT, handler)
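Tying the pieces together, a minimal end-to-end sketch of the flag-based variant could look like the following. The names model and train_gen are placeholders for an already-compiled model and a data generator; fit_generator accepts custom Callback subclasses through the same callbacks argument as the built-in ones.

import signal
import logging
from keras.callbacks import Callback

flag = 0

def handler(signum, frame):
    # flip the global flag; the callback below checks it after every batch
    global flag
    logging.info('Interrupt received; training will stop after the current batch')
    flag = 1

class TerminateOnFlag(Callback):
    def on_batch_end(self, batch, logs=None):
        if flag == 1:
            self.model.stop_training = True

signal.signal(signal.SIGINT, handler)

# model and train_gen are assumed to exist already
model.fit_generator(train_gen, steps_per_epoch=100, epochs=10,
                    callbacks=[TerminateOnFlag()])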
A better and safer way is to use the mouse as input for stopping and other interactions with the running process.
For example, to stop Keras at the end of a batch when the mouse is moved to the left side of the screen (mouse_x < 10):
def queryMousePosition():
    # Windows-only: read the cursor position via the Win32 API
    from ctypes import windll, Structure, c_long, byref

    class POINT(Structure):
        _fields_ = [("x", c_long), ("y", c_long)]

    pt = POINT()
    windll.user32.GetCursorPos(byref(pt))
    return pt.x, pt.y

class TerminateOnFlag(keras.callbacks.Callback):
    def on_batch_end(self, batch, logs=None):
        mouse_x, mouse_y = queryMousePosition()
        if mouse_x < 10:
            self.model.stop_training = True

callbacks = [keras.callbacks.ReduceLROnPlateau(), TerminateOnFlag()]
model.fit_generator(..., callbacks=callbacks, ...)
Not using a keyboard signal, but when running Keras in a Jupyter notebook I found it easiest to use a callback that stops training on the presence of a particular file.
import os
import tensorflow as tf

TRAINING_POISON_PILL_FILE_NAME = 'stop-training'

class PoisonPillCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs={}):
        if os.path.exists(TRAINING_POISON_PILL_FILE_NAME):
            self.model.stop_training = True
            os.remove(TRAINING_POISON_PILL_FILE_NAME)
            print(f'poison pill file "{TRAINING_POISON_PILL_FILE_NAME}" detected, stopping training')

model.fit(..., callbacks=[PoisonPillCallback(), ...])
Then you can just create an (empty) file with this name in the current directory in the Jupyter UI and it will stop training after the current epoch.
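Creating the stop file itself is nothing Keras-specific; assuming the notebook's working directory is where training was launched, either of these works:

# from a notebook cell or a Python prompt in the same working directory
open('stop-training', 'w').close()

# or from a terminal:
#   touch stop-training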
When using a combination of the Dataset and DataLoader classes (as shown below), I have to explicitly move the data onto the GPU using .to() or .cuda(). Is there a way to instruct the DataLoader to do it automatically/implicitly?
Code to understand/reproduce the scenario:
from torch.utils.data import Dataset, DataLoader
import numpy as np

class DemoData(Dataset):
    def __init__(self, limit):
        super(DemoData, self).__init__()
        self.data = np.arange(limit)

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, idx):
        return (self.data[idx], self.data[idx]*100)

demo = DemoData(100)

loader = DataLoader(demo, batch_size=50, shuffle=True)

for i, (i1, i2) in enumerate(loader):
    print('Batch Index: {}'.format(i))
    print('Shape of data item 1: {}; shape of data item 2: {}'.format(i1.shape, i2.shape))
    # i1, i2 = i1.to('cuda:0'), i2.to('cuda:0')
    print('Device of data item 1: {}; device of data item 2: {}\n'.format(i1.device, i2.device))
This will output the following; note that without an explicit device-transfer instruction, the data is loaded onto the CPU:
Batch Index: 0
Shape of data item 1: torch.Size([50]); shape of data item 2: torch.Size([50])
Device of data item 1: cpu; device of data item 2: cpu
Batch Index: 1
Shape of data item 1: torch.Size([50]); shape of data item 2: torch.Size([50])
Device of data item 1: cpu; device of data item 2: cpu
A possible solution is in this PyTorch GitHub issue (still open at the time this question was posted), but I am unable to make it work when the dataloader has to return multiple data items.
You can modify the collate_fn to handle several items at once:
import torch
from torch.utils.data.dataloader import default_collate

device = torch.device('cuda:0')  # or whatever device/cpu you like

# the new collate function is quite generic
loader = DataLoader(demo, batch_size=50, shuffle=True,
                    collate_fn=lambda x: tuple(x_.to(device) for x_ in default_collate(x)))
Note that if you want to have multiple workers for the dataloader, you'll need to add
torch.multiprocessing.set_start_method('spawn')
inside your if __name__ == '__main__' block (see this issue).
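A minimal sketch of that setup, reusing the demo dataset from the question (the num_workers value is arbitrary, and the lambda is replaced by a named function because lambdas cannot be pickled when worker processes are spawned):

import torch
from torch.utils.data import DataLoader
from torch.utils.data.dataloader import default_collate

device = torch.device('cuda:0')

def collate_to_device(batch):
    # collate as usual, then move every resulting tensor to the target device
    return tuple(x_.to(device) for x_ in default_collate(batch))

if __name__ == '__main__':
    torch.multiprocessing.set_start_method('spawn')
    loader = DataLoader(demo, batch_size=50, shuffle=True, num_workers=2,
                        collate_fn=collate_to_device)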
Having said that, it seems like using pin_memory=True in your DataLoader would be much more efficient. Have you tried this option?
See memory pinning for more information.
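A rough sketch of that option, again based on the loop from the question (the device name is an assumption, and non_blocking=True only pays off in combination with pinned memory):

import torch
from torch.utils.data import DataLoader

device = torch.device('cuda:0')
loader = DataLoader(demo, batch_size=50, shuffle=True, pin_memory=True)

for i1, i2 in loader:
    # batches are collated into pinned (page-locked) host memory,
    # so the copy to the GPU can be asynchronous
    i1 = i1.to(device, non_blocking=True)
    i2 = i2.to(device, non_blocking=True)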
Update (Feb 8th, 2021)
This post made me look at my "data-to-model" time spent during training.
I compared three alternatives:
(1) The DataLoader works on the CPU, and only after the batch is retrieved is the data moved to the GPU.
(2) Same as (1), but with pin_memory=True in the DataLoader.
(3) The proposed method of using collate_fn to move the data to the GPU.
From my limited experimentation it seems like the second option performs best (but not by a big margin).
The third option required fussing about the start_method of the data loader processes, and it seems to incur an overhead at the beginning of each epoch.
Since hyperparameter tuning seems to consist of training different models for the same task, I suppose it is a good idea to train them in parallel in order to save some time. However, my attempt was quite unsuccessful, as multiple errors occurred during the execution of my code. I was wondering if using Keras requires me to write multithreading differently, or if the problem lies elsewhere.
Here's what I wrote (I'm trying to calculate the effect of dropout on the minimal value of a custom metric):
from threading import Thread

class FitModel(Thread):
    def __init__(self, params):
        Thread.__init__(self)
        self.params = params

    def run(self):
        DC = DeltaCallback(verbose=0)  # custom metric
        model = keras.models.Sequential([
            keras.layers.Conv1D(64, 11, activation="relu", padding="SAME", input_shape=(700, 1)),
            keras.layers.BatchNormalization(),
            keras.layers.AvgPool1D(pool_size=2),
            keras.layers.Conv1D(128, 11, activation="relu", padding="SAME"),
            keras.layers.BatchNormalization(),
            keras.layers.AvgPool1D(pool_size=2),
            keras.layers.Conv1D(256, 11, activation="relu", padding="SAME"),
            keras.layers.BatchNormalization(),
            keras.layers.AvgPool1D(pool_size=2),
            keras.layers.Conv1D(512, 11, activation="relu", padding="SAME"),
            keras.layers.BatchNormalization(),
            keras.layers.AvgPool1D(pool_size=2),
            keras.layers.Conv1D(512, 11, activation="relu", padding="SAME"),
            keras.layers.BatchNormalization(),
            keras.layers.AvgPool1D(pool_size=2),
            keras.layers.Flatten(),
            keras.layers.Dropout(self.params[0]),
            keras.layers.Dense(4096, activation="relu"),
            keras.layers.Dropout(self.params[1]),
            keras.layers.Dense(4096, activation="relu"),
            keras.layers.Dense(256, activation="softmax")
        ])
        model.compile(loss="sparse_categorical_crossentropy",
                      optimizer=keras.optimizers.RMSprop(learning_rate=0.00001),
                      metrics=["accuracy"])
        model.fit(X_train, y_train, batch_size=100, epochs=50,
                  validation_data=(X_valid, y_valid), callbacks=[DC], verbose=2)
        print(self.params, "epochs : ", DC.deltas.index(min(DC.deltas)))
        print(self.params, "deltamin : ", DC.deltas[DC.deltas.index(min(DC.deltas))])
        print(self.params, "Nval : ", DC.Nvals[DC.deltas.index(min(DC.deltas))])

parameters_list = [[0, 0.1], [0.1, 0], [0.2, 0], [0.1, 0.2], [0.2, 0.1], [0.3, 0],
                   [0.3, 0.1], [0.3, 0.2], [0, 0.3], [0.1, 0.3], [0.2, 0.3]]

# create threads
THREADS = [FitModel(parameters) for parameters in parameters_list]
# start threads
for thread in THREADS:
    thread.start()
# wait for threads to finish
for thread in THREADS:
    thread.join()
The problem is that multiple exceptions, as well as OOM errors, occur when I try to execute this code. Any idea how to make this work?
Is it possible to implement the following scenario with TensorFlow:
In the first N batches, the learning rate should be increased from 0 to 0.001. After this number of batches has been reached, the learning rate should slowly decrease from 0.001 to 0.00001 after each epoch.
How can I combine this in a callback? TensorFlow offers tf.keras.callbacks.LearningRateScheduler as well as the callback hooks on_train_batch_begin() and on_train_batch_end(), but I cannot work out how to combine them into a single schedule.
Can someone give me an approach for creating such a combined callback that depends on both the number of batches and the number of epochs?
Something like this would work. I didn't test it and I didn't try to perfect it, but the pieces are there so that you can get it working how you like.
import tensorflow as tf
from tensorflow.keras.callbacks import Callback
import numpy as np

class LRSetter(Callback):

    def __init__(self, start_lr=0, middle_lr=0.001, end_lr=0.00001,
                 start_mid_batches=200, end_epochs=2000):
        self.start_mid_lr = np.linspace(start_lr, middle_lr, start_mid_batches)
        # Not exactly right since you'll have gone through a couple of epochs
        # by then, but you get the picture
        self.mid_end_lr = np.linspace(middle_lr, end_lr, end_epochs)
        self.start_mid_batches = start_mid_batches
        self.epoch_takeover = False

    def on_train_batch_begin(self, batch, logs=None):
        if batch < self.start_mid_batches:
            tf.keras.backend.set_value(self.model.optimizer.lr, self.start_mid_lr[batch])
        else:
            self.epoch_takeover = True

    def on_epoch_begin(self, epoch, logs=None):
        if self.epoch_takeover:
            tf.keras.backend.set_value(self.model.optimizer.lr, self.mid_end_lr[epoch])
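For reference, attaching it to a training run could look like the following; the model, x_train, y_train, and the warm-up/epoch numbers are placeholders, not part of the original question:

lr_setter = LRSetter(start_lr=0, middle_lr=0.001, end_lr=0.00001,
                     start_mid_batches=200, end_epochs=100)

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# the callback overrides the optimizer's learning rate on every batch/epoch
model.fit(x_train, y_train, epochs=100, callbacks=[lr_setter])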
I want to make a scaling optimization system:
import torch
import torch.nn as nn

class GraphFit:
    def __init__(self):
        self.X = torch.tensor([])
        self.Y = torch.tensor([])
        self.scale_x = torch.Tensor([])
        self.scale_y = torch.Tensor([])
        self.opt = 0

    def scale(self, imagx=1., imagy=1):
        self.scale_x = torch.tensor([1.], requires_grad=True)
        self.scale_y = torch.tensor([1.], requires_grad=True)
        self.X = self.scale_x * self.X
        self.Y = self.scale_y * self.Y

    def optimize(self, lr=1e-03):
        params = [self.scale_x, self.scale_y]
        self.opt = torch.optim.Adam(params, lr)

x = torch.linspace(-10, 10, 100)
yt = 0.5 * torch.sin(0.2 * x)  # truth
y = torch.sin(x)               # try

G1 = GraphFit()
G1.X = x
G1.scale()
G1.optimize()

L = nn.MSELoss()
for i in range(1):  ####
    G1.opt.zero_grad()
    G1.Y = torch.sin(G1.X)
    err = L(G1.Y, yt)
    err.backward()
    G1.opt.step()
    if i % 100 == 0:
        print(err)
        print(G1.scale_y.grad, G1.scale_x.grad)
When I use 'for i in range(1)', the output is
tensor(0.1189, grad_fn=<MseLossBackward>)
None tensor([-0.0425])
and when I use 'for i in range(N)' for N > 1, the output is
---> 83 err.backward()
Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
What is the problem and how can I fix it?
I'm trying to use PyTorch with a complex loss function. In order to accelerate the code, I hope that I can use the PyTorch multiprocessing package.
In a first trial, I put 10x1 features into the NN and get a 10x4 output.
After that, I want to pass the 10x4 output into a function to do some calculation. (The calculation will become more complex in the future.)
After calculating, the function will return a 10x1 array in total. This array will be set as NN_energy and used to calculate the loss function.
Besides, I also want to know if there is another method to create a backward-able array to store the NN_energy array, instead of using
NN_energy = net(Data_in)[0:10,0]
Thanks a lot.
Full Code:
import torch
import numpy as np
from torch.autograd import Variable
from torch import multiprocessing

def func(msg, BOP):
    ans = (BOP[msg][0] + BOP[msg][1] / BOP[msg][2]) * BOP[msg][3]
    return ans

class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden_1, n_hidden_2, n_output):
        super(Net, self).__init__()
        self.hidden_1 = torch.nn.Linear(n_feature , n_hidden_1)   # hidden layer
        self.hidden_2 = torch.nn.Linear(n_hidden_1, n_hidden_2)   # hidden layer
        self.predict  = torch.nn.Linear(n_hidden_2, n_output  )   # output layer

    def forward(self, x):
        x = torch.tanh(self.hidden_1(x))   # activation function for hidden layer
        x = torch.tanh(self.hidden_2(x))   # activation function for hidden layer
        x = self.predict(x)                # linear output
        return x

if __name__ == '__main__':  # apply_async
    Data_in      = Variable( torch.from_numpy( np.asarray(list(range( 0,10))).reshape(10,1) ).float() )
    Ground_truth = Variable( torch.from_numpy( np.asarray(list(range(20,30))).reshape(10,1) ).float() )

    net = Net( n_feature=1 , n_hidden_1=15 , n_hidden_2=15 , n_output=4 )   # define the network
    optimizer = torch.optim.Rprop( net.parameters() )
    loss_func = torch.nn.MSELoss()   # this is for regression mean squared loss

    NN_output = net(Data_in)

    args = range(0, 10)
    pool = multiprocessing.Pool()
    return_data = pool.map( func, zip(args, NN_output) )
    pool.close()
    pool.join()

    NN_energy = net(Data_in)[0:10, 0]
    for i in range(0, 10):
        NN_energy[i] = return_data[i]

    loss = torch.sqrt( loss_func( NN_energy , Ground_truth ) )   # must be (1. nn output, 2. target)
    print(loss)
Error messages:
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\multiprocessing\reductions.py", line 126, in reduce_tensor
    raise RuntimeError("Cowardly refusing to serialize non-leaf tensor which requires_grad, "
RuntimeError: Cowardly refusing to serialize non-leaf tensor which requires_grad, since autograd does not support crossing process boundaries. If you just want to transfer the data, call detach() on the tensor before serializing (e.g., putting it on the queue).
First of all, the Torch Variable API has been deprecated for a very long time; just don't use it.
Next, torch.from_numpy( np.asarray(list(range( 0,10))).reshape(10,1) ).float() is wrong at many levels: np.asarray on a list is pointless since a copy will be performed anyway, and np.array takes a list as input by design. Then, np.arange is available to return a range as a numpy array, and an equivalent exists in Torch. Finally, specifying both dimensions for reshape is needless and error-prone; you could simply do reshape((-1, 1)), or even better unsqueeze(-1).
Here is the simplified expression: torch.arange(10, dtype=torch.float32, requires_grad=True).unsqueeze(-1).
Using a multiprocessing pool is bad practice when batch processing is possible; batching is both far more efficient and more readable. Indeed, performing N small algebraic operations in parallel is always slower than a single larger algebraic operation, even more so on GPU. More importantly, computing the gradient is not supported by multiprocessing, hence the error that you get. (This is only partially true: it is supported for tensors on CPU since 1.6.0; have a look at the official release changelog.)
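To illustrate the batch-processing point for the specific func in the question, here is a sketch of the same arithmetic expressed as tensor operations (this reuses net, Data_in, loss_func, and Ground_truth from the posted code and is not part of the original answer):

# func computed (BOP[msg][0] + BOP[msg][1] / BOP[msg][2]) * BOP[msg][3] per row;
# the same thing over the whole 10x4 output in one vectorized, autograd-friendly step:
NN_output = net(Data_in)                                                              # shape (10, 4)
NN_energy = (NN_output[:, 0] + NN_output[:, 1] / NN_output[:, 2]) * NN_output[:, 3]   # shape (10,)
loss = torch.sqrt(loss_func(NN_energy.unsqueeze(-1), Ground_truth))                   # (10, 1) vs (10, 1)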
Could you post a more representative example of what the func method could be, to make sure you really need it?
NB: Distributed autograd, which is what you are looking for, is now available in PyTorch as an experimental feature, in beta since 1.6.0. Have a look at the official documentation.