I introduced the following lines in my deep learning project in order to stop training early when the validation loss has not improved for 10 epochs:
if best_valid_loss is None or valid_loss < best_valid_loss:
    best_valid_loss = valid_loss
    counter = 0
else:
    counter += 1
    if counter == 10:
        break
Now I want to use Optuna to tune some hyperparameters, but I don't really understand how pruning works in Optuna. Is it possible for Optuna pruners to act the same way as in the code above? I assume I have to use the following:
optuna.pruners.PatientPruner(???, patience=10)
But I don't know which pruner I could use inside PatientPruner. Btw in Optuna I'm minimizing the validation loss.
Short answer: Yes.
Hi, I'm one of the authors of PatientPruner in Optuna. For vanilla early stopping, wrapped_pruner=None works as you'd expect. For example:
import optuna

def objective(t):
    for step in range(30):
        if step == 5:
            t.report(0., step=step)
        else:
            t.report(step * 0.1, step=step)
        if t.should_prune():
            print("pruned at {}".format(step))
            raise optuna.exceptions.TrialPruned()
    return 1.

study = optuna.create_study(pruner=optuna.pruners.PatientPruner(None, patience=9), direction="minimize")
study.optimize(objective, n_trials=1)
The output will be "pruned at 15".
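Applied to your setup, a minimal sketch might look like the code below. Here build_model, train_one_epoch and compute_valid_loss are placeholders for your own training code, lr is just an example hyperparameter, and patience=10 mirrors your manual counter; treat it as an illustration rather than a drop-in implementation.
import optuna

def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)  # example hyperparameter
    model = build_model(lr)  # placeholder for your model construction

    for epoch in range(100):
        train_one_epoch(model)                  # placeholder training step
        valid_loss = compute_valid_loss(model)  # placeholder validation pass

        # Report the validation loss so the pruner can track improvement.
        trial.report(valid_loss, step=epoch)

        # With wrapped_pruner=None and patience=10, the trial is pruned once the
        # reported validation loss has not improved for 10 consecutive epochs.
        if trial.should_prune():
            raise optuna.exceptions.TrialPruned()

    return valid_loss

study = optuna.create_study(
    pruner=optuna.pruners.PatientPruner(None, patience=10),
    direction="minimize",
)
study.optimize(objective, n_trials=50)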
Related
I'm trying to implement the elastic averaging stochastic gradient descent (EASGD) algorithm from the paper Deep Learning with Elastic Averaging SGD and was running into some trouble.
I'm using PyTorch's torch.optim.Optimizer class and referencing the official implementation of SGD and the official implementation of Accelerated SGD in order to start off somewhere.
The code that I have is:
import torch
import torch.optim as optim


class EASGD(optim.Optimizer):
    def __init__(self, params, lr, tau, alpha=0.001):
        self.alpha = alpha

        if lr < 0.0:
            raise ValueError(f"Invalid learning rate {lr}.")

        defaults = dict(lr=lr, alpha=alpha, tau=tau)
        super(EASGD, self).__init__(params, defaults)

    def __setstate__(self, state):
        super(EASGD, self).__setstate__(state)

    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()

        for group in self.param_groups:
            tau = group['tau']
            for t, p in enumerate(group['params']):
                x_normal = p.clone()
                x_tilde = p.clone()

                if p.grad is None:
                    continue

                if t % tau == 0:
                    p = p - self.alpha * (x_normal - x_tilde)
                    x_tilde = x_tilde + self.alpha * (x_normal - x_tilde)

                d_p = p.grad.data
                p.data.add_(d_p, alpha=-group['lr'])

        return loss
When I run this code, I get the following error:
/home/user/github/test-repo/easgd.py:50: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
Reading this PyTorch discussion helped me understand the difference between leaf and non-leaf variables, but I'm not sure how I should fix my code to make it work properly.
Any tips on what to do or where to look are appreciated. Thanks.
I think the issue is that on this line you create a new (non-leaf) tensor and rebind p to it:
p = p - self.alpha * (x_normal - x_tilde)
If this line gets executed (which is the case in the first cycle, when t = 0), the later access to p.grad triggers the warning, because the new p is no longer a leaf tensor and its .grad attribute is not populated.
You should use in-place operators instead: add_, mul_, sub_, div_, etc.
for t, p in enumerate(group['params']):
    if p.grad is None:
        continue

    d_p = p.grad.data

    if t % tau == 0:
        d_p.sub_(self.alpha * 0.01)

    p.data.add_(d_p, alpha=-group['lr'])
Above, I have removed x_normal and x_tilde since you didn't give them proper values, but I hope you get the idea: only use in-place operators when working with the parameter data inside the step function.
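If you do want to keep the elastic-averaging term, one possible direction (a rough sketch only, not a verified implementation of the paper's update rule) is to store the center variable x_tilde per parameter in the optimizer's state and replace the step method so that every update is done in place:
def step(self, closure=None):
    loss = None
    if closure is not None:
        with torch.enable_grad():
            loss = closure()

    for group in self.param_groups:
        lr = group['lr']
        alpha = group['alpha']

        for p in group['params']:
            if p.grad is None:
                continue

            state = self.state[p]
            if 'x_tilde' not in state:
                # Initialize the center variable with a detached copy of p.
                state['x_tilde'] = p.detach().clone()
            x_tilde = state['x_tilde']

            diff = p.detach() - x_tilde       # x - x_tilde, computed out of place
            p.data.add_(p.grad, alpha=-lr)    # gradient step, in place
            p.data.add_(diff, alpha=-alpha)   # elastic pull toward the center, in place
            x_tilde.add_(diff, alpha=alpha)   # move the center toward x, in place

    return loss
This keeps p a leaf parameter throughout, so the .grad warning from above does not appear.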
Is it possible to implement the following scenario with TensorFlow:
In the first N batches, the learning rate should be increased from 0 to 0.001. After this number of batches has been reached, the learning rate should slowly decrease from 0.001 to 0.00001 after each epoch.
How can I combine this in a callback? TensorFlow offers tf.keras.callbacks.LearningRateScheduler as well as the callback hooks on_train_batch_begin() and on_train_batch_end(), but I can't work out how to combine them into a single callback.
Can someone suggest an approach for creating such a combined callback that depends on both the batch count and the epoch count?
Something like this would work. I didn't test this and I didn't try to perfect it...but the pieces are there so that you can get it working how you like.
import tensorflow as tf
from tensorflow.keras.callbacks import Callback
import numpy as np


class LRSetter(Callback):
    def __init__(self, start_lr=0, middle_lr=0.001, end_lr=0.00001,
                 start_mid_batches=200, end_epochs=2000):
        super().__init__()
        self.start_mid_lr = np.linspace(start_lr, middle_lr, start_mid_batches)
        # Not exactly right since you'll have gone through a couple of epochs
        # by then, but you get the picture
        self.mid_end_lr = np.linspace(middle_lr, end_lr, end_epochs)
        self.start_mid_batches = start_mid_batches
        self.epoch_takeover = False

    def on_train_batch_begin(self, batch, logs=None):
        if batch < self.start_mid_batches:
            tf.keras.backend.set_value(self.model.optimizer.lr, self.start_mid_lr[batch])
        else:
            self.epoch_takeover = True

    def on_epoch_begin(self, epoch, logs=None):
        if self.epoch_takeover:
            tf.keras.backend.set_value(self.model.optimizer.lr, self.mid_end_lr[epoch])
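For completeness, a hedged usage sketch of how the callback could be wired into training; the model, the data (x_train, y_train) and the 200/2000 schedule lengths below are placeholders you would replace with your own.
# Hypothetical usage: the dataset and model are placeholders, just to show
# where LRSetter plugs into model.fit.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")

lr_setter = LRSetter(start_lr=0.0, middle_lr=0.001, end_lr=0.00001,
                     start_mid_batches=200, end_epochs=2000)

model.fit(x_train, y_train, epochs=2000, callbacks=[lr_setter])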
I'm trying to run a simple BoostedTreesClassifier on my dataset, following the example, but it seems to get stuck on the first step:
2019-06-28 11:20:31.658689: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:111] Filling up shuffle buffer (this may take a while): 84090 of 85873
2019-06-28 11:20:32.908425: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:162] Shuffle buffer filled.
I0628 11:20:34.904214 140220602029888 basic_session_run_hooks.py:262] loss = 0.6931464, step = 0
W0628 11:21:03.421219 140220602029888 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W0628 11:21:05.555618 140220602029888 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
The same dataset seems to work fine when I pass it to other Keras-based models or an xgboost model.
Here's the relevant code:
def make_input_fn(self, X, y, shuffle=True, num_epochs=None):
    num_samples = len(self.y_train)

    def input_fn():
        dataset = tf.data.Dataset.from_tensor_slices((dict(X), y))
        if shuffle:
            dataset = dataset.shuffle(num_samples).repeat(num_epochs).batch(self.batch_size)
        else:
            dataset = dataset.repeat(num_epochs).batch(self.batch_size)
        return dataset

    return input_fn

def ens_train(self):
    tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.DEBUG)
    train_input_fn = self.make_input_fn(self.X_train, self.y_train, num_epochs=self.epochs)
    self.model = tf.estimator.BoostedTreesClassifier(self.feature_columns,
                                                     n_batches_per_layer=int(0.5 * len(self.y_train) / self.batch_size),
                                                     model_dir=self.ofolder,
                                                     max_depth=10,
                                                     n_trees=1000)
    self.model.train(train_input_fn, max_steps=1000)
I was able to get a result by playing around with the learning rate and the number of epochs. The "best" parameters obtained by hyperparameter tuning on xgboost don't give similar results in BoostedTreesClassifier. It took a large number of epochs to get to around 84% accuracy (balanced dataset); xgboost had given 95% without any hyperparameter tuning.
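For anyone tweaking the same knobs, a hedged sketch of where those parameters plug in; the specific values below are illustrative, not tuned.
# Illustrative values only: learning_rate, n_trees and max_depth are the knobs
# worth sweeping when xgboost-tuned values don't transfer directly.
self.model = tf.estimator.BoostedTreesClassifier(
    self.feature_columns,
    n_batches_per_layer=int(0.5 * len(self.y_train) / self.batch_size),
    model_dir=self.ofolder,
    n_trees=300,
    max_depth=6,
    learning_rate=0.1)

# Raising num_epochs in make_input_fn (and max_steps here) gives the estimator
# more batches to finish building the requested trees.
self.model.train(train_input_fn, max_steps=10000)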
I have implemented several regression forecasting approaches and now I want to compare them. I picked the MAE, RMSE and SMAPE metrics. My results look as follows:
Approach 1: MAE = 0.6, RMSE = 0.9, SMAPE = 531
Approach 2: MAE = 3.0, RMSE = 6.1, SMAPE = 510
Approach 3: MAE = 10.1, RMSE = 17.0, SMAPE = 420
When I plot my predictions and compare them with my test set, I can see that Approach 1 > Approach 2 > Approach 3. This is also evident from the values of MAE and RMSE. But I thought that the lower the resulting SMAPE, the better the prediction.
Did I misunderstand SMAPE?
Since there is no predefined method in Python, my SMAPE calculation looks like this:
import numpy as np

def smape(A, F):
    return 100 / len(A) * np.sum(2 * np.abs(F - A) / (np.abs(A) + np.abs(F)))
Or is the calculation wrong?
Thanks in advance.
Okay, maybe the method was wrong. Instead I used this one from Kaggle:
from numba import jit
import math

@jit
def smape_fast(y_true, y_pred):
    out = 0
    for i in range(y_true.shape[0]):
        a = y_true[i]
        b = y_pred[i]
        c = a + b
        if c == 0:
            continue
        out += math.fabs(a - b) / c
    out *= (200.0 / y_true.shape[0])
    return out
URL
Now my SMAPE results look more plausible compared to MAE and RMSE.
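For what it's worth, the two formulas agree for strictly positive values; the main practical difference is how zeros are handled (and that the Kaggle version uses a + b rather than |a| + |b| in the denominator). A small, hedged comparison with toy numbers, just to illustrate the zero handling:
import math
import numpy as np

def smape(A, F):
    # Formulation from the question: a single 0/0 pair makes the whole sum nan.
    return 100 / len(A) * np.sum(2 * np.abs(F - A) / (np.abs(A) + np.abs(F)))

def smape_fast(y_true, y_pred):
    # Kaggle-style formulation: pairs where a + b == 0 are simply skipped.
    out = 0
    for i in range(y_true.shape[0]):
        a, b = y_true[i], y_pred[i]
        if a + b == 0:
            continue
        out += math.fabs(a - b) / (a + b)
    return out * 200.0 / y_true.shape[0]

A = np.array([0.0, 0.0, 2.0, 3.0])   # actuals containing zeros
F = np.array([0.0, 0.1, 2.1, 2.9])   # forecasts

print(smape(A, F))       # nan: the 0/0 term poisons the whole sum
print(smape_fast(A, F))  # finite: the all-zero pair is skipped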
I was reading through the website below for the decision tree classification part.
http://spark.apache.org/docs/latest/mllib-decision-tree.html
I built the provided example code on my laptop and tried to understand its output, but there is a part I couldn't understand. Below is the code; sample_libsvm_data.txt can be found at https://github.com/apache/spark/blob/master/data/mllib/sample_libsvm_data.txt
Please refer to the output and let me know whether my understanding is correct. Here are my opinions:
Does the test error mean it is approximately 95% correct, based on the training data?
(The one I'm most curious about) If feature 434 is greater than 0.0, the prediction would be 1 based on Gini impurity? For example, if the value is given as 434:178, then it would be predicted as 1.
from __future__ import print_function

from pyspark import SparkContext
from pyspark.mllib.tree import DecisionTree, DecisionTreeModel
from pyspark.mllib.util import MLUtils

if __name__ == "__main__":
    sc = SparkContext(appName="PythonDecisionTreeClassificationExample")
    data = MLUtils.loadLibSVMFile(sc, '/home/spark/bin/sample_libsvm_data.txt')
    (trainingData, testData) = data.randomSplit([0.7, 0.3])

    model = DecisionTree.trainClassifier(trainingData, numClasses=2, categoricalFeaturesInfo={},
                                         impurity='gini', maxDepth=5, maxBins=32)

    predictions = model.predict(testData.map(lambda x: x.features))
    labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
    testErr = labelsAndPredictions.filter(lambda (v, p): v != p).count() / float(testData.count())

    print('Test Error = ' + str(testErr))
    print('Learned classification tree model:')
    print(model.toDebugString())
===== Below is my output =====
Test Error = 0.0454545454545
Learned classification tree model:
DecisionTreeModel classifier of depth 1 with 3 nodes
If (feature 434 <= 0.0)
Predict: 0.0
Else (feature 434 > 0.0)
Predict: 1.0
I believe you are correct. Yes, your error rate is about 5%, so your algorithm is correct about 95% of the time, but on the 30% of the data you withheld as a test set, not on the training data. According to your output (which I will assume is correct; I did not test the code myself), yes, the only feature that determines the class of an observation is feature 434: if it is less than or equal to 0 the prediction is 0, otherwise 1.
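If you want to sanity-check the second point, one hedged way to do it (reusing the model and data variables from the code above, with the vector size inferred from the loaded data) is to predict on single sparse vectors:
# Hypothetical check: one vector with only feature 434 set, one all-zero vector.
from pyspark.mllib.linalg import SparseVector

num_features = data.first().features.size            # feature count inferred from the data
only_434 = SparseVector(num_features, {434: 178.0})  # feature 434 > 0.0
all_zero = SparseVector(num_features, {})             # feature 434 <= 0.0

print(model.predict(only_434))  # expected 1.0 given the learned split
print(model.predict(all_zero))  # expected 0.0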
Why, when training a decision tree model in Spark ML, are minInfoGain and the minimum number of instances per node not used to control the growth of the tree? It is very easy to overgrow the tree.
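For reference, both of these knobs do exist as parameters on the DataFrame-based API; a hedged sketch (the values are illustrative, not recommendations) of how they can be set:
# Illustrative values only: minInstancesPerNode and minInfoGain are exposed on
# pyspark.ml.classification.DecisionTreeClassifier and both limit tree growth.
from pyspark.ml.classification import DecisionTreeClassifier

dt = DecisionTreeClassifier(labelCol="label", featuresCol="features",
                            impurity="gini", maxDepth=5,
                            minInstancesPerNode=10,  # require at least 10 rows per child
                            minInfoGain=0.01)        # require a minimum gain to split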