BoostedTreeClassifier gets stuck on loss on the first step - python-3.x

I'm trying to run a simple boostedTreeClassifier on my dataset from the example, but it seems to get stuck on first step:
2019-06-28 11:20:31.658689: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:111] Filling up shuffle buffer (this may take a while): 84090 of 85873
2019-06-28 11:20:32.908425: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:162] Shuffle buffer filled.
I0628 11:20:34.904214 140220602029888 basic_session_run_hooks.py:262] loss = 0.6931464, step = 0
W0628 11:21:03.421219 140220602029888 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
W0628 11:21:05.555618 140220602029888 basic_session_run_hooks.py:724] It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
The same dataset seems to work fine when I pass it to other keras based model or xgboost model.
Here's the relevant code:
def make_input_fn(self, X, y, shuffle=True, num_epochs=None):
num_samples = len(self.y_train)
def input_fn():
dataset = tf.data.Dataset.from_tensor_slices((dict(X), y))
if shuffle:
dataset = dataset.shuffle(num_samples).repeat(num_epochs).batch(self.batch_size)
else:
dataset = dataset.repeat(num_epochs).batch(self.batch_size)
return dataset
return input_fn
def ens_train(self):
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.DEBUG)
train_input_fn = self.make_input_fn(self.X_train, self.y_train, num_epochs=self.epochs)
self.model = tf.estimator.BoostedTreesClassifier(self.feature_columns,
n_batches_per_layer = int(0.5* len(self.y_train)/self.batch_size),
model_dir = self.ofolder,
max_depth = 10,
n_trees = 1000)
self.model.train(train_input_fn, max_steps = 1000)

Was able to get a result by playing around with learning rates and number of epochs. The "best" parameters obtained by hyperparameter tuning on xgboost doesn't give similar results in BoostedTreeClassifier. It took a large number of epochs to get around 84% accuracy (balanced dataset). xgboost had given 95% without even hyperparameter tuning..

Related

Neural Network Python Gradient Descent

I am new to machine learning and trying to understand it (self-learning). So I grabbed a book (this one if interested: https://www.amazon.com/Neural-Networks-Unity-Programming-Windows/dp/1484236726) and started to read the first chapter. While reading, there are a few things I did not understand so I went to research online.
However, I still have trouble with a few points that I cannot understand even after so much reading and research:
How are we calculating l2_delta and l1_delta? (marked with #what is this part doing? in code below)
How does gradient descent relate? (I looked up the formula and tried to read a bit about it but I could not relate the one line code to the code I have down there)
Is that a network with 3 layers (layer 1: 3 input nodes, layer 2: not sure ,layer 3: 1 output node )
Neural Network Full Code:
trying to write my first neural network!
import numpy as np
#activation function (sigmoid , maps value between 0 and 1)
def sigmoid(x):
return 1/(1+np.exp(-x))
def derivative(x):
return x*(1-x)
#initialize input (4 training data (row), 3 features (col))
X = np.array([[0,0,1],[0,1,1],[1,0,1],[1,1,1]])
#initialize output for training data (4 training data (rows), 1 output for each (col))
Y = np.array([[0],[1],[1],[0]])
np.random.seed(1)
#synapses
syn0 = 2* np.random.random((3,4)) - 1
syn1 = 2* np.random.random((4,1)) - 1
for iter in range(60000):
#layers
l0 = X
l1 = sigmoid(np.dot(l0,syn0))
l2 = sigmoid(np.dot(l1,syn1))
#error
l2_error = Y - l2
if(iter % 10000 == 0): #only print error every 10000 steps to save time and limit the amount of output
print("Error L2: " + str (np.mean(np.abs(l2_error))))
#what is this part doing?
l2_delta = l2_error * derivative(l2)
l1_error = l2_delta.dot(syn1.T)
l1_delta = l1_error * derivative(l1)
if(iter % 10000 == 0): #only print error every 10000 steps to save time and limit the amount of output
print("Error L1: " + str (np.mean(np.abs(l1_error))))
#update weights
syn1 = syn1 + l1.T.dot(l2_delta) // derative with respect to cost function
syn0 = syn2 + l0.T.dot(l1_delta)
print(l2)
Thank you!
In general, layerwise computations (Hence the notation l1 and l2 above) is simply getting the dot product of a vector $x \in \mathbb{R}^n$ and a vector of weights in the same dimension, then applying the sigmoid function on each component .
Gradient Descent. - - - Imagine, in two dimensions say the graph of $f(x) = x^2$ Suppose, we don't know how to get it's minimum. Gradient descent will basically evaluate $f'(x)$ at various points, and check whether $f'(x)$ is close to zero

Find wrongly categorized samples from validation step

I am using a keras neural net for identifying category in which the data belongs.
self.model.compile(loss='categorical_crossentropy',
optimizer=keras.optimizers.Adam(lr=0.001, decay=0.0001),
metrics=[categorical_accuracy])
Fit function
history = self.model.fit(self.X,
{'output': self.Y},
validation_split=0.3,
epochs=400,
batch_size=32
)
I am interested in finding out which labels are getting categorized wrongly in the validation step. Seems like a good way to understand what is happening under the hood.
You can use model.predict_classes(validation_data) to get the predicted classes for your validation data, and compare these predictions with the actual labels to find out where the model was wrong. Something like this:
predictions = model.predict_classes(validation_data)
wrong = np.where(predictions != Y_validation)
If you are interested in looking 'under the hood', I'd suggest to use
model.predict(validation_data_x)
to see the scores for each class, for each observation of the validation set.
This should shed some light on which categories the model is not so good at classifying. The way to predict the final class is
scores = model.predict(validation_data_x)
preds = np.argmax(scores, axis=1)
be sure to use the proper axis for np.argmax (I'm assuming your observation axis is 1). Use preds to then compare with the real class.
Also, as another exploration you want to see the overall accuracy on this dataset, use
model.evaluate(x=validation_data_x, y=validation_data_y)
I ended up creating a metric which prints the "worst performing category id + score" on each iteration. Ideas from link
import tensorflow as tf
import numpy as np
class MaxIoU(object):
def __init__(self, num_classes):
super().__init__()
self.num_classes = num_classes
def max_iou(self, y_true, y_pred):
# Wraps np_max_iou method and uses it as a TensorFlow op.
# Takes numpy arrays as its arguments and returns numpy arrays as
# its outputs.
return tf.py_func(self.np_max_iou, [y_true, y_pred], tf.float32)
def np_max_iou(self, y_true, y_pred):
# Compute the confusion matrix to get the number of true positives,
# false positives, and false negatives
# Convert predictions and target from categorical to integer format
target = np.argmax(y_true, axis=-1).ravel()
predicted = np.argmax(y_pred, axis=-1).ravel()
# Trick from torchnet for bincounting 2 arrays together
# https://github.com/pytorch/tnt/blob/master/torchnet/meter/confusionmeter.py
x = predicted + self.num_classes * target
bincount_2d = np.bincount(x.astype(np.int32), minlength=self.num_classes**2)
assert bincount_2d.size == self.num_classes**2
conf = bincount_2d.reshape((self.num_classes, self.num_classes))
# Compute the IoU and mean IoU from the confusion matrix
true_positive = np.diag(conf)
false_positive = np.sum(conf, 0) - true_positive
false_negative = np.sum(conf, 1) - true_positive
# Just in case we get a division by 0, ignore/hide the error and set the value to 0
with np.errstate(divide='ignore', invalid='ignore'):
iou = false_positive / (true_positive + false_positive + false_negative)
iou[np.isnan(iou)] = 0
return np.max(iou).astype(np.float32) + np.argmax(iou).astype(np.float32)
~
usage:
custom_metric = MaxIoU(len(catagories))
self.model.compile(loss='categorical_crossentropy',
optimizer=keras.optimizers.Adam(lr=0.001, decay=0.0001),
metrics=[categorical_accuracy, custom_metric.max_iou])

How does shuffle = 'batch' argument of the .fit() layer work in the background?

When I train the model using the .fit() layer there is the argument shuffle preset to True.
Let's say that my dataset has 100 samples and that the batch size is 10. When I set shuffle = True then keras first randomly selects randomly the samples (now the 100 samples have a different order) and on the new order it will start creating the batches: batch 1: 1-10, batch 2: 11-20 etc.
If I set shuffle = 'batch' how is it supposed to work in the background? Intuitively and using the previous example of 100 samples dataset with batch size = 10 my guess would be that keras first allocates the samples to the batches (i.e. batch 1: samples 1-10 following the dataset original order, batch 2: 11-20 following the dataset original order as well, batch 3 ... so on so forth) and then shuffles the order of the batches. So the model now will be trained on the randomly ordered batches say for example: 3 (contains samples 21 - 30), 4 (contains samples 31 - 40), 7 (contains samples 61 - 70), 1 (contains samples 1 - 10), ... (I made up the order of the batches).
Is my thinking right or am I missing something?
Thanks!
Looking at the implementation at this link (line 349 of training.py) the answer seems to be positive.
Try this code for checking:
import numpy as np
def batch_shuffle(index_array, batch_size):
"""Shuffles an array in a batch-wise fashion.
Useful for shuffling HDF5 arrays
(where one cannot access arbitrary indices).
# Arguments
index_array: array of indices to be shuffled.
batch_size: integer.
# Returns
The `index_array` array, shuffled in a batch-wise fashion.
"""
batch_count = int(len(index_array) / batch_size)
# to reshape we need to be cleanly divisible by batch size
# we stash extra items and reappend them after shuffling
last_batch = index_array[batch_count * batch_size:]
index_array = index_array[:batch_count * batch_size]
index_array = index_array.reshape((batch_count, batch_size))
np.random.shuffle(index_array)
index_array = index_array.flatten()
return np.append(index_array, last_batch)
x = np.array(range(100))
x_s = batch_shuffle(x,10)

Memory leak evaluating CNN model for text clasification

I've been doing some adaptation to code in this blog about CNN for text clasification:
http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/
Everything works fine! But when I try to use the model trained to predict new instances it consumes all memory available. It seems that it's not liberating any memory when evaluates and load all the model again and again. As far as I know memory should be liberated after every sess.run command.
Here is the part of the code I'm working with:
with graph.as_default():
session_conf = tf.ConfigProto(
allow_soft_placement=FLAGS.allow_soft_placement,
log_device_placement=FLAGS.log_device_placement)
sess = tf.Session(config=session_conf)
with sess.as_default():
# Load the saved meta graph and restore variables
saver = tf.train.import_meta_graph("{}.meta".format(checkpoint_file))
saver.restore(sess, checkpoint_file)
# Get the placeholders from the graph by name
input_x = graph.get_operation_by_name("input_x").outputs[0]
# input_y = graph.get_operation_by_name("input_y").outputs[0]
dropout_keep_prob = graph.get_operation_by_name("dropout_keep_prob").outputs[0]
# Tensors we want to evaluate
predictions = graph.get_operation_by_name("output/predictions").outputs[0]
# Add a vector for probas
probas =graph.get_operation_by_name("output/scores").outputs[0]
# Generate batches for one epoch
print("\nGenerating Bathces...\n")
gc.collect()
#mem0 = proc.get_memory_info().rss
batches = data_helpers.batch_iter(list(x_test), FLAGS.batch_size, 1, shuffle=False)
#mem1 = proc.get_memory_info().rss
print("\nBatches done...\n")
#pd = lambda x2, x1: 100.0 * (x2 - x1) / mem0
#print "Allocation: %0.2f%%" % pd(mem1, mem0)
# Collect the predictions here
all_predictions = []
all_probas = []
for x_test_batch in batches:
#Calculate probability of prediction been good
gc.collect()
batch_probas = sess.run(tf.reduce_max(tf.nn.softmax(probas),1), {input_x: x_test_batch, dropout_keep_prob: 1.0})
batch_predictions = sess.run(predictions, {input_x: x_test_batch, dropout_keep_prob: 1.0})
all_predictions = np.concatenate([all_predictions, batch_predictions])
all_probas = np.concatenate([all_probas, batch_probas])
# Add summary ops to collect data
with tf.name_scope("eval") as scope:
p_h = tf.histogram_summary("eval/probas", batch_probas)
summary= sess.run(p_h)
eval_summary_writer.add_summary(summary)
Any help will be much appreciated
Cheers
Your training loop creates new TensorFlow operations (tf.reduce_max(), tf.nn.softmax() and tf.histogram_summary()) in each iteration, which will lead to more memory being consumed over time. TensorFlow is most efficient when you run the same graph many times, because it can amortize the cost of optimizing the graph over multiple executions. Therefore,
to get the best performance, you should revise your program so that you create each of these operations once, before the for x_test_batch in batches: loop, and then re-use the same operations in each iteration.

How do I use Theanets LSTM RNN's on my time series data?

I have a simple dataframe consisting of one column. In that column are 10320 observations (numerical). I'm simulating Time-Series data by inserting the data into a plot with a window of 200 observations each. Here is the code for plotting.
import matplotlib.pyplot as plt
from IPython import display
fig_size = plt.rcParams["figure.figsize"]
import time
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
fig, axes = plt.subplots(1,1, figsize=(19,5))
df = dframe.set_index(arange(0,len(dframe)))
std = dframe[0].std() * 6
window = 200
iterations = int(len(dframe)/window)
i = 0
dframe = dframe.set_index(arange(0,len(dframe)))
while i< iterations:
frm = window*i
if i == iterations:
to = len(dframe)
else:
to = frm+window
df = dframe[frm : to]
if len(df) > 100:
df = df.set_index(arange(0,len(df)))
plt.gca().cla()
plt.plot(df.index, df[0])
plt.axhline(y=std, xmin=0, xmax=len(df[0]),c='gray',linestyle='--',lw = 2, hold=None)
plt.axhline(y=-std , xmin=0, xmax=len(df[0]),c='gray',linestyle='--', lw = 2, hold=None)
plt.ylim(min(dframe[0])- 0.5 , max(dframe[0]) )
plt.xlim(-50,window+50)
display.clear_output(wait=True)
display.display(plt.gcf())
canvas = FigureCanvas(fig)
canvas.print_figure('fig.png', dpi=72, bbox_inches='tight')
i += 1
plt.close()
This simulates a flow of real-time data and visualizes it. What I want is to apply theanets RNN LSTM to the data to detect anomalies unsupervised. Because I am doing it unsupervised I don't think that I need to split my data into training and test sets. I haven't found much of anything that makes sense to me so far and have been googling for about 2 hours. Just hoping that you guys may be able to help. I want to put the prediction output of the RNN on the graph as well and define a threshold that, if the error is too large, the values will be identified as anomalous. If you need more information please comment and let me know. Thank you!
READING
Like neurons, LSTM networks are build of interconnected LSTM Blocks whose training is done via BackPropogation Through Time.
Classical anomaly detection using time series required prediction of time series output in future (at one or more points) and finding error on these points with true values. Prediction Error above a threshold will reflect and amomly
SOLUTION
Having said this
You've to train network so you need training sets and test sets both
Use N inputs to predict M outputs (decide upon N and M with experimentation - values for which training error is low)
Scroll a window of (N+M) elements in input data and use this data array of (N+M) items also termed as frame to train or test network.
Typically we use 90% of starting series for training and 10% for testing.
This scheme will fail as if training is not proper there will be false prediction errors which are not-anomaly. So make sure to provide enough training, and most important shuffle training frames and consider all variations.

Resources