Triple loss in keras, how to get the anchor, positive, and negative from merged vector - python-3.x

What I am trying to do is use the triple loss as my loss function, but I don't know if I am getting the right values from the merged vector that is used.
So here is my loss function:
def triplet_loss(y_true, y_pred, alpha=0.2):
"""
Implementation of the triplet loss function
Arguments:
y_true -- true labels, required when you define a loss in Keras, not used in this function.
y_pred -- python list containing three objects:
anchor: the encodings for the anchor data
positive: the encodings for the positive data (similar to anchor)
negative: the encodings for the negative data (different from anchor)
Returns:
loss -- real number, value of the loss
"""
print("Ypred")
print(y_pred.shape)
anchor = y_pred[:,0:512]
positive = y_pred[:,512:1024]
negative = y_pred[:,1024:1536]
print(anchor.shape)
print(positive.shape)
print(negative.shape)
#anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2] # Dont think this is working
# distance between the anchor and the positive
pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)))
print("PosDist", pos_dist)
# distance between the anchor and the negative
neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)))
print("Neg Dist", neg_dist)
# compute loss
basic_loss = (pos_dist - neg_dist) + alpha
loss = tf.maximum(basic_loss, 0.0)
return loss
Now this does work when I use this line in the code and nother the sliceing one
anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
But I dont think that this is correct as the shape of the merged vector is (?, 3, 3, 1536)
I think it is grabbing the wrong information. But I cannot seem to figure out how to slice this correctly. as the uncommented code gives me this issue.
Dimensions must be equal, but are 3 and 0 for 'loss_9/concatenate_10_loss/Sub' (op: 'Sub') with input shapes: [?,3,3,1536], [?,0,3,1536].
My network set up is like this:
input_dim = (7,7,2048)
anchor_in = Input(shape=input_dim)
pos_in = Input(shape=input_dim)
neg_in = Input(shape=input_dim)
base_network = create_base_network()
# Run input through base network
anchor_out = base_network(anchor_in)
pos_out = base_network(pos_in)
neg_out = base_network(neg_in)
print(anchor_out.shape)
merged_vector = Concatenate(axis=-1)([anchor_out, pos_out, neg_out])
print("Meged Vector", merged_vector.shape)
print(merged_vector)
model = Model(inputs=[anchor_in, pos_in, neg_in], outputs=merged_vector)
adam = Adam(lr=0.01, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
model.compile(optimizer=adam, loss=triplet_loss)
Update
Using this seems to be right, could anyone confirm this?
anchor = y_pred[:,:,:,0:512]
positive = y_pred[:,:,:,512:1024]
negative = y_pred[:,:,:,1024:1536]

You do not need to do the concatenation operation:
# change this line to this
model = Model(inputs=[anchor_in, pos_in, neg_in], outputs=[anchor_out, pos_out, neg_out])
Complete code:
input_dim = (7,7,2048)
anchor_in = Input(shape=input_dim)
pos_in = Input(shape=input_dim)
neg_in = Input(shape=input_dim)
base_network = create_base_network()
# Run input through base network
anchor_out = base_network(anchor_in)
pos_out = base_network(pos_in)
neg_out = base_network(neg_in)
print(anchor_out.shape)
# code changed here
model = Model(inputs=[anchor_in, pos_in, neg_in], outputs=[anchor_out, pos_out, neg_out])
adam = Adam(lr=0.01, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
model.compile(optimizer=adam, loss=triplet_loss)
Then you can use the following loss:
def triplet_loss(y_true, y_pred, alpha=0.3):
'''
Inputs:
y_true: True values of classification. (y_train)
y_pred: predicted values of classification.
alpha: Distance between positive and negative sample, arbitrarily
set to 0.3
Returns:
Computed loss
Function:
--Implements triplet loss using tensorflow commands
--The following function follows an implementation of Triplet-Loss
where the loss is applied to the network in the compile statement
as usual.
'''
anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
positive_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), -1)
negative_dist = tf.reduce_sum(tf.square(tf.subtract(anchor,negative)), -1)
loss_1 = tf.add(tf.subtract(positive_dist, negative_dist), alpha)
loss = tf.reduce_sum(tf.maximum(loss_1, 0.0))
return loss

Related

Regression Model with 3 Hidden DenseVariational Layers in Tensorflow-Probability returns nan as loss during training

I am getting acquainted with Tensorflow-Probability and here I am running into a problem. During training, the model returns nan as the loss (possibly meaning a huge loss that causes overflowing). Since the functional form of the synthetic data is not overly complicated and the ratio of data points to parameters is not frightening at first glance at least I wonder what is the problem and how it could be corrected.
The code is the following --accompanied by some possibly helpful images:
# Create and plot 5000 data points
x_train = np.linspace(-1, 2, 5000)[:, np.newaxis]
y_train = np.power(x_train, 3) + 0.1*(2+x_train)*np.random.randn(5000)[:, np.newaxis]
plt.scatter(x_train, y_train, alpha=0.1)
plt.show()
# Define the prior weight distribution -- all N(0, 1) -- and not trainable
def prior(kernel_size, bias_size, dtype = None):
n = kernel_size + bias_size
prior_model = Sequential([
tfpl.DistributionLambda(
lambda t: tfd.MultivariateNormalDiag(loc = tf.zeros(n) , scale_diag = tf.ones(n)
))
])
return(prior_model)
# Define variational posterior weight distribution -- multivariate Gaussian
def posterior(kernel_size, bias_size, dtype = None):
n = kernel_size + bias_size
posterior_model = Sequential([
tfpl.VariableLayer(tfpl.MultivariateNormalTriL.params_size(n) , dtype = dtype), # The parameters of the model are declared Variables that are trainable
tfpl.MultivariateNormalTriL(n) # The posterior function will return to the Variational layer that will call it a MultivariateNormalTril object that will have as many dimensions
# as the parameters of the Variational Dense Layer. That means that each parameter will be generated by a distinct Normal Gaussian shifted and scaled
# by a mu and sigma learned from the data, independently of all the other weights. The output of this Variablelayer will become the input to the
# MultivariateNormalTriL object.
# The shape of the VariableLayer object will be defined by the number of paramaters needed to create the MultivariateNormalTriL object given
# that it will live in a Space of n dimensions (event_size = n). This number is returned by the tfpl.MultivariateNormalTriL.params_size(n)
])
return(posterior_model)
x_in = Input(shape = (1,))
x = tfpl.DenseVariational(units= 2**4,
make_prior_fn=prior,
make_posterior_fn=posterior,
kl_weight=1/x_train.shape[0],
activation='relu')(x_in)
x = tfpl.DenseVariational(units= 2**4,
make_prior_fn=prior,
make_posterior_fn=posterior,
kl_weight=1/x_train.shape[0],
activation='relu')(x)
x = tfpl.DenseVariational(units=tfpl.IndependentNormal.params_size(1),
make_prior_fn=prior,
make_posterior_fn=posterior,
kl_weight=1/x_train.shape[0])(x)
y_out = tfpl.IndependentNormal(1)(x)
model = Model(inputs = x_in, outputs = y_out)
def nll(y_true, y_pred):
return -y_pred.log_prob(y_true)
model.compile(loss=nll, optimizer= 'Adam')
model.summary()
Train the model
history = model.fit(x_train1, y_train1, epochs=500)
The problem seems to be in the loss function: negative log-likelihood of the independent normal distribution without any specified location and scale leads to the untamed variance which leads to the blowing up the final loss value. Since you're experimenting with the variational layers, you must be interested in the estimation of the epistemic uncertainty, to that end, I'd recommend to apply the constant variance.
I tried to make a couple of slight changes to your code within the following lines:
first of all, the final output y_out comes directly from the final variational layer without any IndpendnetNormal distribution layer:
y_out = tfpl.DenseVariational(units=1,
make_prior_fn=prior,
make_posterior_fn=posterior,
kl_weight=1/x_train.shape[0])(x)
second, the loss function now contains the necessary calculations with the normal distribution you need but with the static variance in order to avoid the blowing up of the loss during training:
def nll(y_true, y_pred):
dist = tfp.distributions.Normal(loc=y_pred, scale=1.0)
return tf.reduce_sum(-dist.log_prob(y_true))
then the model is compiled and trained in the same way as before:
model.compile(loss=nll, optimizer= 'Adam')
history = model.fit(x_train, y_train, epochs=3000)
and finally let's sample 100 different predictions from the trained model and plot these values to visualize the epistemic uncertainty of the model:
predicted = [model(x_train) for _ in range(100)]
for i, res in enumerate(predicted):
plt.plot(x_train, res , alpha=0.1)
plt.scatter(x_train, y_train, alpha=0.1)
plt.show()
After 3000 epochs the result looks like this (with the reduced number of training points to 3000 instead of 5000 to speed-up the training):
The model has 38,589 trainable parameters but you have only 5,000 points as data; so, effective training is impossible with so many parameters.

What are inputs of Keras layers and custom functions?

Sorry for a nub's question:
Having the NN that is trained in fit_generator mode, say something like:
Lambda(...)
or
Dense(...)
and the custom loss function, what are input tensors?
Am I correct expecting (batch size, previous layer's output) in case of a Lambda layer?
Is it going to be the same (batch size, data) in case of a custom loss function that looks like:
triplet_loss(y_true, y_pred)
Are y_true, y_pred in format (batch,previous layer's output) and (batch, true 'expected' data we fed to NN)?
I would probaly duplicate the dense layers. Instead of having 2 layers with 128 units, have 4 layers with 64 units. The result is the same, but you will be able to perform the cross products better.
from keras.models import Model
#create dense layers and store their output tensors, they use the output of models 1 and to as input
d1 = Dense(64, ....)(Model_1.output)
d2 = Dense(64, ....)(Model_1.output)
d3 = Dense(64, ....)(Model_2.output)
d4 = Dense(64, ....)(Model_2.output)
cross1 = Lambda(myFunc, output_shape=....)([d1,d4])
cross2 = Lambda(myFunc, output_shape=....)([d2,d3])
#I don't really know what kind of "merge" you want, so I used concatenate, there are
Add, Multiply and others....
output = Concatenate()([cross1,cross2])
#use the "axis" attribute of the concatenate layer to define better which axis will
be doubled due to the concatenation
model = Model([Model_1.input,Model_2.input], output)
Now, for the lambda function:
import keras.backend as K
def myFunc(x):
return x[0] * x[1]
custom loss function, what are input tensors?
It depends on how you define your model outputs.
For example, let's define a simple model that returns the input unchanged.
model = Sequential([Lambda(lambda x: x, input_shape=(1,))])
Let's use dummy input X and label Y
x = [[0]]
x = np.array(x)
y = [[4]]
y = np.array(y)
If our custom loss function looks like this
def mce(y_true, y_pred):
print(y_true.shape)
print(y_pred.shape)
return K.mean(K.pow(K.abs(y_true - y_pred), 3))
model.compile('sgd', mce)
and then we can see the shape of y_true and y_pred will be
y_true: (?, ?)
y_pred: (?, 1)
However, for triplet loss the input for the loss function also can be received like this-
ALPHA = 0.2
def triplet_loss(x):
anchor, positive, negative = x
pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), 1)
neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), 1)
basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), ALPHA)
loss = tf.reduce_mean(tf.maximum(basic_loss, 0.0), 0)
return loss
# Source: https://github.com/davidsandberg/facenet/blob/master/src/facenet.py
def build_model(input_shape):
# Standardizing the input shape order
K.set_image_dim_ordering('th')
positive_example = Input(shape=input_shape)
negative_example = Input(shape=input_shape)
anchor_example = Input(shape=input_shape)
# Create Common network to share the weights along different examples (+/-/Anchor)
embedding_network = faceRecoModel(input_shape)
positive_embedding = embedding_network(positive_example)
negative_embedding = embedding_network(negative_example)
anchor_embedding = embedding_network(anchor_example)
loss = merge([anchor_embedding, positive_embedding, negative_embedding],
mode=triplet_loss, output_shape=(1,))
model = Model(inputs=[anchor_example, positive_example, negative_example],
outputs=loss)
model.compile(loss='mean_absolute_error', optimizer=Adam())
return model

How to deal with triplet loss when at time of input i have only two files i.e. at time of testing

I am implementing a siamese network in which i know how to calculate triplet loss by picking anchor, positive and negative by dividing input in three parts(which is a handcrafted feature vector) and then calculating it at time of training.
anchor_output = ... # shape [None, 128]
positive_output = ... # shape [None, 128]
negative_output = ... # shape [None, 128]
d_pos = tf.reduce_sum(tf.square(anchor_output - positive_output), 1)
d_neg = tf.reduce_sum(tf.square(anchor_output - negative_output), 1)
loss = tf.maximum(0., margin + d_pos - d_neg)
loss = tf.reduce_mean(loss)
But the problem is when at time of testing i would be having only two files positive and negative then how i would deal with(triplets, as i need one more anchor file but my app only take one picture and compare with in database so only two files in this case), I searched a lot but nobody provided code to deal with this problem only there was code to implement triplet loss but not for whole scenario.
AND I DONT WANT TO USE CONTRASTIVE LOSS
Colab notebook with test code on CIFAR 10:
https://colab.research.google.com/drive/1VgOTzr_VZNHkXh2z9IiTAcEgg5qr19y0
The general idea:
from tensorflow import keras
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K
img_width = 128
img_height = 128
img_colors = 3
margin = 1.0
VECTOR_SIZE = 32
def triplet_loss(y_true, y_pred):
""" y_true is a dummy value that should be ignored
Uses the inverse of the cosine similarity as a loss.
"""
anchor_vec = y_pred[:, :VECTOR_SIZE]
positive_vec = y_pred[:, VECTOR_SIZE:2*VECTOR_SIZE]
negative_vec = y_pred[:, 2*VECTOR_SIZE:]
d1 = keras.losses.cosine_proximity(anchor_vec, positive_vec)
d2 = keras.losses.cosine_proximity(anchor_vec, negative_vec)
return K.clip(d2 - d1 + margin, 0, None)
def make_image_model():
""" Build a convolutional model that generates a vector
"""
inp = Input(shape=(img_width, img_height, img_colors))
l1 = Conv2D(8, (2, 2))(inp)
l1 = MaxPooling2D()(l1)
l2 = Conv2D(16, (2, 2))(l1)
l2 = MaxPooling2D()(l2)
l3 = Conv2D(16, (2, 2))(l2)
l3 = MaxPooling2D()(l3)
conv_out = Flatten()(l3)
out = Dense(VECTOR_SIZE)(conv_out)
model = Model(inp, out)
return model
def make_siamese_model(img_model):
""" Siamese model input are 3 images base, positive, negative
output is a dummy variable that is ignored for the purposes of loss
calculation.
"""
anchor = Input(shape=(img_width, img_height, img_colors))
positive = Input(shape=(img_width, img_height, img_colors))
negative = Input(shape=(img_width, img_height, img_colors))
anchor_vec = img_model(anchor)
positive_vec = img_model(positive)
negative_vec = img_model(negative)
vecs = Concatenate(axis=1)([anchor_vec, positive_vec, negative_vec])
model = Model([anchor, positive, negative], vecs)
model.compile('adam', triplet_loss)
return model
img_model = make_image_model()
train_model = make_siamese_model(img_model)
img_model.summary()
train_model.summary()
###
train_model.fit(X, dummy_y, ...)
img_model.save('image_model.h5')
###
# In order to use the model
vec_base = img_model.predict(base_image)
vec_test = img_model.predict(test_image)
compare cosine similarity of vec_base and vec_test in order to determine whether base and test are within the acceptable criteria.

Recurrent neural network architecture

I'm working on a RNN architecture which does speech enhancement. The dimensions of the input is [XX, X, 1024] where XX is the batch size and X is the variable sequence length.
The input to the network is positive valued data and the output is masked binary data(IBM) which is later used to construct enhanced signal.
For instance, if the input to network is [10, 65, 1024] the output will be [10,65,1024] tensor with binary values. I'm using Tensorflow with mean squared error as loss function. But I'm not sure which activation function to use here(which keeps the outputs either zero or one), Following is the code I've come up with so far
tf.reset_default_graph()
num_units = 10 #
num_layers = 3 #
dropout = tf.placeholder(tf.float32)
cells = []
for _ in range(num_layers):
cell = tf.contrib.rnn.LSTMCell(num_units)
cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob = dropout)
cells.append(cell)
cell = tf.contrib.rnn.MultiRNNCell(cells)
X = tf.placeholder(tf.float32, [None, None, 1024])
Y = tf.placeholder(tf.float32, [None, None, 1024])
output, state = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
out_size = Y.get_shape()[2].value
logit = tf.contrib.layers.fully_connected(output, out_size)
prediction = (logit)
flat_Y = tf.reshape(Y, [-1] + Y.shape.as_list()[2:])
flat_logit = tf.reshape(logit, [-1] + logit.shape.as_list()[2:])
loss_op = tf.losses.mean_squared_error(labels=flat_Y, predictions=flat_logit)
#adam optimizier as the optimization function
optimizer = tf.train.AdamOptimizer(learning_rate=0.001) #
train_op = optimizer.minimize(loss_op)
#extract the correct predictions and compute the accuracy
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
Also my reconstruction isn't good. Can someone suggest on improving the model?
If you want your outputs to be either 0 or 1, to me it seems a good idea to turn this into a classification problem. To this end, I would use a sigmoidal activation and cross entropy:
...
prediction = tf.nn.sigmoid(logit)
loss_op = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=Y, logits=logit))
...
In addition, from my point of view the hidden dimensionality (10) of your stacked RNNs seems quite small for such a big input dimensionality (1024). However this is just a guess, and it is something that needs to be tuned.

How do I obtain predictions and probabilities from new data input to a CNN in Tensorflow

I'll preface this by saying this is my first posted question on SO. I've just recently started working with Tensorflow, and have been attempting to apply a convolutional-neural network model approach for classification of .csv records in a file representing images from scans of microarray data. (FYI: Microarrays are a grid of spotted DNA on a glass slide, representing specific DNA target sequences for determining the presence of those DNA targets in a sample. The individual pixels represent fluorescence intensity value from 0-1). The file has ~200,000 records in total. Each record (image) has 10816 pixels that represent DNA sequences from known viruses, and one index label which identifies the virus species. The pixels create a pattern which is unique to each of the different viruses. There are 2165 different viruses in total represented within the 200,000 records. I have trained the network on images of labeled microarray datasets, but when I try to pass a new dataset through to classify it/them as one of the 2165 different viruses and determine predicted values and probabilities, I don't seem to be having much luck. This is the code that I am currently using for this:
import tensorflow as tf
import numpy as np
import csv
def extract_data(filename):
print("extracting data...")
NUM_LABELS = 2165
NUM_FEATURES = 10816
labels = []
fvecs = []
rowCount = 0
#iterate over the rows, split the label from the features
#convert the labels to integers and features to floats
for line in open(filename):
rowCount = rowCount + 1
row = line.split(',')
labels.append(row[3])#(int(row[7])) #<<<IT ALWAYS PREDICTS THIS VALUE!
for x in row [4:10820]:
fvecs.append(float(x))
#convert the array of float arrasy into a numpy float matrix
fvecs_np = np.matrix(fvecs).astype(np.float32)
#convert the array of int lables inta a numpy array
labels_np = np.array(labels).astype(dtype=np.uint8)
#convert the int numpy array into a one-hot matrix
labels_onehot = (np.arange(NUM_LABELS) == labels_np[:, None]).astype(np.float32)
print("arrays converted")
return fvecs_np, labels_onehot
def TestModels():
fvecs_np, labels_onehot = extract_data("MicroarrayTestData.csv")
print('RESTORING NN MODEL')
weights = {}
biases = {}
sess=tf.Session()
init = tf.global_variables_initializer()
#Load meta graph and restore weights
ModelID = "MicroarrayCNN_Data-1000.meta"
print("RESTORING:::", ModelID)
saver = tf.train.import_meta_graph(ModelID)
saver.restore(sess,tf.train.latest_checkpoint('./'))
graph = tf.get_default_graph()
x = graph.get_tensor_by_name("x:0")
y = graph.get_tensor_by_name("y:0")
keep_prob = tf.placeholder(tf.float32)
y_ = tf.placeholder("float", shape=[None, 2165])
wc1 = graph.get_tensor_by_name("wc1:0")
wc2 = graph.get_tensor_by_name("wc2:0")
wd1 = graph.get_tensor_by_name("wd1:0")
Wout = graph.get_tensor_by_name("Wout:0")
bc1 = graph.get_tensor_by_name("bc1:0")
bc2 = graph.get_tensor_by_name("bc2:0")
bd1 = graph.get_tensor_by_name("bd1:0")
Bout = graph.get_tensor_by_name("Bout:0")
weights = {wc1, wc2, wd1, Wout}
biases = {bc1, bc2, bd1, Bout}
print("NEXTArgmax")
prediction=tf.argmax(y,1)
probabilities = y
predY = prediction.eval(feed_dict={x: fvecs_np, y: labels_onehot}, session=sess)
probY = probabilities.eval(feed_dict={x: fvecs_np, y: labels_onehot}, session=sess)
accuracy = tf.reduce_mean(tf.cast(prediction, "float"))
print(sess.run(accuracy, feed_dict={x: fvecs_np, y: labels_onehot}))
print("%%%%%%%%%%%%%%%%%%%%%%%%%%")
print("Predicted::: ", predY, accuracy)
print("%%%%%%%%%%%%%%%%%%%%%%%%%%")
feed_dictTEST = {y: labels_onehot}
probabilities=probY
print("probabilities", probabilities.eval(feed_dict={x: fvecs_np}, session=sess))
########## Run Analysis ###########
TestModels()
So, when I run this code I get the correct prediction for the test set, although I am not sure I believe it, because it appears that whatever value I append in line 14 (see below) is the output it predicts:
labels.append(row[3])#<<<IT ALWAYS PREDICTS THIS VALUE!
I don't understand this, and it makes me suspicious that I've set up the CNN incorrectly, as I would have expected it to ignore my input label and determine a bast match from the trained network based on the trained patterns. The only thing I can figure is that when I pass the value through for the prediction; it is instead training the model on this data as well, and then predicting itself. Is this a correct assumption, or am I misinterpreting how Tensorflow works?
The other issue is that when I try to use code that (based on other tutorials) which is supposed to output the probabilities of all of the 2165 possible outputs, I get the error:
InvalidArgumentError (see above for traceback): Shape [-1,2165] has negative dimensions
[[Node: y = Placeholder[dtype=DT_FLOAT, shape=[?,2165], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
To me, it looks like it is the correct layer based on the 2165 value in the Tensor shape, but I don't understand the -1 value. So, to wrap up the summary, my questions are:
Based on the fact that I get the value that I have in the label of the input data, is this the correct method to make a classification using this model?
Am I missing a layer or have I configured the model incorrectly in order to extract the probabilities of all of the possible output classes, or am I using the wrong code to extract the information? I try to print out the accuracy to see if that would work, but instead it outputs the description of a tensor, so clearly that is incorrect as well.
(ADDITIONAL INFORMATION)
As requested, I'm also including the original code that was used to train the model, which is now below. You can see I do sort of a piece meal training of a limited number of related records at a time by their taxonomic relationships as I iterate through the file. This is mostly because the Mac that I'm training on (Mac Pro w/ 64GB ram) tends to give me the "Killed -9" error due to overuse of resources if I don't do it this way. There may be a better way to do it, but this seems to work.
Original Author: Aymeric Damien
Project: https://github.com/aymericdamien/TensorFlow-Examples/
from __future__ import print_function
import tensorflow as tf
import numpy as np
import csv
import random
# Parameters
num_epochs = 2
train_size = 1609
learning_rate = 0.001 #(larger >speed, lower >accuracy)
training_iters = 5000 # How much do you want to train (more = better trained)
batch_size = 32 #How many samples to train on, size of the training batch
display_step = 10 # How often to diplay what is going on during training
# Network Parameters
n_input = 10816 # MNIST data input (img shape: 28*28)...in my case 104x104 = 10816(rough array size)
n_classes = 2165 #3280 #2307 #787# Switched to 100 taxa/training set, dynamic was too wonky.
dropout = 0.75 # Dropout, probability to keep units. Jeffery Hinton's group developed it, that prevents overfitting to find new paths. More generalized model.
# Functions
def extract_data(filename):
print("extracting data...")
# arrays to hold the labels and feature vectors.
NUM_LABELS = 2165
NUM_FEATURES = 10826
taxCount = 0
taxCurrent = 0
labels = []
fvecs = []
rowCount = 0
#iterate over the rows, split the label from the features
#convert the labels to integers and features to floats
print("entering CNN loop")
for line in open(filename):
rowCount = rowCount + 1
row = line.split(',')
taxCurrent = row[3]
print("profile:", row[0:12])
labels.append(int(row[3]))
fvecs.append([float(x) for x in row [4:10820]])
#convert the array of float arrasy into a numpy float matrix
fvecs_np = np.matrix(fvecs).astype(np.float32)
#convert the array of int lables inta a numpy array
labels_np = np.array(labels).astype(dtype=np.uint8)
#convert the int numpy array into a one-hot matrix
labels_onehot = (np.arange(NUM_LABELS) == labels_np[:, None]).astype(np.float32)
print("arrays converted")
return fvecs_np, labels_onehot
# Create some wrappers for simplicity
def conv2d(x, W, b, strides=1): #Layer 1 : Convolutional layer
# Conv2D wrapper, with bias and relu activation
print("conv2d")
x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME') # Strides are the tensors...list of integers. Tensors=data
x = tf.nn.bias_add(x, b) #bias is the tuning knob
return tf.nn.relu(x) #rectified linear unit (activation function)
def maxpool2d(x, k=2): #Layer 2 : Takes samples from the image. (This is a 4D tensor)
print("maxpool2d")
# MaxPool2D wrapper
return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
padding='SAME')
# Create model
def conv_net(x, weights, biases, dropout):
print("conv_net setup")
# Reshape input picture
x = tf.reshape(x, shape=[-1, 104, 104, 1]) #-->52x52 , -->26x26x64
# Convolution Layer
conv1 = conv2d(x, weights['wc1'], biases['bc1']) #defined above already
# Max Pooling (down-sampling)
conv1 = maxpool2d(conv1, k=2)
print(conv1.get_shape)
# Convolution Layer
conv2 = conv2d(conv1, weights['wc2'], biases['bc2']) #wc2 and bc2 are just placeholders...could actually skip this layer...maybe
# Max Pooling (down-sampling)
conv2 = maxpool2d(conv2, k=2)
print(conv2.get_shape)
# Fully connected layer
# Reshape conv2 output to fit fully connected layer input
fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
fc1 = tf.nn.relu(fc1) #activation function for the NN
# Apply Dropout
fc1 = tf.nn.dropout(fc1, dropout)
# Output, class prediction
out = tf.add(tf.matmul(fc1, weights['Wout']), biases['Bout'])
return out
def Train_Network(Txid_IN, Sess_File_Name):
import tensorflow as tf
tf.reset_default_graph()
x,y = 0,0
weights = {}
biases = {}
# tf Graph input
print("setting placeholders")
x = tf.placeholder(tf.float32, [None, n_input], name="x") #Gateway for data (images)
y = tf.placeholder(tf.float32, [None, n_classes], name="y") # Gateway for data (labels)
keep_prob = tf.placeholder(tf.float32) #dropout # Gateway for dropout(keep probability)
# Store layers weight & bias
#CREATE weights
weights = {
# 5x5 conv, 1 input, 32 outputs
'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32]),name="wc1"), #
# 5x5 conv, 32 inputs, 64 outputs
'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64]),name="wc2"),
# fully connected, 7*7*64 inputs, 1024 outputs
'wd1': tf.Variable(tf.random_normal([26*26*64, 1024]),name="wd1"),
# 1024 inputs, 10 outputs (class prediction)
'Wout': tf.Variable(tf.random_normal([1024, n_classes]),name="Wout")
}
biases = {
'bc1': tf.Variable(tf.random_normal([32]), name="bc1"),
'bc2': tf.Variable(tf.random_normal([64]), name="bc2"),
'bd1': tf.Variable(tf.random_normal([1024]), name="bd1"),
'Bout': tf.Variable(tf.random_normal([n_classes]), name="Bout")
}
# Construct model
print("constructing model")
pred = conv_net(x, weights, biases, keep_prob)
print(pred)
# Define loss(cost) and optimizer
#cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y)) Deprecated version of the statement
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = pred, labels=y)) #added reduce_mean 6/27
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# Evaluate model
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
print("%%%%%%%%%%%%%%%%%%%%")
print ("%% ", correct_pred)
print ("%% ", accuracy)
print("%%%%%%%%%%%%%%%%%%%%")
# Initializing the variables
#init = tf.initialize_all_variables()
init = tf.global_variables_initializer()
saver = tf.train.Saver()
fvecs_np, labels_onehot = extract_data("MicroarrayDataOUT.csv") #CHAGE TO PICORNAVIRUS!!!!!AHHHHHH!!!
print("starting session")
# Launch the graph
FitStep = 0
with tf.Session() as sess: #graph is encapsulated by its session
sess.run(init)
step = 1
# Keep training until reach max iterations (training_iters)
while step * batch_size < training_iters:
if FitStep >= 5:
break
else:
#iterate and train
print(step)
print(fvecs_np, labels_onehot)
for step in range(num_epochs * train_size // batch_size):
sess.run(optimizer, feed_dict={x: fvecs_np, y: labels_onehot, keep_prob:dropout}) #no dropout???...added Keep_prob:dropout
if FitStep >= 5:
break
#else:
###batch_x, batch_y = mnist.train.next_batch(batch_size)
# Run optimization op (backprop)
###sess.run(optimizer, feed_dict={x: batch_x, y: batch_y,
### keep_prob: dropout}) <<<<SOMETHING IS WRONG IN HERE?!!!
if step % display_step == 0:
# Calculate batch loss and accuracy
loss, acc = sess.run([cost, accuracy], feed_dict={x: fvecs_np,
y: labels_onehot,
keep_prob: 1.})
print("Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
"{:.6f}".format(np.mean(loss)) + ", Training Accuracy= " + \
"{:.5f}".format(acc))
TrainAcc = float("{:.5f}".format(acc))
#print("******", TrainAcc)
if TrainAcc >= .99: #Changed from .95 temporarily
print(FitStep)
FitStep = FitStep+1
saver.save(sess, Sess_File_Name, global_step=1000) #
print("Saved Session:", Sess_File_Name)
step += 1
print("Optimization Finished!")
print("Testing Accuracy:", \
sess.run(accuracy, feed_dict={x: fvecs_np[:256],
y: labels_onehot[:256],
keep_prob: 1.}))
#feed_dictTEST = {x: fvecs_np[50]}
#prediction=tf.argmax(y,1)
#print(prediction)
#best = sess.run([prediction],feed_dictTEST)
#print(best)
print("DONE")
sess.close()
def Tax_Iterator(CSV_inFile, CSV_outFile): #Deprecate
#Need to copy *.csv file to MySQL for sorting
resultFileINIT = open(CSV_outFile,'w')
resultFileINIT.close()
TaxCount = 0
TaxThreshold = 2165
ThresholdStep = 2165
PrevTax = 0
linecounter = 0
#Open all GenBank profile list
for line in open(CSV_inFile):
linecounter = linecounter+1
print(linecounter)
resultFile = open(CSV_outFile,'a')
wr = csv.writer(resultFile, dialect='excel')
# Check for new TXID
row = line.split(',')
print(row[7], "===", PrevTax)
if row[7] != PrevTax:
print("X1")
TaxCount = TaxCount+1
PrevTax = row[7]
#Check it current Tax count is < or > threshold
# < threshold
print(TaxCount,"=+=", TaxThreshold)
if TaxCount<=3300:
print("X2")
CurrentTax= row[7]
CurrTxCount = CurrentTax
print("TaxCount=", TaxCount)
print( "Add to CSV")
print("row:", CurrentTax, "***", row[0:15])
wr.writerow(row[0:-1])
# is > threshold
else:
print("X3")
# but same TXID....
print(row[7], "=-=", CurrentTax)
if row[7]==CurrentTax:
print("X4")
CurrentTax= row[7]
print("TaxCount=", TaxCount)
print( "Add to CSV")
print("row:", CurrentTax, "***", row[0:15])
wr.writerow(row[0:-1])
# but different TXID...
else:
print(row[7], "=*=", CurrentTax)
if row[7]>CurrentTax:
print("X5")
TaxThreshold=TaxThreshold+ThresholdStep
resultFile.close()
Sess_File_Name = "CNN_VirusIDvSPECIES_XXALL"+ str(TaxThreshold-ThresholdStep)
print("<<<< Start Training >>>>"
print("Training on :: ", CurrTxCount, "Taxa", TaxCount, "data points.")
Train_Network(CurrTxCount, Sess_File_Name)
print("Training complete")
resultFileINIT = open(CSV_outFile,'w')
resultFileINIT.close()
CurrentTax= row[7]
#reset tax count
CurrTxCount = 0
TaxCount = 0
resultFile.close()
Sess_File_Name = "MicroarrayCNN_Data"+ str(TaxThreshold+ThresholdStep)
print("<<<< Start Training >>>>")
print("Training on :: ", CurrTxCount, "Taxa", TaxCount, "data points.")
Train_Network(CurrTxCount, Sess_File_Name)
resultFileINIT = open(CSV_outFile,'w')
resultFileINIT.close()
CurrentTax= row[7]
Tax_Iterator("MicroarrayInput.csv", "MicroarrayOutput.csv")
You defined prediction as prediction=tf.argmax(y,1). And in both feed_dict, you feed labels_onehot for y. Consequently, your "prediction" is always equal to the labels.
As you didn't post the code you used to train your network, I can't tell you what exactly you need to change.
Edit: I have isses understanding the underlying problem you're trying to solve - based on your code, you're trying to train a neural network with 2165 different classes using 1609 training examples. How is this even possible? If each example had a different class, there would still be some classes without any training example. Or does one image belong to many classes? From your statement at the beginning of your question, I had assumed you're trying to output a real-valued number between 0-1.
I'm actually surprised that the code actually worked as it looks like you're adding only a single number to your labels list, but your model expects a list with length 2165 for each training example.

Resources