Hi I'm training my pytorch model on remote server.
All the job is managed by slurm.
My problem is 'training is extremely slower after training first epoch.'
I checked gpu utilization.
On my first epoch, utilization was like below image.
I can see gpu was utilized.
But from second epoch utilized percentage is almos zero
My dataloader code like this
class img2selfie_dataset(Dataset):
def __init__(self, path, transform, csv_file, cap_vec):
self.path = path
self.transformer = transform
self.images = [path + item for item in list(csv_file['file_name'])]
self.smiles_list = cap_vec
def __getitem__(self, idx):
img = Image.open(self.images[idx])
img = self.transformer(img)
label = self.smiles_list[idx]
label = torch.Tensor(label)
return img, label.type(torch.LongTensor)
def __len__(self):
return len(self.images)
My dataloader is defined like this
train_data_set = img2selfie_dataset(train_path, preprocess, train_dataset, train_cap_vec)
train_loader = DataLoader(train_data_set, batch_size = 256, num_workers = 2, pin_memory = True)
val_data_set = img2selfie_dataset(train_path, preprocess, val_dataset, val_cap_vec)
val_loader = DataLoader(val_data_set, batch_size = 256, num_workers = 2, pin_memory = True)
My training step defined like this
train_loss = []
valid_loss = []
epochs = 20
best_loss = 1e5
for epoch in range(1, epochs + 1):
print('Epoch {}/{}'.format(epoch, epochs))
print('-' * 10)
epoch_train_loss, epoch_valid_loss = train(encoder_model, transformer_decoder, train_loader, val_loader, criterion, optimizer)
train_loss.append(epoch_train_loss)
valid_loss.append(epoch_valid_loss)
if len(valid_loss) > 1:
if valid_loss[-1] < best_loss:
print(f"valid loss on this {epoch} is better than previous one, saving model.....")
torch.save(encoder_model.state_dict(), 'model/encoder_model.pickle')
torch.save(transformer_decoder.state_dict(), 'model/decoder_model.pickle')
best_loss = valid_loss[-1]
print(best_loss)
print(f'Epoch : [{epoch}] Train Loss : [{train_loss[-1]:.5f}], Valid Loss : [{valid_loss[-1]:.5f}]')
In my opinion, if this problem comes from my code. It wouldn't have hitted 100% utilization in first epoch.
I fixed this issue with moving my training data into local drive.
My remote server(school server) policy was storing personel data into NAS.
And file i/o from NAS proveked heavy load on network.
It was also affected by other user's file i/o from NAS.
After I moved training data into NAS, everything is fine.
I'm doing a binary classification, hence I used a binary cross entropy loss:
criterion = torch.nn.BCELoss()
However, I'm getting an error:
Using a target size (torch.Size([64, 1])) that is different to the input size (torch.Size([64, 2])) is deprecated. Please ensure they have the same size.
My model ends with:
x = self.wave_block6(x)
x = self.sigmoid(self.fc(x))
return x.squeeze()
I tried removing the squeeze, but to no avail. My batch size is 64. It seems like I'm doing something simple wrong here. Is my model giving 1 output and BCE loss expecting 2 inputs? Which loss should I use then?
Binary Cross-Entropy Loss (BCELoss) is used for binary classification tasks. Therefore if N is your batch size, your model output should be of shape [64, 1] and your labels must be of shape [64].Therefore just squeeze your output at the 2nd dimension and pass it to the loss function -
Here is a minimal working example
import torch
a = torch.randn((64, 1))
b = torch.randn((64))
loss = torch.nn.BCELoss()
b = torch.round(torch.sigmoid(b)) # just to create some labels
a = torch.sigmoid(a).squeeze(1)
l = loss(a, b)
Update - Basing on the conversation in the comments, focal loss can be defined as follows -
class focalLoss(nn.Module):
def __init__(self, alpha=0.25, gamma=3):
super(focalLoss, self).__init__()
self.alpha = alpha
self.gamma = gamma
def forward(self, pred_logits: torch.Tensor, target: torch.Tensor):
batch_size = pred_logits.shape[0]
pred = pred.view(batch_size, -1)
target = target.view(batch_size, -1)
pred = pred_logits.sigmoid()
ce = F.binary_cross_entropy(pred_logits, target, reduction='none')
alpha = target * self.alpha + (1. - target) * (1. - self.alpha)
pt = torch.where(target == 1, pred, 1 - pred)
return alpha * (1. - pt) ** self.gamma * ce
I'm trying to make the model predict a word from a sentence using pretrained Huggingface's BERT as feature extractor. The model look like this
class BertAutoEncoder(nn.Module):
def __init__(self, vocab_size):
super().__init__()
decoder_layer = nn.TransformerDecoderLayer(768, 2, 1024, dropout=0.1)
self.transformer_decoder = nn.TransformerDecoder(decoder_layer, 2)
self.fc = nn.Linear(768, vocab_size)
def forward(self, memory, embedded_word):
output = self.transformer_decoder(embedded_word, memory)
output = self.fc(output)
return output
And when train/evaluate I call the model like this
bert = BertModel.from_pretrained('bert-base-uncased')
bert.requires_grad_(False)
...
memory = bert(**src).last_hidden_state.transpose(0, 1)
embeded_word = bert.embeddings(trg.data['input_ids'][:, :-1], token_type_ids=trg.data['token_type_ids'][:, :-1]).transpose(0, 1)
output = model(memory, embeded_word)
The loss reduced nicely but turned out the model only predict <eos> token.
I tried train the model with 1 batch of 32 samples and it did work when loss reduced pass 8e-6 but when I trained it with all data the loss could go way beyond that but none of the saved models work. Even the one with eval or train loss around 4e-6 - 8e-6.
Surprisingly the model would work if I use a separate decoder's Embedding like this
class BertAutoEncoderOld(nn.Module):
def __init__(self, vocab_size):
super().__init__()
decoder_layer = nn.TransformerDecoderLayer(768, 2, 1024, dropout=0.1)
self.transformer_decoder = nn.TransformerDecoder(decoder_layer, 2)
self.decoder = nn.Embedding(vocab_size, 768)
self.pos_decoder = PositionalEncoding(768, 0.5)
self.fc = nn.Linear(768, vocab_size)
def forward(self, memory, word):
tgt = self.decoder(word.data['input_ids'][:, :-1].transpose(0, 1))
tgt = self.pos_decoder(tgt)
output = self.transformer_decoder(tgt, memory)
output = self.fc(output)
return output
But I was asked to make it work with one Embedding and I have no idea how.
I tried
Reduce/increase batch from 32 to 8-64
Also tried 2 and 1024 batch size
Remove <eos> token and change it's attention mask to 0
But none of those work.
What did I do wrong and how to fix it?
Thanks
Edit per #emily qeustion
I change the data itself in collate function
text.data['attention_mask'][text.data['input_ids'] == 102] = 0
text.data['input_ids'][text.data['input_ids'] == 102] = 0
word.data['attention_mask'][word.data['input_ids'] == 102] = 0
word.data['input_ids'][word.data['input_ids'] == 102] = 0
It only used in Bert though.
Let us say that we create a small network:
tf.reset_default_graph()
layers = [5, 3, 1]
activations = [tf.tanh, tf.tanh, None]
inp = tf.placeholder(dtype=tf.float32, shape=(None, 2 ), name='inp')
out = tf.placeholder(dtype=tf.float32, shape=(None, 1 ), name='out')
isTraining = tf.placeholder(dtype=tf.bool, shape=(), name='isTraining')
N = inp * 1 # I am lazy
for i, (l, a) in enumerate(zip(layers, activations)):
N = tf.layers.dense(N, l, None)
#N = tf.layers.batch_normalization( N, training = isTraining) # comment this line
if a is not None:
N = a(N)
err = tf.reduce_mean((N - out)**2)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
opt = tf.train.AdamOptimizer(0.05).minimize(err)
# insert vectors from the batch normalization
tVars = tf.trainable_variables()
graph = tf.get_default_graph()
for v in graph.get_collection(tf.GraphKeys.GLOBAL_VARIABLES):
if all([
('batch_normalization' in v.name),
('optimizer' not in v.name),
v not in tVars ]):
tVars.append(v)
init = tf.global_variables_initializer()
saver = tf.train.Saver(var_list= tVars)
This is a simple NN generated for optimization. The only thing that I am currently interested in is batch optimization (the line that has been commented out). Now, we train this network, save it, restore its and calculate the error again, we do ok:
# Generate random data
N = 1000
X = np.random.rand(N, 2)
y = 2*X[:, 0] + 3*X[:, 1] + 3
y = y.reshape(-1, 1)
# Run the session and save it
with tf.Session() as sess:
sess.run(init)
print('During Training')
for i in range(3000):
_, errVal = sess.run([opt, err], feed_dict={inp:X, out:y, isTraining:True})
if i %500 == 0:
print(errVal)
shutil.rmtree('models1', ignore_errors=True)
os.makedirs('models1')
path = saver.save( sess, 'models1/model.ckpt' )
# restore the session
print('During testing')
with tf.Session() as sess:
saver.restore(sess, path)
errVal = sess.run(err, feed_dict={inp:X, out:y, isTraining:False})
print( errVal )
Here is the output:
During Training
24.4422
0.00330666
0.000314223
0.000106421
6.00441e-05
4.95262e-05
During testing
INFO:tensorflow:Restoring parameters from models1/model.ckpt
5.5899e-05
On the other hand, when we uncomment the batch normalization line, and redo the above calculation:
During Training
31.7372
1.92066e-05
3.87879e-06
2.55274e-06
1.25418e-06
1.43078e-06
During testing
INFO:tensorflow:Restoring parameters from models1/model.ckpt
0.041519
As you can see, the restored value is far from what the model is predicting. Is there anything that I am doing wrong?
Note: I know that for batch-normalization I need to generate mini batches. I have skipped all of that to keep the code simple and yet complete.
Batch normalization layer, as defined in Tensorflow, needs to have access to the placeholder isTraining (https://www.tensorflow.org/api_docs/python/tf/layers/batch_normalization). Make sure you include it when you define the layer: tf.layers.batch_normalization(..., training=isTraining, ...).
The reason for this is that Batch Normalization Layers have 2 trainable parameters (beta and gamma) that are trained normally with the rest of the network, but they also have 2 extra parameters (batch mean and variance) that require you to tell them to train. you do this simply by aplying the recipe above.
Right now your code seems not to be training mean and variance. Instead, they are randomly fixed and the network is optimized with those. Later on, when you save and restore, they are reinitialized with different values, hence the network doesn't perform as it used to.
I'll preface this by saying this is my first posted question on SO. I've just recently started working with Tensorflow, and have been attempting to apply a convolutional-neural network model approach for classification of .csv records in a file representing images from scans of microarray data. (FYI: Microarrays are a grid of spotted DNA on a glass slide, representing specific DNA target sequences for determining the presence of those DNA targets in a sample. The individual pixels represent fluorescence intensity value from 0-1). The file has ~200,000 records in total. Each record (image) has 10816 pixels that represent DNA sequences from known viruses, and one index label which identifies the virus species. The pixels create a pattern which is unique to each of the different viruses. There are 2165 different viruses in total represented within the 200,000 records. I have trained the network on images of labeled microarray datasets, but when I try to pass a new dataset through to classify it/them as one of the 2165 different viruses and determine predicted values and probabilities, I don't seem to be having much luck. This is the code that I am currently using for this:
import tensorflow as tf
import numpy as np
import csv
def extract_data(filename):
print("extracting data...")
NUM_LABELS = 2165
NUM_FEATURES = 10816
labels = []
fvecs = []
rowCount = 0
#iterate over the rows, split the label from the features
#convert the labels to integers and features to floats
for line in open(filename):
rowCount = rowCount + 1
row = line.split(',')
labels.append(row[3])#(int(row[7])) #<<<IT ALWAYS PREDICTS THIS VALUE!
for x in row [4:10820]:
fvecs.append(float(x))
#convert the array of float arrasy into a numpy float matrix
fvecs_np = np.matrix(fvecs).astype(np.float32)
#convert the array of int lables inta a numpy array
labels_np = np.array(labels).astype(dtype=np.uint8)
#convert the int numpy array into a one-hot matrix
labels_onehot = (np.arange(NUM_LABELS) == labels_np[:, None]).astype(np.float32)
print("arrays converted")
return fvecs_np, labels_onehot
def TestModels():
fvecs_np, labels_onehot = extract_data("MicroarrayTestData.csv")
print('RESTORING NN MODEL')
weights = {}
biases = {}
sess=tf.Session()
init = tf.global_variables_initializer()
#Load meta graph and restore weights
ModelID = "MicroarrayCNN_Data-1000.meta"
print("RESTORING:::", ModelID)
saver = tf.train.import_meta_graph(ModelID)
saver.restore(sess,tf.train.latest_checkpoint('./'))
graph = tf.get_default_graph()
x = graph.get_tensor_by_name("x:0")
y = graph.get_tensor_by_name("y:0")
keep_prob = tf.placeholder(tf.float32)
y_ = tf.placeholder("float", shape=[None, 2165])
wc1 = graph.get_tensor_by_name("wc1:0")
wc2 = graph.get_tensor_by_name("wc2:0")
wd1 = graph.get_tensor_by_name("wd1:0")
Wout = graph.get_tensor_by_name("Wout:0")
bc1 = graph.get_tensor_by_name("bc1:0")
bc2 = graph.get_tensor_by_name("bc2:0")
bd1 = graph.get_tensor_by_name("bd1:0")
Bout = graph.get_tensor_by_name("Bout:0")
weights = {wc1, wc2, wd1, Wout}
biases = {bc1, bc2, bd1, Bout}
print("NEXTArgmax")
prediction=tf.argmax(y,1)
probabilities = y
predY = prediction.eval(feed_dict={x: fvecs_np, y: labels_onehot}, session=sess)
probY = probabilities.eval(feed_dict={x: fvecs_np, y: labels_onehot}, session=sess)
accuracy = tf.reduce_mean(tf.cast(prediction, "float"))
print(sess.run(accuracy, feed_dict={x: fvecs_np, y: labels_onehot}))
print("%%%%%%%%%%%%%%%%%%%%%%%%%%")
print("Predicted::: ", predY, accuracy)
print("%%%%%%%%%%%%%%%%%%%%%%%%%%")
feed_dictTEST = {y: labels_onehot}
probabilities=probY
print("probabilities", probabilities.eval(feed_dict={x: fvecs_np}, session=sess))
########## Run Analysis ###########
TestModels()
So, when I run this code I get the correct prediction for the test set, although I am not sure I believe it, because it appears that whatever value I append in line 14 (see below) is the output it predicts:
labels.append(row[3])#<<<IT ALWAYS PREDICTS THIS VALUE!
I don't understand this, and it makes me suspicious that I've set up the CNN incorrectly, as I would have expected it to ignore my input label and determine a bast match from the trained network based on the trained patterns. The only thing I can figure is that when I pass the value through for the prediction; it is instead training the model on this data as well, and then predicting itself. Is this a correct assumption, or am I misinterpreting how Tensorflow works?
The other issue is that when I try to use code that (based on other tutorials) which is supposed to output the probabilities of all of the 2165 possible outputs, I get the error:
InvalidArgumentError (see above for traceback): Shape [-1,2165] has negative dimensions
[[Node: y = Placeholder[dtype=DT_FLOAT, shape=[?,2165], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
To me, it looks like it is the correct layer based on the 2165 value in the Tensor shape, but I don't understand the -1 value. So, to wrap up the summary, my questions are:
Based on the fact that I get the value that I have in the label of the input data, is this the correct method to make a classification using this model?
Am I missing a layer or have I configured the model incorrectly in order to extract the probabilities of all of the possible output classes, or am I using the wrong code to extract the information? I try to print out the accuracy to see if that would work, but instead it outputs the description of a tensor, so clearly that is incorrect as well.
(ADDITIONAL INFORMATION)
As requested, I'm also including the original code that was used to train the model, which is now below. You can see I do sort of a piece meal training of a limited number of related records at a time by their taxonomic relationships as I iterate through the file. This is mostly because the Mac that I'm training on (Mac Pro w/ 64GB ram) tends to give me the "Killed -9" error due to overuse of resources if I don't do it this way. There may be a better way to do it, but this seems to work.
Original Author: Aymeric Damien
Project: https://github.com/aymericdamien/TensorFlow-Examples/
from __future__ import print_function
import tensorflow as tf
import numpy as np
import csv
import random
# Parameters
num_epochs = 2
train_size = 1609
learning_rate = 0.001 #(larger >speed, lower >accuracy)
training_iters = 5000 # How much do you want to train (more = better trained)
batch_size = 32 #How many samples to train on, size of the training batch
display_step = 10 # How often to diplay what is going on during training
# Network Parameters
n_input = 10816 # MNIST data input (img shape: 28*28)...in my case 104x104 = 10816(rough array size)
n_classes = 2165 #3280 #2307 #787# Switched to 100 taxa/training set, dynamic was too wonky.
dropout = 0.75 # Dropout, probability to keep units. Jeffery Hinton's group developed it, that prevents overfitting to find new paths. More generalized model.
# Functions
def extract_data(filename):
print("extracting data...")
# arrays to hold the labels and feature vectors.
NUM_LABELS = 2165
NUM_FEATURES = 10826
taxCount = 0
taxCurrent = 0
labels = []
fvecs = []
rowCount = 0
#iterate over the rows, split the label from the features
#convert the labels to integers and features to floats
print("entering CNN loop")
for line in open(filename):
rowCount = rowCount + 1
row = line.split(',')
taxCurrent = row[3]
print("profile:", row[0:12])
labels.append(int(row[3]))
fvecs.append([float(x) for x in row [4:10820]])
#convert the array of float arrasy into a numpy float matrix
fvecs_np = np.matrix(fvecs).astype(np.float32)
#convert the array of int lables inta a numpy array
labels_np = np.array(labels).astype(dtype=np.uint8)
#convert the int numpy array into a one-hot matrix
labels_onehot = (np.arange(NUM_LABELS) == labels_np[:, None]).astype(np.float32)
print("arrays converted")
return fvecs_np, labels_onehot
# Create some wrappers for simplicity
def conv2d(x, W, b, strides=1): #Layer 1 : Convolutional layer
# Conv2D wrapper, with bias and relu activation
print("conv2d")
x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME') # Strides are the tensors...list of integers. Tensors=data
x = tf.nn.bias_add(x, b) #bias is the tuning knob
return tf.nn.relu(x) #rectified linear unit (activation function)
def maxpool2d(x, k=2): #Layer 2 : Takes samples from the image. (This is a 4D tensor)
print("maxpool2d")
# MaxPool2D wrapper
return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
padding='SAME')
# Create model
def conv_net(x, weights, biases, dropout):
print("conv_net setup")
# Reshape input picture
x = tf.reshape(x, shape=[-1, 104, 104, 1]) #-->52x52 , -->26x26x64
# Convolution Layer
conv1 = conv2d(x, weights['wc1'], biases['bc1']) #defined above already
# Max Pooling (down-sampling)
conv1 = maxpool2d(conv1, k=2)
print(conv1.get_shape)
# Convolution Layer
conv2 = conv2d(conv1, weights['wc2'], biases['bc2']) #wc2 and bc2 are just placeholders...could actually skip this layer...maybe
# Max Pooling (down-sampling)
conv2 = maxpool2d(conv2, k=2)
print(conv2.get_shape)
# Fully connected layer
# Reshape conv2 output to fit fully connected layer input
fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
fc1 = tf.nn.relu(fc1) #activation function for the NN
# Apply Dropout
fc1 = tf.nn.dropout(fc1, dropout)
# Output, class prediction
out = tf.add(tf.matmul(fc1, weights['Wout']), biases['Bout'])
return out
def Train_Network(Txid_IN, Sess_File_Name):
import tensorflow as tf
tf.reset_default_graph()
x,y = 0,0
weights = {}
biases = {}
# tf Graph input
print("setting placeholders")
x = tf.placeholder(tf.float32, [None, n_input], name="x") #Gateway for data (images)
y = tf.placeholder(tf.float32, [None, n_classes], name="y") # Gateway for data (labels)
keep_prob = tf.placeholder(tf.float32) #dropout # Gateway for dropout(keep probability)
# Store layers weight & bias
#CREATE weights
weights = {
# 5x5 conv, 1 input, 32 outputs
'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32]),name="wc1"), #
# 5x5 conv, 32 inputs, 64 outputs
'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64]),name="wc2"),
# fully connected, 7*7*64 inputs, 1024 outputs
'wd1': tf.Variable(tf.random_normal([26*26*64, 1024]),name="wd1"),
# 1024 inputs, 10 outputs (class prediction)
'Wout': tf.Variable(tf.random_normal([1024, n_classes]),name="Wout")
}
biases = {
'bc1': tf.Variable(tf.random_normal([32]), name="bc1"),
'bc2': tf.Variable(tf.random_normal([64]), name="bc2"),
'bd1': tf.Variable(tf.random_normal([1024]), name="bd1"),
'Bout': tf.Variable(tf.random_normal([n_classes]), name="Bout")
}
# Construct model
print("constructing model")
pred = conv_net(x, weights, biases, keep_prob)
print(pred)
# Define loss(cost) and optimizer
#cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y)) Deprecated version of the statement
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = pred, labels=y)) #added reduce_mean 6/27
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# Evaluate model
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
print("%%%%%%%%%%%%%%%%%%%%")
print ("%% ", correct_pred)
print ("%% ", accuracy)
print("%%%%%%%%%%%%%%%%%%%%")
# Initializing the variables
#init = tf.initialize_all_variables()
init = tf.global_variables_initializer()
saver = tf.train.Saver()
fvecs_np, labels_onehot = extract_data("MicroarrayDataOUT.csv") #CHAGE TO PICORNAVIRUS!!!!!AHHHHHH!!!
print("starting session")
# Launch the graph
FitStep = 0
with tf.Session() as sess: #graph is encapsulated by its session
sess.run(init)
step = 1
# Keep training until reach max iterations (training_iters)
while step * batch_size < training_iters:
if FitStep >= 5:
break
else:
#iterate and train
print(step)
print(fvecs_np, labels_onehot)
for step in range(num_epochs * train_size // batch_size):
sess.run(optimizer, feed_dict={x: fvecs_np, y: labels_onehot, keep_prob:dropout}) #no dropout???...added Keep_prob:dropout
if FitStep >= 5:
break
#else:
###batch_x, batch_y = mnist.train.next_batch(batch_size)
# Run optimization op (backprop)
###sess.run(optimizer, feed_dict={x: batch_x, y: batch_y,
### keep_prob: dropout}) <<<<SOMETHING IS WRONG IN HERE?!!!
if step % display_step == 0:
# Calculate batch loss and accuracy
loss, acc = sess.run([cost, accuracy], feed_dict={x: fvecs_np,
y: labels_onehot,
keep_prob: 1.})
print("Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
"{:.6f}".format(np.mean(loss)) + ", Training Accuracy= " + \
"{:.5f}".format(acc))
TrainAcc = float("{:.5f}".format(acc))
#print("******", TrainAcc)
if TrainAcc >= .99: #Changed from .95 temporarily
print(FitStep)
FitStep = FitStep+1
saver.save(sess, Sess_File_Name, global_step=1000) #
print("Saved Session:", Sess_File_Name)
step += 1
print("Optimization Finished!")
print("Testing Accuracy:", \
sess.run(accuracy, feed_dict={x: fvecs_np[:256],
y: labels_onehot[:256],
keep_prob: 1.}))
#feed_dictTEST = {x: fvecs_np[50]}
#prediction=tf.argmax(y,1)
#print(prediction)
#best = sess.run([prediction],feed_dictTEST)
#print(best)
print("DONE")
sess.close()
def Tax_Iterator(CSV_inFile, CSV_outFile): #Deprecate
#Need to copy *.csv file to MySQL for sorting
resultFileINIT = open(CSV_outFile,'w')
resultFileINIT.close()
TaxCount = 0
TaxThreshold = 2165
ThresholdStep = 2165
PrevTax = 0
linecounter = 0
#Open all GenBank profile list
for line in open(CSV_inFile):
linecounter = linecounter+1
print(linecounter)
resultFile = open(CSV_outFile,'a')
wr = csv.writer(resultFile, dialect='excel')
# Check for new TXID
row = line.split(',')
print(row[7], "===", PrevTax)
if row[7] != PrevTax:
print("X1")
TaxCount = TaxCount+1
PrevTax = row[7]
#Check it current Tax count is < or > threshold
# < threshold
print(TaxCount,"=+=", TaxThreshold)
if TaxCount<=3300:
print("X2")
CurrentTax= row[7]
CurrTxCount = CurrentTax
print("TaxCount=", TaxCount)
print( "Add to CSV")
print("row:", CurrentTax, "***", row[0:15])
wr.writerow(row[0:-1])
# is > threshold
else:
print("X3")
# but same TXID....
print(row[7], "=-=", CurrentTax)
if row[7]==CurrentTax:
print("X4")
CurrentTax= row[7]
print("TaxCount=", TaxCount)
print( "Add to CSV")
print("row:", CurrentTax, "***", row[0:15])
wr.writerow(row[0:-1])
# but different TXID...
else:
print(row[7], "=*=", CurrentTax)
if row[7]>CurrentTax:
print("X5")
TaxThreshold=TaxThreshold+ThresholdStep
resultFile.close()
Sess_File_Name = "CNN_VirusIDvSPECIES_XXALL"+ str(TaxThreshold-ThresholdStep)
print("<<<< Start Training >>>>"
print("Training on :: ", CurrTxCount, "Taxa", TaxCount, "data points.")
Train_Network(CurrTxCount, Sess_File_Name)
print("Training complete")
resultFileINIT = open(CSV_outFile,'w')
resultFileINIT.close()
CurrentTax= row[7]
#reset tax count
CurrTxCount = 0
TaxCount = 0
resultFile.close()
Sess_File_Name = "MicroarrayCNN_Data"+ str(TaxThreshold+ThresholdStep)
print("<<<< Start Training >>>>")
print("Training on :: ", CurrTxCount, "Taxa", TaxCount, "data points.")
Train_Network(CurrTxCount, Sess_File_Name)
resultFileINIT = open(CSV_outFile,'w')
resultFileINIT.close()
CurrentTax= row[7]
Tax_Iterator("MicroarrayInput.csv", "MicroarrayOutput.csv")
You defined prediction as prediction=tf.argmax(y,1). And in both feed_dict, you feed labels_onehot for y. Consequently, your "prediction" is always equal to the labels.
As you didn't post the code you used to train your network, I can't tell you what exactly you need to change.
Edit: I have isses understanding the underlying problem you're trying to solve - based on your code, you're trying to train a neural network with 2165 different classes using 1609 training examples. How is this even possible? If each example had a different class, there would still be some classes without any training example. Or does one image belong to many classes? From your statement at the beginning of your question, I had assumed you're trying to output a real-valued number between 0-1.
I'm actually surprised that the code actually worked as it looks like you're adding only a single number to your labels list, but your model expects a list with length 2165 for each training example.