Why is eager execution required when using my custom loss function? - keras

QUESTION: Why does my custom loss function require the run_eagerly parameter to be set to True in the compile method in TensorFlow?
BACKGROUND: I have built a CNN with TensorFlow 2.8.1 to classify CIFAR-100 images using a custom loss function. The CIFAR dataset includes 32x32-pixel RGB images of 100 fine classes (e.g., bear, car) categorized into 20 coarse classes (e.g., large omnivore, vehicle).
My custom loss function is a weighted sum of two other loss functions (see code below). The first component is the crossentropy loss for the fine label. The second component is the crossentropy loss for the coarse label. My hope is that this custom loss function will enforce accurate classification of the coarse label and thereby yield more accurate classification of the fine label (fingers crossed). The comparator will be the crossentropy loss of just the fine label (the baseline model).
Note that to derive the coarse (hierarchical) loss component, I had to map y_true (the true fine label, an integer) and y_pred (the predicted softmax probabilities for the fine labels, a vector) to y_true_coarse_int (the true coarse label, an integer) and y_pred_coarse_hot (the predicted coarse label, a one-hot-encoded vector), respectively. FineInts_to_CoarseInts is a Python dictionary that allows this mapping.
# THIS CODE CELL IS TO DEFINE A CUSTOM LOSS FUNCTION
def crossentropy_loss(y_true, y_pred):
    return SparseCategoricalCrossentropy()(y_true, y_pred)

def hierarchical_loss(y_true, y_pred):
    y_true = tensorflow.cast(y_true, dtype=float)
    y_true_reshaped = tensorflow.reshape(y_true, -1)
    y_true_coarse_int = [FineInts_to_CoarseInts[K.eval(y_true_reshaped[i])] for i in range(y_true_reshaped.shape[0])]
    y_true_coarse_int = tensorflow.cast(y_true_coarse_int, dtype=tensorflow.float32)
    y_pred = tensorflow.cast(y_pred, dtype=float)
    y_pred_int = tensorflow.argmax(y_pred, axis=1)
    y_pred_coarse_int = [FineInts_to_CoarseInts[K.eval(y_pred_int[i])] for i in range(y_pred_int.shape[0])]
    y_pred_coarse_int = tensorflow.cast(y_pred_coarse_int, dtype=tensorflow.float32)
    y_pred_coarse_hot = to_categorical(y_pred_coarse_int, 20)
    return SparseCategoricalCrossentropy()(y_true_coarse_int, y_pred_coarse_hot)

def custom_loss(y_true, y_pred):
    H = 0.5
    total_loss = (1 - H) * crossentropy_loss(y_true, y_pred) + H * hierarchical_loss(y_true, y_pred)
    return total_loss
In the loss function I am using TensorFlow tensor operations or Keras backend operations, so I do not understand why I have to compile the model with run_eagerly=True (see below). The one potential problem I see is the use of K.eval, but I've seen other code that used it where compiling with eager execution was not required, so I assumed this couldn't be the reason.
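For reference, K.eval converts a tensor to a NumPy array, which is only possible on eager tensors; inside a graph-compiled loss, y_true and y_pred are symbolic, so every K.eval call forces run_eagerly=True. Below is a minimal sketch (an addition, not the original code) of the same mapping done with graph-safe ops, using tensorflow.gather over a constant tensor in place of the Python dictionary. Note the argmax/one-hot step is still non-differentiable, which the related question below runs into.
# Graph-safe sketch of the coarse mapping (assumption: same encodings as above)
fine_to_coarse = tensorflow.constant([FineInts_to_CoarseInts[i] for i in range(100)], dtype=tensorflow.int64)

def hierarchical_loss_graph(y_true, y_pred):
    # Map true fine labels to coarse labels with a tensor lookup
    y_true_coarse = tensorflow.gather(fine_to_coarse, tensorflow.cast(tensorflow.reshape(y_true, [-1]), tensorflow.int64))
    # Map predicted fine labels (argmax of softmax) to coarse labels
    y_pred_coarse = tensorflow.gather(fine_to_coarse, tensorflow.argmax(y_pred, axis=1))
    y_pred_coarse_hot = tensorflow.one_hot(y_pred_coarse, 20)
    return SparseCategoricalCrossentropy()(y_true_coarse, y_pred_coarse_hot)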
# THIS CODE CELL IS TO COMPILE THE MODEL
model.compile(optimizer="adam", loss=custom_loss, metrics="accuracy", run_eagerly=True)
The full code is below:
# THIS CODE CELL LOADS THE PACKAGES USED IN THIS NOTEBOOK
# Load core packages for data analysis and visualization
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sn
import sys
!{sys.executable} -m pip install pydot
!{sys.executable} -m pip install graphviz
# Load deep learning packages
import tensorflow
from tensorflow.keras.datasets.cifar100 import load_data
from tensorflow.keras import (Model, layers)
from tensorflow.keras.losses import SparseCategoricalCrossentropy
import tensorflow.keras.backend as K
from tensorflow.keras.utils import (to_categorical, plot_model)
from tensorflow.lookup import (StaticHashTable, KeyValueTensorInitializer)
# Load model evaluation packages
import sklearn
from sklearn.metrics import (confusion_matrix, classification_report)
# Print versions of main ML packages
print("Tensorflow version " + tensorflow.__version__)
print("Scikit learn version " + sklearn.__version__)
# THIS CODE CELL LOADS DATASETS AND CHECKS DATA DIMENSIONS
# There is an option to load the "fine" (100 fine classes) or "coarse" (20 super classes) labels with integer (int) encodings
# We will load both labels for hierarchical classification tasks
(x_train, y_train_fine_int), (x_test, y_test_fine_int) = load_data(label_mode="fine")
(_, y_train_coarse_int), (_, y_test_coarse_int) = load_data(label_mode="coarse")
# EXTRACT DATASET PARAMETERS FOR USE LATER ON
num_fine_classes = 100
num_coarse_classes = 20
input_shape = x_train.shape[1:]
# THIS CODE CELL PROVIDES THE CODE TO LINK INTEGER LABELS TO MEANINGFUL WORD LABELS
# Fine and coarse labels are provided as integers. We will want to link them both to meaningful word labels.
# CREATE A DICTIONARY TO MAP THE 20 COARSE LABELS TO THE 100 FINE LABELS
# This mapping comes from https://keras.io/api/datasets/cifar100/
# Except "computer keyboard" should just be "keyboard" for the encoding to work
CoarseLabels_to_FineLabels = {
    "aquatic mammals": ["beaver", "dolphin", "otter", "seal", "whale"],
    "fish": ["aquarium fish", "flatfish", "ray", "shark", "trout"],
    "flowers": ["orchids", "poppies", "roses", "sunflowers", "tulips"],
    "food containers": ["bottles", "bowls", "cans", "cups", "plates"],
    "fruit and vegetables": ["apples", "mushrooms", "oranges", "pears", "sweet peppers"],
    "household electrical devices": ["clock", "keyboard", "lamp", "telephone", "television"],
    "household furniture": ["bed", "chair", "couch", "table", "wardrobe"],
    "insects": ["bee", "beetle", "butterfly", "caterpillar", "cockroach"],
    "large carnivores": ["bear", "leopard", "lion", "tiger", "wolf"],
    "large man-made outdoor things": ["bridge", "castle", "house", "road", "skyscraper"],
    "large natural outdoor scenes": ["cloud", "forest", "mountain", "plain", "sea"],
    "large omnivores and herbivores": ["camel", "cattle", "chimpanzee", "elephant", "kangaroo"],
    "medium-sized mammals": ["fox", "porcupine", "possum", "raccoon", "skunk"],
    "non-insect invertebrates": ["crab", "lobster", "snail", "spider", "worm"],
    "people": ["baby", "boy", "girl", "man", "woman"],
    "reptiles": ["crocodile", "dinosaur", "lizard", "snake", "turtle"],
    "small mammals": ["hamster", "mouse", "rabbit", "shrew", "squirrel"],
    "trees": ["maple", "oak", "palm", "pine", "willow"],
    "vehicles 1": ["bicycle", "bus", "motorcycle", "pickup truck", "train"],
    "vehicles 2": ["lawn-mower", "rocket", "streetcar", "tank", "tractor"]
}
# CREATE A DICTIONARY TO MAP THE INTEGER-ENCODED COARSE LABEL TO THE WORD LABEL
# Create list of Coarse Labels
CoarseLabels = list(CoarseLabels_to_FineLabels.keys())
# The target variable in CIFAR-100 is encoded such that the coarse class is assigned an integer based on its alphabetical order
# The CoarseLabels list is already alphabetized, so no need to sort
CoarseInts_to_CoarseLabels = dict(enumerate(CoarseLabels))
# CREATE A DICTIONARY TO MAP THE WORD LABEL TO THE INTEGER-ENCODED COARSE LABEL
CoarseLabels_to_CoarseInts = dict(zip(CoarseLabels, range(20)))
# CREATE A DICTIONARY TO MAP THE 100 FINE LABELS TO THE 20 COARSE LABELS
FineLabels_to_CoarseLabels = {}
for CoarseLabel in CoarseLabels:
    for FineLabel in CoarseLabels_to_FineLabels[CoarseLabel]:
        FineLabels_to_CoarseLabels[FineLabel] = CoarseLabel
# CREATE A DICTIONARY TO MAP THE INTEGER-ENCODED FINE LABEL TO THE WORD LABEL
# Create a list of the Fine Labels
FineLabels = list(FineLabels_to_CoarseLabels.keys())
# The target variable in CIFAR-100 is encoded such that the fine class is assigned an integer based on its alphabetical order
# Sort the fine class list.
FineLabels.sort()
FineInts_to_FineLabels = dict(enumerate(FineLabels))
# CREATE A DICTIONARY TO MAP THE INTEGER-ENCODED FINE LABELS TO THE INTEGER-ENCODED COARSE LABELS
b = list(dict(sorted(FineLabels_to_CoarseLabels.items())).values())
FineInts_to_CoarseInts = dict(zip(range(100), [CoarseLabels_to_CoarseInts[i] for i in b]))
#Tensor version of dictionary
#fine_to_coarse = tensorflow.constant(list((FineInts_to_CoarseInts).items()), dtype=tensorflow.int8)
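As an optional sanity check of the derived mapping (an added suggestion, relying on the alphabetical encodings built above): fine label 3 is "bear", which belongs to coarse label 8, "large carnivores".
# Optional sanity check of the mapping (added suggestion)
assert FineInts_to_CoarseInts[3] == 8  # "bear" -> "large carnivores"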
# THIS CODE CELL IS TO BUILD A FUNCTIONAL MODEL
inputs = layers.Input(shape=input_shape)
x = layers.BatchNormalization()(inputs)
x = layers.Conv2D(64, (3, 3), padding='same', activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Dropout(0.30)(x)
x = layers.Conv2D(256, (3, 3), padding='same', activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Dropout(0.30)(x)
x = layers.Conv2D(256, (3, 3), padding='same', activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Dropout(0.30)(x)
x = layers.Conv2D(1024, (3, 3), padding='same', activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Dropout(0.30)(x)
x = layers.GlobalAveragePooling2D()(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.30)(x)
x = layers.Dense(512, activation = "relu")(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.30)(x)
output_fine = layers.Dense(num_fine_classes, activation="softmax", name="output_fine")(x)
model = Model(inputs=inputs, outputs=output_fine)
# THIS CODE CELL IS TO DEFINE A CUSTOM LOSS FUNCTION
def crossentropy_loss(y_true, y_pred):
    return SparseCategoricalCrossentropy()(y_true, y_pred)

def hierarchical_loss(y_true, y_pred):
    y_true = tensorflow.cast(y_true, dtype=float)
    y_true_reshaped = tensorflow.reshape(y_true, -1)
    y_true_coarse_int = [FineInts_to_CoarseInts[K.eval(y_true_reshaped[i])] for i in range(y_true_reshaped.shape[0])]
    y_true_coarse_int = tensorflow.cast(y_true_coarse_int, dtype=tensorflow.float32)
    y_pred = tensorflow.cast(y_pred, dtype=float)
    y_pred_int = tensorflow.argmax(y_pred, axis=1)
    y_pred_coarse_int = [FineInts_to_CoarseInts[K.eval(y_pred_int[i])] for i in range(y_pred_int.shape[0])]
    y_pred_coarse_int = tensorflow.cast(y_pred_coarse_int, dtype=tensorflow.float32)
    y_pred_coarse_hot = to_categorical(y_pred_coarse_int, 20)
    return SparseCategoricalCrossentropy()(y_true_coarse_int, y_pred_coarse_hot)

def custom_loss(y_true, y_pred):
    H = 0.5
    total_loss = (1 - H) * crossentropy_loss(y_true, y_pred) + H * hierarchical_loss(y_true, y_pred)
    return total_loss
# THIS CODE CELL IS TO COMPILE THE MODEL
model.compile(optimizer="adam", loss=crossentropy_loss, metrics="accuracy", run_eagerly=False)
# THIS CODE CELL IS TO TRAIN THE MODEL
history = model.fit(x_train, y_train_fine_int, epochs=200, validation_split=0.25, batch_size=100)
# THIS CODE CELL IS TO VISUALIZE THE TRAINING
history_frame = pd.DataFrame(history.history)
history_frame.to_csv("history.csv")
history_frame.loc[:, ["accuracy", "val_accuracy"]].plot()
history_frame.loc[:, ["loss", "val_loss"]].plot()
plt.show()
# THIS CODE CELL IS TO EVALUATE THE MODEL ON AN INDEPENDENT DATASET
score = model.evaluate(x_test, y_test_fine_int, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

Related

TensorFlow error on custom loss function: ValueError: No gradients provided for any variable

QUESTION: What is the cause of this error and how do I fix it?
BACKGROUND: I am attempting to implement a custom ("hierarchical") loss function to classify CIFAR-100 images that leverages the class hierarchy. This dataset has 20 coarse classes, each with 5 fine classes. The custom loss function is a weighted sum of the fine class crossentropy loss and the coarse class crossentropy loss. It determines the coarse class crossentropy loss by first mapping the true fine labels (y_true) to the true coarse labels (y_true_coarse) and the predicted fine labels as softmax probabilities (y_pred) to the predicted coarse labels as softmax probabilities (y_pred_coarse). The mapping is done with a TensorFlow "dictionary". The fine class crossentropy loss works just fine by itself. The problem is the coarse class crossentropy loss.
When I implement the code in a training loop I get ValueError: No gradients provided for any variable. Below is the code for my custom loss function.
# THIS CODE CELL IS TO DEFINE A CUSTOM LOSS FUNCTION
# First, map the true fine labels to the true coarse labels
def get_y_true_coarse(y_true):
    y_true = tf.constant(y_true, dtype=tf.int32)
    y_true_coarse = table.lookup(y_true)
    return y_true_coarse

# Next, map the predicted fine class to the predicted coarse class (softmax probabilities)
initialize = tf.zeros(shape=(batch_size, num_coarse_classes), dtype=tf.float32)
y_pred_coarse = tf.Variable(initialize, dtype=tf.float32)

def get_y_pred_coarse(y_pred):
    for i in range(batch_size):
        for j in range(num_coarse_classes):
            idx = table.lookup(tf.range(100)) == j
            total = tf.reduce_sum(y_pred[i][idx])
            y_pred_coarse[i, j].assign(total)
    return y_pred_coarse

# Use the true coarse label and predicted coarse label (softmax probabilities) to derive the crossentropy loss of coarse labels
def hierarchical_loss(y_true, y_pred):
    y_true_coarse = get_y_true_coarse(y_true)
    y_pred_coarse = get_y_pred_coarse(y_pred)
    return SparseCategoricalCrossentropy()(y_true_coarse, y_pred_coarse)

# Use the true fine label and predicted fine label (softmax probabilities) to derive the crossentropy loss of fine labels
def crossentropy_loss(y_true, y_pred):
    return SparseCategoricalCrossentropy()(y_true, y_pred)

# Finally, combine the coarse class and fine class crossentropy losses
def custom_loss(y_true, y_pred):
    H = 0.5
    total_loss = (1 - H) * crossentropy_loss(y_true, y_pred) + H * hierarchical_loss(y_true, y_pred)
    return total_loss
I am passing the argument run_eagerly=True to the model.compile method before executing the model.fit method.
INVESTIGATIONS CONDUCTED:
I have reviewed the TensorFlow graphs and tf.function introduction and Stack Overflow/Stack Exchange pages. It seems that a non-differentiable loss function is the most commonly cited cause of this error (see article1 and article2), but my loss function is merely a weighted sum of two different crossentropy loss functions and, therefore, should be differentiable. I am using Python 3.9.7, TensorFlow 2.9.1, and VS Code 1.7.1 on a 64-bit Windows 10 machine.
NOTE: The fine class crossentropy loss (crossentropy_loss) works just fine by itself when I compile and fit the model with it. The problem is the coarse class crossentropy loss (hierarchical_loss). Therefore, to better isolate the problem, I am passing the latter function, not custom_loss, to the model.compile method. I will mention that I have also tried passing custom_loss with H=1, and this results in no "learning" (i.e., the accuracy remains at ~1% - the expected/naïve accuracy - from epoch to epoch). 1% accuracy is what one would get from randomly guessing one of the 100 (balanced) classes. If custom_loss were working, we would expect at least some learning to occur, because learning the coarse labels perfectly would result in ~20% accuracy given that there are 5 fine labels within each coarse category.
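One way to see the failure directly, before the full code: a minimal diagnostic sketch (my addition, assuming eager execution and the objects defined in the full code below) that checks whether any gradient flows from this loss back to the model's weights.
# Diagnostic sketch (added): does any gradient reach the weights?
with tf.GradientTape() as tape:
    y_pred = model(x_train[0:batch_size], training=True)
    loss_value = hierarchical_loss(y_train_fine_int[0:batch_size], y_pred)
grads = tape.gradient(loss_value, model.trainable_variables)
print(any(g is not None for g in grads))  # prints False, reproducing the error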
FULL CODE
# %%
# THIS CODE CELL LOADS THE PACKAGES USED IN THIS NOTEBOOK
# Load core packages for data analysis and visualization
import pandas as pd
import matplotlib.pyplot as plt
# Load deep learning packages
import tensorflow as tf
from tensorflow.keras.datasets.cifar100 import load_data
from tensorflow.keras import (Model, layers)
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.utils import (to_categorical, plot_model)
from tensorflow.lookup import (StaticHashTable, KeyValueTensorInitializer)
# Print versions of main ML packages
print("Tensorflow version " + tf.__version__)
# %%
# THIS CODE CELL LOADS DATASETS AND CHECKS DATA DIMENSIONS
# There is an option to load the "fine" (100 fine classes) or "coarse" (20 super classes) labels with integer (int) encodings
# We will load both labels for hierarchical classification tasks
(x_train, y_train_fine_int), (x_test, y_test_fine_int) = load_data(label_mode="fine")
(_, y_train_coarse_int), (_, y_test_coarse_int) = load_data(label_mode="coarse")
# EXTRACT DATASET PARAMETERS FOR USE LATER ON
num_fine_classes = 100
num_coarse_classes = 20
input_shape = x_train.shape[1:]
# DEFINE BATCH SIZE
batch_size = 50
# %%
# THIS CODE CELL PROVIDES THE CODE TO LINK INTEGER LABELS TO MEANINGFUL WORD LABELS
# Fine and coarse labels are provided as integers. We will want to link them both to meaningful word labels.
# CREATE A DICTIONARY TO MAP THE 20 COARSE LABELS TO THE 100 FINE LABELS
# This mapping comes from https://keras.io/api/datasets/cifar100/
# Except "computer keyboard" should just be "keyboard" for the encoding to work
CoarseLabels_to_FineLabels = {
    "aquatic mammals": ["beaver", "dolphin", "otter", "seal", "whale"],
    "fish": ["aquarium fish", "flatfish", "ray", "shark", "trout"],
    "flowers": ["orchids", "poppies", "roses", "sunflowers", "tulips"],
    "food containers": ["bottles", "bowls", "cans", "cups", "plates"],
    "fruit and vegetables": ["apples", "mushrooms", "oranges", "pears", "sweet peppers"],
    "household electrical devices": ["clock", "keyboard", "lamp", "telephone", "television"],
    "household furniture": ["bed", "chair", "couch", "table", "wardrobe"],
    "insects": ["bee", "beetle", "butterfly", "caterpillar", "cockroach"],
    "large carnivores": ["bear", "leopard", "lion", "tiger", "wolf"],
    "large man-made outdoor things": ["bridge", "castle", "house", "road", "skyscraper"],
    "large natural outdoor scenes": ["cloud", "forest", "mountain", "plain", "sea"],
    "large omnivores and herbivores": ["camel", "cattle", "chimpanzee", "elephant", "kangaroo"],
    "medium-sized mammals": ["fox", "porcupine", "possum", "raccoon", "skunk"],
    "non-insect invertebrates": ["crab", "lobster", "snail", "spider", "worm"],
    "people": ["baby", "boy", "girl", "man", "woman"],
    "reptiles": ["crocodile", "dinosaur", "lizard", "snake", "turtle"],
    "small mammals": ["hamster", "mouse", "rabbit", "shrew", "squirrel"],
    "trees": ["maple", "oak", "palm", "pine", "willow"],
    "vehicles 1": ["bicycle", "bus", "motorcycle", "pickup truck", "train"],
    "vehicles 2": ["lawn-mower", "rocket", "streetcar", "tank", "tractor"]
}
# CREATE A DICTIONARY TO MAP THE INTEGER-ENCODED COARSE LABEL TO THE WORD LABEL
# Create list of Coarse Labels
CoarseLabels = list(CoarseLabels_to_FineLabels.keys())
# The target variable in CIFAR-100 is encoded such that the coarse class is assigned an integer based on its alphabetical order
# The CoarseLabels list is already alphabetized, so no need to sort
CoarseInts_to_CoarseLabels = dict(enumerate(CoarseLabels))
# CREATE A DICTIONARY TO MAP THE WORD LABEL TO THE INTEGER-ENCODED COARSE LABEL
CoarseLabels_to_CoarseInts = dict(zip(CoarseLabels, range(20)))
# CREATE A DICTIONARY TO MAP THE 100 FINE LABELS TO THE 20 COARSE LABELS
FineLabels_to_CoarseLabels = {}
for CoarseLabel in CoarseLabels:
    for FineLabel in CoarseLabels_to_FineLabels[CoarseLabel]:
        FineLabels_to_CoarseLabels[FineLabel] = CoarseLabel
# CREATE A DICTIONARY TO MAP THE INTEGER-ENCODED FINE LABEL TO THE WORD LABEL
# Create a list of the Fine Labels
FineLabels = list(FineLabels_to_CoarseLabels.keys())
# The target variable in CIFAR-100 is encoded such that the fine class is assigned an integer based on its alphabetical order
# Sort the fine class list.
FineLabels.sort()
FineInts_to_FineLabels = dict(enumerate(FineLabels))
# CREATE A DICTIONARY TO MAP THE INTEGER-ENCODED FINE LABELS TO THE INTEGER-ENCODED COARSE LABELS
b = list(dict(sorted(FineLabels_to_CoarseLabels.items())).values())
FineInts_to_CoarseInts = dict(zip(range(100), [CoarseLabels_to_CoarseInts[i] for i in b]))
# CREATE A TENSORFLOW LOOKUP TABLE TO MAP THE INTEGER-ENCODED FINE LABELS TO THE INTEGER-ENCODED COARSE LABELS
table = StaticHashTable(
    initializer=KeyValueTensorInitializer(
        keys=list(FineInts_to_CoarseInts.keys()),
        values=list(FineInts_to_CoarseInts.values()),
        key_dtype=tf.int32,
        value_dtype=tf.int32
    ),
    default_value=tf.constant(-1, tf.int32),
    name="dictionary"
)
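As an optional sanity check of the table (an added suggestion, relying on the alphabetical encodings built above): fine label 3 ("bear") should map to coarse label 8 ("large carnivores").
# Optional sanity check of the lookup table (added suggestion)
print(table.lookup(tf.constant([3], dtype=tf.int32)))  # expect [8]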
# %%
# THIS CODE CELL IS TO BUILD A FUNCTIONAL MODEL
inputs = layers.Input(shape=input_shape)
x = layers.BatchNormalization()(inputs)
x = layers.Conv2D(64, (3, 3), padding='same', activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Dropout(0.30)(x)
x = layers.Conv2D(256, (3, 3), padding='same', activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Dropout(0.30)(x)
x = layers.Conv2D(256, (3, 3), padding='same', activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Dropout(0.30)(x)
x = layers.Conv2D(1024, (3, 3), padding='same', activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Dropout(0.30)(x)
x = layers.GlobalAveragePooling2D()(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.30)(x)
x = layers.Dense(512, activation = "relu")(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.30)(x)
output_fine = layers.Dense(num_fine_classes, activation="softmax", name="output_fine")(x)
model = Model(inputs=inputs, outputs=output_fine)
# %%
# THIS CODE CELL IS TO DEFINE A CUSTOM LOSS FUNCTION
# First, map the true fine labels to the true coarse labels
def get_y_true_coarse(y_true):
    y_true = tf.constant(y_true, dtype=tf.int32)
    y_true_coarse = table.lookup(y_true)
    return y_true_coarse

# Next, map the predicted fine class to the predicted coarse class (softmax probabilities)
initialize = tf.zeros(shape=(batch_size, num_coarse_classes), dtype=tf.float32)
y_pred_coarse = tf.Variable(initialize, dtype=tf.float32)

def get_y_pred_coarse(y_pred):
    for i in range(batch_size):
        for j in range(num_coarse_classes):
            idx = table.lookup(tf.range(100)) == j
            total = tf.reduce_sum(y_pred[i][idx])
            y_pred_coarse[i, j].assign(total)
    return y_pred_coarse

# Use the true coarse label and predicted coarse label (softmax probabilities) to derive the crossentropy loss of coarse labels
def hierarchical_loss(y_true, y_pred):
    y_true_coarse = get_y_true_coarse(y_true)
    y_pred_coarse = get_y_pred_coarse(y_pred)
    return SparseCategoricalCrossentropy()(y_true_coarse, y_pred_coarse)

# Use the true fine label and predicted fine label (softmax probabilities) to derive the crossentropy loss of fine labels
def crossentropy_loss(y_true, y_pred):
    return SparseCategoricalCrossentropy()(y_true, y_pred)

# Finally, combine the coarse class and fine class crossentropy losses
def custom_loss(y_true, y_pred):
    H = 0.5
    total_loss = (1 - H) * crossentropy_loss(y_true, y_pred) + H * hierarchical_loss(y_true, y_pred)
    return total_loss
# %%
# THIS CODE CELL IS TO COMPILE THE MODEL
model.compile(optimizer="adam", loss=hierarchical_loss, metrics="accuracy", run_eagerly=True)
# %%
# THIS CODE CELL IS TO TRAIN THE MODEL
history = model.fit(x_train, y_train_fine_int, epochs=20, validation_split=0.25, batch_size=batch_size)
# %%
# THIS CODE CELL IS TO VISUALIZE THE TRAINING
history_frame = pd.DataFrame(history.history)
history_frame.loc[:, ["accuracy", "val_accuracy"]].plot()
history_frame.loc[:, ["loss", "val_loss"]].plot()
plt.show()
To do some debugging, I printed the shape and data types of the variables and ensured that the functions are working properly under eager execution (tf.config.run_functions_eagerly(True)). What looks fishy is the data type of y_true_coarse and y_pred_coarse, as they have different datatypes than y_true and y_pred, respectively. Nonetheless, the loss functions seem to be generating the correct output on the 50 training examples.
DEBUGGING
# Get shape and datatypes of downloaded training data
print("shape of y_train_fine_int: ", y_train_fine_int.shape)
print("values of y_train_fine_int: ", y_train_fine_int.dtype)
print("type of y_train_fine_int: ", type(y_train_fine_int))
print("\n")
print("shape of x_train: ", x_train.shape)
print("values of x_train: ", x_train.dtype)
print("type of x_train: ", type(x_train))
> shape of y_train_fine_int: (50000, 1)
> values of y_train_fine_int: int32
> type of y_train_fine_int: <class 'numpy.ndarray'>
>
>
> shape of x_train: (50000, 32, 32, 3)
> values of x_train: uint8
> type of x_train: <class 'numpy.ndarray'>
Take a subset of the true training labels (the first 50, i.e., one batch_size) and generate predictions (softmax probabilities) on that subset:
y_true = y_train_fine_int[0:batch_size]
y_pred = model.predict(x_train[0:batch_size])
# Get shape and datatypes of the subset and predictions
print("shape of y_true: ", y_true.shape)
print("values of y_true: ", y_true.dtype)
print("type of y_true: ", type(y_true))
print("\n")
print("shape of y_pred: ", y_pred.shape)
print("values of y_pred: ", y_pred.dtype)
print("type of y_pred: ", type(y_pred), "\n")
> 2/2 [==============================] - 0s 45ms/step
> shape of y_true: (50, 1)
> values of y_true: int32
> type of y_true: <class 'numpy.ndarray'>
>
>
> shape of y_pred: (50, 100)
> values of y_pred: float32
> type of y_pred: <class 'numpy.ndarray'>
Use the mapping functions to derive the true and predicted coarse labels (softmax probabilities):
y_true_coarse = get_y_true_coarse(y_true)
y_pred_coarse = get_y_pred_coarse(y_pred)
# Get shape and datatypes of coarse true labels and predictions (softmax probabilities)
print("shape of y_true_coarse: ", y_true_coarse.shape)
print("values of y_true_coarse: ", y_true_coarse.dtype)
print("type of y_true_coarse: ", type(y_true_coarse))
print("\n")
print("shape of y_pred_coarse: ", y_pred_coarse.shape)
print("values of y_pred_coarse: ", y_pred_coarse.dtype)
print("type of y_pred_coarse: ", type(y_pred_coarse))
> shape of y_true_coarse: (50, 1)
> values of y_true_coarse: <dtype: 'int32'>
> type of y_true_coarse: <class 'tensorflow.python.framework.ops.EagerTensor'>
>
> shape of y_pred_coarse: (50, 20)
> values of y_pred_coarse: <dtype: 'float32'>
> type of y_pred_coarse: <class 'tensorflow.python.ops.resource_variable_ops.ResourceVariable'>
DETERMINE WHETHER LOSS FUNCTIONS ARE CORRECTLY WORKING
print("fine loss with function: ", crossentropy_loss(y_true, y_pred))
print("fine loss manually: ", SparseCategoricalCrossentropy()(y_true, y_pred), "\n")
print("coarse loss with function: ", hierarchical_loss(y_true, y_pred))
print("coarse loss manually: ", SparseCategoricalCrossentropy()(y_true_coarse, y_pred_coarse), "\n")
H = 0.5
print("total loss with function: ", custom_loss(y_true, y_pred))
print("total loss manually: ", (1 - H) * crossentropy_loss(y_true, y_pred) + H * hierarchical_loss(y_true, y_pred))
> fine loss with function: tf.Tensor(5.1430206, shape=(), dtype=float32)
> fine loss manually: tf.Tensor(5.1430206, shape=(), dtype=float32)
>
> coarse loss with function: tf.Tensor(3.1151817, shape=(), dtype=float32)
> coarse loss manually: tf.Tensor(3.1151817, shape=(), dtype=float32)
>
> total loss with function: tf.Tensor(4.1291013, shape=(), dtype=float32)
> total loss manually: tf.Tensor(4.1291013, shape=(), dtype=float32)
Might just be a typo on your part. When you compile your model, I think you accidentally used hierarchical_loss instead of custom_loss?
On your question of why the coarse accuracy hovers around 1%: I think TensorFlow calculates accuracy by comparing the metric's inputs, i.e., your predicted fine label with your true fine label; even if you convert them both to coarse inside your loss function, TensorFlow doesn't know that. Hence, even if your model "correctly" predicted a squirrel as a hamster (both are small mammals), TensorFlow will count that as an inaccurate prediction.
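If coarse-level accuracy is what you actually want to monitor, a minimal sketch of a custom metric (my addition, reusing the lookup table from the question; untested) could look like this:
# Sketch of a coarse-level accuracy metric (added, reuses the question's table)
def coarse_accuracy(y_true, y_pred):
    y_true_coarse = table.lookup(tf.cast(tf.reshape(y_true, [-1]), tf.int32))
    y_pred_coarse = table.lookup(tf.cast(tf.argmax(y_pred, axis=1), tf.int32))
    return tf.reduce_mean(tf.cast(tf.equal(y_true_coarse, y_pred_coarse), tf.float32))

model.compile(optimizer="adam", loss=custom_loss,
              metrics=["accuracy", coarse_accuracy])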
I think you're correct in assuming that the dictionary-lookup operation is non-differentiable. I think you could use one-hot encoding to do this instead, so TensorFlow can backpropagate:
idx = list(FineInts_to_CoarseInts.values())  # coarse index for each fine label, in fine-label order
one_hot_table = np.zeros((len(idx), max(idx) + 1))
one_hot_table[np.arange(len(one_hot_table)), idx] = 1
one_hot_table = tf.convert_to_tensor(one_hot_table)

def hierarchical_loss(y_true, y_pred):
    # Convert to one-hot:
    y_true = tf.reshape(np.eye(100)[y_true], [50, 100])
    y_true_coarse = y_true @ one_hot_table
    y_pred_coarse = tf.cast(y_pred, tf.double) @ one_hot_table
    return CategoricalCrossentropy()(y_true_coarse, y_pred_coarse)

def custom_loss(y_true, y_pred):
    H = 0.5
    total_loss = (1 - H) * tf.cast(crossentropy_loss(y_true, y_pred), tf.double) + H * hierarchical_loss(y_true, y_pred)
    return total_loss
There are quite sophisticated implementations of hierarchical classification online.
I replaced the table-lookup operation in the post with a matrix operation (inspired by @Duc Nguyen), as the former was likely not differentiable and the source of the error. I did one-hot encode all my labels upfront with the to_categorical function in TensorFlow (not shown), which was necessary for the revised loss function below to work. Importantly, this loss function does not require eager execution.
# THIS CELL CREATES A MATRIX TO MAP THE ONE-HOT-ENCODED FINE LABELS TO THE ONE-HOT-ENCODED COARSE LABELS
Matrix_Fine_to_Coarse_OneHot = np.zeros(shape=[num_fine_classes, num_coarse_classes], dtype=np.int32)
idx = list(range(num_fine_classes)), list(FineInts_to_CoarseInts.values())
Matrix_Fine_to_Coarse_OneHot[idx] = 1
Matrix_Fine_to_Coarse_OneHot = tf.constant(Matrix_Fine_to_Coarse_OneHot, dtype=tf.float32)
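A quick check of the matrix (an added suggestion): each fine class maps to exactly one coarse class, so every row should sum to 1.
# Optional check (added): every row of the mapping matrix sums to 1
print(tf.reduce_sum(Matrix_Fine_to_Coarse_OneHot, axis=1))  # expect a vector of ones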
# THIS CELL DEFINES THE CUSTOM LOSS FUNCTION
@tf.function
def crossentropy_loss(y_true, y_pred):
    return CategoricalCrossentropy()(y_true, y_pred)

@tf.function
def hierarchical_loss(y_true, y_pred):
    y_true_coarse = tf.matmul(y_true, Matrix_Fine_to_Coarse_OneHot)
    y_pred_coarse = tf.matmul(y_pred, Matrix_Fine_to_Coarse_OneHot)
    return CategoricalCrossentropy()(y_true_coarse, y_pred_coarse)

@tf.function
def custom_loss(y_true, y_pred):
    total_loss = (1 - H) * crossentropy_loss(y_true, y_pred) + H * hierarchical_loss(y_true, y_pred)
    return total_loss
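Usage under this revised setup might look like the sketch below (my assumption of what the "not shown" one-hot encoding step looks like; note that the revised losses use CategoricalCrossentropy, which must be imported, and that H must be defined globally):
# Sketch of usage with the revised loss (added; assumptions noted above)
from tensorflow.keras.losses import CategoricalCrossentropy
H = 0.5  # weighting between the fine and coarse loss components
y_train_fine_hot = to_categorical(y_train_fine_int, num_fine_classes)
model.compile(optimizer="adam", loss=custom_loss, metrics="accuracy")  # no run_eagerly needed
history = model.fit(x_train, y_train_fine_hot, epochs=20, batch_size=batch_size)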

How to use the input gradients as variables within a custom loss function in Keras?

I am using the input gradient as a feature importance measure, and I want to compare the feature importance of a training datapoint with the human-annotated feature importance. I would like to make this comparison differentiable, such that it can be learned through backpropagation. For that, I am writing a custom loss function that, in addition to the regular loss (e.g., MSE on the predictions vs. the true labels), also checks whether the input gradient is correct (e.g., MSE of the input gradient vs. the human-annotated feature importance).
With the following code I am able to get the input gradient:
from keras import backend as K
import numpy as np
from keras.models import Model
from keras.layers import Input, Dense

def normalize(x):
    # utility function to normalize a tensor by its L2 norm
    return x / (K.sqrt(K.mean(K.square(x))) + 1e-5)

# Amount of training samples
N = 1000
input_dim = 10

# Generate training set: make the 1st and 2nd feature same as the target feature
X = np.random.standard_normal(size=(N, input_dim))
y = np.random.randint(low=0, high=2, size=(N, 1))
X[:, 1] = y[:, 0]
X[:, 2] = y[:, 0]

# Create simple model
inputs = Input(shape=(input_dim,))
x = Dense(10, name="dense1")(inputs)
output = Dense(1, activation='sigmoid')(x)
model = Model(input=[inputs], output=output)

# Compile and fit model
model.compile(optimizer='adam', loss="mse", metrics=['accuracy'])
model.fit([X], y, epochs=100, batch_size=64)

# Get function to get input gradients
gradients = K.gradients(model.output, model.input)[0]
gradient_function = K.function([model.input], [normalize(gradients)])

# Get input gradient values of the training-set
grads_val = gradient_function([X])[0]
print(grads_val[:2])
This prints the following (you can see that the 1st and the 2nd features have the highest importance):
[[ 1.2629046e-02 2.2765596e+00 2.1479919e+00 2.1558853e-02
4.5277486e-03 2.9851785e-03 9.5279224e-04 -1.0903150e-02
-1.2230731e-02 2.1960819e-02]
[ 1.1318034e-02 2.0402350e+00 1.9250139e+00 1.9320872e-02
4.0577268e-03 2.6752844e-03 8.5390132e-04 -9.7713526e-03
-1.0961102e-02 1.9681118e-02]]
How can I write a custom loss function in which the input gradients are differentiable?
I started with the following loss function.
from keras.losses import mean_squared_error

def custom_loss():
    # human annotated feature importance
    # Let's say that it says to only look at the second feature
    human_feature_importance = []
    for i in range(N):
        human_feature_importance.append([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])

    def loss(y_true, y_pred):
        # Get regular loss
        regular_loss_value = mean_squared_error(y_true, y_pred)
        # Somehow get the input gradient of each training sample as a tensor
        # It should be differentiable w.r.t. all of the weights
        gradients = ??
        feature_importance_loss_value = mean_squared_error(gradients, human_feature_importance)
        # Combine both losses
        return regular_loss_value + feature_importance_loss_value
    return loss
I also found an implementation in TensorFlow to make the input gradient differentiable: https://github.com/dtak/rrr/blob/master/rrr/tensorflow_perceptron.py#L18
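One possible direction, sketched under the same old-style Keras API as above (an assumption-laden sketch in the spirit of the linked rrr implementation, not a verified solution): build the loss as a closure over the model, so K.gradients yields a symbolic input-gradient tensor that is itself composed of differentiable ops and can therefore be backpropagated through.
# Sketch (added): loss closure exposing a differentiable input gradient
from keras import backend as K
from keras.losses import mean_squared_error
import numpy as np

def make_gradient_loss(model, human_feature_importance):
    # human_feature_importance: shape (input_dim,), broadcast over the batch
    target_grads = K.constant(human_feature_importance)

    def loss(y_true, y_pred):
        regular_loss_value = mean_squared_error(y_true, y_pred)
        # Symbolic gradient of the summed output w.r.t. the input; since it is
        # built from differentiable ops, it remains differentiable w.r.t. the
        # weights (second-order gradients must exist for all layers used)
        input_grads = K.gradients(K.sum(model.output), model.input)[0]
        feature_importance_loss_value = K.mean(K.square(input_grads - target_grads))
        return regular_loss_value + feature_importance_loss_value

    return loss

# Hypothetical usage: annotate that only the feature at index 2 should matter
model.compile(optimizer='adam',
              loss=make_gradient_loss(model, np.eye(input_dim)[2].astype('float32')))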

How to deal with triplet loss when at the time of input I have only two files, i.e., at the time of testing

I am implementing a Siamese network in which I know how to calculate the triplet loss at training time: I pick the anchor, positive, and negative by dividing the input (a handcrafted feature vector) into three parts.
anchor_output = ... # shape [None, 128]
positive_output = ... # shape [None, 128]
negative_output = ... # shape [None, 128]
d_pos = tf.reduce_sum(tf.square(anchor_output - positive_output), 1)
d_neg = tf.reduce_sum(tf.square(anchor_output - negative_output), 1)
loss = tf.maximum(0., margin + d_pos - d_neg)
loss = tf.reduce_mean(loss)
But the problem is at test time: I would have only two files, positive and negative. How do I deal with the triplet then, since I need one more anchor file? My app only takes one picture and compares it with the ones in the database, so there are only two files in this case. I searched a lot, but nobody provided code to deal with this problem; there was only code to implement triplet loss, not the whole scenario.
AND I DON'T WANT TO USE CONTRASTIVE LOSS
Colab notebook with test code on CIFAR 10:
https://colab.research.google.com/drive/1VgOTzr_VZNHkXh2z9IiTAcEgg5qr19y0
The general idea:
from tensorflow import keras
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K

img_width = 128
img_height = 128
img_colors = 3
margin = 1.0
VECTOR_SIZE = 32

def triplet_loss(y_true, y_pred):
    """ y_true is a dummy value that should be ignored.
    Uses the inverse of the cosine similarity as a loss.
    """
    anchor_vec = y_pred[:, :VECTOR_SIZE]
    positive_vec = y_pred[:, VECTOR_SIZE:2*VECTOR_SIZE]
    negative_vec = y_pred[:, 2*VECTOR_SIZE:]
    d1 = keras.losses.cosine_proximity(anchor_vec, positive_vec)
    d2 = keras.losses.cosine_proximity(anchor_vec, negative_vec)
    return K.clip(d2 - d1 + margin, 0, None)
def make_image_model():
    """ Build a convolutional model that generates a vector.
    """
    inp = Input(shape=(img_width, img_height, img_colors))
    l1 = Conv2D(8, (2, 2))(inp)
    l1 = MaxPooling2D()(l1)
    l2 = Conv2D(16, (2, 2))(l1)
    l2 = MaxPooling2D()(l2)
    l3 = Conv2D(16, (2, 2))(l2)
    l3 = MaxPooling2D()(l3)
    conv_out = Flatten()(l3)
    out = Dense(VECTOR_SIZE)(conv_out)
    model = Model(inp, out)
    return model
def make_siamese_model(img_model):
    """ Siamese model inputs are 3 images: base, positive, negative.
    The output is a dummy variable that is ignored for the purposes of loss
    calculation.
    """
    anchor = Input(shape=(img_width, img_height, img_colors))
    positive = Input(shape=(img_width, img_height, img_colors))
    negative = Input(shape=(img_width, img_height, img_colors))
    anchor_vec = img_model(anchor)
    positive_vec = img_model(positive)
    negative_vec = img_model(negative)
    vecs = Concatenate(axis=1)([anchor_vec, positive_vec, negative_vec])
    model = Model([anchor, positive, negative], vecs)
    model.compile('adam', triplet_loss)
    return model

img_model = make_image_model()
train_model = make_siamese_model(img_model)
img_model.summary()
train_model.summary()
###
train_model.fit(X, dummy_y, ...)
img_model.save('image_model.h5')
###
# In order to use the model
vec_base = img_model.predict(base_image)
vec_test = img_model.predict(test_image)
Compare the cosine similarity of vec_base and vec_test to determine whether base and test are within your acceptance criteria.
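For concreteness, a minimal sketch of that comparison (the 0.8 threshold is an assumed placeholder, to be tuned on validation pairs):
# Sketch (added): cosine similarity between the two embedding vectors
import numpy as np

def cosine_similarity(a, b):
    a, b = a.flatten(), b.flatten()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

similarity = cosine_similarity(vec_base, vec_test)
is_match = similarity > 0.8  # assumed threshold; tune on a validation set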

Input size (depth of inputs) must be accessible via shape inference, but saw value None error when trying to set tf.expand_dims axis to 0

I am trying to use the 20 newsgroups dataset available in sklearn to train an LSTM to do incremental learning (classification). I used sklearn's TfidfVectorizer to pre-process the data, then turned the resulting sparse matrix into a numpy array before feeding it. After that, when coding the line below:
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs_, initial_state=initial_state)
It gave an error saying that 'inputs_' should have 3 dimensions, so I used:
inputs_ = tf.expand_dims(inputs_, 0)
To expand the dimension. But when I do that I get the error:
ValueError: Input size (depth of inputs) must be accessible via shape
inference, but saw value None.
The shape of 'inputs_' is:
(1, 134410)
I already went through this post, but it did not help.
I cannot seem to understand how to solve this issue. Any help is much appreciated. Thank you in advance!
Shown below is my complete code:
import os
from collections import Counter
import tensorflow as tf
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.datasets import fetch_20newsgroups
import matplotlib as mplt
from matplotlib import cm
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from sklearn.metrics import f1_score, recall_score, precision_score
from string import punctuation
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer

def pre_process():
    newsgroups_data = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))
    vectorizer = TfidfVectorizer()
    features = vectorizer.fit_transform(newsgroups_data.data)
    lb = LabelBinarizer()
    labels = np.reshape(newsgroups_data.target, [-1])
    labels = lb.fit_transform(labels)
    return features, labels

def get_batches(x, y, batch_size=1):
    for ii in range(0, len(y), batch_size):
        yield x[ii:ii + batch_size], y[ii:ii + batch_size]

def plot_error(errorplot, datapoint, numberOfWrongPreds):
    errorplot.set_xdata(np.append(errorplot.get_xdata(), datapoint))
    errorplot.set_ydata(np.append(errorplot.get_ydata(), numberOfWrongPreds))
    errorplot.autoscale(enable=True, axis='both', tight=None)
    plt.draw()

def train_test():
    features, labels = pre_process()

    # Defining Hyperparameters
    epochs = 1
    lstm_layers = 1
    batch_size = 1
    lstm_size = 30
    learning_rate = 0.003

    print(lstm_size)
    print(batch_size)
    print(epochs)

    # --------------placeholders-------------------------------------
    # Create the graph object
    graph = tf.Graph()
    # Add nodes to the graph
    with graph.as_default():
        tf.set_random_seed(1)
        inputs_ = tf.placeholder(tf.float32, [None, None], name="inputs")
        # labels_ = tf.placeholder(dtype=tf.int32)
        labels_ = tf.placeholder(tf.int32, [None, None], name="labels")
        # getting dynamic batch size according to the input tensor size
        # dynamic_batch_size = tf.shape(inputs_)[0]
        # output_keep_prob is the dropout added to the RNN's outputs; the dropout will have no effect on the calculation of the subsequent states.
        keep_prob = tf.placeholder(tf.float32, name="keep_prob")
        # Your basic LSTM cell
        lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
        # Add dropout to the cell
        drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
        # Stack up multiple LSTM layers, for deep learning
        cell = tf.contrib.rnn.MultiRNNCell([drop] * lstm_layers)
        # Getting an initial state of all zeros
        initial_state = cell.zero_state(batch_size, tf.float32)
        inputs_ = tf.expand_dims(inputs_, 0)
        outputs, final_state = tf.nn.dynamic_rnn(cell, inputs_, initial_state=initial_state)
        # hidden layer
        hidden = tf.layers.dense(outputs[:, -1], units=25, activation=tf.nn.relu)
        logit = tf.contrib.layers.fully_connected(hidden, 1, activation_fn=None)
        cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logit, labels=labels_))
        optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
        saver = tf.train.Saver()

    # ----------------------------online training-----------------------------------------
    with tf.Session(graph=graph) as sess:
        tf.set_random_seed(1)
        sess.run(tf.global_variables_initializer())
        iteration = 1
        state = sess.run(initial_state)
        wrongPred = 0
        errorplot, = plt.plot([], [])
        for ii, (x, y) in enumerate(get_batches(features, labels, batch_size), 1):
            feed = {inputs_: x.toarray(),
                    labels_: y,
                    keep_prob: 0.5,
                    initial_state: state}
            predictions = tf.round(tf.nn.softmax(logit)).eval(feed_dict=feed)
            print("----------------------------------------------------------")
            print("Iteration: {}".format(iteration))
            print("Prediction: ", predictions)
            print("Actual: ", y)
            pred = np.array(predictions)
            print(pred)
            print(y)
            if not ((pred == y).all()):
                wrongPred += 1
            if ii % 27 == 0:
                plot_error(errorplot, ii, wrongPred)
            loss, states, _ = sess.run([cost, final_state, optimizer], feed_dict=feed)
            print("Train loss: {:.3f}".format(loss))
            iteration += 1
        saver.save(sess, "checkpoints/sentiment.ckpt")
        errorRate = wrongPred / len(labels)
        print("ERROR RATE: ", errorRate)

if __name__ == '__main__':
    train_test()
ValueError: Input size (depth of inputs) must be accessible via shape inference, but saw value None.
This error occurs because you don't specify the size or the number of inputs: dynamic_rnn needs the input depth (the last dimension) to be statically known.
I got the script working like this:
inputs_ = tf.placeholder(tf.float32, [1, None], name="inputs")
inputs_withextradim = tf.expand_dims(inputs_, 2)
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs_withextradim, initial_state=initial_state)
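A quick shape check illustrates the difference (an added TF1-style sketch): expanding on axis 2 makes the input depth statically 1, while expanding on axis 0 leaves the depth unknown, which is what triggers the shape-inference error.
# Shape check (added): compare the two expand_dims axes
import tensorflow as tf
t = tf.placeholder(tf.float32, [1, None])
print(tf.expand_dims(t, 2).shape)  # (1, ?, 1) -- depth known, dynamic_rnn is happy
print(tf.expand_dims(t, 0).shape)  # (1, 1, ?) -- depth None, shape inference fails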

How do I obtain predictions and probabilities from new data input to a CNN in Tensorflow

I'll preface this by saying this is my first posted question on SO. I've just recently started working with Tensorflow, and have been attempting to apply a convolutional-neural network model approach for classification of .csv records in a file representing images from scans of microarray data. (FYI: Microarrays are a grid of spotted DNA on a glass slide, representing specific DNA target sequences for determining the presence of those DNA targets in a sample. The individual pixels represent fluorescence intensity values from 0-1.)
The file has ~200,000 records in total. Each record (image) has 10816 pixels that represent DNA sequences from known viruses, and one index label which identifies the virus species. The pixels create a pattern which is unique to each of the different viruses. There are 2165 different viruses in total represented within the 200,000 records.
I have trained the network on images of labeled microarray datasets, but when I try to pass a new dataset through to classify it as one of the 2165 different viruses and determine the predicted values and probabilities, I don't seem to be having much luck. This is the code that I am currently using for this:
import tensorflow as tf
import numpy as np
import csv

def extract_data(filename):
    print("extracting data...")
    NUM_LABELS = 2165
    NUM_FEATURES = 10816
    labels = []
    fvecs = []
    rowCount = 0
    # iterate over the rows, split the label from the features
    # convert the labels to integers and features to floats
    for line in open(filename):
        rowCount = rowCount + 1
        row = line.split(',')
        labels.append(row[3])  # (int(row[7])) #<<<IT ALWAYS PREDICTS THIS VALUE!
        for x in row[4:10820]:
            fvecs.append(float(x))
    # convert the array of float arrays into a numpy float matrix
    fvecs_np = np.matrix(fvecs).astype(np.float32)
    # convert the array of int labels into a numpy array
    labels_np = np.array(labels).astype(dtype=np.uint8)
    # convert the int numpy array into a one-hot matrix
    labels_onehot = (np.arange(NUM_LABELS) == labels_np[:, None]).astype(np.float32)
    print("arrays converted")
    return fvecs_np, labels_onehot
def TestModels():
    fvecs_np, labels_onehot = extract_data("MicroarrayTestData.csv")
    print('RESTORING NN MODEL')
    weights = {}
    biases = {}
    sess = tf.Session()
    init = tf.global_variables_initializer()
    # Load meta graph and restore weights
    ModelID = "MicroarrayCNN_Data-1000.meta"
    print("RESTORING:::", ModelID)
    saver = tf.train.import_meta_graph(ModelID)
    saver.restore(sess, tf.train.latest_checkpoint('./'))
    graph = tf.get_default_graph()
    x = graph.get_tensor_by_name("x:0")
    y = graph.get_tensor_by_name("y:0")
    keep_prob = tf.placeholder(tf.float32)
    y_ = tf.placeholder("float", shape=[None, 2165])
    wc1 = graph.get_tensor_by_name("wc1:0")
    wc2 = graph.get_tensor_by_name("wc2:0")
    wd1 = graph.get_tensor_by_name("wd1:0")
    Wout = graph.get_tensor_by_name("Wout:0")
    bc1 = graph.get_tensor_by_name("bc1:0")
    bc2 = graph.get_tensor_by_name("bc2:0")
    bd1 = graph.get_tensor_by_name("bd1:0")
    Bout = graph.get_tensor_by_name("Bout:0")
    weights = {wc1, wc2, wd1, Wout}
    biases = {bc1, bc2, bd1, Bout}
    print("NEXTArgmax")
    prediction = tf.argmax(y, 1)
    probabilities = y
    predY = prediction.eval(feed_dict={x: fvecs_np, y: labels_onehot}, session=sess)
    probY = probabilities.eval(feed_dict={x: fvecs_np, y: labels_onehot}, session=sess)
    accuracy = tf.reduce_mean(tf.cast(prediction, "float"))
    print(sess.run(accuracy, feed_dict={x: fvecs_np, y: labels_onehot}))
    print("%%%%%%%%%%%%%%%%%%%%%%%%%%")
    print("Predicted::: ", predY, accuracy)
    print("%%%%%%%%%%%%%%%%%%%%%%%%%%")
    feed_dictTEST = {y: labels_onehot}
    probabilities = probY
    print("probabilities", probabilities.eval(feed_dict={x: fvecs_np}, session=sess))

########## Run Analysis ###########
TestModels()
So, when I run this code I get the correct prediction for the test set, although I am not sure I believe it, because it appears that whatever value I append in line 14 (see below) is the output it predicts:
labels.append(row[3])#<<<IT ALWAYS PREDICTS THIS VALUE!
I don't understand this, and it makes me suspicious that I've set up the CNN incorrectly, as I would have expected it to ignore my input label and determine a best match from the trained network based on the trained patterns. The only thing I can figure is that when I pass the value through for the prediction, it is instead training the model on this data as well and then predicting itself. Is this a correct assumption, or am I misinterpreting how TensorFlow works?
The other issue is that when I try to use code (based on other tutorials) that is supposed to output the probabilities of all of the 2165 possible outputs, I get the error:
InvalidArgumentError (see above for traceback): Shape [-1,2165] has negative dimensions
[[Node: y = Placeholder[dtype=DT_FLOAT, shape=[?,2165], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
To me, it looks like it is the correct layer based on the 2165 value in the Tensor shape, but I don't understand the -1 value. So, to wrap up the summary, my questions are:
Based on the fact that I get the value that I have in the label of the input data, is this the correct method to make a classification using this model?
Am I missing a layer, or have I configured the model incorrectly for extracting the probabilities of all of the possible output classes, or am I using the wrong code to extract the information? I tried to print out the accuracy to see if that would work, but instead it outputs the description of a tensor, so clearly that is incorrect as well.
(ADDITIONAL INFORMATION)
As requested, I'm also including the original code that was used to train the model, which is now below. You can see I do a sort of piecemeal training of a limited number of related records at a time, grouped by their taxonomic relationships, as I iterate through the file. This is mostly because the Mac that I'm training on (Mac Pro w/ 64GB RAM) tends to give me the "Killed -9" error due to overuse of resources if I don't do it this way. There may be a better way to do it, but this seems to work.
# Original Author: Aymeric Damien
# Project: https://github.com/aymericdamien/TensorFlow-Examples/
from __future__ import print_function
import tensorflow as tf
import numpy as np
import csv
import random
# Parameters
num_epochs = 2
train_size = 1609
learning_rate = 0.001 #(larger >speed, lower >accuracy)
training_iters = 5000 # How much do you want to train (more = better trained)
batch_size = 32 #How many samples to train on, size of the training batch
display_step = 10 # How often to display what is going on during training
# Network Parameters
n_input = 10816 # MNIST data input (img shape: 28*28)...in my case 104x104 = 10816(rough array size)
n_classes = 2165 #3280 #2307 #787# Switched to 100 taxa/training set, dynamic was too wonky.
dropout = 0.75 # Dropout, probability to keep units. Geoffrey Hinton's group developed it; it prevents overfitting by finding new paths, giving a more generalized model.
# Functions
def extract_data(filename):
    print("extracting data...")
    # arrays to hold the labels and feature vectors.
    NUM_LABELS = 2165
    NUM_FEATURES = 10826
    taxCount = 0
    taxCurrent = 0
    labels = []
    fvecs = []
    rowCount = 0
    # iterate over the rows, split the label from the features
    # convert the labels to integers and features to floats
    print("entering CNN loop")
    for line in open(filename):
        rowCount = rowCount + 1
        row = line.split(',')
        taxCurrent = row[3]
        print("profile:", row[0:12])
        labels.append(int(row[3]))
        fvecs.append([float(x) for x in row[4:10820]])
    # convert the array of float arrays into a numpy float matrix
    fvecs_np = np.matrix(fvecs).astype(np.float32)
    # convert the array of int labels into a numpy array
    labels_np = np.array(labels).astype(dtype=np.uint8)
    # convert the int numpy array into a one-hot matrix
    labels_onehot = (np.arange(NUM_LABELS) == labels_np[:, None]).astype(np.float32)
    print("arrays converted")
    return fvecs_np, labels_onehot
# Create some wrappers for simplicity
def conv2d(x, W, b, strides=1):  # Layer 1: Convolutional layer
    # Conv2D wrapper, with bias and relu activation
    print("conv2d")
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')  # Strides are the tensors...list of integers. Tensors=data
    x = tf.nn.bias_add(x, b)  # bias is the tuning knob
    return tf.nn.relu(x)  # rectified linear unit (activation function)

def maxpool2d(x, k=2):  # Layer 2: Takes samples from the image. (This is a 4D tensor)
    print("maxpool2d")
    # MaxPool2D wrapper
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                          padding='SAME')

# Create model
def conv_net(x, weights, biases, dropout):
    print("conv_net setup")
    # Reshape input picture
    x = tf.reshape(x, shape=[-1, 104, 104, 1])  # -->52x52, -->26x26x64
    # Convolution Layer
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])  # defined above already
    # Max Pooling (down-sampling)
    conv1 = maxpool2d(conv1, k=2)
    print(conv1.get_shape)
    # Convolution Layer
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])  # wc2 and bc2 are just placeholders...could actually skip this layer...maybe
    # Max Pooling (down-sampling)
    conv2 = maxpool2d(conv2, k=2)
    print(conv2.get_shape)
    # Fully connected layer
    # Reshape conv2 output to fit fully connected layer input
    fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)  # activation function for the NN
    # Apply Dropout
    fc1 = tf.nn.dropout(fc1, dropout)
    # Output, class prediction
    out = tf.add(tf.matmul(fc1, weights['Wout']), biases['Bout'])
    return out
def Train_Network(Txid_IN, Sess_File_Name):
    import tensorflow as tf
    tf.reset_default_graph()
    x, y = 0, 0
    weights = {}
    biases = {}
    # tf Graph input
    print("setting placeholders")
    x = tf.placeholder(tf.float32, [None, n_input], name="x")  # Gateway for data (images)
    y = tf.placeholder(tf.float32, [None, n_classes], name="y")  # Gateway for data (labels)
    keep_prob = tf.placeholder(tf.float32)  # dropout # Gateway for dropout (keep probability)
    # Store layers weight & bias
    # CREATE weights
    weights = {
        # 5x5 conv, 1 input, 32 outputs
        'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32]), name="wc1"),
        # 5x5 conv, 32 inputs, 64 outputs
        'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64]), name="wc2"),
        # fully connected, 7*7*64 inputs, 1024 outputs
        'wd1': tf.Variable(tf.random_normal([26*26*64, 1024]), name="wd1"),
        # 1024 inputs, 10 outputs (class prediction)
        'Wout': tf.Variable(tf.random_normal([1024, n_classes]), name="Wout")
    }
    biases = {
        'bc1': tf.Variable(tf.random_normal([32]), name="bc1"),
        'bc2': tf.Variable(tf.random_normal([64]), name="bc2"),
        'bd1': tf.Variable(tf.random_normal([1024]), name="bd1"),
        'Bout': tf.Variable(tf.random_normal([n_classes]), name="Bout")
    }
    # Construct model
    print("constructing model")
    pred = conv_net(x, weights, biases, keep_prob)
    print(pred)
    # Define loss(cost) and optimizer
    # cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))  Deprecated version of the statement
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))  # added reduce_mean 6/27
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
    # Evaluate model
    correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
    print("%%%%%%%%%%%%%%%%%%%%")
    print("%% ", correct_pred)
    print("%% ", accuracy)
    print("%%%%%%%%%%%%%%%%%%%%")
    # Initializing the variables
    # init = tf.initialize_all_variables()
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()
    fvecs_np, labels_onehot = extract_data("MicroarrayDataOUT.csv")  # CHAGE TO PICORNAVIRUS!!!!!AHHHHHH!!!
    print("starting session")
    # Launch the graph
    FitStep = 0
    with tf.Session() as sess:  # graph is encapsulated by its session
        sess.run(init)
        step = 1
        # Keep training until reach max iterations (training_iters)
        while step * batch_size < training_iters:
            if FitStep >= 5:
                break
            else:
                # iterate and train
                print(step)
                print(fvecs_np, labels_onehot)
                for step in range(num_epochs * train_size // batch_size):
                    sess.run(optimizer, feed_dict={x: fvecs_np, y: labels_onehot, keep_prob: dropout})  # no dropout???...added keep_prob:dropout
                    if FitStep >= 5:
                        break
                    # else:
                    ### batch_x, batch_y = mnist.train.next_batch(batch_size)
                    # Run optimization op (backprop)
                    ### sess.run(optimizer, feed_dict={x: batch_x, y: batch_y,
                    ###                                keep_prob: dropout})  <<<<SOMETHING IS WRONG IN HERE?!!!
                    if step % display_step == 0:
                        # Calculate batch loss and accuracy
                        loss, acc = sess.run([cost, accuracy], feed_dict={x: fvecs_np,
                                                                          y: labels_onehot,
                                                                          keep_prob: 1.})
                        print("Iter " + str(step*batch_size) + ", Minibatch Loss= " +
                              "{:.6f}".format(np.mean(loss)) + ", Training Accuracy= " +
                              "{:.5f}".format(acc))
                        TrainAcc = float("{:.5f}".format(acc))
                        # print("******", TrainAcc)
                        if TrainAcc >= .99:  # Changed from .95 temporarily
                            print(FitStep)
                            FitStep = FitStep + 1
                            saver.save(sess, Sess_File_Name, global_step=1000)
                            print("Saved Session:", Sess_File_Name)
                    step += 1
        print("Optimization Finished!")
        print("Testing Accuracy:",
              sess.run(accuracy, feed_dict={x: fvecs_np[:256],
                                            y: labels_onehot[:256],
                                            keep_prob: 1.}))
        # feed_dictTEST = {x: fvecs_np[50]}
        # prediction = tf.argmax(y, 1)
        # print(prediction)
        # best = sess.run([prediction], feed_dictTEST)
        # print(best)
        print("DONE")
        sess.close()
def Tax_Iterator(CSV_inFile, CSV_outFile):  # Deprecate
    # Need to copy *.csv file to MySQL for sorting
    resultFileINIT = open(CSV_outFile, 'w')
    resultFileINIT.close()
    TaxCount = 0
    TaxThreshold = 2165
    ThresholdStep = 2165
    PrevTax = 0
    linecounter = 0
    # Open all GenBank profile list
    for line in open(CSV_inFile):
        linecounter = linecounter + 1
        print(linecounter)
        resultFile = open(CSV_outFile, 'a')
        wr = csv.writer(resultFile, dialect='excel')
        # Check for new TXID
        row = line.split(',')
        print(row[7], "===", PrevTax)
        if row[7] != PrevTax:
            print("X1")
            TaxCount = TaxCount + 1
            PrevTax = row[7]
        # Check if current Tax count is < or > threshold
        # < threshold
        print(TaxCount, "=+=", TaxThreshold)
        if TaxCount <= 3300:
            print("X2")
            CurrentTax = row[7]
            CurrTxCount = CurrentTax
            print("TaxCount=", TaxCount)
            print("Add to CSV")
            print("row:", CurrentTax, "***", row[0:15])
            wr.writerow(row[0:-1])
        # is > threshold
        else:
            print("X3")
            # but same TXID....
            print(row[7], "=-=", CurrentTax)
            if row[7] == CurrentTax:
                print("X4")
                CurrentTax = row[7]
                print("TaxCount=", TaxCount)
                print("Add to CSV")
                print("row:", CurrentTax, "***", row[0:15])
                wr.writerow(row[0:-1])
            # but different TXID...
            else:
                print(row[7], "=*=", CurrentTax)
                if row[7] > CurrentTax:
                    print("X5")
                    TaxThreshold = TaxThreshold + ThresholdStep
                    resultFile.close()
                    Sess_File_Name = "CNN_VirusIDvSPECIES_XXALL" + str(TaxThreshold - ThresholdStep)
                    print("<<<< Start Training >>>>")
                    print("Training on :: ", CurrTxCount, "Taxa", TaxCount, "data points.")
                    Train_Network(CurrTxCount, Sess_File_Name)
                    print("Training complete")
                    resultFileINIT = open(CSV_outFile, 'w')
                    resultFileINIT.close()
                    CurrentTax = row[7]
                    # reset tax count
                    CurrTxCount = 0
                    TaxCount = 0
    resultFile.close()
    Sess_File_Name = "MicroarrayCNN_Data" + str(TaxThreshold + ThresholdStep)
    print("<<<< Start Training >>>>")
    print("Training on :: ", CurrTxCount, "Taxa", TaxCount, "data points.")
    Train_Network(CurrTxCount, Sess_File_Name)
    resultFileINIT = open(CSV_outFile, 'w')
    resultFileINIT.close()
    CurrentTax = row[7]

Tax_Iterator("MicroarrayInput.csv", "MicroarrayOutput.csv")
You defined prediction as prediction=tf.argmax(y,1), and in both feed_dicts you feed labels_onehot for y. Consequently, your "prediction" is always equal to the labels.
As you didn't post the code you used to train your network, I can't tell you what exactly you need to change.
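For illustration, the direction of the fix might look like this sketch (the tensor name "output:0" is a hypothetical placeholder; use whatever name the output op was given when the graph was saved):
# Hypothetical sketch: argmax over the network's output tensor, not the label
# placeholder y. "output:0" is an assumed name -- adjust to your saved graph.
logits = graph.get_tensor_by_name("output:0")
prediction = tf.argmax(logits, 1)
predY = sess.run(prediction, feed_dict={x: fvecs_np})  # no labels fed in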
Edit: I have issues understanding the underlying problem you're trying to solve - based on your code, you're trying to train a neural network with 2165 different classes using 1609 training examples. How is this even possible? If each example had a different class, there would still be some classes without any training example. Or does one image belong to many classes? From your statement at the beginning of your question, I had assumed you're trying to output a real-valued number between 0-1.
I'm actually surprised that the code worked at all, as it looks like you're adding only a single number to your labels list, but your model expects a list of length 2165 for each training example.
