DeepFool attack on MLP provides bad results - security

I'm trying to attack an MLP with DeepFool, but when I plot the results I see some strange behavior.
First of all, the MLP structure is as follows:
Dense(16, activation='relu', input_shape=(512,))
Dense(16, activation='relu')
Dense(2, activation='softmax')
It is trained with a standard approach, passing a training set and a validation set with labels, using the following parameters:
training_params = { 'optimizer': 'adam', 'loss': 'sparse_categorical_crossentropy', 'metrics': ['accuracy'] }
I know that DeepFool has to be applied to the logits, i.e. with the classification layer removed from the network, so I deleted the last Dense layer and created a KerasClassifier:
logit_model = tf.keras.Model(MLP.input, MLP.layers[-2].output)
classifier = KerasClassifier(clip_values=(0, 8), model=logit_model)
N.B. clip_values=(0, 8) because the feature vectors take values between 0 and 8.
When I attack the MLP with DeepFool using different values of epsilon, the perturbation stays constant: even if I pass a maximum perturbation (epsilon) of 0.001 to the attack, the adversarial sample perturbation is 0.7. An example of the code and the corresponding output is shown below.
The attack
import matplotlib.pyplot as plt
import numpy as np

epsilon_list = [0, 0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100]
max_iter = 10
acc = []
pert = []

for eps in epsilon_list:
    attack = DeepFool(classifier=classifier, epsilon=eps, max_iter=max_iter, verbose=False)
    test_samples_adv = attack.generate(X_test_copy)
    loss_test, accuracy_test = MLP.evaluate(test_samples_adv, Y_test)
    perturbation = np.mean(np.abs(test_samples_adv - X_test_copy))
    print('Accuracy on adversarial test data: {:4.2f}%'.format(accuracy_test * 100))
    print('Average perturbation: {:4.2f}'.format(perturbation))
    acc.append(accuracy_test)
    pert.append(perturbation)

x = np.array(pert)
y = np.array(acc)
plotting_curves(x, y)
plotting_curves is just a function that plots a graph given x and y.
These are the results:
(Results plot: https://i.stack.imgur.com/WMGBp.png)
Can anybody explain whether this makes sense, and why?
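One thing worth checking (an assumption on my part, based on ART's documented DeepFool parameters, not something stated in this post): in ART, DeepFool's epsilon is an overshoot term added on top of the minimal perturbation the attack finds, not a maximum-perturbation budget, which would explain why the measured perturbation barely changes across epsilon_list. A minimal sketch of enforcing an explicit budget yourself, by projecting the adversarial samples onto an L-infinity ball around the originals:

import numpy as np

def project_linf(x_adv, x, eps, clip_min=0.0, clip_max=8.0):
    # keep each adversarial feature within eps of its original value,
    # then re-apply the feature range used by the classifier
    x_proj = np.clip(x_adv, x - eps, x + eps)
    return np.clip(x_proj, clip_min, clip_max)

# e.g., inside the loop above:
# test_samples_adv = project_linf(test_samples_adv, X_test_copy, eps)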

Why is eager execution required when using my custom loss function?

QUESTION: Why does my custom loss function require the run_eagerly parameter to be set to True in the compile method in TensorFlow?
BACKGROUND: I have built a CNN with TensorFlow 2.8.1 to classify CIFAR-100 images using a custom loss function. The CIFAR dataset includes 32x32-pixel RGB images of 100 fine classes (e.g., bear, car) categorized into 20 coarse classes (e.g., large omnivore, vehicle). My custom loss function is a weighted sum of two other loss functions (see code below). The first component is the crossentropy loss for the fine label. The second component is the crossentropy loss for the coarse label. My hope is that this custom loss function will enforce accurate classification of the coarse label to get a more accurate classification of the fine label (fingers crossed). The comparator will be the crossentropy loss of just the fine label (the baseline model). Note that to derive the coarse (hierarchical) loss component, I had to map y_true (true fine label, integer) and y_pred (predicted softmax probabilities for the fine labels, vector) to y_true_coarse_int (true coarse label, integer) and y_pred_coarse_hot (predicted coarse label, one-hot encoded vector), respectively. FineInts_to_CoarseInts is a Python dictionary that allows this mapping.
# THIS CODE CELL IS TO DEFINE A CUSTOM LOSS FUNCTION
def crossentropy_loss(y_true, y_pred):
    return SparseCategoricalCrossentropy()(y_true, y_pred)

def hierarchical_loss(y_true, y_pred):
    y_true = tensorflow.cast(y_true, dtype=float)
    y_true_reshaped = tensorflow.reshape(y_true, -1)
    y_true_coarse_int = [FineInts_to_CoarseInts[K.eval(y_true_reshaped[i])] for i in range(y_true_reshaped.shape[0])]
    y_true_coarse_int = tensorflow.cast(y_true_coarse_int, dtype=tensorflow.float32)
    y_pred = tensorflow.cast(y_pred, dtype=float)
    y_pred_int = tensorflow.argmax(y_pred, axis=1)
    y_pred_coarse_int = [FineInts_to_CoarseInts[K.eval(y_pred_int[i])] for i in range(y_pred_int.shape[0])]
    y_pred_coarse_int = tensorflow.cast(y_pred_coarse_int, dtype=tensorflow.float32)
    y_pred_coarse_hot = to_categorical(y_pred_coarse_int, 20)
    return SparseCategoricalCrossentropy()(y_true_coarse_int, y_pred_coarse_hot)

def custom_loss(y_true, y_pred):
    H = 0.5
    total_loss = (1 - H) * crossentropy_loss(y_true, y_pred) + H * hierarchical_loss(y_true, y_pred)
    return total_loss
In the loss function I am using TensorFlow tensor operations or Keras backend operations, so I do not understand why I have to compile the model with run_eagerly=True (see below). The one potential problem I see is the use of K.eval, but I've seen other code that used it where compiling with eager execution was not required. Therefore, I am assuming that this can't be the reason.
# THIS CODE CELL IS TO COMPILE THE MODEL
model.compile(optimizer="adam", loss=custom_loss, metrics="accuracy", run_eagerly=True)
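An aside (a sketch of my own, not the original code): the coarse mapping can be kept entirely inside the TensorFlow graph by replacing the K.eval list comprehensions with a tf.gather lookup, which should remove the need for run_eagerly=True. It assumes the FineInts_to_CoarseInts dictionary defined later in the post, and it converts fine-class probabilities to coarse-class probabilities by summation so the loss stays differentiable:

import tensorflow as tf

# (100,) tensor: entry i holds the coarse label of fine label i
fine_to_coarse = tf.constant([FineInts_to_CoarseInts[i] for i in range(100)], dtype=tf.int64)

def hierarchical_loss_graph(y_true, y_pred):
    # y_true: (batch, 1) integer fine labels; y_pred: (batch, 100) softmax probabilities
    y_true_fine = tf.reshape(tf.cast(y_true, tf.int64), [-1])
    y_true_coarse = tf.gather(fine_to_coarse, y_true_fine)
    # sum fine-class probabilities into coarse-class probabilities;
    # unlike argmax + one-hot, this keeps gradients flowing
    coarse_one_hot = tf.one_hot(fine_to_coarse, depth=20, dtype=y_pred.dtype)  # (100, 20)
    y_pred_coarse = tf.matmul(y_pred, coarse_one_hot)                          # (batch, 20)
    return tf.keras.losses.SparseCategoricalCrossentropy()(y_true_coarse, y_pred_coarse)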
The full code is below:
# THIS CODE CELL LOADS THE PACKAGES USED IN THIS NOTEBOOK
# Load core packages for data analysis and visualization
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sn
import sys
!{sys.executable} -m pip install pydot
!{sys.executable} -m pip install graphviz
# Load deep learning packages
import tensorflow
from tensorflow.keras.datasets.cifar100 import load_data
from tensorflow.keras import (Model, layers)
from tensorflow.keras.losses import SparseCategoricalCrossentropy
import tensorflow.keras.backend as K
from tensorflow.keras.utils import (to_categorical, plot_model)
from tensorflow.lookup import (StaticHashTable, KeyValueTensorInitializer)
# Load model evaluation packages
import sklearn
from sklearn.metrics import (confusion_matrix, classification_report)
# Print versions of main ML packages
print("Tensorflow version " + tensorflow.__version__)
print("Scikit learn version " + sklearn.__version__)
# THIS CODE CELL LOADS DATASETS AND CHECKS DATA DIMENSIONS
# There is an option to load the "fine" (100 fine classes) or "coarse" (20 super classes) labels with integer (int) encodings
# We will load both labels for hierarchical classification tasks
(x_train, y_train_fine_int), (x_test, y_test_fine_int) = load_data(label_mode="fine")
(_, y_train_coarse_int), (_, y_test_coarse_int) = load_data(label_mode="coarse")
# EXTRACT DATASET PARAMETERS FOR USE LATER ON
num_fine_classes = 100
num_coarse_classes = 20
input_shape = x_train.shape[1:]
# THIS CODE CELL PROVIDES THE CODE TO LINK INTEGER LABELS TO MEANINGFUL WORD LABELS
# Fine and coarse labels are provided as integers. We will want to link them both to meaningful world labels.
# CREATE A DICTIONARY TO MAP THE 20 COARSE LABELS TO THE 100 FINE LABELS
# This mapping comes from https://keras.io/api/datasets/cifar100/
# Except "computer keyboard" should just be "keyboard" for the encoding to work
CoarseLabels_to_FineLabels = {
"aquatic mammals": ["beaver", "dolphin", "otter", "seal", "whale"],
"fish": ["aquarium fish", "flatfish", "ray", "shark", "trout"],
"flowers": ["orchids", "poppies", "roses", "sunflowers", "tulips"],
"food containers": ["bottles", "bowls", "cans", "cups", "plates"],
"fruit and vegetables": ["apples", "mushrooms", "oranges", "pears", "sweet peppers"],
"household electrical devices": ["clock", "keyboard", "lamp", "telephone", "television"],
"household furniture": ["bed", "chair", "couch", "table", "wardrobe"],
"insects": ["bee", "beetle", "butterfly", "caterpillar", "cockroach"],
"large carnivores": ["bear", "leopard", "lion", "tiger", "wolf"],
"large man-made outdoor things": ["bridge", "castle", "house", "road", "skyscraper"],
"large natural outdoor scenes": ["cloud", "forest", "mountain", "plain", "sea"],
"large omnivores and herbivores": ["camel", "cattle", "chimpanzee", "elephant", "kangaroo"],
"medium-sized mammals": ["fox", "porcupine", "possum", "raccoon", "skunk"],
"non-insect invertebrates": ["crab", "lobster", "snail", "spider", "worm"],
"people": ["baby", "boy", "girl", "man", "woman"],
"reptiles": ["crocodile", "dinosaur", "lizard", "snake", "turtle"],
"small mammals": ["hamster", "mouse", "rabbit", "shrew", "squirrel"],
"trees": ["maple", "oak", "palm", "pine", "willow"],
"vehicles 1": ["bicycle", "bus", "motorcycle", "pickup" "truck", "train"],
"vehicles 2": ["lawn-mower", "rocket", "streetcar", "tank", "tractor"]
}
# CREATE A DICTIONARY TO MAP THE INTEGER-ENCODED COARSE LABEL TO THE WORD LABEL
# Create list of Course Labels
CoarseLabels = list(CoarseLabels_to_FineLabels.keys())
# The target variable in CIFAR-100 is encoded such that the coarse class is assigned an integer based on its alphabetical order
# The CoarseLabels list is already alphabetized, so no need to sort
CoarseInts_to_CoarseLabels = dict(enumerate(CoarseLabels))
# CREATE A DICTIONARY TO MAP THE WORD LABEL TO THE INTEGER-ENCODED COARSE LABEL
CoarseLabels_to_CoarseInts = dict(zip(CoarseLabels, range(20)))
# CREATE A DICTIONARY TO MAP THE 100 FINE LABELS TO THE 20 COARSE LABELS
FineLabels_to_CoarseLabels = {}
for CoarseLabel in CoarseLabels:
    for FineLabel in CoarseLabels_to_FineLabels[CoarseLabel]:
        FineLabels_to_CoarseLabels[FineLabel] = CoarseLabel
# CREATE A DICTIONARY TO MAP THE INTEGER-ENCODED FINE LABEL TO THE WORD LABEL
# Create a list of the Fine Labels
FineLabels = list(FineLabels_to_CoarseLabels.keys())
# The target variable in CIFAR-100 is encoded such that the fine class is assigned an integer based on its alphabetical order
# Sort the fine class list.
FineLabels.sort()
FineInts_to_FineLabels = dict(enumerate(FineLabels))
# CREATE A DICTIONARY TO MAP THE INTEGER-ENCODED FINE LABELS TO THE INTEGER-ENCODED COARSE LABELS
b = list(dict(sorted(FineLabels_to_CoarseLabels.items())).values())
FineInts_to_CoarseInts = dict(zip(range(100), [CoarseLabels_to_CoarseInts[i] for i in b]))
#Tensor version of dictionary
#fine_to_coarse = tensorflow.constant(list((FineInts_to_CoarseInts).items()), dtype=tensorflow.int8)
# THIS CODE CELL IS TO BUILD A FUNCTIONAL MODEL
inputs = layers.Input(shape=input_shape)
x = layers.BatchNormalization()(inputs)
x = layers.Conv2D(64, (3, 3), padding='same', activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Dropout(0.30)(x)
x = layers.Conv2D(256, (3, 3), padding='same', activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Dropout(0.30)(x)
x = layers.Conv2D(256, (3, 3), padding='same', activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Dropout(0.30)(x)
x = layers.Conv2D(1024, (3, 3), padding='same', activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Dropout(0.30)(x)
x = layers.GlobalAveragePooling2D()(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.30)(x)
x = layers.Dense(512, activation = "relu")(x)
x = layers.BatchNormalization()(x)
x = layers.Dropout(0.30)(x)
output_fine = layers.Dense(num_fine_classes, activation="softmax", name="output_fine")(x)
model = Model(inputs=inputs, outputs=output_fine)
# THIS CODE CELL IS TO DEFINE A CUSTOM LOSS FUNCTION
def crossentropy_loss(y_true, y_pred):
    return SparseCategoricalCrossentropy()(y_true, y_pred)

def hierarchical_loss(y_true, y_pred):
    y_true = tensorflow.cast(y_true, dtype=float)
    y_true_reshaped = tensorflow.reshape(y_true, -1)
    y_true_coarse_int = [FineInts_to_CoarseInts[K.eval(y_true_reshaped[i])] for i in range(y_true_reshaped.shape[0])]
    y_true_coarse_int = tensorflow.cast(y_true_coarse_int, dtype=tensorflow.float32)
    y_pred = tensorflow.cast(y_pred, dtype=float)
    y_pred_int = tensorflow.argmax(y_pred, axis=1)
    y_pred_coarse_int = [FineInts_to_CoarseInts[K.eval(y_pred_int[i])] for i in range(y_pred_int.shape[0])]
    y_pred_coarse_int = tensorflow.cast(y_pred_coarse_int, dtype=tensorflow.float32)
    y_pred_coarse_hot = to_categorical(y_pred_coarse_int, 20)
    return SparseCategoricalCrossentropy()(y_true_coarse_int, y_pred_coarse_hot)

def custom_loss(y_true, y_pred):
    H = 0.5
    total_loss = (1 - H) * crossentropy_loss(y_true, y_pred) + H * hierarchical_loss(y_true, y_pred)
    return total_loss
# THIS CODE CELL IS TO COMPILE THE MODEL
model.compile(optimizer="adam", loss=crossentropy_loss, metrics="accuracy", run_eagerly=False)
# THIS CODE CELL IS TO TRAIN THE MODEL
history = model.fit(x_train, y_train_fine_int, epochs=200, validation_split=0.25, batch_size=100)
# THIS CODE CELL IS TO VISUALIZE THE TRAINING
history_frame = pd.DataFrame(history.history)
history_frame.to_csv("history.csv")
history_frame.loc[:, ["accuracy", "val_accuracy"]].plot()
history_frame.loc[:, ["loss", "val_loss"]].plot()
plt.show()
# THIS CODE CELL IS TO EVALUATE THE MODEL ON AN INDEPENDENT DATASET
score = model.evaluate(x_test, y_test_fine_int, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

Calculate gradient of validation error w.r.t inputs using Keras/Tensorflow or autograd

I need to calculate the gradient of the validation error w.r.t inputs x. I'm trying to see how much the validation error changes when I perturb one of the training samples.
The validation error (E) explicitly depends on the model weights (W).
The model weights explicitly depend on the inputs (x and y).
Therefore, the validation error implicitly depends on the inputs.
I'm trying to calculate the gradient of E w.r.t x directly.
An alternative approach would be to calculate the gradient of E w.r.t W (can easily be calculated) and the gradient of W w.r.t x (cannot do at the moment), which would allow the gradient of E w.r.t x to be calculated.
I have attached a toy example. Thanks in advance!
import numpy as np
import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
import tensorflow as tf
from autograd import grad
train_images = mnist.train_images()
train_labels = mnist.train_labels()
test_images = mnist.test_images()
test_labels = mnist.test_labels()
# Normalize the images.
train_images = (train_images / 255) - 0.5
test_images = (test_images / 255) - 0.5
# Flatten the images.
train_images = train_images.reshape((-1, 784))
test_images = test_images.reshape((-1, 784))
# Build the model.
model = Sequential([
    Dense(64, activation='relu', input_shape=(784,)),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax'),
])
# Compile the model.
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)
# Train the model.
model.fit(
    train_images,
    to_categorical(train_labels),
    epochs=5,
    batch_size=32,
)
model.save_weights('model.h5')
# Load the model's saved weights.
# model.load_weights('model.h5')
calculate_mse = tf.keras.losses.MeanSquaredError()
test_x = test_images[:5]
test_y = to_categorical(test_labels)[:5]
train_x = train_images[:1]
train_y = to_categorical(train_labels)[:1]
train_y = tf.convert_to_tensor(train_y, np.float32)
train_x = tf.convert_to_tensor(train_x, np.float64)
with tf.GradientTape() as tape:
    tape.watch(train_x)
    model.fit(train_x, train_y, epochs=1, verbose=0)
    valid_y_hat = model(test_x, training=False)
    mse = calculate_mse(test_y, valid_y_hat)
de_dx = tape.gradient(mse, train_x)
print(de_dx)
# approach 2 - does not run
def calculate_validation_mse(x):
    model.fit(x, train_y, epochs=1, verbose=0)
    valid_y_hat = model(test_x, training=False)
    mse = calculate_mse(test_y, valid_y_hat)
    return mse
train_x = train_images[:1]
train_y = to_categorical(train_labels)[:1]
validation_gradient = grad(calculate_validation_mse)
de_dx = validation_gradient(train_x)
print(de_dx)
Here's how you can do this. The derivation is below.
A few things to note:
- I have reduced the feature size from 784 to 256, as I was running out of memory in Colab (the relevant line is marked in the code). Some memory profiling might be needed to find out why.
- Gradients are only computed for the first layer; this is easily extendable to the other layers.
Disclaimer: this derivation is correct to the best of my knowledge. Please do some research and verify that it is the case. You will run into memory issues for larger inputs and layer sizes.
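The derivation image did not survive in this copy; the following is my reconstruction, assuming a single SGD step with learning rate $\eta$ (the lr in the code). One training step updates the weights as

$$\theta' = \theta - \eta\,\nabla_{\theta} E_{tr}(x, \theta)$$

and by the chain rule the validation error after that step depends on the training input $x$ through $\theta'$:

$$\frac{dE_{val}}{dx} = \left(\frac{\partial \theta'}{\partial x}\right)^{\top} \nabla_{\theta'} E_{val} = -\eta \left(\frac{\partial\,\nabla_{\theta} E_{tr}}{\partial x}\right)^{\top} \nabla_{\theta'} E_{val}$$

In the code below, grads_1 is $-\eta\,\partial(\nabla_{\theta} E_{tr})/\partial x$ (the Jacobian of the weight gradient with respect to the input), grads_2 is $\nabla_{\theta} E_{val}$, and the final matmul contracts over the weight dimensions.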
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
import tensorflow as tf
f = 256
model = Sequential([
    Dense(64, activation='relu', input_shape=(f,)),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax'),
])
# Compile the model.
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)
w = model.weights[0]
# Inputs and labels
x_tr = tf.Variable(np.random.normal(size=(1,f)), shape=(1, f), dtype='float32')
y_tr = np.random.choice([0,1,2,3,4,5,6,7,8,9], size=(1,1))
y_tr_onehot = tf.keras.utils.to_categorical(y_tr, num_classes=10).astype('float32')
x_v = tf.Variable(np.random.normal(size=(1,f)), shape=(1, f), dtype='float32')
y_v = np.random.choice([0,1,2,3,4,5,6,7,8,9], size=(1,1))
y_v_onehot = tf.keras.utils.to_categorical(y_v, num_classes=10).astype('float32')
# In the context of GradientTape
with tf.GradientTape() as tape1:
    with tf.GradientTape() as tape2:
        y_tr_pred = model(x_tr)
        tr_loss = tf.keras.losses.MeanSquaredError()(y_tr_onehot, y_tr_pred)
    tmp_g = tape2.gradient(tr_loss, w)
    print(tmp_g.shape)
# d(dE_tr/d(theta))/dx
# Warning this step consumes lot of memory for large layers
lr = 0.001
grads_1 = -lr * tape1.jacobian(tmp_g, x_tr)
with tf.GradientTape() as tape3:
    y_v_pred = model(x_v)
    v_loss = tf.keras.losses.MeanSquaredError()(y_v_onehot, y_v_pred)
# dE_val/d(theta)
grads_2 = tape3.gradient(v_loss, w)[tf.newaxis, :]
# Just crunching the dimension to get the final desired shape of (1,256)
grad = tf.matmul(tf.reshape(grads_2,[1, -1]), tf.reshape(tf.transpose(grads_1,[2,1,0,3]),[1, -1, 256]))

How to create an autoencoder where each encoder layer represents the same as the corresponding decoder layer

I want to build an autoencoder where each layer in the encoder has the same meaning as a corresponding layer in the decoder. So if the autoencoder is perfectly trained, the values of those layers should be roughly the same.
So let's say the autoencoder consists of e1 -> e2 -> e3 -> d2 -> d1, where e1 is the input and d1 the output. A normal autoencoder trains to have the same result in d1 as e1, but I want the additional constraint that e2 and d2 are the same. Therefore I want an additional backpropagation path which leads from d2 to e2 and trains at the same time as the normal path from d1 to e1 (d stands for decoder, e for encoder).
I tried to use the error between e2 and d2 as a regularization term, with the CustomRegularization layer from the first answer of this link: https://github.com/keras-team/keras/issues/5563. I also use this for the error between e1 and d1 instead of the normal path.
The following code is written such that more than one intermediate layer can be handled, and it uses 4 layers.
The commented-out code is a normal autoencoder which only propagates from start to end.
from keras.layers import Dense
import numpy as np
from keras.datasets import mnist
from keras.models import Model
from keras.engine.topology import Layer
from keras import objectives
from keras.layers import Input
import keras
import matplotlib.pyplot as plt
# A layer which can be given as an output to force a regularization term between two layers
class CustomRegularization(Layer):
    def __init__(self, **kwargs):
        super(CustomRegularization, self).__init__(**kwargs)

    def call(self, x, mask=None):
        ld = x[0]
        rd = x[1]
        bce = objectives.binary_crossentropy(ld, rd)
        loss2 = keras.backend.sum(bce)
        self.add_loss(loss2, x)
        return bce

    def get_output_shape_for(self, input_shape):
        return (input_shape[0][0], 1)

def zero_loss(y_true, y_pred):
    return keras.backend.zeros_like(y_pred)

# Create regularization layer between two corresponding layers of encoder and decoder
def buildUpDownRegularization(layerNo, input, up_layers, down_layers):
    for i in range(0, layerNo):
        input = up_layers[i](input)
    start = input
    for i in range(layerNo, len(up_layers)):
        input = up_layers[i](input)
    for j in range(0, len(down_layers) - layerNo):
        input = down_layers[j](input)
    end = input
    cr = CustomRegularization()([start, end])
    return cr
# Define shape of the network, layers, some hyperparameters and training data
sizes = [784, 400, 200, 100, 50]
up_layers = []
down_layers = []
for i in range(1, len(sizes)):
    layer = Dense(units=sizes[i], activation='sigmoid', input_dim=sizes[i-1])
    up_layers.append(layer)
for i in range(len(sizes)-2, -1, -1):
    layer = Dense(units=sizes[i], activation='sigmoid', input_dim=sizes[i+1])
    down_layers.append(layer)
batch_size = 128
num_classes = 10
epochs = 100
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
x_train = x_train.reshape([x_train.shape[0], 28*28])
x_test = x_test.reshape([x_test.shape[0], 28*28])
y_train = x_train
y_test = x_test
optimizer = keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
"""
### Normal autoencoder like in base mnist example
model = keras.models.Sequential()
for layer in up_layers:
model.add(layer)
for layer in down_layers:
model.add(layer)
model.compile(optimizer=optimizer, loss=keras.backend.binary_crossentropy)
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs)
score = model.evaluate(x_test, y_test, verbose=0)
#print('Test loss:', score[0])
#print('Test accuracy:', score[1])
decoded_imgs = model.predict(x_test)
n = 10 # how many digits we will display
plt.figure(figsize=(20, 4))
for i in range(n):
# display original
ax = plt.subplot(2, n, i + 1)
plt.imshow(x_test[i].reshape(28, 28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
# display reconstruction
ax = plt.subplot(2, n, i + 1 + n)
plt.imshow(decoded_imgs[i].reshape(28, 28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
"""
### My autoencoder where each subpart is also an autoencoder
#This part is only because the model needs a path from start to end, contentwise this should do nothing
output = input = Input(shape=(sizes[0],))
for i in range(0, len(up_layers)):
    output = up_layers[i](output)
for i in range(0, len(down_layers)):
    output = down_layers[i](output)
crs = [output]
losses = [zero_loss]
#Build the regularization layer
for i in range(len(up_layers)):
    crs.append(buildUpDownRegularization(i, input, up_layers, down_layers))
    losses.append(zero_loss)
#Create and train model with adapted training data
network = Model([input], crs)
optimizer = keras.optimizers.Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
network.compile(loss=losses, optimizer=optimizer)
dummy_train = np.zeros([y_train.shape[0], 1])
dummy_test = np.zeros([y_test.shape[0], 1])
training_data = [y_train]
test_data = [y_test]
for i in range(len(network.outputs)-1):
    training_data.append(dummy_train)
    test_data.append(dummy_test)
network.fit(x_train, training_data, batch_size=batch_size, epochs=epochs,verbose=1, validation_data=(x_test, test_data))
score = network.evaluate(x_test, test_data, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
decoded_imgs = network.predict(x_test)
n = 10  # how many digits we will display
plt.figure(figsize=(20, 4))
for i in range(n):
    # display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    # display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[0][i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
If you run the code as is, it shows that my code no longer has the reconstruction ability.
I expect behavior similar to the commented-out code, which shows a normal autoencoder.
Edit: As mentioned in the answers, this works well with MSE instead of crossentropy and a learning rate of 0.01. 100 epochs with that setting produce really good results.
Edit 2: I would like the backpropagation to work as in this image: https://imgur.com/OOo757x. So the backpropagation of the loss of a certain layer stops at the corresponding layer. I think I didn't make this clear before, and I don't know if the code currently does that.
Edit 3: Although this code runs and returns a good-looking solution, the CustomRegularization layer is not doing what I thought it would do, and therefore it does not do the same things as in the description.
It seems like the main issue is the use of binary cross-entropy to minimize the difference between encoder and decoder. The internal representation in the network is not going to be a single class probability like the output might be if you were classifying MNIST digits. I was able to get your network to output some reasonable-looking reconstructions with these simple changes:
1. Using objectives.mean_squared_error instead of objectives.binary_crossentropy in the CustomRegularization class
2. Changing the number of epochs to 5
3. Changing the learning rate to 0.01
Changes 2 and 3 were simply made to speed up the testing. Change 1 is the key here. Cross-entropy is designed for problems where there is a binary "ground truth" variable and an estimate of that variable. However, you do not have a binary truth value in the middle of your network, only at the output layer. Thus a cross-entropy loss function in the middle of the network doesn't make much sense (at least to me), since it would be trying to measure entropy for a variable that isn't binary. Mean squared error, on the other hand, is a bit more generic and should work for this case, since you are simply minimizing the difference between two real values. In essence, the middle of the network is performing regression (the difference between activations in two continuous values, i.e. layers), not classification, so it needs a loss function appropriate for regression.
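Concretely, change 1 amounts to swapping the loss inside CustomRegularization.call (my rendering of the change, not the answerer's verbatim code):

def call(self, x, mask=None):
    ld = x[0]
    rd = x[1]
    # mean squared error between corresponding encoder and decoder activations
    mse = objectives.mean_squared_error(ld, rd)
    loss2 = keras.backend.sum(mse)
    self.add_loss(loss2, x)
    return mse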
I also want to suggest that there may be a better approach to accomplish what you want. If you really want the encoder and decoder to be exactly the same, you can share weights between them. Then they will be identical, not just highly similar, and your model will have fewer parameters to train. There is a decent explanation of shared (tied) weights autoencoders with Keras here if you're curious.
Reading your code, it does seem to be doing what you want in your illustration, but I am not really sure how to verify that.
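To illustrate the weight-sharing suggestion above, here is a minimal sketch (my assumption of one possible implementation, written with tf.keras rather than the older standalone keras used in the question): a Dense-like layer that reuses the transposed kernel of a given encoder layer, so encoder and decoder stay identical by construction.

import tensorflow as tf
from tensorflow.keras import layers

class TiedDense(layers.Layer):
    """Decoder layer whose kernel is the transpose of a tied encoder layer's kernel."""
    def __init__(self, tied_to, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.tied_to = tied_to
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        # only the bias is a new trainable weight; the kernel is shared
        self.bias = self.add_weight(
            name='bias', shape=(self.tied_to.kernel.shape[0],), initializer='zeros')

    def call(self, inputs):
        return self.activation(
            tf.matmul(inputs, self.tied_to.kernel, transpose_b=True) + self.bias)

# Usage: the encoder layer must be called first so its kernel exists.
inp = layers.Input(shape=(784,))
enc = layers.Dense(400, activation='sigmoid')
h = enc(inp)                                   # kernel shape (784, 400)
out = TiedDense(enc, activation='sigmoid')(h)  # maps (batch, 400) -> (batch, 784)
model = tf.keras.Model(inp, out)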

Deep learning Keras model CTC_Loss gives loss = infinity

I have a CRNN model for text recognition; it was published on GitHub and trained on English.
Now I'm doing the same thing using this algorithm, but for Arabic.
My ctc function is:
def ctc_lambda_func(args):
    y_pred, labels, input_length, label_length = args
    # the 2 is critical here since the first couple outputs of the RNN
    # tend to be garbage:
    y_pred = y_pred[:, 2:, :]
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)
My Model is:
def get_Model(training):
    img_w = 128
    img_h = 64

    # Network parameters
    conv_filters = 16
    kernel_size = (3, 3)
    pool_size = 2
    time_dense_size = 32
    rnn_size = 128

    if K.image_data_format() == 'channels_first':
        input_shape = (1, img_w, img_h)
    else:
        input_shape = (img_w, img_h, 1)

    # Initialising the CNN
    act = 'relu'
    input_data = Input(name='the_input', shape=input_shape, dtype='float32')
    inner = Conv2D(conv_filters, kernel_size, padding='same',
                   activation=act, kernel_initializer='he_normal',
                   name='conv1')(input_data)
    inner = MaxPooling2D(pool_size=(pool_size, pool_size), name='max1')(inner)
    inner = Conv2D(conv_filters, kernel_size, padding='same',
                   activation=act, kernel_initializer='he_normal',
                   name='conv2')(inner)
    inner = MaxPooling2D(pool_size=(pool_size, pool_size), name='max2')(inner)

    conv_to_rnn_dims = (img_w // (pool_size ** 2), (img_h // (pool_size ** 2)) * conv_filters)
    inner = Reshape(target_shape=conv_to_rnn_dims, name='reshape')(inner)

    # cuts down input size going into RNN:
    inner = Dense(time_dense_size, activation=act, name='dense1')(inner)

    # Two layers of bidirectional GRUs
    # GRU seems to work as well, if not better than LSTM:
    gru_1 = GRU(rnn_size, return_sequences=True, kernel_initializer='he_normal', name='gru1')(inner)
    gru_1b = GRU(rnn_size, return_sequences=True, go_backwards=True, kernel_initializer='he_normal', name='gru1_b')(inner)
    gru1_merged = add([gru_1, gru_1b])
    gru_2 = GRU(rnn_size, return_sequences=True, kernel_initializer='he_normal', name='gru2')(gru1_merged)
    gru_2b = GRU(rnn_size, return_sequences=True, go_backwards=True, kernel_initializer='he_normal', name='gru2_b')(gru1_merged)

    # transforms RNN output to character activations:
    inner = Dense(num_classes+1, kernel_initializer='he_normal',
                  name='dense2')(concatenate([gru_2, gru_2b]))
    y_pred = Activation('softmax', name='softmax')(inner)
    Model(inputs=input_data, outputs=y_pred).summary()

    labels = Input(name='the_labels', shape=[30], dtype='float32')
    input_length = Input(name='input_length', shape=[1], dtype='int64')
    label_length = Input(name='label_length', shape=[1], dtype='int64')

    # Keras doesn't currently support loss funcs with extra parameters,
    # so CTC loss is implemented in a lambda layer
    loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, labels, input_length, label_length])

    # clipnorm seems to speed up convergence
    # the loss calc occurs elsewhere, so use a dummy lambda func for the loss
    if training:
        return Model(inputs=[input_data, labels, input_length, label_length], outputs=loss_out)
    return Model(inputs=[input_data], outputs=y_pred)
Then I compile it with the SGD optimizer (tried SGD and Adam):
sgd = SGD(lr=0.0000002, decay=1e-6, momentum=0.9, nesterov=True, clipnorm=5)
model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer=sgd)
Then I fit the model with my training set (images of words up to 30 characters mapped to sequences of 30 labels):
model.fit_generator(generator=tiger_train.next_batch(),
                    steps_per_epoch=int(tiger_train.n / batch_size),
                    epochs=30,
                    callbacks=[checkpoint],
                    validation_data=tiger_val.next_batch(),
                    validation_steps=int(tiger_val.n / val_batch_size))
Once training starts, it gives me loss = inf. After a lot of searching, I didn't find any similar problem.
So my question is: how can I solve this? What can make ctc_loss compute an infinite cost?
Thanks in advance
I found the problem: it was a dimensions problem.
For CRNN OCR using a CTC layer, if you are detecting a sequence of length n, you should have an image with a width of at least (2*n - 1). The more the better, until you reach the best image/timestep ratio that lets the CTC layer recognize the letters correctly. If the image width is less than (2*n - 1), it will give a nan loss.
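A quick way to check this against the model in the question (my own sketch; the numbers come from get_Model, which downsamples the 128-pixel width with two 2x2 poolings, after which ctc_lambda_func crops the first two timesteps):

# CTC needs at least label_length timesteps, and 2*label_length - 1 in the
# worst case where every adjacent pair of characters is identical, because
# a blank must separate repeated characters
img_w, pool_size, cropped = 128, 2, 2
timesteps = img_w // (pool_size ** 2) - cropped   # 128 // 4 - 2 = 30
max_label_len = 30
print(timesteps, 2 * max_label_len - 1)           # 30 < 59, so inf/nan CTC loss is possible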
This error also happens when the image text has two identical consecutive characters in the same sequence, e.g. 'happen' --> 'pp'. You can remove data that has this characteristic.

About 'Building Autoencoders in Keras'?

I read Building Autoencoders in Keras: https://blog.keras.io/building-autoencoders-in-keras.html
In the section "Adding a sparsity constraint on the encoded representations", I tried to follow the description, but the loss won't go down to 0.11; instead it stays around 0.26.
So the result is fuzzy.
Can anyone who has done this experiment tell me what's wrong?
Here is my code:
from keras.layers import Input, Dense
from keras.models import Model
from keras import regularizers
encoding_dim = 32  # dimensionality of the compressed representation
input_img = Input(shape = (784,))
# encoder
encoded = Dense(encoding_dim, activation = 'relu',
                activity_regularizer = regularizers.l1(1e-4)
                )(input_img)
# decoder
decoded = Dense(784, activation = 'sigmoid')(encoded)
# build the autoencoder
autoencoder = Model(input_img, decoded)
# encoder model
encoder = Model(input_img, encoded)
encoded_input = Input(shape = (encoding_dim,))
# the last fully connected layer acts as the decoder
decoder_layer = autoencoder.layers[-1]
# decoder model
decoder = Model(encoded_input, decoder_layer(encoded_input))
# compile the model
autoencoder.compile(optimizer = 'adadelta', loss = 'binary_crossentropy')
from keras.datasets import mnist
import numpy as np
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape(x_train.shape[0], np.prod(x_train.shape[1:]))
x_test = x_test.reshape(x_test.shape[0], np.prod(x_test.shape[1:]))
autoencoder.fit(x_train, x_train,
                epochs = 100,
                batch_size = 256,
                shuffle = True,
                validation_data = (x_test, x_test))
encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)
import matplotlib.pyplot as plt
n = 10
plt.figure(figsize = (20, 4))
for i in range(10):
    # original image
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.set_axis_off()
    # decoded image
    ax = plt.subplot(2, n, n + i + 1)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.set_axis_off()
plt.savefig('simpleSparse.png')
from keras import backend as K
K.clear_session()
I copied your code verbatim and reproduced the error you got.
Solution: reduce the batch size from 256 to 16. You'll notice a huge difference in output even after just 10 epochs of training.
Explanation: what is probably going on is that, even though your training loss is decreasing, you are averaging the gradient over too many examples, to the point that the step you take in the direction of the gradient cancels itself out in some higher-dimensional space, and your learning algorithm is tricked into thinking it has converged to a local minimum when in reality it can't decide where to go. This last part explains why all the outputs you get look blurry and exactly the same.
Update: reduce the batch size to 4, and you'll get near-perfect reconstruction even after 10 epochs.
You need to change the batch size in your code.
autoencoder.fit(x_train, x_train,epochs = 10,batch_size = 16,shuffle = True,validation_data = (x_test, x_test))
Known bug: you need to set the regularizer to 10e-7.
activity_regularizer=regularizers.activity_l1(10e-7))(input_img)
After 50 epochs, val_loss: 0.1424
https://github.com/keras-team/keras/issues/5414
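For what it's worth (my note, not part of the linked issue): regularizers.activity_l1 is the old Keras 1 spelling; in current Keras the same setting is written with regularizers.l1, which the question's code already uses:

encoded = Dense(encoding_dim, activation='relu',
                activity_regularizer=regularizers.l1(10e-7))(input_img)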
If you drop your batch size, it takes much longer.
