New to Theano: trying to add a term to a loss function to penalize negative weights

To be clear, by weights I mean the entries in the matrices (Ws) of the affine transformation in a node of a neural net.
I start with categorical_crossentropy as my loss function, and I want to add a term to penalize negative weights.
To this end I want to introduce a term of the form
theano.tensor.sum(theano.tensor.exp(-10 * ws))
Where "ws" are the weights.
If I follow the source code of categorical_crossentropy:
if true_dist.ndim == coding_dist.ndim:
    return -tensor.sum(true_dist * tensor.log(coding_dist),
                       axis=coding_dist.ndim - 1)
elif true_dist.ndim == coding_dist.ndim - 1:
    return crossentropy_categorical_1hot(coding_dist, true_dist)
else:
    raise TypeError('rank mismatch between coding and true distributions')
It seems like I should update the crossentropy_categorical_1hot branch to read
crossentropy_categorical_1hot(coding_dist, true_dist) + theano.tensor.sum(theano.tensor.exp(-10 * ws))
and change the function's signature to
my_categorical_crossentropy(coding_dist, true_dist, ws)
When calling my_categorical_crossentropy I write
loss = my_categorical_crossentropy(net_output, true_output, l_layers[1].W)
with, for a start, l_layers[1].W being the weights of the first layer of my neural net.
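Put together, the modified function I have in mind would look like this (a sketch; only the one-hot branch gets the extra term, since that is the code path I hit):
import theano.tensor as tensor
from theano.tensor.nnet import crossentropy_categorical_1hot

def my_categorical_crossentropy(coding_dist, true_dist, ws):
    if true_dist.ndim == coding_dist.ndim:
        return -tensor.sum(true_dist * tensor.log(coding_dist),
                           axis=coding_dist.ndim - 1)
    elif true_dist.ndim == coding_dist.ndim - 1:
        # original loss plus the penalty on negative weights
        return (crossentropy_categorical_1hot(coding_dist, true_dist)
                + tensor.sum(tensor.exp(-10 * ws)))
    else:
        raise TypeError('rank mismatch between coding and true distributions')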
With those updates, I go on writing:
loss = aggregate(loss, mode='mean')
updates = sgd(loss, all_params, learning_rate=0.005)
train = theano.function([l_input.input_var, true_output], loss, updates=updates)
[...]
This compiles and everything runs smoothly; the training of the network completes. However, for some reason the additional term theano.tensor.sum(theano.tensor.exp(-10 * ws)) is ignored: it seems not to affect the loss value.
I tried looking into the Theano documentation, but so far I could not figure out what might be wrong. The weights l_layers[1].W are shared variables, so I could not pass them as
train = theano.function([l_input.input_var, true_output, l_layers[1].W], loss, updates=updates)
Any comments are welcome. Thanks!
Solution
Though I never found out why my original approach didn't work, adding the penalty term outside categorical_crossentropy, as suggested in the comments, did solve the problem:
loss = aggregate(categorical_crossentropy(net_output, true_output)
                 + theano.tensor.sum(theano.tensor.exp(-10 * l_layers[1].W)))
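For completeness, the whole working setup in one piece (a sketch reusing the names from the question, assuming the Lasagne-style helpers the question appears to use; adding the scalar penalty after aggregation is equivalent to adding it inside, because the mean of a broadcast scalar is the scalar itself):
import theano
import theano.tensor as T
from lasagne.objectives import categorical_crossentropy, aggregate
from lasagne.updates import sgd

# scalar penalty on the first layer's weights, built outside the loss function
penalty = T.sum(T.exp(-10 * l_layers[1].W))
loss = aggregate(categorical_crossentropy(net_output, true_output), mode='mean') + penalty

updates = sgd(loss, all_params, learning_rate=0.005)
train = theano.function([l_input.input_var, true_output], loss, updates=updates)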

Related

Why do genetic algorithms converge to a population that is identical?

I was implementing a genetic algorithm with tf.keras, where I manually modify the weights, do the gene crossover, all that. I've found that after a few dozen generations the predictions of all the networks are essentially identical, and after a few more generations the predictions are exactly the same. Trying to google the problem I found this page,
which mentions the problem on a conceptual level, but I can't understand how this could happen if I'm manually creating genetic diversity every generation.
def model_mutate(weights, var):
    for i in range(len(weights)):
        for j in range(len(weights[i])):
            if random.uniform(0, 1) < 0.2:  # mutation probability of 20%
                change = np.random.uniform(-var, var, weights[i][j].shape)
                weights[i][j] += change
    return weights
def crossover_brains(parent1, parent2):
    global brains
    weight1 = parent1.get_weights()
    weight2 = parent2.get_weights()
    new_weight1 = weight1  # note: this aliases weight1, it is not a copy
    new_weight2 = weight2
    # we change a random weight or set of weights
    gene = random.randint(0, len(new_weight1) - 1)
    new_weight1[gene] = weight2[gene]
    new_weight2[gene] = weight1[gene]  # weight1[gene] was already overwritten via the alias above
    q = np.asarray([new_weight1, new_weight2], dtype=object)
    return q
def evolve(best_fit1, best_fit2):
    global generation
    global best_brain
    global best_brain2
    mutations = []
    for i in range(total_brains // 2):
        cross_weights = model_crossover(best_fit1, best_fit2)
        mutation1 = model_mutate(cross_weights[0], 0.5)
        mutation2 = model_mutate(cross_weights[1], 0.5)
        mutations.append(mutation1)
        mutations.append(mutation2)
    for i in range(total_brains):
        brains[i].set_weights(mutations[i])
    generation += 1
def find_best_fit():
    fitness = np.loadtxt("fitness.txt")
    print(f"fitness average {np.mean(fitness)} in generation {generation}")
    print(f"fitness max is {np.max(fitness)} in generation {generation}")
    fitness_t.append(np.mean(fitness))
    maxfit1 = np.max(fitness)
    best_fit1 = np.where(fitness == maxfit1)[0]
    fitness[best_fit1] = 0
    maxfit2 = np.max(fitness)
    best_fit2 = np.where(fitness == maxfit2)[0]
    # band-aid for when several individuals have the same fitness,
    # which would lead to best_fit1/best_fit2 being arrays of indices
    if len(best_fit1) > 1:
        best_fit1 = best_fit1[0]
    if len(best_fit2) > 1:
        best_fit2 = best_fit2[0]
    return int(best_fit1), int(best_fit2)

bf1, bf2 = find_best_fit()
evolve(bf1, bf2)
This is the code I'm using to set the modified weights on the existing Keras models (mostly not mine; I don't understand it well enough to have created this myself).
If Keras is working the way I think it is, then I don't see how this would converge to anything that does not maximize fitness; furthermore, fitness seems to be decreasing over time.
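One thing worth noticing in the crossover above: new_weight1 = weight1 binds a second name to the same list instead of copying it, so the first gene swap also overwrites weight1[gene] before the second swap reads it, and both children end up carrying weight2's gene. A sketch of a copy-safe variant (same names as above, with copy.deepcopy added):
import copy
import random
import numpy as np

def crossover_brains(parent1, parent2):
    weight1 = parent1.get_weights()
    weight2 = parent2.get_weights()
    # deep-copy so swapping genes in the children cannot touch the parents
    new_weight1 = copy.deepcopy(weight1)
    new_weight2 = copy.deepcopy(weight2)
    gene = random.randint(0, len(new_weight1) - 1)
    new_weight1[gene] = weight2[gene]
    new_weight2[gene] = weight1[gene]  # now reads the untouched parent
    return np.asarray([new_weight1, new_weight2], dtype=object)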

How to use Keras Conv2D layers with OpenAI gym?

Using OpenAI's gym environment, I've created my own environment in which the observation space is of Box type, with shape (21,21,1).
The intention is to use a Keras Conv2D layer as the model's input. Ideally, the shape going into this model would be (None,21,21,1), with None representing the batch size. The Keras documentation is here: https://keras.io/api/layers/convolution_layers/convolution2d/
The issue I'm having is that an extra dimension is required during the shape check. Because of this, the shape it expects is (None,1,21,21,1). This prevents me from using MaxPooling layers in the model. After investigating the keras-rl library, I found this is due to two functions that add this dimensionality.
The first function is found in memory.py, where the current observation is put into a list and returned as such:
def get_recent_state(self, current_observation):
    """Return list of last observations

    # Argument
        current_observation (object): Last observation

    # Returns
        A list of the last observations
    """
    # This code is slightly complicated by the fact that subsequent observations might be
    # from different episodes. We ensure that an experience never spans multiple episodes.
    # This is probably not that important in practice but it seems cleaner.
    state = [current_observation]
    idx = len(self.recent_observations) - 1
    for offset in range(0, self.window_length - 1):
        current_idx = idx - offset
        current_terminal = self.recent_terminals[current_idx - 1] if current_idx - 1 >= 0 else False
        if current_idx < 0 or (not self.ignore_episode_boundaries and current_terminal):
            # The previously handled observation was terminal, don't add the current one.
            # Otherwise we would leak into a different episode.
            break
        state.insert(0, self.recent_observations[current_idx])
    while len(state) < self.window_length:
        state.insert(0, zeroed_observation(state[0]))
    return state
The second function is called just after and computes the Q values based on the recent observation. It wraps the state in a list when passing it on to compute_batch_q_values.
def compute_q_values(self, state):
    q_values = self.compute_batch_q_values([state]).flatten()
    assert q_values.shape == (self.nb_actions,)
    return q_values
I understand that one extra dimension should be added to represent the batch size, but why is it added twice? Can anyone explain why this is, or how to use Conv2D layers with OpenAI gym?
Thanks.
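For what it's worth, the first of the two extra axes is keras-rl's window_length (the number of stacked past observations) and the second is the batch axis. One way to work with it is to declare the window axis in the model's input shape and immediately fold it away with a Reshape, so the pooling layers see plain (21, 21, 1) images. A minimal sketch assuming window_length = 1 (the layer sizes and action count here are made up):
from tensorflow.keras import layers, models

WINDOW_LENGTH = 1   # keras-rl prepends this axis to every state
N_ACTIONS = 4       # hypothetical number of discrete actions

model = models.Sequential([
    # keras-rl feeds states of shape (window_length, 21, 21, 1);
    # fold the window axis away so Conv2D/MaxPooling2D see (21, 21, 1)
    layers.Reshape((21, 21, 1), input_shape=(WINDOW_LENGTH, 21, 21, 1)),
    layers.Conv2D(16, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(N_ACTIONS, activation='linear'),
])
model.summary()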

Keras XOR not training

We have been trying for a while to get this to work. This is probably the easiest example to create, and so now we need help. We've been changing the number of epochs in the fit function, and that gives us different results, but never anything good; when we increase the epochs too much, the outputs always converge on 0.5.
#%%
import numpy
import tensorflow
from tensorflow import keras

inputValues = numpy.array([[0,0],[0,1],[1,0],[1,1]])
inputResults = numpy.array([[0],[1],[1],[0]])
print(inputValues)
print(inputResults)
#%%
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(2,)),
    keras.layers.Dense(units=2, activation="relu"),
    keras.layers.Dense(units=2, activation="softmax")
])
model.compile(loss=keras.losses.SparseCategoricalCrossentropy(),
              optimizer=tensorflow.optimizers.Adam(),
              metrics=['accuracy'])
model.fit(inputValues, inputResults, epochs=2500)
model.summary()
print(model.weights)
#%%
print(model.predict_proba(inputValues))
print("End of file.")
From my understanding of ANNs, we should have 2 inputs in the first layer, specifically for the XOR example, and two outputs (either a 0 or a 1). I assume that since it is not required to say what these outputs are (0 or 1), TensorFlow is dealing with this automatically by comparing the results in the fit function? Lastly, we have tried both with a hidden layer (of 2) and without, and still don't seem to get any better results.
Could someone let us know what we have done wrong?
Your problem is essentially a binary classification problem, because the output can be either 0 or 1. For this you don't need two output neurons; one will do, with a sigmoid activation whose characteristic S-shape pushes outputs close to 0 or close to 1 (sigmoid generally works well for binary classification).
Another adjustment you need to make is to set the loss function to binary crossentropy; your choice, sparse categorical crossentropy, is suited to classification into more than 2 categories.
So the code that I tried is:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(2,)),
    keras.layers.Dense(units=4, activation="sigmoid"),
    keras.layers.Dense(units=1, activation="sigmoid")
])
model.compile(loss=keras.losses.BinaryCrossentropy(from_logits=False),
              optimizer=optimizers.Adam(),
              metrics=['accuracy'])
model.fit(inputValues, inputResults, epochs=2500)
With these settings I got training accuracy to 1.0000. It took a while to get there, and I suppose that could be sped up by playing around with the learning rate, but it should be enough to get the job done.
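To read hard 0/1 predictions off the single sigmoid output, you can threshold its probabilities at 0.5 (a small usage sketch; the 0.5 threshold is the usual convention, not something from the code above):
# probabilities in [0, 1], one row per input pair
probs = model.predict(inputValues)
# threshold at 0.5 to recover hard 0/1 labels
labels = (probs > 0.5).astype(int)
print(numpy.hstack([inputValues, labels]))  # inputs next to predicted labels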

How to resolve KeyError: 'val_mean_absolute_error' with Keras 2.3.1 and TensorFlow 2.0 (from Chollet's Deep Learning with Python)

I am on section 3.7 of Chollet's book Deep Learning with Python.
The project is to find the median price of homes in given Boston suburbs in the 1970s.
https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/3.7-predicting-house-prices.ipynb
At section "Validating our approach using K-fold validation" I try to run this block of code:
num_epochs = 500
all_mae_histories = []
for i in range(k):
    print('processing fold #', i)
    # Prepare the validation data: data from partition # k
    val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]
    val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples]
    # Prepare the training data: data from all other partitions
    partial_train_data = np.concatenate(
        [train_data[:i * num_val_samples],
         train_data[(i + 1) * num_val_samples:]],
        axis=0)
    partial_train_targets = np.concatenate(
        [train_targets[:i * num_val_samples],
         train_targets[(i + 1) * num_val_samples:]],
        axis=0)
    # Build the Keras model (already compiled)
    model = build_model()
    # Train the model (in silent mode, verbose=0)
    history = model.fit(partial_train_data, partial_train_targets,
                        validation_data=(val_data, val_targets),
                        epochs=num_epochs, batch_size=1, verbose=0)
    mae_history = history.history['val_mean_absolute_error']
    all_mae_histories.append(mae_history)
I get an error KeyError: 'val_mean_absolute_error' on the line
mae_history = history.history['val_mean_absolute_error']
I am guessing the solution is to figure out the correct key to replace val_mean_absolute_error. I've looked through the Keras documentation for what the correct key would be. Does anyone know the correct key value?
The problem in your code is that, when you compile your model, you do not add the specific 'mae' metric.
If you wanted to add the 'mae' metric in your code, you would need to do it like one of these:
# option 1
model.compile('sgd', metrics=[tf.keras.metrics.MeanAbsoluteError()])
# option 2
model.compile('sgd', metrics=['mean_absolute_error'])
After this step, you can check whether the correct name is val_mean_absolute_error or val_mae. Most likely, if you compile your model as in option 2, your code will work with "val_mean_absolute_error".
Also, you should put the code snippet where you compile your model in the question; it is missing from the question text above (i.e. the build_model() function).
I replaced 'val_mean_absolute_error' with 'val_mae' and it worked for me.
FYI, I had the same problem, and it persisted even after changing the line to history.history['val_mae'] as described in the answer above.
In my case, for the val_mae key to be present in the history.history object at all, I needed to ensure that the model.fit() call included the validation_data=(val_data, val_targets) argument. I had neglected to do this initially.
I updated it with the line below:
mae_history = history.history["mae"]
The History object contains the same names as the metrics you compile with, prefixed with val_ for the validation set.
For example:
mean_absolute_error gives val_mean_absolute_error
mae gives val_mae
accuracy gives val_accuracy
acc gives val_acc
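When in doubt, you can simply print the keys that your Keras version actually produced after training (a quick diagnostic sketch, reusing the fit call from the question):
history = model.fit(partial_train_data, partial_train_targets,
                    validation_data=(val_data, val_targets),
                    epochs=num_epochs, batch_size=1, verbose=0)
# lists the exact metric names available,
# e.g. dict_keys(['loss', 'mae', 'val_loss', 'val_mae'])
print(history.history.keys())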

4-layer neural network using relu activation function doesn't work well

I've been trying to make a 4-layer neural network using the relu activation function,
but it doesn't work well...
I guess the problem is in the back-propagation part,
because the rest of the code worked well when I was using the sigmoid activation function;
I only changed the back-propagation part.
So could anyone tell me what's wrong with my code?
The code below is part of my neural network class.
Additionally, I don't want to use any deep learning framework. Sorry..!
# train the neural network
def train(self, inputs_list, targets_list):
    # convert inputs list to 2d array
    inputs = numpy.array(inputs_list, ndmin=2).T
    targets = numpy.array(targets_list, ndmin=2).T
    # calculate signals into the first hidden layer
    hidden_inputs = numpy.dot(self.wih, inputs)
    # calculate the signals emerging from the first hidden layer
    hidden_outputs = self.activation_function(hidden_inputs)
    # calculate signals into the second hidden layer
    hidden_inputs2 = numpy.dot(self.wh1h2, hidden_outputs)
    # calculate the signals emerging from the second hidden layer
    hidden_outputs2 = self.activation_function(hidden_inputs2)
    # calculate signals into final output layer
    final_inputs = numpy.dot(self.wh2o, hidden_outputs2)
    # calculate the signals emerging from final output layer
    final_outputs = self.activation_function(final_inputs)
    # output layer error is the (target - actual)
    output_errors = targets - final_outputs
    # second hidden layer error is the output_errors, split by weights, recombined at hidden nodes
    hidden_errors2 = numpy.dot(self.wh2o.T, output_errors)
    # first hidden layer error is hidden_errors2, split by weights, recombined at hidden nodes
    hidden_errors = numpy.dot(self.wh1h2.T, hidden_errors2)
    # Back-propagation part
    # update the weights for the links between the second hidden and output layers
    # (sigmoid version: self.wh2o += self.lr * numpy.dot((output_errors * final_outputs * (1.0 - final_outputs)), numpy.transpose(hidden_outputs2)))
    self.wh2o += self.lr * numpy.dot((output_errors * numpy.heaviside(final_inputs, 0.0)), numpy.transpose(hidden_outputs2))
    # update the weights for the links between the first and second hidden layers
    self.wh1h2 += self.lr * numpy.dot((hidden_errors2 * numpy.heaviside(hidden_inputs2, 0.0)), numpy.transpose(hidden_outputs))
    # update the weights for the links between the input and first hidden layers
    self.wih += self.lr * numpy.dot((hidden_errors * numpy.heaviside(hidden_inputs, 0.0)), numpy.transpose(inputs))
wh2o is the weight matrix that propagates from the second hidden layer to the output layer,
wh1h2 is the weight matrix that propagates from the first hidden layer to the second hidden layer, and
wih is the weight matrix that propagates from the input layer to the first hidden layer.
Without looking at the specifics of your code: relu's popularity stems mostly from its success in CNNs. For small regression problems like this, it's a pretty awful choice, as it really brings the vanishing-gradient problem to the forefront, which for complicated reasons isn't as much of an issue in big CNN problems. There are various ways of making your architecture more robust to vanishing gradients, but not using relu would be my first suggestion (maxout is your best friend if vanishing gradients are a persistent problem). Bottom line: this may very well have nothing to do with a problem in your code, but be purely a matter of architecture.
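If you do want to stay with a relu-like activation, one common mitigation is leaky relu, which keeps a small gradient on negative inputs so units can't go completely dead. A sketch of the activation and the derivative that would stand in for numpy.heaviside in the updates above (the 0.01 slope is a conventional default, not something from the original post):
import numpy

def leaky_relu(x, alpha=0.01):
    # pass positive inputs through; scale negative inputs by a small slope
    return numpy.where(x > 0.0, x, alpha * x)

def leaky_relu_deriv(x, alpha=0.01):
    # gradient is 1 for positive inputs and alpha (not 0) for negative ones
    return numpy.where(x > 0.0, 1.0, alpha)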
