I have a deep learning problem, which I intend to solve in Keras with CNN. The task is 1D regression, for which I generate grayscale images using 2 parameters and the parameter to be deduced by the network (this is temperature difference). The image generation has 3 parameters, everything else is random. Naturally, the image generation occurs only in a region of each of the parameters. Of course the images visually represent the temperature difference.
The network has 2 inputs: two scalars as a vector (the 2 additional parameters for image generation) and the image. The aim of the teaching is to deduce the temperature difference from the supplied image.
My problem is the image generation is not always possible because of geometric constraints. There is a subregion of the 3 parameters used for generation, where it will fail. The red circles represent this in the figure below. Two axes of the figure is logarithmic, but as seen from the sample distribution, the parameter pick uses exponential-like distribution.
Learning proves to be quite good on the lower left region of the parameters, but totally unusable on the other end.
My question is if the poor model performance can be a result of the shape of the taught parameters, especially the failed region?
I forgot to mention that the images used are 256*256 8-bit grayscale. Training code:
def createMlp(aRepeatParameter:int):
vectorSize = aRepeatParameter * 2
inputs = Input(shape=(vectorSize,))
x = inputs
return Model(inputs, x)
def createCnn():
filters=(64, 16, 4)
inputShape = (256, 256, 1)
chanDim = -1
inputs = Input(shape=inputShape)
x = inputs
for (i, f) in enumerate(filters):
x = Conv2D(f, (3, 3), padding="same")(x)
x = LeakyReLU(alpha=0.3)(x)
x = BatchNormalization(axis=chanDim)(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Flatten()(x)
x = Dense(128, activation=LeakyReLU(alpha=0.3))(x)
x = Dense(16, activation=LeakyReLU(alpha=0.3))(x)
x = BatchNormalization(axis=chanDim)(x)
x = Dropout(0.5)(x)
x = Dense(4)(x)
x = LeakyReLU(alpha=0.3)(x)
return Model(inputs, x)
repeatParameter:int = 2
mlp = createMlp(repeatParameter)
cnn = createCnn()
combinedInput = Concatenate(axis=1)([mlp.output, cnn.output])
x = Dense(4, activation=LeakyReLU(alpha=0.3))(combinedInput)
x = Dense(1, activation="linear")(x)
model = Model(inputs=[mlp.input, cnn.input], outputs=x)
batchSize = 32
sampleSize = 96000
validationSize = 16000
port = 12345
trainingSteps = math.ceil(sampleSize / batchSize)
learningRate = ExponentialDecay(initial_learning_rate=0.001, decay_steps=trainingSteps, decay_rate=0.05)
opt = Adam(learning_rate=learningRate)
model.compile(loss="mean_squared_error", optimizer=opt, metrics=["mean_absolute_percentage_error"])
model.fit(landscapeGenerator.generate(batchSize, repeatParameter, port), validation_data=landscapeGenerator.generate(batchSize, repeatParameter, port),
epochs=50, steps_per_epoch=trainingSteps, validation_steps=validationSize/batchSize )
Related
Is there a more efficient way to compute Jacobian (there must be, it doesn't even run for a single batch) I want to compute the loss as given in the self-explanatory neural network. Input has a shape of (32, 365, 3) where 32 is the batch size. The loss I want to minimize is Equation 3 of the paper.
I believe that I am not using the GradientTape optimally.
def compute_loss_theta(tape, parameter, concept, output, x):
b = x.shape[0]
in_dim = (x.shape[1], x.shape[2])
feature_dim = in_dim[0]*in_dim[1]
J = tape.batch_jacobian(concept, x)
grad_fx = tape.gradient(output, x)
grad_fx = tf.reshape(grad_fx,shape=(b, feature_dim))
J = tf.reshape(J, shape=(b, feature_dim, feature_dim))
parameter = tf.expand_dims(parameter, axis =1)
loss_theta_matrix = grad_fx - tf.matmul(parameter, J)
loss_theta = tf.norm(loss_theta_matrix)
return loss_theta
for i in range(10):
for x, y in train_dataset:
with tf.GradientTape(persistent=True) as tape:
tape.watch(x)
parameter, concept, output = model(x)
loss_theta = compute_loss_theta(tape, parameter, concept, output , x)
loss_y = loss_object(y_true=y, y_pred=output)
loss_value = loss_y + eps*loss_theta
gradients = tape.gradient(loss_value, model.trainable_weights)
optimizer.apply_gradients(zip(gradients, model.trainable_weights))
I need to create a neural network that approximates a function given its parameters. I give four parameters to my neural network (A, x0, phi, omega) and I want to obtain, as output,
A sin(omega x + phi) + x0
(I need this net as a part of another network)
However, I am not able to train the network as I obtain a very poor convergence. Why is that?
I use a fully connected network with three layers. This is the code
def get_batches(N_batches):
A = tan( random.uniform(low=0.,high=2*pi,size=[N_batches,1]))
x0 = random.randn(N_batches,1)*10
omega = random.uniform(low=0.,high=10*pi, size=[N_batches,1])
phi = random.uniform(low=0.,high=2*pi, size=[N_batches,1])
x = linspace(0,t_max, n_max)
x = tile(x,N_batches).reshape(N_batches,n_max)
return (A*sin(omega*x+phi) + x0, hstack([A,x0,phi,omega]) )
N_batches = 80
N_epochs = 50
t_max = 5.0
n_max = 100
n_par = 4
net_layers = []
net_inp = Input(shape=(n_par,))
net_layers.append(Dense(25, input_shape=(n_par,), activation="relu"))
net_layers.append(Dense(25, activation="relu"))
net_layers.append(Dense(25, activation="relu"))
net_layers.append(Dense(n_max, activation="linear"))
net_l = net_inp
for i in range(len(net_layers)):
net_l = net_layers[i](net_l)
net = Model(net_inp, net_l)
net.compile(loss="mean_squared_error", optimizer="adam")
costs = zeros(N_epochs)
for i in range(N_epochs):
y_true, y_in = get_batches(N_batches)
costs[i]=net.train_on_batch(y_in,y_true)
Even if I train more, I don't get better results than this
picture (approximated function and real function plot for a test sample):
The plot of the cost function is quite strange:
What mistakes did I do? Thank you!
I have a CNN code that was written using tensorflow library:
x_img = tf.placeholder(tf.float32)
y_label = tf.placeholder(tf.float32)
def convnet_3d(x_img, W):
conv_3d_layer = tf.nn.conv3d(x_img, W, strides=[1,1,1,1,1], padding='VALID')
return conv_3d_layer
def maxpool_3d(x_img):
maxpool_3d_layer = tf.nn.max_pool3d(x_img, ksize=[1,2,2,2,1], strides=[1,2,2,2,1], padding='VALID')
return maxpool_3d_layer
def convolutional_neural_network(x_img):
weights = {'W_conv1_layer':tf.Variable(tf.random_normal([3,3,3,1,32])),
'W_conv2_layer':tf.Variable(tf.random_normal([3,3,3,32,64])),
'W_fc_layer':tf.Variable(tf.random_normal([409600,1024])),
'W_out_layer':tf.Variable(tf.random_normal([1024, num_classes]))}
biases = {'b_conv1_layer':tf.Variable(tf.random_normal([32])),
'b_conv2_layer':tf.Variable(tf.random_normal([64])),
'b_fc_layer':tf.Variable(tf.random_normal([1024])),
'b_out_layer':tf.Variable(tf.random_normal([num_classes]))}
x_img = tf.reshape(x_img, shape=[-1, img_x, img_y, img_z, 1])
conv1_layer = tf.nn.relu(convnet_3d(x_img, weights['W_conv1_layer']) + biases['b_conv1_layer'])
conv1_layer = maxpool_3d(conv1_layer)
conv2_layer = tf.nn.relu(convnet_3d(conv1_layer, weights['W_conv2_layer']) + biases['b_conv2_layer'])
conv2_layer = maxpool_3d(conv2_layer)
fc_layer = tf.reshape(conv2_layer,[-1, 409600])
fc_layer = tf.nn.relu(tf.matmul(fc_layer, weights['W_fc_layer'])+biases['b_fc_layer'])
fc_layer = tf.nn.dropout(fc_layer, keep_rate)
output_layer = tf.matmul(fc_layer, weights['W_out_layer'])+biases['b_out_layer']
return output_layer
my input image x_img is 25x25x25(3d image), I have some questions about the code:
1- is [3,3,3,1,32] in 'W_conv1_layer' means [width x height x depth x channel x number of filters]?
2- in 'W_conv2_layer' weights are [3,3,3,32,64], why the output is 64? I know that 3x3x3 is filter size and 32 is input come from first layer.
3- in 'W_fc_layer' weights are [409600,1024], 1024 is number of nodes in FC layer, but where this magic number '409600' come from?
4- before the image get into the conv layers why we need to reshape the image
x_img = tf.reshape(x_img, shape=[-1, img_x, img_y, img_z, 1])
All the answers can be found in the official doc of conv3d.
The weights should be [filter_depth, filter_height, filter_width, in_channels, out_channels]
The numbers 32 and 64 are chosen because it works simply they are just hyperparameters
409600 comes from reshaping the output of maxpool3d (it is probably a mistake the real size should be 4096 see comments)
Because tensorflow expects certain layouts for its input
Your should try implementing a simple convnet on images before moving to more complicated stuff.
I'm working on a RNN architecture which does speech enhancement. The dimensions of the input is [XX, X, 1024] where XX is the batch size and X is the variable sequence length.
The input to the network is positive valued data and the output is masked binary data(IBM) which is later used to construct enhanced signal.
For instance, if the input to network is [10, 65, 1024] the output will be [10,65,1024] tensor with binary values. I'm using Tensorflow with mean squared error as loss function. But I'm not sure which activation function to use here(which keeps the outputs either zero or one), Following is the code I've come up with so far
tf.reset_default_graph()
num_units = 10 #
num_layers = 3 #
dropout = tf.placeholder(tf.float32)
cells = []
for _ in range(num_layers):
cell = tf.contrib.rnn.LSTMCell(num_units)
cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob = dropout)
cells.append(cell)
cell = tf.contrib.rnn.MultiRNNCell(cells)
X = tf.placeholder(tf.float32, [None, None, 1024])
Y = tf.placeholder(tf.float32, [None, None, 1024])
output, state = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
out_size = Y.get_shape()[2].value
logit = tf.contrib.layers.fully_connected(output, out_size)
prediction = (logit)
flat_Y = tf.reshape(Y, [-1] + Y.shape.as_list()[2:])
flat_logit = tf.reshape(logit, [-1] + logit.shape.as_list()[2:])
loss_op = tf.losses.mean_squared_error(labels=flat_Y, predictions=flat_logit)
#adam optimizier as the optimization function
optimizer = tf.train.AdamOptimizer(learning_rate=0.001) #
train_op = optimizer.minimize(loss_op)
#extract the correct predictions and compute the accuracy
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
Also my reconstruction isn't good. Can someone suggest on improving the model?
If you want your outputs to be either 0 or 1, to me it seems a good idea to turn this into a classification problem. To this end, I would use a sigmoidal activation and cross entropy:
...
prediction = tf.nn.sigmoid(logit)
loss_op = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=Y, logits=logit))
...
In addition, from my point of view the hidden dimensionality (10) of your stacked RNNs seems quite small for such a big input dimensionality (1024). However this is just a guess, and it is something that needs to be tuned.
I'm using deep learning approach to address a regression problem with multi outputs (16 outputs), each output is between [0,1] and the sum is 1.
I am confused about which loss function is ideal to this problem, I have already test Mean squared error and Mean Absolute Error but Neural network predicts always the same value.
model = applications.VGG16(include_top=False, weights = None, input_shape = (256, 256, 3))
x = model.output
x = Flatten()(x)
x = Dense(1024)(x)
x=BatchNormalization()(x)
x = Activation("relu")(x)
x = Dropout(0.5)(x)
x = Dense(512)(x)
x=BatchNormalization()(x)
x = Activation("relu")(x)
x = Dropout(0.5)(x)
predictions = Dense(16,activation="sigmoid")(x)
model_final = Model(input = model.input, output = predictions)
model_final.compile(loss ='mse', optimizer = Adam(lr=0.1), metrics=['mae'])
What you are describing sounds more like a classification task, since you want to get a probability distribution at the end.
Therefore you should use a softmax (for example) in the last layer and cross-entropy as loss measure.