I need to create a neural network that approximates a function given its parameters. I give four parameters to my neural network (A, x0, phi, omega) and I want to obtain, as output,
A sin(omega x + phi) + x0
(I need this net as a part of another network)
However, I am not able to train the network as I obtain a very poor convergence. Why is that?
I use a fully connected network with three layers. This is the code
def get_batches(N_batches):
A = tan( random.uniform(low=0.,high=2*pi,size=[N_batches,1]))
x0 = random.randn(N_batches,1)*10
omega = random.uniform(low=0.,high=10*pi, size=[N_batches,1])
phi = random.uniform(low=0.,high=2*pi, size=[N_batches,1])
x = linspace(0,t_max, n_max)
x = tile(x,N_batches).reshape(N_batches,n_max)
return (A*sin(omega*x+phi) + x0, hstack([A,x0,phi,omega]) )
N_batches = 80
N_epochs = 50
t_max = 5.0
n_max = 100
n_par = 4
net_layers = []
net_inp = Input(shape=(n_par,))
net_layers.append(Dense(25, input_shape=(n_par,), activation="relu"))
net_layers.append(Dense(25, activation="relu"))
net_layers.append(Dense(25, activation="relu"))
net_layers.append(Dense(n_max, activation="linear"))
net_l = net_inp
for i in range(len(net_layers)):
net_l = net_layers[i](net_l)
net = Model(net_inp, net_l)
net.compile(loss="mean_squared_error", optimizer="adam")
costs = zeros(N_epochs)
for i in range(N_epochs):
y_true, y_in = get_batches(N_batches)
costs[i]=net.train_on_batch(y_in,y_true)
Even if I train more, I don't get better results than this
picture (approximated function and real function plot for a test sample):
The plot of the cost function is quite strange:
What mistakes did I do? Thank you!
Related
I have a deep learning problem, which I intend to solve in Keras with CNN. The task is 1D regression, for which I generate grayscale images using 2 parameters and the parameter to be deduced by the network (this is temperature difference). The image generation has 3 parameters, everything else is random. Naturally, the image generation occurs only in a region of each of the parameters. Of course the images visually represent the temperature difference.
The network has 2 inputs: two scalars as a vector (the 2 additional parameters for image generation) and the image. The aim of the teaching is to deduce the temperature difference from the supplied image.
My problem is the image generation is not always possible because of geometric constraints. There is a subregion of the 3 parameters used for generation, where it will fail. The red circles represent this in the figure below. Two axes of the figure is logarithmic, but as seen from the sample distribution, the parameter pick uses exponential-like distribution.
Learning proves to be quite good on the lower left region of the parameters, but totally unusable on the other end.
My question is if the poor model performance can be a result of the shape of the taught parameters, especially the failed region?
I forgot to mention that the images used are 256*256 8-bit grayscale. Training code:
def createMlp(aRepeatParameter:int):
vectorSize = aRepeatParameter * 2
inputs = Input(shape=(vectorSize,))
x = inputs
return Model(inputs, x)
def createCnn():
filters=(64, 16, 4)
inputShape = (256, 256, 1)
chanDim = -1
inputs = Input(shape=inputShape)
x = inputs
for (i, f) in enumerate(filters):
x = Conv2D(f, (3, 3), padding="same")(x)
x = LeakyReLU(alpha=0.3)(x)
x = BatchNormalization(axis=chanDim)(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Flatten()(x)
x = Dense(128, activation=LeakyReLU(alpha=0.3))(x)
x = Dense(16, activation=LeakyReLU(alpha=0.3))(x)
x = BatchNormalization(axis=chanDim)(x)
x = Dropout(0.5)(x)
x = Dense(4)(x)
x = LeakyReLU(alpha=0.3)(x)
return Model(inputs, x)
repeatParameter:int = 2
mlp = createMlp(repeatParameter)
cnn = createCnn()
combinedInput = Concatenate(axis=1)([mlp.output, cnn.output])
x = Dense(4, activation=LeakyReLU(alpha=0.3))(combinedInput)
x = Dense(1, activation="linear")(x)
model = Model(inputs=[mlp.input, cnn.input], outputs=x)
batchSize = 32
sampleSize = 96000
validationSize = 16000
port = 12345
trainingSteps = math.ceil(sampleSize / batchSize)
learningRate = ExponentialDecay(initial_learning_rate=0.001, decay_steps=trainingSteps, decay_rate=0.05)
opt = Adam(learning_rate=learningRate)
model.compile(loss="mean_squared_error", optimizer=opt, metrics=["mean_absolute_percentage_error"])
model.fit(landscapeGenerator.generate(batchSize, repeatParameter, port), validation_data=landscapeGenerator.generate(batchSize, repeatParameter, port),
epochs=50, steps_per_epoch=trainingSteps, validation_steps=validationSize/batchSize )
Is there a more efficient way to compute Jacobian (there must be, it doesn't even run for a single batch) I want to compute the loss as given in the self-explanatory neural network. Input has a shape of (32, 365, 3) where 32 is the batch size. The loss I want to minimize is Equation 3 of the paper.
I believe that I am not using the GradientTape optimally.
def compute_loss_theta(tape, parameter, concept, output, x):
b = x.shape[0]
in_dim = (x.shape[1], x.shape[2])
feature_dim = in_dim[0]*in_dim[1]
J = tape.batch_jacobian(concept, x)
grad_fx = tape.gradient(output, x)
grad_fx = tf.reshape(grad_fx,shape=(b, feature_dim))
J = tf.reshape(J, shape=(b, feature_dim, feature_dim))
parameter = tf.expand_dims(parameter, axis =1)
loss_theta_matrix = grad_fx - tf.matmul(parameter, J)
loss_theta = tf.norm(loss_theta_matrix)
return loss_theta
for i in range(10):
for x, y in train_dataset:
with tf.GradientTape(persistent=True) as tape:
tape.watch(x)
parameter, concept, output = model(x)
loss_theta = compute_loss_theta(tape, parameter, concept, output , x)
loss_y = loss_object(y_true=y, y_pred=output)
loss_value = loss_y + eps*loss_theta
gradients = tape.gradient(loss_value, model.trainable_weights)
optimizer.apply_gradients(zip(gradients, model.trainable_weights))
I am trying to implement a neural network with one hidden layer that can represent the solution to a PDE (let's say the Laplace equation). The objective function therefore depends on the gradient of the neural network w.r.t its input.
Now, I have implemented the calculation of the second derivatives using Lambda layers. However when I try to compute the gradient of the output with respect to the parameters of the model, I get an error.
def grad(y, x, nameit):
return Lambda(lambda z: K.gradients(z[0], z[1]), output_shape = [1], name = nameit)([y,x])
def network(i):
m = Dense(100, activation='sigmoid')(i)
j = Dense(1, name="networkout")(m)
return j
x1 = Input(shape=(1,))
a = network(x1)
b = grad(a, x1, "dudx1")
c = grad(b, x1, "dudx11")
model = Model(inputs = [x1], outputs=[c])
model.compile(optimizer='rmsprop',
loss='mean_squared_error',
metrics=['accuracy'])
x1_data = np.random.random((20, 1))
labels = np.zeros((20,1))
model.fit(x1_data,labels)
This is the error:
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
Why can't Keras compute the gradients w.r.t the trainable parameters?
Problem is in networkout layer. It maintains a linear activation which prevents gradients from pass through it, and therefore, return 'None' gradients error. In this case you need to add any activation function except linear to networkout layer.
def network(i):
m = layers.Dense(100)(i)
j = layers.Dense(1, name="networkout", activation='relu')(m)
return j
However, previous layer can have a linear activation.
I'm working on a RNN architecture which does speech enhancement. The dimensions of the input is [XX, X, 1024] where XX is the batch size and X is the variable sequence length.
The input to the network is positive valued data and the output is masked binary data(IBM) which is later used to construct enhanced signal.
For instance, if the input to network is [10, 65, 1024] the output will be [10,65,1024] tensor with binary values. I'm using Tensorflow with mean squared error as loss function. But I'm not sure which activation function to use here(which keeps the outputs either zero or one), Following is the code I've come up with so far
tf.reset_default_graph()
num_units = 10 #
num_layers = 3 #
dropout = tf.placeholder(tf.float32)
cells = []
for _ in range(num_layers):
cell = tf.contrib.rnn.LSTMCell(num_units)
cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob = dropout)
cells.append(cell)
cell = tf.contrib.rnn.MultiRNNCell(cells)
X = tf.placeholder(tf.float32, [None, None, 1024])
Y = tf.placeholder(tf.float32, [None, None, 1024])
output, state = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
out_size = Y.get_shape()[2].value
logit = tf.contrib.layers.fully_connected(output, out_size)
prediction = (logit)
flat_Y = tf.reshape(Y, [-1] + Y.shape.as_list()[2:])
flat_logit = tf.reshape(logit, [-1] + logit.shape.as_list()[2:])
loss_op = tf.losses.mean_squared_error(labels=flat_Y, predictions=flat_logit)
#adam optimizier as the optimization function
optimizer = tf.train.AdamOptimizer(learning_rate=0.001) #
train_op = optimizer.minimize(loss_op)
#extract the correct predictions and compute the accuracy
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
Also my reconstruction isn't good. Can someone suggest on improving the model?
If you want your outputs to be either 0 or 1, to me it seems a good idea to turn this into a classification problem. To this end, I would use a sigmoidal activation and cross entropy:
...
prediction = tf.nn.sigmoid(logit)
loss_op = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=Y, logits=logit))
...
In addition, from my point of view the hidden dimensionality (10) of your stacked RNNs seems quite small for such a big input dimensionality (1024). However this is just a guess, and it is something that needs to be tuned.
I'm using deep learning approach to address a regression problem with multi outputs (16 outputs), each output is between [0,1] and the sum is 1.
I am confused about which loss function is ideal to this problem, I have already test Mean squared error and Mean Absolute Error but Neural network predicts always the same value.
model = applications.VGG16(include_top=False, weights = None, input_shape = (256, 256, 3))
x = model.output
x = Flatten()(x)
x = Dense(1024)(x)
x=BatchNormalization()(x)
x = Activation("relu")(x)
x = Dropout(0.5)(x)
x = Dense(512)(x)
x=BatchNormalization()(x)
x = Activation("relu")(x)
x = Dropout(0.5)(x)
predictions = Dense(16,activation="sigmoid")(x)
model_final = Model(input = model.input, output = predictions)
model_final.compile(loss ='mse', optimizer = Adam(lr=0.1), metrics=['mae'])
What you are describing sounds more like a classification task, since you want to get a probability distribution at the end.
Therefore you should use a softmax (for example) in the last layer and cross-entropy as loss measure.