Matrix calculations in Keras SimpleRNN

I'm trying to understand the matrix calculations involved in SimpleRNN.
I have gathered from some blog posts and Stack Overflow answers that SimpleRNN(units) creates a layer containing the given number of RNN units.
SimpleRNN involves the following calculation at every time step:
W = kernel            # shape (input_dim, units)
U = recurrent_kernel  # shape (units, units)
B = bias              # shape (units,)
output = new_state = act(W * input + U * state + B)
Please help me understand the input and output dimensions in the code snippet below.
The function generates X and y, where y is the cumulative sum of X.
import numpy as np
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense

seq_len = 60

def generate_batch(n=256):
    X = np.random.choice(a=[0, 1], size=n * seq_len, p=[0.9, 0.1]).reshape(n, -1)
    y = np.cumsum(X, axis=1)
    X = X.reshape(n, -1, 1)
    y = y.reshape(n, -1, 1)
    return X, y  # X and y both have shape (256, 60, 1)

model = Sequential()
model.add(SimpleRNN(10, input_shape=(60, 1), return_sequences=True))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')

X, y = generate_batch()  # (256, 60, 1)
model.fit(X, y, verbose=0, epochs=1)
Please help me figure out the dimensions of the input to the RNN, the state, and the output from the RNN, and how the matrix calculation W * input + U * state + B actually happens.
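To make the shapes concrete, here is a minimal NumPy sketch (not part of the original question) of what SimpleRNN(10, input_shape=(60, 1), return_sequences=True) computes in this example: the kernel has shape (input_dim, units) = (1, 10), the recurrent kernel (10, 10) and the bias (10,); each time step consumes a (batch, 1) slice, the state is (batch, 10), the layer output is (batch, 60, 10), and the final Dense(1) maps it to (batch, 60, 1).

import numpy as np

batch, seq_len, input_dim, units = 256, 60, 1, 10
W = np.zeros((input_dim, units))    # kernel, (1, 10)
U = np.zeros((units, units))        # recurrent_kernel, (10, 10)
B = np.zeros((units,))              # bias, (10,)

X = np.zeros((batch, seq_len, input_dim))  # one batch of inputs, (256, 60, 1)
state = np.zeros((batch, units))           # initial state h_0, (256, 10)
outputs = []
for t in range(seq_len):
    x_t = X[:, t, :]                          # (256, 1) input slice at step t
    state = np.tanh(x_t @ W + state @ U + B)  # (256, 10); tanh is the default activation
    outputs.append(state)
rnn_out = np.stack(outputs, axis=1)           # (256, 60, 10) because return_sequences=True
# Dense(1) then maps (256, 60, 10) -> (256, 60, 1), matching y.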

Related

Deep learning in partially defined parameter space

I have a deep learning problem which I intend to solve in Keras with a CNN. The task is 1D regression: I generate grayscale images from 3 parameters, namely 2 auxiliary parameters and the quantity the network should deduce (a temperature difference); everything else is random. Naturally, image generation only covers a limited region of each parameter, and the images visually represent the temperature difference.
The network has 2 inputs: a vector of two scalars (the 2 auxiliary parameters used for image generation) and the image itself. The aim of training is to deduce the temperature difference from the supplied image.
My problem is that image generation is not always possible because of geometric constraints: there is a subregion of the 3-parameter space where it fails (the red circles in the figure below). Two axes of the figure are logarithmic, and as the sample distribution shows, the parameters are drawn from an exponential-like distribution.
Learning works quite well in the lower-left region of the parameter space, but is totally unusable at the other end.
My question: can the poor model performance be a result of the distribution of the training parameters, especially the failed region?
I forgot to mention that the images used are 256*256 8-bit grayscale. Training code:
import math
from tensorflow.keras.layers import (Input, Conv2D, LeakyReLU, BatchNormalization,
                                     MaxPooling2D, Flatten, Dense, Dropout, Concatenate)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.optimizers.schedules import ExponentialDecay

def createMlp(aRepeatParameter: int):
    vectorSize = aRepeatParameter * 2
    inputs = Input(shape=(vectorSize,))
    x = inputs
    return Model(inputs, x)

def createCnn():
    filters = (64, 16, 4)
    inputShape = (256, 256, 1)
    chanDim = -1
    inputs = Input(shape=inputShape)
    x = inputs
    for (i, f) in enumerate(filters):
        x = Conv2D(f, (3, 3), padding="same")(x)
        x = LeakyReLU(alpha=0.3)(x)
        x = BatchNormalization(axis=chanDim)(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Flatten()(x)
    x = Dense(128, activation=LeakyReLU(alpha=0.3))(x)
    x = Dense(16, activation=LeakyReLU(alpha=0.3))(x)
    x = BatchNormalization(axis=chanDim)(x)
    x = Dropout(0.5)(x)
    x = Dense(4)(x)
    x = LeakyReLU(alpha=0.3)(x)
    return Model(inputs, x)

repeatParameter: int = 2
mlp = createMlp(repeatParameter)
cnn = createCnn()
combinedInput = Concatenate(axis=1)([mlp.output, cnn.output])
x = Dense(4, activation=LeakyReLU(alpha=0.3))(combinedInput)
x = Dense(1, activation="linear")(x)
model = Model(inputs=[mlp.input, cnn.input], outputs=x)

batchSize = 32
sampleSize = 96000
validationSize = 16000
port = 12345
trainingSteps = math.ceil(sampleSize / batchSize)
learningRate = ExponentialDecay(initial_learning_rate=0.001, decay_steps=trainingSteps, decay_rate=0.05)
opt = Adam(learning_rate=learningRate)
model.compile(loss="mean_squared_error", optimizer=opt, metrics=["mean_absolute_percentage_error"])
model.fit(landscapeGenerator.generate(batchSize, repeatParameter, port),
          validation_data=landscapeGenerator.generate(batchSize, repeatParameter, port),
          epochs=50, steps_per_epoch=trainingSteps, validation_steps=validationSize / batchSize)

Computing Jacobian and Derivative in Tensorflow is extremely slow

Is there a more efficient way to compute the Jacobian? (There must be; as written, it doesn't even finish a single batch.) I want to compute the loss given in the self-explanatory neural network paper. The input has shape (32, 365, 3), where 32 is the batch size, and the loss I want to minimize is Equation 3 of the paper.
I believe I am not using GradientTape optimally.
def compute_loss_theta(tape, parameter, concept, output, x):
    b = x.shape[0]
    in_dim = (x.shape[1], x.shape[2])
    feature_dim = in_dim[0] * in_dim[1]
    J = tape.batch_jacobian(concept, x)
    grad_fx = tape.gradient(output, x)
    grad_fx = tf.reshape(grad_fx, shape=(b, feature_dim))
    J = tf.reshape(J, shape=(b, feature_dim, feature_dim))
    parameter = tf.expand_dims(parameter, axis=1)
    loss_theta_matrix = grad_fx - tf.matmul(parameter, J)
    loss_theta = tf.norm(loss_theta_matrix)
    return loss_theta

for i in range(10):
    for x, y in train_dataset:
        with tf.GradientTape(persistent=True) as tape:
            tape.watch(x)
            parameter, concept, output = model(x)
            loss_theta = compute_loss_theta(tape, parameter, concept, output, x)
            loss_y = loss_object(y_true=y, y_pred=output)
            loss_value = loss_y + eps * loss_theta
        gradients = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(gradients, model.trainable_weights))
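One common speedup worth trying here (a suggestion added for context, not from the original post) is to compile the per-batch work with tf.function, so the batch_jacobian is traced into a graph once instead of being rebuilt eagerly on every iteration. A sketch, assuming model, loss_object, optimizer, eps and compute_loss_theta exist exactly as above:

@tf.function
def train_step(x, y):
    # same logic as the eager loop body above, but traced into a graph
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(x)
        parameter, concept, output = model(x)
        loss_theta = compute_loss_theta(tape, parameter, concept, output, x)
        loss_y = loss_object(y_true=y, y_pred=output)
        loss_value = loss_y + eps * loss_theta
    gradients = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))
    return loss_value

for i in range(10):
    for x, y in train_dataset:
        loss_value = train_step(x, y)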

Keras multiply parallel layers' outputs with constrained weights

I have 3 parallel MLPs and want to obtain the following in Keras:
Out = W1 * Out_MLP1 + W2 * Out_MLP2 + W3 * Out_MLP3
where Out_MLP1..3 are the output layers of the MLPs, each with dimension (10,), and W1, W2 and W3 are three trainable scalar weights satisfying the following condition:
W1 + W2 + W3 = 1
What is the best way to implement this with Keras functional API? What if we had N parallel layers?
What you need is to apply a softmax to a set of learnable weights, in order to guarantee that they sum to 1.
We initialize our learnable weights in a custom layer. This layer receives the outputs of our MLPs and combines them following the logic W1 * Out_MLP1 + W2 * Out_MLP2 + W3 * Out_MLP3. The output is a tensor of shape (n_batch, 10).
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Concatenate, Layer
from tensorflow.keras.models import Model

class W_ADD(Layer):

    def __init__(self, n_output):
        super(W_ADD, self).__init__()
        self.W = tf.Variable(
            initial_value=tf.random.uniform(shape=[1, 1, n_output], minval=0, maxval=1),
            trainable=True)  # (1, 1, n_inputs)

    def call(self, inputs):
        # inputs is a list of tensors of shape [(n_batch, n_feat), ..., (n_batch, n_feat)]
        # expand the last dim of each input: [(n_batch, n_feat, 1), ..., (n_batch, n_feat, 1)]
        inputs = [tf.expand_dims(i, -1) for i in inputs]
        inputs = Concatenate(axis=-1)(inputs)        # (n_batch, n_feat, n_inputs)
        weights = tf.nn.softmax(self.W, axis=-1)     # (1, 1, n_inputs), sums to one on the last dim
        return tf.reduce_sum(weights * inputs, axis=-1)  # (n_batch, n_feat)
In this dummy example, I create a network that has 3 parallel MLPs:
inp1 = Input((100,))
inp2 = Input((100,))
inp3 = Input((100,))
x1 = Dense(32, activation='relu')(inp1)
x2 = Dense(32, activation='relu')(inp2)
x3 = Dense(32, activation='relu')(inp3)
x1 = Dense(10, activation='linear')(x1)
x2 = Dense(10, activation='linear')(x2)
x3 = Dense(10, activation='linear')(x3)
mlp_outputs = [x1,x2,x3]
out = W_ADD(n_output=len(mlp_outputs))(mlp_outputs)
m = Model([inp1,inp2,inp3], out)
m.compile('adam','mse')
X1 = np.random.uniform(0,1, (1000,100))
X2 = np.random.uniform(0,1, (1000,100))
X3 = np.random.uniform(0,1, (1000,100))
y = np.random.uniform(0,1, (1000,10))
m.fit([X1,X2,X3], y, epochs=10)
As you can see, this generalizes easily to N parallel layers.
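If you want to sanity-check the learned combination after training (a small added note, assuming you keep a reference to the layer instance, e.g. w_add_layer = W_ADD(n_output=3)), the normalized weights can be read back like this:

# hypothetical inspection snippet; w_add_layer is the W_ADD instance used in the model
learned = tf.nn.softmax(w_add_layer.W, axis=-1).numpy().ravel()
print(learned, learned.sum())  # three non-negative weights that sum to 1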

Using autograd to compute Jacobian matrix of outputs with respect to inputs

I apologize if this question is obvious or trivial. I am very new to PyTorch and I am trying to understand its autograd.grad function. I have a neural network G that takes inputs (x, t) and outputs (u, v). Here is the code for G:
import torch
import torch.nn as nn

class GeneratorNet(torch.nn.Module):
    """
    A three hidden-layer generative neural network
    """
    def __init__(self):
        super(GeneratorNet, self).__init__()
        self.hidden0 = nn.Sequential(
            nn.Linear(2, 100),
            nn.LeakyReLU(0.2)
        )
        self.hidden1 = nn.Sequential(
            nn.Linear(100, 100),
            nn.LeakyReLU(0.2)
        )
        self.hidden2 = nn.Sequential(
            nn.Linear(100, 100),
            nn.LeakyReLU(0.2)
        )
        self.out = nn.Sequential(
            nn.Linear(100, 2),
            nn.Tanh()
        )

    def forward(self, x):
        x = self.hidden0(x)
        x = self.hidden1(x)
        x = self.hidden2(x)
        x = self.out(x)
        return x
Or simply G(x, t) = (u(x, t), v(x, t)), where u(x, t) and v(x, t) are scalar-valued. Goal: compute $\frac{\partial u(x,t)}{\partial x}$ and $\frac{\partial u(x,t)}{\partial t}$. At every training step I have a minibatch of size $100$, so u(x, t) is a [100, 1] tensor. Here is my attempt to compute the partial derivatives, where coords is the input (x, t) and, just as in the answer below, I added the requires_grad_(True) flag to coords as well:
tensor = GeneratorNet(coords)
tensor.requires_grad_(True)
u, v = torch.split(tensor, 1, dim=1)
du = autograd.grad(u, coords, grad_outputs=torch.ones_like(u), create_graph=True,
retain_graph=True, only_inputs=True, allow_unused=True)[0]
du is now a [100,2] tensor.
Question: Is this the tensor of the partials for the 100 input points of the minibatch?
There are similar questions like computing derivatives of the output with respect to inputs but I could not really figure out what's going on. I apologize once again if this is already answered or trivial. Thank you very much.
The code you posted should give you the partial derivative of your first output w.r.t. the input. However, you also have to set requires_grad_(True) on the inputs, as otherwise PyTorch does not build up the computation graph starting at the input and thus it cannot compute the gradient for them.
This version of your code example computes du and dv:
net = GeneratorNet()
coords = torch.randn(10, 2)
coords.requires_grad = True
tensor = net(coords)
u, v = torch.split(tensor, 1, dim=1)
du = torch.autograd.grad(u, coords, grad_outputs=torch.ones_like(u))[0]
dv = torch.autograd.grad(v, coords, grad_outputs=torch.ones_like(v))[0]
You can also compute the partial derivative for a single output:
net = GeneratorNet()
coords = torch.randn(10, 2)
coords.requires_grad = True
tensor = net(coords)
u, v = torch.split(tensor, 1, dim=1)
du_0 = torch.autograd.grad(u[0], coords)[0]
where du_0 == du[0].
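To tie this back to the question (an added note, not part of the original answer): since coords stores (x, t) column-wise, the [100, 2] (here [10, 2]) tensor du holds exactly the two partials per sample and can be split column by column:

# assuming coords[:, 0] = x and coords[:, 1] = t
du_dx = du[:, 0]  # du/dx for each sample
du_dt = du[:, 1]  # du/dt for each sample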

optimize a function with two bounded matrices as inputs

I am trying to optimise a loss function which takes two inputs, m and d. Both are (32, 32, 1) matrices, and I cannot figure out how to bound/constrain their values between 0 and 1. m and d are filters that I apply to an input being fed into a trained ML model.
I have looked at these documentations
https://scipy-lectures.org/advanced/mathematical_optimization/index.html#id54
(See Box-Bounds; hyperlink in Chapter Contents)
https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html
scipy minimize with constraints
def lossfunction(MD):
    m = MD[:, :, 0]
    d = MD[:, :, 1]
    x = data[np.argwhere(label != 6)]
    xt = np.multiply((1 - m), x) + np.multiply(m, d)  # Todo: Apply Filter
    num_examples = xt.shape[0]
    sess = tf.get_default_session()
    totalloss = 0
    for offset in range(0, num_examples, BATCH_SIZE):
        batchx, batchy = xt[offset:offset + BATCH_SIZE], (np.ones(BATCH_SIZE) * targetlabel)
        loss = sess.run(loss_operation, feed_dict={x: batchx, y: batchy, prob: 0.8})
        totalloss = totalloss + loss
    finalloss = totalloss + lam * np.linalg.norm(m, 1)
    return finalloss

optimize.minimize(lossfunction, np.zeros((32, 32, 2)), bounds=((0, 1), (0, 1)))
I get this error message: ValueError: length of x0 != length of bounds
I understand that the bounds and inputs should be of the same dimensions.
Is there a convenient way of inputting the bounds?
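One way this is commonly handled (a sketch added for illustration, not from the original post): scipy.optimize.minimize works on a flat 1-D parameter vector, so the (32, 32, 2) array can be flattened, one (0, 1) pair supplied per scalar, and the array reshaped back inside the loss function:

# 32*32*2 = 2048 bound pairs, one per element of the flattened x0
x0 = np.zeros((32, 32, 2)).ravel()
bounds = [(0, 1)] * x0.size

def lossfunction_flat(md_flat):
    MD = md_flat.reshape(32, 32, 2)  # recover m and d from the flat vector
    return lossfunction(MD)

result = optimize.minimize(lossfunction_flat, x0, bounds=bounds, method="L-BFGS-B")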
