I am building a simple convolutional network with the Lasagne package and wanted to add a ReLU layer with a simple trainable threshold [max(0, x - threshold)], but I could only find rectifiers without a trainable parameter (lasagne.layers.NonlinearityLayer) or with a parameter that is multiplied (lasagne.layers.ParametricRectifierLayer). Does this layer exist, or am I missing something obvious?
Thank you for any help! Terry
I don't think that exists, and for good reason: you usually have a trainable layer before the ReLU (e.g. convolutional or fully connected) which already includes a bias. Shifting the data by a bias is equivalent to having a threshold at the ReLU.
If you don't have a trainable layer before the ReLU, you can also explicitly add a lasagne.layers.BiasLayer (http://lasagne.readthedocs.org/en/latest/modules/layers/special.html).
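For example, a minimal sketch of that combination (the input size of 100 is made up):

import lasagne
from lasagne.layers import InputLayer, BiasLayer, NonlinearityLayer
from lasagne.nonlinearities import rectify

l_in = InputLayer((None, 100))
# BiasLayer adds a trainable per-unit offset b, so the rectifier computes
# max(0, x + b), i.e. max(0, x - threshold) with threshold = -b.
l_bias = BiasLayer(l_in)
l_out = NonlinearityLayer(l_bias, nonlinearity=rectify)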
Hope this helps
Michael
I am learning how CNN trainable parameters are calculated in Keras. I just wonder why we consider the filter values trainable parameters: since the convolution itself is a fixed calculation (i.e. matrix multiplication), there seems to be nothing to update (train). I know there is a formula, but why do we treat these as trainable parameters? For example, in the first Conv2D, with image size 10x10x1 and one 3x3 filter, Keras reports 10 parameters (3x3+1).
Alex
In your convolution layer there is a 3x3(x1) kernel (the x1 because your image has only a single channel). The values in that kernel are not fixed; they are learned parameters, updated during training just like the weights of a fully connected layer. In addition to the kernel itself, the convolution layer usually has a learnable bias parameter (that's the +1 in your formula; it is typically optional). With a single filter you are asking the layer to learn one 3x3 kernel plus one bias, hence the 3x3 + 1 = 10 learned parameters; with N filters it would be N(3x3 + 1).
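You can verify the count directly. A minimal sketch matching the question's numbers (assuming the channels-last layout):

from keras.models import Sequential
from keras.layers import Conv2D

# One 3x3 filter applied to a 10x10 single-channel input.
model = Sequential()
model.add(Conv2D(1, (3, 3), input_shape=(10, 10, 1)))
model.summary()  # reports 3*3*1 + 1 = 10 trainable parameters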
I have a question about the terminology of MLPs in Keras.
What does the density of a layer mean?
Is it the same as the number of neurons? If it is, what is the role of input_dim?
I have never heard of the "density" of a layer in the context of vanilla feed-forward networks. I would assume it refers to the number of neurons (the units argument of a Dense layer), but it really depends on context.
As for input_dim: declaring an Input layer with a certain dimension and passing the input_dim argument to the first hidden layer are equivalent ways of specifying the input shape in Keras.
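For illustration, a minimal sketch of both variants (the sizes 100 and 32 are made up):

from keras.models import Sequential, Model
from keras.layers import Dense, Input

# Variant 1: the first Dense layer declares the input size via input_dim.
m1 = Sequential()
m1.add(Dense(32, input_dim=100, activation='relu'))

# Variant 2: an explicit Input layer of the same dimension (functional API).
x = Input(shape=(100,))
m2 = Model(x, Dense(32, activation='relu')(x))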
I wanted to use the Conv2DTranspose and Conv3DTranspose layers of Keras in order to do deconvolution (upsampling and convolution at the same time). I can get my model built and compiled, but the upsampling part does not seem to work, even when I modify the dilation_rate parameter.
Any idea how to do the upsampling part using ConvXDTranspose?
Or am I misunderstanding how this layer works?
I misunderstood how this layer works. The dilation_rate parameter controls the spacing between kernel taps, to allow "à trous" (dilated) convolutions; it has nothing to do with upsampling.
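The upsampling factor is instead controlled by strides. A minimal sketch (the channel counts and input size are made up):

from keras.models import Sequential
from keras.layers import Conv2DTranspose

model = Sequential()
# strides=2 (not dilation_rate) doubles the spatial dimensions.
model.add(Conv2DTranspose(16, (3, 3), strides=2, padding='same',
                          input_shape=(8, 8, 32)))
model.summary()  # output shape: (None, 16, 16, 16)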
I was wondering whether the GaussianDropout layer in Keras rescales by the retain probability like the Dropout layer does.
The Dropout layer is implemented as inverted dropout, which scales the kept activations by the inverse of the retain probability.
If you aren't aware of the issue, have a look at the discussion, and specifically at linxihui's answer.
The crucial point that makes the Dropout layer rescale is the call to K.dropout, which is not made by the GaussianDropout layer.
Is there any reason why the GaussianDropout layer does not rescale?
Or does it rescale in another, less obvious way?
Similar (referring to the Dropout layer): Is the Keras implementation of dropout correct?
So Gaussian dropout doesn't need that rescaling step, since it is derived by applying the Central Limit Theorem to inverted dropout: the multiplicative Gaussian noise already has mean 1. The details can be found here, in the 2nd and 3rd paragraphs of the "Multiplicative Gaussian Noise" chapter (chapter 10, p. 1951).
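A quick numerical sketch of why no rescaling is needed (plain NumPy, not the Keras internals):

import numpy as np

rate = 0.5
x = np.ones(1000000)

# Inverted dropout: zero out units, then divide by (1 - rate)
# so the expected activation is unchanged.
mask = (np.random.rand(x.size) >= rate) / (1.0 - rate)
print((x * mask).mean())   # ~1.0

# Gaussian dropout: multiply by noise with mean 1 and variance
# rate / (1 - rate); the mean is already 1, so no rescaling step exists.
noise = np.random.normal(1.0, np.sqrt(rate / (1.0 - rate)), x.size)
print((x * noise).mean())  # ~1.0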
My problem is to take all hidden outputs from an LSTM and use them as training examples for a single Dense layer. Flattening the output of the hidden layers and feeding them to a Dense layer is not what I am looking to do. I have tried the following:
I have considered the TimeDistributed wrapper for the Dense layer (https://keras.io/layers/wrappers/). But this applies the same layer to every time slice, which is not what I want. In other words, the TimeDistributed wrapper takes a 3D input tensor of shape (number of samples, number of timesteps, number of features) and produces a 3D output tensor of the same kind: (number of samples, number of timesteps, number of features). Instead, what I want is a 2D output tensor of shape (number of samples * number of timesteps, number of features).
There was a pull request for an AdvancedReshapeLayer on GitHub: https://github.com/fchollet/keras/pull/36. It seems to be exactly what I am looking for, but unfortunately the pull request was closed without a conclusive outcome.
I tried to build my own Lambda layer to accomplish this:
(A) model.add(LSTM(NUM_LSTM_UNITS, return_sequences=True, activation='tanh'))
(B) model.add(Lambda(lambda x: x, output_shape=lambda x: (x[0]*x[1], x[2])))
(C) model.add(Dense(NUM_CLASSES, input_dim=NUM_LSTM_UNITS))
model.output_shape after (A) prints (BATCH_SIZE, NUM_TIME_STEPS, NUM_LSTM_UNITS), and model.output_shape after (B) prints (BATCH_SIZE*NUM_TIME_STEPS, NUM_LSTM_UNITS), which is exactly what I am trying to achieve.
Unfortunately, when I try to run step (C), I get the following error:
Input 0 is incompatible with layer dense_1: expected ndim=2, found ndim=3
This is baffling, since when I print model.output_shape after (B) I do indeed see (BATCH_SIZE*NUM_TIME_STEPS, NUM_LSTM_UNITS), which has ndim=2.
Really appreciate any help with this.
EDIT: When I try to use the functional API instead of a Sequential model, I still get the same error at step (C).
You can use a backend reshape, which includes the batch_size dimension:
from keras import backend

def backend_reshape(x):
    # Collapse batch and time: (batch, timesteps, units) -> (batch*timesteps, units).
    return backend.reshape(x, (-1, NUM_LSTM_UNITS))

model.add(Lambda(backend_reshape, output_shape=(NUM_LSTM_UNITS,)))
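For completeness, a minimal end-to-end sketch of this approach (the sizes are hypothetical placeholders):

from keras.models import Sequential
from keras.layers import LSTM, Lambda, Dense
from keras import backend

NUM_TIME_STEPS, NUM_FEATURES = 20, 8   # hypothetical sizes
NUM_LSTM_UNITS, NUM_CLASSES = 64, 10

def backend_reshape(x):
    # (batch, timesteps, units) -> (batch * timesteps, units)
    return backend.reshape(x, (-1, NUM_LSTM_UNITS))

model = Sequential()
model.add(LSTM(NUM_LSTM_UNITS, return_sequences=True, activation='tanh',
               input_shape=(NUM_TIME_STEPS, NUM_FEATURES)))
model.add(Lambda(backend_reshape, output_shape=(NUM_LSTM_UNITS,)))
model.add(Dense(NUM_CLASSES))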