The filter generated by keras Convolution2D - keras

I have a question about the function 'Convolution2D' in keras.
model.add(Convolution2D(
    nb_filter=32,
    nb_row=5,
    nb_col=5,
    border_mode='same',
    input_shape=(1, 28, 28),
))
By doing this, 32 filters of size 5x5 will be used to convolve the input. But only the size of the filters is specified; what do these filters look like? Are they all the same, or does each one contain different random numbers?

Each of the 32 filters is convolved over the layer input independently; they are not all the same. Each filter is initialized using the kernel_initializer, which is glorot_uniform by default.
From the documentation on glorot_uniform:
It draws samples from a uniform distribution within [-limit, limit], where limit is sqrt(6 / (fan_in + fan_out)), fan_in is the number of input units in the weight tensor, and fan_out is the number of output units in the weight tensor.
Note that the filters change during the training of the layer: they are optimized to recognize features that help your model make correct classifications.
I found a good explanation here.
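As a quick check, here is a minimal sketch (using the modern Conv2D API, which replaced Convolution2D) showing that each of the 32 filters starts out with different random values drawn by glorot_uniform:
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D

model = Sequential()
model.add(Conv2D(32, (5, 5), padding='same', input_shape=(28, 28, 1)))

# weights has shape (5, 5, 1, 32): one 5x5 kernel per input channel
# and output filter; biases has shape (32,)
weights, biases = model.layers[0].get_weights()
print(weights.shape)  # (5, 5, 1, 32)

# The first two filters differ, i.e. they are not all the same:
print(np.allclose(weights[..., 0], weights[..., 1]))  # False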

Related

when to use minmaxscaler to re-scale input data (LSTM, KERAS)

The smallest value in my training dataset is 0.1 and the highest is about 500. My dataset has about 1500 rows and 9 columns.
I'm not sure about this, but is it mandatory to rescale the input data into [0, 1] (with MinMaxScaler, for example), or does it just speed up training?
And a second question: does the need for scaling depend on the model used (LSTM, Dense, etc.), or does it apply to any model? For example, my system is:
model = Sequential()
model.add(LSTM(10, input_shape=(12,12),return_sequences=True, activation='tanh'))
model.add(LSTM(10,return_sequences=False,activation='tanh'))
model.add(Dense(5))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=100, batch_size=1, verbose=2)
Scaling your data for ML is done for all types of applications. It's meant to help the model converge faster. You can check out this link for a detailed explanation as to the benefits of feature scaling.
There are different ways you can scale the data, such as min-max or standard scaling; both of which are applicable for your model. If you know you have a fixed min and max in your dataset (e.g. images), you can use min-max scaling to fix your input and/or output data to be between 0 and 1.
For other applications where you do not have fixed bounds, standard scaling is useful. This gives all of your features zero-mean and unit variance. Therefore, the distributions of inputs and/or outputs are the same, and the model can treat them as such. If there is no scaling performed, the model will essentially be forced to think certain features are more important than others, rather than being able to learn those things.
The scaling for your outputs is important in defining the activation function for the output layer. If you have min-max scaled outputs, you can use sigmoid, because it bounds the outputs to between 0 and 1. If you are using standard scaling for the outputs, you would want to be sure you use a linear activation function, because technically standard-scaled outputs are not bounded. The choice of output activation is important, and knowledge of how your outputs are scaled is important in determining which activation to use.
Note: even if you had min-max scaling for your outputs, that does not restrict the activations you can use for your hidden layers.
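To make the two options concrete, here is a short sketch using scikit-learn. The array shape matches the dataset described in the question (1500 rows, 9 columns), but the data here is random and purely illustrative:
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

trainX = np.random.uniform(0.1, 500.0, size=(1500, 9))

# Min-max scaling: maps each feature into [0, 1].
mm = MinMaxScaler()
trainX_mm = mm.fit_transform(trainX)

# Standard scaling: zero mean, unit variance per feature.
ss = StandardScaler()
trainX_ss = ss.fit_transform(trainX)

# Fit on the training data only, then reuse the same scaler for the
# test data so both are transformed consistently:
# testX_mm = mm.transform(testX)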

Normalization of input data in Keras

One common task in DL is normalizing input samples to zero mean and unit variance. One can "manually" perform the normalization using code like this:
import numpy as np

mean = np.mean(X, axis=0)
std = np.std(X, axis=0)
X = (X - mean) / std
However, then one must keep the mean and std values around, to normalize the testing data, in addition to the Keras model being trained. Since the mean and std are learnable parameters, perhaps Keras can learn them? Something like this:
m = Sequential()
m.add(SomeKerasLayerForNormalizing(...))  # hypothetical normalizing layer
m.add(Conv2D(20, (5, 5), input_shape=(21, 100, 3), padding='valid'))
# ... rest of network
m.add(Dense(1, activation = 'sigmoid'))
I hope you understand what I'm getting at.
Add BatchNormalization as the first layer and it works as expected, though not exactly like the OP's example. You can see the detailed explanation here.
Both the OP's example and batch normalization use a learned mean and standard deviation of the input data during inference. But the OP's example uses a simple mean that gives every training sample equal weight, while the BatchNormalization layer uses a moving average that gives recently-seen samples more weight than older samples.
Importantly, batch normalization works differently from the OP's example during training. During training, the layer normalizes its output using the mean and standard deviation of the current batch of inputs.
A second distinction is that the OP's code produces an output with a mean of zero and a standard deviation of one. Batch Normalization instead learns a mean and standard deviation for the output that improves the entire network's loss. To get the behavior of the OP's example, Batch Normalization should be initialized with the parameters scale=False and center=False.
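Putting that together, a minimal sketch of the suggestion: BatchNormalization as the first layer, with scale=False and center=False so the layer only normalizes and does not learn an extra affine transform. The input shape and following layers are taken from the OP's snippet; the rest is a plausible filler.
from keras.models import Sequential
from keras.layers import BatchNormalization, Conv2D, Flatten, Dense

m = Sequential()
m.add(BatchNormalization(scale=False, center=False,
                         input_shape=(21, 100, 3)))
m.add(Conv2D(20, (5, 5), padding='valid'))
m.add(Flatten())
m.add(Dense(1, activation='sigmoid'))
m.compile(loss='binary_crossentropy', optimizer='adam')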
There's now a Keras layer for this purpose, Normalization. At time of writing it is in the experimental module, keras.layers.experimental.preprocessing.
https://keras.io/api/layers/preprocessing_layers/core_preprocessing_layers/normalization/
Before you use it, you call the layer's adapt method with the data X you want to derive the scale from (i.e. mean and standard deviation). Once you do this, the scale is fixed (it does not change during training). The scale is then applied to the inputs whenever the model is used (during training and prediction).
import keras
from keras.layers.experimental.preprocessing import Normalization
norm_layer = Normalization()
norm_layer.adapt(X)
model = keras.Sequential()
model.add(norm_layer)
# ... Continue as usual.
Maybe you can use sklearn.preprocessing.StandardScaler to scale your data. This object lets you save the scaling parameters, and you can then feed mixed inputs into your model, say:
Your_model
[param1_scaler, param2_scaler]
Here is a link https://www.pyimagesearch.com/2019/02/04/keras-multiple-inputs-and-mixed-data/
https://keras.io/getting-started/functional-api-guide/
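A sketch of that idea: fit a StandardScaler, persist it with joblib, and reload it at inference time so the exact same scaling is applied. The file name and data here are just examples.
import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.random.rand(100, 9)

scaler = StandardScaler().fit(X_train)
joblib.dump(scaler, 'input_scaler.pkl')

# Later, at prediction time:
scaler = joblib.load('input_scaler.pkl')
X_new_scaled = scaler.transform(np.random.rand(5, 9))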
There's BatchNormalization, which learns the mean and standard deviation of the input. I haven't tried using it as the first layer of the network, but as I understand it, it should do something very similar to what you're looking for.

In 'ResNet50 model for Keras', why use 1x1 convolution with stride = 2?

ResNet50 is here: https://github.com/fchollet/deep-learning-models/blob/master/resnet50.py
In a 'conv_block', the first layer is like this:
x = Conv2D(filters=64,          # number of filters
           kernel_size=(1, 1),  # height/width of filters
           strides=(2, 2)       # stride
           )(input_tensor)
My question is:
Isn't this layer going to miss some pixels?
This 1x1 convolution only looks at 1 pixel at a time, and then moves 2 pixels (stride=2).
It was mentioned in the original ResNet paper:
The convolutional layers mostly have 3×3 filters and follow two simple design rules: (i) for the same output feature map size, the layers have the same number of filters; and (ii) if the feature map size is halved, the number of filters is doubled so as to preserve the time complexity per layer. We perform downsampling directly by convolutional layers that have a stride of 2.
So you may consider it a replacement for a pooling layer, and it also reduces the computational cost of the model compared with computing the whole activation map and then pooling it. And yes, a 1x1 convolution with stride 2 does skip three quarters of the positions; that is exactly how the downsampling is achieved.
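As a quick sanity check, a sketch (shapes chosen arbitrarily, not taken from the ResNet50 source) showing that the stride-2 1x1 convolution halves the spatial dimensions:
from keras.layers import Input, Conv2D
from keras.models import Model

inp = Input(shape=(56, 56, 256))
out = Conv2D(filters=64, kernel_size=(1, 1), strides=(2, 2))(inp)

# Spatial dimensions are halved, like a pooling layer would do:
print(Model(inp, out).output_shape)  # (None, 28, 28, 64)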

How is a filter assigned with feature in CNN? (or is it assigned?)

Let's say the first conv layer has 32 filters of size 5x5 with a stride of 1.
model.add(Conv2D(32, (5, 5), input_shape=input_shape))
Let's say the image is of size 32x32x3 (channels). So when a filter convolves with a part of an image, is it already looking for a specific feature? I understand that the filter matrix is initialized with random numbers. But do they already have a sort of purpose, something they are looking for? Could you explain how features are detected in a CNN?
The goal of a convolutional layer is filtering. As we move over an image, we effectively check for patterns in that section of the image. This works because of filters: stacks of weights represented as a vector, which are multiplied by the values output by the convolution. During training these weights change, so when it is time to evaluate an image, they return high values if the network thinks it is seeing a pattern it has seen before. The combination of high weights from various filters lets the network predict the content of an image.
So, when a filter convolves with a part of an image, at first it doesn't know whether that part is a feature or not. Through training and weight updates, the filters adapt to the features in the images so that the loss against the ground truth is minimized. The weights are initialized randomly simply because they will be changed anyway, until the predicted value is as close as possible to the given label.
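A small sketch of that point: the filters start as random numbers and only acquire a "purpose" through training. The data here is random and purely illustrative; one gradient step is enough to see the weights move.
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (5, 5), input_shape=(32, 32, 3)))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

before = model.layers[0].get_weights()[0].copy()

# One gradient step on dummy data nudges the filter weights.
X = np.random.rand(8, 32, 32, 3)
y = np.eye(10)[np.random.randint(0, 10, 8)]
model.train_on_batch(X, y)

after = model.layers[0].get_weights()[0]
print(np.allclose(before, after))  # False: the filters have changed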

Keras : order in which dimensions of an input tensor is specified

Consider an input layer in keras as:
model.add(layers.Dense(32, input_shape=(784,)))
What this says is that the input is a 2D tensor where axis=0 (the batch dimension) is left unspecified, while axis=1 is 784. Axis=0 can take any value.
My question is: isn't this style confusing?
Ideally, should it not be
input_shape=(?,784)
This would reflect that axis=0 is a wildcard while axis=1 must be 784.
Is there any particular reason why it is so? Am I missing something here?
The consistency in this case is between the sizes of the layers and the size of the input. In general, the shapes are assumed to represent the nature of the data; in that sense, the batch dimension is not part of the data itself, but rather how you group it for training or evaluation. So, in your code snippet, it is quite clear that you have inputs with 784 features and a first layer producing a vector of 32 features. If you want to explicitly include the batch dimension, you can use instead batch_input_shape=(None, 784) (this is sometimes necessary, for example if you want to give batches of a fixed size but with an additional time dimension of unknown size). This is explained in the Sequential model guide, but also matches the documentation of the Input layer, where you can give a shape or batch_shape parameter (analogous to input_shape or batch_input_shape).
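A short sketch of the two equivalent ways to declare the input shape mentioned above: input_shape omits the batch dimension, while batch_input_shape spells it out (None means "any batch size").
from keras.models import Sequential
from keras.layers import Dense

m1 = Sequential()
m1.add(Dense(32, input_shape=(784,)))

m2 = Sequential()
m2.add(Dense(32, batch_input_shape=(None, 784)))

# Both resolve to the same full input shape:
print(m1.layers[0].input_shape)  # (None, 784)
print(m2.layers[0].input_shape)  # (None, 784)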