How to implement Conv1DTranspose in Keras?

I know there is a Conv2DTranspose layer in Keras that can be used for images. We need to use it for NLP, so a 1D deconvolution is needed.
How do we implement Conv1DTranspose in Keras?

Use the Keras backend to fit the input tensor to a 2D transposed convolution: add a dummy axis, apply Conv2DTranspose, then squeeze the dummy axis out again. Avoid relying on transpose operations, since they consume a lot of time.
import keras.backend as K
from keras.layers import Conv2DTranspose, Lambda

def Conv1DTranspose(input_tensor, filters, kernel_size, strides=2, padding='same'):
    """
    input_tensor: tensor, with the shape (batch_size, time_steps, dims)
    filters: int, output dimension, i.e. the output tensor will have the shape (batch_size, new_steps, filters), where new_steps = time_steps * strides for padding='same'
    kernel_size: int, size of the convolution kernel
    strides: int, convolution step size
    padding: 'same' | 'valid'
    """
    # add a dummy axis: (batch_size, time_steps, 1, dims)
    x = Lambda(lambda x: K.expand_dims(x, axis=2))(input_tensor)
    x = Conv2DTranspose(filters=filters, kernel_size=(kernel_size, 1), strides=(strides, 1), padding=padding)(x)
    # drop the dummy axis again
    x = Lambda(lambda x: K.squeeze(x, axis=2))(x)
    return x

In my answer, I assume you were previously using Conv1D for the convolution.
Conv2DTranspose is new in Keras 2; before that, what it does was achieved by combining UpSampling2D with a convolution layer. On Data Science Stack Exchange there is a very interesting discussion about what deconvolutional layers are (one answer includes very useful animated GIFs).
Check this discussion about "Why all convolutions (no deconvolutions) in 'Building Autoencoders in Keras'", which is also interesting. Here is an excerpt: "As Francois has explained multiple times already, a deconvolution layer is only a convolution layer with an upsampling. I don't think there is an official deconvolution layer. The result is the same." (The discussion goes on; it might be that they are approximately, not exactly, the same. Also, since then, Keras 2 has introduced Conv2DTranspose.)
The way I understand it, a combination of UpSampling1D followed by Convolution1D is what you are looking for; I see no reason to go to 2D.
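To make the upsample-then-convolve idea concrete, here is a minimal NumPy sketch for a single channel. The helpers `upsample_1d` and `conv1d_valid` are hypothetical names that only mimic the core arithmetic of UpSampling1D and a 'valid', stride-1 Conv1D; they are not the Keras implementations.

```python
import numpy as np

def upsample_1d(x, size=2):
    """Repeat every time step `size` times (nearest-neighbour upsampling)."""
    return np.repeat(x, size, axis=0)

def conv1d_valid(x, kernel):
    """Sliding dot product over one channel ('valid' padding, stride 1)."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

x = np.array([1.0, 2.0, 3.0])
up = upsample_1d(x, size=2)                   # each step now appears twice
y = conv1d_valid(up, np.array([0.5, 0.5]))    # averaging kernel blends neighbours
```

The sequence length roughly doubles, and the convolution that follows is where the learnable weights would live.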
If, however, you want to go with Conv2DTranspose, you will first need to Reshape the input from 1D to 2D, e.g.:
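import keras
from keras.models import Sequential
from keras.layers import Conv1D, Reshape

# seq_length, M and kernel_size are assumed to be defined beforehand
model = Sequential()
model.add(
    Conv1D(
        filters=3,
        kernel_size=kernel_size,
        input_shape=(seq_length, M),  # when using this layer as the first layer in a model, provide an input_shape argument
    )
)
model.add(
    # add a dummy axis; the last dimension must match the 3 channels produced by Conv1D above
    Reshape((-1, 1, 3))
)
model.add(
    keras.layers.Conv2DTranspose(
        filters=M,
        kernel_size=(10, 1),
        data_format="channels_last"
    )
)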
The inconvenient part of using Conv2DTranspose is that you need to specify seq_length and cannot leave it as None (arbitrary-length series).
Unfortunately, the same is true of UpSampling1D with the TensorFlow backend (Theano once again seems to be better here; too bad it's not going to be around).

In TensorFlow v2.2.0 the Conv1DTranspose layer has been implemented in the tf.keras.layers API. Check it out!
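For intuition about what a 1D transposed convolution computes, independent of any Keras version, here is a NumPy sketch for a single channel with no padding. `conv1d_transpose` is a hypothetical helper written for illustration, not the tf.keras layer.

```python
import numpy as np

def conv1d_transpose(x, kernel, stride=2):
    """Scatter a scaled copy of `kernel` into the output for every input step."""
    k = len(kernel)
    out = np.zeros((len(x) - 1) * stride + k)
    for i, value in enumerate(x):
        # consecutive input steps land `stride` positions apart and may overlap
        out[i * stride:i * stride + k] += value * kernel
    return out

x = np.array([1.0, 2.0, 3.0])
kernel = np.array([1.0, 1.0, 1.0])
y = conv1d_transpose(x, kernel, stride=2)
```

In this sketch the output length is (len(x) - 1) * stride + kernel_size, so three input steps become seven output steps, with overlapping contributions summed.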

You can reshape it to add an extra dimension, run the deconvolution, and then reshape it back. In practice, this works. I have not thought very hard about whether it has any theoretical implications, but it seems to be fine theoretically as well, since you are not going to "convolve" over that dimension.
x = Reshape( ( -1, 1 ) )( x )
x = Permute( ( 3, 1, 2 ) )( x )
x = Conv2DTranspose( filters, kernel )( x )
x = Lambda( K.squeeze, arguments={"axis":1} )( x )

Related

Calculate variance with a kernel size in a tensor

Just as nn.Conv2d or nn.AvgPool2d operate on a tensor with a kernel size, I would like to calculate the variances of a tensor over kernel-sized windows. How can I achieve this? Would I need to modify the PyTorch source code?
If it's only the variance you are after, you can use the fact that
var(x) = E[x^2] - E[x]^2
Using avg_pool2d you can estimate the local average of x and of x squared:
import torch
import torch.nn.functional as nnf
running_var = nnf.avg_pool2d(x**2, kernel_size=2, stride=1) - nnf.avg_pool2d(x, kernel_size=2, stride=1)**2
However, if you want a more general method of performing "sliding window" operations, you should familiarize yourself with unfold and fold:
u = nnf.unfold(x, kernel_size=2, stride=1) # get all kernel_size patches as vectors
running_var2 = torch.var(u, unbiased=False, dim=1)
# reshape back to the spatial layout ("folding"); H-1 and W-1 follow from kernel_size=2, stride=1
running_var2 = running_var2.reshape(x.shape[0], 1, x.shape[2]-1, x.shape[3]-1)
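The identity behind the pooling trick can be checked without PyTorch. The sketch below uses NumPy's `sliding_window_view` (available in NumPy 1.20 and later) on a made-up 4x4 array and compares E[x^2] - E[x]^2 against the variance computed directly on each 2x2 window.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(16, dtype=np.float64).reshape(4, 4) ** 2   # toy 4x4 "image"
windows = sliding_window_view(x, (2, 2))                  # shape (3, 3, 2, 2)

mean = windows.mean(axis=(2, 3))
mean_of_squares = (windows ** 2).mean(axis=(2, 3))
var_identity = mean_of_squares - mean ** 2                # E[x^2] - E[x]^2
var_direct = windows.var(axis=(2, 3))                     # population variance (ddof=0)
```

Both give the population (biased) variance, which is why the unfold version above passes unbiased=False. Subtracting two large, nearly equal averages can lose precision, so the direct form may be preferable when values are large.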

output layer regularization implementation

I’m building a NN model using Keras, and I wish to impose a constraint on it that doesn’t (directly) have to do with the weights. I would be very grateful for some help, or for pointers towards relevant keywords to look up. The constraint I wish to impose is a bit complex, but it can be simplified as follows: I wish to impose a constraint on the output of the net for certain inputs. For the sake of simplicity, let’s say the constraint looks like NN(3)+NN(4) < 10, where NN is the neural net, which can be seen as a function. How can I impose such a constraint? Thank you very much in advance for any help on the subject!
edit: A more detailed explanation of what I'm trying to do and why.
The theoretical model I'm building is this:
I'm feeding the output of the first net into the input of the second net, along with an additive gaussian noise.
The constraint I wish to impose is on the output of the first NN (g). Why? Without a constraint, the net maps the inputs to outputs as high as it possibly can in order to make the additive noise as insignificant as possible. And rightly so, this is the optimal encoding function g, but it's not very interesting :) And so I wish to impose a constraint on the output of the first NN (g). More specifically, the constraint is on the total power of the function: integral{ fX(x) * g(x)^2 dx }. But this can be simplified more or less, to a function that looks something like what I described earlier - g(3)+g(4)<10. More specifically, the function is sum { fX(x) * g(i)^2 * dx } < max_power, for some sampled inputs i.
This is the problem, now here's how I attempted to implement it:
from keras.models import Sequential
from keras.layers import Dense, GaussianNoise

model = Sequential([
    Dense(300, input_dim=1, activation='relu'),
    Dense(300, activation='relu'),
    Dense(1, activation='linear', name='encoder_output'),
    GaussianNoise(nvar, name='noise'),
    Dense(300, activation='relu', name='decoder_input'),
    Dense(300, activation='relu'),
    Dense(1, activation='linear', name='decoder_output'),
])
Mainly, this is supposedly a single neural net, and not really 2 (although there is no difference obviously).
The important things to note are the input dim of 1, the output dim of 1 (x and y in the diagram), and the Gaussian noise in the middle. The hidden layers are not very interesting right now; I'll optimize them at a later point.
In this model, I wish to impose a constraint on the output of a (supposedly) hidden layer named encoder_output. Hope this clarifies things.
You could use a multi input/multi output model with shared weights layers. The model could for example look like this:
import numpy as np
from keras.layers import Input, Dense, Add
from keras.models import Model
# Shared weights layers
hidden = Dense(10, activation='relu')
nn_output = Dense(1, activation='relu')
x1 = Input(shape=(1,))
h1 = hidden(x1)
y1 = nn_output(h1)
x2 = Input(shape=(1,))
h2 = hidden(x2)
y2 = nn_output(h2)
# Your constraint
# In case it should be more complicated, you can implement
# a custom keras layer
y_sum = Add()([y1, y2])
model = Model(inputs=[x1, x2], outputs=[y1, y2, y_sum])
model.compile(optimizer='sgd', loss='mse')
X_train_1 = np.array([3, 4])
X_train_2 = np.array([4, 3])
y_train_1 = np.array([123, 456])  # your expected output
y_train_2 = np.array([456, 123])  # your expected output
s = np.array([10, 10])  # expected sums
model.fit([X_train_1, X_train_2], [y_train_1, y_train_2, s], epochs=10)
If you have no exact value for your constraint that can be used as an expected output, you can remove it from the outputs and write a simple custom regularizer that would be used on it. There is a simple example for a custom regularizer in the Keras documentation.
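As a rough illustration of what such a penalty could compute for the power constraint in the question, here is a NumPy sketch rather than an actual Keras regularizer. The name `power_penalty`, the density values `p`, the `weight` factor and the numbers used are all assumptions made for this example.

```python
import numpy as np

def power_penalty(g_values, p, dx, max_power, weight=1.0):
    """Hinge-style penalty: zero while sum(p * g^2 * dx) <= max_power."""
    power = np.sum(p * g_values ** 2 * dx)
    return weight * max(0.0, power - max_power)

g_values = np.array([3.0, 4.0])   # encoder outputs g(i) at the sampled inputs
p = np.array([0.5, 0.5])          # assumed density values fX(i)
inside = power_penalty(g_values, p, dx=1.0, max_power=20.0)    # power 12.5, within budget
outside = power_penalty(g_values, p, dx=1.0, max_power=10.0)   # power 12.5, over budget
```

In Keras the same expression would be written with backend ops and added to the loss, for instance as an activity regularizer on the encoder_output layer. A soft penalty like this discourages violations but does not strictly enforce the bound.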

how to obtain the runtime batch size of a Keras model

Based on this post. I need some basic implementation help. Below you see my model using a Dropout layer. When using the noise_shape parameter, it happens that the last batch does not fit into the batch size creating an error (see other post).
Original model:
def LSTM_model(X_train,Y_train,dropout,hidden_units,MaskWert,batchsize):
    model = Sequential()
    model.add(Masking(mask_value=MaskWert, input_shape=(X_train.shape[1],X_train.shape[2]) ))
    model.add(Dropout(dropout, noise_shape=(batchsize, 1, X_train.shape[2]) ))
    model.add(Dense(hidden_units, activation='sigmoid', kernel_constraint=max_norm(max_value=4.) ))
    model.add(LSTM(hidden_units, return_sequences=True, dropout=dropout, recurrent_dropout=dropout))
Alexandre Passos suggested getting the runtime batch size with tf.shape. I tried to implement the runtime batch size idea in Keras in different ways, but none of them worked.
import keras.backend as K

def backend_shape(x):
    return K.shape(x)

def LSTM_model(X_train,Y_train,dropout,hidden_units,MaskWert,batchsize):
    batchsize=backend_shape(X_train)
    model = Sequential()
    ...
    model.add(Dropout(dropout, noise_shape=(batchsize[0], 1, X_train.shape[2]) ))
    ...
But that just gave me the input tensor shape, not the runtime input tensor shape.
I also tried to use a Lambda layer:
def output_of_lambda(input_shape):
    return (input_shape)

def LSTM_model_2(X_train,Y_train,dropout,hidden_units,MaskWert,batchsize):
    model = Sequential()
    model.add(Lambda(output_of_lambda, outputshape=output_of_lambda))
    ...
    model.add(Dropout(dropout, noise_shape=(outputshape[0], 1, X_train.shape[2]) ))
And different variants. But as you already guessed, that did not work at all.
Is the model definition actually the correct place?
Could you give me a tip or better just tell me how to obtain the running batch size of a Keras model? Thanks so much.
The current implementation does adjust the noise shape according to the runtime batch size. From the Dropout layer implementation code:
symbolic_shape = K.shape(inputs)
noise_shape = [symbolic_shape[axis] if shape is None else shape
               for axis, shape in enumerate(self.noise_shape)]
So if you give noise_shape=(None, 1, features) the shape will be (runtime_batchsize, 1, features) following the code above.
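The substitution rule can be tried out in plain Python. In this sketch `resolve_noise_shape` is a hypothetical stand-in for the list comprehension quoted above, with ordinary tuples in place of symbolic tensors, and the batch sizes are made up.

```python
def resolve_noise_shape(noise_shape, runtime_shape):
    """Replace every None in `noise_shape` with the matching runtime dimension."""
    return [runtime_shape[axis] if dim is None else dim
            for axis, dim in enumerate(noise_shape)]

# A full batch of 32 samples and a smaller final batch of 7 samples
full_batch = resolve_noise_shape((None, 1, 16), runtime_shape=(32, 50, 16))
last_batch = resolve_noise_shape((None, 1, 16), runtime_shape=(7, 50, 16))
```

Because the batch entry is filled in per call, a smaller final batch no longer clashes with a hard-coded batch size.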

Keras conv1d layer parameters: filters and kernel_size

I am very confused by these two parameters in the conv1d layer from keras:
https://keras.io/layers/convolutional/#conv1d
the documentation says:
filters: Integer, the dimensionality of the output space (i.e. the number output of filters in the convolution).
kernel_size: An integer or tuple/list of a single integer, specifying the length of the 1D convolution window.
But that does not seem to relate to the standard terminologies I see on many tutorials such as https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks/ and https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/
Using the second tutorial link which uses Keras, I'd imagine that in fact 'kernel_size' is relevant to the conventional 'filter' concept which defines the sliding window on the input feature space. But what about the 'filter' parameter in conv1d? What does it do?
For example, in the following code snippet:
model.add(embedding_layer)
model.add(Dropout(0.2))
model.add(Conv1D(filters=100, kernel_size=4, padding='same', activation='relu'))
Suppose the embedding layer outputs a matrix of dimension 50 (rows, each row is a word in a sentence) x 300 (columns, the word vector dimension). How does the conv1d layer transform that matrix?
Many thanks
You're right to say that kernel_size defines the size of the sliding window.
The filters parameter is just how many different windows you will have (all of them with the same length, which is kernel_size), i.e. how many different results or channels you want to produce.
When you use filters=100 and kernel_size=4, you are creating 100 different filters, each of them with length 4. The result will bring 100 different convolutions.
Also, each filter has enough parameters to consider all input channels.
The Conv1D layer expects these dimensions:
(batchSize, length, channels)
I suppose the best way to use it is to have the number of words in the length dimension (as if the words in order formed a sentence), and the channels be the output dimension of the embedding (numbers that define one word).
So:
batchSize = number of sentences
length = number of words in each sentence
channels = dimension of the embedding's output.
The convolutional layer will apply 100 different filters. Each filter slides along the length dimension (word by word, in groups of 4), considering all the channels that define each word.
The outputs are shaped as:
(number of sentences, 50 words, 100 output dimension or filters)
The filters are shaped as:
(4 = length, 300 = word vector dimension, 100 output dimension of the convolution)
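These shapes can be reproduced with a small NumPy sketch for a single sentence. The random values are placeholders, and the padding split (one step on the left, two on the right) is an assumption about how 'same' padding distributes an odd amount of padding.

```python
import numpy as np

words, embed_dim = 50, 300        # one sentence from the question
filters, kernel_size = 100, 4

rng = np.random.default_rng(0)
sentence = rng.normal(size=(words, embed_dim))
kernel = rng.normal(size=(kernel_size, embed_dim, filters))
bias = np.zeros(filters)

# 'same' padding at stride 1: pad so that the output keeps all 50 positions
pad_total = kernel_size - 1
pad_left = pad_total // 2
padded = np.pad(sentence, ((pad_left, pad_total - pad_left), (0, 0)))

# Each output step combines a 4-word window over all 300 channels, once per filter
output = np.stack([
    np.tensordot(padded[i:i + kernel_size], kernel, axes=([0, 1], [0, 1])) + bias
    for i in range(words)
])
n_params = kernel.size + bias.size    # weights plus one bias per filter
```

The kernel holds 4 * 300 * 100 weights plus 100 biases, which is what "each filter has enough parameters to consider all input channels" amounts to.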
The code below may help illustrate the explanation. I ran into a similar question and answered it myself.
from tensorflow.keras.layers import MaxPool1D
import tensorflow.keras.backend as K
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Conv1D
tf.random.set_seed(1) # nowadays instead of tf.set_random_seed(1)
batch,rows,cols = 3,8,3
m, n, k = batch, rows, cols
input_shape = (batch,rows,cols)
np.random.seed(132) # nowadays instead of np.set_random_seed = 132
data = np.random.randint(low=1,high=6,size=input_shape,dtype='int32')
data = np.float32(data)
data = tf.constant(data)
print("Data:")
print(K.eval(data))
print()
print(f'm,n,k:{input_shape}')
#############################
# Understanding filters and kernel_size
##############################
num_filters=5
kernel_size= 3
'''
A few notes about kernel_size:
1. max_kernel_size == max_rows
2. Since this is Conv1D, the kernel is a matrix of 1's with kernel_size rows:
   if kernel_size = 1, [[1,1,1..]]
   if kernel_size = 2, [[1,1,1..][1,1,1,..]]
   if kernel_size = 3, [[1,1,1..][1,1,1,..][1,1,1,..]]
I have chosen tf.keras.initializers.constant(1) to create a matrix of ones.
The number of rows of the matrix is kernel_size.
'''
y= Conv1D(filters=num_filters,kernel_size=kernel_size,
          kernel_initializer=tf.keras.initializers.constant(1),
          #glorot_uniform(seed=12)
          input_shape=(n,k)  # (rows, cols) of a single sample
          )(data)
#########################
# Checking the out outcome
#########################
print(K.eval(y))
print(f' Resulting output_shape == (batch_size, num_rows-kernel_size+1,num_filters): {y.shape}')
# # Verification
# Sum along axis=2, and then along axis=1; this equals the Conv1D output only when kernel_size == rows
K.eval(tf.math.reduce_sum(data, axis=(2, 1), keepdims=True))
###########################################
# Understanding MaxPool and Strides in
##########################################
pool = MaxPool1D(pool_size=3,strides=3)(y)
print(K.eval(pool))
print(f'Shape of Pool: {pool.shape}')

Merge a forward lstm and a backward lstm in Keras

I would like to merge a forward LSTM and a backward LSTM in Keras. The input array of the backward LSTM is different from that of a forward LSTM. Thus, I cannot use keras.layers.Bidirectional.
The forward input is (10, 4).
The backward input is (12, 4), and it is reversed before being put into the model. I would like to reverse it again after the LSTM and merge it with the forward output.
The simplified model is as follows.
from lambdawithmask import Lambda as MaskLambda
def reverse_func(x, mask=None):
    return tf.reverse(x, [False, True, False])
forward = Sequential()
backward = Sequential()
model = Sequential()
forward.add(LSTM(input_shape = (10, 4), output_dim = 4, return_sequences = True))
backward.add(LSTM(input_shape = (12, 4), output_dim = 4, return_sequences = True))
backward.add(MaskLambda(function=reverse_func, mask_function=reverse_func))
model.add(Merge([forward, backward], mode = "concat", concat_axis = 1))
When I run this, the error message is:
Tensors in list passed to 'values' of 'ConcatV2' Op have types [bool, float32] that don't all match.
Could anyone help me? I coded in Python 3.5.2 with Keras (2.0.5) and the backend is tensorflow (1.2.1).
First of all, if you have two different inputs, you cannot use a Sequential model. You must use the functional API Model:
from keras.models import Model
The two first models can be sequential, no problem, but the junction must be a regular model. When it's about concatenating, I also use the functional approach (create the layer, then pass the input):
junction = Concatenate(axis=1)([forward.output,backward.output])
Why axis=1? You can only concatenate things with the same shape. Since you have 10 and 12, they're not compatible unless you use this exact axis for the merge, which is the second axis, considering you have (BatchSize, TimeSteps, Units)
For creating the final model, use the Model, specify the inputs and outputs:
model = Model([forward.input,backward.input], junction)
In the model to be reversed, simply use a Lambda layer. A MaskLambda does more than just the function you want. I also suggest you use the Keras backend instead of TensorFlow functions:
import keras.backend as K
#instead of the MaskLambda:
backward.add(Lambda(lambda x: K.reverse(x, axes=[1]), output_shape=(12, ?)))
Here, the ? is the amount of units your LSTM layers have. See PS at the end.
PS: I'm not sure output_dim is useful in the LSTM layer. It's necessary in Lambda layers, but I never use it anywhere else. Shapes are natural consequences of the amount of "units" you put in your layers. Strangely, you didn't specify the amount of units.
PS2: How exactly do you want to concatenate two sequences with different sizes?
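The shape reasoning behind axis=1 can be checked with plain NumPy arrays standing in for the two LSTM outputs. The batch size of 2 and the array contents are made up for illustration.

```python
import numpy as np

batch = 2
forward_out = np.zeros((batch, 10, 4))    # stands in for the forward LSTM output
backward_out = np.arange(batch * 12 * 4, dtype=float).reshape(batch, 12, 4)

restored = backward_out[:, ::-1, :]       # reverse the time axis only
merged = np.concatenate([forward_out, restored], axis=1)   # join along time steps
```

Concatenating along axis 1 works because only the time dimension differs (10 versus 12); the result has 22 time steps. Concatenating along the last axis would fail for these shapes.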
As said in the answer above, using the functional API offers much more flexibility for multi-input/multi-output models. You can simply set the go_backwards argument to True to make the LSTM layer traverse the input sequence in reverse.
I have defined the smart_merge function below which merges the forward and backward LSTM layers together along with handling the single traversal case.
from keras.models import Model
from keras.layers import Input, LSTM, merge

def smart_merge(vectors, **kwargs):
    # with a single vector there is nothing to merge
    return vectors[0] if len(vectors)==1 else merge(vectors, **kwargs)
input1 = Input(shape=(10,4), dtype='int32')
input2 = Input(shape=(12,4), dtype='int32')
LtoR_LSTM = LSTM(56, return_sequences=False)
LtoR_LSTM_vector = LtoR_LSTM(input1)
RtoL_LSTM = LSTM(56, return_sequences=False, go_backwards=True)
RtoL_LSTM_vector = RtoL_LSTM(input2)
BidireLSTM_vector = [LtoR_LSTM_vector]
BidireLSTM_vector.append(RtoL_LSTM_vector)
BidireLSTM_vector= smart_merge(BidireLSTM_vector, mode='concat')

Resources