I am using a Siamese architecture in my model to classify whether the two inputs are similar.
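from keras.layers import Input, Embedding, Bidirectional, LSTM, Dense, concatenate
from keras.models import Model

# dim, n_symbols and embedding_weights are defined elsewhere
in1 = Input(shape=(None,), dtype='int32', name='in1')
x1 = Embedding(output_dim=dim, input_dim=n_symbols, input_length=None,
               weights=[embedding_weights], name='x1')(in1)
in2 = Input(shape=(None,), dtype='int32', name='in2')
x2 = Embedding(output_dim=dim, input_dim=n_symbols, input_length=None,
               weights=[embedding_weights], name='x2')(in2)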
l = Bidirectional(LSTM(units=100, return_sequences=False))
y1 = l(x1)
y2 = l(x2)
y = concatenate([y1, y2])
out = Dense(1, activation='sigmoid')(y)
model = Model(inputs=[in1, in2], outputs=[out])
It works correctly, as the number of weights to be trained remains the same even when I use a single input. What confused me, though, was the TensorBoard visualization of the model.
tensorboard graph
Shouldn't both x1 and x2 map to the same bidirectional node?
Also, what do the 18 and 32 tensors signify?
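Independently of how TensorBoard draws the graph, the sharing itself can be checked from the model object. The following is a minimal sketch, assuming small hypothetical sizes (a vocabulary of 50, embedding width 8, 4 LSTM units) and randomly initialized embeddings instead of the pretrained embedding_weights from the question. It only confirms that the Bidirectional layer is registered once and that its parameters are counted once; it does not explain the tensor counts shown on the graph edges.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical sizes, chosen only to keep the example small.
n_symbols, dim, units = 50, 8, 4

in1 = keras.Input(shape=(None,), dtype="int32", name="in1")
in2 = keras.Input(shape=(None,), dtype="int32", name="in2")
x1 = layers.Embedding(input_dim=n_symbols, output_dim=dim, name="x1")(in1)
x2 = layers.Embedding(input_dim=n_symbols, output_dim=dim, name="x2")(in2)

# One layer object, called twice: both branches use the same weights.
shared = layers.Bidirectional(layers.LSTM(units), name="shared_bilstm")
y1 = shared(x1)
y2 = shared(x2)

y = layers.concatenate([y1, y2])
out = layers.Dense(1, activation="sigmoid")(y)
model = keras.Model(inputs=[in1, in2], outputs=out)

# The shared layer appears exactly once in the model's layer list.
shared_count = sum(layer is shared for layer in model.layers)

# Parameter count if the BiLSTM weights are stored once:
# each LSTM direction has 4 * (units * (dim + units) + units) parameters.
bilstm_params = 2 * 4 * (units * (dim + units) + units)
embedding_params = 2 * n_symbols * dim        # two separate Embedding layers
dense_params = 2 * (2 * units) + 1            # concat of two 2*units vectors, plus bias
expected_total = embedding_params + bilstm_params + dense_params

print(shared_count, shared.count_params(), model.count_params(), expected_total)
```

If the layer were duplicated instead of shared, the total would grow by one extra set of BiLSTM parameters.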
Related
I'm trying to add an extra branch to the original model to improve its capability.
To be more specific, the original model looks like this:
input1 = Input()
x1 = branch1(input1)
x1 = residualnet(x1)
model = Model(inputs = [input1], outputs = x1)
My improvements would be like this:
input1 = Input()
input2 = Input()
x1 = branch1(input1) #1
x2 = branch2(input2) #2
x1 = concat([x1, x2])
x1 = residualnet(x1) #3
model = Model(inputs = [input1, input2], outputs = x1)
In this case, I already have trained weights for #1 and #3. What is the best way to apply the trained weights to those layers?
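One possible approach, sketched under assumptions, is to give the reused layers the same names in both models and copy weights layer by layer wherever the shapes still match. Everything here is hypothetical: small Dense layers stand in for branch1, branch2 and residualnet, and build_original, build_extended and transfer_weights are illustrative helpers, not part of Keras. Note that the first layer after the concatenation sees a wider input in the extended model, so its old weights no longer fit and are skipped.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_original():
    inp = keras.Input(shape=(8,), name="input1")
    x = layers.Dense(16, activation="relu", name="branch1")(inp)   # 1
    x = layers.Dense(16, activation="relu", name="res_in")(x)      # 3 (first layer)
    out = layers.Dense(4, name="res_out")(x)                       # 3 (last layer)
    return keras.Model(inp, out)


def build_extended():
    inp1 = keras.Input(shape=(8,), name="input1")
    inp2 = keras.Input(shape=(8,), name="input2")
    x1 = layers.Dense(16, activation="relu", name="branch1")(inp1)  # 1
    x2 = layers.Dense(16, activation="relu", name="branch2")(inp2)  # 2 (new)
    x = layers.concatenate([x1, x2])
    x = layers.Dense(16, activation="relu", name="res_in")(x)       # input width is now 32
    out = layers.Dense(4, name="res_out")(x)
    return keras.Model([inp1, inp2], out)


def transfer_weights(src, dst):
    """Copy weights from src to dst for layers with the same name and shapes."""
    src_layers = {layer.name: layer for layer in src.layers}
    copied = []
    for layer in dst.layers:
        source = src_layers.get(layer.name)
        if source is None:
            continue  # layer exists only in the new model
        src_w, dst_w = source.get_weights(), layer.get_weights()
        if not src_w:
            continue  # nothing to copy (e.g. input layers)
        if len(src_w) == len(dst_w) and all(a.shape == b.shape for a, b in zip(src_w, dst_w)):
            layer.set_weights(src_w)
            copied.append(layer.name)
    return copied


old_model = build_original()   # stands in for the already trained model
new_model = build_extended()
copied = transfer_weights(old_model, new_model)

branch1_matches = all(
    np.array_equal(a, b)
    for a, b in zip(old_model.get_layer("branch1").get_weights(),
                    new_model.get_layer("branch1").get_weights())
)
print(copied, branch1_matches)
```

Keras also offers model.load_weights(path, by_name=True) for weights saved to an HDF5 file, which matches layers by name in a similar way; whether mismatched shapes are skipped or raise an error depends on the version and arguments, so the explicit loop above keeps that decision visible.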
This question is about TensorFlow (and TensorBoard) version 2.2rc3, but I have experienced the same issue with 2.1.
Consider the following weird code:
from datetime import datetime
import tensorflow as tf
from tensorflow import keras
inputs = keras.layers.Input(shape=(784, ))
x1 = keras.layers.Dense(32, activation='relu', name='Model/Block1/relu')(inputs)
x1 = keras.layers.Dropout(0.2, name='Model/Block1/dropout')(x1)
x1 = keras.layers.Dense(10, activation='softmax', name='Model/Block1/softmax')(x1)
x2 = keras.layers.Dense(32, activation='relu', name='Model/Block2/relu')(inputs)
x2 = keras.layers.Dropout(0.2, name='Model/Block2/dropout')(x2)
x2 = keras.layers.Dense(10, activation='softmax', name='Model/Block2/softmax')(x2)
x3 = keras.layers.Dense(32, activation='relu', name='Model/Block3/relu')(inputs)
x3 = keras.layers.Dropout(0.2, name='Model/Block3/dropout')(x3)
x3 = keras.layers.Dense(10, activation='softmax', name='Model/Block3/softmax')(x3)
x4 = keras.layers.Dense(32, activation='relu', name='Model/Block4/relu')(inputs)
x4 = keras.layers.Dropout(0.2, name='Model/Block4/dropout')(x4)
x4 = keras.layers.Dense(10, activation='softmax', name='Model/Block4/softmax')(x4)
outputs = x1 + x2 + x3 + x4
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.summary()
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255
model.compile(loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
optimizer=keras.optimizers.RMSprop(),
metrics=['accuracy'])
logdir = "logs/" + datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir)
model.fit(x_train, y_train,
batch_size=64,
epochs=5,
validation_split=0.2,
callbacks=[tensorboard_callback])
When you run it and look at the graph created in TensorBoard, you will see the following.
As can be seen, the addition operations are really ugly.
When replacing the line
outputs = x1 + x2 + x3 + x4
with the lines:
outputs = keras.layers.add([x1, x2], name='Model/add/add1')
outputs = keras.layers.add([outputs, x3], name='Model/add/add2')
outputs = keras.layers.add([outputs, x4], name='Model/add/add3')
a much nicer graph is created by TensorBoard (in this second screenshot, the Model as well as one of the inner blocks are shown in detail).
The difference between the two representations of the model is that in the second one, we could name the addition operations and group them.
I could not find any way to name these operations other than by using keras.layers.add(). In this model the problem is not critical, because the model is simple and it is easy to replace + with keras.layers.add(). In more complex models, however, it can become a real pain. For example, operations such as t[:, start:end] would have to be translated into complex calls to tf.strided_slice(). As a result, my models' representations are quite messy, with plenty of cryptic gather, stride and concat operations.
I wonder if there is a way to wrap / group such operations to allow nicer graphs in TensorBoard.
outputs = keras.layers.Add()([x1, x2, x3, x4])
Following the hint from Marco Cerliani, Lambda layer is indeed very useful here. So the following code will group nicely the +:
outputs = keras.layers.Lambda(lambda x: x[0] + x[1], name='Model/add/add1')([x1, x2])
outputs = keras.layers.Lambda(lambda x: x[0] + x[1], name='Model/add/add2')([outputs, x3])
outputs = keras.layers.Lambda(lambda x: x[0] + x[1], name='Model/add/add3')([outputs, x4])
Or, if you need to wrap slices, the following code will nicely group the t[] operations:
x1 = keras.layers.Lambda(lambda x: x[:, 0:5], name='Model/stride_concat/stride1')(x1) # instead of x1 = x1[:, 0:5]
x2 = keras.layers.Lambda(lambda x: x[:, 5:10], name='Model/stride_concat/stride2')(x2) # instead of x2 = x2[:, 5:10]
outputs = keras.layers.concatenate([x1, x2], name='Model/stride_concat/concat')
This answers the question asked. But actually, there is still an open issue that is described in another question: 'TensorFlowOpLayer messes up the TensorBoard graphs'
I am trying to build an image segmentation model with a Keras MobileNet model pre-trained on the ImageNet dataset. However, to train the model further, I want to add U-Net layers to the existing model and train only the layers of the U-Net architecture, with the MobileNet model serving as a backbone.
Problem: The last layer of the MobileNet model, a ReLU layer, has dimensions (7x7x1024). I want to reshape this to (256x256x3), which the U-Net input layer can understand.
This does not use only the last layer, but a U-Net on top of MobileNet can be created using the code below:
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.layers import Concatenate, Conv2D, Reshape, UpSampling2D
from tensorflow.keras.models import Model

ALPHA = 1  # Width hyperparameter for MobileNet (0.25, 0.5, 0.75, 1.0). Higher width means more accurate but slower
IMAGE_HEIGHT = 224
IMAGE_WIDTH = 224
HEIGHT_CELLS = 28
WIDTH_CELLS = 28

def create_model(trainable=True):
    model = MobileNet(input_shape=(IMAGE_HEIGHT, IMAGE_WIDTH, 3), include_top=False, alpha=ALPHA, weights="imagenet")

    # Pass trainable=False to freeze the backbone and train only the decoder layers
    for layer in model.layers:
        layer.trainable = trainable

    # Encoder feature maps used as skip connections
    block0 = model.get_layer("conv_pw_1_relu").output
    block = model.get_layer("conv_pw_1_relu").output
    block1 = model.get_layer("conv_pw_3_relu").output
    block2 = model.get_layer("conv_pw_5_relu").output
    block3 = model.get_layer("conv_pw_11_relu").output
    block4 = model.get_layer("conv_pw_13_relu").output

    # Decoder: upsample by 2 and concatenate with the matching encoder block
    x = Concatenate()([UpSampling2D()(block4), block3])
    x = Concatenate()([UpSampling2D()(x), block2])
    x = Concatenate()([UpSampling2D()(x), block1])
    x = Concatenate()([UpSampling2D()(x), block])
    # x = Concatenate()([UpSampling2D()(x), block0])
    x = UpSampling2D()(x)

    # One sigmoid channel per pixel, reshaped to a 2D mask
    x = Conv2D(1, kernel_size=1, activation="sigmoid")(x)
    x = Reshape((IMAGE_HEIGHT, IMAGE_WIDTH))(x)
    return Model(inputs=model.input, outputs=x)
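The decoder above only works because each upsampled tensor has the same height and width as the encoder block it is concatenated with. The following plain-Python sketch traces those shapes without building the model. The feature-map sizes and channel counts in SKIPS are my understanding of MobileNet with alpha=1 and a 224x224 input and should be treated as an assumption; model.summary() is the authority. trace_decoder is a hypothetical helper written for this illustration.

```python
# (spatial size, channels) of each encoder output, assuming alpha=1 and a 224x224 input.
SKIPS = {
    "conv_pw_13_relu": (7, 1024),
    "conv_pw_11_relu": (14, 512),
    "conv_pw_5_relu": (28, 256),
    "conv_pw_3_relu": (56, 128),
    "conv_pw_1_relu": (112, 64),
}

# Order in which the decoder consumes the blocks: block4, block3, block2, block1, block.
ORDER = ["conv_pw_13_relu", "conv_pw_11_relu", "conv_pw_5_relu",
         "conv_pw_3_relu", "conv_pw_1_relu"]


def trace_decoder(skips, order):
    """Follow UpSampling2D (x2) + Concatenate through the decoder and record shapes."""
    size, channels = skips[order[0]]
    trace = [(order[0], size, channels)]
    for name in order[1:]:
        size *= 2  # UpSampling2D() doubles height and width by default
        skip_size, skip_channels = skips[name]
        if size != skip_size:
            raise ValueError(f"cannot concatenate {size}x{size} with {name} ({skip_size}x{skip_size})")
        channels += skip_channels  # Concatenate stacks along the channel axis
        trace.append((name, size, channels))
    size *= 2  # final UpSampling2D before the 1x1 convolution
    return size, channels, trace


final_size, final_channels, trace = trace_decoder(SKIPS, ORDER)
for name, size, channels in trace:
    print(f"{name}: {size}x{size}x{channels}")
print(f"before Conv2D(1): {final_size}x{final_size}x{final_channels}")
```

Under these assumptions the tensor entering the final 1x1 convolution is back at the input resolution, which is why the Reshape to (IMAGE_HEIGHT, IMAGE_WIDTH) is valid.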
I need to create a neural network that approximates a function given its parameters. I give four parameters to my neural network (A, x0, phi, omega) and I want to obtain, as output,
A sin(omega x + phi) + x0
(I need this net as a part of another network)
However, I am not able to train the network as I obtain a very poor convergence. Why is that?
I use a fully connected network with three layers. This is the code
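from numpy import hstack, linspace, pi, random, sin, tan, tile, zeros
from keras.layers import Dense, Input
from keras.models import Model

def get_batches(N_batches):
    # Sample the four parameters; returns (targets, inputs)
    A = tan( random.uniform(low=0.,high=2*pi,size=[N_batches,1]))
    x0 = random.randn(N_batches,1)*10
    omega = random.uniform(low=0.,high=10*pi, size=[N_batches,1])
    phi = random.uniform(low=0.,high=2*pi, size=[N_batches,1])
    x = linspace(0,t_max, n_max)
    x = tile(x,N_batches).reshape(N_batches,n_max)
    return (A*sin(omega*x+phi) + x0, hstack([A,x0,phi,omega]) )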
N_batches = 80
N_epochs = 50
t_max = 5.0
n_max = 100
n_par = 4
net_layers = []
net_inp = Input(shape=(n_par,))
net_layers.append(Dense(25, input_shape=(n_par,), activation="relu"))
net_layers.append(Dense(25, activation="relu"))
net_layers.append(Dense(25, activation="relu"))
net_layers.append(Dense(n_max, activation="linear"))
net_l = net_inp
for i in range(len(net_layers)):
    net_l = net_layers[i](net_l)
net = Model(net_inp, net_l)
net.compile(loss="mean_squared_error", optimizer="adam")
costs = zeros(N_epochs)
for i in range(N_epochs):
    y_true, y_in = get_batches(N_batches)
    costs[i]=net.train_on_batch(y_in,y_true)
Even if I train for longer, I don't get better results than this (plot of the approximated function and the real function for a test sample):
The plot of the cost function is quite strange:
What mistakes did I make? Thank you!
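A few properties of this setup can be checked with NumPy alone, before touching the network. This is a diagnostic sketch, not a confirmed explanation of the poor convergence: it looks at the scale of the amplitude A as sampled in get_batches, at how densely the fastest sine is sampled on the x grid, and at how many gradient updates the training loop performs. The sample size and seed are arbitrary choices made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10_000  # arbitrary, only for inspecting the distribution

# Amplitude as drawn in get_batches: tan of a uniform angle is heavy-tailed,
# so a few samples have enormous amplitude and dominate a squared-error loss.
A = np.tan(rng.uniform(0.0, 2 * np.pi, size=n_samples))
abs_A = np.abs(A)
median_amp = np.median(abs_A)
max_amp = abs_A.max()

# Sampling density of the fastest oscillation on the x grid.
t_max, n_max = 5.0, 100
omega_max = 10 * np.pi
cycles = omega_max * t_max / (2 * np.pi)   # full periods within [0, t_max]
points_per_cycle = n_max / cycles          # grid points per period, roughly

# The loop calls train_on_batch once per "epoch", so this is the total
# number of gradient updates, each on N_batches = 80 samples.
N_epochs = 50
gradient_steps = N_epochs

print(f"median |A| = {median_amp:.2f}, max |A| = {max_amp:.1f}")
print(f"fastest sine: {cycles:.0f} cycles, about {points_per_cycle:.0f} points per cycle")
print(f"gradient updates: {gradient_steps}")
```

If these numbers look problematic, things one might try include drawing A from a bounded range, rescaling inputs and targets to comparable magnitudes, and training for many more steps; whether any of these is sufficient here is untested.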
I'm using a deep learning approach to address a regression problem with multiple outputs (16 outputs); each output is in [0,1] and the outputs sum to 1.
I am unsure which loss function is best suited to this problem. I have already tested mean squared error and mean absolute error, but the neural network always predicts the same value.
model = applications.VGG16(include_top=False, weights = None, input_shape = (256, 256, 3))
x = model.output
x = Flatten()(x)
x = Dense(1024)(x)
x=BatchNormalization()(x)
x = Activation("relu")(x)
x = Dropout(0.5)(x)
x = Dense(512)(x)
x=BatchNormalization()(x)
x = Activation("relu")(x)
x = Dropout(0.5)(x)
predictions = Dense(16,activation="sigmoid")(x)
model_final = Model(inputs = model.input, outputs = predictions)
model_final.compile(loss ='mse', optimizer = Adam(lr=0.1), metrics=['mae'])
What you are describing sounds more like a classification task, since you want to get a probability distribution at the end.
Therefore you should use a softmax (for example) in the last layer and cross-entropy as loss measure.
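To illustrate the difference, here is a small NumPy sketch (the helper functions are written for this example, not taken from Keras). With sigmoid, each of the 16 outputs is squashed independently, so nothing forces them to sum to 1. Softmax normalizes across the outputs, and cross-entropy then compares the predicted distribution with the target distribution.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy(y_true, y_pred, eps=1e-7):
    # y_true may be a soft distribution (non-negative, rows summing to 1).
    return -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)), axis=-1)


rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 16))          # pre-activation outputs for 3 samples

sigmoid_sums = sigmoid(logits).sum(axis=-1)
softmax_sums = softmax(logits).sum(axis=-1)

# With all-zero logits the contrast is exact: sigmoid gives 16 * 0.5,
# softmax gives a uniform distribution.
zero_sigmoid_sum = sigmoid(np.zeros(16)).sum()
zero_softmax = softmax(np.zeros((1, 16)))

# Loss of the uniform prediction against a uniform target equals log(16).
uniform_target = np.full((1, 16), 1.0 / 16)
uniform_loss = cross_entropy(uniform_target, zero_softmax)[0]

print(sigmoid_sums, softmax_sums, zero_sigmoid_sum, uniform_loss)
```

In the Keras model from the question, this would correspond to Dense(16, activation="softmax") as the last layer and loss="categorical_crossentropy" in compile.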