Why is there only one input in the attention model in Keras?

In this code, the author defined two inputs, but only one input is fed to the model. I expected this to cause an error, yet the code runs. Why does it run successfully?
from keras import backend as K
from keras.layers import (Input, Dense, Activation, Lambda, multiply,
                          TimeDistributed, Bidirectional, GRU)
from keras.models import Model

def han():
    # refer to 4.2 in the paper while reading the following code
    # Input for one day: max articles per day = 40, dim_vec = 200
    input1 = Input(shape=(40, 200), dtype='float32')
    # Attention Layer
    dense_layer = Dense(200, activation='tanh')(input1)
    softmax_layer = Activation('softmax')(dense_layer)
    attention_mul = multiply([softmax_layer, input1])
    # end attention layer
    vec_sum = Lambda(lambda x: K.sum(x, axis=1))(attention_mul)
    pre_model1 = Model(input1, vec_sum)
    pre_model2 = Model(input1, vec_sum)
    # Input of the HAN, shape (None, 11, 40, 200)
    # 11 = window size = N in the paper, 40 = max articles per day, dim_vec = 200
    input2 = Input(shape=(11, 40, 200), dtype='float32')
    # TimeDistributed is used to apply a layer to every temporal slice of an input
    # So we use it here to apply our attention layer (pre_model) to every article in one day
    # to focus on the most critical article
    pre_gru = TimeDistributed(pre_model1)(input2)
    # bidirectional gru
    l_gru = Bidirectional(GRU(100, return_sequences=True))(pre_gru)
    # We apply attention layer to every day to focus on the most critical day
    post_gru = TimeDistributed(pre_model2)(l_gru)
    # MLP to perform classification
    dense1 = Dense(100, activation='tanh')(post_gru)
    dense2 = Dense(3, activation='tanh')(dense1)
    final = Activation('softmax')(dense2)
    final_model = Model(input2, final)
    final_model.summary()
    return final_model

Keras models can be used as layers. In the code above, input1 is used only to define pre_model1 and pre_model2. Those models are then called below, as layers, inside the model named final_model.
final_model itself has a single input layer (input2), so that is the only input you feed.
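To make this concrete, here is a minimal pure-Python sketch of what the sub-model computes and how it is reused. The helper names (dense_tanh, attention_pool, time_distributed), the toy sizes and the zero weights are my own assumptions for illustration, not Keras API. The point is that the sub-model is just a function from one day's matrix to one vector, and only the outer window is real input data.

```python
import math

def dense_tanh(rows, weights, bias):
    """Apply tanh(x @ W + b) to every row, like Dense(..., activation='tanh')."""
    return [
        [math.tanh(sum(x_i * weights[i][j] for i, x_i in enumerate(x)) + bias[j])
         for j in range(len(bias))]
        for x in rows
    ]

def softmax(vec):
    """Numerically stable softmax over one vector (the last axis)."""
    shift = max(vec)
    exps = [math.exp(v - shift) for v in vec]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(day, weights, bias):
    """Toy stand-in for pre_model1: (articles, dim) -> (dim,)."""
    scores = [softmax(row) for row in dense_tanh(day, weights, bias)]
    weighted = [[s * x for s, x in zip(s_row, x_row)]
                for s_row, x_row in zip(scores, day)]
    # Sum over the article axis, like K.sum(x, axis=1) on a batched tensor.
    return [sum(col) for col in zip(*weighted)]

def time_distributed(fn, window, *args):
    """Apply fn to every day in the window, like TimeDistributed(pre_model1)."""
    return [fn(day, *args) for day in window]

# Hypothetical toy sizes: 3 days, 2 articles per day, 2 features per article.
zero_w = [[0.0, 0.0], [0.0, 0.0]]
zero_b = [0.0, 0.0]
window = [[[1.0, 2.0], [3.0, 4.0]]] * 3
pooled = time_distributed(attention_pool, window, zero_w, zero_b)
print(pooled)  # [[2.0, 3.0], [2.0, 3.0], [2.0, 3.0]]
```

Nothing is ever "fed" to the inner function on its own. It only receives slices of the outer window, which is why the full model needs just one input.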

Related

Add units (Neurons) to an existing model in Keras

I have trained a model and I want to add more units to its hidden layer and train it for some more epochs. I am implementing a constructive learning algorithm. How can I add a neuron to a hidden layer of an existing model? Also, is there a way to train only the parameters of the added units while the other parameters stay frozen? (In Keras)
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense

def create_first_sub_NN(X):
    sub_input = tf.keras.Input(shape=(X.shape[1],))
    h = Dense(1, activation="sigmoid", name="hidden")(sub_input)
    h = tf.keras.Model(inputs=sub_input, outputs=h)
    # the output layer sees both the raw input and the hidden unit
    m_combined = tf.keras.layers.concatenate([h.input, h.output])
    out = Dense(1, activation="relu")(m_combined)
    out = tf.keras.Model(inputs=sub_input, outputs=out)
    return out

def train_current_model(model, input_groups, Y, error_thr):
    opt = keras.optimizers.Adam(learning_rate=0.01)
    callbacks = stopAtLossValue()  # user-defined callback, not shown in the question
    # overfitCallback = EarlyStopping(monitor='loss', min_delta=5, patience=10)
    # if the error did not decrease by more than 5 for 10 epochs, stop training the current network
    model.compile(optimizer=opt, loss='mean_absolute_error')
    model.fit(input_groups, Y, epochs=100, batch_size=32, callbacks=[callbacks])

model = create_first_sub_NN(X1_train)
keras.utils.plot_model(model, "first.png", show_shapes=True)
print(model.summary())
list_of_inputs = [sub_X_list[0]]
train_current_model(model, list_of_inputs, train_label, 0.1)
# how to add number of units in my hidden layer for the
I want to add neurons to my hidden layer repeatedly, until my network error gets below the threshold.
I solved the problem. Instead of adding a neuron to the current layer, we can add another Dense layer that is connected to the same previous and next layers, and then concatenate the new layer with the old one.
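Why this works can be checked with plain arithmetic. The sketch below is pure Python with made-up weights (all names and numbers are hypothetical). It shows that concatenating the output of an old one-unit layer with the output of a new one-unit layer gives the same values as a single two-unit layer holding both weight vectors.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_sigmoid(x, units):
    """units is a list of (weight_vector, bias) pairs, one pair per neuron."""
    return [sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in units]

# Hypothetical weights: one already-trained unit and one newly added unit.
old_units = [([0.5, -1.0, 2.0], 0.1)]
new_units = [([1.5, 0.3, -0.7], -0.2)]
x = [1.0, 2.0, 3.0]

# Old layer and new layer both read the same input; their outputs are concatenated.
concatenated = dense_sigmoid(x, old_units) + dense_sigmoid(x, new_units)
# A single layer that simply holds both units.
widened = dense_sigmoid(x, old_units + new_units)

print(concatenated == widened)  # True
```

In Keras, keeping the old and new units in separate layers also makes freezing straightforward: setting trainable = False on the old Dense layer before recompiling should leave only the new layer's weights trainable. Note that the layer after the concatenation gets one more input, so its kernel has to grow accordingly.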

How to stack same RNN for every layer?

I would like to know how to stack many RNN layers where every layer is the same RNN, that is, every layer shares the same weights. I have read about stacked LSTMs and RNNs, but in those examples each layer is a different one.
1 layer code:
inputs = keras.Input(shape=(maxlen,), batch_size = batch_size)
Emb_layer = layers.Embedding(max_features,word_dim)
Emb_output = Emb_layer(inputs)
first_layer = layers.SimpleRNN(n_hidden,use_bias=True,return_sequences=False,stateful =False)
first_layer_output = first_layer(Emb_output)
dense_layer = layers.Dense(1, activation='sigmoid')
dense_output = dense_layer(first_layer_output )
model = keras.Model(inputs=inputs, outputs=dense_output)
model.summary()
[Figure: RNN 1 layer]
inputs = keras.Input(shape=(maxlen,), batch_size = batch_size)
Emb_layer = layers.Embedding(max_features,word_dim)
Emb_output = Emb_layer(inputs)
first_layer = layers.SimpleRNN(n_hidden,use_bias=True,return_sequences=True,stateful =True)
first_layer_output = first_layer(Emb_output)
first_layer_state = first_layer.states
second_layer = layers.SimpleRNN(n_hidden,use_bias=True,return_sequences=False,stateful =False)
second_layer_set_state = second_layer(first_layer_output, initial_state=first_layer_state)
dense_layer = layers.Dense(1, activation='sigmoid')
dense_output = dense_layer(second_layer_set_state )
model = keras.Model(inputs=inputs, outputs=dense_output)
model.summary()
[Figure: Stack RNN 2 layer]
For example, I want to build a two-layer RNN in which the first and the second layer have the same weights, so that when the weights of the first layer are updated, the second layer is updated too and shares the same values. As far as I know, TF has RNN.state, which returns the value from the previous layer. However, when I use it, each layer still seems to be treated independently. The 2-layer RNN that I want should have the same number of trainable parameters as the 1-layer one, since the layers share weights, but this did not work.
You can view the layer object as a container for the weights that knows how to apply the weights. You can use the layer object as many times as you want. Assuming the embedding and the RNN dimension are the same, you can do:
states = Emb_layer(inputs)
first_layer = layers.SimpleRNN(n_hidden, use_bias=True, return_sequences=True)
for _ in range(10):
    states = first_layer(states)
There is no reason to set stateful to True. That option is meant for the case where you split long sequences across multiple batches and want the RNN to remember its state between batches, so that you do not have to set initial states manually. You can get the final state of the RNN (the one you want to use for classification) by simply indexing the last position of states.
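A quick way to check that the weights are really shared is to compare parameter counts. The sketch below uses the standard SimpleRNN count (input kernel, recurrent kernel, bias). The concrete sizes are assumptions for illustration only.

```python
def simple_rnn_params(input_dim, units, use_bias=True):
    """Trainable parameters of one SimpleRNN layer:
    input kernel + recurrent kernel + optional bias."""
    return input_dim * units + units * units + (units if use_bias else 0)

# Hypothetical sizes; sharing requires word_dim == n_hidden so that the
# layer's output can be fed back into the same layer.
word_dim = n_hidden = 64

one_layer = simple_rnn_params(word_dim, n_hidden)
two_independent = one_layer + simple_rnn_params(n_hidden, n_hidden)
ten_shared = one_layer  # the same layer object applied ten times

print(one_layer, two_independent, ten_shared)  # 8256 16512 8256
```

If model.summary() reports the two-layer figure, the layers were created separately. With one reused layer object it should report the one-layer figure, however many times the layer is applied.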

Graph disconnected issue in Keras

[Figure: Architecture I want to implement]
I wish to implement this architecture with Keras functional API. I am new to this and here is my code for now (which gets stuck at concatenating inputs).
# Arbitrary dimension for all embeddings
embedding_dim = 10
# Quarter hour of the day embedding
input_quarter_hour = Input(shape=(1,))
embed_quarter_hour = Embedding(metadata['n_quarter_hours'], embedding_dim, input_length=1)(input_quarter_hour)
embed_quarter_hour = Reshape((embedding_dim,))(embed_quarter_hour)
# Day of the week embedding
input_day_of_week = Input(shape=(1,))
embed_day_of_week = Embedding(metadata['n_days_per_week'], embedding_dim, input_length=1)(input_day_of_week)
embed_day_of_week = Reshape((embedding_dim,))(embed_day_of_week)
# Week of the year embedding
input_week_of_year = Input(shape=(1,))
embed_week_of_year = Embedding(metadata['n_weeks_per_year'], embedding_dim, input_length=1)(input_week_of_year)
embed_week_of_year = Reshape((embedding_dim,))(embed_week_of_year)
# Client ID embedding
input_client_ids = Input(shape=(1,))
embed_client_ids = Embedding(metadata['n_client_ids'], embedding_dim, input_length=1)(input_client_ids)
embed_client_ids = Reshape((embedding_dim,))(embed_client_ids)
# Taxi ID embedding
input_taxi_ids = Input(shape=(1,))
embed_taxi_ids = Embedding(metadata['n_taxi_ids'], embedding_dim, input_length=1)(input_taxi_ids)
embed_taxi_ids = Reshape((embedding_dim,))(embed_taxi_ids)
# Taxi stand ID embedding
input_stand_ids = Input(shape=(1,))
embed_stand_ids = Embedding(metadata['n_stand_ids'], embedding_dim, input_length=1)(input_stand_ids)
embed_stand_ids = Reshape((embedding_dim,))(embed_stand_ids)
# GPS coordinates (5 first lat/long and 5 latest lat/long, therefore 20 values)
coords_in = Input(shape=(20,))
coords_out = Dense(1, input_dim=20, init='normal')(coords_in)
#model = Sequential()
concatenated = concatenate([
    embed_quarter_hour,
    embed_day_of_week,
    embed_week_of_year,
    embed_client_ids,
    embed_taxi_ids,
    embed_stand_ids,
    coords_out
])
out = Dense(500, activation='relu')(concatenated)
out = Dense(len(clusters), activation='softmax', name='output_layer')(out)
cast_clusters = K.cast_to_floatx(clusters)
def destination(probabilities):
    return tf.matmul(probabilities, cast_clusters)
out = Activation(destination)(out)
model = Model(concatenated,out)
I am getting this error :
Graph disconnected: cannot obtain value for tensor
Tensor("input_64:0", shape=(?, 1), dtype=float32) at layer "input_64".
The following previous layers were accessed without issue: [].
I am guessing the problem stems from the size of my tensors, but I don't know how to debug this kind of code.
You should pass a list of all inputs to the model when creating a Keras Model instance. The variable concatenated that you are using in your code does not contain the inputs but instead contains the outputs of certain layers. Moreover, you should not concatenate your inputs but simply use a list.
The following code should work:
inputs = [
    input_quarter_hour,
    input_day_of_week,
    input_week_of_year,
    input_client_ids,
    input_taxi_ids,
    input_stand_ids,
    coords_in
]
model = Model(inputs=inputs, outputs=out)
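The idea behind the error can be sketched with a toy graph in pure Python. This is only an illustration of the check, not Keras's actual implementation, and the Node class and undeclared_sources function are hypothetical names. Walking back from the output reaches every Input placeholder, and any placeholder that was not declared as a model input has no value available.

```python
class Node:
    """A toy graph node: a name plus the nodes it was computed from."""
    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)

def undeclared_sources(declared_inputs, output):
    """Walk back from the output through every ancestor and return the names of
    source nodes (no parents) that were not declared as model inputs."""
    declared = {id(node) for node in declared_inputs}
    missing, seen, stack = [], set(), [output]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        if not node.parents and id(node) not in declared:
            missing.append(node.name)
        stack.extend(node.parents)
    return sorted(missing)

# Hypothetical miniature of the network in the question.
input_day = Node("input_day_of_week")
coords_in = Node("coords_in")
embed_day = Node("embed_day_of_week", [input_day])
coords_out = Node("coords_out", [coords_in])
concatenated = Node("concatenated", [embed_day, coords_out])
out = Node("output_layer", [concatenated])

print(undeclared_sources([concatenated], out))          # ['coords_in', 'input_day_of_week']
print(undeclared_sources([input_day, coords_in], out))  # []
```

Declaring the intermediate tensor leaves the real placeholders dangling, while declaring the list of Input tensors leaves nothing unaccounted for.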

Convert code to new keras version (functional API) or how to concatenate 2 models

Merge doesn't work anymore. I tried the new functional API (concatenate, add, multiply) but it doesn't work for models. How can I implement this?
lower_model = [self.build_network(self.model_config['critic_lower'], input_shape=(self.history_length, self.n_stock, 1))
for _ in range(1 + self.n_smooth + self.n_down)]
merged = Merge(lower_model, mode='concat')
# upper layer
upper_model = self.build_network(self.model_config['critic_upper'], model=merged)
# action layer
action = self.build_network(self.model_config['critic_action'], input_shape=(self.n_stock,), is_conv=False)
# output layer
merged = Merge([upper_model, action], mode='mul')
model = Sequential()
model.add(merged)
model.add(Dense(1))
return model
I cannot really give you an exact answer, because your question is not detailed enough, but I can provide an example in which layers are concatenated. A common problem is to import Concatenate and use it the way Merge was used in previous versions.
nlp_input = Input(shape=(seq_length,), name='nlp_input')
meta_input = Input(shape=(10,), name='meta_input')
emb = Embedding(output_dim=embedding_size, input_dim=100, input_length=seq_length)(nlp_input)
nlp_out = Bidirectional(LSTM(128, dropout=0.3, recurrent_dropout=0.3, kernel_regularizer=regularizers.l2(0.01)))(emb)
x = concatenate([nlp_out, meta_input])
x = Dense(classifier_neurons, activation='relu')(x)
x = Dense(1, activation='sigmoid')(x)
model = Model(inputs=[nlp_input , meta_input], outputs=[x])
This is a dirty workaround that shows how to get input and output tensors from models and use concatenate layers with them, and also how to use Dense and other layers with tensors to create functional API models.
Ideally, you should rewrite everything that's inside build_network for clean and optimized code. (Perhaps this doesn't even work depending on the content of this function, but this is the idea)
lower_model = [self.build_network(
self.model_config['critic_lower'],
input_shape=(self.history_length, self.n_stock, 1))
for _ in range(1 + self.n_smooth + self.n_down)]
#for building models you need input and output tensors
lower_inputs = [model.input for model in lower_model]
lower_outputs = [model.output for model in lower_model]
#these lines assume each model in the list has only one input and output
#using a concatenate layer on a list of tensors
merged_tensor = Concatenate()(lower_outputs) #or Concatenate(axis=...)(lower_outputs)
#this is a workaround for compatibility.
#ideally you should work just with tensors, not create unnecessary intermediate models
merged_model = Model(lower_inputs, merged_tensor) #make model from input tensors to outputs
# upper layer
upper_model = self.build_network(self.model_config['critic_upper'], model=merged_model)
# action layer
action = self.build_network(self.model_config['critic_action'], input_shape=(self.n_stock,), is_conv=False)
# output layer - get the output tensors from the models
upper_out = upper_model.output
action_out = action.output
#apply the Multiply layer on the list of tensors
merged_tensor = Multiply()([upper_out, action_out])
#apply the Dense layer on the merged tensor
out = Dense(1)(merged_tensor)
#get input tensors to create a model
upper_inputs = upper_model.inputs #should be a list
action_inputs = action.inputs #if not a list, append to the previous list
inputs = upper_inputs + action_inputs
model = Model(inputs, out)
return model

LSTM Model in Keras with Auxiliary Inputs

I have a dataset with 2 columns - Each column contains a set of documents. I have to match the document in Col A with documents provided in Col B. This is a supervised classification problem. So my training data contains a label column indicating whether the documents match or not.
To solve the problem, I have created a set of features, say f1-f25 (by comparing the 2 documents), and then trained a binary classifier on these features. This approach works reasonably well, but now I would like to evaluate Deep Learning models on this problem (specifically, LSTM models).
I am using the keras library in Python. After going through the keras documentation and other tutorials available online, I have managed to do the following:
import keras
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model
# Each document contains a series of 200 words
# The necessary text pre-processing steps have been completed to transform
# each doc to a fixed length seq
main_input1 = Input(shape=(200,), dtype='int32', name='main_input1')
main_input2 = Input(shape=(200,), dtype='int32', name='main_input2')
# Next I add a word embedding layer (embed_matrix is separately created
# for each word in my vocabulary by reading from a pre-trained embedding model)
x = Embedding(output_dim=300, input_dim=20000,
              input_length=200, weights=[embed_matrix])(main_input1)
y = Embedding(output_dim=300, input_dim=20000,
              input_length=200, weights=[embed_matrix])(main_input2)
# Next separately pass each embedded sequence through an LSTM layer to transform the seq of
# vectors into a single vector
lstm_out_x1 = LSTM(32)(x)
lstm_out_x2 = LSTM(32)(y)
# concatenate the 2 layers and stack a dense layer on top
x = keras.layers.concatenate([lstm_out_x1, lstm_out_x2])
x = Dense(64, activation='relu')(x)
# generate intermediate output
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(x)
# add auxiliary input - auxiliary inputs contains 25 features for each document pair
auxiliary_input = Input(shape=(25,), name='aux_input')
# merge aux output with aux input and stack dense layer on top
main_input = keras.layers.concatenate([auxiliary_output, auxiliary_input])
x = Dense(64, activation='relu')(main_input)
x = Dense(64, activation='relu')(x)
# finally add the main output layer
main_output = Dense(1, activation='sigmoid', name='main_output')(x)
model = Model(inputs=[main_input1, main_input2, auxiliary_input], outputs= main_output)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit([x1, x2,aux_input], y,
epochs=3, batch_size=32)
However, when I score this on the training data, I get the same prob. score for all cases. The issue seems to be with the way auxiliary input is fed in (because it generates meaningful output when I remove the aux. input).
I also tried inserting the auxiliary input at different places in the network, but I couldn't get this to work.
Any pointers?
Well, this has been open for several months and people are voting it up.
I did something very similar recently using this dataset, which can be used to forecast credit card defaults. It contains categorical data on customers (gender, education level, marriage status etc.) as well as payment history as time series, so I had to merge time series with non-series data. My solution was very similar to yours, combining an LSTM with a dense network, so I will try to adapt the approach to your problem. What worked for me was putting dense layer(s) on the auxiliary input.
Furthermore in your case a shared layer would make sense so the same weights are used to "read" both documents. My proposal for testing on your data:
import keras
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model
# Each document contains a series of 200 words
# The necessary text pre-processing steps have been completed to transform
# each doc to a fixed length seq
main_input1 = Input(shape=(200,), dtype='int32', name='main_input1')
main_input2 = Input(shape=(200,), dtype='int32', name='main_input2')
# Next I add a word embedding layer (embed_matrix is separately created
# for each word in my vocabulary by reading from a pre-trained embedding model)
x1 = Embedding(output_dim=300, input_dim=20000,
               input_length=200, weights=[embed_matrix])(main_input1)
x2 = Embedding(output_dim=300, input_dim=20000,
               input_length=200, weights=[embed_matrix])(main_input2)
# Next separately pass each embedded sequence through an LSTM layer to transform the seq of
# vectors into a single vector
# Comment Manngo: Here I changed to shared layer
# Also renamed y as input as it was confusing
# Now x and y are x1 and x2
lstm_reader = LSTM(32)
lstm_out_x1 = lstm_reader(x1)
lstm_out_x2 = lstm_reader(x2)
# concatenate the 2 layers and stack a dense layer on top
x = keras.layers.concatenate([lstm_out_x1, lstm_out_x2])
x = Dense(64, activation='relu')(x)
x = Dense(32, activation='relu')(x)
# generate intermediate output
# Comment Manngo: This is created as a dead-end
# It will not be used as an input of any layers below
auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(x)
# add auxiliary input - auxiliary inputs contains 25 features for each document pair
# Comment Manngo: Dense branch on the comparison features
auxiliary_input = Input(shape=(25,), name='aux_input')
# separate name for the dense branch so the Input tensor stays available for Model(...)
aux_branch = Dense(64, activation='relu')(auxiliary_input)
aux_branch = Dense(32, activation='relu')(aux_branch)
# OLD: merge aux output with aux input and stack dense layer on top
# Comment Manngo: actually this is merging the aux output preparation dense with the aux input processing dense
main_input = keras.layers.concatenate([x, aux_branch])
main = Dense(64, activation='relu')(main_input)
main = Dense(64, activation='relu')(main)
# finally add the main output layer
main_output = Dense(1, activation='sigmoid', name='main_output')(main)
# Build the model: 3 inputs, 2 outputs
model = Model(inputs=[main_input1, main_input2, auxiliary_input],
              outputs=[main_output, auxiliary_output])
# Compile
# Comment Manngo: also define weighting of outputs, main as 1, auxiliary as 0.5
# The dictionary keys must match the layer names given above
model.compile(optimizer='adam',
              loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
              loss_weights={'main_output': 1., 'aux_output': 0.5},
              metrics=['accuracy'])
# Train model on main_output and on auxiliary_output as a support
# Comment Manngo: Unknown information marked with placeholders ____
# We have 3 inputs: x1 and x2: the 2 strings
# aux_in: the 25 features
# We have 2 outputs: main and auxiliary; both have the same targets -> (binary)y
model.fit({'main_input1': __x1__, 'main_input2': __x2__, 'aux_input': __aux_in__},
          {'main_output': __y__, 'aux_output': __y__},
          epochs=1000,
          batch_size=__,
          validation_split=0.1,
          callbacks=[____])
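To illustrate what the output weighting does, here is a small pure-Python sketch. It assumes the total training loss is the weighted sum of the per-output losses, which is how loss_weights is documented to behave. The predictions are invented numbers and the helper names are hypothetical.

```python
import math

def binary_crossentropy(y_true, y_pred):
    """Mean binary cross-entropy over a batch of probabilities."""
    terms = [-(t * math.log(p) + (1 - t) * math.log(1 - p))
             for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)

def weighted_total(losses, weights):
    """Weighted sum of the per-output losses."""
    return sum(weights[name] * value for name, value in losses.items())

# Hypothetical targets and predictions for a batch of two document pairs.
y = [1, 0]
losses = {
    'main_output': binary_crossentropy(y, [0.9, 0.2]),
    'aux_output': binary_crossentropy(y, [0.6, 0.4]),
}
weights = {'main_output': 1.0, 'aux_output': 0.5}
total = weighted_total(losses, weights)
print(round(total, 4))
```

With a weight of 0.5 the auxiliary head still pushes gradients into the shared LSTM, but errors on the main output count twice as much.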
I don't know how much this can help since I don't have your data so I can't try. Nevertheless this is my best shot.
I didn't run the above code for obvious reasons.
I found an answer at https://datascience.stackexchange.com/questions/17099/adding-features-to-time-series-model-lstm. Philippe Remy wrote a library (cond_rnn) to condition an RNN on auxiliary inputs. I used his library and it's very helpful.
# 10 stations
# 365 days
# 3 continuous variables A and B => C is target.
# 2 conditions dim=5 and dim=1. First cond is one-hot. Second is continuous.
import numpy as np
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
from cond_rnn import ConditionalRNN
stations = 10 # 10 stations.
time_steps = 365 # 365 days.
continuous_variables_per_station = 3 # A,B,C where C is the target.
condition_variables_per_station = 2 # 2 variables of dim 5 and 1.
condition_dim_1 = 5
condition_dim_2 = 1
np.random.seed(123)
continuous_data = np.random.uniform(size=(stations, time_steps, continuous_variables_per_station))
condition_data_1 = np.zeros(shape=(stations, condition_dim_1))
condition_data_1[:, 0] = 1 # dummy.
condition_data_2 = np.random.uniform(size=(stations, condition_dim_2))
window = 50 # we split series in 50 days (look-back window)
x, y, c1, c2 = [], [], [], []
for i in range(window, continuous_data.shape[1]):
    x.append(continuous_data[:, i - window:i])
    y.append(continuous_data[:, i])
    c1.append(condition_data_1) # just replicate.
    c2.append(condition_data_2) # just replicate.
# now we have (batch_dim, station_dim, time_steps, input_dim).
x = np.array(x)
y = np.array(y)
c1 = np.array(c1)
c2 = np.array(c2)
print(x.shape, y.shape, c1.shape, c2.shape)
# let's collapse the station_dim in the batch_dim.
x = np.reshape(x, [-1, window, x.shape[-1]])
y = np.reshape(y, [-1, y.shape[-1]])
c1 = np.reshape(c1, [-1, c1.shape[-1]])
c2 = np.reshape(c2, [-1, c2.shape[-1]])
print(x.shape, y.shape, c1.shape, c2.shape)
model = Sequential(layers=[
    ConditionalRNN(10, cell='GRU'), # num_cells = 10
    Dense(units=1, activation='linear') # regression problem.
])
model.compile(optimizer='adam', loss='mse')
model.fit(x=[x, c1, c2], y=y, epochs=2, validation_split=0.2)
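The two print calls in the script report the array shapes before and after the station axis is folded into the batch axis. The arithmetic can be reproduced without NumPy. The helper below (windowed_shapes) is a hypothetical name of mine and only mirrors the loop and the reshape calls above.

```python
def windowed_shapes(stations, time_steps, n_vars, window, cond_dims):
    """Shapes produced by the look-back windowing, before and after the
    station axis is collapsed into the batch axis."""
    n_windows = time_steps - window  # one sample per i in range(window, time_steps)
    stacked = {
        'x': (n_windows, stations, window, n_vars),
        'y': (n_windows, stations, n_vars),
    }
    for k, dim in enumerate(cond_dims, start=1):
        stacked['c%d' % k] = (n_windows, stations, dim)
    # Merge the first two axes: (n_windows, stations, ...) -> (n_windows * stations, ...)
    collapsed = {name: (shape[0] * shape[1],) + shape[2:]
                 for name, shape in stacked.items()}
    return stacked, collapsed

stacked, collapsed = windowed_shapes(stations=10, time_steps=365, n_vars=3,
                                     window=50, cond_dims=[5, 1])
print(stacked['x'], collapsed['x'])  # (315, 10, 50, 3) (3150, 50, 3)
print(collapsed['y'], collapsed['c1'], collapsed['c2'])  # (3150, 3) (3150, 5) (3150, 1)
```

Each of the 3150 samples therefore pairs a 50-step window with its own copy of the two condition vectors, which is the layout that the list [x, c1, c2] passes to fit.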
