Summarize the Problem
I have a raw signal from a sensor that is 76000 data points long, and I want to process it with a CNN. To do that, I thought I could use a Lambda layer to compute a short-time Fourier transform (STFT) of the raw signal, like this:
x = Lambda(lambda v: tf.abs(tf.signal.stft(v,frame_length=frame_length,frame_step=frame_step)))(x)
which works fine. But I want to go one step further and process the raw data beforehand, in the hope that a Conv1D layer can act as a filter that lets some frequencies pass and blocks others.
What I tried
I have both parts running separately: a Conv1D example for processing the raw data, and a Conv2D example where I process the STFT "image". But I want to combine them.
Conv1D, where the input is:
input = Input(shape=(76000,))
x = Lambda(lambda v: tf.expand_dims(v,-1))(input)
x = layers.Conv1D(filters =10,kernel_size=100,activation = 'relu')(x)
x = Flatten()(x)
output = Model(input, x)
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 76000)] 0
_________________________________________________________________
lambda_2 (Lambda) (None, 76000, 1) 0
_________________________________________________________________
conv1d (Conv1D) (None, 75901, 10) 1010
________________________________________________________________
Conv2D, with the same input:
x = Lambda(lambda v:tf.expand_dims(tf.abs(tf.signal.stft(v,frame_length=frame_length,frame_step=frame_step)),-1))(input)
x = BatchNormalization()(x)
Model: "model_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_6 (InputLayer) [(None, 76000)] 0
_________________________________________________________________
lambda_8 (Lambda) (None, 751, 513, 1) 0
_________________________________________________________________
batch_normalization_3 (Batch (None, 751, 513, 1) 4
_________________________________________________________________
. . .
. . .
flatten_4 (Flatten) (None, 1360) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 1360) 0
_________________________________________________________________
dense_2 (Dense) (None, 1) 1361
I'm looking for a way to combine the model from the "conv1d" layer up to the "lambda_8" layer. If I simply chain them, I get:
x = Lambda(lambda v: tf.expand_dims(v,-1))(input)
x = layers.Conv1D(filters =10,kernel_size=100,activation = 'relu')(x)
#x = Flatten()(x)
x = Lambda(lambda v:tf.expand_dims(tf.abs(tf.signal.stft(v,frame_length=frame_length,frame_step=frame_step)),-1))(x)
Layer (type) Output Shape Param #
=================================================================
input_6 (InputLayer) [(None, 76000)] 0
_________________________________________________________________
lambda_17 (Lambda) (None, 76000, 1) 0
_________________________________________________________________
conv1d_6 (Conv1D) (None, 75901, 10) 1010
_________________________________________________________________
lambda_18 (Lambda) (None, 75901, 0, 513, 1) 0 <-- Wrong
=================================================================
This is not what I am looking for; it should look more like (None, 751, 513, 10, 1).
So far I could not find a suitable solution.
Can someone help me?
Thanks in advance!
From the documentation, it seems tf.signal.stft only accepts (..., length) inputs; it doesn't accept (..., length, channels).
Thus, the first suggestion is to move the channels to another dimension first, keeping the length in the last index so the function works.
Now, of course, you will need matching lengths: you can't match 76000 with 75901. Thus, the second suggestion is to use padding='same' in the 1D convolutions to keep the lengths equal.
And lastly, since you will already have 10 channels in the result of the stft, you don't need to expand dims in the last lambda.
Summarizing:
1D part
inputs = Input((76000,)) #(batch, 76000)
c1Out = Lambda(lambda x: K.expand_dims(x, axis=-1))(inputs) #(batch, 76000, 1)
c1Out = Conv1D(10, 100, activation = 'relu', padding='same')(c1Out) #(batch, 76000, 10)
#permute for putting length last, apply stft, put the channels back to their position
c1Stft = Permute((2,1))(c1Out) #(batch, 10, 76000)
c1Stft = Lambda(lambda v: tf.abs(tf.signal.stft(v,
                                                frame_length=frame_length,
                                                frame_step=frame_step)))(c1Stft) #(batch, 10, probably 751, probably 513)
c1Stft = Permute((2,3,1))(c1Stft) #(batch, 751, 513, 10)
2D part, your code seems ok:
c2Out = Lambda(lambda v: tf.expand_dims(tf.abs(tf.signal.stft(v,
                                                              frame_length=frame_length,
                                                              frame_step=frame_step)),
                                        -1))(inputs) #(batch, 751, 513, 1)
Now that everything has compatible dimensions
#maybe
#c2Out = Conv2D(10, ..., padding='same')(c2Out)
joined = Concatenate()([c1Stft, c2Out]) #(batch, 751, 513, 11) #maybe (batch, 751, 513, 20)
further = BatchNormalization()(joined)
further = Conv2D(...)(further)
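To round this off, here is a minimal sketch of wrapping the joined branches into a trainable model; the head layers and their sizes are my own placeholders (continuing from further above once the Conv2D placeholder is given real arguments), not part of the original answer:

from tensorflow.keras.layers import GlobalAveragePooling2D, Dense
from tensorflow.keras.models import Model

#hypothetical classification head; layer choices are placeholders
head = GlobalAveragePooling2D()(further)
out = Dense(1, activation='sigmoid')(head)

model = Model(inputs, out)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()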
Warning: I don't know whether they made stft differentiable or not; the Conv1D part will only train if the gradients are defined.
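A quick way to check this (a sketch with example frame values, not the ones from the question) is to run a tiny forward/backward pass and see whether the Conv1D kernel receives a gradient through the STFT:

import tensorflow as tf

conv = tf.keras.layers.Conv1D(10, 100, padding='same')
signal = tf.random.normal((1, 76000, 1))

with tf.GradientTape() as tape:
    filtered = conv(signal)                                  #(1, 76000, 10)
    filtered = tf.transpose(filtered, [0, 2, 1])             #channels first, length last
    spec = tf.abs(tf.signal.stft(filtered, frame_length=1000, frame_step=100))
    loss = tf.reduce_sum(spec)

grads = tape.gradient(loss, conv.trainable_variables)
print([g is not None for g in grads])                        #all True -> gradients flow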
Related
When I train my model it has a two-dimensional output, (None, 1), corresponding to the time series I'm trying to predict. But whenever I load the saved model to make predictions, it has a three-dimensional output, (None, 40, 1), where 40 corresponds to the n_steps required to fit the Conv1D network. What is wrong?
Here is the code:
import numpy as np
import matplotlib.pyplot as plt
# autoencoder_conv1D and f.separar_interface are helper functions defined elsewhere

df = np.load('Principal.npy')
# Conv1D
#model = load_model('ModeloConv1D.h5')
model = autoencoder_conv1D((2, 20, 17), n_steps=40)
model.load_weights('weights_35067.hdf5')
# summarize model.
model.summary()
# load dataset
df = df
# split into input (X) and output (Y) variables
X = f.separar_interface(df, n_steps=40)
# THE X INPUT SHAPE (59891, 17) length and attributes, respectively ##
# conv1D input format
X = X.reshape(X.shape[0], 2, 20, X.shape[2])
# Make predictions
test_predictions = model.predict(X)
## test_predictions.shape = (59891, 40, 1)
test_predictions = model.predict(X).flatten()
##test_predictions.shape = (2395640, 1)
plt.figure(3)
plt.plot(test_predictions)
plt.legend('Prediction')
plt.show()
In the plot below you can see that it is plotting the input format.
Here is the network architecture:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
time_distributed_70 (TimeDis (None, 1, 31, 24) 4104
_________________________________________________________________
time_distributed_71 (TimeDis (None, 1, 4, 24) 0
_________________________________________________________________
time_distributed_72 (TimeDis (None, 1, 4, 48) 9264
_________________________________________________________________
time_distributed_73 (TimeDis (None, 1, 1, 48) 0
_________________________________________________________________
time_distributed_74 (TimeDis (None, 1, 1, 64) 12352
_________________________________________________________________
time_distributed_75 (TimeDis (None, 1, 1, 64) 0
_________________________________________________________________
time_distributed_76 (TimeDis (None, 1, 64) 0
_________________________________________________________________
lstm_17 (LSTM) (None, 100) 66000
_________________________________________________________________
repeat_vector_9 (RepeatVecto (None, 40, 100) 0
_________________________________________________________________
lstm_18 (LSTM) (None, 40, 100) 80400
_________________________________________________________________
time_distributed_77 (TimeDis (None, 40, 1024) 103424
_________________________________________________________________
dropout_9 (Dropout) (None, 40, 1024) 0
_________________________________________________________________
dense_18 (Dense) (None, 40, 1) 1025
=================================================================
Since I've found my mistake, and since it may be useful for someone else, I'll answer my own question:
The network output has the same shape as the training labels. That means the saved model generates an output with shape (None, 40, 1) because that is exactly the shape I gave the training labels.
You notice a difference between the output during training and the output when predicting because you are most probably using a method such as train_test_split while training, which shuffles the data; what you see at the end of training is the prediction for that shuffled batch.
To correct the problem, change the shape of the dataset labels from (None, 40, 1) to (None, 1), since this is a regression problem on a time series. To fix that in the network above, add a Flatten layer before the dense output layer, as sketched below; then you get the result you are looking for.
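Here is a minimal sketch of that change, covering only the decoder tail from the summary above (the variable names and the dropout rate are my own assumptions):

from tensorflow.keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense, Dropout, Flatten
from tensorflow.keras.models import Model

encoded = Input(shape=(100,))                            # stands in for the encoder output
x = RepeatVector(40)(encoded)                            # (None, 40, 100)
x = LSTM(100, return_sequences=True)(x)                  # (None, 40, 100)
x = TimeDistributed(Dense(1024, activation='relu'))(x)   # (None, 40, 1024)
x = Dropout(0.2)(x)
x = Flatten()(x)                                         # (None, 40960)
out = Dense(1)(x)                                        # (None, 1), matching the (None, 1) labels

decoder_tail = Model(encoded, out)
decoder_tail.summary()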
I want to predict a time series X1 of n points based on two different time series, X2 and X3, also of n points each. Those time series interact, so I was hoping to use methods similar to those used for combining images to produce another image.
So far I have successfully implemented an autoencoder to learn and return all three time series (X1, X2, X3). When I try to set up a neural network that uses only X2 and X3 to predict X1 (of 3000 units), the model doesn't compile and I get an error:
Error when checking target: expected sequential_9 to have 2 dimensions, but got array with shape (61, 3000, 1, 1)
In different combinations it breaks at flatten_x or dense_x.
It works if my output is of only one unit and not 3000.
The network I tried would have the following layers:
Layer (type) Output Shape Param #
=================================================================
input_8 (InputLayer) (None, 3000, 2, 1) 0
_________________________________________________________________
conv2d_10 (Conv2D) (None, 3000, 2, 32) 96
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 1500, 2, 32) 0
_________________________________________________________________
flatten_7 (Flatten) (None, 96000) 0
_________________________________________________________________
dense_6 (Dense) (None, 3000, 1, 32)
Here is the code that I'm using to create the net:
network = Sequential((
Conv2D(filters=32, kernel_size=(1,2), activation='relu', input_shape=(x, y, inChannel)),
MaxPooling2D(pool_size = (2, 1)),
Flatten(),
Dense(3000, activation='relu'),
))
network.compile(loss='mean_squared_error', optimizer = RMSprop())
The input has shape (61, 3000, 2, 1).
Should I specify the expected inputs/outputs somewhere and I'm not doing that? Make some data transformations on the way? Maybe use a different architecture?
Thanks for all suggestions!
The network you've coded does not have the architecture you intended
If you print out
network.summary()
you get
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_6 (Conv2D) (None, 3000, 1, 32) 96
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 1500, 1, 32) 0
_________________________________________________________________
flatten_6 (Flatten) (None, 48000) 0
_________________________________________________________________
dense_6 (Dense) (None, 3000) 144003000
=================================================================
Total params: 144,003,096
Trainable params: 144,003,096
Non-trainable params: 0
So, you have to change your architecture to get the desired output shape.
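For concreteness, here is one possible adjustment (my own sketch, not part of the original answer): keep the Dense(3000) head and add a Reshape layer so the model's output matches the (61, 3000, 1, 1) target; alternatively, squeeze the target itself down to (61, 3000).

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Reshape
from tensorflow.keras.optimizers import RMSprop

x, y, inChannel = 3000, 2, 1

network = Sequential([
    Conv2D(filters=32, kernel_size=(1, 2), activation='relu', input_shape=(x, y, inChannel)),
    MaxPooling2D(pool_size=(2, 1)),
    Flatten(),
    Dense(3000, activation='relu'),
    Reshape((3000, 1, 1)),       # output is now (None, 3000, 1, 1), matching the target shape
])
network.compile(loss='mean_squared_error', optimizer=RMSprop())
network.summary()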
I am building an LSTM network for multivariate time series classification, using two categorical features for which I have created Embedding layers in Keras. The model compiles, and the architecture is displayed below along with the code. I am getting a ValueError: all the input array dimensions except for the concatenation axis must match exactly. This is strange to me, because the model compiles and the output shapes seem to match (3D tensors concatenated along axis=-1). The x argument to model.fit is a list of 3 inputs: the first categorical variable array, the second categorical variable array, and the 3D multivariate time-series input for the LSTM.
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_4 (InputLayer) (None, 1) 0
__________________________________________________________________________________________________
input_5 (InputLayer) (None, 1) 0
__________________________________________________________________________________________________
VAR_1 (Embedding) (None, 46, 5) 50 input_4[0][0]
__________________________________________________________________________________________________
VAR_2 (Embedding) (None, 46, 13) 338 input_5[0][0]
__________________________________________________________________________________________________
time_series (InputLayer) (None, 46, 11) 0
__________________________________________________________________________________________________
concatenate_3 (Concatenate) (None, 46, 18) 0 VAR_1[0][0]
VAR_2[0][0]
__________________________________________________________________________________________________
concatenate_4 (Concatenate) (None, 46, 29) 0 time_series[0][0]
concatenate_3[0][0]
__________________________________________________________________________________________________
lstm_2 (LSTM) (None, 46, 100) 52000 concatenate_4[0][0]
__________________________________________________________________________________________________
attention_2 (Attention) (None, 100) 146 lstm_2[0][0]
__________________________________________________________________________________________________
dense_2 (Dense) (None, 1) 101 attention_2[0][0]
==================================================================================================
Total params: 52,635
Trainable params: 52,635
Non-trainable params: 0
n_timesteps = 46
n_features = 11
def EmbeddingNet(cat_vars, n_timesteps, n_features, embedding_sizes):
    inputs = []
    embed_layers = []
    for (c, (in_size, out_size)) in zip(cat_vars, embedding_sizes):
        i = Input(shape=(1,))
        o = Embedding(in_size, out_size, input_length=n_timesteps, name=c)(i)
        inputs.append(i)
        embed_layers.append(o)

    embed = Concatenate()(embed_layers)

    time_series_input = Input(batch_shape=(None, n_timesteps, n_features), name='time_series')
    inputs.append(time_series_input)

    concatenated_inputs = Concatenate(axis=-1)([time_series_input, embed])

    lstm_layer1 = LSTM(units=100, return_sequences=True)(concatenated_inputs)
    attention = Attention()(lstm_layer1)   # custom attention layer
    output_layer = Dense(1, activation="sigmoid")(attention)

    opt = Adam(lr=0.001)
    model = Model(inputs=inputs, outputs=output_layer)
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
    model.summary()
    return model
model = EmbeddingNet(cat_vars,n_timesteps,n_features,embedding_sizes)
history = model.fit(x=[x_train_cat_array[0],x_train_cat_array[1],x_train_input], y=y_train_input, batch_size=8, epochs=1, verbose=1, validation_data=([x_val_cat_array[0],x_val_cat_array[1],x_val_input], y_val_input),shuffle=False)
I'm trying to do the very same thing. You should concatenate over axis 2. Please check HERE.
Let me know if this works on your dataset, because categorical features are not giving me any benefit.
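As an illustration (made-up shapes, not from the original question), concatenation along axis 2 only works when every other dimension matches, which is exactly the check behind the ValueError above:

import numpy as np

a = np.zeros((8, 46, 18))   # e.g. the concatenated embeddings
b = np.zeros((8, 46, 11))   # e.g. the time-series batch
print(np.concatenate([a, b], axis=2).shape)   # (8, 46, 29)

# If any other axis disagrees, e.g. (8, 1, 18) vs (8, 46, 11), numpy raises:
# "all the input array dimensions except for the concatenation axis must match exactly"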
I need to identify whether two fingerprints (one from an ID card and one from a sensor) match or not. Below are some examples from my database (3000 pairs of images):
Example of matching images
Example of non-matching images
I am trying to train a Siamese network that receives a pair of images and outputs [1, 0] if they don't match and [0, 1] if they match. I created my model with Keras:
image_left = Input(shape=(200, 200, 1))
image_right = Input(shape=(200, 200, 1))
vector_left = conv_base(image_left)
vector_right = conv_base(image_right)
merged_features = concatenate([vector_left, vector_right], axis=-1)
fc1 = Dense(64, activation='relu')(merged_features)
fc1 = Dropout(0.2)(fc1)
# # fc2 = Dense(128, activation='relu')(fc1)
pred = Dense(2, activation='softmax')(fc1)
model = Model(inputs=[image_left, image_right], outputs=pred)
where conv_base is a convolutional architecture. I have tried ResNet, LeNet, MobileNetV2 and NASNet from keras.applications, but none of them work.
conv_base = NASNetMobile(weights = None,
include_top=True,
classes=256)
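For reference, this is roughly how the two-input model is compiled and trained with the [1, 0] / [0, 1] labels described above (a sketch; pairs_left, pairs_right and labels are hypothetical arrays of shapes (N, 200, 200, 1), (N, 200, 200, 1) and (N, 2)):

# hypothetical training call, not taken from the question
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.fit([pairs_left, pairs_right], labels,
          batch_size=32,
          epochs=20,
          validation_split=0.2)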
My model summary is similar to the one shown below (depending on the network used):
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_2 (InputLayer) (None, 200, 200, 1) 0
__________________________________________________________________________________________________
input_3 (InputLayer) (None, 200, 200, 1) 0
__________________________________________________________________________________________________
NASNet (Model) (None, 256) 4539732 input_2[0][0]
input_3[0][0]
__________________________________________________________________________________________________
concatenate_5 (Concatenate) (None, 512) 0 NASNet[1][0]
NASNet[2][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 64) 32832 concatenate_5[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout) (None, 64) 0 dense_1[0][0]
__________________________________________________________________________________________________
dense_2 (Dense) (None, 2) 130 dropout_1[0][0]
==================================================================================================
Total params: 4,572,694
Trainable params: 4,535,956
Non-trainable params: 36,738
In addition to changing the convolutional architecture, I've tried using pre-trained weights, setting all layers as trainable, setting only the last convolutional layers as trainable, data augmentation, categorical_crossentropy and contrastive_loss functions, and changing the learning rate, but they all show the same behavior: training and validation accuracy are always 0.5.
Does anybody have an idea about what I am missing/doing wrong?
Thank you.
I'm trying to build a Q&A model based on the bAbI Task 8 example and I am having trouble merging two of my input layers into one layer. Here is my current model architecture:
story_input = Input(shape=(story_maxlen,vocab_size), name='story_input')
story_input_proc = Embedding(vocab_size, latent_dim, name='story_input_embed', input_length=story_maxlen)(story_input)
story_input_proc = Reshape((latent_dim,story_maxlen), name='story_input_reshape')(story_input_proc)
query_input = Input(shape=(query_maxlen,vocab_size), name='query_input')
query_input_proc = Embedding(vocab_size, latent_dim, name='query_input_embed', input_length=query_maxlen)(query_input)
query_input_proc = Reshape((latent_dim,query_maxlen), name='query_input_reshape')(query_input_proc)
story_query = dot([story_input_proc, query_input_proc], axes=(1, 1), name='story_query_merge')
encoder = LSTM(latent_dim, return_state=True, name='encoder')
encoder_output, state_h, state_c = encoder(story_query)
encoder_output = RepeatVector(3, name='encoder_3dim')(encoder_output)
encoder_states = [state_h, state_c]
decoder = LSTM(latent_dim, return_sequences=True, name='decoder')(encoder_output, initial_state=encoder_states)
answer_output = Dense(vocab_size, activation='softmax', name='answer_output')(decoder)
model = Model([story_input, query_input], answer_output)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
and here is the output of model.summary()
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
story_input (InputLayer) (None, 358, 38) 0
__________________________________________________________________________________________________
query_input (InputLayer) (None, 5, 38) 0
__________________________________________________________________________________________________
story_input_embed (Embedding) (None, 358, 64) 2432 story_input[0][0]
__________________________________________________________________________________________________
query_input_embed (Embedding) (None, 5, 64) 2432 query_input[0][0]
__________________________________________________________________________________________________
story_input_reshape (Reshape) (None, 64, 358) 0 story_input_embed[0][0]
__________________________________________________________________________________________________
query_input_reshape (Reshape) (None, 64, 5) 0 query_input_embed[0][0]
__________________________________________________________________________________________________
story_query_merge (Dot) (None, 358, 5) 0 story_input_reshape[0][0]
query_input_reshape[0][0]
__________________________________________________________________________________________________
encoder (LSTM) [(None, 64), (None, 17920 story_query_merge[0][0]
__________________________________________________________________________________________________
encoder_3dim (RepeatVector) (None, 3, 64) 0 encoder[0][0]
__________________________________________________________________________________________________
decoder (LSTM) (None, 3, 64) 33024 encoder_3dim[0][0]
encoder[0][1]
encoder[0][2]
__________________________________________________________________________________________________
answer_output (Dense) (None, 3, 38) 2470 decoder[0][0]
==================================================================================================
Total params: 58,278
Trainable params: 58,278
Non-trainable params: 0
__________________________________________________________________________________________________
where vocab_size = 38, story_maxlen = 358, query_maxlen = 5, latent_dim = 64, and the batch size = 64.
When I try to train this model I get the error:
Input to reshape is a tensor with 778240 values, but the requested shape has 20480
Here is where those two values come from:
input_to_reshape = batch_size * latent_dim * query_maxlen * vocab_size = 64 * 64 * 5 * 38 = 778240
requested_shape = batch_size * latent_dim * query_maxlen = 64 * 64 * 5 = 20480
Where I'm At
I believe the error message is saying that the tensor fed into the query_input_reshape layer has shape (?, 5, 38, 64), while it expects a tensor of shape (?, 5, 64) (see the formulas above), but I could be wrong about that.
When I change the target_shape input of Reshape to be 3D (i.e. Reshape((latent_dim, query_maxlen, vocab_size))) I get the error total size of new array must be unchanged, which doesn't make any sense to me because the input is 3D. You would think that Reshape((latent_dim, query_maxlen)) would give me that error, because it would be changing a 3D tensor into a 2D one, but it compiles fine, so I have no clue what's going on there.
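For reference, a small sketch (with made-up shapes) of how Keras Reshape behaves: target_shape excludes the batch dimension, and the total number of elements per sample must stay the same, which is where "total size of new array must be unchanged" comes from.

from tensorflow.keras.layers import Input, Reshape

t = Input(shape=(5, 38, 64))     # 5*38*64 = 12160 elements per sample
u = Reshape((38, 5, 64))(t)      # OK: 38*5*64 = 12160
# v = Reshape((64, 5))(t)        # would fail: 64*5 = 320 != 12160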
The only reason I'm using Reshape is because I need to merge the two tensors as an input into the LSTM encoder. When I try to get rid of the Reshape layers I just get dimension mismatch errors when I try to compile the model. The model architecture above at least compiles but I can't train it.
Can someone please help me figure out how I can merge the story_input and query_input layers? Thanks!