I expect an output of shape (112,112,32) after a Conv3x3, but I get an output of shape (111,111,32). What can I do?

x = model.add(Conv2D(32,3, input_shape = (224,224,1),strides=2))
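With a 3x3 kernel and stride 2, Conv2D defaults to padding='valid', which gives floor((224 - 3) / 2) + 1 = 111. Passing padding='same' zero-pads the input so the output size becomes ceil(224 / 2) = 112. A minimal sketch, assuming the tf.keras Sequential API:
import tensorflow as tf
model = tf.keras.Sequential()
# padding='same' makes the output size ceil(input_size / stride) = 112
model.add(tf.keras.layers.Conv2D(32, 3, strides=2, padding='same',
                                 input_shape=(224, 224, 1)))
model.summary()  # conv2d output shape: (None, 112, 112, 32)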

Related

“Concatenate layer” problem when doing GRAD-CAM. How to overcome this in my custom functional model?

I am having problems with Grad-CAM. I would be grateful if anyone could help. My code is here:
https://www.kaggle.com/mervearmagan/gradcamproblem
I couldn't fix the error I got:
ValueError: Input 0 is incompatible with layer model_1: expected
shape=(None, 512, 512, 3), found shape=(512, 512, 3)
img = tf.keras.layers.Input(shape = IMG_SHAPE)
gender = tf.keras.layers.Input(shape=(1,))
base_model = tf.keras.applications.InceptionV3(input_shape = IMG_SHAPE, include_top = False, weights = 'imagenet')
cnn_vec=base_model(img)
cnn_vec = tf.keras.layers.GlobalAveragePooling2D()(cnn_vec)
cnn_vec = tf.keras.layers.Dropout(0.20)(cnn_vec)
gender_vec = tf.keras.layers.Dense(32,activation = 'relu')(gender)
features = tf.keras.layers.Concatenate(axis=-1)([cnn_vec,gender_vec])
dense_layer = tf.keras.layers.Dense(256,activation = 'relu')(features)
dense_layer = tf.keras.layers.Dropout(0.1)(dense_layer)
dense_layer = tf.keras.layers.Dense(128,activation = 'relu')(dense_layer)
dense_layer = tf.keras.layers.Dropout(0.1)(dense_layer)
dense_layer = tf.keras.layers.Dense(64,activation = 'relu')(dense_layer)
output_layer = tf.keras.layers.Dense(1, activation = 'linear')(dense_layer)
model = tf.keras.Model(inputs=[img, gender], outputs=output_layer)
def make_gradcam_heatmap(img_array, model, last_conv_layer_name, classifier_layer_names):
    last_conv_layer = model.get_layer(last_conv_layer_name)
    last_conv_layer_model = tf.keras.Model(model.inputs, last_conv_layer.output)
    classifier_input = tf.keras.layers.Input(shape=last_conv_layer.output.shape)
    #classifier_input = tf.keras.layers.Input(shape=last_conv_layer.output.shape[1:])
    x = classifier_input
    for layer_name in classifier_layer_names:
        x = model.get_layer(layer_name)(x)
    classifier_model = tf.keras.Model(classifier_input, x)
    with tf.GradientTape() as tape:
        last_conv_layer_output = last_conv_layer_model(img_array)
        tape.watch(last_conv_layer_output)
        preds = classifier_model(last_conv_layer_output)
        top_pred_index = tf.argmax(preds[0])
        top_class_channel = preds[:, top_pred_index]
    grads = tape.gradient(top_class_channel, last_conv_layer_output)
    pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))
    last_conv_layer_output = last_conv_layer_output.numpy()[0]
    pooled_grads = pooled_grads.numpy()
    for i in range(pooled_grads.shape[-1]):
        last_conv_layer_output[:, :, i] *= pooled_grads[i]
    heatmap = np.mean(last_conv_layer_output, axis=-1)
    heatmap = np.maximum(heatmap, 0) / np.max(heatmap)
    return heatmap
last_conv_layer_name = 'global_average_pooling2d'
classifier_layer_names = ['dense_4']
img = get_input('4360.png')
inputgender=tf.ones((1,1))
image=tf.reshape(img,(1,512,512,3))
heatmap = make_gradcam_heatmap([image,inputgender], model, last_conv_layer_name, classifier_layer_names)
When running the model, remember to test it using inputs of the form:
model([tf.ones((1,512,512,3)), tf.ones((1,1))])
...in the case where you feed one image and one gender to the network. The first "1" in each tensor is the batch dimension (the first batch of samples, and so on). That kind of input should give a result like:
...which looks OK at this stage. Go through your code, check this stage first, and then move forward in your program.
This is a handy way to convert an image in numpy array format to a tensor with an extra batch dimension, compatible with the neural network input:
#Advice on how to convert an image to tensor format...
import tensorflow as tf
import numpy as np
#Download an image...suppose it has size 512x512x3...e.g. using PIL or whatever suitable library...
#image = Image.open('smile_or_not.png')
#Convert the image to numpy...here we simulate it because no real image was loaded...
image_np = np.random.rand(512, 512, 3)
#Let's see its shape...
print("Size of input image:", image_np.shape)
#And convert it to a tensor of shape (1, height, width, 3)
in_tensor_format = tf.reshape(image_np, (1, 512, 512, 3))
print("...has a shape of: ", in_tensor_format.shape, "...when converted to tensor")
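A related detail worth checking (my note, prompted by the commented-out alternative inside make_gradcam_heatmap): tf.keras.layers.Input(shape=...) expects the per-sample shape without the leading batch dimension, so slicing the first entry off last_conv_layer.output.shape is usually what you want there. A minimal sketch of the difference:
import tensorflow as tf
x = tf.keras.layers.Input(shape=(512, 512, 3))
conv = tf.keras.layers.Conv2D(8, 3)(x)
print(conv.shape)  # (None, 510, 510, 8): the leading None is the batch dimension
# Input(shape=...) takes the per-sample shape, so drop that leading None:
classifier_input = tf.keras.layers.Input(shape=conv.shape[1:])
print(classifier_input.shape)  # (None, 510, 510, 8): the batch dim is re-added automatically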

How to compare one picture to all data test in siamese neural network?

I've been building a Siamese neural network using PyTorch. But I've only tested it by inserting 2 pictures and calculating the similarity score, where 0 says the pictures are different and 1 says they are the same.
import numpy as np
import os, sys
import torch
from PIL import Image

dir_name = "/Users/tania/Desktop/Aksara/Compare" #this should contain 26 images only
X = []
for i in os.listdir(dir_name):
    if ".PNG" in i:
        X.append(torch.from_numpy(np.array(Image.open("./Compare/" + i))))

x1 = np.array(Image.open("/Users/tania/Desktop/Aksara/TEST/Ba/B/B.PNG"))
x1 = transforms(x1) #transforms: preprocessing defined elsewhere
x1 = torch.from_numpy(x1)
#x1 = torch.stack([x1])

closest = 0.0 #highest similarity
closest_letter_idx = 0 #index of closest letter 0=A, 1=B, ...
cnt = 0
for i in X:
    output = model(x1, i) #assuming x1 is your input image
    output = torch.sigmoid(output)
    if output > closest:
        closest_letter_idx = cnt
        closest = output
    cnt += 1
Both pictures are different, so the output reflects that. However, when loading the comparison images I get:
File "test.py", line 83, in <module>
    X.append(torch.from_numpy(Image.open("./Compare/" + i)))
TypeError: expected np.ndarray (got PngImageFile)
This is the directory:
Yes, there is a way: you could use the softmax function:
output = torch.softmax(output, dim=0)
This returns a tensor of 26 values, each corresponding to the probability that the image belongs to one of the 26 classes. Hence, the tensor sums to 1 (100%).
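For instance, a minimal sketch assuming a 1-D tensor of 26 raw scores:
import torch
logits = torch.randn(26)           # raw scores, one per letter class
probs = torch.softmax(logits, dim=0)
print(probs.sum())                 # tensor(1.), i.e. the probabilities sum to 100%
print(torch.argmax(probs).item())  # index of the most likely class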
However, this method is suited to classification tasks, as opposed to Siamese networks. Siamese networks compare two inputs instead of sorting inputs into classes. From your question, it seems you're trying to compare one picture with 26 others. You could loop over all 26 samples, compute and save the similarity score for each, and output the index of the maximum value (that is, if you don't want to modify your model):
import os
import numpy as np
import torch
from PIL import Image

dir_name = '/Aksara/Compare' #this should contain 26 images only
X = []
for i in os.listdir(dir_name):
    if ".PNG" in i:
        X.append(torch.from_numpy(np.array(Image.open("./Compare/" + i))))

x1 = np.array(Image.open("test.PNG"))
#do your transformations on x1
x1 = torch.from_numpy(x1)

closest = 0.0 #highest similarity
closest_letter_idx = 0 #index of closest letter 0=A, 1=B, ...
cnt = 0
for i in X:
    output = model(x1, i) #assuming x1 is your input image
    output = torch.sigmoid(output)
    if output > closest:
        closest_letter_idx = cnt
        closest = output
    cnt += 1
print(closest_letter_idx)

How to batch a variable-length spectrogram in TensorFlow

I have to train a denoising autoencoder, and I need to batch 5-frame noisy power spectra with 1-frame clean power spectra, but I don't know how to batch the spectrograms since my data are all of variable length along the time axis.
def parse_line(noise_file, clean_file):
    noise_binary = tf.read_file(noise_file)
    noise_binary = tf.contrib.ffmpeg.decode_audio(noise_binary, file_format='wav', samples_per_second=16000, channel_count=1)
    noise_stfts = tf.contrib.signal.stft(tf.reshape(noise_binary, [1, -1]), frame_length=512, frame_step=256, fft_length=512)
    noise_powerspectrum = tf.log(tf.abs(noise_stfts)**2)
    noise_data = tf.squeeze(tf.contrib.signal.frame(noise_powerspectrum, frame_length=5, frame_step=1, axis=1))
    clean_binary = tf.read_file(clean_file)
    clean_binary = tf.contrib.ffmpeg.decode_audio(clean_binary, file_format='wav', samples_per_second=16000, channel_count=1)
    clean_stfts = tf.contrib.signal.stft(tf.reshape(clean_binary, [1, -1]), frame_length=512, frame_step=256, fft_length=512)
    clean_powerspectrum = tf.log(tf.abs(clean_stfts)**2)
    clean_data = tf.squeeze(clean_powerspectrum)[:-4]
    return noise_data, clean_data
My tf.data pipeline is shown below:
shuffle_batch = 10
batch_size = 10
dataset = tf.data.Dataset.from_tensor_slices((noise_datalist,clean_datalist))
dataset = dataset.shuffle(shuffle_batch) # shuffle number of files per batch
dataset = dataset.map(parse_line,num_parallel_calls=8)
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(tf.contrib.data.AUTOTUNE)
dataset = dataset.make_one_shot_iterator()
next_element = dataset.get_next()
This is the error that shows up:
InvalidArgumentError (see above for traceback): Cannot batch tensors with different shapes in component 0. First element had shape [443,5,257] and element 1 had shape [280,5,257].
[[{{node IteratorGetNext}} = IteratorGetNext[output_shapes=[<unknown>, <unknown>], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](OneShotIterator)]]
When I change batch_size to 1, it works and I get one example. How can I batch this variable-length data, or maybe even concatenate all the data into one, like [443,5,257] and [280,5,257] into [723,5,257]?
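One way to get exactly the concatenation described above ([443,5,257] and [280,5,257] into [723,5,257]) is to unroll each file's frames into individual examples before batching, so that every element has the fixed shape (5, 257). A sketch of one possible pipeline, assuming it is acceptable to mix frames from different files in a batch (same TF 1.x setup as above):
dataset = tf.data.Dataset.from_tensor_slices((noise_datalist, clean_datalist))
dataset = dataset.shuffle(shuffle_batch)
dataset = dataset.map(parse_line, num_parallel_calls=8)
# flat_map unrolls each file's (T, 5, 257) noisy / (T, 257) clean pair into
# T single-frame examples, so variable-length files can be batched freely
dataset = dataset.flat_map(
    lambda noise, clean: tf.data.Dataset.from_tensor_slices((noise, clean)))
dataset = dataset.batch(batch_size)  # each element now has fixed shapes (5, 257) and (257,)
dataset = dataset.prefetch(tf.contrib.data.AUTOTUNE)
next_element = dataset.make_one_shot_iterator().get_next()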

Theano function with updates produces NaN output

My code looks like this:
output = lasagne.layers.get_output(output_layer)
loss = function(output) * target
loss = -(loss.sum())
params = lasagne.layers.get_all_params(output_layer)
updates = lasagne.updates.sgd(loss,params,learning_rate=0.00001)
train_fn = theano.function([input,target], loss, updates=updates,allow_input_downcast=True)
validate_fn = theano.function([input,target], loss, allow_input_downcast=True)
Here, output_layer is a CNN, and function is defined as follows:
def function(X):
    squared_euclidean_distances = (X ** 2).sum(1).reshape((X.shape[0], 1)) + (X ** 2).sum(1).reshape((1, X.shape[0])) - 2 * X.dot(X.T)
    dist = 1 / (1 + squared_euclidean_distances)
    Pij = dist / dist.sum(0)
    return Pij
target is a sparse matrix where target(i,j) = 1 if output(i) and output(j) belong to the same class, and target(i,j) = 0 otherwise.
When digging into the code, I found that the error comes from a conv layer in the CNN, raised by a true_div op.
Clearly, the only difference between train_fn and validate_fn is the updates parameter.
However, when I print the output of train_fn and validate_fn with the same dummy input, the output of validate_fn makes sense, but the train_fn output is NaN.
I think the output is produced before the updates change the parameters. Is anything wrong?
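One thing worth ruling out (my suggestion, not a confirmed diagnosis): both the 1/(1 + d) term and the division by dist.sum(0) inside function are true_div ops that can misbehave in float32, and a small epsilon keeps them numerically safe. A sketch of a stabilized version, with EPS as a hypothetical constant:
import theano.tensor as T
EPS = 1e-8  # hypothetical constant; tune for your data
def stable_function(X):
    # pairwise squared Euclidean distances, same as the original
    sq = (X ** 2).sum(1)
    squared_euclidean_distances = sq.reshape((X.shape[0], 1)) + sq.reshape((1, X.shape[0])) - 2 * X.dot(X.T)
    # clip the tiny negatives that float rounding can introduce
    squared_euclidean_distances = T.maximum(squared_euclidean_distances, 0.)
    dist = 1. / (1. + squared_euclidean_distances)
    # the epsilon keeps the normalizing sum away from zero
    Pij = dist / (dist.sum(0) + EPS)
    return Pij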

LSTMLayer produces NaN values even before training it

I'm currently trying to construct an LSTM network with Lasagne to predict the next step of noisy sequences. I first trained a stack of 2 LSTM layers for a while, but had to use an abysmally small learning rate (1e-6) because of divergence issues (that ultimately produced NaN values). The results were kind of disappointing, as the network produced smooth, out-of-phase versions of the input.
I then came to the conclusion I should use better parameter initialization than the default. The goal was to start from a network that just mimics identity, since for strongly auto-correlated signals it should be a good first estimate of the next step (x(t) ~ x(t+1)), and to sprinkle a bit of noise on top of it.
import theano, numpy, lasagne
from theano import tensor as T
from lasagne.layers.recurrent import LSTMLayer, InputLayer, Gate
from lasagne.layers import DropoutLayer
from lasagne.nonlinearities import sigmoid, tanh, leaky_rectify
from lasagne.layers import get_output
from lasagne.init import GlorotNormal, Normal, Constant
floatX = 'float32'
# function to create an LSTM that ~propagates the input from start to finish off the bat
# should be a good start for a predictive LSTM with high one-step autocorrelation
def create_identity_lstm(input, shape, orig_inp=None, noiselvl=0.01, G=10., mask_input=None):
    inp, out = shape
    # orig_inp limits the number of units actually used to pass the input information
    # from one layer to the other - the rest of the units should produce ~0 activation.
    if orig_inp is None:
        orig_inp = inp
    # input gate
    inputgate = Gate(
        W_in=GlorotNormal(noiselvl),
        W_hid=GlorotNormal(noiselvl),
        W_cell=Normal(noiselvl),
        b=Constant(0.),
        nonlinearity=sigmoid
    )
    # forget gate
    forgetgate = Gate(
        W_in=GlorotNormal(noiselvl),
        W_hid=GlorotNormal(noiselvl),
        W_cell=Normal(noiselvl),
        b=Constant(0.),
        nonlinearity=sigmoid
    )
    # cell gate
    cell = Gate(
        W_in=GlorotNormal(noiselvl),
        W_hid=GlorotNormal(noiselvl),
        W_cell=None,
        b=Constant(0.),
        nonlinearity=leaky_rectify
    )
    # output gate
    outputgate = Gate(
        W_in=GlorotNormal(noiselvl),
        W_hid=GlorotNormal(noiselvl),
        W_cell=Normal(noiselvl),
        b=Constant(0.),
        nonlinearity=sigmoid
    )
    lstm = LSTMLayer(input, out, ingate=inputgate, forgetgate=forgetgate, cell=cell,
                     outgate=outputgate, nonlinearity=leaky_rectify, mask_input=mask_input)
    # change matrices and biases
    # ingate - should return ~1 (matrices = 0, big bias)
    b_i = lstm.b_ingate.get_value()
    b_i[:orig_inp] += G
    lstm.b_ingate.set_value(b_i)
    # forgetgate - should return 0 (matrices = 0, big negative bias)
    b_f = lstm.b_forgetgate.get_value()
    b_f[:orig_inp] -= G
    b_f[orig_inp:] += G  # to help learning future features, I preserve a large bias on "unused" units to help them remember stuff
    lstm.b_forgetgate.set_value(b_f)
    # cell - should return x(t) (W_xc = identity, rest is 0)
    W_xc = lstm.W_in_to_cell.get_value()
    for i in xrange(orig_inp):
        W_xc[i, i] += 1.
    lstm.W_in_to_cell.set_value(W_xc)
    # outgate - should return 1 (same as ingate)
    b_o = lstm.b_outgate.get_value()
    b_o[:orig_inp] += G
    lstm.b_outgate.set_value(b_o)
    # done
    return lstm
I then use this LSTM-generation code to build the following network:
# layers
#input + dropout
input = InputLayer((None, None, 7), name='input')
mask = InputLayer((None, None), name='mask')
drop1 = DropoutLayer(input, p=0.33)
#lstm1 + dropout
lstm1 = create_identity_lstm(drop1, (7, 1024), mask_input=mask)
drop2 = DropoutLayer(lstm1, p=0.33)
#lstm2 + dropout
lstm2 = create_identity_lstm(drop2, (1024, 128), orig_inp=7, mask_input=mask)
drop3 = DropoutLayer(lstm2, p=0.33)
#lstm3
lstm3 = create_identity_lstm(drop3, (128, 7), orig_inp=7, mask_input=mask)
# symbolic variables and prediction
x = input.input_var
ma = mask.input_var
ma_reshape = ma.dimshuffle((0,1,'x'))
yhat = get_output(lstm3, deterministic=False)
yhat_det = get_output(lstm3, deterministic=True)
y = T.ftensor3('y')
predict = theano.function([x, ma], yhat_det)
The problem is, even without any training, this network produces garbage values and sometimes even a bunch of NaNs, right from the very first LSTM layer:
X = numpy.random.random((5, 10000, 7)).astype('float32')
Masks = numpy.ones(X.shape[:2], dtype='float32')
hid1 = get_output(lstm1, deterministic=True)
get_hid1 = theano.function([x, ma], hid1)
h1 = get_hid1(X, Masks)
print numpy.isnan(h1).sum(axis=1).sum(axis=1)
array([6379520, 6367232, 6377472, 6376448, 6378496])
# even the first output value is garbage!
print h1[:,0,0] - X[:,0,0]
array([-0.03898358, -0.10118812, 0.34877831, -0.02509735, 0.36689138], dtype=float32)
I don't get why. I checked each matrix and its values are fine, as I wanted them to be. I even tried to recreate each gate activation and the resulting hidden activations using the actual numpy arrays, and they reproduce the input just fine. What did I do wrong here?
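One way to narrow this down (a debugging sketch of my own, reusing the question's get_hid1, X and Masks; not a confirmed fix) is to locate the first timestep at which each sample's hidden state turns NaN. If that index varies with sequence length, the recurrence is diverging over time rather than the initialization being wrong from step one:
# debugging sketch: find the first timestep where each sample's hidden state goes NaN
h1 = get_hid1(X, Masks)                     # shape: (batch, time, units)
nan_steps = numpy.isnan(h1).any(axis=2)     # (batch, time): True where any unit is NaN
for b in range(h1.shape[0]):
    hits = numpy.where(nan_steps[b])[0]
    print("sample %d: first NaN at timestep %s" % (b, hits[0] if hits.size else "never"))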
