effect of masking layer - python-3.x

I am trying to build a neural network model where the input data has many missing values which are hard to fill in by any means in advance. Therefore, the idea is to train a neural network with only observed data. The data vector fed in the input layer is then a vector with missing values in various positions. The positions of the missing values will not be fixed.
After some search, I found Tensorflow has a masking layer for use. Therefore, I inserted a masking layer right after the input layer,
inputs = keras.Input(shape=(inputDim,))
maskingLayer = keras.Masking(mask_value = -999)(inputs)
where the missing values are replaced with -999 in the preprocessing. After that, several dense layers are inserted and the model was compiled and fit in usual way.
The question is that I don't see much effect of the masking layer. I am wondering if the masking layer really masked out all the nodes of value -999 in the input layer as well as the weights and biases connected to them?
I found this post who had a similar question
Not fully connected layer in tensorflow
However, his unwanted links are fixed and in my case I would like to build a layer (next to the input layer) that only connects to the unmasked nodes of the input layer. Is it possible to do it?
Thanks.

Related

Conv2D filters and CNN architecture

I am currently pursuing undergraduation, I am working on CNN model to recognize Telegu characters.
This Questions has two parts,
I have a (32,32,1) shape Telegu character images, I want to train my CNN model to recognize the character. So, what should be my model architecture and how to decide the architecture, no of parameters and hidden layers. I know that my case is exactly same as handwritten digit recognition, but I want to know how to decide those parameters. Is there any common practice in building such architecture.
Operation Conv2D (32, (5,5)) means 32 filters of size 5x5 are applied on to the input, my question is are these filters all same or different, if different what kind of filters are initialized and who decides them?
I tried to surf internet but everywhere I go, the answer I get is Conv2D operation applies filters on input and does the convolution operation.
To decide which model architecture would be best, you need to experiment. Thats the only way. As you want to classify, VGG architecture would be a good starting point I believe. You need to experiment with number of parameters as it depends on your problem. You can use Keras Tuner for it: https://keras.io/keras_tuner/
For kernel initialization, as far as I know convolutional layers in Keras uses Glorot Uniform Initialization but you can change that by using kernel_initializer parameter. Long story short, convolutional layers are initialized with a distribution function and as training goes filters change the values inside, which is learning process. https://keras.io/api/layers/initializers
Edit: I forgot to inform you that I suggest VGG architecture but in a way you downsize the models a lot. Your input shape is little so if your model is too much deep, you will overfit really quickly.

Keras regression - Should my first/last layer have an activation function?

I keep seeing examples floating around the internet where the input and/or output layer have either no activation function, a linear activation function, or None. What I'm confused about is when to use one, and how to know if you should? I also am confused about what the number of nodes should be for the input layer.
Right now I have a regression problem, I'm trying to predict a real value based on an array of inputs (about 54). Should I be using relu in my activation function for the input layer? Should I have linear as my output activation? My data is linearly scaled from 0 to 1 for each feature independently as they're different units. I was also unsure of the number of nodes I should use for my input layer as I see some examples pick an arbitrary number not related to their input shape, and other examples saying to specifically set it to the number of inputs, or number of inputs plus one for a bias. But none of the examples so far have explained their reasoning behind their choices.
Since my model isn't performing very well, I thought asking what the architecture should be could help me fine tune it more.

How to ignore some input layer, while predicting, in a keras model trained with multiple input layers?

I'm working with neural networks and I've implemented the following architecture using keras with tensorflow backend:
For training, I'll give some labels in the layer labels_vector, this vector can have int32 values (ie: 0 could be a label). For the testing phase, I need to just ignore this input layer, if I set it to 0 results could be wrong since I've trained with labels that can be equal to 0 vector. Is there a way to simply ignore or disable this layer on the prediction phase?
Thanks in advance.
How to ignore some input layer ?
You can't. Keras cannot just ignore an input layer as the output depends on it.
One solution to get nearly what you want is to define a custom label in your training data to be the null value. Your network will learn to ignore it if it feels that it is not an important feature.
If labels_vector is a vector of categorical labels, use one-hot encoding instead of integer encoding. integer encoding assumes that there is a natural ordered relationship between each label which is wrong.

keras RNN w/ local support and shared weights

I would like to understand how Keras sets up weights to be shared. Specifically, I would like to use a convolutional 1D layer for processing a time-frequency representation of an audio signal and feed it into an RNN (perhaps a GRU layer) that has:
local support (e.g. like the Conv1D layer with a specified kernel size). Things that are far away in frequency from an output are unlikely to affect the output.
Shared weights, that is I train only a single set of weights across all of the neurons in the RNN layer. Similar inferences should work at lower or higher frequencies.
Essentially, I'm looking for many of the properties that we find in the 2D RNN layers. I've been looking at some of the Keras source code for the convnets to try to understand how weight sharing is implemented, but when I see the weight allocation code in the layer build methods (e.g. in the _Conv class), it's not clear to me how the code is specifying that the weights for each filter are shared. Is this buried in the backend? I see that the backend call is to a specific 1D, 2D, or 3D convolution.
Any pointers in the right direction would be appreciated.
Thank you - Marie

joint autoencoder with sharing weight using keras

In this article, I've come across the following network structure:
Figure 1(b). https://wx4.sinaimg.cn/mw690/5396ee05ly1fg9vi5phcbj20vj0kb0ty.jpg
Each layer is a fully connected one.
The weights shared by the two parts are denoted by Wc.
The pairs of the top fully connected layers of dimension 500 are concatenated to create a layer of dimension 1000 which is then used directly to reconstruct the input of size 784.
I want to implement it with keras, however I am not skilled with keras.
any ideas on how to implement this?
thank you very much!

Resources