Is a convolutional neural network a continuous function?

The question is: is a convolutional neural network a continuous function? By convolutional I mean a network made only of convolutional layers. Intuitively I would say yes, since as far as I know the convolution operation is continuous, but am I missing anything?
Also, does anybody know whether this is also the case for transposed convolution?

It is a mathematical property that if a function is differentiable, then it is continuous; the converse is not true.
Now, we know that CNNs are differentiable, which is why backpropagation works. Thus, they are continuous functions (in fact, every neural net that is differentiable is continuous). The same reasoning applies to transposed convolution, which is also a linear, and hence continuous, operation.
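As a quick numerical sanity check (not a proof), here is a small PyTorch sketch: a toy all-convolutional network, including a transposed convolution, whose output changes only slightly when the input is perturbed slightly, as you would expect from a continuous map. The layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

# Toy network made only of convolutional layers (plus ReLU),
# ending with a transposed convolution.
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.ConvTranspose2d(8, 1, kernel_size=3, padding=1),
)

x = torch.randn(1, 3, 32, 32)
eps = 1e-4
perturbed = x + eps * torch.randn_like(x)

with torch.no_grad():
    diff = (net(perturbed) - net(x)).abs().max().item()

# A small input perturbation produces a small output change,
# consistent with the network being a continuous map.
print(f"max output change {diff:.2e} for an input perturbation of scale {eps}")
```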

Related

Conv2D filters and CNN architecture

I am currently an undergraduate, and I am working on a CNN model to recognize Telugu characters.
This question has two parts.
I have Telugu character images of shape (32, 32, 1), and I want to train my CNN model to recognize the characters. What should my model architecture be, and how do I decide the architecture, the number of parameters, and the hidden layers? I know my case is essentially the same as handwritten digit recognition, but I want to know how to decide those parameters. Is there any common practice for building such an architecture?
The operation Conv2D(32, (5, 5)) means 32 filters of size 5x5 are applied to the input. My question is: are these filters all the same or all different? If different, what kind of filters are they initialized to, and who decides them?
I tried to search the internet, but everywhere I go the answer I get is that the Conv2D operation applies filters to the input and performs the convolution.
To decide which model architecture is best, you need to experiment. That's the only way. Since you want to do classification, I believe a VGG-style architecture would be a good starting point. You need to experiment with the number of parameters, as it depends on your problem. You can use Keras Tuner for that: https://keras.io/keras_tuner/
For kernel initialization: as far as I know, convolutional layers in Keras use Glorot uniform initialization by default, but you can change that with the kernel_initializer parameter. Long story short, convolutional layers are initialized from a distribution, and as training progresses the values inside the filters change; that is the learning process. https://keras.io/api/layers/initializers
Edit: I forgot to mention that although I suggest a VGG-style architecture, you should downsize it considerably. Your input is small, so if your model is too deep, you will overfit very quickly.
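As an illustration only (the layer sizes, dropout rate, and number of classes below are placeholders, not tuned recommendations), a downsized VGG-style Keras model for (32, 32, 1) inputs could look like this; note the kernel_initializer argument, which is where the filter initialization discussed above is configured:

```python
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 50  # placeholder: set this to the number of Telugu characters

# Small VGG-style network: blocks of Conv2D + MaxPooling, then a dense classifier.
# Each Conv2D layer learns its own independent set of filters; they start from
# the chosen initializer and are updated during training.
model = keras.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same",
                  kernel_initializer="glorot_uniform"),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```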

When should I use the "sigmoid" and "relu" functions in a CNN?

To implement a CNN model for image classification we need to use the sigmoid and ReLU functions, but I am confused about what these are used for.
If you are working with a conventional CNN for image classification, the output layer has N neurons, where N is the number of image classes you want to identify. You want each output neuron to represent the probability that you have observed each image class. The sigmoid function is good for representing a probability. Its domain is all real numbers, but its range is 0 to 1.
For network layers that are not output layers, you could also use the sigmoid. In theory, any non-linear transfer function will work in the inner layers of a neural network. However, there are practical reasons not to use the sigmoid. Some of those reasons are:
Sigmoid requires a fair amount of computation.
The slope of the sigmoid function is very shallow when the input is far from zero, which slows gradient descent down.
Modern neural networks have many layers, and if several consecutive layers use sigmoid activations, their small slopes multiply together and the gradient can effectively vanish, so the early layers stop learning.
The ReLU function solves many of sigmoid's problems. It is easy and fast to compute. Whenever the input is positive, ReLU has a slope of 1, which provides a strong gradient to descend. ReLU is not limited to the range 0-1, though, so if you used it in your output layer, it would not be guaranteed to represent a probability.
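As a minimal NumPy sketch (ignoring weights and biases, purely to illustrate the slopes involved), here is why a chain of sigmoid layers slows learning for inputs far from zero, while ReLU does not:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    return 1.0 if x > 0 else 0.0

x = 3.0  # an input that is "far from zero"

# The gradient through a chain of layers is (roughly) the product of the
# per-layer slopes, so small slopes compound quickly.
print("sigmoid slope at x=3:", sigmoid_grad(x))             # ~0.045
print("through 10 sigmoid layers:", sigmoid_grad(x) ** 10)  # vanishingly small
print("ReLU slope at x=3:", relu_grad(x))                   # 1.0
print("through 10 ReLU layers:", relu_grad(x) ** 10)        # still 1.0
```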

Resolution preserving Fully Convolutional Network

I am new to ML and PyTorch, and I have the following problem:
I am looking for a fully convolutional network architecture in PyTorch, such that the input is an RGB image (HxWxC, e.g. 480x640x3) and the output is a single-channel image (HxW, e.g. 480x640). In other words, I am looking for a network that preserves the resolution of the input (HxW) and loses the channel dimension. All of the networks I've come across (ResNet, DenseNet, ...) end with a fully connected layer (without any upsampling or deconvolution). This is problematic for two reasons:
I am restricted with the choice of the input size (HxWxC).
It has nothing to do with the output that I expect to get (a single channel image HxW).
What am I missing? Why is there even an FC layer? Why are there no upsampling or deconvolution layers after the feature extraction? Is there any built-in torchvision model that might suit my requirements? Where can I find such a PyTorch architecture? As I said, I am new to this field, so I don't really like the idea of building such a network from scratch.
Thanks.
You probably came across networks that are used for classification, so they end with pooling and a fully connected layer to produce a fixed number of categorical outputs.
Have a look at U-Net:
https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/
Note: the original U-Net implementation uses a lot of tricks.
You can simply downsample and then upsample symmetrically to do the job.
Your kind of task belongs to dense prediction tasks, e.g. segmentation. For those tasks we use fully convolutional networks, or FCNs (see here for the original paper). In an FCN you don't have any fully-connected layers, because fully-connected layers throw away the spatial information you need for dense prediction. Also have a look at the U-Net paper. All state-of-the-art architectures use some kind of encoder-decoder architecture, extended for example with a pyramid pooling module.
There are some implementations in the PyTorch model zoo here. Also search GitHub for PyTorch implementations of other networks.
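For concreteness, here is a minimal PyTorch sketch of the downsample-then-upsample idea: a tiny symmetric encoder-decoder that maps a (3, 480, 640) input to a (1, 480, 640) output. It is far smaller than a real U-Net or FCN (no skip connections, arbitrary channel counts) and is only meant to show the shape bookkeeping:

```python
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    """Minimal symmetric encoder-decoder: (N, 3, H, W) -> (N, 1, H, W).
    H and W must be divisible by 4 because of the two 2x downsampling steps."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                     # H/2 x W/2
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                     # H/4 x W/4
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),  # back to H/2 x W/2
            nn.ConvTranspose2d(16, 8, 2, stride=2), nn.ReLU(),   # back to H x W
            nn.Conv2d(8, 1, 1),                                  # collapse to one channel
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

x = torch.randn(1, 3, 480, 640)
print(TinyFCN()(x).shape)  # torch.Size([1, 1, 480, 640])
```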

Is it possible to infer more than one parameter from a convolutional neural network?

I have a question and I am not sure if it's a smart one, but I've been reading quite a lot about convolutional neural networks. So far I understand that the output layer could, for example, be a softmax layer for a classification problem, or you could do regression to get a quantitative value. But I was wondering: is it possible to infer more than one parameter? For example, suppose my output labels are both the price of a house and the size of the house. I know it is not a clever example, but I just want to know whether it's possible to predict two different output values in the same output layer of a convolutional neural network, or whether I need two different networks, one predicting the size of the house and one predicting the price. And how would we then combine those two predictions? And if it can be done in one convolutional neural network, how?
In the cases you mention, the output layer is most likely a dense layer, not a convolutional one. But that's beside the point: if you want multiple outputs, you typically train multiple output layers. The same convolutional backbone can feed two separate output heads, which can be trained independently. Then you have one neural network with two outputs. The convolutional part is often obtained via transfer learning, with layers frozen so they are no longer trained. Have a look at the figures of this paper; they show how it can be done.
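A hypothetical Keras sketch of this idea (the input shape, layer sizes, and head names are placeholders, not taken from the question): one shared convolutional backbone feeding two regression heads, trained jointly with one loss per head.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Shared convolutional backbone.
inputs = layers.Input(shape=(64, 64, 3))            # placeholder input shape
x = layers.Conv2D(16, (3, 3), activation="relu")(inputs)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(32, (3, 3), activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# Two independent regression heads on top of the same features.
price = layers.Dense(1, name="price")(x)             # e.g. house price
size = layers.Dense(1, name="size")(x)               # e.g. house size

model = keras.Model(inputs, [price, size])
model.compile(optimizer="adam",
              loss={"price": "mse", "size": "mse"},
              loss_weights={"price": 1.0, "size": 1.0})
model.summary()
```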

Why use ReLU in the final layer of a neural network?

It is recommended that we use ReLU in the final layer of the neural network when we are learning a regression.
It makes sense to me, since the output of ReLU is not confined between 0 and 1.
However, how does it behave when x < 0 (i.e. when the ReLU output is zero)? Can y (the result of the regression) still be less than 0?
I believe I am missing a basic mathematical concept here. Any help is appreciated.
You typically use:
A linear layer for regression in order to get a continuous value
Softmax for classification where you want a probability distribution over classes
But these aren't set in stone. If you know your output value for a regression should only be positive, why not use ReLU? If the output of your classification isn't a probability distribution (e.g. multi-label: which classes exist), you could just as easily use a sigmoid.
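To answer the original question directly: no. A ReLU output neuron can never produce a value below zero, so if your regression targets can be negative, a ReLU output layer cannot represent them. A tiny NumPy sketch:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Whatever the previous layer produces, a ReLU output neuron clamps
# negative pre-activations to zero, so the predicted y is always >= 0.
pre_activations = np.array([-2.3, -0.1, 0.0, 0.7, 5.2])
print(relu(pre_activations))  # [0.  0.  0.  0.7 5.2]
```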
