How should I modify a CNN to take 1-channel images as input? - conv-neural-network

I am working on an image compression problem. The network I am studying is CompressAI from GitHub: https://github.com/InterDigitalInc/CompressAI.
I want to modify the network to take input images with 1 channel, but I do not know where I should get started. I have thought about other methods, such as turning 1 channel into 3 channels by duplicating it 3 times, but that does not seem to fit the scope of "image compression". Please, any suggestion on where I should start?

Looking at the official repository code here, at line 56 the architecture for the mbt2018 model is given as JointAutoregressiveHierarchicalPriors. So that model's input channels need to be changed. This can be done by changing the model definition located here. Change line 411 of the file from:
conv(3, N, kernel_size=5, stride=2),
to
conv(1, N, kernel_size=5, stride=2),
This ensures that the model now works for single channel images.
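If you would rather not edit the installed library file, subclassing the model and swapping the layers out also works. The sketch below is only an illustration under a few assumptions: that the model exposes its analysis and synthesis transforms as self.g_a and self.g_s nn.Sequential modules (as in the repository code), that the first layer of g_a takes 3 channels and the last layer of g_s outputs 3, and that the padding/output_padding values mirror the repo's conv/deconv helpers. If you also want single-channel reconstructions, the last layer of g_s needs the same treatment.

import torch.nn as nn
from compressai.models import JointAutoregressiveHierarchicalPriors

class GrayscaleJAHP(JointAutoregressiveHierarchicalPriors):
    """Hypothetical variant that reads and reconstructs 1-channel images."""
    def __init__(self, N=192, M=192, **kwargs):
        super().__init__(N=N, M=M, **kwargs)
        # Encoder: first conv takes 1 input channel instead of 3.
        self.g_a[0] = nn.Conv2d(1, N, kernel_size=5, stride=2, padding=2)
        # Decoder: last deconv outputs 1 channel instead of 3 (assumed layout).
        self.g_s[-1] = nn.ConvTranspose2d(N, 1, kernel_size=5, stride=2,
                                          padding=2, output_padding=1)

Either way (editing line 411 or subclassing), the pretrained weights for the swapped layers will no longer match, so the model needs to be (re)trained on your grayscale data.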

Related

Diffusion with melspectrogram data

I am trying to put a data set of melspectrogram tensors into a diffusion model; the shape of the tensors is (128, 646) (a 15-second audio file).
I want to run it through a diffusion model like the one in this notebook: (https://colab.research.google.com/drive/1sjy9odlSSy0RBVgMTgP7s99NXsqglsUL?usp=sharing)
This code is for images of size 64 x 64
My questions are as follows:
How do I adjust the model to accept tensors instead of images?
Would it be a viable solution to pad the tensors to look 'square'?
Do you have any other advice for diffusion on tensors?
Thank you.
I haven't tried anything yet; I am still researching how to do this.
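For the padding idea specifically, here is a purely illustrative PyTorch sketch; the target size and zero-padding scheme are assumptions, not something taken from the linked notebook. It zero-pads the frequency axis of a (128, 646) mel spectrogram so the result is square, then adds a channel dimension so it looks like a 1-channel "image".

import torch
import torch.nn.functional as F

mel = torch.randn(128, 646)               # stand-in for a real mel spectrogram (freq, time)
freq, time = mel.shape
pad_freq = time - freq                     # pad 128 -> 646 along the frequency axis
square = F.pad(mel, (0, 0, 0, pad_freq))   # pad order: (left, right, top, bottom)
square = square.unsqueeze(0)               # add a channel dim -> (1, 646, 646)
print(square.shape)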

How to restrict the output of GAN as "one channel one mask"?

I'm a beginner in deep learning, and I'm using a deep CNN to generate masks. I want every channel of the output to contain exactly one mask, but I found that some channels may contain 0 or 2 masks. What should I do to solve this problem?

Is there a way to give a gray scale image with 1 channel 2 added channels that are made only by zeros?

I have a file of images with only 1 channel, and I want to use a pre-trained model to make predictions on them. The problem is that the pre-trained model was trained on 3-channel images, so I have to add 2 extra channels to all my images. I have tried stacking the images to get the 2 extra channels, but I would like to try something different.
Is there a way to make the 2 extra channels contain only zeros?
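A small sketch of that idea, assuming the images are NumPy arrays of shape (H, W): keep the grayscale data in the first channel and fill the other two channels with zeros.

import numpy as np

def to_three_channels(gray):
    """gray: (H, W) array -> (H, W, 3) array with zeros in channels 1 and 2."""
    zeros = np.zeros_like(gray)
    return np.stack([gray, zeros, zeros], axis=-1)

img = np.random.rand(224, 224)     # stand-in for one of the 1-channel images
rgb_like = to_three_channels(img)
print(rgb_like.shape)              # (224, 224, 3)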

Repeating part of Keras model, depending on number of inputs

I'm trying to use part of Google DeepMind's CGQN network in Keras (DeepMind paper). Depending on how many input images they give to the network, the network understands more about the 3D environment it is trying to predict. Here is a scheme of their network:
I would also like to use multiple input "images" like they did with the Mθ network. So my question is: Using Keras, how can I reuse a part of the network an arbitrary number of times and then sum all of the outputs it generates, which will be used as an input to the next part of the network?
Thanks in advance!
You can achieve this using the functional API, I'll just give a proof-of-concept here:
from tensorflow.keras.layers import Input, Conv2D, TimeDistributed

images_in = Input(shape=(None, 32, 32, 3))  # some number of 32x32 colour images
# think of it as a video, i.e. a sequence of images
shared_conv = Conv2D(32, 2)  # some shared layer that you want to apply to every image
features = TimeDistributed(shared_conv)(images_in)  # applies shared_conv to every image
Here TimeDistributed applies a given layer across the time dimension, which in our case means it is applied to every image and you get an output for every image. There are more examples in the documentation linked above; you can implement a shared set of layers / submodel, apply that to every image, and then take the reduced sum.
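To make the "shared sub-model plus reduced sum" idea concrete, here is a minimal sketch assuming TensorFlow 2.x Keras; the sub-network layers and sizes are placeholders, not the actual Mθ architecture from the paper.

import tensorflow as tf
from tensorflow.keras import Sequential, Model
from tensorflow.keras.layers import (Input, Conv2D, GlobalAveragePooling2D,
                                     Dense, TimeDistributed, Lambda)

images_in = Input(shape=(None, 32, 32, 3))      # variable number of 32x32 colour images
shared = Sequential([                            # shared sub-network applied to each image
    Conv2D(32, 3, activation="relu"),
    GlobalAveragePooling2D(),
    Dense(64),
])
per_image = TimeDistributed(shared)(images_in)   # shape: (batch, n_images, 64)
summed = Lambda(lambda t: tf.reduce_sum(t, axis=1))(per_image)  # sum over images -> (batch, 64)
next_part = Dense(10)(summed)                    # placeholder for the rest of the network
model = Model(images_in, next_part)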

Tensorflow tf.nn.conv2d clarification

In reading through the Tensorflow tutorial and API documentation, I do not understand how they defined the shape of the convolution input and filter arguments. The method is: tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None), where the input is shape: [batch, in_height, in_width, in_channels] and the filter is shape: [filter_height, filter_width, in_channels, out_channels]. If anyone could shed light on how to properly define the "in_channel" and "out_channel" sizes, that would be very helpful.
in_channels refers to the depth of the inputs to the convolutional layer. For example, if you feed the layer raw RGB images, then the depth will be 3, corresponding to the Red, Green, and Blue channels. This means that the kernels are actually 3D rather than 2D. out_channels refers to the depth of the output. The following picture (from here) illustrates an example with an input depth of 3 and an output depth of 5:
Properly defining these sizes is something done based on experiments; it is a network design issue. You may read about some famous architectures like AlexNet and VGG-16 to see how network architectures are designed in practice.
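To make the shape convention concrete, here is a small sketch (TensorFlow 2.x, with arbitrary illustrative sizes): an RGB input, so in_channels=3, convolved with 5 filters, so out_channels=5.

import tensorflow as tf

batch, in_height, in_width, in_channels = 1, 28, 28, 3
out_channels = 5

images = tf.random.normal([batch, in_height, in_width, in_channels])
filters = tf.random.normal([3, 3, in_channels, out_channels])  # [filter_h, filter_w, in, out]

out = tf.nn.conv2d(images, filters, strides=[1, 1, 1, 1], padding="SAME")
print(out.shape)  # (1, 28, 28, 5): the depth changes from in_channels to out_channels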
