I have a 5-dimensional dataset and I'm interested in using a neural network to model the posterior distributions from which the data was drawn. I decided to implement a GAN to do this, and have been familiarizing myself with PyTorch.
I'm wondering how one should go about restricting what values the generator can produce for the parameters. For one of the parameters, the values must be nonnegative real values. For another case, the values must be nonnegative integer values. For the other three cases, the parameters can take on any real value.
My first idea was to control this through the transfer function applied to the nodes in the output layer of my neural network. But all of the PyTorch examples I've seen so far apply the same transfer function to all of the output nodes, which is not what I want to do. Is there a way to apply a different transfer function to each output node? Or is there maybe a better way to approach this problem?
Related
I have implemented an autoencoder that should realize a non-linear version of a principal component analysis. In- and Output of the model is a the same dataset with n features and I am interested in the encoding which has dimension d<n. To generalize the principal component analysis I would like to have an encoding that consists of d almost linearly independent vectors, but if I use the loss function "mse" I get e.g. for d=2 two vectors which look almost the same.
Theoretically I could use a loss function including a penalty term for vector that are similar and far from independent. But that would mean to have a loss function that uses the information of the whole batch not just a single sample and not from the output but from an intermediate layer.
Since I am working with Keras: Can anyone give me hint or a reference how I can approach this problem in Keras?
I am implementing a custom CNN with some custom modules in it. I have implemented only the forward pass for the custom modules and left their backward pass to autograd.
I have manually computed the correct formulae for backpropagation through the parameters of the custom modules, and I wished to see whether they match with the formulae used internally by autograd to compute the gradients.
Is there any way to see this?
Thanks
Edit (To add a test case) :-
I have a complex affine layer where the weights and inputs are complex-valued matrices, and the operation is a matrix multiplication of the weight and input matrices.
The multiplication of two complex numbers is given by -
(a+ib)(c+id) = (ac-bd)+i(ad+bc)
I computed the backpropagation formula for this layer given we have the incoming gradient from the higher layer.
It comes out to be dL/dI(n) = (hermitian(W(n))).matmul(dL/dI(n+1))
where I(n) and W(n) are the input and weight of nth layer and I(n+1) is input of (n+1)th layer.
So I wished to check whether autograd is also computing dL/dI(n) using the same formula that I derived.
(Since Pytorch doesn't support complex-valued tensors backpropagation as for now, I have created my own representation of complex numbers by dealing with separate real and imaginary tensors)
I don't believe there is such a feature in pytorch, even because it would be quite unreadable. What you can do is to implement a custom backward method for your layer with the formula you derived, then know by design that the backpropagation is what you want.
I keep seeing examples floating around the internet where the input and/or output layer have either no activation function, a linear activation function, or None. What I'm confused about is when to use one, and how to know if you should? I also am confused about what the number of nodes should be for the input layer.
Right now I have a regression problem, I'm trying to predict a real value based on an array of inputs (about 54). Should I be using relu in my activation function for the input layer? Should I have linear as my output activation? My data is linearly scaled from 0 to 1 for each feature independently as they're different units. I was also unsure of the number of nodes I should use for my input layer as I see some examples pick an arbitrary number not related to their input shape, and other examples saying to specifically set it to the number of inputs, or number of inputs plus one for a bias. But none of the examples so far have explained their reasoning behind their choices.
Since my model isn't performing very well, I thought asking what the architecture should be could help me fine tune it more.
I don't have much experience with training neural networks. I have 4 variable vectors as input and I have respectively 3 variable output vector. I want to create a neural network that takes these inputs and outputs which have some unknown correlation(might not be linear) between them and train. So that when I put previously untrained data through it should predict the correlated output.
I was wondering,
What type of model should I use in such scenarios? Is it Restricted boltzmann machine, regression, GAN, etc?
What library is easiest to learn and implement for such a model? eg:- TensorFlow, PyTorch, etc
If images were involved which can be processed as fft arrays, would the model change.
I did find this answer, but I am not satisfied with it.
Please let me know if there are any functions or other points you would like me to know. Any help is much appreciated.
A multilayer perceprton is a good place to start.
Keras is the highest level/easiest to use library I have used.
If you are working with images or spatially structured data a convolutional neural network will probably work best.
I have string data in my dataset of the type :
AGF.SL.CA.LOSANG.15764
ABC.EMP.GOO.__._ME$.ZR_ME$ATR$GENERAL
SEM.JP.YOO.����_������_�����.ZC_NA:US::SANDO$GENERAL
Every record has a category associated with it, and given one such string, I have to use a Machine Learning or Deep Learning approach to identify the corresponding category.
I am confused as to what approach to follow in order to do this. My primary question is, should I keep the strings as is and use string similarity functions, or should I break up the strings into different words, and then do count vectorization on it, and then proceed from there?
Given this kind of data, with just one string to predict the class, what would be the best approach? I have to put this into production so I need look at something which will scale well. I am new to ML so any suggestions would be appreciated. Thanks.
It seems to me that you can tackle this problem using lstm. Long short-term memory (LSTM) units (or blocks) are a building unit for layers of a recurrent neural network (RNN)
These LSTM will help us to capture sequential information and generally used in case where we want to learn the sequential patterns in the data
You can decode this problem using character level LSTM.
In this you have to pass every character of the text in a LSTM cell.and at the last time step you will have a class which is the true label
You can use cross-entropy loss function.
https://machinelearningmastery.com/develop-character-based-neural-language-model-keras/
This will give you complete idea