How to handle input data of different sizes with PyTorch's built-in neural network layers - pytorch

I built a simple PyTorch model as shown below. However, I receive an error message saying that the mat1 and mat2 sizes are not aligned. How do I tweak the code to allow the flexibility of different data dimensions?
import torch
import torch.nn as nn

class simpleNet(nn.Module):
    def __init__(self, input_dim, hidden_size, num_classes):
        """
        :param input_dim: input feature dimension
        :param hidden_size: hidden dimension
        :param num_classes: total number of classes
        """
        super(simpleNet, self).__init__()
        # hidden layer
        self.hidden = nn.Linear(input_dim, hidden_size)
        # Second fully connected layer that outputs num_classes scores (e.g. our 10 labels)
        self.output = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = self.hidden(x)
        x = torch.sigmoid(x)
        x = self.output(x)
        return x
I'm trying to build a toy neural network using PyTorch.

For your neural network to work, the output size of each layer must equal the input size of the next layer; in particular, the last dimension of the x you pass to forward must equal input_dim. Since your snippet shows only the architecture and not the initialization code, I cannot tell what you could simplify, but mismatched sizes at a layer transition are not good practice. As a brute-force method, you can use torch.reshape to make the output of the previous layer match the input size of the next layer. Refer to: https://pytorch.org/docs/stable/generated/torch.reshape.html
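A minimal sketch of that idea, assuming the per-sample shape of the data may change between experiments (e.g. 1x8x8 images in one run, flat vectors in another): forward flattens everything except the batch dimension with reshape, and input_dim is derived from the data instead of being hard-coded. The batch size, hidden_size=32 and num_classes=10 below are hypothetical.

```python
import torch
import torch.nn as nn

class simpleNet(nn.Module):
    def __init__(self, input_dim, hidden_size, num_classes):
        super().__init__()
        self.hidden = nn.Linear(input_dim, hidden_size)
        self.output = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Flatten all dimensions except the batch dimension, so inputs shaped
        # (N, C, H, W) or (N, D) both become (N, features) before nn.Linear.
        x = x.reshape(x.shape[0], -1)
        x = torch.sigmoid(self.hidden(x))
        return self.output(x)

# Hypothetical batch: 4 samples, each 1x8x8 (64 features after flattening).
x = torch.randn(4, 1, 8, 8)
# Derive input_dim from one sample instead of hard-coding it.
input_dim = x[0].numel()
net = simpleNet(input_dim, hidden_size=32, num_classes=10)
out = net(x)
print(out.shape)  # torch.Size([4, 10])
```

If the network is built with a wrong input_dim (say 50 here), the first forward call raises the same RuntimeError about mat1 and mat2. PyTorch 1.8 and later also offer nn.LazyLinear(out_features), which infers in_features from the first batch it sees; after that first call the size is fixed, so it does not accept a different feature size later.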

Related

Transfer learning problem with PyTorch: different data distribution ranges

I'm wondering about the following thing:
I would like to apply some transfer learning on a project I'm working on using an Artificial Neural Network. I have two (chemical) datasets which have a different distribution of values but can be related from a physical point of view.
Given that the first quantity varies between 0 and 12 and the second between 10^{-13} and 10^{13}, how should I set up the network? Maybe with some intermediate normalization layer?
My initial attempt relies on building a first network in the following way:
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self, in_features, h1, h2, out_features=1):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(in_features, h1)  # input layer
        self.fc2 = nn.Linear(h1, h2)  # hidden layer
        self.out = nn.Linear(h2, out_features)  # output layer

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.out(x)
        return x
After training this model for some epochs, I would use the pre-trained weights on the other dataset, which has a different data distribution (between 10^{-13} and 10^{13}), but I'm not sure what kind of normalization or intermediate layer I should put in front of the network to shift that distribution so it matches the other one.
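One common option (not from the original post) is to bring both datasets onto a similar scale before they reach the network: take log10 of the values spanning 10^{-13} to 10^{13}, then standardize (zero mean, unit variance). The sketch assumes the second quantity is strictly positive; the helper names log_standardize and standardize are hypothetical.

```python
import numpy as np

def log_standardize(values, eps=1e-30):
    """Hypothetical helper: log10-transform strictly positive values that span
    many orders of magnitude, then z-score them. Returns the statistics too,
    so the same transform can be re-applied to new data."""
    logs = np.log10(np.asarray(values, dtype=np.float64) + eps)  # eps guards against log10(0)
    mean, std = logs.mean(), logs.std()
    return (logs - mean) / std, mean, std

def standardize(values):
    """Plain z-score, e.g. for the first dataset (range 0 to 12, contains 0)."""
    arr = np.asarray(values, dtype=np.float64)
    return (arr - arr.mean()) / arr.std()

second = [1e-13, 1.0, 1e13]
z, mu, sigma = log_standardize(second)
print(z)      # approximately [-1.2247, 0.0, 1.2247]
first = standardize([0.0, 6.0, 12.0])
print(first)  # approximately [-1.2247, 0.0, 1.2247]
```

The arrays can then be fed to the model with torch.from_numpy(z).float(). Keeping mu and sigma lets you apply exactly the same transform at inference time.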

How to develop a layer that works with input of arbitrary size

I'm trying to develop a layer in Keras which works with 3D tensors. To make it flexible, I would like to postpone the code that relies on the input's exact shape as much as possible.
My layer is overriding 5 methods:
from tensorflow.python.keras.layers import Layer
from tensorflow.python.keras import backend as K

class MyLayer(Layer):
    def __init__(self, **kwargs):
        pass

    def build(self, input_shape):
        pass

    def call(self, inputs, verbose=False):
        second_dim = K.int_shape(inputs)[-2]
        # Do something with the second_dim

    def compute_output_shape(self, input_shape):
        pass

    def get_config(self):
        pass
And I'm using this layer like this:
from tensorflow.python.keras.layers import Input
from tensorflow.python.keras.models import Model

input = Input(batch_shape=(None, None, 128), name='input')
x = MyLayer(name='my_layer')(input)
model = Model(input, x)
But I get an error because second_dim is None. How can I write a layer that relies on the dimensions of its input but still works when those dimensions are only provided by the actual data and not by the Input layer?
I ended up asking the same question differently, and I've got a perfect answer:
What is the right way to manipulate the shape of a tensor when there are unknown elements in it?
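The gist of it is: don't treat the dimensions directly. Use them by reference, not by value. So, do not use K.int_shape; use K.shape instead. K.int_shape returns the static shape known when the model is built (with None for unknown dimensions), whereas K.shape returns a tensor whose values are filled in at run time. Then use Keras operations to compose a new shape: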
shape = K.shape(x)
# merge dimensions 1 and 2 of a 4D tensor: (a, b, c, d) -> (a, b*c, d)
newShape = K.concatenate([
    shape[0:1],
    shape[1:2] * shape[2:3],
    shape[3:4]
])
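A minimal sketch of the same idea inside a custom layer, using TensorFlow ops directly (tf.shape, tf.concat, tf.reshape) rather than the Keras backend. The layer name MergeMiddleDims and all tensor sizes are hypothetical. Because the new shape is computed from tf.shape at run time, the middle dimensions can be None when the model is built.

```python
import tensorflow as tf

class MergeMiddleDims(tf.keras.layers.Layer):
    """Hypothetical layer: reshapes (batch, a, b, c) -> (batch, a*b, c).
    a and b may be unknown (None) when the model is built."""

    def call(self, inputs):
        shape = tf.shape(inputs)  # dynamic shape: a tensor filled in at run time
        new_shape = tf.concat(
            [shape[0:1], shape[1:2] * shape[2:3], shape[3:4]], axis=0)
        return tf.reshape(inputs, new_shape)

# Middle dimensions left as None: only the data decides them.
inputs = tf.keras.Input(shape=(None, None, 7))
model = tf.keras.Model(inputs, MergeMiddleDims()(inputs))

out_a = model(tf.zeros((2, 3, 5, 7)))
out_b = model(tf.zeros((1, 4, 2, 7)))
print(out_a.shape, out_b.shape)  # (2, 15, 7) (1, 8, 7)
```

The same model accepts both batches, which would not be possible if the layer read the middle sizes with K.int_shape.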

Keras Sequential model with cRelu activation

I have a problem creating a Dense model with 3 layers in which the activation function is cRelu.
cRelu concatenates two ReLUs (a negative and a positive one) and creates a tensor twice the size in its output.
When I try to add another layer after it, I always get a size mismatch error:
model = Sequential()
model.add(Dense(N, input_dim=K, activation=crelu))
model.add(Dense(N//2, activation=crelu))
How do I tell the next layer to expect an input of size 2N and to output N?
Keras doesn't expect the activation function to change the output shape. If you want to change it, you should wrap the crelu functionality in a layer and specify the corresponding output shape:
import tensorflow as tf
from keras.layers import Layer

class cRelu(Layer):
    def __init__(self, **kwargs):
        super(cRelu, self).__init__(**kwargs)

    def build(self, input_shape):
        super(cRelu, self).build(input_shape)

    def call(self, x):
        return tf.nn.crelu(x)

    def compute_output_shape(self, input_shape):
        """
        All axes of output_shape, except the last one,
        coincide with the input shape.
        The last one is twice the size of the corresponding input
        as it's the axis along which the two ReLUs get concatenated.
        """
        return (*input_shape[:-1], input_shape[-1]*2)
Then you can use it as follows:
model = Sequential()
model.add(Dense(N, input_dim=K))
model.add(cRelu())
model.add(Dense(N//2))
model.add(cRelu())
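A self-contained sketch for checking the shapes, written against tf.keras rather than the standalone keras package used above (an assumption), with hypothetical sizes K_IN=8 and N=6. Each CRelu doubles the last axis, and the following Dense layer infers its input size (2N) when it is built, so only its output size has to be given.

```python
import numpy as np
import tensorflow as tf

class CRelu(tf.keras.layers.Layer):
    """Wraps tf.nn.crelu: the last axis of the output is twice the input's."""

    def call(self, x):
        return tf.nn.crelu(x)

    def compute_output_shape(self, input_shape):
        return (*input_shape[:-1], input_shape[-1] * 2)

K_IN, N = 8, 6  # hypothetical input size and layer width
model = tf.keras.Sequential([
    tf.keras.Input(shape=(K_IN,)),
    tf.keras.layers.Dense(N),
    CRelu(),                        # N -> 2N
    tf.keras.layers.Dense(N // 2),  # input size 2N is inferred at build time
    CRelu(),                        # N // 2 -> N
])
out = model(np.zeros((4, K_IN), dtype="float32"))
print(out.shape)  # (4, 6)
```

The second Dense layer's kernel ends up with shape (2N, N // 2) = (12, 3), without any size being passed explicitly.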

Custom layer in Keras not doing anything

I am trying to make a layer which flips an image in the horizontal axis and then adds this image to the batch dimension. The code is as follows:
class FlipLayer(keras.layers.Layer):
    def __init__(self, input_layer):
        super(FlipLayer, self).__init__()

    def get_output_shape(self, input_shape):
        return (2 * input_shape[0],) + input_shape[1:]

    def get_output(self, input):
        return keras.layers.Concatenate([
            input,
            flipim(input)
        ], axis=0)
Here, 'flipim' is just a function that flips the NumPy array along the desired axis. Keras does not give any error when compiling the model with this layer; however, the layer isn't doing anything. When I use it as my last layer and check the output, it still has the same size in the batch dimension as the output of the previous layer.
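A likely explanation, offered as a hedged sketch: Keras 2 layers do their work in call() and report shapes via compute_output_shape(); methods named get_output() and get_output_shape() are never invoked, and the base Layer.call simply returns its inputs, so the layer acts as an identity. In addition, keras.layers.Concatenate is a layer that is constructed with the axis and then called on a list, and a NumPy flip cannot operate on symbolic tensors. The version below assumes channels-last images and a left-right flip (axis 2 of the batch).

```python
import numpy as np
import tensorflow as tf

class FlipLayer(tf.keras.layers.Layer):
    """Appends a horizontally flipped copy of every image to the batch:
    (B, H, W, C) -> (2B, H, W, C). Assumes channels-last images."""

    def call(self, inputs):
        # tf.reverse works on symbolic tensors, unlike a NumPy flip.
        # axis=2 is the width axis, i.e. a left-right flip.
        flipped = tf.reverse(inputs, axis=[2])
        # Stack originals and flipped copies along the batch axis.
        return tf.concat([inputs, flipped], axis=0)

    def compute_output_shape(self, input_shape):
        batch = None if input_shape[0] is None else 2 * input_shape[0]
        return (batch,) + tuple(input_shape[1:])

images = np.arange(2 * 2 * 3 * 1, dtype="float32").reshape(2, 2, 3, 1)
out = FlipLayer()(images)
print(out.shape)  # (4, 2, 3, 1)
```

If such a model is trained with fit(), the labels would also need to be duplicated so that their batch size matches the doubled output.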

Implementing RNN and LSTM into DQN PyTorch code

I'm having trouble finding an example on the web of how to add a recurrent neural network with an LSTM layer to my current deep Q-network in PyTorch, so that it becomes a DRQN. Bear with me, I am just getting started.
Furthermore, I am NOT doing image processing, so CNNs are not relevant here. My states are purely temperature values.
Here is the code I am currently training my DQN with:
# Importing the libraries
import numpy as np
import random  # random samples from different batches (experience replay)
import os  # for loading and saving the brain
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim  # optimizers (Adam is used below)
# torch.autograd.Variable is no longer needed: since PyTorch 0.4,
# plain tensors track gradients themselves.

# Creating the architecture of the Neural Network
class Network(nn.Module):  # inheriting from nn.Module
    # self refers to the object that will be created from this class
    def __init__(self, input_size, nb_action):  # [self, input neurons, output neurons]
        super(Network, self).__init__()  # in order to use modules in torch.nn
        # Input and output neurons
        self.input_size = input_size
        self.nb_action = nb_action
        # Full connections between the layers of the NN:
        # one input layer, two hidden layers (40 and 30 neurons), one output layer
        # Using self here to specify that fc1 is a variable of the object
        self.fc1 = nn.Linear(input_size, 40)
        self.fc2 = nn.Linear(40, 30)
        # Example of adding another hidden layer
        # self.fcX = nn.Linear(30, 30)
        self.fc3 = nn.Linear(30, nb_action)

    # Activates the neurons and performs forward propagation
    def forward(self, state):
        # rectifier function
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        q_values = self.fc3(x)
        return q_values

# Implementing Experience Replay
# RL is based on an MDP: we go from one state (s_t) to the next state (s_t+1).
# We store the transitions between states in what we call the memory,
# so we can sample from the distribution of experience when learning.
class ReplayMemory(object):
    def __init__(self, capacity):
        self.capacity = capacity  # maximum number of stored transitions
        self.memory = []  # memory to save transitions

    # Pushing a transition (event) into memory with append
    def push(self, event):
        self.memory.append(event)
        if len(self.memory) > self.capacity:  # keep at most `capacity` events
            del self.memory[0]  # delete the oldest transition

    # Taking a random sample
    def sample(self, batch_size):
        # zip(*list) regroups the samples: for list = ((1,2,3),(4,5,6)),
        # zip(*list) = (1,4),(2,5),(3,6), i.e. all states together,
        # all next states together, all actions together, all rewards together
        samples = zip(*random.sample(self.memory, batch_size))
        # Concatenate each group into one batch tensor
        return map(lambda x: torch.cat(x, 0), samples)

# Implementing Deep Q-Learning
class Dqn():
    def __init__(self, input_size, nb_action, gamma, lrate, T):
        self.gamma = gamma  # discount factor
        self.T = T  # softmax temperature
        # Sliding window of the evolving mean of the last rewards
        self.reward_window = []
        # Creating the network with the Network class
        self.model = Network(input_size, nb_action)
        # Creating the memory: store up to 100000 transitions, then sample
        # a small number of random transitions from it
        self.memory = ReplayMemory(100000)
        # Creating the optimizer (Adam)
        self.optimizer = optim.Adam(self.model.parameters(), lr=lrate)  # learning rate
        # The input is a batch of observations; unsqueeze(0) adds the fake
        # batch dimension that the network expects as the first dimension
        self.last_state = torch.zeros(input_size).unsqueeze(0)
        # Initializing
        self.last_action = 0
        self.last_reward = 0

    def select_action(self, state):
        # The Q-values depend on the state.
        # The temperature parameter T is a non-negative number: the closer it is
        # to zero, the less sure the NN will be when taking an action, e.g.
        # softmax((1,2,3)) = {0.09,0.24,0.67} ==> softmax((1,2,3)*3) = {0.00,0.05,0.95}
        # To deactivate the brain set T=0, which makes the choice fully random
        with torch.no_grad():  # no gradients are needed to pick an action
            probs = F.softmax(self.model(state) * self.T, dim=1)  # e.g. T=100
        # Random draw from the probability distribution created by softmax
        action = probs.multinomial(num_samples=1)
        return action.item()

    # See section 5.3 in the AI handbook
    def learn(self, batch_state, batch_next_state, batch_reward, batch_action):
        # Q-values of the actions that were actually taken
        outputs = self.model(batch_state).gather(1, batch_action.unsqueeze(1)).squeeze(1)
        # Next-state values for the target, see page 7 in the attached AI handbook
        next_outputs = self.model(batch_next_state).detach().max(1)[0]
        target = self.gamma * next_outputs + batch_reward
        # Using the Huber loss (smooth L1) to obtain the TD loss
        td_loss = F.smooth_l1_loss(outputs, target)
        # Using the loss to perform gradient descent and update the weights
        self.optimizer.zero_grad()  # reset the gradients at each iteration of the loop
        # Backpropagate the error through the NN; the graph is rebuilt on
        # every call, so retain_graph=True is not needed
        td_loss.backward()
        # The optimizer updates the weights
        self.optimizer.step()

    def update(self, reward, new_signal):
        # Build the new state, the last element of the transition
        new_state = torch.Tensor(new_signal).float().unsqueeze(0)
        self.memory.push((self.last_state, new_state, torch.LongTensor([int(self.last_action)]), torch.Tensor([self.last_reward])))
        # After reaching a state, it is time to play an action
        action = self.select_action(new_state)
        if len(self.memory.memory) > 100:
            batch_state, batch_next_state, batch_action, batch_reward = self.memory.sample(100)
            self.learn(batch_state, batch_next_state, batch_reward, batch_action)
        self.last_action = action
        self.last_state = new_state
        self.last_reward = reward
        self.reward_window.append(reward)
        if len(self.reward_window) > 1000:
            del self.reward_window[0]
        return action

    def score(self):
        return sum(self.reward_window) / (len(self.reward_window) + 1.)

    def save(self):
        torch.save({'state_dict': self.model.state_dict(),
                    'optimizer': self.optimizer.state_dict(),
                    }, 'last_brain.pth')

    def load(self):
        if os.path.isfile('last_brain.pth'):
            print("=> loading checkpoint... ")
            checkpoint = torch.load('last_brain.pth')
            self.model.load_state_dict(checkpoint['state_dict'])
            self.optimizer.load_state_dict(checkpoint['optimizer'])
            print("done !")
        else:
            print("no checkpoint found...")
I hope someone out there can help me implement an RNN and an LSTM layer in my code! I believe in you, Stack Overflow!
Best regards, Søren Koch
In my view, you could add an RNN or LSTM layer in Network#__init__ and Network#forward; the data then has to be reshaped into sequences.
For more detail, read the following two tutorials; after that, implementing an RNN or LSTM is not as hard as it seems.
http://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html#sphx-glr-beginner-nlp-sequence-models-tutorial-py
http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html
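A minimal sketch of that suggestion, assuming each state is a sequence of temperature readings shaped (batch, seq_len, input_size): the second hidden layer of Network is replaced by an nn.LSTM, and the Q-values are read from the last time step. The class name RecurrentNetwork and all sizes are hypothetical, and the replay memory would also have to store sequences instead of single transitions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentNetwork(nn.Module):
    """Hypothetical DRQN-style variant of Network: Linear -> LSTM -> Linear."""

    def __init__(self, input_size, nb_action, hidden_size=30):
        super().__init__()
        self.fc1 = nn.Linear(input_size, 40)
        # batch_first=True: inputs are shaped (batch, seq_len, features)
        self.lstm = nn.LSTM(40, hidden_size, batch_first=True)
        self.fc3 = nn.Linear(hidden_size, nb_action)

    def forward(self, states, hidden=None):
        # nn.Linear acts on the last dimension, i.e. on every time step
        x = F.relu(self.fc1(states))
        # hidden=None makes the LSTM start from zero states
        x, hidden = self.lstm(x, hidden)  # x: (batch, seq_len, hidden_size)
        q_values = self.fc3(x[:, -1, :])  # Q-values from the last time step
        return q_values, hidden

net = RecurrentNetwork(input_size=5, nb_action=3)
seq = torch.randn(2, 10, 5)  # 2 sequences, 10 steps, 5 temperatures per step
q, (h, c) = net(seq)
print(q.shape)  # torch.Size([2, 3])
```

When acting online, one option is to feed a single step shaped (1, 1, input_size) and pass the returned hidden state back in on the next call, so the LSTM carries information across time steps.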
