I'm working on a basic RNN NLP classifier using PyTorch, trying to use CUDA for acceleration (on Google Colab), but I can't solve this error.
The code is written like this.
Error message:
Input and hidden tensors are not at the same device, found input tensor at cuda:0 and hidden tensor at cpu
RNN class
class RNN(nn.Module):
    def __init__(self, vocab_size, emb_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.emb = nn.Embedding(vocab_size, emb_size)
        self.rnn = nn.RNN(emb_size, hidden_size, nonlinearity='tanh', batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        self.batch_size = x.size()[0]
        hidden = self.init_hidden()
        emb = self.emb(x)
        out, hidden = self.rnn(emb, hidden)
        out = self.fc(out[:, -1, :])
        return out

    def init_hidden(self):
        hidden = torch.zeros(1, self.batch_size, self.hidden_size)
        return hidden
device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
Setting variables
VOCAB_SIZE = len(word_id.keys()) + 1
EMB_SIZE = 300
OUTPUT_SIZE = 4
HIDDEN_SIZE = 50

model = RNN(VOCAB_SIZE, EMB_SIZE, HIDDEN_SIZE, OUTPUT_SIZE)
model = model.to(device)
Predict
for i in range(10):
    # take the input element at index i of the training dataset
    X, y = dataset_train[i]
    X = X.to(device)
    print(torch.softmax(model(X.unsqueeze(0)), dim=1))
This code works on CPU, but it doesn't work on GPU.
Following the error message, I tried a few fixes, e.g. hidden.to(device), but I couldn't solve it.
Please, can someone tell me how to solve this? Thank you.
Doesn't doing something like the following work?
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class RNN(nn.Module):
    def __init__(self, vocab_size, emb_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.emb = nn.Embedding(vocab_size, emb_size)
        self.rnn = nn.RNN(emb_size, hidden_size, nonlinearity='tanh', batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
        # move all parameters to the chosen device once, at construction time
        self.to(device)

    def forward(self, x):
        self.batch_size = x.size()[0]
        hidden = self.init_hidden()
        emb = self.emb(x)
        out, hidden = self.rnn(emb, hidden)
        out = self.fc(out[:, -1, :])
        return out

    def init_hidden(self):
        # create the initial hidden state on the same device as the model
        hidden = torch.zeros(1, self.batch_size, self.hidden_size).to(device)
        return hidden
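Alternatively, a sketch that avoids the module-level device variable entirely by creating the initial hidden state on whatever device the input is on (same architecture as above):

import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, vocab_size, emb_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.emb = nn.Embedding(vocab_size, emb_size)
        self.rnn = nn.RNN(emb_size, hidden_size, nonlinearity='tanh', batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # allocate the initial hidden state on the same device as the input,
        # so model.to(device) is all that's needed and CPU/GPU both work
        hidden = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        emb = self.emb(x)
        out, _ = self.rnn(emb, hidden)
        return self.fc(out[:, -1, :])

Note that nn.RNN also defaults to a zero initial hidden state when none is passed, so self.rnn(emb) alone would behave the same here.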
I am learning LSTMs and language models, and I developed the following code for character-level text generation.
Here is the model class:
class RNN(nn.Module):
    def __init__(self, input_size, embedding_dim, hidden_size, num_layers, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        # Layers:
        self.embed = nn.Embedding(input_size, embedding_dim, padding_idx=0)
        self.dropout = nn.Dropout(0.5)  # regularization to reduce overfitting and increase stability
        self.lstm = nn.LSTM(embedding_dim, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
        self.o = nn.Softmax(dim=1)

    def forward(self, x, hidden, cell):
        out = self.embed(x)
        out = self.dropout(out)
        out, (hidden, cell) = self.lstm(out.unsqueeze(1), (hidden, cell))
        out = self.fc(out.reshape(out.shape[0], -1))
        out = self.o(out)
        return out, (hidden, cell)

    def init_hidden(self, batch_size):
        hidden = torch.zeros(self.num_layers, batch_size, self.hidden_size).to(device)
        cell = torch.zeros(self.num_layers, batch_size, self.hidden_size).to(device)
        return hidden, cell

n_chars, embedding_dim, hidden_size, num_layers, output_size = 55, 20, 256, 2, 55
model = RNN(n_chars, embedding_dim, hidden_size, num_layers, n_chars).to(device)
And this is the train function, where I have the problem:
def train(model, optimizer, criterion, epochs=10, every=5):
    for epoch in range(epochs):
        k = random.randint(0, len(data))
        x, y = get_batch(k)
        xt, yt = tensorize(x, y)
        mean_loss = 0
        L = len(xt)
        for i in range(L):
            hidden, cell = model.init_hidden(batch_size)  # not doing this will cause an error
            out, (hidden, cell) = model(xt[i].unsqueeze(0), hidden, cell)
            target = yt[i].unsqueeze(0)
            loss = criterion(out, target)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            mean_loss += loss.item()
        if epoch % every == 0:
            print("epoch = ", epoch, " mean loss = ", mean_loss / L)
However, the loss doesn't seem to change at all. What did I do wrong?
Note: I am feeding the model character by character, not the entire batch at once.
I'm creating an LSTM Autoencoder for feature extraction for my master's thesis. However, I'm having a lot of trouble combining dropout with LSTM layers.
Since it's an Autoencoder, I have a bottleneck which is achieved by two separate LSTM layers, each with num_layers=1, and a dropout in between. I have time series with very different lengths and have found packed sequences to be a good idea for that reason.
But, from my experiments, it seems I must pack the data before the first LSTM, unpack it before the dropout, then pack it again before the second LSTM. This seems wildly inefficient. Is there a better way? I'm providing some example code and an alternative way to implement it below.
Current, working, but possibly suboptimal solution:
class Encoder(nn.Module):
    def __init__(self, seq_len, n_features, embedding_dim, hidden_dim, dropout):
        super(Encoder, self).__init__()
        self.seq_len = seq_len
        self.n_features = n_features
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.lstm1 = nn.LSTM(
            input_size=n_features,
            hidden_size=self.hidden_dim,
            num_layers=1,
            batch_first=True,
        )
        self.lstm2 = nn.LSTM(
            input_size=self.hidden_dim,
            hidden_size=embedding_dim,
            num_layers=1,
            batch_first=True,
        )
        self.drop1 = nn.Dropout(p=dropout, inplace=False)

    def forward(self, x):
        x, (_, _) = self.lstm1(x)
        x, lens = pad_packed_sequence(x, batch_first=True, total_length=self.seq_len)
        x = self.drop1(x)
        x = pack_padded_sequence(x, lens, batch_first=True, enforce_sorted=False)
        x, (hidden_n, _) = self.lstm2(x)
        return hidden_n.reshape((-1, self.n_features, self.embedding_dim)), lens
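For concreteness, here is a minimal usage sketch of the encoder above; the batch, lengths, and hyperparameter values are made up for illustration:

import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

batch, seq_len, n_features = 3, 10, 1          # hypothetical shapes
lengths = torch.tensor([10, 7, 4])             # true length of each series
padded = torch.randn(batch, seq_len, n_features)
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)

encoder = Encoder(seq_len=seq_len, n_features=n_features, embedding_dim=4, hidden_dim=8, dropout=0.2)
emb, lens = encoder(packed)                    # emb: (batch, n_features, embedding_dim)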
Alternative, possibly better, but currently not working solution:
class Encoder2(nn.Module):
    def __init__(self, seq_len, n_features, embedding_dim, hidden_dim, dropout):
        super(Encoder2, self).__init__()
        self.seq_len = seq_len
        self.n_features = n_features
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.lstm1 = nn.LSTM(
            input_size=n_features,
            hidden_size=self.hidden_dim,
            num_layers=2,
            batch_first=True,
            dropout=dropout,
            proj_size=self.embedding_dim,
        )

    def forward(self, x):
        _, (h_n, _) = self.lstm1(x)
        # NOTE: `lens` is undefined in this scope, which is part of why this version doesn't work
        return h_n[-1].unsqueeze(1), lens
Any help and tips about working with time series, packed sequences, LSTM cells, and dropout would be immensely appreciated, as I'm not finding much documentation/guidance elsewhere on the internet. Thank you!
Best, Lars Ankile
For posterity: after a lot of trial and error, the following full code for the Autoencoder seems to work very well. Getting the packing and unpacking to work correctly was the main hurdle. The clue is, I think, to utilize the LSTM modules for what they're worth by using the proj_size, num_layers, and dropout parameters.
class EncoderV4(nn.Module):
    def __init__(
        self, seq_len, n_features, embedding_dim, hidden_dim, dropout, num_layers
    ):
        super().__init__()
        self.seq_len = seq_len
        self.n_features = n_features
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.lstm1 = nn.LSTM(
            input_size=n_features,
            hidden_size=self.hidden_dim,
            num_layers=num_layers,
            batch_first=True,
            dropout=dropout,
            proj_size=self.embedding_dim,
        )

    def forward(self, x):
        _, (h_n, _) = self.lstm1(x)
        return h_n[-1].unsqueeze(1)

class DecoderV4(nn.Module):
    def __init__(self, seq_len, input_dim, hidden_dim, n_features, num_layers):
        super().__init__()
        self.seq_len = seq_len
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.n_features = n_features
        self.num_layers = num_layers
        self.lstm1 = nn.LSTM(
            input_size=input_dim,
            hidden_size=hidden_dim,
            num_layers=num_layers,
            proj_size=n_features,
            batch_first=True,
        )

    def forward(self, x, lens):
        x = x.repeat(1, self.seq_len, 1)
        x = pack_padded_sequence(x, lens, batch_first=True, enforce_sorted=False)
        x, _ = self.lstm1(x)
        return x

class RecurrentAutoencoderV4(nn.Module):
    def __init__(
        self, seq_len, n_features, embedding_dim, hidden_dim, dropout, num_layers
    ):
        super().__init__()
        self.encoder = EncoderV4(
            seq_len, n_features, embedding_dim, hidden_dim, dropout, num_layers
        )
        self.decoder = DecoderV4(
            seq_len, embedding_dim, hidden_dim, n_features, num_layers
        )

    def forward(self, x, lens):
        x = self.encoder(x)
        x = self.decoder(x, lens)
        return x
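A quick smoke test of the full autoencoder, with made-up dimensions (lens must hold the true sequence lengths so the decoder can re-pack its output):

import torch
from torch.nn.utils.rnn import pack_padded_sequence

seq_len, n_features, embedding_dim, hidden_dim = 10, 1, 4, 8   # hypothetical values
model = RecurrentAutoencoderV4(seq_len, n_features, embedding_dim, hidden_dim, dropout=0.2, num_layers=2)

lens = torch.tensor([10, 7, 4])
x = torch.randn(3, seq_len, n_features)
packed = pack_padded_sequence(x, lens, batch_first=True, enforce_sorted=False)

out = model(packed, lens)   # a PackedSequence with feature size n_features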
The full code and a paper using this Autoencoder can be found at GitHub and arXiv, respectively.
I got an error while running PyTorch.
I'm training a model with ResNet, and I wrote my own custom dataset class for the data. After loading the dataset in ResNet.py, I set up the training data and test data separately and trained the model. But when I ran it, an error occurred, and I don't know what kind of problem it is.
Below, I attach my ResNet.py code and my own dataset code, customDataset.py.
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import transforms, datasets, models
from customDataset import CatsAndDogsDataset

USE_CUDA = torch.cuda.is_available()
DEVICE = torch.device("cuda" if USE_CUDA else "cpu")

EPOCHS = 3
BATCH_SIZE = 10

dataset = CatsAndDogsDataset(csv_file='cats_dogs.csv', root_dir='cats_dogs_resized', transform=transforms.ToTensor())
train_set, test_set = torch.utils.data.random_split(dataset, [28, 4])
train_loader = DataLoader(dataset=train_set, batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(dataset=test_set, batch_size=BATCH_SIZE, shuffle=True)
class BasicBlock(nn.Module):
    def __init__(self, in_planes, planes, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out
class ResNet(nn.Module):
    def __init__(self, num_classes=10):
        super(ResNet, self).__init__()
        self.in_planes = 16
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(16)
        self.layer1 = self._make_layer(16, 2, stride=1)
        self.layer2 = self._make_layer(32, 2, stride=2)
        self.layer3 = self._make_layer(64, 2, stride=2)
        self.linear = nn.Linear(64, num_classes)

    def _make_layer(self, planes, num_blocks, stride):
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(BasicBlock(self.in_planes, planes, stride))
            self.in_planes = planes
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = F.avg_pool2d(out, 8)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out
model = ResNet().to(DEVICE)
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=0.0005)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)
print(model)

def train(model, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(DEVICE), target.to(DEVICE)
        optimizer.zero_grad()
        output = model(data)
        loss = F.cross_entropy(output, target)
        loss.backward()
        optimizer.step()

def evaluate(model, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(DEVICE), target.to(DEVICE)
            output = model(data)
            test_loss += F.cross_entropy(output, target, reduction='sum').item()
            pred = output.max(1, keepdim=True)[1]
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= length(test_loader.dataset)
    test_accuracy = 100. * correct / length(test_loader.dataset)
    return test_loss, test_accuracy

for epoch in range(1, EPOCHS + 1):
    scheduler.step()
    train(model, train_loader, optimizer, epoch)
    test_loss, test_accuracy = evaluate(model, test_loader)
    print('[{}] Test Loss: {:.4f}, Accuracy: {:.2f}%'.format(epoch, test_loss, test_accuracy))
import os
import pandas as pd
import torch
from torch.utils.data import Dataset
from skimage import io

class CatsAndDogsDataset(Dataset):
    def __init__(self, csv_file, root_dir, transform=None):
        self.annotations = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __length__(self):
        return length(self.annotations)

    def __getitem__(self, index):
        img_path = os.path.join(self.root_dir, self.annotations.iloc[index, 0])
        image = io.imread(img_path)
        y_label = torch.tensor(int(self.annotations.iloc[index, 1]))
        if self.transform:
            image = self.transform(image)
        return (image, ylabel)
When I write the code this way and run it, I get:
TypeError: object of type 'CatsAndDogsDataset' has no len()
I wonder why it can't have len(). In addition, when I run it on Backend.ai instead of PyCharm, the error is:
Cannot verify that dataset is Sized
and at the line if sum(lengths) != len(dataset): it raises ValueError("sum of input lengths does not equal the length of the input dataset!").
Is there a workaround? Please help.
You need to define the function __len__ for your custom dataset (which you seem to have currently incorrectly defined as __length__).
This documentation provides details. Relevant excerpt:
torch.utils.data.Dataset is an abstract class representing a dataset.
Your custom dataset should inherit Dataset and override the following
methods:
__len__ so that len(dataset) returns the size of the dataset.
__getitem__ to support the indexing such that dataset[i] can be used to get the i-th sample.
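As a sketch, the corrected dataset class could look like this. Beyond renaming __length__ to __len__ and using the built-in len, note that your __getitem__ returns ylabel while the variable is named y_label; I've aligned that here as well:

import os
import pandas as pd
import torch
from torch.utils.data import Dataset
from skimage import io

class CatsAndDogsDataset(Dataset):
    def __init__(self, csv_file, root_dir, transform=None):
        self.annotations = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        # the built-in len() now works on the dataset, so random_split can size it
        return len(self.annotations)

    def __getitem__(self, index):
        img_path = os.path.join(self.root_dir, self.annotations.iloc[index, 0])
        image = io.imread(img_path)
        y_label = torch.tensor(int(self.annotations.iloc[index, 1]))
        if self.transform:
            image = self.transform(image)
        return (image, y_label)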
Standard interpretation: in the original RNN, the hidden state and output are calculated as shown below. In other words, we obtain the output from the hidden state.
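A sketch of those equations, assuming the standard Elman formulation that PyTorch's nn.RNN documentation uses (tanh nonlinearity):

h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{t-1} + b_{hh})
y_t = W_{hy} h_t + b_{hy}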
According to Wiki, the RNN architecture can be unfolded like this:
And the code I have been using is:
class Model(nn.Module):
    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super(Model, self).__init__()
        self.hidden_dim = hidden_dim
        self.rnn = nn.RNN(input_size, hidden_dim, 1)
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x):
        batch_size = x.size(0)
        out, hidden = self.rnn(x)
        # getting output from the hidden state
        out = out.view(-1, self.hidden_dim)
        out = self.fc(out)
        return out, hidden
RNN as "pure" feed-forward layers: but today I saw another implementation in the PyTorch tutorial, and their code is like this:
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        output = self.i2o(combined)
        output = self.softmax(output)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)
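For context, the tutorial drives this module one time step at a time, carrying the hidden state along manually. Roughly (a sketch; rnn and line_tensor are the tutorial's names for the model and the sequence of one-hot character tensors):

hidden = rnn.initHidden()
for i in range(line_tensor.size(0)):
    # feed one character; the returned hidden state is passed back in at the next step
    output, hidden = rnn(line_tensor[i], hidden)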
The hidden-state calculation is the same as in the standard interpretation, but the output is calculated independently of the current hidden state h.
To me, the math behind this implementation is:
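Reading it off the forward method above (a reconstruction; [x_t; h_{t-1}] denotes concatenation):

h_t = W_{i2h} [x_t; h_{t-1}] + b_{i2h}
y_t = \mathrm{LogSoftmax}(W_{i2o} [x_t; h_{t-1}] + b_{i2o})

Since W [x_t; h_{t-1}] = W_x x_t + W_h h_{t-1}, a linear layer on the concatenation is equivalent to two separate weight matrices applied to the input and the previous hidden state.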
So, is this implementation different from the original RNN implementation?
I have been using RNNs for almost a year and I thought I understood them, until today when I saw this post from PyTorch. I am really confused now.