pytorch model.cuda() runtime error - pytorch

I'm building a text classifier using PyTorch and ran into trouble with the .cuda() method. I know that .cuda() moves all of a model's parameters to the GPU so that training can run faster. However, an error occurred inside .cuda() itself:
start_time = time.time()
for model_type in ('lstm',):
    hyperparam_combinations = score_util.all_combination(hyperparam_dict[model_type].values())
    # for selecting best scoring model
    for test_idx, setting in enumerate(hyperparam_combinations):
        args = custom_dataset.list_to_args(setting,model_type=model_type)
        print(args)
        tsv = "test %d\ttrain_loss\ttrain_acc\ttrain_auc\tval_loss\tval_acc\tval_auc\n"%(test_idx) # tsv record
        avg_score = [] # cv_mean score
        ### 4 fold cross validation
        for cv_num,(train_iter,val_iter) in enumerate(cv_splits):
            ### model initiation
            model = model_dict[model_type](args)
            if args.emb_type is not None: # word embedding init
                emb = emb_dict[args.emb_type]
                emb = score_util.embedding_init(emb,tr_text_field,args.emb_type)
                model.embed.weight.data.copy_(emb)
            model.cuda()
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-20-ff6cfce73c10> in <module>()
23 model.embed.weight.data.copy_(emb)
24
---> 25 model.cuda()
26
27 optimizer= torch.optim.Adam(model.parameters(),lr=args.lr)
~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in cuda(self, device_id)
145 copied to that device
146 """
--> 147 return self._apply(lambda t: t.cuda(device_id))
148
149 def cpu(self, device_id=None):
~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in _apply(self, fn)
116 def _apply(self, fn):
117 for module in self.children():
--> 118 module._apply(fn)
119
120 for param in self._parameters.values():
~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in _apply(self, fn)
122 # Variables stored in modules are graph leaves, and we don't
123 # want to create copy nodes, so we have to unpack the data.
--> 124 param.data = fn(param.data)
125 if param._grad is not None:
126 param._grad.data = fn(param._grad.data)
RuntimeError: Variable data has to be a tensor, but got torch.cuda.FloatTensor
This is the error traceback, and I can't see why it happens.
This code worked well before I set the epoch parameter to 1 to run some tests. I set it back to 1000, but the problem persists.
Isn't a torch.cuda.FloatTensor object also a tensor? Any help would be much appreciated.
My model looks like this:
class TR_LSTM(nn.Module):
    def __init__(self,args,
                 use_hidden_average=False,
                 pretrained_emb = None):
        super(TR_LSTM,self).__init__()
        # arguments
        self.emb_dim = args.embed_dim
        self.emb_num = args.embed_num
        self.num_hidden_unit = args.hidden_state_dim
        self.num_lstm_layer = args.num_lstm_layer
        self.use_hidden_average = use_hidden_average
        self.batch_size = args.batch_size
        # layers
        self.embed = nn.Embedding(self.emb_num, self.emb_dim)
        if pretrained_emb is not None:
            self.embed.weight.data.copy_(pretrained_emb)
        self.lstm_layer = nn.LSTM(self.emb_dim, self.num_hidden_unit, self.num_lstm_layer, batch_first = True)
        self.fc_layer = nn.Sequential(nn.Linear(self.num_hidden_unit,self.num_hidden_unit),
                                      nn.Linear(self.num_hidden_unit,2))

    def forward(self,x):
        x = self.embed(x) # batch * max_seq_len * emb_dim
        h_0,c_0 = self.init_hidden(x.size(0))
        x, (_, _) = self.lstm_layer(x, (h_0,c_0)) # batch * seq_len * hidden_unit_num
        if not self.use_hidden_average:
            x = x[:,x.size(1)-1,:]
            x = x.squeeze(1)
        else:
            x = x.mean(1).squeeze(1)
        x = self.fc_layer(x)
        return x

    def init_hidden(self,batch_size):
        h_0, c_0 = torch.zeros(self.num_lstm_layer,batch_size , self.num_hidden_unit),\
                   torch.zeros(self.num_lstm_layer,batch_size , self.num_hidden_unit)
        h_0, c_0 = h_0.cuda(), c_0.cuda()
        h_0_param, c_0_param = torch.nn.Parameter(h_0), torch.nn.Parameter(c_0)
        return h_0_param, c_0_param

model.cuda() is called inside your training/test loop, and that is the problem. As the error message suggests, you repeatedly convert the parameters (tensors) of your model to CUDA, which is not the right way to move a model to the GPU.
The model object should be created and moved to the GPU outside the loop. Only the training/test instances should be converted to CUDA tensors, each time you feed them to the model. I also suggest reading the example code on the PyTorch documentation site.
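The following is a minimal sketch of that pattern in current PyTorch. It uses a hypothetical TinyClassifier in place of TR_LSTM and random token batches in place of the question's iterators. A cross-validation loop like the one in the question still needs a fresh model per fold. So here each fold builds its model and moves it once with .to(device), creates the optimizer after the move (the torch.optim documentation asks for this ordering), and moves only the batches inside the inner loop. As a side change, the initial LSTM state is built as plain zero tensors on the input's device (x.new_zeros) instead of calling .cuda() and wrapping it in nn.Parameter as init_hidden does.

```python
import torch
import torch.nn as nn

# Pick the GPU when one is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for TR_LSTM: embedding -> LSTM -> linear head."""

    def __init__(self, vocab_size=50, emb_dim=8, hidden=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, 2)

    def forward(self, tokens):
        x = self.embed(tokens)  # batch x seq_len x emb_dim
        # Initial state is created on the input's device: no hard-coded .cuda().
        h0 = x.new_zeros((1, x.size(0), self.lstm.hidden_size))
        c0 = torch.zeros_like(h0)
        out, _ = self.lstm(x, (h0, c0))
        return self.fc(out[:, -1, :])  # last time step -> 2 logits


def run_fold(batches, pretrained_emb=None):
    """Train one fresh model on one CV fold (hypothetical helper)."""
    model = TinyClassifier()
    if pretrained_emb is not None:
        with torch.no_grad():
            model.embed.weight.copy_(pretrained_emb)
    model.to(device)  # move the model once, right after building it
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # created after the move
    loss_fn = nn.CrossEntropyLoss()
    for tokens, labels in batches:
        # Only the data is moved on every step.
        tokens, labels = tokens.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(tokens), labels)
        loss.backward()
        optimizer.step()
    return model, loss.item()


torch.manual_seed(0)
batches = [(torch.randint(0, 50, (4, 7)), torch.randint(0, 2, (4,))) for _ in range(3)]
for fold in range(2):
    model, last_loss = run_fold(batches)
    print(f"fold {fold}: last loss {last_loss:.4f}")
```

Because the hidden state follows the input's device, the same class runs unchanged on CPU-only machines.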

Related

AttributeError: lower not found in sklearn

I'm trying to build a news classifier with sklearn. I managed to train the models, but when I try to use one to predict the category of a new text I get this error:
AttributeError Traceback (most recent call last)
Cell In [131], line 3
1 if isinstance(new_text, str):
2 new_text_tfidf = tfidf_vectorizer.transform([new_text])
----> 3 predicted_category = dt.predict(new_text_tfidf)[0]
4 else:
5 predicted_category = "Invalid input, please provide a string"
File ~\AppData\Roaming\Python\Python39\site-packages\sklearn\pipeline.py:457, in Pipeline.predict(self, X, **predict_params)
455 Xt = X
456 for _, name, transform in self._iter(with_final=False):
--> 457 Xt = transform.transform(Xt)
458 return self.steps[-1][1].predict(Xt, **predict_params)
File ~\AppData\Roaming\Python\Python39\site-packages\sklearn\feature_extraction\text.py:2103, in TfidfVectorizer.transform(self, raw_documents)
2086 """Transform documents to document-term matrix.
2087
2088 Uses the vocabulary and document frequencies (df) learned by fit (or
(...)
2099 Tf-idf-weighted document-term matrix.
2100 """
2101 check_is_fitted(self, msg="The TF-IDF vectorizer is not fitted")
--> 2103 X = super().transform(raw_documents)
2104 return self._tfidf.transform(X, copy=False)
File ~\AppData\Roaming\Python\Python39\site-packages\sklearn\feature_extraction\text.py:1387, in CountVectorizer.transform(self, raw_documents)
1384 self._check_vocabulary()
1386 # use the same matrix-building strategy as fit_transform
--> 1387 _, X = self._count_vocab(raw_documents, fixed_vocab=True)
1388 if self.binary:
1389 X.data.fill(1)
File ~\AppData\Roaming\Python\Python39\site-packages\sklearn\feature_extraction\text.py:1209, in CountVectorizer._count_vocab(self, raw_documents, fixed_vocab)
1207 for doc in raw_documents:
1208 feature_counter = {}
--> 1209 for feature in analyze(doc):
1210 try:
1211 feature_idx = vocabulary[feature]
File ~\AppData\Roaming\Python\Python39\site-packages\sklearn\feature_extraction\text.py:111, in _analyze(doc, analyzer, tokenizer, ngrams, preprocessor, decoder, stop_words)
109 else:
110 if preprocessor is not None:
--> 111 doc = preprocessor(doc)
112 if tokenizer is not None:
113 doc = tokenizer(doc)
File ~\AppData\Roaming\Python\Python39\site-packages\sklearn\feature_extraction\text.py:69, in _preprocess(doc, accent_function, lower)
50 """Chain together an optional series of text preprocessing steps to
51 apply to a document.
52 (...)
66 preprocessed string
67 """
68 if lower:
---> 69 doc = doc.lower()
70 if accent_function is not None:
71 doc = accent_function(doc)
File ~\AppData\Roaming\Python\Python39\site-packages\scipy\sparse\_base.py:771, in spmatrix.__getattr__(self, attr)
769 return self.getnnz()
770 else:
--> 771 raise AttributeError(attr + " not found")
AttributeError: lower not found
Below are some pieces of code from my notebook:
The preprocess_text(s) function:
def preprocess_text(s):
    """A text processing pipeline for cleaning up text using the hero package."""
    s= s.replace("<br/>", "")
    s = s.replace("’", "")
    s = s.replace("‘", "")
    s = hero.fillna(s)
    s = hero.lowercase(s)
    s = hero.remove_digits(s)
    s = hero.remove_punctuation(s)
    s = hero.remove_diacritics(s)
    s = hero.remove_whitespace(s)
    s = s.replace("Ë","E").replace("ë","e").replace("Ç","C").replace("ç","c")
    return s
text = dataset['Text']
category = dataset['Category']
print(category)
X_train, X_test, Y_train, Y_test = train_test_split(text,category, test_size = 0.3, random_state = 42,shuffle=True, stratify=category)
# Initialize a TfidfVectorizer object: tfidf_vectorizer
tfidf_vectorizer = TfidfVectorizer(sublinear_tf=False, min_df=2, norm='l2', encoding='latin-1', ngram_range=(1,2))
# Transform the training data: tfidf_train
tfidf_train = tfidf_vectorizer.fit_transform(X_train)
# Transform the test data: tfidf_test
tfidf_test = tfidf_vectorizer.transform(X_test)
Training the model with the Random Forest algorithm:
#Random Forest Classifier
rfc = Pipeline([('tfidf', TfidfVectorizer()),
                ('rfc', RandomForestClassifier(n_estimators=100)),
               ])
rfc.fit(X_train, Y_train)
test_predict = rfc.predict(X_test)
train_accuracy = round(rfc.score(X_train,Y_train)*100)
test_accuracy =round(accuracy_score(test_predict, Y_test)*100)
print("RandomForestClassifier Train Accuracy Score : {}% ".format(train_accuracy ))
print("RandomForestClassifier Test Accuracy Score : {}% ".format(test_accuracy ))
print()
print(classification_report(Y_test, test_predict, target_names=target_category)) # y_true comes first
import pickle
with open('model/random_fin.pkl', 'wb') as file:
    pickle.dump(rfc, file)
with open('model/tfidf_vectorizer.pkl', 'wb') as file:
    pickle.dump(rfc.named_steps['tfidf'], file)
new_text = "Berisha ka akuzuar Ramen ne lidhje me aferen e inceneratoreve"
if isinstance(new_text, str):
    new_text_tfidf = tfidf_vectorizer.transform([new_text])
    predicted_category = dt.predict(new_text_tfidf)[0]
else:
    predicted_category = "Invalid input, please provide a string"
print(predicted_category)
I've been trying to resolve this issue, but so far without success.
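The traceback shows that dt is a Pipeline whose first step is a TfidfVectorizer: Pipeline.predict calls TfidfVectorizer.transform. Such a pipeline vectorizes its input itself. Feeding it the sparse matrix produced by the separate tfidf_vectorizer therefore makes the inner vectorizer treat each sparse row as a document and call .lower() on it.

The following is a minimal sketch of passing raw text instead. It assumes a toy corpus and a pipeline shaped like rfc; the names texts and labels and the step name "clf" are illustrative only.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Toy corpus (hypothetical) standing in for dataset['Text'] / dataset['Category'].
texts = [
    "the team won the football match",
    "the striker scored a late goal",
    "parliament passed the new budget law",
    "the minister announced a tax reform",
]
labels = ["sport", "sport", "politics", "politics"]

# The pipeline owns its own TfidfVectorizer, so it is fitted on raw strings...
dt = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", RandomForestClassifier(n_estimators=50, random_state=0)),
])
dt.fit(texts, labels)

new_text = "the goalkeeper saved the match"
if isinstance(new_text, str):
    # ...and predict() must receive raw strings too (a list with one document),
    # not the output of a separately fitted vectorizer.
    predicted_category = dt.predict([new_text])[0]
else:
    predicted_category = "Invalid input, please provide a string"
print(predicted_category)
```

The alternative is to keep the standalone tfidf_vectorizer and train a bare classifier (not a Pipeline) on tfidf_train. In that case the vectorized matrix is the right input.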

Issue implementing InceptionV3 with binary classifier - transfer learning with Pytorch

I'm having an issue getting Inception V3 to work as the feature extractor with a binary classifier in PyTorch. I updated the primary and auxiliary nets in Inception to output the two classes (as done in https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html), but I'm getting an error:
#Parameters for Inception V3
num_classes= 2
model_ft = models.inception_v3(pretrained=True)
# set_parameter_requires_grad(model_ft, feature_extract)
#handle auxiliary net
num_ftrs = model_ft.AuxLogits.fc.in_features
model_ft.AuxLogits.fc = nn.Linear(num_ftrs, num_classes)
#handle primary net
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs,num_classes)
# input_size = 299
#simulate data input
x = torch.rand([64, 3, 299, 299])
#create model with inception backbone
backbone = model_ft
num_filters = backbone.fc.in_features
layers = list(backbone.children())[:-1]
feature_extractor = nn.Sequential(*layers)
# use the pretrained model to classify damage 2 classes
num_target_classes = 2
classifier = nn.Linear(num_filters, num_target_classes)
feature_extractor.eval()
with torch.no_grad():
    representations = feature_extractor(x).flatten(1)
x = classifier(representations)
But I'm getting this error:
RuntimeError Traceback (most recent call last)
<ipython-input-54-c2be64b8a99e> in <module>()
11 feature_extractor.eval()
12 with torch.no_grad():
---> 13 representations = feature_extractor(x)
14 x = classifier(representations)
9 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight, bias)
442 _pair(0), self.dilation, self.groups)
443 return F.conv2d(input, weight, bias, self.stride,
--> 444 self.padding, self.dilation, self.groups)
445
446 def forward(self, input: Tensor) -> Tensor:
RuntimeError: Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [64, 2]
Before I changed the number of classes to 2 (from 1000), I got the same error but with [64, 1000]. This approach of creating a backbone and adding a classifier worked for ResNet but not here. I think it's because of the auxiliary net structure, but I'm not sure how to update it to deal with the dual output. Thanks
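Building feature_extractor from backbone.children() (the line layers = list(backbone.children())[:-1]) copies only the submodules of the backbone into feature_extractor, not the operations performed in its forward() function. For InceptionV3 in particular, children() also returns the AuxLogits head, which is registered between Mixed_6e and Mixed_7a. nn.Sequential runs it in-line, so Mixed_7a receives the [64, 2] auxiliary logits instead of a feature map, which matches the error.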
Let's take a look at the code below:
import torch

class Example(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.avg = torch.nn.AdaptiveAvgPool2d((1,1))
        self.linear = torch.nn.Linear(10, 1)

    def forward(self, x):
        out = self.avg(x)
        out = out.squeeze()
        out = self.linear(out)
        return out

x = torch.randn(5, 10, 12, 12)
model = Example()
y = model(x) # works well
new_model = torch.nn.Sequential(*list(model.children()))
y = new_model(x) # error
The modules model and new_model have the same blocks but do not work the same way. In new_model, the output of the pooling layer is never squeezed, so the input to the linear layer has shape [5, 10, 1, 1] instead of [5, 10]. This violates the layer's assumption and causes the error.
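The following sketch shows how the Sequential version can be made equivalent. It assumes the Example class above, and adds the reshaping step that lives only in forward() back as its own module (nn.Flatten). flatten(1) is used here instead of squeeze() so that a batch of size 1 keeps its batch dimension.

```python
import torch
import torch.nn as nn


class Example(nn.Module):
    def __init__(self):
        super().__init__()
        self.avg = nn.AdaptiveAvgPool2d((1, 1))
        self.linear = nn.Linear(10, 1)

    def forward(self, x):
        out = self.avg(x)      # [N, 10, 1, 1]
        out = out.flatten(1)   # [N, 10]; this step is invisible to children()
        return self.linear(out)


torch.manual_seed(0)
x = torch.randn(5, 10, 12, 12)
model = Example()

# children() yields only avg and linear; the forward-only reshape is re-added.
rebuilt = nn.Sequential(model.avg, nn.Flatten(), model.linear)
print(torch.allclose(model(x), rebuilt(x)))
```

The same reasoning applies to InceptionV3. Its forward() decides which children run on the main path, so copying children() into a Sequential does not reproduce it.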
In your case, the backbone-plus-classifier part (the last two blocks of your code) is redundant, and it is what causes the error. You already created a new 2-class fc in the InceptionV3 module at the line model_ft.fc = nn.Linear(num_ftrs,num_classes). Therefore, replacing that last part with the code below should work:
model_ft.eval()  # in eval mode the forward pass returns only the main logits
with torch.no_grad():
    x = model_ft(x)  # shape [64, 2]

Custom Regressor: GridSearchCV says 'get_params' not implemented when inheriting from BaseEstimator

Hello,
Thank you for taking the time to look at this.
I am working on implementing a scikit-learn API version of this blog post; the data is available here. My custom class reproduces the author's results, but does not work with GridSearchCV.
Essentially, he implements partial least squares regression on some spectral data. The optimal number of components is the number of components that yields the lowest MSE. My attempt is shown below. I am able to replicate the author's MSE result for the optimal calibration, and the default parameters of the __init__ below are set to those values. Note that I am inheriting from BaseEstimator and RegressorMixin.
import pandas as pd
from scipy.signal import savgol_filter
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import mean_squared_error
from sklearn.utils.validation import check_X_y, check_array

#download the .csv from the github repo from the blog post
#Creating df, shuffling, then creating `X` and `y`
df = pd.read_csv("nirpyresearch/data/peach_spectra+brixvalues.csv")
df = df.sample(replace=False, frac=1).copy()
y = df['Brix'].values
X = df[[i for i in list(df.columns) if 'wl' in i]].values

class SavgolPLS(BaseEstimator, RegressorMixin):
    """My Regressor"""
    def __init__(self, savgol_window = 17, savgol_polyorder = 2, savgol_deriv = 2, pls_components = 7 ):
        self.savgol_window = savgol_window
        self.savgol_polyorder = savgol_polyorder
        self.savgol_deriv = savgol_deriv
        self.pls_components = pls_components

    def fit(self, X, y):
        # Check that X and y have correct shape
        X, y = check_X_y(X, y)
        self.X_ = X
        self.y_ = y
        self.X_savgol_ = savgol_filter(X, self.savgol_window, self.savgol_polyorder, self.savgol_deriv)
        self.pls_ = PLSRegression(n_components=self.pls_components).fit(self.X_savgol_, self.y_)
        # Return the classifier
        return self

    def predict(self, X, apply_savgol = True):
        # Check is fit had been called
        #check_is_fitted(self)
        # Input validation
        X = check_array(X)
        if apply_savgol:
            X = savgol_filter(X, self.savgol_window, self.savgol_polyorder, self.savgol_deriv)
        pred_y = self.pls_.predict(X)
        return pred_y

    def score(self, y_pred):
        mse = mean_squared_error( y_true = self.y_, y_pred=y_pred,)
        return mse
I can now initialize the model and use .get_params() to get a dict containing the 4 parameters in __init__.
s_pls = SavgolPLS(pls_components=7)
s_pls.get_params()
Thus, get_params() seems to exist, which makes sense given that it is inherited from BaseEstimator. I can also use the fit() method to replicate the author's results.
s_pls = s_pls.fit(X = X, y = y)
y_pred = s_pls.predict(X)
#This should be ~0.6566
s_pls.score(y_pred)
Why then, does applying GridSearchCV in code below generate the shown error?
parameters ={'savgol_window':[3,30], 'savgol_polyorder':[2,4], 'savgol_deriv':[1,3], 'pls_components':[2,15]}
clf = GridSearchCV(SavgolPLS, parameters, cv = 10)
clf.fit(X, y)
Yields
TypeError Traceback (most recent call last)
<ipython-input-22-e20c1eabb4fa> in <module>
----> 1 clf.fit(X, y.ravel())
C:\tools\Anaconda3\envs\dev_py37_tf\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
631 n_splits = cv.get_n_splits(X, y, groups)
632
--> 633 base_estimator = clone(self.estimator)
634
635 parallel = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
C:\tools\Anaconda3\envs\dev_py37_tf\lib\site-packages\sklearn\base.py in clone(estimator, safe)
58 "it does not seem to be a scikit-learn estimator "
59 "as it does not implement a 'get_params' methods."
---> 60 % (repr(estimator), type(estimator)))
61 klass = estimator.__class__
62 new_object_params = estimator.get_params(deep=False)
TypeError: Cannot clone object '<class '__main__.SavgolPLS'>' (type <class 'type'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods.
Thank you for your help!
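You're passing a class to GridSearchCV; you should pass an instance. GridSearchCV clones the estimator it receives, and clone() rejects a class object (type <class 'type'>), hence the error: clf = GridSearchCV(SavgolPLS(), parameters, cv = 10)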

AttributeError: 'function' object has no attribute 'predict'. Keras

I am working on an RL problem and I created a class to initialize the model and other parameters. The code is as follows:
class Agent:
    def __init__(self, state_size, is_eval=False, model_name=""):
        self.state_size = state_size
        self.action_size = 20 # measurement, CNOT, bit-flip
        self.memory = deque(maxlen=1000)
        self.inventory = []
        self.model_name = model_name
        self.is_eval = is_eval
        self.done = False
        self.gamma = 0.95
        self.epsilon = 1.0
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995

    def model(self):
        model = Sequential()
        model.add(Dense(units=16, input_dim=self.state_size, activation="relu"))
        model.add(Dense(units=32, activation="relu"))
        model.add(Dense(units=8, activation="relu"))
        model.add(Dense(self.action_size, activation="softmax"))
        model.compile(loss="categorical_crossentropy", optimizer=Adam(lr=0.003))
        return model

    def act(self, state):
        options = self.model.predict(state)
        return np.argmax(options[0]), options
I want to run it for only one iteration, hence I create an object and I pass a vector of length 16 like this:
agent = Agent(density.flatten().shape)
state = density.flatten()
action, probs = agent.act(state)
However, I get the following error:
AttributeError Traceback (most recent call last)
<ipython-input-14-4f0ff0c40f49> in <module>
----> 1 action, probs = agent.act(state)
<ipython-input-10-562aaf040521> in act(self, state)
39 # return random.randrange(self.action_size)
40 # model = self.model()
---> 41 options = self.model.predict(state)
42 return np.argmax(options[0]), options
43
AttributeError: 'function' object has no attribute 'predict'
What's the issue? I checked some other people's code as well, like this, and I think mine is very similar.
Let me know.
EDIT:
I changed the argument in Dense from input_dim to input_shape and self.model.predict(state) to self.model().predict(state).
Now when I run the NN on one input of shape (16,1), I get the following error:
ValueError: Error when checking input: expected dense_1_input to have 3 dimensions, but got array with shape (16, 1)
And when I run it with shape (1,16), I get the following error:
ValueError: Error when checking input: expected dense_1_input to have 3 dimensions, but got array with shape (1, 16)
What should I do in this case?
In the last code block,
def act(self, state):
    options = self.model.predict(state)
    return np.argmax(options[0]), options
self.model is a method that returns a model, so the call should be self.model().predict(state).
I used np.reshape. In this case, I did
density_test = np.reshape(density.flatten(), (1,1,16))
and the network produced an output.
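The usual fix is to build the network once in __init__ and store the resulting object. Calling self.model() inside act() works, but it builds and compiles a new, untrained network on every step.

The following framework-neutral sketch shows this. It uses a hypothetical StandInModel in place of the compiled Keras Sequential (only .predict is mimicked) so that it runs without TensorFlow. It also assumes state_size is the integer 16 (for example density.size) rather than the tuple density.flatten().shape. With input_dim=16, a Keras Dense stack expects a batch of shape (batch, 16), so a single state is reshaped to (1, 16).

```python
import numpy as np


class StandInModel:
    """Hypothetical stand-in for a compiled Keras model; only .predict is mimicked."""

    def __init__(self, state_size, action_size, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(state_size, action_size))

    def predict(self, batch):
        logits = batch @ self.w
        exp = np.exp(logits - logits.max(axis=1, keepdims=True))
        return exp / exp.sum(axis=1, keepdims=True)  # softmax, like the last Dense layer


class Agent:
    def __init__(self, state_size, action_size=20):
        self.state_size = state_size      # an int such as 16, not a shape tuple
        self.action_size = action_size
        # Build the network once and keep the object, not the method.
        self.model = self._build_model()

    def _build_model(self):
        # In the question this would be the Sequential(...) + compile(...) code.
        return StandInModel(self.state_size, self.action_size)

    def act(self, state):
        # Add the leading batch dimension: (16,) or (16, 1) -> (1, 16).
        batch = np.reshape(state, (1, self.state_size))
        options = self.model.predict(batch)
        return int(np.argmax(options[0])), options


agent = Agent(16)
action, probs = agent.act(np.ones(16))
print(action, probs.shape)
```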

PyTorch RuntimeError: Gradients are not CUDA tensors

I am getting the following error while doing seq2seq on characters: I feed them to an LSTM and decode them to words using attention. The forward pass is fine, but loss.backward() raises this error:
RuntimeError: Gradients aren't CUDA tensors
My train() function is as follows:
def train(input_batch, input_batch_length, target_batch, target_batch_length, batch_size):
    # Zero gradients of all three optimizers
    encoderchar_optimizer.zero_grad()
    encoder_optimizer.zero_grad()
    decoder_optimizer.zero_grad()
    encoder_input = Variable(torch.FloatTensor(len(input_batch), batch_size, 500))
    for ix , w in enumerate(input_batch):
        w = w.contiguous().view(15, batch_size)
        reshaped_input_length = [x[ix] for x in input_batch_length] # [15 ,.. 30 times] * 128
        if USE_CUDA:
            w = w.cuda()
            #reshaped_input_length = Variable(torch.LongTensor(reshaped_input_length)).cuda()
        hidden_all , output = encoderchar(w, reshaped_input_length)
        encoder_input[ix] = output.transpose(0,1).contiguous().view(batch_size, -1)
    if USE_CUDA:
        encoder_input = encoder_input.cuda()
    temporary_target_batch_length = [15] * batch_size
    encoder_hidden_all, encoder_output = encoder(encoder_input, target_batch_length)
    decoder_input = Variable(torch.LongTensor([SOS_token] * batch_size))
    decoder_hidden = encoder_output
    max_target_length = max(temporary_target_batch_length)
    all_decoder_outputs = Variable(torch.zeros(max_target_length, batch_size, decoder.output_size))
    # Move new Variables to CUDA
    if USE_CUDA:
        decoder_input = decoder_input.cuda()
        all_decoder_outputs = all_decoder_outputs.cuda()
        target_batch = target_batch.cuda()
    # Run through decoder one time step at a time
    for t in range(max_target_length):
        decoder_output, decoder_hidden, decoder_attn = decoder(
            decoder_input, decoder_hidden, encoder_hidden_all
        )
        all_decoder_outputs[t] = decoder_output
        decoder_input = target_batch[t] # Next input is current target
        if USE_CUDA:
            decoder_input = decoder_input.cuda()
    # Loss calculation and backpropagation
    loss = masked_cross_entropy(
        all_decoder_outputs.transpose(0, 1).contiguous(), # -> batch x seq
        target_batch.transpose(0, 1).contiguous(), # -> batch x seq
        target_batch_length
    )
    loss.backward()
    # Clip gradient norms
    ecc = torch.nn.utils.clip_grad_norm(encoderchar.parameters(), clip)
    ec = torch.nn.utils.clip_grad_norm(encoder.parameters(), clip)
    dc = torch.nn.utils.clip_grad_norm(decoder.parameters(), clip)
    # Update parameters with optimizers
    encoderchar_optimizer.step()
    encoder_optimizer.step()
    decoder_optimizer.step()
    return loss.data[0], ec, dc
Full Stack Trace is here.
RuntimeError Traceback (most recent call last)
<ipython-input-10-9778e12ded02> in <module>()
11 data_target_batch_index= Variable(torch.LongTensor(data_target_batch_index)).transpose(0,1)
12 # Send the data for training
---> 13 loss, ar1, ar2 = train(data_input_batch_index, data_input_batch_length, data_target_batch_index, data_target_batch_length, batch_size)
14
15 # Keep track of loss
<ipython-input-8-9c71c385f8cd> in train(input_batch, input_batch_length, target_batch, target_batch_length, batch_size)
54 target_batch_length
55 )
---> 56 loss.backward()
57
58 # Clip gradient norms
/home/ubuntu/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/autograd/variable.py in backward(self, gradient, retain_variables)
144 'or with gradient w.r.t. the variable')
145 gradient = self.data.new().resize_as_(self.data).fill_(1)
--> 146 self._execution_engine.run_backward((self,), (gradient,), retain_variables)
147
148 def register_hook(self, hook):
/home/ubuntu/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/autograd/function.py in _do_backward(self, gradients, retain_variables)
207 def _do_backward(self, gradients, retain_variables):
208 self.retain_variables = retain_variables
--> 209 result = super(NestedIOFunction, self)._do_backward(gradients, retain_variables)
210 if not retain_variables:
211 del self._nested_output
/home/ubuntu/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/autograd/function.py in backward(self, *gradients)
215 def backward(self, *gradients):
216 nested_gradients = _unflatten(gradients, self._nested_output)
--> 217 result = self.backward_extended(*nested_gradients)
218 return tuple(_iter_None_tensors(result))
219
/home/ubuntu/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/nn/_functions/rnn.py in backward_extended(self, grad_output, grad_hy)
314 grad_hy,
315 grad_input,
--> 316 grad_hx)
317
318 if any(self.needs_input_grad[1:]):
/home/ubuntu/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/backends/cudnn/rnn.py in backward_grad(fn, input, hx, weight, output, grad_output, grad_hy, grad_input, grad_hx)
371 hidden_size, dcy.size()))
372 if not dhy.is_cuda or not dy.is_cuda or (dcy is not None and not dcy.is_cuda):
--> 373 raise RuntimeError('Gradients aren\'t CUDA tensors')
374
375 check_error(cudnn.lib.cudnnRNNBackwardData(
RuntimeError: Gradients aren't CUDA tensors
Any suggestions about what I am doing wrong?
Make sure that every object that inherits from nn.Module has had .cuda() called on it. Do this before you pass any tensor to it (essentially, before training).
For example (I am guessing that your encoderchar, encoder and decoder are such objects), do this right before you call train():
encoderchar = encoderchar.cuda()
encoder = encoder.cuda()
decoder = decoder.cuda()
This ensures that all of the models' parameters are allocated in GPU memory.
Edit
In general, whenever you get this kind of error,
RuntimeError: Gradients aren't CUDA tensors
it means that somewhere (from model creation, to defining inputs, to finally supplying the outputs to the loss function) you did not put a Variable object in GPU memory. You will have to go through every step of your model and verify that all Variable objects are in GPU memory.
Additionally, you don't have to call .cuda() on the outputs. When the inputs and the model's parameters are in GPU memory, all operations take place on the GPU, and so the outputs are there too.
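The following is a small sketch of that "go through every step" check in current PyTorch, where Variable has been merged into Tensor. It uses hypothetical nn.LSTM / nn.Linear stand-ins for the question's modules. A helper lists any parameter or buffer that is not on the chosen device. A forward/backward pass follows in which the inputs and the initial hidden state are created directly on that device. The helper name off_device is illustrative and is not a PyTorch API.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def off_device(device, **modules):
    """List parameters/buffers whose device type differs from `device` (empty = OK)."""
    problems = []
    for mod_name, module in modules.items():
        tensors = list(module.named_parameters()) + list(module.named_buffers())
        for name, t in tensors:
            # Compare the device *type* so that "cuda" also matches "cuda:0".
            if t.device.type != device.type:
                problems.append(f"{mod_name}.{name} is on {t.device}")
    return problems


# Hypothetical stand-ins for encoderchar / encoder / decoder.
encoder = nn.LSTM(input_size=8, hidden_size=16).to(device)
decoder = nn.Linear(16, 4).to(device)
print(off_device(device, encoder=encoder, decoder=decoder))

# Every tensor fed to the modules is created on the same device as well.
x = torch.randn(5, 3, 8, device=device)     # seq_len x batch x features
h0 = torch.zeros(1, 3, 16, device=device)   # layers x batch x hidden
c0 = torch.zeros_like(h0)
out, _ = encoder(x, (h0, c0))
loss = decoder(out).pow(2).mean()
loss.backward()  # gradients end up on `device` too; no .cuda() on outputs needed
```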

Resources