Pytorch device and .to(device) method - pytorch

I'm trying to learn RNN and Pytorch.
So I saw some RNN code where, in the forward propagation method, they did a check like this:
def forward(self, inputs, hidden):
    if inputs.is_cuda:
        device = inputs.get_device()
    else:
        device = torch.device("cpu")
    embed_out = self.embeddings(inputs)
    logits = torch.zeros(self.seq_len, self.batch_size, self.vocab_size).to(device)
I think the point of the check is to see whether we can run the code on a faster GPU instead of the CPU?
To understand the code a bit more, I did the following:
ex= torch.zeros(3,10,5)
ex1= torch.tensor(np.array([[0,0,0,1,0], [1,0,0,0,0],[0,1,0,0,0]]))
print(ex)
print("device is")
print(ex1.get_device())
print(ex.to(ex1.get_device()))
And the output was:
...
[[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]]])
device is
-1
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-2-b09342e2ba0f> in <module>()
67 print("device is")
68 print(ex1.get_device())
---> 69 print(ex.to(ex1.get_device()))
RuntimeError: Device index must not be negative
I don't understand the "device" in the code and I don't understand the .to(device) method. Can you help me understand it?

This code is deprecated. Just do:
def forward(self, inputs, hidden):
    embed_out = self.embeddings(inputs)
    logits = torch.zeros((self.seq_len, self.batch_size, self.vocab_size), device=inputs.device)
Note that .to(device) is essentially free if the tensor is already on the requested device. Also, don't use get_device(); use the .device attribute instead. It works out of the box for both CPU and GPU tensors, whereas get_device() is only meaningful for CUDA tensors; that is why it returned -1 for your CPU tensor and the subsequent .to(-1) call failed with "Device index must not be negative".
Also note that torch.tensor(np.array(...)) is bad practice for several reasons. First, to convert a numpy array to a torch tensor, use either torch.as_tensor or torch.from_numpy. Second, you will get a tensor with numpy's default dtype instead of torch's; in this case they happen to be the same (int64), but for floats they would differ (float64 vs. float32). Finally, torch.tensor can be initialized from a list, just like a numpy array, so you can get rid of numpy completely and call torch directly.
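To illustrate the device handling described above, here is a minimal sketch (the tensor shapes are arbitrary and only for demonstration):
import torch

inputs = torch.randn(3, 4)          # a CPU tensor by default
print(inputs.device)                # cpu

# Allocate a new tensor directly on the same device as `inputs`.
logits = torch.zeros(3, 4, device=inputs.device)

# .to() returns the tensor unchanged if it is already on the requested
# device, otherwise it returns a copy on that device.
logits = logits.to(inputs.device)

# The same code works unchanged if `inputs` lives on the GPU, e.g.:
# inputs = inputs.to("cuda") if torch.cuda.is_available() else inputs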

Related

Batched index_fill in PyTorch

I have an index tensor of size (2, 3):
>>> index = torch.empty(6).random_(0,8).view(2,3)
tensor([[6., 3., 2.],
[3., 4., 7.]])
And a value tensor of size (2, 8):
>>> value = torch.zeros(2,8)
tensor([[0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0.]])
I want to set the elements of value to 1 at the given indices along dim=-1. The output should look like:
>>> output
tensor([[0., 0., 1., 1., 0., 0., 1., 0.],
[0., 0., 0., 1., 1., 0., 0., 1.]])
I tried value[range(2), index] = 1 but it triggers an error. I also tried torch.index_fill but it doesn't accept batched indices. torch.scatter requires creating an extra tensor of size 2×8 filled with ones, which consumes unnecessary memory and time.
You can actually use torch.Tensor.scatter_ by setting the value (int) option instead of the src option (Tensor).
>>> value.scatter_(dim=-1, index=index.long(), value=1)
>>> value
tensor([[0., 0., 1., 1., 0., 0., 1., 0.],
[0., 0., 0., 1., 1., 0., 0., 1.]])
Make sure the index tensor is of type int64 though (hence the .long() cast).
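Put together, a minimal self-contained version of the above (reusing the question's index and value setup) would look like this:
import torch

# Random integer indices stored as floats, as in the question.
index = torch.empty(6).random_(0, 8).view(2, 3)
value = torch.zeros(2, 8)

# scatter_ with the `value=` option writes the scalar 1 at each index along
# the last dimension; the index must be int64, hence the .long() cast.
value.scatter_(dim=-1, index=index.long(), value=1)
print(value)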

UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach()

I'm new to PyTorch and I'm trying to code with it.
I have a function called OH which takes a number and returns a one-hot vector like this:
def OH(x,end=10,l=12):
    x = T.LongTensor([[x]])
    end = T.LongTensor([[end]])
    one_hot_x = T.FloatTensor(1,l)
    one_hot_end = T.FloatTensor(1,l)
    first=one_hot_x.zero_().scatter_(1,x,1)
    second=one_hot_end.zero_().scatter_(1,end,1)
    vector=T.cat((one_hot_x,one_hot_end),dim=1)
    return vector

OH(0)
output:
tensor([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 1., 0.]])
Now I have a NN that takes this output and returns a number, but this warning always appears when I run it:
online.act(OH(obs))
output:
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:17: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
4
I tried to use online.act(OH(obs).clone().detach()) but it gives me the same warning.
The code works fine and gives good results, but I need to understand this warning.
Edit
The following is my NN that has the act function:
class Network(nn.Module):
    def __init__(self,lr,n_action,input_dim):
        super(Network,self).__init__()
        self.f1=nn.Linear(input_dim,128)
        self.f2=nn.Linear(128,64)
        self.f3=nn.Linear(64,32)
        self.f4=nn.Linear(32,n_action)
        #self.optimizer=optim.Adam(self.parameters(),lr=lr)
        #self.loss=nn.MSELoss()
        self.device=T.device('cuda' if T.cuda.is_available() else 'cpu')
        self.to(self.device)
    def forward(self,x):
        x=F.relu(self.f1(x))
        x=F.relu(self.f2(x))
        x=F.relu(self.f3(x))
        x=self.f4(x)
        return x
    def act(self,obs):
        state=T.tensor(obs).to(device)
        actions=self.forward(state)
        action=T.argmax(actions).item()
        return action
The problem is that act already receives a tensor (the output of OH) and then wraps it in T.tensor again, which is what triggers the copy-construct warning.
Just remove the T.tensor call in act, like this:
def act(self,obs):
    #state=T.tensor(obs).to(device)
    state=obs.to(self.device)
    actions=self.forward(state)
    action=T.argmax(actions).item()
    return action
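For completeness, the warning text itself suggests what to do when you really need a copy of an existing tensor. A minimal sketch (the variable names are made up for illustration):
import torch

src = torch.ones(3)

# Triggers the UserWarning: copy-constructing a tensor from a tensor.
copied = torch.tensor(src)

# Recommended alternatives from the warning message:
copied = src.clone().detach()                       # independent copy, no grad history
copied = src.clone().detach().requires_grad_(True)  # copy that will track gradients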

How to get probability of each class instead of one hot encoded array with one value 1 and others 0?

My Sequential CNN model is trained on 39 classes as a multi-class classifier. As for predictions, it returns a one-hot encoded array like [0,0,...1,0,...] whereas I want something like [0.012,0.022,0.067,...,0.997,0.0004,...]
Is there a way to get this? If not, what exactly should I change to get it?
The reason I want it this way is to check how close the other classes are, so if one class says 0.98 and another says 0.96 then I know I am doing something wrong, my data isn't enough, etc.
Thank you :)
My model is basically a keras.model resnet50 with following configs :
model = keras.applications.resnet.ResNet50(include_top=False, weights=None, input_tensor=None, input_shape=(64,64,1), pooling='avg', classes=39)
x = model.output
x = Dropout(0.7)(x)
num_classes = 39
predictions = Dense(num_classes, activation= 'softmax')(x)
model = Model(inputs = model.input, outputs = predictions)
optimizer = keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)
model.compile(optimizer, loss='categorical_crossentropy', metrics=['categorical_accuracy'], loss_weights=None, sample_weight_mode=None, weighted_metrics=None, target_tensors=None)
Sample input :
import cv2
img = cv2.imread(IMAGE_PATH, 0)
img = cv2.resize(img, (64,64))
img = np.reshape(img, (1,64,64,1))
predicted_class_indices = np.argmax(model.predict(img, verbose = 1))
Sample output:
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0.]], dtype=float32)
Desired output (numbers are hypothetical):
array([[0.022, 0.353, 0.0535, 0.52, 0.212, 0.822, 0.532, 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0.]], dtype=float32)
One way to do so is to remove the last activation layer (see the related issue). You can do so by using model.layers[-1].activation = None.
However, the softmax shouldn't output a one-hot vector but a probability distribution, so you might want to check how your training is going; the model may simply be extremely overconfident.
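As a quick sanity check (a sketch, not part of the original answer): model.predict already returns the full softmax vector, so you can inspect the runner-up probabilities directly instead of only taking the argmax; with default numpy print settings very small values can look like exact zeros.
import numpy as np

# `model` and `img` are assumed to be the trained model and preprocessed
# image from the question.
probs = model.predict(img, verbose=1)[0]         # shape (39,), softmax outputs

np.set_printoptions(precision=6, suppress=True)  # show small values instead of 0.
print(probs)

# Top-3 classes and their probabilities.
top3 = np.argsort(probs)[::-1][:3]
print(list(zip(top3, probs[top3])))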

Getting embedding matrix of all zeros after performing word embedding on any input data

I am trying to do word embeddings in Keras, using 'glove.6B.50d.txt'. I get correct output up to the preparation of the embedding index from the 'glove.6B.50d.txt' file.
But I always get an embedding matrix full of zeros whenever I map a word from my input to the embedding index.
Here is the code:
#here is the example sentence given as input
line="The quick brown fox jumped over the lazy dog"
line=line.split(" ")
#this is my embedding file
EMBEDDING_FILE='glove.6B.50d.txt'
embed_size = 10 # how big is each word vector
max_features = 10000 # how many unique words to use (i.e num rows in embedding vector)
maxlen = 10 # max number of words in a comment to use
tokenizer = Tokenizer(num_words=max_features,split=" ",char_level=False)
tokenizer.fit_on_texts(list(line))
list_tokenized_train = tokenizer.texts_to_sequences(line)
sequences = tokenizer.texts_to_sequences(line)
word_index = tokenizer.word_index
print('Found %s unique tokens.' % len(word_index))
X_t = pad_sequences(list_tokenized_train, maxlen=maxlen)
print(sequences)
print(word_index)
print('Shape of data tensor:', X_t.shape)
#got correct output here as
# Found 8 unique tokens.
#[[1], [2], [3], [4], [5], [6], [1], [7], [8]]
#{'the': 1, 'quick': 2, 'brown': 3, 'fox': 4, 'jumped': 5, 'over': 6, 'lazy': 7, 'dog': 8}
# Shape of data tensor: (9, 10)
#loading the embedding file to prepare embedding index matrix
embeddings_index = {}
for i in open(EMBEDDING_FILE, "rb"):
    values = i.split()
    word = values[0]
    #print(word)
    coefs = np.asarray(values[1:], dtype='float32')
    embeddings_index[word] = coefs
print('Found %s word vectors.' % len(embeddings_index))
#Found 400000 word vectors.
#making the embedding matrix
embedding_matrix = np.zeros((len(word_index) + 1, embed_size))
for word, i in word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        # words not found in embedding index will be all-zeros.
        embedding_matrix[i] = embedding_vector
When I print the embedding matrix, I get all zeros (i.e. not a single word from the input was found):
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
Also, if I print embeddings_index.get(word) for each iteration, it is unable to fetch the word and returns None.
Where am I going wrong in the code?
The embed size should be 50, not 10 (it indicates the dimensionality of the word embedding, and glove.6B.50d.txt contains 50-dimensional vectors).
The number of features should be much greater than 50 (make it close to 10,000). Restricting it to 50 means a whole lot of the vectors will be missing.
Got the problem solved today.
It seems embeddings_index.get(word) was unable to find the words because of an encoding issue: the file was opened in binary mode, so the keys of embeddings_index were bytes while word_index contains str keys.
I changed for i in open(EMBEDDING_FILE, "rb"): in the preparation of the embedding matrix to for i in open(EMBEDDING_FILE, 'r', encoding='utf-8'):
and this solved the problem.
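A minimal sketch of the corrected loading loop, combining both points above (text mode with UTF-8 encoding, and embed_size = 50 to match glove.6B.50d.txt); it reuses np, EMBEDDING_FILE and word_index from the question:
embed_size = 50  # glove.6B.50d.txt stores 50-dimensional vectors
embeddings_index = {}
with open(EMBEDDING_FILE, 'r', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]  # now a str, matching the keys of word_index
        embeddings_index[word] = np.asarray(values[1:], dtype='float32')

embedding_matrix = np.zeros((len(word_index) + 1, embed_size))
for word, i in word_index.items():
    vector = embeddings_index.get(word)
    if vector is not None:
        embedding_matrix[i] = vector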

Building a matrix using scan within Theano

I'm pretty certain this is trivial, but I haven't yet managed to quite get my head around scan. I want to iteratively build a matrix of values, m, where
m[i,j] = f(m[k,l]) for k < i, j < l
so you could think of it as a dynamic programming problem. However, I can't even generate the list [1..100] by iterating over the list [1..100] and updating the shared value as I go.
import numpy as np
import theano as T
import theano.tensor as TT

def test():
    arr = T.shared(np.zeros(100))
    def grid(idx, arr):
        return {arr: TT.set_subtensor(arr[idx], idx)}
    T.scan(
        grid,
        sequences=TT.arange(100),
        non_sequences=[arr])
    return arr

run = T.function([], outputs=test())
run()
which returns
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0.])
There are a few things here that point towards some misunderstandings. scan really can be a hard bit of Theano to wrap your head around!
Here's some updated code that does what I think you're trying to do, but I wouldn't recommend using this code at all. The basic issue is that you seem to be using a shared variable inappropriately.
import numpy as np
import theano as T
import theano.tensor as TT

def test():
    arr = T.shared(np.zeros(100))
    def grid(idx, arr):
        return {arr: TT.set_subtensor(arr[idx], idx)}
    _, updates = T.scan(
        grid,
        sequences=TT.arange(100),
        non_sequences=[arr])
    return arr, updates

outputs, updates = test()
run = T.function([], outputs=outputs, updates=updates)
print run()
print outputs.get_value()
This code is changed from the original in two ways:
The updates from the scan have to be captured (originally they were discarded) and passed to theano.function's updates parameter. Without this the shared variable won't be updated at all.
The contents of the shared variable need to be examined after the function is executed (see below).
This code prints two sets of values. The first is the output of the Theano function from when it's executed. The second is the contents of the shared variable after the Theano function has executed. The Theano function returns the shared variable so you might think that these two sets of values should be the same, but you'd be wrong! No shared variables are updated until after all of the function's output values have been computed. So it's only after the function has been executed and we look at the contents of the shared variable that we see the values we expected to see originally.
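To isolate that behaviour, here is a tiny standalone sketch (not from the original answer) of a shared variable updated by a Theano function: the call returns the old value, and the new value only becomes visible through get_value() afterwards.
import theano

counter = theano.shared(0, name='counter')

# The function returns the shared variable's value and also increments it.
step = theano.function([], counter, updates=[(counter, counter + 1)])

print(step())                 # 0: outputs are computed before the updates apply
print(counter.get_value())    # 1: the update is visible after the call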
Here's an example of implementing a dynamic programming algorithm in Theano. The algorithm is a simplified version of dynamic time warping which has a lot of similarities to edit distance.
import numpy
import theano
import theano.tensor as tt

def inner_step(j, c_ijm1, i, c_im1, x, y):
    insert_cost = tt.switch(tt.eq(j, 0), numpy.inf, c_ijm1)
    delete_cost = tt.switch(tt.eq(i, 0), numpy.inf, c_im1[j])
    match_cost = tt.switch(tt.eq(i, 0), numpy.inf, c_im1[j - 1])
    in_top_left = tt.and_(tt.eq(i, 0), tt.eq(j, 0))
    min_c = tt.min(tt.stack([insert_cost, delete_cost, match_cost]))
    c_ij = tt.abs_(x[i] - y[j]) + tt.switch(in_top_left, 0., min_c)
    return c_ij

def outer_step(i, c_im1, x, y):
    outputs, _ = theano.scan(inner_step, sequences=[tt.arange(y.shape[0])],
                             outputs_info=[tt.constant(0, dtype=theano.config.floatX)],
                             non_sequences=[i, c_im1, x, y], strict=True)
    return outputs

def main():
    x = tt.vector()
    y = tt.vector()
    outputs, _ = theano.scan(outer_step, sequences=[tt.arange(x.shape[0])],
                             outputs_info=[tt.zeros_like(y)],
                             non_sequences=[x, y], strict=True)
    f = theano.function([x, y], outputs=outputs)
    a = numpy.array([1, 2, 4, 8], dtype=theano.config.floatX)
    b = numpy.array([2, 3, 4, 7, 8, 9], dtype=theano.config.floatX)
    print a
    print b
    print f(a, b)

main()
This is highly simplified and I wouldn't recommend using it for real. In general Theano is very bad at doing dynamic programming because theano.scan is so slow in comparison to native looping. If you need to propagate gradients through a dynamic program then you may not have any choice but if you don't need gradients you should probably avoid using Theano for dynamic programming.
If you want a much more thorough implementation of DTW which gets over some of the performance hits Theano imposes by computing many comparisons in parallel (i.e. batching) then take a look here: https://github.com/danielrenshaw/TheanoBatchDTW.
