I'm looking at this implementation of SGD for PyTorch: https://pytorch.org/docs/stable/_modules/torch/optim/sgd.html#SGD
And I see some strange calculations which I don't understand.
For instance, take a look at p.data.add_(-group['lr'], d_p). It makes sense to think that there is a multiplication of the two parameters, right? (It's how SGD works, -lr * grads)
But the documentation of the function doesn't say anything about this.
And what is more confusing, although this SGD code actually works (I tested by copying the code and calling prints below the add_), I can't simply use add_ with two arguments as it does:
#this returns an error about using too many arguments
import torch
a = torch.tensor([1,2,3])
b = torch.tensor([6,10,15])
c = torch.tensor([100,100,100])
a.add_(b, c)
print(a)
What's going on here? What am I missing?
This works for scalars:
a = torch.tensor(1)
b = torch.tensor(2)
c = torch.tensor(3)
a.add_(b, c)
print(a)
tensor(7)
Or a can be a tensor:
a = torch.tensor([[1,1],[1,1]])
b = torch.tensor(2)
c = torch.tensor(3)
a.add_(b, c)
print(a)
tensor([[7, 7],
        [7, 7]])
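The output is 7 because the two-argument call is matched against the signature (Tensor other, Number alpha): one argument is treated as the tensor to add and the other as a scalar multiplier, so the result is a + alpha * other = 1 + 2 * 3 = 7. This is the same multiplication you expected in SGD, where -group['lr'] is the scalar and d_p is the tensor.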
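For reference, the current PyTorch documentation spells the multiplier as a keyword argument, add_(other, alpha=...). Below is a minimal sketch of the SGD-style update in that form. The names p, grad and lr are illustrative stand-ins for p.data, d_p and group['lr'], not code taken from the SGD source:

```python
import torch

# Illustrative values; p, grad and lr are hypothetical stand-ins for
# p.data, d_p and group['lr'] in the SGD source.
p = torch.tensor([1.0, 2.0, 3.0])
grad = torch.tensor([2.0, 4.0, 6.0])
lr = 0.5

# In-place update: p <- p + alpha * grad, with alpha = -lr
p.add_(grad, alpha=-lr)
print(p)

# The keyword form also works when both operands are tensors of equal shape;
# alpha must still be a plain number, not a tensor of that shape.
a = torch.tensor([1, 2, 3])
b = torch.tensor([6, 10, 15])
a.add_(b, alpha=100)  # a <- a + 100 * b
print(a)
```

With the keyword form there is no ambiguity about which argument is the scalar, which may be why the three-tensor call in the question is rejected.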
I am trying to code up the positional encoding in the transformers paper. In order to do so I need to do an operation similar to the following:
a = torch.arange(20).reshape(4,5)
b = a * 2
c = torch.cat([torch.stack([a_row,b_row]) for a_row, b_row in zip(a,b)])
I feel like there might be a faster way to do the above, perhaps by adding a dimension to a and b?
I would simply use the assignment operator for this:
c = torch.zeros(8, 5)
c[::2, :] = a # Index every second row, starting from 0
c[1::2, :] = b # Index every second row, starting from 1
When timing the two solutions, I used the following:
import timeit
import torch
a = torch.arange(20).reshape(4,5)
b = a * 2
suggested = timeit.timeit("c = torch.cat([torch.stack([a_row, b_row]) for a_row, b_row in zip (a, b)])",
setup="import torch; from __main__ import a, b", number=10000)
print(suggested/10000)
# 4.5105120493099096e-05
improved = timeit.timeit("c = torch.zeros(8, 5); c[::2, :] = a; c[1::2, :] = b",
setup="import torch; from __main__ import a, b", number=10000)
print(improved/10000)
# 2.1489459509029985e-05
The second approach consistently takes about half the time, even though a single iteration of either version is already very fast. Of course, you would have to test this with your actual tensor sizes, but it is the most straightforward solution I could come up with.
Can't wait to see if anyone has some nifty low-level solution for this that is even faster!
Also, keep in mind that I did not time the creation of b, assuming that the tensors you want to interweave are already given.
So turns out simple concatenation and reshaping does the trick:
c = torch.cat([a, b], dim=-1).view(-1, a.shape[-1])
When I timed it with the following, it was about 2.4x faster than @dennlinger's answer:
improved2 = timeit.timeit("c = torch.cat([a, b], dim=-1).view(-1, a.shape[-1])",
setup="import torch; from __main__ import a, b",
number=10000)
print(improved2/10000)
# 7.253780400003507e-06
print(improved / improved2)
# 2.3988091506044955
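Before comparing timings it is worth confirming that the three approaches produce the same rows in the same order. A small sketch of that check, with one assumption beyond the original snippets: the slice-assignment version allocates c with a's dtype so the results can be compared exactly. The names c_loop, c_assign and c_cat are illustrative:

```python
import torch

a = torch.arange(20).reshape(4, 5)
b = a * 2

# 1) Original approach: stack each pair of rows, then concatenate
c_loop = torch.cat([torch.stack([a_row, b_row]) for a_row, b_row in zip(a, b)])

# 2) Slice assignment into a preallocated tensor
#    (dtype=a.dtype is an addition here; torch.zeros defaults to float)
c_assign = torch.zeros(2 * a.shape[0], a.shape[1], dtype=a.dtype)
c_assign[::2, :] = a   # even rows come from a
c_assign[1::2, :] = b  # odd rows come from b

# 3) Concatenate along the last dimension, then reinterpret the rows
c_cat = torch.cat([a, b], dim=-1).view(-1, a.shape[-1])

print(torch.equal(c_loop, c_assign), torch.equal(c_loop, c_cat))
```

The third version works because each concatenated row holds a row of a followed by the matching row of b, so splitting every row in half yields the interleaved order.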
I am using Python at the moment and I have a function that I need to multiply against itself for different constants.
e.g. If I have f(x,y)= x^2y+a, where 'a' is some constant (possibly list of constants).
If 'a' is a list (of unknown size as it depends on the input), then if we say a = [1,3,7] the operation I want to do is
(x^2y+1)*(x^2y+3)*(x^2y+7)
but generalised to n elements in 'a'. Is there an easy way to do this in Python as I can't think of a decent way around this problem? If the size in 'a' was fixed then it would seem much easier as I could just define the functions separately and then multiply them together in a new function, but since the size isn't fixed this approach isn't suitable. Does anyone have a way around this?
You can use NumPy for this; it's fairly easy to get into.
import numpy as np
a = np.array([1,3,7])
x = 10
y = 0.2
print(x ** (2*y) + a)
print(np.sum(x**(2*y)+a))
Output:
[3.51188643 5.51188643 9.51188643]
18.53565929452874
I haven't really got much for it to be honest, I'm still trying to figure out how to get the functions to not overlap.
from scipy import integrate
a = [1, 3, 7]
for i in range(0, len(a) - 1):
    def f(x, y):
        return (x**2)*y + a[i]
    def g(x, y):
        return (x**2)*y + a[i+1]
    def h(x, y):
        return f(x, y)*g(x, y)
f1 = lambda y, x: h(x, y)
integrate.dblquad(f1, 0, 2, lambda x: 1, lambda x: 10)
I should have clarified that the end result can't be in floats as it needs to be integrated afterwards using dblquad.
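One way to keep the result a function, so it can still be integrated, is to build the product inside a closure. This is only a sketch under two assumptions: x^2y is read as (x**2)*y, as in the attempt above, and make_product is a hypothetical helper name, not something from the original posts:

```python
from math import prod  # available from Python 3.8

def make_product(a):
    """Return a function p(x, y) = product over c in a of (x**2 * y + c)."""
    constants = list(a)  # copy, so later changes to `a` do not affect p

    def product(x, y):
        base = x**2 * y
        return prod(base + c for c in constants)

    return product

p = make_product([1, 3, 7])
print(p(1, 1))    # (1+1) * (1+3) * (1+7)
print(p(2, 0.5))  # (2+1) * (2+3) * (2+7)
```

Because p is an ordinary two-argument function, it can be wrapped the same way as in the attempt above, for example lambda y, x: p(x, y), and passed to integrate.dblquad. The list can have any length, since the loop over the constants happens at call time.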
Just for practice, I am using nested lists (for example, [[1, 0], [0, 1]] is the 2*2 identity matrix) as matrices. I am trying to compute the determinant by reducing the matrix to upper triangular form and then multiplying its diagonal entries. To do this:
"""adds two matrices"""
def add(A, B):
    S = []
    for i in range(len(A)):
        row = []
        for j in range(len(A[0])):
            row.append(A[i][j] + B[i][j])
        S.append(row)
    return S
"""scalar multiplication of matrix with n"""
def scale(n, A):
    return [[(n)*x for x in row] for row in A]
def detr(M):
    Mi = M
    #the loops below are supposed to convert Mi
    #to upper triangular form:
    for i in range(len(Mi)):
        for j in range(len(Mi)):
            if j>i:
                k = -(Mi[j][i])/(Mi[i][i])
                Mi[j] = add( scale(k, [Mi[i]]), [Mi[j]] )[0]
    #multiplies diagonal entries of Mi:
    k = 1
    for i in range(len(Mi)):
        k = k*Mi[i][i]
    return k
Here, you can see that I have set Mi equal to M (the argument) and then operated on Mi to bring it to upper triangular form. So M is supposed to stay unmodified. But after calling detr(A), print(A) prints the upper triangular matrix. I tried:
setting X = M, then Mi = X
defining kill(M): return M and then setting Mi = kill(M)
But these approaches are not working. This was causing some problems as I was trying to use detr(M) in another function, problems which I was able to bypass, but why is this happening? What is the compiler doing here, why was M modified even though I operated only on Mi?
(I am using Spyder 3.3.2, Python 3.7.1)
(I am sorry if this question is silly, but I have only just started learning Python and am new to coding in general. This question means a lot to me because I still don't have a deep understanding of the language.)
See python documentation about assignment:
https://docs.python.org/3/library/copy.html
Assignment statements in Python do not copy objects, they create bindings between a target and an object. For collections that are mutable or contain mutable items, a copy is sometimes needed so one can change one copy without changing the other.
You need to import copy and then use Mi = copy.deepcopy(M)
See also
How to deep copy a list?
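To make the difference concrete, here is a small sketch comparing plain assignment, a shallow copy and a deep copy on a nested list. The variable names alias, shallow and deep are illustrative:

```python
import copy

M = [[1, 0], [0, 1]]

alias = M                # same list object, nothing is copied
shallow = list(M)        # new outer list, but the row lists are shared
deep = copy.deepcopy(M)  # new outer list and new row lists

alias[0] = [5, 5]        # replacing a row through the alias changes M
shallow[1][0] = 9        # mutating a shared row also changes M
deep[1][1] = 7           # M is unaffected

print(alias is M)  # True
print(M)           # [[5, 5], [9, 1]]
print(deep)        # [[1, 0], [0, 7]]
```

This also explains why setting X = M and then Mi = X, or returning M from a helper function, does not help: each of those only creates another name for the same list.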
Let's say we have a tensor of size B x C x W x H (as common for batches of images), and we want to reshape it to B x M where M = C*W*H. Is there a built in way to do so without explicitly mentioning B?
If we know B in advance, we can do the following, even without explicitly knowing any of C, W and H:
a = torch.randn(20,3,512,512)
b = a.reshape((20, -1)) #we can use -1 to infer the dimension `M`
But can we also do so without knowing B?
(I know we could obviously find B using B = a.shape[0], but my question is whether it is possible without knowing B either.)
The only other way would be to calculate the second dimension and use -1 for the first.
a = torch.randn(20,3,512,512)
print(a.shape)
b = a.reshape((20, -1))
print(b.shape)
b = a.reshape((-1, 786432)) # 3*512*512
print(b.shape)
torch.Size([20, 3, 512, 512])
torch.Size([20, 786432])
torch.Size([20, 786432])
This is because reshape accepts at most one -1 among its dimensions.
In principle you could make it a generic function working with any batch size by simply using first dimension of input, e.g.:
a = torch.randn(20, 3, 512, 512)
b = a.reshape((a.shape[0], -1))
You can wrap it in a function and just call it whenever necessary.
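A sketch of such a wrapper follows; flatten_batch is a hypothetical helper name. For comparison it also shows torch.flatten with start_dim=1, which collapses everything after the first dimension and so needs neither B nor M to be written out. The tensor here is smaller than in the question only to keep the example cheap:

```python
import torch

def flatten_batch(x):
    # Hypothetical helper: keep the first (batch) dimension, merge the rest
    return x.reshape(x.shape[0], -1)

a = torch.randn(20, 3, 8, 8)

b = flatten_batch(a)
c = torch.flatten(a, start_dim=1)  # same result without reading a.shape

print(b.shape)
print(c.shape)
print(torch.equal(b, c))
```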
I have the following code:
a = torch.randint(0,10,[3,3,3,3])
b = torch.LongTensor([1,1,1,1])
I have a multi-dimensional index b and want to use it to select a single cell in a. If b wasn't a tensor, I could do:
a[1,1,1,1]
Which returns the correct cell, but:
a[b]
Doesn't work, because it just selects a[1] four times.
How can I do this? Thanks
A more elegant (and simpler) solution might be to simply cast b as a tuple:
a[tuple(b)]
Out[10]: tensor(5.)
I was curious to see how this works with "regular" numpy, and found a related article explaining this quite well here.
You can split b into 4 using chunk, and then use the chunked b to index the specific element you want:
>> a = torch.arange(3*3*3*3).view(3,3,3,3)
>> b = torch.LongTensor([[1,1,1,1], [2,2,2,2], [0, 0, 0, 0]]).t()
>> a[b.chunk(chunks=4, dim=0)] # here's the trick!
Out[24]: tensor([[40, 80, 0]])
What's nice about it is that it generalizes easily to any number of dimensions of a: you just need to make the number of chunks equal to the number of dimensions of a.
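The two answers can be put side by side in one short sketch. It uses a tensor with known contents so the selected values are easy to check, and it uses unbind as an alternative to chunk for the multi-index case; that choice and the names single, several and idx are assumptions, not part of the original answers:

```python
import torch

# Element at position (i, j, k, l) has value 27*i + 9*j + 3*k + l
a = torch.arange(3 * 3 * 3 * 3).view(3, 3, 3, 3)

# Single multi-dimensional index: convert the index tensor to a tuple
b = torch.LongTensor([1, 1, 1, 1])
single = a[tuple(b)]  # equivalent to a[1, 1, 1, 1]
print(single)

# Several indices at once, one per row of idx
idx = torch.LongTensor([[1, 1, 1, 1], [2, 2, 2, 2], [0, 0, 0, 0]])
several = a[idx.unbind(dim=1)]  # one index tensor per dimension of a
print(several)
```

Unlike the chunk version, which returns a result of shape (1, 3), unbind removes the split dimension, so several has shape (3,).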