I am trying to code up the positional encoding from the Transformer paper ("Attention Is All You Need"). In order to do so, I need an operation similar to the following:
a = torch.arange(20).reshape(4,5)
b = a * 2
c = torch.cat([torch.stack([a_row,b_row]) for a_row, b_row in zip(a,b)])
I feel like there might be a faster way to do the above, perhaps by adding a dimension to a and b?
I would simply use slice assignment for this:
c = torch.zeros(8, 5)
c[::2, :] = a # Index every second row, starting from 0
c[1::2, :] = b # Index every second row, starting from 1
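A shape- and dtype-generic variant of the same idea (a small sketch of mine, not part of the original timing below; it assumes a and b share a shape):

import torch

a = torch.arange(20).reshape(4, 5)
b = a * 2
# torch.zeros(8, 5) above defaults to float32; matching a's dtype keeps c integer
c = torch.empty(2 * a.size(0), a.size(1), dtype=a.dtype)
c[0::2] = a  # even rows from a
c[1::2] = b  # odd rows from b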
When timing the two solutions, I used the following:
import timeit
import torch
a = torch.arange(20).reshape(4,5)
b = a * 2
suggested = timeit.timeit("c = torch.cat([torch.stack([a_row, b_row]) for a_row, b_row in zip(a, b)])",
                          setup="import torch; from __main__ import a, b", number=10000)
print(suggested/10000)
# 4.5105120493099096e-05
improved = timeit.timeit("c = torch.zeros(8, 5); c[::2, :] = a; c[1::2, :] = b",
                         setup="import torch; from __main__ import a, b", number=10000)
print(improved/10000)
# 2.1489459509029985e-05
The second approach consistently takes less time (approximately half), even though a single iteration is still very fast. Of course, you would have to test this for your actual tensor sizes, but it is the most straightforward solution I could come up with.
Can't wait to see if anyone has some nifty low-level solution for this that is even faster!
Also, keep in mind that I did not time the creation of b, assuming that the tensors you want to interweave are already given.
So it turns out simple concatenation and reshaping does the trick:
c = torch.cat([a, b], dim=-1).view(-1, a.shape[-1])
When I timed it with the following, it was about 2.3x faster than @dennlinger's answer:
improved2 = timeit.timeit("c = torch.cat([a, b], dim=-1).view(-1, a.shape[-1])",
                          setup="import torch; from __main__ import a, b", number=10000)
print(improved2/10000)
# 7.253780400003507e-06
print(improved / improved2)
# 2.3988091506044955
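For completeness, an equivalent formulation with torch.stack along a new dimension (a sketch I have not timed here; it assumes a and b share a shape):

import torch

a = torch.arange(20).reshape(4, 5)
b = a * 2
# stacking along dim=1 gives shape (4, 2, 5); flattening the first
# two dimensions interleaves the rows of a and b
c = torch.stack([a, b], dim=1).reshape(-1, a.shape[-1])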
In Get intersecting rows across two 2D numpy arrays, intersecting rows are obtained with the function np.intersect1d. So I changed the function to use np.setdiff1d to get the set difference instead, but it doesn't work properly. The following is the code:
import numpy as np

def set_diff2d(A, B):
    nrows, ncols = A.shape
    dtype = {'names': ['f{}'.format(i) for i in range(ncols)],
             'formats': ncols * [A.dtype]}
    C = np.setdiff1d(A.view(dtype), B.view(dtype))
    return C.view(A.dtype).reshape(-1, ncols)
The following data is used for checking the issue:
min_dis = 400
Xt = np.arange(50, 3950, min_dis)
Yt = np.arange(50, 3950, min_dis)
Xt, Yt = np.meshgrid(Xt, Yt)
Xt[::2] += min_dis // 2  # integer division: an in-place float add would fail on this integer array
# This is the superset
turbs_possible_locs = np.vstack([Xt.flatten(), Yt.flatten()]).T
# This is the subset
subset = turbs_possible_locs[np.random.choice(turbs_possible_locs.shape[0], 50, replace=False)]
diffs = set_diff2d(turbs_possible_locs, subset)
diffs is supposed to have shape 50x2, but it does not.
OK, so to fix your issue, try the following tweak:

def set_diff2d(A, B):
    nrows, ncols = A.shape
    dtype = {'names': ['f{}'.format(i) for i in range(ncols)],
             'formats': ncols * [A.dtype]}
    C = np.setdiff1d(A.copy().view(dtype), B.copy().view(dtype))
    return C  # a structured array; C.view(A.dtype).reshape(-1, ncols) recovers plain rows
The problem was that A, after .view(...) was applied, was broken in half: it ended up with 2 tuple columns instead of 1 like B. In other words, applying the dtype is what collapses the 2 plain columns into a single tuple column, which is why the set operation can be done in 1D in the first place.
Quoting the documentation:

"a.view(some_dtype) or a.view(dtype=some_dtype) constructs a view of the array's memory with a different data-type. This can cause a reinterpretation of the bytes of memory."

Source: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.view.html
I think this "reinterpretation" is exactly what happened, hence for the sake of simplicity I would just .copy() the array.
NB: one thing I can't square, however: it is always A which gets 'broken'; whether it is assigned to a variable first or used inline, B is always fine...
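A likely mechanism (my guess, not part of the original answer): turbs_possible_locs is the transpose of a vstack result, so it is not C-contiguous; the structured-dtype view then reinterprets the column-major memory, while .copy() returns a C-contiguous array whose rows view cleanly. A minimal sketch:

import numpy as np

A = np.vstack([np.arange(4), np.arange(4) * 10]).T  # a transpose: not C-contiguous
print(A.flags['C_CONTIGUOUS'])  # False: memory holds all x's first, then all y's

dtype = {'names': ['f0', 'f1'], 'formats': [A.dtype, A.dtype]}
# On recent NumPy, A.view(dtype) raises because the last axis is not contiguous;
# older versions could silently reinterpret the memory and pair the wrong values.
B = A.copy()                # C-contiguous: each (x, y) row is laid out together
print(B.view(dtype).shape)  # (4, 1): one record per row, as intended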
I'm looking at this implementation of SGD for PyTorch: https://pytorch.org/docs/stable/_modules/torch/optim/sgd.html#SGD
And I see some strange calculations which I don't understand.
For instance, take a look at p.data.add_(-group['lr'], d_p). It makes sense to think that the two arguments are multiplied together, right? (That is how SGD works: -lr * grads.)
But the documentation of the function doesn't say anything about this.
And what is more confusing: although this SGD code actually works (I tested it by copying the code and adding prints below the add_), I can't simply call add_ with two arguments the way it does:
# this returns an error about using too many arguments
import torch
a = torch.tensor([1,2,3])
b = torch.tensor([6,10,15])
c = torch.tensor([100,100,100])
a.add_(b, c)
print(a)
What's going on here? What am I missing?
This works for scalars:

import torch as t

a = t.tensor(1)
b = t.tensor(2)
c = t.tensor(3)
a.add_(b, c)  # matches the deprecated add_(Number alpha, Tensor other) overload: a += b * c
print(a)
# tensor(7)
Or a can be a (non-scalar) tensor:

a = t.tensor([[1, 1], [1, 1]])
b = t.tensor(2)
c = t.tensor(3)
a.add_(b, c)  # broadcasts: every element becomes 1 + 2 * 3
print(a)
# tensor([[7, 7],
#         [7, 7]])
The output is 7 because the two positional arguments hit the deprecated overload equivalent to today's add_(Tensor other, Number alpha): it computes a += other * alpha, i.e. 1 + 2 * 3 = 7. That is exactly what p.data.add_(-group['lr'], d_p) does in the SGD source: p.data += -lr * d_p. Your example fails because a 1-D tensor like torch.tensor([6, 10, 15]) cannot be interpreted as the Number argument.
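For reference, a sketch of the same update written against the current API, where alpha is a keyword argument (assuming a reasonably recent PyTorch):

import torch

lr = 0.1
p = torch.tensor([1.0, 2.0, 3.0])
d_p = torch.tensor([6.0, 10.0, 15.0])
p.add_(d_p, alpha=-lr)  # p += (-lr) * d_p, i.e. the SGD step
print(p)
# tensor([0.4000, 1.0000, 1.5000])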
I am trying to find the number of solutions of inequality
c > (a + b^2 - 1) / (a - 1)
subject to the constraints 2 <= a <= A, 1 <= b <= B, 1 <= c <= C.
The approach I have used so far is a nested loop, with the outer loop over a and the inner loop over b. I am trying to find ways to optimize my approach, since A can be as large as 10^9.
Any suggestions would be appreciated.
Use numpy for vectorized operations. You can do something like this:

import numpy as np

a = np.arange(2, A + 1)  # 2 <= a <= A
b = np.arange(1, B + 1)  # 1 <= b <= B
# build every (a, b) pair: repeat each a once per b, tile b across all a
a1 = np.repeat(a, len(b))
b1 = np.tile(b, len(a))
rhs = (a1 + b1**2 - 1) / (a1 - 1)
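To finish the count (my addition, not part of the original answer), assume c is an integer: for each (a, b) pair the valid c are the integers in (rhs, C], of which there are max(0, C - floor(rhs)). A sketch with small, hypothetical bounds; with A up to 10^9 you would still need to process a in batches or exploit the structure of the expression rather than materialize every pair:

import numpy as np

A, B, C = 100, 50, 1000  # small example bounds, chosen for illustration
a = np.arange(2, A + 1)
b = np.arange(1, B + 1)
a1 = np.repeat(a, len(b))
b1 = np.tile(b, len(a))
rhs = (a1 + b1**2 - 1) / (a1 - 1)
# integers c with rhs < c <= C: C - floor(rhs), clamped at zero
# (beware float rounding for very large a and b; exact integer math may be needed)
counts = np.maximum(C - np.floor(rhs).astype(np.int64), 0)
print(int(counts.sum()))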
Just for practice, I am using nested lists (for example, [[1, 0], [0, 1]] is the 2*2 identity matrix) as matrices. I am trying to compute the determinant by reducing the matrix to upper triangular form and then multiplying its diagonal entries. To do this:
"""adds two matrices"""
def add(A, B):
S = []
for i in range(len(A)):
row = []
for j in range(len(A[0])):
row.append(A[i][j] + B[i][j])
S.append(row)
return S
"""scalar multiplication of matrix with n"""
def scale(n, A):
return [[(n)*x for x in row] for row in A]
def detr(M):
Mi = M
#the loops below are supossed to convert Mi
#to upper triangular form:
for i in range(len(Mi)):
for j in range(len(Mi)):
if j>i:
k = -(Mi[j][i])/(Mi[i][i])
Mi[j] = add( scale(k, [Mi[i]]), [Mi[j]] )[0]
#multiplies diagonal entries of Mi:
k = 1
for i in range(len(Mi)):
k = k*Mi[i][i]
return k
Here, you can see that I have set Mi equal to the argument M and then operated on Mi to take it to upper triangular form. So M is supposed to stay unmodified. But after calling detr(A), print(A) prints the upper triangular matrix. I tried:
setting X = M, then Mi = X
defining kill(M): return M and then setting Mi = kill(M)
But these approaches do not work. This was causing problems as I was trying to use detr(M) in another function, problems which I was able to bypass, but why is this happening? What is the interpreter doing here? Why was M modified even though I operated only on Mi?
(I am using Spyder 3.3.2, Python 3.7.1)
(I am sorry if this question is silly, but I have only started learning Python and am new to coding in general. This question means a lot to me because I still don't have a deep understanding of this language.)
See the Python documentation on copying:
https://docs.python.org/3/library/copy.html
Assignment statements in Python do not copy objects, they create bindings between a target and an object. For collections that are mutable or contain mutable items, a copy is sometimes needed so one can change one copy without changing the other.
You need to import copy and then use Mi = copy.deepcopy(M).
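A minimal sketch of the difference (my example, not from the original answer):

import copy

M = [[2, 1], [4, 3]]
Mi = M                   # binds a second name to the same list object
Mi[0][0] = 99
print(M[0][0])           # 99: M sees the change

M = [[2, 1], [4, 3]]
Mi = copy.deepcopy(M)    # independent copy, including the inner row lists
Mi[0][0] = 99
print(M[0][0])           # 2: M is untouched

In this particular detr, a shallow copy such as Mi = list(M) would also work, since the rows are replaced rather than mutated in place, but deepcopy is the safe general choice.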
See also
How to deep copy a list?
I'm doing data analysis that involves minimizing the least-square-error between a set of points and a corresponding set of orthogonal functions. In other words, I'm taking a set of y-values and a set of functions, and trying to zero in on the x-value that gets all of the functions closest to their corresponding y-value. Everything is being done in a 'data_set' class. The functions that I'm comparing to are all stored in one list, and I'm using a class method to calculate the total lsq-error for all of them:
self.fits = [np.poly1d(np.polyfit(self.x_data, self.y_data[n], 10)) for n in range(self.num_points)]

def error(self, x, y_set):
    arr = [(y_set[n] - self.fits[n](x))**2 for n in range(self.num_points)]
    return np.sum(arr)
This was fine when I had significantly more time than data, but now I'm taking thousands of x-values, each with a thousand y-values, and that for loop is unacceptably slow. I've been trying to use np.vectorize:
# global scope
def func(f, x):
    return f(x)

vfunc = np.vectorize(func, excluded=['x'])

…
…

# within the data_set class
def error(self, x, y_set):
    arr = (y_set - vfunc(self.fits, x))**2
    return np.sum(arr)
func(self.fits[n], x) works fine as long as n is valid, and as far as I can tell from the docs, vfunc(self.fits, x) should be equivalent to
[self.fits[n](x) for n in range(self.num_points)]
but instead it throws:
ValueError: cannot copy sequence with size 10 to array axis with dimension 11
10 is the degree of the polynomial fit, and 11 is (by definition) the number of terms in it, but I have no idea why they're showing up here. If I change the fit order, the error message reflects the change. It seems like np.vectorize is taking each element of self.fits as a list rather than a np.poly1d function.
Anyway, if someone could either help me understand np.vectorize better, or suggest another way to eliminate that loop, that would be swell.
As the functions in question all have a very similar structure, we can "manually" vectorize once we have extracted the poly coefficients. In fact, the function is then a quite simple one-liner, eval_many below:
import numpy as np

def poly_vec(list_of_polys):
    O = max(p.order for p in list_of_polys) + 1
    C = np.zeros((len(list_of_polys), O))
    for p, c in zip(list_of_polys, C):
        # right-align each coefficient vector (poly1d stores highest degree first)
        c[len(c) - p.order - 1:] = p.coeffs
    return C

def eval_many(x, C):
    # '@' is matrix multiplication; np.vander builds the powers of x
    return C @ np.vander(x, C.shape[1]).T

# make an example
list_of_polys = [np.poly1d(v) for v in np.random.random((1000, 11))]
x = np.random.random((2000,))

# put all coeffs in one master matrix
C = poly_vec(list_of_polys)

# test
assert np.allclose(eval_many(x, C), [p(x) for p in list_of_polys])

from timeit import timeit
print('vectorized', timeit(lambda: eval_many(x, C), number=100) * 10)              # ms per call
print('loopy     ', timeit(lambda: [p(x) for p in list_of_polys], number=10) * 100)  # ms per call
Sample run:
vectorized 6.817315469961613
loopy 56.35076989419758
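Applied back to the question's error method, a hedged sketch (it assumes self.C = poly_vec(self.fits) has been computed once, e.g. in __init__, and that x is a scalar):

def error(self, x, y_set):
    # evaluate every fit at the single x in one matrix product
    vals = eval_many(np.atleast_1d(x), self.C)[:, 0]
    return np.sum((y_set - vals) ** 2)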