I can use the vectors data, row_ind and col_ind to create a sparse matrix with sparse.csr_matrix as follows:
sparse.csr_matrix((data, (row_ind, col_ind)), shape=(M, N))
However, given a sparse matrix A as input, how do I extract the data, row_ind and col_ind vectors?
Thanks in advance
Just figured it out. The sparse matrix has to be created with sparse.dok_matrix(), and then the values can be extracted with the method values():
https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.dok_matrix.html
It does not work with sparse.csr_matrix() and sparse.csc_matrix(), since the method values() is not available for them.
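For a matrix that is already in CSR or CSC format, another route that may be simpler is to convert it to COO format, which stores the three vectors directly. The following is a minimal sketch under that assumption; the example triplets and the names A_coo, rows, cols and vals are made up for illustration.

```python
import numpy as np
from scipy import sparse

# Small example matrix built from hypothetical triplets.
data = np.array([1.0, 2.0, 3.0])
row_ind = np.array([0, 1, 2])
col_ind = np.array([2, 0, 1])
A = sparse.csr_matrix((data, (row_ind, col_ind)), shape=(3, 3))

# COO format exposes the triplets as attributes.
A_coo = A.tocoo()
rows, cols, vals = A_coo.row, A_coo.col, A_coo.data

# scipy.sparse.find returns the same information for the nonzero entries.
rows2, cols2, vals2 = sparse.find(A)
```

Passing (vals, (rows, cols)) back to sparse.csr_matrix with the same shape should reproduce A.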
I want a function similar to https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.circulant.html to create a circulant matrix using PyTorch. I need this as part of my deep learning model, in order to reduce over-parametrization in some of my fully connected layers, as suggested in https://arxiv.org/abs/1907.08448 (Fig. 3).
The input of the function should be a 1D torch tensor, and the output should be the 2D circulant matrix.
You can make use of unfold to extract sliding windows. But to get the correct order you need to flip (later unflip) the tensors, and first concatenate the flipped tensor to itself.
circ=lambda v:torch.cat([f:=v.flip(0),f[:-1]]).unfold(0,len(v),1).flip(0)
Here is a generic function for PyTorch tensors that builds the circulant matrix along one dimension. It is based on unfold and works for a 1D input (giving a 2D circulant matrix) as well as for higher-dimensional tensors.
import torch

def circulant(tensor, dim):
    """Get a circulant version of the tensor along the {dim} dimension.

    The additional axis is appended as the last dimension.
    E.g. tensor=[0,1,2], dim=0 --> [[0,1,2],[2,0,1],[1,2,0]]
    """
    S = tensor.shape[dim]
    # Flip along dim, then append the first S-1 flipped entries so that
    # every cyclic shift appears as a contiguous window of length S.
    flipped = tensor.flip((dim,))
    tmp = torch.cat([flipped, torch.narrow(flipped, dim=dim, start=0, length=S - 1)], dim=dim)
    # Extract the sliding windows and undo the flip inside each window.
    return tmp.unfold(dim, S, 1).flip((-1,))
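Essentially, this is a PyTorch counterpart of scipy.linalg.circulant that also works for multi-dimensional tensors. Note that for a 1D input it returns the transpose of SciPy's result: here the input is the first row, whereas in SciPy it is the first column.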
Also a similar question: Create array/tensor of cycle shifted arrays
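As a cross-check on the unfold-based answers, the same matrix can be sketched with plain index arithmetic. The helper name circulant_by_index is hypothetical; it assumes a 1D input and follows the scipy.linalg.circulant layout, where the input is the first column and C[i, j] = v[(i - j) % n].

```python
import torch

def circulant_by_index(v):
    """Hypothetical helper: C[i, j] = v[(i - j) % n] for a 1D tensor v."""
    n = v.shape[0]
    i = torch.arange(n, device=v.device).unsqueeze(1)  # row indices, shape (n, 1)
    j = torch.arange(n, device=v.device).unsqueeze(0)  # column indices, shape (1, n)
    # Broadcasting gives an (n, n) index grid; % wraps negative values around.
    return v[(i - j) % n]

v = torch.tensor([0, 1, 2])
C = circulant_by_index(v)
# C is [[0, 2, 1], [1, 0, 2], [2, 1, 0]]
```

This materializes an n-by-n index tensor, so it is meant for checking results rather than as a faster replacement. Transposing C gives the row-wise layout shown in the docstring example above.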
I have a list of indices and values. I want to create a sparse tensor of size 30000 from these indices and values, as follows.
indices = torch.LongTensor([1,3,4,6])
values = torch.FloatTensor([1,1,1,1])
So, I want to build a 30k dimensional sparse tensor in which the indices [1,3,4,6] are ones and the rest are zeros. How can I do that?
I want to store sequences of such sparse tensors efficiently.
In general the indices tensor needs to have shape (sparse_dim, nnz) where nnz is the number of non-zero entries and sparse_dim is the number of dimensions for your sparse tensor.
In your case nnz = 4 and sparse_dim = 1, since your desired tensor is 1D. All we need to do to make your indices work is insert a singleton dimension at the front of indices, giving it shape (1, 4).
t = torch.sparse_coo_tensor(indices.unsqueeze(0), values, (30000,))
or equivalently, with the legacy constructor (deprecated in recent PyTorch releases in favor of torch.sparse_coo_tensor)
t = torch.sparse.FloatTensor(indices.unsqueeze(0), values, (30000,))
Keep in mind that only a limited number of operations are supported on sparse tensors. To convert a tensor back to its dense (inefficient) representation, you can use the to_dense method:
t_dense = t.to_dense()
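For the part of the question about storing a sequence of such tensors, one possible approach is a single 2D sparse tensor with one row per sequence element, so that indices has shape (2, nnz). The sketch below assumes every stored value is 1; the example sequence and the names rows, cols and batch are hypothetical.

```python
import torch

# Hypothetical sequence: each entry lists the active indices of one 30000-dim vector.
sequence = [[1, 3, 4, 6], [0, 2], [29999]]
size = 30000

# Row index = position in the sequence, column index = active position.
rows = torch.tensor([r for r, idx in enumerate(sequence) for _ in idx])
cols = torch.tensor([c for idx in sequence for c in idx])
values = torch.ones(cols.shape[0])

# indices has shape (sparse_dim, nnz) = (2, 7)
batch = torch.sparse_coo_tensor(torch.stack([rows, cols]), values, (len(sequence), size))
dense = batch.to_dense()
```

Only the 7 nonzero entries and their coordinates are stored, instead of 3 x 30000 dense values.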
I have a model that represents a collection of documents in multidimensional vector space. So, for example, for 100k documents, my model represents them in the form of 300 dimensional vectors. So, finally, I get a matrix of size [100K, 300]. For retrieving those documents according to relevance to the given query, I do matrix multiplication. For example, I represent a given query as a [300, 1]. Then I get the cosine similarity scores using matrix multiplication as follows :
[100K, 300]*[300, 1] = [100K, 1].
Now, how can I retrieve the top 1000 documents with the highest cosine similarity from this collection? The trivial way would be to sort by cosine similarity and grab the first 1000 docs. Is there a function in PyTorch that retrieves the documents this way?
In other words, how can I get the indices of the highest 1000 values from a 1D torch tensor?
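Once you have the similarity scores from the dot product, you can get the top 1000 indices as follows. Note that argsort sorts in ascending order by default, so descending=True is needed to put the highest scores first.
# sims is assumed to be 1D; if it has shape [100K, 1], use sims.squeeze(1) first
top_indices = torch.argsort(sims, descending=True)[:1000]
# similarity scores of the top 1000 documents
similar_docs = sims[top_indices]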
I think you are looking for torch.topk. It returns both the values and the indices of the k largest elements.
For example:
x = torch.arange(100).view(-1,1)
x.shape
torch.Size([100, 1])
value, indices = x.topk(k=10, dim=0)
value
tensor([[99],
[98],
[97],
[96],
[95],
[94],
[93],
[92],
[91],
[90]])
indices
tensor([[99],
[98],
[97],
[96],
[95],
[94],
[93],
[92],
[91],
[90]])
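Applied to the 1D case from the question, a minimal sketch could look like this. The random sims tensor is only a stand-in for the real similarity scores, and the sizes are taken from the question.

```python
import torch

torch.manual_seed(0)
sims = torch.rand(100_000)  # stand-in for the 1D similarity scores

# topk returns the k largest values, sorted from largest to smallest,
# together with their positions in sims.
top_values, top_indices = torch.topk(sims, k=1000)
```

Here top_indices can be used to look up the corresponding documents, and sims[top_indices] equals top_values.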
I have an array as below.
X1=np.array([[0,0],[0,0]])
X1[:,0].shape gives me (2,).
How do I convert X1[:,0] to a shape of (2,1)?
Thanks for asking. X1 is a two-by-two matrix, and X1[:,0] selects its first column as a 1D array of shape (2,). You cannot change the shape of that column inside X1 itself, so create a new array from the slice and reshape it, like this: new_x = X1[:,0].reshape(2,1). Note that reshape returns a new array rather than modifying the original, so the result has to be assigned. I hope this works.
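A few equivalent ways of getting a (2, 1) column from X1 are sketched below; the variable names a, b, c and d are only for illustration.

```python
import numpy as np

X1 = np.array([[0, 0], [0, 0]])

a = X1[:, 0].reshape(-1, 1)  # -1 lets NumPy infer the number of rows
b = X1[:, 0][:, np.newaxis]  # add a trailing axis of length 1
c = X1[:, [0]]               # indexing with a list keeps the column axis
d = X1[:, 0:1]               # a slice also keeps the column axis
```

All four results have shape (2, 1).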
In PyTorch, I have many (on the order of a hundred thousand) 300-dimensional vectors, which I think I should load into a matrix. I want to sort them by their cosine similarity with another vector and extract the top 1000. I want to avoid a for loop, as it is time consuming, so I am looking for an efficient solution.
You can use the torch.nn.functional.cosine_similarity function to compute the cosine similarity, and torch.argsort with descending=True to extract the top 1000.
Here is an example:
import torch
import torch.nn.functional as F

x = torch.rand(10000,300)
y = torch.rand(1,300)
# cosine similarity of each row of x with y, shape (10000,)
sim = F.cosine_similarity(x,y)
# argsort is ascending by default, so sort in descending order to get the most similar first
index_sorted = torch.argsort(sim, descending=True)
top_1000 = index_sorted[:1000]
Please note the shape of y, and don't forget to reshape before calling the similarity function. Also note that argsort only returns indices, not the vectors themselves. To access the vectors, just write x[top_1000], which will return a matrix of shape (1000,300).
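An alternative sketch of the same retrieval normalizes the vectors once, so that a single matrix-vector product yields the cosine similarities, and then uses torch.topk instead of a full sort. The names docs, query and scores are hypothetical, and the sizes are only an example.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
docs = torch.rand(10000, 300)  # one 300-dim vector per document
query = torch.rand(300)

# After L2 normalization, the dot product equals the cosine similarity.
docs_n = F.normalize(docs, dim=1)
query_n = F.normalize(query, dim=0)
scores = docs_n @ query_n  # shape (10000,)

# Indices and scores of the 1000 most similar documents, best first.
top_scores, top_idx = torch.topk(scores, k=1000)
top_docs = docs[top_idx]  # shape (1000, 300)
```

If the document matrix is fixed, docs_n can be computed once and reused for every query.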