Numpy matrix addition vs ndarrays, convenient oneliner - python-3.x

How does numpy's matrix class work? I understand it will likely be removed in the future, so I am trying to understand how it works so I can do the same with ndarrays.
>>> x=np.matrix([[1,1,1],[2,2,2],[3,3,3]])
>>> x[:,0] + x[0,:]
matrix([[2, 2, 2],
        [3, 3, 3],
        [4, 4, 4]])
Seems like a row of ones got added to every row.
>>> x=np.matrix([[1,2,3],[1,2,3],[1,2,3]])
>>> x[0,:] + x[:,0]
matrix([[2, 3, 4],
        [2, 3, 4],
        [2, 3, 4]])
Now it seems like a column of ones got added to every column. What it does with the identity is even weirder:
>>> x=np.matrix([[1,0,0],[0,1,0],[0,0,1]])
>>> x[0,:] + x[:,0]
matrix([[2, 1, 1],
        [1, 0, 0],
        [1, 0, 0]])
EDIT:
It seems that if you take an (N,1) shape matrix and add it to a (1,N) shape matrix, then one of these is replicated to form an (N,N) matrix and the other is added to every row or column of this new matrix. It seems to be a convenience restricted to vectors of the right sizes. A nice use case was networkx's implementation of Floyd-Warshall.
Is there an equivalently convenient one-liner for this using standard numpy ndarrays?
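For comparison, plain ndarrays reproduce the same replication through broadcasting once one operand is a column and the other a row; a minimal sketch with the same data, reshaping the 1-D column slice with None (np.newaxis):
>>> x = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
>>> x[:, 0][:, None] + x[0, :]   # (3,1) column broadcast against (3,) row
array([[2, 2, 2],
       [3, 3, 3],
       [4, 4, 4]])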

Related

How to create a list of arrays from multiple same-size vectors

I am attempting to create a list of arrays from 2 vectors.
I have a dataset I'm reading from a .csv file and need to pair each value with a 1 to create a list of arrays.
import numpy as np
Data = np.array([1, 2, 3, 4, 5]) #this is actually a column in a .csv file, but simplified it for the example
#do something here
output = ([1,1], [1,2], [1,3], [1,4], [1,5]) #2nd column in each array is the data, first is a 1
I've tried to use numpy concatenate and vstack, but they don't give me exactly what I'm looking for.
Any suggestions would be appreciated.
You can form the output using a list comprehension:
data = [1, 2, 3, 4, 5]
output = [[1, item] for item in data]
This will output:
[[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]]
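Since the data already comes from a NumPy array, a vectorized alternative is np.column_stack; a minimal sketch, assuming the 1s belong in the first column as in the example:
import numpy as np

data = np.array([1, 2, 3, 4, 5])
# pair a column of ones with the data column
output = np.column_stack((np.ones(len(data), dtype=int), data))
# array([[1, 1],
#        [1, 2],
#        [1, 3],
#        [1, 4],
#        [1, 5]])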

What's the most efficient way to scatter plot a pair of 2D lists on top of each other in python?

Working with python and matplotlib. Let's say, for example, I have the following lists:
A=[[1, 2, 3], [1, 2, 3], [1, 2, 3]]
B=[[4, 2, 6], [3, 2, 1], [5, 1, 4]]
Each row of these lists represents a single scatter plot, with A being the x-axis and B the y-axis. Is there an efficient way of stacking these scatter plots on top of each other into a single scatter plot? I have already tried a "for" loop:
for i in range(len(A)):
    plt.scatter(A[i], B[i])
It works, but it's a bit slow when working with larger numbers of entries. Is there a more efficient way to do this?
Unless there is a reason to do multiple calls to scatter, I would recommend flattening the lists and doing a single call to plt.scatter like so:
import itertools
import matplotlib.pyplot as plt

A = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
B = [[4, 2, 6], [3, 2, 1], [5, 1, 4]]
A_flat = list(itertools.chain.from_iterable(A))
B_flat = list(itertools.chain.from_iterable(B))
plt.scatter(A_flat, B_flat)
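If the rows of A and B are all the same length, NumPy can do the flattening as well; a minimal sketch of that variant, assuming the lists above:
import numpy as np
A_flat = np.asarray(A).ravel()   # flatten the nested list row by row
B_flat = np.asarray(B).ravel()
plt.scatter(A_flat, B_flat)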

How should I understand the nn.Embeddings arguments num_embeddings and embedding_dim?

I'm trying to get used to the Embedding class in the PyTorch nn module.
I've noticed that quite a few other people have had the same problem as myself, and therefore posted questions on the PyTorch discussion forum and on Stack Overflow, but I'm still having some confusion.
According to the official documentation, the arguments that are passed are num_embeddings and embedding_dim which each refer to how large our dictionary (or vocabulary) is and how many dimensions we want our embeddings to be, respectively.
What I'm confused about is how exactly I should interpret those. For example, the small practice code that I ran:
import torch
import torch.nn as nn
embedding = nn.Embedding(num_embeddings=10, embedding_dim=3)
a = torch.LongTensor([[1, 2, 3, 4], [4, 3, 2, 1]]) # (2, 4)
b = torch.LongTensor([[1, 2, 3], [2, 3, 1], [4, 5, 6], [3, 3, 3], [2, 1, 2],
                      [6, 7, 8], [2, 5, 2], [3, 5, 8], [2, 3, 6], [8, 9, 6],
                      [2, 6, 3], [6, 5, 4], [2, 6, 5]]) # (13, 3)
c = torch.LongTensor([[1, 2, 3, 2, 1, 2, 3, 3, 3, 3, 3],
                      [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]]) # (2, 11)
When I run a, b, and c through the embedding variable, I get embedded results of shapes (2, 4, 3), (13, 3, 3), (2, 11, 3).
What's confusing me is that I thought if the number of samples we have exceeds the predefined number of embeddings, we should get an error. Since the embedding I've defined has 10 embeddings, shouldn't b give me an error, since it is a tensor containing 13 words of dimension 3?
In your case, here is how your input tensor is interpreted:
a = torch.LongTensor([[1, 2, 3, 4], [4, 3, 2, 1]]) # 2 sequences of 4 elements
Moreover, this is how your embedding layer is interpreted:
embedding = nn.Embedding(num_embeddings=10, embedding_dim=3) # 10 distinct elements, each of which is embedded in a 3-dimensional space
So, it doesn't matter if your input tensor has more than 10 elements, as long as they are in the range [0, 9]. For example, if we create a tensor of two elements such as:
d = torch.LongTensor([[1, 10]]) # 1 sequence of 2 elements
We would get the following error when we pass this tensor through the embedding layer:
RuntimeError: index out of range: Tried to access index 10 out of table with 9 rows
To summarize, num_embeddings is the total number of unique elements in the vocabulary, and embedding_dim is the size of each embedded vector once passed through the embedding layer. Therefore, you can have a tensor of 10+ elements, as long as each element in the tensor is in the range [0, 9], because you defined a vocabulary size of 10 elements.
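A quick shape check along those lines, as a minimal sketch (the output shape is simply the input shape with embedding_dim appended):
import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=10, embedding_dim=3)   # valid indices are 0..9
b = torch.randint(0, 10, (13, 3))                              # 13 rows of 3 in-range indices
print(embedding(b).shape)                                      # torch.Size([13, 3, 3])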

Regarding transforming a one-dimensional list into a two-dimensional array representing a mesh grid

I have a data set saved as follows. Generally speaking, it is a list where each element of this master list is a sublist. Each sublist contains two elements: the first is a value, and the second is an ID.
[[0.089, 0],
[0.075, 1],
[0.588, 2],
[0.906, 3],
[0.332, 4],
[0.707, 5],
[0.668, 6],
[0.426, 7],
[0.034, 8]]
The above test data set can be generated using the following code segment
import numpy as np
testlist=[]
for i in range(9):
    temp = []
    x1 = np.random.rand()
    temp.append(x1)
    temp.append(i)
    testlist.append(temp)
How do I transform this list into a two-dimensional array representing a mesh? For instance, the values would be arranged in this two-dimensional array:
0.089 0.075 0.588
0.906 0.332 0.707
0.668 0.426 0.034
Is this what you want? reshape changes the array's shape; -1 in the shape means that dimension will be inferred by numpy itself.
arr = np.array([[0.089, 0],
                [0.075, 1],
                [0.588, 2],
                [0.906, 3],
                [0.332, 4],
                [0.707, 5],
                [0.668, 6],
                [0.426, 7],
                [0.034, 8]])
arr[:,0].reshape(-1,3).copy()
Result
array([[0.089, 0.075, 0.588],
       [0.906, 0.332, 0.707],
       [0.668, 0.426, 0.034]])
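If the IDs in the second column are not guaranteed to be in order, sorting on them first keeps each value in its intended cell; a small sketch, assuming the same 3-column mesh:
order = arr[:, 1].argsort()            # row order given by the ID column
mesh = arr[order, 0].reshape(-1, 3)    # keep only the values, arranged by ID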

python numpy stack matrices and add specific corner/column entries

Say we have two matrices A and B, each of size 2 by 2. Is there a command that can stack them horizontally and add A[:,1] to B[:,0], so that the resulting matrix C is 2 by 3, with C[:,0] = A[:,0], C[:,1] = A[:,1] + B[:,0], and C[:,2] = B[:,1]? One step further, stacking them on the diagonal so that C[0:2,0:2] = A, C[1:3,1:3] = B, and C[1,1] = A[1,1] + B[0,0]; C is 3 by 3 in this case. Hard-coding this routine is not hard, but I'm just curious, since MATLAB has a similar function if my memory serves me well.
A straightforward approach is to copy or add the two arrays to a target:
In [882]: A=np.arange(4).reshape(2,2)
In [883]: C=np.zeros((2,3),int)
In [884]: C[:,:-1]=A
In [885]: C[:,1:]+=A # or B
In [886]: C
Out[886]:
array([[0, 1, 1],
       [2, 5, 3]])
Another approach is to pad A at the end, pad B at the start, and sum; while there is a convenient pad function, it won't be any faster.
And for the diagonal
In [887]: C=np.zeros((3,3),int)
In [888]: C[:-1,:-1]=A
In [889]: C[1:,1:]+=A
In [890]: C
Out[890]:
array([[0, 1, 0],
       [2, 3, 1],
       [0, 2, 3]])
Again, the 2 arrays could be padded and added.
I'm not aware of any specialized function to do this; even if there were, it probably would do the same thing. This isn't a common enough operation to justify a compiled version.
I have built up finite element sparse matrices by adding overlapping element matrices. The sparse formats for both MATLAB and scipy facilitate this (duplicate coordinates are summed).
============
In [896]: np.pad(A,[[0,0],[0,1]],mode='constant') + np.pad(A,[[0,0],[1,0]],mode='constant')
Out[896]:
array([[0, 1, 1],
       [2, 5, 3]])
In [897]: np.pad(A,[[0,1],[0,1]],mode='constant') + np.pad(A,[[1,0],[1,0]],mode='constant')
Out[897]:
array([[0, 1, 0],
       [2, 3, 1],
       [0, 2, 3]])
What's the special MATLAB code for doing this?
In Octave I found:
prepad(A,3,0,axis=2)+postpad(A,3,0,axis=2)
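For reuse, the copy-into-a-target approach above can be wrapped in a small helper; this is only a sketch (the name overlap_hstack and the overlap argument are illustrative, not an existing NumPy function):
import numpy as np

def overlap_hstack(A, B, overlap=1):
    """Stack A and B horizontally, summing the last `overlap` columns of A
    with the first `overlap` columns of B."""
    A, B = np.asarray(A), np.asarray(B)
    cols = A.shape[1] + B.shape[1] - overlap
    C = np.zeros((A.shape[0], cols), dtype=np.result_type(A, B))
    C[:, :A.shape[1]] = A                    # copy A into the left block
    C[:, A.shape[1] - overlap:] += B         # add B, overlapping the last column(s) of A
    return C

overlap_hstack(np.arange(4).reshape(2, 2), np.arange(4).reshape(2, 2))
# array([[0, 1, 1],
#        [2, 5, 3]])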
