I am trying to roll only the first n elements along a NumPy axis instead of all of them, but I am at a loss as to how to accomplish this.
import numpy as np
foo = np.random.rand(32,3,16,16)
#Foo is a batch of 32 images, with 3 channels and a height, width of 16
print("Foo Shape = ", foo.shape)
#Foo Shape = (32, 3, 16, 16)
I would like to roll each first element of the second axis by 1 step. Basically roll the first channel of each image by 1.
np.roll(foo, 1, 1)
The code above rolls all the elements of the second axis (channel dimension) by 1, instead of just rolling the first element. I couldn't find any numpy functionality that helps with this issue.
Select only the elements you want using a 2D slice:
>>> import numpy as np
>>> arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
>>> print(arr)
[[[1 2]
[3 4]]
[[5 6]
[7 8]]]
>>> arr[:, 0] = np.roll(arr[:, 0], shift=1, axis=1)
>>> print(arr)
[[[2 1]
[3 4]]
[[6 5]
[7 8]]]
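Applied to the original (32, 3, 16, 16) batch, the same idea rolls only channel 0 of every image while leaving the other channels untouched. A minimal sketch; rolling along the height axis here is an assumption, use axis=2 of the slice for width instead:

```python
import numpy as np

foo = np.random.rand(32, 3, 16, 16)
rolled = foo.copy()
# roll only channel 0 of every image, along the height axis of the slice
rolled[:, 0] = np.roll(rolled[:, 0], shift=1, axis=1)
# channels 1 and 2 are unchanged
```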
I am trying to convert the rows [0-1] of a matrix to representation in number (binary equivalent), the code I have is the following:
import numpy as np

def generate_binary_matrix(matrix):
    result = []
    for i in matrix:
        val = '0b' + ''.join([str(x) for x in i])
        result.append(int(val, 2))
    result = np.array(result)
    return result

initial_matrix = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]])
result = generate_binary_matrix(initial_matrix)
print(result)
This code works, but it is very slow. Does anyone know how to do it in a faster way?
You can convert a 0/1 list to binary using just arithmetic, which should be faster:
from functools import reduce
b = reduce(lambda r, x: 2*r + x, i)
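Applied row by row to the question's matrix, this might look like the following sketch (same variable names as the question):

```python
import numpy as np
from functools import reduce

initial_matrix = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]])
# fold each row left to right: accumulator r becomes 2*r + bit
result = np.array([reduce(lambda r, x: 2 * r + x, row) for row in initial_matrix])
# -> [2 4 1]
```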
Suppose your matrix is a NumPy array A with m rows and n columns.
Create a vector b with n elements by:
b = np.power(2, np.arange(n))[::-1]
Then your answer is A @ b.
Example:
import numpy as np
A = np.array([[0, 0, 1], [1, 0, 1]])
n = A.shape[1]
b = np.power(2, np.arange(n))[::-1]
print(A @ b) # --> [1 5]
Update: I reversed b because the MSB (2^(n-1)) corresponds to A[:, 0], fixed the mistakenly flipped arguments to np.power, and added an example.
I have a row A = [0 1 2 3 4] and an index I = [0 0 1 0 1]. I would like to extract the elements in A indexed by I, i.e. [2, 4].
My attempt:
import numpy as np
A = np.array([0, 1, 2, 3, 4])
index = np.array([0, 0, 1, 0, 1])
print(A[index])
The result is not as I expected:
[0 0 1 0 1]
Could you please elaborate on how to achieve my goal?
I think you want boolean indexing:
A[index.astype(bool)]
# array([2, 4])
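In full, with the arrays from the question (a minimal sketch):

```python
import numpy as np

A = np.array([0, 1, 2, 3, 4])
index = np.array([0, 0, 1, 0, 1])
# cast the 0/1 index to bool so it acts as a mask rather than as integer indices
selected = A[index.astype(bool)]
# -> array([2, 4])
```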
A non-NumPy way to achieve this, in case it's useful: it uses zip to combine each pair of elements and keeps the first if the second is truthy:
[x[0] for x in zip(A, index) if x[1]]
I'm trying to write a function for a Euclidean minimum spanning tree. Where I have run into trouble is finding the k nearest neighbors: as you can see, I call the function that returns a sparse array containing the indexes and distances to each nearest neighbor. However, I cannot access the elements as I assumed I would:

for p1, p2, w in A:
    # do things

as this raises an error that A only yields 1 item (not 3). Is there a way to access the elements of this data set to form edges with the distance as weight? I am pretty new to Python and still trying to learn all the finer details of the language.
from sklearn.neighbors import kneighbors_graph
from kruskalsalgorithm import *
import networkx as nx

def EMST(inlist):
    graph = nx.Graph()
    for a, b in inlist:
        graph.add_node((a, b))
    print("nodes = ", graph.nodes())
    A = kneighbors_graph(graph.nodes(), 1, mode='distance', metric='euclidean',
                         include_self=False, n_jobs=-1)
    A.toarray()
This is how I am testing my function
mylist = [[2,3],[4,2],[9,4],[3,1]]
EMST(mylist)
and my output is:
nodes = [(2, 3), (4, 2), (9, 4), (3, 1)]
(0, 1) 2.2360679775
(1, 3) 1.41421356237
(2, 1) 5.38516480713
(3, 1) 1.41421356237
You did not really explain what exactly you want to do; there are many possibilities.
But in general you should follow the docs at scipy.sparse. In your case, sklearn's function guarantees CSR format.
One potential usage is something like:
from scipy import sparse as sp
import numpy as np
np.random.seed(1)
mat = sp.random(4,4, density=0.4)
print(mat)
I, J, V = sp.find(mat)
print(I)
print(J)
print(V)
Output:
(3, 0) 0.846310916686
(1, 3) 0.313273516932
(3, 1) 0.524548159573
(2, 0) 0.44345289378
(2, 1) 0.22957721373
(2, 2) 0.534413908947
[2 3 2 3 2 1]
[0 0 1 1 2 3]
[ 0.44345289 0.84631092 0.22957721 0.52454816 0.53441391 0.31327352]
Of course you could do:
for a, b, w in zip(I, J, V):
    print(a, b, w)
which prints:
2 0 0.44345289378
3 0 0.846310916686
2 1 0.22957721373
3 1 0.524548159573
2 2 0.534413908947
1 3 0.313273516932
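For the questioner's edge-building goal, one possible sketch (sparse_to_edges is a hypothetical helper name) turns a sparse k-neighbors distance matrix into (point index, point index, weight) triples, reconstructing the matrix from the printed output in the question:

```python
import numpy as np
from scipy import sparse as sp

def sparse_to_edges(mat):
    # hypothetical helper: unpack a sparse distance matrix into (i, j, weight) triples
    I, J, V = sp.find(mat)
    return list(zip(I.tolist(), J.tolist(), V.tolist()))

# rebuild the matrix shown in the question's output
row = np.array([0, 1, 2, 3])
col = np.array([1, 3, 1, 1])
data = np.array([5.0, 2.0, 29.0, 2.0]) ** 0.5
M = sp.csr_matrix((data, (row, col)), shape=(4, 4))
edges = sparse_to_edges(M)
```

Each triple can then be fed to a graph library (or Kruskal's algorithm) as a weighted edge.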
I can recreate your display with:
In [65]: from scipy import sparse
In [72]: row = np.array([0,1,2,3])
In [73]: col = np.array([1,3,1,1])
In [74]: data = np.array([5,2,29,2])**.5
In [75]: M = sparse.csr_matrix((data, (row, col)), shape=(4,4))
In [76]: M
Out[76]:
<4x4 sparse matrix of type '<class 'numpy.float64'>'
with 4 stored elements in Compressed Sparse Row format>
In [77]: print(M)
(0, 1) 2.23606797749979
(1, 3) 1.4142135623730951
(2, 1) 5.385164807134504
(3, 1) 1.4142135623730951
In [78]: M.A # M.toarray()
Out[78]:
array([[0. , 2.23606798, 0. , 0. ],
[0. , 0. , 0. , 1.41421356],
[0. , 5.38516481, 0. , 0. ],
[0. , 1.41421356, 0. , 0. ]])
pts = [(2, 3), (4, 2), (9, 4), (3, 1)]. Distance from pts[0] to pts[1] is sqrt(5), etc.
Sparse coo format gives access to the coordinates and distances. sparse.find also produces these arrays.
In [83]: Mc = M.tocoo()
In [84]: Mc.row
Out[84]: array([0, 1, 2, 3], dtype=int32)
In [85]: Mc.col
Out[85]: array([1, 3, 1, 1], dtype=int32)
In [86]: Mc.data
Out[86]: array([2.23606798, 1.41421356, 5.38516481, 1.41421356])
Checking the point and matrix match:
In [95]: pts = np.array([(2, 3), (4, 2), (9, 4), (3, 1)])
In [96]: pts
Out[96]:
array([[2, 3],
[4, 2],
[9, 4],
[3, 1]])
In [97]: for r,c,d in zip(*sparse.find(M)):
...: print(((pts[r]-pts[c])**2).sum()**.5)
...:
2.23606797749979
5.385164807134504
1.4142135623730951
1.4142135623730951
Or getting all closest distances at once:
In [107]: np.sqrt(((pts[row,:]-pts[col,:])**2).sum(1))
Out[107]: array([2.23606798, 1.41421356, 5.38516481, 1.41421356])
In [110]: np.linalg.norm(pts[row,:]-pts[col,:],axis=1)
Out[110]: array([2.23606798, 1.41421356, 5.38516481, 1.41421356])
A 'brute force' minimum distance calc:
All pairwise distances:
In [112]: dist = np.linalg.norm(pts[None,:,:]-pts[:,None,:],axis=2)
In [113]: dist
Out[113]:
array([[0. , 2.23606798, 7.07106781, 2.23606798],
[2.23606798, 0. , 5.38516481, 1.41421356],
[7.07106781, 5.38516481, 0. , 6.70820393],
[2.23606798, 1.41421356, 6.70820393, 0. ]])
(compare this with Out[78])
'blank' out the diagonal
In [114]: D = dist + np.eye(4)*100
Minimum distance and coordinates (by row):
In [116]: np.min(D, axis=1)
Out[116]: array([2.23606798, 1.41421356, 5.38516481, 1.41421356])
In [117]: np.argmin(D, axis=1)
Out[117]: array([1, 3, 1, 1], dtype=int32)
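The brute-force steps above can be wrapped into one sketch (nearest_neighbors is my name for it, assuming n points in the plane):

```python
import numpy as np

def nearest_neighbors(pts):
    # all pairwise distances via broadcasting: (n, n) matrix
    dist = np.linalg.norm(pts[None, :, :] - pts[:, None, :], axis=2)
    # mask the zero diagonal so a point is never its own neighbor
    np.fill_diagonal(dist, np.inf)
    return dist.min(axis=1), dist.argmin(axis=1)

pts = np.array([(2, 3), (4, 2), (9, 4), (3, 1)], dtype=float)
dmin, idx = nearest_neighbors(pts)
# idx -> [1, 3, 1, 1]
```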
For example, I have the following arrays:
x = [0, 1, 2, 3, 4.5, 5]
y = [2, 8, 3, 7, 8, 1]
I would like to be able to do the following given x:
>>> what_is_y_when_x_is(2)
(2, 3)
>>> what_is_y_when_x_is(3.1) # Perhaps set rules to round to nearest (or up or down)
(3, 7)
On the other hand, when given y:
>>> what_is_x_when_y_is(2)
(0, 2)
>>> what_is_x_when_y_is(max(y))
([1, 4.5], 8)
The circumstances of this problem
I could have plotted y versus x using a closed analytical function, which should be very easy by just calling foo_function(x). However, I'm running numerical simulations whose data plots do not have closed analytical solutions.
Attempted solution
I've tackled similar problems before and approached them roughly this way:
what_is_y_when_x_is(some_x)
Search the array x for some_x.
Get its index, i.
Pick up y[i].
Question
Is there a better way to do this? Perhaps a built-in numpy function or a better algorithm?
You should look at numpy.searchsorted and also numpy.interp. Both of those look like they might do the trick. Here is an example:
import numpy as np
x = np.array([0, 1, 2, 3, 4.5, 5])
y = np.array([2, 8, 3, 7, 8, 1])
# y should be sorted for both of these methods
order = y.argsort()
y = y[order]
x = x[order]
def what_is_x_when_y_is(input, x, y):
    return x[y.searchsorted(input, 'left')]

def interp_x_from_y(input, x, y):
    return np.interp(input, y, x)
print(what_is_x_when_y_is(7, x, y))
# 3
print(interp_x_from_y(1.5, x, y))
# 2.5
You could use the bisect module for this. This is pure python - no numpy here:
>>> x = [0, 1, 2, 3, 4.5, 5]
>>> y = [2, 8, 3, 7, 8, 1]
>>> x_lookup = sorted(zip(x, y))
>>> y_lookup = sorted(map(tuple, map(reversed, zip(x, y))))
>>>
>>> import bisect
>>> def pair_from_x(x):
... return x_lookup[min(bisect.bisect_left(x_lookup, (x,)), len(x_lookup)-1)]
...
>>> def pair_from_y(y):
... return tuple(reversed(y_lookup[min(bisect.bisect_left(y_lookup, (y,)), len(y_lookup)-1)]))
...
And some examples of using it:
>>> pair_from_x(0)
(0, 2)
>>> pair_from_x(-2)
(0, 2)
>>> pair_from_x(2)
(2, 3)
>>> pair_from_x(3)
(3, 7)
>>> pair_from_x(7)
(5, 1)
>>>
>>> pair_from_y(0)
(5, 1)
>>> pair_from_y(1)
(5, 1)
>>> pair_from_y(3)
(2, 3)
>>> pair_from_y(4)
(3, 7)
>>> pair_from_y(8)
(1, 8)
The way you described is, as far as I'm concerned, a good way. I'm not sure if you are aware of it, but I think you could use the .index(...) method on your list:
>>> li
['I', 'hope', 'this', 'answer', 'helps', 'you']
>>> li.index("hope")
1
Other than that, you might want to consider one array of "Points" which have an x and a y, though I'm not sure if this is possible, of course. That way you won't have to keep two arrays in sync (same number of elements).
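A sketch of that one-array idea using a NumPy structured array (the field names 'x' and 'y' are my choice), so both coordinates live together in a single array:

```python
import numpy as np

# one array of points; 'x' and 'y' are hypothetical field names
points = np.array([(0, 2), (1, 8), (2, 3), (3, 7), (4.5, 8), (5, 1)],
                  dtype=[('x', float), ('y', float)])
# look up the y values where x == 2
match = points[points['x'] == 2]
# match['y'] -> array([3.])
```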
I don't see any problem with your pipeline. You can write a snippet based on numpy.where to implement it efficiently. Note that you will have to pass your lists as numpy arrays first (this can be included in the function).
Below is an example of a function doing the job, with an option to round the target (I have included a which_array argument, so everything can be done in just one function, whether you want to search in x or y). Note that one of the outputs will be a numpy array, so modify it to convert it to anything you want (list, tuple, etc.).
import numpy as np

def pick(x_array, y_array, target, which_array='x', round=True):
    # ensure that x and y are numpy arrays
    x_array, y_array = np.array(x_array), np.array(y_array)
    # optional: round to the nearest. True by default
    if round:
        target = np.round(target)
    if which_array == 'x':  # look for the target in x_array
        return target, y_array[np.where(x_array == target)[0]]
    if which_array == 'y':  # look for the target in y_array
        return x_array[np.where(y_array == target)[0]], target
Results given by your examples:
# >>> what_is_y_when_x_is(2)
pick(x, y, 2, 'x')
(2, array([3]))
# >>> what_is_y_when_x_is(3.1)
pick(x, y, 3.1, 'x')
(3.0, array([7]))
# >>> what_is_y_when_x_is(2)
pick(x, y, 2, 'y')
(array([ 0.]), 2)
# >>> what_is_x_when_y_is(max(y))
pick(x, y, max(y), 'y')
(array([ 1. , 4.5]), 8)
Here is a revised version of the code provided by @Bi Rico:
import numpy as np
x = np.array([0, 1, 2, 3, 4.5, 5])
y = np.array([2, 8, 3, 7, 8, 1])
# y should be sorted for both of these methods
order = np.argsort(y)
y = y[order]
x = x[order]
def what_is_x_when_y_is(input, x, y):
    return x[y.searchsorted(input, 'left')]
print(what_is_x_when_y_is(7, x, y))
# 3
This worked for me:
def what_is_y_when_x_is(value, x, y, tolerance=1e-3):
    return [(xi, yi) for (xi, yi) in zip(x, y) if abs(xi - value) <= tolerance]
Notice that rather than comparing for equality, the code above performs a "close enough" equality test. The default tolerance is set to 0.001 (you can use any other value). Here are some examples of use:
>>> x = [0, 1, 2, 3, 4.5, 5]
>>> y = [2, 8, 3, 7, 8, 1]
>>> what_is_y_when_x_is(0, x, y)
[(0, 2)]
>>> what_is_y_when_x_is(1, x, y, tolerance=.1)
[(1, 8)]
>>> what_is_y_when_x_is(2, x, y, tolerance=1)
[(1, 8), (2, 3), (3, 7)]
>>> what_is_y_when_x_is(4, x, y, tolerance=.5)
[(4.5, 8)]