Python declaring a numpy matrix of lists of lists - python-3.x

I would like to have a numpy matrix that looks like this:
[int, [[int,int]]]
I receive this error: "ValueError: setting an array element with a sequence."
Below is the declaration:
def __init__(self):
    self.path=np.zeros((1, 2))
I attempt to assign a value to it in the line below:
routes_traveled.path[0, 1]=[loc]
loc is a list and routes_traveled is the object.

Do you want a higher-dimensional array, say 3-D, or do you really want a 2-D array whose elements are Python lists? Real lists, not numpy arrays?
One way to put lists into an array is to use dtype=object:
In [71]: routes=np.zeros((1,2),dtype=object)
In [72]: routes[0,1]=[1,2,3]
In [73]: routes[0,0]=[4,5]
In [74]: routes
Out[74]: array([[[4, 5], [1, 2, 3]]], dtype=object)
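One element of this array is a 2-element list, the other a 3-element list.
I could have created the same thing directly (NumPy 1.24 and later raise a ValueError for ragged input like this unless dtype=object is given):
In [76]: np.array([[[4,5],[1,2,3]]], dtype=object)
Out[76]: array([[[4, 5], [1, 2, 3]]], dtype=object)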
But if I'd given it 2 lists of the same length, I'd get a 3d array:
In [77]: routes1=np.array([[[4,5,6],[1,2,3]]])
In [78]: routes1
Out[78]:
array([[[4, 5, 6],
        [1, 2, 3]]])
I could index the last one, routes1[0,1], and get an array, array([1, 2, 3]), whereas routes[0,1] gives the list [1, 2, 3].
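A small self-contained check of the difference described above, rebuilding both arrays and asking Python what type a single cell is:

```python
import numpy as np

# Lists of different lengths: dtype=object keeps each one as a Python list in a (1, 2) array
routes = np.zeros((1, 2), dtype=object)
routes[0, 0] = [4, 5]
routes[0, 1] = [1, 2, 3]

# Lists of equal length: NumPy builds a genuine 3-D integer array instead
routes1 = np.array([[[4, 5, 6], [1, 2, 3]]])

print(routes.shape, type(routes[0, 1]))    # (1, 2) <class 'list'>
print(routes1.shape, type(routes1[0, 1]))  # (1, 2, 3) <class 'numpy.ndarray'>
```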
In this case you need to be clear whether you are talking about arrays, subarrays, or Python lists.
With dtype=object, the elements can be anything: lists, dictionaries, numbers, strings.
In [84]: routes[0,0]=3
In [85]: routes
Out[85]: array([[3, [1, 2, 3]]], dtype=object)
Just beware that such an array loses a lot of the functionality that a purely numeric array has. What the array actually contains is pointers to Python objects, just a slight generalization of Python lists.
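Applied to the question's code, a minimal sketch could look like the following. The class name Route and the value of loc are hypothetical stand-ins, since the question shows only the __init__ method; the only real change is dtype=object in the declaration.

```python
import numpy as np

class Route:
    """Hypothetical stand-in for the asker's class; only __init__ comes from the question."""

    def __init__(self):
        # dtype=object lets each cell hold an arbitrary Python object, such as a list
        self.path = np.zeros((1, 2), dtype=object)

routes_traveled = Route()
loc = [3, 4]                         # hypothetical location list
routes_traveled.path[0, 0] = 7       # an int in the first cell
routes_traveled.path[0, 1] = [loc]   # a list of lists in the second cell
print(routes_traveled.path[0, 1])    # [[3, 4]]

# The original float array cannot store a sequence in a single cell
float_path = np.zeros((1, 2))
try:
    float_path[0, 1] = [loc]
except ValueError as exc:
    float_error = exc
    print("ValueError:", exc)
```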

Did you want to create an array of zeros with shape (1, 2)? In that case use np.zeros((1, 2)).
In [118]: np.zeros((1, 2))
Out[118]: array([[ 0., 0.]])
In contrast, np.zeros(1, 2) raises TypeError:
In [117]: np.zeros(1, 2)
TypeError: data type not understood
because the second argument to np.zeros is supposed to be the dtype, and 2 is not a valid dtype.
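A runnable version of both calls; the shape has to arrive as one tuple, otherwise the second positional argument is read as the dtype (the exact error message varies between NumPy versions):

```python
import numpy as np

a = np.zeros((1, 2))       # shape passed as a single tuple
print(a.shape, a.dtype)    # (1, 2) float64

try:
    np.zeros(1, 2)         # 2 lands in the dtype parameter
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```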
Or, to create a 1-dimensional array with a custom dtype consisting of an int and a pair of ints, you could use
In [120]: np.zeros((2,), dtype=[('x', 'i4'), ('y', '2i4')])
Out[120]:
array([(0, [0, 0]), (0, [0, 0])],
      dtype=[('x', '<i4'), ('y', '<i4', (2,))])
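A short sketch of how such a structured array is filled and read. It spells the 'y' field as ('y', 'i4', (2,)), which is the form NumPy itself prints in the output above; the values 5, 7 and [1, 2] are made up for illustration.

```python
import numpy as np

# Field 'x' is one int32, field 'y' is a pair of int32
rec = np.zeros((2,), dtype=[('x', 'i4'), ('y', 'i4', (2,))])
rec[0] = (5, [1, 2])    # assign a whole record as a tuple
rec['x'][1] = 7         # a field accessed by name is a view, so this writes into rec

print(rec['x'])         # [5 7]
print(rec['y'])         # [[1 2]
                        #  [0 0]]
```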
I wouldn't recommend this though. If the values are all ints, I think you would be better off with a simple ndarray with homogeneous integer dtype, perhaps of shape (nrows, 3):
In [121]: np.zeros((2, 3), dtype='<i4')
Out[121]:
array([[0, 0, 0],
       [0, 0, 0]], dtype=int32)
Generally I find that using an array with a simple dtype makes many operations, from building the array to slicing and reshaping, easier.
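As an illustration of the homogeneous layout, assuming each row stores one [int, [int, int]] record flattened as [id, a, b] (the values here are hypothetical), the two parts are recovered with plain slicing:

```python
import numpy as np

# Hypothetical layout: each row is [id, a, b]
routes = np.zeros((2, 3), dtype='<i4')
routes[0] = [5, 1, 2]
routes[1] = [7, 3, 4]

ids = routes[:, 0]       # 1-D view of the int column
pairs = routes[:, 1:]    # (2, 2) view of the int pairs

print(ids)                # [5 7]
print(pairs.sum(axis=1))  # [3 7]
```

Both ids and pairs are views, so whole-column operations stay vectorized, which is the kind of convenience an object array gives up.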

Related

What is the difference in indexing between x[:][1] and x[:,1]?

I have a hard time understanding the difference between these two kinds of indexing.
Let's say I have a nested list, stored as a NumPy array:
x = np.array([[[1,2],[5,6]],[[9,7],[12,23]]])
if I did
x[:][:][1] and x[:,:,1]
I would get
[[9 7][12 23]]
[[5 6][12 23]]
respectively.
To be honest, I have no clue how I would get these results. Could someone explain the steps to me?
This has to do with Python's slice syntax. Essentially, obj[a:b:c] is syntactic shorthand for obj.__getitem__(slice(a,b,c)).
x[:] simply returns a 'full slice' of x, that is, everything in x. For a Python list that is a shallow copy; for a NumPy array it is a view of the same data, not a copy. Either way, doing x[:][:][1] is no different from doing x[1].
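A quick check of what x[:] gives back for a NumPy array versus a Python list; np.shares_memory is used here only to detect that the slice is a view:

```python
import numpy as np

x = np.array([[[1, 2], [5, 6]], [[9, 7], [12, 23]]])

# Chained full slices hand back the whole array, so the trailing [1] indexes axis 0
same_as_x1 = np.array_equal(x[:][:][1], x[1])
print(same_as_x1)                   # True

# For a NumPy array, x[:] is a view sharing memory with x
view_shares = np.shares_memory(x[:], x)
print(view_shares)                  # True

# For a Python list, [:] produces a new (shallow) list
lst = [1, 2, 3]
print(lst[:] is lst)                # False
```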
Meanwhile, doing x[:,:,1] equates to:
x.__getitem__((slice(None), slice(None), 1))
that is, using a 3-tuple as an index. For an ordinary Python list, this would fail, but NumPy accepts it gracefully. To see how NumPy does so, let's look a bit closer at this example:
>>> x = np.array([[[1,2],[5,6]],[[9,7],[12,23]]])
>>> x[1]
array([[ 9,  7],
       [12, 23]])
>>> x[:,1]
array([[ 5,  6],
       [12, 23]])
>>> x[:,:,1]
array([[ 2,  6],
       [ 7, 23]])
>>> x[:,:,:,1]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: too many indices for array: array is 3-dimensional, but 4 were indexed
We can see a pattern.
When you give a NumPy array a tuple as an index, it matches the elements of the tuple to the array's dimensions in order: the first element indexes the first dimension (axis 0), the second element the second dimension, and so on. A ':' keeps that whole dimension. In short:
x[1] just gets the element at index 1 from the first dimension of the array. This is the single element [[9, 7], [12, 23]].
x[:, 1] gets the element at index 1 from each element in the first dimension of the array. Which is to say, it gets the elements at index 1 from the second dimension of the array. This is two elements: [5, 6] and [12, 23]. NumPy returns them together as a 2-D array.
x[:, :, 1] follows the same pattern: it gets the elements at index 1 from the third dimension of the array. This time there are four elements: 2 and 6 come from x[0], and 7 and 23 come from x[1]. NumPy groups them by dimension into a 2-D array.
x[:, :, :, 1] fails, because the array only has three dimensions - there's no way to further subdivide any of the third dimension's elements.
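The axis-by-axis rule can be checked directly. This sketch also builds the index tuple by hand with slice(None) and compares each case with np.take along the matching axis:

```python
import numpy as np

x = np.array([[[1, 2], [5, 6]], [[9, 7], [12, 23]]])

# Each position in the index tuple addresses one axis, left to right
print(x[1].tolist())        # [[9, 7], [12, 23]]  (axis 0 fixed at 1)
print(x[:, 1].tolist())     # [[5, 6], [12, 23]]  (axis 1 fixed at 1)
print(x[:, :, 1].tolist())  # [[2, 6], [7, 23]]   (axis 2 fixed at 1)

# The bracket syntax builds a tuple of slice objects; ':' becomes slice(None)
key = (slice(None), slice(None), 1)
assert np.array_equal(x.__getitem__(key), x[:, :, 1])

# Fixing index 1 on any axis matches np.take along that axis
for axis in range(x.ndim):
    idx = [slice(None)] * x.ndim
    idx[axis] = 1
    assert np.array_equal(x[tuple(idx)], np.take(x, 1, axis=axis))
```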

How do I get the array with the maximum value from an array of arrays, only considering one specific element per array

I have an array like this:
[[0, 46.0], [1, 83.0], [2, 111.0], [3, 18.0], [4, 37.0], [5, 55.0], [6, 0.0], [7, 9.0], [8, 9.0]]
I want to return the array where the second element is the highest. In this example: [2, 111]
How can I do that?
Until now I have tried numpy.amax(array, axis=0) and numpy.amax(array, axis=1). But they do not consider my condition, that I just want to consider the last element per array.
You could use max with a key argument:
result = max(arr, key = lambda x : x[1])
If your array is a numpy array, you can use np.argmax on the second column to get the row index, and then index with it:
a[np.argmax(a[:,1])]
# array([ 2., 111.])
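Both suggestions applied to the data from the question; on ties, max and np.argmax each return the first maximal row:

```python
import numpy as np

data = [[0, 46.0], [1, 83.0], [2, 111.0], [3, 18.0], [4, 37.0],
        [5, 55.0], [6, 0.0], [7, 9.0], [8, 9.0]]

# Pure Python: compare rows by their second element only
best_row = max(data, key=lambda row: row[1])
print(best_row)           # [2, 111.0]

# NumPy: argmax over column 1 gives the row index, which selects the whole row
a = np.array(data)
best_arr = a[np.argmax(a[:, 1])]
print(best_arr.tolist())  # [2.0, 111.0]
```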

I am having trouble multiplying two matrices with numpy

I am trying to use numpy to multiply two matrices:
import numpy as np
A = np.array([[1, 3, 2], [4, 0, 1]])
B = np.array([[1, 0, 5], [3, 1, 2]])
I tested the process and ran the calculations manually, using the formula for matrix multiplication. In this case, I would first multiply [1, 0, 5] x A, which resulted in [11, 9], and then multiply [3, 1, 2] x A, which resulted in [10, 14]. Finally, the product of this multiplication is [[11, 9], [10, 14]].
Nevertheless, when I use numpy to multiply these matrices, I get an error:
ValueError: shapes (2,3) and (2,3) not aligned: 3 (dim 1) != 2 (dim 0)
Is there a way to do this with python, successfully?
Read the docs on matrix multiplication in numpy (np.matmul), specifically the section on its behavior:
The behavior depends on the arguments in the following way.
If both arguments are 2-D they are multiplied like conventional matrices.
If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
If the first argument is 1-D, it is promoted to a matrix by prepending a 1 to its dimensions. After matrix multiplication the prepended 1 is removed.
If the second argument is 1-D, it is promoted to a matrix by appending a 1 to its dimensions. After matrix multiplication the appended 1 is removed.
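Two of these rules in action, using the question's A. The 1-D rule reproduces the first hand-computed step, and the N-D rule is shown with an arbitrary stack of ones:

```python
import numpy as np

A = np.array([[1, 3, 2], [4, 0, 1]])

# 1-D second argument: treated as a column, and the result is 1-D again
v = np.array([1, 0, 5])
print(A @ v)                         # [11  9]

# N-D argument: a stack of four (2, 3) matrices times one (3, 5) matrix
stack = np.ones((4, 2, 3))
print((stack @ np.ones((3, 5))).shape)  # (4, 2, 5)
```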
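To get your output, try transposing one of the matrices before multiplying. Note that this particular order gives the transpose of the matrix you worked out by hand:
c=np.matmul(A,B.transpose())
array([[11, 10],
       [ 9, 14]])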
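Since (A Bᵀ)ᵀ = B Aᵀ, swapping the operands yields exactly the [[11, 9], [10, 14]] from the question. A sketch using the @ operator, which calls np.matmul for arrays:

```python
import numpy as np

A = np.array([[1, 3, 2], [4, 0, 1]])
B = np.array([[1, 0, 5], [3, 1, 2]])

# (2, 3) @ (3, 2): the inner dimensions now match
print(A @ B.T)   # [[11 10]
                 #  [ 9 14]]

# Swapping the operands transposes the result, matching the hand computation
print(B @ A.T)   # [[11  9]
                 #  [10 14]]
```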

harmonic mean for a nested list that contains some negative values

I have to find the harmonic mean of a nested list that contains some negative values. I know the harmonic mean is only used for positive values, so what can I do to compute the harmonic mean of my list?
I tried this:
x=[['a', 1, -3, 5], ['b', -2, 6, 8], ['c', 3, 7, -9]]
import statistics as s
y=[s.harmonic_mean(i[1:]) for i in x]
but I get statistics.StatisticsError for the negative values.
You probably want to use filter.
filter iterates over a list, or anything that's iterable, and yields only the elements that satisfy a given condition. It doesn't mutate the iterable you pass to it. In Python 3 it returns a lazy iterator, so wrap it in list() to see the result.
For example:
>>> numbers = [-1, 2, 3]
>>> list(filter(lambda i: i >= 0, numbers))
[2, 3]
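A small Python 3 check of the two properties just mentioned: the filter object is consumed after one pass, and the source list is left untouched. The list comprehension at the end is an equivalent alternative, not part of the original answer.

```python
numbers = [-1, 2, 3]

# filter returns a lazy iterator, not a list
positives = filter(lambda i: i >= 0, numbers)
first = list(positives)
second = list(positives)   # already exhausted

print(first)    # [2, 3]
print(second)   # []
print(numbers)  # [-1, 2, 3], unchanged

# Equivalent list comprehension
print([i for i in numbers if i >= 0])  # [2, 3]
```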
Or, if you just want absolute values, you can use map, which applies a function to each element of a list, or anything that's iterable (again lazily in Python 3):
>>> list(map(abs, numbers))
[1, 2, 3]
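Putting either idea back into the question's loop could look like this. It assumes the asker is fine with either dropping the negative values or taking their absolute values; which of the two is meaningful depends on the data and is not settled in the answer.

```python
import statistics

x = [['a', 1, -3, 5], ['b', -2, 6, 8], ['c', 3, 7, -9]]

# Option 1: drop the negative values, which statistics.harmonic_mean rejects
dropped = [statistics.harmonic_mean([v for v in row[1:] if v >= 0]) for row in x]

# Option 2: use absolute values instead
absolute = [statistics.harmonic_mean([abs(v) for v in row[1:]]) for row in x]

print(dropped)   # about [1.667, 6.857, 4.2]
print(absolute)  # about [1.957, 3.789, 5.108]
```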

Get top-n items of every row in a scipy sparse matrix

After reading this similar question, I still can't fully understand how to go about implementing the solution I'm looking for. I have a sparse matrix, i.e.:
import numpy as np
from scipy import sparse
arr = np.array([[0,5,3,0,2],[6,0,4,9,0],[0,0,0,6,8]])
arr_csc = sparse.csc_matrix(arr)
I would like to efficiently get the top n items of each row, without converting the sparse matrix to dense.
The end result should look like this (assuming n=2):
top_n_arr = np.array([[0,5,3,0,0],[6,0,0,9,0],[0,0,0,6,8]])
top_n_arr_csc = sparse.csc_matrix(top_n_arr)
What is wrong with the linked answer? Does it not work in your case? or you just don't understand it? Or it isn't efficient enough?
I was going to suggest working out a means of finding the top values for a row of a lil format matrix, and applying that row by row. But I would just be repeating my earlier answer.
OK, my previous answer was a start, but lacked some details on iterating through the lil format. Here's a start; it could probably be cleaned up.
Make the array, and a lil version:
In [42]: arr = np.array([[0,5,3,0,2],[6,0,4,9,0],[0,0,0,6,8]])
In [43]: arr_sp=sparse.csc_matrix(arr)
In [44]: arr_ll=arr_sp.tolil()
The row function from the previous answer:
def max_n(row_data, row_indices, n):
    i = row_data.argsort()[-n:]            # positions of the n largest values in the row
    # i = row_data.argpartition(-n)[-n:]   # alternative: partial sort, needs at least n values
    top_values = row_data[i]
    top_indices = row_indices[i] # do the sparse indices matter?
    return top_values, top_indices, i
Iterate over the rows of arr_ll, apply this function and replace the elements:
In [46]: for i in range(arr_ll.shape[0]):
   ....:     d,r=max_n(np.array(arr_ll.data[i]),np.array(arr_ll.rows[i]),2)[:2]
   ....:     arr_ll.data[i]=d.tolist()
   ....:     arr_ll.rows[i]=r.tolist()
   ....:
In [47]: arr_ll.data
Out[47]: array([[3, 5], [6, 9], [6, 8]], dtype=object)
In [48]: arr_ll.rows
Out[48]: array([[2, 1], [0, 3], [3, 4]], dtype=object)
In [49]: arr_ll.tocsc().A
Out[49]:
array([[0, 5, 3, 0, 0],
       [6, 0, 0, 9, 0],
       [0, 0, 0, 6, 8]])
In the lil format, the data is stored in 2 object-dtype arrays of sublists, one holding the data values, the other the column indices.
Viewing the data attributes of a sparse matrix is handy when doing new things. Changing those attributes carries some risk, since it can mess up the whole matrix. But it looks like the lil format can be tweaked like this safely.
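A self-contained version of the row-by-row lil approach, packaged as a function; top_n_per_row_lil is a hypothetical name. One deliberate change, which is my assumption rather than something the answer does: the kept positions are sorted before being written back, so each lil row keeps ascending column indices (the loop above stores [2, 1] for the first row), because as far as I know lil lookups assume sorted rows.

```python
import numpy as np
from scipy import sparse

def top_n_per_row_lil(mat, n):
    """Keep the n largest stored values in each row (sketch using the lil layout)."""
    ll = sparse.lil_matrix(mat, copy=True)
    for r in range(ll.shape[0]):
        data = np.array(ll.data[r])
        cols = np.array(ll.rows[r], dtype=int)
        order = data.argsort()
        # positions of the top n values, re-sorted so column indices stay ascending
        keep = np.sort(order[max(data.size - n, 0):])
        ll.data[r] = data[keep].tolist()
        ll.rows[r] = cols[keep].tolist()
    return ll.tocsc()

arr = np.array([[0, 5, 3, 0, 2], [6, 0, 4, 9, 0], [0, 0, 0, 6, 8]])
top = top_n_per_row_lil(sparse.csc_matrix(arr), 2)
print(top.toarray())
```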
The csr format is better for accessing rows than csc. Its data is stored in 3 arrays: data, indices and indptr. The lil format effectively splits 2 of those arrays into sublists based on the information in indptr. csr is great for math (multiplication, addition, etc.), but not so good for changing the sparsity (turning nonzero values into zeros).
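Because csr keeps each row as a contiguous slice of data delimited by indptr, the same top-n selection can also be sketched without going through lil: zero out the unwanted values in place, then call eliminate_zeros() once so the sparsity pattern changes in a single pass. top_n_per_row_csr is a hypothetical helper, and this is one possible approach, not the method of the linked answer. Only stored entries are candidates, so a row with n or fewer stored values is left as it is.

```python
import numpy as np
from scipy import sparse

def top_n_per_row_csr(mat, n):
    """Sketch: keep the n largest stored values per row, n >= 0, working on csr arrays."""
    csr = sparse.csr_matrix(mat, copy=True)
    for r in range(csr.shape[0]):
        start, stop = csr.indptr[r], csr.indptr[r + 1]
        row_vals = csr.data[start:stop]        # a view into csr.data
        if row_vals.size > n:
            # positions of everything except the n largest values
            drop = row_vals.argsort()[:row_vals.size - n]
            row_vals[drop] = 0
    csr.eliminate_zeros()                      # remove the zeroed entries from the structure
    return csr

arr = np.array([[0, 5, 3, 0, 2], [6, 0, 4, 9, 0], [0, 0, 0, 6, 8]])
top = top_n_per_row_csr(sparse.csc_matrix(arr), 2)
print(top.toarray())
```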

Resources