Checking "Less than" with a potential very small value in Python - python-3.x

Good morning,
I had to find the index (i, j) of the greatest value in a matrix of size m x n. The following code works and passes all the tests:
def biggest_value(matrix, rows, columns):
    max_num = -99999999999999999999
    index = 0
    for i in range(0, rows):
        for j in range(0, columns):
            if max_num < matrix[i][j]:
                max_num = matrix[i][j]
                index = (i, j)
    return index
However, since in some of the tests the input is very small (e.g. -97969584948693858938939848), I was wondering how I could implement this function in a better way, so that it covers any negative value the matrix argument could contain.
Many thanks!

Using numpy's unravel_index and argmax:
import numpy as np
from numpy import unravel_index
arr = np.array([[1,23,2,24], [3,45,21,30]])
print(unravel_index(arr.argmax(), arr.shape))
# (1, 1)

You can use math.inf:
import math
def biggest_value(matrix, rows, columns):
    max_num = -math.inf
    index = 0
    for i in range(0, rows):
        for j in range(0, columns):
            if max_num < matrix[i][j]:
                max_num = matrix[i][j]
                index = (i, j)
    return index
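Another option, not taken from the answers above, is to avoid a sentinel altogether and start the comparison from the first element, so no value can be "too small". This is a minimal sketch that assumes a non-empty list of lists; the name biggest_value_index is hypothetical.

```python
def biggest_value_index(matrix):
    """Return (i, j) of the largest entry; assumes matrix is a non-empty list of non-empty rows."""
    best_i, best_j = 0, 0  # start from the first element instead of a sentinel
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value > matrix[best_i][best_j]:
                best_i, best_j = i, j
    return (best_i, best_j)


sample = [[-97969584948693858938939848, -5], [-3, -10**30]]
print(biggest_value_index(sample))
```

Because Python integers have arbitrary precision, the comparison works for values of any magnitude, and the row and column counts no longer need to be passed in.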

Related

How to vectorize a function of two matrices in numpy?

Say, I have a binary (adjacency) matrix A of dimensions nxn and another matrix U of dimensions nxl. I use the following piece of code to compute a new matrix that I need.
import numpy as np
from numpy import linalg as LA
# The original snippet showed only the body; the function name is illustrative.
def compute_new_U(A, U):
    new_U = np.zeros_like(U)
    for idx, a in np.ndenumerate(A):
        diff = U[idx[0], :] - U[idx[1], :]
        if a == 1.0:
            new_U[idx[0], :] += 2 * diff
        elif a == 0.0:
            norm_diff = LA.norm(U[idx[0], :] - U[idx[1], :])
            new_U[idx[0], :] += -2 * diff * np.exp(-norm_diff**2)
    return new_U
This takes quite a lot of time to run even when n and l are small. Is there a better way to rewrite (vectorize) this code to reduce the runtime?
Edit 1: Sample input and output.
A = np.array([[0,1,0], [1,0,1], [0,1,0]], dtype='float64')
U = np.array([[2,3], [4,5], [6,7]], dtype='float64')
new_U = np.array([[-4.,-4.], [0,0],[4,4]], dtype='float64')
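Edit 2: In mathematical notation, reading the update off the loop above, I am trying to compute the following:
$$\text{new\_U}[i,k] = \sum_{j:\,(i,j)\in E} 2\,(u_{ik}-u_{jk}) \;-\; \sum_{j:\,(i,j)\notin E} 2\,(u_{ik}-u_{jk})\,\exp\!\left(-\lVert u_i-u_j\rVert^2\right)$$
where u_ik = U[i, k], u_jk = U[j, k], and u_i = U[i, :]. Also, (i,j) \in E corresponds to a == 1.0 in the code.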
Leveraging broadcasting and np.einsum for the sum-reductions:
# Get pair-wise differences between rows for all rows in a vectorized manner
Ud = U[:,None,:]-U
# Compute Euclidean (L2) norms of those differences
L = LA.norm(Ud,axis=2)
# Compute 2 * diff values for all rows, mask them with the ==1 condition
# and sum along axis=1 to simulate the accumulating behaviour
p1 = np.einsum('ijk,ij->ik',2*Ud,A==1.0)
# Similarly, compute for the ==0 condition and finally sum those two parts
p2 = np.einsum('ijk,ij,ij->ik',-2*Ud,np.exp(-L**2),A==0.0)
out = p1+p2
Alternatively, use einsum to compute the squared-norm values directly and use those to get p2:
Lsq = np.einsum('ijk,ijk->ij',Ud,Ud)
p2 = np.einsum('ijk,ij,ij->ik',-2*Ud,np.exp(-Lsq),A==0.0)
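One way to gain confidence in a vectorized rewrite like this is to run it next to the loop on the sample input from the question. The sketch below does that; the function names new_U_loop and new_U_vectorized are hypothetical, and the masks are cast to floats explicitly as a precaution rather than because the answer requires it.

```python
import numpy as np


def new_U_loop(A, U):
    """Reference implementation: the explicit loop from the question."""
    out = np.zeros_like(U)
    for (i, j), a in np.ndenumerate(A):
        diff = U[i] - U[j]
        if a == 1.0:
            out[i] += 2 * diff
        elif a == 0.0:
            out[i] += -2 * diff * np.exp(-np.dot(diff, diff))
    return out


def new_U_vectorized(A, U):
    """Broadcasting + einsum version of the same computation."""
    Ud = U[:, None, :] - U                      # Ud[i, j] = U[i] - U[j]
    Lsq = np.einsum('ijk,ijk->ij', Ud, Ud)      # squared Euclidean norms
    edge = (A == 1.0).astype(U.dtype)           # mask for a == 1
    non_edge = (A == 0.0).astype(U.dtype)       # mask for a == 0
    p1 = np.einsum('ijk,ij->ik', 2 * Ud, edge)
    p2 = np.einsum('ijk,ij->ik', -2 * Ud, np.exp(-Lsq) * non_edge)
    return p1 + p2


A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype='float64')
U = np.array([[2, 3], [4, 5], [6, 7]], dtype='float64')
print(np.allclose(new_U_loop(A, U), new_U_vectorized(A, U)))
```

Note that the intermediate array Ud has shape (n, n, l), so memory use grows quadratically with n.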

Does SciPy.sparse.linalg.svds give matrix rank?

I have a largish sparse binary-valued rectangular matrix, M, where n > m. My understanding of matrix rank suggests the largest possible rank is m, and my understanding of SVD suggests the rank of a matrix can be found by identifying the number of non-zero singular values.
I'm attempting to use SciPy.sparse.linalg.svds to determine the rank of M. First problem is that I cannot compute m singular values since k can only go up to p = m - 1. So I thought I'd be clever and compute p highest values, the p lowest values, combine them, run set to find the unique values, and end up with a list of at most m values. This didn't work out according to plan.
Here's a MWE:
import scipy.sparse
import scipy.sparse.linalg
import numpy
import itertools
m = 6
n = 10
test = scipy.sparse.rand(m, n, density=0.25, format='lil', dtype=None, random_state=None)
for i, j in itertools.product(list(range(m)), list(range(n))):
    test[i, j] = 1 if test[i, j] > 0 else 0
U1, S1, VT1 = scipy.sparse.linalg.svds(test, k = min(test.shape) - 1, ncv = None, tol = 1e-5, which = 'LM', v0 = None, maxiter = None,
                                       return_singular_vectors = True)
U2, S2, VT2 = scipy.sparse.linalg.svds(test, k = min(test.shape) - 1, ncv = None, tol = 1e-5, which = 'SM', v0 = None, maxiter = None,
                                       return_singular_vectors = True)
S = list(set(numpy.concatenate((S1, S2), axis = 0)))
len(S)
Here's a sample output:
10
with S being
[0.5303120147925737,
1.0725314055439354,
2.7940865631779643,
1.5060744813473148,
1.8412737686034186,
0.3208993522030293,
0.5303120147925728,
1.072531405543936,
1.5060744813473153,
1.841273768603419]
How can a m X n matrix with m < n have a rank of n? Are my assumptions above incorrect, or am I misapplying the function? My real M is sparse, binary-valued, and roughly 300 X 500.
Thanks for looking!
With help from @tch I've come up with the following hack. To check for rank = m, I only need to check the smallest singular value and append it to the m - 1 values obtained from the svds largest-values call. It turns out svds doesn't report 0s when thresholded, so the smallest-values call will return nan for rank < m. Here's the revised code:
import scipy.sparse
import scipy.sparse.linalg
import numpy
import itertools
m = 6
n = 10
test = scipy.sparse.rand(m, n, density=0.25, format='lil', dtype=None, random_state=None)
test = test > 0
test = test.astype('d')
U1, S1, VT1 = scipy.sparse.linalg.svds(test, k = min(test.shape) - 1, ncv = None, tol = 1e-5, which = 'LM', v0 = None, maxiter = None,
                                       return_singular_vectors = True)
U2, S2, VT2 = scipy.sparse.linalg.svds(test, k = 1, ncv = None, tol = 1e-5, which = 'SM', v0 = None, maxiter = None,
                                       return_singular_vectors = True)
S = list(set(numpy.concatenate((S1, S2), axis = 0)))
print(sum(x > 1e-10 for x in S))
S
What you are trying to do would work in exact arithmetic (assuming the matrix has no repeated singular values). However, due to numerical rounding errors, it won't work in practice.
To see this, try
import numpy as np
C = np.random.randn(10,3)
u,s,vt = np.linalg.svd(C@C.T)
Note that C@C.T is a 10x10 matrix with rank 3. However, you will see that none of the singular values are exactly zero (although 7 are close to 0).
When finding the rank of a matrix numerically, thresholding is often used to decide what it means for a singular value to be 0. For instance, everything below 1e-10 may be set to zero.
If the matrix has exact rank k, hopefully you will see k singular values away from 0, and then min(m,n)-k singular values very close to zero. However, depending on the matrix, there may not even be a well-defined "drop".
So for your example, you could try removing elements which are within some threshold of one another. However, this could of course run into issues if the matrix has repeated singular values.
You could also just compute the smallest singular values and see how many come out near zero. Presumably the matrix has rank at least 1, so the largest singular value will be nonzero.
As a note about finding where test[i,j] > 0, you can just do test > 0 and it will give a boolean array with True in the nonzero entries and False elsewhere. You can also set the dtype of the random matrix to bool and it will be True wherever the random number is nonzero.
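The thresholding idea can be sketched with dense routines. This is an illustration, not the sparse workflow from the question: it assumes the matrix is small enough to hold densely (a matrix of roughly 300 x 500 usually is), and the tolerance shown is the same rule that np.linalg.matrix_rank applies by default.

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.standard_normal((10, 3))
M = C @ C.T                      # 10x10 matrix of rank 3

# Singular values only; none of them is exactly zero in floating point.
s = np.linalg.svd(M, compute_uv=False)

# Treat singular values below a relative tolerance as zero.
tol = s.max() * max(M.shape) * np.finfo(M.dtype).eps
rank = int(np.sum(s > tol))

print(rank, np.linalg.matrix_rank(M))
```

For a scipy.sparse matrix, calling its toarray() method first would give the dense array this sketch expects.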

simpson integration on python

I am trying to integrate f(x) = 2x numerically from 0 to 1 using Simpson's rule, but I keep getting a large error. The desired output is 1, but the output from Python is 1.334. Can someone help me find a solution to this problem?
Thank you.
import numpy as np
def f(x):
    return 2*x
def simpson(f,a,b,n):
    x = np.linspace(a,b,n)
    dx = (b-a)/n
    for i in np.arange(1,n):
        if i % 2 != 0:
            y = 4*f(x)
        elif i % 2 == 0:
            y = 2*f(x)
    return (f(a)+sum(y)+f(x)[-1])*dx/3
a = 0
b = 1
n = 1000
ans = simpson(f,a,b,n)
print(ans)
Almost everything here is wrong. x is an array, so every time you call f(x) you evaluate the function over the whole array. Nothing is accumulated in the loop: y is overwritten on every pass, and since n is even and n-1 is odd, the last pass leaves y = 4*f(x), whose sum is what enters the result.
Also, n is the number of segments, so the number of points is n+1. A correct implementation is
def simpson(f,a,b,n):
    x = np.linspace(a,b,n+1)
    y = f(x)
    dx = x[1]-x[0]
    return (y[0]+4*sum(y[1::2])+2*sum(y[2:-1:2])+y[-1])*dx/3
simpson(lambda x:2*x, 0, 1, 1000)
which then correctly returns 1.000. You might want to add a test that n is even, and increase it by one if that is not the case.
If you really want to keep the loop, you need to actually accumulate the sum inside the loop.
def simpson(f,a,b,n):
    dx = (b-a)/n
    res = 0
    for i in range(1,n):
        res += f(a+i*dx)*(2 if i%2==0 else 4)
    return (f(a)+f(b) + res)*dx/3
simpson(lambda x:2*x, 0, 1, 1000)
But loops are generally slower than vectorized operations, so if you use numpy, use vectorized operations. Or just use scipy.integrate.simpson directly (named simps in older SciPy releases).
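The suggested check that n is even could look like the following. This is a sketch rather than the answerer's code: it reuses the name simpson from the answer and simply rounds an odd n up to the next even number.

```python
import numpy as np


def simpson(f, a, b, n):
    """Composite Simpson's rule on n segments; an odd n is bumped to n + 1."""
    if n % 2:
        n += 1                       # Simpson's rule needs an even number of segments
    x = np.linspace(a, b, n + 1)     # n segments means n + 1 points
    y = f(x)
    dx = (b - a) / n
    # Weights: 1 at the ends, 4 at odd interior points, 2 at even interior points.
    return (y[0] + 4 * y[1:-1:2].sum() + 2 * y[2:-1:2].sum() + y[-1]) * dx / 3


print(simpson(lambda x: 2 * x, 0, 1, 1000))
print(simpson(lambda x: x ** 3, 0, 2, 7))
```

Simpson's rule is exact for polynomials up to degree three, so both calls should agree with the exact integrals (1 and 4) up to rounding.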

Finding a Dot Product without using np.dot or loops in Python

I need to write a function which:
Receives - two numpy.array objects
Returns - the floating-point dot product of the two input numpy arrays
Not allowed to use:
numpy.dot()
loops of any kind
Any suggestions?
A possible solution makes use of recursion:
import numpy as np
def multiplier(first_vector, second_vector, size, index, total):
    if index < size:
        addendum = first_vector[index]*second_vector[index]
        total = total + addendum
        index = index + 1
    # ongoing job
    if index < size:
        multiplier(first_vector, second_vector, size, index, total)
    # job done
    else:
        print("dot product = " + str(total))
def main():
    a = np.array([1.5, 2, 3.7])
    b = np.array([3, 4.3, 5])
    print(a, b)
    i = 0
    total_sum = 0
    # check needed if the arrays are not hardcoded
    if a.size == b.size:
        multiplier(a, b, a.size, i, total_sum)
    else:
        print("impossible dot product for arrays with different size")
if __name__ == "__main__":
    main()
Probably considered cheating, but Python 3.5 added a matrix multiplication operator (@) that numpy uses to compute the dot product without actually calling np.dot:
>>> import numpy as np
>>> arr1 = np.array([1,2,3])
>>> arr2 = np.array([3,4,5])
>>> arr1 @ arr2
26
Problem solved!
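A further option that stays within the stated constraints is to multiply the arrays element-wise and sum the result. This is a minimal sketch for 1-D inputs; the function name dot_product and the shape check are additions for illustration.

```python
import numpy as np


def dot_product(a, b):
    """Dot product of two 1-D arrays without np.dot or explicit loops."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("arrays must have the same shape")
    return float(np.sum(a * b))   # element-wise product, then sum


print(dot_product(np.array([1.5, 2, 3.7]), np.array([3, 4.3, 5])))
```

The cast to float makes the return value a plain Python float, which matches the "floating-point" requirement in the question.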

Smoothing values (neighbors between 1-9)

Instructions: Compute and store R=1000 random values from 0-1 as x. moving_window_average(x, n_neighbors) is pre-loaded into memory from 3a. Compute the moving window average for x for the range of n_neighbors 1-9. Store x as well as each of these averages as consecutive lists in a list called Y.
My solution:
R = 1000
n_neighbors = 9
x = [random.uniform(0,1) for i in range(R)]
Y = [moving_window_average(x, n_neighbors) for n_neighbors in range(1,n_neighbors)]
where moving_window_average(x, n_neighbors) is a function as follows:
def moving_window_average(x, n_neighbors=1):
    n = len(x)
    width = n_neighbors*2 + 1
    x = [x[0]]*n_neighbors + x + [x[-1]]*n_neighbors
    # To complete the function,
    # return a list of the mean of values from i to i+width for all values i from 0 to n-1.
    mean_values=[]
    for i in range(1,n+1):
        mean_values.append((x[i-1] + x[i] + x[i+1])/width)
    return (mean_values)
This gives me the error "Check your usage of Y again." Even though I've tested a few values, I still don't see what the problem is with this exercise. Did I misunderstand something?
The instructions tell you to compute moving averages for all n_neighbors values from 1 to 9, and to store x itself first. So the code below should work:
import random
random.seed(1)
R = 1000
x = []
for i in range(R):
    num = random.uniform(0,1)
    x.append(num)
Y = []
Y.append(x)
for i in range(1,10):
    mov_avg = moving_window_average(x, n_neighbors=i)
    Y.append(mov_avg)
Actually, your moving_window_average(x, n_neighbors) function is not going to work with n_neighbors bigger than one. The interpreter won't complain, but the results are not what the exercise asks for: the loop always adds just three values, whatever the window width is.
I suggest using something like:
def moving_window_average(x, n_neighbors=1):
    n = len(x)
    width = n_neighbors*2 + 1
    x = [x[0]]*n_neighbors + x + [x[-1]]*n_neighbors
    mean_values = []
    for i in range(n):
        temp = x[i: i+width]
        sum_ = 0
        for elm in temp:
            sum_ += elm
        mean_values.append(sum_ / width)
    return mean_values
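A small hand-checkable input makes it easy to see whether a window average respects the width. The sketch below is a compact variant of the function above that uses the built-in sum; the three-element x is a made-up example, not the exercise data.

```python
def moving_window_average(x, n_neighbors=1):
    """Mean over a window of 2*n_neighbors + 1 values, padding the ends with the edge values."""
    n = len(x)
    width = 2 * n_neighbors + 1
    padded = [x[0]] * n_neighbors + list(x) + [x[-1]] * n_neighbors
    return [sum(padded[i:i + width]) / width for i in range(n)]


x = [1.0, 2.0, 3.0]
# x itself first, then the averages for n_neighbors = 1..9, as the exercise asks.
Y = [x] + [moving_window_average(x, k) for k in range(1, 10)]

print(len(Y))
print(Y[1])
```

With n_neighbors=1 the padded list is [1, 1, 2, 3, 3], so the averages are 4/3, 2 and 8/3, which can be verified by hand.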
My solution for +100XP:
import random
random.seed(1)
R=1000
Y = list()
x = [random.uniform(0, 1) for num in range(R)]
for n_neighbors in range(10):
    Y.append(moving_window_average(x, n_neighbors))

Resources