Matrix Position - python-3.x

I need to figure out how to return the matrix position of the largest value in a given matrix. For example:
[[1,2,3],
[4,5,6],
[7,8,9]]
A simple method of finding the maximum of the matrix would be:
maximum = max(max(matrix))
return maximum
For this matrix, the maximum is the int value: 9.
However, I am slightly lost when it comes to finding the value's exact matrix position. I know that in matrices the upper-left corner is considered (0,0) and the indices (i,j) (given that i,j ∈ int) are incremented by one for each position further from (0,0): i increases horizontally and j increases vertically.
The correct output for this matrix should be (2,2).
Any pointers?

Using numpy:
import numpy as np
mat = np.arange(1,10).reshape(3,3) # Creates your example matrix (values 1 through 9)
maxVal = np.amax(mat) # Returns 9 for your example
locMax = [np.where(mat == maxVal)[0][0],np.where(mat == maxVal)[1][0]] # Returns (2,2) as list
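Note that max(max(matrix)) on a list of lists only works by coincidence for the example: max(matrix) picks the lexicographically largest row, not the row holding the largest value (for [[1, 100], [2, 0]] it returns 2). A minimal sketch of a more direct route, combining np.argmax with np.unravel_index; argmax_position is a hypothetical helper name, and the result is given as (row, column):

```python
import numpy as np

def argmax_position(matrix):
    """Return (row, column) of the first occurrence of the largest value."""
    arr = np.asarray(matrix)
    # argmax works on the flattened array; unravel_index maps the flat
    # index back to one index per dimension.
    row, col = np.unravel_index(np.argmax(arr), arr.shape)
    return int(row), int(col)

print(argmax_position([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

If the maximum occurs more than once, this returns only the first position in row-major order.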

Related

What's a potentially better algorithm to solve this python nested for loop than the one I'm using?

I have a nested loop that has to loop through a huge amount of data.
Assume a data frame of random values with 1,000,000 rows, each holding an X,Y location in 2D space. A window of length 10 slides over all 1M rows one at a time until all the calculations are done.
Explaining what the code is supposed to do:
Each row represents a coordinate in the X-Y plane.
r_test contains the diameters of the different circles of investigation in our 2D plane (X-Y plane).
For each window of 10 points/rows, and for every diameter in r_test, we compare the distance between every point and the remaining 9 points, and if the value is less than that diameter we add 1 to H. Then we calculate H/(N**2) and store it in c_10 at the index corresponding to that diameter of investigation.
Once the loop has gone through all the diameters in r_test for these first 10 points, we read the slope of the fitted line and save it to S_wind[ii]. The first 10 data points therefore have no value calculated for them, so they are left as np.inf to be distinguished later.
Then the window moves one point down the rows and repeats this process until S_wind is completed.
What's a potentially better algorithm to solve this than the one I'm using, in Python 3.x?
Many thanks in advance!
import numpy as np
import pandas as pd
####generating input data frame
df = pd.DataFrame(data = np.random.randint(2000, 6000, (1000000, 2)))
df.columns= ['X','Y']
####====creating upper and lower bound for the diameter of the investigation circles
x_range =max(df['X']) - min(df['X'])
y_range = max(df['Y']) - min(df['Y'])
R = max(x_range,y_range)/20
d = 2
N = 10 #### Number of points in each window
#r1 = 2*R*(1/N)**(1/d)
#r2 = (R)/(1+d)
#r_test = np.arange(r1, r2, 0.05)
##===avoiding generation of empty r_test
r1 = 80
r2= 800
r_test = np.arange(r1, r2, 5)
S_wind = np.zeros(len(df['X'])) + np.inf
for ii in range (10,len(df['X'])): #### maybe the code run slower because of using len() function instead of a number
    c_10 = np.zeros(len(r_test)) +np.inf
    H = 0
    C = 0
    N = 10 ##### maybe I should also remove this
    for ind in range(len(r_test)):
        for i in range (ii-10,ii):
            for j in range(ii-10,ii):
                dd = r_test[ind] - np.sqrt((df['X'][i] - df['X'][j])**2+ (df['Y'][i] - df['Y'][j])**2)
                if dd > 0:
                    H += 1
        c_10[ind] = (H/(N**2))
    S_wind[ii] = np.polyfit(np.log10(r_test), np.log10(c_10), 1)[0]
You can use numpy broadcasting to eliminate all of the inner loops. I'm not sure if there's an easy way to get rid of the outermost loop, but the others are not too hard to avoid.
The inner loops are comparing ten 2D points against each other in pairs. That's just dying for using a 10x10x2 numpy array:
# replacing the `for ind` loop and its contents:
points = np.hstack((np.asarray(df['X'])[ii-10:ii, None], np.asarray(df['Y'])[ii-10:ii, None]))
differences = np.subtract(points[None, :, :], points[:, None, :]) # broadcast to 10x10x2
squared_distances = (differences * differences).sum(axis=2)
within_range = squared_distances[None,:,:] < (r_test*r_test)[:, None, None] # compare squares
c_10 = within_range.sum(axis=(1,2)).cumsum() * 2 / (N**2)
S_wind[ii] = np.polyfit(np.log10(r_test), np.log10(c_10), 1)[0] # this is unchanged...
I'm not very pandas savvy, so there's probably a better way to get the X and Y values into a single 2-dimensional numpy array. You generated the random data in the format that I'd find most useful, then converted into something less immediately useful for numeric operations!
Note that this code matches the output of your loop code. I'm not sure that's actually doing what you want it to do, as there are several slightly strange things in your current code. For example, you may not want the cumsum in my code, which corresponds to only re-initializing H to zero in the outermost loop. If you don't want the matches for smaller values of r_test to be counted again for the larger values, you can skip that sum (or equivalently, move the H = 0 line to in between the for ind and the for i loops in your original code).
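Putting the pieces together, the whole sliding-window loop might look like the sketch below. It assumes the coordinates are already in an (n, 2) NumPy array (for example via df[['X', 'Y']].to_numpy()), keeps the cumulative count of the original loop, and leaves out the constant factor of 2, since scaling c_10 by a constant only shifts log10(c_10) and does not change the fitted slope. The name window_slopes is hypothetical.

```python
import numpy as np

def window_slopes(xy, r_test, window=10):
    """Slope of log10(c) against log10(r) for each sliding window of points.

    xy     : (n, 2) array of X, Y coordinates
    r_test : 1-D array of increasing test diameters
    Entries for the first `window` rows stay at np.inf, as in the loop version.
    """
    xy = np.asarray(xy, dtype=float)
    n = len(xy)
    slopes = np.full(n, np.inf)
    r_squared = (r_test * r_test)[:, None, None]   # shape (len(r_test), 1, 1)
    log_r = np.log10(r_test)
    for ii in range(window, n):
        pts = xy[ii - window:ii]                   # (window, 2)
        diff = pts[None, :, :] - pts[:, None, :]   # (window, window, 2)
        sq_dist = (diff * diff).sum(axis=2)        # squared pairwise distances
        within = sq_dist[None, :, :] < r_squared   # one boolean matrix per diameter
        # cumsum reproduces H never being reset between diameters
        counts = within.sum(axis=(1, 2)).cumsum()
        c = counts / window**2
        slopes[ii] = np.polyfit(log_r, np.log10(c), 1)[0]
    return slopes

rng = np.random.default_rng(0)
demo_xy = rng.integers(2000, 6000, size=(50, 2))
demo_slopes = window_slopes(demo_xy, np.arange(80, 800, 5))
```

The outer loop is still plain Python, so this is a sketch of the broadcasting idea rather than a fully vectorized solution.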

kth-value per row in pytorch?

Given
import torch
A = torch.rand(9).view((3,3)) # tensor([[0.7455, 0.7736, 0.1772],\n[0.6646, 0.4191, 0.6602],\n[0.0818, 0.8079, 0.6424]])
k = torch.tensor([0,1,0])
A.kthvalue_vectorized(k) -> [0.1772,0.6602,0.0818]
Meaning I would like to operate on each row with a different k.
Not kthvalue nor topk offers such API.
Is there a vectorized way around that?
Remark - kth value is not the value in the kth index, but the kth smallest element. Pytorch docs
torch.kthvalue(input, k, dim=None, keepdim=False, out=None) -> (Tensor, LongTensor)
Returns a namedtuple (values, indices) where values is the k th smallest element of each row of the input tensor in the given dimension dim. And indices is the index location of each element found.
Assuming you don't need indices into original matrix (if you do, just use fancy indexing for the second return value as well) you could simply sort the values (by last index by default) and return appropriate values like so:
def kth_smallest(tensor, indices):
    tensor_sorted, _ = torch.sort(tensor)
    return tensor_sorted[torch.arange(len(indices)), indices]
And this test case gives you your desired values:
tensor = torch.tensor(
[[0.7455, 0.7736, 0.1772], [0.6646, 0.4191, 0.6602], [0.0818, 0.8079, 0.6424]]
)
print(kth_smallest(tensor, [0, 1, 0])) # -> [0.1772,0.6602,0.0818]
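If the positions in the original matrix are needed as well, the same idea extends to the second return value of torch.sort. A minimal sketch, assuming a 2-D input and zero-based k as in the question; kth_smallest_with_indices is a hypothetical name, and gather is used here in place of fancy indexing:

```python
import torch

def kth_smallest_with_indices(tensor, k):
    """Per-row k-th smallest value (zero-based k) and its original column index."""
    sorted_values, sorted_indices = torch.sort(tensor, dim=-1)
    k = torch.as_tensor(k, dtype=torch.long).unsqueeze(-1)   # shape (rows, 1)
    values = sorted_values.gather(-1, k).squeeze(-1)
    indices = sorted_indices.gather(-1, k).squeeze(-1)
    return values, indices

A = torch.tensor(
    [[0.7455, 0.7736, 0.1772], [0.6646, 0.4191, 0.6602], [0.0818, 0.8079, 0.6424]]
)
values, indices = kth_smallest_with_indices(A, [0, 1, 0])
print(values, indices)
```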

How to calculate two orthogonal points of a line?

I need two new points that lie on a line through point 1, orthogonal to the line given by the two points in "coords", at a distance of s meters in one direction and minus s in the other.
I have tried to reuse results from here and here, but both examples are somewhat different. They state that I should work with the vector along the line, and that the orthogonal of a line with slope m has slope -1/m, so a new point is given by y = (-1/m)x + b
import math as m
coords=([5,5], [5,6])
print (coords)
x1,y1=coords[0]
x2,y2=coords[1]
s= 5
veclen= m.sqrt(m.pow(x2-x1,2)+m.pow(y2-y1,2))
u=(x2-x1)/veclen
v=(y2-y1)/veclen
print ("u,v:", u,v)
dir1 = (v, -u)
dir2 = (-v, u)
newpoint1=(x1+ s*dir1[0], y1+ s*dir1[1])
newpoint2=(x1+ s*dir2[0], y1+ s*dir2[1])
print (newpoint1, newpoint2)
xn,yn=newpoint1
dist = m.hypot(xn-x1, yn-y1)
print (dist)
This is maybe the right direction, but somehow I do not understand the derived vector 1 (v) and the orthogonal vector (v2) and how to add from point x1,y1 the distance s. Should the vector 1 not be (1,1), as in +1 in x-direction and +1 in y-direction? And likewise the orthogonal vector 2 (1, -1) as in +1 in x and -1 in y?
And is the calculation of both newpoints correct?
I will assume that this is the problem, in one sentence:
Code a routine that is given a tuple coords containing two 2-dimensional points and also given a positive number s, and the routine returns two other distinct points such that the line segment between each output point and coords[0] is orthogonal (perpendicular) to the line segment between coords[0] and coords[1] and the distance from each output point to coords[0] is s.
Now for your questions.
The 2-tuple v represents the vector of length one (the "unit vector") that is parallel to the vector from point coords[0] to point coords[1]. It is found by first subtracting the coordinates of the two points in coords, but that vector will probably have the wrong length. So your code beforehand found the length of that vector in variable l (a terrible name for a variable) and divides the vector by l. Mathematics tells us that the resulting vector is parallel to the original vector and has length one.
Your code then tries to find a perpendicular unit vector. It fails in two ways. First, it does not use the unit vector; it uses the original vector instead. Second, the new vector is not necessarily perpendicular. Your code says a vector perpendicular to (u, v) is (-u, v), but actually the perpendicular vector is either (v, -u) or (-v, u)--note the swapped coordinates. This new vector is both perpendicular to the previous vector and has the same length.
Therefore the calculation of the two new points is not correct.
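The perpendicularity claim is easy to check numerically: two vectors are perpendicular exactly when their dot product is zero. A small sketch, where perpendicular_unit_vectors is a hypothetical helper used only for this check:

```python
import math

def perpendicular_unit_vectors(p, q):
    """Return the unit vector from p to q and its two perpendicular unit vectors."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    length = math.hypot(dx, dy)
    u, v = dx / length, dy / length
    # Swapping the coordinates and negating one of them rotates by 90 degrees.
    return (u, v), (v, -u), (-v, u)

direction, perp1, perp2 = perpendicular_unit_vectors((1, 2), (4, 6))
dot1 = direction[0] * perp1[0] + direction[1] * perp1[1]
dot2 = direction[0] * perp2[0] + direction[1] * perp2[1]
print(dot1, dot2, math.hypot(*perp1))
```

By contrast, the dot product of (u, v) with (-u, v) is v*v - u*u, which is zero only when |u| equals |v|.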
I have answered your given questions--let me know if you need code that actually does what you want. Note that you should improve your code by using longer, descriptive variable names and comments and by wrapping up the code into a function. The function should return the points, while the calling routine could print the results.
Here is my code that satisfies your problem. I reduced the amount of printing as checks--you can print more checks, if you like. I also combined some lines, since too many separate computation lines can worsen the accuracy of floating-point calculations. I never compute a unit vector--I go straight to a vector of the desired length.
import math
def orthogonal_points(coords, s):
    """Given a tuple coords containing two 2-dimensional points and also
    given a positive number s, return two other distinct points such
    that the line segment between each output point and coords[0] is
    orthogonal (perpendicular) to the line segment between coords[0] and
    coords[1] and the distance from each output point to coords[0] is s.
    """
    (point1x, point1y), (point2x, point2y) = coords
    points_vectorx, points_vectory = point2x - point1x, point2y - point1y
    points_vector_length = math.hypot(points_vectorx, points_vectory)
    normalized_x, normalized_y = (points_vectorx * s / points_vector_length,
                                  points_vectory * s / points_vector_length)
    newpoint1x, newpoint1y = point1x + normalized_y, point1y - normalized_x
    newpoint2x, newpoint2y = point1x - normalized_y, point1y + normalized_x
    return ([newpoint1x, newpoint1y], [newpoint2x, newpoint2y])
coords=([5,5], [5,6])
s= 5
print (coords, s)
print (orthogonal_points(coords, s))
The output from that is correct:
([5, 5], [5, 6]) 5
([10.0, 5.0], [0.0, 5.0])

How to filter unwanted values in arrays for plotting? ValueError in matplotlib using numpy arrays

I am working on a new routine inside some codes based on OOP, and encountered a problem while modifying the array of the data (short example of the code is below).
Basically, this routine takes the array R, transposes it, sorts it, and then filters out the data below a pre-determined value thres. Then I transpose the array back to its original dimensions and plot each of its rows against the first element of T.
import numpy as np
import matplotlib.pyplot as plt
R = np.random.rand(3,8)
R = R.transpose() # transpose the random matrix
R = R[R[:,0].argsort()] # sort this matrix
print(R)
T = ([i for i in np.arange(1,9,1.0)],"temps (min)")
thres = float(input("Define the threshold of coherence: "))
if thres >= 0.0 and thres <= 1.0 :
    R = R[R[:, 0] >= thres] # how to filter unwanted values? changing to NaN / zeros ?
else :
    print("The coherence value is absurd or you're not giving a number!")
print("The final results are ")
print(R)
print(R.transpose())
R.transpose() # re-transpose this matrix
ax = plt.subplot2grid( (4,1),(0,0) )
ax.plot(T[0],R[0])
ax.set_ylabel('Coherence')
ax = plt.subplot2grid( (4,1),(1,0) )
ax.plot(T[0],R[1],'.')
ax.set_ylabel('Back-azimuth')
ax = plt.subplot2grid( (4,1),(2,0) )
ax.plot(T[0],R[2],'.')
ax.set_ylabel('Velocity\nkm/s')
ax.set_xlabel('Time (min)')
However, I encounter an error
ValueError: x and y must have same first dimension, but have shapes (8,) and (3,)
I have commented the part where I think the problem might reside (how to filter unwanted values?), but the question remains.
How can I plot these two arrays (R and T) while still being able to filter out unwanted values below thres? Can I transform these unwanted values to zero or NaN and then successfully plot them? If yes, how can I do that?
Your help would be much appreciated.
With the help of a techie friend, the problem was resolved simply by keeping this part
R = R[R[:, 0] >= thres]
because removing unwanted elements is preferable to changing them to NaN or zero. The plotting problem is then fixed by a slight modification in this part
ax.plot(T[0][:len(R[0])],R[0])
and likewise in the subsequent plotting calls. This slices T to the same length as R[0].
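If the goal is instead to keep every time sample on the x axis and simply hide the rejected ones, one option is to replace them with NaN, which matplotlib leaves out of the drawn line. A minimal sketch, assuming R is in its original (3, 8) layout with coherence in the first row; mask_below_threshold is a hypothetical helper name:

```python
import numpy as np

def mask_below_threshold(R, thres):
    """Return a float copy of R in which every column (time sample) whose
    first-row value is below thres is replaced by NaN."""
    R = np.asarray(R, dtype=float)
    masked = R.copy()
    masked[:, R[0] < thres] = np.nan
    return masked

R = np.random.rand(3, 8)           # rows: coherence, back-azimuth, velocity
T = np.arange(1, 9, 1.0)           # one time value per column of R
masked = mask_below_threshold(R, 0.5)
# Each row keeps length 8, so ax.plot(T, masked[0]), ax.plot(T, masked[1], '.')
# and so on have matching x and y shapes.
```

Because the array is not sorted or shortened here, each value stays paired with its own time sample.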

Rearranging a 3-D array using indices from sorting?

I have a 3-D array of random numbers of size [channels = 3, height = 10, width = 10].
Then I sorted it using sort command from pytorch along the columns and obtained the indices as well.
The corresponding index is shown below:
Now, I would like to return to the original matrix using these indices. I currently use for loops to do this (without considering the batches). The code is:
import torch
torch.manual_seed(1)
ch = 3
h = 10
w = 10
inp_unf = torch.randn(ch,h,w)
inp_sort, indices = torch.sort(inp_unf,1)
resort = torch.zeros(inp_sort.shape)
for i in range(ch):
    for j in range(inp_sort.shape[1]):
        for k in range (inp_sort.shape[2]):
            temp = inp_sort[i,j,k]
            resort[i,indices[i,j,k],k] = temp
I would like it to be vectorized considering batches as well i.e.input size is [batch, channel, height, width].
Using Tensor.scatter_()
You can directly scatter the sorted tensor back into its original state using the indices provided by sort():
torch.zeros(ch,h,w).scatter_(dim=1, index=indices, src=inp_sort)
The intuition is based on the previous answer below. As scatter() is basically the reverse of gather(), inp_reunf = inp_sort.gather(dim=1, index=reverse_indices) is the same as inp_reunf.scatter_(dim=1, index=indices, src=inp_sort).
Previous answer
Note: while correct, this is probably less performant, as it calls the sort() operation a second time.
You need to obtain the sorting "reverse indices", which can be done by "sorting the indices returned by sort()".
In other words, given x_sort, indices = x.sort(), you have x[indices] -> x_sort ; while what you want is reverse_indices such that x_sort[reverse_indices] -> x.
This can be obtained as follows: _, reverse_indices = indices.sort().
import torch
torch.manual_seed(1)
ch, h, w = 3, 10, 10
inp_unf = torch.randn(ch,h,w)
inp_sort, indices = inp_unf.sort(dim=1)
_, reverse_indices = indices.sort(dim=1)
inp_reunf = inp_sort.gather(dim=1, index=reverse_indices)
print(torch.equal(inp_unf, inp_reunf))
# True
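For the batched case asked about in the question, the scatter_() approach carries over by pointing dim at the height axis of the 4-D tensor. A minimal sketch, assuming input of shape [batch, channel, height, width] sorted along dim=2; unsort is a hypothetical helper name:

```python
import torch

def unsort(sorted_values, indices, dim):
    """Undo torch.sort along `dim`, given the values and indices it returned."""
    # The indices form a permutation along `dim`, so every output element
    # gets written and an uninitialized buffer is safe to scatter into.
    return torch.empty_like(sorted_values).scatter_(dim, indices, sorted_values)

torch.manual_seed(1)
batch = torch.randn(4, 3, 10, 10)                # [batch, channel, height, width]
sorted_batch, sort_indices = batch.sort(dim=2)   # sort along the height axis
restored = unsort(sorted_batch, sort_indices, dim=2)
print(torch.equal(batch, restored))
```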

Resources