NumPy take along multiple axes

I have an N-dimensional array, where N is variable, from which I want to take elements along a given set of axes.
The objective is similar to that of a related question, except that the solution there seems to work only when the dimensions and the axes are fixed.
For example, suppose that from a 3D array we want to extract the elements along axis 0 for every multi-index along the other two axes. If the value of N is known beforehand, this can be hard-coded:
import numpy as np
a = np.arange(12).reshape((2,3,2))
ydim = a.shape[1]
zdim = a.shape[2]
for y in range(ydim):
    for z in range(zdim):
        print(a[:,y,z])
which gives the output
[0 6]
[1 7]
[2 8]
[3 9]
[ 4 10]
[ 5 11]
Q: How can this be achieved when N and the axes are not known beforehand?
For a single axis, numpy.take or numpy.take_along_axis do the job. I am looking for a similar function but for multiple axes. A function, say, take_along_axes() which can be used as follows:
ax = [1,2] ## list of axes from which indices are taken
it = np.nditer(a, op_axes=ax, flags=['multi_index']) ## Every index along those axes (pseudocode)
while not it.finished:
    print(np.take_along_axes(a, it.multi_index, axes=ax)) ## hypothetical function
    it.iternext()
The expected output is the same as the previous one.
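One possible way to get this behaviour without a dedicated NumPy function is to move the axes being iterated over to the front with np.moveaxis and then loop over their multi-indices with np.ndindex. The sketch below assumes that approach; iter_along_axes is a hypothetical helper name, not part of NumPy.

```python
import numpy as np

def iter_along_axes(a, axes):
    """Yield (multi_index, values) for every multi-index over `axes`.

    `values` holds the elements along all remaining axes.
    Hypothetical helper, not a NumPy function.
    """
    axes = tuple(axes)
    # Bring the iterated axes to the front; the remaining axes keep their order.
    moved = np.moveaxis(a, axes, tuple(range(len(axes))))
    for idx in np.ndindex(*moved.shape[:len(axes)]):
        yield idx, moved[idx]

a = np.arange(12).reshape((2, 3, 2))
for idx, values in iter_along_axes(a, [1, 2]):
    print(values)
```

For the 3D example this prints the same six pairs as the hard-coded loop, and the same helper should work for any number of dimensions and any list of distinct axes.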

Related

Finding the mean of a distribution

My code generates a number of distributions (I only plotted one below to make it more legible). The Y axis represents a probability density function, and the X axis is a simple array of values.
In more detail:
Y = [0.02046505 0.10756612 0.24319883 0.30336375 0.22071875 0.0890625 0.015625 0 0 0]
And X is generated using np.arange(0,10,1) = [0 1 2 3 4 5 6 7 8 9]
I want to find the mean of this distribution (i.e. where the curve peaks on the X axis), not the mean of the Y values. I know how to use np.mean to find the mean of Y, but it's not what I need.
By eye, the mean here is about x=3, but I would like to compute it in code to make it more accurate.
Any help would be great.
By definition, the mean (actually, the expected value of a random variable x, but since you have the PDF, you could use the expected value) is sum(p(x[j]) * x[j]), where p(x[j]) is the value of the PDF at x[j]. You can implement this as code like this:
>>> import numpy as np
>>> Y = np.array([0.02046505, 0.10756612, 0.24319883, 0.30336375, 0.22071875, 0.0890625, 0.015625, 0, 0, 0])
>>> Y
array([0.02046505, 0.10756612, 0.24319883, 0.30336375, 0.22071875,
       0.0890625 , 0.015625  , 0.        , 0.        , 0.        ])
>>> X = np.arange(0, 10)
>>> Y.sum()
1.0
>>> (X * Y).sum()
2.92599253
So the (approximate) answer is 2.92599253.
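As a small variation on the same formula, np.average with weights divides by the sum of the weights, so it should also work when Y does not sum exactly to 1. The sketch below additionally computes the position of the peak, which is the mode rather than the mean; the variable names mean and mode are only illustrative.

```python
import numpy as np

X = np.arange(0, 10)
Y = np.array([0.02046505, 0.10756612, 0.24319883, 0.30336375, 0.22071875,
              0.0890625, 0.015625, 0.0, 0.0, 0.0])

# Weighted average of X with weights Y; np.average divides by Y.sum(),
# so it also works when Y is not normalized.
mean = np.average(X, weights=Y)

# The x value at which the curve peaks is the mode, not the mean.
mode = X[np.argmax(Y)]

print(mean)  # about 2.926
print(mode)  # 3
```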

How to generate random points uniformly distributed over 40*40 rectangular region centered at (0,0) using python?

I am trying to generate random points uniformly distributed over a rectangular region centered at (0,0) using numpy and matplotlib.
The phrase "random points uniformly distributed" is not easy to interpret. Here is one interpretation, shown as runnable code. A sample output plot is also given.
import matplotlib.pyplot as plt
import numpy as np
# create array of meshgrid over a rectangular region
# range of x: -cn/2, cn/2
# range of y: -rn/2, rn/2
cn, rn = 10, 14 # number of columns/rows
xs = np.linspace(-cn/2, cn/2, cn)
ys = np.linspace(-rn/2, rn/2, rn)
# meshgrid will give regular array-like located points
Xs, Ys = np.meshgrid(xs, ys) #shape: rn x cn
# create some uncertainties to add as random effects to the meshgrid
mean = (0, 0)
varx, vary = 0.007, 0.008 # adjust these number to suit your need
cov = [[varx, 0], [0, vary]]
uncerts = np.random.multivariate_normal(mean, cov, (rn, cn))
# plot the random-like meshgrid
plt.scatter(Xs+uncerts[:,:,0], Ys+uncerts[:,:,1], color='b');
plt.gca().set_aspect('equal')
plt.show()
You can change the values of varx and vary to change the level of randomness of the dot array on the plot.
As @JohanC mentioned in the comments, you need points with x and y coordinates between -20 and 20. To create them, use:
a = np.random.uniform(-20, 20, size=(n,2))
with n being your desired number of points.
To plot them:
import matplotlib.pyplot as plt
plt.scatter(a[:,0],a[:,1])
plt.show()
[Figure: sample plot for n=100 points]
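The uniform-sampling answer can be wrapped in a small function for rectangles of other sizes or centers. This is a minimal sketch using NumPy's Generator API; sample_rectangle and its parameters are hypothetical names chosen for illustration.

```python
import numpy as np

def sample_rectangle(n, width=40.0, height=40.0, center=(0.0, 0.0), seed=None):
    """Return an (n, 2) array of points drawn uniformly from a rectangle."""
    rng = np.random.default_rng(seed)
    low = np.array(center) - np.array([width, height]) / 2
    high = np.array(center) + np.array([width, height]) / 2
    # Each column is drawn independently: x from [low[0], high[0]), y likewise.
    return rng.uniform(low, high, size=(n, 2))

points = sample_rectangle(100, seed=0)
print(points.shape)
print(points.min(axis=0), points.max(axis=0))
```

The resulting array can be plotted in the same way as above, with plt.scatter(points[:, 0], points[:, 1]).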

How to efficiently scatter plot a numpy 2d array

I have a NumPy array in which each row contains an (x, y) pair, and I want to display a scatter plot without using a for loop, so I used the following approach with pandas:
def visualize_k_means_output(self, centroids):
    fig, ax = plt.subplots()
    self.visualize_box_relative_sizes()
    frame = pd.DataFrame(centroids, columns=['X', 'Y'])
    ax.scatter(frame['X'], frame['Y'], marker='*', s=200, c='black')
The question is how to extract the first item of each row as x and the second as y without a for loop such as:
ax.scatter(x=[item[0] for item in centroids], y=[item[1] for item in centroids], ...)
If I understood correctly, you want to slice your numpy array:
x = centroids[:, 0]
y = centroids[:, 1]
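An equivalent shorthand is to unpack the transposed array, which yields one 1-D array per column. The sketch below uses made-up centroid values purely for illustration.

```python
import numpy as np

centroids = np.array([[1.0, 2.0],
                      [3.0, 4.0],
                      [5.0, 6.0]])  # made-up (x, y) pairs

x, y = centroids.T  # equivalent to centroids[:, 0] and centroids[:, 1]
print(x)
print(y)
```

With this, the scatter call from the question could be written as ax.scatter(x, y, marker='*', s=200, c='black'), with no pandas DataFrame needed.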

Why does contourf (matplotlib) switch x and y coordinates?

I am trying to get contourf to plot my stuff right, but it seems to switch the x and y coordinates. In the example below, I show this by evaluating a 2d Gaussian function that has different widths in x and y directions. With the values given, the width in y direction should be larger. Here is the script:
from numpy import *
from matplotlib.pyplot import *
xMax = 50
xNum = 100
w0x = 10
w0y = 15
dx = xMax/xNum
xGrid = linspace(-xMax/2+dx/2, xMax/2-dx/2, xNum, endpoint=True)
yGrid = xGrid
Int = zeros((xNum, xNum))
for idX in range(xNum):
    for idY in range(xNum):
        Int[idX, idY] = exp(-((xGrid[idX]/w0x)**2 + (yGrid[idY]/(w0y))**2))
fig = figure(6)
clf()
ax = subplot(2,1,1)
X, Y = meshgrid(xGrid, yGrid)
contour(X, Y, Int, colors='k')
plot(array([-xMax, xMax])/2, array([0, 0]), '-b')
plot(array([0, 0]), array([-xMax, xMax])/2, '-r')
ax.set_aspect('equal')
xlabel("x")
ylabel("y")
subplot(2,1,2)
plot(xGrid, Int[:, int(xNum/2)], '-b', label='I(x, y=max/2)')
plot(xGrid, Int[int(xNum/2), :], '-r', label='I(x=max/2, y)')
ax.set_aspect('equal')
legend()
xlabel(r"x or y")
ylabel(r"I(x or y)")
The resulting figure is this:
On top is the contour plot, which has the larger width in the x direction (not y). Below, slices are shown: one along the x direction (at constant y=0, blue), the other along the y direction (at constant x=0, red). Here, everything seems fine: the y direction is broader than the x direction. So why would I have to transpose the array in order to have it plotted as I want? This seems unintuitive to me and not in agreement with the documentation.
It helps if you think of a 2D array's shape not as (x, y) but as (rows, columns), because that is how most math routines interpret them - including matplotlib's 2D plotting functions. Therefore, the first dimension is vertical (which you call y) and the second dimension is horizontal (which you call x).
Note that this convention is very prominent, even in numpy. The function np.vstack, which stacks arrays vertically, works along the first dimension, and np.hstack, which stacks them horizontally, works along the second dimension (for arrays with at least two dimensions).
To illustrate the point:
import numpy as np
import matplotlib.pyplot as plt
a = np.array([[0, 0, 1, 0, 0],
              [0, 1, 1, 1, 0],
              [1, 1, 1, 1, 1]])
a[:, 2] = 2 # set column
print(a)
plt.imshow(a)
plt.contour(a, colors='k')
This prints
[[0 0 2 0 0]
 [0 1 2 1 0]
 [1 1 2 1 1]]
and consistently plots
According to your convention that an array is (x, y) the command a[:, 2] = 2 should have assigned to the third row, but numpy and matplotlib both agree that it was the column :)
You can of course use your own convention how to interpret the dimensions of your arrays, but in the long run it will be more consistent to treat them as (y, x).
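Applied to the Gaussian from the question, the (rows, columns) = (y, x) convention can be checked numerically. The sketch below builds the array directly from np.meshgrid, whose default indexing already follows that convention, and compares it with the loop-based Int[idX, idY] layout. The grid sizes 100 and 60 are made-up values, chosen to differ so that the shape makes the layout visible.

```python
import numpy as np

w0x, w0y = 10, 15
x = np.linspace(-25, 25, 100)   # hypothetical grids; different lengths
y = np.linspace(-25, 25, 60)    # make the (rows, columns) layout visible

# Default indexing='xy' gives arrays of shape (len(y), len(x)):
# rows follow y, columns follow x.
X, Y = np.meshgrid(x, y)
Z = np.exp(-((X / w0x) ** 2 + (Y / w0y) ** 2))
print(Z.shape)

# The loop-based layout from the question, Int[idX, idY], is the transpose.
Int = np.zeros((x.size, y.size))
for i in range(x.size):
    for j in range(y.size):
        Int[i, j] = np.exp(-((x[i] / w0x) ** 2 + (y[j] / w0y) ** 2))
print(np.allclose(Z, Int.T))
```

Under these assumptions, Z can be passed to contour(X, Y, Z) as is, whereas an array filled as Int[idX, idY] would need to be passed as Int.T.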

Matplotlib Markers as Tick Labels

Given the following code, I'd like to replace the y-tick label numbers with stars, with the number of stars matching each tick value. For example, the top label should be 10 stars, aligned such that the last star is placed where the 0 in 10 currently sits. The labels need to be generated dynamically, meaning I want to avoid hard-coding something like plt.xticks(['**********',.....]):
import matplotlib.pyplot as plt
x = [1, 2]
y = [1, 4]
labels = ['Bogs', 'Slogs']
plt.plot(x, y, 'ro')
plt.xticks(x, labels, rotation='vertical')
plt.margins(0.2)
plt.subplots_adjust(bottom=0.15)
plt.show()
Here's basically what I'm trying to produce (dynamic numbers of stars per the underlying y-tick label values):
Thanks in advance!
Don't actually write out the stars, then. When using a programming language, program! :-)
y_limit = 5
y_labels = ['*' * i for i in range(y_limit)]
plt.yticks(range(y_limit), y_labels)
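If the tick positions are not known in advance, the same idea can be expressed as a function of the tick value. The sketch below is one possible formulation; star_label and the tick list are hypothetical, and the (value, pos) signature is chosen to match what matplotlib.ticker.FuncFormatter passes to its callback.

```python
def star_label(value, pos=None):
    """Return one star per unit of `value` (rounded); empty for values below 1."""
    return '*' * int(round(value))

ticks = [0, 2, 4, 6, 8, 10]          # hypothetical tick positions
labels = [star_label(t) for t in ticks]
print(labels)

# Possible hookup (assumes an existing Axes object named ax):
# from matplotlib.ticker import FuncFormatter
# ax.yaxis.set_major_formatter(FuncFormatter(star_label))
```

With a formatter attached this way, the labels would follow whatever tick positions matplotlib chooses, rather than a fixed range.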
