Sample from a discrete random distribution in Python - python-3.x

I was hoping to know if there is a command in numpy or scipy to pick an element from a set of values according to a discrete random distribution.
For example I have a discrete distribution x = (0.5, 0.3, 0.2) and I want to sample from y = (1, 2, 3)...
>>> sample(x, y)
2
>>> sample(x, y)
3
>>> sample(x, y)
3
>>> sample(x, y)
1
Hope my question is clear. Thanks.
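A minimal sketch of one way to do this, assuming NumPy's `Generator.choice` with its `p` argument is acceptable. The `sample` wrapper and its `rng` parameter are hypothetical names chosen to mirror the call in the question, not an existing NumPy or SciPy function.

```python
import numpy as np

def sample(probabilities, values, rng=None):
    """Draw one element of `values`; probabilities[i] is the chance of values[i]."""
    # Hypothetical helper: it only wraps Generator.choice.
    rng = np.random.default_rng() if rng is None else rng
    return rng.choice(values, p=probabilities)

x = (0.5, 0.3, 0.2)  # probabilities, must sum to 1
y = (1, 2, 3)        # values to sample from

rng = np.random.default_rng(seed=0)  # seeded only so the run is repeatable
one_draw = sample(x, y, rng)

# Draw many values at once and compare empirical frequencies with x.
draws = rng.choice(y, size=10_000, p=x)
freqs = [float(np.mean(draws == v)) for v in y]
print(one_draw, freqs)
```

With enough draws, the printed frequencies should land close to (0.5, 0.3, 0.2).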

Related

Multiply a [3, 2, 3] by a [3, 2] tensor in pytorch (dot product along dimension)

Given the following tensors x and y with shapes [3,2,3] and [3,2]. I want to multiply the tensors along the 2nd dimension, this is expected to be a kind of dot product and scaling along the axis and return a [3,2,3] tensor.
import torch
a = [[[0.2,0.3,0.5],[-0.5,0.02,1.0]],[[0.01,0.13,0.06],[0.35,0.12,0.0]], [[1.0,-0.3,1.0],[1.0,0.02, 0.03]] ]
b = [[1,2],[1,3],[0,2]]
x = torch.FloatTensor(a) # shape [3,2,3]
y = torch.FloatTensor(b) # shape [3,2]
The expected output :
Expected output shape should be [3,2,3]
#output = [[[0.2,0.3,0.5],[-1.0,0.04,2.0]],[[0.01,0.13,0.06],[1.05,0.36,0.0]], [[0.0,0.0,0.0],[2.0,0.04, 0.06]] ]
I have tried the two calls below, but neither gives the desired output or output shape.
torch.matmul(x,y)
torch.matmul(x,y.unsqueeze(1).shape)
What is the best way to fix this?
This is just a broadcasted multiply. You can insert a singleton (size-1) dimension at the end of y to make it a [3,2,1] tensor and then multiply by x. There are multiple ways to insert singleton dimensions.
# all equivalent
x * y.unsqueeze(2)
x * y[..., None]
x * y[:, :, None]
x * y.reshape(3, 2, 1)
You could also use torch.einsum.
torch.einsum('abc,ab->abc', x, y)
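To check the result against the expected output from the question, here is a small sketch using NumPy as a stand-in for PyTorch. This is an assumption made for illustration: the broadcasting and `einsum` expressions are written the same way in both libraries, but the arrays below are NumPy arrays, not tensors.

```python
import numpy as np

x = np.array([[[0.2, 0.3, 0.5], [-0.5, 0.02, 1.0]],
              [[0.01, 0.13, 0.06], [0.35, 0.12, 0.0]],
              [[1.0, -0.3, 1.0], [1.0, 0.02, 0.03]]])   # shape (3, 2, 3)
y = np.array([[1, 2], [1, 3], [0, 2]], dtype=float)      # shape (3, 2)

# Expected output copied from the question.
expected = np.array([[[0.2, 0.3, 0.5], [-1.0, 0.04, 2.0]],
                     [[0.01, 0.13, 0.06], [1.05, 0.36, 0.0]],
                     [[0.0, 0.0, 0.0], [2.0, 0.04, 0.06]]])

# y[..., None] has shape (3, 2, 1), so it broadcasts over the last axis of x.
out_broadcast = x * y[..., None]
out_einsum = np.einsum('abc,ab->abc', x, y)

print(out_broadcast.shape)
print(np.allclose(out_broadcast, expected), np.allclose(out_einsum, expected))
```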

Finding the mean of a distribution

My code generates a number of distributions (I only plotted one below to make it more legible). The Y axis represents a probability density function and the X axis is a simple array of values.
In more detail.
Y = [0.02046505 0.10756612 0.24319883 0.30336375 0.22071875 0.0890625 0.015625 0 0 0]
And X is generated using np.arange(0,10,1) = [0 1 2 3 4 5 6 7 8 9]
I want to find the mean of this distribution (i.e. where the curve peaks on the X axis), not the mean of the Y values. I know how to use np.mean to find the mean of Y, but that's not what I need.
By eye, the mean here is about x=3 but I would like to generate this with a code to make it more accurate.
Any help would be great.
By definition, the mean (actually, the expected value of a random variable x, but since you have the PDF, you could use the expected value) is sum(p(x[j]) * x[j]), where p(x[j]) is the value of the PDF at x[j]. You can implement this as code like this:
>>> import numpy as np
>>> Y = np.array([0.02046505, 0.10756612, 0.24319883, 0.30336375, 0.22071875,
...               0.0890625, 0.015625, 0, 0, 0])
>>> Y
array([0.02046505, 0.10756612, 0.24319883, 0.30336375, 0.22071875,
       0.0890625 , 0.015625  , 0.        , 0.        , 0.        ])
>>> X = np.arange(0, 10)
>>> Y.sum()
1.0
>>> (X * Y).sum()
2.92599253
So the (approximate) answer is 2.92599253.
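The question describes the mean as "where the curve peaks", which is strictly the mode rather than the mean. The sketch below computes both so they can be compared. It assumes `np.average` with `weights` as a convenient form of the weighted sum above; this also normalizes Y, which helps when the values do not sum exactly to 1.

```python
import numpy as np

X = np.arange(0, 10)
Y = np.array([0.02046505, 0.10756612, 0.24319883, 0.30336375, 0.22071875,
              0.0890625, 0.015625, 0.0, 0.0, 0.0])

# Mean (expected value): sum(X * Y) / sum(Y)
mean = np.average(X, weights=Y)

# Mode: the X value at which the distribution peaks
mode = X[np.argmax(Y)]

print(mean, mode)
```

For this data the two differ slightly: the mean is the value computed above, while the peak sits at x = 3.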

Plotting all of a trigonometric function (x^2 + y^2 == 1) with matplotlib and python

As an exercise in learning Matplotlib and improving my math/coding I decided to try and plot a trigonometric function (x squared plus y squared equals one).
Trigonometric functions are also called "circular" functions but I am only producing half the circle.
#Attempt to plot equation x^2 + y^2 == 1
import numpy as np
import matplotlib.pyplot as plt
import math
x = np.linspace(-1, 1, 21) #generate np.array of X values -1 to 1 in 0.1 increments
x_sq = [i**2 for i in x]
y = [math.sqrt(1-(math.pow(i, 2))) for i in x] #calculate y for each value in x
y_sq = [i**2 for i in y]
#Print for debugging / sanity check
for i,j in zip(x_sq, y_sq):
    print('x: {:1.4f} y: {:1.4f} x^2: {:1.4f} y^2: {:1.4f} x^2 + Y^2 = {:1.4f}'.format(math.sqrt(i), math.sqrt(j), i, j, i+j))
#Format how the chart displays
plt.figure(figsize=(6, 4))
plt.axhline(y=0, color='y')
plt.axvline(x=0, color='y')
plt.grid()
plt.plot(x, y, 'rx')
plt.show()
My code only produces the positive y values, but I want to plot the full circle.
Here is how the full plot should look. I used Wolfram Alpha to generate it.
Ideally I don't want solutions where the lifting is done for me such as using matplotlib.pyplot.contour. As a learning exercise, I want to "see the working" so to speak. Namely I ideally want to generate all the values and plot them "manually".
The only method I can think of is to re-arrange the equation and generate a set of negative y values with calculated x values then plot them separately. I am sure there is a better way to achieve the outcome and I am sure one of the gurus on Stack Overflow will know what those options are.
Any help will be gratefully received. :-)
The equation x**2 + y**2 = 1 describes a circle with radius 1 around the origin.
Even if you did not know this already, you could still try to write this equation in polar coordinates,
x = r*cos(phi)
y = r*sin(phi)
(r*cos(phi))**2 + (r*sin(phi))**2 == 1
r**2*(cos(phi)**2 + sin(phi)**2) == 1
Due to the trigonometric identity cos(phi)**2 + sin(phi)**2 == 1 this reduces to
r**2 == 1
and since r should be real,
r == 1
(for any phi).
Plugging this into python:
import numpy as np
import matplotlib.pyplot as plt
phi = np.linspace(0, 2*np.pi, 200)
r = 1
x = r*np.cos(phi)
y = r*np.sin(phi)
plt.plot(x,y)
plt.axis("equal")
plt.show()
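In the spirit of the sanity-check print in the question, here is a short sketch that checks numerically that the parametrized points satisfy the circle equation and cover negative y values as well. The radius 2.5 is an arbitrary value picked for illustration, to show the check does not depend on r being 1.

```python
import numpy as np

r = 2.5  # arbitrary radius, chosen only for this check
phi = np.linspace(0, 2*np.pi, 200)
x = r*np.cos(phi)
y = r*np.sin(phi)

# Largest deviation of x^2 + y^2 from r^2 over all generated points
residual = np.max(np.abs(x**2 + y**2 - r**2))

# The angle sweeps the whole circle, so both signs of y appear
has_negative_y = bool((y < 0).any())
has_positive_y = bool((y > 0).any())

print(residual, has_negative_y, has_positive_y)
```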
This happens because the square root returns only the positive value, so you need to take those values and turn them into negative values.
You can do something like this:
import numpy as np
import matplotlib.pyplot as plt
r = 1 # radius
x = np.linspace(-r, r, 1000)
y = np.sqrt(r**2 - x**2)  # upper half of the circle: y = sqrt(r^2 - x^2)
plt.figure(figsize=(5,5), dpi=100) # figsize=(n,n), n needs to be equal so the image doesn't flatten out
plt.grid(linestyle='-', linewidth=2)
plt.plot(x, y, color='g')
plt.plot(x, -y, color='r')
plt.legend(['Positive y', 'Negative y'], loc='lower right')
plt.axhline(y=0, color='b')
plt.axvline(x=0, color='b')
plt.show()
And that should return this:
PLOT
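If a single closed curve is preferred over two separately plotted halves, the two branches can be joined into one pair of arrays. This is a sketch under the assumption that the upper half is traversed left to right and the lower half right to left; the names `x_closed` and `y_closed` are made up for illustration.

```python
import numpy as np

r = 1.0
x_half = np.linspace(-r, r, 1000)
y_half = np.sqrt(r**2 - x_half**2)   # non-negative branch only

# Upper half from left to right, then lower half from right to left,
# so consecutive points are neighbours on the circle.
x_closed = np.concatenate([x_half, x_half[::-1]])
y_closed = np.concatenate([y_half, -y_half[::-1]])

print(x_closed.shape, y_closed.min(), y_closed.max())
```

Passing `x_closed` and `y_closed` to a single `plt.plot` call then draws the whole circle as one line.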

Calculate the volume of 3d plot

The data is from a measurement. (The original post included a picture of the plotted data.)
I tried using trapz twice, but I get an error: "ValueError: operands could not be broadcast together with shapes (1,255) (256,531)"
x has 256 points and y has 532 points; Z is a 2D array of size 256 by 532. The code is below:
import numpy as np
img=np.loadtxt('focus_x.txt')
m=0
m=np.max(img)
Z=img/m
X=np.loadtxt("pixelx.txt",float)
Y=np.loadtxt("pixely.txt",float)
[X, Y] = np.meshgrid(X, Y)
volume=np.trapz(X,np.trapz(Y,Z))
The docs state that trapz should be used like this
intermediate = np.trapz(Z, x)
result = np.trapz(intermediate, y)
trapz reduces the dimensionality of its operand (by default along the last axis), optionally using a 1D array of abscissae to determine the subintervals of integration; it does not use a mesh grid for its operation.
A complete example.
First we compute, using sympy, the integral of a simple bilinear function over a rectangular domain (0, 5) × (0, 7)
In [1]: import sympy as sp, numpy as np
In [2]: x, y = sp.symbols('x y')
In [3]: f = 1 + 2*x + y + x*y
In [4]: f.integrate((x, 0, 5)).integrate((y, 0, 7))
Out[4]: 2555/4
Now we compute the trapezoidal approximation to the integral (as it happens, the approximation is exact for a bilinear function). First we need coordinate arrays
In [5]: x, y = np.linspace(0, 5, 11), np.linspace(0, 7, 22)
(Note that the sampling differs between the two directions and from the default spacing assumed by trapz.) Next we need a mesh grid on which to evaluate the integrand
In [6]: X, Y = np.meshgrid(x, y)
In [7]: z = 1 + 2*X + Y + X*Y
and eventually we compute the integral
In [8]: 4*np.trapz(np.trapz(z, x), y)
Out[8]: 2555.0
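When Z is stored with x along the first axis, as the 256 by 532 array in the question appears to be, passing `axis` explicitly avoids relying on the default last axis. The sketch below uses synthetic data in place of the measurement files, which is an assumption, and reuses the bilinear function from the example above. The `getattr` fallback is there because newer NumPy releases name the function `np.trapezoid`.

```python
import numpy as np

# np.trapezoid is the newer name; fall back to np.trapz on older releases.
trapezoid = getattr(np, "trapezoid", None) or np.trapz

x = np.linspace(0, 5, 256)   # 256 points along x
y = np.linspace(0, 7, 532)   # 532 points along y

# indexing="ij" gives arrays of shape (len(x), len(y)) = (256, 532)
X, Y = np.meshgrid(x, y, indexing="ij")
Z = 1 + 2*X + Y + X*Y        # synthetic stand-in for the measured data

over_y = trapezoid(Z, y, axis=1)   # integrate out y, leaving shape (256,)
volume = trapezoid(over_y, x)      # integrate out x, leaving a scalar

print(over_y.shape, volume)
```

The result should agree with the exact value 2555/4 from the sympy calculation, up to rounding.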

Why does contourf (matplotlib) switch x and y coordinates?

I am trying to get contourf to plot my stuff right, but it seems to switch the x and y coordinates. In the example below, I show this by evaluating a 2d Gaussian function that has different widths in x and y directions. With the values given, the width in y direction should be larger. Here is the script:
from numpy import *
from matplotlib.pyplot import *
xMax = 50
xNum = 100
w0x = 10
w0y = 15
dx = xMax/xNum
xGrid = linspace(-xMax/2+dx/2, xMax/2-dx/2, xNum, endpoint=True)
yGrid = xGrid
Int = zeros((xNum, xNum))
for idX in range(xNum):
    for idY in range(xNum):
        Int[idX, idY] = exp(-((xGrid[idX]/w0x)**2 + (yGrid[idY]/(w0y))**2))
fig = figure(6)
clf()
ax = subplot(2,1,1)
X, Y = meshgrid(xGrid, yGrid)
contour(X, Y, Int, colors='k')
plot(array([-xMax, xMax])/2, array([0, 0]), '-b')
plot(array([0, 0]), array([-xMax, xMax])/2, '-r')
ax.set_aspect('equal')
xlabel("x")
ylabel("y")
subplot(2,1,2)
plot(xGrid, Int[:, int(xNum/2)], '-b', label='I(x, y=max/2)')
plot(xGrid, Int[int(xNum/2), :], '-r', label='I(x=max/2, y)')
ax.set_aspect('equal')
legend()
xlabel(r"x or y")
ylabel(r"I(x or y)")
The figure thrown out is this:
On top the contour plot which has the larger width in x direction (not y). Below are slices shown, one across x direction (at constant y=0, blue), the other in y direction (at constant x=0, red). Here, everything seems fine, the y direction is broader than the x direction. So why would I have to transpose the array in order to have it plotted as I want? This seems unintuitive to me and not in agreement with the documentation.
It helps if you think of a 2D array's shape not as (x, y) but as (rows, columns), because that is how most math routines interpret them - including matplotlib's 2D plotting functions. Therefore, the first dimension is vertical (which you call y) and the second dimension is horizontal (which you call x).
Note that this convention is very prominent, even in numpy. The function np.vstack, which concatenates arrays vertically, works along the first dimension, and np.hstack works horizontally along the second dimension.
To illustrate the point:
import numpy as np
import matplotlib.pyplot as plt
a = np.array([[0, 0, 1, 0, 0],
              [0, 1, 1, 1, 0],
              [1, 1, 1, 1, 1]])
a[:, 2] = 2 # set column
print(a)
plt.imshow(a)
plt.contour(a, colors='k')
This prints
[[0 0 2 0 0]
[0 1 2 1 0]
[1 1 2 1 1]]
and consistently plots
According to your convention that an array is (x, y) the command a[:, 2] = 2 should have assigned to the third row, but numpy and matplotlib both agree that it was the column :)
You can of course use your own convention how to interpret the dimensions of your arrays, but in the long run it will be more consistent to treat them as (y, x).
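Applied to the script in the question, the same convention shows up in the shapes returned by `meshgrid`. The sketch below is a hedged illustration with deliberately different grid sizes (5 and 4 points, chosen here so the two axes are easy to tell apart); it compares an array built directly from the mesh grid with the loop from the question, which indexes as [idX, idY].

```python
import numpy as np

w0x, w0y = 10, 15
xGrid = np.linspace(-25, 25, 5)   # 5 points along x (illustrative size)
yGrid = np.linspace(-25, 25, 4)   # 4 points along y (illustrative size)

# Default indexing='xy': X and Y both have shape (len(yGrid), len(xGrid))
X, Y = np.meshgrid(xGrid, yGrid)
Z = np.exp(-((X / w0x)**2 + (Y / w0y)**2))   # Z[row, col] is at (yGrid[row], xGrid[col])

# Loop as written in the question: first index is x, second is y
Int = np.zeros((len(xGrid), len(yGrid)))
for idX in range(len(xGrid)):
    for idY in range(len(yGrid)):
        Int[idX, idY] = np.exp(-((xGrid[idX] / w0x)**2 + (yGrid[idY] / w0y)**2))

print(X.shape, Z.shape, Int.shape)
print(np.allclose(Z, Int.T))
```

Since `Z` matches `Int.T`, an array filled as [idX, idY] has to be transposed before it lines up with `X` and `Y`; building it from the mesh grid directly avoids the transpose.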
