Column-wise sum of array - two methods, two different results

In this example, the column-wise sum of an array pr is computed in two different ways:
(a) take the sum over the first axis using np.sum's axis parameter
(b) slice the array along the second axis and take the sum of each slice
import matplotlib.pyplot as plt
import numpy as np
m = 100
n = 2000
x = np.random.random_sample((m, n))
X = np.abs(np.fft.rfft(x)).T
frq = np.fft.rfftfreq(n)
total = X.sum(axis=0)
c = frq @ X / total  # X-weighted mean of frq for each column, shape (m,)
df = frq[:, None] - c
pr = df * X
a = np.sum(pr, axis=0)
b = [np.sum(pr[:, i]) for i in range(m)]
fig, ax = plt.subplots(1)
ax.plot(a)
ax.plot(b)
plt.show()
Both methods should return the same result, but in this example they do not. When plotted, a and b look completely different. The difference is, however, so small that np.allclose(a, b) is True.
If you replace pr with some small random values, there is no difference between the two summation methods:
pr = np.random.randn(n, m) / 1e12
a = np.sum(pr, axis=0)
b = np.array([np.sum(pr[:, i]) for i in range(m)])
fig, ax = plt.subplots(1)
ax.plot(a)
ax.plot(b)
plt.show()
The second example indicates that the differences in the sums of the first example are not related to the summation methods. Is this, then, a problem related to floating-point summation? If so, why doesn't such an effect occur in the second example?
Why do the column-wise sums differ in the first example, and which one is correct?

For why the results are different, see https://stackoverflow.com/a/55469395/7207392. The slice case uses pairwise summation, the axis case doesn't.
Which one is correct? Well, probably neither, but pairwise summation is expected to be more accurate.
Indeed, we can see that it is fairly close to the exact (within machine precision) result obtained using math.fsum.
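The comparison against math.fsum can be sketched as follows. This is a minimal reproduction, not the code from the linked answer: the fixed seed and the names exact, err_axis and err_slice are assumptions added for illustration, and the size of the two errors may vary with the NumPy version and memory layout.

```python
import math
import numpy as np

rng = np.random.default_rng(0)        # fixed seed, chosen only for reproducibility
m, n = 100, 2000
x = rng.random((m, n))
X = np.abs(np.fft.rfft(x)).T          # shape (n // 2 + 1, m)
frq = np.fft.rfftfreq(n)              # shape (n // 2 + 1,)
c = frq @ X / X.sum(axis=0)           # X-weighted mean of frq per column, shape (m,)
pr = (frq[:, None] - c) * X           # each column sums to roughly zero

a = pr.sum(axis=0)                                          # reduction along axis 0
b = np.array([pr[:, i].sum() for i in range(m)])            # one 1-D sum per column
exact = np.array([math.fsum(pr[:, i]) for i in range(m)])   # correctly rounded sums

err_axis = np.abs(a - exact).max()    # worst-case error of method (a)
err_slice = np.abs(b - exact).max()   # worst-case error of method (b)
print(err_axis, err_slice)
```

Because every column of pr sums to almost exactly zero, the rounding error is as large as the result itself, which is why the two curves look unrelated even though np.allclose(a, b) holds.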

Related

Plotting a Line of Best Fit on the Same Plot for Multiple Datasets

I am trying to approximate a line of best fit between multiple datasets, and display everything on one plot. This question addresses a similar notion, but the contents are in MATLAB and, hence, not the same.
I have data from 4 different experiments, each composed of 146 values. The Y values represent changes in distance over time; the X values are integer timesteps (1, 2, 3, ...). The shape of my Y data is (4, 146), as I've decided to keep all of it in a nested list, and the shape of my X data is (146,). I have the following set-up for my subplots:
x = [i for i in range(len(temp[0]))]
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.scatter(x,Y[0],c="blue", marker='.',linewidth=1)
ax1.scatter(x,Y[1],c="orange", marker='.',linewidth=1)
ax1.scatter(x,Y[2],c="green", marker='.',linewidth=1)
ax1.scatter(x,Y[3],c="purple", marker='.',linewidth=1)
z = np.polyfit(x,Y,3) # Throws an error because x,Y are not the same length
p = np.poly1d(z)
plt.plot(x, p(x))
I do not know how to fit a line of best fit between the scatter plots. numpy.polyfit documentation suggests that "Several data sets of sample points sharing the same x-coordinates can be fitted at once", but I have been unsuccessful thus far, and can only fit the line to one dataset. Is there a way that I can fit the line to all of the data sets? Should I use a different library entirely, like Seaborn?
Try casting x and Y to numpy arrays (I assume they are lists). You can do this with x = np.asarray(x). To fit the data collectively, flatten the Y array using Y.flatten(), which transforms the shape from (n, N) to (n*N,). Then tile the x array n times: this copies the array n times into a new array, which also has shape (n*N,). In this way you match the values from Y to the corresponding values of x.
import numpy as np
N = 10 # no. datapoints
n = 4 # no. experiments
# creating some dummy data
x = np.linspace(0,1, N) # shape (N,)
Y = np.random.normal(0,1,(n, N)) # shape (n, N)
# pooled fit: one cubic through all n*N points
np.polyfit(np.tile(x, n), Y.flatten(), deg=3)
The polyfit function expects the Y array to be, in your case, (146, 4) rather than (4, 146), so you should pass it the transpose of Y, e.g.,
z = np.polyfit(x, Y.T, 3)
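The poly1d function can only do one polynomial at a time, so you have to loop over the results from polyfit. polyfit returns the coefficients with shape (deg + 1, 4), one column per dataset, so loop over the transpose, e.g.:
for res in z.T:
    p = np.poly1d(res)
    plt.plot(x, p(x))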
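The two suggestions above answer slightly different questions: one polynomial per experiment, or a single polynomial through all experiments pooled together. A minimal sketch of both, assuming dummy data in place of the asker's measurements (the names per_set, curves and pooled are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 146, 4                                  # timesteps, experiments (as in the question)
x = np.arange(N)
Y = 0.01 * x + rng.normal(0.0, 0.1, (n, N))    # dummy data, shape (n, N)

# One cubic per experiment: polyfit wants Y as (N, n) and returns
# coefficients with shape (deg + 1, n), one column per dataset.
per_set = np.polyfit(x, Y.T, 3)
curves = [np.poly1d(coeffs)(x) for coeffs in per_set.T]

# One cubic for all experiments pooled together.
pooled = np.polyfit(np.tile(x, n), Y.ravel(), 3)
best_fit = np.poly1d(pooled)(x)                # values to pass to plt.plot(x, best_fit)
```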

Python library for interpolating randomly located 2D points based on regularly gridded data points

Do you know a well-known Python library for interpolating randomly located 2D points based on regularly gridded data points?
Note that the data points used to create the interpolator are on a regular grid, but the evaluation points are not.
context
Let me explain the context. In my application, the data points used to create the interpolator are on a regular grid. However, at evaluation time, the points to be evaluated are at random locations (say np.random.rand(100, 2)).
As far as I know, the most used library for 2D interpolation is scipy's interp2d. But at evaluation time interp2d takes grid coordinates X and Y instead of points, as its documentation describes.
Of course, it is possible to do something like
values = []
for p in np.random.rand(100, 2):
    value = itp([p[0]], [p[1]])
    values.append(value)
or to avoid for-loop
pts = np.random.rand(100, 2)
tmp = itp(pts[:, 0], pts[:, 1])
value = tmp.diagonal()
But both methods are too inefficient. The first one is slow because of the Python for loop (the work should run on the C side as much as possible), and the second one is wasteful because it evaluates N^2 points to get results for only N points.
scipy.interpolate.RegularGridInterpolator does this. With it, one can create an interpolator from gridded data points, and at evaluation time it takes a 2-D numpy array with shape (n_points, n_dim).
For example:
import numpy as np
from scipy.interpolate import RegularGridInterpolator
x = np.linspace(0, 1, 20)
y = np.linspace(0, 1, 20)
f = np.random.randn(20, 20)
itp = RegularGridInterpolator((x, y), f)
pts = np.random.rand(100, 2)
f_interped = itp(pts)
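A quick way to check that the interpolator is set up with the axes in the intended order is to feed it a function whose values are known. The sketch below assumes a plane, 2x + 3y, picked only because linear interpolation reproduces it exactly; the names xx, yy and vals are illustrative.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

x = np.linspace(0, 1, 20)
y = np.linspace(0, 1, 20)
xx, yy = np.meshgrid(x, y, indexing="ij")   # "ij" so that f[i, j] belongs to (x[i], y[j])
f = 2 * xx + 3 * yy                         # known plane, chosen for the check

itp = RegularGridInterpolator((x, y), f)    # default method is "linear"
pts = np.random.default_rng(0).random((100, 2))   # scattered points inside the grid
vals = itp(pts)                             # one value per point, shape (100,)

print(np.allclose(vals, 2 * pts[:, 0] + 3 * pts[:, 1]))
```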

How can I interpolate a numpy array so that it becomes a certain length?

I have three numpy arrays each with different lengths:
A.shape = (3401,)
B.shape = (2200,)
C.shape = (4103,)
I would like to average the three arrays to produce a new array with size of the largest array (in this case C):
D.shape = (4103,)
Problem is, I don't think I can do this without adding "fake" data to A and B, by interpolation.
How can I perform interpolation on the first two numpy arrays so that they are of the same length as array C?
Do I even need to interpolate here?
First thing that comes to mind is zoom from scipy:
The array is zoomed using spline interpolation of the requested order.
Code:
import numpy as np
from scipy.ndimage import zoom
A = np.random.rand(3401)
B = np.random.rand(2200)
C = np.ones(4103)
for arr in [A, B]:
    zoom_rate = C.shape[0] / arr.shape[0]
    arr = zoom(arr, zoom_rate)
    print(arr.shape)
Output:
(4103,)
(4103,)
I think the simplest option is to do the following:
D = np.concatenate([np.average([A[:2200], B, C[:2200]], axis=0),
                    np.average([A[2200:3401], C[2200:3401]], axis=0),
                    C[3401:]])
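Another option, not taken from the answers above, is plain linear interpolation with np.interp. The sketch assumes each array samples the same interval at evenly spaced positions, so stretching A and B onto C's sample positions is meaningful; resample is a hypothetical helper name.

```python
import numpy as np

def resample(arr, length):
    """Linearly interpolate a 1-D array onto `length` evenly spaced points."""
    old = np.linspace(0.0, 1.0, arr.shape[0])   # original sample positions
    new = np.linspace(0.0, 1.0, length)         # target sample positions
    return np.interp(new, old, arr)

rng = np.random.default_rng(0)
A, B, C = rng.random(3401), rng.random(2200), rng.random(4103)

# Stretch A and B to C's length, then average element-wise.
D = np.mean([resample(A, C.size), resample(B, C.size), C], axis=0)
print(D.shape)
```

Whether stretching is appropriate depends on what the arrays represent; if the samples share a common start and step, the truncate-and-average approach above is the more faithful one.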

points of intersection of horizontal line with a function [duplicate]

I have this code that generates the following (image). How would I proceed to detect the intersections of the line with the function?
import numpy as np
import matplotlib.pyplot as plt
y = 0.4*np.ones(100)
x = np.arange(0, 100)
t = np.linspace(0,100,100)
Fs = 6000
f = 200
func = np.sin(2 * np.pi * f * t / Fs)
idx = np.where(func == y) # how i think i should do to detect intersections
print(idx)
plt.plot(x, y) # the horizontal line
plt.plot(t,func) # the function
plt.show()
You can use the following expression to get the indices of the array t that are closest to the intersection points.
idx = np.argwhere(np.diff(np.sign(y - func))).flatten()
This expression selects the indices where the sign of y - func changes, that is, the sample just before each crossing. However, this is only an approximation of the real intersection points. Decrease the step size of t to increase precision.
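Instead of refining the grid, the crossing can also be located between the two samples that bracket it. The following is a minimal sketch using linear interpolation; the names level, d and t_cross are illustrative and not part of the original answer, and it assumes no sample lies exactly on the line.

```python
import numpy as np

Fs, f, level = 6000, 200, 0.4
t = np.linspace(0, 100, 100)
func = np.sin(2 * np.pi * f * t / Fs)

d = func - level
idx = np.flatnonzero(np.diff(np.sign(d)))   # sample just before each crossing

# Straight line through the two bracketing samples, solved for d == 0.
t_cross = t[idx] - d[idx] * (t[idx + 1] - t[idx]) / (d[idx + 1] - d[idx])
print(t_cross)
```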
Since the equations are relatively simple, another way would be to solve it by hand and implement the closed-form formula for plotting.
You have the equations y = 0.4 and y = sin(2*pi*t*f/Fs). Intersection points are at values of t such that 0.4 = sin(2*pi*t*f/Fs). Solving for t gives two answers:
t = (arcsin(0.4) + 2*pi*k) / (2*pi*f/Fs)
t = (pi - arcsin(0.4) + 2*pi*k) / (2*pi*f/Fs)
where k is any integer. In short, loop through all desired integers in a given range and compute the coordinates t using the two equations above. You will get a set of points (t,0.4) that you can plot on your graph.
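The closed-form approach can be sketched like this. The range of k is an assumption that holds here because the level is positive and t starts at 0, so only non-negative k can produce points in the plotted interval; the names w, t_min, t_max and t_hits are illustrative.

```python
import numpy as np

Fs, f, level = 6000, 200, 0.4
w = 2 * np.pi * f / Fs                  # angular frequency per unit of t
t_min, t_max = 0.0, 100.0               # plotted interval

# Enough non-negative integers k to cover every period up to t_max.
k = np.arange(0, int(np.ceil(t_max * w / (2 * np.pi))) + 1)
t1 = (np.arcsin(level) + 2 * np.pi * k) / w
t2 = (np.pi - np.arcsin(level) + 2 * np.pi * k) / w

t_all = np.sort(np.concatenate([t1, t2]))
t_hits = t_all[(t_all >= t_min) & (t_all <= t_max)]   # keep only the plotted range
print(t_hits)
```

The points can then be drawn with plt.plot(t_hits, np.full_like(t_hits, level), 'o').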

Compute sum of pairwise sums of two array's columns

I am looking for a way to avoid the nested loops in the following snippet, where A and B are two-dimensional arrays, each of shape (m, n), with m and n being arbitrary positive integers:
import numpy as np
m, n = 5, 2
A = np.random.randint(0, 10, (m, n))
B = np.random.randint(0, 10, (m, n))
out = np.empty((n, n))
for i in range(n):
    for j in range(n):
        out[i, j] = np.sum(A[:, i] + B[:, j])
The above logic is roughly equivalent to
np.einsum('ij,ik', A, B)
with the exception that einsum computes the sum of products.
Is there a way, equivalent to einsum, that computes a sum of sums? Or do I have to write an extension for this operation?
einsum needs to perform elementwise multiplication and then it does summing (optional). As such it might not be applicable/needed to solve our case. Read on!
Approach #1
We can leverage broadcasting such that the first axes are aligned and the second axes are elementwise summed after extending dimensions to 3D. Finally, we sum along the first axis -
(A[:,:,None] + B[:,None,:]).sum(0)
Approach #2
We can simply do outer addition of columnar summations of each -
A.sum(0)[:,None] + B.sum(0)
Approach #3
And hence, bring in einsum -
np.einsum('ij->j',A)[:,None] + np.einsum('ij->j',B)
You can also use numpy.ufunc.outer, specifically numpy.add.outer, after summing along axis 0, as @Divakar mentioned in Approach #2:
In [126]: numpy.add.outer(a.sum(0), b.sum(0))
Out[126]:
array([[54, 67],
       [43, 56]])
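To confirm that all of the approaches above agree with the original nested loops, they can be compared on a small random input. This is only a sanity check; the seed and the names ref and v1 to v4 are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 2
A = rng.integers(0, 10, (m, n))
B = rng.integers(0, 10, (m, n))

# Reference: the nested loops from the question.
ref = np.empty((n, n))
for i in range(n):
    for j in range(n):
        ref[i, j] = np.sum(A[:, i] + B[:, j])

v1 = (A[:, :, None] + B[:, None, :]).sum(0)                   # Approach #1
v2 = A.sum(0)[:, None] + B.sum(0)                             # Approach #2
v3 = np.einsum('ij->j', A)[:, None] + np.einsum('ij->j', B)   # Approach #3
v4 = np.add.outer(A.sum(0), B.sum(0))                         # numpy.add.outer

print(all(np.array_equal(ref, v) for v in (v1, v2, v3, v4)))
```

Approach #1 builds an intermediate array of shape (m, n, n), while the others only need the two column sums, so the latter scale better for large m.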
