Filtering signal: how to restrict filter that last point of output must equal the last point of input

Filtering signal: how to restrict filter that last point of output must equal the last point of input - python-3.x

Please help my poor knowledge of signal processing.
I want to smoothen some data. Here is my code:
import numpy as np
from scipy.signal import butter, filtfilt
def testButterworth(nyf, x, y):
b, a = butter(4, 1.5/nyf)
fl = filtfilt(b, a, y)
return fl
if __name__ == '__main__':
positions_recorded = np.loadtxt('original_positions.txt', delimiter='\n')
number_of_points = len(positions_recorded)
end = 10
dt = end/float(number_of_points)
nyf = 0.5/dt
x = np.linspace(0, end, number_of_points)
y = positions_recorded
fl = testButterworth(nyf, x, y)
I am pretty satisfied with results except one point:
it is absolutely crucial to me that the start and end point in returned values equal to the start and end point of input. How can I introduce this restriction?
UPD 15-Dec-14 12:04:
my original data looks like this
Applying the filter and zooming into last part of the graph gives following result:
So, at the moment I just care about the last point that must be equal to original point. I try to append copy of data to the end of original list this way:
the result is as expected even worse.
Then I try to append data this way:
And the slice where one period ends and next one begins, looks like that:

To do this, you're always going to cheat somehow, since the true filter applied to the true data doesn't behave the way you require.
One of the best ways to cheat with your data is to assume it's periodic. This has the advantages that: 1) it's consistent with the data you actually have and all your changing is to append data to the region you don't know about (so assuming it's periodic as as reasonable as anything else -- although may violate some unstated or implicit assumptions); 2) the result will be consistent with your filter.
You can usually get by with this by appending copies of your data to the beginning and end of your real data, or just small pieces, depending on your filter.
Since the FFT assumes that the data is periodic anyway, that's often a quick and easy approach, and is fully accurate (whereas concatenating the data is an estimation of an infinitely periodic waveform). Here's an example of the FFT approach for a step filter.
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0, 128)
y = (np.sin(.22*(x+10))>0).astype(np.float)
# filter
y2 = np.fft.fft(y)
f0 = np.fft.fftfreq(len(x))
y2[(f0<-.25) | (f0>.25)] = 0
y3 = abs(np.fft.ifft(y2))
plt.plot(x, y)
plt.plot(x, y3)
plt.xlim(-10, 140)
plt.ylim(-.1, 1.1)
plt.show()
Note how the end points bend towards each other at either end, even though this is not consistent with the periodicity of the waveform (since the segments at either end are very truncated). This can also be seen by adjusting waveform so that the ends are the same (here I used x+30 instead of x+10, and here the ends don't need to bend to match-up so they stay at level with the end of the data.
Note, also, to have the endpoints actually be exactly equal you would have to extend this plot by one point (at either end), since it periodic with exactly the wavelength of the original waveform. Doing this is not ad hoc though, and the result will be entirely consistent with your analysis, but just representing one extra point of what was assumed to be infinite repeats all along.
Finally, this FFT trick works best with waveforms of length 2n. Other lengths may be zero padded in the FFT. In this case, just doing concatenations to either end as I mentioned at first might be the best way to go.

The question is how to filter data and require that the left endpoint of the filtered result matches the left endpoint of the data, and same for the right endpoint. (That is, in general, the filtered result should be close to most of the data points, but not necessarily exactly match any of them, but what if you need a match at both endpoints?)
To make the filtered result exactly match the endpoints of a curve, one could add a padding of points at either end of the curve and adjust the y-position of this padding so that the endpoints of the valid part of the filter exactly matched the end points of the original data (without the padding).
In general, this can be done by either iterating towards a solution, adjusting the padding y-position until the ends line up, or by calculating a few values and then interpolating to determine the y-positions that would be required for the matched endpoints. I'll do the second approach.
Here's the code I used, where I simulated the data as a sine wave with two flat pieces on either side (note, that these flat pieces are not the padding, but I'm just trying to make data that looks a bit like the OPs).
import numpy as np
from scipy.signal import butter, filtfilt
import matplotlib.pyplot as plt
#### op's code
def testButterworth(nyf, x, y):
#b, a = butter(4, 1.5/nyf)
b, a = butter(4, 1.5/nyf)
fl = filtfilt(b, a, y)
return fl
def do_fit(data):
positions_recorded = data
#positions_recorded = np.loadtxt('original_positions.txt', delimiter='\n')
number_of_points = len(positions_recorded)
end = 10
dt = end/float(number_of_points)
nyf = 0.5/dt
x = np.linspace(0, end, number_of_points)
y = positions_recorded
fx = testButterworth(nyf, x, y)
return fx
### simulate some data (op should have done this too!)
def sim_data():
t = np.linspace(.1*np.pi, (2.-.1)*np.pi, 100)
y = np.sin(t)
c = np.ones(10, dtype=np.float)
z = np.concatenate((c*y[0], y, c*y[-1]))
return z
### code to find the required offset padding
def fit_with_pads(v, data, n=1):
c = np.ones(n, dtype=np.float)
z = np.concatenate((c*v[0], data, c*v[1]))
fx = do_fit(z)
return fx
def get_errors(data, fx):
n = (len(fx)-len(data))//2
return np.array((fx[n]-data[0], fx[-n]-data[-1]))
def vary_padding(data, span=.005, n=100):
errors = np.zeros((4, n)) # Lpad, Rpad, Lerror, Rerror
offsets = np.linspace(-span, span, n)
for i in range(n):
vL, vR = data[0]+offsets[i], data[-1]+offsets[i]
fx = fit_with_pads((vL, vR), data, n=1)
errs = get_errors(data, fx)
errors[:,i] = np.array((vL, vR, errs[0], errs[1]))
return errors
if __name__ == '__main__':
data = sim_data()
fx = do_fit(data)
errors = vary_padding(data)
plt.plot(errors[0], errors[2], 'x-')
plt.plot(errors[1], errors[3], 'o-')
oR = -0.30958
oL = 0.30887
fp = fit_with_pads((oL, oR), data, n=1)[1:-1]
plt.figure()
plt.plot(data, 'b')
plt.plot(fx, 'g')
plt.plot(fp, 'r')
plt.show()
Here, for the padding I only used a single point on either side (n=1). Then I calculate the error for a range of values shifting the padding up and down from the first and last data points.
For the plots:
First I plot the offset vs error (between the fit and the desired data value). To find the offset to use, I just zoomed in on the two lines to find the x-value of the y zero crossing, but to do this more accurately, one could calculate the zero crossing from this data:
Here's the plot of the original "data", the fit (green) and the adjusted fit (red):
and zoomed in the RHS:
The important point here is that the red (adjusted fit) and blue (original data) endpoints match, even though the pure fit doesn't.
Is this a valid approach? Of the various options, this seems the most reasonable since one isn't usually making any claims about the data that isn't being shown, and also for show region has an accurately applied filter. For example, FFTs usually assume the data is zero or periodic beyond the boundaries. Certainly, though, to be precise one should explain what was done.

Related

What's a potentially better algorithm to solve this python nested for loop than the one I'm using?

I have a nested loop that has to loop through a huge amount of data.
Assuming a data frame with random values with a size of 1000,000 rows each has an X,Y location in 2D space. There is a window of 10 length that go through all the 1M data rows one by one till all the calculations are done.
Explaining what the code is supposed to do:
Each row represents a coordinates in X-Y plane.
r_test is containing the diameters of different circles of investigations in our 2D plane (X-Y plane).
For each 10 points/rows, for every single diameter in r_test, we compare the distance between every point with the remaining 9 points and if the value is less than R we add 2 to H. Then we calculate H/(N**5) and store it in c_10 with the index corresponding to that of the diameter of investigation.
For this first 10 points finally when the loop went through all those diameters in r_test, we read the slope of the fitted line and save it to S_wind[ii]. So the first 9 data points will have no value calculated for them thus giving them np.inf to be distinguished later.
Then the window moves one point down the rows and repeat this process till S_wind is completed.
What's a potentially better algorithm to solve this than the one I'm using? in python 3.x?
Many thanks in advance!
import numpy as np
import pandas as pd
####generating input data frame
df = pd.DataFrame(data = np.random.randint(2000, 6000, (1000000, 2)))
df.columns= ['X','Y']
####====creating upper and lower bound for the diameter of the investigation circles
x_range =max(df['X']) - min(df['X'])
y_range = max(df['Y']) - min(df['Y'])
R = max(x_range,y_range)/20
d = 2
N = 10 #### Number of points in each window
#r1 = 2*R*(1/N)**(1/d)
#r2 = (R)/(1+d)
#r_test = np.arange(r1, r2, 0.05)
##===avoiding generation of empty r_test
r1 = 80
r2= 800
r_test = np.arange(r1, r2, 5)
S_wind = np.zeros(len(df['X'])) + np.inf
for ii in range (10,len(df['X'])): #### maybe the code run slower because of using len() function instead of a number
c_10 = np.zeros(len(r_test)) +np.inf
H = 0
C = 0
N = 10 ##### maybe I should also remove this
for ind in range(len(r_test)):
for i in range (ii-10,ii):
for j in range(ii-10,ii):
dd = r_test[ind] - np.sqrt((df['X'][i] - df['X'][j])**2+ (df['Y'][i] - df['Y'][j])**2)
if dd > 0:
H += 1
c_10[ind] = (H/(N**2))
S_wind[ii] = np.polyfit(np.log10(r_test), np.log10(c_10), 1)[0]

You can use numpy broadcasting to eliminate all of the inner loops. I'm not sure if there's an easy way to get rid of the outermost loop, but the others are not too hard to avoid.
The inner loops are comparing ten 2D points against each other in pairs. That's just dying for using a 10x10x2 numpy array:
# replacing the `for ind` loop and its contents:
points = np.hstack((np.asarray(df['X'])[ii-10:ii, None], np.asarray(df['Y'])[ii-10:ii, None]))
differences = np.subtract(points[None, :, :], points[:, None, :]) # broadcast to 10x10x2
squared_distances = (differences * differences).sum(axis=2)
within_range = squared_distances[None,:,:] < (r_test*r_test)[:, None, None] # compare squares
c_10 = within_range.sum(axis=(1,2)).cumsum() * 2 / (N**2)
S_wind[ii] = np.polyfit(np.log10(r_test), np.log10(c_10), 1)[0] # this is unchanged...
I'm not very pandas savvy, so there's probably a better way to get the X and Y values into a single 2-dimensional numpy array. You generated the random data in the format that I'd find most useful, then converted into something less immediately useful for numeric operations!
Note that this code matches the output of your loop code. I'm not sure that's actually doing what you want it to do, as there are several slightly strange things in your current code. For example, you may not want the cumsum in my code, which corresponds to only re-initializing H to zero in the outermost loop. If you don't want the matches for smaller values of r_test to be counted again for the larger values, you can skip that sum (or equivalently, move the H = 0 line to in between the for ind and the for i loops in your original code).

How to match a geometric template of 2D boxes to fit another set of 2D boxes

I'm trying to find a match between a set of 2D boxes with coordinates (A) (from a template with known sizes and distances between boxes) to another set of 2D boxes with coordinates (B) (which may contain more boxes than A). They should match in terms of each box from A corresponds to a single Box in B. The boxes in A together form a "stamp" which is assymmetrical in atleast one dimension.
Illustration of problem
explanation: "Stanz" in the illustration is a box from set A.
One might even think of the Set A as only 2D points (the centerpoint of the box) to make it simpler.
The end result will be to know which A box corresponds to which B box.
I can only think of very specific ways of doing this, tailored to a specific layout of boxes, is there any known generic ways of dealing with this forms of matching/search problems and what are they called?
Edit: Possible solution
I have come up with one possible solution, looking for all the possible rotations at each possible B center position for a single box from set A. Here all of the points in A would be rotated and compared against the distance to B centers. Not sure if this is a good way.
Looking for the possible rotations at each B centerpoint- solution

In your example, the transformation between the template and its presence in B can be entirely defined (actually, over-defined) by two matching points.
So here's a simple approach which is kind of performant. First, put all the points in B into a kD-tree. Now, pick a canonical "first" point in A, and hypothesize matching it to each of the points in B. To check whether it matches a particular point in B, pick a canonical "second" point in A and measure its distance to the "first" point. Then, use a standard kD proximity-bounding query to find all the points in B which are roughly that distance from your hypothesized matched "first" point in B. For each of those, determine the transformation between A and B, and for each of the other points in A, determine whether there's a point in A at roughly the right place (again, using the kD-tree), early-outing with the first unmatched point.
The worst-case performance there can get quite bad with pathological cases (O(n^3 log n), I think) but in general I would expect roughly O(n log n) for well-behaved data with a low threshold. Note that the thresholding is a bit rough-and-ready, and the results can depend on your choice of "first" and "second" points.

This is more of an idea than an answer, but it's too long for a comment. I asked some additional questions in a comment above, but the answers may not be particular relevant, so I'll go ahead and offer some thoughts in the meantime.
As you may know, point matching is its own problem domain, and if you search for 'point matching algorithm', you'll find various articles, papers, and other resources. It seems though that an ad hoc solution might be appropriate here (one that's simpler than more generic algorithms that are available).
I'll assume that the input point set can only be rotated, and not also flipped. If this idea were to work though, it should also work with flipping - you'd just have to run the algorithm separately for each flipped configuration.
In your example image, you've matched a point from set A with a point from set B so that they're coincident. Call this shared point the 'anchor' point. You'd need to do this for every combination of a point from set A and a point from set B until you found a match or exhausted the possibilities. The problem then is to determine if a match can be made given one of these matched point pairs.
It seems that for a given anchor point, a necessary but not sufficient condition for a match is that a point from set A and a point from set B can be found that are approximately the same distance from the anchor point. (What 'approximately' means would depend on the input, and would need to be tuned appropriately given that you're using integers.) This condition is met in your example image in that the center point of each point set is (approximately) the same distance from the anchor point. (Note that there could be multiple pairs of points that meet this condition, in which case you'd have to examine each such pair in turn.)
Once you have such a pair - the center points in your example - you can use some simple trigonometry and linear algebra to rotate set A so that the points in the pair coincide, after which the two point sets are locked together at two points and not just one. In your image that would involve rotating set A about 135 degrees clockwise. Then you check to see if every point in set B has a point in set A with which it's coincident, to within some threshold. If so, you have a match.
In your example, this fails of course, because the rotation is not actually a match. Eventually though, if there's a match, you'll find the anchor point pair for which the test succeeds.
I realize this would be easier to explain with some diagrams, but I'm afraid this written explanation will have to suffice for the moment. I'm not positive this would work - it's just an idea. And maybe a more generic algorithm would be preferable. But, if this did work, it might have the advantage of being fairly straightforward to implement.
[Edit: Perhaps I should add that this is similar to your solution, except for the additional step to allow for only testing a subset of the possible rotations.]
[Edit: I think a further refinement may be possible here. If, after choosing an anchor point, matching is possible via rotation, it should be the case that for every point p in B there's a point in A that's (approximately) the same distance from the anchor point as p is. Again, it's a necessary but not sufficient condition, but it allows you to quickly eliminate cases where a match isn't possible via rotation.]

Below follows a finished solution in python without kD-tree and without early outing candidates. A better way is to do the implementation yourself according to Sneftel but if you need anything quick and with a plot this might be useful.
Plot shows the different steps, starts off with just the template as a collection of connected lines. Then it is translated to a point in B where the distances between A and B points fits the best. Finally it is rotated.
In this example it was important to also match up which of the template positions was matched to which boundingbox position, so its an extra step in the end. There might be some deviations in the code compared to the outline above.
import numpy as np
import random
import math
import matplotlib.pyplot as plt
def to_polar(pos_array):
x = pos_array[:, 0]
y = pos_array[:, 1]
length = np.sqrt(x ** 2 + y ** 2)
t = np.arctan2(y, x)
zip_list = list(zip(length, t))
array_polar = np.array(zip_list)
return array_polar
def to_cartesian(pos):
# first element radius
# second is angle(theta)
# Converting polar to cartesian coordinates
radius = pos[0]
theta = pos[1]
x = radius * math.cos(theta)
y = radius * math.sin(theta)
return x,y
def calculate_distance_points(p1,p2):
return np.sqrt((p1[0]-p2[0])**2+(p1[1]-p2[1])**2)
def find_closest_point_inx(point, neighbour_set):
shortest_dist = None
closest_index = -1
# Find the point in the secondary array that is the closest
for index,curr_neighbour in enumerate(neighbour_set):
distance = calculate_distance_points(point, curr_neighbour)
if shortest_dist is None or distance < shortest_dist:
shortest_dist = distance
closest_index = index
return closest_index
# Find the sum of distances between each point in primary to the closest one in secondary
def calculate_agg_distance_arrs(primary,secondary):
total_distance = 0
for point in primary:
closest_inx = find_closest_point_inx(point, secondary)
dist = calculate_distance_points(point, secondary[closest_inx])
total_distance += dist
return total_distance
# returns a set of <primary_index,neighbour_index>
def pair_neighbours_by_distance(primary_set, neighbour_set, distance_limit):
pairs = {}
for num, point in enumerate(primary_set):
closest_inx = find_closest_point_inx(point, neighbour_set)
if calculate_distance_points(neighbour_set[closest_inx], point) > distance_limit:
closest_inx = None
pairs[num]=closest_inx
return pairs
def rotate_array(array, angle,rot_origin=None):
if rot_origin is not None:
array = np.subtract(array,rot_origin)
# clockwise rotation
theta = np.radians(angle)
c, s = np.cos(theta), np.sin(theta)
R = np.array(((c, -s), (s, c)))
rotated = np.matmul(array, R)
if rot_origin is not None:
rotated = np.add(rotated,rot_origin)
return rotated
# Finds out a point in B_set and a rotation where the points in SetA have the best alignment towards SetB.
def find_stamp_rotation(A_set, B_set):
# Step 1
anchor_point_A = A_set[0]
# Step 2. Convert all points to polar coordinates with anchor as origin
A_anchor_origin = A_set - anchor_point_A
anchor_A_polar = to_polar(A_anchor_origin)
print(anchor_A_polar)
# Step 3 for each point in B
score_tuples = []
for num_anchor, B_anchor_point_try in enumerate(B_set):
# Step 3.1
B_origin_rel_point = B_set-B_anchor_point_try
B_polar_rp_origin = to_polar(B_origin_rel_point)
# Step 3.3 select arbitrary point q from Ap
point_Aq = anchor_A_polar[1]
# Step 3.4 test each rotation, where pointAq is rotated to each B-point (except the B anchor point)
for try_rot_point_B in [B_rot_point for num_rot, B_rot_point in enumerate(B_polar_rp_origin) if num_rot != num_anchor]:
# positive rotation is clockwise
# Step 4.1 Rotate Ap by the angle between q and n
angle_to_try = try_rot_point_B[1]-point_Aq[1]
rot_try_arr = np.copy(anchor_A_polar)
rot_try_arr[:,1]+=angle_to_try
cart_rot_try_arr = [to_cartesian(e) for e in rot_try_arr]
cart_B_rp_origin = [to_cartesian(e) for e in B_polar_rp_origin]
distance_score = calculate_agg_distance_arrs(cart_rot_try_arr, cart_B_rp_origin)
score_tuples.append((B_anchor_point_try,angle_to_try,distance_score))
# Step 4.3
lowest=None
for b_point,angle,distance in score_tuples:
print("point:{} angle(rad):{} distance(sum):{}".format(b_point,360*(angle/(2*math.pi)),distance))
if lowest is None or distance < lowest[2]:
lowest = b_point, 360*angle/(2*math.pi), distance
return lowest
def test_example():
ax = plt.subplot()
ax.grid(True)
plt.title('Fit Template to BBoxes by translation and rotation')
plt.xlim(-20, 20)
plt.ylim(-20, 20)
ax.set_xticks(range(-20,20), minor=True)
ax.set_yticks(range(-20,20), minor=True)
template = np.array([[-10,-10],[-10,10],[0,0],[10,-10],[10,10], [0,20]])
# Test Bboxes are Rotated 40 degree, translated 2,2
rotated = rotate_array(template,40)
rotated = np.subtract(rotated,[2,2])
# Adds some extra bounding boxes as noise
for i in range(8):
rotated = np.append(rotated,[[random.randrange(-20,20), random.randrange(-20,20)]],axis=0)
# Scramble entries in array and return the position change.
rnd_rotated = rotated.copy()
np.random.shuffle(rnd_rotated)
element_positions = []
# After shuffling, looks at which index the "A"-marks has ended up at. For later comparison to see that the algo found the correct answer.
# This is to represent the actual case, where I will get a bunch of unordered bboxes.
rnd_map = {}
indexes_translation = [num2 for num,point in enumerate(rnd_rotated) for num2,point2 in enumerate(rotated) if point[0]==point2[0] and point[1]==point2[1]]
for num,inx in enumerate(indexes_translation):
rnd_map[num]=inx
# algo part 1/3
b_point,angle,_ = find_stamp_rotation(template,rnd_rotated)
# Plot for visualization
legend_list = np.empty((0,2))
leg_template = plt.plot(template[:,0],template[:,1],c='r')
legend_list = np.append(legend_list,[[leg_template[0],'1. template-pattern']],axis=0)
leg_bboxes = plt.scatter(rnd_rotated[:,0],rnd_rotated[:,1],c='b',label="scatter")
legend_list = np.append(legend_list,[[leg_bboxes,'2. bounding boxes']],axis=0)
leg_anchor = plt.scatter(b_point[0],b_point[1],c='y')
legend_list = np.append(legend_list,[[leg_anchor,'3. Discovered bbox anchor point']],axis=0)
# algo part 2/3
# Superimpose A onto B by A[0] to b_point
offset = b_point - template[0]
super_imposed_A = template + offset
# Plot superimposed, but not yet rotated
leg_s_imposed = plt.plot(super_imposed_A[:,0],super_imposed_A[:,1],c='k')
#plt.legend(rubberduckz, "superimposed template on anchor")
legend_list = np.append(legend_list,[[leg_s_imposed[0],'4. Templ superimposed on Bbox']],axis=0)
print("Superimposed A on B by A[0] to {}".format(b_point))
print(super_imposed_A)
# Rotate, now the template should match pattern of bboxes
# algo part 3/4
super_imposed_rotated_A = rotate_array(super_imposed_A,-angle,rot_origin=super_imposed_A[0])
# Show the beautiful match in a last plot
leg_s_imp_rot = plt.plot(super_imposed_rotated_A[:,0],super_imposed_rotated_A[:,1],c='g')
legend_list = np.append(legend_list,[[leg_s_imp_rot[0],'5. final fit']],axis=0)
plt.legend(legend_list[:,0], legend_list[:,1],loc="upper left")
plt.show()
# algo part 4/4
pairs = pair_neighbours_by_distance(super_imposed_rotated_A, rnd_rotated, 10)
print(pairs)
for inx in range(len(pairs)):
bbox_num = pairs[inx]
print("template id:{}".format(inx))
print("bbox#id:{}".format(bbox_num))
#print("original_bbox:{}".format(rnd_map[bbox_num]))
if __name__ == "__main__":
test_example()
Result on actual image with bounding boxes. Here it can be seen that the scaling is incorrect which makes the template a bit off but it will still be able to pair up and thats the desired end-result in my case.

Hot to get the set difference of two 2d numpy arrays, or equivalent of np.setdiff1d in a 2d array?

Here Get intersecting rows across two 2D numpy arrays they got intersecting rows by using the function np.intersect1d. So i changed the function to use np.setdiff1d to get the set difference but it doesn't work properly. The following is the code.
def set_diff2d(A, B):
nrows, ncols = A.shape
dtype={'names':['f{}'.format(i) for i in range(ncols)],
'formats':ncols * [A.dtype]}
C = np.setdiff1d(A.view(dtype), B.view(dtype))
return C.view(A.dtype).reshape(-1, ncols)
The following data is used for checking the issue:
min_dis=400
Xt = np.arange(50, 3950, min_dis)
Yt = np.arange(50, 3950, min_dis)
Xt, Yt = np.meshgrid(Xt, Yt)
Xt[::2] += min_dis/2
# This is the super set
turbs_possible_locs = np.vstack([Xt.flatten(), Yt.flatten()]).T
# This is the subset
subset = turbs_possible_locs[np.random.choice(turbs_possible_locs.shape[0],50, replace=False)]
diffs = set_diff2d(turbs_possible_locs, subset)
diffs is supposed to have a shape of 50x2, but it is not.

Ok, so to fix your issue try the following tweak:
def set_diff2d(A, B):
nrows, ncols = A.shape
dtype={'names':['f{}'.format(i) for i in range(ncols)], 'formats':ncols * [A.dtype]}
C = np.setdiff1d(A.copy().view(dtype), B.copy().view(dtype))
return C
The problem was - A after .view(...) was applied was broken in half - so it had 2 tuple columns, instead of 1, like B. I.e. as a consequence of applying dtype you essentially collapsed 2 columns into tuple - which is why you could do the intersection in 1d in the first place.
Quoting after documentation:
"
a.view(some_dtype) or a.view(dtype=some_dtype) constructs a view of the array’s memory with a different data-type. This can cause a reinterpretation of the bytes of memory.
"
Src https://numpy.org/doc/stable/reference/generated/numpy.ndarray.view.html
I think the "reinterpretation" is exactly what happened - hence for the sake of simplicity I would just .copy() the array.
NB however I wouldn't square it - it's always A which gets 'broken' - whether it's an assignment, or inline B is always fine...

Unexpected solution using JiTCDDE

I'm trying to investigate the behavior of the following Delayed Differential Equation using Python:
y''(t) = -y(t)/τ^2 - 2y'(t)/τ - Nd*f(y(t-T))/τ^2,
where f is a cut-off function which is essentially equal to the identity when the absolute value of its argument is between 1 and 10 and otherwise is equal to 0 (see figure 1), and Nd, τ and T are constants.
For this I'm using the package JiTCDDE. This provides a reasonable solution to the above equation. Nevertheless, when I try to add a noise on the right hand side of the equation, I obtain a solution which stabilize to a non-zero constant after a few oscillations. This is not a mathematical solution of the equation (the only possible constant solution being equal to zero). I don't understand why this problem arises and if it is possible to solve it.
I reproduce my code below. Here, for the sake of simplicity, I substituted the noise with an high-frequency cosine, which is introduced in the system of equation as the initial condition for a dummy variable (the cosine could have been introduced directly in the system, but for a general noise this doesn't seem possible). To simplify further the problem, I removed also the term involving the f function, as the problem arises also without it. Figure 2 shows the plot of the function given by the code.
from jitcdde import jitcdde, y, t
import numpy as np
from matplotlib import pyplot as plt
import math
from chspy import CubicHermiteSpline
# Definition of function f:
def functionf(x):
return x/4*(1+symengine.erf(x**2-Bmin**2))*(1-symengine.erf(x**2-Bmax**2))
#parameters:
τ = 42.9
T = 35.33
Nd = 8.32
# Definition of the initial conditions:
dt = .01 # Time step.
totT = 10000. # Total time.
Nmax = int(totT / dt) # Number of time steps.
Vt = np.linspace(0., totT, Nmax) # Vector of times.
# Definition of the "noise"
X = np.zeros(Nmax)
for i in range(Nmax):
X[i]=math.cos(Vt[i])
past=CubicHermiteSpline(n=3)
for time, datum in zip(Vt,X):
regular_past = [10.,0.]
past.append((
time-totT,
np.hstack((regular_past,datum)),
np.zeros(3)
))
noise= lambda t: y(2,t-totT)
# Integration of the DDE
g = [
y(1),
-y(0)/τ**2-2*y(1)/τ+0.008*noise(t)
]
g.append(0)
DDE = jitcdde(g)
DDE.add_past_points(past)
DDE.adjust_diff()
data = []
for time in np.arange(DDE.t, DDE.t+totT, 1):
data.append( DDE.integrate(time)[0] )
plt.plot(data)
plt.show()
Incidentally, I noticed that even without noise, the solution seems to be discontinuous at the point zero (y is set to be equal to zero for negative times), and I don't understand why.

As the comments unveiled, your problem eventually boiled down to this:
step_on_discontinuities assumes delays that are small with respect to the integration time and performs steps that are placed on those times where the delayed components points to the integration start (0 in your case). This way initial discontinuities are handled.
However, implementing an input with a delayed dummy variable introduces a large delay into the system, totT in your case.
The respective step for step_on_discontinuities would be at totT itself, i.e., after the desired integration time.
Thus when you reach for time in np.arange(DDE.t, DDE.t+totT, 1): in your code, DDE.t is totT.
Therefore you have made a big step before you actually start integrating and observing which may seem like a discontinuity and lead to weird results, in particular you do not see the effect of your input, because it has already “ended” at this point.
To avoid this, use adjust_diff or integrate_blindly instead of step_on_discontinuities.

How do I call a list of numpy functions without a for loop?

I'm doing data analysis that involves minimizing the least-square-error between a set of points and a corresponding set of orthogonal functions. In other words, I'm taking a set of y-values and a set of functions, and trying to zero in on the x-value that gets all of the functions closest to their corresponding y-value. Everything is being done in a 'data_set' class. The functions that I'm comparing to are all stored in one list, and I'm using a class method to calculate the total lsq-error for all of them:
self.fits = [np.poly1d(np.polyfit(self.x_data, self.y_data[n],10)) for n in range(self.num_points)]
def error(self, x, y_set):
arr = [(y_set[n] - self.fits[n](x))**2 for n in range(self.num_points)]
return np.sum(arr)
This was fine when I had significantly more time than data, but now I'm taking thousands of x-values, each with a thousand y-values, and that for loop is unacceptably slow. I've been trying to use np.vectorize:
#global scope
def func(f,x):
return f(x)
vfunc = np.vectorize(func, excluded=['x'])
…
…
#within data_set class
def error(self, x, y_set):
arr = (y_set - vfunc(self.fits, x))**2
return np.sum(arr)
func(self.fits[n], x) works fine as long as n is valid, and as far as I can tell from the docs, vfunc(self.fits, x) should be equivalent to
[self.fits[n](x) for n in range(self.num_points)]
but instead it throws:
ValueError: cannot copy sequence with size 10 to array axis with dimension 11
10 is the degree of the polynomial fit, and 11 is (by definition) the number of terms in it, but I have no idea why they're showing up here. If I change the fit order, the error message reflects the change. It seems like np.vectorize is taking each element of self.fits as a list rather than a np.poly1d function.
Anyway, if someone could either help me understand np.vectorize better, or suggest another way to eliminate that loop, that would be swell.

As the functions in question all have a very similar structure we can "manually" vectorize once we've extracted the poly coefficients. In fact, the function is then a quite simple one-liner, eval_many below:
import numpy as np
def poly_vec(list_of_polys):
O = max(p.order for p in list_of_polys)+1
C = np.zeros((len(list_of_polys), O))
for p, c in zip(list_of_polys, C):
c[len(c)-p.order-1:] = p.coeffs
return C
def eval_many(x,C):
return C#np.vander(x,11).T
# make example
list_of_polys = [np.poly1d(v) for v in np.random.random((1000,11))]
x = np.random.random((2000,))
# put all coeffs in one master matrix
C = poly_vec(list_of_polys)
# test
assert np.allclose(eval_many(x,C), [p(x) for p in list_of_polys])
from timeit import timeit
print('vectorized', timeit(lambda: eval_many(x,C), number=100)*10)
print('loopy ', timeit(lambda: [p(x) for p in list_of_polys], number=10)*100)
Sample run:
vectorized 6.817315469961613
loopy 56.35076989419758

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string