Following is the graph which you get on plotting the given data points. There is an exact point where the slope changes before giving a stable line. What we have done is obtaining the first derivative and looking out for a point where the slope makes transition from positive to negative values. But for many data points,such a transition was not found. So is there a better method to do this?
How do you find that point (marked as a red circle in the graph) using slope in python?
graph of the signal
[-0.0006029533498891765, -0.0005180378648295125, -0.0004122940532457625, -0.0002953349889182749, -0.00018692087906219124, -0.00010093727469359659, -4.699724959278395e-05, -1.602178963390488e-05, -5.340596544722853e-07, 9.079014125876195e-06, 1.976020721514149e-05, 3.0441400304406785e-05, 3.845229512135229e-05, 4.3258832011533466e-05, 4.432695132046416e-05, 4.592913028383938e-05, 5.020160751956215e-05, 5.6076263718660146e-05, 5.9814681299896755e-05, 6.195091991774426e-05, 6.408715853560565e-05, 6.568933749899475e-05, 6.889369542577295e-05, 7.209805335256503e-05, 7.370023231594025e-05]
This problem requires some additional specification. The core question is what do you mean by "a stable line"? One potential definition is "consecutive line segments with the exact same slope." However, since the slope of each line segment is likely not precisely similar, this may not be helpful.
Another potential definition is "consecutive line segments whose slopes differ by less than a defined cut-off value." Whenever we're talking about a difference of slopes, we want to look at the second derivative. We can identify transition points by finding where the absolute value of the second derivative is less than the chosen cut-off value at each point.
The question then becomes, what cut-off value is acceptable? Since you want a method that classifies the 10th point as a transition point, I'll use that to inform the decision.
Here is code that defines a cut-off value and uses that to identify points with nearly similar slopes:
data = [-0.0006029533498891765, -0.0005180378648295125, -0.0004122940532457625, -0.0002953349889182749, -0.00018692087906219124, -0.00010093727469359659, -4.699724959278395e-05, -1.602178963390488e-05, -5.340596544722853e-07, 9.079014125876195e-06, 1.976020721514149e-05, 3.0441400304406785e-05, 3.845229512135229e-05, 4.3258832011533466e-05, 4.432695132046416e-05, 4.592913028383938e-05, 5.020160751956215e-05, 5.6076263718660146e-05, 5.9814681299896755e-05, 6.195091991774426e-05, 6.408715853560565e-05, 6.568933749899475e-05, 6.889369542577295e-05, 7.209805335256503e-05, 7.370023231594025e-05]
# Import required libraries
import numpy as np
import matplotlib.pyplot as plt
# Generate figure
plt.figure()
plt.subplot(2,1,1)
plt.title('Data')
plt.plot(data)
plt.scatter(np.arange(len(data)), data)
plt.scatter(9, data[9], c='r') # Identify 10th point
plt.subplot(2,1,2)
plt.title('First Derivative')
deriv1 = data - np.roll(data, -1) # Use simple difference to compute the derivative
deriv1 = deriv1[0:-1] # Remove the last point
plt.plot(deriv1)
plt.scatter(np.arange(len(deriv1)), deriv1)
plt.scatter(9, deriv1[9], c='r') # Identify 10th point
plt.tight_layout()
# Approximate second derivative
deriv2 = deriv1 - np.roll(deriv1, -1) # Use simple difference to compute the derivative
deriv2 = deriv2[0:-1] # Remove the last point
# Plot data
plt.figure()
plt.subplot(2,1,1)
plt.title('Second Derivative')
x = np.arange(len(deriv2))
plt.plot(deriv2)
plt.scatter(x, deriv2)
plt.scatter(9, y[9], c='r') # Identify 10th point
plt.subplot(2,1,2)
plt.title('Absolute Value of Second Derivative')
y = np.abs(deriv2)
plt.plot(x, y)
plt.scatter(x, y)
plt.scatter(9, y[9], c='r') # Identify 10th point
# Correctly scale y axis
diff = max(y) - min(y)
scale = 0.1*diff
plt.ylim(min(y)-scale, max(y)+scale)
# Define cutoff value
cutoff = 1e-17
# Identify points where abs(deriv2) < cutoff
idx_filter = y <= cutoff
plt.axhline(y = cutoff, c='r', linestyle='--', alpha=0.5)
plt.scatter(x[idx_filter], y[idx_filter], s=200, edgecolor='r', facecolor = '')
plt.tight_layout()
As it turns out, there ARE two line segments with precisely the same slope. The 10th point identifies the start of them. The following code finds that transition point concisely, and should work for data with exactly one such point. It can be adapted to find multiple transition points if needed.
# Compute the first derivative
deriv = data - np.roll(data, -1) # Use simple difference to compute the derivative
deriv = deriv[0:-1] # Remove the last point
# Compute the second derivative
deriv2 = deriv - np.roll(deriv, -1) # Use simple difference to compute the derivative
deriv2 = deriv2[0:-1] # Remove the last point
# Define cutoff value
cutoff = 1e-17
# Identify points where abs(deriv2) < cutoff
idx_filter = y <= cutoff
x_transition = int(x[idx_filter][0])
y_transition = data[x_transition]
print('Transition Point Index: '+str(x_transition))
print('Transition Point Value: '+str(y_transition))
print('Difference in slopes: {:.20f}'.format(deriv2[x_transition]))
>>> Transition Point Index: 9
>>> Transition Point Value: 9.079014125876195e-06
>>> Difference in slopes: 0.00000000000000000000
Since there were no x-values provided, the derivative approximation is simplified by assuming that the x-distance between each successive point is 1. Addition of x-data would require slight modifications to the derivative approximations, and another derivative approximation method may be more appropriate if the data is unevenly distributed along the x-axis.
Related
I want to generate 2D travelling sine wave. To do this, I've set the parameters for the plane wave and generate wave for any time instants like as follows:
import numpy as np
import random
import matplotlib.pyplot as plt
f = 10 # frequency
fs = 100 # sample frequency
Ts = 1/fs # sample period
t = np.arange(0,0.5, Ts) # time index
c = 50 # speed of wave
w = 2*np.pi *f # angular frequency
k = w/c # wave number
resolution = 0.02
x = np.arange(-5, 5, resolution)
y = np.arange(-5, 5, resolution)
dx = np.array(x); M = len(dx)
dy = np.array(y); N = len(dy)
[xx, yy] = np.meshgrid(x, y);
theta = np.pi / 4 # direction of propagation
kx = k* np.cos(theta)
ky = k * np.sin(theta)
So, the plane wave would be
plane_wave = np.sin(kx * xx + ky * yy - w * t[1])
plt.figure();
plt.imshow(plane_wave,cmap='seismic',origin='lower', aspect='auto')
that gives a smooth plane wave as shown in . Also, the sine wave variation with plt.figure(); plt.plot(plane_wave[2,:]) time is given in .
However, when I want to append plane waves at different time instants then there is some discontinuity arises in figure 03 & 04 , and I want to get rid of from this problem.
I'm new in python and any help will be highly appreciated. Thanks in advance.
arr = []
for count in range(len(t)):
p = np.sin(kx * xx + ky * yy - w * t[count]); # plane wave
arr.append(p)
arr = np.array(arr)
print(arr.shape)
pp,q,r = arr.shape
sig = np.reshape(arr, (-1, r))
print('The signal shape is :', sig.shape)
plt.figure(); plt.imshow(sig.transpose(),cmap='seismic',origin='lower', aspect='auto')
plt.xlabel('X'); plt.ylabel('Y')
plt.figure(); plt.plot(sig[2,:])
This is not that much a problem of programming. It has to do more with the fact that you are using the physical quantities in a somewhat unusual way. Your plots are absolutely fine and correct.
What you seem to have misunderstood is the fact that you are talking about a 2D problem with a third dimension added for time. This is by no means wrong but if you try to append the snapshot of the 2D wave side-by-side you are using (again) the x spatial dimension to represent temporal variations. This leads to an inconsistency of the use of that coordinate axis. Now, to make this more intuitive, consider the two time instances separately. Does it not coincide with your intuition that all points on the 2D plane must have different amplitudes (unless of course the time has progressed by a multiple of the period of the wave)? This is the case indeed. Thus, when you try to append the two snapshots, a discontinuity is exhibited. In order to avoid that you have to either use a time step equal to one period, which I believe is of no practical use, or a constant time step that will make the phase of the wave on the left border of the image in the current time equal to the phase of the wave on the right border of the image in the previous time step. Yet, this will always be a constant time step, alternating the phase (on the edges of the image) between the two said values.
The same applies to the 1D case because you use the two coordinate axes to represent the wave (x is the x spatial dimension and y is used to represent the amplitude). This is what can be seen in your last plot.
Now, what would be the solution you may ask. The solution is provided by simple inspection of the mathematical formula of the wave function. In 2D, it is a scalar function of three variables (that is, takes as input three values and outputs one) and so you need at least four dimensions to represent it. Alas, we can't perceive a fourth spatial dimension, but this is not a problem in your case as the output of the function is represented with colors. Then there are three dimensions that could be used to represent the temporal evolution of your function. All you have to do is to create a 3D array where the third dimension represents time and all 2D snapshots will be stored in the first two dimensions.
When it comes to visual representation of the results you could either use some kind of waterfall plots where the z-axis will represent time or utilize the fourth dimension we can perceive, time that is, to create an animation of the evolution of the wave.
I am not very familiar with Python, so I will only provide a generic naive implementation. I am sure a lot of people here could provide some simplification and/or optimisation of the following snippet. I assume that everything in your first two blocks of code is available so changes have to be done only in the last block you present
arr = np.zeros((len(xx), len(yy), len(t))) # Initialise the array to hold the temporal evolution of the snapshots
for i in range(len(t)):
arr[:, :, i] = np.sin(kx * xx + ky * yy - w * t[i])
# Below you can plot the figures with any function you prefer or make an animation out of it
I'm trying to find a match between a set of 2D boxes with coordinates (A) (from a template with known sizes and distances between boxes) to another set of 2D boxes with coordinates (B) (which may contain more boxes than A). They should match in terms of each box from A corresponds to a single Box in B. The boxes in A together form a "stamp" which is assymmetrical in atleast one dimension.
Illustration of problem
explanation: "Stanz" in the illustration is a box from set A.
One might even think of the Set A as only 2D points (the centerpoint of the box) to make it simpler.
The end result will be to know which A box corresponds to which B box.
I can only think of very specific ways of doing this, tailored to a specific layout of boxes, is there any known generic ways of dealing with this forms of matching/search problems and what are they called?
Edit: Possible solution
I have come up with one possible solution, looking for all the possible rotations at each possible B center position for a single box from set A. Here all of the points in A would be rotated and compared against the distance to B centers. Not sure if this is a good way.
Looking for the possible rotations at each B centerpoint- solution
In your example, the transformation between the template and its presence in B can be entirely defined (actually, over-defined) by two matching points.
So here's a simple approach which is kind of performant. First, put all the points in B into a kD-tree. Now, pick a canonical "first" point in A, and hypothesize matching it to each of the points in B. To check whether it matches a particular point in B, pick a canonical "second" point in A and measure its distance to the "first" point. Then, use a standard kD proximity-bounding query to find all the points in B which are roughly that distance from your hypothesized matched "first" point in B. For each of those, determine the transformation between A and B, and for each of the other points in A, determine whether there's a point in A at roughly the right place (again, using the kD-tree), early-outing with the first unmatched point.
The worst-case performance there can get quite bad with pathological cases (O(n^3 log n), I think) but in general I would expect roughly O(n log n) for well-behaved data with a low threshold. Note that the thresholding is a bit rough-and-ready, and the results can depend on your choice of "first" and "second" points.
This is more of an idea than an answer, but it's too long for a comment. I asked some additional questions in a comment above, but the answers may not be particular relevant, so I'll go ahead and offer some thoughts in the meantime.
As you may know, point matching is its own problem domain, and if you search for 'point matching algorithm', you'll find various articles, papers, and other resources. It seems though that an ad hoc solution might be appropriate here (one that's simpler than more generic algorithms that are available).
I'll assume that the input point set can only be rotated, and not also flipped. If this idea were to work though, it should also work with flipping - you'd just have to run the algorithm separately for each flipped configuration.
In your example image, you've matched a point from set A with a point from set B so that they're coincident. Call this shared point the 'anchor' point. You'd need to do this for every combination of a point from set A and a point from set B until you found a match or exhausted the possibilities. The problem then is to determine if a match can be made given one of these matched point pairs.
It seems that for a given anchor point, a necessary but not sufficient condition for a match is that a point from set A and a point from set B can be found that are approximately the same distance from the anchor point. (What 'approximately' means would depend on the input, and would need to be tuned appropriately given that you're using integers.) This condition is met in your example image in that the center point of each point set is (approximately) the same distance from the anchor point. (Note that there could be multiple pairs of points that meet this condition, in which case you'd have to examine each such pair in turn.)
Once you have such a pair - the center points in your example - you can use some simple trigonometry and linear algebra to rotate set A so that the points in the pair coincide, after which the two point sets are locked together at two points and not just one. In your image that would involve rotating set A about 135 degrees clockwise. Then you check to see if every point in set B has a point in set A with which it's coincident, to within some threshold. If so, you have a match.
In your example, this fails of course, because the rotation is not actually a match. Eventually though, if there's a match, you'll find the anchor point pair for which the test succeeds.
I realize this would be easier to explain with some diagrams, but I'm afraid this written explanation will have to suffice for the moment. I'm not positive this would work - it's just an idea. And maybe a more generic algorithm would be preferable. But, if this did work, it might have the advantage of being fairly straightforward to implement.
[Edit: Perhaps I should add that this is similar to your solution, except for the additional step to allow for only testing a subset of the possible rotations.]
[Edit: I think a further refinement may be possible here. If, after choosing an anchor point, matching is possible via rotation, it should be the case that for every point p in B there's a point in A that's (approximately) the same distance from the anchor point as p is. Again, it's a necessary but not sufficient condition, but it allows you to quickly eliminate cases where a match isn't possible via rotation.]
Below follows a finished solution in python without kD-tree and without early outing candidates. A better way is to do the implementation yourself according to Sneftel but if you need anything quick and with a plot this might be useful.
Plot shows the different steps, starts off with just the template as a collection of connected lines. Then it is translated to a point in B where the distances between A and B points fits the best. Finally it is rotated.
In this example it was important to also match up which of the template positions was matched to which boundingbox position, so its an extra step in the end. There might be some deviations in the code compared to the outline above.
import numpy as np
import random
import math
import matplotlib.pyplot as plt
def to_polar(pos_array):
x = pos_array[:, 0]
y = pos_array[:, 1]
length = np.sqrt(x ** 2 + y ** 2)
t = np.arctan2(y, x)
zip_list = list(zip(length, t))
array_polar = np.array(zip_list)
return array_polar
def to_cartesian(pos):
# first element radius
# second is angle(theta)
# Converting polar to cartesian coordinates
radius = pos[0]
theta = pos[1]
x = radius * math.cos(theta)
y = radius * math.sin(theta)
return x,y
def calculate_distance_points(p1,p2):
return np.sqrt((p1[0]-p2[0])**2+(p1[1]-p2[1])**2)
def find_closest_point_inx(point, neighbour_set):
shortest_dist = None
closest_index = -1
# Find the point in the secondary array that is the closest
for index,curr_neighbour in enumerate(neighbour_set):
distance = calculate_distance_points(point, curr_neighbour)
if shortest_dist is None or distance < shortest_dist:
shortest_dist = distance
closest_index = index
return closest_index
# Find the sum of distances between each point in primary to the closest one in secondary
def calculate_agg_distance_arrs(primary,secondary):
total_distance = 0
for point in primary:
closest_inx = find_closest_point_inx(point, secondary)
dist = calculate_distance_points(point, secondary[closest_inx])
total_distance += dist
return total_distance
# returns a set of <primary_index,neighbour_index>
def pair_neighbours_by_distance(primary_set, neighbour_set, distance_limit):
pairs = {}
for num, point in enumerate(primary_set):
closest_inx = find_closest_point_inx(point, neighbour_set)
if calculate_distance_points(neighbour_set[closest_inx], point) > distance_limit:
closest_inx = None
pairs[num]=closest_inx
return pairs
def rotate_array(array, angle,rot_origin=None):
if rot_origin is not None:
array = np.subtract(array,rot_origin)
# clockwise rotation
theta = np.radians(angle)
c, s = np.cos(theta), np.sin(theta)
R = np.array(((c, -s), (s, c)))
rotated = np.matmul(array, R)
if rot_origin is not None:
rotated = np.add(rotated,rot_origin)
return rotated
# Finds out a point in B_set and a rotation where the points in SetA have the best alignment towards SetB.
def find_stamp_rotation(A_set, B_set):
# Step 1
anchor_point_A = A_set[0]
# Step 2. Convert all points to polar coordinates with anchor as origin
A_anchor_origin = A_set - anchor_point_A
anchor_A_polar = to_polar(A_anchor_origin)
print(anchor_A_polar)
# Step 3 for each point in B
score_tuples = []
for num_anchor, B_anchor_point_try in enumerate(B_set):
# Step 3.1
B_origin_rel_point = B_set-B_anchor_point_try
B_polar_rp_origin = to_polar(B_origin_rel_point)
# Step 3.3 select arbitrary point q from Ap
point_Aq = anchor_A_polar[1]
# Step 3.4 test each rotation, where pointAq is rotated to each B-point (except the B anchor point)
for try_rot_point_B in [B_rot_point for num_rot, B_rot_point in enumerate(B_polar_rp_origin) if num_rot != num_anchor]:
# positive rotation is clockwise
# Step 4.1 Rotate Ap by the angle between q and n
angle_to_try = try_rot_point_B[1]-point_Aq[1]
rot_try_arr = np.copy(anchor_A_polar)
rot_try_arr[:,1]+=angle_to_try
cart_rot_try_arr = [to_cartesian(e) for e in rot_try_arr]
cart_B_rp_origin = [to_cartesian(e) for e in B_polar_rp_origin]
distance_score = calculate_agg_distance_arrs(cart_rot_try_arr, cart_B_rp_origin)
score_tuples.append((B_anchor_point_try,angle_to_try,distance_score))
# Step 4.3
lowest=None
for b_point,angle,distance in score_tuples:
print("point:{} angle(rad):{} distance(sum):{}".format(b_point,360*(angle/(2*math.pi)),distance))
if lowest is None or distance < lowest[2]:
lowest = b_point, 360*angle/(2*math.pi), distance
return lowest
def test_example():
ax = plt.subplot()
ax.grid(True)
plt.title('Fit Template to BBoxes by translation and rotation')
plt.xlim(-20, 20)
plt.ylim(-20, 20)
ax.set_xticks(range(-20,20), minor=True)
ax.set_yticks(range(-20,20), minor=True)
template = np.array([[-10,-10],[-10,10],[0,0],[10,-10],[10,10], [0,20]])
# Test Bboxes are Rotated 40 degree, translated 2,2
rotated = rotate_array(template,40)
rotated = np.subtract(rotated,[2,2])
# Adds some extra bounding boxes as noise
for i in range(8):
rotated = np.append(rotated,[[random.randrange(-20,20), random.randrange(-20,20)]],axis=0)
# Scramble entries in array and return the position change.
rnd_rotated = rotated.copy()
np.random.shuffle(rnd_rotated)
element_positions = []
# After shuffling, looks at which index the "A"-marks has ended up at. For later comparison to see that the algo found the correct answer.
# This is to represent the actual case, where I will get a bunch of unordered bboxes.
rnd_map = {}
indexes_translation = [num2 for num,point in enumerate(rnd_rotated) for num2,point2 in enumerate(rotated) if point[0]==point2[0] and point[1]==point2[1]]
for num,inx in enumerate(indexes_translation):
rnd_map[num]=inx
# algo part 1/3
b_point,angle,_ = find_stamp_rotation(template,rnd_rotated)
# Plot for visualization
legend_list = np.empty((0,2))
leg_template = plt.plot(template[:,0],template[:,1],c='r')
legend_list = np.append(legend_list,[[leg_template[0],'1. template-pattern']],axis=0)
leg_bboxes = plt.scatter(rnd_rotated[:,0],rnd_rotated[:,1],c='b',label="scatter")
legend_list = np.append(legend_list,[[leg_bboxes,'2. bounding boxes']],axis=0)
leg_anchor = plt.scatter(b_point[0],b_point[1],c='y')
legend_list = np.append(legend_list,[[leg_anchor,'3. Discovered bbox anchor point']],axis=0)
# algo part 2/3
# Superimpose A onto B by A[0] to b_point
offset = b_point - template[0]
super_imposed_A = template + offset
# Plot superimposed, but not yet rotated
leg_s_imposed = plt.plot(super_imposed_A[:,0],super_imposed_A[:,1],c='k')
#plt.legend(rubberduckz, "superimposed template on anchor")
legend_list = np.append(legend_list,[[leg_s_imposed[0],'4. Templ superimposed on Bbox']],axis=0)
print("Superimposed A on B by A[0] to {}".format(b_point))
print(super_imposed_A)
# Rotate, now the template should match pattern of bboxes
# algo part 3/4
super_imposed_rotated_A = rotate_array(super_imposed_A,-angle,rot_origin=super_imposed_A[0])
# Show the beautiful match in a last plot
leg_s_imp_rot = plt.plot(super_imposed_rotated_A[:,0],super_imposed_rotated_A[:,1],c='g')
legend_list = np.append(legend_list,[[leg_s_imp_rot[0],'5. final fit']],axis=0)
plt.legend(legend_list[:,0], legend_list[:,1],loc="upper left")
plt.show()
# algo part 4/4
pairs = pair_neighbours_by_distance(super_imposed_rotated_A, rnd_rotated, 10)
print(pairs)
for inx in range(len(pairs)):
bbox_num = pairs[inx]
print("template id:{}".format(inx))
print("bbox#id:{}".format(bbox_num))
#print("original_bbox:{}".format(rnd_map[bbox_num]))
if __name__ == "__main__":
test_example()
Result on actual image with bounding boxes. Here it can be seen that the scaling is incorrect which makes the template a bit off but it will still be able to pair up and thats the desired end-result in my case.
I want to select 5 Points in each polygon based on random sampling method. And required 5 points co-ordinates(Lat,Long) in each polygon for identify which crop is grawn.
Any ideas for do this using geopandas?
Many thanks.
My suggestion involves sampling random x and y coordinates within the shape's bounding box and then checking whether the sampled point is actually within the shape. If the sampled point is within the shape then return it, otherwise repeat until a point within the shape is found. For sampling, we can use the uniform distribution, such that all points in the shape have the same probability of being sampled. Here is the function:
from shapely.geometry import Point
def random_point_in_shp(shp):
within = False
while not within:
x = np.random.uniform(shp.bounds[0], shp.bounds[2])
y = np.random.uniform(shp.bounds[1], shp.bounds[3])
within = shp.contains(Point(x, y))
return Point(x,y)
and here's an example how to apply this function to an example GeoDataFrame called geo_df to get 5 random points for each entry:
for num in range(5):
geo_df['Point{}'.format(num)] = geo_df['geometry'].apply(random_point_in_shp)
There might be more efficient ways to do this, but depending on your application the algorithm could be sufficiently fast. With my test file, which contains ~2300 entries, generating five random points for each entry took around 15 seconds on my machine.
Python developers
I am working on spectroscopy in a university. My experimental 1-D data sometimes shows "cosmic ray", 3-pixel ultra-high intensity, which is not what I want to analyze. So I want to remove this kind of weird peaks.
Does anybody know how to fix this issue in Python 3?
Thanks in advance!!
A simple solution could be to use the algorithm proposed by Whitaker and Hayes, in which they use modified z scores on the derivative of the spectrum. This medium post explains how it works and its implementation in python https://towardsdatascience.com/removing-spikes-from-raman-spectra-8a9fdda0ac22 .
The idea is to calculate the modified z scores of the spectra derivatives and apply a threshold to detect the cosmic spikes. Afterwards, a fixer is applied to remove the spike points and replace it by the mean values of the surrounding pixels.
# definition of a function to calculate the modified z score.
def modified_z_score(intensity):
median_int = np.median(intensity)
mad_int = np.median([np.abs(intensity - median_int)])
modified_z_scores = 0.6745 * (intensity - median_int) / mad_int
return modified_z_scores
# Once the spike detection works, the spectrum can be fixed by calculating the average of the previous and the next point to the spike. y is the intensity values of a spectrum, m is the window which we will use to calculate the mean.
def fixer(y,m):
threshold = 7 # binarization threshold.
spikes = abs(np.array(modified_z_score(np.diff(y)))) > threshold
y_out = y.copy() # So we don't overwrite y
for i in np.arange(len(spikes)):
if spikes[i] != 0: # If we have an spike in position i
w = np.arange(i-m,i+1+m) # we select 2 m + 1 points around our spike
w2 = w[spikes[w] == 0] # From such interval, we choose the ones which are not spikes
y_out[i] = np.mean(y[w2]) # and we average the value
return y_out
The answer depends a on what your data looks like: If you have access to two-dimensional CCD readouts that the one-dimensional spectra were created from, then you can use the lacosmic module to get rid of the cosmic rays there. If you have only one-dimensional spectra, but multiple spectra from the same source, then a quick ad-hoc fix is to make a rough normalisation of the spectra and remove those pixels that are several times brighter than the corresponding pixels in the other spectra. If you have only one one-dimensional spectrum from each source, then a less reliable option is to remove all pixels that are much brighter than their neighbours. (Depending on the shape of your cosmics, you may even want to remove the nearest 5 pixels or something, to catch the wings of the cosmic ray peak as well).
Please help my poor knowledge of signal processing.
I want to smoothen some data. Here is my code:
import numpy as np
from scipy.signal import butter, filtfilt
def testButterworth(nyf, x, y):
b, a = butter(4, 1.5/nyf)
fl = filtfilt(b, a, y)
return fl
if __name__ == '__main__':
positions_recorded = np.loadtxt('original_positions.txt', delimiter='\n')
number_of_points = len(positions_recorded)
end = 10
dt = end/float(number_of_points)
nyf = 0.5/dt
x = np.linspace(0, end, number_of_points)
y = positions_recorded
fl = testButterworth(nyf, x, y)
I am pretty satisfied with results except one point:
it is absolutely crucial to me that the start and end point in returned values equal to the start and end point of input. How can I introduce this restriction?
UPD 15-Dec-14 12:04:
my original data looks like this
Applying the filter and zooming into last part of the graph gives following result:
So, at the moment I just care about the last point that must be equal to original point. I try to append copy of data to the end of original list this way:
the result is as expected even worse.
Then I try to append data this way:
And the slice where one period ends and next one begins, looks like that:
To do this, you're always going to cheat somehow, since the true filter applied to the true data doesn't behave the way you require.
One of the best ways to cheat with your data is to assume it's periodic. This has the advantages that: 1) it's consistent with the data you actually have and all your changing is to append data to the region you don't know about (so assuming it's periodic as as reasonable as anything else -- although may violate some unstated or implicit assumptions); 2) the result will be consistent with your filter.
You can usually get by with this by appending copies of your data to the beginning and end of your real data, or just small pieces, depending on your filter.
Since the FFT assumes that the data is periodic anyway, that's often a quick and easy approach, and is fully accurate (whereas concatenating the data is an estimation of an infinitely periodic waveform). Here's an example of the FFT approach for a step filter.
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0, 128)
y = (np.sin(.22*(x+10))>0).astype(np.float)
# filter
y2 = np.fft.fft(y)
f0 = np.fft.fftfreq(len(x))
y2[(f0<-.25) | (f0>.25)] = 0
y3 = abs(np.fft.ifft(y2))
plt.plot(x, y)
plt.plot(x, y3)
plt.xlim(-10, 140)
plt.ylim(-.1, 1.1)
plt.show()
Note how the end points bend towards each other at either end, even though this is not consistent with the periodicity of the waveform (since the segments at either end are very truncated). This can also be seen by adjusting waveform so that the ends are the same (here I used x+30 instead of x+10, and here the ends don't need to bend to match-up so they stay at level with the end of the data.
Note, also, to have the endpoints actually be exactly equal you would have to extend this plot by one point (at either end), since it periodic with exactly the wavelength of the original waveform. Doing this is not ad hoc though, and the result will be entirely consistent with your analysis, but just representing one extra point of what was assumed to be infinite repeats all along.
Finally, this FFT trick works best with waveforms of length 2n. Other lengths may be zero padded in the FFT. In this case, just doing concatenations to either end as I mentioned at first might be the best way to go.
The question is how to filter data and require that the left endpoint of the filtered result matches the left endpoint of the data, and same for the right endpoint. (That is, in general, the filtered result should be close to most of the data points, but not necessarily exactly match any of them, but what if you need a match at both endpoints?)
To make the filtered result exactly match the endpoints of a curve, one could add a padding of points at either end of the curve and adjust the y-position of this padding so that the endpoints of the valid part of the filter exactly matched the end points of the original data (without the padding).
In general, this can be done by either iterating towards a solution, adjusting the padding y-position until the ends line up, or by calculating a few values and then interpolating to determine the y-positions that would be required for the matched endpoints. I'll do the second approach.
Here's the code I used, where I simulated the data as a sine wave with two flat pieces on either side (note, that these flat pieces are not the padding, but I'm just trying to make data that looks a bit like the OPs).
import numpy as np
from scipy.signal import butter, filtfilt
import matplotlib.pyplot as plt
#### op's code
def testButterworth(nyf, x, y):
#b, a = butter(4, 1.5/nyf)
b, a = butter(4, 1.5/nyf)
fl = filtfilt(b, a, y)
return fl
def do_fit(data):
positions_recorded = data
#positions_recorded = np.loadtxt('original_positions.txt', delimiter='\n')
number_of_points = len(positions_recorded)
end = 10
dt = end/float(number_of_points)
nyf = 0.5/dt
x = np.linspace(0, end, number_of_points)
y = positions_recorded
fx = testButterworth(nyf, x, y)
return fx
### simulate some data (op should have done this too!)
def sim_data():
t = np.linspace(.1*np.pi, (2.-.1)*np.pi, 100)
y = np.sin(t)
c = np.ones(10, dtype=np.float)
z = np.concatenate((c*y[0], y, c*y[-1]))
return z
### code to find the required offset padding
def fit_with_pads(v, data, n=1):
c = np.ones(n, dtype=np.float)
z = np.concatenate((c*v[0], data, c*v[1]))
fx = do_fit(z)
return fx
def get_errors(data, fx):
n = (len(fx)-len(data))//2
return np.array((fx[n]-data[0], fx[-n]-data[-1]))
def vary_padding(data, span=.005, n=100):
errors = np.zeros((4, n)) # Lpad, Rpad, Lerror, Rerror
offsets = np.linspace(-span, span, n)
for i in range(n):
vL, vR = data[0]+offsets[i], data[-1]+offsets[i]
fx = fit_with_pads((vL, vR), data, n=1)
errs = get_errors(data, fx)
errors[:,i] = np.array((vL, vR, errs[0], errs[1]))
return errors
if __name__ == '__main__':
data = sim_data()
fx = do_fit(data)
errors = vary_padding(data)
plt.plot(errors[0], errors[2], 'x-')
plt.plot(errors[1], errors[3], 'o-')
oR = -0.30958
oL = 0.30887
fp = fit_with_pads((oL, oR), data, n=1)[1:-1]
plt.figure()
plt.plot(data, 'b')
plt.plot(fx, 'g')
plt.plot(fp, 'r')
plt.show()
Here, for the padding I only used a single point on either side (n=1). Then I calculate the error for a range of values shifting the padding up and down from the first and last data points.
For the plots:
First I plot the offset vs error (between the fit and the desired data value). To find the offset to use, I just zoomed in on the two lines to find the x-value of the y zero crossing, but to do this more accurately, one could calculate the zero crossing from this data:
Here's the plot of the original "data", the fit (green) and the adjusted fit (red):
and zoomed in the RHS:
The important point here is that the red (adjusted fit) and blue (original data) endpoints match, even though the pure fit doesn't.
Is this a valid approach? Of the various options, this seems the most reasonable since one isn't usually making any claims about the data that isn't being shown, and also for show region has an accurately applied filter. For example, FFTs usually assume the data is zero or periodic beyond the boundaries. Certainly, though, to be precise one should explain what was done.