How to produce X values of a stretched graph? - groovy

I'm trying to "normalize" monthly data in a way.
What I mean by that is, I need to take daily values and check the data from each month against the data in another month.
The only problem with this is that some months are longer than others. So I have come up with a way that I want to do this, but I'm kind of confused as to exactly how to do it...
Basically, I'm looking at this website: http://paulbourke.net/miscellaneous/interpolation/ and trying to transform each set of coordinates into a graph with 31 X- and Y-values (I would like to use the Cosine interpolator, but I'm still trying to figure out what to do with it).
Now, the X values have a function. I can pass in something like (1..28) and morph it into 31 values...simple enough code, something like this works for me:
def total = (1..30)
def days = 28
def resize = { x, y ->
    def result = []
    // scale each x so that the last element lands on y
    x.each { result << (it * (y / x.size())) }
    return result
}
resize(total,days)
Which returns one rescaled X-value per element of total (30 here), with the last value equal to 28.
My question is: How do I translate a list of the corresponding Y values to these values? I'm having a really hard time wrapping my head around the concept and could use a little help.
My first thought was to simply run the Y-values through this function too, but that returns values that are all lower than the original input.
I'm looking for something that will retain the values at certain points, but simply stretch the graph out horizontally.
For example, at the x value at (1/3) of the graph's length, the value needs to be the same as it would be at (1/3) of the original graph's length.
Can anyone help me out on this? It's got me stumped.
Thanks in advance!

Not sure where the problem lies with this, so I made up some data
I think your algorithm is correct, and you only need to normalize the x-axis.
I came up with this code (and some plots) to demonstrate what I believe is the answer
Define some x and y values:
def x = 1..30
def y = [1..15,15..1].flatten()
Then, generate a list of xy values in the form: [ [ x, y ], [ x, y ], ...
def xy = [x,y].transpose()
If we plot this list, we get:
Then define a normalize function (basically the same as yours, but it doesn't touch the y value)
def normalize( xylist, days ) {
    xylist.collect { x, y -> [ x * ( days / xylist.size() ), y ] }
}
Then we can normalize our list to 28 days
def normalxy = normalize( xy, 28 )
Now, if we plot these points, we get
As you can see, both plots have the same shape, they are just different widths...
Have I missed the point?
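If the goal is instead to end up with exactly 31 evenly spaced samples per month, the Y-values have to be resampled rather than rescaled. Below is a minimal Python sketch of that idea, using the cosine interpolation formula from the Paul Bourke page linked in the question. The names cosine_interpolate and stretch are made up for this illustration, and it assumes the daily values are evenly spaced and that the first and last days should map onto the first and last output points.

```python
import math

def cosine_interpolate(y1, y2, mu):
    """Blend between y1 and y2 for mu in [0, 1] with a cosine ease."""
    mu2 = (1 - math.cos(mu * math.pi)) / 2
    return y1 * (1 - mu2) + y2 * mu2

def stretch(ys, n_out):
    """Resample the values in ys onto n_out evenly spaced points.

    The first and last values are kept; points in between are taken at the
    same relative position (e.g. 1/3 of the way along) as in the input.
    """
    n_in = len(ys)
    if n_in < 2 or n_out < 2:
        raise ValueError("need at least two input and two output points")
    out = []
    for k in range(n_out):
        pos = k * (n_in - 1) / (n_out - 1)   # fractional index into ys
        i = min(int(pos), n_in - 2)          # left neighbour
        mu = pos - i                         # distance past the left neighbour
        out.append(cosine_interpolate(ys[i], ys[i + 1], mu))
    return out

february = [float(day) for day in range(1, 29)]  # 28 made-up daily values
stretched = stretch(february, 31)
print(len(stretched))  # 31
```

Because every output point is looked up at its relative position in the input, the value one third of the way along the stretched list matches the value one third of the way along the original.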

Related

How to obtain all possible values using interpolate of scipy?

I'm trying to identify all the y-axis values at which my data passes through zero on the x-axis. Using scipy's interpolation function only shows me the last value. Why is that, and how can I deal with this issue?
This is my code:
x = <https://drive.google.com/file/d/1shJtrg9orwLFm9Da5wXaysbQZVtZYO-r/view?usp=share_link>
y = np.arange(0,150,1)
y_interp = interp1d(x, y, kind='cubic')
y_interp(0)
The result is:
array(112.72889837)
But it also passes by ~50.

using Geopandas, How to randomly select in each polygon 5 Points by sampling method

I want to select 5 points in each polygon using random sampling. I need the coordinates (Lat, Long) of those 5 points in each polygon to identify which crop is grown.
Any ideas on how to do this using geopandas?
Many thanks.
My suggestion involves sampling random x and y coordinates within the shape's bounding box and then checking whether the sampled point is actually within the shape. If the sampled point is within the shape then return it, otherwise repeat until a point within the shape is found. For sampling, we can use the uniform distribution, such that all points in the shape have the same probability of being sampled. Here is the function:
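import numpy as np
from shapely.geometry import Point

def random_point_in_shp(shp):
    # Rejection sampling: draw points uniformly from the bounding box
    # (minx, miny, maxx, maxy) until one falls inside the shape.
    within = False
    while not within:
        x = np.random.uniform(shp.bounds[0], shp.bounds[2])
        y = np.random.uniform(shp.bounds[1], shp.bounds[3])
        within = shp.contains(Point(x, y))
    return Point(x, y)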
And here's an example of how to apply this function to an example GeoDataFrame called geo_df to get 5 random points for each entry:
for num in range(5):
    geo_df['Point{}'.format(num)] = geo_df['geometry'].apply(random_point_in_shp)
There might be more efficient ways to do this, but depending on your application the algorithm could be sufficiently fast. With my test file, which contains ~2300 entries, generating five random points for each entry took around 15 seconds on my machine.

making a function that translates a point around another point

Given an array of points, my program should, in theory, find the two points furthest from each other, then calculate the angle that those two points make with the x axis, and then rotate all the points in the array around the averaged center of all the points by that angle. For some reason my translation function, which rotates all the points around the center, is not working; it gives me unexpected values. I am fairly sure the math I am using is accurate, since I tested the formula in Wolfram Alpha and plotted the points on Desmos. I am not sure what's wrong with my code. Any help would be greatly appreciated.
This is the code to translate the array:
def translation(array,centerArray):
    array1=array
    maxDistance=0
    point1=[]
    point2=[]
    global angle
    for i in range(len(array1)):
        for idx in range(len(array1)):
            if(maxDistance<math.sqrt(((array1[i][0]-array1[idx][0])**2)+((array1[i][1]-array1[idx][1])**2)+((array1[i][2]-array1[idx][2])**2))):
                maxDistance=math.sqrt(((array1[i][0]-array1[idx][0])**2)+((array1[i][1]-array1[idx][1])**2)+((array1[i][2]-array1[idx][2])**2))
                point1 = array1[i]
                point2 = array1[idx]
    angle=math.atan2(point1[1]-point2[1],point1[0]-point2[0]) #gets the angle between two furthest points and xaxis
    for i in range(len(array1)): #this is the problem here
        array1[i][0]=((array[i][0]-centerArray[0])*math.cos(angle)-(array[i][1]-centerArray[1])*math.sin(angle))+centerArray[0] #rotate x cordiate around center of all points
        array1[i][1]=((array[i][1]-centerArray[1])*math.cos(angle)+(array[i][0]-centerArray[0])*math.sin(angle))+centerArray[1] #rotate y cordiate around center of all points
    return array1
This is the code I am using to test it. tortose is what I set turtle graphics name as
tortose.color("violet")
testarray=[[200,400,9],[200,-100,9]] #array of 2 3d points but don't worry about z axis it will not be used for in function translation
print("testsarray",testarray)
for i in range(len(testarray)): #graph points in testarray
    tortose.setposition(testarray[i][0],testarray[i][1])
    tortose.dot()
testcenter=findCenter(testarray) # array of 1 point in the center of all the points format center=[x,y,z] but again don't worry about z
print("center",testcenter)
translatedTest=translation(testarray,testcenter) # array of points after they have been translated same format and size of testarray
print("translatedarray",translatedTest) #should give the output [[-50,150,9]] as first point but instead give output of [-50,-99.999999997,9] not sure why
tortose.color("green")
for i in range(len(testarray)): #graphs rotated points
    tortose.setposition(translatedTest[i][0],translatedTest[i][1])
    tortose.dot()
print(angle*180/3.14) #checks to make sure angle is 90 degrees because it should be in this case this is working fine
tortose.color("red")
tortose.setposition(testcenter[0],testcenter[1])
tortose.dot()
The findCenter code finds the center of all points in the array (don't worry about the z axis, since it is not used in translation):
def findCenter(array):
    sumX = 0
    sumY = 0
    sumZ = 0
    for i in range(len(array)):
        sumX += array[i][0]
        sumY += array[i][1]
        sumZ += array[i][2]
    centerX= sumX/len(array)
    centerY= sumY/len(array)
    centerZ= sumZ/len(array)
    #print(centerX)
    #print(centerY)
    #print(centerZ)
    centerArray=[centerX,centerY,centerZ]
    return centerArray
import math
import turtle
tortose = turtle.Turtle()
tortose.penup()
my expected output should be a point at (-50,150) but it is giving me a point at (-50,-99.99999999999997)
This is a common mistake when doing in-place rotations:
array1[i][0]= ...
array1[i][1]= ... array[i][0] ...
First you update array1[i][0]. Then you update array1[i][1], but you use the new value when you should use the old value. Instead, temporarily store the old value:
x = array1[i][0]
array1[i][0]=((array[i][0]-centerArray[0])*math.cos(angle)-(array[i][1]-centerArray[1])*math.sin(angle))+centerArray[0] #rotate x cordiate around center of all points
array1[i][1]=((array[i][1]-centerArray[1])*math.cos(angle)+(x-centerArray[0])*math.sin(angle))+centerArray[1] #rotate y cordiate around center of all points
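Another way to sidestep the problem is to not rotate in place at all: read both old coordinates first and write the results into a new list. The following is only a sketch of that idea; rotate_points is a name made up here, and it assumes each point is an [x, y, z] list whose z value is passed through unchanged, as in the question.

```python
import math

def rotate_points(points, center, angle):
    """Return a new list with each point rotated about center in the xy-plane."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    rotated = []
    for x, y, z in points:
        # Both offsets are computed from the ORIGINAL coordinates
        # before anything is written.
        dx, dy = x - center[0], y - center[1]
        rotated.append([
            dx * cos_a - dy * sin_a + center[0],
            dy * cos_a + dx * sin_a + center[1],
            z,
        ])
    return rotated

test_points = [[200, 400, 9], [200, -100, 9]]
center = [200, 150, 9]
rotated = rotate_points(test_points, center, math.pi / 2)
print(rotated[0])  # approximately [-50, 150, 9], up to floating-point rounding
```

Since the input list is never modified, the caller can still plot the original points after the call, which the in-place version does not allow because array1 and array refer to the same list.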

Filtering signal: how to restrict filter that last point of output must equal the last point of input

Please help my poor knowledge of signal processing.
I want to smoothen some data. Here is my code:
import numpy as np
from scipy.signal import butter, filtfilt
def testButterworth(nyf, x, y):
    b, a = butter(4, 1.5/nyf)
    fl = filtfilt(b, a, y)
    return fl

if __name__ == '__main__':
    positions_recorded = np.loadtxt('original_positions.txt', delimiter='\n')
    number_of_points = len(positions_recorded)
    end = 10
    dt = end/float(number_of_points)
    nyf = 0.5/dt
    x = np.linspace(0, end, number_of_points)
    y = positions_recorded
    fl = testButterworth(nyf, x, y)
I am pretty satisfied with the results except for one point:
it is absolutely crucial to me that the start and end points of the returned values equal the start and end points of the input. How can I introduce this restriction?
UPD 15-Dec-14 12:04:
my original data looks like this
Applying the filter and zooming into last part of the graph gives following result:
So, at the moment I just care about the last point that must be equal to original point. I try to append copy of data to the end of original list this way:
the result is as expected even worse.
Then I try to append data this way:
And the slice where one period ends and next one begins, looks like that:
To do this, you're always going to cheat somehow, since the true filter applied to the true data doesn't behave the way you require.
One of the best ways to cheat with your data is to assume it's periodic. This has the advantages that: 1) it's consistent with the data you actually have, and all you're changing is to append data to the region you don't know about (so assuming it's periodic is as reasonable as anything else, although it may violate some unstated or implicit assumptions); 2) the result will be consistent with your filter.
You can usually get by with this by appending copies of your data to the beginning and end of your real data, or just small pieces, depending on your filter.
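As a rough illustration of the concatenation idea, here is a small pure-Python sketch. It uses a plain centred moving average as a stand-in for the Butterworth filter (that substitution is my assumption, chosen only to keep the example self-contained), and the function names are made up. The data is tiled three times, filtered, and the middle copy is kept, so the windows at both ends see wrapped-around data instead of a hard edge.

```python
def moving_average(values, half_width):
    """Centred moving average; windows are truncated at the list edges."""
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out

def filter_assuming_periodic(values, half_width):
    """Filter the values as if they repeated forever in both directions.

    Assumes half_width is smaller than len(values), so one copy on each
    side is enough padding.
    """
    values = list(values)
    n = len(values)
    tiled = values * 3                       # copy before, original, copy after
    return moving_average(tiled, half_width)[n:2 * n]

data = [0.0, 1.0, 2.0, 3.0]
print(moving_average(data, 1))            # edges use truncated windows
print(filter_assuming_periodic(data, 1))  # edges use wrapped-around neighbours
```

With a real recursive filter the padding would need to be long enough for the filter's transient to die out, which is why whole copies of the data are often used.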
Since the FFT assumes that the data is periodic anyway, that's often a quick and easy approach, and is fully accurate (whereas concatenating the data is an estimation of an infinitely periodic waveform). Here's an example of the FFT approach for a step filter.
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0, 128)
y = (np.sin(.22*(x+10))>0).astype(float)  # np.float was removed in NumPy 1.24
# filter
y2 = np.fft.fft(y)
f0 = np.fft.fftfreq(len(x))
y2[(f0<-.25) | (f0>.25)] = 0
y3 = abs(np.fft.ifft(y2))
plt.plot(x, y)
plt.plot(x, y3)
plt.xlim(-10, 140)
plt.ylim(-.1, 1.1)
plt.show()
Note how the end points bend towards each other at either end, even though this is not consistent with the periodicity of the waveform (since the segments at either end are very truncated). This can also be seen by adjusting the waveform so that the ends are the same (here I used x+30 instead of x+10); the ends then don't need to bend to match up, so they stay level with the end of the data.
Note, also, that to have the endpoints actually be exactly equal you would have to extend this plot by one point (at either end), since it is periodic with exactly the wavelength of the original waveform. Doing this is not ad hoc though, and the result will be entirely consistent with your analysis, just representing one extra point of what was assumed to be infinite repeats all along.
Finally, this FFT trick works best with waveforms of length 2^n. Other lengths may be zero padded in the FFT. In this case, just doing concatenations to either end as I mentioned at first might be the best way to go.
The question is how to filter data and require that the left endpoint of the filtered result matches the left endpoint of the data, and same for the right endpoint. (That is, in general, the filtered result should be close to most of the data points, but not necessarily exactly match any of them, but what if you need a match at both endpoints?)
To make the filtered result exactly match the endpoints of a curve, one could add a padding of points at either end of the curve and adjust the y-position of this padding so that the endpoints of the valid part of the filter exactly matched the end points of the original data (without the padding).
In general, this can be done by either iterating towards a solution, adjusting the padding y-position until the ends line up, or by calculating a few values and then interpolating to determine the y-positions that would be required for the matched endpoints. I'll do the second approach.
Here's the code I used, where I simulated the data as a sine wave with two flat pieces on either side (note, that these flat pieces are not the padding, but I'm just trying to make data that looks a bit like the OPs).
import numpy as np
from scipy.signal import butter, filtfilt
import matplotlib.pyplot as plt
#### op's code
def testButterworth(nyf, x, y):
    #b, a = butter(4, 1.5/nyf)
    b, a = butter(4, 1.5/nyf)
    fl = filtfilt(b, a, y)
    return fl

def do_fit(data):
    positions_recorded = data
    #positions_recorded = np.loadtxt('original_positions.txt', delimiter='\n')
    number_of_points = len(positions_recorded)
    end = 10
    dt = end/float(number_of_points)
    nyf = 0.5/dt
    x = np.linspace(0, end, number_of_points)
    y = positions_recorded
    fx = testButterworth(nyf, x, y)
    return fx

### simulate some data (op should have done this too!)
def sim_data():
    t = np.linspace(.1*np.pi, (2.-.1)*np.pi, 100)
    y = np.sin(t)
    c = np.ones(10, dtype=float)  # np.float was removed in NumPy 1.24
    z = np.concatenate((c*y[0], y, c*y[-1]))
    return z

### code to find the required offset padding
def fit_with_pads(v, data, n=1):
    c = np.ones(n, dtype=float)
    z = np.concatenate((c*v[0], data, c*v[1]))
    fx = do_fit(z)
    return fx

def get_errors(data, fx):
    n = (len(fx)-len(data))//2
    return np.array((fx[n]-data[0], fx[-n]-data[-1]))

def vary_padding(data, span=.005, n=100):
    errors = np.zeros((4, n)) # Lpad, Rpad, Lerror, Rerror
    offsets = np.linspace(-span, span, n)
    for i in range(n):
        vL, vR = data[0]+offsets[i], data[-1]+offsets[i]
        fx = fit_with_pads((vL, vR), data, n=1)
        errs = get_errors(data, fx)
        errors[:,i] = np.array((vL, vR, errs[0], errs[1]))
    return errors

if __name__ == '__main__':
    data = sim_data()
    fx = do_fit(data)
    errors = vary_padding(data)
    plt.plot(errors[0], errors[2], 'x-')
    plt.plot(errors[1], errors[3], 'o-')
    oR = -0.30958
    oL = 0.30887
    fp = fit_with_pads((oL, oR), data, n=1)[1:-1]
    plt.figure()
    plt.plot(data, 'b')
    plt.plot(fx, 'g')
    plt.plot(fp, 'r')
    plt.show()
Here, for the padding I only used a single point on either side (n=1). Then I calculate the error for a range of values shifting the padding up and down from the first and last data points.
For the plots:
First I plot the offset vs error (between the fit and the desired data value). To find the offset to use, I just zoomed in on the two lines to find the x-value of the y zero crossing, but to do this more accurately, one could calculate the zero crossing from this data:
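For reference, the zero crossing could be computed rather than read off the plot. The following is a minimal sketch, not the code used for the numbers above: zero_crossing is a made-up helper, and it assumes the error varies roughly linearly between neighbouring samples so that linear interpolation between the two samples that bracket zero is good enough.

```python
def zero_crossing(pad_values, fit_errors):
    """Return the pad value at which the error first crosses zero.

    pad_values and fit_errors are equal-length sequences, e.g. one pad row
    and the matching error row of the array built by vary_padding.
    Returns None if the error never changes sign.
    """
    for i in range(len(fit_errors) - 1):
        e0, e1 = fit_errors[i], fit_errors[i + 1]
        if e0 == 0:
            return pad_values[i]
        if e0 * e1 < 0:
            # Linear interpolation between the two bracketing samples.
            t = e0 / (e0 - e1)
            return pad_values[i] + t * (pad_values[i + 1] - pad_values[i])
    if fit_errors[-1] == 0:
        return pad_values[-1]
    return None

# Made-up numbers, only to show the call:
pads = [0.30, 0.31, 0.32]
errs = [-0.002, 0.002, 0.006]
print(zero_crossing(pads, errs))
```

With the array from vary_padding, this would be called as zero_crossing(errors[0], errors[2]) for the left pad and zero_crossing(errors[1], errors[3]) for the right pad.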
Here's the plot of the original "data", the fit (green) and the adjusted fit (red):
and zoomed in the RHS:
The important point here is that the red (adjusted fit) and blue (original data) endpoints match, even though the pure fit doesn't.
Is this a valid approach? Of the various options, this seems the most reasonable, since one isn't usually making any claims about the data that isn't being shown, and the shown region also has an accurately applied filter. For example, FFTs usually assume the data is zero or periodic beyond the boundaries. Certainly, though, to be precise one should explain what was done.

Filling up an output vector with stata loop

When you take centiles of a variable in Stata, e.g.
*set directory
cd"C:\Etc\Etc Etc\"
*open data file
use "dataset.dta",clear
*get centiles
centile var1, centile(1,5(5)95,99)
is there some way to record the resulting centile table to excel? The centile values are stored in r(c_#), where # indicates the centile at which you want the data. But I need a vector of the values at all the centiles, more or less as it appears in the output window.
I have attempted to use foreach loop to get the centiles into a vector, as follows:
*Create column of centiles
foreach i in r(centiles) {
xx[1,`i']=r(c_`i')
}
without success.
Thanks
EDIT:
I've since found this to work:
matrix X = 0,0
forvalues i=1/21 {
matrix X = `i',round(r(c_`i'),.001)\ X
}
The only inconveniences are: 1) I have to include a first row of 0,0 in the output, which I will then subsequently drop. 2) In this case I have 21 centiles, but it would be nice to automate the number of centiles in case I want to change it, for example something like this:
forvalues i=1/r(n_cent) {
matrix X = `i',round(r(c_`i'),.001)\ X
}
But the "i=1/r(n_cent)" is invalid syntax. Any advice as to how I might overcome these two inconveniences would be much appreciated.
Thanks
You can use the following syntax.
Load some data and compute the percentiles.
sysuse auto, clear
centile price, centile(1,5(5)95,99)
The matrix that is supposed to contain the results has to be initialized. This matrix is called X. It has as many rows as there are centiles requested via the centile command. It has two columns. At this stage, the matrix is populated with zeroes.
matrix X = J(`=wordcount("`r(centiles)'")', 2, 0)
The following loop is stepping through the results of the centile command and is replacing the zeroes in matrix X with the appropriate results. The first column of the matrix contains the number of the centile (1, 5, 10, ...) and the second column contains the result
forvalues i = 1 / `=wordcount("`r(centiles)'")' {
    local cent: word `i' of `r(centiles)'
    matrix X[`i', 1] = `cent'
    matrix X[`i', 2] = r(c_`i')
}
Print the results:
matrix list X
If you are using round(), you are likely doing something wrong. There are few reasons to deliberately lose precision in the data; you can always display as many digits as you like using format this way or another (either applied to the data, or as an option of list or matrix list).
I wrote the epctile command, which returns percentiles as an estimation command, i.e., in the e(b) vector. This can be used immediately; type findit epctile to download it.
You can modify your proposal as follows:
local thenumlist 1, 5(5)95, 99
centile variable, centile(`thenumlist')
forvalues i=1/`=r(n_cent)' {
    matrix X = nullmat(X) \ r(c_`i')
}
numlist "`thenumlist'"
matrix rownames X = `r(numlist)'
matrix list X, format(%9.3f)
