Extrapolation -- awk based - linux

I need help in the following: I have a data file (columns separated by "\t" tabular) like this data.dat
# y1 y2 y3 y4
17.1685 21.6875 20.2393 26.3158
These are x values of 4 points for a linear fit. The four y values are constant: 0, 200, 400, 600.
I can create a linear fit of the point pairs (x,y): (x1,y1)=(17.1685,0), (x2,y2)=(21.6875,200), (x3,y3)=(20.2393,400), (x4,y4)=(26.3158,600).
Now I would like to make a linear fit on three of these point paris, (x1,y1), (x2,y2), (x3,y3) and (x2,y2), (x3,y3), (x4,y4) and (x1,y1), (x3,y3), (x4,y4) and (x1,y1), (x2,y2), (x4,y4).
If I have these three of points with a linear fit I would like to know the value of the x value of the extrapolated point being out of these three fitted points.
I have so far this awk code:
#!/usr/bin/awk -f
BEGIN{
z[1] = 0;
z[2] = 200;
z[3] = 400;
z[4] = 600;
}
{
split($0,str,"\t");
n = 0.0;
for(i=1; i<=NF; i++)
{
centr[i] = str[i];
n += 1.0;
# printf("%d\t%f\t%.1f\t",i,centr[i],z[i]);
}
# print "";
if (n > 2)
{
lsq(n,z,centr);
}
}
function lsq(n,x,y)
{
sx = 0.0
sy = 0.0
sxx = 0.0
syy = 0.0
sxy = 0.0
eps = 0.0
for (i=1;i<=n;i++)
{
sx += x[i]
sy += y[i]
sxx += x[i]*x[i]
sxy += x[i]*y[i]
syy += y[i]*y[i]
}
if ( (n==0) || ((n*sxx-sx*sx)==0) )
{
next;
}
# print "number of data points = " n;
a = (sxx*sy-sxy*sx)/(n*sxx-sx*sx)
b = (n*sxy-sx*sy)/(n*sxx-sx*sx)
for(i=1;i<=n;i++)
{
ycalc[i] = a+b*x[i]
dy[i] = y[i]-ycalc[i]
eps += dy[i]*dy[i]
}
print "# Intercept =\t"a"
print "# Slope =\t"b"
for (i=1;i<=n;i++)
{
printf("%8g %8g %8g \n",x[i],y[i],ycalc[i])
}
} # function lsq()
So,
If we extrapolate to the place of 4th
0 17.1685 <--(x1,y1)
200 21.6875 <--(x2,y2)
400 20.2393 <--(x3,y3)
600 22.7692 <<< (x4 = 600,y1 = 22.7692)
If we extrapolate to the place of 3th
0 17.1685 <--(x1,y1)
200 21.6875 <--(x2,y2)
400 23.6867 <<< (x3 = 400,y3 = 23.6867)
600 26.3158 <--(x4,y4)
0 17.1685
200 19.35266 <<<
400 20.2393
600 26.3158
0 18.1192 <<<
200 21.6875
400 20.2393
600 26.3158
My current output is the following:
$> ./prog.awk data.dat
# Intercept = 17.4537
# Slope = 0.0129968
0 17.1685 17.4537
200 21.6875 20.0531
400 20.2393 22.6525
600 26.3158 25.2518

Assuming the core calculation in the lsq function is OK (it looks about right, but I haven't scrutinized it), then that gives you the slope and intercept for the least sum of squares line of best fit for the input data set (parameters x, y, n). I'm not sure I understand the tail end of the function.
For your 'take three points and calculate the fourth' problem, the simplest way is to generate the 4 subsets (logically, by deleting one point from the set of four on each of four calls), and redo the calculation.
You need to call another function that takes the line data (slope, intercept) from lsq and interpolates (extrapolates) the value at another y value. That's a straight-forward calculation (x = m * y + c), but you need to determine which y value is missing from the set of 3 you pass in.
You could 'optimize' (meaning 'complicate') this scheme by dropping one value at a time from the 'sums of squares' and 'sums' and 'sum of products' values, recalculating the slope, intercept, and then calculating the missing point again.
(I'll also observe that normally it would be the x-coordinates with the fixed values 0, 200, 400, 600 and the y-coordinates would be the values read. However, that's just a matter of orientation, so it is not crucial.)
Here's at least plausibly working code. Since awk automatically splits on white space, there's no need for you to split on tabs specifically; the read loop takes this into account.
The code needs serious refactoring; there is a ton of repetition in it - however, I also have a job that I'm supposed to do.
#!/usr/bin/awk -f
BEGIN{
z[1] = 0;
z[2] = 200;
z[3] = 400;
z[4] = 600;
}
{
for (i = 1; i <= NF; i++)
{
centr[i] = $i
}
if (NF > 2)
{
lsq(NF, z, centr);
}
}
function lsq(n, x, y)
{
if (n == 0) return
sx = 0.0
sy = 0.0
sxx = 0.0
syy = 0.0
sxy = 0.0
for (i = 1; i <= n; i++)
{
print "x[" i "] = " x[i] ", y[" i "] = " y[i]
sx += x[i]
sy += y[i]
sxx += x[i]*x[i]
sxy += x[i]*y[i]
syy += y[i]*y[i]
}
if ((n*sxx - sx*sx) == 0) return
# print "number of data points = " n;
a = (sxx*sy-sxy*sx)/(n*sxx-sx*sx)
b = (n*sxy-sx*sy)/(n*sxx-sx*sx)
for (i = 1; i <= n; i++)
{
ycalc[i] = a+b*x[i]
}
print "# Intercept = " a
print "# Slope = " b
print "Line: x = " a " + " b " * y"
for (i = 1; i <= n; i++)
{
printf("x = %8g, yo = %8g, yc = %8g\n", x[i], y[i], ycalc[i])
}
print ""
print "Different subsets\n"
for (drop = 1; drop <= n; drop++)
{
print "Subset " drop
sx = sy = sxx = sxy = syy = 0
j = 1
for (i = 1; i <= n; i++)
{
if (i == drop) continue
print "x[" j "] = " x[i] ", y[" j "] = " y[i]
sx += x[i]
sy += y[i]
sxx += x[i]*x[i]
sxy += x[i]*y[i]
syy += y[i]*y[i]
j++
}
if (((n-1)*sxx - sx*sx) == 0) continue
a = (sxx*sy-sxy*sx)/((n-1)*sxx-sx*sx)
b = ((n-1)*sxy-sx*sy)/((n-1)*sxx-sx*sx)
print "Line: x = " a " + " b " * y"
xt = x[drop]
yt = a + b * xt;
print "Interpolate: x = " xt ", y = " yt
}
}
Since awk doesn't provide an easy way to pass back multiple values from a function, nor does it provide structures other than arrays (sometimes associative), it is not perhaps the best language for this task. On the other hand, it can be made to do the job. You might be able to bundle the Least Squares calculation in a function that returns an array containing the slope and intercept, and then use that. Your turn to explore options.
Given the script lsq.awk and the input file lsq.data shown, I get the output shown:
$ cat lsq.data
17.1685 21.6875 20.2393 26.3158
$ awk -f lsq.awk lsq.data
x[1] = 0, y[1] = 17.1685
x[2] = 200, y[2] = 21.6875
x[3] = 400, y[3] = 20.2393
x[4] = 600, y[4] = 26.3158
# Intercept = 17.4537
# Slope = 0.0129968
Line: x = 17.4537 + 0.0129968 * y
x = 0, yo = 17.1685, yc = 17.4537
x = 200, yo = 21.6875, yc = 20.0531
x = 400, yo = 20.2393, yc = 22.6525
x = 600, yo = 26.3158, yc = 25.2518
Different subsets
Subset 1
x[1] = 200, y[1] = 21.6875
x[2] = 400, y[2] = 20.2393
x[3] = 600, y[3] = 26.3158
Line: x = 18.1192 + 0.0115708 * y
Interpolate: x = 0, y = 18.1192
Subset 2
x[1] = 0, y[1] = 17.1685
x[2] = 400, y[2] = 20.2393
x[3] = 600, y[3] = 26.3158
Line: x = 16.5198 + 0.0141643 * y
Interpolate: x = 200, y = 19.3526
Subset 3
x[1] = 0, y[1] = 17.1685
x[2] = 200, y[2] = 21.6875
x[3] = 600, y[3] = 26.3158
Line: x = 17.7985 + 0.0147205 * y
Interpolate: x = 400, y = 23.6867
Subset 4
x[1] = 0, y[1] = 17.1685
x[2] = 200, y[2] = 21.6875
x[3] = 400, y[3] = 20.2393
Line: x = 18.163 + 0.007677 * y
Interpolate: x = 600, y = 22.7692
$
Edit: In the previous version of the answer, the subsets were multiplying by n instead of (n-1). The values in the revised output seem to agree with what you expect. The residual issues are presentational, not computational.

Related

Get a certain combination of numbers in Python

Is there a efficient and convenient solution in Python to do something like -
Find largest combination of two numbers x and y, with the following conditions -
0 < x < 1000
0 < y < 2000
x/y = 0.75
x & y are integers
It's easy to do it using a simple graphing calculator but trying to find the best way to do it in Python
import pulp
My_optimization_prob = pulp.LpProblem('My_Optimization_Problem', pulp.LpMaximize)
# Creating the variables
x = pulp.LpVariable("x", lowBound = 1, cat='Integer')
y = pulp.LpVariable("y", lowBound = 1, cat='Integer')
# Adding the Constraints
My_optimization_prob += x + y #Maximize X and Y
My_optimization_prob += x <= 999 # x < 1000
My_optimization_prob += y <= 1999 # y < 2000
My_optimization_prob += x - 0.75*y == 0 # x/y = 0.75
#Printing the Problem and Constraints
print(My_optimization_prob)
My_optimization_prob.solve()
#printing X Y
print('x = ',pulp.value(x))
print('y = ',pulp.value(y))
Probably just -
z = [(x, y) for x in range(1, 1000) for y in range(1, 2000) if x/y==0.75]
z.sort(key=lambda x: sum(x), reverse=True)
z[0]
#Returns (999, 1332)
This is convenient, not sure if this is the most efficient way.
Another possible relatively efficient solution is -
x_upper_limit = 1000
y_upper_limit = 2000
x = 0
y = 0
temp_variable = 0
ratio = 0.75
for i in range(x_upper_limit, 0, -1):
temp_variable = i/ratio
if temp_variable.is_integer() and temp_variable < y_upper_limit:
x = i
y = int(temp_variable)
break
print(x,y)

Plotting solution 2nd ODE using Euler

I have used the Equation of Motion (Newtons Law) for a simple spring and mass scenario incorporating it into the given 2nd ODE equation y" + (k/m)x = 0; y(0) = 3; y'(0) = 0.
Using the Euler method and the exact solution to solve the problem, I have been able to run and receive some ok results. However, when I execute a plot of the results I get this diagonal line across the oscillating results that I am after.
Current plot output with diagonal line
Can anyone help point out what is causing this issue, and how I can fix it please?
MY CODE:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sympy import Function, dsolve, Eq, Derivative, sin, cos, symbols
from sympy.abc import x, i
import math
# Given is y" + (k/m)x = 0; y(0) = 3; y'(0) = 0
# Parameters
h = 0.01; #Step Size
t = 50.0; #Time(sec)
k = 1; #Spring Stiffness
m = 1; #Mass
x0 = 3;
v0 = 0;
# Exact Analytical Solution
x_exact = x0*cos(math.sqrt(k/m)*t);
v_exact = -x0*math.sqrt(k/m)*sin(math.sqrt(k/m)*t);
# Eulers Method
x = np.zeros( int( t/h ) );
v = np.zeros( int( t/h ) );
x[1] = x0;
v[1] = v0;
x_exact = np.zeros( int( t/h ) );
v_exact = np.zeros( int( t/h ) );
te = np.zeros( int( t/h ) );
x_exact[1] = x0;
v_exact[1] = v0;
#print(len(x));
for i in range(1, int(t/h) - 1): #MAIN LOOP
x[i+1] = x[i] + h*v[i];
v[i+1] = v[i] - h*k/m*x[i];
te[i] = i * h
x_exact[i] = x0*cos(math.sqrt(k/m)* te[i]);
v_exact[i] = -x0*math.sqrt(k/m)*sin(math.sqrt(k/m)* te[i]);
# print(x_exact[i], '\t'*2, x[i]);
#plot
%config InlineBackend.figure_format = 'svg'
plt.plot(te, x_exact, te ,v_exact)
plt.title("DISPLACEMENT")
plt.xlabel("Time (s)")
plt.ylabel("Displacement (m)")
plt.grid(linewidth=0.3)
An in some details more direct computation is
te = np.arange(0,t,h)
N = len(te)
w = (k/m)**0.5
x_exact = x0*np.cos(w*te);
v_exact = -x0*w*np.sin(w*te);
plt.plot(te, x_exact, te ,v_exact)
resulting in
Note that arrays in python start at the index zero,
x = np.empty(N)
v = np.empty(N)
x[0] = x0;
v[0] = v0;
for i in range(N - 1): #MAIN LOOP
x[i+1] = x[i] + h*v[i];
v[i+1] = v[i] - h*k/m*x[i];
plt.plot(te, x, te ,v)
then gives the plot
with the expected increasing amplitude.

How to make algorithm of counting big negative numbers faster?

I was trying to do a program which can take square matrix of any size and find two any square submatrixes(they shouldn't overlap), whose qualities are the closest. So every elementary square in matrix has a quality(any number). The problem is algorithm is too slow with processing matrixes 100*100 containing big negative numbers(-10^8).
from collections import defaultdict
from math import fsum
from itertools import islice, count
def matrix():
matritsa = list()
koord = list()
ssum = 0
diff = defaultdict(list)
size = int(input()) #size of the matrix
for line in range(size):
matritsa += list(map(int,input().split())) #filling the matrix
Q1,Q2 = map(int,input().split()) #quality range
for i in range(len(matritsa)):
element = matritsa[i]
if Q1 <= element and element <= Q2: #checking if an element is in range
koord.append([element, i//size, i%size, 1]) #coordinates of 1*1 elements
for razmer in range(2,size): #taking squares of 2+ size
#print("Razmer: ",razmer)
for a in range(len(matritsa)):
(x,y) = a//size,a%size #coordinates of the left top square
if y + razmer > size:
continue
ssum = 0
for b in range(x,x+razmer):
ssum += sum(matritsa[b*size+y:b*size+(y+razmer)])
if Q1 <= ssum and ssum <= Q2: #checking if it is in quality range
koord.append([ssum,x,y,razmer])
#print(koord)
#print("Coordinate: {0},{1}; ssum: {2}".format(x,y,ssum))
if x + razmer == size and y + razmer == size:
#print("Final: {0},{1}".format(x,y))
break
g = 0
temp = len(koord)
while g != temp:
(value1,x1,y1,a) = koord[g]
for i in range(g,temp - 1):
(value2,x2,y2,b) = koord[i+1]
if y1 >= y2 + b or y2 >= y1 + a:
diff[abs(value1 - value2)].append([value1,value2])
if x1 >= x2 + b or x2 >= x1 + a:
diff[abs(value1 - value2)].append([value1,value2])
g += 1
minimum = min(diff.keys())
if minimum == 0:
print(*max(diff[minimum]))
elif len(diff[minimum]) > 1:
maximum = sum(diff[minimum][0])
result = diff[minimum][0]
for pair in diff[minimum]:
if sum(pair) > maximum:
maximum = sum(pair)
result = pair
The problem is in this part
g = 0
temp = len(koord)
print(temp)
while g != temp:
(value1,x1,y1,a) = koord[g]
for i in range(g,temp - 1):
(value2,x2,y2,b) = koord[i+1]
if y1 >= y2 + b or y2 >= y1 + a:
diff[abs(value1 - value2)].append([value1,value2])
if x1 >= x2 + b or x2 >= x1 + a:
diff[abs(value1 - value2)].append([value1,value2])
g += 1

Error using dde23 (line 224) Derivative and history vectors have different lengths

I am trying to solve a couple system of delay differential equations using dde23. While running the following code, I am getting an annoying error "Derivative and history vectors have different lengths"
function sol = prob1
clf
global Lembda alpha u1 u2 p q c d k a T b zeta1 zeta2 A1 A2
Lembda=2; b=0.07; d=0.0123; a=0.6; k=50; q=13; c=40; p=30; alpha = 0.4476; T=1; B=0.4; A1 =200; A2=100; zeta1=10; zeta2=30;
lags = [ 10; 0.2; 2; 10; 0.2; 10; 0.2; 2; 10; 0.2; 15; 0.9; 0.17; 0.01; 0.5; 0.000010; 0.00002];
sol = dde23(#prob2f,T,lags,[0,10], u1, u2);
function yp = prob2f(t,y,Z,B)
global Lembda alpha p b d c q T a k zeta1 zeta2 A1 A2
x2 = y(1);
y2 = y(2);
z2 = y(3);
v = y(4);
w = y(5);
xlag = Z(1,1);
vlag = Z(2,1);
%%%%%%%%%%%%%%%%
x1 = y(6);
y1 = y(7);
z1 = y(8);
v1 = y(9);
w1 = y(10);
x1lag = Z(1,1);
v1lag = Z(2,1);
%%%%%%%%%%%%%%%%%%%
lambda1 = y(11);
lambda2 = y(12);
lambda3 = y(13);
lambda4 = y(14);
lambda5 = y(15);
u1 = y(16);
u2= y(17);
lambda1lag = Z(1,1);
lambda4lag = Z(2,1);
%%%%%%%%%
dxdt=Lembda-d*x2-B*x2*v;
dydt=B*exp(-a*T)*xlag*vlag-a*y2 - alpha*y2*w;
dzdt=alpha*y2*w - b*z2;
dvdt=k*y2-p*v;
dwdt=c*z2-q*w;
%%%%%%%%%
dx1dt=Lembda-d*x1-(1-u1)*B*x1*v1;
dy1dt=(1-u1)*B*exp(-a*T)*x1lag*v1lag-a*y1 - alpha*y1*w1;
dz1dt=alpha*y1*w1 - b*z1;
dv1dt=(1-u2)*k*y1-p*v1;
dw1dt=c*z1-q*w1;
%%%%%%%%%%
dlambda1dt= A1+lambda1*d+(1-u1)*lambda1*B*v1-(1-u1)*lambda2*B*v1lag*exp(-a*T)*lambda2*(T);
dlambda2dt= a*lambda2+(lambda2-lambda3)*alpha*w1-lambda4*k*(u2-1);
dlambda3dt= b*lambda3-c*lambda5;
dlambda4dt= A2+(1-u1)*lambda1*B*x1+lambda4*p+lambda4*(T)*lambda2*x1lag*(u2-1)*exp(-a*T);
dlambda5dt=alpha*lambda2*z1-alpha*lambda3*z1+lambda5*q;
du1dt = ( lambda2*x1lag*v1lag - lambda1*x1*v1)*(B/zeta1);
du2dt =(lambda4*k*y2)/zeta2;
yp = [ dxdt; dydt; dzdt; dvdt;dwdt; dx1dt; dy1dt; dz1dt; dv1dt;dw1dt; dlambda1dt; dlambda2dt; dlambda3dt ;dlambda4dt ;dlambda5dt; du1dt; du2dt ];
Can anyone guide me, to be able to resolve this issue?
Thanks
The error occurs because your return vector yp is not the same size as the lags vector.
The lags vector has length 17, but the yp vector comes out to be of length 10. Even though you have 17 entries in yp, many of them as []
yp = [ dxdt; dydt; dzdt; dvdt;dwdt; dx1dt; dy1dt; dz1dt; dv1dt;dw1dt;
dlambda1dt; dlambda2dt; dlambda3dt ;dlambda4dt ;dlambda5dt; du1dt; du2dt ];
K>> dxdt
dxdt =
[]
K>> length(yp)
10
lags = [ 10; 0.2; 2; 10; 0.2; 10; 0.2; 2; 10; 0.2; 15; 0.9; 0.17; 0.01;
0.5; 0.000010; 0.00002];
sol = dde23(#prob2f,T,lags,[0,10], u1, u2);
K>> length(lags)
17
The return from your prob2f() should have same length as lags. This is why the error shows up
f0 = feval(ddefun,t0,y0,Z0,varargin{:});
nfevals = nfevals + 1;
[m,n] = size(f0);
if n > 1
error(message('MATLAB:dde23:DDEOutputNotCol'))
elseif m ~= neq
error(message('MATLAB:dde23:DDELengthMismatchHistory')); <========
end
You need to check your prob2f function and make sure yp has same length as lags.

is inputted point within the rectangle centered on the (0,0) axis

Trying to write a simple python program to determine if two points (x, y) input by a user fall within a rectangle centered on the (0,0) axis with a width of 10 and height of 5.
Here is where I am.
x = eval(input("Enter the x point : "))
y = eval(input("Enter the y point : "))
if x <= 10.0 / 2:
if y <= 5.0 / 2:
print("Point (" + str(x) + ", " + str(y) + ") is in the rectangle")
else:
print("Point (" + str(x) + ", " + str(y) + ") is not in the rectangle")
This wont work on the negative side. There is a math function I am needing, just can't figure out which one.
I have added a distance = math.sqrt (( x * x ) + ( y * y )) and changed the ifs to distance instead of x and y. I am just so confused.
WIDTH = 10
HEIGHT = 5
X_CENTER = 0
Y_CENTER = 0
x_str = input("Enter the x point: ")
y_str = input("Enter the y point: ")
x = float(x_str)
y = float(y_str)
is_in = True
if not (X_CENTER - WIDTH/2 <= x <= X_CENTER + WIDTH/2):
is_in = False
if not (Y_CENTER - HEIGHT/2 <= y <= Y_CENTER + HEIGHT/2):
is_in = False
if is_in:
statement = "is"
else:
statement = "is not"
print("Point (%s, %s) %s in the rectangle." % (x_str, y_str, statement))

Resources