Good morning. Based on the code below, I would like to create a function that adds a 25% surcharge for every minute worked beyond ten hours whenever the work shift (the Bezahltezeit variable, shown in the Überzeit column in Excel) is longer than 10 hours.
I need the result in minutes so that I can add it up and write it to Excel.
Also, the Nachtzeitzuschlag column should hold the 10% surcharge for every minute worked at night (the code below uses a 20:00 to 06:00 window). As you can see in the table, it is currently written in hours and minutes; I need this result in minutes as well so that I can add it up and write it to Excel.
The function was created with the help of user @constantstranger, whom I thank again!
Thanks
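# Imports needed by the code below
from datetime import datetime, timedelta
from openpyxl import Workbook, load_workbook
from openpyxl.styles import Border, Side
Year = int(IcsStartData.year)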
Month = int(IcsStartData.month)
Day = int(IcsStartData.day)
StartH = int(IcsStartData.hour)
StartM = int(IcsStartData.minute)
EndH = int(IcsEndData.hour)
EndM = int(IcsEndData.minute)
currentDateTime = datetime.now()
Date_Time_Stamp = currentDateTime.strftime("%d%m%Y%H%M")
Current_DateTime = currentDateTime.strftime("%d.%m.%Y %H:%M")
Datetime = IcsStartData.strftime("%d.%m.%Y")
StartTime = IcsStartData.strftime("%H.%M")
EndTime = IcsEndData.strftime("%H.%M")
Schichtdauer = IcsSchichtDauer.strftime("%H.%M")
Bezahltezeit = IcsBezahlteZeit.strftime("%H.%M")
Bezahltezeit_Stunden = float(IcsBezahlteZeit.strftime("%H"))
Bezahltezeit_Minuten = float(IcsBezahlteZeit.strftime("%M"))
Terminaldata = IcsEndData.strftime("%d.%m.%Y")
EndDay = int(IcsEndData.day)
EndMonth = int(IcsEndData.month)
EndYear = int(IcsEndData.year)
def excelworking():
    endTime = IcsEndData
    startTime = IcsStartData
    def getRegularAndBonusHours(startTime, endTime):
        if endTime < startTime:
            raise ValueError(f'endTime {endTime} is before startTime {startTime}')
        startDateStr = startTime.strftime("%d.%m.%Y")
        bonusStartTime = datetime.strptime(startDateStr + " " + "20:00", "%d.%m.%Y %H:%M")
        prevBonusEndTime = datetime.strptime(startTime.strftime("%d.%m.%Y") + " " + "06:00", "%d.%m.%Y %H:%M")
        bonusEndTime = prevBonusEndTime + timedelta(days=1)
        NachtArbeitZeit = timedelta(days=0)
        dienstdauer = endTime - startTime
        hours = dienstdauer.total_seconds() // 3600
        if hours > 24:
            fullDays = hours // 24
            NachtArbeitZeit += fullDays * (bonusEndTime - bonusStartTime)
            endTime -= timedelta(days=fullDays)
        if startTime < prevBonusEndTime:
            NachtArbeitZeit += prevBonusEndTime - startTime
            if endTime < prevBonusEndTime:
                NachtArbeitZeit -= prevBonusEndTime - endTime
        if startTime > bonusStartTime:
            NachtArbeitZeit -= startTime - bonusStartTime
        if endTime > bonusStartTime:
            NachtArbeitZeit += min(endTime, bonusEndTime) - bonusStartTime
        return dienstdauer, NachtArbeitZeit
    def getHours(startTime, endTime, extraFraction):
        dienstdauer, NachtArbeitZeit = getRegularAndBonusHours(startTime, endTime)
        delta = dienstdauer + NachtArbeitZeit * extraFraction
        return delta
    def testing(start, end):
        dienstdauer, NachtArbeitZeit = getRegularAndBonusHours(start, end)
        def getHoursRoundedUp(delta):
            return delta.days * 24 + delta.seconds // 3600 + (1 if delta.seconds % 3600 else 0)
        regularHours, nachtszulage = getHoursRoundedUp(dienstdauer), getHoursRoundedUp(NachtArbeitZeit)
        # print(f'start {start}, end {end}, nachtszulage {nachtszulage}, Nachstüberzeit {NachtArbeitZeit / 60 * 10} dienstdauer: {dienstdauer} {NachtArbeitZeit}')
        # Writing on a EXCEL FILE
        filename = (f"{myPath}/Monatsplan {username} {month} {year}.xlsx")
        emptycell = ' '
        wegzeiten = (int('13'))
        try:
            wb = load_workbook(filename)
            ws = wb.worksheets[0] # select first worksheet
        except FileNotFoundError:
            headers_row = ['Datum','Dienst','Funktion','Von','Bis','Schichtdauer','Bezahlte Zeit (Studen)','Bezahlte Zeit (Minuten)','Zeit Konvertierung','Überzeit (ab 10 St.)','Nachtzeitzuschlag.','Nachtdienstentschädigung','Wegzeiten']
            wb = Workbook()
            ws = wb.active
            ws.append(headers_row)
            wb.save(filename)
        ws.append([f'{Datetime}',f'{string1}'f'{tagesinfo2}',f'{soup_funktion}',f'{StartTime}',f'{EndTime}',f'{Schichtdauer}',f'{Bezahltezeit_Stunden}',f'{Bezahltezeit_Minuten}',f'{emptycell}',f'{emptycell}',f'{NachtArbeitZeit / 60 * 10}',f'{nachtszulage}',f'{wegzeiten}'])
        for cols in ws.iter_cols( ):
            if cols[-1].value:
                cols[-1].border = Border(left=Side(style='thin'),right=Side(style='thin'),top=Side(style='thin'),bottom=Side(style='thin'))
                cols[-1].number_format = '0.00'
        wb.save(filename)
        wb.close()
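The two surcharges asked about could be computed along these lines. This is only a minimal sketch, not the original author's method: the function name surcharge_minutes and its parameters are hypothetical, and it assumes that the paid time and the night time are available as timedelta values (the NachtArbeitZeit returned by getRegularAndBonusHours already is one).

```python
from datetime import timedelta


def surcharge_minutes(paid_time, night_time,
                      overtime_threshold=timedelta(hours=10),
                      overtime_rate=0.25, night_rate=0.10):
    """Return (overtime surcharge, night surcharge), both in minutes.

    Hypothetical helper: paid_time and night_time are assumed to be timedeltas.
    """
    paid_minutes = paid_time.total_seconds() / 60
    threshold_minutes = overtime_threshold.total_seconds() / 60
    # Only the minutes worked beyond the threshold earn the overtime surcharge.
    overtime_minutes = max(0.0, paid_minutes - threshold_minutes)
    night_minutes = night_time.total_seconds() / 60
    return overtime_minutes * overtime_rate, night_minutes * night_rate


overtime, night = surcharge_minutes(timedelta(hours=11, minutes=20),
                                    timedelta(hours=2, minutes=30))
```

With a paid time of 11 h 20 min and 2 h 30 min of night work, this gives about 20 overtime surcharge minutes (25% of the 80 minutes beyond ten hours) and about 15 night surcharge minutes (10% of 150 minutes). Both values are plain numbers, so they can be summed and written to the sheet directly.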
I want to draw a fixed horizontal line (or a ruler) that gives information about size/distance or the zoom factor, like the one in Google Maps (see here).
Here is another example at different zoom levels, using an orthographic camera.
I tried to implement the same with a perspective camera, but I was not able to do it correctly.
Below is the result I am getting with the perspective camera.
The logic that I am using to draw the ruler is:
var rect = myCanvas.getBoundingClientRect();
var canvasWidth = rect.right - rect.left;
var canvasHeight = rect.bottom - rect.top;
var Canvas2D_ctx = myCanvas.getContext("2d");
// logic to calculate the rulerwidth
var distance = getDistance(camera.position, model.center);
canvasWidth > canvasHeight && (distance *= canvasWidth / canvasHeight);
var a = 1 / 3 * distance,
l = Math.log(a) / Math.LN10,
l = Math.pow(10, Math.floor(l)),
a = Math.floor(a / l) * l;
var rulerWidth = a / h;
var text = 1E5 <= a ? a.toExponential(3) : 1E3 <= a ? a.toFixed(0) : 100 <= a ? a.toFixed(1) : 10 <= a ? a.toFixed(2) : 1 <= a ? a.toFixed(3) : .01 <= a ? a.toFixed(4) : a.toExponential(3);
Canvas2D_ctx.lineCap = "round";
Canvas2D_ctx.textBaseline = "middle";
Canvas2D_ctx.textAlign = "start";
Canvas2D_ctx.font = "12px Sans-Serif";
Canvas2D_ctx.strokeStyle = 'rgba(255, 0, 0, 1)';
Canvas2D_ctx.lineWidth = 0;
var m = canvasWidth * 0.01;
var n = canvasHeight - 50;
Canvas2D_ctx.beginPath();
Canvas2D_ctx.moveTo(m, n);
n += 12;
Canvas2D_ctx.lineTo(m, n);
m += canvasWidth * rulerWidth;
Canvas2D_ctx.lineTo(m, n);
n -= 12;
Canvas2D_ctx.lineTo(m, n);
Canvas2D_ctx.stroke();
Canvas2D_ctx.fillStyle = 'rgba(255, 0, 0, 1)';
Canvas2D_ctx.fillText(text + " ( m )", (m) /2 , n + 6)
Can anyone help me fix the logic that calculates the ruler width, so that the scale meter / ruler renders correctly for both perspective and orthographic cameras?
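For the ruler width, one common approach is to work out how many world units the canvas spans at the target's distance and then express the ruler as a fraction of that span. Below is a minimal sketch in Python, under stated assumptions: all function names are hypothetical, the perspective camera's field of view is taken to be vertical and in degrees, and the orthographic camera is described by left/right bounds and a zoom factor. The rounding step mirrors the posted logic (one third of the span, rounded down to one significant digit).

```python
import math


def nice_length(target):
    """Round target down to a single leading digit times a power of ten."""
    magnitude = 10 ** math.floor(math.log10(target))
    return math.floor(target / magnitude) * magnitude


def perspective_visible_width(distance, vertical_fov_deg, aspect):
    """World-space width seen at `distance` in front of a perspective camera."""
    visible_height = 2 * distance * math.tan(math.radians(vertical_fov_deg) / 2)
    return visible_height * aspect


def orthographic_visible_width(left, right, zoom=1.0):
    """World-space width seen by an orthographic camera (independent of distance)."""
    return (right - left) / zoom


def ruler(visible_width, canvas_width_px, max_fraction=1 / 3):
    """Return (ruler length in world units, ruler length in pixels)."""
    length = nice_length(visible_width * max_fraction)
    pixels = canvas_width_px * length / visible_width
    return length, pixels


persp_length, persp_pixels = ruler(perspective_visible_width(10, 90, 2), 800)
ortho_length, ortho_pixels = ruler(orthographic_visible_width(-5, 5, zoom=2), 500)
```

The point of the split is that only the visible width depends on the camera type; the label and the pixel length are derived the same way in both cases. For a perspective camera the scale is only exact on the plane at the chosen distance, so the ruler is an approximation for anything nearer or farther.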
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import linalg, optimize
%matplotlib inline
Data load
data = pd.read_csv("D:/Stat/TimeSeries/KRW_month_0617_1.csv",index_col="Date") / 100
para = open("D:/Stat/TimeSeries/KRW_month_0617_1.txt").readlines()[0:2]
data.index = pd.to_datetime(data.index)
Parameters
cond = []
params = []
time = []
for i in para:
    j = i.split()
    for k in j:
        cond.append(k)
cond = cond[1:]
for i in range(len(cond)):
    cond[i] = round(float(cond[i]),4)
params = cond[0:23]
time = cond[23:]
maturity = np.array(time[1:])
timegap = 1/cond[23]
Functions We need
def Paramcheck(Params, checkStationary = 1):
    result = 0
    Kappa = np.array([[params[20],0,0], [0,params[21],0], [0,0,params[22]]])
    Sigma = np.array([[params[1],0,0], [params[2],params[3],0], [params[4],params[5],params[6]]])
    State = np.array([params[7], params[8], params[9]])
    Lambda = params[0]
    SigmaEps = np.identity(10)
    for i in range(10):
        SigmaEps[i][i] = params[i+10]
    for i in range(len(Sigma)):
        if Sigma[i][i] < 0:
            result = 1
    for j in SigmaEps:
        if np.any(SigmaEps) < 0:
            result = 1
    if Lambda < 0.05 or Lambda > 2:
        result = 1
    elif State[0] < 0:
        result = 1
    elif Kappa[0][0] < 0:
        result = 1
    if result == 0 and checkStationary > 0:
        if max(np.linalg.eigvals(-Kappa).real) > 0:
            result = 2
    return result
def CheckDet(x):
    if x == np.inf or x == np.nan:
        result = 1
    elif x < 0:
        result = 2
    elif abs(x) < 10**-250:
        result = 3
    else:
        result = 0
    return result
def NS_factor(lambda_val, maturity):
    col1 = np.ones(len(maturity))
    col2 = (1 - np.exp(-lambda_val*maturity))/(lambda_val*maturity)
    col3 = col2 - np.exp(-lambda_val*maturity)
    factor = np.array([col1,col2,col3]).transpose()
    return factor
def DNS_Kalman_filter(Params, *args):
    N = Paramcheck(Params)
    if N == 0:
        Kappa = np.array([[params[20],0,0], [0,params[21],0], [0,0,params[22]]])
        Sigma = np.array([[params[1],0,0], [params[2],params[3],0],
                          [params[4],params[5],params[6]]])
        State = np.array([params[7], params[8], params[9]])
        Lambda = params[0]
        SigmaEps = np.identity(10)
        for i in range(10):
            SigmaEps[i][i] = params[i+10]
        Obs_Yield = args[0]
        Obs_Date = args[1]
        Timegap = args[2]
        Obs_Mty = args[3]
        Finalstate = args[4]
        Mty_length = len(Obs_Mty)
        B = NS_factor(lambda_val = Lambda,maturity = Obs_Mty)
        H_large = SigmaEps **2
        N_obs = len(Obs_Date)
        LLH_vec = np.zeros(N_obs)
        phi1 = linalg.expm(-Kappa*Timegap)
        phi0 = (np.identity(3)-phi1) @ State
        Eigenvalues = np.linalg.eig(Kappa)[0]
        Eigen_vec = np.linalg.eig(Kappa)[1]
        Eigen_vec_inv = np.linalg.inv(Eigen_vec)
        S = Eigen_vec_inv @ Sigma @ Sigma.transpose() @ Eigen_vec_inv.transpose()
        Atilde = np.dot(Sigma[0], Sigma[0])
        Btilde = np.dot(Sigma[1], Sigma[1])
        Ctilde = np.dot(Sigma[2], Sigma[2])
        Dtilde = np.dot(Sigma[0], Sigma[1])
        Etilde = np.dot(Sigma[0], Sigma[2])
        Ftilde = np.dot(Sigma[1], Sigma[2])
        res1 = Atilde* Obs_Mty* Obs_Mty/6
        res2 = Btilde*(1/(2*Lambda**2)
                       - (1-np.exp(-Lambda*Obs_Mty))/(Lambda**3*Obs_Mty)
                       + (1-np.exp(-2*Lambda*Obs_Mty))/(4*Lambda**3*Obs_Mty))
        res3 = Ctilde*(1/(2*Lambda**2) + np.exp(-Lambda*Obs_Mty)/(Lambda**2)
                       - Obs_Mty*np.exp(-2*Lambda*Obs_Mty)/(4*Lambda)
                       - 3*np.exp(-2*Lambda*Obs_Mty)/(4*Lambda**2)
                       - 2*(1-np.exp(-Lambda*Obs_Mty))/(Lambda**3*Obs_Mty)
                       + 5*(1-np.exp(-2*Lambda*Obs_Mty))/(8*Lambda**3*Obs_Mty))
        res4 = Dtilde*(Obs_Mty/(2*Lambda) + np.exp(-Lambda*Obs_Mty)/(Lambda**2)
                       - (1-np.exp(-Lambda*Obs_Mty))/(Lambda**3*Obs_Mty))
        res5 = Etilde*(3*np.exp(-Lambda*Obs_Mty)/(Lambda**2) + Obs_Mty/(2*Lambda)
                       + Obs_Mty*np.exp(-Lambda*Obs_Mty)/(Lambda)
                       - 3*(1-np.exp(-Lambda*Obs_Mty))/(Lambda**3*Obs_Mty))
        res6 = Ftilde*(1/(Lambda**2) + np.exp(-Lambda*Obs_Mty)/(Lambda**2)
                       - np.exp(-2*Lambda*Obs_Mty)/(2*Lambda**2)
                       - 3*(1-np.exp(-Lambda*Obs_Mty))/(Lambda**3*Obs_Mty)
                       + 3*(1-np.exp(-2*Lambda*Obs_Mty))/(4*Lambda**3*Obs_Mty))
        val = res1 + res2 + res3 + res4 + res5 + res6
        V_mat = np.zeros([3,3])
        V_lim = np.zeros([3,3])
        for i in range(3):
            for j in range(3):
                V_mat[i][j] = S[i][j]*(1-np.exp(-(Eigenvalues[i] +
                    Eigenvalues[j])*Timegap))/(Eigenvalues[i] + Eigenvalues[j])
                V_lim[i][j] = S[i][j]/(Eigenvalues[i] + Eigenvalues[j])
        Q = (Eigen_vec @ V_mat @ Eigen_vec.transpose()).real
        Sigma_lim = (Eigen_vec @ V_lim @ Eigen_vec.transpose()).real
        for i in range(N_obs):
            y = Obs_Yield[i]
            xhat = phi0 + phi1 @ State
            y_implied = B @ xhat
            v = y - y_implied + val
            Sigmahat = phi1 @ Sigma_lim @ phi1.transpose() + Q
            F = B @ Sigmahat @ B.transpose() + H_large
            detF = np.linalg.det(F)
            if CheckDet(detF) > 0:
                N = 3
                break
            Finv = np.linalg.inv(F)
            State = xhat + Sigmahat @ B.transpose() @ Finv @ v
            Sigma_lim = Sigmahat - Sigmahat @ B.transpose() @ Finv @ B @ Sigmahat
            LLH_vec[i] = np.log(detF) + v.transpose() @ Finv @ v
    if N == 0:
        if Finalstate:
            yDate = Obs_Date[-1]
            result = np.array([yDate,State])
        else:
            result = 0.5 * (sum(LLH_vec) + Mty_length*N_obs*np.log(2*np.pi))
    else:
        result = 7000000
    return result
I wrote code that implements the Arbitrage-Free Nelson-Siegel model. The data are bond return rates (1Y, 1.5Y, ..., 20Y). I want to optimize that function with scipy.optimize.minimize, using fixed *args.
Assume that the initial params have been verified to be close to the optimized params, based on empirical experiments using the Dynamic Nelson-Siegel model.
LLC_new = 0
while True:
    LLC_old = LLC_new
    OPT = optimize.minimize(x0=params,fun=DNS_Kalman_filter,
                            args=(data.values,data.index,timegap,maturity,0))
    params = OPT.x
    LLC_new = round(OPT.fun,5)
    print("Current LLC: %0.5f" %LLC_new)
    if LLC_old == LLC_new:
        OPT_para = params
        FinalState = DNS_Kalman_filter(params,data.values,data.index,timegap,maturity,True)
        break
Result is
Current LLC: -7613.70146
Current LLC: -7613.70146
The LLC (log-likelihood value) isn't maximized. This is not the result I want from the optimizer.
Is there any solution for that?
In R, the optim() function works similarly to scipy.optimize.minimize() and works really well. I also have R code for this that is very similar to this Python code.
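One thing that may be worth checking: as posted, Paramcheck and DNS_Kalman_filter read the module-level params rather than their Params argument, so the value returned does not change when the optimizer perturbs its trial point. The toy sketch below (plain Python, no SciPy; the objective functions and numeric_gradient are hypothetical stand-ins, not the posted model) shows why that looks like an immediate optimum to a gradient-based method: the finite-difference gradient is exactly zero.

```python
def numeric_gradient(fun, x, h=1e-6):
    """Forward-difference gradient, similar in spirit to what a
    gradient-based optimizer estimates when no Jacobian is supplied."""
    base = fun(x)
    grad = []
    for i in range(len(x)):
        step = list(x)
        step[i] += h
        grad.append((fun(step) - base) / h)
    return grad


params = [1.0, 2.0]  # module-level initial guess, as in the post


def objective_using_global(Params):
    # Same pattern as the posted code: the argument is ignored
    # and the global `params` is read instead.
    return sum((p - 3.0) ** 2 for p in params)


def objective_using_argument(Params):
    # The trial point supplied by the optimizer is actually used.
    return sum((p - 3.0) ** 2 for p in Params)


grad_global = numeric_gradient(objective_using_global, params)
grad_argument = numeric_gradient(objective_using_argument, params)
```

Here grad_global is all zeros while grad_argument is roughly [-4, -2], so only the second objective gives the optimizer a direction to move in. Whether this fully explains the unchanged LLC in the post is an assumption; it is simply consistent with the same value being printed on every pass.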
I have Matlab code that simulates something called the 2-D Lid Driven Cavity Flow. In this code I have the following structure:
for i = 1:timeStep
    %Lets call this Part 1
    for a1 = 1:N
        for b1 = 1:N
            %calculates things
        end
    end
    %Lets call this Part 2
    for a2 = 1:N
        for b2 = 1:N
            %calculates things
        end
    end
    %Lets call this Part 3
    for a3 = 1:N
        for b3 = 1:N
            %calculates things
        end
    end
end
Since Part 1, Part 2, and Part 3 are independent of each other, I would like to compute them in parallel (or multithread them) at every time step, that is, on every iteration of the primary for loop. Is there any way I can achieve this?
Thanks!
I include my code for reference:
Nx = 50;
Ny = 50;
numTimesteps = 10000;
reynoldsNum = 1000;
dt = 0.0025;
numIter = 100000;
Beta = 1.5;
maxErr = 0.001;
ds = 1/(Nx + 1);
x1 = 0:ds:1;
x2 = 0:ds:1;
time = 0;
boundarySpeed = 1;
PHI = zeros(Nx+2, Ny+2);
OMEGA = zeros(Nx+2, Ny+2);
U = zeros(Nx+2, Ny+2);
V = zeros(Nx+2, Ny+2);
x2d = zeros(Nx+2, Ny+2);
y2d = zeros(Nx+2, Ny+2);
PRESSURE = zeros(Nx+2, Ny+2);
B = zeros(Nx+2, Ny+2);
pressureOLD = zeros(Nx+2, Ny+2);
W = zeros(Nx+2, Ny+2);
for i = 1:Nx+2
for j = 1:Ny+2
x2d(i,j) = x1(i);
y2d(i,j) = x2(j);
end
end
for timeStep = 1:numTimesteps
if(mod(timeStep,10000) == 0)
disp(timeStep);
end
OLDPHI = PHI;
OLDOMEGA = OMEGA;
OLDPRESSURE = PRESSURE;
parfor parJob = 1:4
switch parJob
%{
----------------------------------
STREAM FUNCTION CALCULATION
----------------------------------
%}
case 1
for iter = 1:numIter
ERRMATRIX = OLDPHI;
for i = 2:Nx+1
for j = 2:Ny+1
PHI(i,j) = (1/4) * Beta * (OLDPHI(i+1,j) + OLDPHI(i-1,j) + OLDPHI(i,j+1) + OLDPHI(i,j-1) + ...
ds * ds * OLDOMEGA(i,j)) + (1 - Beta) * OLDPHI(i,j);
end
end
Err = 0;
for i = 1:Nx+2
for j = 1:Ny+2
Err = Err + abs(ERRMATRIX(i,j) - PHI(i,j));
end
end
if (Err <= maxErr)
break;
end
OLDPHI = PHI;
end
%{
----------------------------------
BOUNDARY CONDITIONS FOR VORTICITY
----------------------------------
%}
case 2
for i = 2:Nx+1
for j = 2:Ny+1
OMEGA(i,1) = -2 * OLDPHI(i,2) / (ds * ds); % bottom wall
OMEGA(i,Ny+2) = -2 * OLDPHI(i,Ny+1) / (ds * ds) - 2 * boundarySpeed / ds; % top wall
OMEGA(1,j) = -2 * OLDPHI(2,j) / (ds * ds); % right wall
OMEGA(Nx+2,j) = -2 * OLDPHI(Nx+1,j) / (ds * ds); % left wall
end
end
%{
----------------------------------
VORTICITY CALCULATIONS
----------------------------------
%}
for i = 2:Nx+1
for j = 2:Ny+1
W(i,j) = -(1 / 4) * ((OLDPHI(i,j+1) - OLDPHI(i,j-1)) * (OLDOMEGA(i+1,j) - OLDOMEGA(i-1,j)) ...
- (OLDPHI(i+1,j) - OLDPHI(i-1,j)) * (OLDOMEGA(i,j+1) - OLDOMEGA(i,j-1))) / (ds * ds) ...
+(1 / reynoldsNum) * (OLDOMEGA(i+1,j) + OLDOMEGA(i-1,j) + OLDOMEGA(i,j+1) + ...
OLDOMEGA(i,j-1) - 4 * OLDOMEGA(i,j)) / (ds * ds);
end
end
OMEGA(2:Nx+1,2:Ny+1) = OLDOMEGA(2:Nx+1,2:Ny+1) + dt * W(2:Nx+1,2:Ny+1);
time = time + dt;
for i = 1:Nx
for j = 1:Ny
x2d(i,j) = x1(i);
y2d(i,j) = x2(j);
end
end
%{
----------------------------------
U AND V CALCULATIONS
----------------------------------
%}
case 3
for i = 2:Nx+1
for j = 2:Ny+1
U(i,j) = (OLDPHI(i,j+1) - OLDPHI(i,j)) / (2 * ds);
V(i,j) = -(OLDPHI(i+1,j) - OLDPHI(i,j)) / (2 * ds);
U(:,Ny+2) = 1;
V(Nx+2,:) = 0.0;
end
end
%{
----------------------------------
PRESSURE CALCULATIONS
----------------------------------
%}
otherwise
for i = 2:Nx+1
for j = 2:Ny+1
PRESSURE(i,j) = (1/4) * (pressureOLD(i+1,j) + pressureOLD(i-1,j) + pressureOLD(i,j+1) ...
+ pressureOLD(i,j-1)) - (1/2) * (((((OLDPHI(i-1,j) - 2 * OLDPHI(i,j) + ...
OLDPHI(i+1,j)) / (ds^2)) * ((OLDPHI(i,j-1) - 2 * OLDPHI(i,j) + OLDPHI(i,j+1)) / (ds^2))) ...
- (OLDPHI(i+1,j+1) - OLDPHI(i+1,j-1) - OLDPHI(i-1,j+1) + OLDPHI(i-1,j-1)) / (4 * (ds^2))) * ds^2);
end
pressureOLD = PRESSURE;
end
end
end
You can use parfor to run jobs in parallel.
result = cell(3, 1);
parfor k = 1:3
    result{k} = ['result-' num2str(k)];
    switch k
        case 1
            disp('do part one')
        case 2
            disp('do part two')
        otherwise
            disp('do part three')
    end
end
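The same dispatch-by-index pattern, with each part reading only the old state and handing back its own result, can be sketched in Python for comparison. This is an illustration only: part_one, part_two, part_three and run_time_step are hypothetical stand-ins, not a translation of the cavity-flow code.

```python
from concurrent.futures import ThreadPoolExecutor


def part_one(old_state):
    """Hypothetical stand-in for Part 1; reads only the old state."""
    return [value + 1 for value in old_state]


def part_two(old_state):
    """Hypothetical stand-in for Part 2."""
    return [value * 2 for value in old_state]


def part_three(old_state):
    """Hypothetical stand-in for Part 3."""
    return sum(old_state)


def run_time_step(old_state):
    """Run the three independent parts concurrently and collect their results in order."""
    parts = (part_one, part_two, part_three)
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        futures = [pool.submit(part, old_state) for part in parts]
        return [future.result() for future in futures]


results = run_time_step([1, 2, 3])
```

The design point carried over from the parfor example is that no part writes to anything another part reads during the step; each returns its result and the caller merges them afterwards. In standard CPython, threads generally do not speed up CPU-bound pure-Python loops, so ProcessPoolExecutor (which offers the same submit interface) is the usual choice for heavy numeric work.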
I need help with the following: I have a data file, data.dat, with tab-separated columns, like this:
# y1 y2 y3 y4
17.1685 21.6875 20.2393 26.3158
These are the x values of 4 points for a linear fit. The four y values are constant: 0, 200, 400, 600.
I can create a linear fit of the point pairs (x,y): (x1,y1)=(17.1685,0), (x2,y2)=(21.6875,200), (x3,y3)=(20.2393,400), (x4,y4)=(26.3158,600).
Now I would like to make a linear fit on each set of three of these point pairs: (x1,y1), (x2,y2), (x3,y3); then (x2,y2), (x3,y3), (x4,y4); then (x1,y1), (x3,y3), (x4,y4); and finally (x1,y1), (x2,y2), (x4,y4).
For each linear fit through three points, I would like to know the extrapolated x value at the position of the fourth point, the one that was left out of the fit.
I have so far this awk code:
#!/usr/bin/awk -f
BEGIN{
z[1] = 0;
z[2] = 200;
z[3] = 400;
z[4] = 600;
}
{
split($0,str,"\t");
n = 0.0;
for(i=1; i<=NF; i++)
{
centr[i] = str[i];
n += 1.0;
# printf("%d\t%f\t%.1f\t",i,centr[i],z[i]);
}
# print "";
if (n > 2)
{
lsq(n,z,centr);
}
}
function lsq(n,x,y)
{
sx = 0.0
sy = 0.0
sxx = 0.0
syy = 0.0
sxy = 0.0
eps = 0.0
for (i=1;i<=n;i++)
{
sx += x[i]
sy += y[i]
sxx += x[i]*x[i]
sxy += x[i]*y[i]
syy += y[i]*y[i]
}
if ( (n==0) || ((n*sxx-sx*sx)==0) )
{
next;
}
# print "number of data points = " n;
a = (sxx*sy-sxy*sx)/(n*sxx-sx*sx)
b = (n*sxy-sx*sy)/(n*sxx-sx*sx)
for(i=1;i<=n;i++)
{
ycalc[i] = a+b*x[i]
dy[i] = y[i]-ycalc[i]
eps += dy[i]*dy[i]
}
print "# Intercept =\t" a
print "# Slope =\t" b
for (i=1;i<=n;i++)
{
printf("%8g %8g %8g \n",x[i],y[i],ycalc[i])
}
} # function lsq()
So, if we extrapolate to the position of the 4th point:
0 17.1685 <--(x1,y1)
200 21.6875 <--(x2,y2)
400 20.2393 <--(x3,y3)
600 22.7692 <<< (x4 = 600, y4 = 22.7692)
If we extrapolate to the position of the 3rd point:
0 17.1685 <--(x1,y1)
200 21.6875 <--(x2,y2)
400 23.6867 <<< (x3 = 400, y3 = 23.6867)
600 26.3158 <--(x4,y4)
If we extrapolate to the position of the 2nd point:
0 17.1685
200 19.35266 <<<
400 20.2393
600 26.3158
If we extrapolate to the position of the 1st point:
0 18.1192 <<<
200 21.6875
400 20.2393
600 26.3158
My current output is the following:
$> ./prog.awk data.dat
# Intercept = 17.4537
# Slope = 0.0129968
0 17.1685 17.4537
200 21.6875 20.0531
400 20.2393 22.6525
600 26.3158 25.2518
Assuming the core calculation in the lsq function is OK (it looks about right, but I haven't scrutinized it), then that gives you the slope and intercept for the least sum of squares line of best fit for the input data set (parameters x, y, n). I'm not sure I understand the tail end of the function.
For your 'take three points and calculate the fourth' problem, the simplest way is to generate the 4 subsets (logically, by deleting one point from the set of four on each of four calls), and redo the calculation.
You need to call another function that takes the line data (slope, intercept) from lsq and interpolates (extrapolates) the value at another y value. That's a straight-forward calculation (x = m * y + c), but you need to determine which y value is missing from the set of 3 you pass in.
You could 'optimize' (meaning 'complicate') this scheme by dropping one value at a time from the 'sums of squares' and 'sums' and 'sum of products' values, recalculating the slope, intercept, and then calculating the missing point again.
(I'll also observe that normally it would be the x-coordinates with the fixed values 0, 200, 400, 600 and the y-coordinates would be the values read. However, that's just a matter of orientation, so it is not crucial.)
Here's at least plausibly working code. Since awk automatically splits on white space, there's no need for you to split on tabs specifically; the read loop takes this into account.
The code needs serious refactoring; there is a ton of repetition in it - however, I also have a job that I'm supposed to do.
#!/usr/bin/awk -f
BEGIN{
z[1] = 0;
z[2] = 200;
z[3] = 400;
z[4] = 600;
}
{
for (i = 1; i <= NF; i++)
{
centr[i] = $i
}
if (NF > 2)
{
lsq(NF, z, centr);
}
}
function lsq(n, x, y)
{
if (n == 0) return
sx = 0.0
sy = 0.0
sxx = 0.0
syy = 0.0
sxy = 0.0
for (i = 1; i <= n; i++)
{
print "x[" i "] = " x[i] ", y[" i "] = " y[i]
sx += x[i]
sy += y[i]
sxx += x[i]*x[i]
sxy += x[i]*y[i]
syy += y[i]*y[i]
}
if ((n*sxx - sx*sx) == 0) return
# print "number of data points = " n;
a = (sxx*sy-sxy*sx)/(n*sxx-sx*sx)
b = (n*sxy-sx*sy)/(n*sxx-sx*sx)
for (i = 1; i <= n; i++)
{
ycalc[i] = a+b*x[i]
}
print "# Intercept = " a
print "# Slope = " b
print "Line: x = " a " + " b " * y"
for (i = 1; i <= n; i++)
{
printf("x = %8g, yo = %8g, yc = %8g\n", x[i], y[i], ycalc[i])
}
print ""
print "Different subsets\n"
for (drop = 1; drop <= n; drop++)
{
print "Subset " drop
sx = sy = sxx = sxy = syy = 0
j = 1
for (i = 1; i <= n; i++)
{
if (i == drop) continue
print "x[" j "] = " x[i] ", y[" j "] = " y[i]
sx += x[i]
sy += y[i]
sxx += x[i]*x[i]
sxy += x[i]*y[i]
syy += y[i]*y[i]
j++
}
if (((n-1)*sxx - sx*sx) == 0) continue
a = (sxx*sy-sxy*sx)/((n-1)*sxx-sx*sx)
b = ((n-1)*sxy-sx*sy)/((n-1)*sxx-sx*sx)
print "Line: x = " a " + " b " * y"
xt = x[drop]
yt = a + b * xt;
print "Interpolate: x = " xt ", y = " yt
}
}
Since awk doesn't provide an easy way to pass back multiple values from a function, nor does it provide structures other than arrays (sometimes associative), it is not perhaps the best language for this task. On the other hand, it can be made to do the job. You might be able to bundle the Least Squares calculation in a function that returns an array containing the slope and intercept, and then use that. Your turn to explore options.
Given the script lsq.awk and the input file lsq.data shown, I get the output shown:
$ cat lsq.data
17.1685 21.6875 20.2393 26.3158
$ awk -f lsq.awk lsq.data
x[1] = 0, y[1] = 17.1685
x[2] = 200, y[2] = 21.6875
x[3] = 400, y[3] = 20.2393
x[4] = 600, y[4] = 26.3158
# Intercept = 17.4537
# Slope = 0.0129968
Line: x = 17.4537 + 0.0129968 * y
x = 0, yo = 17.1685, yc = 17.4537
x = 200, yo = 21.6875, yc = 20.0531
x = 400, yo = 20.2393, yc = 22.6525
x = 600, yo = 26.3158, yc = 25.2518
Different subsets
Subset 1
x[1] = 200, y[1] = 21.6875
x[2] = 400, y[2] = 20.2393
x[3] = 600, y[3] = 26.3158
Line: x = 18.1192 + 0.0115708 * y
Interpolate: x = 0, y = 18.1192
Subset 2
x[1] = 0, y[1] = 17.1685
x[2] = 400, y[2] = 20.2393
x[3] = 600, y[3] = 26.3158
Line: x = 16.5198 + 0.0141643 * y
Interpolate: x = 200, y = 19.3526
Subset 3
x[1] = 0, y[1] = 17.1685
x[2] = 200, y[2] = 21.6875
x[3] = 600, y[3] = 26.3158
Line: x = 17.7985 + 0.0147205 * y
Interpolate: x = 400, y = 23.6867
Subset 4
x[1] = 0, y[1] = 17.1685
x[2] = 200, y[2] = 21.6875
x[3] = 400, y[3] = 20.2393
Line: x = 18.163 + 0.007677 * y
Interpolate: x = 600, y = 22.7692
$
Edit: In the previous version of the answer, the subsets were multiplying by n instead of (n-1). The values in the revised output seem to agree with what you expect. The residual issues are presentational, not computational.
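For comparison, the leave-one-out scheme described above is compact in a language that can return several values from a function. The following is a minimal Python sketch, not a replacement for the awk script; fit_line and leave_one_out are hypothetical names, and the orientation follows the awk code (fixed values 0, 200, 400, 600 as the independent variable).

```python
def fit_line(xs, ys):
    """Least-squares line through the points; returns (intercept, slope)."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope


def leave_one_out(xs, ys):
    """For each point, fit the remaining points and predict the dropped one."""
    predictions = []
    for drop in range(len(xs)):
        kept_x = xs[:drop] + xs[drop + 1:]
        kept_y = ys[:drop] + ys[drop + 1:]
        intercept, slope = fit_line(kept_x, kept_y)
        predictions.append(intercept + slope * xs[drop])
    return predictions


z = [0, 200, 400, 600]
centr = [17.1685, 21.6875, 20.2393, 26.3158]
predictions = leave_one_out(z, centr)
```

For the sample row, predictions comes out close to 18.1192, 19.3526, 23.6867 and 22.7692, matching the four "Interpolate" lines in the awk output above.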