I am running a back-testing program in Python. Even though the maths/logic is simple, Python takes an extremely long time to run the for loop.
Each row takes about 1 second on average, so with thousands to potentially tens of thousands of rows of data, the run time is impractical.
I use a pandas DataFrame as the base and generate forward calculations in a for loop. Is there a more efficient way, or what could I do to reduce the computation time?
def signal_TA1(data, periods):
    columns = ['x1', 'x2', 'x3', .......]
    pd_Append = pd.DataFrame((np.zeros((len(data.index),len(columns)))), columns = columns) #create and initialize as zeros needed columns
    data = data.join(pd_Append)
    data['Size'] = data.bidQ + data.askQ
    data['prx'] = (data.bid * data.askQ + data.ask * data.bidQ)/data.Size
    for i in range(1, len(data.index), 1):
        data.emaX.iloc[i] = data.lambda_.iloc[i] * data.Size.iloc[i] + (1 - data.lambda_.iloc[i]) * data.emaX.iloc[i-1]
        xxxxxx
        xxxxx
        xxxxx
    return data
It seems (and it appears to be fairly well known) that numpy handles looped calculations much more efficiently than pandas, apparently because pandas has to rebuild the whole array on each assignment.
Basically, I create a numpy array [x,y] within the function, then calculate in a for loop and populate the numpy array row by row. Finally, I convert the finished numpy array into a pandas DataFrame (for easier display and plotting).
The time difference is "forever" versus under 1 second for about 2,500 rows of data and calculations.
def signal_M2(data, weight, pandas = True):
bid = np.array(data.bid)
ask = np.array(data.ask)
askQ = np.array(data.askQ)
bidQ = np.array(data.bidQ)
size = bidQ + askQ
VWAP = (bid * askQ + ask * bidQ)/(bidQ + askQ)
columns = [x1, x2, x3, x4, x5, .....]
datB = np.zeros((len(data.index), len(columns)))
datA = pd.DataFrame(index=[0], columns = columns)
lambda_ = 0.5
weight = 0.3
x1 = VWAP[0]
x2 = VWAP[0]
x3 = VWAP[0]
x4 = VWAP[0]
x5 = size[0]
....
....
datB[0] = (bid[0], ask[0], bidQ[0], askQ[0], size[0], ..........)
for row in range(1, len(data.index), 1):
x1 = lambda_ * size[row] + (1 - lambda_) * emaInertia
x2 = weight * VWAP[row] + (1 - weight) * emaPrx
x3 = weight * VWAP[row] + (1 -weight) * emaPrxSlow
x4 = weight * VWAP[row] + (1 -weight) * emaPrxFast
x5 = weight * VWAP[row] + (1 -weight) * emaPrxLead
if pandas == True:
datB[row] = (bid[row], ask[row], bidQ[row], ...........)
else:
print(................)
if pandas == True:
datB = pd.DataFrame(datB, columns = columns)
return datB
else :
print('no pandas dataframe was asked to be be stored')
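When the smoothing factor is a constant, as lambda_ = 0.5 and weight = 0.3 are in the function above, the recursion ema[t] = alpha * x[t] + (1 - alpha) * ema[t-1] does not need an explicit Python loop. The following is only a minimal sketch on made-up data, assuming the EMA is seeded with the first observation; the helper ema_loop and the sample series are hypothetical names, not part of the original code. It compares a plain NumPy loop with pandas' ewm(alpha=..., adjust=False).mean(), which implements the same recursion.

```python
import numpy as np
import pandas as pd

def ema_loop(values, alpha):
    """Reference EMA: plain loop over a NumPy array, seeded with values[0]."""
    out = np.empty(len(values), dtype=float)
    out[0] = values[0]
    for row in range(1, len(values)):
        out[row] = alpha * values[row] + (1 - alpha) * out[row - 1]
    return out

# Made-up sample data standing in for the real 'size' column (assumption)
rng = np.random.default_rng(0)
size = pd.Series(rng.uniform(1.0, 100.0, 2500))

ema_np = ema_loop(size.to_numpy(), alpha=0.5)

# adjust=False selects the recursive form y[t] = (1 - alpha) * y[t-1] + alpha * x[t]
ema_pd = size.ewm(alpha=0.5, adjust=False).mean().to_numpy()

print(np.allclose(ema_np, ema_pd))
```

If the factor changes from row to row (as data.lambda_ does in the question), a single alpha no longer applies and the NumPy loop is the simpler option.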
I created a sample data frame as below:
data = {'X': [10,20,30,40,50,60,70,80,90],
'Y': [1,2,3,4,5,6,7,8,9],
}
df_test = pd.DataFrame(data)
Now I need to find the distance between every pair of points in the sample data set. For that, I have already defined a function:
import numpy as np
def dist(X1,Y1,X2,Y2):
    #find the distance between 2 points
    d = np.sqrt((X2 - X1) * (X2 - X1) + (Y2 - Y1) * (Y2 - Y1))
    return d
For example, I need to take the 1st row and compare it with all the other rows in the same data frame, then take the 2nd row and compare it with all the other rows, and so on until every row has been compared with every other row.
To do this I used a nested for loop as below:
res = []
for i in range(0,len(df_test)):
    # the sample frame has columns 'X' and 'Y'
    X1 = df_test.X[i]
    Y1 = df_test.Y[i]
    for j in range(0,len(df_test)):
        X2 = df_test.X[j]
        Y2 = df_test.Y[j]
        res = dist(X1,Y1,X2,Y2)
        print(res)
    print( )
How can I do this without a nested for loop? The method works with a small number of rows in a data frame, but with large datasets the process gets killed by the terminal.
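One common way to avoid the nested loop is NumPy broadcasting. The sketch below is a minimal illustration on the sample frame from the question (the names pts, diff and dist_matrix are my own): it builds all pairwise differences at once and reduces them to a distance matrix. Note that the full matrix has n x n entries, so for very large data sets memory becomes the limit and the rows would have to be processed in chunks.

```python
import numpy as np
import pandas as pd

data = {'X': [10, 20, 30, 40, 50, 60, 70, 80, 90],
        'Y': [1, 2, 3, 4, 5, 6, 7, 8, 9]}
df_test = pd.DataFrame(data)

# (n, 2) array of points
pts = df_test[['X', 'Y']].to_numpy(dtype=float)

# Broadcasting (n, 1, 2) against (1, n, 2) gives all pairwise differences, shape (n, n, 2)
diff = pts[:, np.newaxis, :] - pts[np.newaxis, :, :]

# dist_matrix[i, j] is the Euclidean distance between row i and row j
dist_matrix = np.sqrt((diff ** 2).sum(axis=-1))

print(dist_matrix.shape)
```

Row i of dist_matrix holds what the inner loop printed for a fixed i, so the diagonal is zero and the matrix is symmetric.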
For reasons, I need to implement the Runge-Kutta4 method in PyTorch (so no, I'm not going to use scipy.odeint). I tried and I get weird results on the simplest test case, solving x'=x with x(0)=1 (analytical solution: x=exp(t)). Basically, as I reduce the time step, I cannot get the numerical error to go down. I'm able to do it with a simpler Euler method, but not with the Runge-Kutta 4 method, which makes me suspect some floating point issue here (maybe I'm missing some hidden conversion from double precision to single)?
import torch
import numpy as np
import matplotlib.pyplot as plt
def Euler(f, IC, time_grid):
y0 = torch.tensor([IC])
time_grid = time_grid.to(y0[0])
values = y0
for i in range(0, time_grid.shape[0] - 1):
t_i = time_grid[i]
t_next = time_grid[i+1]
y_i = values[i]
dt = t_next - t_i
dy = f(t_i, y_i) * dt
y_next = y_i + dy
y_next = y_next.unsqueeze(0)
values = torch.cat((values, y_next), dim=0)
return values
def RungeKutta4(f, IC, time_grid):
y0 = torch.tensor([IC])
time_grid = time_grid.to(y0[0])
values = y0
for i in range(0, time_grid.shape[0] - 1):
t_i = time_grid[i]
t_next = time_grid[i+1]
y_i = values[i]
dt = t_next - t_i
dtd2 = 0.5 * dt
f1 = f(t_i, y_i)
f2 = f(t_i + dtd2, y_i + dtd2 * f1)
f3 = f(t_i + dtd2, y_i + dtd2 * f2)
f4 = f(t_next, y_i + dt * f3)
dy = 1/6 * dt * (f1 + 2 * (f2 + f3) +f4)
y_next = y_i + dy
y_next = y_next.unsqueeze(0)
values = torch.cat((values, y_next), dim=0)
return values
# differential equation
def f(T, X):
return X
# initial condition
IC = 1.
# integration interval
def integration_interval(steps, ND=1):
return torch.linspace(0, ND, steps)
# analytical solution
def analytical_solution(t_range):
return np.exp(t_range)
# test a numerical method
def test_method(method, t_range, analytical_solution):
numerical_solution = method(f, IC, t_range)
L_inf_err = torch.dist(numerical_solution, analytical_solution, float('inf'))
return L_inf_err
if __name__ == '__main__':
Euler_error = np.array([0.,0.,0.])
RungeKutta4_error = np.array([0.,0.,0.])
indices = np.arange(1, Euler_error.shape[0]+1)
n_steps = np.power(10, indices)
for i, n in np.ndenumerate(n_steps):
t_range = integration_interval(steps=n)
solution = analytical_solution(t_range)
Euler_error[i] = test_method(Euler, t_range, solution).numpy()
RungeKutta4_error[i] = test_method(RungeKutta4, t_range, solution).numpy()
plots_path = "./plots"
a = plt.figure()
plt.xscale('log')
plt.yscale('log')
plt.plot(n_steps, Euler_error, label="Euler error", linestyle='-')
plt.plot(n_steps, RungeKutta4_error, label="RungeKutta 4 error", linestyle='-.')
plt.legend()
plt.savefig(plots_path + "/errors.png")
The result:
As you can see, the Euler method converges (slowly, as expected of a first order method). However, the Runge-Kutta4 method does not converge as the time step gets smaller and smaller. The error goes down initially, and then up again. What's the issue here?
The reason is indeed a floating point precision issue. torch defaults to single precision, so once the truncation error becomes small enough, the total error is basically determined by the roundoff error, and reducing the truncation error further by increasing the number of steps <=> decreasing the time step doesn't lead to any decrease in the total error.
To fix this, we need to enforce double-precision 64-bit floats for all floating-point torch tensors and numpy arrays. I use torch.float64 and np.float64 here because they state the width explicitly; torch.double is an alias for torch.float64, and np.double corresponds to the C double type, which is 64-bit on common platforms. Here's the fixed code:
import torch
import numpy as np
import matplotlib.pyplot as plt
def Euler(f, IC, time_grid):
y0 = torch.tensor([IC], dtype=torch.float64)
time_grid = time_grid.to(y0[0])
values = y0
for i in range(0, time_grid.shape[0] - 1):
t_i = time_grid[i]
t_next = time_grid[i+1]
y_i = values[i]
dt = t_next - t_i
dy = f(t_i, y_i) * dt
y_next = y_i + dy
y_next = y_next.unsqueeze(0)
values = torch.cat((values, y_next), dim=0)
return values
def RungeKutta4(f, IC, time_grid):
y0 = torch.tensor([IC], dtype=torch.float64)
time_grid = time_grid.to(y0[0])
values = y0
for i in range(0, time_grid.shape[0] - 1):
t_i = time_grid[i]
t_next = time_grid[i+1]
y_i = values[i]
dt = t_next - t_i
dtd2 = 0.5 * dt
f1 = f(t_i, y_i)
f2 = f(t_i + dtd2, y_i + dtd2 * f1)
f3 = f(t_i + dtd2, y_i + dtd2 * f2)
f4 = f(t_next, y_i + dt * f3)
dy = 1/6 * dt * (f1 + 2 * (f2 + f3) +f4)
y_next = y_i + dy
y_next = y_next.unsqueeze(0)
values = torch.cat((values, y_next), dim=0)
return values
# differential equation
def f(T, X):
return X
# initial condition
IC = 1.
# integration interval
def integration_interval(steps, ND=1):
return torch.linspace(0, ND, steps, dtype=torch.float64)
# analytical solution
def analytical_solution(t_range):
return np.exp(t_range, dtype=np.float64)
# test a numerical method
def test_method(method, t_range, analytical_solution):
numerical_solution = method(f, IC, t_range)
L_inf_err = torch.dist(numerical_solution, analytical_solution, float('inf'))
return L_inf_err
if __name__ == '__main__':
Euler_error = np.array([0.,0.,0.], dtype=np.float64)
RungeKutta4_error = np.array([0.,0.,0.], dtype=np.float64)
indices = np.arange(1, Euler_error.shape[0]+1)
n_steps = np.power(10, indices)
for i, n in np.ndenumerate(n_steps):
t_range = integration_interval(steps=n)
solution = analytical_solution(t_range)
Euler_error[i] = test_method(Euler, t_range, solution).numpy()
RungeKutta4_error[i] = test_method(RungeKutta4, t_range, solution).numpy()
plots_path = "./plots"
a = plt.figure()
plt.xscale('log')
plt.yscale('log')
plt.plot(n_steps, Euler_error, label="Euler error", linestyle='-')
plt.plot(n_steps, RungeKutta4_error, label="RungeKutta 4 error", linestyle='-.')
plt.legend()
plt.savefig(plots_path + "/errors.png")
Result:
Now, as we decrease the time step, the error of the RungeKutta4 approximation decreases with the correct rate.
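The precision floor described above can also be seen without PyTorch. The following is only a small NumPy sketch of the same experiment (x' = x, x(0) = 1 on [0, 1]); the function rk4_error is a hypothetical helper written for this illustration, and every scalar is cast to the requested dtype so that the float32 run really stays in single precision.

```python
import numpy as np

def rk4_error(n_points, dtype):
    """Max error of classic RK4 for x' = x, x(0) = 1 on [0, 1], computed in `dtype`."""
    t = np.linspace(0.0, 1.0, n_points).astype(dtype)
    y = np.empty(n_points, dtype=dtype)
    y[0] = dtype(1.0)
    half, two, six = dtype(0.5), dtype(2.0), dtype(6.0)
    for i in range(n_points - 1):
        dt = t[i + 1] - t[i]
        f1 = y[i]                      # f(t, x) = x
        f2 = y[i] + half * dt * f1
        f3 = y[i] + half * dt * f2
        f4 = y[i] + dt * f3
        y[i + 1] = y[i] + dt * (f1 + two * (f2 + f3) + f4) / six
    exact = np.exp(t.astype(np.float64))
    return float(np.max(np.abs(y.astype(np.float64) - exact)))

# Machine epsilon sets the scale of the roundoff error in each precision
print(np.finfo(np.float32).eps, np.finfo(np.float64).eps)

err32 = rk4_error(1000, np.float32)
err64 = rk4_error(1000, np.float64)
print(err32, err64)
```

With this many steps the truncation error of a fourth-order method is already far below single-precision roundoff, which is why only the float64 run keeps improving as the step shrinks.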
A bit of background:
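I want to calculate the array factor of an MxN antenna array, which is given by the following equation (the same sum that the code below computes):
$$AF(\theta,\phi) = \sum_{i=0}^{MN-1} w_i \, e^{-jk\,\left(x_i \sin\theta\cos\phi + y_i \sin\theta\sin\phi + z_i \cos\theta\right)}$$
where w_i is the complex weight of the i-th element, (x_i,y_i,z_i) is the position of the i-th element, k is the wave number, theta and phi are the elevation and azimuth respectively, and i ranges from 0 to MxN-1.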
In the code I have:
- theta and phi are np.mgrid arrays with shape (200,200) each,
- w_i and (x,y,z)_i are np.array objects with shape (NxM,) each,
so AF is an np.array with shape (200,200) (sum over i). There is no problem so far, and I can get AF easily by doing:
af = zeros([theta.shape[0],phi.shape[0]])
for i in range(self.size[0]*self.size[1]):
    af = af + ( w[i]*e**(-1j*(k * x_pos[i]*sin(theta)*cos(phi) + k * y_pos[i]* sin(theta)*sin(phi)+ k * z_pos[i] * cos(theta))) )
Now, each w_i depends on frequency, so AF does too, and w_i now has shape (NxM,1000) (1000 frequency samples of each w_i). I tried to use the above code, changing the initialization to
af = zeros([1000,theta.shape[0],phi.shape[0]])
but I get 'operands could not be broadcast together'. I can solve this by using a for loop through the 1000 values, but it is slow and is a bit ugly. So, what is the correct way to do the summation, or the correct way to properly define w_i and AF ?
Any help would be appreciated. Thanks.
edit
The code with the new dimension I'm trying to add is the following:
from numpy import *
class AntennaArray:
def __init__(self,f,asize=None,tipo=None,dx=None,dy=None):
self.Lambda = 299792458 / f
self.k = 2*pi/self.Lambda
self.size = asize
self.type = tipo
self._AF_DATA_SIZE = 200
self.theta,self.phi = mgrid[0 : pi : self._AF_DATA_SIZE*1j,0 : 2*pi : self._AF_DATA_SIZE*1j]
self.element_pos = None
self.element_amp = None
self.element_pha = None
if dx == None:
self.dx = self.Lambda/2
else:
self.dx = dx
if dy == None:
self.dy = self.Lambda/2
else:
self.dy = dy
self.generate_array()
def generate_array(self):
M = self.size[0]
N = self.size[1]
dx = self.dx
dy = self.dy
x_pos = arange(0,dx*N,dx)
y_pos = arange(0,dy*M,dy)
z_pos = 0
ele = zeros([N*M,3])
for i in range(M):
ele[i*N:(i+1)*N,0] = x_pos[:]
for i in range(M):
ele[i*N:(i+1)*N,1] = y_pos[i]
self.element_pos = ele
#self.array_factor = self.calculate_array_factor()
def calculate_array_factor(self):
theta,phi = self.theta,self.phi
k = self.k
x_pos = self.element_pos[:,0]
y_pos = self.element_pos[:,1]
z_pos = self.element_pos[:,2]
w = self.element_amp*exp(1j*self.element_pha)
if len(self.element_pha.shape) > 1:
#I have f_size samples of w_i(f)
f_size = self.element_pha.shape[1]
af = zeros([f_size,theta.shape[0],phi.shape[0]])
else:
#I only have w_i
af = zeros([theta.shape[0],phi.shape[0]])
for i in range(self.size[0]*self.size[1]):
#This for loop does the summation over i
af = af + ( w[i]*e**(-1j*(k * x_pos[i]*sin(theta)*cos(phi) + k * y_pos[i]* sin(theta)*sin(phi)+ k * z_pos[i] * cos(theta))) )
return af
I tried to test it with the following main script:
from numpy import *
f_points = 10
M = 2
N = 2
a = AntennaArray(5.8e9,[M,N])
a.element_amp = ones([M*N,f_points])
a.element_pha = zeros([M*N,f_points])
af = a.calculate_array_factor()
But I get
ValueError: 'operands could not be broadcast together with shapes (10,) (200,200) '
Note that if I set
a.element_amp = ones([M*N])
a.element_pha = zeros([M*N])
This works well.
Thanks.
I had a look at the code, and I think this for loop:
af = zeros([theta.shape[0],phi.shape[0]])
for i in range(self.size[0]*self.size[1]):
af = af + ( w[i]*e**(-1j*(k * x_pos[i]*sin(theta)*cos(phi) + k * y_pos[i]* sin(theta)*sin(phi)+ k * z_pos[i] * cos(theta))) )
does not work once w has a frequency axis: w[i] then has shape (10,) while the exponential has shape (200, 200), and those two shapes cannot be broadcast together.
And by the way, to make full use of numpy's efficiency, avoid looping over arrays wherever you can. It slows down the execution significantly.
I tried to rework that part.
First, I advise you not to use from numpy import *; it's bad practice because it floods your namespace. Use import numpy as np. I reintroduced the np abbreviation, so you can see what comes from numpy.
Frequency independent case
This first snippet assumes that w is a 1D array of length 4: I am neglecting the frequency dependency of w, to show you how you can get what you already obtained without the for loop and using instead the power of numpy.
af_points = w[:,np.newaxis,np.newaxis]*np.e**(-1j*
    (k * x_pos[:,np.newaxis,np.newaxis]*np.sin(theta)*np.cos(phi) +
     k * y_pos[:,np.newaxis,np.newaxis]*np.sin(theta)*np.sin(phi) +
     k * z_pos[:,np.newaxis,np.newaxis]*np.cos(theta)
    ))
af = np.sum(af_points, axis=0)
I am using numpy broadcasting to obtain a 3D array named af_points, whose shape is (4, 200, 200). To do so, I use np.newaxis to add axes to an array so that broadcasting lines up correctly.
So, w[:,np.newaxis,np.newaxis] is an array of shape (4, 1, 1). Similarly for x_pos[:,np.newaxis,np.newaxis], y_pos[:,np.newaxis,np.newaxis] and z_pos[:,np.newaxis,np.newaxis]. Since the angles have shape (200, 200), broadcasting can be done, and af_points has shape (4, 200, 200).
Finally the sum is done by np.sum, summing over the first axis to obtain a (200, 200) array.
Frequency dependent case
Now w has shape (4, 10), where 10 are the frequency points. The idea is the same, just consider that the frequency is an additional dimension in your numpy arrays: now af_points will be an array of shape (4, 10, 200, 200) where 10 are the f_points you have defined.
To keep it understandable, I've split the calculation:
# exp_points is only the exponential term, which is frequency independent. It is a (4, 200, 200) array.
exp_points = np.e**(-1j*
    (k * x_pos[:,np.newaxis,np.newaxis]*np.sin(theta)*np.cos(phi) +
     k * y_pos[:,np.newaxis,np.newaxis]*np.sin(theta)*np.sin(phi) +
     k * z_pos[:,np.newaxis,np.newaxis]*np.cos(theta)
    ))
# (4, 10, 1, 1) * (4, 1, 200, 200) broadcasts to (4, 10, 200, 200)
af_points = w[:,:,np.newaxis,np.newaxis] * exp_points[:,np.newaxis,:,:]
# sum over the element axis
af = np.sum(af_points, axis=0)
And now af has shape (10, 200, 200).
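To check that the broadcast version really matches the element-by-element sum, here is a small self-contained sketch. The element positions, weights and the reduced 20 x 20 angle grid are made-up test values, not the asker's data. It also shows np.einsum as one possible alternative that performs the same sum over elements without building the intermediate (4, 10, n, n) array.

```python
import numpy as np

n_elem, n_freq, n_ang = 4, 10, 20          # small sizes, for illustration only
k = 2 * np.pi / (299792458 / 5.8e9)        # wave number at 5.8 GHz
theta, phi = np.mgrid[0:np.pi:n_ang * 1j, 0:2 * np.pi:n_ang * 1j]

rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 0.05, (n_elem, 3))  # made-up element positions in metres
w = rng.normal(size=(n_elem, n_freq)) + 1j * rng.normal(size=(n_elem, n_freq))

# Each position column becomes shape (n_elem, 1, 1) so it broadcasts against the angle grids
x_pos, y_pos, z_pos = (pos[:, i, np.newaxis, np.newaxis] for i in range(3))
exp_points = np.exp(-1j * k * (x_pos * np.sin(theta) * np.cos(phi)
                               + y_pos * np.sin(theta) * np.sin(phi)
                               + z_pos * np.cos(theta)))

# Broadcast version from the answer: (4, 10, 1, 1) * (4, 1, n, n), summed over elements
af_broadcast = np.sum(w[:, :, np.newaxis, np.newaxis] * exp_points[:, np.newaxis, :, :], axis=0)

# Same contraction over the element index i, written with einsum
af_einsum = np.einsum('if,itp->ftp', w, exp_points)

# Reference: explicit loops over elements and frequencies
af_loop = np.zeros((n_freq, n_ang, n_ang), dtype=complex)
for i in range(n_elem):
    for f in range(n_freq):
        af_loop[f] += w[i, f] * exp_points[i]

print(af_broadcast.shape, np.allclose(af_broadcast, af_loop), np.allclose(af_einsum, af_loop))
```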
I have two functions that compute the same metric. One ends up using a list comprehension to cycle through a calculation, the other uses only numpy tensor operations. The functions take in a (N, 3) array, where N is the number of points in 3D space. When N <~ 3000 the tensor function is faster, when N >~ 3000 the list comprehension is faster. Both seem to have linear time complexity in terms of N i.e two time-N lines cross at N=~3000.
def approximate_area_loop(section, num_area_divisions):
n_a_d = num_area_divisions
interp_vectors = get_section_interp_(section)
a1 = section[:-1]
b1 = section[1:]
a2 = interp_vectors[:-1]
b2 = interp_vectors[1:]
c = lambda u: (1 - u) * a1 + u * a2
d = lambda u: (1 - u) * b1 + u * b2
x = lambda u, v: (1 - v) * c(u) + v * d(u)
area = np.sum([np.linalg.norm(np.cross((x((i + 1)/n_a_d, j/n_a_d) - x(i/n_a_d, j/n_a_d)),\
(x(i/n_a_d, (j +1)/n_a_d) - x(i/n_a_d, j/n_a_d))), axis = 1)\
for i in range(n_a_d) for j in range(n_a_d)])
Dt = section[-1, 0] - section[0, 0]
return area, Dt
def approximate_area_tensor(section, num_area_divisions):
divisors = np.linspace(0, 1, num_area_divisions + 1)
interp_vectors = get_section_interp_(section)
a1 = section[:-1]
b1 = section[1:]
a2 = interp_vectors[:-1]
b2 = interp_vectors[1:]
c = np.multiply.outer(a1, (1 - divisors)) + np.multiply.outer(a2, divisors) # c_areas_vecs_divs
d = np.multiply.outer(b1, (1 - divisors)) + np.multiply.outer(b2, divisors) # d_areas_vecs_divs
x = np.multiply.outer(c, (1 - divisors)) + np.multiply.outer(d, divisors) # x_areas_vecs_Divs_divs
u = x[:, :, 1:, :-1] - x[:, :, :-1, :-1] # u_areas_vecs_Divs_divs
v = x[:, :, :-1, 1:] - x[:, :, :-1, :-1] # v_areas_vecs_Divs_divs
sub_area_norm_vecs = np.cross(u, v, axis = 1) # areas_crosses_Divs_divs
sub_areas = np.linalg.norm(sub_area_norm_vecs, axis = 1) # areas_Divs_divs (values are now sub areas)
area = np.sum(sub_areas)
Dt = section[-1, 0] - section[0, 0]
return area, Dt
Why does the list comprehension version work faster at large N? Surely the tensor version should be faster? I'm wondering if it's something to do with the size of the calculations meaning it's too big to be done in cache? Please ask if I haven't included enough information, I'd really like to get to the bottom of this.
The bottleneck in the fully vectorized function was indeed np.linalg.norm, as @hpaulj's comment suggested.
The norm was used only to get the magnitude of all the vectors contained in axis 1. A much simpler and faster method is to compute it directly:
sub_areas = np.sqrt((sub_area_norm_vecs*sub_area_norm_vecs).sum(axis = 1))
This gives the same results and makes the code up to 25 times faster than the loop implementation (even when the loop doesn't use linalg.norm either).
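As a quick sanity check of that replacement, the sketch below compares the two ways of taking the vector magnitude along axis 1 on a made-up array (the shape (500, 3, 8, 8) is an arbitrary stand-in for the cross-product array). The np.einsum variant is an additional option I am assuming may help, not something measured in the original post.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up stand-in for sub_area_norm_vecs: (segments, xyz, divisions, divisions)
vecs = rng.normal(size=(500, 3, 8, 8))

norm_linalg = np.linalg.norm(vecs, axis=1)
norm_manual = np.sqrt((vecs * vecs).sum(axis=1))

# einsum multiplies and sums over axis j in one call, without a temporary product array
norm_einsum = np.sqrt(np.einsum('ijkl,ijkl->ikl', vecs, vecs))

print(np.allclose(norm_linalg, norm_manual), np.allclose(norm_linalg, norm_einsum))
```

Timing the three variants on the real array sizes (for example with timeit) is the only reliable way to see which one wins on a given machine.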
This program converts coordinates. What I am trying to do is to
use a csv file as input
use the function to convert the coordinates
save the output as a new csv file.
My file (worksheet.csv) has three columns, latitude, longitude and height.
How would I approach this?
import math
import csv
# semi-major axis of earth
a = 6378137.0
# f is the flattening of the earth ellipsoid (1/f would be its reciprocal)
f= 0.00335281068
# converts the input from degree to radians
latitude = math.radians(float(input('Enter Latitude:')))
longitude = math.radians(float(input('Enter Longitude:')))
height = float(input('Enter Height:'))
def earthConverter(latitude, longitude, height):
e = math.sqrt((2 * f) - (f**2))
N = a / math.sqrt(1-e**2 * math.sin(latitude)**2)
x = (N + height) * math.cos(latitude) * math.cos(longitude)
y = (N + height) * math.cos(latitude) * math.sin(longitude)
z = (N * (1 - e**2 ) + height) * math.sin(latitude)
return x, y, z
############################################
with open('worksheet.csv', 'r') as csvFile:
reader = csv.reader(csvFile)
for row in reader:
writer = csv.writer(csvFile)
writer.writerow(row[0], row[1], row[2], earthConverter(math.radians(float(row[0])),
earthConverter(math.radians(float(row[1])), earthConverter(float(row[2])) )
csvFile.close()
You're very close, but there are several things that need to be changed. Here's what I think is a full solution; below it I'll work through each part of the code.
import math
import csv
def earthConverter(latitude, longitude, height):
    f = 0.00335281068
    a = 6378137.0
    e = math.sqrt((2 * f) - (f**2))
    N = a / math.sqrt(1-e**2 * math.sin(latitude)**2)
    x = (N + height) * math.cos(latitude) * math.cos(longitude)
    y = (N + height) * math.cos(latitude) * math.sin(longitude)
    z = (N * (1 - e**2 ) + height) * math.sin(latitude)
    return x, y, z
# newline='' is what the csv module recommends for files passed to reader/writer
with open('worksheet.csv', 'r', newline='') as Infile, open('worksheet_out.csv', 'w', newline='') as Outfile:
    reader = csv.reader(Infile)
    # next(reader, None)
    writer = csv.writer(Outfile)
    for row in reader:
        lat = math.radians(float(row[0]))
        lon = math.radians(float(row[1]))
        ht = float(row[2])  # height is a length, not an angle, so no radians conversion
        x, y, z = earthConverter(lat, lon, ht)
        row_out = [row[0], row[1], row[2], x, y, z]
        writer.writerow(row_out)
First, you can move the definitions of f and a into the earthConverter function itself to avoid any possible problems with variable scoping. This isn't strictly necessary.
Second, you can get rid of the latitude = math.radians(float(input('Enter Latitude:'))) lines. Those ask for user input, which is not what you want here.
Third, you cannot write back to the same csv. You've opened it in read mode ('r'), but even if you changed that, writing to a file while you are still reading from it is a bad idea. You can also get rid of the separate call to close the csv at the end of your code: the with open() construction takes care of that for you.
Fourth, your earthConverter function returns a tuple, so you need to unpack those values somehow before trying to write them out again.
Everything in the for row in reader: block could be condensed into fewer lines. I broke it up this way because it makes it a little easier to read.
Also, you didn't mention whether your input csv had a header. If it does, then uncomment the line next(reader, None), which will skip the header. If you need to write a header out again, then you could change the for row in reader: block to this:
for i, row in enumerate(reader):
    if i == 0:
        # write the output header once, before the first data row
        header_out = ['lat', 'lon', 'ht', 'x', 'y', 'z'] # or whatever
        writer.writerow(header_out)
    lat = math.radians(float(row[0]))
    lon = math.radians(float(row[1]))
    ht = float(row[2])  # height is a length, not an angle
    x, y, z = earthConverter(lat, lon, ht)
    row_out = [row[0], row[1], row[2], x, y, z]
    writer.writerow(row_out)
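If the input file does have a header row, csv.DictReader and csv.DictWriter handle it without any manual skipping. The sketch below is only an illustration under assumptions: the header names latitude, longitude and height are guessed from the question's description, the two data rows are made up, and io.StringIO stands in for the real files so the example runs on its own. The function earth_converter is a renamed copy of the formula above.

```python
import csv
import io
import math

def earth_converter(lat_rad, lon_rad, height, a=6378137.0, f=0.00335281068):
    """Geodetic (radians, metres) to Cartesian x, y, z; same formula as above."""
    e2 = 2 * f - f ** 2
    n = a / math.sqrt(1 - e2 * math.sin(lat_rad) ** 2)
    x = (n + height) * math.cos(lat_rad) * math.cos(lon_rad)
    y = (n + height) * math.cos(lat_rad) * math.sin(lon_rad)
    z = (n * (1 - e2) + height) * math.sin(lat_rad)
    return x, y, z

# In-memory stand-ins for worksheet.csv and the output file (made-up rows)
infile = io.StringIO("latitude,longitude,height\n0,0,0\n45.0,90.0,100.0\n")
outfile = io.StringIO()

reader = csv.DictReader(infile)            # consumes the header row itself
writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames + ['x', 'y', 'z'])
writer.writeheader()
for row in reader:
    x, y, z = earth_converter(math.radians(float(row['latitude'])),
                              math.radians(float(row['longitude'])),
                              float(row['height']))
    row.update({'x': x, 'y': y, 'z': z})
    writer.writerow(row)

# Read the result back to inspect it
output_rows = list(csv.DictReader(io.StringIO(outfile.getvalue())))
print(output_rows[0])
```

With real files, the two StringIO objects would be replaced by open(..., newline='') handles as in the solution above.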
All you have to do is read the csv file into a DataFrame, loop over its rows to convert each one, and collect the results in a new DataFrame. Then we let the pandas library export it to a new csv file.
import pandas as pd
import math
# semi-major axis of earth
a = 6378137.0
# f is the flattening of the earth ellipsoid (1/f would be its reciprocal)
f = 0.00335281068
def earthConverter(latitude, longitude, height):
    e = math.sqrt((2 * f) - (f**2))
    N = a / math.sqrt(1-e**2 * math.sin(latitude)**2)
    x = (N + height) * math.cos(latitude) * math.cos(longitude)
    y = (N + height) * math.cos(latitude) * math.sin(longitude)
    z = (N * (1 - e**2 ) + height) * math.sin(latitude)
    return x, y, z
def new_csv(input_file, output_file):
    df = pd.read_csv(input_file)
    rows = []
    for i, row in df.iterrows():
        # the trigonometric functions expect radians; height is passed through unchanged
        x1, y1, z1 = earthConverter(math.radians(row['Latitude']),
                                    math.radians(row['Longitude']),
                                    row['Height'])
        # the converted values are Cartesian coordinates, so label them x, y, z
        rows.append({'x': x1, 'y': y1, 'z': z1})
    # DataFrame.append was removed in pandas 2.0, so build the frame once from the list
    points_df = pd.DataFrame(rows, columns=['x', 'y', 'z'])
    points_df.to_csv(output_file)
new_csv('worksheet.csv', 'new_worksheet.csv')
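Since the formula only uses element-wise arithmetic, the row loop can also be dropped entirely by applying NumPy functions to whole columns. This is a minimal sketch under the same assumptions as above (column names Latitude, Longitude, Height in degrees, degrees and metres); earth_converter_vec is a hypothetical name, and io.StringIO with two made-up rows stands in for worksheet.csv.

```python
import io
import numpy as np
import pandas as pd

A = 6378137.0          # semi-major axis of the earth in metres
F = 0.00335281068      # flattening

def earth_converter_vec(lat_deg, lon_deg, height):
    """Vectorized geodetic-to-Cartesian conversion; angles in degrees, height in metres."""
    lat = np.radians(lat_deg)
    lon = np.radians(lon_deg)
    e2 = 2 * F - F ** 2
    n = A / np.sqrt(1 - e2 * np.sin(lat) ** 2)
    x = (n + height) * np.cos(lat) * np.cos(lon)
    y = (n + height) * np.cos(lat) * np.sin(lon)
    z = (n * (1 - e2) + height) * np.sin(lat)
    return x, y, z

# Made-up input standing in for worksheet.csv
csv_text = "Latitude,Longitude,Height\n0,0,0\n90,0,0\n"
df = pd.read_csv(io.StringIO(csv_text))

# Whole columns go in, whole columns come out
df['x'], df['y'], df['z'] = earth_converter_vec(df['Latitude'], df['Longitude'], df['Height'])
print(df)
```

Writing the result is then a single df.to_csv(output_file, index=False) call.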