How to convert a list of tuples to a csv file - python-3.x

I am very new to programming and was trying to create an amortization table. I found some great questions and answers here, but now I am stuck trying to convert the results into a csv file.
from datetime import date
from collections import OrderedDict
from dateutil.relativedelta import *
import csv

def amortization_schedule(rate, principal, period):
    start_date = date.today()
    # defining the monthly payment for a loan
    payment = -float(principal / ((((1 + (rate / 12)) ** period) - 1) / ((rate / 12) * (1 + (rate / 12)) ** period)))
    beg_balance = principal
    end_balance = principal
    period = 1
    while end_balance > 0 and period <= 60 * 12:
        # Recalculate the interest based on the current balance
        interest_paid = round((rate / 12) * beg_balance, 2)
        # Determine payment based on whether or not this period will pay off the loan
        payment = round(min(payment, beg_balance + interest_paid), 2)
        principal = round(-payment - interest_paid, 2)
        # update the ending balance (this line is missing from the code as posted;
        # without it the loop never sees the balance change)
        end_balance = round(beg_balance - principal, 2)
        yield OrderedDict([('Month', start_date),
                           ('Period', period),
                           ('Begin Balance', beg_balance),
                           ('Payment', payment),
                           ('Principal', principal),
                           ('Interest', interest_paid),
                           ('End Balance', end_balance)])
        # increment the counter, date and balance
        period += 1
        start_date += relativedelta(months=1)
        beg_balance = end_balance
I attempted to use this link as part of my solution but ended up with a csv that looked like the following:
M,o,n,t,h
P,e,r,i,o,d
B,e,g,i,n, ,B,a,l,a,n,c,e
P,a,y,m,e,n,t
P,r,i,n,c,i,p,a,l
I,n,t,e,r,e,s,t
E,n,d, ,B,a,l,a,n,c,e
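That per-character layout is the classic symptom of handing csv.writer.writerow a bare string: writerow iterates its argument, and iterating a string yields individual characters. Note also that amortization_schedule yields OrderedDicts, so a seven-name tuple unpacking binds the dictionary keys ('Month', 'Period', ...) rather than the values. A quick demonstration of the string behaviour:

import csv, sys

csvwriter = csv.writer(sys.stdout)
csvwriter.writerow('Month')    # prints: M,o,n,t,h
csvwriter.writerow(['Month'])  # prints: Month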
Here is my code for the conversion to csv.
# (result lists, assumed initialized earlier in the script)
start_dates, periods, beg_balances, payments = [], [], [], []
principals, interest_paids, end_balances = [], [], []

for (start_date, period, beg_balance, payment, principal,
     interest_paid, end_balance) in amortization_schedule(user_rate,
                                                          user_principal,
                                                          user_period):
    start_dates.append(start_date)
    periods.append(period)
    beg_balances.append(beg_balance)
    payments.append(payment)
    principals.append(principal)
    interest_paids.append(interest_paid)
    end_balances.append(end_balance)
with open('amortization.csv', 'w') as outfile:
    csvwriter = csv.writer(outfile)
    csvwriter.writerow(start_dates)
    csvwriter.writerow(periods)
    csvwriter.writerow(beg_balances)
    csvwriter.writerow(payments)
    csvwriter.writerow(principals)
    csvwriter.writerow(interest_paids)
    csvwriter.writerow(end_balances)
Any help would be appreciated!

with open('amortization.csv', 'w', newline='') as outfile:
    fieldnames = ['Month', 'Period', 'Begin Balance', 'Payment',
                  'Principal', 'Interest', 'End Balance']
    csvwriter = csv.DictWriter(outfile, fieldnames)
    csvwriter.writeheader()  # write the header row; DictWriter does not do this automatically
    for line in amortization_schedule(user_rate, user_principal, user_period):
        csvwriter.writerow(line)
The code above writes the csv file. collections.OrderedDict is a dictionary, so you can use csv.DictWriter to write each yielded row directly. Because each row is already a dictionary, you should not need all of the list-building lines you have for the conversion to csv.

Related

Python: Identify invalid online link for a zip file

I am trying to automate stock price data extraction from https://www.nseindia.com/. The data is stored as a zip file, and the URL for the zip file varies by date. If the stock market is closed on a certain date, e.g. weekends and holidays, there is no file/URL for that date.
I want to identify invalid links (links that don't exist) and skip to the next link.
This is a valid link:
path = 'https://archives.nseindia.com/content/historical/EQUITIES/2021/MAY/cm05MAY2021bhav.csv.zip'
This is an invalid link (1st May falls on a weekend, so the stock market was closed that day):
path2 = 'https://archives.nseindia.com/content/historical/EQUITIES/2021/MAY/cm01MAY2021bhav.csv.zip'
This is what I do to extract the data:
from urllib.request import urlopen
from io import BytesIO
from zipfile import ZipFile
import pandas as pd
import datetime

start_date = datetime.date(2021, 5, 3)
end_date = datetime.date(2021, 5, 7)
delta = datetime.timedelta(days=1)
final = pd.DataFrame()
while start_date <= end_date:
    print(start_date)
    day = start_date.strftime('%d')
    month = start_date.strftime('%b').upper()
    year = start_date.strftime('%Y')
    start_date += delta
    path = 'https://archives.nseindia.com/content/historical/EQUITIES/' + year + '/' + month + '/cm' + day + month + year + 'bhav.csv.zip'
    file = 'cm' + day + month + year + 'bhav.csv'
    try:
        with urlopen(path) as f:
            with BytesIO(f.read()) as b, ZipFile(b) as myzipfile:
                foofile = myzipfile.open(file)
                df = pd.read_csv(foofile)
                final = final.append(df)  # DataFrame.append returns a new frame; the original discarded the result
    except:
        print(file + ' not there')
If the path is invalid, Python gets stuck and I have to restart it. I am not able to handle the error or identify the invalid link while looping over multiple dates.
What I have tried so far to differentiate between valid and invalid links:
# Attempt 1
import os
os.path.exists(path)
os.path.isfile(path)
os.path.isdir(path)
os.path.islink(path)
# output is False for both path and path2 (these functions check the local
# filesystem, not remote URLs)

# Attempt 2
import validators
validators.url(path)
# output is True for both path and path2 (it only validates the URL syntax)

# Attempt 3
import requests
site_ping = requests.get(path)
site_ping.status_code < 400
# Output for path is True, but Python crashes/gets stuck when I run
# requests.get(path2) and I have to restart every time.
Thanks for your help in advance.
As suggested by SuperStormer, adding a timeout to the request solved the issue:
try:
    with urlopen(zipFileURL, timeout=5) as f:
        with BytesIO(f.read()) as b, ZipFile(b) as myzipfile:
            foofile = myzipfile.open(file)
            df = pd.read_csv(foofile)
            final = final.append(df)
except:
    print(file + ' not there')
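The same idea rescues the requests attempt above: pass a timeout so a dead link raises instead of hanging, and a HEAD request is enough when you only need the status code. A sketch (the 5-second timeout is an arbitrary choice):

import requests

def link_exists(url):
    # HEAD fetches only the headers; the timeout prevents an indefinite hang
    try:
        return requests.head(url, timeout=5).status_code < 400
    except requests.exceptions.RequestException:
        return False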

Time efficient filtering of list in python

I have a database table called 'do_not_call', which contains information about files that hold ranges of 10-digit phone numbers in increasing order. The column 'filename' holds the name of the file containing the range of numbers from 'first_phone' to 'last_phone'. There are about 2500 records in the 'do_not_call' table.
And I have a list of SQLAlchemy records. I need to find which file holds the 'phone' field of each of these records. So I have created a function that takes in the SQLAlchemy records and returns a dictionary where the key is the name of a file and the value is the list of phone numbers from the SQLAlchemy records that fall in the range of first and last phone numbers contained in that file.
from datetime import datetime  # needed for datetime.now() below

def get_file_mappings(dbcursor, import_records):
    start_time = datetime.now()
    phone_list = [int(rec.phone) for rec in import_records]
    dnc_sql = "SELECT * from do_not_call;"
    dbcursor.execute(dnc_sql)
    dnc_result = dbcursor.fetchall()
    file_mappings = {}
    for file_info in dnc_result:
        first_phone = int(file_info.get('first_phone'))
        last_phone = int(file_info.get('last_phone'))
        phone_ranges = list(filter(lambda phone: phone in range(first_phone, last_phone), phone_list))
        if phone_ranges:
            file_mappings.update({file_info.get('filename'): phone_ranges})
            phone_list = list(set(phone_list) - set(phone_ranges))
    # print(file_mappings)
    print("Time = ", datetime.now() - start_time)
    return file_mappings
For example, if the phone_list is
[2023143300, 2024393100, 2027981539, 2022760321, 2026416368, 2027585911], the file_mappings returned will be
{'1500000_2020-9-24_Global_45A62481-17A2-4E45-82D6-DDF8B58B1BF8.txt': [2023143300, 2022760321],
'1700000_2020-9-24_Global_45A62481-17A2-4E45-82D6-DDF8B58B1BF8.txt': [2024393100],
'1900000_2020-9-24_Global_45A62481-17A2-4E45-82D6-DDF8B58B1BF8.txt': [2027981539, 2026416368, 2027585911]}
The problem here is that it takes a lot of time to execute. On average it takes about 1.5 seconds per 1000 records. Is there a better approach/algorithm to solve this problem? Any help is appreciated.
This is a very inefficient approach to binning things into sorted ranges. You are not taking advantage of the fact that your bins are sorted (or could easily be sorted if they were not). You are creating a big nested loop by testing every phone number against every bin with the lambda.
You could make some marginal improvements by being consistent with set use (see below). But in the end, you could/should just find each phone's place among the bins with an efficient search, like bisection. See the example below with timing of the original, a set implementation, and bisection insertion.
If your phone_list is truly massive, other approaches may be advantageous, such as finding where the bin cutoffs fit into a sorted copy of the phone list. But the code below is about 500x faster than what you have now for 1,000 or 10,000 records.
# phone sorter
import random
import bisect
import time
from collections import defaultdict

# make some fake data of representative size
low_phone = 200_000_0000
data = []  # [file, low_phone, high_phone]
for idx in range(2500):
    row = []
    row.append(f'file_{idx}')
    row.append(low_phone + idx * 20000000)
    row.append(low_phone + (idx + 1) * 20000000 - 20)  # some gap
    data.append(row)
high_phone = data[-1][-1]

# generate some random phone numbers in range
num_phones = 10000
phone_list_orig = [random.randint(low_phone, high_phone) for t in range(num_phones)]

# orig method...
phone_list = phone_list_orig[:]
tic = time.time()
results = {}
for row in data:
    low = row[1]
    high = row[2]
    phone_ranges = list(filter(lambda phone: phone in range(low, high), phone_list))
    if phone_ranges:
        results.update({row[0]: phone_ranges})
        phone_list = list(set(phone_list) - set(phone_ranges))
toc = time.time()
print(f'orig time: {toc-tic:.3f}')

# with sets across the board...
phone_list = set(phone_list_orig)
tic = time.time()
results2 = {}
for row in data:
    low = row[1]
    high = row[2]
    phone_ranges = set(filter(lambda phone: phone in range(low, high), phone_list))
    if phone_ranges:
        results2.update({row[0]: phone_ranges})
        phone_list = phone_list - phone_ranges
toc = time.time()
print(f'using sets time: {toc-tic:.3f}')

# using bisection search
phone_list = set(phone_list_orig)
tic = time.time()
results3 = defaultdict(list)
lows = [t[1] for t in data]
for phone in phone_list:
    location = bisect.bisect(lows, phone) - 1
    if phone <= data[location][2]:  # it is within the high limit of bin
        results3[data[location][0]].append(phone)
toc = time.time()
print(f'using bisection sort time: {toc-tic:.3f}')

# for k in sorted(results3):
#     print(k, ':', results.get(k))

# compare on a common footing: results holds lists, results2 holds sets,
# and results3 holds lists in a defaultdict
assert ({k: set(v) for k, v in results.items()} == results2
        == {k: set(v) for k, v in results3.items()})
results:
orig time: 5.236
using sets time: 4.597
using bisection sort time: 0.012
[Finished in 9.9s]
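One design note on the bisection version: bisect.bisect requires lows to be sorted in ascending order. The synthetic data above is generated already sorted; with real bins you would sort first, e.g. data.sort(key=lambda r: r[1]).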

Python how to add repeating values to list

What I am trying to figure out is how to append "Cases" and "Deaths" to each day, so that it starts with "1/19/2020 Cases" and "1/19/2020 Deaths", then "1/20/2020 Cases", etc. It seems the append function does not work for this, and I don't know how else to do it. It doesn't seem like Python has a built-in way to do this task. My eventual goal is to make this a pandas DataFrame.
import pandas as pd

dates = pd.date_range(start='1/19/2020', end='12/31/2021')
lst = dates.repeat(repeats=2)
print(lst)
Thanks
If I am not mistaken, I don't think there's a way to do it purely with pandas. However, with Python and datetime, you can do it like this:
import pandas as pd
from datetime import timedelta, date

def daterange(start_date, end_date):
    # Credit: https://stackoverflow.com/a/1060330/10640517
    for n in range(int((end_date - start_date).days)):
        yield start_date + timedelta(n)

dates = []
start_date = date(2020, 1, 19)  # Start date here
end_date = date(2021, 12, 31)   # End date here
for single_date in daterange(start_date, end_date):
    dates.append(single_date.strftime("%m/%d/%Y") + " Cases")
    dates.append(single_date.strftime("%m/%d/%Y") + " Deaths")
pdates = pd.DataFrame(dates)
print(pdates)
Is this what you want? If not, I can delete it.
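For what it's worth, a vectorized variant along the lines the question attempted looks possible by pairing the repeated dates with a tiled suffix array. This is a sketch, assuming numpy.tile and pandas' DatetimeIndex.strftime behave as documented:

import numpy as np
import pandas as pd

dates = pd.date_range(start='1/19/2020', end='12/31/2021')
# repeat each date twice, render as strings, then append alternating suffixes
labels = pd.Series(dates.repeat(2).strftime('%m/%d/%Y')) + np.tile([' Cases', ' Deaths'], len(dates))
pdates = pd.DataFrame(labels)
print(pdates)

Note that date_range is inclusive of the end date, while the daterange generator above stops one day short of it.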

How can I speed these API queries up?

I am feeding a long list of inputs to a function that calls an API to retrieve data. My list has around 40,000 unique inputs. Currently, the function returns output every 1-2 seconds or so. Quick maths tells me it would take 10+ hours before the function is done. I therefore want to speed this process up, but I am struggling to find a solution. I am quite a beginner, so threading/pooling is quite difficult for me. I hope someone is able to help me out here.
The function:
import quandl
import datetime
import numpy as np

quandl.ApiConfig.api_key = 'API key here'

def get_data(issue_date, stock_ticker):
    # Prepare var
    stock_ticker = "EOD/" + stock_ticker
    # Volatility
    date_1 = datetime.datetime.strptime(issue_date, "%d/%m/%Y")
    pricing_date = date_1 + datetime.timedelta(days=-40)      # -40 days of issue date
    volatility_date = date_1 + datetime.timedelta(days=-240)  # -240 days of issue date (-40,-240 range)
    # Check if code exists: if not -> return empty array
    try:
        stock = quandl.get(stock_ticker, start_date=volatility_date, end_date=pricing_date)  # get pricing data
    except quandl.errors.quandl_error.NotFoundError:
        return []
    daily_close = stock['Adj_Close'].pct_change()  # returns using adj. close
    stock_vola = np.std(daily_close) * np.sqrt(252)  # annualized volatility
    # Average price
    stock_pricing_date = date_1 + datetime.timedelta(days=-2)    # -2 days of issue date
    stock_pricing_date2 = date_1 + datetime.timedelta(days=-12)  # -12 days of issue date
    stock_price = quandl.get(stock_ticker, start_date=stock_pricing_date2, end_date=stock_pricing_date)
    stock_price_average = np.mean(stock_price['Adj_Close'])  # get average price
    # Amihud's liquidity measure
    liquidity_pricing_date = date_1 + datetime.timedelta(days=-20)
    liquidity_pricing_date2 = date_1 + datetime.timedelta(days=-120)
    stock_data = quandl.get(stock_ticker, start_date=liquidity_pricing_date2, end_date=liquidity_pricing_date)
    p = np.array(stock_data['Adj_Close'])
    returns = np.array(stock_data['Adj_Close'].pct_change())
    dollar_volume = np.array(stock_data['Adj_Volume'] * p)
    illiq = np.divide(returns, dollar_volume)
    print(np.nanmean(illiq))
    illiquidity_measure = np.nanmean(illiq, dtype=float) * (10 ** 6)  # multiply by 10^6 for expositional purposes
    return [stock_vola, stock_price_average, illiquidity_measure]
I then use a separate script to select my csv file with the list of rows, each row containing the issue_date and stock_ticker:
import function
import csv
import tkinter as tk
from tkinter import filedialog

# Open File Dialog
root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename()

# Load Spreadsheet data
f = open(file_path)
csv_f = csv.reader(f)
next(csv_f)
result_data = []

# Iterate
for row in csv_f:
    try:
        return_data = function.get_data(row[1], row[0])
        if len(return_data) != 0:
            # print(return_data)
            result_data_loc = [row[1], row[0]]
            result_data_loc.extend(return_data)
            result_data.append(result_data_loc)
    except AttributeError:
        print(row[0])
        print('\n\n')
        print(row[1])
        continue

if result_data is not None:
    with open('resuls.csv', mode='w', newline='') as result_file:
        csv_writer = csv.writer(result_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        for result in result_data:
            # print(result)
            csv_writer.writerow(result)
else:
    print("No results found!")
It is quite messy, but as I mentioned before, I am definitely a beginner. Speeding this up would greatly help me.
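Since the work here is I/O-bound (each call mostly waits on the API), a thread pool is the usual remedy. Below is a minimal sketch, assuming the get_data function and the CSV rows loaded above; max_workers=10 is an arbitrary starting point, and any rate limits on the API side still apply:

from concurrent.futures import ThreadPoolExecutor

import function  # the module containing get_data above

def fetch(row):
    # row = [stock_ticker, issue_date], matching the CSV layout above
    try:
        return_data = function.get_data(row[1], row[0])
    except AttributeError:
        return None
    if return_data:
        return [row[1], row[0]] + return_data
    return None

# rows: the list of CSV rows, read exactly as in the script above
with ThreadPoolExecutor(max_workers=10) as pool:
    result_data = [r for r in pool.map(fetch, rows) if r is not None]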

How to recursively ask for input when given wrong date format for two raw input and continue the operation in python

I'm trying to get two dates as input and convert them to epoch time, but I need the two dates given as input to be validated against the correct format, else recursively ask for correct input.
from datetime import date
import datetime

start_date = datetime.datetime.strptime(raw_input('Enter Start date in the format DD-MM-YYYY: '), '%d-%m-%Y')
end_date = datetime.datetime.strptime(raw_input('Enter Start date in the format DD-MM-YYYY: '), '%d-%m-%Y')
epoch_date = datetime.datetime(1970, 1, 1)
diff1 = (start_date - epoch_date).days
diff2 = (end_date - epoch_date).days
epoch1 = (diff1 * 86400)
epoch2 = (diff2 * 86400)
print('\nPTime_Start: %i' % diff1),
print("&"),
print('PTime_End: %i' % diff2)
print('Epoch_Start: %i' % epoch1),
print("&"),
print('Epoch_End: %i' % epoch2)
First of all, you are using Python 3.x, and Python 3.x does not have a function called raw_input(); it has been renamed to input().
import datetime

def take_date_input():
    input_date = input('Enter date in the format DD-MM-YYYY: ')
    try:
        one_date = datetime.datetime.strptime(input_date, '%d-%m-%Y')
    except ValueError:
        return take_date_input()  # ask again on a badly formatted date
    return one_date
You can do this if you really want recursiveness in your code, but it would be better with a while loop, as sketched below.
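A minimal sketch of that while-loop version (the prompt text is carried over from above):

import datetime

def take_date_input():
    # loop until strptime accepts the input; avoids unbounded recursion on repeated bad input
    while True:
        input_date = input('Enter date in the format DD-MM-YYYY: ')
        try:
            return datetime.datetime.strptime(input_date, '%d-%m-%Y')
        except ValueError:
            print('Invalid date, please try again.')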
