How to verify if Ids are present in each day within a period of x days? - python-3.x

I have a dataframe with IDs, dates, and data. The problem is that the dataset has a variable data rate per ID, and I would like to filter out the IDs that do not have at least one data point per day.
As a first step, I counted the daily sampling rate for each ID:
dfcounted = df.reset_index().groupby(['id', pd.Grouper(key='datetime', freq='D')]).count().reset_index()
Now, I have taken the first and last date of the dataframe and created a dataframe of each day between the starting and ending dates:
# take dates
sdate = df['datetime'].min() # start date
edate = df['datetime'].max() # end date
# interval
delta = edate - sdate # as timedelta
# empty list
dates = []
# store each date in list
for i in range(delta.days + 1):
    day = sdate + timedelta(days=i)
    dates.append(day)
# convert to dataframe
dates = pd.DataFrame(data = dates, columns=["date"])
From here, I am lost on how to proceed. I have created a sample dataframe
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import random
import string
letters = string.ascii_lowercase
ids = random.choices(letters,k=100)
date_today = datetime.now()
days = pd.date_range(date_today, date_today + timedelta(99), freq='D')
np.random.seed(seed=1111)
data = np.random.randint(1, high=100, size=len(days))
df = pd.DataFrame({'date': days,'ids': ids, 'data': data})
df = df.set_index('date')
With the sample df, I would expect to create a "results" df containing only the ids that have data on each date.
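One way to get there (a minimal sketch against the sample df above, assuming the goal is to keep only the ids that have at least one row on every calendar day of the period) is to count the distinct days per id and compare that with the total number of days:
import pandas as pd

# Work on a copy with the date back as a column, normalized to calendar days
daily = df.reset_index()
daily['day'] = daily['date'].dt.normalize()

# Number of distinct days on which each id has data
days_per_id = daily.groupby('ids')['day'].nunique()

# Total number of days in the period
n_days = daily['day'].nunique()

# Keep only the ids that appear on every day
complete_ids = days_per_id[days_per_id == n_days].index
results = df[df['ids'].isin(complete_ids)]
print(results)
With the random sample above most (likely all) ids will be dropped, since each id only appears on a handful of days; on real data the same comparison keeps exactly the ids with at least one data point per day.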

Related

pandas datareader. Save all data to one dataframe

I am new to Python and I have trouble getting data into one dataframe.
I have the following code.
from pandas_datareader import data as pdr
from datetime import date
from datetime import timedelta
import yfinance as yf
yf.pdr_override()
import pandas as pd
# tickers list
ticker_list = ['0P0001A532.CO','0P00018Q4V.CO','0P00017UBI.CO','0P00000YYT.CO','PFIBAA.CO','PFIBAB.CO','PFIBAC.CO','PFIDKA.CO','PFIGLA.CO','PFIMLO.CO','PFIKRB.CO','0P00019SMI.F','WEKAFKI.CO','0P0001CICW.CO','WEISTA.CO','WEISTS.CO','WEISA.CO','WEITISOP.CO']
today = date.today()
# We can get data by our choice by days bracket
if date.today().weekday() == 0:
    start_date = (today + timedelta((4 + today.weekday()) % 7)) - timedelta(days=7)  # Friday. If it is monday we do not have a price since it is based on the previous day close.
else:
    start_date = today - timedelta(days=1)
files = []
allData = []
dafr_All = []
def getData(ticker):
    print(ticker)
    data = pdr.get_data_yahoo(ticker, start=start_date, end=(today + timedelta(days=2)))['Adj Close']
    dataname = ticker + '_' + str(today)
    files.append(dataname)
    allData.append(data)
    SaveData(data, dataname)
# Create a data folder in your current dir.
def SaveData(df, filename):
    df.to_csv('./data/' + filename + '.csv')
# This loop will iterate over the ticker list, pass one ticker to get data, and save that data as a file.
for tik in ticker_list:
    getData(tik)
for i in range(len(files)):  # read back every saved file (the original range(0, 11) skips some of the 18 tickers)
    df1 = pd.read_csv('./data/' + str(files[i]) + '.csv')
    print(df1.head())
I get several csv files containing the adjusted close values (if an adjusted close exists).
I want to save all the data to one dataframe where the first column consists of tickers and the second column consists of adjusted close values. The dataframe then needs to be exported to a csv file.
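One possible way to get everything into a single frame (a sketch building on the allData list and ticker_list defined above; it assumes each element of allData is the 'Adj Close' Series returned for one ticker) is to concatenate the series with the tickers as keys and then flatten the result:
import pandas as pd

# One 'Adj Close' Series per ticker, in the same order as ticker_list
combined = pd.concat(allData, keys=ticker_list, names=['Ticker', 'Date'])

# Flatten the (Ticker, Date) index into columns: Ticker | Date | Adj Close
combined = combined.rename('Adj Close').reset_index()

# If only one value per ticker is wanted (e.g. the most recent close), keep the last date
latest = combined.sort_values('Date').groupby('Ticker', as_index=False).last()

# Export to a single csv file
latest[['Ticker', 'Adj Close']].to_csv('./data/all_tickers.csv', index=False)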

Python Pandas Writing Value to a Specific Row & Column in the data frame

I have a Pandas df of stock tickers with specific dates, and I want to add the adjusted close for each date using Yahoo Finance. I iterate through the dataframe, do the Yahoo call for that Ticker and Date, and the correct information is returned. However, I am not able to add that information back to the original df. I have tried various loc, iloc, and join methods, and none of them work for me. The df shows the initialized zero values instead of the new values.
import pandas as pd
import yfinance as yf
from datetime import timedelta
# Build the dataframe
df = pd.DataFrame({'Ticker': ['BGFV', 'META', 'WIRE', 'UG'],
                   'Date': ['5/18/2021', '5/18/2021', '4/12/2022', '6/3/2019'],
                   })
# Change the Date to Datetime
df['Date'] = pd.to_datetime(df.Date)
# initialize the adjusted close
df['Adj_Close'] = 0.00 # You'll get a column of all 0s
# iterate through the rows of the df and retrieve the Adjusted Close from Yahoo
for i in range(len(df)):
    ticker = df.iloc[i]['Ticker']
    start = df.iloc[i]['Date']
    end = start + timedelta(days=1)
    # YF call
    data = yf.download(ticker, start=start, end=end)
    # Get just the adjusted close
    adj_close = data['Adj Close']
    # Write the adjusted close to the dataframe on the correct row
    df.iloc[i]['Adj_Close'] = adj_close
    print(f'i value is {i} and adjusted close value is {adj_close} \n')
print(df)
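The reason the original assignment is silently lost is chained indexing: df.iloc[i]['Adj_Close'] first materialises the row (usually as a copy), and the value is written to that temporary object instead of to df. A small illustration with made-up values:
import pandas as pd

tmp = pd.DataFrame({'Ticker': ['BGFV', 'META'], 'Adj_Close': [0.0, 0.0]})

tmp.iloc[0]['Adj_Close'] = 27.81   # chained indexing: writes to a temporary copy
print(tmp.loc[0, 'Adj_Close'])     # still 0.0

tmp.loc[0, 'Adj_Close'] = 27.81    # single .loc call: writes to the frame itself
print(tmp.loc[0, 'Adj_Close'])     # 27.81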
The simplest way to do this is to use loc as below:
# change this line
df.loc[i,'Adj_Close'] = adj_close.values[0]
You can use:
def get_adj_close(x):
    # You needn't specify the end param because the period is already set to 1 day
    data = yf.download(x['Ticker'], start=x['Date'], progress=False)
    return data['Adj Close'].iloc[0].squeeze()

df['Adj_Close'] = df.apply(get_adj_close, axis=1)
Output:
>>> df
Ticker Date Adj_Close
0 BGFV 2021-05-18 27.808811
1 META 2021-05-18 315.459991
2 WIRE 2022-04-12 104.320045
3 UG 2019-06-03 16.746983

Change date to next trading date

I have two tables:
- event dates
- return dates
Some event dates do not fall on a trading day.
How can I change the event date to the next trading day?
So if an event date is not in the return dates, take the next date that is in the return dates.
The approach of changing weekend days to working days does not work because of days like Christmas.
The best approach would be to look up the next day in the return table.
for i in event['date']:
    if i is not in return['date'].values:
        event['date'] = i + datetime.timedelta(days=1)
but this doesn't work.
I am working with dataframes, and the dates have the format datetime64[ns]. If the event date does not exist in the return dates, then use the event date plus one day.
Edit
After the clarifications concerning the desired logic, here is the new solution
from datetime import datetime, timedelta
import numpy as np
import pandas as pd
# Create two df
event_date = datetime.now()
event_dates = pd.DataFrame([datetime(2020, 2, _) for _ in range(1, 29)], columns=['date'])
print(event_dates.date[0])
# 2020-02-01 00:00:00
return_dates = pd.DataFrame([datetime(2020, 1, _) for _ in range(1, 32)], columns=['date'])
# Apply logic
# Membership is checked against the column's values (checking against the Series itself would test the index)
event_dates.date = [_ if _ in return_dates.date.tolist() else _ + timedelta(days=1) for _ in event_dates.date]
print(event_dates.date[0])
# 2020-02-02 00:00:00
Base Python
Here is a solution using the standard datetime library
from datetime import datetime
from typing import List
def get_next_trade_date(date: datetime, date_list: List[datetime]) -> datetime:  # The annotations here are just to specify the types of the objects
    if date in date_list:  # Check if the date is contained in the list
        return date
    delta, res = None, None  # Initialize both to None
    for _ in date_list:
        tmp = abs((date - _).days)  # Time difference in current iteration
        if not delta or tmp < delta:  # See bullet point 1.
            delta, res = tmp, _
    return res

if __name__ == '__main__':
    event_date = datetime.now()
    return_dates = [datetime(2020, 1, _) for _ in range(1, 32)]
    print(get_next_trade_date(event_date, return_dates))
    # 2020-01-01 00:00:00
Notice that
The condition not delta or tmp < delta is twofold: in the first iteration delta, res are both None so we will overwrite them with tmp, _. We catch this by using not delta. The other part (tmp < delta) is more obvious: if we have a new minimal delta then we overwrite delta, res.
I only considered day intervals ((date - _).days); you could go into finer detail (see datetime.timedelta for more info).
Coming from R, I believe there must be a simpler solution using numpy; see below.
Numpy
This solution uses numpy. (date_list - date) is an array of timedeltas, (date_list - date).argmin() returns the index of the minimal value.
from datetime import datetime
import numpy as np
def get_next_trade_date(date: datetime, date_list: np.ndarray) -> datetime:
    return date_list[(date_list - date).argmin()]

if __name__ == '__main__':
    event_date = datetime.now()
    return_dates = np.array([datetime(2020, 1, _) for _ in range(1, 32)])
    print(get_next_trade_date(event_date, return_dates))
    # 2020-01-01 00:00:00
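Since both tables are already dataframes, a more pandas-native alternative (a sketch, assuming both frames have a sorted datetime64[ns] 'date' column) is pd.merge_asof with direction='forward', which maps each event date to the first return date on or after it:
import pandas as pd

# Hypothetical frames with the same structure as in the question
event = pd.DataFrame({'date': pd.to_datetime(['2020-12-24', '2020-12-25', '2020-12-28'])})
returns = pd.DataFrame({'date': pd.to_datetime(['2020-12-24', '2020-12-28', '2020-12-29'])})

# merge_asof requires both sides to be sorted on the merge key
event = event.sort_values('date')
returns = returns.sort_values('date')

# For each event date, take the first return date that is >= the event date
merged = pd.merge_asof(event, returns.rename(columns={'date': 'trading_date'}),
                       left_on='date', right_on='trading_date',
                       direction='forward')
event['date'] = merged['trading_date'].to_numpy()
print(event)
This also handles holidays like Christmas, because the lookup is against the actual return dates rather than a weekend rule.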

Moving Unique Count Calculation Pandas DataFrame

I am defining a function that is applied to every row in my DataFrame and counts the unique codes in the column "Code" for every id in the set. The code I have works, but it is incredibly slow, and I am using a large data set. I am looking for a different approach that speeds up the operation.
from datetime import timedelta as td
import numpy as np
import pandas as pd

df['Trailing_12M'] = df['Date'] - td(365)  # current date - 1 year as new column

def Unique_Count(row):
    """Creating a new df for each id and returning unique count to every row in original df"""
    temp1 = np.array(df['ID'] == row['ID'])
    temp2 = np.array(df['Date'] <= row['Date'])
    temp3 = np.array(df['Date'] >= row['Trailing_12M'])
    temp4 = np.array(temp1 & temp2 & temp3)
    df_Unique_Code_Count = np.array(df[temp4].Code.nunique())
    return df_Unique_Code_Count

df['Unique_Code_Count'] = df.apply(Unique_Count, axis=1)
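A possible speed-up (a sketch rather than a drop-in replacement; it assumes the frame has 'ID', 'Date' and 'Code' columns and that a trailing 365-day window per ID is what is wanted) is to avoid re-scanning the whole frame for every row and let a time-based rolling window do the work per ID:
import pandas as pd

# Small hypothetical frame with the same columns as in the question
df = pd.DataFrame({
    'ID': ['a', 'a', 'a', 'b', 'b'],
    'Date': pd.to_datetime(['2020-01-01', '2020-06-01', '2021-03-01',
                            '2020-01-01', '2020-02-01']),
    'Code': ['x', 'y', 'x', 'x', 'z'],
})

# Rolling time windows need a monotonic datetime index within each group
df = df.sort_values(['ID', 'Date'])

# Factorize the codes so the rolling window operates on a numeric column
df['Code_num'] = pd.factorize(df['Code'])[0]

# Trailing 365-day count of distinct codes per ID
counts = (
    df.set_index('Date')
      .groupby('ID')['Code_num']
      .rolling('365D')
      .apply(lambda window: window.nunique(), raw=False)
)

# counts comes back ordered by (ID, Date), matching the sorted frame, so assign by position
df['Unique_Code_Count'] = counts.to_numpy().astype(int)
print(df)
Note that the rolling window is closed on the right by default, so the boundary behaviour can differ slightly from the explicit >=/<= comparison in the original function.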

Pull stock value of last business day of a month

I need help pulling the stock value of the last business day of each month from a time series/dataframe.
I am executing the following code:
import pandas as pd
import pandas_datareader.data as web  # needed for web.get_data_yahoo below
import datetime
import matplotlib
import warnings
warnings.filterwarnings('ignore')
start = datetime.datetime(2014, 3, 31)
end = datetime.datetime(2017, 9, 30)
stocks = ['AAPL', 'GOOG']
col = 'Adj Close'
df = web.get_data_yahoo(stocks,start,end)
data = df[col]  # .ix was removed in recent pandas; plain column selection gives the 'Adj Close' columns
dataframe = pd.DataFrame(data)
This gives me a dataframe with all the close values.
I want to get the values only from the last business day of the month
Please ignore the question, I've managed to find the answer
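For reference, one way to keep only the value from the last trading day of each month (a sketch against the dataframe built above, assuming a DatetimeIndex containing trading days only):
# Last available row (i.e. the last trading day) of each calendar month,
# keeping the original trading-day dates in the index
month_end = dataframe.groupby(pd.Grouper(freq='M')).tail(1)
print(month_end.head())

# Alternatively, relabel the result at business month end
monthly = dataframe.resample('BM').last()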
