How to get data in pandas dataframe in a range of date - python-3.x

I have a front end where my clients select a date period like
date_start = 2020/01/03
date_end = 2020/03/10
I have a data frame that has 1975 lines and 4 columns, including Date, like:
Date|Tax|Values|Total
I need all columns for the rows whose Date falls between date_start and date_end in the pandas dataframe. How can I do it?
What I tried:
new_df = df[(df['Date'] >= date_start) & (df['Date'] <= date_end)]
But the result was wrong.

Welcome. Keep in mind you are not filtering for those exact dates but selecting every row whose date falls between them. Try the following:
# To make sure your column is in datetime format
df['Date'] = pd.to_datetime(df['Date'])
new_df = df.loc[(df['Date']>=date_start) & (df['Date']<=date_end)]
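A minimal sketch of the idea, using a tiny made-up frame in place of the 1975-row original; converting both the column and the bounds with pd.to_datetime avoids comparing strings:

```python
import pandas as pd

# Hypothetical stand-in for the original dataframe
df = pd.DataFrame({
    "Date": ["2020/01/02", "2020/02/15", "2020/03/11"],
    "Total": [10, 20, 30],
})

date_start = "2020/01/03"
date_end = "2020/03/10"

# Convert the column and both bounds so the comparison is on datetimes,
# not on strings
df["Date"] = pd.to_datetime(df["Date"])
start = pd.to_datetime(date_start)
end = pd.to_datetime(date_end)

new_df = df.loc[(df["Date"] >= start) & (df["Date"] <= end)]
```

Only the 2020/02/15 row survives the filter; the other two fall outside the interval.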

Example:
from_date = "2021-08-27"
to_date = "2021-08-31"
The inclusive argument can be "both", "neither", "left", or "right":
df[df['date'].between(from_date, to_date, inclusive='both')]
This function is equivalent to:
df[(from_date <= df['date']) & (df['date'] <= to_date)]
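A small demonstration of the difference the inclusive argument makes, on an invented frame (note the string values for inclusive require pandas 1.3 or newer):

```python
import pandas as pd

# Invented sample data; the 'date' column name matches the answer above
df = pd.DataFrame({"date": pd.to_datetime(
    ["2021-08-26", "2021-08-27", "2021-08-29", "2021-08-31"])})

from_date = "2021-08-27"
to_date = "2021-08-31"

# 'both' keeps both endpoints; 'left' keeps from_date but drops to_date
both = df[df["date"].between(from_date, to_date, inclusive="both")]
left = df[df["date"].between(from_date, to_date, inclusive="left")]
```

Here both keeps three rows (the 27th, 29th, and 31st) while left keeps only two, since the right endpoint is excluded.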

Related

Filtering yfinance data

I am trying to test stock algorithms using historical data. I want to be able to select a date range, or even just a single date, but I keep getting empty dataframes. What am I not doing right? All I want it to do is select that day's market data.
Here is the relevant code:
import datetime
from pandas_datareader import data as pdr

def getdata(symbol, end_date, days):
    start_date = end_date - datetime.timedelta(days=days)
    return pdr.get_data_yahoo(symbol, start=start_date, end=end_date)
today = datetime.date.today()
date = today - datetime.timedelta(days=3)
df = getdata("MLM",today,375)
print(date)
df2 = df.loc[df.index == date]
print(df2)
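The empty result usually means the requested date simply is not in the index (a weekend or market holiday), or that a plain datetime.date is being compared against the index's Timestamps. A sketch of a safer lookup, using synthetic data in place of the Yahoo download (the index and prices here are made up):

```python
import datetime
import pandas as pd

# Synthetic stand-in for the downloaded data: business-day index, one column
idx = pd.date_range("2023-01-02", periods=5, freq="B")
df = pd.DataFrame({"Close": [10.0, 10.5, 10.2, 10.8, 11.0]}, index=idx)

# A Saturday: no trading data exists for this date
date = datetime.date(2023, 1, 7)

# Convert to a Timestamp and check membership before selecting
ts = pd.Timestamp(date)
if ts in df.index:
    row = df.loc[[ts]]
else:
    # Fall back to the most recent trading day at or before the target
    row = df.loc[:ts].tail(1)
```

For the Saturday above, the fallback returns the Friday row instead of an empty frame.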

How do filter date time column according to time ranges specified in two column?

In one Excel file I have filtered EDA data. I want to filter this data according to my second Excel file, using two of its columns, StartTime and EndTime, as a time range (the time columns in both Excel files are datetime64[ns]). You can see my two Excel files in the picture.
My code is
df1 = pd.read_excel(filename_1)
df2 = pd.read_excel(filename_2, usecols="A,C")
df3 = df1[df1['BinaryLabels'] == 1]
df2 = df2[(df3["StartTime"] <= df2.Time) & (df2.Time <= df3["EndTime"])]
print(df2)
and get this error: ValueError: Can only compare identically-labeled Series objects
How can I solve it?
Thanks in advance.
If, in general, the DataFrames have different numbers of rows, you have to merge them with a cross join before filtering:
df1 = pd.read_excel(filename_1)
df2 = pd.read_excel(filename_2, usecols= "A,C")
df3 = df1[df1['BinaryLabels'] == 1]
df = df2.assign(a=1).merge(df3.assign(a=1), on='a', how='outer')
df = df[(df["StartTime"] <= df.Time) & (df.Time <= df["EndTime"])]
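For example, with toy stand-ins for the two sheets (column names taken from the question, data invented), using the built-in how="cross" available in pandas 1.2+ in place of the assign(a=1) trick:

```python
import pandas as pd

# Invented stand-ins for the two Excel sheets
df2 = pd.DataFrame({"Time": pd.to_datetime(
    ["2021-01-01 10:00", "2021-01-01 12:00", "2021-01-01 15:00"])})
df3 = pd.DataFrame({
    "StartTime": pd.to_datetime(["2021-01-01 11:00"]),
    "EndTime": pd.to_datetime(["2021-01-01 13:00"]),
})

# Cross join pairs every Time with every (StartTime, EndTime) interval,
# so the comparisons happen row by row within one frame
df = df2.merge(df3, how="cross")
df = df[(df["StartTime"] <= df["Time"]) & (df["Time"] <= df["EndTime"])]
```

Only the 12:00 timestamp falls inside the 11:00-13:00 interval, so one row remains.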

I want to convert 4/1/2019 to 1/4/2019 in my dataframe column

I want to swap the month and day in a date column of a dataframe. I have tried all the methods below:
#df_combined1['BILLING_START_DATE_x'] = pd.to_datetime(df_combined1['BILLING_START_DATE_x'], format='%d/%m/%Y').dt.strftime('%d-%m-%Y')
#df_combined1['BILLING_START_DATE_x'] = df_combined1['BILLING_START_DATE_x'].apply(lambda x: dt.datetime.strftime(x, '%d-%m-%Y'))
#df_combined1['BILLING_START_DATE_x'] = pd.to_datetime(df_combined1['BILLING_START_DATE_x'], format='%m-%d-%Y')
I need to swap the month and the day.
If the format of all the datetimes is DD/MM/YYYY:
df_combined1['BILLING_START_DATE_x'] = (pd.to_datetime(df_combined1['BILLING_START_DATE_x'],
format='%d/%m/%Y')
.dt.strftime('%m/%d/%Y'))
If the format of all the datetimes is MM/DD/YYYY:
df_combined1['BILLING_START_DATE_x'] = (pd.to_datetime(df_combined1['BILLING_START_DATE_x'],
format='%m/%d/%Y')
.dt.strftime('%d/%m/%Y'))
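A quick round-trip check of the first variant on invented data (the column name is taken from the question):

```python
import pandas as pd

# Invented sample dates in DD/MM/YYYY form
df = pd.DataFrame({"BILLING_START_DATE_x": ["4/1/2019", "25/12/2019"]})

# Parse as day-first, then re-render as month-first strings
df["BILLING_START_DATE_x"] = (
    pd.to_datetime(df["BILLING_START_DATE_x"], format="%d/%m/%Y")
      .dt.strftime("%m/%d/%Y")
)
```

4/1/2019 (4 January) becomes 01/04/2019, and 25/12/2019 becomes 12/25/2019; strftime zero-pads the components.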

How to replace the null values of a column with the values stored in a list at the corresponding indexes in a CSV using python and pandas?

Check whether the cell in a given row of CLOSE DATE is blank; if it is, add 3 days to the SOLVED DATE and update the cell with the result. I'm using the pandas library and Jupyter Notebook as my editor. d is the dataframe read from my CSV file:
for index, row in d.iterrows():
    startdate = row["SOLVED DATE"]
    print(index, startdate)
    enddate = pd.to_datetime(startdate) + pd.DateOffset(days=3)
    row["CLOSE DATE"] = enddate
    #d.iloc[index, 10] = enddate
    l1.append(enddate)
l1 is the list that contains the values in datetime format, and I need to replace the values of the column named "CLOSE DATE" with the values of l1 and update my CSV file accordingly.
Welcome to the Stack Overflow community!
iterrows() is usually slow and should be avoided in most cases. There are a few ways to do your task:
1. Make two dataframes, a null DF and a not-null DF, impute values in the null DF, then merge the two.
2. Impute the values in the null rows of the dataframe itself.
As a supplement, the logic of adding the updated date column is as follows: first take the "SOLVED DATE" and store it in a new series, call it "new_date"; modify "new_date" by adding 3 days; once done, set this "new_date" as the value of the column you want updated.
In terms of code
# 1st Method
import pandas as pd

null = d.loc[d['CLOSE DATE'].isna()]
not_null = d.loc[~d['CLOSE DATE'].isna()]
new_date = null['SOLVED DATE']
new_date = pd.to_datetime(new_date) + pd.DateOffset(days=3)
null['CLOSE DATE'] = new_date
d = pd.concat([null, not_null], axis=0)
d = d.reset_index(drop=True)

# 2nd Method
import pandas as pd

new_date = d.loc[d['CLOSE DATE'].isna(), 'SOLVED DATE']
new_date = pd.to_datetime(new_date) + pd.DateOffset(days=3)
d['CLOSE DATE'] = d['CLOSE DATE'].fillna(new_date)
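A quick check of the second method on a toy frame (column names from the question, data invented); fillna aligns the replacement series by index, so only the blank rows are touched:

```python
import pandas as pd

# Toy frame: the second row has a missing CLOSE DATE
d = pd.DataFrame({
    "SOLVED DATE": ["2021-01-01", "2021-01-05"],
    "CLOSE DATE": ["2021-01-02", None],
})

# Build the replacement dates only for the rows where CLOSE DATE is blank
new_date = (pd.to_datetime(d.loc[d["CLOSE DATE"].isna(), "SOLVED DATE"])
            + pd.DateOffset(days=3))

# fillna matches on the index, so the existing date is left untouched
d["CLOSE DATE"] = pd.to_datetime(d["CLOSE DATE"]).fillna(new_date)
```

The first row keeps its original close date; the second gets 2021-01-05 plus three days.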

regarding selected certain rows based on a given requirements into another dataframe

I have read a CSV file into a dataframe using pandas; the CSV format is as follows. I would like to put the rows whose time column falls in the interval 6/3/2011 to 10/20/2011 into another dataframe. How can I do it efficiently in pandas?
Try this method:
data_frame['time'] = pd.to_datetime(data_frame['time'])
select_rows = (data_frame['time'] >= start_date) & (data_frame['time'] <= end_date)
data_frame.loc[select_rows]
Or, you can make the time column a DatetimeIndex and then select rows based on that as well.
I think you need to_datetime first and then filter with between using boolean indexing:
df['time'] = pd.to_datetime(df['time'], format='%m/%d/%Y')
df1 = df[df['time'].between('2011-06-03','2011-10-20')]
Create DatetimeIndex and select by loc:
df['time'] = pd.to_datetime(df['time'], format='%m/%d/%Y')
df = df.set_index('time')
df1 = df.loc['2011-06-03':'2011-10-20']
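Both approaches can be verified on a small made-up frame (the 'time' column name matches the answers above; the values are invented):

```python
import pandas as pd

# Invented M/D/Y date strings, one inside the target interval
df = pd.DataFrame({
    "time": ["5/1/2011", "7/15/2011", "11/2/2011"],
    "value": [1, 2, 3],
})

df["time"] = pd.to_datetime(df["time"], format="%m/%d/%Y")

# Variant 1: boolean mask with between (inclusive on both ends)
df1 = df[df["time"].between("2011-06-03", "2011-10-20")]

# Variant 2: DatetimeIndex slicing with loc (also inclusive on both ends)
df2 = df.set_index("time").loc["2011-06-03":"2011-10-20"]
```

Only the July row falls inside the interval, and both variants agree on it.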
