Rename substring of column values of a python DataFrame - python-3.x

My problem:
I have a datetime column with values formatted like
'27SEP18:05:02:11'
When trying to convert the datetime values I started with
df['dtimes'] = pd.to_datetime(df['dtimes'], format='%d%b%Y:%H:%M:%S')
and ran into the problem that 'SEP' is not of the form 'Sep'. I would rather not loop over the column to fix the case.
Any fast suggestions, please?

Use %y to match a two-digit year (YY); %Y is for a four-digit year (YYYY). Note that %b matches abbreviated month names case-insensitively, so 'SEP' needs no renaming:
#YY format of year - %y
df = pd.DataFrame({'dtimes':['27SEP18:05:02:11','27JAN18:05:02:11']})
df['dtimes'] = pd.to_datetime(df['dtimes'],format = '%d%b%y:%H:%M:%S')
print (df)
               dtimes
0 2018-09-27 05:02:11
1 2018-01-27 05:02:11
#YYYY format of year - %Y
df = pd.DataFrame({'dtimes':['27SEP2018:05:02:11','27JAN2018:05:02:11']})
df['dtimes'] = pd.to_datetime(df['dtimes'],format = '%d%b%Y:%H:%M:%S')
print (df)
               dtimes
0 2018-09-27 05:02:11
1 2018-01-27 05:02:11
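Applied to the original sample string, here is a minimal sketch confirming that the upper-case month abbreviation parses as-is once the year code is fixed:
import pandas as pd

s = pd.Series(['27SEP18:05:02:11'])
# %b is case-insensitive, so 'SEP' works; %y handles the two-digit year
print(pd.to_datetime(s, format='%d%b%y:%H:%M:%S'))
0   2018-09-27 05:02:11
dtype: datetime64[ns]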

Related

Datetime conversion format

I am converting a datetime from the format '2018-06-22T09:38:00.000-04:00'
to the pandas datetime format.
I tried to convert it using pandas and got output, but the output still carries the offset:
o/p: 2018-06-22 09:38:00-04:00
date = '2018-06-22T09:38:00.000-04:00'
dt = pd.to_datetime(date)
expected result: 2018-06-22 09:38
actual result: 2018-06-22 09:38:00-04:00
The timestamp is timezone-aware, so if you convert to UTC with Timestamp.tz_convert, the time changes:
date = '2018-06-22T09:38:00.000-04:00'
dt = pd.to_datetime(date).tz_convert(None)
print (dt)
2018-06-22 13:38:00
So a possible solution is to remove the last 6 characters (the UTC offset) from the datetime string before parsing:
dt = pd.to_datetime(date[:-6])
print (dt)
2018-06-22 09:38:00
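If the goal is to keep the local wall-clock time without slicing the string, another option is Timestamp.tz_localize(None), which drops the timezone information while preserving the local time; a minimal sketch with the same string:
import pandas as pd

date = '2018-06-22T09:38:00.000-04:00'
dt = pd.to_datetime(date).tz_localize(None)  # drop the timezone, keep the local time
print (dt)
2018-06-22 09:38:00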

Compute difference in days between two date variables - Python

I have two date variables, and I tried to compute the difference in days between them with:
from datetime import date, timedelta,datetime
date_format = "%Y/%m/%d"
a = datetime.strptime(df.D1, date_format)
b = datetime.strptime(df.D2, date_format)
df['delta'] = b - a
print delta.days
But I'm getting this error:
TypeError: strptime() argument 1 must be str, not Series
How could I do this? The variables are objects; should I convert them to datetime64?
Since you're working with pandas, you can use pd.to_datetime instead of the datetime package:
# Convert each date column to datetime:
df['D1'] = pd.to_datetime(df.D1,format='%Y/%m/%d')
df['D2'] = pd.to_datetime(df.D2,format='%Y/%m/%d')
# With 2 datetime Series, a simple subtraction will give you a Timedelta column:
df['delta'] = df.D1 - df.D2
For example:
>>> df
           D1          D2
0  2015/05/18  2014/06/21
1  2015/10/18  2014/08/14
df['D1'] = pd.to_datetime(df.D1,format='%Y/%m/%d')
df['D2'] = pd.to_datetime(df.D2,format='%Y/%m/%d')
df['delta'] = df.D1 - df.D2
>>> df
          D1         D2    delta
0 2015-05-18 2014-06-21 331 days
1 2015-10-18 2014-08-14 430 days
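If you want the result as a plain integer number of days rather than a Timedelta, the .dt.days accessor on the timedelta column gives that (continuing from the frame above):
df['delta_days'] = df['delta'].dt.days
print(df['delta_days'])
0    331
1    430
Name: delta_days, dtype: int64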

Creating a daily account log from a Pandas expense file in data frame format

I have an expense file that I am trying to read in and from this file create a daily log. A small subset of the file that extends over years is shown below, for a few days in January 2015.
Date,Checking_Debit,Checking_Addition,Savings_Debit,Savings_Addition
2015-01-07,342.1,0.0,0.0,0.0
2015-01-07,981.0,0.0,0.0,0.0
2015-01-07,3185.0,0.0,0.0,0.0
2015-01-05,55.0,0.0,0.0,0.0
2015-01-05,75.0,0.0,0.0,0.0
2015-01-03,287.0,0.0,0.0,0.0
2015-01-02,64.8,0.0,0.0,0.0
2015-01-02,75.0,0.0,0.0,75.0
2015-01-02,1280.0,0.0,0.0,0.0
2015-01-02,245.0,0.0,0.0,0.0
2015-01-01,45.0,0.0,0.0,0.0
In my code I start with the variables checking_start and savings_start that contain the start values of the checking and savings account. I would like to give the code a start date and an end date and have the code iterate through each day, see if there was an expense on that day and subtract the checking and savings debits and add the checking and savings additions. If there were no expenses on that day it should keep the accounts at the same value as the previous day. In addition, I am trying to constrain myself to Pandas data frames in the implementation. So far my code looks like this.
import pandas as pd
from datetime import date
check_start = 8500.0
savings_start = 4000.0
start_date = date(2017, 1, 1)
end_date = date(2017, 1, 8)
df = pd.read_csv("file_name.csv", dtype={'Date': str, 'Checking_Debit': float,
                                         'Checking_Addition': float,
                                         'Savings_Debit': float,
                                         'Savings_Addition': float})
In a Pythonic fashion with the Pandas module, how do I walk from the start date to the end date, one day at a time, check whether there are expenses on that date, and then subtract them from the checking and savings balances? At the end I should have an array with the value of the checking account on each date and the same for the savings account.
The result should be arrays written into another .csv file with the following format.
Date,Checking,Savings
2017-01-07,1865.1,3925.0
2017-01-06,6373.2,3925.0
2017-01-05,6373.2,3925.0
2017-01-04,6503.2,3925.0
2017-01-03,6503.2,3925.0
2017-01-02,6790.2,3925.0
2017-01-01,8455.0,4000.0
Start by reading the data you provided, parsing the Date column as datetimes on the way in:
import pandas as pd
df = pd.read_csv(r"dat.csv", parse_dates=[0],dtype={'Checking_Debit': float,
'Checking_Addition': float,
'Savings_Debit': float,
'Savings_Addition': float})
Set Date as index for better data manipulation.
df = df.set_index("Date")
Initialize all the variables for the loop
check_start = 8500.0
savings_start = 4000.0
start_date = pd.to_datetime('2015/1/1')
end_date = pd.to_datetime('2015/1/8')
delta = pd.Timedelta('1 days') # time that needs to be added to start date
Now group the expense data by date:
grp_df = df.groupby('Date').sum()
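For the sample rows above, grp_df should look roughly like this:
print(grp_df)
            Checking_Debit  Checking_Addition  Savings_Debit  Savings_Addition
Date
2015-01-01            45.0                0.0            0.0               0.0
2015-01-02          1664.8                0.0            0.0              75.0
2015-01-03           287.0                0.0            0.0               0.0
2015-01-05           130.0                0.0            0.0               0.0
2015-01-07          4508.1                0.0            0.0               0.0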
Now run a while loop to build the expense report for each day:
expense_report = []
while start_date <= end_date:
    if start_date in df.index:
        # apply the day's additions and debits to the running balances
        savings_start += (grp_df.loc[start_date, "Savings_Addition"] - grp_df.loc[start_date, "Savings_Debit"])
        check_start += (grp_df.loc[start_date, "Checking_Addition"] - grp_df.loc[start_date, "Checking_Debit"])
        expense_report.append([start_date, check_start, savings_start])
    else:
        # no expenses on this day: carry the previous balances forward
        expense_report.append([start_date, check_start, savings_start])
    start_date += delta
Convert the expense_report list to a pandas DataFrame:
df_exp_rpt = pd.DataFrame(expense_report,columns=["Date","Checking","Savings"])
print(df_exp_rpt)
        Date  Checking  Savings
0 2015-01-01    8455.0   4000.0
1 2015-01-02    6790.2   4075.0
2 2015-01-03    6503.2   4075.0
3 2015-01-04    6503.2   4075.0
4 2015-01-05    6373.2   4075.0
5 2015-01-06    6373.2   4075.0
6 2015-01-07    1865.1   4075.0
7 2015-01-08    1865.1   4075.0
You can save it to CSV with
df_exp_rpt.to_csv("filename.csv")
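If you want the file to contain only the Date, Checking and Savings columns, as in your expected output, you can also pass index=False (assuming you do not want the integer index in the file):
df_exp_rpt.to_csv("filename.csv", index=False)  # omit the row index column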
Note: The Savings column values are 4075.0 instead of 3925.0 because your original data has a 75.0 entry in the Savings_Addition column on 2015-01-02.
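As a loop-free alternative, here is a minimal sketch that reindexes the daily sums onto the full date range and takes cumulative sums; it assumes the same df as above and resets the starting balances, since the loop mutated them:
check_start, savings_start = 8500.0, 4000.0                    # reset the starting balances
rng = pd.date_range('2015-01-01', '2015-01-08')                # one row per calendar day
daily = df.groupby('Date').sum().reindex(rng, fill_value=0.0)  # days with no activity become 0.0
checking = check_start + (daily['Checking_Addition'] - daily['Checking_Debit']).cumsum()
savings = savings_start + (daily['Savings_Addition'] - daily['Savings_Debit']).cumsum()
report = pd.DataFrame({'Date': rng, 'Checking': checking.values, 'Savings': savings.values})
print(report)
This should reproduce the balances from the loop above.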

Number to Date Conversion using Pandas in Python?

When I try to convert from a number format to a date, I'm not getting the same result that I get in Excel.
I need to convert a number to a date format and get the same result as Excel.
For Example in Excel for the below Number I get the following:
Input - 42970.73819
Output- 8/23/2017 17:43
I tried using the date conversion in Pandas but I am not getting the same result as Excel.
I think you need to convert an Excel serial date:
df = pd.DataFrame({'date':[42970.73819,42970.73819]})
print (df)
          date
0  42970.73819
1  42970.73819
df = pd.to_datetime((df['date'] - 25569) * 86400.0, unit='s')
print (df)
0   2017-08-23 17:42:59.616
1   2017-08-23 17:42:59.616
Name: date, dtype: datetime64[ns]
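Alternatively, pandas 0.20+ accepts an origin argument, which expresses the same Excel-epoch conversion more directly; a sketch assuming df still holds the original 'date' column of serial numbers:
df['date'] = pd.to_datetime(df['date'], unit='D', origin='1899-12-30')  # Excel's day zero
If you also want to match Excel's minute-level display (17:43), you could round afterwards with df['date'].dt.round('min').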

taking date input and grouping pandas dataframe as per time

I have a pandas dataframe with date and time values as follows.
       Date     Time Pattern
0  06/01/13  0:00:01       A
1  06/02/13  1:00:01       B
2  06/03/13  2:00:01       A
3  06/04/13  3:00:01       C
Now I intend to take a date input from the user as follows:
date = str(input('Input date in mm-dd-yy format'))
Now how should I find/group all the rows matching the date input by the user and copy them to a new dataframe? I tried many things but got confused with the datetime conversion.
How should I go about it?
First make sure your Date column is datetime
df.Date = pd.to_datetime(df.Date)
Then use query
date = pd.to_datetime(input('Input date in mm-dd-yyyy format'))
df.query('Date == @date')
In response to @learningprogramming:
You can include other criteria in query
date = pd.to_datetime(input('Input date in mm-dd-yyyy format: '))
df.query('Date == @date & Pattern == "B"')
loc works as well
date = pd.to_datetime(input('Input date in mm-dd-yyyy format: '))
df.loc[(df.Date == date) & (df.Pattern == 'B')]
Putting everything into inputs:
date = pd.to_datetime(input('Input date in mm-dd-yyyy format: '))
pattern = str(input('Input pattern type: '))
df.query('Date == @date & Pattern == @pattern')
Is the column named 'Date' a string? If so, you can try something like:
subset = df[df['Date'] == date]
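For completeness, a minimal end-to-end sketch combining the two answers above; the frame mirrors the sample data and '06-02-2013' stands in for the user's input:
import pandas as pd

df = pd.DataFrame({'Date': ['06/01/13', '06/02/13', '06/03/13', '06/04/13'],
                   'Time': ['0:00:01', '1:00:01', '2:00:01', '3:00:01'],
                   'Pattern': ['A', 'B', 'A', 'C']})

df.Date = pd.to_datetime(df.Date)    # parse the Date column once, up front
date = pd.to_datetime('06-02-2013')  # stand-in for the user's input
subset = df[df.Date == date]         # boolean mask; same result as df.query('Date == @date')
print(subset)                        # selects the 2013-06-02 row (Pattern B)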
