Convert all dates from a data frame column python - python-3.x

I have a CSV file with a column holding the date people got vaccinated, stored as a string in 'YYYY-MM-DD' format. My goal is to add X days to each date, where X depends on the vaccine that person received. To add days to a date I have to convert the string to an ISO date, so I need to loop over each element in that column and convert it. I'm fairly new to Python and I'm not sure how to deal with it.
I read the file into a data frame with pandas, then tried the approach shown in the image:
(image: df column content and for try)
I don't know why I'm getting this error. I tried different ways to deal with it but can't figure it out.
Thanks

This happens because the values are of type 'str', and 'str' has no 'fromisoformat' method. I recommend converting the values from 'str' to 'datetime' so that you can do any date calculation you need, such as adding X days to a specific date.
You can convert the values from 'str' to 'datetime' and add the days as follows:
import pandas as pd
import datetime
df_reduzido['vacina_dataAplicacao'] = pd.to_datetime(df_reduzido['vacina_dataAplicacao'], format='%Y-%m-%d')
# timedelta belongs to the datetime module, not to the datetime.datetime class
df_reduzido['vacina_dataAplicacao'] = df_reduzido['vacina_dataAplicacao'] + datetime.timedelta(days=3)
print(df_reduzido['vacina_dataAplicacao']) # 3 days added
You can study how to deal with datetime in detail here: https://docs.python.org/3/library/datetime.html
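Since the question asks for a number of days that depends on the vaccine, here is a minimal sketch of that step without an explicit loop. The column 'vacina_nome', the new column 'segunda_dose', the sample rows, and the day counts are all assumptions made up for illustration; only 'vacina_dataAplicacao' comes from the question.

```python
import pandas as pd

# Hypothetical sample data; 'vacina_nome' and its values are assumptions.
df = pd.DataFrame({
    "vacina_dataAplicacao": ["2021-03-01", "2021-03-15", "2021-04-02"],
    "vacina_nome": ["A", "B", "A"],
})

# Days to add for each vaccine (illustrative values only).
days_by_vaccine = {"A": 28, "B": 84}

# Parse the whole column at once; pandas converts every element for us.
df["vacina_dataAplicacao"] = pd.to_datetime(
    df["vacina_dataAplicacao"], format="%Y-%m-%d"
)

# Map each vaccine to its day count, turn the counts into timedeltas, and add.
offsets = pd.to_timedelta(df["vacina_nome"].map(days_by_vaccine), unit="D")
df["segunda_dose"] = df["vacina_dataAplicacao"] + offsets

print(df)
```

A vaccine name missing from the dictionary would map to NaN and produce NaT in the result, so it may be worth checking for missing values afterwards.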

Thanks for your help, Sangkeun. I just want to point out that, for some reason, Python returned this error for me: "AttributeError: type object 'datetime.datetime' has no attribute 'datetime'".
I then found a solution by importing:
import datetime
from datetime import timedelta, date, datetime
and then using "+ timedelta()", like this:
df_reduzido['vacina_dataAplicacao'] = ( pd.to_datetime(df_reduzido['vacina_dataAplicacao'] , format='%Y-%m-%d', utc=False) + timedelta(days=10) ).dt.date
At the end I added .dt.date to get rid of the time component that pd.to_datetime() produces. Note that I tried setting utc=False hoping it would do the job, but nothing changed. Anyway, I'm grateful for your help.
Problem solved.
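For anyone wondering about the last step: utc=False only controls timezone handling, so it is not expected to strip the time. A small sketch of two ways to end up with date-only values follows; the sample strings and variable names are made up for illustration.

```python
import pandas as pd

s = pd.Series(["2021-03-01", "2021-03-15"])  # sample values, not from the thread
parsed = pd.to_datetime(s, format="%Y-%m-%d") + pd.Timedelta(days=10)

# Python date objects; the column typically ends up with object dtype.
as_dates = parsed.dt.date

# Plain 'YYYY-MM-DD' strings, handy when writing the result back to a CSV file.
as_text = parsed.dt.strftime("%Y-%m-%d")

print(as_dates.tolist())
print(as_text.tolist())
```

Keeping the column as datetime while doing arithmetic and converting only at the very end tends to be the simpler order, because the .dt accessor is no longer available once the values are plain date objects.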

Related

Adding a number in one column to a date in another in a pandas dataframe

My first python project that didn't print 'Hello World' - so be gentle. Tried answers from similar questions but they don't seem to work.
I'm working with an Excel file, parsing as pandas dataframe.
I have a calculated column that calculates the number of days to later be added to a date. The number of days to add column is done as below, with 'choices' being a list of integers. This seems to work fine.
choices = [0,0,925,778,567,608, 638,730]
df['Days_to_add'] = np.select(conditions, choices, default=0)
I now want to add this to an existing date column, to return a new column with the new date. So far I've tried this, but Jupyter says it's deprecated and will raise a TypeError in a future version:
df["Estimated Start"] = pd.to_timedelta(df["Date1"]) + df['Days_to_add']
Also tried this:
df['Estimated_Start'] = df.Max_Dec_Date + pd.DateOffset(df['Days_to_add'])
And something else that told me to use timedelta index, and something else that pointed to timedelta range. I think the problem is something to do with trying to add an integer to a series?
No success with any of it. Help?
The date column is a DateTime, not a TimeDelta, so the addition should go like this:
df["Estimated Start"] = pd.to_datetime(df["Date1"]) + pd.to_timedelta(df['Days_to_add'], unit='D')
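A self-contained sketch of that line on made-up data, in case it helps to see it run. The three sample rows are assumptions; the column names follow the question.

```python
import pandas as pd

# Sample rows invented for illustration.
df = pd.DataFrame({
    "Date1": ["2020-01-01", "2020-02-28", "2020-12-31"],
    "Days_to_add": [0, 2, 1],
})

# The dates become datetimes and the integers become day-based timedeltas,
# so the addition is done element by element.
df["Estimated Start"] = pd.to_datetime(df["Date1"]) + pd.to_timedelta(
    df["Days_to_add"], unit="D"
)

print(df)
```

The point of pd.to_timedelta here is that it accepts a whole column of integers, so each row gets its own offset.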

Trying to Pass date to pandas search from input prompt

I am trying to figure out how to pass a date inputted at a prompt by the user to pandas to search by date. I have both the search and the input prompt working separately but not together. I will show you what I mean. And maybe someone can tell me how to properly pass the date to pandas for the search.
This is how I successfully use pandas to extract rows in an excel sheet if any cell in column emr_first_access_date is greater than or equal to '2019-09-08'
I do this successfully with the following code:
import pandas as pd
HISorigFile = "C:\\folder\\inputfile1.xlsx"
#opens excel worksheet
df = pd.read_excel(HISorigFile, sheet_name='Non Live', skiprows=8)
#locates the columns I want to write to file including date column emr_first_access_date if greater than or equal to '2019-09-08'
data = df.loc[df['emr_first_access_date'] >= '2019-09-08', ['site_name','subs_num','emr_id', 'emr_first_access_date']]
#sorts the data
datasort = data.sort_values("emr_first_access_date",ascending=False)
#this creates the file (data already sorted) in panda with date and time.
datasort.to_excel(r'C:\folder\sitesTestedInLastWeek.xlsx', index=False, header=True)
However, the date above is hardcoded of course. So, I need the user running this script to input the date. I created a very basic working input prompt with the following:
import datetime
#prompts for input date
TestedDateBegin = input('Enter beginning date to search for sites tested in YYYY-MM-DD format')
year, month, day = map(int, TestedDateBegin.split('-'))
date1 = datetime.date(year, month, day)
Obviously I want to pass TestedDateBegin to pandas, changing the pertinent code line:
data = df.loc[df['emr_first_access_date'] >= '2019-09-08', ['site_name','subs_num','emr_id', 'emr_first_access_date']]
to something like:
data = df.loc[df['emr_first_access_date'] >= 'TestedDateBegin', ['site_name','subs_num','emr_id', 'emr_first_access_date']]
Obviously this doesn't work. But how do I proceed? I am very new to programming, so I'm not always clear on how to proceed. Does the date inputted in TestedDateBegin need to be added to a return? Or should it be put in a single-item list? What is the right approach? Thx!
This is resolved.
I had to remove the single quotes around TestedDateBegin, as Python, of course, interpreted that as a string and not a variable. Living and learning. :-)
data = df.loc[df['emr_first_access_date'] >= TestedDateBegin, ['site_name','subs_num','emr_id', 'emr_first_access_date']]
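A possible variation, sketched under assumptions: validate the typed date before using it in the filter, so a typo fails early with a clear error. The helper name rows_on_or_after and the sample rows are hypothetical; in the real script the date text would come from input() and the frame from pd.read_excel.

```python
import pandas as pd


def rows_on_or_after(df, column, date_text):
    """Return rows whose `column` is on or after the 'YYYY-MM-DD' date in date_text."""
    # Raises ValueError when the text does not match the expected format.
    cutoff = pd.to_datetime(date_text, format="%Y-%m-%d")
    dates = pd.to_datetime(df[column])
    return df.loc[dates >= cutoff]


# Made-up rows standing in for the Excel sheet.
sample = pd.DataFrame({
    "site_name": ["North", "South", "East"],
    "emr_first_access_date": ["2019-09-01", "2019-09-08", "2019-09-20"],
})

recent = rows_on_or_after(sample, "emr_first_access_date", "2019-09-08")
print(recent)
```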

Python string to datetime-date

I've got lots of dates of type string that look like this: 16.8.18 (American: 8/16/18). I need to check whether a date is in the past or the future, but datetime doesn't seem to support the German format.
How can I accomplish this?
from datetime import datetime
s = "16.8.18"
d = datetime.strptime(s, "%d.%m.%y")
if d > datetime.now():
    print('Date is in the future.')
else:
    print('Date is in the past.')
Prints (today is 20.7.2018):
Date is in the future.
The format used in strptime() is explained in the manual pages.

parse datetime in python

I have a string like Apr-23-2018_10:57:19_EDT. Now I want to make a datetime object from it. I am using code in python 3 like below -
from datetime import datetime
datetime_object = datetime.strptime('Apr-23-2018_10:57:19_EDT', '%b-%d-%Y_%H:%M:%S_%Z')
And it is giving me error like below -
ValueError: time data 'Apr-23-2018_10:57:19_EDT' does not match format '%b-%d-%Y_%H:%M:%S_%Z'
Need help
Timezones are a minefield. If you can get away without the timezone, you can do something like:
Code:
import datetime as dt
# drop the trailing '_EDT' (4 characters) before parsing
datetime_object = dt.datetime.strptime(
    'Apr-23-2018_10:57:19_EDT'[:-4], '%b-%d-%Y_%H:%M:%S')
print(datetime_object)
Result:
2018-04-23 10:57:19
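If the timezone does matter, one option is to handle the abbreviation yourself. The sketch below assumes a small hand-written mapping from abbreviation to a fixed UTC offset; the names TZ_OFFSETS and parse_with_abbreviation are hypothetical, and fixed offsets know nothing about daylight saving rules.

```python
from datetime import datetime, timedelta, timezone

# Assumed mapping from abbreviation to a fixed UTC offset; extend as needed.
TZ_OFFSETS = {
    "EDT": timezone(timedelta(hours=-4), "EDT"),
    "EST": timezone(timedelta(hours=-5), "EST"),
}


def parse_with_abbreviation(text):
    """Parse strings like 'Apr-23-2018_10:57:19_EDT' into an aware datetime."""
    # Split off the part after the last underscore, i.e. the abbreviation.
    stamp, _, abbreviation = text.rpartition("_")
    naive = datetime.strptime(stamp, "%b-%d-%Y_%H:%M:%S")
    # An abbreviation missing from the mapping raises KeyError.
    return naive.replace(tzinfo=TZ_OFFSETS[abbreviation])


parsed = parse_with_abbreviation("Apr-23-2018_10:57:19_EDT")
print(parsed.isoformat())  # 2018-04-23T10:57:19-04:00
```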

Problems with graphing excel data off an internet source with dates

This is my first post on Stack Overflow and I'm pretty new to programming, especially Python. I'm in engineering and am learning Python to complement that going forward, mostly for math and graphing applications.
Basically, my question is how to download CSV data from a source (in my case stock data from Google) and plot only certain rows against the date. For myself, I want the date against the close value.
Right now the error message I'm getting is: time data '5-Jul-17' does not match format '%d-%m-%Y'
Previously I was also getting: tuple data does not match
The description of the opened CSV data in Excel is:
7 columns (Date, Open, High, Low, Close, AdjClose, Volume), and the date is organized as 2017-05-30
I'm sure there are other errors as well, unfortunately.
I would really be grateful for any help on this.
Thank you in advance!
--edit--
Upon fiddling some more, I don't think names and dtypes are necessary; when I check the matrix dimensions without those identifiers I get (250L, 6L), which seems right. Now my main problem is converting the dates to something usable. My error now is that strptime only accepts strings, so I'm not sure what to use (see updated code below).
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime

def graph_data(stock):
    # getting the data off google finance
    data = np.genfromtxt('urlgoeshere'+stock+'forthecsvdata', delimiter=',',
                         skip_header=1)
    # checking format of matrix
    print(data.shape)  # returns (250L, 6L)
    time_format = '%d-%m-%Y'
    # I only want the 1st column (dates) and 5 column (close), all rows
    date = data[:,0][:,]
    close = data[:,4][:,]
    dates = [datetime.strptime(date, time_format)]
    # plotting section
    plt.plot_date(dates,close, '-')
    plt.legend()
    plt.show()

graph_data('stockhere')
Assuming the dates in the csv file are in the format '5-Jul-17', the proper format string to use is %d-%b-%y.
In [6]: datetime.strptime('5-Jul-17','%d-%m-%Y')
ValueError: time data '5-Jul-17' does not match format '%d-%m-%Y'
In [7]: datetime.strptime('5-Jul-17','%d-%b-%y')
Out[7]: datetime.datetime(2017, 7, 5, 0, 0)
See the Python documentation on strptime() behavior.
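For the "strptime only accepts strings" part of the edit, here is a minimal sketch that reads the dates as text and converts them one at a time. The CSV rows are made up, and the standard csv module is used here instead of np.genfromtxt as a swapped-in approach, since genfromtxt with its default float dtype cannot keep the date strings.

```python
import csv
import io
from datetime import datetime

# Made-up sample rows in the layout described in the question.
csv_text = """Date,Open,High,Low,Close,Volume
5-Jul-17,10.0,11.0,9.5,10.5,1000
6-Jul-17,10.5,11.5,10.0,11.2,1200
"""

dates, closes = [], []
for row in csv.DictReader(io.StringIO(csv_text)):
    # strptime takes a single string, so convert row by row.
    dates.append(datetime.strptime(row["Date"], "%d-%b-%y"))
    closes.append(float(row["Close"]))

print(dates)
print(closes)
```

The two lists can then be handed to matplotlib, for example with plt.plot(dates, closes).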
