I am looking to read data from a column in my CSV file.
All of the data in this column are dates (MM/DD/YYYY).
I want my program to read the Dates column, and if the date is within 3 days of the current date, I want to assign all of the values in that row to variables.
Ex.
Date,Name,LaterDate
1/1/19,John Smith, 2/21/19
If I run my program on 2/19/2019, I want an email sent that says "John Smith's case is closing on 2/21/2019".
I understand how to send an email. The part that I get stuck on is:
Reading the CSV column specifically.
If the date is within 3 days,
Assign variables to the values in the ROW,
Use those variables to send a custom email.
I see a lot of "Use Pandas" but I might need the individual steps broken down.
Thank you.
First, read the CSV file into a DataFrame (old_df) and pull the dates out as a Series (dates). Then loop over the dates together with their positions i. Inside the loop, parse each date string into a datetime object from the datetime library, subtract it from the current date to get the number of days between them, and take the absolute value so the count is always positive. If the date is within 3 days, remember its position; at the end, select those rows from old_df into new_df. (DataFrame.append was removed in pandas 2.0, so the rows are selected in one step rather than appended one at a time.)
import pandas as pd
from datetime import datetime

old_df = pd.read_csv('example.csv')
dates = old_df['LaterDate']
rows = []  # positions of the rows whose date is within 3 days
for i, date in enumerate(dates):
    # strip() removes the space that follows the comma in the sample file
    date = datetime.strptime(date.strip(), '%m/%d/%y')
    days = (datetime.now() - date).days
    if abs(days) <= 3:
        rows.append(i)
new_df = old_df.iloc[rows]
print(new_df)
I have the date below expressed as a yearmon string: '202112'
I want to convert this to a yearqtr and report the next quarter. So from the string above, I want to get 2022Q1.
I tried the following, without success:
import pandas as pd
pd.PeriodIndex(pd.to_datetime('202112') ,freq='Q')
Could you please help me obtain the expected quarter? Any pointers would be very helpful.
import pandas as pd
df = pd.DataFrame({"Date": ['202112']}) # dummy data
df['next_quarter'] = pd.PeriodIndex(pd.to_datetime(df['Date'], format='%Y%m'), freq='Q') + 1
print(df)
Output:
     Date next_quarter
0  202112       2022Q1
Note that the Date column may be a string type, but next_quarter will be of type period. You can convert it to a string if that's what you want.
I think one issue you're running into is that '202112' is not a valid date format. You'll want to use '2021-12'. Then you can do something like this:
pd.to_datetime('2021-12').to_period('Q') + 1
This takes your date, converts it to a quarter, and adds 1 quarter.
You can convert your date to the new format by inserting a - at index 4 of your string, like so: date[:4] + '-' + date[4:]
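For a single string rather than a column, the two answers can be combined into a small helper. This is a minimal sketch; the function name next_quarter is made up for illustration, and it assumes the input is always a six-digit 'YYYYMM' string.

```python
import pandas as pd

def next_quarter(yearmon):
    """Return the quarter after the one containing a 'YYYYMM' string."""
    # An explicit format avoids any guessing about what '202112' means.
    period = pd.to_datetime(yearmon, format="%Y%m").to_period("Q")
    # Adding 1 to a quarterly Period moves it forward by one quarter.
    return str(period + 1)

print(next_quarter("202112"))
```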
How can I separate the date and time from a datetime column when the values are in the format shown below?
[image: sample of the datetime column]
I am trying INT(datetime column) to fetch the date, and datetime column - INT(datetime column) to fetch the time.
Your formula cannot work because your data is a text string (note that it has a letter included) and not a number.
So first convert the string into a "real" time with:
=substitute(a2,"T"," ")
You can then use:
Date: =INT(SUBSTITUTE(A2,"T"," "))
Time: =MOD(SUBSTITUTE(A2,"T"," "),1)
and be sure to format the results as desired.
If your column holds true date values, use this to extract the date:
=TEXT(A1,"yyyy-mm-dd")
For time
=TEXT(A1,"hh:mm:ss")
If the data is a text string, or is output by the TEXT() function, try the functions below.
for date =TEXT(FILTERXML("<t><s>"&SUBSTITUTE(A1,"T","</s><s>")&"</s></t>","//s[1]"),"yyyy-mm-dd")
for time =TEXT(FILTERXML("<t><s>"&SUBSTITUTE(A1,"T","</s><s>")&"</s></t>","//s[last()]"),"hh:mm:ss")
For date
=LEFT(A2,FIND("T",A2)-1)
For time
=RIGHT(A2,LEN(A2)-FIND("T",A2))
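Outside Excel, the same split at the "T" can be sketched in Python. The sample value is an assumption (the original screenshot is not available), and the function name split_datetime is made up for illustration; the sketch assumes ISO-style text such as 2021-03-15T14:30:00.

```python
from datetime import datetime

def split_datetime(text):
    """Split an ISO-style 'YYYY-MM-DDTHH:MM:SS' string into (date, time)."""
    # fromisoformat accepts the 'T' separator directly.
    parsed = datetime.fromisoformat(text.strip())
    return parsed.date(), parsed.time()

d, t = split_datetime("2021-03-15T14:30:00")  # hypothetical sample value
print(d, t)
```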
I have a varchar column [DB_TIMESTAMP] in a (DB2) table which gets data from different sources/environments. This column has different formats in it, like:
11/15/2019 11:30:02
11/15/2019 11:22 AM
2019/11/15 11:15 AM
I have to put remarks using CASE in my query: if any row in this column is delayed by 2 hours from the current date and time, it should be marked as pending.
I tried the following, but it needs the column to be in a DateTime format, which it is not because of the different formats of the data entered in it:
CASE WHEN days (current date) - days(DB_TIMESTAMP))>2
[for checking 2 hours difference]
I think this column needs to be converted to DateTime first, and then something like the above may work, but how?
Please help.
Shamshad Ali
Try something like this; it may help:
CASE WHEN DAYS (Replace (CONVERT(nvarchar (500), CURRENT_DATE ,106),' ','-') as current_date)
- DAYS(Replace (CONVERT(nvarchar (500), DB_TIMESTAMP ,106),' ','-') as DB_TIMESTAMP))>2
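The underlying logic (try each known format in turn, then compare against a 2-hour threshold) can be sketched outside the database in Python. This is not DB2 SQL, only an illustration of the steps: the function names, the "ok" and "unparsed" labels, and the reading of "2 hours delay" as "more than 2 hours older than now" are all assumptions.

```python
from datetime import datetime, timedelta

# The three layouts that appear in the question's sample data.
FORMATS = (
    "%m/%d/%Y %H:%M:%S",   # 11/15/2019 11:30:02
    "%m/%d/%Y %I:%M %p",   # 11/15/2019 11:22 AM
    "%Y/%m/%d %I:%M %p",   # 2019/11/15 11:15 AM
)

def parse_mixed(text):
    """Return a datetime for the first matching format, or None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None

def remark(text, now, max_delay=timedelta(hours=2)):
    """Label a raw timestamp string relative to 'now'."""
    stamp = parse_mixed(text)
    if stamp is None:
        return "unparsed"
    return "pending" if now - stamp > max_delay else "ok"

now = datetime(2019, 11, 15, 14, 0)  # fixed 'current time' for the example
for raw in ("11/15/2019 11:30:02", "11/15/2019 01:00 PM", "2019/11/15 11:15 AM"):
    print(raw, "->", remark(raw, now))
```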
I have a column of times expressed as seconds since Jan 1, 1990, that I need to convert to a DateTime. I can figure out how to do this for a constant (e.g. add 10 seconds), but not a series or column.
I eventually tried writing a loop to do this one row at a time. (Probably not the right way, and I'm new to Python.)
This code works for a single row:
from datetime import datetime, timedelta

def addSecs(secs):
    fulldate = datetime(1990, 1, 1)
    fulldate = fulldate + timedelta(seconds=secs)
    return fulldate

b = addSecs(intag112['outTags_1_2'].iloc[1])
print(b)
2018-06-20 01:05:13
Does anyone know an easy way to do this for a whole column in a dataframe?
I tried this:
for i in range(len(intag112)):
    intag112['TransactionTime'].iloc[i] = addSecs(intag112['outTags_1_2'].iloc[i])
but it errored out.
If you want to apply a function to every value in a DataFrame column (a Series), you can use the apply method, for example:
import datetime
# New column 'datetime' is created from old 'seconds'.
# The seconds count from 1990-01-01, so add them to that date
# (datetime.fromtimestamp would count from 1970-01-01 instead).
base = datetime.datetime(1990, 1, 1)
df['datetime'] = df['seconds'].apply(lambda x: base + datetime.timedelta(seconds=float(x)))
Check the documentation for more examples. As general advice, try to think in terms of vectors (or Series) of values: most operations in pandas can be applied to an entire Series or even a whole DataFrame.
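As an illustration of that advice, the whole column can be converted in one call with pd.to_datetime, using its unit and origin parameters. A minimal sketch; the column names and the sample values are assumptions.

```python
import pandas as pd

# Hypothetical sample: seconds elapsed since 1990-01-01.
df = pd.DataFrame({"seconds": [0, 10, 86400]})

# unit='s' says the numbers are seconds; origin sets the starting date.
df["datetime"] = pd.to_datetime(df["seconds"], unit="s", origin="1990-01-01")
print(df)
```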
I have imported an Excel sheet where date1 is 4/1/16, date2 is 5/29/14, and date3 is 5/2/14. However, when I import the sheet into SAS and run PROC PRINT, the first two variables show up as "42461" and "41788", while date3 shows as 05/02/2014.
I need these date formats consistent b/c I am doing a Cox regression with PROC PHREG.
Any thoughts about how to make these dates consistent?
Thanks!
This probably depends on how the data is represented in Excel and how it is imported into SAS. First, are the formats the same in Excel? The first two are being imported as numbers, the third as a string.
In Excel, you can format the column using a date format. Perhaps your import method will recognize this. You can also define another column as a string, using the text(<whatever>, "YYYY-MM-DD") to convert to a string in that format.
Alternatively, you can import them all as numbers and then add the value to 1899-12-30. Excel nominally treats "1" as 1900-01-01, but it also counts a nonexistent 1900-02-29, so for any date after February 1900 the effective base date is 1899-12-30 rather than 1899-12-31.
Because your column had mixed numeric (date) and character values, SAS imported the field as character. So the actual dates got imported as the text version of the number that Excel stores for dates. The ones that look like date strings in SAS are the cells that were strings in Excel too.
Or, if in your case one of the three columns was all valid dates, then SAS imported it as a number and assigned a date format to it, so there is nothing to fix for that column.
The best way to fix it is to make sure that all of the values in the date column are either real dates or empty cells. Then PROC IMPORT will be able to make the right guess at how to import it.
Once you have the strings in SAS and want to fix them, you need to decide which strings look like integers and which should be treated as date strings.
So you might just check whether they contain any non-digit characters and assume those are the date strings rather than numbers. For the ones that look like integers, just adjust the number to account for the fact that Excel numbers dates from 1900 and SAS numbers them from 1960.
data want ;
  set have ;
  if missing(excel_string) then date=.;
  else if notdigit(trim(excel_string)) then date=input(excel_string,anydtdte32.);
  else date=input(excel_string,32.) + '01JAN1900'd -2 ;
  format date yymmdd10. ;
run;
You might wonder why the minus 2? It is because Excel starts from 1 instead of 0 and also because Excel thinks 1900 was a leap year. Here are the Excel date numbers for some key dates and a little SAS program to convert them. Try it.
data excel_dates;
input datestr :$10. excel_num :comma32. #1 sas_num :yymmdd10. ;
diff = sas_num - excel_num ;
format _numeric_ comma14. ;
sasdate1 = excel_num - 21916;
sasdate2 = excel_num + '01JAN1900'd -2 ;
format sasdate: yymmdd10.;
cards;
1900-01-01 1
1900-02-28 59
1900-03-01 61
1960-01-01 21,916
2018-01-01 43,101
;
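The same arithmetic can be checked outside SAS. The Python sketch below uses the key dates from the table above plus the two values from the question; the function name to_date is made up for illustration, and the sketch assumes that non-numeric strings are in MM/DD/YYYY form and that serial numbers are above 60 (below that, Excel's phantom 1900-02-29 shifts the result by one day).

```python
from datetime import date, datetime, timedelta

# Effective base for Excel serial numbers after February 1900
# (day 1 is 1900-01-01, and Excel also counts a nonexistent 1900-02-29).
EXCEL_BASE = date(1899, 12, 30)

def to_date(cell):
    """Convert an imported cell (serial number text or date text) to a date."""
    text = str(cell).strip()
    if not text:
        return None  # empty cell stays missing
    if text.isdigit():
        # All digits: treat as an Excel serial number.
        return EXCEL_BASE + timedelta(days=int(text))
    # Otherwise assume a MM/DD/YYYY date string.
    return datetime.strptime(text, "%m/%d/%Y").date()

for cell in ("42461", "41788", "05/02/2014"):
    print(cell, "->", to_date(cell))
```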