Date Manipulation and Comparisons: Python, Pandas and Excel

I have a datetime column [TRANSFER_DATE] in an Excel sheet whose dates are displayed as
1/4/2019 0:45; when a cell is selected, the value appears as
01/04/2019 00:45:08. I am using a Python script to read this column [TRANSFER_DATE], which shows the datetime as 01/04/2019 00:45:08.
However, when I try to compare the column [TRANSFER_DATE] with another date, I get this error:
ValueError: "Can only use .dt accessor with datetimelike values" while evaluating
implying those values are not actually recognized as datetime values. The failing line is:
mask_part_date = data.loc[data['TRANSFER_DATE'].dt.date.astype(str) == '2019-04-12']

As seen in this question, the Excel import might have silently failed for some of the values in the column. If you check the column type with:
data.dtypes
it might show as object instead of datetime64.
If you force your column to have datetime values, that might solve your issue:
data['TRANSFER_DATE'] = pd.to_datetime(data['TRANSFER_DATE'], errors='coerce')
You will spot the non-converted values as NaT and you can debug those manually.
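Regarding your comparison: once the column holds datetime values, you can compare against a Timestamp directly instead of casting to strings. Because pd.Timestamp('2019-04-12') means midnight of that day, normalize the column first so that rows with a time of day still match:
mask_part_date = data.loc[data['TRANSFER_DATE'].dt.normalize() == pd.Timestamp('2019-04-12')]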
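Putting the steps of this answer together, here is a minimal sketch. The sample frame is hypothetical and only stands in for the Excel import; the "not a date" cell is an assumption that mimics a value Excel stored as text.

```python
import pandas as pd

# Hypothetical sample standing in for the imported Excel column;
# the last cell imitates a value that Excel stored as plain text.
data = pd.DataFrame({
    "TRANSFER_DATE": [
        "2019-04-12 00:45:08",
        "2019-04-12 13:05:00",
        "2019-04-13 09:00:00",
        "not a date",
    ]
})

# Force the column to datetime64; unparseable cells become NaT.
data["TRANSFER_DATE"] = pd.to_datetime(data["TRANSFER_DATE"], errors="coerce")

# Rows that failed to convert and need manual inspection.
bad_rows = data[data["TRANSFER_DATE"].isna()]

# Compare on the calendar day: normalize() resets the time to midnight.
target = pd.Timestamp("2019-04-12")
mask_part_date = data.loc[data["TRANSFER_DATE"].dt.normalize() == target]
```

NaT rows never equal the target, so they drop out of the mask without extra handling.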

Related

Pandas "<=" operator unexpected behaviour for datetime

I have an analytics event table with a column event_datetime.
I want to filter the pandas table down to only 2 days, for example Feb 1st and 2nd:
event_data.loc[(event_data['event_datetime'] >= '2023-02-01') & (event_data['event_datetime'] <= '2023-02-02')]
But this code returns events for 2023-02-01 only, although I wrote less than or equal to '2023-02-02'.
In SQL it works fine. Am I missing something? I did not find anything about it in the pandas docs...
Thanks to hints from @Galo do Leste and @Nick ODell, the problem turned out to be the datetime type of the column.
When the column has a datetime type, the string '2023-02-02' is interpreted as 2023-02-02 00:00:00, so events later on that day fail the <= test and the operator does not behave as I expected.
After casting the column to a date-only string format
event_data['event_datetime'] = event_data['event_datetime'].dt.strftime('%Y-%m-%d')
it works fine.
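As an alternative to casting the column to strings, a half-open range keeps the datetime dtype. This is only a sketch; the sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical events, including some later in the day on Feb 2nd.
event_data = pd.DataFrame({
    "event_datetime": pd.to_datetime([
        "2023-02-01 08:00:00",
        "2023-02-02 00:00:00",
        "2023-02-02 15:30:00",
        "2023-02-03 01:00:00",
    ])
})
col = event_data["event_datetime"]

# '2023-02-02' is read as midnight, so the 15:30 event is excluded.
inclusive_naive = event_data.loc[(col >= "2023-02-01") & (col <= "2023-02-02")]

# Half-open range: everything from Feb 1st up to, but not including, Feb 3rd.
half_open = event_data.loc[(col >= "2023-02-01") & (col < "2023-02-03")]
```

The half-open form returns both days in full while the column stays usable with the .dt accessor.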

Pandas and datetime coercion. Can't convert whole column to Timestamp

So, I have an issue. Pandas keeps telling me that
'datetime.date' is coerced to a datetime.
In the future pandas will not coerce, and a TypeError will be raised. To retain the current behavior, convert the 'datetime.date' to a datetime with 'pd.Timestamp'.
I'd like to get rid of this warning.
Until now I had a dataframe with some data, on which I was doing some filtering and manipulation. At some point I have a column with dates in string format. I don't care about timezones etc.; day accuracy is all that matters. I get the warning mentioned above when I convert the strings to datetime, like below:
df['Some_date'] = pd.to_datetime(df['Some_date'], format='%m/%d/%Y')
So I tried to do something like that:
df['Some_date'] = pd.Timestamp(df['Some_date'])
But it fails as pd.Timestamp doesn't accept Series as an argument.
I'm looking for the quickest way to convert those strings to Timestamp.
=====================================
EDIT
I'm so sorry for the confusion. The warning is raised at another place: it happens when I try to filter my data like this:
df = df[(df['Some_date'] > firstday)]
where firstday is calculated with the datetime module, like here:
import datetime
def get_dates_filter():
    lastday = datetime.date.today().replace(day=1) - datetime.timedelta(days=1)
    firstday = lastday.replace(day=1)
    return firstday, lastday
So the issue is probably that I am comparing two different types of date representation.
Python date objects are still poorly supported in pandas; it is best to work with datetimes whose time component is midnight.
If the column holds Python dates, you can convert them to strings before calling to_datetime:
df['Some_date'] = pd.to_datetime(df['Some_date'].astype(str))
If you need to remove the times from the datetimes in the column, use:
df['Some_date'] = pd.to_datetime(df['Some_date'].astype(str)).dt.floor('d')
Test:
rng = pd.date_range('2017-04-03', periods=3).date
df = pd.DataFrame({'Some_date': rng})
print (df)
Some_date
0 2017-04-03
1 2017-04-04
2 2017-04-05
print (type(df.loc[0, 'Some_date']))
<class 'datetime.date'>
df['Some_date'] = pd.to_datetime(df['Some_date'].astype(str))
print (df)
Some_date
0 2017-04-03
1 2017-04-04
2 2017-04-05
print (type(df.loc[0, 'Some_date']))
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
print (df['Some_date'].dtype)
datetime64[ns]
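The other side of the comparison can be converted as well. A minimal sketch, assuming the question's get_dates_filter helper is changed to return pd.Timestamp objects; the two-row frame is made up for illustration.

```python
import datetime

import pandas as pd


def get_dates_filter():
    # First and last day of the previous month, returned as Timestamps
    # so they compare cleanly with a datetime64 column.
    lastday = datetime.date.today().replace(day=1) - datetime.timedelta(days=1)
    firstday = lastday.replace(day=1)
    return pd.Timestamp(firstday), pd.Timestamp(lastday)


firstday, lastday = get_dates_filter()

# Hypothetical column: one day before and one day after firstday.
df = pd.DataFrame({
    "Some_date": pd.Series([
        firstday - pd.Timedelta(days=1),
        firstday + pd.Timedelta(days=1),
    ])
})

# Timestamp vs datetime64 column: no datetime.date is involved.
filtered = df[df["Some_date"] > firstday]
```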

pandas to_datetime formatting

I am trying to compare a pandas to_datetime object to another to_datetime object. In both locations, I am entering the date as date = pd.to_datetime('2017-01-03'), but when I run a print statement on each, in one case I get 2017-01-03, but in another I get 2017-01-03 00:00:00. This causes a problem because if I use an if statement comparing them such as if date1 == date2: they will not compare as equal, when in reality they are. Is there a format statement that I can use to force the to_datetime() command to yield the 2017-01-03 format?
You can use the date() method to extract just the date from a pandas Timestamp, or the strftime(format) method to convert it into a string in a chosen format.
date = pd.to_datetime('2017-01-03').date()
print(repr(date))
>datetime.date(2017, 1, 3)
or
date = pd.to_datetime('2017-01-03').strftime("%Y-%m-%d")
print(repr(date))
>'2017-01-03'
try .date()
pd.to_datetime('2017-01-03').date()
You can use
pd.to_datetime().date()
For example:
a='2017-12-24 22:44:09'
b='2017-12-24'
if pd.to_datetime(a).date() == pd.to_datetime(b).date():
    print('perfect')
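For reference, a short sketch of how two Timestamp objects compare. It assumes both values really are pandas Timestamps; if one side is a string or a datetime.date, the comparison can behave differently.

```python
import pandas as pd

# Printed differently in some contexts, but the same instant.
date1 = pd.to_datetime("2017-01-03")
date2 = pd.to_datetime("2017-01-03 00:00:00")
same_instant = date1 == date2

# Different times on the same day: compare after dropping the time part.
a = pd.to_datetime("2017-12-24 22:44:09")
b = pd.to_datetime("2017-12-24")
same_day = a.normalize() == b.normalize()
```

Timestamp.normalize() keeps the result a Timestamp, whereas .date() returns a plain datetime.date.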

Pandas: Calling df.loc[] from an index consisting of pd.datetime

Say I have a df as follows:
a=pd.DataFrame([[1,3]]*3,columns=['a','b'],index=['5/4/2017','5/6/2017','5/8/2017'])
a.index=pd.to_datetime(a.index,format='%m/%d/%Y')
The type of the df.index is now
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
When we select a row of data through a datetime index, it is possible to pass a date string instead of a datetime object. In the above case, if I want the row for 5/4/2017, I can simply pass the date string to .loc as follows:
print(a.loc['5/4/2017'])
And we do not need to pass the datetime object:
print(a.loc[pd.datetime(2017,5,4)])
My question is: when selecting data from .loc with a date string, how does pandas know whether my string follows m-d-y, d-m-y or some other order? In the case above, I used a.loc['5/4/2017'] and it succeeded in returning the value. Why doesn't pandas read it as April 5, which is not in this index?
Here's my best shot:
Pandas has an internal function called pandas._guess_datetime_format. This is what gets called when passing the 'infer_datetime_format' argument to pandas.to_datetime. It takes a string and runs through a list of "guess" formats and returns its best guess on how to convert that string to a datetime object.
Referencing a datetime index with a string may use a similar approach.
I did some testing to see what would happen in the case you described - where a dataframe contains both the date 2017-04-05 and 2017-05-04.
In this case, the following:
df.loc['5/4/2017']
Returned the Data for May 4th, 2017
df.loc['4/5/2017']
Returned the data for April 5th, 2017.
Attempting to reference 4/5/2017 in your original dataframe gave an "is not in the [index]" error.
Based on this, my conclusion is that pandas._guess_datetime_format defaults to a "%m/%d/%Y" format in cases where it cannot be distinguished from "%d/%m/%Y". This is the standard date format in the US.
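The month-first default can be checked with pd.to_datetime, which is used here only as a stand-in: whether .loc goes through exactly the same parsing code is an assumption, as the answer above says. The frame is the one from the question.

```python
import pandas as pd

# Frame from the question, indexed by May 4th, 6th and 8th, 2017.
a = pd.DataFrame([[1, 3]] * 3, columns=["a", "b"],
                 index=["5/4/2017", "5/6/2017", "5/8/2017"])
a.index = pd.to_datetime(a.index, format="%m/%d/%Y")

# Ambiguous string: parsed month-first unless dayfirst=True is passed.
month_first = pd.to_datetime("5/4/2017")
day_first = pd.to_datetime("5/4/2017", dayfirst=True)

# An explicit Timestamp avoids the ambiguity in the lookup entirely.
row = a.loc[pd.Timestamp(year=2017, month=5, day=4)]
```

Passing a Timestamp (or an ISO string such as '2017-05-04') to .loc sidesteps the guess altogether.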

SAS import excel date format changes

I need to import an Excel file. It has a few columns, and the 1st column, A, is a date column. Column A has the date format DDMMMYYYY, e.g. '01Jan2017', and in Excel its data type is date. When I import the file into SAS, all the other columns keep their data type (numeric, character, etc.) and value, but column A becomes a number, e.g. '42736' for '01Jan2017'. How do I import the data as it is, without converting the data type to other types?
libname out '/path';
proc import out=out.sas_output_dataset
datafile='/path/excel_file.xlsx'
DBMS=XLSX
REPLACE;
sheet="Sheet1";
run;
It is hard to know without seeing the data. The below is general information, it may not answer your precise problem.
To avoid common errors you should set mixed=yes in your libname statement. You may also want to include the stringdates=yes option.
mixed=yes allows for any out-of-range Excel date values.
stringdates=yes brings all dates into SAS in character format, so you will need to use the input() function to convert them into SAS dates.
Date = input( Date , mmddyy10. );
I would suggest that you import the excel with the import wizard in SAS. Afterwards right-click on the query and extract the code, see here: SAS Import Query DE
In the generated code itself you can format each imported column into the desired format.
For the possible format see: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/leforinforref/n0p2fmevfgj470n17h4k9f27qjag.htm
Hope this helps.
A value of '42736' for '01Jan2017' is an indication that the column in the Excel file has a mix of cells with date values and cells with string values. In that case SAS will make the variable character and store the date values as a digit string that represents the raw number excel uses for the date. To convert '42736' to a date value you need to first convert it to a number and then adjust the number for the difference in the base date used by Excel.
date_value = input(date_string,32.) + '30DEC1899'd ;
To convert the strings that look like '01JAN2017' use the DATE informat instead.
date_value = input(date_string,date11.);
You could add logic to do both to handle a column with mixed values.
date_value = input(date_string,??date11.);
if missing(date_value) then
  date_value = input(date_string,??32.) + '30DEC1899'd;
To have the new variable print the date values in a human readable style attach a date type format to the variable.
format date_value date9. ;
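For readers following along in Python, as in the rest of this thread, the same base-date arithmetic can be sketched as below. The helper name parse_mixed_date is hypothetical, and the month abbreviation parsing assumes an English locale.

```python
from datetime import date, datetime, timedelta

# Excel's day 0 for modern serial numbers, as used in the SAS code above.
EXCEL_BASE = date(1899, 12, 30)


def parse_mixed_date(date_string):
    """Hypothetical helper: accept '01Jan2017' or a raw serial like '42736'."""
    try:
        # Text that looks like DDMMMYYYY.
        return datetime.strptime(date_string, "%d%b%Y").date()
    except ValueError:
        pass
    try:
        # Digit string holding the raw Excel serial number.
        return EXCEL_BASE + timedelta(days=int(date_string))
    except ValueError:
        # Neither form: the equivalent of a missing value in SAS.
        return None


from_text = parse_mixed_date("01Jan2017")
from_serial = parse_mixed_date("42736")
```

Both calls yield the same date, which confirms that 42736 is simply the serial number for 01Jan2017.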
