Pandas: Calling df.loc[] from an index consisting of pd.datetime

Say I have a df as follows:
a=pd.DataFrame([[1,3]]*3,columns=['a','b'],index=['5/4/2017','5/6/2017','5/8/2017'])
a.index=pd.to_datetime(a.index,format='%m/%d/%Y')
The type of df.index is now
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
When selecting a row from an index of this type, it is possible to pass a date string instead of a datetime object. In the above case, if I want the row for 5/4/2017, I can simply pass the date string to .loc as follows:
print(a.loc['5/4/2017'])
And we do not need to input the datetime object
print(a.loc[pd.datetime(2017,5,4)])
My question is, when calling the data from .loc based on string format of date, how does pandas know if my date string format follows m-d-y or d-m-y or other combinations? In this above case, I used a.loc['5/4/2017'] and it succeeds in returning the value. Why wouldn't it think it might mean April 5 which is not within this index?

Here's my best shot:
Pandas has an internal function called pandas._guess_datetime_format. This is what gets called when passing the 'infer_datetime_format' argument to pandas.to_datetime. It takes a string and runs through a list of "guess" formats and returns its best guess on how to convert that string to a datetime object.
Referencing a datetime index with a string may use a similar approach.
I did some testing to see what would happen in the case you described - where a dataframe contains both the date 2017-04-05 and 2017-05-04.
In this case, the following:
df.loc['5/4/2017']
Returned the data for May 4th, 2017.
df.loc['4/5/2017']
Returned the data for April 5th, 2017.
Attempting to reference 4/5/2017 in your original DataFrame gave an "is not in the [index]" error.
Based on this, my conclusion is that pandas._guess_datetime_format defaults to a "%m/%d/%Y" format in cases where it cannot be distinguished from "%d/%m/%Y". This is the standard date format in the US.
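The experiment above can be sketched as follows, assuming a reasonably recent pandas version. The frame and variable names are made up for illustration, and the dayfirst=True call is an addition that was not part of the original test; it shows how the day-first reading can be requested explicitly when converting strings.

```python
import pandas as pd

# Hypothetical frame whose index holds both ambiguous dates.
df = pd.DataFrame(
    {"a": [1, 2], "b": [3, 4]},
    index=pd.to_datetime(["4/5/2017", "5/4/2017"], format="%m/%d/%Y"),
)

# String keys are parsed month-first by default, so these hit different rows.
row_may_4 = df.loc["5/4/2017"]
row_apr_5 = df.loc["4/5/2017"]

# The same default applies when parsing a lone string.
month_first = pd.Timestamp("5/4/2017")

# Day-first has to be requested explicitly during conversion.
day_first = pd.to_datetime("5/4/2017", dayfirst=True)

print(month_first.date(), day_first.date())
```

When the intended order is day-first, converting the key yourself and passing the resulting Timestamp to .loc avoids relying on the default.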

Related

Date Manipulation and Comparisons Python,Pandas and Excel

I have a datetime column [TRANSFER_DATE] in an Excel sheet that shows dates formatted as
1/4/2019 0:45. When this date is selected, it appears as
01/04/2019 00:45:08. I am using a Python script to read this column [TRANSFER_DATE], which shows the datetime as 01/04/2019 00:45:08.
However, when I try to compare the column [TRANSFER_DATE] with another date, I get this error:
ValueError: "Can only use .dt accessor with datetimelike values" while evaluating
implying those values are not actually recognized as datetime values:
mask_part_date = data.loc[data['TRANSFER_DATE'].dt.date.astype(str) == '2019-04-12']
As seen in this question, the Excel import might have silently failed for some of the values in the column. If you check the column type with:
data.dtypes
it might show as object instead of datetime64.
If you force your column to have datetime values, that might solve your issue:
data['TRANSFER_DATE'] = pd.to_datetime(data['TRANSFER_DATE'], errors='coerce')
You will spot the non-converted values as NaT and you can debug those manually.
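Regarding your comparison: once the column holds datetime values, comparing against a Timestamp might be more efficient than converting to strings. Because the values carry a time of day, normalize them to midnight first; otherwise only rows stamped exactly 00:00:00 would match:
mask_part_date = data.loc[data['TRANSFER_DATE'].dt.normalize() == pd.Timestamp('2019-04-12')]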
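Here is a small self-contained sketch of the coerce-then-compare approach. The sample values and the explicit format string are assumptions made for illustration (the real Excel column may need a different format), and the comparison normalizes each value to midnight so that the time of day does not prevent a match.

```python
import pandas as pd

# Hypothetical column with one value that cannot be parsed as a date.
data = pd.DataFrame(
    {"TRANSFER_DATE": ["2019-04-12 00:45:08", "not a date", "2019-04-13 10:00:00"]}
)

# Unparseable entries become NaT instead of raising an error.
converted = pd.to_datetime(
    data["TRANSFER_DATE"], format="%Y-%m-%d %H:%M:%S", errors="coerce"
)

# Rows that failed conversion, for manual debugging.
unparsed = data.loc[converted.isna(), "TRANSFER_DATE"]

# Compare on the calendar date only by resetting the time to midnight.
mask = converted.dt.normalize() == pd.Timestamp("2019-04-12")
matches = data.loc[mask]

print(unparsed.tolist())
print(matches)
```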

Changing format of TODAY() in excel

I'm using TODAY() to acquire today's date and then appending a static value to it using the following:
=TODAY()&"T23:00:00"
Which returns 43202T23:00:00
I really need it in the format 2018-04-12T23:00:00
Any help on this would be great!
There are a couple ways to accomplish this, depending on whether your goal is a formatted String (to display) or a numeric value (such as data type Date) for storing or using with calculations.
If you want a formatted date/time result (to display to the user)...
Use the TEXT worksheet function:
=TEXT(TODAY(),"yyyy-mm-dd")&"T23:00:00"
This works because TODAY() returns a Date data type, which is basically just a number representing the date/time (where 1 = midnight on January 1, 1900, 2 = midnight on January 2, 1900, 2.5 = noon on January 2, 1900, etc.).
You can convert the date type to a String (text) with the TEXT function, in whatever format you like. The example above will display today's date as 2018-04-12.
If, for example, you wanted the date portion of the string displayed as April 12, 2018 then you would instead use:
TEXT(TODAY(),"mmmm d, yyyy")
Note that the TEXT worksheet function (and VBA's Format function) always return Strings, ready to be concatenated with the rest of the String that you're trying to add ("T23:00:00").
If you want to use the result in calculations...
If you instead want the result to be in a Date type, then instead of concatenating a string (produced by the TEXT function) to a string (from "T23:00:00"), you could instead add a date to a date:
=TODAY()+TIME(23,0,0)
or
=TODAY()+TIMEVALUE("23:00")
...and then you can format it as you like to show or hide Y/M/D/H/M/S as necessary with Number Formats (shortcut: Ctrl+1).
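To illustrate the serial-number idea outside Excel, here is a rough Python sketch that turns the 43202 from the question into the requested string. The function name and the epoch constant are my own; using December 30, 1899 as the base is the usual way to compensate for Excel treating 1900 as a leap year, so the sketch is only meant for serials from March 1900 onward.

```python
from datetime import datetime, timedelta

# Assumed base date: makes serial 1 line up with January 1, 1900 only
# approximately, but is correct for every serial from March 1, 1900 onward.
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial: int) -> datetime:
    """Convert a whole-day Excel serial number to a datetime at midnight."""
    return EXCEL_EPOCH + timedelta(days=serial)


# Counterpart of =TODAY()+TIME(23,0,0) for the serial shown in the question.
stamp = excel_serial_to_datetime(43202) + timedelta(hours=23)

# Counterpart of applying a "yyyy-mm-ddThh:mm:ss" style format.
formatted = stamp.strftime("%Y-%m-%dT%H:%M:%S")
print(formatted)
```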
More Information:
MSDN : TEXT Function (Excel)
MSDN : TIMEVALUE Function (Excel)
MSDN : TIME Function (Excel)

SAS import excel date format changes

I need to import an excel, the excel has a few columns and the 1st column A is a date column. Column A has the date format DDMMMYYYY e.g. '01Jan2017' and in excel the data type is date type. But when I import it to SAS, all the other columns remain the same data type (numeric, character, etc.) and value. But column A becomes a number e.g. ('42736' for '01Jan2017'). How do I import the data as it is and without converting the data type to other types?
libname out '/path';
proc import out=out.sas_output_dataset
datafile='/path/excel_file.xlsx'
DBMS=XLSX
REPLACE;
sheet="Sheet1";
run;
It is hard to know without seeing the data. The below is general information, it may not answer your precise problem.
To avoid common errors you should set mixed=yes in your libname. You may also want to include the stringdates=yes option.
The mixed=yes allows for any out of range excel date values.
stringdates=yes brings all dates into SAS in character format, so you will need to use the input() function to convert this into a SAS date.
Date = input( Date , mmddyy10. );
I would suggest that you import the excel with the import wizard in SAS. Afterwards right-click on the query and extract the code, see here: SAS Import Query DE
In the generated code itself you can format each imported column into the desired format.
For the possible format see: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/leforinforref/n0p2fmevfgj470n17h4k9f27qjag.htm
Hope this helps.
A value of '42736' for '01Jan2017' is an indication that the column in the Excel file has a mix of cells with date values and cells with string values. In that case SAS will make the variable character and store the date values as a digit string that represents the raw number excel uses for the date. To convert '42736' to a date value you need to first convert it to a number and then adjust the number for the difference in the base date used by Excel.
date_value = input(date_string,32.) + '30DEC1899'd ;
To convert the strings that look like '01JAN2017' use the DATE informat instead.
date_value = input(date_string,date11.);
You could add logic to do both to handle a column with mixed values.
date_value = input(date_string,??date11.);
if missing(date_value) then
  date_value = input(date_string,??32.) + '30DEC1899'd;
To have the new variable print the date values in a human readable style attach a date type format to the variable.
format date_value date9. ;
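For comparison, the same two-step fallback can be sketched in Python. This is not SAS code and not part of the answer above; the function name is hypothetical, and it assumes the column arrives as strings that are either DDMMMYYYY text or a digit string holding the raw Excel serial number.

```python
from datetime import date, datetime, timedelta

# Excel's effective base date for serials from March 1900 onward.
EXCEL_EPOCH = date(1899, 12, 30)


def parse_mixed_date(text):
    """Return a date for '01JAN2017'-style text or an Excel serial string.

    Returns None when neither interpretation applies.
    """
    text = text.strip()
    try:
        # First attempt: day, abbreviated month name, four-digit year.
        return datetime.strptime(text, "%d%b%Y").date()
    except ValueError:
        pass
    try:
        # Fallback: treat the digits as days since the Excel base date.
        return EXCEL_EPOCH + timedelta(days=int(text))
    except (ValueError, OverflowError):
        return None


print(parse_mixed_date("42736"), parse_mixed_date("01JAN2017"))
```

Both calls print 2017-01-01, which matches the '42736' for '01Jan2017' pairing from the question.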

Oracle: Assistance in breaking apart a complex date string

I have a complex date string being read from a csv file. The format cannot be processed by Oracle's TO_DATE function. I am looking for an efficient method to break this string apart and return a date object to insert into a DATE column. The suggested option of using TO_DATE with 'DD-MON-YY HH.MI.SS AM' does not work. No variation of this will break up this particular string, hence the need for a custom function. I have also tried the 'HH.MI.SS.SSSSS AM' format, which also does not work. I have found that if I drop the fractional seconds, it will work. If I run a regex to drop that portion, it should convert as expected.
The string is formatted as: 21-OCT-04 01.03.23.966000 PM
My initial thought is to break up by space first, resulting in three sub strings.
Then break the first substring by - and the second by ., and load the resulting pieces into a DATE object directly.
Is there a better method I could use?
Thank you, Allan
Use what you have, which is a timestamp literal, to create a timestamp, and then cast it "as date":
select
cast(to_timestamp('21-OCT-04 01.03.23.966000 PM', 'dd-MON-rr hh.mi.ss.ff AM') as date) dt
from dual;
DT
----------------------
2004/10/21 01:03:23 PM
(The output format depends on my specific session NLS_DATE_FORMAT, which I actually changed for this illustration to 'yyyy/mm/dd hh:mi:ss AM'.)
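If the csv were pre-processed in Python before loading, the same string could be parsed along these lines. This is only a sketch under that assumption and not an Oracle feature; the format string is my own mapping of the Oracle mask onto strptime directives, and it relies on an English locale for the month abbreviation and the AM/PM marker.

```python
from datetime import datetime

raw = "21-OCT-04 01.03.23.966000 PM"

# %y maps the two-digit year 04 to 2004; %f consumes the fractional seconds.
parsed = datetime.strptime(raw, "%d-%b-%y %I.%M.%S.%f %p")

# Drop the fractional part, as the DATE value in the output above has none.
as_date = parsed.replace(microsecond=0)

print(as_date.strftime("%Y/%m/%d %I:%M:%S %p"))
```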

Excel C API equivalent of Interop Range.Value in C#

Trying to figure out how to read the contents of a reference and get the same results as Interop/COM's Range.Value...namely that the object[,] returned contains string, DateTime, and doubles.
I'm using ExcelDNA (and underlying XlCall.Excel to call C API) and both...
ExcelReference.GetValue() and
XlCall.Excel( XlCall.xlfDeref, reference )
Both return an object[,] that is equivalent to Interop/COM's Range.Value2...namely that the object[,] returned contains only string and doubles.
The problem with this is that Dates are returned as double and I have no way of determining if the value should be a double or a DateTime.
ExcelReference.GetValue() will never return a DateTime, since that's never the stored value of a cell - it is just a display format applied to a numeric (double) value. It is similar for currency and percentage formatting.
You can read the "Contents of the cell as it is currently displayed, as text, including any additional numbers or symbols resulting from the cell's formatting." using the xlfGetCell call with the C API, using option 53. However, you then have to figure out whether the string represents a date/time yourself.
One could also read the "Number format of the cell, as text (for example, "m/d/yy" or "General")." using xlfGetCell option 7.
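Once the number format text is available, deciding whether a double should be treated as a date comes down to inspecting that string. The following is a language-neutral heuristic sketched in Python rather than C#; the function name is hypothetical, it is not part of ExcelDNA or the C API, and it will not cover every custom or localized format.

```python
import re


def looks_like_date_format(number_format: str) -> bool:
    """Guess whether an Excel number format string describes a date or time."""
    cleaned = re.sub(r'"[^"]*"', "", number_format)  # quoted literal text
    cleaned = re.sub(r"\\.", "", cleaned)            # backslash-escaped chars
    cleaned = re.sub(r"\[[^\]]*\]", "", cleaned)     # [Red], [$-409], [h] ...
    # Any remaining day, month, year, hour or second token suggests a date.
    return any(ch in "dmyhs" for ch in cleaned.lower())


for fmt in ("m/d/yy", "General", "[Red]0.00", "[$-409]d-mmm-yy"):
    print(fmt, looks_like_date_format(fmt))
```

A format such as "[h]:mm" still counts as date/time here because the minutes token remains after the bracketed section is stripped.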
