How to convert an Excel date to a numeric value using Python

How do I convert an Excel date to a number in Python? I'm importing a number of Excel files into a pandas DataFrame in a loop, and some values are formatted incorrectly in Excel. For example, a numeric column is imported as dates, and I'm trying to convert each date value back into its number.
Original             New
1912-04-26 00:00:00  4500
How do I convert the date value in original to the numeric value in new? I know this code can convert numeric to date, but is there any similar function that does the opposite?
df.loc[0]['Date']= xlrd.xldate_as_datetime(df.loc[0]['Date'], 0)
I tried specifying the data type when reading the files, and I also tried simply changing the data type of the column to 'float', but neither worked.
Thank you.

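I found that the number is Excel's serial date: the number of days counted from day 0, which Excel displays as 1900-01-00. Because Excel wrongly treats 1900 as a leap year, the effective base date for anything on or after 1900-03-01 is 1899-12-30.
The following code calculates how many days have passed from that base date (1900-01-01 minus two days) until the given date.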
import pandas as pd
from datetime import datetime, timedelta
df = pd.DataFrame(
    {
        'date': ['1912-04-26 00:00:00'],
    }
)
print(df)
#                   date
# 0  1912-04-26 00:00:00
def date_to_int(given_date):
    given_date = datetime.strptime(given_date, '%Y-%m-%d %H:%M:%S')
    base_date = datetime(1900, 1, 1) - timedelta(days=2)
    delta = given_date - base_date
    return delta.days
df['date'] = df['date'].apply(date_to_int)
print(df)
#    date
# 0  4500
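For a whole column, the same day count can probably be computed without calling a Python function per row. This is only a minimal sketch, assuming the column holds date strings (or datetimes) that pandas can parse and that the workbook uses the 1900 date system; the names EXCEL_EPOCH and serial are illustrative and do not come from the answer above.

```python
import pandas as pd

# Effective Excel epoch for dates on or after 1900-03-01
# (assumption: the workbook uses the 1900 date system).
EXCEL_EPOCH = pd.Timestamp("1899-12-30")

df = pd.DataFrame({"date": ["1912-04-26 00:00:00", "2023-01-22 00:00:00"]})

# Parse the strings, subtract the epoch, and keep the whole-day part.
df["serial"] = (pd.to_datetime(df["date"]) - EXCEL_EPOCH).dt.days

print(df)
```

Subtracting a Timestamp from a datetime column yields timedeltas, so .dt.days gives the integer serial number directly.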

Related

Change number to datetime for whole column in dataframe (ddf) in Pandas

I have an Excel .xlsb sheet with data. Some columns should contain numbers and other columns should contain dates. After loading the data in Python, some of the date columns contain a number instead of a date. How can I convert the numbers in such a column to dates?
I use pandas, and my dataframe is called ddf.
The date of birth column ('dob_l1') shows '12150', which should be the date '6-4-1933'.
I tried to solve this, but I only managed to get the date '2050-01-12', which is incorrect.
I used the code 'ddf['nwdob_l1'] = pd.to_datetime(ddf['dob_l1'], format='%d%m%y', errors='coerce')'
Who can help me? I was happy to receive some good feedback from joe90. He showed me a function that works for single dates:
import datetime
def xldate2date(xl):
    # valid for dates from 1900-03-01
    basedate = datetime.date(1899, 12, 30)
    d = basedate + datetime.timedelta(days=xl)
    return d
# Example:
# >>> print(xldate2date(44948))
# 2023-01-22
That is correct. However, I need to change all values in the column (more than 500,000 of them), so I cannot do that one by one.
As that question is closed, I hereby open a new question.
Is there anyone who can help me to find the correct code to get the right date in the whole column?
pandas has tools for handling dates when you read the data in. You want the parse_dates argument; see the documentation for read_excel.
Example:
import pandas as pd
df = pd.read_excel('file/path/the.xlsx', parse_dates=['Date'])
This reads the column as datetime64, which is easier to work with than a raw number.
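If the column has already been loaded as Excel serial numbers, the whole column can likely be converted in one call with the unit and origin arguments of pd.to_datetime. This is a sketch rather than a tested fix for the asker's file: it assumes a plain pandas DataFrame, integer serials from the 1900 date system, and dates on or after 1900-03-01. The sample values are the ones quoted in the question.

```python
import pandas as pd

# Stand-in for the asker's dataframe; only the column name is taken from the question.
ddf = pd.DataFrame({"dob_l1": [12150, 44948]})

# Interpret each number as a count of days since the effective Excel epoch.
ddf["nwdob_l1"] = pd.to_datetime(
    ddf["dob_l1"], unit="D", origin=pd.Timestamp("1899-12-30")
)

print(ddf)
```

This is the vectorized equivalent of the xldate2date helper, so it avoids converting 500,000 values one at a time.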

Sort pandas dataframe by date in day/month/year format

I am trying to parse data from a csv file, sort them by date and write the sorted dataframe in a new csv file.
Say we have a very simple csv file with date entries following the pattern day/month/year:
Date,Reference
15/11/2020,'001'
02/11/2020,'002'
10/11/2020,'003'
26/11/2020,'004'
23/10/2020,'005'
I read the csv into a Pandas dataframe. When I attempt to order the dataframe based on the dates in ascending order I expect the data to be ordered as follows:
23/10/2020,'005'
02/11/2020,'002'
10/11/2020,'003'
15/11/2020,'001'
26/11/2020,'004'
Sadly, this is not what I get.
If I attempt to convert the date to datetime and then sort, then some date entries are converted to the month/day/year (e.g. 2020-10-23 instead of 2020-23-10) which messes up the ordering:
date reference
2020-02-11 '002'
2020-10-11 '003'
2020-10-23 '005'
2020-11-15 '001'
2020-11-26 '004'
If I sort without converting to datetime, then the ordering is also wrong:
date reference
02/11/2020 '002'
10/11/2020 '003'
15/11/2020 '001'
23/10/2020 '005'
26/11/2020 '004'
Here is my code:
import pandas as pd
df = pd.read_csv('order_dates.csv',
                 header=0,
                 names=['date', 'reference'],
                 dayfirst=True)
df.reset_index(drop=True, inplace=True)
# df.date = pd.to_datetime(df.date)
df.sort_values(by='date', ascending=True, inplace=True)
print(df)
df.to_csv('sorted.csv')
Why is sorting by date so hard? Can someone explain why the above sorting attempts fail?
Ideally, I would like the sorted.csv to have the date entries in the day/month/year format.
Try giving to_datetime the exact format, so the day is never mistaken for the month:
df.loc[:, 'date'] = pd.to_datetime(df.loc[:, 'date'], format='%d/%m/%Y')
What you can do is tell pandas that the day comes first while reading the csv file. Note that infer_datetime_format is a boolean switch and does not accept a format string, so use dayfirst=True together with parse_dates:
>>> df = pd.read_csv('filename.csv', parse_dates=['Date'], dayfirst=True).sort_values(by='Date')
This will read your dates from csv and give you this output where dates are sorted.
        Date Reference
4 2020-10-23     '005'
1 2020-11-02     '002'
2 2020-11-10     '003'
0 2020-11-15     '001'
3 2020-11-26     '004'
What's left now is to simply change the formatting to the desired one
>>> df['Date'] = df['Date'].dt.strftime('%d/%m/%Y')
Keep in mind however that this will change the Date back to string (object)
>>> df
        Date Reference
4 23/10/2020     '005'
1 02/11/2020     '002'
2 10/11/2020     '003'
0 15/11/2020     '001'
3 26/11/2020     '004'
>>> df.dtypes
Date object
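As for why the original attempts fail: sorting the raw strings is lexicographic, so '02/11/2020' sorts before '23/10/2020', and parsing without a day-first hint lets ambiguous values such as 02/11/2020 be read as month/day. The sketch below puts the pieces together: parse with an explicit format, sort, and write the dates back out as day/month/year. It assumes the sample data from the question; io.StringIO stands in for the real file and the variable names are illustrative.

```python
import io
import pandas as pd

# Sample data from the question; StringIO stands in for 'order_dates.csv'.
csv_text = """Date,Reference
15/11/2020,'001'
02/11/2020,'002'
10/11/2020,'003'
26/11/2020,'004'
23/10/2020,'005'
"""

df = pd.read_csv(io.StringIO(csv_text))

# An explicit format removes any day/month ambiguity.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
df = df.sort_values(by="Date")

# date_format controls how datetime columns are written, so the column
# can stay datetime64 in memory while the file keeps day/month/year.
sorted_csv = df.to_csv(index=False, date_format="%d/%m/%Y")
print(sorted_csv)
```

Passing a file path as the first argument of to_csv would write the same text to disk instead of returning it as a string.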

How to convert a column of data in a DataFrame filled with string representation of non-uniformed date formats to datetime?

Let's say:
>>> print(df)
location date
paris 23/02/2010
chicago 3-23-2013
...
new york 04-23-2013
helsinki 13/10/2015
Currently, df["date"] is in str. I want to convert the date column to datetime using
>>> df["date"] = pd.to_datetime(df["date"])
I would get ValueError due to ParserError. This is because the format of the date is inconsistent (i.e. dd/mm/yyyy, then next one is m/dd/yyyy).
If I were to write the code below, it still wouldn't work, because the dates are not uniform and the delimiters differ:
>>> df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
The last option that I could think of was to write the code below, which replaces all of the dates that are not formatted like the first date to NaT:
>>> df["date"] = pd.to_datetime(df["date"], errors="coerce")
How do I convert the whole date column to datetime while having the dates not uniform in terms of the delimiters, and the orders of days, months and years?
Use the apply method of pandas, so that each value is parsed on its own rather than against one format for the whole column:
df['date'] = df.apply(lambda x: pd.to_datetime(x['date']), axis=1)
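An alternative that stays vectorized is to parse the column once per known format and combine the results. This is only a sketch and rests on an assumption taken from the sample rows: slash-delimited values are day/month/year and dash-delimited values are month-day-year. Values matching neither format would remain NaT.

```python
import pandas as pd

df = pd.DataFrame({
    "location": ["paris", "chicago", "new york", "helsinki"],
    "date": ["23/02/2010", "3-23-2013", "04-23-2013", "13/10/2015"],
})

# Each pass parses only the rows that match its format; the rest become NaT.
day_first = pd.to_datetime(df["date"], format="%d/%m/%Y", errors="coerce")
month_first = pd.to_datetime(df["date"], format="%m-%d-%Y", errors="coerce")

# Fill the gaps left by the first pass with the results of the second.
df["date"] = day_first.fillna(month_first)

print(df)
```

Stating the formats explicitly avoids guessing whether a value such as 04-05-2013 is day-first or month-first, at the cost of having to list every format that occurs in the data.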

How to convert date column to into three columns of date, month and year format in pandas?

I want to convert the given date column into date, month and year columns. Initially there are 2 columns; after the conversion there would be 4 columns, like
Country|Date|Month|Year
The given data frame is of the type
test=pd.DataFrame({'Date':['2014,1,1','2014,4,17'],'Country':['Denmark','Australia']})
Pandas has a to_datetime function.
import pandas as pd
df = pd.DataFrame({'Date':['2014,1,1','2014,4,17']})
df["Date"] = pd.to_datetime(df["Date"], format="%Y,%m,%d")
# If you want to save other datetime attributes as their own columns,
# just pull them out and assign them to new columns
# df["Month"] = df["Date"].dt.month
# df["Year"] = df["Date"].dt.year
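Putting this together for the frame in the question might look like the sketch below. It assumes that the 'Date' column of the desired Country|Date|Month|Year layout is meant to hold the day of the month; the names parsed and result are illustrative.

```python
import pandas as pd

test = pd.DataFrame({
    "Date": ["2014,1,1", "2014,4,17"],
    "Country": ["Denmark", "Australia"],
})

# Parse the comma-separated year,month,day strings once.
parsed = pd.to_datetime(test["Date"], format="%Y,%m,%d")

# Build the four-column layout from the .dt accessor attributes.
result = pd.DataFrame({
    "Country": test["Country"],
    "Date": parsed.dt.day,   # assumption: 'Date' means day of month here
    "Month": parsed.dt.month,
    "Year": parsed.dt.year,
})

print(result)
```

If the full date should be kept instead, assigning parsed itself to the 'Date' column would do that.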

Construct a DateTime index from multiple columns of a dataframe

I am parsing a dataframe from a sas7bdat file and I want to convert the index into datetime to resample the data.
I have one column with the Date which is type String and another column of the time which is of type datetime.time. Does anybody know how to convert this to one column of datetime?
I already tried pd.to_datetime like this, but it requires individual columns for year, month and day:
df['TimeIn']=str(df['TimeIn'])
df['datetime']=pd.to_datetime(df[['Date', 'TimeIn']], dayfirst=True)
This gives me a value error:
ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing
If you convert both the date and time column to str then you can concatenate them and then call to_datetime:
In[155]:
df = pd.DataFrame({'Date':['08/05/2018'], 'TimeIn':['10:32:12']})
df
Out[155]:
Date TimeIn
0 08/05/2018 10:32:12
In[156]:
df['new_date'] = pd.to_datetime(df['Date']+' '+df['TimeIn'])
df
Out[156]:
Date TimeIn new_date
0 08/05/2018 10:32:12 2018-08-05 10:32:12
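Note that the output above reads 08/05/2018 as August 5th. For the situation in the question (a string date column, a datetime.time column, day-first dates, and resampling afterwards) a sketch could look like this. The sample values, the 'value' column and the names stamps and daily are made up for illustration, and the explicit format assumes day/month/year dates with times that have no fractional seconds.

```python
import datetime
import pandas as pd

df = pd.DataFrame({
    "Date": ["08/05/2018", "09/05/2018"],
    "TimeIn": [datetime.time(10, 32, 12), datetime.time(7, 5, 0)],
    "value": [1.0, 2.0],  # hypothetical measurement column
})

# Convert the time objects to strings and join them to the date strings.
combined = df["Date"] + " " + df["TimeIn"].astype(str)

# An explicit day-first format avoids the month/day mix-up.
stamps = pd.to_datetime(combined, format="%d/%m/%Y %H:%M:%S")
df = df.set_index(pd.DatetimeIndex(stamps, name="datetime"))

# With a DatetimeIndex in place, resampling works as usual.
daily = df["value"].resample("D").sum()
print(daily)
```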
