Number to Date Conversion using Pandas in Python?

When I convert a number to a date in Pandas, I don't get the same result that Excel gives me.
I need to convert a number to a date and get the same result as in Excel.
For example, Excel gives the following for this number:
Input - 42970.73819
Output- 8/23/2017 17:43
I tried the date conversion in Pandas, but the result does not match Excel's.
Thank you
Madan

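I think you need to convert an Excel serial date. Subtract 25569 (the number of days between Excel's epoch, 1899-12-30, and the Unix epoch, 1970-01-01), then convert the days to seconds: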
df = pd.DataFrame({'date':[42970.73819,42970.73819]})
print (df)
date
0 42970.73819
1 42970.73819
df = pd.to_datetime((df['date'] - 25569) * 86400.0, unit='s')
print (df)
0 2017-08-23 17:42:59.616
1 2017-08-23 17:42:59.616
Name: date, dtype: datetime64[ns]
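As an alternative sketch, assuming pandas 0.20 or later, pd.to_datetime accepts an origin argument, so the Excel epoch can be passed directly instead of subtracting 25569 by hand. The column name date follows the example above, and 1899-12-30 is assumed to be the right origin, which holds for serial numbers after February 1900. Excel shows only minutes, so the result is rounded to the nearest minute for comparison.

```python
import pandas as pd

df = pd.DataFrame({"date": [42970.73819, 42970.73819]})

# Interpret the numbers as days counted from Excel's effective epoch.
converted = pd.to_datetime(df["date"], unit="D", origin="1899-12-30")

# Round to the nearest minute to match what Excel displays (8/23/2017 17:43).
rounded = converted.dt.round("min")
print(rounded)
```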

Related

convert datetime to date python --> error: unhashable type: 'numpy.ndarray'

By default, Pandas represents dates as datetime64[ns], so my columns look like 2016-02-05 00:00:00, but I only want the date, 2016-02-05. I applied this code to a few columns:
df3a['MA'] = pd.to_datetime(df3a['MA'])
df3a['BA'] = pd.to_datetime(df3a['BA'])
df3a['FF'] = pd.to_datetime(df3a['FF'])
df3a['JJ'] = pd.to_datetime(df3a['JJ'])
.....
but it gives me this error: TypeError: type unhashable: 'numpy.ndarray'
My question is: why do I get this error, and how do I convert datetime to date for multiple columns (around 50)?
I would be grateful for your help.
One way to achieve what you'd like is with a DatetimeIndex. I've first created an Example DataFrame with 'date' and 'values' columns and tried from there on to reproduce the error you've got.
import pandas as pd
import numpy as np
# Example DataFrame with a DatetimeIndex (dti)
dti = pd.date_range('2020-12-01','2020-12-17') # dates from first of december up to date
values = np.random.choice(range(1, 101), len(dti)) # random values between 1 and 100
df = pd.DataFrame({'date':dti,'values':values}, index=range(len(dti)))
print(df.head())
>>> date values
0 2020-12-01 85
1 2020-12-02 100
2 2020-12-03 96
3 2020-12-04 40
4 2020-12-05 27
In the example, the 'date' column already shows only the dates without the time. Pandas omits the time part when displaying a datetime column whose values are all at midnight; the underlying dtype is still datetime64[ns].
I haven't tested it, but this might work for you:
# Your dataframe
df3a['MA'] = pd.DatetimeIndex(df3a['MA'])
...
# automated transform for all columns (if all columns are datetimes!)
for label in df3a.columns:
    df3a[label] = pd.DatetimeIndex(df3a[label])
Use DataFrame.apply:
cols = ['MA', 'BA', 'FF', 'JJ']
df3a[cols] = df3a[cols].apply(pd.to_datetime)
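The TypeError from the question cannot be reproduced from the snippet alone, so the following is only a sketch of the date-only part, assuming the listed columns hold parseable date strings. The sample frame and its values are made up for illustration. It contrasts two options: .dt.normalize() keeps the datetime64 dtype with the time set to midnight, while .dt.date yields plain datetime.date objects in an object column.

```python
import datetime
import pandas as pd

# Hypothetical stand-in for df3a; the real frame has around 50 such columns.
df3a = pd.DataFrame({
    "MA": ["2016-02-05 10:30:00", "2016-02-06 08:00:00"],
    "BA": ["2016-03-01 23:59:59", "2016-03-02 00:00:00"],
})
cols = ["MA", "BA"]

# Parse every listed column in one pass.
parsed = df3a[cols].apply(pd.to_datetime)

# Option 1: stay in datetime64, with the time of day reset to midnight.
midnight = parsed.apply(lambda s: s.dt.normalize())

# Option 2: plain Python date objects (the columns become object dtype).
date_only = parsed.apply(lambda s: s.dt.date)

print(midnight)
print(date_only)
```

Option 1 is usually the better fit if the columns are used in further date arithmetic, since object columns lose the vectorized datetime operations.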

Is there any method in pandas to convert a dataframe from day to default d/m/y format?

I would like to convert every day in the DataFrame into day/Feb/2020 format.
Here the Date field contains only the day number.
I want to convert the Date field from that form into a full date.
My current approach is:
import datetime
y=[]
for day in planned_ds.Date:
    x=datetime.datetime(2020, 5, day)
    print(x)
Is there an easy way to convert the whole day column to d/m/y format?
One way, assuming you have data like
df = pd.DataFrame([1,2,3,4,5], columns=["date"])
is to convert them to dates and then shift them to start where you need them to (pd.datetime was removed in pandas 2.0, so pd.Timestamp is used here):
pd.to_datetime(df["date"], unit="D") - pd.Timestamp(1970,1,1) + pd.Timestamp(2020,1,31)
this results in
0 2020-02-01
1 2020-02-02
2 2020-02-03
3 2020-02-04
4 2020-02-05
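The same shift can be sketched a little more directly with a timedelta, assuming the column holds day-of-month integers and the target month is February 2020. The names start, dates and formatted are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4, 5], columns=["date"])

# Day 1 maps to the first of the month; later days are offsets from it.
start = pd.Timestamp("2020-02-01")
dates = pd.to_timedelta(df["date"] - 1, unit="D") + start

# Optional: render the result as day/month/year strings.
formatted = dates.dt.strftime("%d/%m/%Y")
print(formatted)
```

Note that formatted is a column of strings; keep dates if further date arithmetic is needed.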

Issue while converting string to datetime in pandas

I have a dataframe like this:
Input
Date
2020-12-21
2019-09-30
2019-12-04
I want to convert it to this specific datetime format.
Expected Format
Date
2020-12-21T00:00:00Z
2019-09-30T00:00:00Z
2019-12-04T00:00:00Z
My current code
df.loc[:,'Date'] = pd.to_datetime(df.loc[:,'Date'])
It's not working correctly. How can this be fixed?
I'm not sure there's a shortcut for the ISO time format. Here's a workaround:
pd.to_datetime(df['Date']).dt.strftime("%Y-%m-%dT%H:%M:%SZ")
Output:
0 2020-12-21T00:00:00Z
1 2019-09-30T00:00:00Z
2 2019-12-04T00:00:00Z
Name: Date, dtype: object
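A trailing Z conventionally means UTC, while the strftime call above simply appends the letter to naive timestamps. As a hedged variant, assuming the dates are meant to be UTC midnight, the column can be localized first so the data and the suffix agree. The sample frame reuses the dates from the question.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2020-12-21", "2019-09-30", "2019-12-04"]})

# Assumption: the naive dates represent midnight in UTC.
utc = pd.to_datetime(df["Date"]).dt.tz_localize("UTC")

# Format as ISO 8601 with a literal Z suffix.
iso = utc.dt.strftime("%Y-%m-%dT%H:%M:%SZ")
print(iso)
```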

Date Format changing automatically in pandas data frame [duplicate]

I'm learning Python (3.6 with Anaconda) for my studies.
I'm using pandas to import an xls file with 2 columns: Date (dd-mm-yyyy) and price.
But pandas changes the date format:
xls_file = pd.read_excel('myfile.xls')
print(xls_file.iloc[0, 0])
I'm getting:
2010-01-04 00:00:00
instead of:
04-01-2010 or at least: 2010-01-04
I don't know why hh:mm:ss is added; I get the same result for each row of the Date column. I also tried different things using to_datetime, but that didn't fix it.
Any idea?
Thanks
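What you need is to define the format in which the datetime values get printed. The values are stored as datetimes, so the time part is always there; only the display changes. There might be a more elegant way to do it, but something like this will work: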
In [11]: df
Out[11]:
id date
0 1 2017-09-12
1 2 2017-10-20
# Specifying the format
In [16]: print(df.iloc[0,1].strftime("%Y-%m-%d"))
2017-09-12
If you want to store the dates as strings in your specific format, you can format the whole column with the .dt accessor:
In [17]: df["datestr"] = df["date"].dt.strftime("%Y-%m-%d")
In [18]: df
Out[18]:
id date datestr
0 1 2017-09-12 2017-09-12
1 2 2017-10-20 2017-10-20
In [19]: df.dtypes
Out[19]:
id int64
date datetime64[ns]
datestr object
dtype: object
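For the dd-mm-yyyy layout asked about in the question, a minimal self-contained sketch follows. The frame is made up for illustration and stands in for the result of read_excel; the datetime column is left untouched and a separate string column carries the display format.

```python
import pandas as pd

# Hypothetical data standing in for the imported xls file.
df = pd.DataFrame({
    "id": [1, 2],
    "date": pd.to_datetime(["2010-01-04", "2017-10-20"]),
})

# Keep the datetime column for calculations; add a string column for display.
df["datestr"] = df["date"].dt.strftime("%d-%m-%Y")
print(df)
```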

Efficient way of converting String column to Date in Pandas (in Python), but without Timestamp

I have a DataFrame which contains two string columns, df['month'] and df['year']. I want to create a new column df['date'] by combining the month and year columns. I have done that successfully using the code below:
df['date']=pd.to_datetime((df['month']+df['year']),format='%m%Y')
where for df['month'] = '08' and df['year']='1968'
we get df['date']=1968-08-01
This is exactly what I wanted.
Problem at hand: My DataFrame has more than 200,000 rows, and I notice that for a few rows I also get a timestamp like the one below, which I want to avoid:
1972-03-01 00:00:00
I solved this issue by using the .dt accessor, which can be used to manipulate the Series, explicitly extracting only the date with the code below:
df['date']=pd.to_datetime((df['month']+df['year']),format='%m%Y') #Line 1
df['date']=df['date'].dt.date #Line 2
The problem was solved, but Line 2 took 5 times longer than Line 1.
Question: Is there any way to tweak Line 1 so that it gives just the dates and not the timestamp? I am sure this simple problem cannot have such an inefficient solution. Can I solve this in a more time- and resource-efficient manner?
AFAIK there is no date dtype in Pandas, only datetime, so there will always be a time part.
Even though Pandas shows 1968-08-01, the value has a time part: 00:00:00.
Demo:
In [32]: df = pd.DataFrame(pd.to_datetime(['1968-08-01', '2017-08-01']), columns=['Date'])
In [33]: df
Out[33]:
Date
0 1968-08-01
1 2017-08-01
In [34]: df['Date'].dt.time
Out[34]:
0 00:00:00
1 00:00:00
Name: Date, dtype: object
And if you want to have a string representation, there is a faster way:
df['date'] = df['year'].astype(str) + '-' + df['month'].astype(str) + '-01'
UPDATE: be aware that .dt.date does not give you a datetime dtype; it returns Python datetime.date objects stored in an object column:
In [53]: df.dtypes
Out[53]:
Date datetime64[ns]
dtype: object
In [54]: df['new'] = df['Date'].dt.date
In [55]: df
Out[55]:
Date new
0 1968-08-01 1968-08-01
1 2017-08-01 2017-08-01
In [56]: df.dtypes
Out[56]:
Date datetime64[ns]
new object # <--- NOTE !!!
dtype: object
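To make the dtype point concrete, the sketch below checks what .dt.date actually stores and shows one possible alternative for month/year data, a monthly Period column, which has no time-of-day component. The sample values are made up, and whether a Period column fits depends on how the dates are used downstream.

```python
import datetime
import pandas as pd

df = pd.DataFrame({"month": ["08", "03"], "year": ["1968", "1972"]})
df["date"] = pd.to_datetime(df["month"] + df["year"], format="%m%Y")

# .dt.date yields Python datetime.date objects held in an object column.
first = df["date"].dt.date.iloc[0]
print(type(first))

# Alternative: a monthly period, vectorized and without a time part.
periods = df["date"].dt.to_period("M")
print(periods)
```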
