I have a dataframe like this:
Input
Date
2020-12-21
2019-09-30
2019-12-04
I want to convert it to this specific datetime format.
Expected Format
Date
2020-12-21T00:00:00Z
2019-09-30T00:00:00Z
2019-12-04T00:00:00Z
My current code
df.loc[:,'Date'] = pd.to_datetime(df.loc[:,'Date'])
It's not working correctly. How can this be fixed?
I'm not sure there's a shortcut for the ISO time format. Here's a workaround:
pd.to_datetime(df['Date']).dt.strftime("%Y-%m-%dT%H:%M:%SZ")
Output:
0 2020-12-21T00:00:00Z
1 2019-09-30T00:00:00Z
2 2019-12-04T00:00:00Z
Name: Date, dtype: object
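If the trailing Z is meant to signal UTC, one option is to make the column timezone-aware before formatting. The following is a minimal sketch that assumes the input dates are already UTC; the sample frame is rebuilt from the question's data.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2020-12-21", "2019-09-30", "2019-12-04"]})

# Parse the strings, then mark them explicitly as UTC
# (assumption: the dates in the question are UTC dates).
utc_dates = pd.to_datetime(df["Date"]).dt.tz_localize("UTC")

# Render as ISO 8601 strings; the result is a string (object) column,
# not a datetime column.
df["Date"] = utc_dates.dt.strftime("%Y-%m-%dT%H:%M:%SZ")
print(df["Date"].tolist())
```

Note that after strftime the column holds strings, so any further date arithmetic should happen before this step.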
I want to use a date/time as an index for a dataframe in Pandas.
However, daylight saving time is not properly addressed in the database, so the date/time values for the day in which daylight saving time ends have 25 hours and are represented as such:
2019102700
2019102701
...
2019102724
I am using the following code to convert those values to a DateTime object that I use as an index to a Pandas dataframe:
df.index = pd.to_datetime(df["date_time"], format="%Y%m%d%H")
However, that gives an error:
ValueError: unconverted data remains: 4
Presumably because the to_datetime function is not expecting the hour to be 24. Similarly, the day in which daylight saving time starts only has 23 hours.
One solution I thought of was storing the dates as strings, but that seems neither elegant nor efficient. Is there any way to solve the issue of handling daylight saving time when using to_datetime?
If you know the timezone, here's a way to calculate UTC timestamps. Parse only the date part, localize to the actual time zone the data "belongs" to, and convert that to UTC. Now you can parse the hour part and add it as a time delta - e.g.
import pandas as pd
df = pd.DataFrame({'date_time_str': ['2019102722','2019102723','2019102724',
'2019102800','2019102801','2019102802']})
df['date_time'] = (pd.to_datetime(df['date_time_str'].str[:-2], format='%Y%m%d')
.dt.tz_localize('Europe/Berlin')
.dt.tz_convert('UTC'))
df['date_time'] += pd.to_timedelta(df['date_time_str'].str[-2:].astype(int), unit='h')
# df['date_time']
# 0 2019-10-27 20:00:00+00:00
# 1 2019-10-27 21:00:00+00:00
# 2 2019-10-27 22:00:00+00:00
# 3 2019-10-27 23:00:00+00:00
# 4 2019-10-28 00:00:00+00:00
# 5 2019-10-28 01:00:00+00:00
# Name: date_time, dtype: datetime64[ns, UTC]
I'm not sure if it is the most elegant or efficient solution, but I would:
df.loc[df.date_time.str[-2:]=='24', 'date_time'] = (pd.to_numeric(df.date_time[df.date_time.str[-2:]=='24'])+100-24).apply(str)
df.index = pd.to_datetime(df["date_time"], format="%Y%m%d%H")
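Integer arithmetic on the string only rolls over correctly when the next calendar day exists in the same month. A possible alternative, sketched here with made-up sample values, is to parse the date part and add the hour part as a timedelta, which also rolls over month ends. The result is timezone-naive and does not by itself resolve the DST ambiguity.

```python
import pandas as pd

# Hypothetical sample values in the question's YYYYMMDDHH layout,
# including hour 24 and a month-end case.
raw = pd.Series(["2019102723", "2019102724", "2019103124"])

# Parse the date part and the hour part separately.
day = pd.to_datetime(raw.str[:8], format="%Y%m%d")
hours = pd.to_timedelta(raw.str[8:].astype(int), unit="h")

# Hour 24 becomes 00:00 of the following day, also across month ends.
stamps = day + hours
print(stamps)
```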
Pick the first and the last index, convert them to tz_aware datetime, then you can generate a date_range that handles 25-hour days. And assign the date_range to your df index:
start = pd.to_datetime(df.index[0]).tz_localize("Europe/Berlin")
end = pd.to_datetime(df.index[-1]).tz_localize("Europe/Berlin")
index_ = pd.date_range(start, end, freq="15min")
df = df.set_index(index_)
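To see why a timezone-aware range copes with the long day, here is a small sketch that counts the hourly stamps on 2019-10-27 in Europe/Berlin. The hourly step is an assumption based on the question's data; the snippet above uses 15-minute steps.

```python
import pandas as pd

start = pd.Timestamp("2019-10-27 00:00").tz_localize("Europe/Berlin")
end = pd.Timestamp("2019-10-28 00:00").tz_localize("Europe/Berlin")

# Hourly steps in absolute time; drop the final stamp, which is
# midnight of the following day.
index_ = pd.date_range(start, end, freq=pd.Timedelta(hours=1))[:-1]

# The day on which DST ends contains one extra hour.
print(len(index_))
print(index_[0], index_[-1])
```

The first stamp carries the summer-time offset and the last one the winter-time offset, so the repeated 02:00 hour stays unambiguous.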
I have a CSV dataset with the 'date' attribute as follows:
2012-04-29
2012-04-29
2012-04-29
2012-05-05
2012-05-05
Name: date, dtype: datetime64[ns]
I want to convert the unique date values to integer values, so that the first three rows (same date, '2012-04-29') become 1, the next two rows (same date, '2012-05-05') become 2, and so on.
How can I do this conversion of 'date' attribute to a new integer attribute/column say 'date_int'?
Thanks
We can do
df['date'].rank(method='dense').astype(int)
You are looking at factorize:
df['date'].factorize()[0] + 1
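The two answers agree only when the dates are already sorted. A short sketch with deliberately unsorted, made-up dates shows the difference: dense ranking numbers dates chronologically, while factorize numbers them by first appearance.

```python
import pandas as pd

# Hypothetical unsorted sample to make the difference visible.
dates = pd.Series(pd.to_datetime(
    ["2012-05-05", "2012-04-29", "2012-04-29", "2012-05-05"]))

# Dense rank: the earliest date gets 1, the next distinct date 2, ...
by_value = dates.rank(method="dense").astype(int)

# factorize: the first date encountered gets 1, the next new one 2, ...
by_appearance = pd.Series(dates.factorize()[0] + 1, index=dates.index)

print(by_value.tolist())
print(by_appearance.tolist())
```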
I would like to convert every day value in the dataframe into day/Feb/2020 format.
Here the Date field contains only the day number.
I want to convert the Date field from that form into full dates.
My current approach is:
import datetime
y=[]
for day in planned_ds.Date:
    x=datetime.datetime(2020, 5, day)
    print(x)
Is there an easy way to convert all the day values in the dataframe to d/m/y format?
One way, assuming you have data like
df = pd.DataFrame([1,2,3,4,5], columns=["date"])
is to convert them to dates and then shift them to start when you need them to:
pd.to_datetime(df["date"], unit="D") - pd.Timestamp(1970,1,1) + pd.Timestamp(2020,1,31)
this results in
0 2020-02-01
1 2020-02-02
2 2020-02-03
3 2020-02-04
4 2020-02-05
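Another possible route is to let pandas assemble the dates from year, month and day columns and then format them as strings. This is a sketch that assumes February 2020, as in the question's text; the column names year, month and day are the ones pd.to_datetime expects when it is given a DataFrame.

```python
import pandas as pd

days = pd.Series([1, 2, 3, 4, 5], name="day")

# Build a frame with the component columns; the scalars are broadcast
# to every row. Year and month are assumptions (February 2020).
parts = pd.DataFrame({"year": 2020, "month": 2, "day": days})
dates = pd.to_datetime(parts)

# Format as day/month/year strings.
formatted = dates.dt.strftime("%d/%m/%Y")
print(formatted.tolist())
```

Assembling from components raises an error for impossible dates such as day 30 in February, whereas offset arithmetic would silently roll into March.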
I have been struggling with removing the time zone info from a column in a pandas dataframe. I have checked the following question, but it does not work for me:
Can I export pandas DataFrame to Excel stripping tzinfo?
I used tz_localize to assign a timezone to a datetime object, because I need to convert to another timezone using tz_convert. This adds a UTC offset, such as "-06:00". I need to get rid of this offset, because it results in an error when I try to export the dataframe to Excel.
Actual output
2015-12-01 00:00:00-06:00
Desired output
2015-12-01 00:00:00
I have tried to get the characters I want using the str() method, but it seems the result of tz_localize is not a string. My solution so far is to export the dataframe to csv, read the file, and to use the str() method to get the characters I want.
Is there an easier solution?
If your series contains only datetimes, then you can do:
my_series.dt.tz_localize(None)
This will remove the timezone information (it will not change the time) and return a series of naive local times, which can be exported to Excel using to_excel(), for example.
Maybe stripping the last 6 characters helps:
print(df)
datetime
0 2015-12-01 00:00:00-06:00
1 2015-12-01 00:00:00-06:00
2 2015-12-01 00:00:00-06:00
df['datetime'] = df['datetime'].astype(str).str[:-6]
print(df)
datetime
0 2015-12-01 00:00:00
1 2015-12-01 00:00:00
2 2015-12-01 00:00:00
To remove timezone from all datetime columns in a DataFrame with mixed columns just use:
for col in df.select_dtypes(['datetimetz']).columns:
    df[col] = df[col].dt.tz_localize(None)
If you still can't save the df to an Excel file, use this instead (note that it does not simply drop the timezone: it converts the values to UTC first):
for col in df.select_dtypes(['datetimetz']).columns:
    df[col] = df[col].dt.tz_convert(None)
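The difference between the two calls is easy to miss, so here is a small sketch on a single value. The America/Chicago zone is an assumption chosen to reproduce the -06:00 offset from the question.

```python
import pandas as pd

# One timestamp with a -06:00 offset (assumed zone: America/Chicago).
s = pd.Series(pd.to_datetime(["2015-12-01 00:00:00"]))
s = s.dt.tz_localize("America/Chicago")

# tz_localize(None) keeps the local wall-clock time and drops the offset.
local_naive = s.dt.tz_localize(None)

# tz_convert(None) converts to UTC first, then drops the offset.
utc_naive = s.dt.tz_convert(None)

print(local_naive.iloc[0])
print(utc_naive.iloc[0])
```

Both results are timezone-naive and can be written to Excel; which one is appropriate depends on whether the local time or the UTC instant should be preserved.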
Following Beatriz Fonseca's suggestion, I ended up doing the following:
from datetime import datetime
df['dates'].apply(lambda x:datetime.replace(x,tzinfo=None))
If it is always the last 6 characters that you want to ignore, you may simply slice your current string:
>>> '2015-12-01 00:00:00-06:00'[0:-6]
'2015-12-01 00:00:00'
When I try to convert from a number format to a date, I don't get the same result that I get in Excel.
I need to convert a number to a date and get the same result as in Excel.
For example, in Excel I get the following for the number below:
Input - 42970.73819
Output- 8/23/2017 17:43
I tried using the date conversion in Pandas but did not get the same result as in Excel.
Thank you
Madan
I think you need to convert the Excel serial date:
df = pd.DataFrame({'date':[42970.73819,42970.73819]})
print (df)
date
0 42970.73819
1 42970.73819
df = pd.to_datetime((df['date'] - 25569) * 86400.0, unit='s')
print (df)
0 2017-08-23 17:42:59.616
1 2017-08-23 17:42:59.616
Name: date, dtype: datetime64[ns]
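The same conversion can also be sketched with the origin parameter of pd.to_datetime, which avoids the manual 25569-day shift. The origin 1899-12-30 is the usual day zero for Excel's 1900 date system; this is an assumption about the workbook, and files using the 1904 date system would need a different origin.

```python
import pandas as pd

serial = pd.Series([42970.73819, 42970.73819], name="date")

# Interpret the numbers as days since Excel's day zero
# (assumption: the default 1900 date system).
converted = pd.to_datetime(serial, unit="D", origin="1899-12-30")

# Excel displays the value rounded to the minute, so round the same way
# before comparing with what the spreadsheet shows.
rounded = converted.dt.round("min")
print(rounded.iloc[0])
```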