Pandas - Exclude Timezone when using .apply(pd.to_datetime) [duplicate] - python-3.x

I have been struggling with removing the time zone info from a column in a pandas dataframe. I have checked the following question, but it does not work for me:
Can I export pandas DataFrame to Excel stripping tzinfo?
I used tz_localize to assign a timezone to a datetime object, because I need to convert to another timezone using tz_convert. This adds a UTC offset of the form "-06:00". I need to get rid of this offset, because it results in an error when I try to export the dataframe to Excel.
Actual output
2015-12-01 00:00:00-06:00
Desired output
2015-12-01 00:00:00
I have tried to get the characters I want using the str() method, but it seems the result of tz_localize is not a string. My workaround so far is to export the dataframe to csv, read the file back in, and use the str() method to get the characters I want.
Is there an easier solution?

If your series contains only datetimes, then you can do:
my_series.dt.tz_localize(None)
This will remove the timezone information (it will not change the time) and return a series of naive local times, which can then be exported to Excel with to_excel(), for example.
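A minimal sketch of the full round trip (the column name 'when' and the file name 'out.xlsx' are made up):
import pandas as pd

# Mirror the question: localize, convert to another zone, then strip the offset before export.
s = pd.Series(pd.to_datetime(['2015-12-01 00:00:00']))
s = s.dt.tz_localize('UTC').dt.tz_convert('America/Chicago')  # 2015-11-30 18:00:00-06:00
s = s.dt.tz_localize(None)                                    # 2015-11-30 18:00:00, naive

pd.DataFrame({'when': s}).to_excel('out.xlsx')  # no more tz-related export error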

Stripping the last 6 characters may also help:
print(df)
datetime
0 2015-12-01 00:00:00-06:00
1 2015-12-01 00:00:00-06:00
2 2015-12-01 00:00:00-06:00
df['datetime'] = df['datetime'].astype(str).str[:-6]
print(df)
datetime
0 2015-12-01 00:00:00
1 2015-12-01 00:00:00
2 2015-12-01 00:00:00

To remove the timezone from all datetime columns in a DataFrame with mixed column types, just use:
for col in df.select_dtypes(['datetimetz']).columns:
    df[col] = df[col].dt.tz_localize(None)
If you can't save df to an Excel file, use this instead (it converts the times to UTC before dropping the offset, instead of keeping the local wall time):
for col in df.select_dtypes(['datetimetz']).columns:
    df[col] = df[col].dt.tz_convert(None)
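For illustration, a small sketch of the difference between the two (sample timestamp made up):
import pandas as pd

s = pd.Series(pd.to_datetime(['2015-12-01 00:00:00'])).dt.tz_localize('America/Chicago')

s.dt.tz_localize(None)  # 2015-12-01 00:00:00 -> keeps the local wall time
s.dt.tz_convert(None)   # 2015-12-01 06:00:00 -> converts to UTC first, then drops the offset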

Following Beatriz Fonseca's suggestion, I ended up doing the following:
from datetime import datetime
df['dates'] = df['dates'].apply(lambda x: datetime.replace(x, tzinfo=None))

If it is always the last 6 characters that you want to ignore, you may simply slice your current string:
>>> '2015-12-01 00:00:00-06:00'[0:-6]
'2015-12-01 00:00:00'

Related

Python Pandas: Supporting 25 hours in datetime index

I want to use a date/time as an index for a dataframe in Pandas.
However, daylight saving time is not properly addressed in the database, so the date/time values for the day on which daylight saving time ends span 25 hours and are represented like this:
2019102700
2019102701
...
2019102724
I am using the following code to convert those values to a DateTime object that I use as an index to a Pandas dataframe:
df.index = pd.to_datetime(df["date_time"], format="%Y%m%d%H")
However, that gives an error:
ValueError: unconverted data remains: 4
Presumably because the to_datetime function is not expecting the hour to be 24. Similarly, the day in which daylight saving time starts only has 23 hours.
One solution I thought of was storing the dates as strings, but that seems neither elegant nor efficient. Is there any way to solve the issue of handling daylight saving time when using to_datetime?
If you know the timezone, here's a way to calculate UTC timestamps. Parse only the date part, localize to the actual time zone the data "belongs" to, and convert that to UTC. Now you can parse the hour part and add it as a time delta - e.g.
import pandas as pd

df = pd.DataFrame({'date_time_str': ['2019102722', '2019102723', '2019102724',
                                     '2019102800', '2019102801', '2019102802']})
df['date_time'] = (pd.to_datetime(df['date_time_str'].str[:-2], format='%Y%m%d')
                   .dt.tz_localize('Europe/Berlin')
                   .dt.tz_convert('UTC'))
df['date_time'] += pd.to_timedelta(df['date_time_str'].str[-2:].astype(int), unit='h')
# df['date_time']
# 0 2019-10-27 20:00:00+00:00
# 1 2019-10-27 21:00:00+00:00
# 2 2019-10-27 22:00:00+00:00
# 3 2019-10-27 23:00:00+00:00
# 4 2019-10-28 00:00:00+00:00
# 5 2019-10-28 01:00:00+00:00
# Name: date_time, dtype: datetime64[ns, UTC]
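As a follow-up, if you want those unambiguous UTC timestamps as the index but still want to look at local times, something along these lines (same zone assumed) should work:
df = df.set_index('date_time')
df['local_time'] = df.index.tz_convert('Europe/Berlin')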
I'm not sure if it is the most elegant or efficient solution, but I would:
df.loc[df.date_time.str[-2:] == '25', 'date_time'] = (
    pd.to_numeric(df.date_time[df.date_time.str[-2:] == '25']) + 100 - 24
).apply(str)
df.index = pd.to_datetime(df["date_time"], format="%Y%m%d%H")
Pick the first and the last entries, convert them to tz-aware datetimes, then generate a date_range between them; a tz-aware range handles 25-hour days correctly. Finally, assign the date_range as your df index:
start = pd.to_datetime(df.index[0]).tz_localize("Europe/Berlin")
end = pd.to_datetime(df.index[-1]).tz_localize("Europe/Berlin")
index_ = pd.date_range(start, end, freq="15min")
df = df.set_index(index_)
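A rough end-to-end sketch of that idea with made-up endpoints and an hourly frequency, since the question's data is hourly (Europe/Berlin assumed, as above):
import pandas as pd

start = pd.Timestamp('2019-10-27 00:00').tz_localize('Europe/Berlin')
end = pd.Timestamp('2019-10-28 02:00').tz_localize('Europe/Berlin')

index_ = pd.date_range(start, end, freq='h')
print(len(index_))  # 28: the 27th contributes 25 hourly stamps, plus 00-02 on the 28th
# df = df.set_index(index_)  # only valid if len(index_) matches len(df)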

Is there any method in pandas to convert a dataframe from day to the default d/m/y format?

I would like to convert every day value in the data frame into a day/Feb/2020 date. The date field contains only the day of the month, and I want to turn each entry into a full date.
My current approach is:
import datetime
y = []
for day in planned_ds.Date:
    x = datetime.datetime(2020, 5, day)
    print(x)
Is there an easy method to convert the whole day column to d/m/y format?
One way, assuming you have data like
df = pd.DataFrame([1, 2, 3, 4, 5], columns=["date"])
is to convert the day numbers to dates and then shift them to the month you need:
pd.to_datetime(df["date"], unit="D") - pd.Timestamp("1970-01-01") + pd.Timestamp("2020-01-31")
this results in
0 2020-02-01
1 2020-02-02
2 2020-02-03
3 2020-02-04
4 2020-02-05
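An equivalent, possibly clearer way to write the same shift is to treat the day number as an offset in days from the day before the target month:
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4, 5], columns=["date"])
df["date"] = pd.Timestamp("2020-01-31") + pd.to_timedelta(df["date"], unit="D")
# 0   2020-02-01
# ...
# 4   2020-02-05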

Issue while converting string to datetime in pandas

I have a dataframe like this:
Input
Date
2020-12-21
2019-09-30
2019-12-04
I want to convert it to the specific datetime format below.
Expected Format
Date
2020-12-21T00:00:00Z
2019-09-30T00:00:00Z
2019-12-04T00:00:00Z
My current code
df.loc[:,'Date'] = pd.to_datetime(df.loc[:,'Date'])
It's not working correctly. How can this be fixed?
I'm not sure there's a shortcut for the ISO time format. Here's a workaround:
pd.to_datetime(df['Date']).dt.strftime("%Y-%m-%dT%H:%M:%SZ")
Output:
0 2020-12-21T00:00:00Z
1 2019-09-30T00:00:00Z
2 2019-12-04T00:00:00Z
Name: Date, dtype: object
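To write it back into the frame, note that the result is a plain string (object) column, since datetime64 cannot store the trailing "Z":
df['Date'] = pd.to_datetime(df['Date']).dt.strftime("%Y-%m-%dT%H:%M:%SZ")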

Slicing Time in Pandas DataFrame

I'm reading two DB columns into a pandas dataframe. It works fine, but the time data in the DB looks like "2018-01-18T00:00:00". I just need the year, month and day; I don't need the time, since it's all 00:00 in the DB. How do I slice it off? Thank you!
tables_prices='''SELECT date, tryprice FROM Price'''
df=pd.read_sql_query(tables_prices, conn)
x=df['date']
y=df['tryprice']
You can use to_datetime:
df
Out[254]:
date
0 2018-01-18T00:00:00
1 2018-01-18T00:00:00
2 2018-01-18T00:00:00
df.date=pd.to_datetime(df.date).dt.date
df
Out[256]:
date
0 2018-01-18
1 2018-01-18
2 2018-01-18
# year:  pd.to_datetime(df.date).dt.year
# month: pd.to_datetime(df.date).dt.month
# day:   pd.to_datetime(df.date).dt.day
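Since the strings always start with the date, plain string slicing also works if you don't need a real datetime dtype at all (a sketch, assuming the column is text exactly like the sample):
df['date'] = df['date'].str[:10]  # '2018-01-18T00:00:00' -> '2018-01-18'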

Efficient way of converting String column to Date in Pandas (in Python), but without Timestamp

I have a DataFrame which contains two string columns, df['month'] and df['year']. I want to create a new column df['date'] by combining the month and year columns. I have done that successfully using the code below:
df['date']=pd.to_datetime((df['month']+df['year']),format='%m%Y')
whereby for df['month'] = '08' and df['year'] = '1968'
we get df['date'] = 1968-08-01
This is exactly what I wanted.
Problem at hand: my DataFrame has more than 200,000 rows, and I notice that for a few rows I also get a full timestamp like the one below, which I want to avoid:
1972-03-01 00:00:00
I solved this issue by using the .dt accessor, which can be used to manipulate the Series; I explicitly extracted only the date using the code below:
df['date']=pd.to_datetime((df['month']+df['year']),format='%m%Y') #Line 1
df['date'] = df['date'].dt.date #Line 2
The problem was solved, except that Line 2 took about five times as long as Line 1.
Question: Is there any way I could tweak Line 1 into giving just the dates and not the timestamp? I am sure this simple problem cannot have such an inefficient solution. Can I solve this issue in a more time- and resource-efficient manner?
AFAIK we don't have a date dtype in Pandas; we only have datetime, so we will always have a time part.
Even though Pandas shows: 1968-08-01, it has a time part: 00:00:00.
Demo:
In [32]: df = pd.DataFrame(pd.to_datetime(['1968-08-01', '2017-08-01']), columns=['Date'])
In [33]: df
Out[33]:
Date
0 1968-08-01
1 2017-08-01
In [34]: df['Date'].dt.time
Out[34]:
0 00:00:00
1 00:00:00
Name: Date, dtype: object
And if you want to have a string representation, there is a faster way:
df['date'] = df['year'].astype(str) + '-' + df['month'].astype(str) + '-01'
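For example, with string month/year columns like the ones in the question (zero-padded month assumed, as in '08'):
df = pd.DataFrame({'month': ['08', '03'], 'year': ['1968', '1972']})
df['date'] = df['year'] + '-' + df['month'] + '-01'
# 0    1968-08-01
# 1    1972-03-01
# dtype: object  -> plain strings, so no time part is ever displayed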
UPDATE: be aware that .dt.date will give you a column of Python date objects with object dtype:
In [53]: df.dtypes
Out[53]:
Date datetime64[ns]
dtype: object
In [54]: df['new'] = df['Date'].dt.date
In [55]: df
Out[55]:
Date new
0 1968-08-01 1968-08-01
1 2017-08-01 2017-08-01
In [56]: df.dtypes
Out[56]:
Date datetime64[ns]
new object # <--- NOTE !!!
dtype: object
