Slicing Time in Pandas DataFrame - python-3.x

I'm reading two database columns into a pandas DataFrame. It works fine, but the time data in the database looks like this: "2018-01-18T00:00:00". I only need the year, month and day; I don't need the time, since it is always 00:00 in the database. How do we slice it? Thank you!
tables_prices='''SELECT date, tryprice FROM Price'''
df=pd.read_sql_query(tables_prices, conn)
x=df['date']
y=df['tryprice']

You can use to_datetime:
df
Out[254]:
date
0 2018-01-18T00:00:00
1 2018-01-18T00:00:00
2 2018-01-18T00:00:00
df.date=pd.to_datetime(df.date).dt.date
df
Out[256]:
date
0 2018-01-18
1 2018-01-18
2 2018-01-18
#year: pd.to_datetime(df.date).dt.year
#Month: pd.to_datetime(df.date).dt.month
#day:pd.to_datetime(df.date).dt.day
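If the column should stay a datetime64 column (so that the .dt accessor keeps working on it), one option is to normalize the timestamps instead of converting them to date objects; for display only, strftime yields plain strings. The following is a minimal sketch under that assumption, and the sample frame and the column names "day" and "day_str" are made up for illustration.

```python
import pandas as pd

# Made-up sample mirroring the question's "date" column
df = pd.DataFrame({"date": ["2018-01-18T00:00:00", "2018-01-19T00:00:00"]})

parsed = pd.to_datetime(df["date"])

# Stays a datetime64 column; the time part is set to midnight
df["day"] = parsed.dt.normalize()

# Plain strings, useful for display or export only
df["day_str"] = parsed.dt.strftime("%Y-%m-%d")

print(df[["day", "day_str"]])
```

The normalized column still supports .dt.year, .dt.month and .dt.day directly, while the string column does not.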

Related

Widening long table grouped on date

I have run into a problem transforming a dataframe. I'm trying to widen a table grouped on a datetime column, but can't seem to make it work. I have tried to transpose it and to pivot it, but can't get it into the shape I want.
Example table:
datetime value
2022-04-29T02:00:00.000000000 5
2022-04-29T03:00:00.000000000 6
2022-05-29T02:00:00.000000000 5
2022-05-29T03:00:00.000000000 7
What I want to achieve is:
index date 02:00 03:00
1 2022-04-29 5 6
2 2022-05-29 5 7
The real data has one data point from 00:00 - 20:00 for each day. So I guess a loop would be the way to go to generate the columns.
Does anyone know a way to solve this, or can nudge me in the right direction?
Thanks in advance!
Assuming from details you have provided, I think you are dealing with timeseries data and you have data from different dates acquired at 02:00:00 and 03:00:00. Please correct me if I am wrong.
First we replicate your DataFrame object.
import datetime as dt
from io import StringIO
import pandas as pd
data_str = """2022-04-29T02:00:00.000000000 5
2022-04-29T03:00:00.000000000 6
2022-05-29T02:00:00.000000000 5
2022-05-29T03:00:00.000000000 7"""
df = pd.read_csv(StringIO(data_str), sep=" ", header=None)
df.columns = ["date", "value"]
Now we calculate the unique days on which you acquired data:
unique_days = df["date"].apply(lambda x: dt.datetime.strptime(x[:-3], "%Y-%m-%dT%H:%M:%S.%f").date()).unique()
Here I trimmed the last three zeros from each timestamp string, because the %f directive accepts at most six fractional digits. We then convert each string to a datetime object, take its date, and keep the unique values.
Now we create a new empty df in desired form:
new_df = pd.DataFrame(columns=["date", "02:00", "03:00"])
After this we can populate the values:
for day in unique_days:
    new_row_data = [day]  # this creates a row of 3 elems, which will be inserted into empty df
    new_row_data.append(df.loc[df["date"] == f"{day}T02:00:00.000000000", "value"].values[0])  # here we find data for 02:00 for that date
    new_row_data.append(df.loc[df["date"] == f"{day}T03:00:00.000000000", "value"].values[0])  # here we find data for 03:00 same day
    new_df.loc[len(new_df)] = new_row_data  # now we insert row to last pos
This should give you:
date 02:00 03:00
0 2022-04-29 5 6
1 2022-05-29 5 7
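Since the real data has more time slots than two, hard-coding one column per hour does not scale well. As an alternative to the loop (not the approach of the answer above), a pivot-based sketch could look like the following; it assumes every (date, time) pair occurs only once, and the helper column names "date" and "time" are chosen here for illustration.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    "datetime": [
        "2022-04-29T02:00:00.000000000",
        "2022-04-29T03:00:00.000000000",
        "2022-05-29T02:00:00.000000000",
        "2022-05-29T03:00:00.000000000",
    ],
    "value": [5, 6, 5, 7],
})

ts = pd.to_datetime(df["datetime"])

wide = (
    df.assign(date=ts.dt.date, time=ts.dt.strftime("%H:%M"))
      # one row per date, one column per time of day
      .pivot(index="date", columns="time", values="value")
      .reset_index()
)
wide.columns.name = None  # drop the leftover "time" label on the column axis

print(wide)
```

If a (date, time) pair can occur more than once, pivot raises an error; pivot_table with an explicit aggregation function would be the usual substitute in that case.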

Convert number into hours and minutes while reading CSV in Pandas

I have CSV file where the second column indicates a time point with the format HHMMSS.
ID;TIME
A;110500
B;090000
C;130200
This raises some questions for me.
Does pandas have a data format to represent a time of day with hours, minutes and seconds, but without the day, month, ...?
How can I convert those fields to such a format?
In plain Python I would iterate over the fields, but I am sure that pandas has a more efficient way.
If there is no time-of-day format without a date, I could add a day-month-year date to that time.
Here is an MWE:
import pandas
import io
csv = io.StringIO('ID;TIME\nA;110500\nB;090000\nC;130200')
df = pandas.read_csv(csv, sep=';')
print(df)
Results in
ID TIME
0 A 110500
1 B 90000
2 C 130200
But what I want to see is
ID TIME
0 A 11:05:00
1 B 9:00:00
2 C 13:02:00
Or much better cutting the seconds also
ID TIME
0 A 11:05
1 B 9:00
2 C 13:02
You could use the date_parser parameter of read_csv together with the time accessor (note that date_parser is deprecated as of pandas 2.0):
df = pandas.read_csv(csv, sep=';',
                     parse_dates=[1],  # need to know the position of the TIME column
                     date_parser=lambda x: pandas.to_datetime(x, format='%H%M%S').time)
print(df)
ID TIME
0 A 11:05:00
1 B 09:00:00
2 C 13:02:00
But converting after reading might be just as good:
df = (pandas.read_csv(csv, sep=';')
      .assign(TIME=lambda x: pandas.to_datetime(x['TIME'], format='%H%M%S').dt.time)
      # or lambda x: pandas.to_datetime(x['TIME'], format='%H%M%S').dt.strftime('%#H:%M')
      # ('%#H' drops the leading zero on Windows only; on Linux/macOS the flag is '%-H')
     )
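The question's output shows 090000 being read as the integer 90000, which loses the leading zero. One way to sidestep that, sketched here as an assumption rather than as part of the answer above, is to read the column as text and format it afterwards. The portable '%H:%M' format is used, so the hour keeps its leading zero.

```python
import io

import pandas as pd

csv = io.StringIO("ID;TIME\nA;110500\nB;090000\nC;130200")

# Read TIME as text so that the leading zero of "090000" is preserved
df = pd.read_csv(csv, sep=";", dtype={"TIME": str})

# Parse HHMMSS, then keep only hours and minutes as a string
parsed = pd.to_datetime(df["TIME"], format="%H%M%S")
df["TIME"] = parsed.dt.strftime("%H:%M")

print(df)
```

The result is a column of strings; use parsed.dt.time instead if time objects are needed for comparisons.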

How can I get merge table column resample in python?

I have a question handling dataframe in pandas.
I really don't know what to do.
Could you check this problem?
[df1]
This is first dataframe and I want to get second dataframe.
Like this
I got the index values DATE(Week) and DATE(Month) using the resample method in pandas,
but I don't know how to merge them into a table like the second one.
So please check this question. Thank you so much.
As I understand your question, you want to map each value in the DATE column to the end of its week and the end of its month. If that is the case, you do not need to create two separate DataFrames; there is an easier way to do it using DateOffsets.
# taking a sample from your data
>>> import pandas as pd
>>> from pandas.tseries.offsets import Week
>>> d = {'DATE': ['2019-01-14', '2019-01-16', '2019-02-19'], 'TX_COST': [156800, 157000, 150000]}
>>> df = pd.DataFrame(data=d)
>>> df
DATE TX_COST
0 2019-01-14 156800
1 2019-01-16 157000
2 2019-02-19 150000
# convert the DATE column to datetime format
>>> df['DATE'] = pd.to_datetime(df['DATE'])
# as per your requirement, set weekday=6 (Sunday) as the week-ending date
>>> df['WEEK'] = df['DATE'] + Week(weekday=6)
>>> df
DATE TX_COST WEEK
0 2019-01-14 156800 2019-01-20
1 2019-01-16 157000 2019-01-20
2 2019-02-19 150000 2019-02-24
# use the month offset to roll each date forward to the month end
>>> df['MONTH'] = df['DATE'] + pd.offsets.MonthEnd()
>>> df
DATE TX_COST WEEK MONTH
0 2019-01-14 156800 2019-01-20 2019-01-31
1 2019-01-16 157000 2019-01-20 2019-01-31
2 2019-02-19 150000 2019-02-24 2019-02-28
This will create the DataFrame which you require
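One caveat with adding offsets: a date that already falls on a Sunday or on a month end is moved to the next Sunday or the next month end. If such dates should stay where they are, a period-based variant may be closer to what is wanted. This is a sketch using a different technique from the answer above, with made-up sample dates chosen to include both edge cases.

```python
import pandas as pd

# Made-up dates: a Monday, a Sunday, and a month end
df = pd.DataFrame({"DATE": pd.to_datetime(["2019-01-14", "2019-01-20", "2019-01-31"])})

# "W-SUN" periods run Monday to Sunday; end_time is the last instant of the
# period, and normalize() resets it to midnight of that day
df["WEEK"] = df["DATE"].dt.to_period("W-SUN").dt.end_time.dt.normalize()
df["MONTH"] = df["DATE"].dt.to_period("M").dt.end_time.dt.normalize()

print(df)
```

Here 2019-01-20 (a Sunday) keeps 2019-01-20 as its week end, and 2019-01-31 keeps 2019-01-31 as its month end.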

Pandas - Exclude Timezone when using .apply(pd.to_datetime) [duplicate]

I have been struggling with removing the time zone info from a column in a pandas dataframe. I have checked the following question, but it does not work for me:
Can I export pandas DataFrame to Excel stripping tzinfo?
I used tz_localize to assign a timezone to a datetime object, because I need to convert to another timezone using tz_convert. This adds a UTC offset in the form "-06:00". I need to get rid of this offset, because it results in an error when I try to export the dataframe to Excel.
Actual output
2015-12-01 00:00:00-06:00
Desired output
2015-12-01 00:00:00
I have tried to get the characters I want using the str() method, but it seems the result of tz_localize is not a string. My solution so far is to export the dataframe to csv, read the file, and to use the str() method to get the characters I want.
Is there an easier solution?
If your series contains only datetimes, then you can do:
my_series.dt.tz_localize(None)
This will remove the timezone information (it will not change the time) and return a series of naive local times, which can be exported to Excel using to_excel(), for example.
It may help to strip the last 6 characters:
print(df)
datetime
0 2015-12-01 00:00:00-06:00
1 2015-12-01 00:00:00-06:00
2 2015-12-01 00:00:00-06:00
df['datetime'] = df['datetime'].astype(str).str[:-6]
print(df)
datetime
0 2015-12-01 00:00:00
1 2015-12-01 00:00:00
2 2015-12-01 00:00:00
To remove the timezone from all datetime columns in a DataFrame with mixed columns, just use:
for col in df.select_dtypes(['datetimetz']).columns:
    df[col] = df[col].dt.tz_localize(None)
If you still can't save the df to an Excel file, use this instead (note that it converts the times to UTC before dropping the timezone, so the displayed times change):
for col in df.select_dtypes(['datetimetz']).columns:
    df[col] = df[col].dt.tz_convert(None)
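The difference between the two calls is easy to miss, so here is a small sketch that compares them on the value from the question. The variable names are made up for illustration.

```python
import pandas as pd

# A timezone-aware series with the -06:00 offset from the question
s = pd.Series(pd.to_datetime(["2015-12-01 00:00:00-06:00"]))

# Drops the offset and keeps the local wall-clock time
local_naive = s.dt.tz_localize(None)

# Converts to UTC first, then drops the timezone
utc_naive = s.dt.tz_convert(None)

print(local_naive.iloc[0])  # 2015-12-01 00:00:00
print(utc_naive.iloc[0])    # 2015-12-01 06:00:00
```

For the desired output in the question (2015-12-01 00:00:00), tz_localize(None) is the call that matches.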
Following Beatriz Fonseca's suggestion, I ended up doing the following:
from datetime import datetime
df['dates'].apply(lambda x:datetime.replace(x,tzinfo=None))
If it is always the last 6 characters that you want to ignore, you may simply slice your current string:
>>> '2015-12-01 00:00:00-06:00'[0:-6]
'2015-12-01 00:00:00'

Efficient way of converting String column to Date in Pandas (in Python), but without Timestamp

I have a DataFrame which contains two string columns, df['month'] and df['year']. I want to create a new column df['date'] by combining the month and year columns. I have done that successfully using the structure below:
df['date']=pd.to_datetime((df['month']+df['year']),format='%m%Y')
where by for df['month'] = '08' and df['year']='1968'
we get df['date']=1968-08-01
This is exactly what I wanted.
Problem at hand: My DataFrame has more than 200,000 rows and I notice that sometimes, in addition, I also get Timestamp like the one below for a few rows and I want to avoid that -
1972-03-01 00:00:00
I solved this issue by using the .dt accessor, which can be used to manipulate the Series, whereby I explicitly extracted only the date using the code below:
df['date']=pd.to_datetime((df['month']+df['year']),format='%m%Y') #Line 1
df['date']=df['date'].dt.date #Line 2
The problem was solved, except that Line 2 took 5 times longer than Line 1.
Question: Is there any way where I could tweak Line 1 into giving just the dates and not the Timestamp? I am sure this simple problem cannot have such an inefficient solution. Can I solve this issue in a more time and resource efficient manner?
AFAIK we don't have a date dtype in Pandas, only datetime, so there will always be a time part.
Even though Pandas shows 1968-08-01, the value has a time part: 00:00:00.
Demo:
In [32]: df = pd.DataFrame(pd.to_datetime(['1968-08-01', '2017-08-01']), columns=['Date'])
In [33]: df
Out[33]:
Date
0 1968-08-01
1 2017-08-01
In [34]: df['Date'].dt.time
Out[34]:
0 00:00:00
1 00:00:00
Name: Date, dtype: object
And if you want to have a string representation, there is a faster way:
df['date'] = df['year'].astype(str) + '-' + df['month'].astype(str) + '-01'
UPDATE: be aware that .dt.date does not keep the datetime64 dtype; it gives you an object column of Python datetime.date objects:
In [53]: df.dtypes
Out[53]:
Date datetime64[ns]
dtype: object
In [54]: df['new'] = df['Date'].dt.date
In [55]: df
Out[55]:
Date new
0 1968-08-01 1968-08-01
1 2017-08-01 2017-08-01
In [56]: df.dtypes
Out[56]:
Date datetime64[ns]
new object # <--- NOTE !!!
dtype: object
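To see what the object column actually holds, a quick check along these lines can help. It is a sketch using made-up month and year values in the shape described in the question.

```python
import datetime

import pandas as pd

# Made-up sample in the shape described in the question
df = pd.DataFrame({"month": ["08", "03"], "year": ["1968", "1972"]})

df["date"] = pd.to_datetime(df["month"] + df["year"], format="%m%Y")

# .dt.date builds one Python date object per row
as_dates = df["date"].dt.date
first = as_dates.iloc[0]

print(type(first))                        # <class 'datetime.date'>
print(first == datetime.date(1968, 8, 1))  # True
```

Because every element becomes a separate Python object, the result loses the vectorized datetime64 storage, which is consistent with Line 2 being noticeably slower than Line 1.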

Resources