Hours, minutes and seconds are not showing in my timestamp after converting a datetime [duplicate] - python-3.x

I have a column I_DATE of type string (object) in a dataframe called train, as shown below.
I_DATE
28-03-2012 2:15:00 PM
28-03-2012 2:17:28 PM
28-03-2012 2:50:50 PM
How can I convert I_DATE from string to datetime format and specify the format of the input string?
Also, how can I filter rows based on a range of dates in pandas?

Use to_datetime. There is no need for a format string since the parser is able to handle it:
In [51]:
pd.to_datetime(df['I_DATE'])
Out[51]:
0 2012-03-28 14:15:00
1 2012-03-28 14:17:28
2 2012-03-28 14:50:50
Name: I_DATE, dtype: datetime64[ns]
To access the date/day/time component use the dt accessor:
In [54]:
df['I_DATE'].dt.date
Out[54]:
0 2012-03-28
1 2012-03-28
2 2012-03-28
dtype: object
In [56]:
df['I_DATE'].dt.time
Out[56]:
0 14:15:00
1 14:17:28
2 14:50:50
dtype: object
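Note that the dt accessor only works once the column actually holds datetime values; on the raw strings it raises an AttributeError. A minimal sketch:
df['I_DATE'] = pd.to_datetime(df['I_DATE'])  # convert first
df['I_DATE'].dt.date                         # now the accessor works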
You can use strings to filter, for example:
In [59]:
import datetime as dt
df = pd.DataFrame({'date': pd.date_range(start=dt.datetime(2015, 1, 1), end=dt.datetime.now())})
df[(df['date'] > '2015-02-04') & (df['date'] < '2015-02-10')]
Out[59]:
date
35 2015-02-05
36 2015-02-06
37 2015-02-07
38 2015-02-08
39 2015-02-09
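Putting the pieces together, here is a minimal self-contained sketch (the sample values are copied from the question; the explicit format string and the filter bounds are illustrative assumptions):
import pandas as pd

df = pd.DataFrame({'I_DATE': ['28-03-2012 2:15:00 PM',
                              '28-03-2012 2:17:28 PM',
                              '28-03-2012 2:50:50 PM']})

# parse with an explicit format: day first, 12-hour clock with AM/PM
df['I_DATE'] = pd.to_datetime(df['I_DATE'], format='%d-%m-%Y %I:%M:%S %p')

# keep only the rows that fall inside a datetime range
mask = (df['I_DATE'] >= '2012-03-28 14:17:00') & (df['I_DATE'] <= '2012-03-28 15:00:00')
print(df[mask])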

Approach 1
Given the original string format 2019/03/04 00:08:48,
you can use
updated_df = df['timestamp'].astype('datetime64[ns]')
The result will be in this datetime format: 2019-03-04 00:08:48
Approach 2
updated_df = df.astype({'timestamp':'datetime64[ns]'})
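A quick sketch contrasting the two approaches (the column name 'timestamp' and the sample values are assumptions, mirroring the snippets above):
import pandas as pd

df = pd.DataFrame({'timestamp': ['2019/03/04 00:08:48', '2019/03/05 10:20:30']})

# Approach 1 returns just the converted Series
ts = df['timestamp'].astype('datetime64[ns]')

# Approach 2 returns a new DataFrame with the column converted
converted = df.astype({'timestamp': 'datetime64[ns]'})
print(converted.dtypes)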

For a datetime in AM/PM format, the time part of the format string is '%I:%M:%S %p'. See all possible format codes at https://strftime.org/. N.B. if the values have a time component, as in the OP, the conversion will be much, much faster if you pass format= explicitly.
df['I_DATE'] = pd.to_datetime(df['I_DATE'], format='%d-%m-%Y %I:%M:%S %p')
To filter a datetime using a range, you can use query:
df = pd.DataFrame({'date': pd.date_range('2015-01-01', '2015-04-01')})
df.query("'2015-02-04' < date < '2015-02-10'")
or use between to create a mask and filter:
df[df['date'].between('2015-02-04', '2015-02-10')]
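Applied to the OP's I_DATE column after the conversion above, for example (the bounds here are illustrative):
df.query("'2012-03-28 14:17:00' <= I_DATE <= '2012-03-28 15:00:00'")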

Related

How to get the minimum time value in a dataframe with excluding specific value

I have a dataframe in the format shown below. I want to get the minimum time value for each column and save it in a list, while excluding the specific value 00:00:00 from being counted as a minimum in any column of the dataframe.
df =
10.0.0.155 192.168.1.240 192.168.0.242
0 19:48:46 16:23:40 20:14:07
1 20:15:46 16:23:39 20:14:09
2 19:49:37 16:23:20 00:00:00
3 20:15:08 00:00:00 00:00:00
4 19:48:46 00:00:00 00:00:00
5 19:47:30 00:00:00 00:00:00
6 19:49:13 00:00:00 00:00:00
7 20:15:50 00:00:00 00:00:00
8 19:45:34 00:00:00 00:00:00
9 19:45:33 00:00:00 00:00:00
I tried to use the code below, but it doesn't work:
minValues = []
for column in df:
    #print(df[column])
    if "00:00:00" in df[column]:
        minValues.append(df[column].nlargest(2).iloc[-1])
    else:
        minValues.append(df[column].min())
print(df)
print(minValues)
The idea is to replace the zeros with missing values and then get the minimal timedeltas:
df1 = df.astype(str).apply(pd.to_timedelta)
s1 = df1.mask(df1.eq(pd.Timedelta(0))).min()
print (s1)
10.0.0.155 0 days 19:45:33
192.168.1.240 0 days 16:23:20
192.168.0.242 0 days 20:14:07
dtype: timedelta64[ns]
Or get the minimal datetimes and then convert the output to HH:MM:SS values:
df1 = df.astype(str).apply(pd.to_datetime)
s2 = df1.mask(df1.eq(pd.to_datetime("00:00:00"))).min().dt.strftime('%H:%M:%S')
print (s2)
10.0.0.155 19:45:33
192.168.1.240 16:23:20
192.168.0.242 20:14:07
dtype: object
Or to times:
df1 = df.astype(str).apply(pd.to_datetime)
s3 = df1.mask(df1.eq(pd.to_datetime("00:00:00"))).min().dt.time
print (s3)
10.0.0.155 19:45:33
192.168.1.240 16:23:20
192.168.0.242 20:14:07
dtype: object
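For reference, a self-contained sketch of the timedelta route (the frame below is a shortened, hypothetical version of the one in the question):
import pandas as pd

df = pd.DataFrame({'10.0.0.155':    ['19:48:46', '20:15:46', '19:45:33'],
                   '192.168.1.240': ['16:23:40', '16:23:39', '00:00:00'],
                   '192.168.0.242': ['20:14:07', '00:00:00', '00:00:00']})

# convert every column to timedeltas, hide the zero entries, take the column minima
df1 = df.astype(str).apply(pd.to_timedelta)
minValues = df1.mask(df1.eq(pd.Timedelta(0))).min().tolist()
print(minValues)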

How to read in unusual date\time format

I have a small df with a date/time column using a format I have never seen.
Pandas reads it in as an object even if I use parse_dates, and to_datetime() chokes on it.
The dates in the column are formatted as such:
2019/12/29 GMT+8 18:00
2019/12/15 GMT+8 05:00
I think the best approach is using a date parsing pattern. Something like this:
dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
df = pd.read_csv(infile, parse_dates=['datetime'], date_parser=dateparse)
But I simply do not know how to approach this format.
The datetime format code for a UTC offset is very specific.
strftime() and strptime() Format Codes
The %z offset must be + or - followed by a zero-padded hh:mm value.
Use str.zfill to zero-pad between the sign and the digits, giving
+08:00 or -08:00 or +10:00 or -10:00
import pandas as pd
# sample data
df = pd.DataFrame({'datetime': ['2019/12/29 GMT+8 18:00', '2019/12/15 GMT+8 05:00', '2019/12/15 GMT+10 05:00', '2019/12/15 GMT-10 05:00']})
# display(df)
datetime
2019/12/29 GMT+8 18:00
2019/12/15 GMT+8 05:00
2019/12/15 GMT+10 05:00
2019/12/15 GMT-10 05:00
# fix the format
df.datetime = df.datetime.str.split(' ').apply(lambda x: x[0] + x[2] + x[1][3:].zfill(3) + ':00')
# convert to a utc datetime
df.datetime = pd.to_datetime(df.datetime, format='%Y/%m/%d%H:%M%z', utc=True)
# display(df)
datetime
2019-12-29 10:00:00+00:00
2019-12-14 21:00:00+00:00
2019-12-14 19:00:00+00:00
2019-12-15 15:00:00+00:00
print(df.info())
[out]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 datetime 4 non-null datetime64[ns, UTC]
dtypes: datetime64[ns, UTC](1)
memory usage: 160.0 bytes
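If you then want the values in a particular zone rather than UTC, tz_convert on the dt accessor does the shift (the zone name here is just an example):
df.datetime = df.datetime.dt.tz_convert('Asia/Singapore')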
You could pass the custom format with GMT+8 in the middle and then subtract eight hours with timedelta(hours=8) (this only works if every row carries the same GMT+8 offset):
import pandas as pd
from datetime import timedelta
df['Date'] = pd.to_datetime(df['Date'], format='%Y/%m/%d GMT+8 %H:%M') - timedelta(hours=8)
df
Date
0 2019-12-29 10:00:00
1 2019-12-14 21:00:00
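A self-contained sketch of the offset-aware route that also copes with offsets other than +8 (sample strings copied from the question; the regex rewrite is an assumed way to massage them into a form %z can parse):
import pandas as pd

df = pd.DataFrame({'Date': ['2019/12/29 GMT+8 18:00', '2019/12/15 GMT-10 05:00']})

# rewrite '2019/12/29 GMT+8 18:00' as '2019/12/29 18:00+08:00'
df['Date'] = df['Date'].str.replace(
    r'^(\S+) GMT([+-]\d+) (\S+)$',
    lambda m: f"{m.group(1)} {m.group(3)}{int(m.group(2)):+03d}:00",
    regex=True)

df['Date'] = pd.to_datetime(df['Date'], format='%Y/%m/%d %H:%M%z', utc=True)
print(df)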

Multiple columns to datetime as an index without losing other column

I have a dataframe that looks like this (except much longer). I want to convert it to a datetime index.
YYYY MM D value
679 1900 1 1 46.42
1355 1900 2 1 137.14
1213 1900 3 1 104.25
1380 1900 4 1 149.39
1336 1900 5 1 130.33
When I use this
df = pd.to_datetime((df.YYYY*10000+df.MM*100+df.D).apply(str),format='%Y%m%d')
I retrieve a datetime index but I lose the value column.
What I want in the end is -
value
1900-01-01 46.42
1900-02-01 137.14
1900-03-01 104.25
1900-04-01 149.39
1900-05-01 130.33
How can I do this?
Thank you for your time in advance!
You can use pandas to_datetime to convert the columns, set the result as the index, and drop the original columns:
df = df.astype(str)
df.index = pd.to_datetime(df['YYYY'] +' '+ df['MM']+' ' +df['D'])
df.drop(['YYYY','MM','D'],axis=1,inplace=True)
Out:
value
1900-01-01 46.42
1900-02-01 137.14
1900-03-01 104.25
1900-04-01 149.39
1900-05-01 130.33
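Alternatively, pd.to_datetime can assemble datetimes directly from year/month/day columns, so renaming the columns avoids the string concatenation. A sketch using the sample values from the question:
import pandas as pd

df = pd.DataFrame({'YYYY': [1900, 1900, 1900], 'MM': [1, 2, 3],
                   'D': [1, 1, 1], 'value': [46.42, 137.14, 104.25]})

# to_datetime understands columns named year/month/day
idx = pd.to_datetime(df[['YYYY', 'MM', 'D']]
                     .rename(columns={'YYYY': 'year', 'MM': 'month', 'D': 'day'}))
df = df.set_index(idx)[['value']]
print(df)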

Select the data from between two timestamp in python

My query is regarding getting the data between two given timestamps in Python.
I need an input field where I can enter the two timestamps, and then retrieve the corresponding rows from the CSV that was read in.
Actual data (CSV):
Daily_KWH_System PowerScout Temperature Timestamp Visibility Daily_electric_cost kW_System
0 4136.900384 P371602077 0 07/09/2016 23:58 0 180.657705 162.224216
1 3061.657187 P371602077 66 08/09/2016 23:59 10 133.693074 174.193804
2 4099.614033 P371602077 63 09/09/2016 05:58 10 179.029562 162.774013
3 3922.490275 P371602077 63 10/09/2016 11:58 10 171.297701 169.230047
4 3957.128982 P371602077 88 11/09/2016 17:58 10 172.806125 164.099307
Example:
Input:
start date : 2-1-2017
end date :10-1-2017
Output
Timestamp Value
2-1-2017 10
3-1-2017 35
.
.
.
.
10-1-2017 25
The original CSV would contain all the data
Timestamp Value
1-12-2016 10
2-12-2016 25
.
.
.
1-1-2017 15
2-1-2017 10
.
.
.
10-1-2017 25
.
.
31-1-2017 50
Use pd.read_csv to read the file:
df = pd.read_csv('my.csv', index_col='Timestamp', parse_dates=[0])
Then use your inputs to slice:
df[start_date:end_date]
It seems you need dayfirst=True in read_csv, and then you can select with [] if both the start and end dates are in df.index:
import pandas as pd
from io import StringIO
temp=u"""Timestamp;Value
1-12-2016;10
2-12-2016;25
1-1-2017;15
2-1-2017;10
10-1-2017;25
31-1-2017;50"""
#after testing replace 'StringIO(temp)' to 'filename.csv'
#if necessary add sep
#index_col=[0] convert first column to index
#parse_dates=[0] parse first column to datetime
df = pd.read_csv(StringIO(temp), sep=";", index_col=[0], parse_dates=[0], dayfirst=True)
print (df)
Value
Timestamp
2016-12-01 10
2016-12-02 25
2017-01-01 15
2017-01-02 10
2017-01-10 25
2017-01-31 50
print (df.index.dtype)
datetime64[ns]
print (df.index)
DatetimeIndex(['2016-12-01', '2016-12-02', '2017-01-01', '2017-01-02',
'2017-01-10', '2017-01-31'],
dtype='datetime64[ns]', name='Timestamp', freq=None)
start_date = pd.to_datetime('2-1-2017', dayfirst=True)
end_date = pd.to_datetime('10-1-2017', dayfirst=True)
print (df[start_date:end_date])
Value
Timestamp
2017-01-02 10
2017-01-10 25
If some dates are not in the index, you need boolean indexing:
start_date = pd.to_datetime('3-1-2017', dayfirst=True)
end_date = pd.to_datetime('10-1-2017', dayfirst=True)
print (df[(df.index >= start_date) & (df.index <= end_date)])
Value
Timestamp
2017-01-10 25
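If the index is a sorted DatetimeIndex, label slicing with .loc also works; it is inclusive of both endpoints and they do not have to be present in the index (a short sketch on the same df built above):
print(df.sort_index().loc['2017-01-03':'2017-01-10'])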
