Hours, minutes and seconds are not showing in my timestamp after converting a datetime [duplicate] - python-3.x

I have a column I_DATE of type string (object) in a dataframe called train, as shown below.
I_DATE
28-03-2012 2:15:00 PM
28-03-2012 2:17:28 PM
28-03-2012 2:50:50 PM
How can I convert I_DATE from string to datetime format and specify the format of the input string?
Also, how can I filter rows based on a range of dates in pandas?

Use to_datetime. There is no need for a format string since the parser is able to handle it:
In [51]:
pd.to_datetime(df['I_DATE'])
Out[51]:
0 2012-03-28 14:15:00
1 2012-03-28 14:17:28
2 2012-03-28 14:50:50
Name: I_DATE, dtype: datetime64[ns]
To access the date/day/time component use the dt accessor:
In [54]:
df['I_DATE'].dt.date
Out[54]:
0 2012-03-28
1 2012-03-28
2 2012-03-28
dtype: object
In [56]:
df['I_DATE'].dt.time
Out[56]:
0 14:15:00
1 14:17:28
2 14:50:50
dtype: object
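Note that the dt accessor only works once the column actually holds datetime values; on the raw strings it raises an AttributeError. A minimal sketch:
df['I_DATE'] = pd.to_datetime(df['I_DATE'])  # convert first
df['I_DATE'].dt.date                         # now the accessor works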
You can use strings to filter, for example:
In [59]:
import datetime as dt
df = pd.DataFrame({'date': pd.date_range(start=dt.datetime(2015, 1, 1), end=dt.datetime.now())})
df[(df['date'] > '2015-02-04') & (df['date'] < '2015-02-10')]
Out[59]:
date
35 2015-02-05
36 2015-02-06
37 2015-02-07
38 2015-02-08
39 2015-02-09
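Putting the pieces together, here is a minimal self-contained sketch (the sample values are copied from the question; the explicit format string and the filter bounds are illustrative assumptions):
import pandas as pd

df = pd.DataFrame({'I_DATE': ['28-03-2012 2:15:00 PM',
                              '28-03-2012 2:17:28 PM',
                              '28-03-2012 2:50:50 PM']})

# parse with an explicit format: day first, 12-hour clock with AM/PM
df['I_DATE'] = pd.to_datetime(df['I_DATE'], format='%d-%m-%Y %I:%M:%S %p')

# keep only the rows that fall inside a datetime range
mask = (df['I_DATE'] >= '2012-03-28 14:17:00') & (df['I_DATE'] <= '2012-03-28 15:00:00')
print(df[mask])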

Approach 1
Given the original string format 2019/03/04 00:08:48,
you can use
updated_df = df['timestamp'].astype('datetime64[ns]')
The result will be in this datetime format: 2019-03-04 00:08:48
Approach 2
updated_df = df.astype({'timestamp':'datetime64[ns]'})
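A quick sketch contrasting the two approaches (the column name 'timestamp' and the sample values are assumptions, mirroring the snippets above):
import pandas as pd

df = pd.DataFrame({'timestamp': ['2019/03/04 00:08:48', '2019/03/05 10:20:30']})

# Approach 1 returns just the converted Series
ts = df['timestamp'].astype('datetime64[ns]')

# Approach 2 returns a new DataFrame with the column converted
converted = df.astype({'timestamp': 'datetime64[ns]'})
print(converted.dtypes)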

For a datetime in AM/PM format, the time part of the format string is '%I:%M:%S %p'. See all possible format codes at https://strftime.org/. N.B. if the values have a time component, as in the OP, the conversion will be much, much faster if you pass format= explicitly.
df['I_DATE'] = pd.to_datetime(df['I_DATE'], format='%d-%m-%Y %I:%M:%S %p')
To filter a datetime using a range, you can use query:
df = pd.DataFrame({'date': pd.date_range('2015-01-01', '2015-04-01')})
df.query("'2015-02-04' < date < '2015-02-10'")
or use between to create a mask and filter:
df[df['date'].between('2015-02-04', '2015-02-10')]
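Applied to the OP's I_DATE column after the conversion above, for example (the bounds here are illustrative):
df.query("'2012-03-28 14:17:00' <= I_DATE <= '2012-03-28 15:00:00'")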

Related

How to get the minimum time value in a dataframe with excluding specific value

I have a dataframe in the format shown below. I want to get the minimum time value for each column and save it in a list, while excluding the specific value 00:00:00 from being counted as a minimum in any column of the dataframe.
df =
10.0.0.155 192.168.1.240 192.168.0.242
0 19:48:46 16:23:40 20:14:07
1 20:15:46 16:23:39 20:14:09
2 19:49:37 16:23:20 00:00:00
3 20:15:08 00:00:00 00:00:00
4 19:48:46 00:00:00 00:00:00
5 19:47:30 00:00:00 00:00:00
6 19:49:13 00:00:00 00:00:00
7 20:15:50 00:00:00 00:00:00
8 19:45:34 00:00:00 00:00:00
9 19:45:33 00:00:00 00:00:00
I tried to use the code below, but it doesn't work:
minValues = []
for column in df:
    #print(df[column])
    if "00:00:00" in df[column]:
        minValues.append(df[column].nlargest(2).iloc[-1])
    else:
        minValues.append(df[column].min())
print(df)
print(minValues)
The idea is to replace the zeros with missing values and then get the minimal timedeltas:
df1 = df.astype(str).apply(pd.to_timedelta)
s1 = df1.mask(df1.eq(pd.Timedelta(0))).min()
print (s1)
10.0.0.155 0 days 19:45:33
192.168.1.240 0 days 16:23:20
192.168.0.242 0 days 20:14:07
dtype: timedelta64[ns]
Or get the minimal datetimes and then convert the output to HH:MM:SS values:
df1 = df.astype(str).apply(pd.to_datetime)
s2 = df1.mask(df1.eq(pd.to_datetime("00:00:00"))).min().dt.strftime('%H:%M:%S')
print (s2)
10.0.0.155 19:45:33
192.168.1.240 16:23:20
192.168.0.242 20:14:07
dtype: object
Or to times:
df1 = df.astype(str).apply(pd.to_datetime)
s3 = df1.mask(df1.eq(pd.to_datetime("00:00:00"))).min().dt.time
print (s3)
10.0.0.155 19:45:33
192.168.1.240 16:23:20
192.168.0.242 20:14:07
dtype: object
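For reference, a self-contained sketch of the timedelta route (the frame below is a shortened, hypothetical version of the one in the question):
import pandas as pd

df = pd.DataFrame({'10.0.0.155':    ['19:48:46', '20:15:46', '19:45:33'],
                   '192.168.1.240': ['16:23:40', '16:23:39', '00:00:00'],
                   '192.168.0.242': ['20:14:07', '00:00:00', '00:00:00']})

# convert every column to timedeltas, hide the zero entries, take the column minima
df1 = df.astype(str).apply(pd.to_timedelta)
minValues = df1.mask(df1.eq(pd.Timedelta(0))).min().tolist()
print(minValues)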

How to read in unusual date\time format

I have a small df with a date/time column using a format I have never seen.
Pandas reads it in as an object even if I use parse_dates, and to_datetime() chokes on it.
The dates in the column are formatted as such:
2019/12/29 GMT+8 18:00
2019/12/15 GMT+8 05:00
I think the best approach is using a date parsing pattern. Something like this:
dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
df = pd.read_csv(infile, parse_dates=['datetime'], date_parser=dateparse)
But I simply do not know how to approach this format.
The datetime format code for a UTC offset is very specific.
strftime() and strptime() Format Codes
The %z offset must be + or - followed by a zero-padded hh:mm value.
Use str.zfill to zero-pad between the sign and the digits, giving
+08:00 or -08:00 or +10:00 or -10:00
import pandas as pd
# sample data
df = pd.DataFrame({'datetime': ['2019/12/29 GMT+8 18:00', '2019/12/15 GMT+8 05:00', '2019/12/15 GMT+10 05:00', '2019/12/15 GMT-10 05:00']})
# display(df)
datetime
2019/12/29 GMT+8 18:00
2019/12/15 GMT+8 05:00
2019/12/15 GMT+10 05:00
2019/12/15 GMT-10 05:00
# fix the format
df.datetime = df.datetime.str.split(' ').apply(lambda x: x[0] + x[2] + x[1][3:].zfill(3) + ':00')
# convert to a utc datetime
df.datetime = pd.to_datetime(df.datetime, format='%Y/%m/%d%H:%M%z', utc=True)
# display(df)
datetime
2019-12-29 10:00:00+00:00
2019-12-14 21:00:00+00:00
2019-12-14 19:00:00+00:00
2019-12-15 15:00:00+00:00
print(df.info())
[out]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 datetime 4 non-null datetime64[ns, UTC]
dtypes: datetime64[ns, UTC](1)
memory usage: 160.0 bytes
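If you then want the values in a particular zone rather than UTC, tz_convert on the dt accessor does the shift (the zone name here is just an example):
df.datetime = df.datetime.dt.tz_convert('Asia/Singapore')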
You could pass the custom format with GMT+8 in the middle and then subtract eight hours with timedelta(hours=8) (this only works if every row carries the same GMT+8 offset):
import pandas as pd
from datetime import timedelta
df['Date'] = pd.to_datetime(df['Date'], format='%Y/%m/%d GMT+8 %H:%M') - timedelta(hours=8)
df
Date
0 2019-12-29 10:00:00
1 2019-12-14 21:00:00
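A self-contained sketch of the offset-aware route that also copes with offsets other than +8 (sample strings copied from the question; the regex rewrite is an assumed way to massage them into a form %z can parse):
import pandas as pd

df = pd.DataFrame({'Date': ['2019/12/29 GMT+8 18:00', '2019/12/15 GMT-10 05:00']})

# rewrite '2019/12/29 GMT+8 18:00' as '2019/12/29 18:00+08:00'
df['Date'] = df['Date'].str.replace(
    r'^(\S+) GMT([+-]\d+) (\S+)$',
    lambda m: f"{m.group(1)} {m.group(3)}{int(m.group(2)):+03d}:00",
    regex=True)

df['Date'] = pd.to_datetime(df['Date'], format='%Y/%m/%d %H:%M%z', utc=True)
print(df)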

Multiple columns to datetime as an index without losing other column

I have a dataframe that looks like this (except much longer). I want to convert it to a datetime index.
YYYY MM D value
679 1900 1 1 46.42
1355 1900 2 1 137.14
1213 1900 3 1 104.25
1380 1900 4 1 149.39
1336 1900 5 1 130.33
When I use this
df = pd.to_datetime((df.YYYY*10000+df.MM*100+df.D).apply(str),format='%Y%m%d')
I retrieve a datetime index but I lose the value column.
What I want in the end is -
value
1900-01-01 46.42
1900-02-01 137.14
1900-03-01 104.25
1900-04-01 149.39
1900-05-01 130.33
How can I do this?
Thank you for your time in advance!
You can use pandas to_datetime to convert the columns, set the result as the index, and drop the original columns:
df = df.astype(str)
df.index = pd.to_datetime(df['YYYY'] +' '+ df['MM']+' ' +df['D'])
df.drop(['YYYY','MM','D'],axis=1,inplace=True)
Out:
value
1900-01-01 46.42
1900-02-01 137.14
1900-03-01 104.25
1900-04-01 149.39
1900-05-01 130.33
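Alternatively, pd.to_datetime can assemble datetimes directly from year/month/day columns, so renaming the columns avoids the string concatenation. A sketch using the sample values from the question:
import pandas as pd

df = pd.DataFrame({'YYYY': [1900, 1900, 1900], 'MM': [1, 2, 3],
                   'D': [1, 1, 1], 'value': [46.42, 137.14, 104.25]})

# to_datetime understands columns named year/month/day
idx = pd.to_datetime(df[['YYYY', 'MM', 'D']]
                     .rename(columns={'YYYY': 'year', 'MM': 'month', 'D': 'day'}))
df = df.set_index(idx)[['value']]
print(df)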

Select the data from between two timestamp in python

My query is regarding getting the data between two given timestamps in Python.
I need an input field where I can enter the two timestamps, and then retrieve the corresponding rows from the CSV that was read in.
Actual data (CSV):
Daily_KWH_System PowerScout Temperature Timestamp Visibility Daily_electric_cost kW_System
0 4136.900384 P371602077 0 07/09/2016 23:58 0 180.657705 162.224216
1 3061.657187 P371602077 66 08/09/2016 23:59 10 133.693074 174.193804
2 4099.614033 P371602077 63 09/09/2016 05:58 10 179.029562 162.774013
3 3922.490275 P371602077 63 10/09/2016 11:58 10 171.297701 169.230047
4 3957.128982 P371602077 88 11/09/2016 17:58 10 172.806125 164.099307
Example:
Input:
start date : 2-1-2017
end date :10-1-2017
Output
Timestamp Value
2-1-2017 10
3-1-2017 35
.
.
.
.
10-1-2017 25
The original CSV would contain all the data
Timestamp Value
1-12-2016 10
2-12-2016 25
.
.
.
1-1-2017 15
2-1-2017 10
.
.
.
10-1-2017 25
.
.
31-1-2017 50
Use pd.read_csv to read the file:
df = pd.read_csv('my.csv', index_col='Timestamp', parse_dates=[0])
Then use your inputs to slice:
df[start_date:end_date]
It seems you need dayfirst=True in read_csv, and then you can select with [] if both the start and end dates are in df.index:
import pandas as pd
from io import StringIO
temp=u"""Timestamp;Value
1-12-2016;10
2-12-2016;25
1-1-2017;15
2-1-2017;10
10-1-2017;25
31-1-2017;50"""
#after testing replace 'StringIO(temp)' to 'filename.csv'
#if necessary add sep
#index_col=[0] convert first column to index
#parse_dates=[0] parse first column to datetime
df = pd.read_csv(StringIO(temp), sep=";", index_col=[0], parse_dates=[0], dayfirst=True)
print (df)
Value
Timestamp
2016-12-01 10
2016-12-02 25
2017-01-01 15
2017-01-02 10
2017-01-10 25
2017-01-31 50
print (df.index.dtype)
datetime64[ns]
print (df.index)
DatetimeIndex(['2016-12-01', '2016-12-02', '2017-01-01', '2017-01-02',
'2017-01-10', '2017-01-31'],
dtype='datetime64[ns]', name='Timestamp', freq=None)
start_date = pd.to_datetime('2-1-2017', dayfirst=True)
end_date = pd.to_datetime('10-1-2017', dayfirst=True)
print (df[start_date:end_date])
Value
Timestamp
2017-01-02 10
2017-01-10 25
If some dates are not in the index, you need boolean indexing:
start_date = pd.to_datetime('3-1-2017', dayfirst=True)
end_date = pd.to_datetime('10-1-2017', dayfirst=True)
print (df[(df.index >= start_date) & (df.index <= end_date)])
Value
Timestamp
2017-01-10 25
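If the index is a sorted DatetimeIndex, label slicing with .loc also works; it is inclusive of both endpoints and they do not have to be present in the index (a short sketch on the same df built above):
print(df.sort_index().loc['2017-01-03':'2017-01-10'])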
