How to get a dictionary or set of data within a particular time frame? - python-3.x

My datafile has a datetime index, with date and time in the format 1900-01-01 07:35:23.253.
I have one million records where multiple data points are collected every minute.
datafile =
TIme                       datapoint1   datapoint2
1900-01-01 07:35:23.253    A            B
1900-01-01 07:35:23.253    B            BH
1900-01-01 08:35:23.253    V            gh
1900-01-01 09:35:23.253    u            90
1900-01-01 09:36:23.253    i            op
1900-01-01 10:36:23.253    y            op
1900-01-01 10:46:23.253    ir           op
So my output should be the number of rows within each one-hour interval, like below:
07:00:00-08:00:00 - 2
08:00:00-09:00:00 - 1
09:00:00-10:00:00 - 2
10:00:00-11:00:00 - 2

You can use pd.Grouper with freq='1H', then strftime to adjust the display format, and pd.DateOffset(hours=1) to add one hour to build the interval label (note: the result is a string):
df['TIme'] = pd.to_datetime(df['TIme'])
df = df.groupby(pd.Grouper(freq='1H', key='TIme'))['datapoint1'].count().reset_index()
df['TIme'] = (df['TIme'].astype(str) + '-' +
              (df['TIme'] + pd.DateOffset(hours=1)).dt.strftime('%H:%M:%S'))
df
Out[1]:
TIme datapoint1
0 1900-01-01 07:00:00-08:00:00 2
1 1900-01-01 08:00:00-09:00:00 1
2 1900-01-01 09:00:00-10:00:00 2
3 1900-01-01 10:00:00-11:00:00 2
If TIme is on the index, you can first run df = df.reset_index() before the code above, and df = df.set_index('TIme') after it:
# df['TIme'] = pd.to_datetime(df['TIme'])
# df = df.set_index('TIme')
df = df.reset_index()
df = df.groupby(pd.Grouper(freq='1H', key='TIme'))['datapoint1'].count().reset_index()
df['TIme'] = (df['TIme'].astype(str) + '-' +
              (df['TIme'] + pd.DateOffset(hours=1)).dt.strftime('%H:%M:%S'))
df = df.set_index('TIme')
df
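An alternative sketch of the same count (the frame and column names are just the sample from the question): flooring each timestamp to the hour with dt.floor and counting rows per bucket gives the same numbers without a Grouper:

```python
import pandas as pd

# hypothetical frame mirroring the sample data in the question
df = pd.DataFrame({
    'TIme': ['1900-01-01 07:35:23.253', '1900-01-01 07:35:23.253',
             '1900-01-01 08:35:23.253', '1900-01-01 09:35:23.253',
             '1900-01-01 09:36:23.253', '1900-01-01 10:36:23.253',
             '1900-01-01 10:46:23.253'],
    'datapoint1': ['A', 'B', 'V', 'u', 'i', 'y', 'ir'],
})
df['TIme'] = pd.to_datetime(df['TIme'])

# floor every timestamp down to its hour, then count rows per hour bucket
counts = df.groupby(df['TIme'].dt.floor('h')).size()
print(counts)
```

Unlike pd.Grouper, this only produces buckets for hours that actually contain data.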

Related

How to convert the Time column (object data type, with nanoseconds) to datetime?

I have the below dataset in object data type. I want to change it to datetime.
0 00:00:00.000:
1 00:00:00.000:
2 00:00:00.000:
3 00:00:00.000:
4 00:00:00.000:
...
4943983 16:11:21.000:
4943984 16:11:24.000:
4943986 16:11:39.000:
4943987 16:11:51.000:
Name: Time, Length: 4943988, dtype: object
I tried the command below, but it replaced all the values with NaN.
timefmt = "%H:%M:%S"
dadr['Time'] = pd.to_datetime(dadr['Time'], errors='coerce').dt.strftime(timefmt)
Output:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
..
4943983 NaN
4943984 NaN
4943985 NaN
4943986 NaN
4943987 NaN
Name: Time, Length: 4943988, dtype: float64
I would like to add that there are time fields with non-zero fractional seconds, which fail with an error such as: time data '07:05:15.026:' does not match format '%H:%M:%S.000:' (match)
You can try to put timefmt into the format= parameter of pd.to_datetime:
timefmt = "%H:%M:%S.000:"
df['Time'] = pd.to_datetime(df['Time'], format=timefmt)
print(df)
Prints:
idx Time
0 4943983 1900-01-01 16:11:21
1 4943984 1900-01-01 16:11:24
2 4943985 1900-01-01 16:11:38
3 4943986 1900-01-01 16:11:39
4 4943987 1900-01-01 16:11:51
EDIT: To parse the fractional seconds after the ., you can use %f:
timefmt = "%H:%M:%S.%f:"
df['Time'] = pd.to_datetime(df['Time'], format=timefmt)
print(df)
Prints:
idx Time
0 4943983 1900-01-01 16:11:21.100
1 4943984 1900-01-01 16:11:24.200
2 4943985 1900-01-01 16:11:38.300
3 4943986 1900-01-01 16:11:39.400
4 4943987 1900-01-01 16:11:51.500
You can try this:
# build your dataset
times = [
'00:00:00.000:',
'16:11:21.000:',
'16:11:24.000:',
'16:11:38.000:',
'16:11:39.000:',
'16:11:51.000:'
]
# remove the trailing colon in your timestamps
times = [x.rstrip(':') for x in times]
# create your dataset
df = pd.DataFrame(pd.Series(times))
# perform your request
df['time'] = df[0].apply(lambda x: pd.Timestamp(x).strftime('%H:%M:%S.%f'))
Then df['time'] contains strings like '16:11:21.000000'.
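A compact variant of the same idea (the sample values are taken from the question): strip the stray trailing colon with str.rstrip and let pd.to_datetime parse the fraction with %f:

```python
import pandas as pd

times = ['00:00:00.000:', '16:11:21.000:', '16:11:51.000:']
s = pd.Series(times)

# drop the trailing colon, then parse hours:minutes:seconds.fraction
parsed = pd.to_datetime(s.str.rstrip(':'), format='%H:%M:%S.%f')
print(parsed.dt.strftime('%H:%M:%S').tolist())
```

This also copes with non-zero fractions such as '07:05:15.026:', which the fixed '.000:' format rejects.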

Format datetime values in pandas by stripping

I have a df['timestamp'] column which has values in format: yyyy-mm-ddThh:mm:ssZ. The dtype is object.
Now, I want to split the value into 3 new columns, 1 for day, 1 for day index(mon,tues,wed,..) and 1 for hour like this:
Current: column=timestamp
yyyy-mm-ddThh:mm:ssZ
Desired:
New Col1|New Col2|New Col3
dd|hh|day_index
What function should I use?
Since you said column timestamp is of type object, I assume it's a string. Since the format is fixed, use str.slice to get the corresponding characters. To get the weekdays, use dt.day_name() on the datetime64 column converted from timestamp.
data = {'timestamp': ['2019-07-01T05:23:33Z', '2019-07-03T02:12:33Z', '2019-07-23T11:05:23Z', '2019-07-12T08:15:51Z'], 'Val': [1.24,1.259, 1.27,1.298] }
df = pd.DataFrame(data)
ds = pd.to_datetime(df['timestamp'], errors='coerce')  # ISO strings parse without an explicit format
df['datetime'] = ds
df['dd'] = df['timestamp'].str.slice(start=8, stop=10)
df['hh'] = df['timestamp'].str.slice(start=11, stop=13)
df['weekday'] = df['datetime'].dt.day_name()
print(df)
The output:
timestamp Val datetime dd hh weekday
0 2019-07-01T05:23:33Z 1.240 2019-07-01 05:23:33+00:00 01 05 Monday
1 2019-07-03T02:12:33Z 1.259 2019-07-03 02:12:33+00:00 03 02 Wednesday
2 2019-07-23T11:05:23Z 1.270 2019-07-23 11:05:23+00:00 23 11 Tuesday
3 2019-07-12T08:15:51Z 1.298 2019-07-12 08:15:51+00:00 12 08 Friday
First convert the df['timestamp'] column to a DateTime object. Then extract Year, Month & Day from it. Code below.
df['timestamp'] = pd.to_datetime(df['timestamp'], errors='coerce')
df['Year'] = df['timestamp'].dt.year
df['Month'] = df['timestamp'].dt.month
df['Day'] = df['timestamp'].dt.day
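If string slicing feels fragile, the same columns can come entirely from the dt accessor once the column is parsed; a minimal sketch, reusing two of the hypothetical timestamps above:

```python
import pandas as pd

df = pd.DataFrame({'timestamp': ['2019-07-01T05:23:33Z', '2019-07-12T08:15:51Z']})
ts = pd.to_datetime(df['timestamp'])

# zero-padded day and hour plus the weekday name, all from the datetime accessor
df['dd'] = ts.dt.strftime('%d')
df['hh'] = ts.dt.strftime('%H')
df['weekday'] = ts.dt.day_name()
print(df)
```

This keeps working even if the string layout changes, since everything is derived from the parsed datetimes.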

Multiple columns to datetime as an index without losing other column

I have a dataframe that looks like this (except much longer). I want to convert to a datetime index.
YYYY MM D value
679 1900 1 1 46.42
1355 1900 2 1 137.14
1213 1900 3 1 104.25
1380 1900 4 1 149.39
1336 1900 5 1 130.33
When I use this
df = pd.to_datetime((df.YYYY*10000+df.MM*100+df.D).apply(str),format='%Y%m%d')
I retrieve a datetime index but I lose the value column.
What I want in the end is -
value
1900-01-01 46.42
1900-02-01 137.14
1900-03-01 104.25
1900-04-01 149.39
1900-05-01 130.33
How can I do this?
Thank you for your time in advance!
You can use pandas' to_datetime to convert this:
df = df.astype(str)
df.index = pd.to_datetime(df['YYYY'] + ' ' + df['MM'] + ' ' + df['D'])
df.drop(['YYYY', 'MM', 'D'], axis=1, inplace=True)
Out:
value
1900-01-01 46.42
1900-02-01 137.14
1900-03-01 104.25
1900-04-01 149.39
1900-05-01 130.33
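pd.to_datetime can also assemble dates directly from year/month/day columns once they carry the names it expects; a sketch of that variant (column names as in the question):

```python
import pandas as pd

df = pd.DataFrame({'YYYY': [1900, 1900], 'MM': [1, 2], 'D': [1, 1],
                   'value': [46.42, 137.14]})

# to_datetime accepts a frame whose columns are named year/month/day
df.index = pd.to_datetime(
    df[['YYYY', 'MM', 'D']].rename(columns={'YYYY': 'year', 'MM': 'month', 'D': 'day'}))
df = df.drop(columns=['YYYY', 'MM', 'D'])
print(df)
```

This avoids the string concatenation entirely and keeps the value column intact.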

Change all values of a column in pandas data frame

I have a pandas data frame that contains a column with values like '01:00'. I want to deduct 1 hour from it, so '01:00' becomes '00:00'. Can anyone help?
You can use timedeltas:
df = pd.DataFrame({'col':['01:00', '02:00', '24:00']})
df['new'] = pd.to_timedelta(df['col'] + ':00') - pd.Timedelta(1, unit='h')
# slice 'HH:MM' out of the timedelta's string form (robust to repr changes across pandas versions)
df['new'] = df['new'].astype(str).str.extract(r'(\d{2}:\d{2}):\d{2}', expand=False)
print (df)
col new
0 01:00 00:00
1 02:00 01:00
2 24:00 23:00
Another, faster solution using map, if the format of all strings is 01:00 to 24:00:
L = ['{:02d}:00'.format(x) for x in range(25)]
d = dict(zip(L[1:], L[:-1]))
df['new'] = df['col'].map(d)
print (df)
col new
0 01:00 00:00
1 02:00 01:00
2 24:00 23:00
It seems that what you want is to subtract an hour from the time, which is stored in your dataframe as a string. You can do the following:
from datetime import datetime, timedelta
subtract_one_hour = lambda x: (datetime.strptime(x, '%H:%M') - timedelta(hours=1)).strftime("%H:%M")
df['minus_one_hour'] = df['original_time'].apply(subtract_one_hour)
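One caveat: strptime only accepts hours 0-23, so the lambda above raises on a value like '24:00', while the timedelta approach handles it; a minimal sketch of the timedelta path:

```python
import pandas as pd

s = pd.Series(['01:00', '24:00'])

# timedeltas happily accept 24 hours and beyond, unlike strptime's %H
shifted = pd.to_timedelta(s + ':00') - pd.Timedelta(hours=1)
print(shifted)
```

If the data is guaranteed to stay within 00:00-23:59, the strptime lambda is fine as-is.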

Python Subtracting two columns with date data, from csv to get number of weeks , months?

I have a csv with two columns representing a start date st_dt and an end date end_dt. I have to subtract these columns to get the number of weeks. I tried iterating through the columns using pandas, but my output seems wrong.
st_dt end_dt
---------------------------------------
20100315 20100431
Use read_csv with parse_dates to get datetimes, then subtract and take the days:
df = pd.read_csv(file, parse_dates=[0,1])
print (df)
st_dt end_dt
0 2010-03-15 2010-04-30
df['diff'] = (df['end_dt'] - df['st_dt']).dt.days
print (df)
st_dt end_dt diff
0 2010-03-15 2010-04-30 46
If some dates are invalid, like 20100431, use to_datetime with the parameter errors='coerce' to convert them to NaT:
df = pd.read_csv(file)
print (df)
st_dt end_dt
0 20100315 20100431
1 20100315 20100430
df['st_dt'] = pd.to_datetime(df['st_dt'], errors='coerce', format='%Y%m%d')
df['end_dt'] = pd.to_datetime(df['end_dt'], errors='coerce', format='%Y%m%d')
df['diff'] = (df['end_dt'] - df['st_dt']).dt.days
print (df)
st_dt end_dt diff
0 2010-03-15 NaT NaN
1 2010-03-15 2010-04-30 46.0
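Since the question asks for weeks rather than days, the day difference can simply be divided by seven; a minimal sketch on the valid row:

```python
import pandas as pd

df = pd.DataFrame({'st_dt': ['20100315'], 'end_dt': ['20100430']})
df['st_dt'] = pd.to_datetime(df['st_dt'], format='%Y%m%d')
df['end_dt'] = pd.to_datetime(df['end_dt'], format='%Y%m%d')

days = (df['end_dt'] - df['st_dt']).dt.days
df['weeks'] = days // 7        # whole weeks
df['weeks_frac'] = days / 7    # fractional weeks
print(df)
```

Whether to floor, round, or keep the fraction depends on how the week count will be used downstream.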
