How to get a dictionary or set of data within a particular time frame? - python-3.x

My datafile has a datetime index, with date and time in the format 1900-01-01 07:35:23.253.
I have one million records where multiple data points are collected every minute.
datafile =
TIme                       datapoint1   datapoint2
1900-01-01 07:35:23.253    A            B
1900-01-01 07:35:23.253    B            BH
1900-01-01 08:35:23.253    V            gh
1900-01-01 09:35:23.253    u            90
1900-01-01 09:36:23.253    i            op
1900-01-01 10:36:23.253    y            op
1900-01-01 10:46:23.253    ir           op
So my output should be the number of rows within each one-hour interval, like below:
07:00:00-08:00:00 - 2
08:00:00-09:00:00 - 1
09:00:00-10:00:00 - 2
10:00:00-11:00:00 - 2

You can use pd.Grouper with freq='1H', then strftime to adjust the display format, and pd.DateOffset(hours=1) to add one hour to build the interval label (note: the result is a string):
df['TIme'] = pd.to_datetime(df['TIme'])
df = df.groupby(pd.Grouper(freq='1H', key='TIme'))['datapoint1'].count().reset_index()
df['TIme'] = (df['TIme'].astype(str) + '-' +
              (df['TIme'] + pd.DateOffset(hours=1)).dt.strftime('%H:%M:%S'))
df
Out[1]:
TIme datapoint1
0 1900-01-01 07:00:00-08:00:00 2
1 1900-01-01 08:00:00-09:00:00 1
2 1900-01-01 09:00:00-10:00:00 2
3 1900-01-01 10:00:00-11:00:00 2
If TIme is on the index, you can first run df = df.reset_index() before the code above, and df = df.set_index('TIme') after it:
# df['TIme'] = pd.to_datetime(df['TIme'])
# df = df.set_index('TIme')
df = df.reset_index()
df = df.groupby(pd.Grouper(freq='1H', key='TIme'))['datapoint1'].count().reset_index()
df['TIme'] = (df['TIme'].astype(str) + '-' +
              (df['TIme'] + pd.DateOffset(hours=1)).dt.strftime('%H:%M:%S'))
df = df.set_index('TIme')
df
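An alternative sketch of the same count (the frame and column names are just the sample from the question): flooring each timestamp to the hour with dt.floor and counting rows per bucket gives the same numbers without a Grouper:

```python
import pandas as pd

# hypothetical frame mirroring the sample data in the question
df = pd.DataFrame({
    'TIme': ['1900-01-01 07:35:23.253', '1900-01-01 07:35:23.253',
             '1900-01-01 08:35:23.253', '1900-01-01 09:35:23.253',
             '1900-01-01 09:36:23.253', '1900-01-01 10:36:23.253',
             '1900-01-01 10:46:23.253'],
    'datapoint1': ['A', 'B', 'V', 'u', 'i', 'y', 'ir'],
})
df['TIme'] = pd.to_datetime(df['TIme'])

# floor every timestamp down to its hour, then count rows per hour bucket
counts = df.groupby(df['TIme'].dt.floor('h')).size()
print(counts)
```

Unlike pd.Grouper, this only produces buckets for hours that actually contain data.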

Related

How to convert the Time column (object data type, with nanoseconds) to datetime?

I have the below dataset in object data type. I want to change it to datetime.
0 00:00:00.000:
1 00:00:00.000:
2 00:00:00.000:
3 00:00:00.000:
4 00:00:00.000:
...
4943983 16:11:21.000:
4943984 16:11:24.000:
4943986 16:11:39.000:
4943987 16:11:51.000:
Name: Time, Length: 4943988, dtype: object
I tried the command below, but it replaced all the values with NaN.
timefmt = "%H:%M:%S"
dadr['Time'] = pd.to_datetime(dadr['Time'], errors='coerce').dt.strftime(timefmt)
Output:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
..
4943983 NaN
4943984 NaN
4943985 NaN
4943986 NaN
4943987 NaN
Name: Time, Length: 4943988, dtype: float64
I would like to add that there are time fields with non-zero fractional seconds, which fail with an error such as: time data '07:05:15.026:' does not match format '%H:%M:%S.000:' (match)
You can try to put timefmt into the format= parameter of pd.to_datetime:
timefmt = "%H:%M:%S.000:"
df['Time'] = pd.to_datetime(df['Time'], format=timefmt)
print(df)
Prints:
idx Time
0 4943983 1900-01-01 16:11:21
1 4943984 1900-01-01 16:11:24
2 4943985 1900-01-01 16:11:38
3 4943986 1900-01-01 16:11:39
4 4943987 1900-01-01 16:11:51
EDIT: To parse the fractional seconds after the ., you can use %f:
timefmt = "%H:%M:%S.%f:"
df['Time'] = pd.to_datetime(df['Time'], format=timefmt)
print(df)
Prints:
idx Time
0 4943983 1900-01-01 16:11:21.100
1 4943984 1900-01-01 16:11:24.200
2 4943985 1900-01-01 16:11:38.300
3 4943986 1900-01-01 16:11:39.400
4 4943987 1900-01-01 16:11:51.500
You can try this:
# build your dataset
times = [
'00:00:00.000:',
'16:11:21.000:',
'16:11:24.000:',
'16:11:38.000:',
'16:11:39.000:',
'16:11:51.000:'
]
# remove the trailing colon in your timestamps
times = [x.rstrip(':') for x in times]
# create your dataset
df = pd.DataFrame(pd.Series(times))
# perform your request
df['time'] = df[0].apply(lambda x: pd.Timestamp(x).strftime('%H:%M:%S.%f'))
Then df['time'] contains strings like '16:11:21.000000'.
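A compact variant of the same idea (the sample values are taken from the question): strip the stray trailing colon with str.rstrip and let pd.to_datetime parse the fraction with %f:

```python
import pandas as pd

times = ['00:00:00.000:', '16:11:21.000:', '16:11:51.000:']
s = pd.Series(times)

# drop the trailing colon, then parse hours:minutes:seconds.fraction
parsed = pd.to_datetime(s.str.rstrip(':'), format='%H:%M:%S.%f')
print(parsed.dt.strftime('%H:%M:%S').tolist())
```

This also copes with non-zero fractions such as '07:05:15.026:', which the fixed '.000:' format rejects.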

Format datetime values in pandas by stripping

I have a df['timestamp'] column which has values in format: yyyy-mm-ddThh:mm:ssZ. The dtype is object.
Now, I want to split the value into 3 new columns, 1 for day, 1 for day index(mon,tues,wed,..) and 1 for hour like this:
Current: column=timestamp
yyyy-mm-ddThh:mm:ssZ
Desired:
New Col1|New Col2|New Col3
dd|hh|day_index
What function should I use?
Since you said column timestamp is of type object, I assume it's a string. Since the format is fixed, use str.slice to get the corresponding characters. To get the weekdays, use dt.day_name() on the datetime64 column converted from timestamp.
data = {'timestamp': ['2019-07-01T05:23:33Z', '2019-07-03T02:12:33Z', '2019-07-23T11:05:23Z', '2019-07-12T08:15:51Z'], 'Val': [1.24,1.259, 1.27,1.298] }
df = pd.DataFrame(data)
ds = pd.to_datetime(df['timestamp'], errors='coerce')  # ISO strings parse without an explicit format
df['datetime'] = ds
df['dd'] = df['timestamp'].str.slice(start=8, stop=10)
df['hh'] = df['timestamp'].str.slice(start=11, stop=13)
df['weekday'] = df['datetime'].dt.day_name()
print(df)
The output:
timestamp Val datetime dd hh weekday
0 2019-07-01T05:23:33Z 1.240 2019-07-01 05:23:33+00:00 01 05 Monday
1 2019-07-03T02:12:33Z 1.259 2019-07-03 02:12:33+00:00 03 02 Wednesday
2 2019-07-23T11:05:23Z 1.270 2019-07-23 11:05:23+00:00 23 11 Tuesday
3 2019-07-12T08:15:51Z 1.298 2019-07-12 08:15:51+00:00 12 08 Friday
First convert the df['timestamp'] column to a DateTime object. Then extract Year, Month & Day from it. Code below.
df['timestamp'] = pd.to_datetime(df['timestamp'], errors='coerce')
df['Year'] = df['timestamp'].dt.year
df['Month'] = df['timestamp'].dt.month
df['Day'] = df['timestamp'].dt.day
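If string slicing feels fragile, the same columns can come entirely from the dt accessor once the column is parsed; a minimal sketch, reusing two of the hypothetical timestamps above:

```python
import pandas as pd

df = pd.DataFrame({'timestamp': ['2019-07-01T05:23:33Z', '2019-07-12T08:15:51Z']})
ts = pd.to_datetime(df['timestamp'])

# zero-padded day and hour plus the weekday name, all from the datetime accessor
df['dd'] = ts.dt.strftime('%d')
df['hh'] = ts.dt.strftime('%H')
df['weekday'] = ts.dt.day_name()
print(df)
```

This keeps working even if the string layout changes, since everything is derived from the parsed datetimes.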

Multiple columns to datetime as an index without losing other column

I have a dataframe that looks like this (except much longer). I want to convert to a datetime index.
YYYY MM D value
679 1900 1 1 46.42
1355 1900 2 1 137.14
1213 1900 3 1 104.25
1380 1900 4 1 149.39
1336 1900 5 1 130.33
When I use this
df = pd.to_datetime((df.YYYY*10000+df.MM*100+df.D).apply(str),format='%Y%m%d')
I retrieve a datetime index but I lose the value column.
What I want in the end is -
value
1900-01-01 46.42
1900-02-01 137.14
1900-03-01 104.25
1900-04-01 149.39
1900-05-01 130.33
How can I do this?
Thank you for your time in advance!
You can use pandas' to_datetime to convert this:
df = df.astype(str)
df.index = pd.to_datetime(df['YYYY'] + ' ' + df['MM'] + ' ' + df['D'])
df.drop(['YYYY', 'MM', 'D'], axis=1, inplace=True)
Out:
value
1900-01-01 46.42
1900-02-01 137.14
1900-03-01 104.25
1900-04-01 149.39
1900-05-01 130.33
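pd.to_datetime can also assemble dates directly from year/month/day columns once they carry the names it expects; a sketch of that variant (column names as in the question):

```python
import pandas as pd

df = pd.DataFrame({'YYYY': [1900, 1900], 'MM': [1, 2], 'D': [1, 1],
                   'value': [46.42, 137.14]})

# to_datetime accepts a frame whose columns are named year/month/day
df.index = pd.to_datetime(
    df[['YYYY', 'MM', 'D']].rename(columns={'YYYY': 'year', 'MM': 'month', 'D': 'day'}))
df = df.drop(columns=['YYYY', 'MM', 'D'])
print(df)
```

This avoids the string concatenation entirely and keeps the value column intact.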

Change all values of a column in pandas data frame

I have a pandas data frame that contains a column with values like '01:00'. I want to deduct 1 hour from it, so '01:00' becomes '00:00'. Can anyone help?
You can use timedeltas:
df = pd.DataFrame({'col':['01:00', '02:00', '24:00']})
df['new'] = pd.to_timedelta(df['col'] + ':00') - pd.Timedelta(1, unit='h')
# slice 'HH:MM' out of the timedelta's string form (robust to repr changes across pandas versions)
df['new'] = df['new'].astype(str).str.extract(r'(\d{2}:\d{2}):\d{2}', expand=False)
print (df)
col new
0 01:00 00:00
1 02:00 01:00
2 24:00 23:00
Another, faster solution using map, if the format of all strings is 01:00 to 24:00:
L = ['{:02d}:00'.format(x) for x in range(25)]
d = dict(zip(L[1:], L[:-1]))
df['new'] = df['col'].map(d)
print (df)
col new
0 01:00 00:00
1 02:00 01:00
2 24:00 23:00
It seems that what you want is to subtract an hour from the time, which is stored in your dataframe as a string. You can do the following:
from datetime import datetime, timedelta
subtract_one_hour = lambda x: (datetime.strptime(x, '%H:%M') - timedelta(hours=1)).strftime("%H:%M")
df['minus_one_hour'] = df['original_time'].apply(subtract_one_hour)
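One caveat: strptime only accepts hours 0-23, so the lambda above raises on a value like '24:00', while the timedelta approach handles it; a minimal sketch of the timedelta path:

```python
import pandas as pd

s = pd.Series(['01:00', '24:00'])

# timedeltas happily accept 24 hours and beyond, unlike strptime's %H
shifted = pd.to_timedelta(s + ':00') - pd.Timedelta(hours=1)
print(shifted)
```

If the data is guaranteed to stay within 00:00-23:59, the strptime lambda is fine as-is.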

Python Subtracting two columns with date data, from csv to get number of weeks , months?

I have a csv with two columns representing a start date st_dt and an end date end_dt. I have to subtract these columns to get the number of weeks. I tried iterating through the columns using pandas, but my output seems wrong.
st_dt end_dt
---------------------------------------
20100315 20100431
Use read_csv with parse_dates to get datetimes, then subtract and take the days:
df = pd.read_csv(file, parse_dates=[0,1])
print (df)
st_dt end_dt
0 2010-03-15 2010-04-30
df['diff'] = (df['end_dt'] - df['st_dt']).dt.days
print (df)
st_dt end_dt diff
0 2010-03-15 2010-04-30 46
If some dates are invalid, like 20100431, use to_datetime with the parameter errors='coerce' to convert them to NaT:
df = pd.read_csv(file)
print (df)
st_dt end_dt
0 20100315 20100431
1 20100315 20100430
df['st_dt'] = pd.to_datetime(df['st_dt'], errors='coerce', format='%Y%m%d')
df['end_dt'] = pd.to_datetime(df['end_dt'], errors='coerce', format='%Y%m%d')
df['diff'] = (df['end_dt'] - df['st_dt']).dt.days
print (df)
st_dt end_dt diff
0 2010-03-15 NaT NaN
1 2010-03-15 2010-04-30 46.0
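Since the question asks for weeks rather than days, the day difference can simply be divided by seven; a minimal sketch on the valid row:

```python
import pandas as pd

df = pd.DataFrame({'st_dt': ['20100315'], 'end_dt': ['20100430']})
df['st_dt'] = pd.to_datetime(df['st_dt'], format='%Y%m%d')
df['end_dt'] = pd.to_datetime(df['end_dt'], format='%Y%m%d')

days = (df['end_dt'] - df['st_dt']).dt.days
df['weeks'] = days // 7        # whole weeks
df['weeks_frac'] = days / 7    # fractional weeks
print(df)
```

Whether to floor, round, or keep the fraction depends on how the week count will be used downstream.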
