How to join Minute based time-range with Date using Pandas? - python-3.x

My dataset df looks like this:
DateTimeVal Open
2017-01-01 17:00:00 5.1532
2017-01-01 17:01:00 5.3522
2017-01-01 17:02:00 5.4535
2017-01-01 17:03:00 5.3567
2017-01-01 17:04:00 5.1512
....
This is minute-based data.
The Time values currently start at 17:00:00; I want to change them to start at 00:00:00 and run through 23:59:00, keeping the minute frequency.
The current Time starts at 17:00:00, increments by one minute, and ends at 16:59:00 the next day. There are 1440 rows in total, so I can confirm it is 24 hours of minute-based data.
My new df should looks like this:
DateTimeVal Open
2017-01-01 00:00:00 5.1532
2017-01-01 00:01:00 5.3522
2017-01-01 00:02:00 5.4535
2017-01-01 00:03:00 5.3567
2017-01-01 00:04:00 5.1512
....
Here, we did not change anything except the Time part.
What did I do?
My logic was to remove the Time part and then populate it with new Time values.
Here is what I did:
pd.DatetimeIndex(df['DateTimeVal'].astype(str).str.rsplit(' ', 1).str[0], dayfirst=True)
But I do not know how to add the new Time data. Could you please help?

How about subtracting 17 hours from your DateTimeVal:
df['DateTimeVal'] -= pd.Timedelta(hours=17)
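As a minimal, self-contained sketch of the same idea (assuming DateTimeVal is already a datetime64 column; if it is stored as strings, convert it with pd.to_datetime first; the Open values here are placeholders):
import pandas as pd

df = pd.DataFrame({
    'DateTimeVal': pd.date_range('2017-01-01 17:00:00', periods=1440, freq='T'),
    'Open': 5.0,  # placeholder prices, just for illustration
})

# shift every timestamp back by 17 hours: 17:00 -> 00:00, 17:01 -> 00:01, ...
df['DateTimeVal'] -= pd.Timedelta(hours=17)
print(df.head())
Because the subtraction also moves the post-midnight timestamps back to the previous day, all 1440 rows of a 17:00-to-16:59 session collapse onto one calendar date, covering exactly 00:00:00 through 23:59:00.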

Related

Dividing two dataframes gives NaN

I have two dataframes: one contains a metric summed for the whole month, the other a metric as of the last day of the month. The former (monthly_profit) looks like this:
profit
yyyy_mm_dd
2018-01-01 8797234233.0
2018-02-01 3464234233.0
2018-03-01 5676234233.0
...
2019-10-01 4368234233.0
While the latter (monthly_employees) looks like this:
employees
yyyy_mm_dd
2018-01-31 924358
2018-02-28 974652
2018-03-31 146975
...
2019-10-31 255589
I want to get profit per employee, so I've done this:
profit_per_employee = (monthly_profit['profit']/monthly_employees['employees'])*100
This is the output that I get:
yyyy_mm_dd
2018-01-01 NaN
2018-01-31 NaN
2018-02-01 NaN
2018-02-28 NaN
How could I fix this? The reason that one dataframe is the last day of the month and the other is the first day of the month is due to rolling vs non-rolling data.
monthly_profit is the result of grouping and summing daily profit data:
monthly_profit = df.groupby(['yyyy_mm_dd'])[['profit']].sum()
monthly_profit = monthly_profit.resample('MS').sum()
While monthly_employees is a running total, so I need to take the current value for the last day of each month:
monthly_employees = df.groupby(['yyyy_mm_dd'])[['employees']].sum()
monthly_employees = monthly_employees.groupby([monthly_employees.index.year, monthly_employees.index.month]).tail(1)
Change 'MS' to 'M' to resample to month ends, so that both DatetimeIndex objects match:
monthly_profit = monthly_profit.resample('M').sum()
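A small sketch of the fix with made-up index labels taken from the question (assuming both frames carry a DatetimeIndex, as the groupby results above do):
import pandas as pd

monthly_profit = pd.DataFrame(
    {'profit': [8797234233.0, 3464234233.0]},
    index=pd.to_datetime(['2018-01-01', '2018-02-01']))
monthly_employees = pd.DataFrame(
    {'employees': [924358, 974652]},
    index=pd.to_datetime(['2018-01-31', '2018-02-28']))

# re-stamp the profit figures on month ends so the two indexes align
monthly_profit = monthly_profit.resample('M').sum()
profit_per_employee = (monthly_profit['profit'] / monthly_employees['employees']) * 100
The division now aligns on identical month-end labels instead of producing NaN for every row.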

Copy a single row value and apply it as a column using Pandas

My dataset looks like this:
# df1 - minute based dataset
date Open
2018-01-01 00:00:00 1.0536
2018-01-01 00:01:00 1.0527
2018-01-01 00:02:00 1.0558
2018-01-01 00:03:00 1.0534
2018-01-01 00:04:00 1.0524
...
What I want to do is get the value at 05:00:00 daily, create a new column called OpenVal_5AM, and put that day's corresponding value in that column. The new df will look like this:
# df2 - minute based dataset with 05:00:00 Open value
date Open OpenVal_5AM
2018-01-01 00:00:00 1.0536 1.0133
2018-01-01 00:01:00 1.0527 1.0133
2018-01-01 00:02:00 1.0558 1.0133
2018-01-01 00:03:00 1.0534 1.0133
2018-01-01 00:04:00 1.0524 1.0133
...
Since this is minute-based data, the new column OpenVal_5AM will hold the same value at all 1440 data points within each day, because we are just grabbing the value at one point in time per day and creating a new column from it.
What did I do?
I used this step:
df['OpenVal_5AM'] = df.groupby(df.date.dt.date,sort=False).Open.dt.hour.between(5, 5)
That's the closest I could get, but it does not work.
Here's my suggestion:
# assumes 'date' is the DataFrame's DatetimeIndex, so r.name is the row's timestamp
df['OpenVal_5AM'] = df.apply(lambda r: df.Open.loc[r.name.replace(hour=5, minute=0)], axis=1)
Disclaimer: I didn't test it with a huge dataset; so I don't know how it'll perform in that situation.
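For a frame of this size, a vectorized alternative may be faster than a row-wise apply: keep Open only at the 05:00:00 rows and broadcast each day's value with a groupby transform. This is only a sketch, again assuming 'date' is the DatetimeIndex:
import datetime

# Open where the time is exactly 05:00:00, NaN everywhere else
at_5am = df['Open'].where(df.index.time == datetime.time(5, 0))
# 'first' skips NaN, so each day's 05:00:00 value is spread to all of that day's rows
df['OpenVal_5AM'] = at_5am.groupby(df.index.date).transform('first')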

How to generate Fixed Minute based DateTime using Pandas

I need help generating a minute-based time range over a pre-defined date range.
The date range values will change, so I should be able to update them.
I also want to exclude Friday and Saturday from the generated data.
What did I do?
I successfully generated the date-range by doing this:
pd.date_range(start='1/1/2017', end='8/06/2019', freq='T')
But how do I exclude Friday and Saturday from it?
Once that is done, I want to put the result in a column called 'TIME_MIN' and assign it to a df.
Could you please help?
You can exclude Friday and Saturday using:
df = pd.DataFrame({
    'time': pd.date_range(start='1/1/2017', end='8/06/2019', freq='T')
})
# .dt.day_name() replaces .dt.weekday_name, which was removed in pandas 1.0
df.loc[~df['time'].dt.day_name().isin(['Friday', 'Saturday'])]
Output:
time
0 2017-01-01 00:00:00
1 2017-01-01 00:01:00
2 2017-01-01 00:02:00
3 2017-01-01 00:03:00
4 2017-01-01 00:04:00
5 2017-01-01 00:05:00
6 2017-01-01 00:06:00
7 2017-01-01 00:07:00
...
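To end up with the 'TIME_MIN' column name asked for in the question, use that key in the constructor; otherwise this is the same approach:
df = pd.DataFrame({'TIME_MIN': pd.date_range(start='1/1/2017', end='8/06/2019', freq='T')})
df = df.loc[~df['TIME_MIN'].dt.day_name().isin(['Friday', 'Saturday'])].reset_index(drop=True)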

Groupby expanding count - elements changing of group at different time stamps

I have a huge DataFrame that looks as follows (this is just an example to illustrate the problem):
id timestamp target_time interval
1 08:00:00 10:20:00 (10-11]
1 08:30:00 10:21:00 (10-11]
1 09:10:00 11:30:00 (11-12]
2 09:15:00 10:15:00 (10-11]
2 09:35:00 10:11:00 (10-11]
3 09:45:00 11:12:00 (11-12]
...
I would like to create a series looking as follows:
interval timestamp unique_ids
(10-11] 08:00:00 1
08:30:00 1
09:15:00 1
09:35:00 1
(11-12] 09:10:00 1
09:45:00 2
The objective is to count, for each time interval, how many unique ids had their corresponding target_time within the interval at their timestamp. Note that the target_time for each id can change at different timestamps. For instance, for the id 1 the interval is (10-11] from 08:00:00 to 08:30:00, but then it changes to (11-12] at 09:10:00. Therefore, at 09:15:00 I do not want to count the id 1 in the resulting Series.
I tried a groupby -> expanding -> np.unique approach, but it does not provide the result that I want:
df.set_index('timestamp').groupby('interval').id.expanding().apply(lambda x: np.unique(x).shape[0])
interval timestamp unique_ids
(10-11] 08:00:00 1
08:30:00 1
09:15:00 2
09:35:00 2
(11-12] 09:10:00 1
09:45:00 2
Any hint on how I can approach this problem? I want to use pandas routines as much as possible in order to reduce computation time, since the DataFrame has 1453076 rows...
Many thanks in advance!
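No answer was posted for this one, so here is only a sketch of one possible approach. Because each id's interval at a given timestamp depends on its most recent assignment, a single chronological pass that tracks the current interval per id is a simple correct option, at the cost of an explicit Python loop:
import pandas as pd
from collections import defaultdict

current = {}                 # id -> interval it currently belongs to
members = defaultdict(set)   # interval -> set of ids currently in it
rows = []
for r in df.sort_values('timestamp').itertuples():
    prev = current.get(r.id)
    if prev is not None and prev != r.interval:
        members[prev].discard(r.id)   # the id moved out of its old interval
    current[r.id] = r.interval
    members[r.interval].add(r.id)
    rows.append((r.interval, r.timestamp, len(members[r.interval])))

result = (pd.DataFrame(rows, columns=['interval', 'timestamp', 'unique_ids'])
          .set_index(['interval', 'timestamp'])['unique_ids']
          .sort_index())
On the example above this reproduces the desired series, including dropping id 1 from (10-11] once it moves to (11-12] at 09:10:00.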

Convert datetime object to date and datetime2 to time then combine to single column

I have a dataset where the transaction date is stored as YYYY-MM-DD 00:00:00 and the transaction time is stored as 1900-01-01 HH:MM:SS
I need to truncate these timestamps and then either leave them as is or combine them into a single timestamp. I've tried several methods, and all continue to return the full timestamp. Thoughts?
Use split and pd.to_datetime:
df = pd.DataFrame({'TransDate': ['2015-01-01 00:00:00', '2015-01-02 00:00:00', '2015-01-03 00:00:00'],
                   'TransTime': ['1900-01-01 07:00:00', '1900-01-01 08:30:00', '1900-01-01 09:45:15']})
df['Date'] = pd.to_datetime(df['TransDate'].str.split().str[0] +
                            ' ' +
                            df['TransTime'].str.split().str[1])
Output:
TransDate TransTime Date
0 2015-01-01 00:00:00 1900-01-01 07:00:00 2015-01-01 07:00:00
1 2015-01-02 00:00:00 1900-01-01 08:30:00 2015-01-02 08:30:00
2 2015-01-03 00:00:00 1900-01-01 09:45:15 2015-01-03 09:45:15
print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
TransDate 3 non-null object
TransTime 3 non-null object
Date 3 non-null datetime64[ns]
dtypes: datetime64[ns](1), object(2)
memory usage: 152.0+ bytes
None
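If TransDate and TransTime are already datetime64 columns rather than strings, an arithmetic alternative avoids the string splitting (a sketch; the 1900-01-01 base date is taken from the question):
df['TransDate'] = pd.to_datetime(df['TransDate'])
df['TransTime'] = pd.to_datetime(df['TransTime'])
# time-of-day as a Timedelta, added onto the midnight transaction date
df['Date'] = df['TransDate'].dt.normalize() + (df['TransTime'] - pd.Timestamp('1900-01-01'))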
