How to get the minimum time value in a dataframe while excluding a specific value - python-3.x

I have a dataframe in the format below. I want to get the minimum time value for each column and save it in a list, excluding a specific time value (00:00:00) so that it is never returned as the minimum of any column.
df =
10.0.0.155 192.168.1.240 192.168.0.242
0 19:48:46 16:23:40 20:14:07
1 20:15:46 16:23:39 20:14:09
2 19:49:37 16:23:20 00:00:00
3 20:15:08 00:00:00 00:00:00
4 19:48:46 00:00:00 00:00:00
5 19:47:30 00:00:00 00:00:00
6 19:49:13 00:00:00 00:00:00
7 20:15:50 00:00:00 00:00:00
8 19:45:34 00:00:00 00:00:00
9 19:45:33 00:00:00 00:00:00
I tried to use the code below, but it doesn't work:
minValues = []
for column in df:
    #print(df[column])
    if "00:00:00" in df[column]:
        minValues.append (df[column].nlargest(2).iloc[-1])
    else:
        minValues.append (df[column].min())
print (df)
print (minValues)

The idea is to replace the 00:00:00 values with missing values and then take the minimum timedelta of each column:
df1 = df.astype(str).apply(pd.to_timedelta)
s1 = df1.mask(df1.eq(pd.Timedelta(0))).min()
print (s1)
10.0.0.155 0 days 19:45:33
192.168.1.240 0 days 16:23:20
192.168.0.242 0 days 20:14:07
dtype: timedelta64[ns]
Or take the minimum datetimes and convert the output to HH:MM:SS strings at the end:
df1 = df.astype(str).apply(pd.to_datetime)
s2 = df1.mask(df1.eq(pd.to_datetime("00:00:00"))).min().dt.strftime('%H:%M:%S')
print (s2)
10.0.0.155 19:45:33
192.168.1.240 16:23:20
192.168.0.242 20:14:07
dtype: object
Or convert the minimum values to time objects:
df1 = df.astype(str).apply(pd.to_datetime)
s3 = df1.mask(df1.eq(pd.to_datetime("00:00:00"))).min().dt.time
print (s3)
10.0.0.155 19:45:33
192.168.1.240 16:23:20
192.168.0.242 20:14:07
dtype: object
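Since the question asks for a list, here is a minimal self-contained sketch of the same mask-then-min idea that ends with a plain Python list. The sample frame is a shortened, made-up version of the one in the question, and the name min_values is only illustrative.

```python
import pandas as pd

# Shortened sample shaped like the frame in the question (values are strings).
df = pd.DataFrame({
    "10.0.0.155": ["19:48:46", "20:15:46", "19:45:33"],
    "192.168.1.240": ["16:23:40", "16:23:20", "00:00:00"],
    "192.168.0.242": ["20:14:07", "00:00:00", "00:00:00"],
})

# Parse every column to timedeltas, hide the 00:00:00 placeholder,
# then take the minimum of what is left in each column.
td = df.apply(pd.to_timedelta)
mins = td.mask(td.eq(pd.Timedelta(0))).min()

# str(Timedelta) looks like "0 days 19:45:33"; keep only the HH:MM:SS part.
min_values = [str(v).split()[-1] for v in mins]
print(min_values)
```

A column that contains only 00:00:00 would yield NaT here, so it may need separate handling depending on what the list should contain in that case.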

Related

How to find the next month and current month datetime in python

I need to find the current month and next month datetimes in Python.
I have a dataframe with a date column, and I need to filter the rows based on today's date.
If today's day of the month is less than 15, I need all rows starting from the current month (2019-08-01 00:00:00).
If today's day of the month is >= 15, I need all rows starting from the next month (2019-09-01 00:00:00).
Dataframe:
PC GEO Month Values
A IN 2019-08-01 00:00:00 1
B IN 2019-08-02 00:00:00 1
C IN 2019-09-14 00:00:00 1
D IN 2019-10-01 00:00:00 1
E IN 2019-07-01 00:00:00 1
Expected output if today's day of the month is < 15:
PC GEO Month Values
A IN 2019-08-01 00:00:00 1
B IN 2019-08-02 00:00:00 1
C IN 2019-09-14 00:00:00 1
D IN 2019-10-01 00:00:00 1
Expected output if today's day of the month is >= 15:
PC GEO Month Values
C IN 2019-09-14 00:00:00 1
D IN 2019-10-01 00:00:00 1
I am getting today's day of the month as below:
dat = date.today()
dat = dat.strftime("%d")
dat = int(dat)
Use:
#convert column to datetimes
df['Month'] = pd.to_datetime(df['Month'])
#get today timestamp
dat = pd.to_datetime('now')
print (dat)
2019-08-27 13:40:54.272257
#convert datetime to month periods
per = df['Month'].dt.to_period('m')
#convert timestamp to period
today_per = dat.to_period('m')
#compare day and filter
if dat.day < 15:
    df = df[per >= today_per]
    print (df)
else:
    df = df[per > today_per]
    print (df)
PC GEO Month Values
2 C IN 2019-09-14 1
3 D IN 2019-10-01 1
Test with values <15:
df['Month'] = pd.to_datetime(df['Month'])
dat = pd.to_datetime('2019-08-02')
print (dat)
2019-08-02 00:00:00
per = df['Month'].dt.to_period('m')
today_per = dat.to_period('m')
if dat.day < 15:
    df = df[per >= today_per]
else:
    df = df[per > today_per]
print (df)
PC GEO Month Values
0 A IN 2019-08-01 1
1 B IN 2019-08-02 1
2 C IN 2019-09-14 1
3 D IN 2019-10-01 1
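The period comparison can also be wrapped in a small helper so that both cases are easy to check with fixed dates. This is only a sketch: the function name rows_from_month and the reduced sample frame are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

def rows_from_month(df, today):
    """Keep rows from the current month if today's day is < 15, else from next month."""
    today = pd.Timestamp(today)
    per = pd.to_datetime(df["Month"]).dt.to_period("M")
    today_per = today.to_period("M")
    # Adding 1 to a monthly Period moves it to the following month.
    start = today_per if today.day < 15 else today_per + 1
    return df[per >= start]

df = pd.DataFrame({
    "PC": list("ABCDE"),
    "Month": ["2019-08-01", "2019-08-02", "2019-09-14", "2019-10-01", "2019-07-01"],
})

early = rows_from_month(df, "2019-08-02")   # day < 15: current month onwards
late = rows_from_month(df, "2019-08-27")    # day >= 15: next month onwards
print(early["PC"].tolist())
print(late["PC"].tolist())
```

Passing the date in as an argument, rather than reading the clock inside the function, keeps the filter reproducible.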

Create a pandas column based on a lookup value from another dataframe

I have a pandas dataframe that has some data values by hour (which is also the index of this lookup dataframe). The dataframe looks like this:
In [1] print (df_lookup)
Out[1] 0 1.109248
1 1.102435
2 1.085014
3 1.073487
4 1.079385
5 1.088759
6 1.044708
7 0.902482
8 0.852348
9 0.995912
10 1.031643
11 1.023458
12 1.006961
...
23 0.889541
I want to multiply the values from this lookup dataframe to create a column of another dataframe, which has datetime as index.
The dataframe looks like this:
In [2] print (df)
Out[2]
Date_Label ID data-1 data-2 data-3
2015-08-09 00:00:00 1 2513.0 2502 NaN
2015-08-09 00:00:00 1 2113.0 2102 NaN
2015-08-09 01:00:00 2 2006.0 1988 NaN
2015-08-09 02:00:00 3 2016.0 2003 NaN
...
2018-07-19 23:00:00 33 3216.0 333 NaN
I want to calculate the data-3 column from data-2 column, where the weight given to 'data-2' column depends on corresponding value in df_lookup. I get the desired values by looping over the index as follows, but that is too slow:
for idx in df.index:
    df.loc[idx,'data-3'] = df.loc[idx, 'data-2']*df_lookup.at[idx.hour]
Is there a faster way someone could suggest?
Using .loc
df['data-2']*df_lookup.loc[df.index.hour].values
Out[275]:
Date_Label
2015-08-09 00:00:00 2775.338496
2015-08-09 00:00:00 2331.639296
2015-08-09 01:00:00 2191.640780
2015-08-09 02:00:00 2173.283042
Name: data-2, dtype: float64
#df['data-3']=df['data-2']*df_lookup.loc[df.index.hour].values
I'd probably try doing a join.
# Fix column name
df_lookup.columns = ['multiplier']
# Get hour index
df['hour'] = df.index.hour
# Join
df = df.join(df_lookup, how='left', on=['hour'])
df['data-3'] = df['data-2'] * df['multiplier']
df = df.drop(['multiplier', 'hour'], axis=1)
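As a further variant, the hour of each row can be mapped straight onto the lookup values, which avoids both the loop and the temporary join columns. The sketch below assumes the lookup is a Series indexed by hour 0 to 23; its multiplier values and the four sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical lookup: one multiplier per hour of day (0..23).
lookup = pd.Series([1.0 + h / 100 for h in range(24)], index=range(24))

idx = pd.to_datetime(["2015-08-09 00:00", "2015-08-09 00:00",
                      "2015-08-09 01:00", "2015-08-09 02:00"])
df = pd.DataFrame({"data-2": [2502, 2102, 1988, 2003]}, index=idx)

# Map each row's hour to its multiplier; to_numpy() makes the product
# positional, so the duplicated timestamps in the index cause no alignment issues.
weights = df.index.hour.map(lookup).to_numpy()
df["data-3"] = df["data-2"] * weights
print(df)
```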

Indicate whether datetime of row is in a daterange

I'm trying to get dummy variables for holidays in a dataset. I have a couple of date ranges (pd.date_range()) with holidays and a dataframe to which I would like to add a dummy column indicating whether the datetime of each row falls within a given holiday date range.
Small example:
ChristmasBreak = list(pd.date_range('2014-12-20','2015-01-04').date)
dates = pd.date_range('2015-01-03', '2015-01-06', freq='H')
d = {'Date': dates, 'Number': np.random.rand(len(dates))}
df = pd.DataFrame(data=d)
df.set_index('Date', inplace=True)
for i, row in df.iterrows():
    if i in ChristmasBreak:
        df.loc[i,'Christmas'] = 1
The if branch is never entered, so matching the dates this way does not work. Is there any way to do this? Alternative methods for creating the dummies are welcome as well!
First, don't use iterrows, because it is really slow.
It is better to use dt.date with Series.isin and then convert the boolean mask to integers (True becomes 1):
df = pd.DataFrame(data=d)
df['Christmas'] = df['Date'].dt.date.isin(ChristmasBreak).astype(int)
Or use between:
df['Christmas'] = df['Date'].between('2014-12-20', '2015-01-04').astype(int)
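If you want to compare against a DatetimeIndex:
df = pd.DataFrame(data=d)
df.set_index('Date', inplace=True)
#df.index.date is a NumPy array of date objects, so use np.isin
df['Christmas'] = np.isin(df.index.date, ChristmasBreak).astype(int)
#or compare the index with the boundaries directly
df['Christmas'] = ((df.index > '2014-12-20') & (df.index < '2015-01-04')).astype(int)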
Sample:
ChristmasBreak = pd.date_range('2014-12-20','2015-01-04').date
dates = pd.date_range('2014-12-19 20:00', '2014-12-20 05:00', freq='H')
d = {'Date': dates, 'Number': np.random.randint(10, size=len(dates))}
df = pd.DataFrame(data=d)
df['Christmas'] = df['Date'].dt.date.isin(ChristmasBreak).astype(int)
print (df)
Date Number Christmas
0 2014-12-19 20:00:00 6 0
1 2014-12-19 21:00:00 7 0
2 2014-12-19 22:00:00 0 0
3 2014-12-19 23:00:00 9 0
4 2014-12-20 00:00:00 1 1
5 2014-12-20 01:00:00 3 1
6 2014-12-20 02:00:00 1 1
7 2014-12-20 03:00:00 8 1
8 2014-12-20 04:00:00 2 1
9 2014-12-20 05:00:00 1 1
This should do what you want:
df['Christmas'] = df.index.isin(ChristmasBreak).astype(int)
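One likely reason the loop in the question never matches is that it compares hourly Timestamp values against whole-day date objects. A sketch that sidesteps this, under the assumption that the holiday range is kept as midnight timestamps, is to normalize the index before calling isin. The sample dates and the name christmas_break are illustrative.

```python
import pandas as pd

# Holiday range kept as midnight timestamps, one per day.
christmas_break = pd.date_range("2014-12-20", "2015-01-04")

# Four hourly rows straddling the first day of the break.
dates = pd.date_range("2014-12-19 22:00", "2014-12-20 01:00", freq="60min")
df = pd.DataFrame({"Number": range(len(dates))}, index=dates)

# normalize() floors every timestamp to midnight, so each hourly row
# can be matched against the whole-day entries of the holiday range.
df["Christmas"] = df.index.normalize().isin(christmas_break).astype(int)
print(df)
```

Unlike a comparison against the range boundaries, this treats every hour of the last holiday day as part of the break.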

Convert datetime object to date and datetime2 to time then combine to single column

I have a dataset where the transaction date is stored as YYYY-MM-DD 00:00:00 and the transaction time is stored as 1900-01-01 HH:MM:SS.
I need to truncate these timestamps and then either leave them as they are or combine them into a single timestamp. I've tried several methods, and all of them still return the full timestamp. Thoughts?
Use split and pd.to_datetime:
df = pd.DataFrame({'TransDate':['2015-01-01 00:00:00','2015-01-02 00:00:00','2015-01-03 00:00:00'],
'TransTime':['1900-01-01 07:00:00','1900-01-01 08:30:00','1900-01-01 09:45:15']})
df['Date'] = (pd.to_datetime(df['TransDate'].str.split().str[0] +
' ' +
df['TransTime'].str.split().str[1]))
Output:
TransDate TransTime Date
0 2015-01-01 00:00:00 1900-01-01 07:00:00 2015-01-01 07:00:00
1 2015-01-02 00:00:00 1900-01-01 08:30:00 2015-01-02 08:30:00
2 2015-01-03 00:00:00 1900-01-01 09:45:15 2015-01-03 09:45:15
print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
TransDate 3 non-null object
TransTime 3 non-null object
Date 3 non-null datetime64[ns]
dtypes: datetime64[ns](1), object(2)
memory usage: 152.0+ bytes
None
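If the two columns are already datetime64 rather than strings, the same result can be reached without string splitting. The following is a sketch under that assumption: it takes the time of day as a timedelta and adds it to the date part. The intermediate names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "TransDate": ["2015-01-01 00:00:00", "2015-01-02 00:00:00"],
    "TransTime": ["1900-01-01 07:00:00", "1900-01-01 08:30:00"],
})

trans_date = pd.to_datetime(df["TransDate"])
trans_time = pd.to_datetime(df["TransTime"])

# Time of day as a timedelta: subtract midnight of the placeholder date.
time_of_day = trans_time - trans_time.dt.normalize()

# Date part (floored to midnight) plus time of day gives one timestamp.
df["Date"] = trans_date.dt.normalize() + time_of_day
print(df)
```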

Apply a value to max values in a groupby

I have a DF like this:
ID Time
1 20:29
1 20:45
1 23:16
2 11:00
2 13:00
3 01:00
I want to create a new column that puts a 1 next to the largest time value within each ID grouping like so:
ID Time Value
1 20:29 0
1 20:45 0
1 23:16 1
2 11:00 0
2 13:00 1
3 01:00 1
I know the answer involves a groupby mechanism and have been fiddling around with something like:
df.groupby('ID')['Time'].max() = 1
The idea is to write an anonymous function that operates on each of your groups and feed this to your groupby using apply:
df['Value']=df.groupby('ID',as_index=False).apply(lambda x : x.Time == max(x.Time)).values
Assuming your 'Time' column is already datetime64, group by the 'ID' column and call transform with a lambda to create a series whose index is aligned with your original df:
In [92]:
df['Value'] = df.groupby('ID')['Time'].transform(lambda x: (x == x.max())).dt.nanosecond
df
Out[92]:
ID Time Value
0 1 2015-11-20 20:29:00 0
1 1 2015-11-20 20:45:00 0
2 1 2015-11-20 23:16:00 1
3 2 2015-11-20 11:00:00 0
4 2 2015-11-20 13:00:00 1
5 3 2015-11-20 01:00:00 1
The dt.nanosecond call is needed because the returned dtype is, for some reason, datetime rather than boolean:
In [93]:
df.groupby('ID')['Time'].transform(lambda x: (x == x.max()))
Out[93]:
0 1970-01-01 00:00:00.000000000
1 1970-01-01 00:00:00.000000000
2 1970-01-01 00:00:00.000000001
3 1970-01-01 00:00:00.000000000
4 1970-01-01 00:00:00.000000001
5 1970-01-01 00:00:00.000000001
Name: Time, dtype: datetime64[ns]
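The dtype quirk can be avoided altogether by broadcasting each group's maximum with transform('max') and comparing outside the groupby, which yields a boolean Series directly. This is a minimal sketch on the sample data from the question; parsing the times with format="%H:%M" is an assumption and places them on the default date 1900-01-01.

```python
import pandas as pd

times = ["20:29", "20:45", "23:16", "11:00", "13:00", "01:00"]
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3],
    "Time": pd.to_datetime(times, format="%H:%M"),
})

# transform("max") returns the group maximum repeated for every row,
# so a plain element-wise comparison marks the latest time in each ID.
group_max = df.groupby("ID")["Time"].transform("max")
df["Value"] = (df["Time"] == group_max).astype(int)
print(df)
```

If two rows in a group share the maximum time, both are marked with 1.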
