How to find the next month and current month datetime in Python

I need to find the current-month and next-month datetimes in Python.
I have a dataframe with a date column, and I need to filter its rows based on today's date.
If today's day of the month is less than 15, I need all rows starting from the current month (2019-08-01 00:00:00).
If today's day of the month is >= 15, I need all rows starting from the next month (2019-09-01 00:00:00).
Dataframe:
PC GEO Month Values
A IN 2019-08-01 00:00:00 1
B IN 2019-08-02 00:00:00 1
C IN 2019-09-14 00:00:00 1
D IN 2019-10-01 00:00:00 1
E IN 2019-07-01 00:00:00 1
Expected output if today's day of the month is < 15:
PC GEO Month Values
A IN 2019-08-01 00:00:00 1
B IN 2019-08-02 00:00:00 1
C IN 2019-09-14 00:00:00 1
D IN 2019-10-01 00:00:00 1
Expected output if today's day of the month is >= 15:
PC GEO Month Values
C IN 2019-09-14 00:00:00 1
D IN 2019-10-01 00:00:00 1
I am getting today's day of the month as below:
from datetime import date
dat = date.today()
dat = dat.strftime("%d")
dat = int(dat)

Use:
#convert column to datetimes
df['Month'] = pd.to_datetime(df['Month'])
#get today timestamp
dat = pd.to_datetime('now')
print (dat)
2019-08-27 13:40:54.272257
#convert datetime to month periods
per = df['Month'].dt.to_period('m')
#convert timestamp to period
today_per = dat.to_period('m')
#compare day and filter
if dat.day < 15:
    df = df[per >= today_per]
    print (df)
else:
    df = df[per > today_per]
    print (df)
PC GEO Month Values
2 C IN 2019-09-14 1
3 D IN 2019-10-01 1
Test with values <15:
df['Month'] = pd.to_datetime(df['Month'])
dat = pd.to_datetime('2019-08-02')
print (dat)
2019-08-02 00:00:00
per = df['Month'].dt.to_period('m')
today_per = dat.to_period('m')
if dat.day < 15:
    df = df[per >= today_per]
    print (df)
else:
    df = df[per > today_per]
PC GEO Month Values
0 A IN 2019-08-01 1
1 B IN 2019-08-02 1
2 C IN 2019-09-14 1
3 D IN 2019-10-01 1
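The same rule can also be wrapped in a small helper that computes the cutoff month once and filters on it. This is only a sketch: the function name `rows_from_cutoff` and its `column` and `pivot_day` parameters are made up for illustration, and it assumes the `Month` column already holds datetimes.

```python
import pandas as pd

def rows_from_cutoff(df, today, column="Month", pivot_day=15):
    """Keep rows from the current month (day < pivot_day) or the next month."""
    today = pd.Timestamp(today)
    period = today.to_period("M")       # month that contains `today`
    if today.day >= pivot_day:
        period = period + 1             # move to the following month
    cutoff = period.to_timestamp()      # first instant of the cutoff month
    return df[df[column] >= cutoff]

df = pd.DataFrame({
    "PC": list("ABCDE"),
    "GEO": ["IN"] * 5,
    "Month": pd.to_datetime(
        ["2019-08-01", "2019-08-02", "2019-09-14", "2019-10-01", "2019-07-01"]
    ),
    "Values": 1,
})

early = rows_from_cutoff(df, "2019-08-02")   # day 2  -> from 2019-08-01
late = rows_from_cutoff(df, "2019-08-27")    # day 27 -> from 2019-09-01
print(early)
print(late)
```

Passing `today` as an argument, instead of reading the clock inside the function, makes both branches easy to test.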

Related

How to get the minimum time value in a dataframe with excluding specific value

I have a dataframe in the format below. I want to get the minimum time value of each column and save it in a list, while excluding the placeholder value 00:00:00 so that it is never reported as the minimum of any column.
df =
10.0.0.155 192.168.1.240 192.168.0.242
0 19:48:46 16:23:40 20:14:07
1 20:15:46 16:23:39 20:14:09
2 19:49:37 16:23:20 00:00:00
3 20:15:08 00:00:00 00:00:00
4 19:48:46 00:00:00 00:00:00
5 19:47:30 00:00:00 00:00:00
6 19:49:13 00:00:00 00:00:00
7 20:15:50 00:00:00 00:00:00
8 19:45:34 00:00:00 00:00:00
9 19:45:33 00:00:00 00:00:00
I tried to use the code below, but it doesn't work:
minValues = []
for column in df:
    #print(df[column])
    if "00:00:00" in df[column]:
        minValues.append (df[column].nlargest(2).iloc[-1])
    else:
        minValues.append (df[column].min())
print (df)
print (minValues)
The idea is to replace the zero values with missing values and then take the minimum timedelta of each column:
df1 = df.astype(str).apply(pd.to_timedelta)
s1 = df1.mask(df1.eq(pd.Timedelta(0))).min()
print (s1)
10.0.0.155 0 days 19:45:33
192.168.1.240 0 days 16:23:20
192.168.0.242 0 days 20:14:07
dtype: timedelta64[ns]
Or take the minimum datetimes and then convert the output to HH:MM:SS strings:
df1 = df.astype(str).apply(pd.to_datetime)
s2 = df1.mask(df1.eq(pd.to_datetime("00:00:00"))).min().dt.strftime('%H:%M:%S')
print (s2)
10.0.0.155 19:45:33
192.168.1.240 16:23:20
192.168.0.242 20:14:07
dtype: object
Or convert them to time objects:
df1 = df.astype(str).apply(pd.to_datetime)
s3 = df1.mask(df1.eq(pd.to_datetime("00:00:00"))).min().dt.time
print (s3)
10.0.0.155 19:45:33
192.168.1.240 16:23:20
192.168.0.242 20:14:07
dtype: object
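One likely reason the loop in the question misbehaves is that `in` applied to a Series tests the index labels, not the values. The sketch below shows that on a shortened, made-up version of the data and then repeats the mask-and-min idea, finishing with the plain Python list the question asked for. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "10.0.0.155": ["19:48:46", "20:15:46", "19:45:33"],
    "192.168.1.240": ["16:23:40", "16:23:20", "00:00:00"],
    "192.168.0.242": ["20:14:07", "00:00:00", "00:00:00"],
})

# `in` looks at the index labels (0, 1, 2), so this is False
labels_only = "00:00:00" in df["192.168.0.242"]
# comparing the values is what the check was meant to do
in_values = df["192.168.0.242"].eq("00:00:00").any()

# convert to timedeltas, hide the zero placeholders, take column minima
td = df.apply(pd.to_timedelta)
min_values = td.mask(td.eq(pd.Timedelta(0))).min()

# keep only the HH:MM:SS part of the "0 days HH:MM:SS" text
as_text = [str(v).split()[-1] for v in min_values]
print(labels_only, in_values)
print(as_text)
```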

manipulating pandas dataframe - conditional

I have a pandas dataframe that looks like this:
ID Date Event_Type
1 01/01/2019 A
1 01/01/2019 B
2 02/01/2019 A
3 02/01/2019 A
I want to be left with:
ID Date
1 01/01/2019
2 02/01/2019
3 02/01/2019
Where my condition is:
If the ID is the same AND the dates are within 2 days of each other then drop one of the rows.
If however the dates are more than 2 days apart then keep both rows.
How do I do this?
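I believe you first need to convert the values to datetimes with to_datetime, then get the difference within each group with diff, and keep the first row of each group (where the difference is missing, tested by isnull()) together with the rows whose difference is greater than the timedelta threshold: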
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
s = df.groupby('ID')['Date'].diff()
df = df[(s.isnull() | (s > pd.Timedelta(2, 'd')))]
print (df)
ID Date Event_Type
0 1 2019-01-01 A
2 2 2019-01-02 A
3 3 2019-01-02 A
Check solution with another data:
print (df)
ID Date Event_Type
0 1 01/01/2019 A
1 1 04/01/2019 B <-difference 3 days
2 2 02/01/2019 A
3 3 02/01/2019 A
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
s = df.groupby('ID')['Date'].diff()
df = df[(s.isnull() | (s > pd.Timedelta(2, 'd')))]
print (df)
ID Date Event_Type
0 1 2019-01-01 A
1 1 2019-01-04 B
2 2 2019-01-02 A
3 3 2019-01-02 A
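A self-contained sketch of the same idea, with the threshold as a parameter. The helper name `drop_close_rows` and the sample rows are made up, and the sort is an added assumption so that `diff` always compares a row with the previous date of the same ID. Note that the gap is measured against the previous row, not against the previous row that was kept.

```python
import pandas as pd

def drop_close_rows(df, days=2):
    """Drop rows dated within `days` days of the previous row of the same ID."""
    out = df.sort_values(["ID", "Date"])
    gap = out.groupby("ID")["Date"].diff()   # NaT for the first row of each ID
    return out[gap.isna() | (gap > pd.Timedelta(days=days))]

df = pd.DataFrame({
    "ID": [1, 1, 1, 2],
    "Date": pd.to_datetime(
        ["01/01/2019", "02/01/2019", "06/01/2019", "02/01/2019"],
        format="%d/%m/%Y",
    ),
    "Event_Type": ["A", "B", "C", "A"],
})

kept = drop_close_rows(df)
print(kept)
```

Here the second row of ID 1 is one day after the first and is dropped, while the third row is four days after the second and is kept.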

day of Year values starting from a particular date

I have a dataframe with a date column. The duration is 365 days starting from 02/11/2017 and ending at 01/11/2018.
Date
02/11/2017
03/11/2017
05/11/2017
.
.
01/11/2018
I want to add an adjacent column called Day_Of_Year as follows:
Date Day_Of_Year
02/11/2017 1
03/11/2017 2
05/11/2017 4
.
.
01/11/2018 365
I apologize if it's a very basic question, but unfortunately I haven't been able to start with this.
I could use datetime(), but that would return values such as 1 for 1st january, 2 for 2nd january and so on.. irrespective of the year. So, that wouldn't work for me.
First convert the column with to_datetime, then subtract the start date, convert to days and add 1:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df['Day_Of_Year'] = df['Date'].sub(pd.Timestamp('2017-11-02')).dt.days + 1
print (df)
Date Day_Of_Year
0 2017-11-02 1
1 2017-11-03 2
2 2017-11-05 4
3 2018-11-01 365
Or subtract by first value of column:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df['Day_Of_Year'] = df['Date'].sub(df['Date'].iat[0]).dt.days + 1
print (df)
Date Day_Of_Year
0 2017-11-02 1
1 2017-11-03 2
2 2017-11-05 4
3 2018-11-01 365
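As a quick check that plain subtraction handles the year boundary, here is a small sketch over a full daily range. The range itself is generated for illustration and `day_number` is just a name picked here.

```python
import pandas as pd

# every day from the start date to the end date in the question
dates = pd.Series(pd.date_range("2017-11-02", "2018-11-01", freq="D"))

# days elapsed since the earliest date, counted from 1
day_number = (dates - dates.min()).dt.days + 1

new_year = day_number[dates == "2018-01-01"].iloc[0]
print(day_number.iloc[0], new_year, day_number.iloc[-1])
```

Because the result comes from a timedelta rather than from the calendar day of the year, nothing special is needed when the dates roll over into 2018.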
Using strftime with '%j'
s=pd.to_datetime(df.Date,dayfirst=True).dt.strftime('%j').astype(int)
s-s.iloc[0]
Out[750]:
0 0
1 1
2 3
Name: Date, dtype: int32
#df['new']=s-s.iloc[0]
pandas has dayofyear. So put your column in the right format with pd.to_datetime and then apply Series.dt.dayofyear. Lastly, use some modular arithmetic to express everything relative to your starting date:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df['day of year'] = df['Date'].dt.dayofyear - df['Date'].dt.dayofyear[0] + 1
df['day of year'] = df['day of year'] + 365*((365 - df['day of year']) // 365)
Output
Date day of year
0 2017-11-02 1
1 2017-11-03 2
2 2017-11-05 4
3 2018-11-01 365
But I'm doing essentially the same as Jezrael in more lines of code, so my vote goes to her/him

Change all values of a column in pandas data frame

I have a pandas dataframe with a column containing values like '01:00'. I want to subtract one hour from each value, so that '01:00' becomes '00:00'. Can anyone help?
You can use timedeltas:
df = pd.DataFrame({'col':['01:00', '02:00', '24:00']})
df['new'] = pd.to_timedelta(df['col'] + ':00') - pd.Timedelta(1, unit='h')
df['new'] = df['new'].astype(str).str[-18:-13]
print (df)
col new
0 01:00 00:00
1 02:00 01:00
2 24:00 23:00
Another, faster solution uses map, provided all strings are in the range 01:00 to 24:00:
L = ['{:02d}:00'.format(x) for x in range(25)]
d = dict(zip(L[1:], L[:-1]))
df['new'] = df['col'].map(d)
print (df)
col new
0 01:00 00:00
1 02:00 01:00
2 24:00 23:00
It seems that what you want to do is subtract an hour from the time which is stored in your dataframe as a string. You can do the following:
from datetime import datetime, timedelta
subtract_one_hour = lambda x: (datetime.strptime(x, '%H:%M') - timedelta(hours=1)).strftime("%H:%M")
df['minus_one_hour'] = df['original_time'].apply(subtract_one_hour)
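The timedelta answer above slices the string form of the result, which depends on how pandas renders timedeltas as text and may differ between versions. A sketch that builds the HH:MM string from the total number of minutes instead is shown below. It assumes, as in the question, inputs between 01:00 and 24:00, so the result never goes negative.

```python
import pandas as pd

df = pd.DataFrame({"col": ["01:00", "02:00", "24:00"]})

# parse "HH:MM" as a duration and subtract one hour
td = pd.to_timedelta(df["col"] + ":00") - pd.Timedelta(hours=1)

# format from whole minutes instead of slicing the text representation
total_minutes = (td.dt.total_seconds() // 60).astype(int).tolist()
df["new"] = [f"{m // 60:02d}:{m % 60:02d}" for m in total_minutes]
print(df)
```

Unlike the `strptime` approach, this also accepts '24:00', because it treats the value as a duration rather than as a clock time.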

Indicate whether datetime of row is in a daterange

I'm trying to get dummy variables for holidays in a dataset. I have a couple of date ranges (pd.date_range()) with holidays and a dataframe to which I would like to append a dummy column that indicates whether the datetime of each row falls in a given holiday date range.
Small example:
import numpy as np
import pandas as pd
ChristmasBreak = list(pd.date_range('2014-12-20','2015-01-04').date)
dates = pd.date_range('2015-01-03', '2015-01-06', freq='H')
d = {'Date': dates, 'Number': np.random.rand(len(dates))}
df = pd.DataFrame(data=d)
df.set_index('Date', inplace=True)
for i, row in df.iterrows():
    if i in ChristmasBreak:
        df.loc[i, 'Christmas'] = 1
The if branch is never entered, so matching the dates this way doesn't work. Is there any way to do this? Alternative methods to get dummies for this case are welcome as well!
First, don't use iterrows, because it is really slow.
A better option is dt.date with Series.isin, then convert the boolean mask to integers (True becomes 1):
df = pd.DataFrame(data=d)
df['Christmas'] = df['Date'].dt.date.isin(ChristmasBreak).astype(int)
Or use between:
df['Christmas'] = df['Date'].between('2014-12-20', '2015-01-04').astype(int)
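If you want to compare against a DatetimeIndex instead:
df = pd.DataFrame(data=d)
df.set_index('Date', inplace=True)
#index.date is a NumPy array without isin, so normalize the index and compare timestamps
df['Christmas'] = df.index.normalize().isin(pd.to_datetime(ChristmasBreak)).astype(int)
#or compare against the range bounds (inclusive, like between)
df['Christmas'] = ((df.index >= '2014-12-20') & (df.index <= '2015-01-04')).astype(int)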
Sample:
ChristmasBreak = pd.date_range('2014-12-20','2015-01-04').date
dates = pd.date_range('2014-12-19 20:00', '2014-12-20 05:00', freq='H')
d = {'Date': dates, 'Number': np.random.randint(10, size=len(dates))}
df = pd.DataFrame(data=d)
df['Christmas'] = df['Date'].dt.date.isin(ChristmasBreak).astype(int)
print (df)
Date Number Christmas
0 2014-12-19 20:00:00 6 0
1 2014-12-19 21:00:00 7 0
2 2014-12-19 22:00:00 0 0
3 2014-12-19 23:00:00 9 0
4 2014-12-20 00:00:00 1 1
5 2014-12-20 01:00:00 3 1
6 2014-12-20 02:00:00 1 1
7 2014-12-20 03:00:00 8 1
8 2014-12-20 04:00:00 2 1
9 2014-12-20 05:00:00 1 1
This should do what you want:
df['Christmas'] = df.index.isin(ChristmasBreak).astype(int)
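For a dataframe indexed by timestamps, one way to make the day-level comparison explicit is to strip the time of day before testing membership. This is a sketch on a few hand-picked timestamps, not the asker's data, and the lowercase names are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# all days of the break, as midnight timestamps
christmas_break = pd.date_range("2014-12-20", "2015-01-04")

index = pd.to_datetime([
    "2015-01-03 00:00",
    "2015-01-04 18:00",   # still inside the last day of the break
    "2015-01-05 00:00",
    "2015-01-06 00:00",
])
df = pd.DataFrame({"Number": np.arange(len(index))}, index=index)

# normalize() resets each timestamp to midnight, so only the day is compared
df["Christmas"] = df.index.normalize().isin(christmas_break).astype(int)
print(df)
```

Comparing midnight timestamps with midnight timestamps avoids mixing `Timestamp` and `datetime.date` objects, which is the kind of mismatch that can keep a membership test from ever matching.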
