Merge cells based on week number openpyxl - python-3.x

I am generating an Excel sheet from a date range. Along with the date I am also computing the week number. Currently I am able to generate a sheet like the one below:
| Date | Week |
|-----------|------|
| 1/4/2021 | 1 |
| 1/5/2021 | 1 |
| 1/6/2021 | 1 |
| 1/7/2021 | 1 |
| 1/8/2021 | 1 |
| 1/9/2021 | 1 |
| 1/10/2021 | 1 |
| 1/11/2021 | 2 |
| 1/12/2021 | 2 |
| 1/13/2021 | 2 |
| 1/14/2021 | 2 |
| 1/15/2021 | 2 |
| 1/16/2021 | 2 |
| 1/17/2021 | 2 |
| 1/18/2021 | 3 |
| 1/19/2021 | 3 |
| 1/20/2021 | 3 |
| 1/21/2021 | 3 |
| 1/22/2021 | 3 |
| 1/23/2021 | 3 |
| 1/24/2021 | 3 |
I am expecting it to look like this:

| Date | Week |
|-----------|------|
| 1/4/2021 | 1 |
| 1/5/2021 | |
| 1/6/2021 | |
| 1/7/2021 | |
| 1/8/2021 | |
| 1/9/2021 | |
| 1/10/2021 | |
| 1/11/2021 | 2 |
| 1/12/2021 | |
| 1/13/2021 | |
| 1/14/2021 | |
| 1/15/2021 | |
| 1/16/2021 | |
| 1/17/2021 | |
| 1/18/2021 | 3 |
| 1/19/2021 | |
| 1/20/2021 | |
| 1/21/2021 | |
| 1/22/2021 | |
| 1/23/2021 | |
| 1/24/2021 | |
I used markdown tables here, but in the actual sheet the cells in the Week column with the same week number should be merged.
code used:
from datetime import datetime, timedelta
from openpyxl import Workbook

start_date = datetime.strptime("2021-01-01", "%Y-%m-%d")
end_date = datetime.strptime("2022-01-01", "%Y-%m-%d")
delta = timedelta(days=1)
wb = Workbook()
ws2 = wb.create_sheet("Test sheet")
row_number = 4
while start_date <= end_date:
    ws2[f'A{row_number}'] = f"{start_date.month}/{start_date.day}/{start_date.year}"
    ws2[f'B{row_number}'] = start_date.isocalendar()[1]
    ws2[f'C{row_number}'] = start_date.strftime("%B")
    row_number += 1
    start_date += delta

You can change:
ws2[f'B{row_number}'] = start_date.isocalendar()[1]
to:
ws2[f'B{row_number}'] = start_date.isocalendar()[1] if start_date.weekday() == 0 else ''
so the week number is written only on Mondays, leaving the other rows of each week blank.
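If you want real merged cells rather than blanks, openpyxl's Worksheet.merge_cells can merge each run of equal week numbers in column B. A minimal sketch along the lines of the question's loop (the date range is trimmed to the example weeks; the variable names `week_start_row`/`current_week` are my own):

```python
from datetime import datetime, timedelta
from openpyxl import Workbook

wb = Workbook()
ws2 = wb.create_sheet("Test sheet")

start_date = datetime.strptime("2021-01-04", "%Y-%m-%d")
end_date = datetime.strptime("2021-01-24", "%Y-%m-%d")

row_number = 4
week_start_row = row_number                  # first row of the current week block
current_week = start_date.isocalendar()[1]

while start_date <= end_date:
    week = start_date.isocalendar()[1]
    if week != current_week:
        # the week changed: merge the block that just finished in column B
        ws2.merge_cells(f'B{week_start_row}:B{row_number - 1}')
        current_week = week
        week_start_row = row_number
    ws2[f'A{row_number}'] = f"{start_date.month}/{start_date.day}/{start_date.year}"
    ws2[f'B{row_number}'] = week
    row_number += 1
    start_date += timedelta(days=1)

# merge the final block
ws2.merge_cells(f'B{week_start_row}:B{row_number - 1}')
```

openpyxl keeps only the top-left value of a merged range, so the week number survives in the first row of each block, which is exactly the merged layout the question asks for.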


Last Day previous Month

I have this dataframe
import pandas as pd
df = pd.DataFrame({'Found':['A','A','A','A','A','B','B','B','B'],
'Date':['14/10/2021','19/10/2021','29/10/2021','30/09/2021','20/09/2021','20/10/2021','29/10/2021','15/10/2021','10/09/2021'],
'LastDayMonth':['29/10/2021','29/10/2021','29/10/2021','30/09/2021','30/09/2021','29/10/2021','29/10/2021','29/10/2021','30/09/2021'],
'Mark':[1,2,3,4,3,1,2,3,2]
})
print(df)
Found Date LastDayMonth Mark
0 A 14/10/2021 29/10/2021 1
1 A 19/10/2021 29/10/2021 2
2 A 29/10/2021 29/10/2021 3
3 A 30/09/2021 30/09/2021 4
4 A 20/09/2021 30/09/2021 3
5 B 20/10/2021 29/10/2021 1
6 B 29/10/2021 29/10/2021 2
7 B 15/10/2021 29/10/2021 3
8 B 10/09/2021 30/09/2021 2
Based on this dataframe I need to create a new column containing the "Mark" of the last day of the month; that is, for each Found I need the value of the 'Mark' column on the last day of the month.
Here is what I did:
mark_last_day = df.loc[df.apply(lambda x: x['Date'] == x['LastDayMonth'], axis=1)]
df.merge(mark_last_day[['Found', 'LastDayMonth', 'Mark']],
         how='left',
         on=['Found', 'LastDayMonth'],
         suffixes=('', '_LastDayMonth'))
# Output
Found Date LastDayMonth Mark Mark_LastDayMonth
0 A 14/10/2021 29/10/2021 1 3
1 A 19/10/2021 29/10/2021 2 3
2 A 29/10/2021 29/10/2021 3 3
3 A 30/09/2021 30/09/2021 4 4
4 A 20/09/2021 30/09/2021 3 4
5 B 20/10/2021 29/10/2021 1 2
6 B 29/10/2021 29/10/2021 2 2
7 B 15/10/2021 29/10/2021 3 2
So far so good, but I'm having trouble creating a new column with the Mark_LastDayMonth of the previous month; that is, I need the last-day Mark of both the current month and the previous month.
How do I do that?
Ex.
Found Date LastDayMonth Mark Mark_LastDayMonth Mark_LastDayPrevious_Month
0 A 14/10/2021 29/10/2021 1 3 4
1 A 19/10/2021 29/10/2021 2 3 4
2 A 29/10/2021 29/10/2021 3 3 4
3 A 30/09/2021 30/09/2021 4 4 x
4 A 20/09/2021 30/09/2021 3 4 x
5 B 20/10/2021 29/10/2021 1 2 1
6 B 29/10/2021 29/10/2021 2 2 1
7 B 15/10/2021 29/10/2021 3 2 1
8 B 10/09/2021 30/09/2021 1 1 x
Here is a function to get the last day of the previous month:
import datetime

def get_prev_month(date_str):
    format_str = '%d/%m/%Y'
    datetime_obj = datetime.datetime.strptime(date_str, format_str)
    first_day_of_this_month = datetime_obj.replace(day=1)
    last_day_of_prev_month = first_day_of_this_month - datetime.timedelta(days=1)
    return last_day_of_prev_month.strftime("%d/%m/%Y")
Here is a function to get the mark of any "date" and "found" from your mark_last_day variable (it returns NaN when there is no matching row, so the fillna below has something to work with):
def get_mark_of(date_str, found):
    same_date = mark_last_day.Date == date_str
    same_found = mark_last_day.Found == found
    match = mark_last_day.loc[same_date & same_found, 'Mark']
    return match.iloc[0] if not match.empty else float('nan')
If you want, you can also add a LastDayPrevMonth column (it is not required):
df["LastDayPrevMonth"] = df.LastDayMonth.apply(get_prev_month)
And finally create the column Mark_LastDayPrevMonth, setting 0 where the previous month does not exist in the dataset:
df["Mark_LastDayPrevMonth"] = df.apply(lambda x: get_mark_of(get_prev_month(x["LastDayMonth"]), x["Found"]), axis=1).fillna(0).astype(int)
Use the date offset MonthEnd
from pandas.tseries.offsets import MonthEnd
df['LastDayPreviousMonth'] = df['Date'] - MonthEnd()
>>> df[['Date', 'LastDayPreviousMonth']]
Date LastDayPreviousMonth
0 2021-10-14 2021-09-30
1 2021-10-19 2021-09-30
2 2021-10-29 2021-09-30
3 2021-09-30 2021-08-31
4 2021-09-20 2021-08-31
5 2021-10-20 2021-09-30
6 2021-10-29 2021-09-30
7 2021-10-15 2021-09-30
Then do a similar merge to the one you did for 'LastDayMonth'.
Does this help you complete the solution?
Note: I'm assuming 'Date' and 'LastDayPreviousMonth' are datetime-like. If they aren't, you need to convert them first using
df[['Date', 'LastDayMonth']] = df[['Date', 'LastDayMonth']].apply(pd.to_datetime)
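Putting the pieces together, the previous-month mark can be pulled in with a second merge against the same month-end rows, keyed on LastDayPreviousMonth. A sketch with the question's data (rows whose previous month-end has no entry for that Found come out as NaN, matching the "x" cells in the expected output):

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

df = pd.DataFrame({
    'Found': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Date': ['14/10/2021', '19/10/2021', '29/10/2021', '30/09/2021',
             '20/09/2021', '20/10/2021', '29/10/2021', '15/10/2021', '10/09/2021'],
    'LastDayMonth': ['29/10/2021', '29/10/2021', '29/10/2021', '30/09/2021',
                     '30/09/2021', '29/10/2021', '29/10/2021', '29/10/2021', '30/09/2021'],
    'Mark': [1, 2, 3, 4, 3, 1, 2, 3, 2],
})
df[['Date', 'LastDayMonth']] = df[['Date', 'LastDayMonth']].apply(pd.to_datetime, dayfirst=True)

# rows observed on a month-end date
mark_last_day = df[df['Date'] == df['LastDayMonth']]

df['LastDayPreviousMonth'] = df['Date'] - MonthEnd()

# join each row to the month-end row of the *previous* month for the same Found
df = df.merge(
    mark_last_day[['Found', 'LastDayMonth', 'Mark']]
        .rename(columns={'LastDayMonth': 'LastDayPreviousMonth'}),
    how='left',
    on=['Found', 'LastDayPreviousMonth'],
    suffixes=('', '_LastDayPrevMonth'),
)
```

Note the question's LastDayMonth values are business-day month ends (29/10), while MonthEnd() produces calendar month ends; here the September end (30/09) coincides, which is the only previous-month key the sample data actually needs.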

How do I add new column that adds and sums counts from existing column?

I have this python code:
counting_bach_new = counting_bach.groupby(['User Name', 'time_diff', 'Logon Time']).size()
print("\ncounting_bach_new")
print(counting_bach_new)
...getting this neat result:
counting_bach_new
User Name time_diff Logon Time
122770 -132 days +21:38:00 1 1
-122 days +00:41:00 1 1
123526 -30 days +12:04:00 1 1
-29 days +16:39:00 1 1
-27 days +18:16:00 1 1
..
201685 -131 days +21:21:00 1 1
202047 -106 days +10:14:00 1 1
202076 -132 days +10:22:00 1 1
-132 days +14:46:00 1 1
-131 days +21:21:00 1 1
So how do I add a new column that sums counts from an existing column? The rightmost column of 1s should be disregarded; instead I'd like a new column summing the count of time_diffs per User Name, i.e. the new column should hold the number of observations listed per user (counting either time_diffs or Logon Times). For User Name 122770 the new column should sum to 2, for 123526 to 3, and so on.
I made several attempts, including this one (not working):
counting_bach_new.groupby('User Name').agg(MySum=('Logon Time', 'sum'), MyCount=('Logon Time', 'count'))
Any help would be appreciated. Thank you, for your kind support. Christmas Greetings from #Hubsandspokes
Use DataFrame.join with Series.reset_index:
df = (counting_bach_new.to_frame('count')
        .join(counting_bach_new.reset_index()
                               .groupby('User Name')
                               .agg(MySum=('Logon Time', 'sum'),
                                    MyCount=('Logon Time', 'count')),
              on='User Name'))
print (df)
count MySum MyCount
User Name time_diff Logon Time
122770 -132 days +21:38:00 1 1 2 2
-122 days +00:41:00 1 1 2 2
123526 -30 days +12:04:00 1 1 3 3
-29 days +16:39:00 1 1 3 3
-27 days +18:16:00 1 1 3 3
201685 -131 days +21:21:00 1 1 1 1
202047 -106 days +10:14:00 1 1 1 1
202076 -132 days +10:22:00 1 1 3 3
-132 days +14:46:00 1 1 3 3
-131 days +21:21:00 1 1 3 3
If I understand the request correctly, try:
counting_bach_new.reset_index().groupby(['User Name'])['Logon Time'].count()
If you need to save starting number of columns, try:
counting_bach_new.reset_index().groupby(['User Name'])['Logon Time'].transform('count')
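The difference between the two forms shows up on a toy version of the data (a hypothetical miniature of counting_bach_new, i.e. a size() result with a MultiIndex): count collapses to one row per user, while transform('count') broadcasts the count back onto every original row:

```python
import pandas as pd

# Hypothetical miniature of counting_bach_new: one row per
# (User Name, time_diff) pair, each with count 1.
s = pd.Series(
    [1, 1, 1, 1, 1],
    index=pd.MultiIndex.from_tuples(
        [(122770, '-132 days'), (122770, '-122 days'),
         (123526, '-30 days'), (123526, '-29 days'), (123526, '-27 days')],
        names=['User Name', 'time_diff'],
    ),
    name='count',
)

# one row per user
per_user = s.reset_index().groupby('User Name')['time_diff'].count()
# one value per original row, suitable for assigning as a new column
broadcast = s.reset_index().groupby('User Name')['time_diff'].transform('count')
```

Here per_user holds 2 for 122770 and 3 for 123526, while broadcast repeats those counts across each user's rows, which is what you want when adding a column to the existing frame.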

Groupby and filter rows based on multiple conditions in Pandas

Given a dataframe as follow:
store_id item_id items_sold date
1 1 0 2015-12-28
1 1 1 2015-12-28
1 1 0 2015-12-28
2 2 0 2015-12-28
2 2 1 2015-12-29
2 2 1 2015-12-29
2 2 0 2015-12-29
3 1 0 2015-12-30
3 1 0 2015-12-30
I want to group by store_id and item_id, then remove the groups that have fewer than 4 entries or whose items_sold values are all 0.
For removing groups based on the first condition I used the code below; how can I combine the second condition with it?
g = df.groupby(['store_id', 'item_id'])
df = g.filter(lambda x: len(x) >= 4)
The expected output will like:
store_id item_id items_sold date
2 2 0 2015-12-28
2 2 1 2015-12-29
2 2 1 2015-12-29
2 2 0 2015-12-29
Thanks.
We can build a boolean mask of the rows where items_sold is 0, group it by store_id/item_id, and check whether all rows in a group are True, inverting the result:
m1 = ~df['items_sold'].eq(0).groupby([df['store_id'], df['item_id']]).transform('all')
m2 = df.groupby(['store_id', 'item_id'])['store_id'].transform('size') >= 4
df[m1 & m2]
store_id item_id items_sold date
3 2 2 0 2015-12-28
4 2 2 1 2015-12-29
5 2 2 1 2015-12-29
6 2 2 0 2015-12-29
Fix your code:
g.filter(lambda x: (len(x) >= 4) & (x['items_sold'].sum() > 0))
store_id item_id items_sold date
3 2 2 0 2015-12-28
4 2 2 1 2015-12-29
5 2 2 1 2015-12-29
6 2 2 0 2015-12-29
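The answers above can be checked end to end on the question's data; a runnable sketch using the filter form (plain `and`/`any()` also work here, since the lambda operates on one whole group and returns a scalar):

```python
import pandas as pd

df = pd.DataFrame({
    'store_id':   [1, 1, 1, 2, 2, 2, 2, 3, 3],
    'item_id':    [1, 1, 1, 2, 2, 2, 2, 1, 1],
    'items_sold': [0, 1, 0, 0, 1, 1, 0, 0, 0],
    'date': ['2015-12-28', '2015-12-28', '2015-12-28', '2015-12-28',
             '2015-12-29', '2015-12-29', '2015-12-29', '2015-12-30', '2015-12-30'],
})

# keep groups with at least 4 rows AND at least one non-zero items_sold
out = df.groupby(['store_id', 'item_id']).filter(
    lambda x: len(x) >= 4 and x['items_sold'].any()
)
```

Only the (2, 2) group survives: (1, 1) has just 3 rows and (3, 1) has only zero sales.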

manipulating pandas dataframe - conditional

I have a pandas dataframe that looks like this:
ID Date Event_Type
1 01/01/2019 A
1 01/01/2019 B
2 02/01/2019 A
3 02/01/2019 A
I want to be left with:
ID Date
1 01/01/2019
2 02/01/2019
3 02/01/2019
Where my condition is:
If the ID is the same AND the dates are within 2 days of each other then drop one of the rows.
If however the dates are more than 2 days apart then keep both rows.
How do I do this?
I believe you first need to convert the values to datetimes with to_datetime, then take the per-group diff and keep the first row of each group (where the diff is NaT, via isnull()) together with the rows whose gap to the previous row exceeds the timedelta threshold:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
s = df.groupby('ID')['Date'].diff()
df = df[(s.isnull() | (s > pd.Timedelta(2, 'd')))]
print (df)
ID Date Event_Type
0 1 2019-01-01 A
2 2 2019-02-01 A
3 3 2019-02-01 A
Check the solution with different data:
print (df)
ID Date Event_Type
0 1 01/01/2019 A
1 1 04/01/2019 B <-difference 3 days
2 2 02/01/2019 A
3 3 02/01/2019 A
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
s = df.groupby('ID')['Date'].diff()
df = df[(s.isnull() | (s > pd.Timedelta(2, 'd')))]
print (df)
ID Date Event_Type
0 1 2019-01-01 A
1 1 2019-01-04 B
2 2 2019-01-02 A
3 3 2019-01-02 A
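The same logic can be wrapped as a reusable helper with the 2-day window as a parameter (drop_close_duplicates is a hypothetical name, not from the question; data is the question's sample):

```python
import pandas as pd

def drop_close_duplicates(df, days=2):
    # Keep a row if it is the first of its ID group (diff is NaT)
    # or falls more than `days` days after the previous row.
    out = df.copy()
    out['Date'] = pd.to_datetime(out['Date'], format='%d/%m/%Y')
    gap = out.groupby('ID')['Date'].diff()
    return out[gap.isnull() | (gap > pd.Timedelta(days, 'D'))]

df = pd.DataFrame({
    'ID': [1, 1, 2, 3],
    'Date': ['01/01/2019', '01/01/2019', '02/01/2019', '02/01/2019'],
    'Event_Type': ['A', 'B', 'A', 'A'],
})
result = drop_close_duplicates(df)
```

For the sample data this drops the second ID 1 row (same date, within the window) and keeps everything else.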

day of Year values starting from a particular date

I have a dataframe with a date column. The duration is 365 days starting from 02/11/2017 and ending at 01/11/2018.
Date
02/11/2017
03/11/2017
05/11/2017
.
.
01/11/2018
I want to add an adjacent column called Day_Of_Year as follows:
Date Day_Of_Year
02/11/2017 1
03/11/2017 2
05/11/2017 4
.
.
01/11/2018 365
I apologize if it's a very basic question, but unfortunately I haven't been able to get started with this.
I could use datetime(), but that would return values such as 1 for 1st January, 2 for 2nd January and so on, irrespective of the year. So that wouldn't work for me.
First convert the column with to_datetime, then subtract the start date, convert to days and add 1:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df['Day_Of_Year'] = df['Date'].sub(pd.Timestamp('2017-11-02')).dt.days + 1
print (df)
Date Day_Of_Year
0 02/11/2017 1
1 03/11/2017 2
2 05/11/2017 4
3 01/11/2018 365
Or subtract by first value of column:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df['Day_Of_Year'] = df['Date'].sub(df['Date'].iat[0]).dt.days + 1
print (df)
Date Day_Of_Year
0 2017-11-02 1
1 2017-11-03 2
2 2017-11-05 4
3 2018-11-01 365
Using strftime with '%j' (subtracting the first value gives 0-based offsets, so add 1 if you want Day_Of_Year):
s=pd.to_datetime(df.Date,dayfirst=True).dt.strftime('%j').astype(int)
s-s.iloc[0]
Out[750]:
0 0
1 1
2 3
Name: Date, dtype: int32
#df['new']=s-s.iloc[0]
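One caveat with the %j approach: %j restarts at 1 every January, so once the range crosses into 2018 the subtraction goes negative instead of continuing past 365 (the subtraction-of-datetimes approach above does not have this problem). A quick check with the first and last dates of the question:

```python
import pandas as pd

s = pd.to_datetime(pd.Series(['02/11/2017', '01/11/2018']),
                   dayfirst=True).dt.strftime('%j').astype(int)
# %j of 02/11/2017 is 306 and %j of 01/11/2018 is 305,
# so the difference wraps to -1 instead of 364
diff = s - s.iloc[0]
```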
Pandas has Series.dt.dayofyear. So put your column in the right format with pd.to_datetime and then use .dt.dayofyear. Lastly, use some modulo arithmetic to express everything in terms of your original start date:
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df['day of year'] = df['Date'].dt.dayofyear - df['Date'].dt.dayofyear[0] + 1
df['day of year'] = df['day of year'] + 365*((365 - df['day of year']) // 365)
Output
Date day of year
0 2017-11-02 1
1 2017-11-03 2
2 2017-11-05 4
3 2018-11-01 365
But I'm doing essentially the same as Jezrael in more lines of code, so my vote goes to her/him
