Calculate running count on grouped df by more than one column

I want to calculate a running count based on customer_id, date and status.
Sample df:

| id | order_id | customer_id | status | date |
|----|----------|-------------|--------|------------|
| 1 | 101 | 123 | X | 24-07-2021 |
| 2 | 101 | 223 | X | 24-07-2021 |
| 3 | 101 | 223 | X | 24-07-2021 |
| 4 | 101 | 123 | Y | 24-07-2021 |
| 5 | 101 | 123 | X | 24-07-2021 |
| 6 | 102 | 123 | X | 25-07-2021 |
| 7 | 101 | 123 | Y | 24-07-2021 |

Expected result:

| customer_id | status | date | cumulative_count |
|-------------|--------|------------|------------------|
| 123 | X | 24-07-2021 | 1 |
| 223 | X | 24-07-2021 | 1 |
| 223 | X | 24-07-2021 | (1+1) |
| 123 | Y | 24-07-2021 | 1 |
| 123 | X | 24-07-2021 | (1+1) |
| 123 | X | 25-07-2021 | (1+1+1) |
| 123 | Y | 24-07-2021 | (1+1) |

Use `cumcount`:

```python
df['cumulative_count'] = df.groupby(['customer_id', 'status']).cumcount() + 1
```

Output:

```
   id  order_id  customer_id status        date  cumulative_count
1   1       101          123      X  24-07-2021                 1
2   2       101          223      X  24-07-2021                 1
3   3       101          223      X  24-07-2021                 2
4   4       101          123      Y  24-07-2021                 1
5   5       101          123      X  24-07-2021                 2
6   6       102          123      X  25-07-2021                 3
7   7       101          123      Y  24-07-2021                 2
```
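A self-contained sketch that rebuilds the sample df and applies the `cumcount` answer above. The question lists `date` as a key, but the expected output (3 for the 25-07-2021 row) only matches grouping by `customer_id` and `status`. For comparison, the sketch also adds `date` to the keys; the `count_per_day` column is hypothetical and shows the count restarting each day.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "order_id": [101, 101, 101, 101, 101, 102, 101],
    "customer_id": [123, 223, 223, 123, 123, 123, 123],
    "status": ["X", "X", "X", "Y", "X", "X", "Y"],
    "date": ["24-07-2021"] * 5 + ["25-07-2021", "24-07-2021"],
})

# Running count per (customer_id, status): cumcount is 0-based, so add 1
df["cumulative_count"] = df.groupby(["customer_id", "status"]).cumcount() + 1

# Hypothetical variant: also keying on date restarts the count for each day
df["count_per_day"] = df.groupby(["customer_id", "status", "date"]).cumcount() + 1

print(df)
```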

Related

How to Add/Subtract columns based on a particular value in Pandas

I am trying to create a new column called Total_Order_Amount in a DataFrame whose value depends on the Order Status. For example, if the order status is shipped, then Total_Order_Amount = item_price + Tax - item_Discount - Tax_Discount; if the order status is cancelled, then Total_Order_Amount = item_price - item_Discount.
Input DataFrame
+----------+--------------+------------+------+---------------+--------------+
| Order id | Order Status | item price | Tax | item discount | tax discount |
+----------+--------------+------------+------+---------------+--------------+
| 1 | Shipped | 400 | 72 | 30 | 72 |
+----------+--------------+------------+------+---------------+--------------+
| 2 | cancelled | 200 | 36 | 5 | 0 |
+----------+--------------+------------+------+---------------+--------------+
| 3 | Shipped | 180 | 32.4 | 18 | 0 |
+----------+--------------+------------+------+---------------+--------------+
| 4 | cancelled | 600 | 108 | 50 | 108 |
+----------+--------------+------------+------+---------------+--------------+
| 5 | shipped | 500 | 90 | 25 | 90 |
+----------+--------------+------------+------+---------------+--------------+
| 6 | cancelled | 280 | 50.4 | 15 | 50.4 |
+----------+--------------+------------+------+---------------+--------------+
Required final output:
+----------+--------------+------------+------+---------------+--------------+--------------------+
| Order id | Order Status | item price | Tax | item discount | tax discount | total order amount |
+----------+--------------+------------+------+---------------+--------------+--------------------+
| 1 | Shipped | 400 | 72 | 30 | 72 | 370 |
+----------+--------------+------------+------+---------------+--------------+--------------------+
| 2 | cancelled | 200 | 36 | 5 | 0 | 195 |
+----------+--------------+------------+------+---------------+--------------+--------------------+
| 3 | Shipped | 180 | 32.4 | 18 | 0 | 194.4 |
+----------+--------------+------------+------+---------------+--------------+--------------------+
| 4 | cancelled | 600 | 108 | 50 | 108 | 550 |
+----------+--------------+------------+------+---------------+--------------+--------------------+
| 5 | shipped | 500 | 90 | 25 | 90 | 475 |
+----------+--------------+------------+------+---------------+--------------+--------------------+
| 6 | cancelled | 280 | 50.4 | 15 | 50.4 | 265 |
+----------+--------------+------------+------+---------------+--------------+--------------------+
Please help.

Try this:
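```python
def new_col(x):
    # Compare case-insensitively: the data contains both "Shipped" and "shipped"
    if x['Order Status'].lower() == 'shipped':
        return x['item price'] + x['Tax'] - x['item discount'] - x['tax discount']
    else:
        return x['item price'] - x['item discount']

# axis=1 passes each row to new_col as a Series
df['Total_Order_Amount'] = df.apply(new_col, axis=1)
```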

Or without defining a function:
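```python
# Start with the "shipped" formula for every row...
df['Total_Order_Amount'] = (
    df['item price']
    + df['Tax']
    - df['item discount']
    - df['tax discount']
)
# ...then overwrite the rows whose status is not shipped (case-insensitive)
mask_not_shipped = df['Order Status'].str.lower() != 'shipped'
df.loc[mask_not_shipped, 'Total_Order_Amount'] = (
    df.loc[mask_not_shipped, 'item price']
    - df.loc[mask_not_shipped, 'item discount']
)
```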

```python
import numpy as np

def func(a):
    status = a['Order Status'].lower()
    if status == 'shipped':
        return a['item price'] + a['Tax'] - a['item discount'] - a['tax discount']
    elif status == 'cancelled':
        return a['item price'] - a['item discount']
    else:
        # Unknown status: leave the amount missing
        return np.nan

# Apply row-wise (axis=1) so func receives each row, not single column values
df['Total_Order_Amount'] = df.apply(func, axis=1)
```
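A vectorized variant of the last answer, sketched with `numpy.select` (swapped in here and not used by any answer above). It rebuilds the question's input table and, like `func`, leaves statuses other than shipped or cancelled as NaN. The column names are taken from the input table.

```python
import numpy as np
import pandas as pd

# Sample data from the question's input table
df = pd.DataFrame({
    "Order id": [1, 2, 3, 4, 5, 6],
    "Order Status": ["Shipped", "cancelled", "Shipped", "cancelled", "shipped", "cancelled"],
    "item price": [400, 200, 180, 600, 500, 280],
    "Tax": [72, 36, 32.4, 108, 90, 50.4],
    "item discount": [30, 5, 18, 50, 25, 15],
    "tax discount": [72, 0, 0, 108, 90, 50.4],
})

status = df["Order Status"].str.lower()  # treat "Shipped" and "shipped" alike
shipped_total = df["item price"] + df["Tax"] - df["item discount"] - df["tax discount"]
cancelled_total = df["item price"] - df["item discount"]

# np.select takes the first matching condition per row; anything else gets NaN
df["Total_Order_Amount"] = np.select(
    [status == "shipped", status == "cancelled"],
    [shipped_total, cancelled_total],
    default=np.nan,
)
print(df[["Order id", "Order Status", "Total_Order_Amount"]])
```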

Pandas finding intervals (of n-Days) and capturing start/end dates

This started out as a list of activities. I first built a matrix similar to the one below to represent all activities, then inverted it to show all inactivity, and finally built the following matrix, where zero indicates an activity and any value greater than zero is the number of days until the next activity.
+------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
| Item | 01/08/2020 | 02/08/2020 | 03/08/2020 | 04/08/2020 | 05/08/2020 | 06/08/2020 | 07/08/2020 | 08/08/2020 | 09/08/2020 |
+------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
| A | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| B | 3 | 2 | 1 | 0 | 0 | 3 | 2 | 1 | 0 |
| C | 0 | 2 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| D | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | 0 |
| E | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 |
+------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
Now I need to find suitable intervals for each Item. In this case, for instance, I want to find all intervals lasting at least 3 days.
+------+------------+------------+------------+------------+
| Item | 1_START | 1_END | 2_START | 2_END |
+------+------------+------------+------------+------------+
| A | NaN | NaN | NaN | NaN |
| B | 01/08/2020 | 03/08/2020 | 06/08/2020 | 08/08/2020 |
| C | NaN | NaN | NaN | NaN |
| D | 01/08/2020 | 07/08/2020 | NaN | NaN |
| E | 01/08/2020 | NaN | NaN | NaN |
+------+------------+------------+------------+------------+
In reality the data is 700+ columns wide and 1,000+ rows. How can I do this efficiently?
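A minimal NumPy/pandas sketch, assuming one column per consecutive day labelled dd/mm/yyyy and cells holding the days until the next activity, as in the matrix above. The helper `find_intervals` is hypothetical. It relies on the matrix's construction: the value on the first day of an inactive spell equals that spell's full length. The start detection runs on the whole array at once and only the final reshape is done by pandas, but no timings are claimed here. Missing dates come back as `NaT` (pandas' datetime NaN).

```python
import numpy as np
import pandas as pd


def find_intervals(mat: pd.DataFrame, min_days: int) -> pd.DataFrame:
    """Hypothetical helper: inactivity intervals of at least `min_days` per row.

    Assumes one column per consecutive day, labelled dd/mm/yyyy, and that each
    cell holds the number of days until the next activity (0 = activity day).
    """
    dates = pd.to_datetime(mat.columns, format="%d/%m/%Y")
    values = mat.to_numpy()

    # A spell starts on a non-zero day whose previous day is an activity
    # (or which is the first column); a zero column is prepended for day one.
    prev = np.hstack([np.zeros((len(values), 1), dtype=values.dtype), values[:, :-1]])
    rows, cols = np.nonzero((values > 0) & (prev == 0))

    # The value on the start day is the full length of that inactive spell.
    durations = values[rows, cols]
    keep = durations >= min_days
    rows, cols, durations = rows[keep], cols[keep], durations[keep]

    starts = dates[cols]
    ends = starts + pd.to_timedelta(durations - 1, unit="D")
    # Spells that run past the last column have no observed end date -> NaT.
    ends = ends.where(ends <= dates[-1])

    long = pd.DataFrame({"Item": mat.index[rows], "START": starts, "END": ends})
    if long.empty:
        return pd.DataFrame(index=mat.index)
    long["n"] = long.groupby("Item").cumcount() + 1  # 1st, 2nd, ... interval per item

    wide = long.pivot(index="Item", columns="n", values=["START", "END"])
    wide.columns = [f"{n}_{kind}" for kind, n in wide.columns]
    order = [f"{n}_{kind}" for n in range(1, long["n"].max() + 1) for kind in ("START", "END")]
    # Items without any qualifying interval (A and C here) come back as all-NaT rows.
    return wide.reindex(index=mat.index, columns=order)


# The example matrix from the question
days = [f"{d:02d}/08/2020" for d in range(1, 10)]
mat = pd.DataFrame(
    [[0] * 9,
     [3, 2, 1, 0, 0, 3, 2, 1, 0],
     [0, 2, 1, 0, 1, 0, 0, 0, 0],
     [7, 6, 5, 4, 3, 2, 1, 0, 0],
     [11, 10, 9, 8, 7, 6, 5, 4, 3]],
    index=pd.Index(list("ABCDE"), name="Item"),
    columns=days,
)
print(find_intervals(mat, min_days=3))
```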

Joining 2 Tables (without Power Query - Macbook, Index/Match too slow) - Potential VBA Option?

I want to join 2 tables. I know I can do it with Power Query, but I am on a MacBook, so unfortunately I can't. Does anyone have any suggestions? (I would love to try this in VBA; would that be possible?) I've created pivot tables with VBA before, but I have never joined 2 tables. My goal is to create a pivot table from the resulting table (the table produced by combining Table 1 and Table 2).
Table 1
Foreign Keys: Division and Location
Division | Year | Week | Location | SchedDept | PlanNetSales | ActNetSales | AreaCategory
----------|------|------|----------|-----------|--------------|-------------|--------------
5 | 2018 | 10 | 520 | 541 | 1943.2 | 2271.115 | Non-Comm
5 | 2018 | 10 | 520 | 608 | 4378.4 | 5117.255 | Non-Comm
5 | 2018 | 10 | 520 | 1059 | 1044.8 | 1221.11 | Comm
5 | 2018 | 10 | 520 | 1126 | 6308 | 7372.475 | Non-Comm
5 | 2018 | 10 | 520 | 1605 | 1119.2 | 1308.065 | Non-Comm
5 | 2018 | 10 | 520 | 151 | 2995.2 | 3500.64 | Non-Comm
5 | 2018 | 10 | 520 | 1637 | 6371.2 | 7446.34 | Non-Comm
5 | 2018 | 10 | 520 | 3081 | 1203.2 | 1406.24 | Non-Comm
5 | 2018 | 10 | 520 | 6645 | 7350.4 | 8590.78 | Vendor Paid
5 | 2018 | 10 | 520 | 452 | 1676.8 | 1959.76 | Non-Comm
5 | 2018 | 10 | 520 | 527 | 7392 | 8639.4 | Non-Comm
5 | 2018 | 10 | 520 | 542 | 6824.8 | 7976.485 | Non-Comm
5 | 2018 | 10 | 520 | 824 | 1872.8 | 2188.835 | Non-Comm
5 | 2018 | 10 | 520 | 1201 | 6397.6 | 7477.195 | Non-Comm
5 | 2018 | 10 | 520 | 1277 | 2517.6 | 2942.445 | Non-Comm
5 | 2018 | 10 | 520 | 1607 | 2196.8 | 2567.51 | Vendor Paid
5 | 2018 | 10 | 520 | 104 | 3276.8 | 3829.76 | Non-Comm
Table 2
Foreign Keys: Division and Location
Division | Location | LocationName | Region | RegionName | District | DistrictName
----------|----------|--------------|--------|------------|----------|--------------
5 | 520 | Location 520 | 1 | Region 1 | 1 | District 1
5 | 584 | Location 584 | 1 | Region 1 | 1 | District 1
5 | 492 | Location 492 | 1 | Region 1 | 2 | District 2
5 | 215 | Location 215 | 1 | Region 1 | 3 | District 3
5 | 649 | Location 649 | 1 | Region 1 | 4 | District 4
5 | 674 | Location 674 | 1 | Region 1 | 1 | District 1
5 | 139 | Location 139 | 1 | Region 1 | 1 | District 1
5 | 539 | Location 539 | 1 | Region 1 | 5 | District 5
5 | 489 | Location 489 | 1 | Region 1 | 5 | District 5
5 | 139 | Location 139 | 1 | Region 1 | 1 | District 1
5 | 161 | Location 161 | 1 | Region 1 | 6 | District 6
5 | 543 | Location 543 | 1 | Region 1 | 4 | District 4
5 | 166 | Location 166 | 1 | Region 1 | 6 | District 6
5 | 71 | Location 71 | 1 | Region 1 | 5 | District 5
5 | 618 | Location 618 | 1 | Region 1 | 5 | District 5
I did it with INDEX/MATCH, but it is super slow.
I tried it that way and then again with the table and column names:
=INDEX(LocTable[[#Headers],[Region]], MATCH(MetricsTable[[#Headers],[Division]]&MetricsTable[[#Headers],[Location]],LocTable[[#Headers],[Division]]&LocTable[[#Headers],[Location]],0))
However, this produces an array formula, and Excel reports "Multi-cell array formulas are not allowed in tables". Is the only solution to convert back to plain ranges (non-tables) so I can run my formula and live with the slowness, or is there an option in VBA or elsewhere? Thanks in advance!
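If exporting the two tables (for example to CSV) and joining them outside Excel is acceptable, here is a minimal pandas sketch. It is an assumption-based alternative, not an Excel or VBA answer. It uses a few rows and columns from each table, drops Table 2's exact duplicate row for Location 139, left-joins on the composite key (Division, Location), and then builds a pivot-table-style summary. The choice of `DistrictName` × `AreaCategory` summing `ActNetSales` is only an illustrative layout.

```python
import pandas as pd

# A few rows/columns from Table 1 (metrics) and Table 2 (locations) in the question
metrics = pd.DataFrame({
    "Division": [5, 5, 5],
    "Year": [2018, 2018, 2018],
    "Week": [10, 10, 10],
    "Location": [520, 520, 520],
    "SchedDept": [541, 608, 1059],
    "ActNetSales": [2271.115, 5117.255, 1221.11],
    "AreaCategory": ["Non-Comm", "Non-Comm", "Comm"],
})
locations = pd.DataFrame({
    "Division": [5, 5, 5, 5],
    "Location": [520, 584, 139, 139],
    "RegionName": ["Region 1"] * 4,
    "DistrictName": ["District 1"] * 4,
})

# Table 2 lists Location 139 twice; drop exact duplicates so each key matches once
locations = locations.drop_duplicates()

# Left join on the composite key; validate raises MergeError if a key is still duplicated
joined = metrics.merge(locations, on=["Division", "Location"], how="left",
                       validate="many_to_one")

# Pivot-table equivalent: total actual net sales per district and area category
summary = joined.pivot_table(index="DistrictName", columns="AreaCategory",
                             values="ActNetSales", aggfunc="sum")
print(summary)
```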

Insert Cell data in one row ( from A1 to An )

I have been trying to solve this using tutorials found on Google, but I haven't found an answer. I hope I can find one here.
Here is my data:
A B C
+-------+----+-----+
1 | 123 | 4 | 5 |
2 | 678 | 9 | 10 |
+-------+----+-----+
The result I need:
A B C
+-------+----+-----+
1 | 123 | | |
2 | 678 | | |
3 | 4 | | |
4 | 5 | | |
5 | 9 | | |
6 | 10 | | |
+-------+----+-----+
or in another order, like:
123
4
5
678
...
Does anyone know how to solve this?
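The two reading orders mentioned in the question can be illustrated with a short Python sketch. It assumes the cell values have already been read into a list of rows; `grid` is a hypothetical stand-in for A1:C2. This is not an Excel formula. It only shows the row-by-row order (123, 4, 5, 678, ...) versus the column-by-column order.

```python
# Hypothetical stand-in for the range A1:C2 in the question
grid = [
    [123, 4, 5],
    [678, 9, 10],
]

# Row by row (A1, B1, C1, A2, ...): the "123, 4, 5, 678, ..." order
row_major = [value for row in grid for value in row]

# Column by column (A1, A2, B1, B2, ...): zip(*grid) yields the columns
column_major = [value for column in zip(*grid) for value in column]

print(row_major)
print(column_major)
```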

Excel: I need a formula in the column named "FEBRUARY"

I have a set of data as below.
SHEET 1
+------+-------+
| JANUARY |
+------+-------+
+----+----------+------+-------+
| ID | NAME |COUNT | PRICE |
+----+----------+------+-------+
| 1 | ALFRED | 11 | 150 |
| 2 | ARIS | 22 | 120 |
| 3 | JOHN | 33 | 170 |
| 4 | CHRIS | 22 | 190 |
| 5 | JOE | 55 | 120 |
| 6 | ACE | 11 | 200 |
+----+----------+------+-------+
SHEET2
+----+----------+------+-------+
| ID | NAME |COUNT | PRICE |
+----+----------+------+-------+
| 1 | CHRIS | 13 | 123 |
| 2 | ACE | 26 | 165 |
| 3 | JOE | 39 | 178 |
| 4 | ALFRED | 21 | 198 |
| 5 | JOHN | 58 | 112 |
| 6 | ARIS | 11 | 200 |
+----+----------+------+-------+
The result should look like this in Sheet1:
+------+-------++------+-------+
| JANUARY | FEBRUARY |
+------+-------++------+-------+
+----+----------+------+-------++-------+-------+
| ID | NAME |COUNT | PRICE || COUNT | PRICE |
+----+----------+------+-------++-------+-------+
| 1 | ALFRED | 11 | 150 || 21 | 198 |
| 2 | ARIS | 22 | 120 || 11 | 200 |
| 3 | JOHN | 33 | 170 || 58 | 112 |
| 4 | CHRIS | 22 | 190 || 13 | 123 |
| 5 | JOE | 55 | 120 || 39 | 178 |
| 6 | ACE | 11 | 200 || 26 | 165 |
+----+----------+------+-------++-------+-------+
I need a formula for the "FEBRUARY" columns that finds each name's match in Sheet2.

Assuming the first Count value should go in cell E3 of Sheet1, the usual way to do this is:
=INDEX(Sheet2!C:C,MATCH($B3,Sheet2!$B:$B,0))
The Price (in F3) is then given by:
=INDEX(Sheet2!D:D,MATCH($B3,Sheet2!$B:$B,0))
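For comparison, the same lookup sketched in pandas, assuming both sheets have been loaded as DataFrames with the columns shown in the question. The left merge on `NAME` plays the role of MATCH, and taking `COUNT`/`PRICE` from the matched row plays the role of INDEX. One difference: if a name appeared more than once in Sheet2, the merge would produce one row per match, whereas MATCH returns only the first.

```python
import pandas as pd

# Sheet1 (January) and Sheet2 (February) from the question
jan = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "NAME": ["ALFRED", "ARIS", "JOHN", "CHRIS", "JOE", "ACE"],
    "COUNT": [11, 22, 33, 22, 55, 11],
    "PRICE": [150, 120, 170, 190, 120, 200],
})
feb = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "NAME": ["CHRIS", "ACE", "JOE", "ALFRED", "JOHN", "ARIS"],
    "COUNT": [13, 26, 39, 21, 58, 11],
    "PRICE": [123, 165, 178, 198, 112, 200],
})

# Match by NAME (not ID), keeping Sheet1's row order; Sheet2's ID is not needed
result = jan.merge(feb.drop(columns="ID"), on="NAME", how="left",
                   suffixes=("_JAN", "_FEB"))
print(result)
```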

This query should meet your requirement:
SELECT `Sheet1$`.ID,`Sheet1$`.NAME, `Sheet1$`.COUNT AS 'Jan-COUNT',`Sheet1$`.PRICE AS 'Jan-PRICE', `Sheet2$`.COUNT AS 'Feb-COUNT',`Sheet2$`.PRICE AS 'Feb-PRICE'
FROM `C:\Users\Nagendra\Desktop\aaaaa.xlsx`.`Sheet1$` `Sheet1$`, `C:\Users\Nagendra\Desktop\aaaaa.xlsx`.`Sheet2$` `Sheet2$`
WHERE (`Sheet1$`.NAME=`Sheet2$`.NAME)
Replace C:\Users\Nagendra\Desktop\aaaaa.xlsx with the actual path to your workbook.
First you need to know how to set up the connection; see http://smallbusiness.chron.com/use-sql-statements-ms-excel-41193.html
