Replace accounting notation for negative number with minus value - python-3.x

I have a dataframe which contains negative numbers in accountancy notation, i.e.:
df.select('sales').distinct().show()
+------------+
| sales |
+------------+
| 18 |
| 3 |
| 10 |
| (5)|
| 4 |
| 40 |
| 0 |
| 8 |
| 16 |
| (2)|
| 2 |
| (1)|
| 14 |
| (3)|
| 9 |
| 19 |
| (6)|
| 1 |
| (9)|
| (4)|
+------------+
only showing top 20 rows
The numbers wrapped in () are negative. How can I replace them with minus values instead, i.e. (5) becomes -5, and so on?
Here is what I have tried:
import pyspark.sql.functions as sf

sales = (
    df
    .select('sales')
    .withColumn('sales_new',
                sf.when(sf.col('sales').substr(1, 1) == '(',
                        sf.concat(sf.lit('-'), sf.col('sales').substr(2, 3)))
                .otherwise(sf.col('sales')))
)
sales.show(20, False)
+---------+---------+
|sales    |sales_new|
+---------+---------+
| 151 | 151 |
| 134 | 134 |
| 151 | 151 |
|(151) |-151 |
|(134) |-134 |
|(151) |-151 |
| 151 | 151 |
| 50 | 50 |
| 101 | 101 |
| 134 | 134 |
|(134) |-134 |
| 46 | 46 |
| 151 | 151 |
| 134 | 134 |
| 185 | 185 |
| 84 | 84 |
| 188 | 188 |
|(94) |-94) |
| 38 | 38 |
| 21 | 21 |
+---------+---------+
The issue is that the length of sales can vary, so hardcoding a length into substr() won't work in some cases: the (94) row keeps its trailing parenthesis and becomes -94).
I have tried using regexp_replace but get an error:
PatternSyntaxException: Unclosed group near index 1
sales = (
    df
    .select('sales')
    .withColumn('sales_new', regexp_replace(sf.col('sales'), '(', ''))
)

This can be solved with a case statement (when/otherwise) and a regular expression together. The PatternSyntaxException above occurs because ( is a regex metacharacter, so it has to be escaped in the pattern:
from pyspark.sql.functions import regexp_replace

sales = (
    df
    .select('sales')
    .withColumn('sales_new',
                sf.when(sf.col('sales').substr(1, 1) == '(',
                        sf.concat(sf.lit('-'),
                                  regexp_replace(sf.col('sales'), r'\(|\)', '')))
                .otherwise(sf.col('sales')))
)
sales.show(20, False)
+---------+---------+
|sales |sales_new|
+---------+---------+
|151 |151 |
|134 |134 |
|151 |151 |
|(151) |-151 |
|(134) |-134 |
|(151) |-151 |
|151 |151 |
|50 |50 |
|101 |101 |
|134 |134 |
|(134) |-134 |
|46 |46 |
|151 |151 |
|134 |134 |
|185 |185 |
|84 |84 |
|188 |188 |
|(94) |-94 |
|38 |38 |
|21 |21 |
+---------+---------+
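For reference, the branch can be avoided entirely by chaining two substitutions: drop the closing parenthesis and turn the opening one into a minus sign. Here is a sketch of that regex logic in plain Python, so it can be checked in isolation; in PySpark the equivalent would be two nested regexp_replace calls (an assumption, not tested against the frames above):

```python
import re

def accounting_to_minus(value: str) -> str:
    """Turn accounting notation like '(94)' into '-94'; other values pass through."""
    value = value.strip()
    value = re.sub(r'\)\s*$', '', value)   # drop a trailing ')'
    value = re.sub(r'^\(', '-', value)     # turn a leading '(' into '-'
    return value

print([accounting_to_minus(v) for v in ['151', '(94)', '0', '(5)']])
# ['151', '-94', '0', '-5']
```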

You can slice the string from the second character to the second-last character, negate it, and convert it to float, for example:
def convert(number):
    try:
        number = float(number)
    except ValueError:
        number = -float(number[1:-1])
    return number
You can iterate through all the elements and apply this function.
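Applied across a list, with the negation made explicit so '(5)' comes out as -5.0 (the same function can be passed to a pandas column via .map or .apply):

```python
def convert(number):
    """Parse a value, treating accounting notation '(n)' as negative."""
    try:
        return float(number)
    except ValueError:
        return -float(number[1:-1])

values = ['18', '(5)', '0', '(94)', '151']
print([convert(v) for v in values])
# [18.0, -5.0, 0.0, -94.0, 151.0]
```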

Related

How does efficient_apriori package code itemsets?

Doing association rules mining using the efficient_apriori package in Python. I found a very useful answer on how to convert the output to a dataframe.
However, I am struggling with the itemset output and hoping someone can help me parse it correctly. I am guessing LHS is an index value, but I am struggling with the decimal values in RHS. Does anyone know how the encoding is done? I have tried the same with SKU descriptions and get the same output.
Input dataframe looks like this:
| SKU | Count | Percent |
|----------------------------------------------------------------------|-------|-------------|
| "('000000009100000749',)" | 110 | 0.029633621 |
| "('000000009100000749', '000000009100000776')" | 1 | 0.000269397 |
| "('000000009100000749', '000000009100000776', '000000009100002260')" | 1 | 0.000269397 |
| "('000000009100000749', '000000009100000777', '000000009100002260')" | 1 | 0.000269397 |
| "('000000009100000749', '000000009100000777', '000000009100002530')" | 1 | 0.000269397 |
Output looks like this:
| | lhs | rhs | count_full | count_lhs | count_rhs | num_transactions | confidence | support |
|---|-----------------------------|-----------------------------|------------|-----------|-----------|------------------|------------|-------------|
| 0 | "(1,)" | "(0.00026939655172413793,)" | 168 | 168 | 168 | 297 | 1 | 0.565656566 |
| 1 | "(0.00026939655172413793,)" | "(1,)" | 168 | 168 | 168 | 297 | 1 | 0.565656566 |
| 2 | "(2,)" | "(0.0005387931034482759,)" | 36 | 36 | 36 | 297 | 1 | 0.121212121 |
| 3 | "(0.0005387931034482759,)" | "(2,)" | 36 | 36 | 36 | 297 | 1 | 0.121212121 |
| 4 | "(3,)" | "(0.0008081896551724138,)" | 21 | 21 | 21 | 297 | 1 | 0.070707071 |
Could someone help me understand what's outputting in the LHS and RHS columns, and how to join that back to the 'SKU'? Ideally would like the output to have the 'SKU' instead of whatever is showing up.
I have looked at the documentation and it is quite sparse.

Unpivoting data in excel that contains multiple (15) categories in columns using vba

I have code that unpivots columns into rows. There are 19 categories of data, 15 of which have been unpivoted. However, some of the unpivoted tables are not showing up on the new rows. I am asking for anyone's expertise, as this will be helpful for me in future endeavors. I have created a table; bear in mind this table is extremely wide, with 131 columns I believe and only 7 rows. Below is the original data (it is make-believe data, of course, but will be used for real data in the future). The second table is how I want it to look, and the third table is how it actually looks. Under that is my code. I will gladly upvote anyone who helps. Thank you in advance.
Original data:
| usr | Company | Dept.# | Dept1 | Dept2 | Dept3 | Dept4 | Hr1 | Tr1 | F1 | A1 | HOH1 | M1 | R1 | SO1 | BIG1 | T1 | P1 | X1 | Y1 | Z1 | Tin1 | Hr1 | Tr1 | F1 | A1 | HOH1 | M1 | R1 | SO1 | BIG1 | T1 | P1 | X1 | Y1 | Z1 | Tin1 | Hr1 | Tr1 | F1 | A1 | HOH1 | M1 | R1 | SO1 | BIG1 | T1 | P1 | X1 | Y1 | Z1 | Tin1 | Hr1 | Tr1 | F1 | A1 | HOH1 | M1 | R1 | SO1 | BIG1 | T1 | P1 | X1 | Y1 | Z1 | Tin1 | Hr2 | Tr2 | F2 | A2 | HOH2 | M2 | R2 | SO2 | BIG2 | T2 | P2 | X2 | Y2 | Z2 | Tin2 | Hr2 | Tr2 | F2 | A2 | HOH2 | M2 | R2 | SO2 | BIG2 | T2 | P2 | X2 | Y2 | Z2 | Tin2 | Hr2 | Tr2 | F2 | A2 | HOH2 | M2 | R2 | SO2 | BIG2 | T2 | P2 | X2 | Y2 | Z2 | Tin2 | Hr3 | Tr3 | F3 | A3 | HOH3 | M3 | R3 | SO3 | BIG3 | T3 | P3 | X2 | Y2 | Z2 | Tin2 | Hr3 | Tr3 | F3 | A3 | HOH3 | M3 | R3 | SO3 | BIG3 | T3 | P3 | X3 | Y3 | Z3 | Tin3 | Hr4 | Tr4 | F4 | A4 | HOH4 | M4 | R4 | SO4 | BIG4 | T4 | P4 | X4 | Y4 | Z4 | Tin4 |
|------|---------|--------|-------|-------|-------|-------|-----|-----|-----|-----|------|----|----|-----|------|----|-----|-----|-----|----|------|-----|-----|----|-----|------|----|-----|-----|------|----|-----|-----|-----|----|------|-----|-----|----|-----|------|----|----|-----|------|-----|----|----|----|-----|------|-----|-----|----|-----|------|----|----|-----|------|----|----|-----|-----|----|------|-----|-----|-----|-----|------|----|----|-----|------|----|----|-----|----|-----|------|-----|-----|----|-----|------|----|----|-----|------|----|----|----|----|-----|------|-----|-----|-----|-----|------|----|----|-----|------|----|----|----|-----|----|------|-----|-----|-----|-----|------|----|-----|-----|------|----|----|-----|-----|----|------|-----|-----|-----|-----|------|----|----|-----|------|----|----|-----|-----|-----|------|-----|-----|-----|-----|------|----|----|-----|------|----|-----|-----|-----|-----|------|
| xxxx | OS | 1 | Train | | | | 20 | 89 | 355 | 123 | 435 | 90 | 5 | 55 | 676 | 34 | 43 | 984 | 345 | 74 | 846 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| xxxx | OPC | 2 | Poxy1 | Poxy2 | | | | | | | | | | | | | | | | | | 45 | 546 | 68 | 345 | 903 | 70 | 345 | 23 | 54 | 32 | 234 | 23 | 567 | 69 | 64 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 38 | 67 | 235 | 789 | 7 | 40 | 99 | 98 | 87 | 89 | 34 | 312 | 42 | 756 | 23 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| xxxx | Oxy R | 4 | H1 | H2 | H3 | H4 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 22 | 36 | 13 | 678 | 64 | 40 | 34 | 239 | 76 | 87 | 34 | 999 | 965 | 34 | 93 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 89 | 54 | 761 | 765 | 9 | 20 | 22 | 65 | 78 | 98 | 78 | 75 | 354 | 23 | 23 | | | | | | | | | | | | | | | | 36 | 80 | 123 | 543 | 17 | 20 | 11 | 908 | 988 | 7 | 86 | 245 | 546 | 763 | 324 | 25 | 90 | 111 | 432 | 84 | 25 | 63 | 784 | 98 | 78 | 854 | 754 | 234 | 865 | 43 |
| xxxx | HPK | 3 | Test1 | Test2 | Test3 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 99 | 456 | 39 | 567 | 223 | 50 | 5 | 32 | 549 | 435 | 34 | 87 | 64 | 348 | 942 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 52 | 21 | 47 | 876 | 1 | 30 | 46 | 92 | 78 | 12 | 34 | 12 | 12 | 421 | 23 | | | | | | | | | | | | | | | | 90 | 76 | 773 | 654 | 49 | 10 | 223 | 982 | 566 | 23 | 54 | 786 | 356 | 73 | 654 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| xxxx | Mano | 1 | Porp | | | | 42 | 657 | 645 | 234 | 344 | 80 | 45 | 364 | 97 | 23 | 634 | 34 | 23 | 87 | 84 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| xxxx | Macro | 2 | Otto1 | Otto2 | | | | | | | | | | | | | | | | | | 75 | 574 | 46 | 456 | 453 | 60 | 44 | 235 | 867 | 5 | 433 | 234 | 346 | 46 | 35 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 23 | 433 | 186 | 987 | 2 | 30 | 34 | 58 | 87 | 43 | 34 | 23 | 62 | 73 | 32 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
How I want it to look:
| usr | Company | Dept# | Dept | Hrs | Tr | F | A | HOH | M | R | SO | BIG | T | P | X | Y | Z | Tin |
|------|---------|-------|-------|-----|-----|-----|-----|-----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| xxxx | OS | 1 | Train | 20 | 89 | 355 | 123 | 435 | 90 | 5 | 55 | 676 | 34 | 43 | 984 | 345 | 74 | 846 |
| xxxx | OPC | 2 | Poxy1 | 45 | 546 | 68 | 345 | 903 | 70 | 345 | 23 | 54 | 32 | 234 | 23 | 567 | 69 | 64 |
| xxxx | OPC | 2 | Poxy2 | 38 | 67 | 235 | 789 | 7 | 40 | 99 | 98 | 87 | 89 | 34 | 312 | 42 | 756 | 23 |
| xxxx | Oxy R | 4 | H1 | 22 | 36 | 13 | 678 | 64 | 40 | 34 | 239 | 76 | 87 | 34 | 999 | 965 | 34 | 93 |
| xxxx | Oxy R | 4 | H2 | 89 | 54 | 761 | 765 | 9 | 20 | 22 | 65 | 78 | 98 | 78 | 75 | 354 | 23 | 23 |
| xxxx | Oxy R | 4 | H3 | 36 | 80 | 123 | 543 | 17 | 20 | 11 | 908 | 988 | 7 | 86 | 245 | 546 | 763 | 324 |
| xxxx | Oxy R | 4 | H4 | 25 | 90 | 111 | 432 | 84 | 25 | 63 | 784 | 98 | 78 | 854 | 754 | 234 | 865 | 43 |
| xxxx | HPK | 3 | Test1 | 99 | 456 | 39 | 567 | 223 | 50 | 5 | 32 | 549 | 435 | 34 | 87 | 64 | 348 | 942 |
| xxxx | HPK | 3 | Test2 | 52 | 21 | 47 | 876 | 1 | 30 | 46 | 92 | 78 | 12 | 34 | 12 | 12 | 421 | 23 |
| xxxx | HPK | 3 | Test3 | 90 | 76 | 773 | 654 | 49 | 10 | 223 | 982 | 566 | 23 | 54 | 786 | 356 | 73 | 654 |
| xxxx | Mano | 1 | Porp | 42 | 657 | 645 | 234 | 344 | 80 | 45 | 364 | 97 | 23 | 634 | 34 | 23 | 87 | 84 |
| xxxx | Macro | 2 | Otto1 | 73 | 574 | 46 | 456 | 453 | 60 | 44 | 235 | 867 | 5 | 433 | 234 | 346 | 46 | 35 |
| xxxx | Macro | 2 | Otto2 | 23 | 433 | 186 | 987 | 2 | 30 | 34 | 58 | 87 | 43 | 34 | 23 | 62 | 73 | 32 |
This is how it actually looks which is wrong. As you can see data is missing for some reason:
| usr | Company | Dept# | Dept | Hrs | Tr | F | A | HOH | M | R | SO | BIG | T | P | X | Y | Z | Tin |
|------|---------|-------|-------|-----|-----|-----|-----|-----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| xxxx | OS | 1 | Train | 20 | 89 | 355 | 123 | 435 | 90 | 5 | 55 | 676 | 34 | 43 | 984 | 345 | 74 | 846 |
| xxxx | OPC | 2 | Poxy1 | 45 | 546 | 68 | 345 | 903 | 70 | 345 | 23 | 54 | 32 | 234 | 23 | 567 | 69 | 64 |
| xxxx | OPC | 2 | Poxy2 | 38 | 67 | 235 | 789 | 7 | 40 | 99 | 98 | 87 | 89 | 34 | 312 | 42 | 756 | 23 |
| xxxx | Oxy R | 4 | H1 | 22 | 36 | 13 | 678 | 64 | 40 | 34 | 239 | 76 | 87 | 34 | 999 | 965 | 34 | 93 |
| xxxx | Oxy R | 4 | H2 | 89 | 54 | 761 | 765 | 9 | 20 | 22 | 65 | 78 | 98 | 78 | 75 | 354 | 23 | 23 |
| xxxx | Oxy R | 4 | H3 | | | | | | | | | | | | | | | |
| xxxx | Oxy R | 4 | H4 | | | | | | | | | | | | | | | |
| xxxx | HPK | 3 | Test1 | 99 | 456 | 39 | 567 | 223 | 50 | 5 | 32 | 549 | 435 | 34 | 87 | 64 | 348 | 942 |
| xxxx | HPK | 3 | Test2 | 52 | 21 | 47 | 876 | 1 | 30 | 46 | 92 | 78 | 12 | 34 | 12 | 12 | 421 | 23 |
| xxxx | HPK | 3 | Test3 | | | | | | | | | | | | | | | |
| xxxx | Mano | 1 | Porp | 42 | 657 | 645 | 234 | 344 | 80 | 45 | 364 | 97 | 23 | 634 | 34 | 23 | 87 | 84 |
| xxxx | Macro | 2 | Otto1 | 73 | 574 | 46 | 456 | 453 | 60 | 44 | 235 | 867 | 5 | 433 | 234 | 346 | 46 | 35 |
| xxxx | Macro | 2 | Otto2 | 23 | 433 | 186 | 987 | 2 | 30 | 34 | 58 | 87 | 43 | 34 | 23 | 62 | 73 | 32 |
Here is my code:
Sub buttonclick()
    Dim Ary As Variant, Nary As Variant, Cary As Variant
    Dim r As Long, c As Long, nr As Long, cc As Long

    Cary = Array("0853", 6898, 113128, 143143)
    With Sheets("Sheet1")
        Ary = .Range("A2:DM" & .Range("A" & Rows.Count).End(xlUp).Row).Value2
    End With
    ReDim Nary(1 To UBound(Ary) * 4, 1 To 19)
    For r = 1 To UBound(Ary)
        For c = 4 To 7
            If Ary(r, c) = "" Then Exit For
            nr = nr + 1
            Nary(nr, 1) = Ary(r, 1): Nary(nr, 2) = Ary(r, 2): Nary(nr, 3) = Ary(r, 3)
            Nary(nr, 4) = Ary(r, c)
            For cc = Left(Cary(c - 4), 2) To Right(Cary(c - 4), 2) Step 15
                Nary(nr, 5) = Nary(nr, 5) & Ary(r, cc)
                Nary(nr, 6) = Nary(nr, 6) & Ary(r, cc + 1)
                Nary(nr, 7) = Nary(nr, 7) & Ary(r, cc + 2)
                Nary(nr, 8) = Nary(nr, 8) & Ary(r, cc + 3)
                Nary(nr, 9) = Nary(nr, 9) & Ary(r, cc + 4)
                Nary(nr, 10) = Nary(nr, 10) & Ary(r, cc + 5)
                Nary(nr, 11) = Nary(nr, 11) & Ary(r, cc + 6)
                Nary(nr, 12) = Nary(nr, 12) & Ary(r, cc + 7)
                Nary(nr, 13) = Nary(nr, 13) & Ary(r, cc + 8)
                Nary(nr, 14) = Nary(nr, 14) & Ary(r, cc + 9)
                Nary(nr, 15) = Nary(nr, 15) & Ary(r, cc + 10)
                Nary(nr, 16) = Nary(nr, 16) & Ary(r, cc + 11)
                Nary(nr, 17) = Nary(nr, 17) & Ary(r, cc + 12)
                Nary(nr, 18) = Nary(nr, 18) & Ary(r, cc + 13)
                Nary(nr, 19) = Nary(nr, 19) & Ary(r, cc + 14)
            Next cc
        Next c
    Next r
    With Sheets("Sheet2")
        .UsedRange.ClearContents
        .Range("A1").Resize(, 19).Value = Array("usr", "Company", "Dept.#", "Dept", "Hrs", _
                                                "Tr", "F", "A", "HOH", "M", "R", "SO", "BIG", _
                                                "T", "P", "X", "Y", "Z", "Tin")
        .Range("A2").Resize(nr, 19).Value = Nary
    End With
End Sub

Perform shift and other operations on a pandas dataframe

Here is a sample of my data
import pandas as pd
dic = {'Drug': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
'Date': ['01-01-20', '01-02-20', '01-03-20', '01-04-20', '01-05-20', '01-10-20', '01-15-20', '01-20-20', '01-21-20', '01-01-20', '01-02-20', '01-03-20', '01-04-20', '01-05-20'],
'Amount': [10, 20, 30, 40, 50,60, 70, 80, 90, 10, 20, 30, 40, 50]}
df = pd.DataFrame(dic)
| Drug | Date | Amount |
| ---- | -------- | ------ |
| A | 01-01-20 | 10 |
| | 01-02-20 | 20 |
| | 01-03-20 | 30 |
| | 01-04-20 | 40 |
| | 01-05-20 | 50 |
| | 01-10-20 | 60 |
| | 01-15-20 | 70 |
| | 01-20-20 | 80 |
| | 01-21-20 | 90 |
| B | 01-01-20 | 10 |
| | 01-02-20 | 20 |
| | 01-03-20 | 30 |
| | 01-04-20 | 40 |
| | 01-05-20 | 50 |
I have performed a groupby on Drug and want to apply a lambda function that calculates three metrics:
Lag -> the amount for a drug x days ago.
Trend -> the difference between the amount for a drug today and the amount x days ago.
Window -> the mean of the amounts for a drug between today and x days ago. (Days not present in the dataframe are assumed to have the same value as the preceding day in the data, i.e. Jan 6th 2020 has the same value as Jan 5th 2020; days in 2019 are assumed to have the same value as Jan 1st 2020.)
Here is my desired output for the case where x=7:
| Drug | Date | Amount | Date 7 Days Ago | Lag | Trend | Window |
| ---- | -------- | ------ | --------------- | --- | ----- | ------ |
| A | 01-01-20 | 10 | 12-26-19 | 10 | 0 | 10.00 |
| | 01-02-20 | 20 | 12-27-19 | 10 | 10 | 11.43 |
| | 01-03-20 | 30 | 12-28-19 | 10 | 20 | 14.29 |
| | 01-04-20 | 40 | 12-29-19 | 10 | 30 | 18.57 |
| | 01-05-20 | 50 | 12-30-19 | 10 | 40 | 24.29 |
| | 01-10-20 | 60 | 01-04-20 | 40 | 20 | 50.00 |
| | 01-15-20 | 70 | 01-09-20 | 50 | 20 | 60.00 |
| | 01-20-20 | 80 | 01-14-20 | 60 | 20 | 70.00 |
| | 01-21-20 | 90 | 01-15-20 | 70 | 20 | 74.29 |
| B | 01-01-20 | 10 | 12-26-19 | 10 | 0 | 10.00 |
| | 01-02-20 | 20 | 12-27-19 | 10 | 10 | 11.43 |
| | 01-03-20 | 30 | 12-28-19 | 10 | 20 | 24.29 |
| | 01-04-20 | 40 | 12-29-19 | 10 | 30 | 18.57 |
| | 01-05-20 | 50 | 12-30-19 | 10 | 40 | 24.29 |
I have implemented the above using for loops, but I want a more pandas-idiomatic way of doing this, which I am unable to figure out.
First get your df in order (sorted by Drug and Date, with Date parsed as a datetime), then try df.shift(...) within each group. Because the dates have gaps, reindex each group to a daily frequency first, so that shifting by rows corresponds to shifting by days.
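A sketch of that idea against the question's sample data: reindex each group to one row per calendar day (extended back x days, with gaps forward-filled and the pre-2020 days back-filled from Jan 1st), then a plain shift and a rolling mean give the three metrics. This assumes the inclusive convention the desired output appears to use, where for x=7 the lag is a 6-day calendar shift and the window spans 7 days:

```python
import pandas as pd

# sample data from the question
df = pd.DataFrame({
    'Drug': ['A'] * 9 + ['B'] * 5,
    'Date': pd.to_datetime(
        ['01-01-20', '01-02-20', '01-03-20', '01-04-20', '01-05-20',
         '01-10-20', '01-15-20', '01-20-20', '01-21-20',
         '01-01-20', '01-02-20', '01-03-20', '01-04-20', '01-05-20'],
        format='%m-%d-%y'),
    'Amount': [10, 20, 30, 40, 50, 60, 70, 80, 90, 10, 20, 30, 40, 50],
})
x = 7

pieces = []
for drug, g in df.groupby('Drug'):
    g = g.set_index('Date').sort_index()
    # one row per calendar day, extended x days before the first observation;
    # ffill fills gaps, bfill fills the pre-history with the first value
    full = pd.date_range(g.index.min() - pd.Timedelta(days=x), g.index.max())
    amt = g['Amount'].reindex(full).ffill().bfill()
    g['Lag'] = amt.shift(x - 1).reindex(g.index)
    g['Trend'] = g['Amount'] - g['Lag']
    g['Window'] = amt.rolling(x).mean().reindex(g.index)
    pieces.append(g.reset_index())

out = pd.concat(pieces, ignore_index=True)
print(out)
```

The "Date 7 Days Ago" column, if needed, is just `out['Date'] - pd.Timedelta(days=x - 1)` under the same convention.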

Count specific value for IDs in two dataframes

I have two dataframes
df1
+----+-------+
| | Key |
|----+-------|
| 0 | 30 |
| 1 | 31 |
| 2 | 32 |
| 3 | 33 |
| 4 | 34 |
| 5 | 35 |
+----+-------+
df2
+----+-------+--------+
| | Key | Test |
|----+-------+--------|
| 0 | 30 | Test4 |
| 1 | 30 | Test5 |
| 2 | 30 | Test6 |
| 3 | 31 | Test4 |
| 4 | 31 | Test5 |
| 5 | 31 | Test6 |
| 6 | 32 | Test3 |
| 7 | 33 | Test3 |
| 8 | 33 | Test3 |
| 9 | 34 | Test1 |
| 10 | 34 | Test1 |
| 11 | 34 | Test2 |
| 12 | 34 | Test3 |
| 13 | 34 | Test3 |
| 14 | 34 | Test3 |
| 15 | 35 | Test3 |
| 16 | 35 | Test3 |
| 17 | 35 | Test3 |
| 18 | 35 | Test3 |
| 19 | 35 | Test3 |
+----+-------+--------+
I want to count how many times each Test is listed for each Key.
+----+-------+-------+-------+-------+-------+-------+-------+
| | Key | Test1 | Test2 | Test3 | Test4 | Test5 | Test6 |
|----+-------|-------|-------|-------|-------|-------|-------|
| 0 | 30 | | | | 1 | 1 | 1 |
| 1 | 31 | | | | 1 | 1 | 1 |
| 2 | 32 | | | 1 | | | |
| 3 | 33 | | | 2 | | | |
| 4 | 34 | 2 | 1 | 3 | | | |
| 5 | 35 | | | 5 | | | |
+----+-------+-------+-------+-------+-------+-------+-------+
What I've tried
Using join and groupby, I first got the count for each Key, regardless of Test.
result_df = df1.join(df2.groupby('Key').size().rename('Count'), on='Key')
+----+-------+---------+
| | Key | Count |
|----+-------+---------|
| 0 | 30 | 3 |
| 1 | 31 | 3 |
| 2 | 32 | 1 |
| 3 | 33 | 2 |
| 4 | 34 | 6 |
| 5 | 35 | 5 |
+----+-------+---------+
I tried to group the Key with the Test
result_df = df1.join(df2.groupby(['Key', 'Test']).size().rename('Count'), on='Key')
but this returns an error
ValueError: len(left_on) must equal the number of levels in the index of "right"
Check with crosstab:
pd.crosstab(df2.Key, df2.Test).reindex(df1.Key).replace({0: ''})
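For reference, a runnable version of that one-liner with the question's frames reconstructed (leaving off the replace({0: ''}) step so the counts stay numeric):

```python
import pandas as pd

df1 = pd.DataFrame({'Key': [30, 31, 32, 33, 34, 35]})
df2 = pd.DataFrame({
    'Key':  [30, 30, 30, 31, 31, 31, 32, 33, 33, 34, 34, 34, 34, 34, 34,
             35, 35, 35, 35, 35],
    'Test': ['Test4', 'Test5', 'Test6', 'Test4', 'Test5', 'Test6',
             'Test3', 'Test3', 'Test3', 'Test1', 'Test1', 'Test2',
             'Test3', 'Test3', 'Test3',
             'Test3', 'Test3', 'Test3', 'Test3', 'Test3'],
})

# one row per Key, one column per Test, cell = occurrence count
counts = pd.crosstab(df2.Key, df2.Test).reindex(df1.Key)
print(counts)
```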
Here is another solution with groupby & pivot. With this solution you don't need df1 at all.
import numpy as np
import pandas as pd

# create some dummy data
tests = ['Test' + str(i) for i in range(1, 7)]
df = pd.DataFrame({'Test': np.random.choice(tests, size=100),
                   'Key': np.random.randint(30, 35, size=100)})
df['Count Variable'] = 1

# group & count aggregation, then pivot to wide format
df = df.groupby(['Key', 'Test']).count().reset_index()
df = df.pivot(index="Key", columns="Test", values="Count Variable").reset_index()
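And a runnable sketch of the groupby-and-pivot idea against the question's actual data rather than random dummies; the reset_index() between the groupby and the pivot matters, because pivot expects ordinary columns rather than index levels:

```python
import pandas as pd

df2 = pd.DataFrame({
    'Key':  [30, 30, 30, 31, 31, 31, 32, 33, 33, 34, 34, 34, 34, 34, 34,
             35, 35, 35, 35, 35],
    'Test': ['Test4', 'Test5', 'Test6', 'Test4', 'Test5', 'Test6',
             'Test3', 'Test3', 'Test3', 'Test1', 'Test1', 'Test2',
             'Test3', 'Test3', 'Test3',
             'Test3', 'Test3', 'Test3', 'Test3', 'Test3'],
})
df2['Count Variable'] = 1  # dummy column to count

wide = (df2.groupby(['Key', 'Test']).count().reset_index()
           .pivot(index='Key', columns='Test', values='Count Variable'))
print(wide)
```

Missing Key/Test combinations come out as NaN rather than blank strings; chain .fillna('') if the blank-cell look from the question is wanted.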

How to add space between rows and sum up automatically in Excel

let's say that I have a table like the below:
| | Value 1 | Value 2 | Value 3 | |
|---|---------|---------|---------|---|
| A | 22 | 12 | 3 | |
| A | 5 | 6 | 12 | |
| A | 19 | 9 | 13 | |
| A | 22 | 43 | 31 | |
| B | 7 | 12 | 23 | |
| B | 5 | 5 | 8 | |
| B | 35 | 78 | 9 | |
| B | 45 | 1 | 8 | |
| C | 34 | 56 | 0 | |
| C | 22 | 1 | 14 | |
| C | 13 | 46 | 45 | |
and that I'd need to transform it into the below:
| | Value 1 | Value 2 | Value 3 | |
|---|---------|---------|---------|---|
| A | 22 | 12 | 3 | |
| A | 5 | 6 | 12 | |
| A | 19 | 9 | 13 | |
| A | 22 | 43 | 31 | |
| | 68 | 70 | 59 | |
| | | | | |
| B | 7 | 12 | 23 | |
| B | 5 | 5 | 8 | |
| B | 35 | 78 | 9 | |
| B | 45 | 1 | 8 | |
| | 92 | 96 | 48 | |
| | | | | |
| C | 34 | 56 | 0 | |
| C | 22 | 1 | 14 | |
| C | 13 | 46 | 45 | |
| | 69 | 103 | 59 | |
How could I obtain the desired effect automatically?
After each group there should be a row with the sums of each column for that group, followed by n empty rows.
You can use the Subtotal feature of Excel (on the "Data" tab of the ribbon) to automatically add the totals between groupings. I don't think it adds the blank row, though; if you absolutely need the blank row, then I can generate some VBA that will work.
