Excel - Show summarised counts of average, min and max value from range of data - excel

I have 2 sets of data. Set 1 is a list of accounts with a count of items for each account from Set 2. Set 2 is a breakdown of the values for each account.
The examples below are representative of the data I'm working with. There are thousands of records, so I'm after a formula-based way of doing this.
Set 1
Account Count
123 3
128 2
135 4
157 5
...
Set 2
Account Value
123 11
123 31
123 98
128 77
128 99
...
What I'd like to do is show, for each account, the count of its values together with their average, min and max.
For example:
Output
Account Count Average Min Max
123 3 47 11 98
128 2 88 77 99
...

Ignore the first set and use only Set 2 as the basis for a pivot table.
Put Account in the Rows box, then drag the Value column into the Values box four times and summarize the four copies by Count, Average, Min, and Max.
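For readers who want to check the pivot-table result outside Excel, here is a minimal Python sketch of the same grouping, using only the sample rows of Set 2 from the question. The variable names are illustrative and not part of the original answer.

```python
from collections import defaultdict
from statistics import mean

# Set 2 from the question: (account, value) pairs.
set2 = [(123, 11), (123, 31), (123, 98), (128, 77), (128, 99)]

# Group the values by account, as the pivot table does with Account as row label.
values_by_account = defaultdict(list)
for account, value in set2:
    values_by_account[account].append(value)

# One summary row per account: count, average, min, max.
summary = {
    account: {
        "count": len(vals),
        "average": mean(vals),
        "min": min(vals),
        "max": max(vals),
    }
    for account, vals in values_by_account.items()
}

for account, row in summary.items():
    print(account, row["count"], round(row["average"], 2), row["min"], row["max"])
```

The average for account 123 is 140/3, which the question's expected output shows rounded to 47.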

Related

Initializing new columns in pd.pivot_table using existing columns within pd.pivot_table

I have a table imported into Python using pd.read_csv.
I need to perform the following activities on this table:
1. Calculate the number of free apps and paid apps in each genre using group by.
I wrote the following code to get the desired output:
df.groupby(['prime_genre','Subscription'])[['id']].count()
2. Convert the result from 1 into a dataframe having the columns prime_genre, free, paid and the rows having the count of each.
I wrote the following code to get the desired output:
df1 = df.groupby(['prime_genre','Subscription'])['id'].count().reset_index()
df1.pivot_table(index='prime_genre', columns='Subscription', values='id', aggfunc='sum')
3. Now I need to initialize a column 'Total' that captures the sum of 'free app' and 'paid app' within the pivot table itself.
4. I also need to initialize two more columns, perc_free and perc_paid, that display the percentage of free apps and paid apps within the pivot table itself.
How do I go about 3 & 4?
Assuming the following pivot table named df2:
subscription free app paid app
prime_genre
book 66 46
business 20 37
catalogs 9 1
education 132 321
You can calculate the total using pandas.DataFrame.sum on the columns (axis=1). Then divide df2 with this total and multiply by 100 to get the percentage. You can add a suffix to the columns with pandas.DataFrame.add_suffix. Finally, combine everything with pandas.concat:
total = df2.sum(axis=1)
percent = df2.div(total, axis=0).mul(100).add_suffix(' percent')
df2['Total'] = total
pd.concat([df2, percent], axis=1)
output:
subscription free app paid app Total free app percent paid app percent
prime_genre
book 66 46 112 58.928571 41.071429
business 20 37 57 35.087719 64.912281
catalogs 9 1 10 90.000000 10.000000
education 132 321 453 29.139073 70.860927
Here is a variant to get the perc_free / perc_paid names:
import re  # needed for re.sub in the rename step
total = df2.sum(axis=1)
percent = (df2.div(total, axis=0)
              .mul(100)
              .rename(columns=lambda x: re.sub('(.*)( app)', r'perc_\1', x))
           )
df2['Total'] = total
pd.concat([df2, percent], axis=1)
subscription free app paid app Total perc_free perc_paid
prime_genre
book 66 46 112 58.928571 41.071429
business 20 37 57 35.087719 64.912281
catalogs 9 1 10 90.000000 10.000000
education 132 321 453 29.139073 70.860927

Change array of SUMIF in case criteria exists in two different columns

A B C D E F
1 Results List A List B
2 Campaign Sales Campaign Sales Campaign Sales
3 Campaign_A 1.510 Campaign_A 500 Campaign_B 50
4 Campaign_B 120 Campaign_A 450 Campaign_B 40
5 Campaign_C 90 Campaign_A 560 Campaign_B 30
6 Campaign_D 1.650 Campaign_B 700 Campaign_C 80
7 Campaign_E 100 Campaign_B 710 Campaign_C 10
8 Campaign_F 70 Campaign_C 200 Campaign_F 70
9 Campaign_D 850
10 Campaign_D 800
11 Campaign_E 100
12 Campaign_F 320
13 Campaign_F 360
14 Campaign_F 290
15
16
The Excel table above consists of:
List A = Column C:D
List B = Column E:F
In each list, campaigns can appear multiple times.
In Column A:B I want to sum up the sales per campaign from the two lists using the SUMIF formula:
=SUMIF(C:C,A3,D:D)
=SUMIF(E:E,A3,F:F)
However, List B should be prioritized over List A: if a campaign exists in List B (Column E), the SUMIF function should be applied only to List B and List A should be ignored entirely.
The formula might look something like this:
IF campaign exists in Column E then SUMIF(E:E,A3,F:F) else SUMIF(C:C,A3,D:D)
How can I achieve the desired results in Column B?
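Use COUNTIF to check whether the campaign exists in Column E:
=IF(COUNTIF(E:E,A3)>0,SUMIF(E:E,A3,F:F),SUMIF(C:C,A3,D:D))
I would try the following (written with semicolons as argument separators, as some locales require):
=IF(SUMIF(E:E,A3,F:F)>0;SUMIF(E:E,A3,F:F);SUMIF(C:C,A3,D:D))
Note that this second version falls back to List A whenever the List B sales for a campaign sum to zero or less, even if the campaign is present in List B.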
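The COUNTIF-based priority rule can be checked against the expected Results column with a small Python sketch. It assumes the campaign names are spelled consistently as "Campaign_X" in both lists; the helper names `sum_if`, `count_if` and `prioritised_sales` are illustrative only.

```python
# List A (columns C:D) and List B (columns E:F) as (campaign, sales) pairs.
list_a = [
    ("Campaign_A", 500), ("Campaign_A", 450), ("Campaign_A", 560),
    ("Campaign_B", 700), ("Campaign_B", 710), ("Campaign_C", 200),
    ("Campaign_D", 850), ("Campaign_D", 800), ("Campaign_E", 100),
    ("Campaign_F", 320), ("Campaign_F", 360), ("Campaign_F", 290),
]
list_b = [
    ("Campaign_B", 50), ("Campaign_B", 40), ("Campaign_B", 30),
    ("Campaign_C", 80), ("Campaign_C", 10), ("Campaign_F", 70),
]


def sum_if(rows, campaign):
    """Sum the sales of all rows matching the campaign (like SUMIF)."""
    return sum(sales for name, sales in rows if name == campaign)


def count_if(rows, campaign):
    """Count the rows matching the campaign (like COUNTIF)."""
    return sum(1 for name, _ in rows if name == campaign)


def prioritised_sales(campaign):
    """Use List B if the campaign appears there at all, otherwise List A."""
    if count_if(list_b, campaign) > 0:
        return sum_if(list_b, campaign)
    return sum_if(list_a, campaign)


for letter in "ABCDEF":
    name = f"Campaign_{letter}"
    print(name, prioritised_sales(name))
```

Testing for presence rather than for a positive sum keeps List B in charge even when its sales for a campaign add up to zero.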

How to generate random numbers from different intervals that add up to a fixed sum in excel?

I need to generate 13 numbers from 13 different intervals which will add up to 1360. In the chart below, "index" means the index of the 13 different numbers and "mean" is the mean (average) of each interval. The range is plus or minus 15% of the mean, as shown below. I would prefer to have the random numbers generated based on the normal distribution with N(mean, 7.5% of mean). Edit: I take that back. No normal distribution. Please use +- 15% as hard limits of the intervals.
It would be great if anyone could figure out how to do it in Excel. Algorithms would be appreciated as well.
Index mean 15% low high
A 288 43 245 331
B 50 8 43 58
C 338 51 287 389
D 50 8 43 58
E 16 2 14 18
F 66 10 56 76
G 118 18 100 136
H 17 3 14 20
I 91 14 77 105
J 26 4 22 30
K 117 18 99 135
L 165 25 140 190
M 18 3 15 21
I would sort the table by increasing mean and use a column for a helper value (column H in the formulas below).
The idea is to maintain -- while going to the next row -- the current deviation from a perfect aim for the final target. Perfect would mean that every random value coincides with the mean for that row. If a value is 2 less than the mean, then that 2 will appear in the H column for the next row. The random number generated for that next row will then not aim for the given mean, but for 2 less than the mean. The range for the random number will appropriately be reduced so that the low/high values will never be crossed.
By first sorting the rows, we can be sure that this corrected mean will always fall within the next row's low/high range, and so it will always be possible to generate an acceptable random number there.
The final value will be calculated differently: it will be the remainder that is needed to achieve the target sum. For the same reason as above, this value is guaranteed to be within the low/high range.
The formulas used are as follows:
| F | H
--+--------------------------------------------------+------------------------------
2 | =RANDBETWEEN(D2, E2) |
3 | =RANDBETWEEN(B3+H3-C3+ABS(H3), B3+H3+C3-ABS(H3)) | =SUM($B$2:$B2)-SUM($F$2:$F2)
4 | (copy above formula) | (copy above formula)
...| ... | ...
13 | (copy above formula) | (copy above formula)
14 | =SUM($B$2:$B14)-SUM($F$2:$F13) |
In theory the rows do not need to be sorted first, but then the formulas cannot be copied down like above, but must reference the correct rows. That would make it quite complicated.
If it is absolutely necessary that the rows are presented in order of the Index column (A, B, C...), then use another sheet to do the above. Then in the main sheet read the value into the F column with a VLOOKUP from the other sheet. So in F2 you would have:
=VLOOKUP(A2, OtherSheet!$A$2:$F$14, 6, 0)
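The deviation-tracking scheme described in this answer can be sketched in Python as follows. This is a rough translation of the spreadsheet formulas, not the answerer's own code: `rows` holds the index, mean and rounded 15% columns from the question, and the function name `generate` is made up for illustration. The last value is computed as mean plus carried deviation, which equals the remainder formula because the means themselves add up to 1360.

```python
import random

# (index, mean, c) from the question; c is the rounded "15%" column.
rows = [
    ("A", 288, 43), ("B", 50, 8), ("C", 338, 51), ("D", 50, 8),
    ("E", 16, 2), ("F", 66, 10), ("G", 118, 18), ("H", 17, 3),
    ("I", 91, 14), ("J", 26, 4), ("K", 117, 18), ("L", 165, 25),
    ("M", 18, 3),
]
TARGET = sum(mean for _, mean, _ in rows)  # 1360 for this table


def generate(rows, rng):
    """Draw one value per row so that the values sum to the sum of the means."""
    ordered = sorted(rows, key=lambda r: r[1])  # sort by increasing mean
    values = {}
    deviation = 0  # column H: sum of means so far minus sum of values so far
    for i, (index, mean, c) in enumerate(ordered):
        if i == len(ordered) - 1:
            # Last row: take the remainder needed to hit the target.
            value = mean + deviation
        else:
            # Aim for mean + deviation and shrink the range by |deviation|
            # so the row's own mean +/- c limits are never crossed.
            slack = c - abs(deviation)
            value = rng.randint(mean + deviation - slack, mean + deviation + slack)
        values[index] = value
        deviation += mean - value
    return values


result = generate(rows, random.Random())
print(result, sum(result.values()))
```

Because the first row has no carried deviation, its range reduces to mean +/- c, which matches the plain RANDBETWEEN in F2.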
Get the random number like this (VBA):
num = Int((300 - 200 + 1) * Rnd + 200) ' between 200 and 300
Each random number must be drawn from what remains of the total after subtracting the numbers already drawn, and the last number is whatever is left.
For example (4 numbers that sum to 100):
A is a random number between 0 and 100 //let's say 42
then B is a random number between 0 and (100-42) => 0 to 58 //let's say 18
then C is a random number between 0 and (100-42-18) => 0 to 40 //let's say 25
then, in the end, D is 100-42-18-25 => D is 15
*100-42-18-25 is the same as 100-Sum(A,B,C)
Here is my example, which generates the random numbers from the low and high columns.
The formula in column F is just a RANDBETWEEN:
=RANDBETWEEN($D2,$E2)
Then you can make the result always add up to 1360 with the formula below in column G:
=F2/SUM($F$2:$F$14)*1360
So cell G15, which holds the sum of those 13 values, will always be 1360.
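A short Python sketch of this scaling approach, using the low and high columns from the question, shows what it does and does not guarantee. The scaled values always add up to 1360, but they are generally not whole numbers, and a value drawn close to one of its limits can be pushed past that limit by the scaling. The names `bounds` and `scaled_draw` are illustrative only.

```python
import random

# (low, high) per index A..M, taken from the question's table.
bounds = [
    (245, 331), (43, 58), (287, 389), (43, 58), (14, 18), (56, 76),
    (100, 136), (14, 20), (77, 105), (22, 30), (99, 135), (140, 190),
    (15, 21),
]
TARGET = 1360


def scaled_draw(bounds, target, rng):
    """Draw within each interval (column F), then rescale to the target (column G)."""
    raw = [rng.randint(low, high) for low, high in bounds]
    factor = target / sum(raw)
    return [value * factor for value in raw]


scaled = scaled_draw(bounds, TARGET, random.Random())
outside = [
    (low, high, value)
    for (low, high), value in zip(bounds, scaled)
    if not low <= value <= high
]
print("sum:", sum(scaled))
print("values pushed outside their limits:", outside)
```

If the +-15% limits must hold strictly, the `outside` list is worth checking after each draw.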

AGGREGAT with criteria and duplicates in array

I have the following Excel spreadsheet:
A B C D E
1 ProdID Price Unique ProdID 1. Biggest 2. Biggest
2 2606639 40 2606639 50 50
3 2606639 50 4633523 45 35
4 2606639 20 3911436 25 25
5 2606639 50
6 4633523 45
7 4633523 20
8 4633523 35
9 3911436 20
10 3911436 25
11 3911436 25
12 3911436 15
In Cells D2:E4 I want to show the 1. biggest and 2. biggest price of each ProdID in Column A. Therefore, I use the following formula:
D2 =AGGREGAT(14,6,$B$2:$B$12/($A$2:$A$12=$C2),1)
E2 =AGGREGAT(14,6,$B$2:$B$12/($A$2:$A$12=$C2),2)
This formula works as long as the prices are unique in Column B, as you can see for the second ProdID (4633523).
However, once a price is not unique in Column B (for example 50 for ProdID 2606639 and 25 for ProdID 3911436), the functions in Cells D2:E4 do not show the right results.
Do you have an idea how to solve this issue with the AGGREGAT formula and without using an array formula?
You could count the occurrences of the combination of ProdID and biggest price and use that count in the last argument of the AGGREGAT function. So instead of
=AGGREGAT(14,6,$B$2:$B$12/($A$2:$A$12=$C2),2)
you would have
=AGGREGAT(14,6,$B$2:$B$12/($A$2:$A$12=$C2),2+COUNTIFS(A:A,C2,B:B,D2)-1)
Of course you can just write "1+COUNTIFS...", but I put it this way to make clear that it uses position 2 plus the number of occurrences of the ProdID and biggest-price combination after the first occurrence.
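The position shift in that formula can be traced with a small Python sketch on the question's data. The helpers `large` and `second_biggest` are illustrative names: `large` mimics taking the k-th largest value, and the shift by the number of duplicates of the biggest price follows the formula above.

```python
# (ProdID, Price) rows from columns A:B of the question.
rows = [
    (2606639, 40), (2606639, 50), (2606639, 20), (2606639, 50),
    (4633523, 45), (4633523, 20), (4633523, 35),
    (3911436, 20), (3911436, 25), (3911436, 25), (3911436, 15),
]


def large(values, k):
    """Return the k-th largest value (1-based), or None if k is out of range."""
    ordered = sorted(values, reverse=True)
    return ordered[k - 1] if 1 <= k <= len(ordered) else None


def second_biggest(prod_id):
    """Skip duplicates of the biggest price, as 2 + COUNTIFS(...) - 1 does."""
    prices = [price for pid, price in rows if pid == prod_id]
    biggest = large(prices, 1)
    k = 2 + prices.count(biggest) - 1
    return large(prices, k)


for prod_id in (2606639, 4633523, 3911436):
    prices = [price for pid, price in rows if pid == prod_id]
    print(prod_id, large(prices, 1), second_biggest(prod_id))
```

For ProdID 2606639 the price 50 occurs twice, so the lookup moves to position 3 and finds 40 instead of the duplicate 50.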

Compare column A with column B and return minimum value in column A in Column C

Here is my table. I want to return the minimum value from column A in column C only if the values in Column B are equal.
A B C
1 Price Category Lowest Price Per Category
2 240 19
3 231 19 231
4 233 19
5 450 12
6 438 12
7 425 12 425
8 674 33
9 675 33
10 671 33 671
You could try the SUBTOTAL function and use this formula in the column for the lowest price:
=IF(A2=SUBTOTAL(5;$A$2:$A$4);A2;"")
You would have to manually adjust the locked range ($A$2:$A$4) for every group, though, so that it matches the rows of that group.
Or, if you're happy with getting the min value for each group on a separate row under every group, you could select the two columns (including the header row) and use the Subtotal button on the Data tab (at each change in Category, use function Min, add subtotal to Price).
Then the result would look something like this:
Price Category
240 19
231 19
233 19
231 19 Min
450 12
438 12
425 12
425 12 Min
674 33
675 33
671 33
671 33 Min
231 Grand Min
Try using this formula in C2 copied down
=IF(COUNTIFS(B:B,B2,A:A,"<"&A2),"",A2)
COUNTIFS here counts rows where the category matches and the price is lower than the current row. If there is no such row then the current row price must be lowest for that category and the price is returned.
If there are tied lowest prices within any category then they will all be shown
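The COUNTIFS logic can be mirrored in a few lines of Python to see why only the lowest price per category survives. This is a sketch on the question's data, with `lowest_flags` as an illustrative name.

```python
# (Price, Category) rows from columns A:B of the question.
rows = [
    (240, 19), (231, 19), (233, 19),
    (450, 12), (438, 12), (425, 12),
    (674, 33), (675, 33), (671, 33),
]


def lowest_flags(rows):
    """Return column C: the price if no cheaper row shares its category, else ''."""
    column_c = []
    for price, category in rows:
        # Same test as COUNTIFS(B:B,B2,A:A,"<"&A2): rows in the same
        # category with a strictly lower price.
        cheaper = sum(1 for p, c in rows if c == category and p < price)
        column_c.append(price if cheaper == 0 else "")
    return column_c


for (price, category), flag in zip(rows, lowest_flags(rows)):
    print(price, category, flag)
```

Because the comparison is strict, rows that tie for the lowest price in a category all count zero cheaper rows and are all kept.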
