MS Excel: how can I make Max() more efficient?

I have a set of data that looks like this:
ID Value MaxByID
0 32 80
0 80 80
0 4 80
0 68 80
0 6 80
1 32 68
1 54 68
1 56 68
1 68 68
1 44 68
2 54 92
2 52 92
2 92 92
4 68 68
4 52 68
5 74 74
5 22 74
6 52 94
6 52 94
6 46 94
6 94 94
6 56 94
6 14 94
I am using {=MAX(IF(A$2:A$100=A2,B$2:B$100))} to calculate the MaxByID column. However, the dataset has >100k rows, with mostly unique IDs: this seems to be a really inefficient way to do this, as each cell in C:C has to iterate through every cell in A:A.
The ID field is numeric and can be sorted. Is there a way to find the MaxByID more intelligently?

You may be able to use a pivot table to find the maximum for each unique ID: see this link for an example.
Once you have that table, VLOOKUP should enable you to quickly find MaxByID for each ID.
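The pivot-then-lookup idea replaces one full scan per row with two linear passes. Here is a minimal Python sketch of the same logic. It assumes the ID and Value columns have been read into two lists, and the function name max_by_id is hypothetical. A dictionary plays the role of the pivot table, and a per-row lookup plays the role of VLOOKUP.

```python
def max_by_id(ids, values):
    """For each row, return the maximum value among all rows sharing its ID."""
    # Pass 1 (the "pivot table"): record the maximum value seen for each ID.
    group_max = {}
    for i, v in zip(ids, values):
        if i not in group_max or v > group_max[i]:
            group_max[i] = v
    # Pass 2 (the "VLOOKUP"): look up each row's ID in the summary.
    return [group_max[i] for i in ids]


# Sample data from the question.
ids = [0] * 5 + [1] * 5 + [2] * 3 + [4] * 2 + [5] * 2 + [6] * 6
values = [32, 80, 4, 68, 6, 32, 54, 56, 68, 44, 54, 52, 92,
          68, 52, 74, 22, 52, 52, 46, 94, 56, 14]
print(max_by_id(ids, values))
```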

Once you have sorted by ID, you could add columns to get the start row number and the count for each unique ID. These 2 numbers give the position and size of the range holding that ID's values, so you can then use MAX(OFFSET(StartValueCell,StartThisUnique-1,0,CountThisUnique,1)) to get the max.
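The start-row/count idea can be sketched in Python the same way. This assumes the rows are already sorted by ID and held in two lists, and the function names are hypothetical. Python indices are 0-based, so start plays the role of StartThisUnique-1 in the OFFSET formula. Each row then takes the max over only its own ID's block. The block max is still recomputed on every row of the block, but with mostly unique IDs those blocks are short.

```python
def group_bounds(sorted_ids):
    """Map each ID to (start, count); assumes the list is already sorted by ID."""
    bounds = {}
    for row, i in enumerate(sorted_ids):
        if i in bounds:
            start, count = bounds[i]
            bounds[i] = (start, count + 1)
        else:
            bounds[i] = (row, 1)  # first row where this ID appears
    return bounds


def max_by_id_sorted(sorted_ids, values):
    bounds = group_bounds(sorted_ids)
    # Like MAX(OFFSET(...,start,0,count,1)): max over this ID's block only.
    return [max(values[start:start + count])
            for start, count in (bounds[i] for i in sorted_ids)]


ids = [0] * 5 + [1] * 5 + [2] * 3 + [4] * 2 + [5] * 2 + [6] * 6
values = [32, 80, 4, 68, 6, 32, 54, 56, 68, 44, 54, 52, 92,
          68, 52, 74, 22, 52, 52, 46, 94, 56, 14]
print(max_by_id_sorted(ids, values))
```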

This might be faster:
{=IF(A2=A1,C1,MAX(($A$2:$A$24=A2)*($B$2:$B$24)))}
Since your data appears to be sorted, you can check whether the ID matches the row above and, if it does, simply copy that row's max down, so the full MAX is only evaluated on the first row of each ID.
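The copy-down trick can be sketched in Python too, again assuming the data is in two sorted lists (the function name is hypothetical). There is one difference from the formula. The formula still scans the whole column on the first row of each ID, whereas this sketch scans forward only to the end of the current block. That shortcut relies on the data being sorted.

```python
def max_by_id_copy_down(sorted_ids, values):
    result = []
    for row, i in enumerate(sorted_ids):
        if row > 0 and sorted_ids[row - 1] == i:
            # Same ID as the row above: reuse its answer (the "C1" branch).
            result.append(result[-1])
        else:
            # First row of a new ID: compute the max once, scanning to the block's end.
            end = row
            while end < len(sorted_ids) and sorted_ids[end] == i:
                end += 1
            result.append(max(values[row:end]))
    return result


ids = [0] * 5 + [1] * 5 + [2] * 3 + [4] * 2 + [5] * 2 + [6] * 6
values = [32, 80, 4, 68, 6, 32, 54, 56, 68, 44, 54, 52, 92,
          68, 52, 74, 22, 52, 52, 46, 94, 56, 14]
print(max_by_id_copy_down(ids, values))
```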

When using min() - ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all() [duplicate]

How can I reference the minimum of two dataframe columns as part of a pandas dataframe equation? I tried using the Python min() function, which did not work. I'm sorry if this is well documented somewhere, but I have not been able to find a working solution for this problem. I am looking for something along the lines of this:
data['eff'] = pd.DataFrame([data['flow_h'], data['flow_c']]).min() *Cp* (data[' Thi'] - data[' Tci'])
I also tried to use pandas min() function, which is also not working.
min_flow = pd.DataFrame([data['flow_h'], data['flow_c']]).min()
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
I was confused by this error. The data columns are just numbers and a name, so I wasn't sure where the index comes into play.
import pandas as pd
import numpy as np
np.random.seed(365)
rows = 10
flow = {'flow_c': [np.random.randint(100) for _ in range(rows)],
'flow_d': [np.random.randint(100) for _ in range(rows)],
'flow_h': [np.random.randint(100) for _ in range(rows)]}
data = pd.DataFrame(flow)
# display(data)
flow_c flow_d flow_h
0 82 36 43
1 52 48 12
2 33 28 77
3 91 99 11
4 44 95 27
5 5 94 64
6 98 3 88
7 73 39 92
8 26 39 62
9 56 74 50
If you are trying to get the row-wise minimum of two or more columns, use pandas.DataFrame.min. Note that by default axis=0, which takes the minimum down each column; specifying axis=1 is necessary to take it across each row.
data['min_c_h'] = data[['flow_h','flow_c']].min(axis=1)
# display(data)
flow_c flow_d flow_h min_c_h
0 82 36 43 43
1 52 48 12 12
2 33 28 77 33
3 91 99 11 11
4 44 95 27 27
5 5 94 64 5
6 98 3 88 88
7 73 39 92 73
8 26 39 62 26
9 56 74 50 50
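Applied to the expression from the question, the row-wise minimum simply replaces the pd.DataFrame([...]).min() call. The sketch below assumes a made-up Cp value and made-up ' Thi' and ' Tci' temperature columns. The column names are copied from the question, including their leading spaces. Only the min(axis=1) part comes from the answer.

```python
import pandas as pd

# Hypothetical inputs: Cp and the temperature values are assumptions for illustration.
Cp = 4.18
data = pd.DataFrame({
    'flow_h': [43, 12, 77],
    'flow_c': [82, 52, 33],
    ' Thi': [90.0, 85.0, 80.0],
    ' Tci': [20.0, 25.0, 30.0],
})

# Row-wise minimum of the two flows (axis=1), then ordinary element-wise arithmetic.
data['eff'] = data[['flow_h', 'flow_c']].min(axis=1) * Cp * (data[' Thi'] - data[' Tci'])
print(data)
```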
If you want a single minimum value across multiple columns:
data[['flow_h','flow_c']].min().min()
The first min() calculates the minimum of each column and returns a pandas Series. The second min() returns the minimum of those per-column minimums.
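To see the two steps separately, here is a small sketch using the flow_h and flow_c values from the table above. It also shows a one-step alternative that takes the minimum of the underlying NumPy array via DataFrame.to_numpy(). That alternative is an addition, not part of the answer.

```python
import pandas as pd

# flow_h and flow_c values copied from the table above.
data = pd.DataFrame({
    'flow_h': [43, 12, 77, 11, 27, 64, 88, 92, 62, 50],
    'flow_c': [82, 52, 33, 91, 44, 5, 98, 73, 26, 56],
})

per_column = data[['flow_h', 'flow_c']].min()  # Series: one minimum per column
overall = per_column.min()                     # minimum of those minimums

# Equivalent single step on the underlying NumPy array.
overall_np = data[['flow_h', 'flow_c']].to_numpy().min()

print(per_column.to_dict(), overall, overall_np)
```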

How to quintile by date group using PERCENTILE in Excel

I'm just wondering if it's possible to quintile my data by group in Excel, using the PERCENTILE function.
I can quintile my entire data set with =MATCH(C2,PERCENTILE(C$2:C$20,{5,4,3,2,1}/5),-1), but I want to group it by date.
Example data:
Date Team_Id Score
04/02/2019 1 50
04/02/2019 2 58
04/02/2019 3 75
04/02/2019 4 34
04/02/2019 5 52
04/02/2019 6 81
05/02/2019 1 87
05/02/2019 2 75
05/02/2019 3 24
05/02/2019 4 75
05/02/2019 5 11
05/02/2019 6 84
06/02/2019 1 45
06/02/2019 2 67
06/02/2019 3 56
06/02/2019 4 55
06/02/2019 5 61
06/02/2019 6 15
06/02/2019 7 88
So basically, I want the scores quintiled within each date group; the resulting value for each row should be 1, 2, 3, 4, or 5. I've been messing around with IF but just don't know where to place it.
If you can tolerate entering the formula with Ctrl+Shift+Enter (or can at least wait until Microsoft comes out with its big release), I think this will work:
=MATCH(C4,PERCENTILE(IF($A$4:$A$22=A4,$C$4:$C$22,""),{5,4,3,2,1}/5),-1)
This essentially builds a conditional array on each row, based on the date.
Again, when entering the formula you have to press Ctrl+Shift+Enter, or it won't work.
I'm not exactly sure what we're doing here, so if this is wrong, sorry.
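To make the logic of the array formula concrete, here is a plain-Python sketch. It assumes the Date and Score columns are held in two lists, and all function names are hypothetical. percentile_inc uses the linear interpolation of Excel's PERCENTILE (PERCENTILE.INC). match_descending mimics MATCH(x, thresholds, -1) on the descending thresholds, so 1 is the top quintile and 5 the bottom. Unlike the per-row formula, the sketch computes the thresholds once per date.

```python
import math
from collections import defaultdict


def percentile_inc(values, p):
    """Linear-interpolation percentile, as in Excel's PERCENTILE / PERCENTILE.INC."""
    v = sorted(values)
    rank = p * (len(v) - 1)
    lo = math.floor(rank)
    hi = min(lo + 1, len(v) - 1)
    return v[lo] + (rank - lo) * (v[hi] - v[lo])


def match_descending(x, thresholds):
    """Mimic MATCH(x, thresholds, -1): 1-based position of the smallest value >= x
    in a descending list (None stands in for #N/A if x exceeds every value)."""
    pos = None
    for k, t in enumerate(thresholds, start=1):
        if t >= x:
            pos = k
        else:
            break
    return pos


def quintile_by_group(dates, scores):
    groups = defaultdict(list)
    for d, s in zip(dates, scores):
        groups[d].append(s)
    # Same thresholds as PERCENTILE(range,{5,4,3,2,1}/5), once per date.
    thresholds = {d: [percentile_inc(g, k / 5) for k in (5, 4, 3, 2, 1)]
                  for d, g in groups.items()}
    return [match_descending(s, thresholds[d]) for d, s in zip(dates, scores)]


dates = ['04/02/2019'] * 6 + ['05/02/2019'] * 6 + ['06/02/2019'] * 7
scores = [50, 58, 75, 34, 52, 81, 87, 75, 24, 75, 11, 84, 45, 67, 56, 55, 61, 15, 88]
q = quintile_by_group(dates, scores)
print(q)
```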
Will this work?
=MATCH(C2,PERCENTILE(INDIRECT(ADDRESS(1+MATCH($A2,$A$2:$A$20,0),3)&":"&ADDRESS(ROW()+COUNTIF(A3:$A$20,$A2),3)),{5,4,3,2,1}/5),-1)
I've defined the range for the percentile calculation using the INDIRECT function, where the start and end of the range are found with MATCH and COUNTIF, respectively.

Find out minimum value of specific columns in a row in MS Excel

My table in Excel looks something like this:
abcd 67 94 52 89 24
efgh 23 45 93 54 34
ijkl 64 83 23 45 92
mnop 34 45 10 66 53
This is a student database containing marks obtained in various subjects. I need to calculate the percentage in each row such that, out of 5 subjects, the first subject is always included, along with the other 3 subjects with the highest marks.
Example: abcd 67 94 52 89 24 75.5%
Here 75.5% = (67+94+52+89)/4 = 302/4 = 75.5, where 24, being the lowest, has been excluded; 67 has to be included even if it were the lowest.
What I require is the lowest mark among all the columns in that particular row (excluding the first column, of course), so that I can sum all the marks, subtract this lowest mark, and finally use the result to calculate the percentage.
Any help or suggestions would be appreciated. Thank you.
You'll need to adjust this for your columns: sum the entire range, subtract the minimum of the columns after the first, then divide by the count of the range minus one, and you will get the average.
This formula uses the 5 values in columns B through F (67 94 52 89 24), which results in 75.5:
=(SUM(B3:F3)-MIN(C3:F3))/(COUNT(B3:F3)-1)
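The same arithmetic as a small Python sketch, assuming each student's marks are a list with the always-counted subject first (the function name is hypothetical). The efgh row shows why the MIN skips the first column: 23 is that row's lowest mark, but it still counts.

```python
def best_first_plus_three(marks):
    """(SUM(all) - MIN(all but first)) / (COUNT(all) - 1):
    keep the first subject, drop the lowest of the others, average the rest."""
    return (sum(marks) - min(marks[1:])) / (len(marks) - 1)


rows = {
    'abcd': [67, 94, 52, 89, 24],
    'efgh': [23, 45, 93, 54, 34],
    'ijkl': [64, 83, 23, 45, 92],
    'mnop': [34, 45, 10, 66, 53],
}
for name, marks in rows.items():
    print(name, best_first_plus_three(marks))
```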

How to format a number to appear as percentage in Excel

So let's say I have a few numbers in a sheet:
a b c d
1 33 53 23 11
2 42 4 83 64
3 75 3 48 38
4 44 0 22 45
5 2 34 76 6
6
7 Total 85
I would like to display those numbers so that the cell value still holds the original figure (A1 = 33)
but the cell displays both the number and its percentage of the total (B7), e.g.:
a b c d
1 33 (39%) 53 (62%) 23 (27%) 11 (13%)
2 42 (49%) 4 (5%) 83 (98%) 64 (75%)
3 75 (88%) 3 (4%) 48 (56%) 38 (45%)
4 44 (52%) 0 (0%) 22 (26%) 45 (53%)
5 2 (2%) 34 (40%) 76 (89%) 6 (7%)
6
7 Total 85
I know how to format a cell as a percentage, but I can't figure out how to display both the original value and the calculated percentage (value/total*100) without changing the cell value, so that I can still sum the cells at the end (e.g. A6 =SUM(A1:A5) = 196).
Does anyone have an idea? I was hoping there could be a way to duplicate and calculate the figure using text formatting, but I can't get anything to work.
I'm guessing this is a trivial answer and maybe not what you're looking for, but why not just add a column for each of the columns you have now?
a a' b b' c c' d d'
1 33 (39%) 53 (62%) 23 (27%) 11 (13%)
2 42 (49%) 4 (5%) 83 (98%) 64 (75%)
3 75 (88%) 3 (4%) 48 (56%) 38 (45%)
4 44 (52%) 0 (0%) 22 (26%) 45 (53%)
5 2 (2%) 34 (40%) 76 (89%) 6 (7%)
6
7 Total 85
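Outside Excel, the same separation looks like the minimal Python sketch below: the numbers stay numbers, and the 'value (pct%)' text is built separately. with_percent is a hypothetical helper. The values and the total of 85 come from the example above.

```python
def with_percent(values, total):
    """Build display strings like '33 (39%)' without altering the numbers themselves."""
    # The '%' format spec multiplies by 100 and appends a percent sign,
    # much like a percentage number format in Excel.
    return [f"{v} ({v / total:.0%})" for v in values]


column_a = [33, 42, 75, 44, 2]
total = 85
print(with_percent(column_a, total))
print(sum(column_a))  # the original numbers still sum normally
```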
Ari’s answer seems to meet the requirements in your question: it doesn't repeat information beyond what your example output shows, and it remains viable for up to around 8,000 columns of data to start with (unless you use a very old version of Excel). Jerry’s comment is also correct: what you want cannot be achieved in the way you want to achieve it.
However, there are other approaches that might be acceptable substitutes. One is to copy your data and use Paste Special with Operation Divide, either elsewhere or over the top of your data. Pasted over the top, the cells show either the values or the percentages, but not both; pasted elsewhere, it duplicates your data. Pasting over the top would also require something like Operation Multiply to revert to the values, plus reformatting each time to make the cells appear as in your example.
Another is to use a PivotTable with some calculated fields and both are shown below:
I appreciate neither is exactly what you are asking for.

Excel formula to auto-increment after X amount of rows

I imported a few thousand rows of data into Excel and whereas one item represented one row, I've had to modify each item so that 11 rows represent the same item id.
For example:
Original
63 --->data
64 --->data
65 --->data
Current
63 --->data
63 --->data
63 --->data
63 --->data
63 --->data
63 --->data
63 --->data
63 --->data
63 --->data
63 --->data
63 --->data
64 --->data
64 --->data
64 --->data
64 --->data
64 --->data
64 --->data
64 --->data
64 --->data
64 --->data
64 --->data
64 --->data
(you get the idea)...
However, because of the formula I used to populate the additional 10 rows per item, the new rows in column A are left with the same ID as the rows the formula was based on.
I'm looking for a formula that auto-increments the cell value only every 11 rows, so that I can click and drag down column A and it will fill the same ID for 11 rows, then increment by 1 and fill the next 11 rows, and so on.
I've tried a number of variants all to no avail. Thanks.
EDIT
Here is an example of what I currently have and wish to simplify:
A B C D E F
79 <--already correct id
79 <--already correct id
79 <--already correct id
79 <--already correct id
79 <--already correct id
79 <--already correct id
79 <--already correct id
79 <--already correct id
79 <--already correct id
79 <--already correct id
79 <--already correct id
80 <--already correct id
80 <--already correct id
80 <--already correct id
80 <--already correct id
80 <--already correct id
80 <--already correct id
80 <--already correct id
80 <--already correct id
80 <--already correct id
80 <--already correct id
80 <--already correct id
58 <-- needs to be changed to 81
57 <-- needs to be changed to 81
57 <-- needs to be changed to 81
57 <-- needs to be changed to 81
57 <-- needs to be changed to 81
57 <-- needs to be changed to 81
57 <-- needs to be changed to 81
57 <-- needs to be changed to 81
57 <-- needs to be changed to 81
57 <-- needs to be changed to 81
57 <-- needs to be changed to 81
58 <-- needs to be changed to 82
57 <-- needs to be changed to 82
57 <-- needs to be changed to 82
57 <-- needs to be changed to 82
57 <-- needs to be changed to 82
57 <-- needs to be changed to 82
57 <-- needs to be changed to 82
57 <-- needs to be changed to 82
57 <-- needs to be changed to 82
57 <-- needs to be changed to 82
57 <-- needs to be changed to 82
There are thousands of rows like this...
Here's another approach if you're interested:
Enter 1 into A1
Then enter this formula into A2:
=IF(MOD(ROWS($A$1:A1),11)=0,A1+1,A1)
Then just drag the formula from A2 down
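The recurrence can be checked with a short Python sketch (the function name is hypothetical). Here rows_above stands in for ROWS($A$1:A1). Each cell copies the one above it, except on every 11th step, where it adds 1.

```python
def ids_by_recurrence(n_rows, block=11, first_id=1):
    """Mimic A1 = first_id, then =IF(MOD(ROWS($A$1:A1),block)=0,A1+1,A1) from A2 down."""
    ids = [first_id]
    for rows_above in range(1, n_rows):
        # In a given row, ROWS($A$1:A1) counts the rows above it.
        prev = ids[-1]
        ids.append(prev + 1 if rows_above % block == 0 else prev)
    return ids


print(ids_by_recurrence(33))
```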
You can also use this formula; it is also useful for even and odd numbering:
=INT((ROW(A1)-1)/11)*1+1
Use *1 for an increment of 1, *2 for an increment of 2, and so on.
The +1 at the end is the starting number; to start from 79, use +79 at the end instead.
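The closed form as a Python sketch (hypothetical function name). Here row is the 1-based Excel row number, and block, step and start correspond to the 11, the *1 and the +1 in the formula. Integer division // plays the role of INT for these non-negative values.

```python
def id_for_row(row, block=11, step=1, start=1):
    """Mimic =INT((ROW(A1)-1)/block)*step+start for a 1-based row number."""
    return (row - 1) // block * step + start


print([id_for_row(r) for r in range(1, 25)])
print(id_for_row(12, start=79))  # second block when starting from 79
```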
Alternatively, put a helper column containing a straight sequence starting at 0 (0, 1, 2, 3, 4, ...), one number per line.
You can then divide that column by 11 and take only the integer part of the result.
Supposing the sequence is in column A:
=INT(A1/11)
=INT(A2/11)
See:
A B Result
0 =int(A1/11) 0
1 =int(A2/11) 0
2 =int(A3/11) 0
3 =int(A4/11) 0
4 =int(A5/11) 0
5 =int(A6/11) 0
6 =int(A7/11) 0
7 =int(A8/11) 0
8 =int(A9/11) 0
9 =int(A10/11) 0
10 =int(A11/11) 0
11 =int(A12/11) 1
12 =int(A13/11) 1
13 =int(A14/11) 1
14 =int(A15/11) 1
15 =int(A16/11) 1
16 =int(A17/11) 1
17 =int(A18/11) 1
18 =int(A19/11) 1
19 =int(A20/11) 1
20 =int(A21/11) 1
21 =int(A22/11) 1
22 =int(A23/11) 2
23 =int(A24/11) 2
...and so on until the last line.
If I'm understanding the issue correctly, there is no need for a complex formula.
Try this in a column to test for yourself whether it is what you need.
Start in A1 and put the number 1 in each of the first 3 cells (A1, A2, A3).
In A4, enter =A1+1.
Then drag down. You will see the sequence you need:
1
1
1
2
2
2
3
3
3
If the sequence you need is indeed sequential, you can apply this as needed; for 11 rows per ID, put the first ID in A1:A11 and =A1+1 in A12, then drag down.
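The same "N rows above plus one" pattern as a Python sketch (hypothetical function name). Here block stands in for the number of repeated rows: 3 in the example above, 11 in the question.

```python
def ids_by_offset(n_rows, block=11, first_id=1):
    """Mimic seeding the first `block` cells with first_id and dragging =A1+1 down:
    every later cell equals the cell `block` rows above it plus one."""
    ids = []
    for r in range(n_rows):
        ids.append(first_id if r < block else ids[r - block] + 1)
    return ids


print(ids_by_offset(9, block=3))
```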
