Find minimum value from four columns and compare it with range in different column? - excel

I have an Excel sheet with the following data:
col1   col2   col3   col4   col5   output        range
-----------------------------------------------------------------------------
-1     -1     -1     -1     -1                   99.9% - 100%
-1     -1     -1     -1     -1                   98% - 99.8%
87.8   78.6   95.2   98.2   94.7                 95% - 98.9%
100    100    100    100    100                  90% - 94.9%
90.4   86     96.6   73.2   95.5                 80% - 89.9%
92.9   88.9   93.1   100    100                  0% - 79.9%
85.7   80     82.2   100    100
85.7   80     82.2   100    100
98.3   100    97.9   100    94.4
Now I need a formula that does the following:
1. Find the minimum of col2, col3, col4 and col5; if that minimum falls within one of the ranges listed in the range column, print that range in the output column.
2. If col1 is -1, write "Fail" in the output column instead and ignore point 1.
So for example output will be:
col1   col2   col3   col4   col5   output        range
-----------------------------------------------------------------------------
-1     -1     -1     -1     -1     Fail          99.9% - 100%
-1     -1     -1     -1     -1     Fail          98% - 99.8%
87.8   78.6   95.2   98.2   94.7   0% - 79.9%    95% - 98.9%
100    100    100    100    100    99.9% - 100%  90% - 94.9%
90.4   86     96.6   73.2   95.5   0% - 79.9%    80% - 89.9%
92.9   88.9   93.1   100    100    80% - 89.9%   0% - 79.9%
85.7   80     82.2   100    100    80% - 89.9%
85.7   80     82.2   100    100    80% - 89.9%
98.3   100    97.9   100    94.4   90% - 94.9%
Is it possible to do this in Excel? It looks pretty complex, so I am unsure how to do it automatically with a formula.

Here's one way. Columns K through N are a reference.
Formula for H2:
=MIN(B2:E2)
Formula for I2:
=IF(A2=-1,"z",IF(H2>$L$2,"a",IF(H2>$L$3,"b",IF(H2>$L$4,"c",IF(H2>$L$5,"d",IF(H2>$L$6,"e","f"))))))
Formula for F2:
=VLOOKUP(I2,K:N,4,FALSE)
Drag 'em down and you're done.
Granted, you could accomplish this with fewer columns, but I've laid it out this way for illustration.

Order the ranges in ascending order, create a column of the lower bounds of the ranges and use LOOKUP function to find the appropriate range.
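The lower-bound lookup described above can be sketched outside Excel as well. The following Python snippet is only an illustration of the idea, not a translation of any particular formula: the names `LOWER_BOUNDS`, `LABELS` and `classify` are made up for this example, and the lower bounds are taken from the question's range column as written.

```python
from bisect import bisect_right

# Range labels from the question, sorted by ascending lower bound (in percent).
# These two lists play the role of the sorted lookup table in the sheet.
LOWER_BOUNDS = [0.0, 80.0, 90.0, 95.0, 98.0, 99.9]
LABELS = ["0% - 79.9%", "80% - 89.9%", "90% - 94.9%",
          "95% - 98.9%", "98% - 99.8%", "99.9% - 100%"]


def classify(row):
    """Return the output label for one row [col1, col2, col3, col4, col5]."""
    col1, *rest = row
    if col1 == -1:
        return "Fail"
    lowest = min(rest)
    # Find the last lower bound that is <= lowest, which is what an
    # approximate-match lookup on a sorted column does.
    idx = bisect_right(LOWER_BOUNDS, lowest) - 1
    if idx < 0:
        raise ValueError(f"value {lowest} is below every range")
    return LABELS[idx]


print(classify([87.8, 78.6, 95.2, 98.2, 94.7]))
print(classify([-1, -1, -1, -1, -1]))
```

Because only lower bounds are compared, a value that falls in a gap between two listed ranges (for example 79.95) is assigned to the lower range.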

Related

How to multiply B1 with C1, B2 with C2, Bn with Cn and Sum them?

This is my sheet:
Goods Type Quantity / Geram Gold Price 750K Price of Exchange
Gold 100 1,554.00 0
Silver 500 235.00 0
Euro 200 0.00 1.01
Pond 50 0.00 0.97
Gold 100 1,554.00 0
Silver 500 235.00 0
Euro 200 0.00 1.00
Pond 50 0.00 0.99
I want to do to this:
If Goods Type is Gold: multiply Quantity / Geram by Gold Price.
In my case it should be: (100 * 1554) + (100 * 1554) = 310800, and if I add more entries in the future, it should add more.
My sheet already has a cell that uses SUMIF to total Quantity / Geram where Goods Type is Gold; it currently shows 200.
Use SUMPRODUCT:
=SUMPRODUCT((A2:A9="Gold")*B2:B9*C2:C9)
It multiplies the TRUE/FALSE result of A2:A9="Gold" (treated as 1/0) by quantity and price, then sums everything (in this case 155400 + 0 + 0 + 0 + 155400 + 0 + 0 + 0 = 310800).
Adjust ranges to your data.
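To see why multiplying by the TRUE/FALSE column works, here is a minimal Python sketch of the same arithmetic. The `rows` list copies the goods type, quantity and gold price columns from the question, and `conditional_sumproduct` is a hypothetical helper name used only for this illustration.

```python
# (goods type, quantity, gold price) for each row of the question's sheet.
rows = [
    ("Gold", 100, 1554.00),
    ("Silver", 500, 235.00),
    ("Euro", 200, 0.00),
    ("Pond", 50, 0.00),
    ("Gold", 100, 1554.00),
    ("Silver", 500, 235.00),
    ("Euro", 200, 0.00),
    ("Pond", 50, 0.00),
]


def conditional_sumproduct(rows, goods_type):
    """Sum quantity * price over the rows whose type matches goods_type."""
    # The comparison yields True/False, which behave as 1/0 when multiplied,
    # so non-matching rows contribute 0 to the sum.
    return sum((kind == goods_type) * qty * price for kind, qty, price in rows)


gold_total = conditional_sumproduct(rows, "Gold")
print(gold_total)
```

Appending more tuples to `rows` changes the total in the same way that extending the ranges does in the sheet.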
SUMPRODUCT() accepts three columns, so you can use an IF() formula that returns 1 for "Gold" and 0 otherwise as the first one. This leads to the following kind of formula (in Excel versions without dynamic arrays it may need to be entered with Ctrl+Shift+Enter):
=SUMPRODUCT(IF(A2:A6="Gold",1,0),B2:B6,C2:C6)

How to do SUMPRODUCT with percentage and blank cells

I am trying to do SUMPRODUCT in Google Sheets but in a more complicated situation.
I want SUMPRODUCT to work with percentages instead of decimal multipliers.
This is what I am doing now, and it works just fine:
A B C D
Price Tax Cashback
100 1.09 0.95
80 1 1
50 1.09 0.95
Total =SUMPRODUCT(B:B, C:C, D:D)
What I actually want to do is
A B C D
Price Tax Cashback
100 9% 5%
80
50 9% 5%
Total ???
Use
=SUMPRODUCT(B2:B, 1+C2:C, 1-D2:D)
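The formula turns each percentage into a multiplier (1 + tax, 1 - cashback) and lets blank cells count as 0, so a row with no tax or cashback contributes its plain price. A small Python sketch of that arithmetic follows, with `None` standing in for a blank cell; the function name is invented for the example.

```python
# (price, tax, cashback) rows from the question; None marks a blank cell.
rows = [
    (100, 0.09, 0.05),
    (80, None, None),
    (50, 0.09, 0.05),
]


def total_with_tax_and_cashback(rows):
    """Sum price * (1 + tax) * (1 - cashback), treating blanks as 0."""
    total = 0.0
    for price, tax, cashback in rows:
        tax = tax or 0.0            # blank tax means no tax
        cashback = cashback or 0.0  # blank cashback means no cashback
        total += price * (1 + tax) * (1 - cashback)
    return total


total = total_with_tax_and_cashback(rows)
print(round(total, 3))
```

This matches the original decimal version of the sheet, where 9% tax was entered as 1.09 and 5% cashback as 0.95.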

Applying "percentage of group total" to a column in a grouped dataframe

I have a dataframe from which I generate another dataframe using the following code:
df.groupby(['Cat','Ans']).agg({'col1':'count','col2':'sum'})
This gives me following result:
Cat Ans col1 col2
A Y 100 10000.00
N 40 15000.00
B Y 80 50000.00
N 40 10000.00
Now, I need percentage of group totals for each group (level=0, i.e. "Cat") instead of count or sum.
For getting count percentage instead of count value, I could do this:
df['Cat'].value_counts(normalize=True)
But here I have sub-group "Ans" under the "Cat" group. And I need the percentage to be on each Cat group level and not the whole total.
So, expectation is:
Cat Ans col1 .. col3
A Y 100 .. 71.43 #(100/(100+40))*100
N 40 .. 28.57
B Y 80 .. 66.67
N 40 .. 33.33
Similarly, col4 will be percentage of group-total for col2.
Is there a function or method available for this?
How do we do this in an efficient way for large data?
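Compute the group totals with a groupby on the index level; transform('sum') broadcasts each group's total back to its rows, so the division lines up row by row. Older pandas versions also accepted df['col1'].sum(level='Cat'), but the level argument of DataFrame.sum was deprecated in pandas 1.3 and removed in 2.0.
df['col3'] = df['col1'] / df.groupby(level='Cat')['col1'].transform('sum') * 100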
col1 col2 col3
Cat Ans
A Y 100 10000.0 71.428571
N 40 15000.0 28.571429
B Y 80 50000.0 66.666667
N 40 10000.0 33.333333
For multiple columns you can loop the above, or have pandas align those too. I add a suffix to distinguish the new columns from the original columns when joining back with concat.
df = pd.concat([df, (df / df.groupby(level='Cat').transform('sum') * 100).add_suffix('_pct')], axis=1)
col1 col2 col1_pct col2_pct
Cat Ans
A Y 100 10000.0 71.428571 40.000000
N 40 15000.0 28.571429 60.000000
B Y 80 50000.0 66.666667 83.333333
N 40 10000.0 33.333333 16.666667

How do I One Hot encode mixed strings and number cell values in pandas?

I have a dataset in which I need to one-hot encode the composition mixtures of different materials.
The columns of my dataset look like this:
id Composition
0 ZrB2 - 5% B4C
1 HfB2 - 15% SiC - 3% WC
2 HfB2 - 15% SiC
I need to put it in this format:
0)
ZrB2 95
HfB2 0
SiC 0
B4C 5
WC 0
1)
ZrB2 0
HfB2 82
SiC 15
B4C 0
WC 3
2)
ZrB2 0
HfB2 85
SiC 15
B4C 0
WC 0
WB 0
This is not one-hot encoding as such, but parsing a list of strings into their constituent parts:
1. Each component is delimited by " - ".
2. Each component is made up of two parts, a percentage and a column name. Build a regular expression that matches each of these parts.
3. Use a list/dict comprehension to put the parts into a dataframe.
4. Complete the logic by calculating the percentage of the column where it was not defined.
import re

import numpy as np
import pandas as pd

data = ['ZrB2 - 5% B4C', 'HfB2 - 15% SiC - 3% WC', 'HfB2 - 15% SiC']
dfhc = pd.DataFrame({"Composition": data})
# build a list of dicts, where each dict is of the form {'ZrB2': -1, 'B4C': '5'}
# where there is no percentage, default to -1 to be calculated later
parse1 = [
    {
        tt[1]: tt[0].replace("% ", "") if len(tt[0]) > 0 else -1
        for t in r
        # parse out token and percentage, exclude empty tuples (default behaviour of re.findall())
        for tt in [x for x in re.findall("([0-9]*[%]?[ ]?)([A-Z,a-z,0-9]*)", t) if x != ("", "")]
    }
    # each component is delimited by " - "
    for r in [re.split(" - ", r) for r in dfhc["Composition"].values]
]
df = pd.DataFrame(parse1)
# dtype is important for sum() to work
df = df.astype({c: np.float64 for c in df.columns})
# where the percentage was not known and defaulted to -1, set it to 100 - sum of other cols
for c in df.columns:
    mask = df[df[c] == -1].index
    df.loc[mask, c] = 100 - df.loc[mask, [cc for cc in df.columns if cc != c]].sum(axis=1)
print(f"{dfhc.to_string(index=False)}\n\n{df.to_string(index=False)}\n\n{parse1}")
output
Composition
ZrB2 - 5% B4C
HfB2 - 15% SiC - 3% WC
HfB2 - 15% SiC
ZrB2 B4C HfB2 SiC WC
95.0 5.0 NaN NaN NaN
NaN NaN 82.0 15.0 3.0
NaN NaN 85.0 15.0 NaN
[{'ZrB2': -1, 'B4C': '5'}, {'HfB2': -1, 'SiC': '15', 'WC': '3'}, {'HfB2': -1, 'SiC': '15'}]
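The question asked for 0 rather than NaN where a material is absent. As a rough alternative sketch of the same parsing steps using only the standard library, the snippet below fills missing materials with 0. The helper names `parse_composition` and `to_rows` are invented for this example, and it assumes each string has at most one component without a percentage.

```python
import re


def parse_composition(text):
    """Parse e.g. 'HfB2 - 15% SiC - 3% WC' into {material: percentage}."""
    parts = {}
    base = None  # the component listed without a percentage
    for token in text.split(" - "):
        # Optional "<number>% " prefix followed by the material name.
        match = re.fullmatch(r"\s*(?:(\d+(?:\.\d+)?)%\s*)?(\w+)\s*", token)
        if match is None:
            raise ValueError(f"cannot parse component: {token!r}")
        pct, name = match.groups()
        if pct is None:
            base = name
        else:
            parts[name] = float(pct)
    if base is not None:
        # The base material makes up whatever the others leave over.
        parts[base] = 100.0 - sum(parts.values())
    return parts


def to_rows(compositions):
    """Return (materials, rows) with 0.0 for materials absent from a row."""
    parsed = [parse_composition(c) for c in compositions]
    materials = []
    for p in parsed:
        for name in p:
            if name not in materials:
                materials.append(name)
    return materials, [[p.get(m, 0.0) for m in materials] for p in parsed]


materials, rows = to_rows(['ZrB2 - 5% B4C', 'HfB2 - 15% SiC - 3% WC', 'HfB2 - 15% SiC'])
print(materials)
print(rows)
```

With pandas, calling `fillna(0)` on the dataframe from the answer above would give the same zero-filled table.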

Percentage graph from absolute values

I have the following data:
Date A B C
2012/07 7 6 0
2012/08 9 4 0
2012/09 9 3 0
2012/10 14 2 1
2012/11 9 16 0
2012/12 0 14 0
2013/01 7 9 1
2013/02 8 13 1
2013/03 16 62 16
2013/04 7 12 4
2013/05 10 11 1
2013/06 6 37 4
I want to make a line graph from these data, but I want it to show percentages of line total (A + B + C) instead of the absolute values. How can I do this directly, without resorting to intermediate cells where I'd insert formulas to calculate the percentages or adding a line total column?
So the end result should look like this:
But I don't want to have to "manually" create cells like these:
A B C
2012/07 54% 46% 0%
2012/08 69% 31% 0%
2012/09 75% 25% 0%
2012/10 82% 12% 6%
2012/11 36% 64% 0%
2012/12 0% 100% 0%
2013/01 41% 53% 6%
2013/02 36% 59% 5%
2013/03 17% 66% 17%
2013/04 30% 52% 17%
2013/05 45% 50% 5%
2013/06 13% 79% 9%
Use Named Ranges.
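First, define the name "Total" as =B2:B13+C2:C13+D2:D13 (assuming the 12 data rows sit in rows 2 to 13 under a header row).
Then, define three names: "PctA" as =B2:B13/Total, and "PctB" and "PctC" likewise for columns C and D.
Then, define a name "Dates" as =A2:A13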
Insert a line chart and enter the 3 pct names as the data series. Put in the names as Sheet1!PctA, etc. - Excel won't accept the names without a sheet reference.
Do the same for Dates as the horizontal category range.
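The named ranges compute each value's share of its row total. As a quick way to check the numbers outside Excel, here is a minimal Python sketch of that calculation on a few rows from the question; `row_percentages` is a made-up name, and the zero-total guard is an addition that the named ranges themselves do not have.

```python
# (date, A, B, C) for a few of the months in the question.
rows = [
    ("2012/07", 7, 6, 0),
    ("2012/08", 9, 4, 0),
    ("2012/12", 0, 14, 0),
    ("2013/03", 16, 62, 16),
]


def row_percentages(rows):
    """Return (date, share_A, share_B, share_C) with shares as fractions of the row total."""
    result = []
    for date, *values in rows:
        total = sum(values)
        # A month with a zero total would divide by zero, so report 0.0 instead.
        shares = [v / total if total else 0.0 for v in values]
        result.append((date, *shares))
    return result


pcts = row_percentages(rows)
for date, a, b, c in pcts:
    print(date, f"{a:.0%}", f"{b:.0%}", f"{c:.0%}")
```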
