I have an Excel sheet with the following data:
col1   col2   col3   col4   col5   output   range
-----------------------------------------------------------------------------
-1     -1     -1     -1     -1              99.9% - 100%
-1     -1     -1     -1     -1              98% - 99.8%
87.8   78.6   95.2   98.2   94.7            95% - 98.9%
100    100    100    100    100             90% - 94.9%
90.4   86     96.6   73.2   95.5            80% - 89.9%
92.9   88.9   93.1   100    100             0% - 79.9%
85.7   80     82.2   100    100
85.7   80     82.2   100    100
98.3   100    97.9   100    94.4
Now I need a formula that does the following:
1. Find the minimum of col2, col3, col4 and col5, and if that minimum falls within one of the ranges listed in the range column, print that range in the output column.
2. If col1 has the value -1, write "Fail" in the output column instead. Point 1 is ignored in that case.
So for example output will be:
col1   col2   col3   col4   col5   output         range
-----------------------------------------------------------------------------
-1     -1     -1     -1     -1     Fail           99.9% - 100%
-1     -1     -1     -1     -1     Fail           98% - 99.8%
87.8   78.6   95.2   98.2   94.7   0% - 79.9%     95% - 98.9%
100    100    100    100    100    99.9% - 100%   90% - 94.9%
90.4   86     96.6   73.2   95.5   0% - 79.9%     80% - 89.9%
92.9   88.9   93.1   100    100    80% - 89.9%    0% - 79.9%
85.7   80     82.2   100    100    80% - 89.9%
85.7   80     82.2   100    100    80% - 89.9%
98.3   100    97.9   100    94.4   90% - 94.9%
Is this possible in Excel? It looks pretty complex, so I am confused about how to do this automatically with a formula.
Here's one way. Columns K through N are a reference.
Formula for H2:
=MIN(B2:E2)
Formula for I2:
=IF(A2=-1,"z",IF(H2>$L$2,"a",IF(H2>$L$3,"b",IF(H2>$L$4,"c",IF(H2>$L$5,"d",IF(H2>$L$6,"e","f"))))))
Formula for F2:
=VLOOKUP(I2,K:N,4,FALSE)
Drag 'em down and you're done.
Granted, you could accomplish this with fewer columns, but I've laid it out this way for illustration.
Sort the ranges in ascending order, create a column holding the lower bound of each range, and use the LOOKUP function to find the appropriate range.
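The lower-bound lookup described above can be sketched outside Excel as well. This is only an illustration of the logic in Python, not the Excel formula itself; the names LOWER_BOUNDS, LABELS and classify are made up for the example, and the bounds are read off the range column in the question. Because the lookup uses lower bounds only, the overlapping ranges in the question (98% - 99.8% and 95% - 98.9%) resolve to whichever range has the highest lower bound not exceeding the minimum.

```python
from bisect import bisect_right

# Lower bound of each range, sorted ascending, with the matching label.
# (Illustrative names; values taken from the "range" column in the question.)
LOWER_BOUNDS = [0.0, 80.0, 90.0, 95.0, 98.0, 99.9]
LABELS = [
    "0% - 79.9%",
    "80% - 89.9%",
    "90% - 94.9%",
    "95% - 98.9%",
    "98% - 99.8%",
    "99.9% - 100%",
]


def classify(row):
    """Return "Fail" if col1 is -1, else the label for min(col2..col5)."""
    col1, *rest = row
    if col1 == -1:
        return "Fail"
    lowest = min(rest)
    # Index of the last lower bound that is <= lowest.
    idx = bisect_right(LOWER_BOUNDS, lowest) - 1
    if idx < 0:
        raise ValueError(f"value {lowest} is below every range")
    return LABELS[idx]


rows = [
    [-1, -1, -1, -1, -1],
    [87.8, 78.6, 95.2, 98.2, 94.7],
    [100, 100, 100, 100, 100],
    [92.9, 88.9, 93.1, 100, 100],
    [98.3, 100, 97.9, 100, 94.4],
]
results = [classify(r) for r in rows]
for row, label in zip(rows, results):
    print(row, "->", label)
```

Like LOOKUP, bisect_right needs the bounds sorted in ascending order.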
This is my sheet:
Goods Type   Quantity / Geram   Gold Price 750K   Price of Exchange
Gold         100                1,554.00          0
Silver       500                235.00            0
Euro         200                0.00              1.01
Pond         50                 0.00              0.97
Gold         100                1,554.00          0
Silver       500                235.00            0
Euro         200                0.00              1.00
Pond         50                 0.00              0.99
I want to do this:
If Goods Type is Gold: multiply Quantity / Geram by Gold Price.
In my case it should be: (100 * 1554) + (100 * 1554) = 310800, and if I add more entries in the future, it should add more.
My current sheet already has a cell that uses SUMIF to total Quantity / Geram for the rows where Goods Type is Gold; it currently shows 200.
Use SUMPRODUCT:
=SUMPRODUCT((A2:A9="Gold")*B2:B9*C2:C9)
It multiplies the TRUE/FALSE result of A2:A9="Gold" (treated as 1/0) by quantity and price and sums everything (in this case 155400 + 0 + 0 + 0 + 155400 + 0 + 0 + 0 = 310800).
Adjust ranges to your data.
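To make the arithmetic concrete, here is a minimal Python sketch of the same conditional sum-product, using the rows from the question. The function name conditional_sumproduct is hypothetical; the point is only that the comparison yields True/False, which behaves as 1/0 when multiplied.

```python
# (goods type, quantity, gold price) for each row of the sheet in the question.
rows = [
    ("Gold", 100, 1554.00),
    ("Silver", 500, 235.00),
    ("Euro", 200, 0.00),
    ("Pond", 50, 0.00),
    ("Gold", 100, 1554.00),
    ("Silver", 500, 235.00),
    ("Euro", 200, 0.00),
    ("Pond", 50, 0.00),
]


def conditional_sumproduct(rows, goods_type):
    """Sum quantity * price over the rows whose type matches goods_type."""
    # (kind == goods_type) is True/False, which multiplies as 1/0,
    # just like (A2:A9="Gold") inside SUMPRODUCT.
    return sum((kind == goods_type) * qty * price for kind, qty, price in rows)


gold_total = conditional_sumproduct(rows, "Gold")
print(gold_total)  # 100 * 1554 + 100 * 1554
```

Appending more rows to the list changes the total accordingly, which corresponds to extending the ranges in the sheet formula.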
SUMPRODUCT() accepts three columns, so you can use an IF() formula that returns 1 for "Gold" and 0 otherwise. This leads to the following kind of formula:
=SUMPRODUCT(IF(A2:A6="Gold",1,0),B2:B6,C2:C6)
I am trying to do SUMPRODUCT in Google Sheets but in a more complicated situation.
I want to enter the tax and cashback as percentages instead of decimal multipliers.
This is what I am doing now, and it works just fine:
A B C D
Price Tax Cashback
100 1.09 0.95
80 1 1
50 1.09 0.95
Total =SUMPRODUCT(B:B, C:C, D:D)
What I actually want to do is
A B C D
Price Tax Cashback
100 9% 5%
80
50 9% 5%
Total ???
Use
=SUMPRODUCT(B2:B, 1+C2:C, 1-D2:D)
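As a rough check of what that formula computes, the following Python sketch applies (1 + tax) and (1 - cashback) to each price and sums the results. The function name is invented for the example, and it assumes blank Tax and Cashback cells count as 0, which is how the blank row in the question is expected to behave.

```python
# (price, tax, cashback); None stands for a blank cell in the sheet.
rows = [
    (100, 0.09, 0.05),
    (80, None, None),
    (50, 0.09, 0.05),
]


def total_with_tax_and_cashback(rows):
    """Sum price * (1 + tax) * (1 - cashback), treating blanks as 0."""
    total = 0.0
    for price, tax, cashback in rows:
        tax = tax or 0.0            # assumption: blank tax means 0%
        cashback = cashback or 0.0  # assumption: blank cashback means 0%
        total += price * (1 + tax) * (1 - cashback)
    return total


total = total_with_tax_and_cashback(rows)
print(round(total, 3))
```

With 9% and 5% this gives the same total as the decimal version of the sheet that used 1.09 and 0.95.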
I have a dataframe from which I generate another dataframe using the following code:
df.groupby(['Cat','Ans']).agg({'col1':'count','col2':'sum'})
This gives me following result:
Cat Ans col1 col2
A Y 100 10000.00
N 40 15000.00
B Y 80 50000.00
N 40 10000.00
Now, I need percentage of group totals for each group (level=0, i.e. "Cat") instead of count or sum.
For getting count percentage instead of count value, I could do this:
df['Cat'].value_counts(normalize=True)
But here I have sub-group "Ans" under the "Cat" group. And I need the percentage to be on each Cat group level and not the whole total.
So, expectation is:
Cat Ans col1 .. col3
A Y 100 .. 71.43 #(100/(100+40))*100
N 40 .. 28.57
B Y 80 .. 66.67
N 40 .. 33.33
Similarly, col4 will be percentage of group-total for col2.
Is there a function or method available for this?
How do we do this in an efficient way for large data?
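You can group on the index level and let transform('sum') broadcast each group's total back to its rows, so the division lines up row by row. (Older pandas versions allowed df['col1'].sum(level='Cat') here; the level argument of sum was deprecated in pandas 1.3 and removed in 2.0.)
df['col3'] = df['col1'] / df.groupby(level='Cat')['col1'].transform('sum') * 100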
col1 col2 col3
Cat Ans
A Y 100 10000.0 71.428571
N 40 15000.0 28.571429
B Y 80 50000.0 66.666667
N 40 10000.0 33.333333
For multiple columns you can loop the above, or divide the whole frame by its groupby(level='Cat') totals at once. I add a suffix to distinguish the new columns from the original columns when joining back with concat.
df = pd.concat([df, (df / df.groupby(level='Cat').transform('sum') * 100).add_suffix('_pct')], axis=1)
col1 col2 col1_pct col2_pct
Cat Ans
A Y 100 10000.0 71.428571 40.000000
N 40 15000.0 28.571429 60.000000
B Y 80 50000.0 66.666667 83.333333
N 40 10000.0 33.333333 16.666667
I have a dataset in which I need to one-hot encode the composition of mixtures of different materials.
The columns of my dataset look like this:
id Composition
0 ZrB2 - 5% B4C
1 HfB2 - 15% SiC - 3% WC
2 HfB2 - 15% SiC
I need to put it in this format:
0)
ZrB2 95
HfB2 0
SiC 0
B4C 5
WC 0
1)
ZrB2 0
HfB2 82
SiC 15
B4C 0
WC 3
2)
ZrB2 0
HfB2 85
SiC 15
B4C 0
WC 0
WB 0
This is not one-hot encoding as such, but parsing a list of strings into constituent parts:
each component is delimited by " - "
each component is made up of two parts, a percentage and a column name; build a regex that matches each of these parts
using a list/dict comprehension, put the result into a dataframe
complete the logic by calculating the percentage for the column where it was not defined
import re
import numpy as np
import pandas as pd

data = ['ZrB2 - 5% B4C', 'HfB2 - 15% SiC - 3% WC', 'HfB2 - 15% SiC']
dfhc = pd.DataFrame({"Composition": data})
# build a list of dict, where dict is of form {'ZrB2': -1, 'B4C': '5'}
# where no %age, default to -1 to be calculated later
parse1 = [
    {
        tt[1]: tt[0].replace("% ", "") if len(tt[0]) > 0 else -1
        for t in r
        # parse out token and percentage, exclude empty tuples (default behaviour of re.findall())
        for tt in [x for x in re.findall("([0-9]*[%]?[ ]?)([A-Z,a-z,0-9]*)", t) if x != ("", "")]
    }
    # each column is delimited by " - "
    for r in [re.split(" - ", r) for r in dfhc["Composition"].values]
]
df = pd.DataFrame(parse1)
# dtype is important for sum() to work
df = df.astype({c: np.float64 for c in df.columns})
# where %age was not known and defaulted to -1 set it to 100 - sum of other cols
for c in df.columns:
    mask = df[df[c] == -1].index
    df.loc[mask, c] = 100 - df.loc[mask, [cc for cc in df.columns if cc != c]].sum(axis=1)
print(f"{dfhc.to_string(index=False)}\n\n{df.to_string(index=False)}\n\n{parse1}")
output
Composition
ZrB2 - 5% B4C
HfB2 - 15% SiC - 3% WC
HfB2 - 15% SiC
ZrB2 B4C HfB2 SiC WC
95.0 5.0 NaN NaN NaN
NaN NaN 82.0 15.0 3.0
NaN NaN 85.0 15.0 NaN
[{'ZrB2': -1, 'B4C': '5'}, {'HfB2': -1, 'SiC': '15', 'WC': '3'}, {'HfB2': -1, 'SiC': '15'}]
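The output above leaves NaN where a material is absent, while the format requested in the question shows 0. In the pandas version, calling df.fillna(0) on the result would cover that. As a self-contained illustration of the same parsing idea, here is a standard-library sketch that fills the gaps with zeros; parse_composition is a hypothetical helper, and it assumes exactly one component per string comes without a percentage and takes the remainder.

```python
import re


def parse_composition(text):
    """Return {material: percent}; the component without a percentage gets the remainder."""
    result = {}
    base = None
    for part in text.split(" - "):
        part = part.strip()
        # Components such as "15% SiC": a number, a percent sign, then the name.
        match = re.fullmatch(r"(\d+(?:\.\d+)?)%\s*(\w+)", part)
        if match:
            result[match.group(2)] = float(match.group(1))
        else:
            base = part  # assumption: a bare name is the base material
    if base is not None:
        result[base] = 100.0 - sum(result.values())
    return result


compositions = ["ZrB2 - 5% B4C", "HfB2 - 15% SiC - 3% WC", "HfB2 - 15% SiC"]
parsed = [parse_composition(c) for c in compositions]

# Every material seen in any row becomes a column; absent ones are 0.
materials = sorted({name for row in parsed for name in row})
table = [{name: row.get(name, 0.0) for name in materials} for row in parsed]

for row in table:
    print(row)
```

The regular expression here is stricter than the one in the answer, so strings in other layouts would need their own pattern.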
I have the following data:
Date A B C
2012/07 7 6 0
2012/08 9 4 0
2012/09 9 3 0
2012/10 14 2 1
2012/11 9 16 0
2012/12 0 14 0
2013/01 7 9 1
2013/02 8 13 1
2013/03 16 62 16
2013/04 7 12 4
2013/05 10 11 1
2013/06 6 37 4
I want to make a line graph from these data, but I want it to show percentages of line total (A + B + C) instead of the absolute values. How can I do this directly, without resorting to intermediate cells where I'd insert formulas to calculate the percentages or adding a line total column?
So the end result should look like this:
But I don't want to have to "manually" create cells like these:
A B C
2012/07 54% 46% 0%
2012/08 69% 31% 0%
2012/09 75% 25% 0%
2012/10 82% 12% 6%
2012/11 36% 64% 0%
2012/12 0% 100% 0%
2013/01 41% 53% 6%
2013/02 36% 59% 5%
2013/03 17% 66% 17%
2013/04 30% 52% 17%
2013/05 45% 50% 5%
2013/06 13% 79% 9%
Use Named Ranges.
First, define the name "Total" as =B2:B13+C2:C13+D2:D13 (the sample data fills rows 2 to 13 when the headers are in row 1).
Then define three names: "PctA" as =B2:B13/Total, and "PctB" and "PctC" likewise for columns C and D.
Then define a name "Dates" as =A2:A13.
Insert a line chart and enter the three Pct names as the data series. Enter them with a sheet reference, such as Sheet1!PctA; Excel won't accept the names without one.
Do the same for Dates as the horizontal category range.
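For reference, the values those named ranges produce can be reproduced with a short Python sketch. It only mirrors the arithmetic (each value divided by its row total) and is not part of the Excel setup; row_percentages is a name invented for the example, and returning zeros for an all-zero row is an assumption, since the sample data has no such row.

```python
# (date, A, B, C) rows from the question.
data = [
    ("2012/07", 7, 6, 0),
    ("2012/08", 9, 4, 0),
    ("2012/09", 9, 3, 0),
    ("2012/10", 14, 2, 1),
    ("2012/11", 9, 16, 0),
    ("2012/12", 0, 14, 0),
    ("2013/01", 7, 9, 1),
    ("2013/02", 8, 13, 1),
    ("2013/03", 16, 62, 16),
    ("2013/04", 7, 12, 4),
    ("2013/05", 10, 11, 1),
    ("2013/06", 6, 37, 4),
]


def row_percentages(values):
    """Return each value as a percentage of the row total."""
    total = sum(values)
    if total == 0:
        # Assumption: an all-zero row maps to zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [100.0 * v / total for v in values]


pct = {date: row_percentages(values) for date, *values in data}

for date, shares in pct.items():
    print(date, [round(s) for s in shares])
```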