Running Percentage in Spotfire - spotfire

I have a requirement to make an MTD Pareto of defects. I need four columns: item, qty, share% and defect%. The defect% should be a cumulative percentage. Any help is very much appreciated. Thank you.
defect% = itemqty/total volume qty
share% = itemqty/total defect qty
(I would like to sum the share for the current row + share for the previous row.)
total defect itemqty 16
total volume itemqty 13
sample:
item itemqty total volume qty
hardware 7 /16
Locked 5/16+sharepct of hardware
corroded 4/16+sharepct of locked
Below is some sample data:
month model item itemqty volumeqty
===============================================
2016(1) Jan P6 Locked 1 1
2016(1) Jan P6 Locked 1 0
2016(1) Jan P6 Locked 1 1
2016(1) Jan P6 Locked 1 1
2016(1) Jan P5 Locked 1 1
2016(1) Jan P6 hardware 1 1
2016(1) Jan P6 hardware 1 0
2016(1) Jan P6 hardware 1 1
2016(1) Jan P6 hardware 1 1
2016(1) Jan P5 hardware 1 1
2016(1) Jan P5 hardware 1 0
2016(1) Jan P5 hardware 1 1
2016(1) Jan P6 corroded 1 1
2016(1) Jan P6 corroded 1 1
2016(1) Jan P6 corroded 1 1
2016(1) Jan P5 corroded 1 1

A very similar thread discusses cumulative sums and percentages here:
Spotfire Custom Expression : Calculate (Num/Den) Percentages
You are going to want to use something similar to:
Sum(If(trim([model])="P6",[itemqty])) then Sum([Value]) over (AllPrevious([Axis.Rows]))
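For readers who want to check the expected numbers outside Spotfire, here is a minimal pandas sketch of the same running share. It assumes the 16 sample rows from the question (one defect per row) and the question's definition share% = itemqty / total defect qty; the column names share_pct and cum_share_pct are made up for illustration.

```python
import pandas as pd

# Sample rows from the question: one defect per row (itemqty = 1).
items = ["Locked"] * 5 + ["hardware"] * 7 + ["corroded"] * 4
df = pd.DataFrame({"item": items, "itemqty": 1})

# Total defects per item, largest first (Pareto order).
pareto = (
    df.groupby("item", as_index=False)["itemqty"].sum()
      .sort_values("itemqty", ascending=False)
      .reset_index(drop=True)
)

# share% = item qty / total defect qty (hypothetical column names).
pareto["share_pct"] = pareto["itemqty"] / pareto["itemqty"].sum()
# Running share: the current row's share plus the shares of all previous rows.
pareto["cum_share_pct"] = pareto["share_pct"].cumsum()

print(pareto)
```

With this data the running share is 7/16 for hardware, 12/16 after Locked, and 16/16 after corroded, which matches the sample layout in the question.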

Related

How to change single index value in level 1 in MultiIndex dataframe?

I have this MultiIndex dataframe, df after parsing some text columns for dates with regex.
df.columns
Index(['all', 'month', 'day', 'year'], dtype='object')
all month day year
match
456 0 2000 1 1 2000
461 0 16 1 1 16
1 1991 1 1 1991
463 0 25 1 1 25
1 2014 1 1 2014
465 0 19 1 1 19
1 1976 1 1 1976
477 0 14 1 1 14
1 1994 1 1 1994
489 0 35 1 1 35
1 1985 1 1 1985
I need to keep only the rows with years (2000, 1991, 2014, 1976, 1994, 1985). Most of these are indexed as 1 at level 1, except for the first one, (456, 0).
I would like to change that row's level-1 index to 1 so that I could handle them all this way:
df=df.drop(index=0, level=1)
My result should be this.
all month day year
match
456 0 2000 1 1 2000
461 1 1991 1 1 1991
463 1 2014 1 1 2014
465 1 1976 1 1 1976
477 1 1994 1 1 1994
489 1 1985 1 1 1985
I have tried
df.rename(index={(456,0):(456,1)}, level=1, inplace=True)
which did not seem to do anything.
I could do df1=df.drop((456,1)) and df2=df.drop(index=0, level=1), then concat them and remove the duplicates, but that does not seem very efficient.
I can't drop the MultiIndex because I will need to append this subset to a bigger dataframe later on.
Thank you.
The first idea is to chain two masks with | (bitwise OR):
df = df[(df.index.get_level_values(1) == 1) | (df.index.get_level_values(0) == 456)]
print (df)
all month day year
456 0 2000 1 1 2000
461 1 1991 1 1 1991
463 1 2014 1 1 2014
465 1 1976 1 1 1976
477 1 1994 1 1 1994
489 1 1985 1 1 1985
Another idea, if you always need to keep the first row, is to set the first element of the boolean mask to True:
mask = df.index.get_level_values(1) == 1
mask[0] = True
df = df[mask]
print (df)
all month day year
456 0 2000 1 1 2000
461 1 1991 1 1 1991
463 1 2014 1 1 2014
465 1 1976 1 1 1976
477 1 1994 1 1 1994
489 1 1985 1 1 1985
Another out-of-the-box solution is to filter out duplicated values with Index.duplicated. It works here because the first value, 456, is unique, and for all other values you need the second row:
df1 = df[~df.index.get_level_values(0).duplicated(keep='last')]
print (df1)
all month day year
456 0 2000 1 1 2000
461 1 1991 1 1 1991
463 1 2014 1 1 2014
465 1 1976 1 1 1976
477 1 1994 1 1 1994
489 1 1985 1 1 1985
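To try the approaches above without the original text data, the frame can be rebuilt by hand. This is only a sketch: the index and values are copied from the question, and the second level is assumed to be named match, as the printed output suggests.

```python
import pandas as pd

# Rebuild the frame from the question (level 1 is assumed to be named "match").
index = pd.MultiIndex.from_tuples(
    [(456, 0), (461, 0), (461, 1), (463, 0), (463, 1), (465, 0),
     (465, 1), (477, 0), (477, 1), (489, 0), (489, 1)],
    names=[None, "match"],
)
values = [2000, 16, 1991, 25, 2014, 19, 1976, 14, 1994, 35, 1985]
df = pd.DataFrame({"all": values, "month": 1, "day": 1, "year": values}, index=index)

# Approach 1: keep rows where level 1 is 1, or where level 0 is 456.
by_mask = df[(df.index.get_level_values(1) == 1) | (df.index.get_level_values(0) == 456)]

# Approach 2: keep the last row of every level-0 value.
by_last = df[~df.index.get_level_values(0).duplicated(keep="last")]

print(by_mask)
print(by_mask.equals(by_last))
```

Both selections keep (456, 0) unchanged, so the MultiIndex survives for the later append.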
Another way is to query the index level. Note that, as the output shows, this also drops the (456, 0) row:
df.query('match == [1]')
match all month day year
461 1 1991 1 1 1991
463 1 2014 1 1 2014
465 1 1976 1 1 1976
477 1 1994 1 1 1994
489 1 1985 1 1 1985
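None of the answers changes the index label itself, which is what the title asks for. The rename call in the question likely had no effect because, with level=1, the mapping keys are compared against the individual level-1 labels (0 and 1), not against full tuples such as (456, 0). A minimal sketch of one alternative is to rebuild the MultiIndex from edited tuples. The frame below is a hand-built copy of the question's data, and the names new_tuples, relabeled and years_only are made up for illustration.

```python
import pandas as pd

# Hand-built copy of the question's frame (only the "year" column is kept).
index = pd.MultiIndex.from_tuples(
    [(456, 0), (461, 0), (461, 1), (463, 0), (463, 1), (465, 0),
     (465, 1), (477, 0), (477, 1), (489, 0), (489, 1)],
    names=[None, "match"],
)
df = pd.DataFrame(
    {"year": [2000, 16, 1991, 25, 2014, 19, 1976, 14, 1994, 35, 1985]},
    index=index,
)

# Replace the single tuple (456, 0) with (456, 1); leave every other tuple alone.
new_tuples = [(a, 1) if (a, b) == (456, 0) else (a, b) for a, b in df.index]
relabeled = df.copy()
relabeled.index = pd.MultiIndex.from_tuples(new_tuples, names=df.index.names)

# Now the drop from the question keeps every year row.
years_only = relabeled.drop(index=0, level=1)
print(years_only)
```

Unlike the expected output in the question, the first row is labeled (456, 1) here, because the label was actually changed rather than kept by a mask.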

condition after groupby: data science

I have a big df; this is an example to illustrate my issue. I want to know which ids, within each year_of_life, are in the top one percent in terms of jobs. I want to identify (I am thinking of using a dummy variable) the one percent of each year_of_life that has the most jobs in the distribution.
For example:
id year rap jobs_c jobs year_of_life rap_new
1 2009 0 300 10 NaN 0
2 2012 0 2012 12 0 0
3 2013 0 2012 12 1 1
4 2014 0 2012 13 2 1
5 2015 1 2012 15 3 1
6 2016 0 2012 17 4 0
7 2017 0 2012 19 5 0
8 2009 0 2009 15 0 1
9 2010 0 2009 2 1 1
10 2011 0 2009 3 2 1
11 2012 1 2009 3 3 0
12 2013 0 2009 15 4 0
13 2014 0 2009 12 5 0
14 2015 0 2009 13 6 0
15 2016 0 2009 13 7 0
16 2011 0 2009 3 2 1
17 2012 1 2009 3 3 0
18 2013 0 2009 18 4 0
19 2014 0 2009 12 5 0
20 2015 0 2009 13 6 0
.....
100 2009 0 2007 5 6 1
Then I want to sum the jobs of the ids that fall in the top one percent of their year_of_life.
I tried something like this:
df.groupby(['year_of_life']).filter(lambda x : x.jobs>
x.jobs.quantile(.99))['jobs'].sum()
but I get the following error:
TypeError: filter function returned a Series, but expected a scalar bool
Is this what you need?
df.loc[df.groupby(['year_of_life']).jobs.apply(lambda x : x>x.quantile(.99)).fillna(True),'jobs'].sum()
Out[193]: 102
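The error in the question arises because groupby(...).filter expects one True/False per group, while the lambda returns a row-wise Series. A possible alternative to apply is transform, which broadcasts each group's 99th percentile back to the rows, so the comparison stays aligned with the original index. The sketch below uses a small made-up frame (not the question's data) with no missing year_of_life values, and the column name top1 is hypothetical.

```python
import pandas as pd

# Small made-up frame; the real data would have many more rows per group.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "year_of_life": [0, 0, 0, 1, 1, 1],
    "jobs": [12, 15, 3, 12, 2, 20],
})

# Per-row threshold: the 0.99 quantile of jobs within the row's year_of_life.
threshold = df.groupby("year_of_life")["jobs"].transform(lambda s: s.quantile(0.99))

# Dummy flag for rows above their group's threshold (hypothetical column name).
df["top1"] = df["jobs"] > threshold

# Sum the jobs of the flagged rows.
total = df.loc[df["top1"], "jobs"].sum()
print(df)
print(total)
```

Rows with a missing year_of_life are not covered by this sketch; the answer above handles them with fillna(True).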

Index Match with multiple criteria

I have a list of products ranked by percentile. I want to be able to retrieve the first value less than a specific percentile.
Product Orders Percentile Current Value Should Be
Apples 192 100.00% 29 29
Apples 185 97.62% 29 29
Apples 125 95.24% 29 29
Apples 122 92.86% 29 29
Apples 120 90.48% 29 29
Apples 90 88.10% 29 29
Apples 30 85.71% 29 29
Apples 29 83.33% 29 29
Apples 27 80.95% 29 29
Apples 25 78.57% 29 29
Apples 25 78.57% 29 29
Apples 25 78.57% 29 29
Oranges 2 100.00% 0 1
Oranges 2 100.00% 0 1
Oranges 1 60.00% 0 1
Oranges 1 60.00% 0 1
Lemons 11 100.00% 0 2
Lemons 10 88.89% 0 2
Lemons 2 77.78% 0 2
Lemons 2 77.78% 0 2
Lemons 1 55.56% 0 2
Currently my formula in the "Current Value" column is: =SUMIFS([Orders],[Product],[#[Product]],[Percentile],INDEX([Percentile],MATCH(FALSE,[Percentile]>$O$1,0))) (entered as an array formula)
$O$1 contains the percentile that I am matching (85.00%).
The current value for "Apples" (29) is correct, but as you can see my formula is not producing the correct value for the remaining products as in "Should Be" but is returning "0". Not sure how to set this up to get it to do what I need it to. I tried several things with SumProduct but couldn't get that to work either. I need someone with more experience to give me a hand on this.
You don't need the SUMIFS(), just the INDEX/MATCH:
=INDEX([Orders],MATCH(1,([Percentile]<$O$1)*([Product]=[#Product]),0))
This is an array formula and must be confirmed with Ctrl-Shift-Enter on exiting edit mode. If done properly then Excel will put {} around the formula.
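As a cross-check of the logic (not a replacement for the Excel formula), the same lookup can be sketched in pandas: for each product, take the first Orders value whose Percentile is below the threshold. The data is copied from the question with percentiles written as fractions; the names threshold, first_below and Value are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Apples"] * 12 + ["Oranges"] * 4 + ["Lemons"] * 5,
    "Orders": [192, 185, 125, 122, 120, 90, 30, 29, 27, 25, 25, 25,
               2, 2, 1, 1,
               11, 10, 2, 2, 1],
    "Percentile": [1.0, 0.9762, 0.9524, 0.9286, 0.9048, 0.8810, 0.8571,
                   0.8333, 0.8095, 0.7857, 0.7857, 0.7857,
                   1.0, 1.0, 0.60, 0.60,
                   1.0, 0.8889, 0.7778, 0.7778, 0.5556],
})
threshold = 0.85  # plays the role of $O$1

# First row per product (in table order) whose percentile is below the threshold.
first_below = (
    df[df["Percentile"] < threshold]
    .groupby("Product", sort=False)["Orders"]
    .first()
)

# Broadcast the result back to every row, like the "Should Be" column.
df["Value"] = df["Product"].map(first_below)
print(first_below)
```

This reproduces the "Should Be" column from the question: 29 for Apples, 1 for Oranges and 2 for Lemons.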

Excel formulas for range criteria date, arranged in columns

I want to write a formula for a large data table. The criteria I have to select on are in both the rows and the columns.
I attach the file with the manually calculated values.
|PRODUCT|01-feb|02-feb|03-feb|04-feb|05-feb|06-feb|07-feb|08-feb|09-feb|10-feb|11-feb|12-feb|
|PRODUCT 1|4|3|1|5|2|9|1|3|5|8|0|5|
|PRODUCT 3|2|5|7|4|4|8|3|5|7|4|4|8|
|PRODUCT 1|1|0|5|3|1|1|8|0|5|3|1|1|
|PRODUCT 2|5|4|6|6|0|7|4|4|6|6|0|7|
|PRODUCT 5|8|7|8|7|1|9|2|7|8|7|1|9|
|PRODUCT 4|4|2|9|3|5|1|7|2|9|3|5|1|
|PRODUCT 1|9|8|1|4|4|6|5|8|1|4|4|6|
|PRODUCT 2|6|4|4|7|2|8|6|4|4|7|2|8|
|PRODUCT 5|2|6|1|8|3|9|3|6|1|8|3|9|
|PRODUCT 3|3|9|5|1|7|4|7|9|5|1|7|4|
|PRODUCT 4|7|6|5|5|8|2|1|6|5|5|8|2|
The compact chart that I have to get:
|PRODUCT|04-feb|08-feb|12-feb|
|PRODUCT 1|44|48|43|
|PRODUCT 2|42|35|40|
|PRODUCT 3|36|47|40|
|PRODUCT 4|41|32|38|
|PRODUCT 5|47|40|46|
The formula that I think should work (SUMAR.SI.CONJUNTO is SUMIFS in Spanish-language Excel):
=SUMAR.SI.CONJUNTO(C5:N15,B5:B15,H20,C4:N4,"=<"&J19)
because I want to show the range of dates from 01-feb to 04-feb from the first table in the new column 04-feb.
Please help me.
The following might help you. The formula in the upper left cell of the table of the summary is
{=SUM((($B$1:$M$1<=B$14)*($B$1:$M$1>=A$14)*$B$2:$M$13)*($A15=$A$2:$A$13))}
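and can be copied over to the other cells. The 31.01 in the summary table is used as a "helper cell", so that you don't have to alter the formula for the different cells. Note that both date comparisons are inclusive, so a boundary date such as 04. Feb is counted in two adjacent columns; change >= to > if the ranges must not overlap.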
Product 01. Feb 02. Feb 03. Feb 04. Feb 05. Feb 06. Feb 07. Feb 08. Feb 09. Feb 10. Feb 11. Feb 12. Feb
Product1 5 2 3 3 5 5 3 3 5 3 3 5
Product3 5 4 2 4 5 1 5 3 3 5 3 3
Product4 3 1 2 2 4 5 5 1 5 5 1 5
Product1 4 1 4 3 4 1 4 1 3 4 1 3
Product3 1 2 2 4 5 2 5 1 1 5 1 1
Product4 3 2 4 1 1 4 3 5 2 3 5 2
Product1 4 3 5 1 1 1 2 2 2 2 2 2
Product3 3 2 4 3 5 1 1 1 4 1 1 4
Product4 2 1 4 2 2 1 4 4 3 4 4 3
Product1 4 5 5 2 3 4 3 4 5 3 4 5
Product3 4 2 3 1 4 1 1 3 1 1 3 1
Product4 3 5 3 3 1 4 1 1 3 1 1 3
31. Jan 04. Feb 08. Feb 12. Feb
Product1 54 55 62
Product2 0 0 0
Product3 46 56 46
Product4 41 54 61
Product5 0 0 0
You can use SUMPRODUCT for this. B2:E12 is the range of data for Feb 1 through Feb 4, and O2 holds the criterion you are searching for; in my case O2 was equal to Product 1. When you want the range for Feb 8, just change B2:E12 to the range of data corresponding to Feb 5 to Feb 8.
=SUMPRODUCT(B2:E12*(A2:A12=O2))
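To verify the expected summary independently of Excel, the same grouping can be sketched in pandas. It assumes the question's data, with days 1 to 12 of February as columns and non-overlapping four-day blocks ending on the 4th, 8th and 12th; the names block_end and summary are made up for illustration.

```python
import pandas as pd

# Data from the question: one row per table line, 12 daily values each.
rows = [
    ("PRODUCT 1", [4, 3, 1, 5, 2, 9, 1, 3, 5, 8, 0, 5]),
    ("PRODUCT 3", [2, 5, 7, 4, 4, 8, 3, 5, 7, 4, 4, 8]),
    ("PRODUCT 1", [1, 0, 5, 3, 1, 1, 8, 0, 5, 3, 1, 1]),
    ("PRODUCT 2", [5, 4, 6, 6, 0, 7, 4, 4, 6, 6, 0, 7]),
    ("PRODUCT 5", [8, 7, 8, 7, 1, 9, 2, 7, 8, 7, 1, 9]),
    ("PRODUCT 4", [4, 2, 9, 3, 5, 1, 7, 2, 9, 3, 5, 1]),
    ("PRODUCT 1", [9, 8, 1, 4, 4, 6, 5, 8, 1, 4, 4, 6]),
    ("PRODUCT 2", [6, 4, 4, 7, 2, 8, 6, 4, 4, 7, 2, 8]),
    ("PRODUCT 5", [2, 6, 1, 8, 3, 9, 3, 6, 1, 8, 3, 9]),
    ("PRODUCT 3", [3, 9, 5, 1, 7, 4, 7, 9, 5, 1, 7, 4]),
    ("PRODUCT 4", [7, 6, 5, 5, 8, 2, 1, 6, 5, 5, 8, 2]),
]
wide = pd.DataFrame([values for _, values in rows], columns=list(range(1, 13)))
wide.insert(0, "PRODUCT", [name for name, _ in rows])

# Long format: one row per (product line, day).
long = wide.melt(id_vars="PRODUCT", var_name="day", value_name="qty")

# Map each day to the last day of its four-day block: 1-4 -> 4, 5-8 -> 8, 9-12 -> 12.
long["block_end"] = ((long["day"].astype(int) - 1) // 4 + 1) * 4

# Sum per product and block, like the compact table in the question.
summary = long.pivot_table(index="PRODUCT", columns="block_end", values="qty", aggfunc="sum")
print(summary)
```

The result matches the compact table in the question, for example 44, 48 and 43 for PRODUCT 1.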

Automatically fill data from another sheet

Main Question
I would like to auto-fill Sheet A with values from Sheet B in Excel 2013. My data are in two sheets in the same workbook.
Example
=========== Sheet 1 ===========
location year val1 val2
USA.VT 1999
USA.VT 2000
USA.VT 2001
USA.VT 2002
USA.NH 1999
USA.NH 2000
USA.NH 2001
USA.NH 2002
USA.ME 1999
USA.ME 2000
USA.ME 2001
USA.ME 2002
=========== Sheet 2 ===========
location year val1 val2
USA.VT 1999 6 3
USA.VT 2000 3 2
USA.VT 2001 4 1
USA.VT 2002 9 5
USA.NH 1999 3 6
USA.NH 2002 12 56
USA.ME 1999 3 16
USA.ME 2002 4 5
I would like to use some function or formula to automatically populate Sheet 1 based on the values in Sheet 2 according to: location, year, and the column (val1 or val2). Non-matches would be zero-filled.
This would result in the following:
=========== Sheet 1 ===========
location year val1 val2
USA.VT 1999 6 3
USA.VT 2000 3 2
USA.VT 2001 4 1
USA.VT 2002 9 5
USA.NH 1999 3 6
USA.NH 2000 0 0
USA.NH 2001 0 0
USA.NH 2002 12 56
USA.ME 1999 3 16
USA.ME 2000 0 0
USA.ME 2001 0 0
USA.ME 2002 4 5
I have tried VLOOKUP, INDEX, and MATCH, but I'm having no luck.
Any help would be greatly appreciated!
Put this Array formula in C2:
=IFERROR(INDEX(Sheet2!C$2:C$9,MATCH($A2&$B2,Sheet2!$A$2:$A$9&Sheet2!$B$2:$B$9,0)),0)
Because it is an array formula, you must confirm it with Ctrl-Shift-Enter instead of Enter when exiting edit mode.
Then copy over one column and down.
The picture is not exact because I left it on one sheet.
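The formula matches on the concatenated location and year and falls back to 0 when there is no match. The same idea can be sketched in pandas as a left join on both keys, which may help for checking the expected result. The two frames are copied from the question's sheets; the variable names are made up for illustration.

```python
import pandas as pd

# Sheet 1: the keys to fill in.
sheet1 = pd.DataFrame({
    "location": ["USA.VT"] * 4 + ["USA.NH"] * 4 + ["USA.ME"] * 4,
    "year": [1999, 2000, 2001, 2002] * 3,
})

# Sheet 2: the available values.
sheet2 = pd.DataFrame({
    "location": ["USA.VT"] * 4 + ["USA.NH"] * 2 + ["USA.ME"] * 2,
    "year": [1999, 2000, 2001, 2002, 1999, 2002, 1999, 2002],
    "val1": [6, 3, 4, 9, 3, 12, 3, 4],
    "val2": [3, 2, 1, 5, 6, 56, 16, 5],
})

# Left join keeps every Sheet 1 row in order; non-matches become NaN.
filled = sheet1.merge(sheet2, on=["location", "year"], how="left")

# Zero-fill the non-matches and restore integer columns.
filled = filled.fillna({"val1": 0, "val2": 0}).astype({"val1": int, "val2": int})
print(filled)
```

Matching on two separate key columns avoids the accidental collisions that plain string concatenation can produce when keys have varying lengths.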
