I have this list:
def grades = [5,4,3,2,1,1]
where the index is a grade and the value is the number of occurrences of that grade:
Grade   Occurrence
0       5
1       4
2       3
3       2
4       1
5       1
How can I calculate the 90th percentile for the grades?
This gets me a whole grade. I was hoping to find an exact value of 90th percentile, but this will do for me.
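def grades = [5,4,3,2,1,1]
def sum = grades.sum()
// Number of grades that must lie at or below the 90th percentile
def per_value = sum * 0.9
def cumulative = 0
// Index of the first grade whose running total reaches the threshold
def per_grade = grades.findIndexOf { grade_count ->
    cumulative += grade_count
    cumulative >= per_value
}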
out.write(per_grade)
A total of 16 grades have been given. 90% are 14.4 grades, so discard the lowest 14 grades and take the smallest remaining (in your example it will be 4).
How to code? There are some ways:
You can count through the array you already have. Subtract 5 from 14 (= 9), then 4 (= 5), then 3 (= 2), then 2 (= 0). Once the remainder reaches zero, the next index with a non-zero count (here index 4) is the 90th percentile.
Perhaps easier to understand, though it takes a few more lines of code: put all 16 grades into an array (or list): [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5]. Since the array is sorted, the 90th percentile is found at index 14 (counting from 0).
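Both approaches can be sketched in Python as follows. This is only an illustration of the "discard the lowest 90% and take the smallest remaining grade" rule described above; the function names are made up for the example, and other percentile definitions (e.g. interpolating ones) would give different results.

```python
def percentile_from_counts(counts, fraction):
    """Walk the occurrence list: counts[g] is how often grade g occurs."""
    to_discard = int(sum(counts) * fraction)  # 16 * 0.9 = 14.4 -> 14
    for grade, count in enumerate(counts):
        if to_discard < count:
            # Not all occurrences of this grade are discarded,
            # so this grade is the smallest one remaining.
            return grade
        to_discard -= count
    return len(counts) - 1  # fraction == 1.0: fall back to the top grade


def percentile_from_expanded(counts, fraction):
    """Expand the counts into a sorted list and index into it."""
    expanded = [grade for grade, count in enumerate(counts) for _ in range(count)]
    index = min(int(len(expanded) * fraction), len(expanded) - 1)
    return expanded[index]


grades = [5, 4, 3, 2, 1, 1]
print(percentile_from_counts(grades, 0.9))    # 4
print(percentile_from_expanded(grades, 0.9))  # 4
```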
I'm trying to group data into batches of 1000 with the aid of a helper column, so that I don't have to keep typing out specific ranges to select them.
I came up with a formula for a helper column, but it is an imperfect solution:
=IFS(ROW()<=1001,1,ROW()<=2001,2,ROW()<=3001,3,ROW()<=4001,4,ROW()<=5001,5)
Is there not a better way of writing something to do the same job but that is infinitely scalable?
If you have a sequence 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, ..., as with ROW(), but need 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, ... (ten 1s, then ten 2s, then ten 3s and so on), then subtract 1, divide by 10, take the integer part and add 1.
=INT((ROW()-1)/10)+1
Entered in A1 and filled down, this gives ten 1s, then ten 2s, then ten 3s and so on.
To start at a different row, change the -1 accordingly:
=INT((ROW()-2)/10)+1
will start the counting at row 2.
Now change the 10 to 1000, since you need a thousand 1s, then a thousand 2s, then a thousand 3s and so on.
=INT((ROW()-2)/1000)+1
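For anyone who wants to check the arithmetic outside Excel, here is a minimal Python sketch of the same integer-division idea. The function name and its parameters are made up for the illustration; first_row plays the role of the 2 in the formula above.

```python
def batch_number(row, first_row=2, batch_size=1000):
    """Same arithmetic as =INT((ROW()-first_row)/batch_size)+1."""
    return (row - first_row) // batch_size + 1


print(batch_number(2))     # 1  (first data row)
print(batch_number(1001))  # 1  (last row of the first batch)
print(batch_number(1002))  # 2  (first row of the second batch)
```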
If you have Office 365, you could enter a formula like:
=INT((SEQUENCE(COUNTA($A:$A))-1)/1000)+1
and it will spill down.
The COUNTA(...) part merely limits how far down the helper column is populated. If you want to populate the entire column, replace COUNTA with ROWS and enter the formula in row 1.
But there may be simpler methods of solving your actual problem.
Fellow overflowers, all help is appreciated;
I have the following rows of values (always 7 values per row) of data in Excel (3 examples below), where data is coded as 1 or 2. I am interested in the 1's.
2, 2, 1, 2, 2, 1, 1.
1, 2, 2, 2, 2, 1, 2.
2, 2, 2, 1, 1, 1, 2.
I use =MATCH(1,A1:G1,0) to tell me WHEN the first 1 appears, BUT now I want to calculate the proportion that 1's make up of the remaining values in the row.
For example;
2, 2, 1, 2, 2, 1, 1. (1 first appears at point 3, but then 1's make up 2 out of 4 remaining points; 50%).
1, 2, 2, 2, 2, 1, 2. (1 first appears at point 1, but then 1's make up 1 out of the 6 remaining points; 16%).
2, 2, 2, 1, 1, 1, 2. (1 first appears at point 4, but then 1's make up 2 out of the 3 remaining points; 66%).
Please help me calculate this proportion!
You could use this one
=(LEN(SUBSTITUTE(SUBSTITUTE(MID(A1,SEARCH(1,A1)+3,1000)," ",""),",",""))
-LEN(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(MID(A1,SEARCH(1,A1)+3,1000)," ",""),",",""),1,""))
)/LEN(SUBSTITUTE(SUBSTITUTE(MID(A1,SEARCH(1,A1)+3,1000)," ",""),",",""))
The
SUBSTITUTE(SUBSTITUTE(MID(A1,SEARCH(1,A1)+3,1000)," ",""),",","")
part gets the string after the first 1. The single 1 in the middle part is the character you want to calculate the percentage for. So if you want to adapt the formula to other characters, you have to change the single 1 in the middle part and the three 1s in the three searches.
EDIT: thank you for the hint, @foxfire
A solution for values in columns would be
=COUNTIF(INDEX(A1:G1,1,MATCH(1,A1:G1,0)+1):G1,1)/(COUNT(A1:G1)-MATCH(1,A1:G1,0))
You can do it with SUMPRODUCT:
My formula in column H is a MATCH like yours:
=MATCH(1;A3:G3;0)
My formula for calculating the % of 1's among the remaining numbers after the first 1 is:
=SUMPRODUCT((A3:G3=1)*(COLUMN(A3:G3)>H3))/(7-H3)
This is how it works:
1. (A3:G3=1) returns an array of 1s and 0s showing whether each cell equals 1. For row 3 that is {0;0;1;0;0;1;1}.
2. COLUMN(A3:G3)>H3 returns an array of 1s and 0s showing whether each cell's column number is higher than the position of the first 1 (the column number matches the position inside the array because the range starts in column A). For row 3 that is {0;0;0;1;1;1;1}.
3. We multiply both arrays. For row 3 that is {0;0;1;0;0;1;1} * {0;0;0;1;1;1;1} = {0;0;0;0;0;1;1}.
4. SUMPRODUCT sums the array of 1s and 0s from the previous step. For row 3 we get 2, meaning there are 2 cells with value 1 after the first 1.
5. (7-H3) returns how many cells come after the first 1. For row 3 that is 4.
6. We divide the value from step 4 by the value from step 5, and that's the % you want. For row 3 that is 2/4 = 0,50, i.e. 50%.
Update: I used 2 columns just in case you need to show where the first 1 is. If you want a single column with the %, the formula would be:
=SUMPRODUCT((A3:G3=1)*(COLUMN(A3:G3)>MATCH(1;A3:G3;0)))/(7-MATCH(1;A3:G3;0))
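The same logic can be checked with a short Python sketch. The function name is made up for this example, and returning None when there is no 1 at all, or when the first 1 is in the last position, is my own choice (the Excel formulas above would produce an error in those cases).

```python
def proportion_after_first(values, target=1):
    """Share of `target` among the values that follow its first occurrence."""
    if target not in values:
        return None  # no target in the row at all
    first = values.index(target)   # 0-based position of the first target
    rest = values[first + 1:]      # everything after the first target
    if not rest:
        return None  # first target is the last value, nothing remains
    return rest.count(target) / len(rest)


print(proportion_after_first([2, 2, 1, 2, 2, 1, 1]))  # 0.5
```

For the other two example rows this gives 1/6 and 2/3 respectively.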
I have an RDD[Double]. I want to divide its range into k equal-width intervals and then count how many elements of the RDD fall into each interval.
For example, the RDD is like [0,1,2,3,4,5,6,6,7,7,10]. I want to divide it into 10 equal intervals, so the intervals are [0,1), [1,2), [2,3), [3,4), [4,5), [5,6), [6,7), [7,8), [8,9), [9,10].
As you can see, each element of the RDD falls into exactly one of the intervals. Then I want to count the elements in each interval. Here there is one element in each of [0,1), [1,2), [2,3), [3,4), [4,5), [5,6); both [6,7) and [7,8) have two elements; [8,9) has none; and [9,10] has one element.
Finally, I expect an array like array([1,1,1,1,1,1,2,2,0,1]).
Try this. I have assumed that the first element of each range is inclusive and the last exclusive. Please confirm this. For example, when considering the range from 0 to 1 and the element is 0, the condition is element >= 0 and element < 1.
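# array_range is assumed to be a list of (lower, upper) pairs, e.g. [(0, 1), (1, 2), ..., (9, 10)]
values = rdd.collect()  # collect once instead of once per interval
countElementsWithinRange = []
for lower, upper in array_range:
    counter = 0
    for element in values:
        # lower bound inclusive, upper bound exclusive
        if lower <= element < upper:
            counter += 1
    countElementsWithinRange.append(counter)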
print(rdd.collect())
# [0, 1, 2, 3, 4, 5, 6, 6, 7, 7, 10]
print(countElementsWithinRange)
# [1, 1, 1, 1, 1, 1, 2, 2, 0, 0]
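Note that the output above puts 10 in no interval, because every upper bound is treated as exclusive, whereas the question closes the last interval ([9,10]). Below is a minimal plain-Python sketch that treats the last interval as closed. It works on an already collected list rather than on the RDD itself, and the function name is made up for the example. If I remember correctly, PySpark's RDD.histogram(k) is documented to use the same convention (all buckets open on the right except the last), so that may be worth checking before hand-rolling this.

```python
def equal_width_counts(values, k):
    """Count values in k equal-width intervals between min and max.

    Every interval is [lower, upper) except the last, which is [lower, upper].
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: all values identical, put them in the first interval
        return [len(values)] + [0] * (k - 1)
    width = (hi - lo) / k
    counts = [0] * k
    for v in values:
        # The maximum would land in a non-existent interval k, so clamp it
        index = min(int((v - lo) / width), k - 1)
        counts[index] += 1
    return counts


print(equal_width_counts([0, 1, 2, 3, 4, 5, 6, 6, 7, 7, 10], 10))
# [1, 1, 1, 1, 1, 1, 2, 2, 0, 1]
```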
I have the following data set and want to sum the values in Column 3 for every row that contains "ABC" in any cell.
Column 1      Column 2      Column 3   Column 4   Column 5
ABC is good   CNN           $150       ABC        NBA
Better life   N-H           $40        LIT        MNM
Nice Job      ABC is good   $35        MN         ABC
Poor          H-I           $200       ITL        ABC
Best          TI            $120       SQL        ABC
Poor life     N-T           $40        LT         NM
Great         BE            $800       ABC        BEF
The sum it should return is $150+$35+$200+$120+$800 = $1,305, because those rows have the text "ABC" somewhere in their cells. I tried using a SUMIF(FIND) formula, but it gives me a #VALUE! error.
Any thoughts?
Short Answer
Use this array formula:
=SUMPRODUCT(IF(IF(LEN(SUBSTITUTE(A:A,"ABC",""))<LEN(A:A),1,0)+IF(LEN(SUBSTITUTE(B:B,"ABC",""))<LEN(B:B),1,0)+IF(LEN(SUBSTITUTE(D:D,"ABC",""))<LEN(D:D),1,0)+IF(LEN(SUBSTITUTE(E:E,"ABC",""))<LEN(E:E),1,0)>0,1,0),C:C)
Note: array formulas are entered with ctrl + shift + enter
Explanation
To test whether or not a cell contains ABC, we can use the SUBSTITUTE formula combined with LEN to compare the string lengths before and after removing ABC:
LEN(SUBSTITUTE(A:A,"ABC",""))<LEN(A:A)
We can then wrap that in an IF statement to get a nice array of 1's and 0's
IF(LEN(SUBSTITUTE(A:A,"ABC",""))<LEN(A:A),1,0)
If we mapped this out for your data (header row included) it would look like this:
IF(LEN(SUBSTITUTE(A:A,"ABC",""))<LEN(A:A),1,0) = {0, 1, 0, 0, 0, 0, 0, 0}
IF(LEN(SUBSTITUTE(B:B,"ABC",""))<LEN(B:B),1,0) = {0, 0, 0, 1, 0, 0, 0, 0}
IF(LEN(SUBSTITUTE(D:D,"ABC",""))<LEN(D:D),1,0) = {0, 1, 0, 0, 0, 0, 0, 1}
IF(LEN(SUBSTITUTE(E:E,"ABC",""))<LEN(E:E),1,0) = {0, 0, 0, 1, 1, 1, 0, 0}
Sum of the four arrays                         = {0, 2, 0, 2, 1, 1, 0, 1}
All we have to do then is check if the number in the array is >0 and multiply it by column C using SUMPRODUCT:
{0, 2, 0, 2, 1, 1, 0, 1 }
>0 {0, 1, 0, 1, 1, 1, 0, 1 }
*C:C {0, 150, 40, 35, 200, 120, 40, 800}
= {0, 150, 0, 35, 200, 120, 0, 800}
-----------------------------------------
SUM = 1305
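To double-check the walk-through outside Excel, here is a small Python sketch of the same flag-and-sum logic. The data is the table from the question with the header row left out and the amounts typed as plain numbers; the function and parameter names are made up for the example.

```python
rows = [
    ("ABC is good", "CNN", 150, "ABC", "NBA"),
    ("Better life", "N-H", 40, "LIT", "MNM"),
    ("Nice Job", "ABC is good", 35, "MN", "ABC"),
    ("Poor", "H-I", 200, "ITL", "ABC"),
    ("Best", "TI", 120, "SQL", "ABC"),
    ("Poor life", "N-T", 40, "LT", "NM"),
    ("Great", "BE", 800, "ABC", "BEF"),
]


def sum_rows_with_flag(rows, needle="ABC", amount_index=2):
    """Sum the amount column for rows where any text cell contains `needle`."""
    total = 0
    for row in rows:
        # One 1/0 flag per text cell, like the IF(LEN(SUBSTITUTE(...))) arrays
        flags = [1 if needle in cell else 0
                 for i, cell in enumerate(row) if i != amount_index]
        if sum(flags) > 0:
            total += row[amount_index]
    return total


print(sum_rows_with_flag(rows))  # 1305
```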
Since we are looking for ABC in any of the cells, we can use CONCATENATE and FIND to join all the cells of a row together and then look for ABC in the new string. This saves a ton of code and simplifies the logic. It also makes it easier to expand to more cells.
Ranges for reference
Formula in G1. This is an array formula (enter with CTRL+SHIFT+ENTER).
=SUM(IF(ISERR(FIND("ABC",CONCATENATE(A1:A7,B1:B7,D1:D7,E1:E7))), 0, C1:C7))
How it works
CONCATENATE forms a single large string with all the columns combined
FIND looks for ABC in that single string. It will return a number if found and an error (#VALUE) otherwise.
ISERR checks if the error was returned
IF decides if the value in column C should be returned or a 0 based on that error
SUM takes all of those numbers and adds them
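The steps above can be sketched in Python like this, again only as an illustration with made-up names and the question's table typed in by hand. One thing to keep in mind with this approach: joining cells without a separator can create a match across a cell boundary (for example "AB" next to "C"), which the per-cell test would not report.

```python
rows = [
    ("ABC is good", "CNN", 150, "ABC", "NBA"),
    ("Better life", "N-H", 40, "LIT", "MNM"),
    ("Nice Job", "ABC is good", 35, "MN", "ABC"),
    ("Poor", "H-I", 200, "ITL", "ABC"),
    ("Best", "TI", 120, "SQL", "ABC"),
    ("Poor life", "N-T", 40, "LT", "NM"),
    ("Great", "BE", 800, "ABC", "BEF"),
]


def sum_rows_joined(rows, needle="ABC", amount_index=2):
    """Join the text cells of each row, then sum amounts where needle is found."""
    total = 0
    for row in rows:
        # CONCATENATE: one string per row, amount column left out
        joined = "".join(cell for i, cell in enumerate(row) if i != amount_index)
        # FIND / ISERR / IF: keep the amount only if the needle is present
        total += row[amount_index] if needle in joined else 0
    return total


print(sum_rows_joined(rows))  # 1305
```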