Intermediate steps in the evaluation of a FREQUENCY formula - Excel

This refers to the SO question "Counting unique list of items from range based on criteria from other ranges".
The formula suggested by Scott Craner is:
=SUM(--(FREQUENCY(IF(B2:B7<=25,IF(C2:C7<=35,COUNTIF(A2:A7,"<"&A2:A7),""),""),COUNTIF(A2:A7,"<"&A2:A7))>0))
I clearly understand the logic and evaluation of the formula, except for the step shown in the attached screenshots.
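For reference, the overall logic of the formula can be sketched in Python. This is an illustration only: the function name count_unique_matching and the sample data are hypothetical, chosen so that COUNTIF(A2:A7,"<"&A2:A7) yields {1;1;4;0;5;3} and only the first four rows meet the criteria, reproducing the intermediate arrays in the question. One assumed simplification: Python compares text case-sensitively, while Excel's COUNTIF does not.

```python
def count_unique_matching(a, b, c, b_max=25, c_max=35):
    """Count distinct items of `a` whose rows satisfy b <= b_max and c <= c_max,
    following the steps of the SUM(--(FREQUENCY(...)>0)) formula."""
    # COUNTIF(A2:A7,"<"&A2:A7): for each item, how many items are smaller.
    # Equal items get the same number, so it acts as an ID per distinct item.
    ranks = [sum(1 for y in a if y < x) for x in a]
    # Nested IFs: keep the ID only for rows meeting both criteria
    # (the formula puts "" in the other rows, which FREQUENCY ignores).
    kept = [r for r, bv, cv in zip(ranks, b, c) if bv <= b_max and cv <= c_max]
    # FREQUENCY(kept, ranks) > 0 is TRUE once per distinct kept ID,
    # so SUM(--(...)) equals the number of distinct kept IDs.
    return len(set(kept))

# Hypothetical data for A2:A7, B2:B7 and C2:C7.
a = ["b", "b", "d", "a", "e", "c"]
b = [10, 20, 25, 5, 30, 40]
c = [30, 35, 10, 20, 50, 10]
print(count_unique_matching(a, b, c))  # 3
```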
According to the Microsoft Office documentation:
FREQUENCY(data_array, bins_array)
The FREQUENCY function syntax has the following arguments:
Data_array    Required. An array of or reference to a set of values for which you want to count frequencies. If data_array contains no values, FREQUENCY returns an array of zeros.
Bins_array    Required. An array of or reference to intervals into which you want to group the values in data_array. If bins_array contains no values, FREQUENCY returns the number of elements in data_array.
It is clear to me how {1;1;4;0;"";""} ends up as data_array and how {1;1;4;0;5;3} ends up as bins_array. But how it evaluates to {2;0;1;1;0;0;0} is not clear to me.
I would appreciate it if someone could explain this clearly.

So you want to know how
FREQUENCY({1;1;4;0;"";""},{1;1;4;0;5;3}) evaluates to {2;0;1;1;0;0;0}?
The catch is that bins_array does not need to be sorted for FREQUENCY to work. Internally, however, FREQUENCY must sort bins_array to get the intervals into which it groups the values in data_array. It then groups and counts the values, and returns the counts in the same order in which the bins were given in bins_array, followed by one extra count for values above the highest bin. The text values ("") in data_array are ignored.
Scores (data_array)   Bins (bins_array)
1                     1
1                     1
4                     4
0                     0
""                    5
""                    3

Bins sorted   Interval
0             <=0
1             >0, <=1
1             >1, <=1   (not possible)
3             >1, <=3
4             >3, <=4
5             >4, <=5
(extra)       >5

Bin       Description                                    Result
1         Number of scores >0, <=1                       2
1         Number of scores >1, <=1 (not possible)        0
4         Number of scores >3, <=4                       1
0         Number of scores <=0                           1
5         Number of scores >4, <=5                       0
3         Number of scores >1, <=3                       0
(extra)   Number of scores >5                            0
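The sort, group and un-sort behaviour described above can be modelled in a few lines of Python. This is a sketch of the behaviour shown in the table, not Microsoft's implementation; excel_frequency is a hypothetical name, and the rule that a duplicated bin value collects its counts in its first occurrence is inferred from the result {2;0;1;1;0;0;0}.

```python
from bisect import bisect_left

def excel_frequency(data, bins):
    """Model Excel's FREQUENCY(data_array, bins_array) on Python lists."""
    # FREQUENCY ignores text, so the "" entries are dropped.
    values = [v for v in data if isinstance(v, (int, float))]
    # Sort bin positions by bin value; sorted() is stable, so equal bins
    # keep their original order and the first one comes first.
    order = sorted(range(len(bins)), key=lambda i: bins[i])
    sorted_bins = [bins[i] for i in order]
    # One count per bin, in the original bin order, plus one extra slot
    # for values greater than the highest bin.
    counts = [0] * (len(bins) + 1)
    for v in values:
        pos = bisect_left(sorted_bins, v)  # index of the smallest bin >= v
        if pos == len(sorted_bins):
            counts[-1] += 1                # above every bin
        else:
            counts[order[pos]] += 1        # credit that bin in its original slot
    return counts

print(excel_frequency([1, 1, 4, 0, "", ""], [1, 1, 4, 0, 5, 3]))
# [2, 0, 1, 1, 0, 0, 0]
```

Using bisect_left on the sorted bins finds, for each value v, the first interval whose upper bound is at least v, which is exactly the "(previous bin, this bin]" rule from the table.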

Related

Excel - assign values based on the first unique item

I have an Excel question that I cannot answer. Here is my table:
ID Key   Count Unique   Available   Text     Results
1        0                          Text-1   Dupe-Y
2        1              Y           Text-1   Y
3        0                          Text-1   Dupe-Y
4        0                          Text-1   Dupe-Y
5        1              N           Text-2   N
6        1              Y           Text-3   Y
7        0                          Text-2   Dupe-N
8        0              Duplicate   Text-2   Dupe-N
9        0              Duplicate   Text-2   Dupe-N
10       0              Y           Text-2   Dupe-N
ID Key is just a unique key.
Count Unique marks the first time each value in the Text column appears. Available can be Y, N or Duplicate, and Text is the main column I need to analyze. For the first occurrence of each value in Text (Count Unique = 1), if there is a value in Available, that is the value I need; if Count Unique is 0, the result is either Dupe-Y or Dupe-N, depending on the value in Available.
I tried a formula like this one but got stuck after some initial progress: =IF(B2=0,"",IFERROR(IF(COUNTIF(D:D,D2)>1,IF(COUNTIF($D:$D,D2)=1,"",C2),1),1))
Note that Results is the column I need to populate, with a formula that is not affected by sorting (or the lack of it).
I assume you already have all those values and just need a formula for the Results column.
My formula works only if the data is sorted as in your example. If the sort order changes, the formula will fail.
My formula is:
=IF(B2=1;D2;"Dupe-"&RIGHT(G1;1))
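The formula above depends on the previous row (G1), which is why it breaks when the order changes. A sort-independent rule instead looks up the first occurrence of the same Text. The Python sketch below illustrates that rule; it assumes, as the example table shows (row 10 has Available Y but Results Dupe-N), that Dupe-Y or Dupe-N follows the Available value of that Text's first occurrence, and that every Text has exactly one row with Count Unique = 1. The name fill_results is hypothetical.

```python
def fill_results(rows):
    """rows: (count_unique, available, text) tuples, in any order.
    Returns the Results column for those rows, in the same order."""
    # Pass 1: remember the Available value of each Text's first occurrence
    # (the row whose Count Unique is 1).
    first_available = {}
    for count_unique, available, text in rows:
        if count_unique == 1:
            first_available[text] = available
    # Pass 2: first occurrences keep their own value; duplicates get
    # "Dupe-" plus the first occurrence's value, wherever that row sits.
    results = []
    for count_unique, available, text in rows:
        if count_unique == 1:
            results.append(available)
        else:
            results.append("Dupe-" + first_available[text])
    return results

# (Count Unique, Available, Text) for IDs 1-10 of the example table.
rows = [
    (0, "", "Text-1"), (1, "Y", "Text-1"), (0, "", "Text-1"), (0, "", "Text-1"),
    (1, "N", "Text-2"), (1, "Y", "Text-3"), (0, "", "Text-2"),
    (0, "Duplicate", "Text-2"), (0, "Duplicate", "Text-2"), (0, "Y", "Text-2"),
]
print(fill_results(rows))  # matches the Results column of the table
```

Because each result depends only on the row's own Text, reversing or shuffling the rows shuffles the results the same way.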

I want to improve the speed of my algorithm with multi-row input. Python: find the average of consecutive elements in a list

I need to find the average of consecutive elements of a list.
First I am given the length of the list,
then the list of numbers,
then how many tests I need to perform (several rows of input),
then the inputs for those tests (and I need to print one result row per test).
Each test row consists of the start and end index in the list.
My algorithm:
nu = int(input())  # first, the length of the list
numbers = input().split()  # then the list of numbers
num = input()  # number of rows with inputs
k = [float(i) for i in numbers]  # the numbers in the list are floats
i = 0
while i < int(num):
    a, b = input().split()  # start and end index in the list
    i += 1
    print(round(sum(k[int(a):(int(b)+1)])/(-int(a)+int(b)+1), 6))  # round to 6 decimals
But it's not fast enough. I was told it's better to get rid of the "while" loop, but I don't know how. I'd appreciate any help.
Example:
Input:
8 - len(list)
79.02 36.68 79.83 76.00 95.48 48.84 49.95 91.91 - list
10 - number of test
0 0 - a1,b1
0 1
0 2
0 3
0 4
0 5
0 6
0 7
1 7
2 7
Output:
79.020000
57.850000
65.176667
67.882500
73.402000
69.308333
66.542857
69.713750
68.384286
73.668333
i = 0
while i < int(num):
    a, b = input().split()  # start and end element in list
    i += 1
Replace your while loop with a for loop. You can also get rid of the repeated int calls in the print statement by converting a and b once:
for _ in range(int(num)):
    a, b = [int(j) for j in input().split()]
    print(round(sum(k[a:b + 1]) / (b - a + 1), 6))
You didn't spell out the constraints, but I am guessing that the ranges to be averaged could be quite large. Computing sum(k[int(a):(int(b)+1)]) for every query may then take a while, since it is linear in the length of the range.
However, if you precompute the partial (prefix) sums of the input list, each query can be answered in constant time: the sum of the numbers in a range is the difference of the corresponding partial sums.
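A minimal sketch of the prefix-sum approach, which also reads the whole input at once instead of calling input() once per line. It assumes the input layout is exactly as in the question's example; the names range_averages and solve are illustrative. It prints with "%.6f", matching the expected output (79.020000), whereas round(x, 6) alone would print 79.02.

```python
from itertools import accumulate

def range_averages(values, queries):
    """Mean of values[a..b] (inclusive) for each (a, b) query:
    O(n) setup, then O(1) per query."""
    # prefix[i] == sum(values[:i]), with prefix[0] == 0.0
    prefix = [0.0, *accumulate(values)]
    return [(prefix[b + 1] - prefix[a]) / (b - a + 1) for a, b in queries]

def solve(text):
    """Parse n, the n numbers, q, then q "a b" pairs; return one
    mean per line, formatted with 6 decimals."""
    tokens = text.split()
    n = int(tokens[0])
    values = [float(t) for t in tokens[1:1 + n]]
    q = int(tokens[1 + n])
    nums = list(map(int, tokens[2 + n:2 + n + 2 * q]))
    queries = list(zip(nums[0::2], nums[1::2]))
    return "\n".join("%.6f" % m for m in range_averages(values, queries))

# In a submission, read everything in one go, for example:
# import sys; print(solve(sys.stdin.read()))
```

With the example input above, solve returns the ten lines of the expected output, from 79.020000 to 73.668333.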

I want to sum consecutive numbers (similar to counting consecutive numbers)

This article gives directions for determining the number of consecutive cells greater than 0. I want to do something similar, but instead of just counting how many consecutive columns there are, I want to sum their values.
For example, 0 0 0 1 2 0 0 5 would give 3 (1+2=3).
Edit: If, for example, there are several runs with the maximum number of consecutive values (e.g., 0 0 1 3 0 0 4 5), I would want the highest sum (9) to be chosen.
Edit2: The data is set up in rows (the X axis is the months I need to sum for consecutive values and the Y axis is the "user"), so the formula would match the examples I've given above.
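The two examples imply a rule: take the longest run of consecutive values greater than 0 and sum it, and if several runs share that maximum length, keep the largest sum (in the first example the lone 5 is larger than 3, but its run is shorter). That reading is an assumption. The Python sketch below implements it for one row; longest_run_sum is a hypothetical name, and for the per-user layout you would apply it to each row.

```python
def longest_run_sum(values):
    """Sum of the longest run of consecutive values > 0; ties in length
    are broken by the larger sum."""
    best = (0, 0)         # (length, sum) of the best run seen so far
    length, total = 0, 0  # the run currently being read
    for v in values:
        if v > 0:
            length += 1
            total += v
            # Tuples compare by length first, then by sum.
            best = max(best, (length, total))
        else:
            length, total = 0, 0
    return best[1]

print(longest_run_sum([0, 0, 0, 1, 2, 0, 0, 5]))  # 3
print(longest_run_sum([0, 0, 1, 3, 0, 0, 4, 5]))  # 9
```

For several users, something like [longest_run_sum(row) for row in rows] gives one result per row.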

Excel: Count until, then repeat?

I have a list of numbers which are either 1's or 2's. What I'd like to do is count how many 1's there are before a 2 appears, and then keep repeating this down the list (I'm trying to find the average number of 1's between 2's).
What would be the best way of doing this considering I've got over 10,000 rows? (i.e. too many to do manually)
The average number of 1's between 2's is the same as the ratio of the number of 1's to the number of 2's.
Example:
1
1
2
1
1
1
1
2
1
1
2
1
1
2
This contains 10 ones and 4 twos.
Or there are four groups of ones, with the following counts: 2, 4, 2, 2.
Either way, it will give you an average of 2.5 (10/4 = 2.5).
Note: You have to make a design choice, regarding how to handle beginnings and ends. If you had another one, after the last two, how should it be handled?
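Both views can be checked with a short Python sketch (ones_between_twos is a hypothetical name). It makes the design choice explicit: 1's after the last 2 are not counted as a group, which is an assumption.

```python
def ones_between_twos(seq):
    """Lengths of the runs of 1s that are closed by a 2."""
    groups, run = [], 0
    for x in seq:
        if x == 1:
            run += 1
        elif x == 2:
            groups.append(run)  # a 2 ends the current group (even an empty one)
            run = 0
    # Design choice (assumption): 1s after the last 2 are not a group.
    return groups

seq = [1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 2]
groups = ones_between_twos(seq)
print(groups)                       # [2, 4, 2, 2]
print(sum(groups) / len(groups))    # 2.5
print(seq.count(1) / seq.count(2))  # 2.5
```

The two averages only diverge at the ends: append one more 1 and the group average stays 2.5, while the plain ratio becomes 11/4 = 2.75.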
You can use the formula as shown in the screenshot below:
Note that the formula in the first row is different.
B                       C
=IF(A2=1,B1,B1+1)       =COUNTIF(B:B,B2)
=IF(A3=1,B2,B2+1)       =IFERROR(IF(A4=2,COUNTIF(B:B,B4),"")-1,"")
Then to get the average use:
=AVERAGEIF(C:C,"<>"&0)
Noceo's solution as a formula:
=COUNTIF(A:A,1)/COUNTIF(A:A,2)
The output of all the above:

How do you group data in columns?

I have numeric data for fifty samples that are mostly similar. I want to find identical columns and compute statistics on them. There are too many rows to select them (37,888). The data looks like:
Sample 1 Sample 2 Sample 3 ........ Sample 50
4 4 0
4 4 0
4 4 ...
0 0
0 0
0 0
0 0
... ...
Up to thousands of rows for each sample.
There is a column for date/time as well, would be nice if I could include that in the grouping.
In this snippet, there are many rows. Sample 1 and 2 are identical hence should be grouped together. Sample three would form another group and so on.
While I'm not sure what "There are too many rows to select them" means in this context (there is no limit on the number of rows or items that can be selected and included in a formula), this looks like a job for array formulas.
If you want to determine (for instance) whether columns C and D are equal, from rows 1 through 37888, you can use this formula:
=AND(C1:C37888=D1:D37888)
To make Excel treat this as an array formula, you need to press CTRL-SHIFT-ENTER (Windows) or CMD-ENTER (Mac) after typing the formula. The "AND" function will return TRUE if and only if all corresponding entries are equal: C1=D1, C2=D2, C3=D3, ..., C37888=D37888. It returns FALSE if any corresponding entries disagree.
Exactly what you do next will depend on the nature of the statistics that you want to compute for each group, but this formula will at least help you figure out which columns belong in the same group together.
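With fifty columns, pairwise AND checks mean up to 50*49/2 = 1,225 comparisons. A different technique, hashing each whole column once, finds all groups in a single pass. The Python sketch below shows that idea outside Excel (for example, after exporting the sheet). group_identical_columns and the sample values are hypothetical.

```python
from collections import defaultdict

def group_identical_columns(columns):
    """Group column names whose values match row for row.
    columns: dict mapping a column name to its list of values."""
    groups = defaultdict(list)
    for name, values in columns.items():
        # Identical columns give identical tuples, hence the same dict key.
        groups[tuple(values)].append(name)
    return list(groups.values())

# Hypothetical data shaped like the question's snippet.
samples = {
    "Sample 1": [4, 4, 4, 0, 0],
    "Sample 2": [4, 4, 4, 0, 0],
    "Sample 3": [0, 0, 1, 0, 0],
}
print(group_identical_columns(samples))  # [['Sample 1', 'Sample 2'], ['Sample 3']]
```

Each group can then be summarized by computing the statistics once per group, since all of its columns hold the same values.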
