Retrieve top 10 results from a data set by category - excel

The data set shown is only an example; the actual data will have upwards of 100 people. I need to retrieve the top 10 scores from each category in the picture below:

I think the answer is quite simple.
In each of the corresponding cells, use the LARGE formula.
In the cell for vertical 1 it would read =LARGE(vertical column, 1), returning the largest value in the set.
In the cell for vertical 2 it would read =LARGE(vertical column, 2), returning the 2nd largest value in the set.
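For readers who want to check the logic outside Excel, here is a minimal JavaScript sketch of the same idea. The `scores` object and its numbers are made-up sample data, and `large` and `topN` are hypothetical helper names; only the k-th largest behaviour mirrors LARGE.

```javascript
// Hypothetical sample data: one array of scores per category.
const scores = { vertical: [31, 28, 35, 30], broad: [110, 98, 120, 105] };

// Mirrors LARGE(range, k): the k-th largest value, with k starting at 1.
function large(values, k) {
  const sorted = [...values].sort((a, b) => b - a); // descending copy
  return sorted[k - 1];
}

// Top n values of one category, like filling LARGE(range, 1..n) down a column.
function topN(values, n) {
  const count = Math.min(n, values.length);
  return Array.from({ length: count }, (_, i) => large(values, i + 1));
}

console.log(topN(scores.vertical, 3));
```

For the real sheet, n would be 10 and each category column would be passed in turn.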

Related

Counting if part of string is within interval

I am currently trying to check if a number in a comma-separated string is within a number interval. What I am trying to do is to check if an area code (from the comma-separated string) is within the interval of an area.
The data:
AREAS
Area interval   Name     Number of locations
1000-1499       Area 1   ?
1500-1799       Area 2   ?
1800-1999       Area 3   ?
GEOLOCATIONS
Name         Areas List
Location A   1200, 1400
Location B   1020, 1720
Location C   1700, 1920
Location D   1940, 1950, 1730
The result I want here is the number of unique locations whose "Areas List" contains a code within the area interval. So Location D should count only ONCE in the 1800-1999 area, and likewise Location A in the 1000-1499 area. But Location B should count once in both 1000-1499 and 1500-1799 (because a number from each interval is in the comma-separated string in "Areas List"):
Area interval   Name     Number of locations
1000-1499       Area 1   2
1500-1799       Area 2   3
1800-1999       Area 3   2
How is this possible?
I have tried with COUNTIFS, but it doesn't seem to do the job.
Here is one option using FILTERXML():
Formula in C2:
=SUM(FILTERXML("<x><t>"&TEXTJOIN("</s></t><t>",,"1<s>"&SUBSTITUTE(B$7:B$10,", ","</s><s>"))&"</s></t></x>","//t[count(.//*[.>="&SUBSTITUTE(A2,"-","][.<=")&"])>0]"))
Where:
"<x><t>"&TEXTJOIN("</s></t><t>",,"1<s>"&SUBSTITUTE(B$7:B$10,", ","</s><s>"))&"</s></t></x>" - This is the part where we construct a valid piece of XML. The idea is to use three levels of nodes. Each t-node is given a literal 1 as its value so that, once the nodes are returned with XPath, we can sum the result. The outer x-node is there to make sure Excel handles the inner nodes correctly. If you are curious how this XML looks in the end, it's best to step through it using the 'Evaluate Formula' function on the Formulas tab;
"//t[count(.//*[.>="&SUBSTITUTE(A2,"-","][.<=")&"])>0]" - Basically means that we collect all t-nodes where the count of child s-nodes that are >= the leftmost number and <= the rightmost number is larger than zero. For A2 the XPath would look like //t[count(.//*[.>=1000][.<=1499])>0] after substitution. In short: //t selects t-nodes, .//* selects all their child nodes, and the t-node is kept when the count of child nodes that fulfil both requirements [.>=1000][.<=1499] is larger than zero;
Since all t-nodes equal the number 1, the SUM() of these t-nodes equals the number of unique locations that have at least one area code within the interval;
It is important to note that FILTERXML() returns an error if no t-nodes are found. That means we need to wrap the FILTERXML() in an IFERROR(...., 0) to counter that and keep the SUM() working correctly.
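The counting rule behind the formula (a location counts once per interval if any of its codes falls inside it) can be checked with a small JavaScript sketch. This is not a translation of the FILTERXML approach, just the same logic on the question's sample data; `countLocations` is a hypothetical helper name.

```javascript
// Sample data copied from the GEOLOCATIONS table in the question.
const locations = {
  "Location A": "1200, 1400",
  "Location B": "1020, 1720",
  "Location C": "1700, 1920",
  "Location D": "1940, 1950, 1730",
};

// Count locations that have at least one area code inside "low-high".
function countLocations(interval, areasByLocation) {
  const [low, high] = interval.split("-").map(Number);
  return Object.values(areasByLocation).filter((list) =>
    list
      .split(",")
      .map((code) => Number(code.trim()))
      .some((code) => code >= low && code <= high) // one hit is enough
  ).length;
}

const counts = ["1000-1499", "1500-1799", "1800-1999"].map((interval) =>
  countLocations(interval, locations)
);
console.log(counts);
```

Using `some` rather than counting matches is what keeps Location D from being counted twice in 1800-1999.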
Or, wrap the above in BYROW():
Formula in C2:
=BYROW(A2:A4,LAMBDA(a,SUM(FILTERXML("<x><t>"&TEXTJOIN("</s></t><t>",,"1<s>"&SUBSTITUTE(B$7:B$10,", ","</s><s>"))&"</s></t></x>","//t[count(.//*[.>="&SUBSTITUTE(a,"-","][.<=")&"])>0]"))))
Using MMULT and TEXTSPLIT:
=LET(rng,TEXTSPLIT(D2,"-"),
tarr,IFERROR(--TRIM(TEXTSPLIT(TEXTJOIN(";",,$B$2:$B$5),",",";")),0),
SUM(--(MMULT((tarr>=--TAKE(rng,,1))*(tarr<=--TAKE(rng,,-1)),SEQUENCE(COLUMNS(tarr),,1,0))>0)))
I am in very distinguished company but will add my version anyway, as BYROW is probably a slightly different approach:
=LET(range,B$2:B$5,
lowerLimit,--@TEXTSPLIT(E2,"-"),
upperLimit,--INDEX(TEXTSPLIT(E2,"-"),2),
counts,BYROW(range,LAMBDA(r,SUM((--TEXTSPLIT(r,",")>=lowerLimit)*(--TEXTSPLIT(r,",")<=upperLimit)))),
SUM(--(counts>0))
)
Here the ugly way to do it, with A LOT of helper columns. But not so complicated 🙂
F4= =TRANSPOSE(FILTERXML("<m><r>"&SUBSTITUTE(B4;",";"</r><r>")&"</r></m>";"//r"))
F11= =TRANSPOSE(FILTERXML("<m><r>"&SUBSTITUTE(A11;"-";"</r><r>")&"</r></m>";"//r"))
F16= =SUM(F18:F21)
F18= =IF(SUM(($F4:$O4>=$F$11)*($F4:$O4<=$G$11))>0;1;"")
G18= =IF(SUM(($F4:$O4>=$F$12)*($F4:$O4<=$G$12))>0;1;"")
H18= =IF(SUM(($F4:$O4>=$F$13)*($F4:$O4<=$G$13))>0;1;"")

Determine Size Of Array of Equal Items Excel

I have an array that looks like this
11100100110
essentially, an array of fixed size with each item being a 1 or 0 with the last item always equal to 0.
Consider each set of consecutive 1's to be a "bucket". I'd like a formula to determine the size of each bucket. So the output of this formula for the above sequence should be
312
as an array. Ideally this works in both excel and google sheets.
If you are interested this is the result of a list of stars and bars configurations where the 0's in my sequence represent bars and the 1's represent stars (the final value is a dummy 0 to make things easier to work with). I want the size of each non-empty bucket in a given configuration of stars and bars.
Thanks, in advance.
You could also use the standard method with FREQUENCY, which will work with Excel 365 and Google Sheets:
=FILTER(FREQUENCY(IF(A1:A11=1,ROW(A1:A11)),IF(A1:A11=0,ROW(A1:A11))),FREQUENCY(IF(A1:A11=1,ROW(A1:A11)),IF(A1:A11=0,ROW(A1:A11))))
try:
=INDEX((JOIN(, LEN(SPLIT(A1, 0)))))
update:
=INDEX(IFERROR(1/(1/SUBSTITUTE(FLATTEN(QUERY(TRANSPOSE(IFERROR(1/(1/
LEN(SPLIT(SUBSTITUTE(FLATTEN(QUERY(
TRANSPOSE(A1:K),, 9^9)), " ", ), 0))))),, 9^9)), " ", ))))
Assuming A2:A9 contains the data,
=ARRAYFORMULA(QUERY(FREQUENCY(IF(A2:A9,ROW(A2:A9)),IF(NOT(A2:A9),ROW(A2:A9))),"where Col1>0",))
FREQUENCY(data,classes) to get the frequency of data in classes
Make sequence of row numbers as data, if 1
Make sequence of row numbers as classes, if not 1
QUERY to get rid of zeros
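What the formulas compute is the length of each run of consecutive 1s. A minimal JavaScript sketch of that run-length idea, using the sequence from the question (`bucketSizes` is a hypothetical helper name, not part of any of the answers):

```javascript
// Return the size of every non-empty run of consecutive 1s.
function bucketSizes(bits) {
  const sizes = [];
  let run = 0;
  for (const bit of bits) {
    if (bit === 1) {
      run += 1; // still inside a bucket
    } else {
      if (run > 0) sizes.push(run); // a 0 closes the current bucket
      run = 0;
    }
  }
  if (run > 0) sizes.push(run); // only needed if the trailing dummy 0 is missing
  return sizes;
}

const sequence = [..."11100100110"].map(Number);
console.log(bucketSizes(sequence));
```

The zeros play the role of the FREQUENCY class boundaries, and skipping empty runs corresponds to filtering out the zero counts.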

Scatter plot for variable number of rows and specific columns

I want to create an automated scatter plot. This is the first example table: based on the step size, I end up measuring A, B, C, D for a specific frequency. In the scatter plot I created manually, you can see that I want to plot C vs. A for a particular frequency.
But I need to do this automatically, because the number of rows can change with the step size. Here, since the step size decreased, the number of samples increased, and now the scatter plot needs to update the number of A and C values it plots.
Is there a formula I can use without using any macros?
The relation between the step size and frequency is (number of samples of a single frequency = (360/step size)) so for a step size of 60 you will have in reality six entries of frequency 100 and six of 200 .
You can use formulas to define chart ranges if you hide the formulas in named ranges. Combine that with the fact that #N/A values are not plotted and you can get this to work without VBA.
For your example graph you could define two named ranges as follows:
Name: A_100
Refers To: =IF(Sheet1!$E$3:$E$100=100,OFFSET(Sheet1!$A$3,0,0,360/Sheet1!$B$1,1),NA())
and
Name: C_100
Refers To: =IF(Sheet1!$E$3:$E$100=100,OFFSET(Sheet1!$C$3,0,0,360/Sheet1!$B$1,1),NA())
Then set the X and Y axis of the chart to SheetName!A_100 and SheetName!C_100
The IF statement filters out all the points not at frequency 100. If you have a formula for selecting the frequency, replace "Sheet1!$E$3:$E$100=100" with that.
The offset function takes the first cell in the column and expands the number of rows according to your 360/step size formula.

Excel dynamic data series. Unusual data look and chart

Since I solved previous problem with collecting data from database, I need to put that data on a chart now. I am working on a report generating software called ReportWorx.
Problem is, data comes in series and looks like this:
ID DATE SAMPLE
1 XX-XX-XX VALUE
1 XX-XX-XX VALUE
1 XX-XX-XX VALUE
2 XX-XX-XX VALUE
2 XX-XX-XX VALUE
3 XX-XX-XX VALUE
3 XX-XX-XX VALUE
I cannot change how it looks because it is generated automatically. What I want is a line chart in which 1, 2, 3 are the series names, with DATE and VALUE plotted for each of them (or a bar graph, whatever), with Date on the X axis and Value on the Y axis.
I can't specify how many records (rows) there will be, but I found a few solutions for creating dynamically growing charts, so that will probably not be a problem. I just do not know how to separate those ID series from each other.
EDIT:
I have found a solution in VBA according to the first answer. Here you have VBA code below:
Sub Rewrite()
    Dim row, id
    For row = 38 To 1000
        For id = 1 To 37
            If Sheet1.Cells(row, 1).Value = id Then
                Sheet2.Cells(row, 1).Value = Sheet1.Cells(row, 2)
                Sheet2.Cells(row, id + 1).Value = Sheet1.Cells(row, 3)
            End If
        Next id
    Next row
End Sub
Thank you @sancho.s
I will post a solution that I use a lot for cases like yours.
With reference to the figure (where I used sample numbers), you set up 3 new columns (D:F here), the header of which contain the corresponding labels. Then you use a formula for "splitting" the list of X data (column B here) associated with each label, and assigning a "NULL" value for data not corresponding (#N/A here, but you can choose whatever you want):
=IF($A3=D$2,$B3,$B$1)
You enter this in D3. The absolute/relative indexing used allows for copy-and-paste throughout D3:F9.
Cell B1 here contains the "NULL" value.
Then you plot 3 series: column C against columns D, E, F.
PS: I guess you could split the Y data column instead, with similar results. For some reason that I do not recall, I decided a long time ago that this was the best option, at least in my case then. You may want to try out the other option.
PS2: This also works for data that is not sorted by label.
PS3: Using NA() as the "NULL" value avoids cell values being taken as zero and then showing up in the chart, as it is the case with other errors (e.g., try using =1/0 in B1). It is the best option I found so far. Alternatively (just in case you find it useful), you can use an explicit value which is outside the actual X data range, but then you would have to manually set the X axis range. All this is for a Scatter plot, just check what works for your case.
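The column-splitting idea can be sketched outside Excel as follows. The rows are made-up sample data in the question's ID / DATE / SAMPLE layout, `splitBySeries` is a hypothetical helper name, and `null` stands in for the NA() placeholder.

```javascript
// Hypothetical sample rows in the ID / DATE / SAMPLE layout from the question.
const rows = [
  { id: 1, date: "2020-01-01", sample: 5 },
  { id: 1, date: "2020-01-02", sample: 7 },
  { id: 2, date: "2020-01-01", sample: 3 },
  { id: 3, date: "2020-01-03", sample: 9 },
];

// For each row, keep the X value only in the column of its own series,
// like =IF($A3=D$2,$B3,$B$1) copied across one column per label.
function splitBySeries(rows, ids) {
  return rows.map((row) => ({
    y: row.sample,
    x: ids.map((id) => (row.id === id ? row.date : null)), // null plays the role of NA()
  }));
}

const table = splitBySeries(rows, [1, 2, 3]);
console.log(table);
```

Each series then plots the shared Y column against its own X column, and the placeholder entries are simply skipped.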

dynamic programming - topcoder

I've been trying out the dp tutorials on Topcoder. One of the problems given for practice was MiniPaint . I think I've got the solution partly- find the minimum no. of mispaints for a given no. of strokes, for each row and then compute for the entire picture (again using dp, similar to the knapsack problem). However, I'm not sure how to compute the min. no for each row.
P.S I later found the match editorial, but the code for finding the min. no. of mispaintings for each row seems wrong. Could someone explain exactly what they've done in the code?
The stripScore() function returns the minimum number of mispaintings for each row given the number of strokes available to paint it. Although I'm not sure the rowid argument is correct, the idea is that it begins at position start in a particular row, with needed strokes still available and the colour of the region directly before it.
The key to this algorithm is that the best score for the area to the right of the kth region is uniquely determined by the number of strokes needed and the colour used to paint the (k-1)th region.
Intuition
I have been bashing my head against this problem for 3 days straight, not realising that it requires two consecutive uses of dynamic programming logic. My approaches, in contrast to the ones available from Topcoder, are bottom-up.
To start with, instead of calculating the minimum number of mispaints I can achieve, I will instead calculate the maximum number of cells I can paint with maxStrokes strokes. The result can easily be calculated by subtracting my findings from the total cells of my matrix. But how can I really do that? The initial observation has to be the fact that each row can yield me some painted cells in exchange for a number of strokes. This does not depend on the rest of the rows. That means that, for each row, I can calculate the maximum number of cells I can paint on that specific row, with a certain number of strokes.
Example
Input=['BBWWB','WBWWW'], maxStrokes=3
Let's now look at the first row BBWWB, and let C denote the maximum number of cells I can paint with Q strokes:
Q C
0 0 (I can't paint with 0 strokes)
1 3 (BBWWB)
2 4 (BBWWB)
3 5 (BBWWB)
We could easily represent the above results with an array of length 4 that stores for each index (stroke) the maximum number of cells that can be painted, namely [0,3,4,5]
It's easy to see that the second row in the same manner would have an array [0,4,4,5].
The result can now easily be calculated just by these two arrays alone, as what we're looking for is a combination of two choices, one for each calculated array, that will yield me the highest amount of cells I can paint with 3 strokes. What are my choices though? Each item of my array represents the maximum number of cells i can paint with index strokes. So, for the first array a choice would be to paint 4 cells with 2 strokes.
I could then combine that choice with the second array's 1-st item 4, which means I can paint 4 cells with 1 stroke. My final result would be 4+4=8 cells with 2+1=3 strokes, which happens to be the best I can get. The output would then trivially be 2*5-8=2 minimum mispaints. However, we need to find an optimal way to calculate the different combinations of items from each row and what sums they can yield me.
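Before optimising, the combination step can be checked by simply trying every split of the strokes between the rows. The sketch below is an exhaustive search, not the knapsack variant described later; `bestTotal` is a hypothetical helper name, and the two arrays are the ones from the example.

```javascript
// Per-row arrays from the example: index = strokes used, value = max cells painted.
const perRow = [
  [0, 3, 4, 5], // BBWWB
  [0, 4, 4, 5], // WBWWW
];

// Try every way of giving q strokes to the current row and the rest to later rows.
function bestTotal(perRow, strokesLeft, row = 0) {
  if (row === perRow.length) return 0;
  let best = 0;
  for (let q = 0; q <= strokesLeft && q < perRow[row].length; q++) {
    const total = perRow[row][q] + bestTotal(perRow, strokesLeft - q, row + 1);
    best = Math.max(best, total);
  }
  return best;
}

const painted = bestTotal(perRow, 3);
const mispaints = 2 * 5 - painted;
console.log(painted, mispaints);
```

This grows exponentially with the number of rows, which is why a cheaper way of combining the rows is needed.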
The Process
The first part of my algorithm populates two very important tables. Let us denote with N, M the dimensions of the matrix I'm given. The first table, dp is a N*M*maxStrokes matrix. dp[i][j][k] represents the maximum number of cells I can paint from the 0-th cell up until the j-th cell of the i-th row with k strokes. As for the maxPainted table, that is a N*maxStrokes matrix. maxPainted[i][k] stores the maximum number of cells I can paint in the i-th row with k strokes and is identical to the arrays calculated in the above example. In order to calculate the latter, I need to calculate dp first. The formula is the following:
dp[i][j][k]= MAX (1,dp[i][r][k]+1 (if A[i][j]==A[i][r]) ,dp[i][r][k-1]+1 (if A[i][j]!=A[i][r])), for every 0<=r<j
Which can be translated as: The maximum number of cells I can paint up to the j-th cell of the i-th row with k strokes is the maximum of:
1, because I can just ignore all the previous cells, and paint this cell alone
dp[i][r][k]+1, because when A[i][j]==A[i][r], I can extend that color with no extra strokes
dp[i][r][k-1]+1, because when A[i][j]!=A[i][r], I have to use a new stroke to paint A[i][j]
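The three cases above can be sketched for a single row as follows. This is a reduced version that drops the row index i; `maxPaintedForRow` is a hypothetical helper name, and the row is passed as a string of colours.

```javascript
// For one row, return best[k] = max cells that can be painted with k strokes.
function maxPaintedForRow(row, maxStrokes) {
  const m = row.length;
  // dp[j][k]: max cells painted by a selection ending at cell j using k strokes
  const dp = Array.from({ length: m }, () => new Array(maxStrokes + 1).fill(0));
  const best = new Array(maxStrokes + 1).fill(0);
  for (let k = 1; k <= maxStrokes; k++) {
    for (let j = 0; j < m; j++) {
      dp[j][k] = 1; // case 1: paint cell j alone
      for (let r = 0; r < j; r++) {
        // case 2: same colour extends for free; case 3: new colour costs a stroke
        const strokes = row[r] === row[j] ? k : k - 1;
        dp[j][k] = Math.max(dp[j][k], dp[r][strokes] + 1);
      }
      best[k] = Math.max(best[k], dp[j][k]);
    }
  }
  return best;
}

console.log(maxPaintedForRow("BBWWB", 3));
console.log(maxPaintedForRow("WBWWW", 3));
```

Running it on the two example rows gives the same per-row arrays used in the example section.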
It is now evident, that the dp table needs to be calculated in order to acquire the best possible scenarios for each row, that is the maximum number of cells I can paint with every possible number of strokes available. But how can I utilize the maxPainted table once I have calculated it in order to get to my result?
The second part of my approach uses a variation of the 0-1 Knapsack problem in order to calculate the biggest number of cells I can paint with maxStrokes strokes available. What really made this challenging, is that, in contrast to the classical Knapsack, I am only allowed to pick 1 item out of every row, and then calculate all the possible combinations that do not surpass the required stroke constraint. In order to achieve that, I will firstly create a new array of length N*M +1 , called possSums. Let us denote with possSums[S] the MINIMUM number of strokes needed to reach sum S. My goal is to calculate each row's contribution to this array. Let us demonstrate with our previous example.
So I had a 2*5 input, therefore the possSums array would consist of 10+1 elements, which we set to Infinity, as we're trying to minimize the strokes needed to reach said sums.
So, possSums=[0,∞,∞,∞,∞,∞,∞,∞,∞,∞,∞], with the first item being 0 because I can paint 0 cells with 0 strokes. What we're now looking to do is calculate each row's contribution to possSums. That means that for every row of my maxPainted array, each element needs to make a specific sum available, which will simulate it being chosen. As previously demonstrated, maxPainted[0]=[0,3,4,5]. This row's contribution would have to allow 0, 3, 4 and 5 as achievable sums in my possSums array with used strokes 0, 1, 2, 3 respectively. possSums would then be transformed to possSums=[0,∞,∞,1,2,3,∞,∞,∞,∞,∞].
The next row was maxPainted[1]=[0,4,4,5], which now has to alter possSums once again to allow the combinations made possible by selecting each item. Notice that each alteration needs to be independent of the others in the same row. For example, if we first allow sum=4, which can happen by picking the 1st item of maxPainted[1], sum=9 cannot then be allowed by additionally picking the 3rd item of that same array; in other words, combinations of items in the same row cannot be considered. To ensure that no such cases are considered, for each row I create a clone of my possSums array and make the necessary modifications to the clone instead of the original array.
After considering all of the items within maxPainted[1], possSums would look like this: possSums=[0,∞,∞,1,1,3,∞,2,3,4,6]. The largest index whose value is at most 3 strokes is 8 (sum=8), so at most 8 cells can be painted. Therefore my output would be 2*5-8=2.
var minipaint=(A,maxStrokes)=>{
    let n=A.length,m=A[0].length
    , maxPainted=[...Array(n)].map(d=>[...Array(maxStrokes+1)].map(d=>0))
    , dp=[...Array(n)].map(d=>[...Array(m)].map(d=>[...Array(maxStrokes+1)].map(d=>0)))
    for (let k = 1; k <=maxStrokes; k++)
        for (let i = 0; i <n; i++)
            for (let j = 0; j <m; j++) {
                dp[i][j][k]=1 // I can always paint this cell on its own
                // for every previous cell p of this row,
                // consider painting it and then painting my current cell j
                for (let p = 0; p <j; p++)
                    if(A[i][p]===A[i][j]) // same colour: no extra stroke needed
                        dp[i][j][k]=Math.max(dp[i][p][k]+1,dp[i][j][k])
                    else // different colour: one extra stroke (going from k-1 to k)
                        dp[i][j][k]=Math.max(dp[i][p][k-1]+1,dp[i][j][k])
                maxPainted[i][k]=Math.max(maxPainted[i][k],dp[i][j][k]) // store the maximum cells I can paint with k strokes
            }
    // This is where the knapsack VARIANT happens:
    // essentially I want to maximize the sum of my selection of strokes.
    // For each row, I can pick at most 1 item. I also have a constraint on my total
    // strokes used, so I create an array possSums whose indices represent the sum I want to reach
    // and whose values represent the MINIMUM strokes needed to reach that very sum,
    // so possSums[S] = minimum number of strokes needed to reach sum S
    let result=0,possSums=[...Array(n*m+1)].map(d=>Infinity)
    // base case: I can paint 0 cells with 0 strokes
    possSums[0]=0
    for (let i = 0; i < n; i++) {
        let curr=maxPainted[i],
            temp=[...possSums] // clone of possSums: for each row I alter the clone
                               // instead of the original array, to avoid cases where
                               // two items from the same row contribute to the same sum,
                               // which of course is incorrect
        for (let stroke = 0; stroke <=maxStrokes; stroke++) {
            let maxCells=curr[stroke]
            for (let sum = 0; sum <=n*m-maxCells; sum++) {
                let oldWeight=possSums[sum] // was this sum reachable BEFORE this row?
                if(oldWeight==Infinity) // if it wasn't, I can't extend it with maxCells
                    continue;
                // This is what allows only 1 pick per row:
                let minWeight=temp[sum+maxCells] // now consider reaching sum+maxCells
                // Altering the temp array instead, so my potential results are not
                // affected by the sums that were allowed during the same row
                temp[sum+maxCells]=Math.min(minWeight,oldWeight+stroke)
                if(temp[sum+maxCells]<=maxStrokes)
                    result=Math.max(result,sum+maxCells)
            }
        }
        possSums=temp
    }
    return n*m-result // the total number of cells minus the maximum I can paint with maxStrokes
}
