How do I group rows based on a fixed sum of values in Excel? - excel

I am trying to find another solution to below Excel formula that was already provided here:
How do I create groups based on the sum of values?
It is the same requirement, but the grouping criteria needs to be an exact value.
Here's the sample data:
Column A | Column B
Item A | 1
Item B | 2
Item C | 3
Item D | 4
Item E | 5
Item F | 1
Item G | 2
Item H | 3
Item I | 4
Item J | 5
I need to group the rows if their Column B sum = 5.
Expected result:
Group 1 = Item A, Item D (1 + 4) = 5
Group 2 = Item B, Item C (2 + 3) = 5
Group 3 = Item E = 5
Group 4 = Item F, Item I (1 + 4) = 5
Group 5 = Item G, Item H (2 + 3) = 5
Group 6 = Item J = 5
If a row's Column B exceeds 5 or does not have another matching row to equal 5 when added then it will have no Group value.
Groupings can be interchangeable, ie. Group 1 = Item A, Item I can be made since 1 + 4 = 5.
I assume this can be achieved using Excel formulas but I am struggling to find which formula(s) can be used. Any help is appreciated!

I believe I was able to understand your question after some comments exchanged. Anyway I would recommend to update your question, it is an interesting problem, but the question was difficult to follow.
Before looking for an Excel solution, I took the approach of understanding the problem as a state machine with the transition from one state to another. I considered the following states that represent the position the item in the group. A group is defined as consecutive items that the sum of all items is equal to 5.
EMPTY: Just the initial situation
START: Start of the group
MIDDLE: A middle element of the group
END: The end of the group
START-END: A group with a single element
NA: Not applicable group
I follow the same idea of: How do I create groups based on the sum of values?, but slightly different helper columns:
Total (Column D), but for this case it is used the following formula: IF(SUM(C3,D2)>5,C3,SUM(C3,D2))
Status or item position within Group (Column G). Here is where it is calculated the corresponding status for each element
Checks for Valid Groups (Column H): Evaluates if a group is valid. When there is no match to 5, the group is not valid. It is indicated at the row that represents the beginning of the group (START or START-END states). If TRUE it means a valid group, if FALSE it is not a valid group, and NA for an NA value from Status column. If empty represents any element of the group that is not the first one.
Group # (Column I): To identify the group the row (Item) belongs to. Notice that we start counting the group from 1 and I also consider the case a group can not be formed (NA).
Here is a screenshot with the solution and the formula on G3:
=LET(total, D3, prevS, G2, QTY, C3,
IF(C3="", "",
IF(OR(AND(total=5, QTY<5, prevS="START"), AND(total=5, prevS="MIDDLE")), "END",
IF(OR(AND(total>5, total=QTY, OR(prevS="START", prevS="MIDDLE")),AND(total>5, OR(prevS="", prevS="END", prevS="NA", prevS="START-END"))), "NA",
IF(OR(AND(total<5, total=QTY, OR(prevS="START", prevS="MIDDLE")),AND(total<5, OR(prevS="", prevS="END", prevS="NA", prevS="START-END"))), "START",
IF(AND(total<5, OR(prevS="START", prevS="MIDDLE")), "MIDDLE",
IF(OR(AND(total=5, total= QTY, OR(prevS="START", prevS="MIDDLE")),AND(total=5, OR(prevS="", prevS="END", prevS="NA", prevS="START-END"))), "START-END", "UNDEFINED")
)
)
)
)
)
)
Notes::
LET Excel function is used to have something more readable
The IF blocks should to be ordered from the most specific case of total and QTY values to the most generic ones. For the case with same total condition, make sure the second condition for prevS are not repeated.
Added as a last resort UNDEFINED case, to check if any transition was not covered, if that is the case it has to be reviewed, so far in the sample data all cases are covered
Column K-Q is just for documenting purpose to identify all possible transitions. Column K-M provides all possible transitions organized them by previous status. The columns O-Q represent all possible transitions ordered by current status, so it is easier to formulate each portion of the IF blocks.
Maybe the formula can be simplified, compared to the solution provided by the similar question is more complex, but this question has more specific conditions. Some transitions maybe not relevant for the final result, but it is preferred to consider all positions in the group to make sure all transitions are covered.
The following state machine diagram shows all possible transitions:
Notes:
As you can see the solution also considers when a group cannot be created or non valid groups (NA values). The solution considers that Item column has only positive values, it is not stated in the question any restriction, but looking at the example they are all positives. To consider zero values, this solution needs to be adjusted.
Checks for Valid Groups column is calculated as follow:
= IF(G3="", "",
IF(G3="START-END", TRUE,
IF(G3="NA", "NA",
IF(G3="START",
LET(endRow, IFNA(MATCH("START", LEFT(G4:$G$1000,5),0), MATCH("", LEFT(G4:$G$1000,5),0))+ ROW()-1,
value, VLOOKUP("END", G4:INDIRECT( "G" & endRow),1,0),
IF(ISNA(value), FALSE, TRUE)
), ""
)
)
)
)
It identifies the start and end of the group, and then finds any NA values, if there are, then it is not a valid group. If the end of the candidate group is not found (the first MATCH returns N/A), then is searches until a blank row
Group # column is calculated has follow:
=IF(C3="","", LET(value, MAX($I$2:I2), IF(G3="NA", "NA",
IF(H3=TRUE, value + 1, IF(H3=FALSE, "NA",
IF(I2="NA", "NA", value))))))
This way only valid transaction are considered, i.e. the following status transitions starting from START but not ending in END : START->NA, START->MIDDLE[one or more]...->NA and NA are not considered valid groups (NA).
I added more examples from the original sample file provided, more can be added to further test all possible scenarios, but I guess you get the idea about this approach. As you sated "I assume this can be achieved using Excel formulas" yes it is possible, but I would say for more complex conditions I would suggest to implement a state machine algorithm in VBA. Even it is possible to do it with Excel functions, you have to deal with several nested IF blocks and helper columns, something that can be achieved with a simple for-loop in VBA.
Here is a link to online Excel file I used.

Related

How to reproduce the equivalent of a Countif excel function in Business Objects?

I'm from France (sorry for my english) and I am currently working on the latest version of Business Objects (business-intelligence suite from SAP).
I would like to transpose an Excel formula to Business Objects, but I cannot. Could someone be able to answer me how to reproduce the equivalent of a Countif function, please ?
In my example, I have a whole list of repeating social security numbers to which I have appended a variable number taken from another field. I would like to do a count for each security number and know how many of them have the "2" value attached to them in my other field.
Example :
For 1741111111100 | 17411111111001, the result in a new field will be 2.
For 1741111111100 | 17411111111001, the result in a new field will be 2.
For 1741111111100 | 17411111111002, the result in a new field will be 2.
For 1741111111100 | 17411111111002, the result in a new field will be 2.
For 1741111111100 | 17411111111003, the result in a new field will be 2.
For 1751111111100 | 17511111111001, the result in a new field will be 1.
For 1751111111100 | 17511111111002, the result in a new field will be 1.
For 1751111111100 | 17511111111003, the result in a new field will be 1.
For 1761111111100 | 17611111111001, the result in a new field will be 0.
For 1761111111100 | 17611111111001, the result in a new field will be 0.
For 1761111111100 | 17611111111003, the result in a new field will be 0.
In excel it's easy to do with a Countif function but how could I do this in Business Objects, please ?
Thank you in advance because I spent a whole afternoon in vain.
RE-EDIT
Here's the same example with excel :
1741111111100|1|17411111111001|2|
1741111111100|1|17411111111001|2|
1741111111100|2|17411111111002|2|
1741111111100|2|17411111111002|2|
1741111111100|3|17411111111003|2|
1751111111100|1|17511111111001|1|
1751111111100|2|17511111111002|1|
1751111111100|3|17511111111003|1|
1761111111100|1|17611111111001|0|
1761111111100|1|17611111111001|0|
1761111111100|3|17611111111003|0|
A column :
there are my security numbers (1741111111100 repeated 5 times, 1751111111100 repeated 3 times, 1761111111100 repeated 3 times)
B column :
It's a number between 1 and 3.
C column :
I concatenated A column + B column like =CONCATENATE(A1;B1)
D column :
Here are my countif functions done like this :
=COUNTIF(C$1:C$11;CONCATENATE(A1;"2")) that gives a quantity of "2".
=COUNTIF(C$1:C$11;CONCATENATE(A2;"2")) that gives a quantity of "2".
=COUNTIF(C$1:C$11;CONCATENATE(A3;"2")) that gives a quantity of "2".
=COUNTIF(C$1:C$11;CONCATENATE(A4;"2")) that gives a quantity of "2".
=COUNTIF(C$1:C$11;CONCATENATE(A5;"2")) that gives a quantity of "2".
=COUNTIF(C$1:C$11;CONCATENATE(A6;"2")) that gives a quantity of "1".
=COUNTIF(C$1:C$11;CONCATENATE(A7;"2")) that gives a quantity of "1".
=COUNTIF(C$1:C$11;CONCATENATE(A8;"2")) that gives a quantity of "1".
=COUNTIF(C$1:C$11;CONCATENATE(A9;"2")) that gives a quantity of "0".
=COUNTIF(C$1:C$11;CONCATENATE(A10;"2")) that gives a quantity of "0".
=COUNTIF(C$1:C$11;CONCATENATE(A11;"2")) that gives a quantity of "0".
I was interested by the "2" value attached to the security number and the number of security numbers concerned by this attachment.
So, it's easy to do with excel but so so so hard to do with B.I. !
Thanx for any help.
Background:
Context Operators:
ForEach
ForAll
In
Operate on "Sets" of data allowing you to parse out data into subsets and aggregate. in SQL this is similar to "Over Partition By" if you're familiar with it.
Answer:
If we assume you create a variable in the report [Concat] as follows:
=Concatenation([A];"2")
Then we can use formula:
=Sum(If([C]=[Concat];1;0)) ForEach([Concat]) In ([C])
Where [C] is your concatenated columns A+B.
Explanation
The above essentially says if [C] = [Concat] return a 1 otherwise return a 0 and then sum the results of those evaluations together.
This occurs ForEach unique value within [Concat]; found in [C].
Logically the system finds all the unique values in [Concat] in then iterates though [C] evaluating if [C]=[Concat] for each case. it then sums the results for each [concat] and then renders those results for each [C]
Additional Point:
In my example the report data was being "combined" due to duplicate values So row 1, row 2 in your example were combined. I had to turn off BO's combining of this row data or my results were skewed. This can be accomplished by formatting the result table and checking the top checkbox "Avoid Duplicate row Aggregation" If you have other values which make each row unique, you will not have this problem. You can turn this off at a query level as well in the query properties of the edit data provider. I believe it depends on what source you use as to which method must be used... But I'm not positive.
So below you can see results from [Cnt] match your expected results in column D using the aforementioned formula.

how to vlookup if prefix found in the list?

HI.
how can i come up with return value of "company name" (column H) at Column B IF any of the "PrefiX" (Column G) found at "con no" (Column A).
Sample of outcome needed as in column B.
Sample:
620011113 = DD
CN1234 = BB
thanks
=INDEX($H:$H,AGGREGATE(15,6,ROW($G$1:$G$7)/(--(FIND($G$1:$G$7,$A2)=1)*--(LEN($G$1:$G$7)>0)),1),1)
Breaking this down, the INDEX retrieves the Nth item from Column H (Company name). To find the value of N, we are using the AGGREGATE function
AGGREGATE is a weird function - it lets us use things like MAX or LARGE or SUM while ignoring any error values. In this case, we will be using it for SMALL (first argument, 15), while Ignoring Error Values (second argument, 6). We will want the very smallest value, so the fourth argument will be 1. (If we wanted the second smallest, it would be 2, and so on)
=INDEX($H:$H,AGGREGATE(15,6, <SOMETHING> ,1),1)
So, all we need now is a list of values to compare! To make things slightly simpler, I'll break that bit of the code out for you here:
ROW($G$1:$G$7) / (--(FIND($G$1:$G$7,$A2)=1) * --(LEN($G$1:$G$7)>0))
There are 3 parts to this. The first, ROW($G$1:$G$7)is the actual value we want to retrieve - these will be the Row Numbers for each Prefix that matches your value. On its own, however, it will be all the row numbers. Since we are skipping errors, we want any Rows that don't match the prefix to throw an error. The easiest way to do this is to Divide by Zero
At the start of --(FIND($G$1:$G$7,$A2)=1) and --(LEN($G$1:$G$7)>0) we have a double-negative. This is a quick way to convert True and False to 1 and 0. Only when both tests are True will we not divide by 0, as this table shows:
A | B | A*B
1 | 1 | 1
1 | 0 | 0
0 | 1 | 0
0 | 0 | 0
Starting with the second test first (it's easier), we have LEN($G$1:$G$7)>0 - basically "don't look at blank cells".
The other test (FIND($G$1:$G$7,$A2)=1) will search for the Prefix in the Con No, and return where it is found (or a #VALUE! error if it isn't). We then check "is this at position 1" - in other words, "Is this at the start of the Con No, rather than in the middle". We don't want to say Con No CNQ6060 is part of Company AA instead of Company BB by mistake!
So, if the Prefix is at the Start of the Con No, AND it isn't Blank (because there is an infinite amount of Nothing Before, After, and Between every number and letter), then we get it added to our list of Rows. We then take the smallest row (i.e. closest to the top - change AGGREGATE(15 to AGGREGATE(14 if you want the closest to the bottom!), and use that to get the Company Name
You could try the below formal:
=VLOOKUP(IF(LEFT(A3,1)="6",LEFT(A3,4),IF(LEFT(A3,1)="C",LEFT(A3,2),IF(LEFT(A3,1)="E",LEFT(A3,7)))),$G$3:$H$7,2,0)
Have in mind that you have to use ' before the cell value of column A & G in order to convert cell value into text get the correct out comes using VLOOKUP
Result:

Using tbl.Lookup to match just part of a column value

This question relates to the Schematiq add-in for Microsoft Excel.
Using =tbl.Lookup(table, columnsToSearch, valuesToFind, resultColumn, [defaultValue]) the values in the valuesToFind column have a consistent 3 characters to the left and then varying characters after (e.g. 908-123456 or 908-321654 - i.e. 908 is always consistent)
How can I tell the function to lookup the value based on the first 3 characters only? The expected answer should be the sum of the results of the above, i.e. 500 + 300 = 800
tbl.Lookup() works by looking for an exact match - this helps ensure it's fast but in this case it means you need an extra step to calculate a column of lookup values, something like this:
A2: =tbl.CalculateColumn(A1, "code", "x => LEFT(x, 3)", "startOfCode")
This will give you a new column that you can use for the columnsToSearch argument, however tbl.Lookup() also looks for just one match - it doesn't know how to combine values together if there is more than one matching row in the table, so I think you also need one more step to group your table by the first 3 chars of the code, like this:
A3: =tbl.Group(A2, "startOfCode", "amount")
Because tbl.Group() adds values together by default, this will give you a table with a row for each distinct value of startOfCode and the subtotal of amount for each of those values. Finally, you can do the lookup exactly as you requested, which for your input table will return 800:
A4: =tbl.Lookup(A3, "startOfCode", "908", "amount")

Count number of occurences of a string and relabel

I have a n x 1 cell that contains something like this:
chair
chair
chair
chair
table
table
table
table
bike
bike
bike
bike
pen
pen
pen
pen
chair
chair
chair
chair
table
table
etc.
I would like to rename these elements so they will reflect the number of occurrences up to that point. The output should look like this:
chair_1
chair_2
chair_3
chair_4
table_1
table_2
table_3
table_4
bike_1
bike_2
bike_3
bike_4
pen_1
pen_2
pen_3
pen_4
chair_5
chair_6
chair_7
chair_8
table_5
table_6
etc.
Please note that the dash (_) is necessary Could anyone help? Thank you.
Interesting problem! This is the procedure that I would try:
Use unique - the third output parameter in particular to assign each string in your cell array to a unique ID.
Initialize an empty array, then create a for loop that goes through each unique string - given by the first output of unique - and creates a numerical sequence from 1 up to as many times as we have encountered this string. Place this numerical sequence in the corresponding positions where we have found each string.
Use strcat to attach each element in the array created in Step #2 to each cell array element in your problem.
Step #1
Assuming that your cell array is defined as a bunch of strings stored in A, we would call unique this way:
[names, ~, ids] = unique(A, 'stable');
The 'stable' is important as the IDs that get assigned to each unique string are done without re-ordering the elements in alphabetical order, which is important to get the job done. names will store the unique names found in your array A while ids would contain unique IDs for each string that is encountered. For your example, this is what names and ids would be:
names =
'chair'
'table'
'bike'
'pen'
ids =
1
1
1
1
2
2
2
2
3
3
3
3
4
4
4
4
1
1
1
1
2
2
names is actually not needed in this algorithm. However, I have shown it here so you can see how unique works. Also, ids is very useful because it assigns a unique ID for each string that is encountered. As such, chair gets assigned the ID 1, followed by table getting assigned the ID of 2, etc. These IDs will be important because we will use these IDs to find the exact locations of where each unique string is located so that we can assign those linear numerical ranges that you desire. These locations will get stored in an array computed in the next step.
Step #2
Let's pre-allocate this array for efficiency. Let's call it loc. Then, your code would look something like this:
loc = zeros(numel(A), 1);
for idx = 1 : numel(names)
id = find(ids == idx);
loc(id) = 1 : numel(id);
end
As such, for each unique name we find, we look for every location in the ids array that matches this particular name found. find will help us find those locations in ids that match a particular name. Once we find these locations, we simply assign an increasing linear sequence from 1 up to as many names as we have found to these locations in loc. The output of loc in your example would be:
loc =
1
2
3
4
1
2
3
4
1
2
3
4
1
2
3
4
5
6
7
8
5
6
Notice that this corresponds with the numerical sequence (the right most part of each string) of your desired output.
Step #3
Now all we have to do is piece loc together with each string in our cell array. We would thus do it like so:
out = strcat(A, '_', num2str(loc));
What this does is that it takes each element in A, concatenates a _ character and then attaches the corresponding numbers to the end of each element in A. Because we want to output strings, you need to convert the numbers stored in loc into strings. To do this, you must use num2str to convert each number in loc into their corresponding string equivalents. Once you find these, you would concatenate each number in loc with each element in A (with the _ character of course). The output is stored in out, and we thus get:
out =
'chair_1'
'chair_2'
'chair_3'
'chair_4'
'table_1'
'table_2'
'table_3'
'table_4'
'bike_1'
'bike_2'
'bike_3'
'bike_4'
'pen_1'
'pen_2'
'pen_3'
'pen_4'
'chair_5'
'chair_6'
'chair_7'
'chair_8'
'table_5'
'table_6'
For your copying and pasting pleasure, this is the full code. Be advised that I've nulled out the first output of unique as we don't need it for your desired output:
[~, ~, ids] = unique(A, 'stable');
loc = zeros(numel(A), 1);
for idx = 1 : numel(names)
id = find(ids == idx);
loc(id) = 1 : numel(id);
end
out = strcat(A, '_', num2str(loc));
If you want an alternative to unique, you can work with a hash table, which in Matlab would entail to using the containers.Map object. You can then store the occurrences of each individual label and create the new labels on the go, like in the code below.
data={'table','table','chair','bike','bike','bike'};
map=containers.Map(data,zeros(numel(data),1)); % labels=keys, counts=values (zeroed)
new_data=data; % initialize matrix that will have outputs
for ii=1:numel(data)
map(data{ii}) = map(data{ii})+1; % increment counts of current labels
new_data{ii} = sprintf('%s_%d',data{ii},map(data{ii})); % format outputs
end
This is similar to rayryeng's answer but replaces the for loop by bsxfun. After the strings have been reduced to unique labels (line 1 of code below), bsxfun is applied to create a matrix of pairwise comparisons between all (possibly repeated) labels. Keeping only the lower "half" of that matrix and summing along rows gives how many times each label has previously appeared (line 2). Finally, this is appended to each original string (line 3).
Let your cell array of strings be denoted as c.
[~, ~, labels] = unique(c); %// transform each string into a unique label
s = sum(tril(bsxfun(#eq, labels, labels.')), 2); %'// accumulated occurrence number
result = strcat(c, '_', num2str(x)); %// build result
Alternatively, the second line could be replaced by the more memory-efficient
n = numel(labels);
M = cumsum(full(sparse(1:n, labels, 1)));
s = M((1:n).' + (labels-1)*n);
I'll give you a psuedocode, try it yourself, post the code if it doesn't work
Initiate a counter to 1
Iterate over the cell
If counter > 1 check with previous value if the string is same
then increment counter
else
No- reset counter to 1
end
sprintf the string value + counter into a new array
Hope this helps!

Finding the next result from a MATCH

I am trying to produce a sorted table in excel, which depend on the selected year and category.
My methodology has been to sequentially find largest values in order, within the selected year and category parameters, doing the following:
Column E
{=LARGE(IF(('Master Data'!A$1:A$500 = $B$1) * ('Master Data'!B$1:B$500 = $B$2),'Master Data'!C$1:C$500), $B10)}
This works fine, $B$1$ is where I store the year, $B$2 is where I store the category, $B10 references a hard coded 1-25 in column B.
Column F
{=MATCH(E10,IF(('Master Data'!A$1:A$500 = $B$1) * ('Master Data'!B$1:B$500 = $B$2),'Master Data'!C$1:C$500),FALSE)}
This returns the row number of the result I need, which I then use in conjunction with INDEX to find related data.
The problem with this is that Match only returns the first row number, and if you have two results with the same value this clearly becomes an issue.
Column G
To resolve this I used an example from dailydoseofexcel which looks like this:
=IF(F10<>F11, F11, G10+MATCH(E11,INDIRECT("'Master Data'!C"&(G10+1)&":C500"),0))
This works to a limited extent, for my purposes, as it is unable to take into account the year and category filter I need to apply, so I tried:
{=IF(F10<>F11, F11, G10+MATCH(E11,IF((INDIRECT("'Master Data'!A"&(G10+1)&":A500") = $C$2) * (INDIRECT("'Master Data'!B"&(G10+1)&":B500") = $C$3), INDIRECT("'Master Data'!C"&(G10+1)&":C500")),0))}
But I am just getting #N/A as a result.
I think SUMPRODUCT may be what you are looking for:
Charley Kyd XL Legend: Use SUMPRODUCT to get the Last item in a list

Resources