I am looking for a variation of the Excel formula that was already provided here:
How do I create groups based on the sum of values?
The requirement is the same, but the grouping criterion needs to be an exact value.
Here's the sample data:
Column A | Column B
Item A | 1
Item B | 2
Item C | 3
Item D | 4
Item E | 5
Item F | 1
Item G | 2
Item H | 3
Item I | 4
Item J | 5
I need to group the rows if their Column B sum = 5.
Expected result:
Group 1 = Item A, Item D (1 + 4) = 5
Group 2 = Item B, Item C (2 + 3) = 5
Group 3 = Item E = 5
Group 4 = Item F, Item I (1 + 4) = 5
Group 5 = Item G, Item H (2 + 3) = 5
Group 6 = Item J = 5
If a row's Column B value exceeds 5, or no other row can be added to it to reach exactly 5, the row gets no Group value.
Groupings are interchangeable, i.e. Group 1 = Item A, Item I is also acceptable since 1 + 4 = 5.
I assume this can be achieved using Excel formulas but I am struggling to find which formula(s) can be used. Any help is appreciated!
I believe I understood your question after the comments we exchanged. I would still recommend updating the question: it is an interesting problem, but it was difficult to follow.
Before looking for an Excel solution, I modeled the problem as a state machine, with transitions from one state to another. I considered the following states, which represent the position of the item within the group. A group is defined as a run of consecutive items whose sum is equal to 5.
EMPTY: Just the initial situation
START: Start of the group
MIDDLE: A middle element of the group
END: The end of the group
START-END: A group with a single element
NA: Not applicable group
I follow the same idea as in How do I create groups based on the sum of values?, but with slightly different helper columns:
Total (Column D): for this case the following formula is used: IF(SUM(C3,D2)>5,C3,SUM(C3,D2))
Status or item position within Group (Column G): this is where the status of each element is calculated.
Checks for Valid Groups (Column H): evaluates whether a group is valid. When the items do not add up to exactly 5, the group is not valid. The result is shown on the row that begins the group (START or START-END states): TRUE means a valid group, FALSE means an invalid group, and NA mirrors an NA value in the Status column. An empty cell marks any element of a group other than the first one.
Group # (Column I): identifies the group the row (Item) belongs to. Groups are numbered from 1, and rows for which no group can be formed get NA.
Here is a screenshot with the solution and the formula on G3:
=LET(total, D3, prevS, G2, QTY, C3,
IF(C3="", "",
IF(OR(AND(total=5, QTY<5, prevS="START"), AND(total=5, prevS="MIDDLE")), "END",
IF(OR(AND(total>5, total=QTY, OR(prevS="START", prevS="MIDDLE")),AND(total>5, OR(prevS="", prevS="END", prevS="NA", prevS="START-END"))), "NA",
IF(OR(AND(total<5, total=QTY, OR(prevS="START", prevS="MIDDLE")),AND(total<5, OR(prevS="", prevS="END", prevS="NA", prevS="START-END"))), "START",
IF(AND(total<5, OR(prevS="START", prevS="MIDDLE")), "MIDDLE",
IF(OR(AND(total=5, total= QTY, OR(prevS="START", prevS="MIDDLE")),AND(total=5, OR(prevS="", prevS="END", prevS="NA", prevS="START-END"))), "START-END", "UNDEFINED")
)
)
)
)
)
)
Notes:
The LET function is used to make the formula more readable.
The IF blocks should be ordered from the most specific combination of total and QTY values to the most generic one. For blocks with the same total condition, make sure the prevS conditions do not overlap.
UNDEFINED is added as a last resort, to detect any transition that is not covered. If it ever appears, the formula has to be reviewed; so far, all cases in the sample data are covered.
Columns K-Q are for documentation only, to identify all possible transitions. Columns K-M list all possible transitions organized by previous status. Columns O-Q list the same transitions ordered by current status, which makes it easier to write each portion of the IF blocks.
The formula can probably be simplified. It is more complex than the solution to the similar question, but this question has more specific conditions. Some transitions may not be relevant for the final result, but I preferred to consider every position in the group to make sure all transitions are covered.
The following state machine diagram shows all possible transitions:
Notes:
As you can see, the solution also handles the cases where a group cannot be created or is not valid (NA values). It assumes that the item values are all positive; the question does not state any restriction, but all values in the example are positive. To support zero values, the solution needs to be adjusted.
The Checks for Valid Groups column is calculated as follows:
= IF(G3="", "",
IF(G3="START-END", TRUE,
IF(G3="NA", "NA",
IF(G3="START",
LET(endRow, IFNA(MATCH("START", LEFT(G4:$G$1000,5),0), MATCH("", LEFT(G4:$G$1000,5),0))+ ROW()-1,
value, VLOOKUP("END", G4:INDIRECT( "G" & endRow),1,0),
IF(ISNA(value), FALSE, TRUE)
), ""
)
)
)
)
It identifies the start and end of the candidate group and then looks for an END status inside it; if there is none (for example because the group runs into an NA value), it is not a valid group. If the end of the candidate group cannot be located (the first MATCH returns #N/A), it searches until a blank row instead.
The Group # column is calculated as follows:
=IF(C3="","", LET(value, MAX($I$2:I2), IF(G3="NA", "NA",
IF(H3=TRUE, value + 1, IF(H3=FALSE, "NA",
IF(I2="NA", "NA", value))))))
This way only valid transitions are considered, i.e. status sequences that start with START but do not end in END (START->NA, START->MIDDLE[one or more]...->NA), as well as NA itself, are not treated as valid groups (NA).
I added more examples to the original sample file; more can be added to test all possible scenarios, but I guess you get the idea of this approach. As you stated, "I assume this can be achieved using Excel formulas": yes, it is possible, but for more complex conditions I would suggest implementing a state machine algorithm in VBA. Even though it can be done with Excel functions, you have to deal with several nested IF blocks and helper columns, whereas the same thing can be achieved with a simple for-loop in VBA.
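To illustrate the for-loop alternative, here is a minimal sketch, written in Python rather than VBA. It is not a translation of the formulas above. It assumes positive values, groups only consecutive items, and drops the open candidate group as soon as the running sum would exceed the target, which mirrors how the Total column restarts. The function name group_consecutive and the sample list are made up for illustration.

```python
def group_consecutive(values, target=5):
    """Number each run of consecutive items whose sum equals target.

    Items that do not end up in such a run get None (the "NA" case).
    Assumes all values are positive.
    """
    groups = [None] * len(values)
    start = 0      # index where the current candidate group begins
    running = 0    # running sum of the current candidate group
    next_id = 1    # groups are numbered from 1
    for i, v in enumerate(values):
        if running + v > target:
            # The open candidate can no longer hit the target exactly:
            # drop it and let the current item start a new candidate.
            start, running = i, 0
        running += v
        if running == target:
            # Close the group and label every item in it.
            for k in range(start, i + 1):
                groups[k] = next_id
            next_id += 1
            start, running = i + 1, 0
        elif running > target:
            # A single item larger than the target never joins a group.
            start, running = i + 1, 0
    return groups


print(group_consecutive([1, 4, 2, 3, 5, 6, 2, 2]))
# [1, 1, 2, 2, 3, None, None, None]
```

The trailing 2, 2 stay ungrouped because they only add up to 4, and the 6 is ungrouped because it exceeds the target on its own.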
Here is a link to the online Excel file I used.
I have a string column in Impala called text that contains descriptions. I would like to get the words before and after a specific keyword.
Example:
text= This is a great property right in front of the beach. The 50 m2 apartment is divided into a bedroom....
keyword= m2
desired result: two columns, word before = 50 and word after= apartment
Any ideas?
You can use regexp_extract to match the words before and after m2 and extract them separately.
with t as ( select "This is a great property right in front of the beach. The 50 m2 apartment is divided into a bedroom" as text)
select
regexp_extract(t.text , "(\\w+)\\s+m2", 1) as word_before,
regexp_extract(t.text , "m2\\s+(\\w+)", 1) as word_after
from t ;
+--------------+-------------+--+
| word_before | word_after |
+--------------+-------------+--+
| 50 | apartment |
+--------------+-------------+--+
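If you want to try the two patterns outside Impala, the following is a small sketch with Python's re module. It uses the same two regular expressions as the query; the helper name words_around and the choice to return an empty string when nothing matches are assumptions made for this illustration.

```python
import re

text = ("This is a great property right in front of the beach. "
        "The 50 m2 apartment is divided into a bedroom")


def words_around(text, keyword):
    """Return (word_before, word_after) for the first match of keyword."""
    kw = re.escape(keyword)  # treat the keyword literally
    before = re.search(r"(\w+)\s+" + kw, text)
    after = re.search(kw + r"\s+(\w+)", text)
    # Fall back to "" when the keyword has no neighbouring word.
    return (before.group(1) if before else "",
            after.group(1) if after else "")


print(words_around(text, "m2"))  # ('50', 'apartment')
```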
This seems like a very simple problem, but I just cannot find documentation for it on MSDN.
How can I create a UserForm in Excel that lets the user select multiple items?
Example (assuming this is done with a ListBox; if there is a better solution, I will use that instead):
_______
|item1 | <- pick this
|item2 |
|item3 | <- ,and pick this
|item4 |
|______|
Me.ListBox.MultiSelect = 1 (fmMultiSelectMulti) lets the user select multiple lines.
Then loop through the list with For i = 0 To Me.ListBox.ListCount - 1,
and check whether each item is selected with If Me.ListBox.Selected(i) = True Then ... do stuff.
Note: the list is zero-based (as in the For loop above), so the first item is Me.ListBox.List(0).
Info on the Selected property: http://msdn.microsoft.com/en-us/library/office/gg251644%28v=office.15%29.aspx
The following table shows a brief snapshot of the data that I wish to manipulate. I am looking for an awk script that will put rows with common elements into one group. For example, in the table below:
Numbers (1,2,3,4,6) should all belong to one group, so row1, row2, row4 and row8 will be in group "1".
Number 9 is unique and has no elements in common with other rows, so it will be alone in a separate group, say group 2.
Similarly, numbers 5 and 7 will be in one group, say group 3, and so on.
The file:
heading1 heading2 numberlist group
name1 text 1,2,3 1
name2 text 2 1
name3 text 9 2
name4 text 1,4 1
name5 text 5,7 3
name6 text 7 3
name7 text 8 4
name8 text 6,2 1
I searched for queries similar to mine and found this link: Grouping lists by common elements. But the solution is in C++ and not awk, which is my primary requirement.
Incidentally, I also found this awk solution, which is somewhat related to my query, but it does not handle comma-separated values:
awk script grouping with array
Numberlist, i.e. $3, is my only consideration for grouping.
This problem seemed almost the same as one of my own problems, and I used one column of your example to solve mine :) So...
[[bash_prompt$]]$ cat log ; echo "########"; \
> cat test.sh ;echo "########"; awk -f test.sh log
heading1 heading2 numberlist group
name1 text 1,2,3
name2 text 2
name3 text 9
name4 text 1,4
name5 text 5,7
name6 text 7
name7 text 8
name8 text 6,2
########
/^name/{
i=0; j=0;
split($3,a,",");
for(var in a) {
for(var1 in q) {
split(q[var1],r,",");
for(var2 in r) {
if(r[var2] == a[var]) {
i=1;
j=((var1+1));
}
}
}
}
if(i == 0) {
q[length(q)] = $3;
j=length(q);
}
print $1 "\t\t" $2 " \t\t" $3 "\t\t" j;
}
########
name1 text 1,2,3 1
name2 text 2 1
name3 text 9 2
name4 text 1,4 1
name5 text 5,7 3
name6 text 7 3
name7 text 8 4
name8 text 6,2 1
[[bash_prompt$]]$
Update:
split splits its first argument by the delimiter passed as the third argument and stores the pieces in the array named by the second argument. The main array here is q, which holds the members of each group: it is effectively an array of arrays, where the index of an element is the group id and the element is the collection of all members of that group. So q[0]="1,2,3" means that group 0 contains the members 1, 2 and 3. awk reads one line that starts with name (/^name/) and breaks its 3rd field (1,2,3) into an array a. Then, for each element of a, we go through each group stored in q (for(var1 in q)) and split that group into another temporary array r (split(q[var1],r,",")), i.e. "1,2,3" is split into the array r. Each element of r is compared with the element of a. If a match is found, the row's group is the index of that group plus one (array indexes start from 0, groups from 1, so ((var1+1)) is used). If no match is found, the field is added to q as a new group, and the last index + 1, i.e. the length of the array, becomes the group index for the row.
Update:
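/^name/{
    j=0;                          # group index for this row, 0 = none found yet
    split($3,a,",");              # elements of this row's numberlist
    for(var in a) {
        if(q[a[var]] != 0) {      # element already belongs to a group
            j=q[a[var]];
            break;
        }
    }
    j = (j == 0) ? ++k : j;       # no known element: open a new group
    for(var1 in a) {
        if(q[a[var1]] == 0) {     # leave existing assignments untouched
            q[a[var1]] = j;
        }
    }
    print $1 "\t\t" $2 " \t\t" $3 "\t\t" j;
}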
Update:
The basis is that awk has associative arrays, in which each element is accessed by a string key. The earlier approach stored each group in an array whose key is the index of the group. So for every row we had to read each group, split the group into individual elements, and compare each of them with each element of the row's numberlist column. Instead of storing groups, we now store the elements in an array where the key is the element itself and the value is the index of the group the element belongs to. When we read a row, we split the column into individual elements (split($3,a,",");) and then check, for each element, whether the array holds a group index under that key with if(q[a[var]] != 0) (in awk, referencing a missing element creates it with an empty value that compares equal to 0, hence the check q[a[var]] != 0). If any element is found, we take that element's group index as the index for the row and break; otherwise j remains 0, and in that case ++k gives a new group index. Now that we have the group index for the row, we need to carry it over to those elements that are not yet part of any group (there will be cases where elements in the same column belong to different groups; here we take a first come, first served approach and do not overwrite the group index of elements that already belong to another group). So for each element in the column (for(var1 in a)), if it does not belong to a group (if(q[a[var1]] == 0)), we give it the group index with q[a[var1]] = j;. All lookups are direct because the elements themselves are used as keys. There is no need to split a group again and again for every element, hence the shorter running time. My first approach was based on one of my own problems (mentioned in the first line), which involved more complex processing but a smaller data set. This one needed simpler, more straightforward logic.
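For readers who find the awk hard to follow, the same element-to-group lookup can be sketched in Python. This is only an illustration of the idea described above, not a replacement for the awk script: the names assign_groups, group_of and next_group are made up, and elements are visited in list order, whereas awk's for (var in a) order is unspecified.

```python
def assign_groups(number_lists):
    """Return one group index per row from its comma-separated list."""
    group_of = {}     # element -> group index (the role of q)
    next_group = 0    # last group index handed out (the role of k)
    result = []
    for numbers in number_lists:
        elements = numbers.split(",")
        # Reuse the group of the first element that already has one.
        group = next((group_of[e] for e in elements if e in group_of), 0)
        if group == 0:
            next_group += 1
            group = next_group
        # First come, first served: never overwrite an assignment.
        for e in elements:
            group_of.setdefault(e, group)
        result.append(group)
    return result


rows = ["1,2,3", "2", "9", "1,4", "5,7", "7", "8", "6,2"]
print(assign_groups(rows))  # [1, 1, 2, 1, 3, 3, 4, 1]
```

Like the awk version, this does not merge two groups retroactively when a later row links them; it keeps the first assignment.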
I have tried searching for a solution but can't find one.
I have a list of products, and each product has many parts. Is there a HasNext function in VBA to see if there are more parts for a product? For instance, for chicken burger, I want to pick out all the parts, put them in an array and display it in another sheet.
I can't hard-code the array, because the client will add more products in the future. There might be 15, 20, 23 parts, etc. Is there a HasNext function to get the value in the next column and add it to the array?
Product | Part 1 | Part 2 | Part 3
Chicken Burger | Veggie | Bun | Patty
You can use the Range.End property to detect how "long" the title row is:
Dim col As Long, i As Long
' Last filled column in row 1, found by jumping right from A1
col = Range("A1").End(xlToRight).Column
For i = 1 To col
    If Cells(1, i).Value <> "" Then
        '...
    End If
Next i
P.S. I wonder why MSDN refers to it as "property" instead of "method"...