Excel: Combining semi-duplicated records (with different columns)?

How would I combine records if specified columns are the same?
Here's what I have, and the result I'm looking for:

It can be done using array formulas if you don't mind them being big and ugly. This example should do what you're looking for. In case of duplicate entries, it simply takes the last defined value (Prog instead of Programmer for Kevin Moss):
Enter the following formula into C11 and D11, then press CTRL+SHIFT+ENTER to apply the array formula. You can then copy the formula to the rows below as needed.
=INDEX((IF((((($A11=$A$2:$A$7)+($B11=$B$2:$B$7))=2)+(C$2:C$7<>""))=2,C$2:C$7,"")),MAX(IF((IF((((($A11=$A$2:$A$7)+($B11=$B$2:$B$7))=2)+(C$2:C$7<>""))=2,C$2:C$7,""))<>"",ROW($A$1:$A$6),0)))
This breaks down what's happening a little bit, but admittedly it's still pretty opaque, sorry:
=INDEX(
(IF( # This IF statement collects all entries in a data field for a given Fname/Lname combination
(((($A11=$A$2:$A$7) + ($B11=$B$2:$B$7))=2) + (C$2:C$7<>""))=2, # Checks that First and Last Name Match, and Data field isn't empty
C$2:C$7, # Return data field if TRUE
"" # Return empty if FALSE
)),
MAX( # Take the highest index number, use it to select a row from the result of the IF statement above
IF(( # This IF statement returns an index number if the data field isn't empty
IF( # This IF statement collects all entries in a data field for a given Fname/Lname combination (copied from above)
(((($A11=$A$2:$A$7)+($B11=$B$2:$B$7))=2)+(C$2:C$7<>""))=2,
C$2:C$7,
"")
)<>"", # End of conditional statement
ROW($A$1:$A$6), # Value if TRUE (ROW used as an incrementing counter)
0 # Value if FALSE (0 will be ignored in the MAX function that uses this result)
)
)
)
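For readers who find the array formula hard to follow, here is a rough Python sketch of the same idea: group rows by first and last name, and for every other column keep the last non-empty value. The function name and the sample rows are made up for illustration (only the Kevin Moss "Programmer"/"Prog" case comes from the answer above), so treat it as a model of the logic, not as something Excel runs.

```python
def combine_records(rows, key_cols=2):
    """Merge rows that share the same leading key columns.

    For each non-key column the last non-empty value wins, which mirrors
    the MAX(...ROW...) step of the array formula.
    """
    merged = {}  # dicts keep insertion order, so output follows first appearance
    for row in rows:
        key = tuple(row[:key_cols])
        current = merged.setdefault(key, [""] * (len(row) - key_cols))
        for i, value in enumerate(row[key_cols:]):
            if value != "":
                current[i] = value  # a later non-empty value overwrites an earlier one
    return [list(key) + values for key, values in merged.items()]


# Hypothetical sample data: first name, last name, title, phone.
rows = [
    ["Kevin", "Moss", "Programmer", ""],
    ["Ann", "Lee", "", "555-0100"],
    ["Kevin", "Moss", "Prog", "555-0199"],
    ["Ann", "Lee", "Analyst", ""],
]
combined = combine_records(rows)
for record in combined:
    print(record)
```

As in the formula, Kevin Moss ends up with "Prog" because it is the last non-empty title for that name pair.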

Related

Comparing two columns and their values and outputting the greater value

I'm trying to compare two columns ("Shows") from different tables and show, in another table, which one has the greater number ("Rating") associated with it.
Ignore the operation column above; it isn't part of the solution I'm after, it only illustrates what I'm trying to compare.
Important note: if names are duplicated, compare the matching pairs in their corresponding order (1st with 1st, 2nd with 2nd, 3rd with 3rd, etc.), as illustrated in the table below:
Thanks

You can try the following in cell F3 for an array solution that spills the entire result at once:
=LET(sA, A3:A6, rA, B3:B6, sB, C3:C6, rB, D3:D6, CNTS, LAMBDA(x,
LET(seq, SEQUENCE(ROWS(x)), MAP(seq, LAMBDA(s,ROWS(FILTER(x,(x=INDEX(x,s))
*(seq<=s))))))), cntsA, CNTS(sA), cntsB, CNTS(sB), eval, MAP(sA, rA, cntsA,
LAMBDA(s,r,c,IF(r > FILTER(rB, (sB=s) * (cntsB=c)), "Table 1", "Table 2"))),
HSTACK(sA, eval))
Here is the output:
Explanation
The main idea is to count, for each row, how many times its show value has appeared so far. We use a user-defined LAMBDA function, CNTS, to avoid writing the same formula twice. Once we have the counts (cntsA, cntsB), we use MAP to iterate over the Table 1 elements together with their counts, and look up the matching show and count in the Table 2 columns. The FILTER function will always return a single value (based on the sample data). Finally, we prepare the expected output using HSTACK.
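The occurrence-count idea can also be sketched outside Excel. The Python below is only an illustration of the pairing logic (1st with 1st, 2nd with 2nd); the function names and the sample shows and ratings are assumptions, not the data from the question.

```python
from collections import defaultdict


def occurrence_numbers(values):
    """For each item, return how many times it has appeared so far (1-based)."""
    seen = defaultdict(int)
    result = []
    for v in values:
        seen[v] += 1
        result.append(seen[v])
    return result


def compare_tables(shows_a, ratings_a, shows_b, ratings_b):
    """Pair the n-th occurrence of a show in table A with its n-th occurrence in table B."""
    counts_b = occurrence_numbers(shows_b)
    lookup_b = {(s, c): r for s, c, r in zip(shows_b, counts_b, ratings_b)}
    out = []
    for s, c, r in zip(shows_a, occurrence_numbers(shows_a), ratings_a):
        # Like the formula, a tie falls through to "Table 2".
        out.append((s, "Table 1" if r > lookup_b[(s, c)] else "Table 2"))
    return out


# Hypothetical sample data.
result = compare_tables(
    ["X", "Y", "X", "Z"], [5, 3, 2, 7],
    ["Y", "X", "Z", "X"], [4, 1, 9, 6],
)
print(result)
```

Like the FILTER call in the formula, the dictionary lookup assumes every (show, occurrence) pair in Table 1 also exists in Table 2.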

Try-
=IF(INDEX(FILTER($B$3:$B$6,$A$3:$A$6=G3),COUNTIFS($G$3:$G3,G3))>INDEX(FILTER($E$3:$E$6,$D$3:$D$6=G3),COUNTIFS($G$3:$G3,G3)),"Table-1","Table-2")

Textjoin values of column B if duplicates are present in column A

I want to consolidate the data of column B into a single cell ONLY IF the index (ie., Column A) is duplicated.
For example:
Currently, I'm doing manually for each duplicated index by using the following formula:
=TEXTJOIN(", ",TRUE,B4:B6)
Is there a better way to do this all at once?
Any help is appreciated.

There may be an easier way, but you can try this formula:
=BYROW(A2:A17,LAMBDA(p,IF(INDEX(MAP(A2:A17,LAMBDA(x,SUM(--(A2:INDEX(A2:A17,ROW(x)-1)=x)))),ROW(p)-1,1)=1,TEXTJOIN(", ",1,FILTER(B2:B17,A2:A17=p)),"")))

Using REDUCE might be possible for a more succinct solution, though try this for now:
=BYROW(A2:A17,LAMBDA(ζ,LET(α,A2:A17,IF((COUNTIF(α,ζ)>1)*(COUNTIF(INDEX(α,1):ζ,ζ)=1),TEXTJOIN(", ",,FILTER(B2:B17,α=ζ)),""))))

For the sake of alternatives, here are two other ways to solve it:
Using XMATCH/UNIQUE
=LET(A, A2:A17, ux, UNIQUE(A),idx, FILTER(XMATCH(ux, A), COUNTIF(A, ux)>1),
MAP(SEQUENCE(ROWS(A)), LAMBDA(s, IF(ISNA(XMATCH(s, idx)), "", TEXTJOIN(",",,
FILTER(B2:B17, A=INDEX(A,s)))))))
or using SMALL/INDEX to identify the first element of the repetition:
=LET(A, A2:A17, n, ROWS(A), s, SEQUENCE(n),
MAP(A, s, LAMBDA(aa,ss, LET(f, FILTER(B2:B17, A=aa), IF((ROWS(f)>1)
* (INDEX(s, SMALL(IF(A=aa, s, n+1),1))=ss), TEXTJOIN(",",, f), "")))))
Here is the output:
Explanation
XMATCH and UNIQUE
The main idea here is to identify the unique elements of column A via ux, and find the index position of their first occurrence in A via XMATCH(ux, A). The result is an array of the same size as ux. Then COUNTIF(A, ux)>1 returns an array of the same size as the XMATCH output, indicating which values are repeated.
Here is the intermediate result:
XMATCH(ux, A)    COUNTIF(A, ux)>1
1                FALSE
2                FALSE
3                TRUE
6                FALSE
7                TRUE
9                TRUE
11               FALSE
12               TRUE
15               FALSE
16               FALSE
So FILTER takes only the rows from the first column where the second column is TRUE, i.e. the index positions (idx) where a repetition starts. For our sample it will be: {3;7;9;12}.
Now we iterate over the sequence of index positions (s) via MAP. If s is found in idx via XMATCH (XLOOKUP(s, idx, TRUE, FALSE) can also be used for the same purpose), then we join the values of column B filtered by column A equal to INDEX(A,s).
SMALL and INDEX
This is a more flexible approach: if you want the concatenation to appear at a different position within the repetition (say the second occurrence instead of the first), you only need to change the second argument of SMALL; the rest of the formula stays the same.
We iterate via MAP through elements of column A and index position (s). The name f has the filtered values from column B where column A is equal to a given value of the iteration aa. We need to identify only filtered rows with repetition, so the first condition ROWS(f) > 1 ensures it.
The second condition identifies only the first element of the repetition:
INDEX(s, SMALL(IF(A=aa, s, n+1),1))=ss
The second argument of SMALL indicates we want the first smallest value, but it could be the second, third, etc.
Where A is equal to aa, IF assigns the corresponding value of the sequence (remember that IF works as an array formula here); otherwise it assigns a value that can never be the smallest one, for example n+1, where n is the number of rows in column A. SMALL then returns the smallest index position. If the current index position ss is not the smallest one, the condition is FALSE.
Finally, we do a TEXTJOIN only when both conditions are met (we multiply them to ensure an AND condition).
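If it helps to see the two conditions outside of Excel, here is a small Python sketch of the same logic: join the values only for keys that occur more than once, and put the joined text at a chosen occurrence (the first by default, like SMALL(..., 1)). The function name, the occurrence parameter and the sample data are illustrative assumptions.

```python
def join_on_duplicates(keys, values, sep=", ", occurrence=1):
    """Return one string per row: joined values at the chosen occurrence
    of a duplicated key, and an empty string everywhere else."""
    positions = {}
    for i, k in enumerate(keys):
        positions.setdefault(k, []).append(i)

    out = [""] * len(keys)
    for idxs in positions.values():
        # First condition: the key must be repeated (ROWS(f) > 1).
        # Second condition: only the chosen occurrence receives the text.
        if len(idxs) > 1 and len(idxs) >= occurrence:
            out[idxs[occurrence - 1]] = sep.join(str(values[i]) for i in idxs)
    return out


# Hypothetical sample data.
keys = [1, 2, 3, 3, 3, 4]
values = ["a", "b", "c", "d", "e", "f"]
joined = join_on_duplicates(keys, values)
print(joined)
```

Passing occurrence=2 moves the joined text to the second row of each repetition, which corresponds to changing the second argument of SMALL.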

How/which formula to use, to show combine text results for false condition (for pending task reporting usage)?

I wanted to check whether CONCATENATE is the function to use (I'm not sure if my Excel has TEXTJOIN), and how to show only the header text for cells that are empty.
For example, in my attachment below, I want the intended result shown in B2 and B3, where the header texts are joined with a delimiter when the values are false (empty).
If I use CONCATENATE as in Row 10 and Row 11, it's rather manual and it only captures "positive values", i.e. non-blank cells.
Purpose: To show pending tasks (empty/blank status cells)

Use MID with CONCATENATED IFS:
=MID(IF(C2="","/"&$C$1,"")&IF(D2="","/"&$D$1,"")&IF(E2="","/"&$E$1,"")&IF(F2="","/"&$F$1,"")&IF(G2="","/"&$G$1,"")&IF(H2="","/"&$H$1,""),2,999)
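The trick in that formula is to prefix every pending header with the delimiter and then drop the first character with MID(..., 2, 999). A rough Python equivalent, with made-up header names and statuses, might look like this:

```python
def pending_tasks(headers, statuses, sep="/"):
    """Join the headers whose status cell is empty.

    Each pending header is prefixed with the delimiter, then the leading
    delimiter is cut off, as MID(..., 2, 999) does in the formula.
    """
    pieces = "".join(sep + h for h, s in zip(headers, statuses) if s == "")
    return pieces[len(sep):]


# Hypothetical header row (C1:H1) and one status row (C2:H2).
headers = ["Stat A", "Stat B", "Stat C", "Stat D", "Stat E", "Stat F"]
statuses = ["Done", "", "", "Done", "", "Done"]
pending = pending_tasks(headers, statuses)
print(pending)
```

When nothing is pending, the sketch returns an empty string, just as the formula returns an empty cell.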

I would use TEXTJOIN and FILTER if you have the newest version of Excel.
For example: =TEXTJOIN("/",1,FILTER($E$2:$I$2, ISBLANK(E3:I3)))
EDIT: For older versions, a temporary workaround is as follows:
1. Make a temporary array the same size as your original data range, where each value is determined by a formula such as =IF(ISBLANK(E3), E$2&"/","")
2. Use something like =LEFT(CONCAT(E15:J15), LEN(CONCAT(E15:J15))-1) to get the desired result (where E15:J15 is where I elected to store the first row of the temporary array created in step 1).

I am not sure of your Excel version, but I think this would work in older versions (formatted for readability - will work if you paste it directly into cell B2 and copy down):
=LEFT(CONCAT( INDEX( CHOOSE({1;2;3},$C$1:$H$1,{"/","/","/","/","/","/"},{"","","","","",""}),
INDEX( IF(ISBLANK(C2:H2),{1;2},{3;3}),
MOD(COLUMN(A1:INDEX(1:1,,12))-1,2)+1,
(COLUMN(A1:INDEX(1:1,,12))-1)/2+1 ),
(COLUMN(A1:INDEX(1:1,,12))-1)/2+1 ) ),
SUM(7*ISBLANK(C2:H2))-1 )
Notes
As this is an array formula, you may have to enter it with CTRL + SHIFT + ENTER with an older version of Excel.
The stat labels must all have a length of 6 characters as shown in your post. If not, then they must at least have the same length and the last line SUM(7*ISBLANK(C2:H2))-1 must be changed to replace the 7 with the string length + 1, e.g. a length of 9 would be SUM(10*ISBLANK(C2:H2))-1.
If they don't have the same length, the LEFT( can be removed along with the SUM(10*ISBLANK(C2:H2))-1) at the end. You will end up having a trailing / delimiter at the end. You could fix that for the case of stat F being the last part by changing {"/","/","/","/","/","/"} to {"/","/","/","/","/",""}, but the other cases would still have a trailing /. Another approach is much more complex, but the component SUM(10*ISBLANK(C2:H2))-1) could be shaped to identify what to cut off or maybe a helper column could be built - in any case, let's hope your situation is that the stat labels all have the same length.
The delimiter "/" can be changed, but must always be a single character. If not, then the last line must be changed to SUM( [label length + delimiter length] *ISBLANK(C2:H2)) - [delimiter length], so that the whole trailing delimiter is cut off.
This formula is fixed to 6 stat columns. If you need it to accommodate more, you can extend the {"/","/","/","/","/","/"} and {"","","","","",""} arrays (one element for each new column) and replace every 12 with 2 times the number of columns. Also, obviously, the references $C$1:$H$1 and C2:H2 must be changed to read in your new columns.
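The length arithmetic in these notes can be checked with a short Python sketch. It assumes, as the notes do, that every label has the same length; the function name and the sample labels and statuses are invented for illustration.

```python
def trim_fixed_width(labels, statuses, sep="/"):
    """Concatenate label+delimiter for each blank status, then keep
    blanks * (label length + delimiter length) - delimiter length characters."""
    label_len = len(labels[0])  # assumption: all labels share this length
    joined = "".join(h + sep for h, s in zip(labels, statuses) if s == "")
    blanks = sum(1 for s in statuses if s == "")
    keep = blanks * (label_len + len(sep)) - len(sep)
    # max(..., 0) is an addition of this sketch so that a row with no blanks
    # gives an empty string instead of a negative length.
    return joined[:max(keep, 0)]


labels = ["Stat A", "Stat B", "Stat C", "Stat D", "Stat E", "Stat F"]
statuses = ["Done", "", "", "Done", "", "Done"]
trimmed = trim_fixed_width(labels, statuses)
print(trimmed)
```

With 6-character labels and a 1-character delimiter, three blanks give 3 * 7 - 1 = 20 characters, which is exactly the joined text without its trailing "/".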

Excel array formula anomaly

I have an array formula in Excel that works fine in all cells of the array except when there is a change in the conditional tests, and I'm not sure why.
The array formula is:
{=TEXT(VALUE(Header!$A$2)+VALUE(ReadingID)
*(IF(EventID="2", 1,IF(EventID="4", 1,0))*(VALUE(Header!$N$2)/86400)
+IF(EventID="2", 0, IF(EventID="4", 0, 1))*(VALUE(Header!$M$2)/86400))
, "#.000000")}
Typical data for the formula cells value:
Header!$A$2 = '43432.40434' # An excel serial date/time number as text.
ReadingID = #incremental numbers as text e.g. '1000', '1001' etc.
EventID = # Values 1 or 2 or 3 or 4 as text.
Header!$M$2 = 60 # as text.
Header!$N$2 = 10 # as text.
The ReadingID and EventID columns are the same size as the array formula column.
Typical results when EventID changes from, say, "2" to "3", are as follows:
ReadingID EventID Result Diff
'1540 '2 43432.582581 0.000116
'1541 '2 43432.582696 0.000115
'1542 '3 43433.475173 0.892477
'1543 '3 43433.475868 0.000695
'1544 '3 43433.476562 0.000694
The Diff column is simply to show the increment from row to row and is consistent either side of the transition in EventID value (e.g. from "2" to "3"). The same anomaly occurs at all points where the EventID value changes (i.e. "1" to "2"; "3" to "4").
The array formula spans several thousand cells and returns the expected result in all other rows, except when EventID changes.
I originally tried an OR function to perform the incremental sum, but that didn't work, hence the nested IF statements.
Can anyone suggest if there is something wrong with the array formula, or how to avoid this rogue result?
NOTE: The data is in text format as it is being imported from elsewhere in CSV format and I would like to preserve the raw import.
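One possible reading of the numbers above: the formula multiplies the whole ReadingID by the interval that applies to the current row, so when EventID switches from the 10-second interval to the 60-second one, every earlier reading is effectively re-counted at 60 seconds. The Python sketch below reproduces the posted results under that reading. The constant and function names are mine, and the cumulative variant is only a guess at what might have been intended (each row advancing by its own interval), not a confirmed fix.

```python
from itertools import accumulate

START = 43432.40434  # Header!$A$2
FAST = 10            # Header!$N$2, seconds, used when EventID is "2" or "4"
SLOW = 60            # Header!$M$2, seconds, used for any other EventID


def interval_days(event_id):
    """Interval for one row, converted from seconds to days."""
    return (FAST if event_id in ("2", "4") else SLOW) / 86400


def as_written(reading_id, event_id):
    """The formula as posted: start + ReadingID * (interval of the current row)."""
    return START + int(reading_id) * interval_days(event_id)


def cumulative(event_ids):
    """Assumed alternative: each row adds only its own interval to a running total."""
    return [START + t for t in accumulate(interval_days(e) for e in event_ids)]


rows = [("1540", "2"), ("1541", "2"), ("1542", "3"), ("1543", "3"), ("1544", "3")]
results = [format(as_written(r, e), ".6f") for r, e in rows]
print(results)

running = cumulative([e for _, e in rows])
```

The jump of 0.892477 between 1541 and 1542 is 1542*60/86400 minus 1541*10/86400, so it comes from the arithmetic itself and not from the array evaluation.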

Using tbl.Lookup to match just part of a column value

This question relates to the Schematiq add-in for Microsoft Excel.
Using =tbl.Lookup(table, columnsToSearch, valuesToFind, resultColumn, [defaultValue]) the values in the valuesToFind column have a consistent 3 characters to the left and then varying characters after (e.g. 908-123456 or 908-321654 - i.e. 908 is always consistent)
How can I tell the function to lookup the value based on the first 3 characters only? The expected answer should be the sum of the results of the above, i.e. 500 + 300 = 800

tbl.Lookup() works by looking for an exact match - this helps ensure it's fast but in this case it means you need an extra step to calculate a column of lookup values, something like this:
A2: =tbl.CalculateColumn(A1, "code", "x => LEFT(x, 3)", "startOfCode")
This will give you a new column that you can use for the columnsToSearch argument, however tbl.Lookup() also looks for just one match - it doesn't know how to combine values together if there is more than one matching row in the table, so I think you also need one more step to group your table by the first 3 chars of the code, like this:
A3: =tbl.Group(A2, "startOfCode", "amount")
Because tbl.Group() adds values together by default, this will give you a table with a row for each distinct value of startOfCode and the subtotal of amount for each of those values. Finally, you can do the lookup exactly as you requested, which for your input table will return 800:
A4: =tbl.Lookup(A3, "startOfCode", "908", "amount")
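To make the three steps concrete, here is the same pipeline written as a plain Python sketch. It does not use the Schematiq API at all; it only models the logic (calculate a prefix column, group and sum, then do an exact lookup). The two 908 codes and the amounts 500 and 300 come from the question, but which amount belongs to which code, and the extra third row, are assumptions.

```python
from collections import defaultdict

# Hypothetical input table with "code" and "amount" columns.
table = [
    {"code": "908-123456", "amount": 500},
    {"code": "908-321654", "amount": 300},
    {"code": "777-000001", "amount": 50},  # extra row, invented for contrast
]

# Step 1: add a calculated column holding the first 3 characters of the code.
for row in table:
    row["startOfCode"] = row["code"][:3]

# Step 2: group by that column and add the amounts together.
grouped = defaultdict(int)
for row in table:
    grouped[row["startOfCode"]] += row["amount"]

# Step 3: an exact-match lookup on the grouped result.
total = grouped.get("908")
print(total)
```

Grouping first is what allows an exact-match lookup to return a combined total instead of just one matching row.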
