Formula To Remove Duplicates based On Two Columns - excel-formula

I have two columns and I would like to remove duplicates. If I use the built-in remove-duplicates function in Excel, too much information is removed.
For example, I have the following:
ColumnA ColumnB
E1 ABC
E1 ABC
E1 ABC
E1 CBA
E2 TTT
E2 TTT
E2 JJJ
What I would like to do is remove duplicates based on both columns, so that I end up with the following:
ColumnA ColumnB
E1 ABC
E1 CBA
E2 TTT
E2 JJJ
I have tried the following formula:
=IF(A2=A3,1)*AND(B2=B3,1)
This puts a 1 in some cells where there are duplicates (which could then be deleted), but it does not really test for all duplicates, so my formula is not correct.
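The formula above only compares each row with the row directly below it, so duplicates that are not adjacent are missed. As an illustration of the intended logic outside Excel, here is a minimal Python sketch. It assumes the rows are available as (ColumnA, ColumnB) pairs; the function name `remove_duplicate_pairs` is hypothetical and not part of the question.

```python
def remove_duplicate_pairs(rows):
    """Keep the first occurrence of each (ColumnA, ColumnB) pair, preserving order."""
    seen = set()
    unique_rows = []
    for row in rows:
        key = (row[0], row[1])  # a row is a duplicate only if BOTH columns match
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


# Sample data copied from the question.
rows = [
    ("E1", "ABC"), ("E1", "ABC"), ("E1", "ABC"), ("E1", "CBA"),
    ("E2", "TTT"), ("E2", "TTT"), ("E2", "JJJ"),
]
deduplicated = remove_duplicate_pairs(rows)
for column_a, column_b in deduplicated:
    print(column_a, column_b)
```

Because every row is checked against all pairs seen so far, the order of the rows does not matter.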

Related

return a list of all elements present in one column that is not present in the other

I have hundreds of items in columns a and b. I want to drop the items in column b that also appear in column a and list the remaining items in a separate column, in Excel or Google Sheets.
a   b   items present in b column only
a1  a1  a5
a2  a2  a6
a3  a5
a4  a6
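Excel:
Formula in C2:
=FILTER(B2:B5,COUNTIF(A2:A5,B2:B5)=0)
COUNTIF takes the range to search first, so this counts how often each item of column b occurs in column a and keeps the items with a count of 0.
Google-Sheets:
Almost the same, but with open-ended ranges:
=FILTER(B2:B,COUNTIF(A2:A,B2:B)=0)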
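The same filtering logic, keeping the items of column b that do not occur in column a, can be sketched outside the spreadsheet. This is only an illustration in Python; the function name `only_in_b` is hypothetical and the lists are the sample data from the question.

```python
def only_in_b(column_a, column_b):
    """Return the items of column_b that do not occur in column_a, in original order."""
    present_in_a = set(column_a)  # set lookup keeps this fast for hundreds of items
    return [item for item in column_b if item not in present_in_a]


column_a = ["a1", "a2", "a3", "a4"]
column_b = ["a1", "a2", "a5", "a6"]
result = only_in_b(column_a, column_b)
print(result)
```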

Spark Join Returning Null Values in Columns

I'm pulling my hair out trying to solve what I feel is an extremely simple problem, but I'm not sure if there's some spark voodoo occurring as well.
I have two tables, which are both very small. Table A has about 90K rows and Table B has about 2K rows.
Table A
A B C D
===========
a1 b1 c1 d1
a1 b1 c2 d2
a1 b1 c3 d3
a2 b2 c1 d1
a2 b2 c2 d2
.
.
.
Table B
A B E F
===========
a1 b1 e1 f1
a2 b2 e2 f2
I want a table that looks like
Result Table
A B C D E F
=================
a1 b1 c1 d1 e1 f1
a1 b1 c2 d2 e1 f1
a2 b2 c1 d1 e2 f2
.
.
.
I was a little loose, but the idea is I want to join the table with fewer rows on the table with more rows and it's okay to have multiple associated values in the final table.
This should be really simple:
table_a.join(table_b, table_a.a == table_b.a, table_a.b == table_b.b).select(..stuff..)
HOWEVER, for almost all of the resulting values in the Result Table (which should have about 90K rows since Table A has about 90K rows), I get null values in columns E and F.
When I save the result of just Table B, I see all the columns and values.
When I save the result of just Table A, I see all the columns and values.
(i.e., I could do the join with paper and pencil)
The weird thing is that even though ~89K rows have null values in columns E and F in the Result Table, there are a few values that do randomly join.
Does anyone know what's going on or how I can diagnose this?
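Have you tried the null-safe equality operator <=> (eqNullSafe in PySpark) instead of == in your join? With ==, a row whose join key is null never matches, even when the key is null in both tables.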
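To see why <=> can make a difference, here is a small pure-Python sketch of the two comparison rules. It is not Spark code: `left_join` is a hypothetical helper, `None` stands in for a SQL null, and it assumes the join in question behaves like a left join (which would explain the nulls in E and F). Null join keys are only one possible cause of the symptom described above.

```python
def left_join(left_rows, right_rows, keys, null_safe=False):
    """Left-join two lists of dicts on `keys`; None plays the role of SQL NULL.

    Assumes right_rows is non-empty so its column names can be read.
    """
    def keys_match(left, right):
        for key in keys:
            l_val, r_val = left[key], right[key]
            if l_val is None or r_val is None:
                # Ordinary equality never matches when either side is NULL;
                # null-safe equality matches only when both sides are NULL.
                if not (null_safe and l_val is None and r_val is None):
                    return False
            elif l_val != r_val:
                return False
        return True

    extra_columns = [c for c in right_rows[0] if c not in keys]
    joined = []
    for left in left_rows:
        matches = [r for r in right_rows if keys_match(left, r)]
        if not matches:
            # No partner found: keep the left row and fill the right columns with None.
            joined.append({**left, **{c: None for c in extra_columns}})
        for right in matches:
            joined.append({**left, **{c: right[c] for c in extra_columns}})
    return joined


# Hypothetical rows: the second row has a null in join column "b" on both sides.
table_a = [{"a": "a1", "b": "b1", "c": "c1"}, {"a": "a2", "b": None, "c": "c2"}]
table_b = [{"a": "a1", "b": "b1", "e": "e1"}, {"a": "a2", "b": None, "e": "e2"}]

plain = left_join(table_a, table_b, ["a", "b"])
safe = left_join(table_a, table_b, ["a", "b"], null_safe=True)
print([row["e"] for row in plain])
print([row["e"] for row in safe])
```

If the null-safe version fills in rows that the plain version leaves empty, null values in the join columns are worth checking in the real tables.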

Conditional transpose of rows to columns

I have the following table with around 500 rows that I need to transpose into columns:
A B
A1 B1
A2 B2
A3 B3
The result I'm trying to get is
A B C D E F
A1 B1 A2 B2 A3 B3
Because the result is the interleaving of two columns, I don't think this is a duplicate of the question indicated (at one time). Assuming the value A1 is in cell A2 (i.e. A and B are column labels in row 1), I suggest putting the following in C2 and copying it across to suit:
=IF(ISODD(COLUMN()),OFFSET($A1,COLUMN()/2,0),OFFSET($A1,(COLUMN()/2)-1,1))
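For comparison, the interleaving that the formula performs can be sketched in a few lines of Python. This is only an illustration of the logic, with a hypothetical function name and the sample values from the question.

```python
def interleave_columns(column_a, column_b):
    """Flatten two equally long columns into one row: a1, b1, a2, b2, ..."""
    row = []
    for a_value, b_value in zip(column_a, column_b):
        row.extend([a_value, b_value])
    return row


interleaved = interleave_columns(["A1", "A2", "A3"], ["B1", "B2", "B3"])
print(" ".join(interleaved))
```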

Multiple Column Duplicates : Sum

I have data like:
A1 B1 C1 v1
A1 B1 C1 v2
A1 B1 C2 v3
A1 B2 C1 v4
A2 B3 C2 v5 ....
I would like to sum the values of all duplicate tuples (A, B, C), but only if all three values are the same, that is, Ai = Aj, Bi = Bj and Ci = Cj.
I would like the result to be in format:
A1 B1 C1 [sum of relevant vs]
...
I know about SUMIF and pivot tables, but so far I couldn't get them to work as required.
Any help will be appreciated.
PS: A previous search on Stack Overflow revealed solutions for duplicates across a single column only. If I missed anything in my search, I am sorry and would appreciate a link to the relevant thread.
A pivot table seems the most appropriate solution to me. Put all three columns A, B and C under Row Labels and put the 4th column under Values. It should automatically sum the values in the 4th column.
After that, under PivotTable Design, pick the Tabular form and Repeat All Item Labels for the report layout, and then Do Not Show Subtotals.
You will get one row per (A, B, C) combination with the sum of its values.
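The grouping that the pivot table performs can also be sketched in Python. The function name `sum_by_key` is hypothetical, and the numbers 10 to 50 are made-up stand-ins for v1 to v5, which the question leaves symbolic.

```python
from collections import defaultdict


def sum_by_key(rows):
    """Sum the 4th column for every distinct (A, B, C) combination."""
    totals = defaultdict(int)
    for a, b, c, value in rows:
        totals[(a, b, c)] += value  # rows group together only if all three keys match
    return dict(totals)


# Made-up values in place of v1..v5.
rows = [
    ("A1", "B1", "C1", 10),
    ("A1", "B1", "C1", 20),
    ("A1", "B1", "C2", 30),
    ("A1", "B2", "C1", 40),
    ("A2", "B3", "C2", 50),
]
totals = sum_by_key(rows)
for (a, b, c), total in totals.items():
    print(a, b, c, total)
```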

How do I group the records which have the same value for the "NAME" column?

My data
NAME QTY LOCATION
abc 3 a1
abc 3 a3
abc 3 a4
cdf 4 c5
cdf 4 c7
cdf 4 c1
cdf 4 c9
ghi 6 g12
ghi 6 g5
ghi 6 g17
ghi 6 g6
ghi 6 g89
ghi 6 g1
My desired result
NAME QTY LOCATION
abc 3 a1, a3, a4
cdf 4 c5, c7, c1, c9
ghi 6 g12, g5, g17, g6, g89, g1
How can I do this automatically using function(s) in Excel?
I have created column C as "helper" to concatenate all relevant locations. Then I use column D to only show the last entry and "filter" the intermediate results.
As to the formulas:
C1: =B1
C2: =IF(A2=A1,C1&", "&B2,B2)
C3: =IF(A3=A2,C2&", "&B3,B3)
C4: etc...
D1: =IF(A1=A2,"",C1)
D2: etc...
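The helper-column idea (keep appending while the name stays the same, then keep only the last row of each run) corresponds to grouping consecutive rows. A minimal Python sketch of that idea follows; it assumes, like the formulas, that equal names sit on adjacent rows, and the function name `concatenate_runs` is hypothetical.

```python
from itertools import groupby


def concatenate_runs(rows):
    """rows: (name, qty, location) tuples with equal names on consecutive rows."""
    result = []
    # groupby only merges ADJACENT rows with the same key, like the helper column.
    for (name, qty), run in groupby(rows, key=lambda row: (row[0], row[1])):
        locations = ", ".join(row[2] for row in run)
        result.append((name, qty, locations))
    return result


rows = [
    ("abc", 3, "a1"), ("abc", 3, "a3"), ("abc", 3, "a4"),
    ("cdf", 4, "c5"), ("cdf", 4, "c7"), ("cdf", 4, "c1"), ("cdf", 4, "c9"),
    ("ghi", 6, "g12"), ("ghi", 6, "g5"), ("ghi", 6, "g17"),
    ("ghi", 6, "g6"), ("ghi", 6, "g89"), ("ghi", 6, "g1"),
]
combined = concatenate_runs(rows)
for name, qty, locations in combined:
    print(name, qty, locations)
```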
Makes a bit more sense with the formatting. Still a little extra info on where you want to copy to and from along with your attempts would be helpful.
I would suggest using VBA for this rather than formulas. If you don't know VBA, I would suggest recording a macro whilst going through the steps to copy the data manually. You can then view the code created by Excel, and you will just need to replace the absolute cell references with a logical progression through relative addresses. Have a go at this, then feel free to ask if there are specifics you are stuck on.
This is somewhat possible without VBA, using plain Excel formulas.
I used your test data in columns A, B and C and put the solution in columns E-L.
(This should work regardless of the order of the data in column A; that is why I have one row misplaced in my test data.)
The only non-automated part of this solution is that you must manually enter the names in column E. Whatever name you enter there, the cells to its right will automatically populate with the QTY and locations.
This is slightly complicated, but here are the 3 formulas I used:
Column F: =OFFSET($B$1,MATCH(E1,A:A,0)-1,0)
This will grab the QTY value
Column G: =OFFSET($C$1,MATCH($E1,$A$1:$A14,0)-1,0)
This will get the first location corresponding to the name
Column H-L: =IFERROR(OFFSET($C$1,MATCH($E1,INDIRECT("$A"&MATCH(G1,$C$1:$C14,0)+1&":$A$14"),0)+MATCH(G1,$C$1:$C14,0)-1,0),"")
Put this formula in column H and then fill it across to the right. It grabs the next location relative to the one to its left.
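The lookup approach above does not rely on the rows being sorted by name. As a rough illustration of that order-independent grouping, here is a Python sketch; `group_locations` is a hypothetical name, and the sample rows are a shortened version of the question's data with one row deliberately out of place.

```python
def group_locations(rows):
    """Collect the locations of each (name, qty) pair, wherever its rows appear."""
    grouped = {}  # dicts keep insertion order, so names appear in first-seen order
    for name, qty, location in rows:
        grouped.setdefault((name, qty), []).append(location)
    return [(name, qty, ", ".join(locations))
            for (name, qty), locations in grouped.items()]


# The "abc" row with location a4 is deliberately misplaced.
rows = [
    ("abc", 3, "a1"), ("abc", 3, "a3"),
    ("cdf", 4, "c5"), ("cdf", 4, "c7"),
    ("abc", 3, "a4"),
    ("ghi", 6, "g12"), ("ghi", 6, "g5"),
]
grouped_rows = group_locations(rows)
for name, qty, locations in grouped_rows:
    print(name, qty, locations)
```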
Assuming a1 is in C2, into D2 put:
=IF(COLUMN()<COUNTIF($A:$A,$A2)+4,IF($A2=$A3,INDIRECT("$c"&ROW()+COLUMN()-4),""),"")
Copy across and down to suit (say to ColumnZ).
Select the entire sheet, copy it, and Paste Special Values over the top.
In C2 put =A1=A2 and copy down.
Filter ColumnC to select TRUE and delete all visible rows.
In C1 put =D1&","&E1&","&F1&","&G1&","&H1&","&I1 and so on to suit and copy down to suit.
Copy ColumnC and Paste Special Values over the top.
Replace ,, with nothing in ColumnC.
Find somewhere in Row1 to put =IF(RIGHT(C1,1)=",",LEFT(C1,LEN(C1)-1),C1) and copy down.
Copy that column and Paste Special Values over the top of ColumnC.
Delete "that" column.
Add Column labels to suit.

Resources