Remove swapped duplicate rows using Excel

I have ID pairs in two columns. Some pairs are redundant, but in swapped form. How can I remove the redundant ID pairs using Excel?
Here is an example.
Initial data:
col1 col2
id1 id2
id2 id1
id3 id8
id1 id5
id1 id6
id2 id9
It needs to end up like this:
col1 col2
id1 id2
id3 id8
id1 id5
id1 id6
id2 id9
(Note that the 2nd row id2 id1 is deleted, because it is a swapped duplicate).
Thanks..

1. Write MAX and MIN formulas in the two cells adjacent to each pair, so every pair is stored in the same order (MAXA did not work for some reason).
2. Copy the values only to some other place and remove duplicates.
PS: I tried to transpose, sort, then transpose back, but it sorts everything as one big record rather than as tuples. Hence the formulas.

I would use the following steps:
Join the two ids in a fixed order in column C:
=IF(A2<B2,A2&"|"&B2,B2&"|"&A2)
Find the unique values in column D (this is an array formula and must be entered with Ctrl-Shift-Enter):
=IFERROR(INDEX($C$2:$C$7, MATCH(0, COUNTIF($D$1:D1, $C$2:$C$7), 0)),"")
Separate the first id in column E:
=IFERROR(LEFT(D2,FIND("|",D2)-1),"")
Separate the second id in column F:
=IFERROR(RIGHT(D2,LEN(D2)-FIND("|",D2)),"")
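For anyone who wants to check the logic outside Excel, here is a minimal Python sketch of the same idea: build an order-independent key per pair and keep only the first row for each key. The function name `drop_swapped_duplicates` is my own, and unlike the formulas above it keeps the surviving row in its original orientation.

```python
def drop_swapped_duplicates(pairs):
    """Keep the first occurrence of each unordered pair, preserving row order."""
    seen = set()
    kept = []
    for a, b in pairs:
        # Same ordering rule as IF(A2<B2, A2&"|"&B2, B2&"|"&A2)
        key = (a, b) if a < b else (b, a)
        if key not in seen:
            seen.add(key)
            kept.append((a, b))
    return kept


pairs = [
    ("id1", "id2"),
    ("id2", "id1"),
    ("id3", "id8"),
    ("id1", "id5"),
    ("id1", "id6"),
    ("id2", "id9"),
]
result = drop_swapped_duplicates(pairs)
print(result)
```

Note that exact (non-swapped) duplicates are removed as well, since they share the same key.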

Assuming you're dealing with columns A and B with a header in each column, I'd put this in cell C2 and drag it down. Then delete any row where the result is greater than 0:
=COUNTIFS($B$2:B2,A2,$A$2:A2,B2)
To clarify what's happening here, the $ signs in the ranges act as anchors when you drag down. So making the range $B$2:B2 means the next cell down will use $B$2:B3, followed by $B$2:B4, and so on.
The COUNTIFS() formula returns a count where ALL the criteria are met, so only those rows with both values switched are counted.
Using the COUNTIFS() with the anchored range like I've done here only counts whatever comes above that row, which is why you can delete any non-zeros without losing the unique values.
I'd recommend verifying this in case there's something I didn't think of, of course.
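As a way of verifying it, here is a small Python sketch that mimics the running COUNTIFS row by row. The function name `swapped_counts` is hypothetical; the window grows by one row each step, like the anchored range `$A$2:A2`.

```python
def swapped_counts(rows):
    """For each row, count rows from the top down to (and including) the
    current one whose column B equals this row's A and whose column A
    equals this row's B."""
    counts = []
    for i, (a, b) in enumerate(rows):
        window = rows[: i + 1]  # grows like $A$2:A2, $A$2:A3, ...
        counts.append(sum(1 for x, y in window if y == a and x == b))
    return counts


rows = [
    ("id1", "id2"),
    ("id2", "id1"),
    ("id3", "id8"),
    ("id1", "id5"),
    ("id1", "id6"),
    ("id2", "id9"),
]
counts = swapped_counts(rows)
kept = [row for row, n in zip(rows, counts) if n == 0]
print(counts)
print(kept)
```

One case the sketch brings out: a row whose two ids are identical matches itself and gets a count of 1, so it would be deleted even though it has no swapped partner.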

Related

How to restore formulas of table rows to normal state after sorting in excel?

I have a table in Excel whose data come from formulas. For example, cell A2 equals E7*2 from another sheet.
When I sort the table from largest to smallest, the formulas change. For example, the formula in A2 becomes E18*2 from the other sheet.
Apart from Ctrl+Z (undo), is there any way to restore the table formulas to their initial state, especially if I have saved the file and reopen it after a while? I mean that the formula in A2 should refer to E7 again, not E18.
Thanks
I don't think you can have both: sorting and referencing other cells.
What you can do is build one formula that does both the multiplying and the sorting.
If you have Excel 365 you can use this formula:
=SORT(CHOOSE({1,2},rowdata[Num1]*2,rowdata[Num2]),1,-1)
CHOOSE re-builds the rowdata table but multiplies the values of the first column by two.
Caveat of this solution: you can't use a table for the result.
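The idea behind the formula (derive a new array from the source, then sort the derived array and leave the source alone) can be sketched in Python like this. The column names Num1 and Num2 come from the formula; the sample values are made up for illustration.

```python
# Hypothetical source data standing in for the rowdata table
rowdata = [
    {"Num1": 3, "Num2": "a"},
    {"Num1": 7, "Num2": "b"},
    {"Num1": 5, "Num2": "c"},
]

# Like CHOOSE({1,2}, rowdata[Num1]*2, rowdata[Num2]): rebuild the two columns,
# doubling the first one
derived = [(row["Num1"] * 2, row["Num2"]) for row in rowdata]

# Like SORT(..., 1, -1): sort by the first column, descending
result = sorted(derived, key=lambda pair: pair[0], reverse=True)
print(result)
```

Because the sort happens on the derived copy, the source rows never move, so nothing that refers to them is disturbed.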
I feel like a real geek for writing this, but I have done something similar and I have found a really funny solution:
Sheet1 :
Col_A Col_B Col_C
a b c
=5-Sheet2!A2
=5-Sheet2!A3
=5-Sheet2!A4
Sheet 2:
Col_A
a
1
2
3
So my "Sheet1" looks like this:
Col_A Col_B Col_C
a b c
4
3
2
When I sort Col_B in "Sheet1", I get the order 2, 3, 4, but how do I get the original order back?
Well, in Col_C I have put =FORMULATEXT(B2) (and the same in C3 and C4, obviously), which gives something like:
Col_A Col_B Col_C
a b c
4 =5-Sheet2!A2
3 =5-Sheet2!A3
2 =5-Sheet2!A4
Now, when I order according to Col_B, I get 2, 3, 4, and if I want to get it back, I just order on Col_C, and everything gets ordered back as it was.
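Here is a rough Python sketch of that round trip, treating each row as a (value, formula text) pair. It is only an illustration of sorting on a helper column, not of how Excel itself handles the formulas.

```python
# (value, formula text) pairs as in the table above
rows = [
    (4, "=5-Sheet2!A2"),
    (3, "=5-Sheet2!A3"),
    (2, "=5-Sheet2!A4"),
]

# Sort on the value column: 2, 3, 4
by_value = sorted(rows, key=lambda row: row[0])

# Sort on the helper text column to get the original order back
restored = sorted(by_value, key=lambda row: row[1])
print(restored)

# Plain text comparison puts "A10" before "A2"
print("=5-Sheet2!A10" < "=5-Sheet2!A2")
```

The last line shows a limit of sorting on text: with ten or more rows, a reference like A10 compares lower than A2 in Python, so it is worth checking how your Excel sort treats such keys before relying on this for longer tables.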

Dynamic multiple criteria on sumifs

I have been thinking about this for 3 days now. There is a formula that will do this, but I can't figure it out. I hope someone can help me with this.
I'm trying to sum a column in the spreadsheet based on several conditions:
ID1 ID2 ID3 ID4 Value
SW A 1 X 4
SW B 2 Y 5
SE C 2 Y 6
SE A 2 X 3
NE A 0 X 2
SE A 1 X 3
I would like to sum the value column based on the following conditions:
ID1 = SW, SE
ID2 = A, C
ID3 = 1, 0
ID4 = X, Y
Based on the conditions above, sum should be 7
It seems SUMIFS can only handle one multi-value (array) criterion: once I added the ID2 criteria, only the SW rows of ID1 were totaled and the formula excluded SE. (The formula below does not yet include the ID4 criteria.)
=SUMIFS(RawData!G:G,RawData!$D:$D,IF($B$7="","<>",{"SW","SE"}),RawData!$C:$C,$C21,RawData!$E:$E,IF($B$9="","<>",{"A","C"}),RawData!$F:$F,IF($B$10="","<>",{"1","0"}))
Is there any way it can handle multiple criteria which in each criteria is an array? Thanks!
SUMIFS can handle up to two criteria lists (assuming you want to count all possible combinations), as long as one is a "row" and one is a "column" (or transposed to be so), e.g.
=SUMPRODUCT(SUMIFS(Sumrange,Critrange1,{"x","y"},Critrange2,{"a";"b"}))
Note that {"x","y"} uses a comma separator (a row) while {"a";"b"} uses a semicolon (a column); it has to be like that.
SUMIFS produces an array of 4 values (all possible combinations) so SUMPRODUCT is used to sum those 4 values. For 3 or more criteria lists use SUMPRODUCT with MATCH, e.g.
=SUMPRODUCT(Sumrange,ISNUMBER(MATCH(Critrange1,{"SW","SE"},0)*MATCH(Critrange2,{"A","C"},0)*MATCH(Critrange3,{1,0},0)*MATCH(Critrange4,{"X","Y"},0))+0)
Where all ranges are the same dimensions
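To check the expected result from the question, here is a minimal Python sketch of the same logic: a row contributes its value only if every ID column matches one of the allowed values for that column. The variable names are my own.

```python
# (ID1, ID2, ID3, ID4, Value) rows from the question
rows = [
    ("SW", "A", 1, "X", 4),
    ("SW", "B", 2, "Y", 5),
    ("SE", "C", 2, "Y", 6),
    ("SE", "A", 2, "X", 3),
    ("NE", "A", 0, "X", 2),
    ("SE", "A", 1, "X", 3),
]

# One set of allowed values per ID column, in column order
allowed = [{"SW", "SE"}, {"A", "C"}, {1, 0}, {"X", "Y"}]

# Sum the value of every row that passes all four membership tests,
# the same role ISNUMBER(MATCH(...)) plays in the formula
total = sum(
    row[4]
    for row in rows
    if all(row[i] in allowed[i] for i in range(4))
)
print(total)
```

Only the first and last rows pass all four tests, which gives the 7 the question expects.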
Your IF functions on the criteria complicate this, and are more difficult to accommodate in this type of formula, but you can cater for those like this, assuming that when the relevant cell is blank all non-blanks are counted:
=SUMPRODUCT(Sumrange,((B7="")*(Critrange1<>"")+ISNUMBER(MATCH(Critrange1,{"SW","SE"},0))>0)*((B8="")*(Critrange2<>"")+ISNUMBER(MATCH(Critrange2,{"A","C"},0))>0)*((B9="")*(Critrange3<>"")+ISNUMBER(MATCH(Critrange3,{1,0},0))>0)*((B10="")*(Critrange4<>"")+ISNUMBER(MATCH(Critrange4,{"x","y"},0))>0))
Note that the {1,0} match in this formula only matches numeric values. If the data is stored as text, quotes need to be added, e.g. {"1","0"}. This works differently in SUMIFS, where you can use quotes or not and it will count both text and numbers.
In this version you can use commas or semicolons in the MATCH functions, as long as you are consistent within each MATCH.
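The optional-criteria behaviour can be sketched in Python as follows, assuming a criterion of None stands for a blank control cell (B7 to B10) and means "accept any non-blank value". The function name `conditional_sum` is hypothetical.

```python
def conditional_sum(rows, criteria):
    """Sum the last field of each row whose ID fields pass every criterion.

    A criterion is either a set of allowed values, or None, which stands
    for a blank control cell and accepts any non-blank value.
    """
    total = 0
    for *ids, value in rows:
        passes = all(
            (cell != "") if allowed is None else (cell in allowed)
            for cell, allowed in zip(ids, criteria)
        )
        if passes:
            total += value
    return total


rows = [
    ("SW", "A", 1, "X", 4),
    ("SW", "B", 2, "Y", 5),
    ("SE", "C", 2, "Y", 6),
    ("SE", "A", 2, "X", 3),
    ("NE", "A", 0, "X", 2),
    ("SE", "A", 1, "X", 3),
]

print(conditional_sum(rows, [{"SW", "SE"}, {"A", "C"}, {1, 0}, {"X", "Y"}]))
print(conditional_sum(rows, [{"SW", "SE"}, None, None, None]))
# Text criteria do not match numeric data, as with MATCH
print(conditional_sum(rows, [None, None, {"1", "0"}, None]))
```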

Finding and obtaining the difference between two columns

I have two columns. Each column has tens of thousands of values. I need to find the differences between them and print them in some cells. I read similar questions, but they don't cover my case: highlighting the differing cells is not enough for me, because it would be very tiring to search tens of thousands of cells for highlighted ones. So I need to obtain the values themselves.
Example:
Column1 Column2
John Jennifer
Mary Washington
Joe John
Michael Texas
Houston Newyork
Texas Mary
Values existing in col1 but not col2: Joe, Michael, Houston
Values existing in col2 but not col1: Jennifer, Washington, Newyork
Algorithmically, I need to check whether each row of Column1 exists in Column2; if it does not exist in Column2, the value is taken.
Similarly, I need to check whether each row of Column2 exists in Column1; if it does not exist in Column1, the value is taken.
Thanks
Old: If this is a one-off, I'd make a backup, then select the two columns and use "Remove Duplicates", which is a button on the Data tab.
You would be left with unique records only, which is what I understand you want.
Edit: If you want to completely remove duplicate values: assuming col1 is "A" and col2 is "B", this formula shows the record only if it is in A but not in B. You can make a similar one for B not in A. In my example I made two columns, C and D, for the unique values. Then filter the list on these being non-blank, and you have a set of unique records.
The formula below is for cell A2; in my example it is placed in C2. It uses semicolons as argument separators; use commas if your regional settings require them.
=IF(ISERROR(MATCH(A2;B:B;0));A2;"")
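The same check in both directions can be sketched in Python using the sample data from the question. The helper name `only_in` is my own. Keep in mind that Python compares strings case-sensitively, whereas MATCH ignores case.

```python
def only_in(first, second):
    """Return the values of `first` that never appear in `second`, in order."""
    lookup = set(second)  # set membership keeps this fast for large columns
    return [value for value in first if value not in lookup]


column1 = ["John", "Mary", "Joe", "Michael", "Houston", "Texas"]
column2 = ["Jennifer", "Washington", "John", "Texas", "Newyork", "Mary"]

print(only_in(column1, column2))
print(only_in(column2, column1))
```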

VLOOKUP for multiple entries

I have a few columns in a sheet. The first column holds first names and the fifth holds their respective ages. If I want to search the age column for a particular age, say 12, and return the corresponding first names in a separate sheet, what should I do? I tried VLOOKUP but I could not figure out the logic. Can someone help me out? Thank you.
Unfortunately VLOOKUP will not work in this situation, as VLOOKUP cannot return values from columns to the left of the lookup column. However, you can use a combination of the INDEX and MATCH functions. Let's say you have the following table:
A B
mark 11
john 23
Selly 30
Youbaraj 45
and you want to get the value in column A based on a value in column B (23 in this example), you can use something like
=INDEX(A1:A20,MATCH(23,B1:B20,0))
You can use INDEX and MATCH to do what HLOOKUP and VLOOKUP do, looking into any column and returning values from either side.
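A minimal Python sketch of the INDEX/MATCH idea, using the small table above: find the position of the target in the lookup column, then read the same position from the return column. The function names are hypothetical. Since the question asks for all names with a given age, the second helper returns every match instead of only the first.

```python
def index_match(return_column, lookup_column, target):
    """Return the entry of return_column at the first position where
    lookup_column equals target (raises ValueError if there is no match)."""
    position = lookup_column.index(target)
    return return_column[position]


def all_matches(return_column, lookup_column, target):
    """Return every entry of return_column whose lookup value equals target."""
    return [r for r, key in zip(return_column, lookup_column) if key == target]


names = ["mark", "john", "Selly", "Youbaraj"]
ages = [11, 23, 30, 45]

print(index_match(names, ages, 23))
print(all_matches(names, ages, 23))
```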
You can use INDEX with a double MATCH to get answers from a column by entering its name.
Example:
A B C D
1 col1 col2 col3 col4
2 val1 val2 val3 val4
3 val5 val6 val7 val8
Suppose cell C10 holds the label "ColumnName:"
and in cell C11 you enter the name of a column.
Now see what happens with this function (val-to-look-for is a placeholder for the value you search for, here assumed to be in the first column):
=INDEX(A1:D3,MATCH(val-to-look-for,A1:A3,0),MATCH(C11,A1:D1,0))
You can type a column name dynamically and get the lookup value from that column.
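Here is a short Python sketch of the double-match lookup on the example table, assuming the row is found by its value in the first column and the column by its header name. The function name `two_way_lookup` is my own.

```python
def two_way_lookup(table, row_key, column_name):
    """Look up the cell at the row whose first entry is row_key and the
    column whose header is column_name."""
    header = table[0]
    column = header.index(column_name)                 # MATCH on the header row
    row = [line[0] for line in table].index(row_key)   # MATCH on the first column
    return table[row][column]                          # INDEX(row, column)


table = [
    ["col1", "col2", "col3", "col4"],
    ["val1", "val2", "val3", "val4"],
    ["val5", "val6", "val7", "val8"],
]

print(two_way_lookup(table, "val5", "col3"))
```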
VLOOKUP is very easy to use; however, for an approximate match the first column must be sorted in ascending order for it to work properly (I usually use only the exact-match argument, which does not need sorting).

Selecting a row in excel based on specific values in 2 columns,

Data roughly in the format
A B C
ID1 ID2 0.5
ID1 ID3 0.7
ID2 ID3 0.9
I want to create a correlation matrix (column C being the correlation between the IDs in A and B). It can definitely be done with a pivot table, though I have to use sum which could be risky if a duplicate existed since an error might not be apparent. Output format would be:
ID1 ID2 ID3
ID1 1 .5 .7
ID2 .5 1 .9
ID3 .7 .9 1
(the '1' is easily done with an =IF(B$2=$A3,1,0) and replacing 0 with the formula to find the correlation)
I basically want a match on (col A = ID1 && col B = ID2). I suspect it could be done by concatenation, but I am not sure that is a great solution. MATCH, VLOOKUP etc. only return the first match [in that column], which is no good to me. I guess I'm longing for a 'where' clause?
My searches did not reveal any usable help. I have already calculated the correlations and am bringing them into Excel from SQL. So yeah, any ideas would be super, a pivot table being a last resort.
Thanks.
Assuming your source data range is on Sheet1, from A1 to C3, and your results range is on Sheet2, from A1 to D4,
you can put this formula in B2:
=SUMPRODUCT((Sheet1!$A$1:$A$3=Sheet2!B$1)*(Sheet1!$B$1:$B$3=Sheet2!$A2)*Sheet1!$C$1:$C$3)
and then drag the formula across the whole range.
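If it helps to see the intended output, here is a Python sketch that builds the full symmetric matrix from the pair list. It is a different technique from the SUMPRODUCT formula: it treats (ID1, ID2) and (ID2, ID1) as the same pair and raises an error on a duplicate pair instead of silently summing it, which addresses the worry in the question. The function name `correlation_matrix` is my own.

```python
def correlation_matrix(pairs):
    """Build a symmetric matrix from (id_a, id_b, value) rows.

    Raises ValueError if the same unordered pair appears twice.
    """
    ids = sorted({name for a, b, _ in pairs for name in (a, b)})
    lookup = {}
    for a, b, value in pairs:
        key = frozenset((a, b))  # order-independent key
        if key in lookup:
            raise ValueError(f"duplicate pair: {a}, {b}")
        lookup[key] = value
    matrix = [
        [1.0 if r == c else lookup.get(frozenset((r, c))) for c in ids]
        for r in ids
    ]
    return ids, matrix


pairs = [("ID1", "ID2", 0.5), ("ID1", "ID3", 0.7), ("ID2", "ID3", 0.9)]
ids, matrix = correlation_matrix(pairs)
print(ids)
for line in matrix:
    print(line)
```

Pairs that are missing from the data come out as None rather than 0, so gaps stay visible.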
Why don't you create a third column that combines the values from columns A and B using =A1&B1, and then do a VLOOKUP on that value:
A B C D
ID1 ID2 ID1ID2 0.5
ID1 ID3 ID1ID3 0.7
ID2 ID3 ID2ID3 0.9

Resources