EXCEL - Comparing Two Columns - Removing Repeats

EXCEL - Comparing Two Columns - Removing Repeats - excel

If two households share, they create a tie and this tie has a kinship rank that does not change, no matter how often two households share with each other.
KINSHIP RANK EXAMPLE
As you can see, it doesn't matter in which "direction" the tie happened whether it was household 5 who shared to household 3 or vice versa, the kinship rank is still 1
HH1 HH2 RANK
5 3 1
3 5 1
Therefore, I do not need every tie that occurs between two households, but only the first instance that a tie occurred between the two households.
So here is a sample list of many households who shared with each other, sometimes sharing resources with themselves, sharing only once, or sharing multiple times with the same household.
TWO HOUSEHOLD WITH REPEATED TIES
COL.A COL.B
ROW HH1 HH2
1 1 1
2 1 2
3 1 3
4 2 1
5 2 4
6 3 1
7 3 2
8 3 4
9 4 2
This is what I need it to look like:
TWO HOUSEHOLDS WITHOUT REPEATED TIES
COL.A COL.B
ROW HH1 HH2
1 1 1
2 1 2
3 1 3
4 2 4
5 3 2
6 3 4
What I have done
I wrote a simple command for placing the HH1 and HH2 information into the same cell:
=A1&"|"&B1
In the case of the second row, this looks like 1|2 inside cell C2
HH1 and HH2 are combined in column C so how will I be able to compare all of the households in column C to each other? Perhaps a highlighting rule if a repeat happens? Or in another column list if it is a delete or a keep?
Thank you for your assistance everyone.

I suggest a simple COUNTIFS to do the job like this:
=(COUNTIFS(A$1:A1,B2,B$1:B1,A2)+COUNTIFS(A$1:A1,A2,B$1:B1,B2))>0
starting in C2 and then copy down. It will show TRUE for each row which is within the range above it and false if not. Ich checks for both x/y and y/x (the order doesn't matter)
Now simply filter col C to only show rows with TRUE in it. Then simply select and delete it.
This also works with non numerical values like names.
If you still have any questions, just ask ;)
You also can wrap it up to get more informations like this:
=IF((COUNTIFS(A$1:A1,B2,B$1:B1,A2)+COUNTIFS(A$1:A1,A2,B$1:B1,B2)),"",COUNTIFS(A:A,B2,B:B,A2)+COUNTIFS(A:A,A2,B:B,B2))
For C2 and copy down. C1 gets:
=COUNTIFS(A:A,B2,B:B,A2)+COUNTIFS(A:A,A2,B:B,B2)
This will show you only at the first occurrence how many times it is within the whole range.
All done by phone, may contain errors

Use =((A1*B1)/(A1+B1))*((A1*B1)+(A1+B1)) to create unique identifiers. Then use Remove Duplicates in the Data Tools Pane of the Data Tab to remove all rows containing duplicates. Or, alternatively, use something like =IF(IFNA(MATCH(A2,A$1:A1,0),TRUE())=TRUE,"First Share","") dragged and dropped from row 2 to identify First Shares.

Related

Find all values in Column A which are present in Column B

Consider the sheet below:
A
B
1
4
3
5
2
2
5
0
4
1
I want to find if there is a match for each row of column 1 with any row of column 2. So ideally this would give me:
A
B
C
1
4
Yes
3
5
No
2
2
Yes
5
0
Yes
4
1
Yes
As a first and simple step, I am using =MATCH(A2,B2:B6) to get the index of the match and then manually calling this across the rows to get something like this:
A
B
C
1
4
6
3
5
-
2
2
3
5
0
2
4
1
1
I am now having a problem:
I want to apply this for a row of 500 in A and 2000 in B. I was thinking of manually filling in the first few rows and then select and drag over the first 500 rows. This however does not work as for each subsequent cell, it just changes the formula to =MATCH(A(N +1),B2 + N:B6 + N) which gives me wrong values and at worst, just repeats the older pattern ahead.
Can anyone help me with how I can just use the MATCH function to find all the values in A that are present in B?

Let me continue where you arrived:
=MATCH(A2,B2:B6,0)
(You forgot the last zero)
This formula is correct, but it is also wrong.
???
Well, when you drag it down, you get:
=MATCH(A3,B3:B7,0)
This is not what you want: you want the search term (A2) to change into A3 but you want the search array (B2:B6) not to change. In order to get this done, you need to work with absolute references. This looks like this:
=MATCH(A2,B$2:B$6,0)
When you drag this down, this is what you get:
=MATCH(A3,B$2:B$6,0)
=> ok so far.
Problem now: you need to translate your current results (a number or #N/A) into "yes" or "no". This can be done in numerous ways, let me give you an example:
=IF(ISERROR(MATCH(A2,B$2:B$6,0)),"No","Yes")
One remark: there exists an IFERROR() function in Excel, but this does not have an "else"-clause, hence the choice for the IF(ISERROR( combination.

Within Sheets you may try this out:
=index(if(len(A2:A),if(ifna(xmatch(A2:A,B2:B)),"Yes","No"),))

If you want to separate those matching values then could use FILTER() function.
=FILTER(A1:A5,COUNTIFS(B1:B9,A1:A5))
And for YES, NO dynamically, try MAP() function.
=MAP(A1:A5,LAMBDA(x,ISNUMBER(XMATCH(x,B:B))))

Excel formula to count how many times a part number is used in a top level assembly (no UNIQUE or FILTER)

I have a list of part numbers that are used in 4 different top level assemblies. The parts can be used in 1 to 4 of the top level assemblies. I'm trying to write a formula that will count how many unique top level assemblies a part number occurs in. I had previously written a formula that worked, but it uses UNIQUE and FILTER, and my coworkers don't have Excel 365, so those formulas aren't supported for them. I've been trying to come up with a workaround and would really appreciate any help :)
I have an example (I can't provide any real data) section of our spreadsheet and an image of the formula I had that was working
Top Level Assy
Part Number
Qty
Number of times used
02554
01622
4
3
89975
01622
4
3
95665
01622
4
3
98886
01723
4
1
98886
01723
10
1
98886
01723
4
1
02554
01734
4
3
89975
01734
4
3
95665
01734
4
3
02554
01740
6
3
89975
01740
6
3
95665
01740
6
3
02554
01746
5
3
89975
01746
5
3
95665
01746
5
3
02554
01835
2
3
89975
01835
2
3
95665
01835
2
3
02554
51205
4
3
=SUM(--(LEN(UNIQUE(FILTER(A:A, C:C=C2, "")))>0))
Picture of the excel sheet
Picture of working formula

Use the following formula in row 2: =SUMPRODUCT(--(FREQUENCY(IF($B$2:$B$20=$B2,$A$2:$A$20),$A$2:$A$20)>0))
*I think it doesn't require ctrl+shift-enter in older Excel versions, since SUMPRODUCT is an array formula by default.
The formula checks the frequency of values in column A where column B matches the value in the current row. It returns the count per unique value meeting the condition. Wrapping it in -- & >0 returns 1 for each unique value. SUMPRODUCT sums them.
Edit:
I realized that the top level assembly values are actual text, not numeric values. In that case (since it's all numeric values stored as text) you can use this workaround:
=SUMPRODUCT(--(FREQUENCY(IF($B$2:$B$20=$B2,--($A$2:$A$20)),--($A$2:$A$20))>0))
It converts the text to numbers.
Sidenotes to this workaround:
If any value would contain a character other than numeric it will not get counted.
If you have both values like 02554 and 2554 they'll both get converted to 2554 and counted likewise.
Edit 2:
For text use the following:
=SUMPRODUCT(IF($B$2:$B$20=$B2, 1/(COUNTIFS($B$2:$B$20, $B2, $A$2:$A$20, $A$2:$A$20)), 0))

Count occurrences of strings just once per row in Google Sheets

I have strings of spreadsheet data that need counting by 'type' but not instance.
A B C D
1 Lin 1 2 1
2 Tom 1 4 2
3 Sue 3 1 4
The correct sum of students assigned to teacher 1 is 3, not 4. That teacher 1 meets Lin in lessons B and D is irrelevant to the count.
I borrowed a formula which works in Excel but not in Google Sheets where I and others need to keep and manipulate the data.
F5=SUMPRODUCT(SIGN(COUNTIF(OFFSET(B$2:D$2, ROW($2:$4)-1, 0), E5)))
A B C D E
2 Lin 1 2 1
3 Tom 1 4 2
4 Sue 3 1 4
5 1 [exact string being searched for, ie a teacher name]
I don't know what is not being understood by Google Sheets in that formula. Does anyone know the correct expression to use, or a more efficient way to get the accurate count I need, without duplicates within rows inflating the count?

So this is the mmult way, which works by finding the row totals of students assigned to teacher 1 etc., then seeing how many of the totals are greater than 0.
=ArrayFormula(sum(--(mmult(n(B2:D4=E5),transpose(column(B2:D4)))>0)))
or
=ArrayFormula(sum(sign(mmult(n(B2:D4=E5),transpose(column(B2:D4))))))
Also works in Excel if entered as an array formula without the ArrayFormula wrapper.
A specific Google Sheets one can be quite short
=ArrayFormula(COUNTUNIQUE((B2:D4=E5)*row(B2:D4)))-1
counting the unique rows containing a match.
Note - I am subtracting 1 in the last formula above because I am assuming there is at least one zero (non-match) which should be ignored. This would fail in the extreme case where all students in all classes are assigned to the same teacher so you have a matrix (e.g.) of all 1's. This would be more theoretically correct:
=ArrayFormula(COUNTUNIQUE(if(B2:D4=E5,row(B2:D4),"")))

Compare multiple data from rows

I'm looking for a way to compare multiple rows with data to each other, trying to find the best possible match. Each number in every column must be an approximately match the other numbers in the same column.
Example:
Customer #1: 1 5 10 9 7 7 8 2 3
Customer #2: 10 5 9 3 5 7 4 3 2
Customer #3: 1 4 10 9 8 7 6 2 2
Customer #4: 9 5 6 7 2 1 10 5 6
In this example customer #1 and #3 is quite similar, and I need to find a way to highlight or sort the rows so I can easily find the best match.
I've tried using conditional formatting to highlight the numbers that are the similar, but that is quite confusing, because the amount of data is quite big.
Any ideas of how I could solve this?
Thanks!

The following formula entered in (say) L1 and pulled down gives the best match with the current row based on the sum of the absolute differences between corresponding cells:-
=MIN(IF(ROW($C$1:$K$4)<>ROW(),(MMULT(ABS($C1:$K1-$C$1:$K$4),TRANSPOSE(COLUMN($C$1:$K$4))^0))))
It is an array formula and must be entered with CtrlShiftEnter.
You can then sort on column L to bring the customers with lowest similarity scores to the top or use conditional formatting to highlight rows with a certain similarity value.
EDIT
If you wanted to penalise large differences in individual columns more heavily than small differences to try and avoid pairs of customers which are fairly similar except for having some columns very different, you could try something like the square of the differences:-
=MIN(IF(ROW($C$1:$K$4)<>ROW(),(MMULT(($C1:$K1-$C$1:$K$4)^2,TRANSPOSE(COLUMN($C$1:$K$4))^0))))
then the scores for your test data would come out as 7,127,7,127.

I'm assuming you want to compare customers 2-4 with customer 1 and that you are comparing only within each column. In this case, you could implement a 'scoring system' using multiple IFs. For example,:
A B C D E
1 Customer 1 1 1 2
2 Customer 2 1 2 2
3 Customer 3 0 1 0
you could use in E2
=if(B2=$B$1,1,0)+if(C2=$C$1,1,0)+if(D2=$D$1,1,0)
This will return a 'score' of 1 when you have a match and a 'score' of 0 when you don't. It then adds up the scores and your highest value will be your best match. Copying down would then give
A B C D E
1 Customer 1 1 1 2
2 Customer 2 1 2 2 2
3 Customer 3 0 1 0 1
so customer 2 is the best match.

rearranging data in excel

I'm not sure how to ask this question without illustrating it, so here goes:
I have results from a test which has tested peoples attitudes (scores 1-5) to different voices on a 16 different scales. The data set looks like this (where P1,P2,P3 are participants, A, B, C are voices)
Aformal Apleasant Acool Bformal etc
P1 2 3 1 4
P2 5 4 2 4
P3 1 2 4 3
However, I want to rearrange my data to look like this:
formal pleasant cool
P1A 3 3 5
P1B 2 1 6
P1C etc
P1D
This would mean a lot more rows (multiple rows per participant), and a lot fewer columns. Is it doable without having to manually reenter all the scores in a new excel file?

Sure, no problem. I just hacked this solution:
L M N O P Q
person# voice# formal pleasant cool
1 1 P1A 2 3 1
1 2 P1B 4 5 2
1 3 P1C 9 9 9
2 1 P2A 5 4 2
2 2 P2B 4 4 1
2 3 P2C 9 9 9
3 1 P3A 1 2 4
3 2 P3B 3 3 2
3 3 P3C 9 9 9
Basically, in columns L and M, I made two columns with index numbers. Voice numbers go from 1 to 3 and repeat every 3 rows because there are nv=3 voices (increase this if you have voices F, G, H...). Person numbers are also repeated for 3 rows each, because there are nv=3 voices.
To make the row headers (P1A, etc.), I used this formula: ="P" & L2 & CHAR(64+M2) at P1A and copied down.
To make the new table of values, I used this formula: =OFFSET(B$2,$L2-1,($M2-1)*3) at P1A-formal, and copied down and across. Note that B$2 corresponds to the cell address for P1-Aformal in the original table of values (your example).
I've used this indexing trick I don't know how many times to quickly rearrange tables of data inherited from other people.
EDIT: Note that the index columns are also made (semi)automatically using a formula. The first nv=3 rows are made manually, and then subsequent rows refer to them. For example, the formula in L5 is =L2+1 and the formula in M5 is =M2. Both are copied down to the end of the table.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

EXCEL - Comparing Two Columns - Removing Repeats - excel

Related

Find all values in Column A which are present in Column B

Excel formula to count how many times a part number is used in a top level assembly (no UNIQUE or FILTER)

Count occurrences of strings just once per row in Google Sheets

Compare multiple data from rows

rearranging data in excel

Categories

Resources