Match Two Inconsistent Lists in Excel - excel

I have a somewhat complicated list-matching problem in Excel, and I hope someone is up for the challenge. I appreciate any and all responses.
I have two lists of about 50,000 names. My actual workbook has longer strings of data but to keep it simple, I'll use this:
LIST A     LIST B
Joe        Michael
John
Kim        Matt
Carl
Mike       Joey
Matthew    Kimberly
The goal is to rearrange column B so that each nickname lines up with the matching name in column A, i.e.:
LIST A     LIST B
Joe        Joey
John
Kim        Kimberly
Carl
Mike       Michael
Matthew    Matt
Whether the names are truly related matters less than whether they share similar characters. I can manually correct any extraneous or odd nicknames.
The other caveat is that a name without an aka/nickname gets a blank cell in the other column, and this applies on both sides.
I have seen other sorting operations that look like they could work, but they don't, because the values in the two columns are technically different.
Put more simply: the aim is to stack both columns more or less alphabetically, line up the similar names, and leave unmatched names on their own row.
Let me know if any further clarification is necessary.
Thank you!
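One way to picture the matching described above, sketched in Python rather than as a worksheet formula. This is only an illustration under assumptions: the function name `align_nicknames` and the 0.5 similarity cutoff are made up for this example, and the similarity score comes from the standard library's `difflib.SequenceMatcher`, which is one of several possible "similar characters" measures. It compares every pair of names, so for 50,000 rows you would want to restrict comparisons first (for example, only names sharing a first letter).

```python
from difflib import SequenceMatcher


def align_nicknames(list_a, list_b, cutoff=0.5):
    """Pair each name in list_a with its most similar unused name in list_b.

    Returns (aligned, leftovers): aligned has one entry per row of list_a
    ("" where nothing scored at least `cutoff`), and leftovers holds the
    list_b names that were not paired with anything.
    `cutoff` is a hypothetical threshold; tune it against your own data.
    """
    names_b = [b for b in list_b if b]  # skip blank cells
    scored = []
    for i, a in enumerate(list_a):
        if not a:
            continue
        for j, b in enumerate(names_b):
            # ratio() is 1.0 for identical strings, 0.0 for nothing in common
            score = SequenceMatcher(None, a, b).ratio()
            if score >= cutoff:
                scored.append((score, i, j))

    # Greedy assignment: take the strongest pairs first, each name used once.
    scored.sort(key=lambda item: -item[0])
    aligned = [""] * len(list_a)
    used_a, used_b = set(), set()
    for score, i, j in scored:
        if i not in used_a and j not in used_b:
            aligned[i] = names_b[j]
            used_a.add(i)
            used_b.add(j)

    leftovers = [b for j, b in enumerate(names_b) if j not in used_b]
    return aligned, leftovers


list_a = ["Joe", "John", "Kim", "Carl", "Mike", "Matthew"]
list_b = ["Michael", "", "Matt", "", "Joey", "Kimberly"]
aligned, leftovers = align_nicknames(list_a, list_b)
for a, b in zip(list_a, aligned):
    print(f"{a:<10}{b}")
```

Names left in `leftovers` would go on their own rows with a blank in column A. Borderline pairs still need a manual pass, as the question already anticipates.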

Related

Excel; How to find character and give matching description

I have a table with clients and specific markers for each shop:
Client     Markers              Location
Jane       B,D,K,M,f,n,,+
Max        B,D,J,K,M,f,i,n,+
Ted        D,i,a,1,J,Y,K,M
Maria      C,D,J,K,M,n
Alex       A,D,K,M,f,i,n
Tom        A,D,K,M,f,m,o,y,+
Richard    R,D,J,K,M,f,i,n
X          A,D,K,M,f,n
Red        A,D,K,M,f,i,n,+
John       C,D,F,K,M,f,i,n,4
Lex        T,D,a,1,4,T,K,M
Ted        D,a,1,T,K,M
Jane       D,a,1,T,K,M
Another table contains the Locations:
marker    desc
A         New York
B         Amsterdam
C         London
H         Tokyo
Q         Paris
R         Vancouver
T         Sydney
Y         Auckland
Now I want to fill the Location column of the first table, but the lookup goes wrong when the first marker isn't a location marker. I used =VLOOKUP([#Markers],TableLocations[marker],1,TRUE). I've also tried the MATCH function, but it returns the wrong number as well.
So it only works when the first character in the Markers column matches a marker in the locations table.
To look up the location for only the first marker in each cell's comma-separated list, you can use:
=XLOOKUP(TEXTBEFORE(B2,","),$G$2:$G$9,$H$2:$H$9)
To return every matching location, try:
=TEXTJOIN(", ",TRUE,FILTER($H$2:$H$9,ISNUMBER(XMATCH($G$2:$G$9,TOCOL(TEXTSPLIT(B2,",")),0))))
To fill the whole column in one go with a dynamic spill array, try:
=BYROW(B2:B14,LAMBDA(x,TEXTJOIN(", ",TRUE,FILTER($H$2:$H$9,ISNUMBER(XMATCH($G$2:$G$9,TOCOL(TEXTSPLIT(x,",")),0))))))
Try this, using tables and structured references (which you can change to normal addressing if you prefer, but the former are more dynamic).
In your comments you indicated there would be only one location per client; if you need more than one, please clarify.
Edit: corrected missing structured reference
=INDEX(Location[desc],AGGREGATE(14,6,BYCOL(EXACT(TEXTSPLIT([#Markers],","),Location[marker]),LAMBDA(arr,XMATCH(TRUE,arr))),1))
Note: One can get case-sensitive matches using either the EXACT or the FIND function. But because of the null string in the list in your first row (note the doubled comma in the Markers), FIND will always return a match for that, potentially causing an incorrect result.
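The lookup logic both answers rely on (split the marker string, then test each piece against the locations table) can be sketched outside Excel as follows. This is an illustration in Python, not a translation of either formula: the name `location_for` is made up, the comparison is case-sensitive like EXACT, and it returns the first marker that is a known location, on the assumption stated above that each client has only one.

```python
# Marker -> description pairs copied from the locations table in the question.
LOCATIONS = {
    "A": "New York",
    "B": "Amsterdam",
    "C": "London",
    "H": "Tokyo",
    "Q": "Paris",
    "R": "Vancouver",
    "T": "Sydney",
    "Y": "Auckland",
}


def location_for(markers, locations=LOCATIONS):
    """Return the description of the first marker that is a known location.

    Dictionary lookups are case-sensitive, so "a" does not match "A".
    The empty piece produced by a doubled comma is simply not a key and
    is skipped. Returns "" when no marker is a location.
    """
    for marker in markers.split(","):
        if marker in locations:
            return locations[marker]
    return ""


print(location_for("B,D,K,M,f,n,,+"))   # first marker is the location
print(location_for("D,i,a,1,J,Y,K,M"))  # location marker is in the middle
```

Because each piece is compared as a whole key instead of being searched for inside a string, the empty marker cannot produce the false match described in the note about FIND.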

Excel Small Function Returning only first Instances of Duplicates

When I use the SMALL/LARGE functions to sort data, I get duplicates returned. How do I avoid that? I am working with a large amount of data in an Excel file: names of people, with many variables to rank them by. I want to find the top and bottom performers for certain variables. Some of these variables have the same value for several people, which is where the duplicates show up.
I have written a large formula that sorts and separates the data very well... until duplicate values are involved. Then the first row that has the duplicated value is returned every time that value shows up. I want to step past the duplicate and get the name of each person, even if the value is the same.
INDEX($BQ$35:$BQ$500,MATCH(1,INDEX(($CD$35:$CD$400=SMALL($CD$35:$CD$500,ROWS(W$552:W552)))*(COUNTIF(W$552:W552,$BQ$35:$BQ$500)=0),),0))
This formula returns a list, but each duplicate shows up as the first instance of that duplicate. I want every instance to be shown.
Xing Dong Chen -0.890
Weishen Deng -0.140
Michael Wan 0.200
Jojo Gonzales 0.740
Neelkanth Mishra 0.910
Ridham Desai 1.310
Aarti Shah 1.860
Daniel Fineman 2.340
George Tharenou 2.520
Ritesh Samadhiya 3.000
Hak Bin Chua 3.410
Manjiang Cheng 3.500
Manjiang Cheng 3.500
Manjiang Cheng 3.500
Manjiang Cheng 3.500
Juliana Lee 3.800
Jody Santiago 4.040
Andrew Boak 4.250
Ray Farris Jr. 4.340
Pankaj Mataney 4.420
Sakthi Siva 4.580
The same name appears four times here because the values are the same, even though the people's names are different. I want each different name to be shown, not just the first one repeated.
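The underlying issue is that looking up a name by its value alone cannot tell tied rows apart; ranking by value together with row position can. A minimal sketch of that idea in Python (the function name `rank_with_ties` and the sample names are placeholders, since the real names behind the 3.500 rows are not shown in the question):

```python
def rank_with_ties(names, values, n, smallest=True):
    """Return the n lowest (or highest) rows as (name, value) pairs.

    Row indices are sorted instead of the values themselves, so each row
    keeps its own name. Python's sort is stable, which means tied values
    come out in their original row order rather than collapsing onto the
    first match.
    """
    order = sorted(
        range(len(values)),
        key=lambda i: values[i],
        reverse=not smallest,
    )
    return [(names[i], values[i]) for i in order[:n]]


# Placeholder data: two different people share the value 3.5.
names = ["Person A", "Person B", "Person C", "Person D"]
values = [3.5, 1.0, 3.5, 2.0]
for name, value in rank_with_ties(names, values, 4):
    print(f"{name} {value:.3f}")
```

In a worksheet, the same effect is usually obtained by making each value unique before ranking, for example by including the row position as a tiebreaker.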

Find sum in one column for rows containing the same value in another column

I have a table like this:
Column I | Column J
====================
Peter 2
Martin 3
Peter 1
John 5
Peter 2
What I need is to sum the numbers for Peter, for example. So for Peter, I'd expect a result of 5.
I tried to achieve this by using a VLOOKUP function, but it seems to be working only for one row, so I have:
=VLOOKUP("Peter";I4:J4;2)+VLOOKUP("Peter";I5:J5;2)+VLOOKUP("Peter";I6:J6;2)+VLOOKUP("Peter";I7:J7;2)
However I have a lot of data like this, so it would be very long and would take me ages to write this down for all of them.
Any better solution, please?
EDIT: I'm working in Google Online Spreadsheets, so I don't know if a macro solution would be the best (or if it would even work).
Try this:
=SUMIF(I:I,"Peter",J:J)
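For readers who want to see what SUMIF computes here, the same conditional sum can be sketched in Python. The function name `sum_by_name` is made up for this illustration; the rows are the ones from the question.

```python
from collections import defaultdict


def sum_by_name(rows):
    """Add up the numbers for each name, like one SUMIF per distinct name."""
    totals = defaultdict(int)
    for name, amount in rows:
        totals[name] += amount
    return dict(totals)


rows = [("Peter", 2), ("Martin", 3), ("Peter", 1), ("John", 5), ("Peter", 2)]
print(sum_by_name(rows)["Peter"])  # 2 + 1 + 2
```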

Excel - Match in two columns with variable names

I've got two lists of people and I have to check whether each person is in both lists. The catch is that one of the lists does not accept certain characters ("-", for instance), and a person with two last names might have omitted one of them.
For example:
A1                      B2
John Paul               John Paul Jones
Mary Williams           Ryan Roberts
Ryan Roberts-Johnson    Mary Williams
My formula is: =IFERROR(MATCH($A1,$B$1:$B$1215,0),IFERROR(MATCH(LEFT($A1,FIND(" ",$A1,1)),$B$1:$B$1215,0),"No Match"))
The idea is: if the name is the same, bring me the line where the person is. If not, look for the first name and see if you find someone with this first and bring it to me. If neither works, reply with "No Match".
But apparently the Match function only retrieves exact matches, so the First Name one doesn't work.
Is there any other way to solve this?
EDIT1: First finding: I can use the SUBSTITUTE formula to replace - with space and do the search once again.
Here are some of the things I did to save time. (I probably spent more time figuring this out and researching than I would have spent doing the ~3500 entries manually, but the learning opportunity was great.)
What I wanted:
To search for a cell containing name + surname when I only had the surname.
What I did: I remembered the wildcards and used them with VLOOKUP:
First I took the last name and put a star in front of it: ="*"&RIGHT($A1,LEN($A1)-FIND(" ",$A1,1))
=VLOOKUP(A1,$B$1:$B$1000,1,FALSE)
This returns the first match it finds. Before that, I added a check to make sure there weren't two people with the same surname (so it wouldn't pull in a different person), using a simple IF(COUNTIF($C$2:$C$2000,$D219)<=2, and then the rest of the formula.
Something else I noticed, which serves as a reminder: TRIM is very important, because some of the cells had two spaces between first and last name, and some had a space after the last letter of the last name. Using TRIM to create a new column helped me avoid a lot of mistakes that otherwise took me a while to notice.
I believe the case is solved.
You could create a temporary column extracting the first name from B column.
See this example.
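The cleanup steps discussed in this thread (replace "-" with a space, trim extra spaces, then fall back to a partial match) can be sketched in Python as follows. The names `normalize` and `find_match` are made up for this illustration, and the fallback rule (one name's words are the leading words of the other) is an assumption about what counts as a partial match, not the wildcard VLOOKUP itself.

```python
def normalize(name):
    """Replace hyphens with spaces, collapse repeated spaces, lowercase."""
    return " ".join(name.replace("-", " ").split()).lower()


def find_match(name, candidates):
    """Return the 1-based row of the first matching candidate, else None.

    Pass 1 looks for an exact match after normalization. Pass 2 accepts a
    candidate when the shorter name's words are the leading words of the
    longer one, which covers an omitted second last name on either side.
    """
    target = normalize(name).split()
    for row, candidate in enumerate(candidates, start=1):
        if normalize(candidate).split() == target:
            return row
    for row, candidate in enumerate(candidates, start=1):
        words = normalize(candidate).split()
        shorter, longer = sorted((target, words), key=len)
        if shorter and longer[:len(shorter)] == shorter:
            return row
    return None


column_b = ["John Paul Jones", "Ryan Roberts", "Mary Williams"]
for person in ["John Paul", "Mary Williams", "Ryan  Roberts-Johnson "]:
    print(person.strip(), "->", find_match(person, column_b) or "No Match")
```

As with the COUNTIF check described above, a partial match is only safe when it is unique, so ambiguous names still deserve a manual look.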

Most Common Value within Sublists

I apologize in advance if this is unclear; I will try to explain everything as best I can! I am working with a data set in Google Sheets in which Column A is a list of student IDs and Column B is a list of student behaviors. It looks something like this:
A(ID) B(Behaviors)
12345 Talking
54321 Out of Seat
98765 Lying
12345 Talking
12345 Lying
98765 Lying
The data set is quite large because it contains recorded data from the entire school population over the course of the year, and as you can see, the entire student population is pooled in one list. I am looking for a way to find each student's (identified by their ID) most commonly assigned behavior. For example, for the above data, student 12345 would have 'Talking' listed as their most common behavior and student 98765 would have 'Lying' listed as their most common behavior.
Ideally, I want to create a separate spreadsheet that looks something like this:
A(ID) B(Most Common Behavior)
12345 Talking
98765 Lying
54321 Out of Seat
Here column A is a list of all the students' IDs and column B lists each student's most common behavior.
I found that I could use this formula:
=INDEX(Behaviors,MODE(MATCH(Behaviors,Behaviors,0)))
to pull out the most common value from the column containing student behaviors. However, this formula gives me the most common behavior among the entire student population, so I want to modify it so that it first looks at the student ID and then finds the most common behavior within that student's sublist.
Please let me know if you require any further information. Thanks in advance for your help!
Are you familiar with PivotTables? You could just create a PivotTable with ID as a Row Label and Behavior as both a Column Label and the Value (counted). Then it would just be a matter of copying/pasting those values and using a MAX formula to find the highest behavior count for each ID.
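The PivotTable approach amounts to counting behaviors per ID and taking the largest count in each row. A minimal sketch of that grouping in Python, using the sample rows from the question (the function name `most_common_behavior` is made up for this illustration):

```python
from collections import Counter, defaultdict


def most_common_behavior(rows):
    """Map each student ID to the behavior recorded most often for it.

    One Counter is kept per ID, which plays the role of one PivotTable
    row. When two behaviors tie, the one recorded first is returned.
    """
    counts = defaultdict(Counter)
    for student_id, behavior in rows:
        counts[student_id][behavior] += 1
    return {sid: c.most_common(1)[0][0] for sid, c in counts.items()}


rows = [
    (12345, "Talking"),
    (54321, "Out of Seat"),
    (98765, "Lying"),
    (12345, "Talking"),
    (12345, "Lying"),
    (98765, "Lying"),
]
for student_id, behavior in most_common_behavior(rows).items():
    print(student_id, behavior)
```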
