Separate Last Name, First Name and Middle Initial into three different columns - Excel

I have a file that contains Last Name, First Name MI for about 5000 people.
I need to split them into 3 different columns.
The issue I am facing is that sometimes there is more than one first name; for example, I have a person listed as Davis, Mary Ann L.
I want Davis in one column,
Mary Ann in another column, and L in the 3rd column. Basically, check each part after the comma: if its number of characters is greater than 1, consider it part of the first name; if its number of characters is equal to 1, consider it the middle initial.
How can I achieve this?

In your case, I would start with the "Text to Columns" command. Just select the whole column, then choose Data -> Text to Columns. Choose "Delimited", click Next, then select "Space" (also tick "Comma" and "Treat consecutive delimiters as one", so the comma after the last name is not carried into the first column).
After this, I would look through the processed data to get an overview. I assume that most records will already be fine, and the records that are exceptions to the standard (for example, people with two first names, who end up with an extra column) should be easy to identify. You could even filter for them.
Only then, in a third step, I'd write a formula which processes the columns you have created in the first step.
Or, possibly a formula is not necessary at all. Possibly you can just easily filter and process some of the exceptions manually.
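The splitting rule from the question can also be prototyped outside Excel to check it against the data. In that rule, a final part of length 1 after the comma is the middle initial and everything else is the first name. The sketch below is Python, not an Excel feature. The function name `split_name` is hypothetical. It assumes every entry contains exactly one comma and that the initial may or may not be followed by a period.

```python
def split_name(full_name: str) -> tuple[str, str, str]:
    """Split 'Last, First [Second] [M.]' into (last, first, middle initial)."""
    last, _, rest = full_name.partition(",")
    parts = rest.split()
    middle = ""
    # A final one-character token (ignoring a trailing period) is treated as the
    # middle initial, as long as at least one first-name token remains before it.
    if len(parts) > 1 and len(parts[-1].rstrip(".")) == 1:
        middle = parts.pop().rstrip(".")
    return last.strip(), " ".join(parts), middle


print(split_name("Davis, Mary Ann L."))  # ('Davis', 'Mary Ann', 'L')
```

Entries without an initial, such as "Smith, John", come back with an empty third value. Those are the same exceptions the filtering step above is meant to surface.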

Related

Making a Top 3 column based on a column category with answers in the row

I'm managing the server awards for a gaming community and using Google Forms for the first time. The voting phase has ended, and I moved the form responses into a spreadsheet in Google Sheets.
It goes like this (Answers from 89 forum accounts [ROWS] for 31 questions [COLUMNS])
(https://i.imgur.com/w9ICMjv.png)
The nominations were put as multiple-choice votes in the Forms as can be seen here, if this helps at all.
(https://i.imgur.com/4OQxiKH.png)
Most of the solutions I've found online work with numeric values, whereas my data is strings. Plus, I really have no idea how to write formulas in Excel.
I need the results to be like this, if possible.
Name One -> The most repeated name in column C, from row 2 to row 30.
(47) -> The number of times Name One appears in column C.
Name Two and Three show us the second and third most repeated names.
(https://i.imgur.com/EkocuoG.png)
If you are using Google Sheets
To get your top three in column C, add this formula to another sheet. Change the sheet name and the last cell reference to suit, and keep the single quotes around the sheet name if it contains a space or a non-alphanumeric character (formulas cannot contain comments, so these notes stay outside the formula):
=QUERY('Your sheet name'!A1:Z100, "Select C, count(C) group by C order by count(C) desc limit 3")
Thanks to @pnuts's answer on another question, which let me double-check this without turning on my laptop. :)
I have tested it and it works for me.
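The same group, count, sort and limit logic can be expressed in Python, for anyone who wants to cross-check the QUERY result outside Sheets. This is only a sketch under a few assumptions. The column C answers are assumed to be exported as a list of strings, and `top_three` is a hypothetical helper name. Blank cells are skipped, on the assumption that an empty answer should not count as a nominee.

```python
from collections import Counter


def top_three(votes: list[str]) -> list[tuple[str, int]]:
    """Return the three most common non-blank answers with their counts."""
    # Skip blank answers (an assumption: they should not count as a nominee).
    counts = Counter(v for v in votes if v.strip())
    # Same idea as "order by count(C) desc limit 3". Ties keep first-seen order
    # here, which may differ from how QUERY orders tied rows.
    return counts.most_common(3)


print(top_three(["Alice", "Bob", "Alice", "Cara", "Bob", "Alice"]))
# [('Alice', 3), ('Bob', 2), ('Cara', 1)]
```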

Extracting text from complex string in excel

The attached image (link: https://i.stack.imgur.com/w0pEw.png) shows a range of cells (B1:B7) from a table I imported from the web. I need a formula that allows me to extract the names from each cell. In this case, my objective is to generate the following list of names, where each name is in its own cell: Erik Karlsson, P.K. Subban, John Tavares, Matthew Tkachuk, Steven Stamkos, Dustin Brown, Shea Weber.
I have been reading about the LEFT, RIGHT, and MID functions, but I'm confused by the irregular spacing and special characters (e.g. the box with a question mark beside some names).
Can anyone help me extract the names? Thanks
Assuming that your cells follow the same format, you can use a variety of text functions to get the name.
This function requires the following format:
Some initial text, followed by
2 new lines in Excel (each represented by CHAR(10))
The name, which consists of a first name, a space, then a last name
A second space on the same line as the name, followed by some additional text.
With this format, you can use the following formula (assuming your data is in an Excel table, with the column of initial data named Text):
=MID([@Text],SEARCH(CHAR(10),[@Text],SEARCH(CHAR(10),[@Text])+1)+1,SEARCH(" ",MID([@Text],SEARCH(CHAR(10),[@Text],SEARCH(CHAR(10),[@Text])+1)+1,LEN([@Text])),SEARCH(" ",MID([@Text],SEARCH(CHAR(10),[@Text],SEARCH(CHAR(10),[@Text])+1)+1,LEN([@Text])))+1)-1)
To come up with this formula, we take the following steps:
First, we figure out where the name starts. We know this occurs after the 2 new lines, so we use:
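=SEARCH(CHAR(10),[@Text],SEARCH(CHAR(10),[@Text])+1)+1
The inner SEARCH (the second one in the formula) finds the first new line. The outer SEARCH (the first one in the formula) starts looking one character after it, so it finds the 2nd new line. Adding 1 moves to the first character of the name.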
Now that we have that value, we can use it to get the rest of the string (everything after the 2 new lines). Let's say that the previous formula was stored in a table column called Start of Name. The 2nd formula, stored in a column called Rest of String, will then be:
=MID([@Text],[@[Start of Name]],LEN([@Text]))
Note that we're using the length of the entire text as the number of characters to return, which by definition is more than we need. However, that's not an issue: when the requested length runs past the end of the text, MID simply returns everything up to the end.
Once we have the text from the start of the name onwards, we need the position of the 2nd space (where the name ends). To find it, we first need the position of the first space, just as we needed the first new line to find the second one earlier. Subtracting 1 from the position of the 2nd space gives the number of characters before it, i.e. the length of the name. The function we need is:
=SEARCH(" ",[@[Rest of String]],SEARCH(" ",[@[Rest of String]])+1)-1
So now we know where the name starts (after the 2 new lines) and how long it is (everything before the 2nd space). Assuming we have these numbers stored in columns named Start of Name and To Second Space respectively, we can use the following formula to get the name:
=MID([@Text],[@[Start of Name]],[@[To Second Space]])
This is equivalent to the first formula: The difference is that the first formula doesn't use any "helper columns".
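The helper-column steps can be mirrored in Python, which makes it easy to test the logic on sample strings before building the Excel formula. This is a sketch that assumes the same format: two line feeds, then a first and last name, then a space. The sample strings are made up rather than taken from the screenshot. Python positions are 0-based, while SEARCH is 1-based.

```python
def extract_name(text: str) -> str:
    """Return the two space-separated words that follow the second line feed."""
    # Start of Name: one character past the second line feed (CHAR(10) in Excel).
    first_newline = text.index("\n")
    start = text.index("\n", first_newline + 1) + 1
    # Rest of String: everything from the start of the name onwards.
    rest = text[start:]
    # To Second Space: the name is everything before the second space.
    first_space = rest.index(" ")
    second_space = rest.index(" ", first_space + 1)
    return rest[:second_space]


print(extract_name("1\n\nErik Karlsson D OTT"))  # Erik Karlsson
```

When the input does not match the format, `str.index` raises `ValueError`, roughly where the Excel formula would return #VALUE!.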
Of course, if any cell doesn't match this format, you'll be out of luck. Using Excel formulas to parse text can be finicky and inflexible. For example, if someone has a middle name, has initials with spaces (e.g. P.K. Subban written as P. K. Subban), or has a suffix such as Jr., your job would be a lot harder.
Another alternative is to use regular expressions to get the data you want. I would recommend this thorough answer as a primer, although you would still face the same issues with name formats.
Finally, there's the obligatory Falsehoods Programmers Believe About Names as a warning against assuming any kind of standardized name format.

Cross Reference Column A Sheet 1 w/ Column A Sheet 2 to give column c,d,e

Part 1:
Cross Reference Column A Sheet 1 to find a matching value on Column A Sheet 2, and then fill in corresponding Column B, C, D Values from Sheet 2 for Sheet 1.
I have 2 sheets:
Sheet 1: Company & Representative
Sheet 2: Company & Client first name, Client Last Name, Client Email
I want to add new columns to Sheet 1 that contain the client first name, last name, and email, matched on the company name (the rows do not line up by cell number).
Does anyone have any advice on how to do this? I've got about 2000 rows and know there must be a better way than doing it manually.
Part 2:
Is it possible to use a similar formula to populate paragraph text in another column if the company name contains certain text or letters? Say the company titles are various and long but each contains adjectives that can help distinguish their industry or years of experience, then is it possible to make another column including 10+ possible conditions to fill out different paragraphs depending on the conditions met?
So, for example, I'd like the company names in column A to drive a company industry supply list (in paragraph form) in column J. Here is an example:
Column A:
ABC level 1
ABC level 2
ABC Levels Elementary
ABC Levels Advanced
BCD Level 4
BCD Level All
BCD Level Intermediate
(continued until infinity..)
XYZ Company Level 12
If Level 1-6 or Elementary: Input >
Eucalyptus is one of three similar genera that are commonly referred to as "eucalypts",
If level Intermediate: Input>
Tree sizes follow the convention of:
If Level Advanced: Input >
A mature eucalyptus may take the form of a low shrub or a very large tree. The species can be divided into three main habits and four size categories.
If level all: Input >
Eucalyptus is one of three similar genera that are commonly referred to as "eucalypts"
+
Tree sizes follow the convention of:
+
A mature eucalyptus may take the form of a low shrub or a very large tree. The species can be divided into three main habits and four size categories.
(The eucalyptus copy is used for example only, and to educate us all on the eucalyptus plant, of course. I shortened the copy so the example is easier to read.)
Thanks so much!
Kalina
Say I have a Sheet2 with data like the picture below:
My Sheet1 should look like this:
There are at least two ways to accomplish your goal:
VLOOKUP (shown in column C, Matching 1):
=VLOOKUP(A2,Sheet2!$A$2:$D$8,2,0)&", "&VLOOKUP(A2,Sheet2!$A$2:$D$8,3,0)&", "&VLOOKUP(A2,Sheet2!$A$2:$D$8,4,0)
The VLOOKUP is simply repeated three times, once each for the first name, last name, and email, and the results are concatenated.
INDEX/MATCH (shown in column D, Matching 2):
=INDEX(Sheet2!$A$1:$D$8,MATCH(A2,Sheet2!$A$1:$A$8,0),2)&", "&INDEX(Sheet2!$A$1:$D$8,MATCH(A2,Sheet2!$A$1:$A$8,0),3)&", "&INDEX(Sheet2!$A$1:$D$8,MATCH(A2,Sheet2!$A$1:$A$8,0),4)
As with VLOOKUP, the INDEX/MATCH pair is repeated three times, once per column.
Hope this helps, and let me know if you have any questions.
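Both formulas do an exact-match lookup on the company name and then join three columns from the matching row. The Python sketch below does the same thing, which can be handy for checking results on a copy of the data. The function names and sample rows are hypothetical. It assumes Sheet 2 has been read into (company, first, last, email) tuples. Keys are case-folded and only the first row per company is kept, mirroring how exact-match VLOOKUP/MATCH behave.

```python
from typing import Optional


def build_index(rows: list[tuple[str, str, str, str]]) -> dict[str, tuple[str, str, str]]:
    """Map company -> (first name, last name, email) from Sheet 2 rows."""
    index: dict[str, tuple[str, str, str]] = {}
    for company, first, last, email in rows:
        # Exact-match VLOOKUP/MATCH ignore case and return the first matching
        # row, so keep only the first row seen for each company.
        index.setdefault(company.casefold(), (first, last, email))
    return index


def lookup_client(company: str, index: dict[str, tuple[str, str, str]]) -> Optional[str]:
    """Return 'First, Last, Email' like the formulas, or None where Excel shows #N/A."""
    match = index.get(company.casefold())
    return ", ".join(match) if match else None
```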
Here is how you can handle part 2:
For example, suppose you have set up a table listing the different levels and their descriptions (columns D and E), and you want to show the description in column B for the company name in column A. Enter this formula in cell B2 and copy/drag it down:
=IFERROR(VLOOKUP(RIGHT(A2,LEN(A2)-SEARCH("Level",A2)+1),$D$2:$E$11,2,0),"Please verify company name")
First, RIGHT(A2,LEN(A2)-SEARCH("Level",A2)+1) extracts the level keyword from the company name: everything from "Level" to the end (for example, "Level 4" from "BCD Level 4"). SEARCH is used rather than FIND because FIND is case-sensitive and would miss names such as "ABC level 1". Then VLOOKUP looks up that level in column D and returns the description from column E. I also added an IFERROR in case someone enters an incorrect name; you can change that message to anything you like. Hope this solves your problem, and let me know if you have any questions.
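The same extract-then-look-up idea is sketched in Python below, useful for testing which company names would fall through to the error message. It finds "level" case-insensitively, as Excel's SEARCH does (FIND is case-sensitive). It then keeps everything from there to the end, looks that up in a table, and falls back to the IFERROR message. The function name and sample table are hypothetical, and the table keys are assumed to be stored in lowercase.

```python
def level_description(company: str, descriptions: dict[str, str]) -> str:
    """Look up the text for the 'Level ...' part of a company name."""
    fallback = "Please verify company name"
    lowered = company.lower()
    # Case-insensitive search for the keyword, like Excel's SEARCH.
    pos = lowered.find("level")
    if pos == -1:
        return fallback
    # Keep everything from "level" to the end, as RIGHT(...) does in the formula.
    key = lowered[pos:]
    # Unknown levels fall back to the message, like IFERROR around VLOOKUP.
    return descriptions.get(key, fallback)
```

Entries such as "ABC Levels Elementary" produce the key "levels elementary", so the table needs a row for each exact suffix. The Excel formula has the same requirement.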
I would suggest putting the data on each tab into a table, and then using INDEX/MATCH or VLOOKUP to pull the matching data from the other table. It's hard to give an exact answer without an image or example.

List of items find almost duplicates

In Excel I have a list of artists, songs, and editions.
This list contains over 15000 records.
The problem is that the list contains some "duplicate" records. I say "duplicate" because they aren't exact matches. Some might have a few typos, and I'd like to fix these and remove those records.
So for example some records:
ABBA - Mamma Mia - Party
ABBA - Mama Mia! - Official
Each dash indicates a separate column (so 3 columns A, B, C are filled in)
How would I mark them as duplicates within Excel?
I've found out about the Fuzzy Lookup add-in. However, I'm working on a Mac, and since it isn't available for Mac, I'm stuck.
Is there any regex magic or VBA script that can help me out?
It would also be fine to see how similar each row is (say, 80% similar).
One of the common methods for fuzzy text matching is the Levenshtein (distance) algorithm. Several nice implementations of this exist here:
https://stackoverflow.com/a/4243652/1278553
From there, you can use the function directly in your spreadsheet to find similarities between instances:
You didn't ask, but a database would be really nice here. The reason is you can do a cartesian join (one of the very few valid uses for this) and compare every single record against every other record. For example:
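select
    s1."group", s2."group", s1.song, s2.song,
    -- levenshtein() is not standard SQL; PostgreSQL, for example, provides it in the fuzzystrmatch extension.
    -- "group" is a reserved word, so it has to be quoted as a column name.
    levenshtein(s1."group", s2."group") as group_match,
    levenshtein(s1.song, s2.song) as song_match
from
    songs s1
    cross join songs s2
-- skip comparing a row with itself and the mirrored pair (assumes a unique id column)
where
    s1.id < s2.id
order by
    group_match, song_match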
Yes, this would be a very costly query, depending on the number of records (15,000 × 15,000 = 225,000,000 rows in your example), but it would bubble the most likely duplicates/matches to the top. Not only that, but you can incorporate "reasonable" joins to eliminate obvious mismatches, for example limiting it to cases where the group matches, nearly matches, or begins with the same letter, or pre-filtering out groups where the Levenshtein distance is greater than x.
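For reference, a compact Python version of the Levenshtein dynamic-programming algorithm is sketched below, since the linked implementations are VBA. The question asks for an "80% similar" figure, and one way to get it is 1 minus the distance divided by the length of the longer string. That scoring is an assumption and is not part of the linked answer. Both function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance, keeping only the previous row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed scoring: 1.0 for identical strings, lower as edits are needed."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest
```

With this scoring, "Mamma Mia" vs "Mama Mia!" has a distance of 2 over 9 characters, which is about 78%. A threshold of exactly 80% would therefore miss that pair. Lowercasing both strings first makes the comparison ignore case.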
You could use an array formula to indicate the duplicates, and you could modify the one below to show the row numbers. It checks the rows beneath each entry for possible 80% duplicates. Here "80%" means the first 80% of the characters, read left to right, rather than an overall comparison. My data is in A1:A15000. Enter it as an array formula (Ctrl+Shift+Enter in versions of Excel without dynamic arrays):
=IF(SUM(IF(ISERROR(FIND(MID($A1,1,INT(LEN($A1)*0.8)),$A2:$A$15000)),0,1))>0,1,0)
This version also looks back up the list, so the earlier entries that were found get flagged too:
=SUM(IF(ISERROR(FIND(MID($A2,1,INT(LEN($A2)*0.8)),$A3:$A$15000,1)),0,1))+SUM(IF(ISERROR(FIND(MID($A2,1,INT(LEN($A2)*0.8)),$A$1:$A1,1)),0,1))
For the first entry (row 1), use only the first part of the formula, before the +. For the last row, use only the part after the +.
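The prefix rule these formulas use can also be prototyped in Python, which makes it easy to try other ratios before committing to an array formula. This sketch mirrors the second formula, so it looks both up and down the list. The function name is hypothetical. Like the formulas, it compares every entry with every other entry, so 15,000 rows means roughly 225 million substring checks.

```python
def prefix_duplicate_counts(values: list[str], ratio: float = 0.8) -> list[int]:
    """For each entry, count the other entries that contain its leading `ratio` of characters."""
    counts = []
    for i, value in enumerate(values):
        # Same idea as MID($A2, 1, INT(LEN($A2) * 0.8)): the leading 80% of the text.
        prefix = value[: int(len(value) * ratio)]
        # FIND is case-sensitive, and so is Python's `in`.
        counts.append(sum(prefix in other
                          for j, other in enumerate(values) if j != i))
    return counts
```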
Try this worksheet function in your loop:
=COUNTIF(Range,"*yourtexttofind*")

Excel formula to search partial match and align row

I have two columns of data in Excel like this:
Can somebody help me write a formula in column C that takes the first name or the last name from column A, finds the matching value in column B, and puts it in column C? The following picture shows exactly what I am trying to do. Thanks
Since your data is not "regular", you can try this formula which uses wild card to look for just the last name.
=INDEX($B$1:$B$4,MATCH("*" &MID(A1,FIND(" ",A1)+1,99)&"*",$B$1:$B$4,0))
It would be simpler if the first part of the email address followed some rule, but some have the first initial of the first name at the beginning, and others at the end.
Edit: (explanation added)
The FIND returns the character number of the first space
Add 1 to that to get the character number of the next word
Use MID to extract that word
Use that word in MATCH (with wildcards before and after) to find it in the array of email addresses. This returns its position in the array (the row number).
Use that row number as an argument to the INDEX function to return the actual email address.
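The steps above can be sketched in Python to check which email each name would pick up. The function name and sample addresses are hypothetical. The sketch assumes a single space between first and last name, and matches case-insensitively as MATCH does.

```python
from typing import Optional


def email_by_last_name(full_name: str, emails: list[str]) -> Optional[str]:
    """Return the first email that contains the text after the first space."""
    # Same idea as MID(A1, FIND(" ", A1) + 1, 99): the word after the first space.
    last_name = full_name[full_name.index(" ") + 1:]
    # MATCH("*" & last & "*", ..., 0) ignores case and returns the first hit.
    for email in emails:
        if last_name.lower() in email.lower():
            return email
    return None  # Excel would show #N/A here
```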
If you want to first examine the email address, you will need to determine which of the letters comprise the last name. Since this is not regular according to your example, you will need to check both options.
You will not be able to look for the first name from the email, as it is not present.
If you can guarantee that the first part will follow one rule or the other, e.g. either
FirstInitialLastName or
LastNameFirstInitial
Then you can try this:
=IFERROR(INDEX($B$1:$B$4,MATCH(MID(A1,FIND(" ",A1)+1,99)& LEFT(A1,1) &"*",$B$1:$B$4,0)),
INDEX($B$1:$B$4,MATCH( LEFT(A1,1)&MID(A1,FIND(" ",A1)+1,99) &"*",$B$1:$B$4,0)))
This seems to do what you want.
=IFERROR(VLOOKUP(LOWER(MID(A1,(SEARCH(" ",A1)+1),LEN(A1)))&LOWER(MID(A1,1,1))&"*",$B$1:$B$4,1,FALSE),VLOOKUP(LOWER(MID(A1,1,1))&LOWER(MID(A1,(SEARCH(" ",A1)+1),LEN(A1)))&"*",$B$1:$B$4,1,FALSE))
It's pretty long, and would likely be easier to digest and debug if broken up into columns instead of one huge formula.
It basically builds LastF (last name followed by first initial) and FLast (first initial followed by last name) out of the name field by splitting on the space.
LastF:
=LOWER(MID(A1,(SEARCH(" ",A1)+1),LEN(A1)))&LOWER(MID(A1,1,1))
And FLast:
=LOWER(MID(A1,1,1))&LOWER(MID(A1,(SEARCH(" ",A1)+1),LEN(A1)))
We then build 2 VLOOKUPs for these using wildcards:
LastF:
=VLOOKUP([LastF equation from above]&"*",$B$1:$B$4,1,FALSE)
And FLast:
=VLOOKUP([FLast equation from above]&"*",$B$1:$B$4,1,FALSE)
And then wrap those in IFERROR so they both get tried:
=IFERROR([LastF VLOOKUP],[FLast VLOOKUP])
The rub is that it's going to be hell to edit if you ever need to, which is why I suggest doing each piece in another column and then referencing the previous one.
You also need to be aware that this answer will get tripped up by essentially the same name - e.g. Sam Smith and Sasha Smith would both match whatever the first entry for ssmith was. Any solution here will likely have the same pitfall.
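The try-LastF-then-FLast logic can be sketched in Python for testing against a list of addresses. The function name and sample data are hypothetical. It assumes a single space between first and last name, and matches case-insensitively on a prefix, like the `&"*"` wildcard.

```python
from typing import Optional


def email_by_initial_pattern(full_name: str, emails: list[str]) -> Optional[str]:
    """Try LastF* first, then FLast*, like the IFERROR-wrapped lookups."""
    first, _, last = full_name.partition(" ")
    last_f = (last + first[:1]).lower()  # "John Smith" -> "smithj"
    f_last = (first[:1] + last).lower()  # "John Smith" -> "jsmith"
    for prefix in (last_f, f_last):
        for email in emails:  # the first match wins, as with VLOOKUP
            if email.lower().startswith(prefix):  # pattern & "*"
                return email
    return None  # both lookups failed
```

It shares the pitfall described above. "Sam Smith" and "Sasha Smith" both produce the pattern "ssmith", so both pick up the same address.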

Resources