Cleaning full names into first name, last name, etc. columns - string

I have a CSV file that has a single column of full names that are in different formats. Some include suffixes and initials. There are thousands of records.
I want to break each record apart into separate columns for each part of the full name that exists. The final columns would be:
Title
First Name
Middle Name
Last Name
Suffix
Here is an example of what some of the different names look like:
John Smith
Doe, Jane, MBA
Mrs. Sarah Johnson
Steven P Little
Fredericks, J S, D.D.S.
S Morrison, Dr Oscar
Fred Jones, M.B.A.
T. H. Gallatin
Morris Jr, Gary B.
What is a good way to break those out into separate columns given there is no standard format to the full names?
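One possible starting point, offered as a rough sketch rather than a complete answer: split on commas, pull out any token that looks like a title or suffix, and treat what remains as either "Last, First Middle" (when a comma separated two name parts) or "First Middle Last". The function name `parse_name` and the `TITLES` and `SUFFIXES` sets are hypothetical and deliberately incomplete; they would need extending for real data.

```python
# Hypothetical lookup sets; extend them to cover the titles and suffixes in your data.
TITLES = {"MR", "MRS", "MS", "MISS", "DR", "PROF"}
SUFFIXES = {"JR", "SR", "II", "III", "IV", "MBA", "DDS", "MD", "PHD", "ESQ"}


def _key(token):
    """Normalize a token for lookup: drop periods, ignore case."""
    return token.replace(".", "").upper()


def parse_name(full_name):
    """Split one full name into title/first/middle/last/suffix (heuristic)."""
    result = {"title": "", "first": "", "middle": "", "last": "", "suffix": ""}
    segments = []
    for segment in full_name.split(","):
        kept = []
        for token in segment.split():
            if _key(token) in TITLES:
                result["title"] = token
            elif _key(token) in SUFFIXES:
                result["suffix"] = token
            else:
                kept.append(token)
        if kept:
            segments.append(kept)
    if not segments:
        return result
    if len(segments) >= 2:
        # Assumed layout: "Last, First Middle"
        last_tokens, given = segments[0], segments[1]
    else:
        # Assumed layout: "First Middle Last"
        last_tokens, given = segments[0][-1:], segments[0][:-1]
    result["last"] = " ".join(last_tokens)
    if given:
        result["first"] = given[0]
        result["middle"] = " ".join(given[1:])
    return result


for name in ["John Smith", "Doe, Jane, MBA", "Morris Jr, Gary B."]:
    print(parse_name(name))
```

Ambiguous inputs such as "S Morrison, Dr Oscar" still need a manual decision: this sketch puts "S Morrison" in the last name column. The resulting dictionaries could be written back out with `csv.DictWriter`.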

Related

Combining IF Statement along with CONCATENATE if it exists

I'm trying to combine an IF and a CONCATENATE. I'm running a report for my company where we grab samples from different water locations, but due to COVID-19 we aren't allowed into some specific locations and therefore have to get a water sample from a nearby hydrant.
I have all the locations and hydrants in one spreadsheet as data, and in my main tab I have an empty cell where someone may put (YES/NO) and if they put YES then another cell will fill with the hydrant name along with the location.
My issue is I have to have both this data combined in one static cell if "YES" is put, for example...
Location: LOC-3 John Street
Hydrant used?: YES
Hydrant (auto filled): LOC-3 HYDRANT 3333
Full location name (if YES): LOC-3 John Street LOC-3 Hydrant 3333
Full location name (if NO): LOC-3 John Street
Below is the formula I'm using to return the location name. I can't figure out where or how to add CONCATENATE without getting an error back. Thank you in advance for your help.
=IF(OR((AND((A6<>""),(D6<>""))),(AND((B6<>""),(D6<>"")))),IF(A6="",B6,A6),"")
(Not a complete answer, but too large for a comment)
The first part of your logical expression is quite large. Let's have a look:
[(a6<>"") AND (d6<>"")] OR [(b6<>"") AND (d6<>"")]
=[(a6<>"") OR (b6<>"")] AND (d6<>"")
=[(a6&b6) <> ""] AND (d6<>"")
Where a6&b6 has the Excel meaning (concatenation of a6 and b6).
This is already a significant simplification of your formula. You might try to simplify even further and go on from there.
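The equivalence of those three forms can be checked by brute force. The sketch below uses Python strings as stand-ins for the cells (an assumption made for illustration, not Excel itself) and compares the original condition with the simplified one for every empty/non-empty combination.

```python
from itertools import product


def original(a, b, d):
    # (a6<>"" AND d6<>"") OR (b6<>"" AND d6<>"")
    return (a != "" and d != "") or (b != "" and d != "")


def simplified(a, b, d):
    # (a6&b6)<>"" AND d6<>""
    return (a + b) != "" and d != ""


def formulas_agree():
    """Compare both conditions over all empty/non-empty cell combinations."""
    cells = ["", "x"]
    return all(
        original(a, b, d) == simplified(a, b, d)
        for a, b, d in product(cells, repeat=3)
    )


print(formulas_agree())
```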

Transposing Rows in OpenRefine

I am using OpenRefine (openrefine-2.6-rc.2) running on Windows and opening it with the Chrome browser (65.033225.181).
I have data in text format (.txt) that I have imported into OpenRefine for cleaning and processing. The data entries reside in rows under one column. I would like to "transpose" (pivot) the items in the rows so they appear in columns.
Following is an example of the current state:
Column 1
Mary Smith
Company Name IBM
Location New York
John Davis
Company Name Lockheed-Martin
Location Los Angeles
Jane Segal
Company Name Microsoft
Location Boston
Ideally, by transposing the entries the result would look like this:
Last Name   First Name   Company Name   Location
Smith       Mary         IBM            New York
Davis       John         Lockheed       Los Angeles
Segal       Jane         Microsoft      Boston
I'm just not sure how to do this in OpenRefine.
When creating your OpenRefine project, make sure that empty rows are not imported.
You can delete them later, but it's a little more complicated (see screencast).
Then:
1° Apply the function Transpose -> Transpose cells in rows into columns, with a value of 3.
2° Delete the words "Company Name" and "Location" using a Transform with formulas like value.replace('Company Name', '').trim() and value.replace('Location', '').trim()
3° Rename the columns.
Here is a visual tutorial.
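For comparison, the same three steps can be sketched outside OpenRefine in plain Python. This is only an illustration under the assumptions of the example: every record is exactly three non-empty lines, in the order name, company, location. The function name `transpose_records` is hypothetical.

```python
def transpose_records(lines):
    """Group every 3 non-empty lines into one record and strip the labels."""
    rows = [line.strip() for line in lines if line.strip()]
    records = []
    for i in range(0, len(rows), 3):
        chunk = rows[i:i + 3]
        if len(chunk) < 3:
            break  # incomplete trailing record; skipped in this sketch
        name, company, location = chunk
        first, _, last = name.partition(" ")
        records.append({
            "Last Name": last,
            "First Name": first,
            "Company Name": company.replace("Company Name", "").strip(),
            "Location": location.replace("Location", "").strip(),
        })
    return records


sample = [
    "Mary Smith",
    "Company Name IBM",
    "Location New York",
    "John Davis",
    "Company Name Lockheed-Martin",
    "Location Los Angeles",
]
for record in transpose_records(sample):
    print(record)
```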

Pulling name and title from strings in Python 3.5

I'm working through a list of legal notices and need to pull the name and title from the strings.
"NO. 17-1354 (1) THOMAS A. GOLDBERG ADMINISTRATOR"
"NO. 17-1355 (1) ASHLEY MARIE BAKER EXECUTOR"
"CAUSE NO. _________ TIMOTHY WIMBERLY PETITIONER"
"ERIC SMITH, PETITIONER NO. 17-1048 MLF"
I've been trying various combinations using .split(), but can't seem to find one combo to fit them all.
For each one I want to identify the name and title, so it would look like:
['THOMAS A. GOLDBERG', 'ADMINISTRATOR']
['ASHLEY MARIE BAKER', 'EXECUTOR']
etc.
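A regular expression may fit these better than `.split()`. The sketch below assumes the title is always one of a known set of words (the `TITLES` tuple here is hypothetical and covers only the four samples) and that any case-number prefix looks like `NO. <something>` with an optional `CAUSE` in front and an optional `(n)` after it. It avoids f-strings so it should run on Python 3.5.

```python
import re

# Hypothetical list of titles; extend it to cover the notices you actually have.
TITLES = ("ADMINISTRATOR", "EXECUTOR", "PETITIONER")
TITLE_RE = re.compile(r"\b(" + "|".join(TITLES) + r")\b")
# Assumed prefix shape: optional "CAUSE", then "NO. <id>", then optional "(n)".
CASE_PREFIX_RE = re.compile(r"^(?:CAUSE\s+)?NO\.\s+\S+\s*(?:\(\d+\)\s*)?")


def extract_name_and_title(notice):
    """Return [name, title], or None when no known title is found."""
    match = TITLE_RE.search(notice)
    if match is None:
        return None
    before_title = notice[:match.start()]
    name = CASE_PREFIX_RE.sub("", before_title).strip(" ,")
    return [name, match.group(1)]


notices = [
    "NO. 17-1354 (1) THOMAS A. GOLDBERG ADMINISTRATOR",
    "CAUSE NO. _________ TIMOTHY WIMBERLY PETITIONER",
    "ERIC SMITH, PETITIONER NO. 17-1048 MLF",
]
for notice in notices:
    print(extract_name_and_title(notice))
```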

Match Two Inconsistent Lists in Excel

I have a somewhat complicated listing issue in Excel and hopefully someone is up for the challenge. I appreciate any and all responses.
I have two lists of about 50,000 names. My actual workbook has longer strings of data but to keep it simple, I'll use this:
LIST A LIST B
Joe Michael
John
Kim Matt
Carl
Mike Joey
Matthew Kimberly
The goal is to rearrange column B so that each nickname lines up with the matching name in column A, i.e.:
LIST A LIST B
Joe Joey
John
Kim Kimberly
Carl
Mike Michael
Matthew Matt
The relevance of the name is less important than it matching similar characters. I can manually correct any extraneous or odd nicknames.
The other caveat is that names without akas/nicknames are left blank in the other column on both sides.
I have seen other sorting operations that could work, but don't due to the fact that the values in the two columns are technically different.
Overall, a simpler way to say it is that the aim is to make them more or less stack alphabetically, have the similar names line up, and ignore things that don't match.
Let me know if any further clarification is necessary.
Thank you!
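Outside Excel, one way to approximate "matching similar characters" is a similarity score. The sketch below is an assumption-laden illustration, not a tested solution for 50,000 rows: it scores every pair with `difflib.SequenceMatcher`, then pairs names greedily from the best score down. The function name `align_lists`, the 0.5 cutoff, and the split of the sample names between the two lists are all choices made here for illustration. Scoring every pair is quadratic, so large lists would need a cheaper pre-filter such as grouping by first letter.

```python
from difflib import SequenceMatcher


def align_lists(list_a, list_b, cutoff=0.5):
    """Map each name in list_a to its most similar unused name in list_b ("" if none)."""
    scored = []
    for a in list_a:
        for b in list_b:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= cutoff:
                scored.append((score, a, b))
    # Best-scoring pairs claim their partners first.
    scored.sort(key=lambda item: item[0], reverse=True)
    pairs = {a: "" for a in list_a}
    used_b = set()
    for score, a, b in scored:
        if pairs[a] == "" and b not in used_b:
            pairs[a] = b
            used_b.add(b)
    return pairs


# Assumed split of the sample names between the two columns.
names_a = ["Joe", "John", "Kim", "Mike", "Matthew"]
names_b = ["Michael", "Matt", "Carl", "Joey", "Kimberly"]
for a, b in align_lists(names_a, names_b).items():
    print(a, "->", b)
```

Names left with an empty match, and names in the second list that were never claimed, are the ones to review by hand.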

How do I remove duplicates AND the originals from a list in Sublime Text 3?

Edit > Permute Lines > Unique is great for removing duplicates from a list in Sublime Text. But what if I wanted to remove all matching results instead? For example:
james
james
bobby
mary
ann
ann
The above list of names would become:
bobby
mary
Because bobby and mary are the only names that only appear once.
If you don't mind your lines being sorted, you could do it like this:
Edit > Sort Lines
Find > Replace...
Ensure RegEx mode is on
Find What: (^.*$\n)\1+
Replace With: (blank)
Although, sorting wouldn't be necessary if all the duplicates are next to each other, as per your example. e.g. it would even work with the following:
james
james
bobby
mary
ann
ann
james
james
james
Note that this regex requires the last line to have a trailing newline character, if it is a duplicate, otherwise it won't find it.
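To see the pattern in action outside Sublime Text, here is a small Python sketch. It applies the same regex with `re.MULTILINE`, and also shows an alternative that counts lines with `collections.Counter`, which does not depend on duplicates being adjacent or on a trailing newline. The function names are hypothetical, and the Counter version is a swapped-in technique, not what the Sublime Text steps do.

```python
import re
from collections import Counter


def drop_adjacent_repeats(text):
    """Same regex as above: remove runs of identical adjacent lines entirely."""
    return re.sub(r"(^.*$\n)\1+", "", text, flags=re.MULTILINE)


def drop_all_repeats(text):
    """Keep only lines that occur exactly once anywhere, preserving order."""
    lines = text.splitlines()
    counts = Counter(lines)
    return "\n".join(line for line in lines if counts[line] == 1)


names = "james\njames\nbobby\nmary\nann\nann\n"
print(drop_adjacent_repeats(names))
print(drop_all_repeats(names))
```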
