I am trying to split some data up but stuck! I have some data which comes out like the below:
USERNAME Full Name Department
USERNAME First Initial Surname Department
USERNAME Full Name Department
I have tried numerous approaches, such as using TRIM and then pulling out individual words. However, some people's full names are 3 words while most are 2 words, so this breaks it all.
I have also tried substituting the double spaces so it breaks it up like so:
##USERNAME#######Full Name######Department###########
##USERNAME###First Initial Surname Department#
##USERNAME###########Full Name#####Department#####
But still unsure how I can pick up the words between the hashes.
Help really appreciated :)
If you have a text file with the raw data, separate the fields using a tab, a semicolon, or a comma. Pick a character that does not already appear in your file. A semicolon usually works for me.
Then, open it as a CSV (comma-separated values) file in Excel.
It will try to parse the file automatically. If it doesn't succeed, it will ask you what character you want to use as a separator.
You mentioned double spaces separating your data; that's your ticket in.
Let's say you've got "USERNAME  David Brossard  DEPT" in cell A2 (with two spaces between the fields).
In B2, let's FIND the first double space:
=FIND("  ",A2)
In C2, let's FIND the second double space:
=FIND("  ",A2,B2+1)
In D2, we'll grab everything in between:
=MID(A2,B2+2,C2-(B2+2))
There you go!
Alternatively, you can write it all in one formula, in B2:
=MID(A2,FIND("  ",A2)+2,FIND("  ",A2,FIND("  ",A2)+1)-(FIND("  ",A2)+2))
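If the raw text is also available outside Excel, the same "split on double spaces" idea can be sketched in a few lines of Python. This is only an illustration of the logic, not part of the formula approach above; the function name split_on_double_spaces is made up for the example, and it assumes fields are separated by two or more spaces while single spaces only occur inside a field.

```python
import re

def split_on_double_spaces(line):
    # Split wherever two or more consecutive spaces occur;
    # single spaces inside a field (e.g. within a full name) are kept.
    return [field for field in re.split(r" {2,}", line.strip()) if field]

fields = split_on_double_spaces("USERNAME  David Brossard  DEPT")
print(fields)  # ['USERNAME', 'David Brossard', 'DEPT']
```

Because the split is on the separator rather than on word counts, a 3-word name ends up in a single field just like a 2-word name.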
I am in charge of adding new employees to our speech recognition and gamification systems.
When I get a batch of tickets, I compile a bunch of data into a spreadsheet that I then reference when adding those users to the systems (which unfortunately do not have a JSON/CSV upload option or anything similar).
To save some time with compiling, I've started exporting a bunch of data from our database and our HR management system into that sheet, and then using the new employee's email to XLOOKUP all the other data fields.
One of our systems has a strict character limit, and the format for the username is "cde\firstname.lastname". This is no problem to CONCATENATE normally, but because of the limit, if the user has a hyphenated last name, I basically drop everything after the hyphen.
At first I tried a simple formula using a combination of LEFT and FIND -1 to find the hyphen, and then take everything to the left of it. This obviously doesn't end up working because I get a #VALUE! for anyone without a hyphen in their last name.
I tried using IFERROR to say "OK try to return the last name without a hyphen, otherwise just return the last name", but for some reason when I put the reference in the Return_If_Error portion, it doesn't recognize it as a reference.
So I am looking for a formula that will work with a LOOKUP'd value and only give me what's before a hyphen, but otherwise will still just give me the last name.
The baseline formula I have, that just looks up and concatenates the first and last into the "cde\firstname.lastname" is:
=CONCATENATE("cde\",LOWER(XLOOKUP(G578,Sheet4!M:M,Sheet4!B:B)),".",LOWER(XLOOKUP(G578,Sheet4!M:M,Sheet4!C:C)))
To expand on the comments, you've got the right idea: just use an IF statement to test whether the string contains "-", then use the normal string functions like FIND, LEFT, etc. to pick out the parts you want.
For example, assuming H1 holds an address of the form firstname.lastname@domain:
="cde\"&
LEFT(H1,FIND(".",H1)-1)&
IF(ISNUMBER(FIND("-",H1)),MID(H1,FIND(".",H1),FIND("-",H1)-FIND(".",H1)),
MID(H1,FIND(".",H1),FIND("@",H1)-FIND(".",H1)))
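The "cut at the hyphen if there is one, otherwise keep the whole last name" rule may be easier to see outside a nested formula. Here is a minimal Python sketch of that rule only; build_username and the sample names are invented for illustration, and it assumes the first and last names are already available as separate values.

```python
def build_username(first_name, last_name, prefix="cde\\"):
    # Keep only the part of the last name before the first hyphen;
    # partition returns the whole string when no hyphen is present,
    # so no error handling is needed for unhyphenated names.
    short_last, _, _ = last_name.partition("-")
    return f"{prefix}{first_name.lower()}.{short_last.lower()}"

print(build_username("Amy", "Smith-Jones"))  # cde\amy.smith
print(build_username("John", "Tavares"))     # cde\john.tavares
```

The point is the same as the IF(ISNUMBER(FIND(...))) test: decide first whether a hyphen exists, so the lookup never fails for people without one.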
I am trying to embed a SUBSTITUTE in my function, but I am not sure where to incorporate it. I am trying to extract just the text "Scrumactiviteiten" but in the source data sometimes a space will be in there. A sample:
Column A
1 Team xxxx 2018-17 Scrumactiviteiten 123 and then something
2 Team xxxx 2018-17 Scrum activiteiten 123 and then something
Column B (My formula)
1 Scrumactiviteiten
2 Scrum activiteiten
The function I used to extract it (ignore the "Balans" search please):
=IFERROR(IFERROR(IFERROR(MID(A1;SEARCH("Scrum activiteiten";A1;1);18);
MID(A1;SEARCH("Scrumactiviteiten";A1;1);17));MID(A1;SEARCH("Balans";A1;1);10));" ")
This works fine, but to remove the space I tried to embed a SUBSTITUTE where I use the mid search result as the old text and provide "Scrumactiviteiten" as the new text:
=IFERROR(IFERROR(IFERROR(SUBSTITUTE(A24;((MID(A24;SEARCH("Scrum activiteiten";A24;1);18)));"Scrumactiviteiten");MID(A24;SEARCH("Scrumactiviteiten";A24;1);17));MID(A24;SEARCH("Balans";A24;1);10));" ")
The result, however, is a copy of the full string. I also tried putting the SUBSTITUTE before the SEARCH, but that did not work either. I am pretty new to Excel formulas, and I think I either messed up the order or just plain don't understand how to embed a SUBSTITUTE in the formula I created. Some explanation of what I'm doing wrong would be much appreciated! Thank you in advance,
Mark
The problem is that you are not passing the correct arguments to SUBSTITUTE. Try this formula:
=IFERROR(IFERROR(IFERROR(SUBSTITUTE(((MID(A24;SEARCH("Scrum activiteiten";A24;1);18)));" ";"");MID(A24;SEARCH("Scrumactiviteiten";A24;1);17));MID(A24;SEARCH("Balans";A24;1);10));" ")
To use SUBSTITUTE, you first provide the string in which you want to replace something; the next two arguments are the text to be replaced and the text to replace it with. So, for example, =SUBSTITUTE("Scrum activiteiten";" ";"") returns Scrumactiviteiten, because the space " " is replaced with an empty string "".
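For comparison, the same order of operations (first extract the matching piece, then remove the space from that piece only) can be sketched in Python. This is just an illustration of the logic; extract_label is a made-up name, the "Balans" branch is left out as the question asks, and the pattern assumes at most one space between "Scrum" and "activiteiten".

```python
import re

def extract_label(text):
    # Find the label with or without the space; SEARCH is case-insensitive,
    # so the match here is case-insensitive too.
    match = re.search(r"scrum ?activiteiten", text, flags=re.IGNORECASE)
    if match:
        # Same idea as SUBSTITUTE(found; " "; ""): the replacement is applied
        # to the extracted piece, not to the whole source string.
        return match.group(0).replace(" ", "")
    return " "  # same fallback value as the outermost IFERROR

print(extract_label("Team xxxx 2018-17 Scrum activiteiten 123 and then something"))
# Scrumactiviteiten
```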
The attached image (link: https://i.stack.imgur.com/w0pEw.png) shows a range of cells (B1:B7) from a table I imported from the web. I need a formula that allows me to extract the names from each cell. In this case, my objective is to generate the following list of names, where each name is in its own cell: Erik Karlsson, P.K. Subban, John Tavares, Matthew Tkachuk, Steven Stamkos, Dustin Brown, Shea Weber.
I have been reading about left, right, and mid functions, but I'm confused by the irregular spacing and special characters (i.e. the box with question mark beside some names).
Can anyone help me extract the names? Thanks
Assuming that your cells follow the same format, you can use a variety of text functions to get the name.
This function requires the following format:
Some initial text, followed by
2 new lines in Excel (represented by CHAR(10))
The name, which consists of a first name, a space, then a last name
A second space on the same line as the name, followed by some additional text.
With this format, you can use the following formula (assuming your data is in an Excel table, with the column of initial data named Text):
=MID([@Text],SEARCH(CHAR(10),[@Text],SEARCH(CHAR(10),[@Text])+1)+1,SEARCH(" ",MID([@Text],SEARCH(CHAR(10),[@Text],SEARCH(CHAR(10),[@Text])+1)+1,LEN([@Text])),SEARCH(" ",MID([@Text],SEARCH(CHAR(10),[@Text],SEARCH(CHAR(10),[@Text])+1)+1,LEN([@Text])))+1)-1)
To come up with this formula, we take the following steps:
First, we figure out where the name starts. We know this occurs after the 2 new lines, so we use:
=SEARCH(CHAR(10),[@Text],SEARCH(CHAR(10),[@Text])+1)+1
The inner (occurring second) SEARCH finds the first new line, and the outer (occurring first) finds the 2nd new line.
Now that we have that value, we can use it to determine the rest of the string (after the 2 new lines). Let's say that the previous formula was stored in a table column called Start of Name. The 2nd formula will then be:
=MID([@Text],[@[Start of Name]],LEN([@Text]))
Note that we're using the length of the entire text, which by definition is more than we need. However, that's not an issue, since Excel returns the smaller amount between the last argument to MID and the actual length of the text.
Once we have the text from the start of the name on, we need to calculate the position of the 2nd space (where the name ends). To do that, we first need the position of the first space. This is similar to how we calculated the start of the name earlier (which starts after 2 new lines). Assuming the previous result is stored in a column named Rest of String, the function we need is:
=SEARCH(" ",[@[Rest of String]],SEARCH(" ",[@[Rest of String]])+1)-1
So now, we know where the name starts (after 2 new lines), and where it ends (after the 2nd space). Assuming we have these numbers stored in columns named Start of Name and To Second Space respectively, we can use the following formula to get the name:
=MID([@Text],[@[Start of Name]],[@[To Second Space]])
This is equivalent to the first formula: The difference is that the first formula doesn't use any "helper columns".
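The helper-column steps can also be written out as a short Python sketch, which makes the sequence (second line break, rest of string, second space) easier to follow. This only illustrates the logic; extract_name and the sample cell text are invented, since the real cell contents are only visible in the linked image.

```python
def extract_name(cell_text):
    # "Start of Name": the character right after the second line break.
    first_break = cell_text.index("\n")
    start = cell_text.index("\n", first_break + 1) + 1
    # "Rest of String": everything from the start of the name onward.
    rest = cell_text[start:]
    # "To Second Space": the name ends just before the second space.
    first_space = rest.index(" ")
    second_space = rest.index(" ", first_space + 1)
    return rest[:second_space]

# Hypothetical cell: some text, two line breaks, "First Last", a space, more text.
sample = "1\n\nErik Karlsson SJS - D"
print(extract_name(sample))  # Erik Karlsson
```

Like the formula, this raises an error (ValueError here, #VALUE! in Excel) when a cell does not contain two line breaks followed by two spaces.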
Of course, if any cell doesn't match this format, then you'll be out of luck. Using Excel formulas to parse text can be finicky and inflexible. For example, if someone has a middle name, or initials with spaces (e.g. P. K. Subban instead of P.K. Subban), or there is a Jr. or similar suffix, your job would be a lot harder.
Another alternative is to use regular expressions to get the data you want. I would recommend this thorough answer as a primer, although you would still have the same issues with name formats.
Finally, there's the obligatory Falsehoods Programmers Believe About Names as a warning against assuming any kind of standardized name format.
I'm attempting to create usernames based on a given person's first and last name. Generally, we use the first initial and last name for a username. However, many of our users now have 2 last names, sometimes joined by a hyphen. I am trying to create a formula that gives me the first initial, the first letter of the FIRST last name, and then the second last name.
For example --
Amy Smith-Jones ==
asjones
This is what I am currently using, but, of course, it would yield "asmithjones".
=LOWER(LEFT(A1,1)&SUBSTITUTE(SUBSTITUTE(A2,"-","")," ",""))
I've tried some variations of this, but with no luck.
=LOWER(LEFT(A1,1)&LEFT(A2,1)&SUBSTITUTE(SUBSTITUTE(A2,"-","")," ",""))
Is there a way to generate both the first letter of the first string and the full text of the 2nd string?
EDIT
I came up with something, but now I face another challenge
=IFERROR(LOWER(LEFT(D2,1)&SUBSTITUTE(SUBSTITUTE(RIGHT(F2,LEN(F2)-FIND(" ",F2&" ")),"-","")," ","")),LOWER(LEFT(D2,1)&SUBSTITUTE(SUBSTITUTE(F2,"-","")," ","")))
Some users have 1 last name, so the IFERROR fallback applies when the formula comes across those. But I have some who have a hyphen instead of a space. The SUBSTITUTE function accounts for both, but how can I make the FIND function do the same?
Try:
=LEFT(A1,1)&MID(A1,(SEARCH(" ",A1)+1),1)&RIGHT(A1,(LEN(A1)-(SEARCH(" ",(SUBSTITUTE(A1,"-"," ")),(SEARCH(" ",A1)+1)))))
Based on your edit, I'll assume first names are in column D and last names are in column F:
=LOWER(LEFT(D2) & IFERROR(LEFT(F2)&MID(F2,FIND("-",SUBSTITUTE(F2," ","-"))+1,99), F2))
SUBSTITUTE changes spaces to hyphens in the last name, so FIND can look for hyphens only.
FIND returns an error if no hyphen is found (after substitution), in which case IFERROR returns the entire last name.
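To check the rule against a few names, here is a minimal Python sketch of the same logic: treat a space or a hyphen as the separator between the two last names, and fall back to the whole last name when there is none. The name make_username is made up for the example, and the sample names come from the question or are invented.

```python
import re

def make_username(first_name, last_name):
    # Split the last name at the first space or hyphen, if there is one.
    parts = re.split(r"[ -]", last_name, maxsplit=1)
    if len(parts) == 2:
        # First letter of the first last name, then everything after the separator.
        last_part = parts[0][:1] + parts[1]
    else:
        # Single last name: use it unchanged.
        last_part = last_name
    return (first_name[:1] + last_part).lower()

print(make_username("Amy", "Smith-Jones"))  # asjones
print(make_username("Amy", "Smith Jones"))  # asjones
print(make_username("Amy", "Smith"))        # asmith
```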
EDIT: Thanks for all the responses, everyone. I'm going to go ahead and try to write rules to cover as many of the cases as I can, and either manually extract or try to write more rules to cover everything else.
I am trying to sort the same "types" of data into the same columns. Essentially, I get a data dump where a bunch of data (year, company name, person name, IO number, PO number, project description, and a bunch of comments) dumps into one single column, like this:
The ideal end result would be sorting so that same type of data in the same columns, i.e. all years in column A, all IOs in column B, all POs in column C, all person names in column D, all company names in column E, and whatever is left is dumped into a "comments" section in column F.
I've written a macro that employs the SUBSTITUTE function so that it goes through this string and substitutes all dashes and backslashes with commas, then separates based on the comma delimiter, then re-pastes the text as plain text. This works fairly well, except in the occasional case where there are dashes in the name of a company, or a backslash to indicate two people who own that IO/PO, or when all of the data is entered without any delimiter, such as: 2012 Company project title IO ##### PO #### Person Name.
So here is what I am asking:
1. Is there a better way to parse the data than I am doing now? How can I accommodate exceptions such as a dash in the company name, or a string where there are no dashes or backslashes, only spaces?
2. Once I have parsed all of this data and separated it into separate columns like so:
how do I sort it so that the same type of information is in the same column?
Any help would be greatly appreciated. Please let me know if anything was unclear.
Welcome to StackOverflow!
If the text follows clear rules, such as a "-" or "," separator, you can use the Split() function to get an array of tokens. If the text doesn't follow any rules, it's impossible. Very likely you are somewhere in the middle, where most of the texts follow the rules. For the other texts, you need to massage your code, try to find new rules, and check them with... see below.
Create a few functions IsYear(), IsPO(), IsCompany() that return True if the content is recognized. The functions could be as simple as IsYear = Text Like "20##" or could contain many tests. Then you make a function that checks each cell of each row, and sorts if required.
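As a rough illustration of that idea (written in Python rather than VBA, purely to show the shape of the logic), the recognizer functions and the per-row sorting could look like the sketch below. The IO/PO patterns are assumptions based on the "IO #####" and "PO ####" examples in the question, and the function names and sample tokens are invented.

```python
import re

def is_year(token):
    # Rough equivalent of VBA's  Text Like "20##": "20" followed by two digits.
    return re.fullmatch(r"20\d\d", token) is not None

def classify(token):
    token = token.strip()
    if is_year(token):
        return "year"
    # Assumed formats: "IO" or "PO", optional whitespace, then digits.
    if re.fullmatch(r"IO\s*\d+", token, flags=re.IGNORECASE):
        return "io"
    if re.fullmatch(r"PO\s*\d+", token, flags=re.IGNORECASE):
        return "po"
    return "comments"

def sort_row(tokens):
    # Put each recognized token in its own column; everything else,
    # including a second token of an already-filled type, goes to comments.
    row = {"year": "", "io": "", "po": "", "comments": []}
    for token in tokens:
        kind = classify(token)
        if kind != "comments" and row[kind] == "":
            row[kind] = token.strip()
        else:
            row["comments"].append(token.strip())
    return row

print(sort_row(["Acme project", "PO 1234", "2012", "IO 56789"]))
```

Recognizers for person and company names would be harder to write and are deliberately left out; anything unrecognized simply lands in the comments column.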
I'm sorry I can't give you anything more than some generic advice, but this is a very open question for a very challenging problem.
I hope this gets you started.
Along the lines of @Werner: "You can't make a silk purse …" Obviously the solution is to lean on whoever is responsible for the garbage in, to ensure that your source data is in better shape. However, I guess you are looking for a workaround. From your example, some 'tidying' is possible. E.g. sort on ColumnB, and where 2012 is in ColumnC, exchange the contents of B and C for that row. Then sort on ColumnD and do much the same for D and E. If ColumnF contains Quote, insert a blank cell and shift to the right. If ColumnF is blank, exchange the contents of that row with ColumnD. Move ColumnD to the end. Select anything before Quote in ColumnF and move it to ColumnE if that is empty, otherwise to ColumnH. The result should look something like:
-rather better than I was expecting and I’d guess about the limit of what could reasonably be programmed.