EDIT: Thanks for all the responses everyone. I'm going to go ahead and try and write rules to cover as many of the cases as I can, and either manually extract or try to right more rules to cover everything else.
I am trying to sort the same "types" of data into the same columns. Essentially, I get a data dump where a bunch of data (year, company name, person name, IO number, PO number, project description, and a bunch of comments) dumps into one single column, like this:
The ideal end result would be sorting so that same type of data in the same columns, i.e. all years in column A, all IOs in column B, all POs in column C, all person names in column D, all company names in column E, and whatever is left is dumped into a "comments" section in column F.
I've written a macro that employs the SUBSTITUTE function so that it goes through this string and substitutes all dashes and backslashes with commas, then separates based on the comma delimitor, then re-pastes the text as plain-text. This works fairly well, except for in the occasional case where there are dashes in the name of a company or a backslash to indicate two people who own that IO/PO or when all of the data is entered in without any delimitor such as: 2012 Company project title IO ##### PO #### Person Name.
So here is what I am asking:
1. Is there a better way to parse the data than I am doing now? How can I accommodate for the exceptions such as a dash in the company name or a string where there are no dashes or backslashes, only spaces?
2. Once I have parsed all of this data and separated it into separate columns like so:
how do I sort it so that the same type of information is in the same column?
Any help would be greatly appreciated. Please let me know if anything was unclear.
Welcome to StackOverflow!
If the text follows clear rules, like a separator as "-" or "," you can use the Split() function to get an array of tokens. If the text doesn't follow any rule it's impossible. Very likely you are in the middle, where most of the texts follow the rules. For the other texts, you need to massage your code and try to find new rules and check them with... see below.
Create a few functions IsYear(), IsPO(), IsCompany() that return True if the content is recognized. The functions could be as simple as IsYear = Text Like "20##" or could contain many tests. Then you make a function that checks each cell of each row, and sorts if required.
I'm sorry I can't give you anything more than some generic advice, but this is a very open question for a very challenging problem.
I hope this gets you started.
Along the lines of at #Werner “You can’t make a silk purse …” Obviously the solution is to lean on whoever is responsible for the garbage in to ensure that your source data is in better shape. However I guess you are looking for a workaround. From your example, some ‘tiding’ is possible. Eg sort on ColumnB and where 2012 is in ColumnC exchange the contents of B and C for that row. Then sort on ColumnD and do much that same for D and E. If ColumnF contains Quote insert a blank cell and shift to the right. If ColumnF is blank exchange contents of that row with ColumnD. Move ColumnD to the end. Select anything before Quote in ColumnF and remove it to ColumnE if that is empty, otherwise to ColumnH. The result should look something like:
-rather better than I was expecting and I’d guess about the limit of what could reasonably be programmed.
Related
I'm all new to VBA and have mostly been trying to modify code after recording macros, so it's all pretty basic and the approach might not be as elegant as some of the stuff I've seen on here. So here we go.
I have coded (by brute force) my data to be arranged like a CAD design tree view with parent products/assemblies and constituent sub-assemblies/parts.
Column E contains Level 0 top assembly Part Number
Column F contains Level 1 items Part Number
... etc all the way to ...
Column M containing Level 8 items Part Number
As an example, cell G112 contains ASSY1; cells H113 to H134 contain its constituent items.
I would like to display in a new column (i.e. Column O) the value of cell G112 (ASSY1) for each of its constituents. So O113 to O134 would show the value of G112. That would need to be applied to every single level of the assembly.
I'm not sure I'm making much sense do please have a look at the picture linked below, it speaks a thousand words. I've highlighted and colour-coded the result I would like in column O.
ADDENDUM - To clarify things:
I don't know how else to explain my request but to post a simplified version of my original picture.
SIMPLIFIED EXCEL TABLE
.CSV available here WeTransfer
A very useful tool to retrieve VBA code for determined action is the macro recorder, in the ribbon, Developer -> RecordMacro, perform you action and stop recording and then you can check the code generated for the actions you recorded. Its not the cleanest code but you can find there the lines of code for the specific actions you want. Once you step into a one concrete problem with the code you tried, you can then ask for help regarding something more concrete, more than expecting that someone will code that for you.
Anyhow if you want someone to try to solve your problem, you need to post the table with the accessible data instead of the image, for the person whoever tries to approach your problem to have the data available.
Hope that helps
Here's the answer I got from somewhere else if anyone is interested:
Formula in Cell O3:
=IF(C3=0,"N/A , ALREADY TOP LEVEL",INDEX(D$2:D2,AGGREGATE(14,6,(ROW(D$2:D2)-ROW(D$2)+1)/(C$2:C2=C3-1),1)))
Copy/Paste down in every cell in column O
The attached image (link: https://i.stack.imgur.com/w0pEw.png) shows a range of cells (B1:B7) from a table I imported from the web. I need a formula that allows me to extract the names from each cell. In this case, my objective is to generate the following list of names, where each name is in its own cell: Erik Karlsson, P.K. Subban, John Tavares, Matthew Tkachuk, Steven Stamkos, Dustin Brown, Shea Weber.
I have been reading about left, right, and mid functions, but I'm confused by the irregular spacing and special characters (i.e. the box with question mark beside some names).
Can anyone help me extract the names? Thanks
Assuming that your cells follow the same format, you can use a variety of text functions to get the name.
This function requires the following format:
Some initial text, followed by
2 new lines in Excel (represented by CHAR(10)
The name, which consists of a first name, a space, then a last name
A second space on the same line as the name, followed by some additional text.
With this format, you can use the following formula (assuming your data is in an Excel table, with the column of initial data named Text):
=MID([#Text],SEARCH(CHAR(10),[#Text],SEARCH(CHAR(10),[#Text])+1)+1,SEARCH(" ",MID([#Text],SEARCH(CHAR(10),[#Text],SEARCH(CHAR(10),[#Text])+1)+1,LEN([#Text])),SEARCH(" ",MID([#Text],SEARCH(CHAR(10),[#Text],SEARCH(CHAR(10),[#Text])+1)+1,LEN([#Text])))+1)-1)
To come up with this formula, we take the following steps:
First, we figure out where the name starts. We know this occurs after the 2 new lines, so we use:
=SEARCH(CHAR(10),[#Text],SEARCH(CHAR(10),[#Text])+1)+1
The inner (occurring second) SEARCH finds the first new line, and the outer (occurring first) finds the 2nd new line.
Now that we have that value, we can use it to determine the rest of the string (after the 2 new lines). Let's say that the previous formula was stored in a table column called Start of Name. The 2nd formula will then be:
=MID([#Text],[#[Start of Name]],LEN([#Text]))
Note that we're using the length of the entire text, which by definition is more than we need. However, that's not an issue, since Excel returns the smaller amount between the last argument to MID and the actual length of the text.
Once we have the text from the start of the name on, we need to calculate the position of the 2nd space (where the name ends). To do that, we need to calculate the position of the first space. This is similar to how we calculated the start of the name earlier (which starts after 2 new lines). The function we need is:
=SEARCH(" ",[#[Rest of String]],SEARCH(" ",[#[Rest of String]])+1)-1
So now, we know where the name starts (after 2 new lines), and where it ends (after the 2nd space). Assuming we have these numbers stored in columns named Start of Name and To Second Space respectively, we can use the following formula to get the name:
=MID([#Text],[#[Start of Name]],[#[To Second Space]])
This is equivalent to the first formula: The difference is that the first formula doesn't use any "helper columns".
Of course, if any cell doesn't match this format, then you'll be out of luck. Using Excel formulas to parse text can be finicky and inflexible. For example, if someone has a middle name, or someone has a initials with spaces (e.g. P.K. Subban was P. K. Subban), or there was a Jr. or something, your job would be a lot harder.
Another alternative is to use regular expressions to get the data you want. I would recommend this thorough answer as a primer. Although you still have the same issues with name formats.
Finally, there's the obligatory Falsehoods Programmers Believe About Names as a warning against assuming any kind of standardized name format.
I have 2 column data in Excel like this:
Can somebody help me write a formula in column C that will take the first name or the last name from column A and match it with column B and paste the values in column C. The following picture will give you the exact idea what I am trying to do. Thanks
Since your data is not "regular", you can try this formula which uses wild card to look for just the last name.
=INDEX($B$1:$B$4,MATCH("*" &MID(A1,FIND(" ",A1)+1,99)&"*",$B$1:$B$4,0))
It would be simpler if the first part followed some rule, but some have the first initial of the first name at the beginning; and others at the end.
Edit: (explanation added)
The FIND returns the character number of the first space
Add 1 to that to get the character number of the next word
Use MID to extract that word
Use that word in MATCH (with wild-cards before and after), to find it in the array of email addresses. This will return it's position in the array (row number)
Use that row number as an argument to the INDEX function to return the actual email address.
If you want to first examine the email address, you will need to determine which of the letters comprise the last name. Since this is not regular according to your example, you will need to check both options.
You will not be able to look for the first name from the email, as it is not present.
If you can guarantee that the first part will be follow the rule of one or the other, eg: either
FirstInitialLastName or
LastNameFirstInitial
Then you can try this:
=IFERROR(INDEX($B$1:$B$4,MATCH(MID(A1,FIND(" ",A1)+1,99)& LEFT(A1,1) &"*",$B$1:$B$4,0)),
INDEX($B$1:$B$4,MATCH( LEFT(A1,1)&MID(A1,FIND(" ",A1)+1,99) &"*",$B$1:$B$4,0)))
This seems to do what you want.
=IFERROR(VLOOKUP(LOWER(MID(A1,(SEARCH(" ",A1)+1),LEN(A1)))&LOWER(MID(A1,1,1))&"*",$B$1:$B$4,1,FALSE),VLOOKUP(LOWER(MID(A1,1,1))&LOWER(MID(A1,(SEARCH(" ",A1)+1),LEN(A1)))&"*",$B$1:$B$4,1,FALSE))
Its pretty crazy long and would likely be easier to digest and debug broken up into columns instead of one huge formula.
It basically makes FLast and FirstL out of the name field by splitting on the space.
LastF:
=LOWER(MID(A1,(SEARCH(" ",A1)+1),LEN(A1)))&LOWER(MID(A1,1,1))
And FirstL:
=LOWER(MID(A1,1,1))&LOWER(MID(A1,(SEARCH(" ",A1)+1),LEN(A1)))
We then make 2 vlookups for these by using wildcards:
LastF:
=VLOOKUP([lastfirst equation from above]&"*",$B$1:$B$4,1,FALSE)
And FirstL:
=VLOOKUP([firstlast equation from above]&"*",$B$1:$B$4,1,FALSE)
And then wrap those in IfError so they both get tried:
=IfError([firstLast vlookup],[lastfirst vlookup])
The rub is that's going to be hell to edit if you ever need to, which is why I suggest doing each piece in another column then referencing the previous one.
You also need to be aware that this answer will get tripped up by essentially the same name - e.g. Sam Smith and Sasha Smith would both match whatever the first entry for ssmith was. Any solution here will likely have the same pitfall.
Captain Morgan ------ Insane Journeys -------- A-
I have easily gotten the left and right side parts using Left() and Right() functions.
I want to use a function in excel (not vba) that will allow me to get the middle phrase in this sentence (The dashes are really excessive spaces). can I accomplish this with a Mid() function?
This is just 1 item on a list of 80 different things in 1 column that needs to be turned into 3 columns. Every item has different character lengths. So the length counts cannot be manually entered.
I agree with Text to Columns but the image in the other answer only has one space per row while OP has some spaces that are redundant and some that are not. For this I’d suggest a modified approach:
Replace all pairs of spaces with a character unlikely to be encountered – I’d suggest a pipe.
Apply Text to Columns with pipe as delimiter.
Apply TRIM to the middle column to remove any remaining redundant spaces (eg =TRIM(B1) copied down and then that column pasted as values over the source).
But to answer can I accomplish this with a Mid() function? I think yes though not cost effective for a mere 80 entries when there is a viable alternative.
Try to use "Text to columns" from Data Tab. It has option to split data to different columns using various criteria.
All you need to do is select data you want to split to columns and select criteria you need.
In your case it can be either Space or Other:. When you select Other: you can add your own criteria like "space dot space" or anything you need.
For more detailed information you can enter this link.
Seems like it would be a simple thing really (and it may be), but I'm trying to take the string data of a column and then through a calculated column, replace all the spaces with %20's so that the HTML link in the workflow produced email will actually not break off at the first space.
For example, we have this in our source column:
file:///Z:/data/This is our report.rpt
And would like to end up with this in the calculated column:
file:///Z:/data/This%20is%20our%20report.rpt
Already used the REPLACE, and made up a ghastly super nested REPLACE/SEARCH version, but the problem there is that you have to nest for EACH potential space, and if you don't know how many up front, it doesn't work, or will miss some.
Have any of you come across this scenario and how did you handle it?
Thanks in advance!
As far as I know there is no generic solution using the calculated-column syntax. The standard solution for this situation is using an ItemAdded (/ItemUpdated) event and initializing the field value from code.
I was able to solve this issue for my circumstances by using a series of calculated columns.
In the first calculated column (C1) I entered a formula to remove the first space, something like this:
=IF(ISNUMBER(FIND(" ",[Title])),REPLACE([Title],FIND(" ",[Title]),1,"%20"),[Title])
In the second Calculated column (C2) I used:
=IF(ISNUMBER(FIND(" ",[C1])),REPLACE([C1],FIND(" ",[C1]),1,"%20"),[C1]).
In my case, I wanted to encode upto four spaces, so I used 3 calculated columns (C1, C2, C3) in the same fashion and got the desired result.
This is not as efficient as using a single calculated column, but if SUBSTITUTE will not work in your SharePoint environment, and you cannot use an event handler or workflow, it may offer a workable alternative.
I actually used a slightly different formula, but it was on a work machine to which I don't have access at the moment, so I just grabbed this formula from a similar S.O. question. Any formula that will replace the first occurrence of a space with "%20" will work, the trick is to a) make sure the formula returns the original string unchanged if it does not have more spaces in it, and b) test, test, test. Create a view of your list that has the field you are trying to encode, plus the calculated fields, and see if you are getting the results you want.
so that the HTML link in the workflow produced email will actually not break off at the first space.
The browser only does this if you have not enclosed your link in quotes
If you wrap the link in quotes, it does not cut off at the first space
In a SharePoint Formula it would be:
="""file:///Z:/data/This is our report.rpt"""
becuase two quotes are the SP escape notation to output a quote
You can use this formula (Start trim for 1, in my case was 4):
=IF(ISBLANK([EUR Amount]),"",(TRIM(MID([EUR Amount],4,2))&TRIM(MID([EUR Amount],6,2))&TRIM(MID([EUR Amount],8,2))&TRIM(MID([EUR Amount],10,2))&TRIM(MID([EUR Amount],12,2))&TRIM(MID([EUR Amount],14,2)))*1)