I have 2002 addresses which have all been compiled into a single cell during the download process from my server; in most cases, the hash (#) symbol is used to separate fields (such as Line 1, Line 2, City, Postcode).
I have spent a lot of time trying combinations of LEFT, MID and other functions, but to no avail; the problem is that as there are so many addresses, and not all of them have the same number of characters for each field (such as Postcode - some will have 6 characters (including blank space), where some others will have five or more/fewer), there doesn't appear to be a one-size-fits-all solution that I can enter once and then use Excel's auto-fill handle/feature to complete the process for all records.
Here is a sample of my data (which has been anonymised):
44A THE ADDRESS#EALING#LONDON#W1 1WW#
541 PARSON PLACE#HENDON#LONDON#NW4 4WN#
SOMEBODY PRACTICE CHALKHILL PCC THE WELFORD CTR#11B CHALKHILL AVENUE#WIMBLEDONE MIDDX#HH9 9HH#
THE SEBELMONT MEDICAL CLINIC 18 EASTERN ROAD#SOUTHALL#MIDDLESEX#UN1 1NU#
130 FINGOVER COURT#REDBUS STREET#CAMBERWELL#SE5 5ES#
KING'S ELBOW MEDICAL CENTRE 17F STAGLAND LANE#KINGSBURY#MIDDX#NW9 9WN#
10 LADYFOOT ROAD RUISLIP#MIDDLESEX#HA4 4AH#
I want to be able to extract everything between the hash symbols (excluding/omitting the hash symbols themselves) and I am dedicating four columns to store this data: Address Line 1, AL2, AL3, Postcode.
Going by the first example (44A THE ADDRESS#EALING#LONDON#W1 1WW#) which resides in a single cell, I hope to achieve something like the following outcome:
AL1 AL2 AL3 POSTCODE
44A THE ADDRESS EALING LONDON W1 1WW
It doesn't matter if some of the address sections appear under the wrong column - I can very easily rectify this and can even add another column; I simply want to be able to extract the data from the single cell.
If you import the data as a text file, you can normally select the delimiter.
File->open
<select the file from the dialogue box>
The Text Import Wizard should then appear. After clicking Next, you reach the step where you choose the delimiter: tick Other and enter a hash (#), and the data sorts itself into columns instantly.
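For anyone who prefers to script the split rather than use the import wizard, here is a minimal Python sketch of the same idea. The function name `split_address` and the default of four columns are assumptions for illustration, not part of the original answer.

```python
def split_address(cell, columns=4):
    """Split a '#'-delimited address into a fixed number of fields."""
    # Drop the trailing '#' so it does not produce an empty last field.
    parts = [p.strip() for p in cell.rstrip("#").split("#")]
    # Pad short addresses with empty strings so every row has the same width.
    parts += [""] * (columns - len(parts))
    return parts


print(split_address("44A THE ADDRESS#EALING#LONDON#W1 1WW#"))
print(split_address("10 LADYFOOT ROAD RUISLIP#MIDDLESEX#HA4 4AH#"))
```

Addresses with fewer than four fields simply leave the last columns empty, which matches the poster's remark that fields landing in the wrong column can be fixed by hand afterwards.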
I have some text which I receive daily that I need to separate. I have hundreds of lines similar to the extract below:
COMMODITY PRICE DIFFERENTIAL: FEB50-FEB40 (APR): COMPANY A OFFERS 1000KB AT $0.40
I need to extract individual snippets from this text, each into a separate cell: the date, month, company, size, and price. In this case, the result would be:
FEB50-40
APR
COMPANY A
100
0.40
The issue I'm struggling with is uniformity. For example, one line might have FEB50-FEB40, another FEB5-FEB40, or FEB50-FEB4. Another example giving me difficulty is that some rows might have 'COMPANY A' and others 'COMPANYA' (one word instead of two).
Any ideas? I've been trying combinations of the below but I'm not able to have uniform results.
=TRIM(MID(SUBSTITUTE($D7," ",REPT(" ",LEN($D7))), (5)*LEN($D7)+1,LEN($D7)))
=MID($D7,20,21-10)
=TRIM(RIGHT(SUBSTITUTE($D6,"$",REPT("$",2)),4))
Sometimes I get 'FEB40-50(' or 'FEB40-FEB5' when it should be 'FEB40-FEB50'.
Thank you to anyone who is able to help.
You might get to the limits of formulas with this scenario, but with Power Query you can still work.
As I see it, you want to apply the following logic to extract text from this string:
COMMODITY PRICE DIFFERENTIAL: FEB50-FEB40 (APR): COMPANY A OFFERS 1000KB AT $0.40
text after the first : and before the first (
text between the brackets
text after the word OFFERS and before AT
text after the word AT
These can be easily translated into several "Split" scenarios inside Power Query.
Split by custom delimiter : - that's colon and space - at each occurrence
remove first column
Split new first column by ( - that's space and bracket - for leftmost
Replace ) with nothing in second column
Split third column by delimiter OFFERS
split new fourth column by delimiter AT
The screenshot shows the input data and the result in the Power Query editor after renaming the columns and before loading the query into the worksheet.
Once you have loaded the query, you can add / remove data in the input table and simply refresh the query to get your results. No formulas, just clicking ribbon commands.
You can take this further by removing the "KB" from the column, converting it to a number, and dividing it by 100. Your business processing logic will drive what you want to do. Just take it one step at a time.
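To make the sequence of splits concrete, here is a rough Python sketch that mirrors the Power Query steps listed above on the sample line. It is not Power Query code; the function name `parse_offer` is hypothetical, and the sketch assumes every row contains the ": ", " (", " OFFERS " and " AT " markers exactly once in that order.

```python
def parse_offer(line):
    """Mimic the split steps: ': ', ' (', ')', ' OFFERS ', ' AT '."""
    # Split on colon-and-space and discard the first piece (the label).
    _, spread_and_month, offer = line.split(": ", 2)
    # Split on space-and-bracket; what follows is the bracketed month.
    spread, month = spread_and_month.split(" (", 1)
    # Replace the closing bracket with nothing.
    month = month.replace(")", "")
    # Split on the words OFFERS and AT (padded with spaces here).
    company, rest = offer.split(" OFFERS ", 1)
    size, price = rest.split(" AT ", 1)
    return [spread, month, company, size, price]


sample = ("COMMODITY PRICE DIFFERENTIAL: FEB50-FEB40 (APR): "
          "COMPANY A OFFERS 1000KB AT $0.40")
print(parse_offer(sample))
```

Because the splits key on delimiters rather than character positions, 'FEB5-FEB40' and 'COMPANYA' come through unchanged, which is the uniformity problem the fixed-position MID formulas ran into.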
The attached image (link: https://i.stack.imgur.com/w0pEw.png) shows a range of cells (B1:B7) from a table I imported from the web. I need a formula that allows me to extract the names from each cell. In this case, my objective is to generate the following list of names, where each name is in its own cell: Erik Karlsson, P.K. Subban, John Tavares, Matthew Tkachuk, Steven Stamkos, Dustin Brown, Shea Weber.
I have been reading about left, right, and mid functions, but I'm confused by the irregular spacing and special characters (i.e. the box with question mark beside some names).
Can anyone help me extract the names? Thanks
Assuming that your cells follow the same format, you can use a variety of text functions to get the name.
This function requires the following format:
Some initial text, followed by
2 new lines in Excel (represented by CHAR(10))
The name, which consists of a first name, a space, then a last name
A second space on the same line as the name, followed by some additional text.
With this format, you can use the following formula (assuming your data is in an Excel table, with the column of initial data named Text):
=MID([@Text],SEARCH(CHAR(10),[@Text],SEARCH(CHAR(10),[@Text])+1)+1,SEARCH(" ",MID([@Text],SEARCH(CHAR(10),[@Text],SEARCH(CHAR(10),[@Text])+1)+1,LEN([@Text])),SEARCH(" ",MID([@Text],SEARCH(CHAR(10),[@Text],SEARCH(CHAR(10),[@Text])+1)+1,LEN([@Text])))+1)-1)
To come up with this formula, we take the following steps:
First, we figure out where the name starts. We know this occurs after the 2 new lines, so we use:
=SEARCH(CHAR(10),[@Text],SEARCH(CHAR(10),[@Text])+1)+1
The inner (occurring second) SEARCH finds the first new line, and the outer (occurring first) finds the 2nd new line.
Now that we have that value, we can use it to determine the rest of the string (after the 2 new lines). Let's say that the previous formula was stored in a table column called Start of Name. The 2nd formula will then be:
=MID([@Text],[@[Start of Name]],LEN([@Text]))
Note that we're using the length of the entire text, which by definition is more than we need. However, that's not an issue, since Excel returns the smaller amount between the last argument to MID and the actual length of the text.
Once we have the text from the start of the name on, we need to calculate the position of the 2nd space (where the name ends). To do that, we first need the position of the first space. This is similar to how we calculated the start of the name earlier (which starts after 2 new lines). Assuming the previous result is stored in a column called Rest of String, the function we need is:
=SEARCH(" ",[@[Rest of String]],SEARCH(" ",[@[Rest of String]])+1)-1
So now, we know where the name starts (after 2 new lines), and how long it is (the number of characters before the 2nd space). Assuming we have these numbers stored in columns named Start of Name and To Second Space respectively, we can use the following formula to get the name:
=MID([@Text],[@[Start of Name]],[@[To Second Space]])
This is equivalent to the first formula: The difference is that the first formula doesn't use any "helper columns".
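The same steps can be sketched outside Excel to check the logic. The Python below is only an illustration of the walkthrough above; the function name `extract_name` and the sample cell text are made up, since the original cell contents are only available as an image.

```python
def extract_name(text):
    """Return the text after the second line feed, up to the second space."""
    # "Start of Name": one past the second line feed (CHAR(10) in Excel).
    first_newline = text.index("\n")
    start = text.index("\n", first_newline + 1) + 1
    # "Rest of String": everything from the start of the name onwards.
    rest = text[start:]
    # "To Second Space": number of characters before the second space.
    first_space = rest.index(" ")
    second_space = rest.index(" ", first_space + 1)
    return rest[:second_space]


# Hypothetical cell content in the assumed format.
print(extract_name("1\n\nErik Karlsson D - OTT"))
```

Like SEARCH, `str.index` fails (here with a `ValueError`) when a line feed or space is missing, so cells that do not follow the format surface as errors rather than silently wrong names.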
Of course, if any cell doesn't match this format, then you'll be out of luck. Using Excel formulas to parse text can be finicky and inflexible. For example, if someone has a middle name, or someone has initials with spaces (e.g. P.K. Subban was P. K. Subban), or there was a Jr. or something, your job would be a lot harder.
Another alternative is to use regular expressions to get the data you want. I would recommend this thorough answer as a primer. Although you still have the same issues with name formats.
Finally, there's the obligatory Falsehoods Programmers Believe About Names as a warning against assuming any kind of standardized name format.
I have a list of addresses from which I need to extract the last sequence of numbers (zip code). I'm looking for a general expression from which I can extract the zip codes from addresses from all over the world. I would have to tweak the expression in order for it to work for each country, or for a group of countries, I assume.
I'm trying to write a formula in Excel that can recognise the last digit in a string and, from that, extract the digits immediately before it, stopping whenever it reaches a non-digit character. Below I have an example of an address and the formula I've come up with (in E26), but I'm looking for something more compact:
National Institute of Pharmaceutical Education and Research (NIPER), Phase X, Sector 67, SAS Nagar, Punjab, 160062, India.
=MID(E26, MAX(IF(ISNUMBER(VALUE(MID(E26,ROW(INDIRECT("1:" & LEN(E26))),1))),ROW(INDIRECT("1:" & LEN(E26))))+1)-6, 6)
The first part, recognizing the last digit, is working fine. The problem is recognizing the beginning of the sequence, at least in cases where there are also street numbers within the string (such as in this case). This is why I'm subtracting 6 from the position where the last digit was found, since I know the length of the zip code in this particular country. However, that may not be the case for all countries.
Plus, there are cases where there's a space within the sequence, such as: 160 062. Also, the addresses won't always have delimiters that I could use to extract the zip codes, which is why I need an algorithm for this.
I was wondering if there's a neater way to do this? I would be open to VBA. Thanks for your help!
Best regards,
Antonio
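The algorithm described in the question (find the last digit, then walk backwards until a non-digit is reached) can be expressed as a single regular expression. The Python sketch below is one possible reading of it, not a tested solution for every country; the pattern, the names `LAST_DIGIT_RUN` and `last_number`, and the choice to allow single spaces inside the run (to cover "160 062") are all assumptions.

```python
import re

# A run of digits, optionally containing single spaces (e.g. "160 062"),
# followed only by non-digit characters up to the end of the string.
LAST_DIGIT_RUN = re.compile(r"(\d+(?: \d+)*)\D*$")


def last_number(address):
    """Return the last digit sequence in the address, or None."""
    match = LAST_DIGIT_RUN.search(address)
    return match.group(1) if match else None


address = ("National Institute of Pharmaceutical Education and Research "
           "(NIPER), Phase X, Sector 67, SAS Nagar, Punjab, 160062, India.")
print(last_number(address))
```

Allowing spaces inside the run has a cost: a street number separated from the zip code by only a space would be merged into the result, so per-country tweaks are still likely to be needed.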
Currently I am working with a spreadsheet in which I have to collect the addresses of UK business directors. Some of the directors have multiple addresses. A UK zip code consists of two segments, and I have to ignore any address where the first segment of the zip code starts with W1, SW1, EC1, EC2, EC3 or EC4 and ends with a letter rather than a number, because those are generally industrial addresses. For example, in the following addresses the first segment of the zip code starts with W1, SW1, EC1, EC2, EC3 or EC4 and ends with a letter, so I have to skip them:
3RD FLOOR 207 REGENT STREET, LONDON, W1B 3HH
42, CHARTERHOUSE SQUARE, LONDON EC1, EC1M 6EU
5, WESTMINSTER GARDENS, LONDON, MARSHAM STREET, SW1P 4JA
160, QUEEN VICTORIA STREET, LONDON, EC4V 4QQ
THE BROADGATE TOWER 20, PRIMROSE STREET, LONDON, EC2A 2RS
BAKERS' HALL 9, HARP LANE, LONDON, EC3R 6DP
Also, please note that zip codes such as W11, W12, W13 and so on are not prohibited, and the same applies to SW1, EC1, EC2, EC3 and EC4 when the first segment ends with a number.
As we are working with bulk data, it's impossible to spot these zip codes by eye while collecting the addresses. So I have tried combining various formulas to show the first segment of those zip codes in a separate column, so that I can see the prohibited zip codes right after pasting an address. I have not managed to crack it, although I got close. I am sharing what I have tried and am looking for a compact solution.
First, I created a separate column E that shows the first segment of the zip code when I paste an address into column D. Here is the formula I used:
=TRIM(LEFT(RIGHT(" "&SUBSTITUTE(TRIM(D2)," ",REPT(" ",60)),120),60))
Next, I tried to show only those values which start with W1, SW1, EC1, EC2, EC3 or EC4 and end with a letter. So I created another column F with this formula for "SW1":
=IF(NOT(ISNUMBER(VALUE(RIGHT(E2,1)))), IF(ISNUMBER(SEARCH("SW1",E2)), LEFT(E2,3), ""),"")
To check for W1, EC1, EC2, EC3 and EC4 as well, I have to create 5 more columns with the same formula, changing only the value in the SEARCH function. This leads to 6 extra columns, and I want a compact formula to save space, because I generally split the screen between the browser and Excel so that I can copy and paste data into the spreadsheet without minimizing it. This saves me a lot of time. Creating six more columns would make my work more time-consuming, as I would have to check all six columns for those zip codes.
Question - 1:
I want to ask, is there any way to make a compact formula for showing my desired result in a single column only?
Question - 2:
We also have to ignore the addresses which contain the words "floor", "house" or "airport". I have tried the formula below for a single word:
=IF(ISNUMBER(SEARCH("Floor",D2)), "Floor", "")
Is there any possibility of combining all the required formulas and showing the result in one column?
Update regarding Question - 1:
I have tried combining some other formulas to show the required result. However, the result shows every zip code that starts with W1, SW1, EC1, EC2, EC3 or EC4, and I can't modify it to exclude those whose last character is a number. Here is the formula:
=IF(ISNUMBER(FIND("W1",(LEFT(E2,2)),1))=TRUE,LEFT(E2,2)&IF(NOT(ISNUMBER(VALUE(RIGHT(E2,1)))), RIGHT(E2,1),""),IF(ISNUMBER(FIND("SW1",E2,1))=TRUE,LEFT(E2,3)&IF(NOT(ISNUMBER(VALUE(RIGHT(E2,1)))), RIGHT(E2,1),""),IF(ISNUMBER(FIND("EC1",E2,1))=TRUE,LEFT(E2,3)&IF(NOT(ISNUMBER(VALUE(RIGHT(E2,1)))), RIGHT(E2,1),""),IF(ISNUMBER(FIND("EC2",E2,1))=TRUE,LEFT(E2,3)&IF(NOT(ISNUMBER(VALUE(RIGHT(E2,1)))), RIGHT(E2,1),""),IF(ISNUMBER(FIND("EC3",E2,1))=TRUE,LEFT(E2,3)&IF(NOT(ISNUMBER(VALUE(RIGHT(E2,1)))), RIGHT(E2,1),""),IF(ISNUMBER(FIND("EC4",E2,1))=TRUE,LEFT(E2,3)&IF(NOT(ISNUMBER(VALUE(RIGHT(E2,1)))), RIGHT(E2,1),""),""))))))
Your formula in column E could combine the check that the last character is not a number with the parsing of the first section of the postal code, using something like the following.
=IF(ISERROR(--RIGHT(A2)), TRIM(LEFT(TRIM(RIGHT(A2, 8)), 4)), "")
That makes the parsing dependent on a British postal code being either 7 or 8 characters wide with the first section being either 3 or 4 characters. The 'wandering' space is either picked up and trimmed off or not picked up at all depending on the length.
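To see why the fixed widths work, here is a small Python sketch of the same slicing. It only mirrors the formula above for illustration; the function name `outward_code` is an assumption, and Python's `strip()` stands in for TRIM (which additionally collapses repeated inner spaces, not relevant here).

```python
def outward_code(address):
    """First section of a postal code that ends the address string."""
    # IF(ISERROR(--RIGHT(A2)), ..., ""): blank when the last character
    # is a digit (or the string is empty).
    if not address or address[-1] in "0123456789":
        return ""
    # TRIM(RIGHT(A2, 8)): a 7-character code picks up one extra leading
    # character, which is removed only if it is the separating space.
    tail = address[-8:].strip()
    # TRIM(LEFT(..., 4)): a 3-character first section picks up the inner
    # space, which is trimmed off.
    return tail[:4].strip()


print(outward_code("3RD FLOOR 207 REGENT STREET, LONDON, W1B 3HH"))
print(outward_code("42, CHARTERHOUSE SQUARE, LONDON EC1, EC1M 6EU"))
```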
With column Y listing the prefixes of the postal codes to be ignored and column Z (in any worksheet) devoted to a cross-reference list of exceptions like the following:
Note that each of the entries in the Ignore list and the Exceptions list all carry the wildcard asterisk as a suffix. This is necessary to deal with staggered lengths of the comparisons to be made.
The formula used for a Conditional Formatting Rule for A2:E999 would be,
=AND(SUMPRODUCT(COUNTIF($E2, $Y$2:$Y$7)), NOT(SUMPRODUCT(COUNTIF($C2, $Z$2:$Z$4))))
This resolves TRUE for any postal that should be ignored and is not in the exception list.
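The wildcard logic of that rule can be sketched in Python with `fnmatch`, which matches `*` patterns against the whole string in the same way COUNTIF does. The ignore list below is taken from the prefixes named in the question; the exceptions list is hypothetical, because the original cross-reference table is not reproduced here. For simplicity this sketch tests both lists against the same first segment.

```python
from fnmatch import fnmatchcase

# Prefixes to ignore, each with a trailing wildcard (column Y in the answer).
IGNORE = ["W1*", "SW1*", "EC1*", "EC2*", "EC3*", "EC4*"]
# Hypothetical exceptions (column Z); the real list is not shown in the text.
EXCEPTIONS = ["W11*", "W12*", "W13*"]


def should_highlight(first_segment):
    """True when the segment matches an ignore prefix and no exception."""
    code = first_segment.upper()  # COUNTIF is not case-sensitive
    ignored = any(fnmatchcase(code, pattern) for pattern in IGNORE)
    excepted = any(fnmatchcase(code, pattern) for pattern in EXCEPTIONS)
    return ignored and not excepted


for segment in ["W1B", "W11", "SW1P", "NW4"]:
    print(segment, should_highlight(segment))
```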
The cross-reference tables of ignores and exceptions may benefit from becoming named ranges for easy reference. You could use a dynamic range definition for the Refers to: of something like:
=Sheet1!$Y$2:INDEX(Sheet1!$Y:$Y, MATCH("zzz", Sheet1!$Y:$Y))
=Sheet1!$Z$2:INDEX(Sheet1!$Z:$Z, MATCH("zzz", Sheet1!$Z:$Z))
Here's a formula to tell you whether the first word after the last comma ends with a number.
=ISNUMBER(NUMBERVALUE(RIGHT(LEFT(TRIM(RIGHT(SUBSTITUTE(B1,",",REPT(" ",LEN(B1))),LEN(B1))),SEARCH(" ",TRIM(RIGHT(SUBSTITUTE(B1,",",REPT(" ",LEN(B1))),LEN(B1))))-1))))
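Unpacked into steps, that formula takes the text after the last comma, keeps the first word, and tests its last character. A Python sketch of the same steps, purely for illustration (the function name is made up, and the second sample address is hypothetical):

```python
def first_word_after_last_comma_ends_with_digit(address):
    """Mirror of the formula: FALSE when there is no space after the comma."""
    # Text after the last comma, with surrounding spaces trimmed.
    tail = address.rsplit(",", 1)[-1].strip()
    # First word; the formula errors (hence FALSE) when no space is found.
    first_word, space, _ = tail.partition(" ")
    if not space:
        return False
    return first_word[-1] in "0123456789"


print(first_word_after_last_comma_ends_with_digit(
    "3RD FLOOR 207 REGENT STREET, LONDON, W1B 3HH"))
print(first_word_after_last_comma_ends_with_digit(
    "10 EXAMPLE ROAD, LONDON, W11 2AB"))  # hypothetical address
```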
Is it possible to parse/cast text (like "=A1+A2") as a formula in MS Excel? I want to build a formula from pieces of text - some of which will only be typed in later by a user.
If the INDIRECT() function worked for more than cell references, I could have typed =INDIRECT("=A1+A2").
I know you can work around this problem by simply adding a lot more hidden columns to do sub-calculations. But for the sake of scalability and efficiency, I would rather do something like the above.
I found similar questions here and here, yet they don't solve my problem.
The Real-world problem:
Read on for a better understanding as to why you would want to do the above
Scenario
Each item in the list consists of a string, which contains anywhere from 1 to 5 account names. Each account name is followed by an account number in brackets. The length of the number determines the type of account. Part of the account number is a date, whose format depends on the type of account. Furthermore, each account type may have more than 1 account-number length associated with it, although each number-length[*] is only associated with 1 account type.
Objectives
Extract account-names and their respective account-numbers and account-types from a list.
Make an assumption as to the account-type from the account-number
Validate this assumption by inspecting the build of the number and elements in the name
Check the validity of the account-numbers depending on their type.
The tricky part (this is where my problem lies)
The account-types and their respective account-number-lengths are not known beforehand, and are typed into a table by the user of the sheet, specifying a type of account and the number-lengths associated with this account-type. The user should type this into a list, not go and tinker around with delicate formulas.
Done so far
Column A: Contains the raw data (each cell has up to 5 names and numbers)
Columns B..F: Each column extracts 1 name, remains empty if all are already extracted
Columns G..K: Each column extracts 1 number corresponding to its name in columns B..F, remains empty if all are already extracted
Columns L..P: Each column calculates the length of the corresponding number in columns G..K
Now the user would type the following details into a table which assigns certain number-lengths an account type:
TYPE2, BUSINESS, (OR(length=13,length=6))
where length will later be replaced with the cell address which contains the calculated account number-length.
What I want to do now
Columns Q..U:
Should all indicate the account-type of the corresponding account-number in columns G..K. The idea is to build a nested if-elseIf-elseIf formula using the criteria typed in by the user as specified above. Example of one of the elseIF statements:
SUBSTITUTE(CONCATENATE("=IF(",criteria,",",type,",",errCode,")"),"length","O10")
All of these elseIf statements will then be concatenated together to form a master formula which will then need to be parsed/cast as a formula to calculate the account-type
This proposal uses only 5 columns (1 for each account-number, containing the master formula) and a table specifying account-types and criteria, also keeping the user away from formulas. Editing 1 line of code (the criteria) will update all formulas. Efficient & Scalable.
Since the user should never tinker around with the formulas under the hood, a simple 1 column if-elseIf-elseIf is out of the question. The alternative to the above would be to make a separate column to test for each account-type for each account-number. Separating/Abstracting out each test to its own column results in much better readability, easier editing & much less debugging - Unless you like multi-screen-wide-formulas. Example: 5 account-numbers * 10 possible account types = 50 extra columns.
Each edit to any criteria needs to be copied to 4 other non-adjacent columns and drag-filled down 10,000 rows (the columns cannot be adjacent since it is effectively a 5x5 array of columns). Neither efficient nor scalable, unless I'm missing some elegant way of updating non-adjacent formulas in a single click.
The rest of the validations error indications are trivial.
Sample data
Tshepo Trust (6901/2005) Marlene Mead (8602250646085)
Great Force Inv 67 Pty Ltd (200602258007)
Jane (870811) Livingstone (6901/2005) Janette Appel (8503250647056) James (900111)
I know all this would probably be much easier to achieve with clever usage of VBA, eliminating all the need to simulate abstraction, encapsulation, multi-dimensional arrays and functional programming on a spreadsheet. But until I can program in VBA, worksheet formulas will be my refuge.
[*]: account number-length could also be described as the amount of digits in the number or as indicated by this formula: LEN(accNumber)
In VBA you can read or set a cell's formula through its Formula property (Range.Formula).
I usually use Range to look up a cell by its address.
I'm not sure if this would answer your question (it's a very detailed question!), but suppose your user was entering the account-number lengths in a table (I'm calling it 'RefTable') like this:
Length of account number | business type
----------------------------------------
6 | Accountant
8 | Advisor
Then you could just use a vlookup on the length of the account number, given you've already separated them out.
=vlookup(len(accNumber), Reftable, 2, false)
Make sure that you either use a dynamic range name, or specify plenty of space below in RefTable, so that when your users add types, they don't get lost.
Also, if you have two different accounts with the same length, this could get you into trouble.
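For comparison, the extract-then-look-up idea can be sketched in Python. This is only an illustration under assumptions: the regular expression, the names `ACCOUNT`, `extract_accounts` and `account_type`, and the dictionary form of RefTable are made up here, with the two table rows taken from the example above.

```python
import re

# RefTable from the example: account-number length -> business type.
REF_TABLE = {6: "Accountant", 8: "Advisor"}

# A name (anything without brackets) followed by a number in brackets.
ACCOUNT = re.compile(r"\s*([^()]+?)\s*\(([^()]+)\)")


def extract_accounts(cell):
    """Return (name, number) pairs found in one raw cell."""
    return ACCOUNT.findall(cell)


def account_type(number):
    """Exact-match lookup on length, like VLOOKUP(..., FALSE)."""
    # Returns None where VLOOKUP would return #N/A.
    return REF_TABLE.get(len(number))


row = "Jane (870811) Livingstone (6901/2005)"
for name, number in extract_accounts(row):
    print(name, number, account_type(number))
```

A dictionary keyed on length makes the limitation noted above explicit: each length can map to only one type, so two account types sharing a length cannot both be represented.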