Pulling out specific characters that are combined in a single column

Pulling out specific characters that are combined in a single column - excel

I have a data extract that resulted in fields being combined into column A like this:
Sales Figures Report pg 121
Walmart Inc. 001230134 99 Associates Parkway 56.12 20.00 10.00 86.12 00 1
1400.25 262.40 14.50 1677.15 02 9
50.00 100.25 10.00 160.25 00 1
1400.25 262.40 14.50 1677.15 02 9
There are over 50,000 rows in this sort of format, some are a little different as they'll start with vendor information and then have those values after (still all in col. A). In the above example 1677.15 is the combined value of the three numbers before that, i.e. the total amount due.
Originally I wanted to basically separate out each value using things like left(), mid(), right() etc. however at this point all I want is the total figure, i.e. the $1677.15 in the above example. What is the best non-vba method of doing this?
Two issues:
The problem is that the amounts are not always the same number of digits (can range from $xx.xx to $xxx,xxx.xx)
Since There are multiple "." you can't use search() to find the correct character location.

In your example, the Total figure is always the third group from the end. That being the case, you can use the following formula and defined names:
=IFERROR(--INDEX(TRIM(MID(SUBSTITUTE(A1," ",REPT(" ",99)),seq_99,99)),LEN(A1)-LEN(SUBSTITUTE(A1," ",""))-1),"")
Defined Names
seq refers to: =ROW(INDEX($1:$65535,1,1):INDEX($1:$65535,255,1))
seq_99 refers to: =IF(seq=1,1,(seq-1)*99)
In the formula,
SUBSTITUTE: Replace all spaces with a large number (99) of spaces
MID Return an array consisting of each word (space separated)
TRIM the words to remove the extra spaces
INDEX Return the item that is third from the end.
IFERROR handle lines such as your first, which does not contain the relevant pattern.
seq and seq_99 return arrays of {1,2,3,...}and{1,99,198,...}` to be used in the formulas

Related

Excel Formula Extract any number greater than x charters from a string

I have a file which contains a list of data. In each cell is a name and number and a date the date is either mm/yy or mm-yy or mm-yyyy etc. (never the day just month and year)
The number I need is always going to be greater than 5 characters. Is there a way that I can get just the number from the string
xx company holding - 96923432 -02-22. (number required 96923432)
yy Company (HOLDINGS) LTD - 131002204 - 02/2023 (number required 131002204)
ab HOLDINGS LIMITED / 115472907 / Feb-23 (number required 115472907)

... prior removed
=========UPDATE=========
This formula will work for you, which splits your data by space, then converts to a number and then extracts the max. Adjust as needed if you have occasions where you may not have a number greater than 5 by wrapping with an IF().
=MAX(IFERROR(NUMBERVALUE(TEXTSPLIT(A2," ")),0))

This is interesting since you use 2 different delimiters. However, no worries you can simply use the following to capture both instances. If you have more possible delimiters simply just add them between the {} in both textbefore and textafter functions. Here is an example of the equation:=TEXTBEFORE(TEXTAFTER(A2, {"-","/"}), {"-","/"})
This should work for you then if you want to return nothing if output is less than 5. =IF(LEN(TEXTBEFORE(TEXTAFTER(A1,{"-","/"}),{"-","/"}))>5,TEXTBEFORE(TEXTAFTER(A1,{"-","/"}),{"-","/"}),"")

Is there an excel formula to extract numbers from the end of a string in a cell, where the length is not always constant

I am trying to separate information copied from a PDF table - id usually use text to columns but the only delamination is spaces and this then splits the data into multiple unusable columns
The data comes like this:
Raw Data
A1 Company 0
Company2 40000
name a 1
name b 15
name c 184
Big 17 Company 1887
I need the output to be:
Company
Units
A1 Company
0
Company2
40000
name a
1
name b
15
name c
184
Big 17 Company
1887
So the company name (that might contain numbers) is separated for the unit number (that could be 1-5 digits long).
I haven't been able to figure out a way that uses =len() as the string length isn't a constant mixed with the last numbers not being a consistent number of digits.
I'm currently using:
=SUMPRODUCT(MID(0&A2, LARGE(INDEX(ISNUMBER(--MID(A2, ROW(INDIRECT("1:"&LEN(A2))), 1)) * ROW(INDIRECT("1:"&LEN(A2))), 0), ROW(INDIRECT("1:"&LEN(A2))))+1, 1) * 10^ROW(INDIRECT("1:"&LEN(A2)))/10)
This gives me all the numbers in the cell - which works for 90% of the data as most of the company's don't have numbers in their name. But for something like 'A1 Company 0' it gives 10 as the output not just the 0. I then go and manually edit the small number of companies that this happens too.
I then use a mixture of =LEN() =LEFT and =RIGHT to split the information up as required for the further automated analysis.
I'd prefer a formula over VBA/macro
I cant provide the actual data but I hope I've given enough examples in the table above to show the main problems (different company name lengths, companies with numbers in their name, different amount of digits representing the units)

Using Libre Office, but this formula checks for the last space in the cell
=RIGHT(A1,LEN(A1)-FIND("#",SUBSTITUTE(A1," ","#",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))),1))
Taken from: https://trumpexcel.com/find-characters-last-position/

FILTERXML() would best choice for this case. Try-
=FILTERXML("<t><s>"&SUBSTITUTE(A1:A6," ","</s><s>")&"</s></t>","//s[last()]")
Details about FILTERXML() from JvdV here.

See if the following works for you:
Formula in B2:
=LEFT(A2,LEN(A2)-1-LEN(C2))
In C2:
=-LOOKUP(1,-RIGHT(A2,ROW($1:$5)))
For those users using ms365's newest functions:
=HSTACK(TEXTBEFORE(A2," ",-1),TEXTAFTER(A2," ",-1))

Is there a way to find any one of a set of characters using an excel formula

I have data that uses a range, or a less than symbol to denote 'between 0 and number'. But multiple characters are used for the same purpose.
It looks like below (first two columns), plus a column showing the results I want:
Country
Average hotdog consumption
Desired output
Madeupaland
10-200
105
Exampledesh
50—1000
525
Republic of Notreal
<1000
500
Inventia
≤5000
2500
Plus many rows where the data in the second column is purely numerical and doesn't need finessing into a number
I can use this formula to calculate the midpoint where there is a range:
=IFERROR(AVERAGE(LEFT(C2,FIND("–",C2)-1),RIGHT(C2, LEN(C2)-FIND("–",C2))), A2)
But they only covers one kind of dash(- and not —). Similarly, if I want to halve the numbers in rows with < and ≤ I'd need to replicate a formula there.
Is there a way of finding multiple different characters from a set? My understanding is that find looks for the whole string of characters. substitute is a work around, but I'd have to substitute every different value in the 'character set'.
In regex this would just be [-—].
I'm using Excel 2013 if that matters

It's not a perfect solution but you can try the following. This replaces those patterns of text with replacements representing which formula to use:
Create a Reference Table (I have made this in I1:K5)
|Pattern |Pattern Name |Substitution Rule |
|------- |------------ |----------------- |
|— |double dash |/2+0.5* |
|- |dash |/2+0.5* |
|< |lt |0.5* |
|≤ |lte |0.5* |
In your third column enter the following array formula (Using Ctrl + Shift + Enter to confirm)
=IF(ISNUMBER(B2),B2,"'="&SUBSTITUTE(B2,INDEX($I$2:$I$5,MIN(IF(ISNUMBER(FIND($I$2:$I$5,B2)),ROW($I$2:$I$5)-1,99))),INDEX($K$2:$K$5,MIN(IF(ISNUMBER(FIND($I$2:$I$5,B2)),ROW($I$2:$I$5),99)-1))))
Copy your third column and past values into a fourth column
Replace all the ''s with nothing to evaluate the expressions using Ctrl + H
My Result:
Country
Average hotdog consumption
Desired output
Formula Paste
Output after replacing 's
Madeupaland
10-200
105
'=10/2+0.5*200
105
Exampledesh
50—1000
525
'=50/2+0.5*1000
525
Republic of Notreal
<1000
500
'=0.5*1000
500
Inventia
≤5000
2500
'=0.5*5000
2500

Split text and number, then convert and add numbers

I have a series of values such as:
10RP
2.5R
5R
7.5R
10R
2.5YR
5YR
I want to convert the string portion to a number based on this table:
0 R
10 YR
20 Y
30 GY
40 G
50 BG
60 B
70 PB
80 P
90 RP
I then want to create two columns so that:
2.5YR
becomes:
2.5 10
In a third column I will add the two numbers together.
Can this be done just using formulas? I want to avoid using VBA if I can.
Thanks.

Here's another approach.
seq is a defined name referring to an array constant ={1,2,3,4,5}
If you might have numbers that encompass more than five characters, just extend the constant appropriately.
Number part: =LOOKUP(9E+307,--MID(A1,1,seq))
Letter portion converted to number:
=VLOOKUP(MID(A1,LOOKUP(2,1/ISNUMBER(-MID(A1,1,seq)),seq)+1,9),$F$1:$G$10,2,FALSE)
Where your table is in F1:G10 and reversed so that the letters are in the first column

This might not be the most efficient but should work
=IFERROR((LEFT(F3,LEN(F3)-2)+VLOOKUP(RIGHT(F3,2),$A$3:$B$12,2,0)),
(LEFT(F3,LEN(F3)-1)+VLOOKUP(RIGHT(F3,1),$A$3:$B$12,2,0)))
Where your lookup table is A3:B12 with the letters in the left-most column
Check for the two-letter combinations before the single-letter ones.

Destination, prefix lookup via Phone number - Excel

I have two tables.
Table one contains: phone number list
Table Two contains: prefix and destination list
I want look up prefix and destination for phone number.
Given below Row data table and result table
Table 01 ( Phone Number List)
Phone Number
------------
12426454407
12865456546
12846546564
14415332165
14426546545
16496564654
16896546564
16413216564
Table 02 (Prefix and Destination List)
PREFIX |COUNTRY
-------+---------------------
1 |Canada_USA_Fixed
1242 |Bahamas
1246 |Barbados
1268 |Antigua
1284 |Tortola
1340 |Virgin Islands - US
1345 |Cayman Island
144153 |Bermuda-Mobile
1473 |Grenada
1649 |Turks and Caicos
1664 |Montserrat
Table 03 (Result)
Phone Number | PREFIX | COUNTRY
--------------+--------+-------------------
12426454407 | 1242 | Bahamas
12865456546 | 1 | Canada_USA_Fixed
12846546564 | 1284 | Tortola
14415332165 | 144153 | Bermuda-Mobile
14426546545 | 1 | Canada_USA_Fixed
16496564654 | 1649 | Turks and Caicos
16896546564 | 1 | Canada_USA_Fixed
16643216564 | 1664 | Montserrat

Lets assume phone numbers are in column A, now in column B you need to extract the prefix. Something like this:
=LEFT(A1, 4)
However your Canada_USA_Fixed creates problems as does the Antigua mobile. I'll let you solve this issue yourself. Start with IF statements.
Now that you have extracted the prefix you can easily use VLOOKUP() to get the country.

Assuming that the longest prefix is 6 digits long you, can add 6 columns (B:G) next to the column with the phone numbers in table 1 (I assume this is column A). In column B you'd show the first 6 characters using =LEFT(A2,6), in the next column you show 5 chars, etc.
Then you add another 6 columns (H:M) , each doing a =MATCH(B2,Table2!A:A,0) to see if this prefix is in the list of prefixes.
Now if any of the 6 potential prefixes match, you'll get the row number of the prefix - else you'll get an #N/A error. Put the following formula in column N: {=INDEX(H2:M2,MATCH(FALSE,ISERROR(H2:M2),0))} - enter the formula as an array formula, i.e. instead of pressing Enter after entering it, press Ctrl-Shift-Enter - you'll see these {} around the formula then, so don't enter those manually!.
Column N now contains the row of the matching prefix or #N/A if no prefix matches. Therefore, put =IF(ISNA(N2,'No matching prefix',INDEX(Table2!B:B,N2)) in the next column and you'll be done.
You could also the above approach with less columns but more complex formulas but I wouldn't recommend it.

I'm also doing longest prefix matches and, like everyone else that Google has turned up, it's also for international phone number prefixes!
My solution is working for my table of 200 prefixes (including world zone 1, ie. having 1 for US/Canada and 1242 for Bahamas, etc).
Firstly you need this array formula (which I'm going to call "X" in the following but you'll want to type out in full)
(LEFT(ValueToFind,LEN(PrefixArray))=PrefixArray)*LEN(PrefixArray)
This uses the trick of multiplying a logical value with an integer so the result is zero if there's no match. You use this find the maximum value in one cell (which I'm calling "MaxValue").
{=MAX(X)}
If MaxValue is more than zero (and therefore some sort of match was found), you can then find the position of the maximum value in your prefix array.
{=MATCH(MaxValue,X,0)}
I've not worried about duplicates here - you can check for them in your PrefixArray separately.
Notes for neophytes:
PrefixArray should be an absolute reference, either stated with lots of $ or as a "named range".
I'm assuming you'll make ValueToFind, MaxValue and the resultant index into PrefixArray as cells on the same row, and therefore have a $ against their column letter but not their row number. This allows easy pasting for lots of rows of ValueToFind.
Array formula are indicated by curly braces, but are entered by typing the text without the curly braces and then hitting Ctrl-Shift-Enter.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Pulling out specific characters that are combined in a single column - excel

Related

Excel Formula Extract any number greater than x charters from a string

Is there an excel formula to extract numbers from the end of a string in a cell, where the length is not always constant

Is there a way to find any one of a set of characters using an excel formula

Split text and number, then convert and add numbers

Destination, prefix lookup via Phone number - Excel

Categories

Resources