Is there a way to find any one of a set of characters using an excel formula - excel

I have data that uses a range, or a less than symbol to denote 'between 0 and number'. But multiple characters are used for the same purpose.
It looks like below (first two columns), plus a column showing the results I want:
|Country |Average hotdog consumption |Desired output |
|------- |-------------------------- |-------------- |
|Madeupaland |10-200 |105 |
|Exampledesh |50—1000 |525 |
|Republic of Notreal |<1000 |500 |
|Inventia |≤5000 |2500 |
Plus many rows where the data in the second column is purely numerical and doesn't need finessing into a number
I can use this formula to calculate the midpoint where there is a range:
=IFERROR(AVERAGE(LEFT(C2,FIND("–",C2)-1),RIGHT(C2, LEN(C2)-FIND("–",C2))), A2)
But this only covers one kind of dash (- and not —). Similarly, if I want to halve the numbers in rows with < and ≤, I'd need to replicate a formula there.
Is there a way of finding any one of several different characters from a set? My understanding is that FIND looks for the whole string of characters. SUBSTITUTE is a workaround, but I'd have to substitute every different value in the 'character set'.
In regex this would just be [-—].
I'm using Excel 2013 if that matters

It's not a perfect solution but you can try the following. This replaces those patterns of text with replacements representing which formula to use:
Create a Reference Table (I have made this in I1:K5)
|Pattern |Pattern Name |Substitution Rule |
|------- |------------ |----------------- |
|— |double dash |/2+0.5* |
|- |dash |/2+0.5* |
|< |lt |0.5* |
|≤ |lte |0.5* |
In your third column enter the following array formula (Using Ctrl + Shift + Enter to confirm)
=IF(ISNUMBER(B2),B2,"'="&SUBSTITUTE(B2,INDEX($I$2:$I$5,MIN(IF(ISNUMBER(FIND($I$2:$I$5,B2)),ROW($I$2:$I$5)-1,99))),INDEX($K$2:$K$5,MIN(IF(ISNUMBER(FIND($I$2:$I$5,B2)),ROW($I$2:$I$5),99)-1))))
Copy your third column and paste it as values into a fourth column
Replace all the ' characters with nothing (using Ctrl + H) so that Excel evaluates the expressions
My Result:
|Country |Average hotdog consumption |Desired output |Formula Paste |Output after replacing 's |
|------- |-------------------------- |-------------- |------------- |------------------------- |
|Madeupaland |10-200 |105 |'=10/2+0.5*200 |105 |
|Exampledesh |50—1000 |525 |'=50/2+0.5*1000 |525 |
|Republic of Notreal |<1000 |500 |'=0.5*1000 |500 |
|Inventia |≤5000 |2500 |'=0.5*5000 |2500 |
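For comparison, here is a minimal Python sketch of the same idea outside Excel: match any one character from a set, then take the midpoint. The function name `midpoint` and the exact character sets are assumptions for illustration, and the sketch assumes non-negative values so that a leading hyphen is never a minus sign.

```python
import re

# Assumed marker set for "between 0 and number"; extend it to match the data.
UPPER_BOUND_MARKS = "<≤"


def midpoint(value: str) -> float:
    """Return the midpoint implied by a plain number, a range, or an upper bound."""
    text = value.strip()
    # "<1000" or "≤5000": treat as the range 0..number, so halve it.
    if text and text[0] in UPPER_BOUND_MARKS:
        return float(text[1:]) / 2
    # "10-200" or "50—1000": split on any one dash (hyphen, en dash, em dash).
    parts = re.split(r"[-–—]", text)
    if len(parts) == 2:
        low, high = (float(p) for p in parts)
        return (low + high) / 2
    # Otherwise assume the cell already holds a plain number.
    return float(text)
```

The character class plays the role of the regex `[-—]` mentioned in the question. The Excel answer gets the same effect by looking each pattern up in a reference table.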

Related

Is there an excel formula to extract numbers from the end of a string in a cell, where the length is not always constant

I am trying to separate information copied from a PDF table. I'd usually use Text to Columns, but the only delimiter is spaces, and this then splits the data into multiple unusable columns.
The data comes like this:
Raw Data
A1 Company 0
Company2 40000
name a 1
name b 15
name c 184
Big 17 Company 1887
I need the output to be:
|Company |Units |
|------- |----- |
|A1 Company |0 |
|Company2 |40000 |
|name a |1 |
|name b |15 |
|name c |184 |
|Big 17 Company |1887 |
So the company name (that might contain numbers) is separated from the unit number (that could be 1-5 digits long).
I haven't been able to figure out a way that uses =len() as the string length isn't a constant mixed with the last numbers not being a consistent number of digits.
I'm currently using:
=SUMPRODUCT(MID(0&A2, LARGE(INDEX(ISNUMBER(--MID(A2, ROW(INDIRECT("1:"&LEN(A2))), 1)) * ROW(INDIRECT("1:"&LEN(A2))), 0), ROW(INDIRECT("1:"&LEN(A2))))+1, 1) * 10^ROW(INDIRECT("1:"&LEN(A2)))/10)
This gives me all the numbers in the cell, which works for 90% of the data as most of the companies don't have numbers in their name. But for something like 'A1 Company 0' it gives 10 as the output, not just the 0. I then go and manually edit the small number of companies that this happens to.
I then use a mixture of =LEN() =LEFT and =RIGHT to split the information up as required for the further automated analysis.
I'd prefer a formula over VBA/macro
I can't provide the actual data but I hope I've given enough examples in the table above to show the main problems (different company name lengths, companies with numbers in their name, different numbers of digits representing the units)
Using Libre Office, but this formula checks for the last space in the cell
=RIGHT(A1,LEN(A1)-FIND("#",SUBSTITUTE(A1," ","#",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))),1))
Taken from: https://trumpexcel.com/find-characters-last-position/
FILTERXML() would be the best choice for this case. Try:
=FILTERXML("<t><s>"&SUBSTITUTE(A1:A6," ","</s><s>")&"</s></t>","//s[last()]")
Details about FILTERXML() from JvdV here.
See if the following works for you:
Formula in B2:
=LEFT(A2,LEN(A2)-1-LEN(C2))
In C2:
=-LOOKUP(1,-RIGHT(A2,ROW($1:$5)))
For those users using ms365's newest functions:
=HSTACK(TEXTBEFORE(A2," ",-1),TEXTAFTER(A2," ",-1))
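The answers above all rely on the same observation: the units are whatever follows the last space. A minimal Python sketch of that logic, with the hypothetical helper name `split_name_units`, assuming every row ends in a space followed by an integer:

```python
from typing import Tuple


def split_name_units(raw: str) -> Tuple[str, int]:
    """Split 'Big 17 Company 1887' into ('Big 17 Company', 1887)."""
    # rpartition splits on the LAST space only, so digits inside the
    # company name stay with the name.
    name, _, units = raw.strip().rpartition(" ")
    return name, int(units)
```

Because only the final space matters, names such as 'A1 Company' or 'Big 17 Company' need no special handling.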

Split text and number, then convert and add numbers

I have a series of values such as:
10RP
2.5R
5R
7.5R
10R
2.5YR
5YR
I want to convert the string portion to a number based on this table:
0 R
10 YR
20 Y
30 GY
40 G
50 BG
60 B
70 PB
80 P
90 RP
I then want to create two columns so that:
2.5YR
becomes:
2.5 10
In a third column I will add the two numbers together.
Can this be done just using formulas? I want to avoid using VBA if I can.
Thanks.
Here's another approach.
seq is a defined name referring to an array constant ={1,2,3,4,5}
If you might have numbers that encompass more than five characters, just extend the constant appropriately.
Number part: =LOOKUP(9E+307,--MID(A1,1,seq))
Letter portion converted to number:
=VLOOKUP(MID(A1,LOOKUP(2,1/ISNUMBER(-MID(A1,1,seq)),seq)+1,9),$F$1:$G$10,2,FALSE)
Where your table is in F1:G10 and reversed so that the letters are in the first column
This might not be the most efficient but should work
=IFERROR((LEFT(F3,LEN(F3)-2)+VLOOKUP(RIGHT(F3,2),$A$3:$B$12,2,0)),
(LEFT(F3,LEN(F3)-1)+VLOOKUP(RIGHT(F3,1),$A$3:$B$12,2,0)))
Where your lookup table is A3:B12 with the letters in the left-most column
Check for the two-letter combinations before the single-letter ones.
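As a cross-check of the expected results, here is a small Python sketch of the same split-and-look-up logic. The names `HUE_OFFSETS` and `split_hue` are made up for illustration. The sketch assumes each value is a number followed directly by one or two uppercase letters from the table in the question.

```python
import re

# Lookup table from the question, keyed by the letter code.
HUE_OFFSETS = {"R": 0, "YR": 10, "Y": 20, "GY": 30, "G": 40,
               "BG": 50, "B": 60, "PB": 70, "P": 80, "RP": 90}


def split_hue(value: str):
    """Return (number part, table value for the letters, their sum)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([A-Z]+)", value.strip())
    if match is None:
        raise ValueError(f"unexpected format: {value!r}")
    number = float(match.group(1))
    # Look up the whole letter run, so "YR" is never mistaken for "R".
    offset = HUE_OFFSETS[match.group(2)]
    return number, offset, number + offset
```

Matching the full run of letters sidesteps the ordering issue noted above, where two-letter codes must be tried before single-letter ones.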

Pulling out specific characters that are combined in a single column

I have a data extract that resulted in fields being combined into column A like this:
Sales Figures Report pg 121
Walmart Inc. 001230134 99 Associates Parkway 56.12 20.00 10.00 86.12 00 1
1400.25 262.40 14.50 1677.15 02 9
50.00 100.25 10.00 160.25 00 1
1400.25 262.40 14.50 1677.15 02 9
There are over 50,000 rows in this sort of format, some are a little different as they'll start with vendor information and then have those values after (still all in col. A). In the above example 1677.15 is the combined value of the three numbers before that, i.e. the total amount due.
Originally I wanted to basically separate out each value using things like left(), mid(), right() etc. however at this point all I want is the total figure, i.e. the $1677.15 in the above example. What is the best non-vba method of doing this?
Two issues:
The problem is that the amounts are not always the same number of digits (can range from $xx.xx to $xxx,xxx.xx)
Since there are multiple "." characters, you can't use search() to find the correct character location.
In your example, the Total figure is always the third group from the end. That being the case, you can use the following formula and defined names:
=IFERROR(--INDEX(TRIM(MID(SUBSTITUTE(A1," ",REPT(" ",99)),seq_99,99)),LEN(A1)-LEN(SUBSTITUTE(A1," ",""))-1),"")
Defined Names
seq refers to: =ROW(INDEX($1:$65535,1,1):INDEX($1:$65535,255,1))
seq_99 refers to: =IF(seq=1,1,(seq-1)*99)
In the formula,
SUBSTITUTE: Replace each space with a large number (99) of spaces.
MID: Return an array consisting of each word (space separated).
TRIM: Trim the words to remove the extra spaces.
INDEX: Return the item that is third from the end.
IFERROR: Handle lines such as your first, which does not contain the relevant pattern.
seq and seq_99 return the arrays {1,2,3,...} and {1,99,198,...} to be used in the formulas.
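The steps above amount to "split on spaces and take the third word from the end". A minimal Python sketch of that logic follows, using the hypothetical function name `total_from_row`. Unlike the Excel formula, `str.split()` collapses runs of whitespace, and stripping thousands separators is an added assumption based on the $xxx,xxx.xx format mentioned in the question.

```python
from typing import Optional


def total_from_row(row: str) -> Optional[float]:
    """Return the third-from-last space-separated value, or None if not numeric."""
    words = row.split()
    if len(words) < 3:
        return None
    try:
        # Remove thousands separators such as "123,456.78" before converting.
        return float(words[-3].replace(",", ""))
    except ValueError:
        # Header lines like "Sales Figures Report pg 121" land here,
        # mirroring the IFERROR branch of the formula.
        return None
```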

How do I sum data based on a PART of the headers name?

Say I have columns
/670 - White | /650 - black | /680 - Red | /800 - Whitest
These have data in their rows. Basically, I want to SUM their values together if their headers contain my desired string.
For modularity's sake, I wanted to merely specify to sum /670, /650, and /680 without having to mention the rest of the header text.
So, something like =SUMIF(a1:c1; "/NUM & /NUM & /NUM"; a2:c2)
That doesn't work, and honestly I don't know what I should be looking for.
Additional stuff:
I'm trying to think of the answer myself, is it possible to mention the header text as condition for ifs? Like: if A2="/650 - Black" then proceed to sum the next header. Is this possible?
Preferably it would not involve VBA; a draggable formula would be ideal!
At this point, I may as well request a version which handles the complete header name rather than just a part of it as I believe it to be difficult for formula code alone.
Thanks for having a look!
Let me know if I need to elaborate.
EDIT: In regards to data samples, any positive number will do actually, damn shame stack overflow doesn't support table markdown. Anyway, for example then..:
+-------------+-------------+-------------+-------------+-------------+
| A | B | C | D | E |
+---+-------------+-------------+-------------+-------------+-------------+
| 1 |/650 - Black |/670 - White |/800 - White |/680 - Red |/650 - Black |
+---+-------------+-------------+-------------+-------------+-------------+
| 2 | 250 | 400 | 100 | 300 | 125 |
+---+-------------+-------------+-------------+-------------+-------------+
I should have clarified:
The number range for these headers would go from /100 - /9999 and no more than that.
EDIT:
Progress so far:
https://docs.google.com/spreadsheets/d/1GiJKFcPWzG5bDsNt93eG7WS_M5uuVk9cvkt2VGSbpxY/edit?usp=sharing
Formula:
=SUMPRODUCT((A2:D2*
(MID($A$1:$D$1,2,4)=IF(LEN($H$1)=4,$H$1&"",$H$1&" ")))+(A2:D2*
(MID($A$1:$D$1,2,4)=IF(LEN($I$1)=4,$I$1&"",$I$1&" ")))+(A2:D2*
(MID($A$1:$D$1,2,4)=IF(LEN($J$1)=4,$J$1&"",$J$1&" "))))
Apparently, each MID function is returning false with each F9 calculation.
EDIT EDIT:
Okay! I found my issue: it's the / being read, when you ALSO mentioned that it wasn't required. Man, I should stop skimming!
Final Edit:
=SUMPRODUCT((RETURNSUM*
(MID(HEADER,2,4)=IF(LEN(Match5)=4,Match5&"",Match5&" ")))+(RETURNSUM*
(MID(HEADER,2,4)=IF(LEN(Match6)=4,Match6&"",Match6&" ")))+(RETURNSUM*
(MID(HEADER,2,4)=IF(LEN(Match7)=4,Match7&"",Match7&" "))))
The idea is that Header and RETURNSUM will become match criteria like the matches written above, that way it would be easier to punch new criterion into the search table. As of the moment, it doesn't support multiple rows/dragging.
I have knocked up a couple of formulas that will achieve what you are looking for. For ease I have made the search input require the number only as pressing / does not automatically type into the formula bar. I apologise for the length of the answer, I got a little carried away with the explanation.
I have set this up for 3 criteria located in J1, K1 and L1.
Here is the output I achieved:
Formula 1 - SUMPRODUCT():
=SUMPRODUCT((A4:G4*(MID($A$1:$G$1,2,4)=IF(LEN($J$1)=4,$J$1&"",$J$1&" ")))+(A4:G4*(MID($A$1:$G$1,2,4)=IF(LEN($K$1)=4,$K$1&"",$K$1&" ")))+(A4:G4*(MID($A$1:$G$1,2,4)=IF(LEN($L$1)=4,$L$1&"",$L$1&" "))))
SUMPRODUCT(array1,[array2]) behaves as an array formula without needing to be entered as one. Array formulas break down ranges and calculate them cell by cell (in this example we are using single rows, so the formula will assess columns separately).
(A4:G4*(MID($A$1:$G$1,2,4)=IF(LEN($J$1)=4,$J$1&"",$J$1&" ")))
Essentially I have broken the Sumproduct() formula into 3 identical parts - 1 for each search condition. (A4:G4*: Now, as the formula behaves like an array, we will multiply each individual cell by either 1 or 0 and add the results together.
1 is produced when the next part of the formula is true and 0 for when it is false (default numeric values for TRUE/FALSE).
(MID($A$1:$G$1,2,4)=IF(LEN($J$1)=4,$J$1&"",$J$1&" "))
MID(text,start_num,num_chars) is being used here to assess the 4 digits after the "/" and see whether they match with the number in the 3 cells that we are searching from (in this case the first one: J1). Again, as SUMPRODUCT() works very much like an array formula, each cell in the range will be assessed individually.
I have then used the IF(logical_test,[value_if_true],[value_if_false]) to check the length of the number that I am searching. As we are searching for a 4 digit text string, if the number is 4 digits then add nothing ("") to force it to a text string and if it is not (as it will have to be 3 digits) add 1 space to the end (" ") again forcing it to become a text string.
The formula will then perform the calculation like so:
The MID() formula produces the array: {"650 ","670 ","800 ","680 ","977 ","9999","143 "}. This combined with the first search produces {TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE} which when multiplied by A4:G4
(remember 0 for false and 1 for true) produces this array: {250,0,0,0,0,0,0} essentially pulling the desired result ready to be summed together.
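To make the array arithmetic concrete, here is a rough Python sketch of what the SUMPRODUCT formula computes. The function name `sum_matching` is hypothetical. One difference from the formula: the search numbers are collected into a set, so a number listed twice is only counted once here.

```python
def sum_matching(headers, values, wanted):
    """Sum the values whose header number is one of the wanted numbers."""
    # MID(header, 2, 4): the four characters after the leading "/".
    keys = [h[1:5] for h in headers]
    # 3-digit search numbers are padded with one trailing space, as in
    # IF(LEN(x)=4, x&"", x&" ").
    targets = {str(w).ljust(4) for w in wanted}
    # Multiplying by TRUE/FALSE in Excel becomes a filter here.
    return sum(v for k, v in zip(keys, values) if k in targets)
```

With the sample row from the question (headers /650, /670, /800, /680, /650 and values 250, 400, 100, 300, 125), asking for 650, 670 and 680 adds up every column except /800.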
Formula 2: =SUM(IF(Array)): [This formula does not work for 3 digit numbers as they will exist within the 4 digit numbers! I have included it for educational purposes only]
=SUM(IF(ISNUMBER(SEARCH($J$1,$A$1:$G$1)),A8:G8),IF(ISNUMBER(SEARCH($K$1,$A$1:$G$1)),A8:G8),IF(ISNUMBER(SEARCH($L$1,$A$1:$G$1)),A8:G8))
The formula will need to be entered as an array (once copy and pasted while still in the formula bar hit CTRL+SHIFT+ENTER)
This formula works in a similar way, SUM() will add together the array values produced where IF(ISNUMBER(SEARCH() columns match the result column.
SEARCH() will return a number when it finds the exact characters in a cell; the number represents its position in number of characters. By using ISNUMBER() I am avoiding having to do the whole MID() and IF(LEN()=4,""," ") I used in the previous formula, as TRUE/FALSE will be produced when a match is found regardless of its position or cell formatting.
As previously mentioned, this poses a problem as 999 can be found within 9999 etc.
The resulting array for the first part is: {250,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE} (if you would like to see the array you can highlight that part of the formula and calculate with F9 but be sure to highlight the exact brackets for that part of the formula).
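The substring problem is easy to reproduce outside Excel. The sketch below uses made-up headers and the hypothetical helper names `sum_by_search` and `sum_by_key` to contrast a "contains" test (like SEARCH) with an exact comparison of the number between "/" and " - ".

```python
def sum_by_search(headers, values, wanted):
    """Substring match, analogous to ISNUMBER(SEARCH(...))."""
    needle = str(wanted)
    return sum(v for h, v in zip(headers, values) if needle in h)


def sum_by_key(headers, values, wanted):
    """Exact match on the number that sits between '/' and ' - '."""
    needle = str(wanted)
    total = 0
    for header, value in zip(headers, values):
        key = header.lstrip("/").split(" - ", 1)[0].strip()
        if key == needle:
            total += value
    return total
```

For the headers "/999 - Grey" and "/9999 - Silver" with values 10 and 20, searching for 999 by substring picks up both columns, while the exact comparison picks up only the first.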
I hope I have explained this well, feel free to ask any questions about stuff that you don't understand. It is good to see people keen to learn and not just fishing for a fast answer. I would be more than happy to help and explain in more depth.
I start this solution with the names in an array, you can read the header names into an array with not too much difficulty.
Sub test()
    Dim myArray(1 To 4) As String
    Dim ArrayValue As Variant 'For Each over an array needs a Variant loop variable
    Dim endposition As Long
    Dim stringvalue As String
    Dim NumberValue As Long
    Dim Total As Long

    myArray(1) = "/670 - White"
    myArray(2) = "/650 - black"
    myArray(3) = "/680 - Red"
    myArray(4) = "/800 - Whitest"

    For Each ArrayValue In myArray
        'Find position of the " - " separator that follows the number
        endposition = InStr(1, ArrayValue, " - ", vbTextCompare)
        'Grab the number section from the string, based on starting and ending positions
        stringvalue = Mid(ArrayValue, 2, endposition - 2)
        'Convert to number
        NumberValue = CLng(stringvalue)
        'Add to total
        Total = Total + NumberValue
    Next ArrayValue

    'Print total
    Debug.Print Total
End Sub
This will print the answer to the debug window.

Destination, prefix lookup via Phone number - Excel

I have two tables.
Table one contains: phone number list
Table Two contains: prefix and destination list
I want to look up the prefix and destination for each phone number.
Below are the raw data tables and the result table.
Table 01 ( Phone Number List)
Phone Number
------------
12426454407
12865456546
12846546564
14415332165
14426546545
16496564654
16896546564
16413216564
Table 02 (Prefix and Destination List)
PREFIX |COUNTRY
-------+---------------------
1 |Canada_USA_Fixed
1242 |Bahamas
1246 |Barbados
1268 |Antigua
1284 |Tortola
1340 |Virgin Islands - US
1345 |Cayman Island
144153 |Bermuda-Mobile
1473 |Grenada
1649 |Turks and Caicos
1664 |Montserrat
Table 03 (Result)
Phone Number | PREFIX | COUNTRY
--------------+--------+-------------------
12426454407 | 1242 | Bahamas
12865456546 | 1 | Canada_USA_Fixed
12846546564 | 1284 | Tortola
14415332165 | 144153 | Bermuda-Mobile
14426546545 | 1 | Canada_USA_Fixed
16496564654 | 1649 | Turks and Caicos
16896546564 | 1 | Canada_USA_Fixed
16643216564 | 1664 | Montserrat
Let's assume phone numbers are in column A; now in column B you need to extract the prefix. Something like this:
=LEFT(A1, 4)
However your Canada_USA_Fixed creates problems as does the Antigua mobile. I'll let you solve this issue yourself. Start with IF statements.
Now that you have extracted the prefix you can easily use VLOOKUP() to get the country.
Assuming that the longest prefix is 6 digits long, you can add 6 columns (B:G) next to the column with the phone numbers in table 1 (I assume this is column A). In column B you'd show the first 6 characters using =LEFT(A2,6), in the next column you show 5 chars, etc.
Then you add another 6 columns (H:M) , each doing a =MATCH(B2,Table2!A:A,0) to see if this prefix is in the list of prefixes.
Now if any of the 6 potential prefixes match, you'll get the row number of the prefix - else you'll get an #N/A error. Put the following formula in column N: {=INDEX(H2:M2,MATCH(FALSE,ISERROR(H2:M2),0))} - enter the formula as an array formula, i.e. instead of pressing Enter after entering it, press Ctrl-Shift-Enter - you'll see these {} around the formula then, so don't enter those manually!.
Column N now contains the row of the matching prefix or #N/A if no prefix matches. Therefore, put =IF(ISNA(N2),"No matching prefix",INDEX(Table2!B:B,N2)) in the next column and you'll be done.
You could also use the above approach with fewer columns but more complex formulas, but I wouldn't recommend it.
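The helper-column approach boils down to "try the 6-character prefix first, then 5, and so on, and stop at the first hit". A minimal Python sketch of that cascade, with the hypothetical names `PREFIXES` and `lookup_prefix` and the prefix table copied from the question:

```python
PREFIXES = {
    "1": "Canada_USA_Fixed", "1242": "Bahamas", "1246": "Barbados",
    "1268": "Antigua", "1284": "Tortola", "1340": "Virgin Islands - US",
    "1345": "Cayman Island", "144153": "Bermuda-Mobile", "1473": "Grenada",
    "1649": "Turks and Caicos", "1664": "Montserrat",
}


def lookup_prefix(phone, table, max_len=6):
    """Return (prefix, country) for the longest matching prefix, else None."""
    # Try the longest candidate first, like helper columns B:G from left to right.
    for length in range(min(max_len, len(phone)), 0, -1):
        candidate = phone[:length]
        if candidate in table:
            return candidate, table[candidate]
    return None
```

Trying the longest candidate first is what makes 12426454407 resolve to 1242 (Bahamas) rather than to the catch-all prefix 1.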
I'm also doing longest prefix matches and, like everyone else that Google has turned up, it's also for international phone number prefixes!
My solution is working for my table of 200 prefixes (including world zone 1, ie. having 1 for US/Canada and 1242 for Bahamas, etc).
Firstly you need this array formula (which I'm going to call "X" in the following but you'll want to type out in full)
(LEFT(ValueToFind,LEN(PrefixArray))=PrefixArray)*LEN(PrefixArray)
This uses the trick of multiplying a logical value by an integer so the result is zero if there's no match. You use this to find the maximum value in one cell (which I'm calling "MaxValue").
{=MAX(X)}
If MaxValue is more than zero (and therefore some sort of match was found), you can then find the position of the maximum value in your prefix array.
{=MATCH(MaxValue,X,0)}
I've not worried about duplicates here - you can check for them in your PrefixArray separately.
Notes for neophytes:
PrefixArray should be an absolute reference, either stated with lots of $ or as a "named range".
I'm assuming you'll make ValueToFind, MaxValue and the resultant index into PrefixArray as cells on the same row, and therefore have a $ against their column letter but not their row number. This allows easy pasting for lots of rows of ValueToFind.
Array formulas are indicated by curly braces, but are entered by typing the text without the curly braces and then hitting Ctrl-Shift-Enter.
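The score-and-maximum trick can be sketched in Python as follows. The function name `longest_prefix_index` is hypothetical, and note that it returns a 0-based list index, whereas Excel's MATCH returns a 1-based position.

```python
def longest_prefix_index(value, prefixes):
    """Return the index of the longest prefix that starts `value`, else None."""
    # (LEFT(value, LEN(p)) = p) * LEN(p): the prefix length on a match, else 0.
    scores = [len(p) if value.startswith(p) else 0 for p in prefixes]
    best = max(scores, default=0)  # {=MAX(X)}
    if best == 0:
        return None  # no prefix matched at all
    return scores.index(best)  # {=MATCH(MaxValue, X, 0)}, but 0-based
```

As in the formula version, duplicate prefixes are not handled specially: the first one with the maximum score wins.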

Resources