Extract number from column - excel

i have a lot of address data in (mostly) this format:
Karl-Reimann-Ring 13, 99087 Erfurt
Markttwiete 2, 23611 Bad Schwartau
Hüxstraße 55, 23552 Lübeck
Bunsenstraße 1c, 24145 Kiel
and my goal is to extract the zip code.
I copied a formula from a website, which i dont really understand:
=VERWEIS(9^9;1*TEIL(E2861&"#";SPALTE(2860:2860);6))
VERWEIS = LOOKUP,
TEIL = MID,
SPALTE = COLUMN
This formula seems to work 99% of the time, also for the ones above, but for some i get weird results.
Kurt-Schumacher-Straße 56, 55124 Mainz --> 44340
Kleine Früchtstraße 6, 55130 Mainz --> 44346
Bahnstraße 1, 55128 Mainz --> 44344
All with 'Mainz' are wrong and start with 44xxx
But when i increase the last argument from 6 to 7 it seems to work.
Do somebody know, how i can impove this formula to get always the correct zip code?

The problem is that the formula will return the last "number" which is constructed of 6 character strings starting at every character in the string.
The last substring that can be interpreted numerically (in the 55424 Mainz address) is actually 24 Mai. German Excel will parse that into 24 Mai 2021 which, as a number, will be 44340.
One modification you can make to your formula, to prevent that from happening, would be to add a comma after the zipcode. eg:
=LOOKUP(9^9;1*MID(SUBSTITUTE(A1;" ";",") & "#";COLUMN(2860:2860);6))
Another option would be to use FILTERXML where you can separate by spaces, and then return the last numeric node:
=FILTERXML("<t><s>" &SUBSTITUTE(A1;" ";"</s><s>") & "</s></t>";"//s[number(.) = number(.)][last()]")

Related

How to fix the #SPILL! Error by displaying only the second value?

I have a column with some info displayed like that:
Product Info
I am the 3rd product from 2020
I was created in 1995 and I went public in 2021
I am a not sure if I'm from 2019 2020 2021
I have a formula to extract the year in the above column that is:
=IFERROR(FILTERXML("<k><m>"&SUBSTITUTE([#[Product Name]]," ","</m><m>")&"</m></k>","//m[.=number() and string-length()=4]"),"")
The problem with this formula is that it works fine with the first case, but it gives me a #SPILL! Error on the other two cases. My ideal output would be:
Product Info
Year
I am the 3rd product from 2020
2020
I was created in 1995 and I went public in 2021
2021
I am a not sure if I'm from 2019 2020 2021
Basically, for the first case, just return the 4 digits. EVERY time that I only have one sequence of 4 digits, I want to return that sequence.
For the second case, I want to return ONLY the second year. EVERY time I have 2 sequences of 4 digits, I want to return ONLY the second year.
For the third case, I want to return nothing. EVERY time I have more than 2 sequences of 4 digits, I want to return blank.
The last thing I tried to add was position()>5 and that would cut off the 1995 in the second example, but I would continue having the Error on the third example. Also, my list is quite huge, and I am not sure if the position()>5 thing would work for ALL products that fall in the same second example.
I am not very good with XPATH, so any help would be greatly appreciated.
Thank you!
Disclaimer: Below solution is written on the assumption that when 'count of years < 3', return the last given year. If 'count >= 3' then only return the last year if years come in pairs of two. Hence the use of 'modulus 2 == 0'.‡
You can expand the xpath for sure if you so desire. However, I'd rewrite it a little bit. Each predicate, the structure between the opening and closing square brackets, is a filter of a given nodelist. To write multiple of these structures is in fact anding such predicates. To get a better understanding of what most common xpath 1.0 functions can do within FILTERXML(), I'd like to redirect you to this post.
So to write a consecutive pattern of predicates I'd opt for:
[.*0=0] - First return a filtered nodelist of all numbers where a node multiplied by zero equals zero;
[string-length()=4] - Then return only those that are 4 characters long‡‡;
[position() = last() and (position() = 1 or position() mod 2 = 0)] - The 3rd and last predicate is the trickiest for your query. This is done with a first check that position() = last() meaning the node needs to be the last node in the filtered nodelist of step 2 and (position() = 1 or position() mod 2 = 0) means we want to check that this node is also at the 1st index or the modulus 2 of the indexed position equals 0‡‡‡.
Formula in B2:
=IFERROR(FILTERXML("<t><s>"&SUBSTITUTE(A2," ","</s><s>")&"</s></t>","//s[.*0=0][string-length()=4][position() = last() and (position() = 1 or position() mod 2 = 0)]"),"")
Whilst the above would work for Excel 2013 and higher‡‡‡‡, you do talk about spilled behaviour. If you happen to work with the current channel in ms365 you could also try:
=LET(x,TEXTSPLIT(A2," "),y,--FILTER(x,ISNUMBER(-(x&"**0"))*(LEN(x)=4),{1,2,3}),z,COUNT(y),IF(OR(z=1,MOD(z,2)=0),TAKE(y,,-1),""))
‡ If you need to simply return the last year if 'count < 3' then you can use xpath "//s[.*0=0][string-length()=4][position()<3 and position() = last()]" or ms365 formula =LET(x,TEXTSPLIT(A2," "),y,FILTER(x,ISNUMBER(-(x&"**0"))*(LEN(x)=4),""),IF(COUNTA(y)>2,"",TAKE(y,,-1))).
‡‡ Note that you can be more strict about this if you'd wish to validate that a year is between say 1900-2050 or so. One could replace the 1st and 2nd predicate with [.*1>1899][.*1<2051].
‡‡‡ Note that the order or writing your and/or statements in xpath do matter. We need to use explicit parentheses to control the precedence. See this
‡‡‡‡ This is not true for Excel Online or Excel for Mac
Just add a simple clause to determine the number of returns, for example using ROWS (since by default FILTERXML returns a vertical array):
=LET(
ζ, FILTERXML(
"<k><m>" &
SUBSTITUTE(
[#[Product Name]],
" ",
"</m><m>"
) & "</m></k>",
"//m[.=number() and string-length()=4]"
),
ξ, ROWS(ζ),
IF(ξ > 2, "", INDEX(ζ, ξ))
)
Edit: I might prefer to avoid FILTERXML here:
=LET(
ζ, TEXTSPLIT([#[Product Name]], " "),
ξ, -(ζ & "**0"),
IF(COUNT(ξ) > 2, "", IFERROR(-LOOKUP(1, FILTER(ξ, LEN(ζ) = 4)), ""))
)
You can try the following using TEXTAFTER function. Assuming you have years at the end delimited by space. If that is not the case, the formula can be adapted to have additional checks (it is a number and four-digit, but strictly speaking a year can have less or more than 4 digits). Let me know if the previous assumption doesn't apply so I can try to adapt it. The following is an array version, so you can use the entire table column in case you are using excel tables:
=LET(in,A2:A4,last,TEXTAFTER(in," ",-1),
IF(ISNUMBER(1*TEXTAFTER(SUBSTITUTE(in," "&last,"")," ",-1)),"",last))
For the case of more than one year, it removes the last year found, and if the second search is a number, then it returns empty, otherwise returns the previous year found.

How do I find the last number in a string with excel formulas

I'm parsing strings in excel, and I need to return everything through the last number. For example:
Input: A00XX
Output: A00
In my case, I know the last number will be between index 3 and 5, so I'm brute-forcing it with:
=LEFT([#Point],
IF(SUM((MID([#Point],5,1)={"0","1","2","3","4","5","6","7","8","9"})+0),5,
IF(SUM((MID([#Point],4,1)={"0","1","2","3","4","5","6","7","8","9"})+0),4,
IF(SUM((MID([#Point],3,1)={"0","1","2","3","4","5","6","7","8","9"})+0),3,
))))
Unfortunately, I've run into some edge cases where the numbers extended beyond index 5. Is there a generic way to find the last number in a string using excel formulas?
Note:
I've tried =MAX(SEARCH(... but it returns the index of the first number, not the last.
As a starting point: if we know the position of the last number, we can use LEFT to get the string to that point. Suppose that the position is 5:
=LEFT(A1, 5)
But, we don't know the position of the last number. Now, what if the only valid number was 0, and it only appeared once: then we could use FIND to locate the position of the number:
=LEFT(A1, FIND(0, A1))
But, we have more than one valid number. Suppose that we had all the numbers from 0 through 9, but each number could only appear once — then we could use MAX on a FIND array, to tell us which of the numbers is the last one:
=LEFT(A1, MAX(FIND({0,1,2,3,4,5,6,7,8,9}, A1)))
Unfortunately, FIND will throw a #VALUE! error any number doesn't appear, which will then make MAX return the same error. So, we need to fix that with IFERROR:
=LEFT(A1, MAX(IFERROR(FIND({0,1,2,3,4,5,6,7,8,9}, A1), 0)))
However, numbers can appear more than once. As such, we need a method to find the last occurrence of a value in a string (since FIND and SEARCH will, by default, return the first occurrence).
The SUBSTITUTE function has 3 mandatory arguments — Initial String, Value to be Replaced, Value to Replace with — and one Optional argument — the occurrence to replace. Normally, this is omitted, so that all occurrences are replaced. But, if we know how many times a character appears in a string, then we can replace just the last instance with a special/uncommon sub-string to search for.
To count how many times a character appears in a String, just start with the length of the String, then subtract the length when you SUBSTITUTE all copies of that character for Nothing:
=LEN(A1) - LEN(SUBSTITUTE(A1, 0, ""))
This means we can now replace the last occurrence of the character with, for example, ">¦<", and then FIND that:
=FIND(">¦<", SUBSTITUTE(A1, 0, ">¦<", LEN(A1) - LEN(SUBSTITUTE(A1, 0, ""))))
Of course, we want to do this for all the numbers from 0 to 9, and take the MAX value (remembering our IFERROR), so we need to put the Array of values back in:
=MAX(IFERROR(FIND(">¦<", SUBSTITUTE(A1, {0,1,2,3,4,5,6,7,8,9}, ">¦<", LEN(A1) - LEN(SUBSTITUTE(A1, {0,1,2,3,4,5,6,7,8,9}, "")))), 0))
Then, we plug that all back into our initial LEFT function:
=LEFT(A1, MAX(IFERROR(FIND(">¦<", SUBSTITUTE(A1, {0,1,2,3,4,5,6,7,8,9}, ">¦<", LEN(A1) - LEN(SUBSTITUTE(A1, {0,1,2,3,4,5,6,7,8,9}, "")))), 0)))
An alternative, assuming that the length of the string in question will never be more than 9 characters (which seems a safe assumption based on your description):
=LEFT(A1,MATCH(0,0+ISERR(0+MID(A1,{1;2;3;4;5;6;7;8;9},1))))
This, depending on your version of Excel, may or may not require committing with CTRL+SHIFT+ENTER.
Note also that the separator within the array constant {1;2;3;4;5;6;7;8;9} is the semicolon, which, for English-language versions of Excel, represents the row-separator. This may require amending if you are using a non-English-language version.
Of course, we can replace this static constant with a dynamic construction. However, since we are already making the assumption that 9 is an upper limit on the number of characters for the string in question, this would not seem to be necessary.
If you have the newest version of Excel, you can try something like:
=LEFT(D1,
LET(x, SEQUENCE(LEN(D1)),
MAX(IF(ISNUMBER(NUMBERVALUE(MID(D1, SEQUENCE(LEN(D1)), 1))), x))))
For example:

EXCEL: Unique alphanumeric code with certain characters excluded (without VBA / duplicates)

I am trying to create a list =5 alphanumeric characters.
They cannot contain 1, and i and there cannot be duplicates when dragging / copying the code down.
The characters that are allowed are:
023456789ABCDEFGHJKLMNOPQRSTUWVXYZ (Capital)
I have tried numerous of options but I can't seem to figure this one out.
Cheers
If your allowable character string is in cell A1 then the following formula will result in random codes that are each five characters in length:
=MID(A1,RANDBETWEEN(1,34),1) & MID(A1,RANDBETWEEN(1,34),1) & MID(A1,RANDBETWEEN(1,34),1) & MID(A1,RANDBETWEEN(1,34),1) & MID(A1,RANDBETWEEN(1,34),1)
But note that there is no guarantee that the codes will be unique.
As #ScottCraner pointed out... if you should happen to have Office 365, you can use this much shorter formula that takes advantage of two new functions only available in Excel 365:
=CONCAT(MID(A1,RANDARRAY(5,,1,34,TRUE),1))
But again, there is no guarantee that the resulting codes will be unique.
This formula will generate the codes in order
=SUBSTITUTE(SUBSTITUTE(BASE(K, 34,5),"1","Z"),"I","Y")
Here K can be 0, 1, 2, .... One way to generate the first ~1,048,576 K's is to use ROW()-1. You could get higher values of K by using something like K = 1048576*(COLUMN()-1) + ROW()-1.
The formula works by
(a) calling BASE(K, 34, 5) to get a 5-char long base-34 representation of K
(b) substituting Z for 1 since 1 is not a valid char
(c) substituting Y for I since I is not a valid char

Trying to increment a 4 character alphanumeric code in Excel

I'm trying to create a CSV file of one of my customer's serial numbers. We print them as barcodes for them to use, and normally I'd use our barcode software to generate the numbers. However, we're using a different method of printing, and it requires a CSV/Excel file of all the numbers to be printed. The barcode is as follows:
MC100VGVA.
The last digit is a check digit created from the rest of the string.
Now, my problem comes with the "VGVA" bit. Column A is the prefix (MC), Column B is the number (100), Column C is the incrementing 4 characters (VGVA), and Column D is the check digit.
I need for the VGVA bit to increment alphanumerically. So, when it gets to VGVZ, I need it to go to VGW0. Then when it gets to VGZZ, it needs to go to VH00 and so on until they reach ZZZZ, in which the next digit would increase Column B to 101, and Column C would become 0000.
I've attempted to use the CHAR formula, as well as CONCATENATE, and MID. But, because I'm not well versed in these formulas, my attempts at editing them to work with 4 digits have been failing me.
I'm not opposed to using VBA if needed, but it's not something I've ever worked with, so you'll have to forgive any ignorance on my part.
Please let me know if you need more information. Thanks!
It looks like you are trying to create a new base, the one based on 27 digits (0 and all letter from 'A' to 'Z'). So I'd advise you to create a conversion from and to 27-digit system.
Let me first explain you what I mean in octal numbering (8 digits, from 0 to 7): in that system we start from (just some examples):
a=0011
b=1237
c=1277
The meaning of those numbers is:
a equals 0*8^3 + 0*8^2 + 1*8^1 + 1*8^0 = 9, so:
a+1 equals 10, and converting this to octal numbering yields:
0012
b equals 1*8^3+2*8^2+3*8^1+7*8^0 = 671, so:
b+1 equals 672, and converting this to octal numbering yields:
1240
c equals 1*8^3 + 2*8^2 + 7*8^1 + 7*8^0 = 703, so:
c+1 equals 704, and converting this to octal numbering yields:
1300
I propose to do exactly the same for your 27-digit system, with following example:
VGZZ equals 22*27^3 + 7*27^2 + 26*27^1 + 26 = 438857
VGZZ+1 equals 438858, and converting this to 27-digit numbering yields:
VH00
You can do this, using a VBA function you need to develop yourself. The converting from the string to the normal number is obvious, and in the other way around, you use =MOD(...,27^3) and other similar functions.
I believe I've found a non-VBA answer to this question, thanks to someone on another forum.
Here's what they suggested and it seems to be working perfectly:
B2
=B1+(C2="0000")
C2
=RIGHT(BASE(DECIMAL(C1,36)+1,36,4),4)
and maybe try this at D1
=MID("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%",MOD(SUMPRODUCT(SEARCH(MID((A1&B1&C1),ROW($1:$99),1),
"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%") )-99,43)+1,1)

PowerQuery M Conditional Column 'count' argument is out of range

I have a sheet with dates as MMDDYYY with no leading 0's if month number is single digit. For example, 1012018 or 12312018. Each record has a date, and each date is either 7 or 8 characters in length.
Here is the code I am using to convert the numbers to dates:
if Text.Length([ContractDate]) = 7
then
Text.Range([ContractDate],0,1)&"/"&Text.Range([ContractDate],1,2)&"/"&Text.Range([ContractDate],4,4)
else
Text.Range([ContractDate],0,2)&"/"&Text.Range([ContractDate],2,2)&"/"&Text.Range([ContractDate],4,4)
The code works fine for the "else" condition but I am getting error "Expression.Error: The 'count' argument is out of range. Details: 4" for all records where Text.Length() = 7. I verified this by adding a second column to get Length of ContractDate.
What am I missing?
EDIT: Problem Solved - I'm an idiot. I was getting an error because in the "then" condition, I am extracting a substring of (4,4) from a value that only has Len=7. I can't get 4 characters out of a 7 character string when starting at index of 4.
I know you found the issue with your code, but worth pointing out some things that might be good to know.
Text.Range with no character count will pull in all characters past the start point (so Text.Range([ContractDate], 4) would work for both).
Text.Middle operates like Text.Range but will not cause an error if you select a range that expands past the size of the string. This can be useful if for some reason you were dealing with variable size strings where you need a specific number of characters up to a limit past a certain position.
You could also use Text.PadStart([ContractDate], 8, "0") to pad the 7 length strings with a 0 at the start, and avoid the need for a conditional check all together.

Resources