Regex: VBA : Evaluation of Expression with Conditions - excel

I am trying to extract a cell reference from a formula expression (in VBA) and replace it with another cell reference. For example, I have the following formulae in different cells: "=AL82+L8+L82", "=L8+L82" and "=AL82+L8". I have to change "L8" in each of the formulae to "L9". I am new to regex and was trying the following pattern:
"(?=[^A-Z])([L8])(?=[^0-9])"
However only 8 is changed to L9. Please assist me with the error.
Thanks

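Your pattern fails because [L8] is a character class that matches a single character, either L or 8, and (?=[^A-Z]) is a lookahead that requires the next character not to be an uppercase letter, so only the 8 can match.
Instead, you can capture either a plus or an equals sign in a capturing group.
Then match L8 and assert with a negative lookahead that the 8 is not directly followed by a digit.
In the replacement, use group 1 followed by L9: $1L9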
([+=])L8(?!\d)
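As a quick check outside VBA, the same pattern and replacement can be sketched in Python. This is only an illustration: the function name bump_cell is made up here, and Python writes the group reference as \g<1> where the replacement string above uses $1.

```python
import re

# Same pattern as above: a "+" or "=" (captured), then L8,
# then a negative lookahead so that L82 is left alone.
L8_PATTERN = re.compile(r"([+=])L8(?!\d)")


def bump_cell(formula: str) -> str:
    """Replace the reference L8 with L9, keeping the operator before it.

    bump_cell is a hypothetical helper name used for this sketch only.
    """
    return L8_PATTERN.sub(r"\g<1>L9", formula)


if __name__ == "__main__":
    for f in ("=AL82+L8+L82", "=L8+L82", "=AL82+L8"):
        print(f, "->", bump_cell(f))
```

Note that the pattern relies on L8 always being preceded by + or =; other operators such as - or * would need to be added to the character class.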

Related

I need to extract the date from the middle of file name between two of the same characters in excel using a formula

I am trying to pull the date from the middle of a text string. The date sits between underscores and I am not sure how to grab it. This is what my formula looks like right now:
=MID(K2601,FIND("_",K2601,1)+1,FIND("_",K2601,14)-FIND("_",K2601,1)-1)
But it only pulls the well name, which is not what I want.
The formula would be:
=MID(B1,SEARCH("_????-??-??_",B1,1)+1,10)
Text to search: B1.
SEARCH finds the start of the date using a wildcard pattern, where each ? matches any single character.
The leading and trailing underscores are included in the pattern to delimit the date. SEARCH returns the position of the leading _, so add 1 to skip it.
MID then extracts 10 characters, starting at the value returned by SEARCH + 1.
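To see the same idea outside Excel, here is a rough Python sketch. The file name in the usage example is invented, and extract_date is a hypothetical helper; the dot in the regex plays the role of the ? wildcard in SEARCH.

```python
import re

# "." stands in for the "?" wildcard of SEARCH: any single character.
# The capturing group leaves out the two delimiting underscores.
DATE_BETWEEN_UNDERSCORES = re.compile(r"_(.{4}-.{2}-.{2})_")


def extract_date(name):
    """Return the 10-character date found between underscores, or None."""
    m = DATE_BETWEEN_UNDERSCORES.search(name)
    return m.group(1) if m else None


if __name__ == "__main__":
    # Invented sample name, for illustration only.
    print(extract_date("WELL-7_2021-03-15_log.txt"))
```

Unlike the Excel formula, which returns an error when the pattern is absent, this sketch returns None.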

Regex: table line matcher

I want to parse a table line using regex.
Input
|---|---|---|
|---|---|---|
So far I've come up with this regex:
/^(?<indent>\s*)\|(?<cell>-+|)/g
Regex101 Link: https://regex101.com/r/wzMYxd/1
But this regex is incomplete.
This only finds the first cell, but I want to capture each of the following cells as a separate match.
Question: Can we catch the following cells with the same pattern using the regex?
ExpectedOutput: groups with array of matched cells: ["---|", "----|", "---|"]
Note: the number of - characters per cell is not fixed.
How about first verifying that the line matches the pattern:
^[ \t]*\|(?:-+\|)+$
If it matches, extract the cells:
^(?<indent>[\t ]*)\||(?<cell>-+)\|
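The two-step approach (validate first, then extract) could look roughly like this in Python. The function name parse_table_line is an assumption of this sketch, and Python spells named groups as (?P<name>...).

```python
import re

# Step 1: the whole line must be optional indent, "|", then one or more "---|" cells.
LINE_RE = re.compile(r"^(?P<indent>[ \t]*)\|(?:-+\|)+$")
# Step 2: pull out each run of dashes together with its closing pipe.
CELL_RE = re.compile(r"-+\|")


def parse_table_line(line):
    """Return (indent, cells) for a valid separator line, else None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return m.group("indent"), CELL_RE.findall(line)


if __name__ == "__main__":
    print(parse_table_line("  |---|----|---|"))
    print(parse_table_line("|---|abc|"))
```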
Or with just one regex, using the sticky flag y and a lookahead for validation:
/^(?<indent>[ \t]*)\|(?=(?:-+\|)+$)|(?!^)(?<cell>-+)\|/gy
The lookahead checks once after the first | if the rest of the string matches the pattern. If this first match fails, due to the y flag (matches are "glued" to each other) the rest of the pattern fails too.
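Python has no y flag, but calling a compiled pattern's match() with a start position anchors each attempt at that position, which gives a similar "glued" behaviour. The following is a sketch under that assumption, not a port of the JavaScript; the function name is invented, and the cell group here excludes the trailing pipe, as in the pattern above.

```python
import re

# Same structure as the sticky pattern: the first alternative validates the
# whole line once via the lookahead, the second consumes one cell at a time.
STICKY_RE = re.compile(
    r"^(?P<indent>[ \t]*)\|(?=(?:-+\|)+$)|(?!^)(?P<cell>-+)\|"
)


def parse_with_one_regex(line):
    """Return (indent, cells) if the line is valid, else None."""
    pos, indent, cells = 0, None, []
    while pos < len(line):
        # match() only succeeds exactly at pos, emulating the y flag.
        m = STICKY_RE.match(line, pos)
        if m is None:
            break
        if m.group("cell") is None:
            indent = m.group("indent")  # first alternative matched
        else:
            cells.append(m.group("cell"))
        pos = m.end()
    return None if indent is None else (indent, cells)


if __name__ == "__main__":
    print(parse_with_one_regex("  |---|----|---|"))
    print(parse_with_one_regex("|---|abc|"))
```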

Replace all non-alphanumeric characters, including wildcards

I took this beautiful formula from JvdV's answer:
=TRIM(CONCAT(IF(ISNUMBER(SEARCH(MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1),"-./ 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")),MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1)," ")))
This formula replaces any non-alphanumeric character (&^%]#$) with a single space " ".
I put some exceptions in the formula (-./ ), but these are not all the exceptions I need.
What about wildcards? How can I filter out wildcards (~*?) with this formula?
I thought: OK, I will use FIND instead of SEARCH and all will be fine; just put both the lowercase and uppercase alphabet in the FIND index, like this: "-./ 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
Then I think: But, what if I want to keep not only numeric and regular alphabet? What if I want to keep all diacritics, like this: "ÁÀȦÄǍĀÃÅĄȺẤẦẮẰǠǺǞẪẴẢȀȂẨẲẠḀẬẶĂÂḂɃƁḄḆĆĊĈČÇȻḈƇƆḊĎḐĐƊḌḒḎÐƉÉÈĖÊËĚĔĒẼĘȨɆẾỀḖḔỄḜẺȄȆỂẸḘḚỆÉÈÊËḞƑǴĠĜǦĞḠĢǤƓḢĤḦȞḨĦḤḪⱧÍÌİÏǏĬĪĨĮƗḮỈȈȊỊḬÍÌÏÎȷĴǰḰǨĶƘᶄḲḴⱩꝀꝂꝄĹĿĽⱢⱠĻȽŁḶḼḺḸꝈḾṀṂŃǸṄŇÑŅƝṆṊṈÑŊÓÒȮÔÖǑŎŌÕǪŐỐỒƟØṒṐṌȪỖṎǾȬǬỎȌȎƠỔỌỚỜỠỘỞỢÓÒÔÖÕØṔṖⱣƤƦŔṘŘŖɌⱤȐȒṚṞṜŚṠŜŠṤṦṢṨŞṪŤƬṬƮṰṮȾŢŦÚÙÛÜǓŬŪŨŮŲŰɄǗǛṸṺỦȔȖƯỤṲỨỪṶṴỮỬỰÚÙÛÜṼṾẂẀẆŴẄẈẊẌÝỲẎŶŸȲỸɎỶƳỴÝŹŻẐŽƵẒẔ"
But then the lowercase and uppercase alphabets together are too much for the FIND index.
It is also too much for the SEARCH index, because the function accepts a maximum length of 255, but let's say we have only 200 characters in the index (numbers, alphabet and some diacritics).
So the question remains:
How can I filter out (replace with a space) wildcards (~*?) with this kind of formula?
As I read this question there are a few problems:
How to include over 255 characters in the 2nd parameter of SEARCH();
How to exclude literal wildcard characters in the 2nd parameter of SEARCH();
One way around the length limit is to feed SEARCH() an array of options, in this case an array of two elements, each with a length of <255:
Formula in C1:
=TRIM(CONCAT(IF(MMULT(IFERROR(SEARCH("~"&MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1),{"ÁÀȦÄǍĀÃÅĄȺẤẦẮẰǠǺǞẪẴẢȀȂẨẲẠḀẬẶĂÂḂɃƁḄḆĆĊĈČÇȻḈƇƆḊĎḐĐƊḌḒḎÐƉÉÈĖÊËĚĔĒẼĘȨɆẾỀḖḔỄḜẺȄȆỂẸḘḚỆÉÈÊËḞƑǴĠĜǦĞḠĢǤƓḢĤḦȞḨĦḤḪⱧÍÌİÏǏĬĪĨĮƗḮỈȈȊỊḬÍÌÏÎȷĴǰḰǨĶƘᶄḲḴⱩꝀꝂꝄĹĿĽⱢⱠĻȽŁḶḼḺḸꝈḾṀṂŃǸṄŇÑŅƝṆṊṈÑŊÓÒȮÔÖǑŎŌÕǪŐỐỒƟØṒṐṌȪỖṎǾȬǬỎȌȎƠỔỌỚỜỠỘỞỢÓÒÔÖÕØṔṖⱣƤƦŔṘŘŖɌⱤ";"ȐȒṚṞṜŚṠŜŠṤṦṢṨŞṪŤƬṬƮṰṮȾŢŦÚÙÛÜǓŬŪŨŮŲŰɄǗǛṸṺỦȔȖƯỤṲỨỪṶṴỮỬỰÚÙÛÜṼṾẂẀẆŴẄẈẊẌÝỲẎŶŸȲỸɎỶƳỴÝŹŻẐŽƵẒẔ-./*? 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"}),0),{1,1}),MID(A1,ROW(A$1:INDEX(A:A,LEN(A1))),1)," ")))
What we did here is:
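Use a horizontal array holding the two lookup strings to check against our characters, which form a vertical array with one character per row. Note the difference between semicolon and comma in array constants: one separates rows and the other separates columns, and which is which depends on your regional settings (in US-English Excel, commas separate columns and semicolons separate rows).
The result is a 2D array whose rows MMULT() can sum. If the character was found in either of the two elements of the array, the formula returns that same character; otherwise it returns a space.
The wildcard characters are now included in the lookup string too; the tilde prepended to every character escapes them, and it is applied to all other characters as well.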
If Excel doesn't recognize all lowercase diacritics as their uppercase counterparts, just add them to one of the two elements. If need be, add a 3rd. But know that you'd need to extend on the 2nd parameter in MMULT() too then.
Remember, you are using Excel 2019, which means you need to CSE-enter this formula (Ctrl+Shift+Enter). Needless to say, all of this is much easier in Microsoft 365 with its dynamic array functionality.
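For comparison, the character-by-character logic of the formula can be sketched in Python. This is not a translation of the Excel formula: the allowed set below is the short one from the question, keep_allowed is a hypothetical name, and the upper() call is an assumption that stands in for the case-insensitivity of SEARCH.

```python
# Allowed characters taken from the question's original lookup string.
ALLOWED = set("-./ 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")


def keep_allowed(text, allowed=ALLOWED):
    """Replace every character outside `allowed` with a space, then trim.

    Joining the split pieces with single spaces collapses repeated spaces
    and strips the ends, similar to what TRIM does in Excel.
    """
    replaced = "".join(ch if ch.upper() in allowed else " " for ch in text)
    return " ".join(replaced.split())


if __name__ == "__main__":
    print(keep_allowed("ab&^%c"))
    print(keep_allowed("a*b?c~d"))
```

Because membership in a set is a plain comparison, wildcard characters need no escaping here, and the set has no 255-character limit; diacritics can simply be added to it.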

How can I accomplish this regex?

I want a regex that matches "cell" values from A-J rows to 1-10 columns.
For example it should match A10, A1, E9
It should not match A100, A30, P7, A01
For the time being, I came up with this regex:
(?:[ABCDEFGHIJabcdefghij][123456789](?![123456789]))(?<=1)(0)?
The only case where it fails is when you give it a A100 cell, it matches the first two characters when in reality it should not return a match.
EDIT:
Playing around a little bit, I wrote:
(?<!\S)[ABCDEFGHIJabcdefghij][123456789]((?<=1)(0))?(?!\S)
which seems to work for most cases. I'm still open to suggestions on how to improve it or write it more elegantly.
You can shorten the pattern using ranges. Then you can match either 10 or 1-9 using an alternation instead of using (?<=1)(0)? to match 10.
To prevent the partial match, you can use word boundaries.
\b[A-Ja-j](?:10|[1-9])\b
\b A word boundary
[A-Ja-j] Match either chars in a range from A-J or a-j
(?:10|[1-9]) Match either 10 or a single digit 1-9
\b A word boundary
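A small Python sketch can confirm the word-boundary version against the examples from the question. The sample text and the name find_cells are made up for this check.

```python
import re

# A letter A-J (either case), then 10 or a single digit 1-9,
# with word boundaries on both sides to block partial matches.
CELL_RE = re.compile(r"\b[A-Ja-j](?:10|[1-9])\b")


def find_cells(text):
    """Return every valid cell reference found in text."""
    return CELL_RE.findall(text)


if __name__ == "__main__":
    print(find_cells("A10 A1 E9 A100 A30 P7 A01"))
```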
With whitespace boundaries on the left and right:
(?<!\S)[A-Ja-j](?:10|[1-9])(?!\S)

retrieve part of the info in a cell in EXCEL

I vaguely remember that it is possible to parse the data in a cell and keep only part of the data after setting up certain conditions. But I can't remember what exact commands to use. Any help/suggestion?
For example, A1 contains the following info
0/1:47,45:92:99:1319,0,1320
Is there a way to pick up, say, 0/1 or 1319,0,1320 and remove the rest unchosen data?
I know I can do text-to-column and set the delimiter, followed by manually removing the "un-needed" data, but my EXCEL spreadsheet contains 100 columns X 500000 rows with each cell looking similar to the data above, so I am afraid EXCEL may crash before finishing the work. (have been trying with LEFT, LEN, RIGHT, MID, but none seems to work the way I had hoped)
Any suggestion will be greatly appreciated.
I think what you are looking for is combination of find and mid, but you'll have to work out exactly how you want to split your string:
A1: 0/1:47,45:92:99:1319,0,1320 //your string
B1: =FIND(":",A1) //position of the first ":" symbol
C1: =LEN(A1)-B1 //number of characters after the first ":"
=LEFT(A1,B1-1) //everything to the left of the symbol
=MID(A1,B1+1,C1) //everything to the right of the symbol (replace C1 with a smaller count to extract only one portion)
//You can also nest these formulas to save cells, but usually at the cost of more processing
This is the concept; you will probably need to do it across multiple cells to split a string as long as yours. For multiple splits, you probably want to repeat the formula on the result of the previous Right/Mid formula.
That way, you will get cell result sequence like:
0/1:47,45:92:99:1319,0,1320; 47,45:92:99:1319,0,1320; 92:99:1319,0,1320; 99:1319,0,1320......
From each of those you can retrieve left side of the string up to ":" to get each portion of a string.
If you are working with a large table, you probably want to look into VBA scripting. To my knowledge, there is no single Excel command that can take one cell and split it into multiple ones.
Let me try to help with this. I am not a professional, so you may run into some problems. My solution adds 2 helper columns next to the source column; you can build on these formulas using the same principle.
Column B Formula:
=LEFT(A2,FIND(":",A2,1)-1)
Column C Formula:
=RIGHT(A2,LEN(A2)-FIND("|",SUBSTITUTE(A2,":","|",LEN(A2)-LEN(SUBSTITUTE(A2,":","")))))
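For readers who end up scripting this instead, the two column formulas correspond roughly to the following Python sketch: text before the first ":" and text after the last ":". The function name is invented for illustration.

```python
def first_and_last_field(text, sep=":"):
    """Return (text before the first sep, text after the last sep).

    If sep is absent, both parts are the whole text, whereas the Excel
    formulas would return an error in that case.
    """
    first = text.partition(sep)[0]   # like LEFT(A2, FIND(":", A2) - 1)
    last = text.rpartition(sep)[2]   # like the RIGHT/SUBSTITUTE formula
    return first, last


if __name__ == "__main__":
    print(first_and_last_field("0/1:47,45:92:99:1319,0,1320"))
```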
Given your statement of having 100 columns, I imagine that in some instances you need to isolate characters in the middle of your string, so Left and Right may not always work. However, use them where possible.
Assuming your string is in cell F2: 0/1:47,45:92:99:1319,0,1320
=LEFT(F2,3)
This returns 0/1 which are the first 3 characters in the string counting from the left. Likewise, Right functions similarly:
=RIGHT(F2,4)
This returns 1320, returning the 4 characters starting from the right.
You can use a combination of Mid and Find to dynamically locate characters or strings relative to defined characters. Here are a few examples of ways to dynamically isolate values in your string. Keep in mind that the key to these examples is the nested Find formula, where the innermost Find locates the first delimiter and each outer Find starts searching just past the previous result.
1) Return 2 characters after the second : character
In cell F2 I need to isolate the "92":
=MID(F2,FIND(":",F2,FIND(":",F2)+1)+1,2)
The innermost Find locates the first : in the string (4 characters in). We add +1 to move to the 5th character (beyond the first :, so the second Find will not see it), and the second Find starts looking for : again from that character. This second Find returns 10, as the second : is the 10th character in the string. The outer +1 then moves to the 11th character, and Mid takes over from there: starting at the 11th character, return 2 characters. The number of characters returned is set by the 2 at the end of the formula (the last argument of Mid).
2) In this case I need to find the 2 characters after the 3rd : in the string. In this case "99":
=MID(F2,FIND(":",F2,FIND(":",F2,FIND(":",F2)+1)+1)+1,2)
You can see we have simply added one more nested Find to the formula in example 1.
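The nesting pattern generalises to "start just after the n-th delimiter". A minimal Python sketch of that idea follows; start_after_nth is a hypothetical helper, and note that Python indexes from 0 while Excel counts from 1.

```python
def start_after_nth(text, sep, n):
    """Return the 0-based index just after the n-th occurrence of sep.

    Each loop pass mirrors one nested FIND: search again, starting one
    character past the previous hit.
    """
    pos = -1
    for _ in range(n):
        pos = text.find(sep, pos + 1)
        if pos == -1:
            raise ValueError(f"fewer than {n} occurrences of {sep!r}")
    return pos + 1


if __name__ == "__main__":
    s = "0/1:47,45:92:99:1319,0,1320"
    i = start_after_nth(s, ":", 2)
    print(s[i:i + 2])      # two characters after the second ":"
    j = start_after_nth(s, ":", 3)
    print(s[j:j + 2])      # two characters after the third ":"
    print(s.split(":"))    # or simply split on the delimiter
```

When every field is needed, splitting on the delimiter is usually simpler than counting occurrences.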
