Say I need to write code in VBA to match the string "revenues" in a different worksheet, where a column range may contain a similar string that does not match perfectly (it may be "total revenues", "rev", "ABC revenues"). What is the best strategy to approach the problem?
I'd probably write a VB function that matches the (lower-cased) strings against a list of fragments, or detects fragments in the cell value. While you could also try to use Regex, I doubt that you have consistent enough input to generate a rule that would pick out most examples. You'll probably need a countermeasure to check against false positives as well.
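As a rough illustration of that idea, here is a minimal sketch of the matching logic, written in Python rather than VBA so it can be run on its own. The fragment list, the exclusion list, and the function name `looks_like_revenue` are all made up for this example; the same structure should carry over to a VBA function built on LCase and InStr.

```python
# Hypothetical fragment lists; adjust them to the labels that actually occur.
FRAGMENTS = ("revenue", "rev")
EXCLUDE = ("review", "reverse", "prev")  # guard against false positives


def looks_like_revenue(cell_value: str) -> bool:
    """Return True if the lower-cased text contains a known fragment
    and none of the excluded fragments."""
    text = cell_value.strip().lower()
    if any(bad in text for bad in EXCLUDE):
        return False
    return any(frag in text for frag in FRAGMENTS)


labels = ["Total Revenues", "rev", "ABC revenues", "Peer review", "Costs"]
matches = [label for label in labels if looks_like_revenue(label)]
print(matches)
```

The exclusion list is checked first, so a label such as "Peer review" is rejected even though it contains "rev".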
Good Luck!
Related
I have a dictionary containing lots of words - I want the user to be able to input a list of substrings, and then a filtered list will be updated, containing only words that contain those substrings and nothing else. Any words that contain extra characters the user didn't specify, should not appear. Cell F3 will use a FILTER function to create the list. As in the mock-up below:
What I need is a formula that would generate the TRUE or FALSE flags from the yellow section (B3:B9), but I'm not sure how to go about this.
I'm sure this could be solved by VBA or Regex using Google Sheets, but I want to know if there's a way to do this by formula, as I don't want this to require a button press or script execution, and my spreadsheet can't be hosted on Google sheets due to its size. Any ideas?
You can also use a combination of ISNUMBER and SUMPRODUCT:
=ISNUMBER(SUMPRODUCT(MATCH(MID(A3,ROW(INDEX(A:A,1,1):INDEX(A:A,LEN(A3),1)),1),$D$3:$D$5,0)))
Adjusted formula:
=ISNUMBER(SUMPRODUCT(MATCH(MID(A3,ROW(A$1:INDEX(A:A,LEN(A3))),1),$D$3:$D$5,0)))
The result:
The test below subtracts the number of characters matched by each dictionary entry from the length of the original string. If the result is 0, this returns TRUE. If not, this returns FALSE. This is not case sensitive: a and A are treated equally here.
=NOT(LEN(A1)-(LEN(A1)-LEN(SUBSTITUTE(UPPER(A1),D1,"")))-(LEN(A1)-LEN(SUBSTITUTE(UPPER(A1),D2,"")))-(LEN(A1)-LEN(SUBSTITUTE(UPPER(A1),D3,""))))
The equation works fine although I don't know if it is an optimal solution for you, but posting as answer in case it is for somebody else. The issue with this approach is the equation gets longer and longer for each character you add to your dictionary. Depending on the size of dictionary and strings to test against, this can get sloppy and calc heavy really quick.
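To make the length-subtraction idea easier to follow, here is a minimal sketch of the same arithmetic in Python. The function name `only_allowed` is hypothetical, and the loop stands in for repeating the LEN/SUBSTITUTE term once per dictionary entry, so the list can grow without the expression growing.

```python
def only_allowed(word: str, fragments: list[str]) -> bool:
    """Mirror the SUBSTITUTE/LEN test: subtract the characters removed
    by each fragment from the word length and check for zero."""
    upper = word.upper()
    remaining = len(upper)
    for frag in fragments:
        # Characters removed when every occurrence of frag is deleted.
        remaining -= len(upper) - len(upper.replace(frag.upper(), ""))
    return remaining == 0


print(only_allowed("abba", ["A", "B"]))  # every character is covered
print(only_allowed("abc", ["A", "B"]))   # "c" is left over
```

Like the formula, this counts each fragment independently, so overlapping fragments (for example "A" and "AB") would be counted twice and give a non-zero result.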
Have you considered a UDF in VBA?
I'm trying to bring back one of two possible words (Local or National) from a text string, and if neither of these words is in the text string, then bring back the whole contents of the cell.
The issue I have is that I can bring back either word when it appears, but I get an error when neither does.
I'm currently using
=IFERROR(IF(SEARCH("*local*",B2,1),"Local"),IF(SEARCH("*national*",B2,1),"National"))
However, this obviously doesn't bring back the original string if the words do not exist.
I'm sure it's easy and I'm missing something, but I just cannot figure it out. Any help would be great.
Cheers all
You can use:
Formula in B1:
=IF(ISNUMBER(SEARCH("*local*",A1)),"Local",IF(ISNUMBER(SEARCH("*national*",A1)),"National",A1))
Drag down
Note:
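Notice that even a string with 'international' in it will return 'National'. The wildcards are not what causes this: SEARCH looks for its argument anywhere in the cell, so SEARCH("national",A1) matches 'international' as well, and removing the wildcards only tidies the formula. If this is not what you want, you need a stricter test, for example searching for the word together with its surrounding spaces.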
You can also use INDEX/AGGREGATE:
=IFERROR(INDEX({"local","national"},AGGREGATE(15,7,ROW($1:$2)/(ISNUMBER(SEARCH({"local";"national"},A1))),1)),A1)
This allows you to replace both hard-coded arrays with a range of cells that contains the outputs. If Local and National were in D1:D2, you could use:
=IFERROR(INDEX($D:$D,AGGREGATE(15,7,ROW($D$1:$D$2)/(ISNUMBER(SEARCH($D$1:$D$2,A1))),1)),A1)
That way if the list gets bigger the formula does not.
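For anyone who finds the INDEX/AGGREGATE construction hard to read, the logic it implements can be sketched in a few lines of Python. This is only an illustration of the behaviour (first list entry found wins, otherwise the original text is returned); the function name `first_keyword` is made up for the example.

```python
def first_keyword(text: str, keywords: list[str]) -> str:
    """Return the first keyword (in list order) found in text,
    ignoring case; fall back to the text itself if none is found."""
    lowered = text.lower()
    for keyword in keywords:
        if keyword.lower() in lowered:
            return keyword
    return text


keywords = ["Local", "National"]
print(first_keyword("Some local news", keywords))
print(first_keyword("International desk", keywords))
print(first_keyword("Weather report", keywords))
```

As with SEARCH, this is a plain substring test, so "International desk" is reported as "National".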
I'd suggest regular expressions (REGEXMATCH and REGEXEXTRACT are Google Sheets functions).
=IF(REGEXMATCH(A2,"(Local|National)"),REGEXEXTRACT(A2,"(Local|National)"),A2)
Currently following method described here: https://exceljet.net/formula/cell-contains-one-of-many-things with a few alterations (to compensate for variable number of substrings).
Code is:
=SUMPRODUCT(--ISNUMBER(SEARCH(OFFSET(Categories!A$1,0,0,COUNTA(Sheet2!A:A),1),[#String])))>0
Instead of a "TRUE" or "FALSE" output, I'd like to output the substring that matches. The "first encountered" substring would be fine, or "all substrings separated by a comma", or anything like that.
Not really sure where to start, or even if it's possible with Excel formulas.
=LOOKUP(1,0/SEARCH(Substring_List,String),Substring_List)
is probably the most efficient, though you should know that, if more than one entry from Substring_List is found within String, this set-up will return that which occurs latest within that list.
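To make the "latest in the list" behaviour concrete, here is a small Python sketch of what the LOOKUP set-up does, plus the comma-separated variant the question mentions. The function names and the sample list are assumptions for illustration only; `None` stands in for the #N/A the formula would give when nothing matches.

```python
from typing import Optional


def last_match(text: str, substrings: list[str]) -> Optional[str]:
    """Return the matching entry that sits latest in the list, or None."""
    lowered = text.lower()
    found = [s for s in substrings if s.lower() in lowered]
    return found[-1] if found else None


def all_matches(text: str, substrings: list[str]) -> str:
    """Return every matching entry, in list order, separated by commas."""
    lowered = text.lower()
    return ", ".join(s for s in substrings if s.lower() in lowered)


substring_list = ["red", "green", "blue"]  # hypothetical Substring_List
print(last_match("A blue and red flag", substring_list))
print(all_matches("A blue and red flag", substring_list))
```

Note that the order is decided by the list, not by where the substrings occur in the string: "blue" wins above because it is the later list entry.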
Regards
I'm working in SAS 9.2, in an existing dataset. I need a simple way to match a single word within string values of a single variable, and then replace the entire string value with a blank. I don't have experience with SQL, macros, etc., and I'm hoping for a way to do this (even if the code is less efficient) that will be clear to a novice.
Specifically, I need to remove the entire string containing the word "growth" in a variable "pathogen." Sample values include "No growth during two days", "no growth," "growth did not occur," etc. I cannot enter all possible strings since I don't yet know how they will vary (we have only entered a few observations so far).
TRANSWD and TRANSLATE will not work as they will not allow me to replace an entire phrase when the target word is only a part of the string.
Other methods I've looked at (for example, a SESUG paper using PRX at http://analytics.ncsu.edu/sesug/2007/CC06.pdf) appear to remove all instances of the target string in every variable in the dataset, instead of just in the variable of interest.
Obviously I could subset the dataset to a single variable before I perform one of these actions and then merge back, but I'm hoping for something less complicated. Although I will certainly give something more complicated a shot if someone can provide me with sample code to adapt (and it would be greatly appreciated).
Thanks in advance--Kim
Could you be a little clearer on how the data set is constructed? I think mjsqu's solution will work if your variable pathogen is stored sentence by sentence. If not, then I would say your best bet is to parse the blocks into sentences and then apply mjsqu's solution.
DATA dataset1;
format Ref best1.
pathogen $40.;
input Ref pathogen $40. ;
datalines;
1 No growth during two days
2 no growth,
3 growth did not occur,
4 does not have the word
;
RUN;
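DATA dataout;
SET dataset1;
/* index() returns 0 when "growth" is absent, so only matching rows are blanked; */
/* no other variable is touched. */
IF index(lowcase(pathogen),"growth") THEN pathogen="";
RUN;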
OK, let's say that I have two cells in Excel. They each contain a number. I realize that to compare the values of the numbers in these two cells, I can use a simple =[cell1]=[cell2] formula. And I also realize that if I want to find the negation of a certain boolean value, I can use the NOT function.
My question is simple: is there a more efficient way of coding long boolean formulas? I know in Java I could do something along the lines of (!cell1 && !cell2) || cell3. But in Excel that simple expression turns into something along the lines of =OR(AND(NOT(cell1),NOT(cell2)),cell3). Personally I like the shorter, more compact style of the Java code.
Is there a short way to write boolean statements like this in Excel? Or am I doomed to use Excel's clunky functions for the simplest of comparisons?
Also, this is a hypothetical question. I am just trying to figure out how to reduce the size of some of my longer boolean expressions. I don't have a specific error, just a lot of frustratingly long formulas.
Well in that case
AND(NOT(cell1),NOT(cell2))
Can be replaced by:
=NOT(OR(cell1,cell2))
And, since in most cases you can replace AND with * and OR with +, the whole expression can be written like this (the result is a number rather than TRUE/FALSE, with any nonzero value counting as true):
NOT(cell1+cell2)+cell3
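If you want to convince yourself that the three forms agree, a quick exhaustive check over all eight input combinations does it. The sketch below uses Python booleans as stand-ins for the cell values; the function names are made up for the example.

```python
from itertools import product


def nested(a: bool, b: bool, c: bool) -> bool:
    # OR(AND(NOT(cell1), NOT(cell2)), cell3)
    return (not a and not b) or c


def de_morgan(a: bool, b: bool, c: bool) -> bool:
    # OR(NOT(OR(cell1, cell2)), cell3)
    return (not (a or b)) or c


def arithmetic(a: bool, b: bool, c: bool) -> bool:
    # NOT(cell1+cell2)+cell3, where any nonzero result counts as true
    return bool((not (a + b)) + c)


mismatches = [
    combo
    for combo in product([False, True], repeat=3)
    if not (nested(*combo) == de_morgan(*combo) == arithmetic(*combo))
]
print(mismatches)  # an empty list means all three forms agree
```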