Comparing values with some alphanumerics - excel

I've looked through the forums but couldn't find any questions (with answers) that helped. Any guidance would be appreciated.
I'm working on an Excel/Access project that cross-references error codes. The codes are twelve characters long, and the first half and the second half each need to be sortable. 99% of these codes are entirely numeric, but the 1% that include letters are really screwing me up.
For example, a common error code might be "386748000123". This would be split into "386748" and "000123", with the first being the code for the type of system and the second being the type of error.
But then the 1% look something like this: "0957AB003A41", which splits into "0957AB" and "003A41".
If I format the columns (in Excel and Access) as numbers, then the numeric comparisons are far easier: "000123" equals "123". If I format the columns as strings, then I can compare the alphanumeric values, but "000123" and "123" stop matching.
The possible solution I've come across is using the Val function inside an Access query to compare the values purely numerically, but I've never used it and it seems like only a partial fix. Val stops reading at the first letter, which means "0957AB" will have the same value as "0957XY", and that doesn't work for this project.
I'm sure many of you have had similar issues, so I'm hoping to get some ideas on different ways the problem has been approached and resolved.

You have not provided a minimal sample of the data or the expected output, and there is no code that I can amend for you. However, since the only part you are having a problem with is comparing the alphanumeric codes, you should format all of your data as strings and then compare. To make 123 equal to "000123", just format the numeric values as strings, as below:
format(123,"000000")
which will give you "000123"
Edit
From your comment, I learned that the problem is the key, which is always or often a number. Format will return the proper string for comparison, and if the value is already a 6-character string it returns it unchanged, so there is no problem:
Do something like this:
If Format(key, "000000") = Format(code, "000000") Then
    ' do something
End If
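To prototype the same matching rule outside Excel/Access, here is a minimal Python sketch. It assumes that (1) codes are exactly twelve characters, (2) a purely numeric code or half may have lost its leading zeros when it was stored as a number, and (3) alphanumeric halves are compared as-is, case-sensitively. `normalize_half` follows the answer's description of Format (zero-pad numbers, leave other strings alone); all helper names are hypothetical.

```python
def _as_text(value) -> str:
    """Turn a cell value into text; whole-number floats (e.g. 123.0) lose the '.0'."""
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    return str(value).strip()


def _is_plain_number(text: str) -> bool:
    # Only ASCII digits count as numeric here (no signs, decimals, or exponents)
    return text.isascii() and text.isdigit()


def normalize_half(value) -> str:
    """Like Format(value, "000000"): zero-pad numeric values, leave alphanumerics alone."""
    text = _as_text(value)
    return text.zfill(6) if _is_plain_number(text) else text


def split_code(code) -> tuple[str, str]:
    """Split a 12-character code into (system type, error type)."""
    text = _as_text(code)
    if _is_plain_number(text):
        text = text.zfill(12)  # restore leading zeros lost by numeric storage
    return text[:6], text[6:]


def same_half(a, b) -> bool:
    """Compare two halves after normalizing both to fixed-width text."""
    return normalize_half(a) == normalize_half(b)
```

Because every half becomes a fixed-width string, plain text sorting also orders the numeric halves correctly, so one column type can serve both the comparison and the sort.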

Related

How do I replace any combination of "O"s with zeros in a long list of part numbers so that I can create a synonym list?

We have a list of part numbers that have various alphanumeric combinations. There are no "O"s contained in these part numbers whatsoever, only zeros. However, a customer will sometimes see a part number, assume that the zeros are Os, and enter them as such in our search field. This returns no results.
To remedy this, we have a list of "synonyms" set up to catch such errors (AJO-9000 = AJ0-9000). However, some combinations of zeros and Os have slipped through the cracks.
Here is an example. The part number "00h-1038-k0" can be mistakenly written as:
o0h-1038-k0
0oh-1038-k0
ooh-1038-k0
o0h-1o38-k0
o0h-1038-ko
o0h-1o38-ko
0oh-1o38-k0
0oh-1o38-ko
00h-1o38-ko
00h-1o38-k0
ooh-1o38-k0
ooh-1o38-ko
ooh-1038-ko
We've attempted to catch the obvious ones manually, but I'm sure there's a formula of some kind that can automatically generate every version of a part number in which any combination of its zeros is replaced by "O"s, and put them into a spreadsheet that we can then upload to our search as synonyms of those numbers. Any ideas on how I can do this?
I tried a find/replace in Excel (all 0s to Os), but the zeros are distributed randomly throughout the part numbers, with no pattern and no limit on length, which makes this very tricky.
REVISED
Throw this function in:
=LET(hh_,SEQUENCE(1,LEN(B2),1,1),kk_,MID(B2,hh_,1),tt_,IFERROR(IF(1*kk_=0,SEQUENCE(1,LEN(B2)),""),""),a_,SEQUENCE(LEN(B2)),b_,MID(B2,a_,1),n_,SUM(IFERROR(1*(1*b_=0),)),o_,DEC2BIN(SEQUENCE(2^n_,1,0,1)),p_,REPT("0",n_-LEN(o_))&o_,aaa_,MID(p_,SEQUENCE(1,n_,1,1),1),ccc1_,SUBSTITUTE(aaa_,1,"O"),aa_,SEQUENCE(1,LEN(B2),1,1),xx_,MID(B2,aa_,1),ii_,IFERROR(INDEX(ccc1_,SEQUENCE(2^n_,1,1,1),MATCH(tt_,FILTER(tt_,--(tt_<>"")),0)),xx_),IF(MOD(hh_,LEN(B2))=0,ii_&",",ii_))
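This returns a spilled array (one row per variant, one character per column), which cannot be joined back into strings with TEXTJOIN/FILTERXML inside the same formula, due to limitations of those functions and the compounding complexity of the formula. Note that DEC2BIN only accepts values up to 511, so this approach is limited to part numbers with at most nine zeros.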
For convenience, I append a comma to the final column. You may wish to remove it, unless you want to join the results together (e.g. with TEXTJOIN), in which case it separates the part numbers:
=TEXTJOIN("",1,C3#)
I still feel VBA would be best; Power Query (the 'unpivoting' method) is the only other method I'm aware of. That said, your question is unusual in that you also have to handle the position of each zero relative to the non-zero ('fixed') characters.
As a bonus feature (like waiting until the 8th minute after a soundtrack finishes for a hidden track ☺), the linked file has part of a working VBA solution in Module 1, but it's incomplete, so I didn't publish it here. :)
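Since the answer suggests a procedural approach would be simplest, here is a minimal Python sketch of that idea (not the linked VBA module): find every zero, then emit all 2^n combinations of "0" and the letter at those positions. The function name `zero_o_variants` and its `letter` parameter are hypothetical; pass "o" to match the lowercase examples in the question.

```python
from itertools import product


def zero_o_variants(part: str, letter: str = "O") -> list[str]:
    """Every variant of `part` where any subset of its '0' characters becomes `letter`.

    The unchanged part number comes first; a part with n zeros yields 2**n variants.
    """
    # Positions of every zero in the part number
    zero_positions = [i for i, ch in enumerate(part) if ch == "0"]
    variants = []
    # Each tuple picks '0' or the letter for each zero position
    for choice in product(("0", letter), repeat=len(zero_positions)):
        chars = list(part)
        for pos, ch in zip(zero_positions, choice):
            chars[pos] = ch
        variants.append("".join(chars))
    return variants


for variant in zero_o_variants("00h-1038-k0", letter="o")[1:]:  # skip the original
    print(variant)
```

Unlike the DEC2BIN-based formula, this has no built-in cap on the number of zeros, though the output still doubles with each extra zero.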

Counting number of occurrences of a specific search string

I'm building a monitoring system that takes a log (where people register their work in a set format) and returns a counter, which I can use for analysis. The monitor and log are two separate workbooks. The log has entries like this: INITIALS;DATE;HOUR:RESULT|
Each cell can contain multiple entries.
My first attempt was to do a simple COUNTIF and look for a string (note that I use ; instead of , in formulas since I work in a Dutch version of Excel):
=COUNTIF('LOCATION'!Table[LOG];"*NB;??/??/????;??:??:#A*|*")
This worked fine, but the formula only counted the number of cells where this string was present, not the actual number of occurrences. I then tried this solution:
=SUM(LEN('LOCATION'!Tabel13[LOG])-LEN(SUBSTITUTE('LOCATION'!Tabel13[LOG];"NB";"")))
This indeed counted the number of times "NB" was present in the LOG. However, when I tried to use the original search string, this solution stopped working:
=SUM(LEN('LOCATION'!Tabel13[LOG])-LEN(SUBSTITUTE('LOCATION'!Tabel13[LOG];"*NB;??/??/????;??:??:#A*|*";"")))
It seems to me that SUM does not recognize symbols like ? or * which are necessary to define the correct search string. Where did I go wrong? Or can this be solved in another way? I can still look into VBA, but the workbooks are slow as hell already.
"?" and "*" are wildcards. Some functions support these (like COUNTIFS()) where others don't. Like you found out, SUBSTITUTE() does not.
Here is one way to count, assuming Microsoft 365:
Formula in C1:
=REDUCE(0,A1:A2,LAMBDA(a,b,a+LET(X,SEQUENCE(LEN(b)),SUM(--(IFERROR(SEARCH("NB;??/??/????;??:??:#A*|*",b,X),0)=X)))))
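Note: I removed the asterisk in front of "NB" so that SEARCH returns the position where a match actually starts, which can then be compared with the start position I called variable "X". In a Dutch version of Excel, replace the commas with semicolons.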
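The REDUCE formula works because SEARCH understands wildcards while SUBSTITUTE does not: it tests every start position X and counts the positions where a match begins exactly there. Below is a minimal Python sketch of that counting logic, assuming Excel's wildcard rules (? is one character, * is any run, ~ escapes the next character; everything else, including #, is literal) and SEARCH's case-insensitivity. The helper names are hypothetical.

```python
import re


def excel_wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate an Excel wildcard pattern into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "~" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # ~? or ~* means a literal ? or *
            i += 2
            continue
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
        i += 1
    # SEARCH is case-insensitive, so the regex is too
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def count_matches(cells, pattern: str) -> int:
    """Count start positions where the pattern matches, summed over all cells."""
    regex = excel_wildcard_to_regex(pattern)
    total = 0
    for text in cells:
        # Same test as SEARCH(pattern, text, X) = X for every X
        total += sum(1 for pos in range(len(text)) if regex.match(text, pos))
    return total
```

For example, a cell holding two "NB;…|" entries and one "JD;…|" entry contributes 2 to the total for the pattern "NB;??/??/????;??:??:*|*".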

How can I check that a string only contains a defined set of substrings by Excel formula?

I have a dictionary containing lots of words - I want the user to be able to input a list of substrings, and then a filtered list will be updated, containing only words that contain those substrings and nothing else. Any words that contain extra characters the user didn't specify, should not appear. Cell F3 will use a FILTER function to create the list. As in the mock-up below:
What I need is a formula that would generate the TRUE or FALSE flags from the yellow section (B3:B9), but I'm not sure how to go about this.
I'm sure this could be solved by VBA or Regex using Google Sheets, but I want to know if there's a way to do this by formula, as I don't want this to require a button press or script execution, and my spreadsheet can't be hosted on Google sheets due to its size. Any ideas?
You can also use a combination of ISNUMBER and SUMPRODUCT:
=ISNUMBER(SUMPRODUCT(MATCH(MID(A3,ROW(INDEX(A:A,1,1):INDEX(A:A,LEN(A3),1)),1),$D$3:$D$5,0)))
Adjusted formula:
=ISNUMBER(SUMPRODUCT(MATCH(MID(A3,ROW(A$1:INDEX(A:A,LEN(A3))),1),$D$3:$D$5,0)))
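The test below subtracts, for each dictionary entry, the number of characters that entry accounts for from the length of the original string. If the result is 0, the formula returns TRUE; otherwise it returns FALSE. Because A1 is converted with UPPER, the test is not case sensitive (a and A are treated equally), provided the dictionary entries in D1:D3 are entered in upper case.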
=NOT(LEN(A1)-(LEN(A1)-LEN(SUBSTITUTE(UPPER(A1),D1,"")))-(LEN(A1)-LEN(SUBSTITUTE(UPPER(A1),D2,"")))-(LEN(A1)-LEN(SUBSTITUTE(UPPER(A1),D3,""))))
The formula works, although I don't know if it is the optimal solution for you; I'm posting it as an answer in case it helps somebody else. The drawback of this approach is that the formula gets longer for each entry you add to your dictionary. Depending on the size of the dictionary and the number of strings to test, it can quickly get sloppy and calculation-heavy.
Have you considered a UDF in VBA?
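To make the logic of the formulas explicit, here is a Python sketch (function names hypothetical). `covers_by_counting` mirrors the SUBSTITUTE/LEN formula: it adds up how many characters each dictionary entry accounts for and compares the total with the word length; unlike the formula, it also upper-cases the entries. `can_segment` is a swapped-in technique, not taken from either answer: a standard word-break dynamic program that checks whether the word can be split completely into dictionary entries, which stays correct when entries are multi-character or overlap.

```python
def covers_by_counting(word: str, entries: list[str]) -> bool:
    """Mirror of the SUBSTITUTE/LEN formula: characters covered by entries == word length."""
    text = word.upper()
    pieces = [e.upper() for e in entries if e]
    # str.count, like SUBSTITUTE, finds non-overlapping occurrences left to right
    covered = sum(text.count(p) * len(p) for p in pieces)
    return covered == len(text)


def can_segment(word: str, entries: list[str]) -> bool:
    """True if the word can be split completely into dictionary entries (case-insensitive)."""
    text = word.upper()
    pieces = {e.upper() for e in entries if e}
    # ok[i] is True when text[:i] can be built entirely from the pieces
    ok = [True] + [False] * len(text)
    for end in range(1, len(text) + 1):
        ok[end] = any(ok[start] and text[start:end] in pieces for start in range(end))
    return ok[len(text)]
```

With single-character entries both functions agree. With overlapping entries such as "AB" and "B", the counting approach rejects the valid word "AB" (it counts 3 covered characters for a 2-character word), while `can_segment` accepts it.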

SAS: Match single word within string values of a single variable then replace entire string value with a blank

I'm working in SAS 9.2, in an existing dataset. I need a simple way to match a single word within the string values of a single variable, and then replace the entire string value with a blank. I don't have experience with SQL, macros, etc., and I'm hoping for a way to do this (even if the code is less efficient) that will be clear to a novice.
Specifically, I need to remove the entire string containing the word "growth" in a variable "pathogen." Sample values include "No growth during two days", "no growth," "growth did not occur," etc. I cannot enter all possible strings since I don't yet know how they will vary (we have only entered a few observations so far).
TRANWRD and TRANSLATE will not work, as they will not allow me to replace an entire phrase when the target word is only part of the string.
Other methods I've looked at (for example, a SESUG paper using PRX at http://analytics.ncsu.edu/sesug/2007/CC06.pdf) appear to remove all instances of the target string in every variable in the dataset, instead of just in the variable of interest.
Obviously I could subset the dataset to a single variable before I perform one of these actions and then merge back, but I'm hoping for something less complicated. Although I will certainly give something more complicated a shot if someone can provide me with sample code to adapt (and it would be greatly appreciated).
Thanks in advance--Kim
Could you be a little clearer on how the data set is constructed? I think mjsqu's solution will work if your variable pathogen is stored sentence by sentence. If not, then I would say your best bet is to parse the blocks into sentences and then apply mjsqu's solution.
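DATA dataset1;
/* Read the sample values; TRUNCOVER keeps short lines from spilling onto the next line */
infile datalines truncover;
format Ref best1.
pathogen $40.;
input Ref pathogen $40. ;
datalines;
1 No growth during two days
2 no growth,
3 growth did not occur,
4 does not have the word
;
RUN;
DATA dataout;
SET dataset1;
/* LOWCASE makes the test case-insensitive; INDEX returns 0 when "growth" is absent */
IF index(lowcase(pathogen),"growth") THEN pathogen="";
RUN;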

Number representation by Excel

I'm building a VBA program in Excel 2007 that takes long strings of digits (UPCs) as input. The program usually works fine, but sometimes the number string seems to be converted to scientific notation, and I want to avoid this, since I then look the values up with VLOOKUP.
So, I'd like to treat a textbox input as an exact string. No scientific notation, no number interpretation.
On a related note, this one gets really weird: I have two exact UPCs. Both show the same value (as far as I or any text editor can tell), yet one of them gives a successful VLOOKUP and the other does not.
Does anybody have suggestions on this one? Thanks for your time.
Long strings that look like numbers can be a pain in Excel. If you're not doing any math on the "number", it should really be treated as text. As you've discovered, when you want to force Excel to treat something as a string, precede it with an apostrophe.
There are a couple of common problems with VLOOKUP. The one you found, extra whitespace, can be avoided by using a formula such as
=VLOOKUP(TRIM(A1),B1:C100,2,FALSE)
The TRIM function will remove those extraneous spaces. The other common problem with VLOOKUP is that one argument is a string and the other is a number. I run into this one a lot with imported data. You can use the TEXT function to do the VLOOKUP without having to change the raw data:
=VLOOKUP(TEXT(A1,"00000"),B1:C100,2,FALSE)
will convert A1 to a five-digit string before it tries to look it up in column B. And, of course, if your data is a real mess, you may need
=VLOOKUP(TEXT(TRIM(A1),"00000"),B1:C100,2,FALSE)
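The same idea, normalizing both sides to one fixed-width text key before an exact-match lookup, can be sketched in Python. This assumes 12-digit UPC-A codes (the `width` default, a hypothetical parameter) and uses a plain dict in place of B1:C100; the helper names are hypothetical. `str.strip` only approximates TRIM: it also drops tabs and non-breaking spaces, but it does not collapse internal runs of spaces.

```python
def lookup_key(value, width: int = 12) -> str:
    """Normalize a UPC-like value to fixed-width text, roughly TEXT(TRIM(value), "0...0")."""
    # A number read from a cell may arrive as a float such as 12345678905.0
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    text = str(value).strip()  # remove surrounding whitespace
    if text.isascii() and text.isdigit():
        return text.zfill(width)  # restore leading zeros, like a "000000000000" format
    return text


def vlookup_exact(value, table: dict, width: int = 12):
    """Exact-match lookup after normalizing the search value and every table key."""
    normalized = {lookup_key(k, width): v for k, v in table.items()}
    return normalized.get(lookup_key(value, width))


table = {12345678905: "Widget"}  # key stored as a number, leading zero lost
print(vlookup_exact(" 012345678905 ", table))  # text with stray spaces still matches
```

Normalizing both sides is what removes the "one is a string, the other a number" mismatch, regardless of which side arrived as text.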
