M language - extract 5 digit numbers only from string - excel

I am trying to write a custom column in M to detect whether the field contains a 5 digit number, and then extract that 5 digit number into a new column. When employees incur an expense, they need to specify the job number if it's for a job. If not, they usually type in text.
If the text contains a number like "#####", the new column should hold that 5 digit number; otherwise null.
I am having incredible difficulty working out how to write this in M. I tried doing this in

Assuming your data is in column [Column1] then try
Add column.. custom column ... with formula:
= try if Number.From([Column1])>0 and Text.Length(Text.From([Column1]))=5 then [Column1] else null otherwise null
That checks whether Column1 can be evaluated as a number and has a length of 5 characters when viewed as a string; otherwise it returns null. This allows for leading zeroes like '01234 but does not attempt to account for mixed text/numbers such as A12345.
If you need to remove alphas, try
= if Text.Length(Text.Select(Text.From([Column1]),{"0".."9"}))=5 then Text.Select(Text.From([Column1]),{"0".."9"}) else null

Given what you have written, that valid entries will be in the range of 10,000 to 99,999 and may or may not be preceded by a J, you can add a column to detect that.
//Ensure Column1 (or whatever its real name is) is of type text and NOT any
#"Added Custom" = Table.AddColumn(#"Previous Step", "Job Number", each
let
x = Text.TrimStart(Text.Upper([Column1]),"J"),
n = try Number.From(x) otherwise 0
in
if n>=10000 and n<100000 then n else null)
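Neither formula above handles a job number buried in free text (for example "job 54321 parts"). As a rough, language-neutral sketch of that case, here is the same 10,000 to 99,999 rule written in Python with a regular expression. The function name job_number is made up for illustration, and this is not M code; it only shows the matching logic you would need to reproduce.

```python
import re

def job_number(text):
    """Return the first standalone 5-digit number (10000-99999) in text, else None."""
    if text is None:
        return None
    # (?<!\d) and (?!\d) prevent a match inside a longer run of digits;
    # [1-9] as the first digit keeps the result in the 10000-99999 range.
    match = re.search(r"(?<!\d)[1-9]\d{4}(?!\d)", str(text))
    return int(match.group()) if match else None
```

With this rule "J12345" and "job 54321 parts" both yield a number, while "123456" and plain text yield nothing.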

Related

Power Query - Converting whole number to text in a CUSTOM COLUMN

I need to convert 3 whole number columns to text in a formula when adding a new column inside power query. I know how to do this in dax using FORMAT function but I can't make it work inside power query.
Then below is my CUSTOM COLUMN:
= Table.AddColumn(RefNo.3, "Refernce Number", each
if Text.Length([RefNo.3]) > 1 and Text.Length([RefNo.3]) < 11 then [RefNo.3]
else if Text.Length([RefNo.2]) > 1 and Text.Length([RefNo.2]) < 11 then [RefNo.2]
else if Text.Length([RefNo.1]) > 1 and Text.Length([RefNo.1]) < 11 then [RefNo.1]
else null)
However, at the moment I'm getting this error:
Expression.Error: We cannot convert a value of type Table to type Number.
Details:
Value=[Table]
Type=[Type]
So I know I need to convert the whole number columns to text first inside the formula. Also, I had to intentionally convert those 3 columns from text to whole number previously to get rid of redundant values (so that's not an option for me to revert that). thanks in advance guys.
There are any number of ways to solve this, depending on your real data.
Just set the columns to Type.Text before executing your AddColumn function.
If you do this, you would also have to check for nulls, as they will cause the script, as you've written it, to fail.
Or you could precede your testing with another line to replace the nulls with an empty string (""): Table.ReplaceValue(table_name,null,"",Replacer.ReplaceValue,{"RefNo", "RefNo2", "RefNo3"}),
If they are all positive integers, compare the values rather than the string lengths: e.g. >=0 and <10000000000
Construct a list of the values, and return the last one that passes the filter:
= Table.AddColumn(your_table_name, "Reference Number",
each List.Accumulate(List.Reverse(List.RemoveNulls({[RefNo],[RefNo2],[RefNo3]})),
null,(state,current)=> if state = null then
let
x = Text.Length(Text.From(current))
in
if x > 1 and x < 11 then current else state
else state))
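To make the List.Accumulate logic easier to follow, here is the same selection rule as a small Python sketch (not M; the function and parameter names are invented for illustration). It drops nulls, walks the values from the last column back to the first, and returns the first one whose text length is between 2 and 10.

```python
def pick_reference(ref_no_1, ref_no_2, ref_no_3):
    """Return the last non-null value whose text form is 2 to 10 characters long."""
    candidates = [v for v in (ref_no_1, ref_no_2, ref_no_3) if v is not None]
    # Reversed so that RefNo3 is tried first, mirroring List.Reverse in the M code.
    for value in reversed(candidates):
        if 1 < len(str(value)) < 11:
            return value
    return None
```

Converting with str() inside the test is what lets the whole number columns stay numeric, which is the role Text.From plays in the M version.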

Replacing multiple string values in SAS

I have a dataset that I'm trying to clean up. One variable is gender where I have 'F', 'Female', 'M', 'Male' and 'Unknown' as values. I want to change all the iterations of 'F' to show as 'Female' and all the 'M' values to show as 'Male'. I also have another variable called 'Ethnicity' which has values such as '1 - White' but I want it to show as 'White'.
I have tried to use tranwrd
gender=tranwrd(gender, "F", "Female");
But this replaces the 'Female' values as well to 'Femaleemale'
I have also attempted index:
IF index(lowcase(gender),"f") THEN gender="Female";
IF index(lowcase(gender),"m") THEN gender="male";
But the multiple If statements don't work.
As you discovered, TRANWRD is the wrong function for this value transformation. INDEX is not right either: in SAS any non-zero, non-missing value is true, so INDEX(source, excerpt) is logically true whenever the excerpt is found anywhere in source. For example, 'Female' contains both an f and an m, so your second IF overwrites it.
For specific value transformations, compare against a literal value directly. To test a single character you can lowercase it as you show, or use an IN list.
if gender in ('M', 'm') then gender = 'Male'; else
if gender in ('F', 'f') then gender = 'Female';
For the case of extracting ethnicity from a value of the form # - ethnicity you can, per @draycut, use the COMPRESS function with the keep alphabetic characters only option (ka).
Another way to transform patterned values is to use regular expression search and replace.
* replace leading # - before embedded ethnicity with no string (//);
ethnicity = prxchange('s/^\d+\s*-\s*//', 1, ethnicity);
See if you can use this as a template
data have;
input gender $ 1-7 Ethnicity $ 9-18;
datalines;
F       1 - White
Female  White
Male    2 - Black
Unknown Black
m       1 - White
f       1 - White
;
data want;
set have;
if upcase(char(gender, 1)) = "M" then gender = "Male";
else if upcase(char(gender, 1)) = "F" then gender = "Female";
else gender = "Unknown";
Ethnicity = compress(Ethnicity, , 'ka');
run;
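For readers who want to check the two rules outside SAS, the following is a small Python sketch of the same logic: test only the first letter of gender, and strip a leading "# - " from ethnicity. The function names are made up for illustration, and the regular expression is the one from the PRXCHANGE answer.

```python
import re

def clean_gender(value):
    """Map any value starting with m/M to Male, f/F to Female, else Unknown."""
    first = value.strip()[:1].upper()
    if first == "M":
        return "Male"
    if first == "F":
        return "Female"
    return "Unknown"

def clean_ethnicity(value):
    """Remove a leading number, optional spaces and a dash, e.g. '1 - White' -> 'White'."""
    return re.sub(r"^\d+\s*-\s*", "", value.strip())
```

Testing only the first character is what avoids the 'Femaleemale' problem: a plain substring replacement also hits the F inside 'Female'.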

how to vlookup if prefix found in the list?

Hi.
How can I return the "company name" (column H) in column B if any of the "PrefiX" values (column G) is found in "con no" (column A)?
A sample of the outcome needed is in column B.
Sample:
620011113 = DD
CN1234 = BB
Thanks
=INDEX($H:$H,AGGREGATE(15,6,ROW($G$1:$G$7)/(--(FIND($G$1:$G$7,$A2)=1)*--(LEN($G$1:$G$7)>0)),1),1)
Breaking this down, the INDEX retrieves the Nth item from Column H (Company name). To find the value of N, we are using the AGGREGATE function
AGGREGATE is a weird function - it lets us use things like MAX or LARGE or SUM while ignoring any error values. In this case, we will be using it for SMALL (first argument, 15), while Ignoring Error Values (second argument, 6). We will want the very smallest value, so the fourth argument will be 1. (If we wanted the second smallest, it would be 2, and so on)
=INDEX($H:$H,AGGREGATE(15,6, <SOMETHING> ,1),1)
So, all we need now is a list of values to compare! To make things slightly simpler, I'll break that bit of the code out for you here:
ROW($G$1:$G$7) / (--(FIND($G$1:$G$7,$A2)=1) * --(LEN($G$1:$G$7)>0))
There are 3 parts to this. The first, ROW($G$1:$G$7), is the actual value we want to retrieve - these will be the Row Numbers for each Prefix that matches your value. On its own, however, it will be all the row numbers. Since we are skipping errors, we want any Rows that don't match the prefix to throw an error. The easiest way to do this is to Divide by Zero.
At the start of --(FIND($G$1:$G$7,$A2)=1) and --(LEN($G$1:$G$7)>0) we have a double-negative. This is a quick way to convert True and False to 1 and 0. Only when both tests are True will we not divide by 0, as this table shows:
A | B | A*B
1 | 1 | 1
1 | 0 | 0
0 | 1 | 0
0 | 0 | 0
Starting with the second test first (it's easier), we have LEN($G$1:$G$7)>0 - basically "don't look at blank cells".
The other test (FIND($G$1:$G$7,$A2)=1) will search for the Prefix in the Con No, and return where it is found (or a #VALUE! error if it isn't). We then check "is this at position 1" - in other words, "Is this at the start of the Con No, rather than in the middle". We don't want to say Con No CNQ6060 is part of Company AA instead of Company BB by mistake!
So, if the Prefix is at the Start of the Con No, AND it isn't Blank (because there is an infinite amount of Nothing Before, After, and Between every number and letter), then we get it added to our list of Rows. We then take the smallest row (i.e. closest to the top - change AGGREGATE(15 to AGGREGATE(14 if you want the closest to the bottom!), and use that to get the Company Name
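If it helps to see the rule without the array arithmetic, here is the same idea as a short Python sketch: scan the prefix list from the top, skip blank prefixes, and return the company for the first prefix found at the start of the Con No. The function name and the prefix table in the usage note are hypothetical, not your real data.

```python
def company_for(con_no, prefix_table):
    """Return the company of the first non-blank prefix that con_no starts with."""
    for prefix, company in prefix_table:
        # A blank prefix would match everything, so skip it (the LEN(...)>0 test).
        if prefix and con_no.startswith(prefix):
            return company
    return None
```

With a made-up table such as [("Q60", "AA"), ("CN", "BB"), ("", "unused"), ("6200", "DD")], the Con No CNQ6060 comes back as BB: Q60 appears in the middle of it, not at the start, so AA is never considered.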
You could try the formula below:
=VLOOKUP(IF(LEFT(A3,1)="6",LEFT(A3,4),IF(LEFT(A3,1)="C",LEFT(A3,2),IF(LEFT(A3,1)="E",LEFT(A3,7)))),$G$3:$H$7,2,0)
Keep in mind that you have to put an apostrophe (') before the cell values in columns A and G to convert them to text; otherwise VLOOKUP will not return the correct results.

PowerQuery (M): How can I extract a date from a large text field?

My table has a text column called Remarks which usually contains a large amount of text.
Here's an example:
3/24/2017 11:14:41 AM - EMD FOR STATUS NFU 3/30/17
3/30/2017 10:58:03 AM - CLD PER RECEPTIONIST GM UNAVAILABLE NFU 04-13-2017
4/13/2017 11:10:15 AM - CLD PER RECEPTIONIST WILL GIVE INFO NFU4/27
4/27/2017 9:02:20 AM - MLD INV WITH 90 DAY STAMP
4/27/2017 9:15:03 AM - PER REP WILL CALL CUSTOMER FOR PAYMENT
4/27/2017 11:03:46 AM - NFU 05/5PER REP CUSTOMER CONFUSION
5/5/2017 8:55:17 AM - NFU 5/9/2017 CRP PER REP CHECK WAS MLD 5/2/17
All of that text would be crammed into a single field, and I need to extract the last NFU date from the field for use in calculations and filtering.
In the above example, I would want to extract the date 5/9/2017 from the last row.
But as you can see, the date could be in any format, anywhere in the field.
I presume Excel can parse the text into a date value in any of the above formats (if not, I'll deal with that some other way - employee training, etc.)
The main things I need to figure out how to do using PowerQuery are:
1. Find the last instance of "NFU" in this field.
2. Extract all text immediately following that last instance of "NFU", including the space between "NFU" and the date, if present. At this point, the result should be:
" 5/9/2017 CRP PER REP CHECK WAS MLD 5/2/17"
3. Remove any whitespace at the beginning of the string. At this point, the result should be:
"5/9/2017 CRP PER REP CHECK WAS MLD 5/2/17"
4. Find the first character that is not 0-9, /, or - (or the end of the string, whichever comes first).
5. Truncate the string at the first non-date character, if appropriate. At this point, the result should be:
"5/9/2017"
6. Finally, attempt to format the resulting text into Date type/format, and return as the result for a PowerQuery custom column.
Looking at the PowerQuery string functions available, I'm not sure whether this is even possible.
I guess you mean the Power Query Text functions. These are somewhat limited indeed, but there are plenty other options in Power Query's function library: in this case the List functions can come to the rescue.
By the way: I checked for " NFU" in order to avoid "CONFUSION" (last but one line in your examples).
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
Typed = Table.TransformColumnTypes(Source,{{"example", type text}}),
LastNFU = Table.AddColumn(Typed, "LastNFU", each Text.PositionOf([example]," NFU",Occurrence.Last), Int64.Type),
AfterNFU = Table.AddColumn(LastNFU, "AfterNFU", each if [LastNFU] = -1 then null else Text.Range([example],[LastNFU]+4)),
Trimmed = Table.TransformColumns(AfterNFU,{{"AfterNFU", Text.Trim}}),
TextToList = Table.TransformColumns(Trimmed,{{"AfterNFU", each if _ = null then {} else Text.ToList(_)}}),
ListFirstN = Table.TransformColumns(TextToList,{{"AfterNFU", each List.FirstN(_, each Text.Contains("01234567890-/",_))}}),
TextCombine = Table.TransformColumns(ListFirstN, {"AfterNFU", Text.Combine, type text}),
Date = Table.TransformColumnTypes(TextCombine,{{"AfterNFU", type date}}, "en-US"),
Renamed = Table.RenameColumns(Date,{{"AfterNFU", "Date"}}),
Removed = Table.RemoveColumns(Renamed,{"LastNFU"})
in
Removed
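For anyone who wants to check the steps against sample text outside Power Query, here is a rough Python sketch of the same pipeline: last " NFU", trim, keep leading date characters, then try a few US date layouts. The function names and the list of accepted formats are assumptions for illustration only.

```python
from datetime import datetime
from itertools import takewhile

def last_nfu_text(remarks):
    """Return the run of date characters after the last ' NFU', or None if absent."""
    pos = remarks.rfind(" NFU")  # leading space avoids matching CONFUSION
    if pos == -1:
        return None
    tail = remarks[pos + 4:].strip()
    # Keep characters only while they are digits, '-' or '/'.
    return "".join(takewhile(lambda ch: ch in "0123456789-/", tail))

def parse_us_date(text):
    """Try a few month-first layouts; return a date or None if none fits."""
    if not text:
        return None
    for fmt in ("%m/%d/%Y", "%m/%d/%y", "%m-%d-%Y", "%m-%d-%y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None
```

Entries without a year, such as "NFU4/27" or "NFU 05/5", extract fine as text but do not parse to a date here, so those still need a separate decision (or the employee training you mention).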
A simple formula like =RIGHT(A1,LEN(A1)-(FIND("NFU",A1,1)-1)) would extract the text starting at the first "NFU" (including the "NFU" itself), assuming the text is in cell A1.
But it needs further work to meet your other requirements.

Excel 2007 - Generate unique ID based on text?

I have a sheet with a list of names in Column B and an ID column in A. I was wondering if there is some kind of formula that can take the value in column B of that row and generate some kind of ID based on the text? Each name is also unique and is never repeated in any way.
It would be best if I didn't have to use VBA really. But if I have to, so be it.
Solution Without VBA.
Logic based on First 8 characters + number of character in a cell.
= CODE(cell) which returns Code number for first letter
= CODE(MID(cell,2,1)) returns Code number for second letter
= IFERROR(CODE(MID(cell,9,1)),0) If 9th character does not exist then return 0
= LEN(cell) number of character in a cell
Concatenate the first 8 codes and add the length of the text on the end.
If 8 characters are not enough, then add further codes for the next characters in the string.
Final function:
=CODE(B2)&IFERROR(CODE(MID(B2,2,1)),0)&IFERROR(CODE(MID(B2,3,1)),0)&IFERROR(CODE(MID(B2,4,1)),0)&IFERROR(CODE(MID(B2,5,1)),0)&IFERROR(CODE(MID(B2,6,1)),0)&IFERROR(CODE(MID(B2,7,1)),0)&IFERROR(CODE(MID(B2,8,1)),0)&LEN(B2)
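To see what kind of ID the formula produces, here is the same construction as a short Python sketch (the function name is made up for illustration; ord plays the role of CODE for plain ASCII text). Missing characters contribute a 0, and the length is appended last.

```python
def code_id(name):
    """Concatenate the character codes of the first 8 characters, then the length."""
    codes = [str(ord(ch)) for ch in name[:8]]
    # Pad with "0" for characters that do not exist, like IFERROR(..., 0).
    codes += ["0"] * (8 - len(codes))
    return "".join(codes) + str(len(name))
```

For the name "Al" this gives 65 and 108 for the two letters, six zeroes, and a trailing 2.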
Sorry, I didn't find a formula-only solution, even if this thread might help (trying to calculate the points in a scrabble game); I couldn't find a way to be sure the generated hash would be unique.
Still, here is my solution, based on a UDF (User-Defined Function):
Put the code in a module:
Public Function genId(ByVal sName As String) As Long
'Function to create a unique hash by summing the ascii value of each character of a given string
Dim sLetter As String
Dim i As Integer
For i = 1 To Len(sName)
genId = Asc(Mid(sName, i, 1)) * i + genId
Next i
End Function
And call it in your worksheet like a formula:
=genId(A1)
[EDIT] Added the * i to take into account the order. It works on my unit tests
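A caution worth checking before relying on this: weighting by position separates many reorderings, but it does not guarantee uniqueness. The Python sketch below reproduces the same sum (ord standing in for Asc on plain ASCII text; the function name is only for illustration) so you can test your own name list for clashes.

```python
def gen_id(name):
    """Sum of each character code multiplied by its 1-based position."""
    return sum(ord(ch) * i for i, ch in enumerate(name, start=1))
```

For example "ab" and "ca" both come out as 293 (97*1 + 98*2 and 99*1 + 97*2), so two different names can share an ID.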
May be OTT for your needs, but you can use a call to CoCreateGuid to get a real GUID
Private Declare Function CoCreateGuid Lib "ole32" (ID As Any) As Long
Function GUID() As String
Dim ID(0 To 15) As Byte
Dim i As Long
If CoCreateGuid(ID(0)) = 0 Then
For i = 0 To 15
GUID = GUID & Right$("0" & Hex$(ID(i)), 2) 'pad each byte to two hex digits
Next
Else
GUID = "Error while creating GUID!"
End If
End Function
Test using
Sub testGUID()
MsgBox GUID
End Sub
How best to implement this depends on your needs. One way would be to write a macro that gets a GUID and populates a column wherever names exist. (Note: using it as a UDF as-is is no good, since it will return a new GUID every time it is recalculated.)
EDIT
See this answer for creating a SHA1 hash of a string
Do you just want an incrementing numeric id column to sit next to your values? If so, and if your values will always be unique, you can very easily do this with formulae.
If your values were in column B, starting in B2 underneath your headers for example, in A2 you would type the formula "=IF(B2="","",1+MAX(A$1:A1))". You can copy and paste that down as far as your data extends, and it will increment a numeric identifier for each row in column B which isn't blank.
If you need to do anything more complicated, like identify and re-identify repeating values, or make identifiers 'freeze' once they're populated, let me know. Currently, when you clear or add values to your list the identifiers will shift up and down, so you need to be careful if your data changes.
Unique identifier based on the number of specific characters in text. I used an identifier based on vowels and numbers.
=LEN($J$14)-LEN(SUBSTITUTE($J$14;"a";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"e";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"i";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"j";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"o";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"u";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"y";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"1";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"2";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"3";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"4";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"5";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"6";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"7";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"8";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"9";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"0";""))
You say you are confident that there are no duplicate values in your words. To push it further, are you confident that the first 8 characters in any word would be unique?
If so, you can use the formula below. It works by taking each character's ASCII code minus 40 [assuming normal characters, and after lowercasing, this puts digits between 8 & 17, and letters between 57 & 82], and multiplying that character's code by 10 ^ [that character's digit placement in the word]. Basically it takes each character code [-40] and concatenates it onto the next.
EDIT Note that this code no longer requires that at least 8 characters exist in your word to prevent an error, as the actual word to be coded has 8 "0"'s appended to it.
=TEXT(SUM((CODE(MID(LOWER(RIGHT(REPT("0",8)&A3,8)),{1,2,3,4,5,6,7,8},1))-40)*10^{0,2,4,6,8,10,12,14}),"#")
Note that as this uses the ASCII values of the characters, the ID # could be used to identify the name directly - this does not really create anonymity, it just turns 8 unique characters into a unique number. It is obfuscated with the -40, but not really 'safe' in that sense. The -40 is just to get normal letters and numbers in the 2 digit range, so that multiplying by 10^0,2,4 etc. will create a 2 digit unique add-on to the created code.
EDIT FOR ALTERNATIVE METHOD
I had previously attempted to do this so that it would look at each letter of the alphabet, count the number of times it appears in the word, and then multiply that by 10^[that letter's position in the alphabet]. The problem with doing this (see comment below for formula) is that it required a number of 10^26-1, which is beyond Excel's floating point precision. However, I have a modified version of that method:
By limiting the number of allowed characters in the alphabet, we can get the max total size possible to 10^15-1, which Excel can properly calculate. The formula looks like this:
=RIGHT(REPT("0",15)&TEXT(SUM(LEN(A3)*10^{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14}-LEN(SUBSTITUTE(A3,MID(Alphabet,{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15},1),""))*10^{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14}),"#"),15)
[The RIGHT(REPT("0",15)&... portion of the formula is meant to keep all codes the same number of characters]
Note that here, Alphabet is a named string which holds the characters: "abcdehilmnorstu". For example, using the above formula, the word "asdf" counts the instances of a, s, and d, but not 'f' which isn't in my contracted alphabet. The code of "asdf" would be:
001000000001001
This only works with the following assumptions:
The letters not listed (nor numbers / special characters) are not required to make each name unique. For example, asdf & asd would have the same code in the above method.
And,
The order of the letters is not required to make each name unique. For example, asd & dsa would have the same code in the above method.
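To check these assumptions against a real name list, here is the same counting scheme as a Python sketch. ALPHABET matches the named string above; the function name is made up for illustration. Each letter in the contracted alphabet contributes its count at its own power of ten, and the result is left-padded to 15 characters.

```python
ALPHABET = "abcdehilmnorstu"

def count_code(word):
    """Build a 15-digit code from how often each ALPHABET letter occurs in word."""
    # str.count is case-sensitive, as SUBSTITUTE is in the Excel formula.
    total = sum(word.count(ch) * 10 ** i for i, ch in enumerate(ALPHABET))
    return str(total).zfill(15)[-15:]
```

It reproduces 001000000001001 for "asdf", and returns that same code for "asd" and "dsa", which is exactly the pair of collisions described in the two assumptions. A letter that occurs ten or more times would also carry into the next digit.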
