Power Query - Converting whole number to text in a CUSTOM COLUMN - excel

I need to convert 3 whole number columns to text in a formula when adding a new column inside power query. I know how to do this in dax using FORMAT function but I can't make it work inside power query.
3 columns are - click to veiw
Then below is my CUSTOM COLUMN:
= Table.AddColumn(RefNo.3, "Refernce Number", each
if Text.Length([RefNo.3]) > 1 and Text.Length([RefNo.3]) < 11 then [RefNo.3]
else if Text.Length([RefNo.2]) > 1 and Text.Length([RefNo.2]) < 11 then [RefNo.2]
else if Text.Length([RefNo.1]) > 1 and Text.Length([RefNo.1]) < 11 then [RefNo.1]
else null)
However, at the moment I'm getting this error:
Expression.Error: We cannot convert a value of type Table to type Number.
Details:
Value=[Table]
Type=[Type]
So I know I need to convert the whole number columns to text first inside the formula. Also, I had to intentionally convert those 3 columns from text to whole number previously to get rid of redundant values (so that's not an option for me to revert that). thanks in advance guys.

There are any number of ways to solve this, depending on your real data.
Just set the columns to Type.Text before executing your AddColumn function.
If you do this, you would also have to check for null as they will cause the script, as you've written it, to fail
Or you could precede your testing with another line to replace the nulls with an empty string (""): Table.ReplaceValue(table_name,null,"",Replacer.ReplaceValue,{"RefNo", "RefNo2", "RefNo3"}),
If they are all positive integers, compare the values rather than the string lengths: eg >=0 and <10000000000
Construct a numeric array, and return the last value that passes the filter
= Table.AddColumn(your_table_name, "Reference Number",
each List.Accumulate(List.Reverse(List.RemoveNulls({[RefNo],[RefNo2],[RefNo3]})),
null,(state,current)=> if state = null then
let
x = Text.Length(Text.From(current))
in
if x > 1 and x < 11 then current else state
else state))

Related

M language - extract 5 digit numbers only from string

I am trying to write a custom column in M to detect whether the field contains a 5 digit number, and then extract that 5 digit number into a new column. When employees incur an expense, they need to specify the job number if its for a job. If not, they usually type in text.
IF text.contains number like "#####" then have the new column have that 5 digit number, else null.
I am having incredible difficult on how to write in M. I tried doing this in
Assuming your data is in column [Column1] then try
Add column.. custom column ... with formula:
= try if Number.From([Column1])>0 and Text.Length(Text.From([Column1]))=5 then [Column1] else null otherwise null
that looks for Column1 being something that can evaluate to a number, and that has a length of 5 characters when viewed as a string, otherwise puts a null. This allows for leading zeroes like '01234 but does not attempt to account for mixed text/numbers such as A12345
if you need to remove alphas, try
= if Text.Length(Text.Select(Text.From([Column1]),{"0".."9"}))=5 then Text.Select(Text.From([Column1]),{"0".."9"}) else null
Given what you have written, that valid entries will be in the range of 10,000 to 99,999 and may or may not be preceded by a J, you can add a column to detect that.
//Ensure Column1 (or whatever it's real name is) is of type text and NOT any
#"Added Custom" = Table.AddColumn(#"Previous Step", "Job Number", each
let
x = Text.TrimStart(Text.Upper([Column1]),"J"),
n = try Number.From(x) otherwise 0
in
if n>=10000 and n<100000 then n else null)

Iterate in column for specific value and insert 1 if found or 0 if not found in new column python

I have a DataFrame as shown in the attached image. My columns of interest are fgr and fgr1. As you can see, they both contain values corresponding to years.
I want to iterate in the the two columns and for any value present, I want 1 if the value is present or else 0.
For example, in fgr the first value is 2028. So, the first row in column 2028 will have a value 1 and all other columns have value 0.
I tried using lookup but I did not succeed. So, any pointers will be really helpful.
Example dataframe
Data:
Data file in Excel
This fill do you job. You can use for loops aswell but I think this approach will be faster.
df["Matched"] = df["fgr"].isin(df["fgr1"])*1
Basically you check if values from one are in anoter column and if they are, you get True or False. You then multiply by 1 to get 1 and 0 instead of True or False.
From this answer
Not the most efficient, but should work for your case(time consuming if large dataset)
s = df.reset_index().melt(['index','fgr','fgr1'])
s['value'] = s.variable.eq(s.fgr.str[:4]).astype(int)
s['value2'] = s.variable.eq(s.fgr1.str[:4]).astype(int)
s['final'] = np.where(s['value']+s['value2'] > 0,1,0)
yourdf = s.pivot_table(index=['index','fgr','fgr1'],columns = 'variable',values='final',aggfunc='first').reset_index(level=[1,2])
yourdf

How to populating one column in a dataframe from the truncated value of another column

I have column in a Pandas dataframe (final_combine_df) that is called GEOID. I will have a 15 character string number like this : '371899201001045'. I want to create a new column in my data frame called 'CB_GrpID' that is equal to just the first 12 characters of the GEOID values (ex: '371899201001'). I tried this, but it just returned the same GEOID value (non-truncated) in the new 'CB_GrpID':
final_combine_df['CB_GrpID'] = final_combine_df['GEOID'][:12]
What am I doing wrong here?
final_combine_df.iloc[0]['CB_GrpID']
>>371899201001045
pandas.Series.str
Working with text
The str accessor is what you're looking for. It gives access to the strings in each cell along with "vectorized" string methods.
final_combined_df['GEOID'].str[:12]
What you were doing:
final_combined_df['GEOID'][:12]
Was just getting the first 12 elements of the column.
Follow this format. Use a lambda function to return the first 12 digits of a string. Note python starts at index 0 and the upper limit is exclusive not inclusive, meaning the last element you want is at index 11, however you set the upper limit to 12 to ensure that 11 is included. Just FYI in case you were unaware.
df[‘new_var’] = df[‘old_var’].apply(lambda x: x[:12])

What is the most efficient format for storing strings from a for loop?

I have a script that runs through a series of strings and using regex pulls out certain strings (approx 4 output strings per input string).
e.g. HelloStackOverflowWorld
-> Hello; Stack; Overflow; World;
The final output would ideally be a table where I can filter based upon the strings in the columns. Using the case above, column 1 row 1 would have 'Hello', column 2 row 1 would have 'Stack' and so on.
The problem is, the size of the output will change depending on the input so I am unsure of what output format to use.
At the moment I used something similar to this:
if strfind(missing{ii},'hello')
miss.exch = [miss.exch;'hello'];
temp.exc = regexp(missing{ii},'(?<=\d[Q|T])(\w*?)(?=[q])','match');
miss.exc = [miss.exc;temp.exc];
temp.TQ= regexp(missing{ii},'(Qc|Tc)','match');
if strcmp(temp.TQ{1,1}, 'Tc')
miss.TQ = [miss.TQ;'variableA'];
elseif temp.TQ{1,1} == 'Qc'
miss.TQ = [miss.TQ;'variableB'];
end
else if .........
end
Which obviously results in a 1x1 struct consisting of a number of fields each with many cells. This makes filtering on strings an issue!
How can I define and add data into a 'table of strings' that I can then filter?
I think you are just looking for a cell array. Here is a simple example of what they can do:
C = {'Abc','Bcd';'Cde',[]}
strcmp(C,'Cde')
Results in:
ans =
0 0
1 0
Make sure to check doc cell to see how you can access them.

Case Function Equivalent in Excel

I have an interesting challenge - I need to run a check on the following data in Excel:
| A - B - C - D |
|------|------|------|------|
| 36 | 0 | 0 | x |
| 0 | 600 | 700 | x |
|___________________________|
You'll have to excuse my wonderfully bad ASCII art. So I need the D column (x) to run a check against the adjacent cells, then convert the values if necessary. Here's the criteria:
If column B is greater than 0, everything works great and I can get coffee. If it doesn't meet that requirement, then I need to convert A1 according to a table - for example, 32 = 1420 and place into D. Unfortunately, there is no relationship between A and what it needs to convert to, so creating a calculation is out of the question.
A case or switch statement would be perfect in this scenario, but I don't think it is a native function in Excel. I also think it would be kind of crazy to chain a bunch of =IF() statements together, which I did about four times before deciding it was a bad idea (story of my life).
Sounds like a job for VLOOKUP!
You can put your 32 -> 1420 type mappings in a couple of columns somewhere, then use the VLOOKUP function to perform the lookup.
Without reference to the original problem (which I suspect is long since solved), I very recently discovered a neat trick that makes the Choose function work exactly like a select case statement without any need to modify data. There's only one catch: only one of your choose conditions can be true at any one time.
The syntax is as follows:
CHOOSE(
(1 * (CONDITION_1)) + (2 * (CONDITION_2)) + ... + (N * (CONDITION_N)),
RESULT_1, RESULT_2, ... , RESULT_N
)
On the assumption that only one of the conditions 1 to N will be true, everything else is 0, meaning the numeric value will correspond to the appropriate result.
If you are not 100% certain that all conditions are mutually exclusive, you might prefer something like:
CHOOSE(
(1 * TEST1) + (2 * TEST2) + (4 * TEST3) + (8 * TEST4) ... (2^N * TESTN)
OUT1, OUT2, , OUT3, , , , OUT4 , , <LOTS OF COMMAS> , OUT5
)
That said, if Excel has an upper limit on the number of arguments a function can take, you'd hit it pretty quickly.
Honestly, can't believe it's taken me years to work it out, but I haven't seen it before, so figured I'd leave it here to help others.
EDIT: Per comment below from #aTrusty:
Silly numbers of commas can be eliminated (and as a result, the choose statement would work for up to 254 cases) by using a formula of the following form:
CHOOSE(
1 + LOG(1 + (2*TEST1) + (4*TEST2) + (8*TEST3) + (16*TEST4),2),
OTHERWISE, RESULT1, RESULT2, RESULT3, RESULT4
)
Note the second argument to the LOG clause, which puts it in base 2 and makes the whole thing work.
Edit: Per David's answer, there's now an actual switch statement if you're lucky enough to be working on office 2016. Aside from difficulty in reading, this also means you get the efficiency of switch, not just the behaviour!
The Switch function is now available, in Excel 2016 / Office 365
SWITCH(expression, value1, result1, [default or value2, result2],…[default or value3, result3])
example:
=SWITCH(A1,0,"FALSE",-1,"TRUE","Maybe")
Microsoft -Office Support
Note: MS has updated that page to only document the behavior of Excel 2019. Eventually, they will probably remove references to 2019 as well... To see what the page looked like in 2016, use the wayback machine:
https://web.archive.org/web/20161010180642/https://support.office.com/en-us/article/SWITCH-function-47ab33c0-28ce-4530-8a45-d532ec4aa25e
Try this;
=IF(B1>=0, B1, OFFSET($X$1, MATCH(B1, $X:$X, Z) - 1, Y)
WHERE
X = The columns you are indexing into
Y = The number of columns to the left (-Y) or right (Y) of the indexed column to get the value you are looking for
Z = 0 if exact-match (if you want to handle errors)
I used this solution to convert single letter color codes into their descriptions:
=CHOOSE(FIND(H5,"GYR"),"Good","OK","Bad")
You basically look up the element you're trying to decode in the array, then use CHOOSE() to pick the associated item. It's a little more compact than building a table for VLOOKUP().
I know it a little late to answer but I think this short video will help you a lot.
http://www.xlninja.com/2012/07/25/excel-choose-function-explained/
Essentially it is using the choose function. He explains it very well in the video so I'll let do it instead of typing 20 pages.
Another video of his explains how to use data validation to populate a drop down which you can select from a limited range.
http://www.xlninja.com/2012/08/13/excel-data-validation-using-dependent-lists/
You could combine the two and use the value in the drop down as your index to the choose function. While he did not show how to combine them, I'm sure you could figure it out as his videos are good. If you have trouble, let me know and I'll update my answer to show you.
I understand that this is a response to an old post-
I like the If() function combined with Index()/Match():
=IF(B2>0,"x",INDEX($H$2:$I$9,MATCH(A2,$H$2:$H$9,0),2))
The if function compare what is in column b and if it is greater than 0, it returns x, if not it uses the array (table of information) identified by the Index() function and selected by Match() to return the value that a corresponds to.
The Index array has the absolute location set $H$2:$I$9 (the dollar signs) so that the place it points to will not change as the formula is copied. The row with the value that you want returned is identified by the Match() function. Match() has the added value of not needing a sorted list to look through that Vlookup() requires. Match() can find the value with a value: 1 less than, 0 exact, -1 greater than. I put a zero in after the absolute Match() array $H$2:$H$9 to find the exact match. For the column that value of the Index() array that one would like returned is entered. I entered a 2 because in my array the return value was in the second column. Below my index array looked like this:
32 1420
36 1650
40 1790
44 1860
55 2010
The value in your 'a' column to search for in the list is in the first column in my example and the corresponding value that is to be return is to the right. The look up/reference table can be on any tab in the work book - or even in another file. -Book2 is the file name, and Sheet2 is the 'other tab' name.
=IF(B2>0,"x",INDEX([Book2]Sheet2!$A$1:$B$8,MATCH(A2,[Book2]Sheet2!$A$1:$A$8,0),2))
If you do not want x return when the value of b is greater than zero delete the x for a 'blank'/null equivalent or maybe put a 0 - not sure what you would want there.
Below is beginning of the function with the x deleted.
=IF(B2>0,"",INDEX...
If you don't have a SWITCH statement in your Excel version (pre-Excel-2016), here's a VBA implementation for it:
Public Function SWITCH(ParamArray args() As Variant) As Variant
Dim i As Integer
Dim val As Variant
Dim tmp As Variant
If ((UBound(args) - LBound(args)) = 0) Or (((UBound(args) - LBound(args)) Mod 2 = 0)) Then
Error 450 'Invalid arguments
Else
val = args(LBound(args))
i = LBound(args) + 1
tmp = args(UBound(args))
While (i < UBound(args))
If val = args(i) Then
tmp = args(i + 1)
End If
i = i + 2
Wend
End If
SWITCH = tmp
End Function
It works exactly like expected, a drop-in replacement for example for Google Spreadsheet's SWITCH function.
Syntax:
=SWITCH(selector; [keyN; valueN;] ... defaultvalue)
where
selector is any expression that is compared to keys
key1, key2, ... are expressions that are compared to the selector
value1, value2, ... are values that are selected if the selector equals to the corresponding key (only)
defaultvalue is used if no key matches the selector
Examples:
=SWITCH("a";"?") returns "?"
=SWITCH("a";"a";"1";"?") returns "1"
=SWITCH("x";"a";"1";"?") returns "?"
=SWITCH("b";"a";"1";"b";TRUE;"?") returns TRUE
=SWITCH(7;7;1;7;2;0) returns 2
=SWITCH("a";"a";"1") returns #VALUE!
To use it, open your Excel, go to Develpment tools tab, click Visual Basic, rightclick on ThisWorkbook, choose Insert, then Module, finally copy the code into the editor. You have to save as a macro-friendly Excel workbook (xlsm).
Even if old, this seems to be a popular questions, so I'll post another solution, which I think is very elegant:
http://fiveminutelessons.com/learn-microsoft-excel/using-multiple-if-statements-excel
It's elegant because it uses just the IF function. Basically, it boils down to this:
if(condition, choose/use a value from the table, if(condition, choose/use another value from the table...
And so on
Works beautifully, even better than HLOOKUP or VLOOOKUP
but... Be warned - there is a limit to the number of nested if statements excel can handle.
Microsoft replace SWITCH, IFS and IFVALUES with CHOOSE only function.
=CHOOSE($L$1,"index_1","Index_2","Index_3")
Recently I unfortunately had to work with Excel 2010 again for a while and I missed the SWITCH function a lot. I came up with the following to try to minimize my pain:
=CHOOSE(SUM((A1={"a";"b";"c"})*ROW(INDIRECT(1&":"&3))),1,2,3)
CTRL+SHIFT+ENTER
where A1 is where your condition lies (it could be a formula, whatever). The good thing is that we just have to provide the condition once (just like SWITCH) and the cases (in this example: a,b,c) and results (in this example: 1,2,3) are ordered, which makes it easy to reason about.
Here is how it works:
Cond={"c1";"c2";...;"cn"} returns a N-vector of TRUE or FALSE (with behaves like 1s and 0s)
ROW(INDIRECT(1&":"&n)) returns a N-vector of ordered numbers: 1;2;3;...;n
The multiplication of both vectors will return lots of zeros and a number (position) where the condition was matched
SUM just transforms this vector with zeros and a position into just a single number, which CHOOSE then can use
If you want to add another condition, just remember to increment the last number inside INDIRECT
If you want an ELSE case, just wrap it inside an IFERROR formula
The formula will not behave properly if you provide the same condition more than once, but I guess nobody would want to do that anyway
If your using Office 2016 or later, or Office 365, there is a new function that acts similarly to a CASE function called IFS. Here's the description of the function from Microsoft's documentation:
The IFS function checks whether one or more conditions are met, and returns a value that corresponds to the first TRUE condition. IFS can take the place of multiple nested IF statements, and is much easier to read with multiple conditions.
An example of usage follows:
=IFS(A2>89,"A",A2>79,"B",A2>69,"C",A2>59,"D",TRUE,"F")
You can even specify a default result:
To specify a default result, enter TRUE for your final logical_test argument. If none of the other conditions are met, the corresponding value will be returned.
The default result feature is included in the example shown above.
You can read more about it on Microsoft's Support Documentation

Resources