Find certain code combinations using text (sub)string in Power Query - excel

I asked a similar question before (link below); this one is an add-on to an issue I found along the way.
Find all code combinations using text string in Power Query
What I need is to extract the exact matches (or what I would call fuzzy matches in Power Query) from one string, using a substring as the lookup.
(Please ignore T1 and T2 in the screenshot and data)
As you can see, Table 3 (T3) holds the main string and T4 holds the substring, with slightly different markings (such as JH instead of JH0). That is exactly what I need: to use the substring as it is, filter the main string with it, and get the results as they appear in T5.
I tried my luck with fuzzy matching in Power Query, but the problem comes afterwards: when I have a different substring with more instances, my query fails with "column doesn't exist" and so on. It has to be dynamic.
I would like a solution in Power Query!
https://docs.google.com/spreadsheets/d/1Ji1kyV7UsD2YBRJgWUY5zisyL3ySPGwW/edit?usp=sharing&ouid=101738555398870704584&rtpof=true&sd=true

let Source = Excel.CurrentWorkbook(){[Name="Table4"]}[Content],
FindList = Text.Split(Table.ReplaceValue(Table3,",","_",Replacer.ReplaceText,{"String"})[String]{0},"_"),
FindList2 = List.Transform(FindList, each Text.Remove(_,{"0".."9"})),
Newlist=Text.Split(Source[Substring]{0},"_"),
Newlist2=Text.Combine(List.Transform(Newlist, each try FindList{List.PositionOf(FindList2,_)} otherwise "missing"),"_")
in Newlist2
What it is doing: (a) split Table3 into a list at either a , or _; (b) duplicate the list from (a) and remove all digits; (c) split Table4 into a list at each _; (d) match each value from (c) against (b). If there is a match, use that position number to pull the value from (a), otherwise put "missing"; (e) put the results back together, separated by _.
Per comments, alternate version that works for multiple matches from Table3:
Newlist2=Text.Combine(List.Transform(Newlist, each try
if List.Count(List.PositionOf(FindList2,_,Occurrence.All))=0 then "missing" else
Text.Combine( List.Transform(List.PositionOf(FindList2,_,Occurrence.All), each FindList{_}),"_") otherwise "missing"),"_")
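For readers who want to check the logic outside Power Query, steps (a) to (e) above can be modelled in a few lines of Python. This is only a sketch: the function name match_codes, the all_matches flag and the sample strings are made up for illustration and are not the data from the linked workbook.

```python
import re

def match_codes(main_string, substring, all_matches=False):
    """Python model of steps (a)-(e); names and sample data are hypothetical."""
    full = re.split(r"[,_]", main_string)                 # (a) split at , or _
    stripped = [re.sub(r"[0-9]", "", c) for c in full]    # (b) same list, digits removed
    out = []
    for key in substring.split("_"):                      # (c) split the lookup at _
        hits = [full[i] for i, s in enumerate(stripped) if s == key]  # (d) match
        if not hits:
            out.append("missing")
        elif all_matches:
            out.append("_".join(hits))                    # alternate version: every match
        else:
            out.append(hits[0])                           # first version: first match only
    return "_".join(out)                                  # (e) recombine with _

print(match_codes("JH0_AB12,CD3_JH7", "JH_CD_XX"))
print(match_codes("JH0_AB12,CD3_JH7", "JH_CD_XX", all_matches=True))
```

The first call keeps only the first match per code; the second mirrors the alternate version that returns every match.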

Related

Comparing two columns and their values and outputting the greater value

I'm trying to compare two columns ("Shows") from different tables and show, in another table, which one has the greater number ("Rating") associated with it.
Ignore the operation column above as part of the solution that I'm trying to get; it's just there to illustrate what I'm trying to compare.
Important note: if the names are duplicated, compare the matching pairs in their corresponding order (1st with 1st, 2nd with 2nd, 3rd with 3rd, etc.), as illustrated in the table below:
Thanks
You can try the following in cell F3 for an array solution that spills the entire result at once:
=LET(sA, A3:A6, rA, B3:B6, sB, C3:C6, rB, D3:D6, CNTS, LAMBDA(x,
LET(seq, SEQUENCE(ROWS(x)), MAP(seq, LAMBDA(s,ROWS(FILTER(x,(x=INDEX(x,s))
*(seq<=s))))))), cntsA, CNTS(sA), cntsB, CNTS(sB), eval, MAP(sA, rA, cntsA,
LAMBDA(s,r,c,IF(r > FILTER(rB, (sB=s) * (cntsB=c)), "Table 1", "Table 2"))),
HSTACK(sA, eval))
Here is the output:
Explanation
The main idea is to count repeated show values. We use a user LAMBDA function, CNTS, to avoid writing the same formula twice. Once we have the counts (cntsA, cntsB), we use MAP to iterate over the Table 1 elements together with their counts, and look up the matching show and count in the Table 2 columns. The FILTER function will always return a single value (based on the sample data). Finally, we prepare the output as expected using HSTACK.
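The counting idea can be sketched in Python as follows. This is a model of the logic only, not a translation of the formula: the function names and the sample shows and ratings are invented, and it assumes every (show, occurrence) pair in Table 1 also exists in Table 2.

```python
def running_counts(names):
    """For each name, how many times it has appeared so far (1-based), like CNTS."""
    seen = {}
    counts = []
    for name in names:
        seen[name] = seen.get(name, 0) + 1
        counts.append(seen[name])
    return counts

def compare(shows_a, ratings_a, shows_b, ratings_b):
    """Pair the n-th occurrence of a show in A with its n-th occurrence in B."""
    counts_a = running_counts(shows_a)
    counts_b = running_counts(shows_b)
    lookup = {(s, c): r for s, c, r in zip(shows_b, counts_b, ratings_b)}
    return [
        (s, "Table 1" if r > lookup[(s, c)] else "Table 2")
        for s, r, c in zip(shows_a, ratings_a, counts_a)
    ]

# Hypothetical sample data
print(compare(["X", "Y", "X"], [5, 3, 2], ["X", "X", "Y"], [4, 6, 3]))
```

As in the formula, a tie falls through to "Table 2" because the test is a strict greater-than.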
Try-
=IF(INDEX(FILTER($B$3:$B$6,$A$3:$A$6=G3),COUNTIFS($G$3:$G3,G3))>INDEX(FILTER($E$3:$E$6,$D$3:$D$6=G3),COUNTIFS($G$3:$G3,G3)),"Table-1","Table-2")

how to transform a table in Excel from vertical to horizontal but with different length

I would like to get Table 2 from Table 1 in a quicker way. Can anyone help? Thanks.
So far I have used pivot tables and manually copied and pasted with transpose, but this is really time consuming.
Here is a solution that uses the DROP/REDUCE/VSTACK pattern to generate each row (see for example @JvdV's answer to this question: How to split texts from dynamic range?) and a similar DROP/REDUCE/HSTACK pattern to generate the columns for a given row. In cell E2 put the following formula:
=LET(set, A2:B13, IDs, INDEX(set,,1), dates, INDEX(set,,2),
HREDUCE, LAMBDA(id, arr, REDUCE(id, arr, LAMBDA(acc, x, HSTACK(acc, x)))),
output, DROP(REDUCE("", UNIQUE(IDs), LAMBDA(ac, id, VSTACK(ac, LET(
idDates, FILTER(dates, ISNUMBER(XMATCH(IDs, id))), HREDUCE(id, idDates)
)))),1), IFERROR(VSTACK(HSTACK("ID", "Dates"), output), "")
)
and here is the output:
Update
As @JvdV pointed out in the comments section, there is a shorter way:
=LET(set, A3:B13, title, A1:B1, IDs, INDEX(set,,1), dates, INDEX(set,,2),
IFERROR(REDUCE(title, UNIQUE(IDs),LAMBDA(ac, id,
VSTACK(ac,HSTACK(id,TOROW(FILTER(dates,IDs=id)))))),"")
)
The main idea is to use the title to initialize the VSTACK accumulator (no need to use DROP), and to get all the dates for a given id at once via the FILTER function. As a side note, it can be expressed in terms of the pattern described in the Explanation section (see below), as follows:
=LET(set, A3:B13, title, A1:B1, IDs, INDEX(set,,1), dates, INDEX(set,,2),
HREDUCE, LAMBDA(id, HSTACK(id, TOROW(FILTER(dates,IDs=id)))),
IFERROR(REDUCE(title, UNIQUE(IDs),LAMBDA(ac,id, VSTACK(ac, HREDUCE(id)))),"")
)
Note: Keeping the same name of the user LAMBDA function (HREDUCE) for sake of consistency with the Explanation section, but there is no need to use REDUCE. A more appropriate name would be PIVOT_DATES.
Explanation
HREDUCE is a user LAMBDA function that implements the DROP/REDUCE/HSTACK pattern. In order to generate all the columns for a given row, this is the pattern to follow:
DROP(REDUCE("", arr, LAMBDA(acc, x, HSTACK(acc, func))),,1)
It iterates over all elements of arr (x) and uses HSTACK to concatenate column by column on each iteration. DROP function is used to remove the first column, if we don't have a valid value to initialize the first column (the accumulator, acc). The name func is just a symbolic representation of the calculation required to obtain the value to put on a given column. Usually, some variables are required to be defined, so quite often the LET function is used for that.
In our case we have a valid value to initialize the iteration process (no need to use the DROP function), so this pattern can be implemented as follows via our user LAMBDA function HREDUCE:
LAMBDA(id, arr, REDUCE(id, arr, LAMBDA(acc, x, HSTACK(acc, x))))
In our case the initialization value will be each unique id value. The func will be just each element of arr, because we don't need to do any additional calculation to obtain the column value.
The previous process can be applied for a given row, but we need to create iteratively each row. In order to do that we use a DROP/REDUCE/VSTACK pattern, which is a similar idea:
DROP(REDUCE("", arr, LAMBDA(acc, x, VSTACK(acc, func))),1)
Now we append rows via VSTACK. In this case we don't know how to initialize the accumulator (acc) properly, so we need to use DROP to remove the first row. Now func will be HREDUCE(id, idDates), i.e. the LAMBDA function we created before to generate all the date columns for a given id. We use a LET function to name the selected dates for a given id (idDates).
At the beginning of each row (first column), we are going to have the unique IDs (UNIQUE(IDs)). To find the corresponding dates for each unique ID (id) we use the following:
FILTER(dates, ISNUMBER(XMATCH(IDs, id)))
and name the result idDates.
Finally, we build the output including the header. By default, the VSTACK/HSTACK functions pad missing values with #N/A; we use the IFERROR function to replace those with the empty string.
IFERROR(VSTACK(HSTACK("ID", "Dates"), output), "")
Note: Both patterns are very useful to avoid Nested Array Error (#CALC!) usually produced by some of the new Excel array functions, such as BYROW, BYCOL, MAP when using TEXTSPLIT for example. This is one of the effective ways to overcome it.
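As a cross-check of what the formulas produce, here is a small Python sketch of the same reshaping: group the dates by ID in order of first appearance, then pad the shorter rows with empty strings. The function name pivot_dates and the sample rows are assumptions for illustration, not the data from the question.

```python
def pivot_dates(rows):
    """rows: list of (id, date) pairs; returns a padded, header-first table."""
    groups = {}
    for id_, date in rows:
        groups.setdefault(id_, []).append(date)   # like FILTER(dates, IDs=id)
    table = [["ID", "Dates"]] + [[id_] + dates for id_, dates in groups.items()]
    width = max(len(row) for row in table)
    # Pad short rows with "" (the role IFERROR(..., "") plays in the formula)
    return [row + [""] * (width - len(row)) for row in table]

# Hypothetical sample data
for row in pivot_dates([("a", "d1"), ("a", "d2"), ("b", "d3")]):
    print(row)
```

A dictionary keeps insertion order in Python, which matches UNIQUE returning the IDs in order of first appearance.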

Containing a word in string and recording the following string

I have about 1000 or so lines in my original dataset. I need to use ISNUMBER(SEARCH(substring, string)) to find whether my substring is in the column. It reports TRUE or FALSE. If it is TRUE there are up to three possible outcomes; if it is FALSE there are none.
However, instead of reporting TRUE, I would like to choose which of the three outcomes to record, and I would like to record the string itself.
I have tried using:
=isnumber(search(substring, string))
My first column is
AAA-1-2
The column I'm searching for my strings will contain...
AAA-1-2-1
AAA-1-2-2
AAA-1-2-B
I would like to pick from these...I would only like to choose AAA-1-2-1.
Thank you.
So the result would look like...
First column
AAA-1-2
Second column
AAA-1-2-1
Use VLOOKUP with a wild card:
=VLOOKUP(C1&"*",A:A,1,FALSE)
But based on your last deleted question I believe you really want:
=INDEX($A:$A,AGGREGATE(15,7,ROW($A$1:$A$3)/(ISNUMBER(SEARCH($C1,$A$1:$A$3))),COLUMN(A:A)))
To show that order does not matter:
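The lookup logic behind both formulas can be modelled roughly in Python: collect the entries that contain the key and return the k-th one in sheet order. The function name kth_match and the sample list are hypothetical; the case-insensitive test mirrors SEARCH, which ignores case.

```python
def kth_match(key, values, k=1):
    """Return the k-th value containing key (case-insensitive), or None."""
    hits = [v for v in values if key.lower() in v.lower()]
    return hits[k - 1] if 1 <= k <= len(hits) else None

# Hypothetical column contents
column_a = ["AAA-1-2-1", "AAA-1-2-2", "AAA-1-2-B"]
print(kth_match("AAA-1-2", column_a))        # first match
print(kth_match("AAA-1-2", column_a, k=3))   # third match
```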

Copying all @mentions and #hashtags from column A to Columns B and C in Excel

I have a really large database of tweets. Most of the tweets have multiple #hashtags and @mentions. I want all the #hashtags separated with a space in one column and all the @mentions in another column. I already know how to extract the first occurrence of a #hashtag and a @mention, but I don't know how to get them all. Some of the tweets have as many as 8 #hashtags. Manually going through the tweets and copy/pasting the #hashtags and @mentions seems an impossible task for over 5,000 tweets.
Here is an example of what I want. I have Column A and I want a macro that would populate columns B and C. (I'm on Windows &, Excel 2010)
Column A
-----------
Dear #DavidStern, @spurs put a quality team on the floor and should have beat the @heat. Leave #Pop alone. #Spurs a classy organization.
Live broadcast from @Nacho_xtreme: "Papelucho Radio"http://mixlr.com nachoxtreme-radio … #mixlr #pop #dance
"Since You Left" by @EmilNow now playing on KGUP 106.5FM. Listen now on http://www.kgup1065.com  #Pop #Rock
Family Night #battleofthegenerations Dad has the #Monkeys Mom has #DonnieOsman @michaelbuble for me #Dubstep for the boys#Pop for sissy
@McKinzeepowell @m0ore21 I love that the PNW and the Midwest are on the same page!! #Pop
I want Column B to look like this:
Column B
--------
#DavidStern #Pop #Spurs
#mixlr #pop #dance
#Pop #Rock
#battleofthegenerations #Monkeys #DonnieOsman #Dubstep #Pop
#pop
And Column C to look like this:
Column C:
----------
@spurs @heat
@Nacho_xtreme
@EmilNow
@michaelbuble
@McKinzeepowell @m0ore21
Consider using regular expressions.
You can use regular expressions within VBA by adding a reference to Microsoft VBScript Regular Expressions 5.5 from Tools -> References.
Here is a good starting point, with a number of useful links.
Updated
After adding a reference to the Regular Expressions library, put the following function in a VBA module:
Public Function JoinMatches(text As String, start As String)
Dim re As New RegExp, matches As MatchCollection, match As match
re.pattern = start & "\w*"
re.Global = True
Set matches = re.Execute(text)
For Each match In matches
JoinMatches = JoinMatches & " " & match.Value
Next
JoinMatches = Mid(JoinMatches, 2)
End Function
Then, in cell B1 put the following formula (for the hashtags):
=JoinMatches(A1,"#")
In cell C1 put the following formula (for the mentions):
=JoinMatches(A1,"@")
Now you can copy just the formulas all the way down.
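To try the pattern outside Excel, the same idea can be sketched in Python with the standard re module. This assumes hashtags start with # and mentions with @; the function name join_matches and the sample tweet are illustrative only.

```python
import re

def join_matches(text, start):
    """Return every token beginning with `start`, separated by single spaces."""
    # Same pattern as the VBA version: the prefix followed by word characters
    pattern = re.escape(start) + r"\w*"
    return " ".join(re.findall(pattern, text))

tweet = "Leave #Pop alone. @spurs should have beat the @heat #Rock"
print(join_matches(tweet, "#"))
print(join_matches(tweet, "@"))
```

Because the pattern is prefix plus \w*, a tag glued to the previous word (such as boys#Pop) is still picked up, which matches the expected output in the question.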
If you are not familiar with regular expressions (see @Zev-Spitz's answer), you could instead use Text to Columns with # as the "Other" delimiter, then do the same again for @, and then concatenate the rest of the text back together for column A.

Case Function Equivalent in Excel

I have an interesting challenge - I need to run a check on the following data in Excel:
| A | B | C | D |
|------|------|------|------|
| 36 | 0 | 0 | x |
| 0 | 600 | 700 | x |
|___________________________|
You'll have to excuse my wonderfully bad ASCII art. So I need the D column (x) to run a check against the adjacent cells, then convert the values if necessary. Here's the criteria:
If column B is greater than 0, everything works great and I can get coffee. If it doesn't meet that requirement, then I need to convert A1 according to a table - for example, 32 = 1420 and place into D. Unfortunately, there is no relationship between A and what it needs to convert to, so creating a calculation is out of the question.
A case or switch statement would be perfect in this scenario, but I don't think it is a native function in Excel. I also think it would be kind of crazy to chain a bunch of =IF() statements together, which I did about four times before deciding it was a bad idea (story of my life).
Sounds like a job for VLOOKUP!
You can put your 32 -> 1420 type mappings in a couple of columns somewhere, then use the VLOOKUP function to perform the lookup.
Without reference to the original problem (which I suspect is long since solved), I very recently discovered a neat trick that makes the Choose function work exactly like a select case statement without any need to modify data. There's only one catch: only one of your choose conditions can be true at any one time.
The syntax is as follows:
CHOOSE(
(1 * (CONDITION_1)) + (2 * (CONDITION_2)) + ... + (N * (CONDITION_N)),
RESULT_1, RESULT_2, ... , RESULT_N
)
On the assumption that only one of the conditions 1 to N will be true, everything else is 0, meaning the numeric value will correspond to the appropriate result.
If you are not 100% certain that all conditions are mutually exclusive, you might prefer something like:
CHOOSE(
(1 * TEST1) + (2 * TEST2) + (4 * TEST3) + (8 * TEST4) + ... + (2^(N-1) * TESTN),
OUT1, OUT2, , OUT3, , , , OUT4 , , <LOTS OF COMMAS> , OUT5
)
That said, if Excel has an upper limit on the number of arguments a function can take, you'd hit it pretty quickly.
Honestly, can't believe it's taken me years to work it out, but I haven't seen it before, so figured I'd leave it here to help others.
EDIT: Per comment below from @aTrusty:
Silly numbers of commas can be eliminated (and as a result, the choose statement would work for up to 254 cases) by using a formula of the following form:
CHOOSE(
1 + LOG(1 + (2*TEST1) + (4*TEST2) + (8*TEST3) + (16*TEST4),2),
OTHERWISE, RESULT1, RESULT2, RESULT3, RESULT4
)
Note the second argument to the LOG clause, which puts it in base 2 and makes the whole thing work.
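The arithmetic behind both CHOOSE tricks can be checked with a short Python model. It is a sketch under two assumptions: choose() imitates Excel's CHOOSE by truncating a fractional index, and the helper names (case_exclusive, case_with_default) are invented here.

```python
import math

def choose(index, *values):
    """Model of CHOOSE: 1-based, fractional index truncated."""
    i = int(index)
    if not 1 <= i <= len(values):
        raise ValueError("#VALUE!")
    return values[i - 1]

def case_exclusive(conditions, results):
    # (1 * CONDITION_1) + (2 * CONDITION_2) + ... ; exactly one may be true
    index = sum((i + 1) * bool(c) for i, c in enumerate(conditions))
    return choose(index, *results)

def case_with_default(conditions, default, results):
    # 1 + LOG(1 + 2*TEST1 + 4*TEST2 + 8*TEST3 + ..., 2)
    weighted = sum(2 ** (i + 1) * bool(c) for i, c in enumerate(conditions))
    return choose(1 + math.log2(1 + weighted), default, *results)

print(case_exclusive([False, True, False], ["a", "b", "c"]))
print(case_with_default([False, False, False], "else", ["a", "b", "c"]))
print(case_with_default([False, False, True], "else", ["a", "b", "c"]))
```

With no test true the LOG term is 0, so the index is 1 and the OTHERWISE slot is chosen; with test k true the index lands between k+1 and k+2 and truncates to k+1.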
Edit: Per David's answer, there's now an actual switch statement if you're lucky enough to be working on office 2016. Aside from difficulty in reading, this also means you get the efficiency of switch, not just the behaviour!
The Switch function is now available, in Excel 2016 / Office 365
SWITCH(expression, value1, result1, [default or value2, result2],…[default or value3, result3])
example:
=SWITCH(A1,0,"FALSE",-1,"TRUE","Maybe")
Microsoft -Office Support
Note: MS has updated that page to only document the behavior of Excel 2019. Eventually, they will probably remove references to 2019 as well... To see what the page looked like in 2016, use the wayback machine:
https://web.archive.org/web/20161010180642/https://support.office.com/en-us/article/SWITCH-function-47ab33c0-28ce-4530-8a45-d532ec4aa25e
Try this;
=IF(B1>=0, B1, OFFSET($X$1, MATCH(B1, $X:$X, Z) - 1, Y))
WHERE
X = The columns you are indexing into
Y = The number of columns to the left (-Y) or right (Y) of the indexed column to get the value you are looking for
Z = 0 if exact-match (if you want to handle errors)
I used this solution to convert single letter color codes into their descriptions:
=CHOOSE(FIND(H5,"GYR"),"Good","OK","Bad")
You basically look up the element you're trying to decode in the array, then use CHOOSE() to pick the associated item. It's a little more compact than building a table for VLOOKUP().
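The same decode step looks like this as a Python sketch. The function name decode is made up; the keys and labels are the ones from the formula above. Like FIND, str.find is case-sensitive.

```python
def decode(code, keys="GYR", labels=("Good", "OK", "Bad")):
    """Find the 1-based position of code in keys, then pick that label."""
    position = keys.find(code) + 1   # str.find is 0-based, FIND is 1-based
    if position == 0:
        raise ValueError("#VALUE!")  # FIND fails when the code is absent
    return labels[position - 1]

print(decode("Y"))
```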
I know it's a little late to answer, but I think this short video will help you a lot.
http://www.xlninja.com/2012/07/25/excel-choose-function-explained/
Essentially it uses the CHOOSE function. He explains it very well in the video, so I'll let him do it instead of typing 20 pages.
Another video of his explains how to use data validation to populate a drop down which you can select from a limited range.
http://www.xlninja.com/2012/08/13/excel-data-validation-using-dependent-lists/
You could combine the two and use the value in the drop down as your index to the choose function. While he did not show how to combine them, I'm sure you could figure it out as his videos are good. If you have trouble, let me know and I'll update my answer to show you.
I understand that this is a response to an old post.
I like the If() function combined with Index()/Match():
=IF(B2>0,"x",INDEX($H$2:$I$9,MATCH(A2,$H$2:$H$9,0),2))
The IF function checks what is in column B: if it is greater than 0, it returns x; if not, it uses the array (table of information) identified by the INDEX() function, with the row selected by MATCH(), to return the value that A corresponds to.
The INDEX array has its location set as absolute, $H$2:$I$9 (the dollar signs), so that the place it points to will not change as the formula is copied. The row with the value that you want returned is identified by the MATCH() function. With an exact match, MATCH() has the added benefit of not needing a sorted list to look through, which VLOOKUP() requires for approximate matches. The last argument of MATCH() sets the match type: 1 for less than, 0 for exact, -1 for greater than. I put a zero after the absolute MATCH() array $H$2:$H$9 to find the exact match. The last argument of INDEX() is the column of the array whose value should be returned. I entered a 2 because in my array the return value was in the second column. My index array looked like this:
32 1420
36 1650
40 1790
44 1860
55 2010
The value in your 'a' column to search for in the list is in the first column in my example, and the corresponding value that is to be returned is to its right. The lookup/reference table can be on any tab in the workbook, or even in another file. In the formula below, Book2 is the file name and Sheet2 is the 'other tab' name.
=IF(B2>0,"x",INDEX([Book2]Sheet2!$A$1:$B$8,MATCH(A2,[Book2]Sheet2!$A$1:$A$8,0),2))
If you do not want x returned when the value of b is greater than zero, delete the x for a 'blank'/null equivalent, or maybe put a 0; I'm not sure what you would want there.
Below is beginning of the function with the x deleted.
=IF(B2>0,"",INDEX...
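In Python terms, the IF plus INDEX/MATCH combination is a guarded dictionary lookup. The sketch below uses the mapping rows listed above; the function name column_d is hypothetical, and returning None stands in for the #N/A that MATCH gives when no exact match exists.

```python
# Lookup table taken from the rows shown above
CONVERSION = {32: 1420, 36: 1650, 40: 1790, 44: 1860, 55: 2010}

def column_d(a, b):
    """Return "x" when B is positive, otherwise the converted value of A."""
    if b > 0:
        return "x"
    return CONVERSION.get(a)   # None plays the role of #N/A

print(column_d(36, 0))
print(column_d(0, 600))
```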
If you don't have a SWITCH statement in your Excel version (pre-Excel-2016), here's a VBA implementation for it:
Public Function SWITCH(ParamArray args() As Variant) As Variant
Dim i As Integer
Dim val As Variant
Dim tmp As Variant
If ((UBound(args) - LBound(args)) = 0) Or (((UBound(args) - LBound(args)) Mod 2 = 0)) Then
Error 450 'Invalid arguments
Else
val = args(LBound(args))
i = LBound(args) + 1
tmp = args(UBound(args))
While (i < UBound(args))
If val = args(i) Then
tmp = args(i + 1)
End If
i = i + 2
Wend
End If
SWITCH = tmp
End Function
It works exactly as expected and can serve as a drop-in replacement for, e.g., Google Spreadsheet's SWITCH function.
Syntax:
=SWITCH(selector; [keyN; valueN;] ... defaultvalue)
where
selector is any expression that is compared to keys
key1, key2, ... are expressions that are compared to the selector
value1, value2, ... are values that are selected if the selector equals to the corresponding key (only)
defaultvalue is used if no key matches the selector
Examples:
=SWITCH("a";"?") returns "?"
=SWITCH("a";"a";"1";"?") returns "1"
=SWITCH("x";"a";"1";"?") returns "?"
=SWITCH("b";"a";"1";"b";TRUE;"?") returns TRUE
=SWITCH(7;7;1;7;2;0) returns 2
=SWITCH("a";"a";"1") returns #VALUE!
To use it, open Excel, go to the Developer tab, click Visual Basic, right-click ThisWorkbook, choose Insert, then Module, and finally copy the code into the editor. You have to save the file as a macro-enabled Excel workbook (.xlsm).
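The control flow of the VBA function can be modelled in Python to check the examples listed above. This is a sketch of the same logic, not a replacement for the macro: it raises ValueError where the VBA code raises error 450, and it keeps the VBA behaviour that the last matching key wins.

```python
def switch(*args):
    """args = selector, key1, value1, ..., keyN, valueN, default."""
    if len(args) < 2 or len(args) % 2 != 0:
        raise ValueError("invalid arguments")   # error 450 in the VBA version
    selector, default = args[0], args[-1]
    result = default
    # Walk the key/value pairs between the selector and the default
    for key, value in zip(args[1:-1:2], args[2:-1:2]):
        if selector == key:
            result = value                      # no early exit: last match wins
    return result

print(switch("a", "?"))
print(switch("b", "a", "1", "b", True, "?"))
print(switch(7, 7, 1, 7, 2, 0))
```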
Even if it's old, this seems to be a popular question, so I'll post another solution, which I think is very elegant:
http://fiveminutelessons.com/learn-microsoft-excel/using-multiple-if-statements-excel
It's elegant because it uses just the IF function. Basically, it boils down to this:
if(condition, choose/use a value from the table, if(condition, choose/use another value from the table...
And so on
Works beautifully, even better than HLOOKUP or VLOOKUP,
but be warned: there is a limit to the number of nested IF statements Excel can handle.
You can replace SWITCH, IFS and IFVALUES with the CHOOSE function alone.
=CHOOSE($L$1,"index_1","Index_2","Index_3")
Recently I unfortunately had to work with Excel 2010 again for a while and I missed the SWITCH function a lot. I came up with the following to try to minimize my pain:
=CHOOSE(SUM((A1={"a";"b";"c"})*ROW(INDIRECT(1&":"&3))),1,2,3)
CTRL+SHIFT+ENTER
where A1 is where your condition lies (it could be a formula, whatever). The good thing is that we just have to provide the condition once (just like SWITCH) and the cases (in this example: a,b,c) and results (in this example: 1,2,3) are ordered, which makes it easy to reason about.
Here is how it works:
Cond={"c1";"c2";...;"cn"} returns an N-vector of TRUE or FALSE values (which behave like 1s and 0s)
ROW(INDIRECT(1&":"&n)) returns a N-vector of ordered numbers: 1;2;3;...;n
The multiplication of both vectors will return lots of zeros and a number (position) where the condition was matched
SUM just transforms this vector with zeros and a position into just a single number, which CHOOSE then can use
If you want to add another condition, just remember to increment the last number inside INDIRECT
If you want an ELSE case, just wrap it inside an IFERROR formula
The formula will not behave properly if you provide the same condition more than once, but I guess nobody would want to do that anyway
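The steps listed above can be traced in Python. This is a simplified model: the function name switch_via_choose is invented, string comparison here is case-sensitive whereas Excel's = operator is not, and a missing match raises LookupError where the Excel formula would return an error for IFERROR to catch.

```python
def switch_via_choose(value, cases, results):
    flags = [value == c for c in cases]          # A1={"a";"b";"c"}
    positions = range(1, len(cases) + 1)         # ROW(INDIRECT(1&":"&n))
    index = sum(f * p for f, p in zip(flags, positions))   # SUM of the products
    if not 1 <= index <= len(results):
        raise LookupError("no single matching case")
    return results[index - 1]                    # CHOOSE(index, ...)

print(switch_via_choose("b", ["a", "b", "c"], [1, 2, 3]))
```

Listing the same case twice adds two positions together, which is why the formula misbehaves with duplicated conditions.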
If you're using Office 2016 or later, or Office 365, there is a new function called IFS that acts similarly to a CASE function. Here's the description of the function from Microsoft's documentation:
The IFS function checks whether one or more conditions are met, and returns a value that corresponds to the first TRUE condition. IFS can take the place of multiple nested IF statements, and is much easier to read with multiple conditions.
An example of usage follows:
=IFS(A2>89,"A",A2>79,"B",A2>69,"C",A2>59,"D",TRUE,"F")
You can even specify a default result:
To specify a default result, enter TRUE for your final logical_test argument. If none of the other conditions are met, the corresponding value will be returned.
The default result feature is included in the example shown above.
You can read more about it on Microsoft's Support Documentation
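For comparison, the first-TRUE-wins behaviour described in the quote can be sketched in Python. The helper names ifs and grade are illustrative; the thresholds are the ones from the example formula.

```python
def ifs(*pairs):
    """pairs = test1, value1, test2, value2, ...; return the first value whose test is true."""
    for test, value in zip(pairs[0::2], pairs[1::2]):
        if test:
            return value
    raise LookupError("#N/A")   # no condition was met and no TRUE default given

def grade(score):
    # Same thresholds as the example formula; the final True acts as the default
    return ifs(score > 89, "A", score > 79, "B", score > 69, "C",
               score > 59, "D", True, "F")

print(grade(95), grade(75), grade(10))
```

One difference worth noting: Python evaluates every test before calling ifs, so this models the result, not the evaluation order.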
