Retrieving several matches from a string in Excel

Sorry if this is a stupid question, but I've been racking my brain for a couple of days now and I can't seem to come up with a solution to this.
I have a list of phrases and a list of keywords that need to be searched, extracted and replaced.
For example, I have the following list of keywords in sheet 1 column A that need to be extracted and replaced with the keywords in column B.
red - orange
blue - violet
green - pink
yellow - brown
And in sheet 2 I have a list of phrases in column A.
The girl with blue eyes had a red scarf.
I saw a yellow flower.
My cousin has a red car with blue rims and green mirrors.
And I want to extract in column B the keywords that are matched for every phrase in the exact order that they appear like so:
COLUMN A | COLUMN B
The girl with blue eyes had a red scarf. | violet, orange
I saw a yellow flower. | brown
My cousin has a red car with blue rims and green mirrors. | orange, violet, pink
Is there any way this can be achieved, either by formula or VBA? Also, this needs to be usable with Excel 2016, so I can't use newer functions like "TEXTJOIN".
Thank you everyone in advance!
Cheers!
Later edit:
I was able to find some code that almost does what I need, but it does not keep the matches in the correct order.
Is there any way it could be modified to generate the desired results? Unfortunately I'm not that good with VBA. :(
Sub test()
    Dim datacount As Long
    Dim termcount As Long
    datacount = Sheets("Sheet1").Cells(Rows.Count, "A").End(xlUp).Row
    termcount = Sheets("Sheet2").Cells(Rows.Count, "A").End(xlUp).Row
    For i = 1 To datacount
        dataa = Sheets("Sheet1").Cells(i, "A").Text
        result = ""
        For j = 1 To termcount
            terma = Sheets("Sheet2").Cells(j, "A").Text
            termb = Sheets("Sheet2").Cells(j, "B").Text
            If InStr(dataa, terma) > 0 Then
                If result = "" Then
                    result = result & termb
                Else
                    result = result & ", " & termb
                End If
            End If
        Next j
        Sheets("Sheet1").Cells(i, "B").Value = result
    Next i
End Sub

You can do this with a User Defined Function making use of Regular Expressions.
The worksheet formula:
=matchWords(A2,$K$2:$L$5)
where A2 contains the sentence, and the second argument points to the translation table (which could be on another worksheet).
The code
Option Explicit
Function matchWords(ByVal s As String, translTbl As Range) As String
    Dim RE As Object, MC As Object, M As Object
    Dim AL As Object 'collects the replacement words in order of appearance
    Dim TT As Variant
    Dim I As Long

    'read the translation table into a 2-D array
    TT = translTbl.Value
    'initiate list for output (ArrayList may require the .NET Framework 3.5 feature to be installed)
    Set AL = CreateObject("System.Collections.ArrayList")
    'initiate regular expression engine
    Set RE = CreateObject("VBScript.RegExp")
    With RE
        .Global = True
        .IgnoreCase = True 'could change this if you want
        .Pattern = "\w+" 'can change this if need to include some non letter/digit items
        'split the sentence into words, excluding punctuation
        If .Test(s) Then
            Set MC = .Execute(s)
            For Each M In MC
                For I = 1 To UBound(TT, 1)
                    'vbTextCompare ignores case; use vbBinaryCompare for an exact-case match
                    If StrComp(M.Value, TT(I, 1), vbTextCompare) = 0 Then AL.Add TT(I, 2)
                Next I
            Next M
        End If
    End With
    matchWords = Join(AL.ToArray, ", ")
End Function
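The idea behind the UDF (walk through the words of the sentence in order and look each one up in the translation table) can be sketched outside Excel as well. The following is only an illustration in Python, not the answer's VBA: the name match_words and the sample table are assumptions, and it uses a dictionary lookup instead of the nested loop, so a keyword listed twice in the table would map to its last entry only.

```python
import re

def match_words(sentence, translation):
    """Return the replacements for keywords found in `sentence`, in order of appearance."""
    # Hypothetical helper: `translation` is a list of (old, new) pairs.
    lookup = {old.lower(): new for old, new in translation}
    # \w+ splits the sentence into words and drops punctuation, as in the UDF.
    words = re.findall(r"\w+", sentence)
    return ", ".join(lookup[w.lower()] for w in words if w.lower() in lookup)

# Sample table taken from the question.
table = [("red", "orange"), ("blue", "violet"), ("green", "pink"), ("yellow", "brown")]
print(match_words("The girl with blue eyes had a red scarf.", table))  # violet, orange
```

Because the sentence is scanned word by word, the output order follows the sentence rather than the order of the keyword list, which is what the original macro got wrong.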

I would suggest you use Power Query, which has been built into Excel since Excel 2016 (and is available as an add-in for Excel 2010 and 2013).
Suppose the colour keywords on your Sheet1 are in a Table named Tbl_LookUp, with columns Old Text and New Text.
Suppose the phrases on your Sheet2 are in another Table named Tbl_Phrases, with a column Phrases.
Go to the Data tab of your Excel and load both tables to the Power Query Editor (you can google how to load data from a table to the PQ Editor in Excel 2016). Please note the screenshot is from Excel 365.
Once loaded, go to the Tbl_Phrases query, and action the following steps:
1. Add an index column starting from 1.
2. Split the Phrases column by delimiter, using a space as the delimiter, and choose to put the outcome into rows.
3. Merge the current query with the Tbl_LookUp query, matching the Phrases column to the Old Text column.
4. Expand the new column to show the contents of the New Text column.
5. Group the rows by the Index column. You can choose to sum the values in the New Text column, which will come up as an error after the grouping. Go to the formula bar and replace List.Sum([New Text]) with Text.Combine([New Text],", "). Press Enter and the error will be replaced by the desired text string.
The following is the full M Code for the above query. You can copy and paste it in the Advanced Editor without manually going through each step:
let
Source = Excel.CurrentWorkbook(){[Name="Tbl_Phrases"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Phrases", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Added Index", {{"Phrases", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Phrases"),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Phrases", type text}}),
#"Merged Queries" = Table.NestedJoin(#"Changed Type1", {"Phrases"}, Tbl_LookUp, {"Old Text"}, "Tbl_Replace", JoinKind.LeftOuter),
#"Expanded Tbl_Replace" = Table.ExpandTableColumn(#"Merged Queries", "Tbl_Replace", {"New Text"}, {"New Text"}),
#"Grouped Rows" = Table.Group(#"Expanded Tbl_Replace", {"Index"}, {{"Look up color", each Text.Combine([New Text],", "), type nullable text}})
in
#"Grouped Rows"
When you finish adding an index column in the Tbl_Phrases query (Step 1 above), you can make a copy of the query (right-click the original query and select "Duplicate"), which gives you a second query called Tbl_Phrases (2). There is no need to work on this query until you have finished editing the original query so that it ends with the desired text strings.
Then you can merge the Tbl_Phrases (2) query with the Tbl_Phrases query using the Index column. Expand the new column to show the content of the Look up color column. Lastly, merge the Phrases column with the Look up color column using the delimiter (space)-(space), and remove the Index column; you should then have the desired text string.
Here is the M Code for the Tbl_Phrases (2) query. Just a reminder, you must finish with the Tbl_Phrases query first otherwise the merging query step will lead to an error:
let
Source = Excel.CurrentWorkbook(){[Name="Tbl_Phrases"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Phrases", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
#"Merged Queries" = Table.NestedJoin(#"Added Index", {"Index"}, Tbl_Phrases, {"Index"}, "Tbl_Phrases", JoinKind.LeftOuter),
#"Expanded Tbl_Phrases" = Table.ExpandTableColumn(#"Merged Queries", "Tbl_Phrases", {"Look up color"}, {"Look up color"}),
#"Merged Columns" = Table.CombineColumns(#"Expanded Tbl_Phrases",{"Phrases", "Look up color"},Combiner.CombineTextByDelimiter(" - ", QuoteStyle.None),"Merged"),
#"Removed Columns" = Table.RemoveColumns(#"Merged Columns",{"Index"})
in
#"Removed Columns"
You can then load the Tbl_Phrases (2) query to the desired worksheet within the same workbook (or to somewhere on Sheet2).
Let me know if you have any questions.

Related

word patterns within an excel column

I have 2 Excel data sets each comprising a column of word patterns and have been searching for a way to copy and group all instances of repetition within these columns into a new column.
This is the closest result I could find so far:
Sub Common5bis()
    Dim Joined
    Set d = CreateObject("Scripting.Dictionary") 'make dictionary
    d.CompareMode = 1 'not case sensitive
    a = Range("A1", Range("A" & Rows.Count).End(xlUp)).Value 'data to array
    For i = 1 To UBound(a) 'loop through all records
        If Len(a(i, 1)) >= 5 Then 'length at least 5
            For l = 1 To Len(a(i, 1)) - 4 'all 5-character strings within the record
                s = Mid(a(i, 1), l, 5) 'that string
                d(s) = d(s) + 1 'increment
            Next
        End If
    Next
    Joined = Application.Index(Array(d.Keys, d.items), 0, 0) 'join the keys and the items
    With Range("D1").Resize(UBound(Joined, 2), 2) 'export range
        .EntireColumn.ClearContents 'clear previous
        .Value = Application.Transpose(Joined) 'write to sheet
        .Sort .Range("B1"), xlDescending, Header:=xlNo 'sort descending
    End With
End Sub
Which yielded this result for the particular question:
This example achieves 4 of the things I'm trying to achieve:
Identify repeating strings within a single column
Copies these strings into a separate column
Displays results in order of occurrence (in this case from least to most)
Displays the quantity of repetitions (including the first instance) in an adjacent column
However, although from reading the code there are basic things I've figured out that I can adapt to my purposes, it still fails to achieve these essential tasks which I'm still trying to figure out:
Identify individual words rather than single characters
I could possibly reduce the size from 5 to 3, but for the word strings I have (lists of pronouns from larger texts) that would include "I I" repetitions but wouldn't be so great for "Your You" etc., whilst at least 4 or 5 would miss anything starting with "I I"
Include an indefinite number of values - looking at the code and the replies to the forum it comes from it looks like it's capped at 5, but I'm trying to find a way to identify all repetitions for all multiple word strings which could be something like "I I my you You Me I You my"
Be case sensitive - this is quite important as some words in the column have been capitalised to differentiate different uses
I'm still learning the basics of VBA but have manually typed out this example of what I'm trying to do with the code I've found above:
Intended outcome:
And so on
I'm a bit screwed at this point which is why I'm reaching out here (sorry if this is a stupid question, I'm brand new to VBA as my work almost never needs Excel, let alone macros) so will massively appreciate any constructive advice towards a solution!
Because I've been working with it recently, I note that you can obtain your desired output using Power Query, available in Windows Excel 2010+ and Office 365 Excel
Select some cell in your original table
Data => Get&Transform => From Table/Range or From within sheet
When the PQ UI opens, navigate to Home => Advanced Editor
Make note of the Table Name in Line 2 of the code.
Replace the existing code with the M-Code below
Change the table name in line 2 of the pasted code to your "real" table name
Examine any comments, and also the Applied Steps window, to better understand the algorithm and steps
First add a custom function:
New blank query
Rename per the code comment
Edits to make case-insensitive
Custom Function
//rename fnPatterns
//generate all possible patterns of two words or more
(String as text)=>
let
//split text string into individual words & get the count of words
#"Split Words" = List.Buffer(Text.Split(String," ")),
wordCount = List.Count(#"Split Words"),
//start position for each number of words
starts = List.Numbers(0, wordCount-1),
//number of words for each pattern (minimum of two (2) words in a pattern
words = List.Reverse(List.Numbers(2, wordCount-1)),
//generate patterns as index into the List and number of words
// will be used in the List.Range function
patterns = List.Combine(List.Generate(
()=>[r={{0,wordCount}}, idx=0],
each [idx] < wordCount-1,
each [r=List.Transform({0..starts{[idx]+1}}, (li)=> {li, wordCount-[idx]-1}),
idx=[idx]+1],
each [r]
)),
//Generate a list of all the patterns by using the List.Range function
wordPatterns = List.Distinct(List.Accumulate(patterns, {}, (state, current)=>
state & {List.Range(#"Split Words", current{0}, current{1})}), Comparer.OrdinalIgnoreCase)
in
wordPatterns
Main Function
let
//change next line to reflect data source
//if data has a column name other than "Column1", that will need to be changed also wherever referenced
Source = Excel.CurrentWorkbook(){[Name="Table17"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
//Create a list of all the possible patterns for each string, added as a custom column
#"Invoked Custom Function" = Table.AddColumn(#"Changed Type", "Patterns", each fnPatterns([Column1]), type list),
//removed unneeded original column of strings
#"Removed Columns" = Table.RemoveColumns(#"Invoked Custom Function",{"Column1"}),
//Expand the column of lists of lists into a column of lists
#"Expanded Patterns" = Table.ExpandListColumn(#"Removed Columns", "Patterns"),
//convert all lists to lower case for text-insensitive comparison
#"Added Custom" = Table.AddColumn(#"Expanded Patterns", "lower case patterns",
each List.Transform([Patterns], each Text.Lower(_))),
//Count number of matches for each pattern
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Count", each List.Count(List.Select(#"Added Custom"[lower case patterns], (li)=> li = [lower case patterns])), Int64.Type),
//Filter for matches of more than one (1)
// then remove duplicate patterns based on the "lower case pattern" column
#"Filtered Rows" = Table.SelectRows(#"Added Custom1", each ([Count] > 1)),
#"Removed Duplicates" = Table.Distinct(#"Filtered Rows", {"lower case patterns"}),
//Remove lower case pattern column and sort by count descending
#"Removed Columns1" = Table.RemoveColumns(#"Removed Duplicates",{"lower case patterns"}),
#"Sorted Rows" = Table.Sort(#"Removed Columns1",{{"Count", Order.Descending}}),
//Re-construct original patterns as text
#"Extracted Values" = Table.TransformColumns(#"Sorted Rows",
{"Patterns", each Text.Combine(List.Transform(_, Text.From), " "), type text})
in
#"Extracted Values"
Note that you could readily implement a similar algorithm using VBA, the VBA.Split function and a Dictionary
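As a rough illustration of that split-and-dictionary approach, here is a minimal sketch in Python rather than VBA. The function names and the sample rows are assumptions made for the example. Unlike the M code above, this version is case sensitive, which is what the question asked for. Each pattern is counted once per row, so the count is the number of rows that contain it.

```python
from collections import Counter

def word_patterns(text, min_words=2):
    """Return the set of all runs of at least `min_words` consecutive words in `text`."""
    words = text.split()
    patterns = set()
    for size in range(min_words, len(words) + 1):
        for start in range(len(words) - size + 1):
            patterns.add(" ".join(words[start:start + size]))
    return patterns

def repeated_patterns(rows, min_words=2):
    """Return (pattern, row count) pairs for patterns found in more than one row."""
    counts = Counter()
    for row in rows:
        # A set is passed in, so each pattern is counted at most once per row.
        counts.update(word_patterns(row, min_words))
    return [(pattern, n) for pattern, n in counts.most_common() if n > 1]

# Hypothetical sample rows, not taken from the question's data.
rows = ["I I my you", "I my You", "you I my"]
print(repeated_patterns(rows))
```

With these sample rows only "I my" repeats; "my you" and "my You" stay separate because the comparison is case sensitive.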

Increment difference between cells

I'm trying to duplicate data in a sheet with increments of 12 between each cell from a sheet with 1 cell per row. Between the 12-incremented rows there's other data. This means I can't drag to extend the formula. Like this for customer numbers:
'SheetA'E3 = 'SheetB'Y2
'SheetA'E15 = 'SheetB'Y3
'SheetA'E27 = 'SheetB'Y4
..and so on. I've tried extending 12/24 cells at a time and copying but I can't make it work. Extending doesn't add +1 to one sheet, just +12/+24 to both. Doing this manually will take months. Can this be done without a VBA solution?
Any suggestions? I'm sorry if my terminology isn't on point here.
SheetA:
Try this (run as VBA code):
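Sub test1()
    Dim i01 As Long
    'copy SheetB!Y2, Y3, ... into SheetA!E3, E15, E27, ... (every 12th row)
    For i01 = 0 To 100
        Worksheets("SheetA").Cells(3 + 12 * i01, 5).Value = Worksheets("SheetB").Cells(2 + i01, 25).Value
    Next i01
End Sub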
Power Query, available in Windows Excel 2010+ and Office 365, can produce your SheetA given SheetB. Not sure about the effect of the variability you mention.
The query assumes that the correct parameters are listed as column headers in Sheet B. The column headers will get copied over as parameters to sheet A.
To use Power Query:
Select some cell in your Data Table
Data => Get&Transform => from Table/Range
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
M Code
let
//Read in the data
//Change table name in next line to be the "real" table name
Source = Excel.CurrentWorkbook(){[Name="Table12"]}[Content],
//set data types based on first entry in the column
//will be independent of the column names
typeIt = Table.TransformColumnTypes(Source,
List.Transform(
Table.ColumnNames(Source), each
{_,Value.Type(Table.Column(Source,_){0})})
),
//UNpivot except for the c.number and c.name columns to create the Parameter and Level columns
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(typeIt, {"C. number", "C. name"}, "Parameter", "Level"),
//Group By C.Number
//Add the appropriate rows for each customer
//And a blank row to separate the customers
#"Grouped Rows" = Table.Group(#"Unpivoted Other Columns", {"C. number"}, {
{"All", each _, type table [C. number=nullable number, C. name=nullable text, Parameter=text, Level=any]},
{"custLabel", (t)=> Table.InsertRows(t,0,{
[C. number = null, C. name=null,Parameter = null, Level = null],
[C. number = null, C. name=null, Parameter = "Customer Number", Level="Customer Name"],
[C. number = null, C. name=null,Parameter = t[C. number]{0}, Level = t[C. name]{0}],
[C. number = null, C. name=null,Parameter = "Parameter", Level = "Level"]
})}
}),
//Remove the unneeded columns and expand the remaining table
#"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"C. number", "All"}),
#"Expanded custLabel" = Table.ExpandTableColumn(#"Removed Columns", "custLabel", {"Parameter", "Level"}, {"Parameter", "Level"}),
//Remove the top blank row
//promote the new blank row to the Header location
#"Removed Top Rows" = Table.Skip(#"Expanded custLabel",1),
#"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows", [PromoteAllScalars=true]),
//data type set to text since it will look better on the report
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Customer Number", type text}, {"Customer Name", type text}})
in
#"Changed Type"
Data
Results
[ Indirect with row() ]
Assuming 'SheetA'E3 column is the target and 'SheetB'Y2 is the source data.
In SheetA!E3 cell put:
=INDIRECT("SheetB!Y"&(((ROW()-3)/12)+2))
Press Enter
Then select the SheetA!E3 cell and copy it. Then paste it into SheetA!E15. The formula will update itself.
Idea :
Find the relation between the target cell row number (b) and the source cell row number (a). [ b -> a : 3 -> 2, 15 -> 3, 27 -> 4 ] leads to a = (b-3)/12 + 2. (The math is like finding the equation of a straight line from three points.) Then use INDIRECT() to combine the calculated row number with the column address.
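The row arithmetic can be checked outside Excel. The snippet below is a small sketch in Python (the function name and its default parameters are assumptions for illustration); it applies a = (b-3)/12 + 2 and rejects target rows that are not on the 12-row grid, where the division would give a fractional row number.

```python
def source_row(target_row, first_target=3, step=12, first_source=2):
    """Map a SheetA row (3, 15, 27, ...) to the matching SheetB row (2, 3, 4, ...)."""
    offset = target_row - first_target
    if offset < 0 or offset % step != 0:
        # Rows off the grid would produce a fractional source row.
        raise ValueError(f"row {target_row} is not one of the rows spaced {step} apart")
    return offset // step + first_source

for b in (3, 15, 27):
    print(f"SheetA!E{b} = SheetB!Y{source_row(b)}")
```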

Searching and returning for "*1*" in a string returns instances containing "*11*" as well in excel

I am attempting to extract cells through a combination of INDEX(MATCH) and RIGHT(LEN)-FIND() functions from an array of data with text. In my formula, I am searching for instances of "*DS#1*"; Excel returns those but also returns instances with "*DS#11*". How do I get Excel to return only DS#1?
I have attempted to use an IF statement with no success: IF(formula="*11*","",formula).
Below is a link to an example of the data. The first cell highlighted in yellow should not be returning that text, it should be "". The second cell highlighted in yellow is appropriate to return that data.
example data
=RIGHT(INDEX($V:$AC,MATCH("DS#1",$AC:$AC,0),1),LEN(INDEX($V:$AC,MATCH(FW$1,$AC:$AC,0),1))-FIND($AG2,INDEX($V:$AC,MATCH(FW$1,$AC:$AC,0),1))+1)
Here is an example of how to find a value and check the character that follows it.
Formula in D2:
=INDEX(A2:A6,MATCH(1,INDEX((ISNUMBER(SEARCH("DS#1",B2:B6)))*(NOT(ISNUMBER(MID(B2:B6,SEARCH(C2,B2:B6)+LEN(C2),1)*1))),0),0))
Here is a formula you can adapt to your ranges which will return a list from the range rngDS that contain findDS. I used named ranges, but you can adapt them to your own ranges.
Not sure if this is what you want since you chose to not post examples of your data or desired results.
The routine finds the findDS string and then checks to be sure that the following character is non-numeric.
C1: =IFERROR(INDEX(rngDS,AGGREGATE(15,6,1/(NOT(ISNUMBER(-MID(rngDS,SEARCH(findDS,rngDS)+LEN(findDS),1))+ISERROR(MID(rngDS,SEARCH(findDS,rngDS)+LEN(findDS),1))))*ROW(rngDS),ROWS($1:1))),"")
and fill down
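The rule these formulas apply (the code must not be followed by another digit) is easy to state as a regular expression. The following is a sketch in Python, not something Excel 2016 formulas can run directly; the function name is an assumption. It ignores case, like SEARCH. One difference to be aware of: the formulas only inspect the first occurrence of the code in a cell, while this check accepts a cell if any occurrence qualifies.

```python
import re

def contains_code(text, code):
    """True if `code` occurs in `text` and is not immediately followed by a digit."""
    # (?!\d) is a negative lookahead: the next character must not be a digit.
    pattern = re.escape(code) + r"(?!\d)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

print(contains_code("DS#1 - replace filter", "DS#1"))   # True
print(contains_code("DS#11 - replace filter", "DS#1"))  # False
```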
It would be very difficult to come up with a formula-based solution, especially when you first need to differentiate DS1, DS#1, DS#11, DS#11X etc. and then look for the text string after each DS code, not to mention that these confusing codes may (or may not) be positioned in random order in the text string.
A better approach would be using Power Query which is available in Excel 2010 and later versions. My solution is using Excel 2016.
Presume you have the following two tables:
You can use From Table function in the Data tab to add both tables to the Power Query Editor.
Once added, make a duplicate copy of Table 1. I have renamed the duplicate as Table1 (2) - Number Ref. Then you should have three un-edited queries:
If your source data is a larger table containing some other information, you can google how to add a worksheet to the editor and how to remove unnecessary columns and remove duplicated values.
Firstly, let's start working with Table1.
Here are the steps:
Use Replace Values function to remove all # from the text string, and then replace all DS with DS# in the text string, so all DS codes are in the format of DS#XXX. Eg. DS8 will be changed to DS#8. This step may not be necessary if DS8 is a valid code as well as DS#8;
Use Split Column function to split the text strings by the word DS, and put each sub text string into a new row, then you should have the following:
Use Split Column function again to split the text strings by 1 Character from the left and you should have the following:
Filter the first column to show hash tag # only and then remove the first column, then you should have the following:
use Replace Values function repeatedly to remove the following characters/symbols from the text strings: (, ), HT, JH, SK, //, and replace dash - with space . I presume these are irrelevant in the comment but you can leave them if needed. Then you should have:
use Split Column function again to split the text string by the first space on the left, then you should have:
Then you can Trim and Clean the second column to further tidy up the comments, rename the columns as DS#, Comments, and Number Ref consecutively, and change the format of the third column to Text. Then you should have:
The last step is to add a custom column called Match ID to combine the value from first and third column into one text string as shown below:
Secondly, let's work on Table1 (2) - Number Ref
Here are the steps:
Remove the first column so leave the Number Ref column as the single column;
Transpose the column, and Promote the first row as header. Then you should have:
The purpose of this query is to transform all Number Reference into column headers, then append this query with the next query (Table2) to achieve the desired result which I will explain in the next section.
Lastly, let's work on the third query Table2.
Here are the steps:
Append this table with the Number Ref table from previous step;
highlight the whole table and use Replace Values function to replace all null with number 1. Then highlight the first column and use Unpivot Other Columns function to transform the table as below:
then Remove the last column (Value), and add a new custom column called Match ID to combine the DS code with the Number Reference. Then you should have:
Merge the table with Table1 using Match ID as shown below:
Expand the newly merged column Table1 to show Comments;
Use Split Column function to split the first column by Non-digit to digit, change the format of the digit column to whole number, and then sort Attribute column and the digit column ascending consecutively, then you should have:
Use Split Column function again to split the Match ID column by dash sign -, and remove the first three columns, rename the remaining three columns as DS#, Number Ref and Comments consecutively, then you should have:
Close & Load this table to a new worksheet as desired, which may look like this:
In conclusion, it is entirely up to you how you would like to structure the table in Power Query. You can pre-filter the Number Reference in the editor and load only relevant results to a worksheet, you can load the full table to a worksheet and use VLOOKUP or INDEX to retrieve the data as desired, or you can load the third query to the data model, from where you can create pivot tables to play around with.
Here is the code behind the scenes, for reference only. All steps use built-in functions of the editor without any advanced manual coding.
Table1
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Comments", type text}}),
#"Replaced Value8" = Table.ReplaceValue(#"Changed Type","#","",Replacer.ReplaceText,{"Comments"}),
#"Replaced Value9" = Table.ReplaceValue(#"Replaced Value8","DS","DS#",Replacer.ReplaceText,{"Comments"}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value9", {{"Comments", Splitter.SplitTextByDelimiter("DS", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Comments"),
#"Split Column by Position" = Table.SplitColumn(#"Split Column by Delimiter", "Comments", Splitter.SplitTextByPositions({0, 1}, false), {"Comments.1", "Comments.2"}),
#"Filtered Rows" = Table.SelectRows(#"Split Column by Position", each ([Comments.1] = "#")),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Comments.1"}),
#"Replaced Value1" = Table.ReplaceValue(#"Removed Columns",")","",Replacer.ReplaceText,{"Comments.2"}),
#"Replaced Value2" = Table.ReplaceValue(#"Replaced Value1","-"," ",Replacer.ReplaceText,{"Comments.2"}),
#"Replaced Value3" = Table.ReplaceValue(#"Replaced Value2","(","",Replacer.ReplaceText,{"Comments.2"}),
#"Replaced Value4" = Table.ReplaceValue(#"Replaced Value3","HT","",Replacer.ReplaceText,{"Comments.2"}),
#"Replaced Value5" = Table.ReplaceValue(#"Replaced Value4","JH","",Replacer.ReplaceText,{"Comments.2"}),
#"Replaced Value6" = Table.ReplaceValue(#"Replaced Value5","SK","",Replacer.ReplaceText,{"Comments.2"}),
#"Replaced Value7" = Table.ReplaceValue(#"Replaced Value6","//","",Replacer.ReplaceText,{"Comments.2"}),
#"Split Column by Delimiter1" = Table.SplitColumn(#"Replaced Value7", "Comments.2", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Comments.2.1", "Comments.2.2"}),
#"Trimmed Text" = Table.TransformColumns(#"Split Column by Delimiter1",{{"Comments.2.2", Text.Trim, type text}}),
#"Cleaned Text" = Table.TransformColumns(#"Trimmed Text",{{"Comments.2.2", Text.Clean, type text}}),
#"Renamed Columns" = Table.RenameColumns(#"Cleaned Text",{{"Comments.2.1", "DS#"}, {"Comments.2.2", "Comments"}}),
#"Changed Type1" = Table.TransformColumnTypes(#"Renamed Columns",{{"Number Ref", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type1", "Match ID", each "DS#"&[#"DS#"]&"-"&[Number Ref])
in
#"Added Custom"
Table1 (2) - Number Ref
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Comments", type text}, {"Number Ref", Int64.Type}}),
#"Removed Other Columns" = Table.SelectColumns(#"Changed Type",{"Number Ref"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Removed Other Columns",{{"Number Ref", type text}}),
#"Transposed Table" = Table.Transpose(#"Changed Type1"),
#"Promoted Headers" = Table.PromoteHeaders(#"Transposed Table", [PromoteAllScalars=true]),
#"Changed Type2" = Table.TransformColumnTypes(#"Promoted Headers",{{"388", type any}, {"1", type any}})
in
#"Changed Type2"
Table2
let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"List", type text}}),
#"Appended Query" = Table.Combine({#"Changed Type", #"Table1 (2) - Number Ref"}),
#"Replaced Value" = Table.ReplaceValue(#"Appended Query",null,"1",Replacer.ReplaceValue,{"List", "388", "1"}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Replaced Value", {"List"}, "Attribute", "Value"),
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Value"}),
#"Added Custom" = Table.AddColumn(#"Removed Columns", "Match ID", each [List]&"-"&[Attribute]),
#"Merged Queries" = Table.NestedJoin(#"Added Custom", {"Match ID"}, Table1, {"Match ID"}, "Table1", JoinKind.LeftOuter),
#"Expanded Table1" = Table.ExpandTableColumn(#"Merged Queries", "Table1", {"Comments"}, {"Comments"}),
#"Split Column by Character Transition" = Table.SplitColumn(#"Expanded Table1", "List", Splitter.SplitTextByCharacterTransition((c) => not List.Contains({"0".."9"}, c), {"0".."9"}), {"List.1", "List.2"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Character Transition",{{"List.2", Int64.Type}}),
#"Sorted Rows" = Table.Sort(#"Changed Type1",{{"Attribute", Order.Ascending}, {"List.2", Order.Ascending}}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Sorted Rows", "Match ID", Splitter.SplitTextByDelimiter("-", QuoteStyle.Csv), {"Match ID.1", "Match ID.2"}),
#"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"List.1", type text}, {"Match ID.1", type text}, {"Match ID.2", type text}}),
#"Removed Other Columns" = Table.SelectColumns(#"Changed Type2",{"Match ID.1", "Match ID.2", "Comments"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns",{{"Match ID.1", "DS#"}, {"Match ID.2", "Number Ref"}})
in
#"Renamed Columns"
Cheers :)
Based on the creative answers from this post, I was able to take some of these ideas and form a solution. There are two parts to the solution I found. The first is to replace the strings that I am trying to filter out with a random text that doesn't appear in any instance in my data. Since I had a range of data I needed to replace (DS11 through DS19), I used a VBA function to avoid a large nested function.
Once I had the strings I am trying to filter out replaced, I added an If(Isnumber(search()) function to display "" when the replaced text is returned.
Function REPLACETEXTS(strInput As String, rngFind As Range, rngReplace As Range) As Variant
    'returns Variant so that the #N/A error value can be passed back to the worksheet
    Dim strTemp As String
    Dim strFind As String
    Dim strReplace As String
    Dim cellFind As Range
    Dim lngColFind As Long
    Dim lngRowFind As Long
    Dim lngRowReplace As Long
    Dim lngColReplace As Long
    lngColFind = rngFind.Columns.Count
    lngRowFind = rngFind.Rows.Count
    lngColReplace = rngReplace.Columns.Count
    lngRowReplace = rngReplace.Rows.Count
    strTemp = strInput
    'both ranges must have the same shape so every find cell has a matching replace cell
    If Not ((lngColFind = lngColReplace) And (lngRowFind = lngRowReplace)) Then
        REPLACETEXTS = CVErr(xlErrNA)
        Exit Function
    End If
    For Each cellFind In rngFind
        strFind = cellFind.Value
        'pick the cell at the same relative position in the replace range
        strReplace = rngReplace(cellFind.Row - rngFind.Row + 1, cellFind.Column - rngFind.Column + 1).Value
        strTemp = Replace(strTemp, strFind, strReplace)
    Next cellFind
    REPLACETEXTS = strTemp
End Function
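The two-part idea (mask the unwanted codes first, then search for the one you want) can be sketched as follows. This is an illustration in Python, not the VBA above. The placeholder "@@", the sample strings and the assumption that the unwanted codes are written DS#11 to DS#19 are all made up for the example.

```python
def replace_texts(text, find, replace):
    """Replace each item of `find` with the item at the same position in `replace`."""
    if len(find) != len(replace):
        raise ValueError("find and replace lists must have the same length")
    for old, new in zip(find, replace):
        text = text.replace(old, new)
    return text

# Assumed codes to mask: DS#11 .. DS#19, each replaced by a placeholder
# that is assumed never to occur in the real data.
unwanted = [f"DS#{n}" for n in range(11, 20)]
placeholders = ["@@"] * len(unwanted)

masked = replace_texts("DS#11 HT; DS#1 JH", unwanted, placeholders)
print(masked)            # @@ HT; DS#1 JH
print("DS#1" in masked)  # True
```

After masking, a plain substring search for DS#1 no longer hits the DS#11 to DS#19 entries.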

Excel - How to count pairs in two columns containing lists

I have a number of farmers registered on my database. Each farmer grows a few fruits and sells to a few counties.
For every fruit / county pair (e.g. apple, Warwickshire), how do I count the number of farmers that can supply that combo?
I have over 100 farmers registered on my database.
So my database has a row for each farmer, a column for fruits and a column for the counties they cover. The fruits and counties that each farmer covers are recorded as comma separated lists in the two cells on that farmer's row.
I want to create a matrix with fruits on the horizontal and counties on the vertical to count how many farmers cover that particular combo.
For the example in the screenshot, I've tried:
=COUNTIF(A2:B4,AND(ISNUMBER(SEARCH(G11,A2,1)),ISNUMBER(SEARCH(A13,B2,1)))="TRUE")
but with no luck.
If you have Excel 2010+, you can do this with Power Query (aka Get & Transform in Excel 2016+).
Using Power Query allows you to update the table easily whenever any new products (or counties) are added. You just re-run the query after you add rows to the data table (or add a product or county to a given row).
Except for removing the extra spaces (Trim after splitting the columns), all can be done via the GUI. But you can just paste the M-Code into the Advanced Editor and then explore the GUI to study the individual steps.
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Products", type text}, {"Counties", type text}}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type", {{"Counties", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Counties"),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Counties", type text}}),
#"Split Column by Delimiter1" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type1", {{"Products", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Products"),
#"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Products", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type2", "Prod", each Text.Trim([Products])),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "County", each Text.Trim([Counties])),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"Products", "Counties"}),
#"Grouped Rows" = Table.Group(#"Removed Columns", {"County", "Prod"}, {{"grouped", each _, type table [Prod=text, County=text]}, {"counts", each Table.RowCount(_), type number}}),
#"Removed Columns1" = Table.RemoveColumns(#"Grouped Rows",{"grouped"}),
#"Pivoted Column" = Table.Pivot(#"Removed Columns1", List.Distinct(#"Removed Columns1"[Prod]), "Prod", "counts", List.Sum)
in
#"Pivoted Column"
Original Data
Results
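For checking the pivoted counts outside Excel, the split, trim, group and count steps of the query can be mirrored in a few lines of Python. This is only a sketch of the logic, not the Power Query itself: the `farmers` sample rows and the `count_pairs` name are made up for illustration, and unlike the query it uses sets, so an item repeated inside one farmer's cell is counted once.

```python
from collections import Counter
from itertools import product


def count_pairs(rows):
    """Count, for every (county, fruit) pair, how many rows cover it."""
    counts = Counter()
    for fruits, counties in rows:
        # Split the comma-separated cells and trim stray spaces.
        fruit_set = {f.strip() for f in fruits.split(",") if f.strip()}
        county_set = {c.strip() for c in counties.split(",") if c.strip()}
        # Every county/fruit combination on this row gets one count.
        for pair in product(county_set, fruit_set):
            counts[pair] += 1
    return counts


# Hypothetical sample data: (fruits cell, counties cell) per farmer.
farmers = [
    ("apple, pear", "Warwickshire, Gloucestershire"),
    ("apple", "Warwickshire"),
    ("pear, plum", "Gloucestershire"),
]
counts = count_pairs(farmers)
print(counts[("Warwickshire", "apple")])  # 2
```

A pair that no farmer covers simply reads as 0, because `Counter` returns 0 for missing keys.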
Just for "fun" I built a solution with AND() and FIND():
=IFERROR(AND(FIND(B$7,$A$2,1),FIND($A8,$B$2,1)),0)+IFERROR(AND(FIND(B$7,$A$3,1),FIND($A8,$B$3,1)),0)+IFERROR(AND(FIND(B$7,$A$4,1),FIND($A8,$B$4,1)),0)
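You could wrap this in an IF() so that only counts greater than 1 are shown, which may make the wanted ones easier to spot. The sum is repeated as the third argument so that the count itself is displayed:
=IF(IFERROR(AND(FIND(B$7,$A$2,1),FIND($A8,$B$2,1)),0)+IFERROR(AND(FIND(B$7,$A$3,1),FIND($A8,$B$3,1)),0)+IFERROR(AND(FIND(B$7,$A$4,1),FIND($A8,$B$4,1)),0)<2,"",IFERROR(AND(FIND(B$7,$A$2,1),FIND($A8,$B$2,1)),0)+IFERROR(AND(FIND(B$7,$A$3,1),FIND($A8,$B$3,1)),0)+IFERROR(AND(FIND(B$7,$A$4,1),FIND($A8,$B$4,1)),0))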
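One thing to keep in mind with FIND()-based counting is that it tests for substrings, so a fruit whose name sits inside another name is counted too. A rough Python model of the same per-row test, with hypothetical sample rows and a made-up `count_by_substring` helper, shows the effect:

```python
def count_by_substring(rows, fruit, county):
    """Count rows where both names occur somewhere in the row's two cells."""
    return sum(
        1
        for fruits, counties in rows
        if fruit in fruits and county in counties
    )


# Hypothetical sample data: (fruits cell, counties cell) per farmer.
rows = [
    ("apple, pear", "Warwickshire"),
    ("pineapple", "Warwickshire"),
    ("pear", "Gloucestershire"),
]
apple_count = count_by_substring(rows, "apple", "Warwickshire")
pear_count = count_by_substring(rows, "pear", "Warwickshire")
print(apple_count, pear_count)  # 2 1
```

Here "apple" is counted twice because it also matches inside "pineapple". If the real lists contain such overlapping names, splitting the cells on the comma first avoids the over-count.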
Oh, I get it. It's down to the way the data has been tabulated by the form-filler(s), or the way it was handed over (I'm explaining this so others can see where I'm coming from). He probably wants to change how these forms are filled in, so the data is easier to read and follow and the process is more logical and efficient going forward.
He has received confusing, badly compiled tables and wants to make them more straightforward.
I think I got it working, based on how I understood the question, what I know how to do in Excel and the information given. The way I did it works like "count the number of times a specific word appears in a string".
version 1:
=(LEN(lookupall("*"&B$8&"*",$A$2:$B$4,2))-LEN(SUBSTITUTE(lookupall("*"&B$8&"*",$A$2:$B$4,2),$A9,"")))/LEN(B$8)
and drag across and down.
or better:
version 2:
=(LEN(lookupall("*"&B$7&"*",$A$2:$B$4,2))-LEN(SUBSTITUTE(lookupall("*"&B$7&"*",$A$2:$B$4,2),$A8,"")))/LEN($A8)
[Edited at 14:00: the formulas above were written at 5am and were off by some cells.]
My results are:
Version 1 results table:
and
Version 2 results table: I think this is exactly what you wanted.
Notes: yes, in both source tables (A2:B4) I've used abbreviated names, but the data is the same (war = Warwickshire, app = apples, etc.).
Which one is closest to what you're seeking?
lookupall is a UDF you can find on the net if you search around. It returns all VLOOKUP results, including duplicate lookups, concatenated together. It occurred to me that you can then count how many times each value in column A (the counties) appears in each of those results (the fruit lookups), and then divide by the number of letters in that word (the counties in version 1, the fruit in version 2) to get the precise number.
In version 1, I think you have to round the results down/up (because when I remove glos (Gloucestershire) from B2, for instance, the result in B12 becomes 0, which would be precise given those numbers). But version 2 is better and more accurate.
Is this kinda going in the right direction for you? It might be worth more tweaking, but given the approximate nature of the question (the way I read it), it's the best I can do. It would have been more accurate to tie ...
Though I am sure there are better, more versatile and more precise formulas out there, applicable to other similar tables, which would do it in just one formula or one single UDF.
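The LEN/SUBSTITUTE trick used in both versions counts occurrences by measuring how much shorter the text gets once the word is removed. A small Python sketch of that arithmetic (the `count_occurrences` name and the sample string are made up, the string standing in for a ";"-joined lookupall result) may make the division step clearer:

```python
def count_occurrences(text, word):
    """Mirror (LEN(text) - LEN(SUBSTITUTE(text, word, ""))) / LEN(word)."""
    if not word:
        return 0  # avoid dividing by zero for an empty search word
    removed = len(text) - len(text.replace(word, ""))
    # Each occurrence removes exactly len(word) characters.
    return removed // len(word)


# Hypothetical concatenated lookup result.
joined = "war, glos;war;glos"
war_count = count_occurrences(joined, "war")
glos_count = count_occurrences(joined, "glos")
print(war_count, glos_count)  # 2 2
```

The result is only a whole number when the divisor is the length of the word that was substituted out. Dividing by the length of a different word gives fractions, which would explain the need for rounding.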
The lookupall UDF I use:
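Function LookupAll(vVal, rTable As Range, ColumnI As Long) As Variant
' Returns every match for vVal in the first column of rTable,
' taken from column ColumnI of the table and joined with ";".
Dim rFound As Range, lLoop As Long
Dim strResults As String
With rTable.Columns(1)
Set rFound = .Cells(1, 1)
' Find each match in turn, continuing after the previous hit.
For lLoop = 1 To WorksheetFunction.CountIf(.Cells, vVal)
Set rFound = .Find(what:=vVal, After:=rFound, LookIn:=xlFormulas, lookAt _
:=xlWhole, SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:= _
False, SearchFormat:=False)
strResults = strResults & ";" & rFound(1, ColumnI)
Next lLoop
End With
' Drop the leading ";" and return an empty string when nothing matched.
If Len(strResults) > 0 Then
LookupAll = Trim(Mid$(strResults, 2))
Else
LookupAll = ""
End If
End Function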
This UDF handles this job (and many others) and has been a life-saver for much of my own work. (P.S. Nobody asks me questions in the office and nobody gave me anything! It's all been found, researched, discovered or made by me to survive!)
I am pleased to say that my correct results table is exactly the same as Solar Mike's! So version 2 is correct,
with
=(LEN(lookupall("*"&B$7&"*",$A$2:$B$4,2))-LEN(SUBSTITUTE(lookupall("*"&B$7&"*",$A$2:$B$4,2),$A8,"")))/LEN($A8)
in cell B8 and dragged down and across.

Concatenate power query columns that are offset from each other

The problem
I have a data set with two header rows. I've transposed the rows into columns to work with the headers before combining, but I need help with concatenation of column1 into column2, since past row 7 the columns are offset from one another by one row (see example image).
The goal
I've tried to use replace and concatenate myself with an index, but have been unable to achieve the desired end result where column2 row 8 is concatenated with column1 row 7, so that when I combine these columns and transpose again the headers will be correctly labeled (see example image).
Thank you for any suggestions and your time.
Example image:
Here's one way.
I start with your Problem table as a table named Table1:
Then I add an index. (Add Column > Index Column):
Then I add a custom column (Add Column > Custom Column) with this formula: #"Added Index"{[Index]-1}[Column1]&"-"&[Column2]
(#"Added Index"{[Index]-1}[Column1] refers to the Column1 entry of the row whose position equals the current row's Index value minus 1, i.e. the previous row.)
...to get this:
Then I replace the errors in the new Custom column; the first row has no previous row, so it returns an error. (Right-click Custom column title > click Replace Errors > type null > click OK)
Then I select Column1 and the Custom column and remove the other columns. (Select Column1 column title > hold Ctrl and click Custom column title > keep holding Ctrl and right-click Custom column title > click Remove Other Columns)
Here's my M code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
#"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each #"Added Index"{[Index]-1}[Column1]&"-"&[Column2]),
#"Replaced Errors" = Table.ReplaceErrorValues(#"Added Custom", {{"Custom", null}}),
#"Removed Other Columns" = Table.SelectColumns(#"Replaced Errors",{"Column1", "Custom"})
in
#"Removed Other Columns"
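To see what the index lookup does row by row, here is a minimal Python sketch of the same previous-row logic. It is not the M code itself: the sample rows and the `concat_with_previous` name are hypothetical, `None` stands in for null, and the first row (which has no previous row) is given `None` just as the Replace Errors step does.

```python
def concat_with_previous(rows):
    """rows: list of (column1, column2) tuples; None models a null cell."""
    out = []
    for i, (col1, col2) in enumerate(rows):
        # The first row has no predecessor, like the error at Index - 1 = -1.
        prev_col1 = rows[i - 1][0] if i > 0 else None
        if prev_col1 is None or col2 is None:
            custom = None  # null & text is null in M as well
        else:
            custom = f"{prev_col1}-{col2}"
        out.append((col1, custom))
    return out


# Hypothetical offset headers: each Column2 value sits one row below
# the Column1 value it belongs to.
rows = [("Sales", None), (None, "Q1"), ("Costs", None), (None, "Q2")]
result = concat_with_previous(rows)
print(result[1], result[3])  # (None, 'Sales-Q1') (None, 'Costs-Q2')
```

Note that this approach only looks back exactly one row, so it relies on the offset always being a single row.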
Another way.
Code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
IndexedTable = Table.AddIndexColumn(Source, "Index", 0, 1),
Transform = Table.TransformRows(IndexedTable, (row)=>[Column1= row[Column1], Column2 = if row[Column1]=null then Text.Combine({IndexedTable{row[Index]-1}[Column1], "-",row[Column2]}) else row[Column2]]),
ToTable = Table.FromRecords(Transform)
in
ToTable
Brief explanation:
Source
Add index to address previous record
Use Table.TransformRows to turn each row into a record in this manner: Column1 is taken from the row's own Column1 (row[Column1]); Column2 is built with Text.Combine from the previous row's Column1, IndexedTable{row[Index]-1}[Column1], whenever the row's Column1 is null, and is otherwise left unchanged. Table.TransformRows returns a list of records.
Transform list of records into the table.
This code will fail if 1st row contains null in [Column1]. If this is unacceptable, add another if-then-else.
Another way:
let
Source = Excel.CurrentWorkbook(){[Name="Table"]}[Content],
fillDown = Table.FillDown(Table.DuplicateColumn(Source, "Column1", "copy"),{"copy"}),
replace = Table.ReplaceValue(fillDown, each [Column2], each if [Column2] = null then null
else [copy]&"-"&[Column2], Replacer.ReplaceValue, {"Column2"})[[Column1],[Column2]]
in
replace
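The fill-down variant behaves slightly differently from the index lookup, which a short Python sketch can illustrate. The sample rows and the `fill_down_and_join` name are assumptions for illustration only, and `None` again stands in for null.

```python
def fill_down_and_join(rows):
    """rows: list of (column1, column2) tuples; None models a null cell."""
    out = []
    last_seen = None  # plays the role of the filled-down "copy" column
    for col1, col2 in rows:
        if col1 is not None:
            last_seen = col1
        if col2 is None or last_seen is None:
            new_col2 = None
        else:
            new_col2 = f"{last_seen}-{col2}"
        out.append((col1, new_col2))
    return out


# Hypothetical data where one header covers two following rows.
rows = [
    ("Sales", None),
    (None, "Q1"),
    (None, "Q2"),
    ("Costs", None),
    (None, "Q3"),
]
result = fill_down_and_join(rows)
print(result[2])  # (None, 'Sales-Q2')
```

Because the last non-null Column1 value is carried forward, this version still works when a Column2 value sits more than one row below its header. When both columns are filled on the same row, it joins that row's own Column1 rather than the previous row's.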

Resources