Excel: divide text field into separate columns

I have the following input table:
Sales Order | Asset Serial Number | Asset Model | Licence Class | License Type | License Name | Account Name
10000 | 1234, 5643, 3463 | test-pro | A123 | software | LIC-0002, LIC-0188, LIC-0188, LIC-0013 | ABC
2000 | 5678, 9846, 5639 | test-pro | A123 | software | LIC-00107, LIC-08608, LIC-009, LIC-0610 | ABC
I need it transformed so that each License Name value gets its own row.
I first tried the Replace function followed by a transpose, but I found no way to add the empty cells in the other columns other than doing it manually.
My second thought was the Text to Columns function, which didn't work either.

Here are two solutions, one using Excel formulas and the other using Power Query. See the Explanation section for more information about each approach:
Excel
It is possible with Excel without using Power Query, but several manipulations are required. In cell I2, put the following formula:
=LET(counts, BYROW(F2:F3, LAMBDA(a, LEN(a) - LEN(SUBSTITUTE(a, ",", "")))), del, "|",
emptyRowsSet, MAP(A2:A3, B2:B3, C2:C3, D2:D3, E2:E3, F2:F3, G2:G3, counts,
LAMBDA(a,b,c,d,e,f,g,cnts, LET(rep, REPT(";",cnts),a&rep &del& b&rep &del& c&rep &del&
d&rep &del& e&rep &del& SUBSTITUTE(f,", ",";") &del& g&rep ))),
emptyRowsSetByCol, TEXTSPLIT(TEXTJOIN("&",,emptyRowsSet), del, "&"),
byColResult, BYCOL(emptyRowsSetByCol, LAMBDA(a, TEXTJOIN(";",,a))),
singleLine, TEXTJOIN(del,,byColResult),
TRANSPOSE(TEXTSPLIT(singleLine,";",del))
)
Here is the output:
Update
A simplified version of previous formula is the following one:
=LET(counts, BYROW(F2:F3, LAMBDA(a, LEN(a) - LEN(SUBSTITUTE(a, ",", "")))), del, "|",
reps, MAKEARRAY(ROWS(A2:G3),COLUMNS(A2:G3), LAMBDA(a,b, INDEX(counts, a,1))),
emptyRowsSetByCol, MAP(A2:G3, reps, LAMBDA(a,b, IF(COLUMN(a)=6,
SUBSTITUTE(a,", ",";"), a&REPT(";",b)))),
byColResult, BYCOL(emptyRowsSetByCol, LAMBDA(a, TEXTJOIN(";",,a))),
singleLine, TEXTJOIN(del,,byColResult),
TRANSPOSE(TEXTSPLIT(singleLine,";",del))
)
Power Query
The following M Code provides the expected result:
let
Source = Excel.CurrentWorkbook(){[Name="TB_Sales"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Sales Order", type text}}),
#"Split License Name" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type", {{"License Name",
Splitter.SplitTextByDelimiter(", ", QuoteStyle.Csv),
let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "License Name"),
ListOfColumns = List.Difference(Table.ColumnNames(#"Split License Name"), {"License Name"}),
RemainingColumns = List.Difference(Table.ColumnNames(#"Changed Type"), ListOfColumns),
RemoveDups = (lst as list) =>
let
concatList = (left as list, right as list) => List.Transform(List.Positions(left), each left{_}&"_"& right{_}),
prefixList = Table.Column(#"Split License Name", "Sales Order"),
tmp = concatList(prefixList, lst),
output = List.Accumulate(tmp, {}, (x, y) => x & {if List.Contains(x, y) then null else y})
in
output,
replaceValues = List.Transform(ListOfColumns, each RemoveDups(Table.Column(#"Split License Name", _))),
#"Added Empty Rows" = Table.FromColumns(
replaceValues & Table.ToColumns(Table.SelectColumns(#"Split License Name", RemainingColumns)),
ListOfColumns & RemainingColumns),
#"Extracted Text After Delimiter" = Table.TransformColumns(#"Added Empty Rows", {{"Sales Order",
each Text.AfterDelimiter(_, "_"), type text}, {"Asset Serial Number", each Text.AfterDelimiter(_, "_"), type text},
{"Asset Model", each Text.AfterDelimiter(_, "_"), type text}, {"Licence Class",
each Text.AfterDelimiter(_, "_"), type text}, {"License Type", each Text.AfterDelimiter(_, "_"), type text},
{"Account Name", each Text.AfterDelimiter(_, "_"), type text}}),
#"Reordered Columns" = Table.ReorderColumns(#"Extracted Text After Delimiter",{"Sales Order", "Asset Serial Number", "Asset Model",
"Licence Class", "License Type", "License Name", "Account Name"})
in
#"Reordered Columns"
And here is the output:
And the corresponding Excel Output:
Explanation
Here we provide the explanation for each approach: Excel formula and Power Query.
Excel Formula
We need to calculate how many empty rows we need to add based on License Name column values. We achieve that via counts name from LET:
BYROW(F2:F3, LAMBDA(a, LEN(a) - LEN(SUBSTITUTE(a, ",", ""))))
The output for this case is: {3;3}, i.e 2x1 array, which represents how many empty rows we need to add for each input row.
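For readers who want to check the counting trick outside Excel, here is a minimal Python sketch of the same idea: the number of separators equals the length difference after removing them. The list license_names and the helper count_extra_rows are hypothetical names standing in for F2:F3 and the LAMBDA.

```python
# Hypothetical stand-in for the License Name cells F2:F3.
license_names = [
    "LIC-0002, LIC-0188, LIC-0188, LIC-0013",
    "LIC-00107, LIC-08608, LIC-009, LIC-0610",
]


def count_extra_rows(cell: str, sep: str = ",") -> int:
    """Mirror LEN(a) - LEN(SUBSTITUTE(a, ",", "")): count the separators."""
    return len(cell) - len(cell.replace(sep, ""))


# One count per input row, like the 2x1 array returned by BYROW.
counts = [count_extra_rows(cell) for cell in license_names]
print(counts)
```

A cell with a single item (no comma) gives 0, i.e. no extra rows for that input row.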
Next we need to build the set that includes the empty rows. We name it emptyRowsSet and the calculation is as follows:
MAP(A2:A3, B2:B3, C2:C3, D2:D3, E2:E3, F2:F3, G2:G3, counts,
LAMBDA(a,b,c,d,e,f,g,cnts,
LET(rep, REPT(";",cnts),a&rep &del& b&rep &del& c&rep &del&
d&rep &del& e&rep &del& SUBSTITUTE(f,", ",";") &del& g&rep)))
We use an additional LET function inside MAP to avoid repeating the rep value. Because we want to treat the content of License Name as additional rows, we replace the ", " with ; (we are going to treat this token as a row delimiter). We use del (|) as the column delimiter.
Here would be the intermediate result of emptyRowsSet:
10000;;;|1234, 5643, 3463;;;|test-pro;;;|A123;;;|software;;;|LIC-0002;LIC-0188;LIC-0188;LIC-0013|ABC;;;
2000;;;|5678, 9846, 5639;;;|test-pro;;;|A123;;;|software;;;|LIC-00107;LIC-08608;LIC-009;LIC-0610|ABC;;;
As you can see, additional ; were added according to the number of items in the License Name column for each row. In the sample data the number of empty rows to add is the same for every row, but it could differ.
The rest is about arranging the content of emptyRowsSet the way we want. We cannot invoke TEXTSPLIT inside BYROW, because that returns #CALC! (nested array error), so we need to work around it.
For example the following produces an error (#CALC!):
=BYROW(A1:A2,LAMBDA(a, TEXTSPLIT(a,"|")))
where the range A1:A2 has the following: ={"a|b";"c|d"}. We don't get the desired output: ={"a","b";"c","d"}. In short the output of BYROW should be a single column so any LAMBDA function that expands the columns will not work.
To circumvent that we can do the following:
1) Convert the input into a single string, joining the rows with ; for example. Now we have a column delimiter (|) and a row delimiter (;).
2) Use TEXTSPLIT to generate the array (2x2 in this case), identifying the columns and the rows via both delimiters.
We can do it as follows (showing the output of each step on the right):
=TEXTSPLIT(TEXTJOIN(";",,A1:A2),"|",";") -> 1) "a|b;c|d" -> 2) ={"a","b";"c","d"}
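The join-then-split workaround is not specific to Excel. As a rough illustration only, the same two steps look like this in Python; join_then_split is a hypothetical helper name, and the delimiters are the ones used in the toy example above.

```python
def join_then_split(rows, col_sep="|", row_sep=";"):
    """Join all rows into one string, then split it back into a 2D grid."""
    # Step 1: like TEXTJOIN(";",,A1:A2), giving "a|b;c|d".
    joined = row_sep.join(rows)
    # Step 2: like TEXTSPLIT(joined, "|", ";"), rows first, then columns.
    return [line.split(col_sep) for line in joined.split(row_sep)]


grid = join_then_split(["a|b", "c|d"])
print(grid)
```

The only requirement is that neither delimiter occurs in the data itself, which is also true for the Excel formula.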
We are using the same idea here (but using & for joining each row). The name emptyRowsSetByCol:
TEXTSPLIT(TEXTJOIN("&",,emptyRowsSet), del, "&")
Would produce the following intermediate result, now organized by columns (Table 1):
Sales Order | Asset Serial Number | Asset Model | License Class | License Type | License Name | Account Name
10000;;; | 1234, 5643, 3463;;; | test-pro;;; | A123;;; | software;;; | LIC-0002;LIC-0188;LIC-0188;LIC-0013 | ABC;;;
2000;;; | 5678, 9846, 5639;;; | test-pro;;; | A123;;; | software;;; | LIC-00107;LIC-08608;LIC-009;LIC-0610 | ABC;;;
Note: The headers are for illustrative purposes only; they are not part of the output.
Now we need to concatenate the information per column, and for that we can use the BYCOL function. We name the result of the following formula byColResult:
BYCOL(emptyRowsSetByCol, LAMBDA(a, TEXTJOIN(";",,a)))
The intermediate result would be:
Sales Order | Asset Serial Number | Asset Model | License Class | License Type | License Name | Account Name
10000;;;;2000;;; | 1234, 5643, 3463;;;;5678, 9846, 5639;;; | test-pro;;;;test-pro;;; | A123;;;;A123;;; | software;;;;software;;; | LIC-0002;LIC-0188;LIC-0188;LIC-0013;LIC-00107;LIC-08608;LIC-009;LIC-0610 | ABC;;;;ABC;;;
The result is a 1x7 array in which the content of each column is already delimited by ; (ready for the final split).
Now we need to apply the same idea as before i.e. convert everything to a single string and then split it again.
First we convert everything to a single string and name the result: singleLine:
TEXTJOIN(del,,byColResult)
Next we need to do the final split:
TRANSPOSE(TEXTSPLIT(singleLine,";",del))
We need to transpose the result because TEXTSPLIT, with del as the row delimiter, returns each original column as a row.
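To make the target shape concrete, here is a small Python sketch of the end result the formula produces. It does not reproduce the join/split string manipulation; it uses a plain loop instead, which is only possible outside Excel. The names table and expand_rows are hypothetical, and the data is the sample from the question.

```python
# Sample rows from the question; column index 5 is License Name.
table = [
    ["10000", "1234, 5643, 3463", "test-pro", "A123", "software",
     "LIC-0002, LIC-0188, LIC-0188, LIC-0013", "ABC"],
    ["2000", "5678, 9846, 5639", "test-pro", "A123", "software",
     "LIC-00107, LIC-08608, LIC-009, LIC-0610", "ABC"],
]


def expand_rows(rows, split_index, sep=", "):
    """Put each item of the split column on its own row.

    The first row of each group keeps the other values; the added rows
    leave them empty, like the ";" padding in the Excel formula.
    """
    out = []
    for row in rows:
        for i, item in enumerate(row[split_index].split(sep)):
            new_row = list(row) if i == 0 else [""] * len(row)
            new_row[split_index] = item
            out.append(new_row)
    return out


expanded = expand_rows(table, 5)
for line in expanded:
    print(line)
```

Each input row here has four license names, so the two input rows become eight output rows.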
Update
I provided a simplified version of the initial approach which requires fewer steps, because we can obtain the result of the MAP function directly by columns.
The main idea is to treat the input range A2:G3 all at once. To do that, all the MAP input arrays must have the same shape. Because we need to take into account the number of empty rows to add (;), we build a second array of the same shape. The name reps creates this second array as follows:
MAKEARRAY(ROWS(A2:G3),COLUMNS(A2:G3),
LAMBDA(a,b, INDEX(counts, a,1)))
The intermediate output will be:
3|3|3|3|3|3|3
3|3|3|3|3|3|3
which represents a 2x7 array, where on each row we have the number of empty rows to add.
Now the name emptyRowsSetByCol:
MAP(A2:G3, reps,
LAMBDA(a,b, IF(COLUMN(a)=6, SUBSTITUTE(a,", ",";"),
a&REPT(";",b))))
This produces the same intermediate result as Table 1 above. We treat column 6 (License Name) differently, replacing the ", " with ;. For the other columns we just append as many ; as there are empty rows to add for each input row. The rest of the formula is the same as in the first approach.
Power Query
#"Split License Name" is a standard Power Query (PQ) UI function: Split Column by Delimiter.
To generate the empty rows, we remove duplicate elements in each column that requires this transformation, i.e. all columns except License Name. We do it all at once by identifying the columns that require the transformation. To do that we define two lists:
ListOfColumns: identifies the columns to transform, i.e. all columns except License Name. We obtain it by difference via the PQ function List.Difference().
RemainingColumns: to rebuild the table, we need to identify the columns that don't require the transformation. We use the same idea via List.Difference(), based on the ListOfColumns list.
The user defined function RemoveDups(lst as list) does the magic of this transformation.
Because we need to remove duplicates, but having unique elements based on each initial row, we use the first column Sales Order as a prefix, so we can "clean" the column within each partition.
In order to do that we define inside of RemoveDups() function a new user defined function concatList() to add the first column as prefix.
concatList = (left as list, right as list) =>
List.Transform(List.Positions(left), each left{_}&"_"& right{_}),
we concatenate each element of the lists (row by row) using an underscore delimiter (_). Later we are going to use this delimiter to remove the prefix added at this point.
To remove duplicates and replace them with null we use the following logic:
output = List.Accumulate(tmp, {}, (x, y) =>
x & {if List.Contains(x, y) then null else y})
where tmp is a modified list (lst) with the first column as prefix.
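The prefix-then-deduplicate logic can be sketched in Python as follows. This is only an illustration of the idea, not a translation of the query; remove_dups and strip_prefix are hypothetical helpers, and the sample lists stand in for the Sales Order and Asset Model columns after License Name has been split into rows.

```python
def remove_dups(prefixes, values, sep="_"):
    """Tag each value with its row prefix, then blank out repeats."""
    tagged = [f"{p}{sep}{v}" for p, v in zip(prefixes, values)]
    output = []
    for item in tagged:
        # Same rule as the List.Accumulate step: keep the first occurrence,
        # replace later occurrences with a null (None here).
        output.append(None if item in output else item)
    return output


def strip_prefix(items, sep="_"):
    """Drop the prefix again, like Text.AfterDelimiter(_, "_")."""
    return [None if item is None else item.split(sep, 1)[1] for item in items]


# Hypothetical columns after the split: four rows per sales order.
sales_orders = ["10000"] * 4 + ["2000"] * 4
asset_model = ["test-pro"] * 8

cleaned = strip_prefix(remove_dups(sales_orders, asset_model))
print(cleaned)
```

Without the prefix, the second "test-pro" group would also be blanked completely, because the value is identical across both sales orders.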
Now we invoke the List.Transform() function for all the columns that require the transformation using as transform (second input argument) the function we just defined previously:
replaceValues = List.Transform(ListOfColumns, each
RemoveDups(Table.Column(#"Split License Name", _))),
#"Added Empty Rows" represents the step of this calculation and the output will be the following table:
The step #"Extracted Text After Delimiter" just removes the prefix we added, and for that we use the standard PQ UI option Transform->Extract->Text After Delimiter.
Finally, we reorder the columns into the expected order via the step #"Reordered Columns", using PQ UI functionality.

Related

word patterns within an excel column

I have 2 Excel data sets each comprising a column of word patterns and have been searching for a way to copy and group all instances of repetition within these columns into a new column.
This is the closest result I could find so far:
Sub Common5bis()
Dim Joined
Set d = CreateObject("Scripting.Dictionary") 'make dictionary
d.CompareMode = 1 'not case sensitive
a = Range("A1", Range("A" & Rows.Count).End(xlUp)).Value 'data to array
For i = 1 To UBound(a) 'loop through all records
If Len(a(i, 1)) >= 5 Then 'length at least 5
For l = 1 To Len(a(i, 1)) - 4 'all substrings within the record
s = Mid(a(i, 1), l, 5) 'that string
d(s) = d(s) + 1 'increment
Next
End If
Next
Joined = Application.Index(Array(d.Keys, d.items), 0, 0) 'join the keys and the items
With Range("D1").Resize(UBound(Joined, 2), 2) 'export range
.EntireColumn.ClearContents 'clear previous
.Value = Application.Transpose(Joined) 'write to sheet
.Sort .Range("B1"), xlDescending, Header:=xlNo 'sort descending
End With
End Sub
Which yielded this result for the particular question:
This example achieves 4 of the things I'm trying to achieve:
Identify repeating strings within a single column
Copies these strings into a separate column
Displays results in order of occurrence (in this case from least to most)
Displays the quantity of repetitions (including the first instance) in an adjacent column
However, although from reading the code there are basic things I've figured out that I can adapt to my purposes, it still fails to achieve these essential tasks which I'm still trying to figure out:
Identify individual words rather than single characters
I could possibly reduce the size from 5 to 3, but for the word strings I have (lists of pronouns from larger texts) that would include "I I" repetitions but wouldn't work so well for "Your You" etc., whilst a length of at least 4 or 5 would miss anything starting with "I I"
Include an indefinite amount of values - looking at the code and the replies to the forum it comes from it looks like it's capped at 5, but I'm trying to find a way to identify all repetitions for all multiple word strings which could be something like "I I my you You Me I You my"
Is case sensitive - this is quite important as some words in the column have been capitalised to differentiate different uses
I'm still learning the basics of VBA but have manually typed out this example of what I'm trying to do with the code I've found above:
Intended outcome:
And so on
I'm a bit screwed at this point which is why I'm reaching out here (sorry if this is a stupid question, I'm brand new to VBA as my work almost never needs Excel, let alone macros) so will massively appreciate any constructive advice towards a solution!
Because I've been working with it recently, I note that you can obtain your desired output using Power Query, available in Windows Excel 2010+ and Office 365 Excel
Select some cell in your original table
Data => Get&Transform => From Table/Range or From within sheet
When the PQ UI opens, navigate to Home => Advanced Editor
Make note of the Table Name in Line 2 of the code.
Replace the existing code with the M-Code below
Change the table name in line 2 of the pasted code to your "real" table name
Examine any comments, and also the Applied Steps window, to better understand the algorithm and steps
First add a custom function:
New blank query
Rename per the code comment
Edits to make case-insensitive
Custom Function
//rename fnPatterns
//generate all possible patterns of two words or more
(String as text)=>
let
//split text string into individual words & get the count of words
#"Split Words" = List.Buffer(Text.Split(String," ")),
wordCount = List.Count(#"Split Words"),
//start position for each number of words
starts = List.Numbers(0, wordCount-1),
//number of words for each pattern (minimum of two (2) words in a pattern
words = List.Reverse(List.Numbers(2, wordCount-1)),
//generate patterns as index into the List and number of words
// will be used in the List.Range function
patterns = List.Combine(List.Generate(
()=>[r={{0,wordCount}}, idx=0],
each [idx] < wordCount-1,
each [r=List.Transform({0..starts{[idx]+1}}, (li)=> {li, wordCount-[idx]-1}),
idx=[idx]+1],
each [r]
)),
//Generate a list of all the patterns by using the List.Range function
wordPatterns = List.Distinct(List.Accumulate(patterns, {}, (state, current)=>
state & {List.Range(#"Split Words", current{0}, current{1})}), Comparer.OrdinalIgnoreCase)
in
wordPatterns
Main Function
let
//change next line to reflect data source
//if data has a column name other than "Column1", that will need to be changed also wherever referenced
Source = Excel.CurrentWorkbook(){[Name="Table17"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
//Create a list of all the possible patterns for each string, added as a custom column
#"Invoked Custom Function" = Table.AddColumn(#"Changed Type", "Patterns", each fnPatterns([Column1]), type list),
//removed unneeded original column of strings
#"Removed Columns" = Table.RemoveColumns(#"Invoked Custom Function",{"Column1"}),
//Expand the column of lists of lists into a column of lists
#"Expanded Patterns" = Table.ExpandListColumn(#"Removed Columns", "Patterns"),
//convert all lists to lower case for case-insensitive comparison
#"Added Custom" = Table.AddColumn(#"Expanded Patterns", "lower case patterns",
each List.Transform([Patterns], each Text.Lower(_))),
//Count number of matches for each pattern
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Count", each List.Count(List.Select(#"Added Custom"[lower case patterns], (li)=> li = [lower case patterns])), Int64.Type),
//Filter for matches of more than one (1)
// then remove duplicate patterns based on the "lower case pattern" column
#"Filtered Rows" = Table.SelectRows(#"Added Custom1", each ([Count] > 1)),
#"Removed Duplicates" = Table.Distinct(#"Filtered Rows", {"lower case patterns"}),
//Remove lower case pattern column and sort by count descending
#"Removed Columns1" = Table.RemoveColumns(#"Removed Duplicates",{"lower case patterns"}),
#"Sorted Rows" = Table.Sort(#"Removed Columns1",{{"Count", Order.Descending}}),
//Re-construct original patterns as text
#"Extracted Values" = Table.TransformColumns(#"Sorted Rows",
{"Patterns", each Text.Combine(List.Transform(_, Text.From), " "), type text})
in
#"Extracted Values"
Note that you could readily implement a similar algorithm using VBA, the VBA.Split function and a Dictionary
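As a rough sketch of that split-plus-dictionary idea (in Python rather than VBA, and not a translation of the M code above), the following counts in how many strings each multi-word pattern occurs. The function name repeated_patterns, its parameters and the sample strings are all hypothetical. Unlike the query, it returns lower-cased patterns when ignore_case is True.

```python
from collections import Counter


def repeated_patterns(strings, min_words=2, ignore_case=True):
    """Return (pattern, count) pairs for word patterns found in 2+ strings."""
    counts = Counter()
    for text in strings:
        words = (text.lower() if ignore_case else text).split()
        n = len(words)
        # Every run of min_words or more consecutive words, once per string.
        patterns = {
            " ".join(words[start:start + size])
            for size in range(min_words, n + 1)
            for start in range(n - size + 1)
        }
        counts.update(patterns)
    # Keep only patterns that repeat, most frequent first.
    return [(p, c) for p, c in counts.most_common() if c > 1]


# Hypothetical sample data.
samples = ["I you me", "I You", "my I you"]
print(repeated_patterns(samples))
print(repeated_patterns(samples, ignore_case=False))
```

Passing ignore_case=False gives the case-sensitive behaviour asked for in the question, where "I You" and "I you" count as different patterns.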

Sum multiple rows based on duplicate column data without formula

Based on the data in columns A to D (there could be hundreds of such columns), I want to sum all the rows for columns E to K (again, there could be hundreds of columns).
The rows should be summed based on duplicate data in columns A to D; the required result is shown below.
This is easy to do with SUMIF, but I would like to know whether it is possible natively in Excel or Power Query, without creating a unique ID for each column and without using SUMIF or a formula of any sort.
In powerquery .. unpivot, group, pivot, done.
More detail:
Click select first 4 columns, right click, unpivot other columns
Click select first 4 columns and the new Attribute column, right click, group by
Use Operation:Sum on Column:Value name:count and hit OK
Click select Attribute column and transform .. pivot column... , for value column choose count
File Close and load
Full sample code:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Code1", "Code2", "Code3", "Code4"}, "Attribute", "Value"),
#"Grouped Rows" = Table.Group(#"Unpivoted Other Columns", {"Code1", "Code2", "Code3", "Code4", "Attribute"}, {{"Count", each List.Sum([Value]), type number}}),
#"Pivoted Column" = Table.Pivot(#"Grouped Rows", List.Distinct(#"Grouped Rows"[Attribute]), "Attribute", "Count", List.Sum)
in #"Pivoted Column"
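For anyone who wants to see the unpivot, group, pivot sequence spelled out step by step, here is a small Python sketch with plain dictionaries. The function name unpivot_group_pivot, the column names Day1/Day2 and the sample rows are made up for illustration; blanks are represented as None.

```python
def unpivot_group_pivot(rows, key_cols):
    """Sum every non-key column per distinct combination of key columns."""
    totals = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        bucket = totals.setdefault(key, {})
        # "Unpivot": walk the (attribute, value) pairs of the other columns,
        # skipping blanks, and "group" by adding into the per-key bucket.
        for col, value in row.items():
            if col in key_cols or value is None:
                continue
            bucket[col] = bucket.get(col, 0) + value
    # "Pivot": turn each bucket back into one wide row.
    return [{**dict(zip(key_cols, key)), **sums} for key, sums in totals.items()]


# Hypothetical sample data.
data = [
    {"Code1": "ERT", "Code2": "EXC", "Day1": 10, "Day2": None},
    {"Code1": "ERT", "Code2": "EXC", "Day1": 2, "Day2": 3},
    {"Code1": "CON", "Code2": "HOR", "Day1": None, "Day2": 3},
]
summed = unpivot_group_pivot(data, ["Code1", "Code2"])
print(summed)
```

In this sketch a column that is blank for a whole group is simply absent from that group's row, whereas the pivot step in Power Query would show a null there.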
To solve a problem like this, I first do a concrete example and then generalize it. I made a small table in Excel like so:
Code1
Code2
2-Jul-20
3-Jul-20
4-Jul-20
5-Jul-20
6-Jul-20
ERT
EXC
10
6
15
2
ERT
EXC
2
3
23
1
CON
HOR
3
CON
HOR
6
2
356
3
Then I clicked within the table and created a Power Query referencing it. After opening the Power Query Editor, there is a Group By function on the Home tab. It's pretty straightforward to choose the columns you want and the Sum function in a toy example like this.
Then, I opened the Advanced Editor to see what code was auto-generated. It looked something like this:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Grouped Rows orig" = Table.Group(Source, {"Code1", "Code2"}, {{"2-Jul-20", each List.Sum([#"2-Jul-20"]), type nullable number}, {"3-Jul-20", each List.Sum([#"3-Jul-20"]), type nullable number}, {"4-Jul-20", each List.Sum([#"4-Jul-20"]), type nullable number}, {"5-Jul-20", each List.Sum([#"5-Jul-20"]), type nullable number}, {"6-Jul-20", each List.Sum([#"6-Jul-20"]), type nullable number}})
in
#"Grouped Rows orig"
Typically, a Power Query expression is a series of transformations applied to a table, where each one operates on the table as returned from the previous. Here, we start with the original table as "Source" and then do the grouping. The parameters are a little messy, but what we have is: (1) the input table, (2) a list of the column names to group by, and (3) a list of 3-item lists, each of which describes an aggregated column. The sublists have the output column name, the function that does the aggregation, and the data type.
In Power Query, "each" is syntactic sugar for a single parameter function whose parameter is just an underscore. But also, when you have a record or row, you can just use [column] instead of _[column].
So how to generalize the operation you want to do? My first thought is that a convenient grouping function should have two parameters, based on your description. The first is the table to group, and the second is the number of columns starting from the left to group by. If you don't have them arranged contiguously, of course, you could do something else.
sumFromColumn = (t, n) => let
cList = Table.ColumnNames(t),
toGroup = List.FirstN(cList, n),
toSum = List.RemoveFirstN(cList, n),
sumFunc = (cName) => {cName, each List.Sum(Table.Column(_, cName)), type nullable number}
in Table.Group(t, toGroup, List.Transform(toSum, each sumFunc(_))),
#"Grouped Rows" = sumFromColumn(Source, 2), // Group by the first 2 columns and sum the rest
Here is the generalized function I made, which appears to match the original Table.Group operation that was generated by the interface.
The let statement arranges things for readability but does not imply a particular sequence that they happen in. Power Query figures out the dependencies and executes the statements in whatever order is needed.
The list of column names of the table is defined as cList, and split into toGroup and toSum. Then, sumFunc is defined as a function taking a column name and returning the 3-item list needed to define an aggregation operation. In Power Query, functions can return other functions any which way. So here we are defining a function that returns a list, with a function in it. Then we can use List.Transform to take the list of aggregated columns and turn it into the appropriate parameters for Table.Group.
Finally, the actual group by is done with a call like sumFromColumn(Source, 2), which is equivalent to the original statement that hard-codes the column names.
Code1
Code2
2-Jul-20
3-Jul-20
4-Jul-20
5-Jul-20
6-Jul-20
ERT
EXC
12
3
6
38
3
CON
HOR
6
5
356
3
This can easily be changed to sumFromColumn(Source, 1), in which case the result reduces to two rows, but the second column, being non-numeric, will then produce error values.
Or, you can use sumFromColumn(Source, 3), which will not add things up because the group by columns taken together are distinct.
This way you can easily aggregate any number of columns without caring about their names. I recommend both the Power Query M documentation on microsoft.com and reading about functional programming in general.
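The same "function that returns an aggregation spec" pattern can be sketched in Python, purely as an illustration of the functional idea. The names sum_from_column and sum_func mirror the M code, but the data, the Day1/Day2 column names and the treatment of blanks (None counts as 0) are assumptions of this sketch.

```python
def sum_from_column(rows, n):
    """Group by the first n columns and sum all remaining columns."""
    columns = list(rows[0].keys())
    to_group, to_sum = columns[:n], columns[n:]

    def sum_func(name):
        # Returns (output column name, aggregation function), much like
        # the list the M version builds for Table.Group.
        return (name, lambda group: sum(r[name] or 0 for r in group))

    aggregations = [sum_func(name) for name in to_sum]

    # Collect the rows of each group, keyed by the grouping values.
    groups = {}
    for row in rows:
        groups.setdefault(tuple(row[c] for c in to_group), []).append(row)

    return [
        {**dict(zip(to_group, key)), **{name: agg(grp) for name, agg in aggregations}}
        for key, grp in groups.items()
    ]


# Hypothetical sample data; None marks a blank cell.
source = [
    {"Code1": "ERT", "Code2": "EXC", "Day1": 10, "Day2": None},
    {"Code1": "ERT", "Code2": "EXC", "Day1": 2, "Day2": 3},
    {"Code1": "CON", "Code2": "HOR", "Day1": None, "Day2": 3},
]
grouped = sum_from_column(source, 2)
print(grouped)
```

As in the M version, the column names never appear in the grouping logic itself; only the number of leading key columns is passed in.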

retrieving values by specifying index position

I have a column which looks something like below:
1235hytfgf ui
3434jhjhjh ui
6672jhjkhj ty
I have to name 1st four characters as numbers; the next 6 characters as type and last 2 as id; which function should I use to say that from index(0-3) should be numbers and index(4-9) : type
LEFT(), RIGHT() and MID() are three functions used to rip out portions of a string.
If your data starts in B2, you could use the following formula in an empty cell to pull the first 4 characters.
LEFT(B2,4)
Now that will pull the first 4 characters and leave them as text. If you want the digits converted to a numeric value, one of the easy ways is to send the string through a math operation that does not change its value. *1, +0, -0, /1 and -- all work. Your formula may look like one of the following:
--LEFT(B2,4)
LEFT(B2,4)+0
LEFT(B2,4)*1
LEFT(B2,4)/1
LEFT(B2,4)-0
To grab the middle portion of the string, use the MID function. Since you already know the starting position and length of the string to pull you can hard code the information into your formula and it will look as follows:
MID(B2,5,6)
5 is the starting position for which character to start pulling from, and 6 is the length of the string to pull or number of characters to pull.
To get the last 2 characters, similar to the first function, use RIGHT(). The formula would look as follows:
RIGHT(B2,2)
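For comparison with the index notation in the question, the same three extractions map onto 0-based slices. This is a small Python sketch; split_fixed is a hypothetical helper and the input is the first sample value from the question.

```python
def split_fixed(value):
    """Split a fixed-width value into (numbers, type, id)."""
    numbers = int(value[:4])   # LEFT(B2,4), converted to a number
    kind = value[4:10]         # MID(B2,5,6): index 4 up to and including 9
    ident = value[-2:]         # RIGHT(B2,2)
    return numbers, kind, ident


print(split_fixed("1235hytfgf ui"))
```

Note that Excel's MID counts positions from 1, while the slice above counts from 0, which is why MID(B2,5,6) corresponds to index 4 through 9.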
If you are dealing with a large amount of data, say thousands of records, try Power Query in Excel.
Please refer to this article to find out how to use Power Query on your version of Excel. It is available in Excel 2010 Professional Plus and later versions. My demonstration is using Excel 2016.
Steps are:
Add your one-column data to Power Query Editor;
Highlight the column, use Split Columns function and select By Digit to Non-Digit, it will separate the first nth numerical characters from the string;
Highlight the second column, use Split Columns function again and select By Delimiter and use space as the delimiter, it will separate the type and id as desired;
Rename the columns, and you should have something like the following:
You can then Close & Load the output to a new worksheet (by default).
Here is the Power Query M code behind the scenes, for reference only. All functions used are available in the GUI, so it should be easy to reproduce.
let
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Split Column by Character Transition" = Table.SplitColumn(#"Changed Type", "Column1", Splitter.SplitTextByCharacterTransition({"0".."9"}, (c) => not List.Contains({"0".."9"}, c)), {"Column1.1", "Column1.2"}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Split Column by Character Transition", "Column1.2", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Column1.2.1", "Column1.2.2"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", Int64.Type}, {"Column1.2.1", type text}, {"Column1.2.2", type text}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type1",{{"Column1.1", "numbers"}, {"Column1.2.1", "type"}, {"Column1.2.2", "id"}})
in
#"Renamed Columns"
Let me know if you have any questions. Cheers :)
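The two split steps (digit to non-digit, then by space) can also be sketched with a regular expression in Python. This is an approximation rather than a translation: split_record is a hypothetical helper, and it splits only at the first digit to non-digit transition.

```python
import re


def split_record(value):
    """Split '1235hytfgf ui' into (numbers, type, id)."""
    # Leading digits, then everything after them.
    match = re.match(r"(\d+)(.*)", value)
    if match is None:
        raise ValueError(f"no leading digits in {value!r}")
    digits, rest = match.groups()
    # Split the remainder at the first space, like Split by Delimiter.
    kind, _, ident = rest.partition(" ")
    return int(digits), kind, ident


for item in ["1235hytfgf ui", "3434jhjhjh ui", "6672jhjkhj ty"]:
    print(split_record(item))
```

Unlike the fixed-width formulas, this does not depend on the number part being exactly four characters long.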

DAX LOOKUPVALUE on text/string

I would like to create some kind of LOOKUPVALUE on text in DAX that will match a sentence with a particular keyword. For instance in the example below the second and third row have a hit because "Apple" and "Chicken" are in the string. The problem is that the text is in a string and not a standalone value.
Table 1
Table 2
Output
EDIT, improved answer: this new version also works when there are multiple keys in one string.
I think PowerQuery is the natural place to perform an operation like this.
The Output table would look like this:
A description of the applied steps:
Source: a reference to Table1
Added Column Key lists: adds a custom column with lists of the Table2[Key] value(s) that are in the [String] value. This is the logic for this Custom column:
For each row the function selects the values from the Table2[Key] column that it finds in the [String] value. It then returns a list that holds only the selected values.
Expanded Key lists: expands the lists in the [Key] column
Join with Table2 on Key: joins with Table2 on the Key value
Expanded ItemTables: expands the table values in the [ItemTables] column and keeps the [Item] column
Group and concate keys / items: groups the Output table on String, concatenating the Keys and the Items. If you don't want to see the [Key] column, delete {"Key", each Text.Combine([Key], " | "), type text}, from this step
The script in the Advanced Editor looks like this:
let
Source = #"Table1",
#"Added Column Key lists" = Table.AddColumn(Source, "Key", (r) => List.Select(Table.Column(Table2,"Key"),each Text.Contains(r[String],_,Comparer.OrdinalIgnoreCase)),type text),
#"Expanded Key lists" = Table.ExpandListColumn(#"Added Column Key lists", "Key"),
#"Join with Table2 on Key" = Table.NestedJoin(#"Expanded Key lists", {"Key"}, Table2, {"Key"}, "ItemTables", JoinKind.LeftOuter),
#"Expanded ItemTables" = Table.ExpandTableColumn(#"Join with Table2 on Key", "ItemTables", {"Item"}, {"Item"}),
#"Group and concate keys / items" = Table.Group(#"Expanded ItemTables", {"String"},{{"Key", each Text.Combine([Key], " | "), type text},{"Item", each Text.Combine([Item], " | "), type text}})
in
#"Group and concate keys / items"
Here is a link to my .pbix file
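To show the matching logic in isolation, here is a short Python sketch of the same steps: select the keys contained in each string (ignoring case), look up their items, and concatenate both. The function match_items, the lookup dictionary and the sample sentences are hypothetical stand-ins for Table2 and Table1.

```python
def match_items(strings, lookup, sep=" | "):
    """For each string, list the keys it contains and their items."""
    result = []
    for text in strings:
        # Case-insensitive containment test, in the order of the lookup table.
        keys = [key for key in lookup if key.lower() in text.lower()]
        items = [lookup[key] for key in keys]
        result.append((text, sep.join(keys), sep.join(items)))
    return result


# Hypothetical stand-ins for Table2 (Key -> Item) and Table1[String].
table2 = {"Apple": "Fruit", "Chicken": "Meat"}
table1 = ["I like cars", "An apple a day", "Chicken and apple pie"]

for row in match_items(table1, table2):
    print(row)
```

Strings without any keyword keep their row with empty Key and Item values, and a string with several keywords lists all of them.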
I created the following dummy data sets.
My interpretation of what you're after is to identify whether a sentence contains a keyword.
This can be done via a calculated column with the following formula -
Lookup = LOOKUPVALUE(Table2[Result],Table2[LookUp], IF(SEARCH("Apple",Table1[Sentence],,0)> 0, "Apple",""))
You can combine the If and Search Functions with the Lookupvalue function.
The formula is searching for the word "Apple" and then returning its position within the text and if no result is found, displays 0.
The IF statement then takes any result greater than 0, since anything greater than 0 means a match was found at that position within the string, and returns "Apple". This then becomes your lookup value.
This then displays as below
You can then replace the blank ("") that is currently the result if false with another IF statement to look for another keyword such as "Orange", and then add that to your lookup table to pull through the result you're after.
Hope this makes sense and helps!
Try this formula (see the picture which cell is where in my assumptions):
=IFERROR(INDEX($B$7:$B$9,MATCH(1,--NOT(ISERROR(FIND($A$7:$A$9,$A12))),0)),"-")

Table column split by value to other columns and its values

I have a table that looks like this; it holds specifications for products.
The 1st column is the SKU and serves as the ID. The 2nd column holds the specifications: each one is a title, a value and an optional 0 or 1 parameter (1 is the default if it is missing), separated by "~", and the specifications are separated from each other by "^".
I want to split it into a table with the SKU, where each specification title becomes a column header and the value becomes its value.
I managed to write this code, which splits the specifications into separate fields for each record, but I am stuck on separating the title from the value for each specification and record, and am now looking for help with this.
let
Source = Excel.CurrentWorkbook(){[Name="Таблица1"]}[Content],
Type = Table.TransformColumnTypes(Source,{{"Part Number", type text}, {"Specifications", type text}}),
#"Replaced Value" = Table.ReplaceValue(Type,"Specification##","",Replacer.ReplaceText,{"Specifications"}),
SplitByDelimiter = (table, column, delimiter) =>
let
Count = List.Count(List.Select(Text.ToList(Table.Column(table, column){0}), each _ = delimiter)) + 1,
Names = List.Transform(List.Numbers(1, Count), each column & "." & Text.From(_)),
Types = List.Transform(Names, each {_, type text}),
Split = Table.SplitColumn(table, column, Splitter.SplitTextByDelimiter(delimiter), Names),
Typed = Table.TransformColumnTypes(Split, Types)
in
Typed,
Split = SplitByDelimiter(#"Replaced Value","Specifications","^"),
Record = Table.ToRecords(Split)
in
Record
Ok, I hope you still need this, as it took the whole evening. :))
Quite interesting task I must say!
I assume that "~1" is always combined with "^", so "~1^" always ends a field's value. I also assume that there is no ":" in the values, as all colons are removed.
IMO, you don't need to use the Table.SplitColumn function at all.
let
    //replace it with your Excel.CurrentWorkbook(){[Name="Таблица1"]}[Content],
    Source = #table(type table [Part Number = Int64.Type, Specifications = text], {{104, "Model:~104~1^Type:~Watch~1^Metal Type~Steel~1"}, {105, "Model:~105~1^Type:~Watch~1^Metal Type~Titanium~1^Gem Type~Ruby~1"}}),
    //I don't know why you replace these values; do you really need this?
    //It is a no-op when "Specification##" does not occur, so the next step can safely build on it.
    ReplacedValue = Table.ReplaceValue(Source,"Specification##","",Replacer.ReplaceText,{"Specifications"}),
    //split each field into pairs on "~1^", drop the empty tail, then split each pair on "~"
    TransformToLists = Table.TransformColumns(ReplacedValue, {"Specifications", each
        List.Transform(
            List.Select(Text.Split(_ & "^", "~1^"), each _ <> ""),
            each Text.Split(Text.Replace(_, ":", ""), "~"))}),
    ConvertToTable = Table.TransformColumns(TransformToLists, {"Specifications", each Table.PromoteHeaders(Table.FromColumns(_))}),
    ExpandedSpecs = Table.TransformRows(ConvertToTable, (x) => Table.ExpandTableColumn(Table.FromRecords({x}), "Specifications", Table.ColumnNames(x[Specifications]))),
    UnionTables = Table.Combine(ExpandedSpecs),
    Output = UnionTables
in
    Output
UPDATE:
How it works (skipping obvious steps):
TransformToLists: TransformColumns takes a table and a list of column names paired with the functions to apply to each column's values. So it applies several nested functions to the value of the "Specifications" field of each row. Working from the inside out: Text.Split(_ & "^", "~1^") splits the field into title/value pairs, List.Select keeps only the non-empty ones, and List.Transform then applies the following to each pair, removing the ":"s and splitting on "~":
Text.Split(
Text.Replace(_, ":", "")
, "~")
The each keyword is shorthand for a one-parameter function whose parameter is the underscore (_). That function is applied to every processed value (it can be a field, column, row/record, list item, text, function, etc.). An each expression can therefore be rewritten as an explicit function:
each some_function(_) equals (x) => some_function(x)
So,
each Text.Replace(_, ":", "")
equals
(x) => Text.Replace(x, ":", "") //where x is any variable name.
//You can go further and make it typed, although generally it is not needed:
(x as text) => Text.Replace(x, ":", "")
//You can also write a custom function and use it:
(x as text) => CustomFunctionName(x)
That said, the TransformToLists step returns a table with two columns: "Part Number" and "Specifications", the latter containing a list of lists. Each of these inner lists has two values: a column name and its value. This happens because the initial value in the "Specifications" field has to be split twice: first it is split into pairs by "~1^", and then each pair is split by "~". So now we have a column name and its value in each nested list, and we have to convert them all into a single table.
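To see the double split in isolation, here is a minimal sketch that can be pasted into a blank query. The step names Sample, Pairs and NameValueLists are made up for this illustration; the sample string is the first row of the test table used in the answer's code.

```powerquery
let
    // "Specifications" value of the first sample row (illustration only)
    Sample = "Model:~104~1^Type:~Watch~1^Metal Type~Steel~1",
    // Append "^" so the last pair also ends with "~1^", split into pairs, drop the empty tail
    Pairs = List.Select(Text.Split(Sample & "^", "~1^"), each _ <> ""),
    // Remove the colons, then split each pair into {title, value}
    NameValueLists = List.Transform(Pairs, each Text.Split(Text.Replace(_, ":", ""), "~"))
in
    NameValueLists
```

For this input the query should return three nested lists: {"Model", "104"}, {"Type", "Watch"} and {"Metal Type", "Steel"}.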
ConvertToTable: We apply TransformColumns again, using a function on each row's "Specifications" field (remember, it is a list of lists). We use Table.FromColumns because it takes a list of lists as its argument; since each inner list becomes a column, it returns a table whose 1st row holds the column headers and whose 2nd row holds their values. Then we promote the 1st row to headers. Now we have a table whose "Specifications" field contains a nested table with a variable number of columns, and we have to put them all together.
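As a small standalone sketch of this step, assuming one row's list of pairs looks like the hard-coded Pairs value below (the step names are illustrative and not part of the original query):

```powerquery
let
    // One row's list of {title, value} pairs, shaped like the output of TransformToLists
    Pairs = {{"Model", "104"}, {"Type", "Watch"}, {"Metal Type", "Steel"}},
    // Each inner list becomes a column: row 1 holds the titles, row 2 holds the values
    AsColumns = Table.FromColumns(Pairs),
    // Promote the title row to column headers
    Promoted = Table.PromoteHeaders(AsColumns)
in
    Promoted
```

The result should be a one-row table with the columns Model, Type and Metal Type.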
ExpandedSpecs: Table.TransformRows applies a transformation function to every row (as a record) of a table; in the code the record is named x. You can write your own custom function, as I did:
= Table.ExpandTableColumn( //this function expands nested table. It needs a table, but "x" that we have is a record. So we do the conversion:
Table.FromRecords({x}) //function takes a list of records, so {x} means a list with single value of x
, "Specifications" //Column to expand
, Table.ColumnNames(x[Specifications]) //3rd argument is a list of resulting columns. It takes table as an argument, and table is contained within "Specifications" field.
)
It returns a list of tables (each with a single row), and we combine them using Table.Combine at the UnionTables step. The result is a table containing all the columns from the combined tables, with nulls wherever a table did not have a given column.
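The way Table.Combine fills in missing columns can be checked with a tiny sketch. The two single-row tables T1 and T2 are hypothetical stand-ins for the expanded rows, not output copied from the query above:

```powerquery
let
    // Two hypothetical single-row tables whose columns only partly overlap
    T1 = #table({"Part Number", "Model", "Metal Type"}, {{104, "104", "Steel"}}),
    T2 = #table({"Part Number", "Model", "Gem Type"}, {{105, "105", "Ruby"}}),
    // The combined table has the union of the columns; cells with no source value are null
    Combined = Table.Combine({T1, T2})
in
    Combined
```

Here the row for 104 should have null in "Gem Type", and the row for 105 should have null in "Metal Type".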
Hope it helps. :)
A TextToColumns VBA solution is much simpler, if I understand what you are asking. See the MSDN documentation for Range.TextToColumns.

Resources