How to get multiple substrings from a Microsoft Excel cell - excel

I'm trying to extract from each cell just the values of the 'id' keys, joined by ';'.
The data is as follows:
Cell:
A1: {"id":1145585,"label":"1145585: Project Z"}
A2: {"id":1150322,"label":"1150322: Project Waka 1"}|{"id":1150365,"label":"1150365: Project Waka 2"}
A3: {"id":1149240,"label":"1149240: Analysis of Technical Options"}|{"id":1149258,"label":"1149258: Check and Report"}
A4: {"id":1148925,"label":"1148925: Change Management Review"}|{"id":1148920,"label":"1148920: Follow-Up Meetings"}|{"id":1148923,"label":"1148923: Launch Date Definition"}
I have tried the LEFT, MID and FIND functions, but the number of IDs in a cell can vary from 1 to 1000. I'd prefer to avoid VBA, though it seems to be the only option, so any solution is welcome!
The result should be:
Cell:
A1: 1145585
A2: 1150322;1150365
A3: 1149240;1149258
A4: 1148925;1148920;1148923
Any ideas?
Thanks!

Sounds like a task for #powerquery. Please refer to this article to find out how to use Power Query on your version of Excel. It is available in Excel 2010 Professional Plus and later versions. My demonstration uses Excel 2016.
The steps are:
Load the source data to power query editor which should look like the following:
Use Index Column function under the Add Column tab to add an Index column;
Use Split Column function under the Transform tab to split the column by custom delimiter "id": and put the results into Rows as shown below:
Use Extract function under the Transform tab to extract the first 7 characters of the column;
Change the Data Type to Whole Number, remove Errors, and then change the Data Type back to Text;
Use Group By function under the Transform tab to group Column1 by Index as set out below. Don't panic if the result shows errors; that is expected.
Go back to the last step and replace the generated formula in the formula bar with the following one, because Text.Combine is not among the aggregations offered by the Group By dialog:
= Table.Group(#"Changed Type3", {"Index"}, {{"Sum", each Text.Combine([Column1],";"), type text}})
Close & Load the output to a new worksheet (by default), and you should have the following:
Here is the Power Query M code behind the scenes. Most of the steps use built-in GUI functions; only the last step requires manually replacing the formula. Let me know if you have any questions. Cheers :)
let
Source = Excel.CurrentWorkbook(){[Name="Table10"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Added Index", {{"Column1", Splitter.SplitTextByDelimiter("""id"":", QuoteStyle.None), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column1"),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1", type text}}),
#"Extracted First Characters" = Table.TransformColumns(#"Changed Type1", {{"Column1", each Text.Start(_, 7), type text}}),
#"Changed Type2" = Table.TransformColumnTypes(#"Extracted First Characters",{{"Column1", Int64.Type}}),
#"Removed Errors" = Table.RemoveRowsWithErrors(#"Changed Type2", {"Column1"}),
#"Changed Type3" = Table.TransformColumnTypes(#"Removed Errors",{{"Column1", type text}}),
#"Grouped Rows" = Table.Group(#"Changed Type3", {"Index"}, {{"Sum", each Text.Combine([Column1],";"), type text}})
in
#"Grouped Rows"
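To sanity-check the logic of the query outside Excel, the same steps can be sketched in Python. This is only an illustration: the function name `extract_ids` and the list `cells` are made up for the example, and it carries over the answer's assumption that every id is exactly 7 digits long.

```python
def extract_ids(cell: str) -> str:
    """Mirror the query steps: split on "id":, keep 7 characters, drop non-numbers, join."""
    pieces = cell.split('"id":')                  # Split Column by Delimiter "id":
    candidates = [p[:7] for p in pieces]          # Extract the first 7 characters
    ids = [c for c in candidates if c.isdigit()]  # Whole Number conversion + Remove Errors
    return ";".join(ids)                          # Text.Combine(..., ";")


# Sample cells copied from the question (A1, A2 and A4).
cells = [
    '{"id":1145585,"label":"1145585: Project Z"}',
    '{"id":1150322,"label":"1150322: Project Waka 1"}|{"id":1150365,"label":"1150365: Project Waka 2"}',
    '{"id":1148925,"label":"1148925: Change Management Review"}|{"id":1148920,"label":"1148920: Follow-Up Meetings"}|{"id":1148923,"label":"1148923: Launch Date Definition"}',
]

for cell in cells:
    print(extract_ids(cell))
```

If the ids can have a different number of digits, the fixed length of 7 would need to change in both the sketch and the query.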

Based on #TerryW's comment, here is a solution using the FILTERXML function, available in Excel 2013+. It also requires TEXTJOIN, which did not appear until later versions of Excel 2016 (and Office 365).
It relies on the fact that the id value is always followed by a comma.
A disadvantage is that FILTERXML returns the numeric ids as numeric values, so leading zeros are dropped. If the ids always have a fixed number of digits and leading zeros must be preserved, this can be mitigated with the TEXT function.
We construct an XML string by splitting both on id and on commas.
We then use an XPath expression to return the node that follows each node containing id:
=TEXTJOIN(";",TRUE,FILTERXML("<t><s>" & SUBSTITUTE(SUBSTITUTE(A1,"""id"":",",id,"),",","</s><s>")&"</s></t>","//s[text()='id']/following-sibling::*[1]"))
Since this is an array formula, you need to "confirm" it by holding down Ctrl + Shift while hitting Enter. If you do this correctly, Excel will place braces {...} around the formula, as shown in the formula bar.
Source
Results
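The XML trick in the formula above can be traced step by step with a small Python sketch. The function name `ids_via_xml` is hypothetical, and the sibling lookup is done by hand rather than with the XPath used in the formula; unlike FILTERXML, the ids stay as text here.

```python
import xml.etree.ElementTree as ET


def ids_via_xml(cell: str) -> str:
    # Inner SUBSTITUTE: turn every "id": marker into a standalone ,id, token.
    marked = cell.replace('"id":', ",id,")
    # Outer SUBSTITUTE: every comma becomes a node boundary.
    xml = "<t><s>" + marked.replace(",", "</s><s>") + "</s></t>"
    nodes = ET.fromstring(xml).findall("s")
    # Equivalent of //s[text()='id']/following-sibling::*[1]:
    # take the node right after each node whose text is exactly "id".
    ids = [nodes[i + 1].text for i, node in enumerate(nodes[:-1]) if node.text == "id"]
    return ";".join(ids)


print(ids_via_xml('{"id":1145585,"label":"1145585: Project Z"}'))
```

As with the formula, this would fail on cells whose text contains characters that are special in XML, such as < or &.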

Related

Replace text within a table for all cells that contain a given word for n columns

I have data in a table where some cells were filled in with text such as not available or No Data. I want to replace every such cell with null across n columns. I don't know every phrase that was entered, but each cell that should become null appears to contain the characters no, so I will match on that.
i.e.
Is there any way to apply something like `if Text.Contains([n columns],"no") then null else [n columns]` across all of those columns?
In Power Query, the following removes the content of any cell containing no in any casing (No, NO, no, nO) and converts it to null:
Select the first column, right-click, Unpivot Other Columns
Select the Value column, then Transform ... Data Type ... Text
Right-click the Value column, then Transform ... lowercase
We don't actually want lowercase; this step only generates a formula to edit, so change this in the formula bar
= Table.TransformColumns(#"Changed Type1",{{"Value", Text.Lower, type text}})
to this instead (which also ignores the case of the no)
= Table.TransformColumns(#"Changed Type1",{{"Value", each if Text.Contains(_,"no", Comparer.OrdinalIgnoreCase) then null else _, type text}})
Select the Attribute column
Transform ... Pivot Column
Values Column: Value, Advanced ... Don't Aggregate
Sample full code:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Column"}, "Attribute", "Value"),
#"Changed Type1" = Table.TransformColumnTypes(#"Unpivoted Other Columns",{{"Value", type text}}),
#"CheckForNo" = Table.TransformColumns(#"Changed Type1",{{"Value", each if Text.Contains(_,"no", Comparer.OrdinalIgnoreCase) then null else _, type text}}),
#"Pivoted Column" = Table.Pivot(#"CheckForNo", List.Distinct(#"CheckForNo"[Attribute]), "Attribute", "Value")
in #"Pivoted Column"
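The effect of the CheckForNo step can be illustrated with a short Python sketch. The function `blank_out_no` and the sample rows are invented for the example; the key column is called "Column" as in the M code, and non-text values are left alone here, whereas the query converts everything to text first.

```python
def blank_out_no(rows, key="Column"):
    """Replace any text value containing 'no' (in any casing) with None, except in the key column."""
    cleaned = []
    for row in rows:
        new_row = {}
        for col, val in row.items():
            if col != key and isinstance(val, str) and "no" in val.lower():
                new_row[col] = None   # same outcome as the 'then null' branch
            else:
                new_row[col] = val    # same outcome as the 'else _' branch
        cleaned.append(new_row)
    return cleaned


# Hypothetical sample data.
rows = [
    {"Column": "r1", "A": "12", "B": "No Data"},
    {"Column": "r2", "A": "not available", "B": "7"},
    {"Column": "r3", "A": "Unknown", "B": "ok"},
]

for row in blank_out_no(rows):
    print(row)
```

The last sample row shows a side effect worth checking against real data: a plain substring test also blanks words such as Unknown.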

Get values of top N based on sum and condition [duplicate]

I would like to extract the top 5 players based on the sales by each employee (without Pivot Table / Auto filter).
Refer my input and output screenshot
Snapshot
Any suggestions, how to obtain first top 5 ranks (even if repeated; as shown in the screenshots)
I have verified Extract Top 5 Values for Each Group in a List without VBA and some other links also.
Thanks in advance for your time and consideration! Please let me know if my request is unclear and/or if you have any specific questions.
This is what I use to track the top 5 absentees...
Edit to suit your needs.
Formula in cell A1:
=INDEX(A$13:A$52,AGGREGATE(15,6,ROW($1:$40)/(B$13:B$52=B1),COUNTIF(B$1:B1,B1)))
Formula in cell B1:
=LARGE(B$13:B$52,ROW())
An alternative approach using Power Query which is available in Excel 2010 Professional Plus and all later versions of Excel.
Steps are:
Add your input data table to the Power Query Editor;
Sort the table by Sales (descending), then by Employee;
Add an Index Column starting from 1;
Filter the Index column to show values less than or equal to 5;
Remove the Index column, then you should have something like the following:
Close & Load the output table to a new worksheet (by default).
Here is the Power Query M code for your reference. All functions used are available in the GUI, so it should be easy and straightforward.
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Employee", type text}, {"Month", type text}, {"Sales", type number}}),
#"Sorted Rows" = Table.Sort(#"Changed Type",{{"Sales", Order.Descending}, {"Employee", Order.Ascending}}),
#"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 1, 1),
#"Filtered Rows" = Table.SelectRows(#"Added Index", each [Index] <= 5),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Index"})
in
#"Removed Columns"
Let me know if you have any questions. Cheers :)
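For comparison, the sort, index and filter steps of the query boil down to the following Python sketch. The sample rows are hypothetical and only follow the column names used in the M code (Employee, Month, Sales).

```python
# Hypothetical input rows.
rows = [
    {"Employee": "Ann", "Month": "Jan", "Sales": 300},
    {"Employee": "Bob", "Month": "Jan", "Sales": 500},
    {"Employee": "Cy", "Month": "Feb", "Sales": 500},
    {"Employee": "Dee", "Month": "Feb", "Sales": 100},
    {"Employee": "Eve", "Month": "Mar", "Sales": 450},
    {"Employee": "Fay", "Month": "Mar", "Sales": 200},
    {"Employee": "Gus", "Month": "Mar", "Sales": 300},
]

# Sort by Sales descending, then Employee ascending (the "Sorted Rows" step).
ranked = sorted(rows, key=lambda r: (-r["Sales"], r["Employee"]))

# Keep the first five rows (index column, filter on Index <= 5, remove index).
top5 = ranked[:5]

for r in top5:
    print(r["Employee"], r["Month"], r["Sales"])
```

Like the query, this keeps exactly five rows, so a value tied with the fifth row is cut off rather than included.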
Try this one. As you have in your sample:
On Cell E16:
=VLOOKUP(LARGE($C$3:$C$12,ROW()-15),CHOOSE({2,1},$A$3:$A$12,$C$3:$C$12),2,FALSE)
On Cell F16:
=VLOOKUP(LARGE($C$3:$C$12,ROW()-15),CHOOSE({2,1},$B$3:$B$12,$C$3:$C$12),2,FALSE)
On Cell G16:
=VLOOKUP(LARGE($C$3:$C$12,ROW()-15),$C$3:$C$12,1,FALSE)
You can drag it down to get the list sorted.
Hope it helps!

Retrieving values by specifying index position

I have a column which looks something like below:
1235hytfgf ui
3434jhjhjh ui
6672jhjkhj ty
I have to label the first four characters as numbers, the next six characters as type, and the last two as id. Which function should I use to say that index 0-3 is numbers and index 4-9 is type?
LEFT(), RIGHT() and MID() are three functions used to rip out portions of a string.
If your data starts in B2, you could use the following formula in an empty cell to pull the first 4 characters.
LEFT(B2,4)
That pulls the first 4 characters and leaves them as text. If you want the digits converted to a numeric value, one of the easy ways is to send the string through a math operation that does not change its value. *1, +0, -0, /1 and -- all work. Your formula may look like one of the following:
--LEFT(B2,4)
LEFT(B2,4)+0
LEFT(B2,4)*1
LEFT(B2,4)/1
LEFT(B2,4)-0
To grab the middle portion of the string, use the MID function. Since you already know the starting position and length of the string to pull you can hard code the information into your formula and it will look as follows:
MID(B2,5,6)
5 is the starting position for which character to start pulling from, and 6 is the length of the string to pull or number of characters to pull.
To get the last 2 characters, similar to the first function, use RIGHT(). The formula would look as follows:
RIGHT(B2,2)
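For readers who think in zero-based indexes, as the question does, the three formulas map onto string slicing. A minimal Python sketch, with the function name `split_fixed` chosen just for illustration:

```python
def split_fixed(value: str):
    numbers = int(value[:4])  # LEFT(B2,4), converted to a number like --LEFT(B2,4)
    kind = value[4:10]        # MID(B2,5,6): Excel position 5 is zero-based index 4
    ident = value[-2:]        # RIGHT(B2,2)
    return numbers, kind, ident


print(split_fixed("1235hytfgf ui"))
```

Note the off-by-one difference: Excel's MID counts positions from 1, while the index ranges in the question (0-3, 4-9) count from 0.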
If you are dealing with a large amount of data, say thousands of records, try Power Query in Excel.
Please refer to this article to find out how to use Power Query on your version of Excel. It is available in Excel 2010 Professional Plus and later versions. My demonstration is using Excel 2016.
Steps are:
Add your one-column data to Power Query Editor;
Highlight the column, use the Split Column function and select By Digit to Non-Digit; it will separate the leading digits from the rest of the string;
Highlight the second column, use the Split Column function again, select By Delimiter and use a space as the delimiter; it will separate the type and id as desired;
Rename the columns, and you should have something like the following:
You can then Close & Load the output to a new worksheet (by default).
Here is the Power Query M code behind the scenes, for reference only. All functions used are available in the GUI, so it should be easy to execute.
let
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Split Column by Character Transition" = Table.SplitColumn(#"Changed Type", "Column1", Splitter.SplitTextByCharacterTransition({"0".."9"}, (c) => not List.Contains({"0".."9"}, c)), {"Column1.1", "Column1.2"}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Split Column by Character Transition", "Column1.2", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Column1.2.1", "Column1.2.2"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", Int64.Type}, {"Column1.2.1", type text}, {"Column1.2.2", type text}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type1",{{"Column1.1", "numbers"}, {"Column1.2.1", "type"}, {"Column1.2.2", "id"}})
in
#"Renamed Columns"
Let me know if you have any questions. Cheers :)
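The advantage of splitting on the digit to non-digit transition is that the numeric prefix does not need a fixed length. A rough Python equivalent using a regular expression is sketched below; `split_by_transition` is a hypothetical name, and the sketch assumes a single space between type and id, as in the sample data.

```python
import re


def split_by_transition(value: str):
    # Leading digits, then everything from the first non-digit onward.
    match = re.fullmatch(r"(\d+)(\D.*)", value)
    if match is None:
        raise ValueError(f"no leading digits followed by text in {value!r}")
    numbers, rest = match.groups()
    # Split the remainder on the first space into type and id.
    kind, _, ident = rest.partition(" ")
    return int(numbers), kind, ident


print(split_by_transition("1235hytfgf ui"))
print(split_by_transition("34jhjhjh ui"))  # shorter prefix, made up for the example
```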

Is there a way to join multiple strings in value fields when pivoting?

I am trying to make a table more readable by making a report.
This table has 3 columns;
Staff
Task (Each staff could have 0-many tasks)
Status (Planned, Started, Finished)
The report would have Staff as the left most column, 3 status as column headings. The values should be task values and if there are many tasks it should be concatenated, say, with a carriage return.
I tried pivoting but it didn't work since the task values are texts. I tried Power Query and it displays errors for every cell where there are more than 1 task.
Is there a way to do this? ...without VBA please.
Thanks
I presume you know how to operate Power Query Editor so I will skip the part of how to add data to the editor.
In my solution, I used the following sample data stored in Table3.
Once the data is added, the editor will automatically recognize all columns as text.
My approach is to add a custom column to combine Staff and Status as below first:
Then I grouped the data by the custom column (Staff+Status) with some Advanced Coding. You can do a Group By first and then go to Advanced Editor to change the formula as below:
= Table.Group(#"Added Custom", {"Staff+Status"}, {{"Task", each Text.Combine([Task],"#(lf)"), type text}})
Which will give you the following look:
Then you can split the custom column back to Staff and Status separately:
Then you can pivot the Status column, set Task as the Values Column, and in the Advanced Options set Don't Aggregate as the Aggregate Value Function.
Then you are pretty much finished, and you can load the query to a worksheet. It may look like the following, where the carriage return does not seem to work.
In order to 'activate' the carriage return (which is actually a line feed), select a cell where you want to see the line break and click somewhere in the formula bar; you will notice the line break is 'activated'.
Copy the format of that cell and paste to the rest of the table using Format Painter to get the following:
If you are unclear about the above step, please read this article: How to display Power Query results with line feed or carriage return
All done. Cheers :)
By the way, here is the code behind the scenes, for reference only:
let
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Staff", type text}, {"Task", type text}, {"Status", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Staff+Status", each [Staff]&"+"&[Status]),
#"Grouped Rows" = Table.Group(#"Added Custom", {"Staff+Status"}, {{"Task", each Text.Combine([Task],"#(lf)"), type text}}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Grouped Rows", "Staff+Status", Splitter.SplitTextByDelimiter("+", QuoteStyle.Csv), {"Staff+Status.1", "Staff+Status.2"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Staff+Status.1", type text}, {"Staff+Status.2", type text}}),
#"Pivoted Column" = Table.Pivot(#"Changed Type1", List.Distinct(#"Changed Type1"[#"Staff+Status.2"]), "Staff+Status.2", "Task"),
#"Renamed Columns" = Table.RenameColumns(#"Pivoted Column",{{"Staff+Status.1", "Staff"}})
in
#"Renamed Columns"
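The group-then-pivot idea behind this query can be illustrated in a few lines of Python. Everything in the sketch is hypothetical (the function `pivot_tasks` and the sample rows); it only shows the shape of the transformation, not how Power Query implements it.

```python
from collections import defaultdict


def pivot_tasks(rows, sep="\n"):
    """Group tasks by (staff, status), join them with a line feed, then pivot status into columns."""
    grouped = defaultdict(list)
    for staff, task, status in rows:
        grouped[(staff, status)].append(task)       # the Staff+Status grouping key

    report = defaultdict(dict)
    for (staff, status), tasks in grouped.items():
        report[staff][status] = sep.join(tasks)     # Text.Combine([Task], "#(lf)")
    return {staff: dict(cols) for staff, cols in report.items()}


# Hypothetical sample data: (Staff, Task, Status).
rows = [
    ("Ann", "Task A", "Planned"),
    ("Ann", "Task B", "Started"),
    ("Ann", "Task C", "Started"),
    ("Bob", "Task D", "Finished"),
]

report = pivot_tasks(rows)
print(report)
```

Joining the tasks before pivoting is what avoids the errors mentioned in the question: each Staff and Status pair ends up with a single text value, so the pivot has nothing left to aggregate.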
