How do you remove duplicate values from a single Excel cell

How do you remove duplicate values from a single Excel cell (A1) using Power Query?
For example:
Anish,Anish,Prakash,Prakash,Prakash,Anish~,Anish~
Desired result:
Anish,Prakash,Anish~

Using Power Query, you can refer to a single cell in the current workbook if it is a named range. You could then use something like this, to list the distinct values:
let
Source = Excel.CurrentWorkbook(){[Name="MyCell"]}[Content],
#"Split List" = Text.Split(Source{0}[Column1],","),
#"Removed Duplicates" = List.Distinct(#"Split List"),
#"Combine Values" = Text.Combine(#"Removed Duplicates",",")
in
#"Combine Values"
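
For readers who find the logic easier to follow outside M, here is a minimal Python sketch of the same three steps (split, keep the first occurrence of each value, rejoin). It is only an illustration of the idea, not Power Query code, and the function name dedupe_cell is made up for this example.

```python
def dedupe_cell(value: str, sep: str = ",") -> str:
    """Split on sep, keep the first occurrence of each item, and rejoin."""
    # dict.fromkeys keeps insertion order, so the first occurrence wins
    return sep.join(dict.fromkeys(value.split(sep)))


print(dedupe_cell("Anish,Anish,Prakash,Prakash,Prakash,Anish~,Anish~"))
# Anish,Prakash,Anish~
```

Like List.Distinct with its default comparer, this comparison is case-sensitive, so "anish" and "Anish" would both be kept.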

I am new to M code. However, for others who might have a similar level of experience, I studied it a little and I think the following might be easier to understand:
#"Added Custom1" = Table.AddColumn(#"Extracted Values1", "Split1", each Text.Split([#"Cust"],",")),
#"Added Custom2" = Table.AddColumn(#"Added Custom1", "RemoveDuplicate1", each List.Distinct([#"Split1"])),
#"Added Custom3" = Table.AddColumn(#"Added Custom2", "CombineValue1", each Text.Combine([#"RemoveDuplicate1"],",")),
Paste the code above into the Advanced Editor and change the column names to match yours. In my case the source column is Cust and the added columns are Split1, RemoveDuplicate1, and CombineValue1; you can of course name the added columns differently.
The three lines correspond to three added columns. If you add the three custom columns manually instead, paste the expression that follows "each" in the corresponding line into each custom column formula.

Related

How can I get the rows with the most cells that have the highest (or the lowest) values

I have a table of data consisting of 18 columns and 2,017 rows. I can get the row that has the highest (MAX) value in a cell, but I need the rows that have the most cells with high values, in descending order. I have not yet managed to find a relevant post on this.
Here follows an example:
Using numbers up to 10 for illustration, the following shows the logic behind. (The actual numbers are those shown in Exhibit1)
Thank you
EDIT:
I am adding the below in order to try to clarify further. I am not sure if it is the correct path to go but I hope it makes sense.
In Exhibit2 I am indexing each column Desc (Based on Exhibit1) and then =SUM in the end of the row. Following this logic, the name having the lowest total is the one with the most high values (not the highest) in its row.
The result table is the following

Although this is possible with formulas and helper tables/columns, it can also be accomplished using Power Query, available in Windows Excel 2010+ and Excel 365 (Windows or Mac).
To use Power Query
Select some cell in your Data Table
Data => Get&Transform => from Table/Range or from within sheet
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
As we discussed in our Chat, I transform each column into a list of Ranked Entries; then sum the ranks for each row and sort as you have laid out.
M Code
let
Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
//type all the columns
data = Table.TransformColumnTypes(Source,{
{"Order", Int64.Type},
{"Name", type text}} &
List.Transform(List.RemoveFirstN(Table.ColumnNames(Source),2), each {_, type number})
),
//Replace with ranks
//generate list of transforms to dynamically include all columns
cols = List.RemoveFirstN(Table.ColumnNames(data),2),
xForms = List.Transform(cols, (c)=> {c, each List.PositionOf(List.Sort(Table.Column(data,c),Order.Descending),_)}),
ranks = Table.TransformColumns(data,xForms),
//add Index column to enable row-wise sums
// then add the sumRank column and delete the Index column
#"Added Index" = Table.AddIndexColumn(ranks, "Index", 0, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(#"Added Index", "sumRank", each
List.Sum(
Record.ToList(
Record.RemoveFields(#"Added Index"{[Index]},{"Order","Name","Index"})
)
)),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Index"}),
//join back with the original data table
//extract the sumRank column
join = Table.NestedJoin(data,{"Order","Name"}, #"Removed Columns",{"Order","Name"}, "joined",JoinKind.FullOuter),
#"Expanded joined" = Table.ExpandTableColumn(join, "joined", {"sumRank"}, {"sumRank"}),
//sort by the sumRank column, then remove it
#"Sorted Rows" = Table.Sort(#"Expanded joined",{{"sumRank", Order.Ascending}}),
#"Removed Columns1" = Table.RemoveColumns(#"Sorted Rows",{"sumRank"})
in
#"Removed Columns1"
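
As a cross-check of the algorithm described above (rank every column in descending order, sum the ranks per row, sort ascending by that sum), here is a small Python sketch. It is not a translation of the M code line by line, and the sample names and numbers are invented purely for illustration.

```python
def sort_by_rank_sum(rows):
    """rows: list of (name, [values...]); returns rows ordered by summed column ranks."""
    n_cols = len(rows[0][1])
    # For each column, the values sorted from highest to lowest
    sorted_cols = [sorted((r[1][c] for r in rows), reverse=True) for c in range(n_cols)]

    def rank_sum(values):
        # list.index returns the zero-based position of the first match,
        # so tied values share the same rank (as List.PositionOf does)
        return sum(sorted_cols[c].index(v) for c, v in enumerate(values))

    # Lowest rank sum first: that row has the most high values
    return sorted(rows, key=lambda r: rank_sum(r[1]))


# Hypothetical sample data
data = [("Ann", [10, 2, 7]), ("Bob", [9, 9, 9]), ("Cid", [1, 8, 8])]
print([name for name, _ in sort_by_rank_sum(data)])
# ['Bob', 'Ann', 'Cid']
```

Bob wins here even though Ann holds the single highest value, which is the distinction the question draws between "the highest value" and "the most high values".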

This set-up is volatile, so I would only adopt it if non-volatile alternatives are not forthcoming.
An additional column in your table with the following formula:
=SUM(COUNTIF(OFFSET([Column1],,TRANSPOSE(ROW(INDIRECT("1:"&COLUMNS(Table1[@[Column1]:[Column4]])))-1)),">="&Table1[@[Column1]:[Column4]]))
which you can then use to sort your table.
Note that this formula will most likely require committing with CTRL+SHIFT+ENTER for your version of Excel.
Amend the table and column names as required, noting that the part
Table1[@[Column1]:[Column4]]
as well as including the table name, should comprise the leftmost and rightmost of the contiguous columns to be interrogated.

Get values of top N based on sum and condition [duplicate]

I would like to extract the top 5 players based on the sales by each employee (without a Pivot Table or AutoFilter).
Please refer to my input and output screenshot.
Any suggestions on how to obtain the first top 5 ranks (even if repeated, as shown in the screenshots)?
I have verified Extract Top 5 Values for Each Group in a List without VBA and some other links also.
Thanks in advance for your time and consideration! Please let me know if my request is unclear and/or if you have any specific questions.

This is what I use to track the top 5 absentees...
Edit to suit your needs.
Formula in cell A1:
=INDEX(A$13:A$52,AGGREGATE(15,6,ROW($1:$40)/(B$13:B$52=B1),COUNTIF(B$1:B1,B1)))
Formula in cell B1:
=LARGE(B$13:B$52,ROW())

An alternative approach using Power Query which is available in Excel 2010 Professional Plus and all later versions of Excel.
Steps are:
Add your input data table to the Power Query Editor;
Sort the table by Sales (descending), then by Employee (ascending);
Add an Index Column starting from 1;
Filter the Index column to show values less than or equal to 5;
Remove the Index column;
Close & Load the output table to a new worksheet (by default).
Here is the Power Query M code for your reference. All the functions used are available in the GUI, so it should be easy and straightforward.
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Employee", type text}, {"Month", type text}, {"Sales", type number}}),
#"Sorted Rows" = Table.Sort(#"Changed Type",{{"Sales", Order.Descending}, {"Employee", Order.Ascending}}),
#"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 1, 1),
#"Filtered Rows" = Table.SelectRows(#"Added Index", each [Index] <= 5),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Index"})
in
#"Removed Columns"
Let me know if you have any questions. Cheers :)
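
The sort, index, and filter steps of that query can be sketched in Python as follows. This mirrors the query's logic only (it takes the five highest-sales rows, it does not sum sales per employee), and the sample rows are hypothetical.

```python
def top_n(rows, n=5):
    """Return the n rows with the highest Sales; ties are ordered by Employee."""
    # Sales descending, then Employee ascending, as in the "Sorted Rows" step
    ordered = sorted(rows, key=lambda r: (-r["Sales"], r["Employee"]))
    # Keeping the first n rows is equivalent to adding a 1-based index
    # and filtering on Index <= n
    return ordered[:n]


# Hypothetical sample data
rows = [
    {"Employee": "A", "Month": "Jan", "Sales": 100},
    {"Employee": "C", "Month": "Jan", "Sales": 300},
    {"Employee": "B", "Month": "Feb", "Sales": 300},
    {"Employee": "D", "Month": "Feb", "Sales": 50},
]
print([r["Employee"] for r in top_n(rows, n=2)])
# ['B', 'C']
```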

Try this one. As you have in your sample:
On Cell E16:
=VLOOKUP(LARGE($C$3:$C$12,ROW()-15),CHOOSE({2,1},$A$3:$A$12,$C$3:$C$12),2,FALSE)
On Cell F16:
=VLOOKUP(LARGE($C$3:$C$12,ROW()-15),CHOOSE({2,1},$B$3:$B$12,$C$3:$C$12),2,FALSE)
On Cell G16:
=VLOOKUP(LARGE($C$3:$C$12,ROW()-15),$C$3:$C$12,1,FALSE)
You can drag it down to get the list sorted.
Hope it helps!

Replacing newline with new rows in excel sheet

I am working with an Excel sheet where some columns hold several values in a single cell, separated by new lines.
For example, in Fig. 1, Col D and Col E hold several values per cell, one per line, i.e. A = Very Good, Needs Improvement. What I am trying to get is the same data in another form, as shown. Any pointers in this regard would be helpful.

Try using "Get & Transform", also known as Power Query.
Steps:
Select your data and load it (with headers) into PQ.
Add a new custom column (named 'Custom' for example) and use the following custom column formula:
Table.FromColumns({Text.Split([Grades],"#(lf)"), Text.Split([Comment],"#(lf)")})
On the newly created column, click the expand button (top right) and expand both columns.
Delete the original columns 'Grades' and 'Comment'.
Additionally, you could rename the last two columns back to 'Grades' and 'Comment'.
To make things a little easier, you could also just apply the following M code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Custom" = Table.AddColumn(Source, "Custom", each Table.FromColumns({Text.Split([Grades],"#(lf)"), Text.Split([Comment],"#(lf)")})),
#"Expanded {0}" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Column1", "Column2"}, {"Custom.Column1", "Custom.Column2"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded {0}",{"Grades", "Comment"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Custom.Column1", "Grades"}, {"Custom.Column2", "Comment"}})
in
#"Renamed Columns"
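
The same row expansion can be sketched in Python, which may help when checking what the query should produce. This is an illustration only: the function name and the "Names" column are assumptions for the example, and zip_longest is used so that a cell with fewer lines is padded with None, much as Table.FromColumns pads shorter lists with null.

```python
from itertools import zip_longest


def expand_rows(rows, cols=("Grades", "Comment")):
    """Turn each row whose cells hold newline-separated values into one row per line."""
    out = []
    for row in rows:
        # Split every multi-line cell into its individual lines
        parts = [row[c].split("\n") for c in cols]
        # Pair up the n-th line of each column; shorter lists are padded with None
        for combo in zip_longest(*parts):
            new_row = dict(row)               # copy the untouched columns
            new_row.update(zip(cols, combo))  # overwrite the split columns
            out.append(new_row)
    return out


# Hypothetical sample row
rows = [{"Names": "X", "Grades": "A\nB", "Comment": "Very Good\nNeeds Improvement"}]
for r in expand_rows(rows):
    print(r)
```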
Your end result should look like:

Try esProc, split and expand multiline words in an excel cell into multiple rows as following code.
A
1 =file("data.xlsx").xlsimport#t()
2 =A1.run(Grades=Grades.split("\n"),Comment=Comment.split("\n"))
3 =A2.news(Grades.len();Names,Class,Year,Grades(#):Grades,Comment(#):Comment)
4 =file("result.xlsx").xlsexport#t(A3)
For more explanation, see http://c.raqsoft.com/article/1609902051322
DISCLAIMER: This is about our tool esProc. It’s freemium.

Power Query / Power BI get look for data from another excel workbook

I am trying to combine worksheets from two different workbooks with Power Query and I am having trouble doing it.
I do not want to merge the two workbooks.
I do not want to create relationships or joins.
However, I want to get very specific information from one workbook, which has only one column: the "ID" column.
The ID column has rows with letter tags: AB or BE.
Specific numeric ranges are associated with these letters.
For both AB and BE, the number ranges run first from 0000 to 3000 and then from 3001 to 6000.
I thus have the following possibilities:
From AB0000 to AB3000
From AB3001 to AB6000
From BE0000 to BE3000
From BE3001 to BE6000
Each category matches a specific item in my geography column, from the other workbook:
From AB0000 to AB3000, it is ItalyZ
From AB3001 to AB6000, it is ItalyB
From BE0000 to BE3000, it is UKY
From BE3001 to BE6000, it is UKM
I am thus trying to find the highest number associated with the first AB category, the second AB category, the first BE category, and the second BE category.
I then want to bring this number into the other query and increment it each time the matching country is found in the other workbook.
For example :
AB356 is the highest number in the first workbook.
Once the first "ItalyB" is found, the column beside it shows "AB357".
Once the second "ItalyB" is found, the column beside it shows "AB358".
Here is the one columned worksheet:
Here is the other worksheet with the various countries in geography:
Here is an example of results:
I think that this is something which I should work towards:
I added the index column, starting at one, because each row (even row zero) should increment one of the four matching codes.
In order to keep moving forward, I have also been trying to create some sort of mapping in a third Excel sheet, which I imported into Power BI, but I am not sure that this is a good way forward:
I have the following result when I create a blank query:
After a correction, I still get this result when creating the blank query:

This is not an easy answer, as there are many steps to get to your result. I have chosen M (Power Query) because of the complexity.
In Power BI, click on Transform data; you are now in the Power Query Editor.
The table with the IDs (I called it "HighestID") needs expansion because we need to be able to map on the prefix.
You need a mapping table ("GeoMapping"), else there is no relation between the prefixes and the geolocation.
We need the new ID on the Geo table (which I called "Geo").
Expand the HighestID table.
Click on the table and open the Advanced Editor, then compare your code with the code below. The last two steps are essential: they add two columns (Prefix and Number) which we need later.
let
Source = Csv.Document(File.Contents("...\HighestID.csv"),[Delimiter=";", Columns=1, Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"ID", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type1", "Prefix", each Text.Middle([ID],0,2), type text),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Number", each Number.FromText(Text.Middle([ID],2,5)))
in
#"Added Custom1"
Result:
Create mapping table
Right-click under your last table and click Blank Query:
Paste the source below, and ensure the name of the merged table equals the name of your table. As I mentioned, I called it HighestID.
let
Source = #table({"Prefix", "Seq_Start", "Seq_End","GeoLocation"},{{"AB",0,2999,"ItalyZ"},{"AB",3000,6000,"ItalyB"},{"BE",0,2999,"UKY"},{"BE",3000,6000,"UKM"}}),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Seq_Start", Int64.Type}, {"Seq_End", Int64.Type}}),
#"Merged Queries" = Table.NestedJoin(#"Changed Type", {"Prefix"}, HighestID, {"Prefix"}, "HighestID", JoinKind.LeftOuter),
#"Expanded HighestID" = Table.ExpandTableColumn(#"Merged Queries", "HighestID", {"Number"}, {"Number"}),
#"Filtered Rows" = Table.SelectRows(#"Expanded HighestID", each [Number] >= [Seq_Start] and [Number] <= [Seq_End]),
#"Grouped Rows" = Table.Group(#"Filtered Rows", {"Prefix", "Seq_Start", "Seq_End", "GeoLocation"}, {{"NextSeq", each List.Max([Number]) + 1, type number}})
in
#"Grouped Rows"
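
To make the intent of the two queries easier to verify, here is a rough Python sketch of the same idea: split each ID into prefix and number, keep the numbers that fall inside each mapped range, and take the maximum plus one as the next sequence. The function name and the sample IDs are invented for illustration, the prefixes follow the question (AB and BE), and ranges with no matching ID are simply skipped in this sketch.

```python
def next_sequences(ids, mapping):
    """mapping: list of (prefix, start, end, geo); returns {geo: highest number in range + 1}."""
    # Split each ID into a two-letter prefix and its numeric part
    parsed = [(i[:2], int(i[2:])) for i in ids]
    result = {}
    for prefix, start, end, geo in mapping:
        in_range = [n for p, n in parsed if p == prefix and start <= n <= end]
        if in_range:  # skip ranges for which no ID exists yet
            result[geo] = max(in_range) + 1
    return result


# Illustrative mapping and hypothetical IDs
mapping = [
    ("AB", 0, 2999, "ItalyZ"),
    ("AB", 3000, 6000, "ItalyB"),
    ("BE", 0, 2999, "UKY"),
    ("BE", 3000, 6000, "UKM"),
]
print(next_sequences(["AB0356", "AB3100", "BE0012"], mapping))
# {'ItalyZ': 357, 'ItalyB': 3101, 'UKY': 13}
```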
Result:
Adding the NextSeq Column
This is the hard bit: if I only gave you the code, I am afraid it would not work, so here are the steps you need to follow.
1. Select the table, right-click on Geography and click Group By. Select as below:
2. Merge with the GeoMapping table as below:
3. Expand the GeoMapping column with NextSeq.
4. Add a custom column:
5. Remove the columns not needed, so that only the custom column created in step 4 is left.
6. Expand the column (select all). The end result is all the columns you had earlier plus an Index column.
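
The custom column formula itself was only shown in a screenshot, so the following Python sketch is a guess at the intended outcome based on the question: each geography starts from its next sequence number and increments every time that geography appears again. The function name, the prefix lookup, and the sample values are all assumptions for illustration.

```python
def assign_ids(geographies, next_seq, prefix_of):
    """Give each row the next free ID for its geography, incrementing per occurrence."""
    counters = dict(next_seq)  # copy, so the caller's dictionary is left untouched
    out = []
    for geo in geographies:
        out.append(f"{prefix_of[geo]}{counters[geo]}")
        counters[geo] += 1     # the next row with this geography gets the following number
    return out


# Hypothetical inputs
print(assign_ids(
    ["ItalyZ", "UKY", "ItalyZ"],
    {"ItalyZ": 357, "UKY": 13},
    {"ItalyZ": "AB", "UKY": "BE"},
))
# ['AB357', 'BE13', 'AB358']
```

The numbers are written without zero padding here, following the "AB357" example in the question; pad them if your IDs need a fixed width.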
