Enriching Excel data/rows by referencing %shares in another table

In Excel, what VBA code will help me explode/enrich data in table A by applying the % shares in table B to produce the desired output in table C? Not all companies need to be enriched.
screenshot of relevant tables in Excel
I envisage a loop that matches on company name and then enriches Table A by inserting the rows needed to show the resulting dollar amount shared by each team.

I'd use Power Query for that, not VBA. Load both Table A and Table B into Power Query, then create a query that merges the two tables on the company column using a full outer join. Expand the merged column to extract the Team and PercShare columns, add a new column that calculates the $ value, delete the columns that are no longer required, and replace the null values in the Team column with empty text. The result looks like this:
The M code, generated by clicking the buttons in the Power Query Editor and entering one if statement manually, looks like this:
let
Source = Table.NestedJoin(TableA,{"company"},TableB,{"company"},"NewColumn",JoinKind.FullOuter),
#"Expanded NewColumn" = Table.ExpandTableColumn(Source, "NewColumn", {"Team", "PercShare"}, {"Team", "PercShare"}),
#"Added Custom" = Table.AddColumn(#"Expanded NewColumn", "Dollars", each if [Team] = null then [dollar] else [dollar] *([PercShare]/100)),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"PercShare"}),
#"Replaced Value" = Table.ReplaceValue(#"Removed Columns",null,"",Replacer.ReplaceValue,{"Team"})
in
#"Replaced Value"
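To try the merge without the workbook, the sketch below builds two small inline tables first. The sample rows and the step names are made up for illustration; only the column names follow the code above. It uses a left outer join instead of the full outer join, on the assumption that every company in Table B also appears in Table A.

```powerquery
let
    // Hypothetical sample data standing in for Table A and Table B
    TableA = #table({"company", "dollar"}, {{"Acme", 1000}, {"Globex", 500}}),
    TableB = #table({"company", "Team", "PercShare"},
        {{"Acme", "North", 60}, {"Acme", "South", 40}}),
    // Left outer join keeps every Table A row, with or without shares
    Merged = Table.NestedJoin(TableA, {"company"}, TableB, {"company"},
        "Shares", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Shares", {"Team", "PercShare"}),
    // Companies without shares keep their full amount
    WithDollars = Table.AddColumn(Expanded, "Dollars",
        each if [PercShare] = null then [dollar] else [dollar] * [PercShare] / 100,
        type number),
    Result = Table.SelectColumns(WithDollars, {"company", "Team", "Dollars"})
in
    Result
```

With this sample data, Acme is split into a North row of 600 and a South row of 400, while Globex keeps a single row of 500 with no team.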

Related

How to convert categorical values into columns in Excel?

I am working with a dataset that is structured like the one below. As you can see, the indicator column contains binary categorical data.
country_code indicator cumulative_count
AFG cases 52909
AFG deaths 2230
... ... ...
I would like to turn the indicator column into two separate columns (corresponding with the values of indicator: cases and deaths). I.e. I'm expecting the final result to be like this:
country_code cases deaths
AFG 52909 2230
... ... ...
Notes:
The original dataset is publicly accessible from the ECDC website.
I am only interested in the cumulative_count of one specific year_week (2020-53).
Here is a screenshot of the dataset:

This can also be accomplished using Power Query, available in Windows Excel 2010+ and Excel 365 (Windows or Mac)
To use Power Query
Load your data table into Excel
Select some cell in your Data Table
Data => Get&Transform => from Table/Range or from within sheet
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
let
//Read in the table
//Change table name in next line to your actual table name
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
//Remove the unneeded columns
#"Removed Other Columns" = Table.SelectColumns(Source,{"country_code", "indicator", "year_week", "cumulative_count"}),
//Set the data types for those columns
#"Set Data Type" = Table.TransformColumnTypes(#"Removed Other Columns",{
{"country_code", type text}, {"indicator", type text},{"year_week", type text},{"cumulative_count", Int64.Type}
}),
//Pivot the Indicator column and aggregate by Sum
#"Pivoted Column" = Table.Pivot(#"Set Data Type",
List.Distinct(#"Removed Other Columns"[indicator]), "indicator", "cumulative_count", List.Sum),
//Filter to show only the relevant year-week for rows where there is a country_code
// (the others refer to continents)
#"Filtered Rows" = Table.SelectRows(#"Pivoted Column", each ([country_code] <> null) and ([year_week] = "2020-53"))
in
#"Filtered Rows"
filtered to show just 2020-53
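The pivot step can also be tried in isolation. This is a minimal sketch that uses only the two sample rows from the question as an inline table, so no workbook table is needed; the step names are illustrative.

```powerquery
let
    // The two sample rows shown in the question
    Source = #table({"country_code", "indicator", "cumulative_count"},
        {{"AFG", "cases", 52909}, {"AFG", "deaths", 2230}}),
    // Each distinct indicator value becomes its own column
    Pivoted = Table.Pivot(Source, List.Distinct(Source[indicator]),
        "indicator", "cumulative_count", List.Sum)
in
    Pivoted
```

This returns a single AFG row with cases = 52909 and deaths = 2230.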

If I'm understanding your question correctly, here is one way:
Add a new column F
Formula in $F$2: =SUMIFS($D$2:$D$9999, $B$2:$B$9999, $B2, $E$2:$E$9999, "deaths")
Copy the formula down through the last record
Filter column E for "cases"
If you then insert rows above the header row, you can use SUBTOTAL(109, ...) to view cumulative counts for a specific year, or alternatively add another column with SUMIFS as shown above.

How can I get the rows with the most cells that have the highest (or the lowest) values

I have a table of data consisting of 18 columns and 2,017 rows. I can get the row that has the highest (MAX) value in a cell, but I need the rows that have the most cells with high values, in descending order. I haven't yet managed to find a relevant post on this.
Here follows an example:
Using numbers up to 10 for illustration, the following shows the underlying logic. (The actual numbers are those shown in Exhibit1.)
Thank you
EDIT:
I am adding the below in order to try to clarify further. I am not sure if it is the correct path to go but I hope it makes sense.
In Exhibit2 I am indexing each column in descending order (based on Exhibit1) and then applying =SUM at the end of each row. Following this logic, the name with the lowest total is the one with the most high values (not the single highest value) in its row.
The result table is the following

Although possible with formulas and helper tables/columns, this can also be accomplished using Power Query, available in Windows Excel 2010+ and Excel 365 (Windows or Mac)
To use Power Query
Select some cell in your Data Table
Data => Get&Transform => from Table/Range or from within sheet
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
As we discussed in our Chat, I transform each column into a list of Ranked Entries; then sum the ranks for each row and sort as you have laid out.
M Code
let
Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
//type all the columns
data = Table.TransformColumnTypes(Source,{
{"Order", Int64.Type},
{"Name", type text}} &
List.Transform(List.RemoveFirstN(Table.ColumnNames(Source),2), each {_, type number})
),
//Replace with ranks
//generate list of transforms to dynamically include all columns
cols = List.RemoveFirstN(Table.ColumnNames(data),2),
xForms = List.Transform(cols, (c)=> {c, each List.PositionOf(List.Sort(Table.Column(data,c),Order.Descending),_)}),
ranks = Table.TransformColumns(data,xForms),
//add Index column to enable row-wise sums
// then add the sumRank column and delete the Index column
#"Added Index" = Table.AddIndexColumn(ranks, "Index", 0, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(#"Added Index", "sumRank", each
List.Sum(
Record.ToList(
Record.RemoveFields(#"Added Index"{[Index]},{"Order","Name","Index"})
)
)),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Index"}),
//join back with the original data table
//extract the sumRank column
join = Table.NestedJoin(data,{"Order","Name"}, #"Removed Columns",{"Order","Name"}, "joined",JoinKind.FullOuter),
#"Expanded joined" = Table.ExpandTableColumn(join, "joined", {"sumRank"}, {"sumRank"}),
//sort by the sumRank column, then remove it
#"Sorted Rows" = Table.Sort(#"Expanded joined",{{"sumRank", Order.Ascending}}),
#"Removed Columns1" = Table.RemoveColumns(#"Sorted Rows",{"sumRank"})
in
#"Removed Columns1"
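The ranking idea in the xForms step can be seen on a single list. This is a small sketch with made-up values: each value is replaced by its position in the list sorted in descending order.

```powerquery
let
    // Hypothetical column values
    values = {7, 10, 3, 10},
    // Sorted descending: {10, 10, 7, 3}
    sorted = List.Sort(values, Order.Descending),
    // Zero-based position of each value in the sorted list
    ranks = List.Transform(values, each List.PositionOf(sorted, _))
in
    ranks
```

The result is {2, 0, 3, 0}. List.PositionOf returns the first match, so tied values share the same rank.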

This set-up is volatile, so I would only adopt it if non-volatile alternatives are not forthcoming.
An additional column in your table with the following formula:
=SUM(COUNTIF(OFFSET([Column1],,TRANSPOSE(ROW(INDIRECT("1:"&COLUMNS(Table1[@[Column1]:[Column4]])))-1)),">="&Table1[@[Column1]:[Column4]]))
which you can then use to sort your table.
Note that this formula will most likely require committing with CTRL+SHIFT+ENTER for your version of Excel.
Amend the table and column names as required, noting that the part
Table1[@[Column1]:[Column4]]
as well as including the table name, should comprise the leftmost and rightmost of the contiguous columns to be interrogated.

How to combine multiple columns from a table

My issue is the following: I have a table where I have multiple columns that have date and values but represent different things. Here is an example for my headers:
| Customer name | Type of Service | Payment 1 date | Payment 1 amount | Payment 2 date | Payment 2 amount | Payment 3 date | Payment 3 amount | Payment 4 date | Payment 4 amount |
What I want to do is sumifs the table based on multiple criteria. For example:
| Type of Service | Month 1 | Month 2 | Month 3 | Month 4 |
Service 1
Service 2
Service 3
The thing is that I do not want to write 4 SUMIFS (in this case; in fact I have more than 4 sets of date:value columns).
I was thinking of creating a new table where I could put all the columns below each other (in one table with 4 columns - Customer name, Type of Service, Date and Payment) but the table should be dynamically created, meaning that it should be expanded dynamically with the new entries in the original table (i.e. if the original table has 200 entries, this would make the new table with 4x200=800 entries, if the original table has one more record then the new table should have 4x201=804 records).
I also checked the PowerQuery option but could not get my head around it.
So any help on the matter will be highly appreciated.
Thank you.

You can certainly create your four column table using Power Query. However, I suspect you may be able to also generate your final report using PQ, so you could add that to this code, if you wish.
And it will update but would require a "Refresh" to do the updating.
The "Refresh" could be triggered by
User selecting the Data/Refresh option
A button on the worksheet which user would have to press.
A VBA event-triggered macro
In any event, making the query adaptable to different numbers of columns requires more M code than can be generated from the UI, as well as a custom function.
The algorithm below depends on the data being in this format:
Columns 1 and 2 would be Customer | Type of Service
Remaining columns would alternate between date | amount and be labelled: Payment N date | Payment N amount, where N is some number
If the real data is not in that format, some changes to the code may be necessary.
To use Power Query:
Select some cell in your Data Table
Data => Get&Transform => from Table/Range
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
To enter the Custom Function, while in the PQ Editor:
Right click in the Queries Pane
Add New Query from Blank Query
Paste the custom function code into the Advanced Editor
Rename the query fnPivotAll
M Code
let
//Change Table name in next line to be the Actual table name in your workbook
Source = Excel.CurrentWorkbook(){[Name="Table8"]}[Content],
/*set datatypes dynamically with
first two columns as Text
and subsequent columns alternating as Date and Currency*/
textType = List.Transform(List.FirstN(Table.ColumnNames(Source),2), each {_,Text.Type}),
otherType = List.RemoveFirstN(Table.ColumnNames(Source),2),
dateType = List.Transform(
List.Alternate(otherType,1,1,1), each {_, Date.Type}),
currType = List.Transform(
List.Alternate(otherType,1,1,0), each {_, Currency.Type}),
colTypes = List.Combine({textType, dateType, currType}),
typeIt = Table.TransformColumnTypes(Source,colTypes),
//Unpivot all except first two columns
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(typeIt, List.FirstN(Table.ColumnNames(Source),2), "Attribute", "Value"),
//Remove "Payment n " from attribute column
remPmtN = Table.TransformColumns(#"Unpivoted Other Columns",{{"Attribute", each Text.Split(_," "){2}, Text.Type}}),
//Pivot on the Attribute column without aggregation using Custom Function
pivotAll = fnPivotAll(remPmtN,"Attribute","Value"),
typeIt2 = Table.TransformColumnTypes(pivotAll,{{"date", Date.Type},{"amount", Currency.Type}})
in
typeIt2
Custom Function: fnPivotAll
//credit: Cam Wallace https://www.dingbatdata.com/2018/03/08/non-aggregate-pivot-with-multiple-rows-in-powerquery/
(Source as table,
ColToPivot as text,
ColForValues as text)=>
let
PivotColNames = List.Buffer(List.Distinct(Table.Column(Source,ColToPivot))),
#"Pivoted Column" = Table.Pivot(Source, PivotColNames, ColToPivot, ColForValues, each _),
TableFromRecordOfLists = (rec as record, fieldnames as list) =>
let
PartialRecord = Record.SelectFields(rec,fieldnames),
RecordToList = Record.ToList(PartialRecord),
Table = Table.FromColumns(RecordToList,fieldnames)
in
Table,
#"Added Custom" = Table.AddColumn(#"Pivoted Column", "Values", each TableFromRecordOfLists(_,PivotColNames)),
#"Removed Other Columns" = Table.RemoveColumns(#"Added Custom",PivotColNames),
#"Expanded Values" = Table.ExpandTableColumn(#"Removed Other Columns", "Values", PivotColNames)
in
#"Expanded Values"
Sample Data
Output
If this does not give you what you require, or if you have issues going further with it to generate your desired reports, post back.
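The dynamic typing step in the main query relies on List.Alternate to separate the date columns from the amount columns. Here is a small sketch of just that part, using a hypothetical list of column names in the format described above.

```powerquery
let
    // Hypothetical payment columns that follow the first two text columns
    otherCols = {"Payment 1 date", "Payment 1 amount",
                 "Payment 2 date", "Payment 2 amount"},
    // Offset 1: keep the first item, then skip one, keep one, and so on
    dateCols = List.Alternate(otherCols, 1, 1, 1),
    // Offset 0: skip the first item, then keep one, skip one, and so on
    amountCols = List.Alternate(otherCols, 1, 1, 0)
in
    [Dates = dateCols, Amounts = amountCols]
```

Dates holds the two date column names and Amounts holds the two amount column names, which is why the query needs the columns to alternate strictly between date and amount.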

Power Query / Power BI: look up data from another Excel workbook

I am trying to combine worksheets from two different workbooks with Power Query and I have trouble doing that.
I do not want to merge the two workbooks.
I do not want to create relationships or "joins".
However, I want to get very specific information from one workbook, which has only one column: the "ID" column.
The ID column has rows with letter tags: AB or BE.
Following these letters are specific numeric ranges.
For both AB and BE, the numbers range first from 0000 to 3000 and then from 3001 to 6000.
I thus have the following possibilities:
From AB0000 to AB3000
From AB3001 to AB6000
From BE0000 to BE3000
From BE3001 to BE6000
Each category maps to a specific item in my geography column, from the other workbook:
From AB0000 to AB3000, it is ItalyZ
From AB3001 to AB6000, it is ItalyB
From BE0000 to BE3000, it is UKY
From BE3001 to BE6000, it is UKM
I am thus trying to find the highest number associated with the first AB category, the second AB category, the first BE category, and the second BE category.
I then want to "bring" this number into the other query and increment it each time the matching country is found in the other workbook.
For example :
AB356 is the highest number in the first workbook.
Once the first "ItalyB" is found, the column beside it shows "AB357".
Once the second "ItalyB" is found, the column beside it shows "AB358".
Here is the one columned worksheet:
Here is the other worksheet with the various countries in geography:
Here is an example of results:
have one column (geography) with
I think that this is something which I should work towards:
I added the index column, starting at one, because each row (even row zero) should increment one of the four matching codes.
In order to keep moving forward, I have also been trying to create some sort of mapping in a third Excel sheet, which I imported into Power BI, but I am not sure that this is a good way forward:
I have the following result when I create a blank query:
After a correction, I still get this result when creating the blank query:

This is not an easy answer, as there are many steps to get to your result. I have chosen M (Power Query) because of the complexity.
In Power BI, click on Transform data; you are now in the Power Query Editor.
The table with the IDs (I called it "HighestID") needs expansion because we need to be able to map on the prefix.
You need a mapping table ("GeoMapping"); otherwise there is no relation between the prefixes and the geolocation.
We need the new ID on the Geo table (which I called "Geo").
Expand the HighestID table.
Click on the table, open the Advanced Editor, and compare your code to the code below. The last 2 steps are essential: they add two columns (Prefix and Number) which we need later.
let
Source = Csv.Document(File.Contents("...\HighestID.csv"),[Delimiter=";", Columns=1, Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"ID", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type1", "Prefix", each Text.Middle([ID],0,2), type text),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Number", each Number.FromText(Text.Middle([ID],2,5)))
in
#"Added Custom1"
Result:
Create mapping table
Right-click under your last table and click Blank Query:
Paste the source below, and ensure the name of the merged table equals the name of your table. As I mentioned, I called it HighestID.
let
Source = #table({"Prefix", "Seq_Start", "Seq_End","GeoLocation"},{{"AB",0,3000,"ItalyZ"},{"AB",3001,6000,"ItalyB"},{"BE",0,3000,"UKY"},{"BE",3001,6000,"UKM"}}),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Seq_Start", Int64.Type}, {"Seq_End", Int64.Type}}),
#"Merged Queries" = Table.NestedJoin(#"Changed Type", {"Prefix"}, HighestID, {"Prefix"}, "HighestID", JoinKind.LeftOuter),
#"Expanded HighestID" = Table.ExpandTableColumn(#"Merged Queries", "HighestID", {"Number"}, {"Number"}),
#"Filtered Rows" = Table.SelectRows(#"Expanded HighestID", each [Number] >= [Seq_Start] and [Number] <= [Seq_End]),
#"Grouped Rows" = Table.Group(#"Filtered Rows", {"Prefix", "Seq_Start", "Seq_End", "GeoLocation"}, {{"NextSeq", each List.Max([Number]) + 1, type number}})
in
#"Grouped Rows"
Result:
Adding the NextSeq Column
This is the hard part: if I only gave you the code, I am afraid it would not work, so here are the steps you need to follow.
1. Select the table, right-click on Geography and click Group By. Select as below:
2. Merge with the GeoMapping table as below:
3. Expand the GeoMapping column with NextSeq.
4. Add a custom column:
5. Remove the columns that are not needed, so that only the custom column created in step 4 is left.
6. Expand the column (select all). The end result has all the columns you had earlier plus an Index column.
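For reference, the steps above could look roughly like this in M. This is only a sketch under assumptions: the Geo and GeoMapping tables are replaced by small inline samples with made-up NextSeq values, the step and column names (Rows, Map, NewID) are illustrative, and the 4-digit zero padding of the number is a guess at the ID format.

```powerquery
let
    // Hypothetical stand-ins for the Geo table and the GeoMapping result
    Geo = #table({"Geography"}, {{"ItalyB"}, {"UKY"}, {"ItalyB"}}),
    GeoMapping = #table({"Prefix", "GeoLocation", "NextSeq"},
        {{"AB", "ItalyB", 3051}, {"BE", "UKY", 12}}),
    // Step 1: group by geography, keeping each group's rows as a nested table
    Grouped = Table.Group(Geo, {"Geography"}, {{"Rows", each _, type table}}),
    // Steps 2 and 3: merge with the mapping and expand Prefix and NextSeq
    Merged = Table.NestedJoin(Grouped, {"Geography"}, GeoMapping, {"GeoLocation"},
        "Map", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Map", {"Prefix", "NextSeq"}),
    // Step 4: number the rows of each group, starting at that group's NextSeq
    Numbered = Table.AddColumn(Expanded, "Custom", each
        Table.AddColumn(
            Table.AddIndexColumn([Rows], "Index", [NextSeq], 1),
            "NewID",
            (r) => [Prefix] & Text.PadStart(Text.From(r[Index]), 4, "0"))),
    // Steps 5 and 6: keep only the custom column and expand it
    Result = Table.Combine(Numbered[Custom])
in
    Result
```

With these sample values, the two ItalyB rows get AB3051 and AB3052, and the UKY row gets BE0012.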

Remove duplicates from single column power query

From this original table,
I made a second table (using power query).
This second table is to be used for data validation purposes, and I need it to depend on the first table so that any changes will follow through. The problem I'm running into is that my second table is not quite how I want it, I would like to remove any duplicates from each individual column. When I try to remove duplicates in power query, it removes whole rows (which makes sense, I agree), is there a way to remove duplicates from single columns?
Here's the M code I'm using to get from table1 to table2
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Removed Columns" = Table.RemoveColumns(Source,{"Grade", "fb", "fv", "fc", "fcp", "ft", "E", "E05"}),
#"Grouped Rows" = Table.FromColumns(Table.Group(#"Removed Columns", {"Catégorie"}, {{"Count", each List.InsertRange([Essence],0,List.Distinct([Catégorie]))}})[Count]),
#"Promoted Headers" = Table.PromoteHeaders(#"Grouped Rows", [PromoteAllScalars=true])
in
#"Promoted Headers"

If you have a Source table with columns A, B, and C and want to return a table of each column with duplicates removed, then you can write M code like this:
= Table.FromColumns({
List.Distinct(Source[A]),
List.Distinct(Source[B]),
List.Distinct(Source[C])},
{"A","B","C"})
More generically (without using explicit column names), you can do it in a few steps like this:
ToColumns = Table.FromList(Table.ToColumns(Source), Splitter.SplitByNothing(), null, null, ExtraValues.Error),
RemoveDuplicates = Table.TransformColumns(ToColumns, {{"Column1", each List.Distinct(_)}}),
FromColumns = Table.FromColumns(RemoveDuplicates[Column1], Table.ColumnNames(Source))
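As a quick check of the generic approach, here is a compact sketch on an inline table (the sample data is made up). It deduplicates every column independently and rebuilds the table in one expression.

```powerquery
let
    // Hypothetical sample: column A has 2 distinct values, column B has 3
    Source = #table({"A", "B"}, {{1, "x"}, {1, "y"}, {2, "x"}, {2, "z"}}),
    // Split into column lists, deduplicate each one, then rebuild the table
    Result = Table.FromColumns(
        List.Transform(Table.ToColumns(Source), each List.Distinct(_)),
        Table.ColumnNames(Source))
in
    Result
```

The result has three rows: (1, "x"), (2, "y") and (null, "z"). Table.FromColumns pads the shorter column with null, which is worth keeping in mind if the output feeds a data validation list.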

If you are using a recent version of Excel, highlight the column you want, go to the Data menu at the top, and press Remove Duplicates.
It removes duplicate values only from the selected column.
first:
then:
