Excel Power Query: split table into top / bottom 50 percent

I have an example table in Excel to illustrate my question.
Two columns (first name, last name), 11 rows and a header row.
I would like to use Get & Transform (Power Query) to link to another sheet in the same workbook, where I would like to have two tables, A and B, with the same structure as the source table. I would like A to display rows 1-6 and B to display rows 7-11.
BUT: I would like this split to be dynamic, so that A displays the top 50% of the rows (rounded up) and B displays the rest. I've seen the top N rows option and read some posts about counting rows in a separate query.

Top Half:
let
    Source = Excel.CurrentWorkbook(){[Name="SourceTable"]}[Content],
    // number of rows in the top half, rounded up (11 rows -> 6)
    TopHalfRows = Number.RoundUp(Table.RowCount(Source) / 2),
    KeepTopHalf = Table.FirstN(Source, TopHalfRows)
in
    KeepTopHalf
Bottom Half:
let
    Source = Excel.CurrentWorkbook(){[Name="SourceTable"]}[Content],
    // same count as the top-half query, so the two halves never overlap
    TopHalfRows = Number.RoundUp(Table.RowCount(Source) / 2),
    DeleteTopHalf = Table.Skip(Source, TopHalfRows)
in
    DeleteTopHalf
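The two queries above share one piece of arithmetic: round half the row count up, keep that many rows for A, and skip that many rows for B. As a quick sanity check of that logic (a Python sketch, not M; the function name and sample rows are made up for illustration):

```python
import math


def split_halves(rows):
    """Return (top, bottom); the top half gets the extra row when the count is odd."""
    # Same arithmetic as Number.RoundUp(Table.RowCount(Source) / 2)
    top_count = math.ceil(len(rows) / 2)
    # rows[:n] plays the role of Table.FirstN, rows[n:] the role of Table.Skip
    return rows[:top_count], rows[top_count:]


# Eleven hypothetical (first name, last name) rows, as in the question
names = [(f"First{i}", f"Last{i}") for i in range(1, 12)]
top, bottom = split_halves(names)
print(len(top), len(bottom))  # 6 5
```

Because both halves are derived from the same count, every row lands in exactly one of the two tables, however many rows the source has.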
EDIT:
This shows how to amend the query by adding a filter step before splitting:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Filtered Rows" = Table.SelectRows(Source, each Text.StartsWith([firstname], "Ab")),
TopHalfRows = Number.RoundUp(Table.RowCount(#"Filtered Rows") / 2),
KeepTopHalf = Table.FirstN(#"Filtered Rows", TopHalfRows)
in
KeepTopHalf
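The amended query only shows the top half. Presumably the bottom-half query needs the same filter step, with Table.Skip applied to #"Filtered Rows" instead of Source; otherwise the row count and the skipped rows come from different tables. A Python sketch of that reasoning (the helper name and the sample names are hypothetical):

```python
import math


def split_filtered(rows, keep):
    """Apply the same filter before computing both halves."""
    filtered = [r for r in rows if keep(r)]   # like Table.SelectRows
    top_count = math.ceil(len(filtered) / 2)  # count the *filtered* rows
    return filtered[:top_count], filtered[top_count:]


rows = [("Abby", "Lee"), ("Bob", "Ray"), ("Abe", "Fox"), ("Abel", "Orr"), ("Cara", "Day")]
top, bottom = split_filtered(rows, lambda r: r[0].startswith("Ab"))
print(top)     # [('Abby', 'Lee'), ('Abe', 'Fox')]
print(bottom)  # [('Abel', 'Orr')]
```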

Related

Reference a sheet in my workbook as my power query data source

After a long search, I found the M code below, which references data in a sheet and uses it as the source for my query. The data is in Sheet 1 of the same workbook that contains my queries, and it is a simple XLS report exported from SAP. The reason I don't use a table is that people may paste the SAP export into the sheet without using a table.
Please let me know whether it is reliable and won't cause errors.
Also, how do I change the first line below so that it uses the first sheet of the current workbook as the source, instead of a workbook loaded from a folder path, which I don't need?
let
    Source = Excel.Workbook(File.Contents("C:\...\Downloads\Test.xlsx"), null, true),
    Sheet1_Sheet = Source{[Item="SAP",Kind="Sheet"]}[Data],
    // Find the cell containing `header` within the first rowsToCheck rows,
    // then drop the rows above it and the columns to its left
    fTrimTable = (tbl as table, header as text) =>
        let
            t = Table.Buffer(tbl),
            columns = List.Buffer(Table.ColumnNames(t)),
            rowsToCheck = 100,
            // >= 0 so that a header sitting in the very first row is also found
            Column = List.Select(columns, each List.PositionOf(List.FirstN(Table.Column(t, _), rowsToCheck), header) >= 0){0},
            Row = List.PositionOf(Table.Column(t, Column), header),
            ScrollRows = Table.RemoveFirstN(t, Row),
            ScrollColumns = Table.SelectColumns(ScrollRows, List.RemoveFirstN(columns, List.PositionOf(columns, Column))),
            #"Promoted Headers" = Table.PromoteHeaders(ScrollColumns, [PromoteAllScalars=true])
        in
            #"Promoted Headers",
    Trimmed = fTrimTable(Sheet1_Sheet, "Header100")
in
    Trimmed
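The idea behind fTrimTable is independent of M: scan the first rowsToCheck cells of each column for the header text, then cut away everything above and to the left of the first hit. Note that List.PositionOf returns -1 when the value is missing and 0 when it is in the first row, so the column test has to accept position 0. A Python sketch of the same steps on a list-of-rows grid (the function name and the sample sheet are hypothetical):

```python
def trim_table(grid, header, rows_to_check=100):
    """Find `header` within the first rows_to_check rows of any column, drop the
    rows above it and the columns to its left, and use that row as the header row."""
    width = max((len(r) for r in grid), default=0)
    padded = [list(r) + [None] * (width - len(r)) for r in grid]  # even out ragged rows
    for col in range(width):
        first_cells = [row[col] for row in padded[:rows_to_check]]
        # `in` succeeds at any position, including index 0 (the ">= 0" case)
        if header in first_cells:
            start_row = first_cells.index(header)
            trimmed = [row[col:] for row in padded[start_row:]]
            return trimmed[0], trimmed[1:]  # (promoted headers, data rows)
    raise ValueError(f"{header!r} not found in the first {rows_to_check} rows")


sheet = [
    ["SAP export", None, None],
    [None, None, None],
    [None, "Header100", "Qty"],
    [None, "A", 1],
    [None, "B", 2],
]
headers, data = trim_table(sheet, "Header100")
print(headers)  # ['Header100', 'Qty']
print(data)     # [['A', 1], ['B', 2]]
```

In M, the equivalent of the "not found" case is the {0} lookup on an empty list, which raises an error rather than returning a partial table.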

How to convert categorical values into columns in Excel?

I am working with a dataset that is structured like the one below. As you can see, the indicator column contains binary categorical data.
country_code | indicator | cumulative_count
AFG | cases | 52909
AFG | deaths | 2230
... | ... | ...
I would like to turn the indicator column into two separate columns (corresponding with the values of indicator: cases and deaths). I.e. I'm expecting the final result to be like this:
country_code | cases | deaths
AFG | 52909 | 2230
... | ... | ...
Notes:
The original dataset is publicly accessible from the ECDC website.
I am only interested in the cumulative_count of one specific year_week (2020-53).
Here is a screenshot of the dataset:
This can also be accomplished using Power Query, available in Windows Excel 2010+ and Excel 365 (Windows or Mac).
To use Power Query
Load your data table into Excel
Select some cell in your Data Table
Data => Get&Transform => from Table/Range or from within sheet
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
let
//Read in the table
//Change table name in next line to your actual table name
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
//Remove the unneeded columns
#"Removed Other Columns" = Table.SelectColumns(Source,{"country_code", "indicator", "year_week", "cumulative_count"}),
//Set the data types for those columns
#"Set Data Type" = Table.TransformColumnTypes(#"Removed Other Columns",{
{"country_code", type text}, {"indicator", type text},{"year_week", type text},{"cumulative_count", Int64.Type}
}),
//Pivot the Indicator column and aggregate by Sum
#"Pivoted Column" = Table.Pivot(#"Set Data Type",
List.Distinct(#"Removed Other Columns"[indicator]), "indicator", "cumulative_count", List.Sum),
//Filter to show only the relevant year-week, and only rows where there is a country_code
// (the others refer to continents)
#"Filtered Rows" = Table.SelectRows(#"Pivoted Column", each ([country_code] <> null) and ([year_week] = "2020-53"))
in
#"Filtered Rows"
Result (screenshot), filtered to show just 2020-53.
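What the query does is a pivot with aggregation: each distinct indicator becomes a column, and cumulative_count is summed per country_code and year_week. Because those two columns remain as row keys after the pivot, filtering before or after pivoting keeps the same rows. A Python sketch of that logic; only the two AFG 2020-53 values come from the question, the other records are made-up placeholders:

```python
from collections import defaultdict


def cases_and_deaths(records, year_week):
    """Pivot `indicator` into one column per value, summing cumulative_count,
    keeping only rows that have a country_code and match year_week."""
    table = defaultdict(dict)
    for rec in records:
        # Rows with no country_code are continent aggregates in this dataset
        if rec["country_code"] is None or rec["year_week"] != year_week:
            continue
        row = table[rec["country_code"]]
        # Summing plays the role of List.Sum in Table.Pivot
        row[rec["indicator"]] = row.get(rec["indicator"], 0) + rec["cumulative_count"]
    return dict(table)


records = [
    {"country_code": "AFG", "indicator": "cases", "year_week": "2020-53", "cumulative_count": 52909},
    {"country_code": "AFG", "indicator": "deaths", "year_week": "2020-53", "cumulative_count": 2230},
    # made-up placeholder rows: another week, and a continent row without a code
    {"country_code": "AFG", "indicator": "cases", "year_week": "2020-52", "cumulative_count": 1},
    {"country_code": None, "indicator": "cases", "year_week": "2020-53", "cumulative_count": 1},
]
print(cases_and_deaths(records, "2020-53"))  # {'AFG': {'cases': 52909, 'deaths': 2230}}
```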
If I'm understanding your question correctly, here is one way:
Add a new column F
Formula in $F$2: =SUMIFS($D2:$D$9999, $B2:$B$9999, $B2, $E2:$E$9999, "deaths")
Copy the formula down through the last record
Filter column E for "cases"
If you then insert rows above the header row, you can use SUBTOTAL(109, ...) to view cumulative counts for a specific year, or alternatively add another column with SUMIFS as shown above.

Transpose Excel Table Horizontally, but also multiply the number of rows for the first column?

Sorry if this has been asked before, but I was unable to find anything specific to this need, probably due to it being odd to phrase. Essentially I have a table like this that I would like to transform into the table below:
zip | blue | green
10000 | 1 | 2

zip | color
10000 | blue
10000 | green
Ideally, I would like to do this all purely in SQL or purely in Excel, but eventually I will want to transpose through R or Python once I get more familiar.
You can do this easily in Power Query, available in Windows Excel 2010+, and Excel 365.
Select some single cell in your source table
Data => Get Data => from within sheet
In the PQ UI, select the Zip column
Transform => Unpivot => Unpivot other columns
Delete the Value column
Rename the Attribute Column => Color
Home => Close & Load
Here is M-Code that will do the same thing, with some changes so as not to have to hard-code other colors that you might have besides the two you show.
You would paste this into the Advanced Editor of PQ; change the Table name at the top. Then read the comments and explore the Applied Steps to better understand the algorithm.
let
//Change table name in next line to the actual name in your workbook
Source = Excel.CurrentWorkbook(){[Name="Table29"]}[Content],
//set data types
// Zip is text to retain leading zeros
// Others are all integers
#"Changed Type" = Table.TransformColumnTypes(Source,
{{"zip", type text}} & List.Transform(List.RemoveFirstN(Table.ColumnNames(Source),1), each {_, Int64.Type})),
//Unpivot all columns except for the Zip column
// and name the attribute column "Color"
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"zip"}, "Color", "Value"),
//Remove the value column since you do not show it in your result example
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Value"})
in
#"Removed Columns"
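"Unpivot other columns" turns every column except zip into a (zip, column name) row, and the Value column is then thrown away. A Python sketch of that transformation (the function name is hypothetical). It also skips null cells, which as far as I know matches how Power Query's unpivot treats nulls, so a zip with an empty color cell gets no row for that color:

```python
def unpivot_other_columns(rows, keep, attribute="Color"):
    """Turn every column except `keep` into (keep, attribute) rows; the value
    itself is dropped, and null cells produce no row."""
    result = []
    for row in rows:
        for column, value in row.items():
            if column == keep or value is None:
                continue
            result.append({keep: row[keep], attribute: column})
    return result


source = [{"zip": "10000", "blue": 1, "green": 2}]
print(unpivot_other_columns(source, "zip"))
# [{'zip': '10000', 'Color': 'blue'}, {'zip': '10000', 'Color': 'green'}]
```

Nothing in the sketch names "blue" or "green", which is the same reason the M code above does not hard-code the colors.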

Move all text values with the same ID field to separate columns

I would like to move all the comments (range B3:B14) into new columns against each unique ID (range A3:A14).
The Desired Format shows the layout that I would like to get to.
Hopefully that makes sense.
EDIT: This will do what you want using VBA:
Option Explicit
Sub TransposeComments()
    Dim inSR%, inTR%, inTC%, rgSource As Range, rgTarget As Range
    Set rgSource = Range("A3") 'Change this if the 1st ID in the source table is moved
    Set rgTarget = Range("D3") 'Change this to start populating at another start point
    inTR = -1
    Do
        ' An ID that differs from the cell above starts a new output row
        If rgSource.Offset(inSR) <> rgSource.Offset(inSR - 1) Then
            inTR = inTR + 1: inTC = 2
            rgTarget.Offset(inTR) = rgSource.Offset(inSR)
            rgTarget.Offset(inTR, 1) = rgSource.Offset(inSR, 1)
        Else
            ' Same ID as the row above: put the comment in the next free column
            rgTarget.Offset(inTR, inTC) = rgSource.Offset(inSR, 1)
            inTC = inTC + 1
        End If
        inSR = inSR + 1
    ''' End on 1st empty ID (assumes IDs in source data are contiguous and nothing is below them)
    Loop Until rgSource.Offset(inSR) = ""
End Sub
I've assumed you know how to implement and call/run the VBA. If not, let me know and I'll try to help with that. :)
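The macro walks down column A once: an ID that differs from the cell above opens a new output row, and a repeated ID appends its comment to the current row. It stops at the first empty ID. The same walk as a Python sketch (the function name and sample pairs are hypothetical), which also makes the contiguity assumption visible:

```python
def transpose_comments(pairs):
    """pairs: (id, comment) in sheet order. IDs are assumed to be contiguous;
    the walk stops at the first empty ID, like the macro's Loop Until."""
    result = []
    previous = object()  # never equals a real ID, like the header cell above A3
    for id_, comment in pairs:
        if id_ in (None, ""):
            break
        if id_ != previous:
            result.append([id_, comment])   # new output row: ID, first comment
        else:
            result[-1].append(comment)      # same ID: next column along
        previous = id_
    return result


pairs = [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "e"), (3, "f"), ("", None), (4, "g")]
print(transpose_comments(pairs))  # [[1, 'a', 'b'], [2, 'c'], [3, 'd', 'e', 'f']]
```

If the IDs are not sorted, for example (1, "a"), (2, "b"), (1, "c"), the walk produces two separate rows for ID 1, which is why the data has to be contiguous for this approach.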
============================================================
EDIT: How to do it all with formulas?
I'm unsure of how dynamic the extraction table has to be (as you don't say). For example:
o Will you be making a new extraction each time, or building a standing extractor table?
o Will the source data vary in size (so you need to grow and shrink the 'lookup' range)?
o Etc.
Given this, I've aimed for a solution that works and is adaptable. I'll leave it to you to adapt as appropriate 😊
To extract the unique serial numbers:
{=IFERROR(INDEX($A$2:$A$14, MATCH(0, COUNTIF($E$2:E2, $A$2:$A$14), 0)),"")}
To extract the corresponding comments:
{=IF($E3="","",IF(SUM(IF($A$2:$A$15=$E3,1))>=COUNTA($F$2:F$2),INDEX($B$2:$B$15,MATCH($E3,$A$2:$A$15,0)+COUNTA($F$2:F$2)-1),""))}
Notice the {}: both are array formulas, entered with Ctrl+Shift+Enter (Excel adds the braces; don't type them yourself).
Pictogram:
Additional Information:
The proposed solution assumes that any identical serial numbers are contiguous (as shown in your example).
If that's not the case by default, you'll have to sort the source data so that it is.
You can obtain your desired output using Power Query, available in Windows Excel 2010+ and Office 365 Excel
Select some cell in your original table
Data => Get&Transform => From Table/Range
When the PQ UI opens, navigate to Home => Advanced Editor
Make note of the Table Name in Line 2 of the code.
Replace the existing code with the M-Code below
Change the table name in line 2 of the pasted code to your "real" table name
Examine any comments, and also the Applied Steps window, to better understand the algorithm and steps
M Code
let
//read in the data
//change table name in next line to actual table name in your workbook
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
//set data types
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Comments", type text}}),
//group by ID and concatenate the comments with a character not used in the comments
//I used a semicolon, but that could be changed
#"Grouped Rows" = Table.Group(#"Changed Type", {"ID"}, {
{"Comment", each Text.Combine([Comments],";")},
//also generate Count of the number of comments in each ID group
//as the Maximum will be the count of the number of columns to eventually create
{"numCols", each Table.RowCount(_)}
}),
//calculate how many columns to create, then remove the count column
maxCols = List.Max(#"Grouped Rows"[numCols]),
remCount = Table.RemoveColumns(#"Grouped Rows","numCols"),
//Split into new columns
#"Split Column by Delimiter" = Table.SplitColumn(remCount, "Comment",
Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv),maxCols)
in
#"Split Column by Delimiter"
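Unlike the VBA walk, Table.Group collects every row with the same ID wherever it appears, so the IDs do not need to be sorted. The query then joins each group's comments with ";" and splits them back out into as many columns as the largest group has comments. A Python sketch mirroring those steps (the function name and sample pairs are hypothetical); as the M comments say, this only works if the delimiter never occurs inside a comment:

```python
def group_and_split(pairs, delimiter=";"):
    """Group comments by ID, join each group, then split into max_cols columns."""
    groups = {}
    for id_, comment in pairs:
        groups.setdefault(id_, []).append(comment)            # Table.Group by ID
    max_cols = max((len(v) for v in groups.values()), default=0)  # List.Max(numCols)
    table = []
    for id_, comments in groups.items():
        parts = delimiter.join(comments).split(delimiter)     # Text.Combine, then split
        # shorter groups are padded with None (null) to the full width
        table.append([id_] + parts + [None] * (max_cols - len(parts)))
    return table


print(group_and_split([(1, "a"), (2, "c"), (1, "b")]))  # [[1, 'a', 'b'], [2, 'c', None]]
```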
If you have Excel for Microsoft 365 on the Mac with the FILTER and UNIQUE functions, you can use:
D23: =UNIQUE(Table1[ID])   (or some other cell)
and in the adjacent column, filled down alongside the UNIQUE results:
=TRANSPOSE(FILTER(Table1[Comments],Table1[ID]=D23))

How to combine multiple columns from a table

My issue is the following: I have a table with multiple columns of dates and values that represent different things. Here is an example of my headers:
| Customer name | Type of Service | Payment 1 date | Payment 1 amount | Payment 2 date | Payment 2 amount | Payment 3 date | Payment 3 amount | Payment 4 date | Payment 4 amount |
What I want to do is SUMIFS the table based on multiple criteria. For example:
| Type of Service | Month 1 | Month 2 | Month 3 | Month 4 |
| Service 1 | | | | |
| Service 2 | | | | |
| Service 3 | | | | |
The thing is that I do not want to write 4 SUMIFS (in this case; in fact I have more than 4 sets of date:value columns).
I was thinking of creating a new table where I could put all the columns below each other (in one table with 4 columns - Customer name, Type of Service, Date and Payment) but the table should be dynamically created, meaning that it should be expanded dynamically with the new entries in the original table (i.e. if the original table has 200 entries, this would make the new table with 4x200=800 entries, if the original table has one more record then the new table should have 4x201=804 records).
I also checked the Power Query option but could not get my head around it.
So any help on the matter will be highly appreciated.
Thank you.
You can certainly create your four-column table using Power Query. I suspect you may also be able to generate your final report using PQ, so you could add that to this code if you wish.
It will update, but only when it is refreshed.
The "Refresh" could be triggered by
User selecting the Data/Refresh option
A button on the worksheet which user would have to press.
A VBA event-triggered macro
In any event, making the query adapt to different numbers of columns requires more M code than can be generated from the UI, as well as a custom function.
The algorithm below depends on the data being in this format:
Columns 1 and 2 would be Customer | Type of Service
Remaining columns would alternate between Date | Amount and be labelled Payment N Date | Payment N Amount, where N is some number
If the real data is not in that format, some changes to the code may be necessary.
To use Power Query:
Select some cell in your Data Table
Data => Get&Transform => from Table/Range
When the PQ Editor opens: Home => Advanced Editor
Make note of the Table Name in Line 2
Paste the M Code below in place of what you see
Change the Table name in line 2 back to what was generated originally.
Read the comments and explore the Applied Steps to understand the algorithm
To enter the Custom Function, while in the PQ Editor:
Right click in the Queries Pane
Add New Query from Blank Query
Paste the custom function code into the Advanced Editor
Rename the query fnPivotAll
M Code
let
//Change Table name in next line to be the Actual table name in your workbook
Source = Excel.CurrentWorkbook(){[Name="Table8"]}[Content],
/*set datatypes dynamically with
first two columns as Text
and subsequent columns alternating as Date and Currency*/
textType = List.Transform(List.FirstN(Table.ColumnNames(Source),2), each {_,Text.Type}),
otherType = List.RemoveFirstN(Table.ColumnNames(Source),2),
dateType = List.Transform(
List.Alternate(otherType,1,1,1), each {_, Date.Type}),
currType = List.Transform(
List.Alternate(otherType,1,1,0), each {_, Currency.Type}),
colTypes = List.Combine({textType, dateType, currType}),
typeIt = Table.TransformColumnTypes(Source,colTypes),
//Unpivot all except first two columns
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(typeIt, List.FirstN(Table.ColumnNames(Source),2), "Attribute", "Value"),
//Remove "Payment n " from attribute column
remPmtN = Table.TransformColumns(#"Unpivoted Other Columns",{{"Attribute", each Text.Split(_," "){2}, Text.Type}}),
//Pivot on the Attribute column without aggregation using Custom Function
pivotAll = fnPivotAll(remPmtN,"Attribute","Value"),
typeIt2 = Table.TransformColumnTypes(pivotAll,{{"date", Date.Type},{"amount", Currency.Type}})
in
typeIt2
Custom Function: fnPivotAll
//credit: Cam Wallace https://www.dingbatdata.com/2018/03/08/non-aggregate-pivot-with-multiple-rows-in-powerquery/
(Source as table,
ColToPivot as text,
ColForValues as text)=>
let
PivotColNames = List.Buffer(List.Distinct(Table.Column(Source,ColToPivot))),
#"Pivoted Column" = Table.Pivot(Source, PivotColNames, ColToPivot, ColForValues, each _),
TableFromRecordOfLists = (rec as record, fieldnames as list) =>
let
PartialRecord = Record.SelectFields(rec,fieldnames),
RecordToList = Record.ToList(PartialRecord),
Table = Table.FromColumns(RecordToList,fieldnames)
in
Table,
#"Added Custom" = Table.AddColumn(#"Pivoted Column", "Values", each TableFromRecordOfLists(_,PivotColNames)),
#"Removed Other Columns" = Table.RemoveColumns(#"Added Custom",PivotColNames),
#"Expanded Values" = Table.ExpandTableColumn(#"Removed Other Columns", "Values", PivotColNames)
in
#"Expanded Values"
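fnPivotAll is a pivot without aggregation: Table.Pivot with each _ keeps every value as a list per row key, and the function then turns those lists back into rows, so the first "date" pairs with the first "amount", the second with the second, and so on (Table.FromColumns pads a shorter list with null). A Python sketch of that pairing; the function name mirrors the M query but the code and sample rows are illustrative only:

```python
def pivot_all(rows, col_to_pivot, col_for_values):
    """Non-aggregate pivot: one output row per position in each key's value lists."""
    pivot_names = list(dict.fromkeys(r[col_to_pivot] for r in rows))  # distinct, in order
    key_cols = [c for c in rows[0] if c not in (col_to_pivot, col_for_values)] if rows else []
    groups = {}
    for r in rows:
        key = tuple(r[c] for c in key_cols)
        lists = groups.setdefault(key, {name: [] for name in pivot_names})
        lists[r[col_to_pivot]].append(r[col_for_values])  # like Table.Pivot(..., each _)
    out = []
    for key, lists in groups.items():
        for i in range(max(len(v) for v in lists.values())):
            rec = dict(zip(key_cols, key))
            for name in pivot_names:
                # pad shorter lists with None, as Table.FromColumns pads with null
                rec[name] = lists[name][i] if i < len(lists[name]) else None
            out.append(rec)
    return out


unpivoted = [
    {"Customer": "A", "Service": "S1", "Attribute": "date", "Value": "2021-01-15"},
    {"Customer": "A", "Service": "S1", "Attribute": "amount", "Value": 100},
    {"Customer": "A", "Service": "S1", "Attribute": "date", "Value": "2021-02-15"},
    {"Customer": "A", "Service": "S1", "Attribute": "amount", "Value": 50},
]
for row in pivot_all(unpivoted, "Attribute", "Value"):
    print(row)
```

The pairing relies on the unpivoted rows staying in their original column order, which is what keeps each date next to its own amount.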
Sample Data
Output
If this does not give you what you require, or if you have issues going further with it to generate your desired reports, post back.
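To show where the four-column table leads, here is a Python sketch of the whole idea: stack the Payment N date / Payment N amount pairs into (customer, service, date, amount) rows, then do the SUMIFS-style total per service and month. Grouping by calendar (year, month) is an assumption about what "Month 1 … Month 4" means, and the function names and sample rows are hypothetical:

```python
from collections import defaultdict
from datetime import date


def payments_long(rows):
    """Stack 'Payment N date' / 'Payment N amount' pairs into
    (customer, service, date, amount) tuples, skipping empty payment slots."""
    out = []
    for row in rows:
        slots = defaultdict(dict)
        for key, value in row.items():
            parts = key.split(" ")  # e.g. ["Payment", "1", "date"]
            if len(parts) == 3 and parts[0] == "Payment":
                slots[parts[1]][parts[2]] = value
        for n in sorted(slots, key=int):
            slot = slots[n]
            if slot.get("date") is None:
                continue
            out.append((row["Customer name"], row["Type of Service"], slot["date"], slot.get("amount") or 0))
    return out


def totals_by_service_and_month(long_rows):
    """The SUMIFS step: total amount per (service, (year, month))."""
    totals = defaultdict(float)
    for _customer, service, paid_on, amount in long_rows:
        totals[(service, (paid_on.year, paid_on.month))] += amount
    return dict(totals)


table = [
    {"Customer name": "A", "Type of Service": "Service 1",
     "Payment 1 date": date(2021, 1, 15), "Payment 1 amount": 100.0,
     "Payment 2 date": date(2021, 2, 15), "Payment 2 amount": 50.0},
    {"Customer name": "B", "Type of Service": "Service 1",
     "Payment 1 date": date(2021, 1, 20), "Payment 1 amount": 25.0,
     "Payment 2 date": None, "Payment 2 amount": None},
]
long_rows = payments_long(table)
print(len(long_rows))                          # 3
print(totals_by_service_and_month(long_rows))  # {('Service 1', (2021, 1)): 125.0, ('Service 1', (2021, 2)): 50.0}
```

Adding a fifth payment pair or another customer row needs no code change, which is the same property the M query gets from deriving column names at refresh time.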
