Practical tips on documenting Excel Queries, data model tables, pivot tables? - excel

I'm building a BI system (dashboards) in Excel using tables imported from Excel files. We're using Excel 2016 queries, the data model, and measures written as DAX expressions, resulting in more pivot tables (some of which are reloaded into the data model), etc.
My question: is there a "best practice" for 1) naming these data elements and 2) documenting them, so that we end up with more complete system documentation?
Background: I'm the senior "hacker" munging these things together. But I need to move this towards being sustainable. I did some prototyping work and when I went back a week later it was challenging to reconstruct my thoughts and relationships...
I've seen folks refer to use of PowerBI flow diagrams to support documentation; but it seems to be more of the "icing on the cake" than the "cake" itself.
So what "bread and butter" documentation approaches have you, more experienced developers, taken to ensure that your systems are clearly documented so that others can pick up where you left off?

For naming, I follow the Kimball Group's advice for data warehouses/marts, e.g.
https://www.kimballgroup.com/2014/07/design-tip-168-whats-name/
I rename many/most Query steps to reference the column or table name, e.g. Added Custom => Added Customer Name, Append Queries => Append Customers. The idea is to be able to pick the right step first time when coming back for maintenance.
You can select all the Queries in the Query Editor window and copy their code, then paste it into Word etc. as the starting point for your documentation. You can also take a screenshot of the Query Editor's Query Dependencies pop-up.
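Once the query code has been copied out, the renamed steps can be turned into a documentation outline automatically. The following is only a minimal sketch in Python: the sample query, the `list_steps` and `outline` helpers, and the line-based pattern are all assumptions made for illustration, and the pattern only recognises steps that start on their own line.

```python
import re

# A step definition at the start of a line: either a plain identifier
# or a quoted name such as #"Added Customer Name", followed by "=".
STEP_PATTERN = re.compile(r'^\s*(#"(?:[^"]|"")+"|[A-Za-z_]\w*)\s*=(?![=>])')

# Hypothetical query text, as pasted from the Query Editor.
SAMPLE_QUERY = '''
let
    Source = Excel.CurrentWorkbook(){[Name="Customers"]}[Content],
    #"Added Customer Name" = Table.AddColumn(Source, "Customer Name", each [First] & " " & [Last]),
    #"Append Customers" = Table.Combine({#"Added Customer Name", OtherCustomers})
in
    #"Append Customers"
'''


def list_steps(m_code):
    """Return the step names of a pasted query, in order of appearance."""
    steps = []
    for line in m_code.splitlines():
        match = STEP_PATTERN.match(line)
        if match:
            name = match.group(1)
            if name.startswith('#"'):
                # Strip the #"..." wrapper and unescape doubled quotes.
                name = name[2:-1].replace('""', '"')
            steps.append(name)
    return steps


def outline(query_name, m_code):
    """Build a plain-text skeleton with one line per step to fill in."""
    lines = [f"Query: {query_name}"]
    for number, step in enumerate(list_steps(m_code), start=1):
        lines.append(f"  {number}. {step} - (describe purpose here)")
    return "\n".join(lines)


print(outline("Customers", SAMPLE_QUERY))
```

Descriptive step names pay off here: the outline is readable before any description has been written.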
For the Power Pivot logic, try this solution:
https://powerpivotpro.com/2014/03/automatically-create-data-dictionary-for-your-power-pivot-model/

Related

Spreadsheet with relationships

I have to work with CSV data files. They look like this:
sample
The data represents products with options, cars, etc. in a web store.
It has a lot of columns with duplicated values, and in my work I often need to copy some part of this data to another sheet, deduplicate it, edit it and then paste it back by matching on one of the columns that were left untouched. For this purpose I'm using the Ablebits suite for Excel.
Is it possible to automate this process with some Excel function, or is there other software that could handle it? Something not as complicated as a relational database like Access, but closer to a spreadsheet editor with relationships.
I have already tried Power Query in Excel and Power BI, but they seem to be analytics tools rather than data-editing tools.
Second edit:
The data has a layered structure with duplicates:
Title1|Part number 1|Car1
Title1|Part number 1|Car2
Title2|Part number 2|Option1
Title2|Part number 3|Option2
I want to be able to:
Edit duplicated values without using "Replace All", or at least have a more flexible "Find & Replace".
Extract columns, deduplicate them, and keep a reference to where each value came from, so that editing a value in the extract also changes it in the original. For example, I have a lot of titles that need editing. Instead of copying them out with an ID to reference them, I want to open them the way they appear in filters, edit them, confirm, and have the edit applied across the whole column.
I would use Power Query (aka Get & Transform on the Data ribbon in Excel 2016). The only limitation I see with what you want to do is that Power Query will deliver a new Excel Table with the output of a Query - it can't update existing cells.
If you can get past that, Power Query is very flexible, easy to learn (WYSIWYG query editor), scales well and is integrated with other Microsoft products (as well as Power BI, there is integration with SQL Server Analysis Services in preview and hopefully SQL Server Integration Services one day).
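The edit-once, apply-everywhere workflow from the question can be sketched outside Excel as well. This is a minimal Python illustration, not a Power Query feature: the `extract_unique` and `apply_edits` helpers and the replacement title are hypothetical, and like a query it produces a new table rather than updating the original rows in place.

```python
# Sample rows taken from the layered structure in the question.
rows = [
    ["Title1", "Part number 1", "Car1"],
    ["Title1", "Part number 1", "Car2"],
    ["Title2", "Part number 2", "Option1"],
    ["Title2", "Part number 3", "Option2"],
]


def extract_unique(rows, column):
    """Return the distinct values of one column, in first-seen order."""
    return list(dict.fromkeys(row[column] for row in rows))


def apply_edits(rows, column, edits):
    """Return new rows where every old value in `column` is replaced
    according to the `edits` mapping; other values are left alone."""
    return [
        row[:column] + [edits.get(row[column], row[column])] + row[column + 1:]
        for row in rows
    ]


titles = extract_unique(rows, 0)          # the short list a person would edit
edited = apply_edits(rows, 0, {"Title1": "Brake pad set"})

for row in edited:
    print("|".join(row))
```

The old value itself acts as the reference back to the source rows, so no separate ID column is needed as long as the values in that column were left untouched elsewhere.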

Power Query M language and Excel functions

I have no issues with Microsoft inventing a new language. But why did they choose to alienate those already familiar with VB or Excel scripting? Why not allow those as well? (They said it is a mashup, but alas it is not too mashed.)
Is it possible to place Excel functions into the load logic, for evaluation in the destination Excel spreadsheet?
This is a complicated question with many answers across the development history of our product, but I'll give my own (unofficial) opinion.
"M" isn't trying to replace VB scripting, and I don't think it will replace the Excel formula language anytime soon. Instead, it's a simple language for doing just a couple jobs: getting and transforming data.
The language I most often compare "M" to is SQL. "M" and SQL both let you select specific columns and rows from your table, add computed columns, join tables, and aggregate data. But "M" gives you one query language that lets you "mashup" data from multiple sources in the same query: SQL Server, CSV, JSON, Facebook, Google Analytics, etc.
To your second question of using "M" inside of Excel functions, that's a cool idea!
I'd suggest it to the Excel product team via the Feedback button inside Excel and/or at https://excel.uservoice.com/

Excel Files and Visual Basic

I have never used Visual Basic before but could do with a pointer on where to begin.
I have 750 Excel spreadsheets that contain various amounts of data of different types. The columns are always the same, but the number of data rows varies per spreadsheet. I need to extract data and put it into two new spreadsheets.
Obviously, doing this 750 times manually would be a nightmare. I just want to run a script that can do it for me, and so thought of Visual Basic, although I've never used it before.
My specific questions are:
What type of command should I research that would allow me to copy data where the starting row number varies (because the data above it varies in number of rows)? There is a title before this new data. How can I get the script to search for this title and then choose the row below?
Would all my spreadsheets have to be in one folder so that the script goes through them all, or can I have some kind of folder structure in that folder too?
Can anyone recommend good resources for getting to grips with Visual Basic and grasping what I need to do?
thanks
Tom
This compilation task got easier with the introduction of MS Power Query. If you are using Excel 2016, you already have it (as Get & Transform on the Data ribbon). If not, you should download the add-in from MS, which is available for Excel 2010 and 2013.
The guide "Using Power Query to Combine Data from Multiple Excel Files into One Table" outlines how to do it. With Power Query (PQ), MS has made this kind of aggregation possible with a few simple button clicks. PQ is a lightweight alternative for a lot of tasks that used to require VBA.
In this example, you will use PQ to point to an entire folder (750 should be no problem) worth of commonly formatted Excel files. The only limitation is that each data file should have a similarly named tab.
I won't repeat the details of the guide for how to do it, as it is in-depth and visual. But if you run into issues, get in touch.
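For readers who still want to script it, the two specific questions (finding the row below a title, and walking a folder structure) can be sketched as follows. This is a minimal Python illustration rather than VBA or Power Query; `rows_below_title` and `find_workbooks` are hypothetical helpers, the sheet is assumed to be already loaded as a list of rows (reading the workbook itself would need a third-party library), and the title is assumed to sit in the first column.

```python
from pathlib import Path


def rows_below_title(rows, title):
    """Return the rows after the first row whose first cell equals `title`,
    stopping at the first completely empty row."""
    for index, row in enumerate(rows):
        if row and row[0] == title:
            break
    else:
        return []  # title not found in this sheet

    block = []
    for row in rows[index + 1:]:
        if not any(cell not in (None, "") for cell in row):
            break  # blank row marks the end of the data block
        block.append(row)
    return block


def find_workbooks(root):
    """List .xlsx files under `root`, including any subfolders."""
    return sorted(Path(root).rglob("*.xlsx"))


# Hypothetical sheet: a varying number of rows sits above the title.
sheet = [
    ["Header stuff", 1],
    ["x", 2],
    ["Results", None],
    ["a", 1],
    ["b", 2],
    [None, None],
    ["c", 3],
]
print(rows_below_title(sheet, "Results"))
```

Because the search is by title rather than by row number, the same function works however many rows precede the data in each file.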

How to enable user to edit Power Pivot lookup data

Coming from the world of corporate data warehousing solutions, I found Power Pivot to be a surprisingly functional tool that could help bring BI into small business. To educate myself in this area, I'm doing a small project for a friend who is a building contractor and asked me to help analyse his costs against different projects. The issue is that some data needs to be provided by the user, and this is where my narrow knowledge of Power Pivot is starting to show.
My base data is coming from the accounting system. I have access to the company books via a SQL Server connection, and I can import Invoices, Clients, Suppliers, Accounts and all other entries. I made all the connections and can present data in an easy way in a Pivot table, which alone impressed my friend a lot. I was impressed myself by how easy and straightforward it was compared to reporting tools like Microstrategy or Business Objects, which I use every day.
What is missing in the accounting system is Project information. Say a Client has 3 houses that my friend is working on; each of them should be treated as a separate project that we want to calculate profit on. To do that I need some manual input, such as assigning a Project to each Cost invoice, adding a category (Materials, Services), etc.
Initially I wanted to create two lookup spreadsheets:
Projects where project name, valid from/to, etc. would be entered manually (or imported from CRM application in the future) and Client which would be selected from a dropdown list (ideally sourced directly from the power pivot Client table)
QUESTION 1: how to add a dropdown list of Clients coming from Power Pivot table?
Invoice relation table, which would hold some data from the Power Pivot Cost Invoice table (invoice number, supplier, date, etc.) plus a dropdown list with projects (from the first spreadsheet) and cost type. The tricky part is that I would like the invoice list to be refreshed automatically when new invoices are registered, but I don't want to lose any data added manually against them!
QUESTION 2: how do I design such a spreadsheet so that it populates new invoices automatically while keeping the manually entered data linked to them? I was thinking of a PivotTable with some data next to it, but that seems like a very volatile solution: say invoices are sorted by invoice number (which follows a different system for each supplier) and a new one shows up in the middle of the table; then all projects that were manually added after it would start pointing to the wrong invoices.
Last resort for me is to create an MS Access database for storing/updating lookups but then all the mess with creating ODBC connections etc. comes into play and in my opinion it's defeating the idea of neat PowerPivot Excel spreadsheet...
Any ideas are welcome!
In general, you can design an Excel table that holds the manual input data and feed that table into Power Pivot (via a linked table).
As for the dropdown based on database data, I suggest using Power Query. This tool lets you connect to various sources, do some data wrangling and then output that data into Excel. These outputs can then serve as the data range for Excel-based dropdowns.
Take a look at Power Query, e.g. here: https://support.office.com/en-in/article/Introduction-to-Microsoft-Power-Query-for-Excel-6e92e2f4-2079-4e1f-bad5-89f6269cd605 or here: https://www.youtube.com/watch?v=LACjRvxl_2w.
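The reason the manual data can survive a refresh is that it is matched to invoices by key rather than by row position. Here is a minimal Python sketch of that idea, independent of Excel; the field names, the `merge_manual_input` helper and the sample invoices are all hypothetical, and the key is assumed to be supplier plus invoice number because numbering differs per supplier.

```python
def merge_manual_input(invoices, manual):
    """Attach manually entered fields to each refreshed invoice.

    `invoices` is the freshly loaded list from the accounting system;
    `manual` maps (supplier, invoice number) to the hand-entered fields.
    Invoices without manual input get empty placeholders.
    """
    merged = []
    for invoice in invoices:
        key = (invoice["supplier"], invoice["number"])
        extra = manual.get(key, {"project": None, "cost_type": None})
        merged.append({**invoice, **extra})
    return merged


# Refreshed list: invoice A-001b is new and lands in the middle.
refreshed = [
    {"supplier": "Acme", "number": "A-001", "amount": 120.0},
    {"supplier": "Acme", "number": "A-001b", "amount": 45.5},
    {"supplier": "Acme", "number": "A-002", "amount": 300.0},
]

# Manual input entered before the refresh, stored by key.
manual = {
    ("Acme", "A-001"): {"project": "House 1", "cost_type": "Materials"},
    ("Acme", "A-002"): {"project": "House 2", "cost_type": "Services"},
}

merged = merge_manual_input(refreshed, manual)
for row in merged:
    print(row["number"], row["project"], row["cost_type"])
```

The new invoice shows up with empty project fields, while A-002 keeps its assignment even though its row position has shifted.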

Sort text-based information into different sheets

I am creating a tracking document for artists' accommodation as part of an arts festival and would like to automate part of my work flow. Whilst we use event management/scheduling software for confirmed bookings, it's nice to do all my working in Excel.
I would like to have a master sheet (sheet 1), with a full list of artists and their respective accommodation - that can then be sorted into individual sheets (sheet 2, 3 etc) based on the name of the accommodation. The automatic sorting would also capture the other pieces of information in the row.
This would allow for each different sheet to show a report on who is staying in each type of accommodation and would be rather handy!
I would recommend one or more PivotTables as a simpler solution. Here a PT and two clones are shown on your Master Sheet, but they could each be on their own sheet:
Accom is in Report Filter, Company is in Row Labels and PAX (as Sum) is in Σ Values. Once you have clicked on PivotTable in Insert > Tables - PivotTable and chosen your range ('Master Sheet'!$A$2:$C$7) and Location, just drag the fields from the big box to the little ones.
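What this PivotTable layout computes can be illustrated in a few lines of Python. This is only a sketch of the grouping logic, not of Excel itself; the `pax_by_accommodation` helper and the sample rows are made up, with columns assumed to be Accom, Company and PAX as in the description above.

```python
from collections import defaultdict


def pax_by_accommodation(rows):
    """Sum PAX per company within each accommodation, which is what the
    PivotTable shows when filtered on one Accom value at a time."""
    report = defaultdict(lambda: defaultdict(int))
    for accom, company, pax in rows:
        report[accom][company] += pax
    # Convert to plain dicts so the result prints and compares cleanly.
    return {accom: dict(companies) for accom, companies in report.items()}


# Hypothetical master sheet rows: (Accom, Company, PAX).
master = [
    ("Hotel A", "Troupe X", 4),
    ("Hotel A", "Troupe Y", 2),
    ("Hostel B", "Troupe X", 3),
    ("Hotel A", "Troupe X", 1),
]

for accom, companies in pax_by_accommodation(master).items():
    print(accom, companies)
```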
This is feasible using Excel, but I don't recommend it; it is creating a maintenance nightmare in the long run.
From the question I can't gather whether the data is available in some kind of event management software package; if so you can use that one as a data source. Or create an Access or SQL database with a few tables. After that, you can use one of the following options to make the necessary overviews and as many more as you think up during the project:
Use Excel with ODBC or a web query to retrieve data aggregated and sorted as you like. Make changes in the event management package, allowing others to see the same facts. Or do it in Access. When you change one thing, it automatically propagates into Excel as well.
Similarly, you can use an Excel add-in such as Invantive Control (caution: I work at a supplier) to retrieve the data from the database using SQL or a web service, change it from within Excel and then synchronize the changes back, assuming you have write access.
A similar solution is available as SQL*XL. There are probably others too.
If the solution must be Excel only, I would recommend vertical/horizontal lookups with the Excel functions vlookup / hlookup (Dutch: vert.zoeken, horiz.zoeken). These functions perform reasonably well with a small amount of data, and performance can be improved by sorting. They also resemble SQL joins, so the database you build within Excel conforms more easily to the relational model.
I hope the event is successful and the people enjoy it.
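The resemblance between an exact-match lookup and a join can be made concrete with a short Python sketch. It mimics the behaviour of an exact-match vertical lookup only; `build_index`, `lookup` and the sample tables are hypothetical, and the "#N/A" default merely imitates what Excel displays for a missing key.

```python
def build_index(table):
    """Index a lookup table by its first column, keeping the first match
    for each key, as an exact-match vertical lookup does."""
    index = {}
    for row in table:
        index.setdefault(row[0], row)
    return index


def lookup(index, key, column, default=None):
    """Return the cell in 1-based `column` of the row matching `key`."""
    row = index.get(key)
    return default if row is None else row[column - 1]


# Hypothetical lookup table: accommodation name and address.
accommodation = [
    ("Hotel A", "1 Main St"),
    ("Hostel B", "9 Side Rd"),
]

# Hypothetical master list: artist and accommodation name.
artists = [
    ("Ann", "Hotel A"),
    ("Bob", "Hostel B"),
    ("Cy", "Tent C"),
]

index = build_index(accommodation)
joined = [
    (name, accom, lookup(index, accom, 2, default="#N/A"))
    for name, accom in artists
]
for row in joined:
    print(row)
```

Each lookup pulls a column from a second table by key, which is the same operation a left join performs in SQL.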
