Rendering a star schema in Excel (a two-dimensional table)

Please move to an appropriate forum if it doesn't belong here.
I've a data feed that represents some multidimensional data in star schema.
e.g. /Products /SalesYear /SalesContact /Region /Salesdata
Now I want to render this data in a simple tabular view
Example:
             2005   2006   2007
Product1
  Category1  27m$   30m$   35m$
  Category2   9m$    1m$   11m$
Product2
  Category1  27m$   30m$   35m$
  Category2   9m$    1m$   11m$
Is there a standard algorithm or technique for displaying this kind of data?
[EDIT]
What I essentially need is an efficient method to build an in-memory cube like PowerPivot does, but on a smaller scale.

I think you could use a modified version of this answer by @Dick Kusleika: Convert row with columns of data into column with multiple rows in Excel 2007. Note that this solution does not handle the nested rows under Product1/Product2 that you have above, but my guess is you could pretty easily modify it to handle two row headings: column A would contain the product name and column B the category.
EDIT: I misunderstood and thought you were trying to get the data out of that format, not in to that format.
If you have Excel 2010, the PowerPivot plugin can consume OData feeds directly (found the answer on the OData.org consumers page). If you have an older Excel, you might still be able to pull the data in with Get External Data From Web. You may need to put a proxy page (ASP.NET, PHP, whatever you're comfortable with) in between that understands JSON and transforms it into an HTML table. Get External Data From Web will definitely understand how to read data from a standard table.
Once you have the data in a normalized sheet in Excel, it should only be a matter of inserting a Pivot Table that uses that range as its data source.
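For the small in-memory cube asked about in the [EDIT], here is a minimal Python sketch, assuming the feed has already been flattened into fact rows of (product, category, year, amount). The function names `build_cube` and `render` and the sample rows are hypothetical; the figures are taken from the example table in the question. It aggregates the facts into a nested dictionary and prints them with years as columns, which is roughly what a Pivot Table does with a normalized range.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, category, year, sales in millions of dollars).
FACTS = [
    ("Product1", "Category1", 2005, 27),
    ("Product1", "Category1", 2006, 30),
    ("Product1", "Category1", 2007, 35),
    ("Product1", "Category2", 2005, 4),   # two facts for the same cell...
    ("Product1", "Category2", 2005, 5),   # ...are summed to 9
    ("Product1", "Category2", 2006, 1),
    ("Product1", "Category2", 2007, 11),
    ("Product2", "Category1", 2005, 27),
]


def build_cube(facts):
    """Aggregate fact rows into {(product, category): {year: total}}."""
    cube = defaultdict(lambda: defaultdict(int))
    for product, category, year, amount in facts:
        cube[(product, category)][year] += amount
    return cube


def render(cube):
    """Render the cube as text: years across, categories nested under products."""
    years = sorted({year for cells in cube.values() for year in cells})
    lines = [" " * 12 + "".join(f"{year:>8}" for year in years)]
    current_product = None
    for product, category in sorted(cube):
        if product != current_product:
            lines.append(product)
            current_product = product
        cells = cube[(product, category)]
        # Show "-" for combinations that have no facts.
        values = [f"{cells[year]}m$" if year in cells else "-" for year in years]
        lines.append(f"  {category:<10}" + "".join(f"{v:>8}" for v in values))
    return "\n".join(lines)


cube = build_cube(FACTS)
print(render(cube))
```

A dictionary keyed by the row dimensions keeps the aggregation a single pass over the facts; adding another dimension (for example Region) would mean extending the key tuple.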

Related

Calculated Field

I am trying to create a simple pivot table which will tell me how many community residents reported a particular problem, and what percentage of them reported each problem type. I have a data set with name, and then columns for each type of problem. Here's a small sample of the data set:
I have created a pivot table which sums each of these columns and also provides me the total number of people who reported any type of problem at all. Here's what I have:
I want to add a second column to this pivot table that gives the percent of times each problem type was reported. Sounds simple, but because of the structure of the original data set, I can't figure out how to do it. I can set up formulas outside of the Pivot Table which reference the table, but in doing so I forfeit the ability to graph the percentages on a pivot chart. Any ideas how to create a calculated field for this pivot table?
Just to be clear, what I want is something like this, except all contained in the structure of the pivot table:
Edit: I've changed the example of the data set. Here's an explanation of the pivot table. The values under the "# Reporting Issue" column are counts of all the 1's under each corresponding column in the data set. This meant that I had to add each row to the pivot table independently, as you can see here:
I'm open to the idea that I need to change the formatting of the data set, but I'm not sure of the best way to do it. This was set up initially because it allowed for easy compilation into a data table, but Pivot Tables seem to be a different story.
Hopefully this edit clarifies things.
You need to unpivot your data so that you turn it into a Flat File...something that the PivotTable can consume properly.
The easiest way is to use something called PowerQuery, which is baked in to Excel 2016 and available as a free add-in from Microsoft for Excel 2010 and 2013. Google PowerQuery Unpivot and you will turn up hundreds of tutorials, such as this one from my good pal Chandoo. PowerQuery looks slightly daunting to a first-time user, but it is freakin easy once you get your head around how to use it. PQ is by far the best addition to Excel in years, with PowerPivot a close second.
If you can't install PowerQuery, then you can use your current data structure to make a 'staging pivot', and then drag the Values label that will appear in the Columns area to the bottom of the ROWS pane, like in this excerpt from a book I'm writing:
Note that my Year categories are equivalent to your Issues categories.
That will emulate the flat file layout you’re after. All you need to do then is turn this intermediate PivotTable back into a normal range, change that Values heading to Issue, and add a Count heading and you’ve got the flat file you need to build a useable PivotTable.
You can also use VBA. Google Unpivot VBA and turn up hundreds of results, including this blazingly fast code I posted some time back. (Look for the code under the —Update 26 November 2013— heading.)
You can also use the DoubleClick extraction trick.
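To make the unpivot idea concrete outside Excel, here is a minimal Python sketch. The column names (Name, Noise, Litter, Parking) and the sample records are hypothetical stand-ins for the data set in the question, and the percentage is computed as the share of people who reported at least one issue, which is an assumption about what the asker wants.

```python
from collections import Counter


def unpivot(records, id_field):
    """Turn one wide record per person into one flat row per reported issue."""
    flat = []
    for record in records:
        for column, value in record.items():
            # Skip the identifier column and cells with no report (0 or empty).
            if column != id_field and value:
                flat.append({id_field: record[id_field], "Issue": column, "Count": value})
    return flat


# Hypothetical wide layout: one column per problem type, 1 = reported.
wide = [
    {"Name": "Ann", "Noise": 1, "Litter": 0, "Parking": 1},
    {"Name": "Bob", "Noise": 1, "Litter": 1, "Parking": 0},
]

flat = unpivot(wide, "Name")

# With the flat layout, counts and percentages fall out of a simple group-by.
counts = Counter(row["Issue"] for row in flat)
reporters = len({row["Name"] for row in flat})
percent = {issue: 100 * n / reporters for issue, n in counts.items()}
```

Once the data is in this shape (one row per person per issue), a PivotTable can count and show percentages on a single Issue field instead of needing one value field per column.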

Limit data coming into Spotfire by a different data table

I have Table A prompted on Year/Month and Table B. Table B also has a Year/Month column. Table A is the default data table (gets pulled in first). I have set up a relationship between Table A and B on the common Year/Month column.
The goal is to get Table B to only pull through data where the Year/Month matches the Year/Month on Table A (what the user entered). The purpose is to keep the user from entering the Year/Month multiple times.
The issue is that Table B contains almost 35 million records. What I do not want to do is have Spotfire pull across all 35 million records. What is currently happening is that Spotfire pulls all those records, and then, by setting filtering to include Filtered Rows Only on Table B, I limit what is seen in the visualization to under 200,000 rows. I would much rather just pull across 200,000 rows to start with.
The question: Is there a way to force Spotfire to filter the data table (Table B) by another data table (Table A) as it pulls the data table (Table B) across, thus only pulling a small number of records into memory?
I'm writing this on the basis that most people use information links to get data into Spotfire, especially for large data sets where the data is not embedded in the analysis. With that being said, I prefer to handle as much as possible, if not all, of the joining / filtering / massaging at the data source rather than in the Spotfire application. Here are my views on the best practices and why.
Tables / Views vs Procedures as Information Links
Most people are familiar with the Table / View structure and get data into Spotfire in one of 2 ways:
1. Create all joins / links in information designer, based on data relations defined by the author, by selecting individual tables from the data sources available
2. Create a view (or similar object) at the data source where all joining / data relations are done, thus giving Spotfire a single flat file of data
Personally, option 2 is much easier IF you have access to the data source since the data source is designed to handle this type of work. Spotfire just makes it available but with limited functionality (i.e. complex queries, Intellisense, etc aren't available. No native IDE). What's even better is Stored Procedures IMHO and here is why.
In options 1 and 2 above, if you want to add a column you have to change the view / source code at the data source, or individually add a column in the information designer. This creates dwarfed objects and clutters up your library. For example, when you create an information link there is a folder with all the elements associated with it. If you want to add columns later, you'll have another folder for any columns added, and this gets confusing and hard to manage. If you create a procedure at the data source to return the data you need, and later want to add some columns, you only have to change this at the data source. i.e. change the procedure. Everything else will be inherited by Spotfire... all you have to do is click the "reload data" button in Spotfire. You don't have to change anything in the information designer. Additionally, you can easily add new parameters, set default parameter properties or prompt the user, making this a very efficient method of data retrieval. This is perfect when the data source is an OLTP and not a data-mart/data-warehouse (i.e. the data isn't already aggregated / cleansed) but can also be powerful in data warehouse environments as well.
Ditch the GUI, Edit the SQL
I find managing conditions, parameters, join paths, etc a bit annoying--but that's me. Instead, when possible, I prefer to click "Edit SQL" next to all the elements in my Information Link and alter the SQL there. This will allow database guys to work in an environment which is more familiar.
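The principle of filtering at the source can be sketched independently of Spotfire. The Python example below uses the standard sqlite3 module with a hypothetical `table_b` and column names; it does not show any Spotfire or information-link configuration. It only illustrates that when the Year/Month value is passed as a query parameter, the database returns the matching rows and the rest never leave the source.

```python
import sqlite3

# In-memory stand-in for the real data source (table and columns are hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_b (year_month TEXT, amount INTEGER);
    INSERT INTO table_b VALUES ('2016-01', 10), ('2016-01', 20), ('2016-02', 30);
    """
)


def fetch_table_b(connection, year_month):
    """Return only the Table B rows for the Year/Month the user was prompted for."""
    query = (
        "SELECT year_month, amount FROM table_b "
        "WHERE year_month = ? ORDER BY amount"
    )
    # The filter is applied by the database, so unmatched rows are never transferred.
    return connection.execute(query, (year_month,)).fetchall()


rows = fetch_table_b(conn, "2016-01")
```

A stored procedure or parameterized information link plays the same role as `fetch_table_b` here: the prompt value entered once for Table A is reused as the parameter for Table B.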

Spreadsheet with relationships

I have to work with CSV data files. They look like this:
sample
It represents products with options/cars etc. at the web-store.
It has a lot of columns with duplicated values, and in my work I often need to copy part of this data to another sheet, deduplicate it, edit it, and then paste it back by matching on one of the columns that were left untouched. For this purpose I'm using the Ablebits Excel suite.
Is it possible to automate this process with an Excel function, or is there some other software that could handle it? Something not as complicated as a relational database like Access, but close to a spreadsheet editor with relationships.
I already tried Power Query in Excel and Power BI, but they seem to be analytics tools rather than data-editing tools.
Second edit:
Data has a layer structure with duplicates.
Title1|Part number 1|Car1
Title1|Part number 1|Car2
Title2|Part number 2|Option1
Title2|Part number 3|Option2
I want to be able to:
Edit values that are duplicated without using "Replace All", or at least have a more flexible "Find & Replace".
Extract columns, deduplicating them while keeping a reference to where the values came from, so that editing a value there also changes it in the original place. For example, I have titles (a lot of them) that I need to edit. Instead of copying them with some ID to reference them, I want to open them the way they appear in filters, edit them, confirm, and have the edit applied throughout the column.
I would use Power Query (aka Get & Transform on the Data ribbon in Excel 2016). The only limitation I see with what you want to do is that Power Query will deliver a new Excel Table with the output of a Query - it can't update existing cells.
If you can get past that, Power Query is very flexible, easy to learn (WYSIWYG query editor), scales well and is integrated with other Microsoft products (as well as Power BI, there is integration with SQL Server Analysis Services in preview and hopefully SQL Server Integration Services one day).
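The "extract distinct values, edit, write back" workflow described in the question can be sketched in a few lines of Python. This is only an illustration of the idea, not something Power Query or Excel provides as such; the function names and the column names (Title, PartNumber, Fitment) are hypothetical, and the sample rows follow the layout shown in the question.

```python
def distinct_values(rows, column):
    """Unique values of one column in first-seen order, like a filter drop-down."""
    return list(dict.fromkeys(row[column] for row in rows))


def apply_edits(rows, column, edits):
    """Return new rows where every occurrence of an edited value is replaced."""
    return [{**row, column: edits.get(row[column], row[column])} for row in rows]


rows = [
    {"Title": "Title1", "PartNumber": "Part number 1", "Fitment": "Car1"},
    {"Title": "Title1", "PartNumber": "Part number 1", "Fitment": "Car2"},
    {"Title": "Title2", "PartNumber": "Part number 2", "Fitment": "Option1"},
    {"Title": "Title2", "PartNumber": "Part number 3", "Fitment": "Option2"},
]

titles = distinct_values(rows, "Title")            # edit this short list...
edits = {"Title1": "Brake pad set"}                # ...and record old -> new
updated = apply_edits(rows, "Title", edits)        # every duplicate is updated
```

Because the edit is keyed on the old value rather than on cell positions, no separate ID column is needed to paste the result back.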

Querying single data points from the Excel Data Model / Power Query (Get & Transform Data)

I'm using an up-to-date version of Excel 2016 (via O365 E3 license) and using Power Query / Get & Transform Data. I can successfully create queries and load them to the page. I have also successfully created Power Pivot reports.
I would like to query single data points from the data loaded via Power Query. For instance, imagine a dataset called DivisionalRevenue with:
Date Division Revenue
2016-01-01 Alpha 1000
2016-01-02 Alpha 1500
2016-01-01 Beta 2000
2016-01-02 Beta 400
I could easily load that to an Excel workbook or include it in the data model and create a power pivot. However, Power Pivot doesn't always meet my requirements, particularly around how the data is displayed on the page. In order to achieve my goal I may want to be able to query individual data points.
I would like to have a cell on the page with a formula in it that I can use to query individual data points. If it was in a pivot table I could use something like:
=GETPIVOTDATA("Revenue",$A$3,"Date",DATE(2016,1,1),"Division","Alpha")
The lookup values (date and division) could be retrieved from a cell on the page or hard-coded into the formula. This is a requirement for several reports I'm working on.
Or, I could add a combined lookup column with Date and Division concatenated and use a vlookup to pull the values like:
=VLOOKUP("42371Alpha",I9:L13,4,FALSE)
Finally, I could use a combination of INDEX and MATCH to identify the correct row number and then pull the data.
All of these solutions require the data to be loaded onto a sheet. One requires a pivot table that has to be refreshed to work properly. The other two require creating arbitrary lookup columns so that you can match a row based on more than one field (date and division in this example), and you have to ensure that that lookup field's formula is properly extended down the length of the data table. In both cases I would have concerns when sharing this workbook with my colleagues in case someone affects the rather fragile setup of the pivot table or the lookup.
So, what I truly want to find is something equivalent to pivot table querying against a dataset.
** This doesn't exist, but I would like to know if something like it does **
=GETQUERYDATA("Revenue","DivisionalRevenue","Date",DATE(2016,1,1),"Division","Alpha")
Does such a thing exist? Can such a thing be done? Can I retrieve arbitrary data points from the dataset created through Power Query / Get & Transform Data?
I think what you want are cube functions (CUBEVALUE, CUBEMEMBER and related functions):
Some Background
How to easy create cubefunctions from a pivot table
There is a feature in Excel that allows you to query off of a PowerPivot model, but it's not highly advertised for some reason.
Once you have the data in your PowerPivot model, go to your Excel -> Data tab -> Existing Connections -> Tables tab
From there, choose the table that you want to start with. Once that table's data is on your Excel sheet, you can right-click that table -> go to "Table" -> "Edit DAX"
From there you can enter the following DAX function, as an example
EVALUATE
FILTER(SampleData, SampleData[Date] = DATE(2016,1,1) && SampleData[Division] = "Alpha")
Make sure to choose Command Type=DAX in the drop-down. Here's how it looks on my screen:
To further improve your querying power, you can install the optional "DAX Studio" plugin for Excel, which allows you to write custom DAX queries and then export the results directly back to an Excel sheet.
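For comparison, the lookup the question wishes for (the hypothetical GETQUERYDATA) amounts to filtering a table on several fields and returning one value, which is also what the DAX FILTER above does. A minimal Python sketch of that logic, using the DivisionalRevenue sample from the question; the function name `get_query_data` is an invention for illustration and not an Excel or Power Query feature:

```python
from datetime import date

# The DivisionalRevenue sample rows from the question.
DIVISIONAL_REVENUE = [
    {"Date": date(2016, 1, 1), "Division": "Alpha", "Revenue": 1000},
    {"Date": date(2016, 1, 2), "Division": "Alpha", "Revenue": 1500},
    {"Date": date(2016, 1, 1), "Division": "Beta", "Revenue": 2000},
    {"Date": date(2016, 1, 2), "Division": "Beta", "Revenue": 400},
]


def get_query_data(rows, value_field, **criteria):
    """Return value_field from the single row matching every field=value criterion."""
    matches = [
        row for row in rows
        if all(row[field] == wanted for field, wanted in criteria.items())
    ]
    if len(matches) != 1:
        # Refuse to guess when the criteria match no row or more than one row.
        raise LookupError(f"expected exactly one match, found {len(matches)}")
    return matches[0][value_field]


revenue = get_query_data(
    DIVISIONAL_REVENUE, "Revenue", Date=date(2016, 1, 1), Division="Alpha"
)
```

Matching on several fields directly avoids the concatenated lookup column that the VLOOKUP approach needs.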

Create a Table with data filtered from data in another worksheet

I have a table with 105 columns and around 300 rows in Sheet 1. I need in Sheet 2 a reduced version of the same table, filtered by some column values (not the first column).
I've looked at Pivot Tables but it seems that I can not get the same tabular structure. I have tried with Advanced Filter and I get an error:
"The extract range has a missing or illegal field name".
Could you help?
Microsoft's PowerQuery addin supports this. One of its many sources can be Excel Data-From Table.
I have discovered that one needs to run Advanced Filter from the destination sheet, in an unused place (best over the intended destination, not on it or below it).
Thanks
You can use the add-in for table-valued functions I developed to make any operations (including filtering, partitioning, aggregation, distribution etc.) on data tables in Excel.
Each table (ListObject in Excel) is an input or output parameter for a table-valued function. You can for example feed three tables as input parameters to a table function which generates some resultant tables.
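As a language-neutral illustration of what the question asks for (the same tabular structure, reduced by criteria on some columns), here is a minimal Python sketch. The function name, column names and values are hypothetical; it is not the add-in or Advanced Filter itself. Note that in this sketch a criteria column that does not exist in the header raises a KeyError.

```python
def filter_table(table, criteria):
    """Keep the header plus the rows whose values satisfy every criterion.

    table: a header row followed by data rows.
    criteria: {column name: set of allowed values}.
    """
    header, *data = table
    index = {name: position for position, name in enumerate(header)}
    kept = [
        row for row in data
        if all(row[index[column]] in allowed for column, allowed in criteria.items())
    ]
    return [header] + kept


# Hypothetical Sheet 1 contents, cut down to three columns.
sheet1 = [
    ["Id", "Region", "Status"],
    [1, "North", "Open"],
    [2, "South", "Closed"],
    [3, "North", "Closed"],
]

sheet2 = filter_table(sheet1, {"Region": {"North"}, "Status": {"Closed"}})
```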

Resources