Limit data coming into Spotfire by a different data table

I have Table A prompted on Year/Month and Table B. Table B also has a Year/Month column. Table A is the default data table (gets pulled in first). I have set up a relationship between Table A and B on the common Year/Month column.
The goal is to get Table B to only pull through data where the Year/Month matches the Year/Month on Table A (what the user entered). The purpose is to keep the user from entering the Year/Month multiple times.
The issue is that Table B contains almost 35 million records, and I do not want Spotfire to pull all of them across. Currently, Spotfire pulls every record, and then, by setting filtering to include Filtered Rows Only on Table B, I limit what is seen in the visualization to under 200,000 rows. I would much rather pull across only those 200,000 rows to start with.
The question: Is there a way to force Spotfire to filter the data table (Table B) by another data table (Table A) as it pulls the data table (Table B) across, thus only pulling a small number of records into memory?

I'm writing this off the basis that most people utilize information links to get data into Spotfire, especially large data sets where the data is not embedded in the analysis. With that being said, I prefer to handle as much if not all of the joining / filtering / massaging at the data source versus the Spotfire application. Here are my views on the best practices and why.
Tables / Views vs Procedures as Information Links
Most people are familiar with the Table / View structure and get data into Spotfire in one of 2 ways:
1. Create all joins / links in Information Designer, based on data relations defined by the author, by selecting individual tables from the available data sources.
2. Create a view (or similar object) at the data source where all joining / data relations are done, thus giving Spotfire a single flat file of data.
Personally, option 2 is much easier IF you have access to the data source since the data source is designed to handle this type of work. Spotfire just makes it available but with limited functionality (i.e. complex queries, Intellisense, etc aren't available. No native IDE). What's even better is Stored Procedures IMHO and here is why.
In options 1 and 2 above, if you want to add a column you have to change the view / source code at the data source, or individually add a column in Information Designer. This creates dwarfed objects and clutters up your library. For example, when you create an information link there is a folder with all the elements associated with it. If you want to add columns later, you'll have another folder for the added columns, which gets confusing and hard to manage. If you instead create a procedure at the data source to return the data you need, and later want to add some columns, you only have to change the procedure at the data source. Everything else is inherited by Spotfire: all you have to do is click the "reload data" button in Spotfire. You don't have to change anything in Information Designer. Additionally, you can easily add new parameters, set default parameter properties, or prompt the user, making this a very efficient method of data retrieval. This is perfect when the data source is an OLTP system and not a data mart / data warehouse (i.e. the data isn't already aggregated / cleansed), but it can be powerful in data warehouse environments as well.
Ditch the GUI, Edit the SQL
I find managing conditions, parameters, join paths, etc. through the GUI a bit annoying, but that's me. Instead, when possible, I prefer to click "Edit SQL" next to all the elements in my Information Link and alter the SQL there. This lets database people work in an environment that is more familiar to them.
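To make the "filter at the data source" idea concrete, here is a minimal sketch in Python using the standard sqlite3 module as a stand-in for the real database. It is not Spotfire's API: the table name table_b, its columns, and the fetch_table_b helper are all hypothetical. The point is only that the prompted Year/Month is passed as a parameter, so the WHERE clause runs at the source and only the matching rows come across.

```python
import sqlite3

# Hypothetical stand-in for the real data source; table and column
# names are assumptions made for this illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_b (year_month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO table_b VALUES (?, ?)",
    [("2015-01", 10.0), ("2015-01", 5.0), ("2015-02", 7.5), ("2015-03", 1.0)],
)


def fetch_table_b(connection, year_month):
    """Return only the Table B rows for the prompted Year/Month.

    The filter is applied by the database, so rows for other periods
    are never transferred to the client.
    """
    cursor = connection.execute(
        "SELECT year_month, amount FROM table_b "
        "WHERE year_month = ? ORDER BY amount",
        (year_month,),
    )
    return cursor.fetchall()


# The same Year/Month value the user entered for Table A is reused here.
rows = fetch_table_b(conn, "2015-01")
print(rows)
```

In a stored procedure or an edited information link the same pattern applies: the Year/Month prompt becomes a parameter of the query against Table B rather than a filter applied after loading.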

Related

Is there a way to create a dynamic form that generates questions based on rows in an Excel table?

I have an Excel table that has data added to it periodically. I have a second table that describes the connection between the items in the first table. For the sake of an example:
First table with data:
and
second table with connections.
I would like to create a form that allows you to pick multiple item pairs from table 1, write a "connection" for each pair, and then submit that into table 2. Since table 1 has new data added periodically, the form would have to be somehow linked to table 1 to always allow a connection to be made with all the items.
I've looked everywhere for a solution, but none of the forms I found that could be integrated with Excel had this "dynamic" functionality. Is there a workaround or a way to achieve this?
(In reality, the tables and names are more complex, and everything is hosted on Sharepoint online.)

Spotfire Load on Demand for External Table

Suppose I have two tables, TableA (embedded data) and TableB (external data).
Scenario 1:
TableB is set to On-Demand based on the markings from TableA. When you mark something in TableA, it takes some "n" seconds to populate the data in TableB. The On-Demand setting on the external table is as shown in the screenshot named LOD.png.
Scenario 2:
On-Demand settings have not been applied to TableB (please note TableB is still external). A relationship has been created between TableA and TableB. TableB is now limited based on the marking from TableA by the option "Limit data using Markings" (screenshot named ss2).
Questions:
1. Which scenario fetches data quicker?
2. From the debug log, the query passed in both scenarios is the same. Does that mean both scenarios are the same, or are they different?
Scenario 1 is good if Table B is really large, or records take a long time to fetch from the database. In this case, Table B is only returning rows that are based on what you marked in Table A. This means that the number of rows could be significantly less, but this also means that every time the marking changes, those rows have to be fetched at that time. If the database takes a long time, this can become frustrating. On the flip side, if the database is really fast and you are limiting rows down enough, this can be almost seamless. At the end of the day, you are pulling this data into memory after the query runs, so all Spotfire functionality is available.
Scenario 2 is good if calculations are highly complex and need to take advantage of the power of the external DB to perform them. This means that any change to the report, such as a change of visualization, will require a new query to be sent to the external data source, resulting in a new table of aggregated data. It also means that no changes can be made to a visualization using an in-db data table when you are not connected to the external data source. Please note that some Spotfire functionality available to in-memory data, such as data on demand, is not available to external data.
I personally like to keep data close to Spotfire to take advantage of all Spotfire functionality, but I cannot tell you exactly which is the correct method in your case. Perhaps these TIBCO links on the difference between in memory data and external data can help:
https://docs.tibco.com/pub/spotfire/6.5.1/doc/html/data/data_overview.htm
https://docs.tibco.com/pub/spotfire/6.5.1/doc/html/data/data_working_with_in-database_data.htm

Excel 2010: Automatically combine multiple tables into one dataset

I thought there would be a simple way of doing this, but unfortunately I have not come across one. My company has an Excel workbook with 12 sheets (1 for each month), into which I enter sales data as accounts are written. I reformatted each month's data into tables, thinking that this would provide an easy reference to gather the data into a pivot table that joins all the months and would be updated as I enter data; however, a pivot table based on multiple sets of data allows highly limited manipulation.
So what I want to do is create a new table that is automatically populated as I enter data in any of the 12 current tables, to combine them into a master listing. I have tried doing a query, but when I try to set up the data sources, it doesn't recognize my tables. I tried Power Query, but I couldn't get it to update the data as I updated the source. Consolidate also was not a useful feature, as it required all the data to be somehow calculated, and my columns need to simply be copied over, not summed or averaged.
As you can probably tell from my explanations and terminology, I'm no Excel expert. I don't know what VBA even is, let alone know how to use it, but I've seen it mentioned a lot, so I figure at some point in my life I should learn it.
Is there a formula or some other Excel 2010 feature that can automatically copy all of this data onto one running list, and keep it updating as I enter data in the source tables? It would have to run automatically.
I believe your end goal is to have a pivot table which consolidates data from each of the individual 12 sheets/tables and not really to have the intermediate "single running list which is an aggregation of all the 12 sheets".
If so, I suggest creating an Excel pivot table based directly on 'Multiple consolidation ranges'.
To start, create a new spreadsheet, select a cell (say A3), and use the key sequence Alt+D+P. This brings up the PivotTable and PivotChart Wizard; proceed using the third option, 'Multiple consolidation ranges'.
For detailed step-by-step instructions, I will have to refer you to this site: http://www.contextures.com/xlPivot08.html
Please be aware that the difficulty level of this solution is medium. I suggest bookmarking the instructions for maintainability, in case you choose to implement it.

PowerPivot Relationships Many to Many

The objective I am trying to achieve is to have 2 slicers in PowerPivot, ClientID and CSQName. When a ClientID is selected, only the CSQNames related to that ClientID should show up, and vice versa.
Relationship diagram link: https://goo.gl/photos/PnCZrnsXXTx3oFGh8
I am having a problem linking a many to many relationship in PowerPivot. A brief background on the application I am trying to build...
I am trying to combine a SQL database (IDM) and an Informix SQL database (Cisco Call Data). The IDM database includes the Client Data and TBAS Open Case Data. Each Client has a specific ClientID. The Cisco database includes Call Detail Info and CSQNames (queue names). A many-to-many relationship exists: for example, a clientid can have multiple CSQnames (clientid 3 has CSQ names of "A" and "B"), and a csqname can have multiple clientids (csqname "Z" includes clientids "99", "98" and "97"). Therefore I created an innerjoin table called "Clients_CSQ" to model the many-to-many relationship.
I am trying to use this innerjoin table for both "TBAS Open Cases" and "Call Detail". When I use this table for my filters, PowerPivot states that no relationships exist. Are there any solutions? If this does not make sense, please let me know and I will try to clarify. I have read many posts but am unable to grasp how to make the DAX many-to-many relationship work with the CALCULATE function. If someone can shed some light on the issue I am having, it would be greatly appreciated. Thank you.
This really depends upon the data you are looking to report on.
When you add two slicers to a PowerPivot table, the available selections in each slicer will be affected by the selection in the other slicer IF and ONLY IF all of the fields in the Values section of the Pivot Table are reliant on the entries in both of the slicer fields.
In your case, it is possible to make this work (as an example) by creating 3 measures:
[Call Total]=SUM('TBAS Open Cases'[Case duration])
[Number of Calls]=COUNTA('Call Detail'[appname])
[Calls by Duration]=SUMX('Clients_CSQ',DIVIDE([Call Total],[Number of Calls]))
Place the last of these 3 measures in a pivot table with the slicers set to use 'Clients_IDM'[ic_client_id] and 'CSQ Name'[csqname] and "Hey Presto!"
The first two measures are straightforward enough. The third one is cycling through each entry in the only table that these two slicer fields have in common (Clients_CSQ) and performing a calculation using the data from your FACT tables. I have no idea if the [Calls by Duration] measure that I've come up with makes any sense with your data set, but hopefully the example will help you reach the solution you want. Again depending on what data you want to show it doesn't really matter if this measure returns junk, the important thing is that it's pulling your two data sets together.
Remember that as soon as you add any raw field from either of the fact tables to this 'unifying pivot table', the inter-relationship between the slicers will break. !!!BUT!!! there is nothing to stop you from linking the csqname slicer to another pivot on the same sheet which contains fields from your Call Detail table and likewise linking the ic_client_id slicer to a pivot that contains TBAS Open Cases data. In fact, the 'unifying pivot table' could be on a different sheet from your slicers, so you only see the two sets of data that you are interested in.
And ignore that warning about no relationships existing!
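As a language-neutral illustration of what the bridge table contributes, here is a small Python sketch (not DAX, and not how PowerPivot evaluates slicers internally). The rows reuse the example values from the question; the function names are hypothetical. Each lookup goes through the bridge rows, which is the only place where both slicer fields appear together.

```python
# Bridge rows (ClientID, CSQName), mirroring the Clients_CSQ table.
# The values come from the example in the question.
clients_csq = [(3, "A"), (3, "B"), (99, "Z"), (98, "Z"), (97, "Z")]


def csq_names_for(client_id, bridge):
    """CSQ names reachable from one client via the bridge table."""
    return sorted({csq for cid, csq in bridge if cid == client_id})


def client_ids_for(csq_name, bridge):
    """Client IDs reachable from one CSQ name via the bridge table."""
    return sorted({cid for cid, csq in bridge if csq == csq_name})


print(csq_names_for(3, clients_csq))
print(client_ids_for("Z", clients_csq))
```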

Is it possible to filter data used by pivot table based on filtering the rows in a source table in Excel?

I have developed a dashboard in Excel 2007 that uses one source table in a sheet (being filled with a query on our data warehouse) and multiple pivot tables making different cross sections on this data.
I use the GETPIVOTDATA in almost a hundred formulas to give me the right value for a specific indicator in my dashboard.
This all works fine. However, I have now been asked to make the dashboard for 5 different segments. As you can imagine, I don't want to create 5 different workbooks for this and have to maintain the dashboard logic in all of them.
So my question is the following: is it possible to automatically (through VBA or any other means) filter the results in my source table, which is the source for my pivot tables and thus for my dashboard values?
So schematically:
DATABASE_VIEW --> SOURCE_TABLE --> 12 pivot tables --> 100 GETPIVOTDATA functions
Preferably, I would like to load all the segments into the source table (one view on my database) and then filter the data in the source table, which results in filtered source data for my pivots. This way I can quickly switch between segments in the dashboard without requerying the db (refreshing the pivots only).
Data in the source table has the column: CUSTOMER_SEGMENT available to filter upon.
Any help is appreciated.
Geoffrey
You can manipulate all external data connections and internal pivottables through VBA.
To make it a double learning exercise, I recommend using the Record Macro button, then changing a filter in your pivot table and also changing your SQL query a bit.
You will then see the related properties of that PivotTable/query in the recorded macro. Filters and SQL are simply strings in the VBA code, so you can alter certain parts to get different filters, or build things like "WHERE Cust_ID = " & comboboxCust.Value.
Changing the filters and SQL through VBA code is usually faster than having it all interactively related with the standard Excel tools (functions, parameters, linked filters, ...).
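The "load all segments once, then filter per segment" idea from the question can be sketched independently of VBA. The Python below is only an illustration of the data flow, not Excel code: CUSTOMER_SEGMENT is the column named in the question, while the segment names, the indicator column, and the values are made up for the example.

```python
from collections import defaultdict

# Hypothetical source table loaded once from the database view.
source_table = [
    {"CUSTOMER_SEGMENT": "Retail", "indicator": "revenue", "value": 100},
    {"CUSTOMER_SEGMENT": "Retail", "indicator": "revenue", "value": 50},
    {"CUSTOMER_SEGMENT": "Retail", "indicator": "orders", "value": 3},
    {"CUSTOMER_SEGMENT": "Corporate", "indicator": "revenue", "value": 400},
]


def pivot_for_segment(rows, segment):
    """Total each indicator using only the rows of one segment.

    Switching segment re-runs this in memory; the database is not
    queried again.
    """
    totals = defaultdict(int)
    for row in rows:
        if row["CUSTOMER_SEGMENT"] == segment:
            totals[row["indicator"]] += row["value"]
    return dict(totals)


print(pivot_for_segment(source_table, "Retail"))
print(pivot_for_segment(source_table, "Corporate"))
```

In the workbook, the equivalent step would be a macro that sets the segment filter and refreshes the pivot tables, so the GETPIVOTDATA formulas pick up the new values.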

Resources