Resolve "missing relationships" message - excel

I get the "relationships between tables may be needed" message in Excel 2016 PowerPivot. I suspect this is because I've got a many-to-many relationship, but lack experience to be sure. I'll be happy to read up on it if someone can help me put a name on the problem.
The data that I'm analyzing looks like this:
Units - Instances of a product. E.g. if the computer was produced 100 times, there are 100 units. Each unit has ONE ProductID and a unique UnitID. Units also carry attributes such as a production time.
Products - Each product can consist of several items. A product could be computer xyz = "laptop xyz, driver CD, manual, power brick, 2 chocolates and mains cable". Each product has a unique ProductID and a PackListID. Several products can have the same PackListID.
PackList - Unique PackListIDs.
CompositeList - For each PackListID, this can contain several ItemIDs and the number included in the packing list (e.g. 2 pieces of chocolate).
Items - Line items for a product packing list, e.g. "manual for computer xyz".
I want to answer the question: how many line items (compositelist.number) of each item in Items have been included in units produced on a particular date (units.date)?
I read the data via SQL from an existing application/database server. I cannot influence how the data is structured :(
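For reference, what I'm trying to get corresponds roughly to the following SQL; ProductionDate, Number and ItemText here are placeholders for whatever the real columns are called:
SELECT   i.ItemText,
         SUM(cl.Number) AS PiecesUsed
FROM     Units u
JOIN     Products p       ON p.ProductID   = u.ProductID
JOIN     CompositeList cl ON cl.PackListID = p.PackListID
JOIN     Items i          ON i.ItemID      = cl.ItemID
WHERE    u.ProductionDate = '2016-05-15'   -- the chosen production day
GROUP BY i.ItemText;
Because every unit row repeats its product's composite rows, the sum already reflects how many units were produced on that day.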
I've imported my data and connected as follows:
Unit.ProductID -> Product.ProductID (each product can have many units)
Product.PackListID -> PackList.PackListID (bridge table)
PackList.PackListID -> CompositeList.PackListID (each packlist can have many composite-rows)
CompositeList.ItemID -> Items.ItemID (each composite-list row refers to one item)
If I filter by PackListID, the result is a correct list of line items with sums.
Now I'd like to multiply that by the number of times that packlist has been produced on a particular day...
My pivot looks like this:
Filter: Production date, filter to a particular date
Rows: Items.text
Values: Sum of compositelist.number
I would love to see the number of line items used in production on a certain day, e.g. 15 manuals and 32 mains cables.
Unfortunately I get the message "Relationships between tables may be needed".

Related

organize a set of data based on open slots/sold-out slots

I am trying to analyze data based on the following scenario:
A group of places, each with its own ID, becomes available for visiting from time to time for an exclusive number of people. This number varies according to how well the last visit season performed; so far, visit seasons have been opened 3 times.
Let's suppose ID_01 in those three seasons had the following available slots/sold-out slots ratio: 25/24, 30/30, and 30/30, ID_02 had: 25/15, 20/18, and 25/21, and ID_03 had: 25/10, 15/15 and 20/13.
What would be the best way to design the database so that this analysis can be done on a single table?
So far I have used a separate table for each ID with all of its available slots and sold-out amounts, but as the number of IDs grows, and the number of visit seasons with it (way beyond three at this point), this has proven to be far from ideal, hard to keep track of, and terrible to work with.
The best solution I could come up with was putting all IDs on a column and adding two columns for each season (ID | 1_available | 1_soldout | 2_available | 2_soldout | ...).
The Wikipedia article on database normalization would be a good starting point.
Based on the information you provided in your question, you would create one table:
AvailableDate
-------------
AvailableDateID
LocationID
AvailableDate
AvailableSlots
SoldOutSlots
...
You may also have other columns you haven't mentioned. One possibility is SoldOutTimestamp.
The primary key is AvailableDateID. It's an auto-incrementing integer that has no meaning, other than to sort the rows in input order.
You also create a unique index on (LocationID, AvailableDate) and another unique index on (AvailableDate, LocationID). This allows you to retrieve the row by LocationID or by AvailableDate.
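A minimal sketch of that table in SQL, assuming MySQL-flavoured syntax; the column types are guesses and should be adapted to your DBMS:
CREATE TABLE AvailableDate (
    AvailableDateID INT  NOT NULL AUTO_INCREMENT,  -- surrogate key, input order only
    LocationID      INT  NOT NULL,                 -- the place (ID_01, ID_02, ...)
    AvailableDate   DATE NOT NULL,                 -- the opening / visit-season date
    AvailableSlots  INT  NOT NULL,
    SoldOutSlots    INT  NOT NULL,
    PRIMARY KEY (AvailableDateID),
    UNIQUE KEY ux_location_date (LocationID, AvailableDate),
    UNIQUE KEY ux_date_location (AvailableDate, LocationID)
);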

Excel: Order by date within multiple IDs

I have a huge epidemiological dataset containing registry data with pathology reports and clinical information. I have merged several files into one master file in order to have all information in one file. Every patient is assigned a unique ID number. Each patient can have several reports, and hence the same ID number can be repeated several times in the ID column. Each ID entry is a new row (a pathology or clinical report) with the date on which that sample/information was reported.
My goal is to be able to read all pathology/clinical info for a particular ID within one row.
By sorting the IDs, I get a clear picture of how many times each ID has been entered. The problem arises when a patient has several reports, i.e. multiple rows with an identical ID, because the dates within those rows do not match. The dates come from pathology (sample date, answer date, clinical info date, etc.). The pathology and clinical dates for one patient do not have to match exactly to the day, but they should fall within a reasonable timeframe, e.g. 1-2 months. This is best illustrated with an example.
I want to sort the columns so that the dates belonging to a particular row match up. I am sure there is a way to do that, but I cannot figure it out.
Thanks in advance
The issue of mismatching records seems to arise once the two separate tables are merged into one. In order to fix this, there are several options you can take:
Re-do the merge, but strengthen the way in which the tables are joined.
Instead of merging based on ID alone, see if there is another field that could reliably connect the records, perhaps a medical record #, case #, or event #, and merge the tables based on this new field AND ID. This would be the strongest solution; however, it will only work if you can find such a field to strengthen the link.
A separate solution would be to first sort the original tables by the dates so that they match up and then re-merge them.
In theory this should solve your problem, as I assume the merge currently grabs the first instance of patient X01 from both tables and matches those rows together. You can confirm this by checking the merged query and looking to see whether the mismatched records appear in the same order as in the original tables. This is not perfect, as it relies on no clinical dates occurring between pathology dates for the record, so I would proceed with caution.
To address your concern about losing track of IDs with multiple rows: this should not matter, because in the merged result you can still sort by ID. You can also add multiple sort levels by selecting the data and going to Data -> Sort -> Add Level, and change the order in which the data is sorted (first by ID and then by Date).
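Purely as an illustration of the first option, the stronger merge corresponds to joining on two fields instead of one; in SQL terms (the table and field names here, such as CaseNo, are hypothetical):
SELECT p.ID, p.SampleDate, p.PathologyInfo,
       c.ClinicalDate, c.ClinicalInfo
FROM   Pathology p
JOIN   Clinical  c
       ON  c.ID = p.ID
       AND c.CaseNo = p.CaseNo;  -- the extra field that ties the right rows together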

Excel Matching Customer Orders by Item and Quantity

Brief:
I have a large dataset containing individual customer orders by item and quantity. What I'm trying to do is get Excel to tell me which order numbers contain exact matches (in terms of items and quantities) to each other. Ideally, I'd like to have a tolerance of, say, 80% accuracy which I can flex to purpose, but I'll take anything to get me off the ground.
Existing Solution:
At the moment, I've used concatenation to pair item with quantity, pivoted, and then put the order references as columns and the concatenations as rows, with quantity as data (sorted by quantity descending). I'm visually scrolling across/down to find matches and then manually stripping them into my main data where necessary. I have about 2,500 columns to check, so I was hoping to find a more suitable solution with Excel doing the legwork on identification.
INDEX/MATCH works for cross-referencing a match on the concatenation, but of course the order numbers (which are unique) are different, so it's not giving me matches ACROSS orders.
Fingers crossed!
EDIT:
Data set with outcomes
As you can see, the bottom order reference has no correlation to the orders above it, so it is not listed as a match to them, but three of the others are identical and one has a slightly different item yet MOSTLY matches.

Looking up values from different tables including newly found values

I have several documents which contain statistical data on the performance of companies. There are about 60 different Excel sheets representing different months, and I want to collect the data into one big table. The original tables look something like this, but are bigger:
Each company takes two rows, which represent its profit from sales of the product and its cost to manufacture the product. I need both of these numbers.
As I said, there are ~60 of these tables and I want to extract information about Product2. I want to put everything into one table where the columns would represent months and the rows the profit and costs of each company. It could easily be done (I think) with the INDEX function, as all sheets are named similarly. The problem I face is that at certain points in time other companies enter the market:
Some of them stay, some of them fail. I would like to collect information on all companies that exist today or ever existed, but newly founded companies distort the list (in the second picture we see that company BA is in the 4th row, not BB). As a company's row changes from time to time, using INDEX becomes problematic, because in some cases the results of different companies end up in one row. Adjusting them one by one seems very painful.
Maybe there is some quick and efficient method to solve such a problem?
Any help or ideas would be appreciated.
One thing you may want to try is linking the Excel spreadsheets as tables in Access. From there you can create a query that ties the tables together. As data changes in the spreadsheets, the query will reflect those changes.
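A rough sketch of the kind of Access query meant here, assuming the linked sheets are named Month01, Month02, ... and share the same column layout (the real sheet and column names will differ):
SELECT 'Month01' AS ReportMonth, Company, LineType, Amount FROM Month01
UNION ALL
SELECT 'Month02' AS ReportMonth, Company, LineType, Amount FROM Month02
UNION ALL
SELECT 'Month03' AS ReportMonth, Company, LineType, Amount FROM Month03;
One SELECT per linked sheet; the combined result can then be pivoted with months as columns and the profit/cost rows per company underneath.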

SSAS Cube Calculation to summarise another measure at a coarser level

I have run into performance problems with MDX measure calculations for a summary report, using SQL Server 2008 R2.
I have a Person dimension and a related fact table (Qualifications) containing multiple records per person.
E.g. [Measures].[Other Qual Count] would give me the number of qualifications of a certain type.
Each person could have several, so [Measures].[Other Qual Count] can be > 1 for a single person.
However, on my summary report I would like to count this as 1 per person only (to indicate the number of persons with Other qualifications).
The summary report rolls up the values against some other dimensions, including a Region hierarchy that is not known in advance (it can be one of 3 hierarchies).
I have done this as follows:
MEMBER [Measures].[Other Count2] AS
    SUM(
        -- the staff codes in the current context
        EXISTING [Person].[Staff Code].[Staff Code].Members,
        -- count each person at most once, however many qualifications they hold
        IIF([Measures].[Other Count] > 0, 1, NULL)
    )
However, I have to create several more derived measures, each deriving from the others and all at Person level to avoid unwanted multiple counts. The query slows down from under 1 second to over 1 minute (my goal is under 3 seconds).
The reason for all the derivations is that a lot of logic is needed to determine which one of 6 mutually exclusive columns a person should be reported in.
I have also tried to create a Cube Calculation, but this gives me the same value as [Other Count].
SCOPE (({[Person].[Staff Code].[Staff Code].MEMBERS}, [Measures].[Has Other Qual]));
    THIS = ([Person].[Staff Code].[Staff Code], [Measures].[Has Other Qual]).Count;
END SCOPE;
Is there a better MDX/Cube calculation that can be used, or any suggestions on improving performance?
This is unfortunately my first time working with MDX, and I ran into this problem close to a deadline, so I am trying to make this work, if possible, without changes to the cube.
I have resolved the issue by changing the cube, which was simpler than expected.
In the Data Source View, I created a named query that summarizes the existing fact table at Person level. I also derive all the columns that I need on my reports.
Treating this named query as a separate fact table, I added a measure group for it, and that resolved all my problems.
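A rough sketch of the kind of named query described, with assumed table and column names (FactQualification, StaffCode, QualType); the real schema will differ:
SELECT
    StaffCode,
    -- one 0/1 flag per reporting column, so each person counts at most once
    CASE WHEN SUM(CASE WHEN QualType = 'Other' THEN 1 ELSE 0 END) > 0
         THEN 1 ELSE 0 END AS HasOtherQual
FROM FactQualification
GROUP BY StaffCode;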
