I have used QGIS to create "statistics by categories" Excel documents. Those documents contain tables (within the same sheet) with IDs, which represent buffers around specific places; function-labels, which describe the function of the buildings within each buffer; and a count for each function-label within each buffer (ID).
I did this for two counties which happen to have overlapping buffers, which means I had to combine those tables to get a complete picture of how many buildings of certain functions are within each buffer.
For this, I used a Pivot-Table in Excel. So far so good.
[Image: excerpt from Excel showing the original tables used for the Pivot-Table, the Pivot-Table itself, and a column containing the relevant IDs]
Now, my original tables as well as my Pivot-Table contain IDs (buffer of specific places) that are irrelevant and make the list really long.
Additionally, I have a list / table containing the IDs of buffers that are relevant and will be used for further processing.
How can I either filter my Pivot-Table (or original sources) or rather match entries with the list of IDs that are of relevance, so that I get a list of the needed IDs AND the attached information belonging to the ID (function-label and count per function-label)?
I came across a VLOOKUP post on Stack Overflow, but it stated that VLOOKUP would ignore repeated values. As my IDs are repeated for each function-label, this would be a problem.
I tried to find a filter function in the Pivot-Table but was only able to hand-select the IDs that I want displayed. As that's roughly 180 IDs, I don't want to do it manually. I've also seen videos that used the FILTER or MATCH functions, but only to show matching numbers between two columns of numbers, not the corresponding data attached to them.
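Not an Excel formula, but the logic being asked for can be sketched in a few lines of Python: combine the rows of both county tables, keep every row whose ID is in the relevant list (repeated IDs included), and sum the counts per ID and function-label. The sample rows, the label names, and the function name `combine_and_filter` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: (buffer ID, function label, count) per county.
county_a = [(1, "residential", 4), (1, "retail", 2), (2, "school", 1)]
county_b = [(1, "residential", 3), (3, "retail", 5)]

# Hypothetical list of relevant buffer IDs.
relevant_ids = {1, 3}


def combine_and_filter(tables, keep_ids):
    """Sum counts per (ID, label) across tables, keeping only relevant IDs."""
    totals = defaultdict(int)
    for table in tables:
        for buffer_id, label, count in table:
            if buffer_id in keep_ids:  # repeated IDs are all kept
                totals[(buffer_id, label)] += count
    return sorted((bid, label, n) for (bid, label), n in totals.items())


result = combine_and_filter([county_a, county_b], relevant_ids)
for row in result:
    print(row)
```

Because the filter is a membership test rather than a lookup, every row of a repeated ID survives, which is exactly where a single VLOOKUP falls short.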
I have a huge epidemiological dataset containing registry data with pathology reports and clinical information. I have merged several files into one master file in order to get all information from one file. Every patient is assigned a unique ID number. Each patient can have several reports, and hence the same ID number can be repeated several times in the ID column. Each ID entry is a new row (a pathology or clinical report) and carries the date on which that sample or information was reported.
My goal is to be able to read all pathology/clinical info for a particular ID within one row.
By sorting the IDs, I get a clear picture of how many times each ID has been entered. The problem arises when there are several reports, i.e. multiple rows with an identical ID, because the dates within the rows of that one patient do not match. The dates come from pathology (sample date, answer date, clinical info date, etc.). The pathology and clinical dates for one patient do not have to match exactly to the day, but they should fall within a reasonable timeframe, e.g. within 1-2 months. This is best illustrated with an example.
I want to sort the columns so that the dates within a particular row match each other. I am sure there is a way to do that but I cannot figure it out.
Thanks in advance
The issue of mismatching records seems to arise once the two separate tables are merged into one. In order to fix this, there are several options you can take:
Re-do the merge, but strengthen the key on which the tables are joined.
Instead of merging on ID only, see if there is another field that could reliably connect the records, perhaps a medical record #, case #, or event #, and merge the tables on this new field AND ID. This would be the strongest solution, but it will only work if you can find such a field to strengthen the link.
A separate solution would be to first sort the original tables by date so that they line up, and then re-merge them.
In theory this should solve your problem, as I assume that the merge currently grabs the first instance of patient X01 from both tables and matches them together. You can confirm this by checking the merged query and looking to see whether the mismatched records are in the same order as in the original tables. This is not perfect, as it relies on no clinical dates occurring between pathology dates for the record, so I would proceed with caution.
As for your concern about losing track of IDs with multiple rows, this should not matter, because you can sort the merged result by ID afterwards. You can also add multiple sort levels by selecting the data and going to Data -> Sort -> Add Level, which lets you control the order in which the data is sorted (first by ID and then by Date).
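If no extra key field exists, the "same ID and dates within a reasonable timeframe" idea can be sketched outside Excel as follows. This is only a rough Python illustration under assumptions: the sample records, the 60-day tolerance, and the function name `pair_by_id_and_date` are made up, and the pairing is a simple greedy nearest-date match rather than anything guaranteed to be optimal.

```python
from datetime import date

# Hypothetical sample records: (patient ID, report date).
pathology = [
    ("X01", date(2020, 1, 10)),
    ("X01", date(2020, 6, 5)),
    ("X02", date(2021, 3, 1)),
]
clinical = [
    ("X01", date(2020, 6, 20)),
    ("X01", date(2020, 1, 25)),
    ("X02", date(2021, 9, 1)),
]


def pair_by_id_and_date(left, right, max_days=60):
    """Pair each left row with the closest unused right row of the same ID.

    Rows more than max_days apart are not paired (the right date is None).
    """
    pairs = []
    used = set()  # indexes of right rows that are already paired
    for pid, pdate in sorted(left):
        best = None  # (gap in days, index into right)
        for i, (cid, cdate) in enumerate(right):
            if cid != pid or i in used:
                continue
            gap = abs((cdate - pdate).days)
            if gap <= max_days and (best is None or gap < best[0]):
                best = (gap, i)
        if best is None:
            pairs.append((pid, pdate, None))
        else:
            used.add(best[1])
            pairs.append((pid, pdate, right[best[1]][1]))
    return pairs


pairs = pair_by_id_and_date(pathology, clinical)
```

Unlike the sort-then-merge approach, this does not depend on the two tables being in the same order, and rows without a partner inside the tolerance stay visible instead of being silently mismatched.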
Brief:
I have a large dataset containing individual customer orders by item and quantity. What I'm trying to do is get Excel to tell me which order numbers are exact matches of each other (in terms of items and quantities). Ideally, I'd like to have a tolerance of, say, 80% accuracy which I can flex to purpose, but I'll take anything to get me off the ground.
Existing Solution:
At the moment, I've used concatenation to pair item with quantity, then built a pivot with the order references as columns, the concatenated values as rows, and quantity as data (sorted by quantity descending). I'm visually scrolling across and down to find matches and then manually stripping them out of my main data where necessary. I have about 2,500 columns to check, so I was hoping to find a more suitable solution with Excel doing the legwork on identification.
INDEX/MATCH works for cross-referencing a match on the concatenation, but of course the order numbers (which are unique) are different, so it's not giving me matches ACROSS orders.
Fingers crossed!
EDIT:
[Image: data set with outcomes]
As you can see, the bottom order reference has no correlation to the orders above it so is not listed as a match to the orders above but 3 are identical and 1 has a slightly different item but MOSTLY matches.
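To make the matching rule concrete, here is a small Python sketch (not an Excel solution). It treats each order as a set of (item, quantity) lines and scores two orders by the share of lines they have in common, measured against the larger order. That scoring rule is my assumption about what "80% accuracy" could mean; the order references, items, and function names are invented sample data.

```python
from itertools import combinations

# Hypothetical orders: order reference -> {item: quantity}.
orders = {
    "A": {"pen": 2, "ink": 1, "pad": 3, "clip": 10, "tape": 1},
    "B": {"pen": 2, "ink": 1, "pad": 3, "clip": 10, "tape": 1},
    "C": {"pen": 2, "ink": 1, "pad": 3, "clip": 10, "tape": 1},
    "D": {"pen": 2, "ink": 1, "pad": 3, "clip": 10, "glue": 1},
    "E": {"desk": 1},
}


def similarity(a, b):
    """Shared (item, quantity) lines divided by the larger order's line count."""
    lines_a, lines_b = set(a.items()), set(b.items())
    if not lines_a or not lines_b:
        return 0.0
    return len(lines_a & lines_b) / max(len(lines_a), len(lines_b))


def matching_pairs(orders, threshold=0.8):
    """Return (order, order, score) for every pair scoring at least threshold."""
    result = []
    for x, y in combinations(sorted(orders), 2):
        score = similarity(orders[x], orders[y])
        if score >= threshold:
            result.append((x, y, score))
    return result


exact = matching_pairs(orders, threshold=1.0)
close = matching_pairs(orders, threshold=0.8)
```

With this sample data, A, B and C match each other exactly, D matches them at 0.8 because one of its five lines differs, and E matches nothing. Comparing every pair is quadratic in the number of orders, which should still be manageable for roughly 2,500 orders.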
I have a little bit of an Excel problem and would be happy about any suggestions.
Long version: I have a dataset with raw data representing journal entries. The structure of this dataset can be seen here:
Now, what I want to achieve is to assign each row (each journal entry) to a cost category (marketing, personnel, IT, depreciation, …) based on the values in the account number, type, and cost center rows, and, in a second step, break down the categories once more, e.g. for labour costs, distinguish between direct and indirect labour costs.
The way my company does this right now is using an Excel sheet with several macros where the criteria are hardcoded in the VBA code to loop through the whole list, check if a row matches the criteria for a certain cost category, and if it does, copy the row to a new sheet (having one new sheet for each category), then using a second macro to break down the categories, assigning values to the “description”-column which is empty initially based on another set of criteria. Then, pivot tables are used on each of the new sub-datasets to calculate sums for each sub-category. These sums are finally used as input data for a management report (as seen in the image above) which is the ultimate goal of this whole ordeal.
Now, not only does this seem overly complicated to me, and not only does running the macros and manually adjusting the input ranges for the pivot tables take forever, but the criteria for allocating the costs can also change quite often, and opening the VBA editor and changing the code is not really user-friendly. The initial idea was to maybe include some helper columns (one for each cost category) and somehow create an indicator variable that is one if the entry falls in the respective category and zero otherwise, and then use these columns for further calculations (e.g. for SUMIFS and such).
The problem is that a) combinations of account number and type are not unique, so that one account number can go along with various types, and one type can go along with various account numbers, so the criteria can be something like C6 = 544300 OR 544700 AND D6<>110246, etc. And b) criteria can change, meaning sometimes a new account number or type is added that also has to be assigned to an already existing category such as labour costs, which would make it necessary to include that criterion in all the formulas for that particular cost category. So, is it possible to somehow create a criteria table for each category that serves as input for some sort of IF/SUMIF or lookup function?
Short version: I have a data set (ranging from 5,000 up to 100,000 rows, 8 columns) where I want to perform a lookup, but based on various criteria. In addition, it would be nice if the criteria could somehow be drawn from a separate list so that they can be modified fairly easily without having to change the formula itself. Is there a way to do so? Or do you think using the advanced filter might be the most suitable option?
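The "criteria table" idea can be illustrated with a short Python sketch, where the rules live in a plain list that can be edited without touching the matching logic. Everything here is an assumption for illustration: the rule layout (included account numbers plus excluded types), the first-match-wins order, the second account number, and the sample entries. Only 544300, 544700 and 110246 come from the example criterion above.

```python
from collections import defaultdict

# Hypothetical criteria table: one rule per cost category.
# A row matches when its account is listed and its type is not excluded.
rules = [
    {"category": "personnel", "accounts": {544300, 544700}, "exclude_types": {110246}},
    {"category": "marketing", "accounts": {610000}, "exclude_types": set()},
]

# Hypothetical journal entries: (account number, type, amount).
entries = [
    (544300, 100001, 250.0),
    (544700, 110246, 99.0),
    (610000, 100001, 40.0),
    (544700, 100002, 10.0),
]


def categorize(account, type_code, rules):
    """Return the category of the first matching rule, else 'unassigned'."""
    for rule in rules:
        if account in rule["accounts"] and type_code not in rule["exclude_types"]:
            return rule["category"]
    return "unassigned"


def totals_by_category(entries, rules):
    """Sum the amounts per category, driven entirely by the rules table."""
    totals = defaultdict(float)
    for account, type_code, amount in entries:
        totals[categorize(account, type_code, rules)] += amount
    return dict(totals)


totals = totals_by_category(entries, rules)
```

Adding a new account number to an existing category then means editing one entry in `rules`, which is the behaviour a criteria sheet in the workbook would be meant to give.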
I have imported a bunch of data using PowerQuery into a single table and am building dashboard reporting. I have been using Pivot Tables to build my reports, which has worked fine so far.
However, I've come to a point where I simply want to show the counts of multiple columns (calculated fields). So I have columns A, B, C, D, and want to show the count of each. But I don't want them to be subsets (or children) of one another, and I don't want to build a bunch of Pivot Tables (the file is already getting pretty big, and I want them row by row for easy viewing). Any suggestions?
Also, I am using the "Columns" field already to show the counts by certain weeks (week one, week two, etc.).
Thanks,
-A
Thanks for the follow-up. Within PowerPivot, I have four calculated fields/columns, each of which is True/False. I want to know how many times each of those columns was marked "True" (I can rename the "True" field to distinguish which field it's referencing). But I don't want four pivot tables. Right now I can only think of making four pivot tables, filtering out the false for each one, then hiding the rows so the "True" values stack on top of one another. If I put all four fields together in the same Pivot, the three below the first become subsets. I don't want subsets, just occurrence counts.
Does this help provide clarification?
If I understand you correctly, here's an example that shows what you're trying to achieve:
The table on the left has the TRUE/FALSE entries and the PivotTable on the right just shows the number of true items in each of those columns.
The format of the DAX measure to produce these count totals is:
[Count of A]=CALCULATE(COUNTROWS(PetFacts),PetFacts[A]=TRUE)
(Apologies to any parrot owners who may get upset that I have inadvertently re-classified their pets as cold-blooded!)
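For anyone who wants to see what such a measure computes without PowerPivot at hand, here is a rough Python equivalent: one independent count of TRUE values per flag column, optionally split by week, with no nesting between the columns. The sample rows and the names `rows`, `flags` and `count_true` are invented for illustration.

```python
from collections import Counter

# Hypothetical rows with four TRUE/FALSE flag columns and a week number.
rows = [
    {"week": 1, "A": True, "B": False, "C": True, "D": False},
    {"week": 1, "A": True, "B": True, "C": False, "D": False},
    {"week": 2, "A": False, "B": True, "C": True, "D": False},
]
flags = ["A", "B", "C", "D"]


def count_true(rows, flags):
    """Count TRUE values per (flag, week); each flag is counted independently."""
    counts = Counter()
    for row in rows:
        for flag in flags:
            if row[flag]:
                counts[(flag, row["week"])] += 1
    return counts


counts = count_true(rows, flags)
# One total per flag, summed over all weeks.
totals = {flag: sum(n for (f, _), n in counts.items() if f == flag) for flag in flags}
```

Each flag gets its own row of counts, which mirrors having one measure per column placed side by side in a single PivotTable.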
I've been tasked with building some ad-hoc reports in Excel that are sourced from an SSAS OLAP cube. I don't have the ability to alter the design of the cube's dimensions currently. I've been receiving repeated requests to filter results based upon the combination of two different dimensions and their attributes.
For example:
One dimension lists locations with their hierarchies. Another dimension contains codes for the various insurance companies we work with. I'm given a list of combinations of these, concatenated with a hyphen separating them, and they are supposed to be the only combinations within the report. For example, I get things like "001-AB5". Unfortunately, there are duplicates of the codes, so I can't just pull the code, seeing that AB5 means different things for different locations, which I can't do anything about at this time either.
For some of the smaller data sets, I've used PowerPivot and just created a calculated column, and added a relationship to the list in another sheet. The issue is that now they want the drill-through actions that have been set up for the cube. Is it possible to create something like a calculated dimension in Excel (or some other means) that would be the concatenation of these without using PowerPivot?
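To make the requirement explicit, the filter I'm after behaves like this small Python sketch: build the same hyphenated key from the location and the insurance code, and keep a row only when that combined key is in the supplied list. It says nothing about how to do this against the cube itself. The sample rows, the second allowed key, and the function name `filter_by_combination` are made up; only "001-AB5" comes from the example above.

```python
# Hypothetical list of allowed location-code combinations.
allowed = {"001-AB5", "002-XY1"}

# Hypothetical fact rows: (location, insurance code, amount).
facts = [
    ("001", "AB5", 120),
    ("002", "AB5", 75),  # same code, different location: must be excluded
    ("002", "XY1", 30),
]


def filter_by_combination(facts, allowed, sep="-"):
    """Keep rows whose 'location<sep>code' key appears in the allowed set."""
    return [row for row in facts if f"{row[0]}{sep}{row[1]}" in allowed]


kept = filter_by_combination(facts, allowed)
```

The second row shows why the code alone is not enough: AB5 is allowed for location 001 but not for location 002.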