Brief:
I have a large dataset containing individual customer orders by item and quantity. What I'm trying to do is get Excel to tell me which order numbers are exact matches (in terms of items and quantities) for each other. Ideally, I'd like a tolerance of, say, 80% accuracy that I can flex to purpose, but I'll take anything to get me off the ground.
Existing Solution:
At the moment, I've used concatenation to pair each item with its quantity, then built a pivot with the order references as columns, the concatenated values as rows, and quantity as the data (sorted by quantity descending). I'm visually scrolling across and down to find matches and then manually stripping them out of my main data where necessary. I have about 2,500 columns to check, so I was hoping to find a more suitable solution with Excel doing the legwork on identification.
INDEX/MATCH works for cross-referencing a match on the concatenation but, of course, the order numbers (which are unique) are different, so it's not giving me matches ACROSS orders.
Fingers crossed!
EDIT:
Data set with outcomes
As you can see, the bottom order reference has no correlation to the orders above it, so it is not listed as a match to them. Of the others, 3 are identical and 1 has a slightly different item but MOSTLY matches.
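Outside Excel, the matching logic described in the question can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each order is reduced to a set of (item, quantity) pairs, each item is assumed to appear at most once per order, and "80%" is interpreted as the share of pairs two orders have in common (Jaccard similarity), which is one possible reading of the tolerance and not necessarily the one intended. The function names and sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def order_signatures(rows):
    """Collect one set of (item, quantity) pairs per order number."""
    signatures = defaultdict(set)
    for order, item, quantity in rows:
        signatures[order].add((item, quantity))
    return signatures


def similar_orders(rows, threshold=1.0):
    """Return (order_a, order_b, score) for every pair scoring >= threshold.

    The score is shared pairs divided by all distinct pairs in either order,
    so 1.0 means the two orders are identical.
    """
    signatures = order_signatures(rows)
    matches = []
    for a, b in combinations(sorted(signatures), 2):
        shared = len(signatures[a] & signatures[b])
        total = len(signatures[a] | signatures[b])
        score = shared / total
        if score >= threshold:
            matches.append((a, b, round(score, 2)))
    return matches


# Hypothetical sample data: (order number, item, quantity).
sample_rows = [
    ("A1", "apple", 2), ("A1", "pear", 1), ("A1", "plum", 3), ("A1", "fig", 1),
    ("A2", "apple", 2), ("A2", "pear", 1), ("A2", "plum", 3), ("A2", "fig", 1),
    ("A3", "apple", 2), ("A3", "pear", 1), ("A3", "plum", 3), ("A3", "fig", 1),
    ("A3", "lime", 5),
    ("A4", "kiwi", 9),
]

if __name__ == "__main__":
    print(similar_orders(sample_rows, threshold=1.0))  # exact matches only
    print(similar_orders(sample_rows, threshold=0.8))  # "mostly matching" too
```

With about 2,500 orders the pairwise comparison is roughly three million set operations, which should be manageable; for much larger data, grouping orders by an exact signature first would find the identical ones without comparing every pair.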
Related
I have been provided with two tables: 1. a sales table and 2. items and their characters. (The second table has two or more lines for an item if it has more than one character.)
I am trying to get the sum of sales value for the items based on their characters.
I have created a unique characters table and a unique materials table to link to the other tables.
In Power Pivot I am able to get the sum of sales by the material in U_Material, but if I include the characters from U_Item and characters, I get all the characters in the table alongside the sum, when the item actually has only one character (the other characters are repeated with the same value). What can I do to solve this?
If I don't include the materials, or put them in a filter, then I get the overall sales value in the output, but not per category.
Please let me know what I am doing wrong.
The problem is in your model. Your fact table "Sales" is filtered only by U_Item; none of your other tables has any effect on it. You should change the model to a star/snowflake schema: put "Sales" in the center and connect the rest of your tables (the dimensions) to it with one-to-many relationships (many on the Sales side).
I have a huge epidemiological dataset containing registry data with pathology reports and clinical information. I have merged several files into one master file in order to get all the information from one file. Every patient is assigned a unique ID number. Each patient can have several reports, and hence the same ID number can be repeated several times in the ID column. Each ID entry is a new row (a pathology or clinical report) with the date on which that sample or information was reported.
My goal is to be able to read all pathology/clinical info for a particular ID within one row.
By sorting the IDs, I get a clear picture of how many times each ID has been entered. The problem arises when there are several reports (multiple rows with an identical ID), because the dates within the rows for that one patient do not match. The dates come from pathology (sample date, answer date, clinical info date etc.). The pathology and clinical dates for one patient do not have to match exactly to the day, but they should fall within a reasonable timeframe, e.g. within 1-2 months. This is best illustrated with an example.
I want to sort the columns so that dates from a particular row match together. I am sure there is a way to do that but I cannot figure it out.
Thanks in advance
The issue of mismatching records seems to arise once the two separate tables are merged into one. In order to fix this, there are several options you can take:
Re-do the merge but strengthen the way in which the tables are joined.
Instead of only merging based on ID, see if there is another field that could easily connect the records, perhaps a medical record #, case #, or event #, and merge the tables based on this new field AND ID. This would be the strongest solution; however, it will only work if you can find such a field to strengthen the link.
A separate solution would be to first sort the original tables by date so that they match up, and then re-merge them.
In theory this should solve your problem as I assume currently when matching up the two separate tables it is grabbing the first instance of patient X01 from both tables and matching them together. This can be confirmed by checking the merged query and looking to see if the mismatched records are in the same order as presented in the original tables. This is not perfect, as it relies on no clinical dates occurring between pathology dates for the record, so I would proceed with caution.
And to address your concern about losing track of IDs with multiple rows: this should not matter, because after the merge you can sort the result by ID. You can also add multiple levels of sorting by selecting the data and going to Data -> Sort -> Add Level, which lets you set the order in which the data is sorted (first by ID and then by date).
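If no second key such as a case number exists, another option (not one of the suggestions above, just a related idea) is to pair records on ID plus the nearest date within a tolerance, which is what the question's "within 1-2 months" implies. The Python sketch below illustrates that idea only; the function name, the 60-day default and the sample records are all assumptions, and the greedy pairing may not be optimal when one patient has many closely spaced reports.

```python
from datetime import date


def pair_by_id_and_date(pathology, clinical, max_days=60):
    """Pair each pathology record with the closest-dated clinical record.

    Both inputs are lists of (patient_id, report_date). A clinical record is
    used at most once, and only if it has the same ID and lies within
    max_days of the pathology date. Unmatched pathology rows get None.
    """
    pairs = []
    used = set()  # indexes of clinical records already paired
    for p_id, p_date in sorted(pathology):
        best = None  # (gap in days, index into clinical)
        for i, (c_id, c_date) in enumerate(clinical):
            if c_id != p_id or i in used:
                continue
            gap = abs((c_date - p_date).days)
            if gap <= max_days and (best is None or gap < best[0]):
                best = (gap, i)
        if best is None:
            pairs.append((p_id, p_date, None))
        else:
            used.add(best[1])
            pairs.append((p_id, p_date, clinical[best[1]][1]))
    return pairs


# Hypothetical sample records.
pathology_rows = [
    ("X01", date(2020, 6, 1)),
    ("X01", date(2020, 1, 10)),
    ("X02", date(2020, 3, 1)),
]
clinical_rows = [
    ("X01", date(2020, 6, 20)),
    ("X01", date(2020, 1, 25)),
    ("X02", date(2020, 9, 1)),
]

if __name__ == "__main__":
    for row in pair_by_id_and_date(pathology_rows, clinical_rows):
        print(row)
```

Rows that come back with None are the ones worth checking by hand, since no clinical record for that patient fell inside the window.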
I am working on a budget and need some help with looking up a value and returning multiple text values (categories).
Basically, transactions are exported from a customer's bank. The export has names, dates, etc. I am then trying to add a category at the end, for example rent, bills or food. However, some have subcategories. Food, for example, might be food core or food junk.
I can populate one cell with the formula below pretty easily, but I'm not sure how to do two. The second category needs to go one cell across, but ideally come from the same formula, as I have thousands of lines.
This is the formula for category 1:
=IF(ISNUMBER(SEARCH("NEWWORLD", F2,1)),"FOOD")
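One way to think about the two-column requirement is as a keyword table with one row per keyword and the category and subcategory stored side by side, so a single lookup returns both values. The Python sketch below shows only that idea; apart from "NEWWORLD" and the category names mentioned above, the keywords and their assignments are made up for illustration. Matching is case-insensitive, as it is with SEARCH.

```python
# Keyword table: (keyword, category, subcategory).
# Only "NEWWORLD" comes from the question; the other rows are hypothetical.
RULES = [
    ("NEWWORLD", "FOOD", "FOOD CORE"),
    ("BURGER", "FOOD", "FOOD JUNK"),
    ("LANDLORD", "RENT", ""),
    ("POWER CO", "BILLS", ""),
]


def categorise(description, rules=RULES):
    """Return (category, subcategory) for the first keyword found in the text."""
    text = description.upper()
    for keyword, category, subcategory in rules:
        if keyword in text:
            return category, subcategory
    return "", ""  # no keyword matched


if __name__ == "__main__":
    for line in ["EFTPOS newworld metro", "Burger shack 22", "Unknown payee"]:
        print(line, "->", categorise(line))
```

Keeping the keywords in a table rather than inside the formula means a new shop name is a new row, not a longer formula.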
I have a little bit of an Excel problem and would be happy about any suggestions.
Long version: I have a dataset with raw data representing journal entries. The structure of this dataset can be seen here:
Now, what I want to achieve is to assign each row (each journal entry) to a cost category (marketing, personnel, IT, depreciation, …) based on the values in the account number, type, and cost center rows, and, in a second step, break down the categories once more, e.g. for labour costs, distinguish between direct and indirect labour costs.
The way my company does this right now is using an Excel sheet with several macros where the criteria are hardcoded in the VBA code to loop through the whole list, check if a row matches the criteria for a certain cost category, and if it does, copy the row to a new sheet (having one new sheet for each category), then using a second macro to break down the categories, assigning values to the “description”-column which is empty initially based on another set of criteria. Then, pivot tables are used on each of the new sub-datasets to calculate sums for each sub-category. These sums are finally used as input data for a management report (as seen in the image above) which is the ultimate goal of this whole ordeal.
Now, not only does this seem overly complicated to me, and running the macros and manually adjusting the input ranges for the pivot tables takes forever, but the criteria for allocating the costs can also change quite often, and opening the VBA editor and changing the code is not really user-friendly. The initial idea was to maybe include some helper columns (one for each cost category) and somehow create an indicator variable that is one if the entry falls in the respective category and zero otherwise, and then use these columns for further calculations (e.g. for SUMIFS and such).
The problem is that a) combinations of account number and type are not unique, so that one account number can go along with various types, and one type can go along with various account numbers, so the criteria can be something like C6 = 544300 OR 544700 AND D6<>110246, etc. And b) criteria can change, meaning sometimes a new account number or type is added that also has to be assigned to an already existing category such as labor costs, which would make it necessary to include that criterion in all the formulas for that particular cost category. So, is it possible to somehow create a criteria table for each category that serves as input for some sort of IF/SUMIF or lookup function?
Short version: I have a data set (can range from 5000 to up to 100000 rows, 8 columns) where I want to perform a lookup, but based on various criteria. And, in addition to that, it would be nice if the criteria could somehow be drawn from a separate list so that they can be modified fairly easily without having to change the formula itself. Is there a way to do so? Or do you think using the advanced filter might be the most suitable option?
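The criteria-table idea can be illustrated outside Excel with a short Python sketch: the rules live in a list that can be edited without touching the logic, and each journal entry is checked against them in order. This is a sketch under assumptions, not the company's actual allocation: the account numbers 544300, 544700 and the type 110246 come from the example criterion above, but which category they belong to, the second rule, and all names are hypothetical.

```python
# Criteria table: one entry per rule, editable without changing the code below.
# An entry matches a rule when its account is listed and its type is not excluded.
CRITERIA = [
    {"category": "Labour", "accounts": {544300, 544700}, "excluded_types": {110246}},
    {"category": "Marketing", "accounts": {610000}, "excluded_types": set()},
]


def assign_category(account, type_code, criteria=CRITERIA):
    """Return the category of the first rule the journal entry satisfies."""
    for rule in criteria:
        if account in rule["accounts"] and type_code not in rule["excluded_types"]:
            return rule["category"]
    return "Unassigned"


def totals_by_category(entries, criteria=CRITERIA):
    """Sum the amounts of (account, type_code, amount) entries per category."""
    totals = {}
    for account, type_code, amount in entries:
        category = assign_category(account, type_code, criteria)
        totals[category] = totals.get(category, 0) + amount
    return totals


# Hypothetical journal entries: (account number, type, amount).
sample_entries = [
    (544300, 1, 100),
    (544700, 2, 50),
    (544700, 110246, 30),
    (610000, 5, 20),
]

if __name__ == "__main__":
    print(totals_by_category(sample_entries))
```

Adding a new account number to an existing category then means editing one row of the table, which is the property the question is after.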
I have imported a bunch of data using PowerQuery into a single table and am building dashboard reporting. I have been using Pivot Tables to build my reports, which has worked fine so far.
However, I've come to a point where I simply want to show the count of multiple columns (calculated fields). So I have columns A, B, C and D, and want to show the count of each. But I don't want them to be subsets (or children) of one another, and I don't want to build a bunch of Pivot Tables (the file is already getting pretty big, and I want them row by row for easy viewing). Any suggestions?
Also, I am using the "Columns" field already to show the counts by certain weeks (week one, week two, etc.).
Thanks,
-A
Thanks for the follow-up. Within PowerPivot, I have four calculated fields/columns that are True/False for each column. I want to know how many times each of those columns were marked "True" (I can rename the "True" field to distinguish between which field it's referencing). But I don't want four pivot tables. Right now I can only think of making four pivot tables, filtering out the false for each one, then hiding the rows so the "True" values stack on top of one another. If I put all the four fields together in the same Pivot, the three below the first become subsets. I don't want subsets, just occurrence counts.
Does this help provide clarification?
If I understand you correctly, here's an example that shows what you're trying to achieve:
The table on the left has the TRUE/FALSE entries and the PivotTable on the right just shows the number of true items in each of those columns.
The format of the DAX measure to produce these count totals is:
[Count of A]=CALCULATE(COUNTROWS(PetFacts),PetFacts[A]=TRUE)
(Apologies to any parrot owners who may get upset that I have inadvertently re-classified their pets as cold-blooded!)
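For readers less familiar with DAX, the measure counts the rows of the table that remain after keeping only those where the column is TRUE, and one such measure is defined per column. The Python sketch below mirrors that logic on a small list of rows; the column names A to D follow the question, and the sample rows are made up.

```python
def count_true(rows, columns):
    """Count, for each column independently, how many rows hold True."""
    return {col: sum(1 for row in rows if row[col] is True) for col in columns}


# Hypothetical rows with four TRUE/FALSE columns.
sample_rows = [
    {"A": True, "B": False, "C": True, "D": False},
    {"A": True, "B": True, "C": False, "D": False},
    {"A": False, "B": True, "C": True, "D": False},
]

if __name__ == "__main__":
    print(count_true(sample_rows, ["A", "B", "C", "D"]))
```

Because each count is computed on its own, no column ends up nested under another, which is the behaviour the separate measures give in the PivotTable.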