Apportioning data in Excel such that it matches the summary data

I want to apportion values across rows (product groups) and columns (stores) in such a way that the row totals as well as the column totals match.
For example:
I have store totals as follows,
product-group-wise totals as follows,
and store/product-group-wise data as given below.
I want to apportion in such a way that my store-wise numbers match (in link1) and my product-group-wise numbers also match (in link2).
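This kind of two-way apportionment is usually solved with iterative proportional fitting (the RAS method): alternately rescale each row to its row total and each column to its column total until both sets of totals agree, which requires the grand total of the row targets to equal the grand total of the column targets. In Excel this can be set up with helper rows/columns and iterative calculation; purely as a sketch of the logic, with an invented seed matrix and invented totals, the same idea in Python looks like this:

# Minimal sketch of iterative proportional fitting (RAS); the seed matrix
# and the target totals below are invented examples, not the asker's data.
def ipf(seed, row_totals, col_totals, iterations=100, tol=1e-9):
    m = [row[:] for row in seed]          # work on a copy of the seed
    for _ in range(iterations):
        # Scale each row so it sums to its target row total.
        for i, target in enumerate(row_totals):
            s = sum(m[i])
            if s:
                m[i] = [v * target / s for v in m[i]]
        # Scale each column so it sums to its target column total.
        for j, target in enumerate(col_totals):
            s = sum(row[j] for row in m)
            if s:
                for row in m:
                    row[j] *= target / s
        # Stop once the row totals still line up after the column pass.
        if all(abs(sum(row) - t) < tol for row, t in zip(m, row_totals)):
            break
    return m

# Example: 2 product groups x 3 stores; row targets and column targets both sum to 210.
seed = [[10, 20, 30],
        [40, 50, 60]]
for row in ipf(seed, row_totals=[70, 140], col_totals=[55, 70, 85]):
    print([round(v, 2) for v in row])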

Related

Excel Matching Customer Orders by Item and Quantity

Brief:
I have a large dataset containing individual customer orders by item and quantity. What I'm trying to do is get Excel to tell me which order numbers contain exact matches (in terms of items and quantities) to each other. Ideally, I'd like a tolerance of, say, 80% similarity that I can adjust to purpose, but I'll take anything to get me off the ground.
Existing Solution:
At the moment, I've used concatenation to pair item with quantity, pivoted, and then put the order references as columns and the concatenated pairs as rows, with quantity as the data (sorted by quantity descending). I'm visually scrolling across/down to find matches and then manually stripping them into my main data where necessary. I have about 2,500 columns to check, so I was hoping to find a more suitable solution with Excel doing the legwork on identification.
INDEX/MATCH works for cross-referencing a match on the concatenation, but of course the order numbers (which are unique) are different, so it's not giving me matches ACROSS orders.
Fingers crossed!
EDIT:
Data set with outcomes
As you can see, the bottom order reference has no correlation to the orders above it, so it is not listed as a match; three of the other orders are identical, and one has a slightly different item but mostly matches.
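Purely to illustrate the matching logic outside Excel (the order references and lines below are invented), one way to express it is to treat each order as a set of (item, quantity) lines and score every pair of orders by how many lines they share, flagging pairs at or above an adjustable threshold:

# Sketch: match orders by their (item, quantity) lines with an adjustable
# similarity threshold. The example orders are made up.
from itertools import combinations

orders = {
    "ORD-1": {("WIDGET", 5), ("BOLT", 10), ("NUT", 10)},
    "ORD-2": {("WIDGET", 5), ("BOLT", 10), ("NUT", 10)},    # identical to ORD-1
    "ORD-3": {("WIDGET", 5), ("BOLT", 10), ("WASHER", 10)}, # one line differs
    "ORD-4": {("PIPE", 3)},                                 # unrelated
}

def similarity(a, b):
    # Share of distinct lines the two orders have in common (Jaccard).
    return len(a & b) / len(a | b) if (a or b) else 1.0

threshold = 0.8   # flex this to purpose
for (ref1, lines1), (ref2, lines2) in combinations(orders.items(), 2):
    score = similarity(lines1, lines2)
    if score >= threshold:
        print(f"{ref1} matches {ref2} ({score:.0%})")

The similarity measure is a choice: Jaccard is only one option, and a looser measure (shared lines divided by the smaller order's line count) would rate the "mostly matches" case higher.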

Excel: Create lookup based on list of criteria

I have a bit of an Excel problem and would be grateful for any suggestions.
Long version: I have a dataset with raw data representing journal entries. The structure of this dataset can be seen here:
Now, what I want to achieve is to assign each row/each journal entry to a cost category (marketing, personnel, IT, depreciation, …) based on the values in the account number, type, and cost center columns, and, in a second step, break down the categories once more, e.g. for labour costs, distinguishing between direct and indirect labour costs.
The way my company does this right now is with an Excel sheet and several macros in which the criteria are hardcoded in the VBA code. The first macro loops through the whole list, checks whether a row matches the criteria for a certain cost category, and if it does, copies the row to a new sheet (one new sheet for each category). A second macro then breaks down the categories, assigning values to the initially empty “description” column based on another set of criteria. Pivot tables are then used on each of the new sub-datasets to calculate sums for each sub-category, and these sums finally serve as input data for a management report (as seen in the image above), which is the ultimate goal of this whole ordeal.
Now, not only does this seem overly complicated to me, and running the macros and manually adjusting the input ranges for the pivot tables takes forever, but the criteria for allocating the costs can also change quite often, and opening the VBA editor and changing the code is not really user-friendly. The initial idea was to include some helper columns (one for each cost category) and somehow create an indicator variable that is one if the entry falls in the respective category and zero otherwise, and then use these columns for further calculations (e.g. for SUMIFS and such).
The problem is that a) combinations of account number and type are not unique: one account number can go along with various types and one type can go along with various account numbers, so the criteria can be something like C6 = 544300 OR 544700 AND D6 <> 110246, etc. And b) the criteria can change, meaning that sometimes a new account number or type is added that also has to be assigned to an already existing category such as labour costs, which would make it necessary to include that criterion in all the formulas for that particular cost category. So, is it possible to somehow create a criteria table for each category that serves as input for some sort of IF/SUMIF or lookup function?
Short version: I have a dataset (anywhere from 5,000 up to 100,000 rows, 8 columns) in which I want to perform a lookup based on various criteria. In addition, it would be nice if the criteria could be drawn from a separate list so that they can be modified fairly easily without having to change the formula itself. Is there a way to do this? Or do you think using the Advanced Filter might be the most suitable option?
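A criteria table can work. Just to sketch the idea outside Excel (all account numbers, cost centers and amounts below are invented), keep the rules as data, one row per category with its included accounts and excluded cost centers, and evaluate each journal entry against that list; in the workbook the same table can feed SUMIFS/SUMPRODUCT or Power Query, so adding a new account number only means adding a row to the table rather than editing formulas or VBA:

# Sketch of criteria-table-driven categorisation; the accounts, cost
# centers and entries are invented examples.
criteria = [
    {"category": "Labour costs", "accounts": {544300, 544700}, "exclude_cc": {110246}},
    {"category": "Marketing",    "accounts": {601100, 601200}, "exclude_cc": set()},
]

entries = [
    {"account": 544300, "cost_center": 110200, "amount": 1000.0},
    {"account": 544700, "cost_center": 110246, "amount": 500.0},  # excluded cost center
    {"account": 601100, "cost_center": 200100, "amount": 250.0},
]

def categorise(entry):
    # Return the first category whose criteria the entry satisfies.
    for rule in criteria:
        if entry["account"] in rule["accounts"] and entry["cost_center"] not in rule["exclude_cc"]:
            return rule["category"]
    return "Unassigned"

totals = {}
for e in entries:
    cat = categorise(e)
    totals[cat] = totals.get(cat, 0.0) + e["amount"]
print(totals)  # {'Labour costs': 1000.0, 'Unassigned': 500.0, 'Marketing': 250.0}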

Compare two tables then sort them in Excel

I have two tables with the same data but in different rows. I want to sort them so that they line up with each other, each duplicate row in front of its duplicate.
attached photo
In a new worksheet, copy the code data from one table and append to that a copy of the code data from the other. Apply Remove Duplicates to that column and sort ascending.
Now use that sheet to look up (with VLOOKUP) the Description, UoM and Unit Price from one of your tables into three separate columns (say 2, 3, 4) and look up the same fields from the other of your tables into a further three columns (say 5, 6, 7).
Wrap both formulae in IFERROR(....,"") to reduce noise.
I take it any numbering will be applied independently in the new sheet (i.e. No. is not required to be copied there).
Incidentally, you have a lot of unconventional hyphens (e.g. L-80 is never normally written other than as L80), m as a unit of measure for OCTG leads to many problems, and with competent staff a structured catalogue could be advisable for high-value stock and long-term storage.
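For what it's worth, the same line-up is easy to express outside Excel as well; the sketch below (column names assumed from the question, rows invented) does the equivalent of the de-duplicated code list plus the two sets of VLOOKUPs with a single outer merge in pandas:

# Sketch: align two catalogues on their Code column so each duplicate row
# ends up opposite its counterpart. Column names and rows are invented.
import pandas as pd

table1 = pd.DataFrame({
    "Code": ["A100", "B200", "C300"],
    "Description": ["Casing L80", "Tubing", "Drill pipe"],
    "Uom": ["m", "m", "m"],
    "Unit Price": [120.0, 80.0, 200.0],
})
table2 = pd.DataFrame({
    "Code": ["B200", "C300", "D400"],
    "Description": ["Tubing", "Drill pipe", "Collar"],
    "Uom": ["m", "m", "ea"],
    "Unit Price": [82.0, 200.0, 55.0],
})

# An outer merge keeps every code from either table; codes missing from one
# side show as NaN, playing the same role as the IFERROR(...,"") wrapper.
aligned = table1.merge(table2, on="Code", how="outer",
                       suffixes=(" (table1)", " (table2)")).sort_values("Code")
print(aligned.to_string(index=False))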

How to get Excel Pivot 'Summarize by' to return actual data values not sum, avg, etc?

I am using Excel PowerPivot with data in two separate tables. Table 1 has ITEM-level data with a BRAND characteristic. Table 2 has BRAND-level data. The two tables are linked by the BRAND key. The measure I am using is non-additive, i.e. the sum of the ITEMS does not equal the BRAND. The pivot is set up with ITEMS nested under BRANDS in the rows and the Measure in the column.
Excel assumes that I want to summarize ITEM to a BRAND level by applying SUM, MAX, MIN, AVG, etc. I would like to return the actual values from the appropriate ITEM or BRAND level table and not apply any calculations to the values. Is this possible?
If what you are effectively trying to do is produce a different result for the Brand rows (e.g. blank()) then the answer is to write a further measure that does a logic check to determine whether or not the row in question is an ITEM or a BRAND.
= IF(HASONEVALUE(table1[Item]), [Measure], BLANK())
Bear in mind that this will work for your current pivot but may not be adaptable to all pivots.
This assumes that you have explicitly created a measure called [Measure] and are not just dragging the numeric column into the Values box. If not, you can create the initial [Measure] with something like this:
= SUM(table1[Value])
where Value is the column you want to use in the measure. Although this uses a SUM, if it relates to a single item that has a single row in the table, it will give the desired result.

MultiGet or multiple Get operations when paging

I have a wide column family used as a 'timeline' index, where column names are timestamps. In order to prevent hotspots, I shard the CF by month so that each month has its own row in the CF.
I query the CF for a slice range between two dates and limit the number of columns returned based on the page's records per page, say to 10.
The problem is that if my date range spans several months, I get 10 columns returned from each row, even if there are 10 matching columns in the first row alone, which would already satisfy my paging requirement.
I can see the logic in this, but it strikes me as a real inefficiency if I have to retrieve redundant records from potentially multiple nodes when I only need the first 10 matching columns regardless of how many rows they span.
So my question is: am I better off doing a single Get operation on the first row, then another Get operation on the second row if my first call doesn't return 10 records, continuing until I have the required number of records (or hit the row limit), or should I just accept the redundancy and throw away the unneeded records?
I would sample your queries and record how many rows you needed to fetch for each one in order to get your 10 results and build a histogram of those numbers. Then, based on the histogram, figure out how many rows you would need to fetch at once in order to complete, say, 90% of your lookups with only a single query to Cassandra. That's a good start, at least.
If you almost always need to fetch more than one row, consider splitting your timeline by larger chunks than a month. Or, if you want to take a more flexible approach, use different bucket sizes based on the traffic for each individual timeline: http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra (see the "Variable Time Bucket Sizes" section).
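To make that concrete, here is a rough sketch of the idea in Python; fetch_slice() is a hypothetical stand-in for whatever get/multiget slice call your Cassandra client exposes, and the sampled counts are invented:

# Pick how many month rows to request up front so that ~90% of lookups are
# satisfied by one batch, then top up row by row only when that falls short.
import math

def percentile(samples, pct):
    # Smallest sample value that covers pct of the sorted samples.
    ordered = sorted(samples)
    return ordered[max(0, math.ceil(len(ordered) * pct) - 1)]

# Rows needed per query, recorded from sampled production lookups (invented).
sampled_rows_needed = [1, 1, 1, 2, 1, 1, 2, 3, 1, 1]
rows_up_front = percentile(sampled_rows_needed, 0.90)   # -> 2 for this sample

def fetch_page(month_keys, start, end, page_size, fetch_slice):
    # Collect up to page_size columns across the month shard rows, newest first.
    columns = []
    # These first rows would be requested together in one multiget, accepting
    # some redundant columns; shown as sequential calls for brevity.
    for key in month_keys[:rows_up_front]:
        columns.extend(fetch_slice(key, start, end, limit=page_size))
    # Fall back to one extra get per remaining row only if the batch was short.
    for key in month_keys[rows_up_front:]:
        if len(columns) >= page_size:
            break
        columns.extend(fetch_slice(key, start, end, limit=page_size - len(columns)))
    return columns[:page_size]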
