Match 2 columns based on a % difference within the value - excel

I am looking for a method to match two excel tables.
I basically have two Systems, where the values do not exactly match only some IDs. The values in system 2 are usually 10-20% different from system 1.
Here is how the sheet looks like:
I tried to use vlookup on the IDs and then going hand-by-hand through the values if they match, by using the filter with the ID. However, this takes extremely long and is very cumbersome.
Any recommendation how to match these two tables, much more easily?
I really appreciate your replies!

If you look at a formula for G3 you would be involving D3:E3 and A:B (where A10:B10 are the matching values).
When someone states that they are looking for a percentage, it is helpful to know "a percentage of what...?". You receive a different result if the calculation is ABS(12 - 15)/15 instead of ABS(12 - 15)/12. One may within tolerance and the other may not.
In any event, the formula for G3 would be something like,
=ABS(E3-VLOOKUP(D3,A:B, 2, FALSE))/E3
... or,
=ABS(E3-VLOOKUP(D3,A:B, 2, FALSE))/VLOOKUP(D3,A:B, 2, FALSE)
That produces a result of 0.25% or 0.20% depending on how you calculate the percentage. You could wrap that in an IF statement for a YES/NO text result or use a custom number format like [Color3][>0.2]\NO;;[Color10]\Y\E\S;# which will show a red NO for values greater than 20% and a green YES for values between 0 and 20%. Negative values do not have to be accounted for as the ABS removes them from consideration.
       
I've only reproduced a minimum of your sample data for demonstration purposes but perhaps you can get an idea on how to proceed from that.

Related

Trying to find count of missed targets between 2 sheets

I am trying to work through this problem where I have 2 sheets (seen here as 2 sections for simplicity) and I am trying to count how many shipments from sheet 1 were below the SLA target in sheet 2.
The formula I tried was
IF(A21=INDEX(A2:A11,MATCH(A21,A2:A11,0),COUNTIF(C2:C11, ">="&C21))
I have tried multiple iterations of these parameters and have it returning some very inconsistent and totally wrong results. The output I am expecting is
0,0,1,3,0,0
I know this is going to probably be some kind of boolean algebra but I honestly do not understand how that system works. I have tried looking it up but I dont think i am doing it right.
Sample Data
You could use a simple boolean multiplier like this:
=SUM((A21=$A$2:$A$11)*($C$2:$C$11<C21))
This checks if the ID matches A21 and then multiplies these TRUE/FALSE results times a second boolean array that checks if the shipped amounts (C2:C11) are less than the SLA standard C21. NB: If it is really less than or equal, then use =SUM((A21=$A$2:$A$11)*($C$2:$C$11<=C21)). This generates a series of 1's for each value that matches the conditions and then SUM adds those one's up.
Using your example for ID Key 4/Item Name D, you would get:
SUM({FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE} * {TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE})
This gets coerced into:
SUM({0,0,0,0,0,1,1,1,0,0})

Deal with Ties when Using Index/Match

I'm currently pulling the top (5) number of numerical values from one sheet and inputting them into a different sheet. Each number is within its own column and there is a name matching that column, EX:
And so, having a tie is common with the data that I'm working with, so it nearly deprecates my formulas.
For getting the name:
=INDEX('Total Cases by Categories'!$B$18:$B$50, MATCH(LARGE('Total Cases by Categories'!$H$18:$H$50, A39),'Total Cases by Categories'!$H$18:$H$50, 0))
For getting the numerical value associated with the name:
=LARGE('Total Cases by Categories'!$H$18:$H, A39)
And so, when there are 2 people with the same numerical value associated within a category, then that person appears twice, I assume because of their position within the sheet.
So something like this happens:
So in the event of a tie, I would want to list both names that have the same amount of points instead of the first name that shows up with the duplicated value.
Any help would be appreciated!
Actually, LARGE will give you both of tied names. It's MATCH that can't look beyond the first. To the best of my knowledge there is no way around that (the difficult one being not to use MATCH). Therefore the solution is to have no ties.
This is achieved with helper columns that contain no identical numbers. This can be achieved by adding an insignificant decimal. Since you are dealing with integers, adding 0.1 would be insignificant for your purposes but 13.1 is different from 13.2. If you need to extract the "real" number from this use INT(13.2).
Using the row number to generate an insignificant decimal is popular for this purpose. In row 1 ROW()/10 will return 0.1. But in row 10 ROW()/10 will return 1.0 which isn't an insignificant number anymore. Therefore you have to work with ROW()/100 or an even larger divisor, depending upon how many rows you have. Try ROW()/10^6 - any decimal will do the tie-breaking job.
You may not like that using ROW() will list tied participants in the order in which they appear in the worksheet. The differentiating decimals can be created by any other means that doesn't create ties in itself.
Normally, the helper columns with the decimals added will be hidden. They contain a formula like =D23 + (ROW()/10000) which manages itself. You can then use that column for the MATCH function to list all participants in the order of LARGE using the helper column or the original. Just make sure that MATCH refers to the helper column.

FIND formula, Correlation Table

I have a correlation table like so:
I want to make a static formula that can produce the following outcome for a large correlation table. There are 2 parts, the #, and the text. I can imagine the # and text needs to be parsed and then the formula to be applied.
IF # 's are equal, produce 1, if not produce .35
IF text are equal produce 1, if not product .99
if both are equal produce 1.
I've tried something like IF(A2=B1,1,.99) thus far but this misses alot of what i'm looking for.
Any help would be appreciated.
Use this formula:
=IF(LEFT(B$1,FIND("_",B$1)-1)=LEFT($A2,FIND("_",$A2)-1),1,0.35)*IF(MID(B$1,FIND("_",B$1)+1,LEN(B$1))=MID($A2,FIND("_",$A2)+1,LEN($A2)),1,0.99)
Then copy over and down.

Ranking with subsets

I'm trying to rank values and have managed to work out how to sort ties. My data looks at the total number of entries, ranks based on that and if there is a tie it looks to the next column of values to sort them out. However, I have two classes (East and West I've called them) of data within my dataset and want to rank them both separately (but stick to the rules above). So, if I had seven entries, 3 of them West and 4 of the East, I want West to have ranking 1,2,3 based on all the values that lie in that subset and East would have ranking 1,2,3,4. Can you explain what your formula is doing so I can understand how to apply your answer better in the future.
Effectively I'm asking what formula needs to go in achieve my result.
Cheers
Paul
There are a few related ways to do this, most involving SUMPRODUCT. If you don't like the solution below and would like to research other ways/explanations, try searching for "rankif".
The function looks up the Class and Value columns and, for every value in those columns, returns a TRUE or 1 if the current Class is a match AND if its Value is larger than the current Value, False or 0 if otherwise. The SUM adds up all these 1s, and the 1+ is for decoration. Remember to enter as an array formula using Ctrl+Shift+Enter before dragging down.
I used the array formula and SUM above to explain, but the following also works and might even be faster since it's not an array formula. It's the same idea, except we hijack SUMPRODUCT's ability to spit out a single value from an array.
=1+SUMPRODUCT(($A$2:$A$8=A2)*($B$2:$B$8>B2))
EDIT
To extend the rank-if, you could add more subsets to rank by multiplying more conditions:
You can also easily add tiebreakers by adding another SUMPRODUCT to treat the ties as an additional subset:
The first SUMPRODUCT is the 'base rank', while the second SUMPRODUCT is tiebreaker #1.

Using MIN function in excel when using mm:ss.00 fields and ignoring 00:00.00

Does anyone know how I can get excel to look at the following fields, all formatted in mm:ss.00 and return the lowest time. I am using this to calculate PB's - personal best times - in a sports club race sheet.
The formula I am using is
=MIN(J5,(U5),(AE5),(AO5),(AY5),(BI5),(BS5),(CC5),(CM5),(CW5),(DG5),(DQ5),(EA5),(EK5),(EU5))
The problem I have at the moment is that it is including 00:00.00 values in the cells and returning a MIN value of 00:00.00.
Any suggestions would be welcomed.
many thanks
Nigel
Use the following:
=SMALL((J5,U5,AE5,AO5,...),COUNTIF((J5,U5,AE5,AO5,...),0)+1)
COUNTIF counts the amounts of 0 (you maybe need to adjust this value based on your formatting, but it should work). SMALL returns the n-smallest number of the given matrix, with n being the counted value + 1.
Therefore if no 0 is in the matrix, you get the 1st-smallest (aka the smallest), with one 0 you get the 2nd-smallest and so on. Maybe you need to add a check if every value is 0, if that could happen, as in that case SMALL would try to retrieve the value on position list_size+1 of the list, which of course isn't present.

Resources