Precision Error With MATCH Function In Excel

I have a data frame, say from A1:C100, where each cell is a value (not derived from any formula) that happens to be stored as a percentage. I copy a column of the data frame (say Column A), paste its values into, say, Column E, and sort it from largest to smallest. I then use =MATCH(E1, $A$1:$A$100, 0), which works as expected and returns the correct row.
However, if I then add a constant to every value in E, say column F is =E1+1 filled down, and use =MATCH(F1-1, $A$1:$A$100, 0), about 90% of the values still match correctly, but some return #N/A.
How can I work around this without changing the original data frame? I have already tried rounding the data to various precision points (for example =MATCH(ROUND(F1-1,4), $A$1:$A$100, 0)) and using non-exact matching (for example =MATCH(F1-1, $A$1:$A$100, 1) or even something like =MATCH(F1-.999, $A$1:$A$100, -1)), but no luck.
Any other suggestions, or has anyone else encountered something like this? What is the underlying issue?

The problem with your precision correction is that it's only being applied to the 'lookup' component of the MATCH function. To get a more accurate match rate, I'd advise you either round the values being searched at source, or simply deploy the following:
=MATCH(ROUND(range_1,6), ROUND(range_2,6), 0)
where you would substitute range_1 and range_2 with the relevant/respective ranges per your example/question.
Note - the advantage of doing this 'at source' (i.e. inserting an additional column, say range_3, where range_3 = ROUND(range_2,6), then substituting range_3 for range_2 in the formula above) is computational speed: it is far quicker, especially if range_2 is substantially long (e.g. >30k rows). The reason should be obvious: a calculation inside the MATCH function is repeated for every match/cell evaluated, whereas the 'at source' version applies the rounding only once across the rows/cells in question.
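To make this concrete for the layout in the question (a sketch only - the helper column and cell references are my own assumption, and the underlying issue is simply binary floating-point: F1-1 does not always reproduce the original stored value bit-for-bit, so an exact MATCH fails), insert the rounded source values in a spare column, say G, and match against that:
G1: =ROUND(A1,6) (copied down to G100)
H1: =MATCH(ROUND(F1-1,6), $G$1:$G$100, 0)
This keeps the original data in A1:C100 untouched and, per the 'at source' point above, only performs the rounding on column A once.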
Hope this makes sense, and best of luck with your endeavours.
PS - change round(range,6) to round(range,4) if necessary. I usually like 6 dps when using something like this in conditional formatting for validation reasons...
Ta,
J

Related

How to sort rows in Excel without having repeated data together

I have a table of data with many repeated values.
I have to sort the rows randomly, but without having identical names next to each other, as shown here:
How can I do that in Excel?
Perfect case for a recursive LAMBDA.
In Name Manager, define RandomSort as
=LAMBDA(ζ,
LET(
ξ, SORTBY(ζ, RANDARRAY(ROWS(ζ))),
λ, TAKE(ξ, , 1),
κ, SUMPRODUCT(N(DROP(λ, -1) = DROP(λ, 1))),
IF(κ = 0, ξ, RandomSort(ζ))
)
)
then enter
=RandomSort(A2:B8)
within the worksheet somewhere. Replace A2:B8 - which should be your data excluding the headers - as required.
If no solution is possible then you will receive a #NUM! error. I didn't get round to adding a clause to determine whether a certain combination of names has a solution or not.
This is just an attempt, because the question might need clarification or more sample data to understand the actual scenario. The main idea is to generate a random list from the input, then distribute it evenly by name. This ensures no repetition of consecutive names; it is not the only possible way of sorting (the problem may have multiple valid combinations), but it is a valid one. The solution is volatile (every time Excel recalculates, a new output is generated) because RANDARRAY is a volatile function.
In cell D2, you can use the following formula:
=LET(rng, A2:B8, m, ROWS(rng), seq, SEQUENCE(m),
idx, SORTBY(seq, RANDARRAY(m,,1,m, TRUE)), rRng, INDEX(rng, idx,{1,2}),
names, INDEX(rRng,,1), nCnts, MAP(seq, LAMBDA(s, ROWS(FILTER(names,
(names=INDEX(names,s)) * (seq<=s))))), SORTBY(rRng, nCnts))
Here is the output:
Update
Looking at @JosWoolley's approach, the generation of the random sorting can be simplified so that the resulting formula could be:
=LET(rng, A2:B8, m, ROWS(rng), seq, SEQUENCE(m), rRng,SORTBY(rng, RANDARRAY(m)),
names, TAKE(rRng,,1), nCnts, MAP(seq, LAMBDA(s, ROWS(FILTER(names,
(names=INDEX(names,s)) * (seq<=s))))), SORTBY(rRng, nCnts))
Explanation
The LET function is used for easy reading and composition. The name idx represents a random sequence of the input index positions. The name rRng represents the input rng sorted randomly. This sorting doesn't ensure consecutive names are distinct.
In order to ensure consecutive names are not repeated, we enumerate repeated names (nCnts). We use MAP for that. This is a similar idea to the one provided by @cybernetic.nomad in the comment section, but adapted to an array version (we cannot use COUNTIF because it requires a range). Finally, we use SORTBY with the MAP result (nCnts) as its by_array argument, so that names are distributed evenly and no consecutive names will be the same. Every time Excel recalculates you will get an output with the names distributed evenly in a different way.
Not sure if it's worth posting this, but I might as well share the results of my research, such as it is. The problem is similar to that of re-arranging the characters in a string so that no identical characters are adjacent. The method is just to insert whichever of the remaining characters (names) has the highest frequency at that point and is not the same as the previous character, then reduce its frequency once it has been used. It's fairly easy to implement this in Excel, even in Excel 2019. So if the initial frequencies are in D2:D8, computed for convenience using COUNTIF:
=COUNTIF(A$2:A$8,A2)
You can use this formula in (say) F2 and pull it down:
=INDEX(A$2:A$8,MATCH(MAX((D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1)),(D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1),0))
and similarly in G2 to get the ages:
=INDEX(B$2:B$8,MATCH(MAX((D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1)),(D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1),0))
I'm fairly sure this will always produce a correct result if one is possible.
HOWEVER, there is no randomness built into this method. You can see, if I extend it to more data, that in the first several rows the most common name simply alternates with the other two names:
Having said that, this is a bit of a worst case scenario (a lot of duplication) and it may not look too bad with real data, so it may be worth considering this approach along with the other two methods.

When summing a value calculated by multiplying other values, the value is included as a string; however, when directly referenced it behaves as a value

I am fairly new to Excel and only using it for a hobby; however, I have noticed that when I attempt to add an entire column of values together, there is one value that is not included in the addition. It is also the only 'non-defined' value in the column (i.e. it is calculated using a formula rather than being input directly). When I go to edit the formula I see that the value in question (24.99 in the screenshot below) seems to be a string, given the speech marks either side of it, hence why it isn't in the addition.
I am confused as when I reference only that value in a sum, the value is included in the sum, as seen below:
Before it is suggested: I have experimented with using different data types for the value in the cell, including 'Currency', which is what the rest of the cells in the column are, as well as 'Number', 'Accounting', and 'General'. One thing that is strange is that the £ symbol never appears at the front of the number, no matter what data type it's cast as.
For those that are curious, the formula that is used to get the number in the cell B9 is below, where H13 is a number in the form 'Currency', and J13 is a number in the form 'General'.
=IMPRODUCT(H13,J13)
Per this link, a common way to convert text formatted numbers to, well, 'numbers' is to apply a mathematical operation (e.g. if you have ="100", multiplying by 1 will yield 100 - see screenshot)...
Cell constituents
Result
See here for Microsoft's own words "Numbers that are stored as text can cause unexpected results.".
In your example (which I've successfully replicated to aid this solution - see below), taking 0-B9 (per the screenshot below, courtesy of your question above) yields -24.99 in this case, as Excel interprets cell B9 as an operand upon which a -1× operation is applied:
Put another way, you can yield exactly the same (or rather, 'anticipated') result via the summation formula as follows:
Included in the above figure is a depiction of how I have become familiarised with this 'conversion via mathematical operation' trick: it shows how taking 0 + False = 0 (or rather, per the depiction above, 0 + True = 1).
Simple example
Try the following to convince yourself:
Enter ="0", ="1", and ="2" in cells A1:A3 respectively. In the adjacent column/cells (B1:B3), enter =1-A1/A2/A3 as the case may be, i.e.:
Results
Advanced example
Now go ahead and enter (one per cell) 3 a's followed by 'b' and 'c' (lowercase) in cells A1:A5. In cell B1 enter the formula =A1:A5="a" (assuming you have Office 365; otherwise use Ctrl+Shift+Enter). This will return an array of corresponding TRUE and FALSE values for any character (column A) that satisfies the condition ="a", videlicet:
Of course, summing column B would be futile (well, it returns 0):
However, applying an innocuous mathematical operation '1 *' makes all the difference in the world: you now have an elegant (but time / calculation expensive) way to calculate the count of cells that satisfy the criteria '="a"' - in this case, 3!
Of course, the same could be achieved using any 'neutral' calculation - for instance, adapting the apparent anomaly you've stumbled upon, using =SUM(0+(A1:A5="a")) yields the self-same count of 3 ☺
i.e.
Trust this delineates/clarifies (with sufficient detail) the reason why a straight summation (which lacks any 'special' operations in its own right) does not include text values, whereas taking 0-B9 (per your question) does (i.e. via the implicit 0-1*B9 operation with respect to B9).
using different data types for the value in the cell, including 'Currency'
What you changed is the number format, not data type.
It's an important difference in Excel: you can change the format of cells containing numbers; however, if a cell contains text - regardless of whether it actually represents a number - changing the number format will not affect it.
=IMPRODUCT(H13,J13)
This function is used to multiply complex numbers. You could use =H13*J13
when I reference only that value
Functions taking multiple values (SUM, AVERAGE, MIN...) are designed to ignore everything not stored as a number.
On the other hand, if you explicitly include text in a calculation, Excel will try to convert it (e.g. 1 + "9" = 10).
That's what happens in your example too: you don't just include your string in SUM, it's first added to 0.
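In other words (a sketch only - the cell references assume the column being summed is B2:B9, as in the screenshots), you can either replace the text-producing formula or coerce the whole range inside the sum:
B9: =H13*J13
=SUMPRODUCT(0+B2:B9)
The second form applies the 0+ conversion to every cell before summing, so it only works if each cell in the range holds something numeric (even if stored as text); a stray non-numeric string would give #VALUE!.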

Excel interpolation with results in situ

Extrapolation in Excel is easy: have a list of numbers (and optionally their paired "X-values"), and it can easily generate further entries in the list with the GROWTH() function.
GROWTH() works for interpolation too: you just need to tell it the intermediate X-values that you want it to calculate for. My problem with it is the appearance of the data in the spreadsheet. Here's an example:
Say I have some inputs, and through some process get some outputs. Only, there were gaps in the experiment so no outputs were generated for some values:
Out of curiosity, I copied the data to the right, and used Excel's "Extend with Growth Trend": I highlighted the first two entries (only), then right-click-dragged-down the little square over the next four cells (overriding the final value there) and chose "Growth Trend" in the context menu. To remind myself that the values were Excel-generated, I gave them a grey background:
Hmm. The generated values (unsurprisingly) aren't a good extrapolation, since they don't factor in the later value. It's out by over 40%! Also note that this Extend feature of Excel is an ease-of-input mechanism, not a calculation tool in its own right - Excel enters the data as raw numbers (to multiple decimal places).
So I formalised the Extend column by using the GROWTH() function - again only factoring in the first two values, but also using their paired X-values and the desired interpolation entry as parameters:
D4: =GROWTH(D$2:D$3,$A$2:$A$3,$A4)
D5: =GROWTH(D$2:D$3,$A$2:$A$3,$A5)
D6: =GROWTH(D$2:D$3,$A$2:$A$3,$A6)
Thankfully, the results mimic those of the previous column (Microsoft use the same mechanism for both features!). I didn't overwrite the last entry, since after all it has the value that I actually want! The fact that the calculated values are the same as before is the problem I'm trying to fix, and is what this question is about.
To improve the calculated values, I need to incorporate the last value - but at the same time I want the "natural" sequence of input values to be maintained. In other words, I want the interpolated values to be placed in situ. That implies that the arguments to the GROWTH() function need to be discontiguous ranges, which Excel does by using the (Range,Range,...) syntax. I tried it, and got #REF! errors. I then tried using a named discontiguous range: same result.
After a bit of Googling (and StackOverflowing!) I found references to using INDIRECT() - a particularly problematic 'solution', since it evaluates strings that would need to be manually maintained. Nevertheless:
E4: =GROWTH(INDIRECT({"E2:E3","E7"}),INDIRECT({"A2:A3","A7"}),A4)
E5: =GROWTH(INDIRECT({"E2:E3","E7"}),INDIRECT({"A2:A3","A7"}),A5)
E6: =GROWTH(INDIRECT({"E2:E3","E7"}),INDIRECT({"A2:A3","A7"}),A6)
…and after all that it didn't work anyway! The values remained the same as in the previous version, which didn't incorporate the last value. Maybe the last value doesn't make for better interpolation results? So, as an experiment, I ignored the "in situ" requirement and generated an "ex situ" version, with the known values followed by the desired values, allowing me to use simple ranges. Success! But to highlight that the data is in the wrong order, I asked Excel to create an X-Y plot of the data too:
B13: =GROWTH(B$10:B$12,$A$10:$A$12,$A13)
B14: =GROWTH(B$10:B$12,$A$10:$A$12,$A14)
B15: =GROWTH(B$10:B$12,$A$10:$A$12,$A15)
Of course, the results are exponential not linear, so setting the Y-axis to logarithmic generates a very readable result - and it effectively masks the back-and-forth of the data. But deep down, we both know that the data is wrong - just look at the table!
Maybe, just maybe, if I used Excel's "Sort Data" feature it would break up the range for me, and show me how I should have written the formulae? Sadly, although it looks like it worked, I get a "Circular reference" error for B12 - the range wasn't modified to make it discontiguous, and now B12's result is dependent on the original range which includes itself! I coloured it below to indicate that this isn't a viable solution:
So, my "final" solution is to maintain the previous "ex situ" version, and simply have an "in situ" column as well that does a VLOOKUP() on the ExSitu (named) table - and I needed to tell it to do an exact match with the FALSE parameter, since the list isn't sorted:
F4: =VLOOKUP($A4,ExSitu,2,FALSE)
F5: =VLOOKUP($A5,ExSitu,2,FALSE)
F6: =VLOOKUP($A6,ExSitu,2,FALSE)
Note that I labelled the column with an asterisk since it's a cheat: the values are only in situ by copying from another table.
Phew! After all that, my question:
Is there a way to directly interpolate the "in situ" values, without having to have an "ex situ" lookup table to generate the results? The above example was deliberately straightforward: you can easily imagine a longer list with more gaps to be filled in.
Since you have a good data sense, I'll share my discovery path on this case. I'm more of a visual person; I don't see things 'that' clearly via tables. Here is what I do with your data points:
Input Raw
360 7.16
370 28.9
380
390
400
410 5,380.00
Highlight it all and press my favorite button, F11. I choose the line chart type. Then, with the plus (Chart Elements) button on the chart, I add Trendline > More Options. From there I choose 'Polynomial' and 'Exponential', plus a tick on 'Display Equation on chart'. As you can see in the links, both fits seem OK; just take the equation and plug in other values as needed.
Three things I've noticed:
The polynomial and exponential fits are close enough to what I need, but they don't exactly 'map' onto the (410, 5,380.00) point.
By having the formula, I find it easier to judge whether or not the trendline 'proposed' by Excel is a close fit to my needs. As you play around you can see how far off the linear and logarithmic trendlines can be.
The trendline equation doesn't really use 360, 370, 410... as the x values; it assumes x is 0, 1, 2, 3... (try testing it with the 'equation' of the Excel-proposed trendline).
IMHO, use Excel trendlines with care. My next-best fitting tool: Wolfram Alpha's logarithmic fit.
For the original question :
Is there a way to directly interpolate the "in situ" values, without having to have an "ex situ" lookup table to generate the results?
I think my simple answer is: indirectly, yes. Directly? Not sure.
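That said, here is a speculative sketch (untested, and only applicable if the dynamic-array VSTACK function of Excel 365 is available): stacking the discontiguous known cells into a single array might let GROWTH consume them directly, in situ, using the cell layout from the question:
E4: =GROWTH(VSTACK(E$2:E$3,E$7), VSTACK($A$2:$A$3,$A$7), $A4)
E5: =GROWTH(VSTACK(E$2:E$3,E$7), VSTACK($A$2:$A$3,$A$7), $A5)
E6: =GROWTH(VSTACK(E$2:E$3,E$7), VSTACK($A$2:$A$3,$A$7), $A6)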
Hope this heals/helps in some ways.. ( :

Match 2 columns based on a % difference within the value

I am looking for a method to match two excel tables.
I basically have two systems where the values do not match exactly; only some of the IDs do. The values in System 2 are usually 10-20% different from those in System 1.
Here is what the sheet looks like:
I tried using VLOOKUP on the IDs and then going through the values by hand to see whether they match, filtering on the ID. However, this takes extremely long and is very cumbersome.
Any recommendation on how to match these two tables much more easily?
I really appreciate your replies!
If you look at a formula for G3 you would be involving D3:E3 and A:B (where A10:B10 are the matching values).
When someone states that they are looking for a percentage, it is helpful to know "a percentage of what...?". You receive a different result if the calculation is ABS(12 - 15)/15 instead of ABS(12 - 15)/12. One may be within tolerance and the other may not.
In any event, the formula for G3 would be something like,
=ABS(E3-VLOOKUP(D3,A:B, 2, FALSE))/E3
... or,
=ABS(E3-VLOOKUP(D3,A:B, 2, FALSE))/VLOOKUP(D3,A:B, 2, FALSE)
That produces a result of 0.25% or 0.20% depending on how you calculate the percentage. You could wrap that in an IF statement for a YES/NO text result or use a custom number format like [Color3][>0.2]\NO;;[Color10]\Y\E\S;# which will show a red NO for values greater than 20% and a green YES for values between 0 and 20%. Negative values do not have to be accounted for as the ABS removes them from consideration.
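For example (a sketch using the second formula's denominator and the 20% tolerance mentioned above), the IF wrapper in G3 might look like:
G3: =IF(ABS(E3-VLOOKUP(D3,A:B,2,FALSE))/VLOOKUP(D3,A:B,2,FALSE)<=0.2,"YES","NO")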
I've only reproduced a minimum of your sample data for demonstration purposes but perhaps you can get an idea on how to proceed from that.

nested excel functions with conditional logic

Just getting started in Excel and I was working with a database extract where I need to count values only if items in another column are unique.
So- below is my starting point:
=SUMPRODUCT(COUNTIF(C3:C94735,{"Sharable Content Object Reference Model 1.2","Authored SCORM/AICC content","Authored External Web Content"}))
What I'd like to figure out is the syntax to do something like this:
=sumproduct (Countif range1 criteria..., where range2 criteria="is unique value")
Am I getting this right? The syntax is a bit confusing, and I'm not sure I've chosen the right functions for the task.
I just had to solve this same problem a week ago.
This method works even when you can't always sort on the grouping column (J in your case). If you can keep the data sorted, @MikeD's solution will scale better.
Firstly, do you know the FREQUENCY trick for counting unique numbers? FREQUENCY is designed to create histograms. It takes two arrays, 'data' and 'bins'. It sorts 'bins', then creates an output array that's one longer than 'bins'. Then it takes each value in 'data' and determines which bin it belongs in, incrementing the output array accordingly. It returns the array. Here's the important part: If a value appears in 'bins' more than once, any 'data' value meant for that bin goes in the first occurrence. The trick is to use the same array for both 'data' and 'bins'. Think it through, and you'll see that there's one non-zero value in the output for each unique number in the input. Note that it only counts numbers.
In short, I use this:
=SUM(SIGN(FREQUENCY(<array>,<array>)))
to count unique numeric values in <array>
From this, we just need to construct arrays containing numbers where appropriate and text elsewhere.
In the example below, I'm counting unique days when the color is red and the fruit is citrus:
This is my conditional array, returning 1 or true for the rows I'm interested in:
($A$2:$A$10="red")*ISNUMBER(MATCH($B$2:$B$10,{"orange","grapefruit","lemon","lime"},0))
Note that this requires ctrl-shift-enter to be used as an array formula.
Since the value I'm grouping by for uniqueness is text (as is yours), I need to convert it to numeric. I use:
MATCH($C$2:$C$10,$C$2:$C$10,0)
Note that this also requires ctrl-shift-enter
So, this is the array of numeric values within which I'm looking for uniqueness:
IF(($A$2:$A$10="red")*ISNUMBER(MATCH($B$2:$B$10,{"orange","grapefruit","lemon","lime"},0)),MATCH($C$2:$C$10,$C$2:$C$10,0),"")
Now I plug that into my uniqueness counter:
=SUM(SIGN(FREQUENCY(<array>,<array>)))
to get:
=SUM(SIGN(FREQUENCY(
IF(($A$2:$A$10="red")*ISNUMBER(MATCH($B$2:$B$10,{"orange","grapefruit","lemon","lime"},0)),MATCH($C$2:$C$10,$C$2:$C$10,0),""),
IF(($A$2:$A$10="red")*ISNUMBER(MATCH($B$2:$B$10,{"orange","grapefruit","lemon","lime"},0)),MATCH($C$2:$C$10,$C$2:$C$10,0),"")
)))
Again, this must be entered as an array formula using ctrl-shift-enter. Replacing SUM with SUMPRODUCT will not cut it.
In your example, you'd use something like:
=SUM(SIGN(FREQUENCY(
IF(ISNUMBER(MATCH($C$3:$C$94735,{"Sharable Content Object Reference Model 1.2","Authored SCORM/AICC content","Authored External Web Content"},0)),MATCH($J$3:$J$94735,$J$3:$J$94735,0),""),
IF(ISNUMBER(MATCH($C$3:$C$94735,{"Sharable Content Object Reference Model 1.2","Authored SCORM/AICC content","Authored External Web Content"},0)),MATCH($J$3:$J$94735,$J$3:$J$94735,0),"")
)))
I'll note, though, that scaling might be a problem on data sets as large as yours. I tested it on larger data sets, and it was fairly fast on the order of 10k rows, but really slow on the order of 100k rows, such as yours. The internal arrays are plenty fast, but the FREQUENCY function slows down. I'm not sure, but I'd guess it's between O(n log n) and O(n^2) depending on how the sort is implemented.
Maybe this doesn't matter - none of this is volatile, so it'll just need to calculate once upon refreshing the data. If the column data is changing, though, this could be painful.
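As an aside not in the original answer: if dynamic-array functions are available (Excel 365), the same distinct count can be sketched without FREQUENCY, which may ease the scaling concern. The ranges below mirror the question's, with column J assumed to hold the grouping key (add FILTER's if_empty argument if no rows might match):
=ROWS(UNIQUE(FILTER($J$3:$J$94735, ISNUMBER(MATCH($C$3:$C$94735, {"Sharable Content Object Reference Model 1.2","Authored SCORM/AICC content","Authored External Web Content"}, 0)))))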
Assuming the source data is sorted by the key value [A], start with determining the occurrence of the key column
B2: =IF(A2=A1;B1+1;1)
Next determine a group sum
C2: =SUMIF($A$2:$A$9;A2;$B$2:$B$9)
A key is unique if its group sum is exactly 1
D2: =(C2=1)
To count records which match a certain criterion AND are unique, include column D in an =IF(AND(D2;[your condition]);1;0) and sum this column.
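For instance (a sketch only, keeping the semicolon separators used above and assuming the criterion sits in column F as in the variant below), a spare column, say H, could hold
H2: =IF(AND(D2;F2="this");1;0)
copied down, with the final count then given by =SUM(H2:H9).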
Another option is to assume a key is unique within a sorted list if it is unequal to both its predecessor and successor, so you could find the unique records like
E2: =AND(A2<>A1;A2<>A3)
G2: =IF(AND(E2;F2="this");1;0)
E and G can of course be combined into one single formula (not sure though if that helps ...)
G2(2): =IF(AND(AND(A2<>A1;A2<>A3);F2="this");1;0)
resolving unnecessarily nested AND's:
G2(3): =IF(AND(A2<>A1;A2<>A3;F2="this");1;0)
All formulas in row 2 should be copied down to the end of the list.
