Excel formula Compare Data to previous 20 entries for change - excel

I have an Excel table for raw material offloads. All get tested, but some don't get offloaded right away. I'm trying to create a formula that looks at the next 20 entries for the same railcar and sees whether it changed from "N" to "Y" for offload.
Here's what my data looks like:
CAR # Offloaded?
CTCX733450 N
CTCX733450 Y
GATX207935 N
CTCX733472 Y
GATX207923 N
GATX207935 Y
GATX207923 Y
I've tried COUNTIF functions and IF functions. I can detect the duplicate railcars, but can't match the Y and N to the railcar.
Any help is appreciated.

You can use COUNTIFS to check multiple columns at once. For example
=COUNTIFS(A3:A22, A2, B3:B22, "Y")
This will take the value in cell A2 (CTCX733450), then look at the following 20 rows (Rows 3-22) to see how many times Column A is that value and Column B is "Y". If it is greater than 0, then one of the next 20 instances of that Railcar has been Offloaded.
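To make the window logic concrete, here is a minimal Python sketch of what that COUNTIFS computes, run on the sample data from the question. The function name and the list-of-tuples layout are illustrative assumptions, not anything from the spreadsheet.

```python
# Hypothetical model of =COUNTIFS(A3:A22, A2, B3:B22, "Y") for a single row.
# rows is a list of (car, offloaded) pairs; i is the 0-based index of the current row.
def offloaded_in_next_rows(rows, i, window=20):
    car = rows[i][0]
    following = rows[i + 1 : i + 1 + window]  # the next `window` rows, any railcar
    return sum(1 for c, flag in following if c == car and flag == "Y")

rows = [
    ("CTCX733450", "N"),
    ("CTCX733450", "Y"),
    ("GATX207935", "N"),
    ("CTCX733472", "Y"),
    ("GATX207923", "N"),
    ("GATX207935", "Y"),
    ("GATX207923", "Y"),
]
print(offloaded_in_next_rows(rows, 0))  # 1: the car is offloaded on the next row
print(offloaded_in_next_rows(rows, 1))  # 0: no later rows for this car
```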
Notably, this checks "the next 20 rows", and not "the next 20 entries for the same railcar". For that, we would need to use AGGREGATE and INDEX to find the 20th time that railcar next appears, which will be the final row we check.
For the time being, we will use ROW_VALUE as a placeholder for this row number. This then lets us rewrite our formula using INDEX, as follows:
=COUNTIFS(A3:INDEX(A:A, ROW_VALUE), A2, B3:INDEX(B:B, ROW_VALUE), "Y")
Simple enough! The tricky bit, though, is now working out what value we should have for ROW_VALUE. This is where the AGGREGATE comes in.
You see, we can use AGGREGATE to get the kth (fourth parameter) smallest (first parameter = 15) non-error value (second parameter = 6) from a list of values (third parameter). We can also make a list of rows where column A is the same as the value in A2, by using #DIV/0! (divide by zero) errors, and the fact that TRUE/FALSE can be treated as 1/0:
AGGREGATE(15, 6, Row(A:A)/(A:A=A2), k)
In your case, we want k to be 20 + how many copies of railcar we already have. We can count how many copies of the railcar have passed us by using COUNTIF, so long as we lock one end to the first row:
AGGREGATE(15, 6, Row(A:A)/(A:A=A2), 20+COUNTIF(A$1:A2, A2))
Now, in theory we could shove that in as our ROW_VALUE. In practice, I can immediately see 2 big problems with it. First, working on whole columns is slow. Second, and more important: what happens if there are fewer than 20 copies of the railcar remaining? You get a #NUM! error, that's what.
We can fix both of these issues with COUNTA (assuming that there are no rows without railcar numbers). For the first one, we will use INDEX again:
AGGREGATE(15, 6, Row(A$1:INDEX(A:A, COUNTA(A:A)))/(A$1:INDEX(A:A, COUNTA(A:A))=A2), 20+COUNTIF(A$1:A2, A2))
Alternatively, you can rearrange this to get rid of the COUNTIF at the end, by starting your Range on the next row, and just looking for the 20th number:
AGGREGATE(15, 6, Row(A3:INDEX(A:A, COUNTA(A:A)))/(A3:INDEX(A:A, COUNTA(A:A))=A2), 20)
For the second issue, we'll use IFERROR. This is a simple function - it just says "Return this value, unless it is an error - then, use this other value instead". Our "other value" will be the COUNTA of Column A, which should give us the last row in your list of Railcars:
IFERROR(AGGREGATE(15, 6, Row(A3:INDEX(A:A, COUNTA(A:A)))/(A3:INDEX(A:A, COUNTA(A:A))=A2), 20),COUNTA(A:A))
This then gives us our ROW_VALUE, which we can plug into our other earlier COUNTIFS:
=COUNTIFS(A3:INDEX(A:A, IFERROR(AGGREGATE(15, 6, Row(A3:INDEX(A:A, COUNTA(A:A)))/(A3:INDEX(A:A, COUNTA(A:A))=A2), 20),COUNTA(A:A))), A2, B3:INDEX(B:B, IFERROR(AGGREGATE(15, 6, Row(A3:INDEX(A:A, COUNTA(A:A)))/(A3:INDEX(A:A, COUNTA(A:A))=A2), 20),COUNTA(A:A))), "Y")
Finally, and optionally: we can make a slight boost in calculation time by working out if the AGGREGATE will error before it does so, by checking if there are at least 20 more entries for the Railcar. This also replaces the IFERROR with an IF statement, but makes the whole equation longer:
=COUNTIFS(A3:INDEX(A:A, IF(COUNTIF(A3:INDEX(A:A, COUNTA(A:A)),A2)<20, COUNTA(A:A), AGGREGATE(15, 6, Row(A3:INDEX(A:A, COUNTA(A:A)))/(A3:INDEX(A:A, COUNTA(A:A))=A2), 20))), A2, B3:INDEX(B:B, IF(COUNTIF(A3:INDEX(A:A, COUNTA(A:A)),A2)<20, COUNTA(A:A), AGGREGATE(15, 6, Row(A3:INDEX(A:A, COUNTA(A:A)))/(A3:INDEX(A:A, COUNTA(A:A))=A2), 20))), "Y")
We have replaced this ROW_VALUE
IFERROR(AGGREGATE(15, 6, Row(A3:INDEX(A:A, COUNTA(A:A)))/(A3:INDEX(A:A, COUNTA(A:A))=A2), 20),COUNTA(A:A))
with this one instead
IF(COUNTIF(A3:INDEX(A:A, COUNTA(A:A)),A2)<20, COUNTA(A:A), AGGREGATE(15, 6, Row(A3:INDEX(A:A, COUNTA(A:A)))/(A3:INDEX(A:A, COUNTA(A:A))=A2), 20))
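For comparison, here is a rough Python sketch of what the finished formula is meant to compute: the number of "Y" flags among the next 20 entries for the same railcar, falling back to "all remaining entries" when there are fewer than 20. The function name and sample rows are made up for illustration.

```python
# Hypothetical model of the final COUNTIFS/AGGREGATE formula for a single row.
# rows is a list of (car, offloaded) pairs; i is the 0-based index of the current row.
def offloaded_in_next_entries(rows, i, k=20):
    car = rows[i][0]
    # Flags of every later entry for the same railcar, in sheet order.
    later_flags = [flag for c, flag in rows[i + 1 :] if c == car]
    # Slicing past the end just returns what exists, which mirrors the
    # IFERROR(..., COUNTA(A:A)) fallback to the last row of data.
    return later_flags[:k].count("Y")

rows = [("X", "N"), ("X", "Y"), ("Z", "N"), ("X", "Y"), ("X", "N")]
print(offloaded_in_next_entries(rows, 0, k=2))  # 2
print(offloaded_in_next_entries(rows, 0, k=1))  # 1
```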

Related

Excel mutual pairing

How can I pair members of a list such that if the pair of member 'a' is member 'b' then the pair of 'b' is 'a', in MS Excel?
Ideally, the pairing randomizes at each calculation.
Attempt:
1) I have a list of names in a column, (column B).
2) I put the natural numbers up to the list length in column A:
=ROW(B2)-1
3) Random numbers in column C:
=RAND()
4) I rank-ordered the randoms in column D.
=RANK(C2, $C$2:$C$178)
Thus I have two orderings: the order of appearance in column A and a random order in D.
However, for example, row by row the "pair of 11" does not have 11 as its pair, in turn. How can I achieve a mutual pairing? If 11 is paired with 27 I need 27 to be paired with 11 also.
(I can then VLOOKUP to pull the names.)
=VLOOKUP(D2,$A$2:$B$178,2,0)
EDIT:
Below is my output. You can see that 'c' is the pair of 'a' but 'a' is not the pair of 'c'. (The F column checks if someone is paired to themselves.)
So basically, in the output E column I am taking the name corresponding to the random rank ordering. I would like to achieve mutual pairs in some way.
Here is another approach. In your column E you have a permutation of the names. Keep this as an intermediate column. In this case it is c,b,d,a. Simply match c to b and d to a. Index and Match can be used to extract the pairings like this:
The crucial formula in F2 is
=IF(MOD(MATCH(B2,$E$2:$E$5,0),2) = 0,INDEX($E$2:$E$5,MATCH(B2,$E$2:$E$5,0)-1),INDEX($E$2:$E$5,MATCH(B2,$E$2:$E$5,0)+1))
The formulas in columns D, E are exactly as you gave them.
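The same idea (shuffle once, then pair neighbours in the shuffled list) can be sketched in Python. This is only an illustration of the technique, assuming the names are unique; the function name and the `rng` parameter are my own.

```python
import random

# Hypothetical sketch: pair positions 1-2, 3-4, ... of a random permutation,
# which is what the MOD/MATCH/INDEX formula does with column E.
def mutual_pairs(names, rng=random):
    shuffled = list(names)
    rng.shuffle(shuffled)
    partner = {}
    # Step through the permutation two at a time; with an odd count the
    # last name is left without a partner.
    for i in range(0, len(shuffled) - 1, 2):
        a, b = shuffled[i], shuffled[i + 1]
        partner[a] = b
        partner[b] = a
    return partner

pairs = mutual_pairs(["a", "b", "c", "d"])
print(pairs)
```

Because each pair is written in both directions, the pair of 'a' being 'c' always implies the pair of 'c' is 'a', and nobody can be paired with themselves.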
In order to randomly pair up the numbers like this, you need to take into account what numbers have already been picked. Then, you need to randomly select from the remaining numbers.
So, to do this, I am going to scrap your "Random Number" column, and calculate "Rank" directly. To start with, we check if this item already has a pair. All we will do is use MATCH to find if this entry already exists earlier in Column D, and (if so) use INDEX to find what number it has been paired up with:
=INDEX(A:A, MATCH(A2, D$1:D1, 0))
All code, except the last example, will be written for Cell D2
If this entry has not yet been paired up, it will return an error. Otherwise, it will return its match. If no match was found, we need to pick one at random. This condition is just an IFERROR:
=IFERROR(INDEX(A:A, MATCH(A2, D$1:D1, 0)), <FIND_RANDOM_RANK>)
There are several ways to get a Rank at random, but I am going to use AGGREGATE and RANDBETWEEN. We will create a list of the un-picked numbers in AGGREGATE (e.g. {2, 3, 7, 8, 12}), then use RANDBETWEEN to select a position at random (e.g. RANDBETWEEN(1, 5), because there are 5 items to choose from). Our list will be made of Row Numbers, so we will need to convert them to Sequence Numbers with INDEX.
=IFERROR(INDEX(A:A, MATCH(A2, D$1:D1, 0)), INDEX(A:A, <FIND_RANDOM_ROW>))
Since our Row Numbers are numbers, we can put them in order and pick the kth smallest:
AGGREGATE(15, 6, <NUMBER_LIST>, RANDBETWEEN(1, <NUMBER_OF_ITEMS>))
The Number of Items will be the total Number, minus how many items have already been paired. The number of items already paired will be the number of items above us in the list, PLUS however many of those are paired with items further down.
AGGREGATE(15, 6, <NUMBER_LIST>, RANDBETWEEN(1, COUNTA(A:A)-(Row()+COUNTIF(D$1:D1,">"&A2))))
In the Number List, we can use #DIV/0! errors to mark items as Excluded. We exclude them if they have already been paired up: either above us in the list, or paired to an item above us in the list.
ROW(A:A) / ((Row(A:A)>Row()) * (COUNTIF(D$1:D1,A:A)=0))
Now, before we stick everything together, we need to limit our column sizes in this part (which will work as an Array Formula). If we try to calculate this for all 1048576 rows in the Worksheet, then not only would it take ages, but you would be matching against a lot of empty rows.
To do this, we can use INDEX and COUNTA to work out how many rows of record we have. This means, for example, changing A:A to A$1:INDEX(A:A,COUNTA(A:A)). If you have 10 rows of data, then this will go A$1:INDEX(A:A,COUNTA(A:A)) → A$1:INDEX(A:A,10) → A$1:A10, which is 100,000 times less data to process!
ROW(A$1:INDEX(A:A,COUNTA(A:A))) / ((ROW(A$1:INDEX(A:A,COUNTA(A:A)))>ROW()) * (COUNTIF(D$1:D1,A$1:INDEX(A:A,COUNTA(A:A)))=0))
Now, we can put that all together, to get our final equation:
=IFERROR(INDEX(A:A, MATCH(A2, D$1:D1, 0)), INDEX(A:A,AGGREGATE(15, 6, ROW(A$1:INDEX(A:A,COUNTA(A:A))) / ((ROW(A$1:INDEX(A:A,COUNTA(A:A)))>ROW()) * (COUNTIF(D$1:D1,A$1:INDEX(A:A,COUNTA(A:A)))=0)), RANDBETWEEN(1, COUNTA(A:A)-(ROW()+COUNTIF(D$1:D1,">"&A2))))))
This will give us our Matched Rank. All we need to do then is a quick VLOOKUP on Columns A and B to work out what the name is:
=VLOOKUP(D2,A:B,2,FALSE)
This code is for Cell E2
(All-in-all, this code will be a lot more efficient if your list is of a fixed size, and you can then replace the COUNTA(A:A) and A$1:INDEX(A:A,COUNTA(A:A)) bits with the correct numbers and cell references directly, such as 5 and A$1:A$5)
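If it helps to see the row-by-row logic outside of Excel, here is a loose Python sketch of the same approach: walk down the list, reuse a partner if one was already assigned from above, otherwise pick at random among the still-unpaired items further down. It works on sequence numbers 1..n rather than worksheet rows, and the function name and `rng` parameter are assumptions for illustration.

```python
import random

# Hypothetical model of the column D formula, one sequence number at a time.
def sequential_pairs(n, rng=random):
    partner = {}
    for i in range(1, n + 1):
        if i in partner:
            # Already chosen by an earlier item: the INDEX/MATCH branch.
            continue
        # Unpicked items below us: the AGGREGATE number list.
        candidates = [j for j in range(i + 1, n + 1) if j not in partner]
        if not candidates:
            break  # odd count: the last item has nobody left to pair with
        j = rng.choice(candidates)  # the RANDBETWEEN pick
        partner[i] = j
        partner[j] = i
    return partner

print(sequential_pairs(6))
```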

How can I redistribute a range of values without duplicates between 1 and 100?

I have a list of values between 1 and 100, essentially a sort of ranking that occasionally skips a few numbers (for example, the first ten values are 2, 6, 6, 10, 10, 10, 10, 11, 12, 13). They're in ascending order, so every number will be either higher than or equal to the number above it. Now, I wish to remove all the duplicates from this list while remaining between 1 and 100. So, for example, for the values above, something like 2, 6, 7, 10, 11, 12, 13, 14, 15, 16 would work or 2, 6, 7, 8, 9, 10, 11, 12, 13, 14. However, the formulas I've tried so far will either go over 100, go under 1, or create circular references.
Given the nature of the list, there's very little chance of the number of values exceeding 100, so if that possibility can be accounted for, it'd be a nice bonus, but it's not required.
Please create a named range for all your numbers and name it Source. As an alternative, replace the named range Source in my formulas below with the sheet address of the range where you have your numbers.
[C2] =INDEX(Source,MATCH(0,COUNTIF($C$1:$C1,Source),0))
This is an array formula and needs to be confirmed with Ctrl + Shift + Enter. Observe that the COUNTIF range is defined one row above the row in which the formula resides. It's formulated with one absolute and one relative end. The end of the range will expand as you copy the formula down.
You can achieve a similar result using the LOOKUP() function. But whereas MATCH() finds the first instance, LOOKUP() returns the last. Therefore the formula below will return the same sequence in descending order. LOOKUP() is capable of looping on its own and the formula doesn't require array entry. Just confirm it normally, with only Enter.
[E2] =LOOKUP(2,1/(COUNTIF($E$1:E1,Source)=0),Source)
Observe that the result range is defined with an absolute and relative end, starting above the row of the formula as explained above.
In B2, formula copied down :
=IF(A2="","",IF((COUNTIF(B$1:B1,A2)>0)+(A2=A1),B1+1,A2))
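As a sanity check, the rule in that formula (if the value was already output, or equals the value above it, use the previous result plus one; otherwise keep the value) can be sketched in Python and run on the numbers from the question. The function name is made up, and the sketch assumes there are no blank cells.

```python
# Hypothetical model of the B2 helper-column formula, copied down.
def spread_duplicates(values):
    out = []
    for i, v in enumerate(values):
        already_output = v in out                       # COUNTIF(B$1:B1, A2) > 0
        same_as_above = i > 0 and v == values[i - 1]    # A2 = A1
        if already_output or same_as_above:
            out.append(out[-1] + 1)                     # B1 + 1
        else:
            out.append(v)                               # A2
    return out

print(spread_duplicates([2, 6, 6, 10, 10, 10, 10, 11, 12, 13]))
# [2, 6, 7, 10, 11, 12, 13, 14, 15, 16]
print(spread_duplicates([99, 100, 100]))
# [99, 100, 101]
```

The second call shows that nothing in this rule stops the output from running past 100.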
Edit #1
If the data contain a repeated 100, as per the OP's comment, then the B2 formula becomes:
=IF((COUNTIF(B$1:B1,100)>0)+(A2=""),"",IF((COUNTIF(B$1:B1,A2)>0)+(A2=A1),B1+1,A2))
Thank you everyone for posting! Thanks to Bosco's original reply, I managed to fiddle around and find the answer myself. The biggest issue I was facing was getting a circular formula error, but by using Bosco's formula as a help column I was able to get it all worked out.
So, the solution holds 3 different columns: Original Data (column A), Help Column (column B) and Final Result (column C).
In the Help Column, I did as suggested, in cell B2 and copied down:
=IF(A2="","",IF((COUNTIF(B$1:B1,A2)>0)+(A2=A1),B1+1,A2))
This would occasionally create results over 100, so in the Final Column I placed the following array formula, which incidentally also accounts for lists longer than 100 entries, in C2 and copied down:
=IF(B2="","",IF(ROW(C2)>101,ROW(C2),IF(AND(B2>=100,B3=""),100,IF($B2:$B$1000>=C3,C3-1,B2))))
As an array formula, this needs to be entered using Ctrl+Shift+Enter.
Thank you very much to everyone who posted replies, I'm sorry for my lack of clarity!

Merge two columns into one column, one cell after another

I am trying to merge two columns into one column, one cell after another. For example, I have start time in one column and End time into another column, shown in pic. I am trying to merge them into one column (Time) that has start time in one cell and end time in the cell just below it, with duplicating the code (the third column) with each addition. Like shown in the pic.
Is there a way to do that?
Forwards:
Formula in D2:
=INDEX($A$2:$B$9,ROUND(ROW(A1)/2,0),MOD(ROW(),2)+1)
Formula in E2:
=INDEX($C$2:$C$9,ROUND(ROW(A1)/2,0))
Backwards:
Formula in D2:
=INDEX($A$2:$B$9,ROUND(((COUNTA(C:C)-1)*2-(ROW(A1)-1))/2,0),MOD(ROW(A1),2)+1)
Formula in E2:
=INDEX($C$2:$C$9,ROUND(((COUNTA(C:C)-1)*2-(ROW(A1)-1))/2,0))
This is the formula I would use:
=CHOOSE(SEQUENCE(,2),INDEX(A1:B8,(SEQUENCE(ROWS(A1:B8)*COLUMNS(A1:B8),,0)-MOD(SEQUENCE(ROWS(A1:B8)*COLUMNS(A1:B8),,0),COLUMNS(A1:B8)))/COLUMNS(A1:B8)+1,MOD(SEQUENCE(ROWS(A1:B8)*COLUMNS(A1:B8),,0),COLUMNS(A1:B8))+1),INDEX(C1:C8,(SEQUENCE(ROWS(A1:B8)*COLUMNS(A1:B8),,0)-MOD(SEQUENCE(ROWS(A1:B8)*COLUMNS(A1:B8),,0),COLUMNS(A1:B8)))/COLUMNS(A1:B8)+1))
But that formula is a bit messy, so I wrote another one that breaks it down in a way that lets me talk through how it works and what it's doing:
=LET(
TimeRange,A1:B8,
CodeRange,C1:C8,
RowCount,ROWS(TimeRange),
ColumnCount,COLUMNS(TimeRange),
CellCount,RowCount*ColumnCount,
BaseSequence,SEQUENCE(CellCount,,0),
ModSequence,MOD(BaseSequence,ColumnCount),
RowIndex,(BaseSequence-ModSequence)/ColumnCount+1,
ColumnIndex,ModSequence+1,
Result1,
INDEX(TimeRange,RowIndex,ColumnIndex),
Result2,
INDEX(CodeRange,RowIndex),
CHOOSE(SEQUENCE(,2),Result1,Result2)
)
If you want to paste this formula into your spreadsheet, just make sure that you have the formula bar selected. If you don't, Excel will paste the text so that each line break moves to the next row.
What does each part do? This table steps through each parameter, and explains what each one does. And yes, when using a LET statement, you can use variable names declared in the prior step, but you can't do it the other way!
| Part | Explanation |
| --- | --- |
| LET( | LET basically allows us to name our own variables. You'll see what I mean in the following sections |
| TimeRange, A1:B8 | This is the StartTime and EndTime ranges from your example |
| CodeRange, C1:C8 | This is the Code range from your example |
| RowCount, ROWS(TimeRange) | This gives us the number of rows in TimeRange, and stores it as RowCount |
| ColumnCount, COLUMNS(TimeRange) | This gives us the number of columns in TimeRange, and stores it as ColumnCount |
| CellCount, RowCount*ColumnCount | This calculates how many cells there are. Note that our answer will have rows equal to CellCount |
| BaseSequence, SEQUENCE(CellCount,,0) | This creates a sequence of numbers, arranged in a single column, starting at 0, and stores the result in BaseSequence |
| ModSequence, MOD(BaseSequence, ColumnCount) | Here, we take the MOD of BaseSequence, by ColumnCount. Why? Because this effectively gives us a repeating list of numbers 0, 1, 0, 1, (etc). Note that this is why BaseSequence had to start at 0 |
| RowIndex, (BaseSequence - ModSequence)/ColumnCount + 1 | This basically counts through the rows, giving us 1, 1, 2, 2, 3, 3, (etc) |
| ColumnIndex, ModSequence + 1 | Because unlike MOD, which starts at 0, column and row indices start at 1 |
| Result1, INDEX(TimeRange, RowIndex, ColumnIndex) | This is our left hand column, and basically picks through the TimeRange, going through each row by picking up each column first |
| Result2, INDEX(CodeRange, RowIndex) | This is our right hand column, and it basically doubles up each element in the CodeRange (i.e. teacher, teacher, student, etc), because RowIndex runs 1, 1, 2, 2, 3… |
| CHOOSE(SEQUENCE(,2),Result1, Result2) | The last item in a LET statement is the one that returns a value to the spreadsheet. Here, I'm using a SEQUENCE to make a range 1 row and 2 columns wide (i.e. [1, 2]). This goes into CHOOSE, which will select Result1 when it gets a 1, and Result2 when it gets a 2. Because both Result1 and Result2 are ranges, CHOOSE effectively combines them into a single range |
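The index arithmetic is easier to follow with actual numbers, so here is a small Python sketch that mirrors the LET variables step by step. The times and codes are made-up sample values, the function name is my own, and the indices are 0-based here where Excel's are 1-based.

```python
# Hypothetical model of the LET formula: one output row per cell of TimeRange.
def interleave(time_rows, codes):
    column_count = len(time_rows[0])                  # ColumnCount
    cell_count = len(time_rows) * column_count        # CellCount
    result = []
    for base in range(cell_count):                    # BaseSequence: 0, 1, 2, ...
        mod = base % column_count                     # ModSequence: 0, 1, 0, 1, ...
        row_index = (base - mod) // column_count      # RowIndex (0-based): 0, 0, 1, 1, ...
        col_index = mod                               # ColumnIndex (0-based)
        # Result1 and Result2 side by side, as CHOOSE does.
        result.append((time_rows[row_index][col_index], codes[row_index]))
    return result

times = [("09:00", "10:00"), ("10:00", "11:00")]
codes = ["teacher", "student"]
for row in interleave(times, codes):
    print(row)
# ('09:00', 'teacher')
# ('10:00', 'teacher')
# ('10:00', 'student')
# ('11:00', 'student')
```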
Also, in the event you want it, this is the formula I posted first (at the top), but with additional line breaks to improve legibility:
=CHOOSE(
SEQUENCE(,2),
INDEX(
A1:B8,
(
SEQUENCE(ROWS(A1:B8)*COLUMNS(A1:B8),,0)
-MOD(SEQUENCE(ROWS(A1:B8)*COLUMNS(A1:B8),,0),COLUMNS(A1:B8))
)
/COLUMNS(A1:B8)+1,
MOD(SEQUENCE(ROWS(A1:B8)*COLUMNS(A1:B8),,0),COLUMNS(A1:B8))+1
),
INDEX(
C1:C8,
(
SEQUENCE(ROWS(A1:B8)*COLUMNS(A1:B8),,0)
-MOD(SEQUENCE(ROWS(A1:B8)*COLUMNS(A1:B8),,0),COLUMNS(A1:B8))
)
/COLUMNS(A1:B8)+1
)
)
Assuming your data starts in row 2.
=A2&" "&B2

vlookup - only return cell with text

I need to return a cell that has text in it, but am running into difficulty.
Above is a sample table I'm working with. What I'd like to be able to do, is lookup id 1 and have it output Rich. When I do a vlookup, however, it gives no output. And while vlookup min/max will output integers, they don't work with text. Does anyone know how I can scan multiple ids, but only output the filled text cell?
There may be a shorter formula for this but I banged this off quickly and it does dynamically truncate the ranges in column B down to the minimum number of rows necessary.
=INDEX(B:B, AGGREGATE(15, 6, ROW(B2:INDEX(B:B, MATCH("zzz",B:B )))/(ISTEXT(B2:INDEX(B:B, MATCH("zzz",B:B )))*(A2:INDEX(A:A, MATCH("zzz",B:B )))=D3), 1))
To retrieve a second, third, etc. entry change the k parameter of AGGREGATE to a COUNTIF and fill down.
=INDEX(B:B, AGGREGATE(15, 6, ROW(B$2:INDEX(B:B, MATCH("zzz",B:B )))/(ISTEXT(B$2:INDEX(B:B, MATCH("zzz",B:B )))*(A$2:INDEX(A:A, MATCH("zzz",B:B )))=D3), COUNTIF(D$3:D3, D3)))
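To spell out the intent (the kth row whose id matches and whose value cell actually holds text), here is a minimal Python sketch. The sample ids and names are invented, apart from id 1 mapping to Rich as in the question, and the function name is hypothetical.

```python
# Hypothetical model of the lookup: skip blank/non-text cells, return the kth text hit.
def text_for_id(ids, values, wanted, k=1):
    hits = [
        v
        for i, v in zip(ids, values)
        if i == wanted and isinstance(v, str) and v != ""  # ISTEXT-style filter
    ]
    # None stands in for the error Excel would return when there is no kth match.
    return hits[k - 1] if len(hits) >= k else None

ids = [1, 1, 2, 2]
values = [None, "Rich", "", "Ann"]
print(text_for_id(ids, values, 1))       # Rich
print(text_for_id(ids, values, 1, k=2))  # None
```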

How do I get the max N numbers in row in Excel?

MAX will take a range and tell me the largest number. But what if I wanted to iterate over that range and find the largest two numbers in a row?
For example, if I have the range [0, 2, 5, 6, 9, 3, 8], MAX is 9, but MAX2 is 15 (6+9). MAX3 is 20 (5+6+9).
How would I write MAX2, MAX3, or MAXN in Excel?
Eg: sum 3 largest numbers in A1:A5
=SUM(LARGE(A1:A5,ROW(1:3)))
This is an array formula, so you need to use Ctrl+Shift+Enter
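For reference, here is a tiny Python sketch of what SUM(LARGE(range, {1,2,3})) computes, using the range from the question. The function name is made up. Note that it picks the largest values anywhere in the range, not values that sit next to each other, so it gives 23 where the question's MAX3 is 20.

```python
# Hypothetical model of =SUM(LARGE(range, ROW(1:n))): add up the n largest values.
def sum_largest(values, n):
    return sum(sorted(values, reverse=True)[:n])

data = [0, 2, 5, 6, 9, 3, 8]
print(sum_largest(data, 3))  # 23 (9 + 8 + 6)
```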
=IF(AND(A4+A5>=A6+A5, A4+A5>=A3+A4),A4+A5,0)
You could use this formula to find the biggest two numbers in a row; this would equate to MAX2.
Place your example range in column A and put the formula in B4. Drag it from B1 to BN, and the number in column B will correspond to the sum of the two highest cells that are next to each other.
So for a general n you could do something like:
IF(AND(SUM(A2:An)>=SUM(A1:A(n-1)), SUM(A2:An)>=SUM(An:A(n+n))), SUM(A2:An), 0)
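The "largest n numbers in a row" reading of the question is a sliding-window sum. A minimal Python sketch, assuming n is no larger than the list and using a made-up function name, reproduces the numbers from the question:

```python
# Hypothetical MAXN: the largest sum of n adjacent values.
def max_adjacent_sum(values, n):
    # One window starting at every position that leaves room for n values.
    return max(sum(values[i : i + n]) for i in range(len(values) - n + 1))

data = [0, 2, 5, 6, 9, 3, 8]
print(max_adjacent_sum(data, 1))  # 9
print(max_adjacent_sum(data, 2))  # 15 (6 + 9)
print(max_adjacent_sum(data, 3))  # 20 (5 + 6 + 9)
```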
Thanks for this, the answer helped me solve an even more heinous problem:
=AVERAGE(LARGE(CHOOSE({1,2,3,4,5,6,7,8},(P3),(R3),(T3),(V3),(X3),(Z3),(AB3),(AD3)),ROW($1:$4)))
which let me average the largest of nonadjacent columns for a bunch of students.
You need to press Ctrl+Shift+Enter to evaluate such an expression (it's building a table on the fly); that will wrap curly brackets {} around it.
The ()s around a cell reference mean that if it's blank (say a student didn't do a quiz) it gets translated as 0 rather than an error.
The dollar signs $ are necessary, or you can't copy and paste this into other rows and have the relative references all work.
Not that you asked but I figured someone might...

Resources