For dates with missing data, take average of values from closest dates - excel

I have sporadically collected data for stream nitrogen levels. For dates where nitrogen levels are not available, I'm hoping I can use an Excel formula that will estimate the value by calculating the average of nitrogen levels from the two closest dates.Note that nitrogen is not correlated with flow (or other water quality parameters) and there is no trend so I can't use a regression equation to estimate values.
Below is an example of what the table looks like for columns A, B, and C, and rows 1 (header) to 7. Column B contains the real data (note that I inserted 'no value' for illustrative purposes). Column C is what I would like to have calculated. Because I have thousands of rows of data, manually entering each of these formulas is not an option.
Date_______Actual Value______Desired Calculation
1/1/2012_______0.15__________=B2
1/2/2012_____[no value]_______=AVERAGE(B2,B5)
1/3/2012_____[no value]_______=AVERAGE(B2,B5)
1/4/2012_______0.17__________=B5
1/5/2012_____[no value]_______=AVERAGE(B5,B7)
1/6/2012_______0.23__________=B7

=IF(B1<>"",B1,VLOOKUP(MAX($A$1:$A$4),$A$1:$B$4,2,0))
Misread the question the first time.
Test to see if B1 is equal to "" or not. If it is empty then perform the VLOOKUP searching for the max value of column A (most recent date), and then use the value for the adjacent column (2).
This will not cover the case where the most recent Date has no value in column B. I will return "" in that case.
TREND Option
Based on your comments and the use of some helper cells you could do the following. Make a small table off to the right somewhere of just your known dates and your known values from B. We can populate this table using the following formulas. Let arbitrarily put this table in columns F and G.
in F2 use:
=IFERROR(INDEX($A$2:$A$1564,AGGREGATE(15,6,(ROW($B$2:$B$1564)-ROW($A$2)+1)/($B$2:$B$1564<>""),ROWS(B$2:B2))),"")
This will pull the dates in order that they appear with values in B that are not "". Then in G2 put the following formula in which will pull the associated value from B
=IFERROR(INDEX($B$2:$B$1564,AGGREGATE(15,6,(ROW($B$2:$B$1564)-ROW($B$2)+1)/($B$2:$B$1564<>""),ROWS(B$2:B2))),"")
Both those formulas are currently set up to work with a header row and data starting in row 2.
once we have the helper table set up, you can set up a TREND function, which is a line of best fit through all your known points. You can pick the point on this line for dates with no value in B. In C2 you would add:
=IF(B2="",TREND($G$2:$G$3,$F$2:$F$3,A2),B2)
you would copy this formula down. It checks to see if B is blank. If it is then it pulls a value for B from the trend line. If it is not blank, then it will use the value in B.
AVERAGE Option
Again using the helper table in the same spot. This one gets a little uglier. In C2 place the following and copy down
=IF(B2<>"",B2,IF($A2<$F$2,$G$2,IF($A2>$F$3,$G$3,(INDEX($G$2:$G$3,IFERROR(MATCH($A2,$F$2:$F$3,1),1))+INDEX($G$2:$G$3,IFERROR(MATCH($A2,$F$2:$F$3,1)+1,2)))/2)))
The first thing it does is check for a blank value in B. If there is no blank then it uses the value in B2. If there is no value in B2 it then proceeds to check the date of the blank against the first date in your helper table. If it is less than that first value, it uses the first value of B since there is nothing to average it with. I then proceeds to do the same to see if the date exceeds the last value in the table. (this is currently hard coded). If it exceeds the last date in the table it uses the last value for B. the last option for the date is for it to be in the table. It take the average between the date prior to the date and question and the next date with a value.
Optional version where you dont have to hard code the last values of helper table, but you still need to hard code the range of the helper table to meet your needs.
=IF(B3<>"",B3,IF($A3<$F$2,$G$2,IF($A3>MAX($F$2:$F$3),VLOOKUP(MAX($F$2:$F$3),$F$2:$G$3,2,0),(INDEX($G$2:$G$3,IFERROR(MATCH($A3,$F$2:$F$3,1),1))+INDEX($G$2:$G$3,IFERROR(MATCH($A3,$F$2:$F$3,1)+1,2)))/2)))
TREND Between only two points
So I adjusted the formula a bit on this one. Its kind of a mix of the trend and previous options listed above. I also made it automatically selecting the end values of your table, so now you just have to copy your table down as far as you need. Its also works on the premise that only your data value in B are present, no numbers below your last row.
=IF($A2<$F$2,$G$2,IF($A2>=MAX(OFFSET($F$2,0,0,COUNT(B:B),1)),VLOOKUP(MAX(OFFSET($F$2,0,0,COUNT(B:B),1)),OFFSET($F$2,0,0,COUNT(B:B),2),2,0),TREND(OFFSET($F$2,IFERROR(MATCH($A2,OFFSET($F$2,0,0,COUNT(B:B),1),1),0)-1,1,2,1),OFFSET($F$2,IFERROR(MATCH($A2,OFFSET($F$2,0,0,COUNT(B:B),1),1)-1,0),0,2,1),$A2)))
So basically TREND function is a line of best fit through known points. if your know points do not all sit in a straight line it figures out what straight line passes near them minimizing the distance they are from the line. If we limit the points it using to make its line to only 2 points, then the line has to pass through those two points. Remember, the helper table is a list of all known points.
So basically what the formula doing is checking to see if the date is less than or equal to the first known point on the helper table. If it is, use the value from the first known point in the table. If its not, check to see if the date is greater than the last date in the table, if it is lookup the last known date in the table and return the corresponding table. If neither of these cases is true then the date must be in the table. That means we can get two points from the table and their corresponding values to use in the trend function. We feed the TREND function a an array of 2 known X and 2 Know Y values, and then supply it with an X value we are interested in. In this case X would be the dates, and Y would be the values in the second column.
Revised Helper table
In F2 use this formula:
=IFERROR(INDEX(OFFSET($A$2,0,0,COUNT(A:A),1),AGGREGATE(15,6,(ROW(OFFSET($A$2,0,1,COUNT(A:A),1))-ROW($A$2)+1)/(OFFSET($A$2,0,1,COUNT(A:A),1)<>""),ROWS(B$2:B2))),"")
and G2 use this formula:
=IF(F2="","",INDEX(OFFSET($A$2,0,1,COUNT(A:A),1),MATCH(F2,OFFSET($A$2,0,0,COUNT(A:A),1),0)))
and if you want to verify how far down you want to copy your helper table, you can put this in lets say I3:
=COUNT(B:B)+1
Now these update that are using the count(A:A) or B:B require that no other numbers be above or below your data in the A or B column.
UPDATED Average
I also tweaked the average formula to give this option as well.
=IF($B2<>"",$B2,IF($A2<=$F$2,$G$2,IF($A2>=MAX(OFFSET($F$2,0,0,COUNT($B:$B),1)),VLOOKUP(MAX(OFFSET($F$2,0,0,COUNT($B:$B),1)),OFFSET($F$2,0,0,COUNT($B:$B),2),2,0),AVERAGE(OFFSET($F$2,IFERROR(MATCH($A2,OFFSET($F$2,0,0,COUNT($B:$B),1),1),0)-1,1,2,1)))))
Example results (note rows 14-69 have been hidden so you can see what is happening around row 75.

Related

Excel - Find all identical cells, sum separate cell in row for matches, output in other sheet

So, I've searched for an answer to this, but I can't find anything. Hopefully some Excel guru out there has an easy answer.
CONTEXT
I have a sheet that has two columns; a list of airport codes (Col A) and a list of fuel gallons (Col B). Column A has a bunch of duplicate entries, column B is always different. It's basically a giant list of fill-up events for aircraft over time at different airports. The airports can be the same, because it's one row per fill-up event.
PROBLEM
What I want to do is have a formula that takes the enter data set, finds all identical entries in Col A, sums the Col B values for the matches, and spits out the result on a separate sheet with one entry for every set/match.
OTHER STUFF
I do not have a reference list for Column A and I would rather not create one since there are thousands of entries. I would like to just write a formula that does all this at once, using the data itself as the reference.
All the answers I find are "create a reference list on a separate sheet", and it's driving me crazy!
Thanks in advance for any help!
-rt
Sounds that you need a formula version of remove duplicated for column A, and a simple sumif for column B.
Column A
=IFERROR(INDEX(Data!A$1:A$1000,SMALL(IF(
MATCH(Data!A$1:A$1000,Data!A$1:A$1000,0)=ROW(Data!A$1:A$1000),ROW(Data!A$1:A$1000)),ROW())),"")
Array Formula so please press Ctrl + Shift + Enter to complete it. After that you might see a {} outside the formula.
Column B
=SUMIF(Data!A$1:A$1000,A2,Data!B$1:B$1000)
Just change the range for your data.
Reminders: The formula in columnA should starts from Row#1, or you have to add some offset constant for adjustments.
Since the returning value of MATCH() represents the position of the key in the given array. If we wanted it to be equal to its row number, we have to add some constant if the array is not started from ROW#1. So the adjustment of data in Range(B3:B1000) is below.
=IFERROR(INDEX('Event Data'!B$3:B$1000,SMALL(IF(
MATCH('Event Data'!B$3:B$1000,
'Event Data'!B$3:B$1000,0)+2=ROW('Event Data'!B$3:B$1000),
ROW('Event Data'!B$3:B$1000)),ROW())-2),"")
Further more, the range should exactly the same as the data range. If you need it larger than the data range for future expandability, an IFERROR() should added into the formula.
=IFERROR(INDEX('Event Data'!B$3:B$1000,SMALL(IFERROR(IF(MATCH(
'Event Data'!B$3:B$1000,'Event Data'!B$3:B$1000,0)+2
=ROW('Event Data'!B$3:B$1000),
ROW('Event Data'!B$3:B$1000)),FALSE),ROW())-2),"")
Lastly, I truly recommended that you should use the Remove Duplicated built in excel since the array formula is about O(n^2) of time complexity and memory usage also. And every time you entered any data in even other cells, it will automatically re-calculate all formulas once the calculation option in your excel is automatic. This will pull down the performance.

Subtract one cell from the cell above in Spotfire using multiple columns

How would I create a calculated column that would find the thickness of a cell that is the difference from one cell with the cell above it, but want to do this for many individual wells that have unique identifiers in a column. Additional, the last cell of that well will not have a value because there is not cell below it. On top of that I would like to play with the option of calculating this from deepest to shallowest, or shallowest to deepest so c in a column, and organized by ranking scheme (ascending or descending).
See image attached
Use a calculated column with an OVER statement (provided you're not using an in-database data connection). It will look like this:
Sum([Depth]) OVER Intersect([Well],Next([Depth])) - [Depth]
It will return the same result regardless of the row order because the Next function has built-in alphanumeric sorting which it performs on [Depth].

Excel formula for finding average value in a discountinue serie of dates

I am extracting data from an account statement which you can see simplified in the image in the columns A:B in red:
On A the date (in order ascending but not necessarily incremental by one) and often duplicated due to multiple account entries) and on B the fund value.
If there are days missing, the fund value is the one of the last displayed day
=> I need to for each row on C, the average of fund since the first entry of the series (January 1st).
I can do it easily if I bring to a sequential order the fund values as I did in columns G:H but I would like to avoid this method and find a formula that placed in columnC gives me the average value. Possibly no VBA
Thanks
----ADDED---
the result I would like to see and the file example
http://www.filedropper.com/example_29
How about the following formula:
=(IF(DAY(A3)-DAY(A2)>=1,(C2*DAY(A2)+B3+((DAY(A3)-DAY(A2)-1)*B2))/DAY(A3),C2))
Note that the dates need to be in subsequent order and field C2 needs to be hard-coded for my solution to work.

Excel Formula or function that returns the Nth value from a dynamically generated grouping of cells

I am trying to assemble a index/match combination and am having trouble figuring out how to make it work. I have experience with a lot of the formula types in excel, but unfortunately I am pretty ignorant when it comes to these functions.
I will explain what I am trying to do first, but I have attached 3 images at the end that will probably make things more clear.
In order to identify the specific values I want, I am having to use helper cells. These helper cells are denoted with the (helper) tag in the pictures. These cells go through and grab the adjusted closing price of the stock (column A) at the beginning (column C) and the end (Column D) of a dynamically calculated period.
I would like to consolidate these values into numerical order in columns F and G. The thought is that the first non zero number in C/D is belongs to the first predefined period and should go into columns F/G beside the #1 (column E). This gets carried on through all of the periods (ex: 2nd non zero goes beside the number 2, third nonzero number goes beside the number 3 etc.)
This is just an example of one stock. I need the function or formula to be dynamic enough to work on a wide variety of distributions. Sometimes there are up to 100 dynamically calculated periods within the stock analysis.
Below are the images that should provide more clarity
Image 1 is an example of what the data looks like
Image 2 is a crudely drawn example of how I would like the data to move
Image 3 is the desired result
Image 1
Image 2
Image 3
Updated image for Scott Craner showing out of order results
Please let me know if I can clarify any confusion.
If you just need to return the first value of each period (column C) and the last value of each period (column D), you could use index match and lookup to do this without even using helper columns.
Try this in cell F2
=INDEX(A2:A50,MATCH(E2,B2:B50,0))
And this in cell G2
=LOOKUP(E2,B2:B50,A2:A50)
Depending on much variance is in your overall number of rows, you could use indirect references in the formulas to dynamically update the ranges.
Example:
=INDEX(A2:INDIRECT("A"&COUNTA(A:A)),MATCH(E2,B2:INDIRECT("B"&COUNTA(A:A)),0))
You will need to open macro. Then do the following in recorded macro.
+ Filter only non-null value in C/D
+ Select whole column in C/D then copy the whole column
+ Turn off Filter
+ Paste the whole C/D in F/G
+ Stop macro
Gook Luck
Put this formula if F2:
=INDEX(INDEX(C:C,MATCH($E2,$B:$B,0)):INDEX(C:C,MATCH($E2,$B:$B,0)+COUNTIF($B:$B,$E2)-1),MATCH(1E+99,INDEX(C:C,MATCH($E2,$B:$B,0)):INDEX(C:C,MATCH($E2,$B:$B,0)+COUNTIF($B:$B,$E2)-1)))
Copy over one column and down the list.

Countif with dynamic headers

Good afternoon! I'm trying to get a Countifs or Index Match statement to count the number of times a value occurs in another table. The example:
On my report sheet, Column A contains 10 different statuses, such as Green, Yellow, Red etc.; Row 1 contains six dates, such as 1/31/2015, 2/28/2015, etc. These dates are calculations. The last date references my date worksheet and the five other use EOMONTH to get the month end for the five prior months.
On my data table, I have 7 descriptive columns (such as Type, Make, Model, etc) and then we begin date columns: 1/31/2010 all the way to 7/31/2015. I add a new column each month (I know, I don't like it either, but unfortunately we don't have a time series database).
What I need to do is have a Countifs or Index Match that pulls the date from my report tab, goes and finds it in the header row of my tblTrends, and then counts all those statuses that are Green, and if it's a SUV (for example).
Thoughts?
Thx!!
G
At it's most basic, you'd want something akin to:
=COUNTIFS(TypeRange,"SUV",FirstMonth,Status)
So let's say your data table starts in column L, and the first date column is O:
=COUNTIFS($L$1:$L$100,"SUV",O$1:O$100,A$2)
As you drag this formula across the different dates, it will move the date reference over one to the next month.
If you need it to dynamically determine the date column, I'd recommend OFFSET, which dynamically select a range. However, note that OFFSET is a "volatile" formula, which means it re-calculates anytime a change is made anywhere in the file, which can lead to pretty slow load times if not used sparingly.
=COUNTIFS($L$2:$L$100,"SUV",OFFSET($N$2:$N$100,,MATCH(B$1,$O$1:$Z$1,0)),$A2)
The OFFSET starts on column N, because that's the first column before the columns we want (the date columns). The MATCH tells it how many columns to OFFSET from here.
If you're going to use this over a large amount of data, then you could avoid using the OFFSET formula by creating a dynamic table. This table would only contain data for the six months you're interested in, by utilizing INDEX/MATCH, and you could run your COUNTIFS off this table, instead, using the original, basic method I first described. I can go into detail if you're unsure what I mean.

Resources