I am currently automating a dashboard creation and I've hit a bit of a roadblock. I need some code that will go through about 7000 rows of data and return the highest value in a certain column for each specific item. The data is copied from a pivot table and so is broken down into row sections, I have attached a mock of what it looks like.
I need the highest value in Column G for each portfolio, and will need to use the portfolio code (e.g. XY12345 - They are always 7 characters) to map that value to the dashboard.
My issue is, each portfolio has a different number of rows for the values, and some have blank cells between them, and therefore I am stumped. I was hoping to use Column J to count the number of rows for each portfolio (as there are no breaks for the portfolios in this column) and then use a loop to loop through each portfolios rows of values, based off the Column J count, and then return the highest row value for each portfolio. Problem is I'm new to VBA and have been teaching myself as I go, and I've yet to use a loop.
Many thanks,
Harry
If I understand correctly, you're looking for the largest value in Column G.
I'm not sure why you think you would need VBA for this.
Get the maximum value of a column
You mentioned that you're concerned about each column not having the same number of cells but that's irrelevant. as SUM ignores blank cells, so just "go long", or - find the maximum of the entire column.
To return the largest number in Column G you could use worksheet formula :
=MAX(G:G)
The only catch is that you can't place that formula anywhere column G or else it would create a circular cell reference (trying to infinitely add a number to itself). let's pit that formula in cell F1 for now (but anywhere besides column G would do fine).
Find the location of a value
Now that you know the largest value, you can determine where it is using a lookup function such as MATCH or VLOOKUP. Like with so many things in Excel, there are several ways to accomplish the same thing. I'll go with MATCH.
Replace the formula from above (in F1) with:
=MATCH(MAX(G:G),G:G,0)
This will return the row number of the first exact match of the maximum value of Column G.
As for the third part of question: returning the code like X12345 where the value exists, will be a little tricky since your data is not organized in a logical tabular style (tabular meaning, "like a table").
Your data is organized for humans to look at, not for machines to easily read and manipulate it. (See: Office Support: Guidelines for organizing and formatting data on a worksheet)
Basically, when organizing data in rows, all relevant information should be on the same row (not a subjective number of rows behind). Also, you have the number combined with other information.
My suggestion for a quick fix:
Right-click the heading of Column C and choose Insert to insert a blank column.
In C2 enter formula: =IF(B2="",C1,LEFT(B2,7))
Copy cell C2
Select cells in column C all the way to the "end" of your data, where ever that is (not the end of the worksheet). For example maybe you would select cells B2:B1000)
Paste the copied cell into all those cells.
Now, you can again modify the formula in F1:
=INDEX(C:C,MATCH(MAX(G:G),G:G,0))
This will return the value from Column C in the same row that the maximum value of Column G is located.
This is known as an INDEX/MATCH formula.
Hopefully this works for you in the interim until you can organize your data more logically. There's lots of related information and tutorials online.
I have sporadically collected data for stream nitrogen levels. For dates where nitrogen levels are not available, I'm hoping I can use an Excel formula that will estimate the value by calculating the average of nitrogen levels from the two closest dates.Note that nitrogen is not correlated with flow (or other water quality parameters) and there is no trend so I can't use a regression equation to estimate values.
Below is an example of what the table looks like for columns A, B, and C, and rows 1 (header) to 7. Column B contains the real data (note that I inserted 'no value' for illustrative purposes). Column C is what I would like to have calculated. Because I have thousands of rows of data, manually entering each of these formulas is not an option.
Date_______Actual Value______Desired Calculation
1/1/2012_______0.15__________=B2
1/2/2012_____[no value]_______=AVERAGE(B2,B5)
1/3/2012_____[no value]_______=AVERAGE(B2,B5)
1/4/2012_______0.17__________=B5
1/5/2012_____[no value]_______=AVERAGE(B5,B7)
1/6/2012_______0.23__________=B7
=IF(B1<>"",B1,VLOOKUP(MAX($A$1:$A$4),$A$1:$B$4,2,0))
Misread the question the first time.
Test to see if B1 is equal to "" or not. If it is empty then perform the VLOOKUP searching for the max value of column A (most recent date), and then use the value for the adjacent column (2).
This will not cover the case where the most recent Date has no value in column B. I will return "" in that case.
TREND Option
Based on your comments and the use of some helper cells you could do the following. Make a small table off to the right somewhere of just your known dates and your known values from B. We can populate this table using the following formulas. Let arbitrarily put this table in columns F and G.
in F2 use:
=IFERROR(INDEX($A$2:$A$1564,AGGREGATE(15,6,(ROW($B$2:$B$1564)-ROW($A$2)+1)/($B$2:$B$1564<>""),ROWS(B$2:B2))),"")
This will pull the dates in order that they appear with values in B that are not "". Then in G2 put the following formula in which will pull the associated value from B
=IFERROR(INDEX($B$2:$B$1564,AGGREGATE(15,6,(ROW($B$2:$B$1564)-ROW($B$2)+1)/($B$2:$B$1564<>""),ROWS(B$2:B2))),"")
Both those formulas are currently set up to work with a header row and data starting in row 2.
once we have the helper table set up, you can set up a TREND function, which is a line of best fit through all your known points. You can pick the point on this line for dates with no value in B. In C2 you would add:
=IF(B2="",TREND($G$2:$G$3,$F$2:$F$3,A2),B2)
you would copy this formula down. It checks to see if B is blank. If it is then it pulls a value for B from the trend line. If it is not blank, then it will use the value in B.
AVERAGE Option
Again using the helper table in the same spot. This one gets a little uglier. In C2 place the following and copy down
=IF(B2<>"",B2,IF($A2<$F$2,$G$2,IF($A2>$F$3,$G$3,(INDEX($G$2:$G$3,IFERROR(MATCH($A2,$F$2:$F$3,1),1))+INDEX($G$2:$G$3,IFERROR(MATCH($A2,$F$2:$F$3,1)+1,2)))/2)))
The first thing it does is check for a blank value in B. If there is no blank then it uses the value in B2. If there is no value in B2 it then proceeds to check the date of the blank against the first date in your helper table. If it is less than that first value, it uses the first value of B since there is nothing to average it with. I then proceeds to do the same to see if the date exceeds the last value in the table. (this is currently hard coded). If it exceeds the last date in the table it uses the last value for B. the last option for the date is for it to be in the table. It take the average between the date prior to the date and question and the next date with a value.
Optional version where you dont have to hard code the last values of helper table, but you still need to hard code the range of the helper table to meet your needs.
=IF(B3<>"",B3,IF($A3<$F$2,$G$2,IF($A3>MAX($F$2:$F$3),VLOOKUP(MAX($F$2:$F$3),$F$2:$G$3,2,0),(INDEX($G$2:$G$3,IFERROR(MATCH($A3,$F$2:$F$3,1),1))+INDEX($G$2:$G$3,IFERROR(MATCH($A3,$F$2:$F$3,1)+1,2)))/2)))
TREND Between only two points
So I adjusted the formula a bit on this one. Its kind of a mix of the trend and previous options listed above. I also made it automatically selecting the end values of your table, so now you just have to copy your table down as far as you need. Its also works on the premise that only your data value in B are present, no numbers below your last row.
=IF($A2<$F$2,$G$2,IF($A2>=MAX(OFFSET($F$2,0,0,COUNT(B:B),1)),VLOOKUP(MAX(OFFSET($F$2,0,0,COUNT(B:B),1)),OFFSET($F$2,0,0,COUNT(B:B),2),2,0),TREND(OFFSET($F$2,IFERROR(MATCH($A2,OFFSET($F$2,0,0,COUNT(B:B),1),1),0)-1,1,2,1),OFFSET($F$2,IFERROR(MATCH($A2,OFFSET($F$2,0,0,COUNT(B:B),1),1)-1,0),0,2,1),$A2)))
So basically TREND function is a line of best fit through known points. if your know points do not all sit in a straight line it figures out what straight line passes near them minimizing the distance they are from the line. If we limit the points it using to make its line to only 2 points, then the line has to pass through those two points. Remember, the helper table is a list of all known points.
So basically what the formula doing is checking to see if the date is less than or equal to the first known point on the helper table. If it is, use the value from the first known point in the table. If its not, check to see if the date is greater than the last date in the table, if it is lookup the last known date in the table and return the corresponding table. If neither of these cases is true then the date must be in the table. That means we can get two points from the table and their corresponding values to use in the trend function. We feed the TREND function a an array of 2 known X and 2 Know Y values, and then supply it with an X value we are interested in. In this case X would be the dates, and Y would be the values in the second column.
Revised Helper table
In F2 use this formula:
=IFERROR(INDEX(OFFSET($A$2,0,0,COUNT(A:A),1),AGGREGATE(15,6,(ROW(OFFSET($A$2,0,1,COUNT(A:A),1))-ROW($A$2)+1)/(OFFSET($A$2,0,1,COUNT(A:A),1)<>""),ROWS(B$2:B2))),"")
and G2 use this formula:
=IF(F2="","",INDEX(OFFSET($A$2,0,1,COUNT(A:A),1),MATCH(F2,OFFSET($A$2,0,0,COUNT(A:A),1),0)))
and if you want to verify how far down you want to copy your helper table, you can put this in lets say I3:
=COUNT(B:B)+1
Now these update that are using the count(A:A) or B:B require that no other numbers be above or below your data in the A or B column.
UPDATED Average
I also tweaked the average formula to give this option as well.
=IF($B2<>"",$B2,IF($A2<=$F$2,$G$2,IF($A2>=MAX(OFFSET($F$2,0,0,COUNT($B:$B),1)),VLOOKUP(MAX(OFFSET($F$2,0,0,COUNT($B:$B),1)),OFFSET($F$2,0,0,COUNT($B:$B),2),2,0),AVERAGE(OFFSET($F$2,IFERROR(MATCH($A2,OFFSET($F$2,0,0,COUNT($B:$B),1),1),0)-1,1,2,1)))))
Example results (note rows 14-69 have been hidden so you can see what is happening around row 75.
I have a list and a lookup table. The list will be in column A and the lookup table will be in columns C and D. Each cell in column C contains text and is unique. Column D has a corresponding value for each element of column C. Column A contains an unsorted list with duplicates of the text from column C but nothing outside of column C. An easy example is shown below
What I ultimately want is an array that contains the corresponding values looked up from the table for all of column A. So in this example I want
{1,2,1,2,2,2,1,1,2}
I know this can be accomplished by using
=vlookup(B9,$C$1:$D$2,2,FALSE)
in each cell of column B and grab the column as an array {$B$1:$B$9}. However, I need to skip the middle man and omit having anything in another column as this needs to be performed on various lookup tables simultaneous but not connected, i.e. not a multiple lookup, I just need to look at the same list A under a different set of corresponding values D.
I'm at a complete loss, it seems so simple but I've been trying for hours and going in circles. I recall reading that you can't output vlookup to an array and I didn't have much luck trying to use match either.
If you want to continue to use VLOOKUP:
VLOOKUP(T(IF(1,+A1:A9)),C1:D2,2,0)
where T is employed here since the values in A1:A9 are text, not numeric (otherwise we would use instead N - a range containing mixed datatypes would require a different approach).
http://excelxor.com/2014/09/05/index-returning-an-array-of-values/
Regards
You can achieve your array with the following use of the INDEX function in its array form. Since your lookup table is in the same orientation as the values in A1:A9, you must TRANSPOSE A1:A9 to achieve the cyclic array processing.
As an array formula¹,
=INDEX((TRANSPOSE(A1:A9)=C1:C2)*(D1:D2), , )
This can be proved in a single cell by wrapping the array result in a SUM function which will total the elements of the array.
=SUM(INDEX((TRANSPOSE(A1:A9)=C1:C2)*(D1:D2), , ))
... which returns 14.
INDEX function in Array Form
¹ Array formulas need to be finalized with Ctrl+Shift+Enter↵. Once entered into the first cell correctly, they can be filled or copied down or right just like any other formula. Try and reduce your full-column references to ranges more closely representing the extents of your actual data. Array formulas chew up calculation cycles logarithmically so it is good practise to narrow the referenced ranges to a minimum. See Guidelines and examples of array formulas for more information.
Hello,
Please forgive me if I didn't word my question very well. However, I have an excel spreadsheet that I must complete and submit to our headquarters on a monthly basis. I've been manually editing most of the fields but I think there's a way to automate some of the values. Unfortunately, I haven't figured it out yet so I thought I'd seek help here.
So I have two tables (Table1 and Table2) - images below. What I need is a way to automatically populate the RATE FACTOR (Column D) if the WEIGHT (Column C) falls within a certain range (defined in table2).
I was reading on here about using VLOOKUP, but couldn't get it to work. Any assistance would be appreciated, thanks.
IMAGES BELOW (had trouble attaching images to question).
Table 1: https://dl.dropbox.com/u/55292384/table1.jpg
Table 2: https://dl.dropbox.com/u/55292384/table2.jpg
//Kismet
Assuming that your example has the titles in row 1, in cell D2 enter =VLOOKUP(C2,$G$2:$H$17,2)
You can copy this formula to the other cells in column D.
VLOOKUP demands that the table of values is sorted in ascending order by the first column. The lookup value (here C2) is found in the first column and the 2 says to return the corresponding value in the second column. By default if the lookup value is not found, VLOOKUP returns the result corresponding to the nearest value less than the lookup value. This may or may not suit your rate factor calculation (if not, I can make an additional suggestion).
I'm performing array calculations that are taking a long time to complete. I'd like to optimize my formulas some more. All of the formulas are of the same nature - they perform some high-level function (Average, Slope, Min, Max) across a column of values. However, not all cells in a column are included in the array. I use multiple IF criteria to choose which cells get included. All comparisons are made to the current row. Here's an example of the data:
A B C D E
1 Company Generation Date Value ToCalculate
2 Abc 1 1/1/2010 5.6
3 ... ... ... ... ...
E would look something like this
{=Average(If(A2=A2:A1000, If(B2=B2:B1000, If(C2 > C2:C1000, D2:D1000))))}
So once E2 is calculated then I have to autofill down column E. Column F, G, H, ... Uses the same approach, either selects different values to operate on or a different function to perform. My dataset is quite large, and with only a few of these the spreadsheet is taking an hour plus to compute. Every so often I'll add a fourth criteria, all other criteria being the same.
Is there an efficiency? Some thoughts:
Can I use a single array per column instead of thousands per column?
Can I condense the first three criteria so that the output is row numbers? Perhaps then subsequent formulas won't have to search for multiple criteria but can just perform the function?
or somehow build the crtieria up? So a new column returns all rows where the company is the same. another column returns all rows from the first column where generation is the same...and so on...
For the Average you can do without arrays:
=AVERAGEIFS(D2:D$1000,A2:A$1000,A2,B2:B$1000,B2,C2:C$1000,"<="&C2)
As there is also a COUNTIFS and a SUMIFS, I think your slopes could be calculated the same way.
For the rest of the functions (max, min, etc), we should analyze case by case.
I did a slight performance test, and this is apparently better, but of course my datasets are just mocked.
HTH!
Note: Excel 2007 and up only!
Edit - Answering your comment.
Without knowing the dimensions of the problem is difficult to give advice, but I'll risk one anyway:
You could write a VBA function that:
1) Generates a new sheet for each company-generation pair
2) Sorts the data in those sheets by date
3) Adds the formulas to those sheets (no conditionals needed in this context)
4) Recalculates and Gets the results from those formulas and populates the original sheet
5) Deletes the auxiliary sheets
To capture the rows and re-use try this approach:
Sort the data by Company & Generation.
Make a unique list of Companies & generations (use Advanced Filter, Unique Only, Copy)
For each Company generation pair in the list build 2 columns of formulae. First column gives the count of rows in the data for this pair (use COUNTIFS), second column gives the first row in the data for this pair (=first row for previous pair+count of rows for previous pair). Then you can use a function like OFFSET to return only the rows of data for the Company-Generation pair and embed this inside the final function/array formula (AVERAGEIFS etc) You could extend this sort and count approach to include dates if you wanted. There is a drawback that if the list of cities and generations change you have to change the list of uniques and associated formulas. There are examples of this approach on my website athttp://www.decisionmodels.com/optspeedk.htmhttp://www.decisionmodels.com/optspeedj.htm