I am creating a spreadsheet which has a column of data, and I would like to
calculate the varaince of x number of rows based on x inputed. Any advice?
As Matthew Flaschen suggests, there is no need for VBA. This can be done with Excel formulae.
Suppose your data is in column A, starting at A1 and going down.
Suppose cell B1 has the number x, which is the sample size you wish to use.
Then the formula you need should be:
=VAR(OFFSET(A1,0,0,B1,1))
The VAR part says calculate the variance of a sample of the population, ignoring non-numbers. (Variants to the function include VARP for the entire population, VARA to include non-numbers and VARPA for both).
The Offset part says to start at cell A1, move 0 columns across and zero columns down, and then select a range which is B1 columns tall and one column wide.
Related
I need to calculate Stand BA (highlighted on the right). This average is obtained by taking the sum of each plot's BA/ac value and dividing by the number of plots (for this example, there are 11 #1024 plots). The number of plots is always different. What formula could I use to apply the StandBA column so that I can drag it all the way down by plot number?
Try using =AVERAGEIF(A:A,A2,T:T).
It will return the average value from column T, and rows having the value of A2 (then A3, A4...) in column A will be the only ones to be taken into consideration for the calculation of the average value.
EDIT:
Use =IF(A2=A3,"",AVERAGEIF(A:A,A2,T:T)) if you want only the last rows of your Plot Id to contain the average formula (as shown on the screenshot in column U).
Example with columns A:C, formula pasted into C2 and dragged down to the last row.
Is there a way that you can use sum() in excel and that sums up x amount of columns depending on number given in another cell.
See below for example
Example of how it's today
In this picture I have columns B:M were I put in Actual figures.
And in columns T:AE I have budget figures.
In this example I have actual figures JAN-MAY and would like to compare them vs budget JAN-MAY.
How I do today is that I go in to column AF and drag so it's =SUM(T6:X6) so it only takes T:X in this case.
But I would like that cell AF has a formula in it were it looks at cell AH:3 and sees that it's number 5.
Therefore AF should sum(T;U;V;W;X(5columns))
Hope you understand my problem and have a solution!! :)
You can use the OFFSET function to determine the number of columns to SUM. In your example, AH3 is your offset amount.
In AF6 you can place this formula:
=SUM(T6:OFFSET(T6,0,AH3-1))
Which breaks down to
=SUM(T6:OFFSET(Start at T6, Offset 0 rows, Offset columns the value of AH3 and subtract one for a total of five columns))
Screenshot of the Excel worksheet
I'm working with historic stock prices, and using eight columns I have:
Column A: High
Column B: Low
Column C: Close
Column D: Cx-Cx-4
Column E: Counts the number of consecutive positive numbers in column D
Column F: Counts the number of consecutive negative numbers in column D
Column G: Calculate the difference between the maximum of column A and minimum of column B within a given sequence.
As an example G1 should equal:
=max(A1:A5)-min(B1:B5)
G6 should equal:
=max(A6:A8)-min(B6:B8)
G9 should equal:
=max(A9:A11)-min(B9:B11)
And so on.
I'd like to know if it is possible to automate this calculation, possibly with the use of one or more additional columns.
Welcome to SO!
This may not be the most efficient solution as you need to add two helper columns, but if I understand your requirements correctly, then this idea should work well enough.
First, let's assume that there are 100 rows in your data set. Given that, enter the formula "=A100" in cell G100 and the formula "=B100" in cell H100. This sets up the boundary condition for the formulas in columns G and H. Now, in cell G99, enter this formula:
"=IF(E99="",G100,IF(E100="",A99,MAX(A99,G100)))"
What this formula does is set up a "running maximum" with the following logic:
If the cell in E99 is blank, copy the running maximum from G100, else:
If the cell in E99 is not blank but the cell in E100 is, set up a new running maximum from the cell in A99, else:
Take the maximum of A99 and G100 as the new running maximum.
Similarly, copy the following formula into cell H100:
"=IF(F99="",H100,IF(F100="",B99,MIN(B99,H100)))"
This follows the same logic as the previous formula, but takes the minimum of column B.
Copy or autofill these formulas to the top of the data set. This should now give you running maximum for column A and a running minimum for column B.
The next step is to calculate the difference. I notice from your question, that you only seem to be interested in calculating this difference at the top of each range (G1, G6, G9, etc.), rather than doing it in every row. Given that, we need a slightly more complicated formula.
The boundary condition for this formula is simply "=G1-H1" entered in cell I1. In cell I2, enter this:
"=IF(OR(AND(E2<>"",E1=""),AND(F2<>"",F1="")),G2-H2,"")"
How this works is that it check two conditions that indicate a range boundary:
E1 is blank and E2 is not
or
F1 is blank and F2 is not
If either of these conditions hold, the IF statement is true and "G2-H2" is diplayed, otherwise a blank cell is displayed. Now copy or autofill this formula to the bottom of the data set.
As a final step, you can now hide columns G and H if you don't need them displayed. This should now give you the results I think you're looking for. Please let me know if this doesn't work out for you.
I'm having an issue getting accurate data from the SUMIF function. This appears to be caused by the SKU and Product name being identical however I don't understand why the selected range would be ignored.
SUMIF(G:K,A2,K:K) - Cell D2 is calling for the sum of K yet returning the sum result of K2:M2. All other results in D are correct.
SUMIF(G:K,A2,I:I) - If I change the formula in D to SUM I:I (text not a numeric field) the function returns the sum of K:K
Example file http://tempsend.com/013C2B6378
According to the documentation here the range to be summed starts at the top left of the sum range (K:K in your first example) but its size is given by the size of the criteria range (G:K in your example). So I think that's why you're getting extra columns summed in your result.
If you have multiple criteria involving different columns, you should be able to use SUMIFS.
So let's say your data sit in 8 rows (including the headings).
then you simply need to change your formula to say, look for B2 in column G OR in I, if true, then sum the values in K. Right?
put this formula in B2 and press ctrl+shift+enter to calculate the formula.
=SUM(IF(($G$2:$G$8=B2)+($I$2:$I$8=B2),1,0)*$K$2:$K$8)
then drag and fill down until the last cell.
obviously you need to adjust the ranges in the formula to adapt to your own data.
tell me if you get to the answer via this.
I'm creating a grid of correlation values, like a distance grid. I have a series of cells that each contain a formula whose ranges are easy to describe if you know the offset from the first cell, and I'm having trouble figuring out how to specify it.
In the upper left hand cell (R10), the formula is CORREL(C2:C21,C2:C21) -- it's 1, of course.
In the next column over (S10), the formula is CORREL(D2:D21,C2:C21).
In the next row down (R11), the formula is CORREL(C2:C21,D2:D21).
Of course, S11 would contain CORREL(D2:D21,D2:D21), which is also 1. And so on, for a roughly 15x15 grid.
Here's a graphical representation of the ranges involved:
C2:C21,C2:C21 C2:C21,D2:D21 C2:C21,E2:E21
D2:D21,C2:C21 D2:D21,D2:D21 D2:D21,E2:E21
E2:E21,C2:C21 E2:E21,D2:D21 E2:E21,E2:E21
Whenever I add a new data row, I have to manually update several formulas. So, I'd like the last non-blank column number (21, in this case), to be dynamically determined, such as with COUNTA(C:C). Ideally, I'd like the formula to calculate the row offsets, too, so that I can drag one formula across my entire range.
What's the best way to accomplish this? I think OFFSET might be a component in the solution, but I haven't had success getting it all to work together.
Using this simple setup per element of the corr matrix also helps:
=CORREL(INDIRECT("'Risk factors'!"&"T"&G6&":T"&H6);INDIRECT("'Risk factors'!"&"U"&G6&":U"&H6))
With this function I refer to data in another sheet, Risk factors, to correlate rows T and U with each other. I want the ranges of the data to be dynamic so I refer with G6 and H6 in my current sheet to the lenght of the columns (number of rows) which I of course specify in these G6 and H6 cells.
Hope this helps!
I found this formula, while wordy, achieved the desired results. In this example, the data lives in C2:O19. The table I wanted to construct computed the correlation values of all permutations of pairs of columns. Since there are 11 columns, the correlation pairs table is 11x11 and starts at R10. Each cell has the following formula:
=CORREL(INDIRECT(ADDRESS(2,2+(ROWS($R$10:R10)),4)&":"&ADDRESS(COUNTA($C:$C),
2+(ROWS($R$10:R10)),4)),INDIRECT(ADDRESS(2,2+(COLUMNS($R$10:R10)),4)&":"&
ADDRESS(COUNTA($C:$C),2+(COLUMNS($R$10:R10)),4)))
As I found out, INDIRECT() resolves a cell reference and obtains its value.
Let's take a cell, say U12, and look at the range formula in detail. The first INDIRECT is the column given by applying the row offset from R10.
Since Row 12 is 2 rows down from Row 10, ADDRESS(2,2+(ROWS($R$10:U12)),4)&":"&ADDRESS(COUNTA($C:$C),2+(ROWS($R$10:U12)),4) should yield the column that's 2 rows right of Row C, which is E. The formula evaluates to E2:E19.
The second INDIRECT is the column given by applying the column offset from R10. Similarly, since Column U is 3 columns right of Column R, ADDRESS(2,2+(COLUMNS($R$10:U12)),4)&":"&ADDRESS(COUNTA($C:$C),2+(COLUMNS($R$10:U12)),4) should yield the column that's 3 rows right of Row C, which is F. The second formula evaluates to F2:F19.
Substituting these range reference values in, the cell formula reduces to =CORREL(INDIRECT("E2:E19"),INDIRECT("F2:F19")) and further to =CORREL(E2:E19,F2:F19), which is what I'd been using up till now.
Just like a distance table, this table is symmetrical along the diagonal, because =CORREL(E2:E19,F2:F19) equals =CORREL(F2:F19,E2:E19). Each value on the diagonal is 1, because CORREL of the same range is 100% correlation by definition.