I need to create a dynamic sum of data chunks from a large data set contained in a CSV file (>100k rows). The data will eventually be displayed in Power BI, but I have literally no knowledge of DAX or VBA, so I hope I can preformat the data in Excel.
The way to distinguish the data subsets I want to sum is a counting column. The count restarts at 1 with every new subset, but the final number (>= 1) is totally 'random'.
The first column is the countingRow, the second column is the dataRow.
> 1 45
2 20
3 20
4 10 -> SUM 95
> 1 30
2 5 -> SUM 35
> 1 X -> new SUM
I think it is possible to work with the SUM, IF and OFFSET function.
My plan was to check whether a cell contains a 1 or not. Check the range between two true values minus one cell, then calculate the offset sum in the other column.
But when I thought I found the solution I realized that I have no way to bring my pointer to a new data subset.
Which function do I need to move my calculation through the column?
Is it even possible to do a calculation of this scale in Excel?
PS: I would also be thankful for a DAX or VBA tutorial that could lead me to a solution.
In the following sample data image, use the following formula in D2.
=IF(OR(A3={1,""}), SUM(INDEX(B:B, AGGREGATE(14, 6, ROW($1:2)/(A$1:A2=1), 1)):INDEX(B:B, ROW())), TEXT(,))
Fill down.
A slightly shorter formula that needs no array-style evaluation (enter it in C2, since it subtracts the earlier results in column C):
=IF(OR(A3={1,""}),SUM($B$2:B2)-SUM($C$1:C1),"")
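For anyone who wants to check the logic outside Excel, here is a minimal Python sketch of what the shorter formula does: it keeps a running total of the data column and, on the last row of each subset, reports the total minus everything already reported. The function name and the list inputs are hypothetical and only serve the illustration.

```python
def subset_sums_by_row(counters, values):
    """Mirror of =IF(OR(A3={1,""}),SUM($B$2:B2)-SUM($C$1:C1),"").

    Returns one entry per row: the subset sum on the last row of each
    subset, and "" on every other row.
    """
    results = []
    running_total = 0   # SUM($B$2:B2)
    reported = 0        # SUM($C$1:C1), the subtotals written so far
    for i, value in enumerate(values):
        running_total += value
        # A subset ends when the next counter is 1 or there is no next row.
        is_last = i + 1 == len(values) or counters[i + 1] == 1
        if is_last:
            results.append(running_total - reported)
            reported = running_total
        else:
            results.append("")
    return results


counters = [1, 2, 3, 4, 1, 2]
values = [45, 20, 20, 10, 30, 5]
print(subset_sums_by_row(counters, values))  # ['', '', '', 95, '', 35]
```

The sample data is the one from the question, so the two subtotals are 95 and 35.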
I need to add a column to my spreadsheet that generates two "false" values at random positions within every ten frames.
So, for example, rows 1 through 10 could read:
true
true
true
False
true
false
true
true
true
true
and then repeat that for rows 11 through 20, but with the false values randomly placed in different positions, and so on. I want to write a formula that does this for me.
With Office 365:
In first cell you want the list to be created put:
=LET(rws,1000,arr,RANDARRAY(10,rws/10),seq,SEQUENCE(rws,,0),INDEX(MAKEARRAY(10,rws/10,LAMBDA(i,j,INDEX(BYCOL(arr,LAMBDA(v,MATCH(SMALL(v,i),v,0))),1,j)<9)),MOD(seq,10)+1,INT(seq/10)+1))
Change the 1000 to the number of rows desired.
If one does not have Office 365 then put this in the second row of a column and copy it down.
=IF(COUNTIF(INDEX(A:A,MIN(ROW($ZZ1)-MOD(ROW($ZZ1)-1,10)+1,ROW()-1)):INDEX(A:A,ROW()-1),FALSE)>=2,TRUE,IF(COUNTIF(INDEX(A:A,MIN(ROW($ZZ1)-MOD(ROW($ZZ1)-1,10)+1,ROW()-1)):INDEX(A:A,ROW()-1),TRUE)>=8,FALSE,RANDBETWEEN(0,9)<8))
Be aware:
Each cell is chosen randomly in turn, so FALSE will land in the last rows of each group of 10 more often than a truly random placement would. One can adjust the RANDBETWEEN(0,9)<8 test to try to even that out.
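As a point of comparison for the bias mentioned above, here is a rough Python sketch of the unbiased version of the task: for each block of ten, draw two distinct positions at once and mark them FALSE. This is not a translation of either formula; the function name and parameters are made up for the example.

```python
import random


def true_false_column(n_rows, block=10, n_false=2, rng=random):
    """Build a list of booleans with exactly n_false False values per block."""
    out = []
    for start in range(0, n_rows, block):
        size = min(block, n_rows - start)  # the last block may be shorter
        # Sampling without replacement makes every placement equally likely.
        false_positions = set(rng.sample(range(size), min(n_false, size)))
        out.extend(i not in false_positions for i in range(size))
    return out


column = true_false_column(30)
for start in range(0, 30, 10):
    print(column[start:start + 10])
```

Because both positions are drawn together, no row in a block is favoured over another.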
BRUTE FORCE METHOD
There are 10!/(8!*2!) = 45 ways of arranging your True/False requirements
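If you would rather not type the 45 columns by hand, the count can be checked (and the columns generated) with a few lines of Python. This is only a sketch; the variable names are arbitrary.

```python
from itertools import combinations
from math import comb

# Each pattern is one column of 10 values with exactly two False entries.
patterns = []
for false_rows in combinations(range(10), 2):
    patterns.append([row not in false_rows for row in range(10)])

print(len(patterns), comb(10, 2))  # 45 45
```

Each entry of patterns corresponds to one column of the possibility table.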
I personally didn't have anything better to do with my time so I wrote out all possible combinations in 45 columns.
The concept with this methodology is to randomly write out one of the 45 columns every 10 rows. One of the problems here is that using random in a formula does not mean you will be able to use the same random value in the next row of the formula.
A potential random problem side step
In order to make a random result accessible by multiple formula calculations one can spit out the results in a helper column. For this solution we will be randomly selecting from 45 possible columns, so in the first column the following formula is used and copied down. The number of rows will be equal to the number of 10 groupings you will use.
Start in A1 and copy down
=RANDBETWEEN(1,45)
How to make each formula in a group of ten pick the same random number
For demonstration purposes the next column is to generate integers starting at 1 and increasing by 1 after every 10 rows. For the demonstration it would need to be copied down a number of rows equal to the number of results needed (10 * number of groups of 10). Ultimately this formula can be embedded in the final formula.
Start in B1 and copy down
=INT((ROW(A1)-1)/10)+1
For demonstration purposes the next column is to generate integers starting at 1 and increasing by 1 row but resetting to 1 after the 10th row. For the demonstration it would need to be copied down a number of rows equal to the number of results needed (10 * number of groups of 10). Ultimately this formula can be embedded in the final formula.
Start in C1 and copy down
=MOD(ROW(A1)-1,10)+1
So now there is a way of indexing the column you need and what row of that column you need.
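The two helper formulas are plain integer division and remainder on the row number. A small Python sketch of the same arithmetic, with a hypothetical function name and 1-based rows as in Excel:

```python
def group_and_position(row, block=10):
    """Return (group number, position within group), both 1-based."""
    group = (row - 1) // block + 1      # =INT((ROW(A1)-1)/10)+1
    position = (row - 1) % block + 1    # =MOD(ROW(A1)-1,10)+1
    return group, position


for row in (1, 10, 11, 25):
    print(row, group_and_position(row))
# 1 (1, 1)
# 10 (1, 10)
# 11 (2, 1)
# 25 (3, 5)
```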
Indexing the solution
In the next column the INDEX function is used (twice) to find which column and row to read from the list of all possible combinations. In this demo, the list of all possible combinations is written out in F1:AX10.
First we start by indexing which random column to use. Since the random numbers are written in column A starting in row 1 I used the following formula:
=INDEX(A:A,B1)
To get the row reference I used the following formula:
=C1
I then took those two formulas and combined them to pull data from the possibility table as follows:
Start in D1
=INDEX($F$1:$AX$10,C1,INDEX(A:A,B1))
Tidying it up
We can't eliminate the random number column, as we need something quasi-static for the formulas to refer to. I say quasi-static because RANDBETWEEN is a volatile function, which means it recalculates every time the sheet recalculates. However, we can fold the formulas from B and C into D, which gives this formula in D:
=INDEX($F$1:$AX$10,MOD(ROW(A1)-1,10)+1,INDEX(A:A,INT((ROW(A1)-1)/10)+1))
It's not clear which version of Excel you're using so this approach will work for all versions:
the starting point is C12:L13, where the formula in row 12 is
=RANDBETWEEN(1,5)
and the formula in row 13 is
=RANDBETWEEN(6,10)
These results determine the positions of the FALSE values in the range starting with cell C1 where the formula is
=NOT(OR(ROW()=C$12,ROW()=C$13))
The array formula in A1:A10 is
=INDEX($C$1:$L$10,,1+MOD(RANDBETWEEN(1,100),10))
column B is just an indexing column containing the formula
=1+MOD(ROW()-1,10)
which, coupled with the conditional formatting in column A illustrates that the positions of the FALSE values are different in each 10-row sequence.
(you will notice that the random numbers generated in columns I and J happen to be the same so, if this is a concern, you could extend the 'helper range' beyond 10 columns in order to augment randomness)
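One property of this layout worth knowing: because row 12 draws from 1 to 5 and row 13 draws from 6 to 10, each helper column always has one FALSE in the first five rows and one in the last five. A short Python sketch (variable names are arbitrary) counts how many of the possible placements that covers:

```python
from itertools import combinations, product

# Every way to place two FALSE values in rows 1..10.
all_pairs = set(combinations(range(1, 11), 2))
# Placements this layout can produce: one row from 1..5, one from 6..10.
reachable = set(product(range(1, 6), range(6, 11)))

print(len(reachable), "of", len(all_pairs))  # 25 of 45
```

If placements such as two FALSE values in rows 1 to 5 must also be possible, the two RANDBETWEEN ranges would need to be changed.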
I have a large column of data where every 10 rows is a different set.
What I would like to do is get the average of those 10 rows, and then subtract that from every individual measured value.
Then it moves to the next 10, takes the average of those, and subtracts it from the 10 data points that yielded the new average.
I've tried using MOD and plenty of formulas and dragging out some kind of formula but Excel's pattern recognition is not working at all in this case.
Example of what I'm trying to do using 3 values instead of 10
The output I want takes the average of the first 3 values ((1+2+3)/3=2), then subtracts it from those 3 values and outputs it as the result. (1-2=-1, 2-2=0, 3-2=1). Then it repeats the same thing with the next 3 and the results from the previous 3 do not affect it.
Values    Average    Result
1                    -1
2         2           0
3                     1
2                    -2
5         4           1
5                     1
2                    -1
5         3           2
2                    -1
(I'm so sorry about the awful table)
Any help would be greatly appreciated! Thank you.
Using your data and doing every three:
=A2-AVERAGE(INDEX(A:A,INT((ROW(1:1)-1)/3)*3+2):INDEX(A:A,INT((ROW(1:1)-1)/3)*3+4))
To change to groups of 10, change each /3 to /10 and each *3 to *10; this number is the interval. You will also need to change the +2 to the first row of data, and the +4 to +11, that is, [first row of data] + [the interval] - 1.
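To sanity-check the expected results for any interval, the same calculation can be sketched in Python. The function name is hypothetical, and the sample values are the ones from the question.

```python
def subtract_group_average(values, interval):
    """Subtract each group's average from every value in that group."""
    results = []
    for start in range(0, len(values), interval):
        group = values[start:start + interval]
        average = sum(group) / len(group)
        results.extend(v - average for v in group)
    return results


print(subtract_group_average([1, 2, 3, 2, 5, 5, 2, 5, 2], 3))
# [-1.0, 0.0, 1.0, -2.0, 1.0, 1.0, -1.0, 2.0, -1.0]
```

Passing 10 as the interval gives the grouping asked for in the question.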
I like Scott's answer better, but since I had worked it out while he was typing, I'll add my solution as well.
I'm using the INDIRECT function to build the range reference and calculating the range for the AVERAGE by using MOD on the ROW function.
Basically you're looking to average over the first ten rows, so you need the range A1:A10, then A11:A20, and so on. In order to calculate the beginning row, take your current ROW() and subtract the MOD 10 of its previous row: ROW()-MOD(ROW()-1,10). The last row of the group just adds 9 rows: ROW()-MOD(ROW()-1,10)+9.
Everything in Column B uses the formula:
=AVERAGE(INDIRECT("A"&ROW()-MOD(ROW()-1,10)&":A"&ROW()-MOD(ROW()-1,10)+9))
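To see which range the INDIRECT string resolves to for a given row, here is a small Python sketch of the same arithmetic. The function name is made up, and it assumes the data starts in row 1, as in this answer.

```python
def group_range(row, size=10):
    """Return the A-column range text that the formula builds for a row."""
    first = row - (row - 1) % size   # ROW()-MOD(ROW()-1,10)
    last = first + size - 1          # ROW()-MOD(ROW()-1,10)+9
    return f"A{first}:A{last}"


for row in (1, 10, 11, 25):
    print(row, group_range(row))
# 1 A1:A10
# 10 A1:A10
# 11 A11:A20
# 25 A21:A30
```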
I have googled for hours, not being able to find a solution to what I need/want. I have an Excel sheet where I want to sum the values in one column based on the criteria that either one of two columns should have a specific value in it. For instance
    A    B    C
1   4    20   7
2   5    100  3
3   100  21   4
4   15   21   4
5   21   24   8
I want to sum the values in C given that at least one of A and B contains a value of less than or equal to 20. Let us assume that A1:A5 is named A, B1:B5 is named B, and C1:C5 is named C (for simplicity). I have tried:
={SUMPRODUCT(C,((A<=20)+(B<=20)))}
which gives me the rows where both columns match summed twice, and
={SUMPRODUCT(C,((A<=20)*(B<=20)))}
which gives me only the rows where both columns match
So far, I have settled for the solution of adding a column D with the lowest value of A and B, but it bugs me so much that I can't do it with formulas.
Any help would be highly appreciated, so thanks in advance. All I have found when googling is the "multiple criteria for same column" problem.
Thanks. That works. I found another one that works, after I figured out that Excel does not treat 1 + 1 as 1, as I learnt in discrete mathematics, but, as you say, counts both of the TRUEs. I tried this instead:
{=SUM(IF((A<=20)+(B<=20);C;0))}
But I like yours better.
Your problem of "summing twice" in this formula
={SUMPRODUCT(C,((A<=20)+(B<=20)))}
arises because addition turns the first TRUE plus the second TRUE into 2. It is not actually summing every row twice: for any row where only one condition is met, that row is counted only once.
The solution is to turn both the 1 and the 2 into a 1, using an IF:
={SUMPRODUCT(C,IF((A<=20)+(B<=20)>0,1,0))}
That way, each value in column C is counted at most once.
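The difference between adding the two conditions and a real OR can be checked on the sample data with a short Python sketch. The function names are hypothetical, and the lists stand in for the named ranges A, B and C.

```python
A = [4, 5, 100, 15, 21]
B = [20, 100, 21, 21, 24]
C = [7, 3, 4, 4, 8]


def sum_with_plus(a, b, c, limit=20):
    """(A<=20)+(B<=20) as the weight: rows meeting both tests count twice."""
    return sum(z * ((x <= limit) + (y <= limit)) for x, y, z in zip(a, b, c))


def sum_if_either(a, b, c, limit=20):
    """Weight capped at 1: each row counts at most once."""
    return sum(z for x, y, z in zip(a, b, c) if x <= limit or y <= limit)


print(sum_with_plus(A, B, C))   # 21 (row 1 is counted twice)
print(sum_if_either(A, B, C))   # 14
```

Rows 1, 2 and 4 meet at least one condition, so the wanted total is 7 + 3 + 4 = 14.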
Following this site you could build up your SUMPRODUCT() formula like this:
=SUMPRODUCT(C,SIGN((A<=20)+(B<=20)))
So, instead of a nested IF() you control your OR condition with the SIGN() function.
hth
If you plan to use a large set of data then it is best to use the array formula:
{=SUM(IF((A1:A5<=20)+(B1:B5<=20),C1:C5,0))}
Obviously adjust the range to suit the data set, however if the whole of each column is to form part of the formula then you can simply adjust to:
{=SUM(IF((A:A<=20)+(B:B<=20),C:C,0))}
This will perform the calculation on all rows of data within the A, B and C columns. With either example remember to press Ctrl + Shift + Enter in order to trigger the array formula (as opposed to typing the { and }).
I have a bunch of sequential rows and I'm trying to average the last 90 based on a criterion. The average needs to be in a single cell, rather than a column that calculates a running average. I figured out how to calculate an average for the last 90 rows, but I am not able to correctly add the IF condition so that the criterion is applied before averaging.
Data:
sale type (b) Data(c) Rownumber (E)
a 45 1
b 35 2
c 36 3
c 56 93
Here is the average function that's working correctly: =AVERAGE(OFFSET(E2,COUNTA(E:E)-1,-2,-90)).
Here is the AVERAGEIF function that I'm trying to run that is giving incorrect data:
=AVERAGEIF(B:B,I15,OFFSET(E2,COUNTA(E:E)-1,-2,-90))
I15 cell in this case, is the sale type that I am trying to match.
Thanks in advance for the help!
=AVERAGEIF(OFFSET(B2,COUNTA(E:E)-2,0,-90),$I$15,OFFSET(C2,COUNTA(E:E)-2,0,-90))
First: you need to subtract 2, not 1, from the row count; with -1 you're not getting the last 90. (Test it by changing a record at the edge of the range to 999. You'll see whether the formula picks up that value, because the average changes dramatically.)
Second: your range and average_range have to match, in height anyway. So use the same OFFSET construction in both.
To slightly optimize this, you could calculate COUNTA(E:E)-2 in another cell, name it, then reference it in both places (that way Excel only calculates it once, not twice). ;)
Also, if I15 is a single cell, you might want to write it as $I$15 to be safe. I don't think it matters in this case; it's just a habit of mine for single, isolated cells that aren't part of a range :)
If you actually have row numbers (of the data) in column E you could use AVERAGEIFS and just use another criteria based on column E, like this:
=AVERAGEIFS(C:C,B:B,I15,E:E,">"&MAX(E:E)-90)
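The intent of both formulas (take the last 90 rows, then average only those whose sale type matches) can be sketched in Python for checking results. The function name and the tiny sample lists are made up; None is returned where Excel would show #DIV/0!.

```python
def average_last_n_matching(types, data, wanted, n=90):
    """Average data over the last n rows, keeping only rows of one type."""
    recent = list(zip(types, data))[-n:]   # the last n rows, or fewer
    matches = [value for kind, value in recent if kind == wanted]
    return sum(matches) / len(matches) if matches else None


types = ["a", "b", "c", "c"]
data = [45, 35, 36, 56]
print(average_last_n_matching(types, data, "c"))        # 46.0
print(average_last_n_matching(types, data, "c", n=1))   # 56.0
```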
I have data like this in an excel sheet:
1 2 3
4 5 6
What I am trying to do is come up with a formula that will give me the total of each row, something like =A+B+C, so I can use the same formula on each row. I have plenty of rows; the above is just an example. I currently have =A1+B1+C1. Is there a way in Excel to get A1 + B1 + C1 without the row number?
Please help, this would save me lots of time.
I have looked into =ROW to get the row number, can I use the sum function with the row function?
=SUM(A=ROW():B=ROW())
I found a solution to my problem:
=SUM(INDIRECT("A"&ROW()):INDIRECT("C"&ROW()))
This will work
=SUM(INDIRECT("A"&ROW()):INDIRECT("C"&ROW()))