Flag rows into groups of one thousand using helper column - excel

I'm trying to group data into batches of 1000 with the aid of a helper column, so that I don't have to keep typing out specific ranges to select them.
I came up with a formula for a helper column, but it is an imperfect solution:
=IFS(ROW()<=1001,1,ROW()<=2001,2,ROW()<=3001,3,ROW()<=4001,4,ROW()<=5001,5)
Is there not a better way of writing something to do the same job but that is infinitely scalable?

If you have a sequence 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, ..., as ROW() gives you, but need 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, ... (ten times 1, then ten times 2, then ten times 3, and so on), then subtract 1, divide by 10, take the integer part, and add 1.
=INT((ROW()-1)/10)+1
Entered in A1 and filled down, this gives ten times 1, then ten times 2, then ten times 3, and so on.
To start at a different row, subtract that row number instead of 1:
=INT((ROW()-2)/10)+1
will start the counting at row 2.
Now change the 10 to 1000, as you need a thousand times 1, then a thousand times 2, then a thousand times 3, and so on.
=INT((ROW()-2)/1000)+1
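The same arithmetic can be checked outside Excel. Below is a minimal Python sketch, assuming 1-based row numbers as ROW() returns them; the function name `batch_number` and its parameters are made up for illustration and are not part of the answer above.

```python
def batch_number(row, first_row=2, batch_size=1000):
    """Mirror =INT((ROW()-first_row)/batch_size)+1 for a 1-based row number."""
    # Floor division plays the role of INT() here.
    return (row - first_row) // batch_size + 1


print(batch_number(2))     # first data row
print(batch_number(1001))  # last row of the first batch
print(batch_number(1002))  # first row of the second batch
```

Rows 2 to 1001 land in batch 1 and row 1002 opens batch 2, which is the behaviour of the last formula when it is filled down from row 2.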

If you have Office 365, you could enter a formula like:
=INT((SEQUENCE(COUNTA($A:$A))-1)/1000)+1
and it will spill down.
The COUNTA(... is merely to limit how far down the helper column is populated. If you want to populate the entire column, replace COUNTA with ROWS and enter the formula in Row 1.
But there may be simpler methods of solving your actual problem.

Related

Is there an EXCEL formula that detects WHEN/AT WHICH POINT a certain value appears in a row of data?

Would really appreciate some help with my Excel query.
If I have the following rows of values (always 7 values per row) of data in Excel (3 examples below) where data is coded as 1 or 2, does anyone know an Excel formula which can; detect WHEN in the row, the FIRST 1 appears?
For example;
2, 1, 2, 2, 2, 1, 1. (1 first appears at point 2)
2, 2, 2, 1, 2, 1, 1. (1 first appears at point 4)
2, 2, 1, 2, 2, 2, 2. (1 first appears at point 3)
Any help is appreciated!
Use the MATCH function:
=MATCH(1,A1:G1,0)
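For comparison, here is a small Python sketch of what an exact-match MATCH does on one row. The helper name `first_position` is hypothetical, and the rows are the three examples from the question.

```python
def first_position(values, target=1):
    """Return the 1-based position of the first target, like =MATCH(target, range, 0)."""
    # list.index raises ValueError when the target is absent; MATCH would show #N/A.
    return values.index(target) + 1


rows = [
    [2, 1, 2, 2, 2, 1, 1],
    [2, 2, 2, 1, 2, 1, 1],
    [2, 2, 1, 2, 2, 2, 2],
]
for row in rows:
    print(first_position(row))
```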

Nuanced Excel Question; calculating proportions

Fellow overflowers, all help is appreciated;
I have the following rows of values (always 7 values per row) of data in Excel (3 examples below), where data is coded as 1 or 2. I am interested in the 1's.
2, 2, 1, 2, 2, 1, 1.
1, 2, 2, 2, 2, 1, 2.
2, 2, 2, 1, 1, 1, 2.
I use =MATCH(1,A1:G1,0) to tell me WHEN the first 1 appears, BUT now I want to calculate the proportion that 1's make up of the remaining values in the row.
For example;
2, 2, 1, 2, 2, 1, 1. (1 first appears at point 3, but then 1's make up 2 out of 4 remaining points; 50%).
1, 2, 2, 2, 2, 1, 2. (1 first appears at point 1, but then 1's make up 1 out of the 6 remaining points; 16%).
2, 2, 2, 1, 1, 1, 2. (1 first appears at point 4, but then 1's make up 2 out of the 3 remaining points; 66%).
Please help me calculate this proportion!
You could use this one
=(LEN(SUBSTITUTE(SUBSTITUTE(MID(A1,SEARCH(1,A1)+3,1000)," ",""),",",""))
-LEN(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(MID(A1,SEARCH(1,A1)+3,1000)," ",""),",",""),1,""))
)/LEN(SUBSTITUTE(SUBSTITUTE(MID(A1,SEARCH(1,A1)+3,1000)," ",""),",",""))
The
SUBSTITUTE(SUBSTITUTE(MID(A1,SEARCH(1,A1)+3,1000)," ",""),",","")
-part gets the string after the first 1. The single 1 in the middle part is the character you want to calculate the percentage for. So if you want to adapt the formula to other characters, you have to change the single 1 in the middle part and the three 1s in the three searches.
EDIT thank you for the hint #foxfire
A solution for values in columns would be
=COUNTIF(INDEX(A1:G1,1,MATCH(1,A1:G1,0)+1):G1,1)/(COUNT(A1:G1)-MATCH(1,A1:G1,0))
You can do it with SUMPRODUCT:
My formula in column H is a MATCH like yours:
=MATCH(1;A3:G3;0)
My formula for calculating the % of 1's among the remaining numbers after the first 1 is:
=SUMPRODUCT((A3:G3=1)*(COLUMN(A3:G3)>H3))/(7-H3)
This is how it works:
1. (A3:G3=1) returns an array of 1s and 0s marking whether each cell equals 1. For row 3 that is {0;0;1;0;0;1;1}.
2. COLUMN(A3:G3)>H3 returns an array of 1s and 0s marking whether each cell's column number is higher than the position of the first 1 (the column number matches the position inside the array because the data starts in column A). For row 3 that is {0;0;0;1;1;1;1}.
3. We multiply both arrays. For row 3 that is {0;0;1;0;0;1;1} * {0;0;0;1;1;1;1} = {0;0;0;0;0;1;1}.
4. SUMPRODUCT sums the array of 1s and 0s from the previous step. For row 3 we obtain 2, meaning there are 2 cells with value 1 after the first 1.
5. (7-H3) returns how many cells come after the first 1. For row 3 there are 4 cells after the first 1.
6. We divide the value from step 4 by the value from step 5, and that's the % you want. For row 3 it is 2/4=0,50, that is, 50%.
Update: I used two columns in case you need to show where the first 1 is. If you want a single column with the %, the formula would be:
=SUMPRODUCT((A3:G3=1)*(COLUMN(A3:G3)>MATCH(1;A3:G3;0)))/(7-MATCH(1;A3:G3;0))
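The mask-and-multiply idea behind the SUMPRODUCT formula can be sketched in Python as follows. This is only an illustration: the function name `proportion_after_first` is an assumption, positions are counted from 1 as in the worksheet, and the row length is taken from the data instead of being hard-coded as 7.

```python
def proportion_after_first(values, target=1):
    """Share of target values among the cells after the first target."""
    first = values.index(target) + 1                  # like MATCH(1, A3:G3, 0)
    is_target = [int(v == target) for v in values]    # like (A3:G3=1)
    after_first = [int(pos > first)                   # like COLUMN(A3:G3)>H3
                   for pos in range(1, len(values) + 1)]
    hits = sum(a * b for a, b in zip(is_target, after_first))  # like SUMPRODUCT
    remaining = len(values) - first                   # like (7-H3)
    # remaining is 0 when the first target is the last cell; Excel shows #DIV/0! there.
    return hits / remaining


print(proportion_after_first([2, 2, 1, 2, 2, 1, 1]))
print(proportion_after_first([1, 2, 2, 2, 2, 1, 2]))
print(proportion_after_first([2, 2, 2, 1, 1, 1, 2]))
```

The three rows from the question come out as 2/4, 1/6 and 2/3.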

Calculate the percent change between every rolling nth row in a Pandas DataFrame

How can I calculate the percentage change between every rolling nth row in a Pandas DataFrame? Using every 2nd row as an example:
Given the following Dataframe:
import pandas as pd

df = pd.DataFrame({"A": [14, 4, 5, 4, 1, 55],
                   "B": [5, 2, 54, 3, 2, 32],
                   "C": [20, 20, 7, 21, 8, 5],
                   "D": [14, 3, 6, 2, 6, 4]})
I would like the resulting DataFrame to be:
But the closest I am getting is with this code:
df.iloc[::2,:].pct_change(-1)
Which results in this:
It is performing the calculation for every other row, but this is not the same as a rolling window that calculates against every nth row. I came across a similar Stack post but that example is not very straightforward.
Also, as a bonus, I'd like to display the resulting output as a percentage to two decimal places.
Thank you for your time!
Got it! Use the option "periods" for 'pct_change()'.
df.pct_change(periods=-n)  # where n=2 for the given example
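Put together with the question's data, this could look like the sketch below. The formatting step for the "bonus" request is an assumption on my part (the answer does not cover it): it turns each number into a string with two decimals and leaves the trailing rows, which have no row n steps ahead to compare against, blank.

```python
import pandas as pd

df = pd.DataFrame({"A": [14, 4, 5, 4, 1, 55],
                   "B": [5, 2, 54, 3, 2, 32],
                   "C": [20, 20, 7, 21, 8, 5],
                   "D": [14, 3, 6, 2, 6, 4]})

n = 2
# Each row is compared with the row n positions below it.
change = df.pct_change(periods=-n)

# Display-only copy: percentages with two decimals, empty string for NaN.
formatted = change.apply(
    lambda col: col.map(lambda v: "" if pd.isna(v) else f"{v:.2%}")
)
print(formatted)
```

Keeping `change` numeric and formatting a separate copy means the values can still be used in further calculations.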

Sum values based on first occurrence of other column using excel formula

Let's say I have the following two columns in excel spreadsheet
A B
1 10
1 10
1 10
2 20
3 5
3 5
and I would like to sum the values from B-column that represents the first occurrence of the value in A-column using a formula. So I expect to get the following result:
result = B1+B4+B5 = 35
i.e., sum column B where any unique value exists in the same row but Column A. In my case if Ai = Aj, then Bi=Bj, where i,j represents the row positions. It means that if two rows from A-column have the same value, then its corresponding values from B-column are the same. I can have the value sorted by column A values, but I prefer to have a formula that works regardless of sorting.
I found this post that refers to the same problem, but the proposed solution I am not able to understand.
Use SUMPRODUCT and COUNTIF:
=SUMPRODUCT(B1:B6/COUNTIF(A1:A6,A1:A6))
Here is the step-by-step explanation:
COUNTIF(A1:A6, A1:A6) produces an array with the frequency of each value in A1:A6. In our case it is {3, 3, 3, 1, 2, 2}.
Then we do the division {10, 10, 10, 20, 5, 5}/{3, 3, 3, 1, 2, 2}. The result is {3.33, 3.33, 3.33, 20, 2.5, 2.5}: each value is divided by the size of its group, so every group's entries add up to a single copy of its value.
Summing the result we get (3.33+3.33+3.33) + 20 + (2.5+2.5) = 10 + 20 + 5 = 35.
Using the above trick we can just get the same result as if we just sum the first element of each group from the column A.
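A quick way to convince yourself of the equivalence is to compute both sums side by side. The Python sketch below uses the question's data; the variable names are illustrative, and it relies on the question's guarantee that equal values in column A always carry equal values in column B.

```python
from collections import Counter

a = [1, 1, 1, 2, 3, 3]
b = [10, 10, 10, 20, 5, 5]

# COUNTIF(A1:A6, A1:A6): how often each key occurs.
counts = Counter(a)

# SUMPRODUCT(B1:B6 / COUNTIF(...)): each value divided by its group size.
weighted = sum(value / counts[key] for key, value in zip(a, b))

# Direct approach: add B only the first time each key in A is seen.
seen = set()
first_total = 0
for key, value in zip(a, b):
    if key not in seen:
        seen.add(key)
        first_total += value

print(round(weighted, 6), first_total)
```

Neither computation depends on the rows being sorted by column A.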
To make this dynamic, so it grows and shrinks with the data set use this:
=SUMPRODUCT($B$1:INDEX(B:B,MATCH(1E+99,B:B))/COUNTIF($A$1:INDEX(A:A,MATCH(1E+99,B:B)),$A$1:INDEX(A:A,MATCH(1E+99,B:B))))
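... or, if the data is sorted by column A and sits in rows 2 to 7 below a header row, just SUMPRODUCT, comparing each row with the one above it.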
=SUMPRODUCT(B2:B7, --(A2:A7<>A1:A6))

Dynamic Programming: Finding the number of ways in which a order-dependant sum of numbers is less than or equal to a number

Given a number N, and a set S of numbers, find the number of ways in which a order-dependant sum of numbers of S is less than or equal to N. The numbers in S can occur more than once. For example, when N = 3 and S={1, 2}, the answer is 6. In this example, 1, 1+1, 2, 1+1+1, 1+2, 2+1 are less than or equal to 3.
When S = {1, 2}, the answers for N = 0, 1, 2... are 0, 1, 3, 6, 11, 19, 32.... Think about why these numbers might be the same as the Fibonacci sequence with 2 subtracted.
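When S={n1, n2, …, nk} with nk the largest element, let f(N) be the number of ordered sums equal to exactly N, with f(0)=1 and f(i)=0 for i<0. Then f(N)=f(N-n1)+f(N-n2)+…+f(N-nk), and the count you are asked for is f(1)+f(2)+…+f(N). So you just have to compute f(i) for i < nk, and then you can easily compute f(n) with the formula (f(n),f(n+1),…,f(n+nk-1))=(f(0),f(1),…,f(nk-1))*A^n, where A is the nk×nk companion matrix of the recurrence.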
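For moderate N, the recurrence can be evaluated directly with a table instead of matrix powers. The sketch below assumes that f counts the ordered sums equal to exactly a given total (with f(0)=1 for the empty sum) and that all elements of S are positive integers; the answer to the original question is then the running total over 1..N. The function name `count_ordered_sums` is made up for illustration.

```python
def count_ordered_sums(n, parts):
    """Count ordered sums of elements of `parts` (repeats allowed) that are <= n."""
    exact = [0] * (n + 1)   # exact[t] = number of ordered sums equal to exactly t
    exact[0] = 1            # the empty sum, used only as the base case
    for total in range(1, n + 1):
        # The last term of the sum can be any p in parts that still fits.
        exact[total] = sum(exact[total - p] for p in parts if p <= total)
    # Exclude the empty sum, so totals 1..n only.
    return sum(exact[1:])


print(count_ordered_sums(3, [1, 2]))
print([count_ordered_sums(n, [1, 2]) for n in range(7)])
```

This runs in O(N·k) time for k elements in S; the matrix-power formulation only pays off when N is very large.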
