I work with customer consumption data. Sometimes a customer has no recorded consumption for a month or more,
so the first consumption recorded after such a gap needs to be broken down evenly across those months.
Example:
df = pd.DataFrame({'customerId':[1,1,1,1,1,1,1,2,2,2,2,2,2,2],
'month':['2021-10-01','2021-11-01','2021-12-01','2022-01-01','2022-02-01','2022-03-01','2022-04-01','2021-10-01','2021-11-01','2021-12-01','2022-01-01','2022-02-01','2022-03-01','2022-04-01'],
'consumption':[100,130,0,0,400,140,105,500,0,0,0,0,0,3300]})
bfill() just copies the same value backwards; what I need is the mean, i.e. value / (number of empty months + 1).
Desired values:
'c':[100,130,133,133,133,140,105,500,550,550,550,550,550,550]
You can try something like this:
import pandas as pd

df = pd.DataFrame({'customerId':[1,1,1,1,1,1,1,2,2,2,2,2,2,2],
'month':['2021-10-01','2021-11-01','2021-12-01','2022-01-01','2022-02-01','2022-03-01','2022-04-01','2021-10-01','2021-11-01','2021-12-01','2022-01-01','2022-02-01','2022-03-01','2022-04-01'],
'consumption':[100,130,0,0,400,140,105,500,0,0,0,0,0,3300]})
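# Flag non-zero rows, then cumsum in reverse so each run of zeros
# shares a label with the next non-zero row.
df['grp'] = df['consumption'].ne(0)[::-1].cumsum()
# Average within each (customer, run) group to spread the value across the gap.
df['c'] = df.groupby(['customerId', 'grp'])['consumption'].transform('mean')
df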
Output:
customerId month consumption grp c
0 1 2021-10-01 100 7 100.000000
1 1 2021-11-01 130 6 130.000000
2 1 2021-12-01 0 5 133.333333
3 1 2022-01-01 0 5 133.333333
4 1 2022-02-01 400 5 133.333333
5 1 2022-03-01 140 4 140.000000
6 1 2022-04-01 105 3 105.000000
7 2 2021-10-01 500 2 500.000000
8 2 2021-11-01 0 1 550.000000
9 2 2021-12-01 0 1 550.000000
10 2 2022-01-01 0 1 550.000000
11 2 2022-02-01 0 1 550.000000
12 2 2022-03-01 0 1 550.000000
13 2 2022-04-01 3300 1 550.000000
Details:
Create a group key by checking for non-zero values, then do a cumsum in reverse order
to group zeroes with the next non-zero value.
Group by customerId and that key, then transform with mean to distribute the non-zero
value across the zeroes.
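The two steps above can be wrapped in a small helper. This is only a minimal sketch, assuming the column names from the question; the function name spread_over_gaps and the shortened sample data are made up for illustration.

```python
import pandas as pd

def spread_over_gaps(df, key="customerId", value="consumption"):
    """Spread each non-zero value evenly over the run of zeros before it."""
    # Reverse cumsum of the non-zero flags: every run of zeros gets the
    # same label as the next non-zero row. assign() aligns on the index.
    grp = df[value].ne(0)[::-1].cumsum()
    # Mean per (customer, run) = value / (number of zeros + 1).
    return df.assign(grp=grp).groupby([key, "grp"])[value].transform("mean")

# Hypothetical, shortened sample data
df = pd.DataFrame({
    "customerId": [1, 1, 1, 1, 1, 2, 2, 2],
    "consumption": [100, 0, 0, 400, 140, 500, 0, 3300],
})
df["c"] = spread_over_gaps(df)
print(df)
```

Zeros at the end of a customer's history, with no later non-zero month, form a group of their own and so stay at 0 with this approach.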
My data frame looks like this
Location week Number
Austria 1 154
Austria 2 140
Belgium 1 139
Bulgaria 2 110
Bulgaria 1 164
the solution should look like this
Location week Number
Austria 3 100
Austria 2 101
Austria 1 102
Bulgaria 2 100
Bulgaria 3 101
Bulgaria 1 102
This means that I need to display:
Column 1: the countries, grouped by name
Column 2: the week (every country has 53 weeks assigned to it)
Column 3: the numbers that occurred in each of the 53 weeks, in ascending order
I cannot get my head around this.
Sort the rows in the order you like (here by Location and Number) and take the first 5 rows per group with groupby + head:
df.sort_values(by=['Location', 'Number']).groupby('Location').head(5)
output:
Location week Number
0 Austria 3 100
1 Austria 2 101
2 Austria 1 102
3 Bulgaria 2 100
4 Bulgaria 3 101
5 Bulgaria 1 102
Another way, using .cumcount() and .loc:
con = df.sort_values('Number',ascending=True).groupby('Location').cumcount()
df.loc[con.lt(5)]
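Both approaches can be compared side by side. The following is a rough sketch using the five rows from the question's first table; the variable names and the choice of n = 1 (instead of 5, so that the filter has a visible effect on such a small frame) are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Location": ["Austria", "Austria", "Belgium", "Bulgaria", "Bulgaria"],
    "week": [1, 2, 1, 2, 1],
    "Number": [154, 140, 139, 110, 164],
})
n = 1  # rows to keep per Location (the answers above use 5)

# Approach 1: sort, then keep the first n rows of each group.
by_head = df.sort_values(by=["Location", "Number"]).groupby("Location").head(n)

# Approach 2: number the rows within each group in sorted order,
# then keep the rows whose position is below n.
pos = df.sort_values("Number").groupby("Location").cumcount()
by_cumcount = df.loc[pos.lt(n)]

print(by_head)
print(by_cumcount)
```

Both select the same rows. The first returns them sorted by Location and Number, while the second keeps the original row order of df.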
I have 2 data sets. First, a master table that displays and sums all of the information from the reference tables. The master table looks like this.
BayNum NumCompleted
102
103
104
105
The reference table is a running timeline with indicator variables for whether or not something was completed at various time intervals.
BayNum 1030 1100 1130 1200 1230
102 1 0 1 0 0
102 0 0 1 0 1
102 1 0 0 1 0
102 0 0 0 0 1
103 0 1 1 1 0
103 1 0 0 0 1
103 1 0 1 1 1
104 1 0 0 0 1
104 0 0 1 0 1
104 1 0 0 1 0
104 1 0 0 0 1
104 1 0 0 0 1
105 1 0 1 0 0
105 0 1 1 1 0
105 0 0 0 0 1
I would like the NumCompleted column in the master table to sum all of the records that have the same bay number.
I think there is some sort of SUMPRODUCT way to go about this, but I don't understand arrays very well, so I am having trouble visualizing how this works.
I tried this formula
=SUMPRODUCT(INDEX(TPH!H2:NC166,MATCH('Post Observations'!$G$2,TPH!$F$2:$F$166,0)))
But this returns a reference error, I think because INDEX can only work through a single column rather than a full array. Would I instead have to do something with INDEX and SMALL so that it runs through the full list? I've done something like that before, but I don't know if it would apply here.
Per the example above, I would expect my master table to look like this.
BayNum NumCompleted
102 7
103 9
104 10
105 6
You can use SUMPRODUCT to multiply each cell in the range, by whether the "BayNum" matches (1 if it does or 0 if not), then sum all the results:
=SUMPRODUCT(($B$2:$F$8)*($A$2:$A$8=$H2))
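For readers who find the array logic hard to picture, the same multiply-by-a-mask idea can be sketched in Python with NumPy. This is an illustration of what the formula computes, not Excel code; the array names and the helper num_completed are made up, and the data is the reference table from the question.

```python
import numpy as np

bay = np.array([102, 102, 102, 102, 103, 103, 103,
                104, 104, 104, 104, 104, 105, 105, 105])
flags = np.array([
    [1, 0, 1, 0, 0], [0, 0, 1, 0, 1], [1, 0, 0, 1, 0], [0, 0, 0, 0, 1],
    [0, 1, 1, 1, 0], [1, 0, 0, 0, 1], [1, 0, 1, 1, 1],
    [1, 0, 0, 0, 1], [0, 0, 1, 0, 1], [1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1], [1, 0, 0, 0, 1],
    [1, 0, 1, 0, 0], [0, 1, 1, 1, 0], [0, 0, 0, 0, 1],
])

def num_completed(target):
    # 1 for rows whose BayNum matches, 0 otherwise
    match = (bay == target).astype(int)
    # Multiply every cell by its row's match flag, then sum everything
    return int((flags * match[:, None]).sum())

totals = {b: num_completed(b) for b in (102, 103, 104, 105)}
print(totals)  # {102: 7, 103: 9, 104: 10, 105: 6}
```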
I'm trying to create a dynamic rolling 12 month cash flow in Excel.
Let's say the month name is in cell A1.
Underneath cell A1 I have a list of cash flow expenses in my rows, with the expenses listed in the columns by month. I have a separate column at the end that totals up 12 months of expenses based on the month name (in cell A1).
So, if cell A1 says Jun-18, I want to add up the expenses for each row item from Jun-18 to May-19. Or, if cell A1 says Sep-18, I want to add up the expenses for each row item from Sep-18 to Aug-19.
I don't know how to do this. Can anyone please advise?
Thanks for your help,
M
You can use SUM with OFFSET and MATCH.
Given the example data below (laid out as I think you have described in your question), you can use the following formula. In this example the formula result is shown in row 1, next to the start date in A1 that the calculation runs from.
=SUM(OFFSET(A2,MATCH(A1,A2:A100)-1,4,12,1))
OFFSET takes a reference, a row offset, a column offset, a height, and a width. I do not have time to explain it in full, so it is worth reading up on, but to use this formula in your own worksheet you will need to change the number 4, which is how many columns away the monthly totals (column E here) are from the matched date.
The 12 in the formula is how many rows down you want to sum.
A B C D E
1 Jul-18 8638.21
2 Expense1 Expense2 Expense3 Total
3 Jan-18 1 1 1 3
4 Feb-18 2 2 2 6
5 Mar-18 3541 531 51 4123
6 May-18 100000 31 351 100382
7 Jun-18 846 8 321 1175
8 Jul-18 1 153 12 166
9 Aug-18 0 8 21 29
10 Sep-18 0 65 8 73
11 Oct-18 54 321 1 376
12 Nov-18 321 123 1 445
13 Dec-18 1 321 2 324
14 Jan-19 546 0 51 597
15 Feb-19 132 51 15 198
16 Mar-19 12 321 51 384
17 Apr-19 51 123 321 495
18 May-19 5161 3.21 351 5515.21
19 Jun-19 21 3 12 36
20 Jul-19 321 1 1351 1673
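As a cross-check of what the formula computes on the table above, here is a small Python sketch: find the row of the start month, then sum the next 12 totals. The list names and the function rolling_total are made up for illustration, and the lookup is an exact match, whereas the MATCH call in the formula omits the match type.

```python
# Month labels (column A) and totals (column E) from the example table
months = ["Jan-18", "Feb-18", "Mar-18", "May-18", "Jun-18", "Jul-18",
          "Aug-18", "Sep-18", "Oct-18", "Nov-18", "Dec-18", "Jan-19",
          "Feb-19", "Mar-19", "Apr-19", "May-19", "Jun-19", "Jul-19"]
totals = [3, 6, 4123, 100382, 1175, 166,
          29, 73, 376, 445, 324, 597,
          198, 384, 495, 5515.21, 36, 1673]

def rolling_total(start_month, window=12):
    # Position of the start month (exact match; an assumption of this sketch)
    start = months.index(start_month)
    # Sum `window` rows starting at that position, like the OFFSET range
    return sum(totals[start:start + window])

print(round(rolling_total("Jul-18"), 2))  # 8638.21
```

The result agrees with the 8638.21 shown in row 1 of the example.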
I have a list of products ranked by percentile. I want to be able to retrieve the first value less than a specific percentile.
Product Orders Percentile Current Value Should Be
Apples 192 100.00% 29 29
Apples 185 97.62% 29 29
Apples 125 95.24% 29 29
Apples 122 92.86% 29 29
Apples 120 90.48% 29 29
Apples 90 88.10% 29 29
Apples 30 85.71% 29 29
Apples 29 83.33% 29 29
Apples 27 80.95% 29 29
Apples 25 78.57% 29 29
Apples 25 78.57% 29 29
Apples 25 78.57% 29 29
Oranges 2 100.00% 0 1
Oranges 2 100.00% 0 1
Oranges 1 60.00% 0 1
Oranges 1 60.00% 0 1
Lemons 11 100.00% 0 2
Lemons 10 88.89% 0 2
Lemons 2 77.78% 0 2
Lemons 2 77.78% 0 2
Lemons 1 55.56% 0 2
Currently my formula in the "Current Value" column is: =SUMIFS([Orders],[Product],[#[Product]],[Percentile],INDEX([Percentile],MATCH(FALSE,[Percentile]>$O$1,0))) (entered as an array formula)
$O$1 contains the percentile that I am matching (85.00%).
The current value for "Apples" (29) is correct, but as you can see, my formula returns 0 for the remaining products instead of the values in "Should Be". I'm not sure how to set this up to do what I need. I tried several things with SUMPRODUCT but couldn't get that to work either. Could someone with more experience give me a hand with this?
You don't need the SUMIFS(), just the INDEX/MATCH:
=INDEX([Orders],MATCH(1,([Percentile]<$O$1)*([Product]=[#Product]),0))
This is an array formula and must be confirmed with Ctrl-Shift-Enter when exiting edit mode. If done properly, Excel will put {} around the formula.
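The lookup the formula performs (the first row per product whose percentile is below the threshold) can also be sketched in pandas. This is only an illustration on a subset of the question's rows, with percentages written as fractions; the variable names are assumptions.

```python
import pandas as pd

# Subset of the question's table, in the same top-to-bottom order
df = pd.DataFrame({
    "Product": ["Apples"] * 4 + ["Oranges"] * 4 + ["Lemons"] * 5,
    "Orders": [30, 29, 27, 25, 2, 2, 1, 1, 11, 10, 2, 2, 1],
    "Percentile": [0.8571, 0.8333, 0.8095, 0.7857,
                   1.0, 1.0, 0.60, 0.60,
                   1.0, 0.8889, 0.7778, 0.7778, 0.5556],
})
threshold = 0.85  # plays the role of $O$1

# Keep rows below the threshold, then take the first one per product
below = df[df["Percentile"] < threshold]
first_below = below.groupby("Product", sort=False)["Orders"].first()

# Broadcast the result back to every row, like the "Should Be" column
df["ShouldBe"] = df["Product"].map(first_below)
print(first_below.to_dict())  # {'Apples': 29, 'Oranges': 1, 'Lemons': 2}
```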