Offset function in excel to skip 0s and sum a dynamic range of rows - excel

I am trying to use the OFFSET function in Excel to dynamically calculate sales since each product's launch (3 months after, 6 months after, etc.). For example, for Item1 the first 3 months of sales are 2557+10000+14487=27044, and similarly for Item2 the first 3 months of sales are 2557+11853+14487=28897.
Any guidance on this is much appreciated!
Items    Item1   Item2
Jan-20       0       0
Feb-20       0       0
Mar-20       0       0
Apr-20       0       0
May-20       0    2557
Jun-20       0   11853
Jul-20       0   14487
Aug-20       0   11375
Sep-20       0   10938
Oct-20       0   10842
Nov-20       0   15132
Dec-20       0   19820
Jan-21    2557   20726
Feb-21   10000   25072
Mar-21   14487   28897
Apr-21   11375   28665
May-21   10938   42358
Jun-21   10842   25619
Jul-21   15132   20575
Aug-21   19820   23315
Sep-21   20726   21346
Oct-21   25072   19377

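Yes, avoid OFFSET (it is a volatile function). You could use INDEX/MATCH instead. With the match type omitted, MATCH(0,B:B) does an approximate match, so when the column is a run of 0s followed by positive sales it returns the position of the last 0, and the next three rows are the first three months of sales. For example: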
=SUM(INDEX(B:B,MATCH(0,B:B)+1):INDEX(B:B,MATCH(0,B:B)+3))
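
To sanity-check the expected totals outside Excel, the same "skip the leading 0s, then sum the next n months" logic can be sketched in Python. This is only an illustration: the function name `sales_since_launch` is hypothetical, and the two lists are shortened copies of the Item1 and Item2 columns from the question.

```python
def sales_since_launch(sales, months=3):
    """Sum the first `months` values, starting at the first non-zero entry."""
    # The launch month is taken to be the first month with non-zero sales.
    launch = next((i for i, value in enumerate(sales) if value != 0), None)
    if launch is None:
        return 0  # no sales at all in this range
    return sum(sales[launch:launch + months])


# Shortened copies of the question's columns (leading zeros, then sales).
item1 = [0] * 12 + [2557, 10000, 14487, 11375, 10938, 10842]
item2 = [0] * 4 + [2557, 11853, 14487, 11375, 10938, 10842]

print(sales_since_launch(item1))     # 27044
print(sales_since_launch(item2))     # 28897
print(sales_since_launch(item1, 6))  # 60199
```

Changing `months` gives the 6-month figure in the same way that changing +3 to +6 does in the formula.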

Related

Pandas: how to calculate average ignoring 0 within groups?

My data looks like this:
It is grouped by "name"
name                       star  atm  food  foodcp  drink  drinkcp  clean  cozy  service
___Backyard Jr. (__Xinyi)     4    4     4       4      4        0      4     0        0
___Backyard Jr. (__Xinyi)     3    0     3       0      3        0      0     0        3
___Backyard Jr. (__Xinyi)     4    0     0       0      4        0      0     0        0
___Backyard Jr. (__Xinyi)     3    0     0       0      0        0      0     3        3
I want to calculate the mean of every column except name, ignoring the 0 values, within each group. How can I do it?
I've tried
df.groupby('name',as_index=False).mean()
but it does include the 0s in the calculation.
Thank you for your help!!
You can first replace all the zeros with NaN:
import numpy as np
df = df.replace(0, np.nan)
NaN values are excluded from the mean, so the groupby above then averages only the non-zero entries.
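
Putting the replacement and the grouped mean together, a minimal sketch on made-up data might look like this. The group names "A" and "B" and the values are illustrative assumptions, not the asker's data; setting name as the index first is just one way to keep the group key out of the replacement.

```python
import numpy as np
import pandas as pd

# Illustrative data only: two groups with some zero ratings.
df = pd.DataFrame({
    "name": ["A", "A", "B", "B"],
    "star": [4, 3, 4, 3],
    "food": [4, 0, 0, 0],
    "cozy": [0, 0, 0, 3],
})

means = (
    df.set_index("name")     # keep the group key out of the replacement
      .replace(0, np.nan)    # zeros become NaN, which mean() skips
      .groupby(level="name")
      .mean()
)
print(means)
```

Note that a column whose values are all 0 within a group comes out as NaN rather than 0, because there is nothing left to average.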

Pandas in Python 3 - Return list of highest sum

I want to find the sum of each column in the dataframe below and return a list of the columns with the highest sum. I've tried the code below, but it only reports the max number. How do I update it to include the column label (or labels, if more than one column equals the max)?
grouped = df.sum()
mostPurchased = grouped.max()
print(grouped)
          snow suit  gloves  coat  boots
january           1       0     0      0
february          1       0     1      0
march             0       0     0      0
april             0       0     1      0
may               0       0     1      1
june              0       0     0      1
july              0       1     0      1
I want this to return:
Coat 3, Boots 3
Select the columns where the column sum equals the max column sum:
grouped = df.sum()
grouped[grouped == grouped.max()]
#coat 3
#boots 3
#dtype: int64
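
For completeness, here is a self-contained sketch that rebuilds the table from the question and formats the result as a single string. The variable names `totals`, `top` and `summary` are illustrative, and the labels come out in lowercase because that is how the columns are named in the data.

```python
import pandas as pd

# The purchase table from the question, one row per month.
df = pd.DataFrame(
    {
        "snow suit": [1, 1, 0, 0, 0, 0, 0],
        "gloves":    [0, 0, 0, 0, 0, 0, 1],
        "coat":      [0, 1, 0, 1, 1, 0, 0],
        "boots":     [0, 0, 0, 0, 1, 1, 1],
    },
    index=["january", "february", "march", "april", "may", "june", "july"],
)

totals = df.sum()                     # one total per column
top = totals[totals == totals.max()]  # keep every column tied for the max
summary = ", ".join(f"{label} {total}" for label, total in top.items())
print(summary)  # coat 3, boots 3
```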

Excel: Match and Index based on range

I am stumped with the following problem and not sure how to accomplish it in excel. Here is an example of the data:
A B
1 Date Stock_Return
2 Jan-95 -5.2%
3 Feb-95 2.1%
4 Mar-95 3.7%
5 Apr-95 6.9%
6 May-95 6.5%
7 Jun-95 -5.6%
8 Jul-95 6.6%
9 Aug-95 6.2%
What I would like is to have the dates returned which fall within a certain return range and sorted from low to high.
For example:
             1       2       3       4       5
Below -7%    0       0       0       0       0
-7% to -5%   Jun-95  Jan-95  0       0       0
-5% to -3%   0       0       0       0       0
-3% to 0%    0       0       0       0       0
0% to 3%     Feb-95  0       0       0       0
3% to 5%     Mar-95  0       0       0       0
5% to 7%     Aug-95  May-95  Jul-95  Apr-95  0
I thought Index and Match might make the most sense but when I drag across columns it doesn't work. Any help is very much appreciated.
You can use the AGGREGATE function:
=IFERROR(AGGREGATE(14,6,$A$2:$A$9/(($B$2:$B$9>$D2)*($B$2:$B$9<=$E2)),COLUMN(A1)),"0")
If you have Excel O365, you can use the FILTER function:
F2: =IFERROR(TRANSPOSE(FILTER($A$2:$A$9,(F2<=$B$2:$B$9)*(G2>=$B$2:$B$9))),"")
and fill down.
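
As a cross-check of the desired output, the range lookup can be sketched in plain Python. This is a hedged illustration, not a translation of either formula: the helper name `bucket_dates` is hypothetical, returns are written as plain percentages, and each range is assumed to exclude its lower bound and include its upper bound, as in the AGGREGATE formula above.

```python
# Returns from the question, in percent.
returns = {
    "Jan-95": -5.2, "Feb-95": 2.1, "Mar-95": 3.7, "Apr-95": 6.9,
    "May-95": 6.5, "Jun-95": -5.6, "Jul-95": 6.6, "Aug-95": 6.2,
}


def bucket_dates(returns, low, high):
    """Dates whose return lies in (low, high], ordered from lowest return to highest."""
    hits = [(value, date) for date, value in returns.items() if low < value <= high]
    return [date for value, date in sorted(hits)]


edges = [-7, -5, -3, 0, 3, 5, 7]
for low, high in zip(edges, edges[1:]):
    print(f"{low}% to {high}%:", bucket_dates(returns, low, high))
```

Sorting the (return, date) pairs before dropping the return is what gives the low-to-high ordering within each range.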

delete 0 and restack in excel

I have this issue in excel where I want to delete 0 and re-stack the rows.
Problem:
0 0 1 2 3
0 0 0 1 0
0 2 3 0 1
2 5 3 0 0
The desired result would be
1 2 3
1 0
2 3 0 1
2 5 3 0 0
Any suggestions?
This will create a range from the first non 0 to the end and then the outer INDEX will return them in order as it is dragged across.
=IFERROR(INDEX(INDEX($A1:$E1,AGGREGATE(15,7,COLUMN($A1:$E1)/($A1:$E1<>0),1)):$E1,,COLUMN(A:A)),"")
Just for the sake of giving alternatives:
Formula in A6 translates to:
=IFERROR(INDEX($A1:$E1,,MATCH(TRUE,INDEX($A1:$E1>0,0),0)+COLUMN()-1),"")
Dragged down and sideways.
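
The rule behind both formulas (drop the leading 0s of each row and keep everything from the first non-zero value onward) can be checked against the desired result with a small Python sketch. The helper name `drop_leading_zeros` is an assumption for illustration.

```python
from itertools import dropwhile


def drop_leading_zeros(row):
    """Return the row starting at its first non-zero value; later zeros are kept."""
    return list(dropwhile(lambda value: value == 0, row))


rows = [
    [0, 0, 1, 2, 3],
    [0, 0, 0, 1, 0],
    [0, 2, 3, 0, 1],
    [2, 5, 3, 0, 0],
]
for row in rows:
    print(drop_leading_zeros(row))
```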

Splitting a each column value into different columns [duplicate]

I have a survey response sheet which has questions which can have multiple answers, selected using a set of checkboxes.
When I get the data from the response sheet and import it into pandas I get this:
Timestamp Sports you like Age
0 23/11/2013 13:22:30 Football, Chess, Cycling 15
1 23/11/2013 13:22:34 Football 25
2 23/11/2013 13:22:39 Swimming,Football 22
3 23/11/2013 13:22:45 Chess, Soccer 27
4 23/11/2013 13:22:48 Soccer 30
There can be any number of sport values in the sports column (further rows have basketball, volleyball, etc.), and there are some other columns as well. I'd like to do statistics on the results of the question (how many people liked Football, etc.). The problem is that all of the answers are within one column, so grouping by that column and asking for counts doesn't work.
Is there a simple way within pandas to convert this sort of data frame into one where there are multiple columns called Sports-Football, Sports-Volleyball, Sports-Basketball, and each of those is boolean (1 for yes, 0 for no)? I can't think of a sensible way to do this.
What I need is a new dataframe that looks like this (along with the Age column):
Timestamp Sports-Football Sports-Chess Sports-Cycling ....
0 23/11/2013 13:22:30 1 1 1
1 23/11/2013 13:22:34 1 0 0
2 23/11/2013 13:22:39 1 0 0
3 23/11/2013 13:22:45 0 1 0
I got as far as this but can't proceed further:
df['Sports you like'].str.split(r',\s*')
This splits the values, but the first item may be any sport; I need a Football column that is 1 if the user likes Football and 0 otherwise.
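The problem is the separator ,\s*: str.get_dummies splits on a fixed string, not a regex. So the solution is to apply str.split and then str.join with '|' before str.get_dummies:
# split on commas with optional whitespace (raw string avoids an invalid-escape warning)
df1 = (df.pop('Sports you like').str.split(r',\s*')
         .str.join('|')
         .str.get_dummies()
         .add_prefix('Sports-'))
df = df.join(df1)
print (df)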
Timestamp Age Sports-Chess Sports-Cycling Sports-Football \
0 23/11/2013 13:22:30 15 1 1 1
1 23/11/2013 13:22:34 25 0 0 1
2 23/11/2013 13:22:39 22 0 0 1
3 23/11/2013 13:22:45 27 1 0 0
4 23/11/2013 13:22:48 30 0 0 0
Sports-Soccer Sports-Swimming
0 0 0
1 0 0
2 0 1
3 1 0
4 1 0
Or use MultiLabelBinarizer:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
s = df.pop('Sports you like').str.split(r',\s*')
# pass index=df.index so the later join still lines up if df has a non-default index
df1 = pd.DataFrame(mlb.fit_transform(s), columns=mlb.classes_, index=df.index).add_prefix('Sports-')
print (df1)
Sports-Chess Sports-Cycling Sports-Football Sports-Soccer \
0 1 1 1 0
1 0 0 1 0
2 0 0 1 0
3 1 0 0 1
4 0 0 0 1
Sports-Swimming
0 0
1 0
2 1
3 0
4 0
df = df.join(df1)
print (df)
Timestamp Age Sports-Chess Sports-Cycling Sports-Football \
0 23/11/2013 13:22:30 15 1 1 1
1 23/11/2013 13:22:34 25 0 0 1
2 23/11/2013 13:22:39 22 0 0 1
3 23/11/2013 13:22:45 27 1 0 0
4 23/11/2013 13:22:48 30 0 0 0
Sports-Soccer Sports-Swimming
0 0 0
1 0 0
2 0 1
3 1 0
4 1 0
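
Once the indicator columns exist, the statistic the question asks for (how many people liked each sport) is just a column sum. A minimal sketch, assuming the five sample rows from the question and the first approach above; the variable names `dummies` and `counts` are illustrative:

```python
import pandas as pd

# The five sample responses from the question.
df = pd.DataFrame({
    "Sports you like": [
        "Football, Chess, Cycling",
        "Football",
        "Swimming,Football",
        "Chess, Soccer",
        "Soccer",
    ],
    "Age": [15, 25, 22, 27, 30],
})

dummies = (
    df["Sports you like"]
    .str.split(r",\s*")   # comma followed by optional whitespace
    .str.join("|")        # get_dummies splits on '|' by default
    .str.get_dummies()
    .add_prefix("Sports-")
)

counts = dummies.sum().sort_values(ascending=False)
print(counts)
```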
