I have a dataframe like below in which I need to replace the 0s with the mean of the rows where the parent_key matches the self_key.
Input DataFrame: df= pd.DataFrame ({'self_key':['a','b','c','d','e','e','e','f','f','f'],'parent_key':[np.nan,'a','b','b','c','c','c','d','d','d'], 'value':[0,0,0,0,4,6,14,12,8,22],'level':[1,2,3,3,4,4,4,4,4,4]})
The row 3 has self_key of 'd' so I would need to replace its 0 value in column 'value' with the mean of rows 7,8,9 to fill with the correct value of 14. Since the lower levels feed into the higher levels I would need to do it from lowest level to highest to fill out the dataframe as well but when I do the below code it doesn't work and I get the error "ValueError: Grouper for '<class 'pandas.core.frame.DataFrame'>' not 1-dimensional". How can I fill in the 0s with the means from lowest level to highest?
df['value']=np.where((df['value']==0) & (df['level']==3), df['value'].groupby(df.where(df['parent_key']==df['self_key'])).transform('mean'), df['value'])
Input
self_key parent_key value level
0 a NaN 0 1
1 b a 0 2
2 c b 0 3
3 d b 0 3
4 e c 4 4
5 e c 6 4
6 e c 14 4
7 f d 12 4
8 f d 8 4
9 f d 22 4
My approach is to repeat the above code 3 times and change the level from 3 to 2 to 1, but its not working for even level 3.
Expected Ouput:
self_key parent_key value level
0 a NaN 11 1
1 b a 11 2
2 c b 8 3
3 d b 14 3
4 e c 4 4
5 e c 6 4
6 e c 14 4
7 f d 12 4
8 f d 8 4
9 f d 22 4
If I understand your problem correctly, you are trying to compute mean in a bottom-up fashion by filtering dataframe on certain keys. If so, then following should solve it:
for l in range(df["level"].max()-1, 0, -1):
df_sub = df[(df["level"] == l) & (df["value"] == 0)]
self_keys = df_sub["self_key"].tolist()
for k in self_keys:
df.loc[df_sub[df_sub["self_key"] == k].index, "value"] = df[df["parent_key"] == k]["value"].mean()
[Out]:
self_key parent_key value level
0 a 11 1
1 b a 11 2
2 c b 8 3
3 d b 14 3
4 e c 4 4
5 e c 6 4
6 e c 14 4
7 f d 12 4
8 f d 8 4
9 f d 22 4
I have three columns, A, B and D
Column A and B have integer numbers that Im trying to map to column D which looks something like this:
A
B
C
D
1
3
7
2
4
9
I want to populate column C with column D's data, but I need to look like bellow:
A
B
C
D
1
3
7
7
2
4
7
9
3
3
7
4
4
7
5
3
7
1
4
9
2
3
9
3
4
9
4
3
9
5
4
9
I need to map column D and duplicate those numbers down to Column A and have Column C change every time Column A repeats to the first number, which in this case is 1
I am working on graph and in need data in below format. I have data in COL A. I need to calculate COL B values as in below picture.
What is the formula for obtaining this in excel?
You can do with cumsum and shift:
# sample data
df = pd.DataFrame({'COL A': np.arange(11)})
df['COL B'] = df['COL A'].shift(fill_value=0).cumsum()
Output:
COL A COL B
0 0 0
1 1 0
2 2 1
3 3 3
4 4 6
5 5 10
6 6 15
7 7 21
8 8 28
9 9 36
10 10 45
Use simple MS technique.
You can use the formula (A3*A2)/2 for COL2
I am trying to use the IF and AND function in excel for values in two different cells. I have 25 conditions.
Below is the formula I've created but it keeps on saying there's an error.
IF(AND(A10=“A”,B10=1),11,IF(AND(A=“A”,B10=2),16,IF(AND(A10=“A”,B10=3),20,IF(AND(A10=“A”,B10=4),23,IF(AND(A10=“A”,B10=5),25,IF(AND(A10=“B”,B10=1),7,IF(AND(A10=“B”,B10=2),12,IF(AND(A10=“B”,B10=3),17,IF(AND(A10=“B”,B10=4),21,IF(AND(A10=“B”,B10=5),24,IF(AND(A10=“C”,B10=1),4,IF(AND(A10=“C”,B10=2),8,IF(AND(A10=“C”,B10=3),13,IF(AND(A10=“C”,B10=4),18,IF(AND(A10=“C”,B10=5),22,IF(AND(A10=“D”,B10=1),2,IF(AND(A10=“D”,B10=2),5,IF(AND(A10=“D”,B10=3),9,IF(AND(A10=“D”,B10=4),14,IF(AND(A10=“D”,B10=5),19,IF(AND(A10=“E”,B10=1),1,IF(AND(A10=“E”,B10=2),3,IF(AND(A10=“E”,B10=3),6,IF(AND(A10=“E”,B10=4),10,15))))))))))))))))))))))))))))))))))))))))))))))))
I expected the output to be, for example; if cell1 is "A" and cell2 is 1 the result should be 11.
I would highly advise a lookup table. Simply have all of your options listed out with their desired results and find them with a criteria search, such as the use of sumifs function.
For example, if you paste J1:L25 your possibilities:
A 1 11
A 2 16
A 3 20
A 4 23
A 5 25
B 1 7
B 2 12
B 3 17
B 4 21
B 5 24
C 1 4
C 2 8
C 3 13
C 4 18
C 5 22
D 1 2
D 2 5
D 3 9
D 4 14
D 5 19
E 1 1
E 2 3
E 3 6
E 4 10
E 5 15
You can then place the formula =SUMIFS($L$1:$L$25,$J$1:$J$25,$A$10,$K$1:$K$25,$B$10) to return your desired value.
That is, =SUMIFS(range_of_results, criteria_range_of_A-E, A10, criteria_range_of_1-5, B10)
need help.. i trying to compare 2 columns and copy data in other columns..
Columns:
A B C D
1 3 10
2 4 20
3 1 30
4 2 40
5 0 50
i want to compare column A to B to find its duplicate and copy data from column C if column A has a duplicate at column B...
Result must be:
A B C D
1 3 10 0
2 4 20 40
3 6 30 10
4 2 40 20
5 0 50 0
thanks in advance...
An answer as I understand the question (assuming the change in col B is just a typo):
Input
A B C D
1 3 10
2 4 20
3 6 30
4 2 40
5 0 50
Output
A B C D
1 3 10 0
2 4 20 40
3 6 30 10
4 2 40 20
5 0 50 0
Formula in D2 (filled down): =IF(COUNTIF(B$2:B$6, $A2)>0, VLOOKUP($A2,$B$2:$C$6, 2, FALSE), 0).
COUNTIF(B$2:B$6, $A2) returns the number of times the value in A2 appears in the array B2:B6. If this value is greater than 0 (meaning that A2 is in B2:B6), the IF() function looks looks up A2 in col B and returns the value in the 2nd row (col C); if A2 is not in B2:B6, the formula returns 0.