How can I add previous column values to to get new value in Excel? - excel

I am working on graph and in need data in below format. I have data in COL A. I need to calculate COL B values as in below picture.
What is the formula for obtaining this in excel?

You can do with cumsum and shift:
# sample data
df = pd.DataFrame({'COL A': np.arange(11)})
df['COL B'] = df['COL A'].shift(fill_value=0).cumsum()
Output:
COL A COL B
0 0 0
1 1 0
2 2 1
3 3 3
4 4 6
5 5 10
6 6 15
7 7 21
8 8 28
9 9 36
10 10 45

Use simple MS technique.
You can use the formula (A3*A2)/2 for COL2

Related

Dynamically updating row values based on a condition in pandas

I am running a simulation test where I want to dynamically change some values present in rows for each column based on certain set of conditions
The Problem Statement
My dataset has 400 rows and my first test case is to update 5% of the rows in each column, so 5% of 400 = 20 rows which needs to be updated
These 20 rows should be only updated for the top 5 categories that are present in my dataset. So 4 rows each which needs to be updated
My dataframe looks like this:
A B C D Category
1 10 3 4 X
4 9 6 9 Y
9 3 7 10 XX
10 1 9 7 YY
10 1 9 7 ZZ
10 1 9 7 YZZ
10 1 9 7 YZZ
10 1 9 7 YYYY
......400 rows
The conditions are:
While updating the rows I would want to make sure that 20 rows (5% of the overall dataset) should be updated only where the top 5 categories are encountered. In my case the top 5 categories are X, Y , XX, YY and ZZ. These rows should be updated to value 7 where the previous value was 1,2,3,4,5,6
The resultant datframe should look like this:
A B C D Category
7 10 7 7 X
7 9 7 9 Y
9 7 7 10 XX
10 7 9 7 YY
10 7 9 7 ZZ
10 1 9 7 YZZ
10 1 9 7 YZZ
10 1 9 7 YYYY
......400 rows
In the resultant dataframe, there is no impact on the categories which are not the top 5 categories, this case YZZ or YYYY and to demonstrate an example I can't show all the updated rows but for example in the above dataframe, 2 rows have been updated for column A where previous value was <=6 to a new value 7 and similarly the other two rows will get updated to 7 wherever the condition is met.
How can I achieve this?
You can try the following logic:
# get only desired Categories
m = df['Category'].isin(['X', 'Y', 'XX', 'YY', 'ZZ'])
# select 20 random rows from the above
idx = df[m].sample(n=20).index
# replace the 1 ≤ values ≤ 6 by 7
df.loc[idx] = df.loc[idx].mask(df.loc[idx].ge(1)&df.loc[idx].le(6), 7)
If you rather want 4 rows per Category, use this variant for the random sampling:
idx = df[m].groupby('Category').sample(n=4).index

Trouble Changing csv row based on column values in python3 and pandas

I am having issues trying to modify a value if columnA contains lets say 0,2,12,44,75,81 (looking at 33 different numerical values in columnA. If columnA contains 1 of the 33 vaules defined, i need to then change the same row of colulmnB to "approved".
I have tried this:
df[(df.columnA == '0|2|12|44|75|81')][df.columnB] = 'approved'
I get an error that there are none of index are in the columns but, i know that isn't correct looking at my csv file data. I must be searching wrong or using the wrong syntax.
As others have said, if you need the value in column A to match exactly with a value in the list, then you can use .isin. If you need a partial match, then you can convert to string dtypes and use .contains.
setup:
nums_to_match = [0,2,9]
df
A B
0 6 e
1 1 f
2 3 b
3 6 g
4 6 i
5 0 f
6 9 a
7 6 i
8 6 a
9 2 h
Exact match:
df.loc[df.A.isin(nums_to_match),'B']='approved'
df:
A B
0 6 e
1 1 f
2 3 b
3 6 g
4 6 i
5 0 approved
6 9 approved
7 6 i
8 6 a
9 2 approved
partial match:
nums_to_match_str = list(map(str,nums_to_match))
df['A']=df['A'].astype(str)
df.loc[df.A.str.contains('|'.join(nums_to_match_str),case=False,na=False),'B']='approved'
df
A B
0 1 h
1 4 c
2 6 i
3 7 d
4 3 d
5 9 approved
6 5 i
7 1 c
8 0 approved
9 5 d

Python create a column based on the values of each row of another column

I have a pandas dataframe as below:
import pandas as pd
df = pd.DataFrame({'ORDER':["A", "A", "A", "B", "B","B"], 'GROUP': ["A_2018_1B1", "A_2018_1B1", "A_2018_1M1", "B_2018_I000_1C1", "B_2018_I000_1B1", "B_2018_I000_1C1H"], 'VAL':[1,3,8,5,8,10]})
df
ORDER GROUP VAL
0 A A_2018_1B1 1
1 A A_2018_1B1H 3
2 A A_2018_1M1 8
3 B B_2018_I000_1C1 5
4 B B_2018_I000_1B1 8
5 B B_2018_I000_1C1H 10
I want to create a column "CAL" as sum of 'VAL' where GROUP name is same for all the rows expect H character in the end. So, for example, 'VAL' column for 1st two rows will be added because the only difference between the 'GROUP' is 2nd row has H in the last. Row 3 will remain as it is, Row 4 and 6 will get added and Row 5 will remain same.
My expected output
ORDER GROUP VAL CAL
0 A A_2018_1B1 1 4
1 A A_2018_1B1H 3 4
2 A A_2018_1M1 8 8
3 B B_2018_I000_1C1 5 15
4 B B_2018_I000_1B1 8 8
5 B B_2018_I000_1C1H 10 15
Try with replace then transform
df.groupby(df.GROUP.str.replace('H','')).VAL.transform('sum')
0 4
1 4
2 8
3 15
4 8
5 15
Name: VAL, dtype: int64
df['CAL'] = df.groupby(df.GROUP.str.replace('H','')).VAL.transform('sum')

How to replenish a data frame based on another one?

Given two data frames. One contains a column of repeated values (a, in this case). The other contains what this value corresponds to (in this example, it corresponds to some "d" values). How do I efficiently replenish the first data frame with a new column, values in which correspond to some existent column, according to a rule recorded in the other data frame. Here is an example code that works really slow:
import pandas as pd
import numpy as np
d1 = pd.DataFrame(np.asarray([[1,2,3], [2,4,5], [3,4,5], [2,1,4], [3,4,5]]), columns = ['a', 'b', 'c'])
d2 = pd.DataFrame(np.asarray([[1,7], [2,8], [3,11]]), columns = ['a', 'd'])
d = np.empty((d1.shape[0],))
for i in range(d1.shape[0]):
temp = d2.loc[d2['a'] == d1.at[i,'a']]
d[i] = temp['d'].array[0]
d1['d'] = d
This is d1 original:
a b c
0 1 2 3
1 2 4 5
2 3 4 5
3 2 1 4
4 3 4 5
This is d2:
a d
0 1 7
1 2 8
2 3 11
This is a resultant d1:
a b c d
0 1 2 3 7
1 2 4 5 8
2 3 4 5 11
3 2 1 4 8
4 3 4 5 11
You're probably looking for pd.merge.
In your case, d1 = d1.merge(d2, on=['a'], how='left') should do the trick.
Another way is to use map and make only the values you need.
d1['d'] = d1['a'].map(d2.set_index('a')['d'])
d1
Output:
a b c d
0 1 2 3 7
1 2 4 5 8
2 3 4 5 11
3 2 1 4 8
4 3 4 5 11

excel:compare 2 columns and copy data on other columns

need help.. i trying to compare 2 columns and copy data in other columns..
Columns:
A B C D
1 3 10
2 4 20
3 1 30
4 2 40
5 0 50
i want to compare column A to B to find its duplicate and copy data from column C if column A has a duplicate at column B...
Result must be:
A B C D
1 3 10 0
2 4 20 40
3 6 30 10
4 2 40 20
5 0 50 0
thanks in advance...
An answer as I understand the question (assuming the change in col B is just a typo):
Input
A B C D
1 3 10
2 4 20
3 6 30
4 2 40
5 0 50
Output
A B C D
1 3 10 0
2 4 20 40
3 6 30 10
4 2 40 20
5 0 50 0
Formula in D2 (filled down): =IF(COUNTIF(B$2:B$6, $A2)>0, VLOOKUP($A2,$B$2:$C$6, 2, FALSE), 0).
COUNTIF(B$2:B$6, $A2) returns the number of times the value in A2 appears in the array B2:B6. If this value is greater than 0 (meaning that A2 is in B2:B6), the IF() function looks looks up A2 in col B and returns the value in the 2nd row (col C); if A2 is not in B2:B6, the formula returns 0.

Resources