Fill in missing values in a DataFrame column that increments by 10 - python-3.x

Some values in the 'Counts' column are missing. The numbers are meant to increase by 10 with each row, so '35' and '55' need to be put in place. I want to fill in these missing values.
Counts
0 25
1 NaN
2 45
3 NaN
4 65
So my output should be :
Counts
0 25
1 35
2 45
3 55
4 65
Thanks,

You can use interpolate:
df=df.interpolate()
Counts
0 25.0
1 35.0
2 45.0
3 55.0
4 65.0
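A minimal, self-contained sketch of the interpolate answer on the question's data. Default linear interpolation treats the rows as equally spaced, so each single gap between two known values gets their midpoint, which matches the +10 step here. The result is float, so the sketch casts back to integers once no gaps remain. This only works for gaps between two known values; a missing first or last value is not extrapolated along the +10 step.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Counts': [25, np.nan, 45, np.nan, 65]})

# linear interpolation: each NaN between two known values gets the evenly spaced value
df['Counts'] = df['Counts'].interpolate()
# every gap is filled now, so the float column can be turned back into integers
df['Counts'] = df['Counts'].astype(int)
print(df)
```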

Since you know the pattern, you can simply recreate it:
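import numpy as np

start = df.iloc[0]['Counts']  # first row (must not be missing)
end = df.iloc[-1]['Counts']   # last row (must not be missing)
# keep the existing values and take the gaps from the full 25, 35, ..., 65 sequence
df['Counts'] = np.where(df['Counts'].notnull(), df['Counts'],
                        np.arange(start, end + 1, 10))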
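A variant of the same idea, sketched under the assumption that only the first value and the step of 10 are known to be reliable: build the expected sequence from the first row alone and let fillna replace only the missing entries. Unlike the np.arange version, this does not need the last row to be present.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Counts': [25, np.nan, 45, np.nan, 65]})

start = df['Counts'].iloc[0]  # assumed not missing
step = 10                     # the known increment per row
expected = pd.Series(start + step * np.arange(len(df)), index=df.index)
# fillna aligns on the index and only touches the NaN entries
df['Counts'] = df['Counts'].fillna(expected)
print(df)
```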

Related

How do I assign "Extra" to the top 2, middle 2 and bottom 2 rows of a data frame?

In the data frame below, I want to insert a new column and mark the top 2, middle 2 and bottom 2 rows as "Extra".
df
A_No B_Wt
39 184.66
40 193.11
46 197.82
2 203.82
12 205.27
9 208.11
3 208.49
14 208.70
Output
A_No B_Wt Group
39 184.66 Extra
40 193.11 Extra
46 197.82
2 203.82 Extra
12 205.27 Extra
9 208.11
3 208.49 Extra
14 208.70 Extra
I believe you can join the positions of the top 2, middle 2 and bottom 2 rows together with np.r_ and then set the values in a new column:
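import numpy as np

lend = len(df)
mid = lend // 2
# positions of the first 2 rows, the 2 rows around the middle and the last 2 rows
pos = np.r_[0:2, mid-1:mid+1, lend-2:lend]
df.loc[df.index[pos], 'Group'] = 'Extra'
print(df)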
A_No B_Wt Group
0 39 184.66 Extra
1 40 193.11 Extra
2 46 197.82 NaN
3 2 203.82 Extra
4 12 205.27 Extra
5 9 208.11 NaN
6 3 208.49 Extra
7 14 208.70 Extra
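The desired output shows unmarked rows as blank rather than NaN. A minimal sketch, assuming an empty string is an acceptable blank value, builds a boolean mask from the same np.r_ positions and fills the column with np.where. For frames with fewer than 6 rows the three position ranges overlap, so fewer than 6 rows get marked.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A_No': [39, 40, 46, 2, 12, 9, 3, 14],
                   'B_Wt': [184.66, 193.11, 197.82, 203.82, 205.27, 208.11, 208.49, 208.70]})

n = len(df)
mid = n // 2
pos = np.r_[0:2, mid - 1:mid + 1, n - 2:n]  # top 2, middle 2, bottom 2 positions

mask = np.zeros(n, dtype=bool)
mask[pos] = True
# 'Extra' where marked, an empty string everywhere else (instead of NaN)
df['Group'] = np.where(mask, 'Extra', '')
print(df)
```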

Maximum of each column: return the matching value of another column and create a new dataframe from the results

I hope the title is not misleading.
I need to go from this dataframe:
Column_1 Columns_2 First Second Third
0 Element_1 to_be_ignored 10 5 77
1 Element_2 to_be_ignored 30 30 11
2 Element_3 to_be_ignored 60 7 3
3 Element_4 to_be_ignored 20 87 90
to:
New_Column New_Column_1 Max
0 Element_3 First 60
1 Element_4 Second 87
2 Element_4 Third 90
Get the maximum value of every column.
Get the corresponding value of Column_1 for each maximum.
Transform the results into a new dataframe.
What I have so far:
import pandas as pd

data = {'Column_1': ['Element_1', 'Element_2', 'Element_3', 'Element_4'],
        'Columns_2': ['to_be_ignored', 'to_be_ignored', 'to_be_ignored', 'to_be_ignored'],
        'First': [10,30,60,20], 'Second': [5,30,7,87], 'Third': [77,11,3,90]}
df = pd.DataFrame(data)
# skip Column_1 and Columns_2 (text) so idxmax only sees the numeric columns
df.loc[df.iloc[:, 2:].idxmax(), ['Column_1']]
So I am able to get the index position and the Column_1 value for the maximum of each column:
2 Element_3
3 Element_4
3 Element_4
Unfortunately I can't figure out the rest.
Thanks!
If I understand correctly (IIUC), use melt, then sort_values + drop_duplicates:
df.melt(['Column_1','Columns_2']).sort_values('value').drop_duplicates(['variable'],keep='last')
Column_1 Columns_2 variable value
2 Element_3 to_be_ignored First 60
7 Element_4 to_be_ignored Second 87
11 Element_4 to_be_ignored Third 90
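The melt result keeps the melted column names (variable, value) and the extra Columns_2. To get exactly the layout asked for (New_Column, New_Column_1, Max), here is a sketch that finishes the questioner's own idxmax attempt instead; the column names come from the question, and value_cols is an illustrative variable. One difference worth knowing: idxmax returns the first row on ties, whereas sort_values (default, non-stable sort) with keep='last' does not guarantee which tied row survives.

```python
import pandas as pd

data = {'Column_1': ['Element_1', 'Element_2', 'Element_3', 'Element_4'],
        'Columns_2': ['to_be_ignored'] * 4,
        'First': [10, 30, 60, 20], 'Second': [5, 30, 7, 87], 'Third': [77, 11, 3, 90]}
df = pd.DataFrame(data)

value_cols = ['First', 'Second', 'Third']
row_of_max = df[value_cols].idxmax()  # column name -> row label of that column's maximum

result = pd.DataFrame({
    'New_Column': df.loc[row_of_max.to_numpy(), 'Column_1'].to_numpy(),
    'New_Column_1': row_of_max.index,
    'Max': df[value_cols].max().to_numpy(),
})
print(result)
```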

Python Pandas: How to insert a new column which is the sum of the next 'n' values (n can be a fraction) of another column?

I have a DataFrame, named 'test', that stores data as below:
Week Stock(In Number of Weeks) Demand (In Units)
0 W01 2.4 37
1 W02 3.6 33
2 W03 2.0 46
3 W04 5.8 45
4 W05 4.6 56
5 W06 3.0 38
6 W07 5.0 45
7 W08 7.5 54
8 W09 4.3 35
9 W10 2.2 38
10 W11 2.0 50
11 W12 6.0 37
I want to insert a new column in this dataframe which, for every row, is the sum of the next "Stock(In Number of Weeks)" rows of column "Demand (In Units)", starting at that row.
That is, in the case of this dataframe,
for 0th row that new column should be the sum of 2.4 rows of column "Demand(In Units)" which would be 37+33+ 0.4*46
for 1st row, the value should be 33+46+45+ 0.6*56
for 2nd row, it should be 46+45
...
for 7th row, it should be 54+35+38+50+37 (since the number of rows left is smaller than 7.5, all the remaining rows are summed)
...
and so on.
Effectively, I want my dataframe to have a new column as follows:
Week Stock(In Number of Weeks) Demand (In Units) Stock (In Units)
0 W01 2.4 37 88.4
1 W02 3.6 33 157.6
2 W03 2.0 46 91.0
3 W04 5.8 45 266.0
4 W05 4.6 56 214.0
5 W06 3.0 38 137.0
6 W07 5.0 45 222.0
7 W08 7.5 54 214.0
8 W09 4.3 35 160.0
9 W10 2.2 38 95.4
10 W11 2.0 50 87.0
11 W12 6.0 37 37.0
Can somebody suggest some way to achieve this?
I can achieve this by iterating over each row, but that would be very slow for the millions of rows I want to process at once.
The code which I am using right now is:
import numpy as np

for i in range(len(test)):
    if int(np.floor(test.loc[i, 'Stock(In Number of Weeks)'])) >= len(test[i:]):
        # not enough rows left: sum all the remaining rows
        number_of_full_rows = len(test[i:])
        fraction_of_last_row = 0
        y = 0
    else:
        number_of_full_rows = int(np.floor(test.loc[i, 'Stock(In Number of Weeks)']))
        fraction_of_last_row = test.loc[i, 'Stock(In Number of Weeks)'] - number_of_full_rows
        y = test.loc[i+number_of_full_rows, 'Demand (In Units)'] * fraction_of_last_row
    x = np.sum(test[i:i+number_of_full_rows]['Demand (In Units)'])
    test.loc[i, 'Stock (In Units)'] = x+y
I tried with some test data:
import numpy as np

def func(r, col):
    n = int(r['Stock(In Number of Weeks)'])
    f = float(r['Stock(In Number of Weeks)'] - n)
    i = r.name  # row index value (assumes a default 0..N-1 index)
    z = np.zeros(len(df))  # initialize all zeros
    v = np.hstack((np.ones(n), np.array(f)))  # vector of ones plus the fractional part
    e = min(len(v), len(z[i:]))
    z[i:i+e] = v[:e]  # place the weights starting at row i, clipped at the last row
    r['Stock (In Units)'] = col @ z  # scalar (dot) product of demand and weights
    return r

df = df.apply(lambda r: func(r, df['Demand (In Units)'].values), axis=1)
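Both versions above still do Python-level work per row, and the apply version builds a full-length weight vector for every row, which is quadratic in the number of rows. For millions of rows, a swapped-in technique (not from either post) is prefix sums: with csum = cumsum of demand, the whole-row part for row i is csum[i+n] - csum[i], and the partial row adds f * demand[i+n] when that row exists. The helper name stock_in_units is hypothetical; the sketch assumes non-negative stock values and rows in their natural order.

```python
import numpy as np
import pandas as pd


def stock_in_units(stock_weeks, demand):
    """Hypothetical helper: for each row i, sum the next stock_weeks[i] demand values
    starting at row i, weighting the last partial row by the fractional part."""
    s = np.asarray(stock_weeks, dtype=float)
    d = np.asarray(demand, dtype=float)
    N = len(d)
    i = np.arange(N)
    n = np.floor(s).astype(int)   # number of whole rows (assumes s >= 0)
    f = s - n                     # fractional weight of the row after the whole rows
    # prefix sums with a leading 0, so csum[k] == d[:k].sum()
    csum = np.concatenate(([0.0], np.cumsum(d)))
    end = np.minimum(i + n, N)    # exclusive end of the whole-row window, clipped
    whole = csum[end] - csum[i]
    # the partial row only exists if the window did not run past the last row
    has_partial = i + n < N
    partial = np.zeros(N)
    partial[has_partial] = f[has_partial] * d[(i + n)[has_partial]]
    return whole + partial


test = pd.DataFrame({
    'Week': [f'W{k:02d}' for k in range(1, 13)],
    'Stock(In Number of Weeks)': [2.4, 3.6, 2.0, 5.8, 4.6, 3.0, 5.0, 7.5, 4.3, 2.2, 2.0, 6.0],
    'Demand (In Units)': [37, 33, 46, 45, 56, 38, 45, 54, 35, 38, 50, 37],
})
test['Stock (In Units)'] = stock_in_units(test['Stock(In Number of Weeks)'],
                                          test['Demand (In Units)'])
print(test)
```

Everything is a handful of whole-array NumPy operations, so the cost grows linearly with the number of rows.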

Row subtraction in a lambda on a pandas dataframe

I have a dataframe with multiple columns. One of the columns is a cumulative revenue column. If the year has not ended, the revenue stays constant for the rest of the period, because the remaining daily revenue is 0.
The dataframe looks like this
Now I want to create a new column: subtract the previous row from each row, and if the result is 0, put 0 in the new column for that row; otherwise use the row's value. The new dataframe should look like this:
My idea was to do this with apply and a lambda, along these lines:
df['2017new'] = df['2017'].apply(lambda x: 0 if row - lastrow == 0 else x)
But I do not know how to write the row - lastrow part of the code. How can I do this? Thanks in advance!
Using np.where:
df2['New']=np.where(df2['2017'].diff().eq(0),0,df2['2017'])
df2
Out[190]:
2016 2017 New
0 10 21 21
1 15 34 34
2 70 40 40
3 90 53 53
4 93 53 0
5 99 53 0
We can shift the data and fill the values based on a condition using np.where, i.e.:
df['new'] = np.where(df['2017']-df['2017'].shift(1)==0,0,df['2017'])
or with Series.where, i.e.:
df['new'] = df['2017'].where(df['2017']-df['2017'].shift(1)!=0,0)
2016 2017 new
0 10 21 21
1 15 34 34
2 70 40 40
3 90 53 53
4 93 53 0
5 99 53 0
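Series.apply hands the lambda one value at a time, so it cannot see the previous row; that is why both answers compare the column with a shifted copy instead. A self-contained sketch using the 2016/2017 values shown in the answers' output, computing both variants side by side. In the first row the difference is NaN, so neither condition zeroes it and 21 is kept.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'2016': [10, 15, 70, 90, 93, 99],
                   '2017': [21, 34, 40, 53, 53, 53]})

# diff() is "row minus previous row"; where it equals 0, write 0, otherwise keep the value
df['new_where'] = np.where(df['2017'].diff().eq(0), 0, df['2017'])
# Series.where keeps values where the condition holds and replaces the rest with 0
df['new_shift'] = df['2017'].where(df['2017'] - df['2017'].shift(1) != 0, 0)
print(df)
```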

Pandas multi-index subtract from value based on value in other column part 2

Based on a thorough and accurate response to this question, I am now faced with a new issue based on slightly different data.
Given this data frame:
import pandas as pd

df = pd.DataFrame({
    ('A', 'a'): [23,3,54,7,32,76],
    ('B', 'b'): [23,'n/a',54,'n/a',32,76],
    ('possible','possible'):[100,100,100,100,100,100]
    })
df
A B possible
a b possible
0 23 23 100
1 3 n/a 100
2 54 54 100
3 7 n/a 100
4 32 32 100
5 76 76 100
I'd like to subtract 4 from 'possible', per row, for any instance (column) where the value is 'n/a' for that row (and then change all 'n/a' values to 0).
A B possible
a b possible
0 23 23 100
1 3 n/a 96
2 54 54 100
3 7 n/a 96
4 32 32 100
5 76 76 100
Some conditions:
It may occur that a column is all floats (though they appear to be integers upon inspection). This was not factored into the original question.
It may also occur that a row contains two instances (columns) of 'n/a' values. This was addressed by the previous solution.
Here is the previous solution:
idx = pd.IndexSlice
df.loc[:, idx['possible', 'possible']] -= (df.loc[:, idx[('A','B'),:]] == 'n/a').sum(axis=1) * 4
df.replace({'n/a':0}, inplace=True)
It works, except where a column (A or B) contains all floats (seemingly integers). When that's the case, this error occurs:
TypeError: Could not compare ['n/a'] with block values
I think you can cast to string with astype in the condition:
idx = pd.IndexSlice
# astype(str) makes the comparison with 'n/a' valid even for all-float columns
df.loc[:, idx['possible', 'possible']] -= (
    (df.loc[:, idx[('A','B'),:]].astype(str) == 'n/a').sum(axis=1) * 4)
df.replace({'n/a':0}, inplace=True)
print(df)
A B possible
a b possible
0 23 23 100
1 3 0 96
2 54 54 100
3 7 0 96
4 32 32 100
5 76 76 100
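To check the cast-to-string fix end to end, here is a sketch that reproduces the problem case: the A column is made all floats (an assumption standing in for the questioner's real data) and B has 'n/a' in rows 1 and 3, as in the table above. Every cell becomes text before the comparison, so the float column simply compares as False instead of raising.

```python
import pandas as pd

df = pd.DataFrame({
    ('A', 'a'): [23.0, 3.0, 54.0, 7.0, 32.0, 76.0],  # all floats: the problem case
    ('B', 'b'): [23, 'n/a', 54, 'n/a', 32, 76],
    ('possible', 'possible'): [100] * 6,
})

idx = pd.IndexSlice
# count the 'n/a' cells per row across the A and B columns
na_counts = (df.loc[:, idx[('A', 'B'), :]].astype(str) == 'n/a').sum(axis=1)
df.loc[:, idx['possible', 'possible']] -= na_counts * 4
df = df.replace({'n/a': 0})
print(df)
```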
