Replacing constant values with nan - python-3.x

import pandas as pd
data={'col1':[1,3,3,1,2,3,2,2]}
df=pd.DataFrame(data,columns=['col1'])
print df
col1
0 1
1 3
2 3
3 1
4 2
5 3
6 2
7 2
Expected result:
Col1 newCol1
0 1. 1
1 3. 3
2 3. NaN
3. 1. 1
4 2. 2
5 3. 3
6 2. 2
7. 2. Nan

Try where combine with shift
df['col2'] = df.col1.where(df.col1.ne(df.col1.shift()))
df
Out[191]:
col1 col2
0 1 1.0
1 3 3.0
2 3 NaN
3 1 1.0
4 2 2.0
5 3 3.0
6 2 2.0
7 2 NaN

Related

How to reshape dataframe by creating a grouping multiheader from specific column?

I like to reshape a dataframe thats first column should be used to group the other columns by an additional header row.
Initial dataframe
df = pd.DataFrame(
{
'col1':['A','A','A','B','B','B'],
'col2':[1,2,3,4,5,6],
'col3':[1,2,3,4,5,6],
'col4':[1,2,3,4,5,6],
'colx':[1,2,3,4,5,6]
}
)
Trial:
Using pd.pivot() I can create an example, but this do not fit my expected one, it seems to be flipped in grouping:
df.pivot(columns='col1', values=['col2','col3','col4','colx'])
col2 col3 col4 colx
col1 A B A B A B A B
0 1.0 NaN 1.0 NaN 1.0 NaN 1.0 NaN
1 2.0 NaN 2.0 NaN 2.0 NaN 2.0 NaN
2 3.0 NaN 3.0 NaN 3.0 NaN 3.0 NaN
3 NaN 4.0 NaN 4.0 NaN 4.0 NaN 4.0
4 NaN 5.0 NaN 5.0 NaN 5.0 NaN 5.0
5 NaN 6.0 NaN 6.0 NaN 6.0 NaN 6.0
Expected output:
A B
col1 col2 col3 col4 colx col2 col3 col4 colx
0 1 1 1 1 4 4 4 4
1 2 2 2 2 5 5 5 5
2 3 3 3 3 6 6 6 6
Create counter column by GroupBy.cumcount, then use DataFrame.pivot with swapping level of MultiIndex in columns by DataFrame.swaplevel, sorting it and last remove index and columns names by DataFrame.rename_axis:
df = (df.assign(g = df.groupby('col1').cumcount())
.pivot(index='g', columns='col1')
.swaplevel(0,1,axis=1)
.sort_index(axis=1)
.rename_axis(index=None, columns=[None, None]))
print(df)
A B
col2 col3 col4 colx col2 col3 col4 colx
0 1 1 1 1 4 4 4 4
1 2 2 2 2 5 5 5 5
2 3 3 3 3 6 6 6 6
As an alternative to the classical pivot, you can concat the output of groupby with a dictionary comprehension, ensuring alignment with reset_index:
out = pd.concat({k: d.drop(columns='col1').reset_index(drop=True)
for k,d in df.groupby('col1')}, axis=1)
output:
A B
col2 col3 col4 colx col2 col3 col4 colx
0 1 1 1 1 4 4 4 4
1 2 2 2 2 5 5 5 5
2 3 3 3 3 6 6 6 6

Subtract from all columns in dataframe row by the value in a Series when indexes match

I am trying to subtract 1 from all columns in the rows of a DataFrame that have a matching index in a list.
For example, if I have a DataFrame like this one:
df = pd.DataFrame({'AMOS Admin': [1,1,0,0,2,2], 'MX Programs': [0,0,1,1,0,0], 'Material Management': [2,2,2,2,1,1]})
print(df)
AMOS Admin MX Programs Material Management
0 1 0 2
1 1 0 2
2 0 1 2
3 0 1 2
4 2 0 1
5 2 0 1
I want to subtract 1 from all columns where index is in [2, 3] so that the end result is:
AMOS Admin MX Programs Material Management
0 1 0 2
1 1 0 2
2 -1 0 1
3 -1 0 1
4 2 0 1
5 2 0 1
Having found no way to do this I created a Series:
sr = pd.Series([1,1], index=['2', '3'])
print(sr)
2 1
3 1
dtype: int64
However, applying the sub method as per this question results in a DataFrame with all NaN and new rows at the bottom.
AMOS Admin MX Programs Material Management
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
5 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
Any help would be most appreciated.
Thanks,
Juan
Using reindex with you sr then subtract using values
df.loc[:]=df.values-sr.reindex(df.index,fill_value=0).values[:,None]
df
Out[1117]:
AMOS Admin MX Programs Material Management
0 1 0 2
1 1 0 2
2 -1 0 1
3 -1 0 1
4 2 0 1
5 2 0 1
If what you want to do is that specific, why don't you just:
df.loc[[2, 3], :] = df.loc[[2, 3], :].subtract(1)

shifting a column down in a pandas dataframe

I have data in the following way
A B C
1 2 3
2 5 6
7 8 9
I want to change the dataframe into
A B C
2 3
1 5 6
2 8 9
3
One way would be to add a blank row to the dataframe and then use shift
# input df:
A B C
0 1 2 3
1 2 5 6
2 7 8 9
df.loc[len(df.index), :] = None
df['A'] = df.A.shift(1)
print (df)
A B C
0 NaN 2.0 3.0
1 1.0 5.0 6.0
2 2.0 8.0 9.0
3 7.0 NaN NaN

Python: Summing every five rows of column b data and create a new column

I have a dataframe like below. I would like to sum row 0 to 4 (every 5 rows) and create another column with summed value ("new column"). My real dataframe has 263 rows so, last three rows every 12 rows will be sum of three rows only. How I can do this using Pandas/Python. I have started to learn Python recently. Thanks for any advice in advance!
My data patterns is more complex as I am using the index as one of my column values and it repeats like:
Row Data "new column"
0 5
1 1
2 3
3 3
4 2 14
5 4
6 8
7 1
8 2
9 1 16
10 0
11 2
12 3 5
0 3
1 1
2 2
3 3
4 2 11
5 2
6 6
7 2
8 2
9 1 13
10 1
11 0
12 1 2
...
259 50 89
260 1
261 4
262 5 10
I tried iterrows and groupby but can't make it work so far.
Use this:
df['new col'] = df.groupby(df.index // 5)['Data'].transform('sum')[lambda x: ~(x.duplicated(keep='last'))]
Output:
Data new col
0 5 NaN
1 1 NaN
2 3 NaN
3 3 NaN
4 2 14.0
5 4 NaN
6 8 NaN
7 1 NaN
8 2 NaN
9 1 16.0
Edit to handle updated question:
g = df.groupby(df.Row).cumcount()
df['new col'] = df.groupby([g, df.Row // 5])['Data']\
.transform('sum')[lambda x: ~(x.duplicated(keep='last'))]
Output:
Row Data new col
0 0 5 NaN
1 1 1 NaN
2 2 3 NaN
3 3 3 NaN
4 4 2 14.0
5 5 4 NaN
6 6 8 NaN
7 7 1 NaN
8 8 2 NaN
9 9 1 16.0
10 10 0 NaN
11 11 2 NaN
12 12 3 5.0
13 0 3 NaN
14 1 1 NaN
15 2 2 NaN
16 3 3 NaN
17 4 2 11.0
18 5 2 NaN
19 6 6 NaN
20 7 2 NaN
21 8 2 NaN
22 9 1 13.0
23 10 1 NaN
24 11 0 NaN
25 12 1 2.0

Using pandas' groupby with shifting

I am looking to use pd.rolling_mean in a groupby operation. I want to have in each group a rolling mean of the previous elemnets within the same group. Here is an example:
id val
0 1
0 2
0 3
1 4
1 5
2 6
Grouping by id, this should be transformed into:
id val
0 nan
0 1
0 1.5
1 nan
1 4
2 nan
I believe you want pd.Series.expanding
df.groupby('id').val.apply(lambda x: x.expanding().mean().shift())
0 NaN
1 1.0
2 1.5
3 NaN
4 4.0
5 NaN
Name: val, dtype: float64
I think you need groupby with shift and rolling, window size can be set to scalar:
df['val']=df.groupby('id')['val'].apply(lambda x: x.shift().rolling(2, min_periods=1).mean())
print (df)
id val
0 0 NaN
1 0 1.0
2 0 1.5
3 1 NaN
4 1 4.0
5 2 NaN
Thank you 3novak for comment - you can set window size by max length of group:
f = lambda x: x.shift().rolling(df['id'].value_counts().iloc[0], min_periods=1).mean()
df['val'] = df.groupby('id')['val'].apply(f)
print (df)
id val
0 0 NaN
1 0 1.0
2 0 1.5
3 1 NaN
4 1 4.0
5 2 NaN

Resources