Replace every value in pandas.DataFrame row with the count of that value - python-3.x

I would like to replace each value in my pd.DataFrame df with the count of that value within its row.
import pandas as pd

df = pd.DataFrame({
    'foo': [3, 3, 1, 1, 1, 2],
    'bar': [4, 4, 1, 3, 3, 3]
}).transpose()
     0  1  2  3  4  5
foo  3  3  1  1  1  2
bar  4  4  1  3  3  3
I would expect to see:
     0  1  2  3  4  5
foo  2  2  3  3  3  1
bar  2  2  1  3  3  3
I have been unable to come up with a solution using .apply().
What is the most sensible way of achieving the above?

One way is to use pandas.Series.value_counts with map:
df.apply(lambda x: x.map(x.value_counts()), axis=1)
Output:
     0  1  2  3  4  5
foo  2  2  3  3  3  1
bar  2  2  1  3  3  3
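As a quick end-to-end check, the one-liner can be run on the example frame from the question (a minimal sketch):

```python
import pandas as pd

df = pd.DataFrame({
    'foo': [3, 3, 1, 1, 1, 2],
    'bar': [4, 4, 1, 3, 3, 3]
}).transpose()

# For each row, map every value to how often it occurs in that row
counts = df.apply(lambda x: x.map(x.value_counts()), axis=1)
print(counts)
```

Each row's `value_counts()` is a Series indexed by value, so `map` looks up every cell's count in its own row.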

Related

pandas compare 1 row value with every other row value and create a matrix

DF in hand
Steps I want to perform:

1. compare A001 data with A002, A003, ... A00N
2. for every value that matches, raise a counter by 1
3. do not increment the count if NA
4. repeat for row A002 with all other rows
5. create a matrix using the index with the total count of matching values
DF creation:
data = {'name': ['A001', 'A002', 'A003',
                 'A004', 'A005', 'A006', 'A007', 'A008'],
        'Q1': [2, 1, 1, 1, 2, 1, 1, 5],
        'Q2': [4, 4, 4, 2, 4, 2, 5, 4],
        'Q3': [2, 2, 3, 2, 2, 3, 2, 2],
        'Q4': [5, 3, 5, 2, 3, 2, 4, 5],
        'Q5': [2, 2, 3, 2, 2, 2, 2, 2]}
df = pd.DataFrame(data)
df.at[7, 'Q3'] = None
desired output
thanks in advance.
IIUC,
df = pd.DataFrame({'name':['A001', 'A002', 'A003', 'A004','A005','A006','A007','A008'],
'Q1':[2,1,1,1,2,1,1,5],
'Q2':[4,4,4,2,4,2,5,4],
'Q3':[2,2,3,2,2,3,2,2],
'Q4':[5,3,5,2,3,2,4,5],
'Q5':[2,2,3,2,2,2,2,2]})
dfm = df.merge(df, how='cross').set_index(['name_x', 'name_y'])
dfm.columns = dfm.columns.str.split('_', expand=True)
df_out = (dfm.stack(0)
             .apply(pd.to_numeric, errors='coerce')
             .diff(axis=1)
             .eq(0)
             .sum(axis=1)
             .groupby(level=[0, 1])
             .sum()
             .unstack())
output:
name_y A001 A002 A003 A004 A005 A006 A007 A008
name_x
A001 5 3 2 2 4 1 2 4
A002 3 5 2 3 4 2 3 3
A003 2 2 5 1 1 2 1 2
A004 2 3 1 5 2 4 3 2
A005 4 4 1 2 5 1 2 3
A006 1 2 2 4 1 5 2 1
A007 2 3 1 3 2 2 5 2
A008 4 3 2 2 3 1 2 5
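If the cross-merge pipeline is hard to follow, the same matrix can be sketched with plain NumPy broadcasting. This uses the NaN-free frame from the answer above; if a cell were NaN, the equality check would fail automatically (NaN != NaN), which satisfies the do-not-count-NA rule:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['A001', 'A002', 'A003', 'A004', 'A005', 'A006', 'A007', 'A008'],
                   'Q1': [2, 1, 1, 1, 2, 1, 1, 5],
                   'Q2': [4, 4, 4, 2, 4, 2, 5, 4],
                   'Q3': [2, 2, 3, 2, 2, 3, 2, 2],
                   'Q4': [5, 3, 5, 2, 3, 2, 4, 5],
                   'Q5': [2, 2, 3, 2, 2, 2, 2, 2]})

vals = df.set_index('name').to_numpy(dtype=float)
# Compare every row against every other row; NaN == NaN is False, so NA never counts
match = vals[:, None, :] == vals[None, :, :]
out = pd.DataFrame(match.sum(axis=2), index=df['name'], columns=df['name'])
print(out)
```

The result is symmetric, with 5 on the diagonal (every answer matches itself).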

Taking different records from groups using group by in pandas

Suppose I have dataframe like this
>>> df = pd.DataFrame({'id':[1,1,1,2,2,2,2,3,4],'value':[1,2,3,1,2,3,4,1,1]})
>>> df
id value
0 1 1
1 1 2
2 1 3
3 2 1
4 2 2
5 2 3
6 2 4
7 3 1
8 4 1
Now I want to keep all records from each group except the last 3. That is, I want to drop the last 3 records from every group. How can I do this using pandas groupby? This is dummy data.
Use GroupBy.cumcount with ascending=False to get a counter that runs from the back of each group, then compare with Series.gt for values greater than 2 (because Python counts from 0):
df = df[df.groupby('id').cumcount(ascending=False).gt(2)]
print (df)
id value
3 2 1
Details:
print (df.groupby('id').cumcount(ascending=False))
0 2
1 1
2 0
3 3
4 2
5 1
6 0
7 0
8 0
dtype: int64
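A minimal end-to-end run on the frame from the question; only the id=2 group has more than 3 rows, so only its first row survives:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2, 3, 4],
                   'value': [1, 2, 3, 1, 2, 3, 4, 1, 1]})

# cumcount(ascending=False) numbers rows from the back of each group (0 = last),
# so keeping counts > 2 drops the last 3 rows of every group
out = df[df.groupby('id').cumcount(ascending=False).gt(2)]
print(out)  # only row (id=2, value=1) remains
```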

pandas multiindex with expanding window function

I have a multiindex dataframe, an example can be created using:
arrays = [['bar', 'bar', 'bar', 'bar', 'bar', 'baz', 'baz', 'baz', 'baz', 'baz',
           'foo', 'foo', 'foo', 'foo', 'foo', 'qux', 'qux', 'qux', 'qux', 'qux'],
          [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5]]
tuples = list(zip(*arrays))
values = [1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 2, 2, 2, 2]
df = pd.DataFrame(values,
                  index=pd.MultiIndex.from_tuples(tuples, names=['first', 'second']),
                  columns=['test'])
resulting in a dataframe that looks like this

              test
first second
bar   1          1
      2          1
      3          2
      4          2
      5          2
baz   1          1
      2          1
      3          1
      4          1
      5          1
foo   1          2
      2          2
      3          2
      4          3
      5          3
qux   1          3
      2          2
      3          2
      4          2
      5          2
I would like to figure out how to get the cumulative sum of the numbers in "test" within each "first" group, in a new column called 'result'. I feel like I am close using
df['result'] = df.test.expanding(1).sum()
but I cannot figure out how to make it restart at each new "first" value (it just keeps accumulating across groups).
I would like my final output to look like

              test  result
first second
bar   1          1       1
      2          1       2
      3          2       4
      4          2       6
      5          2       8
baz   1          1       1
      2          1       2
      3          1       3
      4          1       4
      5          1       5
foo   1          2       2
      2          2       4
      3          2       6
      4          3       9
      5          3      12
qux   1          3       3
      2          2       5
      3          2       7
      4          2       9
      5          2      11
Suggestions are appreciated.
This should work:
df['result'] = df.groupby(['first'])['test'].transform(lambda x: x.cumsum())
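The transform(lambda ...) wrapper works, but cumsum is itself a groupby method, so the lambda can be dropped entirely; a sketch on the frame built above:

```python
import pandas as pd

arrays = [['bar'] * 5 + ['baz'] * 5 + ['foo'] * 5 + ['qux'] * 5,
          [1, 2, 3, 4, 5] * 4]
values = [1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 2, 2, 2, 2]
df = pd.DataFrame(values,
                  index=pd.MultiIndex.from_arrays(arrays, names=['first', 'second']),
                  columns=['test'])

# Group on the 'first' index level; the cumulative sum restarts for each group
df['result'] = df.groupby(level='first')['test'].cumsum()
print(df)
```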

Pandas how to turn each group into a dataframe using groupby

I have a dataframe that looks like:
A B
1 2
1 3
1 4
2 5
2 6
3 7
3 8
If I use df.groupby('A'), how do I turn each group into a sub-dataframe, so that for A=1 it will look like:
A B
1 2
1 3
1 4
for A=2,
A B
2 5
2 6
for A=3,
A B
3 7
3 8
By using get_group
g=df.groupby('A')
g.get_group(1)
Out[367]:
A B
0 1 2
1 1 3
2 1 4
You are close; you need to convert the groupby object to a dictionary of DataFrames:
dfs = dict(tuple(df.groupby('A')))
print (dfs[1])
A B
0 1 2
1 1 3
2 1 4
print (dfs[2])
A B
3 2 5
4 2 6
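If you need to visit every sub-frame rather than look one up by key, iterating the groupby object directly is the usual pattern (a minimal sketch with the data from the question):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 3, 3],
                   'B': [2, 3, 4, 5, 6, 7, 8]})

# Iterating a groupby yields (group key, sub-DataFrame) pairs
subframes = {key: sub for key, sub in df.groupby('A')}
print(subframes[2])
```

This builds the same dictionary as dict(tuple(df.groupby('A'))), but makes the key/sub-frame structure explicit.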

Repeating elements in a dataframe

Hi all I have the following dataframe:
A | B | C
1 2 3
2 3 4
3 4 5
4 5 6
And I am trying to only repeat the last two rows of the data so that it looks like this:
A | B | C
1 2 3
2 3 4
3 4 5
3 4 5
4 5 6
4 5 6
I have tried using append, concat and repeat to no avail.
repeated = lambda x:x.repeat(2)
df.append(df[-2:].apply(repeated),ignore_index=True)
This returns the following dataframe, which is incorrect:
A | B | C
1 2 3
2 3 4
3 4 5
4 5 6
3 4 5
3 4 5
4 5 6
4 5 6
You can use numpy.repeat to repeat the last two index labels, select those rows with loc into df1, then append df1 to the original after filtering out its last 2 rows with iloc:
df1 = df.loc[np.repeat(df.index[-2:].values, 2)]
print (df1)
A B C
2 3 4 5
2 3 4 5
3 4 5 6
3 4 5 6
print (df.iloc[:-2])
A B C
0 1 2 3
1 2 3 4
df = df.iloc[:-2].append(df1,ignore_index=True)
print (df)
A B C
0 1 2 3
1 2 3 4
2 3 4 5
3 3 4 5
4 4 5 6
5 4 5 6
If you want to use your code, add iloc so that only the last 2 rows are repeated before appending:
repeated = lambda x:x.repeat(2)
df = df.iloc[:-2].append(df.iloc[-2:].apply(repeated),ignore_index=True)
print (df)
A B C
0 1 2 3
1 2 3 4
2 3 4 5
3 3 4 5
4 4 5 6
5 4 5 6
Use pd.concat and index slicing with .iloc:
pd.concat([df,df.iloc[-2:]]).sort_values(by='A')
Output:
A B C
0 1 2 3
1 2 3 4
2 3 4 5
2 3 4 5
3 4 5 6
3 4 5 6
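Note that sort_values(by='A') only works here because column A happens to be sorted; sorting on the index with a stable sort generalizes to any column contents (a sketch, assuming the same frame):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': [2, 3, 4, 5],
                   'C': [3, 4, 5, 6]})

# The appended rows keep their original index labels; a stable sort on the
# index interleaves each duplicate right after its original row
out = pd.concat([df, df.iloc[-2:]]).sort_index(kind='stable')
print(out)
```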
I'm partial to manipulating the index into the pattern we are aiming for, then asking the dataframe to take the new form.
Option 1
Use pd.DataFrame.reindex
df.reindex(df.index[:-2].append(df.index[-2:].repeat(2)))
A B C
0 1 2 3
1 2 3 4
2 3 4 5
2 3 4 5
3 4 5 6
3 4 5 6
Same thing in multiple lines
i = df.index
idx = i[:-2].append(i[-2:].repeat(2))
df.reindex(idx)
Could also use loc
i = df.index
idx = i[:-2].append(i[-2:].repeat(2))
df.loc[idx]
Option 2
Reconstruct from values. Only do this if all dtypes are the same.
i = np.arange(len(df))
idx = np.append(i[:-2], i[-2:].repeat(2))
pd.DataFrame(df.values[idx], df.index[idx])
0 1 2
0 1 2 3
1 2 3 4
2 3 4 5
2 3 4 5
3 4 5 6
3 4 5 6
Option 3
You can also use a NumPy array with iloc:
i = np.arange(len(df))
idx = np.append(i[:-2], i[-2:].repeat(2))
df.iloc[idx]
A B C
0 1 2 3
1 2 3 4
2 3 4 5
2 3 4 5
3 4 5 6
3 4 5 6
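All of the options above hard-code "last two rows, repeated twice"; they generalize naturally to a small helper (the name repeat_tail is hypothetical, not a pandas API):

```python
import numpy as np
import pandas as pd

def repeat_tail(df, n=2, k=2):
    """Repeat the last n rows of df k times each (hypothetical helper)."""
    i = np.arange(len(df))
    idx = np.append(i[:-n], i[-n:].repeat(k))
    return df.iloc[idx].reset_index(drop=True)

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 4, 5], 'C': [3, 4, 5, 6]})
print(repeat_tail(df))
```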
