Grouping By 2 Columns In Pandas Ignoring Order - python-3.x

I have a DataFrame in Pandas with 2 columns that are almost, but not quite, identical, and I sometimes want to group by both columns while ignoring their order.
As an example:
import pandas as pd
mydf = pd.DataFrame({'Colour1': ['Red', 'Red', 'Blue', 'Green', 'Blue'], 'Colour2': ['Red', 'Blue', 'Red', 'Blue', 'Green'], 'Rating': [4, 5, 7, 8, 2]})
Colour1 Colour2 Rating
0 Red Red 4
1 Red Blue 5
2 Blue Red 7
3 Green Blue 8
4 Blue Green 2
I would like to group by Colour1 and Colour2 while ignoring the order, and then transform the DataFrame by taking the mean to produce the following DataFrame:
Colour1 Colour2 Rating MeanRating
0 Red Red 4 4
1 Red Blue 5 6
2 Blue Red 7 6
3 Green Blue 8 5
4 Blue Green 2 5
Is there a good way of doing this? Thanks in advance.

You can first sort Colour1 and Colour2 within each row using np.sort, then group by the result:
import numpy as np
# Row-wise sort makes ('Red', 'Blue') and ('Blue', 'Red') the same key
s = pd.Series(map(tuple, np.sort(mydf[['Colour1', 'Colour2']], axis=1)),
              index=mydf.index)
mydf['MeanRating'] = mydf['Rating'].groupby(s).transform('mean')
print(mydf)
Colour1 Colour2 Rating MeanRating
0 Red Red 4 4
1 Red Blue 5 6
2 Blue Red 7 6
3 Green Blue 8 5
4 Blue Green 2 5
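
A hedged alternative (not from the answer above): group on a frozenset of the two colour columns, since frozensets compare equal regardless of element order. This is a minimal sketch assuming the same mydf built above:
# frozenset({'Red', 'Blue'}) == frozenset({'Blue', 'Red'}), so the order of
# the colours within a row no longer matters when grouping
key = mydf[['Colour1', 'Colour2']].apply(frozenset, axis=1)
mydf['MeanRating'] = mydf['Rating'].groupby(key).transform('mean')
print(mydf)
Both approaches produce the same MeanRating column; the np.sort version avoids a row-wise Python apply and is usually faster on large frames.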

Related

Pandas Identify duplicate records, create a new column and add the ID of first occurrence

I am a newbie in Python, so please have mercy on me :)
Let's say that there is a dataframe like this:
ID B C D E isDuplicated
1 Blue Green Blue Pink false
2 Red Green Red Green false
3 Red Orange Yellow Green false
4 Blue Pink Blue Pink false
5 Blue Orange Pink Green false
6 Blue Orange Pink Green true
7 Red Orange Yellow Green true
8 Red Orange Yellow Green true
I have duplicate rows over the subset B, C, D, E.
I would like to add another column, 'firstOccurred', which should contain the ID of the first occurrence.
My desired dataframe should look like this:
ID B C D E isDuplicated firstOccurred
1 Blue Green Blue Pink false
2 Red Green Red Green false
3 Red Orange Yellow Green false
4 Blue Pink Blue Pink false
5 Blue Orange Pink Green false
6 Blue Orange Pink Green true 5
7 Red Orange Yellow Green true 3
8 Red Orange Yellow Green true 3
I would be grateful for any help!
Thank you in advance!
Use GroupBy.transform with 'first', applied only to the rows with True by passing it through numpy.where:
import numpy as np
df['firstOccurred'] = np.where(df['isDuplicated'],
                               df.groupby(['B', 'C', 'D', 'E'])['ID'].transform('first'),
                               np.nan)
print (df)
ID B C D E isDuplicated firstOccurred
0 1 Blue Green Blue Pink False NaN
1 2 Red Green Red Green False NaN
2 3 Red Orange Yellow Green False NaN
3 4 Blue Pink Blue Pink False NaN
4 5 Blue Orange Pink Green False NaN
5 6 Blue Orange Pink Green True 5.0
6 7 Red Orange Yellow Green True 3.0
7 8 Red Orange Yellow Green True 3.0
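
An alternative sketch (not from the answer above) that stays within pandas: broadcast the group's first ID to every row, then blank out the non-duplicated rows with Series.where. It assumes the same df and that isDuplicated is a boolean column.
# transform('first') gives every row the first ID of its B/C/D/E group;
# .where() keeps that value only where isDuplicated is True, else NaN
df['firstOccurred'] = (df.groupby(['B', 'C', 'D', 'E'])['ID']
                         .transform('first')
                         .where(df['isDuplicated']))
print(df)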

Pandas create a new data frame from counting rows into columns

I have a data frame something like this:
item color
0 A red
1 A red
2 A green
3 B red
4 B green
5 B green
6 C red
7 C green
And I want to count the number of times each color repeats for each item, pivoting those counts into columns like this:
item red green
0 A 2 1
1 B 1 2
2 C 1 1
Any thoughts? Thanks in advance.
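
A minimal sketch of one way to do this (not an answer from the thread), assuming the frame is called df with columns item and color: pd.crosstab counts the item/color pairs and pivots the colours into columns, which come out in alphabetical order:
import pandas as pd

df = pd.DataFrame({'item': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
                   'color': ['red', 'red', 'green', 'red', 'green', 'green',
                             'red', 'green']})

# Count each (item, color) pair and turn the colours into columns
out = pd.crosstab(df['item'], df['color']).reset_index().rename_axis(None, axis=1)
print(out)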

How to split Pandas string column into different rows?

Here is my issue. I have data like this:
data = {
    'name': ["Jack ;; Josh ;; John", "Apple ;; Fruit ;; Pear"],
    'grade': [11, 12],
    'color': ['black', 'blue']
}
df = pd.DataFrame(data)
It looks like:
name grade color
0 Jack ;; Josh ;; John 11 black
1 Apple ;; Fruit ;; Pear 12 blue
I want it to look like:
name grade color
0 Jack 11 black
1 Josh 11 black
2 John 11 black
3 Apple 12 blue
4 Fruit 12 blue
5 Pear 12 blue
So first I'd need to split name on ";;" and then explode that list into different rows.
Use Series.str.split, reshape with DataFrame.stack, and add the original columns back with DataFrame.join:
c = df.columns
s = (df.pop('name')
       .str.split(' ;; ', expand=True)
       .stack()
       .reset_index(level=1, drop=True)
       .rename('name'))
df = df.join(s).reset_index(drop=True).reindex(columns=c)
print (df)
name grade color
0 Jack 11 black
1 Josh 11 black
2 John 11 black
3 Apple 12 blue
4 Fruit 12 blue
5 Pear 12 blue
You have 2 challenges:
Split name on " ;; " into a list and turn each item of the list into its own column:
df['name'] = df.name.str.split(' ;; ')
df_temp = df.name.apply(pd.Series)
df = pd.concat([df, df_temp], axis=1)
df.drop('name', inplace=True, axis=1)
result:
grade color 0 1 2
0 11 black Jack Josh John
1 12 blue Apple Fruit Pear
Melt the new columns to get the desired result:
df.melt(id_vars=["grade", "color"],
        value_name="Name").sort_values('grade').drop('variable', axis=1)
desired result:
grade color Name
0 11 black Jack
2 11 black Josh
4 11 black John
1 12 blue Apple
3 12 blue Fruit
5 12 blue Pear
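
On pandas 0.25 or newer, DataFrame.explode offers a shorter route to the same result. A minimal sketch, assuming the original df from the question (before name is popped):
# Split the names into lists, then explode one list element per row
out = (df.assign(name=df['name'].str.split(' ;; '))
         .explode('name')
         .reset_index(drop=True))
print(out)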

How to find max and return adjacent cell in Excel

Imagine a table:
Red 8 Black 1
Red 2 Black 3
Red 1 Black 0
Red 7 Black 8
Red 4 Black 5
How do I return "Red" or "Black" in a third column for each row depending on which has a larger value?
It would be:
Red 8 Black 1 Red
Red 2 Black 3 Black
Red 1 Black 0 Red
Red 7 Black 8 Black
Red 4 Black 5 Black
Use:
=INDEX(A2:D2,MATCH(MAX(A2:D2),A2:D2,0)-1)
Edit:
Since there are only two Options, a simple IF will work:
=IF(B2>D2,A2,C2)

Append Two Dataframes Together (Pandas, Python3)

I am trying to append/join(?) two different dataframes together that don't share any overlapping data.
DF1 looks like
Teams Points
Red 2
Green 1
Orange 3
Yellow 4
....
Brown 6
and DF2 looks like
Area Miles
2 3
1 2
....
7 12
I am trying to append these together using
bigdata = df1.append(df2,ignore_index = True).reset_index()
but I get this
Teams Points
Red 2
Green 1
Orange 3
Yellow 4
Area Miles
2 3
1 2
How do I get something like this?
Teams Points Area Miles
Red 2 2 3
Green 1 1 2
Orange 3
Yellow 4
EDIT: regarding EdChum's answer, I have tried merge and join, but each creates a somewhat strange table. Instead of what I am looking for (as listed above), it returns something like this:
Teams Points Area Miles
Red 2 2 3
Green 1
Orange 3 1 2
Yellow 4
Use concat and pass param axis=1:
In [4]:
pd.concat([df1,df2], axis=1)
Out[4]:
Teams Points Area Miles
0 Red 2 2 3
1 Green 1 1 2
2 Orange 3 NaN NaN
3 Yellow 4 NaN NaN
join also works:
In [8]:
df1.join(df2)
Out[8]:
Teams Points Area Miles
0 Red 2 2 3
1 Green 1 1 2
2 Orange 3 NaN NaN
3 Yellow 4 NaN NaN
As does merge:
In [11]:
df1.merge(df2,left_index=True, right_index=True, how='left')
Out[11]:
Teams Points Area Miles
0 Red 2 2 3
1 Green 1 1 2
2 Orange 3 NaN NaN
3 Yellow 4 NaN NaN
EDIT
In the case where the indices do not align, for example where your first df has index [0, 1, 2, 3] and your second df has index [0, 2], the above operations will naturally align against the first df's index, resulting in a NaN row for index 1. To fix this you can reindex the second df, either by calling reset_index() or by assigning directly, like so: df2.index = [0, 1].
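
A small sketch of that index fix, using made-up frames shaped like DF1 and DF2 (the data here is illustrative, not from the question):
import pandas as pd

df1 = pd.DataFrame({'Teams': ['Red', 'Green', 'Orange', 'Yellow'],
                    'Points': [2, 1, 3, 4]})
# df2 deliberately gets the gappy index [0, 2] to reproduce the misalignment
df2 = pd.DataFrame({'Area': [2, 1], 'Miles': [3, 2]}, index=[0, 2])

# Aligns on the index: rows 1 and 3 of df1 get NaN, row 2 gets df2's second row
print(pd.concat([df1, df2], axis=1))

# Resetting df2's index lines the rows up positionally instead
print(pd.concat([df1, df2.reset_index(drop=True)], axis=1))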
