Pandas create a new data frame from counting rows into columns - python-3.x

I have something like this data frame:
item color
0 A red
1 A red
2 A green
3 B red
4 B green
5 B green
6 C red
7 C green
I want to count how many times each color repeats for each item and pivot those counts into columns, like this:
item red green
0 A 2 1
1 B 1 2
2 C 1 1
Any thoughts? Thanks in advance.
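One way to get that shape (a sketch of my own, not from the thread) is pd.crosstab, assuming the data sits in a DataFrame df with item and color columns:
import pandas as pd

df = pd.DataFrame({
    "item":  ["A", "A", "A", "B", "B", "B", "C", "C"],
    "color": ["red", "red", "green", "red", "green", "green", "red", "green"],
})

# Cross-tabulate item against color; each cell counts the rows with that combination
counts = pd.crosstab(df["item"], df["color"]).reset_index()
counts.columns.name = None   # drop the leftover "color" axis label
print(counts)
#   item  green  red
# 0    A      1    2
# 1    B      2    1
# 2    C      1    1
An equivalent spelling is df.groupby(['item', 'color']).size().unstack(fill_value=0); only the column order differs from the desired output above.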

Related

How can I display data from 3 tables into a single table?

I have 3 datasets a, b, c and 3 tables that give values based on the strength of their relationships: (a,b), (a,c), and (b,c). a has 5 items, b has 7 items, and c has 4 items. I am trying to find a way to display all the data from these tables in a single graphic. The axis would have a Y shape, with each leg representing a different data set and the cells representing their relationship strength.
I have looked through excel charts and haven't found anything that might help represent this. Is there another program which would be better? Looking for something simple to use.
Below is an example of the 3 tables and the type of data they contain. I want a way to show the scores for each relationship in a grid.
Table 1
A  B  Score
1  x  3
2  x  5
3  x  0
1  y  2
2  y  6
3  y  0
1  z  5
2  z  8
3  z  0
Table 2
A  C       Score
1  blue    3
2  blue    8
3  blue    2
1  red     0
2  red     4
3  red     1
1  yellow  3
2  yellow  3
3  yellow  9
Table 3
B  C       Score
x  blue    2
x  red     1
x  yellow  5
y  blue    0
y  red     3
y  yellow  7
z  blue    0
z  red     1
z  yellow  3
Here is an example of what I am trying to do with the data: I manually created this type of visual in AutoCAD. This works as a one-off, but it doesn't scale and is very tedious. I'm hoping there is a programmatic way to create something similar.
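There is no answer to this one in the thread, but as a rough, hypothetical starting point in Python (one flat heatmap per pair rather than the Y-shaped layout described above), each pairwise table can be pivoted into a grid and plotted with pandas and matplotlib:
import pandas as pd
import matplotlib.pyplot as plt

# Table 1 from above, loaded as a long-format DataFrame
table1 = pd.DataFrame({
    "A": [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "B": ["x", "x", "x", "y", "y", "y", "z", "z", "z"],
    "Score": [3, 5, 0, 2, 6, 0, 5, 8, 0],
})

# Pivot the long table into an A-by-B grid of scores
grid = table1.pivot(index="A", columns="B", values="Score")

# Render the grid as a heatmap; repeat for the (A, C) and (B, C) tables
fig, ax = plt.subplots()
im = ax.imshow(grid.values, cmap="viridis")
ax.set_xticks(range(len(grid.columns)))
ax.set_xticklabels(grid.columns)
ax.set_yticks(range(len(grid.index)))
ax.set_yticklabels(grid.index)
ax.set_xlabel("B")
ax.set_ylabel("A")
fig.colorbar(im, ax=ax, label="Score")
plt.show()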

Group By - but sum one column, and show original columns

I have a 5-column df. I need to group by the common names in column A and sum columns B and D, but I also need to keep the values that currently sit in columns C through E.
Every time I groupby, it drops the columns not involved in the grouping.
I understand some columns will have two differing values for a common item in column A, and I need to display both of those values. Hopefully an example illustrates the problem better.
A       B   C       D  E
Apple   10  Green   1  X
Pear    15  Brown   2  Y
Pear    5   Yellow  3  Z
Banana  4   Yellow  4  P
Plum    2   Red     5  R
I'd like to output:
A       B   C       D  E
Apple   10  Green   1  X
Pear    20  Brown   5  Y
            Yellow     Z
Banana  4   Yellow  4  P
Plum    2   Red     5  R
I can't seem to find the right combination within the groupby function:
df_save = df_orig.loc[:, ["A", "C", "E"]]                                   # keep the columns that are not being summed
df_agg = df_orig.groupby("A").agg({"B": "sum", "D": "sum"}).reset_index()   # sum B and D per value of A
df_merged = df_save.merge(df_agg)                                           # attach the group sums to every original row
for c in ["B", "D"]:
    df_merged.loc[df_merged[c].duplicated(), c] = ''                        # show each duplicated sum value only once
A       C       E  B    D
Apple   Green   X  10   1
Pear    Brown   Y  155  23
Pear    Yellow  Z
Banana  Yellow  P  4    4
Plum    Red     R  2    5
The above is the output after the operations. I hope this works. Thanks
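A shorter alternative (my own sketch, not part of the original answer) keeps the original column order by writing the group sums back with groupby().transform and then blanking the repeats within each group:
import pandas as pd

df_orig = pd.DataFrame({
    "A": ["Apple", "Pear", "Pear", "Banana", "Plum"],
    "B": [10, 15, 5, 4, 2],
    "C": ["Green", "Brown", "Yellow", "Yellow", "Red"],
    "D": [1, 2, 3, 4, 5],
    "E": ["X", "Y", "Z", "P", "R"],
})

out = df_orig.copy()
# Replace B and D with their per-group sums, broadcast back onto every row
out[["B", "D"]] = out.groupby("A")[["B", "D"]].transform("sum")
# Cast to object so the blanks below don't fight the integer dtype
out[["B", "D"]] = out[["B", "D"]].astype(object)
# Show each group's sums only on the group's first row
out.loc[out.duplicated(subset=["A"]), ["B", "D"]] = ""
# Pear now shows B=20, D=5 on its first row and blanks on its second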

Filter rows based on the count of unique values

I need to count the occurrences of each value in column A and keep only the rows whose value appears at least, say, 2 times.
A C
Apple 4
Orange 5
Apple 3
Mango 5
Orange 1
I have calculated the counts with df.value_counts() but am not able to figure out how to filter on them.
I want to keep the values of column A that occur 2 or more times; expected DataFrame:
A C
Apple 4
Orange 5
Apple 3
Orange 1
value_counts should be called on a Series (single column) rather than a DataFrame:
counts = df['A'].value_counts()
Giving:
A
Apple 2
Mango 1
Orange 2
dtype: int64
You can then filter this to only keep those >= 2 and use isin to filter your DataFrame:
filtered = counts[counts >= 2]
df[df['A'].isin(filtered.index)]
Giving:
A C
0 Apple 4
1 Orange 5
2 Apple 3
4 Orange 1
Use duplicated with parameter keep=False:
df[df.duplicated(['A'], keep=False)]
Output:
A C
0 Apple 4
1 Orange 5
2 Apple 3
4 Orange 1
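Another option, not mentioned in the answers above, is groupby().filter, which keeps only the rows belonging to groups that satisfy a condition:
import pandas as pd

df = pd.DataFrame({"A": ["Apple", "Orange", "Apple", "Mango", "Orange"],
                   "C": [4, 5, 3, 5, 1]})

# Keep only the rows whose value in column A occurs at least twice
filtered = df.groupby("A").filter(lambda g: len(g) >= 2)
print(filtered)
#         A  C
# 0   Apple  4
# 1  Orange  5
# 2   Apple  3
# 4  Orange  1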

Merging two sheets of one excel into single sheet

I am trying to merge 2 sheets from excel.xlsx using a Python script. When Sheet1's CLASS matches Sheet2's C_MAP, I want to merge DSC and ASC in after CLASS, either in Sheet1 or in a new sheet.
To clarify, I am attaching my excel sheets.
this is my Sheet1:
P_MAP Q_GROUP CLASS
0 ram 2 pink
1 4 silver
2 sham 5 green
3 0 default
4 nil 2 pink
It contains P_MAP, Q_GROUP, and CLASS.
this is my Sheet2:
C_MAP DSC ASC
0 pink h1 match
1 green h2 match
2 silver h3 match
It contains C_MAP, DSC, and ASC.
So, when CLASS matches C_MAP it should add ASC and DSC, and if it doesn't match, add NA.
The output I want will be like this:
P_MAP Q_GROUP CLASS DSC ASC
0 ram 2 pink h1 match
1 4 silver h3 match
2 sham 5 green h2 match
3 0 default 0 NA
4 nil 2 pink h1 match
What you want is pd.merge:
import pandas as pd

df1 = pd.read_excel('filename.xlsx', sheet_name='Sheet1')  # fill in the correct excel filename
df2 = pd.read_excel('filename.xlsx', sheet_name='Sheet2')  # fill in the correct excel filename
df_final = df1.merge(df2,
                     left_on='CLASS',
                     right_on='C_MAP',
                     how='left').drop('C_MAP', axis=1)
df_final.to_excel('filename2.xlsx')
Output
P_MAP Q_GROUP CLASS DSC ASC
0 ram 2 pink h1 match
1 4 silver h3 match
2 sham 5 green h2 match
3 0 default NaN NaN
4 nil 2 pink h1 match
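If the literal string NA is wanted for non-matching rows (as in the asker's desired output) rather than NaN, one extra line on the df_final from the snippet above, before writing the file, would be:
df_final[['DSC', 'ASC']] = df_final[['DSC', 'ASC']].fillna('NA')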

Append Two Dataframes Together (Pandas, Python3)

I am trying to append/join(?) two different dataframes together that don't share any overlapping data.
DF1 looks like
Teams Points
Red 2
Green 1
Orange 3
Yellow 4
....
Brown 6
and DF2 looks like
Area Miles
2 3
1 2
....
7 12
I am trying to append these together using
bigdata = df1.append(df2,ignore_index = True).reset_index()
but I get this
Teams Points
Red 2
Green 1
Orange 3
Yellow 4
Area Miles
2 3
1 2
How do I get something like this?
Teams Points Area Miles
Red 2 2 3
Green 1 1 2
Orange 3
Yellow 4
EDIT: In regard to EdChum's answers, I have tried merge and join, but each creates a somewhat strange table. Instead of what I am looking for (as listed above), it returns something like this:
Teams Points Area Miles
Red 2 2 3
Green 1
Orange 3 1 2
Yellow 4
Use concat and pass param axis=1:
In [4]:
pd.concat([df1,df2], axis=1)
Out[4]:
Teams Points Area Miles
0 Red 2 2 3
1 Green 1 1 2
2 Orange 3 NaN NaN
3 Yellow 4 NaN NaN
join also works:
In [8]:
df1.join(df2)
Out[8]:
Teams Points Area Miles
0 Red 2 2 3
1 Green 1 1 2
2 Orange 3 NaN NaN
3 Yellow 4 NaN NaN
As does merge:
In [11]:
df1.merge(df2,left_index=True, right_index=True, how='left')
Out[11]:
Teams Points Area Miles
0 Red 2 2 3
1 Green 1 1 2
2 Orange 3 NaN NaN
3 Yellow 4 NaN NaN
EDIT
In the case where the indices do not align, for example where your first df has index [0,1,2,3] and your second df has index [0,2], the above operations will naturally align against the first df's index, resulting in NaN values for index row 1. To fix this you can reindex the second df, either by calling reset_index() or by assigning directly, like so: df2.index = [0,1].
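A small sketch of that index fix (with made-up example data, not from the thread):
import pandas as pd

df1 = pd.DataFrame({"Teams": ["Red", "Green", "Orange", "Yellow"],
                    "Points": [2, 1, 3, 4]})
# Second frame with a non-contiguous index, e.g. left over from filtering rows
df2 = pd.DataFrame({"Area": [2, 1], "Miles": [3, 2]}, index=[0, 2])

# Reset the index so the rows line up as 0, 1 before concatenating side by side
df2 = df2.reset_index(drop=True)
combined = pd.concat([df1, df2], axis=1)
print(combined)
#     Teams  Points  Area  Miles
# 0     Red       2   2.0    3.0
# 1   Green       1   1.0    2.0
# 2  Orange       3   NaN    NaN
# 3  Yellow       4   NaN    NaN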
