Merging two sheets of one excel into single sheet - excel

I am trying to merge two sheets from excel.xlsx using a Python script. Where CLASS in Sheet1 matches C_MAP in Sheet2, I want to add DSC and ASC after CLASS, either in Sheet1 or in a new sheet.
To clarify, here are my Excel sheets.
This is my Sheet1:
P_MAP Q_GROUP CLASS
0 ram 2 pink
1 4 silver
2 sham 5 green
3 0 default
4 nil 2 pink
It contains P_MAP, Q_GROUP, CLASS.
This is my Sheet2:
C_MAP DSC ASC
0 pink h1 match
1 green h2 match
2 silver h3 match
It contains C_MAP, DSC, ASC.
So when CLASS matches C_MAP, DSC and ASC should be added; if it doesn't match, NA should be added instead.
The output I want looks like this:
P_MAP Q_GROUP CLASS DSC ASC
0 ram 2 pink h1 match
1 4 silver h3 match
2 sham 5 green h2 match
3 0 default 0 NA
4 nil 2 pink h1 match

What you want is pd.merge:
import pandas as pd
df1 = pd.read_excel('filename.xlsx', sheet_name='Sheet1')  # fill in the correct Excel filename
df2 = pd.read_excel('filename.xlsx', sheet_name='Sheet2')  # fill in the correct Excel filename
# A left join keeps every row of Sheet1; unmatched CLASS values get NaN in DSC and ASC
df_final = df1.merge(df2,
                     left_on='CLASS',
                     right_on='C_MAP',
                     how='left').drop('C_MAP', axis=1)
df_final.to_excel('filename2.xlsx')
Output
P_MAP Q_GROUP CLASS DSC ASC
0 ram 2 pink h1 match
1 4 silver h3 match
2 sham 5 green h2 match
3 0 default NaN NaN
4 nil 2 pink h1 match
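To try the merge without an Excel file at hand, here is a minimal sketch that rebuilds the two sheets in memory. The frame names sheet1 and sheet2 are placeholders, and the empty P_MAP cells are assumed to be missing values. Since the question asks for NA rather than NaN in unmatched rows, the last step fills the two added columns with the string 'NA'; that step is an optional addition, not part of the answer above.

```python
import pandas as pd

# In-memory stand-ins for the two sheets (empty P_MAP cells assumed to be missing)
sheet1 = pd.DataFrame({
    "P_MAP": ["ram", None, "sham", None, "nil"],
    "Q_GROUP": [2, 4, 5, 0, 2],
    "CLASS": ["pink", "silver", "green", "default", "pink"],
})
sheet2 = pd.DataFrame({
    "C_MAP": ["pink", "green", "silver"],
    "DSC": ["h1", "h2", "h3"],
    "ASC": ["match", "match", "match"],
})

# Left join on CLASS == C_MAP, then drop the now-redundant key column
merged = sheet1.merge(
    sheet2, left_on="CLASS", right_on="C_MAP", how="left"
).drop(columns="C_MAP")

# Optional: replace the NaN placeholders in the added columns with the text 'NA'
merged[["DSC", "ASC"]] = merged[["DSC", "ASC"]].fillna("NA")

print(merged)
```

The left join preserves the row order of the first sheet, so the result lines up with the original rows.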

Related

Pandas create a new data frame from counting rows into columns

I have something like this data frame:
item color
0 A red
1 A red
2 A green
3 B red
4 B green
5 B green
6 C red
7 C green
I want to count how many times each color occurs for each item and spread the counts into columns, like this:
item red green
0 A 2 1
1 B 1 2
2 C 1 1
Any thoughts? Thanks in advance.
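One possible approach, sketched here as a suggestion since no answer is recorded above, is pd.crosstab, which counts co-occurrences of two columns. The variable names are illustrative. crosstab sorts the new columns alphabetically, so they are reordered explicitly to match the requested layout.

```python
import pandas as pd

df = pd.DataFrame({
    "item": ["A", "A", "A", "B", "B", "B", "C", "C"],
    "color": ["red", "red", "green", "red", "green", "green", "red", "green"],
})

# Count how often each color occurs per item (rows: item, columns: color)
counts = pd.crosstab(df["item"], df["color"])

# Reorder the columns to red, green and turn the item index back into a column
counts = counts[["red", "green"]].reset_index()
counts.columns.name = None  # drop the leftover 'color' label on the column axis

print(counts)
```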

Filter rows based on the count of unique values

I need to count the occurrences of each value in column A and keep only the rows whose value occurs at least, say, 2 times.
A C
Apple 4
Orange 5
Apple 3
Mango 5
Orange 1
I have calculated the unique values but am not able to figure out how to filter them with df.value_count().
I want to keep the rows whose value in column A occurs at least 2 times. Expected DataFrame:
A C
Apple 4
Orange 5
Apple 3
Orange 1
value_counts should be called on a Series (single column) rather than a DataFrame:
counts = df['A'].value_counts()
Giving:
A
Apple 2
Mango 1
Orange 2
dtype: int64
You can then filter this to only keep those >= 2 and use isin to filter your DataFrame:
filtered = counts[counts >= 2]
df[df['A'].isin(filtered.index)]
Giving:
A C
0 Apple 4
1 Orange 5
2 Apple 3
4 Orange 1
Alternatively, use duplicated with parameter keep=False, which marks every row whose value in A appears more than once:
df[df.duplicated(['A'], keep=False)]
Output:
A C
0 Apple 4
1 Orange 5
2 Apple 3
4 Orange 1
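Note that duplicated only covers a threshold of 2. For an arbitrary threshold, one option (a sketch, not taken from the answers above) is to broadcast each group's size back onto its rows with groupby and transform. The name min_count is a placeholder for the threshold.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["Apple", "Orange", "Apple", "Mango", "Orange"],
    "C": [4, 5, 3, 5, 1],
})

min_count = 2  # hypothetical threshold; raise it to require more occurrences

# For every row, the number of rows that share its value in column A
group_sizes = df.groupby("A")["A"].transform("size")

# Keep only the rows whose value occurs at least min_count times
kept = df[group_sizes >= min_count]

# With a threshold of 2 this matches the duplicated(keep=False) approach
same_as_duplicated = kept.equals(df[df.duplicated(["A"], keep=False)])

print(kept)
```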

Pandas dependent columns lookup

I have a dataset that has 2 conditions, 2 replicates and samples with corresponding values (amounts). I read this into a pandas dataframe:
condition replicate sample amount
0 1 1 a1 5
1 1 1 a2 2
2 1 2 a1 3
3 1 2 a2 1
4 2 1 b99 7
5 2 1 a2 4
6 2 2 a1 3
7 2 2 a2 2
I want to divide the amount from every sample in condition 1, by the amount from the corresponding sample in condition 2, if they belong to the same replicate (and have the same sample name).
In other words, I want to find the ratio between the amounts where the sample names and replicate numbers match between the conditions.
In this example, the output should be something like:
replicate sample amount
0 1 a1 0.714286
1 1 a2 NaN
2 2 a1 1.000000
3 2 a2 0.500000
I would like advice on whether I should structure my data differently and whether pandas DataFrames are a good fit. Can anyone think of an elegant lookup solution?
You can use unstack to turn the conditions into columns, then divide the columns, and finally remove the rows that are all NaN with dropna:
df = df.set_index(['sample','replicate','condition'])['amount'].unstack()
df['new'] = df[1].div(df[2])
df = df['new'].unstack().dropna(how='all').stack(dropna=False).reset_index(name='amount')
print(df)
sample replicate amount
0 a1 1 NaN
1 a1 2 1.0
2 a2 1 0.5
3 a2 2 0.5
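The same lookup can also be written as a merge, which some readers may find easier to follow. This is an alternative sketch, not the answerer's method: split the frame by condition, join the two halves on replicate and sample, and divide. The suffixes _1 and _2 and the column name ratio are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    "condition": [1, 1, 1, 1, 2, 2, 2, 2],
    "replicate": [1, 1, 2, 2, 1, 1, 2, 2],
    "sample": ["a1", "a2", "a1", "a2", "b99", "a2", "a1", "a2"],
    "amount": [5, 2, 3, 1, 7, 4, 3, 2],
})

cond1 = df[df["condition"] == 1]
cond2 = df[df["condition"] == 2]

# Pair each condition-1 row with the condition-2 row of the same replicate and sample;
# a left join keeps condition-1 rows that have no partner (their ratio becomes NaN)
merged = cond1.merge(
    cond2, on=["replicate", "sample"], how="left", suffixes=("_1", "_2")
)
merged["ratio"] = merged["amount_1"] / merged["amount_2"]

result = merged[["replicate", "sample", "ratio"]]
print(result)
```

For the sample data, replicate 1 has no a1 in condition 2, so that row has no ratio, which agrees with the NaN in the answer's output.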

using index match with sum if

I need to link up a SUMIF() with an INDEX/MATCH (I'm guessing here) but don't really know where to start.
Basically I have a table with different classes of pets, their species and quantity. There are 3 stores. I need an output where I can get the quantity of each species from each store dynamically.
data table:
"A1" Pet Stores
Species Class a b c
cat Fluffy1 1 0 0
cat Fluffy2 3 0 0
cat Fluffy3 5 7 1
cat Fluffy4 6 0 7
dog Barky1 7 6 9
dog Barky2 1 3 9
dog Barky3 0 2 8
dog Barky4 0 2 3
fish Swimmy1 0 0 0
fish Swimmy2 1 3 0
fish Swimmy3 0 2 3
fish Swimmy4 0 0 0
Output:
Pet Store a <--change this
cat 15 <--output
dog 8 <--output
fish 1 <--output
Right now my formula for "cat" is =SUMIF($A$3:$A$14,A17,$C$3:$C$14). However, it only looks down the one column that I've set. How do I change it so that it looks up the "Pet Store" and returns the sum of the respective column?
How about this:
Formula in cell H3 copied down is
=SUMIF($A$2:$A$13,G3,INDEX($C$2:$E$13,,MATCH(H$2,$C$1:$E$1,0)))
Slightly shorter than @teylyn's version:
=SUMIF(A$2:A$13,A16,OFFSET(C$2:C$13,,CODE(B$15)-97))
but less versatile, as it relies on the shop names being coded (which, however, is the case in the example and makes sense for column label purposes).
However, my preference would be for a PivotTable.
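For readers who want to cross-check the expected totals outside Excel, the same conditional sum can be sketched in pandas, the library used elsewhere on this page. This is not a replacement for the formulas above. The frame name stock and the helper totals_for_store are hypothetical, and the data is typed in from the table in the question.

```python
import pandas as pd

# The data table from the question, one column per store
stock = pd.DataFrame({
    "Species": ["cat"] * 4 + ["dog"] * 4 + ["fish"] * 4,
    "Class": ["Fluffy1", "Fluffy2", "Fluffy3", "Fluffy4",
              "Barky1", "Barky2", "Barky3", "Barky4",
              "Swimmy1", "Swimmy2", "Swimmy3", "Swimmy4"],
    "a": [1, 3, 5, 6, 7, 1, 0, 0, 0, 1, 0, 0],
    "b": [0, 0, 7, 0, 6, 3, 2, 2, 0, 3, 2, 0],
    "c": [0, 0, 1, 7, 9, 9, 8, 3, 0, 0, 3, 0],
})

def totals_for_store(frame, store):
    """Sum the chosen store column per species (the role SUMIF plays per row)."""
    return frame.groupby("Species")[store].sum()

# Choosing the column by name mirrors what MATCH does on the header row
totals_a = totals_for_store(stock, "a")
print(totals_a)
```

Changing the second argument to "b" or "c" switches stores, just as changing the "Pet Store" cell does in the spreadsheet.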

Append Two Dataframes Together (Pandas, Python3)

I am trying to append/join(?) two different dataframes together that don't share any overlapping data.
DF1 looks like
Teams Points
Red 2
Green 1
Orange 3
Yellow 4
....
Brown 6
and DF2 looks like
Area Miles
2 3
1 2
....
7 12
I am trying to append these together using
bigdata = df1.append(df2,ignore_index = True).reset_index()
but I get this
Teams Points
Red 2
Green 1
Orange 3
Yellow 4
Area Miles
2 3
1 2
How do I get something like this?
Teams Points Area Miles
Red 2 2 3
Green 1 1 2
Orange 3
Yellow 4
EDIT: Regarding Edchum's answers, I have tried merge and join, but each creates somewhat strange tables. Instead of what I am looking for (as listed above), it returns something like this:
Teams Points Area Miles
Red 2 2 3
Green 1
Orange 3 1 2
Yellow 4
Use concat and pass param axis=1:
In [4]:
pd.concat([df1,df2], axis=1)
Out[4]:
Teams Points Area Miles
0 Red 2 2 3
1 Green 1 1 2
2 Orange 3 NaN NaN
3 Yellow 4 NaN NaN
join also works:
In [8]:
df1.join(df2)
Out[8]:
Teams Points Area Miles
0 Red 2 2 3
1 Green 1 1 2
2 Orange 3 NaN NaN
3 Yellow 4 NaN NaN
As does merge:
In [11]:
df1.merge(df2,left_index=True, right_index=True, how='left')
Out[11]:
Teams Points Area Miles
0 Red 2 2 3
1 Green 1 1 2
2 Orange 3 NaN NaN
3 Yellow 4 NaN NaN
EDIT
If the indices do not align, for example when your first df has index [0,1,2,3] and your second df has index [0,2], the above operations align against the first df's index, so the row with index 1 gets NaN in the second df's columns. To fix this, reindex the second df, either by calling reset_index(drop=True) (drop=True avoids adding the old index as a column) or by assigning directly, like so: df2.index = [0, 1].
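A small sketch of the misalignment described in the EDIT, using made-up frames that follow the question's columns. The index [0, 2] on df2 is assumed purely for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Teams": ["Red", "Green", "Orange", "Yellow"],
    "Points": [2, 1, 3, 4],
})
# df2 carries a non-contiguous index, e.g. left over from earlier filtering
df2 = pd.DataFrame({"Area": [2, 1], "Miles": [3, 2]}, index=[0, 2])

# concat aligns on index labels, so df2's second row lands next to 'Orange'
misaligned = pd.concat([df1, df2], axis=1)

# Resetting df2's index to 0..n-1 first places its rows next to the first rows of df1
aligned = pd.concat([df1, df2.reset_index(drop=True)], axis=1)

print(misaligned)
print(aligned)
```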
