Pandas Add Column Index Level to Data Frame - python-3.x

Given the following data frame:
d2=pd.DataFrame({'Item':['y','y','z','x'],
'other':['aa','bb','cc','dd']})
d2
Item other
0 y aa
1 y bb
2 z cc
3 x dd
I'd like to add a column index level 1 under the existing one (I think), because I want to join this data frame to another one that has a MultiIndex.
I don't want to alter the other data frame because I have already written a lot of code assuming its current structure.
Thanks in advance!

IIUC, you can pass append=True to set_index:
print (d2.set_index('Item', append=True))
other
Item
0 y aa
1 y bb
2 z cc
3 x dd
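The answer above adds a second row level. If the goal is literally a second column level, so that d2 lines up with a frame whose columns are a MultiIndex, one option is to rebuild d2.columns with pd.MultiIndex.from_product. This is a minimal sketch; the empty-string sub-label is an assumption, so use whatever label the other frame has at level 1.

```python
import pandas as pd

d2 = pd.DataFrame({'Item': ['y', 'y', 'z', 'x'],
                   'other': ['aa', 'bb', 'cc', 'dd']})

# Row-level variant from the answer: 'Item' becomes a second level of the row index
rows = d2.set_index('Item', append=True)

# Column-level variant: pair each existing column name with an (assumed) empty
# sub-label, giving a two-level column MultiIndex
cols = d2.copy()
cols.columns = pd.MultiIndex.from_product([cols.columns, ['']])
print(cols.columns.tolist())  # [('Item', ''), ('other', '')]
```

pd.concat({'top': d2}, axis=1) also adds a level, but it puts the new label above the existing column names rather than below them.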

Related

How to turn a column of a data frame into suffixes for other column names? [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 1 year ago.
Suppose I have a data frame like this:
A B C D
0 1 10 x 5
1 1 20 y 5
2 1 30 z 5
3 2 40 x 6
4 2 50 y 6
5 2 60 z 6
This can be viewed as a table that stores the value of B as a function of A, C, and D. Now, I would like to transform the B column into three columns, B_x, B_y, and B_z, like this:
A B_x B_y B_z D
0 1 10 20 30 5
1 2 40 50 60 6
I.e., B_x stores B(A, D) when C = 'x', B_y stores B(A, D) when C = 'y', etc.
What is the most efficient way to do this?
I have found a solution like this:
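frames = []
for c, subframe in df.groupby('C'):
    # Rename B to B_<c> and index by the columns that should not get a suffix
    subframe = subframe.rename(columns={'B': f'B_{c}'})
    subframe = subframe.set_index(['A', 'D'])
    del subframe['C']
    frames.append(subframe)
# Join the per-category frames side by side on the (A, D) index
out = frames[0]
for frame in frames[1:]:
    out = out.join(frame)
out = out.reset_index()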
This gives the correct result, but I feel that it is highly inefficient. I am also not too happy that, to implement this solution, one needs to list explicitly the columns that should not get a suffix from column C (the ones passed to set_index). (In this MWE there are only two of them, A and D, but there could be tens in real life.)
Is there a better solution? That is, a method that says: take a column as the suffix column (in this case C) and a set of 'value' columns (in this case only B), turn the value column names into name_suffix, and fill them appropriately?
Here's one way to do it:
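import pandas as pd
df = pd.DataFrame(data={'A': [1, 1, 1, 2, 2, 2],
                        'B': [10, 20, 30, 40, 50, 60],
                        'C': ['x', 'y', 'z', 'x', 'y', 'z'],
                        'D': [5, 5, 5, 6, 6, 6]})
# One row per (A, D) pair, one column per (value, C) pair such as ('B', 'x');
# pivot_table averages any duplicates (aggfunc='mean' by default)
df2 = df.pivot_table(index=['A', 'D'],
                     columns=['C'],
                     values=['B'])
# Flatten the MultiIndex columns: ('B', 'x') -> 'B_x'
df2.columns = ['_'.join(col) for col in df2.columns.values]
df2 = df2.reset_index()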
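To address the concern that the non-suffixed columns must be listed by hand, here is a sketch of a hypothetical helper, spread_suffix, that treats every column other than the key and value columns as the index and then calls DataFrame.pivot. It assumes pandas 1.1 or later (for a list-valued index in pivot) and that each index/key combination occurs only once; with duplicates, pivot raises and pivot_table with an aggfunc is needed instead.

```python
import pandas as pd


def spread_suffix(df, key, values):
    """Hypothetical helper: widen `values` by the categories in `key` as value_key."""
    # Every other column becomes part of the index, so it need not be listed by hand
    index_cols = [c for c in df.columns if c != key and c not in values]
    # Passing a list to `values` always yields (value, key) column tuples
    wide = df.pivot(index=index_cols, columns=key, values=values)
    # Flatten ('B', 'x') -> 'B_x'; the f-string also handles non-string keys
    wide.columns = [f'{v}_{k}' for v, k in wide.columns]
    return wide.reset_index()


df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2],
                   'B': [10, 20, 30, 40, 50, 60],
                   'C': ['x', 'y', 'z', 'x', 'y', 'z'],
                   'D': [5, 5, 5, 6, 6, 6]})
print(spread_suffix(df, 'C', ['B']))
```

Several value columns can be passed at once, e.g. spread_suffix(df, 'C', ['B', 'E']) for a hypothetical extra column E.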

Pandas, DataFrame unique values from few columns [duplicate]

This question already has an answer here:
Get total values_count from a dataframe with Python Pandas
(1 answer)
Closed 4 years ago.
I am trying to count unique values across several columns. My data frame looks like this:
Name Name.1 Name.2 Name.3
x z c y
y p q x
q p a y
The output should look like this:
x 2
z 1
c 1
y 3
q 2
p 2
a 1
I tried groupby and value_counts but couldn't get the correct output. Any ideas? Thanks, all!
It seems you want to count values regardless of their row or column position. In that case, flatten the dataframe and use Counter:
import numpy as np
from collections import Counter
arr = np.array(df)
# Flatten the 2-D array so every cell is counted once
count = Counter(arr.reshape(arr.size))
Another (pandas-based) approach is to apply pd.Series.value_counts to each column and then sum the counts across columns (axis=1):
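# One column of counts per original column, NaN where a value is absent
df2 = df.apply(pd.Series.value_counts)
# sum skips NaN, so each row total is that value's overall count
print(df2.sum(axis=1).astype(int))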
a 1
c 1
p 2
q 2
x 2
y 3
z 1
dtype: int32
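A shorter variant of the first idea that stays within pandas: flatten the underlying array into a single Series and call value_counts once. This sketch rebuilds the question's sample frame; flattening with to_numpy().ravel() stands in for the Counter step and is not taken from either answer.

```python
import pandas as pd

df = pd.DataFrame({'Name':   ['x', 'y', 'q'],
                   'Name.1': ['z', 'p', 'p'],
                   'Name.2': ['c', 'q', 'a'],
                   'Name.3': ['y', 'x', 'y']})

# Flatten all cells (row by row) into one Series, then count each distinct value
counts = pd.Series(df.to_numpy().ravel()).value_counts()
print(counts.sort_index())
```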

Replacing less frequent values with Others

In one of the columns of a data frame df, namely "Name", I have the following data:
Sample Input:
Name
A
A
A
B
B
C
D
df['Name'].value_counts()
A 3
B 2
C 1
D 1
I need the output in the following format:
Expected output:
A 3
B 2
Others 2
Any code in python3 is appreciated. Thanks in advance.
You need:
import numpy as np
x = list(df['Name'].value_counts().head(2).index)  # fetch the top N (here 2) values
# Keep the top-N values and replace everything else with 'Others'
df['Name'] = np.where(df['Name'].isin(x), df['Name'], 'Others')
print(df['Name'].value_counts())
Output:
A 3
B 2
Others 2
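The same idea can be written with Series.where, which keeps values where the condition holds and substitutes 'Others' elsewhere, so NumPy is not needed. A minimal sketch on the sample data; top_n is an illustrative name, and the cut-off of 2 is taken from the expected output.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'A', 'A', 'B', 'B', 'C', 'D']})

top_n = 2  # illustrative cut-off: keep the 2 most frequent values
keep = df['Name'].value_counts().head(top_n).index
# Keep values in the top-N list; replace everything else with 'Others'
df['Name'] = df['Name'].where(df['Name'].isin(keep), 'Others')
print(df['Name'].value_counts())
```

If several values tie at the cut-off, which one survives depends on the order value_counts returns them in.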

Assigning variables to cells in a Pandas table (Python)

I'm working on a script that takes test data from a website, assigns the data to a variable, then creates a pie chart of the responses for later analysis. I'm able to pull the data without a problem and format the information into a table, but I can't figure out how to assign a specific variable to a cell in the table.
For example, say 20% of students answered A to question 1, 20% answered B, 30% answered C, and 30% answered D. I would like to take this information and assign it to variables: 1A for A, 1B for B, etc.
I think the answer lies in this code. I've tried splitting columns and rows, but it looks like the column header doesn't correlate to the data below it. I'm also attaching the results of 'print(df)' below.
header = table.find_all('tr')[2]
cols = header.find_all('td')
cols = [ele.text.strip() for ele in cols]
cols = cols[0:3] + cols[4:8] + cols[9:]
df = pd.DataFrame(data, columns = cols)
print(df)
A/1 B/2 C/3 D/4 CORRECT MC ANSWER
0 6 84 1 9 B
1 6 1 91 2 C
2 12 1 14 72 D
3 77 3 11 9 A
4 82 7 8 2 A
Do you want to try something like this, using autopct?
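# Transpose so each question becomes a column, keep the four answer rows, and
# convert to numbers (the transposed columns are object dtype because of the answer row)
df1 = df.T.set_axis(['Question ' + str(i + 1) for i in df.T.columns.values], axis=1).iloc[:4].astype(float)
ax = df1.plot.pie(subplots=True, autopct='%1.1f%%', layout=(5, 1), figsize=(3, 15), legend=False)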
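On the first part of the question: names such as 1A are not valid Python identifiers, so one option is a dictionary keyed by those labels. This sketch assumes the table has the layout printed above (rows are questions, the first four columns are answer percentages); the sample values are copied from that output, and the key format is illustrative.

```python
import pandas as pd

# Sample data copied from the print(df) output in the question
df = pd.DataFrame({'A/1': [6, 6, 12, 77, 82],
                   'B/2': [84, 1, 1, 3, 7],
                   'C/3': [1, 91, 14, 11, 8],
                   'D/4': [9, 2, 72, 9, 2],
                   'CORRECT MC ANSWER': ['B', 'C', 'D', 'A', 'A']})

answer_cols = ['A/1', 'B/2', 'C/3', 'D/4']
# Keys like '1A' combine the 1-based question number with the option letter
pct = {f'{row + 1}{col[0]}': df.at[row, col]
       for row in df.index
       for col in answer_cols}
print(pct['1A'], pct['1B'])
```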

Python pandas: Weird index value

I have posted a similar thread, but now I have another angle to explore. After doing a covariance analysis between X and Z, grouped by 2 different levels, I get a DataFrame like this:
index X Z
(1,1,'X') 2.3 0
...
The two 1s are the values of the two grouping levels (they could just as well have been 1 and 2; the levels have 5 and 10 distinct values, respectively).
Now I would like to extract each 'element' of the index and get something like:
index X Z H1 H2 H3
(1,1,'X') 2.3 0 1 1 X
...
I read a few posts on slicing and dicing strings, but this is not a normal string, is it?
Cheers
(1,1,'X') isn't a string here; it's a tuple.
So you need to split the tuple into multiple columns. You can do this with apply(pd.Series).
Say your dataframe is df:
In [10]: df['index'].apply(pd.Series)
Out[10]:
0 1 2
0 1 1 X
Then add the columns back to the original data frame:
df[['H1', 'H2', 'H3']] = df['index'].apply(pd.Series)
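A sketch of two alternatives, depending on where the tuples live. If they are in a column, building a DataFrame from the list of tuples avoids calling pd.Series on every row. If they are actually the row index (a MultiIndex, as groupby often produces), MultiIndex.to_frame gives one column per level. The toy frame is made up for illustration: only the first row comes from the question, and the column name 'index' follows the answer's assumption.

```python
import pandas as pd

# Toy frame: a column named 'index' that holds tuples (second row is invented)
df = pd.DataFrame({'index': [(1, 1, 'X'), (1, 2, 'X')],
                   'X': [2.3, 1.7],
                   'Z': [0.0, 0.4]})

# Build all three columns at once; index=df.index keeps the rows aligned for join
parts = pd.DataFrame(df['index'].tolist(), index=df.index,
                     columns=['H1', 'H2', 'H3'])
df = df.join(parts)

# If the tuples are the row index itself, MultiIndex.to_frame returns
# one column per level instead
mi = pd.MultiIndex.from_tuples([(1, 1, 'X'), (1, 2, 'X')])
levels = mi.to_frame(index=False, name=['H1', 'H2', 'H3'])
print(df)
print(levels)
```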
