Split pandas columns into two with column MultiIndex - python-3.x

I need to split DataFrame columns into two and add an additional value to the new column. The twist is that I need to lift the original column names up one level and add two new column names.
Given a DataFrame h:
>>> import pandas as pd
>>> h = pd.DataFrame({'a': [0.6, 0.4, 0.1], 'b': [0.2, 0.4, 0.7]})
>>> h
     a    b
0  0.6  0.2
1  0.4  0.4
2  0.1  0.7
The result should look like this:
>>> # some stuff...
         a                 b
  expected received expected received
0      0.6        1      0.2        1
1      0.4        1      0.4        1
2      0.1        1      0.7        1
I've tried this:
>>> h['a1'] = [1, 1, 1]
>>> h['b1'] = [1, 1, 1]
>>> t = [('f', 'expected'),('f', 'received'), ('g', 'expected'), ('g', 'received')]
>>> h.columns = pd.MultiIndex.from_tuples(t)
>>> h
         f                 g
  expected received expected received
0      0.6      0.2        1        1
1      0.4      0.4        1        1
2      0.1      0.7        1        1
This only relabels the columns by position; it does not pair each original column with its new counterpart. I think the issue is that nothing links a1 and b1 to the received columns of a and b.
How do I lift the original column names up one level and add two new column names?

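I am using concat with keys, then swaplevel:
import pandas as pd

h = pd.DataFrame({'a': [0.6, 0.4, 0.1], 'b': [0.2, 0.4, 0.7]})  # the original h, before a1/b1 were added
h1 = h.copy()
h1[:] = 1  # same shape and labels as h, every value set to 1
(pd.concat([h, h1], keys=['expected', 'received'], axis=1)
   .swaplevel(0, 1, axis=1)
   .sort_index(level=0, axis=1))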
Out[233]:
         a                 b
  expected received expected received
0      0.6      1.0      0.2      1.0
1      0.4      1.0      0.4      1.0
2      0.1      1.0      0.7      1.0
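If the final sort_index step is undesirable (for example, to keep the original column order without relying on sorting), a minimal sketch is to build the two-level columns directly from a dict with tuple keys. The helper name add_received and its value=1 default are hypothetical, with the constant taken from the question's expected output.

```python
import pandas as pd

h = pd.DataFrame({'a': [0.6, 0.4, 0.1], 'b': [0.2, 0.4, 0.7]})


def add_received(df, value=1):
    """Hypothetical helper: turn each column `col` of df into (col, 'expected')
    and follow it with a constant (col, 'received') column."""
    pairs = {}
    for col in df.columns:
        pairs[(col, 'expected')] = df[col]   # original values
        pairs[(col, 'received')] = value     # scalar is broadcast to every row
    # Tuple keys become a two-level column MultiIndex, in insertion order
    return pd.DataFrame(pairs, index=df.index)


result = add_received(h)
print(result)
```

Because the pairs are inserted in the original column order, no swaplevel or sort_index is needed, and the received columns stay integers as in the question's expected output.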

Related

How to use an integer list to find rows in pd.DataFrame with non-integer indices

How can I make this work?
import pandas as pd
L = [1,3,5]
df = pd.DataFrame([1,2,3,4,5,6,7], index=[0.1,0.2,0.3,0.4,0.5,0.6,0.7])
print(df[0])
print(df[0].loc(L))
I would like to have this output format:
0.2 2
0.4 4
0.6 6
I think you want .iloc, because L contains positions rather than index labels:
df.iloc[L]
Out[477]:
     0
0.2  2
0.4  4
0.6  6
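A minimal sketch of both routes on the single column, assuming the same df and L as in the question: .iloc selects by position directly, while .loc needs the positions turned into index labels first (via df.index[L]). Note that .loc is indexed with square brackets, not called with parentheses.

```python
import pandas as pd

L = [1, 3, 5]
df = pd.DataFrame([1, 2, 3, 4, 5, 6, 7],
                  index=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7])

# Positional lookup: L is treated as row positions 1, 3 and 5
by_position = df[0].iloc[L]

# Label lookup: first turn the positions into index labels, then use .loc[...]
labels = df.index[L]
by_label = df[0].loc[labels]

print(by_position)
```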

Pandas: Wide to long transformation: how to get the row and col numbers

Beginner question:
I have a matrix of, let's say, 3x3, and I want to convert it to the long format as follows:
Wide:
     A    B    C
A  0.1  0.2  0.3
B  0.1  0.2  0.3
C  0.1  0.2  0.3
Long:
  Col1 Col2  Row_num  Col_num  Value
0    A    A        1        1    0.1
1    A    B        1        2    0.2
2    A    C        1        3    0.3
...
8    C    C        3        3    0.3
I have tried various functions like melt, unstack() and wide_to_long, but I can't get the column number. What is the best way to do this?
Create the data and unstack the values:
import pandas as pd

df = pd.DataFrame({'A': [0.1, 0.1, 0.1],
                   'B': [0.2, 0.2, 0.2],
                   'C': [0.3, 0.3, 0.3]},
                  index=['A', 'B', 'C'])
mapping = {col: idx for idx, col in enumerate(df.columns, 1)}
df = df.unstack().to_frame().reset_index()
df.columns = ['Col1', 'Col2', 'Value']
DataFrame
>>> df
  Col1 Col2  Value
0    A    A    0.1
1    A    B    0.1
2    A    C    0.1
3    B    A    0.2
4    B    B    0.2
5    B    C    0.2
6    C    A    0.3
7    C    B    0.3
8    C    C    0.3
Map the remaining values:
>>> df.assign(
...     Row_num=df['Col1'].map(mapping),
...     Col_num=df['Col2'].map(mapping)
... )
Output
  Col1 Col2  Value  Row_num  Col_num
0    A    A    0.1        1        1
1    A    B    0.1        1        2
2    A    C    0.1        1        3
3    B    A    0.2        2        1
4    B    B    0.2        2        2
5    B    C    0.2        2        3
6    C    A    0.3        3        1
7    C    B    0.3        3        2
8    C    C    0.3        3        3
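One thing to watch in the answer above: DataFrame.unstack() puts the column label in the outer index level, so Col1 holds the column label and Col2 the row label, and the values come out transposed relative to the question's Long table (the A, B row shows 0.1 where the question expects 0.2). A NumPy-based sketch, not taken from either answer, that keeps the question's (row, column) order and gets both 1-based positions directly from np.indices:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0.1, 0.1, 0.1],
                   'B': [0.2, 0.2, 0.2],
                   'C': [0.3, 0.3, 0.3]},
                  index=['A', 'B', 'C'])

# Row and column positions of every cell, flattened in row-major order
rows, cols = np.indices(df.shape)
rows, cols = rows.ravel(), cols.ravel()

long = pd.DataFrame({
    'Col1': df.index.to_numpy()[rows],     # row label
    'Col2': df.columns.to_numpy()[cols],   # column label
    'Row_num': rows + 1,                   # 1-based row number
    'Col_num': cols + 1,                   # 1-based column number
    'Value': df.to_numpy().ravel(),        # same row-major order as rows/cols
})
print(long)
```

Since rows, cols and the values are all flattened in the same row-major order, every Value lines up with its own row and column number without any label mapping.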
There is surely a more efficient way to do this, since my method uses two nested for loops, but this is a quick and dirty way to get the data into the shape you're looking for:
import pandas as pd

# df is your initial dataframe
df = pd.DataFrame({"A": [1, 1, 1],
                   "B": [2, 2, 2],
                   "C": [3, 3, 3]},
                  index=["A", "B", "C"])
# long_rows will store the data we need for the new df
long_rows = []
# loop through each row
for i in range(len(df)):
    # loop through each column
    for j in range(len(df.columns)):
        ind = df.index[i]
        col = df.columns[j]
        val = df.iloc[i, j]
        row = [ind, col, i + 1, j + 1, val]
        long_rows.append(row)
new_df = pd.DataFrame(long_rows, columns=["Col1", "Col2", "Row1", "Row2", "Value"])
And the result:
new_df
  Col1 Col2  Row1  Row2  Value
0    A    A     1     1      1
1    A    B     1     2      2
2    A    C     1     3      3
3    B    A     2     1      1
4    B    B     2     2      2
5    B    C     2     3      3
6    C    A     3     1      1
7    C    B     3     2      2
8    C    C     3     3      3

How do I get nlargest rows without the sorting?

I need to extract the n-smallest rows of a pandas df, but it is very important to me to maintain the original order of rows.
Code example:
import pandas as pd
df = pd.DataFrame({
'a': [1, 10, 8, 11, -1],
'b': list('abdce'),
'c': [1.0, 2.0, 1.5, 3.0, 4.0]})
df.nsmallest(3, 'a')
Gives:
    a  b    c
4  -1  e  4.0
0   1  a  1.0
2   8  d  1.5
I need:
    a  b    c
0   1  a  1.0
2   8  d  1.5
4  -1  e  4.0
Any ideas how to do that?
PS: In my real example, the index is neither sorted nor sortable, because the labels are strings (names).
The simplest approach, assuming the index was sorted to begin with:
df.nsmallest(3, 'a').sort_index()
    a  b    c
0   1  a  1.0
2   8  d  1.5
4  -1  e  4.0
Alternatively, with np.argpartition and iloc. This doesn't depend on sorting the index:
import numpy as np
df.iloc[np.sort(df.a.values.argpartition(3)[:3])]
    a  b    c
0   1  a  1.0
2   8  d  1.5
4  -1  e  4.0
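Since the real index holds names, here is a minimal sketch of two order-preserving variants that never sort the index, using made-up names as labels (the names are hypothetical). The first lets nsmallest pick the rows and then filters the original frame with a boolean mask, which assumes unique labels; the second works purely on positions with a stable argsort, so duplicate labels are not a problem.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'a': [1, 10, 8, 11, -1],
     'b': list('abdce'),
     'c': [1.0, 2.0, 1.5, 3.0, 4.0]},
    index=['zoe', 'adam', 'mia', 'bob', 'eve'],  # hypothetical names
)
n = 3

# Option 1: nsmallest chooses the rows; the mask keeps them in original order.
# Assumes the index labels are unique.
by_label = df[df.index.isin(df.nsmallest(n, 'a').index)]

# Option 2: positions only. A stable argsort breaks ties by original position,
# and sorting the chosen positions restores the original row order.
positions = np.sort(np.argsort(df['a'].to_numpy(), kind='stable')[:n])
by_position = df.iloc[positions]

print(by_label)
```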

How to replace selected rows of pandas dataframe with a np array, sequentially?

I have a pandas dataframe:
     A  B  C
0  NaN  2  6
1  3.0  4  0
2  NaN  0  4
3  NaN  1  2
where I have a column A that has NaN values in some rows (not necessarily consecutive).
I want to replace these values not with a constant value (which pd.fillna does), but rather with the values from a numpy array.
So the desired outcome is:
     A  B  C
0  1.0  2  6
1  3.0  4  0
2  5.0  0  4
3  7.0  1  2
I don't think the .replace method helps here either, since it seems to map value to value via a dictionary, whereas I want to replace each NaN, in order, with the corresponding element of the NumPy array.
I tried:
MWE:
import numpy as np
import pandas as pd

huh = pd.DataFrame([[np.nan, 2, 6],
                    [3, 4, 0],
                    [np.nan, 0, 4],
                    [np.nan, 1, 2]],
                   columns=list('ABC'))
huh.A[huh.A.isnull()] = np.array([1,5,7])  # what I want to do, but this gives a warning
gives the warning:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
I read the docs but I can't understand how to do this with .loc.
How do I do this properly, preferably without a for loop?
Other info:
The number of elements in the np array will always match the number of NaN in the dataframe, so your answer does not need to check for this.
You are really close; you just need DataFrame.loc to avoid chained assignment:
huh.loc[huh.A.isnull(), 'A'] = np.array([1,5,7])
print(huh)
     A  B  C
0  1.0  2  6
1  3.0  4  0
2  5.0  0  4
3  7.0  1  2
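Alternatively, with zip. This should also cope with an array whose length differs from the number of NaNs, since zip stops at the shorter of the two: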
m = huh.A.isna()
a = np.array([1, 5, 7])
s = pd.Series(dict(zip(huh.index[m], a)))
huh.fillna({'A': s})
     A  B  C
0  1.0  2  6
1  3.0  4  0
2  5.0  0  4
3  7.0  1  2
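The .loc assignment in the first answer expects exactly one array element per NaN. A minimal sketch that keeps the in-place .loc style but also tolerates a length mismatch, in the spirit of the zip answer: only the first n NaN rows are filled, where n is the smaller of the two lengths. The four-element fill array is a hypothetical example with one value too many.

```python
import numpy as np
import pandas as pd

huh = pd.DataFrame([[np.nan, 2, 6],
                    [3, 4, 0],
                    [np.nan, 0, 4],
                    [np.nan, 1, 2]],
                   columns=list('ABC'))
fill = np.array([1, 5, 7, 9])  # hypothetical: one more value than there are NaNs

mask = huh['A'].isna().to_numpy()
targets = huh.index[mask]            # labels of the NaN rows, in row order
n = min(len(targets), len(fill))     # fill only as many as both sides allow
huh.loc[targets[:n], 'A'] = fill[:n]

print(huh)
```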

Python Pandas Merge data from different Dataframes on specific index and create new one

I have two data frames, a and b, and I want to create a new data frame c by merging only specific rows (by index) of a and b. My code is below:
import pandas as pd
a = [10,20,30,40,50,60]
b = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
a = pd.DataFrame(a,columns=['Voltage'])
b = pd.DataFrame(b,columns=['Current'])
c = pd.merge(a,b,left_index=True, right_index=True)
print(c)
The actual output is:
   Voltage  Current
0       10      0.1
1       20      0.2
2       30      0.3
3       40      0.4
4       50      0.5
5       60      0.6
I don't want all the rows, only specific ones, something like:
c =
   Voltage  Current
0       30      0.3
1       40      0.4
How can I modify c = pd.merge(a, b, left_index=True, right_index=True) so that c contains only the third and fourth rows, with a new index starting at 0 as shown above?
Use iloc to select rows by position, and reset_index with drop=True to get a default index in both DataFrames.
Solution 1, with concat:
c = pd.concat([a.iloc[2:4].reset_index(drop=True),
               b.iloc[2:4].reset_index(drop=True)], axis=1)
Or use merge:
c = pd.merge(a.iloc[2:4].reset_index(drop=True),
             b.iloc[2:4].reset_index(drop=True),
             left_index=True,
             right_index=True)
print(c)
   Voltage  Current
0       30      0.3
1       40      0.4
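Because a and b share the same default index, another option is to join them first and select rows afterwards, which also works for non-contiguous positions. A minimal sketch, assuming a and b are row-aligned as in the question; the rows list is a hypothetical choice of positions.

```python
import pandas as pd

a = pd.DataFrame([10, 20, 30, 40, 50, 60], columns=['Voltage'])
b = pd.DataFrame([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], columns=['Current'])

rows = [2, 3]  # positions to keep (third and fourth rows); any list of positions works

# join aligns on the shared index; iloc picks positions; reset_index renumbers from 0
c = a.join(b).iloc[rows].reset_index(drop=True)
print(c)
```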
