Given the following data frame and pivot table:
import pandas as pd
df=pd.DataFrame({'A':['a','a','a','a','a','b','b','b','b'],
'B':['x','y','z','x','y','z','x','y','z'],
'C':['a','b','a','b','a','b','a','b','a'],
'D':[7,5,3,4,1,6,5,3,1]})
table = pd.pivot_table(df, index=['A', 'B','C'],aggfunc='sum')
table
       D
A B C
a x a  7
    b  4
  y a  1
    b  5
  z a  3
b x a  5
  y b  3
  z a  1
    b  6
I'd like to create a heat map with divisions per indices A and B like this:
Is it possible?
You can use the pandas Styler in a Jupyter notebook (see the docs and the example notebook):
import seaborn as sns
import pandas as pd
df=pd.DataFrame({'A':['a','a','a','a','a','b','b','b','b'],
'B':['x','y','z','x','y','z','x','y','z'],
'C':['a','b','a','b','a','b','a','b','a'],
'D':[7,5,3,4,1,6,5,3,1]})
table = pd.pivot_table(df, index=['A', 'B','C'],aggfunc='sum')
table
cm = sns.light_palette("blue", as_cmap=True)
# Style the pivot table itself so the A/B/C index levels are kept as row headers
s = table.style.background_gradient(cmap=cm)
s
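If the goal is one cell per (A, B) pair, one option is to reshape before styling. This is only a minimal sketch, assuming it is acceptable to spread level C across the columns; the name wide is illustrative and not from the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
                   'B': ['x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z'],
                   'C': ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'a'],
                   'D': [7, 5, 3, 4, 1, 6, 5, 3, 1]})
table = pd.pivot_table(df, index=['A', 'B', 'C'], values='D', aggfunc='sum')

# Move level C into the columns: rows are (A, B) pairs, columns are values of C.
# Combinations that never occur in the data become NaN.
wide = table['D'].unstack('C')
print(wide)
```

In a notebook, wide.style.background_gradient(cmap=cm) should then colour this grid with the same palette; that call needs matplotlib and jinja2 installed.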
import pandas as pd
import numpy as np
np.random.seed(0)
idx = pd.MultiIndex.from_product([list('BAC'), list('AB'), ["high", "low"]], names=['level_0', 'level_1', 'level_2'])
df = pd.DataFrame(np.random.randn(2, len(idx)), columns=idx)
I am trying to select the columns where level_0 is B or A, level_1 is B, and level_2 is high.
I can select columns with
wanted = ["A", "B"]  # avoid shadowing the built-in list
df.reindex(columns=wanted, level=0)
But I can't figure out how to add the next level slice.
pd.IndexSlice comes to the rescue; give it one selector per column level:
In [21]: df.loc[:,pd.IndexSlice[(["A","B"],"B","high")]]
Out[21]:
level_0         B         A
level_1         B         B
level_2      high      high
0        0.978738  0.950088
1        0.443863  0.313068
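Each position inside pd.IndexSlice lines up with one column level, so the same pattern extends to any combination of levels. Below is a small sketch that rebuilds the frame from the question; the names cols, idx and picked are illustrative. The column order of the result may differ between pandas versions, so the final check does not depend on it.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
cols = pd.MultiIndex.from_product(
    [list('BAC'), list('AB'), ['high', 'low']],
    names=['level_0', 'level_1', 'level_2'])
df = pd.DataFrame(np.random.randn(2, len(cols)), columns=cols)

idx = pd.IndexSlice
# One selector per level: a list for level_0, scalars for level_1 and level_2.
picked = df.loc[:, idx[['A', 'B'], 'B', 'high']]

# Order-independent view of which columns were selected.
print(sorted(picked.columns.tolist()))
```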
I have a pandas dataframe as below:
import pandas as pd
import numpy as np
import datetime
# initialise data as a dict of lists
data = {'A' :[1,1,1,1,2,2,2,2],
'B' :[2,3,1,5,7,7,1,6]}
# Create DataFrame
df = pd.DataFrame(data)
df
I want to sort 'B' within each group of 'A'.
Expected Output:
A B
0 1 1
1 1 2
2 1 3
3 1 5
4 2 1
5 2 6
6 2 7
7 2 7
You can sort a DataFrame with the sort_values method. Passing by=['A', 'B'] sorts by A first and then by B within each value of A, as requested. It returns a new DataFrame rather than sorting in place.
df.sort_values(by=['A', 'B'])
Docs
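The expected output in the question also has a fresh 0..7 index, which sort_values alone does not produce because it keeps the original row labels. A minimal sketch of one way to get that, assuming the old labels are not needed (the name result is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 2, 2],
                   'B': [2, 3, 1, 5, 7, 7, 1, 6]})

# Sort by A, then by B within each A; drop the old row labels so the
# index runs 0..7 like the expected output.
result = df.sort_values(by=['A', 'B']).reset_index(drop=True)
print(result)
```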
I have a test Excel file with contents like:
df = pd.DataFrame({'name':list('abcdefg'),
'age':[10,20,5,23,58,4,6]})
print (df)
name age
0 a 10
1 b 20
2 c 5
3 d 23
4 e 58
5 f 4
6 g 6
I use Pandas and matplotlib to read and plot it:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
excel_file = 'test.xlsx'
df = pd.read_excel(excel_file, sheet_name=0)
df.plot(kind="bar")
plt.show()
The result shows:
It uses the index numbers as the item labels. How can I change them to the names stored in the name column?
You can specify the columns for the x and y values in DataFrame.plot:
df.plot(x='name', y='age', kind="bar")
Or first create a Series by calling DataFrame.set_index and selecting the age column:
df.set_index('name')['age'].plot(kind="bar")
# if there are multiple value columns to plot, keep the DataFrame instead:
# df.set_index('name').plot(kind="bar")
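For a version that runs without the Excel file, here is a hedged sketch that builds the frame inline and checks where the tick labels come from. The Agg backend line is an assumption made only so the snippet runs without a display; in a notebook it can be dropped. The rot and legend arguments are optional cosmetics.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

df = pd.DataFrame({'name': list('abcdefg'),
                   'age': [10, 20, 5, 23, 58, 4, 6]})

# x='name' puts the name column on the category axis instead of the row index.
ax = df.plot(x='name', y='age', kind='bar', rot=0, legend=False)
ax.set_ylabel('age')

# The tick labels now come from the name column.
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```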
Given the following data frame:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Site':['a','a','a','b','b','b'],
'x':[1,1,0,1,np.nan,0],
'y':[1,np.nan,0,1,1,0]
})
df
Site y x
0 a 1.0 1
1 a NaN 1
2 a 0.0 0
3 b 1.0 1
4 b 1.0 NaN
5 b 0.0 0
I'd like to pivot this data frame to get the count of values (excluding "NaN") for each column.
I tried what I found in other posts, but nothing seems to work (maybe there was a change in pandas 0.18)?
Desired result:
Item count
Site
a y 2
b y 3
a x 3
b x 2
Thanks in advance!
pvt = pd.pivot_table(df, index = "Site", values = ["x", "y"], aggfunc = "count").stack().reset_index(level = 1)
pvt.columns = ["Item", "count"]
pvt
Out[38]:
Item count
Site
a x 3
a y 2
b x 2
b y 3
You can add pvt.sort_values("Item", ascending = False) if you want y's to appear first.
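An alternative that avoids pivot_table is to reshape to long format and count per group. This is only a sketch of the same idea, relying on count() skipping NaN; the names long and counts are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Site': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'x': [1, 1, 0, 1, np.nan, 0],
                   'y': [1, np.nan, 0, 1, 1, 0]})

# Long format: one row per (Site, Item) observation.
long = df.melt(id_vars='Site', var_name='Item', value_name='value')

# count() skips NaN, so only non-missing observations are tallied.
counts = (long.groupby(['Item', 'Site'])['value']
              .count()
              .rename('count')
              .reset_index(level='Item'))
print(counts)
```

Here the rows come out grouped by Item (x first); counts.sort_values("Item", ascending=False) puts the y rows first, as in the desired result.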
Given the following pivot table:
df=pd.DataFrame({'A':['a','a','a','a','a','b','b','b','b'],
'B':['x','y','z','x','y','z','x','y','z'],
'C':['a','b','a','b','a','b','a','b','a'],
'D':[7,5,3,4,1,6,5,3,1]})
table = pd.pivot_table(df, index=['A', 'B','C'],aggfunc='sum')
table
       D
A B C
a x a  7
    b  4
  y a  1
    b  5
  z a  3
b x a  5
  y b  3
  z a  1
    b  6
I'd like to create a horizontal bar chart which preserves the hierarchical layout of the indices.
Currently, if I do this:
%matplotlib inline
import matplotlib.pyplot as plt
a = table.plot(kind='barh')
plt.show()  # an Axes object has no show() method
I get this:
But what I really want is something like this:
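This preserves the hierarchy in the tick labels, although it is not exactly the graph you drew as your desired result:
# Turn the index levels into columns so each row can be compared with the one above
levels = table.index.to_frame(index=False)
# A label is shown only where it differs from the previous row;
# a new A value also starts a new B label
new_a = levels['A'] != levels['A'].shift()
new_b = new_a | (levels['B'] != levels['B'].shift())
labels = (levels['A'].where(new_a, ' ') + ' '
          + levels['B'].where(new_b, ' ') + ' '
          + levels['C'])
flat = table.copy()
flat.index = labels.to_numpy()
# barh draws the first row at the bottom, so reverse the rows to read top-down
flat[::-1].plot.barh()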