How to plot columns from a dataframe, as subplots - python-3.x

What am I doing wrong here? I want to create four new dataframes from df and use Dates as the x-axis in a line chart for each newly created dataframe (Eminis, FTSE, Stoxx and Nikkei).
I have a dataframe called df that I created from data.xlsx and it looks like this:
Dates ES1 Z 1 VG1 NK1
0 2005-01-04 -0.0126 0.0077 -0.0030 0.0052
1 2005-01-05 -0.0065 -0.0057 0.0007 -0.0095
2 2005-01-06 0.0042 0.0017 0.0051 0.0044
3 2005-01-07 -0.0017 0.0061 0.0010 -0.0009
4 2005-01-11 -0.0065 -0.0040 -0.0147 0.0070
3670 2020-09-16 -0.0046 -0.0065 -0.0003 -0.0009
3671 2020-09-17 -0.0083 -0.0034 -0.0039 -0.0086
3672 2020-09-18 -0.0024 -0.0009 -0.0009 0.0052
3673 2020-09-23 -0.0206 0.0102 0.0022 -0.0013
3674 2020-09-24 0.0021 -0.0136 -0.0073 -0.0116
From df I created 4 new dataframes called Eminis, FTSE, Stoxx and Nikkei.
Thanks for your help!!!!
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('classic')
df = pd.read_excel('data.xlsx')
df = df.rename(columns={'Dates':'Date','ES1': 'Eminis', 'Z 1': 'FTSE','VG1': 'Stoxx','NK1': 'Nikkei','TY1': 'Notes','G 1': 'Gilts', 'RX1': 'Bunds','JB1': 'JGBS','CL1': 'Oil','HG1': 'Copper','S 1': 'Soybeans','GC1': 'Gold','WILLTIPS': 'TIPS'})
headers = df.columns
Eminis = df[['Date','Eminis']]
FTSE = df[['Date','FTSE']]
Stoxx = df[['Date','Stoxx']]
Nikkei = df[['Date','Nikkei']]
# create multiple plots via plt.subplots(rows,columns)
fig, axes = plt.subplots(2,2, figsize=(20,15))
x = Date
y1 = Eminis
y2 = Notes
y3 = Stoxx
y4 = Nikkei
# one plot on each subplot
axes[0][0].line(x,y1)
axes[0][1].line(x,y2)
axes[1][0].line(x,y3)
axes[1][1].line(x,y4)
plt.legends()
plt.show()

An elegant solution is to:
1. Set the Dates column in your DataFrame as the index.
2. Create a figure with the required number of subplots (in your case 4) by calling plt.subplots.
3. Draw a plot from your DataFrame, passing:
   ax - the ax result from subplots (here it is an array of Axes objects, not a single Axes),
   subplots=True - to draw each column in a separate subplot.
The code to do it is:
fig, a = plt.subplots(2, 2, figsize=(12, 6), tight_layout=True)
df.plot(ax=a, subplots=True, rot=60);
To test the above code I created the following DataFrame:
np.random.seed(1)
ind = pd.date_range('2005-01-01', '2006-12-31', freq='7D')
df = pd.DataFrame(np.random.rand(ind.size, 4),
index=ind, columns=['ES1', 'Z 1', 'VG1', 'NK1'])
and got the following picture:
Because my test data are random, I used a 7-day frequency to keep the picture from getting cluttered.
For your real data, consider resampling, e.g. to a '7D' frequency with the mean() aggregation function.
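The resampling suggestion can be sketched as follows. This is only an illustration under assumptions: the daily values are randomly generated stand-ins for the real returns, and the names daily and weekly are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data standing in for the real returns (random values)
rng = np.random.default_rng(1)
days = pd.date_range("2005-01-03", periods=28, freq="D")
daily = pd.DataFrame(rng.normal(0, 0.01, size=(days.size, 2)),
                     index=days, columns=["Eminis", "FTSE"])

# Average the daily values in consecutive 7-day bins
weekly = daily.resample("7D").mean()
```

The resampled frame keeps a DatetimeIndex, so it can be passed to the same df.plot(ax=a, subplots=True) call in place of the original frame.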

I think the more succinct option is not to make many dataframes, which creates unnecessary work, and complexity.
Plotting data is about shaping the dataframe for the plot API
In this case, a better option is to convert the dataframe to a long (tidy) format, from a wide format, using .stack.
This places all the labels in one column, and the values in another column
Use seaborn.relplot, which can create a FacetGrid from a dataframe in a long format.
seaborn is a high-level API for matplotlib, and makes plotting much easier.
If the dataframe contains many stocks, but only a few are to be plotted, they can be selected with Boolean indexing
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# import data from excel, or setup test dataframe
data = {'Dates': ['2005-01-04', '2005-01-05', '2005-01-06', '2005-01-07', '2005-01-11', '2020-09-16', '2020-09-17', '2020-09-18', '2020-09-23', '2020-09-24'],
'ES1': [-0.0126, -0.0065, 0.0042, -0.0017, -0.0065, -0.0046, -0.0083, -0.0024, -0.0206, 0.0021],
'Z 1': [0.0077, -0.0057, 0.0017, 0.0061, -0.004, -0.0065, -0.0034, -0.0009, 0.0102, -0.0136],
'VG1': [-0.003, 0.0007, 0.0051, 0.001, -0.0147, -0.0003, -0.0039, -0.0009, 0.0022, -0.0073],
'NK1': [0.0052, -0.0095, 0.0044, -0.0009, 0.007, -0.0009, -0.0086, 0.0052, -0.0013, -0.0116]}
df = pd.DataFrame(data)
# rename columns
df = df.rename(columns={'Dates':'Date','ES1': 'Eminis', 'Z 1': 'FTSE','VG1': 'Stoxx','NK1': 'Nikkei'})
# set Date to a datetime
df.Date = pd.to_datetime(df.Date)
# set Date as the index
df.set_index('Date', inplace=True)
# stack the dataframe
dfs = df.stack().reset_index().rename(columns={'level_1': 'Stock', 0: 'val'})
# to select only a subset of values from Stock, to plot, select them with Boolean indexing
df_select = dfs[dfs.Stock.isin(['Eminis', 'FTSE', 'Stoxx', 'Nikkei'])]
# df_select.head()
Date Stock val
0 2005-01-04 Eminis -0.0126
1 2005-01-04 FTSE 0.0077
2 2005-01-04 Stoxx -0.0030
3 2005-01-04 Nikkei 0.0052
4 2005-01-05 Eminis -0.0065
# plot
sns.relplot(data=df_select, x='Date', y='val', col='Stock', col_wrap=2, kind='line')
What am I doing wrong here?
The current implementation is inefficient, has a number of incorrect method calls, and undefined variables.
Date is not defined for x = Date
y2 = Notes: Notes is not defined
.line is not an Axes method and causes an AttributeError; it should be .plot
y1 - y4 are DataFrames, but are passed to the plot method for the y-axis, which causes TypeError: unhashable type: 'numpy.ndarray'; a single column should be passed as y.
.legends is not a method; it's .legend
The legend must be shown for each subplot, if one is desired.
Eminis = df[['Date','Eminis']]
FTSE = df[['Date','FTSE']]
Stoxx = df[['Date','Stoxx']]
Nikkei = df[['Date','Nikkei']]
# create multiple plots via plt.subplots(rows,columns)
fig, axes = plt.subplots(2,2, figsize=(20,15))
x = df.Date
y1 = Eminis.Eminis
y2 = FTSE.FTSE
y3 = Stoxx.Stoxx
y4 = Nikkei.Nikkei
# one plot on each subplot
axes[0][0].plot(x,y1, label='Eminis')
axes[0][0].legend()
axes[0][1].plot(x,y2, label='FTSE')
axes[0][1].legend()
axes[1][0].plot(x,y3, label='Stoxx')
axes[1][0].legend()
axes[1][1].plot(x,y4, label='Nikkei')
axes[1][1].legend()
plt.show()
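If the four intermediate dataframes are not needed, the same figure can probably be produced with a loop over the columns. The sketch below is a condensed variant rather than the original answer's code; it assumes the renamed columns from the question and uses only the first few rows of the sample data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# A few rows of the sample data from the question, with the renamed columns
df = pd.DataFrame({
    "Date": pd.to_datetime(["2005-01-04", "2005-01-05", "2005-01-06", "2005-01-07"]),
    "Eminis": [-0.0126, -0.0065, 0.0042, -0.0017],
    "FTSE": [0.0077, -0.0057, 0.0017, 0.0061],
    "Stoxx": [-0.0030, 0.0007, 0.0051, 0.0010],
    "Nikkei": [0.0052, -0.0095, 0.0044, -0.0009],
})

fig, axes = plt.subplots(2, 2, figsize=(12, 8), sharex=True)

# axes.flat walks the 2x2 grid row by row; pair each Axes with one column
for ax, name in zip(axes.flat, ["Eminis", "FTSE", "Stoxx", "Nikkei"]):
    ax.plot(df["Date"], df[name], label=name)
    ax.legend()
    ax.tick_params(axis="x", rotation=45)

fig.tight_layout()
```

Call plt.show() afterwards to display the figure.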

Related

How to plot subplots from a condition applied on a single column and the data available on another single column of the same dataframe?

I have a dataframe which is much like the one following:
data = {'A':[21,22,23,24,25,26,27,28,29,30,11,12,13,14,15,16,17,18,19,20,1,2,3,4,5,6,7,8,9,10],
'B':[8,8,8,8,8,8,8,8,8,8,5,5,5,5,5,5,5,5,5,5,3,3,3,3,3,3,3,3,3,3],
'C':[10,15,23,17,18,26,24,30,35,42,44,42,38,36,34,30,27,25,27,24,1,0,2,3,5,26,30,40,42,50]}
data_df = pd.DataFrame(data)
data_df
I would like to have the subplots, the number of subplots should be equal to number of unique values of column 'B'. X axis = Values in column 'A' and Y axis = values in Column 'C'.
The code that I tried:
fig = px.line(data_df,
x='A',
y='C',
color='B',
facet_col = 'B',
)
fig.show()
gives output like
However, I would like to have the graphs in a single column, each graph autoscaled to the relevant area and resolution on the axes.
Possibility: Can I somehow make use of groupby command to do it?
Since I may have other number of unique values in column 'B' (for example 5 unique values) based on other data, I would like to have this piece of code to work dynamic. Kindly help me.
PS: plotly express module is used to plot the graph.
In order to stack all subplots in one column, and make sure that each x-axis is independent, just add the following to your px.line() call:
facet_col_wrap=1
And then follow up with:
fig.update_xaxes(matches=None)
Plot 1: Default setup with px.line(facet_col = 'B')
If you'd like to display all x-axis labels just include this:
fig.update_xaxes(showticklabels = True)
Plot 2: Show x-axes for all subplots
Complete code:
import plotly.express as px
import pandas as pd
data = {'A':[21,22,23,24,25,26,27,28,29,30,11,12,13,14,15,16,17,18,19,20,1,2,3,4,5,6,7,8,9,10],
'B':[8,8,8,8,8,8,8,8,8,8,5,5,5,5,5,5,5,5,5,5,3,3,3,3,3,3,3,3,3,3],
'C':[10,15,23,17,18,26,24,30,35,42,44,42,38,36,34,30,27,25,27,24,1,0,2,3,5,26,30,40,42,50]}
data_df = pd.DataFrame(data)
data_df
fig = px.line(data_df,
x='A',
y='C',
color='B',
facet_col = 'B',
facet_col_wrap=1
)
fig.update_xaxes(matches=None, showticklabels = True)
fig.show()
You can instead use the argument facet_row = 'B' which will automatically stack the subplots by rows. Then to automatically rescale, you'll want to set all of the x data to the same array of values, which can be done by looping through fig.data and modifying fig.data[i]['x'] for each i.
import pandas as pd
import plotly.express as px
data = {'A':[21,22,23,24,25,26,27,28,29,30,11,12,13,14,15,16,17,18,19,20,1,2,3,4,5,6,7,8,9,10],
'B':[8,8,8,8,8,8,8,8,8,8,5,5,5,5,5,5,5,5,5,5,3,3,3,3,3,3,3,3,3,3],
'C':[10,15,23,17,18,26,24,30,35,42,44,42,38,36,34,30,27,25,27,24,1,0,2,3,5,26,30,40,42,50]}
data_df = pd.DataFrame(data)
fig = px.line(data_df,
x='A',
y='C',
color='B',
facet_row = 'B',
)
for fig_data in fig.data:
    fig_data['x'] = list(range(len(fig_data['y'])))
fig.show()

Python - Add new curve from a df into existing lineplot

I created a plot using sns, based on a DataFrame.
Now I would like to add a new curve from another dataframe to the plot created previously.
This is the code of my plot:
tline = sns.lineplot(x='reads', y='time', data=df, hue='method', style='method', markers=True, dashes=False, ax=axs[0, 0])
tline.set_xlabel('Numero di reads')
tline.set_ylabel ('Time [s]')
tline.legend(loc='lower right')
tline.set_yscale('log')
tline.autoscale(enable=True, axis='x')
tline.autoscale(enable=True, axis='y')
Now I have another Dataframe with the same column of the first DataFrame. How can I add this new curve with a custom entry in the legend?
This is the structure of the DataFrame:
Dataset Method Reads Time Peak-memory
14M Set 14000000 7.33 1035204
20K Set 200000 0.38 107464
200K Set 20000 0.07 42936
2M Set 28428648 16.09 2347740
28M Set 2000000 1.41 240240
I suggest using matplotlib's object-oriented interface, like this:
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns
# generate sample data
time_column = np.arange(10)
data_column1 = np.random.randint(0, 10, 10)
data_column2 = np.random.randint(0, 10, 10)
# store in pandas dfs
df1 = pd.DataFrame(zip(time_column, data_column1), columns=['Time', 'Data'])
df2 = pd.DataFrame(zip(time_column, data_column2), columns=['Time', 'Data'])
f, ax = plt.subplots()
# x and y must be passed as keyword arguments in seaborn 0.12 and later
sns.lineplot(x=df1.Time, y=df1.Data, label='foo', ax=ax)
sns.lineplot(x=df2.Time, y=df2.Data, label='bar', ax=ax)
ax.legend()
plt.show()
which generates the following output
The important thing is that both line plots are drawn on the same subplot (ax in this case).

How to plot multi-index, categorical data?

Given the following data:
DC,Mode,Mod,Ven,TY1,TY2,TY3,TY4,TY5,TY6,TY7,TY8
Intra,S,Dir,C1,False,False,False,False,False,True,True,False
Intra,S,Co,C1,False,False,False,False,False,False,False,False
Intra,M,Dir,C1,False,False,False,False,False,False,True,False
Inter,S,Co,C1,False,False,False,False,False,False,False,False
Intra,S,Dir,C2,False,True,True,True,True,True,True,False
Intra,S,Co,C2,False,False,False,False,False,False,False,False
Intra,M,Dir,C2,False,False,False,False,False,False,False,False
Inter,S,Co,C2,False,False,False,False,False,False,False,False
Intra,S,Dir,C3,False,False,False,False,True,True,False,False
Intra,S,Co,C3,False,False,False,False,False,False,False,False
Intra,M,Dir,C3,False,False,False,False,False,False,False,False
Inter,S,Co,C3,False,False,False,False,False,False,False,False
Intra,S,Dir,C4,False,False,False,False,False,True,False,True
Intra,S,Co,C4,True,True,True,True,False,True,False,True
Intra,M,Dir,C4,False,False,False,False,False,True,False,True
Inter,S,Co,C4,True,True,True,False,False,True,False,True
Intra,S,Dir,C5,True,True,False,False,False,False,False,False
Intra,S,Co,C5,False,False,False,False,False,False,False,False
Intra,M,Dir,C5,True,True,False,False,False,False,False,False
Inter,S,Co,C5,False,False,False,False,False,False,False,False
Imports:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
To reproduce my DataFrame, copy the data then use:
df = pd.read_clipboard(sep=',')
I'd like to create a plot conveying the same information as my example, but not necessarily with the same shape (I'm open to suggestions). I'd also like to hover over the color and have the appropriate Ven displayed (e.g. C1, not 1):
Edit 2018-10-17:
The two solutions provided so far, are helpful and each accomplish a different aspect of what I'm looking for. However, the key issue I'd like to resolve, which wasn't explicitly stated prior to this edit, is the following:
I would like to perform the plotting without converting Ven to an int; this numeric transformation isn't practical with the real data. So the actual scope of the question is to plot all categorical data with two categorical axes.
The issue I'm experiencing is the data is categorical and the y-axis is multi-indexed.
I've done the following to transform the DataFrame:
# replace False with nan
df = df.replace(False, np.nan)
# replace True with a number representing Ven (e.g. C1 = 1)
def rep_ven(row):
    return row.iloc[4:].replace(True, int(row.Ven[1]))
df.iloc[:, 4:] = df.apply(rep_ven, axis=1)
# drop the Ven column
df = df.drop(columns=['Ven'])
# set multi-index
df_m = df.set_index(['DC', 'Mode', 'Mod'])
Plotting the transformed DataFrame produces:
plt.figure(figsize=(20,10))
heatmap = plt.imshow(df_m)
plt.xticks(range(len(df_m.columns.values)), df_m.columns.values)
plt.yticks(range(len(df_m.index)), df_m.index)
plt.show()
This plot isn't very streamlined: there are four axis values for each Ven. This is only a subset of the data, so the graph would be very long with all of it.
Here's my solution. Instead of plotting I just apply a style to the DataFrame, see https://pandas.pydata.org/pandas-docs/stable/style.html
# Transform Ven values from "C1", "C2" to 1, 2, ..
df['Ven'] = df['Ven'].str[1]
# Given a specific combination of dc, mode, mod, ven,
# do we have any True cells?
g = df.groupby(['DC', 'Mode', 'Mod', 'Ven']).any()
# Let's drop any rows with only False values
g = g[g.any(axis=1)]
# Convert True, False to 1, 0
g = g.astype(int)
# Get the values of the ven index as an int array
# Note: we don't want to drop the ven index!!
# Otherwise styling won't work
ven = g.index.get_level_values('Ven').values.astype(int)
# Multiply 1 and 0 with Ven value
g = g.mul(ven, axis=0)
# Sort the index
g.sort_index(ascending=False, inplace=True)
# Now display the dataframe with styling
# first we get a color map
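import matplotlib
# matplotlib.cm.get_cmap was deprecated in Matplotlib 3.7 and later removed;
# the matplotlib.colormaps registry (Matplotlib 3.5+) replaces it
cmap = matplotlib.colormaps['tab10']
def apply_color_map(val):
    # hide the 0 values
    if val == 0:
        return 'color: white; background-color: white'
    else:
        # for non-zero: get color from cmap, convert to hexcode for css
        s = "color:white; background-color: " + matplotlib.colors.rgb2hex(cmap(val))
        return s
g
# Styler.applymap is named Styler.map in pandas 2.1 and later
g.style.applymap(apply_color_map)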
The available matplotlib colormaps can be seen here: Colormap reference, with some additional explanation here: Choosing a colormap
Explanation: Remove rows where TY1-TY8 are all nan to create your plot. Refer to this answer as a starting point for creating interactive annotations to display Ven.
The below code should work:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_clipboard(sep=',')
# replace False with nan
df = df.replace(False, np.nan)
# replace True with a number representing Ven (e.g. C1 = 1)
def rep_ven(row):
    return row.iloc[4:].replace(True, int(row.Ven[1]))
df.iloc[:, 4:] = df.apply(rep_ven, axis=1)
# drop the Ven column
df = df.drop(columns=['Ven'])
idx = df[['TY1','TY2', 'TY3', 'TY4','TY5','TY6','TY7','TY8']].dropna(thresh=1).index.values
df = df.loc[idx,:].sort_values(by=['DC', 'Mode','Mod'], ascending=False)
# set multi-index
df_m = df.set_index(['DC', 'Mode', 'Mod'])
plt.figure(figsize=(20,10))
heatmap = plt.imshow(df_m)
plt.xticks(range(len(df_m.columns.values)), df_m.columns.values)
plt.yticks(range(len(df_m.index)), df_m.index)
plt.show()

Seaborn barplot with two y-axis

Considering the following pandas DataFrame:
labels values_a values_b values_x values_y
0 date1 1 3 150 170
1 date2 2 6 200 180
It is easy to plot this with Seaborn (see example code below). However, due to the big difference between values_a/values_b and values_x/values_y, the bars for values_a and values_b are not easily visible (the dataset given above is just a sample; in my real dataset the difference is even bigger). Therefore, I would like to use two y-axes, i.e., one y-axis for values_a/values_b and one for values_x/values_y. I tried to use plt.twinx() to get a second axis, but unfortunately the plot shows only two bars, for values_x and values_y, even though there are at least two y-axes with the right scaling. :) Do you have an idea how to fix that and get four bars for each label, where the values_a/values_b bars relate to the left y-axis and the values_x/values_y bars relate to the right y-axis?
Thanks in advance!
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
columns = ["labels", "values_a", "values_b", "values_x", "values_y"]
test_data = pd.DataFrame.from_records([("date1", 1, 3, 150, 170),\
("date2", 2, 6, 200, 180)],\
columns=columns)
# working example but with unreadable values_a and values_b
test_data_melted = pd.melt(test_data, id_vars=columns[0],\
var_name="source", value_name="value_numbers")
g = sns.barplot(x=columns[0], y="value_numbers", hue="source",\
data=test_data_melted)
plt.show()
# values_a and values_b are not displayed
values1_melted = pd.melt(test_data, id_vars=columns[0],\
value_vars=["values_a", "values_b"],\
var_name="source1", value_name="value_numbers1")
values2_melted = pd.melt(test_data, id_vars=columns[0],\
value_vars=["values_x", "values_y"],\
var_name="source2", value_name="value_numbers2")
g1 = sns.barplot(x=columns[0], y="value_numbers1", hue="source1",\
data=values1_melted)
ax2 = plt.twinx()
g2 = sns.barplot(x=columns[0], y="value_numbers2", hue="source2",\
data=values2_melted, ax=ax2)
plt.show()
This is probably best suited for multiple sub-plots, but if you are truly set on a single plot, you can scale the data before plotting, create another axis and then modify the tick values.
Sample Data
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
columns = ["labels", "values_a", "values_b", "values_x", "values_y"]
test_data = pd.DataFrame.from_records([("date1", 1, 3, 150, 170),\
("date2", 2, 6, 200, 180)],\
columns=columns)
test_data_melted = pd.melt(test_data, id_vars=columns[0],\
var_name="source", value_name="value_numbers")
Code:
# Scale the data, just a simple example of how you might determine the scaling
mask = test_data_melted.source.isin(['values_a', 'values_b'])
scale = int(test_data_melted[~mask].value_numbers.mean()
/test_data_melted[mask].value_numbers.mean())
test_data_melted.loc[mask, 'value_numbers'] = test_data_melted.loc[mask, 'value_numbers']*scale
# Plot
fig, ax1 = plt.subplots()
g = sns.barplot(x=columns[0], y="value_numbers", hue="source",\
data=test_data_melted, ax=ax1)
# Create a second y-axis with the scaled ticks
ax1.set_ylabel('X and Y')
ax2 = ax1.twinx()
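# Ensure ticks occur at the same positions, then modify labels
# (fix the tick positions first, so the labels below match them one-to-one)
ax2.set_yticks(ax1.get_yticks())
ax2.set_ylim(ax1.get_ylim())
ax2.set_yticklabels(np.round(ax1.get_yticks()/scale,1))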
ax2.set_ylabel('A and B')
plt.show()
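The multiple-subplot alternative mentioned at the start of this answer might look like the following. It is a minimal sketch using pandas' own bar plotting rather than Seaborn, with the sample data from the question; the names ax_small and ax_large are made up for the example.

```python
import matplotlib.pyplot as plt
import pandas as pd

test_data = pd.DataFrame({
    "labels": ["date1", "date2"],
    "values_a": [1, 2],
    "values_b": [3, 6],
    "values_x": [150, 200],
    "values_y": [170, 180],
}).set_index("labels")

# Two side-by-side Axes, each with its own y-scale
fig, (ax_small, ax_large) = plt.subplots(1, 2, figsize=(10, 4))

# Small-valued columns on the left, large-valued columns on the right
test_data[["values_a", "values_b"]].plot.bar(ax=ax_small, rot=0)
test_data[["values_x", "values_y"]].plot.bar(ax=ax_large, rot=0)

ax_small.set_ylabel("A and B")
ax_large.set_ylabel("X and Y")
fig.tight_layout()
```

Each Axes gets two bars per label, and no rescaling of the data or relabelling of ticks is needed. Call plt.show() to display the figure.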

Plotting a time series graph from pandas dataframe using matplotlib

I have the following data in a csv file
SourceID BSs hour Type
7208 87 11 MAIN
11060 67 11 MAIN
3737 88 11 MAIN
9683 69 11 MAIN
I have the following Python code. I want to plot a graph with the following specifications.
For each SourceID and Type I want to plot a graph of BSs over time. I would prefer each SourceID and Type to be a subplot on a single plot. I have tried a lot of options using groupby, but can't seem to get it to work.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
COLLECTION = 'NEW'
DATA = r'C:\Analysis\Test\{}'.format(COLLECTION)
INPUT_FILE = DATA + r'\in.csv'
OUTPUT_FILE = DATA + r'\out.csv'
with open(INPUT_FILE) as fin:
    df = pd.read_csv(INPUT_FILE,
                     usecols=["SourceID", 'hour','BSs','Type'],
                     header=0)
df.drop_duplicates(inplace=True)
df.reset_index(inplace=True)
It's still not 100% clear to me what sort of plot you actually want, but my guess is that you're looking for something like this:
from matplotlib import pyplot as plt
# group by SourceID and Type, find out how many unique combinations there are
grps = df.groupby(['SourceID', 'Type'])
ngrps = len(grps)
# make a grid of axes
ncols = int(np.sqrt(ngrps))
nrows = -(-ngrps // ncols)
# squeeze=False keeps ax a 2-D array even when there is only one group
fig, ax = plt.subplots(nrows, ncols, sharex=True, sharey=True, squeeze=False)
# iterate over the groups, plot into each axis
for ii, (idx, rows) in enumerate(grps):
    rows.plot(x='hour', y='BSs', style='-s', ax=ax.flat[ii], legend=False,
              scalex=False, scaley=False)
# hide any unused axes
for aa in ax.flat[ngrps:]:
    aa.set_axis_off()
# set the axis limits
ax.flat[0].set_xlim(df['hour'].min() - 1, df['hour'].max() + 1)
ax.flat[0].set_ylim(df['BSs'].min() - 5, df['BSs'].max() + 5)
fig.tight_layout()
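The grid-size arithmetic in the code above may be worth spelling out: ncols is the integer part of the square root of the group count, and -(-ngrps // ncols) is ceiling division written with floor division. A small sketch, where grid_shape is a hypothetical helper name and not part of the answer:

```python
import math

def grid_shape(ngrps):
    """Return (nrows, ncols) for a roughly square grid holding ngrps plots."""
    ncols = int(math.sqrt(ngrps))
    # -(-a // b) rounds a / b up instead of down
    nrows = -(-ngrps // ncols)
    return nrows, ncols

# Every grid has at least as many cells as there are groups
for n in (1, 4, 5, 10):
    rows, cols = grid_shape(n)
    assert rows * cols >= n
```

For five groups this gives a 3 by 2 grid, which leaves one cell unused; that is the cell the "hide any unused axes" loop switches off.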

Resources