Pandas & Matplotlib: personalize the date format in a line chart - python-3.x

I want to make the dates on the x-axis look prettier; currently they cannot even be read. What is the best way to do it?
Below is the code and the resulting graph.
import matplotlib.pyplot as plt
import pandas as pd
df = dataset
# gca stands for 'get current axis'
ax = plt.gca()
y1 = df['Predicted_Lower']
y2 = df['Predicted_Upper']
x = df['Date']
ax.fill_between(x,y1, y2, facecolor="#CC6666", alpha=0.7)
df.plot(kind='line',x='Date',y='Predicted_Lower',color='white',ax=ax)
df.plot(kind='line',x='Date',y='Predicted_Upper',color='white', ax=ax)
df.plot(kind='line',x='Date',y='Predicted', color='yellow', ax=ax)
df.plot(kind='line',x='Date',y='Actuals', color='green', ax=ax)
plt.xticks(rotation=45)
plt.show()

You can reduce the number of labels by passing the locs and labels parameters to matplotlib.pyplot.xticks. For example, get the current locs and labels and show only every third one:
# ...
df.plot(kind='line',x='Date',y='Actuals', color='green', ax=ax)
locs, labels = plt.xticks()
plt.xticks(locs[::3], labels[::3], rotation=45)
plt.show()
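To change the date format itself, rather than only thinning the labels, one option is the locator and formatter classes in matplotlib.dates. The following is a minimal sketch, not the asker's actual setup: the DataFrame is made-up stand-in data, and the line is drawn with ax.plot instead of df.plot because pandas can apply its own date tick handling, which may not combine with matplotlib.dates.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the question's `dataset` (assumption)
df = pd.DataFrame({
    'Date': pd.date_range('2021-01-01', periods=90, freq='D'),
    'Actuals': range(90),
})

fig, ax = plt.subplots()
ax.plot(df['Date'], df['Actuals'], color='green')

# One tick per month, labelled as year-month-day
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))

# Rotate and right-align the date labels so they do not overlap
fig.autofmt_xdate(rotation=45)
```

Call plt.show() afterwards to display the figure. The format string follows the usual strftime codes, so something like '%b %Y' should give month names instead.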

Related

How to plot vertical stacked graph from different text files?

I have 5 txt files containing data that show the effect of increasing heat on my samples, and I want to plot them as a vertically stacked graph: the final figure should be 5 vertically stacked charts sharing the same x-axis, with each line in a separate chart to reveal the differences between them.
I wrote this code:
import glob
import pandas as pd
import matplotlib.axes._axes as axes
import matplotlib.pyplot as plt
input_files = glob.glob('01-input/RR_*.txt')
for file in input_files:
    data = pd.read_csv(file, header=None, delimiter="\t").values
    x = data[:,0]
    y = data[:,1]
    plt.subplot(2, 1, 1)
    plt.plot(x, y, linewidth=2, linestyle=':')
plt.tight_layout()
plt.xlabel('x-axis')
plt.ylabel('y-axis')
But the result is only one graph containing all the lines:
I want to get the following chart:
import matplotlib.pyplot as plt
import numpy as np
# just a dummy data
x = np.linspace(0, 2700, 50)
all_data = [np.sin(x), np.cos(x), x**0.3, x**0.4, x**0.5]
n = len(all_data)
n_rows = n
n_cols = 1
fig, ax = plt.subplots(n_rows, n_cols) # each element of "ax" is an Axes
for i, y in enumerate(all_data):
    ax[i].plot(x, y, linewidth=2, linestyle=':')
    ax[i].set_ylabel('y-axis')
    # You can also use a list of y-labels. Example:
    # my_labels = ['y1', 'y2', 'y3', 'y4', 'y5']
    # ax[i].set_ylabel(my_labels[i])
    # The length of "my_labels" must be "n" too
plt.xlabel('x-axis') # add the xlabel to the last axes
plt.tight_layout()
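Applied to the files from the question, the same idea could look roughly like the sketch below. The helper name plot_stacked is made up for illustration, sharex=True is added because the question asks for a shared x-axis, and the file layout (two tab-separated columns, no header) is taken from the question's read_csv call.

```python
import glob

import matplotlib.pyplot as plt
import pandas as pd


def plot_stacked(paths):
    """Plot column 1 against column 0 of each file in its own stacked row."""
    # squeeze=False keeps `axes` two-dimensional even for a single file
    fig, axes = plt.subplots(len(paths), 1, sharex=True, squeeze=False)
    for ax, path in zip(axes[:, 0], paths):
        data = pd.read_csv(path, header=None, delimiter="\t").values
        ax.plot(data[:, 0], data[:, 1], linewidth=2, linestyle=':')
        ax.set_ylabel('y-axis')
    axes[-1, 0].set_xlabel('x-axis')  # label only the bottom axes
    fig.tight_layout()
    return fig, axes


input_files = sorted(glob.glob('01-input/RR_*.txt'))
if input_files:  # nothing is plotted when no files match
    fig, axes = plot_stacked(input_files)
```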

Using a nested for loop in subplots [duplicate]

I am a little confused about how this code works:
fig, axes = plt.subplots(nrows=2, ncols=2)
plt.show()
How does the fig, axes work in this case? What does it do?
Also why wouldn't this work to do the same thing:
fig = plt.figure()
axes = fig.subplots(nrows=2, ncols=2)
There are several ways to do it. The subplots method creates the figure along with the subplots that are then stored in the ax array. For example:
import matplotlib.pyplot as plt
x = range(10)
y = range(10)
fig, ax = plt.subplots(nrows=2, ncols=2)
for row in ax:
    for col in row:
        col.plot(x, y)
plt.show()
However, something like this will also work. It is not as "clean", though, since you create a figure first and then add the subplots to it one at a time:
fig = plt.figure()
plt.subplot(2, 2, 1)
plt.plot(x, y)
plt.subplot(2, 2, 2)
plt.plot(x, y)
plt.subplot(2, 2, 3)
plt.plot(x, y)
plt.subplot(2, 2, 4)
plt.plot(x, y)
plt.show()
import matplotlib.pyplot as plt
fig, ax = plt.subplots(2, 2)
ax[0, 0].plot(range(10), 'r') #row=0, col=0
ax[1, 0].plot(range(10), 'b') #row=1, col=0
ax[0, 1].plot(range(10), 'g') #row=0, col=1
ax[1, 1].plot(range(10), 'k') #row=1, col=1
plt.show()
You can also unpack the axes in the subplots call
And set whether you want to share the x and y axes between the subplots
Like this:
import matplotlib.pyplot as plt
# fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2, sharex=True, sharey=True)
fig, axes = plt.subplots(nrows=2, ncols=2, sharex=True, sharey=True)
ax1, ax2, ax3, ax4 = axes.flatten()
ax1.plot(range(10), 'r')
ax2.plot(range(10), 'b')
ax3.plot(range(10), 'g')
ax4.plot(range(10), 'k')
plt.show()
You might be interested in the fact that as of matplotlib version 2.1 the second code from the question works fine as well.
From the change log:
Figure class now has subplots method
The Figure class now has a subplots() method which behaves the same as pyplot.subplots() but on an existing figure.
Example:
import matplotlib.pyplot as plt
fig = plt.figure()
axes = fig.subplots(nrows=2, ncols=2)
plt.show()
Read the documentation: matplotlib.pyplot.subplots
pyplot.subplots() returns a tuple fig, ax which is unpacked in two variables using the notation
fig, axes = plt.subplots(nrows=2, ncols=2)
The code:
fig = plt.figure()
axes = fig.subplots(nrows=2, ncols=2)
did not work in matplotlib versions before 2.1, because subplots() was then only a function in pyplot, not a method of the Figure object.
Iterating through all subplots sequentially:
fig, axes = plt.subplots(nrows, ncols)
for ax in axes.flatten():
    ax.plot(x,y)
Accessing a specific index:
for row in range(nrows):
    for col in range(ncols):
        axes[row,col].plot(x[row], y[col])
Subplots with pandas
This answer is for subplots with pandas, which uses matplotlib as the default plotting backend.
Here are four options to create subplots starting with a pandas.DataFrame
Implementation 1. and 2. are for the data in a wide format, creating subplots for each column.
Implementation 3. and 4. are for data in a long format, creating subplots for each unique value in a column.
Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.3, seaborn 0.11.2
Imports and Data
import seaborn as sns # data only
import pandas as pd
import matplotlib.pyplot as plt
# wide dataframe
df = sns.load_dataset('planets').iloc[:, 2:5]
   orbital_period   mass  distance
0         269.300   7.10     77.40
1         874.774   2.21     56.95
2         763.000   2.60     19.84
3         326.030  19.40    110.62
4         516.220  10.50    119.47
# long dataframe
dfm = sns.load_dataset('planets').iloc[:, 2:5].melt()
         variable    value
0  orbital_period  269.300
1  orbital_period  874.774
2  orbital_period  763.000
3  orbital_period  326.030
4  orbital_period  516.220
1. subplots=True and layout, for each column
Use the parameters subplots=True and layout=(rows, cols) in pandas.DataFrame.plot
This example uses kind='density', but there are different options for kind, and this applies to them all. Without specifying kind, a line plot is the default.
axes is an array of AxesSubplot objects returned by pandas.DataFrame.plot
See How to get a Figure object, if needed.
How to save pandas subplots
axes = df.plot(kind='density', subplots=True, layout=(2, 2), sharex=False, figsize=(10, 6))
# extract the figure object; only used for tight_layout in this example
fig = axes[0][0].get_figure()
# set the individual titles
for ax, title in zip(axes.ravel(), df.columns):
    ax.set_title(title)
fig.tight_layout()
plt.show()
2. plt.subplots, for each column
Create an array of Axes with matplotlib.pyplot.subplots and then pass axes[i, j] or axes[n] to the ax parameter.
This option uses pandas.DataFrame.plot, but can use other axes level plot calls as a substitute (e.g. sns.kdeplot, plt.plot, etc.)
It's easiest to collapse the subplot array of Axes into one dimension with .ravel or .flatten. See .ravel vs .flatten.
Any variables that apply to each axes and need to be iterated through are combined with zip (e.g. cols, axes, colors, palette, etc.). Each object must be the same length.
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 6)) # define the figure and subplots
axes = axes.ravel() # array to 1D
cols = df.columns # create a list of dataframe columns to use
colors = ['tab:blue', 'tab:orange', 'tab:green'] # list of colors for each subplot, otherwise all subplots will be one color
for col, color, ax in zip(cols, colors, axes):
    df[col].plot(kind='density', ax=ax, color=color, label=col, title=col)
    ax.legend()
fig.delaxes(axes[3]) # delete the empty subplot
fig.tight_layout()
plt.show()
Result for 1. and 2.
3. plt.subplots, for each group in .groupby
This is similar to 2., except it zips color and axes to a .groupby object.
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 6)) # define the figure and subplots
axes = axes.ravel() # array to 1D
dfg = dfm.groupby('variable') # get data for each unique value in the first column
colors = ['tab:blue', 'tab:orange', 'tab:green'] # list of colors for each subplot, otherwise all subplots will be one color
for (group, data), color, ax in zip(dfg, colors, axes):
    data.plot(kind='density', ax=ax, color=color, title=group, legend=False)
fig.delaxes(axes[3]) # delete the empty subplot
fig.tight_layout()
plt.show()
4. seaborn figure-level plot
Use a seaborn figure-level plot, and use the col or row parameter. seaborn is a high-level API for matplotlib. See seaborn: API reference
p = sns.displot(data=dfm, kind='kde', col='variable', col_wrap=2, x='value', hue='variable',
facet_kws={'sharey': False, 'sharex': False}, height=3.5, aspect=1.75)
sns.move_legend(p, "upper left", bbox_to_anchor=(.55, .45))
Convert the axes array to 1D
Generating subplots with plt.subplots(nrows, ncols), where both nrows and ncols are greater than 1, returns a nested array of <AxesSubplot:> objects.
It’s not necessary to flatten axes in cases where either nrows=1 or ncols=1, because axes will already be 1 dimensional, which is a result of the default parameter squeeze=True
The easiest way to access the objects, is to convert the array to 1 dimension with .ravel(), .flatten(), or .flat.
.ravel vs. .flatten
flatten always returns a copy.
ravel returns a view of the original array whenever possible.
Once the array of axes is converted to 1-d, there are a number of ways to plot.
This answer is relevant to seaborn axes-level plots, which have the ax= parameter (e.g. sns.barplot(…, ax=ax[0])).
seaborn is a high-level API for matplotlib. See Figure-level vs. axes-level functions and seaborn is not plotting within defined subplots
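The view-versus-copy distinction between .ravel and .flatten noted above can be checked directly with NumPy. This small sketch uses a plain integer array in place of an array of Axes; the variable names are made up for illustration.

```python
import numpy as np

grid = np.arange(4).reshape(2, 2)  # stand-in for a 2 x 2 array of Axes

flat_copy = grid.flatten()  # always a new array
flat_view = grid.ravel()    # a view here, because `grid` is contiguous

print(np.shares_memory(grid, flat_copy))  # False
print(np.shares_memory(grid, flat_view))  # True
```

For an array of Axes the difference rarely matters in practice, since both results hold references to the same Axes objects.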
import matplotlib.pyplot as plt
import numpy as np # sample data only
# example of data
rads = np.arange(0, 2*np.pi, 0.01)
y_data = np.array([np.sin(t*rads) for t in range(1, 5)])
x_data = [rads, rads, rads, rads]
# Generate figure and its subplots
fig, axes = plt.subplots(nrows=2, ncols=2)
# axes before
array([[<AxesSubplot:>, <AxesSubplot:>],
[<AxesSubplot:>, <AxesSubplot:>]], dtype=object)
# convert the array to 1 dimension
axes = axes.ravel()
# axes after
array([<AxesSubplot:>, <AxesSubplot:>, <AxesSubplot:>, <AxesSubplot:>],
dtype=object)
Iterate through the flattened array
If there are more subplots than data, this will result in IndexError: list index out of range
Try option 3. instead, or select a subset of the axes (e.g. axes[:-2])
for i, ax in enumerate(axes):
    ax.plot(x_data[i], y_data[i])
Access each axes by index
axes[0].plot(x_data[0], y_data[0])
axes[1].plot(x_data[1], y_data[1])
axes[2].plot(x_data[2], y_data[2])
axes[3].plot(x_data[3], y_data[3])
Index the data and axes
for i in range(len(x_data)):
    axes[i].plot(x_data[i], y_data[i])
zip the axes and data together and then iterate through the list of tuples.
for ax, x, y in zip(axes, x_data, y_data):
    ax.plot(x, y)
Output
An option is to assign each axes to a variable, fig, (ax1, ax2, ax3) = plt.subplots(1, 3). However, as written, this only works in cases with either nrows=1 or ncols=1. This is based on the shape of the array returned by plt.subplots, and quickly becomes cumbersome.
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2) for a 2 x 2 array.
This option is most useful for two subplots (e.g.: fig, (ax1, ax2) = plt.subplots(1, 2) or fig, (ax1, ax2) = plt.subplots(2, 1)). For more subplots, it's more efficient to flatten and iterate through the array of axes.
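How the unpacking has to be written depends on the shape of the array that plt.subplots returns, which in turn depends on the squeeze parameter. A short sketch of the three cases (the variable names are illustrative only):

```python
import matplotlib.pyplot as plt

fig1, axes_row = plt.subplots(1, 3)                # squeezed to 1-D
fig2, axes_grid = plt.subplots(2, 2)               # stays 2-D
fig3, axes_2d = plt.subplots(1, 3, squeeze=False)  # forced to stay 2-D

print(axes_row.shape)   # (3,)
print(axes_grid.shape)  # (2, 2)
print(axes_2d.shape)    # (1, 3)
```

Passing squeeze=False can be convenient in code that should handle any grid size, because the result can then always be indexed as axes[row, col].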
You could use the following:
import numpy as np
import matplotlib.pyplot as plt
fig, _ = plt.subplots(nrows=2, ncols=2)
for i, ax in enumerate(fig.axes):
    ax.plot(np.sin(np.linspace(0,2*np.pi,100) + np.pi/2*i))
Or alternatively, using the second variable that plt.subplots returns:
fig, ax_mat = plt.subplots(nrows=2, ncols=2)
for i, ax in enumerate(ax_mat.flatten()):
    ...
ax_mat is a matrix of the axes. Its shape is nrows x ncols.
Here is a simple solution:
fig, ax = plt.subplots(nrows=2, ncols=3, sharex=True, sharey=False)
for sp in fig.axes:
    sp.plot(range(10))
Go with the following if you really want to use a loop:
def plot(data):
    fig = plt.figure(figsize=(100, 100))
    for idx, k in enumerate(data.keys(), 1):
        x, y = data[k].keys(), data[k].values
        plt.subplot(63, 10, idx)
        plt.bar(x, y)
    plt.show()
Another concise solution is:
# set up structure of plots
f, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(20,10))
# for plot 1
ax1.set_title('Title A')
ax1.plot(x, y)
# for plot 2
ax2.set_title('Title B')
ax2.plot(x, y)
# for plot 3
ax3.set_title('Title C')
ax3.plot(x,y)

Combine bar plot and line plot in seaborn [duplicate]

I have dataframe like this:
df_meshX_min_select = pd.DataFrame({
'Number of Elements' : [5674, 8810,13366,19751,36491],
'Time (a)' : [42.14, 51.14, 55.64, 55.14, 56.64],
'Different Result(Temperature)' : [0.083849, 0.057309, 0.055333, 0.060516, 0.035343]})
and I tried to combine a bar plot (Number of Elements vs. Different Result) and a line plot (Number of Elements vs. Time) in the same figure, but ran into the following problem:
It seems that the x values don't match when the two plots are combined, but as the data frame shows, the x values are exactly the same.
My expectation is to combine these two plots into one figure:
and this is the code that I made:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
df_meshX_min_select = pd.DataFrame({
'Number of Elements' : [5674, 8810,13366,19751,36491],
'Time (a)' : [42.14, 51.14, 55.64, 55.14, 56.64],
'Different Result(Temperature)' : [0.083849, 0.057309, 0.055333, 0.060516, 0.035343]})
x1= df_meshX_min_select["Number of Elements"]
t1= df_meshX_min_select["Time (a)"]
T1= df_meshX_min_select["Different Result(Temperature)"]
#Create combo chart
fig, ax1 = plt.subplots(figsize=(10,6))
color = 'tab:green'
#bar plot creation
ax1.set_title('Mesh Analysis', fontsize=16)
ax1.set_xlabel('Number of elements', fontsize=16)
ax1.set_ylabel('Different Result(Temperature)', fontsize=16)
ax1 = sns.barplot(x='Number of Elements', y='Different Result(Temperature)', data = df_meshX_min_select)
ax1.tick_params(axis='y')
#specify we want to share the same x-axis
ax2 = ax1.twinx()
color = 'tab:red'
#line plot creation
ax2.set_ylabel('Time (a)', fontsize=16)
ax2 = sns.lineplot(x='Number of Elements', y='Time (a)', data = df_meshX_min_select, sort=False, color=color, ax=ax2)
ax2.tick_params(axis='y', color=color)
#show plot
plt.show()
Can anyone help me, please?
Seaborn and pandas use a categorical x-axis for bar plots (internally numbered 0,1,2,...) and floating-point numbers for a line plot. Note that your x-values aren't evenly spaced, so either the bars would have weird distances between them, or wouldn't align with the x-values from the line plot.
Here is a solution using standard matplotlib to combine both graphs.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
df_meshx_min_select = pd.DataFrame({
'number of elements': [5674, 8810, 13366, 19751, 36491],
'time (a)': [42.14, 51.14, 55.64, 55.14, 56.64],
'different result(temperature)': [0.083849, 0.057309, 0.055333, 0.060516, 0.035343]})
x1 = df_meshx_min_select["number of elements"]
t1 = df_meshx_min_select["time (a)"]
d1 = df_meshx_min_select["different result(temperature)"]
fig, ax1 = plt.subplots(figsize=(10, 6))
color = 'limegreen'
ax1.set_title('mesh analysis', fontsize=16)
ax1.set_xlabel('number of elements', fontsize=16)
ax1.set_ylabel('different result(temperature)', fontsize=16, color=color)
ax1.bar(x1, height=d1, width=2000, color=color)
ax1.tick_params(axis='y', colors=color)
ax2 = ax1.twinx() # share the x-axis, new y-axis
color = 'crimson'
ax2.set_ylabel('time (a)', fontsize=16, color=color)
ax2.plot(x1, t1, color=color)
ax2.tick_params(axis='y', colors=color)
plt.show()
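If evenly spaced bars are preferred, as in a categorical bar plot, another option is to draw both the bars and the line against integer positions and use the element counts only as tick labels. This is a sketch of that variation, not part of the answer above; the positions variable is introduced here for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Number of Elements': [5674, 8810, 13366, 19751, 36491],
    'Time (a)': [42.14, 51.14, 55.64, 55.14, 56.64],
    'Different Result(Temperature)': [0.083849, 0.057309, 0.055333, 0.060516, 0.035343]})

positions = list(range(len(df)))  # 0, 1, 2, ... shared by both plots

fig, ax1 = plt.subplots(figsize=(10, 6))
ax1.bar(positions, df['Different Result(Temperature)'], color='limegreen')
ax1.set_xticks(positions)
ax1.set_xticklabels(df['Number of Elements'].astype(str).tolist())
ax1.set_xlabel('Number of elements')
ax1.set_ylabel('Different Result(Temperature)')

ax2 = ax1.twinx()  # second y-axis on the same x positions
ax2.plot(positions, df['Time (a)'], color='crimson', marker='o')
ax2.set_ylabel('Time (a)')
```

The trade-off is that the x-axis no longer reflects the true spacing between the element counts.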
I was plotting a boxplot with a lineplot and had the same problem even though my two x-axes were identical, so I solved it by converting my x-axis feature to the string type:
df_meshX_min_select['Number of Elements'] = df_meshX_min_select['Number of Elements'].astype('string')
This way the plot works using seaborn:

Plot data from Excel in Python

The code I have to read and plot data from my excel file is this:
import pandas as pd
import matplotlib.pyplot as plt
excel_file = 'file1.xlsx'
file1 = pd.read_excel(excel_file)
file1.head()
plt.plot(x,y1,y2)
plt.xlabel('wavelenghts')
plt.ylabel('reflectivity')
plt.legend(loc='upper left')
plt.show
It works.
The questions are:
I have more columns, but when I want to add y3, y4,... I get the error that y3 is undefined.
In legend I want to change the name of y1 to CK4/5-PCA82500 and others as well. Is there any way to do it?
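x, y1, y2, ... are columns of the DataFrame, not standalone variables, so access them through file1. Pass label= to each plot call to set the name shown in the legend:
fig, ax = plt.subplots()
plt.plot(file1.x,file1.y1,label='CK4/5-PCA82500')
plt.plot(file1.x,file1.y2)
plt.plot(file1.x,file1.y3)
.....
plt.xlabel('wavelengths')
plt.ylabel('reflectivity')
plt.legend(loc='upper left')
plt.show()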
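With many columns, a loop over a mapping from column name to legend name avoids repeating the plot call. The sketch below is self-contained, so it uses a small made-up DataFrame in place of pd.read_excel('file1.xlsx'); the column names x, y1, y2, y3 follow the snippet above, and every legend name except CK4/5-PCA82500 is a placeholder.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for: file1 = pd.read_excel('file1.xlsx')
file1 = pd.DataFrame({
    'x': [400, 500, 600, 700],
    'y1': [0.10, 0.20, 0.30, 0.25],
    'y2': [0.15, 0.22, 0.28, 0.30],
    'y3': [0.05, 0.12, 0.18, 0.20],
})

# Column name -> legend name; only the first name comes from the question
legend_names = {
    'y1': 'CK4/5-PCA82500',
    'y2': 'sample 2',  # placeholder
    'y3': 'sample 3',  # placeholder
}

fig, ax = plt.subplots()
for column, name in legend_names.items():
    ax.plot(file1['x'], file1[column], label=name)

ax.set_xlabel('wavelengths')
ax.set_ylabel('reflectivity')
ax.legend(loc='upper left')
```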

Why is Python matplot not starting from the point where my Data starts [duplicate]

I am currently learning how to import data and work with it in matplotlib, and I am having trouble even though I have the exact code from the book.
This is what the plot looks like, but my question is how I can get rid of the white space at the start and the end of the x-axis.
Here is the code:
import csv
from matplotlib import pyplot as plt
from datetime import datetime
# Get dates and high temperatures from file.
filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    #for index, column_header in enumerate(header_row):
        #print(index, column_header)
    dates, highs = [], []
    for row in reader:
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        dates.append(current_date)
        high = int(row[1])
        highs.append(high)
# Plot data.
fig = plt.figure(dpi=128, figsize=(10,6))
plt.plot(dates, highs, c='red')
# Format plot.
plt.title("Daily high temperatures, July 2014", fontsize=24)
plt.xlabel('', fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature (F)", fontsize=16)
plt.tick_params(axis='both', which='major', labelsize=16)
plt.show()
Matplotlib adds an automatic margin at the edges so that the data fits nicely within the axis spines. In this case such a margin is probably desirable on the y axis. By default it is set to 0.05, in units of the axis span.
To set the margin to 0 on the x axis, use
plt.margins(x=0)
or
ax.margins(x=0)
depending on the context. Also see the documentation.
In case you want to get rid of the margin in the whole script, you can use
plt.rcParams['axes.xmargin'] = 0
at the beginning of your script (same for y of course). If you want to get rid of the margin entirely and forever, you might want to change the corresponding lines in the matplotlib rc file:
axes.xmargin : 0
axes.ymargin : 0
Example
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset('tips')
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
tips.plot(ax=ax1, title='Default Margin')
tips.plot(ax=ax2, title='Margins: x=0')
ax2.margins(x=0)
Alternatively, use plt.xlim(..) or ax.set_xlim(..) to manually set the limits of the axes such that there is no white space left.
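For a date axis like the one in the question, setting the limits manually might look like the following sketch. The temperatures are made-up values standing in for the contents of the CSV file, so only the set_xlim call is the point here.

```python
from datetime import datetime, timedelta

import matplotlib.pyplot as plt

# Made-up stand-in for the dates and highs read from the CSV file
dates = [datetime(2014, 7, 1) + timedelta(days=i) for i in range(31)]
highs = [60 + (i % 7) for i in range(31)]

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(dates, highs, c='red')

# Clamp the x-axis to the first and last date so no white space remains
ax.set_xlim(dates[0], dates[-1])
fig.autofmt_xdate()
```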
If you only want to remove the margin on one side but not the other, e.g. remove the margin from the right but not from the left, you can use set_xlim() on a matplotlib axes object.
import seaborn as sns
import matplotlib.pyplot as plt
import math
max_x_value = 100
x_values = [i for i in range (1, max_x_value + 1)]
y_values = [math.log(i) for i in x_values]
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
sns.lineplot(ax=ax1, x=x_values, y=y_values)
sns.lineplot(ax=ax2, x=x_values, y=y_values)
ax2.set_xlim(-5, max_x_value) # tune the -5 to your needs
