Is there a way to plot 2x Standard Deviation in Seaborn? - python-3.x

For Seaborn lineplot, it seems pretty easy to plot the Standard Deviation by specifying ci='sd'. Is there a way to plot 2 times the standard deviation?
For example, I have a graph like this:
sns.lineplot(data=df, ax=x, x='day_of_week', y='y_variable', color='lightgrey', ci='sd')
Is there a way to make it so the "CI" plotted is 2 times the standard deviation?

I didn't find a solution within seaborn itself, but a workaround is to use matplotlib.pyplot.fill_between, as was done, e.g., in this answer and in the thread suggested in the comments.
Here is my implementation:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()
flights = sns.load_dataset("flights")
fig, axs = plt.subplots(1, 2, figsize=(12, 6), sharey=True)
sns.lineplot(data=flights, x="year", y = "passengers", ci="sd", ax=axs[0])
axs[0].set_title("seaborn")
means = flights.groupby("year")["passengers"].mean()
stds = flights.groupby("year")["passengers"].std()
axs[1].plot(means.index, means.values)
# shade mean +/- 1, 2 and 3 standard deviations
for nstd in range(1, 4):
    axs[1].fill_between(means.index, (means - nstd*stds).values, (means + nstd*stds).values, alpha=0.3, label="nstd={}".format(nstd))
axs[1].legend(loc="upper left")
axs[1].set_title("homemade")
plt.savefig("./tmp/flights.png")
plt.close(fig)
The resulting figure is
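The same idea can be wrapped in a small helper that takes the multiplier as a parameter. This is only a sketch under stated assumptions: the function name plot_mean_with_sd_band and the synthetic data are made up for illustration, and it uses plain NumPy instead of the flights dataset so that it runs without seaborn.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to show windows
import matplotlib.pyplot as plt
import numpy as np


def plot_mean_with_sd_band(ax, x, samples, k=2.0, **fill_kwargs):
    """Plot the per-x mean and shade mean +/- k standard deviations.

    `samples` has shape (len(x), n_observations): one row of observations
    per x value. The name and signature are hypothetical.
    """
    samples = np.asarray(samples, dtype=float)
    means = samples.mean(axis=1)
    # ddof=1 is the sample standard deviation, matching pandas' default .std()
    stds = samples.std(axis=1, ddof=1)
    lower, upper = means - k * stds, means + k * stds
    ax.plot(x, means)
    ax.fill_between(x, lower, upper, alpha=0.3,
                    label="mean +/- {:g} sd".format(k), **fill_kwargs)
    return lower, upper


# Synthetic stand-in data (an assumption, not the flights dataset):
# 12 observations for each of 12 years, with an increasing mean.
rng = np.random.default_rng(0)
years = np.arange(1949, 1961)
centers = np.linspace(100, 400, years.size)[:, None]
observations = rng.normal(loc=centers, scale=30.0, size=(years.size, 12))

fig, ax = plt.subplots()
lower, upper = plot_mean_with_sd_band(ax, years, observations, k=2.0)
ax.legend(loc="upper left")
plt.close(fig)
```

If upgrading is an option, seaborn 0.12 and later document an errorbar parameter that accepts a tuple such as errorbar=("sd", 2), which may make the workaround unnecessary.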

Related

making multiple plot at the same time in python3

I have a list and a NumPy array like the following:
Neg = [37.972200755611425, 32.14963079785344]
Pos = array([[15.24373185, 13.66099865, 11.86959384, 9.72792045, 7.12928302, 6.04439412],[14.5235007 , 13. , 11.1792871 , 9.14974712, 6.4429435 , 5.04439412]])
Both Neg and Pos have 2 elements (in this example), so I would like to make 2 separate plots (PDF files), one per element.
Each plot should contain 2 lines:
1- a line plot of all the values in the corresponding sub-list of Pos.
2- a horizontal line at the y-value given by the corresponding element of Neg.
I am trying to do this in a for loop over all elements. I wrote the following code in Python, but it does not produce what I would like to get. Do you know how to fix it?
for i in range(len(Neg)):
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    ax.plot(concentration, Pos[i], label='gg')
    plt.axhline(y=Neg[i], color='b', linestyle='-')
    ax.legend()
    ax.set_xlabel("log2 concentration")
    ax.set_ylabel("log2 raw counts")
    ax.set_ylim(0, 40)
    plt.savefig(f'{i}.pdf')
Not quite sure exactly what you want but this code creates two subplots of the data in the way I think you're describing it:
import numpy as np
from matplotlib import pyplot as plt
Neg = [37.972200755611425, 32.14963079785344]
Pos = np.array([[15.24373185, 13.66099865, 11.86959384, 9.72792045, 7.12928302, 6.04439412],[14.5235007 , 13. , 11.1792871 , 9.14974712, 6.4429435 , 5.04439412]])
fig = plt.figure()
for i in range(len(Neg)):
    ax = fig.add_subplot(2,1,i+1)
    ax.plot(Pos[i], label='gg')
    plt.axhline(y=Neg[i], color='b', linestyle='-')
    ax.legend()
    ax.set_xlabel("log2 concentration")
    ax.set_ylabel("log2 raw counts")
    ax.set_ylim(0, 40)
    plt.subplots_adjust(hspace=1.0)
    extent = ax.get_window_extent().transformed(fig.dpi_scale_trans.inverted())
    fig.savefig(f'{i}.pdf', bbox_inches=extent.expanded(1.2, 1.9))
Edited the code to save each subplot individually to file by grabbing a specific part of the plot for saving, as used in this question: Save a subplot in matplotlib.
Also included some additional spacing between each subplot by calling subplots_adjust(), so that each subplot can be saved to individual files without any detail from the other subplots being included. This might not be the best way of doing what you want, but I think it will do what you want now.
Alternatively, if you're not set on using subplots, you could always just use a plot per element:
for i in range(len(Neg)):
    fig = plt.figure()  # start a fresh figure for each element
    plt.plot(Pos[i], label='gg')
    plt.axhline(y=Neg[i], color='b', linestyle='-')
    plt.legend()
    plt.xlabel("log2 concentration")
    plt.ylabel("log2 raw counts")
    plt.ylim(0, 40)
    fig.savefig(f'{i}.pdf')
    plt.show()
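For completeness, here is a sketch of the one-file-per-element loop written against the Axes API, with the x-values from the question restored and each figure closed after saving. The question never shows concentration, so the values used here are placeholders, and the helper name save_one_plot_per_element and the temporary output directory are assumptions made for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to show windows
import matplotlib.pyplot as plt
import numpy as np

Neg = [37.972200755611425, 32.14963079785344]
Pos = np.array([[15.24373185, 13.66099865, 11.86959384, 9.72792045, 7.12928302, 6.04439412],
                [14.5235007, 13.0, 11.1792871, 9.14974712, 6.4429435, 5.04439412]])
# Placeholder x-values: the real `concentration` is not given in the question.
concentration = np.arange(Pos.shape[1])


def save_one_plot_per_element(neg, pos, x, out_dir):
    """Write one PDF per (neg[i], pos[i]) pair and return the file paths."""
    paths = []
    for i, (baseline, series) in enumerate(zip(neg, pos)):
        fig, ax = plt.subplots()  # a new figure on every iteration
        ax.plot(x, series, label="Pos")
        ax.axhline(y=baseline, color="b", linestyle="-", label="Neg")
        ax.set_xlabel("log2 concentration")
        ax.set_ylabel("log2 raw counts")
        ax.set_ylim(0, 40)
        ax.legend()
        path = os.path.join(out_dir, f"{i}.pdf")
        fig.savefig(path)
        plt.close(fig)  # free the figure so plots do not accumulate
        paths.append(path)
    return paths


out_dir = tempfile.mkdtemp()
paths = save_one_plot_per_element(Neg, Pos, concentration, out_dir)
```

Creating the figure inside the loop and closing it after saving keeps the lines from one element out of the next file, whichever backend is in use.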

Using python and networkx to find the probability density function

I'm struggling to draw a power law graph for Facebook Data that I found online. I'm using Networkx and I've found how to draw a Degree Histogram and a degree rank. The problem that I'm having is I want the y axis to be a probability so I'm assuming I need to sum up each y value and divide by the total number of nodes? Can anyone please help me do this? Once I've got this I'd like to draw a log-log graph to see if I can obtain a straight line. I'd really appreciate it if anyone could help! Here's my code:
import collections
import networkx as nx
import matplotlib.pyplot as plt
from networkx.algorithms import community
import math
import pylab as plt
g = nx.read_edgelist("/Users/Michael/Desktop/anaconda3/facebook_combined.txt","r")
nx.info(g)
degree_sequence = sorted([d for n, d in g.degree()], reverse=True)
degreeCount = collections.Counter(degree_sequence)
deg, cnt = zip(*degreeCount.items())
fig, ax = plt.subplots()
plt.bar(deg, cnt, width=0.80, color='b')
plt.title("Degree Histogram for Facebook Data")
plt.ylabel("Count")
plt.xlabel("Degree")
ax.set_xticks([d + 0.4 for d in deg])
ax.set_xticklabels(deg)
plt.show()
plt.loglog(degree_sequence, 'b-', marker='o')
plt.title("Degree rank plot")
plt.ylabel("Degree")
plt.xlabel("Rank")
plt.show()
You seem to be on the right track, but some simplifications will likely help you. The code below uses only 2 libraries.
Without access to your graph, we can use some graph generators instead. I've chosen 2 qualitatively different types here, and deliberately chosen different sizes so that the normalization of the histogram is needed.
import networkx as nx
import matplotlib.pyplot as plt
g1 = nx.scale_free_graph(1000)
g2 = nx.watts_strogatz_graph(2000, 6, p=0.8)
# we don't need to sort the values since the histogram will handle it for us
# (g.degree() yields (node, degree) pairs in networkx 2.x and later)
deg_g1 = [d for _, d in g1.degree()]
deg_g2 = [d for _, d in g2.degree()]
# there are smarter ways to choose bin locations, but since
# degrees must be discrete, we can be lazy...
max_degree = max(deg_g1 + deg_g2)
# integer bin edges; +2 so that the largest degree gets its own bin
bins = range(0, max_degree + 2)
# plot different styles to see both
fig = plt.figure()
ax = fig.add_subplot(111)
ax.hist(deg_g1, bins=bins, density=True, histtype='bar', rwidth=0.8, label='g1')
ax.hist(deg_g2, bins=bins, density=True, histtype='step', lw=3, label='g2')
# setup the axes to be log/log scaled
ax.set_yscale('log')
ax.set_xscale('log')
ax.set_xlabel('degree')
ax.set_ylabel('relative density')
ax.legend()
plt.show()
This produces an output plot like this (both g1,g2 are randomised so won't be identical):
Here we can see that g1 has an approximately straight line decay in the degree distribution -- as expected for scale-free distributions on log-log axes. Conversely, g2 does not have a scale-free degree distribution.
To say anything more formal, you could look at the toolboxes from Aaron Clauset: http://tuvalu.santafe.edu/~aaronc/powerlaws/ which implement model fitting and statistical testing of power-law distributions.
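The normalization asked about in the question (each count divided by the total number of nodes) can also be done by hand. This is a minimal sketch, assuming a plain list of degrees. The helper name degree_distribution and the toy degree sequence are hypothetical; with a networkx graph, the list would come from [d for _, d in g.degree()].

```python
from collections import Counter


def degree_distribution(degrees):
    """Return the sorted distinct degrees and P(degree) = count / number of nodes."""
    counts = Counter(degrees)
    n_nodes = len(degrees)
    ks = sorted(counts)
    probs = [counts[k] / n_nodes for k in ks]
    return ks, probs


# Toy degree sequence (an assumption, not the Facebook data).
degree_sequence = [1, 1, 1, 1, 2, 2, 3, 5]
ks, probs = degree_distribution(degree_sequence)
print(list(zip(ks, probs)))
# prints [(1, 0.5), (2, 0.25), (3, 0.125), (5, 0.125)]
```

The two lists can then be passed to plt.loglog(ks, probs, 'o') to look for a straight line. Nodes of degree 0 cannot be shown on a logarithmic axis.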

matplotlib, add legend for each line? [duplicate]

TL;DR -> How can one create a legend for a line graph in Matplotlib's PyPlot without creating any extra variables?
Please consider the graphing script below:
if __name__ == '__main__':
    PyPlot.plot(total_lengths, sort_times_bubble, 'b-',
                total_lengths, sort_times_ins, 'r-',
                total_lengths, sort_times_merge_r, 'g+',
                total_lengths, sort_times_merge_i, 'p-', )
    PyPlot.title("Combined Statistics")
    PyPlot.xlabel("Length of list (number)")
    PyPlot.ylabel("Time taken (seconds)")
    PyPlot.show()
As you can see, this is a very basic use of matplotlib's PyPlot. This ideally generates a graph like the one below:
Nothing special, I know. However, it is unclear what data is being plotted where (I'm trying to plot the data of some sorting algorithms, length against time taken, and I'd like to make sure people know which line is which). Thus, I need a legend. However, take a look at the following example (from the official site):
ax = subplot(1,1,1)
p1, = ax.plot([1,2,3], label="line 1")
p2, = ax.plot([3,2,1], label="line 2")
p3, = ax.plot([2,3,1], label="line 3")
handles, labels = ax.get_legend_handles_labels()
# reverse the order
ax.legend(handles[::-1], labels[::-1])
# or sort them by labels
import operator
hl = sorted(zip(handles, labels),
key=operator.itemgetter(1))
handles2, labels2 = zip(*hl)
ax.legend(handles2, labels2)
You will see that I need to create an extra variable ax. How can I add a legend to my graph without having to create this extra variable and retaining the simplicity of my current script?
Add a label= to each of your plot() calls, and then call legend(loc='upper left').
Consider this sample (tested with Python 3.8.0):
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 20, 1000)
y1 = np.sin(x)
y2 = np.cos(x)
plt.plot(x, y1, "-b", label="sine")
plt.plot(x, y2, "-r", label="cosine")
plt.legend(loc="upper left")
plt.ylim(-1.5, 2.0)
plt.show()
Slightly modified from this tutorial: http://jakevdp.github.io/mpl_tutorial/tutorial_pages/tut1.html
You can access the Axes instance (ax) with plt.gca(). In this case, you can use
plt.gca().legend()
You can do this either by using the label= keyword in each of your plt.plot() calls or by assigning your labels as a tuple or list within legend, as in this working example:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-0.75,1,100)
y0 = np.exp(2 + 3*x - 7*x**3)
y1 = 7-4*np.sin(4*x)
plt.plot(x,y0,x,y1)
plt.gca().legend(('y0','y1'))
plt.show()
However, if you need to access the Axes instance more than once, I do recommend saving it to the variable ax with
ax = plt.gca()
and then calling ax instead of plt.gca().
Here's an example to help you out ...
fig = plt.figure(figsize=(10,5))
ax = fig.add_subplot(111)
ax.set_title('ADR vs Rating (CS:GO)')
ax.scatter(x=data[:,0],y=data[:,1],label='Data')
plt.plot(data[:,0], m*data[:,0] + b,color='red',label='Our Fitting Line')
ax.set_xlabel('ADR')
ax.set_ylabel('Rating')
ax.legend(loc='best')
plt.show()
You can add a custom legend (see the documentation):
first = [1, 2, 4, 5, 4]
second = [3, 4, 2, 2, 3]
plt.plot(first, 'g--', second, 'r--')
plt.legend(['First List', 'Second List'], loc='upper left')
plt.show()
A simple plot for sine and cosine curves with a legend.
Used matplotlib.pyplot
import math
import matplotlib.pyplot as plt
x=[]
for i in range(-314,314):
    x.append(i/100)
ysin=[math.sin(i) for i in x]
ycos=[math.cos(i) for i in x]
plt.plot(x,ysin,label='sin(x)') #specify label for the corresponding curve
plt.plot(x,ycos,label='cos(x)')
plt.xticks([-3.14,-1.57,0,1.57,3.14],[r'-$\pi$',r'-$\pi$/2',0,r'$\pi$/2',r'$\pi$'])  # raw strings keep the backslashes intact
plt.legend()
plt.show()
Add labels to each argument in your plot call corresponding to the series it is graphing, i.e. label = "series 1"
Then simply add Pyplot.legend() to the bottom of your script and the legend will display these labels.

Python matplotlib graphing [duplicate]

I need help with setting the limits of y-axis on matplotlib. Here is the code that I tried, unsuccessfully.
import matplotlib.pyplot as plt
plt.figure(1, figsize = (8.5,11))
plt.suptitle('plot title')
ax = []
aPlot = plt.subplot(321, axisbg = 'w', title = "Year 1")
ax.append(aPlot)
plt.plot(paramValues,plotDataPrice[0], color = '#340B8C',
marker = 'o', ms = 5, mfc = '#EB1717')
plt.xticks(paramValues)
plt.ylabel('Average Price')
plt.xlabel('Mark-up')
plt.grid(True)
plt.ylim((25,250))
With the data I have for this plot, I get y-axis limits of 20 and 200. However, I want the limits 20 and 250.
Get current axis via plt.gca(), and then set its limits:
ax = plt.gca()
ax.set_xlim([xmin, xmax])
ax.set_ylim([ymin, ymax])
One thing you can do is to set your axis range by yourself by using matplotlib.pyplot.axis.
matplotlib.pyplot.axis
from matplotlib import pyplot as plt
plt.axis([0, 10, 0, 20])
Here 0, 10 is the x-axis range and 0, 20 is the y-axis range.
Alternatively, you can use matplotlib.pyplot.xlim or matplotlib.pyplot.ylim:
matplotlib.pyplot.ylim
plt.ylim(-2, 2)
plt.xlim(0,10)
Another workaround is to get the plot's axes and reassign changing only the y-values:
x1,x2,y1,y2 = plt.axis()
plt.axis((x1,x2,25,250))
You can instantiate an object from matplotlib.pyplot.axes and call the set_ylim() on it. It would be something like this:
import matplotlib.pyplot as plt
axes = plt.axes()
axes.set_ylim([0, 1])
Just for fine tuning: if you want to set only one of the boundaries of the axis and leave the other boundary unchanged, you can choose one or more of the following statements
plt.xlim(right=xmax) #xmax is your value
plt.xlim(left=xmin) #xmin is your value
plt.ylim(top=ymax) #ymax is your value
plt.ylim(bottom=ymin) #ymin is your value
Take a look at the documentation for xlim and for ylim
This worked at least in matplotlib version 2.2.2:
plt.axis([None, None, 0, 100])
Probably this is a nice way to set up for example xmin and ymax only, etc.
To add to @Hima's answer, if you want to modify a current x or y limit you could use the following.
import numpy as np # you probably already do this so no extra overhead
fig, axes = plt.subplots()
axes.plot(data[:,0], data[:,1])
xlim = axes.get_xlim()
# example of how to zoomout by a factor of 0.1
factor = 0.1
new_xlim = (xlim[0] + xlim[1])/2 + np.array((-0.5, 0.5)) * (xlim[1] - xlim[0]) * (1 + factor)
axes.set_xlim(new_xlim)
I find this particularly useful when I want to zoom out or zoom in just a little from the default plot settings.
This should work. Your code works for me, like for Tamás and Manoj Govindan. It looks like you could try to update Matplotlib. If you can't update Matplotlib (for instance if you have insufficient administrative rights), maybe using a different backend with matplotlib.use() could help.

Second y-axis and overlapping labeling?

I am using Python for a simple time-series analysis of calorie intake. I am plotting the time series and the rolling mean/std over time. It looks like this:
Here is how I do it:
## packages & libraries
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
from pandas import Series, DataFrame  # Panel was removed in pandas 1.0
## import data and set time series structure
data = pd.read_csv('time_series_calories.csv', parse_dates={'dates': ['year','month','day']}, index_col=0)
## check ts for stationarity
from statsmodels.tsa.stattools import adfuller
def test_stationarity(timeseries):
    # Determine rolling statistics
    # (pd.rolling_mean / pd.rolling_std were removed from pandas; use .rolling())
    rolmean = timeseries.rolling(window=14).mean()
    rolstd = timeseries.rolling(window=14).std()
    # Plot rolling statistics:
    orig = plt.plot(timeseries, color='blue',label='Original')
    mean = plt.plot(rolmean, color='red', label='Rolling Mean')
    std = plt.plot(rolstd, color='black', label = 'Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show()
The plot doesn't look good: the rolling std distorts the scale of variation, and the x-axis labels overlap. I have two questions: (1) How can I plot the rolling std on a second y-axis? (2) How can I fix the overlapping x-axis labels?
EDIT
With your help I managed to get the following:
But how do I get the legend sorted out?
1) Making a second (twin) axis can be done with ax2 = ax1.twinx(), see here for an example. Is this what you needed?
2) I believe there are several old answers to this question, i.e. here, here and here. According to the links provided, the easiest way is probably to use either plt.xticks(rotation=70) or plt.setp( ax.xaxis.get_majorticklabels(), rotation=70 ) or fig.autofmt_xdate().
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
plt.xticks(rotation=70) # Either this
ax.set_xticks([1, 2, 3, 4, 5])
ax.set_xticklabels(['aaaaaaaaaaaaaaaa','bbbbbbbbbbbbbbbbbb','cccccccccccccccccc','ddddddddddddddddddd','eeeeeeeeeeeeeeeeee'])
# fig.autofmt_xdate() # or this
# plt.setp( ax.xaxis.get_majorticklabels(), rotation=70 ) # or this works
fig.tight_layout()
plt.show()
Answer to Edit
One way to put lines from different axes into one legend is to create some fake plots in the axis that should hold the legend:
ax1.plot(something, 'r--') # one plot into ax1
ax2.plot(something_else, 'gx') # another into ax2
# create two empty plots into ax1
ax1.plot([], [], 'r--', label='Line 1 from ax1') # empty fake-plot with same lines/markers as first line you want to put in legend
ax1.plot([], [], 'gx', label='Line 2 from ax2') # empty fake-plot as line 2
ax1.legend()
In my silly example it is probably better to label the original plot in ax1, but I hope you get the idea. The important thing is to create the "legend-plots" with the same line and marker settings as the original plots. Note that the fake-plots will not be plotted since there is no data to plot.
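An alternative to the fake plots is to collect the handles and labels from both axes and pass them to a single legend call. The sketch below combines this with ax1.twinx() from point 1. The synthetic series, the window of 14 and the NumPy-based rolling std are assumptions standing in for the calorie data, which is not available here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to show windows
import matplotlib.pyplot as plt
import numpy as np

# Synthetic series standing in for the calorie data (an assumption).
rng = np.random.default_rng(0)
days = np.arange(120)
calories = 2000 + 150 * rng.standard_normal(days.size)

window = 14
# Trailing-window sample standard deviation, computed with plain NumPy.
rolling_std = np.array([calories[i - window:i].std(ddof=1)
                        for i in range(window, days.size + 1)])

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
ax1.plot(days, calories, color="blue", label="Original")
ax2.plot(days[window - 1:], rolling_std, color="black", label="Rolling Std")
ax1.set_ylabel("calories")
ax2.set_ylabel("rolling std")

# Merge the legend entries of both axes into one legend drawn on ax1.
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
legend = ax1.legend(handles1 + handles2, labels1 + labels2, loc="best")
plt.close(fig)
```

This keeps the labels on the real lines, so the legend stays in sync if a line style is changed later.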
