How to use fill_between utilizing the where parameter - python-3.x

So following a tutorial, I tried to create a graph using the following code:
time_values = [i for i in range(1,100)]
execution_time = [random.randint(0,100) for i in range(1,100)]
fig = plt.figure()
ax1 = plt.subplot()
threshold=[.8 for i in range(len(execution_time))]
ax1.plot(time_values, execution_time)
ax1.margins(x=-.49, y=0)
ax1.fill_between(time_values,execution_time, 1,where=(execution_time>1), color='r', alpha=.3)
This did not work as I got an error saying I could not compare a list and an int.
However, I then tried:
ax1.fill_between(time_values,execution_time, 1)
And that gave me a graph with all area in between the execution time and the y=1 line, filled in. Since I want the area above the y=1 line filled in, with the area below left un-shaded, I created a list called threshold, and populated it with 1 so that I could recreate the comparison. However,
ax1.fill_between(time_values,execution_time, 1,where=(execution_time>threshold))
and
ax1.fill_between(time_values,execution_time, 1)
create the exact same graph, even though the execution time values do go beyond 1.
I am confused for two reasons:
Firstly, in the tutorial I was watching, the teacher was able to successfully compare a list and an integer within the fill_between function. Why was I not able to do this?
Secondly, why is the where parameter not identifying the regions I want to fill? I.e., why is the graph shading in all the areas between y=1 and the value of the execution time?

The problem is mainly due to the use of Python lists instead of NumPy arrays. You can use lists, but then you need to build the where condition yourself, element by element, as in the list comprehension below.
import numpy as np
import matplotlib.pyplot as plt
time_values = list(range(1,100))
execution_time = [np.random.randint(0,100) for _ in range(len(time_values))]
threshold = 50
fig, ax = plt.subplots()
ax.plot(time_values, execution_time)
ax.fill_between(time_values, execution_time, threshold,
                where=[e > threshold for e in execution_time],
                color='r', alpha=.3)
ax.set_ylim(0,None)
plt.show()
Better still is to use NumPy arrays throughout. It is not only faster, but also easier to code and understand.
import numpy as np
import matplotlib.pyplot as plt
time_values = np.arange(1,100)
execution_time = np.random.randint(0,100, size=len(time_values))
threshold = 50
fig, ax = plt.subplots()
ax.plot(time_values, execution_time)
ax.fill_between(time_values, execution_time, threshold,
                where=(execution_time > threshold), color='r', alpha=.3)
ax.set_ylim(0,None)
plt.show()
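To see why the attempts in the question behaved as they did, here is a minimal sketch (the sample values are made up for illustration). In Python 3, comparing a list with an int raises a TypeError, and comparing two lists gives a single lexicographic bool rather than one value per element, so where never receives a per-point mask. A NumPy array compares element by element, which is presumably what the tutorial's data was.

```python
import numpy as np

execution_time = [3, 0, 7, 1]          # made-up sample values
threshold = [1] * len(execution_time)

# list > int is not defined in Python 3
try:
    execution_time > 1
    list_vs_int = "no error"
except TypeError:
    list_vs_int = "TypeError"

# list > list compares lexicographically and yields ONE bool
list_vs_list = execution_time > threshold

# array > int compares element-wise and yields one bool per point
mask = np.array(execution_time) > 1

print(list_vs_int)    # TypeError
print(list_vs_list)   # True
print(mask.tolist())  # [True, False, True, False]
```

The element-wise mask is what fill_between expects for where: one True/False per x value.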

Related

Too Many Indices For Array when using matplotlib

Thank you for taking time to read this question.
I am trying to plot pie charts in one row. The number of pie charts will depend on the result returned.
import matplotlib.pyplot as plt
import numpy as np
fig, axs = plt.subplots(1,len(to_plot_arr))
labels = ['Label1','Label2','Label3','Label4']
pos = 0
for scope in to_plot_arr:
    if data["summary"][scope]["Count"] > 0:
        pie_data = np.array(db_data)
        axs[0,pos].pie(pie_data,labels=labels)
        axs[0,pos].set_title(scope)
        pos += 1
plt.show()
In the code, db_data looks like: [12,75,46,29]
When I execute the code above, I get the following error message:
Exception has occurred: IndexError
too many indices for array: array is 1-dimensional, but 2 were indexed
I've tried searching for what could be causing this problem, but just can't find any solution to it. I'm not sure what is meant by "but 2 were indexed"
I've tried generating a pie chart with:
y = np.array(db_data)
plt.pie(y)
plt.show()
And it generates the pie chart as expected. So, I'm not sure what is meant by "too many indices for array", which array is being referred to, or how to resolve this.
Hope you are able to help me with this.
Thank You Again.
Notice that the axs array you create with plt.subplots(1,len(to_plot_arr)) has shape (len(to_plot_arr),), i.e. it is a 1D array, but inside the loop you index it with two indices (axs[0,pos]). Two indices only make sense for a 2D array, which conflicts with its actual shape.
Here is a fix:
import matplotlib.pyplot as plt
import numpy as np
fig, axs = plt.subplots(1,len(to_plot_arr))
labels = ['Label1','Label2','Label3','Label4']
pos = 0
for scope in to_plot_arr:
    if data["summary"][scope]["Count"] > 0:
        pie_data = np.array(db_data)
        axs[pos].pie(pie_data,labels=labels)
        axs[pos].set_title(scope)
        pos += 1
plt.show()
Cheers.
So, I think this is not technically an answer because I still don't know what was causing the error, but I found a way to solve my problem while still achieving my desired output.
Firstly, I realised, when I changed:
fig, axs = plt.subplots(1,len(to_plot_arr))
to:
fig, axs = plt.subplots(2,len(to_plot_arr))
the figure could be drawn. So, I continued to try other variations like (1,2), (2,1), (1,3) and always found that if nrows or ncols was 1, the error would come up.
Fortunately, for my use case, the layout I required had 2 rows: the top row a single chart spanning both columns, and the bottom row split into 2 columns.
So, (2,2) fit my use case very well.
Then I set out to get the top row to span 2 columns and found out that this is best done with GridSpec in Matplotlib. While trying to figure out how to use GridSpec, I came to learn that using add_subplot() would be a better route with more flexibility.
So, my final code looks something like:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.gridspec import GridSpec
def make_chart():
    fig = plt.figure()
    fig.set_figheight(8)
    fig.set_figwidth(10)
    # Gridspec is used to specify the grid distribution of the figure
    gs = GridSpec(2,len(to_plot_arr),figure=fig)
    # This allows for the first row to span all the columns
    r1 = fig.add_subplot(gs[0,:])
    tbl = plt.table(
        cellText = summary_data,
        rowLabels = to_plot_arr,
        colLabels = config["Key3"],
        loc ='upper left',
        cellLoc='center'
    )
    tbl.set_fontsize(20)
    tbl.scale(1,3)
    r1.axis('off')
    pos = 0
    for scope in to_plot_arr:
        if data["Key1"][scope][0] > 0:
            pie_data = np.array(data["Key2"][scope])
            # Add a chart at the specified position
            r2 = fig.add_subplot(gs[1,pos])
            r2.pie(pie_data, autopct=make_autopct(pie_data))
            r2.set_title(config["Key3"][scope])
            pos += 1
    fig.suptitle(title, fontsize=24)
    plt.xticks([])
    plt.yticks([])
    fig.legend(labels,loc="center left",bbox_to_anchor=(0,0.25))
    plt.savefig(savefile)
    return filename
This was my first go at using Matplotlib. The learning curve has been steep, but with a little patience and attention to the documentation, I was able to complete my task. I'm sure that there are better ways to do what I did. If you do know a better way or know how to explain the error I was encountering, please do add an answer to this.
Thank You!
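As background on where the 1D shape comes from: plt.subplots has a squeeze parameter that defaults to True, which drops any dimension of length 1 from the returned array of Axes. That is consistent with the error only appearing when nrows or ncols is 1. A minimal sketch, assuming a hypothetical column count of 3 and the non-interactive Agg backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt

n_cols = 3  # hypothetical stand-in for len(to_plot_arr)

# Default squeeze=True: a 1 x n grid collapses to a 1D array of Axes
fig1, axs1 = plt.subplots(1, n_cols)
shape_squeezed = axs1.shape       # (3,)

# squeeze=False: the result is always 2D, so two indices are valid
fig2, axs2 = plt.subplots(1, n_cols, squeeze=False)
shape_2d = axs2.shape             # (1, 3)
axs2[0, 1].pie([12, 75, 46, 29])  # two-index access now works

plt.close("all")
```

With squeeze=False the original axs[0,pos] indexing should work unchanged for any number of columns, including 1.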

Matplotlib get all axes artist objects for ArtistAnimation?

I am trying to make an animation using ArtistAnimation like this:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
fig, ax = plt.subplots()
ims = []
for i in range(60):
    x = np.linspace(0,i,1000)
    y = np.sin(x)
    im = ax.plot(x,y, color='black')
    ims.append(im)
ani = animation.ArtistAnimation(fig, ims, interval=50, blit=True,
                                repeat_delay=1000)
plt.show()
This animates a sine wave growing across the figure. Currently I'm just adding the list of Line2D objects returned by ax.plot() to ims. However, I would like to potentially draw multiple overlapping plots on the Axes and adjust the title, legend and x-axis range for each frame. How do I get an object that I can add to ims after plotting and making all the changes I want for each frame?
The list you supply to ArtistAnimation should be a list of lists of artists, one list per frame.
artist_list = [[line1a, line1b, title1], [line2a, line2b, title2], ...]
where the first list is shown in the first frame, the second list in the second frame etc.
The reason your code works is that ax.plot returns a list of lines (in your case only a list of a single line).
In any case, the following might be a more understandable version of your code where an additional text is animated.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
fig, ax = plt.subplots()
artist_list = []
for i in range(60):
    x = np.linspace(0,i,1000)
    y = np.sin(x)
    line, = ax.plot(x,y, color='black')
    text = ax.text(i,0,i)
    artist_list.append([line, text])
ani = animation.ArtistAnimation(fig, artist_list, interval=50, blit=True,
                                repeat_delay=1000)
plt.show()
In general, it will be hard to animate changing axes limits with ArtistAnimation, so if that is an ultimate goal consider using a FuncAnimation instead.
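A minimal sketch of the FuncAnimation route, under a few assumptions: the update function and the frame title text are illustrative names, the Agg backend is selected only so the snippet runs without a display, and blit=False is used because the title and axis limits lie outside what blitting normally redraws.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this for interactive use
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation

fig, ax = plt.subplots()
line, = ax.plot([], [], color='black')
ax.set_ylim(-1.1, 1.1)

def update(i):
    # Reuse the same line with new data instead of creating new artists
    x = np.linspace(0, i + 1, 1000)
    line.set_data(x, np.sin(x))
    # Per-frame changes to the x-range and the title
    ax.set_xlim(0, i + 1)
    ax.set_title(f"frame {i}")
    return line,

# blit=False so that title and axis limits are redrawn on every frame
ani = animation.FuncAnimation(fig, update, frames=60, interval=50,
                              blit=False, repeat_delay=1000)
```

In an interactive session, plt.show() after this would display the animation. Any number of lines, legends or texts can be modified inside update in the same way.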

Using python and networkx to find the probability density function

I'm struggling to draw a power law graph for Facebook Data that I found online. I'm using Networkx and I've found how to draw a Degree Histogram and a degree rank. The problem that I'm having is I want the y axis to be a probability so I'm assuming I need to sum up each y value and divide by the total number of nodes? Can anyone please help me do this? Once I've got this I'd like to draw a log-log graph to see if I can obtain a straight line. I'd really appreciate it if anyone could help! Here's my code:
import collections
import networkx as nx
import matplotlib.pyplot as plt
from networkx.algorithms import community
import math
import pylab as plt
g = nx.read_edgelist("/Users/Michael/Desktop/anaconda3/facebook_combined.txt","r")
nx.info(g)
degree_sequence = sorted([d for n, d in g.degree()], reverse=True)
degreeCount = collections.Counter(degree_sequence)
deg, cnt = zip(*degreeCount.items())
fig, ax = plt.subplots()
plt.bar(deg, cnt, width=0.80, color='b')
plt.title("Degree Histogram for Facebook Data")
plt.ylabel("Count")
plt.xlabel("Degree")
ax.set_xticks([d + 0.4 for d in deg])
ax.set_xticklabels(deg)
plt.show()
plt.loglog(degree_sequence, 'b-', marker='o')
plt.title("Degree rank plot")
plt.ylabel("Degree")
plt.xlabel("Rank")
plt.show()
You seem to be on the right track, but some simplifications will likely help you. The code below uses only 2 libraries.
Without access to your graph, we can use some graph generators instead. I've chosen 2 qualitatively different types here, and deliberately chosen different sizes so that the normalization of the histogram is needed.
import networkx as nx
import matplotlib.pyplot as plt
g1 = nx.scale_free_graph(1000)
g2 = nx.watts_strogatz_graph(2000, 6, p=0.8)
# we don't need to sort the values since the histogram will handle it for us
# (g.degree() yields (node, degree) pairs in NetworkX 2.x and later)
deg_g1 = [d for _, d in g1.degree()]
deg_g2 = [d for _, d in g2.degree()]
# there are smarter ways to choose bin locations, but since
# degrees must be discrete, we can be lazy...
max_degree = max(deg_g1 + deg_g2)
# one unit-wide bin per degree, including the maximum degree
bins = range(0, max_degree + 2)
# plot different styles to see both
fig = plt.figure()
ax = fig.add_subplot(111)
ax.hist(deg_g1, bins=bins, density=True, histtype='bar', rwidth=0.8, label='g1')
ax.hist(deg_g2, bins=bins, density=True, histtype='step', lw=3, label='g2')
# setup the axes to be log/log scaled
ax.set_yscale('log')
ax.set_xscale('log')
ax.set_xlabel('degree')
ax.set_ylabel('relative density')
ax.legend()
plt.show()
This produces an output plot like this (both g1,g2 are randomised so won't be identical):
Here we can see that g1 has an approximately straight line decay in the degree distribution -- as expected for scale-free distributions on log-log axes. Conversely, g2 does not have a scale-free degree distribution.
To say anything more formal, you could look at the toolboxes from Aaron Clauset: http://tuvalu.santafe.edu/~aaronc/powerlaws/ which implement model fitting and statistical testing of power-law distributions.
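For the normalization the question asks about (count divided by total number of nodes), here is a minimal sketch using only the standard library. The helper name degree_pmf and the degree sequence are made up for illustration; with NetworkX the input could be [d for _, d in g.degree()].

```python
from collections import Counter

def degree_pmf(degrees):
    """Return {degree: fraction of nodes with that degree}."""
    n = len(degrees)
    counts = Counter(degrees)
    return {k: c / n for k, c in sorted(counts.items())}

# Made-up degree sequence for illustration
degrees = [1, 1, 1, 2, 2, 3, 5, 1]
pmf = degree_pmf(degrees)
print(pmf)  # {1: 0.5, 2: 0.25, 3: 0.125, 5: 0.125}
```

The values sum to 1, so they can be read as probabilities. Plotting list(pmf) against list(pmf.values()) with plt.loglog(..., 'o') would then give the log-log view.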

matplotlib, add legend for each line? [duplicate]

TL;DR -> How can one create a legend for a line graph in Matplotlib's PyPlot without creating any extra variables?
Please consider the graphing script below:
if __name__ == '__main__':
    PyPlot.plot(total_lengths, sort_times_bubble, 'b-',
                total_lengths, sort_times_ins, 'r-',
                total_lengths, sort_times_merge_r, 'g+',
                total_lengths, sort_times_merge_i, 'p-', )
    PyPlot.title("Combined Statistics")
    PyPlot.xlabel("Length of list (number)")
    PyPlot.ylabel("Time taken (seconds)")
    PyPlot.show()
As you can see, this is a very basic use of matplotlib's PyPlot. This ideally generates a graph like the one below:
Nothing special, I know. However, it is unclear which data is plotted where (I'm trying to plot the data of some sorting algorithms, length against time taken, and I'd like to make sure people know which line is which). Thus, I need a legend. However, take a look at the following example (from the official site):
ax = subplot(1,1,1)
p1, = ax.plot([1,2,3], label="line 1")
p2, = ax.plot([3,2,1], label="line 2")
p3, = ax.plot([2,3,1], label="line 3")
handles, labels = ax.get_legend_handles_labels()
# reverse the order
ax.legend(handles[::-1], labels[::-1])
# or sort them by labels
import operator
hl = sorted(zip(handles, labels),
key=operator.itemgetter(1))
handles2, labels2 = zip(*hl)
ax.legend(handles2, labels2)
You will see that I need to create an extra variable ax. How can I add a legend to my graph without having to create this extra variable and retaining the simplicity of my current script?
Add a label= to each of your plot() calls, and then call legend(loc='upper left').
Consider this sample (tested with Python 3.8.0):
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 20, 1000)
y1 = np.sin(x)
y2 = np.cos(x)
plt.plot(x, y1, "-b", label="sine")
plt.plot(x, y2, "-r", label="cosine")
plt.legend(loc="upper left")
plt.ylim(-1.5, 2.0)
plt.show()
Slightly modified from this tutorial: http://jakevdp.github.io/mpl_tutorial/tutorial_pages/tut1.html
You can access the Axes instance (ax) with plt.gca(). In this case, you can use
plt.gca().legend()
You can do this either by using the label= keyword in each of your plt.plot() calls or by assigning your labels as a tuple or list within legend, as in this working example:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-0.75,1,100)
y0 = np.exp(2 + 3*x - 7*x**3)
y1 = 7-4*np.sin(4*x)
plt.plot(x,y0,x,y1)
plt.gca().legend(('y0','y1'))
plt.show()
However, if you need to access the Axes instance more than once, I do recommend saving it to the variable ax with
ax = plt.gca()
and then calling ax instead of plt.gca().
Here's an example to help you out ...
fig = plt.figure(figsize=(10,5))
ax = fig.add_subplot(111)
ax.set_title('ADR vs Rating (CS:GO)')
ax.scatter(x=data[:,0],y=data[:,1],label='Data')
plt.plot(data[:,0], m*data[:,0] + b,color='red',label='Our Fitting Line')
ax.set_xlabel('ADR')
ax.set_ylabel('Rating')
ax.legend(loc='best')
plt.show()
You can add a custom legend (see the Matplotlib legend documentation):
first = [1, 2, 4, 5, 4]
second = [3, 4, 2, 2, 3]
plt.plot(first, 'g--', second, 'r--')
plt.legend(['First List', 'Second List'], loc='upper left')
plt.show()
A simple plot of sine and cosine curves with a legend, using matplotlib.pyplot:
import math
import matplotlib.pyplot as plt
x=[]
for i in range(-314,314):
    x.append(i/100)
ysin=[math.sin(i) for i in x]
ycos=[math.cos(i) for i in x]
plt.plot(x,ysin,label='sin(x)') #specify label for the corresponding curve
plt.plot(x,ycos,label='cos(x)')
# raw strings keep \pi from being read as an escape sequence
plt.xticks([-3.14,-1.57,0,1.57,3.14],[r'-$\pi$',r'-$\pi$/2',0,r'$\pi$/2',r'$\pi$'])
plt.legend()
plt.show()
Give each series its own plot() call with a label for the series it graphs, e.g. label="series 1".
Then simply add PyPlot.legend() to the bottom of your script and the legend will display these labels.
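If the single multi-series plot() call from the question is kept, one option is to pass the returned lines to legend() together with a list of names. This is only a sketch: the timings are made up, the Agg backend is an assumption so that it runs without a display, and it does introduce one variable (lines) in exchange for an explicit pairing of lines and labels.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit for interactive use
import matplotlib.pyplot as plt

# Made-up timings standing in for the question's data
total_lengths = [10, 100, 1000]
sort_times_bubble = [0.001, 0.09, 9.5]
sort_times_ins = [0.001, 0.05, 5.1]

# One plot() call with several series returns one Line2D per series
lines = plt.plot(total_lengths, sort_times_bubble, 'b-',
                 total_lengths, sort_times_ins, 'r-')

# Pair each line with its name explicitly
legend = plt.legend(lines, ['Bubble sort', 'Insertion sort'], loc='upper left')
legend_texts = [t.get_text() for t in legend.get_texts()]
```

The labels are matched to the lines in the order the series were passed to plot().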

All Matplotlib points appearing at bottom of graph, regardless of y-value

I'm following this linear regression tutorial. Here's my code:
import pandas as pd
from sklearn import linear_model
import matplotlib.pyplot as plt
dataframe = pd.read_fwf('brain_body.txt')
x_values = dataframe[['Brain']]
y_values = dataframe[['Body']]
body_reg = linear_model.LinearRegression()
body_reg.fit(x_values, y_values)
plt.scatter(x_values, y_values)
plt.plot(x_values, body_reg.predict(x_values))
plt.show()
When I run the script, I get no errors, but the graph doesn't seem to account for the y-values. I reduced the data points to three so it's easier to see:
I tried to manually change the y-axis with plt.ylim([-1000,7000]) but no luck.
Thanks for any suggestions!
There's nothing wrong with the code, it's just that you have a few very extreme values in relation to the rest of your data. Matplotlib expands the graph to show the extreme values, but that ends up in bunching all the others. Broadening your ylim will only increase the effect - try a much smaller ylim and xlim instead:
plt.ylim([0, 20])
plt.xlim([0, 2])
