Unable to read data from kdeplot - python-3.x

I have a pandas dataframe with two columns, A and B, named df in the following bits of code.
And I try to plot a kde for each value of B like so:
import seaborn as sbn, numpy as np, pandas as pd
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(15, 7.5))
sbn.kdeplot(data=df, x="A", hue="B", fill=True)
fig.savefig("test.png")
I read the following suggestions, but only those where I compute the KDE from scratch using statsmodels or some other module get me anywhere:
Seaborn/Matplotlib: how to access line values in FacetGrid?
Get data points from Seaborn distplot
For curiosity's sake, I would like to know why I am unable to get something from the following code:
kde = sbn.kdeplot(data=df, x="A", hue="B", fill=True)
line = kde.lines[0]
x, y = line.get_data()
print(x, y)
The error I get is IndexError: list index out of range. kde.lines has a length of 0.
Accessing the lines through fig.axes[0].lines[0] also raises an IndexError.
All in all, I think I tried everything proposed in the previous threads. I also tried switching from kdeplot to displot (displot, not distplot, which is deprecated), but it is the same story, except that the axes have to be accessed differently. Every time I get to .get_lines(), ax.lines, etc., what is returned is an empty list, so I can't get any values out of it.
EDIT : Reproducible example
import pandas as pd, numpy as np, matplotlib.pyplot as plt, seaborn as sbn
# 1. Generate random data
rows = []
for i in [1, 2, 3, 5, 7, 8, 10, 12, 15, 17, 20, 40, 50]:
    for _ in range(10):
        rows.append({"A": np.random.random() * i, "B": i})
df = pd.DataFrame(rows, columns=["A", "B"])  # DataFrame.append was removed in pandas 2.0
# 2. Plot data
fig = plt.figure(figsize=(15, 7.5))
sbn.kdeplot(data=df, x="A", hue="B", fill=True)
# 3. Read data (error)
ax = fig.axes[0]
x, y = ax.lines[0].get_data()
print(x, y)

This happens because using fill=True changes the object that matplotlib draws.
When no fill is used, lines are plotted:
fig = plt.figure(figsize=(15, 7.5))
ax = sbn.kdeplot(data=df, x="A", hue="B")
print(ax.lines)
# [<matplotlib.lines.Line2D object at 0x000001F365EF7848>, etc.]
When you use fill, it changes them to PolyCollection objects:
fig = plt.figure(figsize=(15, 7.5))
ax = sbn.kdeplot(data=df, x="A", hue="B", fill=True)
print(ax.collections)
# [<matplotlib.collections.PolyCollection object at 0x0000016EE13F39C8>, etc.]
You could draw the kdeplot a second time, but with fill=False, so that you have access to the line objects.
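The difference between the two artist types can be reproduced with matplotlib alone. The sketch below assumes that a filled curve is drawn with something like Axes.fill_between (an assumption about seaborn's internals, not something confirmed in this thread) and uses a made-up bell curve instead of a real KDE. It shows that the axes then hold no Line2D objects, and that the outline can still be read from the polygon's path vertices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for a density curve; this is not a real KDE
x = np.linspace(-3, 3, 200)
y = np.exp(-x**2)

fig, ax = plt.subplots()
ax.fill_between(x, 0, y)  # draws a filled area only, no Line2D is created

n_lines = len(ax.lines)               # 0, so ax.lines[0] raises IndexError
poly = ax.collections[0]              # the filled polygon
verts = poly.get_paths()[0].vertices  # (N, 2) array tracing the polygon outline
outline_x, outline_y = verts[:, 0], verts[:, 1]
print(n_lines, verts.shape)
```

Note that the vertices trace the whole polygon, including the baseline, so they are not a clean (x, y) curve. Drawing the plot again without fill, as suggested above, is usually the simpler way to get the curve itself.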

Related

Pyplot: subsequent plots with a gradient of colours [duplicate]

I am plotting multiple lines on a single plot and I want them to run through the spectrum of a colormap, not just the same 6 or 7 colors. The code is akin to this:
for i in range(20):
    for k in range(100):
        y[k] = i*x[i]
    plt.plot(x,y)
plt.show()
Both with colormap "jet" and another that I imported from seaborn, I get the same 7 colors repeated in the same order. I would like to be able to plot up to ~60 different lines, all with different colors.
The Matplotlib colormaps accept an argument (0..1, scalar or array) which you use to get colors from a colormap. For example:
col = pl.cm.jet([0.25,0.75])
Gives you an array with (two) RGBA colors:
array([[ 0. , 0.50392157, 1. , 1. ],
[ 1. , 0.58169935, 0. , 1. ]])
You can use that to create N different colors:
import numpy as np
import matplotlib.pylab as pl
x = np.linspace(0, 2*np.pi, 64)
y = np.cos(x)
pl.figure()
pl.plot(x,y)
n = 20
colors = pl.cm.jet(np.linspace(0,1,n))
for i in range(n):
    pl.plot(x, i*y, color=colors[i])
Bart's solution is nice and simple but has two shortcomings:
1. plt.colorbar() won't work in a nice way because the line plots aren't mappable (compared to, e.g., an image).
2. It can be slow for large numbers of lines due to the for loop (though this is maybe not a problem for most applications).
These issues can be addressed by using LineCollection. However, this isn't too user-friendly in my (humble) opinion. There is an open suggestion on GitHub for adding a multicolor line plot function, similar to the plt.scatter(...) function.
Here is a working example I was able to hack together
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
def multiline(xs, ys, c, ax=None, **kwargs):
    """Plot lines with different colorings

    Parameters
    ----------
    xs : iterable container of x coordinates
    ys : iterable container of y coordinates
    c : iterable container of numbers mapped to colormap
    ax (optional): Axes to plot on.
    kwargs (optional): passed to LineCollection

    Notes:
        len(xs) == len(ys) == len(c) is the number of line segments
        len(xs[i]) == len(ys[i]) is the number of points for each line (indexed by i)

    Returns
    -------
    lc : LineCollection instance.
    """
    # find axes
    ax = plt.gca() if ax is None else ax
    # create LineCollection
    segments = [np.column_stack([x, y]) for x, y in zip(xs, ys)]
    lc = LineCollection(segments, **kwargs)
    # set coloring of line segments
    # Note: I get an error if I pass c as a list here... not sure why.
    lc.set_array(np.asarray(c))
    # add lines to axes and rescale
    # Note: adding a collection doesn't autoscale xlim/ylim
    ax.add_collection(lc)
    ax.autoscale()
    return lc
Here is a very simple example:
xs = [[0, 1],
[0, 1, 2]]
ys = [[0, 0],
[1, 2, 1]]
c = [0, 1]
lc = multiline(xs, ys, c, cmap='bwr', lw=2)
Produces:
And something a little more sophisticated:
n_lines = 30
x = np.arange(100)
yint = np.arange(0, n_lines*10, 10)
ys = np.array([x + b for b in yint])
xs = np.array([x for i in range(n_lines)]) # could also use np.tile
colors = np.arange(n_lines)
fig, ax = plt.subplots()
lc = multiline(xs, ys, yint, cmap='bwr', lw=2)
axcb = fig.colorbar(lc)
axcb.set_label('Y-intercept')
ax.set_title('Line Collection with mapped colors')
Produces:
Hope this helps!
An alternative to Bart's answer, in which you do not specify the color in each call to plt.plot, is to define a new color cycle with set_prop_cycle. His example can be translated into the following code (I've also changed the import of matplotlib to the recommended style):
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 2*np.pi, 64)
y = np.cos(x)
n = 20
ax = plt.axes()
ax.set_prop_cycle('color',[plt.cm.jet(i) for i in np.linspace(0, 1, n)])
for i in range(n):
    plt.plot(x, i*y)
If you are using continuous color palettes like brg, hsv, jet or the default one, you can do this:
color = plt.cm.hsv(r) # r is 0 to 1 inclusive
Now you can pass this color value to any API you want like this:
line = matplotlib.lines.Line2D(xdata, ydata, color=color)
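Put together, the two snippets above might look like the following minimal sketch. The value of r and the data points are made up for illustration. The one extra step is that a bare Line2D is not drawn until it is added to an Axes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

r = 0.3                  # position in the colormap, between 0 and 1 (arbitrary)
color = plt.cm.hsv(r)    # a single RGBA tuple

xdata, ydata = [0, 1, 2], [0, 1, 0]  # made-up data
fig, ax = plt.subplots()
line = Line2D(xdata, ydata, color=color)
ax.add_line(line)        # attach the line to the axes so it gets drawn
ax.autoscale_view()      # rescale the axes limits to include the new line
```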
This approach seems to me like the most concise, user-friendly and does not require a loop to be used. It does not rely on user-made functions either.
import numpy as np
import matplotlib.pyplot as plt
# make 5 lines
n_lines = 5
x = np.arange(0, 2).reshape(-1, 1)
A = np.linspace(0, 2, n_lines).reshape(1, -1)
Y = x @ A
# create colormap
cm = plt.cm.bwr(np.linspace(0, 1, n_lines))
# plot
ax = plt.subplot(111)
ax.set_prop_cycle('color', list(cm))
ax.plot(x, Y)
plt.show()
Resulting figure here

python-How do I plot from a list of lists imported from Excel?

Hello everyone. My task sounds simple, but I am confused. I have an Excel file with five different columns (Magnitude, BP1, BP2, D1, D2), each with the same number of rows (23).
Input:
Five columns with numerical data.
Desired output:
Two subplots. First one must contain Magnitude vs Frequency (understanding frequency as a line from BP1 to BP2 for every magnitude item). Second one must be Magnitude vs Distance (understanding distance in a similar manner as above).
Tried coding:
import numpy as numpy
import os, sys
import os.path
import pandas as pd
import matplotlib.pyplot as plt
fname=os.path.join(workingdir, 'Frequency and BP values.xlsx')
if not os.path.isfile(fname):
    sys.exit('File missing: '+fname)
f_read=pd.read_excel(fname, sheet_name='Valores')
#Reading columns
Magnitude=f_read['Magnitude (Mw)'].tolist()
BP1=f_read['BP1'].tolist()
BP2=f_read['BP2'].tolist()
D1=f_read['Distance1'].tolist()
D2=f_read['Distance2'].tolist()
#Building lists
Feq_Mag=[BP1, BP2]
for i in range(0, 23):
    Feq_Mag0=[x[i] for x in Feq_Mag]
D_Mag=[D1, D2]
D_Mag=[x[0] for x in D_Mag]
#Plotting attributes
#---Frequency plot
fig=plt.figure()
Freq_plot=fig.add_subplot(121)
Freq_plot.set_xlabel(u'Frequency (Hz)', fontsize=6)
Freq_plot.set_ylabel(u'Magnitude (Mw)', fontsize=6)
Freq_plot.plot(BP_Mag, c='crimson', linewidth=2.5)
plt.show()
new_dir=os.chdir(catdir)
fig.savefig('Ranges_Distribution.png')
plt.close('all')
Actual output:
A single line is plotted and it does not even correspond to the magnitude value. It is plotted at base level.
Thank you for your time and help.
Right, so if I understand correctly: you have one set of x-values and two sets of corresponding y-values (let's call them y1 and y2). You want to take these values and draw a line from y1 to y2 for each x. In that case, you want to run a for-loop. Written as a 2-D list, and using only lists and matplotlib.pyplot, this is what I came up with.
import matplotlib.pyplot as plt
y = [[1, 2, 3, 4, 5, 6],
[2, 4, 6, 8, 10, 12]]
x = [1, 2, 3, 4, 5, 6]
some_figure = plt.figure()
some_subplot = some_figure.add_subplot(111)
for i in range(len(x)):
    some_subplot.plot([x[i], x[i]], [y[0][i], y[1][i]])
plt.show()
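If the ranges should run along the x axis instead (one horizontal segment per magnitude, from BP1 to BP2, as the question describes), the loop can probably be replaced by a single call to Axes.hlines. This is only a sketch: the numbers below are made up and stand in for the Excel columns.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up values standing in for the Excel columns
magnitude = [4.5, 5.0, 5.5, 6.0]
bp1 = [0.5, 0.4, 0.3, 0.2]  # lower frequency bound (Hz)
bp2 = [8.0, 6.0, 5.0, 4.0]  # upper frequency bound (Hz)

fig, ax = plt.subplots()
# One horizontal segment per magnitude, running from bp1[i] to bp2[i]
segments = ax.hlines(y=magnitude, xmin=bp1, xmax=bp2,
                     colors="crimson", linewidth=2.5)
ax.set_xlabel("Frequency (Hz)")
ax.set_ylabel("Magnitude (Mw)")
```

The same call with D1 and D2 on a second subplot would give the distance panel.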

How to visualize a list of strings on a colorbar in matplotlib

I have a dataset like
x = 3,4,6,77,3
y = 8,5,2,5,5
labels = "null","exit","power","smile","null"
Then I use
from matplotlib import pyplot as plt
plt.scatter(x,y)
colorbar = plt.colorbar(labels)
plt.show()
to make a scatter plot, but cannot make colorbar showing labels as its colors.
How to get this?
I'm not sure, if it's a good idea to do that for scatter plots in general (you have the same description for different data points, maybe just use some legend here?), but I guess a specific solution to what you have in mind, might be the following:
from matplotlib import pyplot as plt
# Data
x = [3, 4, 6, 77, 3]
y = [8, 5, 2, 5, 5]
labels = ('null', 'exit', 'power', 'smile', 'null')
# Customize colormap and scatter plot
# (the colormap is passed by name; plt.cm.get_cmap was removed in recent matplotlib)
sc = plt.scatter(x, y, c=range(5), cmap='hsv')
cbar = plt.colorbar(sc, ticks=range(5))
cbar.ax.set_yticklabels(labels)
plt.show()
This will result in such an output:
The code combines this Matplotlib demo and this SO answer.
Hope that helps!
EDIT: Incorporating the comments, I can only think of some kind of label color dictionary, generating a custom colormap from the colors, and before plotting explicitly grabbing the proper color indices from the labels.
Here's the updated code (I added some additional colors and data points to check scalability):
from matplotlib import pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
import numpy as np
# Color information; create custom colormap
label_color_dict = {'null': '#FF0000',
'exit': '#00FF00',
'power': '#0000FF',
'smile': '#FF00FF',
'addon': '#AAAAAA',
'addon2': '#444444'}
all_labels = list(label_color_dict.keys())
all_colors = list(label_color_dict.values())
n_colors = len(all_colors)
cm = LinearSegmentedColormap.from_list('custom_colormap', all_colors, N=n_colors)
# Data
x = [3, 4, 6, 77, 3, 10, 40]
y = [8, 5, 2, 5, 5, 4, 7]
labels = ('null', 'exit', 'power', 'smile', 'null', 'addon', 'addon2')
# Get indices from color list for given labels
color_idx = [all_colors.index(label_color_dict[label]) for label in labels]
# Customize colorbar and plot
sc = plt.scatter(x, y, c=color_idx, cmap=cm)
c_ticks = np.arange(n_colors) * (n_colors / (n_colors + 1)) + (2 / n_colors)
cbar = plt.colorbar(sc, ticks=c_ticks)
cbar.ax.set_yticklabels(all_labels)
plt.show()
And, the new output:
Finding the correct middle point of each color segment is (still) not good, but I'll leave this optimization to you.
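One possible way to get ticks in the middle of each color segment is to switch to a ListedColormap with a BoundaryNorm whose bin edges sit at half-integers. This is a different technique from the LinearSegmentedColormap used above, offered as a sketch rather than a drop-in fix; the labels and colors are a shortened copy of the dictionary from the answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import BoundaryNorm, ListedColormap

label_color_dict = {'null': '#FF0000', 'exit': '#00FF00',
                    'power': '#0000FF', 'smile': '#FF00FF'}
all_labels = list(label_color_dict)
n_colors = len(all_labels)

cmap = ListedColormap(list(label_color_dict.values()))
# Bin edges at -0.5, 0.5, ..., so that integer i sits in the middle of bin i
norm = BoundaryNorm(np.arange(-0.5, n_colors), cmap.N)

x = [3, 4, 6, 77, 3]
y = [8, 5, 2, 5, 5]
labels = ('null', 'exit', 'power', 'smile', 'null')
color_idx = [all_labels.index(label) for label in labels]

sc = plt.scatter(x, y, c=color_idx, cmap=cmap, norm=norm)
cbar = plt.colorbar(sc, ticks=np.arange(n_colors))  # ticks at the bin centers
cbar.ax.set_yticklabels(all_labels)
```

Because the ticks are the integers 0 to n_colors - 1 and the edges are offset by 0.5, no manual tick arithmetic is needed.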

Why is Python matplot not starting from the point where my Data starts [duplicate]

I am currently learning how to import data and work with it in matplotlib, and I am having trouble even though I have the exact code from the book.
This is what the plot looks like, but my question is: how can I get rid of the white space at the start and the end of the x-axis?
Here is the code:
import csv
from matplotlib import pyplot as plt
from datetime import datetime
# Get dates and high temperatures from file.
filename = 'sitka_weather_07-2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    #for index, column_header in enumerate(header_row):
        #print(index, column_header)
    dates, highs = [], []
    for row in reader:
        current_date = datetime.strptime(row[0], "%Y-%m-%d")
        dates.append(current_date)
        high = int(row[1])
        highs.append(high)
# Plot data.
fig = plt.figure(dpi=128, figsize=(10,6))
plt.plot(dates, highs, c='red')
# Format plot.
plt.title("Daily high temperatures, July 2014", fontsize=24)
plt.xlabel('', fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature (F)", fontsize=16)
plt.tick_params(axis='both', which='major', labelsize=16)
plt.show()
There is an automatic margin set at the edges, which ensures that the data fits nicely within the axis spines. In this case such a margin is probably desired on the y axis. By default it is set to 0.05 in units of axis span.
To set the margin to 0 on the x axis, use
plt.margins(x=0)
or
ax.margins(x=0)
depending on the context. Also see the documentation.
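The effect can be checked numerically by comparing the x limits before and after the call. The data in this small sketch is made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)  # data spans 0..9
fig, ax = plt.subplots()
ax.plot(x, x**2)

default_xlim = ax.get_xlim()  # padded on each side by the default margin
ax.margins(x=0)
tight_xlim = ax.get_xlim()    # now equal to the data range
print(default_xlim, tight_xlim)
```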
In case you want to get rid of the margin in the whole script, you can use
plt.rcParams['axes.xmargin'] = 0
at the beginning of your script (same for y of course). If you want to get rid of the margin entirely and forever, you might want to change the corresponding lines in the matplotlib rc file:
axes.xmargin : 0
axes.ymargin : 0
Example
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset('tips')
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
tips.plot(ax=ax1, title='Default Margin')
tips.plot(ax=ax2, title='Margins: x=0')
ax2.margins(x=0)
Alternatively, use plt.xlim(..) or ax.set_xlim(..) to manually set the limits of the axes such that there is no white space left.
If you only want to remove the margin on one side but not the other, e.g. remove the margin from the right but not from the left, you can use set_xlim() on a matplotlib axes object.
import seaborn as sns
import matplotlib.pyplot as plt
import math
max_x_value = 100
x_values = [i for i in range (1, max_x_value + 1)]
y_values = [math.log(i) for i in x_values]
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
sns.lineplot(ax=ax1, x=x_values, y=y_values)
sns.lineplot(ax=ax2, x=x_values, y=y_values)
ax2.set_xlim(-5, max_x_value) # tune the -5 to your needs

Plotting a chart in which the Y data is text and the X data is numeric, from a dictionary. Matplotlib & Python 3 [duplicate]

I can create a simple bar chart in matplotlib from a 'simple' dictionary:
import matplotlib.pyplot as plt
D = {u'Label1':26, u'Label2': 17, u'Label3':30}
plt.bar(range(len(D)), D.values(), align='center')
plt.xticks(range(len(D)), D.keys())
plt.show()
But how do I create a line plot from the text and numeric data of this dictionary?
T_OLD = {'10': 'need1', '11': 'need2', '12': 'need1', '13': 'need2', '14': 'need1'}
Like the picture below
You may use numpy to convert the dictionary to an array with two columns, which can be plotted.
import matplotlib.pyplot as plt
import numpy as np
T_OLD = {'10' : 'need1', '11':'need2', '12':'need1', '13':'need2','14':'need1'}
x = list(zip(*T_OLD.items()))
# sort array, since dictionary is unsorted
x = np.array(x)[:,np.argsort(x[0])].T
# let second column be 1 if "need2", else 0
x[:,1] = (x[:,1] == "need2").astype(int)
# plot the two columns of the array
plt.plot(x[:,0], x[:,1])
# set the labels accordingly
plt.gca().set_yticks([0,1])
plt.gca().set_yticklabels(['need1', 'need2'])
plt.show()
The following is a version that is independent of the actual content of the dictionary; the only assumption is that the keys can be converted to floats.
import matplotlib.pyplot as plt
import numpy as np
T_OLD = {'10': 'run', '11': 'tea', '12': 'mathematics', '13': 'run', '14' :'chemistry'}
x = np.array(list(zip(*T_OLD.items())))
u, ind = np.unique(x[1,:], return_inverse=True)
x[1,:] = ind
x = x.astype(float)[:,np.argsort(x[0])].T
# plot the two columns of the array
plt.plot(x[:,0], x[:,1])
# set the labels accordingly
plt.gca().set_yticks(range(len(u)))
plt.gca().set_yticklabels(u)
plt.show()
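The key step in this version is np.unique with return_inverse=True, which turns arbitrary labels into integer codes. A minimal sketch with the same labels as above:

```python
import numpy as np

values = np.array(['run', 'tea', 'mathematics', 'run', 'chemistry'])

# u holds the sorted unique labels; ind holds, for each entry of values,
# the position of its label in u
u, ind = np.unique(values, return_inverse=True)

print(u)    # ['chemistry' 'mathematics' 'run' 'tea']
print(ind)  # [2 3 1 2 0]
```

The codes in ind are what gets plotted on the y axis, and u supplies the tick labels, so u[ind] reproduces the original labels.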
Use numeric values for your y-axis ticks, and then map them to desired strings with plt.yticks():
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# example data
times = pd.date_range(start='2017-10-17 00:00', end='2017-10-17 5:00', freq='H')
data = np.random.choice([0,1], size=len(times))
data_labels = ['need1','need2']
fig, ax = plt.subplots()
ax.plot(times, data, marker='o', linestyle="None")
plt.yticks([0, 1], data_labels)  # one tick per category, not per data point
plt.xlabel("time")
Note: It's generally not a good idea to use a line graph to represent categorical changes in time (e.g. from need1 to need2). Doing that gives the visual impression of a continuum between time points, which may not be accurate. Here, I changed the plotting style to points instead of lines. If for some reason you need the lines, just remove linestyle="None" from the call to ax.plot().
UPDATE
(per comments)
To make this work with a y-axis category set of arbitrary length, use ax.set_yticks() and ax.set_yticklabels() to map to y-axis values.
For example, given a set of potential y-axis values labels, let N be the size of a subset of labels (here we'll set it to 4, but it could be any size).
Then draw a random sample data of y values and plot against time, labeling the y-axis ticks based on the full set labels. Note that we still use set_yticks() first with numerical markers, and then replace with our category labels with set_yticklabels().
labels = np.array(['A','B','C','D','E','F','G'])
N = 4
# example data
times = pd.date_range(start='2017-10-17 00:00', end='2017-10-17 5:00', freq='H')
data = np.random.choice(np.arange(len(labels)), size=len(times))
fig, ax = plt.subplots(figsize=(15,10))
ax.plot(times, data, marker='o', linestyle="None")
ax.set_yticks(np.arange(len(labels)))
ax.set_yticklabels(labels)
plt.xlabel("time")
This gives the exact desired plot:
import matplotlib.pyplot as plt
from collections import OrderedDict
T_OLD = {'10' : 'need1', '11':'need2', '12':'need1', '13':'need2','14':'need1'}
T_SRT = OrderedDict(sorted(T_OLD.items(), key=lambda t: t[0]))
plt.plot(map(int, T_SRT.keys()), map(lambda x: int(x[-1]), T_SRT.values()),'r')
plt.ylim([0.9,2.1])
ax = plt.gca()
ax.set_yticks([1,2])
ax.set_yticklabels(['need1', 'need2'])
plt.title('T_OLD')
plt.xlabel('time')
plt.ylabel('need')
plt.show()
For Python 3.X, the plotting line needs to explicitly convert the map() output to lists:
plt.plot(list(map(int, T_SRT.keys())), list(map(lambda x: int(x[-1]), T_SRT.values())),'r')
as in Python 3.X map() returns an iterator as opposed to a list in Python 2.7.
The plot uses the dictionary keys converted to ints, and the last characters of need1 or need2, also converted to ints. This relies on the particular structure of your data; if the values were need1 and need3, it would need a couple more operations.
After plotting and changing the axes limits, the program simply modifies the tick labels at y positions 1 and 2. It then also adds the title and the x and y axis labels.
The important part is that the dictionary/input data has to be sorted. One way to do it is to use OrderedDict. Here T_SRT is an OrderedDict object sorted by the keys in T_OLD.
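One caveat about the sort key: sorting on t[0] compares the keys as strings, which happens to give the right order for '10' to '14' but not for keys of different lengths. The sketch below uses made-up keys ('9' and '100' are not in the original data) to show the difference when sorting on int(t[0]) instead.

```python
# Made-up keys of different lengths to show the difference
T_OLD = {'9': 'need2', '10': 'need1', '11': 'need2', '100': 'need1'}

# Sorting on the raw string keys compares character by character
by_string = [k for k, _ in sorted(T_OLD.items(), key=lambda t: t[0])]
# Sorting on int(key) gives the numeric order the x axis needs
by_number = [k for k, _ in sorted(T_OLD.items(), key=lambda t: int(t[0]))]

print(by_string)  # ['10', '100', '11', '9']
print(by_number)  # ['9', '10', '11', '100']
```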
The output is:
This is a more general case for more values/labels in T_OLD. It assumes that the label is always 'needX', where X is any number. This can readily be extended to the general case of any string preceding the number, though it would require more processing.
import matplotlib.pyplot as plt
from collections import OrderedDict
import re
T_OLD = {'10' : 'need1', '11':'need8', '12':'need11', '13':'need1','14':'need3'}
T_SRT = OrderedDict(sorted(T_OLD.items(), key=lambda t: t[0]))
x_val = list(map(int, T_SRT.keys()))
y_val = list(map(lambda x: int(re.findall(r'\d+', x)[-1]), T_SRT.values()))
plt.plot(x_val, y_val,'r')
plt.ylim([0.9*min(y_val),1.1*max(y_val)])
ax = plt.gca()
y_axis = list(set(y_val))
ax.set_yticks(y_axis)
ax.set_yticklabels(['need' + str(i) for i in y_axis])
plt.title('T_OLD')
plt.xlabel('time')
plt.ylabel('need')
plt.show()
This solution finds the number at the end of the label using re.findall, to allow for multi-digit numbers. The previous solution just took the last character of the string because the numbers were single digits. It still assumes that the number used for the plotting position is the last number in the string, hence the [-1]. Again, for Python 3.X the map output is explicitly converted to a list, a step not necessary in Python 2.7.
The labels are now generated by first selecting unique y-values using set and then renaming their labels through concatenation of the strings 'need' with its corresponding integer.
The limits of y-axis are set as 0.9 of the minimum value and 1.1 of the maximum value. Rest of the formatting is as before.
The result for this test case is:
