Matplotlib legend in increasing order - python-3.x

I have text files named 5.txt, 10.txt, 15.txt and 20.txt, but when I read the files with the glob module and use the fname variable in the legend, the legend entries come out in the wrong order.
for fname in glob("*.txt"):
    potential, current_density = np.genfromtxt(fname, unpack=True)
    current_density = current_density*1e6
    ax = plt.gca()
    ax.get_yaxis().get_major_formatter().set_useOffset(False)
    plt.plot(potential,current_density, label=fname[0:-4])
plt.legend(loc=4,prop={'size':12},
           ncol=1, shadow=True, fancybox=True,
           title = "Scan rate (mV/s)")
How can I plot the data so that the corresponding labels appear in the legend in increasing order?

Just to provide yet another method, which does not require changing anything in the plotting part of the script:
handles, labels = plt.gca().get_legend_handles_labels()
handles, labels = zip(*[ (handles[i], labels[i]) for i in sorted(range(len(handles)), key=lambda k: list(map(int,labels))[k])] )
plt.legend(handles, labels, loc=4, ...)
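The one-liner above is dense, so here is a minimal sketch of the same reordering split into a small helper. The name sort_legend_entries is hypothetical (not part of Matplotlib), and it assumes every label can be converted with int, as with the file names in the question.

```python
def sort_legend_entries(handles, labels, key=int):
    """Return (handles, labels) reordered by key(label).

    Hypothetical helper: handles and labels are the two lists returned by
    ax.get_legend_handles_labels(); key defaults to int because the labels
    in the question are numeric strings such as "5" and "10".
    """
    # Pair each label with its handle so they stay together while sorting.
    pairs = sorted(zip(labels, handles), key=lambda pair: key(pair[0]))
    sorted_labels = [label for label, _ in pairs]
    sorted_handles = [handle for _, handle in pairs]
    return sorted_handles, sorted_labels


# Usage with an existing plot (left commented so the helper runs without a figure):
# handles, labels = plt.gca().get_legend_handles_labels()
# plt.legend(*sort_legend_entries(handles, labels), loc=4)
```

Sorting the (label, handle) pairs once avoids rebuilding the list of integers for every index, which the lambda in the one-liner does.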

Method 1 (Recommended)
You will need to sort and display the legend yourself. plt.legend takes a list of lines and a list of strings as the first two optional positional arguments. You can maintain a list of the items you need, sort it into the order you want, and pass the portions you want over to legend.
ax = plt.gca()
legend_items = []
for fname in glob("*.txt"):
    potential, current_density = np.genfromtxt(fname, unpack=True)
    current_density *= 1e6
    line, = ax.plot(potential, current_density)
    name = fname[0:-4]
    legend_items.append((int(name), line, name))
legend_items.sort()
ax.get_yaxis().get_major_formatter().set_useOffset(False)
ax.legend([x[1] for x in legend_items], [x[2] for x in legend_items],
          loc=4, prop={'size':12}, ncol=1, shadow=True,
          fancybox=True, title = "Scan rate (mV/s)")
Major additions are marked in bold, while minor style changes that can probably be ignored are marked in italics.
Major additions include the accumulation of the items for the legend. I use tuples for each item because a list of tuples is automatically sorted by the first element first. The comma in line, = ax.plot... is necessary because it triggers sequence unpacking on the list that plot returns. An alternative would be to do line = ax.plot(...)[0]. The file name is no longer added as an explicit label to the data.
Among the minor changes, I switched to using ax.plot and ax.legend instead of plt.plot and plt.legend. This is the object-oriented part of Matplotlib's API and it makes things a little clearer. This way you also don't have to keep calling gca() to get the reference over and over. Finally, set_useOffset only needs to be called once, not inside the loop.
Method 2
Another way to approach the problem would be to pre-sort the file names before processing them, so that they appear in the correct order in your legend:
import os
file_list = os.listdir('.')
file_list = [x for x in file_list if x.endswith('.txt')]
file_list.sort(key=lambda x: int(x[0:-4]))
for fname in file_list:
    ...
You will have to do the name filtering yourself, but it is not especially difficult. The sorting key is just the number. Also, you will note that I got tired of doing the custom fancy formatting for this update :)
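If you would rather keep glob for the filtering, the same numeric pre-sort can be sketched as below. The helper names numeric_stem and sorted_numeric_files are made up for illustration, and the sketch assumes every matched file name is an integer followed by an extension; a name like notes.txt would raise ValueError.

```python
import os
from glob import glob


def numeric_stem(path):
    """Return the file name without directory or extension as an int, e.g. '10.txt' -> 10."""
    return int(os.path.splitext(os.path.basename(path))[0])


def sorted_numeric_files(pattern="*.txt"):
    """Return the files matching pattern, ordered by their numeric stem (hypothetical helper)."""
    return sorted(glob(pattern), key=numeric_stem)


# Usage: loop over the files in increasing numeric order.
# for fname in sorted_numeric_files("*.txt"):
#     ...
```

Because the files are then plotted in increasing order, the default legend order already matches and no handle shuffling is needed.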

I don't know if this is all that relevant, but I ended up here anyway. I found I didn't need the middle line. If you want 2 columns, this worked for me:
handles, labels = plt.gca().get_legend_handles_labels()
plt.legend(handles, labels, loc=4,
           ncol=2, shadow=True, title="Legend", fancybox=True)

Related

Sorting algorithm visualizer: how to highlight the current element being accessed and compared in the algorithm?

So I'm trying to write a sorting algorithm visualizer. Code below. I am basically using matplotlib to plot the figure. My problem is that I also want to highlight the current element in the array being accessed, compared, and swapped. All of my attempts at this have failed. Please also let me know if there is a better way of writing a visualizer in Python. I have seen some tutorials using pygame but wanted to stick to basics. Also, when the program runs to the end and everything is sorted, the plot goes blank. Is this because of the plt.clf() command, and is there a way to keep the sorted plot from closing? Thanks!
from matplotlib import pyplot as plt
import numpy as np
# generate pseudo-random list of numbers
lst = np.random.randint(0, 100, 20)
# x values for the bar plot
x = range(0, len(lst))
def insertion_sort(lst):
    # loop through the list
    # incrementally check which index to the left should i be placed in
    for i in range(1, len(lst)):
        while lst[i-1] > lst[i] and i>0:
            lst[i], lst[i-1] = lst[i-1], lst[i]
            i = i-1
            # plot
            plt.bar(x,lst)
            plt.pause(0.1)
            plt.clf()
    plt.show()
    return lst
print(lst)
print(insertion_sort(lst))
The solution I came up with was to create a second list containing the current i and i-1 indexes and plot a second bar chart, in a different color, over the main one. It was a bad solution and it failed. Another idea I tried was to pass a conditional list for the color parameter of plt.bar():
colors = ['red' if lst[i-1]>lst[i] else for element in lst 'blue']
plt.bar(x, lst, color=colors)
This did not work either. I don't know whether I am on the right track and just need to keep at it, or whether this whole setup is futile to begin with. Thank you for your time!
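For what it's worth, the per-bar color idea can be made to work. Below is a minimal sketch, not a definitive visualizer: the names insertion_sort_steps, bar_colors and animate are hypothetical, and the only Matplotlib behaviour relied on is that bar accepts a list of colors, one per bar. The sort yields a snapshot after each swap together with the two indexes involved, and the drawing loop clears the axes before each frame rather than after it, so the last (sorted) frame stays on screen. That ordering is my assumption about why the original goes blank: plt.clf() runs after the final plt.bar() call.

```python
def insertion_sort_steps(values):
    """Yield (snapshot, active_indices) after every swap of an insertion sort."""
    data = list(values)  # copy so the caller's sequence is left untouched
    for i in range(1, len(data)):
        j = i
        # Test j > 0 first so data[j - 1] never wraps around to the last element.
        while j > 0 and data[j - 1] > data[j]:
            data[j - 1], data[j] = data[j], data[j - 1]
            yield list(data), {j - 1, j}
            j -= 1
    yield list(data), set()  # final frame: sorted, nothing highlighted


def bar_colors(n, active, base="tab:blue", highlight="tab:red"):
    """Return one color per bar, using the highlight color for the active indexes."""
    return [highlight if k in active else base for k in range(n)]


def animate(values, delay=0.1):
    """Draw each step of the sort; the sorted bars remain visible at the end."""
    import matplotlib.pyplot as plt  # imported here so the helpers above work without a display

    fig, ax = plt.subplots()
    x = range(len(values))
    for snapshot, active in insertion_sort_steps(values):
        ax.clear()  # clear BEFORE drawing, so the last frame is never wiped
        ax.bar(x, snapshot, color=bar_colors(len(snapshot), active))
        plt.pause(delay)
    plt.show()


# animate([5, 2, 9, 1, 7])
```

Separating the sort (a generator) from the drawing keeps the algorithm free of plotting calls, so other sorts can be plugged in as long as they yield the same kind of pairs.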

Open and plot several data files on the same plot in Python

Newbie here, first question.
I have several data files, that I want to open, get the relevant data (x and y) and plot on the same plot.
I know how to do it if I type out a plot statement for each of them, but what I would want to create is a single function or script that takes the filenames as input, extracts the data (this part depends on the type of file, but I think I know how to do it) and then creates one single plot with the different datasets. It should be pretty basic, but all my attempts return a plot for each file.
I think that my problem is that I have not understood how the whole ax, fig, gca, plot loop works, as I have been learning mostly by adapting things and doing.
So far I have created a for loop that opens each file, gets the data and stores it in a dataframe (a dataframe per file) then uses a plt.plot to plot, and then out of the loop, I have a plt.gca() that in my intentions would get things together to then modify the plot, add stuff to it and save it. I have also tried changing the position of the gca and using ax and fig, playing around with a few tutorials, but never with satisfying results.
I get different kinds of errors, depending on the different iterations of the script, here is one of my attempts. If there's an electrochemist among you they might recognize the datatype :) but the datatype should not be important.
EDIT: I modified the script, as it had a couple of errors. The current version returns an empty plot; the dataframe is created properly, from what I can see.
files = ['file1.i2b', 'file2.i2b']
colors = []
fig_name = ''
file_type = 'i2b'
norm = []
if len(colors) != len(files):
    l = len(files)
    col_list = ['b', 'g', 'r', 'c', 'm', 'y', 'k']
    color_list = col_list[0:l]
if len(norm)!= len(files):
    norm = [1]*len(files)
if file_type == 'i2b':
    for filename, norm_factor, col in zip (files, norm, color_list):
        flnm1 = os.path.splitext(filename)[0]
        data_xrd = pd.read_csv(filename, sep=(' '), decimal = '.', skiprows =10,
                               header= None, names =['Freq','Real_part', 'Imm_part'])
        data_xrd['norm_Imm_part'] = (0-data_xrd['Imm_part'])*norm_factor
        data_xrd['norm_Re_part'] = data_xrd['Real_part']*norm_factor
        plt.plot(x=data_xrd['norm_Re_part'], y=data_xrd['norm_Imm_part'],
                 legend=flnm1, style='-', color = col)
#plt.show
plt.gca()
#plt.axhline(y=0, color='k', linestyle='--')
#plt.set_xlabel('Z_real [Ohm]')
#plt.set_ylabel('Z_imm [Ohm]')
#plt.set_aspect('equal')
plt.savefig(fig_name + '.png')
Now, it might be better to split the data extraction to a different function, so that the plotting function is more flexible and can be paired with different kinds of data input, but at the moment I'd just like to understand how to use plot multiple files on a single plot simply by using a list of their names as input, in order to facilitate the grouping and plotting of a lot of datafiles.
Thanks for the help and please let me know how to improve my question!
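To make the goal concrete, here is a rough sketch of the "one Axes, many datasets" pattern: create the figure and Axes once, call ax.plot once per dataset inside the loop, and do the labelling and saving after the loop. The function names (pick_colors, nyquist_points, plot_datasets) and the dataset layout are assumptions for illustration only. As far as I can tell, plt.plot expects the x and y data as positional arguments, while x=, y=, legend= and style= look like pandas DataFrame.plot keywords, which may be why the attempt above draws nothing; treat that as a guess rather than a diagnosis.

```python
from itertools import cycle, islice


def pick_colors(n, palette=("b", "g", "r", "c", "m", "y", "k")):
    """Return n colors, cycling through the palette if there are more datasets than colors."""
    return list(islice(cycle(palette), n))


def nyquist_points(real, imag, norm_factor=1):
    """Scale the real part and flip the sign of the imaginary part, as in the script above."""
    xs = [r * norm_factor for r in real]
    ys = [-i * norm_factor for i in imag]
    return xs, ys


def plot_datasets(datasets, out_path=None):
    """Plot every dataset on ONE Axes.

    datasets is a hypothetical mapping: label -> (real, imag, norm_factor).
    """
    import matplotlib.pyplot as plt  # imported here so the helpers above run without a display

    fig, ax = plt.subplots()  # created once, outside the loop
    colors = pick_colors(len(datasets))
    for (label, (real, imag, factor)), color in zip(datasets.items(), colors):
        xs, ys = nyquist_points(real, imag, factor)
        ax.plot(xs, ys, "-", color=color, label=label)  # data passed positionally
    ax.axhline(y=0, color="k", linestyle="--")
    ax.set_xlabel("Z_real [Ohm]")
    ax.set_ylabel("Z_imm [Ohm]")
    ax.set_aspect("equal")
    ax.legend()
    if out_path:
        fig.savefig(out_path)
    return fig, ax
```

Reading each file (for example with pd.read_csv, as in the question) would happen before the call, which also gives the split between data extraction and plotting mentioned above.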

How to label line chart with column from pandas dataframe (from 3rd column values)?

I have a data set I filtered to the following (sample data):
Name Time l
1 1.129 1G-d
1 0.113 1G-a
1 3.374 1B-b
1 3.367 1B-c
1 3.374 1B-d
2 3.355 1B-e
2 3.361 1B-a
3 1.129 1G-a
I got this data after filtering the data frame and converting it to CSV file:
# Assigns the new data frame to "df" with the data from only three columns
header = ['Names','Time','l']
df = pd.DataFrame(df_2, columns = header)
# Sorts the data frame by column "Names" as integers
df.Names = df.Names.astype(int)
df = df.sort_values(by=['Names'])
# Changes the data to match format after converting it to int
df.Time=df.Time.astype(int)
df.Time = df.Time/1000
csv_file = df.to_csv(index=False, columns=header, sep=" " )
Now, I am trying to graph lines for each label column data/items with markers.
I want the column l as my line names (labels) - each as a new line, Time as my Y-axis values and Names as my X-axis values.
So, in this case, I would have 7 different lines in the graph with these labels: 1G-d, 1G-a, 1B-b, 1B-c, 1B-d, 1B-e, 1B-a.
I have done the following so far which is the additional settings, but I am not sure how to graph the lines.
plt.xlim(0, 60)
plt.ylim(0, 18)
plt.legend(loc='best')
plt.show()
I used sns.lineplot which comes with hue and I do not want to have name for the label box. Also, in that case, I cannot have the markers without adding new column for style.
I also tried plt.plot, but in that case I am not sure how to draw more than one line. I can only give x and y values, which creates only one line.
If there's any other source, please let me know below.
Thanks
The final graph I want to have is like the following but with markers:
You can apply a few tweaks to seaborn's lineplot. Using some created data since your sample isn't really long enough to demonstrate:
# Create data
np.random.seed(2019)
categories = ['1G-d', '1G-a', '1B-b', '1B-c', '1B-d', '1B-e', '1B-a']
df = pd.DataFrame({'Name':np.repeat(range(1,11), 10),
                   'Time':np.random.randn(100).cumsum(),
                   'l':np.random.choice(categories, 100)
                   })
# Plot
sns.lineplot(data=df, x='Name', y='Time', hue='l', style='l', dashes=False,
             markers=True, ci=None, err_style=None)
# Temporarily removing limits based on sample data
#plt.xlim(0, 60)
#plt.ylim(0, 18)
# Remove seaborn legend title & set new title (if desired)
ax = plt.gca()
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:], title='New Title', loc='best')
plt.show()
To apply markers, you have to specify a style variable. This can be the same as hue.
You likely want to remove dashes, ci, and err_style
To remove the seaborn legend title, you can get the handles and labels, then re-add the legend without the first handle and label. You can also specify the location here and set a new title if desired (or just remove title=... for no title).
Edits per comments:
Filtering your data to only a subset of level categories can be done fairly easily via:
categories = ['1G-d', '1G-a', '1B-b', '1B-c', '1B-d', '1B-e', '1B-a']
df = df.loc[df['l'].isin(categories)]
markers=True will fail if there are too many levels. If you are only interested in marking points for aesthetic purposes, you can simply multiply a single marker by the number of categories you are interested in (which you have already created to filter your data to categories of interest): markers='o'*len(categories).
Alternatively, you can specify a custom dictionary to pass to the markers argument:
points = ['o', '*', 'v', '^']
mult = len(categories) // len(points) + (len(categories) % len(points) > 0)
markers = {key:value for (key, value)
           in zip(categories, points * mult)}
This will return a dictionary of category-point combinations, cycling over the marker points specified until each item in categories has a point style.
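The same cycling can be written without computing the multiplier by hand. This is just a sketch using itertools.cycle from the standard library; the function name cycle_markers is made up for illustration, and the resulting dictionary would be passed to the markers argument in the same way as above.

```python
from itertools import cycle


def cycle_markers(categories, points=("o", "*", "v", "^")):
    """Map each category to a marker, restarting the marker list when it runs out."""
    # zip stops at the shorter input, so the endless cycle is cut to len(categories).
    return dict(zip(categories, cycle(points)))


categories = ['1G-d', '1G-a', '1B-b', '1B-c', '1B-d', '1B-e', '1B-a']
markers = cycle_markers(categories)
```

With four marker shapes and seven categories, the fifth category starts over at the first marker.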

Need help in creating a function to plot a Matplotlib GridSpec

I have a dataset with 80 variables. I am interested in creating a function that will automate the creation of a 20 X 4 GridSpec in Matplotlib. Each subplot would either contain a histogram or a barplot for each of the 80 variables in the data. As a first step, I successfully created two functions (I call them 'counts' and 'histogram') that contain the layout of the plot that I want. Both of them work when tested on individual variables. As a next step, I attempted to create a function that would take the column names, loop through a conditional to test whether the data type is an object or otherwise and call the right function based on the datatype as a new subplot. Here is the code that I have so far:
Creates list of coordinates we will need for subplot specification:
A = np.arange(21)
B = np.arange(4)
coords = []
for i in A:
    for j in B:
        coords.append([A[i], B[j]])
#Create the gridspec and layout the figure
import matplotlib.gridspec as gridspec
fig = plt.figure(figsize=(12,6))
gs = gridspec.GridSpec(2,4)
#Function that relies on what we've done above:
def grid(cols=['MSZoning', 'LotFrontage', 'LotArea', 'Street', 'Alley']):
    for i in cols:
        for vals in coords:
            if str(train[i].dtype) == 'object':
                plt.subplot('gs'+str(vals))
                counts(cols)
            else:
                plt.subplot('gs'+str(vals))
                histogram(cols)
When attempted, this code returns an error:
ValueError: Single argument to subplot must be a 3-digit integer
For purposes of helping you visualize, what I am hoping to achieve, I attach the screen shot below, which was produced by the line by line coding (with my created helper functions) I am trying to avoid:
Can anyone help me figure out where I am going wrong? I would appreciate any advice. Thank you!
The line plt.subplot('gs'+str(vals)) cannot work; which is also what the error tells you.
As can be seen from the matplotlib GridSpec tutorial, it needs to be
ax = plt.subplot(gs[0, 0])
So in your case you may use the values from the list as
ax = plt.subplot(gs[vals[0], vals[1]])
Note that the coords list must have exactly n*m elements if the gridspec is defined as gs = gridspec.GridSpec(n,m).
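Putting the pieces together, a minimal sketch might look like the following. It assumes you want one subplot per column, so each column is paired with exactly one grid cell via zip instead of nesting the two loops. The names grid_positions and plot_grid, and the draw callback (standing in for your counts and histogram helpers), are hypothetical.

```python
def grid_positions(n_items, ncols):
    """Return (row, col) for each of n_items cells, filling the grid row by row."""
    return [divmod(k, ncols) for k in range(n_items)]


def plot_grid(columns, draw, ncols=4, figsize=(12, 6)):
    """Create one subplot per column name and let draw(ax, name) fill it in.

    draw is a hypothetical callback; it could dispatch to counts or histogram
    depending on the dtype of the column. columns must not be empty.
    """
    import matplotlib.pyplot as plt  # imported here so grid_positions works without a display
    import matplotlib.gridspec as gridspec

    nrows = -(-len(columns) // ncols)  # ceiling division: enough rows for every column
    fig = plt.figure(figsize=figsize)
    gs = gridspec.GridSpec(nrows, ncols)
    for name, (row, col) in zip(columns, grid_positions(len(columns), ncols)):
        ax = fig.add_subplot(gs[row, col])  # index the GridSpec object, not a string
        draw(ax, name)
    return fig
```

Deriving the number of rows from the number of columns keeps the coordinates and the GridSpec shape consistent, so the n*m condition holds by construction.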

Matplotlib - Stacked Bar Chart with ~1000 Bars

Background:
I'm working on a program to show a 2d cross section of 3d data. The data is stored in a simple text csv file in the format x, y, z1, z2, z3, etc. I take a start and end point and flick through the dataset (~110,000 lines) to create a line of points between these two locations, and dump them into an array. This works fine, and fairly quickly (takes about 0.3 seconds). To then display this line, I've been creating a matplotlib stacked bar chart. However, the total run time of the program is about 5.5 seconds. I've narrowed the bulk of it (3 seconds worth) down to the code below.
'values' is an array with the x, y and z values plus a leading identifier, which isn't used in this part of the code. The first plt.bar is plotting the bar sections, and the second is used to create an arbitrary floor of -2000. In order to generate a continuous looking section, I'm using an interval between each bar of zero.
import matplotlib.pyplot as plt
for values in crossSection:
    prevNum = None
    layerColour = None
    if values != None:
        for i in range(3, len(values)):
            if values[i] != 'n':
                num = float(values[i].strip())
                if prevNum != None:
                    plt.bar(spacing, prevNum-num, width=interval, \
                            bottom=num, color=layerColour, \
                            edgecolor=None, linewidth=0)
                prevNum = num
                layerColour = layerParams[i].strip()
        if prevNum != None:
            plt.bar(spacing, prevNum+2000, width=interval, bottom=-2000, \
                    color=layerColour, linewidth=0)
    spacing += interval
I'm sure there's a more efficient way to do this, but I'm new to Matplotlib and still unfamiliar with its capabilities. The other main use of time in the code is:
plt.savefig('output.png')
which takes about a second, but I figure this is to be expected to save the file and I can't do anything about it.
Question:
Is there a faster way of generating the same output (a stacked bar chart or something that looks like one) by using plt.bar() better, or a different Matplotlib function?
EDIT:
I forgot to mention in the original post that I'm using Python 3.2.3 and Matplotlib 1.2.0
Leaving this here in case someone runs into the same problem...
While not exactly the same as using bar(), with a sufficiently large dataset (large enough that using bar() takes a few seconds) the results are indistinguishable from stackplot(). If I sort the data into layers using the method given by tcaswell and feed it into stackplot() the chart is created in 0.2 seconds, rather than 3 seconds.
EDIT
Code provided by tcaswell to turn the data into layers:
import numpy as np
accum_values = []
for values in crossSection:
    accum_values.append([float(v.strip()) for v in values[3:]])
accum_values = np.vstack(accum_values).T
layer_params = [l.strip() for l in layerParams]
bottom = np.zeros(accum_values[0].shape)
It looks like you are drawing each bar individually; you can pass sequences to bar instead (see this example)
I think something like:
import numpy as np
accum_values = []
for values in crossSection:
    accum_values.append([float(v.strip()) for v in values[3:]])
accum_values = np.vstack(accum_values).T
layer_params = [l.strip() for l in layerParams]
bottom = np.zeros(accum_values[0].shape)
ax = plt.gca()
spacing = interval*np.arange(len(accum_values[0]))
for data,color in zip(accum_values,layer_params):
    ax.bar(spacing,data,bottom=bottom,color=color,linewidth=0,width=interval)
    bottom += data
will be faster (because each call to bar creates one BarContainer and I suspect the source of your issues is you were creating one for each bar, instead of one for each layer).
I don't really understand what you are doing with the bars that have tops below their bottoms, so I didn't try to implement that, so you will have to adapt this a bit.
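To show the "one bar call per layer" idea in isolation, here is a small self-contained sketch with made-up numbers. The names layer_bottoms and draw_layers are hypothetical, and the sketch assumes each layer is a list of non-negative thicknesses with one value per bar position, which is not quite the elevation format in the question.

```python
def layer_bottoms(layers):
    """For each layer, return the height at which its bars start.

    layers is a list of layers; each layer is a list of thicknesses,
    one per bar position. The first layer starts at 0.
    """
    running = [0.0] * len(layers[0])
    bottoms = []
    for layer in layers:
        bottoms.append(list(running))  # this layer sits on everything drawn so far
        running = [b + h for b, h in zip(running, layer)]
    return bottoms


def draw_layers(layers, colors, interval=1.0):
    """Draw the stack with one ax.bar call per layer instead of one per bar."""
    import matplotlib.pyplot as plt  # imported here so layer_bottoms runs without a display

    fig, ax = plt.subplots()
    positions = [k * interval for k in range(len(layers[0]))]
    for heights, bottom, color in zip(layers, layer_bottoms(layers), colors):
        ax.bar(positions, heights, width=interval, bottom=bottom,
               color=color, linewidth=0)
    return fig, ax


# draw_layers([[1, 2], [3, 4], [5, 6]], ["tan", "peru", "sienna"])
```

With roughly a thousand positions and a handful of layers, this makes a handful of bar calls rather than several thousand.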
