Error bar on graph only visible in one direction - errorbar

Hi,
I am trying to make a plot with a specific error range for specific values in a data set. However, I cannot get the error bars to follow the range I specify.
The error bars are only visible in one direction, as shown in the figure.
I have tried the code below:
df1.reset_index().plot(y='Dybde', x ='Value' ,figsize=(3,5) )
plt.gca().invert_yaxis()
plt.suptitle(' Hydrogensulfid')
plt.xlabel('ug/L')
plt.ylabel('Dybde [m]')
plt.legend().remove()
x_errormin = [0, 0,0,0, 10,15]
x_errormax = [0, 0,83,226,1566, 10250]
x_error = [x_errormin, x_errormax]
plt.errorbar(df1.Value,df1.index, xerr=x_error, fmt='r^')
plt.show()
Image of plot, with error only in one direction
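One thing that may be worth checking: plt.errorbar treats xerr as distances from each data point (first row to the left, second row to the right), not as absolute minimum and maximum values. The sketch below assumes the two lists in the question are meant as absolute bounds and converts them to offsets. The helper name bounds_to_offsets and the sample values are hypothetical, since df1.Value is not shown.

```python
def bounds_to_offsets(values, lower_bounds, upper_bounds):
    """Turn absolute (min, max) bounds into the [lower, upper] offsets xerr expects."""
    # Offsets must not be negative, so clamp at zero.
    lower = [max(v - lo, 0) for v, lo in zip(values, lower_bounds)]
    upper = [max(hi - v, 0) for v, hi in zip(values, upper_bounds)]
    return [lower, upper]


# Hypothetical measurements standing in for df1.Value.
values = [0, 0, 40, 100, 800, 5000]
x_errormin = [0, 0, 0, 0, 10, 15]
x_errormax = [0, 0, 83, 226, 1566, 10250]

x_error = bounds_to_offsets(values, x_errormin, x_errormax)
# x_error can then be passed on unchanged:
# plt.errorbar(df1.Value, df1.index, xerr=x_error, fmt='r^')
```

If the lists are already offsets, the left-hand whiskers are probably being drawn but are only 0 to 15 units long, which is invisible next to right-hand whiskers of up to 10250 units on the same axis.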

Related

How to plot hyperparameter tuning results?

I have the result of a grid search as follows.
"trial","learning_rate","batch_size","accuracy","f1","loss"
1,0.000007,70,0.789,0.862,0.467
2,0.000008,100,0.710,0.822,0.563
3,0.000008,90,0.823,0.874,0.524
4,0.000007,90,0.833,0.878,0.492
5,0.000009,110,0.715,0.825,0.509
6,0.000006,90,0.883,0.885,0.932
7,0.000009,80,0.850,0.895,0.408
8,0.000006,110,0.683,0.812,0.593
9,0.000005,90,0.769,0.848,0.468
10,0.000005,80,0.816,0.868,0.462
11,0.000003,100,0.852,0.901,0.448
12,0.000004,100,0.705,0.818,0.512
13,0.000003,110,0.708,0.818,0.567
14,0.000002,90,0.683,0.812,0.552
15,0.000008,100,0.791,0.857,0.438
16,0.000006,110,0.683,0.812,0.604
17,0.000007,70,0.693,0.816,0.592
18,0.000005,110,0.830,0.883,0.892
19,0.000004,90,0.693,0.816,0.591
20,0.000008,70,0.696,0.818,0.570
I want to create a plot more or less similar to this one using matplotlib. I know it was made with Weights & Biases, but I cannot use that.
I don't care about the inference part, though; I just want the plot. I've been trying to do this using twinx but have not been successful. This is what I have so far.
from csv import DictReader
import matplotlib.pyplot as plt
trials = list(DictReader(open("hparams_trials.csv")))
trials = {f"trial_{trial['trial']}": [int(trial["batch_size"]),
                                      float(trial["f1"]),
                                      float(trial["loss"]),
                                      float(trial["accuracy"]),
                                      float(trial["learning_rate"])] for trial in trials}
items = ["batch_size", "f1", "loss", "accuracy", "learning_rate"]
host_y_values_index = 0
parts_y_values_indexes = [1, 2, 3, 4]
fig, host = plt.subplots(figsize=(8, 5)) # (width, height) in inches
fig.dpi = 300. # Figure resolution
# Removing extra spines
host.spines.top.set_visible(False)
host.spines.bottom.set_visible(False)
host.spines.right.set_visible(False)
# Creating subplots which share the same x axis.
parts = {index: host.twinx() for index in parts_y_values_indexes}
# Setting the limits of the host plot
host.set_xlim(0, len(trials["trial_1"]))
host.set_ylim(min([i[host_y_values_index] for i in trials.values()]),
              max([i[host_y_values_index] for i in trials.values()]))
# Removing the extra spines from the other plots and setting y limits
for part in parts_y_values_indexes:
    parts[part].spines.top.set_visible(False)
    parts[part].spines.bottom.set_visible(False)
    parts[part].set_ylim(min([trial[part] for trial in trials.values()]),
                         max([trial[part] for trial in trials.values()]))
# Colors of the trials
colors = ["gold", "lightcoral", "maroon", "springgreen", "cyan", "steelblue", "darkmagenta", "fuchsia", "crimson",
"lime", "mediumblue", "cadetblue", "dodgerblue", "olivedrab", "sandybrown", "bisque", "orangered", "black",
"rosybrown", "chocolate"]
# The plots
plots = []
# Plotting the trials. This is where I'm having problems.
for index, trial in enumerate(trials):
    plots.append(host.plot(items, trials[trial], color=colors[index], label=trial)[0])
# Creating the legend
host.legend(handles=plots, fancybox=True, loc='right', facecolor="snow", bbox_to_anchor=(1.02, 0.495), framealpha=1)
# Defining the positions of the spines.
spines_positions = [-104.85 * i for i in parts_y_values_indexes]
# Repositioning the spines
for part in parts_y_values_indexes:
    parts[part].spines['right'].set_position(('outward', spines_positions[-part]))
# Adjust spacings around fig
fig.tight_layout()
host.grid(True)
# This is better than the one above but it appears on top of the legend.
# plt.grid(True)
plt.draw()
plt.show()
I'm having several problems with this code. First, I cannot place each value of a single trial on its own spine and then connect the values to one another. Each trial has a batch size, an F1 score, a loss, an accuracy and a learning rate. Each of those needs to be plotted against its own spine while being connected to the others in that order, so that there is one line per trial. For now I have placed everything in the host plot, but I know that is wrong and I have no idea what the correct approach is.
Second, the ticks of the learning rate change: they are shown as a range from 2 to 9, and a 1e-6 appears at the top. I want to keep the original values.
The third problem is probably part of the second one: the 1e-6 appears at the top right, above the legend, rather than above the spine.
I'm struggling to resolve all three of these problems and would appreciate any help. If what I am doing is totally wrong, please help me find the correct solution. I'm going in circles here and haven't been able to find a working solution so far.
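One common way to approach this kind of parallel-coordinates plot is to rescale every column onto the host axis range, draw all lines on the host, and use the twin axes only to display each column's real values as ticks. The sketch below is one possible arrangement, not a definitive answer. The names scale_to_range and plot_parallel are hypothetical, the rows are the first three trials from the CSV above, and the fixed "%.6f" tick format is an assumption about how the learning rate should be shown.

```python
def scale_to_range(column, target_min, target_max):
    """Linearly map a column's values onto [target_min, target_max]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column has no spread; park it in the middle of the axis.
        return [(target_min + target_max) / 2 for _ in column]
    return [target_min + (v - lo) / (hi - lo) * (target_max - target_min)
            for v in column]


def plot_parallel(rows, names):
    """Sketch: one line per row, one y axis (spine) per column."""
    # Imported here so scale_to_range stays usable without matplotlib.
    import matplotlib.pyplot as plt
    from matplotlib.ticker import FormatStrFormatter

    columns = [list(c) for c in zip(*rows)]
    host_min, host_max = min(columns[0]), max(columns[0])
    scaled = [scale_to_range(c, host_min, host_max) for c in columns]

    fig, host = plt.subplots(figsize=(8, 5))
    axes = [host] + [host.twinx() for _ in names[1:]]
    last = len(names) - 1
    host.spines["right"].set_visible(False)
    for i, (ax, col) in enumerate(zip(axes, columns)):
        ax.set_ylim(min(col), max(col))  # ticks show the column's real values
        ax.spines["top"].set_visible(False)
        ax.spines["bottom"].set_visible(False)
        if ax is not host:
            ax.spines["left"].set_visible(False)
            # Spread the spines evenly across the plot width.
            ax.spines["right"].set_position(("axes", i / last))
    # Fixed-point labels instead of a shared 1e-6 multiplier
    # (assumes the last column is the learning rate).
    axes[-1].yaxis.set_major_formatter(FormatStrFormatter("%.6f"))

    host.set_xlim(0, last)
    host.set_xticks(range(len(names)))
    host.set_xticklabels(names)
    for trial in zip(*scaled):
        host.plot(range(len(names)), trial)
    return fig


names = ["batch_size", "f1", "loss", "accuracy", "learning_rate"]
rows = [[70, 0.862, 0.467, 0.789, 0.000007],
        [100, 0.822, 0.563, 0.710, 0.000008],
        [90, 0.874, 0.524, 0.823, 0.000008]]
# fig = plot_parallel(rows, names)
# fig.savefig("trials.png")
```

Because every line lives on the host axis, the lines connect across spines, while each twin axis keeps its own limits and therefore its own tick values.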

Is it possible to extract the default tick locations from the primary axis and pass them to a secondary axis with matplotlib?

When making a plot with
fig, ax = plt.subplots()
x=[1,2,3,4,5,6,7,8,9,10]
y=[1,2,3,4,5,6,7,8,9,10]
ax.plot(x,y)
plt.show()
matplotlib determines the tick spacing/location and the value of each tick. Is there a way to extract this automatic spacing/location AND the values? I want to do this so I can pass them to set_xticks() for my secondary axis (created with twiny()) and then use set_ticklabels() with custom labels. I realise I could use a secondary axis by giving both a forward and an inverse function, but providing an inverse function is not feasible for the goal of my code.
So in the image below, the ticks are only shown at 2, 4, 6, 8 and 10 rather than at all the values of x, and I want to extract these values and positions so I can pass them to set_xticks() and then change the tick labels (on a second x axis created with twiny).
UPDATE
The suggested fix works well for the x axis. However, it does not work well for the y axis, where it seems to use only the dataset values as y ticks. My code is:
ax4 = ax.twinx()
ax4.yaxis.set_ticks_position('left')
ax4.yaxis.set_label_position('left')
ax4.spines["left"].set_position(("axes", -0.10))
ax4.set_ylabel(self.y_2ndary_label, fontweight = 'bold')
Y = ax.get_yticks()
ax4.yaxis.set_ticks(Y)
ax4.yaxis.set_ticklabels( Y*Y )
ax4.set_ylim(ax.get_ylim())
fig.set_size_inches(8, 8)
plt.show()
but this gives me the following plot. The plot after it shows the original y axis. This does not happen when I do the same on the x axis. Any ideas?
# From "get_xticks" Doc: The locations are not clipped to the current axis limits
# and hence may contain locations that are not visible in the output.
current_x_ticks = ax.get_xticks()
current_x_limits = ax.get_xlim()
ax.set_yticks(current_x_ticks) # Use this before "set_ylim"
ax.set_ylim(current_x_limits)
plt.show()
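Since get_xticks() may return locations outside the visible limits, it can help to filter the ticks before copying them to a twin axis. The following is a minimal sketch under that assumption. The helper names visible_ticks and mirror_xticks and the label_for callback are hypothetical, and the sample tick values are only illustrative of what the plot above might return.

```python
def visible_ticks(ticks, limits):
    """Keep only the tick locations that fall inside the axis limits."""
    lo, hi = sorted(limits)  # sorted() also copes with an inverted axis
    return [t for t in ticks if lo <= t <= hi]


def mirror_xticks(ax, label_for):
    """Copy the visible x ticks of `ax` onto a twiny axis with custom labels."""
    ticks = visible_ticks(ax.get_xticks(), ax.get_xlim())
    top = ax.twiny()
    top.set_xticks(ticks)        # set the ticks first ...
    top.set_xlim(ax.get_xlim())  # ... then match the limits so they line up
    top.set_xticklabels([label_for(t) for t in ticks])
    return top


# Illustrative values: ticks returned by the axis and its current limits.
ticks = visible_ticks([0, 2, 4, 6, 8, 10, 12], (0.55, 10.45))
# ticks is now [2, 4, 6, 8, 10]
```

A call such as mirror_xticks(ax, lambda t: f"{t * t:g}") after ax.plot(x, y) would then label the top axis with the squares of the primary tick values, without needing an inverse function.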

Pandas Series boolean maps and plotting

I am just trying to up my understanding of plotting Pandas Series data using Booleans to mask out values I don't want. I am not sure that what I have is the correct or efficient way to do it.
Don't get me wrong, I do get the chart I am after but are my assumptions on the syntax correct?
All I want to do is plot the non zero values on my chart. I have not formatted the charts as I would normally as this was just a test of Booleans and masking data and not for creating report grade charts.
If this were a pandas DataFrame called df1, I would do the following.
I understand this, and it makes sense that df1[mask] returns my values as required.
# Plot our graph with only items that are non-zero
fig = px.bar(df1[mask], x = 'Animals', y = 'Count')
fig.show()
Doing it as a Pandas Series
This is the snippet that creates the graph I require
# Plot our graph with only items that are non-zero
fig = px.bar(sf, x = sf.index[sf_mask], y = sf[sf_mask])
fig.show()
My initial test, applying the mask to sf alone, gave an error, so I deduced that I needed to apply the mask to both the x and y parameters. I take it this is because a Series is just a single column with the index set to my animals. Therefore sf.index[sf_mask] returns the animals in the index and sf[sf_mask] returns the values. Failing to mask either one gives a "ValueError" stating that the arguments should have the same length.
Here is what I did to test my workings
My initial imports and setting up Plotly as my plotting backend
import pandas as pd
import plotly.express as px
# Set our plotting backend to Plotly
pd.options.plotting.backend = "plotly"
I just created a test dataset from a dictionary
animals = {'rabbits' : 1,
'dogs' : 3,
'cats' : 0,
'ferrets' : 3,
'horses' : 8,
'goldfish' : 0,
'guinea_pigs' : 2,
'hamsters' : 6,
'mice' : 3,
'rats' : 0
}
Then converted it to a pandas Series
sf = pd.Series(animals)
I then create my boolean mask to mask out all the zero entries in the pandas Series
sf_mask = sf != 0
If I then apply the mask, I can see I only get non-zero values, which is exactly what I am looking for.
sf[sf_mask]
Which outputs my non-zero items in my series.
rabbits 1
dogs 3
ferrets 3
horses 8
guinea_pigs 2
hamsters 6
mice 3
dtype: int64
If I plot without my Boolean mask 'sf_mask' using the following syntax I get my complete Pandas Series charted
# Plot our Series showing all items
fig = px.bar(sf, x = sf.index, y = sf)
fig.show()
Which outputs the following chart
If I plot with my Boolean mask 'sf_mask' using the following syntax I get the chart I want which excludes the gaps with zero value items.
# Plot our graph with only items that are non-zero
fig = px.bar(sf, x = sf.index[sf_mask], y = sf[sf_mask])
fig.show()
Which outputs the correct chart.
Your understanding of booleans and masking is correct.
You can simplify your syntax a little though: if you take a look at the plotly.express.bar documentation, you'll see that the arguments 'x' and 'y' are optional. You don't need to pass 'x' or 'y' because by default plotly.express will create the bars using the index of the Series as x and the values of the Series as y. You can also pass the masked series in place of the entire series.
For example, this will produce the same bar chart:
fig = px.bar(sf[sf>0])
fig.update_layout(showlegend=False)

Plot multi label (values) with multi bar chart

I have an issue that I hope you can help with.
I have this data:
to_stack = pd.DataFrame([['CHILDREN', 0.42806248287201976, 0.0],
['AMT_TOTAL', 165006, 179357],
['SAL', 582065, 703917.0],
['ANNUITY', 26851, 28416]], columns=('Variable','Id','Mean'))
When I run the code below
to_stack.plot.barh(x='Variable', figsize=(12,8), width = .9)
## First loop for the first variable "Id"
for index, value in enumerate(to_stack['Id']):
    plt.text(value, index, str(value), va='top')
## Second loop for the second variable "Mean"
for i, val in enumerate(to_stack['Mean']):
    plt.text(val, i, str(val), va='bottom')
I get this result
The values on each bar are not well centred.
I've tried several options of plt.text (ha: center, left, right; va: top, bottom, baseline) without good results; sometimes it's even worse and the values overlap each other.
How can we get the values aligned with the bars?
Any ideas are very welcome.
It's better to extract the information from the bars and annotate them. That way, you have more control over how the text appears relative to the bars:
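fig, ax = plt.subplots(figsize=(12, 8))
to_stack.plot.barh(x='Variable', width=.9, ax=ax)
for patch in ax.patches:
    # Bar length and thickness in data coordinates
    w, h = patch.get_width(), patch.get_height()
    y = patch.get_y()
    # Put the label near the end of the bar, vertically centred on it
    ax.text(w - 0.1, h / 2 + y, f'{w:.3f}', va='center')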
Output:

Control View in Bokeh State Map

I am trying to plot a state with county-level detail using Bokeh and want to be able to control the portion of the state that is visible. I've seen some users suggest deleting counties, but I want to have a rectangular area based on lat/long parameters that controls what portion is shown. Is this possible?
You can control what is visible on the plot by specifying the x and y ranges. These can be specified either directly in the figure command or by setting the respective attributes using a Range1d. Bokeh will then allow interactive panning while keeping the dimensions of the initial visible area.
If you want to then prevent the user from modifying the visible portion of the plot, you can simply create the figure without any zoom or resize tools.
Here's an example illustrating the above.
from bokeh.plotting import figure, output_file, show
from bokeh.models import Range1d
output_file("title.html")
# Specify tools for the plot
tools = "pan, reset, save"
# create a new plot with a range set with a tuple
# (Bokeh 3.x takes width/height; older releases used plot_width/plot_height)
p = figure(width=400, height=400,
           x_range=(0, 20), tools=tools)
# set a range using a Range1d
p.y_range = Range1d(0, 15)
p.circle([1, 2, 3, 4, 5], [2, 5, 8, 2, 7], size=10)
show(p)
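To tie this back to the lat/long rectangle in the question, the ranges can be derived from a bounding box. This is only a sketch, and it assumes the county shapes are plotted with longitude on the x axis and latitude on the y axis (not in Web Mercator coordinates). The helper name bbox_ranges and the bounds used here are hypothetical.

```python
def bbox_ranges(lat_min, lat_max, lon_min, lon_max):
    """Return figure keyword arguments that limit the view to a lat/long box."""
    if lat_min >= lat_max or lon_min >= lon_max:
        raise ValueError("each minimum must be smaller than its maximum")
    # Longitude runs along x, latitude along y.
    return {"x_range": (lon_min, lon_max), "y_range": (lat_min, lat_max)}


# Hypothetical bounding box, in degrees.
view = bbox_ranges(lat_min=30.0, lat_max=33.0, lon_min=-100.0, lon_max=-96.0)
# The result can be unpacked into the figure call:
# p = figure(tools="pan,reset,save", **view)
```

Counties outside the box are still part of the plot data; they are simply outside the initial view, so no counties need to be deleted.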

Resources