Matplotlib scatter - imshow offset - python-3.x

I am overlaying a scatter plot of points on an imshow image of 128 x 128 pixels. If you look closely here:
the objects do not always fall exactly on the center of the corresponding pixels. I tried different interpolations on imshow and origins for scatter, but nothing changed. So I thought I could overlay a grid to see how much this offset actually is:
and I noticed that the grid also falls exactly on the objects and not the center of the imshow pixels. The script for the above plot is:
import matplotlib.pyplot as plt
import matplotlib.ticker
fig = plt.figure(figsize=(15,8))
plt.imshow(counts_pre[:,:,slice_z],cmap='viridis',interpolation=None)
plt.scatter(j_index,i_index, s = 0.1, c = 'red', marker = 'o')
myInterval=1.
loc = matplotlib.ticker.MultipleLocator(base=myInterval)
plt.gca().xaxis.set_minor_locator(loc)
plt.gca().yaxis.set_minor_locator(loc)
plt.grid(which="both", linewidth=0.72,color="white",alpha=0.1)
plt.tick_params(which="minor", length=0)
plt.show()
Any ideas on why this offset exists and how I can fix it? Notice that the grid is not very homogeneous, i.e. some squares are rectangular.
Edit:
Upgrading to the newest matplotlib version did not resolve the issue.
I created objects where the entries are non-zero, such that I know that the points should be perfectly aligned, but they still don't match up.
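For reference, with the default extent imshow puts the center of pixel (i, j) at data coordinates (x=j, y=i), and the pixel edges at half-integers. Grid lines drawn at integer positions therefore run through pixel centers, not along pixel borders. Note also that interpolation=None falls back to the rcParams default, while the string "none" disables interpolation. The following is a minimal sketch, assuming a small synthetic 16 x 16 image in place of counts_pre (the names image, i_index and j_index are stand-ins), that moves the minor grid to the pixel edges so any remaining offset is easier to judge. It does not by itself explain the sub-pixel shifts seen at low resolution, which may come from how the image is resampled to the figure's pixel grid.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for counts_pre[:, :, slice_z]: non-zero only at the points.
rng = np.random.default_rng(0)
n = 16
i_index = rng.integers(0, n, size=10)
j_index = rng.integers(0, n, size=10)
image = np.zeros((n, n))
image[i_index, j_index] = 1.0

fig, ax = plt.subplots(figsize=(6, 6))
# The string "none" switches interpolation off; None would use the rcParams default.
im = ax.imshow(image, cmap="viridis", interpolation="none")
# Pixel (i, j) is centered at x=j, y=i, so the points are plotted as (j, i).
ax.scatter(j_index, i_index, s=10, c="red", marker="o")

# Minor ticks at half-integers: the grid now follows the pixel edges.
edges = np.arange(-0.5, n, 1.0)
ax.set_xticks(edges, minor=True)
ax.set_yticks(edges, minor=True)
ax.grid(which="minor", color="white", linewidth=0.5, alpha=0.3)
ax.tick_params(which="minor", length=0)
```

Call plt.show() to display the figure. Raising the figure dpi, or choosing a figure size where each image pixel covers a whole number of screen pixels, is a common thing to try if the markers still look shifted.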

Related

How to plot hyperparameter tuning results?

I have the result of a grid search as follows.
"trial","learning_rate","batch_size","accuracy","f1","loss"
1,0.000007,70,0.789,0.862,0.467
2,0.000008,100,0.710,0.822,0.563
3,0.000008,90,0.823,0.874,0.524
4,0.000007,90,0.833,0.878,0.492
5,0.000009,110,0.715,0.825,0.509
6,0.000006,90,0.883,0.885,0.932
7,0.000009,80,0.850,0.895,0.408
8,0.000006,110,0.683,0.812,0.593
9,0.000005,90,0.769,0.848,0.468
10,0.000005,80,0.816,0.868,0.462
11,0.000003,100,0.852,0.901,0.448
12,0.000004,100,0.705,0.818,0.512
13,0.000003,110,0.708,0.818,0.567
14,0.000002,90,0.683,0.812,0.552
15,0.000008,100,0.791,0.857,0.438
16,0.000006,110,0.683,0.812,0.604
17,0.000007,70,0.693,0.816,0.592
18,0.000005,110,0.830,0.883,0.892
19,0.000004,90,0.693,0.816,0.591
20,0.000008,70,0.696,0.818,0.570
I want to create a plot more or less similar to this using matplotlib. I know this is plotted using weights and biases but I cannot use that.
Though I don't care for the inference part. I just want the plot. I've been trying to do this using twinx but have not been successful. This is what I have so far.
from csv import DictReader
import matplotlib.pyplot as plt
trials = list(DictReader(open("hparams_trials.csv")))
trials = {f"trial_{trial['trial']}": [int(trial["batch_size"]),
                                      float(trial["f1"]),
                                      float(trial["loss"]),
                                      float(trial["accuracy"]),
                                      float(trial["learning_rate"])] for trial in trials}
items = ["batch_size", "f1", "loss", "accuracy", "learning_rate"]
host_y_values_index = 0
parts_y_values_indexes = [1, 2, 3, 4]
fig, host = plt.subplots(figsize=(8, 5)) # (width, height) in inches
fig.dpi = 300. # Figure resolution
# Removing extra spines
host.spines.top.set_visible(False)
host.spines.bottom.set_visible(False)
host.spines.right.set_visible(False)
# Creating subplots which share the same x axis.
parts = {index: host.twinx() for index in parts_y_values_indexes}
# Setting the limits of the host plot
host.set_xlim(0, len(trials["trial_1"]))
host.set_ylim(min([i[host_y_values_index] for i in trials.values()]),
              max([i[host_y_values_index] for i in trials.values()]))
# Removing the extra spines from the other plots and setting y limits
for part in parts_y_values_indexes:
    parts[part].spines.top.set_visible(False)
    parts[part].spines.bottom.set_visible(False)
    parts[part].set_ylim(min([trial[part] for trial in trials.values()]),
                         max([trial[part] for trial in trials.values()]))
# Colors of the trials
colors = ["gold", "lightcoral", "maroon", "springgreen", "cyan", "steelblue", "darkmagenta", "fuchsia", "crimson",
          "lime", "mediumblue", "cadetblue", "dodgerblue", "olivedrab", "sandybrown", "bisque", "orangered", "black",
          "rosybrown", "chocolate"]
# The plots
plots = []
# Plotting the trials. This is where I'm having problems with.
for index, trial in enumerate(trials):
    plots.append(host.plot(items, trials[trial], color=colors[index], label=trial)[0])
# Creating the legend
host.legend(handles=plots, fancybox=True, loc='right', facecolor="snow", bbox_to_anchor=(1.02, 0.495), framealpha=1)
# Defining the positions of the spines.
spines_positions = [-104.85 * i for i in parts_y_values_indexes]
# Repositioning the spines
for part in parts_y_values_indexes:
    parts[part].spines['right'].set_position(('outward', spines_positions[-part]))
# Adjust spacings around fig
fig.tight_layout()
host.grid(True)
# This is better than the one above but it appears on top of the legend.
# plt.grid(True)
plt.draw()
plt.show()
I'm having several problems with that code.
First, I cannot place each value of a single trial on a different spine and then connect the values to one another. Each trial has a batch size, an f1, a loss, an accuracy and a learning rate. Each of those needs to be plotted against its own spine while being connected to the others in that order, so that there is one line per trial. For now I have placed everything in the host plot, but I know that is wrong and I have no idea what the correct approach is.
Second, the ticks of the learning rate change. They are shown as a range of 2 to 9 and then a 1e-6 appears at the top. I want to keep the original values.
Third, and probably part of the second problem, the 1e-6 appears at the top right above the legend rather than above the spine for some reason.
I'm struggling with all three of these problems and would appreciate any help. If what I am doing is totally wrong, please help me find the correct solution. I'm going in circles here and haven't been able to find a working solution so far.
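One possible way to get a line per trial across several vertical axes is to rescale every column to [0, 1], draw all lines on the host axes, and use the twin axes only to show ticks in the original units. The following is a minimal sketch of that idea, not the method used by Weights & Biases. It assumes four trials copied by hand from the table above, and the names values, scaled and axes are illustrative. A FuncFormatter is used for the learning rate so the tick labels stay in plain decimal form and no separate 1e-6 offset text is drawn.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# Four trials from the table above; columns follow the order in `items`.
items = ["batch_size", "f1", "loss", "accuracy", "learning_rate"]
labels = ["trial_1", "trial_2", "trial_3", "trial_6"]
values = np.array([
    [70, 0.862, 0.467, 0.789, 0.000007],
    [100, 0.822, 0.563, 0.710, 0.000008],
    [90, 0.874, 0.524, 0.823, 0.000008],
    [90, 0.885, 0.932, 0.883, 0.000006],
])

# Rescale each column to [0, 1] so a single line can cross every axis.
# Assumption: no column is constant, otherwise this divides by zero.
col_min = values.min(axis=0)
col_max = values.max(axis=0)
scaled = (values - col_min) / (col_max - col_min)

fig, host = plt.subplots(figsize=(8, 5))
x = np.arange(len(items))
for row, label in zip(scaled, labels):
    host.plot(x, row, label=label)

host.set_xlim(0, len(items) - 1)
host.set_ylim(0, 1)
host.set_xticks(x)
host.set_xticklabels(items)
host.yaxis.set_visible(False)
for side in ("top", "bottom", "left", "right"):
    host.spines[side].set_visible(False)

# One twin axis per column. It holds no data; it only shows a vertical spine
# with ticks in the column's own units at the column's x position.
axes = []
for i in range(len(items)):
    ax = host.twinx()
    ax.set_ylim(col_min[i], col_max[i])
    ax.spines["right"].set_position(("axes", i / (len(items) - 1)))
    for side in ("top", "bottom", "left"):
        ax.spines[side].set_visible(False)
    axes.append(ax)

# Plain decimal labels for learning_rate instead of mantissa plus "1e-6".
axes[-1].yaxis.set_major_formatter(FuncFormatter(lambda v, pos: f"{v:.6f}"))

host.legend(loc="upper center", bbox_to_anchor=(0.5, 1.15), ncol=len(labels))
```

Because each twin axis spans exactly the minimum and maximum of its column, a scaled value of 0 or 1 on the host lines up with the bottom or top tick of that column. Call plt.show() to display the result.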

Is it possible to extract the default tick locations from the primary axis and pass them to a secondary axis with matplotlib?

When making a plot with
fig, ax = plt.subplots()
x=[1,2,3,4,5,6,7,8,9,10]
y=[1,2,3,4,5,6,7,8,9,10]
ax.plot(x,y)
plt.show()
matplotlib will determine the tick spacing/location and the value of each tick. Is there a way to extract this automatic spacing/location AND the value? I want to do this so I can pass it to
set_xticks()
for my secondary axis (using twiny()) and then use set_ticklabels() with a custom label. I realise I could use a secondary axis by giving both a forward and an inverse function; however, providing an inverse function is not feasible for the goal of my code.
So in the image below, the ticks are only showing at 2,4,6,8,10 rather than all the values of x and I want to somehow extract these values and position so I can pass to set_xticks() and then change the tick labels (on a second x axis created with twiny).
UPDATE
When using the fix suggested it works well for the x axis. However, it does not work well for the y-axis. For the y-axis it seems to take the dataset values for the y ticks only. My code is:
ax4 = ax.twinx()
ax4.yaxis.set_ticks_position('left')
ax4.yaxis.set_label_position('left')
ax4.spines["left"].set_position(("axes", -0.10))
ax4.set_ylabel(self.y_2ndary_label, fontweight = 'bold')
Y = ax.get_yticks()
ax4.yaxis.set_ticks(Y)
ax4.yaxis.set_ticklabels( Y*Y )
ax4.set_ylim(ax.get_ylim())
fig.set_size_inches(8, 8)
plt.show()
but this gives me the following plot. The plot after is the original Y axis. This is not the case when I do this on the x-axis. Any ideas?
# From "get_xticks" Doc: The locations are not clipped to the current axis limits
# and hence may contain locations that are not visible in the output.
current_x_ticks = ax.get_xticks()
current_x_limits = ax.get_xlim()
ax.set_yticks(current_x_ticks) # Use this before "set_ylim"
ax.set_ylim(current_x_limits)
plt.show()
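Putting the pieces together for the x-axis case from the question, the following is a minimal sketch. It reads the automatically chosen ticks from the primary axes, keeps only those inside the current limits (get_xticks may return locations outside the visible range), and applies them to a twiny() axes with custom labels. The squared labels are only a placeholder for whatever custom labelling is needed.

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = np.arange(1, 11)
ax.plot(x, x)

# Automatic limits and tick locations chosen by matplotlib for the primary axis.
lo, hi = ax.get_xlim()
ticks = ax.get_xticks()
# Drop the locations that fall outside the visible range.
visible = ticks[(ticks >= lo) & (ticks <= hi)]

# Secondary x axis on top with the same limits and tick positions.
top = ax.twiny()
top.set_xlim(lo, hi)
top.set_xticks(visible)
# Placeholder custom labels: the square of each tick value.
top.set_xticklabels([f"{t ** 2:g}" for t in visible])
```

The same pattern should carry over to a twinx() axes with get_ylim, get_yticks, set_ylim, set_yticks and set_yticklabels. Copying the limits as well as the ticks matters, because otherwise the twin axis keeps its own default range.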

specify the lat/lon label location in cartopy (remove at some sides)

The new capability in Cartopy 0.18.0 to add lat/lon labels for any map projection is excellent. It's a great addition to this package. For some maps, especially in polar regions, the lat/lon labels can be very crowded. Here is an example.
from matplotlib import pyplot as plt
import numpy as np
import cartopy.crs as ccrs
pcproj = ccrs.PlateCarree()
lon0 = -150
mapproj = ccrs.LambertAzimuthalEqualArea(
    central_longitude=lon0,central_latitude=75,
)
XLIM = 600e3; YLIM=700e3
dm =5; dp=2
fig = plt.figure(0,(7,7))
ax = fig.add_axes([0.1,0.1,0.85,0.9],projection=mapproj)
ax.set_extent([-XLIM,XLIM,-YLIM,YLIM],crs=mapproj)
ax.coastlines(resolution='50m',color='.5',linewidth=1.5)
lon_grid = np.arange(-180,181,dm)
lat_grid = np.arange(-80,86,dp)
gl = ax.gridlines(draw_labels=True,
                  xlocs=lon_grid,ylocs=lat_grid,
                  x_inline=False,y_inline=False,
                  color='k',linestyle='dotted')
gl.rotate_labels = False
Here is the output plot: I can't embed image yet, so here is the link
What I am looking for is to have lat labels on the left and right sides and lon labels at the bottom, with no labels at the top. This can be easily done in Basemap using a list of flags. I am wondering if this is possible with cartopy now.
Several failed attempts:
I came across a Github open issue for cartopy on a similar topic, but the suggested method does not work for this case. Adding gl.ylocator = mticker.FixedLocator(yticks) does nothing and adding gl.xlocator = mticker.FixedLocator(xticks) gets rid of most of lon labels except the 180 line on left and right sides but all the other lon labels are missing. The 80N lat label is still on the top, see here. After a more careful read of that thread, it seems it is still an ongoing effort for future cartopy releases.
Using gl.top_labels=False does not work either.
Setting y_inline to True makes the lat labels completely gone. I guess this might be because of axes extent I used. The lat labels might be on some longitude lines outside of the box. This is a separate issue, about how to specify the longitude lines/locations of the inline labels.
Right now, I have chosen to turn off the labels. Any suggestions and temporary solutions will be appreciated. At this point, the maps such as the examples above are useful for quicklooks but not ready for any formal use.
UPDATE:
Based on @swatchai's suggestion, there is a temporary workaround below:
# --- add _labels attribute to gl
plt.draw()
# --- tol is adjusted based on the positions of the labels relative to the borders.
tol = 20
for ea in gl._labels:
    pos = ea[2].get_position()
    t_label = ea[2].get_text()
    # --- remove lon labels on the sides
    if abs(abs(pos[0])-XLIM)<tol:
        if 'W' in t_label or 'E' in t_label or '180°' in t_label:
            print(t_label)
            ea[2].set_text('')
    # --- remove labels on top
    if abs(pos[1]-YLIM)<tol:
        ea[2].set_text('')
This is almost what I wanted, except that the 74N labels are missing: they are close to the 170W labels on the sides, and cartopy chose the 170W label instead of 74N. So I need a few more simple tweaks to put them back.
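The position test used in that loop can be isolated into a small helper that works on any list of matplotlib Text objects. The following is only a sketch of the filtering logic: blank_labels_near_top is a hypothetical name, the three Text objects are made-up stand-ins for grid labels, and how the real label artists are obtained depends on the cartopy version (the workaround above relies on the private gl._labels attribute).

```python
from matplotlib.text import Text


def blank_labels_near_top(texts, y_top, tol):
    """Blank every Text whose y position is within tol of y_top.

    Returns the number of labels that were blanked.
    """
    blanked = 0
    for text in texts:
        _, y = text.get_position()
        if abs(y - y_top) < tol:
            text.set_text("")
            blanked += 1
    return blanked


# Made-up labels in projection coordinates (same units as XLIM and YLIM).
YLIM = 700e3
labels = [
    Text(0, YLIM, "80°N"),      # sits on the top edge
    Text(-600e3, 0, "74°N"),    # sits on the left edge
    Text(0, -YLIM, "150°W"),    # sits on the bottom edge
]
removed = blank_labels_near_top(labels, y_top=YLIM, tol=20)
```

Keeping the test in one function makes it easier to adjust the tolerance or to add a second rule for the side edges without touching the loop over the gridliner's labels.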
This could be a workaround for your project until a better solution comes up.
# more code above this line
# this suppresses drawing labels on top edges
# only longitude_labels disappear, some latitude_labels persist
gl.top_labels=False
# workaround here to manipulate the remaining labels
plt.draw() # drawing first makes gl._labels available
for ea in gl._labels:
    # here, ea[2] is a Text object
    #print(ea)
    if '80°N'==ea[2].get_text():
        # set it to a blank string
        ea[2].set_text("")
ax.set_title("No Labels on Top Edge");
plt.show()
The output plot:

How to change color in pie chart using Matplotlib

I am trying to color v1 blue, v2 orange, v3 green and v4 light grey.
I went through the documentation but could not work out how to define the colors of a pie chart. Thank you for your help.
I am using a few lines of code to generate the pie chart,
where vol1 = v1,v2,v3,v4:
plt.pie(vol1,labels = vollabels, autopct="%0.2f%%")
plt.legend(title="Normalized Volumes",loc="upper left", fontsize=14)
plt.axis("equal")
plt.show()
If you want to have control over which colors your pie chart contains, while at the same time not fall out of matplotlib's convenient handling of colour maps, you might want to have a look at documentation example Nested pie charts. Extracted highlights:
import matplotlib.pyplot as plt
import numpy as np
Retrieve a named colour map and "hand-pick" suitable colors using a numbered range. The index picking in inner_colors matches hues for the larger number of data points in the inner circle:
cmap = plt.get_cmap("tab20c")
outer_colors = cmap(np.arange(3)*4)
inner_colors = cmap(np.array([1, 2, 5, 6, 9, 10]))
The actual plotting, including some customisation, is then straightforward:
fig, ax = plt.subplots()
size = 0.3
vals = np.array([[60., 32.], [37., 40.], [29., 10.]])
ax.pie(vals.sum(axis=1), radius=1, colors=outer_colors,
       wedgeprops=dict(width=size, edgecolor='w'))
ax.pie(vals.flatten(), radius=1-size, colors=inner_colors,
       wedgeprops=dict(width=size, edgecolor='w'))
Bonus content in the linked location: how to achieve the same result using a bar plot, but using polar coordinates. That way, one has more flexibility over the exact design, if one's goals diverge from the defaults assumed in pie.
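For the four-wedge case in the question, the colors parameter of pie also accepts a plain list of color names, one per wedge in the same order as the data. A minimal sketch follows, assuming made-up volumes and labels in place of the asker's vol1 and vollabels:

```python
import matplotlib.pyplot as plt

# Made-up values standing in for v1..v4 and their labels.
vol1 = [0.4, 0.3, 0.2, 0.1]
vollabels = ["v1", "v2", "v3", "v4"]
# One color per wedge, in the same order as vol1.
colors = ["tab:blue", "tab:orange", "tab:green", "lightgrey"]

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(
    vol1, labels=vollabels, colors=colors, autopct="%0.2f%%"
)
ax.legend(title="Normalized Volumes", loc="upper left", fontsize=14)
ax.axis("equal")  # keep the pie circular
```

Any matplotlib color specification works in the list, for example hex strings or RGB tuples. Call plt.show() to display the chart.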

How to change scatter plot marker color in plotting loop using pandas?

I'm trying to write a simple program that reads in a CSV with various datasets (all of the same length) and automatically plots them all (as a Pandas Dataframe scatter plot) on the same figure. My current code does this well, but all the marker colors are the same (blue). I'd like to figure out how to make a colormap so that in the future, if I have much larger data sets (let's say, 100+ different X-Y pairings), it will automatically color each series as it plots. Eventually, I would like for this to be a quick and easy method to run from the command line. I did not have luck reading the documentation or stack exchange, hopefully this is not a duplicate!
I've tried the recommendations from these posts:
1) Setting different color for each series in scatter plot on matplotlib
2) https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.scatter.html
3) https://matplotlib.org/users/colormaps.html
However, the first one essentially grouped the data points according to their position on the x-axis and made those groups of data the same color (not what I want, each series of data is roughly a linearly increasing function). The second and third links seemed to have worked, but I don't like the colormap choices (e.g. "viridis", many colors are too similar and it's hard to distinguish data points).
This is a simplified version of my code so far (took out other lines that automatically named axes, etc. to make it easier to read). I've also removed any attempts I've made to specify a colormap, for more of a blank canvas feel:
''' Importing multiple scatter data and plotting '''
import pandas as pd
import matplotlib.pyplot as plt
### Data file path (please enter Dataframe however you like)
path = r'/Users/.../test_data.csv'
### Read in data CSV
data = pd.read_csv(path)
### List of headers
header_list = list(data)
### Set data type to float so modified data frame can be plotted
data = data.astype(float)
### X-axis limits
xmin = 1e-4;
xmax = 3e-3;
## Create subplots to be plotted together after loop
fig, ax = plt.subplots()
### Since there are multiple X-axes (every other column), this loop only plots every other x-y column pair
for i in range(len(header_list)):
    if i % 2 == 0:
        dfplot = data.plot.scatter(x = "{}".format(header_list[i]), y = "{}".format(header_list[i + 1]), ax=ax)
dfplot.set_xlim(xmin,xmax) # Setting limits on X axis
plt.show()
The dataset can be found in the google drive link below. Thanks for your help!
https://drive.google.com/drive/folders/1DSEs8D7lIDUW4NIPBl2qW2EZiZxslGyM?usp=sharing
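One way to give every series its own color is to sample a colormap once, with one sample per x-y pair, and pass the matching entry to each scatter call. The following is a minimal sketch using synthetic arrays and Axes.scatter directly. The "turbo" colormap and the names n_series and colors are illustrative choices, and DataFrame columns can be passed to ax.scatter in place of the arrays.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the CSV: n_series roughly linear x-y pairs.
rng = np.random.default_rng(1)
n_series = 8
x = np.linspace(1e-4, 3e-3, 25)
series = [(x, (k + 1) * x + rng.normal(0, 1e-4, x.size)) for k in range(n_series)]

# Sample the colormap at evenly spaced positions, one color per series.
cmap = plt.get_cmap("turbo")
colors = cmap(np.linspace(0, 1, n_series))

fig, ax = plt.subplots()
for k, (xs, ys) in enumerate(series):
    # `color` sets one color for the whole series, unlike `c`, which can
    # also be interpreted as per-point values.
    ax.scatter(xs, ys, color=colors[k], s=12, label=f"series {k}")
ax.set_xlim(1e-4, 3e-3)
ax.legend(fontsize=8)
```

For up to 20 series, a qualitative map such as "tab20" indexed with integers gives more clearly separated colors. For 100 or more series, neighbouring colors will inevitably be similar whichever map is used.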
