How to remove legend being taken as a dimension in python plot - python-3.x

I want to draw a pairwise scatter plot with a hue dimension on the grid.
But the legend variable, whose values are 1 and 2, is also displayed as an axis alongside the other variables. I guess it is treating 1 and 2 as numbers rather than as legend strings.
Here is my code:
import matplotlib.pyplot as plt
import seaborn as sns

plt.close()
sns.set_style("whitegrid")
x = sns.pairplot(data, hue="status", height=2.5)
plt.show()
This is the output I am getting:
I want to stop status from being plotted as a dimension.

Since you do not want all the variables to be plotted, you should specify which ones you want. That is done with the vars argument:
x = sns.pairplot(data, vars=['age', 'years', 'nodes'], hue="status", height=2.5)
plt.show()
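If the frame has many columns, the list for vars can also be built by excluding the hue column instead of typing every name. A minimal sketch, assuming a hypothetical stand-in DataFrame that only borrows the column names used above (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the question's DataFrame; values are invented.
data = pd.DataFrame({
    "age": [30, 34, 38, 42],
    "years": [64, 60, 59, 65],
    "nodes": [1, 0, 2, 10],
    "status": [1, 1, 2, 2],
})

# Keep every column except the one used for hue.
plot_vars = [col for col in data.columns if col != "status"]
print(plot_vars)
```

The resulting list can then be passed as vars=plot_vars in the pairplot call.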

Related

Is it possible to extract the default tick locations from the primary axis and pass them to a secondary axis with matplotlib?

When making a plot with
fig, ax = plt.subplots()
x=[1,2,3,4,5,6,7,8,9,10]
y=[1,2,3,4,5,6,7,8,9,10]
ax.plot(x,y)
plt.show()
matplotlib will determine the tick spacing/location and the value of each tick. Is there a way to extract this automatic spacing/location AND the value? I want to do this so I can pass it to
set_xticks()
for my secondary axis (using twiny()) then use set_ticklabels() with a custom label. I realise I could use secondary axes giving both a forward and inverse function however providing an inverse function is not feasible for the goal of my code.
So in the image below, the ticks are only showing at 2,4,6,8,10 rather than all the values of x and I want to somehow extract these values and position so I can pass to set_xticks() and then change the tick labels (on a second x axis created with twiny).
UPDATE
When using the fix suggested it works well for the x axis. However, it does not work well for the y-axis. For the y-axis it seems to take the dataset values for the y ticks only. My code is:
ax4 = ax.twinx()
ax4.yaxis.set_ticks_position('left')
ax4.yaxis.set_label_position('left')
ax4.spines["left"].set_position(("axes", -0.10))
ax4.set_ylabel(self.y_2ndary_label, fontweight = 'bold')
Y = ax.get_yticks()
ax4.yaxis.set_ticks(Y)
ax4.yaxis.set_ticklabels( Y*Y )
ax4.set_ylim(ax.get_ylim())
fig.set_size_inches(8, 8)
plt.show()
but this gives me the following plot. The plot after is the original Y axis. This is not the case when I do this on the x-axis. Any ideas?
# From "get_xticks" Doc: The locations are not clipped to the current axis limits
# and hence may contain locations that are not visible in the output.
current_x_ticks = ax.get_xticks()
current_x_limits = ax.get_xlim()
ax.set_yticks(current_x_ticks) # Use this before "set_ylim"
ax.set_ylim(current_x_limits)
plt.show()
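For the twiny() case described in the question, the same idea (copy the tick locations first, then copy the limits) might look like the sketch below. The name ax2 and the squared tick labels are illustrative assumptions, not part of the original code, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ax.plot(x, y)

# Tick locations chosen automatically by matplotlib (may extend past the limits)
ticks = ax.get_xticks()
limits = ax.get_xlim()

# Secondary x axis on top, sharing the y axis
ax2 = ax.twiny()
ax2.set_xticks(ticks)   # set the ticks before the limits
ax2.set_xlim(limits)    # same limits, so both axes line up
# Custom labels: here simply the squared tick value, as an example
ax2.set_xticklabels([f"{t ** 2:g}" for t in ticks])
```

Because get_xticks() is not clipped to the view limits, copying the limits as well is what keeps the two axes aligned.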

How to divide the area between two co-ordinates into blocks and assign some values to those blocks?

Basically I have to create a heatmap of the crowd present in an area.
I have two coordinates. X starts from 0 and its maximum is 119994. Y ranges from -14,000 to +27,000. I have to divide these coordinates into as many blocks as I wish, count the number of people in each block and create a heatmap of this whole area.
Basically show the crowdedness of the area divided as blocks.
I have data in the below format:-
Employee_ID  X_coord  Y_coord_start  Y_coord_end
23           1333     0              6000
45           3999     7000           17000
I tried dividing both coordinate maximums by 100 (to make 100 blocks) and tried finding the block coordinates, but that was very complex.
As I have to make a heatmap I have to prepare a matrix of values in the form of blocks. Every block will have a count of people which I can count and find out from my data but the problem is how to make these blocks of coordinates?
I have another question regarding scatter plot:-
My data is:-
Batch_ID      Pieces_Productivity
181031008780  4.578886
181031008781  2.578886
When I plot it using the following code:-
plt.scatter(list(df_books_location.Batch_ID),list(df_books_location['Pieces_productivity']), s=area, alpha=0.5)
It doesn't give me proper plot. But when I plot with small integers(0-1000) for Batch_ID I get good graph. How to handle large integers for plotting?
I don't know which of the two Y_coord_ columns should give the actual Y coordinate, and I also don't know whether your plot should evaluate the data on a strict "grid" or rather smooth it out; hence I am using both an imshow() and a sns.kdeplot() in the code below:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
### generate some data
np.random.seed(0)
data = np.random.multivariate_normal([0, 0], [(1, .6), (.6, 1)], 100)
## this would e.g. be X,Y=df['X_coord'], df['Y_coord_start'] :
X,Y=data[:,0],data[:,1]
fig,ax=plt.subplots(nrows=1,ncols=3,figsize=(10,5))
ax[0].scatter(X,Y)
sns.kdeplot(x=X, y=Y, fill=True, ax=ax[1], cmap="viridis")  # newer seaborn: keyword x/y, fill= replaces shade=
## the X,Y points are binned into 10x10 bins here, you will need
# to adjust the amount of bins so that it looks "nice" for you
heatmap, xedges, yedges = np.histogram2d(X, Y, bins=(10,10))
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
im=ax[2].imshow(heatmap.T, extent=extent,
                origin="lower",aspect="auto",
                interpolation="nearest") ## also play with different interpolations
## Loop over heatmap dimensions and create text annotations:
# note that we need to "push" the text from the lower left corner of each pixel
# into the center of each pixel
## also try to choose a text color which is readable on all pixels,
# or e.g. use vmin=… vmax= to adjust the colormap such that the colors
# don't clash with e.g. white text
pixel_center_x=(xedges[1]-xedges[0])/2.
pixel_center_y=(yedges[1]-yedges[0])/2.
for i in range(np.shape(heatmap)[1]):
    for j in range(np.shape(heatmap)[0]):
        text = ax[2].text(pixel_center_x+xedges[j], pixel_center_y+yedges[i],
                          '{0:0.0f}'.format(heatmap[j, i]),
                          ha="center", va="center", color="w",fontsize=6)
plt.colorbar(im)
plt.show()
This yields a figure with three panels: the scatter plot, the KDE plot, and the annotated 10x10 heatmap.
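For the coordinate ranges given in the question, the blocks do not have to be built by hand: np.histogram2d accepts a range argument and returns both the per-block counts and the block edges. A rough sketch, assuming a few made-up positions and 100 blocks per axis (in practice x and y would come from the DataFrame columns):

```python
import numpy as np

# Hypothetical positions; real data would come from e.g. df["X_coord"]
# and whichever Y column represents the position.
x = np.array([1333, 3999, 60000, 119994])
y = np.array([0, 7000, -14000, 27000])

n_blocks = 100  # blocks per axis, adjust as needed

# counts[i, j] is the number of people in x-block i and y-block j
counts, xedges, yedges = np.histogram2d(
    x, y,
    bins=(n_blocks, n_blocks),
    range=[[0, 119994], [-14000, 27000]],
)

# Block index of every point (the upper boundary is clipped into the last block)
ix = np.clip(np.digitize(x, xedges) - 1, 0, n_blocks - 1)
iy = np.clip(np.digitize(y, yedges) - 1, 0, n_blocks - 1)
print(counts.shape, counts.sum(), ix[0], iy[0])
```

The counts matrix can be passed to imshow() as in the code above (transposed, with extent built from xedges and yedges).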

Concatenating multiple barplots in seaborn

My data-frame contains the following column headers: subject, Group, MASQ_GDA, MASQ_AA, MASQ_GDD, MASQ_AD
I was successfully able to plot one of them using a bar plot with the following specifications:
bar_plot = sns.barplot(x="Group", y='MASQ_GDA', units="subject", ci = 68, hue="Group", data=demo_masq)
However, I am attempting to create several such bar plots side by side, one for each of the remaining 3 variables (MASQ_AA, MASQ_GDD, MASQ_AD). Might anyone know how I can accomplish this? Here is an example of what I am trying to achieve.
If you look in the documentation for sns.barplot(), you will see that the function accepts a parameter ax= allowing you to tell seaborn which Axes object to use to plot the result:
ax : matplotlib Axes, optional
Axes object to draw the plot onto, otherwise uses the current Axes.
Therefore, the simple way to obtain the desired output is to create the Axes beforehand and then call sns.barplot() with the corresponding ax parameter:
fig, axs = plt.subplots(1,4) # create 4 subplots on 1 row
for ax,col in zip(axs,["MASQ_GDA", "MASQ_AA", "MASQ_GDD", "MASQ_AD"]):
    sns.barplot(x="Group", y=col, units="subject", ci = 68, hue="Group", data=demo_masq, ax=ax) # <- notice ax= argument
Another option, and maybe one that is more in line with the philosophy of seaborn, is to use a FacetGrid. This would allow you to automatically create the required number of subplots depending on the number of categories in your dataset. However, it requires reshaping your dataframe so that the contents of your MASQ_* columns are in a single column, with a new column showing which category each value corresponds to.
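The reshaping step could be done with DataFrame.melt. A minimal sketch, assuming a small made-up frame with the column headers from the question; the names "scale" and "score" for the new columns are illustrative choices, not anything seaborn requires:

```python
import pandas as pd

# Hypothetical stand-in for demo_masq; values are invented.
demo_masq = pd.DataFrame({
    "subject": [1, 2, 3, 4],
    "Group": ["A", "A", "B", "B"],
    "MASQ_GDA": [10, 12, 9, 11],
    "MASQ_AA": [20, 22, 19, 21],
    "MASQ_GDD": [30, 32, 29, 31],
    "MASQ_AD": [40, 42, 39, 41],
})

# Wide -> long: one row per (subject, MASQ_* column) pair
long_df = demo_masq.melt(
    id_vars=["subject", "Group"],
    value_vars=["MASQ_GDA", "MASQ_AA", "MASQ_GDD", "MASQ_AD"],
    var_name="scale",    # which MASQ_* column the value came from
    value_name="score",  # the value itself
)
print(long_df.head())
```

A frame in this long form can then be given to a faceting function, for example sns.catplot(data=long_df, x="Group", y="score", col="scale", kind="bar"), which builds on FacetGrid and draws one panel per value of scale.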

Pandas datetime indexing in plots

I've been having problems with different types of formatting on my xAxis in my plots.
When I load my data from a .csv file with the errorhandling "forward fill" (df.ffill()) I get the following plot:
Which is extremely neat! However, when I errorhandle with df.drop() I get this plot:
Which does not have the same type of formatting on the xAxis as the first plot, which is both annoying and not as pretty / useful as the first plot.
I'm thinking it has to do with the amount of data? But honestly I have no idea. I've been googling for hours and found no particular answer on how to specify the first type of formatting as a plotting parameter.
My code is as follows:
# Create a date-type series
# tvec is an Nx6 matrix where each column represents [year,month,day,hour,min,sec]
# pltData is an Nx4 matrix where each column is a positive float value
xAxis = pd.to_datetime(tvec)
xLabel = "Date" # Set label
# Set xAxis as index to data
pltData = pltData.set_index([xAxis])
# Plot data with use of pandas plotting function
pltData.plot(ax=ax,
title="Consumption per {}".format(period))
# Add options to plot and draw to canvas
plt.xlabel(xLabel) # Add x-label
plt.ylabel(self.unit) # Add y-label
# Define plot parameters
plt.subplots_adjust(top=0.9, bottom=0.255, left=0.1, right=0.955,
hspace=0.2, wspace=0.2) # adjust size
canvas.draw() # Draw to canvas
I am using a FigureCanvas to plot on in a GUI created in PyQt5

how to customise plot title in spatstat

How may I change the plot titles and subtitles when using the plot command on a linnet object? For example:
library(spatstat)
first = runiflpp(10, as.linnet(chicago), nsim = 2)
plot(first)
The code above generates two realisations of a point process, because we requested nsim=2, and plots them with the plot command. But it plots the two realisations with the plot titles 'simulation 1' and 'simulation 2'.
How can I change the subplot titles for example from simulation 1 to experiment 1?
thank you
The simplest way would be to change the names of the items in the list:
names(first) <- paste("experiment", 1:2)
Alternatively you can change the argument main.panel in plot.solist (see ?plot.solist for all the options):
plot(first, main.panel = paste("experiment", 1:2))
