Pandas datetime indexing in plots - python-3.x

I've been having problems with inconsistent formatting of the x-axis in my plots.
When I load my data from a .csv file and handle missing values with forward fill (df.ffill()), I get the following plot:
Which is extremely neat! However, when I handle them with df.drop() instead, I get this plot:
Which does not have the same x-axis formatting as the first plot; that is both annoying and less pretty and useful than the first plot.
I suspect it has to do with the amount of data, but honestly I have no idea. I've been googling for hours and found no clear answer on how to request the first type of formatting through a plotting parameter.
My code is as follows:
# Create a date-type series
# tvec is an Nx6 matrix where each column represents [year,month,day,hour,min,sec]
# pltData is an Nx4 matrix where each column is a positive float value
xAxis = pd.to_datetime(tvec)
xLabel = "Date" # Set label
# Set xAxis as index to data
pltData = pltData.set_index([xAxis])
# Plot data with use of pandas plotting function
pltData.plot(ax=ax,
             title="Consumption per {}".format(period))
# Add options to plot and draw to canvas
plt.xlabel(xLabel) # Add x-label
plt.ylabel(self.unit) # Add y-label
# Define plot parameters
plt.subplots_adjust(top=0.9, bottom=0.255, left=0.1, right=0.955,
                    hspace=0.2, wspace=0.2) # adjust size
canvas.draw() # Draw to canvas
I am plotting on a FigureCanvas in a GUI created with PyQt5.
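A possible explanation, offered as a guess rather than a confirmed diagnosis: pandas applies its own time-series tick formatting when the index is regularly spaced, and falls back to plain matplotlib date ticks when rows are missing. One way to get the same formatting in both cases is to bypass the pandas formatting with x_compat=True and set a matplotlib locator and formatter explicitly. The sketch below uses made-up data; the column names of tvec, the value columns a to d, and the choice of ConciseDateFormatter are assumptions for illustration, not part of the original code.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Hypothetical stand-in for tvec: 10 days of hourly samples, with columns
# named so that pd.to_datetime can assemble timestamps from them.
tvec = pd.DataFrame({
    "year": 2019, "month": 1,
    "day": np.repeat(np.arange(1, 11), 24),
    "hour": np.tile(np.arange(24), 10),
    "minute": 0, "second": 0,
})
x_axis = pd.to_datetime(tvec)

# Hypothetical stand-in for pltData: four columns of positive floats.
rng = np.random.default_rng(0)
plt_data = pd.DataFrame(rng.random((len(x_axis), 4)), columns=list("abcd"))
plt_data = plt_data.set_index(pd.DatetimeIndex(x_axis, name="Date"))

# Simulate dropped rows: the index is no longer regularly spaced.
irregular = plt_data.drop(plt_data.index[5:40])

fig, ax = plt.subplots()
# x_compat=True asks pandas to leave tick handling to matplotlib.
irregular.plot(ax=ax, x_compat=True)

# Explicit locator/formatter, so the look does not depend on gaps in the data.
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
ax.set_xlabel("Date")
```

In the GUI from the question, the same three axis calls would go right before canvas.draw().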

Related

Is it possible to extract the default tick locations from the primary axis and pass them to a secondary axis with matplotlib?

When making a plot with
fig, ax = plt.subplots()
x=[1,2,3,4,5,6,7,8,9,10]
y=[1,2,3,4,5,6,7,8,9,10]
ax.plot(x,y)
plt.show()
matplotlib will determine the tick spacing/location and the value of each tick. Is there a way to extract this automatic spacing/location AND the value? I want to do this so I can pass it to
set_xticks()
for my secondary axis (using twiny()) and then use set_ticklabels() with a custom label. I realise I could use secondary axes by giving both a forward and an inverse function; however, providing an inverse function is not feasible for the goal of my code.
So in the image below, the ticks are only showing at 2,4,6,8,10 rather than all the values of x and I want to somehow extract these values and position so I can pass to set_xticks() and then change the tick labels (on a second x axis created with twiny).
UPDATE
When using the fix suggested it works well for the x axis. However, it does not work well for the y-axis. For the y-axis it seems to take the dataset values for the y ticks only. My code is:
ax4 = ax.twinx()
ax4.yaxis.set_ticks_position('left')
ax4.yaxis.set_label_position('left')
ax4.spines["left"].set_position(("axes", -0.10))
ax4.set_ylabel(self.y_2ndary_label, fontweight = 'bold')
Y = ax.get_yticks()
ax4.yaxis.set_ticks(Y)
ax4.yaxis.set_ticklabels( Y*Y )
ax4.set_ylim(ax.get_ylim())
fig.set_size_inches(8, 8)
plt.show()
but this gives me the following plot. The plot after is the original Y axis. This is not the case when I do this on the x-axis. Any ideas?
# From "get_xticks" Doc: The locations are not clipped to the current axis limits
# and hence may contain locations that are not visible in the output.
current_x_ticks = ax.get_xticks()
current_x_limits = ax.get_xlim()
ax.set_yticks(current_x_ticks) # Use this before "set_ylim"
ax.set_ylim(current_x_limits)
plt.show()
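For the twiny() case described in the question, the same idea can be sketched as follows: read the automatic ticks and limits from the primary axis, keep only the ticks inside the limits, and hand them to the twin. The squared labels are only an example of a custom label, echoing the Y*Y in the update above; the variable names are my own.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ax.plot(x, y)
fig.canvas.draw()  # make sure limits and ticks are up to date

# get_xticks() is not clipped to the limits, so filter it ourselves.
lo, hi = ax.get_xlim()
visible = [t for t in ax.get_xticks() if lo <= t <= hi]

# Second x axis on top, same limits and tick positions, custom labels.
ax2 = ax.twiny()
ax2.set_xlim(lo, hi)
ax2.set_xticks(visible)
ax2.set_xticklabels([f"{t ** 2:g}" for t in visible])
```

The y-axis version would use twinx(), get_ylim(), get_yticks(), set_ylim() and set_yticks() in the same order.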

I am trying to move the x axis from bottom to top in matplotlib. But xaxis.set_ticks_position('top') does not seem to work as it should?

I am plotting a log using matplotlib and would like my x axis to be positioned at the top of the plot rather than the bottom.
I tried xaxis.set_ticks_position('top') but it did not work. However xaxis.set_label_position('top') worked for the label.
import matplotlib.pyplot as plt
from matplotlib import gridspec
# creating the figure
fig=plt.figure(figsize=(12,10))
# adding the title
fig.suptitle('Volume of Clay from different methods',fontsize=14)
# creating the axes
gs=gridspec.GridSpec(4,3)
ax1=fig.add_subplot(gs[:,0])
ax2=fig.add_subplot(gs[0,1])
ax3=fig.add_subplot(gs[1,1])
ax4=fig.add_subplot(gs[2,1])
ax5=fig.add_subplot(gs[3,1])
ax6=fig.add_subplot(gs[:,2],sharey=ax1)
# Plotting graph for GR,SP
ax1.invert_yaxis()
ax1.xaxis.set_label_position('top')
ax1.xaxis.set_ticks_position('top')
ax1.grid(True)
ax1.set_ylabel('DEPTH')
ax1.set_xlabel('GR[api]',color='green')
ax1.tick_params('x',colors='green')
ax1.spines['top'].set_position(('outward',0))
ax1.plot(data.GR,data.index, color='green')
ax11=ax1.twiny()
ax11.plot(data.SP,data.index,color='blue')
ax11.set_xlabel("SP[mV]",color='blue')
ax11.spines['top'].set_position(('outward',40))
plt.show()
I am expecting the x axis for the GR curve in green to be on top, but it remains at the bottom instead.
I think I found out what's going on, thanks to @ImportanceOfBeingErnest:
ax11 = ax1.twiny() is overwriting the tick position of ax1.
I've fixed the code as below.
import matplotlib.pyplot as plt
from matplotlib import gridspec
# creating the figure
fig=plt.figure(figsize=(12,10))
# adding the title
fig.suptitle('Volume of Clay from different methods',fontsize=14)
fig.subplots_adjust(top=0.9,wspace=0.3, hspace=0.3)
# creating the axes
gs=gridspec.GridSpec(4,3)
ax1=fig.add_subplot(gs[:,0])
ax1.get_xaxis().set_visible(False)
ax2=fig.add_subplot(gs[0,1])
ax3=fig.add_subplot(gs[1,1])
ax4=fig.add_subplot(gs[2,1])
ax5=fig.add_subplot(gs[3,1])
ax6=fig.add_subplot(gs[:,2],sharey=ax1)
# Plotting graph for GR,SP
ax10=ax1.twiny()
ax10.invert_yaxis()
ax10.xaxis.set_label_position('top')
ax10.xaxis.set_ticks_position('top')
ax10.tick_params('x',colors='green')
ax10.spines['top'].set_position(('outward',0))
ax10.grid(True)
ax10.set_ylabel('DEPTH')
ax10.set_xlabel('GR[api]',color='green')
ax10.plot(data.GR,data.index, color='green')
ax11=ax1.twiny()
ax11.plot(data.SP,data.index,color='blue')
ax11.set_xlabel("SP[mV]",color='blue')
ax11.spines['top'].set_position(('outward',40))
If there is any better way to write this, please do comment.
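A possibly shorter variant, sketched under the assumption that the only problem is twiny() moving the ticks of the original axes back to the bottom: keep plotting GR on ax1 and re-apply the top placement after the twin has been created. The data frame below is invented (random GR and SP values indexed by depth) so the snippet runs on its own.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical log data: GR and SP curves indexed by depth.
rng = np.random.default_rng(0)
depth = np.linspace(1000, 1100, 50)
data = pd.DataFrame({"GR": rng.uniform(20, 120, 50),
                     "SP": rng.uniform(-80, 0, 50)}, index=depth)

fig, ax1 = plt.subplots(figsize=(4, 8))
ax1.plot(data.GR, data.index, color="green")
ax1.invert_yaxis()
ax1.set_ylabel("DEPTH")
ax1.set_xlabel("GR[api]", color="green")
ax1.tick_params("x", colors="green")
ax1.grid(True)

# Creating the twin resets ax1's x ticks to the bottom.
ax11 = ax1.twiny()
ax11.plot(data.SP, data.index, color="blue")
ax11.set_xlabel("SP[mV]", color="blue")
ax11.spines["top"].set_position(("outward", 40))

# So the top placement for ax1 is applied only now, after twiny().
ax1.xaxis.set_ticks_position("top")
ax1.xaxis.set_label_position("top")
```

This avoids the extra ax10 twin and the hidden x axis on ax1, at the cost of having to remember the call order.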

How to change scatter plot marker color in plotting loop using pandas?

I'm trying to write a simple program that reads in a CSV with various datasets (all of the same length) and automatically plots them all (as a Pandas Dataframe scatter plot) on the same figure. My current code does this well, but all the marker colors are the same (blue). I'd like to figure out how to make a colormap so that in the future, if I have much larger data sets (let's say, 100+ different X-Y pairings), it will automatically color each series as it plots. Eventually, I would like for this to be a quick and easy method to run from the command line. I did not have luck reading the documentation or stack exchange, hopefully this is not a duplicate!
I've tried the recommendations from these posts:
1) Setting different color for each series in scatter plot on matplotlib
2) https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.scatter.html
3) https://matplotlib.org/users/colormaps.html
However, the first one essentially grouped the data points according to their position on the x-axis and made those groups of data the same color (not what I want, each series of data is roughly a linearly increasing function). The second and third links seemed to have worked, but I don't like the colormap choices (e.g. "viridis", many colors are too similar and it's hard to distinguish data points).
This is a simplified version of my code so far (took out other lines that automatically named axes, etc. to make it easier to read). I've also removed any attempts I've made to specify a colormap, for more of a blank canvas feel:
''' Importing multiple scatter data and plotting '''
import pandas as pd
import matplotlib.pyplot as plt
### Data file path (please enter Dataframe however you like)
path = r'/Users/.../test_data.csv'
### Read in data CSV
data = pd.read_csv(path)
### List of headers
header_list = list(data)
### Set data type to float so modified data frame can be plotted
data = data.astype(float)
### X-axis limits
xmin = 1e-4
xmax = 3e-3
## Create subplots to be plotted together after loop
fig, ax = plt.subplots()
### Since there are multiple X-axes (every other column), this loop only plots every other x-y column pair
for i in range(len(header_list)):
    if i % 2 == 0:
        dfplot = data.plot.scatter(x = "{}".format(header_list[i]), y = "{}".format(header_list[i + 1]), ax=ax)
        dfplot.set_xlim(xmin,xmax) # Setting limits on X axis
plt.show()
The dataset can be found in the google drive link below. Thanks for your help!
https://drive.google.com/drive/folders/1DSEs8D7lIDUW4NIPBl2qW2EZiZxslGyM?usp=sharing
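One way this could be approached, as a sketch rather than a tested answer for the linked data: pick one color per X-Y pair from a qualitative colormap and pass it to each scatter call. The data frame below is invented (three roughly linear series in columns x1/y1, x2/y2, x3/y3), and "tab20" is just one colormap whose neighbouring colors are easy to tell apart; for more than 20 series the colors repeat.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

# Hypothetical data in the same layout: alternating X and Y columns.
rng = np.random.default_rng(0)
x = np.linspace(1e-4, 3e-3, 20)
data = pd.DataFrame({
    "x1": x, "y1": 1.0 * x + rng.normal(0, 1e-4, x.size),
    "x2": x, "y2": 2.0 * x + rng.normal(0, 1e-4, x.size),
    "x3": x, "y3": 3.0 * x + rng.normal(0, 1e-4, x.size),
})
header_list = list(data)

cmap = plt.get_cmap("tab20")  # qualitative colormap with 20 distinct colors
fig, ax = plt.subplots()

# Step through the columns two at a time; k counts the series.
for k, i in enumerate(range(0, len(header_list) - 1, 2)):
    color = to_hex(cmap(k % cmap.N))  # wrap around after 20 series
    data.plot.scatter(x=header_list[i], y=header_list[i + 1],
                      color=color, label=header_list[i + 1], ax=ax)

ax.set_xlim(1e-4, 3e-3)
```

Calling plt.show() afterwards displays the figure with one legend entry per series.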

How to divide the area between two co-ordinates into blocks and assign some values to those blocks?

Basically I have to create a heatmap of the crowd present in an area.
I have two coordinates. X starts from 0 and its maximum is 119994. Y ranges from -14,000 to +27,000. I have to divide these coordinates into as many blocks as I wish, count the number of people in each block and create a heatmap of this whole area.
Basically show the crowdedness of the area divided as blocks.
I have data in the below format:-
Employee_ID  X_coord  Y_coord_start  Y_coord_end
23           1333     0              6000
45           3999     7000           17000
I tried dividing both the coordinate maximums by 100(to make 100 blocks) and tried finding the block coordinates but that was very complex.
As I have to make a heatmap I have to prepare a matrix of values in the form of blocks. Every block will have a count of people which I can count and find out from my data but the problem is how to make these blocks of coordinates?
I have another question regarding scatter plot:-
My data is:-
Batch_ID      Pieces_Productivity
181031008780  4.578886
181031008781  2.578886
When I plot it using the following code:-
plt.scatter(list(df_books_location.Batch_ID),list(df_books_location['Pieces_productivity']), s=area, alpha=0.5)
It doesn't give me a proper plot. But when I plot with small integers (0-1000) for Batch_ID I get a good graph. How do I handle large integers for plotting?
I don't know which of the two Y_coord_ columns should give the actual Y coordinate, nor whether your plot should evaluate the data on a strict "grid" or rather smooth it out; hence I am using both an imshow() and a sns.kdeplot() in the code below:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
### generate some data
np.random.seed(0)
data = np.random.multivariate_normal([0, 0], [(1, .6), (.6, 1)], 100)
## this would e.g. be X,Y=df['X_coord'], df['Y_coord_start'] :
X,Y=data[:,0],data[:,1]
fig,ax=plt.subplots(nrows=1,ncols=3,figsize=(10,5))
ax[0].scatter(X,Y)
sns.kdeplot(x=X, y=Y, fill=True, ax=ax[1], cmap="viridis")  # recent seaborn: keyword x=/y=, and fill= replaces shade=
## the X,Y points are binned into 10x10 bins here, you will need
# to adjust the amount of bins so that it looks "nice" for you
heatmap, xedges, yedges = np.histogram2d(X, Y, bins=(10,10))
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
im=ax[2].imshow(heatmap.T, extent=extent,
                origin="lower",aspect="auto",
                interpolation="nearest") ## also play with different interpolations
## Loop over heatmap dimensions and create text annotations:
# note that we need to "push" the text from the lower left corner of each pixel
# into the center of each pixel
## also try to choose a text color which is readable on all pixels,
# or e.g. use vmin=… vmax= to adjust the colormap such that the colors
# don't clash with e.g. white text
pixel_center_x=(xedges[1]-xedges[0])/2.
pixel_center_y=(yedges[1]-yedges[0])/2.
for i in range(np.shape(heatmap)[1]):
    for j in range(np.shape(heatmap)[0]):
        text = ax[2].text(pixel_center_x+xedges[j], pixel_center_y+yedges[i],'{0:0.0f}'.format(heatmap[j, i]),
                          ha="center", va="center", color="w",fontsize=6)
plt.colorbar(im)
plt.show()
yields a figure with three panels: the scatter plot, the KDE plot, and the annotated 10x10 heatmap.
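For the coordinate ranges given in the question, the blocks themselves can come straight from np.histogram2d by passing the full area as range=, so the grid covers the whole site even where nobody is standing. This is only a sketch: it uses the two sample rows from the question, a 10x10 grid (100 blocks), and it assumes the midpoint of Y_coord_start and Y_coord_end as each person's Y position, which the question does not specify.

```python
import numpy as np
import pandas as pd

# The two sample rows from the question.
df = pd.DataFrame({
    "Employee_ID": [23, 45],
    "X_coord": [1333, 3999],
    "Y_coord_start": [0, 7000],
    "Y_coord_end": [6000, 17000],
})

# Assumption: use the midpoint of the Y interval as the Y position.
y_mid = (df["Y_coord_start"] + df["Y_coord_end"]) / 2

# 10 x 10 blocks spanning the whole area stated in the question.
counts, xedges, yedges = np.histogram2d(
    df["X_coord"], y_mid,
    bins=(10, 10),
    range=[[0, 119994], [-14000, 27000]],
)

# counts[i, j] is the number of people in x-block i and y-block j;
# xedges and yedges hold the block boundaries.
print(counts.sum())
```

The counts matrix can then be passed to imshow() exactly like heatmap in the code above, and bins= can be changed to get finer or coarser blocks.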

Concatenating multiple barplots in seaborn

My data-frame contains the following column headers: subject, Group, MASQ_GDA, MASQ_AA, MASQ_GDD, MASQ_AD
I was successfully able to plot one of them using a bar plot with the following specifications:
bar_plot = sns.barplot(x="Group", y='MASQ_GDA', units="subject", ci = 68, hue="Group", data=demo_masq)
However, I am attempting to create several of such bar plot side by side. Might anyone know how I can accomplish this, for each plot to contain the remaining 3 variables (MASQ_AA, MASQ_GDD, MASQ_AD). Here is an example of what I am trying to achieve.
If you look in the documentation for sns.barplot(), you will see that the function accepts a parameter ax= that lets you tell seaborn which Axes object to draw the result on:
ax : matplotlib Axes, optional
Axes object to draw the plot onto, otherwise uses the current Axes.
Therefore, the simple way to obtain the desired output is to create the Axes beforehand, and then call sns.barplot() with the corresponding ax parameter:
fig, axs = plt.subplots(1,4) # create 4 subplots on 1 row
for ax,col in zip(axs,["MASQ_GDA", "MASQ_AA", "MASQ_GDD", "MASQ_AD"]):
    sns.barplot(x="Group", y=col, units="subject", ci = 68, hue="Group", data=demo_masq, ax=ax) # <- notice ax= argument
Another option, and maybe one that is more in line with the philosophy of seaborn, is to use a FacetGrid. This would allow you to automatically create the required number of subplots depending on the number of categories in your dataset. However, it requires reshaping your dataframe so that the contents of your MASQ_* columns are in a single column, with a new column showing which category each value corresponds to.
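The reshaping step could look roughly like this, using DataFrame.melt() and then sns.catplot(), which builds a FacetGrid internally. The data frame is invented (20 subjects in two groups with random scores), the names scale and score for the new columns are my own, and the error-bar setting from the question is left out because its spelling differs between seaborn versions.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data with the column layout from the question.
rng = np.random.default_rng(0)
cols = ["MASQ_GDA", "MASQ_AA", "MASQ_GDD", "MASQ_AD"]
demo_masq = pd.DataFrame({
    "subject": np.arange(20),
    "Group": np.repeat(["A", "B"], 10),
    **{c: rng.normal(20, 5, 20) for c in cols},
})

# Wide to long: one row per (subject, scale) with the value in "score".
long_df = demo_masq.melt(id_vars=["subject", "Group"], value_vars=cols,
                         var_name="scale", value_name="score")

# One bar-plot panel per scale, laid out side by side.
g = sns.catplot(data=long_df, x="Group", y="score", col="scale", kind="bar")
```

By default the panels share the y axis, which makes the four scales directly comparable; passing sharey=False gives each panel its own range.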
