Can't make dates appear on x-axis in pyplot - python-3.x

I've been trying to plot some data. I fetch the data from a database and have placed it all correctly into the variable text_. Here is the relevant snippet of the code:
import sqlite3
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from dateutil.parser import parse
fig, ax = plt.subplots()
# Twin the x-axis twice to make independent y-axes.
axes = [ax, ax.twinx(), ax.twinx()]
# Make some space on the right side for the extra y-axis.
fig.subplots_adjust(right=0.75)
# Move the last y-axis spine over to the right by 20% of the width of the axes
axes[-1].spines['right'].set_position(('axes', 1.2))
# To make the border of the right-most axis visible, we need to turn the frame on. This hides the other plots, however, so we need to turn its fill off.
axes[-1].set_frame_on(True)
axes[-1].patch.set_visible(False)
# And finally we get to plot things...
text_ = [('01/08/2017', 6.5, 143, 88, 60.2, 3), ('02/08/2017', 7.0, 146, 90, 60.2, 4),
('03/08/2017', 6.7, 142, 85, 60.2, 5), ('04/08/2017', 6.9, 144, 86, 60.1, 6),
('05/08/2017', 6.8, 144, 88, 60.2, 7), ('06/08/2017', 6.7, 147, 89, 60.2, 8)]
colors = ('Green', 'Red', 'Blue')
label = ('Blood Sugar Level (mmol/L)', 'Systolic Blood Pressure (mm Hg)', 'Diastolic Blood Pressure (mm Hg)')
y_axisG = [text_[0][1], text_[1][1], text_[2][1], text_[3][1], text_[4][1], text_[5][1]] #Glucose data
y_axisS = [text_[0][2], text_[1][2], text_[2][2], text_[3][2], text_[4][2], text_[5][2]] # Systolic Blood Pressure data
y_axisD = [text_[0][3], text_[1][3], text_[2][3], text_[3][3], text_[4][3], text_[5][3]] # Diastolic Blood Pressure data
AllyData = [y_axisG, y_axisS, y_axisD] #list of the lists of data
dates = [text_[0][0], text_[1][0], text_[2][0], text_[3][0], text_[4][0], text_[5][0]] # the dates as strings
x_axis = [(parse(x, dayfirst=True)) for x in dates] #converting the dates to datetime format for the graph
Blimits = [5.5, 130, 70] #lower limits of the axis
Tlimits = [8, 160, 100] #upper limits of the axis
for ax, color, label, AllyData, Blimits, Tlimits in zip(axes, colors, label, AllyData, Blimits, Tlimits):
    plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%m/%d/%Y')) # formats the date
    plt.gca().xaxis.set_major_locator(mdates.DayLocator())
    data = AllyData
    ax.plot(data, color=color) # plots all the y-axes
    ax.set_ylim([Blimits, Tlimits]) # limits
    ax.set_ylabel(label, color=color) # y-axis labels
    ax.tick_params(axis='y', colors=color)
axes[0].set_xlabel('Date', labelpad=20)
plt.gca().set_title("Last 6 Month's Readings",weight='bold',fontsize=15)
plt.show()
The code currently produces this graph:
[Image: graph with no x-axis values]
I understand the problem is probably in the ax.plot part, but I'm not sure what exactly. I tried changing that line to ax.plot(data, x_axis, color=color); however, this messed up the whole graph, and the dates still didn't show up on the x-axis like I wanted them to.
Is there something I've missed?
If this has been answered elsewhere, could you please show me how to apply it by editing my code?
Thanks a ton

Apparently x_axis is never actually used in the code. Instead of
ax.plot(data, color=color)
which plots the data against its indices, you would want to plot the data against the dates stored in x_axis.
ax.plot(x_axis, data, color=color)
Finally, adding plt.gcf().autofmt_xdate() just before plt.show() rotates the date labels so that they don't overlap.
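Putting the two changes from this answer together, a minimal sketch of the corrected plot might look like the following. Assumptions not taken from the original: the non-interactive Agg backend (so it runs without a display), the date format '%d/%m/%Y' (chosen to match the day-first input strings; the question used '%m/%d/%Y'), and the helper names series, lower and upper, used instead of rebinding the names being zipped over.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; interactively, call plt.show() at the end instead
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from dateutil.parser import parse

# Same shape as the question's text_: (date, glucose, systolic, diastolic, ...)
text_ = [('01/08/2017', 6.5, 143, 88, 60.2, 3), ('02/08/2017', 7.0, 146, 90, 60.2, 4),
         ('03/08/2017', 6.7, 142, 85, 60.2, 5), ('04/08/2017', 6.9, 144, 86, 60.1, 6),
         ('05/08/2017', 6.8, 144, 88, 60.2, 7), ('06/08/2017', 6.7, 147, 89, 60.2, 8)]

# Day-first date strings -> datetime objects, used as the shared x values
x_axis = [parse(row[0], dayfirst=True) for row in text_]
# Columns 1-3 hold glucose, systolic and diastolic readings
series = [[row[col] for row in text_] for col in (1, 2, 3)]

fig, ax = plt.subplots()
axes = [ax, ax.twinx(), ax.twinx()]
fig.subplots_adjust(right=0.75)
axes[-1].spines['right'].set_position(('axes', 1.2))
axes[-1].set_frame_on(True)
axes[-1].patch.set_visible(False)

colors = ('Green', 'Red', 'Blue')
labels = ('Blood Sugar Level (mmol/L)', 'Systolic Blood Pressure (mm Hg)',
          'Diastolic Blood Pressure (mm Hg)')
lower = [5.5, 130, 70]   # lower y-limits
upper = [8, 160, 100]    # upper y-limits

for a, color, label, data, lo, hi in zip(axes, colors, labels, series, lower, upper):
    a.plot(x_axis, data, color=color)  # x values first, then y values
    a.set_ylim(lo, hi)
    a.set_ylabel(label, color=color)
    a.tick_params(axis='y', colors=color)

# The twinned axes share one x-axis, so configuring axes[0] is enough
axes[0].xaxis.set_major_locator(mdates.DayLocator())
axes[0].xaxis.set_major_formatter(mdates.DateFormatter('%d/%m/%Y'))
axes[0].set_xlabel('Date', labelpad=20)
axes[0].set_title("Last 6 Month's Readings", weight='bold', fontsize=15)
fig.autofmt_xdate()  # rotate the date labels so they don't overlap
fig.canvas.draw()    # render once so the tick labels are populated
```

Setting the locator and formatter after plotting matters: plotting datetimes installs matplotlib's default date locator and formatter, which the explicit calls then replace.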

Related

My Bar Plot is not showing bars for all the data values

I have a DataFrame that contains two features, namely LotFrontage and LotArea.
I want to plot a bar graph to show the relation between them.
My code is:
import matplotlib.pyplot as plt
import pandas as pd
visual_df=pd.DataFrame()
visual_df['area']=df_encoded['LotArea']
visual_df['frontage']=df_encoded['LotFrontage']
visual_df.dropna(inplace=True)
plt.figure(figsize=(15,10))
plt.bar(visual_df['area'],visual_df['frontage'])
plt.show()
The column LotFrontage is of float dtype.
What is wrong with my code, and how can I correct it?
To see a relationship between two features, a scatter plot is usually much more informative than a bar plot. To draw a scatter plot via matplotlib: plt.scatter(visual_df['area'], visual_df['frontage']). You can also invoke pandas scatter plot, which automatically adds axis labels: df.plot(kind='scatter', x='area', y='frontage').
For a lot of statistical purposes, seaborn can be handy. sns.regplot not only creates the scatter plot but automatically also tries to fit the data with a linear regression and shows a confidence interval.
from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns
area = [8450, 9600, 11250, 9550, 14260, 14115, 10084, 6120, 7420, 11200, 11924, 10652, 6120, 10791, 13695, 7560, 14215, 7449, 9742, 4224, 14230, 7200]
frontage = [65, 80, 68, 60, 84, 85, 75, 51, 50, 70, 85, 91, 51, 72, 68, 70, 101, 57, 75, 44, 110, 60]
df = pd.DataFrame({'area': area, 'frontage': frontage})
sns.regplot(x='area', y='frontage', data=df)
plt.show()
PS: The main problem with the intended bar plot is that the x-values lie very far apart. Moreover, matplotlib's default bar width is 0.8 in data units, so on an x-axis spanning thousands of units the bars become too narrow to see. Adding an explicit edge color can make them visible:
plt.bar(visual_df['area'], visual_df['frontage'], ec='blue')
You could set a larger width, but then some bars would start to overlap.
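The width trade-off can be made concrete with a small sketch: pick a width just below the smallest gap between neighbouring distinct x-values, so no two distinct bars overlap. The 0.8 factor and the headless Agg backend are assumptions; the data is the sample from the seaborn example.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for running the sketch
import matplotlib.pyplot as plt

area = [8450, 9600, 11250, 9550, 14260, 14115, 10084, 6120, 7420, 11200, 11924,
        10652, 6120, 10791, 13695, 7560, 14215, 7449, 9742, 4224, 14230, 7200]
frontage = [65, 80, 68, 60, 84, 85, 75, 51, 50, 70, 85, 91, 51, 72, 68, 70,
            101, 57, 75, 44, 110, 60]

# Gaps between neighbouring distinct x-values (np.unique also sorts them)
gaps = np.diff(np.unique(area))
width = 0.8 * gaps.min()  # 80 % of the tightest gap, so distinct bars never overlap

fig, ax = plt.subplots(figsize=(15, 10))
bars = ax.bar(area, frontage, width=width, ec='blue')
```

With this data the tightest gap is 15 (between 14215 and 14230), so the bars stay only a few pixels wide on an axis spanning about 10000 units, and the two rows with area 6120 still draw on top of each other. That is another reason the scatter plot is the better fit here.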
Alternatively, a pandas bar plot treats the x-axis as categorical, placing all x-values next to each other as if they were strings. The bars are drawn in the order of the dataframe, so you might want to sort first:
df.sort_values('area').plot(kind='bar', x='area', y='frontage')
plt.tight_layout()

matplotlib - dashed line between points if one condition is met

I am using matplotlib to draw a plot. What I want to achieve is to connect points if one condition is met. For instance, if I have a dataframe like the following:
import os
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
df=pd.DataFrame({'dates': [2001, 2002, 2003, 2004, 2005, 2006], 'census_people': [306,327,352,478,250, 566], 'census_houses': [150,200,249,263, 180, 475]}) #I changed the dates from strings to ints
I can create plots like this using the following code:
plt.plot('dates','census_houses',data=df[df['dates'] < 2004] ,marker='o',color='orange', linewidth=2)
plt.plot('dates','census_houses',data=df[df['dates'] > 2002] ,marker='o',color='orange', linewidth=2, linestyle = '--')
The plot looks like this:
However, what I truly want is, for instance, to use a dashed line to connect points where census_houses is bigger than 250. How can I achieve this with matplotlib? Any suggestions and insights are welcome! Thank you~
This effect can be achieved by applying clipping paths. In this example, the solid line is drawn completely over the dashed line, so only the solid line needs to be clipped.
In the example, the cut-off value for the y-axis is set to 220, and different colors and very thick lines are used to make it easier to see what is happening. In Rectangle((x, y), width, height), y is the desired cut-off value, x is the left edge of the current view, width makes x + width reach the right edge, and height is the full y-range: positive to keep the part of the line above the cut-off, negative to keep the part below it.
This post has more information about clipping paths.
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib.patches import Rectangle
def do_clipping(patches, special_y, keep_below=True, ax=None):
    ax = ax or plt.gca()
    # Use the limits of the axes we are clipping on, not whatever plt considers current
    xmin, xmax = ax.get_xlim()
    ymin, ymax = ax.get_ylim()
    height = ymax - ymin
    if keep_below:
        height = -height
    # Rectangle in data coordinates with one horizontal edge at special_y
    clip_rect = Rectangle((xmin, special_y), xmax - xmin, height,
                          transform=ax.transData)
    for p in patches:
        p.set_clip_path(clip_rect)
df = pd.DataFrame({'dates': [2001, 2002, 2003, 2004, 2005, 2006],
                   'census_houses': [150, 200, 249, 263, 180, 475]})
plt.plot('dates', 'census_houses', data=df, color='limegreen', linewidth=10, linestyle='--')
plot_patches = plt.plot('dates', 'census_houses', data=df, color='crimson', linewidth=10)
do_clipping(plot_patches, 220)
plt.show()
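A different technique from the clipping answer, sketched here as an alternative: split the line into its individual segments and choose each segment's style from the data. The rule used (dash a segment if either of its endpoints exceeds 250) is a hypothetical reading of "connect points if one condition is met"; swap in whatever rule you need. The Agg backend is also an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for running the sketch
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'dates': [2001, 2002, 2003, 2004, 2005, 2006],
                   'census_houses': [150, 200, 249, 263, 180, 475]})
threshold = 250  # cut-off from the question

x = df['dates'].to_numpy()
y = df['census_houses'].to_numpy()

fig, ax = plt.subplots()
for i in range(len(df) - 1):
    # Hypothetical rule: dash the segment if either endpoint exceeds the threshold
    dashed = y[i] > threshold or y[i + 1] > threshold
    ax.plot(x[i:i + 2], y[i:i + 2], color='orange', linewidth=2,
            linestyle='--' if dashed else '-')
ax.plot(x, y, 'o', color='orange')  # draw the markers once, on top of the segments
```

Unlike clipping, the style changes only at data points, never in the middle of a segment.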

How to reduce the width of histogram?

I have drawn a histogram of a diagnosis, which I modeled as a Poisson distribution in Python. I need to reduce the width of the rectangles in the output graph.
I have written the following line in Python and need to add a width-reduction parameter to it:
fig = df['overall_diagnosis'].value_counts(normalize=True).plot(kind='bar',rot=0, color=['b', 'r'], alpha=0.5)
You are looking for matplotlib.pyplot.figure. You can use it like this:
from matplotlib.pyplot import figure
figure(num=None, figsize=(10, 10), dpi=80, facecolor='w', edgecolor='k')
Here is an example of how to do it:
import matplotlib.pyplot as plt
names = ['group_a', 'group_b', 'group_c']
values = [1, 10, 100]
plt.figure(1, figsize=(9, 3))
plt.subplot(131)
plt.bar(names, values)
plt.subplot(132)
plt.scatter(names, values)
plt.subplot(133)
plt.plot(names, values)
plt.suptitle('Categorical Plotting')
plt.show()
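Note that figsize changes the size of the whole figure rather than the bars. pandas bar plots also accept a width keyword (default 0.5, as a fraction of the spacing between categories), which narrows the rectangles directly. A minimal sketch, with a hypothetical overall_diagnosis column standing in for the question's data and the Agg backend assumed:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for running the sketch
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for the question's df
df = pd.DataFrame({'overall_diagnosis': [0, 1, 1, 0, 1, 1, 0, 1]})

# width defaults to 0.5; smaller values give narrower bars
ax = df['overall_diagnosis'].value_counts(normalize=True).plot(
    kind='bar', rot=0, alpha=0.5, width=0.3)
```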

interpolate.interp1d linear plot doesn't agree with new inputs to the function

I have used scipy.interpolate.interp1d for a linear interpolation between two arrays of float values. Then I plotted the interpolation function with matplotlib. However, I noticed that some new values (not originally included in the arrays representing the x and y data) give different results when plugged into the interpolation function than the plot suggests.
I am essentially trying to find the intersection points between a few lines parallel to the x-axis and the interpolation function's linear curve. From researching online, I saw that many people use scipy's interpolate.interp1d for this purpose.
Here is the code:
from scipy import interpolate
import matplotlib.pyplot as plt
# Data
size = [12, 9, 6.5, 4.8, 2, 0.85, 0.45, 0.15, 0.07]
poW = [100, 99, 98, 97, 94, 80, 50, 6, 1]
# Approximate function f: size = f(poW)
f = interpolate.interp1d(poW, size, kind="linear")
# Here I create the plot
plt.axes(xscale='log') # scale x-axis
plt.plot(size, poW, "bs", # add data points with blue squares
f(poW), poW, "b") # add a blue trendline
# Draw D_10 as an additional point
plt.plot(f(10), 10, "rx", markersize=15)
# Draw D_30 as an additional point
plt.plot(f(30), 30, "rx", markersize=15)
# Draw D_60 as an additional point
plt.plot(f(60), 60, "rx", markersize=15)
plt.show()
The additional points I plot in the last 3 lines before plt.show(), don't correspond to the same positions indicated by the plot of the interpolation function itself. This is pretty interesting for me, and I can't seem to locate the problem here. I am pretty new to matplotlib and scipy, so I am sure I must be missing something. Any help or pointing in the right direction will be appreciated!
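No answer appears in this thread, so the following is an assumption about the cause rather than a confirmed fix. With plt.axes(xscale='log'), matplotlib draws straight segments between the data points in log space, while interp1d interpolates linearly in size itself, so values between data points land off the drawn line. Interpolating log10(size) instead makes f agree with what the log-scaled plot shows (f_lin, f_log and f_logx are hypothetical names):

```python
import numpy as np
from scipy import interpolate

size = np.array([12, 9, 6.5, 4.8, 2, 0.85, 0.45, 0.15, 0.07])
poW = np.array([100, 99, 98, 97, 94, 80, 50, 6, 1])

# Linear in size: what the question's f computes
f_lin = interpolate.interp1d(poW, size, kind="linear")
# Linear in log10(size): matches straight segments drawn on a log x-axis
f_log = interpolate.interp1d(poW, np.log10(size), kind="linear")

def f_logx(p):
    """Interpolate in log space, then map back to the original size units."""
    return 10 ** f_log(p)

for p in (10, 30, 60):
    print(p, float(f_lin(p)), float(f_logx(p)))
```

Both versions agree at the data points; they differ only between them. Which one is "right" depends on whether the data really varies linearly in size or in log(size); if it is the former, plotting f on a dense grid of poW values instead would make the drawn curve match f.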

Making a histogram/barchart

I have a Pandas DataFrame containing 6000 values ranging between 1 and 2500. I would like to create a chart with a predetermined x-axis, i.e. [1,2,4,8,16,32,64,128,256,512,more], and a bar for each of these counts. I've been looking into numpy.histogram, but that does not seem to let me choose the bin range (it estimates one), and the same goes for matplotlib.
The code I've tried so far is:
plt.hist(df['cnt'],bins=[0,1,2,4,8,16,32,64,128,256,512])
plt.show()
np.histogram(df['cnt'])
and then plotting the NumPy data, but it does not look like I want it to.
I hope my question makes sense; otherwise I will try to expand on it.
EDIT
When I run
plt.hist(df['cnt'],bins=[0,1,2,4,8,16,32,64,128,256,512])
plt.show()
I get:
What I want:
The second chart was made in Excel using the Data Analysis histogram function. I hope this gives a better picture of what I would like to do.
I think you want a base-2 logarithmic scale on the x-axis.
You can do that with ax.set_xscale('log', base=2) (matplotlib 3.3 and later; older versions used basex=2).
You then also need to adjust the tick locations and formatting, which you can do with ax.xaxis.set_major_locator(ticker.FixedLocator(bins)) and ax.xaxis.set_major_formatter(ticker.ScalarFormatter()).
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np
fig, ax = plt.subplots(1)
# Some fake data
cnt = np.random.lognormal(0.5, 2.0, 6000)
# Define your bins
bins = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
# Plot the histogram
ax.hist(cnt, bins=bins)
# Set scale to base2 log
ax.set_xscale('log', base=2)  # use basex=2 on matplotlib older than 3.3
# Set ticks and ticklabels using ticker
ax.xaxis.set_major_locator(ticker.FixedLocator(bins))
ax.xaxis.set_major_formatter(ticker.ScalarFormatter())
plt.show()
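The question also asks for a "more" bin and equally spaced bars like the Excel chart. np.histogram does accept an explicit sequence of bin edges, but it counts half-open intervals [a, b). A minimal sketch, assuming Excel's convention of counting values up to and including each bin value, using synthetic data in place of df['cnt'] and the Agg backend:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for running the sketch
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
cnt = rng.integers(1, 2501, size=6000)  # synthetic stand-in for df['cnt']

upper = np.array([1, 2, 4, 8, 16, 32, 64, 128, 256, 512])
# Index of the first upper edge >= value, so bin i holds upper[i-1] < v <= upper[i];
# values above 512 get index len(upper), which becomes the "more" bin
idx = np.searchsorted(upper, cnt, side='left')
counts = np.bincount(idx, minlength=len(upper) + 1)
labels = [str(b) for b in upper] + ['more']

fig, ax = plt.subplots()
ax.bar(labels, counts)  # string labels give a categorical, evenly spaced x-axis
ax.set_xlabel('Bin (upper edge)')
ax.set_ylabel('Count')
```

Using string labels sidesteps the log scale entirely: each bin gets the same width on screen, which is how the Excel chart lays them out.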
