matplotlib not displaying all axis values - python-3.x

I have a small program that is plotting some data. The program runs without any errors and displays the plot, but it is removing every other x-axis value. What should I be doing to get all twelve axis labels to display properly?
The program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas.plotting import register_matplotlib_converters
print('NumPy: {}'.format(np.__version__))
print('Pandas: {}'.format(pd.__version__))
print('-----')
display_settings = {
'max_columns': 14,
'expand_frame_repr': False, # Wrap to multiple pages
'max_rows': 50,
'show_dimensions': False
}
pd.options.display.float_format = '{:,.2f}'.format
for op, value in display_settings.items():
pd.set_option("display.{}".format(op), value)
file = "e:\\python\\dataquest\\unrate.csv"
unrate = pd.read_csv(file)
print(unrate.shape, '\n')
unrate['DATE'] = pd.to_datetime(unrate['DATE'])
unrate.info()  # info() prints its summary itself and returns None
print(unrate.head(12), '\n')
register_matplotlib_converters()
plt.xticks(rotation=90)
plt.plot(unrate['DATE'][0:12], unrate['VALUE'][0:12])
plt.show()
I am getting this output (I am using PyCharm):
I believe I should be getting this (from Dataquest's built-in IDE):

@Quang Hong, you were on the right track. I had to adjust the interval value and the date format as follows:
import matplotlib.dates as mdates
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))
plt.gca().xaxis.set_major_locator(mdates.DayLocator(interval=30))
Now I get this output:
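A DayLocator with interval=30 only approximates monthly spacing, so over a longer range the ticks drift away from the first day of each month. A minimal sketch of an alternative, assuming month-start dates like the DATE column of unrate.csv (the dates and values below are placeholders, not the real file), uses MonthLocator so there is exactly one tick per month:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs outside an IDE
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for unrate['DATE'][0:12]: twelve month-start dates
dates = pd.date_range("1948-01-01", periods=12, freq="MS")
values = list(range(12))  # placeholder values, not real unemployment data

fig, ax = plt.subplots()
ax.plot(dates.to_pydatetime(), values)
# One major tick on the 1st of every month, labelled e.g. "Jan 1948"
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
ax.tick_params(axis="x", labelrotation=90)
fig.tight_layout()
fig.canvas.draw()  # force the tick labels to be computed

labels = [label.get_text() for label in ax.get_xticklabels()]
print(labels)
```

Because MonthLocator places ticks by calendar month rather than by a fixed number of days, the labels stay aligned with the data points regardless of month length.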

Related

Bokeh plot title 'str' object is not callable

In Jupyter Notebooks I read in a dataframe and create several plots with Pandas / Bokeh.
While creating one of the latter I get an error.
Searching for similar problems suggested that somewhere earlier in the script there might be something like
plt.title = "Title"
which overwrites the method. But this is not the case for me: I have nothing similar in the code above, except in the figure parameters, where I set the figure title the way the Bokeh documentation describes.
Running the part of the code that leads to the error as a stand-alone script (instead of inside the complete notebook) does NOT lead to the error. So in my case, too, the problem probably has something to do with my earlier code. But maybe one of you has an idea when seeing this...?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from bokeh.plotting import figure, show, output_notebook, ColumnDataSource
from bokeh.io import output_notebook
from bokeh.layouts import column, gridplot
from bokeh.models import Label, Title
from bokeh.models import Div
data = df  # df, x and y are defined in earlier notebook cells (x, y hold column names of df)
output_notebook()
# Title of the overall plot
abovetitle = ("This should be the overall title of all graphs")
# GRAPH 1
s1 = figure(width=250, height=250, title="Graph 1", x_axis_label="axis title 1", y_axis_label='µs')
s1.line(x, y, line_width=1, color="black", alpha=1, source=data)
# s1.title.text = "Title With Options"  # alternative to 'title=', but does not solve the problem
# GRAPH 2
s2 = figure(width=250, height=250, title="Graph 2", x_axis_label="axis title 2", y_axis_label='µs')
s2.line(x, y, line_width=1, color="blue", alpha=1, source=data)
# s2.title.text = "Title With Options"  # alternative to 'title=', but does not solve the problem
# plot graphs:
p = gridplot([[s1, s2]])
show(column(Div(text=abovetitle), p))
leads to the following TypeError:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-24-33e4828b986d> in <module>
31 # plot graphs:
32 p = gridplot([[s1, s2]])
---> 33 show(column(Div(text=title), p))
TypeError: 'str' object is not callable
Re-running
import matplotlib.pyplot as plt
does not solve the problem. However, re-running all of the imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from bokeh.plotting import figure, show, output_notebook, ColumnDataSource
from bokeh.io import output_notebook
from bokeh.layouts import column, gridplot
from bokeh.models import Label, Title
from bokeh.models import Div
solves the problem. Does anyone have a further idea what might cause this error?
In the meantime I got a very useful hint: in one of the earlier cells I accidentally used a Bokeh API function name as a variable name and overwrote the function. If you face a comparable problem, have a look at your variable names. Maybe the same accident happened to you... ;-)
#############################
# Define column names of XData binary part
header = ["Col1","Col2","Col3"]
# Split XData in single, space separated columns
x_df = selected_df.XData.str.split(' ', expand=True)
x_df.drop(0, inplace=True, axis=1)
x_df.columns = header
#print(x_df)
# Binary XData to integer
for column in x_df:  # DON'T DO THIS! The loop variable overwrites Bokeh's `column` function; use e.g. `col` instead
    x_df[column] = x_df[column].apply(int, base=16)  # base=16 is passed on to int(), converting hex strings
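To see why a loop variable can break the layout call, here is a minimal pure-Python reproduction that needs no Bokeh at all. The `column` function below is a hypothetical stand-in for bokeh.layouts.column, not the real one; the mechanism (a later loop rebinding the same name to a string) is the same:

```python
def column(*children):
    """Hypothetical stand-in for bokeh.layouts.column: just bundles its children."""
    return list(children)

header = ["Col1", "Col2", "Col3"]

# Safe: a distinct loop variable leaves `column` untouched
for col in header:
    pass
layout = column("plot 1", "plot 2")
print(layout)  # ['plot 1', 'plot 2']

# Unsafe: the loop variable rebinds `column` to the last string, "Col3"
for column in header:
    pass
try:
    column("plot 1", "plot 2")
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # 'str' object is not callable
```

After the second loop the name `column` refers to a plain string, which is exactly what the traceback in the question reports when `show(column(...))` is called.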

Facets not working properly plotly express

import plotly.graph_objects as go
import plotly.express as px
fig = px.histogram(df, nbins = 5, x = "numerical_col", color = "cat_1", animation_frame="date",
range_x=["10000","500000"], facet_col="cat_2")
fig.update_layout(
margin=dict(l=25, r=25, t=20, b=20))
fig.show()
How can I fix the output? I would like multiple subplots based on cat_2 where the hue is cat_1.
You have not provided sample data, so I've simulated it based on the code you are using to generate the figure.
I have encountered one issue: range_x does not work as expected, because it affects the y-axis as well. Otherwise the approach fully works.
import plotly.graph_objects as go
import plotly.express as px
import numpy as np
import pandas as pd
# data not provided.... simulate some
DAYS = 5
ROWS = DAYS * 2000
df = pd.DataFrame(
{
"date_d": np.repeat(pd.date_range("1-Jan-2021", periods=DAYS), ROWS // DAYS),
"numerical_col": np.random.uniform(10000, 500000, ROWS),
"cat_1": np.random.choice(list("ABCD"), ROWS),
"cat_2": np.random.choice(list("UVWXYZ"), ROWS),
}
)
# animation frame has to be a string not a date...
df["date"] = df["date_d"].dt.strftime("%Y-%b-%d")
# always best to provide pre-sorted data to plotly
df = df.sort_values(["date", "cat_1", "cat_2"])
fig = px.histogram(
df,
nbins=5,
x="numerical_col",
color="cat_1",
animation_frame="date",
# range_x=[10000, 500000],
facet_col="cat_2",
)
fig.update_layout(margin=dict(l=25, r=25, t=20, b=20))
fig.show()

How to fix 'RuntimeError: Locator ... exceeds Locator.MAXTICKS - matplotlib'

I'm plotting campaign data on a timeline where only the time sent (rather than the date) is relevant, hence the column contains only time data (imported from a CSV).
It displays the various line graphs (a spaghetti plot); however, when I want to add the labels to the x-axis, I receive:
RuntimeError: Locator attempting to generate 4473217 ticks from 30282.0 to 76878.0: exceeds Locator.MAXTICKS
I have 140 rows of data for this test file, the times are between 9:05 and 20:55 and my code is supposed to get a tick for every 15 minutes.
python: 3.7.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
pandas: 0.23.4
matplotlib: 3.0.2
My actual code looks like:
import pandas
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime
file_name = r'''C:\\Users\\A_B_testing.csv'''
df1 = pandas.read_csv(file_name, encoding='utf-8')
df_Campaign1 = df1[df1['DataSource ID'].str.contains('Campaign1')==True]
Campaign1_times = df_Campaign1['time sent'].tolist()
Campaign1_revenue = df_Campaign1['EstValue/sms'].tolist()
Campaign1_times = [datetime.strptime(slot,"%H:%M").time() for slot in Campaign1_times]
df_Campaign2 = df1[df1['DataSource ID'].str.contains('Campaign2')==True]
Campaign2_times = df_Campaign2['time sent'].tolist()
Campaign2_revenue = df_Campaign2['EstValue/sms'].tolist()
Campaign2_times = [datetime.strptime(slot,"%H:%M").time() for slot in Campaign2_times]
fig, ax = plt.subplots(1, 1, figsize=(16, 8))
xlocator = mdates.MinuteLocator(byminute=None, interval=15) # tick every 15 minutes
xformatter = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_locator(xlocator)
ax.xaxis.set_major_formatter(xformatter)
ax.minorticks_off()
plt.grid(True)
plt.plot(Campaign1_times, Campaign1_revenue, c = 'g', linewidth = 1)
plt.plot(Campaign2_times, Campaign2_revenue, c = 'y', linewidth = 2)
plt.show()
I tried to reduce the number of values to be plotted, and it worked fine on a dummy set as follows:
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
from matplotlib.dates import HourLocator, MinuteLocator, DateFormatter
from datetime import datetime
fig, ax = plt.subplots(1, figsize=(16, 6))
xlocator = MinuteLocator(interval=15)
xformatter = DateFormatter('%H:%M')
ax.xaxis.set_major_locator(xlocator)
ax.xaxis.set_major_formatter(xformatter)
ax.minorticks_off()
plt.grid(True, )
xvalues = ['9:05', '10:35' ,'12:05' ,'12:35', '13:05']
xvalues = [datetime.strptime(slot,"%H:%M") for slot in xvalues]
yvalues = [2.2, 2.4, 1.7, 3, 2]
zvalues = [3.2, 1.4, 1.8, 2.7, 2.2]
plt.plot(xvalues, yvalues, c = 'g')
plt.plot(xvalues, zvalues, c = 'b')
plt.show()
So I think the issue is related to the way I'm declaring the ticks. I tried to find a relevant post here, but none has solved my problem. Can anyone please point me in the right direction? Thanks in advance.
I had a similar issue, which got fixed by using datetime objects instead of time objects on the x-axis.
Similarly, in the code of the question, using the full datetime instead of just the time should fix the issue.
Replace:
[datetime.strptime(slot, "%H:%M").time() for slot in ...]
with:
[datetime.strptime(slot, "<full date format>") for slot in ...]
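The numbers in the error look like seconds since midnight rather than matplotlib dates: 30282 s is about 8:25 and 76878 s about 21:21, i.e. the 9:05 to 20:55 range plus margins. MinuteLocator treats axis values as days, so 15-minute steps over that span give (76878 - 30282) × 96 + 1 = 4,473,217 ticks, exactly the figure in the error. A minimal sketch of the datetime fix, assuming a hypothetical handful of "time sent" strings and an arbitrary dummy date (2000-01-01) attached to each time; it uses byminute=range(0, 60, 15), which pins ticks to :00, :15, :30 and :45, instead of interval=15:

```python
from datetime import date, datetime

import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical sample of "time sent" strings and revenue values
times = ["9:05", "10:35", "12:05", "15:50", "20:55"]
revenue = [2.2, 2.4, 1.7, 3.0, 2.0]

# Attach an arbitrary dummy date so matplotlib receives full datetimes
dummy_day = date(2000, 1, 1)
x = [datetime.combine(dummy_day, datetime.strptime(t, "%H:%M").time()) for t in times]

fig, ax = plt.subplots(figsize=(16, 8))
ax.plot(x, revenue, c="g", linewidth=1)
ax.xaxis.set_major_locator(mdates.MinuteLocator(byminute=range(0, 60, 15)))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax.grid(True)
fig.canvas.draw()

tick_times = [mdates.num2date(t) for t in ax.get_xticks()]
print(len(tick_times), tick_times[0].strftime("%H:%M"))
```

Since every point shares the same dummy date, only the time of day matters for positioning, and the %H:%M formatter hides the date from the labels.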

x-axis labels (dates) are shifted in Python matplotlib

I'm a beginner in Python and I have the following problem. I would like to plot a dataset where the x-axis shows dates. The dataset looks like this:
datum, start, end
2017.09.01 38086 37719,8984
2017.09.04 37707.3906 37465.2617
2017.09.05 37471.5117 37736.1016
2017.09.06 37723.5898 37878.8594
2017.09.07 37878.8594 37783.5117
2017.09.08 37764.7383 37596.75
2017.09.11 37615.5117 37895.8516
2017.09.12 37889.6016 38076.8789
2017.09.13 38089.1406 38119.0898
2017.09.14 38119.2617 38243.1992
2017.09.15 38243.7188 38325.9297
2017.09.18 38325.3086 38387.2188
2017.09.19 38387.2188 38176.0781
2017.09.20 38173.2109 38108.0391
2017.09.21 38107.2617 38109.2109
2017.09.22 38110.4609 38178.6289
2017.09.25 38121.9102 38107.8711
2017.09.26 38127.25 37319.2383
2017.09.27 37360.8398 37244.3008
2017.09.28 37282.1094 37191.6484
2017.09.29 37192.1484 37290.6484
In the first column are the labels of the x-axis (this is the date).
When I run the following code, the x-axis labels are shifted:
import pandas as pd
import matplotlib.pyplot as plt
bux = pd.read_csv('C:\\Home\\BUX.txt',
sep='\t',
decimal='.',
header=0)
fig1 = bux.plot(marker='o')
fig1.set_xticklabels(bux.datum, rotation='vertical', fontsize=8)
The resulting figure looks like this:
The second data row in the dataset is '2017.09.04 37707.3906 37465.2617', BUT '2017.09.04' appears at the third data row, whose start value is 37471.5117.
What should I do to get correct x-axis labels?
Thank you!
Agnes
First, the second line contains a comma instead of a decimal point (37719,8984); this should be fixed. Then convert the "datum," column to actual dates and plot the DataFrame with matplotlib.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data/BUX.txt', sep=r'\s+')
df["datum,"] = pd.to_datetime(df["datum,"], format="%Y.%m.%d")
plt.plot(df["datum,"], df["start,"], marker="o")
plt.plot(df["datum,"], df["end"], marker="o")
plt.gcf().autofmt_xdate()
plt.show()
Thank you! It works perfectly. The key moment was to convert the data to date format. Thank you again!
Agnes
You can also use df.plot() directly to fix it:
import pandas as pd
import matplotlib.pyplot as plt
import io
t="""
date start end
2017.09.01 38086 37719.8984
2017.09.04 37707.3906 37465.2617
2017.09.05 37471.5117 37736.1016
2017.09.06 37723.5898 37878.8594
2017.09.07 37878.8594 37783.5117
2017.09.08 37764.7383 37596.75
2017.09.11 37615.5117 37895.8516
2017.09.12 37889.6016 38076.8789
2017.09.13 38089.1406 38119.0898
2017.09.14 38119.2617 38243.1992
2017.09.15 38243.7188 38325.9297
2017.09.18 38325.3086 38387.2188
2017.09.19 38387.2188 38176.0781
2017.09.20 38173.2109 38108.0391
2017.09.21 38107.2617 38109.2109
2017.09.22 38110.4609 38178.6289
2017.09.25 38121.9102 38107.8711
2017.09.26 38127.25 37319.2383
2017.09.27 37360.8398 37244.3008
2017.09.28 37282.1094 37191.6484
2017.09.29 37192.1484 37290.6484
"""
# strip() drops the leading blank line so the "date start end" line is the header
data = pd.read_fwf(io.StringIO(t.strip()), parse_dates=['date'])
data.plot(x='date',marker='o')
plt.show()
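The shift in the question comes from set_xticklabels: it assigns the labels, in order, to whatever tick positions the default locator chose. Judging by the symptom, those ticks fell on every second row (0, 2, 4, ...), so the label for row 1 landed at row 2. If you would rather keep the integer x-axis of the original approach, a sketch that pins one tick per row before setting the labels (using the first five rows of the question's data, comma typo fixed):

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd

# First five rows of the BUX data from the question
raw = """datum start end
2017.09.01 38086 37719.8984
2017.09.04 37707.3906 37465.2617
2017.09.05 37471.5117 37736.1016
2017.09.06 37723.5898 37878.8594
2017.09.07 37878.8594 37783.5117
"""
bux = pd.read_csv(io.StringIO(raw), sep=r"\s+")

ax = bux.plot(marker="o")
# Pin exactly one tick per row so label i sits on data point i
ax.set_xticks(range(len(bux)))
ax.set_xticklabels(bux["datum"], rotation="vertical", fontsize=8)
ax.get_figure().canvas.draw()

positions = list(ax.get_xticks())
labels = [label.get_text() for label in ax.get_xticklabels()]
print(dict(zip(positions, labels)))
```

With many rows, one label per row becomes crowded; converting to real dates, as in the answers above, lets matplotlib thin the labels without misaligning them.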

Time series date tick problems when using the pandas.DataFrame.plot method

I just discovered something really strange when using the plot method of pandas.DataFrame. I am using pandas 0.19.1. Here is my MWE:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
t = pd.date_range('1990-01-01', '1990-01-08', freq='1H')
x = pd.DataFrame(np.random.rand(len(t)), index=t)
fig, axe = plt.subplots()
x.plot(ax=axe)
plt.show()
xt = axe.get_xticks()
When I try to format my xticklabels I get strange behaviour, so I inspected the objects to understand why and found the following:
t[-1] - t[0] = Timedelta('7 days 00:00:00'), confirming the DatetimeIndex is what I expect;
xt = [175320, 175488]: the xticks are integers, but they are not equal to a number of days since the epoch (I have no idea what they are);
xt[-1] - xt[0] = 168: they look more like index positions, since this matches the number of hourly steps between the len(x) = 169 points.
This explains why I cannot succeed in formatting my axes using:
axe.xaxis.set_major_locator(mdates.HourLocator(byhour=(0,6,12,18)))
axe.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))
The first raises an error saying there are too many ticks to generate.
The second shows my first tick as Fri 00:00, but it should be Mon 00:00 (in fact matplotlib interprets the first tick as 0481-01-03 00:00, which is where my bug is).
It looks like there is some incompatibility between pandas and matplotlib integer to date conversion but I cannot find out how to fix this issue.
If I run instead:
fig, axe = plt.subplots()
axe.plot(x)
axe.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))
plt.show()
xt = axe.get_xticks()
Everything works as expected, but I lose all the nice features of the pandas.DataFrame.plot method, such as curve labeling. And here xt = [726468. 726475.].
How can I properly format my ticks using pandas.DataFrame.plot method instead of axe.plot and avoiding this issue?
Update
The problem seems to be the origin and scale (units) of the underlying numbers used for date representation. However, I cannot control it, even by forcing it to the correct type:
t = pd.date_range('1990-01-01', '1990-01-08', freq='1H', origin='unix', units='D')
There is a discrepancy between matplotlib and pandas representation. And I could not find any documentation of this problem.
Is this what you are going for? Note I shortened the date_range to make it easier to see the labels.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dates
t = pd.date_range('1990-01-01', '1990-01-04', freq='1H')
x = pd.DataFrame(np.random.rand(len(t)), index=t)
# resample the df to get the index at 6-hour intervals
l = x.resample('6H').first().index
# set the ticks when you plot. this appears to position them, but not set the label
ax = x.plot(xticks=l)
# set the display value of the tick labels
ax.set_xticklabels(l.strftime("%a %H:%M"))
# hide the labels from the initial pandas plot
ax.set_xticklabels([], minor=True)
# make pretty
ax.get_figure().autofmt_xdate()
plt.show()
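The numbers in xt appear to be hours since 1970-01-01 rather than matplotlib dates: 1990-01-01 is 7305 days after 1970-01-01, and 7305 × 24 = 175320, consistent with pandas drawing regularly spaced time series on its own period-based axis. One option that, as far as I know, avoids this is DataFrame.plot's x_compat=True parameter, which makes pandas use matplotlib's normal date handling so the mdates locator and formatter from the question apply. A minimal sketch reusing the question's MWE:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

t = pd.date_range("1990-01-01", "1990-01-08", freq="h")
x = pd.DataFrame(np.random.rand(len(t)), index=t)

fig, axe = plt.subplots()
# x_compat=True: plot with matplotlib date units instead of pandas' period axis
x.plot(ax=axe, x_compat=True)
axe.set_xlim(t[0], t[-1])
axe.xaxis.set_major_locator(mdates.HourLocator(byhour=(0, 6, 12, 18)))
axe.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))
fig.autofmt_xdate()
fig.canvas.draw()

labels = [label.get_text() for label in axe.get_xticklabels()]
print(labels[0])  # Mon 00:00 (1990-01-01 was a Monday)
```

This keeps the pandas conveniences (legend, line labels) while the tick positions are ordinary matplotlib date numbers.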
