Time series datetick problems when using pandas.DataFrame.plot method - python-3.x

I just discovered something really strange when using the plot method of pandas.DataFrame. I am using pandas 0.19.1. Here is my MWE:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
t = pd.date_range('1990-01-01', '1990-01-08', freq='1H')
x = pd.DataFrame(np.random.rand(len(t)), index=t)
fig, axe = plt.subplots()
x.plot(ax=axe)
plt.show()
xt = axe.get_xticks()
When I try to format my xticklabels I get strange behaviours, so I inspected the objects to understand why and found the following:
t[-1] - t[0] = Timedelta('7 days 00:00:00'), confirming the DatetimeIndex is what I expect;
xt = [175320, 175488]: the xticks are integers, but they are not a number of days since the epoch (I have no idea what they are);
xt[-1] - xt[0] = 168: they look more like index positions, since this is almost the same as len(x) = 169.
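One way to make sense of these numbers is to read them as pandas Period ordinals at hourly frequency, i.e. hours since 1970-01-01 00:00. This assumes pandas drew this regular series on its own period-based time-series axis, which is what the tick values suggest. A quick check of that interpretation (the helper names are illustrative only):

```python
import pandas as pd


def hourly_ordinal(ts):
    """Ordinal of an hourly Period: hours elapsed since 1970-01-01 00:00."""
    return pd.Period(ts, freq="h").ordinal


def hours_since_unix_epoch(ts):
    """The same quantity, computed directly from a Timedelta."""
    return (pd.Timestamp(ts) - pd.Timestamp("1970-01-01")) // pd.Timedelta(hours=1)


start, end = "1990-01-01", "1990-01-08"
print(hourly_ordinal(start), hourly_ordinal(end))  # 175320 175488
print(hours_since_unix_epoch(start))               # 175320
```

If this reading is right, the matplotlib date locators and formatters, which expect days, misinterpret these hour counts. That would explain both the "too many ticks" error and the far-off first date.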
This explains why I cannot succeed in formatting my axis using:
axe.xaxis.set_major_locator(mdates.HourLocator(byhour=(0,6,12,18)))
axe.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))
The first raises an error saying there are too many ticks to generate;
The second shows that my first tick is Fri 00:00, but it should be Mon 00:00 (in fact matplotlib assumes the first tick to be 0481-01-03 00:00; oops, this is where my bug is).
It looks like there is some incompatibility between the pandas and matplotlib integer-to-date conversions, but I cannot find out how to fix this issue.
If I run instead:
fig, axe = plt.subplots()
axe.plot(x)
axe.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))
plt.show()
xt = axe.get_xticks()
Everything works as expected, but I lose all the nice features of the pandas.DataFrame.plot method, such as curve labeling. And here xt = [726468. 726475.].
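The value 726468 is consistent with matplotlib's own date numbers, which are float days since an epoch. Before matplotlib 3.3 that epoch was 0000-12-31, which makes the number equal to the date's proleptic Gregorian ordinal. From 3.3 on, the default epoch is 1970-01-01. A small sketch comparing these encodings for the first date of the series:

```python
import datetime as dt

import matplotlib.dates as mdates

day = dt.datetime(1990, 1, 1)

# matplotlib's encoding: float days since its configured epoch
mpl_num = mdates.date2num(day)

# legacy epoch (0000-12-31, default before matplotlib 3.3):
# identical to the proleptic Gregorian ordinal of the date
legacy_num = day.toordinal()

# Unix epoch (1970-01-01, default from matplotlib 3.3 on)
unix_days = (day - dt.datetime(1970, 1, 1)).days

print(legacy_num, unix_days)  # 726468 7305
print(mpl_num)                # equals one of the two, depending on the matplotlib version
```

Either way, these are days, whereas the pandas time-series axis in the first example uses a different unit. That mismatch is the discrepancy described in the update below.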
How can I properly format my ticks using pandas.DataFrame.plot method instead of axe.plot and avoiding this issue?
Update
The problem seems to be about the origin and scale (units) of the underlying numbers used for date representation. However, I cannot control it, even by forcing it to the correct type:
t = pd.date_range('1990-01-01', '1990-01-08', freq='1H', origin='unix', units='D')
There is a discrepancy between the matplotlib and pandas representations, and I could not find any documentation of this problem.

Is this what you are going for? Note that I shortened the date_range to make the labels easier to see.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dates
t = pd.date_range('1990-01-01', '1990-01-04', freq='1H')
x = pd.DataFrame(np.random.rand(len(t)), index=t)
# resample the df to get the index at 6-hour intervals
l = x.resample('6H').first().index
# set the ticks when you plot. this appears to position them, but not set the label
ax = x.plot(xticks=l)
# set the display value of the tick labels
ax.set_xticklabels(l.strftime("%a %H:%M"))
# hide the labels from the initial pandas plot
ax.set_xticklabels([], minor=True)
# make pretty
ax.get_figure().autofmt_xdate()
plt.show()
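Another option keeps DataFrame.plot but passes x_compat=True, which tells pandas not to use its own time-series tick handling. The x values then stay ordinary matplotlib date numbers, so the HourLocator and DateFormatter from the question should apply. This is a sketch assuming a reasonably recent pandas and matplotlib; the helper name plot_with_date_ticks is hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates


def plot_with_date_ticks(df):
    """Plot df with DataFrame.plot, then apply matplotlib date tick handling."""
    fig, ax = plt.subplots()
    # x_compat=True skips pandas' period-based time-series axis, so the
    # x values are plain matplotlib date numbers that mdates understands
    df.plot(ax=ax, x_compat=True)
    ax.xaxis.set_major_locator(mdates.HourLocator(byhour=(0, 6, 12, 18)))
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))
    fig.autofmt_xdate()  # rotate the labels so they do not overlap
    return fig, ax


if __name__ == "__main__":
    t = pd.date_range("1990-01-01", "1990-01-08", freq="1h")
    x = pd.DataFrame(np.random.rand(len(t)), index=t)
    plot_with_date_ticks(x)
    plt.show()
```

The pandas legend and line labelling are kept, because the plotting itself still goes through DataFrame.plot.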

Related

X and Y label being cut in matplotlib plots

I have this code:
import pandas as pd
# 'from pandas import datetime' was removed in pandas 2.0; the stdlib datetime imported below is used instead
from pandas import DataFrame as df
import matplotlib
from pandas_datareader import data as web
import matplotlib.pyplot as plt
import datetime
start = datetime.date(2016,1,1)
end = datetime.date.today()
stock = 'fb'
fig = plt.figure(dpi=1400)
data = web.DataReader(stock, 'yahoo', start, end)
fig, ax = plt.subplots(dpi=720)
data['vol_pct'] = data['Volume'].pct_change()
data.plot(y='vol_pct', ax = plt.gca(), title = 'this is the title \n second line')
ax.set(xlabel="Date")
ax.legend(loc='upper center', bbox_to_anchor=(0.32, -0.22), shadow=True, ncol=2)
plt.savefig('Test')
This example is taken from different code, but the problem is the same:
At the bottom of the plot you can see that the legend is cut off. In another plot, from the code I am actually working on, even the y-label is cut off when I save the plot using plt.savefig('Test'). How can I fix this?
It's a long-standing issue with .savefig() that it doesn't check legend and axis locations before setting bounds. As a rule, I solve this with the bbox_inches argument:
plt.savefig('Test', bbox_inches='tight')
This is similar to calling plt.tight_layout(), but takes all of the relevant artists into account, whereas tight_layout will often pull some objects into frame while cutting off new ones.
I have to tell pyplot to keep it tight more than half the time, so I'm not sure why this isn't the default behavior.
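Here is a self-contained sketch of the bbox_inches='tight' approach. Random numbers stand in for the Yahoo volume data, pandas_datareader is left out, and build_figure is a hypothetical helper. The legend is placed below the axes exactly as in the question, so it lies outside the default figure area. The tight bounding box used when saving is enlarged to include it.

```python
import io

import numpy as np
import matplotlib.pyplot as plt


def build_figure():
    # hypothetical stand-in for data['vol_pct'] from the question
    rng = np.random.default_rng(0)
    fig, ax = plt.subplots()
    ax.plot(rng.normal(0, 0.2, 250), label="vol_pct")
    ax.set(title="this is the title \n second line", xlabel="Date")
    # same placement as in the question: centred below the axes
    leg = ax.legend(loc="upper center", bbox_to_anchor=(0.32, -0.22),
                    shadow=True, ncol=2)
    return fig, ax, leg


fig, ax, leg = build_figure()
buf = io.BytesIO()
# the saved area is computed from all artists, legend included
fig.savefig(buf, format="png", bbox_inches="tight")
```

In real code, buf would simply be a filename such as 'Test.png'.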
plt.subplots_adjust(bottom=0.4 ......)
This should leave enough room below the axes for the legend.
Alternatively, you can relocate the legend to loc="upper left".
https://matplotlib.org/api/_as_gen/matplotlib.pyplot.subplots_adjust.html
Please also check this related question, asked 8 years ago:
Moving matplotlib legend outside of the axis makes it cutoff by the figure box
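The subplots_adjust route can be sketched the same way, again with hypothetical random data. The value bottom=0.4 is assumed here because the answer elides the other arguments. In a default-sized figure, it is enough for the legend below the axes to land inside the figure.

```python
import numpy as np
import matplotlib.pyplot as plt


def build_figure_with_room(bottom=0.4):
    # hypothetical data standing in for the question's volume series
    rng = np.random.default_rng(0)
    fig, ax = plt.subplots()
    ax.plot(rng.normal(0, 0.2, 250), label="vol_pct")
    ax.set(xlabel="Date")
    leg = ax.legend(loc="upper center", bbox_to_anchor=(0.32, -0.22), ncol=2)
    # shrink the axes from below so the legend fits inside the figure
    fig.subplots_adjust(bottom=bottom)
    return fig, ax, leg


if __name__ == "__main__":
    fig, ax, leg = build_figure_with_room()
    fig.savefig("Test.png")
```

Unlike bbox_inches='tight', this keeps the saved image size fixed. The price is less vertical space for the plot itself.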

matplotlib x-axis messed up [duplicate]

I have a series whose index is datetime that I wish to plot. I want to plot the values of the series on the y axis and the index of the series on the x axis. The Series looks as follows:
2014-01-01 7
2014-02-01 8
2014-03-01 9
2014-04-01 8
...
I generate a graph using plt.plot(series.index, series.values), but the x tick labels come out with far more detail than I want.
The problem is that I would like to have only year and month (yyyy-mm or 2016 March). However, the graph contains hours, minutes and seconds. How can I remove them so that I get my desired formatting?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# sample data
N = 30
drange = pd.date_range("2014-01", periods=N, freq="MS")
np.random.seed(365) # for a reproducible example of values
values = {'values':np.random.randint(1,20,size=N)}
df = pd.DataFrame(values, index=drange)
fig, ax = plt.subplots()
ax.plot(df.index, df.values)
ax.set_xticks(df.index)
# use formatters to specify major and minor ticks
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
ax.xaxis.set_minor_formatter(mdates.DateFormatter("%Y-%m"))
_ = plt.xticks(rotation=90)
You can try something like this:
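import numpy as np
import pandas as pd
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
df = pd.DataFrame({'values':np.random.randint(0,1000,36)},index=pd.date_range(start='2014-01-01',end='2016-12-31',freq='MS'))  # 'MS' = month start; the 'M' alias is deprecated in recent pandas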
fig,ax1 = plt.subplots()
plt.plot(df.index,df.values)
monthyearFmt = mdates.DateFormatter('%Y %B')
ax1.xaxis.set_major_formatter(monthyearFmt)
_ = plt.xticks(rotation=90)
You should check out this native matplotlib function:
fig.autofmt_xdate()
See the "Custom tick formatter" examples on the matplotlib website.
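These suggestions can be combined in one minimal sketch: a MonthLocator puts one tick every three months, a year-month DateFormatter labels them, and fig.autofmt_xdate() rotates the labels. The helper name plot_monthly is hypothetical, and the sample data follow the first answer.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates


def plot_monthly(series):
    fig, ax = plt.subplots()
    ax.plot(series.index, series.values)
    # one major tick every 3 months, labelled with year and month only
    ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
    # rotate and right-align the date labels
    fig.autofmt_xdate()
    return fig, ax


np.random.seed(365)
series = pd.Series(np.random.randint(1, 20, size=30),
                   index=pd.date_range("2014-01", periods=30, freq="MS"))

if __name__ == "__main__":
    plot_monthly(series)
    plt.show()
```

For labels like "2016 March", swap the format string for "%Y %B", as in the second answer.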

How to fix 'RuntimeError: Locator ... exceeds Locator.MAXTICKS - matplotlib'

I'm plotting campaign data on a timeline where only the time sent (rather than the date) is relevant, hence the column contains only time data (imported from a CSV).
It displays the various line graphs (a spaghetti plot); however, when I add labels to the x-axis, I receive:
RuntimeError: Locator attempting to generate 4473217 ticks from 30282.0 to 76878.0: exceeds Locator.MAXTICKS
I have 140 rows of data for this test file, the times are between 9:05 and 20:55 and my code is supposed to get a tick for every 15 minutes.
python: 3.7.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
pandas: 0.23.4
matplotlib: 3.0.2
My actual code looks like:
import pandas
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime
file_name = r'''C:\\Users\\A_B_testing.csv'''
df1 = pandas.read_csv(file_name, encoding='utf-8')
df_Campaign1 = df1[df1['DataSource ID'].str.contains('Campaign1')==True]
Campaign1_times = df_Campaign1['time sent'].tolist()
Campaign1_revenue = df_Campaign1['EstValue/sms'].tolist()
Campaign1_times = [datetime.strptime(slot,"%H:%M").time() for slot in Campaign1_times]
df_Campaign2 = df1[df1['DataSource ID'].str.contains('Campaign2')==True]
Campaign2_times = df_Campaign2['time sent'].tolist()
Campaign2_revenue = df_Campaign2['EstValue/sms'].tolist()
Campaign2_times = [datetime.strptime(slot,"%H:%M").time() for slot in Campaign2_times]
fig, ax = plt.subplots(1, 1, figsize=(16, 8))
xlocator = mdates.MinuteLocator(byminute=None, interval=15) # tick every 15 minutes
xformatter = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_locator(xlocator)
ax.xaxis.set_major_formatter(xformatter)
ax.minorticks_off()
plt.grid(True)
plt.plot(Campaign1_times, Campaign1_revenue, c = 'g', linewidth = 1)
plt.plot(Campaign2_times, Campaign2_revenue, c = 'y', linewidth = 2)
plt.show()
I tried reducing the number of values to be plotted, and it worked fine on a dummy set as follows:
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
from matplotlib.dates import HourLocator, MinuteLocator, DateFormatter
from datetime import datetime
fig, ax = plt.subplots(1, figsize=(16, 6))
xlocator = MinuteLocator(interval=15)
xformatter = DateFormatter('%H:%M')
ax.xaxis.set_major_locator(xlocator)
ax.xaxis.set_major_formatter(xformatter)
ax.minorticks_off()
plt.grid(True, )
xvalues = ['9:05', '10:35' ,'12:05' ,'12:35', '13:05']
xvalues = [datetime.strptime(slot,"%H:%M") for slot in xvalues]
yvalues = [2.2, 2.4, 1.7, 3, 2]
zvalues = [3.2, 1.4, 1.8, 2.7, 2.2]
plt.plot(xvalues, yvalues, c = 'g')
plt.plot(xvalues, zvalues, c = 'b')
plt.show()
So I think the issue is related to the way I'm declaring the ticks. I tried to find a relevant post here, but none solved my problem. Can anyone please point me in the right direction? Thanks in advance.
I had a similar issue which got fixed by using datetime objects instead of time objects in the x-axis.
Similarly, in the question's code, using the full datetime instead of just the time should fix the issue.
replace:
[datetime.strptime(slot,"%H:%M").time() for slot in ...
by:
[datetime.strptime(slot,"<full date format>") for slot in
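The numbers in the error message hint at the cause. (76878 - 30282) × 96 = 4,473,216, which means the axis range was treated as roughly 46,600 days at 96 ticks per day. This fits pandas' converter for datetime.time objects storing times as seconds since midnight, while MinuteLocator expects matplotlib date numbers in days. That is an interpretation, not something confirmed in the thread. The sketch below follows the answer: it keeps the full datetime returned by strptime instead of calling .time(). When no date is given, strptime places that datetime on 1900-01-01. The helper names are hypothetical.

```python
from datetime import datetime

import matplotlib.pyplot as plt
import matplotlib.dates as mdates


def parse_slots(slots):
    # no .time(): keep the full datetime (dated 1900-01-01) so that
    # matplotlib's date converter and locators are used
    return [datetime.strptime(s, "%H:%M") for s in slots]


def plot_campaigns(times_a, revenue_a, times_b, revenue_b):
    fig, ax = plt.subplots(figsize=(16, 8))
    ax.plot(parse_slots(times_a), revenue_a, c="g", linewidth=1)
    ax.plot(parse_slots(times_b), revenue_b, c="y", linewidth=2)
    ax.xaxis.set_major_locator(mdates.MinuteLocator(interval=15))  # tick every 15 minutes
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
    ax.minorticks_off()
    ax.grid(True)
    return fig, ax


if __name__ == "__main__":
    plot_campaigns(["9:05", "10:35", "12:05"], [2.2, 2.4, 1.7],
                   ["12:35", "13:05", "20:55"], [3.0, 2.0, 2.5])
    plt.show()
```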

Matplotlib function visualization changing with precision

So I was trying to map out some math functions in 3D using matplotlib when I noticed something: the 3D plot suddenly changed (more like broke) when I tried to fix a previous issue where I was encountering a 'missing surface', i.e. a gap in the plot. The main question is this: is the 3D plot not showing the two peaks at higher precision because of some inherent computing limitation of Axes3D, or for some other reason? A secondary question: why am I encountering 'missing surfaces' near +1.25 and -1.25 in the lower-precision plot?
I have tried googling it and read a few posts, but nothing came out of it except more questions.
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
X=np.arange(-2,+2,0.025)
## Use np.arange(-5,+5,0.25) to experience the 'surface loss' I mention but otherwise correct 2 spike plot at each of (0,-1) and (0,+1) for both X and Y
Y=np.arange(-2,+2,0.025)
X,Y=np.meshgrid(X,Y)
R=1+X**2-Y**2
S=R**2+4*(X**2)*(Y**2)
Z=R/S
fig=plt.figure()
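ax=fig.add_subplot(projection='3d')  # Axes3D(fig) no longer attaches itself to the figure in recent matplotlib versions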
ax.plot_surface(X,Y,Z,rstride=1,cstride=1,cmap=cm.viridis,norm=mpl.colors.Normalize(vmin=-1.,vmax=1.))
##NORMALIZE Was essential to get the proper color range
plt.savefig('art3d.jpeg',bbox_inches='tight')
plt.savefig('art3d.svg',bbox_inches='tight')
plt.show()  # show last: once the window is closed, a later savefig may write an empty figure
The ideal result should look like this (shows the function and the plot):
https://i.stack.imgur.com/kVnYc.png
The two plots I am getting can be reproduced by running the code; I could not attach images, presumably because of low reputation :(
Any and all help is appreciated. Thanks in advance.
First, note that the function in use is different from the Wolfram Alpha output, so let's use the function shown in the screenshot. Then you can limit the data to the range you want to show.
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
X = np.arange(-2,+2,0.025)
Y=np.arange(-2,+2,0.025)
X,Y=np.meshgrid(X,Y)
Z = -2*X*Y / ((2*X*Y)**2 + (X**2 - Y**2 + 1)**2)
Z[(Z < -1)] = -1
Z[(Z > 1)] = 1
fig=plt.figure()
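ax=fig.add_subplot(projection='3d')  # Axes3D(fig) no longer attaches itself to the figure in recent matplotlib versions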
ax.plot_surface(X,Y,Z,rstride=1,cstride=1,cmap=cm.viridis,norm=mpl.colors.Normalize(vmin=-1.,vmax=1.))
plt.show()
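The answer above does not cover the secondary question about missing surfaces. One plausible explanation, which is an assumption not checked against the asker's images, is that the question's function Z = R/S has a 0/0 singularity at (0, ±1). The coarse grid np.arange(-5, 5, 0.25) hits those points exactly, giving NaN vertices that plot_surface leaves as holes. The fine grid only comes close to them, giving very large values that stretch the z range and flatten the two peaks. A numerical check (question_surface is a hypothetical helper):

```python
import numpy as np


def question_surface(lim, step):
    """Evaluate the question's Z = R / S on a square grid."""
    axis = np.arange(-lim, lim, step)
    X, Y = np.meshgrid(axis, axis)
    R = 1 + X**2 - Y**2
    S = R**2 + 4 * X**2 * Y**2
    # S == 0 only where R == 0 and X*Y == 0, i.e. at (0, +1) and (0, -1)
    with np.errstate(divide="ignore", invalid="ignore"):
        Z = R / S
    return X, Y, Z


_, _, Z_coarse = question_surface(5, 0.25)
_, _, Z_fine = question_surface(2, 0.025)

print(np.isnan(Z_coarse).sum())    # 2: NaN vertices, i.e. holes in the surface
print(np.nanmax(np.abs(Z_fine)))   # large values near the poles dominate the z range
```

Clipping Z, as the answer does, bounds those extreme values so the peaks remain visible.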

Y-axis values not showing in matplotlib.pyplot plot

My plot is not showing any indication of what the order of magnitude of my y-values are on the axis. How do I force python to indicate some values on the y-axis?
import numpy as np
import matplotlib.pyplot as plt
BERfinal = [0.4967843137254903, 0.49215686274509757, 0.4938823529411763,
0.49170588235294116, 0.48852941176470605, 0.48203921568627417,
0.4797058823529405, 0.47454901960784257, 0.4795686274509802,
0.474901960784313, 0.4732549019607838, 0.4703137254901953,
0.4705490196078425]
x = np.linspace(-4,8,len(BERfinal))
plt.semilogy(x,BERfinal)
plt.title("BER vs SNR")
plt.ylabel("Bit Error Rate(BER)")
plt.xlabel("Signal-to-Noise Ratio(SNR)[dB]")
plt.xlim(-4,8)
I ended up playing around with:
plt.ylim(4.7*10**-1, 5*10**-1)
and changed the values until I found an appropriate range. It now shows 5x10^-1 on the y-axis.
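An alternative to tuning ylim by hand is to place the y ticks explicitly. The data span only a small fraction of a decade, so the default log locator may leave few or no labelled ticks in view; the exact behaviour differs between matplotlib versions. This sketch uses hand-picked tick values 0.47, 0.48 and 0.49, an assumption chosen to fit this data, together with a plain decimal formatter. The helper name plot_ber is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import ticker

BERfinal = [0.4967843137254903, 0.49215686274509757, 0.4938823529411763,
            0.49170588235294116, 0.48852941176470605, 0.48203921568627417,
            0.4797058823529405, 0.47454901960784257, 0.4795686274509802,
            0.474901960784313, 0.4732549019607838, 0.4703137254901953,
            0.4705490196078425]


def plot_ber(ber):
    x = np.linspace(-4, 8, len(ber))
    fig, ax = plt.subplots()
    ax.semilogy(x, ber)
    # explicit major ticks inside the data range, shown as plain decimals
    ax.set_yticks([0.47, 0.48, 0.49])
    ax.yaxis.set_major_formatter(ticker.FormatStrFormatter("%.2f"))
    # hide any minor-tick labels so they do not clutter the axis
    ax.yaxis.set_minor_formatter(ticker.NullFormatter())
    ax.set(title="BER vs SNR", ylabel="Bit Error Rate(BER)",
           xlabel="Signal-to-Noise Ratio(SNR)[dB]", xlim=(-4, 8))
    return fig, ax


if __name__ == "__main__":
    plot_ber(BERfinal)
    plt.show()
```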

Resources