Need to display custom shared x-axis tick labels on both subplots, using two datasets with different dates.
from pandas import DataFrame, date_range, Timedelta
import numpy as np
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
#Dataset 1
rng1 = date_range(start='01-01-2015', periods=5, freq='1M')
df1 = DataFrame({'y':np.random.normal(size=len(rng1))}, index=rng1)
y1 = df1['y']
#Dataset 2
rng2 = date_range(start='01-01-2015', periods=5, freq='2M')
df2 = DataFrame({'y':np.random.normal(size=len(rng2))}, index=rng2)
y2 = df2['y']
#Figure
fig,(ax1,ax2) = plt.subplots(2,1,sharex=True)
y1.plot(ax=ax1)
y2.plot(ax=ax2)
plt.xticks(rotation=30)
ax1.xaxis.set_minor_formatter(plt.NullFormatter())
ax2.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
ax2.xaxis.set_major_formatter(mdates.DateFormatter('%D'))
plt.show()
(Above) Creates figure without x-axis labels on the upper subplot
I expected that adding the code below would display the same x-axis labels on the upper subplot, but they are not showing up. What am I doing wrong?
ax1.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%D'))
Setting sharex=False does not work because the dates are different for each dataset.
First of all, if you want to use matplotlib.dates locators and formatters on a plot created via pandas you should use the x_compat=True argument in your plot, otherwise pandas may scale the axis rather arbitrarily.
Then '%D' is not a documented strftime directive in Python (whether it works depends on the platform). Maybe you mean '%b'?
Now there are two options.
Use sharex=False, set your locators and formatters to both axes, and finally set the limits of the one plot to the limits of the other. In this case since the lower plot comprises a larger range,
ax1.set_xlim(ax2.get_xlim())
The other option is to use sharex=True and turn the labels visible again.
plt.setp(ax1.get_xticklabels(), visible=True)
Unfortunately this option is broken on the newest matplotlib version. I just opened a bug report about it.
Full code for the first option (since the second one is not working):
from pandas import DataFrame, date_range, Timedelta
import numpy as np
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
#Dataset 1
rng1 = date_range(start='2015-01-01', periods=5, freq='1M')
df1 = DataFrame({'y':np.random.normal(size=len(rng1))}, index=rng1)
y1 = df1['y']
#Dataset 2
rng2 = date_range(start='2015-01-01', periods=5, freq='2M')
df2 = DataFrame({'y':np.random.normal(size=len(rng2))}, index=rng2)
y2 = df2['y']
#Figure
fig,(ax1,ax2) = plt.subplots(2,1,sharex=False)
y1.plot(ax=ax1, x_compat=True)
y2.plot(ax=ax2, x_compat=True)
plt.xticks(rotation=30)
ax1.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
ax1.xaxis.set_minor_locator(plt.NullLocator())
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
ax2.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
ax2.xaxis.set_minor_locator(plt.NullLocator())
ax2.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
ax1.set_xlim(ax2.get_xlim())
plt.show()
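For the sharex=True route, one alternative to plt.setp that may be worth trying is to switch the upper tick labels back on with tick_params(labelbottom=True). This is only a sketch under assumptions: the data below is made up, the series are plotted with plain ax.plot rather than through pandas, and I have not checked it against the matplotlib version affected by the bug mentioned above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Illustrative data only: two monthly series covering different date ranges
rng1 = pd.date_range("2015-01-01", periods=5, freq="MS")
rng2 = pd.date_range("2015-01-01", periods=10, freq="MS")
y1 = pd.Series(np.random.normal(size=len(rng1)), index=rng1)
y2 = pd.Series(np.random.normal(size=len(rng2)), index=rng2)

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(y1.index, y1.values)
ax2.plot(y2.index, y2.values)

# Monthly ticks labelled with the abbreviated month name
ax2.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
ax2.xaxis.set_major_formatter(mdates.DateFormatter("%b"))

# sharex=True hides the upper subplot's tick labels; switch them back on
ax1.tick_params(axis="x", labelbottom=True, labelrotation=30)
ax2.tick_params(axis="x", labelrotation=30)
```

Call plt.show() afterwards to display the figure. Because the x-axis is shared, there is no need to copy the limits from one subplot to the other.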
This seemed to work for me, just following the shared_axis_demo
fig = plt.figure()
ax1 = plt.subplot(211)
_ = plt.plot(y1)
plt.setp(ax1.get_xticklabels(), visible=True)
ax2 = plt.subplot(212, sharex=ax1)
_ = plt.plot(y2)
plt.show()
Related
I have a time-series dataframe with missing data for some time period. I would like to create a line plot and break a line where there is missing data.
data_site1_ave[["samples", "lkt"]].plot(figsize=(15,4), title = "Site 1", xlabel='')
Is it possible to create a gap, let's say from 2018-05-01 to 2018-10-30 in the line plot?
Yes, you can create arbitrary gaps by simply calling df.plot() several times, on the appropriate slices of the full dataframe. To make everything appear in the same plot, pass the same ax keyword argument to each df.plot() call. Turn the legend off for all but one call, so that the legend only has the one entry.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# create sample time series
N = 365
np.random.seed(42)
x = pd.date_range('2018-01-01', freq='d', periods=N)
y = np.cumsum(np.random.rand(N, 1) - 0.5)
df = pd.DataFrame(y, columns=['y'], index=x)
# plot time series with gap
fig, ax = plt.subplots()
df.loc[:'2018-05-01'].plot(ax=ax, c='blue')
df.loc['2018-10-31':].plot(ax=ax, c='blue', legend=False);
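Another way to get the same gap, sketched here as an alternative rather than as the method above, is to set the values inside the gap to NaN. Matplotlib does not draw line segments through NaN points, so a single plot call (and a single legend entry) is enough. The data and the name df_gap are made up for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same kind of sample series as above
N = 365
np.random.seed(42)
x = pd.date_range("2018-01-01", periods=N, freq="D")
df = pd.DataFrame({"y": np.cumsum(np.random.rand(N) - 0.5)}, index=x)

# Blank out the period that should appear as a gap (work on a copy)
df_gap = df.copy()
df_gap.loc["2018-05-02":"2018-10-30", "y"] = np.nan

fig, ax = plt.subplots()
# The line is broken wherever y is NaN
ax.plot(df_gap.index, df_gap["y"], color="blue", label="y")
ax.legend()
```

This keeps the original index intact, which can be convenient if the same frame is also used for resampling or for other plots.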
I have a series whose index is datetime that I wish to plot. I want to plot the values of the series on the y axis and the index of the series on the x axis. The Series looks as follows:
2014-01-01 7
2014-02-01 8
2014-03-01 9
2014-04-01 8
...
I generate a graph using plt.plot(series.index, series.values). But the graph looks like:
The problem is that I would like to have only year and month (yyyy-mm or 2016 March). However, the graph contains hours, minutes and seconds. How can I remove them so that I get my desired formatting?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# sample data
N = 30
drange = pd.date_range("2014-01", periods=N, freq="MS")
np.random.seed(365) # for a reproducible example of values
values = {'values':np.random.randint(1,20,size=N)}
df = pd.DataFrame(values, index=drange)
fig, ax = plt.subplots()
ax.plot(df.index, df.values)
ax.set_xticks(df.index)
# use formatters to specify major and minor ticks
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
ax.xaxis.set_minor_formatter(mdates.DateFormatter("%Y-%m"))
_ = plt.xticks(rotation=90)
You can try something like this:
import numpy as np
import pandas as pd
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
df = pd.DataFrame({'values':np.random.randint(0,1000,36)},index=pd.date_range(start='2014-01-01',end='2016-12-31',freq='M'))
fig,ax1 = plt.subplots()
plt.plot(df.index,df.values)
monthyearFmt = mdates.DateFormatter('%Y %B')
ax1.xaxis.set_major_formatter(monthyearFmt)
_ = plt.xticks(rotation=90)
You should check out this native function of matplotlib:
fig.autofmt_xdate()
See the examples on the source website: "Custom tick formatter".
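As a rough sketch of how fig.autofmt_xdate() can be combined with a date locator and formatter (the data here is made up, and the three-month interval is an arbitrary choice):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Illustrative monthly series
idx = pd.date_range("2014-01-01", periods=12, freq="MS")
series = pd.Series(np.arange(len(idx)), index=idx)

fig, ax = plt.subplots()
ax.plot(series.index, series.values)

# One tick every third month, labelled as yyyy-mm
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))

# Rotate and right-align the date labels so they do not overlap
fig.autofmt_xdate(rotation=30, ha="right")
```

Using a locator instead of ax.set_xticks(df.index) keeps the number of labels manageable when the series gets longer.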
Can you please tell me how to plot a graph from CSV data?
The CSV file has x, y, depth and color values. I want to plot the depth and color against the x and y axes. I googled many times but didn't find anything suitable, so please guide me on how to plot these values.
This is what I tried:
from matplotlib import pyplot as plt
from matplotlib import style
import pandas as pd
data=pd.read_csv("Tunnel.csv",names=['x','y','z','color'])
data1 =data[data.z==0]
print (data1)
# plt.plot(data[data.x],data[data.y])
plt.ylabel('yaxis')
plt.xlabel('xaxis')
plt.title('Tunnel 2d')
plt.show()
My data is given below.
I'm assuming that you want the first two columns to be used as plot axis and columns 3 and 4 as plot data.
from matplotlib import pyplot as plt
import pandas as pd
data = pd.read_csv("Tunnel.csv")
x = data[data.columns[2]]
y = data[data.columns[3]]
xlab = list(data)[0] #x-axis label
ylab = list(data)[1] #y-axis label
fig, pli = plt.subplots()
#Assuming it's a line graph that you want to plot
line, = pli.plot(x, y, color='g', linewidth=5, label='depth vs color')
plt.xlabel(xlab)
plt.ylabel(ylab)
plt.title('Tunnel 2d') #title taken from the question's code
fig.savefig('./Directory/Graph.png') #the target directory must already exist
I am assuming that you want the color and depth as text annotations.
import stuff
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
create the df
dep=list(np.random.randint(0,100,10))
col=list(np.random.randint(0,100,10))
y=[int(x/3)+1 for x in range(0,10)]
x=list(range(0,10))
my_df=pd.DataFrame({'x':x,'y':y,'colour':col,'depth':dep})
create the annotate column
my_df['my_text']='c= '+my_df.colour.astype(str)+','+'d= '+my_df.depth.astype(str)
plot it
plt.figure(figsize=(20,10))
plt.plot(my_df.x,my_df.y,'o')
for i, txt in enumerate(my_df['my_text']):
    plt.annotate(txt, (x[i],y[i]), size=10, xytext=(0,0), ha='left', textcoords='offset points', bbox=dict(facecolor='none', edgecolor='red'))
plt.ylabel('yaxis')
plt.xlabel('xaxis')
plt.title('Tunnel 2d')
plt.show()
Result
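If the depth should be shown as marker colour rather than as text, a scatter plot with a colour bar is another possibility. This is only a sketch: the inline CSV text is a hypothetical stand-in for Tunnel.csv, and I am assuming the z column holds the depth, as the column names in the question suggest.

```python
import io
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for Tunnel.csv (no header row, columns x, y, z, color)
csv_text = """0,0,0,10
1,0,5,20
0,1,10,30
1,1,15,40
"""
data = pd.read_csv(io.StringIO(csv_text), names=["x", "y", "z", "color"])

fig, ax = plt.subplots()
# Position comes from x and y, marker colour from the depth column z
sc = ax.scatter(data["x"], data["y"], c=data["z"], cmap="viridis")
fig.colorbar(sc, ax=ax, label="depth (z)")
ax.set_xlabel("xaxis")
ax.set_ylabel("yaxis")
ax.set_title("Tunnel 2d")
```

With the real file, replace io.StringIO(csv_text) with the path to Tunnel.csv.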
I am trying to set hue based on a range of values rather than unique values in seaborn stripplot. For example, different colors for different value ranges (1940-1950, 1950-1960 etc.).
sns.stripplot('Condition', 'IM', data=dd3, jitter=0.3, hue= dd3['Year Built'])
Output Figure
Thanks
Looks like you need to bin the data. Use pd.cut() as shown below; here the years are binned into 20-year groups. You can set your own step in np.arange() to adjust your ranges.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
x = np.random.randint(0,100,size=100)
y = np.random.randint(0,100, size=100)
year = np.random.randint(1918, 2019, size=100)
df = pd.DataFrame({
'x':x,
'y':y,
'year':year
})
df['year_bin'] = pd.cut(df['year'], np.arange(min(year), max(year), step=20))
sns.lmplot(x='x', y='y', data=df, hue='year_bin')
plt.show()
Output:
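For decade ranges like the ones in the question (1940-1950, 1950-1960, ...), the bin edges and labels can also be spelled out explicitly. A minimal sketch with made-up years; the column name 'Year Range' and the chosen edges are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative data; 'Year Built' mirrors the column name in the question
dd = pd.DataFrame({"Year Built": [1940, 1949, 1950, 1959, 1960]})

# Decade edges 1940, 1950, 1960, 1970; right=False makes each bin [start, end)
edges = np.arange(1940, 1971, 10)
labels = [f"{lo}-{hi}" for lo, hi in zip(edges[:-1], edges[1:])]
dd["Year Range"] = pd.cut(dd["Year Built"], bins=edges, labels=labels, right=False)

print(dd["Year Range"].tolist())
# ['1940-1950', '1940-1950', '1950-1960', '1950-1960', '1960-1970']
```

The new column can then be passed as hue (for example hue='Year Range') instead of the raw year. Values that fall outside the edges become NaN, so make sure the edges cover the full range of years.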
I just discovered something really strange when using plot method of pandas.DataFrame. I am using pandas 0.19.1. Here is my MWE:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
t = pd.date_range('1990-01-01', '1990-01-08', freq='1H')
x = pd.DataFrame(np.random.rand(len(t)), index=t)
fig, axe = plt.subplots()
x.plot(ax=axe)
plt.show(axe)
xt = axe.get_xticks()
When I try to format my xticklabels I get strange behaviour, so I inspected the objects to understand it and found the following:
t[-1] - t[0] = Timedelta('7 days 00:00:00'), confirming the DateTimeIndex is what I expect;
xt = [175320, 175488]: the xticks are integers, but they are not a number of days since the epoch (I have no idea what they represent);
xt[-1] - xt[0] = 168: they look more like an index, since the span matches len(x) = 169.
This explains why I cannot succeed in formatting my axes using:
axe.xaxis.set_major_locator(mdates.HourLocator(byhour=(0,6,12,18)))
axe.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))
The first raises an error saying that there are too many ticks to generate.
The second shows my first tick as Fri 00:00 when it should be Mon 00:00 (in fact matplotlib assumes the first tick to be 0481-01-03 00:00, so this is where my bug is).
It looks like there is some incompatibility between pandas and matplotlib integer to date conversion but I cannot find out how to fix this issue.
If I run instead:
fig, axe = plt.subplots()
axe.plot(x)
axe.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))
plt.show(axe)
xt = axe.get_xticks()
Everything works as expected but I miss all cool features from pandas.DataFrame.plot method such as curve labeling, etc. And here xt = [726468. 726475.].
How can I properly format my ticks using pandas.DataFrame.plot method instead of axe.plot and avoiding this issue?
Update
The problem seems to be about origin and scale (units) of underlying numbers for date representation. Anyway I cannot control it, even by forcing it to the correct type:
t = pd.date_range('1990-01-01', '1990-01-08', freq='1H', origin='unix', units='D')
There is a discrepancy between matplotlib and pandas representation. And I could not find any documentation of this problem.
Is this what you are going for? Note I shortened the date_range to make it easier to see the labels.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dates
t = pd.date_range('1990-01-01', '1990-01-04', freq='1H')
x = pd.DataFrame(np.random.rand(len(t)), index=t)
# resample the df to get the index at 6-hour intervals
l = x.resample('6H').first().index
# set the ticks when you plot. this appears to position them, but not set the label
ax = x.plot(xticks=l)
# set the display value of the tick labels
ax.set_xticklabels(l.strftime("%a %H:%M"))
# hide the labels from the initial pandas plot
ax.set_xticklabels([], minor=True)
# make pretty
ax.get_figure().autofmt_xdate()
plt.show()
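Another route, sketched here and not tested on pandas 0.19.1, is the x_compat=True option of the pandas plot method, which asks pandas to keep the x values in matplotlib's own date units so that matplotlib.dates locators and formatters apply. As a side note, 175320 equals 7305 days × 24, the number of hours from 1970-01-01 to 1990-01-01, which suggests the default pandas plot uses hourly ordinals here. The column name 'value' is made up.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Hourly samples over three days (1990-01-01 was a Monday)
t = pd.date_range("1990-01-01", "1990-01-04", freq=pd.Timedelta(hours=1))
x = pd.DataFrame({"value": np.random.rand(len(t))}, index=t)

fig, axe = plt.subplots()
# x_compat=True keeps the x-axis in matplotlib date units
x.plot(ax=axe, x_compat=True)

# With matplotlib date units, the usual locators and formatters can be used
axe.xaxis.set_major_locator(mdates.HourLocator(byhour=(0, 6, 12, 18)))
axe.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))
fig.autofmt_xdate()
```

This keeps the DataFrame.plot conveniences such as the automatic legend, at the cost of the pandas-specific tick formatting.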