MatPlotLib Plot last few items differently - python-3.x

I'm exploring Matplotlib and would like to know whether it is possible to show the last few items in a dataset differently.
Example: if my dataset contains 100 numbers, I want to display the last 5 items in a different color.
So far I could do it for the single last record using annotate, but I want to show the last few items dotted in red, in contrast to the blue line.
I finally achieved this by changing a few things in my code.
Below is what I have done.
Let me know in case there is a better way. :)
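import pandas as pd
import matplotlib.pyplot as plt
series_df = pd.read_csv('my_data.csv')
series_df = series_df.fillna(0)
series_df = series_df.sort_values(['Date'], ascending=True)
# New DataFrame holding only the last 5 items (assumed here to be built with tail)
series_df2 = series_df.tail(5)
# Full series in blue, then the last 5 items drawn over it in red
plt.plot(series_df["Date"], series_df["Values"], color="blue", marker='+')
plt.plot(series_df2["Date"], series_df2["Values"], color="red", marker='+')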

You should add some minimal code example or a figure with the desired output to make your question clear. It seems you want to highlight some of the last few points with a marker. You can achieve this by calling plot() twice:
import numpy as np
import matplotlib.pyplot as plt
N = 50
x = np.arange(N)
y = np.random.rand(N)
plt.figure()
plt.plot(x, y)
plt.plot(x[-5:], y[-5:], ls='', c='tab:red', marker='.', ms=10)
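If the last few items should instead appear as a dotted red line that stays joined to the blue line, one option is to start the second slice one point earlier so that the two segments share a point. A minimal sketch with synthetic data (the names n_last, main_line and tail_line are made up for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

N = 50
n_last = 5  # how many trailing items to highlight (illustrative name)
x = np.arange(N)
y = np.random.rand(N)

fig, ax = plt.subplots()
# Everything before the highlighted tail as a solid blue line
(main_line,) = ax.plot(x[:N - n_last], y[:N - n_last], c='tab:blue')
# Start one point earlier so the dotted red segment connects to the blue one
(tail_line,) = ax.plot(x[-(n_last + 1):], y[-(n_last + 1):],
                       c='tab:red', ls=':', marker='.')
# Call plt.show() or fig.savefig(...) to view the result
```

The shared point is drawn twice, once by each line, which is what closes the gap between the two colors.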

Related

Seaborn / Matplotlib: Subplots depending on one column

I have a Dataframe and based on its data, I draw lineplots for it.
The code currently looks as simple as that:
ax = sns.lineplot(x='datapoints', y='mean', hue='index', data=df)
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))
Now, there actually is a column, called "klinger", which has 8 different values and I would like to get a plot consisting of eight subplots (4x2) for it, all sharing just one legend.
Is that an easy thing to do?
Currently, I generate sub-DataFrames by filtering, draw eight separate diagrams, and stitch them together with a graphics tool, but this can't be the right solution.
You can get what you are looking for with sns.relplot and kind='line'. relplot draws a single figure-level legend shared by all subplots.
Use col='klinger' to draw one subplot per category, col_wrap=4 to wrap the subplots after four columns (giving two rows for eight categories), and col_order=klinger_categories to choose which categories are plotted and in what order.
import numpy as np
import pandas as pd
import seaborn as sns
number = 100
klinger_categories = ['a','b','c','d','e','f','g','h']
data = {'datapoints': np.arange(number),
        'mean': np.random.normal(0,1,size=number),
        'index': np.random.choice(np.arange(2),size=number),
        'klinger': np.random.choice(klinger_categories,size=number),
        }
df = pd.DataFrame(data)
sns.relplot(
    data=df, x='datapoints', y='mean', hue='index', kind='line',
    col='klinger', col_wrap=4, col_order=klinger_categories
)

Too Many Indices For Array when using matplotlib

Thank you for taking time to read this question.
I am trying to plot pie charts in one row. The number of pie charts will depend on the result returned.
import matplotlib.pyplot as plt
import numpy as np
fig, axs = plt.subplots(1,len(to_plot_arr))
labels = ['Label1','Label2','Label3','Label4']
pos = 0
for scope in to_plot_arr:
    if data["summary"][scope]["Count"] > 0:
        pie_data = np.array(db_data)
        axs[0,pos].pie(pie_data,labels=labels)
        axs[0,pos].set_title(scope)
        pos += 1
plt.show()
In the code, db_data looks like: [12,75,46,29]
When I execute the code above, I get the following error message:
Exception has occurred: IndexError
too many indices for array: array is 1-dimensional, but 2 were indexed
I've tried searching for what could be causing this problem, but just can't find any solution to it. I'm not sure what is meant by "but 2 were indexed".
I've tried generating a pie chart with:
y = np.array(db_data)
plt.pie(y)
plt.show()
And it generates the pie chart as expected. So I'm not sure what is meant by "too many indices for array", which array is being referred to, or how to resolve this.
Hope you are able to help me with this.
Thank You Again.
Notice that the axs you create with plt.subplots(1,len(to_plot_arr)) is a 1D array of shape (len(to_plot_arr),), because there is only one row. In the loop you index it with two indices (axs[0,pos]), which NumPy rejects for a 1D array; that is the array the error message refers to.
Here is a fix:
import matplotlib.pyplot as plt
import numpy as np
fig, axs = plt.subplots(1,len(to_plot_arr))
labels = ['Label1','Label2','Label3','Label4']
pos = 0
for scope in to_plot_arr:
    if data["summary"][scope]["Count"] > 0:
        pie_data = np.array(db_data)
        axs[pos].pie(pie_data,labels=labels)
        axs[pos].set_title(scope)
        pos += 1
plt.show()
Cheers.
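The shape of axs depends on the squeeze argument of plt.subplots, which defaults to True and drops any grid dimension of length 1. Here is a small sketch of that behaviour, where n is a stand-in for len(to_plot_arr). Passing squeeze=False is one way to keep two-index access working for any grid size:

```python
import numpy as np
import matplotlib.pyplot as plt

n = 3  # stand-in for len(to_plot_arr)

# Default squeeze=True: a single row collapses to a 1D array of Axes
fig1, axs1 = plt.subplots(1, n)
print(axs1.shape)  # (3,)

# squeeze=False always returns a 2D array, so axs[0, pos] is valid
fig2, axs2 = plt.subplots(1, n, squeeze=False)
print(axs2.shape)  # (1, 3)
axs2[0, 0].pie(np.array([12, 75, 46, 29]))
```

With the default setting, a 1x1 grid returns a single Axes object rather than an array, so squeeze=False is also useful when the number of charts can drop to one.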
So, I think this is not technically an answer because I still don't know what was causing the error, but I found a way to solve my problem while still achieving my desired output.
Firstly, I realised, when I changed:
fig, axs = plt.subplots(1,len(to_plot_arr))
to:
fig, axs = plt.subplots(2,len(to_plot_arr)),
the figure could be drawn. So, I continued to try other variations like (1,2), (2,1), (1,3) and always found that if nrows or ncols was 1, the error would come up.
Fortunately, for my use case, the layout I required had 2 rows: the top row a single cell spanning both columns, and the bottom row 2 columns.
So, (2,2) fit my use case very well.
Then I set out to get the top row to span 2 columns and found out that this is best done with GridSpec in Matplotlib. While trying to figure out how to use GridSpec, I came to learn that using add_subplot() would be a better route with more flexibility.
So, my final code looks something like:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.gridspec import GridSpec
def make_chart():
    fig = plt.figure()
    fig.set_figheight(8)
    fig.set_figwidth(10)
    # Gridspec is used to specify the grid distribution of the figure
    gs = GridSpec(2,len(to_plot_arr),figure=fig)
    # This allows for the first row to span all the columns
    r1 = fig.add_subplot(gs[0,:])
    tbl = plt.table(
        cellText = summary_data,
        rowLabels = to_plot_arr,
        colLabels = config["Key3"],
        loc ='upper left',
        cellLoc='center'
    )
    tbl.set_fontsize(20)
    tbl.scale(1,3)
    r1.axis('off')
    pos = 0
    for scope in to_plot_arr:
        if data["Key1"][scope][0] > 0:
            pie_data = np.array(data["Key2"][scope])
            # Add a chart at the specified position
            r2 = fig.add_subplot(gs[1,pos])
            r2.pie(pie_data, autopct=make_autopct(pie_data))
            r2.set_title(config["Key3"][scope])
            pos += 1
    fig.suptitle(title, fontsize=24)
    plt.xticks([])
    plt.yticks([])
    fig.legend(labels,loc="center left",bbox_to_anchor=(0,0.25))
    plt.savefig(savefile)
    return filename
This was my first go at using Matplotlib. The learning curve has been steep, but with a little patience and attention to the documentation, I was able to complete my task. I'm sure that there are better ways to do what I did. If you know a better way, or know how to explain the error I was encountering, please do add an answer.
Thank You!

Gantt Chart for USGS Hydrology Data with Python?

I have compiled a dataframe that contains USGS streamflow data at several different streamgages. Now I want to create a Gantt chart similar to this. Currently, my data has site names as columns and a date index as rows.
Here is a sample of my data.
The problem with the Gantt chart example I linked is that my data has gaps between the start and end dates that would normally define the horizontal time-lines. Many of the examples I found only account for the start and end date, but not missing values that may be in between. How do I account for the gaps where there is no data (blanks or nan in those slots for values) for some of the sites?
First, I have a plot that shows where the missing data is.
import missingno as msno
msno.bar(dfp)
Now, I want time on the x-axis and a horizontal line on the y-axis that tracks when the sites contain data at those times. I know how to do this the brute force way, which would mean manually picking out the start and end dates where there is valid data (which I made up below).
from datetime import datetime
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dt
df=[('RIO GRANDE AT EMBUDO, NM','2015-7-22','2015-12-7'),
    ('RIO GRANDE AT EMBUDO, NM','2016-1-22','2016-8-5'),
    ('RIO GRANDE DEL RANCHO NEAR TALPA, NM','2014-12-10','2015-12-14'),
    ('RIO GRANDE DEL RANCHO NEAR TALPA, NM','2017-1-10','2017-11-25'),
    ('RIO GRANDE AT OTOWI BRIDGE, NM','2015-8-17','2017-8-21'),
    ('RIO GRANDE BLW TAOS JUNCTION BRIDGE NEAR TAOS, NM','2015-9-1','2016-6-1'),
    ('RIO GRANDE NEAR CERRO, NM','2016-1-2','2016-3-15'),
    ]
df=pd.DataFrame(data=df)
df.columns = ['A', 'Beg', 'End']
df['Beg'] = pd.to_datetime(df['Beg'])
df['End'] = pd.to_datetime(df['End'])
fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(111)
ax.xaxis_date()  # treat x values as dates (returns None, so ax is not reassigned)
ax.hlines(df['A'], dt.date2num(df['Beg']), dt.date2num(df['End']))
How do I make a figure (like the one shown above) with the dataframe I provided as an example? Ideally I want to avoid the brute force method.
Please note: values of zero are considered valid data points.
Thank you in advance for your feedback!
Find date ranges of non-null data
2020-02-12 Edit to clarify logic in loop
df = pd.read_excel('Downloads/output.xlsx', index_col='date')
Make sure the dates are in order:
df.sort_index(inplace=True)
Loop through the data and find the edges of the good data ranges. Get the corresponding index values and the name of the gauge and collect them all in a list:
# Looping feels like defeat. However, I'm not clever enough to avoid it
good_ranges = []
for i in df:
    col = df[i]
    gauge_name = col.name
    # Start of good data block defined by a number preceded by a NaN
    start_mark = (col.notnull() & col.shift().isnull())
    start = col[start_mark].index
    # End of good data block defined by a number followed by a NaN
    end_mark = (col.notnull() & col.shift(-1).isnull())
    end = col[end_mark].index
    for s, e in zip(start, end):
        good_ranges.append((gauge_name, s, e))
good_ranges = pd.DataFrame(good_ranges, columns=['gauge', 'start', 'end'])
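To see how the shift-based edge detection behaves, here is a small sketch on a made-up series (the dates, values and the name demo_gauge are invented for illustration). It includes a zero, which counts as valid data, and a block that is only one day long:

```python
import numpy as np
import pandas as pd

# Synthetic daily record with gaps; zeros count as valid data
dates = pd.date_range('2020-01-01', periods=8, freq='D')
col = pd.Series([1.0, 0.0, np.nan, np.nan, 3.0, 4.0, np.nan, 5.0],
                index=dates, name='demo_gauge')

# A block starts where a value follows a NaN (or the start of the series)
start_mark = col.notnull() & col.shift().isnull()
# A block ends where a value precedes a NaN (or the end of the series)
end_mark = col.notnull() & col.shift(-1).isnull()

blocks = list(zip(col[start_mark].index, col[end_mark].index))
for s, e in blocks:
    print(s.date(), e.date())
```

Because shift() pads the ends of the series with NaN, blocks that touch the first or last date are picked up without any special casing.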
Plotting
Nothing new here. Copied pretty much straight from your question:
fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(111)
ax.xaxis_date()  # returns None, so ax is not reassigned
ax.hlines(good_ranges['gauge'],
          dt.date2num(good_ranges['start']),
          dt.date2num(good_ranges['end']))
fig.tight_layout()
Here's an approach that you could use. It's a bit hacky, so perhaps someone else will produce a better solution, but it should produce your desired output. First use Series.where to replace the non-NaN values with an integer, which later determines the position of each line on the y-axis. I do this site by site so that all data belonging to the same site ends up at the same height. If you want to increase the spacing between the lines of the Gantt chart you can add a number to i; there is an example in the comments in the code block below.
The y-labels and their positions are produced in the data munging steps, so this method works regardless of the number of columns and positions the labels correctly when you change the spacing described above.
This approach gives you matplotlib Axes and Figure objects, so you can adjust the aesthetics of the chart to suit your purposes (e.g. change the thickness of the lines, colours, etc.).
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_excel('output.xlsx')
dates = pd.to_datetime(df.date)
df.index = dates
df = df.drop('date', axis=1)
new_rows = [df[s].where(df[s].isna(), i) for i, s in enumerate(df, 1)]
# To increase spacing between lines add a number to i, eg. below:
# [df[s].where(df[s].isna(), i+3) for i, s in enumerate(df, 1)]
new_df = pd.DataFrame(new_rows)
### Plotting ###
fig, ax = plt.subplots() # Create axes object to pass to pandas df.plot()
ax = new_df.transpose().plot(figsize=(40,10), ax=ax, legend=False, fontsize=20)
list_of_sites = new_df.transpose().columns.to_list() # For y tick labels
# Height of each site's line; max() skips NaN, so a gap at the first date is harmless
y_tick_location = new_df.max(axis=1).values # For y tick positions
ax.set_yticks(y_tick_location) # Place ticks in correct positions
ax.set_yticklabels(list_of_sites) # Update labels to site names
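The where step is the core of this approach, so here is a tiny sketch of what it does on a made-up frame (site_a, site_b and their values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Two made-up sites; NaN marks missing data, zero is a valid reading
demo = pd.DataFrame({'site_a': [2.5, np.nan, 0.0],
                     'site_b': [np.nan, 7.1, 7.3]})

# where() keeps entries where the condition is True (the NaNs)
# and replaces every other entry with the line height i
heights = [demo[s].where(demo[s].isna(), i) for i, s in enumerate(demo, 1)]
print(heights[0].tolist())  # [1.0, nan, 1.0]
print(heights[1].tolist())  # [nan, 2.0, 2.0]
```

When such a series is plotted as a line, matplotlib leaves a break at each NaN, which is what produces the gaps in the chart.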

Python seaborn heatmap grid - Not taking expected columns

I have the following pandas dataframe. Basically, there are 7 different action categories and 5 different targets; each category has one or more unique endpoints, and each endpoint has a certain score for each target.
There are 250 endpoints in total.
action,target,endpoint,score
Category1,target1,endpoint1,813.0
Category1,target2,endpoint1,757.0
Category1,target3,endpoint1,155.0
Category1,target4,endpoint1,126.0
Category1,target5,endpoint1,75.5
Category2,target1,endpoint2,106.0
Category2,target1,endpoint3,101.0
Category2,target1,endpoint4,499.0
Category2,target1,endpoint5,207.0
Category2,target2,endpoint2,316.0
Category2,target2,endpoint3,208.0
Category2,target2,endpoint4,161.0
Category2,target2,endpoint5,198.0
<omit>
Category3,target1,endpoint8,193.0
Category3,target1,endpoint9,193.0
Category3,target1,endpoint10,193.0
Category3,target1,endpoint11,193.0
Category3,target2,endpoint8,193.0
Category3,target2,endpoint9,193.0
<List goes on...>
Now, I want to map this dataframe out as one heatmap per category.
So, I used a seaborn FacetGrid of heatmaps with the following code.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.read_csv('rawData.csv')
data = data.drop('Unnamed: 0', axis=1)
def facet_heatmap(data, **kwargs):
    data2 = data.pivot(index="target", columns='endpoint', values='score')
    ax1 = sns.heatmap(data2, cmap="YlGnBu", linewidths=2)
    for item in ax1.get_yticklabels():
        item.set_rotation(0)
    for item in ax1.get_xticklabels():
        item.set_rotation(70)
with sns.plotting_context(font_scale=5.5):
    # `height` is the current name of the older `size` argument
    g = sns.FacetGrid(data, col="action", col_wrap=7, height=5, aspect=0.5)
cbar_ax = g.fig.add_axes([.92, .3, .02, .4])
g = g.map_dataframe(facet_heatmap, cbar=cbar_ax, min=0, vmax=2000)
# <-- Specify the colorbar axes and limits
g.set_titles(col_template="{col_name}", fontweight='bold', fontsize=18)
g.fig.subplots_adjust(right=3) # <-- Add space so the colorbar doesn't overlap the plot
plt.savefig('seabornPandas.png', dpi=400)
plt.show()
It does generate a heatmap grid. However, the problem is that every heatmap uses the same columns for some reason. See the attached screenshot below.
(Please ignore the color bar and limits.)
This is quite odd. First, the index is not in order. Second, each heatmap only shows the last three endpoints (endpoints 248, 249, and 250). This is incorrect. For Category1, it should show endpoint 1 only. I don't expect a gray box there.
For Category2, it should show endpoints 2, 3, 4 and 5, not endpoints 248, 249 and 250.
How can I fix these two issues? Any suggestions or comments are welcome.
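As mwaskom suggested, pass sharex=False to FacetGrid. By default the facets share one x axis, so the tick labels set by the last facet are applied to all of them; with sharex=False each facet gets its own x axis:
...
with sns.plotting_context(font_scale=5.5):
    # `height` is the current name of the older `size` argument
    g = sns.FacetGrid(data, col="action", col_wrap=7, height=5, aspect=0.5,
                      sharex=False)
...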
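A reduced sketch of the same idea, using a made-up frame with two categories that have different endpoints. The column names follow the question, while the values and the labels list are invented for illustration. It assumes a seaborn version that has the height argument:

```python
import pandas as pd
import seaborn as sns

# Made-up data: category A has endpoints e1/e2, category B has e3/e4
df = pd.DataFrame({
    "action": ["A"] * 4 + ["B"] * 4,
    "target": ["t1", "t1", "t2", "t2"] * 2,
    "endpoint": ["e1", "e2", "e1", "e2", "e3", "e4", "e3", "e4"],
    "score": [1, 2, 3, 4, 5, 6, 7, 8],
})

def facet_heatmap(data, color=None, **kwargs):
    # map_dataframe passes the facet's rows as `data` plus a `color` keyword
    pivoted = data.pivot(index="target", columns="endpoint", values="score")
    sns.heatmap(pivoted, cbar=False, **kwargs)

# sharex=False lets every facet keep its own endpoint columns
g = sns.FacetGrid(df, col="action", height=3, sharex=False)
g.map_dataframe(facet_heatmap)

labels = [[t.get_text() for t in ax.get_xticklabels()] for ax in g.axes.flat]
print(labels)
```

Each facet should then list only the endpoints that belong to its own category.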

Matplotlib: personalize imshow axis

I have the results of a (H,ranges) = numpy.histogram2d() computation and I'm trying to plot it.
Given H I can easily put it into plt.imshow(H) to get the corresponding image. (see http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.imshow )
My problem is that the axes of the produced image show the cell indices of H and are completely unrelated to the values of ranges.
I know I can use the keyword extent (as pointed out in: Change values on matplotlib imshow() graph axis). But this solution does not work for me: my values in ranges do not grow linearly (they actually grow exponentially).
My question is: how can I put the values of ranges into plt.imshow()? Or, at least, can I manually set the tick label values of the object returned by plt.imshow?
Editing the extent is not a good solution.
You can just change the tick labels to something more appropriate for your data.
For example, here we'll label every 5th pixel with the value of an exponential function:
import numpy as np
import matplotlib.pyplot as plt
im = np.random.rand(21,21)
fig,(ax1,ax2) = plt.subplots(1,2)
ax1.imshow(im)
ax2.imshow(im)
# Where we want the ticks, in pixel locations
ticks = np.linspace(0,20,5)
# What those pixel locations correspond to in data coordinates.
# Also set the float format here
ticklabels = ["{:6.2f}".format(i) for i in np.exp(ticks/5)]
ax2.set_xticks(ticks)
ax2.set_xticklabels(ticklabels)
ax2.set_yticks(ticks)
ax2.set_yticklabels(ticklabels)
plt.show()
Expanding a bit on @thomas's answer:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mi
im = np.random.rand(20, 20)
ticks = np.exp(np.linspace(0, 10, 20))
fig, ax = plt.subplots()
ax.pcolor(ticks, ticks, im, cmap='viridis')
ax.set_yscale('log')
ax.set_xscale('log')
ax.set_xlim([1, np.exp(10)])
ax.set_ylim([1, np.exp(10)])
By letting mpl take care of the non-linear mapping you can now accurately over-plot other artists. There is a performance hit for this (as pcolor is more expensive to draw than AxesImage), but getting accurate ticks is worth it.
imshow is for displaying images, so it does not support x and y bins.
You could either use pcolor instead,
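# x, y are your 1-D sample arrays
H,xedges,yedges = np.histogram2d(x, y)
# histogram2d puts x along the first axis of H, so transpose it for plotting
plt.pcolor(xedges,yedges,H.T)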
or use plt.hist2d which directly plots your histogram.
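For the exponentially spaced case from the question, the bin edges can be passed straight to the plotting call, so no manual tick labels are needed. A minimal sketch, assuming made-up log-normal samples and np.geomspace edges standing in for the real ranges, and using pcolormesh, which takes the same edge arguments as pcolor:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Made-up samples spanning several orders of magnitude
x = rng.lognormal(mean=2.0, sigma=1.0, size=1000)
y = rng.lognormal(mean=2.0, sigma=1.0, size=1000)

# Exponentially spaced bin edges (an assumption standing in for `ranges`)
edges = np.geomspace(0.1, 1000.0, 21)
H, xedges, yedges = np.histogram2d(x, y, bins=[edges, edges])

fig, ax = plt.subplots()
# histogram2d puts x along the first axis; pcolormesh expects rows to be y
mesh = ax.pcolormesh(xedges, yedges, H.T)
ax.set_xscale('log')
ax.set_yscale('log')
fig.colorbar(mesh, ax=ax)
```

On log-scaled axes the exponentially spaced bins are drawn with equal width, and the ticks show the actual data values.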
