how to avoid overlapping of barchart using matplotlib - python-3.x

I am new to data science using Python. While plotting two different bar charts I ran into a problem. This is my code:
def compare_groups(field):
    if field in less_equal_150_cal.columns:
        less_equal_150_cal[field].plot.bar(color = 'blue',alpha =0.4, title = field )
        more_150_cal[field].plot.bar(color = 'red', alpha =0.4)
    else:
        raise ValueError(f"{field} not found")
The resulting bargraphs are overlapping each other. I want two different bar graphs.
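One possible way to get two separate charts is to give each group its own Axes, since both `.plot.bar()` calls above draw onto the same current Axes. The sketch below assumes `less_equal_150_cal` and `more_150_cal` are pandas DataFrames with the same column names; the sample data, the `calories` column, the titles and the figure size are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for the two DataFrames in the question.
less_equal_150_cal = pd.DataFrame({"calories": [90, 120, 150]}, index=["a", "b", "c"])
more_150_cal = pd.DataFrame({"calories": [200, 310]}, index=["d", "e"])

def compare_groups(field):
    """Draw one bar chart per group on side-by-side Axes."""
    if field not in less_equal_150_cal.columns:
        raise ValueError(f"{field} not found")
    fig, (ax_low, ax_high) = plt.subplots(1, 2, figsize=(10, 4))
    # Passing ax= tells pandas which Axes to draw on instead of reusing the current one.
    less_equal_150_cal[field].plot.bar(ax=ax_low, color="blue", alpha=0.4,
                                       title=f"{field} (<= 150 cal)")
    more_150_cal[field].plot.bar(ax=ax_high, color="red", alpha=0.4,
                                 title=f"{field} (> 150 cal)")
    fig.tight_layout()
    return fig

fig = compare_groups("calories")
```

Calling `plt.figure()` before each `.plot.bar()` call would be another option if two separate figures are preferred over two panels in one figure.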

Related

Altair: Remove title from layered faceted graphs

I tried layering faceted graphs and it failed, so I moved to the method suggested here - https://stackoverflow.com/a/52882510/20390480 - which layers the graphs and then calls .facet(column). With this method I am unable to remove the facet title.
I tried .facet(column, title=None), but it throws the following error.
import altair as alt
from vega_datasets import data
cars = data.cars()
horse = alt.Chart().mark_point().encode(
    x = 'Weight_in_lbs',
    y = 'Horsepower'
)
miles = alt.Chart().mark_point(color='red').encode(
    x = 'Weight_in_lbs',
    y = 'Miles_per_Gallon'
)
alt.layer(horse, miles, data=cars).facet(column='Origin', title=None)
SchemaValidationError: Invalid specification
altair.vegalite.v4.api.Chart, validating 'required'
'data' is a required property
alt.FacetChart(...)
Try:
alt.layer(horse, miles, data=cars).facet(column=alt.Column('Origin', title=None))

Pandas Series boolean maps and plotting

I am just trying to up my understanding of plotting Pandas Series data using Booleans to mask out values I don't want. I am not sure that what I have is the correct or efficient way to do it.
Don't get me wrong, I do get the chart I am after but are my assumptions on the syntax correct?
All I want to do is plot the non zero values on my chart. I have not formatted the charts as I would normally as this was just a test of Booleans and masking data and not for creating report grade charts.
If I masked this as a Pandas DataFrame I would do the following if df1 were my DataFrame.
I understand this and it makes sense that the df1[mask] returns my values as required
# Plot our graph with only items that are non-zero
fig = px.bar(df1[mask], x = 'Animals', y = 'Count')
fig.show()
Doing it as a Pandas Series
This is the snippet that creates the graph I require
# Plot our graph with only items that are non-zero
fig = px.bar(sf, x = sf.index[sf_mask], y = sf[sf_mask])
fig.show()
My initial test, adding the mask to sf directly, gave an error, so I deduced that I needed to apply the mask to the x and y parameters. I take it this is because a Series is just a single column with the index set to my "animals". Therefore sf.index[sf_mask] returns the animals in the index and sf[sf_mask] returns the values. Failing to mask either one gives a "ValueError" stating that the arguments should have the same length.
Here is what I did to test my workings
My initial imports and setting up Plotly as my plotting backend
import pandas as pd
import plotly.express as px
# Set our plotting backend to Plotly
pd.options.plotting.backend = "plotly"
I just created a test dataset from a dictionary
animals = {'rabbits' : 1,
           'dogs' : 3,
           'cats' : 0,
           'ferrets' : 3,
           'horses' : 8,
           'goldfish' : 0,
           'guinea_pigs' : 2,
           'hamsters' : 6,
           'mice' : 3,
           'rats' : 0
           }
Then converted it to a pandas Series
sf = pd.Series(animals)
I then create my boolean mask to mask out all the zero entries in the Pandas Series
sf_mask = sf != 0
And if I then view the mask I can see I only get non zero values which is exactly what I am looking for.
sf[sf_mask]
Which outputs my non-zero items in my series.
rabbits        1
dogs           3
ferrets        3
horses         8
guinea_pigs    2
hamsters       6
mice           3
dtype: int64
If I plot without my Boolean mask 'sf_mask' using the following syntax I get my complete Pandas Series charted
# Plot our Series showing all items
fig = px.bar(sf, x = sf.index, y = sf)
fig.show()
Which outputs the following chart
If I plot with my Boolean mask 'sf_mask' using the following syntax I get the chart I want which excludes the gaps with zero value items.
# Plot our graph with only items that are non-zero
fig = px.bar(sf, x = sf.index[sf_mask], y = sf[sf_mask])
fig.show()
Which outputs the correct chart.
Your understanding of booleans and masking is correct.
You can simplify your syntax a little though: if you take a look at the plotly.express.bar documentation, you'll see that the arguments 'x' and 'y' are optional. You don't need to pass 'x' or 'y' because by default plotly.express will create the bars using the index of the Series as x and the values of the Series as y. You can also pass the masked series in place of the entire series.
For example, this will produce the same bar chart:
fig = px.bar(sf[sf>0])
fig.update_layout(showlegend=False)
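To see why passing the masked Series on its own is enough, it can help to inspect it without plotting. This is a minimal pandas-only sketch that reuses the `animals` data from the question; the names `non_zero` and `same_selection` are introduced here for illustration only.

```python
import pandas as pd

animals = {'rabbits': 1, 'dogs': 3, 'cats': 0, 'ferrets': 3, 'horses': 8,
           'goldfish': 0, 'guinea_pigs': 2, 'hamsters': 6, 'mice': 3, 'rats': 0}
sf = pd.Series(animals)

# Boolean indexing filters the index and the values together,
# so the result is already aligned and needs no separate x and y.
non_zero = sf[sf != 0]

print(list(non_zero.index))
print(list(non_zero.values))

# For non-negative counts, `sf > 0` selects the same rows as `sf != 0`.
same_selection = non_zero.equals(sf[sf > 0])
```

If the data could contain negative values, `sf != 0` and `sf > 0` would no longer select the same rows, so the mask should match the intent.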

Flopy MF6 Plotting specific discharge in cross-section view

How to plot discharge vectors in cross-section view in Flopy MF6?
I know this plots for plan view:
quiver = mapview.plot_specific_discharge(spdis[0])
I tried to get the specific discharge using the following code but got this error:
AttributeError: module 'flopy.utils.postprocessing' has no attribute 'get_specific_discharge'
Code:
hds_file = os.path.join(workspace, headfile)
hds = flopy.utils.binaryfile.HeadFile(hds_file)
cbb_file = os.path.join(workspace, budgetfile)
# Open the budget file by the path built above (the original passed an undefined name, fname)
cbb = flopy.utils.CellBudgetFile(cbb_file, precision='double')
q = flopy.utils.postprocessing.get_specific_discharge(gwf,cbb_file, precision='single', kstpkper=(0,1),
                                                      hdsfile=hds_file, position='centers')
For me (using mf6) plotting specific discharge on cross sections works like this:
Reading the cbc output:
cbcdobj = flopy.utils.binaryfile.CellBudgetFile(path, precision='double', verbose=True)
SPDIS = cbcdobj.get_data(kstpkper=kstpkper, text='DATA-SPDIS')[0]
You might need to use 'verbose=False' and 'precision=single' when using mf2005.
Then the cross section:
cros_mp=flopy.plot.PlotCrossSection(model=gwf, line={'row': 200})
cros_mp.plot_specific_discharge(SPDIS)
Note that plotting specific discharges on an irregular cross section ('line', not 'row' nor 'column') is not possible.

How to control the number of stacked bars through single select widget in python bokeh

I have created a vertical stacked bar chart using python bokeh on an input dataset df using the following code -
print(df.head())
   YearMonth         A         B         C         D         E
0     Jan'18  1587.816  1586.544   856.000  1136.464  1615.360
1     Feb'18  2083.024  1847.808  1036.000  1284.016  2037.872
2     Mar'18  2193.420  1850.524  1180.000  1376.028  2076.464
3     Apr'18  2083.812  1811.636  1192.028  1412.028  2104.588
4     May'18  2379.976  2091.536  1452.000  1464.432  2400.876
Stacked Bar Chart Code -
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, Select
from bokeh.plotting import curdoc, figure

products = ['python', 'pypy', 'jython']
customers = ['Cust 1', 'Cust 2']
colours = ['red', 'blue']
data = {
    'products': products,
    'Cust 1': [200, 850, 400],
    'Cust 2': [600, 620, 550],
    'Retail 1' : [100, 200, 300],
    'Retail 2' : [400,500,600]
}
source = ColumnDataSource(data)
# Set up widgets
select=Select(options=['customers','retailers'],value='customers')
def make_plot() :
    # A categorical y_range is needed because y holds product names
    p=figure(y_range=products)
    #p.title.text=select.value
    if select.value=='customers' :
        # Keys must match the column names in data exactly
        customers=['Cust 1','Cust 2']
    else :
        customers=['Retail 1','Retail 2']
    p.hbar_stack(customers, y='products', height=0.5, source=source, color=colours)
    return p
layout = column(select, make_plot())
# Set up callbacks
def update_data(attrname, old, new):
    p = make_plot() # make a new plot
    layout.children[1] = p
select.on_change('value', update_data)
# Set up layouts and add to document
curdoc().add_root(layout)
Now I want to limit the number of segments (i.e. stacked bars) by using a widget (preferably a single select widget). Can anyone please guide me on how I can achieve this using the bokeh serve functionality? I don't want to use a JavaScript callback function.
This would take some non-trivial work to make happen. The vbar_stack method is a convenience function that actually creates multiple glyph renderers, one for each "row" in the initial stacking. What's more the renderers are all inter-related to one another, via the Stack transform that stacks all the previous renderers at each step. So there is not really any simple way to change the number of rows that are stacked after the fact. So much so that I would suggest simply deleting and re-creating the entire plot in each callback. (I would not normally recommend this approach, but this situation is one of the few exceptions.)
Since you have not given complete code or even mentioned what widget you want to use, all I can provide is a high level sketch of the code. Here is a complete example that updates a plot based on select widget:
from bokeh.layouts import column
from bokeh.models import Select
from bokeh.plotting import curdoc, figure
select = Select(options=["1", "2", "3", "4"], value="1")
def make_plot():
    p = figure()
    p.circle(x=[0,2], y=[0, 5], size=15)
    p.circle(x=1, y=float(select.value), color="red", size=15)
    return p
layout = column(select, make_plot())
def update(attr, old, new):
    p = make_plot() # make a new plot
    layout.children[1] = p # replace the old plot
select.on_change('value', update)
curdoc().add_root(layout)
Note I have changed your show call to curdoc().add_root since it is never useful to call show in a Bokeh server application. You might want to refer to and study the User Guide chapter Running a Bokeh Server for background information, if necessary.
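The same delete-and-recreate pattern could be applied to the stacked bars themselves. The sketch below is one way to do it, under the assumption that the Select value is the number of segments to stack. The data is the first three rows of the question's DataFrame, rounded; the `segments` and `colours` lists, the widget title and the bar width are illustrative choices, not part of the original code.

```python
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, Select
from bokeh.plotting import curdoc, figure

# First rows of the question's DataFrame, rounded for brevity.
data = {
    'YearMonth': ["Jan'18", "Feb'18", "Mar'18"],
    'A': [1588, 2083, 2193],
    'B': [1587, 1848, 1851],
    'C': [856, 1036, 1180],
    'D': [1136, 1284, 1376],
    'E': [1615, 2038, 2076],
}
segments = ['A', 'B', 'C', 'D', 'E']
colours = ['red', 'blue', 'green', 'orange', 'purple']  # placeholder colours
source = ColumnDataSource(data)

# The Select value is the number of segments to stack (an assumed UI choice).
select = Select(title='Segments', value='5',
                options=[str(n) for n in range(1, len(segments) + 1)])

def make_plot():
    n = int(select.value)
    # A categorical x_range is needed because x holds month labels.
    p = figure(x_range=data['YearMonth'], title=f'{n} stacked segment(s)')
    # Stack only the first n columns; one colour per stacked column.
    p.vbar_stack(segments[:n], x='YearMonth', width=0.8, source=source,
                 color=colours[:n])
    return p

layout = column(select, make_plot())

def update(attr, old, new):
    layout.children[1] = make_plot()  # rebuild the whole plot on every change

select.on_change('value', update)
curdoc().add_root(layout)
```

Saved as a script, this would be run with `bokeh serve`, and no JavaScript callback is involved.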

Matplotlib - Stacked Bar Chart with ~1000 Bars

Background:
I'm working on a program to show a 2d cross section of 3d data. The data is stored in a simple text csv file in the format x, y, z1, z2, z3, etc. I take a start and end point and flick through the dataset (~110,000 lines) to create a line of points between these two locations, and dump them into an array. This works fine, and fairly quickly (takes about 0.3 seconds). To then display this line, I've been creating a matplotlib stacked bar chart. However, the total run time of the program is about 5.5 seconds. I've narrowed the bulk of it (3 seconds worth) down to the code below.
'values' is an array with the x, y and z values plus a leading identifier, which isn't used in this part of the code. The first plt.bar is plotting the bar sections, and the second is used to create an arbitrary floor of -2000. In order to generate a continuous looking section, I'm using an interval between each bar of zero.
import matplotlib.pyplot as plt
for values in crossSection:
    prevNum = None
    layerColour = None
    if values != None:
        for i in range(3, len(values)):
            if values[i] != 'n':
                num = float(values[i].strip())
                if prevNum != None:
                    plt.bar(spacing, prevNum-num, width=interval, \
                            bottom=num, color=layerColour, \
                            edgecolor=None, linewidth=0)
                prevNum = num
                layerColour = layerParams[i].strip()
        if prevNum != None:
            plt.bar(spacing, prevNum+2000, width=interval, bottom=-2000, \
                    color=layerColour, linewidth=0)
    spacing += interval
I'm sure there's a more efficient way to do this, but I'm new to Matplotlib and still unfamiliar with its capabilities. The other main use of time in the code is:
plt.savefig('output.png')
which takes about a second, but I figure this is to be expected to save the file and I can't do anything about it.
Question:
Is there a faster way of generating the same output (a stacked bar chart or something that looks like one) by using plt.bar() better, or a different Matplotlib function?
EDIT:
I forgot to mention in the original post that I'm using Python 3.2.3 and Matplotlib 1.2.0
Leaving this here in case someone runs into the same problem...
While not exactly the same as using bar(), with a sufficiently large dataset (large enough that using bar() takes a few seconds) the results are indistinguishable from stackplot(). If I sort the data into layers using the method given by tcaswell and feed it into stackplot() the chart is created in 0.2 seconds, rather than 3 seconds.
EDIT
Code provided by tcaswell to turn the data into layers:
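import numpy as np
accum_values = []
for values in crossSection:
    # Convert every layer value after the three leading columns to float
    accum_values.append([float(v.strip()) for v in values[3:]])
# Transpose so that each row holds one layer across all positions
accum_values = np.vstack(accum_values).T
layer_params = [l.strip() for l in layerParams]
bottom = np.zeros(accum_values[0].shape)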
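For reference, feeding layered data into stackplot() could look roughly like the sketch below. It is not the original program: the layer values are synthetic thicknesses generated with NumPy, and `interval`, `n_points` and the colour names are placeholder assumptions standing in for the real cross-section data and `layerParams`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

interval = 1.0    # assumed spacing between positions along the section
n_points = 1000   # roughly the number of bars in the question
spacing = interval * np.arange(n_points)

# Synthetic layer thicknesses with shape (n_layers, n_points),
# standing in for the accum_values array built above.
rng = np.random.default_rng(0)
accum_values = rng.uniform(50, 150, size=(3, n_points))
layer_colours = ["saddlebrown", "goldenrod", "olivedrab"]  # placeholder colours

fig, ax = plt.subplots()
# A single call draws every layer as a filled band stacked on the previous one.
polys = ax.stackplot(spacing, accum_values, colors=layer_colours)
```

Each layer becomes one filled polygon rather than a thousand rectangles, which is a plausible reason for the speed-up reported above.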
It looks like you are drawing each bar individually; you can pass sequences to bar (see this example)
I think something like:
import numpy as np
import matplotlib.pyplot as plt
accum_values = []
for values in crossSection:
    accum_values.append([float(v.strip()) for v in values[3:]])
# Transpose so that each row holds one layer across all positions
accum_values = np.vstack(accum_values).T
layer_params = [l.strip() for l in layerParams]
bottom = np.zeros(accum_values[0].shape)
ax = plt.gca()
spacing = interval*np.arange(len(accum_values[0]))
# One bar() call per layer instead of one per bar
for data,color in zip(accum_values,layer_params):
    ax.bar(spacing,data,bottom=bottom,color=color,linewidth=0,width=interval)
    bottom += data
will be faster (because each call to bar creates one BarContainer and I suspect the source of your issues is you were creating one for each bar, instead of one for each layer).
I don't really understand what you are doing with the bars that have tops below their bottoms, so I didn't try to implement that, so you will have to adapt this a bit.
