How to control the number of stacked bars through single select widget in python bokeh - python-3.x

I have created a vertical stacked bar chart using python bokeh on an input dataset df using the following code -
print(df.head())
YearMonth A B C D E
0 Jan'18 1587.816 1586.544 856.000 1136.464 1615.360
1 Feb'18 2083.024 1847.808 1036.000 1284.016 2037.872
2 Mar'18 2193.420 1850.524 1180.000 1376.028 2076.464
3 Apr'18 2083.812 1811.636 1192.028 1412.028 2104.588
4 May'18 2379.976 2091.536 1452.000 1464.432 2400.876
Stacked Bar Chart Code -
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, Select
from bokeh.plotting import curdoc, figure
products = ['python', 'pypy', 'jython']
customers = ['Cust 1', 'Cust 2']
colours = ['red', 'blue']
data = {
    'products': products,
    'Cust 1': [200, 850, 400],
    'Cust 2': [600, 620, 550],
    'Retail 1': [100, 200, 300],
    'Retail 2': [400, 500, 600],
}
source = ColumnDataSource(data)
# Set up widgets
select = Select(options=['customers', 'retailers'], value='customers')
def make_plot():
    # 'products' holds strings, so the y axis needs a categorical range
    p = figure(y_range=products)
    # p.title.text = select.value
    if select.value == 'customers':
        # names must match the keys in `data` exactly (case-sensitive)
        customers = ['Cust 1', 'Cust 2']
    else:
        customers = ['Retail 1', 'Retail 2']
    p.hbar_stack(customers, y='products', height=0.5, source=source, color=colours)
    return p
layout = column(select, make_plot())
# Set up callbacks
def update_data(attrname, old, new):
    p = make_plot()  # make a new plot
    layout.children[1] = p
select.on_change('value', update_data)
# Set up layouts and add to document
curdoc().add_root(layout)
Now I want to limit the number of segments (i.e. stacked bars) using a widget, preferably a single Select widget. Can anyone please guide me on how to achieve this with bokeh serve? I don't want to use a JavaScript callback.

This would take some non-trivial work to make happen. The vbar_stack method is a convenience function that actually creates multiple glyph renderers, one for each "row" in the initial stacking. What's more, the renderers are all inter-related via the Stack transform, which stacks all the previous renderers at each step. So there is not really any simple way to change the number of rows that are stacked after the fact. So much so that I would suggest simply deleting and re-creating the entire plot in each callback. (I would not normally recommend this approach, but this situation is one of the few exceptions.)
Since you have not given complete code or even mentioned what widget you want to use, all I can provide is a high level sketch of the code. Here is a complete example that updates a plot based on select widget:
from bokeh.layouts import column
from bokeh.models import Select
from bokeh.plotting import curdoc, figure
select = Select(options=["1", "2", "3", "4"], value="1")
def make_plot():
    p = figure()
    p.circle(x=[0, 2], y=[0, 5], size=15)
    p.circle(x=1, y=float(select.value), color="red", size=15)
    return p
layout = column(select, make_plot())
def update(attr, old, new):
    p = make_plot()  # make a new plot
    layout.children[1] = p  # replace the old plot
select.on_change('value', update)
curdoc().add_root(layout)
Note I have changed your show call to curdoc().add_root since it is never useful to call show in a Bokeh server application. You might want to refer to and study the User Guide chapter Running a Bokeh Server for background information, if necessary.
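The same delete-and-recreate pattern can be sketched for stacked bars. This is only an illustration under assumptions: the data mimics the question's df (values rounded, three months only), the Select value is read as the number of columns to stack, and the colour list and widget title are made up for the example.

```python
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, Select
from bokeh.plotting import curdoc, figure

# Assumed data, shaped like the question's df (rounded values).
data = {
    "YearMonth": ["Jan'18", "Feb'18", "Mar'18"],
    "A": [1588, 2083, 2193],
    "B": [1587, 1848, 1851],
    "C": [856, 1036, 1180],
    "D": [1136, 1284, 1376],
    "E": [1615, 2038, 2076],
}
stackers = ["A", "B", "C", "D", "E"]
colors = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]
source = ColumnDataSource(data)

# The widget value is the number of columns to stack (illustrative choice).
select = Select(
    title="Number of segments",
    options=[str(n) for n in range(1, len(stackers) + 1)],
    value=str(len(stackers)),
)


def make_plot():
    n = int(select.value)
    # Categorical x axis: one bar per month.
    p = figure(x_range=data["YearMonth"])
    # Stack only the first n columns; colours must match in length.
    p.vbar_stack(stackers[:n], x="YearMonth", width=0.8,
                 color=colors[:n], source=source)
    return p


layout = column(select, make_plot())


def update(attr, old, new):
    # Replace the whole plot rather than editing the stacked renderers.
    layout.children[1] = make_plot()


select.on_change("value", update)
curdoc().add_root(layout)
```

Saved as, say, myapp.py (a hypothetical file name), this would be run with bokeh serve --show myapp.py.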

Related

how to avoid overlapping of barchart using matplotlib

I am new to data science using Python. While plotting two different bar charts I ran into a problem. This is my code:
def compare_groups(field):
    if field in less_equal_150_cal.columns:
        less_equal_150_cal[field].plot.bar(color='blue', alpha=0.4, title=field)
        more_150_cal[field].plot.bar(color='red', alpha=0.4)
    else:
        raise ValueError(f"{field} not found")
The resulting bargraphs are overlapping each other. I want two different bar graphs.

Bokeh plot line not updating after checking CheckboxGroup in server mode (python callback)

I have just started using the Bokeh library and I would like to add interactivity to my dashboard. To do so, I want to use a CheckboxGroup widget to select which columns of a pandas DataFrame to plot.
I have followed tutorials but I must have misunderstood the use of ColumnDataSource, as I cannot make a simple example work...
I am aware of previous questions on the matter, and one that seems relevant on Stack Overflow is this one:
Bokeh not updating plot line update from CheckboxGroup
Sadly I did not succeed in reproducing the right behavior.
I have tried, without success, to reproduce an example following the same updating structure presented by @bigreddot in "Bokeh Server plot not updating as wanted, also it keeps shifting and axis information vanishes".
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.palettes import Spectral
from bokeh.layouts import row
from bokeh.models.widgets import CheckboxGroup
from bokeh.io import curdoc
# UPDATE FUNCTION ------------------------------------------------
# make update function
def update(attr, old, new):
    feature_selected_test = [feature_checkbox.labels[i] for i in feature_checkbox.active]
    # add index to plot
    feature_selected_test.insert(0, 'index')
    # create new DataFrame
    new_df = dummy_df.filter(feature_selected_test)
    plot_src.data = ColumnDataSource.from_df(data=new_df)
# CREATE DATA SOURCE ------------------------------------------------
# create dummy data for debugging purpose
index = list(range(0, 890))
index.extend(list(range(2376, 3618)))
feature_1 = np.random.rand(len(index))
feature_2 = np.random.rand(len(index))
feature_3 = np.random.rand(len(index))
feature_4 = np.random.rand(len(index))
dummy_df = pd.DataFrame(dict(index=index, feature_1=feature_1, feature_2=feature_2, feature_3=feature_3,feature_4=feature_4))
# CREATE CONTROL ------------------------------------------------------
# list available data to plot
available_feature = list(dummy_df.columns[1:])
# initialize control
feature_checkbox = CheckboxGroup(labels=available_feature, active=[0, 1], name='checkbox')
feature_checkbox.on_change('active', update)
# INITIALIZE DASHBOARD ---------------------------------------------------
# initialize ColumnDataSource object
plot_src = ColumnDataSource(dummy_df)
# create figure
line_fig = figure()
feature_selected = [feature_checkbox.labels[i] for i in feature_checkbox.active]
# feature_selected = ['feature_1', 'feature_2', 'feature_3', 'feature_4']
for index_int, col_name_str in enumerate(feature_selected):
    line_fig.line(x='index', y=col_name_str, line_width=2, color=Spectral[11][index_int % 11], source=plot_src)
curdoc().add_root(row(feature_checkbox, line_fig))
The program should run as-is with a copy/paste, just without any interactivity...
Would someone please help me? Thanks a lot in advance.
You are only adding glyphs for the initial subset of selected features:
for index_int, col_name_str in enumerate(feature_selected):
    line_fig.line(x='index', y=col_name_str, line_width=2, color=Spectral[11][index_int % 11], source=plot_src)
So that is all that is ever going to show.
Adding new columns to the CDS does not automatically make anything in particular happen, it's just extra data that is available for glyphs or hover tools to potentially use. To actually show it, there have to be glyphs configured to display those columns. You could do that, add and remove glyphs dynamically, but it would be far simpler to just add everything once up front, and use the checkbox to toggle only the visibility. There is an example of just this in the repo:
https://github.com/bokeh/bokeh/blob/master/examples/app/line_on_off.py
That example passes the data as literals to the glyph function, but you could put all the data in a CDS up front, too.
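A minimal sketch of that idea, assuming dummy data similar to the question's (the column names and random values are placeholders): every line is added once up front, and the checkbox callback only flips each renderer's visible flag.

```python
import random

from bokeh.layouts import row
from bokeh.models import CheckboxGroup, ColumnDataSource
from bokeh.palettes import Spectral
from bokeh.plotting import curdoc, figure

# Placeholder data: one x column plus four feature columns.
random.seed(0)
x = list(range(100))
features = ["feature_1", "feature_2", "feature_3", "feature_4"]
data = {"index": x}
for name in features:
    data[name] = [random.random() for _ in x]
source = ColumnDataSource(data)

checkbox = CheckboxGroup(labels=features, active=[0, 1])

fig = figure()
lines = []
for i, name in enumerate(features):
    # Add a glyph for EVERY column, whether or not it starts out selected.
    renderer = fig.line(x="index", y=name, line_width=2,
                        color=Spectral[11][i], source=source)
    renderer.visible = i in checkbox.active
    lines.append(renderer)


def update(attr, old, new):
    # Only toggle visibility; the data source is never replaced.
    for i, renderer in enumerate(lines):
        renderer.visible = i in checkbox.active


checkbox.on_change("active", update)
curdoc().add_root(row(checkbox, fig))
```

Because the data source is left alone, the axes and ranges do not have to be rebuilt when a box is ticked.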

How to label line chart with column from pandas dataframe (from 3rd column values)?

I have a data set I filtered to the following (sample data):
Name Time l
1 1.129 1G-d
1 0.113 1G-a
1 3.374 1B-b
1 3.367 1B-c
1 3.374 1B-d
2 3.355 1B-e
2 3.361 1B-a
3 1.129 1G-a
I got this data after filtering the data frame and converting it to CSV file:
# Assigns the new data frame to "df" with the data from only three columns
header = ['Names','Time','l']
df = pd.DataFrame(df_2, columns = header)
# Sorts the data frame by column "Names" as integers
df.Names = df.Names.astype(int)
df = df.sort_values(by=['Names'])
# Changes the data to match format after converting it to int
df.Time=df.Time.astype(int)
df.Time = df.Time/1000
csv_file = df.to_csv(index=False, columns=header, sep=" " )
Now, I am trying to graph lines for each label column data/items with markers.
I want the column l as my line names (labels) - each as a new line, Time as my Y-axis values and Names as my X-axis values.
So, in this case, I would have 7 different lines in the graph with these labels: 1G-d, 1G-a, 1B-b, 1B-c, 1B-d, 1B-e, 1B-a.
I have done the following so far which is the additional settings, but I am not sure how to graph the lines.
plt.xlim(0, 60)
plt.ylim(0, 18)
plt.legend(loc='best')
plt.show()
I used sns.lineplot which comes with hue and I do not want to have name for the label box. Also, in that case, I cannot have the markers without adding new column for style.
I also tried plt.plot, but in that case I am not sure how to draw more than one line. I can only give x and y values, which creates a single line.
If there's any other source, please let me know below.
Thanks
The final graph I want to have is like the following but with markers:
You can apply a few tweaks to seaborn's lineplot. Using some created data since your sample isn't really long enough to demonstrate:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
# Create data
np.random.seed(2019)
categories = ['1G-d', '1G-a', '1B-b', '1B-c', '1B-d', '1B-e', '1B-a']
df = pd.DataFrame({'Name': np.repeat(range(1, 11), 10),
                   'Time': np.random.randn(100).cumsum(),
                   'l': np.random.choice(categories, 100)
                   })
# Plot
sns.lineplot(data=df, x='Name', y='Time', hue='l', style='l', dashes=False,
             markers=True, ci=None, err_style=None)
# Temporarily removing limits based on sample data
#plt.xlim(0, 60)
#plt.ylim(0, 18)
# Remove seaborn legend title & set new title (if desired)
ax = plt.gca()
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:], title='New Title', loc='best')
plt.show()
To apply markers, you have to specify a style variable. This can be the same as hue.
You likely want to remove dashes, ci, and err_style
To remove the seaborn legend title, you can get the handles and labels, then re-add the legend without the first handle and label. You can also specify the location here and set a new title if desired (or just remove title=... for no title).
Edits per comments:
Filtering your data to only a subset of level categories can be done fairly easily via:
categories = ['1G-d', '1G-a', '1B-b', '1B-c', '1B-d', '1B-e', '1B-a']
df = df.loc[df['l'].isin(categories)]
markers=True will fail if there are too many levels. If you are only interested in marking points for aesthetic purposes, you can simply multiply a single marker by the number of categories you are interested in (which you have already created to filter your data to categories of interest): markers='o'*len(categories).
Alternatively, you can specify a custom dictionary to pass to the markers argument:
points = ['o', '*', 'v', '^']
mult = len(categories) // len(points) + (len(categories) % len(points) > 0)
markers = {key: value for (key, value)
           in zip(categories, points * mult)}
This will return a dictionary of category-point combinations, cycling over the marker points specified until each item in categories has a point style.
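As an illustration of what that dictionary looks like, here is an equivalent sketch that uses itertools.cycle instead of the explicit multiplier (a swapped-in standard-library shortcut, not part of the answer above):

```python
from itertools import cycle

categories = ['1G-d', '1G-a', '1B-b', '1B-c', '1B-d', '1B-e', '1B-a']
points = ['o', '*', 'v', '^']

# zip stops at the end of `categories`; cycle repeats `points` as needed.
markers = dict(zip(categories, cycle(points)))

print(markers)
# {'1G-d': 'o', '1G-a': '*', '1B-b': 'v', '1B-c': '^',
#  '1B-d': 'o', '1B-e': '*', '1B-a': 'v'}
```

The resulting dictionary can be passed as markers=markers in the lineplot call.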

How to change the limits for geo_shape in altair (python vega-lite)

I am trying to plot locations in three US states in Python with Altair. I saw the tutorial about the US map, but I am wondering if there is any way to zoom the image to only the three states of interest, i.e. NY, NJ and CT.
Currently, I have the following code:
import altair as alt
from vega_datasets import data
states = alt.topo_feature(data.us_10m.url, 'states')
# US states background
background = alt.Chart(states).mark_geoshape(
    fill='lightgray',
    stroke='white',
    limit=1000
).properties(
    title='US State Capitols',
    width=700,
    height=400
).project("albers")
# accts is my own DataFrame with longitude, latitude and Group columns
points = alt.Chart(accts).mark_point().encode(
    longitude="longitude",
    latitude="latitude",
    color="Group")
background + points
I inspected the us_10m.url data set and it seems there is no field that identifies the individual states. So I am hoping I could somehow change the xlim and ylim of the background to, for example, [-80,-70] and [35,45]. I want to zoom in to the regions where there are data points (blue dots).
Could someone kindly show me how to do that? Thanks!!
Update
There is a field called ID in the JSON file and I manually found out that NJ is 34, NY is 36 and CT is 9. Is there a way to filter on these IDs? That will get the job done!
Alright, it seems the selection/zoom/xlim/ylim feature for geotype is not supported yet:
Document and add warning that geo-position doesn't support selection yet #3305
So I ended up with a hackish way to solve this problem: first filter on the IDs in pure Python. Basically, load the JSON file into a dictionary, then filter its geometries before passing the dictionary to Altair as topojson data. Below is an example for six states: PA, NJ, NY, CT, RI and MA.
import altair as alt
from vega_datasets import data
# Load the data, which is loaded as a dict object
us_10m = data.us_10m()
# Select the geometries under states under objects, filter on id (9,25,34,36,42,44)
us_10m['objects']['states']['geometries'] = [
    item for item in us_10m['objects']['states']['geometries']
    if item['id'] in [9, 25, 34, 36, 42, 44]]
# Make the topojson data
states = alt.Data(
    values=us_10m,
    format=alt.TopoDataFormat(feature='states', type='topojson'))
# Plot background (now only has six states)
background = alt.Chart(states).mark_geoshape(
    fill='lightgray',
    stroke='white',
    limit=1000
).properties(
    title='US State Capitols',
    width=700,
    height=400
).project("mercator")
# Plot the points (accts is the DataFrame from the question)
points = alt.Chart(accts).mark_circle(size=60).encode(
    longitude="longitude",
    latitude="latitude",
    color="Group").project("mercator")
# Overlay the two plots
background + points
The resulting plot looks OK.
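The filtering step can also be written as a small helper keyed on state abbreviations rather than raw numbers. This is a sketch only: it assumes, as found above, that the geometry ids are integer FIPS state codes, and the helper name, the STATE_FIPS table, and the toy topology are made up for illustration.

```python
import copy

# FIPS state codes for the six states used above.
STATE_FIPS = {"CT": 9, "MA": 25, "NJ": 34, "NY": 36, "PA": 42, "RI": 44}


def keep_states(topology, abbreviations, object_name="states"):
    """Return a copy of a topojson dict with only the requested states."""
    wanted = {STATE_FIPS[abbr] for abbr in abbreviations}
    result = copy.deepcopy(topology)  # leave the caller's dict untouched
    layer = result["objects"][object_name]
    layer["geometries"] = [g for g in layer["geometries"] if g["id"] in wanted]
    return result


# Toy stand-in for the dict returned by data.us_10m() (structure only).
toy = {
    "type": "Topology",
    "objects": {
        "states": {
            "type": "GeometryCollection",
            "geometries": [{"id": 6}, {"id": 34}, {"id": 36}, {"id": 9}],
        }
    },
}

filtered = keep_states(toy, ["NY", "NJ", "CT"])
ids = [g["id"] for g in filtered["objects"]["states"]["geometries"]]
print(ids)  # [34, 36, 9]
```

With the real data, the returned dictionary would be passed as values= to alt.Data in the same way as in the code above.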

Matplotlib and pie/donut chart labels

If you have seen my previous question, I am writing a Python program to evaluate the data that I collect while playing a game of Clue. I have decided to add a GUI (tkinter) to my program to make it faster and easier to work with. One of the GUI's main windows shows the cards that I know each player has in their hand, the cards that I know must be in the middle (the "murder cards"), and the unknown cards that cannot yet be placed in any of the above categories. I have decided to present this data as a matplotlib pie chart with five wedges, one for each of the previously mentioned categories.
Right now, I am unconcerned with how I implement this matplotlib function into my tkinter widget. I am solely focused on the design of the chart.
So far, I have documented the cards that are within each player's hand within a dictionary, wherein the keys are the player names, and the values are a set of cards that are in their hand. For example...
player_cards = { 'player1':{'Mustard', 'Scarlet', 'Revolver', 'Knife', 'Ballroom', 'Library'}, 'player2':{}, 'player3':{} }
So the data for the first three wedges of the pie chart will be extracted from the dictionary. For the other two wedges, the data will be stored within similarly organized sets.
After looking at the matplotlib.org website I have seen an example that sorta demonstrates what I am looking for...
with the code...
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(6, 3), subplot_kw=dict(aspect="equal"))
recipe = ["225 g flour",
"90 g sugar",
"1 egg",
"60 g butter",
"100 ml milk",
"1/2 package of yeast"]
data = [225, 90, 50, 60, 100, 5]
wedges, texts = ax.pie(data, wedgeprops=dict(width=0.5), startangle=-40)
bbox_props = dict(boxstyle="square,pad=0.3", fc="w", ec="k", lw=0.72)
kw = dict(xycoords='data', textcoords='data', arrowprops=dict(arrowstyle="-"), bbox=bbox_props, zorder=0, va="center")
for i, p in enumerate(wedges):
    ang = (p.theta2 - p.theta1)/2. + p.theta1
    y = np.sin(np.deg2rad(ang))
    x = np.cos(np.deg2rad(ang))
    horizontalalignment = {-1: "right", 1: "left"}[int(np.sign(x))]
    connectionstyle = "angle,angleA=0,angleB={}".format(ang)
    kw["arrowprops"].update({"connectionstyle": connectionstyle})
    ax.annotate(recipe[i], xy=(x, y), xytext=(1.35*np.sign(x), 1.4*y),
                horizontalalignment=horizontalalignment, **kw)
ax.set_title("Matplotlib bakery: A donut")
plt.show()
However, what is lacking from this example code is... (1) The label for each wedge is a single string rather than a set of strings (which is what stores the cards in each player's hand). (2) I cannot seem to control the color of the wedges. (3) The outline of each wedge is black, rather than white, which is the background color of my GUI window. (4) I want to control the exact placement of the labels. And finally (5) I need to change the font/size of the labels. Other than that, the example code is perfect.
Just note that the actual size of each wedge in the pie chart will be dictated by the size of each of the five sets (so they will add up to 21).
Just in case that you all need some more substantive code to work with, here are five sets that make up the data needed for this pie chart...
player1_cards = {'Mustard', 'Plum', 'Revolver', 'Rope', 'Ballroom', 'Library'}
player2_cards = {'Scarlet', 'White', 'Candlestick'}
player3_cards = {'Green', 'Library', 'Kitchen', 'Conservatory'}
middle_cards = {'Peacock'}
unknown_cards = {'Lead Pipe', 'Wrench', 'Knife', 'Hall', 'Lounge', 'Dining Room', 'Study'}
Okay, that's it. Sorry for the rather long post, and thanks to those of you viewing and responding :)
