Bokeh plot line not updating after checking CheckboxGroup in server mode (python callback) - python-3.x

I have just started using the Bokeh library and would like to add interactivity to my dashboard. To do so, I want to use a CheckboxGroup widget to select which columns of a pandas DataFrame to plot.
I have followed tutorials but I must have misunderstood the use of ColumnDataSource as I cannot make a simple example work...
I am aware of previous questions on the matter; one that seems relevant on StackOverflow is the following:
Bokeh not updating plot line update from CheckboxGroup
Sadly I did not succeed in reproducing the right behavior.
I have also tried, without success, to reproduce an example following the updating structure presented by bigreddot in "Bokeh Server plot not updating as wanted, also it keeps shifting and axis information vanishes".
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.palettes import Spectral
from bokeh.layouts import row
from bokeh.models.widgets import CheckboxGroup
from bokeh.io import curdoc
# UPDATE FUNCTION ------------------------------------------------
# make update function
def update(attr, old, new):
    feature_selected_test = [feature_checkbox.labels[i] for i in feature_checkbox.active]
    # add index to plot
    feature_selected_test.insert(0, 'index')
    # create new DataFrame
    new_df = dummy_df.filter(feature_selected_test)
    plot_src.data = ColumnDataSource.from_df(data=new_df)
# CREATE DATA SOURCE ------------------------------------------------
# create dummy data for debugging purpose
index = list(range(0, 890))
index.extend(list(range(2376, 3618)))
feature_1 = np.random.rand(len(index))
feature_2 = np.random.rand(len(index))
feature_3 = np.random.rand(len(index))
feature_4 = np.random.rand(len(index))
dummy_df = pd.DataFrame(dict(index=index, feature_1=feature_1, feature_2=feature_2, feature_3=feature_3,feature_4=feature_4))
# CREATE CONTROL ------------------------------------------------------
# list available data to plot
available_feature = list(dummy_df.columns[1:])
# initialize control
feature_checkbox = CheckboxGroup(labels=available_feature, active=[0, 1], name='checkbox')
feature_checkbox.on_change('active', update)
# INITIALIZE DASHBOARD ---------------------------------------------------
# initialize ColumnDataSource object
plot_src = ColumnDataSource(dummy_df)
# create figure
line_fig = figure()
feature_selected = [feature_checkbox.labels[i] for i in feature_checkbox.active]
# feature_selected = ['feature_1', 'feature_2', 'feature_3', 'feature_4']
for index_int, col_name_str in enumerate(feature_selected):
    line_fig.line(x='index', y=col_name_str, line_width=2, color=Spectral[11][index_int % 11], source=plot_src)
curdoc().add_root(row(feature_checkbox, line_fig))
The program should run as-is with a copy/paste, just without the interactivity.
Could someone please help me? Thanks a lot in advance.

You are only adding glyphs for the initial subset of selected features:
for index_int, col_name_str in enumerate(feature_selected):
    line_fig.line(x='index', y=col_name_str, line_width=2, color=Spectral[11][index_int % 11], source=plot_src)
So that is all that is ever going to show.
Adding new columns to the CDS does not make anything happen by itself; it is just extra data that is available for glyphs or hover tools to use. To actually show it, there have to be glyphs configured to display those columns. You could add and remove glyphs dynamically, but it is far simpler to add everything once up front and use the checkbox to toggle only the visibility. There is an example of just this in the repo:
https://github.com/bokeh/bokeh/blob/master/examples/app/line_on_off.py
That example passes the data as literals to the glyph function, but you could put all the data in a CDS up front, too.
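A minimal sketch of that approach, assuming made-up random data and an x column named "x" (both illustrative, not taken from the question). Every column gets its own line glyph once, and the callback only flips each renderer's visible property:

```python
import numpy as np
import pandas as pd
from bokeh.io import curdoc
from bokeh.layouts import row
from bokeh.models import CheckboxGroup, ColumnDataSource
from bokeh.palettes import Spectral11
from bokeh.plotting import figure

# Made-up data standing in for the question's dummy_df (names are illustrative)
x = np.arange(100)
df = pd.DataFrame({"x": x, **{f"feature_{i}": np.random.rand(len(x)) for i in range(1, 5)}})
features = [c for c in df.columns if c != "x"]

# All columns go into the data source once, up front
source = ColumnDataSource(df)
fig = figure()
checkbox = CheckboxGroup(labels=features, active=[0, 1])

# One line glyph per column; only the initially checked ones start visible
renderers = []
for i, name in enumerate(features):
    r = fig.line(x="x", y=name, source=source, line_width=2, color=Spectral11[i % 11])
    r.visible = i in checkbox.active
    renderers.append(r)

def update(attr, old, new):
    # `new` is the list of checked indices of the CheckboxGroup
    for i, r in enumerate(renderers):
        r.visible = i in new

checkbox.on_change("active", update)
curdoc().add_root(row(checkbox, fig))
```

Run with bokeh serve, ticking a box then shows or hides the matching line without touching the data source.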

Related

How to add traces in plotly.express

I am very new to python and plotly.express, and I find it very confusing...
I am trying to use the principle of adding different traces to my figure, using example code shown here https://plotly.com/python/line-charts/, Line Plot Modes, #Create traces.
BUT I get my data from a .CSV file.
import plotly.express as px
import plotly as plotly
import plotly.graph_objs as go
import pandas as pd
data = pd.read_csv(r"C:\Users\x.csv")
fig = px.scatter(data, x="Time", y="OD", color="C-source", size="C:A 1 ratio")
fig = px.line(data, x="Time", y="OD", color="C-source")
fig.show()
The above lines produce scatter/line plots with the correct data, but the data is mixed together. I have data from 2 different sources, marked by a column named "Strain" in my .csv file, that I would like the chart to reflect.
Is the traces option a possible way to do it, or is there another way?
You can add traces using an Express plot by using .select_traces(). Something like:
fig.add_traces(
list(px.line(...).select_traces())
)
Note the need to convert to list, since .select_traces() returns a generator.
It looks like you probably want the lines with the scatter dots as well on a single plot?
You're setting fig to equal px.scatter() and then setting (changing) it to equal px.line(). When set to line, the scatter plot is overwritten.
You're already importing graph objects so you can use add_trace with go, something like this:
fig.add_trace(go.Scatter(x=data["Time"], y=data["OD"], mode='markers', marker=dict(color=data["C-source"], size=data["C:A 1 ratio"])))
Depending on how your data is set up, you may need to add each C-source separately doing something like:
x=data.query("`C-source` == 'Term'")["Time"], ... , name='Term'
Here's a few references with examples and options you can use to set up your scatter:
Scatter plot examples  
Marker styles  
Scatter arguments and attributes
You can use the approach stated in Plotly: How to combine scatter and line plots using Plotly Express?
fig3 = go.Figure(data=fig1.data + fig2.data)
fig1.data and fig2.data are ordinary tuples that hold all the info needed for a plot, and the + just concatenates them.
A more convenient and scalable approach:
import functools
import operator
import plotly.express as px
import plotly.graph_objects as go
# this will hold all figures until they are combined
all_figures = []
# data_collection: dictionary with Pandas dataframes
for df_label in data_collection:
    df = data_collection[df_label]
    fig = px.line(df, x='Date', y=['Value'])
    all_figures.append(fig)
# now you can concatenate all the data tuples
# by using the programmatic add operator
fig3 = go.Figure(data=functools.reduce(operator.add, [_.data for _ in all_figures]))
fig3.show()
Thanks for taking the time to help me out. I ended up with two solutions that worked, of which using "facet_col" to divide the plot into two subplots (one for each strain) was the simplest.
https://plotly.com/python/axes/
Thanks. This also worked for me, where Fig_Set_B is a list of scatter plots:
# create a tuple of the first line plots in the first 6 plots from plot set Fig_Set_B
fig_combined = go.Figure(data= tuple(Fig_Set_B[x].data[0] for x in range(6)) )
fig_combined.show()

How to render seaborn objects repeatedly?

The version of python I am using is 3.7. I tried it both in Spyder and JupyterNotebook
I used a sns.dataset as an example.
As I run the following code, the figure will be automatically rendered in IPython console without using plt.show() which is different from some instructions in previous posts.
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
df = sns.load_dataset('iris')
g = sns.pairplot(df, hue = 'species', height = 2.5)
However, I want to repeatedly show the seaborn object. How can I render g?
I've tried
plt.show(g)
g.show()
etc...
but none of them works. I do not want to have to re-plot the figure every time I want to display it.
As long as you put the previously created figure object on the last line of a new cell, the figure is rendered again, together with any elements added since.
In your case, if g = ... is in your first cell, add f = plt.gcf() there to get the Figure object, then put f on the last line of any later cell.

Bokeh ValueError: expected an element of either Seq(String)

I'm trying to build a simple bar chart via bokeh but struggling for it to recognize the x-axis and keep getting a ValueError... I think it needs to be in string format but for some reason whatever I try it just won't work. Please note, the column that contains the Years (as floats by the looks of it) is called RegionName, if it seems confusing. Please see my code below, any suggestions?
import pandas as pd
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource
from bokeh.models.tools import HoverTool
import os
from bokeh.palettes import Spectral5
from bokeh.transform import factor_cmap
os.chdir("C:/Users/Vladimir.Tikhnenko/Python/Land Reg")
# Pivot data
def pivot2(infile="Land Registry.csv", outfile="SalesVolume.csv"):
    df=pd.read_csv(infile)
    table=pd.pivot_table(df,index=["RegionName"],columns="Year",values="SalesVolume",aggfunc=sum)
    table.to_csv(outfile)
    return table
pivot2()
# Transpose data
df=pd.read_csv("SalesVolume.csv")
df=df.drop(df.columns[1:28],1)
df=pd.read_csv("SalesVolume.csv", index_col=0, header=None).T
df.to_csv("C:\\Users\Vladimir.Tikhnenko\Python\Land Reg\SalesVolume.csv",index=None)
df=pd.read_csv("SalesVolume.csv")
source = ColumnDataSource(df)
years = source.data['RegionName'].tolist()
p = figure(x_range=['RegionName'])
color_map = factor_cmap(field_name='RegionName',palette=Spectral5,factors=years)
p.vbar(x='RegionName', top='Southwark', source=source, width=1,color=color_map)
p.title.text ='Transactions'
p.xaxis.axis_label = 'Years'
p.yaxis.axis_label = 'Number of Sales'
show(p)
the error message is
ValueError: expected an element of either Seq(String), Seq(Tuple(String,
String)) or Seq(Tuple(String, String, String)), got [1968.0, 1969.0, 1970.0,
1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977.0, 1978.0, 1979.0,
1980.0, 1981.0, 1982.0, 1983.0, 1984.0, 1985.0, 1986.0, 1987.0, 1988.0,
1989.0, 1990.0, 1991.0, 1992.0, 1993.0, 1994.0, 1995.0, 1996.0, 1997.0,
1998.0, 1999.0, 2000.0, 2001.0, 2002.0, 2003.0, 2004.0, 2005.0, 2006.0,
2007.0, 2008.0, 2009.0, 2010.0, 2011.0, 2012.0, 2013.0, 2014.0, 2015.0,
2016.0, 2017.0, 2018.0]
Categorical factors must only be strings (or sequences of strings for nested factors), so factor_cmap only accepts lists of those things. You passed it a list of numbers, which causes the error shown. To use the years as categorical factors, you need to convert them to strings as suggested, and use those string values both to initialize x_range and as the coordinates for vbar.
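A sketch of the categorical route, assuming a small made-up table in place of the pivoted sales data (the numbers are invented; the column names follow the question):

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.palettes import Spectral5
from bokeh.plotting import figure
from bokeh.transform import factor_cmap

# Made-up values standing in for the pivoted sales table
df = pd.DataFrame({
    "RegionName": [2014.0, 2015.0, 2016.0, 2017.0, 2018.0],
    "Southwark": [4200, 4500, 3900, 3600, 3300],
})

# Convert the float years to strings such as "2014" so they can act as factors
df["RegionName"] = df["RegionName"].astype(int).astype(str)
years = df["RegionName"].tolist()

source = ColumnDataSource(df)
p = figure(x_range=years)  # pass the list of factors, not the column name
p.vbar(x="RegionName", top="Southwark", source=source, width=0.9,
       color=factor_cmap("RegionName", palette=Spectral5, factors=years))
```

The same list of strings is used for x_range, for the factors of factor_cmap, and as the x column read by vbar.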
Alternatively, if you want to use numerical values for the years, but just want to have fixed, controlled tick locations, do this:
p = figure() # don't pass x_range
p.xaxis.ticker = years
And then also use linear_cmap (instead of factor_cmap) to map the numerical values.
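A sketch of the numeric route, again with invented numbers and with Viridis256 chosen only as an example palette:

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.palettes import Viridis256
from bokeh.plotting import figure
from bokeh.transform import linear_cmap

# Made-up values standing in for the pivoted sales table
df = pd.DataFrame({
    "RegionName": [2014.0, 2015.0, 2016.0, 2017.0, 2018.0],
    "Southwark": [4200, 4500, 3900, 3600, 3300],
})
years = df["RegionName"].tolist()
source = ColumnDataSource(df)

p = figure()            # no x_range, so the x axis stays numeric
p.xaxis.ticker = years  # fixed tick locations, one per year

# Map the numeric year values onto a continuous palette
mapper = linear_cmap("RegionName", palette=Viridis256, low=min(years), high=max(years))
p.vbar(x="RegionName", top="Southwark", source=source, width=0.9, color=mapper)
```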

Optimal way to display data with different ranges

I have an application that pulls data from an FPGA and displays it for the engineers. It works well until you start displaying signals with extremely different ranges,
say a signal fluctuating around +4000 and another around zero (both with small peak-to-peak amplitude).
At the moment the only real workaround is to export to CSV and view the data in Excel, but I would like to improve the application so that this isn't needed.
Option 1 is a more dynamic pointer that gives readings of ALL visible plots for the present x.
Option 2 is multiple Y axes. This is where it gets a bit tight with respect to UI area.
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import host_subplot
import mpl_toolkits.axisartist as AA
import numpy as np
t = np.arange(0,1,0.00001)
data = [5000*np.sin(t*2*np.pi*10),
        10*np.sin(t*2*np.pi*20),
        20*np.sin(t*2*np.pi*30),
        np.sin(t*2*np.pi*40)+5000,
        np.sin(t*2*np.pi*50)-5000,
        np.sin(t*2*np.pi*60),
        np.sin(t*2*np.pi*70),
        ]
fig = plt.figure()
host = host_subplot(111, axes_class=AA.Axes)
axis_list = [None]*7
for i in range(len(axis_list)):
    axis_list[i] = host.twinx()
    new_axis = axis_list[i].get_grid_helper().new_fixed_axis
    axis_list[i].axis['right'] = new_axis(loc='right',
                                          axes=axis_list[i],
                                          offset=(60*i,0))
    axis_list[i].axis['right'].toggle(all=True)
    axis_list[i].plot(t,data[i])
plt.show()
for i in data:
    plt.plot(t,i)
plt.show()
This code snippet doesn't resize the figure to ensure all 7 y-axes are visible, but ignoring that, you can see it is quite large...
Any advice with respect to multi-Y or a better solution to displaying no more than 7 datasets?

Matplotlib: Import and plot multiple time series with legends direct from .csv

I have several spreadsheets containing data saved as comma delimited (.csv) files in the following format: The first row contains column labels as strings ('Time', 'Parameter_1'...). The first column of data is Time and each subsequent column contains the corresponding parameter data, as a float or integer.
I want to plot each parameter against Time on the same plot, with parameter legends which are derived directly from the first row of the .csv file.
My spreadsheets have different numbers of (columns of) parameters to be plotted against Time; so I'd like to find a generic solution which will also derive the number of columns directly from the .csv file.
The attached minimal working example shows what I'm trying to achieve using np.loadtxt (minus the legend); but I can't find a way to import the column labels from the .csv file to make the legends using this approach.
np.genfromtxt offers more functionality, but I'm not familiar with this and am struggling to find a way of using it to do the above.
Plotting data in this style from .csv files must be a common problem, but I've been unable to find a solution on the web. I'd be very grateful for your help & suggestions.
Many thanks
"""
Example data: Data.csv:
Time,Parameter_1,Parameter_2,Parameter_3
0,10,0,10
1,20,30,10
2,40,20,20
3,20,10,30
"""
import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt('Data.csv', skiprows=1, delimiter=',') # skip the column labels
cols = data.shape[1] # get the number of columns in the array
for n in range (1,cols):
    plt.plot(data[:,0],data[:,n]) # plot each parameter against time
plt.xlabel('Time',fontsize=14)
plt.ylabel('Parameter values',fontsize=14)
plt.show()
Here's my minimal working example for the above using genfromtxt rather than loadtxt, in case it is helpful for anyone else.
I'm sure there are more concise and elegant ways of doing this (I'm always happy to get constructive criticism on how to improve my coding), but it makes sense and works OK:
import numpy as np
import matplotlib.pyplot as plt
arr = np.genfromtxt('Data.csv', delimiter=',', dtype=None) # dtype=None automatically defines appropriate format (e.g. string, int, etc.) based on cell contents
names = (arr[0]) # select the first row of data = column names
for n in range (1,len(names)): # plot each column in turn against column 0 (= time)
    # omit the first row (= column names); the values come back as strings because of that row, so convert them to float
    plt.plot (arr[1:,0].astype(float),arr[1:,n].astype(float),label=names[n])
plt.legend()
plt.show()
The function numpy.genfromtxt is more for broken tables with missing values rather than what you're trying to do. What you can do is simply open the file before handing it to numpy.loadtxt and read the first line. Then you don't even need to skip it. Here is an edited version of what you have here above that reads the labels and makes the legend:
"""
Example data: Data.csv:
Time,Parameter_1,Parameter_2,Parameter_3
0,10,0,10
1,20,30,10
2,40,20,20
3,20,10,30
"""
import numpy as np
import matplotlib.pyplot as plt
#open the file
with open('Data.csv') as f:
    #read the names of the columns first
    names = f.readline().strip().split(',')
    #np.loadtxt can also handle already open files
    data = np.loadtxt(f, delimiter=',') # no skip needed anymore
cols = data.shape[1]
for n in range (1,cols):
    #labels go in here
    plt.plot(data[:,0],data[:,n],label=names[n])
plt.xlabel('Time',fontsize=14)
plt.ylabel('Parameter values',fontsize=14)
#And finally the legend is made
plt.legend()
plt.show()

Resources