slider.value values not getting updated using ColumnDataSource(Dataframe).data - python-3.x

I have been working on COVID19 analysis for a dashboard and am using a JSON data source. I have converted the json to dataframe. I am working on plotting bar chart for "Days to reach deaths" over a "States" x-axis (categorical values). I am trying to use a function to update the slider.value. Upon running the bokeh serve with --log-level=DEBUG, I am getting a following error:
Can someone provide me with any direction or help with what might be causing the issue as I am new to Python and any help is appreciated? Or if there's any other alternative.
Please find the code below:
cases_summary = requests.get('https://api.rootnet.in/covid19-in/stats/history')
json_data = cases_summary.json()
#Data Cleaning
cases_summary=pd.json_normalize(json_data['data'], record_path='regional', meta='day')
cases_summary['loc']=np.where(cases_summary['loc']=='Nagaland#', 'Nagaland', cases_summary['loc'])
cases_summary['loc']=np.where(cases_summary['loc']=='Madhya Pradesh#', 'Madhya Pradesh', cases_summary['loc'])
cases_summary['loc']=np.where(cases_summary['loc']=='Jharkhand#', 'Jharkhand', cases_summary['loc'])
#Calculate cumulative days since 1st case for each state
cases_summary['day_count']=(cases_summary['day'].groupby(cases_summary['loc']).cumcount())+1
#Initial plot for default slider value=35
days_reach_death_count=cases_summary.loc[(cases_summary['deaths']>=35)].groupby(cases_summary['loc']).head(1).reset_index()
slider = Slider(start=10, end=max(cases_summary['deaths']), value=35, step=10, title="Total Deaths")
source = ColumnDataSource(data=dict(days_reach_death_count[['loc','day_count', 'deaths']]))
q = figure(x_range=days_reach_death_count['loc'], plot_width=1200, plot_height=600, sizing_mode="scale_both")
q.title.align = 'center'
q.title.text_font_size = '17px'
q.xaxis.axis_label = 'State'
q.yaxis.axis_label = 'Days since 1st Case'
q.xaxis.major_label_orientation = math.pi/2
q.vbar('loc', top='day_count', width=0.9, source=source)
deaths = slider.value
q.title.text = 'Days to reach %d Deaths' % deaths
hover = HoverTool(line_policy='next')
hover.tooltips = [('State', '#loc'),
('Days since 1st Case', '#day_count'), # #$name gives the value corresponding to the legend
('Deaths', '#deaths')
]
q.add_tools(hover)
def update(attr, old, new):
days_death_count = cases_summary.loc[(cases_summary['deaths'] >= slider.value)].groupby(cases_summary['loc']).head(1).reindex()
source.data = [ColumnDataSource().from_df(days_death_count)]
slider.on_change('value', update)
layout = row(q, slider)
tab = Panel(child=layout, title="New Confirmed Cases since Day 1")
tabs= Tabs(tabs=[tab])
curdoc().add_root(tabs)

Your code has 2 issues
(critical) source.data must be a dictionary, but you're assigning it an array
(minor) from_df is a class method, you don't have to construct an object of it
Try using source.data = ColumnDataSource.from_df(days_death_count) instead.

Related

networkx output scale problem with matplotlib (re-post)

I'm re-posting this question since I didn't make a good example code in last question.
I'm trying to make a nodes to set in specific location.
But I found out that the output drawing is not... fixed. Let me show you the pic.
So this is the one I make with 10 nodes. worked perfectly as I intended.
Also it has plt.text on the bottom left.
And here's the other picture
As you can see, something is wrong. plt.text is gone, and USA's location is weird. Actually that location is where DEU is located in the first pic. Both pics use same code.
Now, let me show you some of my code.
for spec_df, please download from my gdrive:
https://drive.google.com/drive/folders/11X_i5-pRLGBfQ9vIwQ3hfDU5EWIfR3Uo?usp=sharing
auto_flag = 0
spec_df=pd.read_stata("C:\\"Your_file_loc"\\CombinedHS6_example.dta")
#top_10_list = ["USA","CHN","KOR"] (Try for three nodes)
#or
#auto_flag = 1 (Try for 10 nodes)
df_p = spec_df[['partneriso3','tradevalue']]
df_p = df_p.groupby('partneriso3').sum().reset_index()
df_r = spec_df[['reporteriso3','tradevalue']]
df_r = df_r.groupby('reporteriso3').sum().reset_index()
df_r = df_r.rename(columns={'reporteriso3': 'Nation'})
df_r = df_r.rename(columns={'tradevalue': 'tradevalue_r'})
df_p = df_p.rename(columns={'partneriso3': 'Nation'})
df_s = pd.merge(df_r, df_p, on='Nation', how='outer').fillna(0)
df_s["final"] = df_s['tradevalue'] + df_s['tradevalue_r']
if auto_flag == 1:
df_s = df_s.sort_values(by=['final'], ascending = False).reset_index()
cut = df_s[:10]
else:
cut = df_s[(df_s['Nation'].isin(top_10_list))]
cut['final'] = cut['final'].apply(lambda x: normalize(x, cut['final'].max()))
cut['font_size'] = cut['final'] * 13
cut['final'] = cut['final'] * 1500
top_10_list = list(cut["Nation"])
top10 = spec_df[(spec_df['reporteriso3'].isin(top_10_list))&(spec_df['partneriso3'].isin(top_10_list))]
top10['tradevalue'] = top10['tradevalue'].apply(lambda x: normalize(x, top10['tradevalue'].max()))
top10['tradevalue'] = top10['tradevalue']*10
plt.figure(figsize=(10,10), dpi = 100)
G = nx.from_pandas_edgelist(top10, 'reporteriso3', 'partneriso3', 'tradevalue', create_using= nx.DiGraph())
widths = nx.get_edge_attributes(G,'tradevalue')
pos = {}
pos_cord = [(-0.30779309, -0.26419882), (0.26767895, 0.19524759), (-0.38479095, 0.88179998), (0.33785317, 0.96090914), (0.94090464, 0.40707934), (0.9270665, -0.38403114), (0.41246223, -0.85684049), (-0.32083322, -1.0), (-0.99724456, -0.34947554), (-0.87530367, 0.40950993)]
for t in range(len(top_10_list)):
if top_10_list == "":
continue
else:
pos[top_10_list[t]] = pos_cord[t]
pos_nodes = nudge(pos, 0, 0.12)
nx.draw_networkx_edges(G,pos, width=list(widths.values()), edge_color = '#9ECAE4')
nx.draw_networkx_nodes(G, pos=pos, nodelist = cut['Nation'], node_size= cut['final'], node_color ='#AB89EF', edgecolors ='#000000')
nx.draw_networkx_labels(G,pos_nodes, font_size=15)
plt.text(-1.15,-1.15,s='hs : ')
plt.savefig(location,dpi=300)
Sorry for the crude code. But I want to ask that I'm using fixed coordinates. So nodes are not supposed to move there location. So I think the plt's size is kinda interacting with the contents...? But I don't know how it does that.
Could anyone enlighten me please? This drives me crazy...
Thanks to #Paul Brodersen's comment, I found a way to fix the location.
I just added these codes in my codes.
fig = plt.figure(figsize=(10,10), dpi = 100)
axes = fig.add_axes([0,0,1,1])
axes.set_xlim([-1.3,1.3])
axes.set_ylim([-1.3,1.3])
Thank you for the help again!

Jupyter Notebook HOw can i assign a default value to datepicker and sort the time within the selected date?

I can try to set condition some data that is within the time picker in the datepicker. I got two problem
MY code was
def show_filter_date(start ,end):
print(start_dater.value)
print(end_dater.value)
time_df = (id_one[(id_one['_source.timestampstring']>pd.to_datetime(start))&(id_one['_source.timestampstring']<pd.to_datetime(end))])
#print(time_df.head(20))
# time_df = (id_one[(id_one['_source.timestampstring']>pd.to_datetime(start_dater.value))&(id_one['_source.timestampstring']<pd.to_datetime(end_dater.value))])
time_df.head(20)
layout = widgets.Layout(width='auto', height='40px')
start_dater = widgets.DatePicker(description='Pick a Start Date',disabled=False)
end_dater = widgets.DatePicker(description='Pick an End Date',disabled=False )
#display(widgets.HBox((start_dater, end_dater)))
#display(start_dater)
#display(end_dater)
#id_one.head()
#combine_date = widgets.HBox((start = start_dater, end = end_dater))
#country_selector = widgets.Dropdown(
interact(show_filter_date,start = start_dater , end = end_dater)
everytime i run the code it show Error
"Invalid comparison between dtype=datetime64[ns] and NoneType"
I have tried to assign default value like
start_dater = widgets.DatePicker(description='Pick a Start Date',disabled=False, year = 2020 ,month = 12, day = 1)
but it won't change to 2020/12/01
So, how can I get a value other than null?
I fail in interact for the datepicker in which
A) print(time_df.head(20))
B)
time_df = (id_one[(id_one['_source.timestampstring']>pd.to_datetime(start_dater.value))&(id_one['_source.timestampstring']<pd.to_datetime(end_dater.value))])
time_df.head(20)
Only (A) can be "interact" or "refresh" when I pick a day but not (B)
And for Question 2, when I put time_df.head(20) in the NEXT CELL it does work tho.......
But what i want is to show the result like in time_df
I would appreciate if any help
the id_one is something like
Index _source.hdrrId _source.hdrfId _source.hdrType \
199 1300 1234 1
_source.timestampstring
199 2020-11-06 09:36:04.800
Thanks!
Jeff
I replicated your first error with
id_one = pd.DataFrame(pd.date_range('20200101','20200202'), columns = ['_source.timestampstring'])
This is because you did not set a default value for the DatePicker. The value is None by default hence the error. Here is the fix (value argument is the default value):
start_dater = widgets.DatePicker(description='Pick a Start Date',disabled=False, value = datetime.date(2020,1,1))
end_dater = widgets.DatePicker(description='Pick an End Date',disabled=False, value = datetime.date(2020,2,1))
I cannot replicate your second error. My guess is you put print before the time_df = statement. You need to put the print after the line that calculates time_df

How can I use stamps to create a shape in turtle?

I tried to do this
def ship_merge(*ship_parts, translate_parts = {'part', (x, y)}):
merge.pu()
merge.home()
merge.begin_poly()
for part in ship_parts:
merge.home()
merge.goto(translate_parts[part])
merge.shape(part)
merge.stamp()
merge.end_poly()
merged_shape = merge.get_poly()
name = 'Ship 1'
win.register_shape(name, merged_shape)
new_ship = turtle.Turtle()
new_ship.shape(name)
return new_ship
But the 'new_ship' turtle has no shape. I think that it might be the result of the stamps not registering between 'begin_poly' and 'end_poly'. How do I fix this?

Pygal bar chart says “No data”

I am trying to create a bar graph in pygal that uses the api for hacker news and charts the most active news based on comments. I posted my code below, but I cannot figure out why my graph keep saying "No data"??? Any suggestions? Thanks!
import requests
import pygal
from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS
from operator import itemgetter
# Make an API call, and store the response.
url = 'https://hacker-news.firebaseio.com/v0/topstories.json'
r = requests.get(url)
print("Status code:", r.status_code)
# Process information about each submission.
submission_ids = r.json()
submission_dicts = []
for submission_id in submission_ids[:30]:
# Make a separate API call for each submission.
url = ('https://hacker-news.firebaseio.com/v0/item/' +
str(submission_id) + '.json')
submission_r = requests.get(url)
print(submission_r.status_code)
response_dict = submission_r.json()
submission_dict = {
'comments': int(response_dict.get('descendants', 0)),
'title': response_dict['title'],
'link': 'http://news.ycombinator.com/item?id=' + str(submission_id),
}
submission_dicts.append(submission_dict)
# Visualization
my_style = LS('#336699', base_style=LCS)
my_config = pygal.Config()
my_config.show_legend = False
my_config.title_font_size = 24
my_config.label_font_size = 14
my_config.major_label_font_size = 18
my_config.show_y_guides = False
my_config.width = 1000
chart = pygal.Bar(my_config, style=my_style)
chart.title = 'Most Active News on Hacker News'
chart.add('', submission_dicts)
chart.render_to_file('hn_submissons_repos.svg')
The values in the array passed to the add function need to be either numbers or dicts that contain the key value (or a mixture of the two). The simplest solution would be to change the keys used when creating submission_dict:
submission_dict = {
'value': int(response_dict.get('descendants', 0)),
'label': response_dict['title'],
'xlink': 'http://news.ycombinator.com/item?id=' + str(submission_id),
}
Notice that link has become xlink, this is one of the optional parameters that are defined in the Value Configuration section of the pygal docs.

Updating multiple line plots dynamically in callback in bokeh

I have a use case where I have multiple line plots (with legends), and I need to update the line plots based on a column condition. Below is an example of two data set, based on the country, the column data source changes. But the issue I am facing is, the number of columns is not fixed for the data source, and even the types can vary. So, when I update the data source based on a callback when there is a new country selected, I get this error:
Error: attempted to retrieve property array for nonexistent field 'pay_conv_7d.content'.
I am guessing because in the new data source, the pay_conv_7d.content column doesn't exist, but in my plot those lines were already there. I have been trying to fix this issue by various means (making common columns for all country selection - adding the missing column in the data source in callback, but still get issues.
Is there any clean way to have multiple line plots updating using callback, and not do a lot of hackish way? Any insights or help would be really appreciated. Thanks much in advance! :)
def setup_multiline_plots(x_axis, y_axis, title_text, data_source, plot):
num_categories = len(data_source.data['categories'])
legends_list = list(data_source.data['categories'])
colors_list = Spectral11[0:num_categories]
# xs = [data_source.data['%s.'%x_axis].values] * num_categories
# ys = [data_source.data[('%s.%s')%(y_axis,column)] for column in data_source.data['categories']]
# data_source.data['x_series'] = xs
# data_source.data['y_series'] = ys
# plot.multi_line('x_series', 'y_series', line_color=colors_list,legend='categories', line_width=3, source=data_source)
plot_list = []
for (colr, leg, column) in zip(colors_list, legends_list, data_source.data['categories']):
xs, ys = '%s.'%x_axis, ('%s.%s')%(y_axis,column)
plot.line(xs,ys, source=data_source, color=colr, legend=leg, line_width=3, name=ys)
plot_list.append(ys)
data_source.data['plot_names'] = data_source.data.get('plot_names',[]) + plot_list
plot.title.text = title_text
def update_plot(country, timeseries_df, timeseries_source,
aggregate_df, aggregate_source, category,
plot_pay_7d, plot_r_pay_90d):
aggregate_metrics = aggregate_df.loc[aggregate_df.country == country]
aggregate_metrics = aggregate_metrics.nlargest(10, 'cost')
category_types = list(aggregate_metrics[category].unique())
timeseries_df = timeseries_df[timeseries_df[category].isin(category_types)]
timeseries_multi_line_metrics = get_multiline_column_datasource(timeseries_df, category, country)
# len_series = len(timeseries_multi_line_metrics.data['time.'])
# previous_legends = timeseries_source.data['plot_names']
# current_legends = timeseries_multi_line_metrics.data.keys()
# common_legends = list(set(previous_legends) & set(current_legends))
# additional_legends_list = list(set(previous_legends) - set(current_legends))
# for legend in additional_legends_list:
# zeros = pd.Series(np.array([0] * len_series), name=legend)
# timeseries_multi_line_metrics.add(zeros, legend)
# timeseries_multi_line_metrics.data['plot_names'] = previous_legends
timeseries_source.data = timeseries_multi_line_metrics.data
aggregate_source.data = aggregate_source.from_df(aggregate_metrics)
def get_multiline_column_datasource(df, category, country):
df_country = df[df.country == country]
df_pivoted = pd.DataFrame(df_country.pivot_table(index='time', columns=category, aggfunc=np.sum).reset_index())
df_pivoted.columns = df_pivoted.columns.to_series().str.join('.')
categories = list(set([column.split('.')[1] for column in list(df_pivoted.columns)]))[1:]
data_source = ColumnDataSource(df_pivoted)
data_source.data['categories'] = categories
Recently I had to update data on a Multiline glyph. Check my question if you want to take a look at my algorithm.
I think you can update a ColumnDataSource in three ways at least:
You can create a dataframe to instantiate a new CDS
cds = ColumnDataSource(df_pivoted)
data_source.data = cds.data
You can create a dictionary and assign it to the data attribute directly
d = {
'xs0': [[7.0, 986.0], [17.0, 6.0], [7.0, 67.0]],
'ys0': [[79.0, 69.0], [179.0, 169.0], [729.0, 69.0]],
'xs1': [[17.0, 166.0], [17.0, 116.0], [17.0, 126.0]],
'ys1': [[179.0, 169.0], [179.0, 1169.0], [1729.0, 169.0]],
'xs2': [[27.0, 276.0], [27.0, 216.0], [27.0, 226.0]],
'ys2': [[279.0, 269.0], [279.0, 2619.0], [2579.0, 2569.0]]
}
data_source.data = d
Here if you need different sizes of columns or empty columns you can fill the gaps with NaN values in order to keep column sizes. And I think this is the solution to your question:
import numpy as np
d = {
'xs0': [[7.0, 986.0], [17.0, 6.0], [7.0, 67.0]],
'ys0': [[79.0, 69.0], [179.0, 169.0], [729.0, 69.0]],
'xs1': [[17.0, 166.0], [np.nan], [np.nan]],
'ys1': [[179.0, 169.0], [np.nan], [np.nan]],
'xs2': [[np.nan], [np.nan], [np.nan]],
'ys2': [[np.nan], [np.nan], [np.nan]]
}
data_source.data = d
Or if you only need to modify a few values then you can use the method patch. Check the documentation here.
The following example shows how to patch entire column elements. In this case,
source = ColumnDataSource(data=dict(foo=[10, 20, 30], bar=[100, 200, 300]))
patches = {
'foo' : [ (slice(2), [11, 12]) ],
'bar' : [ (0, 101), (2, 301) ],
}
source.patch(patches)
After this operation, the value of the source.data will be:
dict(foo=[11, 22, 30], bar=[101, 200, 301])
NOTE: It is important to make the update in one go to avoid performance issues

Resources