How to update the date range on X-Axis with python-pptx - python-3.x

I have a multi-line chart whose data I'm trying to update. I can change the values of the data series (1 to 5 in my case) using a dataframe, but I can't figure out how to change the range of the category axis.
Currently the date range starts from 2010, and I can't figure out how to update it dynamically based on the input data.
My chart is as shown below:
My chart data is as below:
My code is as below:
import pandas as pd
from pptx import Presentation
from pptx.chart.data import CategoryChartData, ChartData
df = pd.DataFrame({
'Date':['2010-01-01','2010-02-01','2010-03-01','2010-04-01','2010-05-01'],
'Series 1': [0.262918, 0.259484,0.263314,0.262108,0.252113],
'Series 2': [0.372340,0.368741,0.375740,0.386040,0.388732],
'Series 3': [0.109422,0.109256,0.112426,0.123932,0.136620],
'Series 4': [0.109422,0.109256,0.112426,0.123932,0.136620], # copy of series 3 for easy testing
'Series 5': [0.109422,0.109256,0.112426,0.123932,0.136620], # copy of series 3 for easy testing
})
prs = Presentation(presentation_path)
def update_multiline(chart,df):
    plot = chart.plots[0]
    category_labels = [c.label for c in plot.categories]
    # series = plot.series[0]
    chart_data = CategoryChartData()
    chart_data.categories = [c.label for c in plot.categories]
    category_axis = chart.category_axis
    category_axis.minimum_scale = 1 # this should be a date
    category_axis.minimum_scale = 100 # this should be a date
    tick_labels = category_axis.tick_labels
    df = df.drop(columns=['Date'])
    for index in range(df.shape[1]):
        columnSeriesObj = df.iloc[:, index]
        chart_data.add_series(plot.series[index].name, columnSeriesObj)
    chart.replace_data(chart_data)
# ================================ slide index 3 =============================================
slide_3 = prs.slides[3]
slide_3_title = slide_3.shapes.title # assigning a title
graphic_frame = slide_3.shapes
# slide has only one chart and that's the 3rd shape, hence graphic_frame[2]
slide_3_chart = graphic_frame[2].chart
update_multiline(slide_3_chart, df)
prs.save(output_path)
How do I update the date range if the dates in my dataframe start from, say, 2015, i.e. 'Date':['2015-01-01','2015-02-01','2015-03-01','2015-04-01','2015-05-01']?

You are simply copying the categories of the old chart into the new chart with:
chart_data.categories = [c.label for c in plot.categories]
You must draw the category labels from the dataframe if you expect them to change.
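A minimal sketch of that fix, assuming the dates arrive as ISO 'YYYY-MM-DD' strings (as in the question's dataframe) and that python-pptx is installed. The helper name categories_from_dates and the series_values mapping are illustrative and not part of python-pptx; the python-pptx import is deferred into the function so the date conversion can be used on its own.

```python
from datetime import date


def categories_from_dates(date_strings):
    """Convert ISO 'YYYY-MM-DD' strings into datetime.date category labels."""
    return [date.fromisoformat(s) for s in date_strings]


def update_multiline(chart, date_strings, series_values):
    """Replace the chart data, taking categories from the new data.

    series_values is an assumed mapping of series name -> list of numbers.
    """
    # Imported here so the helper above works without python-pptx installed.
    from pptx.chart.data import CategoryChartData

    chart_data = CategoryChartData()
    # Categories come from the incoming data, not from plot.categories.
    chart_data.categories = categories_from_dates(date_strings)
    for name, values in series_values.items():
        chart_data.add_series(name, list(values))
    chart.replace_data(chart_data)
```

With the question's dataframe this could be called as update_multiline(slide_3_chart, df['Date'], df.drop(columns=['Date']).to_dict('list')). Note that this names the series after the dataframe columns rather than after the existing plot series. Whether the axis then rescales as expected may also depend on how the axis is configured in the template, so the result is worth checking in PowerPoint.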

Related

python plotly choropleth dropdown + time slider

I am trying to plot a choropleth map with a drop-down menu to select multiple variables and also show each variable with a time slider to display data over time.
import plotly
import numpy as np
import pandas as pd
plotly.offline.init_notebook_mode()
# Reading sample data using pandas DataFrame
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/2011_us_ag_exports.csv')
data = [dict(type='choropleth',
             locations=df['code'].astype(str),
             z=df['total exports'].astype(float),
             locationmode='USA-states')]
# let's create some additional data
for i in range(5):
    data.append(data[0].copy())
    data[-1]['z'] = data[0]['z'] * np.random.rand(*data[0]['z'].shape)
# let's now create a slider for the map
steps = []
for i in range(len(data)):
    step = dict(method='restyle',
                args=['visible', [False] * len(data)],
                label='Year {}'.format(i + 1980))
    step['args'][1][i] = True
    steps.append(step)
slider = [dict(active=0,
               pad={"t": 1},
               steps=steps)]
layout = dict(geo=dict(scope='usa',
                       projection={'type': 'albers usa'}),
              sliders=slider)
fig = dict(data=data,
           layout=layout)
plotly.offline.iplot(fig)
This example is taken from: here
Any ideas how to add another column as data points?

Selecting data for a specific coordinate, openpyxl

First, a sample of my data:
I need to plot a line chart with years on the X axis and people's height on the Y axis:
But this is what I get instead:
To plot the chart I'm using the code below:
from openpyxl.chart import LineChart, Reference
chart = LineChart()
chart.title='Average height'
chart.y_axis.title = 'Height'
chart.x_axis.title = 'Year'
values = Reference(working_sheet, min_col = 7, min_row = 1, max_col=9, max_row = 102)
chart.add_data(values, titles_from_data=True)
working_sheet.add_chart(chart, 'K2')
So I need to know how to select the data for a specific axis using openpyxl.

Why is Bokeh's plot not changing with plot selection?

Struggling to understand why this bokeh visual will not allow me to change plots and see the predicted data. The plot and select (dropdown-looking) menu appears, but I'm not able to change the plot for items in the menu.
Running Bokeh 1.2.0 via Anaconda. The code has been run both inside & outside of Jupyter. No errors display when the code is run. I've looked through the handful of SO posts relating to this same issue, but I've not been able to apply the same solutions successfully.
I wasn't sure how to create a toy problem out of this, so in addition to the code sample below, the full code (including the regression code and corresponding data) can be found at my github here (code: Regression&Plotting.ipynb, data: pred_data.csv, historical_data.csv, features_created.pkd.)
import pandas as pd
import datetime
from bokeh.io import curdoc, output_notebook, output_file
from bokeh.layouts import row, column
from bokeh.models import Select, DataRange1d, ColumnDataSource
from bokeh.plotting import figure
#Must be run from the command line
def get_historical_data(src_hist, drug_id):
    historical_data = src_hist.loc[src_hist['ndc'] == drug_id]
    historical_data.drop(['Unnamed: 0', 'date'], inplace = True, axis = 1)#.dropna()
    historical_data['date'] = pd.to_datetime(historical_data[['year', 'month', 'day']], infer_datetime_format=True)
    historical_data = historical_data.set_index(['date'])
    historical_data.sort_index(inplace = True)
    # csd_historical = ColumnDataSource(historical_data)
    return historical_data
def get_prediction_data(src_test, drug_id):
    #Assign the new date
    #Write a new dataframe with values for the new dates
    df_pred = src_test.loc[src_test['ndc'] == drug_id].copy()
    df_pred.loc[:, 'year'] = input_date.year
    df_pred.loc[:, 'month'] = input_date.month
    df_pred.loc[:, 'day'] = input_date.day
    df_pred.drop(['Unnamed: 0', 'date'], inplace = True, axis = 1)
    prediction = lin_model.predict(df_pred)
    prediction_data = pd.DataFrame({'drug_id': prediction[0][0], 'predictions': prediction[0][1], 'date': pd.to_datetime(df_pred[['year', 'month', 'day']], infer_datetime_format=True, errors = 'coerce')})
    prediction_data = prediction_data.set_index(['date'])
    prediction_data.sort_index(inplace = True)
    # csd_prediction = ColumnDataSource(prediction_data)
    return prediction_data
def make_plot(historical_data, prediction_data, title):
    #Historical Data
    plot = figure(plot_width=800, plot_height = 800, x_axis_type = 'datetime',
                  toolbar_location = 'below')
    plot.xaxis.axis_label = 'Time'
    plot.yaxis.axis_label = 'Price ($)'
    plot.axis.axis_label_text_font_style = 'bold'
    plot.x_range = DataRange1d(range_padding = 0.0)
    plot.grid.grid_line_alpha = 0.3
    plot.title.text = title
    plot.line(x = 'date', y='nadac_per_unit', source = historical_data, line_color = 'blue', ) #plot historical data
    plot.line(x = 'date', y='predictions', source = prediction_data, line_color = 'red') #plot prediction data (line from last date/price point to date, price point for input_date above)
    return plot
def update_plot(attrname, old, new):
    ver = vselect.value
    new_hist_source = get_historical_data(src_hist, ver) #calls the function above to get the data instead of handling it here on its own
    historical_data.data = ColumnDataSource.from_df(new_hist_source)
    # new_pred_source = get_prediction_data(src_pred, ver)
    # prediction_data.data = new_pred_source.data
#Import data source
src_hist = pd.read_csv('data/historical_data.csv')
src_pred = pd.read_csv('data/pred_data.csv')
#Prep for default view
#Initialize plot with ID number
ver = 781593600
#Set the prediction date
input_date = datetime.datetime(2020, 3, 31) #Make this selectable in future
#Select-menu options
menu_options = src_pred['ndc'].astype(str) #already contains unique values
#Create select (dropdown) menu
vselect = Select(value=str(ver), title='Drug ID', options=sorted((menu_options)))
#Prep datasets for plotting
historical_data = get_historical_data(src_hist, ver)
prediction_data = get_prediction_data(src_pred, ver)
#Create a new plot with the source data
plot = make_plot(historical_data, prediction_data, "Drug Prices")
#Update the plot every time 'vselect' is changed'
vselect.on_change('value', update_plot)
controls = row(vselect)
curdoc().add_root(row(plot, controls))
UPDATED: ERRORS:
1) No errors show up in Jupyter Notebook.
2) The CLI shows a UserWarning: "Pandas doesn't allow columns to be created via a new attribute name", referencing `historical_data.data = ColumnDataSource.from_df(new_hist_source)`.
Ultimately, the plot should have a line for historical data, and another line or dot for predicted data derived from sklearn. It also has a dropdown menu to select each item to plot (one at a time).
Your update_plot is a no-op that does not actually make any changes to Bokeh model state, which is what is necessary to change a Bokeh plot. Changing Bokeh model state means assigning a new value to a property on a Bokeh object. Typically, to update a plot, you would compute a new data dict and then set an existing CDS from it:
source.data = new_data # plain python dict
Or, if you want to update from a DataFrame:
source.data = ColumnDataSource.from_df(new_df)
As an aside, don't assign the .data from one CDS to another:
source.data = other_source.data # BAD
By contrast, your update_plot computes some new data and then throws it away. Note there is never any purpose to returning anything at all from any Bokeh callback. The callbacks are called by Bokeh library code, which does not expect or use any return values.
Lastly, I don't think any of those last JS console errors were generated by BokehJS.
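A sketch of the update pattern described above, assuming the glyphs were created from a ColumnDataSource (not a DataFrame) and that the rows are available as a list of dicts. The names data_for and make_update_callback and the layout of records are illustrative only. The callback assigns a plain dict to source.data and returns nothing.

```python
def data_for(records, drug_id):
    """Build a plain column dict for one drug from a list of row dicts."""
    rows = [r for r in records if r["ndc"] == drug_id]
    return {
        "date": [r["date"] for r in rows],
        "nadac_per_unit": [r["nadac_per_unit"] for r in rows],
    }


def make_update_callback(source, records):
    """Return an on_change callback that writes new data into `source`.

    `source` is expected to be the bokeh.models.ColumnDataSource that the
    line glyph was created with.
    """
    def update_plot(attrname, old, new):
        # A Select widget reports its value as a string, so convert it
        # before filtering on the integer id.
        source.data = data_for(records, int(new))

    return update_plot
```

It would be wired up as vselect.on_change('value', make_update_callback(source, records)), where source is the same ColumnDataSource that was passed to plot.line(...).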

Add outer borders using xlsxwriter

I have a dataframe and I want to set outer borders. I tried the below code but it adds border to each and every cell within 'A1:I83' range. I only want to add outer thick border:
border_format=workbook.add_format({
    'border':1,
    'align':'left',
    'font_size':10
})
worksheet_rating_input.conditional_format('A1:I83' , { 'type' : 'no_blanks' , 'format' : border_format})
One way you could do it is to set the format of J1 through J183 with a thick left border and set the format of A184 to I184 to have a thick top border.
I've posted a fully reproducible example of this below. In my example, I use df.shape to place the borders based on the dimensions of my dataframe.
import xlsxwriter
import pandas as pd
import numpy as np
# Creating a dataframe
df = pd.DataFrame(np.random.randn(182, 9), columns=list('ABCDEFGHI'))
column_list = df.columns
# Create a Pandas Excel writer using XlsxWriter engine.
writer = pd.ExcelWriter("test.xlsx", engine='xlsxwriter')
df.to_excel(writer, index=False)
workbook = writer.book
worksheet = writer.sheets['Sheet1']
leftFormat = workbook.add_format({'left': 5})
topFormat = workbook.add_format({'top': 5})
for row in range(0, df.shape[0] + 1):
    worksheet.write(row, df.shape[1], '', leftFormat)
for col in range(0, df.shape[1]):
    worksheet.write(df.shape[0] + 1, col, '', topFormat)
worksheet.freeze_panes(1, 0) #freezing top row
writer.close() # writer.save() was removed in pandas 2.0
With Expected Output:

Bokeh BoxPlot > KeyError: 'the label [SomeCategory] is not in the [index]'

I'm attempting to create a BoxPlot using Bokeh. When I get to the section where I need to identify outliers, it fails if a given category has no outliers.
If I remove the "problem" category, the BoxPlot executes properly. It only fails when I attempt to create the BoxPlot with a category that has no outliers.
Any instruction on how to remedy this?
The failure occurs at the commented section "Prepare outlier data for plotting [...]"
import numpy as np
import pandas as pd
import datetime
import math
from bokeh.plotting import figure, show, output_file
from bokeh.models import NumeralTickFormatter
# Create time stamps to allow for figure to display span in title
today = datetime.date.today()
delta1 = datetime.timedelta(days=7)
delta2 = datetime.timedelta(days=1)
start = str(today - delta1)
end = str(today - delta2)
#Identify location of prices
itemloc = 'Everywhere'
df = pd.read_excel(r'C:\Users\me\prices.xlsx')
# Create a list from the dataframe that identifies distinct categories for the separate box plots
cats = df['subcategory_desc'].unique().tolist()
# Find the quartiles and IQR for each category
groups = df.groupby('subcategory_desc', sort=False)
q1 = groups.quantile(q=0.25)
q2 = groups.quantile(q=0.5)
q3 = groups.quantile(q=0.75)
iqr = q3 - q1
upper = q3 + 1.5*iqr
lower = q1 - 1.5*iqr
# Find the outliers for each category
def outliers(group):
    cat = group.name
    return group[(group.price > upper.loc[cat][0]) | (group.price < lower.loc[cat][0])]['price']
out = groups.apply(outliers).dropna()
# Prepare outlier data for plotting, we need coordinates for every outlier.
outx = []
outy = []
for cat in cats:
    # only add outliers if they exist
    if not out.loc[cat].empty:
        for value in out[cat]:
            outx.append(cat)
            outy.append(value)
I expect categories with no outliers to simply show their box-and-whisker portion, without any outlier dots.
Have you tried the code from official documentation, https://docs.bokeh.org/en/latest/docs/gallery/boxplot.html?
# prepare outlier data for plotting, we need coordinates for every outlier.
if not out.empty:
    outx = []
    outy = []
    for keys in out.index:
        outx.append(keys[0])
        outy.append(out.loc[keys[0]].loc[keys[1]])
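One way to sidestep the per-category lookup entirely, sketched here with made-up prices: compare each row against the fences of its own category, so that categories without outliers simply contribute no points. This is a variation on the gallery approach, not the gallery code itself. The column names follow the question and the data values are invented for illustration.

```python
import pandas as pd

# Made-up sample: category "a" has one outlier (9.0), category "b" has none.
df = pd.DataFrame({
    "subcategory_desc": ["a", "a", "a", "a", "a", "b", "b", "b"],
    "price": [1.0, 1.1, 0.9, 1.0, 9.0, 2.0, 2.1, 1.9],
})

groups = df.groupby("subcategory_desc", sort=False)["price"]
q1 = groups.quantile(0.25)
q3 = groups.quantile(0.75)
iqr = q3 - q1
upper = q3 + 1.5 * iqr
lower = q1 - 1.5 * iqr

# Look up each row's own category fences, then filter row by row.
upper_per_row = df["subcategory_desc"].map(upper)
lower_per_row = df["subcategory_desc"].map(lower)
out = df[(df["price"] > upper_per_row) | (df["price"] < lower_per_row)]

# Coordinates for the outlier dots; empty lists if nothing is an outlier.
outx = out["subcategory_desc"].tolist()
outy = out["price"].tolist()
```

Because the filter never indexes by category label, a category with no outliers cannot raise a KeyError. It just adds nothing to outx and outy.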
