Bokeh ValueError: expected an element of either Seq(String) - python-3.x

I'm trying to build a simple bar chart with Bokeh, but it won't recognize the x-axis and I keep getting a ValueError... I think the x values need to be strings, but whatever I try it just won't work. Please note that the column containing the years (as floats, by the looks of it) is called RegionName, in case that seems confusing. My code is below; any suggestions?
import pandas as pd
from bokeh.plotting import figure, output_file, show
from bokeh.models import ColumnDataSource
from bokeh.models.tools import HoverTool
import os
from bokeh.palettes import Spectral5
from bokeh.transform import factor_cmap
os.chdir("C:/Users/Vladimir.Tikhnenko/Python/Land Reg")
# Pivot data
def pivot2(infile="Land Registry.csv", outfile="SalesVolume.csv"):
    df = pd.read_csv(infile)
    table = pd.pivot_table(df, index=["RegionName"], columns="Year",
                           values="SalesVolume", aggfunc=sum)
    table.to_csv(outfile)
    return table
pivot2()
# Transpose data
df=pd.read_csv("SalesVolume.csv")
df=df.drop(df.columns[1:28],1)
df=pd.read_csv("SalesVolume.csv", index_col=0, header=None).T
df.to_csv("C:\\Users\Vladimir.Tikhnenko\Python\Land
Reg\SalesVolume.csv",index=None)
df=pd.read_csv("SalesVolume.csv")
source = ColumnDataSource(df)
years = source.data['RegionName'].tolist()
p = figure(x_range=['RegionName'])
color_map = factor_cmap(field_name='RegionName',palette=Spectral5,
factors=years)
p.vbar(x='RegionName', top='Southwark', source=source, width=1,
color=color_map)
p.title.text ='Transactions'
p.xaxis.axis_label = 'Years'
p.yaxis.axis_label = 'Number of Sales'
show(p)
The error message is:
ValueError: expected an element of either Seq(String), Seq(Tuple(String,
String)) or Seq(Tuple(String, String, String)), got [1968.0, 1969.0, 1970.0,
1971.0, 1972.0, 1973.0, 1974.0, 1975.0, 1976.0, 1977.0, 1978.0, 1979.0,
1980.0, 1981.0, 1982.0, 1983.0, 1984.0, 1985.0, 1986.0, 1987.0, 1988.0,
1989.0, 1990.0, 1991.0, 1992.0, 1993.0, 1994.0, 1995.0, 1996.0, 1997.0,
1998.0, 1999.0, 2000.0, 2001.0, 2002.0, 2003.0, 2004.0, 2005.0, 2006.0,
2007.0, 2008.0, 2009.0, 2010.0, 2011.0, 2012.0, 2013.0, 2014.0, 2015.0,
2016.0, 2017.0, 2018.0]

Categorical factors must only be strings (or sequences of strings for nested factors), so factor_cmap only accepts lists of those things. You passed it a list of numbers, which causes the error shown. To use the years as categorical factors, you need to convert them to strings as the message suggests, use those string values to initialize x_range, and use them as the coordinates for vbar.
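For example, a minimal sketch of that string route, reusing the names from the question (the conversion line assumes the year values sit in the RegionName column as described; note Spectral5 only supplies five colours, so a bigger palette may be wanted for ~50 years):
df['RegionName'] = df['RegionName'].astype(int).astype(str)  # years as strings
source = ColumnDataSource(df)
years = source.data['RegionName'].tolist()                   # list of string factors
p = figure(x_range=years)                                    # x_range takes the string factors
color_map = factor_cmap(field_name='RegionName', palette=Spectral5, factors=years)
p.vbar(x='RegionName', top='Southwark', source=source, width=1, color=color_map)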
Alternatively, if you want to use numerical values for the years, but just want to have fixed, controlled tick locations, do this:
p = figure() # don't pass x_range
p.xaxis.ticker = years
And then also use linear_cmap (instead of factor_cmap) to map the numerical values to colors.
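A rough sketch of that numerical route (linear_cmap lives in bokeh.transform; the low/high bounds are my assumption, and here years stays numeric, unlike the string route above):
from bokeh.transform import linear_cmap
p = figure()                        # don't pass x_range
p.xaxis.ticker = years              # fixed tick locations at the year values
color_map = linear_cmap(field_name='RegionName', palette=Spectral5,
                        low=min(years), high=max(years))
p.vbar(x='RegionName', top='Southwark', source=source, width=1, color=color_map)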

Related

How to show alternative calendar dates in mplfinance?

TL;DR - The issue
I have an mplfinance plot based on a pandas dataframe in which the indices are in Gregorian calendar format, and I need to have them displayed in Jalali format.
My data and code
My data looks like this:
open high low close
date
2021-03-15 67330.0 69200.0 66870.0 68720.0
2021-03-16 69190.0 71980.0 69000.0 71620.0
2021-03-17 72450.0 73170.0 71700.0 71820.0
2021-03-27 71970.0 73580.0 70000.0 73330.0
2021-03-28 73330.0 73570.0 71300.0 71850.0
... ... ... ... ...
The first column is both a date and the index; this is required for mplfinance to plot the data correctly.
I can plot it with something like this:
import mplfinance as mpf
mpf.plot(chart_data.tail(7), figratio=(16,9), type="candle", style='yahoo', ylabel='', tight_layout=True, xrotation=90)
Where chart_data is the data above and the rest are pretty much formatting stuff.
What I have now
My chart looks like this:
However, I need the dates to look like this: 1400-01-12. Here's a table of equivalence to further demonstrate my case.
2021-03-15 1399-12-25
2021-03-16 1399-12-26
2021-03-17 1399-12-27
2021-03-27 1400-01-07
2021-03-28 1400-01-08
What I've tried
Setting Jdates as my indices:
chart_data.index = history.jdate
mpf.plot(chart_data_j)
Throws this exception:
TypeError('Expect data.index as DatetimeIndex')
So I tried converting the jdates into datetimes:
chart_data_j.index = pd.to_datetime(history.jdate)
Which threw an out of bounds exception:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1398-03-18 00:00:00
So I thought maybe changing the timezone/locale would be an option and tried changing the timezones, following the official docs:
pd.to_datetime(history.date).tz_localize(tz='US/Eastern')
But I got this exception:
raise TypeError(f"{ax_name} is not a valid DatetimeIndex or PeriodIndex")
And finally I tried using libraries such as PersianTools and pandas_jalali to no avail.
You can get this to work by creating your own custom DateFormatter class, and using mpf.plot() kwarg returnfig=True to gain access to the Axes objects to be able to install your own custom DateFormatter.
I have written a custom DateFormatter (see code below) that is aware of the special way that mplfinance handles the x-axis when show_nontrading=False (i.e. the default value).
import pandas as pd
import mplfinance as mpf
import jdatetime as jd
import matplotlib.dates as mdates
from matplotlib.ticker import Formatter

class JalaliDateTimeFormatter(Formatter):
    """
    Formatter for JalaliDate in mplfinance.
    Handles both `show_nontrading=False` and `show_nontrading=True`.
    When show_nontrading=False, the x-axis is indexed by an integer
    representing the row number in the dataframe, thus:
    Formatter for an axis that is indexed by integer, where the integers
    represent the index location of the datetime object that should be
    formatted at that location. This formatter is typically used when
    plotting datetimes on an axis but the user does NOT want to see gaps
    where days (or times) are missing. To use: plot the data against
    a range of integers equal in length to the array of datetimes that
    you would otherwise plot on that axis. Construct this formatter
    by providing the array of datetimes (as matplotlib floats). When
    the formatter receives an integer in the range, it will look up the
    datetime and format it.
    """

    def __init__(self, dates=None, fmt='%b %d, %H:%M', show_nontrading=False):
        self.dates = dates
        self.len = len(dates) if dates is not None else 0
        self.fmt = fmt
        self.snt = show_nontrading

    def __call__(self, x, pos=0):
        '''
        Return label for time x at position pos
        '''
        if self.snt:
            jdate = jd.date.fromgregorian(date=mdates.num2date(x))
            formatted_date = jdate.strftime(self.fmt)
            return formatted_date
        ix = int(round(x, 0))
        if ix >= self.len or ix < 0:
            date = None
            formatted_date = ''
        else:
            date = self.dates[ix]
            jdate = jd.date.fromgregorian(date=mdates.num2date(date))
            formatted_date = jdate.strftime(self.fmt)
        return formatted_date

# ---------------------------------------------------

df = pd.read_csv('so_67001540.csv', index_col=0, parse_dates=True)

mpf.plot(df, figratio=(16,9), type="candle", style='yahoo', ylabel='', xrotation=90)

dates = [mdates.date2num(d) for d in df.index]
formatter = JalaliDateTimeFormatter(dates=dates, fmt='%Y-%m-%d')

fig, axlist = mpf.plot(df, figratio=(16,9),
                       type="candle", style='yahoo',
                       ylabel='', xrotation=90,
                       returnfig=True)
axlist[0].xaxis.set_major_formatter(formatter)
mpf.show()
The file 'so_67001540.csv' looks like this:
date,open,high,low,close,alt_date
2021-03-15,67330.0,69200.0,66870.0,68720.0,1399-12-25
2021-03-16,69190.0,71980.0,69000.0,71620.0,1399-12-26
2021-03-17,72450.0,73170.0,71700.0,71820.0,1399-12-27
2021-03-27,71970.0,73580.0,70000.0,73330.0,1400-01-07
2021-03-28,73330.0,73570.0,71300.0,71850.0,1400-01-08
When you run the above script, you should get the following two plots:
Have you tried making these dates
1399-12-25
1399-12-26
1399-12-27
1400-01-07
1400-01-08
the index of the dataframe (maybe that's what you mean by "swapping the indices"?) and setting the kwarg datetime_format='%Y-%m-%d'?
I think that should work.
UPDATE:
It appears to me that the problem is that
mplfinance requires a Pandas.DatetimeIndex as the index of your dataframe, and
Pandas.DatetimeIndex is made up of Pandas.Timestamp objects, and
Pandas.Timestamp has limits which preclude dates having years less than 1677:
In [1]: import pandas as pd
In [2]: pd.Timestamp.max
Out[2]: Timestamp('2262-04-11 23:47:16.854775807')
In [3]: pd.Timestamp.min
Out[3]: Timestamp('1677-09-21 00:12:43.145225')
I am going to poke around and see if I can come up with another solution. Internally Matplotlib dates can go to year zero.

How to add traces in plotly.express

I am very new to python and plotly.express, and I find it very confusing...
I am trying to use the principle of adding different traces to my figure, using example code shown here https://plotly.com/python/line-charts/, Line Plot Modes, #Create traces.
BUT I get my data from a .CSV file.
import plotly.express as px
import plotly as plotly
import plotly.graph_objs as go
import pandas as pd
data = pd.read_csv(r"C:\Users\x.csv")
fig = px.scatter(data, x="Time", y="OD", color="C-source", size="C:A 1 ratio")
fig = px.line(data, x="Time", y="OD", color="C-source")
fig.show()
The above lines produce scatter/line plots with the correct data, but the data is mixed together. I have data from 2 different sources, marked by a column named "Strain" in my .csv file, that I would like the chart to reflect.
Is the traces option a possible way to do it, or is there another way?
You can add traces from an Express plot by using .select_traces(). Something like:
fig.add_traces(
list(px.line(...).select_traces())
)
Note the need to convert to list, since .select_traces() returns a generator.
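For instance, a hedged sketch using the column names from the question (everything else is illustrative):
fig = px.scatter(data, x="Time", y="OD", color="C-source", size="C:A 1 ratio")
fig.add_traces(
    list(px.line(data, x="Time", y="OD", color="C-source").select_traces())
)
fig.show()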
It looks like you probably want the lines with the scatter dots as well on a single plot?
You're setting fig to equal px.scatter() and then setting (changing) it to equal px.line(). When set to line, the scatter plot is overwritten.
You're already importing graph objects so you can use add_trace with go, something like this:
fig.add_trace(go.Scatter(x=data["Time"], y=data["OD"], mode='markers', marker=dict(color=data["C-source"], size=data["C:A 1 ratio"])))
Depending on how your data is set up, you may need to add each C-source separately, doing something like the following (a fuller sketch follows the links below):
x=data.query("`C-source`=='Term'")["Time"], ... , name='Term'
Here are a few references with examples and options you can use to set up your scatter:
Scatter plot examples  
Marker styles  
Scatter arguments and attributes
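Putting those pieces together, a hedged sketch (the column names "Time", "OD", "C-source" and "Strain" come from the question; grouping by "Strain" is an assumption about the data layout):
import pandas as pd
import plotly.express as px
import plotly.graph_objs as go

data = pd.read_csv(r"C:\Users\x.csv")
# start with the scatter, then layer one line trace per strain on top
fig = px.scatter(data, x="Time", y="OD", color="C-source", size="C:A 1 ratio")
for strain, group in data.groupby("Strain"):
    fig.add_trace(go.Scatter(x=group["Time"], y=group["OD"],
                             mode="lines", name=str(strain)))
fig.show()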
You can use the approach stated in Plotly: How to combine scatter and line plots using Plotly Express?
fig3 = go.Figure(data=fig1.data + fig2.data)
fig1.data and fig2.data are simply tuples that hold all the info needed for a plot, and the + just concatenates them.
Or, a more convenient and scalable approach:
# this will hold all figures until they are combined
all_figures = []
# data_collection: dictionary with Pandas dataframes
for df_label in data_collection:
    df = data_collection[df_label]
    fig = px.line(df, x='Date', y=['Value'])
    all_figures.append(fig)
import operator
import functools
# now you can concatenate all the data tuples
# by using the programmatic add operator
fig3 = go.Figure(data=functools.reduce(operator.add, [_.data for _ in all_figures]))
fig3.show()
Thanks for taking the time to help me out. I ended up with two solutions that worked, of which using "facet_col" to divide the plot into two subplots (one for each strain) was the simplest.
https://plotly.com/python/axes/
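For reference, a minimal sketch of that facet_col route (assuming a "Strain" column as described in the question):
fig = px.line(data, x="Time", y="OD", color="C-source", facet_col="Strain")
fig.show()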
Thanks, this worked for me too, where Fig_Set_B is a list of scatter plots:
# create a tuple of the first trace from each of the first 6 plots in plot set Fig_Set_B
fig_combined = go.Figure(data= tuple(Fig_Set_B[x].data[0] for x in range(6)) )
fig_combined.show()

Bokeh plot line not updating after checking CheckboxGroup in server mode (python callback)

I have just initiated myself to Bokeh library and I would like to add interactivity in my dashboard. To do so, I want to use CheckboxGroup widget in order to select which one of a pandas DataFrame column to plot.
I have followed tutorials but I must have misunderstood the use of ColumnDataSource as I cannot make a simple example work...
I am aware of previous questions on the matter, and one that seems relevant on the StackOverflow forum is the following:
Bokeh not updating plot line update from CheckboxGroup
Sadly I did not succeed in reproducing the right behavior.
I have tried to reproduce an example following the same updating structure presented in "Bokeh Server plot not updating as wanted, also it keeps shifting and axis information vanishes" by #bigreddot, without success.
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.palettes import Spectral
from bokeh.layouts import row
from bokeh.models.widgets import CheckboxGroup
from bokeh.io import curdoc
# UPDATE FUNCTION ------------------------------------------------
# make update function
def update(attr, old, new):
    feature_selected_test = [feature_checkbox.labels[i] for i in feature_checkbox.active]
    # add index to plot
    feature_selected_test.insert(0, 'index')
    # create new DataFrame
    new_df = dummy_df.filter(feature_selected_test)
    plot_src.data = ColumnDataSource.from_df(data=new_df)
# CREATE DATA SOURCE ------------------------------------------------
# create dummy data for debugging purpose
index = list(range(0, 890))
index.extend(list(range(2376, 3618)))
feature_1 = np.random.rand(len(index))
feature_2 = np.random.rand(len(index))
feature_3 = np.random.rand(len(index))
feature_4 = np.random.rand(len(index))
dummy_df = pd.DataFrame(dict(index=index, feature_1=feature_1, feature_2=feature_2, feature_3=feature_3,feature_4=feature_4))
# CREATE CONTROL ------------------------------------------------------
# list available data to plot
available_feature = list(dummy_df.columns[1:])
# initialize control
feature_checkbox = CheckboxGroup(labels=available_feature, active=[0, 1], name='checkbox')
feature_checkbox.on_change('active', update)
# INITIALIZE DASHBOARD ---------------------------------------------------
# initialize ColumnDataSource object
plot_src = ColumnDataSource(dummy_df)
# create figure
line_fig = figure()
feature_selected = [feature_checkbox.labels[i] for i in feature_checkbox.active]
# feature_selected = ['feature_1', 'feature_2', 'feature_3', 'feature_4']
for index_int, col_name_str in enumerate(feature_selected):
    line_fig.line(x='index', y=col_name_str, line_width=2, color=Spectral[11][index_int % 11], source=plot_src)
curdoc().add_root(row(feature_checkbox, line_fig))
The program should work with a copy/paste... well without interactivity...
Would someone please help me? Thanks a lot in advance.
You are only adding glyphs for the initial subset of selected features:
for index_int, col_name_str in enumerate(feature_selected):
    line_fig.line(x='index', y=col_name_str, line_width=2, color=Spectral[11][index_int % 11], source=plot_src)
So that is all that is ever going to show.
Adding new columns to the CDS does not automatically make anything in particular happen, it's just extra data that is available for glyphs or hover tools to potentially use. To actually show it, there have to be glyphs configured to display those columns. You could do that, add and remove glyphs dynamically, but it would be far simpler to just add everything once up front, and use the checkbox to toggle only the visibility. There is an example of just this in the repo:
https://github.com/bokeh/bokeh/blob/master/examples/app/line_on_off.py
That example passes the data as literals to the glyph function, but you could put all the data in a CDS up front, too.
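A rough sketch of the visibility-toggling idea, reusing the names from the question's script (the callback itself is my assumption, not code taken from the linked example):
# draw every feature once, then only flip the .visible flag in the callback
renderers = []
for index_int, col_name_str in enumerate(available_feature):
    r = line_fig.line(x='index', y=col_name_str, line_width=2,
                      color=Spectral[11][index_int % 11], source=plot_src)
    r.visible = index_int in feature_checkbox.active
    renderers.append(r)

def toggle_visibility(attr, old, new):
    for i, r in enumerate(renderers):
        r.visible = i in feature_checkbox.active

feature_checkbox.on_change('active', toggle_visibility)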

Not able to make a class in Python and output it to excel

I'm fairly new to Python and working in an inventory management position.
One important thing in inventory management is calculating the safety stock.
So, this is what I'm trying to achieve.
I have imported a file with 3 columns (FR, sigma and LT) for 3 rows. See the code and the output below:
code:
import pandas as pd
df = pd.read_excel("Desktop\\TestPython4.xlsx")
xcol=["FR","sigma","LT"]
x=df[xcol].values
output:
(screenshot of the output dataframe)
To calculate the safety stock, I have the following (simplified) formula for it:
CDF(FR)*sigma*sqrt(LT)
where CDF is (the inverse of) the cumulative distribution function of the standard normal distribution and FR is a number between 0 and 1 (so the well-known z-value is the output).
I want to output the file with an extra column that displays the safety stock.
For this I made a class safetystock with the following code:
class Safetystock:
    def __init__(self, FR, sigma, LT):
        self.FR = FR
        self.sigma = sigma
        self.LT = LT
        pass
    def calculate():
        SS = st.norm.ppf(FR)
        return print(SS*sigma*np.sqrt(LT))
        pass
Then I made the variable "Output":
Output = Safetystock(df.FR,df.sigma,df.LT)
I said that the data in the file needs to be taken into account.
Then I added a column to df, named output, that should contain the variable "Output":
df["output"]=Output
Now, when I want to call df, it gives me this:
(screenshot of the actual output)
What am I doing wrong?
Cheers,
Steven
What about
import pandas as pd
import numpy as np
import scipy.stats as st
df = pd.read_excel("Desktop\\TestPython4.xlsx")
df["output"] = st.norm.ppf(df.FR)*df.sigma*np.sqrt(df.LT)

Matplotlib: Import and plot multiple time series with legends direct from .csv

I have several spreadsheets containing data saved as comma delimited (.csv) files in the following format: The first row contains column labels as strings ('Time', 'Parameter_1'...). The first column of data is Time and each subsequent column contains the corresponding parameter data, as a float or integer.
I want to plot each parameter against Time on the same plot, with parameter legends which are derived directly from the first row of the .csv file.
My spreadsheets have different numbers of (columns of) parameters to be plotted against Time; so I'd like to find a generic solution which will also derive the number of columns directly from the .csv file.
The attached minimal working example shows what I'm trying to achieve using np.loadtxt (minus the legend); but I can't find a way to import the column labels from the .csv file to make the legends using this approach.
np.genfromtxt offers more functionality, but I'm not familiar with it and am struggling to find a way of using it to do the above.
Plotting data in this style from .csv files must be a common problem, but I've been unable to find a solution on the web. I'd be very grateful for your help & suggestions.
Many thanks
"""
Example data: Data.csv:
Time,Parameter_1,Parameter_2,Parameter_3
0,10,0,10
1,20,30,10
2,40,20,20
3,20,10,30
"""
import numpy as np
import matplotlib.pyplot as plt
data = np.loadtxt('Data.csv', skiprows=1, delimiter=',') # skip the column labels
cols = data.shape[1] # get the number of columns in the array
for n in range(1, cols):
    plt.plot(data[:,0], data[:,n]) # plot each parameter against time
plt.xlabel('Time',fontsize=14)
plt.ylabel('Parameter values',fontsize=14)
plt.show()
Here's my minimal working example for the above using genfromtxt rather than loadtxt, in case it is helpful for anyone else.
I'm sure there are more concise and elegant ways of doing this (I'm always happy to get constructive criticism on how to improve my coding), but it makes sense and works OK:
import numpy as np
import matplotlib.pyplot as plt
arr = np.genfromtxt('Data.csv', delimiter=',', dtype=None) # dtype=None automatically defines appropriate format (e.g. string, int, etc.) based on cell contents
names = (arr[0]) # select the first row of data = column names
for n in range(1, len(names)): # plot each column in turn against column 0 (= time)
    plt.plot(arr[1:,0], arr[1:,n], label=names[n]) # omitting the first row (= column names)
plt.legend()
plt.show()
The function numpy.genfromtxt is more for broken tables with missing values than for what you're trying to do. What you can do is simply open the file before handing it to numpy.loadtxt and read the first line. Then you don't even need to skip it. Here is an edited version of what you have above that reads the labels and makes the legend:
"""
Example data: Data.csv:
Time,Parameter_1,Parameter_2,Parameter_3
0,10,0,10
1,20,30,10
2,40,20,20
3,20,10,30
"""
import numpy as np
import matplotlib.pyplot as plt
# open the file
with open('Data.csv') as f:
    # read the names of the columns first
    names = f.readline().strip().split(',')
    # np.loadtxt can also handle already open files
    data = np.loadtxt(f, delimiter=',') # no skip needed anymore

cols = data.shape[1]
for n in range(1, cols):
    # labels go in here
    plt.plot(data[:,0], data[:,n], label=names[n])
plt.xlabel('Time',fontsize=14)
plt.ylabel('Parameter values',fontsize=14)
#And finally the legend is made
plt.legend()
plt.show()
