Nested categories, source = ColumnDataSource problem with Bokeh

I just can't see what I'm doing wrong. Please help me.
I would like to output a bar chart for nested categories.
If I comment out source, and I don't add source=source in p1.vbar, I get something... but I can't figure out for the life of me how to get the bars to render.
Thank you for your help and time.
Here is the code:
colors=['#90be6d','#adb5bd','#d90429']
x = [('positive', 'k1'), ('positive', 'k2'),
('positive', 'k3'), ('positive', 'k4'),
('positive', 'k5'), ('positive', 'k6'),
('positive', 'k7'), ('positive', 'k8'),
('positive', 'k9'), ('positive', 'k10'),
('positive', 'k11'), ('positive', 'k12'),
('positive', 'k13'),
('neutral', 'k1'), ('neutral', 'k2'),
('neutral', 'k3'), ('neutral', 'k4'),
('neutral', 'k5'), ('neutral', 'k6'),
('neutral', 'k7'), ('neutral', 'k8'),
('negative', 'k1'), ('negative', 'k2'),
('negative', 'k3'), ('negative', 'k4'),
('negative', 'k5'), ('negative', 'k6'),
('negative', 'k7'), ('negative', 'k8'),
('negative', 'k9'), ('negative', 'k10'),
('negative', 'k11')]
counts = (404, 190, 174, 213, 178, 152, 146, 173, 97,
81, 88, 77, 144, 60, 26, 31, 30, 44, 42, 32,
135, 302, 88, 68, 96, 72, 87, 72, 47, 59, 42, 113)
source = ColumnDataSource(data=dict(x=x, counts=counts)) # here is my problem...
p1 = figure(x_range=FactorRange(*x), plot_height=400, title="test",
toolbar_location=None, tools="")
p1.vbar(x='x', top='counts', width=.9, color=colors, source=source)
p1.y_range.start = 0
p1.x_range.range_padding = 0.1
p1.xaxis.major_label_orientation = 1
p1.xgrid.grid_line_color = None
show(p1)
Thank you.

Your problem is not with the source but with colors. When you run your code, you should see this error:
Traceback (most recent call last):
File "/.../test.py", line 33, in <module>
p1.vbar(x='x', top='counts', width=.9, color=colors, source=source)
File "/.../bokeh/plotting/_decorators.py", line 54, in wrapped
return create_renderer(glyphclass, self, **kwargs)
File "/.../bokeh/plotting/_renderer.py", line 94, in create_renderer
raise RuntimeError(_GLYPH_SOURCE_MSG % nice_join(incompatible_literal_spec_values, conjuction="and"))
RuntimeError:
Expected fill_color and line_color to reference fields in the supplied data source.
When a 'source' argument is passed to a glyph method, values that are sequences
(like lists or arrays) must come from references to data columns in the source.
For instance, as an example:
source = ColumnDataSource(data=dict(x=a_list, y=an_array))
p.circle(x='x', y='y', source=source, ...) # pass column names and a source
Alternatively, *all* data sequences may be provided as literals as long as a
source is *not* provided:
p.circle(x=a_list, y=an_array, ...) # pass actual sequences and no source
It's pretty self-descriptive, except for one part: it mentions the fill_color and line_color arguments, which you don't use. Specifying color for vbar is actually shorthand for specifying both fill_color and line_color, which is why the error mentions them.
To fix the issue, provide the colors in a way that Bokeh understands. Either create a separate color column in the data source (it has to be the same length as all other columns, so you'll have to repeat some values) or create a categorical color mapper. An example of the latter can be seen here: https://docs.bokeh.org/en/latest/docs/user_guide/categorical.html?highlight=categoricalcolormapper#colors
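As a minimal sketch of the first option (the separate color column), the snippet below builds one color per bar from the outer category of each factor. It uses plain Python so it runs without Bokeh. The names `palette` and `bar_colors` are hypothetical, the pairing of colors to categories is an assumption based on the order of the `colors` list in the question, and the factor list is shortened for illustration.

```python
# Colors from the question, paired with the outer categories (assumed order).
palette = {'positive': '#90be6d', 'neutral': '#adb5bd', 'negative': '#d90429'}

# Shortened version of the nested factors and counts from the question.
x = [('positive', 'k1'), ('positive', 'k2'),
     ('neutral', 'k1'),
     ('negative', 'k1'), ('negative', 'k2')]
counts = [404, 190, 60, 302, 88]

# One color per bar: look up the outer category of every (category, key) pair.
bar_colors = [palette[category] for category, _ in x]

# Every column must have the same length before it goes into the data source.
data = dict(x=x, counts=counts, color=bar_colors)
assert len({len(column) for column in data.values()}) == 1

print(bar_colors)
```

With Bokeh, `data` would then go into `ColumnDataSource(data=data)`, and the glyph call would reference the column by name, as in `p1.vbar(x='x', top='counts', width=.9, color='color', source=source)`, instead of passing the `colors` list.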


Scikit-learn StackingClassifier with missing values

When I call model.predict_proba(X) on my StackingClassifier model, execution crashes because the library calls assert_all_finite() to check whether my dataframe contains missing values.
Since the estimators I stacked are able to handle missing values, I don't see why this should happen, and I didn't find anything in the documentation saying that StackingClassifier requires data without missing values.
It's a bit hard for me to come up with a short reproducible snippet, given that the code comes from several layers of model abstraction, but I can print out the model and the call that raises the error.
p = model.predict_proba(X_loyal)
where model is:
StackingClassifier(estimators=[('ExtraTreesClassifier_117',
ExtraTreesClassifier(bootstrap=True,
class_weight={0: 1, 1: 5},
criterion='entropy',
max_depth=11,
max_features='log2',
max_samples=0.5946040593595099,
min_samples_leaf=2,
n_estimators=163,
random_state=117)),
('RandomForestClassifier_117',
RandomForestClassifier(class_weight={0: 1,
1: 5},
criterion='entropy',
max_depth=11,
max_features='log2',
max_samples=0.5946040593595099,
min_samples_leaf=2,
n_estimators=163,
random_state=117)),
('LGBMClassifier_117',
LGBMClassifier(class_weight={0: 1, 1: 1},
deterministic=True, max_depth=9,
n_estimators=183, num_leaves=3,
subsample=0.2986274713775564,
verbose=-1))])
Error
Traceback (most recent call last):
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3417, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-12-eafa75c49322>", line 1, in <module>
model.predict_proba(X_loyal)
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/sklearn/utils/metaestimators.py", line 120, in <lambda>
out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/sklearn/ensemble/_stacking.py", line 485, in predict_proba
return self.final_estimator_.predict_proba(self.transform(X))
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/sklearn/ensemble/_stacking.py", line 522, in transform
return self._transform(X)
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/sklearn/ensemble/_stacking.py", line 215, in _transform
predictions = [
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/sklearn/ensemble/_stacking.py", line 216, in <listcomp>
getattr(est, meth)(X)
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 674, in predict_proba
X = self._validate_X_predict(X)
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/sklearn/ensemble/_forest.py", line 422, in _validate_X_predict
return self.estimators_[0]._validate_X_predict(X, check_input=True)
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/sklearn/tree/_classes.py", line 407, in _validate_X_predict
X = self._validate_data(X, dtype=DTYPE, accept_sparse="csr",
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/sklearn/base.py", line 421, in _validate_data
X = check_array(X, **check_params)
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/sklearn/utils/validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/sklearn/utils/validation.py", line 720, in check_array
_assert_all_finite(array,
File "/home/mlpoc/miniconda3/envs/churn/lib/python3.8/site-packages/sklearn/utils/validation.py", line 103, in _assert_all_finite
raise ValueError(
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
Versions
sklearn.__version__
Out[6]: '0.24.2'
lightgbm.__version__
Out[8]: '3.2.1'
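The traceback above shows the ValueError being raised from `_validate_X_predict` in `sklearn/ensemble/_forest.py`, reached through the stacked forest estimators, rather than from a check inside `StackingClassifier` itself. As a minimal sketch for locating the offending columns before calling `predict_proba`, the plain-Python helper below reports which columns contain NaN or infinity. The helper `non_finite_columns` and the sample rows are hypothetical, and the "value too large for float32" case is not covered.

```python
import math

def non_finite_columns(rows, column_names):
    """Return the names of columns holding at least one NaN or infinite value."""
    bad = []
    for index, name in enumerate(column_names):
        # math.isfinite is False for NaN, +inf and -inf.
        if any(not math.isfinite(row[index]) for row in rows):
            bad.append(name)
    return bad

# Hypothetical sample data standing in for the real feature matrix.
columns = ['age', 'spend', 'visits']
rows = [
    [34.0, 120.5, 3.0],
    [float('nan'), 80.0, 1.0],
    [51.0, float('inf'), 7.0],
]

print(non_finite_columns(rows, columns))
```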

Remove title from ScatterChart

I would like to know how to assign a name to a data series (in a ScatterChart) without the series name also being used as the chart title. I want the series name to appear only in the legend, with no chart title at all.
I realized that autoTitleDeleted needs to have a value of 1.
Consulting the documentation, I found that ChartContainer is the class that implements this option. I therefore imported the class and applied it as follows:
from openpyxl import Workbook
from openpyxl.chart import (
    ScatterChart,
    Reference,
    Series,
)
from openpyxl.chart.chartspace import ChartContainer
wb = Workbook()
ws = wb.active
rows = [
    ["Size", "Batch 1", "Batch 2"],
    [2, 40, 30],
    [3, 40, 25],
    [4, 50, 30],
    [5, 30, 25],
    [6, 25, 35],
    [7, 20, 40],
]
for row in rows:
    ws.append(row)
chart2 = ScatterChart()
chart2.x_axis.title = "Size"
chart2.y_axis.title = "Percentage"
xvalues = Reference(ws, min_col = 1, min_row = 2, max_row = 7)
values = Reference(ws, min_col = 3, min_row = 1, max_row = 7)
series1 = Series(values, xvalues)
chart2.series.append(series1)
chart2 = ChartContainer(autoTitleDeleted = 1)
ws.add_chart(chart2, "J10")
wb.save("Ex1.xlsx")
However, the following error comes up:
runfile('C:/Users/Administrador/Desktop/Pre-Try/Ex1/Ex1.py', wdir='C:/Users/Administrador/Desktop/Pre-Try/Ex1')
Traceback (most recent call last):
File "", line 1, in
runfile('C:/Users/Administrador/Desktop/Pre-Try/Ex1/Ex1.py', wdir='C:/Users/Administrador/Desktop/Pre-Try/Ex1')
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
execfile(filename, namespace)
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/Administrador/Desktop/Pre-Try/Ex1/Ex1.py", line 46, in
wb.save("Ex1.xlsx")
File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\workbook\workbook.py", line 397, in save
save_workbook(self, filename)
File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\writer\excel.py", line 294, in save_workbook
writer.save()
File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\writer\excel.py", line 276, in save
self.write_data()
File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\writer\excel.py", line 76, in write_data
self._write_worksheets()
File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\writer\excel.py", line 219, in _write_worksheets
self._write_drawing(ws._drawing)
File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\writer\excel.py", line 142, in _write_drawing
self._archive.writestr(drawing.path[1:], tostring(drawing._write()))
File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\drawing\spreadsheet_drawing.py", line 296, in _write
self._rels.append(rel)
UnboundLocalError: local variable 'rel' referenced before assignment
I do not really understand this error. If you could help, I would be grateful!
chart2 = ChartContainer(autoTitleDeleted = 1)
This is the problem: you must use openpyxl chart objects so that the library can manage the plumbing between objects. Charts are very complicated objects and we try to hide some of the complexity. Your idea is correct (setting the value to True), but it won't work like this because a ChartContainer doesn't know how to add itself to the XLSX package.
As a workaround, it's probably easiest to create a title with an empty string. Otherwise, you could submit a PR that allows the autoTitleDeleted attribute to be mapped from ChartContainers to charts and vice versa.

Creating and accessing datasets in an HDF5 file

I am trying to create an HDF5 file with two datasets, 'data' and 'label'. When I tried to access the said file, however, I got an error as follows:
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.1.4\helpers\pydev\pydevd.py", line 1664, in <module>
main()
File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.1.4\helpers\pydev\pydevd.py", line 1658, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.1.4\helpers\pydev\pydevd.py", line 1068, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.1.4\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/pycharm/Input_Pipeline.py", line 140, in <module>
data_h5 = f['data'][:]
File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "C:\Users\u20x47\PycharmProjects\PCL\venv\lib\site-packages\h5py\_hl\group.py", line 177, in __getitem__
oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py\h5o.pyx", line 190, in h5py.h5o.open
ValueError: Not a location (invalid object ID)
Code used to create the dataset:
h5_file.create_dataset('data', data=data_x, compression='gzip', compression_opts=4, dtype='float32')
h5_file.create_dataset('label', data=label, compression='gzip', compression_opts=1, dtype='uint8')
data_x is an array of arrays. Each element in data_x is a 3D array of 1024 elements.
label is an array of arrays as well. Each element is a 1D array of 1 element.
Code to access the said file:
f = h5_file
data_h5 = f['data'][:]
label_h5 = f['label'][:]
print (data_h5, label_h5)
How can I fix this? Is this a syntax error or a logical one?
I was unable to reproduce the error.
Maybe you forgot to close the file, or you changed the content of your HDF5 file during execution.
You can also use print h5_file.items() to check the content of your file.
Tested code:
import h5py
import numpy as np
h5_file = h5py.File('test.h5', 'w')
# bogus data with the correct size
data_x = np.random.rand(16,8,8)
label = np.random.randint(100, size=(1,1),dtype='uint8')
#
h5_file.create_dataset('data', data=data_x, compression='gzip', compression_opts=4, dtype='float32')
h5_file.create_dataset('label', data=label, compression='gzip', compression_opts=1, dtype='uint8')
h5_file.close()
h5_file = h5py.File('test.h5', 'r')
f = h5_file
print f.items()
data_h5 = f['data'][...]
label_h5 = f['label'][...]
print (data_h5, label_h5)
h5_file.close()
Produces
[(u'data', <HDF5 dataset "data": shape (16, 8, 8), type "<f4">), (u'label', <HDF5 dataset "label": shape (1, 1), type "|u1">)]
(array([[[4.36837107e-01, 8.05664659e-01, 3.34415197e-01, ...,
8.89135897e-01, 1.84097692e-01, 3.60782951e-01],
[8.86442482e-01, 6.07181549e-01, 2.42844030e-01, ...,
[4.24369454e-01, 6.04596496e-01, 5.56676507e-01, ...,
7.22884715e-01, 2.45932683e-01, 9.18777227e-01]]], dtype=float32), array([[25]], dtype=uint8))

ValueError: Shapes (?, 83) and (?, 128) are incompatible

I'm getting the following error when trying to create an LSTM using TensorFlow:
ValueError: Shapes (?, 83) and (?, 128) are incompatible
The input to the model has the following shape:
(batch_size, time, features) / (50, 68, 83)
Here is the relevant code for the model:
x_text = tf.placeholder(tf.float32, [None, *text.shape[1:]])
cells = tf.contrib.rnn.MultiRNNCell([tf.contrib.rnn.ResidualWrapper(
tf.contrib.rnn.BasicLSTMCell(num_units=units))
for units in [128, 256]
])
text_outputs, text_state = tf.nn.dynamic_rnn(
cell=cells,
inputs=x_text,
dtype=tf.float32,
)
I've tried for so long to figure out what's wrong, but I can't. I've searched the entire internet (no, really!) and no one seems to have the same problem, where shapes (?, a) and (?, b) are incompatible; everything I found involves other shape combinations, and those solutions have not helped.
Oh - and here's the stack trace:
Traceback (most recent call last):
File "residual_hierachical_rnn.py", line 97, in <module>
dtype=tf.float32,
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/rnn.py", line 627, in dynamic_rnn
dtype=dtype)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/rnn.py", line 824, in _dynamic_rnn_loop
swap_memory=swap_memory)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 3224, in while_loop
result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2956, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2893, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 3194, in <lambda>
body = lambda i, lv: (i + 1, orig_body(*lv))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/rnn.py", line 795, in _time_step
(output, new_state) = call_cell()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/rnn.py", line 781, in <lambda>
call_cell = lambda: cell(input_t, state)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/rnn_cell_impl.py", line 232, in __call__
return super(RNNCell, self).__call__(inputs, state)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/layers/base.py", line 717, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/rnn_cell_impl.py", line 1292, in call
cur_inp, new_state = cell(cur_inp, cur_state)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/rnn_cell_impl.py", line 1168, in __call__
res_outputs = (self._residual_fn or default_residual_fn)(inputs, outputs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/rnn_cell_impl.py", line 1166, in default_residual_fn
nest.map_structure(assert_shape_match, inputs, outputs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/nest.py", line 375, in map_structure
structure[0], [func(*x) for x in entries])
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/nest.py", line 375, in <listcomp>
structure[0], [func(*x) for x in entries])
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/rnn_cell_impl.py", line 1163, in assert_shape_match
inp.get_shape().assert_is_compatible_with(out.get_shape())
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_shape.py", line 844, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (?, 83) and (?, 128) are incompatible
Thank you so much in advance for any help!!!
As MPKenning says, your LSTM cells should match your input features.
However, because you're using a ResidualWrapper, you are forced to keep the same depth throughout: the wrapper sums the inputs and the outputs of each cell, so their shapes must match.
If you remove the ResidualWrapper it should work with [83, 256]:
cells = tf.contrib.rnn.MultiRNNCell([
tf.contrib.rnn.BasicLSTMCell(num_units=units)
for units in [83, 256]
])
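To illustrate why the residual connection constrains the unit sizes, here is a minimal plain-Python sketch of the depth check, not TensorFlow's actual implementation. `check_residual_stack` is a hypothetical helper, and the error text only imitates the message from the question.

```python
def check_residual_stack(input_depth, layer_units):
    """Walk a stack of residual layers and verify input/output depths match.

    A residual connection adds a layer's input to its output, so each
    layer's unit count must equal the depth of whatever is fed into it.
    """
    depth = input_depth
    for units in layer_units:
        if units != depth:
            # Imitates the wording of the TensorFlow error for illustration.
            raise ValueError(
                "Shapes (?, %d) and (?, %d) are incompatible" % (depth, units))
        depth = units  # the sum has the same depth as the layer output
    return depth

# Depths line up through the whole stack.
print(check_residual_stack(83, [83, 83]))

# The configuration from the question fails at the first layer.
try:
    check_residual_stack(83, [128, 256])
except ValueError as err:
    print(err)
```

Under this model, `[83, 256]` only works once the wrapper is removed, as in the code above; with the wrapper kept, every layer would need 83 units.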
The unit-size of your LSTM cells should match the number of features. To fix this, use [83, 256].
Also, I'm aware that the connections between the LSTM layers are fully connected, but from what I've been told it's better to keep the unit size in the layers consistent to make things less confusing. In other words, consider using [83, 83] for your unit-sizes.

Python Deap GP Evaluating individual causes error

I am currently experiencing an issue whenever I try to evaluate an individual using the GP portion of DEAP.
I receive the following error:
Traceback (most recent call last):
File "ImageGP.py", line 297, in <module>
pop, logs = algorithms.eaSimple(pop, toolbox, 0.9, 0.1, 60, stats=mstats, halloffame=hof, verbose=True)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/deap/algorithms.py", line 148, in eaSimple
for ind, fit in zip(invalid_ind, fitnesses):
File "ImageGP.py", line 229, in evalFunc
func = toolbox.compile(expr=individual)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/deap/gp.py", line 451, in compile
return eval(code, pset.context, {})
File "<string>", line 1
lambda oValue,oAvg13,oAvg17,oAvg21,sobelVal(v),sobelVal(h),edgeVal,blotchVal: [[[0, 75, 82.2857142857, 83.0, 82.9090909091, 4, 12, 4, 180], ... Proceed to print out all of my data ... [0, 147, 151.244897959, 150.728395062, 150.73553719, 248, 244, 5, 210]]]
^
SyntaxError: invalid syntax
If anyone has any ideas about what could be causing this problem, then I would really appreciate some advice. My current evaluation function looks like this:
def evalFunc(individual, data, points):
    func = toolbox.compile(expr=individual)
    total = 1.0
    for point in points:
        tmp = [float(x) for x in data[point[1]][point[0]][1:9]]
        total += int((0 if (func(*tmp)) < 0 else 1) == points[2])
    print ("Fitness: " + str(total))
    return total,
Here, data contains the data being used (the values for the 8 variables listed in the error), and each point specifies the x and y coordinates from which to get those 8 values. Thank you for your suggestions!
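One thing the traceback itself suggests: the generated lambda lists `sobelVal(v)` and `sobelVal(h)` as parameters, and names containing parentheses are not valid Python identifiers, so `eval` fails on them. A minimal plain-Python sketch of that check follows. `invalid_argument_names` is a hypothetical helper, and how the names were registered with the primitive set is not shown in the question.

```python
def invalid_argument_names(names):
    """Return the names that cannot be used as Python lambda parameters."""
    return [name for name in names if not name.isidentifier()]

# Argument names as they appear in the failing lambda from the traceback.
arg_names = ['oValue', 'oAvg13', 'oAvg17', 'oAvg21',
             'sobelVal(v)', 'sobelVal(h)', 'edgeVal', 'blotchVal']

print(invalid_argument_names(arg_names))

# Compiling a lambda with those names reproduces a SyntaxError.
try:
    compile('lambda %s: 0' % ','.join(arg_names), '<string>', 'eval')
except SyntaxError as err:
    print('SyntaxError:', err.msg)
```

Renaming the two arguments to valid identifiers, for example `sobelVal_v` and `sobelVal_h` (hypothetical names), presumably wherever the primitive set's arguments are named, would likely avoid this particular error.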
