Problem: python-weka-wrapper3 with GridSearch (python-3.x)

Hi, I'm having problems with this code. I'm trying to run a grid search to optimize some properties:
from weka.classifiers import Classifier, GridSearch

grid = GridSearch(options=["-sample-size", "100.0", "-traversal", "ROW-WISE", "-num-slots", "1", "-S", "1"])
grid.evaluation = "CC"
grid.y = {"property": "kernel.gamma", "min": -3.0, "max": 3.0, "step": 1.0, "base": 10.0, "expression": "pow(BASE,I)"}
grid.x = {"property": "C", "min": -3.0, "max": 3.0, "step": 1.0, "base": 10.0, "expression": "pow(BASE,I)"}
cls = Classifier(
    classname="weka.classifiers.functions.SMOreg",
    options=["-K", "weka.classifiers.functions.supportVector.RBFKernel"])
grid.classifier = cls
grid.build_classifier(train)
print("Model:\n" + str(grid))
print("\nBest setup:\n" + grid.best.to_commandline())
I'm getting these errors:
Failed to get class weka.classifiers.meta.GridSearch
Exception in thread "Thread-0" java.lang.NoClassDefFoundError: weka.classifiers.meta.GridSearch
Caused by: java.lang.ClassNotFoundException: weka.classifiers.meta.GridSearch
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/tmp/ipykernel_11771/802912337.py in <module>
9 options=["-K", "weka.classifiers.functions.supportVector.RBFKernel"])
10 grid.classifier = cls
---> 11 grid.build_classifier(train)
12 print("Model:\n" + str(grid))
13 print("\nBest setup:\n" + grid.best.to_commandline())
NameError: name 'train' is not defined
Any suggestions?

The GridSearch and MultiSearch meta-classifiers are available through Weka packages; they are not part of the core Weka distribution. In order to use them you need to install the respective packages.
If you want to use pww3 for installing/managing packages, take a look at the examples on packages, e.g. the sketch below.
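For example, a minimal sketch of installing the package with pww3 (assuming the package is named gridSearch in the official Weka package repository, and using the helper names from the pww3 packages examples; the JVM must be started with package support enabled):
import weka.core.jvm as jvm
import weka.core.packages as packages

jvm.start(packages=True)  # enable Weka package support in the JVM

# install once; restart the JVM/script before the new classifier becomes usable
if not packages.is_installed("gridSearch"):
    packages.install_package("gridSearch")
    print("gridSearch installed, please restart the script")

jvm.stop()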

Related

After installing Mayavi using pip - ModuleNotFoundError: No module named 'mayavi'

I am using VS Code to plot a series of XYZ coordinates, and I looked for a way to convert the scatter-point plot into a 3D surface that can be exported to an STL file.
While searching for a solution I stumbled upon the mayavi library, which to my impression is the best fit for my needs.
I tried to run one of the example codes listed on their site and since then I've been losing it.
I have successfully installed mayavi, but I still get an error; I've looked everywhere and found no solution.
(VTK and PyQt are installed properly.)
Here's the full code and error:
import numpy as np
from mayavi.mlab import *
from mayavi import mlab

def test_points3d():
    t = np.linspace(0, 4 * np.pi, 20)
    x = np.sin(2 * t)
    y = np.cos(t)
    z = np.cos(2 * t)
    s = 2 + np.sin(t)
    return points3d(x, y, z, s, colormap="copper", scale_factor=.25)

# View it.
test_points3d()
# s = mlab.mesh()  # mesh() needs x, y, z arrays; left out here
mlab.show()
[{
"resource": "/c:/Users/dvirc/OneDrive/Desktop/dvir/.vs/Desktop/v16/DvirCodes/trysurf.py",
"owner": "generated_diagnostic_collection_name#2",
"code": {
"value": "reportMissingImports",
"target": {
"$mid": 1,
"external": "https://github.com/microsoft/pylance-release/blob/main/DIAGNOSTIC_SEVERITY_RULES.md#diagnostic-severity-rules",
"path": "/microsoft/pylance-release/blob/main/DIAGNOSTIC_SEVERITY_RULES.md",
"scheme": "https",
"authority": "github.com",
"fragment": "diagnostic-severity-rules"
}
},
"severity": 4,
"message": "Import "mayavi" could not be resolved",
"source": "Pylance",
"startLineNumber": 3,
"startColumn": 6,
"endLineNumber": 3,
"endColumn": 12
}]

Get feature importance for a model trained using RandomizedSearchCV and Multinomial Naive Bayes

After fitting the model I cannot get the feature importance. I have done the following steps:
model_bow = RandomizedSearchCV(MultinomialNB(class_prior=[0.5, 0.5]), param_distributions={'alpha': alpha}, scoring='roc_auc', n_iter=10, return_train_score=True)
model_bow.fit(X_train_en_bow, y_train)
model_bow.best_params_ gives me the best value: {'alpha': 0.5}
Now, if I ask for the feature importance using model_bow.estimator.feature_log_prob_, I get the error below.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-40-cc27acc11117> in <module>
----> 1 model_bow.estimator.feature_log_prob_()
AttributeError: 'MultinomialNB' object has no attribute 'feature_log_prob_'
When I print model_bow it shows:
param_distributions={'alpha': [1e-05, 0.0001, 0.001, 0.01,
0.1, 0.5, 1, 5, 10, 50,
100]},
return_train_score=True, scoring='roc_auc')
Please advise where I am going wrong!
To get feature importance, take the best model found during the search, i.e. model_bow.best_estimator_ (the estimator refit with model_bow.best_params_).
On that fitted estimator you can read the coef_ attribute (which will get deprecated soon) or feature_log_prob_ to get the feature importance.
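A minimal sketch of what that looks like, assuming the search was fitted with the default refit=True so that best_estimator_ is available:
import numpy as np

# best_estimator_ is the MultinomialNB refit on the full training data
# with the best hyper-parameters found by the search
best_nb = model_bow.best_estimator_

# feature_log_prob_ is an attribute (not a method), shape (n_classes, n_features)
log_probs = best_nb.feature_log_prob_

# e.g. the indices of the 10 highest-weighted features for class 1
top10 = np.argsort(log_probs[1])[-10:][::-1]
print(top10)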

Get feature importance with PySpark and XGBoost

I have trained a model using XGBoost and PySpark:
params = {
    'eta': 0.1,
    'gamma': 0.1,
    'missing': 0.0,
    'treeMethod': 'gpu_hist',
    'maxDepth': 10,
    'maxLeaves': 256,
    'growPolicy': 'depthwise',
    'objective': 'binary:logistic',
    'minChildWeight': 30.0,
    'lambda_': 1.0,
    'scalePosWeight': 2.0,
    'subsample': 1.0,
    'nthread': 1,
    'numRound': 100,
    'numWorkers': 1,
}
classifier = XGBoostClassifier(**params).setLabelCol(label).setFeaturesCols(features)
model = classifier.fit(train_data)
When I try to get the feature importance using
model.nativeBooster.getFeatureScore()
It returns the following error:
Py4JError: An error occurred while calling o2167.getFeatureScore. Trace:
py4j.Py4JException: Method getFeatureScore([]) does not exist
Is there a correct way of getting feature importance when using XGBoost with PySpark?
I am a newbie in this field, but I happened to encounter the same thing you are experiencing. You may want to try using model.nativeBooster.getScore("", "gain") or model.nativeBooster.getFeatureScore('').
My model is of type sparkxgb.xgboost.XGBoostClassificationModel.
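As a hedged sketch of turning those raw scores into named importances (my addition, not part of the answer above): it assumes features is the list of feature column names used at training time, and that the JVM map returned by getScore converts to a Python dict; depending on your Py4J setup an explicit conversion may be needed.
# Booster scores come back keyed as 'f0', 'f1', ... in training-column order.
raw_scores = dict(model.nativeBooster.getScore("", "gain"))

# Map the opaque 'fN' keys back to the original feature names and
# sort by gain, highest first.
importances = {features[int(k[1:])]: v for k, v in raw_scores.items()}
for name, gain in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(name, gain)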

Fit convergence failure in pyhf for small signal model

(This is a question that we (the pyhf dev team) recently got and thought was good and worth sharing. So we're posting a modified version of it here.)
I am trying to do a simple hypothesis test with pyhf v0.4.0. The model I am using has a small signal and so I need to scan signal strengths almost all the way out to mu=100. However, I am consistently getting a convergence problem. Why is the fit failing to converge?
The following is my environment, the code I'm using, and my error.
Environment
$ "$(which python3)" --version
Python 3.7.5
$ python3 -m venv "${HOME}/.venvs/example"
$ . "${HOME}/.venvs/example/bin/activate"
(example) $ python -m pip install --upgrade pip setuptools wheel
(example) $ cat requirements.txt
pyhf~=0.4.0
black
(example) $ python -m pip install -r requirements.txt
(example) $ pip list
Package Version
------------------ --------
appdirs 1.4.3
attrs 19.3.0
black 19.10b0
Click 7.0
importlib-metadata 1.5.0
jsonpatch 1.25
jsonpointer 2.0
jsonschema 3.2.0
numpy 1.18.1
pathspec 0.7.0
pip 20.0.2
pkg-resources 0.0.0
pyhf 0.4.0
pyrsistent 0.15.7
PyYAML 5.3
regex 2020.1.8
scipy 1.4.1
setuptools 45.1.0
six 1.14.0
toml 0.10.0
tqdm 4.42.1
typed-ast 1.4.1
wheel 0.34.2
zipp 2.1.0
Code
# example.py
import pyhf
from pyhf import Model, infer


def main():
    signal = [0.00000000e+00,2.16147594e-04,4.26391320e-04,8.53157029e-04,
              7.95947245e-04,1.85458682e-03,3.15515589e-03,4.22895664e-03,
              4.65887617e-03,7.35380863e-03,8.71947686e-03,7.94697901e-03,
              1.02721341e-02,9.24346489e-03,9.38926633e-03,9.68742497e-03,
              8.11072856e-03,7.71003446e-03,6.80873211e-03,5.43234586e-03,
              4.98376829e-03,4.72218222e-03,3.40645378e-03,3.44950579e-03,
              2.61473009e-03,2.18345641e-03,2.00960464e-03,1.33786215e-03,
              1.18440675e-03,8.36366201e-04,5.99855228e-04,4.27406780e-04,
              2.71607026e-04,1.81370902e-04,1.03710513e-04,4.42737056e-05,
              2.25835175e-05,1.04470885e-05,4.08162922e-06,3.20004812e-06,
              3.37990384e-07,6.72843977e-07,0.00000000e+00,9.08675772e-08,
              0.00000000e+00]
    bkgrd = [1.47142981e+03,9.07095061e+02,9.11188195e+02,7.06123452e+02,
             6.08054685e+02,5.23577562e+02,4.41672633e+02,4.00423307e+02,
             3.59576067e+02,3.26368076e+02,2.88077216e+02,2.48887339e+02,
             2.20355981e+02,1.91623853e+02,1.57733823e+02,1.32733279e+02,
             1.12789438e+02,9.53141118e+01,8.15735557e+01,6.89604141e+01,
             5.64245978e+01,4.49094779e+01,3.95547919e+01,3.13005748e+01,
             2.55212288e+01,1.93057913e+01,1.48268648e+01,1.13639821e+01,
             8.64408136e+00,5.81608649e+00,3.98839138e+00,2.61636610e+00,
             1.55906281e+00,1.08550560e+00,5.57450828e-01,2.25258250e-01,
             2.05230728e-01,1.28735312e-01,6.13798028e-02,2.00805073e-02,
             5.91436617e-02,0.00000000e+00,0.00000000e+00,0.00000000e+00,
             0.00000000e+00]

    spec = {
        "channels": [
            {
                "name": "singlechannel",
                "samples": [
                    {
                        "name": "signal",
                        "data": signal,
                        "modifiers": [
                            {"name": "mu", "type": "normfactor", "data": None}
                        ],
                    },
                    {"name": "background", "data": bkgrd, "modifiers": []},
                ],
            }
        ]
    }
    model = pyhf.Model(spec)

    hypo_tests = pyhf.infer.hypotest(
        1.0,
        model.expected_data([0]),
        model,
        0.5,
        [(0, 80)],
        return_expected_set=True,
        return_test_statistics=True,
        qtilde=True,
    )
    print(hypo_tests)


if __name__ == "__main__":
    main()
Error
(example) $ python example.py
/home/jovyan/.venvs/example/lib/python3.7/site-packages/pyhf/tensor/numpy_backend.py:253: RuntimeWarning: divide by zero encountered in log
return n * np.log(lam) - lam - gammaln(n + 1.0)
/home/jovyan/.venvs/example/lib/python3.7/site-packages/pyhf/tensor/numpy_backend.py:253: RuntimeWarning: invalid value encountered in multiply
return n * np.log(lam) - lam - gammaln(n + 1.0)
ERROR:pyhf.optimize.opt_scipy: fun: nan
jac: array([nan])
message: 'Iteration limit exceeded'
nfev: 1300003
nit: 100001
njev: 100001
status: 9
success: False
x: array([0.499995])
Traceback (most recent call last):
File "example.py", line 65, in <module>
main()
File "example.py", line 59, in main
qtilde=True,
File "/home/jovyan/.venvs/example/lib/python3.7/site-packages/pyhf/infer/__init__.py", line 82, in hypotest
asimov_data = generate_asimov_data(asimov_mu, data, pdf, init_pars, par_bounds)
File "/home/jovyan/.venvs/example/lib/python3.7/site-packages/pyhf/infer/utils.py", line 8, in generate_asimov_data
bestfit_nuisance_asimov = fixed_poi_fit(asimov_mu, data, pdf, init_pars, par_bounds)
File "/home/jovyan/.venvs/example/lib/python3.7/site-packages/pyhf/infer/mle.py", line 62, in fixed_poi_fit
**kwargs,
File "/home/jovyan/.venvs/example/lib/python3.7/site-packages/pyhf/optimize/opt_scipy.py", line 47, in minimize
assert result.success
AssertionError
Looking at the model, the background estimate shouldn't be zero, so add an epsilon of 1e-7 to it and then a 1% background uncertainty. The real issue here, though, is that reasonable intervals for the signal strength are μ ∈ [0, 10]. If your model isn't sensitive to a signal strength in this range, then you should test a new signal model which is the original signal scaled by some scale factor.
Environment
For visualization purposes let's extend the environment a bit
(example) $ cat requirements.txt
pyhf~=0.4.0
black
matplotlib~=3.1
altair~=4.0
Code
# answer.py
import pyhf
from pyhf import Model, infer
import numpy as np
import matplotlib.pyplot as plt
import pyhf.contrib.viz.brazil


def invert_interval(test_mus, hypo_tests, test_size=0.05):
    cls_obs = np.array([test[0] for test in hypo_tests]).flatten()
    cls_exp = [
        np.array([test[1][i] for test in hypo_tests]).flatten() for i in range(5)
    ]
    crossing_test_stats = {"exp": [], "obs": None}
    for cls_exp_sigma in cls_exp:
        crossing_test_stats["exp"].append(
            np.interp(
                test_size, list(reversed(cls_exp_sigma)), list(reversed(test_mus))
            )
        )
    crossing_test_stats["obs"] = np.interp(
        test_size, list(reversed(cls_obs)), list(reversed(test_mus))
    )
    return crossing_test_stats


def main():
    unscaled_signal = [0.00000000e+00,2.16147594e-04,4.26391320e-04,8.53157029e-04,
                       7.95947245e-04,1.85458682e-03,3.15515589e-03,4.22895664e-03,
                       4.65887617e-03,7.35380863e-03,8.71947686e-03,7.94697901e-03,
                       1.02721341e-02,9.24346489e-03,9.38926633e-03,9.68742497e-03,
                       8.11072856e-03,7.71003446e-03,6.80873211e-03,5.43234586e-03,
                       4.98376829e-03,4.72218222e-03,3.40645378e-03,3.44950579e-03,
                       2.61473009e-03,2.18345641e-03,2.00960464e-03,1.33786215e-03,
                       1.18440675e-03,8.36366201e-04,5.99855228e-04,4.27406780e-04,
                       2.71607026e-04,1.81370902e-04,1.03710513e-04,4.42737056e-05,
                       2.25835175e-05,1.04470885e-05,4.08162922e-06,3.20004812e-06,
                       3.37990384e-07,6.72843977e-07,0.00000000e+00,9.08675772e-08,
                       0.00000000e+00]
    bkgrd = [1.47142981e+03,9.07095061e+02,9.11188195e+02,7.06123452e+02,
             6.08054685e+02,5.23577562e+02,4.41672633e+02,4.00423307e+02,
             3.59576067e+02,3.26368076e+02,2.88077216e+02,2.48887339e+02,
             2.20355981e+02,1.91623853e+02,1.57733823e+02,1.32733279e+02,
             1.12789438e+02,9.53141118e+01,8.15735557e+01,6.89604141e+01,
             5.64245978e+01,4.49094779e+01,3.95547919e+01,3.13005748e+01,
             2.55212288e+01,1.93057913e+01,1.48268648e+01,1.13639821e+01,
             8.64408136e+00,5.81608649e+00,3.98839138e+00,2.61636610e+00,
             1.55906281e+00,1.08550560e+00,5.57450828e-01,2.25258250e-01,
             2.05230728e-01,1.28735312e-01,6.13798028e-02,2.00805073e-02,
             5.91436617e-02,0.00000000e+00,0.00000000e+00,0.00000000e+00,
             0.00000000e+00]

    scale_factor = 500
    signal = np.asarray(unscaled_signal) * scale_factor
    epsilon = 1e-7
    background = np.asarray(bkgrd) + epsilon

    spec = {
        "channels": [
            {
                "name": "singlechannel",
                "samples": [
                    {
                        "name": "signal",
                        "data": signal.tolist(),
                        "modifiers": [
                            {"name": "mu", "type": "normfactor", "data": None}
                        ],
                    },
                    {
                        "name": "background",
                        "data": background.tolist(),
                        "modifiers": [
                            {
                                "name": "uncert",
                                "type": "shapesys",
                                "data": (0.01 * background).tolist(),
                            },
                        ],
                    },
                ],
            }
        ]
    }
    model = pyhf.Model(spec)
    init_pars = model.config.suggested_init()
    par_bounds = model.config.suggested_bounds()
    data = model.expected_data(init_pars)

    cls_obs, cls_exp = pyhf.infer.hypotest(
        1.0,
        data,
        model,
        init_pars,
        par_bounds,
        return_expected_set=True,
        return_test_statistics=True,
        qtilde=True,
    )

    # Show that the scale factor chosen gives reasonable values
    print(f"Observed CLs for µ=1: {cls_obs[0]:.2f}")
    print("-----")
    for idx, n_sigma in enumerate(np.arange(-2, 3)):
        print(
            "Expected {}CLs for µ=1: {:.3f}".format(
                " " if n_sigma == 0 else "({} σ) ".format(n_sigma),
                cls_exp[idx][0],
            )
        )

    # Perform hypothesis test scan
    _start = 0.1
    _stop = 4
    _step = 0.1
    poi_tests = np.arange(_start, _stop + _step, _step)

    print("\nPerforming hypothesis tests\n")
    hypo_tests = [
        pyhf.infer.hypotest(
            mu_test,
            data,
            model,
            init_pars,
            par_bounds,
            return_expected_set=True,
            return_test_statistics=True,
            qtilde=True,
        )
        for mu_test in poi_tests
    ]

    # This is all you need. Below is just to demonstrate.
    # Upper limits on signal strength
    results = invert_interval(poi_tests, hypo_tests)

    print(f"Observed Limit on µ: {results['obs']:.2f}")
    print("-----")
    for idx, n_sigma in enumerate(np.arange(-2, 3)):
        print(
            "Expected {}Limit on µ: {:.3f}".format(
                " " if n_sigma == 0 else "({} σ) ".format(n_sigma),
                results["exp"][idx],
            )
        )

    # Visualize the "Brazil band"
    fig, ax = plt.subplots()
    fig.set_size_inches(7, 5)
    ax.set_title("Hypothesis Tests")
    ax.set_ylabel("CLs")
    ax.set_xlabel(f"µ (for Signal x {scale_factor})")
    pyhf.contrib.viz.brazil.plot_results(ax, poi_tests, hypo_tests)
    fig.savefig("brazil_band.pdf")


if __name__ == "__main__":
    main()
Output
The value that the signal needs to be scaled by can be determined by trying a few scale factor values until the CLs values for a signal strength of mu=1 begin to look reasonable (something larger than 1e-3 or so); a sketch of such a probe is given after the output below. In this particular example, a scale factor of 500 seems okay.
The upper limit on the unscaled signal strength is then just the observed limit divided by the scale factor (here 2.22 / 500 ≈ 4.4e-3), so in this case there is obviously no sensitivity.
(example) $ python answer.py
Observed CLs for µ=1: 0.54
-----
Expected (-2 σ) CLs for µ=1: 0.014
Expected (-1 σ) CLs for µ=1: 0.049
Expected CLs for µ=1: 0.157
Expected (1 σ) CLs for µ=1: 0.403
Expected (2 σ) CLs for µ=1: 0.737
Performing hypothesis tests
Observed Limit on µ: 2.22
-----
Expected (-2 σ) Limit on µ: 0.746
Expected (-1 σ) Limit on µ: 0.998
Expected Limit on µ: 1.392
Expected (1 σ) Limit on µ: 1.953
Expected (2 σ) Limit on µ: 2.638
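For completeness, here is a minimal sketch of the scale-factor probe mentioned above. It assumes a hypothetical build_model(scale_factor) helper that rebuilds the spec from the answer with the signal scaled accordingly:
# Probe candidate scale factors until CLs at mu=1 is no longer tiny (> ~1e-3).
# build_model() is a hypothetical wrapper around the spec construction above.
for scale_factor in (1, 10, 100, 500):
    model = build_model(scale_factor)
    data = model.expected_data(model.config.suggested_init())
    cls_obs = pyhf.infer.hypotest(1.0, data, model, qtilde=True)
    print(f"scale factor {scale_factor}: CLs(mu=1) = {float(cls_obs):.4f}")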

ONNXRuntime Issue: Output:Y [ShapeInferenceError] Mismatch between number of source and target dimensions

I am trying to build an ONNX graph using the helper APIs. The simplest example I started with is the following: a MatMul op that takes two [1] matrix inputs (X and W) and produces a [1] matrix output Y.
import numpy as np
import onnxruntime as rt
from onnx import *
from onnxmltools.utils import save_model

initializer = []
initializer.append(helper.make_tensor(name="W", data_type=TensorProto.FLOAT, dims=(1,), vals=np.ones(1).tolist()))

graph = helper.make_graph(
    [
        helper.make_node('MatMul', ["X", "W"], ["Y"]),
    ],
    "TEST",
    [
        helper.make_tensor_value_info('X', TensorProto.FLOAT, [1]),
        helper.make_tensor_value_info('W', TensorProto.FLOAT, [1]),
    ],
    [
        helper.make_tensor_value_info('Y', TensorProto.FLOAT, [1]),
    ],
    initializer=initializer,
)

checker.check_graph(graph)
model = helper.make_model(graph, producer_name='TEST')
save_model(model, "model.onnx")

sess = rt.InferenceSession('model.onnx')
When I run this, it complains:
Traceback (most recent call last):
File "onnxruntime_test.py", line 35, in <module>
sess = rt.InferenceSession('model.onnx')
File "/usr/local/lib/python3.5/dist-packages/onnxruntime/capi/session.py", line 29, in __init__
self._sess.load_model(path_or_bytes)
RuntimeError: [ONNXRuntimeError] : 1 : GENERAL ERROR : Node: Output:Y [ShapeInferenceError] Mismatch between number of source and target dimensions. Source=0 Target=1
I have been stuck here for hours. Could anybody please help me?
See https://github.com/microsoft/onnxruntime/issues/380
I changed a few places to make your code work. Below is the new version:
import numpy as np
import onnxruntime as rt
import onnx
from onnx import *
from onnx import utils

initializer = []
initializer.append(helper.make_tensor(name="W", data_type=TensorProto.FLOAT, dims=(1,), vals=np.ones(1).tolist()))

graph = helper.make_graph(
    [
        helper.make_node('MatMul', ["X", "W"], ["Y"]),
    ],
    "TEST",
    [
        helper.make_tensor_value_info('X', TensorProto.FLOAT, [1]),
        helper.make_tensor_value_info('W', TensorProto.FLOAT, [1]),
    ],
    [
        helper.make_tensor_value_info('Y', TensorProto.FLOAT, []),
    ],
    initializer=initializer,
)

checker.check_graph(graph)
model = helper.make_model(graph, producer_name='TEST')
final_model = onnx.utils.polish_model(model)
onnx.save(final_model, 'model.onnx')

sess = rt.InferenceSession('model.onnx')
To represent a scalar, you should use a shape of [], not [1].
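To see why Y comes out as a scalar here: ONNX MatMul follows numpy.matmul semantics, and multiplying two 1-D tensors contracts both axes away, leaving a rank-0 result. A quick numpy check (my illustration, under that assumption):
import numpy as np

# matmul of two 1-D arrays is their inner product: a rank-0 (scalar) result,
# which corresponds to shape [] in ONNX, not [1]
y = np.matmul(np.ones(1, dtype=np.float32), np.ones(1, dtype=np.float32))
print(y, np.shape(y))  # 1.0 ()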
