ONNXRuntime Issue: Output:Y [ShapeInferenceError] Mismatch between number of source and target dimensions - onnx

I am trying to build a onnx graph using helper APIs. The simplest example I started is the following. A MatMul op that takes two [1] matrix inputs (X and W), and produces [1] matrix output Y.
import numpy as np
import onnxruntime as rt
from onnx import *
from onnxmltools.utils import save_mode
initializer = []
initializer.append(helper.make_tensor(name="W", data_type=TensorProto.FLOAT, dims=(1,), vals=np.ones(1).tolist()))
graph = helper.make_graph(
[
helper.make_node('MatMul', ["X", "W"], ["Y"]),
],
"TEST",
[
helper.make_tensor_value_info('X' , TensorProto.FLOAT, [1]),
helper.make_tensor_value_info('W', TensorProto.FLOAT, [1]),
],
[
helper.make_tensor_value_info('Y', TensorProto.FLOAT, [1]),
],
initializer=initializer,
)
checker.check_graph(graph)
model = helper.make_model(graph, producer_name='TEST')
save_model(model, "model.onnx")
sess = rt.InferenceSession('model.onnx')
When I ran this, it complains like this:
Traceback (most recent call last):
File "onnxruntime_test.py", line 35, in <module>
sess = rt.InferenceSession('model.onnx')
File "/usr/local/lib/python3.5/dist-packages/onnxruntime/capi/session.py", line 29, in __init__
self._sess.load_model(path_or_bytes)
RuntimeError: [ONNXRuntimeError] : 1 : GENERAL ERROR : Node: Output:Y [ShapeInferenceError] Mismatch between number of source and target dimensions. Source=0 Target=1
I am stuck here for hours. Could anybody please give me any help?

See https://github.com/microsoft/onnxruntime/issues/380
I changed a few place to make your code work. Below is the new one
import numpy as np
import onnxruntime as rt
from onnx import *
from onnx import utils
initializer = []
initializer.append(helper.make_tensor(name="W", data_type=TensorProto.FLOAT, dims=(1,), vals=np.ones(1).tolist()))
graph = helper.make_graph(
[
helper.make_node('MatMul', ["X", "W"], ["Y"]),
],
"TEST",
[
helper.make_tensor_value_info('X' , TensorProto.FLOAT, [1]),
helper.make_tensor_value_info('W', TensorProto.FLOAT, [1]),
],
[
helper.make_tensor_value_info('Y', TensorProto.FLOAT, []),
],
initializer=initializer,
)
checker.check_graph(graph)
model = helper.make_model(graph, producer_name='TEST')
final_model = onnx.utils.polish_model(model)
onnx.save(final_model, 'model.onnx')
sess = rt.InferenceSession('model.onnx')
To represent a scalar, you should use shape of "[]" ,not "[1]".

Related

Incompatible row dimensions when using passthrough in GridSearch over sklearn Pipeline with FeatureUnion

I am trying to do grid search over a sklearn pipeline that uses a custom transformer in a pipeline with FeatureUnion. It works fine when the pipeline uses the custom transformer class in FeatureUnion; however, it fails when the custom class is ignored in the pipeline by setting passthrough in the grid search parameters.
The full pipeline is defined as follows:
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline, FeatureUnion
ngram_vectorizer = Pipeline([
("vectorizer", CountVectorizer(analyzer="char_wb", ngram_range=(1,3))),
("tfidf", TfidfTransformer())
])
pipe_full = Pipeline(
[
("features", FeatureUnion(
[
("ngrams", ngram_vectorizer),
("lengths", TextLengthExtractor())
]
)
),
("classifier", MultinomialNB())
]
)
The custom transformer class TextLengthExtractor simply computes the number of characters from an input string:
from sklearn.base import BaseEstimator, TransformerMixin
class TextLengthExtractor(BaseEstimator, TransformerMixin):
def fit(self, X, y = None):
return self
def transform(self, X, y = None):
string_lengths = np.array([len(doc) for doc in X])
return string_lengths.reshape(-1,1)
The tuning parameters for grid search are defined through a dictionary params. Importantly, the parameters for the custom TextLengthExtractor contain the passthrough option to ignore the entire features__lengths step from the pipeline (see also the sklearn's documentation on pipelines):
params = {
"features__lengths": [TextLengthExtractor(), "passthrough"],
"features__ngrams__vectorizer__ngram_range" : [(1,3), (2,6)],
}
When the pipeline is fit on the following dummy data
X_train_dummy = ["a", "ab", "a bc", "aaaaa", "b ab cc b", "ba", "baba", "cc bb aa", "c", "bca"]
y_train_dummy = [1,0,1, 1, 0, 1, 0, 1, 0, 0]
pipe_full.fit(X_train_dummy, y_train_dummy)
it can be seen that the lengths step of the FeatureUnion pipeline works as expected:
pipe_full["features"].get_params()["lengths"].transform(X_train_dummy)
# gives the following output of shape (10,1)
# array([[1], [2], [4], [5], [9], [2], [4], [8], [1], [3]])
However - and now comes the problem - when grid search is performed as follows:
from sklearn.model_selection import GridSearchCV
grid_search = GridSearchCV(pipe_full, params, cv=5, n_jobs=-1, verbose=10)
grid_search.fit(X_train_dummy, y_train_dummy)
all fits that ignore the lengths step (as defined by the passthrough option from params["features__lengths"] throw the following error:
5 fits failed out of a total of 10.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.
Below are more details about the failures:
--------------------------------------------------------------------------------
5 fits failed with the following error:
Traceback (most recent call last):
File "C:\dev\NameClassification\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 686, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "C:\dev\NameClassification\venv\lib\site-packages\sklearn\pipeline.py", line 378, in fit
Xt = self._fit(X, y, **fit_params_steps)
File "C:\dev\NameClassification\venv\lib\site-packages\sklearn\pipeline.py", line 336, in _fit
X, fitted_transformer = fit_transform_one_cached(
File "C:\dev\NameClassification\venv\lib\site-packages\joblib\memory.py", line 349, in __call__
return self.func(*args, **kwargs)
File "C:\dev\NameClassification\venv\lib\site-packages\sklearn\pipeline.py", line 870, in _fit_transform_one
res = transformer.fit_transform(X, y, **fit_params)
File "C:\dev\NameClassification\venv\lib\site-packages\sklearn\pipeline.py", line 1162, in fit_transform
return self._hstack(Xs)
File "C:\dev\NameClassification\venv\lib\site-packages\sklearn\pipeline.py", line 1216, in _hstack
Xs = sparse.hstack(Xs).tocsr()
File "C:\dev\NameClassification\venv\lib\site-packages\scipy\sparse\_construct.py", line 532, in hstack
return bmat([blocks], format=format, dtype=dtype)
File "C:\dev\NameClassification\venv\lib\site-packages\scipy\sparse\_construct.py", line 665, in bmat
raise ValueError(msg)
ValueError: blocks[0,:] has incompatible row dimensions. Got blocks[0,1].shape[0] == 1, expected 8.
I do understand that both steps require identical row dimensions for both ngrams and lengths in the FeatureUnion, where the number of rows in the extracted feature matrices must equal the number of samples in the respective split. However, I have no idea how to control the shape of matrices when ignoring the lengths part of FeatureUnion using the passthrough option in the gird search params.
I have found any solution to the problem on SE or any other sklearn related resource. Does anyone have an idea on how to solve the issue?
I think I found the solution to the problem: To ignore an individual step in a FeatureUnion, the string drop rather than passthrough must be used. According to sklearn's documentation of FeatureUnion:
Parameters of the transformers may be set using its name and the parameter name separated by a '__'. A transformer may be replaced entirely by setting the parameter with its name to another transformer, removed by setting to 'drop' or disabled by setting to 'passthrough' (features are passed without transformation).
An example of dropping an entire transformer in FeatureUnion is also shown in sklearn's user guide on pipelines.
In conclusion, to solve my problem, I had to replace passthrough with drop in the grid search parameter dictionary as follows
Change from
params = {
"features__lengths": [TextLengthExtractor(), "passthrough"],
"features__ngrams__vectorizer__ngram_range" : [(1,3), (2,6)],
}
to
params = {
"features__lengths": [TextLengthExtractor(), "drop"],
"features__ngrams__vectorizer__ngram_range" : [(1,3), (2,6)],
}

Is this a bug in xgboost's XGBClassifier?

import numpy as np
from xgboost import XGBClassifier
model = XGBClassifier(
use_label_encoder=False,
label_lower_bound=0, label_upper_bound=1
# setting the bounds doesn't seem to help
)
x = np.array([ [1,2,3], [4,5,6] ], 'ushort' )
y = [ 1, 1 ]
try :
model.fit(x,y)
# this fails with ValueError:
# "The label must consist of integer labels
# of form 0, 1, 2, ..., [num_class - 1]."
except Exception as e :
print(e)
y = [ 0, 0 ]
# this works
model.fit(x,y)
model = XGBClassifier()
y = [ 1, 1 ]
# this works, but with UserWarning:
# "The use of label encoder in XGBClassifier is deprecated, etc."
model.fit(x,y)
Seems to me like label encoder is deprecated but we are FORCED to use it, if our classifications don't happen to contain a zero.
I had the same problem. I solved using use_label_encoder=False as parameter and the warning message disappear.
I think in your case the problem is that you have only 1 in your y, but XGBoost wants the target starting from 0. If you change y = [ 1, 1 ] with y = [ 0, 0 ] the UserWarning should disappear.

Fit convergence failure in pyhf for small signal model

(This is a question that we (the pyhf dev team) recently got and thought was good and worth sharing. So we're posting a modified version of it here.)
I am trying to do a simple hypothesis test with pyhf v0.4.0. The model I am using has a small signal and so I need to scan signal strengths almost all the way out to mu=100. However, I am consistently getting a convergence problem. Why is the fit failing to converge?
The following is my environment, the code I'm using, and my error.
Environment
$ "$(which python3)" --version
Python 3.7.5
$ python3 -m venv "${HOME}/.venvs/example"
$ . "${HOME}/.venvs/example/bin/activate"
(example) $ python -m pip install --upgrade pip setuptools wheel
(example) $ cat requirements.txt
pyhf~=0.4.0
black
(example) $ python -m pip install -r requirements.txt
(example) $ pip list
Package Version
------------------ --------
appdirs 1.4.3
attrs 19.3.0
black 19.10b0
Click 7.0
importlib-metadata 1.5.0
jsonpatch 1.25
jsonpointer 2.0
jsonschema 3.2.0
numpy 1.18.1
pathspec 0.7.0
pip 20.0.2
pkg-resources 0.0.0
pyhf 0.4.0
pyrsistent 0.15.7
PyYAML 5.3
regex 2020.1.8
scipy 1.4.1
setuptools 45.1.0
six 1.14.0
toml 0.10.0
tqdm 4.42.1
typed-ast 1.4.1
wheel 0.34.2
zipp 2.1.0
Code
# example.py
import pyhf
from pyhf import Model, infer
def main():
signal=[0.00000000e+00,2.16147594e-04,4.26391320e-04,8.53157029e-04,
7.95947245e-04,1.85458682e-03,3.15515589e-03,4.22895664e-03,
4.65887617e-03,7.35380863e-03,8.71947686e-03,7.94697901e-03,
1.02721341e-02,9.24346489e-03,9.38926633e-03,9.68742497e-03,
8.11072856e-03,7.71003446e-03,6.80873211e-03,5.43234586e-03,
4.98376829e-03,4.72218222e-03,3.40645378e-03,3.44950579e-03,
2.61473009e-03,2.18345641e-03,2.00960464e-03,1.33786215e-03,
1.18440675e-03,8.36366201e-04,5.99855228e-04,4.27406780e-04,
2.71607026e-04,1.81370902e-04,1.03710513e-04,4.42737056e-05,
2.25835175e-05,1.04470885e-05,4.08162922e-06,3.20004812e-06,
3.37990384e-07,6.72843977e-07,0.00000000e+00,9.08675772e-08,
0.00000000e+00]
bkgrd=[1.47142981e+03,9.07095061e+02,9.11188195e+02,7.06123452e+02,
6.08054685e+02,5.23577562e+02,4.41672633e+02,4.00423307e+02,
3.59576067e+02,3.26368076e+02,2.88077216e+02,2.48887339e+02,
2.20355981e+02,1.91623853e+02,1.57733823e+02,1.32733279e+02,
1.12789438e+02,9.53141118e+01,8.15735557e+01,6.89604141e+01,
5.64245978e+01,4.49094779e+01,3.95547919e+01,3.13005748e+01,
2.55212288e+01,1.93057913e+01,1.48268648e+01,1.13639821e+01,
8.64408136e+00,5.81608649e+00,3.98839138e+00,2.61636610e+00,
1.55906281e+00,1.08550560e+00,5.57450828e-01,2.25258250e-01,
2.05230728e-01,1.28735312e-01,6.13798028e-02,2.00805073e-02,
5.91436617e-02,0.00000000e+00,0.00000000e+00,0.00000000e+00,
0.00000000e+00]
spec = {
"channels": [
{
"name": "singlechannel",
"samples": [
{
"name": "signal",
"data": signal,
"modifiers": [
{"name": "mu", "type": "normfactor", "data": None}
],
},
{"name": "background", "data": bkgrd, "modifiers": [],},
],
}
]
}
model = pyhf.Model(spec)
hypo_tests = pyhf.infer.hypotest(
1.0,
model.expected_data([0]),
model,
0.5,
[(0, 80)],
return_expected_set=True,
return_test_statistics=True,
qtilde=True,
)
print(hypo_tests)
if __name__ == "__main__":
main()
Error
(example) $ python example.py
/home/jovyan/.venvs/example/lib/python3.7/site-packages/pyhf/tensor/numpy_backend.py:253: RuntimeWarning: divide by zero encountered in log
return n * np.log(lam) - lam - gammaln(n + 1.0)
/home/jovyan/.venvs/example/lib/python3.7/site-packages/pyhf/tensor/numpy_backend.py:253: RuntimeWarning: invalid value encountered in multiply
return n * np.log(lam) - lam - gammaln(n + 1.0)
ERROR:pyhf.optimize.opt_scipy: fun: nan
jac: array([nan])
message: 'Iteration limit exceeded'
nfev: 1300003
nit: 100001
njev: 100001
status: 9
success: False
x: array([0.499995])
Traceback (most recent call last):
File "example.py", line 65, in <module>
main()
File "example.py", line 59, in main
qtilde=True,
File "/home/jovyan/.venvs/example/lib/python3.7/site-packages/pyhf/infer/__init__.py", line 82, in hypotest
asimov_data = generate_asimov_data(asimov_mu, data, pdf, init_pars, par_bounds)
File "/home/jovyan/.venvs/example/lib/python3.7/site-packages/pyhf/infer/utils.py", line 8, in generate_asimov_data
bestfit_nuisance_asimov = fixed_poi_fit(asimov_mu, data, pdf, init_pars, par_bounds)
File "/home/jovyan/.venvs/example/lib/python3.7/site-packages/pyhf/infer/mle.py", line 62, in fixed_poi_fit
**kwargs,
File "/home/jovyan/.venvs/example/lib/python3.7/site-packages/pyhf/optimize/opt_scipy.py", line 47, in minimize
assert result.success
AssertionError
Looking at the model, the background estimate shouldn't be zero, so add an epsilon of 1e-7 to it and then an 1% background uncertainty. Though the issue here is that reasonable intervals for signal strength are between μ ∈ [0,10]. If your model is such that you aren't sensitive to a signal strength in this range then you should test a new signal model which is the original signal scaled by some scale factor.
Environment
For visualization purposes let's extend the environment a bit
(example) $ cat requirements.txt
pyhf~=0.4.0
black
matplotlib~=3.1
altair~=4.0
Code
# answer.py
import pyhf
from pyhf import Model, infer
import numpy as np
import matplotlib.pyplot as plt
import pyhf.contrib.viz.brazil
def invert_interval(test_mus, hypo_tests, test_size=0.05):
cls_obs = np.array([test[0] for test in hypo_tests]).flatten()
cls_exp = [
np.array([test[1][i] for test in hypo_tests]).flatten() for i in range(5)
]
crossing_test_stats = {"exp": [], "obs": None}
for cls_exp_sigma in cls_exp:
crossing_test_stats["exp"].append(
np.interp(
test_size, list(reversed(cls_exp_sigma)), list(reversed(test_mus))
)
)
crossing_test_stats["obs"] = np.interp(
test_size, list(reversed(cls_obs)), list(reversed(test_mus))
)
return crossing_test_stats
def main():
unscaled_signal=[0.00000000e+00,2.16147594e-04,4.26391320e-04,8.53157029e-04,
7.95947245e-04,1.85458682e-03,3.15515589e-03,4.22895664e-03,
4.65887617e-03,7.35380863e-03,8.71947686e-03,7.94697901e-03,
1.02721341e-02,9.24346489e-03,9.38926633e-03,9.68742497e-03,
8.11072856e-03,7.71003446e-03,6.80873211e-03,5.43234586e-03,
4.98376829e-03,4.72218222e-03,3.40645378e-03,3.44950579e-03,
2.61473009e-03,2.18345641e-03,2.00960464e-03,1.33786215e-03,
1.18440675e-03,8.36366201e-04,5.99855228e-04,4.27406780e-04,
2.71607026e-04,1.81370902e-04,1.03710513e-04,4.42737056e-05,
2.25835175e-05,1.04470885e-05,4.08162922e-06,3.20004812e-06,
3.37990384e-07,6.72843977e-07,0.00000000e+00,9.08675772e-08,
0.00000000e+00]
bkgrd=[1.47142981e+03,9.07095061e+02,9.11188195e+02,7.06123452e+02,
6.08054685e+02,5.23577562e+02,4.41672633e+02,4.00423307e+02,
3.59576067e+02,3.26368076e+02,2.88077216e+02,2.48887339e+02,
2.20355981e+02,1.91623853e+02,1.57733823e+02,1.32733279e+02,
1.12789438e+02,9.53141118e+01,8.15735557e+01,6.89604141e+01,
5.64245978e+01,4.49094779e+01,3.95547919e+01,3.13005748e+01,
2.55212288e+01,1.93057913e+01,1.48268648e+01,1.13639821e+01,
8.64408136e+00,5.81608649e+00,3.98839138e+00,2.61636610e+00,
1.55906281e+00,1.08550560e+00,5.57450828e-01,2.25258250e-01,
2.05230728e-01,1.28735312e-01,6.13798028e-02,2.00805073e-02,
5.91436617e-02,0.00000000e+00,0.00000000e+00,0.00000000e+00,
0.00000000e+00]
scale_factor = 500
signal = np.asarray(unscaled_signal) * scale_factor
epsilon = 1e-7
background = np.asarray(bkgrd) + epsilon
spec = {
"channels": [
{
"name": "singlechannel",
"samples": [
{
"name": "signal",
"data": signal.tolist(),
"modifiers": [
{"name": "mu", "type": "normfactor", "data": None}
],
},
{
"name": "background",
"data": background.tolist(),
"modifiers": [
{
"name": "uncert",
"type": "shapesys",
"data": (0.01 * background).tolist(),
},
],
},
],
}
]
}
model = pyhf.Model(spec)
init_pars = model.config.suggested_init()
par_bounds = model.config.suggested_bounds()
data = model.expected_data(init_pars)
cls_obs, cls_exp = pyhf.infer.hypotest(
1.0,
data,
model,
init_pars,
par_bounds,
return_expected_set=True,
return_test_statistics=True,
qtilde=True,
)
# Show that the scale factor chosen gives reasonable values
print(f"Observed CLs for µ=1: {cls_obs[0]:.2f}")
print("-----")
for idx, n_sigma in enumerate(np.arange(-2, 3)):
print(
"Expected {}CLs for µ=1: {:.3f}".format(
" " if n_sigma == 0 else "({} σ) ".format(n_sigma),
cls_exp[idx][0],
)
)
# Perform hypothesis test scan
_start = 0.1
_stop = 4
_step = 0.1
poi_tests = np.arange(_start, _stop + _step, _step)
print("\nPerforming hypothesis tests\n")
hypo_tests = [
pyhf.infer.hypotest(
mu_test,
data,
model,
init_pars,
par_bounds,
return_expected_set=True,
return_test_statistics=True,
qtilde=True,
)
for mu_test in poi_tests
]
# This is all you need. Below is just to demonstrate.
# Upper limits on signal strength
results = invert_interval(poi_tests, hypo_tests)
print(f"Observed Limit on µ: {results['obs']:.2f}")
print("-----")
for idx, n_sigma in enumerate(np.arange(-2, 3)):
print(
"Expected {}Limit on µ: {:.3f}".format(
" " if n_sigma == 0 else "({} σ) ".format(n_sigma),
results["exp"][idx],
)
)
# Visualize the "Brazil band"
fig, ax = plt.subplots()
fig.set_size_inches(7, 5)
ax.set_title("Hypothesis Tests")
ax.set_ylabel("CLs")
ax.set_xlabel(f"µ (for Signal x {scale_factor})")
pyhf.contrib.viz.brazil.plot_results(ax, poi_tests, hypo_tests)
fig.savefig("brazil_band.pdf")
if __name__ == "__main__":
main()
Output
The value that the signal needs to be scaled by can be determined by just trying a few scale factor values until the CLs values for a signal strength of mu=1 begin to look reasonable (something larger than 1e-3 or so). In this particular example, a scale factor of 500 seems okay.
The upper limit on the unscaled signal strength is then just the observed limit divided by the scale factor, which in this case there is obviously no sensitivity.
(example) $ python answer.py
Observed CLs for µ=1: 0.54
-----
Expected (-2 σ) CLs for µ=1: 0.014
Expected (-1 σ) CLs for µ=1: 0.049
Expected CLs for µ=1: 0.157
Expected (1 σ) CLs for µ=1: 0.403
Expected (2 σ) CLs for µ=1: 0.737
Performing hypothesis tests
Observed Limit on µ: 2.22
-----
Expected (-2 σ) Limit on µ: 0.746
Expected (-1 σ) Limit on µ: 0.998
Expected Limit on µ: 1.392
Expected (1 σ) Limit on µ: 1.953
Expected (2 σ) Limit on µ: 2.638

pytorch how to select channels by mask?

I want to know how do I select channels by the mask in Pytorch.
[channel1 channel2 channel3 channel4] x [1,0,0,1] --> [channel1,channel4]
I tried torch.masked_select() and it did't work.
if the input has a shape like [B,C,H,W] the output's shape should be [B,masked_C,H,W],
import torch
from torch import nn
input = torch.randn((1,5,3,3))
pool = nn.AdaptiveAvgPool2d(1)
w = torch.sigmoid(pool(input)).view(1,-1)
mask = torch.gt(w,0.5)
print(input)
print(w)
print(mask)
the output is as following:
tensor([[[[ 0.9129, -0.9763, 1.4460],
[ 0.3608, 0.5561, -1.4612],
[ 1.4953, -1.2474, 0.4069]],
[[-0.9121, 0.1261, 0.4661],
[-1.1624, -1.0266, -1.5419],
[ 1.0644, 1.0039, -0.4022]],
[[-1.8454, -0.2150, 2.3703],
[ 0.5224, 0.3366, 1.7545],
[-0.4624, 1.2639, 1.8032]],
[[-1.1558, -1.9985, -1.1336],
[-0.4400, -0.2092, 0.0677],
[-0.4172, -0.3614, -1.3193]],
[[-0.9441, -0.2944, 0.3381],
[ 1.6562, -0.5623, 0.0599],
[ 0.7229, 0.0472, -0.5122]]]])
tensor([[0.5414, 0.4341, 0.6489, 0.3156, 0.5142]])
tensor([[1, 0, 1, 0, 1]], dtype=torch.uint8)
the result I want is like this:
tensor([[[[ 0.9129, -0.9763, 1.4460],
[ 0.3608, 0.5561, -1.4612],
[ 1.4953, -1.2474, 0.4069]],
[[-1.8454, -0.2150, 2.3703],
[ 0.5224, 0.3366, 1.7545],
[-0.4624, 1.2639, 1.8032]],
[[-0.9441, -0.2944, 0.3381],
[ 1.6562, -0.5623, 0.0599],
[ 0.7229, 0.0472, -0.5122]]]])
I believe you can simply do:
input[mask]
Btw. you don't need to call sigmoid and then .gt(0.5). You can directly do .gt(0.0) without calling the sigmoid.

Training a Random Forest on Tensorflow

I am trying to train a tensorflow based random forest regression on numerical and continuos data.
When I try to fit my estimator it begins with the message below:
INFO:tensorflow:Constructing forest with params =
INFO:tensorflow:{'num_trees': 10, 'max_nodes': 1000, 'bagging_fraction': 1.0, 'feature_bagging_fraction': 1.0, 'num_splits_to_consider': 10, 'max_fertile_nodes': 0, 'split_after_samples': 250, 'valid_leaf_threshold': 1, 'dominate_method': 'bootstrap', 'dominate_fraction': 0.99, 'model_name': 'all_dense', 'split_finish_name': 'basic', 'split_pruning_name': 'none', 'collate_examples': False, 'checkpoint_stats': False, 'use_running_stats_method': False, 'initialize_average_splits': False, 'inference_tree_paths': False, 'param_file': None, 'split_name': 'less_or_equal', 'early_finish_check_every_samples': 0, 'prune_every_samples': 0, 'feature_columns': [_NumericColumn(key='Average_Score', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), _NumericColumn(key='lat', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), _NumericColumn(key='lng', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)], 'num_classes': 1, 'num_features': 2, 'regression': True, 'bagged_num_features': 2, 'bagged_features': None, 'num_outputs': 1, 'num_output_columns': 2, 'base_random_seed': 0, 'leaf_model_type': 2, 'stats_model_type': 2, 'finish_type': 0, 'pruning_type': 0, 'split_type': 0}
Then the process breaks down and I get a value error below:
ValueError: Shape must be at least rank 2 but is rank 1 for 'concat' (op: 'ConcatV2') with input shapes: [?], [?], [?], [] and with computed input tensors: input[3] = <1>.
This is the code I am using:
import tensorflow as tf
from tensorflow.contrib.tensor_forest.python import tensor_forest
from tensorflow.python.ops import resources
import pandas as pd
from tensorflow.contrib.tensor_forest.client import random_forest
from tensorflow.python.estimator.inputs import numpy_io
import numpy as np
def getFeatures():
Average_Score = tf.feature_column.numeric_column('Average_Score')
lat = tf.feature_column.numeric_column('lat')
lng = tf.feature_column.numeric_column('lng')
return [Average_Score,lat ,lng]
# Import hotel data
Hotel_Reviews=pd.read_csv("./DataMining/Hotel_Reviews.csv")
Hotel_Reviews_Filtered=Hotel_Reviews[(Hotel_Reviews.lat.notnull() |
Hotel_Reviews.lng.notnull())]
Hotel_Reviews_Filtered_Target = Hotel_Reviews_Filtered[["Reviewer_Score"]]
Hotel_Reviews_Filtered_Features = Hotel_Reviews_Filtered[["Average_Score","lat","lng"]]
#Preprocess the data
x=Hotel_Reviews_Filtered_Features.to_dict('list')
for key in x:
x[key] = np.array(x[key])
y=Hotel_Reviews_Filtered_Target.values
#specify params
params = tf.contrib.tensor_forest.python.tensor_forest.ForestHParams(
feature_colums= getFeatures(),
num_classes=1,
num_features=2,
regression=True,
num_trees=10,
max_nodes=1000)
#build the graph
graph_builder_class = tensor_forest.RandomForestGraphs
est=random_forest.TensorForestEstimator(
params, graph_builder_class=graph_builder_class)
#define input function
train_input_fn = numpy_io.numpy_input_fn(
x=x,
y=y,
batch_size=1000,
num_epochs=1,
shuffle=True)
est.fit(input_fn=train_input_fn, steps=500)
The variables x is a list of numpy array of shape (512470,):
{'Average_Score': array([ 7.7, 7.7, 7.7, ..., 8.1, 8.1, 8.1]),
'lat': array([ 52.3605759, 52.3605759, 52.3605759, ..., 48.2037451,
48.2037451, 48.2037451]),
'lng': array([ 4.9159683, 4.9159683, 4.9159683, ..., 16.3356767,
16.3356767, 16.3356767])}
The variable y is numpy array of shape (512470,1):
array([[ 2.9],
[ 7.5],
[ 7.1],
...,
[ 2.5],
[ 8.8],
[ 8.3]])
Force each array in x to be 2 dim using ndmin=2. Then the shapes should match and concat should be able to operate.

Resources