I want to use sklearn's AdaBoostRegressor
with different base estimators. The general AdaBoost introduction does not help much, since it uses the
DecisionTreeClassifier
Where do I find a list of all possible base estimators?
Could I use a neural network, too?
What qualifies the possible base estimators?
Any regressor that implements sklearn's RegressorMixin can be used as the base estimator.
Yes, you can use a neural network or a simple linear regressor as the base estimator.
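For concreteness, here is a minimal sketch with two different base estimators, a linear model and sklearn's MLPRegressor; the dataset is synthetic and the hyperparameters are placeholders:
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=500, n_features=10, random_state=0)

# Linear base estimator
ada_lin = AdaBoostRegressor(LinearRegression(), n_estimators=50, random_state=0)
ada_lin.fit(X, y)

# Neural-network base estimator (MLPRegressor is also a RegressorMixin)
ada_mlp = AdaBoostRegressor(MLPRegressor(hidden_layer_sizes=(32,), max_iter=500),
                            n_estimators=10, random_state=0)
ada_mlp.fit(X, y)
Note that in recent sklearn versions the keyword is estimator rather than base_estimator, so passing the base estimator positionally as above works across versions.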
I am working on a multiclass problem with six different classes, and I am using OneVsRestClassifier.
I have then performed hyperparameter tuning with GridSearchCV and obtained the optimized classifier with clf.best_estimator_.
As far as I understand, this returns one set of hyperparameters for the aggregated model, i.e. the same hyperparameters for every base estimator.
Is there a way to perform hyperparameter tuning separately for each base estimator?
Sure, just reverse the order of the search and the multiclass wrapper:
one_class_clf = GridSearchCV(base_classifier, params, ...)
clf = OneVsRestClassifier(one_class_clf)
Fitting clf generates the one-vs-rest problems, and for each of those fits a copy of the grid-searched base_classifier.
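A self-contained sketch of that pattern (the SVC base classifier and the parameter grid here are just placeholders):
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_classes=6, n_informative=8,
                           random_state=0)

params = {"C": [0.1, 1, 10]}  # placeholder grid
one_class_clf = GridSearchCV(SVC(), params, cv=3)
clf = OneVsRestClassifier(one_class_clf)
clf.fit(X, y)

# Each fitted per-class estimator is its own GridSearchCV with its own winner
for i, est in enumerate(clf.estimators_):
    print(f"class {i}: {est.best_params_}")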
The H2OSupportVectorMachineEstimator in H2O seems to only support "gaussian" as the value of the kernel_type parameter. Is there a way to train a linear SVM with H2O?
As you mentioned, based on the documentation (https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/svm.html), there is currently no way to train a linear SVM in H2O. Among linear models, I think it only offers GLM (https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/glm.html).
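If a linear model is acceptable, a minimal GLM sketch might look like this (the file path and column layout are hypothetical):
import h2o
from h2o.estimators import H2OGeneralizedLinearEstimator

h2o.init()
train = h2o.import_file("train.csv")  # hypothetical dataset
# Assume the last column is the binary target
glm = H2OGeneralizedLinearEstimator(family="binomial")
glm.train(x=train.columns[:-1], y=train.columns[-1], training_frame=train)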
The tf.estimator package defines many ready-made estimators. I want to use them in Keras.
I checked the TF docs; there is only one conversion method, which converts a keras.Model to a tf.estimator, but no way to convert from an estimator to a Model.
For example, if we want to convert the following estimator:
tf.estimator.DNNLinearCombinedRegressor
How could it be converted into a Keras Model?
You cannot, because estimators can run arbitrary code in their model_fn functions, whereas Keras models must be much more structured: whether sequential or functional, they must basically consist of layers.
A Keras model is a very specific type of object that can therefore be easily wrapped and plugged into other abstractions.
Estimators are based on arbitrary Python code with arbitrary control flow, so it's quite tricky to force any structure onto them.
Estimators support three modes - train, eval, and predict. Each of these could in theory have a completely independent flow, with different weights, architectures, etc. That is almost unthinkable in Keras and would essentially amount to three separate models.
Keras, in contrast, supports two modes - train and test (a distinction which is necessary for things like Dropout and regularisation).
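To illustrate why this is hard to translate, here is a minimal TF1-style model_fn sketch; nothing forces the three mode branches to share a structure, which is exactly what a Keras conversion would need (the architecture here is an arbitrary placeholder):
import tensorflow as tf  # TF1-style Estimator API

def model_fn(features, labels, mode):
    # Arbitrary Python; the three branches may in principle diverge completely.
    logits = tf.keras.layers.Dense(1)(features["x"])

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions={"logits": logits})

    loss = tf.losses.mean_squared_error(labels, tf.squeeze(logits, axis=-1))
    if mode == tf.estimator.ModeKeys.EVAL:
        return tf.estimator.EstimatorSpec(mode, loss=loss)

    train_op = tf.train.AdamOptimizer().minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

estimator = tf.estimator.Estimator(model_fn=model_fn)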
My goal is to do binary classification using a neural network.
The problem is that the dataset is unbalanced: I have 90% of class 1 and 10% of class 0.
To deal with it I want to use stratified cross-validation.
The problem is that I am working with PyTorch; I can't find any example, the documentation doesn't cover it, and I'm a student who is quite new to neural networks.
Can anybody help?
Thank you!
The easiest way I've found is to do your stratified splits before passing your data to the PyTorch Dataset and DataLoader. That lets you avoid having to port all your code to skorch, which can break compatibility with some cluster computing frameworks.
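A minimal sketch of that approach, assuming X and y are NumPy arrays holding your features and binary labels:
import torch
from sklearn.model_selection import StratifiedKFold
from torch.utils.data import DataLoader, TensorDataset

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, val_idx in skf.split(X, y):
    # Each fold preserves the 90/10 class ratio
    train_ds = TensorDataset(torch.from_numpy(X[train_idx]).float(),
                             torch.from_numpy(y[train_idx]).float())
    val_ds = TensorDataset(torch.from_numpy(X[val_idx]).float(),
                           torch.from_numpy(y[val_idx]).float())
    train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
    val_loader = DataLoader(val_ds, batch_size=32)
    # ... train and evaluate your model on this fold ...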
Have a look at skorch. It's a scikit-learn compatible neural network library that wraps PyTorch. It provides CVSplit for cross-validation, or you can use sklearn directly.
From the docs:
from skorch import NeuralNetClassifier
from sklearn.model_selection import cross_val_predict

net = NeuralNetClassifier(
    module=MyModule,    # your torch.nn.Module class
    train_split=None,   # disable skorch's internal split; sklearn does the splitting
)

y_pred = cross_val_predict(net, X, y, cv=5)
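Note that for classifiers, passing an integer cv to cross_val_predict uses StratifiedKFold by default, so the folds will preserve your 90/10 class ratio.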
As far as I know, a multi-label problem can be solved with a one-vs-all scheme, for which scikit-learn implements OneVsRestClassifier as a wrapper around a classifier such as svm.SVC. I am wondering how it would be different if, for a multi-label problem with n classes, I literally trained n individual binary classifiers, one per label, and evaluated them separately.
I know it is like a "manual" way of implementing one-vs-all rather than using the wrapper, but are the two ways actually different? If so, how do they differ, e.g. in execution time or classifier performance?
There would be no difference. For multi-label classification, sklearn's OneVsRestClassifier implements binary relevance, which is exactly what you have described.
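A quick sketch to convince yourself of the equivalence (LogisticRegression is just an example base classifier):
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, Y = make_multilabel_classification(n_classes=4, random_state=0)

# The wrapper
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

# "Manual" binary relevance: one independent classifier per label column
manual = np.column_stack([
    LogisticRegression(max_iter=1000).fit(X, Y[:, k]).predict(X)
    for k in range(Y.shape[1])
])

print(np.array_equal(ovr.predict(X), manual))  # should print True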