AutoML: XGBoost is not available; skipping it

I am getting this warning when running h2o AutoML. I have version 3.32.1.2 installed and am running it on Python 3.8.
AutoML progress: |
11:30:52.773: AutoML: XGBoost is not available; skipping it.
CODE:
import h2o
h2o.init()
h2o_df = h2o.H2OFrame(df)
train, test = h2o_df.split_frame(ratios=[.75])
# Identify predictors and response
x = train.columns
y = "TERM_DEPOSIT"
x.remove(y)
from h2o.automl import H2OAutoML
aml = H2OAutoML(max_runtime_secs=600,
                #exclude_algos=['DeepLearning'],
                seed=1,
                #stopping_metric='logloss',
                #sort_metric='logloss',
                balance_classes=False,
                project_name='Completed')
%time aml.train(x=x, y=y, training_frame=train)

XGBoost is not supported on Windows; see the limitations in the H2O documentation.
If you are not on Windows and you didn't find another reason in the documentation mentioned above, you can try reinstalling h2o, e.g.:
pip install --force-reinstall https://h2o-release.s3.amazonaws.com/h2o/rel-zipf/2/Python/h2o-3.32.1.2-py2.py3-none-any.whl

I think I found the answer to this warning. I am running a Windows machine.
https://twitter.com/ledell/status/1148512129659625472?lang=en
If you're on Windows, XGBoost is not supported 😿 so the parts of the tutorial that use XGBoost can be replaced by h2o.gbm(). The AutoML process will also exclude XGBoost models.
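In practice that means either excluding XGBoost explicitly in AutoML or training h2o's GBM directly. A minimal sketch, reusing x, y, and train from the question (parameter values are illustrative):
from h2o.automl import H2OAutoML
from h2o.estimators.gbm import H2OGradientBoostingEstimator

# Exclude XGBoost up front so AutoML never attempts it
aml = H2OAutoML(max_runtime_secs=600, exclude_algos=['XGBoost'], seed=1)
aml.train(x=x, y=y, training_frame=train)

# Or replace a standalone XGBoost model with h2o's GBM
gbm = H2OGradientBoostingEstimator(ntrees=50, seed=1)
gbm.train(x=x, y=y, training_frame=train)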

Related

Error while importing 'en_core_web_sm' for spacy in Azure Databricks

I am getting an error while loading 'en_core_web_sm' of spacy in Databricks notebook. I have seen a lot of other questions regarding the same, but they are of no help.
The code is as follows
import spacy
!python -m spacy download en_core_web_sm
from spacy import displacy
nlp = spacy.load("en_core_web_sm")
# Process
text = ("This is a test document")
doc = nlp(text)
I get the error "OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory"
The details of installation are
Python - 3.8.10
spaCy version 3.3
It simply does not work. Validating the installation (python -m spacy validate) reports:
ℹ spaCy installation:
/databricks/python3/lib/python3.8/site-packages/spacy

NAME             SPACY     VERSION
en_core_web_sm   >=2.2.2   3.3.0   ✔
But the error still remains. Not sure if this message is relevant:
/databricks/python3/lib/python3.8/site-packages/spacy/util.py:845: UserWarning: [W094] Model 'en_core_web_sm' (2.2.5) specifies an under-constrained spaCy version requirement: >=2.2.2. This can lead to compatibility problems with older versions, or as new spaCy versions are released, because the model may say it's compatible when it's not. Consider changing the "spacy_version" in your meta.json to a version range, with a lower and upper pin. For example: >=3.3.0,<3.4.0
warnings.warn(warn_msg)
Also, this message appears when installing 'en_core_web_sm':
"Defaulting to user installation because normal site-packages is not writeable"
Any help will be appreciated.
Ganesh
I suspect that you have a cluster with autoscaling, and when autoscaling happened, the new nodes didn't have that module installed. Another reason could be that a cluster node was terminated by the cloud provider and the cluster manager pulled in a new node.
To prevent such situations I would recommend using a cluster init script, as described in the following answer; it will guarantee that the module is installed even on new nodes. The content of the script is really simple:
#!/bin/bash
pip install spacy
python -m spacy download en_core_web_sm
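A hedged sketch of wiring this up (the DBFS path and file name below are placeholders, not a prescribed location): write the script to DBFS once from a notebook, then register it as a cluster-scoped init script under the cluster's Advanced Options.
# Run once from a notebook; creates the init script on DBFS
dbutils.fs.put(
    "dbfs:/databricks/scripts/install-spacy.sh",  # hypothetical path
    """#!/bin/bash
pip install spacy
python -m spacy download en_core_web_sm
""",
    True,  # overwrite if it already exists
)
After that, point the cluster's init scripts setting at dbfs:/databricks/scripts/install-spacy.sh and restart the cluster.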

When downloading MNIST, I can't get the "processed" folder

I am following a tutorial in here https://www.youtube.com/watch?v=IQpP_cH8rrA
I followed all the initial steps (except that I am in VS Code, not in Colab), but I stopped pretty soon because when running:
torchvision.datasets.MNIST('./', download=True)
I get only the raw folder, not the processed one (which should contain training.pt and test.pt).
Can anybody help?
I am running on python 3.8.10, torch version 1.10.1, torchvision 0.11.2
PS: I found the same issue here https://github.com/pytorch/vision/issues/4685
Should I really downgrade torchvision to 0.9.1 to have both folders?
If yes, how can I just downgrade torchvision from cmd without uninstalling torch and installing everything back?
I found this workaround: download the data from TensorFlow and then just convert the data types so you can follow along with the tutorial again. Hope this helps.
import tensorflow as tf
import torch
import numpy as np
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape)
images = torch.from_numpy(x_train)
ground_truth = torch.from_numpy(y_train)
print(images.shape)
print(ground_truth.shape)
This works in my notebook, hopefully it does for you too
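If the tutorial then expects a DataLoader rather than raw tensors, a small follow-on sketch (the batch size is arbitrary) wraps the converted tensors the way torchvision datasets are usually consumed:
from torch.utils.data import TensorDataset, DataLoader

# Pair each image tensor with its label, then batch them
dataset = TensorDataset(images, ground_truth)
loader = DataLoader(dataset, batch_size=64, shuffle=True)

for batch_images, batch_labels in loader:
    print(batch_images.shape, batch_labels.shape)
    break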
I am not sure if this answer will help anyone, but this was my solution to it (after lots of trying and searching on the internet; I am not too experienced):
(I used anaconda prompt)
I created a virtual environment called "test" for python 3.6:
conda create -n test python=3.6
activate test
I installed the recommended torchvision version on it:
pip install torchvision==0.9.1
I ran my program in the virtual environment:
python yourprogram.py
I am sure this is not the best solution out there, but it worked for me and was very easy, as it is just a few lines in the Anaconda prompt.
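If you would rather stay in the existing environment, pinning both packages in a single pip command should keep them consistent; as far as I know, torchvision 0.9.1 is paired with torch 1.8.1 (check the compatibility table in the torchvision README to be sure):
pip install torch==1.8.1 torchvision==0.9.1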

After downgrading TensorFlow 2.0 to 1.15, results changed and reproduction of results is not available

Would you help me achieve reproducible results with TensorFlow 1.15 without restarting the Python kernel? And why are the output results in TF 2.0 and TF 1.15 different with absolutely identical parameters and dataset? Is it possible to achieve identical output?
More details:
I tried to interpret model results in TF 2.0 by:
import numpy as np
import shap

background = df3.iloc[np.random.choice(df3.shape[0], 100, replace=False)]
explainer = shap.DeepExplainer(model, background)
I received an error:
`get_session` is not available when using TensorFlow 2.0.
According to the SO topic, I tried to set up TF 2.0 compatibility with TF 1 by putting this at the top of my code:
import tensorflow.compat.v1 as tf
But the error appeared again.
Following advice from many users, I downgraded TF2 to TF 1.15. It solved the problem, and the shap module interprets the results, but:
1) to make results reproducible I now have to change tf.random.set_seed(7) to tf.random.set_random_seed(7) and restart the Python kernel every time! In TF2 I didn't have to restart the kernel.
2) prediction results have changed, especially the economical efficiency (that is, TF 1.15 wrongly classifies more important samples than TF 2.0).
TF 2:
Accuracy: 94.95%, Economical efficiency = 64%
TF 1:
Accuracy: 94.85%, Economical efficiency = 56%
The code of the model is here
First, results differ not only between TF1 and TF2, but also between TF 2.0 and TF 2.2. Probably it depends on different internal parameters in the packages.
Second, TensorFlow2 works with DeepExplainer in the following versions:
import tensorflow
import pandas as pd
import keras
import xgboost
import numpy
import shap
print(tensorflow.__version__)
print(pd.__version__)
print(keras.__version__)
print(xgboost.__version__)
print(numpy.__version__)
print(shap.__version__)
output:
2.2.0
0.24.2
2.3.1
0.90
1.17.5
0.35.0
But you will face some difficulties in updating the libraries.
In Python 3.5, running TF 2.2, you will face the error 'DLL load failed: The specified module could not be found'.
It can definitely be solved by installing a newer C++ redistributable package. See this: https://github.com/tensorflow/tensorflow/issues/22794#issuecomment-573297027
Link to download the package: https://support.microsoft.com/ru-ru/help/2977003/the-latest-supported-visual-c-downloads
In Python 3.7 you will not find shap 0.35.0 with a .whl extension, only a .tar.gz one, which gives the error "Install visual c++ package", and installing that package doesn't help.
Then download shap 0.35.0 for Python 3.7 here: https://anaconda.org/conda-forge/shap/files. Run Anaconda shell. Type: conda install -c conda-forge C:\shap-0.35.0-py37h3bbf574_0.tar.bz2.
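As for the reproducibility part of the question, here is a minimal sketch of a TF 1.15 setup (assuming single-threaded execution is acceptable; resetting the default graph and re-seeding may stand in for restarting the kernel):
import random
import numpy as np
import tensorflow as tf

def reset_seeds(seed=7):
    # Rebuild the graph instead of restarting the kernel
    tf.compat.v1.reset_default_graph()
    random.seed(seed)
    np.random.seed(seed)
    tf.compat.v1.set_random_seed(seed)

reset_seeds()
# Single-threaded sessions remove one source of nondeterminism
config = tf.compat.v1.ConfigProto(intra_op_parallelism_threads=1,
                                  inter_op_parallelism_threads=1)
sess = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(sess)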

After installing Tensorflow 2.0 in a python 3.7.1 env, do I need to install Keras, or does Keras come bundled with TF2.0?

I need to use TensorFlow 2.0 (TF2.0) and Keras, but I don't know if it's necessary to install both separately or just TF2.0 (assuming TF2.0 has Keras bundled inside it). If I need to install TF2.0 only, will installing it in a Python 3.7.1 environment be acceptable?
This is for Ubuntu 16.04 64 bit.
In TensorFlow 2.0 there is strong integration between TensorFlow and the Keras API specification (TF ships its own Keras implementation that respects the Keras standard), so you don't have to install Keras separately: it already comes with TF as the tf.keras package.
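A quick sketch to confirm the bundled Keras works without a separate install (the toy model below is illustrative):
import tensorflow as tf

print(tf.__version__)        # e.g. 2.0.0
print(tf.keras.__version__)  # version of the bundled Keras

# A toy model built purely from the bundled tf.keras API
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()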

keras -> mlmodel: coreml object has no attribute 'convert'

I am trying to convert my Keras model into an mlmodel using coremltools. However, it is saying that the coremltools module has no attribute 'convert'.
AttributeError: 'module' object has no attribute 'convert'
My coremltools, keras, and tensorflow (tensorflow-gpu) modules are all up to date.
I am also using Python 2.7.10.
I've tried both Windows and Mac, and neither worked. However, caffe.convert works using a Caffe model.
Code:
coreml_model = coremltools.converters.keras.convert(MODEL_PATH)
As per the documentation, I expected the converters.keras.convert method to be available in coremltools.
Documentation: https://apple.github.io/coremltools/generated/coremltools.converters.keras.convert.html
Please help, thanks in advance!
Edit:
import coremltools
# from keras.models import load_model
import keras
import sys
from keras.applications import MobileNet
from keras.utils.generic_utils import CustomObjectScope

with CustomObjectScope({'relu6': keras.applications.MobileNet.relu6, 'DepthwiseConv2D': keras.applications.mobilenet.DepthwiseConv2D}):
    model = load_model('weights.hdf5')

MODEL_PATH = "data/model_wide_cifar-10_fruits_model.h5"

def main():
    """ Takes in keras model and convert to .mlmodel"""
    print(sys.version)
    # Load in keras model.
    # model = load_model(MODEL_PATH)
    # load labels
    labels = []
    label_handler = open("fruit-labels.txt", 'r')
    for label in label_handler:
        labels.append(label.rstrip())
    label_handler.close()
    print("[INFO] Labels: {0}".format(labels))
    # Convert to .mlmodel
    coreml_model = coremltools.converters.keras.convert(
        model=MODEL_PATH,
        input_names="image",
        output_names="image",
        class_labels=labels)
    labels = 'fruit-labels.txt'
    # Save .mlmodel
    coreml_model.utils.save_spec('fruitclassifier.mlmodel')
The solution is to use virtualenv. Follow the instructions from the coremltools README:
Installation
We recommend using virtualenv to use, install, or build coremltools. Be
sure to install virtualenv using your system pip.
pip install virtualenv
The method for installing coremltools follows the
standard python package installation steps.
To create a Python virtual environment called coremltools, follow these steps:
# Create a folder for virtualenv
mkdir virtualenvs
cd virtualenvs
# Create a Python virtual environment for your Core ML project
virtualenv coremltools
To activate your new virtual environment and install coremltools in this environment, follow these steps:
# Activate your virtual environment
source coremltools/bin/activate
# Install coremltools in the new virtual environment
pip install --upgrade pip
pip install -U coremltools==3.0b5
Install keras and tensorflow:
pip install keras tensorflow
Now make sure it works. With the coremltools environment activated, start Python and call the converter:
$ python
Python 3.7.4 (v3.7.4:e09359112e, Sep 5 2019, 14:54:52)
>>> import coremltools
>>> coremltools.converters.keras.convert()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: convert() missing 1 required positional argument: 'model'
coremltools documentation
Credit to this guys issue: https://github.com/apple/coremltools/issues/440
Communist Hacker's answer does not work for my current setup:
tensorflow 2.4.1
coremltools 4.1
Python 3.8.7
However, after reviewing the documentation for coremltools here, I was able to fix it by removing keras from the call, and it now works:
import coremltools
coreml_model = coremltools.converters.convert(model,
                                              input_names="inputname",
                                              output_names="outputname")
Running the above command now produces this in my Jupyter notebook:
Running TensorFlow Graph Passes: 100%|██████████| 5/5 [00:00<00:00, 37.53 passes/s]
Converting Frontend ==> MIL Ops: 100%|██████████| 6/6 [00:00<00:00, 5764.05 ops/s]
Running MIL optimization passes: 100%|██████████| 17/17 [00:00<00:00, 5633.05 passes/s]
Translating MIL ==> MLModel Ops: 100%|██████████| 3/3 [00:00<00:00, 6864.65 ops/s]
Note - I am a complete noob with this, so I probably described things incorrectly.
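If the conversion succeeds, persisting the result is one more line; the file name is just an example (MLModel objects expose a save method):
# Save the converted model for use in Xcode
coreml_model.save("fruitclassifier.mlmodel")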
