MLflow not tracking anything with pycaret - mlflow

I have the mlflow ui (v.2.1.1.) running. This code with pycaret 2.3.10 does not track anything for some reason. Only the experiment name shows up.
import pycaret
import mlflow
mlflow.set_tracking_uri("http://localhost:5000")
# import libraries
import pandas as pd
import numpy as np
# read csv data
data = pd.read_csv('https://raw.githubusercontent.com/srees1988/predict-churn-py/main/customer_churn_data.csv')
data = data.sample(1000)
# initialize setup
from pycaret.classification import *
s = setup(data, target = 'Churn', session_id = 123, ignore_features = ['customerID'], log_experiment = True, experiment_name = 'churn1', silent=True)
best_model = compare_models()
model = create_model('rf')
plot_model(model, 'confusion_matrix')
predict_model(model)
finalize_model(model)
save_model(model, 'model')
I would expect the metrics and potentially the images to appear.

Did you try to expand the square button in front of Session Initialized? For me, it's logged under the session. If it does not work try to downgrade mlflow to the version that PyCaret supports.
The current version of PyCaret (3.0.0rc8) only supports the mlflow between 1.24.0 and less than 2.0.0 you can reference from here.

Related

GPU not used on d3rlpy

I am new to using d3rlpy for offline RL training and makes use of pytorch. So I installed cuda 1.16 as recommended from PYtorch doc: pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116. I installed d3rlpy after and run the following sample code:
from d3rlpy.algos import BC,DDPG,CRR,PLAS,PLASWithPerturbation,TD3PlusBC,IQL
import d3rlpy
import numpy as np
import glob
import time
#models
continuous_models = {
"BehaviorCloning": BC,
"DeepDeterministicPolicyGradients": DDPG,
"CriticRegularizedRegression": CRR,
"PolicyLatentActionSpace": PLAS,
"PolicyLatentActionSpacePerturbation": PLASWithPerturbation,
"TwinDelayedPlusBehaviorCloning": TD3PlusBC,
"ImplicitQLearning": IQL,
}
#load dataset data_batch is created as a*.h5 file with d3rlpy
dataset = d3rlpy.dataset.MDPDataset.load(data_batch)
# preprocess
mean = np.mean(dataset.observations, axis=0, keepdims=True)
std = np.std(dataset.observations, axis=0, keepdims=True)
scaler = d3rlpy.preprocessing.StandardScaler(mean=mean, std=std)
# test models
for _model in continuous_models:
the_model = continuous_models[_model](scaler = scaler)
the_model.use_gpu = True
the_model.build_with_dataset(dataset)
the_model.fit(dataset = dataset.episodes,
n_steps_per_epoch = 10800,
n_steps = 54000,
logdir = './logs',
experiment_name = f"{_model}",
tensorboard_dir = 'logs',
save_interval = 900, # we don't want to save intermediate parameters
)
#save model
the_timestamp = int(time.time())
the_model.save_model(f"./models/{_model}/{_model}_{the_timestamp}.pt")
The issue is that None of the models, despite being set with use_gpu =True are actually using the GPU. With a sample code of pytotch and testing torch.cuda.current_device() I can see that pytorch is properly set and detecting the gpu. Any idea where to look for solving this issue? I am not sure this is a bug from the d3rlpy so I would bother creating an issue on github yet :)

Tensorflow and PyCharm autocomplete problem

I have a problem with autocomplete in PyCharm and tensor flow, I have a sample code which runs perfectly, but the problem in my IDE which can't show autocomplete code option.
import tensorflow as tf
import tensorflow_datasets as tfds
Mnist_dataSet, Mnist_Info = tfds.load(name='mnist', with_info=True, as_supervised=True)
Mnist_Train, Mnist_Test = Mnist_dataSet['train'], Mnist_dataSet['test']
num_validation_sample = 0.1 * Mnist_Info.splits['train'].num_examples #No code complete option
num_validation_sample = tf.cast(num_validation_sample, tf.int64)
num_Test_sample =0.1*Mnist_Info.splits['test'].num_examples #No code complete option
num_Test_sample = tf.cast(num_Test_sample, tf.int64)
.....
......
I know python is dynamically programming language and at declaring variable it declares like this
Mnist_dataSet: Any = tfds.load(name='mnist', with_info=True, as_supervised=True)
I try to declare variables like this and it works
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
Mnist_Info: tfds.core.DatasetInfo
Mnist_dataSet, Mnist_Info = tfds.load(name='mnist', with_info=True, as_supervised=True)
Mnist_Train, Mnist_Test = Mnist_dataSet['train'], Mnist_dataSet['test']
num_validation_sample = 0.1 * Mnist_Info.splits['train'].num_examples
num_validation_sample = tf.cast(num_validation_sample, tf.int64)
num_Test_sample =0.1*Mnist_Info.splits['test'].num_examples
num_Test_sample = tf.cast(num_Test_sample, tf.int64)
Is there any option to make python declare variables according to the main class automatically?

Deploy Keras model on Spark

I have a trained keras model.
https://github.com/qubvel/efficientnet
I have a large updating dataset I want to get predictions on. Meaning to run my spark job every 2 hours or so.
What is the way to implement this? MlLib does not support efficientNet.
When searching online I saw this kind of implementation using sparkdl, but it does not support efficentNet as modelName parameter.
featurizer = DeepImageFeaturizer(inputCol="image", outputCol="features", modelName="InceptionV3")
rf = RandomForestClassifier(labelCol="label", featuresCol="features")
My naive approach would be
import efficientnet.keras as efn
model = efn.EfficientNetB0(weights='imagenet')
from sparkdl import readImages
image_df = readImages("flower_photos/sample/")
image_df.withcolumn("modelTags", efficient_net_udf($"image".data))
and creating a UDF that calls model.predict...
Another method I saw is
from keras.preprocessing.image import img_to_array, load_img
import numpy as np
import os
from pyspark.sql.types import StringType
from sparkdl import KerasImageFileTransformer
import efficientnet.keras as efn
model = efn.EfficientNetB0(weights='imagenet')
model.save("kerasModel.h5")
def loadAndPreprocessKeras(uri):
image = img_to_array(load_img(uri, target_size=(299, 299)))
image = np.expand_dims(image, axis=0)
return image
transformer = KerasImageFileTransformer(inputCol="uri", outputCol="predictions",
modelFile='path/kerasModel.h5',
imageLoader=loadAndPreprocessKeras,
outputMode="vector")
files = [os.path.abspath(os.path.join(dirpath, f)) for f in os.listdir("/data/myimages") if f.endswith('.jpg')]
uri_df = sqlContext.createDataFrame(files, StringType()).toDF("uri")
keras_pred_df = transformer.transform(uri_df)
What is the correct (and working) way to approach this?

Why is my image_path undefined when using export_graphviz? - Python 3

I'm trying to run this machine learning tree algorithm code in IPython:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
iris = load_iris()
X = iris.data[:, 2:] # petal length and width
y = iris.target
tree_clf = DecisionTreeClassifier(max_depth=2)
tree_clf.fit(X, y)
from sklearn.tree import export_graphviz
export_graphviz(tree_clf, out_file=image_path("iris_tree.dot"),
feature_names=iris.feature_names[2:],
class_names=iris.target_names,
rounded=True,
filled=True
)
But I get this error when run in IPython:
I'm unfamiliar with export_graphviz, does anyone have any idea how to correct this?
I guess you are following "Hands on Machine Learning with Scikit-Learn and TensorFlow" book by Aurelien Geron. I encountered with the same problem while trying out "Decision Trees" chapter. You can always refer to his GitHub notebooks . For your code, you may refer "decision tree" notebook.
Below I paste the code from notebook. Please do go ahead and have a look at the notebook also.
# To support both python 2 and python 3
from __future__ import division, print_function, unicode_literals
# Common imports
import numpy as np
import os
# to make this notebook's output stable across runs
np.random.seed(42)
# To plot pretty figures
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
# Where to save the figures
PROJECT_ROOT_DIR = "."
CHAPTER_ID = "decision_trees"
def image_path(fig_id):
return os.path.join(PROJECT_ROOT_DIR, "images", CHAPTER_ID, fig_id)
def save_fig(fig_id, tight_layout=True):
print("Saving figure", fig_id)
if tight_layout:
plt.tight_layout()
plt.savefig(image_path(fig_id) + ".png", format='png', dpi=300)
To get rid of all the mess simply remove image_path,
now out_file="iris_tree.dot", after running that command a file will be saved in your folder named iris_tree. Open that file in Microsoft Word and copy all of its content. Now open your browser and type "webgraphviz" and then click on the first link. Then delete whatever is written in white space and paste your code which is copied from iris_tree. Then click "generate graph". Scroll down and your graph is ready.
I know you might have got what you were looking for. But in case you don't, all you need to do is just replace:
out_file=image_path("iris_tree.dot")
with:
out_file="iris_tree.dot"
This will create the .dot file in the same directory in which your current script is.
You can also give the absolute path to where you want to save the .dot file as:
out_file="/home/cipher/iris_tree.dot"
you must correct
out_file=image_path("iris_tree.dot"),
in below code line:
out_file="C:/Users/VIDA/Desktop/python/iris_tree.dot",
You can directly type instead of using the webgraphviz, if you are using sklearn version 0.20.
import graphviz
with open ("iris_tree.dot") as f:
dot_graph = f.read()
display (graphviz.Source(dot_graph))
With sklearn 0.22 you have to change again. See sklearn users guide.
I have a sklearn with the version of 0.20.1, and I got the example to work through the line below.
export_graphviz(
tree_clf,
out_file = "iris_tree.dot",
feature_names = iris.feature_names[2:])

Tensorflow Scikit Flow get GraphDef for Android (save *.pb file)

I want to use my Tensorflow algorithm in an Android app. The Tensorflow Android example starts by downloading a GraphDef that contains the model definition and weights (in a *.pb file). Now this should be from my Scikit Flow algorithm (part of Tensorflow).
At the first glance it seems easy you just have to say classifier.save('model/') but the files saved to that folder are not *.ckpt, *.def and certainly not *.pb. Instead you have to deal with a *.pbtxt and a checkpoint (without ending) file.
I'm stuck there since quite a while. Here a code example to export something:
#imports
import tensorflow as tf
import tensorflow.contrib.learn as skflow
import tensorflow.contrib.learn.python.learn as learn
from sklearn import datasets, metrics
#skflow example
iris = datasets.load_iris()
feature_columns = learn.infer_real_valued_columns_from_input(iris.data)
classifier = learn.LinearClassifier(n_classes=3, feature_columns=feature_columns,model_dir="modeltest")
classifier.fit(iris.data, iris.target, steps=200, batch_size=32)
iris_predictions = list(classifier.predict(iris.data, as_iterable=True))
score = metrics.accuracy_score(iris.target, iris_predictions)
print("Accuracy: %f" % score)
The files you get are:
checkpoint
graph.pbtxt
model.ckpt-1.meta
model.ckpt-1-00000-of-00001
model.ckpt-200.meta
model.ckpt-200-00000-of-00001
Many possible workarounds I found would require having the GraphDef in a variable (don't know how with Scikit Flow). Or a Tensorflow session which doesn't seem to be required using Scikit Flow.
To save as pb file, you need to extract the graph_def from the constructed graph. You can do that as--
from tensorflow.python.framework import tensor_shape, graph_util
from tensorflow.python.platform import gfile
sess = tf.Session()
final_tensor_name = 'results:0' #Replace final_tensor_name with name of the final tensor in your graph
#########Build your graph and train########
## Your tensorflow code to build the graph
###########################################
outpt_filename = 'output_graph.pb'
output_graph_def = sess.graph.as_graph_def()
with gfile.FastGFile(outpt_filename, 'wb') as f:
f.write(output_graph_def.SerializeToString())
If you want to convert your trained variables to constants (to avoid using ckpt files to load the weights), you can use:
output_graph_def = graph_util.convert_variables_to_constants(sess, sess.graph.as_graph_def(), [final_tensor_name])
Hope this helps!

Resources