Change trialId in Google AI Platform hyperparameter tuning - gcp-ai-platform-notebook

I'm trying to follow this tutorial on hyperparameter tuning on AI Platform: https://cloud.google.com/blog/products/gcp/hyperparameter-tuning-on-google-cloud-platform-is-now-faster-and-smarter.
My configuration yaml file looks like this:
trainingInput:
  hyperparameters:
    goal: MINIMIZE
    hyperparameterMetricTag: loss
    maxTrials: 4
    maxParallelTrials: 2
    params:
      - parameterName: learning_rate
        type: DISCRETE
        discreteValues:
          - 0.0005
          - 0.001
          - 0.0015
          - 0.002
The expected output:
"completedTrialCount": "4",
"trials": [
{
"trialId": "3",
"hyperparameters": {
"learning_rate": "2e-03"
},
"finalMetric": {
"trainingStep": "123456",
"objectiveValue": 0.123456
},
},
Is there any way to customize the trialId instead of the default numeric values (e.g. 1, 2, 3, 4, ...)?

It is not possible to customize the trialId; it is tied to the maxTrials parameter in your hyperparameter tuning config.
maxTrials only accepts integers, so the values assigned to trialId range from 1 to your defined maxTrials.
This also matches the example output in the post you linked: with maxTrials: 40 set, the resulting JSON shows trialId: 35, which falls within that range. As the post explains:
This indicates that 40 trials have been completed, and the best so far
is trial 35, which achieved an objective of 1.079 with the
hyperparameter values of nembeds=18 and nnsize=32.
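Since the IDs themselves cannot be changed, one workaround (only a sketch, not an official feature) is to fetch the job's trainingOutput and key the results by hyperparameter values instead of trialId. The project and job names below are placeholders.

from googleapiclient import discovery

# Placeholders: substitute your own project and hyperparameter tuning job name.
job_name = "projects/my-project/jobs/my_hptuning_job"

ml = discovery.build("ml", "v1")
job = ml.projects().jobs().get(name=job_name).execute()

# trainingOutput.trials has the same shape as the JSON excerpt above:
# trialId, hyperparameters, finalMetric.
trials = job["trainingOutput"]["trials"]

# Index results by hyperparameter value rather than by the auto-assigned trialId.
results = {
    t["hyperparameters"]["learning_rate"]: t["finalMetric"]["objectiveValue"]
    for t in trials
}
print(results)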

Related

Clarifications on training job parameters with Tensorflow

I'm using the new TensorFlow Object Detection API.
I need to replicate the training parameters used in a paper, but I'm a bit confused.
The paper states:
When training neural network models, their base configuration is similar to that used to
train on the COCO 2017 dataset. For the unambiguous comparison of the selected models, the total number of
training steps was set to 100 equal to 100′000 iterations of learning.
Inside model_main_tf2.py, which is the script used to start the training, I can read the following:
"""Creates and runs TF2 object detection models.
For local training/evaluation run:
PIPELINE_CONFIG_PATH=path/to/pipeline.config
MODEL_DIR=/tmp/model_outputs
NUM_TRAIN_STEPS=10000
SAMPLE_1_OF_N_EVAL_EXAMPLES=1
python model_main_tf2.py -- \
--model_dir=$MODEL_DIR --num_train_steps=$NUM_TRAIN_STEPS \
--sample_1_of_n_eval_examples=$SAMPLE_1_OF_N_EVAL_EXAMPLES \
--pipeline_config_path=$PIPELINE_CONFIG_PATH \
--alsologtostderr
"""
Also, you can specify the num_steps and total_steps parameters in the pipeline.config file (used by the training script):
train_config: {
  batch_size: 1
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 50000
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .16
          total_steps: 50000
          warmup_learning_rate: 0
          warmup_steps: 2500
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
}
So, what I'm not understanding is how to map what is written in the paper to the TensorFlow parameters.
What are num_steps and total_steps inside the pipeline.config file?
And what is the NUM_TRAIN_STEPS argument?
Does it overwrite the steps in the config file, or is it a completely different thing?
If more details are needed feel free to ask.
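For reference, a sketch of how these two values can be inspected or edited programmatically through the Object Detection API's config utilities; the file paths and the 100000 value are placeholders.

from object_detection.utils import config_util

# Load the pipeline.config used by model_main_tf2.py (placeholder path).
configs = config_util.get_configs_from_pipeline_file("pipeline.config")

# Inspect the two step counts discussed above.
print(configs["train_config"].num_steps)
print(configs["train_config"].optimizer.momentum_optimizer
      .learning_rate.cosine_decay_learning_rate.total_steps)

# Example: set both to 100000 and write the config back out (placeholder directory).
configs["train_config"].num_steps = 100000
configs["train_config"].optimizer.momentum_optimizer \
    .learning_rate.cosine_decay_learning_rate.total_steps = 100000
pipeline_proto = config_util.create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(pipeline_proto, "configs/updated")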

My TensorFlow object detection model produced an average precision of zero for the first class

I have trained an object detection model for three classes: id=1 (LR), id=2 (PM), id=3 (YR). The model produced AP(LR): 0.002, AP(PM): 0.84, AP(YR): 1.00. After that I changed the ids to id=1 (YR), id=2 (PM), id=3 (LR), and the model gives AP(YR): 0.002, AP(PM): 0.79, AP(LR): 0.89.
Is it treating the first class as a dummy class, or is there another reason for this? Please help me out.
Following are the changes I made in the .config file to get the average precision:
eval_config: {
  metrics_set: "pascal_voc_detection_metrics"
  use_moving_averages: false
  batch_size: 1;
  num_visualizations: 20
  max_num_boxes_to_visualize: 10
  visualize_groundtruth_boxes: true
  eval_interval_secs: 30
}

MLflow webserver returns 400 status, "Incompatible input types for column X. Can not safely convert float64 to <U0."

I am implementing an anomaly detection web service using MLflow and sklearn.pipeline.Pipeline(). The aim of the model is to detect web crawlers from server logs, and the response_length column is one of my features. After serving the model, to test the web service I send the request below, which contains the first 20 columns of the training data.
$ curl --location --request POST '127.0.0.1:8000/invocations' \
  --header 'Content-Type: text/csv' \
  --data-binary '@datasets/test.csv'
But response of the web server has status code 400 (BAD REQUEST) and this JSON body:
{
  "error_code": "BAD_REQUEST",
  "message": "Incompatible input types for column response_length. Can not safely convert float64 to <U0."
}
Here is the MLflow Tracking log from the model training run:
[Pipeline] ......... (step 1 of 3) Processing transform, total=11.8min
[Pipeline] ............... (step 2 of 3) Processing pca, total= 4.8s
[Pipeline] ........ (step 3 of 3) Processing rule_based, total= 0.0s
2021/07/16 04:55:12 WARNING mlflow.sklearn: Training metrics will not be recorded because training labels were not specified. To automatically record training metrics, provide training labels as inputs to the model training function.
2021/07/16 04:55:12 WARNING mlflow.utils.autologging_utils: MLflow autologging encountered a warning: "/home/matin/workspace/Rahnema College/venv/lib/python3.8/site-packages/mlflow/models/signature.py:129: UserWarning: Hint: Inferred schema contains integer column(s). Integer columns in Python cannot represent missing values. If your input data contains missing values at inference time, it will be encoded as floats and will cause a schema enforcement error. The best way to avoid this problem is to infer the model schema based on a realistic data sample (training dataset) that includes missing values. Alternatively, you can declare integer columns as doubles (float64) whenever these columns may have missing values. See `Handling Integers With Missing Values <https://www.mlflow.org/docs/latest/models.html#handling-integers-with-missing-values>`_ for more details."
Logged data and model in run: 8843336f5c31482c9e246669944b1370
---------- logged params ----------
{'memory': 'None',
'pca': 'PCAEstimator()',
'rule_based': 'RuleBasedEstimator()',
'steps': "[('transform', <log_transformer.LogTransformer object at "
"0x7f05a8b95760>), ('pca', PCAEstimator()), ('rule_based', "
'RuleBasedEstimator())]',
'transform': '<log_transformer.LogTransformer object at 0x7f05a8b95760>',
'verbose': 'True'}
---------- logged metrics ----------
{}
---------- logged tags ----------
{'estimator_class': 'sklearn.pipeline.Pipeline', 'estimator_name': 'Pipeline'}
---------- logged artifacts ----------
['model/MLmodel',
'model/conda.yaml',
'model/model.pkl',
'model/requirements.txt']
Could anyone tell me exactly how I can fix this model serve problem?
The problem is caused by the mlflow.utils.autologging_utils warning shown above.
When the model is created, the input data signature is saved in the MLmodel file with the inferred column types.
You should change the signature input type of response_length from string to double, i.e. replace
{"name": "response_length", "type": "string"}
with
{"name": "response_length", "type": "double"}
so the value does not need to be converted. After serving the model with the edited MLmodel file, the web server worked as expected.
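An alternative to hand-editing the MLmodel file is to log the model with an explicit signature in the first place, so the column is declared as double up front. A minimal sketch, assuming your fitted sklearn pipeline is named pipeline; "other_feature" is a placeholder for the rest of your columns.

import mlflow
from mlflow.models.signature import ModelSignature
from mlflow.types.schema import Schema, ColSpec

# Declare the input schema explicitly so serving enforces double, not string.
input_schema = Schema([
    ColSpec("double", "response_length"),
    ColSpec("double", "other_feature"),  # placeholder for your remaining columns
])
signature = ModelSignature(inputs=input_schema)

with mlflow.start_run():
    mlflow.sklearn.log_model(pipeline, "model", signature=signature)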

sklearn, Keras, DeepStack - ValueError: multi_class must be in ('ovo', 'ovr')

I trained a set of DNNs and I want to use them in a deep ensemble. The code is implemented in TF2, but the package deepstack works with Keras as well. The code looks something like this
from deepstack.base import KerasMember
from deepstack.ensemble import DirichletEnsemble

dirichletEnsemble = DirichletEnsemble(N=2000 * ensemble_size)
for net_idx in range(0, ensemble_size):
    member = KerasMember(name=model_name, keras_model=model,
                         train_batches=(train_images, train_labels),
                         val_batches=(valid_images, valid_labels))
    dirichletEnsemble.add_member(member)
dirichletEnsemble.fit()
where 'model' is essentially a Keras model, so you need to load one model at each loop iteration (I am using my own implementation), and 'ensemble_size' represents the number of DNNs used in the ensemble.
As a result, I get the following error
ValueError: multi_class must be in ('ovo', 'ovr')
which is generated by the sklearn package.
FURTHER DETAILS: deepstack creates a metric
metric = metrics.roc_auc_score
and then returns it as
return metric(y_t, y_p)
which then calls sklearn
if multi_class == 'raise':
    raise ValueError("multi_class must be in ('ovo', 'ovr')")
In my specific case, the labels are respectively y_t
[ 7 10 18 52 10 13 10 4 7 7 24 26 7 26 13 13]
and y_p
[ 73 250 250 250 281 281 250 281 281 174 281 250 281 250 250 250]
How do I set multi_class as 'ovo' or 'ovr'?
The documentation for roc_auc_score indicates the following:
roc_auc_score(
y_true,
y_score,
*,
average='macro',
sample_weight=None,
max_fpr=None,
multi_class='raise',
labels=None
)
The second last parameter there is multi_class, which has the following explanation:
Multiclass only. Determines the type of configuration to use. The default value raises an error, so either 'ovr' or 'ovo' must be passed explicitly.
So, it seems that there is some variation in how ROC AUC is calculated for multiclass targets, and sklearn forces you to choose explicitly which variation you want it to use. If you don't make the choice, the default results in an exception being raised, and that exception is the error you report in your question title.
If you are getting this error while using sklearn's roc_auc_score, try roc_auc_score(YTEST, YPRED, multi_class='ovr'). 'ovr' stands for one-vs-rest, which decomposes your multiclass problem into a set of binary problems.
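For reference, a minimal, self-contained sketch of the call sklearn expects for multiclass targets. Note that in the multiclass case y_score must be per-class probability estimates of shape (n_samples, n_classes), not predicted labels.

import numpy as np
from sklearn.metrics import roc_auc_score

# Toy data: 3 classes; each row of y_score holds class probabilities summing to 1.
y_true = np.array([0, 1, 2, 2, 1, 0])
y_score = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.1, 0.3, 0.6],
    [0.3, 0.5, 0.2],
    [0.6, 0.3, 0.1],
])

# 'ovr' (one-vs-rest) or 'ovo' (one-vs-one) replaces the default 'raise'.
print(roc_auc_score(y_true, y_score, multi_class="ovr"))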

How does sklearn.linear_model.LinearRegression work with insufficient data?

To solve a 5 parameter model, I need at least 5 data points to get a unique solution. For x and y data below:
import numpy as np
x = np.array([[-0.24155831, 0.37083184, -1.69002708, 1.4578805 , 0.91790011,
0.31648635, -0.15957368],
[-0.37541846, -0.14572825, -2.19695883, 1.01136142, 0.57288752,
0.32080956, -0.82986857],
[ 0.33815532, 3.1123936 , -0.29317028, 3.01493602, 1.64978158,
0.56301755, 1.3958912 ],
[ 0.84486735, 4.74567324, 0.7982888 , 3.56604097, 1.47633894,
1.38743513, 3.0679506 ],
[-0.2752026 , 2.9110031 , 0.19218081, 2.0691105 , 0.49240373,
1.63213241, 2.4235483 ],
[ 0.89942508, 5.09052174, 1.26048572, 3.73477373, 1.4302902 ,
1.91907482, 3.70126468]])
y = np.array([-0.81388378, -1.59719762, -0.08256274, 0.61297275, 0.99359647,
1.11315445])
I used only 6 data points to fit an 8-parameter model (7 slopes and 1 intercept).
from sklearn.linear_model import LinearRegression
lr = LinearRegression().fit(x, y)
print(lr.coef_)
array([-0.83916772, -0.57249998, 0.73025938, -0.02065629, 0.47637768,
-0.36962192, 0.99128474])
print(lr.intercept_)
0.2978781587718828
Clearly, it's using some kind of assumption to reduce the degrees of freedom. I tried to look into the source code but couldn't find anything about that. What method does it use to find the parameters of an under-specified model?
You don't need to reduce the degrees of freedom: it simply finds a solution to the least squares problem min_beta sum_i (dot(beta, x_i) + beta_0 - y_i)**2. For example, in the non-sparse case it uses scipy.linalg.lstsq, whose default solver is the gelsd LAPACK driver. If
A = np.concatenate((np.ones((len(X), 1)), X), axis=1)
is the array augmented with a column of ones, then your solution is given by
beta = np.linalg.pinv(A.T @ A) @ A.T @ y
where the pseudoinverse is used precisely because A.T @ A may not be of full rank. Of course, the solver doesn't actually evaluate this formula; it uses the singular value decomposition of A instead.
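To make that concrete, here is a minimal sketch (reusing x and y from the question) that reproduces the fitted coefficients with a plain minimum-norm least-squares solve. It assumes LinearRegression's default dense path, which centers the data to handle the intercept and then calls an lstsq solver.

import numpy as np
from sklearn.linear_model import LinearRegression

# Center the data, mirroring what LinearRegression does with fit_intercept=True.
x_mean, y_mean = x.mean(axis=0), y.mean()

# np.linalg.lstsq returns the minimum-norm solution for rank-deficient systems.
coef, *_ = np.linalg.lstsq(x - x_mean, y - y_mean, rcond=None)
intercept = y_mean - x_mean @ coef

lr = LinearRegression().fit(x, y)
print(np.allclose(coef, lr.coef_))           # expected: True
print(np.isclose(intercept, lr.intercept_))  # expected: True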
