Clarifications on training job parameters with Tensorflow

Clarifications on training job parameters with Tensorflow - python-3.x

Im using the new Tensorflow object detection API.
I need to replicate training parameters used on a paper but Im a bit confused.
In the paper is stated
When training neural network models, their base confguration is similar to that used to
train on the COCO 2017 dataset. For the unambiguous comparison of the selected models, the total number of
training steps was set to 100 equal to 100′000 iterations of learning.
Inside model_main_tf2.py, which is the script used to start the training, I can read the following:
"""Creates and runs TF2 object detection models.
For local training/evaluation run:
PIPELINE_CONFIG_PATH=path/to/pipeline.config
MODEL_DIR=/tmp/model_outputs
NUM_TRAIN_STEPS=10000
SAMPLE_1_OF_N_EVAL_EXAMPLES=1
python model_main_tf2.py -- \
--model_dir=$MODEL_DIR --num_train_steps=$NUM_TRAIN_STEPS \
--sample_1_of_n_eval_examples=$SAMPLE_1_OF_N_EVAL_EXAMPLES \
--pipeline_config_path=$PIPELINE_CONFIG_PATH \
--alsologtostderr
"""
Also, you can specify the num_steps and total_steps parameters in the pipeline.config file (used by the training script):
train_config: {
batch_size: 1
sync_replicas: true
startup_delay_steps: 0
replicas_to_aggregate: 8
num_steps: 50000
optimizer {
momentum_optimizer: {
learning_rate: {
cosine_decay_learning_rate {
learning_rate_base: .16
total_steps: 50000
warmup_learning_rate: 0
warmup_steps: 2500
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
So, what Im not understanding is how should I map what is written in the paper with tensorflow parameters.
What is the num steps and total_steps inside the pipeline.config file?
What is the NUM_TRAIN_STEPS argument instead?
Does it overwrite config file steps or its a completely different thing?
If more details are needed feel free to ask.

Related

My tensorflow object detection model produced the average precision of zero for first class

I have trained the object detection model for three classes: id=1 (LR), id=2 (PM), id=3 (YR). Model produced AP(LR):0.002, PM:0.84,YR:1.00 and after that changed id=1 (YR), id=2(PM), id=3(YR). Model gives AP(YR):0.002, AP(PM):0.79, AP(LR):0.89.
Is this is taking first class as dummy class or there is another reason for that. Please help me out this.
Following are the changes i performed in the .config file to get the average precision.
eval_config: {
metrics_set: "pascal_voc_detection_metrics"
use_moving_averages: false
batch_size: 1;
num_visualizations: 20
max_num_boxes_to_visualize: 10
visualize_groundtruth_boxes: true
eval_interval_secs: 30
}

Gradients vanishing despite using Kaiming initialization

I was implementing a conv block in pytorch with activation function(prelu). I used Kaiming initilization to initialize all my weights and set all the bias to zero. However as I tested these blocks (by stacking 100 such conv and activation blocks on top of each other), I noticed that the output I am getting values of the order of 10^(-10). Is this normal, considering I am stacking upto 100 layers. Adding a small bias to each layer fixes the problem. But in Kaiming initialization the biases are supposed to be zero.
Here is the conv block code
from collections import Iterable
def convBlock(
input_channels, output_channels, kernel_size=3, padding=None, activation="prelu"
):
"""
Initializes a conv block using Kaiming Initialization
"""
padding_par = 0
if padding == "same":
padding_par = same_padding(kernel_size)
conv = nn.Conv2d(input_channels, output_channels, kernel_size, padding=padding_par)
relu_negative_slope = 0.25
act = None
if activation == "prelu" or activation == "leaky_relu":
nn.init.kaiming_normal_(conv.weight, a=relu_negative_slope, mode="fan_in")
if activation == "prelu":
act = nn.PReLU(init=relu_negative_slope)
else:
act = nn.LeakyReLU(negative_slope=relu_negative_slope)
if activation == "relu":
nn.init.kaiming_normal_(conv.weight, nonlinearity="relu")
act = nn.ReLU()
nn.init.constant_(conv.bias.data, 0)
block = nn.Sequential(conv, act)
return block
def flatten(lis):
for item in lis:
if isinstance(item, Iterable) and not isinstance(item, str):
for x in flatten(item):
yield x
else:
yield item
def Sequential(args):
flattened_args = list(flatten(args))
return nn.Sequential(*flattened_args)
This is the test Code
ls=[]
for i in range(100):
ls.append(convBlock(3,3,3,"same"))
model=Sequential(ls)
test=np.ones((1,3,5,5))
model(torch.Tensor(test))
And the output I am getting is
tensor([[[[-1.7771e-10, -3.5088e-10, 5.9369e-09, 4.2668e-09, 9.8803e-10],
[ 1.8657e-09, -4.0271e-10, 3.1189e-09, 1.5117e-09, 6.6546e-09],
[ 2.4237e-09, -6.2249e-10, -5.7327e-10, 4.2867e-09, 6.0034e-09],
[-1.8757e-10, 5.5446e-09, 1.7641e-09, 5.7018e-09, 6.4347e-09],
[ 1.2352e-09, -3.4732e-10, 4.1553e-10, -1.2996e-09, 3.8971e-09]],
[[ 2.6607e-09, 1.7756e-09, -1.0923e-09, -1.4272e-09, -1.1840e-09],
[ 2.0668e-10, -1.8130e-09, -2.3864e-09, -1.7061e-09, -1.7147e-10],
[-6.7161e-10, -1.3440e-09, -6.3196e-10, -8.7677e-10, -1.4851e-09],
[ 3.1475e-09, -1.6574e-09, -3.4180e-09, -3.5224e-09, -2.6642e-09],
[-1.9703e-09, -3.2277e-09, -2.4733e-09, -2.3707e-09, -8.7598e-10]],
[[ 3.5573e-09, 7.8113e-09, 6.8232e-09, 1.2285e-09, -9.3973e-10],
[ 6.6368e-09, 8.2877e-09, 9.2108e-10, 9.7531e-10, 7.0011e-10],
[ 6.6954e-09, 9.1019e-09, 1.5128e-08, 3.3151e-09, 2.1899e-10],
[ 1.2152e-08, 7.7002e-09, 1.6406e-08, 1.4948e-08, -6.0882e-10],
[ 6.9930e-09, 7.3222e-09, -7.4308e-10, 5.2505e-09, 3.4365e-09]]]],
grad_fn=<PreluBackward>)

Amazing question (and welcome to StackOverflow)! Research paper for quick reference.
TLDR
Try wider networks (64 channels)
Add Batch Normalization after activation (or even before, shouldn't make much difference)
Add residual connections (shouldn't improve much over batch norm, last resort)
Please check this out in this order and give a comment what (and if) any of that worked in your case (as I'm also curious).
Things you do differently
Your neural network is very deep, yet very narrow (81 parameters per layer only!)
Due to above, one cannot reliably create those weights from normal distribution as the sample is just too small.
Try wider networks, 64 channels or more
You are trying much deeper network than they did
Section: Comparison Experiments
We conducted comparisons on a deep but efficient model with 14 weight
layers (actually 22 was also tested in comparison with Xavier)
That was due to date of release of this paper (2015) and hardware limitations "back in the days" (let's say)
Is this normal?
Approach itself is quite strange with layers of this depth, at least currently;
each conv block is usually followed by activation like ReLU and Batch Normalization (which normalizes signal and helps with exploding/vanishing signals)
usually networks of this depth (even of depth half of what you've got) use also residual connections (though this is not directly linked to vanishing/small signal, more connected to degradation problem of even deep networks, like 1000 layers)

Can SAC be used instead PPO in Cartpole example?

I'm studying AzureML RL with example codes.
I could run cartpole example (cartpole_ci.ipynb) which trains
the PPO model on compute instance.
I tried SAC instead of PPO by changing training_algorithm = "PPO" to training_algorithm = "SAC"
but it failed with the message below.
ray.rllib.utils.error.UnsupportedSpaceException: Action space Discrete(2) is not supported for SAC.
Has someone tried SAC algorithm on AzureML RL and did it work?

AzureML RL does support SAC Discrete Actions but not parametric and I have confirmed it in the doc - https://docs.ray.io/en/latest/rllib-algorithms.html#feature-compatibility-matrix
Are you following the code sample?
from azureml.contrib.train.rl import ReinforcementLearningEstimator, Ray
training_algorithm = "PPO" rl_environment = "CartPole-v0"
script_params = {
# Training algorithm
"--run": training_algorithm,
# Training environment
"--env": rl_environment,
# Algorithm-specific parameters
"--config": '\'{"num_gpus": 0, "num_workers": 1}\'',
# Stop conditions
"--stop": '\'{"episode_reward_mean": 200, "time_total_s": 300}\'',
# Frequency of taking checkpoints
"--checkpoint-freq": 2,
# If a checkpoint should be taken at the end - optional argument with no value
"--checkpoint-at-end": "",
# Log directory
"--local-dir": './logs' }
training_estimator = ReinforcementLearningEstimator(
# Location of source files
source_directory='files',
# Python script file
entry_script='cartpole_training.py',
# A dictionary of arguments to pass to the training script specified in ``entry_script``
script_params=script_params,
# The Azure Machine Learning compute target set up for Ray head nodes
compute_target=compute_target,
# Reinforcement learning framework. Currently must be Ray.
rl_framework=Ray() )

Tflite (TF2) trained model running but not detecting in iOS app

I have trained a pre-model (ssd_mobilenet_v2_fpnlite_640x640) using TF2 Object Detection API then exported to an intermediate SavedModel to then convert it in TFlite model, using the following tutorials:
TF2 Object Detection API, Running TF2 Detection API Models on mobile, Converter Python API guide, and, Edge TF Lite iOS tutorial.
After many work hours I managed to make my model to predict in Python environment and run in the pre-made iOS app from TF lite.
However, after trying many ways of exporting and converting the model I cannot make the model to detect the objects I trained it for to detect.
The following is the instruction for training the model using TF2 API:
python3 model_main_tf2.py \
--pipeline_config_path={pipeline_path}\
--model_dir={output_model_dir} \
--alsologtostderr
This is the instructions for exporting the SavedModel using TF2 API:
python export_tflite_graph_tf2.py \
--pipeline_config_path {pipeline_path} \
--trained_checkpoint_dir {output_model_dir} \
--output_directory {exported_models_dir}
And the following the code to convert the model to tflite from Python API:
converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
tflite_model = converter.convert()
I have also tried some other alternatives to convert in TF1 like:
converter = tf.compat.v1.lite.TFLiteConverter.from_saved_model(export_dir)
converter.inference_type = tf.compat.v1.lite.constants.QUANTIZED_UINT8
input_arrays = converter.get_input_arrays()
converter.quantized_input_stats = {input_arrays[0] : (0., 1.)} # mean_value, std_dev
tflite_model = converter.convert()
and with command line:
tflite_convert \
--saved_model_dir={saved_model} \
--output_file={output_dir} \
--output_format=TFLITE \
--input_shapes=1,640,640,3 \
--input_arrays='normalized_input_image_tensor' \
--output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' \
--inference_type=QUANTIZED_UINT8 \
--mean_values=128 \
--std_dev_values=127 \
--change_concat_input_ranges=false \
--allow_custom_ops
the result is a 500bytes file. This tflite model looks as follows (in Neutron):
In the IOS app I adjusted the code this way:
// MARK: Model parameters
let batchSize = 1
let inputChannels = 3
let inputWidth = 640
let inputHeight = 640
// image mean and std for floating model, should be consistent with parameters used in model training
let imageMean: Float = 128
let imageStd: Float = 127
I have also tried with some other SSD Mobilenet models unsuccesfully. I've been stuck for several days already, I'd appreciate your help.

Install tf-nighly and convert saved_model to tflite file
https://pypi.org/project/tf-nightly/

SVM Model does not support probability estimation?

I am doing some classification task with Support Vector Machines (SVM).
I am using libSVM (with Matlab support) to predict probability estimates matrix. However, the libSVM displays message that;
Model does not support probabiliy estimates
Below is my sample code;
(train_label contains labels for training data and test_label contains label for test data)
model = svmtrain(train_label, train_data, '-t 2 -g .01 -c 0.7 -b 1);
[y,accuracy,prob_estimates]=svmpredict(test_label,test_data,model,'-b 1');
Can someone tell me if there is something wrong with the way I am doing it? Any help/suggestion will be appreciated.

Don't know about the Matlab implementation, but usually you have to set this option:
-b probability_estimates: whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0)

I am using libsvm in the same way without any problem.
In your code only a ' is missing in the following line
model = svmtrain(train_label, train_data, '-t 2 -g .01 -c 0.7 -b 1);
It should be
model = svmtrain(train_label, train_data, '-t 2 -g .01 -c 0.7 -b 1');

I had the same problem, model hasn't got ProbA and ProbB in it.
Before it was like this and giving error:
linear_model = svmtrain(trainClass, trainData, ['-t 0', cmd]);
Then I changed it to this, error dissappared:) - removed cmd and put exact values
linear_model = svmtrain(trainClass, trainData, ['-t 0 -c 1 -g 0.125 -b 1']);
if still gives error try to change c and g parameters.
Hope this helps.

It is because your model does not support probabiliy estimates.
You should use '-b 1' option both at training and testing process.
See also: https://stackoverflow.com/a/43509667/7893127

You may just train the model with default parameter.
Try to use '-b 1' when you are training and testing programe.

C:\setup\python36\Lib\site-packages\svm.py default value of self.probability is 0. You can set it 1.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Clarifications on training job parameters with Tensorflow - python-3.x

Related

My tensorflow object detection model produced the average precision of zero for first class

Gradients vanishing despite using Kaiming initialization

Can SAC be used instead PPO in Cartpole example?

Tflite (TF2) trained model running but not detecting in iOS app

SVM Model does not support probability estimation?

Categories

Resources