TFRecord format for multiple instances of the same or different classes on one training image - python-3.x

I am trying to train a Faster R-CNN on a grocery detection dataset using the new Object Detection API, but I do not quite understand the process of creating a TFRecord file for it. I am aware of the Oxford and VOC dataset examples and the scripts that create TFRecord files, and they work fine when there is only one object per training image, which is what I see in all of the official examples and GitHub projects. My images contain more than 20 objects each, and the objects belong to different classes. I do not want to iterate 20+ times over one image and create 20+ nearly identical tf_examples in which the duplicated encoded image data takes up all my disk space.
tf_example = tf.train.Example(features=tf.train.Features(feature={
    'image/height': dataset_util.int64_feature(height),
    'image/width': dataset_util.int64_feature(width),
    'image/filename': dataset_util.bytes_feature(filename),
    'image/source_id': dataset_util.bytes_feature(filename),
    'image/encoded': dataset_util.bytes_feature(encoded_image_data),
    'image/format': dataset_util.bytes_feature(image_format),
    'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
    'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
    'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
    'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
    'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
    'image/object/class/label': dataset_util.int64_list_feature(classes),
}))
return tf_example
I believe the answer to my question lies in the fact that when creating tf_records, xmin, xmax, ymin, ymax, classes_text, and classes should all be lists with one value per bounding box, so I can add the different objects and their parameters to these lists for a single image.
Maybe someone with experience can advise: will the approach I have described work, and if not, is there a clean and simple way to create tf_records for multiple objects in one image?
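To make the idea concrete, here is a minimal sketch of how I would fill those lists for one image before building the single tf_example above (the annotations structure, field names, and class ids are just assumptions for illustration):

def collect_box_lists(annotations, width, height):
    # 'annotations' is a hypothetical list with one dict per object, e.g.
    # {'xmin': 856, 'xmax': 1023, 'ymin': 1840, 'ymax': 2100,
    #  'class_name': 'milk', 'class_id': 1}
    xmins, xmaxs, ymins, ymaxs = [], [], [], []
    classes_text, classes = [], []
    for obj in annotations:
        # one entry per bounding box, normalized to [0, 1]
        xmins.append(obj['xmin'] / float(width))
        xmaxs.append(obj['xmax'] / float(width))
        ymins.append(obj['ymin'] / float(height))
        ymaxs.append(obj['ymax'] / float(height))
        classes_text.append(obj['class_name'].encode('utf8'))
        classes.append(obj['class_id'])
    return xmins, xmaxs, ymins, ymaxs, classes_text, classes

This way the encoded image bytes go into the tf_example only once, no matter how many boxes the image has.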
Below I show some of the features (not all of them) of a TFRecord created the way I think it has to work, based on the comments ("List of ... (1 per box)") in the link I attached. I hope the idea is clear from the attached dump.
To clarify: xmin, for example, holds 4 different normalized values [0.4056372549019608, 0.47794117647058826, 0.4840686274509804, 0.4877450980392157] for 4 different bounding boxes in the attached feature example. Don't forget that the lists were converted with the dataset_util.float_list_feature method into a serializable feature format.
features {
  feature {
    key: "image/filename"
    value {
      bytes_list {
        value: "C4_P06_N1_S4_1.JPG"
      }
    }
  }
  feature {
    key: "image/format"
    value {
      bytes_list {
        value: "jpeg"
      }
    }
  }
  feature {
    key: "image/height"
    value {
      int64_list {
        value: 2112
      }
    }
  }
  feature {
    key: "image/key/sha256"
    value {
      bytes_list {
        value: "4e0b458e4537f87d72878af4201c55b0555f10a0e90decbd397fd60476e6e973"
      }
    }
  }
  feature {
    key: "image/object/bbox/xmax"
    value {
      float_list {
        value: 0.43323863636363635
        value: 0.4403409090909091
        value: 0.46448863636363635
        value: 0.5085227272727273
      }
    }
  }
  feature {
    key: "image/object/bbox/xmin"
    value {
      float_list {
        value: 0.3565340909090909
        value: 0.36363636363636365
        value: 0.39204545454545453
        value: 0.4318181818181818
      }
    }
  }
  feature {
    key: "image/object/bbox/ymax"
    value {
      float_list {
        value: 0.9943181818181818
        value: 0.7708333333333334
        value: 0.20265151515151514
        value: 0.9943181818181818
      }
    }
  }
  feature {
    key: "image/object/bbox/ymin"
    value {
      float_list {
        value: 0.8712121212121212
        value: 0.6174242424242424
        value: 0.06818181818181818
        value: 0.8712121212121212
      }
    }
  }
  feature {
    key: "image/object/class/label"
    value {
      int64_list {
        value: 1
        value: 0
        value: 3
        value: 0
      }
    }
  }
}
I did roughly what I thought should work, but I got these numbers during training, and that is abnormal:
INFO:tensorflow:global step 204: loss = 1.4067 (1.177 sec/step)
INFO:tensorflow:global step 205: loss = 1.0570 (1.684 sec/step)
INFO:tensorflow:global step 206: loss = 1.0229 (0.916 sec/step)
INFO:tensorflow:global step 207: loss = 80484784668672.0000 (0.587 sec/step)
INFO:tensorflow:global step 208: loss = 981436265922560.0000 (0.560 sec/step)
INFO:tensorflow:global step 209: loss = 303916113723392.0000 (0.539 sec/step)
INFO:tensorflow:global step 210: loss = 4743170218786816.0000 (0.613 sec/step)
INFO:tensorflow:global step 211: loss = 2933532187951104.0000 (0.518 sec/step)
INFO:tensorflow:global step 212: loss = 1.8134 (1.513 sec/step)
INFO:tensorflow:global step 213: loss = 73507901414572032.0000 (0.553 sec/step)
INFO:tensorflow:global step 214: loss = 650799901688463360.0000 (0.622 sec/step)
P.S. Additional information: in the normal setup, where 1 image from this dataset contains 1 object of one class, everything works fine.

You are correct: xmin, xmax, ymin, ymax, classes_text, and classes are all lists with one value per bounding box. There is no need to duplicate the image for each bounding box; it would indeed take up a lot of disk space. As @gautam-mistry pointed out, the records are streamed into TensorFlow; as long as each image fits into RAM you should be okay, even if you duplicated the images (so long as you have the disk space).

A TFRecord file represents a sequence of (binary) strings. The format is not random access, so it is suitable for streaming large amounts of data but not suitable if fast sharding or other non-sequential access is desired. The relevant API entry points are:
tf.python_io.TFRecordWriter
tf.python_io.tf_record_iterator
tf.python_io.TFRecordCompressionType
tf.python_io.TFRecordOptions
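If you want to double-check what ended up in the file, a rough sketch (assuming TF 1.x and a record file named train.record, which is just a placeholder) is to stream it back with tf.python_io.tf_record_iterator and count the boxes per record:

import tensorflow as tf

# Iterate over the serialized records and print how many boxes each one holds.
for serialized in tf.python_io.tf_record_iterator('train.record'):
    example = tf.train.Example()
    example.ParseFromString(serialized)
    feature = example.features.feature
    filename = feature['image/filename'].bytes_list.value[0]
    num_boxes = len(feature['image/object/bbox/xmin'].float_list.value)
    print(filename, num_boxes)

Each record should report the full number of boxes for its image.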

I found what the problem was --> I had a mistake in my protobuf label map file: different class names were mapped to the same class id. For example:
item {
  id: 1
  name: 'raccoon'
}
item {
  id: 1
  name: 'lion'
}
And so on. Because I had around 50 classes, the loss only blew up tremendously at some steps. Maybe it'll help someone: be cautious with the label map proto txt :)
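A quick sanity check (a plain-Python sketch; the file name label_map.pbtxt is just an assumption) is to scan the label map for duplicate ids before training:

import re
from collections import Counter

# Count how often each id appears in the label map text file.
with open('label_map.pbtxt') as f:
    ids = re.findall(r'id:\s*(\d+)', f.read())

duplicates = [i for i, count in Counter(ids).items() if count > 1]
if duplicates:
    print('Duplicate class ids:', duplicates)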

Related

Python comparing values from two dictionaries where keys match and one set of values is greater

I have the following datasets:
kpi = {
    "latency": 3,
    "cpu_utilisation": 0.98,
    "memory_utilisation": 0.95,
    "MIR": 200,
}
ns_metrics = {
    "timestamp": "2022-10-04T15:24:10.765000",
    "ns_id": "cache",
    "ns_data": {
        "cpu_utilisation": 0.012666666666700622,
        "memory_utilisation": 8.68265852766783,
    },
}
What I'm looking for is an elegant way to compare the cpu_utilisation and memory_utilisation values from each dictionary and, if the utilisation figures from ns_metrics are greater than those in kpi, print (for now) a message saying which utilisation value was greater, i.e. cpu, memory, or both. Naturally, I can do something simple like this:
if ns_metrics["ns_data"]["cpu_utilisation"] > kpi["cpu_utilisation"]:
    print("true: over cpu threshold")
if ns_metrics["ns_data"]["memory_utilisation"] > kpi["memory_utilisation"]:
    print("true: over memory threshold")
But this seems a bit long-winded with so many if conditions, and I was hoping there is a more elegant way of doing it. Any help would be greatly appreciated.
Maybe you can use a loop to do this:
check_list = ["cpu_utilisation", "memory_utilisation"]
for i in check_list:
    if ns_metrics["ns_data"][i] > kpi[i]:
        print("true: over {} threshold".format(i.split('_')[0]))
If the keys are different, you can use a mapping dict to do it, like this:
check_mapping = {"cpu_utilisation": "cpu_utilisation_1"}
for kpi_key, ns_key in check_mapping.items():
    ....
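For completeness, a slightly fuller sketch of the mapping variant (the ns-side key cpu_utilisation_1 is a made-up example, as above):

check_mapping = {"cpu_utilisation": "cpu_utilisation_1"}
for kpi_key, ns_key in check_mapping.items():
    if ns_metrics["ns_data"][ns_key] > kpi[kpi_key]:
        print("true: over {} threshold".format(kpi_key.split('_')[0]))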

Clarifications on training job parameters with Tensorflow

I'm using the new TensorFlow Object Detection API.
I need to replicate the training parameters used in a paper, but I'm a bit confused.
The paper states:
When training neural network models, their base configuration is similar to that used to
train on the COCO 2017 dataset. For the unambiguous comparison of the selected models, the total number of
training steps was set to 100 equal to 100′000 iterations of learning.
Inside model_main_tf2.py, which is the script used to start the training, I can read the following:
"""Creates and runs TF2 object detection models.
For local training/evaluation run:
PIPELINE_CONFIG_PATH=path/to/pipeline.config
MODEL_DIR=/tmp/model_outputs
NUM_TRAIN_STEPS=10000
SAMPLE_1_OF_N_EVAL_EXAMPLES=1
python model_main_tf2.py -- \
--model_dir=$MODEL_DIR --num_train_steps=$NUM_TRAIN_STEPS \
--sample_1_of_n_eval_examples=$SAMPLE_1_OF_N_EVAL_EXAMPLES \
--pipeline_config_path=$PIPELINE_CONFIG_PATH \
--alsologtostderr
"""
Also, you can specify the num_steps and total_steps parameters in the pipeline.config file (used by the training script):
train_config: {
  batch_size: 1
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 50000
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .16
          total_steps: 50000
          warmup_learning_rate: 0
          warmup_steps: 2500
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
So, what I'm not understanding is how to map what is written in the paper to the TensorFlow parameters.
What are num_steps and total_steps inside the pipeline.config file?
And what is the NUM_TRAIN_STEPS argument?
Does it override the steps in the config file, or is it a completely different thing?
If more details are needed feel free to ask.
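For reference, a rough sketch of how the step-related values can be read back from the config in Python (this assumes the standard object_detection utilities are installed; the path is a placeholder):

from object_detection.utils import config_util

# Load pipeline.config and print the step-related fields.
configs = config_util.get_configs_from_pipeline_file('path/to/pipeline.config')
train_config = configs['train_config']

print('num_steps:', train_config.num_steps)
print('total_steps:',
      train_config.optimizer.momentum_optimizer.learning_rate
      .cosine_decay_learning_rate.total_steps)

As far as I understand, when the num_train_steps flag is passed it is meant to take precedence over num_steps in the config, but it is worth verifying against your version of model_main_tf2.py.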

My tensorflow object detection model produced the average precision of zero for first class

I have trained an object detection model for three classes: id=1 (LR), id=2 (PM), id=3 (YR). The model produced AP(LR): 0.002, AP(PM): 0.84, AP(YR): 1.00. After that I changed the mapping to id=1 (YR), id=2 (PM), id=3 (LR), and the model gives AP(YR): 0.002, AP(PM): 0.79, AP(LR): 0.89.
Is it taking the first class as a dummy class, or is there another reason for this? Please help me out with this.
Following are the changes I performed in the .config file to get the average precision.
eval_config: {
  metrics_set: "pascal_voc_detection_metrics"
  use_moving_averages: false
  batch_size: 1;
  num_visualizations: 20
  max_num_boxes_to_visualize: 10
  visualize_groundtruth_boxes: true
  eval_interval_secs: 30
}

NodeJS Sharp Provides Incorrect Quality For JPEG Images

I am working with the sharp node package to calculate the quality of an image.
I couldn't find any API in the package that would calculate the quality.
So I came up with an implementation that follows these steps:
Accept incoming image buffer & create sharp instance
Create another instance from this instance by setting quality to 100
Compare the size of original sharp instance and new sharp instance
If there is a difference in the size, update the quality and execute step 2 with updated quality
Return the quality once comparison in step 4 gives smallest difference
I tested this approach by using an image of known quality i.e. 50 (confirmed)
EDIT - I generated images with different quality values using Photoshop
However, the above logic returns the quality as 82 (expected is something close to 50)
Problem
So the problem is that I am not able to figure out the quality of the image.
It would be fine if the above logic returned a close value such as 49 or 51.
However, the result is totally different.
Results
As per this logic, I get the following results for a given quality:
Actual Quality 50 - Result 82
Actual Quality 60 - Result 90
Actual Quality 70 - Result 90
Actual Quality 80 - Result 94
Actual Quality 90 - Result 98
Actual Quality 100 - Result 98
Code
The following code snippet is used to calculate the quality.
I do understand that it needs improvements for precise results.
But it should at least provide close values.
async function getJpegQuality(image: sharp.Sharp, min_quality: number, max_quality: number): Promise<number> {
  if (Math.abs(max_quality - min_quality) <= 1) {
    return max_quality;
  }
  const updated_image: sharp.Sharp = sharp(await image.jpeg({ quality: max_quality }).toBuffer());
  const [metadata, updated_metadata]: sharp.Metadata[] = await Promise.all([image.metadata(), updated_image.metadata()]);
  // update quality as per size comparison
  if (metadata.size > updated_metadata.size) {
    const temp_max = Math.round(max_quality);
    max_quality = Math.round((max_quality * 2) - min_quality);
    min_quality = Math.round(temp_max);
  } else {
    max_quality = Math.round((min_quality + max_quality) / 2);
    min_quality = Math.round((min_quality + max_quality) / 2);
  }
  // recursion
  return await getJpegQuality(image, min_quality, max_quality);
}
Usage
const image: sharp.Sharp = sharp(file.originalImage.buffer);
const quality = await getJpegQuality(image, 1, 100);
console.log(quality);
Thanks!

Find clusters by edge definition in arangodb

Does arangodb provide a utility to list clusters for a given edge definition?
E.g. Given the graph:
Tyrion ----sibling---> Cercei ---sibling---> Jamie
Bran ---sibling--> Arya ---sibling--> Jon
I'd want something like the following:
my_graph._getClusters({edge: "sibling"}) -> [ [Tyrion, Cercei, Jamie], [Bran, Arya, Jon] ]
Provided you have a graph named sibling, the following query will find all paths in the graph that are connected by edges of type sibling and that have a (path) length of 3. This should match the example data you provided:
LET options = {
  followEdges: [
    { type: 'sibling' }
  ]
}
FOR i IN GRAPH_TRAVERSAL('sibling', { }, "outbound", options)
  FILTER LENGTH(i) == 3
  RETURN i[*].vertex._key
Omitting or adjusting the FILTER will also find longer or shorter paths in the graph.
