How to calculate CPU Utilization in Prometheus? - linux

I'm new to Prometheus and I'm confused about the CPU usage metrics. Here is my understanding of CPU usage (please correct me if I'm wrong):
container_cpu_usage_seconds_total = container_cpu_user_seconds_total + container_cpu_system_seconds_total
"cores" = container_spec_cpu_quota / container_spec_cpu_period
This is the CPU time the pod used per second:
sum(rate(container_cpu_usage_seconds_total{image!=""}[1m])) by (pod, namespace)
This is the CPU time the pod is allowed per second:
sum(container_spec_cpu_quota{image!=""}/100000) by (pod, namespace)
So the utilization is:
sum(rate(container_cpu_usage_seconds_total{image!=""}[1m])) by (pod, namespace) / sum(container_spec_cpu_quota{image!=""}/100000) by (pod, namespace) * 100
But what about pods that don't have spec.cpu.limits? I presume:
sum(rate(container_cpu_usage_seconds_total{image!=""}[1m])) by (pod, namespace) / CORES_ON_NODE * 100
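One concrete way to express CORES_ON_NODE, assuming cAdvisor's machine_cpu_cores metric is scraped (scalar() here sums the cores of all scraped nodes, so this is only a rough, cluster-wide denominator; a per-node version would need a join on the node/instance label):
sum(rate(container_cpu_usage_seconds_total{image!=""}[1m])) by (pod, namespace) / scalar(sum(machine_cpu_cores)) * 100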

Related

How to Fully Utilize CPU cores for skopt.forest_minimize

I have the following code for running skopt.forest_minimize(), but the biggest challenge right now is that it takes upwards of days to finish even just 2 iterations.
SPACE = [skopt.space.Integer(4, max_neighbour, name='n_neighbors', prior='log-uniform'),
         skopt.space.Integer(6, 10, name='nr_cubes', prior='uniform'),
         skopt.space.Categorical(overlap_cat, name='overlap_perc')]

@skopt.utils.use_named_args(SPACE)
def objective(**params):
    score, scomp = tune_clustering(X_cont=X_cont, df=df, pl_brewer=pl_brewer, **params)
    if score == 0:
        print('saving new scomp')
        with open(scomp_file, 'w') as filehandle:
            json.dump(scomp, filehandle, default=json_default)
    return score

results = skopt.forest_minimize(objective, SPACE, n_calls=1, n_initial_points=1, callback=[scoring])
Is it possible to optimize this code so that it computes faster? I noticed that it barely makes use of my CPU; the highest CPU utilization was about 30% (it's a 9th-gen i7 with 8 cores).
Also, a question while I'm at it: is it possible to use a GPU for these computations? I have a 3050 that I can use.
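If I remember the scikit-optimize API correctly, forest_minimize accepts an n_jobs argument that parallelizes the fitting of the surrogate forest and the acquisition-function evaluation across cores; it does not parallelize the objective itself (tune_clustering), so it only helps if the surrogate model is the slow part. A sketch with the other arguments kept as in the question:
results = skopt.forest_minimize(objective, SPACE,
                                n_calls=1, n_initial_points=1,
                                n_jobs=-1,  # use all available cores for the forest fitting
                                callback=[scoring])
As far as I know, scikit-optimize is built on NumPy/scikit-learn and does not use a GPU, so the 3050 would only help if tune_clustering itself were rewritten against a GPU-capable library.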

How to modify the alert manager's threshold value

I use node_exporter to monitor CPU usage and receive notifications when it crosses a threshold.
The rule below fires when CPU usage is above 80%.
alert: HostHighCpuLoad
expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 80
for: 0m
labels:
  severity: warning
annotations:
  summary: Host high CPU load (instance {{ $labels.instance }})
  description: "CPU load is > 80%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
I need to receive a threshold from the web UI and change the threshold set in the rule file to 70%.
Can I convert the threshold into a variable?
expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > {THRESHOLD_VALUE}
Or I wonder if I should make a script that modifies the 80% value to 70%.
Please give me advice.
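As far as I know, Prometheus rule files do not support runtime variables, so the usual pattern is to keep a template with a placeholder (like the {THRESHOLD_VALUE} above), render it with the value received from the web UI, and ask Prometheus to reload. A minimal sketch in Python; the file paths are assumptions, and the /-/reload endpoint only works if Prometheus is started with --web.enable-lifecycle:
import requests

threshold = 70  # value received from the web UI

# Render the rule file from a template containing the {THRESHOLD_VALUE} placeholder.
with open("host_cpu_rule.yml.tpl") as f:
    rule = f.read().replace("{THRESHOLD_VALUE}", str(threshold))

with open("/etc/prometheus/rules/host_cpu_rule.yml", "w") as f:
    f.write(rule)

# Tell Prometheus to reload its configuration and rule files.
requests.post("http://localhost:9090/-/reload")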

Ceph-rgw service stops automatically after installation

In my local cluster (4 Raspberry Pis) I am trying to configure an RGW gateway. Unfortunately, the service disappears automatically after 2 minutes.
[ceph_deploy.rgw][INFO ] The Ceph Object Gateway (RGW) is now running on host OSD1 and default port 7480
cephuser@admin:~/mycluster $ ceph -s
  cluster:
    id:     745d44c2-86dd-4b2f-9c9c-ab50160ea353
    health: HEALTH_WARN
            too few PGs per OSD (24 < min 30)

  services:
    mon: 1 daemons, quorum admin
    mgr: admin(active)
    osd: 4 osds: 4 up, 4 in
    rgw: 1 daemon active

  data:
    pools:   4 pools, 32 pgs
    objects: 80 objects, 1.09KiB
    usage:   4.01GiB used, 93.6GiB / 97.6GiB avail
    pgs:     32 active+clean

  io:
    client: 5.83KiB/s rd, 0B/s wr, 7op/s rd, 1op/s wr
After one minute, the service (rgw: 1 daemon active) is no longer visible:
cephuser@admin:~/mycluster $ ceph -s
  cluster:
    id:     745d44c2-86dd-4b2f-9c9c-ab50160ea353
    health: HEALTH_WARN
            too few PGs per OSD (24 < min 30)

  services:
    mon: 1 daemons, quorum admin
    mgr: admin(active)
    osd: 4 osds: 4 up, 4 in

  data:
    pools:   4 pools, 32 pgs
    objects: 80 objects, 1.09KiB
    usage:   4.01GiB used, 93.6GiB / 97.6GiB avail
    pgs:     32 active+clean
Many thanks for the help
Solution:
On the gateway node, open the Ceph configuration file in the /etc/ceph/ directory.
Find an RGW client section similar to the example:
[client.rgw.gateway-node1]
host = gateway-node1
keyring = /var/lib/ceph/radosgw/ceph-rgw.gateway-node1/keyring
log file = /var/log/ceph/ceph-rgw-gateway-node1.log
rgw frontends = civetweb port=192.168.178.50:8080 num_threads=100
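After changing the rgw frontends line, the gateway daemon has to be restarted to pick up the new settings. With systemd the unit is usually named after the client section, for example:
systemctl restart ceph-radosgw@rgw.gateway-node1
(The exact unit name depends on how the daemon was deployed; systemctl list-units | grep radosgw shows what is actually running.)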
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/object_gateway_guide_for_red_hat_enterprise_linux/index

JVM is using more memory than Native Memory Tracking says, how do I locate where the extra memory goes? [duplicate]

This question already has answers here:
Java process memory usage (jcmd vs pmap) (3 answers)
Where do these java native memory allocated from? (1 answer)
Closed 5 years ago.
I'm running Jetty on my web server. My current JVM settings are -Xmx4g -Xms2g; however, Jetty uses a lot more memory and I don't know where the extra memory goes.
Jetty uses 4.547 GB of memory in total:
The heap usage report shows heap memory usage at about 2.5 GB:
Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 483196928 (460.8125MB)
   used     = 277626712 (264.76546478271484MB)
   free     = 205570216 (196.04703521728516MB)
   57.45622455612963% used
Eden Space:
   capacity = 429522944 (409.625MB)
   used     = 251267840 (239.627685546875MB)
   free     = 178255104 (169.997314453125MB)
   58.4992824038755% used
From Space:
   capacity = 53673984 (51.1875MB)
   used     = 26358872 (25.137779235839844MB)
   free     = 27315112 (26.049720764160156MB)
   49.109214624351345% used
To Space:
   capacity = 53673984 (51.1875MB)
   used     = 0 (0.0MB)
   free     = 53673984 (51.1875MB)
   0.0% used
concurrent mark-sweep generation:
   capacity = 2166849536 (2066.46875MB)
   used     = 1317710872 (1256.6670150756836MB)
   free     = 849138664 (809.8017349243164MB)
   60.81229222922842% used
That still leaves about 2 GB unaccounted for, so I used Native Memory Tracking, which shows:
Total: reserved=5986478KB, committed=3259678KB
- Java Heap (reserved=4194304KB, committed=2640352KB)
(mmap: reserved=4194304KB, committed=2640352KB)
- Class (reserved=1159154KB, committed=122778KB)
(classes #18260)
(malloc=4082KB #62204)
(mmap: reserved=1155072KB, committed=118696KB)
- Thread (reserved=145568KB, committed=145568KB)
(thread #141)
(stack: reserved=143920KB, committed=143920KB)
(malloc=461KB #707)
(arena=1187KB #280)
- Code (reserved=275048KB, committed=143620KB)
(malloc=25448KB #30875)
(mmap: reserved=249600KB, committed=118172KB)
- GC (reserved=25836KB, committed=20792KB)
(malloc=11492KB #1615)
(mmap: reserved=14344KB, committed=9300KB)
- Compiler (reserved=583KB, committed=583KB)
(malloc=453KB #769)
(arena=131KB #3)
- Internal (reserved=76399KB, committed=76399KB)
(malloc=76367KB #25878)
(mmap: reserved=32KB, committed=32KB)
- Symbol (reserved=21603KB, committed=21603KB)
(malloc=17791KB #201952)
(arena=3812KB #1)
- Native Memory Tracking (reserved=5096KB, committed=5096KB)
(malloc=22KB #261)
(tracking overhead=5074KB)
- Arena Chunk (reserved=190KB, committed=190KB)
(malloc=190KB)
- Unknown (reserved=82696KB, committed=82696KB)
(mmap: reserved=82696KB, committed=82696KB)
This still doesn't explain where the memory goes (the NMT committed total is about 3.1 GB, so roughly 1.4 GB remains unaccounted for). Can someone shed light on how to locate the missing memory?
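One way to see where the unaccounted memory sits, assuming the server runs Linux, is to compare the NMT output against the process memory map and look for large resident mappings that NMT does not track (for example, memory allocated by native libraries or glibc malloc arenas):
pmap -x <jetty_pid> | sort -k3 -n -r | head -20
(<jetty_pid> is a placeholder; column 3 of pmap -x is the resident size of each mapping.)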

Distributed TensorFlow: tracking timestamps for synchronization operations

I am new to TensorFlow. Currently, I am trying to evaluate the performance of distributed TensorFlow using Inception model provided by TensorFlow team.
The thing I want is to generate timestamps for some critical operations in a Parameter Server - Worker architecture, so I can measure the bottleneck (the network lag due to parameter transfer/synchronization or parameter computation cost) on replicas for one iteration (batch).
I came up with the idea of adding a custom dummy py_func operator that prints timestamps inside inception_distributed_train.py, with some control dependencies. Here are some pieces of code that I added:
def timer(s):
    print("-------- thread ID ", threading.current_thread().ident,
          ", ---- Process ID ----- ", getpid(),
          " ~~~~~~~~~~~~~~~ ", s,
          datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d %H:%M:%S.%f'))
    return False

dummy1 = tf.py_func(timer, ["got gradients, before dequeues token "], tf.bool)
dummy2 = tf.py_func(timer, ["finished dequeueing the token "], tf.bool)
I modified
apply_gradients_op = opt.apply_gradients(grads, global_step=global_step)
with tf.control_dependencies([apply_gradients_op]):
    train_op = tf.identity(total_loss, name='train_op')
into
with tf.control_dependencies([dummy1]):
    apply_gradients_op = opt.apply_gradients(grads, global_step=global_step)
with tf.control_dependencies([apply_gradients_op]):
    with tf.control_dependencies([dummy2]):
        train_op = tf.identity(total_loss, name='train_op')
hoping to print timestamps just before apply_gradients_op starts evaluating and just after it finishes, by enforcing node dependencies.
I did similar things inside sync_replicas_optimizer.apply_gradients, by adding two dummy print nodes before and after update_op:
dummy1 = tf.py_func(timer, ["---------- before update_op "], tf.bool)
dummy2 = tf.py_func(timer, ["---------- finished update_op "], tf.bool)

# sync_op will be assigned to the same device as the global step.
with ops.device(global_step.device), ops.name_scope(""):
    with ops.control_dependencies([dummy1]):
        update_op = self._opt.apply_gradients(aggregated_grads_and_vars, global_step)

# Clear all the gradients queues in case there are stale gradients.
clear_queue_ops = []
with ops.control_dependencies([update_op]):
    with ops.control_dependencies([dummy2]):
        for queue, dev in self._one_element_queue_list:
            with ops.device(dev):
                stale_grads = queue.dequeue_many(queue.size())
                clear_queue_ops.append(stale_grads)
I understand that apply_gradients_op is the train_op returned by sync_replicas_optimizer.apply_gradients, and that it is the op that dequeues a token (global_step) from the sync_queue managed by the chief worker using chief_queue_runner, so that a replica can exit the current batch and start a new one.
In theory, apply_gradients_op should take some time, since the replica has to wait before it can dequeue the token (global_step) from sync_queue. But in the printed results for one replica, the time difference for executing apply_gradients_op is pretty short (~1/1000 sec), and sometimes the print output is non-deterministic (especially for the chief worker). Here is a snippet of the output on the workers (I am running 2 workers and 1 PS):
chief worker (worker 0) output
worker 1 output
My questions are:
1) How can I record the time TensorFlow takes to execute an op (such as train_op, apply_gradients_op, compute_gradients_op, etc.)? (A sketch of one candidate approach follows after these questions.)
2) Is this the right direction to go, given my ultimate goal is to record the elapsed time for executing certain operations (such as the difference between the time a replica finishes computing gradients and the time it gets the global_step from sync_token)?
3) If this is not the way it should go, please guide me with some insights about the possible ways I could achieve my ultimate goal.
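For 1), a minimal sketch of the kind of per-op timing I am after, using TF 1.x step tracing with RunMetadata and the timeline module (sess is the training session and the output file name is just illustrative; tf is assumed to be imported as in the snippets above):
from tensorflow.python.client import timeline

run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

# Run one training step with tracing enabled.
sess.run(train_op, options=run_options, run_metadata=run_metadata)

# Dump a Chrome trace (viewable at chrome://tracing) with the start time
# and duration of every op executed in this step.
tl = timeline.Timeline(run_metadata.step_stats)
with open('timeline_step.json', 'w') as f:
    f.write(tl.generate_chrome_trace_format())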
Thank you so much for reading my long post; I have spent weeks working on this!
