Push aggregated metrics to the Prometheus Pushgateway from clustered Node.js

I've run my Node apps in a cluster with the Prometheus client implemented, and I've mutated the metrics (for testing purposes), so I've got these results:
# HELP otp_generator_generate_failures A counter counts the number of generate OTP fails.
# TYPE otp_generator_generate_failures counter
otp_generator_generate_failures{priority="high"} 794
# HELP otp_generator_verify_failures A counter counts the number of verify OTP fails.
# TYPE otp_generator_verify_failures counter
otp_generator_verify_failures{priority="high"} 802
# HELP smsblast_failures A counter counts the number of SMS delivery fails.
# TYPE smsblast_failures counter
smsblast_failures{priority="high"} 831
# HELP send_success_calls A counter counts the number of send API success calls.
# TYPE send_success_calls counter
send_success_calls{priority="low"} 847
# HELP send_failure_calls A counter counts the number of send API failure calls.
# TYPE send_failure_calls counter
send_failure_calls{priority="high"} 884
# HELP verify_success_calls A counter counts the number of verify API success calls.
# TYPE verify_success_calls counter
verify_success_calls{priority="low"} 839
# HELP verify_failure_calls A counter counts the number of verify API failure calls.
# TYPE verify_failure_calls counter
verify_failure_calls{priority="high"} 840
FYI: I have adapted the example from: https://github.com/siimon/prom-client/blob/master/example/cluster.js
Then I pushed the metrics, and everything worked fine in the process. But when I checked the gateway portal, it showed different results from the metrics above.
My suspicion is that the pushed metrics came from a single instance instead of being aggregated across the several running instances. Is that right?
So, has anyone ever solved this problem?
Oh, this is my push code:
const { Pushgateway, register } = require('prom-client')

const promPg = new Pushgateway(
  config.get('prometheus.pushgateway'),
  { timeout: 5000 },
  register
)

promPg.push({ jobName: 'sms-otp-middleware' }, (err, resp, body) => {
  console.log(`Error: ${err}`)
  console.log(`Body: ${body}`)
  console.log(`Response status: ${resp.statusCode}`)
  res.status(200).send('metrics_pushed')
})

Related

Lost Clients PBI

I am trying to get the number of lost clients per month. The code I'm using for the measure is as follows:
LostClientsRunningTotal =
VAR currdate = MAX('Date'[Date])
VAR turnoverinperiod = [Turnover]
VAR clients =
    ADDCOLUMNS(
        Client,
        "Turnover Until Now",
        CALCULATE(
            [Turnover],
            DATESINPERIOD('Date'[Date], currdate, -1, MONTH)
        ),
        "Running Total Turnover",
        [RunningTotalTurnover]
    )
VAR lostclients =
    FILTER(
        clients,
        [Running Total Turnover] > 0 &&
        [Turnover Until Now] = 0
    )
RETURN
    IF(turnoverinperiod > 0, COUNTROWS(lostclients))
The problem is that the measure returns the running total, as shown here:
[screenshot: running total by month]
What I need is the lost clients per month, so I tried to use the DATEADD function to get the lost clients of the previous month and then subtract the current month.
The desired result, for Nov-22 for instance, would be 629 (December running total) - 544 (November running total) = 85.
For some reason the DATEADD function is not returning the desired result and I can't make heads or tails of it.
Can you tell me how I should approach this issue, please? Thank you in advance.
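One hedged way to sketch the monthly figure, assuming the measure above and following the Nov-22 example (next month's running total minus the current month's):

LostClientsPerMonth =
VAR CurrentRT = [LostClientsRunningTotal]
VAR NextRT =
    CALCULATE(
        [LostClientsRunningTotal],
        DATEADD('Date'[Date], 1, MONTH)  -- shift the filter to the following month
    )
RETURN
    NextRT - CurrentRT

Note that DATEADD needs a contiguous date selection from a proper date table and returns BLANK when the shifted period falls outside it, which is a common reason it appears not to work.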

Azure FaceAPI limits iteration to 20 items

I have a list of image URLs from which I use the MS Azure Face API to extract some features from the photos. The problem is that whenever I iterate over more than 20 URLs, it seems to stop working on every URL after the 20th one. No error is shown. However, when I manually changed the range to iterate over the next 20 URLs, it worked.
Side note: on the free tier, MS Azure Face allows only 20 requests per minute; however, even when I sleep for up to 60 s per 10 requests, the problem persists.
FYI, I have 360,000 URLs in total, and so far I have made only about 1,000 requests.
Can anyone help tell me why this happens and how to solve it? Thank you so much!
# My code
import time
import requests

# KEY, face_api_url, and list_post are defined earlier
i = 0
for post in list_post[800:1000]:
    i += 1
    try:
        image_url = post['photo_url']
        headers = {'Ocp-Apim-Subscription-Key': KEY}
        params = {
            'returnFaceId': 'true',
            'returnFaceLandmarks': 'false',
            'returnFaceAttributes': 'age,gender,headPose,smile,facialHair,glasses,emotion,hair,makeup,occlusion,accessories,blur,exposure,noise',
        }
        response = requests.post(face_api_url, params=params, headers=headers,
                                 json={"url": image_url})
        post['face_feature'] = response.json()[0]
    except (KeyError, IndexError):
        continue
    if i % 10 == 0:
        time.sleep(60)
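One thing worth checking (a guess, not verified against this exact account): when the service throttles a request, it returns an error body instead of a list of faces, so `response.json()[0]` raises and the `except ... continue` silently hides it, which would look exactly like "no error shown". A sketch that surfaces the status code and backs off on HTTP 429; `KEY` and `face_api_url` are the same names used above:

import time
import requests

def get_face_features(image_url, max_retries=3):
    """Call the Face API, pausing and retrying when throttled (HTTP 429)."""
    headers = {'Ocp-Apim-Subscription-Key': KEY}
    params = {'returnFaceId': 'true'}
    for attempt in range(max_retries):
        response = requests.post(face_api_url, params=params, headers=headers,
                                 json={"url": image_url})
        if response.status_code == 429:   # throttled: wait, then retry
            time.sleep(60)
            continue
        response.raise_for_status()       # surface any other HTTP error
        faces = response.json()
        return faces[0] if faces else None  # empty list = no face detected
    return None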
The free version has a maximum of 30,000 requests per month, so your 360,000 faces will take about a year to run.
The standard version costs USD 1 per 1,000 requests, giving a total cost of about USD 360. This option supports 10 transactions per second.
https://azure.microsoft.com/en-au/pricing/details/cognitive-services/face-api/

How to Mimic nRF Connect (for Android) Actions to Pygatt Script?

I'm using nRF Connect for Android to test a BLE peripheral. The peripheral is a BSX Insight residual muscle oxygen monitor whose software application is no longer functional or supported by the manufacturer. Thus, my only option to use my device (BSX) is to write my own control software. I've written a Python 3.7 script that I run within a tkinter routine on my 64-bit Win 10 laptop. Also, I'm using the Pygatt library and a BLED112 BT dongle.
I can connect to the peripheral and read and write values just fine to characteristics, but I'm sure that the "conversion" from the process used in nRF Connect to my script is incomplete and inefficient. So the first thing I'd like to confirm is that I'm using the correct respective functions from Pygatt. Once that's right, I can compare the respective outputs for the two data streams (characteristic values) that I want to capture and store.
The basic process in nRF Connect:
1. scan
2. select/connect the BSX Insight
3. expose the service and characteristics of interest
4. enable CCCDs
5. write the "start data" values (04-02)
These are the process command results from the nRF Connect log file. Starting with number four:
4.
D 09:04:54.491 gatt.setCharacteristicNotification(00002a37-0000-1000-8000-00805f9b34fb, true) 11
D 09:04:54.496 gatt.setCharacteristicNotification(2e4ee00b-d9f0-5490-ff4b-d17374c433ef, true) 20x
D 09:04:54.499 gatt.setCharacteristicNotification(2e4ee00d-d9f0-5490-ff4b-d17374c433ef, true) 25x
D 09:04:54.516 gatt.setCharacteristicNotification(2e4ee00e-d9f0-5490-ff4b-d17374c433ef, true) 32x
D 09:04:54.519 gatt.setCharacteristicNotification(00002a63-0000-1000-8000-00805f9b34fb, true) 36
D 09:04:54.523 gatt.setCharacteristicNotification(00002a53-0000-1000-8000-00805f9b34fb, true) 40
The above resulted from using the nRF command "Enable CCCDs." Basically, every characteristic that could be enabled was enabled, which is fine. The three marked with an 'x' are the ones I need enabled; the others are extra. Note that I've annotated the respective handles for these UUIDs at the end of each line.
V 09:05:39.211 Writing command to characteristic 2e4ee00a-d9f0-5490-ff4b-d17374c433ef
D 09:05:39.211 gatt.writeCharacteristic(2e4ee00a-d9f0-5490-ff4b-d17374c433ef, value=0x0402)
I 09:05:39.214 Data written to 2e4ee00a-d9f0-5490-ff4b-d17374c433ef, value: (0x) 04-02
A 09:05:39.214 "(0x) 04-02" sent
Number five is where I write 0402 to the UUID above. This action starts the data/value streams from:
2e4ee00d-d9f0-5490-ff4b-d17374c433ef, with a descriptor handle 26
2e4ee00e-d9f0-5490-ff4b-d17374c433ef, with a descriptor handle 33
Once I've done the basic steps above in nRF Connect, the two characteristic value streams become active, and I can immediately see the converted values in my Garmin Edge 810 head unit.
So attempting to duplicate the same process within my tkinter snippet:
# this function fires from the 'On' button click event
def powerON():
    powerON_buttonevent = 1
    print(f"\tpowerON_buttonevent OK {powerON_buttonevent}")

    # Connect to the BSX Insight
    try:
        adapter = pygatt.BGAPIBackend()  # serial_port='COM3'
        adapter.start()
        device = adapter.connect('0C:EF:AF:81:0B:76',
                                 address_type=pygatt.BLEAddressType.public)
        print(f"\tConnected: {device}")
    except:
        print("BSX Insight connection failure")
    finally:
        # adapter.stop()
        pass

    # Enable only these CCCDs
    try:
        device.char_write_handle(21, bytearray([0x01, 0x00]), wait_for_response=True)
        device.char_write_handle(26, bytearray([0x01, 0x00]), wait_for_response=True)
        device.char_write_handle(33, bytearray([0x01, 0x00]), wait_for_response=True)
        print(f"\te00b DESC: {device.char_read_long_handle(21)}")  # notify e00b
        print(f"\te00d DESC: {device.char_read_long_handle(26)}")  # notify e00d SmO2
        print(f"\te00e DESC: {device.char_read_long_handle(33)}")  # notify e00e tHb
        # Here's where I tested functions from Pygatt...
        # print(f"\t{device.get_handle('UUID_here')}")  # function works
        # print(f"\tvalue_handle/characteristic_config_handle: {device._notification_handles('UUID_here')}")  # function works
        # print(f"{device.char_read('UUID_here')}")
        # print(f"{device.char_read_long_handle(handle_here)}")  # function works
    except:
        print("CCCD write value failure")
    finally:
        # adapter.stop()
        pass

    # Enable the data streams
    try:
        device.char_write('2e4ee00a-d9f0-5490-ff4b-d17374c433ef',
                          bytearray([0x04, 0x02]), wait_for_response=True)  # function works
        print(f"\te00a Power ON: {device.char_read('2e4ee00e-d9f0-5490-ff4b-d17374c433ef')}")
    except:
        print("e00a Power ON write failure")
    finally:
        # adapter.stop()
        pass

    # Subscribe to SmO2 and tHb UUIDs
    try:
        def data_handler(handle, value):
            """
            Indications and notifications come in asynchronously; this callback
            handles them one at a time as they arrive.
            :param handle:
            :param value:
            :return:
            """
            if handle == 25:
                print(f"\tSmO2: {value} Handle: {handle}")
            elif handle == 32:
                print(f"\ttHb: {value} Handle: {handle}")
            else:
                print(f"\tvalue: {value}, handle: {handle}")

        device.subscribe("2e4ee00d-d9f0-5490-ff4b-d17374c433ef",
                         callback=data_handler, indication=False, wait_for_response=True)
        device.subscribe("2e4ee00e-d9f0-5490-ff4b-d17374c433ef",
                         callback=data_handler, indication=False, wait_for_response=True)
        print(f"\tSuccess 2e4ee00d: {device.char_read('2e4ee00d-d9f0-5490-ff4b-d17374c433ef')}")
        print(f"\tSuccess 2e4ee00e: {device.char_read('2e4ee00e-d9f0-5490-ff4b-d17374c433ef')}")
        # this statement causes a run-on continuity when enabled
        # while True:
        #     sleep(1)
    except:
        print("e00d/e00e subscribe failure")
    finally:
        adapter.stop()
        # pass
Problem: in the output window of my Atom editor, the two data streams start as expected. For example:
I 09:05:39.983 Notification received from 2e4ee00d-d9f0-5490-ff4b-d17374c433ef, value: (0x) 00- 00-00-00-C0-FF-00-00-C0-FF-84-65-B4-3B-9E-AB-83-3C-FF-03
and...
I 09:05:39.984 Notification received from 2e4ee00e-d9f0-5490-ff4b-d17374c433ef, value: (0x) 1C-00-00-FF-03-FF-0F-63-00-00-00-00-00-00-16-32-00-00-00-00
I'll see about seven to ten lines of data before the "stream" stops. There'll be a gap of about 20 seconds, and then a big dump of values. This is different from the output in nRF Connect, which is immediate and continuous.
I have the logs from nRF Connect and Python, but I'm not sure which log entry points to the cause of the stop. Might this issue be related to the Peripheral Preferred Connection Parameters? The nRF Connect property read shows:
ConnectionInterval = 50ms~100ms
SlaveLatency = 1
SuperTimeoutMonitor = 200
The Python log entry shows this:
INFO:pygatt.backends.bgapi.bgapi:Connection status: handle=0x0, flags=5, address=0xb'760b81afef0c', connection interval=75.000000ms, timeout=1000, latency=0 intervals, bonding=0xff
Thoughts anyone? (And truly, thanks in advance.)
I've answered my own questions. I now have to solve the new problem of why my tkinter dialog is "not responding" as a separate issue.
Thanks, all.
Edit 3/31/2020: I rewrote the script using PyQt and now have a functional app.
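For anyone hitting the same two symptoms (bursty, delayed notifications and a "not responding" GUI), both commonly come from doing the BLE work on the GUI thread. A hedged sketch, reusing the connection details and UUIDs from the question, that parks the pygatt session on a background thread so the tkinter mainloop keeps pumping; `data_handler` here is a minimal stand-in for the callback above:

import threading
import pygatt

stop_event = threading.Event()

def data_handler(handle, value):
    # minimal stand-in for the handler in the question
    print(f"handle: {handle}, value: {value}")

def ble_worker():
    """All pygatt work happens here, off the GUI thread."""
    adapter = pygatt.BGAPIBackend()
    adapter.start()
    try:
        device = adapter.connect('0C:EF:AF:81:0B:76',
                                 address_type=pygatt.BLEAddressType.public)
        device.char_write('2e4ee00a-d9f0-5490-ff4b-d17374c433ef',
                          bytearray([0x04, 0x02]))          # start the data streams
        device.subscribe('2e4ee00d-d9f0-5490-ff4b-d17374c433ef',
                         callback=data_handler, indication=False)
        device.subscribe('2e4ee00e-d9f0-5490-ff4b-d17374c433ef',
                         callback=data_handler, indication=False)
        stop_event.wait()   # park this thread; notifications keep firing
    finally:
        adapter.stop()

threading.Thread(target=ble_worker, daemon=True).start()
# ...tkinter's mainloop() runs on the main thread; call stop_event.set() on exit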

Redis Node - Querying a list of 250k items of ~15 bytes takes at least 10 seconds

I'd like to query a whole list of 250k items of ~15 bytes each.
Each item (some coordinates) is a ~15-byte string like xxxxxx_xxxxxx_xxxxxx.
I'm storing them using this function:
function setLocation({ id, lat, lng }) {
  const str = `${id}_${lat}_${lng}`
  client.lpush('locations', str, (err, status) => {
    console.log('pushed:', status)
  })
}
Using Node.js, an lrange('locations', 0, -1) call takes between 10 and 15 seconds.
Slowlog from Redis Labs: [screenshot]
I tried using sets instead; same results.
According to this post, it shouldn't take more than a few milliseconds.
What am I doing wrong here?
Update:
I'm using an instance on Redis lab
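For what it's worth, a quick way to tell whether the time goes into Redis itself or into serializing and transferring one 250k-element reply is to page through the list instead of fetching it in one go. A minimal sketch, assuming the same client instance as above:

// Fetch 'locations' in fixed-size slices rather than one LRANGE 0 -1,
// so no single reply has to carry all 250k items.
function getLocationsChunked(chunkSize, done) {
  const results = []
  const next = (start) => {
    client.lrange('locations', start, start + chunkSize - 1, (err, chunk) => {
      if (err) return done(err)
      results.push(...chunk)
      if (chunk.length < chunkSize) return done(null, results) // reached the end
      next(start + chunkSize)
    })
  }
  next(0)
}

getLocationsChunked(10000, (err, locations) => {
  if (err) throw err
  console.log('fetched:', locations.length)
})

If each slice returns in a few milliseconds but the full-range call does not, the bottleneck is the size of the single reply (and the network hop to the hosted instance), not the data structure itself.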

distributed Tensorflow tracking timestamps for synchronization operations

I am new to TensorFlow. Currently, I am trying to evaluate the performance of distributed TensorFlow using Inception model provided by TensorFlow team.
The thing I want is to generate timestamps for some critical operations in a Parameter Server/Worker architecture, so I can measure the bottleneck (the network lag due to parameter transfer/synchronization, or the parameter computation cost) on replicas for one iteration (batch).
I came up with the idea of adding a customized dummy py_func operator dedicated to printing timestamps inside inception_distributed_train.py, with some control dependencies. Here are the pieces of code that I added:
import threading
import time
from datetime import datetime
from os import getpid

def timer(s):
    # Print a wall-clock timestamp together with the thread and process IDs.
    print("-------- thread ID ", threading.current_thread().ident,
          ", ---- Process ID ----- ", getpid(),
          " ~~~~~~~~~~~~~~~ ", s,
          datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d %H:%M:%S.%f'))
    return False

dummy1 = tf.py_func(timer, ["got gradients, before dequeues token "], tf.bool)
dummy2 = tf.py_func(timer, ["finished dequeueing the token "], tf.bool)
I modified
apply_gradients_op = opt.apply_gradients(grads, global_step=global_step)
with tf.control_dependencies([apply_gradients_op]):
    train_op = tf.identity(total_loss, name='train_op')
into
with tf.control_dependencies([dummy1]):
    apply_gradients_op = opt.apply_gradients(grads, global_step=global_step)
with tf.control_dependencies([apply_gradients_op]):
    with tf.control_dependencies([dummy2]):
        train_op = tf.identity(total_loss, name='train_op')
hoping to print the timestamps before evaluating the apply_gradient_op and after finishing evaluating the apply_gradient_op by enforcing node dependencies.
I did similar things inside sync_replicas_optimizer.apply_gradients, by adding two dummy print nodes before and after update_op:
dummy1 = tf.py_func(timer, ["---------- before update_op "], tf.bool)
dummy2 = tf.py_func(timer, ["---------- finished update_op "], tf.bool)

# sync_op will be assigned to the same device as the global step.
with ops.device(global_step.device), ops.name_scope(""):
    with ops.control_dependencies([dummy1]):
        update_op = self._opt.apply_gradients(aggregated_grads_and_vars, global_step)

# Clear all the gradients queues in case there are stale gradients.
clear_queue_ops = []
with ops.control_dependencies([update_op]):
    with ops.control_dependencies([dummy2]):
        for queue, dev in self._one_element_queue_list:
            with ops.device(dev):
                stale_grads = queue.dequeue_many(queue.size())
                clear_queue_ops.append(stale_grads)
I understand that apply_gradient_op is the train_op returned by sync_replicas_optimizer.apply_gradients, and that it is the op that dequeues a token (global_step) from the sync_queue managed by the chief worker via chief_queue_runner, so that the replica can finish the current batch and start a new one.
In theory, apply_gradient_op should take some time, since the replica has to wait before it can dequeue the token (global_step) from the sync_queue. But the printed result I got for one replica shows the time difference for executing apply_gradient_op is very short (~1 ms), and sometimes the print output is nondeterministic (especially for the chief worker). Here is a snippet of the output on the workers (I am running 2 workers and 1 PS):
chief worker (worker 0) output
worker 1 output
My questions are:
1) Is there a reliable way to record the time TensorFlow takes to execute an op (such as train_op, apply_gradients_op, compute_gradients_op, etc.)?
2) Is this the right direction to go, given my ultimate goal is to record the elapsed time for executing certain operations (such as the difference between the time a replica finishes computing gradients and the time it gets the global_step from sync_token)?
3) If this is not the way it should go, please guide me with some insights about the possible ways I could achieve my ultimate goal.
Thank you so much for reading my long post; I have spent weeks working on this!
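For question 1), one standard TF1 alternative to py_func timers is the runtime's own tracing: RunOptions/RunMetadata record each op's start time and duration on every device, including queue dequeues and parameter-server transfers. A minimal sketch; `sess` and `train_op` here stand in for the session and op in inception_distributed_train.py:

import tensorflow as tf
from tensorflow.python.client import timeline

# Trace one training step: the runtime records every op's start time
# and duration itself, so no py_func timers are needed.
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

sess.run(train_op, options=run_options, run_metadata=run_metadata)

# Write a Chrome-trace JSON; open it at chrome://tracing to inspect
# per-op timelines across devices and workers.
tl = timeline.Timeline(run_metadata.step_stats)
with open('timeline_step.json', 'w') as f:
    f.write(tl.generate_chrome_trace_format())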
