Changing the base time unit displayed on the Prometheus endpoint - Micrometer

Before switching to Micrometer, we were using Prometheus directly in several of our Spring Boot applications, and Prometheus showed all our timings in milliseconds. Now that we have switched to Micrometer, all the Timer and @Timed metrics are in seconds. I was able to modify the max metric by extending PrometheusMeterRegistry and overriding getBaseTimeUnit, but the sum metric of the Timer does not respect the same override.
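For reference, this is roughly what that override looks like (the subclass name is illustrative):

    import java.util.concurrent.TimeUnit;
    import io.micrometer.prometheus.PrometheusConfig;
    import io.micrometer.prometheus.PrometheusMeterRegistry;

    // Illustrative subclass reporting timers in milliseconds instead of seconds.
    public class MillisPrometheusMeterRegistry extends PrometheusMeterRegistry {

        public MillisPrometheusMeterRegistry(PrometheusConfig config) {
            super(config);
        }

        @Override
        protected TimeUnit getBaseTimeUnit() {
            return TimeUnit.MILLISECONDS;
        }
    }

As described above, the max statistic honors this override, but the timer's sum is still rendered in seconds.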

You are describing the very concerns that have been covered by a Prometheus core committer.
I would recommend against trying to adjust it to milliseconds, since that would be fighting the system. If you are using Grafana to display your dashboards, it will handle the unit regardless of the underlying implementation.

Related

Constant Load pattern via JMeter

I am using JMeter to generate load against Azure Event Hub to do performance testing. I want a constant load on Event Hub (at the time of message ingestion). I tried the following options.
Constant Throughput Timer
Number of active threads (users): 100, with a ramp-up time of 20 seconds.
I am not getting a constant load in Event Hub; there are too many spikes. Please suggest a way to get a constant load into Event Hub via JMeter.
Regards,
Amit
JMeter is capable of creating a constant load pattern; just make sure to follow JMeter Best Practices and the recommendations from the 9 Easy Solutions for a JMeter Load Test "Out of Memory" Failure article. The essential points are:
Run JMeter in non-GUI mode
Ensure that JMeter has enough headroom to operate in terms of CPU, RAM, network and disk IO, etc. This can be checked using the JMeter PerfMon Plugin
It might also be the case that your application and/or middleware configuration is not suitable for a high constant load. Check out, for example, Concurrent, High Throughput Performance Testing with JMeter, where the author initially saw a very spiky load pattern and, after tuning his application and JMeter, achieved an essentially flat one.
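For reference, non-GUI mode (the first point above) is a single command-line invocation; the file names here are placeholders:

    # -n: non-GUI mode, -t: test plan, -l: results log, -e/-o: generate HTML report
    jmeter -n -t eventhub-load.jmx -l results.jtl -e -o report/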

How to get better performance with Azure ServiceBus Standard plan

I can't manage to get over 14 msg/second with the Azure Service Bus Standard plan. I'm running some benchmark tests with the Azure-Sample tool that I found in this question:
The test is done with a Service Bus resource with a single queue and all default configurations.
If I read this correctly, you've got a maximum concurrency of one (MaxInflightReceives) with 5 receivers (ReceiverCount). Increasing concurrency and enabling prefetch on the clients will increase the overall throughput (see the sketch after these points). But:
Testing should be done within the same Azure data centre. If you're testing from a local machine, you're introducing a substantial latency that cannot be avoided.
The receive mode used is PeekLock. It is slower than ReceiveAndDelete. Not suggesting to switch, but this needs to be taken into consideration as you're trading throughput for safety by using PeekLock.
The standard tier has a cap on the number of operations per second. In addition to that, your namespace is deployed in a shared environment with entities scattered in various deployment containers. Performance will vary and cannot be guaranteed. If you want to have a guaranteed throughput, use Premium SKU.
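Here is a minimal sketch of raising concurrency and prefetch with the Azure Service Bus Java SDK; the queue name, connection string, and tuning values are placeholders to adjust for your workload:

    import com.azure.messaging.servicebus.ServiceBusClientBuilder;
    import com.azure.messaging.servicebus.ServiceBusProcessorClient;
    import com.azure.messaging.servicebus.models.ServiceBusReceiveMode;

    public class ThroughputSketch {
        public static void main(String[] args) {
            ServiceBusProcessorClient processor = new ServiceBusClientBuilder()
                    .connectionString(System.getenv("SB_CONNECTION_STRING")) // placeholder
                    .processor()
                    .queueName("benchmark-queue")                  // placeholder
                    .receiveMode(ServiceBusReceiveMode.PEEK_LOCK)  // safe, but slower than RECEIVE_AND_DELETE
                    .prefetchCount(100)        // fetch messages ahead of processing
                    .maxConcurrentCalls(16)    // process multiple messages in parallel
                    .processMessage(ctx -> { /* handle ctx.getMessage() */ }) // auto-complete settles on success
                    .processError(ctx -> System.err.println(ctx.getException()))
                    .buildProcessorClient();
            processor.start();
        }
    }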

Limit number of servers on Azure functions (consumption plan)

Is it possible to put a cap on the number of servers onto which Azure Functions scales out? I have a Consumption plan, and basically I would like to set a cap on the number of resources that Azure Functions can use.
The only solutions I found are:
set a cap on the daily GB-seconds quota, after which the functions are stopped until the following day. This is definitely something I do not want, because I need some functions for online tasks.
In host.json, changing the http.maxConcurrentRequests and http.maxOutstandingRequests parameters, which will affect the number of functions running concurrently. Is this the thing I should look into? Isn't this setting applied per server? My fear is that this won't end up capping resources, but will instead let Azure create more and more servers in order to keep up with the request load.
You can use the WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT app setting: The maximum number of instances that the function app can scale out to. Default is no limit.
Note: This setting is a preview feature - and only reliable if set to a value <= 5
Ref: https://learn.microsoft.com/en-us/azure/azure-functions/functions-app-settings#websitemaxdynamicapplicationscaleout
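For example, assuming the Azure CLI and made-up app and resource-group names, the setting can be applied like this:

    # Cap scale-out at 5 instances (the documented reliability limit for this preview setting)
    az functionapp config appsettings set \
        --name my-function-app \
        --resource-group my-rg \
        --settings WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT=5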
One thing to note is that timer-triggered functions are automatically singletons. In my case that was sufficient, as I can wake such a function every minute and have it process a specific amount of data. Even if the function takes longer than expected, there's no risk that a second one will run concurrently.
More info: https://stackoverflow.com/a/53919048/4619705
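To illustrate, here is a timer-triggered function in the Java programming model (the function name and schedule are arbitrary); the runtime treats it as a singleton, so invocations never overlap:

    import com.microsoft.azure.functions.ExecutionContext;
    import com.microsoft.azure.functions.annotation.FunctionName;
    import com.microsoft.azure.functions.annotation.TimerTrigger;

    public class BatchDrainFunction {
        // NCRONTAB schedule: fires at second 0 of every minute.
        @FunctionName("drainBatch")
        public void run(
                @TimerTrigger(name = "timerInfo", schedule = "0 */1 * * * *") String timerInfo,
                final ExecutionContext context) {
            // Process a bounded amount of work per tick; an overrunning
            // invocation simply delays the next one instead of overlapping it.
            context.getLogger().info("Timer fired: " + timerInfo);
        }
    }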

What resource metrics (memory, CPU, etc.) should I be looking at for auto-scaling purposes?

What cloud resource metrics (memory, CPU, disk I/O, etc.) should I be looking at for auto-scaling purposes? FYI, the metrics are strictly used for auto-scaling. I have a Kubernetes architecture and Prometheus (for monitoring and scraping metrics).
I have a Kubernetes cluster set up locally as well as in the cloud. I am using Prometheus (https://prometheus.io/) for scraping system-level metrics. Now I want to add an auto-scaling feature to my system. I have been using Prometheus to save metrics like "memory and CPU used, allocated, and total for the last 24 hours", and I want to save more. This is the list of metrics I am getting from Prometheus: http://demo.robustperception.io:9100/metrics. I can't decide what additional metrics I am going to need for auto-scaling. Can anyone suggest some metrics for this purpose? TIA.
Normally, the common bottleneck is the memory hierarchy rather than CPU usage: the more requests your application receives, the more likely you are to hit an out-of-memory error. Moreover, unless your application is doing HPC-style work, it is unlikely to be CPU-bound.
Within the memory hierarchy, disk I/O can dramatically affect performance, so you should check how disk-I/O-intensive your application is. In that case, upgrading the disk hardware could be a better solution than spinning up more instances. However, that depends on the application.
In any case, it would be worthwhile to measure the average response time and then make scaling decisions accordingly.
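As one way to get that number, here is a minimal sketch using Micrometer's Timer (which exports to Prometheus); the metric name and simulated workload are made up:

    import java.util.concurrent.TimeUnit;
    import io.micrometer.core.instrument.Timer;
    import io.micrometer.prometheus.PrometheusConfig;
    import io.micrometer.prometheus.PrometheusMeterRegistry;

    public class ResponseTimeProbe {
        public static void main(String[] args) {
            PrometheusMeterRegistry registry =
                    new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);
            Timer timer = Timer.builder("app.request.latency") // illustrative name
                    .register(registry);

            // Record a few simulated requests.
            for (int i = 0; i < 5; i++) {
                timer.record(ResponseTimeProbe::simulateRequest);
            }

            // An autoscaling rule could key off this average instead of raw CPU.
            System.out.printf("avg response = %.1f ms%n", timer.mean(TimeUnit.MILLISECONDS));
        }

        private static void simulateRequest() {
            try {
                Thread.sleep(20); // stand-in for real request handling
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }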

Dynamic Service Creation to Distribute Load

Background
The problem we're facing is that we are doing video encoding and want to distribute the load to multiple nodes in the cluster.
We would like to constrain the number of video encoding jobs on a particular node to some maximum value. We would also like to have small video encoding jobs sent to a certain grouping of nodes in the cluster, and long video encoding jobs sent to another grouping of nodes in the cluster.
The idea behind this is to help maintain fairness amongst clients by partitioning the large jobs into a separate pool of nodes. This helps ensure that the small video encoding jobs are not blocked / throttled by a single tenant running a long encoding job.
Using Service Fabric
We plan on using an ASF service for the video encoding. With this in mind we had an idea of dynamically creating a service for each job that comes in. Placement constraints could then be used to determine which pool of nodes a job would run in. Custom metrics based on memory usage, CPU usage ... could be used to limit the number of active jobs on a node.
With this method the node distributing the jobs would have to poll whether a new service could currently be created that satisfies the placement constraints and metrics.
Questions
What happens when a service can't be placed on a node? (Using CreateServiceAsync I assume?)
Will this polling be prohibitively expensive?
Our video encoding executable is packaged along with the service which is approximately 80MB. Will this make the spinning up of a new service take a long time? (Minutes vs seconds)
As an alternative to this we could use a reliable queue based system, where the large jobs pool pulls from one queue and the small jobs pool pulls from another queue. This seems like the simpler way, but I want to explore all options to make sure I'm not missing out on some of the features of Service Fabric. Is there another better way you would suggest?
I have no experience with placement constraints and dynamic services, so I can't speak to that.
The polling of the perf counters isn't terribly expensive; that said, it's not a free operation. A one-second poll interval shouldn't cause any huge perf impact while still providing a decent degree of resolution.
The service packages get copied to each node at deployment time rather than when services get spun up, so it'll make the deployment a bit slower but not affect service creation.
You're going to want to put the job data in reliable collections any way you structure it, but the question is how. One idea that might be worth considering is making the job-processing service a partitioned service and basing your partitioning strategy on encoding job size and/or tenant, so that large jobs from the same tenant land in the same queue and smaller jobs for other tenants go elsewhere (sketched below).
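As a rough illustration of that idea, here is a hypothetical mapping from (tenant, job size) onto an Int64 partition key, assuming a uniform Int64 (ranged) partition scheme; all names and thresholds are made up:

    public final class EncodingPartitionKey {
        // Split the Int64 key space in half: small jobs map below zero,
        // large jobs map at or above it.
        private static final long SMALL_BASE = Long.MIN_VALUE;
        private static final long LARGE_BASE = 0L;
        private static final long HALF_RANGE = Long.MAX_VALUE;

        // Made-up threshold: jobs over 10 minutes of video count as "large".
        private static final long LARGE_JOB_THRESHOLD_SECONDS = 600;

        public static long partitionKey(String tenantId, long videoLengthSeconds) {
            long base = videoLengthSeconds > LARGE_JOB_THRESHOLD_SECONDS
                    ? LARGE_BASE : SMALL_BASE;
            // Same tenant always lands in the same partition within its half,
            // so one tenant's large jobs queue behind each other rather than
            // blocking everyone else's small jobs.
            long offset = Math.floorMod((long) tenantId.hashCode(), HALF_RANGE);
            return base + offset;
        }
    }

The uniform Int64 scheme then maps contiguous key ranges to partitions, so the two halves of the key space become two disjoint sets of queues.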
As an aside, one thing I've dealt with in the past is that SF remoting limits the size of the messages sent and throws if a message is too big, so if your video files are being passed from service to service, you're going to want to consider a paging strategy for inter-service communication.
