I have one Prometheus for all environment types, i.e. DEV, PROD, STAGING, ...
It's useful to see the whole big picture in one place, but I would like to have something like tabs or a color marker for the different environment types.
I have different alert titles for the different environment types, e.g.
DEV | High CPU usage
PROD | High CPU usage
....
I would like to have different tabs or nested groups, something like this:
DEV
|_ DEV | High CPU usage
PROD
|_ PROD | High CPU usage
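For reference, a hedged sketch of one way to get that nesting when the alerts come from Prometheus alerting rules routed through Alertmanager: attach an env label to each scrape target (or rule) and group/route on that label instead of encoding the environment in the title. The metric, threshold, and receiver names below are placeholders:

# rules.yml - assumes each scrape target already carries an env label (DEV, PROD, ...)
groups:
  - name: cpu
    rules:
      - alert: HighCpuUsage
        expr: (100 - avg by (instance, env) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 10m
        annotations:
          summary: '{{ $labels.env }} | High CPU usage on {{ $labels.instance }}'

# alertmanager.yml - notifications arrive grouped per environment
route:
  receiver: default
  group_by: ['env', 'alertname']
  routes:
    - match:
        env: PROD
      receiver: prod-alerts
    - match:
        env: DEV
      receiver: dev-alerts
# (the receivers "default", "prod-alerts", "dev-alerts" would be defined elsewhere in alertmanager.yml)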
Let's say I'm building something like AWS Lambda / Cloudflare Workers, where I allow users to submit arbitrary binaries, and then I run them wrapped in sandboxes (e.g. Docker containers / gVisor / etc.), packed multi-tenant onto a fleet of machines.
Ignore the problem of ensuring the sandboxing is effective for now; assume that problem is solved.
Each individual execution of one of these worker-processes is potentially a very heavy workload (think SQL OLAP reports.) A worker-process may spend tons of CPU, memory, IOPS, etc. We want to allow them to do this. We don't want to limit users to a small fixed slice of a machine, as traditional cgroups limits enable. Part of our service's value-proposition is low latency (rather than high throughput) in answering heavy queries, and that means allowing each query to essentially monopolize our infrastructure as much as it needs, with as much parallelization as it can manage, to get done as quickly as possible.
We want to charge users in credits for the resources they use, according to some formula that combines the CPU-seconds, memory-GB-seconds, IO operations, etc. This will disincentivize users from submitting "sloppy" worker-processes (because a process that costs us more to run, costs them more to submit.) It will also prevent users from DoSing us with ultra-heavy workloads, without first buying enough credits to pay the ensuing autoscaling bills in advance :)
We would also like to enable users to set, for each worker-process launch, a limit on the total credit spend during execution — where if it spends too many CPU-seconds, or allocates too much memory for too long, or does too many IO operations, or any combination of these that adds up to "spending too many credits", then the worker-process gets hard-killed by the host machine. (And we then bill their account for exactly as many credits as the resource-limit they specified at launch, despite not successfully completing the job.) This would protect users (and us) from the monetary consequences of launching faulty/leaky workers; and would also enable us to predict an upper limit on how heavy a workload could be before running it, and autoscale accordingly.
This second requirement implies that we can't do the credit-spend accounting after the fact, async, using observed per-cgroup metrics fed into some time-series server; but instead, we need each worker hypervisor to do the credit-spend accounting as the worker runs, in order to stop it as close to the time it overruns its budget as possible.
Basically, this is, to a tee, a description of the "gas" accounting system in the Ethereum Virtual Machine: the EVM does credit-spend accounting based on a formula that combines resource-costs for each op, and hard-kills any "worker process" (smart contract) that goes over its allocated credit (gas) limit for this launch (tx and/or CALL op) of the worker.
However, the "credit-spend accounting" in the EVM is enabled by instrumenting the VM that executes code such that each VM ISA op also updates a gas-left-to-spend VM register, and aborts VM execution if the gas-left-to-spend ever goes negative. Running native code on bare-metal/regular IaaS VMs, we don't have the ability to instrument our CPU like that. (And doing so through static binary translation would probably introduce far too much overhead.) So doing this the way the EVM does it, is not really an option.
I know Linux does CPU accounting, memory accounting, etc. Is there a way, using some combination of cgroups + gVisor-style syscall proxying, to approximate the function of the EVM's "tx gas limit", i.e. to enable processes to be hard-killed (instantly, or within a few ms) when they go over their credit limit?
I'm assuming there's no off-the-shelf solution for this (haven't been able to find one after much research.) But are the right CPU counters + kernel data structures + syscalls in place to be able to develop such a solution, and to have it be efficient/low-overhead?
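For concreteness, a rough sketch of what such a watchdog could look like on cgroup v2, polling the kernel's own counters (cpu.stat, memory.current, io.stat) and using cgroup.kill (Linux 5.14+) for the hard kill. The cgroup path, credit weights, poll interval, and budget below are placeholders, not a real pricing formula:

#!/usr/bin/env python3
"""Rough sketch of a cgroup-v2 "gas meter": poll a worker's cgroup counters
every few milliseconds and hard-kill the group once a weighted credit budget
is spent. Paths, weights, and the budget are illustrative placeholders."""
import time
from pathlib import Path

CG = Path("/sys/fs/cgroup/workers/job-1234")   # hypothetical per-job cgroup

# Placeholder credit weights
CREDITS_PER_CPU_SEC = 1.0
CREDITS_PER_GB_SEC  = 0.1
CREDITS_PER_KIO     = 0.5

def read_kv(path):
    """Parse flat 'key value' files such as cpu.stat."""
    return {k: int(v) for k, v in
            (line.split() for line in path.read_text().splitlines())}

def io_ops(path):
    """Sum rios+wios across devices from io.stat lines like '8:0 rbytes=... rios=...'."""
    total = 0
    for line in path.read_text().splitlines():
        for field in line.split()[1:]:
            key, val = field.split("=")
            if key in ("rios", "wios"):
                total += int(val)
    return total

def spent_credits(mem_gb_seconds):
    cpu_sec = read_kv(CG / "cpu.stat")["usage_usec"] / 1e6
    ios = io_ops(CG / "io.stat")
    return (cpu_sec * CREDITS_PER_CPU_SEC
            + mem_gb_seconds * CREDITS_PER_GB_SEC
            + ios / 1000 * CREDITS_PER_KIO)

def watch(budget, poll_ms=5):
    mem_gb_seconds = 0.0
    while True:
        # crude memory-GB-seconds integration, one sample per poll interval
        mem_gb_seconds += (int((CG / "memory.current").read_text()) / 2**30) * (poll_ms / 1000)
        if spent_credits(mem_gb_seconds) >= budget:
            (CG / "cgroup.kill").write_text("1")   # kills every process in the group (Linux 5.14+)
            return
        time.sleep(poll_ms / 1000)

if __name__ == "__main__":
    watch(budget=500.0)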
I have written a program that does some calculations. It can be run on any number of cores.
Below is a breakdown of how many calculations are performed when it's run on 1, 2, 3, or 4 cores (on a laptop with 4 logical processors). The numbers in parentheses show calculations per thread/core. My question is: why does per-thread performance decrease so rapidly as the number of threads/cores increases? I don't expect the total to double with each core, but it's significantly lower than that. I also observe the same issue when running 4 instances of the same program, each set up to run on one thread, so I know it's not an issue with the program itself.
The greatest improvement is going from 1 thread to 2; why is that?
# Threads | calculations/sec
1 | 87000
2 | 129000 (65000,64000)
3 | 135000 (46000,45000,44000)
4 | 140000 (34000,34000,34000,32000)
One interesting thing is that I can see exactly the same issue on Google Cloud Platform Compute Engine: when I run the program with 16 threads on a 16-virtual-core machine, the performance of each core drops down to about 8K states per second. However, if I run 16 instances of the same program, each instance does around 100K states/s. When I do the same test with 4 cores on my home laptop, I still see the same drop in performance whether I run 4 separate .exe instances or one instance with 4 threads. I was expecting to see the same behavior on GCP, but that's not the case. Is it because of virtual cores? Do they behave differently?
Here is example code that reproduces the issue; you just need to paste it into your console application, but you need to refresh the stats about 20 times for the performance to stabilize (not sure why it fluctuates so much). From the code example you can see that if you run the app with 4 threads you get a significant performance impact compared to 4 instances with 1 thread each. I was hoping that enabling gcServer would solve the problem, but I did not see any improvement.
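The actual program is C# and isn't reproduced here, but a minimal stand-in harness along these lines shows the same per-worker measurement pattern; Python, multiprocessing, and the toy arithmetic loop are purely illustrative substitutes (it mimics the "N instances with 1 thread each" case):

# Stand-in harness (not the original C# program): measure per-worker throughput
# of a CPU-bound loop as the number of worker processes grows. On a laptop with
# 4 logical processors but only 2 physical cores, per-worker rates drop in the
# same way as in the table above once workers start sharing physical cores.
import multiprocessing as mp
import time

def bench(seconds, out):
    """Spin on a small calculation and report iterations per second."""
    deadline = time.perf_counter() + seconds
    x, n = 0.0, 0
    while time.perf_counter() < deadline:
        for _ in range(10_000):                  # do a chunk of work between clock reads
            x = (x * 1.0000001 + 1.0) % 1000.0
        n += 10_000
    out.put(n / seconds)

if __name__ == "__main__":
    for workers in (1, 2, 3, 4):
        q = mp.Queue()
        procs = [mp.Process(target=bench, args=(5.0, q)) for _ in range(workers)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        rates = [q.get() for _ in range(workers)]
        print(f"{workers} worker(s): total {sum(rates):,.0f}/s, "
              f"per-worker {[f'{r:,.0f}' for r in rates]}")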
I have a number of different deployment situations for ArangoDB, one of which is on a user's desktop machine or laptop.
I've read and implemented the instructions on how to run ArangoDB in spartan mode (very helpful).
However, I need more. The desktop user may work with a number of different collections in the database, and all of these stay loaded and consume a lot of virtual memory. This can cause some apps to behave differently if they detect they are running in a memory-constrained environment.
So, I'm looking for a way to unload collections that haven't been accessed "recently" (for some configurable amount of time).
Is there a (good) way to go about doing this?
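For illustration, one possible shape of a solution: a small external sweeper that periodically asks the server which collections are loaded and unloads the idle ones. This is only a sketch; it assumes ArangoDB 3.4's HTTP API (PUT /_api/collection/{name}/unload) and assumes the application records its own per-collection last-access times, since the server doesn't expose them:

# Sketch of an external "unload sweeper" for ArangoDB 3.4.
import time
import requests

ARANGO = "http://127.0.0.1:8529"
AUTH = ("root", "")                      # adjust credentials
DB = "_system"
IDLE_SECONDS = 15 * 60                   # configurable "recently used" window

def loaded_collections():
    r = requests.get(f"{ARANGO}/_db/{DB}/_api/collection",
                     params={"excludeSystem": "true"}, auth=AUTH)
    r.raise_for_status()
    # status 3 == loaded, 2 == unloaded
    return [c["name"] for c in r.json()["result"] if c["status"] == 3]

def unload(name):
    r = requests.put(f"{ARANGO}/_db/{DB}/_api/collection/{name}/unload", auth=AUTH)
    r.raise_for_status()

def sweep(last_access):
    """last_access: dict of collection name -> unix timestamp of last use,
    maintained by the application itself."""
    now = time.time()
    for name in loaded_collections():
        if now - last_access.get(name, 0) > IDLE_SECONDS:
            unload(name)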
For version 3.4, I added the following params to arangod.conf to start it in so-called spartan mode.
More details can be found in their blog post.
[javascript]
# number of V8 contexts available for JavaScript execution. use 0 to
# make arangod determine the number of contexts automatically.
v8-contexts = 1
[foxx]
# disable Foxx queues in the server
# (no background task scheduling - reduces CPU usage)
queues = false
[wal]
# Reduce the number of historic WAL files which will reduce the memory usage when ArangoDB is in use.
historic-logfiles = 1
# Reduce the prepared WAL log files which are kept ready for future write operations
reserve-logfiles = 1
# In addition you can reduce the size of all WAL files to e.g. 8 MB by setting
logfile-size = 8388608
I want to write a script that monitors one (selected) component of the operating system, from the following:
Process Management
Management of main memory
Virtual memory management
Management of input / output
Network Management
and I came up with this idea of how to display it:
process | main memory | virtual memory | input/output | network
% usage | % usage | % usage | ??? | data sent/received
I don't know how to show the % usage of CPU, main memory, and virtual memory.
Also, I don't know what to measure for input/output.
SystemTap is better suited to monitoring events rather than levels like some of those. Consider using something like PCP (http://oss.sgi.com/projects/pcp) for simple periodic level monitoring.
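If a quick script is enough before reaching for PCP, something like Python's psutil (a suggestion here, not something the question requires) can read those levels directly; a minimal periodic sampler for the table above:

# Minimal periodic sampler using psutil (pip install psutil).
import time
import psutil

INTERVAL = 5  # seconds between samples

psutil.cpu_percent(interval=None)        # prime the CPU counter
prev_disk = psutil.disk_io_counters()
prev_net = psutil.net_io_counters()

while True:
    time.sleep(INTERVAL)
    cpu  = psutil.cpu_percent(interval=None)   # % CPU since the previous call
    mem  = psutil.virtual_memory().percent     # % main memory in use
    swap = psutil.swap_memory().percent        # % swap ("virtual memory") in use
    disk = psutil.disk_io_counters()
    net  = psutil.net_io_counters()
    disk_rate = ((disk.read_bytes + disk.write_bytes)
                 - (prev_disk.read_bytes + prev_disk.write_bytes)) / INTERVAL
    net_rate  = ((net.bytes_sent + net.bytes_recv)
                 - (prev_net.bytes_sent + prev_net.bytes_recv)) / INTERVAL
    prev_disk, prev_net = disk, net
    print("cpu %5.1f%%  mem %5.1f%%  swap %5.1f%%  disk %8.1f KiB/s  net %8.1f KiB/s"
          % (cpu, mem, swap, disk_rate / 1024, net_rate / 1024))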
Windows Azure advertises three types of IO performance levels:
Extra Small : Low
Small: Moderate
Medium and above: High
So, if I have an IO bound application (rather than CPU or Memory bound) and need at least 6 CPUs to process my work load - will I get better IO performance with 12-15 Extra Smalls, 6 Smalls, or 3 Mediums?
I'm sure this varies by application - is there an easy way to go about testing this? Are there any numbers that give a better picture of how much of an IO performance increase you get as you move to larger instance roles?
It seems like the IO performance for smaller roles could be equivalent to that of the larger ones; they are just the ones that get throttled down first if the overall load becomes too great. Does that sound right?
Windows Azure compute sizes offer approx. 100 Mbps per core. Extra Small instances are much lower, at 5 Mbps. See this blog post for more details. If you're IO-bound, the 6-Small setup is going to offer far greater bandwidth than 12 Extra-Smalls (roughly 6 x 100 Mbps = 600 Mbps vs. 12 x 5 Mbps = 60 Mbps).
When you talk about processing your workload, are you working off a queue? If so, multiple worker roles, each being a Small instance, could then each work with a 100 Mbps pipe. You'd have to do some benchmarking to determine whether 3 Mediums give you enough of a performance boost to justify the larger VM size, knowing that when the workload is down, your "idle" cost footprint per hour is now 2 cores (Medium, $0.24) vs. 1 (Small, $0.12).
As I understand it, the amount of IO allowed per core is constant and supposed to be dedicated. But I haven't been able to get formal confirmation of this. This likely differs for Extra Small instances, which operate in a shared mode rather than dedicated like the other Windows Azure VM instances.
I'd imagine what you suspect is in fact true: even how IO-bound you are varies by application. I think you could accomplish your goal of timing by using timers and writing the output to a file on storage that you could then retrieve. Do some math to figure out whether you can process X work units per hour by cramming as many as possible through a Small and then a Medium instance. If your work-unit size fluctuates drastically, you might have to do some averaging too. I would always prefer smaller instances if possible and just spin up more copies as you need more firepower.
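To make the "do some math" step concrete, a toy version of the calculation; the units-per-hour figures are hypothetical placeholders you would replace with your own benchmark results:

# Toy cost-per-work-unit comparison; prices are the figures quoted above,
# units/hour are hypothetical measurements from your own benchmark.
PRICE_PER_HOUR = {"small": 0.12, "medium": 0.24}
UNITS_PER_HOUR = {"small": 400, "medium": 900}   # hypothetical

for size in PRICE_PER_HOUR:
    cost_per_unit = PRICE_PER_HOUR[size] / UNITS_PER_HOUR[size]
    print("%-6s  %4d units/h  $%.5f per unit" % (size, UNITS_PER_HOUR[size], cost_per_unit))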