I am trying to size my AKS clusters. What I understood is that the number of microservices and their replica counts are the primary parameters. The resource usage of each microservice, and the predicted growth of that usage over the coming years, also needs to be considered. But all of this information seems too scattered to arrive at a number for AKS sizing. By sizing I mean: how many nodes should be assigned, what the configuration of the nodes should be, how many pods to plan for, how many IP addresses to reserve based on the number of pods, and so on.
Is there any standard matrix or practical way of calculating AKS cluster sizing, based on anyone's experience?
No, I'm pretty sure there is none (and how could there be?). Just take your pods' CPU/memory usage and sum it up; that gives you an estimate of the resources needed to run your workloads. Add the Kubernetes system services on top of that.
Also, as Peter mentions in his comment, you can always scale your cluster, so detailed up-front planning seems a bit unreasonable.
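As a rough illustration of that arithmetic (all numbers below are made up, not a recommendation):
20 microservices x 3 replicas = 60 pods
each pod requests 250m CPU and 512 MiB memory
=> 60 x 0.25 = 15 vCPU and 60 x 0.5 GiB = 30 GiB for your workloads,
before adding system pods, headroom for rolling updates, and the per-node reservations described in the next answer.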
Actually, you may be more interested in the sizing of your nodes: things like memory, CPU, networking and disk are directly linked to the node size you choose. For example:
Not all memory and CPU in a node can be used to run Pods. The resources are partitioned in four:
Memory and CPU reserved for the operating system and system daemons such as SSH
Memory and CPU reserved for the kubelet and Kubernetes agents such as the CRI
Memory reserved for the hard eviction threshold
Memory and CPU available to Pods
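You can see this split on a live cluster: kubectl describe node reports both a Capacity and an Allocatable section, and the difference is what is held back for the OS, the kubelet and the eviction threshold (the node name below is a placeholder):
kubectl describe node <node-name>
Compare the Capacity and Allocatable blocks in the output.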
Memory and CPU available for Pods
Memory (GB) | % Available | CPU (cores) | % Available
________________________________________________
1           | 0.00%       | 1           | 84.00%
2           | 32.50%      | 2           | 90.00%
4           | 53.75%      | 4           | 94.00%
8           | 66.88%      | 8           | 96.50%
16          | 78.44%      | 16          | 97.75%
64          | 90.11%      | 32          | 98.38%
128         | 92.05%      | 64          | 98.69%
192         | 93.54%      |             |
256         | 94.65%      |             |
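As a worked example using the table above (the pod totals are made up): suppose your pods need 15 vCPU and 30 GiB in total. On a 4 vCPU / 16 GB node, roughly 94% of CPU and 78.44% of memory is allocatable, i.e. about 3.76 vCPU and 12.5 GB per node. That gives ceil(15 / 3.76) = 4 nodes by CPU and ceil(30 / 12.5) = 3 nodes by memory, so you would start with at least 4 nodes, plus headroom for node failures and upgrades.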
Other factors are disk and networking, for example:
Node Size       | Maximum Disks | Maximum Disk IOPS | Maximum Throughput (MBps)
_______________________________________________________________________________
Standard_DS2_v2 | 8             | 6,400             | 96
Standard_B2ms   | 4             | 1,920             | 22.5
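The per-size disk, IOPS and throughput limits come from the Azure VM size documentation, but you can at least list cores, memory and the maximum data disk count per size from the CLI (the region below is a placeholder):
az vm list-sizes --location eastus --output table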
I'm using dsbulk 1.6.0 to unload data from cassandra 3.11.3.
Each unload results in wildly different counts of rows. Here are the results from 3 invocations of unload on the same cluster, connecting to the same Cassandra host. The table being unloaded is only ever appended to; data is never deleted, so a decrease in unloaded rows should not occur. There are 3 Cassandra nodes in the cluster with a replication factor of 3, so all data should be present on the chosen host. Furthermore, these were executed in quick succession; the number of added rows would be in the hundreds (if there were any), not in the tens of thousands.
Run 1:
│ total | failed | rows/s | p50ms | p99ms | p999ms
│ 10,937 | 7 | 97 | 15,935.46 | 20,937.97 | 20,937.97
│ Operation UNLOAD_20201024-084213-097267 completed with 7 errors in 1 minute and 51 seconds.
Run 2:
│ total | failed | rows/s | p50ms | p99ms | p999ms
│ 60,558 | 3 | 266 | 12,551.34 | 21,609.05 | 21,609.05
│ Operation UNLOAD_20201025-084208-749105 completed with 3 errors in 3 minutes and 47 seconds.
Run 3:
│ total | failed | rows/s | p50ms | p99ms | p999ms
│ 45,404 | 4 | 211 | 16,664.92 | 30,870.08 | 30,870.08
│ Operation UNLOAD_20201026-084206-791305 completed with 4 errors in 3 minutes and 35 seconds.
It would appear that Run 1 is missing the majority of the data. Run 2 may be closer to complete and Run 3 is missing significant data.
I'm invoking unload as follows:
dsbulk unload -h $CASSANDRA_IP -k $KEYSPACE -t $CASSANDRA_TABLE > $DATA_FILE
I'm assuming this isn't expected behaviour for dsbulk. How do I configure it to reliably unload a complete table without errors?
Data could be missing from a host if that host wasn't reachable when the data was written, hints weren't replayed, and you don't run repairs periodically. And because DSBulk reads by default with consistency level LOCAL_ONE, different hosts will provide different views of the data (the host that you're providing is just a contact point; after that the cluster topology is discovered, and DSBulk selects replicas based on the load balancing policy).
You can force DSBulk to read the data at another consistency level by using the -cl command line option (doc). You can compare the results using LOCAL_QUORUM or ALL; at these levels Cassandra will also "fix" the inconsistencies as they are discovered, although this will be much slower and will add load to the nodes because of the writes of the repaired data.
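For example, building on the command from the question, something like this should perform a quorum read (LOCAL_QUORUM is just one choice; ALL is stricter and slower):
dsbulk unload -h $CASSANDRA_IP -k $KEYSPACE -t $CASSANDRA_TABLE -cl LOCAL_QUORUM > $DATA_FILE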
Can someone explain the output of the yarn top command? I mean, what exactly do those abbreviations (#CONT, #RCONT, VCORES, RVCORES, MEM, RMEM, VCORESECS, MEMSECS, %PROGR, TIME, NAME) mean, and how are the numbers calculated?
I had the same question, so I searched the code base and found TopCLI.java.
APPID | ApplicationId of the application, e.g. application_1614636765551_1548
USER | user e.g. hadoop
TYPE | application's Type, e.g. spark
QUEUE | to which the application was submitted
PRIORITY | Application's priority, e.g. 0=VERY_HIGH, see JobPriority.java
CONT | the number of used containers
RCONT | the number of reserved containers
VCORES | the used Resource - virtual cores
RVCORES | the reserved Resource - virtual cores
MEM | the used Resource - memory in GB
RMEM | the reserved Resource - memory in GB
VCORESECS | the aggregated number of vcores that the application has allocated times the number of seconds the application has been running.
MEMSECS | the aggregated amount of memory (in megabytes) the application has allocated times the number of seconds the application has been running.
PROGRESS | application's progress
TIME | application running time in "dd:HH:mm"
NAME | application name
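As a made-up worked example for the two aggregated columns (assuming the allocation stays constant for the whole run): an application holding 4 containers with 2 vcores and 8 GB each for 300 seconds shows VCORES = 8 and MEM = 32 GB, so VCORESECS = 8 x 300 = 2,400 and MEMSECS = 32,768 MB x 300 = 9,830,400.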
In our production cluster, the cluster write latency frequently spikes from 7 ms to 4 s. Because of this, clients face a lot of read and write timeouts. This repeats every few hours.
Observation:
Cluster Write latency (99th percentile) - 4 s
Local Write latency (99th percentile) - 10 ms
Read & Write consistency - LOCAL_ONE
Total nodes - 7
I tried enabling tracing with settraceprobability for a few minutes and observed that most of the time is spent in inter-node communication:
session_id | event_id | activity | source | source_elapsed | thread
--------------------------------------+--------------------------------------+-----------------------------------------------------------------------------------------------------------------------------+---------------+----------------+------------------------------------------
4267dca2-bb79-11e8-aeca-439c84a4762c | 429c3314-bb79-11e8-aeca-439c84a4762c | Parsing SELECT * FROM table1 WHERE uaid = '506a5f3b' AND messageid >= '01;' | cassandranode3 | 7 | SharedPool-Worker-47
4267dca2-bb79-11e8-aeca-439c84a4762c | 429c5a20-bb79-11e8-aeca-439c84a4762c | Preparing statement | Cassandranode3 | 47 | SharedPool-Worker-47
4267dca2-bb79-11e8-aeca-439c84a4762c | 429c5a21-bb79-11e8-aeca-439c84a4762c | reading data from /Cassandranode1 | Cassandranode3 | 121 | SharedPool-Worker-47
4267dca2-bb79-11e8-aeca-439c84a4762c | 42a38610-bb79-11e8-aeca-439c84a4762c | REQUEST_RESPONSE message received from /cassandranode1 | cassandranode3 | 40614 | MessagingService-Incoming-/Cassandranode1
4267dca2-bb79-11e8-aeca-439c84a4762c | 42a38611-bb79-11e8-aeca-439c84a4762c | Processing response from /Cassandranode1 | Cassandranode3 | 40626 | SharedPool-Worker-5
I tried checking the connectivity between Cassandra nodes but did not see any issues. Cassandra logs are flooded with Read timeout exceptions as this is a pretty busy cluster with 30k reads/sec and 10k writes/sec.
Warning in the system.log:
WARN [SharedPool-Worker-28] 2018-09-19 01:39:16,999 SliceQueryFilter.java:320 - Read 122 live and 266 tombstone cells in system.schema_columns for key: system (see tombstone_warn_threshold). 2147483593 columns were requested, slices=[-]
During the spike the cluster just stalls, and even simple commands like "use system_traces" fail.
cassandra#cqlsh:system_traces> select * from sessions ;
Warning: schema version mismatch detected, which might be caused by DOWN nodes; if this is not the case, check the schema versions of your nodes in system.local and system.peers.
Schema metadata was not refreshed. See log for details.
I validated the schema versions on all nodes and they are the same, but it looks like during the issue window Cassandra is not even able to read the metadata.
Has anyone faced similar issues? Any suggestions?
(Based on the data in your comments above.) The long full GC pauses can definitely cause this. Add -XX:+DisableExplicitGC: you are getting full GCs because of calls to System.gc(), which most likely come from a silly DGC/RMI thing that gets called at regular intervals regardless of whether it is needed. With the larger heap that is VERY expensive, and it is safe to disable.
Also check your GC log header and make sure the minimum heap size is not set. I would recommend setting -XX:G1ReservePercent=20.
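As a sketch of where those flags go (which file depends on your Cassandra version, so treat this as an assumption to verify against your install):
# conf/jvm.options on Cassandra 3.0+, or appended to JVM_OPTS in conf/cassandra-env.sh on 2.x
-XX:+DisableExplicitGC
-XX:G1ReservePercent=20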
My setup is PostgreSQL-BDR on 4 servers with the same configuration.
After network problems (e.g. the connection is lost for a few minutes), some of the nodes start replicating again within seconds, but other nodes start replicating only after 2 hours.
I couldn't find any configuration setting that controls this replication timing.
I see the following when monitoring the replication slots:
slot_name | database | active | retained_bytes
bdr_16385_6255603470654648304_1_16385__ | mvcn | t | 56
bdr_16385_6255603530602290326_1_16385__ | mvcn | f | 17640
bdr_16385_6255603501002479656_1_16385__ | mvcn | f | 17640
Any idea why this is happening?
The problem was that the default tcp_keepalive_time is 7200 seconds, which is exactly 2 hours, so changing the value of /proc/sys/net/ipv4/tcp_keepalive_time solved the problem.
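For example (the 600-second value below is just an illustration; pick a timeout that suits your network):
sysctl -w net.ipv4.tcp_keepalive_time=600
To persist it across reboots, add net.ipv4.tcp_keepalive_time = 600 to /etc/sysctl.conf (or a file under /etc/sysctl.d/).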
In a resource allocation problem I have n bucket sizes and m resources. Resources should be allocated to buckets in such a way that utilization is maximized. I need to write the algorithm in Node.js.
Here's the problem: let's say I have 2 buckets of sizes 50 and 60 respectively, and the resource sizes are 20, 25 and 40. Below is a more concrete representation with possible solutions:
Solution 1:
| Bucket Size | Resource(s) allocated | Utilization |
| 50 | 20, 25 | 45/50 = 0.9 |
| 60 | 40 | 40/60 = 0.667 |
Total Utilization in this case is >1.5
Solution 2:
| Bucket Size | Resource(s) allocated | Utilization |
| 50 | 25 | 25/50 = 0.5 |
| 60 | 20, 40 | 60/60 = 1.0 |
Total Utilization in this case is 1.5
Inference:
-- A knapsack approach will return Solution 2, because it optimizes starting from the larger bucket size.
-- A brute-force approach will return both solutions. One concern I have with this approach: given that I have to use Node.js, which is single-threaded, I am a little skeptical about performance when n (buckets) and m (resources) become very large.
Would brute force do just fine, or is there a better way/algorithm with which I can solve this problem? Also, is the concern I've cited above valid in any sense?
The knapsack problem (and this is a knapsack problem) is NP-complete, which means you can find an exact solution only by brute force, or with algorithms whose worst-case complexity is the same as brute force but which can do better in the average case...
"it is single threaded, i am little skeptic about performance when n (buckets) and m (resources) will be very large."
I am not sure if you know how this works. If you do not create child threads and manage them yourself (which is not that easy), every standard language runs in one thread, and therefore on one processor. And if you want more processors that badly, you can create child processes/workers even in Node.js.
Also, in complexity terms it does not matter if the solution takes a constant multiple longer, as long as the multiple is constant. In your case, I suppose the multiple would be 4 if you have a quad-core machine.
There are two good approaches:
1) Backtracking - basically an advanced brute-force mechanism, which can in some cases return a solution much faster.
2) Dynamic programming - if your items have relatively small (integer) sizes, then while classic brute force is not able to find a solution for 200 items within the expected lifetime of the universe itself, the dynamic approach can give you a solution in (milli)seconds.
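Here is a minimal sketch of the dynamic-programming idea in TypeScript (runs on Node.js, e.g. via ts-node). It is my own illustration, not a reference implementation: a 0/1 knapsack DP is applied one bucket at a time, largest bucket first, which reproduces Solution 2 from the question; the function names, the bucket ordering heuristic and the sample data are all assumptions.

// Classic 0/1 knapsack over integer sizes: returns the indices of the
// resources that maximize total allocated size without exceeding `capacity`.
function knapsack(sizes: number[], capacity: number): number[] {
  const n = sizes.length;
  // dp[i][c] = best total size using the first i items with capacity c
  const dp: number[][] = Array.from({ length: n + 1 }, () =>
    new Array<number>(capacity + 1).fill(0)
  );
  for (let i = 1; i <= n; i++) {
    for (let c = 0; c <= capacity; c++) {
      dp[i][c] = dp[i - 1][c]; // option: skip item i-1
      if (sizes[i - 1] <= c) {
        dp[i][c] = Math.max(dp[i][c], dp[i - 1][c - sizes[i - 1]] + sizes[i - 1]);
      }
    }
  }
  // Trace back which items were taken.
  const chosen: number[] = [];
  let c = capacity;
  for (let i = n; i >= 1; i--) {
    if (dp[i][c] !== dp[i - 1][c]) {
      chosen.push(i - 1);
      c -= sizes[i - 1];
    }
  }
  return chosen;
}

// Heuristic: fill the largest bucket first, removing the chosen resources
// from the pool. This is not guaranteed to be globally optimal across buckets.
function allocate(bucketSizes: number[], resourceSizes: number[]) {
  const pool = [...resourceSizes];
  return [...bucketSizes]
    .sort((a, b) => b - a)
    .map(bucket => {
      const chosen = knapsack(pool, bucket);
      const picked = chosen.map(i => pool[i]);
      // Remove picked resources (indices descending so splices stay valid).
      [...chosen].sort((a, b) => b - a).forEach(i => pool.splice(i, 1));
      const used = picked.reduce((sum, v) => sum + v, 0);
      return { bucket, resources: picked, utilization: used / bucket };
    });
}

// Example from the question: buckets 50 and 60, resources 20, 25, 40.
console.log(allocate([50, 60], [20, 25, 40]));

Note that this per-bucket DP only optimizes each bucket in isolation; if you need the true global optimum across all buckets you have to search over assignments of resources to buckets (brute force or backtracking), which is where the exponential cost comes from.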