Get maxmemory of redis azure cache from client - azure

We are using Azure Redis Cache and need to monitor its state. One of the things we need is the maximum memory. Currently we enter this value manually, but we want to avoid that in the future. The standard command for this purpose, "config get maxmemory", is disabled in Azure. For completeness, we are using StackExchange.Redis as the client.
Any ideas on how to get this information? Also, why is even the "get" form of the command disabled?

There is currently no way to get the maxmemory setting. The "config" command is blocked for a few reasons. One is that setting certain config settings could impact the stability of our service. Another is that any changes to config would be lost if the server instance was restarted. We are looking into ways to enable "config get" but keep "config set" blocked.
Here are the current values for maxmemory for each size cache offering:
Name  Size     maxmemory (bytes)
C0    250 MB   285,000,000
C1    1 GB     1,100,000,000
C2    2.5 GB   2,600,000,000
C3    6 GB     6,100,000,000
C4    13 GB    13,100,000,000
C5    26 GB    26,200,000,000
C6    53 GB    53,300,000,000
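Since maxmemory can't be queried, one workaround is to keep the table above in your monitoring code, keyed by cache size, and compare it with the used_memory field from the INFO memory command, assuming INFO is available on your cache. A minimal sketch in Python; the function names are hypothetical, and the input is the raw text of an INFO memory reply, however your client fetches it:

```python
# maxmemory values in bytes per cache size, taken from the table above.
MAXMEMORY_BYTES = {
    "C0": 285_000_000,
    "C1": 1_100_000_000,
    "C2": 2_600_000_000,
    "C3": 6_100_000_000,
    "C4": 13_100_000_000,
    "C5": 26_200_000_000,
    "C6": 53_300_000_000,
}


def parse_used_memory(info_memory: str) -> int:
    """Return the used_memory field (bytes) from the raw text of an INFO memory reply."""
    for line in info_memory.splitlines():
        # The trailing colon keeps this from matching used_memory_human, used_memory_rss, etc.
        if line.startswith("used_memory:"):
            return int(line.split(":", 1)[1])
    raise ValueError("used_memory not found in INFO output")


def memory_usage_percent(sku: str, info_memory: str) -> float:
    """Used memory as a percentage of the maxmemory value for the given cache size."""
    return 100.0 * parse_used_memory(info_memory) / MAXMEMORY_BYTES[sku]
```

If you know the cache size (for example, from your deployment configuration), memory_usage_percent("C1", info_text) gives current usage as a percentage of maxmemory.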

Related

Bursts of Redis errors

We recently created a new Standard 1 GB Azure Redis cache specifically for distributed locking, separate from our main Redis cache. This was done to improve the stability of our main Redis cache, a long-standing issue that this change seems to have helped significantly.
On the new cache, we observe bursts of ~100 errors within the same few seconds every 1-3 days. The errors are either:
No connection is available to service this operation (StackExchange.Redis error)
Or:
Could not acquire distributed lock: Conflicted (RedLock.net error)
Because they come from different packages, I suspect the Redis cache itself is the problem. None of the stats during these periods look out of the ordinary, and the workload should fit comfortably in the Standard 1 GB size.
I'm guessing this could be caused by the "Low" network performance advertised for this tier. Is this likely the cause?
Your theory sounds plausible.
Checking for insufficient network bandwidth
Here is a handy table showing the maximum observed bandwidth for various pricing tiers. Take a look at the observed maximum bandwidth for your SKU, then head over to your Redis blade in the Azure Portal and choose Metrics. Set the aggregation to Max, and look at the sum of cache read and cache write. This is your total bandwidth consumed. Overlay the sum of these two against the time period when you're experiencing the errors, and see if the problem is network throughput. If that's the case, scale up.
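The comparison described above amounts to adding the two metrics per sample and checking the total against your SKU's observed maximum. A sketch, assuming you've exported the Max-aggregated Cache Read and Cache Write values as (timestamp, read, write) tuples in consistent units; the limit is a hypothetical parameter you'd take from the bandwidth table for your tier:

```python
from typing import Iterable, List, Tuple

Sample = Tuple[str, float, float]  # (timestamp, cache_read, cache_write)


def bandwidth_over_limit(samples: Iterable[Sample], limit: float) -> List[Tuple[str, float]]:
    """Return (timestamp, read + write) for every sample whose total exceeds `limit`.

    `limit` is a hypothetical value: the observed maximum bandwidth for your SKU,
    expressed in the same units as the exported metrics.
    """
    over = []
    for timestamp, cache_read, cache_write in samples:
        total = cache_read + cache_write
        if total > limit:
            over.append((timestamp, total))
    return over
```

If the flagged timestamps line up with your error bursts, network throughput is a likely culprit.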
Checking server load
Also on the Metrics tab, take a look at Server Load. This is the percentage of time that Redis is busy and unable to process new requests. If you hit 100%, Redis cannot respond to new requests and you will experience timeouts. If that's the case, scale up.
Reusing ConnectionMultiplexer
You can also run out of connections to a Redis server if you're creating a new instance of StackExchange.Redis.ConnectionMultiplexer per request. The connection limits for each SKU are listed here on the pricing page. To check whether you're exceeding the maximum allowed connections for your SKU, go to the Metrics tab, select Max aggregation, and choose Connected Clients as your metric.
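The usual fix in C# is to create the ConnectionMultiplexer once (commonly via Lazy&lt;ConnectionMultiplexer&gt;) and share it for the lifetime of the application. The same create-once, reuse-everywhere pattern is sketched below in Python; LazyShared is a hypothetical name, and the factory you pass in stands in for your actual connect call:

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyShared(Generic[T]):
    """Creates a value on first use and returns that same instance afterwards."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._value: Optional[T] = None
        self._created = False
        self._lock = threading.Lock()

    def get(self) -> T:
        if not self._created:
            # Double-checked locking: only one thread ever runs the factory.
            with self._lock:
                if not self._created:
                    self._value = self._factory()
                    self._created = True
        return self._value  # type: ignore[return-value]
```

For example, create shared = LazyShared(connect_to_cache) once at startup, where connect_to_cache is your own (hypothetical) function that opens the connection, and call shared.get() in every request instead of connecting again.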
Thread Exhaustion
This doesn't sound like your error, but I'll include it for completeness in this Rogue's Gallery of Redis issues; it comes into play with Azure Web Apps. By default, the thread pool's minimum equals the number of processors (for example, 4 on a four-core instance), and that many threads can be allocated to work immediately. When you need more than that, additional threads are doled out at a rate of about one thread per 500 ms. So if you dump a ton of requests on a Web App in a short period of time, work can queue up and requests can eventually be dropped before they ever reach Redis. To test whether this is a problem, go to Metrics for your Web App, choose Threads, and set the aggregation to Max. If you see a huge spike in a short period of time that corresponds with your trouble, you've found a culprit. Resolutions include making proper use of async/await. If that gets you no further, call ThreadPool.SetMinThreads with a higher value, preferably one close to or above the maximum thread usage you see during your bursts.
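To see why bursts hurt, here is a simplified model of the injection rate described above: threads up to the minimum are available immediately, and each additional thread arrives roughly 500 ms after the previous one. This is an approximation for intuition only; the real .NET thread pool heuristics are more complex.

```python
def injection_delay_ms(threads_needed: int, min_threads: int, interval_ms: int = 500) -> int:
    """Approximate delay before `threads_needed` threads are available.

    Simplified model: up to `min_threads` threads are handed out immediately,
    then one extra thread is added every `interval_ms` milliseconds.
    """
    extra_threads = max(0, threads_needed - min_threads)
    return extra_threads * interval_ms


# A burst needing 20 threads with a minimum of 4 waits about 8 seconds;
# raising the minimum to 32 (e.g. via ThreadPool.SetMinThreads) removes the wait.
print(injection_delay_ms(20, 4))   # 8000
print(injection_delay_ms(20, 32))  # 0
```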
Rob has some great suggestions, but I wanted to add information on troubleshooting traffic bursts and poor ThreadPool settings. Please see: Troubleshoot Azure Cache for Redis client-side issues
Bursts of traffic combined with poor ThreadPool settings can result in delays in processing data already sent by the Redis Server but not yet consumed on the client side.
Monitor how your ThreadPool statistics change over time using an example ThreadPoolLogger. You can also use TimeoutException messages from StackExchange.Redis, like the one below, to investigate further:
System.TimeoutException: Timeout performing EVAL, inst: 8, mgr: Inactive, queue: 0, qu: 0, qs: 0, qc: 0, wr: 0, wq: 0, in: 64221, ar: 0,
IOCP: (Busy=6,Free=999,Min=2,Max=1000), WORKER: (Busy=7,Free=8184,Min=2,Max=8191)
Notice that in the IOCP section and the WORKER section you have a Busy value that is greater than the Min value. This difference means your ThreadPool settings need adjusting.
You can also see in: 64221. This value indicates that 64,221 bytes have been received at the client's kernel socket layer but haven't been read by the application. This typically means that your application (for example, StackExchange.Redis) isn't reading data from the network as quickly as the server is sending it to you.
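If you collect many of these messages, it can help to extract the relevant fields automatically. A sketch that parses the IOCP/WORKER Busy and Min values and the in: byte count from a message shaped like the one above; the exact message format varies between StackExchange.Redis versions, so treat the regular expressions as assumptions:

```python
import re
from typing import Dict, Optional

# Matches e.g. "IOCP: (Busy=6,Free=999,Min=2,Max=1000)"; format assumed from the sample message.
POOL_RE = re.compile(r"(IOCP|WORKER): \(Busy=(\d+),Free=(\d+),Min=(\d+),Max=(\d+)\)")
# Matches the "in: <bytes>" field (bytes received by the socket but not yet read).
IN_RE = re.compile(r"\bin: (\d+)")


def analyze_timeout(message: str) -> Dict[str, object]:
    """Extract thread pool and unread-byte details from a StackExchange.Redis timeout message."""
    result: Dict[str, object] = {}
    for pool, busy, _free, min_threads, _max in POOL_RE.findall(message):
        result[pool] = {
            "busy": int(busy),
            "min": int(min_threads),
            # Busy > Min suggests the ThreadPool minimum is too low for the load.
            "busy_exceeds_min": int(busy) > int(min_threads),
        }
    match = IN_RE.search(message)
    unread: Optional[int] = int(match.group(1)) if match else None
    result["unread_bytes"] = unread
    return result
```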
You can configure your ThreadPool settings to make sure that your thread pool scales up quickly under burst scenarios.
I hope you find this additional information helpful.

How is the cost for Azure Function proxy calculated?

We have lots of images in Azure Blob Storage (LRS Hot). We estimate around 15 million downloads per month, for a total of 5000 GB egress (files are 350 kB on average). I can calculate the price for Blob Storage, but the cost of the Function proxy is unknown. The Azure Functions pricing document doesn't say anything about proxy functions, and specifically nothing about bandwidth.
Question 1: Are these calculations correct?
Execution count price is €0,169 per million executions, which comes to 15 * €0,169 = €2,54/month.
GB-s price is €0,000014/GB-s, and memory usage is rounded up to the nearest 128 MB. If the file download time is 0,2 s and memory is 128 MB, we have 0,2 * (128/1024) * 15000000 * €0,000014 = €5,25/month.
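The two calculations above can be written out as follows, as a check on the arithmetic. This assumes memory is billed in 128 MB increments rounded up, and it ignores any free monthly grant; the prices are the ones quoted in the question:

```python
import math


def execution_cost(executions: int, price_per_million: float) -> float:
    """Cost of the execution count, priced per million executions."""
    return executions / 1_000_000 * price_per_million


def gb_seconds_cost(executions: int, duration_s: float, memory_mb: int, price_per_gb_s: float) -> float:
    """Cost of resource consumption in GB-s.

    Assumes memory is billed in 128 MB steps, rounded up; ignores any free grant.
    """
    billed_mb = math.ceil(memory_mb / 128) * 128
    gb_seconds = executions * duration_s * billed_mb / 1024
    return gb_seconds * price_per_gb_s


downloads = 15_000_000
print(f"{execution_cost(downloads, 0.169):.3f}")                # 2.535
print(f"{gb_seconds_cost(downloads, 0.2, 128, 0.000014):.3f}")  # 5.250
```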
Question 2: What about bandwidth? Is there any cost for that?
Q1: Mostly yes.
Azure Functions Proxies (Preview) work just like regular functions, meaning that any routing done by your proxy counts as one execution. Also, just like standard functions, a proxy consumes GB-s while it's running. Your calculation approach is correct, with the caveat that reading from blob storage is actually a streaming activity, which consumes a fixed amount of memory multiplied by the time it takes each file to download.
Q2: This works the same way as Azure App Service. From the pricing page:
165 MB outbound network traffic included, additional outbound network bandwidth charged separately.

Load testing bottleneck on nodejs with Google Compute Engine

I cannot figure out what is causing the bottleneck on this site: response times get very bad once about 400 users are reached. The site is on Google Compute Engine, using an instance group with network load balancing. We created the project with Sails.js.
I have been doing load testing from Google Container Engine using Kubernetes, running the locust.py script.
The main results for one of the tests are:
RPS: 30
Spawn rate: 5 users/s
Total users: 1000
Average response time: 27,500 ms (27.5 seconds)
The response time initially is great, below one second, but when it starts reaching about 400 users the response time starts to jump massively.
I have checked the obvious factors that can influence response time; results below:
Compute Engine instances (2 x standard-n2, 200 GB disk, 7.5 GB RAM per instance):
CPU utilization: only about 20%
Outgoing network bytes: 340k bytes/sec
Incoming network bytes: 190k bytes/sec
Disk operations: 1 op/sec
Memory: below 10%
MySQL:
Max_used_connections: 41 (below the configured maximum)
Connection errors: 0
All other MySQL metrics also look fine, with nothing that should cause a bottleneck.
I ran the same test against a newly created Sails.js project, and it did better, but the results were still terrible: 5-second response times at about 2000 users.
What else should I test? What could be the bottleneck?
Are you doing any file reading/writing? This is a major obstacle in Node.js (especially synchronous file I/O, which blocks the event loop), and will always cause some issues. Cache the contents of files you read, or remove the need for such code, as much as possible. In my own experience, serving files like images, CSS, and JS through my Node server started causing trouble as the number of concurrent requests increased. The solution was to serve all of this through a CDN.
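Caching file contents in memory means the disk is read once per file instead of once per request. A language-neutral sketch of the idea, written in Python (in Node.js the equivalent would wrap your fs reads); FileCache is a hypothetical name, and it assumes the files are static, since nothing is ever invalidated:

```python
from pathlib import Path
from typing import Dict


class FileCache:
    """Reads each file from disk once and serves later requests from memory.

    Hypothetical helper; assumes files never change while the process runs,
    because cached entries are never invalidated.
    """

    def __init__(self, root: Path) -> None:
        self._root = root
        self._cache: Dict[str, bytes] = {}

    def read(self, name: str) -> bytes:
        data = self._cache.get(name)
        if data is None:
            # Only the first request for a file touches the disk.
            data = (self._root / name).read_bytes()
            self._cache[name] = data
        return data
```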
Another problem could be the MySQL driver. We had problems with connections not being closed correctly (not with Sails.js, but I think it used the same driver at the time I encountered this). This caused problems on the MySQL server, resulting in long delays when fetching data from the database. You should time and track your MySQL queries and make sure they aren't delayed.
Lastly, it could be a specific issue with Sails.js or Google Compute Engine. Make sure there aren't any open issues for either of them describing the same problem you are experiencing.
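One simple way to time and track queries is to wrap every database call in a timer and flag the slow ones. A sketch of such a wrapper in Python; QueryTimer and the threshold are hypothetical, and in a Node.js app you'd wrap the driver's query callback or promise in the same way:

```python
import time
from contextlib import contextmanager
from typing import Iterator, List, Tuple


class QueryTimer:
    """Records how long each tracked query takes and reports the slow ones."""

    def __init__(self, slow_threshold_s: float) -> None:
        self.slow_threshold_s = slow_threshold_s
        self.records: List[Tuple[str, float]] = []

    @contextmanager
    def track(self, label: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the query raised an exception.
            self.records.append((label, time.perf_counter() - start))

    def slow_queries(self) -> List[Tuple[str, float]]:
        return [(label, d) for label, d in self.records if d >= self.slow_threshold_s]
```

Usage: wrap each call as with timer.track("SELECT users"): followed by the query, then inspect timer.slow_queries() while the load test runs.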

OpenNMS threshold checks only one server

So I'm trying to configure OpenNMS to check the disk space on my Linux servers.
After some work, I got it to check one server through SNMP:
I installed snmpd on the server I'm monitoring, defined a threshold (in fact, I use the predefined default one), and connected it to an event that triggers when ns-dskPercent goes too high. Up to this point, everything went well.
Then I added a second server and installed the same packages on it. OpenNMS seems to monitor its SNMP daemon and notifies me when the service is down, but it doesn't seem to see the threshold.
When I change the threshold (for example, lowering it to 20% to force it to trigger), only the first server picks up the change (it also sends a notification that the configuration has changed) and fires the alarm; the second server doesn't respond.
(These are the notifications I get on the first server:)
High threshold rearmed for SNMP datasource ns-dskPercent on interface
xxx.xxx.xxx.xxx, parms: label="/" ds="ns-dskPercent" description="ns-dskPercent"
value="NaN (the threshold definition has been changed)" instance="1"
instanceLabel="_root_fs" resourceId="node[9].dskIndex[_root_fs]"
threshold="20.0" trigger="1" rearm="75.0" reason="Configuration has been changed"
High threshold exceeded for SNMP datasource ns-dskPercent on interface
xxx.xxx.xxx.xxx, parms: label="/" ds="ns-dskPercent" description="ns-dskPercent"
value="52" instance="1" instanceLabel="_root_fs"
resourceId="node[9].dskIndex[_root_fs]" threshold="20.0" trigger="1" rearm="75.0"
Any ideas why, or how I can make the second server respond as well?
The issue could be related to the source of the collected data. Thresholding in modern versions of OpenNMS (14+) is evaluated inline and in memory as data is collected, so you must make sure the threshold is evaluated against the exact metrics that the node you are interested in actually collects.
File system metrics on Linux systems usually come in one of two forms: the MIB-2 host resources table (hrStorageSize, etc. in $OPENNMS_HOME/etc/datacollection/mib2.xml) or net-snmp metrics from the net-snmp MIB (ns-dskTotal, etc. in $OPENNMS_HOME/etc/datacollection/netsnmp.xml).
So, first verify that you are getting good data from the new server and that it is, indeed, collecting metrics from the same MIB table that you seek to threshold against.

Azure DataCache MaxConnectionToServer

I am using the AppFabricCacheSessionStoreProvider and occasionally get this error:
ErrorCode:SubStatus:There is a temporary failure. Please retry later.
(The request failed, because you exceeded quota limits for this hour.
If you experience this often, upgrade your subscription to a higher
one). Additional Information : Throttling due to resource :
Connections.
I am using a basic 128 MB cache with a web role that has two instances. What is the default MaxConnectionToServer value if it is not set? I think firing up a staging deployment as well can cause this error (4 simultaneous instances). Will setting MaxConnectionToServer to a higher value make it better or worse? I believe the 128 MB cache has a limit of 5 connections, so should I set it to 1, which would mean only 4 connections could be used? The cache is not used elsewhere in the app.
The default for MaxConnectionToServer is 1, so you shouldn't have to change this setting, but setting it explicitly to 1 will keep anyone else reading your config from getting confused. If you set it to a higher value, you will see this problem more often.
The cache session provider seems to be a little slow at disposing of its connections to the cache when it no longer needs them. This means that if you're running a number of instances close to the limit for your cache size, you do seem to see this error. You're correct: a 128 MB cache only allows 5 concurrent connections. At the moment, the only way I'm aware of to avoid this problem is to buy the next cache size up.
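The connection budget from the question works out as instances × connections per instance, compared against the cache's limit. A sketch, assuming the limit of 5 for a 128 MB cache stated above; as noted, slow disposal of old connections means the real count can temporarily exceed this estimate:

```python
def connections_needed(instances: int, max_connections_per_instance: int = 1) -> int:
    """Estimated concurrent cache connections if every instance opens its maximum."""
    return instances * max_connections_per_instance


def fits_quota(instances: int, max_connections_per_instance: int = 1, quota: int = 5) -> bool:
    """True if the estimate stays within the cache's connection quota (5 for 128 MB)."""
    return connections_needed(instances, max_connections_per_instance) <= quota


# 2 production + 2 staging instances at the default of 1 connection each: 4 <= 5.
print(fits_quota(4))     # True
# Raising the per-instance setting to 2 would need 8 connections.
print(fits_quota(4, 2))  # False
```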
