I'm using a Redis memory store on dotCloud, but despite expiring keys its used_memory never drops back down again. Running flushdb or flushall from redis-cli doesn't cause used_memory to drop from its ~20MB. I've had the same problem on RedisToGo.
Does anyone know how I am managing to fill it up, and how I can avoid doing this? Perhaps there are certain characters you shouldn't put into Redis keys or values? I'm using it with EM and Resque from a Heroku Rails app.
Redis also reports a mem_fragmentation_ratio (e.g., 2.5), so using both values together gives a more accurate picture. At very low used_memory levels (near zero) the fragmentation ratio can be quite high, and the only way to reclaim that fragmented memory is to stop and restart the Redis instance.
RedisToGo may be reporting real memory usage in this manner, as a combination of used_memory × mem_fragmentation_ratio.
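You can see both numbers together by running INFO from redis-cli. The output below is illustrative, not from your instance (the exact fields vary by Redis version):
redis-cli INFO
...
used_memory:20971520
used_memory_rss:52428800
mem_fragmentation_ratio:2.50
used_memory is what Redis has allocated for your data; used_memory_rss is what the OS sees the process holding, and their ratio is mem_fragmentation_ratio (here 52428800 / 20971520 = 2.5). After a flushdb, used_memory drops but the RSS often does not, because the allocator rarely returns freed pages to the OS. So if a provider's dashboard refuses to drop after a flush, it is probably showing the RSS figure.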
Is there an upper limit to the suggested size of the value stored for a particular key in Redis?
Is 100KB too large?
There are two things that you need to take into consideration when deciding if something is "too big".
Does Redis have support for the size of key/value object that you want to store?
The answer to this question is documented pretty well on the Redis site (https://redis.io/topics/data-types), so I won't go into detail here.
For a given key/value size, what are the consequences I need to be aware of?
This is a much more nuanced answer as it depends heavily on how you are using Redis and what behaviors are acceptable to your application and which ones are not.
For instance, larger key/value sizes can lead to fragmentation of the memory space within your server. If you aren't using all the memory in your Redis server anyway, then this may not be a big deal to you. However, if you need to squeeze all the memory you can out of your Redis server, then you are reducing the efficiency of how memory is allocated and losing access to memory you would otherwise have.
As another example, when you are reading these large key/value entries from Redis, it means you have to transfer more data over the network from the server to the client. Some consequences of this are:
It takes more time to transfer the data, so your client may need to have a higher timeout value configured to allow for this additional transfer time.
Requests made to the server on the same TCP connection can get stuck behind the big transfer and cause other requests to timeout. See here for an example scenario.
Your network buffers used to transfer this data can impact available memory on the client or server, which can aggravate the available memory issues already described around fragmentation.
If these large key/value items are accessed frequently, the impacts described above are magnified, because you are transferring this data over and over again.
So, the answer is not a crisp "yes" or "no", but some things that you should consider and possibly test for your expected workload. In general, I do advise our customers to try to stay as small as possible and I have often said to try to stay below 100kb, but I have also seen plenty of customers use Redis with larger values (in the MB range). Sometimes those larger values are no big deal. In other cases, it may not be an issue until months or years later when their application changes in load or behavior.
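If you want to audit what you already have in an instance, redis-cli can help. A quick sketch (the key name here is hypothetical, and the --bigkeys scan is only available in newer redis-cli builds):
redis-cli --bigkeys
redis-cli STRLEN user:1234:profile
--bigkeys samples the keyspace and reports the largest key it finds for each type, and STRLEN returns the byte length of a single string value, so you can check whether anything has drifted past whatever size budget you settle on.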
Is there an upper limit to the suggested size of the value stored for a particular key in Redis?
According to the official docs, the maximum size of a Redis key is 512 MB, and a String value has the same 512 MB limit.
Is 100KB too large?
It depends on the application and usage; for a general-purpose application it should be fine.
I have a (Node.js 4.3) Lambda function that I have tested with several memory limit settings (128, 256, 512).
As I raise the memory limit, the execution time decreases as expected. However, the max memory used also goes down. Every time I reduce the memory limit, the execution time and max memory used go back up.
Any thoughts? I'm trying to figure out how to hit the execution time I need while not overpaying.
This is the Node VM utilizing memory: if it's available, it's going to use it (and you could probably reproduce this scenario locally with VMs or Docker). I wouldn't worry too much, and I wouldn't recommend trying to instruct the VM what to do, or even thinking about garbage collection (which is not easy to control with Node.js anyway). Node is simply very opportunistic about memory. I would select the amount you need in order to get a reasonable response time and leave it at that.
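If you end up tuning this empirically, the memory size can be adjusted from the AWS CLI as well as the console; a minimal sketch, with a hypothetical function name:
aws lambda update-function-configuration --function-name my-function --memory-size 512
Keep in mind that Lambda allocates CPU proportionally to the memory setting (which is why execution time falls as you raise it) and bills per GB-second, so the cheapest configuration is usually the smallest memory size at which your execution time stops improving meaningfully.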
I would also imagine that with a lower memory setting your speed will hold up on "warm" runs, while "cold" runs will take longer. So in production, it may not be as big of a concern.
You may wish to start profiling your code and trying to optimize it...But again, Node doesn't really want the developer worrying about the resources on a machine. It tries to optimize for you. It's a bit unfortunate when you're billed that way though. This is part of why I wish Go was natively supported with Lambda.
I am new to DataStax Cassandra. While going through the Cassandra installation procedure, I saw that it is recommended to turn off the OS swap area. Can anyone provide the reason for that? Will it affect any OS-level operations?
In production, if your database is using swap you will have very bad performance. In a ring of Cassandra nodes, you are better off having one node go completely down than allowing it to limp along in swap.
The easiest way to ensure that you never go into swap is to simply disable it.
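For example, on Linux (matching the DataStax recommendation; make sure nothing else on the box actually depends on swap first):
sudo swapoff --all
Then remove or comment out all swap entries in /etc/fstab so it stays off after a reboot. You can verify with free -m or swapon -s that the swap total is now zero.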
If you don't disable swap, then when the OS comes under memory pressure (for example when the address space is used up by Cassandra's mmap'ed files), it will swap out a slice of the JVM's memory. Your JVM is now slowed down because part of its heap lives on disk, and GC has to run alongside Cassandra's write load with effectively less usable heap. This drags down the overall performance of the Cassandra node and gradually kills it, until at some point there is no memory left at the OS level.
That's why they suggest two things:
Bundle jna.jar (a sample install sketch follows these two suggestions). With JNA present, the cleanup of Cassandra's mmap'ed regions is done by Java code rather than the default JNI path, which avoids the extra slice of address space that JNI needs for native operations.
Disable swap space. A poorly performing Cassandra node slows down all operations at the client end. Even replication to this node is slowed down, so your reads/writes will appear slower than you'd expect. It is better for a node to die and restart when such an out-of-memory condition occurs than to limp along with part of the JVM swapped out, slowing down the entire process.
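For the first suggestion, on Debian/Ubuntu installing JNA can look like this (the package name and paths are for illustration and may differ on your distro and Cassandra version):
sudo apt-get install libjna-java
sudo ln -s /usr/share/java/jna.jar /usr/share/cassandra/lib/jna.jar
Cassandra logs at startup whether native memory locking via JNA succeeded, so check system.log after restarting the node.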
I'm seeing a huge (~200+) faults/sec number in my mongostat output, though a very low lock %.
My Mongo servers are running on m1.large instances on the Amazon cloud, so they each have 7.5GB of RAM:
root:~# free -tm
             total       used       free     shared    buffers     cached
Mem:          7700       7654         45          0          0       6848
Clearly, I do not have enough memory for all the caching mongo wants to do (which, by the way, results in huge CPU usage % due to disk IO).
I found this document that suggests that in my scenario (high fault, low lock %), I need to "scale out reads" and "more disk IOPS."
I'm looking for advice on how to best achieve this. Namely, there are LOTS of different potential queries executed by my node.js application, and I'm not sure where the bottleneck is happening. Of course, I've tried
db.setProfilingLevel(1);
However, this doesn't help me that much, because the output just shows me slow queries, and I'm having a hard time translating that information into which queries are causing the page faults...
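One thing I can at least do is see which collections dominate read time with mongotop (it ships with MongoDB); run on the PRIMARY with a 5-second sampling interval:
mongotop 5
The collections spending the most time servicing reads are a reasonable proxy for where the faulting reads happen, though that's still per collection, not per query.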
This is resulting in a HUGE (nearly 100%) CPU wait time on my PRIMARY mongo server, though the two SECONDARY servers are unaffected...
Here's what the Mongo docs have to say about page faults:
Page faults represent the number of times that MongoDB requires data not located in physical memory, and must read from virtual memory. To check for page faults, see the extra_info.page_faults value in the serverStatus command. This data is only available on Linux systems.
Alone, page faults are minor and complete quickly; however, in aggregate, large numbers of page faults typically indicate that MongoDB is reading too much data from disk and can indicate a number of underlying causes and recommendations. In many situations, MongoDB’s read locks will “yield” after a page fault to allow other processes to read and avoid blocking while waiting for the next page to read into memory. This approach improves concurrency, and in high volume systems this also improves overall throughput.
If possible, increasing the amount of RAM accessible to MongoDB may help reduce the number of page faults. If this is not possible, you may want to consider deploying a shard cluster and/or adding one or more shards to your deployment to distribute load among mongod instances.
So, I tried the recommended command, which is terribly unhelpful:
PRIMARY> db.serverStatus().extra_info
{
    "note" : "fields vary by platform",
    "heap_usage_bytes" : 36265008,
    "page_faults" : 4536924
}
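The counter is cumulative since mongod started, so on its own it says little; sampling it twice at least gives a rate (a quick mongo-shell sketch):
PRIMARY> var before = db.serverStatus().extra_info.page_faults
PRIMARY> sleep(10000)   // sample over 10 seconds
PRIMARY> (db.serverStatus().extra_info.page_faults - before) / 10   // faults/sec
But that still doesn't tell me which queries are responsible.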
Of course, I could increase the server size (more RAM), but that is expensive and seems to be overkill. I should implement sharding, but I'm actually unsure what collections need sharding! Thus, I need a way to isolate where the faults are happening (what specific commands are causing faults).
Thanks for the help.
We don't really know what your data/indexes look like.
Still, an important rule of MongoDB optimization:
Make sure your indexes fit in RAM. http://www.mongodb.org/display/DOCS/Indexing+Advice+and+FAQ#IndexingAdviceandFAQ-MakesureyourindexescanfitinRAM.
Consider that the smaller your documents are, the higher your key/document ratio will be, and the higher your RAM/Disksize ratio will need to be.
If you can adjust your schema a bit to lump some data together, and reduce the number of keys you need, that might help.
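To check whether your indexes actually fit, the mongo shell can report their sizes (the collection name below is hypothetical):
PRIMARY> db.stats().indexSize                  // total index size for the current db, in bytes
PRIMARY> db.mycollection.stats().indexSizes    // per-index breakdown for one collection
Compare the totals across your databases against the ~7.5GB of RAM, minus whatever the OS and your working set need.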
We have a Solr 3.4 instance running on Windows 2008 R2 with Oracle Java 6 Hotspot JDK that becomes unresponsive. When we looked at the machine, we noticed that the available physical memory went to zero.
The Tomcat7.exe process was using ~70 GB (Private Working Set), but the Working Set (Memory) was using all the memory on the system. There were no errors in the Tomcat / Solr logs. We used VMMap to identify that the memory was being used for memory-mapping the Solr segment files.
Restarting Tomcat fixed the problem temporarily, but it eventually came back.
We then tried decreasing the JVM size to give more space for the memory-mapped files, but then Solr eventually becomes unresponsive with the old generation at 100%. Again, restarting fixed the problem, but it did not throw an out-of-memory exception before we restarted.
Currently our spidey sense is telling us that the cache doesn't shrink when there is memory pressure, and that maybe there are too many MappedByteBuffers hanging around, so that the OS cannot free up the memory from the memory-mapped files.
There are too many parameters and too little information to help with any details. This answer is also quite old as are the mentioned systems.
Here are some things that helped in my experience:
Rather, decrease RAM usage in Tomcat as well as in Solr to reduce the risk of swapping. Leave the system room to breathe.
If this "starts" to appear without any changes to the Tomcat or Solr config, then maybe it is because the amount of data that Solr has to index and query has increased. This can mean either that the original config was never good to begin with, or that the limits of the current resources have been reached and need to be reviewed. Which one is it?
Check the queries (if you can influence them): move subquery constructs that are often repeated into filter queries, and keep the highly individual parts of the request in the regular query parameter (see the sketch after this list). Decrease the query cache and increase/keep the filter cache, or decrease the filter cache if filter queries aren't used much in your system.
Check Solr's schema.xml for configuration errors (these could really just be misconceptions). I ran into this once: during imports, fields were being created en masse, causing RAM to overflow.
If it happens during imports: check whether the import process is set to autocommit, commits rather often, and also runs optimize. Maybe it is possible to commit less often and optimize only once at the end.
Upgrade Java, Tomcat, and Solr.
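To make the query suggestion concrete, here is a sketch with made-up field names against the Solr 3.x select handler:
Before: q=type:product AND category:books AND text:laptop
After:  q=text:laptop&fq=type:product&fq=category:books
Each fq clause is cached independently in the filter cache, so restrictions repeated across many requests stop being recomputed, and the query result cache only has to deal with the genuinely variable part.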
It's probably due to the new default directory implementation Solr uses (MMapDirectory on 64-bit systems), which maps the index files into the process's virtual address space rather than reading them onto the heap.
Read this:
http://grokbase.com/t/lucene/solr-user/11789qq3nm/virtual-memory-usage-increases-beyond-xmx-with-solr-3-3