How do the internals of Memcached and Redis differ? (multithreading)

How does Redis implement the following internally?
Memory management:
I know Memcached manages memory using fixed-size frames, each divided into fixed-size slabs. How does Redis's memory management differ?
For eviction, Memcached uses LRU. For this, each Memcached node keeps a map and a doubly linked list, and on every read or write both data structures are accessed under a global lock. How does Redis handle this? Since Redis is single-threaded, locking for these data structures should not be required.

First of all, I suggest you go through this post.
That Stack Overflow answer covers your first point, "Memory Management", along with a lot of other detailed information.
As for your second point, you can check the default Redis configuration file, which is where the available eviction behaviours are configured.
This snippet is from the Redis configuration file:
# MAXMEMORY POLICY: how Redis will select what to remove when maxmemory
# is reached. You can select among five behaviors:
#
# volatile-lru -> remove the key with an expire set using an LRU algorithm
# allkeys-lru -> remove any key according to the LRU algorithm
# volatile-random -> remove a random key with an expire set
# allkeys-random -> remove a random key, any key
# volatile-ttl -> remove the key with the nearest expire time (minor TTL)
# noeviction -> don't expire at all, just return an error on write operations
Hope this helps. If I've missed anything, please let me know.
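To illustrate why the LRU policies above need neither a doubly linked list nor a lock: Redis is documented to approximate LRU by sampling a handful of keys (the `maxmemory-samples` setting in the same configuration file) and evicting whichever sampled key has been idle longest. The following is a toy Python model of that idea under those assumptions, not Redis's actual code; the class and method names are made up for illustration.

```python
import itertools
import random


class SampledLRUCache:
    """Toy model of sampling-based (approximate) LRU eviction."""

    def __init__(self, max_keys, samples=5, seed=None):
        self.max_keys = max_keys
        self.samples = samples                # plays the role of maxmemory-samples
        self._data = {}                       # key -> value
        self._last_used = {}                  # key -> logical access time
        self._clock = itertools.count()       # monotonically increasing counter
        self._rng = random.Random(seed)

    def _touch(self, key):
        # Record the access; no list reordering is needed.
        self._last_used[key] = next(self._clock)

    def get(self, key):
        if key not in self._data:
            return None
        self._touch(key)
        return self._data[key]

    def set(self, key, value):
        if key not in self._data and len(self._data) >= self.max_keys:
            self._evict_one()
        self._data[key] = value
        self._touch(key)

    def _evict_one(self):
        # Look at a small random sample and evict its least recently used key.
        count = min(self.samples, len(self._data))
        candidates = self._rng.sample(list(self._data), count)
        victim = min(candidates, key=self._last_used.__getitem__)
        del self._data[victim]
        del self._last_used[victim]
        return victim

    def __contains__(self, key):
        return key in self._data

    def __len__(self):
        return len(self._data)
```

With a sample size equal to the number of keys this behaves like exact LRU; smaller samples trade accuracy for less bookkeeping per operation.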

Store and retrieve data from coordinates in Redis

I'm using Redis as a caching system to avoid unnecessary calls to an API. My idea is that, after getting a result from the API, I store the coordinate (as a key, for example) together with the data in Redis. On a later search, before calling the external API again, I would check whether the new coordinate matches a saved coordinate in Redis (within some distance in meters, say) and, if it does, return the stored data.
I have already searched for at least an hour and could not find anything that fits my needs. For example, GEOADD does not help because its entries are not expired automatically by Redis.
The only solution I can think of is storing the coordinates as the key (example: -51.356156,-50.356945) with a JSON value, and then checking every key (coordinate) in application code to see whether it matches another coordinate. But that seems inelegant and would also perform badly.
Any ideas?
I'm using Redis in NodeJS (Express).
If I understand correctly, you want to be able to:
1. Cache the API's response for a given tuple of coordinates
2. Perform an efficient radius search over the cached responses
3. Use Redis' expiration to invalidate old cache entries
To satisfy #1, you've already outlined the right approach: store each API response under its own key. You can name the key by your coordinates, or use their geohash value (computing it can be done in the client or with a temporary element in Redis). Also don't forget to set a TTL on that key and to configure the global maxmemory eviction policy so that eviction actually works.
The 2nd requirement calls for using a Geo Set. Store the coordinates and the key names in it. Perform your query by calling GEORADIUS and then fetch the relevant keys according to the reply.
While fetching the keys from #2's query, you may find that some of them are gone from the keyspace but still present in the Geo Set. Call ZREM for each of these to keep a semblance of sync between your index (the Geo Set) and the keyspace. Additionally, you can run a periodic background task that walks the Geo Set with ZSCAN and does housekeeping. That should take care of #3.
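The three steps above can be sketched without a Redis server. The following is a minimal in-memory Python model, assuming one dict stands in for the keyspace (with TTLs) and another for the Geo Set; `GeoCache`, `put`, and `lookup` are hypothetical names, and the comments mark where a real client would issue SET with an expiry, GEOADD, GEORADIUS, and ZREM.

```python
import math
import time


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


class GeoCache:
    """In-memory stand-in for a TTL'd keyspace plus a geo index."""

    def __init__(self, ttl_seconds, now=time.monotonic):
        self.ttl = ttl_seconds
        self.now = now
        self.keyspace = {}   # key -> (expires_at, value); models SET key value EX ttl
        self.geo_index = {}  # key -> (lat, lon); models the Geo Set

    def put(self, lat, lon, value):
        key = f"{lat:.6f},{lon:.6f}"
        self.keyspace[key] = (self.now() + self.ttl, value)
        self.geo_index[key] = (lat, lon)  # models GEOADD
        return key

    def _get_live(self, key):
        entry = self.keyspace.get(key)
        if entry is None or entry[0] <= self.now():
            self.keyspace.pop(key, None)  # models the key having expired
            return None
        return entry[1]

    def lookup(self, lat, lon, radius_m):
        # Models GEORADIUS: candidates sorted by distance from the query point.
        hits = sorted(
            (haversine_m(lat, lon, klat, klon), key)
            for key, (klat, klon) in self.geo_index.items()
        )
        for dist, key in hits:
            if dist > radius_m:
                break
            value = self._get_live(key)
            if value is not None:
                return value
            del self.geo_index[key]  # models ZREM for a stale index member
        return None
```

The lazy cleanup inside `lookup` is the part that keeps the index roughly in sync; a periodic sweep over `geo_index` would play the role of the background housekeeping task.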

Global Redis Twemproxy Architecture

I'm launching a global service on a Node/Mongo/Redis stack. I've got an architecture question about my Redis/Twemproxy config. Here's the crux of the issue: all of my data is 'global', that is to say, users from anywhere around the world need to access the same data. Unfortunately, it's a ~300ms hop across an ocean, so to avoid slow reads I need to host a copy of all my data on a server that's 'local' to the user.
This is pretty easy to accomplish with MongoDB. You simply create a replica set, with members all over the globe, and you set readPreference to 'nearest' (least lag). Done.
However, with Redis/Twemproxy, it's not that easy...
My current solution is to take a serious hit on write performance by writing to every global server (within the req/res cycle). This does lead to faster reads since I can let every user read from a local set of the data. If you do it the other way around, write 'local', read 'global' -- you save a bunch of space (you only have one copy of the data), but reads take a huge performance hit. If I had to choose, I need faster reads.
I've tried creating a 'master' cluster (AMER) and then slaving other 'global' clusters (ASIA, EUROPE) to that, but when I tried to read from the 'global' clusters, it returned nothing. This works with a single Redis instance, so, I'm assuming this has to do with the addition of Twemproxy, and key mapping.
Does anyone have any suggestions or ideas? What's the optimal way to configure a global Redis/Twemproxy architecture?

Are GridCacheQueue elements also GridCacheElements?

I'm in the process of evaluating GridGain and have read and re-read all the documentation I could find. While much of it is very thorough, you can tell that it's mostly written by the developers. It would be great if there were a reference book written from an outsider's perspective.
Anyway, I have five basic questions I'm hoping someone from GridGain can answer and clarify for me.
1. It's my understanding that GridCacheQueue (and the other Distributed Data Structures) are built on top of the GridCache implementation. Does that mean that each element of the GridCacheQueue is really just a GridCacheElement of the GridCache map, or is each GridCacheQueue a GridCacheElement, or do I have this totally wrong?
2. If I set a default TTL on the GridCache, will the elements of a GridCacheQueue expire in the TTL time, or does it only apply to GridCacheElements (which might be answered in #1 above)?
3. Is there a way to make a GridCacheQueue expire after some period of time without having to remove it manually?
4. If a cache is set-up to be backed-up onto other nodes and the cache is using off-heap memory and/or swap storage, is the off-heap memory and/or swap storage also replicated onto the back-up nodes?
5. Is it possible to create a new cache dynamically, or can it only be created via configuration when the node is created?
Thanks for any insightful information!
-Colin
After experimenting with a GridCache and a GridCacheQueue, here's what I've learned about my 5 questions:
1. I don't know how the GridCacheQueue or its elements are attached to a GridCache, but I know that the elements of a GridCacheQueue DO NOT show up as GridCacheElements of the GridCache.
2. If you set a TTL on a GridCache and add a GridCacheQueue to it, once the elements of the GridCache begin expiring, the GridCacheQueue becomes unusable and will cause a GridRuntimeException to be thrown.
3. Yes, see #2 above. However, there doesn't seem to be a safe way to test if the queue is still in existence once the elements of the GridCache start to expire.
4. Still have no information about this yet. Would REALLY like some feedback on that.
5. That was a question I never should have asked. A GridCache can be created entirely in code and configured.
Let me first of all say that GridGain supports several queue configuration parameters:
Colocated vs. non-colocated. In colocated mode you can have many queues. Each queue will be assigned to some grid node and all the data in that queue will be cached on that grid node. This way, if you have many queues, each queue may be cached on a different node, but queues themselves should be evenly distributed across all nodes. Non-colocated mode, on the other hand is meant for larger queues, where data for the same queue is partitioned across multiple nodes.
Capacity - this parameter defines the maximum queue capacity. When the queue reaches this capacity it will automatically start evicting the oldest elements.
Now, let me try to tackle some of these questions.
1. I believe each element of GridCacheQueue is a separate element in the cache, but the implementation marks them as internal elements. That is why you don't see these elements when iterating through the cache.
2. TTL should not be used with elements in the queue (GridGain will be adding this feature soon). For now, you should limit the maximum size of the queue by specifying the queue 'capacity' at creation time.
3. I don't believe so, but I think this feature is being added. For now, you can try using org.gridgain.grid.schedule.GridScheduler to schedule a job that will delete a queue later.
4. The answer is YES. Data in both off-heap and swap spaces is backed up and replicated the same way as the main on-heap cache data.
5. A cache should be created in configuration, either from code or XML. However, GridGain has a cool notion of GridCacheProjection, which lets you create various sub-caches (cache views) on the same cache. For example, if you store Person and Organization classes in the same cache, then you can use a cache projection for type Person when working with the Person class, and a cache projection for type Organization when working with the Organization class.

Find all instances of CacheKey from CacheClient object and remove one or more from it

I am caching a couple of requests with the following unique keys. I am using the in-memory cache.
urn:Product:/site.api.rest/product?pagenumber=0&pagesize=0&params=<Product><ProductID>3</ProductID></Product>
urn:Product:/site.api.rest/product?pagenumber=0&pagesize=0&params=<Product><ProductID>1</ProductID></Product>
urn:Product:/site.api.rest/product?pagenumber=0&pagesize=0&params=<Product><ProductID>5</ProductID></Product>
urn:Product:/site.api.rest/product?pagenumber=0&pagesize=0&params=<Product><ProductID>3</ProductID><Description>test</Description></Product>
...
...
Now, on create/update/delete I would like to remove specific cache entries based on the params passed in the request body, for example everything with ProductID 3.
In order to do that I would get singleNode from the request params (for example: 3).
How can I get all cache objects that match <ProductID>3</ProductID> and remove them?
Also, what is the right approach to removing a cache entry?
base.RequestContext.RemoveFromCache(base.Cache, cachekey);
or
CacheClient.Remove(keyname)?
There is a better approach, which is to use generational caching.
When you construct your cache key include a generation number e.g.
urn:Product:gen_1:/site.api.rest/product?pagenumber=0&pagesize=0&params=
(This number could be stored as a counter in your caching service.)
Then when you want to (pseudo) invalidate a large set of cached items, just increment the generation number.
Just be sure to set an expiry date on the cached items so that older generations are cleaned up over time.
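As a rough sketch of the generational approach described above, here is a minimal Python model with a plain dict standing in for the caching service. `GenerationalCache` and its method names are hypothetical, and a real cache would also attach an expiry to each entry so old generations age out.

```python
class GenerationalCache:
    """Dict-backed model of generational cache keys."""

    def __init__(self):
        self._store = {}        # full key -> value (a real cache would set an expiry)
        self._generations = {}  # namespace -> current generation counter

    def _key(self, namespace, raw_key):
        # Embed the current generation number in every key.
        gen = self._generations.get(namespace, 1)
        return f"urn:{namespace}:gen_{gen}:{raw_key}"

    def get(self, namespace, raw_key):
        return self._store.get(self._key(namespace, raw_key))

    def set(self, namespace, raw_key, value):
        self._store[self._key(namespace, raw_key)] = value

    def invalidate(self, namespace):
        # Bumping the counter makes every older key unreachable at once.
        self._generations[namespace] = self._generations.get(namespace, 1) + 1
```

After `invalidate("Product")`, reads build keys with the new generation and miss, while the old entries are simply never asked for again until they expire.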

On-disk lookup table with node.js bindings

For a project I am creating a queuing library and basically store URLs in a Set (it's actually an object, where I set keys to true, but one can see it as an array), so the queue only takes every url once. This works really well, however I am facing the problem that there are many URLs and so the RAM usage becomes really high.
Therefore I want to use an on-disk key-value store (actually only keys are required; I have no idea whether there is some different approach) with the following requirements:
No need to load the whole data set into RAM
Speedy lookups
Node.js bindings
It doesn't have to be too safe (losing data once in a while isn't a huge problem, low RAM requirements are more important) and even though I use Node.JS in this scenario this lookup doesn't necessarily need to run async.
A side question would be whether there is a better way than an on-disk key-value approach. A search term would be nice: searching for "lookup tables" only ever turns up data sets (IPs, ZIP codes, etc.).
I'd use an SQL table with a single column (to store the URL). It gives better control over memory usage than Redis (which keeps pretty much everything in memory).
easy to check if there is already the same value
easy to insert
easy to remove one element
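A minimal sketch of the single-column table, using Python's built-in sqlite3 module purely for illustration (the question is about Node.js, where an SQLite binding would use the same SQL). The `SeenUrls` class and the `seen` table name are assumptions; pass a file path instead of the default to keep the data on disk.

```python
import sqlite3


class SeenUrls:
    """Set of URLs backed by a one-column SQLite table."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        # The primary key gives an index, so membership checks avoid a full scan.
        self.db.execute("CREATE TABLE IF NOT EXISTS seen (url TEXT PRIMARY KEY)")

    def add(self, url):
        """Insert the URL; return True if it was new, False if already present."""
        cur = self.db.execute("INSERT OR IGNORE INTO seen (url) VALUES (?)", (url,))
        self.db.commit()
        return cur.rowcount == 1

    def __contains__(self, url):
        row = self.db.execute("SELECT 1 FROM seen WHERE url = ?", (url,)).fetchone()
        return row is not None

    def remove(self, url):
        self.db.execute("DELETE FROM seen WHERE url = ?", (url,))
        self.db.commit()
```

Because `add` reports whether the URL was new, the queue can enqueue and deduplicate in a single call.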
If it really "doesn't have to be too safe", another design would be to keep storing everything in memory but limit the number of URLs you store, for example by using an LRU cache.
You could either use a cache in node.js (easy to find via Google) or use a separate memcached server, possibly on the same machine.
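The in-memory alternative can be sketched as a bounded "seen" set that forgets the least recently seen URL once it is full. This is a minimal Python illustration, with `BoundedSeenSet` and `check_and_add` as made-up names; the trade-off is that a forgotten URL may be queued again.

```python
from collections import OrderedDict


class BoundedSeenSet:
    """Remembers at most max_size URLs, dropping the least recently seen."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._items = OrderedDict()  # insertion order doubles as recency order

    def check_and_add(self, url):
        """Return True if url was already remembered; remember it either way."""
        if url in self._items:
            self._items.move_to_end(url)     # mark as most recently seen
            return True
        self._items[url] = True
        if len(self._items) > self.max_size:
            self._items.popitem(last=False)  # evict the least recently seen URL
        return False

    def __len__(self):
        return len(self._items)
```

Memory use is capped by `max_size`, which is the property the question cares about most.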
