I'm using Redis as a caching layer to store data and avoid unnecessary calls to an external API. What I was thinking is: after getting the result from the API, store the coordinates (as the key, for example) and the data in Redis; then on a later search, before calling the external API again, take the new coordinates and check in Redis whether they match a saved coordinate (within some number of meters, or similar), and if they match, return the stored data.
I already searched for at least an hour and could not find any relevant results that fit my needs. For example, GEOADD doesn't help me because its members are not expired automatically by Redis.
The only solution I can see is storing the coordinates as the key (example: -51.356156,-50.356945) with a JSON value, then iterating over all the keys (coordinates) in application code and checking whether each one matches the new coordinate. But that doesn't seem elegant, and the performance would be bad.
Any ideas?
I'm using Redis in NodeJS (Express).
If I understand correctly, you want to be able to:
1. Cache the API's response for a given tuple of coordinates
2. Perform an efficient radius search over the cached responses
3. Use Redis' expiration to invalidate old cache entries
To satisfy #1, you've already outlined the right approach - store each API call's response under its own key. You can name the key by your coordinates, or use their geohash value (computing it can be done in the client or with a temporary element in Redis). Also, don't forget to set a TTL on that key and configure the global maxmemory eviction policy so eviction actually works.
The 2nd requirement calls for using a Geo Set. Store the coordinates and the key names in it. Perform your query by calling GEORADIUS and then fetch the relevant keys according to the reply.
While fetching the keys from #2's query, you may find that some of them have been evicted from the keyspace but are still in the Geo Set. Call ZREM for each of these to keep a semblance of sync between your index (the Geo Set) and the keyspace. Additionally, you can run a periodic background task that ZSCANs the Geo Set and does the same housekeeping. That should take care of #3.
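Putting the three parts together, a rough, untested sketch with node-redis v4 might look like the following (the key names, the 600-second TTL, and the use of GEOSEARCH - the Redis 6.2 replacement for the deprecated GEORADIUS - are assumptions of the sketch, not part of the answer; swap in GEORADIUS on older Redis versions):

// Untested sketch. Assumes an ES module (for top-level await) and node-redis v4.
import { createClient } from 'redis';

const client = createClient();
await client.connect();

const GEO_SET = 'api:geo-index';   // the Geo Set used as the index (assumed name)
const TTL_SECONDS = 600;           // assumed cache lifetime

// #1: cache the API response under its own key, with a TTL,
// and index that key name in the Geo Set (#2).
async function cacheResponse(lon, lat, apiResult) {
  const key = `api:cache:${lon},${lat}`;
  await client.set(key, JSON.stringify(apiResult), { EX: TTL_SECONDS });
  await client.geoAdd(GEO_SET, { longitude: lon, latitude: lat, member: key });
}

// #2 + #3: radius search over the index, fetch the keys that still exist,
// and ZREM the ones that have already expired from the keyspace.
async function findCached(lon, lat, radiusMeters) {
  const members = await client.geoSearch(
    GEO_SET,
    { longitude: lon, latitude: lat },
    { radius: radiusMeters, unit: 'm' }
  );
  const results = [];
  for (const key of members) {
    const value = await client.get(key);
    if (value === null) {
      await client.zRem(GEO_SET, key);   // expired entry still indexed: clean it up
    } else {
      results.push(JSON.parse(value));
    }
  }
  return results;
}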
Related
I am using Cloud Firestore in Datastore mode. I have a list of keys of the same Kind, some exist already and some do not. For optimal performance, I want to run a compute-intensive operation only for the keys that do not yet exist. Using the Python client library, I know I can run client.get_multi() which will retrieve the list of keys that exist as needed. The problem is this will also return unneeded Entity data associated with existing keys, increasing the latency and cost of the request.
Is there a better way to check for existence of multiple keys?
You could check whether a key exists using keys-only queries as they return only the keys instead of the entities themselves, at lower latency and cost than retrieving entire entities.
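A rough sketch of this with the Node.js @google-cloud/datastore client (the Python client's query.keys_only() is the equivalent; the 'Task' kind name and the per-key equality filter below are assumptions):

import { Datastore } from '@google-cloud/datastore';

const datastore = new Datastore();

// Given an array of Datastore Key objects (e.g. datastore.key(['Task', 'id-1'])),
// return the ones that already exist, without downloading any entity properties:
// each check is a keys-only query with a key equality filter.
async function existingKeys(keys) {
  const checks = keys.map(async (key) => {
    const query = datastore
      .createQuery('Task')            // assumed kind name
      .select('__key__')              // keys-only projection
      .filter('__key__', '=', key)
      .limit(1);
    const [matches] = await datastore.runQuery(query);
    return matches.length > 0 ? key : null;
  });
  return (await Promise.all(checks)).filter((k) => k !== null);
}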
I have recently started using Azure Cosmos DB in our project. For reporting purposes, we need to get all the partition keys in the collection. I could not find any suitable API to achieve this.
UPDATE: According to Brian in the comments below, DISTINCT is now supported. Try something like:
SELECT DISTINCT c.partitionKey FROM c
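For example, with the @azure/cosmos Node.js SDK this could be run as sketched below (the account endpoint, key, and database/container names are placeholders, and c.partitionKey must be replaced with your container's actual partition key path):

import { CosmosClient } from '@azure/cosmos';

const client = new CosmosClient({
  endpoint: 'https://<your-account>.documents.azure.com',   // placeholder
  key: '<your-key>',                                         // placeholder
});
const container = client.database('<db>').container('<coll>');

// Cross-partition query returning the distinct partition key values.
const { resources: partitionKeys } = await container.items
  .query('SELECT DISTINCT c.partitionKey FROM c')
  .fetchAll();

console.log(partitionKeys);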
Prior answer: Idea that could work but for one thing...
The only way to get the actual partition key values is to do a unique aggregate on that field.
You can directly hit the REST endpoint at https://{your endpoint domain}.documents.azure.com/dbs/{your collection's uri fragment}/pkranges to pull back the minInclusive and maxExclusive ranges for each partition but those are hash space ranges and I don't know how to convert those into partition key values nor do a fanout using the actual minInclusive hash.
Also, there is a slim possibility that the pkranges can change between the time you retrieve them and the time you go to do something with them.
When inserting documents, if the key is generated client-side, does it slow down writes on a single machine or in a cluster?
I ask because I think server-side generated keys are guaranteed to be unique and don't need to be checked for uniqueness.
However, what are the disadvantages or things to remember when generating keys on the client side (on a single machine, with sharding, or with the master-master replication that is coming)?
Generating keys on the client-side should not have any notable performance impact for ArangoDB. ArangoDB will parse the incoming JSON anyway, and will always look for a _key attribute in it. If it does not exist, it will create one itself. If it exists in the JSON, it will be validated for syntactic correctness (because only some characters are allowed inside document keys). That latter operation only happens when a _key value is specified in the JSON, but its impact is very likely negligible, especially when compared to the other things that happen when documents are inserted, such as network latency, disk writes etc.
Regardless of whether a user-defined _key value was specified or not, ArangoDB will check the primary index of the collection for a document with the same key. If it exists, the insert will fail with a unique key constraint violation. If it does not exist, the insert will proceed. As mentioned, this operation will always happen. Looking for the document in the primary index has an amortized complexity of O(1) and should again be negligible when compared to network latency, disk writes etc. Note that this check will always happen, even if ArangoDB generates the key. This is due to the fact that a collection may contain a mix of client-generated keys and ArangoDB-generated keys, and ArangoDB must still make sure it hasn't generated a key that a client had also generated before.
In a cluster, the same steps happen, except that the client sends the insert to a coordinator node, which needs to forward it to a dbserver node. This is independent of whether a key is specified or not. The _key attribute will likely be the shard key for the collection, so the coordinator will send the request to exactly one dbserver node. If the _key attribute is not the shard key for the collection because a different shard key was explicitly set, then user-defined keys are disallowed anyway.
Summary so far: in terms of ArangoDB there should not be relevant performance differences between generating the keys on the client side or having ArangoDB generate them.
The advantages and disadvantages of generating keys in the client application are, among others:
+ the client application can make sure keys follow some required pattern/syntax that isn't guaranteed by ArangoDB-generated keys, and it has full control over the key creation algorithm (e.g. it can use tenant-specific keys in a multi-tenant application)
- the client may need some data store for its key generator state (e.g. the id of the last generated key) to prevent duplicates, also after a restart of the client application
- usage of client-side keys is disallowed when a different shard key is used in cluster mode
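A small illustration of the two insert paths with arangojs (the server URL, the collection name, and the check for ArangoDB's error number 1210 for a unique constraint violation are assumptions of this sketch):

import { Database } from 'arangojs';

const db = new Database({ url: 'http://localhost:8529' });   // assumed local server
const posts = db.collection('posts');                        // assumed existing collection

// Client-generated key: the value is validated for allowed characters and then
// checked against the primary index, exactly like a server-generated one.
try {
  await posts.save({ _key: 'tenant42-post-0001', title: 'hello' });
} catch (err) {
  if (err.errorNum === 1210) {
    // unique constraint violated: a document with this _key already exists
  } else {
    throw err;
  }
}

// No _key supplied: ArangoDB generates one; the primary-index lookup still happens.
const meta = await posts.save({ title: 'hello again' });
console.log(meta._key);   // the server-generated key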
I have a classified advertisements website a la craigslist.org - just on a much smaller scale.
I'm running MongoDB and caching all API requests in Redis, where the Mongo query is the key and the value is the MongoDB result document.
Pseudo code:
// The mongo query
const query = { section: 'home', category: 'garden', region: 'APAC', country: 'au', city: 'sydney', limit: 100 };
// Getting the mongo result..
// Storing in Redis (the result document has to be serialized to a string)
redisClient.set(JSON.stringify(query), JSON.stringify(result));
Now a user creates a new post in the same category, but Redis serves up a stale record because it has no idea that the dataset has changed.
How can we overcome this?
I could set an expiry in general or on that particular key, but essentially cached keys need to expire the moment a user creates a new post whose result set would include the newly created record.
One way is to iterate through all the Redis keys and come up with a pattern to detect which keys should be deleted based on the characteristics of the newly created record. But this approach seems "too clever" and not quite right.
So we want to keep the memory cache while still serving up fresh content instantly.
I would avoid caching bulk query results as a single key. Redis is for use cases where you need to access and update data at very high frequency and where you benefit from use of data structures such as hashes, sets, lists, strings, or sorted sets [1]. Also, keep in mind MongoDB will already have part of the database cached in memory so you might not see much in the way of performance gains.
A better approach would be to cache each post individually. You can add keys to sets to group them into categories or even just pages (like the 20 or so posts that the user expects to see on each page). This way, every time a user makes a new post or updates an existing one, you can update the corresponding key in your Redis cache as well.
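A rough sketch of that layout with node-redis v4 (the key naming, the one-hour TTL, and the region/category set structure are assumptions, not requirements):

import { createClient } from 'redis';

const redis = createClient();
await redis.connect();

// Write path: whenever a post is created or updated in MongoDB, refresh its own
// cache key and register it in the set for its category, so readers see it at once.
async function cachePost(post) {
  const postKey = `post:${post._id}`;
  const categorySet = `posts:${post.region}:${post.category}`;
  await redis.set(postKey, JSON.stringify(post), { EX: 3600 });
  await redis.sAdd(categorySet, postKey);
}

// Read path: resolve the category set to post keys, then fetch them with one MGET.
// Keys that have expired resolve to null and are filtered out.
async function getCategory(region, category) {
  const postKeys = await redis.sMembers(`posts:${region}:${category}`);
  if (postKeys.length === 0) return [];
  const values = await redis.mGet(postKeys);
  return values.filter((v) => v !== null).map((v) => JSON.parse(v));
}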
I am using Redis key-value pair for storing the data. The data against a particular key can change at any point of time, so after every retrieval request I asynchronously update the data stored against the requested key so that the next request can be served with updated data.
I have done quite a bit of testing, but I am still wondering whether there is any case where this approach might have negative consequences.
PS: The data is consolidated from multiple servers.
Thanks in advance for any help/suggestions.
If you already know the value to be stored, you can use GETSET (or a transaction if it is not a simple string type).
If the new value is some manipulation of the current value, i.e. f(value), you should do it in a Lua script.
Otherwise some other client might read the old value before you update it.
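For example, with node-redis v4's eval, f(value) is applied entirely inside Redis, so no other client can slip in between the read and the write (the increment-style f and the key name below are just stand-ins for whatever consolidation you do):

import { createClient } from 'redis';

const client = createClient();
await client.connect();

// Atomically read the current value, apply f(value), and write the result back.
// Redis runs the whole script as one unit, so no other client can read the old
// value between the GET and the SET.
const script = `
  local current = redis.call('GET', KEYS[1])
  local updated = tostring(tonumber(current or '0') + tonumber(ARGV[1]))
  redis.call('SET', KEYS[1], updated)
  return updated
`;

const newValue = await client.eval(script, {
  keys: ['server:metrics:consolidated'],   // assumed key name
  arguments: ['5'],                        // input to f
});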