Is it a bad practice to store duplicate data in Redis cache?
I am experimenting with GraphQL caching solutions, but I have a few tables which I query by a combination of keys and never by their primary key, and that appears to be a bit of an issue for me.
Let's consider these tables:
Products - id, ...
Images - id, productId, size
I need to be able to get the images (multiple) by productId, or a single row by a combination of productId and size.
What I currently store is something in the form of
{
images:productId:1:size:sm: {...},
images:productId:1:size:xs: {...},
images:productId:1: ['images:productId:1:size:sm', 'images:productId:1:size:xs']
}
The third object contains references to all of the available images in cache for the product, so I end up performing two queries to retrieve the data.
If I want one, I can directly go ahead and get it. If I want all of them, I first have to hit the third key, and then use the keys within it to get the actual objects.
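In code, the two-step fetch looks roughly like this (node_redis, callback style; this assumes the index key holds a JSON array of key names, as in the example above):
redis.get('images:productId:1', (err, members) => {
  const keys = JSON.parse(members); // ['images:productId:1:size:sm', 'images:productId:1:size:xs']
  // second round trip: fetch all referenced image objects at once
  redis.mget(keys, (err, values) => {
    const images = values.map(v => JSON.parse(v));
  });
});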
Is this a bad idea? Should I bother with it, or just go with the simpler form
{
images:productId:1:size:sm: {...},
images:productId:1:size:xs: {...},
images:productId:1: [ {...}, {...} ] // Copies of the two objects from above
}
To provide some context, some of these objects might become a bit large over time, because they might contain long text/HTML from rich text editors.
I read that hashes compress data better, so I organized the entries so that they end up in a single hash; that way invalidation becomes easier too (I don't care about invalidating only some of them, they will always be invalidated all at once).
It is a multi-tenant system, where I would be using a tenant id to scope the data to specific users.
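Roughly, the single-hash layout I have in mind looks like this (the tenant prefix and field names are only illustrative):
// one hash per tenant + product; the size variants are fields of that hash
redis.hset('tenant:42:images:productId:1', 'sm', JSON.stringify(smImage));
redis.hset('tenant:42:images:productId:1', 'xs', JSON.stringify(xsImage));

// one size
redis.hget('tenant:42:images:productId:1', 'sm', (err, value) => { /* ... */ });
// all sizes for the product
redis.hgetall('tenant:42:images:productId:1', (err, fields) => { /* ... */ });
// invalidate everything for the product at once
redis.del('tenant:42:images:productId:1');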
I am using the Core API and the Node.js API for Cosmos DB. I am trying to do a point read to save on RUs and latency. This documentation led me to the following as my "solution".
However, this makes no sense to me. From similar systems, I believe a point read needs the item ID and the partition key, but the documentation makes no reference to the latter, to top things off.
By modifying some update code, mostly by pure luck, I ended up with what MIGHT be a point read, but it returns the full item, not the "map" value I am looking for.
const { resource: updated } = await container
    .item(email, email) // (id, partition key value)
    .read();            // read() takes no query, so this returns the entire document
console.log(updated)
How do I read just the "map" value? The full document has much more in it and it would probably waste the benefit of a point read to get the whole thing.
There are two ways to read data: either via a query (where you can do a projection, grouping, filtering, etc.) or via a point read (a direct read, specifying the id and partition key value), and that point read bypasses the query engine.
Point reads cost a bit less in RUs, but could potentially consume a bit more bandwidth, as they return the entire document (the underlying API call only accepts an ID plus a partition key value, and returns the single matching document in its entirety).
Via a query, you have the flexibility to return as much or as little as you want, but the resulting operation will cost a bit more in RUs.
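To make that concrete, here is a rough sketch with the Node.js SDK (@azure/cosmos), assuming, as in the question, that email is both the id and the partition key value:
// Point read: cheapest in RUs, but always returns the whole document
const { resource: doc } = await container.item(email, email).read();

// Query with a projection: returns only c.map, at a slightly higher RU cost
const { resources } = await container.items
  .query(
    {
      query: 'SELECT c.map FROM c WHERE c.id = @id',
      parameters: [{ name: '@id', value: email }],
    },
    { partitionKey: email }
  )
  .fetchAll();
console.log(resources[0].map);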
I need to either add a new document with children or add a child to an already existing parent.
The only way I know how to do it is ugly:
public async Task AddOrUpdateCase(params)
{
try
{
await UpdateCase(params);
}
catch (RequestFailedException ex)
{
if (ex.Status != (int)HttpStatusCode.NotFound)
throw;
await AddCase(params);
}
}
private async Task UpdateCase(params)
{
    // this line throws when the document is not found
    var caseResponse = await searchClient.GetDocumentAsync<Case>(params.CaseId);
    // need to add to the existing collection
    caseResponse.Value.Children.Add(params.child);
    // push the modified document back to the index
    await searchClient.MergeOrUploadDocumentsAsync(new[] { caseResponse.Value });
}
I think there wouldn't be any problem if the document didn't contain a collection. You cannot use MergeOrUpload when there are child collections; you need to load the document from the index and add the element.
Is there better way to do it?
Azure Cognitive Search doesn't support partial updates to collection fields, so retrieving the entire document, modifying the relevant collection field, and sending the document back to the index is the only way to accomplish this.
The only improvement I would suggest to the code you've shown is to search for the documents you want to update instead of retrieving them one-by-one. That way, you can update them in batches. Index updates are much more expensive than queries, so to reduce overhead you should batch updates together wherever possible.
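As a rough sketch of that batching idea (shown here with the JavaScript client, @azure/search-documents, purely for illustration; the filter and field names are assumptions):
const { SearchClient, AzureKeyCredential } = require("@azure/search-documents");

const client = new SearchClient(endpoint, indexName, new AzureKeyCredential(apiKey));

async function addChildToMatchingCases(filter, child) {
  const updated = [];
  // retrieve all documents to update with a single query instead of one GET per document
  const results = await client.search("*", { filter });
  for await (const result of results.results) {
    const doc = result.document;
    doc.children.push(child);
    updated.push(doc);
  }
  // push all changes back in one indexing batch
  if (updated.length > 0) {
    await client.mergeOrUploadDocuments(updated);
  }
}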
Note that if you have all the data needed to re-construct the entire document at indexing time, you can skip the step of retrieving the document first, which would be a big improvement. Azure Cognitive Search doesn't yet support concurrency control for updating documents in the index, so you're better off having a single process writing to the index anyway. This should hopefully eliminate the need to read the documents before updating and writing them back. This is assuming you're not using the search index as your primary store, which you really should avoid.
If you need to add or update items in complex collections often, it's probably a sign that you need a different data model for your index. Complex collections have limitations (see "Maximum elements across all complex collections per document") that make them impractical for scenarios where the cardinality of the parent-to-child relationship is high. For situations like this, it's better to have a secondary index that includes the "child" entities as top-level documents instead of elements of a complex collection. That has benefits for incremental updates, but also for storage utilization and some types of queries.
I want to store all objects of a class in a Redis cache and be able to retrieve them. As I understand it, hashmaps are used for storing objects, but each one requires its own key to be saved, so I can't save them all under a single key, e.g. "items", and retrieve them by that key. The only way I can do it is something like this:
items.forEach(item => {
    redis.hmset(`item${item.id}`, item);
});
But this feels wrong, and I have to loop again when I want to get the data back. Is there a better solution?
Also, there is the problem of associated objects; I can't find anywhere how they are stored and used in Redis.
As I understand it, you want to save different keys with the same prefix.
You can use MSET to store them, and MGET with your keys as parameters to retrieve the data.
In case you still want to use HMSET, use a pipeline in the loop, so that all the commands are sent to Redis in a single round trip.
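A rough sketch of both approaches with node_redis (the key format is just illustrative):
// Option 1: plain string keys, values serialized as JSON, written/read in bulk
const kv = [];
items.forEach(item => kv.push(`item${item.id}`, JSON.stringify(item)));
redis.mset(kv, (err) => { /* ... */ });

const keys = items.map(item => `item${item.id}`);
redis.mget(keys, (err, values) => {
  const restored = values.map(v => JSON.parse(v));
});

// Option 2: keep the hashes, but queue the commands and send them in one round trip
const batch = redis.batch();
items.forEach(item => batch.hmset(`item${item.id}`, item));
batch.exec((err, replies) => { /* ... */ });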
I am working as a freelancer and am currently working on one of my games, trying to use the Azure Table service to log user moves in Azure Tables.
The game is based on cards.
The flow is like this:
Many users (UserId) will be playing on a table (TableId). Each game on the table will have a unique GameId. In each game there can be multiple deals, each with a unique DealId.
There can be multiple deals on the same table with the same GameId. Also, each user will have the same DealId within a single game.
The winner is decided after a player has had multiple chances.
Problem:
I can make TableId the PartitionKey, but I am not sure what to choose for the RowKey, because the combination of TableId and RowKey (GameId/UserId/DealId) should be unique in the table.
I can have entries like:
TableId  GameId  DealId  UserId  Timestamp
1        201     300     12345
1        201     300     12567
Maybe what I can do is create 4 Azure tables like the ones below, but then I am doing a lot of duplication; also, I would not be able to fire a point query, as mentioned in the guidelines at https://azure.microsoft.com/en-us/documentation/articles/storage-table-design-guide/#guidelines-for-table-design
GameLogsByTableId -- this will have TableId as PartitionKey and GUID as RowKey
GameLogsByGameId -- this will have GameId as PartitionKey and GUID as RowKey
GameLogsByUserId -- this will have UserId as PartitionKey and GUID as RowKey
GameLogsByDealId -- this will have DealId as PartitionKey and GUID as RowKey
Thoughts please?
TableId, GameId, DealId and UserId are all of type long.
I would like to query data such that:
Get me all the logs from a TableId.
Get me all the logs from a TableId and in a particular game (GameId)
Get me all the logs of a user (UserId) in this game (GameId)
Get me all the logs of a user in a deal (DealId)
Get me all the logs from a table on a date; similarly for a user, game and deal
Based on my knowledge of Azure Tables so far, I believe you're on the right track.
However, there are certain things I would like to mention:
You could use a single table for storing all data
You don't really need to use separate tables for storing each kind of data, though this approach does separate the data nicely at a logical level. If you want, you could store everything in a single table. If you go with a single table, then since these ids (Game, Table, User, and Deal) are numbers, I would recommend prefixing each value appropriately so that you can easily identify it. For example, when a PartitionKey denotes a Game Id, you can prefix the value with G| so that you know it's a Game Id, e.g. G|101.
Pre-pad your Id values with 0 to make them equal-length strings
You mentioned that your id values are of type long. However, the PartitionKey value is a string. I would recommend pre-padding the values so that they are all of equal length. For example, when storing a Game Id as the PartitionKey, instead of storing 1, 2, 103, etc., store them as 00000000001, 00000000002, 00000000103. This way, when you list the Ids, they will sort in the proper order. Without pre-padding, you would get results sorted lexicographically, e.g. 1, 10, 11, ..., 19, 2, 20.
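A tiny sketch of that idea (the helper name and the 11-character width are just illustrative):
// build a fixed-width, prefixed key part from a numeric id,
// e.g. toKeyPart('G', 101) === 'G|00000000101'
function toKeyPart(prefix, id) {
  return `${prefix}|${id.toString().padStart(11, '0')}`;
}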
You will lose transaction support
Since you're using multiple tables (or even a single table with different PartitionKeys), you will not be able to use the Entity Batch Transactions available in Azure Tables, so the inserts cannot be done as one atomic operation. Since each insert is a separate network call and can possibly fail, you may want to do the writes through an idempotent background process that keeps retrying until the data has been inserted into all the tables.
Instead of a GUID for the RowKey, I suggest you create a composite RowKey based on other values
This is more applicable to the update scenario. Since an update requires both the PartitionKey and the RowKey, I would recommend using a RowKey that is composed of the other values. For example, if you're using TableId as the PartitionKey for GameLogsByTableId, I would suggest building the RowKey from the other values, e.g. U|[UserId]|D|[DealId]|G|[GameId]. This way, when you get a record to update, you automatically know how to construct the RowKey instead of having to fetch the data from the table first.
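As a rough sketch of writing one of the duplicate rows with the @azure/data-tables package (the client setup, padding helper and key layout here are assumptions, not a prescribed design):
const { TableClient } = require("@azure/data-tables");

const byTableId = TableClient.fromConnectionString(connectionString, "GameLogsByTableId");
const pad = (id) => id.toString().padStart(11, "0");

async function logMove(tableId, gameId, dealId, userId, move) {
  await byTableId.createEntity({
    partitionKey: `T|${pad(tableId)}`,
    rowKey: `U|${pad(userId)}|D|${pad(dealId)}|G|${pad(gameId)}`,
    move: JSON.stringify(move),
  });
  // ...repeat for GameLogsByGameId, GameLogsByUserId and GameLogsByDealId
}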
Partition Scans
I looked at your querying requirements and almost all of them would result in Partition Scans. To avoid that, I would suggest keeping even more duplicate copies of the data. For example, consider #3 and #4 in your querying requirements. In this case, you will need to scan the entire partition for a user to find information about a Game Id and Deal Id. So please be prepared for the scenario where table service returns you nothing but continuation tokens.
Personally, unless you have absolutely massive data requirements, I would not use table storage for this. It will make your job much harder than using a SQL database, where you can use any index you like, have relational integrity, and so much more. The only thing in favour of ATS is that it's cheap for large amounts of data.
I have a classified advertisements website à la craigslist.org, just on a much smaller scale.
I'm running MongoDB and am caching all API requests in Redis, where the Mongo query is the key and the value is the MongoDB result document.
Pseudo code:
// The mongo query
var query = {section: 'home', category: 'garden', region: 'APAC', country: 'au', city: 'sydney', limit: 100}
// Getting the mongo result..
// Storing in Redis (both key and value serialized as JSON strings)
redisClient.set(JSON.stringify(query), JSON.stringify(result));
Now a user creates a new post in the same category, but Redis now serves up a stale record because Redis has no idea that the dataset has changed.
How can we overcome this?
I could set an expiry in general or on that particular key, but essentially the cached keys need to expire the moment a user creates a new post, at least for those keys whose result set would include the newly created record.
One way is to iterate through all the Redis keys and come up with a pattern to detect which keys should be deleted, based on the characteristics of the newly created record. But this approach seems "too clever" and not quite right.
So we want to cache in memory while still serving up fresh content instantly.
I would avoid caching bulk query results as a single key. Redis is for use cases where you need to access and update data at very high frequency and where you benefit from use of data structures such as hashes, sets, lists, strings, or sorted sets [1]. Also, keep in mind MongoDB will already have part of the database cached in memory so you might not see much in the way of performance gains.
A better approach would be to cache each post individually. You can add keys to sets to group them into categories or even just pages (like the 20 or so posts that the user expects to see on each page). This way, every time a user makes a new post or updates an existing one, you can update the corresponding key in your Redis cache as well.
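A minimal sketch of that per-post approach (the key and set names are just illustrative):
// cache each post under its own key and index it in a set per category/region
const postKey = `post:${post.id}`;
const listKey = `posts:au:sydney:home:garden`;
redisClient.set(postKey, JSON.stringify(post));
redisClient.sadd(listKey, postKey);

// serving a category page: read the set, then fetch the members in one round trip
redisClient.smembers(listKey, (err, keys) => {
  redisClient.mget(keys, (err, values) => {
    const posts = values.map(v => JSON.parse(v));
  });
});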