Caching whether or not an item in a database exists - node.js

I have a simple caching mechanism: I check whether an item is in the cache (table:id#id_number), and if not, I check whether it is in the database. If it is in the database, I cache it; if it is not, I obviously don't.
The issue is that, in my current situation, this is going to happen frequently. Every time someone visits the front page, I check whether id 10 exists, then whether id 9 exists, and so on.
If id 9 doesn't exist, nothing is cached for it, so my server hits the database every time someone visits the front page.
My current solution is dirty and could easily lead to confusion later. I now cache whether an id probably exists in the database (pexists:table:id#id_number). If that key says the item probably doesn't exist, or the key isn't set, I assume the item doesn't exist. If it probably exists, I check the cache for the item; only on a cache miss do I hit the database, and I then cache both the item and whether it exists.
Is there a better way of achieving this?
/*
This method takes an amount (how many posts you need) and start
(the post id we want to start at). If, for example,
the DB has ids 1, 2, 4, 7, 9, and we ask for
amount=3 and start=7, we should return the items
with ids 7, 4, and 2.
*/
const parametersValidity = await this.assureIsValidPollsRequestAsync(
    amount,
    start
);
const validParameters = parametersValidity.result;
if (!validParameters) {
    throw new Error(parametersValidity.err);
}

let polls = [];
for (
    let i = start, pollId = start;
    i > start - amount && pollId > 0;
    i--, pollId--
) {
    // There is probably no poll logged in the database.
    if (!(await this.pollProbablyExists(pollId))) {
        i++;
        continue;
    }

    let poll;
    const cacheHash = `polls:id#${pollId}`;
    if (await cache.existsAsync(cacheHash)) {
        poll = await this.findKeysFromCacheHash(
            cacheHash,
            "Id", "Title", "Description", "Option1", "Option2", "AuthorId"
        );
    } else {
        // Simulates a slow database retrieval for testing.
        await new Promise(resolve => {
            setTimeout(resolve, 500);
        });
        poll = await this.getPollById(pollId);
        if (typeof poll !== "undefined") {
            this.insertKeysIntoCacheHash(
                cacheHash, 60 * 60 * 3 /* 3 hours to expire */,
                poll
            );
        }
    }

    if (typeof poll === "undefined") {
        // This would happen if a user deleted a poll
        // when the exists:polls:id#poll_id key in cache
        // hadn't expired yet.
        cache.setAsync(`exists:${cacheHash}`, 0);
        cache.expire(`exists:${cacheHash}`, 60 * 60 * 10 /* 10 hours. */);
        i++;
    } else {
        polls.push(poll);
        cache.setAsync(`exists:${cacheHash}`, 1);
    }
}
return polls;

If I understand correctly, you want to avoid those non-existent keys hitting the database frequently.
If that's the case, a better and simpler way is to cache the non-existent keys.
If the key exists in the database, cache it in Redis with the value fetched from the database.
If the key doesn't exist in the database, cache it too, but with a special sentinel value.
For example, say you cache players' scores in Redis. If a player doesn't exist, you still cache the key, but with -1 as the score. When searching the cache, if the key exists but the cached value is -1, that means the key doesn't exist in the database.
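A minimal sketch of this negative-caching idea, using an in-memory Map to stand in for Redis and -1 as the sentinel; the names `getPoll`, `db`, and `dbHits` are illustrative, not from the original code:

```javascript
// Negative caching: misses from the database are cached too, under a
// sentinel value, so repeated lookups for absent ids stay cheap.
const MISSING = -1; // sentinel meaning "confirmed absent in the database"

const cache = new Map();             // stand-in for Redis
const db = new Map([[7, "poll 7"]]); // stand-in for the real database
let dbHits = 0;                      // counts real database lookups

function getPoll(id) {
  if (cache.has(id)) {
    const cached = cache.get(id);
    return cached === MISSING ? undefined : cached;
  }
  dbHits++; // only reached on a true cache miss
  const row = db.get(id); // undefined when the row is absent
  cache.set(id, row === undefined ? MISSING : row);
  return row;
}
```

With this, two lookups for a missing id hit the database only once; with real Redis you would also put a TTL on the sentinel key so rows deleted or created later are eventually picked up.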


ioredis allowing to create repeated keys

I'm trying to LOCK a key for a scheduler, but ioredis allows me to create multiple keys with the same name.
import * as redis from '../redis'
const redCli = redis.get(); // get function that starts ioredis
scheduleJob('my-job', '*/05 * * * * *', async () => {
const key = await redCli.set('my-key', 'key-value', 'EX', 30); // 30 seconds key lifetime
console.log(`KEY: ${key}`); // always log 'OK'
if (!key) {
// log error and return. NEVER gets here.
}
// ALSO TRIED:
if (redCli.exists('my-key')...
if (await redCli.ttl('my-key')...
// continue the flow...
});
I create a key with a 30-second lifetime, and my scheduler runs every 5 seconds.
When I try to redCli.set() a key that already exists, shouldn't it return an error or false? Anything but 'OK'...
Not sure why my attempts didn't work, but I solved this using get
const cachedKey = await redisClient.get('my-key');
if(cachedKey) {
// Key Already Exists
}
Like most mutating commands in Redis, the SET command in Redis is an upsert. That's just how Redis works.
From the docs for the SET command:
If key already holds a value, it is overwritten, regardless of its type. Any previous time to live associated with the key is discarded on successful SET operation.
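If you want SET to fail when the key already exists, pass the NX flag (with ioredis: `redCli.set('my-key', 'key-value', 'EX', 30, 'NX')`), which returns null instead of 'OK' when the key is already present. A sketch of that conditional-set semantics, using a Map to stand in for Redis:

```javascript
// SET ... NX semantics: write only if the key does not already exist.
// Returns "OK" on success and null when the key was already present,
// mirroring what ioredis returns for set(key, value, "EX", ttl, "NX").
const store = new Map(); // stand-in for Redis

function setNX(key, value) {
  if (store.has(key)) return null; // key exists: refuse, like SET NX
  store.set(key, value);
  return "OK";
}

// A lock acquisition succeeds only for the first caller.
function acquireLock(key) {
  return setNX(key, "locked") === "OK";
}
```

In real Redis the EX option gives the lock a lifetime, so a crashed worker cannot hold it forever.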

mssql nodejs operation timed out for an unknown reason

I am trying to run several hundred thousand sql update queries using node/mssql. I am trying to:
insert each record individually (if one fails I don't want the batch to fail)
batch the queries so I don't overload the SQL server (I can open a new connection for every query but the server explodes if I do that)
With my existing code (which works 99% of the time) I occasionally get operation timed out for an unknown reason, and I'm hoping someone can suggest a fix or improvements.
this is what I have:
try {
    const sql = require("mssql");
    let pool = await new sql.connect(CONFIG_OBJ);
    let batchSize = 1000;
    let errs = [];
    let queries = [
        `update xxx set [AwsCoID]='10118' where [PrimaryKey]='10118-78843' IF @@ROWCOUNT=0 insert into xxx([AwsCoID]) values('10118')`,
        `update or insert 2`,
        `update or insert 3`, ....];
    for (let i = 0; i < queries.length; i += batchSize) {
        let prom = queries
            .slice(i, i + batchSize)
            .map((qq) => pool.request().query(qq));
        for (let p of await (Promise as any).allSettled(prom)) {
            // make sure connection is still active after batch finishes
            pool = await new sql.connect(CONFIG_OBJ);
            //console.error(`promerr:`, p);
            let status: "fulfilled" | "rejected" = p.status;
            let value = p.value as SqlResult;
            if (status != "fulfilled" || !value.isSuccess) {
                console.log(`batchRunSqlCommands() promERR:`, value);
                errs.push(value);
            }
        }
    }
} catch (e) {
    console.log(`batchSqlCommand err:`, e);
} finally {
    pool.close();
}
For anyone else who writes something like I did, the issue is that SQL Server takes a lock on the affected rows when doing an upsert. The fix is to add a clustered index that ensures each record being updated is in its own cluster, so the cluster gets locked but only one row is modified within the cluster at a time.
TL;DR: set a "line unique" column (e.g. PrimaryKey) as the clustered index on the table.
This is not good for DB performance, but will quickly and simply solve the issue. You could also intelligently cluster groups of data, but then you would need to ensure your batch update only touches each cluster once and finishes before trying to access it again.
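The batching loop from the question can be sketched generically: here a hypothetical `runQuery` stands in for `pool.request().query(...)`, and Promise.allSettled collects individual failures without aborting the rest of the batch:

```javascript
// Run queries in fixed-size batches. Promise.allSettled lets individual
// statements fail without rejecting the whole batch; failures are
// collected and returned to the caller.
async function runInBatches(queries, runQuery, batchSize = 1000) {
  const errs = [];
  for (let i = 0; i < queries.length; i += batchSize) {
    const settled = await Promise.allSettled(
      queries.slice(i, i + batchSize).map(q => runQuery(q))
    );
    for (const p of settled) {
      if (p.status === "rejected") errs.push(p.reason);
    }
  }
  return errs;
}
```

Unlike Promise.all, allSettled never short-circuits, which matches the requirement that one failed record must not fail the batch.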

Azure Search .net SDK- How to use "FindFailedActionsToRetry"?

Using the Azure Search .net SDK, when you try to index documents you might get an exception IndexBatchException.
From the documentation here:
try
{
    var batch = IndexBatch.Upload(documents);
    indexClient.Documents.Index(batch);
}
catch (IndexBatchException e)
{
    // Sometimes when your Search service is under load, indexing will fail for some of the
    // documents in the batch. Depending on your application, you can take compensating
    // actions like delaying and retrying. For this simple demo, we just log the failed
    // document keys and continue.
    Console.WriteLine(
        "Failed to index some of the documents: {0}",
        String.Join(", ", e.IndexingResults.Where(r => !r.Succeeded).Select(r => r.Key)));
}
How should e.FindFailedActionsToRetry be used to create a new batch to retry the indexing for failed actions?
I've created a function like this:
public void UploadDocuments<T>(SearchIndexClient searchIndexClient, IndexBatch<T> batch, int count) where T : class, IMyAppSearchDocument
{
    try
    {
        searchIndexClient.Documents.Index(batch);
    }
    catch (IndexBatchException e)
    {
        if (count == 5) // we will try to index 5 times and give up if it still doesn't work.
        {
            throw new Exception("IndexBatchException: Indexing Failed for some documents.");
        }
        Thread.Sleep(5000); // we got an error, wait 5 seconds and try again (in case it's an intermittent or network issue)
        var retryBatch = e.FindFailedActionsToRetry<T>(batch, arg => arg.ToString());
        UploadDocuments(searchIndexClient, retryBatch, count + 1); // count++ would pass the old value and never reach the limit
    }
}
But I think this part is wrong:
var retryBatch = e.FindFailedActionsToRetry<T>(batch, arg => arg.ToString());
The second parameter to FindFailedActionsToRetry, named keySelector, is a function that should return whatever property on your model type represents your document key. In your example, your model type is not known at compile time inside UploadDocuments, so you'll need to change UploadDocuments to also take the keySelector parameter and pass it through to FindFailedActionsToRetry. The caller of UploadDocuments would need to specify a lambda specific to type T. For example, if T is the sample Hotel class from the sample code in this article, the lambda must be hotel => hotel.HotelId since HotelId is the property of Hotel that is used as the document key.
Incidentally, the wait inside your catch block should not be a constant amount of time. If your search service is under heavy load, a constant delay won't really give it time to recover. Instead, we recommend backing off exponentially (e.g., the first delay is 2 seconds, then 4 seconds, then 8 seconds, then 16 seconds, up to some maximum).
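The backoff schedule described above (2s, 4s, 8s, ..., capped at a maximum) can be computed by a small helper; the base delay and cap here are illustrative:

```javascript
// Exponential backoff with a cap: 2s, 4s, 8s, 16s, ... up to maxMs.
// attempt is 1-based (the first retry is attempt 1).
function backoffDelayMs(attempt, baseMs = 2000, maxMs = 60000) {
  return Math.min(baseMs * Math.pow(2, attempt - 1), maxMs);
}
```

Adding a small random jitter on top of each delay also helps avoid many clients retrying in lockstep.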
I've taken Bruce's recommendations in his answer and comment and implemented it using Polly.
Exponential backoff up to one minute, after which it retries every other minute.
Retry as long as there is progress. Timeout after 5 requests without any progress.
IndexBatchException is also thrown for unknown documents. I chose to ignore such non-transient failures since they are likely indicative of requests which are no longer relevant (e.g., removed document in separate request).
int curActionCount = work.Actions.Count();
int noProgressCount = 0;
await Polly.Policy
.Handle<IndexBatchException>() // One or more of the actions has failed.
.WaitAndRetryForeverAsync(
// Exponential backoff (2s, 4s, 8s, 16s, ...) and constant delay after 1 minute.
retryAttempt => TimeSpan.FromSeconds( Math.Min( Math.Pow( 2, retryAttempt ), 60 ) ),
(ex, _) =>
{
var batchEx = ex as IndexBatchException;
work = batchEx.FindFailedActionsToRetry( work, d => d.Id );
// Verify whether any progress was made.
int remainingActionCount = work.Actions.Count();
if ( remainingActionCount == curActionCount ) ++noProgressCount;
curActionCount = remainingActionCount;
} )
.ExecuteAsync( async () =>
{
// Limit retries if no progress is made after multiple requests.
if ( noProgressCount > 5 )
{
throw new TimeoutException( "Updating Azure search index timed out." );
}
// Only retry if the error is transient (determined by FindFailedActionsToRetry).
// IndexBatchException is also thrown for unknown document IDs;
// consider them outdated requests and ignore.
if ( curActionCount > 0 )
{
await _search.Documents.IndexAsync( work );
}
} );

Concurrent writing to redis in node.js

In my node.js application I read messages from an AWS Kinesis stream, and I need to store all messages from the last minute in a cache (Redis). I run the following code in a single Node worker:
var loopCallback = function(record) {
    var nowMinute = moment.utc(record.Data.ts).minute();
    // get all cached kinesis records
    var key = "kinesis";
    cache.get(key, function (err, cachedData) {
        if (err) {
            utils.logError(err);
        } else {
            if (!cachedData) {
                cachedData = [];
            } else {
                cachedData = JSON.parse(cachedData);
            }
            // get records with the same minute
            var filtered = _.filter(cachedData, function (item) {
                return moment.utc(item.ts).minute() === nowMinute;
            });
            filtered.push(record.Data);
            cache.set(key, JSON.stringify(filtered), function (saveErr) {
                if (saveErr) {
                    utils.logError(saveErr);
                }
                // do other things with record;
            });
        }
    });
};
Most of the records (a few dozen) arrive at exactly the same moment, so when I try to save them, some records are not stored.
I understand this happens due to a race condition: Node reads the old version of the array from Redis and overwrites it while another record is being written to the cache.
I have read about Redis transactions, but as I understand it, they won't help me, because only one transaction will complete and the others will be rejected.
Is there a way to save all records to the cache in my case?
Thank you
You could use a sorted set, with the score being a Unix timestamp:
ZADD kinesis <unixtimestamp> "some data to be cached"
To get the elements added less than one minute ago, create a timestamp for (now - 60 seconds), then use ZRANGEBYSCORE to get the oldest element first:
ZRANGEBYSCORE kinesis (timestamp +inf
or ZREVRANGEBYSCORE if you want the newest element first:
ZREVRANGEBYSCORE kinesis +inf (timestamp
To remove the elements older than one minute, create a timestamp for (now - 60 seconds), then use ZREMRANGEBYSCORE:
ZREMRANGEBYSCORE kinesis -inf (timestamp
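The same windowing logic, sketched with a plain array standing in for the sorted set (scores are millisecond timestamps; the helper names are illustrative). Because each message becomes an independent ZADD, there is no read-modify-write cycle and therefore no race:

```javascript
// A tiny stand-in for a Redis sorted set keyed by timestamp.
const zset = []; // entries: { score, member }

// Equivalent of: ZADD kinesis <timestamp> <member>
function zadd(score, member) {
  zset.push({ score, member });
}

// Equivalent of: ZRANGEBYSCORE kinesis (cutoff +inf
// "(" makes the bound exclusive, so only strictly newer entries match.
function lastMinute(nowMs) {
  const cutoff = nowMs - 60 * 1000;
  return zset.filter(e => e.score > cutoff).map(e => e.member);
}

// Equivalent of: ZREMRANGEBYSCORE kinesis -inf (cutoff
function trim(nowMs) {
  const cutoff = nowMs - 60 * 1000;
  for (let i = zset.length - 1; i >= 0; i--) {
    if (zset[i].score < cutoff) zset.splice(i, 1);
  }
}
```

In production you would call trim periodically (or before each read) so the set never grows without bound.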

With bookshelf.js, how do I update a model record atomically?

I have a one-to-many relationship with a JOB model and many TASK(s). I have a route for individual tasks, where I fetch the TASK model for display, along with some data from its JOB model. When I request a TASK, I need to update the locked and user_id fields, so I can lock the task and show who has it locked, preventing other users from accessing that task view. Therefore, I need a guarantee that the task has locked=0, and I need to atomically update that field with a timestamp.
My current router code is:
var route_task = function(req, res, next) {
    new Model.Task({id: req.params.id}).fetch({withRelated: ['jobs']})
        .then(function(task) {
            if (!task) {
                res.status(404);
                res.render('404_tpl');
                return;
            }
            if (task.get('locked') !== 0) {
                res.status(403);
                res.render('403_tpl', {title: '403 - Access Denied - Task is locked'});
                return;
            } else {
                /* I would update it here, but there's a slim */
                /* chance someone else can come in and select */
                /* the task. */
            }
            /* .. I set some res.locals vals from my task here .. */
            var jobs = task.related('jobs');
            jobs.fetch().then(function(job) {
                /* .. I set some res.local vals here from my jobs .. */
                switch (task.get('task_type')) {
                    case 0:
                        res.render('task_alpha_tpl');
                        break;
                    /* ... */
                }
            });
        });
};
When I hit my router for a particular task ID, I pretty much want to select * where tasks.id = id and locked = 0, and then set locked with the current timestamp, but, I need to be able to determine if the record with that ID didn't exist, or if it did, but was just locked.
I hope this makes sense. I'm coming from the C and PHP world, so I'm slowly learning async programming.
I think you should do it in a transaction if you want the value not to change; I don't think you should use a semaphore or other client-side constructs to simulate a critical section.
The general purpose solution for this problem is to run something in the form of:
UPDATE task SET locked=1,user=? WHERE job=? AND locked=0
And then check that the update actually modified at least one row.
If you're doing that in node.js, then I would do something like:
Tasks.forge().where({job: req.param('code'), locked:0}).save({locked:1},{method:"update"});
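To actually distinguish "didn't exist" from "was locked", you need the affected-row count from the conditional UPDATE; a sketch of that decision logic, with a plain in-memory table standing in for the database (the helper names are illustrative, not bookshelf API):

```javascript
// Conditional lock: UPDATE tasks SET locked=?, user_id=?
//                   WHERE id=? AND locked=0
// then branch on how many rows were changed.
const tasks = [{ id: 1, locked: 0, user_id: null }];

// Stand-in for the conditional UPDATE; returns the number of rows
// modified, like knex's .update() promise resolves to.
function tryLock(id, userId, ts) {
  const row = tasks.find(t => t.id === id && t.locked === 0);
  if (!row) return 0;
  row.locked = ts;
  row.user_id = userId;
  return 1;
}

function lockOutcome(id, userId, ts) {
  if (tryLock(id, userId, ts) === 1) return "locked";
  // 0 rows affected: a second read tells us whether the task was
  // missing entirely or merely locked by someone else.
  return tasks.some(t => t.id === id) ? "already-locked" : "not-found";
}
```

The WHERE clause makes the check-and-set a single atomic statement in the database, so two concurrent requests can never both see locked=0.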
