ioredis allowing to create repeated keys - node.js

Trying to LOCK a key for a scheduler, but ioredis is allowing me to create multiple keys with the same name.
import * as redis from '../redis'
const redCli = redis.get(); // get function that starts ioredis

scheduleJob('my-job', '*/05 * * * * *', async () => {
  const key = await redCli.set('my-key', 'key-value', 'EX', 30); // 30 seconds key lifetime
  console.log(`KEY: ${key}`); // always logs 'OK'
  if (!key) {
    // log error and return. NEVER gets here.
  }
  // ALSO TRIED:
  if (redCli.exists('my-key')...
  if (await redCli.ttl('my-key')...
  // continue the flow...
});
I create a key with 30 seconds of lifetime. And my scheduler runs every 5 seconds.
When I try to redCli.set() a key that already exists, shouldn't it return an ERROR/FALSE? Anything but 'OK'...

Not sure why my attempts didn't work, but I solved this using get:
const cachedKey = await redisClient.get('my-key');
if (cachedKey) {
  // Key Already Exists
}

Like most mutating commands in Redis, SET is an upsert. That's just how Redis works.
From the docs for the SET command:
If key already holds a value, it is overwritten, regardless of its type. Any previous time to live associated with the key is discarded on successful SET operation.
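So if you want SET to fail when the key already exists (which is what a lock needs), pass the NX option; ioredis then resolves with null instead of 'OK'. A minimal sketch, reusing the redCli client from the question:

const acquired = await redCli.set('my-key', 'key-value', 'EX', 30, 'NX');
if (acquired !== 'OK') {
  // another run still holds the lock; skip this one
  return;
}
// safe to continue the flow...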

Related

what does the function incrementTransactionNumber() do in mongodb node driver?

I know the function's name seems self-explanatory; however, after researching for quite a while I can't find a transaction number anywhere within a clientSession.
Is it an internal number? Is it possible to get it?
Transaction numbers are used by mongodb to keep track of operations (reads/writes) per transaction per session. Sessions can be started either explicitly by calling startSession() or implicitly whenever you create a mongodb connection to the db server.
How incrementTransactionNumber() works with sessions (explicit)
When you start a session by calling the client.startSession() method, it will create a new ClientSession. This takes the already created server session pool as one of its constructor parameters. (See) These server sessions have a property called txnNumber which is initialized to 0. (Init) So whenever you start a transaction by calling startTransaction(), the client session object calls incrementTransactionNumber() internally to increment the txnNumber in the server session. All the successive operations will use the same txnNumber until you call the commitTransaction() or abortTransaction() methods. The reason you can't find it anywhere within clientSession is that it is a property of serverSession, not clientSession.
ServerSession
class ServerSession {
  constructor() {
    this.id = { id: new Binary(uuidV4(), Binary.SUBTYPE_UUID) };
    this.lastUse = now();
    this.txnNumber = 0;
    this.isDirty = false;
  }
}
So whenever you try to send a command to the database (read/write), this txnNumber is sent along with it. (Assign transaction number to command)
This is to keep track of database operations that belong to a given transaction per session. (A transaction operation history that uniquely identifies each transaction per session.)
How incrementTransactionNumber() works with sessions (implicit)
In this case it would be called every time a new command is issued to the database, if that command does not belong to a transaction and is a write operation with retryWrites enabled. So each new write operation would have a new transaction number as long as it does not belong to an explicitly started transaction with startTransaction(). But in this case as well, a txnNumber would be sent along with each command.
execute_operation.
const willRetryWrite =
  topology.s.options.retryWrites === true &&
  session &&
  !inTransaction &&
  supportsRetryableWrites(server) &&
  operation.canRetryWrite;

if (
  operation.hasAspect(Aspect.RETRYABLE) &&
  ((operation.hasAspect(Aspect.READ_OPERATION) && willRetryRead) ||
    (operation.hasAspect(Aspect.WRITE_OPERATION) && willRetryWrite))
) {
  if (operation.hasAspect(Aspect.WRITE_OPERATION) && willRetryWrite) {
    operation.options.willRetryWrite = true;
    session.incrementTransactionNumber();
  }
  operation.execute(server, callbackWithRetry);
  return;
}

operation.execute(server, callback);
Also read this article. And yes, if you need it you can get the transaction number for any session through the txnNumber property: clientSession.serverSession.txnNumber.
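A quick illustration (a sketch assuming a connected MongoClient named client):

const session = client.startSession();
console.log(session.serverSession.txnNumber); // 0 on a fresh server session
session.startTransaction();
console.log(session.serverSession.txnNumber); // 1 - incremented internally
await session.abortTransaction();
session.endSession();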

Run a Cron Job every 30mins after onCreate Firestore event

I want to have a cron job/scheduler that will run 30 minutes after an onCreate event occurs in Firestore. The cron job should trigger a cloud function that picks the documents created in the last 30 minutes, validates them against a JSON schema, and saves them in another collection. How do I achieve this by programmatically writing such a scheduler?
What would also be a fail-safe mechanism, and some sort of queuing/tracking of the documents created before the cron job runs, to push them to another collection?
Building a queue with Firestore is simple and fits your use case perfectly. The idea is to write tasks to a queue collection with a due date; they will then be processed when they become due.
Here's an example.
Whenever your initial onCreate event for your collection occurs, write a document with the following data to a tasks collection:
duedate: new Date() + 30 minutes
type: 'yourjob'
status: 'scheduled'
data: '...' // <-- put whatever data here you need to know when processing the task
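For example, the initial trigger could write that task document like this (a sketch; the items collection name and the data payload are assumptions):

export const onItemCreate = functions.firestore
  .document('items/{itemId}')
  .onCreate((snapshot, context) => {
    // due in 30 minutes
    const duedate = admin.firestore.Timestamp.fromMillis(Date.now() + 30 * 60 * 1000);
    return db.collection('tasks').add({
      duedate,
      type: 'yourjob',
      status: 'scheduled',
      data: { itemId: context.params.itemId },
    });
  });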
Have a worker pick up available work regularly, e.g. every minute depending on your needs:
// Define what happens on what task type
const workers: Workers = {
  yourjob: (data) => db.collection('xyz').add({ foo: data }),
}
// The following needs to be scheduled
export const checkQueue = functions.https.onRequest(async (req, res) => {
  // Consistent timestamp
  const now = admin.firestore.Timestamp.now();
  // Check which tasks are due
  const query = db.collection('tasks').where('duedate', '<=', now).where('status', '==', 'scheduled');
  const tasks = await query.get();
  // Process tasks and mark them in the queue as done
  const jobs: Promise<unknown>[] = [];
  tasks.forEach(snapshot => {
    const { type, data } = snapshot.data();
    console.info('Executing job for task ' + JSON.stringify(type) + ' with data ' + JSON.stringify(data));
    const job = workers[type](data)
      // Update task doc with status or error
      .then(() => snapshot.ref.update({ status: 'complete' }))
      .catch((err) => {
        console.error('Error when executing worker', err);
        return snapshot.ref.update({ status: 'error' });
      });
    jobs.push(job);
  });
  return Promise.all(jobs).then(() => {
    res.send('ok');
    return true;
  }).catch((onError) => {
    console.error('Error', onError);
  });
});
You have different options to trigger the checking of the queue if there is a task that is due:
Using an HTTP function as in the example above. This requires you to call this function regularly so it executes and checks whether there is a task to be done. Depending on your needs you could do it from your own server or use a service like cron-job.org to perform the calls. Note that the HTTP function will be publicly available, and potentially others could also call it. However, if you make your check code idempotent, it shouldn't be an issue.
Use the Firebase "internal" cron option that uses Cloud Scheduler internally. Using that you can directly trigger the queue checking:
export const scheduledFunctionCrontab =
  functions.pubsub.schedule('* * * * *').onRun((context) => {
    console.log('This will be run every minute!');
    // Include code from checkQueue here from above
  });
Using such a queue also makes your system more robust: if something goes wrong in between, you will not lose tasks that would otherwise only exist in memory; as long as they are not marked as processed, a fixed worker will pick them up and reprocess them. This of course depends on your implementation.
You can trigger a cloud function on the Firestore Create event which will schedule a Cloud Task to run after 30 minutes. This gives you a queuing and retry mechanism, as sketched below.
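A rough sketch of that approach with the @google-cloud/tasks client (project, location, queue name, and target URL are placeholders):

import { CloudTasksClient } from '@google-cloud/tasks';
const tasksClient = new CloudTasksClient();

export const scheduleValidation = functions.firestore
  .document('items/{itemId}')
  .onCreate(async (snapshot, context) => {
    const parent = tasksClient.queuePath('my-project', 'us-central1', 'my-queue');
    await tasksClient.createTask({
      parent,
      task: {
        scheduleTime: { seconds: Math.floor(Date.now() / 1000) + 30 * 60 }, // in 30 minutes
        httpRequest: {
          httpMethod: 'POST',
          url: 'https://example-region-my-project.cloudfunctions.net/processDoc',
          headers: { 'Content-Type': 'application/json' },
          body: Buffer.from(JSON.stringify({ itemId: context.params.itemId })).toString('base64'),
        },
      },
    });
  });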
An easy way is to add a created field with a timestamp, and then have a scheduled function run at a predefined period (say, once a minute) and execute certain code for all records where created >= NOW - 31 mins AND created <= NOW - 30 mins (pseudocode). If your time precision requirements are not extremely high, that should work for most cases.
If this doesn't suit your needs, you can add a Cloud Task (Google Cloud product). The details are specified in this good article.

Do not process next job until previous job is completed (BullJS/Redis)?

Basically, each of the clients (which have a clientId associated with them) can push messages, and it is important that a second message from the same client isn't processed until the first one is finished processing (even though the client can send multiple messages in a row, and they are ordered, and multiple clients sending messages should ideally not interfere with each other). And, importantly, a job shouldn't be processed twice.
I thought that using Redis I might be able to fix this issue. I started with some quick prototyping using the bull library, but I am clearly not doing it well; I was hoping someone would know how to proceed.
This is what I tried so far:
Create jobs and add them to the same queue name for one process, using the clientId as the job name.
Consume jobs while waiting large random amounts of time, on 2 separate processes.
I tried adding the default locking provided by the library that I am using (bull), but it locks on the jobId, which is unique for each job, not on the clientId.
What I would want to happen:
One of the consumers can't take the job from the same clientId until the previous one is finished processing it.
They should be able to, however, get items from different clientIds in parallel without problem (asynchronously). (I haven't gotten this far, I am right now simply dealing with only one clientId)
What I get:
Both consumers consume as many items as they can from the queue without waiting for the previous item for the clientId to be completed.
Is Redis even the right tool for this job?
Example code
// ./setup.ts
import Queue from 'bull';
import * as uuid from 'uuid';

// Check that when a message is taken from a place, no other message is taken
// To do that test, have two processes that process messages and one that sets messages, and make the job take a long time
// queue for each room https://stackoverflow.com/questions/54178462/how-does-redis-pubsub-subscribe-mechanism-works/54243792#54243792
// https://groups.google.com/forum/#!topic/redis-db/R09u__3Jzfk
// Make a job not be called stalled, waiting enough time https://github.com/OptimalBits/bull/issues/210#issuecomment-190818353

export async function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => {
    setTimeout(resolve, ms);
  });
}

export interface JobData {
  id: string;
  v: number;
}

export const queue = new Queue<JobData>('messages', 'redis://127.0.0.1:6379');

queue.on('error', (err) => {
  console.error('Uncaught error on queue.', err);
  process.exit(1);
});

export function clientId(): string {
  return uuid.v4();
}

export function randomWait(minms: number, maxms: number): Promise<void> {
  const ms = Math.random() * (maxms - minms) + minms;
  return sleep(ms);
}

// Make a job not be called stalled, waiting enough time https://github.com/OptimalBits/bull/issues/210#issuecomment-190818353
// eslint-disable-next-line @typescript-eslint/ban-ts-comment
// @ts-ignore
queue.LOCK_RENEW_TIME = 5 * 60 * 1000;
// ./create.ts
import { queue, randomWait } from './setup';

const MIN_WAIT = 300;
const MAX_WAIT = 1500;

async function createJobs(n = 10): Promise<void> {
  await randomWait(MIN_WAIT, MAX_WAIT);
  // always same Id
  const clientId = Math.random() > 1 ? 'zero' : 'one';
  for (let index = 0; index < n; index++) {
    await randomWait(MIN_WAIT, MAX_WAIT);
    const job = { id: clientId, v: index };
    await queue.add(clientId, job).catch(console.error);
    console.log('Added job', job);
  }
}

export async function create(nIds = 10, nItems = 10): Promise<void> {
  const jobs = [];
  await randomWait(MIN_WAIT, MAX_WAIT);
  for (let index = 0; index < nIds; index++) {
    await randomWait(MIN_WAIT, MAX_WAIT);
    jobs.push(createJobs(nItems));
    await randomWait(MIN_WAIT, MAX_WAIT);
  }
  await randomWait(MIN_WAIT, MAX_WAIT);
  await Promise.all(jobs);
  process.exit();
}

(function mainCreate(): void {
  create().catch((err) => {
    console.error(err);
    process.exit(1);
  });
})();
// ./consume.ts
import { queue, randomWait, clientId } from './setup';

function startProcessor(minWait = 5000, maxWait = 10000): void {
  queue
    .process('*', 100, async (job) => {
      console.log('LOCKING: ', job.lockKey());
      await job.takeLock();
      const name = job.name;
      const processingId = clientId().split('-', 1)[0];
      try {
        console.log('START: ', processingId, '\tjobName:', name);
        await randomWait(minWait, maxWait);
        const data = job.data;
        console.log('PROCESSING: ', processingId, '\tjobName:', name, '\tdata:', data);
        await randomWait(minWait, maxWait);
        console.log('PROCESSED: ', processingId, '\tjobName:', name, '\tdata:', data);
        await randomWait(minWait, maxWait);
        console.log('FINISHED: ', processingId, '\tjobName:', name, '\tdata:', data);
      } catch (err) {
        console.error(err);
      } finally {
        await job.releaseLock();
      }
    })
    .catch(console.error); // Catches initialization

startProcessor();
This is run using 3 different processes, which you might call like this (although I use different tabs for a clearer view of what is happening):
npx ts-node consume.ts &
npx ts-node consume.ts &
npx ts-node create.ts &
I'm not familiar with node.js. But for Redis, I would try this.
Let's say you have client_1 and client_2; they are both publishers of events.
You have three machines: consumer_1, consumer_2, consumer_3.
Establish a list of tasks in Redis, e.g., JOB_LIST.
Clients put (LPUSH) jobs into this JOB_LIST in a specific form, like "CLIENT_1:[jobcontent]", "CLIENT_2:[jobcontent]".
Each consumer takes jobs off the list (with Redis's RPOP command, or BRPOP to block until one is available) and processes them.
For example, consumer_1 takes out a job whose content is CLIENT_1:[jobcontent]. It parses the content and recognizes it's from CLIENT_1. Then it wants to check whether some other consumer is already processing CLIENT_1; if not, it will lock the key to indicate that it's processing CLIENT_1.
It goes on to set a key "CLIENT_1_PROCESSING", with content "consumer_1", using the Redis SETNX command (set if the key does not exist), with an appropriate timeout. For example, if the task normally takes one minute to finish, set the key's timeout to five minutes, just in case consumer_1 crashes and holds the lock indefinitely.
If SETNX returns 0, it means it failed to acquire the lock on CLIENT_1 (someone is already processing a job of client_1). It then returns the job (the value "CLIENT_1:[jobcontent]") to the left side of JOB_LIST using the Redis LPUSH command. Then it might wait a bit (sleep a few seconds) and RPOP another task from the right side of the list. If this time SETNX returns 1, consumer_1 acquires the lock. It goes on to process the job; after it finishes, it deletes the key "CLIENT_1_PROCESSING", releasing the lock. Then it goes on to RPOP another job, and so on.
Some things to consider:
The JOB_LIST is not fair; e.g., earlier jobs might be processed later.
The locking part is a bit rudimentary, but it will suffice.
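A sketch of one consumer iteration in node.js with ioredis (processJob is an assumed handler):

import Redis from 'ioredis';
const redis = new Redis('redis://127.0.0.1:6379');

async function consumeOnce(): Promise<void> {
  // BRPOP blocks until a job arrives (timeout 0 = wait forever)
  const res = await redis.brpop('JOB_LIST', 0);
  if (!res) return;
  const job = res[1]; // e.g. "CLIENT_1:[jobcontent]"
  const client = job.split(':')[0];
  // SET ... EX ... NX: take the per-client lock with a 5-minute safety timeout
  const locked = await redis.set(`${client}_PROCESSING`, 'consumer_1', 'EX', 300, 'NX');
  if (!locked) {
    // another consumer holds the lock; put the job back and try again later
    await redis.lpush('JOB_LIST', job);
    return;
  }
  try {
    await processJob(job); // assumed job handler
  } finally {
    await redis.del(`${client}_PROCESSING`); // release the lock
  }
}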
----------update--------------
I've figured out another way to keep tasks in order.
For each client (producer), build a list, like "client_1_list", and push jobs into the left side of the list.
Save all the client names in a list "client_names_list", with values "client_1", "client_2", etc.
For each consumer (processor), iterate over the "client_names_list". For example, consumer_1 gets "client_1" and checks whether the key of client_1 is locked (someone is already processing a task of client_1). If not, it right-pops a value (job) from client_1_list and locks client_1. If client_1 is locked, it (probably after sleeping one second) iterates to the next client, "client_2" for example, and checks the keys, and so on.
This way, each client's (task producer's) tasks are processed in the order they entered.
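A sketch of that variant, reusing the ioredis client and assumed processJob handler from the sketch above:

async function pollClients(): Promise<void> {
  const clients = await redis.lrange('client_names_list', 0, -1);
  for (const client of clients) {
    const locked = await redis.set(`${client}_PROCESSING`, 'consumer_1', 'EX', 300, 'NX');
    if (!locked) continue; // someone else is on this client; move to the next one
    try {
      const job = await redis.rpop(`${client}_list`);
      if (job) await processJob(job); // assumed job handler
    } finally {
      await redis.del(`${client}_PROCESSING`);
    }
  }
}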
EDIT: I found the problem regarding BullJS starting jobs in parallel on one processor: we were using named jobs and defining many named process functions on one queue/processor. The default concurrency factor for a queue/processor is 1, so the queue should not process any jobs in parallel.
The problem with our setup is that if you define many (named) process-handlers on one queue, the concurrency is added up with each process-handler function: so if you define three named process-handlers, you get a concurrency factor of 3 for the given queue, for all the defined named jobs.
So just define one named job per queue for queues where parallel processing should not happen, and all jobs will run sequentially one after the other, as sketched below.
That could be important e.g. when pushing a high number of jobs onto the queue and the processing involves API calls that would give errors if handled in parallel.
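In bull that looks something like this (a sketch; the handler body is assumed):

const sequentialQueue = new Queue('sequential-jobs', 'redis://127.0.0.1:6379');
// a single named handler with concurrency 1: jobs run strictly one after the other
sequentialQueue.process('my-named-job', 1, async (job) => {
  await handle(job.data); // assumed handler
});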
The following text is my first approach at answering the OP's question and describes just a workaround for the problem. So better just go with my edit :) and configure your queues the right way.
I found an easy solution to the OP's question.
In fact BullJS is processing many jobs in parallel on one worker instance:
Let's say you have one worker instance up and running and push 10 jobs onto the queue; then possibly that worker starts all processes in parallel.
My research on BullJS queues showed that this is not intended behavior: one worker (also called a processor by BullJS) should only start a new job from the queue when it is in an idle state, i.e. not processing a former job.
Nevertheless, BullJS keeps starting jobs in parallel on one worker.
In our implementation that led to big problems during API calls that most likely were caused by too many API calls at a time. Tests showed that when only starting one worker, the API calls finished just fine and gave status 200.
So how do you process just one job after the other, once the previous one is finished, if BullJS does not do that for us (just what the OP asked)?
We first experimented with delays and other BullJS options, but that's a kind of workaround and not the exact solution to the problem we were looking for. At least we did not get BullJS to stop processing more than one job at a time.
So we did it ourselves and started one job after the other.
The solution was rather simple for our use case after looking into the BullJS API reference (BullJS API Ref).
We just used a for-loop to start the jobs one after another. The trick was to use BullJS's
job.finished
method to get a Promise that resolves once the job is finished. By using await inside the for-loop, the next job gets started only after the job.finished Promise is awaited (resolved). That's the nice thing about for-loops: await works in them!
Here a small code example on how to achieve the intended behavior:
for (let i = 0; i < theValues.length; i++) {
  jobCounter++
  const job = await this.processingQueue.add(
    'update-values',
    {
      value: theValues[i],
    },
    {
      // delay: i * 90000,
      // lifo: true,
    }
  )
  this.jobs[job.id] = {
    jobType: 'socket',
    jobSocketId: BackgroundJobTasks.UPDATE_VALUES,
    data: {
      value: theValues[i],
    },
    jobCount: theValues.length,
    jobNumber: jobCounter,
    cumulatedJobId
  }
  await job.finished()
    .then((val) => {
      console.log('job finished:: ', val)
    })
}
The important part is really
await job.finished()
inside the for loop. theValues.length jobs get started, all just one after the other as intended.
That way, horizontally scaling jobs across more than one worker is not possible anymore. Nevertheless, this workaround is okay for us at the moment.
I will get in contact with optimalbits, the maker of BullJS, to clear things up.

how to set a timeout with node for redis request

I've written a simple service using Redis to store data in memory or fetch it from disk and then store it in memory. I'm now trying to control for rare cases where fetching from Redis is slow. I've seen one example (https://gist.github.com/stockholmux/3a4b2d1480f27df8be67#file-timelimitedredis-js) which appears to solve this problem, but I've had trouble implementing it.
The linked implementation is:
/**
 * returns a function that acts like the Redis command indicated by cmd except that it will time out after a given number of milliseconds
 *
 * @param {string} cmd The redis command to execute ('get','hset','sort', etc.)
 * @param {integer} timeLimit The number of milliseconds to wait until returning an error to the callback.
 *
 */
function timeLimited(cmd, timeLimit) {
  return function() {
    var
      argsAsArr = Array.prototype.slice.call(arguments),
      cb = argsAsArr.pop(),
      timeoutHandler;

    timeoutHandler = setTimeout(function(){
      cb(new Error('Redis timed out'));
      cb = function() {};
    }, timeLimit);

    argsAsArr.push(function(err, values){
      clearTimeout(timeoutHandler);
      cb(err, values);
    });

    client[cmd].apply(client, argsAsArr);
  };
}
however I don't understand how to implement this, because client is never defined and the redis key/value are never passed in. Could someone explain a little about how one could go about implementing this example? I've been searching for more information or a working example but haven't had any luck so far. Thank you.
This isn't very clearly written, but when you call it with cmd (e.g. SET, HSET, etc.) and a time limit, it returns a function. You call this returned function with the values. I don't know where client comes from; I guess you need to have it in scope. This isn't very good code; I would suggest posting what you've written and asking how to achieve what you want with that.
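For example (a sketch assuming the node_redis callback client), you would create client in the enclosing scope and then call the returned function with the key and a callback:

const redis = require('redis');
const client = redis.createClient(); // must be in scope before timeLimited() is defined

// ...timeLimited() from the question goes here...

const getWithTimeout = timeLimited('get', 200); // 200 ms budget
getWithTimeout('my-key', (err, value) => {
  if (err) return console.error(err); // includes the 'Redis timed out' error
  console.log('value:', value);
});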

Caching whether or not an item in a database exists

I have a simple caching mechanism that involves me checking if an item is in a database cache (table:id#id_number), and if not, checking if that item is in the database. If it is in the database, then I cache that item. If it is not, then I obviously don't.
The issue is that this, in my current situation, is going to happen frequently. Every time someone visits the front page, I will see if id 10 exists, then if id 9 exists, etc.
If id 9 doesn't exist, nothing will be cached and my server will constantly be hitting my database every time someone visits my front page.
My solution right now is very dirty and could easily lead to confusion in the future. I am now caching whether or not an id in the database probably exists (pexists:table:id#id_number). If it probably doesn't exist, or if the cache key isn't set, I just assume it doesn't exist. If it does probably exist, I check if it's in the cache. If it's not, only then will I hit my database. I then cache the result from my database and whether or not it exists.
I am asking if there is a better way of achieving this effect.
/*
  This method takes an amount (how many posts you need) and start
  (the post id we want to start at). If, for example,
  the DB has ids 1, 2, 4, 7, 9, and we ask for
  amount=3 and start=7, we should return the items
  with ids 7, 4, and 2.
*/
const parametersValidity = await this.assureIsValidPollsRequestAsync(
  amount,
  start
);
const validParameters = parametersValidity.result;
if (!validParameters) {
  throw new Error(parametersValidity.err);
}

let polls = [];
for (
  let i = start, pollId = start;
  (i > start - amount && pollId > 0);
  i--, pollId--
) {
  // There is probably no poll logged in the database.
  if (!(await this.pollProbablyExists(pollId))) {
    i++;
    continue;
  }

  let poll;
  const cacheHash = `polls:id#${pollId}`;
  if (await cache.existsAsync(cacheHash)) {
    poll = await this.findKeysFromCacheHash(
      cacheHash,
      "Id", "Title", "Description", "Option1", "Option2", "AuthorId"
    );
  } else {
    // Simulates a slow database retrieval thingy
    // for testing.
    await new Promise(resolve => {
      setTimeout(resolve, 500)
    });
    poll = await this.getPollById(pollId);
    if (typeof poll !== "undefined") {
      this.insertKeysIntoCacheHash(
        cacheHash, 60 * 60 * 3 /* 3 hours to expire */,
        poll
      );
    }
  }

  if (typeof poll === "undefined") {
    // This would happen if a user deleted a poll
    // when the exists:polls:id#poll_id key in cache
    // hadn't expired yet.
    cache.setAsync(`exists:${cacheHash}`, 0);
    cache.expire(`exists:polls:id#${pollId}`, 60 * 60 * 10 /* 10 hours. */);
    i++;
  } else {
    polls.push(poll);
    cache.setAsync(`exists:${cacheHash}`, 1);
  }
}
return polls;
If I understand correctly, you want to avoid those non-existent keys hitting the database frequently.
If that's the case, a better and simpler way is to cache the non-existent keys.
If the key exists in the database, you can cache it in Redis with the value gotten from the database.
If the key doesn't exist in the database, also cache it in Redis, but with a special value.
For example, say you cache players' scores in Redis. If a player doesn't exist, you also cache the key, but with -1 as the score. When searching the cache, if the key exists but the cached value is -1, that means the key doesn't exist in the database.
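A sketch of that pattern for the polls example above (the promisified cache client with a getAsync method and getPollById are assumed):

async function getPollCached(pollId) {
  const key = `polls:id#${pollId}`;
  const cached = await cache.getAsync(key);
  if (cached === 'NONE') return undefined; // cached "does not exist"
  if (cached) return JSON.parse(cached); // cached hit
  const poll = await getPollById(pollId); // assumed database lookup
  if (typeof poll === "undefined") {
    await cache.setAsync(key, 'NONE', 'EX', 60 * 60 * 10); // remember the miss for 10 hours
  } else {
    await cache.setAsync(key, JSON.stringify(poll), 'EX', 60 * 60 * 3); // cache the hit for 3 hours
  }
  return poll;
}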
