I need to create a WebJob that runs 2 processes (maybe more), all the time.
Process 1 (continuous)
Get messages from the queue.
For each message, connect to the db and update a value.
Repeat from step 1.
Process 2 (scheduled - every day, early in the morning)
Go to the db and move records to a tmp table.
Send each record via HTTP.
If a record can't be sent, keep retrying for the rest of the day.
If all records were sent, run again tomorrow.
Given these 2 processes (there will be more), can I create one single WebJob for all of them, or should I create a separate job for each process?
I was thinking about this implementation, but I don't know how correct it is.
WebJobs: 1
Type: Continuous

while (true) {
    await process1();
    process2();
}

async function process1() {
    // do stuff
}

async function process2() {
    // do stuff
    // scheduled with the node-cron lib (every day, early in the morning)
}
Given these 2 processes (there will be more), can I create one single WebJob for all of them, or should I create a separate job for each process?
In short, you need to create a separate job for each process.
When you run a WebJob as continuous or scheduled, that type applies to everything inside the WebJob, so you cannot create one single WebJob that contains both a continuous and a scheduled process. For more details, you could refer to this article.
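For example, if the two processes were deployed as two separate WebJobs, the layout might look roughly like this (the folder and file names are placeholders); the scheduled one carries a settings.job file with a six-field CRON expression (second, minute, hour, day, month, day of week):

App_Data/jobs/continuous/queue-worker/run.js     (Process 1, runs all the time)
App_Data/jobs/triggered/daily-export/run.js      (Process 2, runs on a schedule)
App_Data/jobs/triggered/daily-export/settings.job:

{
  "schedule": "0 0 5 * * *"
}

With that schedule the triggered WebJob runs every day at 05:00, while the continuous WebJob keeps polling the queue independently.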
Related
I have a Node.js app running that only processes queues from Redis using Bull.js. The job performs a task and then adds another entry to Redis with the same data, like this:
queue.process(async (job) => {
    const { dataId } = job.data;
    // Task
    await queue.add({ dataId }, { delay });
});
There is only one job running per dataId, and it's built this way because I need the queue delay to be random every time the job is processed.
The problem:
After the app has been running for a while, the jobs get duplicated: there end up being multiple jobs with the same dataId being processed by Bull.js. I only realized that when I used Taskforce to check the status.
I could check whether a job with the same dataId is already running before adding it to the queue, but that definitely would not be optimal, so I'd like to know what's causing this so I can prevent it from happening.
I am using an Azure Service Bus queue for one of my requirements. The requirement is simple: an Azure Function acts as an API and creates multiple jobs in the queue. The function is scalable and creates new instances on demand, so jobs are added to the queue in parallel, probably one job every 500 ms. The jobs are processed by a Windows service, so the sender is the Azure Function and the receiver is the Windows service. The Windows service is a single instance, a queue listener that listens to this queue and executes jobs in parallel. So the number of senders may be high, but there is one receiver instance, and the number of jobs running in parallel must be limited (to 4, since each job takes a lot of time and CPU). Right now I am using an Azure Service Bus queue with the following configuration, and my doubt is which configuration produces the best performance for this particular requirement.
Deleting the job from the queue will not be an issue for me, so can I use ReceiveAndDelete instead of Peek-Lock?
Also, right now the items are not received by the listener in order, and I want to maintain the order in which they were created. My requirement is maximum performance. The work done by the Windows service is a CPU-intensive task, which is why I have limited it to 4, since the system has 4 cores.
Max delivery count: 4, message lock duration: 5 min, MaxConcurrentCalls: 4 (in the listener). I am new to Service Bus, so I need a suggestion on this.
One more doubt: suppose the listener gets 4 jobs in parallel and starts executing them, and one job completes. Will the listener pick up the next item immediately, or wait for all 4 jobs to be completed (MaxConcurrentCalls: 4)?
Deleting the job from the queue will not be an issue for me, so can I use ReceiveAndDelete instead of Peek-Lock?
Receiving messages in PeekLock mode is less performant than ReceiveAndDelete: with ReceiveAndDelete you save the round trips to the broker needed to complete messages.
Max delivery count: 4, message lock duration: 5 min, MaxConcurrentCalls: 4 (in the listener). I am new to Service Bus, so I need a suggestion on this.
MaxDeliveryCount is how many times a message can be attempted before it is dead-lettered. Here it happens to equal the number of cores, but it doesn't need to; that could just be a coincidence.
MessageLockDuration only matters if you use PeekLock receive mode; for ReceiveAndDelete it is irrelevant.
As for concurrency, even though your work is CPU-bound, I'd benchmark whether higher concurrency is possible.
An additional parameter on the message receiver to look into would be PrefetchCount. It can improve the overall performance by making fewer roundtrips to the broker.
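To make that concrete, here is a rough sketch of how those knobs map onto the newer Azure.Messaging.ServiceBus SDK; the connection string, the queue name "jobs", the handler bodies, and the exact numbers are placeholders to benchmark, not recommendations:

using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;

class Program
{
    static async Task Main()
    {
        var client = new ServiceBusClient("<connection-string>");

        var processor = client.CreateProcessor("jobs", new ServiceBusProcessorOptions
        {
            // ReceiveAndDelete removes the message on delivery, so there is no
            // completion round trip and MessageLockDuration no longer applies
            ReceiveMode = ServiceBusReceiveMode.ReceiveAndDelete,

            // at most 4 messages are handled at a time; benchmark higher values
            MaxConcurrentCalls = 4,

            // prefetch buffers messages locally to cut round trips to the broker
            PrefetchCount = 20
        });

        processor.ProcessMessageAsync += async args =>
        {
            // CPU-bound work for one job goes here
            await Task.CompletedTask;
        };

        processor.ProcessErrorAsync += args =>
        {
            Console.WriteLine(args.Exception);
            return Task.CompletedTask;
        };

        await processor.StartProcessingAsync();
        Console.ReadKey();
        await processor.StopProcessingAsync();
    }
}

Again, the numbers above are only starting points; as noted, it's worth benchmarking both MaxConcurrentCalls and PrefetchCount for your workload.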
One more doubt: suppose the listener gets 4 jobs in parallel and starts executing them, and one job completes. Will the listener pick up the next item immediately, or wait for all 4 jobs to be completed (MaxConcurrentCalls: 4)?
The listener will immediately start processing the 5th message, since your concurrency is set to 4 and one message's processing has completed.
Also, right now the items are not received by the listener in order, and I want to maintain the order in which they were created.
To process messages in the order they were sent, you will need to send and receive them using sessions.
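A minimal sketch of that, again assuming Azure.Messaging.ServiceBus and a queue created with sessions enabled (the queue name and session id are placeholders): the sender stamps a SessionId, and a session processor on the Windows service side hands you each session's messages in the order they were sent.

using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;

// Sender side (the Azure Function): give related messages the same SessionId
var client = new ServiceBusClient("<connection-string>");
var sender = client.CreateSender("jobs");
await sender.SendMessageAsync(new ServiceBusMessage("job payload")
{
    SessionId = "job-stream-1"   // FIFO is only guaranteed within a session
});

// Receiver side (the Windows service): a session processor delivers each
// session's messages one at a time, in the order they were sent
var sessionProcessor = client.CreateSessionProcessor("jobs", new ServiceBusSessionProcessorOptions
{
    MaxConcurrentSessions = 4
});
sessionProcessor.ProcessMessageAsync += async args =>
{
    // handle args.Message.Body here
    await args.CompleteMessageAsync(args.Message);
};
sessionProcessor.ProcessErrorAsync += args => Task.CompletedTask;
await sessionProcessor.StartProcessingAsync();

Note that ordering is only guaranteed within a session, so a single global order means a single session, which works against the concurrency goal.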
My requirement is maximum performance. The work done by the Windows service is a CPU-intensive task, which is why I have limited it to 4, since the system has 4 cores.
There are multiple things to take into consideration: the location of your Windows service will affect latency and message throughput, scaling out could help, etc.
Suppose you have an application that lets a user request a job. For example (hypothetical): the user uploads a video, an entry is made in an RDBMS with the URL of the video in blob storage, and the status is set to "Pending".
There is a recurring, timer-triggered function app that executes every 10 seconds or so, gets 10 pending jobs from the RDBMS, and performs some compression, etc.
The problem is that as long as the rate stays at 10-30 videos per 10 seconds, we should be fine. But if the number of requests suddenly increases, say to 200 requests per 10 seconds, a lot of jobs will be left pending and users will have to wait 10 times longer than usual to see the status change. How do you scale out the function app automatically in such a scenario? Does it have to be manual?
There's an easier way to get fan-out and parallel processing through multiple concurrently running Azure Functions:
Add an Azure Service Bus queue to your solution.
For each video that needs to be processed, enqueue a Service Bus message with the data you'll need to retrieve and process the video (like the BlobId).
Have your Azure Function triggered by a ServiceBusTrigger, as in the sketch below.
Azure will spin up additional instances of your Azure Function as the queue depth increases, and it will also scale in idle instances once there's no more data to process.
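A rough sketch of that wiring, assuming an in-process C# function app, a queue named videos, a connection setting named ServiceBusConnection, and a message body that is just the BlobId (all of those names are assumptions):

using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class VideoFunctions
{
    // Fires once per queued video; the platform adds more function instances
    // as the queue depth grows and scales back in when the queue drains.
    [FunctionName("ProcessVideo")]
    public static void ProcessVideo(
        [ServiceBusTrigger("videos", Connection = "ServiceBusConnection")] string blobId,
        ILogger log)
    {
        log.LogInformation("Compressing video for blob {BlobId}", blobId);
        // download the blob, compress it, update the RDBMS row to "Done"
    }
}

The message itself can be enqueued from the upload path, for example with a Service Bus output binding or a ServiceBusSender, using the BlobId as the body.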
I'm creating an app that uses a job queue backed by Amazon SQS.
Every time a user logs in, I create a bunch of jobs for that specific user, and I want the user to wait until all of their jobs have been processed before being taken to a specific screen.
My problem is that I don't know how to query the queue to see whether there are still pending jobs for a specific user, or what the correct way to implement such a solution is.
Everything regarding the queue (job creation and processing) is working as expected; I am only missing that final step.
Just for the record:
In my previous implementation I was using Redis + Kue: I kept a key with the user ID and a job count, incremented the count every time a job was added, and decremented it every time a job finished or failed. But now I want to move away from Redis + Kue and I am not sure how to implement this step.
Amazon SQS is not the ideal tool for the scenario you describe. A queueing system is normally used in a "send and forget" situation, where the sending system doesn't remain interested in the later processing.
You could investigate Amazon Simple Workflow (SWF), which allows work to be monitored as it goes through several steps. Your existing code could mostly be reused, just with the SWF framework added. You could even power it from Lambda, since you are already using Node.js.
I have an Azure WebJob that copies large CSVs (500 MB to 10+ GB) into a SQL Azure table. I get a new CSV every day and I only retain records for 1 month, because it's expensive to keep them in SQL, so they are pretty volatile.
To get started, I bulk-uploaded last month's data (~200 GB) and I'm seeing all 30 CSV files getting processed at the same time. This causes a pretty crazy backup in the uploads, as shown by this picture:
I have about 5 pages that look like this, counting all of the retries.
If I upload them 2 at a time, everything works great! But as you can see from the running times, some can take over 14 hours to complete.
What I want to do is bulk upload 30 CSVs and have the WebJob process only 3 of the files at a time, then once one completes, start the next. Is this possible with the current SDK?
Yes, absolutely possible.
Assuming the pattern you are using here is Scheduled or On-Demand WebJobs that drop a message on a queue, which is then picked up by a constantly running WebJob that processes messages from the queue and does the work, you can use the JobHost.Queues.BatchSize property to limit the number of queue messages that can be processed at one time. Here's an example:
static void Main()
{
    JobHostConfiguration config = new JobHostConfiguration();

    // AzCopy cannot be invoked multiple times in the same host
    // process, so read and process one message at a time
    config.Queues.BatchSize = 1;

    var host = new JobHost(config);
    host.RunAndBlock();
}
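For reference, the function that BatchSize throttles is just an ordinary queue-triggered WebJobs SDK function; a minimal sketch, with a hypothetical queue name and message body, might look like this:

using System.IO;
using Microsoft.Azure.WebJobs;

public class Functions
{
    // With config.Queues.BatchSize = 1 the host reads and processes one queue
    // message at a time; raising it lets more imports run concurrently per
    // host instance.
    public static void ProcessCsvMessage(
        [QueueTrigger("csv-import")] string csvBlobName,
        TextWriter log)
    {
        log.WriteLine("Importing {0}", csvBlobName);
        // download the CSV blob and bulk-insert it into SQL Azure here
    }
}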
If you would like to see what this looks like in action, feel free to clone this GitHub repo I published recently on how to use WebJobs and AzCopy to create your own blob backup service. I had the same problem you're facing, which is that I could not run too many jobs at once.
https://github.com/markjbrown/AzCopyBackup
Hope that is helpful.
Edit: I almost forgot. While you can change the BatchSize property above, you can also take advantage of having multiple VMs host and process these jobs, which basically lets you scale this out into multiple, independent, parallel processes. You may find that you can scale up the number of VMs and process the data very quickly instead of having to throttle it with BatchSize.