On-demand task processing in Java EE - multithreading

I am working on a server-side task-processing application. The use case is:
The user submits an item.
The server accepts the item and, if the task executor is busy, adds it to a waiting queue.
The user sees that the item's status is "submitted".
When the item reaches the front of the queue, the server runs it as a long-running task and saves the result to the database.
The user refreshes the application, and the application reads the result of the execution from the database.
It looks like a textbook case, but I have no experience with this type of application.
So I searched the web and found JSR 352, batch processing, whose use case is similar, but it is batch-oriented rather than single-item, so I do not know whether it is a good fit for my case. It does have a nice design, though, and it is easy to understand.
I also found this article, http://java.dzone.com/articles/design-flexible-and-scalable, which looks good.
So, are there any other patterns for task-processing applications? Or what would be the best solution?
It should also be possible to execute tasks in multiple threads.
Thanks for pointing me in the right direction :)

Related

How to poll another server from Node.js?

I'm currently developing a Shopify app with Node/Express and a Postgres database. When a user registers an account and connects their Shopify store, I'll need to download all of their store's orders. They could have 100,000s of orders, so I'd like to use a Shopify GraphQL Bulk Operation. While Shopify is handling this, my Node server will need to poll the Shopify server to check on the progress, and when the operation is complete, Shopify will send me a link where I can download all of the data. Once the data is processed and stored in my database, I'll send the user an email to say that their account is now set up.
How should I handle polling the Shopify server? The process could take anywhere from a few minutes to hours. Using setInterval() would be a bad idea, right? Because if the server restarts for whatever reason, the interval is lost. So, should I use some sort of background task? And would I need to store anything in my database? I've researched cron jobs, child processes, worker threads, the bull package -- and it's left me a little confused.
(I also know that I could use a webhook, but Shopify offers no guarantees that my app will receive the webhook.)
Upon installation, launch a background job labeled "GetCustomerOrders". As you know, background jobs are mature, and nicely handle problems. For example, they can retry themselves if something goes wrong.
The background job itself just sets up the bulk download and then settles into polling. Polling is no big deal and just happens. As you said, it could take minutes or it could take hours. Either way, a poll gets the status of a bulk download, and that can even be hot-rodded: for example, you poll with an ID, so you keep polling until that ID completes, regardless of restarts.
At the end of that rather simple setup, you get a URL to download and parse the JSON. Spawn yet another job for that. Endless fun. Why sweat it? Background jobs are the way to go.
The webhook idea is OK, but as the documentation says, webhooks are not 100% guaranteed. Cron is bush-league in comparison: it misses out on the mature tooling of job queues and is more like a simple trigger. Relying on cron to start something is fine, but it gives you zero management over what it starts.
I am guessing Node.js has a decent background job system by now. When you look at Sidekiq for Ruby you realize what awesome is. Surely you can find a copycat in Node that comes close.
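The bull package mentioned in the question is exactly that kind of copycat. Below is a minimal sketch of the idea in TypeScript, under stated assumptions: fetchBulkOperationStatus is a hypothetical helper wrapping the Shopify status query, the Redis URL and queue names are placeholders, and downloadQueue stands in for the follow-up parse/store job. The point is only that the job carries the operation ID and re-enqueues itself with a delay, so the work lives in Redis and a server restart merely postpones the next check.

```typescript
import Queue from "bull";

// Hypothetical helper wrapping the Shopify GraphQL status query; assumed to resolve
// to something like { status: "RUNNING" | "COMPLETED" | "FAILED"; url?: string }.
import { fetchBulkOperationStatus } from "./shopify";

const redisUrl = "redis://127.0.0.1:6379"; // placeholder
const pollQueue = new Queue("poll-bulk-operation", redisUrl);
const downloadQueue = new Queue("download-orders", redisUrl); // hypothetical follow-up job

// Jobs are persisted in Redis, so a server restart only delays the next check.
pollQueue.process(async (job) => {
  const { shop, operationId } = job.data;
  const op = await fetchBulkOperationStatus(shop, operationId);

  if (op.status === "COMPLETED") {
    // Hand the result URL to a separate job that downloads, parses and stores the orders.
    await downloadQueue.add({ shop, url: op.url });
    return;
  }
  if (op.status === "FAILED") {
    throw new Error(`Bulk operation ${operationId} failed`); // bull records the failure
  }

  // Still running: check again in a minute by enqueuing a fresh delayed job.
  await pollQueue.add({ shop, operationId }, { delay: 60_000 });
});

// Call this right after starting the bulk operation, with the ID Shopify returned.
export async function startPolling(shop: string, operationId: string) {
  await pollQueue.add({ shop, operationId }, { delay: 60_000 });
}
```

Because each check is its own short job rather than a long-lived timer, there is nothing in process memory to lose when the server restarts.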

SQS: Know remaining jobs

I'm creating an app that uses a job queue built on Amazon SQS.
Every time a user logs in, I create a bunch of jobs for that specific user, and I want the user to wait until all of their jobs have been processed before being taken to a specific screen.
My problem is that I don't know how to query the queue to see whether there are still pending jobs for a specific user, or what the correct way to implement such a solution is.
Everything regarding the queue (job creation and processing) is working as expected, but I am missing that final step.
Just for the record:
In my previous implementation I was using Redis + Kue: I created a key with the user ID and a job count; every time a job was added the count was incremented, and every time a job finished or failed I decremented it. But now I want to move away from Redis + Kue and I am not sure how to implement this step.
Amazon SQS is not the ideal tool for the scenario you describe. A queueing system is normally used in a "Send and Forget" situation, where the sending system doesn't remain interested in later processing.
You could investigate Amazon Simple Workflow (SWF), which allows work to be monitored as it goes through several processes. Your existing code could mostly be re-used, just with the SWF framework added. Or even power it from Lambda, since you are already using node.js.
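If you do stay on plain SQS, note that the queue itself cannot be queried per user, so the bookkeeping from the Redis + Kue version has to come along: keep a per-user pending count next to the queue and decrement it as the worker finishes jobs. A rough TypeScript sketch of that counter pattern (this is an alternative to the SWF route suggested above, not part of it); JobCounterStore, enqueueUserJobs and markJobDone are hypothetical names, and the store can be any backend with atomic increments:

```typescript
import { SQS } from "aws-sdk";

const sqs = new SQS();
const QUEUE_URL = process.env.JOB_QUEUE_URL!; // assumed to be configured

// Hypothetical store with atomic increment/decrement, e.g. a database counter per user.
interface JobCounterStore {
  increment(userId: string, by: number): Promise<void>;
  decrement(userId: string): Promise<number>; // resolves to the remaining count
  remaining(userId: string): Promise<number>;
}

// On login: record how many jobs the user has, then enqueue them.
export async function enqueueUserJobs(
  store: JobCounterStore,
  userId: string,
  jobs: Array<Record<string, unknown>>
) {
  await store.increment(userId, jobs.length); // count first, so we never under-count
  for (const job of jobs) {
    await sqs
      .sendMessage({ QueueUrl: QUEUE_URL, MessageBody: JSON.stringify({ userId, ...job }) })
      .promise();
  }
}

// Called by the worker after a job succeeds or permanently fails.
export async function markJobDone(store: JobCounterStore, userId: string) {
  const left = await store.decrement(userId);
  if (left === 0) {
    // All of this user's jobs are done; it is now safe to move them to the next screen.
  }
}

// The waiting screen simply asks for the remaining count.
export const isUserReady = (store: JobCounterStore, userId: string) =>
  store.remaining(userId).then((n) => n === 0);
```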

JMeter Load Test on online purchasing

I would like to discuss some issues I am facing. I am currently using JMeter to test purchasing a train ticket on a website. My question is how I should test the website. For example, the user has to fill in a quantity and some information in order to purchase a train ticket; after filling this in, there is a "confirm" button that leads to the payment gateway. I want to test whether the server can handle the confirm button being pressed 100 times in one second without the performance slowing down, and I would like to see the server's latency in milliseconds when users press the confirm button. I have tried the HTTPS Test Recorder, but it seems like the wrong way to test the server. Can anyone provide a solution or guide me? Thank you very much.
HTTP(S) Test Script Recorder is good for building a test scenario "skeleton" - it basically speeds up the creation of the initial test plan draft. Otherwise you would have to use a sniffer tool to capture browser requests and manually create the relevant HTTP Request samplers.
The first thing you need to do is verify how the recorded (or developed) script behaves for a single user. Add a View Results Tree listener and inspect the responses. If everything is fine, you're good to go. If not (i.e. there are errors, or you are hitting one page instead of performing the transaction), you'll need to look into the requests a little more deeply and may have to do some correlation - the process of extracting some bit of a response with a JMeter PostProcessor, converting it into a JMeter Variable and adding it as a parameter to the next request.
Once you're happy with how your scenario behaves, don't forget to disable the View Results Tree listener. Now it's time to define a load pattern. JMeter works as follows:
it kicks off threads (each thread represents a virtual user) within the time frame defined by the ramp-up time.
each thread executes samplers from top to bottom (or according to the Logic Controllers).
if there are no more samplers to execute or loops to iterate, the thread is shut down.
So make sure you provide enough loops that your threads have enough work to do; otherwise you may run into a situation where half of the threads have already finished their work and half of them haven't even started. The Ultimate Thread Group, available via JMeter Plugins, is extremely helpful here.
If your aim is to see whether your server is capable of serving 100 concurrent "Confirm" clicks in a reasonable time frame and without errors, use a Synchronizing Timer.
Use non-GUI mode to run your test. After the test completes you can open the JMeter GUI again and add e.g. an Aggregate Report listener, which will calculate and visualise metrics like response time, connect time, latency, etc.

Azure: offload memory-intensive task synchronously?

I'm designing an ASP.NET web application which is to be hosted in Azure. I plan to resize images that a user uploads on the server side. I've got a library that does this, but currently this happens in the main web front end VM. I'd like to offload this to a worker VM. I understand the pattern for doing this with a queue, but I don't want to return to the user until the task has completed because I plan to display the resized image upon completion of the processed request. So, how can I offload a task that will be performed synchronously? (meaning I won't return from the call site until the remote task has completed.)
Thanks...
-Ben
You can do something similar to what we do in the WebJobs Samples here: http://aspnet.codeplex.com/SourceControl/latest#Samples/AzureWebJobs/PhluffyShuffy/PhluffyShuffyWeb/Views/Shuffle/Index.cshtml
We have a very similar application: process an image and display it when it is done. For that we are just polling for the result blob using JavaScript.
Holding threads on a web server is usually not a good idea; this negatively impacts scalability.
Consider having some UI JavaScript (e.g. jQuery) that periodically checks whether the image is ready to be displayed. All you have to do is check for the presence of the image every second or so. There are a couple of ways to do this: in one implementation the client knows the name of the image ahead of time and just attempts to read it (for example, a blob); in another, the UI doesn't know the name but checks a record in an Azure Table that indicates the status of the image creation.
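A minimal sketch of the first variant in TypeScript, assuming the page already knows the eventual blob URL (blobUrl and imgElementId are placeholders):

```typescript
// Poll for the resized image every second and show it once the blob exists.
// The page is assumed to know the eventual blob name ahead of time.
function waitForImage(blobUrl: string, imgElementId: string): void {
  const img = document.getElementById(imgElementId) as HTMLImageElement;

  const timer = window.setInterval(async () => {
    // A HEAD request is enough to find out whether the blob has been written yet.
    const res = await fetch(blobUrl, { method: "HEAD" }).catch(() => null);
    if (res && res.ok) {
      window.clearInterval(timer);
      img.src = blobUrl; // the worker has finished; display the result
    }
  }, 1000);
}
```

If the blobs live on a different domain than the page, the storage account needs CORS enabled for that HEAD request to succeed from the browser; the Azure Table variant works the same way, except the timer polls a small status endpoint instead of the blob itself.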

Why can Google App Engine tasks spuriously be executed more than once?

Why can Google App Engine tasks be executed more than once?
According to Brett Slatkin's talk from Google I/O 2009, it is possible for a task to spuriously run twice, even without server failures!
Does this have something to do with spurious wakeups of threads?
Brett Slatkin gave a similar talk at I/O 2010.
I don't know that he ever gave details of how or when this could happen. His point was that, because of the way Task Queues work, it is possible by design for tasks to be re-enqueued, so you need to write your tasks so that nothing breaks if that happens.
For example, let's say you have a task that sends an email and then increments a counter in the Datastore. If there were a bug in your code, or if the Datastore were down, it would be possible for the email to be sent successfully but for the write to the Datastore to fail. If you didn't handle the Datastore failure by catching the exception, the failed write would make your task return an HTTP status code of 500, and Task Queue is designed to re-enqueue a task that returns a status code above 299. Your task would then be executed over and over until the write to the Datastore succeeded, which means someone could receive many duplicate emails.
I think the line about "possible for a task to spuriously run twice" was just a way of saying that App Engine isn't guaranteed to protect against this, so you need to make sure you handle it in your code.
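The practical takeaway is to make the task handler idempotent so that a re-run is harmless. A small illustrative sketch in TypeScript/Express (not App Engine-specific; alreadySent, markSent and sendEmail are hypothetical helpers backed by your datastore) of the email example above:

```typescript
import express from "express";
// Hypothetical helpers: an "already sent" marker persisted in the datastore, plus the mailer.
import { alreadySent, markSent, sendEmail } from "./emailTask";

const app = express();
app.use(express.json());

// Task-queue push target. A 2xx response acknowledges the task; anything else
// (or a thrown error) makes the queue re-deliver it later.
app.post("/tasks/send-email", async (req, res) => {
  const { taskId, recipient } = req.body;

  // Re-runs are expected by design, so check the marker before the side effect
  // that cannot be retried safely (sending the email).
  if (!(await alreadySent(taskId))) {
    await sendEmail(recipient);
    // If this write fails, the task will retry, but the duplicate window is now a
    // single failed write rather than "every retry sends another email".
    await markSent(taskId);
  }

  res.sendStatus(200); // ack: the queue will not re-enqueue this task
});

app.listen(8080);
```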

Resources