Return from asynchronous request in for loop - multithreading

I’m trying to get data from my RestAPI, specifically i’m getting an array of integers (which are id’s from other users), i want to loop through this array and download the data from all the other customers. A simplified version of the code is show below.
func asyncFunc(completion: (something:[Int])->Void){
//Get a json Array asynchonous from my RestAPI
let jsonArray = [1,2,3,4,5]
var resultingArray:[Int] = []
for myThing in jsonArray{
anotherAsyncFunc(myThing, completion: { (somethingElse) -> Void in
resultingArray.append(somethingElse)
})
}
}
func anotherAsyncFunc(data:Int, completion: (somethingElse:Int)->Void){
//Get some more jsonData from RestApi/data
let myLoadedData:Int = data*13356
completion(somethingElse: myLoadedData)
}
How would I make my asyncFunc return an array with all the items it has gotten from the second (inner) async request.
I have tried getting the count of the array which is first requested from the Rest Api and just “blocking” the UI thread by using a while loop too see if the “new” array has collected all the data (the count is equal to the count of the first requested array). This has 2 major disadvantages, mainly it blocks the UI thread and further more, it will fail and crash the app if the data connection gets broken while i’m getting the data from the other users (the inner async request), cause the while loop will never complete.
My question is how would I use a completion handler to return all the data that it should return without blocking the main thread and/or having to worry about badly timed data connection losses.

You can use a dispatch group notification. So create a dispatch group, enter the group for each item in the array, exit in the completion handler of the anotherAsyncFunc asynchronous process, and then create a notification that will trigger the final completion closure when all of the dispatch_group_enter calls have been offset by a corresponding dispatch_group_leave call:
func asyncFunc(completion: (something:[Int])->Void){
//Get a json Array asynchonous from my RestAPI
let jsonArray = [1,2,3,4,5]
var resultingArray:[Int] = []
let group = dispatch_group_create()
for myThing in jsonArray {
dispatch_group_enter(group)
anotherAsyncFunc(myThing) { somethingElse in
resultingArray.append(somethingElse)
dispatch_group_leave(group)
}
}
dispatch_group_notify(group, dispatch_get_main_queue()) {
completion(something: resultingArray)
}
}
Note, you will want to make sure you synchronize the updates to resultingArray that the anotherAsyncFunc are performing. The easiest way is to make sure that it dispatches its updates back to the main queue (if your REST API doesn't do that already).
func anotherAsyncFunc(data:Int, completion: (somethingElse:Int)->Void){
//Get some more jsonData from RestApi/data asynchronously
let myLoadedData:Int = data*13356
dispatch_async(dispatch_get_main_queue()) {
completion(somethingElse: myLoadedData)
}
}
This is just an example. You can use whatever synchronization mechanism you want, but make sure you synchronize updates on resultingArray accordingly.

Related

Batch up requests in Groovy?

I'm new to Groovy and am a bit lost on how to batch up requests so they can be submitted to a server as a batch, instead of individually, as I currently have:
class Handler {
private String jobId
// [...]
void submit() {
// [...]
// client is a single instance of Client used by all Handlers
jobId = client.add(args)
}
}
class Client {
//...
String add(String args) {
response = postJson(args)
return parseIdFromJson(response)
}
}
As it is now, something calls Client.add(), which POSTs to a REST API and returns a parsed result.
The issue I have is that the add() method is called maybe thousands of times in quick succession, and it would be much more efficient to collect all the args passed in to add(), wait until there's a moment when the add() calls stop coming in, and then POST to the REST API a single time for that batch, sending all the args in one go.
Is this possible? Potentially, add() can return a fake id immediately, as long as the batching occurs, the submit happens, and Client can later know the lookup between fake id and the ID coming from the REST API (which will return IDs in the order corresponding to the args sent to it).
As mentioned in the comments, this might be a good case for gpars which is excellent at these kinds of scenarios.
This really is less about groovy and more about asynchronous programming in java and on the jvm in general.
If you want to stick with the java concurrent idioms I threw together a code snippet you could use as a potential starting point. This has not been tested and edge cases have not been considered. I wrote this up for fun and since this is asynchronous programming and I haven't spent the appropriate time thinking about it, I suspect there are holes in there big enough to drive a tank through.
That being said, here is some code which makes an attempt at batching up the requests:
import java.util.concurrent.*
import java.util.concurrent.locks.*
// test code
def client = new Client()
client.start()
def futureResponses = []
1000.times {
futureResponses << client.add(it as String)
}
client.stop()
futureResponses.each { futureResponse ->
// resolve future...will wait if the batch has not completed yet
def response = futureResponse.get()
println "received response with index ${response.responseIndex}"
}
// end of test code
class FutureResponse extends CompletableFuture<String> {
String args
}
class Client {
int minMillisLullToSubmitBatch = 100
int maxBatchSizeBeforeSubmit = 100
int millisBetweenChecks = 10
long lastAddTime = Long.MAX_VALUE
def batch = []
def lock = new ReentrantLock()
boolean running = true
def start() {
running = true
Thread.start {
while (running) {
checkForSubmission()
sleep millisBetweenChecks
}
}
}
def stop() {
running = false
checkForSubmission()
}
def withLock(Closure c) {
try {
lock.lock()
c.call()
} finally {
lock.unlock()
}
}
FutureResponse add(String args) {
def future = new FutureResponse(args: args)
withLock {
batch << future
lastAddTime = System.currentTimeMillis()
}
future
}
def checkForSubmission() {
withLock {
if (System.currentTimeMillis() - lastAddTime > minMillisLullToSubmitBatch ||
batch.size() > maxBatchSizeBeforeSubmit) {
submitBatch()
}
}
}
def submitBatch() {
// here you would need to put the combined args on a format
// suitable for the endpoint you are calling. In this
// example we are just creating a list containing the args
def combinedArgs = batch.collect { it.args }
// further there needs to be a way to map one specific set of
// args in the combined args to a specific response. If the
// endpoint responds with the same order as the args we submitted
// were in, then that can be used otherwise something else like
// an id in the response etc would need to be figured out. Here
// we just assume responses are returned in the order args were submitted
List<String> combinedResponses = postJson(combinedArgs)
combinedResponses.indexed().each { index, response ->
// here the FutureResponse gets a value, can be retrieved with
// futureResponse.get()
batch[index].complete(response)
}
// clear the batch
batch = []
}
// bogus method to fake post
def postJson(combinedArgs) {
println "posting json with batch size: ${combinedArgs.size()}"
combinedArgs.collect { [responseIndex: it] }
}
}
A few notes:
something needs to be able to react to the fact that there were no calls to add for a while. This implies a separate monitoring thread and is what the start and stop methods manage.
if we have an infinite sequence of adds without pauses, you might run out of resources. Therefore the code has a max batch size where it will submit the batch even if there is no lull in the calls to add.
the code uses a lock to make sure (or try to, as mentioned above, I have not considered all potential issues here) we stay thread safe during batch submissions etc
assuming the general idea here is sound, you are left with implementing the logic in submitBatch where the main problem is dealing with mapping specific args to specific responses
CompletableFuture is a java 8 class. This can be solved using other constructs in earlier releases, but I happened to be on java 8.
I more or less wrote this without executing or testing, I'm sure there are some mistakes in there.
as can be seen in the printout below, the "maxBatchSizeBeforeSubmit" setting is more a recommendation that an actual max. Since the monitoring thread sleeps for some time and then wakes up to check how we are doing, the threads calling the add method might have accumulated any number of requests in the batch. All we are guaranteed is that every millisBetweenChecks we will wake up and check how we are doing and if the criteria for submitting a batch has been reached, then the batch will be submitted.
If you are unfamiliar with java Futures and locks, I would recommend you read up on them.
If you save the above code in a groovy script code.groovy and run it:
~> groovy code.groovy
posting json with batch size: 153
posting json with batch size: 234
posting json with batch size: 243
posting json with batch size: 370
received response with index 0
received response with index 1
received response with index 2
...
received response with index 998
received response with index 999
~>
it should work and print out the "responses" received from our fake json submissions.

dataTaskWithURL for dummies

I keep learning iDev but I still can't deal with http requests.
It seems to be crazy, but everybody whom I talk about synchronous requests do not understand me. Okay, it's really important to keep on a background queue as much as it possible to provide smooth UI. But in my case I load JSON data from server and I need to use this data immediately.
The only way I achieved it are semaphores. Is it okay? Or I have to use smth else? I tried NSOperation, but in fact I have to many little requests so creating each class for them for me seems to be not easy-reading-code.
func getUserInfo(userID: Int) -> User {
var user = User()
let linkURL = URL(string: "https://server.com")!
let session = URLSession.shared
let semaphore = DispatchSemaphore(value: 0)
let dataRequest = session.dataTask(with: linkURL) { (data, response, error) in
let json = JSON(data: data!)
user.userName = json["first_name"].stringValue
user.userSurname = json["last_name"].stringValue
semaphore.signal()
}
dataRequest.resume()
semaphore.wait(timeout: DispatchTime.distantFuture)
return user
}
You wrote that people don't understand you, but on the other hand it reveals that you don't understand how asynchronous network requests work.
For example imagine you are setting an alarm for a specific time.
Now you have two options to spend the following time.
Do nothing but sitting in front of the alarm clock and wait until the alarm occurs. Have you ever done that? Certainly not, but this is exactly what you have in mind regarding the network request.
Do several useful things ignoring the alarm clock until it rings. That is the way how asynchronous tasks work.
In terms of a programming language you need a completion handler which is called by the network request when the data has been loaded. In Swift you are using a closure for that purpose.
For convenience declare an enum with associated values for the success and failure cases and use it as the return value in the completion handler
enum RequestResult {
case Success(User), Failure(Error)
}
Add a completion handler to your function including the error case. It is highly recommended to handle always the error parameter of an asynchronous task. When the data task returns it calls the completion closure passing the user or the error depending on the situation.
func getUserInfo(userID: Int, completion:#escaping (RequestResult) -> ()) {
let linkURL = URL(string: "https://server.com")!
let session = URLSession.shared
let dataRequest = session.dataTask(with: linkURL) { (data, response, error) in
if error != nil {
completion(.Failure(error!))
} else {
let json = JSON(data: data!)
var user = User()
user.userName = json["first_name"].stringValue
user.userSurname = json["last_name"].stringValue
completion(.Success(user))
}
}
dataRequest.resume()
}
Now you can call the function with this code:
getUserInfo(userID: 12) { result in
switch result {
case .Success(let user) :
print(user)
// do something with the user
case .Failure(let error) :
print(error)
// handle the error
}
}
In practice the point in time right after your semaphore and the switch result line in the completion block is exactly the same.
Never use semaphores as an alibi not to deal with asynchronous patterns
I hope the alarm clock example clarifies how asynchronous data processing works and why it is much more efficient to get notified (active) rather than waiting (passive).
Don't try to force network connections to work synchronously. It invariably leads to problems. Whatever code is making the above call could potentially be blocked for up to 90 seconds (30 second DNS timeout + 60 second request timeout) waiting for that request to complete or fail. That's an eternity. And if that code is running on your main thread on iOS, the operating system will kill your app outright long before you reach the 90 second mark.
Instead, design your code to handle responses asynchronously. Basically:
Create data structures to hold the results of various requests, such as obtaining info from the user.
Kick off those requests.
When each request comes back, check to see if you have all the data you need to do something, and then do it.
For a really simple example, if you have a method that updates the UI with the logged in user's name, instead of:
[self updateUIWithUserInfo:[self getUserInfoForUser:user]];
you would redesign this as:
[self getUserInfoFromServerAndRun:^(NSDictionary *userInfo) {
[self updateUIWithUserInfo:userInfo];
}];
so that when the response to the request arrives, it performs the UI update action, rather than trying to start a UI update action and having it block waiting for data from the server.
If you need two things—say the userInfo and a list of books that the user has read, you could do:
[self getUserInfoFromServerAndRun:^(NSDictionary *userInfo) {
self.userInfo = userInfo;
[self updateUI];
}];
[self getBookListFromServerAndRun:^(NSDictionary *bookList) {
self.bookList = bookList;
[self updateUI];
}];
...
(void)updateUI
{
if (!self.bookList) return;
if (!self.userInfo) return;
...
}
or whatever. Blocks are your friend here. :-)
Yes, it's a pain to rethink your code to work asynchronously, but the end result is much, much more reliable and yields a much better user experience.

Changing State when Using Scala Concurrency

I have a function in my Controller that takes user input, and then, using an infinite loop, queries a database and sends the object returned from the database to a webpage. This all works fine, except that I needed to introduce concurrency in order to both run this logic and render the webpage.
The code is given by:
def getSearchResult = Action { request =>
val search = request.queryString.get("searchInput").head.head
val databaseSupport = new InteractWithDatabase(comm, db)
val put = Future {
while (true) {
val data = databaseSupport.getFromDatabase(search)
if (data.nonEmpty) {
if (data.head.vendorId.equals(search)) {
comm.communicator ! data.head
}
}
}
}
Ok(views.html.singleElement.render)
}
The issue arises when I want to call this again, but with a different input. Because the first thread is in an infinite loop, it never ceases to run and is still running even when I start the second thread. Therefore, both objects are being sent to the webpage at the same time in two separate threads.
How can I stop the first thread once I call this function again? Or, is there a better implementation of this whole idea so that I could do it without using multithreading?
Note: I tried removing the concurrency from this function (as multithreading has been the thing giving me all of these problems) and instead moving it to the web socket itself, but this posed problems as the web socket is connected to a router, and everything connects to the web socket through the router.
Try AsyncAction where you return a Future[Result] as a result. Make database call in side this result. E.g.(pseudo code),
def getSearchResult = AsyncAction { request =>
val search = request.queryString.get("searchInput").head.head
val databaseSupport = new InteractWithDatabase(comm, db)
Future {
val data = databaseSupport.getFromDatabase(search)
if (data.nonEmpty) {
if (data.head.vendorId.equals(search)) {
comm.communicator ! data.head // A
}
}
Ok(views.html.singleElement.render)
}
}
Better if databaseSupport.getFromDatabase(search) returns a Future but that is a story for another day. The tricky part is to figure how to deal with Actor at "A". Just remember at the exit it must return Future[Result] result type.

Function/Code Design with Concurrency in Swift

I'm trying to create my first app in Swift which involves making multiple requests to a website. These requests are each done using the block
var task = NSURLSession.sharedSession().dataTaskWithRequest(request, completionHandler: {data, response, error -> Void in ... }
task.resume()
From what I understand this block uses a thread different to the main thread.
My question is, what is the best way to design code that relies on the values in that block? For instance, the ideal design (however not possible due to the fact that the thread executing these blocks is not the main thread) is
func prepareEmails() {
var names = getNames()
var emails = getEmails()
...
sendEmails()
}
func getNames() -> NSArray {
var names = nil
....
var task = NSURLSession.sharedSession().dataTaskWithRequest(request, completionHandler: {data, response, error -> Void in
names = ...
})
task.resume()
return names
}
func getEmails() -> NSArray {
var emails = nil
....
var task = NSURLSession.sharedSession().dataTaskWithRequest(request, completionHandler: {data, response, error -> Void in
emails = ...
})
task.resume()
return emails
}
However in the above design, most likely getNames() and getEmails() will return nil, as the the task will not have updated emails/name by the time it returns.
The alternative design (which I currently implement) is by effectively removing the 'prepareEmails' function and doing everything sequentially in the task functions
func prepareEmails() {
getNames()
}
func getNames() {
...
var task = NSURLSession.sharedSession().dataTaskWithRequest(request, completionHandler: {data, response, error -> Void in
getEmails(names)
})
task.resume()
}
func getEmails(names: NSArray) {
...
var task = NSURLSession.sharedSession().dataTaskWithRequest(request, completionHandler: {data, response, error -> Void in
sendEmails(emails, names)
})
task.resume()
}
Is there a more effective design than the latter? This is my first experience with concurrency, so any advice would be greatly appreciated.
The typical pattern when calling an asynchronous method that has a completionHandler parameter is to use the completionHandler closure pattern, yourself. So the methods don't return anything, but rather call a closure with the returned information as a parameter:
func getNames(completionHandler:(NSArray!)->()) {
....
let task = NSURLSession.sharedSession().dataTaskWithRequest(request) {data, response, error -> Void in
let names = ...
completionHandler(names)
}
task.resume()
}
func getEmails(completionHandler:(NSArray!)->()) {
....
let task = NSURLSession.sharedSession().dataTaskWithRequest(request) {data, response, error -> Void in
let emails = ...
completionHandler(emails)
}
task.resume()
}
Then, if you need to perform these sequentially, as suggested by your code sample (i.e. if the retrieval of emails was dependent upon the names returned by getNames), you could do something like:
func prepareEmails() {
getNames() { names in
getEmails() {emails in
sendEmails(names, emails) // I'm assuming the names and emails are in the input to this method
}
}
}
Or, if they can run concurrently, then you should do so, as it will be faster. The trick is how to make a third task dependent upon two other asynchronous tasks. The two traditional alternatives include
Wrapping each of these asynchronous tasks in its own asynchronous NSOperation, and then create a third task dependent upon those other two operations. This is probably beyond the scope of the question, but you can refer to the Operation Queue section of the Concurrency Programming Guide or see the Asynchronous vs Synchronous Operations and Subclassing Notes sections of the NSOperation Class Reference.
Use dispatch groups, entering the group before each request, leaving the group within the completion handler of each request, and then adding a dispatch group notification block (called when all of the group "enter" calls are matched by their corresponding "leave" calls):
func prepareEmails() {
let group = dispatch_group_create()
var emails: NSArray!
var names: NSArray!
dispatch_group_enter(group)
getNames() { results in
names = results
dispatch_group_leave(group)
}
dispatch_group_enter(group)
getEmails() {results in
emails = results
dispatch_group_leave(group)
}
dispatch_group_notify(group, dispatch_get_main_queue()) {
if names != nil && emails != nil {
self.sendEmails(names, emails)
} else {
// one or both of those requests failed; tell the user
}
}
}
Frankly, if there's any way to retrieve both the emails and names in a single network request, that's going to be far more efficient. But if you're stuck with two separate requests, you could do something like the above.
Note, I wouldn't generally use NSArray in my Swift code, but rather use an array of String objects (e.g. [String]). Furthermore, I'd put in error handling where I return the nature of the error if either of these fail. But hopefully this illustrates the concepts involved in (a) writing your own methods with completionHandler blocks; and (b) invoking a third bit of code dependent upon the completion of two other asynchronous tasks.
The answers above (particularly Rob's DispatchQueue based answer) describe the concurrency concepts necessary to run two tasks in parallel and then respond to the result. The answers lack error handling for clarity because traditionally, correct solutions to concurrency problems are quite verbose.
Not so with HoneyBee.
HoneyBee.start()
.setErrorHandler(handleErrorFunc)
.branch {
$0.chain(getNames)
+
$0.chain(getEmails)
}
.chain(sendEmails)
This code snippet manages all of the concurrency, routes all errors to handleErrorFunc and looks like the concurrent pattern that is desired.

Download an undefined number of files with HttpWebRequest.BeginGetResponse

I have to write a small app which downloads a few thousand files. Some of these files contain reference to other files that must be downloaded as part of the same process. The following code downloads the initial list of files, but I would like to download the others files as part of the same loop. What is happening here is that the loop completes before the first request come back. Any idea how to achieve this?
var countdownLatch = new CountdownEvent(Urls.Count);
string url;
while (Urls.TryDequeue(out url))
{
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
webRequest.BeginGetResponse(
new AsyncCallback(ar =>
{
using (HttpWebResponse response = (ar.AsyncState as HttpWebRequest).EndGetResponse(ar) as HttpWebResponse)
{
using (var sr = new StreamReader(response.GetResponseStream()))
{
string myFile = sr.ReadToEnd();
// TODO: Look for a reference to another file. If found, queue a new Url.
}
}
}), webRequest);
}
ce.Wait();
One solution which comes to mind is to keep track of the number of pending requests and only finish the loop once no requests are pending and the Url queue is empty:
string url;
int requestCounter = 0;
int temp;
AutoResetEvent requestFinished = new AutoResetEvent(false);
while (Interlocked.Exchange(requestCounter, temp) > 0 || Urls.TryDequeue(out url))
{
if (url != null)
{
Interlocked.Increment(requestCounter);
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
webRequest.BeginGetResponse(
new AsyncCallback(ar =>
{
try {
using (HttpWebResponse response = (ar.AsyncState as HttpWebRequest).EndGetResponse(ar) as HttpWebResponse)
{
using (var sr = new StreamReader(response.GetResponseStream()))
{
string myFile = sr.ReadToEnd();
// TODO: Look for a reference to another file. If found, queue a new Url.
}
}
}
finally {
Interlocked.Decrement(requestCounter);
requestFinished.Set();
}
}), webRequest);
}
else
{
// no url but requests are still pending
requestFinished.WaitOne();
}
}
You are tryihg to write a webcrawler. In order to write a good webcrawler, you first need to define some parameters...
1) How many request do you want to download simultaneously? In other words, how much throughput do you want? This will determine things like how many requests you want outstanding, what the threadpool size should be etc.
2) You will have to have a queue of URLs. This queue is populated by each request that completes. You now need to decide what the growth strategy of the queue is. For eg, you cannot have an unbounded queue, as you can pump workitems into the queue faster than you can download from the network.
Given this, you can design a system as follows:
Have max N worker threads that actually download from the web. They take one time from the queue, and download the data. They parse the data and populate your URL queue.
If there are more than 'M' URLs in the queue, then the queue blocks and does not allow anymore URLs to be queued. Now, here you can do one of two things. You can either cause the thread that is enqueuing to block, or you can just discard the workitem being enqueued. Once another workitem completes on another thread, and a URL is dequeued, the blocked thread will now be able to enqueue succesfully.
With a system like this, you can ensure that you will not run out of system resources while downloading the data.
Implementation:
Note that if you are using async, then you are using an extra I/O thread to do the download. THis is fine, as long as you are mindful of this fact. You can do a pure async implementation, where you can have 'N' BeginGetResponse() outstanding, and for each one that completes, you start another one. THis way you will always have 'N' requests outstanding.

Resources