Repeat asynchronous requests to the server - multithreading

What I want to do is repeat GET requests to a server asynchronously over and over again so that I can synchronize the local data with the remote one. I'd like to do this by using Futures without involving akka because I just want to understand the basic idea of how to do that at the lower level. No async and await preferably either because they are kind of the high level functions for Futures and Promises, thus I'd like to use Futures and Promises themselves.
So this is my functions:
def sendHttpRequestToServer(): String = { ... }
def send: Unit = {
val f = future { sendHttpRequestToServer() }
f onComplete {
case Success(x) =>
processResult(x) // do something with result "x"
send // delay if needed and send the request again
case onFailure(e) =>
logException(e)
send // send the request again
}
}
That's what I think it might be. How could I change it, is there any mistake in algorithm? Your thoughts.
UPDATE:
As I already know, futures are not designed for recurring tasks, only for one time ones. Therefore, they can't be used here. What do I use then?

Your code has some issues with exception handling. If an exception is thrown in processResult or logException the send will no longer occur, breaking your loop. This exception will also not be logged. A better way is:
f.map(processResult).onFailure(logException)
f.onComplete(x => send())
This way the send still happens despite exceptions in processResult or logException, exceptions from processResult are logged, and the next send can begin intermediately while the results are still being processed. If you want to wait until processing is complete, you could do:
val f2 = f.map(processResult).recover { case e => logException(e) }
f2.onComplete(x => send())

Related

Scala Iterator for multithreading

I am using scala Iterator for waiting loop in synchronized block:
anObject.synchronized {
if (Try(anObject.foo()).isFailure) {
Iterator.continually {
anObject.wait()
Try(anObject.foo())
}.dropWhile(_.isFailure).next()
}
anObject.notifyAll()
}
Is it acceptable to use Iterator with concurrency and multithreading? If not, why? And then what to use and how?
There are some details, if it matters. anObject is a mutable queue. And there are multiple producers and consumers to the queue. So the block above is a code of such producer or consumer. anObject.foo is a common simplified declaration of function that either enqueue (for producer) or dequeue (for consumer) data to/from the queue.
Iterator is mutable internally, so you have to take that into consideration if you use it in multi-threaded environment. If you guaranteed that you won't end up in situation when e.g.
2 threads check hasNext()
one of them calls next() - it happens to be the last element
the other calls next() - NPE
(or similar) then you should be ok. In your example Iterator doesn't even leave the scope, so the errors shouldn't come from Iterator.
However, in your code I see the issue with having aObject.wait() and aObject.notifyAll() next to each other - if you call .wait then you won't reach .notifyAll which would unblock it. You can check in REPL that this hangs:
# val anObject = new Object { def foo() = throw new Exception }
anObject: {def foo(): Nothing} = ammonite.$sess.cmd21$$anon$1#126ae0ca
# anObject.synchronized {
if (Try(anObject.foo()).isFailure) {
Iterator.continually {
anObject.wait()
Try(anObject.foo())
}.dropWhile(_.isFailure).next()
}
anObject.notifyAll()
}
// wait indefinitelly
I would suggest changing the design to NOT rely on wait and notifyAll. However, from your code it is hard to say what you want to achieve so I cannot tell if this is more like Promise-Future case, monix.Observable, monix.Task or something else.
If your use case is a queue, produces and consumers, then it sound like a use case for reactive streams - e.g. FS2 + Monix, but it could be FS2+IO or something from Akka Streams
val queue: Queue[Task, Item] // depending on use case queue might need to be bounded
// in one part of the application
queue.enqueu1(item) // Task[Unit]
// in other part of the application
queue
.dequeue
.evalMap { item =>
// ...
result: Task[Result]
}
.compile
.drain
This approach would require some change in thinking about designing an application, because you would no longer work on thread directly, but rather designed a flow data and declaring what is sequential and what can be done in parallel, where threads become just an implementation detail.

How to handle errors from parallel web requests using Retrofit + RxJava?

I have a situation like this where I make some web requests in parallel. Sometimes I make these calls and all requests see the same error (e.g. no-network):
void main() {
Observable.just("a", "b", "c")
.flatMap(s -> makeNetworkRequest())
.subscribe(
s -> {
// TODO
},
error -> {
// handle error
});
}
Observable<String> makeNetworkRequest() {
return Observable.error(new NoNetworkException());
}
class NoNetworkException extends Exception {
}
Depending on the timing, if one request emits the NoNetworkException before the others can, Retrofit/RxJava will dispose/interrupt** the others. I'll see one of the following logs (not all three) for each request remaining in progress++:
<-- HTTP FAILED: java.io.IOException: Canceled
<-- HTTP FAILED: java.io.InterruptedIOException
<-- HTTP FAILED: java.io.InterruptedIOException: thread interrupted
I'll be able to handle the NoNetworkException error in the subscriber and everything downstream will get disposed of and all is OK.
However based on timing, if two or more web requests emit NoNetworkException, then the first one will trigger the events above, disposing of everything down stream. The second NoNetworkException will have nowhere to go and I'll get the dreaded UndeliverableException. This is the same as example #1 documented here.
In the above article, the author suggested using an error handler. Obviously retry/retryWhen don't make sense if I expect to hear the same errors again. I don't understand how onErrorResumeNext/onErrorReturn help here, unless I map them to something recoverable to be handled downstream:
Observable.just("a", "b", "c")
.flatMap(s ->
makeNetworkRequest()
.onErrorReturn(error -> {
// eat actual error and return something else
return "recoverable error";
}))
.subscribe(
s -> {
if (s.equals("recoverable error")) {
// handle error
} else {
// TODO
}
},
error -> {
// handle error
});
but this seems wonky.
I know another solution is to set a global error handler with RxJavaPlugins.setErrorHandler(). This doesn't seem like a great solution either. I may want to handle NoNetworkException differently in different parts of my app.
So what other options to I have? What do other people do in this case? This must be pretty common.
** I don't fully understand who is interrupting/disposing of who. Is RxJava disposing of all other requests in flatmap which in turn causes Retrofit to cancel requests? Or does Retrofit cancel requests, resulting in each
request in flatmap emitting one of the above IOExceptions? I guess it doesn't really matter to answer the question, just curious.
++ It's possible that not all a, b, and c requests are in flight depending on thread pool.
Have you tried by using flatMap() with delayErrors=true?

Express Node Request For Loop Issue [duplicate]

With node.js I want to http.get a number of remote urls in a way that only 10 (or n) runs at a time.
I also want to retry a request if an exception occures locally (m times), but when the status code returns an error (5XX, 4XX, etc) the request counts as valid.
This is really hard for me to wrap my head around.
Problems:
Cannot try-catch http.get as it is async.
Need a way to retry a request on failure.
I need some kind of semaphore that keeps track of the currently active request count.
When all requests finished I want to get the list of all request urls and response status codes in a list which I want to sort/group/manipulate, so I need to wait for all requests to finish.
Seems like for every async problem using promises are recommended, but I end up nesting too many promises and it quickly becomes uncypherable.
There are lots of ways to approach the 10 requests running at a time.
Async Library - Use the async library with the .parallelLimit() method where you can specify the number of requests you want running at one time.
Bluebird Promise Library - Use the Bluebird promise library and the request library to wrap your http.get() into something that can return a promise and then use Promise.map() with a concurrency option set to 10.
Manually coded - Code your requests manually to start up 10 and then each time one completes, start another one.
In all cases, you will have to manually write some retry code and as with all retry code, you will have to very carefully decide which types of errors you retry, how soon you retry them, how much you backoff between retry attempts and when you eventually give up (all things you have not specified).
Other related answers:
How to make millions of parallel http requests from nodejs app?
Million requests, 10 at a time - manually coded example
My preferred method is with Bluebird and promises. Including retry and result collection in order, that could look something like this:
const request = require('request');
const Promise = require('bluebird');
const get = Promise.promisify(request.get);
let remoteUrls = [...]; // large array of URLs
const maxRetryCnt = 3;
const retryDelay = 500;
Promise.map(remoteUrls, function(url) {
let retryCnt = 0;
function run() {
return get(url).then(function(result) {
// do whatever you want with the result here
return result;
}).catch(function(err) {
// decide what your retry strategy is here
// catch all errors here so other URLs continue to execute
if (err is of retry type && retryCnt < maxRetryCnt) {
++retryCnt;
// try again after a short delay
// chain onto previous promise so Promise.map() is still
// respecting our concurrency value
return Promise.delay(retryDelay).then(run);
}
// make value be null if no retries succeeded
return null;
});
}
return run();
}, {concurrency: 10}).then(function(allResults) {
// everything done here and allResults contains results with null for err URLs
});
The simple way is to use async library, it has a .parallelLimit method that does exactly what you need.

How to abort processing a request in NodeJS http server

I have your standard callback-hell http server like (simplified):
http.createServer((req, res) => {
doSomething(() => {
doMoreSomethings(() => {
if (!isIt())
// explode A
if (!isItReally())
// explode B
req.on('data', (c) => {
if (bad(c))
// explode C
});
req.on('end', () => {
// explode D
});
});
});
};
The only way I know of to end the request is res.end(), which is asynchronous, so stuff after you call that will run (as far as I can tell). Is there a way to do a pseudo-process.exit mid-flight and abort processing the request, so I don't waste time on other stuff? For example how can I stop at the above // explode lines and not process anything else? throw doesn't leap over scope boundaries. Basically, I think, I want to return or break out of the createServer callback from within a nested callback. Is the only way to do this to be able to have code with a single path and just not have more than one statement at each step? Like, horizontal coding, instead of vertical KLOCs.
I'm a Node noob obviously.
Edit: Maybe another way to ask the question is: Is it possible to abort at the // explode C line, not read any more data, not process the end event, and not process any more lines in the doMoreSomethings callback (if there were any)? I'm guessing the latter isn't possible, because as far as I know, doMoreSomethings runs to completion, including attaching the data and end event handlers, and only then will the next event in the queue (probably a data event) get processed, at which point doMoreSomethings is done. But I am probably missing something. Does this code just zip through, call isIt and isItReally, assign the event handlers, hop out of all the callbacks, the createServer callback finishes, and then all the extra stuff (data/end/etc.) runs from the event queue?
Typically you will use a return statement. When you call return any code below it will not get executed. Although, if your functions isIt() or isItReally() are asynchronous functions then you will be running into trouble as those are used in a synchronous fashion.

dataTaskWithURL for dummies

I keep learning iDev but I still can't deal with http requests.
It seems to be crazy, but everybody whom I talk about synchronous requests do not understand me. Okay, it's really important to keep on a background queue as much as it possible to provide smooth UI. But in my case I load JSON data from server and I need to use this data immediately.
The only way I achieved it are semaphores. Is it okay? Or I have to use smth else? I tried NSOperation, but in fact I have to many little requests so creating each class for them for me seems to be not easy-reading-code.
func getUserInfo(userID: Int) -> User {
var user = User()
let linkURL = URL(string: "https://server.com")!
let session = URLSession.shared
let semaphore = DispatchSemaphore(value: 0)
let dataRequest = session.dataTask(with: linkURL) { (data, response, error) in
let json = JSON(data: data!)
user.userName = json["first_name"].stringValue
user.userSurname = json["last_name"].stringValue
semaphore.signal()
}
dataRequest.resume()
semaphore.wait(timeout: DispatchTime.distantFuture)
return user
}
You wrote that people don't understand you, but on the other hand it reveals that you don't understand how asynchronous network requests work.
For example imagine you are setting an alarm for a specific time.
Now you have two options to spend the following time.
Do nothing but sitting in front of the alarm clock and wait until the alarm occurs. Have you ever done that? Certainly not, but this is exactly what you have in mind regarding the network request.
Do several useful things ignoring the alarm clock until it rings. That is the way how asynchronous tasks work.
In terms of a programming language you need a completion handler which is called by the network request when the data has been loaded. In Swift you are using a closure for that purpose.
For convenience declare an enum with associated values for the success and failure cases and use it as the return value in the completion handler
enum RequestResult {
case Success(User), Failure(Error)
}
Add a completion handler to your function including the error case. It is highly recommended to handle always the error parameter of an asynchronous task. When the data task returns it calls the completion closure passing the user or the error depending on the situation.
func getUserInfo(userID: Int, completion:#escaping (RequestResult) -> ()) {
let linkURL = URL(string: "https://server.com")!
let session = URLSession.shared
let dataRequest = session.dataTask(with: linkURL) { (data, response, error) in
if error != nil {
completion(.Failure(error!))
} else {
let json = JSON(data: data!)
var user = User()
user.userName = json["first_name"].stringValue
user.userSurname = json["last_name"].stringValue
completion(.Success(user))
}
}
dataRequest.resume()
}
Now you can call the function with this code:
getUserInfo(userID: 12) { result in
switch result {
case .Success(let user) :
print(user)
// do something with the user
case .Failure(let error) :
print(error)
// handle the error
}
}
In practice the point in time right after your semaphore and the switch result line in the completion block is exactly the same.
Never use semaphores as an alibi not to deal with asynchronous patterns
I hope the alarm clock example clarifies how asynchronous data processing works and why it is much more efficient to get notified (active) rather than waiting (passive).
Don't try to force network connections to work synchronously. It invariably leads to problems. Whatever code is making the above call could potentially be blocked for up to 90 seconds (30 second DNS timeout + 60 second request timeout) waiting for that request to complete or fail. That's an eternity. And if that code is running on your main thread on iOS, the operating system will kill your app outright long before you reach the 90 second mark.
Instead, design your code to handle responses asynchronously. Basically:
Create data structures to hold the results of various requests, such as obtaining info from the user.
Kick off those requests.
When each request comes back, check to see if you have all the data you need to do something, and then do it.
For a really simple example, if you have a method that updates the UI with the logged in user's name, instead of:
[self updateUIWithUserInfo:[self getUserInfoForUser:user]];
you would redesign this as:
[self getUserInfoFromServerAndRun:^(NSDictionary *userInfo) {
[self updateUIWithUserInfo:userInfo];
}];
so that when the response to the request arrives, it performs the UI update action, rather than trying to start a UI update action and having it block waiting for data from the server.
If you need two things—say the userInfo and a list of books that the user has read, you could do:
[self getUserInfoFromServerAndRun:^(NSDictionary *userInfo) {
self.userInfo = userInfo;
[self updateUI];
}];
[self getBookListFromServerAndRun:^(NSDictionary *bookList) {
self.bookList = bookList;
[self updateUI];
}];
...
(void)updateUI
{
if (!self.bookList) return;
if (!self.userInfo) return;
...
}
or whatever. Blocks are your friend here. :-)
Yes, it's a pain to rethink your code to work asynchronously, but the end result is much, much more reliable and yields a much better user experience.

Resources