dataTaskWithURL for dummies - semaphore

I keep learning iOS development, but I still can't deal with HTTP requests.
It may sound crazy, but nobody I talk to about synchronous requests understands me. Okay, it's really important to stay on a background queue as much as possible to keep the UI smooth. But in my case I load JSON data from a server and I need to use this data immediately.
The only way I have achieved this is with semaphores. Is that okay, or do I have to use something else? I tried NSOperation, but I have too many little requests, so creating a class for each of them does not make for easy-to-read code.
func getUserInfo(userID: Int) -> User {
    var user = User()
    let linkURL = URL(string: "https://server.com")!
    let session = URLSession.shared
    let semaphore = DispatchSemaphore(value: 0)
    let dataRequest = session.dataTask(with: linkURL) { (data, response, error) in
        let json = JSON(data: data!)
        user.userName = json["first_name"].stringValue
        user.userSurname = json["last_name"].stringValue
        semaphore.signal()
    }
    dataRequest.resume()
    semaphore.wait(timeout: DispatchTime.distantFuture)
    return user
}

You wrote that people don't understand you, but on the other hand it reveals that you don't understand how asynchronous network requests work.
For example, imagine you are setting an alarm for a specific time.
Now you have two options for spending the time until it goes off:
1. Do nothing but sit in front of the alarm clock and wait until it rings. Have you ever done that? Certainly not, but this is exactly what you have in mind regarding the network request.
2. Do several useful things, ignoring the alarm clock until it rings. That is how asynchronous tasks work.
In terms of a programming language you need a completion handler which is called by the network request when the data has been loaded. In Swift you are using a closure for that purpose.
For convenience, declare an enum with associated values for the success and failure cases and pass it to the completion handler:
enum RequestResult {
    case Success(User), Failure(Error)
}
Add a completion handler to your function, including the error case. It is highly recommended to always handle the error parameter of an asynchronous task. When the data task returns, it calls the completion closure, passing the user or the error depending on the situation.
func getUserInfo(userID: Int, completion: @escaping (RequestResult) -> ()) {
    let linkURL = URL(string: "https://server.com")!
    let session = URLSession.shared
    let dataRequest = session.dataTask(with: linkURL) { (data, response, error) in
        if let error = error {
            // the request failed: report the error to the caller
            completion(.Failure(error))
        } else {
            let json = JSON(data: data!)
            var user = User()
            user.userName = json["first_name"].stringValue
            user.userSurname = json["last_name"].stringValue
            completion(.Success(user))
        }
    }
    dataRequest.resume()
}
Now you can call the function with this code:
getUserInfo(userID: 12) { result in
    switch result {
    case .Success(let user):
        print(user)
        // do something with the user
    case .Failure(let error):
        print(error)
        // handle the error
    }
}
In practice, the point right after your semaphore wait and the switch result line in the completion block are reached at exactly the same moment.
Never use semaphores as an alibi for not dealing with asynchronous patterns.
I hope the alarm clock example clarifies how asynchronous data processing works and why it is much more efficient to get notified (active) rather than waiting (passive).

Don't try to force network connections to work synchronously. It invariably leads to problems. Whatever code is making the above call could potentially be blocked for up to 90 seconds (30 second DNS timeout + 60 second request timeout) waiting for that request to complete or fail. That's an eternity. And if that code is running on your main thread on iOS, the operating system will kill your app outright long before you reach the 90 second mark.
Instead, design your code to handle responses asynchronously. Basically:
Create data structures to hold the results of various requests, such as obtaining info from the user.
Kick off those requests.
When each request comes back, check to see if you have all the data you need to do something, and then do it.
For a really simple example, if you have a method that updates the UI with the logged in user's name, instead of:
[self updateUIWithUserInfo:[self getUserInfoForUser:user]];
you would redesign this as:
[self getUserInfoFromServerAndRun:^(NSDictionary *userInfo) {
    [self updateUIWithUserInfo:userInfo];
}];
so that when the response to the request arrives, it performs the UI update action, rather than trying to start a UI update action and having it block waiting for data from the server.
If you need two things—say the userInfo and a list of books that the user has read, you could do:
[self getUserInfoFromServerAndRun:^(NSDictionary *userInfo) {
    self.userInfo = userInfo;
    [self updateUI];
}];
[self getBookListFromServerAndRun:^(NSDictionary *bookList) {
    self.bookList = bookList;
    [self updateUI];
}];
...
- (void)updateUI
{
    // do nothing until both responses have arrived
    if (!self.bookList) return;
    if (!self.userInfo) return;
    ...
}
or whatever. Blocks are your friend here. :-)
Yes, it's a pain to rethink your code to work asynchronously, but the end result is much, much more reliable and yields a much better user experience.

Related

How to handle errors from parallel web requests using Retrofit + RxJava?

I have a situation like this where I make some web requests in parallel. Sometimes I make these calls and all requests see the same error (e.g. no-network):
void main() {
    Observable.just("a", "b", "c")
        .flatMap(s -> makeNetworkRequest())
        .subscribe(
            s -> {
                // TODO
            },
            error -> {
                // handle error
            });
}
Observable<String> makeNetworkRequest() {
    return Observable.error(new NoNetworkException());
}
class NoNetworkException extends Exception {
}
Depending on the timing, if one request emits the NoNetworkException before the others can, Retrofit/RxJava will dispose/interrupt** the others. I'll see one of the following logs (not all three) for each request remaining in progress++:
<-- HTTP FAILED: java.io.IOException: Canceled
<-- HTTP FAILED: java.io.InterruptedIOException
<-- HTTP FAILED: java.io.InterruptedIOException: thread interrupted
I'll be able to handle the NoNetworkException error in the subscriber and everything downstream will get disposed of and all is OK.
However, depending on timing, if two or more web requests emit NoNetworkException, then the first one will trigger the events above, disposing of everything downstream. The second NoNetworkException will have nowhere to go and I'll get the dreaded UndeliverableException. This is the same as example #1 documented here.
In the above article, the author suggested using an error handler. Obviously retry/retryWhen don't make sense if I expect to hear the same errors again. I don't understand how onErrorResumeNext/onErrorReturn help here, unless I map them to something recoverable to be handled downstream:
Observable.just("a", "b", "c")
    .flatMap(s ->
        makeNetworkRequest()
            .onErrorReturn(error -> {
                // eat actual error and return something else
                return "recoverable error";
            }))
    .subscribe(
        s -> {
            if (s.equals("recoverable error")) {
                // handle error
            } else {
                // TODO
            }
        },
        error -> {
            // handle error
        });
but this seems wonky.
I know another solution is to set a global error handler with RxJavaPlugins.setErrorHandler(). This doesn't seem like a great solution either. I may want to handle NoNetworkException differently in different parts of my app.
So what other options do I have? What do other people do in this case? This must be pretty common.
** I don't fully understand who is interrupting/disposing of whom. Is RxJava disposing of all other requests in flatMap, which in turn causes Retrofit to cancel requests? Or does Retrofit cancel requests, resulting in each request in flatMap emitting one of the above IOExceptions? I guess it doesn't really matter for answering the question; I'm just curious.
++ It's possible that not all a, b, and c requests are in flight depending on thread pool.
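Have you tried using flatMap() with delayErrors = true? That overload holds back any error until every inner source has terminated, so a second NoNetworkException is not left without a subscriber.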

Changing State when Using Scala Concurrency

I have a function in my Controller that takes user input, and then, using an infinite loop, queries a database and sends the object returned from the database to a webpage. This all works fine, except that I needed to introduce concurrency in order to both run this logic and render the webpage.
The code is given by:
def getSearchResult = Action { request =>
  val search = request.queryString.get("searchInput").head.head
  val databaseSupport = new InteractWithDatabase(comm, db)
  val put = Future {
    while (true) {
      val data = databaseSupport.getFromDatabase(search)
      if (data.nonEmpty) {
        if (data.head.vendorId.equals(search)) {
          comm.communicator ! data.head
        }
      }
    }
  }
  Ok(views.html.singleElement.render)
}
The issue arises when I want to call this again, but with a different input. Because the first thread is in an infinite loop, it never ceases to run and is still running even when I start the second thread. Therefore, both objects are being sent to the webpage at the same time in two separate threads.
How can I stop the first thread once I call this function again? Or, is there a better implementation of this whole idea so that I could do it without using multithreading?
Note: I tried removing the concurrency from this function (as multithreading has been the thing giving me all of these problems) and instead moving it to the web socket itself, but this posed problems as the web socket is connected to a router, and everything connects to the web socket through the router.
Try AsyncAction (in Play 2.x this is written Action.async), where you return a Future[Result] as the result. Make the database call inside this Future. E.g. (pseudo code):
def getSearchResult = AsyncAction { request =>
  val search = request.queryString.get("searchInput").head.head
  val databaseSupport = new InteractWithDatabase(comm, db)
  Future {
    val data = databaseSupport.getFromDatabase(search)
    if (data.nonEmpty) {
      if (data.head.vendorId.equals(search)) {
        comm.communicator ! data.head // A
      }
    }
    Ok(views.html.singleElement.render)
  }
}
It would be better if databaseSupport.getFromDatabase(search) returned a Future, but that is a story for another day. The tricky part is to figure out how to deal with the Actor at "A". Just remember that on exit the block must return a Future[Result].

Flux - Isn't it a bad practice to include the dispatcher instance everywhere?

Note: My question is about the way of including/passing the dispatcher instance around, not about how the pattern is useful.
I am studying the Flux Architecture and I cannot get my head around the concept of the dispatcher (instance) potentially being included everywhere...
What if I want to trigger an Action from my Model Layer? It feels weird to me to include an instance of an object in my Model files... I feel like this is missing some injection pattern...
I have the impression that the exact PHP equivalent is something that feels horribly similar to:
<?php
$dispatcher = require '../dispatcher_instance.php';
class MyModel {
...
public function someMethod() {
...
$dispatcher->...
}
}
I think my question is not exactly only related to the Flux Architecture but more to the NodeJS "way of doing things"/practices in general.
TLDR:
No, it is not bad practice to pass around the instance of the dispatcher in your stores
All data stores should have a reference to the dispatcher
The invoking/consuming code (in React, this is usually the view) should only have references to the action-creators, not the dispatcher
Your code doesn't quite align with React because you are creating a public mutable function on your data store.
The ONLY way to communicate with a store in Flux is via message passing which always flows through the dispatcher.
For example:
var assign = require('object-assign');
var EventEmitter = require('events').EventEmitter;
var Dispatcher = require('MyAppDispatcher');
var ExampleActions = require('ExampleActions');
// ACTIONS is assumed to be your app's map of action-type constants, defined elsewhere.
var _data = 10;
var ExampleStore = assign({}, EventEmitter.prototype, {
    getData() {
        return _data;
    },
    emitChange() {
        this.emit('change');
    },
    dispatcherKey: Dispatcher.register(payload => {
        var {action} = payload;
        switch (action.type) {
            case ACTIONS.ADD_1:
                _data += 1;
                ExampleStore.emitChange();
                ExampleActions.doThatOtherThing();
                break;
        }
    })
});
module.exports = ExampleStore;
By closing over _data instead of having a data property directly on the store, you can enforce the message passing rule. It's a private member.
Also important to note: although you can call Dispatcher.dispatch() directly, it's not a good idea.
There are two main reasons to go through the action-creators:
Consistency - This is how your views and other consuming code interact with the stores
Easier Refactoring - If you ever remove the ADD_1 action from your app, this code will throw an exception rather than silently failing by sending a message that doesn't match any of the switch statements in any of the stores
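To make the action-creator idea concrete, here is a minimal, self-contained sketch. The tiny dispatcher in it is a stand-in written for this example, not the real Flux Dispatcher, and the names ACTIONS, ExampleActions.add1 and the payload shape are assumptions chosen to mirror the store above.

```javascript
// Stand-in dispatcher for illustration only (not the real Flux Dispatcher).
var callbacks = [];
var Dispatcher = {
  register: function (callback) {
    callbacks.push(callback);
    return callbacks.length - 1; // token identifying the registration
  },
  dispatch: function (payload) {
    callbacks.forEach(function (callback) { callback(payload); });
  }
};

// Hypothetical action-type constants.
var ACTIONS = { ADD_1: 'ADD_1' };

// Action creators: the only thing views need to know about.
var ExampleActions = {
  add1: function () {
    Dispatcher.dispatch({ action: { type: ACTIONS.ADD_1 } });
  }
};

// Store: private state, changed only in response to dispatched messages.
var _data = 10;
var ExampleStore = {
  getData: function () { return _data; }
};
Dispatcher.register(function (payload) {
  if (payload.action.type === ACTIONS.ADD_1) {
    _data += 1;
  }
});

// A view calls the action creator and never touches the dispatcher.
ExampleActions.add1();
console.log(ExampleStore.getData()); // prints 11
```

If add1 were later removed from ExampleActions, the call ExampleActions.add1() would throw a TypeError at the call site instead of silently dispatching a message that no store handles.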
Main Advantages to this Approach
Loose coupling - Adding and removing features is a breeze. Stores can respond to any event in the system by adding one line of code.
Less complexity - One-way data flow makes wrapping your head around the data flow a lot easier. Fewer interdependencies.
Easier debugging - You can debug every change in your system with a few lines of code.
debugging example:
var MyAppDispatcher = require('MyAppDispatcher');
MyAppDispatcher.register(payload => {
console.debug(payload);
});

Repeat asynchronous requests to the server

What I want to do is repeat GET requests to a server asynchronously over and over again so that I can synchronize the local data with the remote one. I'd like to do this by using Futures without involving akka because I just want to understand the basic idea of how to do that at the lower level. No async and await preferably either because they are kind of the high level functions for Futures and Promises, thus I'd like to use Futures and Promises themselves.
So these are my functions:
def sendHttpRequestToServer(): String = { ... }
def send: Unit = {
  val f = future { sendHttpRequestToServer() }
  f onComplete {
    case Success(x) =>
      processResult(x) // do something with result "x"
      send // delay if needed and send the request again
    case Failure(e) =>
      logException(e)
      send // send the request again
  }
}
That's what I think it might look like. How could I change it? Is there any mistake in the algorithm? Your thoughts?
UPDATE:
As I already know, futures are not designed for recurring tasks, only for one time ones. Therefore, they can't be used here. What do I use then?
Your code has some issues with exception handling. If an exception is thrown in processResult or logException the send will no longer occur, breaking your loop. This exception will also not be logged. A better way is:
f.map(processResult).onFailure { case e => logException(e) }
f.onComplete(_ => send)
This way the send still happens despite exceptions in processResult or logException, exceptions from processResult are logged, and the next send can begin immediately while the results are still being processed. If you want to wait until processing is complete, you could do:
val f2 = f.map(processResult).recover { case e => logException(e) }
f2.onComplete(_ => send)

Run NodeJS event loop / wait for child process to finish

I first tried a general description of the problem, then some more detail why the usual approaches don't work. If you would like to read these abstracted explanations go on. In the end I explain the greater problem and the specific application, so if you would rather read that, jump to "Actual application".
I am using a node.js child process to do some computationally intensive work. The parent process does its work, but at some point in the execution it reaches a point where it must have the information from the child process before continuing. Therefore, I am looking for a way to wait for the child process to finish.
My current setup looks somewhat like this:
importantDataCalculator = fork("./runtime");
importantDataCalculator.on("message", function (msg) {
    if (msg.type === "result") {
        importantData = msg.data;
    } else if (msg.type === "error") {
        importantData = null;
    } else {
        throw new Error("Unknown message from dataGenerator!");
    }
});
and somewhere else
function getImportantData() {
    while (importantData === undefined) {
        // wait for the importantDataGenerator to finish
    }
    if (importantData === null) {
        throw new Error("Data could not be generated.");
    } else {
        // we should have a proper data now
        return importantData;
    }
}
So when the parent process starts, it executes the first bit of code, spawning a child process to calculate the data, and goes on doing its own bit of work. When the time comes that it needs the result from the child process to continue, it calls getImportantData(). So the idea is that getImportantData() blocks until the data is calculated.
However, the way I used doesn't work. I think this is because I prevent the event loop from executing by using the while-loop. And since the event loop does not execute, no message from the child process can be received, so the condition of the while-loop cannot change, making it an infinite loop.
Of course, I don't really want to use this kind of while-loop. What I would rather do is tell node.js "execute one iteration of the event loop, then get back to me". I would do this repeatedly until the data I need was received, and then continue the execution where I left off by returning from the getter.
I realize that this poses the danger of reentering the same function several times, but the module I want to use this in does almost nothing on the event loop except for waiting for this message from the child process and sending out other messages reporting its progress, so that shouldn't be a problem.
Is there a way to execute just one iteration of the event loop in Node.js? Or is there another way to achieve something similar? Or is there a completely different approach to achieve what I'm trying to do here?
The only solution I could think of so far is to change the calculation in such a way that I introduce yet another process. In this scenario, there would be the process calculating the important data, a process calculating the bits of data for which the important data is not needed and a parent process for these two, which just waits for data from the two child-processes and combines the pieces when they arrive. Since it does not have to do any computationally intensive work itself, it can just wait for events from the event loop (=messages) and react to them, forwarding the combined data as necessary and storing pieces of data that cannot be combined yet.
However this introduces yet another process and even more inter-process communication, which introduces more overhead, which I would like to avoid.
Edit
I see that more detail is needed.
The parent process (let's call it process 1) is itself a process spawned by another process (process 0) to do some computationally intensive work. Actually, it just executes some code over which I don't have control, so I cannot make it work asynchronously. What I can do (and have done) is make the code that is executed regularly call a function to report its progress and provide partial results. This progress report is then sent back to the original process via IPC.
But in rare cases the partial results are not correct, so they have to be modified. To do so I need some data I can calculate independently from the normal calculation. However, this calculation could take several seconds; thus, I start another process (process 2) to do this calculation and provide the result to process 1 via an IPC message. Now process 1 and 2 are happily calculating their stuff, and hopefully the corrective data calculated by process 2 is finished before process 1 needs it. But sometimes one of the early results of process 1 needs to be corrected, and in that case I have to wait for process 2 to finish its calculation. Blocking the event loop of process 1 is theoretically not a problem, since the main process (process 0) would not be affected by it. The only problem is that by preventing the further execution of code in process 1 I am also blocking the event loop, which prevents it from ever receiving the result from process 2.
So I need to somehow pause the further execution of code in process 1 without blocking the event loop. I was hoping that there was a call like process.runEventLoopIteration that executes an iteration of the event loop and then returns.
I would then change the code like this:
function getImportantData() {
    while (importantData === undefined) {
        process.runEventLoopIteration();
    }
    if (importantData === null) {
        throw new Error("Data could not be generated.");
    } else {
        // we should have a proper data now
        return importantData;
    }
}
thus executing the event loop until I have received the necessary data but NOT continuing the execution of the code that called getImportantData().
Basically what I'm doing in process 1 is this:
function callback(partialDataMessage) {
    if (partialDataMessage.needsCorrection) {
        getImportantData();
        // use data to correct message
        process.send(correctedMessage); // send corrected result to main process
    } else {
        process.send(partialDataMessage); // send unmodified result to main process
    }
}
function executeCode(code) {
    run(code, callback); // the callback will be called from time to time when the code produces new data
    // this call is synchronous, run is blocking until the calculation is finished
    // so if we reach this point we are done
    // the only way to pause the execution of the code is to NOT return from the callback
}
Actual application/implementation/problem
I need this behaviour for the following application. If you have a better approach to achieve this feel free to propose it.
I want to execute arbitrary code and be notified about what variables it changes, what functions are called, what exceptions occur etc. I also need the location of these events in the code to be able to display the gathered information in the UI next to the original code.
To achieve this, I instrument the code and insert callbacks into it. I then execute the code, wrapping the execution in a try-catch block. Whenever the callback is called with some data about the execution (e.g. a variable change) I send a message to the main process telling it about the change. This way, the user is notified about the execution of the code, while it is running. The location information for the events generated by these callbacks is added to the callback call during the instrumentation, so that is not a problem.
The problem appears when an exception occurs. I also want to notify the user about exceptions in the tested code. Therefore, I wrapped the execution of the code in a try-catch, and any exceptions that escape the execution are caught and sent to the user interface. But the location of the errors is not correct. An Error object created by node.js has a complete call stack, so it knows where it occurred. But this location is relative to the instrumented code, so I cannot use this location information as is to display the error next to the original code. I need to transform this location in the instrumented code into a location in the original code. To do so, after instrumenting the code, I calculate a source map to map locations in the instrumented code to locations in the original code. However, this calculation might take several seconds. So, I figured, I would start a child process to calculate the source map while the execution of the instrumented code is already under way. Then, when an exception occurs, I check whether the source map has already been calculated, and if it hasn't, I wait for the calculation to finish to be able to correct the location.
Since the code to be executed and watched can be completely arbitrary I cannot trivially rewrite it to be asynchronous. I only know that it calls the provided callback, because I instrumented the code to do so. I also cannot just store the message and return to continue the execution of the code, checking back during the next call whether the source map has been finished, because continuing the execution of the code would also block the event-loop, preventing the calculated source map from ever being received in the execution process. Or if it is received, then only after the code to execute has completely finished, which could be quite late or never (if the code to execute contains an infinite loop). But before I receive the sourceMap I cannot send further updates about the execution state. Combined, this means I would only be able to send the corrected progress messages after the code to execute has finished (which might be never) which completely defeats the purpose of the program (to enable the programmer to watch what the code does, while it executes).
Temporarily surrendering control to the event loop would solve this problem. However, that does not seem to be possible. The other idea I have is to introduce a third process which controls both the execution process and the sourceMapGeneration process. It receives progress messages from the execution process and if any of the messages needs correction it waits for the sourceMapGeneration process. Since the processes are independent, the controlling process can store the received messages and wait for the sourceMapGeneration process while the execution process continues executing, and as soon as it receives the source map, it corrects the messages and sends all of them off.
However, this would not only require yet another process (overhead), it also means I have to transfer the code once more between processes, and since the code can have thousands of lines, that in itself can take some time, so I would like to move it around as little as possible.
I hope this explains why I cannot use, and didn't use, the usual "asynchronous callback" approach.
Adding a third ( :) ) solution to your problem: now that you have clarified what behavior you seek, I suggest using Fibers.
Fibers let you do coroutines in nodejs. Coroutines are functions that allow multiple entry/exit points. This means you will be able to yield control and resume it as you please.
Here is a sleep function from the official documentation that does exactly that: it sleeps for a given amount of time and then performs actions.
var Fiber = require('fibers');
function sleep(ms) {
    var fiber = Fiber.current;
    setTimeout(function() {
        fiber.run();
    }, ms);
    Fiber.yield();
}
Fiber(function() {
    console.log('wait... ' + new Date);
    sleep(1000);
    console.log('ok... ' + new Date);
}).run();
console.log('back in main');
You can place the code that does the waiting for the resource in a function, causing it to yield and then run again when the task is done.
For example, adapting your example from the question:
var Fiber = require('fibers');
var fork = require('child_process').fork;
var pausedExecution, importantData;
function getImportantData() {
    while (importantData === undefined) {
        pausedExecution = Fiber.current;
        Fiber.yield();
        pausedExecution = undefined;
    }
    if (importantData === null) {
        throw new Error("Data could not be generated.");
    } else {
        // we should have proper data now
        return importantData;
    }
}
function callback(partialDataMessage) {
    if (partialDataMessage.needsCorrection) {
        var theData = getImportantData();
        // use data to correct message
        process.send(correctedMessage); // send corrected result to main process
    } else {
        process.send(partialDataMessage); // send unmodified result to main process
    }
}
function executeCode(code) {
    // setup child process to calculate the data
    importantDataCalculator = fork("./runtime");
    importantDataCalculator.on("message", function (msg) {
        if (msg.type === "result") {
            importantData = msg.data;
        } else if (msg.type === "error") {
            importantData = null;
        } else {
            throw new Error("Unknown message from dataGenerator!");
        }
        if (pausedExecution) {
            // execution is waiting for the data
            pausedExecution.run();
        }
    });
    // wrap the execution of the code in a Fiber, so it can be paused
    Fiber(function () {
        runCodeWithCallback(code, callback); // the callback will be called from time to time when the code produces new data
        // this callback is synchronous and blocking,
        // but it will yield control to the event loop if it has to wait for the child-process to finish
    }).run();
}
Good luck! I always say it is better to solve one problem in 3 ways than to solve 3 problems the same way. I'm glad we were able to work out something that worked for you. Admittedly, this was a pretty interesting question.
The rule of asynchronous programming is, once you've entered asynchronous code, you must continue to use asynchronous code. While you can continue to call the function over and over via setImmediate or something of the sort, you still have the issue that you're trying to return from an asynchronous process.
Without knowing more about your program, I can't tell you exactly how you should structure it, but by and large the way to "return" data from a process that involves asynchronous code is to pass in a callback; perhaps this will put you on the right track:
function getImportantData(callback) {
    importantDataCalculator = fork("./runtime");
    importantDataCalculator.on("message", function (msg) {
        if (msg.type === "result") {
            callback(null, msg.data);
        } else if (msg.type === "error") {
            callback(new Error("Data could not be generated."));
        } else {
            callback(new Error("Unknown message from sourceMapGenerator!"));
        }
    });
}
You would then use this function like this:
getImportantData(function(error, data) {
    if (error) {
        // handle the error somehow
    } else {
        // `data` is the data from the forked process
    }
});
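On a Node version with native Promises, the same callback contract can also be wrapped in a Promise. The following is only a sketch: an EventEmitter named fakeChild stands in for the forked process so that the example runs on its own, and the message shape ({ type, data }) is taken from the question. In real code the argument would be the object returned by fork.

```javascript
var EventEmitter = require('events').EventEmitter;

// Stand-in for the forked child process (an assumption, so the sketch is runnable).
var fakeChild = new EventEmitter();

// Wrap the one-shot "message" event in a Promise.
function getImportantData(child) {
  return new Promise(function (resolve, reject) {
    child.once('message', function (msg) {
      if (msg.type === 'result') {
        resolve(msg.data);
      } else if (msg.type === 'error') {
        reject(new Error('Data could not be generated.'));
      } else {
        reject(new Error('Unknown message from child process!'));
      }
    });
  });
}

var pending = getImportantData(fakeChild);
pending.then(function (data) {
  console.log('received', data);
});

// Simulate the child reporting its result a little later.
setImmediate(function () {
  fakeChild.emit('message', { type: 'result', data: 42 });
});
```

The caller still has to continue its work inside then(), so this does not make the code synchronous; it only changes how the continuation is written.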
I talk about this in a bit more detail in one of my screencasts, Thinking Asynchronously.
What you are running into is a very common scenario that skilled programmers who are starting with nodejs often struggle with.
You're correct. You can't do this the way you are attempting (loop).
The main process in node.js is single threaded and you are blocking the event loop.
The simplest way to resolve this is something like:
function getImportantData() {
    if (importantData === undefined) { // not set yet
        setImmediate(getImportantData); // try again on the next event loop cycle
        return; // stop this attempt
    }
    if (importantData === null) {
        throw new Error("Data could not be generated.");
    } else {
        // we should have a proper data now
        return importantData;
    }
}
What we are doing is re-attempting to process the data on the next iteration of the event loop, using setImmediate.
This introduces a new problem, though: your function returns a value. Since the data will not be ready, the value you return is undefined. So you have to code reactively: you need to tell your code what to do when the data arrives.
This is typically done in node with a callback:
function getImportantData(err, whenDone) {
    if (importantData === undefined) { // not set yet
        // pass both callbacks along when retrying on the next event loop cycle
        setImmediate(getImportantData.bind(null, err, whenDone));
        return; // stop this attempt
    }
    if (importantData === null) {
        err("Data could not be generated.");
    } else {
        // we should have a proper data now
        whenDone(importantData);
    }
}
This can be used in the following way:
getImportantData(function(err) {
    throw new Error(err); // error handling function callback
}, function(data) { // this is whenDone in our case
    // perform actions on the important data
})
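Polling with setImmediate keeps the event loop busy until the data arrives. A variation, sketched here under the assumption that the child's "message" handler is the only place where importantData is set, is to queue the waiting callbacks and flush them from that handler. The names waiting and onChildMessage are made up for this example.

```javascript
var importantData;      // undefined until the child process reports back
var waiting = [];       // callbacks queued while the data is missing

function getImportantData(err, whenDone) {
  if (importantData === undefined) {
    waiting.push({ err: err, whenDone: whenDone }); // answer later
  } else if (importantData === null) {
    err('Data could not be generated.');
  } else {
    whenDone(importantData);
  }
}

// Call this from the child's "message" handler.
function onChildMessage(msg) {
  importantData = msg.type === 'result' ? msg.data : null;
  var queued = waiting;
  waiting = [];
  queued.forEach(function (w) {
    getImportantData(w.err, w.whenDone);
  });
}

// Usage: the request is made before the data exists...
var received = [];
getImportantData(
  function (message) { received.push('error: ' + message); },
  function (data) { received.push(data); }
);
// ...and is answered as soon as the (simulated) message arrives.
onChildMessage({ type: 'result', data: 'source map' });
console.log(received); // prints [ 'source map' ]
```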
Your question (updated) is very interesting, it appears to be closely related to a problem I had with asynchronously catching exceptions. (Also Brandon and Ihad an interesting discussion with me about it! It's a small world)
See this question on how to catch exceptions asynchronously. The key concept is that you can use (assuming nodejs 0.8+) nodejs domains to constrain the scope of an exception.
This will allow you to easily get the location of the exception since you can surround asynchronous blocks with atry/catch. I think this should solve the bigger issue here.
You can find the relevant code in the linked question. The usage is something like:
atry(function() {
    setTimeout(function() {
        throw "something";
    }, 1000);
}).catch(function(err) {
    console.log("caught " + err);
});
Since you have access to the scope of atry you can get the stack trace there which would let you skip the more complicated source-map usage.
Good luck!

Resources