My input is streamed from another source, which makes it difficult to use async.forEach. I am pulling data from an API endpoint, but I have a limit of 1000 objects per request, and I need to get hundreds of thousands of them (basically all of them). I will know I'm finished when a response contains fewer than 1000 objects. Now, I have tried this approach:
/* List all deposits */
var depositsAll = [];
var depositsIteration = [];
async.doWhilst(this._post(endpoint_path, function (err, response) {
    // check err
    /* Loop through the data and gather only the deposits */
    for (var key in response) {
        // do some stuff
    }
    depositsAll += depositsIteration;
    return callback(null, depositsAll);
}, {limit: 1000, offset: 0, sort: 'desc'}),
response.length > 1000, function (err, depositsAll) {
    // check for err
    // return the complete result
    return callback(null, depositsAll);
});
With this code I get an internal async error saying that iterator is not a function. In general, though, I'm fairly sure the logic isn't correct either.
In case it's not clear what I'm trying to achieve: I need to perform a request multiple times and append each response's data to a result that, at the end, contains all the results, so I can return it. And I need to keep making requests until a response contains fewer than 1000 objects.
I also looked into async.queue but could not get the hang of it...
Any ideas?
You should be able to do it like that, but if that example is from your real code you have misunderstood parts of how async works. doWhilst takes three arguments, each of them a function:
The function to be called by async. It gets a callback argument that must be called. In your case, you need to wrap this._post inside another function.
The test function. You are passing the value of response.length > 1000, i.e. a boolean (and response isn't even defined at that point), where a function is expected.
The final function to be called once execution has stopped.
Example with each needed function separated for readability:
var depositsAll = [];
var responseLength = 1000;
var self = this;

var post = function(asyncCb) {
    self._post(endpoint_path, function(err, res) {
        ...
        responseLength = res.length;
        asyncCb(err, depositsAll);
    });
};

var check = function() {
    return responseLength >= 1000;
};

var done = function(err, deposits) {
    console.log(deposits);
};

async.doWhilst(post, check, done);
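One thing the sketch above leaves out is advancing the offset and accumulating the results between iterations. A minimal, untested sketch of that (assuming async v2's synchronous test function, the this._post(path, callback, options) signature from the question, and that endpoint_path and the outer callback are in scope as in the question):
var depositsAll = [];
var responseLength = 1000;
var options = {limit: 1000, offset: 0, sort: 'desc'};
var self = this;

var post = function(asyncCb) {
    self._post(endpoint_path, function(err, res) {
        if (err) return asyncCb(err);
        responseLength = res.length;
        // gather the results from this page (stands in for the for-in loop in the question)
        depositsAll = depositsAll.concat(res);
        // advance the offset so the next request fetches the next page
        options.offset += options.limit;
        asyncCb(null, depositsAll);
    }, options);
};

var check = function() {
    // keep going while the last page was full
    return responseLength >= 1000;
};

async.doWhilst(post, check, function(err, deposits) {
    // deposits now holds every page's results
    return callback(err, deposits);
});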
Related
I have been working with Node.js code where I make a call to an API. I get a paginated response of 50 objects at a time. In a query parameter I set an offset of 0, 50, 100, ... to keep fetching more data: if a response has 50 objects I increase my offset parameter, and once a response has fewer than 50 objects I stop calling. Is there any way I can split the calls to get the data faster? For example, could I call api?offset=0, api?offset=50, api?offset=100, api?offset=150 in parallel, collect the data from all the calls, and stop issuing calls for further offset values as soon as one returns no data?
NOTE: I don't know the offset limit.
Use a recursive function
Since you don't know what the offset limit will be, your best bet is to create groups of requests, so you reach the end of the data faster.
For async tasks like this I like to use the async library. Here is an example:
const async = require('async')
const offsets = [0, 50, 100, 150];

async.map(offsets, function (offset, callback) {
    // Do request here with 'offset'
    // And return response in the callback
    // Do not return an error, otherwise the main callback (of map function)
    // will be called immediately
    return callback(null, response);
}, function (error, results) {
    // results is an array containing all responses from all requests
    // iterate over results, if all contain a valid result,
    // you can increase your offset group and perform this operation again
});
Of course you will need to wrap this in another function that keeps increasing the offset group and repeating the requests as long as you get valid results, i.e. a recursive function:
const async = require('async')

function performGroupRequest (offsets, allResults, end) {
    async.map(offsets, function (offset, callback) {
        // Do request here with 'offset' and return response in the callback
        return callback(null, response);
    }, function (error, results) {
        // check results
        var allValid = true;
        results.forEach(r => {
            if (hasItems(r)) {
                // valid result, add to allResults
                allResults.push(r);
            } else {
                // empty result, remember that we should stop after this group
                allValid = false;
            }
        });
        if (!allValid) {
            // at least one page was empty, so we are done
            return end(allResults);
        }
        // all results are valid here, so call the function again with increased offsets
        performGroupRequest(offsets.map(o => o + 200), allResults, end);
    });
}

function getAllItems () {
    return new Promise((resolve, reject) => {
        performGroupRequest([0, 50, 100, 150], [], function(all) {
            return resolve(all);
        });
    });
}

// start the journey
getAllItems().then((allResults) => {
});
👆 Please note: I totally omitted error handling here for simplicity reasons.
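For completeness, the "Do request here" placeholder could look roughly like this, reusing the offsets array from above and assuming a hypothetical paginated endpoint queried with the request library (the URL, query parameter names, and body.items are made up for illustration):
const async = require('async');
const request = require('request');

async.map(offsets, function (offset, callback) {
    // hypothetical endpoint; swap in the real URL and query parameters
    request.get({ url: 'https://api.example.com/items', qs: { limit: 50, offset: offset }, json: true },
        function (error, response, body) {
            // per the note above: don't propagate errors, or async.map aborts immediately;
            // treat a failed page as an empty result instead
            if (error) return callback(null, []);
            return callback(null, body.items);
        });
}, function (error, results) {
    // results[i] is the page fetched for offsets[i]
});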
Many people have asked on this site how to loop through a list of URLs and make a GET request to each of them. This doesn't exactly serve my purpose, as the number of times I make a GET request will be dependent on the values I get from the initial API request.
As a general outline of what I currently have:
var total = null;
var curr = 0;
while (total == null || curr < total) {
    request.get('https://host.com/skip=' + curr, function(error, response, body) {
        var data = JSON.parse(body);
        total = data['totalItems'];
        curr += data.items.length;
    });
}
Due to how Node.js handles asynchronous requests, this gives me an infinite loop, as total and curr always stay null and 0 respectively. I'm not really sure how to rework this to use Promises and callbacks; can someone please help?
So there's a few ways to do this, but the easiest is probably to just recurse on the function that fetches the results.
It's not tested but should be in the ballpark:
function fetch(skip, accumulator, cb) {
    // do some input sanitization
    request.get('https://host.com/skip=' + skip, (err, res, body) => {
        // commonly you'd just call back with the error, but this way you keep
        // any results you've already fetched if a later request fails.
        if (err) return cb(err, accumulator);
        var data = JSON.parse(body);
        accumulator.total = data['totalItems'];
        accumulator.items = accumulator.items.concat(data.items);
        if (accumulator.items.length === accumulator.total) return cb(null, accumulator);
        return fetch(accumulator.items.length, accumulator, cb);
    });
}
fetch(0, { items: [] }, console.log);
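Since the question mentions Promises: a minimal, untested wrapper around the callback-based fetch above could look like this:
function fetchAll() {
    return new Promise(function (resolve, reject) {
        fetch(0, { items: [] }, function (err, accumulator) {
            if (err) return reject(err);
            resolve(accumulator);
        });
    });
}

fetchAll()
    .then(function (accumulator) { console.log(accumulator.items.length + ' items'); })
    .catch(console.error);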
I'm developing an app with the following node.js stack: Express/Socket.IO + React. In React I have DataTables, wherein you can search and with every keystroke the data gets dynamically updated! :)
I use Socket.IO for data fetching, so on every keystroke the client socket emits some parameters and the server then invokes the callback to return data. This works like a charm, but it is not guaranteed that the returned data comes back in the same order the client sent the requests.
To simulate this, the server just responds with the same character it received, one response per keystroke.
I found the async module for Node.js and tried to use its queue to return tasks in the same order it received them. For simplicity I delayed the first incoming task ('a') with setTimeout to simulate a slow database query:
Declaration:
const async = require('async');
var queue = async.queue(function(task, callback) {
    if (task.count == 1) {
        setTimeout(function() {
            callback();
        }, 3000);
    } else {
        callback();
    }
}, 10);
Usage:
socket.on('result', function(data, fn) {
    var filter = data.filter;
    if (filter.length === 1) { // TEST SYNCHRONOUSLY
        queue.push({name: filter, count: 1}, function(err) {
            fn(filter);
            // console.log('finished processing slow');
        });
    } else {
        // add some items to the queue
        queue.push({name: filter, count: filter.length}, function(err) {
            fn(data.filter);
            // console.log('finished processing fast');
        });
    }
});
But the order in which the responses arrive in the client console when I search for abc is as follows:
ab -> abc -> a(after 3 sec)
I want it to return it like this: a(after 3sec) -> ab -> abc
My thinking is that the queue starts the setTimeout, moves on, and the setTimeout only fires later on the event loop. This results in the later search filters being returned earlier than the slow-performing one.
How can I solve this problem?
First a few comments, which might help clear up your understanding of async calls:
Using a timeout to try to align async calls is a bad idea; that is not what async calls are about. You never know how long an async call will take, so you can never set an appropriate timeout.
I believe you are misunderstanding the usage of queue from the async library. The documentation for the queue can be found here.
Copy-pasting the documentation here, in case it changes or the link goes down:
Creates a queue object with the specified concurrency. Tasks added to the queue are processed in parallel (up to the concurrency limit). If all workers are in progress, the task is queued until one becomes available. Once a worker completes a task, that task's callback is called.
The above means that the queue only controls which tasks the workers pick up and how many run concurrently; the individual async tasks can still finish at different times.
Potential solutions
There are a few solutions to your problem, depending on your requirements.
You can only send one async call at a time and wait for it to finish before sending the next one (see the sketch after this list)
You store the results and only display the results to the user when all calls have finished
You disregard all calls except for the latest async call
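As a reference, here is a rough sketch of solution 1, reusing the queue and socket handler from the question: lowering the concurrency to 1 makes the queue run one task at a time, so the callbacks (and therefore the responses) fire in the order the requests arrived, at the cost of serializing all queries.
const async = require('async');

// concurrency of 1: the next task only starts once the previous one has called back
var queue = async.queue(function(task, callback) {
    task.run(callback);
}, 1);

socket.on('result', function(data, fn) {
    queue.push({
        run: function(callback) {
            // perform the (possibly slow) database query here, then:
            fn(data.filter);
            callback();
        }
    });
});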
In your case I would pick solution 3, since you are searching for something. Why would you care about the results for "a" if the user is already searching for "abc" before the response for "a" arrives?
This can be done by giving each request a timestamp (or, as in the solution below, an incrementing counter) and then only keeping the response with the latest one.
SOLUTION:
Server:
exports = module.exports = function(io) {
    io.sockets.on('connection', function (socket) {
        socket.on('result', function(data, fn) {
            var filter = data.filter;
            var counter = data.counter;
            if (filter.length === 1 || filter.length === 5) { // TEST SYNCHRONOUSLY
                setTimeout(function() {
                    fn({ filter: filter, counter: counter }); // return to client
                }, 3000);
            } else {
                fn({ filter: filter, counter: counter }); // return to client
            }
        });
    });
}
Client:
export class FilterableDataTable extends Component {
    constructor(props) {
        super(props);
        this.state = {
            endpoint: "http://localhost:3001",
            filters: {},
            counter: 0
        };
        this.onLazyLoad = this.onLazyLoad.bind(this);
    }

    onLazyLoad(event) {
        var offset = event.first;
        if (offset === null) {
            offset = 0;
        }
        var filter = ''; // filter is the search character
        if (event.filters.result2 != undefined) {
            filter = event.filters.result2.value;
        }
        var returnedData = null;
        this.state.counter++; // mutated directly because it is only a request counter
        this.socket.emit('result', {
            offset: offset,
            limit: 20,
            filter: filter,
            counter: this.state.counter
        }, (data) => { // arrow function so 'this' still refers to the component
            returnedData = data;
            console.log(returnedData);
            if (returnedData.counter === this.state.counter) {
                console.log('DATA: ' + JSON.stringify(returnedData));
            }
        });
    }
}
This does however send unneeded data to the client, which in turn ignores it. Does anybody have ideas for further optimizing this kind of communication? For example, a way to keep track on the server and only send the latest data?
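One possible optimization, sketched below and untested: let the server remember the highest counter it has seen per socket and simply not reply to requests that are already outdated, so stale results never go over the wire (runQuery is a made-up placeholder for the real lookup):
exports = module.exports = function(io) {
    io.sockets.on('connection', function (socket) {
        var latestCounter = 0; // highest counter seen on this connection
        socket.on('result', function(data, fn) {
            latestCounter = Math.max(latestCounter, data.counter);
            runQuery(data.filter, function(results) {
                if (data.counter < latestCounter) {
                    return; // a newer request arrived in the meantime; drop this response
                }
                fn({ filter: data.filter, counter: data.counter, results: results });
            });
        });
    });
}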
I'm in the process of building a file upload component that allows you to pause/resume file uploads.
The standard way to achieve this seems to be to break the file into chunks on the client machine, then send the chunks along with book-keeping information up to the server which can store the chunks into a staging directory, then merge them together when it has received all of the chunks. So, this is what I am doing.
I am using node/express and I'm able to get the files fine, but I'm running into an issue because my merge_chunks function is being invoked multiple times.
Here's my call stack:
router.post('/api/videos',
    upload.single('file'),
    validate_params,
    rename_uploaded_chunk,
    check_completion_status,
    merge_chunks,
    record_upload_date,
    videos.update,
    send_completion_notice
);
The check_completion_status function is implemented as follows:
/* Recursively check to see if we have every chunk of a file */
var check_completion_status = function (req, res, next) {
    var current_chunk = 1;
    var see_if_chunks_exist = function () {
        fs.exists(get_chunk_file_name(current_chunk, req.file_id), function (exists) {
            if (current_chunk > req.total_chunks) {
                next();
            } else if (exists) {
                current_chunk++;
                see_if_chunks_exist();
            } else {
                res.sendStatus(202);
            }
        });
    };
    see_if_chunks_exist();
};
The file names in the staging directory have the chunk numbers embedded in them, so the idea is to see if we have a file for every chunk number. The function should only next() one time for a given (complete) file.
However, my merge_chunks function is being invoked multiple times (usually between 1 and 4). Logging reveals that it's only invoked after I've received all of the chunks.
With this in mind, my assumption is that it's the async nature of the fs.exists function that's causing the issue.
Even though the nth invocation of check_completion_status may start before I have all of the chunks, by the time we get to the nth call to fs.exists(), more chunks may have arrived and been processed concurrently, so the function can keep going and, in some cases, reach the end and call next(). However, those chunks that arrived concurrently also correspond to their own invocations of check_completion_status, which will also call next(), because at that point we obviously have all of the files.
This is causing issues because I didn't account for this when I wrote merge_chunks.
For completeness, here's the merge_chunks function:
var merge_chunks = (function () {
    var pipe_chunks = function (args) {
        args.chunk_number = args.chunk_number || 1;
        if (args.chunk_number > args.total_chunks) {
            args.write_stream.end();
            args.next();
        } else {
            var file_name = get_chunk_file_name(args.chunk_number, args.file_id);
            var read_stream = fs.createReadStream(file_name);
            read_stream.pipe(args.write_stream, {end: false});
            read_stream.on('end', function () {
                // once we're done with the chunk we can delete it and move on to the next one.
                fs.unlink(file_name);
                args.chunk_number += 1;
                pipe_chunks(args);
            });
        }
    };

    return function (req, res, next) {
        var out = path.resolve('videos', req.video_id);
        var write_stream = fs.createWriteStream(out);
        pipe_chunks({
            write_stream: write_stream,
            file_id: req.file_id,
            total_chunks: req.total_chunks,
            next: next
        });
    };
}());
Currently, I'm receiving an error because the second invocation of the function is trying to read the chunks that have already been deleted by the first invocation.
What is the typical pattern for handling this type of situation? I'd like to avoid a stateful architecture if possible. Is it possible to cancel pending handlers right before calling next() in check_completion_status?
If you just want to make it work ASAP, I would use a lock (much like a DB lock) so that only one of the requests processes the chunks. Simply create a unique id on the client and send it along with the chunks. Then store that unique id in some sort of data structure, and look it up before processing. The example below is far from optimal (in fact this map will keep growing, which is bad), but it should demonstrate the concept:
// Create a map (an array would work too) to keep track of the video ids that were processed.
// This map persists across requests.
var processedVideos = {};

var check_completion_status = function (req, res, next) {
    var current_chunk = 1;
    var see_if_chunks_exist = function () {
        fs.exists(get_chunk_file_name(current_chunk, req.file_id), function (exists) {
            if (processedVideos[req.query.uniqueVideoId]) {
                res.sendStatus(202);
            } else if (current_chunk > req.total_chunks) {
                processedVideos[req.query.uniqueVideoId] = true;
                next();
            } else if (exists) {
                current_chunk++;
                see_if_chunks_exist();
            } else {
                res.sendStatus(202);
            }
        });
    };
    see_if_chunks_exist();
};
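To keep that map from growing forever, one option (a sketch, assuming a hypothetical clean-up middleware mounted after send_completion_notice) is to delete the entry once the pipeline for that video has finished; in a real system you might expire entries after a delay instead, in case a stale request for the same id is still in flight.
// hypothetical clean-up middleware, mounted after send_completion_notice
var forget_processed_video = function (req, res, next) {
    delete processedVideos[req.query.uniqueVideoId];
    next();
};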
I am new to Stack Overflow. I am working on a Node.js program. The input is a data stream containing a few blocks of data (each compressed with the DEFLATE algorithm) concatenated together. My goal is to use INFLATE to restore the data blocks and put them back in the correct order.
My problem is that when I use a while loop to extract the data blocks, the extracted data does not come out in the order of the input. Why?
while (initPointer < totalLength) {
    ...
    console.log('Extracting ' + rawLengthBuf.readUInt32LE(0));
    ...
    zlib.inflate(dataBuf, function(err, data) {
        if (!err) {
            console.log('Extracted ' + data.length);
        }
    });
}
Output:
Extracting 18876
Extracting 15912
Extracting 10608
Extracted 15912
Extracted 10608
Extracted 18876
Please forgive me if I haven't described the situation very clearly.
Thanks.
Use Async.js. Below are the steps to make your code run in order (one block at a time) using async.js.
I ran into a similar problem and resolved it with the steps below. I have substituted your function into my code, so it should work; give it a try. The comments are inline to help you understand the code.
/*
 * We need an array so we can run all inflate calls one by one in a definite order
 */
var arr = [];
while (initPointer < totalLength) {
    console.log('Extracting ' + rawLengthBuf.readUInt32LE(0));
    arr.push(dataBuf);
}

/*
 * Callback function for initiation of the waterfall
 */
var queue = [
    function(callback) {
        // pass the array and run the first task by passing starting index - 0
        callback(null, arr, 0);
    }
];

/*
 * Object to store the result of every async operation
 */
var finalResult = {};

/*
 * Generic callback function for every dynamic task
 */
var callbackFunc = function(prevModelData, currentIndex, callback) {
    // current block of compressed data
    zlib.inflate(arr[currentIndex], function(err, data) {
        if (err) {
            // abort the waterfall on error
            return callback(err);
        }
        console.log('Extracted ' + data.length);
        // store the result under its original index, so the input order is preserved
        finalResult[currentIndex] = data;
        // call the next element in the queue
        callback(null, data, currentIndex + 1);
    });
}

/*
 * Add one callback function per data block collected above
 */
arr.forEach(function () {
    queue.push(callbackFunc);
});

/*
 * Run all tasks one by one using async.js's waterfall method
 */
async.waterfall(queue, function (err, result) {
    console.log('finish', finalResult);
});
The async zlib methods do their work in the thread pool, so each inflate could be executed in parallel. Which inflate finishes first depends on a number of factors (e.g. CPU scheduling algorithm), so you cannot assume a particular order when you call zlib.inflate() in a loop like that.
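If you need the results in input order, one option (a sketch, assuming the loop first collects the compressed blocks into an array called buffers) is to promisify zlib.inflate and use Promise.all, which resolves in input order regardless of which inflate finishes first:
const util = require('util');
const zlib = require('zlib');
const inflate = util.promisify(zlib.inflate);

// buffers is assumed to hold the compressed blocks, collected in input order
Promise.all(buffers.map(buf => inflate(buf)))
    .then(blocks => {
        // blocks[i] corresponds to buffers[i], so the original order is preserved
        blocks.forEach(block => console.log('Extracted ' + block.length));
    })
    .catch(err => console.error(err));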