Getting data pushed to an array outside of a Promise - node.js

I'm using https://github.com/Haidy777/node-youtubeAPI-simplifier to grab some information from a playlist of Bounty Killers. The way this library is set up, it seems to use Promises via Bluebird (https://github.com/petkaantonov/bluebird), which I don't know much about. Looking up the Beginner's Guide for Bluebird gives http://bluebirdjs.com/docs/beginners-guide.html, which literally just shows
This article is partially or completely unfinished. You are welcome to create pull requests to help completing this article.
I am able to set up the library
var ytapi = require('node-youtubeapi-simplifier');
ytapi.setup('My Server Key');
As well as list some information about Bounty Killers
ytdata = [];
ytapi.playlistFunctions.getVideosForPlaylist('PLCCB0BFBF2BB4AB1D')
    .then(function (data) {
        for (var i = 0, len = data.length; i < len; i++) {
            ytapi.videoFunctions.getDetailsForVideoIds([data[i].videoId])
                .then(function (video) {
                    console.log(video);
                    // ytdata.push(video); <- Push a Bounty Killer Video
                });
        }
    });
// console.log(ytdata); This gives []
Basically, the above pulls the full playlist (normally there would be some pagination here depending on the length), then takes the data from getVideosForPlaylist, iterates over the list and calls getDetailsForVideoIds for each YouTube video. All good here.
The issue arises with getting data out of this. I would like to push each video object to the ytdata array, and I'm unsure whether the empty array at the end is due to scoping or to some out-of-sync timing such that console.log(ytdata) gets called before the API calls have finished.
How will I be able to get each Bounty Killer video into the ytdata array to be available globally?

console.log(ytdata) gets called before the API calls are finished
Spot on, that's exactly what's happening here: the API calls are async. Once you're using async functions, you have to go the async way if you want to deal with the returned data. Your code could be written like this:
var ytapi = require('node-youtubeapi-simplifier');
ytapi.setup('My Server Key');

// this function returns a promise you can "wait" on
function getVideos() {
    return ytapi.playlistFunctions
        .getVideosForPlaylist('PLCCB0BFBF2BB4AB1D')
        .then(function (videos) {
            // extract all videoIds
            var videoIds = videos.map(video => video.videoId);
            // getDetailsForVideoIds is called with an array of videoIds
            // and returns a promise, so one API call is enough
            return ytapi.videoFunctions.getDetailsForVideoIds(videoIds);
        });
}

getVideos().then(function (ydata) {
    // this is the only place ydata is full of data
    console.log(ydata);
});
I made use of ES6's arrow function in videos.map(video => video.videoId); that should work if your Node.js is v4+.
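If you are on a newer Node (8+), the same flow reads naturally with async/await. This is only a sketch against stand-in functions that mimic the library's promise-returning API; the stubs and the sample data are assumptions for illustration, not the real node-youtubeapi-simplifier:

```javascript
// Stand-ins for the library calls; like the real API, each returns a promise.
function getVideosForPlaylist(playlistId) {
  return Promise.resolve([{ videoId: 'a1' }, { videoId: 'b2' }]);
}
function getDetailsForVideoIds(videoIds) {
  return Promise.resolve(videoIds.map(id => ({ id: id, title: 'Video ' + id })));
}

async function getVideos(playlistId) {
  const videos = await getVideosForPlaylist(playlistId);
  const videoIds = videos.map(video => video.videoId);
  // one batched call, same idea as the promise-chain version above
  return getDetailsForVideoIds(videoIds);
}

// ydata is only guaranteed to be populated inside this then/await
getVideos('PLCCB0BFBF2BB4AB1D').then(ydata => console.log(ydata));
```

The shape is the same as the promise chain; await just flattens the nesting.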

console.log(ytdata) should be immediately AFTER your FOR loop. This data is NOT available until the promise is resolved and the FOR loop execution is complete; attempting to access it beforehand will give you an empty array.
(Your current console.log is not working because that code is executed immediately, before the promise is resolved.) Only code inside the THEN block is executed AFTER the promise is resolved.
If you NEED the data available NOW or ASAP and the requests for the videos are taking a long time, then can you request one video at a time, or on demand, or on a separate thread (using a web worker maybe)? Can you implement caching?
Can you make the requests up front behind the scenes before the user even visits this page? (not sure this is a good idea but it is an idea)
Can you use video thumbnails (like youtube does) so that when the thumbnail is clicked then you start streaming and playing the video?
Some ideas ... Hope this helps

ytdata = [];
ytapi.playlistFunctions.getVideosForPlaylist('PLCCB0BFBF2BB4AB1D')
    .then(function (data) {
        // THE CODE INSIDE THIS THEN BLOCK IS EXECUTED WHEN ALL THE VIDEO IDS HAVE BEEN RETRIEVED AND ARE AVAILABLE
        // YOU COULD SAVE THESE TO A DATASTORE IF YOU WANT
        for (var i = 0, len = data.length; i < len; i++) {
            var videoIds = [data[i].videoId];
            ytapi.videoFunctions.getDetailsForVideoIds(videoIds)
                .then(function (video) {
                    // THE CODE INSIDE THIS THEN BLOCK IS EXECUTED WHEN ALL THE DETAILS HAVE BEEN DOWNLOADED FOR ALL videoIds PROVIDED
                    // AGAIN YOU CAN DO WHATEVER YOU WANT WITH THESE DETAILS
                    // ALSO NOW THAT THE DATA IS AVAILABLE YOU MIGHT WANT TO HIDE THE LOADING ICON AND RENDER THE PAGE! AGAIN JUST AN IDEA,
                    // A DATA STORE WOULD PROVIDE FASTER ACCESS BUT YOU WOULD NEED TO UPDATE THE CACHE EVERY SO OFTEN
                    // ytdata.push(video); <- Push a Bounty Killer Video
                });
            // THE DETAILS FOR ANOTHER VIDEO BECOME AVAILABLE AFTER EACH ITERATION OF THE FOR LOOP
        }
        // ALL THE DATA IS AVAILABLE WHEN THE FOR LOOP HAS COMPLETED
    });
// This is executed immediately, before YTAPI has responded.
// console.log(ytdata); This gives []
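To actually end up with a filled ytdata array, one option is to collect the per-video promises and wait on all of them with Promise.all. A minimal sketch with stand-in API functions (the stubs are assumptions for illustration; substitute the real ytapi calls):

```javascript
// Stand-ins for the playlist/detail calls; each returns a promise.
function getVideosForPlaylist(playlistId) {
  return Promise.resolve([{ videoId: 'a1' }, { videoId: 'b2' }, { videoId: 'c3' }]);
}
function getDetailsForVideoIds(videoIds) {
  return Promise.resolve({ id: videoIds[0], title: 'Video ' + videoIds[0] });
}

function collectVideos(playlistId) {
  return getVideosForPlaylist(playlistId).then(data => {
    // one detail request per video, collected as an array of promises
    const detailPromises = data.map(item => getDetailsForVideoIds([item.videoId]));
    // resolves once every detail request has finished
    return Promise.all(detailPromises);
  });
}

collectVideos('PLCCB0BFBF2BB4AB1D').then(ytdata => {
  // ytdata is only reliably populated here
  console.log(ytdata.length);
});
```

The point is that the array is consumed inside the final then, never at the top level of the script.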


Firebase Firestore transactions incredibly slow (3-4 minutes)

Edit: Removing irrelevant code to improve readability
Edit 2: Reducing example to only uploadGameRound function and adding log output with times.
I'm working on a mobile multiplayer word game and was previously using the Firebase Realtime Database, with fairly snappy performance apart from the cold starts. Saving an updated game and setting stats would take at most a few seconds. Recently I made the decision to switch to Firestore for my game data and player stats / top lists, primarily because of the more advanced queries and the automatic scaling with no need for manual sharding.
Now I've got things working on Firestore, but the time it takes to save an updated game and update a number of stats is just ridiculous. I'm clocking an average of 3-4 minutes before the game is updated, the stats are added and everything is available in the database for other clients and viewable in the web interface. I'm guessing and hoping that this is because of something I've messed up in my implementation, but the transactions all go through and there are no warnings or anything else to go on, really. Looking at the Cloud Functions log, the total time from function call to completion log statement appears to be a bit more than a minute, but that log doesn't appear until after the same 3-4 minute wait for the data.
Here's the code as it is. If someone has time to have a look and maybe spot what's wrong I'd be hugely grateful!
This function is called from Unity client:
exports.uploadGameRound = functions.https.onCall((roundUploadData, response) => {
    console.log("UPLOADING GAME ROUND. TIME: ");
    var d = new Date();
    var n = d.toLocaleTimeString();
    console.log(n);

    // CODE REMOVED FOR READABILITY. JUST PREPARING SOME VARIABLES TO USE BELOW. NOTHING HEAVY, NO DATABASE TRANSACTIONS. //

    // Get a new write batch
    const batch = firestoreDatabase.batch();

    // Save game info to activeGamesInfo
    var gameInfoRef = firestoreDatabase.collection('activeGamesInfo').doc(gameId);
    batch.set(gameInfoRef, gameInfo);

    // Save game data to activeGamesData
    const gameDataRef = firestoreDatabase.collection('activeGamesData').doc(gameId);
    batch.set(gameDataRef, { gameDataCompressed: updatedGameDataGzippedString });

    if (foundWord !== undefined && foundWord !== null) {
        const wordId = foundWord.timeStamp + "_" + foundWord.word;
        // Save word to allFoundWords
        const wordRef = firestoreDatabase.collection('allFoundWords').doc(wordId);
        batch.set(wordRef, foundWord);
        exports.incrementNumberOfTimesWordFound(gameInfo.language, foundWord.word);
    }

    console.log("COMMITTING BATCH. TIME: ");
    d = new Date();
    n = d.toLocaleTimeString();
    console.log(n);

    // Commit the batch
    batch.commit().then(result => {
        return gameInfoRef.update({ roundUploaded: true }).then(function (result2) {
            console.log("DONE COMMITTING BATCH. TIME: ");
            d = new Date();
            n = d.toLocaleTimeString();
            console.log(n);
            return;
        });
    });
});
Again, any help with understanding this weird behaviour is massively appreciated!
Ok, so I found the problem now and thought I should share it:
Simply adding a return statement before the batch commit fixed the function and reduced the time from 4 minutes to less than a second:
RETURN batch.commit().then(result => {
    return gameInfoRef.update({ roundUploaded: true }).then(function (result2) {
        console.log("DONE COMMITTING BATCH. TIME: ");
        var d = new Date();
        var n = d.toLocaleTimeString();
        console.log(n);
        return;
    });
});
Your function isn't returning a promise that resolves with the data to send to the client app. In the absence of a returned promise, it will return immediately, with no guarantee that any pending asynchronous work will terminate correctly.
Calling then on a single promise isn't enough to handle promises. You likely have lots of async work going on here, between commit() and other functions like incrementNumberOfTimesWordFound. You will need to handle all of the promises correctly, and make sure your overall function returns only a single promise that resolves when all that work is complete.
I strongly suggest taking some time to learn how promises work in JavaScript; this is crucial to writing effective functions. Without a full understanding, things will appear to go wrong, or not run at all, in strange ways.
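The fix, in shape, is to return one promise that covers all of the pending work. A sketch of that pattern with stand-in async functions (commitBatch, markRoundUploaded and incrementWordCount are made-up stand-ins for the Firestore batch commit, the follow-up update and the helper call; they are not the Firebase API):

```javascript
// Stand-ins for the async Firestore work; each returns a promise.
function commitBatch() { return Promise.resolve('committed'); }
function markRoundUploaded() { return Promise.resolve('marked'); }
function incrementWordCount() { return Promise.resolve('incremented'); }

// The callable's handler must RETURN a promise that resolves
// only when every piece of async work has finished.
function uploadGameRound() {
  return Promise.all([
    commitBatch().then(() => markRoundUploaded()),
    incrementWordCount(),
  ]).then(results => {
    // safe to report completion to the client here
    return { status: 'ok', steps: results.length };
  });
}
```

Anything left outside that returned chain is exactly the kind of orphaned work the answer warns about.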

Difficulty processing CSV file, browser timeout

I was asked to import a csv file from a server daily and parse the respective header to the appropriate fields in mongoose.
My first idea was to make it to run automatically with a scheduler using the cron module.
const CronJob = require('cron').CronJob;
const fs = require("fs");
const csv = require("fast-csv");

new CronJob('30 2 * * *', async function () {
    await parseCSV();
    this.stop();
}, function () {
    this.start();
}, true);
Next, the parseCSV() function code is as follows:
(I have simplify some of the data)
function parseCSV() {
    let buffer = [];
    let stream = fs.createReadStream("data.csv");
    csv.fromStream(stream, { headers: ["lot", "order", "cwotdt"], trim: true })
        .on("data", async (data) => {
            // renamed from "data" to avoid shadowing the event handler's parameter
            let row = { "order": data.order, "lot": data.lot, "date": data.cwotdt };
            // Only add products that fulfill the following condition
            if (row.date !== "000000") {
                let product = { "order": row.order, "lot": row.lot };
                // Check whether product exists in database or not
                await db.Product.find(product, function (err, foundProduct) {
                    if (foundProduct && foundProduct.length !== 0) {
                        console.log("Product exists");
                    } else {
                        buffer.push(product);
                        console.log("Product not exists");
                    }
                });
            }
        })
        .on("end", function () {
            db.Product.find({}, function (err, productAvailable) {
                // Check whether database exists or not
                if (productAvailable.length !== 0) {
                    // console.log("Database Exists");
                    // Add subsequent onward
                    db.Product.insertMany(buffer);
                    buffer = [];
                } else {
                    // Add first time
                    db.Product.insertMany(buffer);
                    buffer = [];
                }
            });
        });
}
It is not a problem with just a few rows in the csv file, but at only around 2k rows I run into trouble. The culprit is the if-condition check inside the data event handler: it needs to query the database for every single row to see whether the data is already there or not.
The reason I'm doing this is that the csv file will have new data added to it, and I need to add all the data the first time if the database is empty, or else look at every single row and only add the new data into mongoose.
The 1st approach I tried (as in the code) was using async/await to make sure that all the data had been read before proceeding to the end event handler. This helps, but I see from time to time (with mongoose.set("debug", true);) that some data is being queried twice, and I have no idea why.
The 2nd approach was not to use async/await. This has a downside: since the data was not fully queried, it proceeded straight to the end event handler and then ran insertMany with only the part of the data that had made it into the buffer.
If I stick with the current approach, it is not an issue as such, but the query will take 1 to 2 minutes, not to mention more as the database keeps growing. During those few minutes of querying, the event queue is blocked, so requests sent to the server time out.
I used stream.pause() and stream.resume() before this code, but I can't get it to work as it just jumps straight to the end event handler first. This causes the buffer to be empty every single time, since the end event handler runs before the data event handler.
I can't remember the links that I used, but the fundamentals I worked from came from this:
Import CSV Using Mongoose Schema
I saw these threads:
Insert a large csv file, 200'000 rows+, into MongoDB in NodeJS
Can't populate big chunk of data to mongodb using Node.js
which seem similar to what I need, but they're a bit too complicated for me to understand what is going on. Seems like using a socket or a child process, maybe? Furthermore, I still need to check conditions before adding into the buffer.
Anyone care to guide me on this?
Edit: await is removed from console.log as it is not asynchronous
Forking a child process approach:
When the web service gets a request with a csv data file, save it somewhere in the app
Fork a child process -> child process example
Pass the file url to the child process to run the insert checks
When the child process finishes processing the csv file, delete the file
Like what Joe said, indexing the DB would speed up the processing time by a lot when there are lots (millions) of tuples.
If you create an index on order and lot, the query should be very fast:
db.Product.createIndex({ order: 1, lot: 1 })
Note: This is a compound index and may not be the ideal solution. Index strategies
Also, your await on console.log is weird; that may be causing your timing issues. console.log is not async. Additionally, the function is not marked async.
// removing await from console.log
let product = { "order": data.order, "lot": data.lot };
// Check whether product exists in database or not
await db.Product.find(product, function (err, foundProduct) {
    if (foundProduct && foundProduct.length !== 0) {
        console.log("Product exists");
    } else {
        buffer.push(product);
        console.log("Product not exists");
    }
});
I would try removing the await on console.log (that may be a red herring if console.log is a stand-in for Stack Overflow and is hiding the actual async method; if so, be sure to mark the function async).
Lastly, if the problem still exists, I would look into a two-tiered approach:
Insert all lines from the CSV file into a mongo collection.
Process that mongo collection after the CSV has been parsed, removing the CSV from the equation.
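The two-tiered idea can be sketched as follows. This is illustrative only: a plain array stands in for the staging collection and an in-memory Set stands in for the "already in the database" check (both assumptions; with Mongo you would use insertMany into a staging collection, then a single query or a bulkWrite with upserts for the second pass):

```javascript
// Tier 1: dump every parsed CSV row into a staging store, no per-row queries.
function stageRows(parsedRows, staging) {
  for (const row of parsedRows) {
    staging.push(row); // cheap append; one insertMany in the real thing
  }
}

// Tier 2: after parsing, deduplicate against what already exists in one pass.
function processStaged(staging, existingKeys) {
  const fresh = [];
  for (const row of staging) {
    if (row.date === '000000') continue;   // same filter as in the question
    const key = row.order + '_' + row.lot; // assumed natural key for a product
    if (!existingKeys.has(key)) {
      existingKeys.add(key);
      fresh.push({ order: row.order, lot: row.lot });
    }
  }
  return fresh; // one insertMany(fresh) instead of thousands of find() calls
}
```

This turns 2k database round-trips into two bulk operations, which is what keeps the event queue free.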

How to overcome asynchronous non-blocking NOT returning values in times?

I am creating an array of JSON objects which is then stored in mongodb.
Each JSON object contains a number of fields, each being populated before I save the object to mongodb.
Some of the objects' attributes are populated by making API calls to other websites such as last.fm, but the returned value does not arrive quickly enough to populate the attribute before the object is saved to mongodb.
How can I wait for all attributes of an object to be populated before saving it? I did try async.waterfall, but it still falls through without waiting, and I end up with a database filled with documents with empty fields.
Any help would be greatly appreciated.
Thanks :)
You have a few options for controlling asynchrony in JavaScript:
Callback pattern: (http://npmjs.org/async) async.all([...], function (err) {
Promises: (http://npmjs.org/q) Q.all([...]).then(function () {
Streams: (http://npmjs.org/concat-stream) see also https://github.com/substack/stream-handbook
Since you say you are making multiple API calls to other websites, you may want to try:
async.each(api_requests,
    function (api_request, cb) {
        request(api_request, function (error, response, body) {
            /* code */
            /* add to model for Mongo */
            cb();
        });
    },
    function (err) {
        // continue execution after all cbs are received
        /* code */
        /* save to Mongo, etc. */
    }
);
The above example is most applicable when you are making numerous requests following the same format. Please review the documentation for Waterfall (https://github.com/caolan/async#waterfall) if the input into your next step depends on the output of the previous step or Parallel (https://github.com/caolan/async#parallel) if you have a bunch of unrelated tasks that don't rely on each other. The great thing about async is that you can nest and string all the functions together to support what you're trying to do.
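For comparison, the two shapes described above (parallel unrelated tasks, and a waterfall where each step feeds the next) can also be sketched with plain promises. The fetchTags/fetchPlays functions are made-up stand-ins for the external API calls:

```javascript
// Made-up async tasks standing in for last.fm-style API calls.
const fetchTags = () => Promise.resolve(['rock', 'dancehall']);
const fetchPlays = () => Promise.resolve(42);

// Parallel shape: unrelated tasks, continue when all are done
// (the promise analogue of async.parallel).
function populateInParallel() {
  return Promise.all([fetchTags(), fetchPlays()]).then(([tags, plays]) => {
    return { tags: tags, plays: plays }; // fully populated before saving
  });
}

// Waterfall shape: each step runs after the previous one
// (the promise analogue of async.waterfall).
function populateInSequence() {
  return fetchTags()
    .then(tags => fetchPlays().then(plays => ({ tags, plays })));
}
```

Either way, the save to mongodb belongs in the final then, after every attribute has arrived.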
You'll either want to use promises or some sort of callback mechanism. Here's an example of the promise method with jPromise:
var jPromise = require('jPromise');
var promises = [];
for(var i = 0; i < 10; i++) {
promises.push(someAsyncApiCall(i));
}
jPromise.when(promises).then(function() {
saveThingsToTheDb();
});
Similarly, without the promise library:
var finished = 0;
var toDo = 10;
function allDone() {
    saveThingsToTheDb();
}
for (var i = 0; i < toDo; i++) {
    someAsyncApiCall(function () {
        finished++;
        if (finished === toDo) {
            allDone();
        }
    });
}
Personally, I prefer the promise method, but that will only work well if the API you're calling returns some sort of promise. If it doesn't, you'll be SOL unless you wrap the callback API with promises somehow (Q does this pretty well).

call back on cheerio node.js

I'm trying to write a scraper using 'request' and 'cheerio'. I have an array of 100 urls. I'm looping over the array and using 'request' on each url and then doing cheerio.load(body). If I increase i above 3 (i.e. change it to i < 3 for testing) the scraper breaks because var productNumber is undefined and I can't call split on undefined variable. I think that the for loop is moving on before the webpage responds and has time to load the body with cheerio, and this question: nodeJS - Using a callback function with Cheerio would seem to agree.
My problem is that I don't understand how I can make sure the webpage has 'loaded' or been parsed in each iteration of the loop so that I don't get any undefined variables. According to the other answer I don't need a callback, but then how do I do it?
for (var i = 0; i < productLinks.length; i++) {
    productUrl = productLinks[i];
    request(productUrl, function (err, resp, body) {
        if (err)
            throw err;
        $ = cheerio.load(body);
        var imageUrl = $("#bigImage").attr('src'),
            productNumber = $("#product").attr('class').split(/\s+/)[3].split("_")[1];
        console.log(productNumber);
    });
}
Example of output:
1461536
1499543
TypeError: Cannot call method 'split' of undefined
Since you're not creating a new $ variable for each iteration, it's being overwritten when a request is completed. This can lead to undefined behaviour, where one iteration of the loop is using $ just as it's being overwritten by another iteration.
So try creating a new variable:
var $ = cheerio.load(body);
^^^ this is the important part
Also, you are correct in assuming that the loop continues before the request is completed (in your situation, it isn't cheerio.load that is asynchronous, but request is). That's how asynchronous I/O works.
To coordinate asynchronous operations you can use, for instance, the async module; in this case, async.eachSeries might be useful.
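A sketch of the eachSeries idea with promises: process the URLs strictly one at a time, each request starting only after the previous one has finished. The fetchPage stub is an assumption standing in for request + cheerio.load:

```javascript
// Stand-in for request() + cheerio.load(); returns a promise of a "parsed page".
function fetchPage(url) {
  return Promise.resolve({ url: url, productNumber: url.length * 100 });
}

// Walk the urls sequentially (the async.eachSeries shape),
// chaining each fetch onto the previous one.
function scrapeSeries(urls) {
  const results = [];
  return urls.reduce(function (chain, url) {
    return chain
      .then(() => fetchPage(url))
      .then(page => { results.push(page.productNumber); });
  }, Promise.resolve()).then(() => results);
}
```

Serializing the requests like this also avoids hammering the target site with 100 simultaneous connections.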
You are scraping some external site(s). You can't be sure the HTML all fits exactly the same structure, so you need to be defensive on how you traverse it.
var product = $('#product');
if (!product) return console.log('Cannot find a product element');
var productClass = product.attr('class');
if (!productClass) return console.log('Product element does not have a class defined');
var productNumber = productClass.split(/\s+/)[3].split("_")[1];
console.log(productNumber);
This'll help you debug where things are going wrong, and perhaps indicate that you can't scrape your dataset as easily as you'd hoped.

Asynchronous function with multiple emit events (futures in meteor)

My use case is to read RSS feed items asynchronously and load them into a meteor collection.
I have the feedparser npm module that does the parsing. It emits three events, .on('error'), .on('meta') and .on('readable'), with three different outputs.
When I run it in fixtures.js, with just console.log statements for the output, it works fine.
When I use the same code to insert into a collection, I get errors related to the asynchronicity of the function (assuming something to do with fibers).
So, I want to make it into a meteor method using futures, as below:
http://www.discovermeteor.com/patterns/5828399
I tried but could not wrap my head around handling multiple events in Futures.
If you just want to push something to the db at one point, it's enough to synchronize that call. Everything else can stay asynchronous. For example:
var Fiber = Npm.require('fibers');
var item = {};

var onInit = function () {
    // do whatever with item
};
var onData = function () {
    // do whatever with item
};
var onFinish = function () {
    new Fiber(function () {
        Documents.insert(item);
    }).run();
};
Although Meteor is a great tool, I think Node and its async insight is brilliant, and the best tool for what you are doing. Keep it as a plan B to have this part of your project be a straight Node app.