I've written a very tiny script in Node.js to check how links can be passed to a function using a loop.
I can see that I can do the same thing in two ways, but I can't figure out which one I should stick to: which one is ideal, and why?
One way: for (link in links) {}
var request = require('request');
var links = ['https://stackoverflow.com/questions/ask', 'https://github.com/request/request'];
for (link in links) {
(function(url) {
request(url, function() {
console.log(url);
});
})(links[link]);
}
The other way: for (const link of links) {}
var request = require('request');
var links = ['https://stackoverflow.com/questions/ask', 'https://github.com/request/request'];
for (const link of links) {
(function(url) {
request(url, function() {
console.log(url);
});
})(link);
}
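There is no ideal way, or at least no universally ideal way, of doing this, so I will point out the differences between the two.
In the first loop you are iterating over the array as an object (in JavaScript an array is an object whose elements are properties keyed by index), so for...in gives you the keys "0", "1", ... as strings, which is why you need links[link]. It also creates a global variable called link, because link is never declared with var, let or const, and that variable still exists after the loop finishes. So an unwanted global variable is created (in strict mode the assignment throws a ReferenceError instead).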
Try console.log(link) after the execution.
The second loop, for...of, was introduced in ECMAScript 6. It iterates over the array's values, doesn't create a global variable, and is the recommended form because it is more readable and gives you more control over your data: link can be declared as const if you want, so it can't be reassigned inside the loop.
For Node.js I'd guess the second one is the better choice in most scenarios. In browser JavaScript, however, the first one may perform better if you are compiling from ES6 to ES5, as is the case in most scenarios.
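A small runnable sketch of the difference. It assumes a hypothetical fakeRequest function in place of the request module, so it runs without network access. It shows that for...in yields the indexes as strings while for...of yields the values, and that because for (const link of links) creates a fresh binding on every iteration, the IIFE from the question isn't needed to capture each URL.

```javascript
"use strict";

// Hypothetical stand-in for request(url, callback): it calls back
// asynchronously (on a timer) instead of doing real network I/O.
function fakeRequest(url, callback) {
  setTimeout(function () {
    callback(null, url);
  }, 0);
}

const links = [
  "https://stackoverflow.com/questions/ask",
  "https://github.com/request/request",
];

// for...in enumerates the array's keys, which are strings.
const keys = [];
for (const key in links) {
  keys.push(key);
}

// for...of iterates over the array's values directly.
const values = [];
for (const link of links) {
  values.push(link);
}

// const creates a new binding on every iteration, so each callback
// captures its own link and no IIFE is needed.
const seen = new Promise(function (resolve) {
  const logged = [];
  for (const link of links) {
    fakeRequest(link, function () {
      logged.push(link);
      if (logged.length === links.length) resolve(logged);
    });
  }
});

console.log(keys); // [ '0', '1' ]
seen.then(function (logged) {
  console.log(logged);
});
```

With var link instead of const link, both callbacks would see the last URL, which is exactly what the IIFE in the question works around.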
I'm using https://github.com/Haidy777/node-youtubeAPI-simplifier to grab some information from a playlist of Bounty Killers. The way this library is set up, it seems to use promises via Bluebird (https://github.com/petkaantonov/bluebird), which I don't know much about. Looking up the Beginner's Guide for Bluebird gives http://bluebirdjs.com/docs/beginners-guide.html, which literally just shows:
This article is partially or completely unfinished. You are welcome to create pull requests to help completing this article.
I am able to set up the library:
var ytapi = require('node-youtubeapi-simplifier');
ytapi.setup('My Server Key');
As well as list some information about Bounty Killers:
var ytdata = [];
ytapi.playlistFunctions.getVideosForPlaylist('PLCCB0BFBF2BB4AB1D')
.then(function (data) {
for (var i = 0, len = data.length; i < len; i++) {
ytapi.videoFunctions.getDetailsForVideoIds([data[i].videoId])
.then(function (video) {
console.log(video);
// ytdata.push(video); <- Push a Bounty Killer Video
});
}
});
// console.log(ytdata); This gives []
Basically, the above pulls the full playlist (normally there would be some pagination here, depending on its length), then takes the data from getVideosForPlaylist, iterates over the list and calls getDetailsForVideoIds for each YouTube video. All good here.
The issue arises when getting data out of this. I would like to push each video object to the ytdata array, and I'm unsure whether the empty array at the end is due to scoping or to a timing problem where console.log(ytdata) gets called before the API calls are finished.
How will I be able to get each Bounty Killer video into the ytdata array to be available globally?
console.log(ytdata) gets called before the API calls are finished
Spot on, that's exactly what's happening here: the API calls are async. Once you're using async functions, you must go the async way if you want to deal with the returned data. Your code could be written like this:
var ytapi = require('node-youtubeapi-simplifier');
ytapi.setup('My Server Key');
// this function returns a promise you can "wait" on
function getVideos() {
return ytapi.playlistFunctions
.getVideosForPlaylist('PLCCB0BFBF2BB4AB1D')
.then(function (videos) {
// extract all videoIds
var videoIds = videos.map(video => video.videoId);
// getDetailsForVideoIds is called with an array of videoIds
// and returns a promise, so one API call is enough
return ytapi.videoFunctions.getDetailsForVideoIds(videoIds);
});
}
getVideos().then(function (ydata) {
// this is the only place ydata is full of data
console.log(ydata);
});
I used an ES6 arrow function in videos.map(video => video.videoId); that should work if you're running Node.js v4+.
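If your Node.js version supports async/await (7.6+), the same two calls can be written sequentially. This is a sketch only: ytapi here is a hypothetical stub that mimics the two method names used above so the example runs without the real module or a server key, and the shape of the returned video details is an assumption.

```javascript
"use strict";

// Hypothetical stub with the same two method names used above, so the
// sketch runs without node-youtubeapi-simplifier or a server key.
const ytapi = {
  playlistFunctions: {
    getVideosForPlaylist: function (playlistId) {
      return Promise.resolve([{ videoId: "a1" }, { videoId: "b2" }]);
    },
  },
  videoFunctions: {
    getDetailsForVideoIds: function (videoIds) {
      // Assumed shape: one details object per id.
      return Promise.resolve(videoIds.map(function (id) {
        return { id: id, title: "Video " + id };
      }));
    },
  },
};

// async/await version of getVideos(): same two calls, written top to bottom.
async function getVideos(playlistId) {
  const videos = await ytapi.playlistFunctions.getVideosForPlaylist(playlistId);
  const videoIds = videos.map(video => video.videoId);
  return ytapi.videoFunctions.getDetailsForVideoIds(videoIds);
}

const result = getVideos("PLCCB0BFBF2BB4AB1D").then(function (ytdata) {
  // As with the .then version, ytdata is only populated in here.
  console.log(ytdata);
  return ytdata;
});
```

An async function still returns a promise, so code outside it has to use .then (or await) just like before; async/await only changes how the inside reads.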
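console.log(ytdata) must not run until the data has arrived. The data is NOT available until the promises are resolved, and attempting to access it beforehand will give you an empty array. Note that putting console.log(ytdata) immediately AFTER your FOR loop (inside the outer THEN block) is still too early: each getDetailsForVideoIds call inside the loop is asynchronous too, so you need to wait for all of those promises (for example with Promise.all) before logging.
Your current console.log is not working because that code is executed immediately, before the promise is resolved. Only code inside the THEN block is executed AFTER the promise is resolved.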
If you NEED the data available NOW or ASAP and the requests for the videos are taking a long time, can you request one video at a time, on demand, or on a separate thread (maybe using a web worker)? Can you implement caching?
Can you make the requests up front, behind the scenes, before the user even visits this page? (Not sure this is a good idea, but it is an idea.)
Can you use video thumbnails (like YouTube does) so that when a thumbnail is clicked you start streaming and playing the video?
Some ideas ... Hope this helps
var ytdata = [];
ytapi.playlistFunctions.getVideosForPlaylist('PLCCB0BFBF2BB4AB1D')
.then(function (data) {
// THE CODE INSIDE THIS THEN BLOCK IS EXECUTED WHEN ALL THE VIDEO IDS HAVE BEEN RETRIEVED AND ARE AVAILABLE
// YOU COULD SAVE THESE TO A DATASTORE IF YOU WANT
for (var i = 0, len = data.length; i < len; i++) {
var videoIds = [data[i].videoId];
ytapi.videoFunctions.getDetailsForVideoIds(videoIds)
.then(function (video) {
// THE CODE INSIDE THIS THEN BLOCK IS EXECUTED WHEN ALL THE DETAILS HAVE BEEN DOWNLOADED FOR ALL videoIds provided
// AGAIN YOU CAN DO WHATEVER YOU WANT WITH THESE DETAILS
// ALSO NOW THAT THE DATA IS AVAILABLE YOU MIGHT WANT TO HIDE THE LOADING ICON AND RENDER THE PAGE! AGAIN JUST AN IDEA, A DATA STORE WOULD PROVIDE FASTER ACCESS BUT YOU WOULD NEED TO UPDATE THE CACHE EVERY SO OFTEN
// ytdata.push(video); <- Push a Bounty Killer Video
});
// EACH ITERATION ONLY STARTS A REQUEST; ITS DETAILS ARRIVE LATER, IN THE THEN BLOCK ABOVE
}
// HERE THE LOOP HAS ONLY STARTED THE REQUESTS; ytdata IS FILLED IN AS EACH ONE RESOLVES (USE Promise.all TO WAIT FOR ALL OF THEM)
});
// This is executed immediately before YTAPI has responded.
// console.log(ytdata); This gives []
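To wait for every per-video call made in the loop above, one option is to keep each returned promise and hand them all to Promise.all. This is a sketch under assumptions: getDetailsForVideoIds below is a hypothetical stand-in that resolves after a short timer, and the details shape is invented for illustration. It also shows that ytdata is still empty right after the loop.

```javascript
"use strict";

// Hypothetical stand-in for ytapi.videoFunctions.getDetailsForVideoIds:
// resolves later with one details object per id (real shape may differ).
function getDetailsForVideoIds(videoIds) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve(videoIds.map(function (id) {
        return { videoId: id };
      }));
    }, 10);
  });
}

// Assumed shape of the playlist data, as used in the question.
const data = [{ videoId: "a1" }, { videoId: "b2" }, { videoId: "c3" }];

const ytdata = [];

// Each iteration only starts a request and keeps its promise.
const pending = [];
for (let i = 0; i < data.length; i++) {
  pending.push(getDetailsForVideoIds([data[i].videoId]));
}

// Still empty here: none of the requests has resolved yet.
const lengthAfterLoop = ytdata.length;

// Promise.all resolves once every request has; results keep input order.
const allDone = Promise.all(pending).then(function (results) {
  results.forEach(function (details) {
    ytdata.push(details[0]);
  });
  console.log(ytdata.length); // 3
  return ytdata;
});
```

Pushing inside each individual .then would also work, but the array would then be filled in arrival order, and you would still need Promise.all to know when it is complete.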
Ok, say I have two listeners with callbacks, and the code in one callback depends on a variable (UIDfromOnEndFunction) from the other callback.
For example:
//using andris9/mailparser on github
var fs = require("fs");
var MailParser = require("mailparser").MailParser;
var mailparser = new MailParser({
streamAttachments: true
});
// OnEnd Function
mailparser.on("end", function(objMail){
UIDfromOnEndFunction = objMail.UID;
saveToDB("mail" + "1234", objMail);
});
mailparser.on("attachment", function(attachment){
var output = fs.createWriteStream("attachments/"
+ UIDfromOnEndFunction + "/" + attachment.generatedFileName);
// need UIDfromOnEndFunction here
attachment.stream.pipe(output);
});
How do I make the callback in mailparser.on("attachment", ...) get the variable UIDfromOnEndFunction?
Does this involve promises? How do you do this?
You can do this via a closure: just access a variable declared in the scope that both callbacks share.
//using andris9/mailparser on github
var fs = require("fs");
var MailParser = require("mailparser").MailParser;
var UIDfromOnEndFunction;
var mailparser = new MailParser({
streamAttachments: true
});
// OnEnd Function
mailparser.on("end", function(objMail){
UIDfromOnEndFunction = objMail.UID;
saveToDB("mail" + "1234", objMail);
});
mailparser.on("attachment", function(attachment){
var output = fs.createWriteStream("attachments/" + UIDfromOnEndFunction + "/" + attachment.generatedFileName);
attachment.stream.pipe(output);
});
Please note my comment about ensuring "end" is emitted before "attachment": if the events do not fire in that order, UIDfromOnEndFunction is still undefined when the attachment callback runs, and this approach cannot work.
OK, I came up with a solution. It's based on closures, following Brenden Ashworth's suggestion. It's untested, but I'm fairly certain it would work, and I wanted to post it before moving on, as I found I didn't need to do what the original question described to get my project working.
However, I still think it is useful to have a solution to this type of problem as the need could arise, and I don't know a better solution.
Here's my solution:
//using andris9/mailparser on github
var fs = require("fs");
var MailParser = require("mailparser").MailParser;
var mailparser = new MailParser({
streamAttachments: true
});
var UIDfromOnEndFunction;
var myAttachment;
var intNumberOfEmitsToEndAndAttachment = 0;
var funcBothEndAndAttachmentEmitted = function () {
var output = fs.createWriteStream("attachments/"
+ UIDfromOnEndFunction + "/" + myAttachment.generatedFileName);
//UIDfromOnEndFunction should be guaranteed to be
//populated by .once("end",...)
myAttachment.stream.pipe(output);
//myAttachment should be guaranteed to be populated
//by .once("attachment",...)
};
mailparser.once("end", function(objMail){
UIDfromOnEndFunction = objMail.UID;
saveToDB("mail" + "1234", objMail);
intNumberOfEmitsToEndAndAttachment++;
if (intNumberOfEmitsToEndAndAttachment === 2) {
funcBothEndAndAttachmentEmitted();
}
});
mailparser.once("attachment", function(attachment){
myAttachment = attachment;
intNumberOfEmitsToEndAndAttachment++;
if (intNumberOfEmitsToEndAndAttachment === 2) {
funcBothEndAndAttachmentEmitted();
}
});
Now this would only work for a single emitted "end" and a single emitted "attachment".
You could get more creative with how the tracking is done to handle multiple attachments. For example, instead of using an integer to track the total number of calls, an array of objects such as [{event: "attachment", args: attachment_args1}, {event: "attachment", args: attachment_args2}, {event: "end", args: end_args}] could be used to track the calls (this would mean "attachment" has been emitted twice so far and "end" once), and you could trigger a function based on that knowledge, like I do by calling funcBothEndAndAttachmentEmitted().
I think this needs to be cleaned up and made into a library, unless there is a better way to do it that's not apparent. (Please comment if you know a better solution or I might go ahead and write a library for this solution.)
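The array-based tracking described above could look roughly like this. It is an untested sketch, not mailparser-specific code: a plain EventEmitter stands in for the MailParser instance, handleAttachment is a hypothetical helper that only records the target path (a real version would pipe attachment.stream into fs.createWriteStream), and attachments that arrive before "end" are buffered and flushed once the UID is known. With real streamed attachments, leaving a stream unconsumed until "end" may cause buffering or stall the parser, so check that against mailparser's behaviour.

```javascript
"use strict";
const EventEmitter = require("events");

// Plain EventEmitter as a hypothetical stand-in for the MailParser instance.
const mailparser = new EventEmitter();

const calls = []; // e.g. [{ event: "attachment", args: [...] }, { event: "end", args: [...] }]
const saved = []; // target paths; a real version would pipe to fs.createWriteStream
let uid;          // set once "end" has been seen

// Hypothetical handler: builds the target path for one attachment.
function handleAttachment(attachment) {
  saved.push("attachments/" + uid + "/" + attachment.generatedFileName);
}

function record(event, args) {
  calls.push({ event: event, args: args });
  if (event === "end") {
    uid = args[0].UID;
    // Flush every attachment that arrived before "end".
    calls
      .filter(c => c.event === "attachment")
      .forEach(c => handleAttachment(c.args[0]));
  } else if (uid !== undefined) {
    // "end" already happened, so handle this attachment right away.
    handleAttachment(args[0]);
  }
}

mailparser.on("attachment", (...args) => record("attachment", args));
mailparser.on("end", (...args) => record("end", args));

// Attachments may arrive before or after "end"; both end up handled.
mailparser.emit("attachment", { generatedFileName: "a.pdf" });
mailparser.emit("end", { UID: 1234 });
mailparser.emit("attachment", { generatedFileName: "b.png" });

console.log(saved); // both paths are under attachments/1234/
```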
Another solution I thought of that might work is putting mailparser.once("attachment", ...) inside the callback for mailparser.once("end", ...), but I suspect that wouldn't work if "attachment" is emitted first, and this solution seems a bit kludgey compared to a library-based solution if you're working with many different emitted events for some reason, or with different objects emitting different events.
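On the question of whether this involves promises: in Node.js 11.13+ the built-in events.once(emitter, name) returns a promise that resolves with the event's arguments, so waiting for both events regardless of order can be written with Promise.all. This is a sketch, not something from the thread: a plain EventEmitter stands in for the MailParser instance, and like the counter solution it only handles a single "attachment".

```javascript
"use strict";
const { EventEmitter, once } = require("events");

// Hypothetical stand-in for the MailParser instance.
const mailparser = new EventEmitter();

// events.once returns a promise resolving with the event's argument array.
// Both listeners are attached right now, so emission order does not matter.
const both = Promise.all([
  once(mailparser, "end"),
  once(mailparser, "attachment"),
]).then(function ([[objMail], [attachment]]) {
  // A real version would pipe attachment.stream to fs.createWriteStream(path).
  const path = "attachments/" + objMail.UID + "/" + attachment.generatedFileName;
  console.log(path);
  return path;
});

// Emit "attachment" first to show the order is irrelevant.
mailparser.emit("attachment", { generatedFileName: "report.pdf" });
mailparser.emit("end", { UID: 1234 });
```

events.once also rejects if the emitter emits "error" while waiting, which gives you an error path the counter version lacks.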
Is there a best-practice way to use this inside a promise callback? In jQuery I can bind my object to use it in my promise/callback, but what about AngularJS? The "var service = this;" approach is one I'd rather avoid.
app.service('exampleService', ['Restangular', function(Restangular) {
this._myVariable = null;
this.myFunction = function() {
Restangular.one('me').get().then(function(response) {
this._myVariable = true; // undefined
});
};
}]);
Are there solutions for this issue? How can I access members or methods of my service inside the promise callback?
Thank you in advance.
The generic issue of dynamic this in a callback is explained in this answer, which is very good; I'm not going to repeat what Felix said. I'm going to discuss promise-specific solutions instead:
Promises are specified by the Promises/A+ specification, which allows promise libraries to consume each other's promises seamlessly. Angular $q promises honor that specification, and therefore an Angular promise must by definition execute the .then callbacks as plain functions, that is, without setting this. If fn is strict-mode code, doing promise.then(fn) will always make this undefined inside fn (and window in non-strict mode).
The reasoning is that ES6 is just around the corner and solves these problems more elegantly.
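A quick runnable check of that rule, using native promises (which follow Promises/A+ as well) instead of $q. The service object and the resolved promise standing in for Restangular.one('me').get() are hypothetical. A plain function callback sees this as undefined in strict mode, while an arrow function keeps the this of the enclosing method.

```javascript
"use strict";

// Hypothetical service object; a resolved native Promise stands in for
// Restangular.one('me').get().
const service = {
  _myVariable: null,
  plain: function () {
    return Promise.resolve("response").then(function () {
      return this; // called as a plain function: undefined in strict mode
    });
  },
  arrow: function () {
    return Promise.resolve("response").then(() => {
      this._myVariable = true; // arrow function: this is the service
      return this;
    });
  },
};

const checks = Promise.all([service.plain(), service.arrow()]).then(function ([a, b]) {
  console.log(a === undefined, b === service); // true true
  return [a, b];
});
```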
So, what are your options?
Some promise libraries provide a .bind method (Bluebird, for example); you can use these promises inside Angular by swapping out $q.
ES6, CoffeeScript, TypeScript and AtScript all include a => operator, which binds this lexically.
You can use the ES5 solution, Function.prototype.bind.
You can use one of the hacks in the aforementioned answer by Felix.
Here are examples of each:
Adding bind - aka Promise#bind
Assuming you've followed the above question and answer you should be able to do:
Restangular.one('me').get().bind(this).then(function(response) {
this._myVariable = true; // this is correct
});
Using an arrow function
Restangular.one('me').get().then(response => {
this._myVariable = true; // this is correct
});
Using .bind
Restangular.one('me').get().then(function(response) {
this._myVariable = true; // this is correct
}.bind(this));
Using a pre ES5 'hack'
var that = this;
Restangular.one('me').get().then(function(response) {
that._myVariable = true; // that refers to the service
});
Of course, there is a bigger issue
Your current design does not give you any way to know when _myVariable is available. You'd have to poll it or rely on the internal ordering of state changes. I believe you can do better, with a design where your code always runs when the variable is available:
app.service('exampleService', ['Restangular', function(Restangular) {
this._myVariable = Restangular.one('me').get();
}]);
Then you can use _myVariable via this._myVariable.then(function(value){ ... }). This might seem tedious, but with $q.all you can easily do this with several values, and it is completely safe in terms of state synchronization.
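A framework-free sketch of the "store the promise, not the value" design, assuming a hypothetical fetchResource function in place of Restangular and native Promise.all in the role of $q.all. Consumers can only read the values through .then, so there is no window in which they see a half-initialised service.

```javascript
"use strict";

// Hypothetical stand-in for Restangular.one(...).get(): resolves later.
function fetchResource(name) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve({ name: name });
    }, 5);
  });
}

// The service stores promises instead of raw values, so consumers can
// never read a value before it exists.
function ExampleService() {
  this._myVariable = fetchResource("me");
  this._settings = fetchResource("settings");
}

const service = new ExampleService();

// Promise.all plays the role of $q.all here.
const combined = Promise.all([service._myVariable, service._settings])
  .then(function ([me, settings]) {
    console.log(me.name, settings.name); // me settings
    return [me.name, settings.name];
  });
```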
If you want to lazy-load it rather than request it up front (that is, only when myFunction is called), I totally get that. You can use a getter:
app.service('exampleService', ['Restangular', function(Restangular) {
this.__hidden = null;
Object.defineProperty(this,"_myVariable", {
get: function(){
return this.__hidden || (this.__hidden = Restangular.one('me').get());
}
});
}]);
Now, it will be lazy loaded only when you access it for the first time.
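The lazy getter can be checked outside Angular too. In this sketch fetchMe is a hypothetical stand-in for Restangular.one('me').get() that counts how often it is called: nothing is requested until the first read of _myVariable, and later reads reuse the same cached promise.

```javascript
"use strict";

let fetchCount = 0;

// Hypothetical stand-in for Restangular.one('me').get(); counts calls.
function fetchMe() {
  fetchCount++;
  return Promise.resolve({ name: "me" });
}

function ExampleService() {
  this.__hidden = null;
  // The getter starts the request on first access and caches the promise.
  Object.defineProperty(this, "_myVariable", {
    get: function () {
      return this.__hidden || (this.__hidden = fetchMe());
    },
  });
}

const service = new ExampleService();
const countBeforeAccess = fetchCount; // nothing requested yet
const first = service._myVariable;
const second = service._myVariable; // same cached promise

console.log(countBeforeAccess, fetchCount, first === second); // 0 1 true
```

Caching the promise rather than the resolved value also means two callers that read _myVariable before the response arrives share one request.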
I am trying to get all the variables that have been defined. I tried using the global object,
but it seems to be missing the ones defined as var token='44'; and only includes the ones defined as token='44';. What I am looking for, ideally, is something like PHP's get_defined_vars() function. I need to access the variables because I need to stop the node process and then restart at the same point without having to recalculate all the variables, so I want to dump them somewhere and access them later.
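It's impossible within the language itself. (In Node.js, a top-level var declaration is scoped to the module rather than attached to the global object, which is why var token='44'; doesn't show up on global while an undeclared assignment like token='44'; does.)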
However:
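1. If you have access to the entire source code, you can use a parser library such as uglify-js to get a list of global variables, like this:
var UglifyJS = require('uglify-js');
// source: the full program text as a string
var ast = UglifyJS.parse(source);
ast.figure_out_scope();
// ast.globals is uglify-js's Dictionary; its map() passes (value, key)
console.log(ast.globals.map(function (node, name) {
return name;
}));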
2. If you can connect to the Node.js/V8 debugger, you can get a list of local variables as well; see the _debugger.js source code in the Node.js project.
As you stated
I want to dump them somewhere and access them later.
It seems like you should work toward a database (as Jonathan mentioned in the comments), but if this is a one-off thing you can use JSON files to store values. You can then require the JSON file back into your script and Node will parse it for you.
I wouldn't recommend this, but basically, create a variable that will hold all the data/variables that you define. Some might call this a God Object. Just make sure you export the values to a JSON file before you exit the script. If you're worried about your application crashing, back up to that file more frequently.
Here is a demo you can play around with:
var fs = require('fs');
var globalData = loadData();
function loadData() {
try { return require('./globals.json'); } catch(e) {}
return {};
}
function dumpGlobalData(callback) {
fs.writeFile(
__dirname + '/globals.json', JSON.stringify(globalData), callback);
}
function randomToken() {
globalData.token = Math.floor(Math.random() * 1000);
}
console.log('token was', globalData.token);
randomToken();
console.log('token is now', globalData.token);
dumpGlobalData(function(error) {
process.exit(error ? 1 : 0);
});
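If you do take the JSON-file route and back up often, two details are worth handling. This is a sketch under assumptions (the temp-directory file location and the helper names are illustrative, not from the demo above): reloading with fs.readFileSync plus JSON.parse avoids require's module cache, which would otherwise return the first parsed copy for the rest of the process, and writing to a temporary file and renaming it over the target avoids leaving a half-written file if the process dies mid-write (rename is atomic on POSIX filesystems when both paths are on the same volume).

```javascript
"use strict";
const fs = require("fs");
const os = require("os");
const path = require("path");

// File location is an assumption; the demo above uses __dirname.
const FILE = path.join(os.tmpdir(), "globals-demo.json");

// Read with fs + JSON.parse instead of require(): require caches the
// parsed file, so a second require in the same process returns stale data.
function loadData(file) {
  try {
    const parsed = JSON.parse(fs.readFileSync(file, "utf8"));
    return parsed && typeof parsed === "object" ? parsed : {};
  } catch (e) {
    return {}; // missing or unreadable file: start fresh
  }
}

// Write to a temporary file, then rename it over the target, so a crash
// mid-write leaves the previous snapshot intact.
function saveData(file, data) {
  const tmp = file + ".tmp";
  fs.writeFileSync(tmp, JSON.stringify(data));
  fs.renameSync(tmp, file);
}

const globalData = loadData(FILE);
globalData.token = Math.floor(Math.random() * 1000);
saveData(FILE, globalData);

// Periodic backups; unref() lets the process exit without waiting for the timer.
const timer = setInterval(function () {
  saveData(FILE, globalData);
}, 60000);
timer.unref();

const reloaded = loadData(FILE);
console.log(reloaded.token === globalData.token); // true
```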
I'm trying to write a scraper using 'request' and 'cheerio'. I have an array of 100 URLs. I'm looping over the array, using 'request' on each URL and then doing cheerio.load(body). If I let the loop go past i = 3 (for testing I change the condition to i < 3, which works), the scraper breaks because the value I compute productNumber from is undefined and I can't call split on an undefined value. I think that the for loop is moving on before the webpage responds and has time to load the body with cheerio, and this question: nodeJS - Using a callback function with Cheerio would seem to agree.
My problem is that I don't understand how I can make sure the webpage has 'loaded' or been parsed in each iteration of the loop so that I don't get any undefined variables. According to the other answer I don't need a callback, but then how do I do it?
for (var i = 0; i < productLinks.length; i++) {
productUrl = productLinks[i];
request(productUrl, function(err, resp, body) {
if (err)
throw err;
$ = cheerio.load(body);
var imageUrl = $("#bigImage").attr('src'),
productNumber = $("#product").attr('class').split(/\s+/)[3].split("_")[1]
console.log(productNumber);
});
};
Example of output:
1461536
1499543
TypeError: Cannot call method 'split' of undefined
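Since you're not creating a new $ variable for each iteration, every completed request overwrites the same implicit global $. Callbacks run one at a time, so $ can't change in the middle of a single callback, but sharing one global across all requests is fragile (it breaks as soon as anything asynchronous runs between loading and reading it) and leaks into the rest of your program.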
So try creating a new variable:
var $ = cheerio.load(body);
^^^ this is the important part
Also, you are correct in assuming that the loop continues before the request is completed (in your situation, it isn't cheerio.load that is asynchronous, but request is). That's how asynchronous I/O works.
To coordinate asynchronous operations you can use, for instance, the async module; in this case, async.eachSeries might be useful.
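If you'd rather not pull in the async module, a for...of loop with await gives the same one-at-a-time behaviour as async.eachSeries on Node.js 7.6+. In this sketch fetchBody is a hypothetical promise-returning stand-in for request (it answers after a random delay), and handleBody is where cheerio.load(body) and the parsing would go; the next request only starts after the previous body has been handled.

```javascript
"use strict";

// Hypothetical stand-in for request(url, callback), wrapped in a promise;
// it answers after a random delay, like real responses would.
function fetchBody(url) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve("<html>" + url + "</html>");
    }, Math.random() * 20);
  });
}

// Processes one URL at a time, like async.eachSeries: the next request
// starts only after the previous body has been handled.
async function scrapeSeries(urls, handleBody) {
  const results = [];
  for (const url of urls) {
    const body = await fetchBody(url);
    results.push(handleBody(url, body));
  }
  return results;
}

const productLinks = ["http://example.com/p/1", "http://example.com/p/22"];

const done = scrapeSeries(productLinks, function (url, body) {
  // In the real scraper, cheerio.load(body) and the parsing go here.
  return { url: url, length: body.length };
}).then(function (results) {
  console.log(results.map(r => r.url)); // same order as productLinks
  return results;
});
```

Running requests one by one is slower than firing them all at once, but it keeps load on the target site low and makes failures easier to attribute to a specific URL.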
You are scraping some external site(s). You can't be sure the HTML always follows exactly the same structure, so you need to be defensive in how you traverse it.
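var product = $('#product');
// a cheerio selection is always truthy, so check how many elements matched
if (!product.length) return console.log('Cannot find a product element');
var productClass = product.attr('class');
if (!productClass) return console.log('Product element does not have a class defined');
var classes = productClass.split(/\s+/);
if (classes.length < 4) return console.log('Product element has fewer than 4 classes: ' + productClass);
var productNumber = classes[3].split("_")[1];
console.log(productNumber);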
This'll help you debug where things are going wrong, and perhaps indicate that you can't scrape your dataset as easily as you'd hoped.
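The string handling can also be pulled out into a pure function, so it can be tested without fetching any pages. This is a sketch: extractProductNumber is a hypothetical helper name, and the class format ("a b c product_1461536", with the number after an underscore in the fourth class) is assumed from the split logic in the question.

```javascript
"use strict";

// Hypothetical helper: takes the value of $("#product").attr("class") and
// returns the product number, or null when the string has an unexpected shape.
function extractProductNumber(classAttr) {
  if (typeof classAttr !== "string") return null; // attribute missing
  const classes = classAttr.trim().split(/\s+/);
  if (classes.length < 4) return null; // fewer than 4 classes
  const parts = classes[3].split("_");
  return parts.length > 1 ? parts[1] : null; // no "_" in the 4th class
}

console.log(extractProductNumber("a b c product_1461536")); // 1461536
console.log(extractProductNumber(undefined)); // null
console.log(extractProductNumber("a b")); // null
```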