I have a Chrome extension browser action that I want to list a series of links and open any selected link in the current tab. So far what I have is this, using jQuery:
var url = urlForThisLink;
var li = $('<li/>');
var ahref = $('<a href="#">' + title + '</a>');
ahref.click(function(){
chrome.tabs.getSelected(null, function (tab) {
chrome.tabs.update(tab.id, {url: url});
});
});
li.append(ahref);
It partially works. It does navigate the current tab, but will only navigate to whichever link was last created in this manner. How can I do this for an iterated series of links?
@jmort253's answer is actually a good illustration of what is probably your error. Despite being declared inside the for loop, url has function scope because it is declared with var. So your click handler closure binds to a variable scoped outside the for loop, and every instance of the closure sees the same value, i.e. the last one assigned.
If your version of Chrome supports the let keyword (current versions do), you can use it instead of var and it will work fine, since url will then be scoped to the body of the for loop. Otherwise you'll have to create a new scope by creating your closure in a function:
function makeClickHandler(url) {
return function() { ... };
}
Inside the for loop say:
for (var i = 0; i < urls.length; i++) {
var url = urls[i];
...
ahref.click(makeClickHandler(url));
...
}
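The scoping problem can be reproduced without Chrome or jQuery. Below is a minimal sketch that uses plain functions in place of click handlers; the names buildWithVar, buildWithFactory, buildWithLet and callAll are made up for illustration and the URLs are placeholders.

```javascript
// Placeholder URLs, for illustration only.
var urls = ['http://example.com', 'http://domain.org'];

// Broken: every handler closes over the same function-scoped `url`.
function buildWithVar() {
  var handlers = [];
  for (var i = 0; i < urls.length; i++) {
    var url = urls[i];
    handlers.push(function () { return url; });
  }
  return handlers;
}

// Fixed: each call to the factory creates a fresh scope holding its own `url`.
function makeHandler(url) {
  return function () { return url; };
}
function buildWithFactory() {
  var handlers = [];
  for (var i = 0; i < urls.length; i++) {
    handlers.push(makeHandler(urls[i]));
  }
  return handlers;
}

// Fixed: `let` is scoped to the loop body, so each iteration gets a new binding.
function buildWithLet() {
  var handlers = [];
  for (var i = 0; i < urls.length; i++) {
    let url = urls[i];
    handlers.push(function () { return url; });
  }
  return handlers;
}

// Invoke every handler and collect what each one "navigates" to.
function callAll(handlers) {
  return handlers.map(function (handler) { return handler(); });
}

console.log(callAll(buildWithVar()));     // both entries are 'http://domain.org'
console.log(callAll(buildWithFactory())); // one entry per URL, in order
console.log(callAll(buildWithLet()));     // one entry per URL, in order
```

The factory version is the one that also works in engines without let.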
In your code example, it looks like you only have a single link. Instead, let's assume you have an actual collection of links. In that case, you can use a for loop to iterate through them:
// collection of urls
var urls = ["http://example.com", "http://domain.org"];
// loop through the collection, for each url, build a separate link.
for(var i = 0; i < urls.length; i++) {
// this is the link for iteration i
var url = urls[i];
var li = $('<li/>');
var ahref = $('<a href="#">' + title + '</a>');
ahref.click( (function(pUrl) {
return function() {
chrome.tabs.getSelected(null, function (tab) {
chrome.tabs.update(tab.id, {url: pUrl});
});
}
})(url));
li.append(ahref);
}
I totally forgot about scope when writing the original answer, so I updated it to use a closure based on Matthew Gertner's answer. Basically, in the click event handler, I'm now passing the url variable into an anonymous one-argument function, which returns another function. The returned function uses the argument passed into the anonymous function, so its state is unaffected by the fact that later iterations of the for loop change the value of url.
I have a website that has a main URL containing several links. I want to get the first <p> element from each link on that main page.
I have the following code, which works fine to get the desired links from the main page and stores them in the urls array. My issue is that I don't know how to write a loop that loads each URL from the urls array and prints the first <p> on each iteration, or appends them to a variable and prints them all at the end.
How can I do this? Thanks.
var request = require('request');
var cheerio = require('cheerio');
var main_url = 'http://www.someurl.com';
request(main_url, function(err, resp, body){
$ = cheerio.load(body);
links = $('a'); //get all hyperlinks from main URL
var urls = [];
//With this part I get the links (URLs) that I want to scrape.
$(links).each(function(i, link){
lnk = 'http://www.someurl.com/files/' + $(link).attr('href');
urls.push(lnk);
});
//In this part I don't know how to make a loop to load each url within urls array and get first <p>
for (i = 0; i < urls.length; i++) {
var p = $("p:first") //first <p> element
console.log(p.html());
}
});
If you can successfully get the URLs from the main page, you already know everything you need to do the same for each linked page. So I suppose your issue is with the way request works, in particular with its callback-based workflow.
My suggestion is to drop request, since it's deprecated. You can use something like got, which is Promise-based, so you can use the newer async/await syntax with it (which usually means an easier workflow). Note that you need at least Node.js 8 for that.
Your loop would look like this:
for (let i = 0; i < urls.length; i++) {
const source = await got(urls[i]);
// Do your cheerio determination
console.log(new_p.html());
}
Mind you, your function signature needs to be adjusted. In your case you didn't specify a function at all, so the code runs at module level, which means you can't use await there. So write a function for that:
async function pullAllUrls() {
const mainSource = await got(main_url);
...
}
If you don't want to use async/await you could work with some promise reductions but that's rather cumbersome in my opinion. Then rather go back to promises and use a workflow library like async to help you manage the URL fetching.
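For comparison, this is roughly what a "promise reduction" looks like. It is only a sketch: fakeFetch is a made-up stand-in for a real HTTP client such as got, and fetchSequentially is a hypothetical helper name.

```javascript
// Stand-in for a real HTTP client; resolves with fake HTML for the given URL.
function fakeFetch(url) {
  return Promise.resolve('<p>first paragraph of ' + url + '</p>');
}

// Fetch the URLs one after another by chaining each request onto the
// promise of the previous one, collecting the bodies in order.
function fetchSequentially(urls, fetchPage) {
  return urls.reduce(function (chain, url) {
    return chain.then(function (bodies) {
      return fetchPage(url).then(function (body) {
        return bodies.concat(body);
      });
    });
  }, Promise.resolve([]));
}

fetchSequentially(['http://a.example', 'http://b.example'], fakeFetch)
  .then(function (bodies) {
    bodies.forEach(function (body) { console.log(body); });
  });
```

It works, but the nesting shows why the async/await loop is easier to read.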
A real example with async/await:
In a real life example, I'd create a function to fetch the source of the page I'd like to fetch, like so (don't forget to add got to your script/package.json):
async function getSourceFromUrl(thatUrl) {
const response = await got(thatUrl);
return response.body;
}
Then you have a workflow logic to get all those links in the other page. I implemented it like this:
async function grabLinksFromUrl(thatUrl) {
const mainSource = await getSourceFromUrl(thatUrl);
const $ = cheerio.load(mainSource);
const hrefs = [];
$('ul.menu__main-list').each((i, content) => {
$('li a', content).each((idx, inner) => {
const wantedUrl = $(inner).attr('href');
hrefs.push(wantedUrl);
});
}).get();
return hrefs;
}
I decided that I'd like to get the links in the <nav> element, which are usually wrapped in a <ul> with <li> elements. So we just take those.
Then you need a workflow to work with those links. This is where the for loop is. I decided that I wanted the title of each page.
async function mainFlow() {
const urls = await grabLinksFromUrl('https://netzpolitik.org/');
for (const url of urls) {
const source = await getSourceFromUrl(url);
const $ = cheerio.load(source);
// Netzpolitik has two <title> in their <head>
const title = $('head > title').first().text();
console.log(`${title} (${url}) has source of ${source.length} size`);
// TODO: More work in here
}
}
And finally, you need to call that workflow function:
return mainFlow();
The result you see on your screen should look like this:
Dossiers & Recherchen (https://netzpolitik.org/dossiers-recherchen/) has source of 413853 size
Der Netzpolitik-Podcast (https://netzpolitik.org/podcast/) has source of 333354 size
14 Tage (https://netzpolitik.org/14-tage/) has source of 402312 size
Official Netzpolitik Shop (https://netzpolitik.merchcowboy.com/) has source of 47825 size
Über uns (https://netzpolitik.org/ueber-uns/#transparenz) has source of 308068 size
Über uns (https://netzpolitik.org/ueber-uns) has source of 308068 size
netzpolitik.org-Newsletter (https://netzpolitik.org/newsletter) has source of 291133 size
netzwerk (https://netzpolitik.org/netzwerk/?via=nav) has source of 299694 size
Spenden für netzpolitik.org (https://netzpolitik.org/spenden/?via=nav) has source of 296190 size
I'm new to Nightmare/PhantomJS and am struggling to get a simple inventory of all the tags on a given page. I'm running on Ubuntu 14.04 after building PhantomJS from source and installing NodeJS, Nightmare and so forth manually, and other functions seem to be working as I expect.
Here's the code I'm using:
var Nightmare = require('nightmare');
new Nightmare()
.goto("http://www.google.com")
.wait()
.evaluate(function ()
{
var a = document.getElementsByTagName("*");
return(a);
},
function(i)
{
for (var index = 0; index < i.length; index++)
if (i[index])
console.log("Element " + index + ": " + i[index].nodeName);
})
.run(function(err, nightmare)
{
if (err)
console.log(err);
});
When I run this inside a "real" browser, I get a list of all the tag types on the page (HTML, HEAD, BODY, ...). When I run this using node GetTags.js, I just get a single line of output:
Element 0: HTML
I'm sure it's a newbie problem, but what am I doing wrong here?
PhantomJS has two contexts. The page context which provides access to the DOM can only be accessed through evaluate(). So, variables must be explicitly passed in and out of the page context. But there is a limitation (docs):
Note: The arguments and the return value to the evaluate function must be a simple primitive object. The rule of thumb: if it can be serialized via JSON, then it is fine.
Closures, functions, DOM nodes, etc. will not work!
Nightmare's evaluate() function is only a wrapper around the PhantomJS function of the same name. This means that you will need to work with the elements in the page context and only pass a representation to the outside. For example:
.evaluate(function ()
{
var a = document.getElementsByTagName("div");
return a.length;
},
function(i)
{
console.log(i + " divs available");
})
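To get the full tag inventory the question asks for, the same idea applies: build an array of strings inside the page context and return that. The sketch below is runnable under plain Node, so it uses a fake node list and a JSON round trip to imitate crossing the context boundary; collectTagNames is a hypothetical helper whose body would go inside the function passed to evaluate(), working on document.getElementsByTagName("*").

```javascript
// Hypothetical helper: reduce a node list to plain strings, which survive
// JSON serialization (DOM nodes themselves do not).
function collectTagNames(nodes) {
  var names = [];
  for (var i = 0; i < nodes.length; i++) {
    names.push(nodes[i].nodeName);
  }
  return names;
}

// Stand-in for document.getElementsByTagName("*") so the sketch runs in Node.
var fakeNodes = [{ nodeName: 'HTML' }, { nodeName: 'HEAD' }, { nodeName: 'BODY' }];

// Imitate passing the value out of the page context with a JSON round trip.
var received = JSON.parse(JSON.stringify(collectTagNames(fakeNodes)));

received.forEach(function (name, index) {
  console.log('Element ' + index + ': ' + name);
});
```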
I need to use bluebird in my code and I have no idea how to use it. My code contains nested loops. The code runs when the user logs in. It looks for any files belonging to the user and, if there are any, loops through them to get each file's name, which is stored in a dictionary. Once it has a name, it stores it in an array. Once all the names are stored, the array is passed along to res.render().
Here is my code:
router.post('/login', function(req, res){
var username = req.body.username;
var password = req.body.password;
Parse.User.logIn(username, password, {
success: function(user){
var Files = Parse.Object.extend("File");
var object = [];
var query = new Parse.Query(Files);
query.equalTo("user", Parse.User.current());
var temp;
query.find({
success:function(results){
for(var i=0; i< results.length; i++){
var file = results[i].toJSON();
for(var k in file){
if (k ==="javaFile"){
for(var t in file[k]){
if (t === "name"){
temp = file[k][t];
var getname = temp.split("-").pop();
object[i] = getname;
}
}
}
}
}
}
});
console.log(object);
res.render('filename', {title: 'File Name', FIles: object});
console.log(object);
},
error: function(user, error) {
console.log("Invalid username/password");
res.render('logins');
}
})
});
EDIT: The code doesn't work, because on both the first and second console.log(object) I get an empty array. I am supposed to get one item in that array, because I have one file saved.
JavaScript code is parsed from top to bottom, but with asynchronous code it doesn't necessarily execute in that order. The problem is that your log statements are inside the success callback of your login function, but NOT inside the query's success callback.
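The ordering can be seen in a small sketch that does not involve Parse at all. fakeFind is a made-up stand-in for an asynchronous call like query.find(), and the file name is invented sample data.

```javascript
// Stand-in for an asynchronous call such as query.find(); it invokes the
// callback on a later tick, as a network request would.
function fakeFind(onSuccess) {
  setTimeout(function () {
    onSuccess(['abc123-Main.java']);
  }, 0);
}

var names = [];
var order = [];

fakeFind(function (results) {
  names = results.map(function (name) { return name.split('-').pop(); });
  order.push('inside callback: ' + JSON.stringify(names));
  console.log(order.join('\n'));
});

// This line runs before the callback above, so `names` is still empty here.
order.push('after the call: ' + JSON.stringify(names));
```

The "after the call" entry is recorded first and shows an empty array, which is the same thing the two console.log(object) calls in the question are reporting.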
You have a few options:
1. Move the console.log statements inside of the inner success callback so that, while they may be parsed at load time, they do not execute until both callbacks have been invoked.
2. Promisify the functions that traditionally rely on and invoke callback functions, and attach then handlers to the returned values to chain the promises together.
The first option is not using promises at all, but relying solely on callbacks. To flatten your code you will want to promisify the functions and then chain them.
I'm not familiar with the syntax you're using there with the success and error callbacks, nor am I familiar with Parse. Typically you would do something like:
query.find(someArgsHere, function(err, results) {
});
But then you would have to nest another callback inside of that, and another callback inside of that. To "flatten" the pyramid, we make the function return a promise instead, and then we can chain the promises. Assuming that Parse.User.logIn is a callback-style function (as is Parse.Query.find), you might do something like:
var Promise = require('bluebird');
var login = Promise.promisify(Parse.User.logIn);
var find = Promise.promisify(Parse.Query.find);
var outerOutput = [];
return login(yourArgsHere)
.then(function(user) {
return find(user.someValue);
})
.then(function(results) {
var innerOutput = [];
// do something with innerOutput or outerOutput and render it
});
This should look similar to the synchronous code that you might be used to, except that instead of saving the returned value into a variable and then passing that variable to your next function call, you use "then" handlers to chain the promises together. You could either create the entire output variable inside the second then handler, or declare the output variable before even starting the promise chain, so that it is in scope for all of those functions. I have shown you both options above, but obviously you don't need to define both of those variables and assign them values. Just pick the option that suits your needs.
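If it helps to see what promisification does, here is a rough, self-contained sketch. The promisify function below is a simplified stand-in written for this example, not Bluebird's implementation, and logIn and findFiles are invented error-first callback functions, not Parse APIs.

```javascript
// Simplified stand-in for Promise.promisify: wraps a function that takes an
// error-first callback as its last argument.
function promisify(fn, context) {
  return function () {
    var args = Array.prototype.slice.call(arguments);
    return new Promise(function (resolve, reject) {
      args.push(function (err, value) {
        if (err) { reject(err); } else { resolve(value); }
      });
      fn.apply(context, args);
    });
  };
}

// Invented callback-style functions standing in for login and find.
function logIn(username, password, callback) {
  setTimeout(function () {
    if (password === 'secret') {
      callback(null, { name: username });
    } else {
      callback(new Error('Invalid username/password'));
    }
  }, 0);
}
function findFiles(user, callback) {
  setTimeout(function () {
    callback(null, [user.name + '-Main.java']);
  }, 0);
}

var login = promisify(logIn);
var find = promisify(findFiles);

// The flattened chain: each step returns a promise for the next one.
function fileNamesFor(username, password) {
  return login(username, password)
    .then(function (user) { return find(user); })
    .then(function (results) {
      return results.map(function (name) { return name.split('-').pop(); });
    });
}

fileNamesFor('alice', 'secret').then(function (names) {
  console.log(names);
});
```

In an Express route you would call res.render() inside the final then handler, and handle a rejected login in a catch handler.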
You can also use Bluebird's promisifyAll() function to wrap an entire library with equivalent promise-returning functions. They will all have the same names as the functions in the library, suffixed with Async. So assuming the Parse library contains callback-style functions named someFunctionName() and someOtherFunc() you could do this:
var Parse = Promise.promisifyAll(require("Parse"));
var promiseyFunction = function() {
return Parse.someFunctionNameAsync()
.then(function(result) {
return Parse.someOtherFuncAsync(result.someProperty);
})
.then(function(otherFuncResult) {
var something;
// do stuff to assign a value to something
return something;
});
}
I have a few pointers. ... Btw tho, are you trying to use Parse's Promises?
You can get rid of those inner nested loops and a few other changes:
Use some syntax like this to be more elegant:
// First filter to keep only valid file objs (with dashes in name)
var matchedFiles = results.filter(function _hasJavaFile(item) {
return item && item.javaFile && item.javaFile.name // NOT NULL
&& item.javaFile.name.indexOf('-') > -1; // and has a dash
});
// Then use a map function like this to get the files into an array of just their names
var fileNames = matchedFiles.map(function _getJavaFile(item) {
return item && item.javaFile && item.javaFile.name // NOT NULL
&& item.javaFile.name.split('-')[0]; // RETURN first part of name
});
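To try the two helpers out, filter first and then map. The sketch below uses made-up sample data in place of the toJSON() output of real Parse results, and it takes the part after the last dash with pop(), as the question's code does.

```javascript
// Invented sample data; real input would come from results[i].toJSON().
var sampleResults = [
  { javaFile: { name: 'abc123-Main.java' } },
  { javaFile: { name: 'nodash.java' } },
  { somethingElse: true },
  null
];

// Keep only entries that have a javaFile whose name contains a dash.
var matched = sampleResults.filter(function hasJavaFile(item) {
  return Boolean(item && item.javaFile && item.javaFile.name &&
    item.javaFile.name.indexOf('-') > -1);
});

// Reduce each remaining entry to the part of the name after the last dash.
var names = matched.map(function getJavaFileName(item) {
  return item.javaFile.name.split('-').pop();
});

console.log(names);
```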
And here is an example on using Parse's native promises (add code above to line 4/5 below, note the 'then()' function, that's effectively now your 'callback' handler):
var GameScore = Parse.Object.extend("GameScore");
var query = new Parse.Query(GameScore);
query.select("score", "playerName");
query.find().then(function(results) {
// each of results will only have the selected fields available.
});
I'm trying to write a bookmarklet which adds a JSONP call into a page like this:
javascript:(function(){
var myFunction = (window.function(data){alert('my function is firing with arg' + data)});
var j = 'http://localhost/jsonscript.js';
var s = document.createElement('script');
s.src = j;
document.getElementsByTagName('head')[0].appendChild(s);
})();
where the script src appended into the page contains
myFunction('foo');
But there's an error when I click the bookmarklet -- myFunction is not defined. How do I "export" that function outside the scope of my bookmarklet so that when called from the appended script tag it works?
Edit: I figured out that I can just pack the script element's innerHTML with raw JavaScript. This works but it's ugly. I would still like to figure out a better way.
Define the function on the window object:
window.myFunction = ...
With JSONP requests, you'll usually want some kind of counter to increment, so that each request gets its own callback name, e.g.:
var counter = 1;
var myFuncName = "myFunction" + counter;
var j = 'http://localhost/jsonscript.js?callback=' + myFuncName;
window[myFuncName] = function (data) {...};
// remove function after timeout expires
setTimeout(function () { delete window[myFuncName] }, 5000);
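Wrapped up as a helper, the counter idea might look like the sketch below. registerCallback is a hypothetical name, and the root parameter stands for window in the bookmarklet; a plain object is used here so the sketch runs outside a browser.

```javascript
var counter = 0;

// Hypothetical helper: registers a uniquely named callback on `root` and
// returns its name, for use in the ?callback= parameter of the script URL.
function registerCallback(root, handler, timeoutMs) {
  counter += 1;
  var name = 'myFunction' + counter;
  // Remove the callback if the response never arrives.
  var timer = setTimeout(function () { delete root[name]; }, timeoutMs);
  root[name] = function (data) {
    clearTimeout(timer);
    delete root[name]; // one-shot: clean up as soon as it has fired
    handler(data);
  };
  return name;
}

// Plain object standing in for window.
var root = {};
var first = registerCallback(root, function (data) {
  console.log('first got ' + data);
}, 5000);
var second = registerCallback(root, function (data) {
  console.log('second got ' + data);
}, 5000);

// Responses may arrive in any order without clobbering each other.
root[second]('bar');
root[first]('foo');
```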
I want to "re-link" everything in a specific page through an XMLHttpRequest to a local network domain. That would lead me to GM_xmlhttpRequest in GreaseMonkey/NinjaKit except that I want to run it when the link is clicked, not when the userscript actually runs...
So I have something like:
links = document.getElementsByTagName('a');
for (i = 0; i < links.length; i++) {
oldhref = links[i].getAttribute('href');
links[i].setAttribute('href', 'javascript:loadLink(' + oldhref + ')');
}
I understand I can either use unsafeWindow or add a script element to document to inject loadLink function.
But how can I use GM_xmlhttpRequest in loadLink?
I've looked at 0.7.20080121.0 Compatibility page but I'm not sure if that is for what I need...
I've also considered adding an iframe to the page and the modified links would load inside the iframe (triggering the userscript again), but I'd prefer a cleaner solution...
You almost never need to use GM functions inside the page context, and from the code posted so far, you don't need unsafeWindow in this case either.
Also, it is not necessary to rewrite the href for what is posted so far.
Something like this will accomplish what you want:
var links = document.getElementsByTagName ('a');
for (var J = 0, len = links.length; J < len; ++J) {
links[J].addEventListener ("click", myLoadLink, false);
}
function myLoadLink (zEvent) {
zEvent.preventDefault();
zEvent.stopPropagation();
var targetHref = zEvent.currentTarget.getAttribute ('href');
GM_xmlhttpRequest ( {
//wtv
} );
return false;
}
Or with jQuery:
$("a").click (myLoadLink);
function myLoadLink () {
var targetHref = $(this).attr ('href');
GM_xmlhttpRequest ( {
//wtv
} );
return false;
}
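The //wtv placeholders above are left as the asker wrote them. Purely as an illustration of the shape such a handler can take, here is a sketch in which the request function is passed in, so it can be tried outside a userscript. makeLoadLink, fakeRequest and fakeEvent are invented for this example; in a real userscript you would pass GM_xmlhttpRequest itself and attach the handler with addEventListener.

```javascript
// Builds a click handler. `request` is expected to accept the same kind of
// details object as GM_xmlhttpRequest (method, url, onload).
function makeLoadLink(request, onBody) {
  return function (zEvent) {
    zEvent.preventDefault();
    zEvent.stopPropagation();
    request({
      method: 'GET',
      url: zEvent.currentTarget.getAttribute('href'),
      onload: function (response) { onBody(response.responseText); }
    });
    return false;
  };
}

// Fake request function and fake click event, for demonstration only.
function fakeRequest(details) {
  details.onload({ responseText: 'body of ' + details.url });
}
var fakeEvent = {
  prevented: false,
  preventDefault: function () { this.prevented = true; },
  stopPropagation: function () {},
  currentTarget: {
    getAttribute: function () { return 'http://intranet.example/page'; }
  }
};

var bodies = [];
var handler = makeLoadLink(fakeRequest, function (body) { bodies.push(body); });
handler(fakeEvent);
console.log(bodies[0]);
```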
Ok, so I managed to get that GreaseMonkey official workaround working (dunno what I did wrong the first time) with:
unsafeWindow.loadLink = function(href) {
setTimeout(function(){
GM_xmlhttpRequest({
//wtv
});
},0);
}
But I'd still prefer a solution without using unsafeWindow if there is one... (especially since this one feels so wrong...)