I'm apparently a little newer to JavaScript than I'd care to admit. I'm trying to fetch a webpage using Node.js and save the contents in a variable, so I can parse it however I like.
In Python, I would do this:
from bs4 import BeautifulSoup # for parsing
import urllib
text = urllib.urlopen("http://www.myawesomepage.com/").read()
parse_my_awesome_html(text)
How would I do this in Node?
I've gotten as far as:
var request = require("request");
request("http://www.myawesomepage.com/", function (error, response, body) {
/*
Something here that lets me access the text
outside of the closure
This doesn't work:
this.text = body;
*/
})
var request = require("request");
var parseMyAwesomeHtml = function(html) {
//Have at it
};
request("http://www.myawesomepage.com/", function (error, response, body) {
if (!error) {
parseMyAwesomeHtml(body);
} else {
console.log(error);
}
});
Edit: As Kishore noted, there are nice options for parsing available. Also see cheerio if you have Python/gyp issues with jsdom on Windows. Cheerio is available on GitHub.
That request() call is asynchronous, so the response is only available inside the callback. You have to call your parse function from it:
function parse_my_awesome_html(text){
...
}
request("http://www.myawesomepage.com/", function (error, response, body) {
parse_my_awesome_html(body)
})
Get used to chaining callbacks; that's essentially how any I/O happens in JavaScript :)
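To make "chaining callbacks" concrete, here is a minimal sketch of two requests run one after the other. It assumes a hypothetical `fakeRequest` stand-in with the same `(error, response, body)` callback shape as `request()`, so it runs without the network; `fetchBoth` and the `/about` path are made up for illustration.

```javascript
// Hypothetical stand-in for request(): calls back asynchronously
// with (error, response, body), like the real module does.
function fakeRequest(url, callback) {
  setImmediate(() => callback(null, { statusCode: 200 }, "body of " + url));
}

// Fetch two pages in sequence. The second request only starts
// inside the callback of the first, once its body is available.
function fetchBoth(firstUrl, secondUrl, done) {
  fakeRequest(firstUrl, (error, response, firstBody) => {
    if (error) return done(error);
    fakeRequest(secondUrl, (error2, response2, secondBody) => {
      if (error2) return done(error2);
      done(null, [firstBody, secondBody]);
    });
  });
}

fetchBoth(
  "http://www.myawesomepage.com/",
  "http://www.myawesomepage.com/about", // hypothetical second page
  (error, bodies) => {
    if (error) return console.log(error);
    console.log(bodies);
  }
);
```

Swapping `fakeRequest` for the real `request` should leave the structure unchanged, since only the callback signature matters here.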
JsDom is pretty good to achieve things like this if you want to parse the response.
var request = require('request'),
jsdom = require('jsdom');
request({ uri:'http://www.myawesomepage.com/' }, function (error, response, body) {
if (error || response.statusCode !== 200) {
// response is undefined when error is set, so check error first and stop here
console.log('Error when contacting myawesomepage.com');
return;
}
jsdom.env({
html: body,
scripts: [
'http://code.jquery.com/jquery-1.5.min.js'
]
}, function (err, window) {
var $ = window.jQuery;
// jQuery is now loaded on the jsdom window created from 'body'
console.log($('body').html());
});
});
Also, if your page loads a lot of JavaScript/AJAX content, you might want to consider using PhantomJS.
Source: http://blog.nodejitsu.com/jsdom-jquery-in-5-lines-on-nodejs/
Related
I have a simple Node Express app with a service that makes a call to a Node server. The Node server makes a call to an AWS web service. The AWS call simply lists any S3 buckets it finds and is asynchronous. The problem is that I can't get the server code to "wait" for the AWS call to return with the JSON data, so the function returns undefined.
I've read many, many articles on the web about this, including ones on promises, wait-for's, etc., but I don't think I fully understand how they work!
This is my first exposure to Node, and I would be grateful if somebody could point me in the right direction.
Here are some snippets of my code. Apologies if it's a bit rough, but I've chopped and changed things many times over!
Node Express:
var Httpreq = new XMLHttpRequest(); // a new request
Httpreq.open("GET","http://localhost:3000/listbuckets",false);
Httpreq.send(null);
console.log(Httpreq.responseText);
return Httpreq.responseText;
Node Server
app.get('/listbuckets', function (req, res) {
var bucketData = MyFunction(res,req);
console.log("bucketData: " + bucketData);
});
function MyFunction(res, req) {
var mydata;
var params = {};
res.send('Here are some more buckets!');
var request = s3.listBuckets();
// register a callback event handler
request.on('success', function(response) {
// log the successful data response
console.log(response.data);
mydata = response.data;
});
// send the request
request.
on('success', function(response) {
console.log("Success!");
}).
on('error', function(response) {
console.log("Error!");
}).
on('complete', function() {
console.log("Always!");
}).
send();
return mydata;
}
Use the Fetch API (https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API) to make HTTP calls. It is built on Promises, so fetch() returns a Promise for the response:
fetch('http://localhost:3000/listbuckets').then(response => {
// do something with the response here
}).catch(error => {
// Error :(
})
I eventually got this working with:
const request = require('request');
request(url, function (error, response, body) {
if (!error && response.statusCode == 200) {
parseString(body, function (err, result) {
console.log(JSON.stringify(result));
});
// from within the callback, write data to response, essentially returning it.
res.send(body);
}
else {
// console.log(JSON.stringify(response));
}
})
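The same idea, sending the response only once the asynchronous call has finished, can also be written with a Promise and `await`. This is a sketch under assumptions: `listBucketsStub` is a hypothetical stand-in for the AWS call, and `fakeRes` imitates just enough of an Express response object to run without Express.

```javascript
// Hypothetical stand-in for an asynchronous AWS call such as listing buckets.
function listBucketsStub(callback) {
  setImmediate(() => callback(null, { Buckets: [{ Name: "example-bucket" }] }));
}

// Wrap the callback-style call in a Promise so it can be awaited.
function listBuckets() {
  return new Promise((resolve, reject) => {
    listBucketsStub((err, data) => (err ? reject(err) : resolve(data)));
  });
}

// Handler with the (req, res) shape Express uses. The response is sent
// only after the data has arrived, never before.
async function listBucketsHandler(req, res) {
  try {
    const data = await listBuckets();
    res.send(JSON.stringify(data));
  } catch (err) {
    res.status(500).send("Could not list buckets");
  }
}

// Minimal fake response object so the sketch runs on its own.
const fakeRes = {
  statusCode: 200,
  status(code) { this.statusCode = code; return this; },
  send(body) { this.body = body; console.log(body); return this; },
};

listBucketsHandler({}, fakeRes);
```

In a real app the handler would be registered with something like `app.get('/listbuckets', listBucketsHandler)`, and nothing would try to return the data from the function that starts the call.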
I can get data through a single request, but here I am trying to send multiple HTTP requests. I'm stuck: I can't get the data, and I don't know how to pass it to the view page, i.e. the EJS template.
router.get('/specials',function(req,res,next){
var callbackThree = function(error, resp, body) {
var data = JSON.parse(body);
res.render("specials",{ data: data});
}
var callbackTwo = function(error, resp, body) {
request("https://siteblabla.com/wsmenu/sub_menu_list/789/", callBackThree);
}
var callbackOne = function(error, resp, body) {
request("https://siteblabla.com/wsspecials/specials_list/123/", callBackTwo);
}
// request("api.com/users", callBackOne);
});
You need to use Promises, and there is a great npm package called ejs-promise that you can use in this case.
You can download it from the URL below:
https://www.npmjs.com/package/ejs-promise
Hope this helps!
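Independently of any particular package, the "use Promises" advice can be sketched with plain Promises: wrap each request, wait for all of them with `Promise.all`, then render once. This assumes a hypothetical `fakeRequest` stand-in for `request()` that returns a small JSON body, and the helper names `getJson` and `loadSpecialsPage` are made up for illustration.

```javascript
// Hypothetical stand-in for request(): responds asynchronously with a JSON body.
function fakeRequest(url, callback) {
  setImmediate(() => callback(null, { statusCode: 200 }, JSON.stringify({ url })));
}

// Wrap one request in a Promise that resolves with the parsed JSON.
function getJson(url) {
  return new Promise((resolve, reject) => {
    fakeRequest(url, (error, response, body) => {
      if (error) return reject(error);
      try {
        resolve(JSON.parse(body));
      } catch (parseError) {
        reject(parseError);
      }
    });
  });
}

// Start both requests at once and wait until both have finished.
async function loadSpecialsPage() {
  const [specials, subMenu] = await Promise.all([
    getJson("https://siteblabla.com/wsspecials/specials_list/123/"),
    getJson("https://siteblabla.com/wsmenu/sub_menu_list/789/"),
  ]);
  return { specials, subMenu };
}

loadSpecialsPage()
  .then((data) => console.log(data))
  .catch((error) => console.log(error));
```

Inside the route, the result would be handed to the view in one go, for example `res.render("specials", data)` in the `.then()` callback, with `next(error)` in the `.catch()`.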
I was calling this function from a JavaScript file in the browser, where it worked perfectly, but now I want to call the same function from Node.js. Please suggest an alternative method. The function was previously used to insert data on a click event.
function signup_validations_google(name_g,email_g,pass_g)
{
xmlhttp = new XMLHttpRequest();
xmlhttp.open("GET","http://localhost:8000/uri?name="+name+"&email="+email+"&pass="+pass, true);
xmlhttp.onreadystatechange=function(){
if (xmlhttp.readyState==4 && xmlhttp.status==200){
string=xmlhttp.responseText;
alert("Registration successful");
}
}
xmlhttp.send();
}
In node.js, you would generally use http.get() from the http module or request() from the request module. I find request() is a bit easier to use:
const request = require('request');
let query = {name, email, pass};
request({uri: "http://localhost:8000/uri", qs: query}, function(err, msg, body) {
if (err) {
// error here
} else {
// response here
}
});
There are a zillion other possible options for the request() module, described in its documentation.
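For comparison, the other route mentioned above, `http.get()` from the built-in `http` module, might look roughly like this. It is a sketch, not a drop-in replacement: `buildSignupUrl` and `signup` are hypothetical names, and the URL is the one from the question.

```javascript
const http = require("http");

// Build the request URL. URLSearchParams takes care of escaping the values,
// which plain string concatenation does not.
function buildSignupUrl(name, email, pass) {
  const url = new URL("http://localhost:8000/uri");
  url.searchParams.set("name", name);
  url.searchParams.set("email", email);
  url.searchParams.set("pass", pass);
  return url;
}

// Send the GET request and collect the response body chunk by chunk.
// The callback receives (error, statusCode, body).
function signup(name, email, pass, callback) {
  http
    .get(buildSignupUrl(name, email, pass), (res) => {
      let body = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => { body += chunk; });
      res.on("end", () => callback(null, res.statusCode, body));
    })
    .on("error", callback);
}

// Show the URL that would be requested, without contacting any server.
console.log(buildSignupUrl("Ann Lee", "ann@example.com", "p&q").href);
```

Calling `signup(name, email, pass, (err, status, body) => { ... })` would then play the role of the original `onreadystatechange` handler, with `status === 200` as the success check.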
I know how to make a GET request to a URL using the request module. At the moment, the code just prints the GET response in the command shell it was started from.
How do I store this GET response in a local variable so that I can use it elsewhere in the program?
This is the code I use:
var request = require("request");
request("http://www.stackoverflow.com", function(error, response, body) {
console.log(body);
});
The easiest way (but it has pitfalls--see below) is to move body into the scope of the module.
var request = require("request");
var body;
request("http://www.stackoverflow.com", function(error, response, data) {
body = data;
});
However, this may encourage errors. For example, you might be inclined to put console.log(body) right after the call to request().
var request = require("request");
var body;
request("http://www.stackoverflow.com", function(error, response, data) {
body = data;
});
console.log(body); // THIS WILL NOT WORK!
This will not work because request() is asynchronous, so it returns control before body is set in the callback.
You might be better served by creating body as an event emitter and subscribing to events.
var request = require("request");
var EventEmitter = require("events").EventEmitter;
var body = new EventEmitter();
request("http://www.stackoverflow.com", function(error, response, data) {
body.data = data;
body.emit('update');
});
body.on('update', function () {
console.log(body.data); // HOORAY! THIS WORKS!
});
Another option is to switch to using promises.
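A minimal sketch of the promise approach, assuming a hypothetical `fakeRequest` stand-in with the same callback shape as `request()` so that it runs without the network. The helper name `fetchBody` is made up for illustration.

```javascript
// Hypothetical stand-in for request(): calls back asynchronously
// with (error, response, body).
function fakeRequest(url, callback) {
  setImmediate(() =>
    callback(null, { statusCode: 200 }, "<html><body>" + url + "</body></html>")
  );
}

// Wrap a request-style function so the caller gets a Promise for the body.
function fetchBody(requestFn, url) {
  return new Promise((resolve, reject) => {
    requestFn(url, (error, response, body) => {
      if (error) return reject(error);
      if (response.statusCode !== 200) {
        return reject(new Error("Unexpected status " + response.statusCode));
      }
      resolve(body);
    });
  });
}

// Inside an async function, the body can be stored in a local variable
// and used on the following lines, because await pauses until it is set.
async function main() {
  const body = await fetchBody(fakeRequest, "http://www.stackoverflow.com");
  console.log(body.length);
}

main().catch((error) => console.log(error));
```

The variable is still only usable after the request completes; `await` just makes that ordering explicit instead of relying on a shared variable being filled in later.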
CouchDB has authentication built in through its _session API (http://wiki.apache.org/couchdb/Session_API), but I'm having trouble passing the cookie it hands out on to the client through node.js.
Here's my code (I'm using express.js; login is the route):
exports.login = function(req, res){
var request = require('request');
var userData = {
"name":req.body.email,
"password": req.body.password
};
request.post({
'url': 'http://wamoyo.iriscouch.com/_session',
'json': userData,
}, function (error, response, body) {
if (!error && response.statusCode == 200) {
console.log(body);
}
request('http://wamoyo.iriscouch.com/_session', function (error, response, body) {
console.log(response.request.req._headers);
// Write the Cookie
res.setHeader("Set-Cookie", response.request.req._headers.cookie);
return res.render('index', { title: 'BasicChat' })
});
});
};
That doesn't seem to work, and I don't know a good way to debug this either. Any help would rock!
My fallback is to use the jQuery Couch plugin like this: (https://wamoyo.iriscouch.com/loginer/_design/Loginer/attachments%2findex.html) But there's a whole host of reasons why this would suck.
Also, I realize this code isn't efficient yet, I'm just trying to get it to work for now. Sorry if it's messy.
Seems like I found you again dear internet friend :)
nano implements this, you can use it if you like:
https://github.com/dscape/nano#using-cookie-authentication
Here is a blog post about it:
http://mahoney.eu/2012/05/23/couchdb-cookie-authentication-nodejs-nano/#.T-JYHCtYsm8
And here is a sample (actual test code from nano):
https://github.com/dscape/nano/blob/master/tests/shared/cookie.js#L15-#L51
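One likely culprit in the question's code is that it reads the headers of the outgoing request (`response.request.req._headers`) rather than the `Set-Cookie` header of CouchDB's response. As a sketch of the alternative, assuming the upstream response exposes Node's usual `headers['set-cookie']` array, a small helper could copy the cookies across. The helper name `forwardSetCookie`, the fake objects, and the cookie value are all made up for illustration.

```javascript
// Copy any Set-Cookie headers from the upstream (CouchDB) response onto
// the response going back to the browser. Returns true if any were copied.
function forwardSetCookie(upstreamResponse, clientResponse) {
  const cookies = upstreamResponse.headers["set-cookie"];
  if (cookies && cookies.length > 0) {
    clientResponse.setHeader("Set-Cookie", cookies);
    return true;
  }
  return false;
}

// Hypothetical objects standing in for the real responses, so the
// sketch runs without CouchDB or Express.
const upstream = {
  headers: { "set-cookie": ["AuthSession=abc123; Version=1; Path=/; HttpOnly"] },
};
const client = {
  headers: {},
  setHeader(name, value) { this.headers[name] = value; },
};

console.log(forwardSetCookie(upstream, client), client.headers);
```

In the login route this would be called as `forwardSetCookie(response, res)` inside the `request.post` callback, before `res.render(...)`, which would also make the second request to `_session` unnecessary.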