Manipulating the DOM using cheerio - node.js

I understand that the major application for cheerio is web scraping. Is there any way to manipulate and update the html using cheerio commands?
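var request = require('request');
var cheerio = require('cheerio');

request('http://localhost:3000', function (error, response, html) {
  if (!error && response.statusCode == 200) {
    // Load and modify the document only when the request succeeded
    var $ = cheerio.load(html);
    $('ul').append('<li class="plum">Plum</li>');
    $.html(); // returns the serialized HTML, including the new <li>
  }
});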
The code above does not change the page itself. Is there any way for changes made to the DOM, such as $('ul').append('<li class="plum">Plum</li>'), to be reflected in the HTML?

The snippet you provided already contains the code you need: $.html(). It returns the updated HTML as a string, which is exactly what you are after. If you want that result saved on the server you requested it from, that is another story, and it raises some questions:
Do you have access to the server's contents?
How does the server build its response: from static files or dynamically?
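If the server happens to serve the page from a static file and you have write access to it, one possible way to persist the change is to write the string returned by $.html() back to that file. This is only a minimal sketch using Node's built-in fs module; the helper name saveHtml and the file path are hypothetical, and a dynamically generated page would need a change in the server code instead:

```javascript
var fs = require('fs');

// Hypothetical helper: overwrite a static HTML file with updated markup.
// `updatedHtml` is assumed to be the string returned by $.html().
function saveHtml(filePath, updatedHtml) {
  fs.writeFileSync(filePath, updatedHtml, 'utf8');
}

// Usage inside the request callback (the path is an assumption):
// saveHtml('/var/www/index.html', $.html());
```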

Related

prevent load image on nodejs request

I am using request and cheerio to parse some web pages in Node.js. We do this more than 20 times every day, so I am worried we waste a lot of bandwidth loading images and CSS content that is not useful for parsing.
I used some code like this:
request(url, function (error, response, html) {
  if (!error && response.statusCode == 200) {
    var $ = cheerio.load(html);
    $('.n-item').each(function(i, element){
      //do something
    });
  }
});
1- Is it correct that request loads images and other content, and so may waste my server's bandwidth?
2- If so, how can I prevent the images and other content from loading?
Thanks
Request itself doesn't parse HTML or run JavaScript. It only downloads the source of the URL that you enter. If it's a normal website, it literally returns the HTML source.
The only time you pull an image with request is when you use a URL that links directly to an image, e.g. http://example.com/image.jpg
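To see this for yourself, here is a rough sketch that uses only Node's built-in http module instead of request (an assumption made so it runs without extra packages). It starts a throwaway local server whose page references an image, fetches just the page, and reports which paths the server was asked for. The function name and the page markup are made up for the illustration:

```javascript
var http = require('http');

// Serve one HTML page that references an image, fetch only the page,
// and resolve with the list of paths the server was actually asked for.
function fetchPageAndListRequests() {
  return new Promise(function (resolve, reject) {
    var requestedPaths = [];
    var server = http.createServer(function (req, res) {
      requestedPaths.push(req.url);
      res.setHeader('Content-Type', 'text/html');
      res.end('<html><body><img src="/image.jpg"></body></html>');
    });
    server.listen(0, '127.0.0.1', function () {
      var url = 'http://127.0.0.1:' + server.address().port + '/';
      // agent: false avoids keep-alive so the server can shut down afterwards
      http.get(url, { agent: false }, function (res) {
        res.resume(); // discard the body; nothing here parses the <img> tag
        res.on('end', function () {
          server.close(function () { resolve(requestedPaths); });
        });
      }).on('error', function (err) {
        server.close();
        reject(err);
      });
    });
  });
}

fetchPageAndListRequests().then(function (paths) {
  console.log(paths); // only the page itself; /image.jpg is never requested
});
```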

understanding and using API's

I'm working on a react-express project.
On the back end I made a small API that streams some information on my /API/ routes. Just a JSON object.
The thing is, I do not know how I am supposed to get that information into my front end and use it.
I'm using the project as a learning exercise. I have never used an API before.
My main problem (I think) is that English is not my first language. So when I try to google this issue, I get all kinds of results because I'm probably not using the right words.
Any help would be appreciated!
You typically pull the data using a JSON HTTP request. Let's say you have a route /API/myData that returns a JSON-formatted response. Your server code should look like this:
app.get('/API/myData', function(request, response) {
  response.json(myData);
});
In your React app you can pull this data with any request library. For example, with request:
var request = require('request');
// request needs a full URL, including the protocol
request('http://localhost/API/myData', function (error, response, body) {
  if (!error && response.statusCode == 200) {
    var result = JSON.parse(body); // here is your JSON data
  }
});
It's just a starting point. You should have a look at express examples, request examples and other similar libraries to get familiar with it.
I'm using window.fetch here because it's the easiest thing to start with (even though it's not supported in all browsers yet). You could also use jQuery's ajax function or any number of things.
fetch('https://httpbin.org/ip')
  .then(data => data.json())
  .then(json => document.getElementById('your-ip').innerHTML = json.origin)
Your IP is: <div id="your-ip"></div>
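Applied to the /API/myData route from the first answer, the same fetch approach could look roughly like this. It is only a sketch: the function name loadMyData is made up, and the fetchImpl parameter is an assumption added so the function can be tried outside a browser.

```javascript
// Fetch JSON from the /API/myData route and resolve with the parsed object.
// `fetchImpl` is optional; the global fetch is used when it is omitted.
function loadMyData(fetchImpl) {
  var doFetch = fetchImpl || fetch;
  return doFetch('/API/myData').then(function (res) {
    if (!res.ok) {
      // fetch does not reject on HTTP error statuses, so check explicitly
      throw new Error('Request failed with status ' + res.status);
    }
    return res.json();
  });
}
```

In a React component you would typically call this once after the component mounts (for example in componentDidMount or a useEffect hook) and put the result into state.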

How to call Google APIs using SailsJS

I would like to calculate the distance between 2 coordinates using the GMap API.
I'm looking for any way to capture the data returned from this URL:
https://maps.googleapis.com/maps/api/distancematrix/json?origins=Seattle&destinations=San+Francisco&key={{myKey}}
I tried searching but did not find anything that serves my purpose.
Please help me or give me keywords. Thanks a lot!
You can use the super awesome request package.
From its documentation:
Request is designed to be the simplest way possible to make http calls. It supports HTTPS and follows redirects by default.
var request = require('request');
request('http://www.google.com', function (error, response, body) {
  if (!error && response.statusCode == 200) {
    console.log(body) // Show the HTML for the Google homepage.
  }
})
Hope that helps!
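For the Distance Matrix URL in the question, the same idea could be sketched as below. The two helper names are made up, and the response shape (rows[0].elements[0].distance.value, in meters) is an assumption based on the Distance Matrix JSON format, so check it against the response you actually get:

```javascript
// Build the Distance Matrix URL from the question; URLSearchParams handles encoding.
function buildDistanceMatrixUrl(origins, destinations, apiKey) {
  var params = new URLSearchParams({
    origins: origins,
    destinations: destinations,
    key: apiKey
  });
  return 'https://maps.googleapis.com/maps/api/distancematrix/json?' + params.toString();
}

// Assumed response shape: rows[0].elements[0].distance.value is the distance in meters.
// Returns null when the element is missing or its status is not "OK".
function extractDistanceMeters(body) {
  var data = JSON.parse(body);
  var row = data.rows && data.rows[0];
  var element = row && row.elements && row.elements[0];
  if (!element || element.status !== 'OK') return null;
  return element.distance.value;
}

// Usage with request (the key is a placeholder):
// request(buildDistanceMatrixUrl('Seattle', 'San Francisco', myKey), function (error, response, body) {
//   if (!error && response.statusCode == 200) console.log(extractDistanceMeters(body));
// });
```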
Google's own API documentation should contain everything you need:
https://developers.google.com/maps/documentation/javascript/distancematrix
First geocode your origin and destination from cities (or whatever) to LatLng objects:
Geocoding is the process of converting addresses (like "1600
Amphitheatre Parkway, Mountain View, CA") into geographic coordinates
(like latitude 37.423021 and longitude -122.083739), which you can use
to place markers on a map, or position the map.
https://developers.google.com/maps/documentation/geocoding/intro
Using the LatLng objects you get in response, call Distance Matrix Service.

NodeJS synchronous request from mongoose

I'm developing a website in NodeJS with Mongo. Part of the website has url localhost/api/ and returns some JSON, it works fine for fetching from clientside. Now I want to work with these data from server (to prerender it). Basically, I have a function which should return the result from the API part. It looks like this:
request('http://localhost:8000/api/', function (error, response, body) {
  if (!error && response.statusCode == 200) {
    return body // return the JSON array from API which works OK
  }
})
Unfortunately, it doesn't work and only returns "500: TypeError: Cannot call method 'map' of undefined at App". The code simply doesn't have the value at the moment it renders, so I would ideally like to make the function somehow synchronous as I'm used to from other languages.
If I return the JSON array I need directly from the function (without asking for it from the request module), it works. Therefore, I know that the problem comes from my wrong usage of asynchronous programming. What would you recommend as a solution? (I could also query Mongo directly, not via request, but that's not the problem now: I tried and it was the same.)
Just in case anyone is interested: I solved this problem quickly with Promises. The function that waits for the result of the request expects a Promise, which is created in the request call.
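The exact code was not posted, so the following is only a guess at what such a Promise wrapper might look like. The name getJson is hypothetical, and requestFn is assumed to have the same (url, callback(error, response, body)) signature as request in the question:

```javascript
// Wrap a callback-style request function in a Promise that resolves with parsed JSON.
function getJson(requestFn, url) {
  return new Promise(function (resolve, reject) {
    requestFn(url, function (error, response, body) {
      if (error) return reject(error);
      if (response.statusCode !== 200) {
        return reject(new Error('Unexpected status ' + response.statusCode));
      }
      try {
        resolve(JSON.parse(body));
      } catch (parseError) {
        reject(parseError); // body was not valid JSON
      }
    });
  });
}

// Usage sketch: render only once the data has arrived.
// getJson(request, 'http://localhost:8000/api/').then(function (items) {
//   render(items); // `render` is a hypothetical function
// });
```

The value returned from inside the original callback goes nowhere, which is why the rendering code saw undefined; with a Promise the caller waits in .then() instead.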

Scraping Google Translate

I would like to scrape Google Translate with NodeJS and the cheerio library:
request("http://translate.google.de/#de/en/hallo%20welt", function(err, resp, body) {
  if(err) throw err;
  $ = cheerio.load(body);
  console.log($('#result_box').find('span').length);
});
But it can't find the necessary span elements in the translation box (result_box). In the source code of the website it looks like this:
<span id="result_box">
  <span class="hps">hello</span>
  <span class="hps">world</span>
</span>
So I thought I could wait 5-10 seconds until Google has created all the span elements, but no, that doesn't seem to be it:
setTimeout(function() {
  $ = cheerio.load(body);
  console.log($('#result_box').find('span').length);
}, 15000);
Could you help me, please? :)
Solution:
Instead of cheerio I use http.get:
http.get(
  this.prepareURL("http://translate.google.de/translate_a/t?client=t&sl=de&tl=en&hl=de&ie=UTF-8&oe=UTF-8&oc=2&otf=1&ssel=5&tsel=5&pc=1&q=Hallo"),
  function(result) {
    result.setEncoding('utf8');
    // "data" can fire more than once; each chunk is one piece of the body
    result.on("data", function(chunk) {
      console.log(chunk);
    });
  });
So I get a result string with the translation. The URL used is the request that the page itself sends to the server.
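One thing to watch with this approach: the "data" event can fire several times for a single response, so logging each chunk may print the translation in pieces. A possible way to get the whole string is to collect the chunks until "end". This is just a sketch, and the helper name readBody is made up:

```javascript
// Collect every chunk of a readable stream (such as an http response)
// and resolve with the full body once the stream ends.
function readBody(stream) {
  return new Promise(function (resolve, reject) {
    var body = '';
    stream.setEncoding('utf8');
    stream.on('data', function (chunk) { body += chunk; });
    stream.on('end', function () { resolve(body); });
    stream.on('error', reject);
  });
}

// Usage inside the http.get callback:
// readBody(result).then(function (text) { console.log(text); });
```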
I know you've already resolved this, but I think the reason your code didn't work is that you should have written [...].find("span.hps").[...]
At least for me, it has only ever worked with the class selector when one is present.
The reason you can't use cheerio in Node to scrape Google Translate is that Google does not render the translation on the server side.
They reply to your request with a script; that script then makes an API request that includes your string, and once the reply arrives it runs again on the user's side and builds the content you see. None of that happens in cheerio, which only parses the HTML it is given.
So you need to make the request to the API directly, but this is Google, and they can detect scraping, so they may block you after a few attempts.
You can still fake a user's behavior, but it takes a long time and they may block you at any time.
