Explicit wait for a selector isn't working? - node.js

I'm writing code to log in to Gmail. On the password page, I want to use an explicit wait instead of an implicit wait. However, it is not picking up my selector.
(async () => {
const browser = await puppeteer.launch({
headless: false
});
const page = await browser.newPage();
await page.goto('https://accounts.google.com/');
await page.$('#identifierId');
await page.keyboard.type('Test1234');
await page.click('#identifierNext > content > span');
await page.waitForSelector('#password'); //this doesnt work
// await page.waitFor(5000); this works
await page.$('#password > div.aCsJod.oJeWuf > div > div.Xb9hP > input');
await page.keyboard.type('fakePassword');
await page.click('#passwordNext > content');
})();
I'm getting the error:
(node:14428) UnhandledPromiseRejectionWarning: Error: Node is either not visible or not an HTMLElement
at ElementHandle._clickablePoint (/Users/asd/Projects/FreeRazor/node_modules/puppeteer/lib/JSHandle.js:199:13)
at processTicksAndRejections (internal/process/next_tick.js:81:5)
-- ASYNC --
at ElementHandle.<anonymous> (/Users/asd/Projects/FreeRazor/node_modules/puppeteer/lib/helper.js:110:27)
at DOMWorld.click (/Users/asd/Projects/FreeRazor/node_modules/puppeteer/lib/DOMWorld.js:367:18)
at processTicksAndRejections (internal/process/next_tick.js:81:5)
-- ASYNC --
at Frame.<anonymous> (/Users/asd/Projects/FreeRazor/node_modules/puppeteer/lib/helper.js:110:27)
at Page.click (/Users/asd/Projects/FreeRazor/node_modules/puppeteer/lib/Page.js:988:29)
at /Users/asd/Projects/FreeRazor/app.js:19:16
at processTicksAndRejections (internal/process/next_tick.js:81:5)
(node:14428) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:14428) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

The page.waitForSelector statement is working. One of the page.click calls is the problem.
Relevant part of the error message:
(node:14428) UnhandledPromiseRejectionWarning: Error: Node is either not visible or not an HTMLElement
[...]
at Page.click (/Users/asd/Projects/FreeRazor/node_modules/puppeteer/lib/Page.js:988:29)
at /Users/asd/Projects/FreeRazor/app.js:19:16
So the error happens on line 19 of app.js. I can't tell for sure which statement that is, but I assume it is the second page.click call, since you say the code works if you wait longer (page.waitFor(5000)). It seems the page takes longer to display the #passwordNext > content DOM element than the #password element.
Solution
You can solve this problem by putting another waitForSelector before your click to make sure the element actually exists. I even added the option { visible: true } to make sure the DOM node is also visible:
await page.waitForSelector('#passwordNext > content', { visible: true });
await page.click('#passwordNext > content');
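If the same wait-then-click pair is needed in several places, it can be wrapped in a small helper. This is only a sketch: waitAndClick and its default timeout are names and values I made up for illustration. It relies only on page.waitForSelector (with the visible and timeout options) and page.click.

```javascript
// Hypothetical helper (not part of Puppeteer): wait until the element is
// present and visible, then click it. `page` is expected to be a Puppeteer
// Page, or any object exposing waitForSelector() and click().
async function waitAndClick(page, selector, timeout = 10000) {
  // Rejects with a timeout error if the element never becomes visible.
  await page.waitForSelector(selector, { visible: true, timeout });
  await page.click(selector);
}
```

In the script from the question, the last click would then become `await waitAndClick(page, '#passwordNext > content');`.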

Related

Can't read property 'get' of undefined in nodejs

I am using Neo4j for the first time with nodejs and developing an application. I am getting the following error. I tried my best. Can you please help me with this error?
Error
(node:16148) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'get' of
undefined
at E:\fyps\app.js:120:33
at processTicksAndRejections (internal/process/task_queues.js:93:5)
(Use `node --trace-warnings ...` to show where the warning was created)
(node:16148) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error
originated either by throwing inside of an async function without a catch block, or by
rejecting a promise which was not handled with .catch(). To terminate the node process
on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see
https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:16148) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated.
In the future, promise rejections that are not handled will terminate the Node.js
process with
a non-zero exit code.
Code
app.get('/person/:id',function(req,res){
var id = req.params.id;
session
.run("MATCH (a:Person) WHERE id(a)=$idParam RETURN a.name as name", {idParam: id})
.then(function(result){
var name =result.records[0].get("name");
session
.run("OPTIONAL MATCH (a:Person)- [r:BORN_in]-(b:location) where id(a)=$idParam RETURN b.city as city, b.state as state" , {idParam:id})
.then(function(result2){
var city =result2.records[0].get("city");
var state =result2.records[0].get("state");
session
.run("OPTIONAL MATCH (a:Person)-[r:FRIENDS]-(b:Person) WHERE id(a)=$idParam RETURN b", {idParam:id})
.then(function(result3){
var friendsArr =[];
result3.records.forEach(function(record){
if(record._fields[0] != null){
friendsArr.push({
id: record._fields[0].identity.low,
name: record._fields[0].properties.name
});
}
})
res.render('person',{
id:id,
name:name,
city:city,
state:state,
friends:friendsArr
})
session.close();
})
.catch(function(error){
console.log(error);
});
});
});
});
Can you please help me solve this error? Thanks.
The error message says that you are calling get on something that is undefined, and that this happened inside a Promise callback. Here that means records[0] is undefined, i.e. one of the queries returned no records. So the failing statement can be any one of these:
var name =result.records[0].get("name");
var city =result2.records[0].get("city");
var state =result2.records[0].get("state");
Add some relevant log statements to check which one is causing the issue.
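Once you know which query comes back empty, you can guard the access instead of assuming a first record exists. A minimal sketch, assuming the result object has a records array whose items expose get(key), as in your code. firstValue is a hypothetical helper name, not part of the Neo4j driver.

```javascript
// Hypothetical helper: read `key` from the first record, or return
// `fallback` when the query matched nothing (records is then empty).
function firstValue(result, key, fallback = null) {
  const record = result.records[0];
  return record === undefined ? fallback : record.get(key);
}
```

For example, `var name = firstValue(result, "name");` gives null instead of throwing, and the route can then decide what to render when no person was found.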

Discord.js v12 server member count

(Welcome command using canvas)
How can I fetch the server member count as soon as someone joins?
Right now I use this code:
const guild = client.guilds.cache.get("843190900930510869");
let image = await welcomeCanvas
.setUsername(member.user.tag)
.setDiscriminator(member.user.discriminator)
.setMemberCount(guild.memberCount) //this line
etc...
And, well, it just doesn't send the image.
Error:
(node:6387) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'memberCount' of undefined
at GuildMemberAddListener.exec (/app/listeners/guildMemberAdd.js:100:29)
(node:6387) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:6387) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
The guildMemberAdd listener, client.on('guildMemberAdd', (member) => {}), receives a GuildMember object. Your error means client.guilds.cache.get(...) returned undefined, i.e. no guild with that ID was in the cache. Instead, use the GuildMember object to get the guild the member joined (member.guild), then check whether that guild is available to the client using guild.available. If it is available, you can access all the properties on that guild, including guild.memberCount.
client.on('guildMemberAdd', (member) => {
const guild = member.guild
if (!guild.available) return console.error('Uh Oh Stinky...')
const guildMemberCount = guild.memberCount
console.log(guildMemberCount)
})

Node logging unexpected UnhandledPromiseRejectionWarning

I have a piece of code that's causing Node to log UnhandledPromiseRejectionWarning. But I'm not sure why. Here's the code boiled down:
export class Hello {
async good(): Promise<string> {
let errorP = this.throwError();
let responseP = this.doSomething();
let [error, response] = await Promise.all([errorP, responseP]);
return response + '123';
}
async bad(): Promise<string> {
let errorP = this.throwError();
let responseP = this.doSomething();
let response = (await responseP) + '123';
let error = await errorP;
return response;
}
private async throwError(): Promise<string> {
await (new Promise(resolve => setTimeout(resolve, 1000)));
throw new Error('error');
}
private async doSomething(): Promise<string> {
await (new Promise(resolve => setTimeout(resolve, 1000)));
return 'something';
}
}
Calling try { await hello.bad(); } catch (err) {} causes node to log UnhandledPromiseRejectionWarning
Calling try { await hello.good(); } catch (err) {} does NOT log the warning
Full error:
(node:25960) UnhandledPromiseRejectionWarning: Error: error
at Hello.<anonymous> (C:\hello-service.ts:19:11)
at Generator.next (<anonymous>)
at fulfilled (C:\hello-service.ts:5:58)
at runNextTicks (internal/process/task_queues.js:58:5)
at listOnTimeout (internal/timers.js:523:9)
at processTimers (internal/timers.js:497:7)
(node:25960) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:25960) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
{"level":30,"time":1639745087914,"pid":25960,"hostname":"AZWP10801-12","reqId":"req-1","dd":{"trace_id":"2081604231398834164","span_id":"2081604231398834164","service":"#amerisave/example-service","version":"0.0.0"},"res":{"statusCode":200},"responseTime":1025.4359999895096,"msg":"request completed"}
(node:25960) PromiseRejectionHandledWarning: Promise rejection was handled asynchronously (rejection id: 1)
Some dependency versions:
node ver. 14.16.1
ts-node-dev ver. 1.1.8
ts-node ver. 9.1.1
typescript ver. 4.5.2
Why is good good, but bad bad?
The problem in bad() is that the errorP promise rejects BEFORE you get to await errorP, so it rejects while no reject handler is in place for it. Node.js detects that a promise has rejected and that, by the time control returns to the event loop, the rejected promise still has no reject handler on it. That triggers the "unhandled rejection" warning.
Notice here that while await errorP doesn't directly apply a reject handler, it does tie errorP to the parent async function which does have a reject handler on it, so the await errorP indirectly assigns reject handling to errorP. Whereas errorP by itself will just reject and not cause anything to happen to the parent async function. It will just be a variable containing a now rejected promise with no reject handler on it.
To take advantage of async automatic error propagation of rejected promises, you have to await that promise.
Nodejs doesn't know you're going to add the await in the future with code that will execute some time in the future, so it will report the unhandled rejection.
Code becomes subject to these types of errors where you put a promise into a variable and you have no reject handler on that promise of any kind and then you go await on some other promise BEFORE you ever put any sort of reject handler on that previous promise. The promise is just sitting there with no error handling on it. If, due to the timing of things, it happens to reject in that state, you will get the warning. The usual solutions are:
Immediately put error handling on the promise so it's never left sitting by itself.
Don't create the promise until you're ready to use it however you're going to use it with appropriate error handling (with a .then().catch() or in a Promise.all().catch() or in an await or whatever).
Don't await other promises while a promise is sitting in a variable without any reject handling.
I find that if you avoid putting an unhandled promise into a variable at all, and instead create the promise right where it is going to be monitored for completion and error, you generally don't even have to think about this issue.
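As a rough illustration of the first two options, here is a sketch in plain JavaScript. The throwError and doSomething functions are shortened stand-ins for the ones in the question, and viaPromiseAll and viaEarlyCatch are names I made up. The first variant hands both promises to Promise.all as soon as they are created. The second attaches a handler right away and converts the rejection into a value that is inspected later.

```javascript
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Stand-ins for the question's methods (shorter delay, same behavior).
async function throwError() {
  await delay(100);
  throw new Error('error');
}

async function doSomething() {
  await delay(100);
  return 'something';
}

// Option A: Promise.all attaches handlers to both promises synchronously,
// so neither can reject while it has no handler.
async function viaPromiseAll() {
  const [, response] = await Promise.all([throwError(), doSomething()]);
  return response + '123';
}

// Option B: attach a handler immediately and record the outcome as a value,
// so it is safe to await other promises before looking at it.
async function viaEarlyCatch() {
  const errorP = throwError().then(
    (value) => ({ ok: true, value }),
    (error) => ({ ok: false, error })
  );
  const response = (await doSomething()) + '123';
  const outcome = await errorP;
  if (!outcome.ok) throw outcome.error;
  return response;
}
```

Both variants still reject with the original error, so a caller's try/catch sees it, but the rejection always has a handler attached at the moment it happens.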
FYI, you can illustrate the same general concept of a promise rejecting before you add a reject handler in a simpler manner here if you run this in nodejs:
function bad(t) {
return new Promise((resolve, reject) => {
setTimeout(reject, t);
});
}
const b = bad(500);
// this timer will fire after bad() rejects
setTimeout(() => {
b.catch(err => {
console.log("caught b rejection");
})
}, 600);
You will get the "unhandled rejection" warning because, when the promise rejects, it does not yet have a .catch() handler. Your code has this same issue (though it is a little more obscured) because the reject handler comes from the await, the async function, and the try/catch that the caller of the async function is using.
Here's a hypothesis (that can be experimentally proven).
The difference in behavior between good and bad can be explained by the order of awaits.
In bad you await throwError's promise only after you have awaited doSomething's, while in good you await Promise.all, which settles as soon as both are fulfilled or at least one is rejected (which will be the case here).
So in bad, the rejection happens while nothing is awaiting errorP, and Node reports it as unhandled. Your catch only sees the error later, once await errorP runs, which is why the log also shows PromiseRejectionHandledWarning.
If you change your bad so that you await on throwError first, then your catch will get triggered:
async bad(): Promise<string> {
let errorP = this.throwError();
let responseP = this.doSomething();
let error = await errorP;
let response = (await responseP) + '123';
return response;
}

ECONNREFUSED error when loading a TensorFlow frozen model from node.js

I was trying to load a TensorFlow frozen model from a URL that points to a non-existent resource, to test the robustness of my code. However, even though I have set a catch, I am not able to handle the ECONNREFUSED error that is raised internally by the function tf.loadFrozenModel.
Is there any possible mitigation to this issue? This is for me a critical problem, since it stops the execution of nodejs.
Here is the code where the error is generated.
global.fetch = require("node-fetch");
const tf = require("@tensorflow/tfjs");
require("@tensorflow/tfjs-node");
class TFModel {
...
loadFzModel(modelUrl, modelWeigths) {
return tf.loadFrozenModel(modelUrl, modelWeigths)
.then((mod) => {
this.arch = mod;
})
.catch((err) => {
console.log("Error downloading the model!");
});
}
...
}
Here instead are the errors I am getting:
UnhandledPromiseRejectionWarning: Error: http://localhost:30000/webModel/tensorflowjs_model.pb not found. FetchError: request to http://localhost:30000/webModel/tensorflowjs_model.pb failed, reason: connect ECONNREFUSED 127.0.0.1:30000
at BrowserHTTPRequest.<anonymous> (.../node_modules/@tensorflow/tfjs-core/dist/io/browser_http.js:128:31)
at step (.../node_modules/@tensorflow/tfjs-core/dist/io/browser_http.js:32:23)
at Object.throw (.../node_modules/@tensorflow/tfjs-core/dist/io/browser_http.js:13:53)
at rejected (.../node_modules/@tensorflow/tfjs-core/dist/io/browser_http.js:5:65)
at process.internalTickCallback (internal/process/next_tick.js:77:7)
(node:23291) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:23291) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
Note: the code works if modelUrl and modelWeights are valid URLs pointing to existing resources.
Note 2: the code is executed as part of a custom block for Node-RED.
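If you don't find any other solution, you can catch the error at the top level. Since your log shows an unhandled promise rejection rather than a thrown exception, listen for 'unhandledRejection':
process.on('unhandledRejection', function (reason) {
console.error(reason);
});
In the handler you can be more specific and react only to your particular error.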
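A sketch of what "more specific" could look like, assuming the failure surfaces as an unhandled promise rejection (which is what the UnhandledPromiseRejectionWarning in the log suggests), so it hooks the process-level 'unhandledRejection' event. The isConnectionRefused predicate and the ignoredRejections array are hypothetical names for this illustration. Matching on the message text is a crude filter, so adapt it to your case.

```javascript
// Rejections this handler decided to swallow, kept for later inspection.
const ignoredRejections = [];

// Hypothetical predicate: only connection-refused failures are "expected".
function isConnectionRefused(reason) {
  return reason instanceof Error && /ECONNREFUSED/.test(reason.message);
}

process.on('unhandledRejection', (reason) => {
  if (isConnectionRefused(reason)) {
    ignoredRejections.push(reason);
    console.error('Model download failed:', reason.message);
    return;
  }
  // Anything else is unexpected: rethrow so it is not silently hidden.
  throw reason;
});
```

Treat this as a safety net only. The promise that rejects inside the library is still the real problem.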
This is in the process of being addressed at https://github.com/tensorflow/tfjs-core/pull/1455.

How to crawl a whole website with Headless Chrome Crawler?

I've been studying Puppeteer (headless Chrome) to develop a crawler for learning purposes, and I discovered Headless Chrome Crawler, a good Node package. However, I ran into trouble trying to crawl an entire website with it, and I could not find in the docs how to do this. I want to get all links from a page and put them into an array so I can crawl them. This is my code now:
const HCCrawler = require('headless-chrome-crawler');
(async() => {
var urlsToVisit = [];
var visitedURLs =[];
var title;
const crawler = await HCCrawler.launch({
// Function to be evaluated in browsers
evaluatePage: (() => ({
title: $('title').text(),
link: $('a').attr('href'),
linkslen: $('a').length,
})),
// Function to be called with evaluated results from browsers
onSuccess: (result => {
console.log(result.links)
title = result.result.title;
result.result.link.map((link)=>{
urlsToVisit.push(result.result.link)
})
}),
});
await crawler.queue({
url: 'http://books.toscrape.com',
maxDepth :0
});
await crawler.queue({
url: [urlsToVisit],
maxDepth :0
});
await crawler.onIdle(); // Resolved when no queue is left
await crawler.close(); // Close the crawler
})();
So, what should I do?
My logs:
(node:4909) UnhandledPromiseRejectionWarning: TypeError [ERR_INVALID_ARG_TYPE]: The "url" argument must be of type string. Received type object
at Url.parse (url.js:143:11)
at urlParse (url.js:137:13)
at Promise.all.map (/home/ubuntu/workspace/node_modules/headless-chrome-crawler/lib/hccrawler.js:167:27)
at arrayMap (/home/ubuntu/workspace/node_modules/headless-chrome-crawler/node_modules/lodash/_arrayMap.js:16:21)
at map (/home/ubuntu/workspace/node_modules/headless-chrome-crawler/node_modules/lodash/map.js:50:10)
at HCCrawler.queue (/home/ubuntu/workspace/node_modules/headless-chrome-crawler/lib/hccrawler.js:157:23)
at HCCrawler.<anonymous> (/home/ubuntu/workspace/node_modules/headless-chrome-crawler/lib/helper.js:177:23)
at /home/ubuntu/workspace/crawlertop.js:30:17
at <anonymous>
at process._tickCallback (internal/process/next_tick.js:118:7)
(node:4909) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 3)
(node:4909) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
[ 'http://books.toscrape.com/index.html',
'http://books.toscrape.com/catalogue/category/books_1/index.html',
'http://books.toscrape.com/catalogue/category/books/travel_2/index.html',
'http://books.toscrape.com/catalogue/category/books/mystery_3/index.html',
'http://books.toscrape.com/catalogue/category/books/historical-fiction_4/index.html',
'http://books.toscrape.com/catalogue/category/books/sequential-art_5/index.html',
'http://books.toscrape.com/catalogue/category/books/classics_6/index.html',
'http://books.toscrape.com/catalogue/category/books/philosophy_7/index.html',
'http://books.toscrape.com/catalogue/category/books/romance_8/index.html',
'http://books.toscrape.com/catalogue/category/books/womens-fiction_9/index.html',
'http://books.toscrape.com/catalogue/category/books/fiction_10/index.html',
'http://books.toscrape.com/catalogue/category/books/childrens_11/index.html',
'http://books.toscrape.com/catalogue/category/books/religion_12/index.html',
'http://books.toscrape.com/catalogue/category/books/nonfiction_13/index.html',
'http://books.toscrape.com/catalogue/category/books/music_14/index.html',
'http://books.toscrape.com/catalogue/category/books/default_15/index.html',
'http://books.toscrape.com/catalogue/category/books/science-fiction_16/index.html',
'http://books.toscrape.com/catalogue/category/books/sports-and-games_17/index.html',
'http://books.toscrape.com/catalogue/category/books/add-a-comment_18/index.html',
'http://books.toscrape.com/catalogue/category/books/fantasy_19/index.html',
'http://books.toscrape.com/catalogue/category/books/new-adult_20/index.html',
'http://books.toscrape.com/catalogue/category/books/young-adult_21/index.html',
'http://books.toscrape.com/catalogue/category/books/science_22/index.html',
'http://books.toscrape.com/catalogue/category/books/poetry_23/index.html',
'http://books.toscrape.com/catalogue/category/books/paranormal_24/index.html',
'http://books.toscrape.com/catalogue/category/books/art_25/index.html',
'http://books.toscrape.com/catalogue/category/books/psychology_26/index.html',
'http://books.toscrape.com/catalogue/category/books/autobiography_27/index.html',
'http://books.toscrape.com/catalogue/category/books/parenting_28/index.html',
'http://books.toscrape.com/catalogue/category/books/adult-fiction_29/index.html',
'http://books.toscrape.com/catalogue/category/books/humor_30/index.html',
'http://books.toscrape.com/catalogue/category/books/horror_31/index.html',
'http://books.toscrape.com/catalogue/category/books/history_32/index.html',
'http://books.toscrape.com/catalogue/category/books/food-and-drink_33/index.html',
'http://books.toscrape.com/catalogue/category/books/christian-fiction_34/index.html',
'http://books.toscrape.com/catalogue/category/books/business_35/index.html',
'http://books.toscrape.com/catalogue/category/books/biography_36/index.html',
'http://books.toscrape.com/catalogue/category/books/thriller_37/index.html',
'http://books.toscrape.com/catalogue/category/books/contemporary_38/index.html',
'http://books.toscrape.com/catalogue/category/books/spirituality_39/index.html',
'http://books.toscrape.com/catalogue/category/books/academic_40/index.html',
'http://books.toscrape.com/catalogue/category/books/self-help_41/index.html',
'http://books.toscrape.com/catalogue/category/books/historical_42/index.html',
'http://books.toscrape.com/catalogue/category/books/christian_43/index.html',
'http://books.toscrape.com/catalogue/category/books/suspense_44/index.html',
'http://books.toscrape.com/catalogue/category/books/short-stories_45/index.html',
'http://books.toscrape.com/catalogue/category/books/novels_46/index.html',
'http://books.toscrape.com/catalogue/category/books/health_47/index.html',
'http://books.toscrape.com/catalogue/category/books/politics_48/index.html',
'http://books.toscrape.com/catalogue/category/books/cultural_49/index.html',
'http://books.toscrape.com/catalogue/category/books/erotica_50/index.html',
'http://books.toscrape.com/catalogue/category/books/crime_51/index.html',
'http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html',
'http://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html',
'http://books.toscrape.com/catalogue/soumission_998/index.html',
'http://books.toscrape.com/catalogue/sharp-objects_997/index.html',
'http://books.toscrape.com/catalogue/sapiens-a-brief-history-of-humankind_996/index.html',
'http://books.toscrape.com/catalogue/the-requiem-red_995/index.html',
'http://books.toscrape.com/catalogue/the-dirty-little-secrets-of-getting-your-dream-job_994/index.html',
'http://books.toscrape.com/catalogue/the-coming-woman-a-novel-based-on-the-life-of-the-infamous-feminist-victoria-woodhull_993/index.html',
'http://books.toscrape.com/catalogue/the-boys-in-the-boat-nine-americans-and-their-epic-quest-for-gold-at-the-1936-berlin-olympics_992/index.html',
'http://books.toscrape.com/catalogue/the-black-maria_991/index.html',
'http://books.toscrape.com/catalogue/starving-hearts-triangular-trade-trilogy-1_990/index.html',
'http://books.toscrape.com/catalogue/shakespeares-sonnets_989/index.html',
'http://books.toscrape.com/catalogue/set-me-free_988/index.html',
'http://books.toscrape.com/catalogue/scott-pilgrims-precious-little-life-scott-pilgrim-1_987/index.html',
'http://books.toscrape.com/catalogue/rip-it-up-and-start-again_986/index.html',
'http://books.toscrape.com/catalogue/our-band-could-be-your-life-scenes-from-the-american-indie-underground-1981-1991_985/index.html',
'http://books.toscrape.com/catalogue/olio_984/index.html',
'http://books.toscrape.com/catalogue/mesaerion-the-best-science-fiction-stories-1800-1849_983/index.html',
'http://books.toscrape.com/catalogue/libertarianism-for-beginners_982/index.html',
'http://books.toscrape.com/catalogue/its-only-the-himalayas_981/index.html',
'http://books.toscrape.com/catalogue/page-2.html' ]
(node:4909) UnhandledPromiseRejectionWarning: Error: Protocol error: Connection closed. Most likely the page has been closed.
at assert (/home/ubuntu/workspace/node_modules/headless-chrome-crawler/node_modules/puppeteer/lib/helper.js:251:11)
at Page.close (/home/ubuntu/workspace/node_modules/headless-chrome-crawler/node_modules/puppeteer/lib/Page.js:883:5)
at Crawler.close (/home/ubuntu/workspace/node_modules/headless-chrome-crawler/lib/crawler.js:80:22)
at Crawler.<anonymous> (/home/ubuntu/workspace/node_modules/headless-chrome-crawler/lib/helper.js:177:23)
at HCCrawler._request (/home/ubuntu/workspace/node_modules/headless-chrome-crawler/lib/hccrawler.js:349:21)
at <anonymous>
at process._tickCallback (internal/process/next_tick.js:118:7)
(node:4909) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 9)
There are multiple problems with your code. I will go through them one by one.
Problem: Wrong code on onSuccess
You referenced result.result.link, but the result object has a links property, so the path should be result.links instead.
The map callback never uses its link argument, so you are pushing the same data into urlsToVisit over and over.
Problem: Wrong logic on continuous crawling
You have two parts to the scraping:
one is to go through the target page and collect links,
another is to go through the collected links.
You need to think about them separately.
Moreover, .queue runs immediately when you call it, but at that point urlsToVisit is not complete yet. It probably doesn't have any data at all.
Solution
Recursively queue the links. Whenever it finishes crawling, it should queue new links back to the crawler.
Also let's make sure to catch the errors with onError.
Here is working code:
const HCCrawler = require('headless-chrome-crawler');
(async () => {
var visitedURLs = [];
const crawler = await HCCrawler.launch({
// Function to be evaluated in browsers
evaluatePage: () => ({
title: $("title").text(),
link: $("a").attr("href"),
linkslen: $("a").length
}),
// Function to be called with evaluated results from browsers
onSuccess: async result => {
// save them as wish
visitedURLs.push(result.options.url);
// show some progress
console.log(visitedURLs.length, result.options.url);
// queue new links one by one asynchronously
for (const link of result.links) {
await crawler.queue({ url: link, maxDepth: 0 });
}
},
// catch all errors
onError: error => {
console.log(error);
}
});
await crawler.queue({ url: "http://books.toscrape.com", maxDepth: 0 });
await crawler.onIdle(); // Resolved when no queue is left
await crawler.close(); // Close the crawler
})();
Problem: This solution does not solve my problem
You will quickly realize that you are no longer scraping just the links you collected; the crawler is crawling everything using its own method.
That is why the package has a maxDepth option: so that it can go through the whole website by itself, without the recursive function. Read the docs and try to understand them bit by bit.
Most importantly, you have to split your code into multiple parts and solve one problem at a time.
Feel free to explore the other options in the documentation.
You are getting the error UnhandledPromiseRejectionWarning: TypeError [ERR_INVALID_ARG_TYPE]: The "url" argument must be of type string. Received type object
The error is stating that "url" is of type object and not a string. The issue lies here
await crawler.queue({
url: [urlsToVisit], // This is an array not a string
maxDepth :0
});
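You need to loop over each URL in the urlsToVisit array. Use for...of rather than forEach, because await is only valid directly inside an async function, not inside a plain forEach callback:
for (const u of urlsToVisit) {
await crawler.queue({
url: u,
maxDepth: 0
});
}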
Also your log states UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 3). Use a try/catch block so this error does not pop up
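One way the try/catch could be arranged is per URL, so that a single bad entry does not abort the rest. This is just a sketch: queueAll is a hypothetical helper, and the crawler argument is assumed to be anything with an async queue(options) method, such as the HCCrawler instance from the question.

```javascript
// Hypothetical helper: queue every URL in turn and collect failures
// instead of letting one rejection go unhandled.
async function queueAll(crawler, urls) {
  const failures = [];
  for (const url of urls) {
    try {
      await crawler.queue({ url, maxDepth: 0 });
    } catch (error) {
      // Remember which URL failed and why, then continue with the next one.
      failures.push({ url, error });
    }
  }
  return failures;
}
```

Inside the async function from the question this would be called as `const failures = await queueAll(crawler, urlsToVisit);`, and the returned array can be logged or retried as needed.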
