I'm trying to scrape a webpage with NightmareJS and got stuck.
In my program I pass the function an array of links, and I need to get the same data from all of them.
The list can be very long (over 60 links), and if I try to do this:
async.each(Links, function (url, callback) {
    var nightmare = Nightmare(size);
    ...
});
Only the first few instances actually return a value; the others just hang and won't load (blank page). When I try with only three links, it works perfectly.
How can I fix this? How can I split up the work, for example run three in parallel and start the next set only when all of them are done? One more thought: maybe I could reuse the same instance and repeat the steps for all the links?
There are two possible solutions:
Use async.eachSeries, which waits until one operation is done before launching the next one.
Or use async.eachLimit, which takes an extra argument that limits how many operations run at the same time.
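To illustrate the limited-concurrency idea, here is a minimal sketch in plain JavaScript promises rather than the async library. The names mapWithLimit and fakeScrape are hypothetical; fakeScrape only stands in for the real Nightmare step. Note that this keeps up to three tasks in flight at all times instead of waiting for a whole batch of three to finish.

```javascript
// Run `worker` over `items` with at most `limit` tasks in flight at once.
// Illustrative helper only; this is not the async library's implementation.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  // Each runner keeps pulling the next unclaimed item until none are left.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results;
}

// Hypothetical stand-in for the scraping step (assumption, not Nightmare code).
function fakeScrape(url) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(`data from ${url}`), 10);
  });
}

const links = ['a', 'b', 'c', 'd', 'e'];
mapWithLimit(links, 3, fakeScrape).then((data) => console.log(data));
```

Results come back in the same order as the input links, regardless of which task finishes first.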
I am a bit new to JavaScript web dev, and so am still getting my head around the flow of asynchronous functions, which can be a bit unexpected to the uninitiated. In my particular use case, I want to execute a routine on the list of available databases before moving into the main code. Specifically, in order to ensure that a test environment is always properly initialized, I am dropping a database if it already exists, and then building it from configuration files.
The basic flow I have looks like this:
let dbAdmin = client.db("admin").admin();
dbAdmin.listDatabases(function(err, dbs){/*Loop through DBs and drop relevant one if present.*/});
return await buildRelevantDB();
By peppering some console.log() items throughout, I have determined that the listDatabases() call basically puts the callback into a queue of sorts. I actually enter buildRelevantDB() before entering the callback passed to listDatabases. In this particular example, it seems to work anyway, I think because the call that reads the configuration file is also asynchronous and so puts items into the same queue but later, but I find this to be brittle and sloppy. There must be some way to ensure that the listDatabases portion resolves before moving forward.
The closest solution I found is here, but I still don't know how to get the callback I pass to listDatabases to be like a then as in that solution.
Mixing callbacks and promises is a somewhat advanced technique, so if you are new to JavaScript, try to avoid it. In fact, try to avoid it even after you have learned everything and become a JS ninja.
The documentation for listDatabases says it is async, so you can just await it without messing around with callbacks:
const dbs = await dbAdmin.listDatabases();
/*Loop through DBs and drop relevant one if present.*/
Next, there is no need to await before return. Any function in which you can await is async and returns a promise anyway, so just return the promise from buildRelevantDB:
return buildRelevantDB();
Finally, you can drop the database directly. There is no need to iterate over all databases to pick the one you want to drop:
await client.db(<db name to drop>).dropDatabase();
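For a callback-only API, where nothing can be awaited directly, one option is to wrap it with Node's util.promisify. The sketch below runs without MongoDB: listDatabasesCb is a hypothetical stand-in for any function that takes an (err, result) callback, and resetTestDb only records the steps it would take.

```javascript
const { promisify } = require('util');

// Hypothetical callback-style API standing in for a driver call (assumption).
function listDatabasesCb(callback) {
  setTimeout(() => {
    callback(null, { databases: [{ name: 'test_env' }, { name: 'admin' }] });
  }, 10);
}

// promisify converts an (err, value)-callback function into one returning a promise.
const listDatabasesAsync = promisify(listDatabasesCb);

async function resetTestDb(targetName) {
  const steps = [];
  // The list is fully resolved before the next line runs.
  const dbs = await listDatabasesAsync();
  if (dbs.databases.some((db) => db.name === targetName)) {
    steps.push(`drop ${targetName}`);
  }
  steps.push(`build ${targetName}`);
  return steps;
}

resetTestDb('test_env').then((steps) => console.log(steps));
```

Because the listing is awaited, the build step can never start before the drop decision has been made.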
I've implemented a web scraper with Node.js, cheerio and request-promise that scrapes an endpoint (a basic HTML page) and returns certain information. The content of the page I'm crawling differs based on a parameter at the end of the URL (http://some-url.com?value=12345 where 12345 is my dynamic value).
I need this crawler to work every x minutes and crawl multiple pages, and to do that I've set a cronjob using Google Cloud Scheduler. (I'm fetching the dynamic values I need from Firebase).
There could be more than 50 different values for which I'd need to crawl the specific page, but I would like to ease the load with which I'm sending the requests so the server doesn't choke. To accomplish this, I've tried to add a delay
1) using setTimeout
2) using setInterval
3) using a custom sleep implementation:
const sleep = require('util').promisify(setTimeout);
All 3 of these methods work locally; all of the requests are made with y seconds delay as intended.
But when I try this with Firebase Cloud Functions and Google Cloud Scheduler:
1) not all of the requests are sent
2) the delay is NOT consistent (some requests fire with the proper delay, then there are no requests made for a while and other requests are sent with a major delay)
I've tried many things but I wasn't able to solve this problem.
I was wondering if anyone could suggest a different theoretical approach, or a library I could use for this scenario, since the one I have now doesn't seem to work as I intended. Below is one of the approaches that works locally.
Cheers!
courseDataRefArray.forEach(async (dataRefObject: CourseDataRef, index: number) => {
    console.log(`Foreach index = ${index} -- Hello StackOverflow`);
    setTimeout(async () => {
        console.log(`Index in setTimeout = ${index} -- Hello StackOverflow`);
        await CourseUtil.initiateJobForCourse(dataRefObject.ref, dataRefObject.data);
    }, 2000 * index);
});
(Note: I can provide more code samples if necessary; but it's mostly following a loop & async/await & setTimeout pattern, and since it works locally I'm assuming that's not the main problem.)
I'm trying to create Selenium tests that run each step synchronously, without using .then() or async/await. The reason for this is that I want to create a set of functions that allow pretty much anyone on our test team, almost regardless of tech skills, to write easy-to-read automated tests. It looks to me like webdriver-sync should give me exactly what I want. However, the following dummy code is producing problems:
var wd = require('webdriver-sync');
var By = wd.By;
var Chromedriver = wd.Chromedriver;
var driver = new Chromedriver;
driver.get('https://my.test.url');
var myButton = driver.findElement(By.cssSelector('[id*=CLICK_ME]'));
myButton.click();
It tries to run - browser is launched, and page starts to load... but the steps are not executed synchronously - it goes on and tries to find and click "myButton" before the page has finished loading, throwing a "no such element" error... which to me kinda defeats the point of webdriver-sync?! Can someone tell me where I am going wrong?
FWIW, I have webdriver-sync 1.0.0, node v7.10.0, java 1.8.0_74, all running on CentOS 7.
Thanks in advance!
You need to put double-quotes around "CLICK_ME" as it's a string value.
Generally, though, it's a good idea to Wait for specific elements because dynamic pages are often "ready" before all their elements have been created.
I'm writing a script to batch process some text documents and insert them into a mysql database. I'm trying to use the async library because using a standard while loop blocks the event queue and prevents the insert queries from getting run until all are generated. Since that may take 10 minutes or more, I get a timeout. So, I am trying to use async to avoid blocking the main thread. However, it's not working as expected. When I run the simplest form of the code below, using node test.js, in the command line, it only executes once, instead of infinitely. It seems like the computer is terminating the node process early since it is non-blocking. This, of course, is not what I want. Why is this, and how can I get it to work correctly?
//this code should run forever, constantly printing "working". However it only runs once.
var async = require('async')
async.whilst(function(){return true},function(){console.log("working")})
The second parameter for whilst() is a function that takes in a callback that needs to be called when the current iteration is "done."
So if you modify the code this way, you'll get what you're expecting:
var async = require('async');
async.whilst(function() {
    return true;
}, function(cb) {
    console.log("working");
    cb();
});
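To show why the callback matters, here is a small hand-written loop helper. It is only a sketch of the callback contract and is not the async library's implementation; whilstSketch is a hypothetical name. It yields with setImmediate between iterations so that pending I/O callbacks get a chance to run, and the demo is bounded so that it terminates.

```javascript
// Illustrative helper only (assumption): calls iteratee(next) repeatedly
// while test() returns true, then calls done(err).
function whilstSketch(test, iteratee, done) {
  function step(err) {
    if (err) return done(err);
    if (!test()) return done(null);
    // setImmediate lets queued I/O callbacks run between iterations.
    setImmediate(() => iteratee(step));
  }
  step(null);
}

let count = 0;
whilstSketch(
  () => count < 3,
  (next) => {
    count++;
    console.log('working', count);
    next(); // without this call the loop stops after one iteration
  },
  (err) => console.log('finished', err, count)
);
```

If the iteratee never calls next, the helper has no way to know the iteration is done, which matches the "runs only once" behavior described in the question.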
I have a fairly general question about a problem I keep running into. To illustrate, I have an application with a loop that connects two accounts.
for each{
Login informations
Make connect
}
But in this situation, the first iteration starts the connection and the loop immediately moves on to the second iteration with the new login information. So the second account is the only one that ends up connected.
Edit : http://pastebin.com/zuWSzxBX
Thanks in advance!
PokeRwOw
You are using a stale ("expired") value of i in your asynchronous callbacks: by the time they run, the loop has already moved on.
This is a common mistake.
Write a function to process each row and call it in each loop iteration:
function processRow(row){
// process row
}
for(var i in rows) processRow(rows[i]);
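The difference can be seen in a small self-contained sketch. The rows and the deferred callbacks here are made up for illustration; the callbacks are stored in arrays and invoked after the loop, which mimics asynchronous callbacks firing later.

```javascript
const rows = ['alice', 'bob'];

// Problem: every deferred callback shares the same `i`,
// which holds its final value by the time the callbacks run.
const stale = [];
const deferredStale = [];
for (var i in rows) {
  deferredStale.push(() => stale.push(rows[i]));
}
deferredStale.forEach((fn) => fn());

// Fix: a function call gives each iteration its own `row` binding.
const fresh = [];
const deferredFresh = [];
function processRow(row) {
  deferredFresh.push(() => fresh.push(row));
}
for (var j in rows) processRow(rows[j]);
deferredFresh.forEach((fn) => fn());

console.log(stale); // [ 'bob', 'bob' ]
console.log(fresh); // [ 'alice', 'bob' ]
```

Declaring the loop variable with let instead of var also creates a fresh binding per iteration, which is another way to avoid the stale value.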