In my code, the promises execute one by one; there is no concurrency. How can I convert it to run concurrently with a limit of 8 promises? In other words, I want to create a pool limited to 8 promises, then push jobs in and read each promise's output in real time.
For example:
If 2 promise jobs finish within 1 second, their output should be printed and the pool refilled with 2 new promise jobs.
I don't want it to wait until all 8 promise jobs are done at the same time and only then refill the pool with 8 new jobs.
import fetch from 'node-fetch';
import { load } from 'cheerio';
(async () => {
const response = await fetch('https://hentairead.com/page/1/');
const $ = load(await response.text());
const totalPage = $('span.pages')
.text()
.match(/\d{3,}/g)
.toString();
for(let page = 1; page <= totalPage; page++)
{
const response = await fetch(`https://hentairead.com/page/${page}/`);
const $ = load(await response.text());
$('div.item-thumb.c-image-hover > a').each((index, item) => {
console.log(item.attribs.title);
});
}
})();
import fetch from 'node-fetch';
import { load } from 'cheerio';
import {CPromise} from "c-promise2";
(async () => {
const response = await fetch('https://hentairead.com/page/1/');
const $ = load(await response.text());
const totalPage = $('span.pages')
.text()
.match(/\d{3,}/g)
.toString();
const fetchPage= async (page)=>{
const response = await fetch(`https://hentairead.com/page/${page}/`);
const $ = load(await response.text());
$('div.item-thumb.c-image-hover > a').each((index, item) => {
console.log(item.attribs.title);
});
}
await CPromise.all(function*(){
for(let page = 1; page <= totalPage; page++){
yield fetchPage(page);
}
}, {concurrency: 8});
})();
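For comparison, the same pool behaviour can be sketched without a library. This is a minimal sketch, not the c-promise2 implementation: `runPool`, `lane`, `demo` and `fakeFetchPage` are hypothetical names, and `fakeFetchPage` stands in for the real `fetchPage` so the example runs offline. Each of the 8 lanes takes the next page as soon as its current one finishes, so the pool never waits for a whole batch of 8.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs worker(item) for every item with at most `limit` calls in flight.
async function runPool(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  // A lane picks up the next item as soon as its previous one settles.
  async function lane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}

// Hypothetical stand-in for fetchPage; also records the peak concurrency.
async function demo() {
  let active = 0;
  let peak = 0;
  const fakeFetchPage = async (page) => {
    active++;
    peak = Math.max(peak, active);
    await sleep(5 * ((page % 3) + 1)); // pages take different times
    active--;
    return `page ${page}`;
  };
  const pages = Array.from({ length: 20 }, (_, i) => i + 1);
  const titles = await runPool(pages, 8, fakeFetchPage);
  return { titles, peak };
}

demo().then(({ titles, peak }) => console.log(titles.length, 'pages, peak', peak));
```

Results are stored by index, so they come back in page order even though pages finish out of order. If one worker rejects, `runPool` rejects as well, while the other lanes keep running until they finish their current item.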
This code prints an empty object ( {} ):
// declared at top
let mainData = {};
let trainStations = {};
let routes = {};
let trainNo = {};
data["data"].forEach(async (element) => {
const response2 = await fetch(
`https://india-rail.herokuapp.com/trains/getRoute?trainNo=${element["train_base"]["train_no"]}`
);
const data2 = await response2.json();
data2["data"].forEach((ele) => {
routes[ele["source_stn_code"]] = true;
});
trainNo[element["train_base"]["train_no"]] = routes;
});
console.log(trainNo);
If I do this instead, it prints the data:
data["data"].forEach(async (element) => {
const response2 = await fetch(
`https://india-rail.herokuapp.com/trains/getRoute?trainNo=${element["train_base"]["train_no"]}`
);
const data2 = await response2.json();
data2["data"].forEach((ele) => {
routes[ele["source_stn_code"]] = true;
});
trainNo[element["train_base"]["train_no"]] = routes;
console.log(trainNo);
});
Maybe there is some scoping issue. Please help me solve this problem :)
Please refer here and also check this.
As a short note, using await inside a forEach() callback gives unexpected results. This is because forEach() does not wait for the promise returned by the callback to settle (either fulfilled or rejected) before moving on.
A simple solution is to use either the traditional for loop or the for..of loop.
for(let element of data["data"]){
const response2 = await fetch(
`https://india-rail.herokuapp.com/trains/getRoute?trainNo=${element["train_base"]["train_no"]}`
);
const data2 = await response2.json();
data2["data"].forEach((ele) => {
routes[ele["source_stn_code"]] = true;
});
trainNo[element["train_base"]["train_no"]] = routes;
}
console.log(trainNo);
NOTE: Make sure to wrap the above for..of loop inside an async function, because await is allowed only inside a function defined with the async keyword (or at the top level of an ES module).
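The difference can be seen in a small offline sketch. The names `fakeGetRoute`, `withForEach` and `withForOf` are hypothetical, and `fakeGetRoute` only simulates the route request with a short delay; it is not the real API.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for the fetch + json() call in the question.
const fakeGetRoute = async (trainNo) => {
  await sleep(5);
  return [`STN${trainNo}`];
};

async function withForEach(trainNos) {
  const out = {};
  // forEach ignores the promises returned by the async callbacks.
  trainNos.forEach(async (n) => {
    out[n] = await fakeGetRoute(n);
  });
  return Object.keys(out).length; // read before any callback has finished
}

async function withForOf(trainNos) {
  const out = {};
  for (const n of trainNos) {
    out[n] = await fakeGetRoute(n); // the loop pauses here on every iteration
  }
  return Object.keys(out).length;
}

withForEach([1, 2, 3]).then((count) => console.log('forEach saw', count)); // forEach saw 0
withForOf([1, 2, 3]).then((count) => console.log('for..of saw', count)); // for..of saw 3
```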
I am trying to loop through an array, get the quantity and price of each stock from the database, do some calculations, and push the results to an array.
But after getting the quantity and price from the database and pushing to the array,
the array is still empty.
If I remove this line
const db_stock = await Stocks.findById(stock.stockId);
everything works fine
But if I add the await back the array becomes empty
import mongoose from "mongoose";
import { response } from "express";
import StockDispatches from "../models/StockDispatch.js"
import Stocks from "../models/Stocks.js"
import { createError } from "../error.js";
import validator from 'express-validator'
const { validationResult } = validator
import { generateRamdom } from "../utils/Utils.js"
export const createStockDispatched = async (req, res, next) => {
const error = validationResult(req).formatWith(({ msg }) => msg);
const trx_id = generateRamdom(30);
let quantity = 0;
let total = 0;
let dispatchedTotal = 0;
const hasError = !error.isEmpty();
if (hasError) {
res.status(422).json({ error: error.array() });
} else {
const options={ordered: true};
let user_stocks =[];
req.body.stocks.map(async (stock, index) => {
let total = stock.price * stock.quantity
const db_stock = await Stocks.findById(stock.stockId);
if(!db_stock) return res.status(404).json({msg: "Stock Not Found."})
if( stock.quantity > db_stock.quantity)
return res.status(208).json({msg: `Quantity of ${stock.name} is greater than what we have in database`})
quantity = db_stock.quantity - stock.quantity;
total = quantity * db_stock.price;
const updated_stock = await Stocks.findByIdAndUpdate(stock.id, {quantity, total},{$new: true})
dispatchedTotal = stock.quantity * db_stock.price;
user_stocks.push("samson")
user_stocks.push({...stock, staffId: req.user.id, total: dispatchedTotal, trx_id, stockId: stock.id, price: db_stock.price})
});
try{
const stockDispatched = await StockDispatches.insertMany(user_stocks, options);
if(!stockDispatched) return res.status(500).json({msg: "Error. Please try again."})
return res.status(200).json({msg: "Stock uploaded successfully.."})
}catch(error){
next(error)
}
}
}
The simplest way to fix this is to change
req.body.stocks.map()
to:
for (let [index, stock] of req.body.stocks.entries()) { ... }
since the for loop is promise-aware and will pause the loop for your await inside the loop body whereas .map() does not pause its loop for an await in the callback.
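If the lookups do not depend on each other, another option is to keep .map() but wait for the promises it returns with Promise.all. The following is only a sketch under assumptions: `fakeFindById` is a hypothetical stand-in for `Stocks.findById`, and the stock fields are reduced to what the calculation needs.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for Stocks.findById: resolves after a short delay.
const fakeFindById = async (id) => {
  await sleep(5);
  return { id, price: id * 10 };
};

async function collectConcurrently(stocks) {
  // .map() returns an array of promises; Promise.all waits for all of them.
  const rows = await Promise.all(
    stocks.map(async (stock) => {
      const dbStock = await fakeFindById(stock.stockId);
      return { ...stock, total: stock.quantity * dbStock.price };
    })
  );
  return rows; // filled in the same order as the input array
}

collectConcurrently([
  { stockId: 1, quantity: 2 },
  { stockId: 2, quantity: 3 },
]).then((rows) => console.log(rows.map((row) => row.total))); // [ 20, 60 ]
```

The for loop runs the lookups one after another, while this version starts them all at once, so it suits cases where the order of the database calls does not matter.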
I have a loop inside my async function which essentially is meant to solve captchas, however the awaits within the loop don't seem to be working...
Here is what the loop looks like:
for(var i=0; i < 7; i++){
fileName = `./tasks/myfile-${i.toString()}.json`
// SOLVE reCAPTCHAs
const requestId = await initiateCaptchaRequest(apiKey);
const response = await pollForRequestResults(apiKey, requestId);
console.log(`SOLVED CAPTCHA - ${response}`);
// Should run this function AFTER the response has been received
await enter(response);
}
And here are the functions the loop uses to solve the captcha (code from: https://medium.com/@jsoverson/bypassing-captchas-with-headless-chrome-93f294518337):
async function initiateCaptchaRequest(apiKey) {
const formData = {
method: 'userrecaptcha',
googlekey: siteDetails.sitekey,
key: apiKey,
pageurl: siteDetails.pageurl,
json: 1
};
const response = await request.post('http://2captcha.com/in.php', {form: formData});
return JSON.parse(response).request;
}
async function pollForRequestResults(key, id, retries = 30, interval = 1500, delay = 15000) {
await timeout(delay);
return poll({
taskFn: requestCaptchaResults(key, id),
interval,
retries
});
}
function requestCaptchaResults(apiKey, requestId) {
const url = `http://2captcha.com/res.php?key=${apiKey}&action=get&id=${requestId}&json=1`;
return async function() {
return new Promise(async function(resolve, reject){
const rawResponse = await request.get(url);
const resp = JSON.parse(rawResponse);
if (resp.status === 0) return reject(resp.request);
resolve(resp.request);
});
}
}
const timeout = millis => new Promise(resolve => setTimeout(resolve, millis))
My problem is that the loop seen above should run the initiateCaptchaRequest function which gets the requestId, THEN run the pollForRequestResults function to get the response and THEN run the enter function.
But right now the awaits don't seem to be working and almost instantly after the initiateCaptchaRequest function is run, the pollForRequestResults function runs which returns a JSON parse error obviously because the requestId has not been gathered yet, and is used in the requestCaptchaResults function...
Any help is appreciated!
I tried to recreate your issue, but it seems to work perfectly fine for me. I think the issue is in the requestCaptchaResults function.
(async ()=>{
for(var i=0; i < 7; i++){
await initiateCaptchaRequest(i);
console.log(i)
}
})()
function initiateCaptchaRequest(i){
console.log('sdas'+i);
return new Promise((resolve, reject)=>{
setTimeout(()=>{resolve();}, 4000);
});
}
output:
sdas0
0
sdas1
1
sdas2
2
sdas3
3
sdas4
4
sdas5
5
sdas6
6
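One thing worth checking in requestCaptchaResults is the `new Promise(async ...)` wrapper: an error thrown inside an async executor (for example by JSON.parse) does not reject the outer promise, so the caller can be left waiting. A possible rewrite, as a sketch only: `fakeGet` is a hypothetical stand-in for `request.get`, passed in as a parameter so the example runs offline, and errors are wrapped in `Error` objects here rather than rejected as plain strings.

```javascript
// Hypothetical stand-in for request.get; resolves with a raw JSON string.
const fakeGet = async (url) =>
  JSON.stringify({ status: 1, request: `token-for-${url}` });

function requestCaptchaResults(get, url) {
  // An async function already returns a promise, so no wrapper is needed.
  return async function () {
    const rawResponse = await get(url);
    const resp = JSON.parse(rawResponse); // a parse error now rejects the promise
    if (resp.status === 0) throw new Error(resp.request);
    return resp.request;
  };
}

requestCaptchaResults(fakeGet, 'u')().then((token) => console.log(token)); // token-for-u
```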
await does work in a for loop, as long as the loop is inside an async function; each iteration then waits for the previous one to finish.
If you want the iterations to run concurrently instead, put one iteration in an async function, collect the promises, and wait for them with Promise.all:
const newFunc = async (fileName) => {
  // SOLVE reCAPTCHAs
  const requestId = await initiateCaptchaRequest(apiKey);
  const response = await pollForRequestResults(apiKey, requestId);
  console.log(`SOLVED CAPTCHA - ${response}`);
  // Runs only AFTER the response has been received
  return enter(response);
};
async function yourFunction() {
  const val = [];
  for (let i = 0; i < 7; i++) {
    const fileName = `./tasks/myfile-${i}.json`;
    // Starts the task without waiting for it; pass whatever arguments you need
    val.push(newFunc(fileName));
  }
  // Resolves with all results, or rejects with the first error
  return Promise.all(val);
}
I'm developing a scraper/crawler using Node and Puppeteer. I'm trying to run several crawlers at the same time. Each one accesses a different site, logs in and does stuff. At the end, all of them return a JSON with data. When an error occurs, the method returns a JSON with "N/A" in the fields.
The main file:
var a = require('./a');
var b = require('./b');
var c= require('./c');
var d= require('./d');
const http = require('http');
var url = require("url");
http.createServer((request, response) => {
if (request.method === 'GET') {
var urlObj = url.parse(request.url, true);
// cnpj is the company id I want my scraper to look for.
if (urlObj['query']['cnpj'] != undefined) {
var cnpj = urlObj['query']['cnpj'];
response.setHeader('Content-Type', 'application/json');
response.writeHeader(200);
getCotacao(cnpj).then(ret => {
response.end(JSON.stringify(ret));
})
}
} else {
response.statusCode = 404;
response.end();
}
}).listen(8080, 'localhost');
async function getCotacao(cnpj) {
var returnCount = 4;
var sleepCycle = 0;
var ret = [];
// run all of them
a(cnpj).then(response => { console.log('Ended a:' + JSON.stringify(response)); ret = ret.concat(response); returnCount--; });
b(cnpj).then(response => { console.log('Ended b:' + JSON.stringify(response)); ret = ret.concat(response); returnCount--; });
c(cnpj).then(response => { console.log('Ended: c:' + JSON.stringify(response)) ; ret = ret.concat(response); returnCount--; });
d(cnpj).then(response => { console.log('Ended d:' + JSON.stringify(response)) ; ret = ret.concat(response); returnCount--; });
//wait them to finish or not
while (returnCount > 0 && sleepCycle < 90) { // 1:30m timeout. After the timeout or all promises returned, end the code. Which ever comes first
await sleep(1000).then(function () {
sleepCycle++;
console.log("sleepCycle: " + sleepCycle);
});
}
return ret;
}
My a, b, c methods all look the same:
var puppeteer = require('puppeteer');
const a= async (cnpj) => {
var browser;
try {
var timeout = 3000; // timeout in milliseconds (3 seconds)
browser = await puppeteer.launch(
{ headless: true
}
);
const page = await browser.newPage();
await page.goto('a_website.com', {waitUntil: 'load', timeout: timeout});
await page.waitForSelector('#txtLogin', {timeout: timeout});
await page.focus('#txtLogin');
await page.keyboard.type('my_login');
await page.focus('#txtSenha');
await page.keyboard.type('my_pass');
await page.click('#btnLogin');
await page.waitForSelector('#btnSic', {timeout: timeout});
await page.click('#btnSic');
await page.waitForSelector('#txtCnpj', {timeout: timeout});
await page.focus('#txtCnpj');
await page.keyboard.type(cnpj);
await page.click('#btnAnalisar');
await page.waitForSelector('#Finaliza', {timeout: timeout});
const CreditLimit = await page.evaluate(() => document.querySelector('#Finaliza > table:nth-child(3) > tbody > tr:nth-child(5) > td:nth-child(2) > span').textContent);
const RiskRate = await page.evaluate(() => document.querySelector('#Finaliza > table:nth-child(3) > tbody > tr:nth-child(6) > td:nth-child(2) > span').textContent);
const limitAvailable = "";
data = {
'a':
{
CreditLimit,
RiskRate,
limitAvailable
}
}
await browser.close();
return data;
} catch (e) {
console.log(e);
data = {
'a': {
'CreditLimit': 'N/D',
'limitAvailable': 'N/D',
'RiskRate': 'N/D'
}
}
if (browser)
await browser.close();
return data;
}
};
module.exports = a;
THE PROBLEM: Sometimes (not always :) ) when I run it and one of the methods (a, b, c or d) hits a Puppeteer timeout and finishes close to the time when another method also finishes, both of them get the same return value in the .then() clause. A log of this is:
sleepCycle: 1
sleepCycle: 2
sleepCycle: 3
sleepCycle: 4
{ TimeoutError: Navigation Timeout Exceeded: 3000ms exceeded
at Promise.then (C:\Users\fabyt\PhpstormProjects\robo_garantia\node_modules\puppeteer\lib\LifecycleWatcher.js:142:21)
-- ASYNC --
at Frame.<anonymous> (C:\Users\fabyt\PhpstormProjects\robo_garantia\node_modules\puppeteer\lib\helper.js:111:15)
at Page.goto (C:\Users\fabyt\PhpstormProjects\robo_garantia\node_modules\puppeteer\lib\Page.js:674:49)
at Page.<anonymous> (C:\Users\fabyt\PhpstormProjects\robo_garantia\node_modules\puppeteer\lib\helper.js:112:23)
at a (C:\Users\fabyt\PhpstormProjects\robo_garantia\src\a.js:15:24)
at process._tickCallback (internal/process/next_tick.js:68:7) name: 'TimeoutError' }
{ TimeoutError: Navigation Timeout Exceeded: 3000ms exceeded
at Promise.then (C:\Users\fabyt\PhpstormProjects\robo_garantia\node_modules\puppeteer\lib\LifecycleWatcher.js:142:21)
-- ASYNC --
at Frame.<anonymous> (C:\Users\fabyt\PhpstormProjects\robo_garantia\node_modules\puppeteer\lib\helper.js:111:15)
at Page.goto (C:\Users\fabyt\PhpstormProjects\robo_garantia\node_modules\puppeteer\lib\Page.js:674:49)
at Page.<anonymous> (C:\Users\fabyt\PhpstormProjects\robo_garantia\node_modules\puppeteer\lib\helper.js:112:23)
at b(C:\Users\fabyt\PhpstormProjects\robo_garantia\src\b.js:21:20)
at process._tickCallback (internal/process/next_tick.js:68:7) name: 'TimeoutError' }
sleepCycle: 5
{ TimeoutError: waiting for selector "#idToken1" failed: timeout 3000ms exceeded
at new WaitTask (C:\Users\fabyt\PhpstormProjects\robo_garantia\node_modules\puppeteer\lib\DOMWorld.js:561:28)
at DOMWorld._waitForSelectorOrXPath (C:\Users\fabyt\PhpstormProjects\robo_garantia\node_modules\puppeteer\lib\DOMWorld.js:490:22)
at DOMWorld.waitForSelector (C:\Users\fabyt\PhpstormProjects\robo_garantia\node_modules\puppeteer\lib\DOMWorld.js:444:17)
at Frame.waitForSelector (C:\Users\fabyt\PhpstormProjects\robo_garantia\node_modules\puppeteer\lib\FrameManager.js:628:47)
at Frame.<anonymous> (C:\Users\fabyt\PhpstormProjects\robo_garantia\node_modules\puppeteer\lib\helper.js:112:23)
at Page.waitForSelector (C:\Users\fabyt\PhpstormProjects\robo_garantia\node_modules\puppeteer\lib\Page.js:1089:29)
at c(C:\Users\fabyt\PhpstormProjects\robo_garantia\src\c.js:16:24)
at process._tickCallback (internal/process/next_tick.js:68:7) name: 'TimeoutError' }
Ended a:{"c":{"CreditLimit":"N/D","limitAvailable":"N/D","RiskRate":"N/D"}}
Ended b:{"c":{"CreditLimit":"N/D","limitAvailable":"N/D","RiskRate":"N/D"}}
Ended c:{"c":{"CreditLimit":"N/D","limitAvailable":"N/D","RiskRate":"N/D"}}
sleepCycle: 6
sleepCycle: 7
sleepCycle: 8
sleepCycle: 9
sleepCycle: 10
Ended d:{"d":{"CreditLimit":"1.000.000.00","limitAvailable":"R$ 1,000,000.00","RiskRate":"1.5000"}}
sleepCycle: 11
The important part is here:
Ended a:{"c":{"CreditLimit":"N/D","limitAvailable":"N/D","RiskRate":"N/D"}}
Ended b:{"c":{"CreditLimit":"N/D","limitAvailable":"N/D","RiskRate":"N/D"}}
Ended c:{"c":{"CreditLimit":"N/D","limitAvailable":"N/D","RiskRate":"N/D"}}
Notice that the .then() clauses of methods a, b and c have all returned the same JSON, the one from method c, as if one method could somehow access another method's returned promise. When the Puppeteer code doesn't break, each method returns its own correct JSON. The problem only occurs when one or more methods time out. How can I fix it?
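Your variable data is never declared inside the modules (no var, let or const), so the assignment creates a global variable. All the modules then share that one variable, and its value gets overwritten by whichever module assigned it last. Declaring it locally, for example with const, gives each call its own object.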
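The effect can be reproduced in a small sketch. To keep it runnable in both strict and sloppy mode, an explicitly shared variable (`shared`) stands in for the undeclared `data`; `leaky`, `safe` and `demo` are hypothetical names, and the short sleep plays the role of `await browser.close()`.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

let shared; // stands in for the undeclared, implicitly global `data`

async function leaky(name) {
  shared = { [name]: 'N/D' };
  await sleep(10); // while this call is paused, the other calls overwrite `shared`
  return shared;
}

async function safe(name) {
  const data = { [name]: 'N/D' }; // local: each call has its own object
  await sleep(10);
  return data;
}

async function demo() {
  const leakyResults = await Promise.all([leaky('a'), leaky('b'), leaky('c')]);
  const safeResults = await Promise.all([safe('a'), safe('b'), safe('c')]);
  return { leakyResults, safeResults };
}

demo().then((result) => console.log(JSON.stringify(result)));
// {"leakyResults":[{"c":"N/D"},{"c":"N/D"},{"c":"N/D"}],"safeResults":[{"a":"N/D"},{"b":"N/D"},{"c":"N/D"}]}
```

All three `leaky` calls return the object written by the last caller, which matches the "Ended a", "Ended b" and "Ended c" lines in the log all showing the JSON from c.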
I used to write async-await code as in Style 1; another dev suggested I write it as in Style 2.
Can someone please explain the difference between the two styles? To me they seem the same.
Code Style 1:
const fixtures = await fixtureModel.fetchAll();
const team = await teamModel.fetch(teamId);
Code Style 2:
const fixturesPromise = fixtureModel.fetchAll();
const teamPromise = teamModel.fetch(teamId);
const fixtures = await fixturesPromise;
const team = await teamPromise;
They are not the same.
The first will initialize a Promise, wait for it to complete, then initialize another Promise, and wait for the second Promise to complete.
The second will initialize both Promises at once and wait for both to complete. So, it will take less time. Here's a similar example:
// Takes twice as long as the other:
const makeProm = () => new Promise(resolve => setTimeout(resolve, 1000));
console.log('start');
(async () => {
const foo = await makeProm();
const bar = await makeProm();
console.log('done');
})();
// Takes half as long as the other:
const makeProm = () => new Promise(resolve => setTimeout(resolve, 1000));
console.log('start');
(async () => {
const fooProm = makeProm();
const barProm = makeProm();
const foo = await fooProm;
const bar = await barProm;
console.log('done');
})();
But you might consider making the code even clearer with Promise.all instead:
const [fixtures, team] = await Promise.all([
fixtureModel.fetchAll(),
teamModel.fetch(teamId)
]);
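To see the ordering rather than just the timing, here is a small sketch that records when each task starts and ends. The names `task`, `sequential` and `concurrent` are hypothetical, and the timers only simulate `fixtureModel.fetchAll()` and `teamModel.fetch(teamId)`.

```javascript
// Records a start event immediately and an end event after `ms` milliseconds.
const task = (name, ms, events) => {
  events.push(`start ${name}`);
  return new Promise((resolve) =>
    setTimeout(() => {
      events.push(`end ${name}`);
      resolve(name);
    }, ms)
  );
};

// Style 1: the second task is not even started until the first has finished.
async function sequential() {
  const events = [];
  await task('fixtures', 20, events);
  await task('team', 10, events);
  return events;
}

// Promise.all: both tasks start right away; results keep the input order.
async function concurrent() {
  const events = [];
  const results = await Promise.all([
    task('fixtures', 20, events),
    task('team', 10, events),
  ]);
  return { events, results };
}

sequential().then((events) => console.log(events));
// [ 'start fixtures', 'end fixtures', 'start team', 'end team' ]
concurrent().then(({ events, results }) => console.log(events, results));
// [ 'start fixtures', 'start team', 'end team', 'end fixtures' ] [ 'fixtures', 'team' ]
```

Even though the team task finishes first in the concurrent version, Promise.all returns the results in the order of the input array, so the destructuring into fixtures and team stays correct.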