I'm trying to pass a variable into a page.evaluate() function in Puppeteer, but when I use the following very simplified example, the variable evalVar is undefined.
I can't find any examples to build on, so I need help passing that variable into the page.evaluate() function so I can use it inside.
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  const evalVar = 'WHUT??';
  try {
    await page.goto('https://www.google.com.au');
    await page.waitForSelector('#fbar');
    const links = await page.evaluate((evalVar) => {
      console.log('evalVar:', evalVar); // appears undefined
      const urls = [];
      const hrefs = document.querySelectorAll('#fbar #fsl a');
      hrefs.forEach(function (el) {
        urls.push(el.href);
      });
      return urls;
    });
    console.log('links:', links);
  } catch (err) {
    console.log('ERR:', err.message);
  } finally {
    // browser.close();
  }
})();
You have to pass the variable as an argument to the pageFunction like this:
const links = await page.evaluate((evalVar) => {
  console.log(evalVar); // 2. should be defined now
  …
}, evalVar); // 1. pass variable as an argument
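To see why the parameter comes out as undefined, here is a minimal simulation that runs without a browser. It assumes, as a simplification, that page.evaluate() sends the function as source text and the extra arguments as JSON; fakeEvaluate is a hypothetical helper built on Node's vm module, not Puppeteer's actual implementation.

```javascript
const vm = require('vm');

// Hypothetical stand-in for page.evaluate(), NOT Puppeteer's implementation:
// the function travels as source text and the extra arguments as JSON, then runs
// in a fresh global scope that plays the role of the browser page.
function fakeEvaluate(pageFunction, ...args) {
  const source = pageFunction.toString();
  const payload = JSON.stringify(args);
  return vm.runInNewContext(`(${source})(...${payload})`);
}

const evalVar = 'WHUT??';

// Parameter declared but no argument passed, as in the question: it is undefined.
const withoutArg = fakeEvaluate((evalVar) => evalVar);

// Value passed as an extra argument: a copy of it arrives as the parameter.
const withArg = fakeEvaluate((evalVar) => evalVar, evalVar);

console.log(withoutArg); // undefined
console.log(withArg);    // WHUT??
```

Removing the parameter would not help either: only the function's source text crosses over, not its closure, so the outer evalVar is never visible in the page.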
You can pass in multiple variables by passing more arguments to page.evaluate():
await page.evaluate((a, b, c) => { console.log(a, b, c); }, a, b, c);
The arguments must either be serializable as JSON or JSHandles of in-browser objects: https://pptr.dev/#?show=api-pageevaluatepagefunction-args
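A rough way to preview what survives the trip is a JSON round trip. This is only an approximation: Puppeteer special-cases a few values (such as NaN and -0), and the exact rules can differ between protocols, so treat it as a guide rather than Puppeteer's exact behavior.

```javascript
// Approximation of "serializable as JSON": copy a value the way JSON would.
const roundTrip = (value) => JSON.parse(JSON.stringify(value));

const args = {
  plain: { name: 'jack', tags: ['a', 'b'] }, // plain data survives unchanged
  when: new Date(0),                          // becomes an ISO date string
  re: /ab+c/i,                                // becomes an empty object
  set: new Set([1, 2]),                       // becomes an empty object
  fn: () => 42,                               // dropped entirely
};

const received = roundTrip(args);
console.log(Object.keys(received));      // [ 'plain', 'when', 're', 'set' ]
console.log(received.when);              // 1970-01-01T00:00:00.000Z
console.log(received.re, received.set);  // {} {}
```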
I encourage you to stick to the following style, because it is more convenient and readable:
let name = 'jack';
let age = 33;
let location = 'Berlin/Germany';
await page.evaluate(({ name, age, location }) => {
  console.log(name);
  console.log(age);
  console.log(location);
}, { name, age, location });
Single Variable:
You can pass one variable to page.evaluate() using the following syntax:
await page.evaluate(example => { /* ... */ }, example);
Note: You do not need to wrap a single parameter in parentheses; they are only required when the arrow function takes multiple parameters.
Multiple Variables:
You can pass multiple variables to page.evaluate() using the following syntax:
await page.evaluate((example_1, example_2) => { /* ... */ }, example_1, example_2);
Note: Wrapping your variables in an object ({}) is not necessary.
It took me quite a while to figure out that console.log() inside evaluate() does not print to the Node console.
Ref: https://github.com/GoogleChrome/puppeteer/issues/1944
Everything inside the page.evaluate() function runs in the context of the browser page. The script executes in the browser, not in Node.js, so anything you log appears in the browser's console, which you will not see if you are running headless. For the same reason, you can't set a Node breakpoint inside the function.
Hope this helps.
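If you want those browser-side logs in your terminal anyway, you can listen for the page's 'console' event, which Puppeteer emits with a ConsoleMessage exposing type() and text(). The sketch below wraps that in a hypothetical forwardConsole helper and exercises it with an EventEmitter stand-in for the page, so it runs without launching a browser; with a real page you would call forwardConsole(page) right after browser.newPage().

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: relays messages logged inside page.evaluate() (browser side)
// to a Node-side sink, console.log by default.
function forwardConsole(page, sink = console.log) {
  page.on('console', (msg) => sink(`[browser] ${msg.type()}: ${msg.text()}`));
}

// Stand-in for a Puppeteer Page: it only emits 'console' events carrying
// objects with type() and text(), which is all forwardConsole relies on.
const fakePage = new EventEmitter();
const lines = [];
forwardConsole(fakePage, (line) => lines.push(line));
fakePage.emit('console', { type: () => 'log', text: () => 'evalVar: WHUT??' });

console.log(lines); // [ '[browser] log: evalVar: WHUT??' ]
```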
To pass a function, there are two ways to do it.
// 1. Define it in the page's evaluation context
await page.evaluate(() => {
  window.yourFunc = function () { /* ... */ };
});
const links = await page.evaluate(() => {
  const func = window.yourFunc;
  return func();
});
// 2. Convert the function to a serializable string (functions themselves can't be serialized)
const yourFunc = function () { /* ... */ };
const obj = {
  func: yourFunc.toString()
};
const otherObj = {
  foo: 'bar'
};
const links2 = await page.evaluate((obj, aObj) => {
  const funStr = obj.func;
  // The parentheses let this also work when funStr is an arrow function
  const func = new Function(`return (${funStr}).apply(null, arguments)`);
  func();
  const foo = aObj.foo; // 'bar', read from the plain object
  window.foo = foo;
  debugger;
}, obj, otherObj);
While testing, you can add devtools: true to the launch options so that DevTools is open and the debugger statement actually pauses execution.
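The second approach can be checked outside a browser. This sketch sends a function's source through JSON and rebuilds it with new Function, with plain Node standing in for the page (greet and makeGreeter are hypothetical). It also shows the main limitation: only the source text is copied, so a function that relies on variables from its closure fails once rebuilt.

```javascript
// Sender side: a function can't go through JSON, but its source text can.
const greet = function (name) { return `hello ${name}`; };
const payload = JSON.stringify({ func: greet.toString() });

// Receiver side (stands in for the page): rebuild a callable from the source.
// The parentheses make this work for arrow functions as well.
const { func: funcSource } = JSON.parse(payload);
const rebuilt = new Function(`return (${funcSource});`)();
console.log(rebuilt('jack')); // hello jack

// Limitation: the closure is not copied. `prefix` exists only inside makeGreeter,
// so the rebuilt copy can't resolve it.
function makeGreeter(prefix) {
  return (name) => `${prefix} ${name}`;
}
const closureSource = makeGreeter('hi').toString();
const broken = new Function(`return (${closureSource});`)();
try {
  broken('jack');
} catch (err) {
  console.log(err.name); // ReferenceError
}
```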
Here is a TypeScript example that could help someone new to TypeScript.
const hyperlinks: string[] = await page.evaluate((url: string, regex: RegExp, querySelect: string) => {
  // .........
}, url, regex, querySelect);
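One caveat with passing a RegExp: it is not JSON-serializable, so with Puppeteer's default Chrome protocol it is likely to arrive in the page as an empty object. A safer pattern is to send regex.source and regex.flags as strings and rebuild the RegExp inside the page function. The sketch below shows this in plain JavaScript, with Node's vm module standing in for the page; the regex and test URL are hypothetical.

```javascript
const vm = require('vm');

// Hypothetical regex the caller wants to use inside the page.
const regex = /^https:\/\/example\.com\//i;

// A RegExp serializes to an empty object, so send its parts as strings instead.
console.log(JSON.stringify(regex)); // {}
const args = JSON.stringify([regex.source, regex.flags]);

// Page-side function: rebuild the RegExp from its source and flags.
const pageFunction = (source, flags) => {
  const rebuilt = new RegExp(source, flags);
  return rebuilt.test('HTTPS://EXAMPLE.COM/path');
};

// vm runs the function's source text in a fresh context, like a page would.
const result = vm.runInNewContext(`(${pageFunction})(...${args})`);
console.log(result); // true
```

In the TypeScript version this would mean declaring (url: string, regexSource: string, regexFlags: string, querySelect: string) and passing url, regex.source, regex.flags, querySelect.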
A slightly different version of @wolf's answer above, which makes the code much more reusable across different contexts.
// util functions
export const pipe = (...fns) => initialVal => fns.reduce((acc, fn) => fn(acc), initialVal)
export const pluck = key => obj => obj[key] || null
export const map = fn => item => fn(item)
// these functions are sent to the page as strings (see fn.toString() below)
const updatedAt = await page.evaluate(
  ([selector, util]) => {
    let { pipe, map, pluck } = util
    pipe = new Function(`return ${pipe}`)()
    map = new Function(`return ${map}`)()
    pluck = new Function(`return ${pluck}`)()
    return pipe(
      s => document.querySelector(s),
      pluck('textContent'),
      map(text => text.trim()),
      map(date => Date.parse(date)),
      map(timeStamp => Promise.resolve(timeStamp))
    )(selector)
  },
  [
    '#table-announcements tbody td:nth-child(2) .d-none',
    { pipe: pipe.toString(), map: map.toString(), pluck: pluck.toString() },
  ]
)
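Also note that the functions inside pipe can't be written like this:
// incorrect: passing the method alone detaches it from `document`,
// so it runs without the right `this` (TypeError: Illegal invocation)
pipe(document.querySelector)
// should be (or pipe(document.querySelector.bind(document)))
pipe(s => document.querySelector(s))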
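The same effect can be reproduced in plain Node with a hypothetical fakeDocument whose querySelector method depends on this, just like the real DOM method. Passing the bare method loses that binding; wrapping it in an arrow function or using bind() keeps it.

```javascript
// Hypothetical stand-in for `document`: querySelector relies on `this`.
const fakeDocument = {
  nodes: { '#title': 'Hello' },
  querySelector(selector) {
    return this.nodes[selector] ?? null;
  },
};

const pipe = (...fns) => (initialVal) => fns.reduce((acc, fn) => fn(acc), initialVal);

// Passing the method itself detaches it: `this` no longer refers to fakeDocument,
// so reading this.nodes[...] throws a TypeError.
let detachedError = null;
try {
  pipe(fakeDocument.querySelector)('#title');
} catch (err) {
  detachedError = err;
}

// Either wrap it in an arrow function or bind it explicitly.
const viaArrow = pipe((s) => fakeDocument.querySelector(s))('#title');
const viaBind = pipe(fakeDocument.querySelector.bind(fakeDocument))('#title');

console.log(detachedError instanceof TypeError, viaArrow, viaBind); // true Hello Hello
```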
This code logs an empty object ({}):
// declared at top
let mainData = {};
let trainStations = {};
let routes = {};
let trainNo = {};
data["data"].forEach(async (element) => {
  const response2 = await fetch(
    `https://india-rail.herokuapp.com/trains/getRoute?trainNo=${element["train_base"]["train_no"]}`
  );
  const data2 = await response2.json();
  data2["data"].forEach((ele) => {
    routes[ele["source_stn_code"]] = true;
  });
  trainNo[element["train_base"]["train_no"]] = routes;
});
console.log(trainNo);
If I do this instead, it logs the data:
data["data"].forEach(async (element) => {
  const response2 = await fetch(
    `https://india-rail.herokuapp.com/trains/getRoute?trainNo=${element["train_base"]["train_no"]}`
  );
  const data2 = await response2.json();
  data2["data"].forEach((ele) => {
    routes[ele["source_stn_code"]] = true;
  });
  trainNo[element["train_base"]["train_no"]] = routes;
  console.log(trainNo);
});
Maybe there is some scoping issue? Please help me solve this problem :)
As a short note, using await inside a forEach() callback gives unexpected results, because forEach() does not wait for the promise returned by each callback to settle (either fulfilled or rejected).
A simple solution is to use either a traditional for loop or a for...of loop:
for (const element of data["data"]) {
  const response2 = await fetch(
    `https://india-rail.herokuapp.com/trains/getRoute?trainNo=${element["train_base"]["train_no"]}`
  );
  const data2 = await response2.json();
  data2["data"].forEach((ele) => {
    routes[ele["source_stn_code"]] = true;
  });
  trainNo[element["train_base"]["train_no"]] = routes;
}
console.log(trainNo);
NOTE: Make sure to wrap the for...of loop above in an async function, because the await keyword is only allowed inside functions declared with the async keyword (or at the top level of an ES module).
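The difference is easy to observe with a simulated fetch. In this sketch fakeFetchRoute is a hypothetical stand-in for the fetch()/json() calls and the train numbers are made up. It compares forEach, the for...of fix, and Promise.all, which the answer does not mention but which is an option when the requests don't need to run one at a time.

```javascript
// Hypothetical stand-in for fetch() + response.json(): resolves after a short delay.
const fakeFetchRoute = (trainNo) =>
  new Promise((resolve) => setTimeout(() => resolve({ data: [`STN${trainNo}`] }), 10));

const trains = [101, 202, 303];

async function withForEach() {
  const result = {};
  // forEach ignores the promises returned by the async callbacks...
  trains.forEach(async (no) => {
    result[no] = (await fakeFetchRoute(no)).data;
  });
  return result; // ...so this returns before any callback has finished
}

async function withForOf() {
  const result = {};
  for (const no of trains) {
    result[no] = (await fakeFetchRoute(no)).data; // one request at a time
  }
  return result;
}

async function withPromiseAll() {
  // All requests start together; Promise.all resolves once every one has fulfilled.
  const entries = await Promise.all(
    trains.map(async (no) => [no, (await fakeFetchRoute(no)).data]),
  );
  return Object.fromEntries(entries);
}

(async () => {
  console.log(await withForEach());    // {}
  console.log(await withForOf());      // all three trains, fetched one after another
  console.log(await withPromiseAll()); // same result, fetched concurrently
})();
```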
I have written this code in Node.js:
async function getUdemydata() {
  try {
    var finalData = []
    const res = await axios.get(baseUrl)
    const $ = cheerio.load(res.data);
    $('div[class="content"]').each(async (i, el) => {
      const courseTitle = $(el).find("a").text()
      const aa = $(el).html()
      const courseUrl = await getCoupenCode($(el).find("a").attr("href"))
      const courseDescription = $(el).find(".description").text().trim()
      const courseImage = await formateImageUrl($(el).find(".image").html())
      var dataObj = {
        "title": courseTitle,
        "description": courseDescription,
        "image": courseImage,
        "link": courseUrl
      }
      finalData.push(dataObj);
      console.log('appended');
    })
    return (finalData);
  } catch (error) {
    console.error(error);
  }
}
(async () => {
  var rs = await getUdemydata();
  console.log(rs);
})()
When I call getUdemydata(), only an empty array is printed, and only after that are the 'appended' messages from inside the function printed. What should I change in the code so that the function returns the final array?
The definition of .each is as follows:
each<T>(fn: (i: number, el: T) => boolean | void): Cheerio<T>
It is not Promise-aware, so does not await the async function you supply as its parameter.
Remember that an async function is just a normal function that returns a Promise, so you could map these function calls, ending up with an array of Promises, and then wait for them all.
// var finalData = [] // <-- NO
const promises = $('div[class="content"]')
  .toArray()
  .map(async (el, i) => {
    // same body as `each` in your code,
    // except don't push into `finalData`;
    // just return your value
    // finalData.push(dataObj); // <-- NO
    return dataObj;
  });
const finalData = await Promise.all(promises);
Or, if running all of those promises in parallel is too much, the documentation suggests that a Cheerio object is iterable, so you can just loop over $('div[class="content"]') with a normal for...of loop:
for (const el of $('div[class="content"]')) {
  // ...
}
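A side benefit of returning values instead of pushing them, not stated in the answer above but easy to check: Promise.all resolves to results in input order, while push() inside concurrent callbacks records completion order. In this sketch resolveLater is a hypothetical stand-in for getCoupenCode()/formateImageUrl(), with delays chosen so that later items finish first.

```javascript
// Hypothetical async helper: resolves with `value` after `ms` milliseconds.
const resolveLater = (value, ms) => new Promise((r) => setTimeout(() => r(value), ms));

const items = ['a', 'b', 'c'];

async function pushInCallbacks() {
  const finalData = [];
  await Promise.all(
    items.map(async (item, i) => {
      const url = await resolveLater(`${item}-url`, 30 - i * 10); // a: 30ms, c: 10ms
      finalData.push(url); // records completion order, not input order
    }),
  );
  return finalData;
}

async function returnFromCallbacks() {
  const promises = items.map(async (item, i) => {
    const url = await resolveLater(`${item}-url`, 30 - i * 10);
    return url; // Promise.all keeps the input order
  });
  return Promise.all(promises);
}

async function sequential() {
  const finalData = [];
  for (const item of items) {
    finalData.push(await resolveLater(`${item}-url`, 5)); // one at a time
  }
  return finalData;
}

(async () => {
  console.log(await pushInCallbacks());     // [ 'c-url', 'b-url', 'a-url' ]
  console.log(await returnFromCallbacks()); // [ 'a-url', 'b-url', 'c-url' ]
  console.log(await sequential());          // [ 'a-url', 'b-url', 'c-url' ]
})();
```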
I've been trying to use Puppeteer to scrape Twitch.
The idea for this program is to get the icon, username, and thumbnail of every stream in (for example) the 'Just Chatting' category on the first page.
I think my main code is working, but the object I'm trying to return (properties) is being returned as undefined.
I tried adding await in front of my console.log in the function log(), and I also searched on here and read that values returned from the evaluate function have to be JSON-serializable, which I believe includes the strings the object would contain. Any help would be appreciated, thanks!
let properties = { icon: [], user: [], img: [], link: [] };
const puppeteer = require('puppeteer');
let elements = {
  'https://www.twitch.tv/directory/game/Just%20Chatting': [
    'img[class="InjectLayout-sc-588ddc-0.iyfkau.tw-image.tw-image-avatar"]',
    'a[class="ScCoreLink-udwpw5-0.cxXSPs.tw-link"]',
    'img[class="tw-image"]',
  ],
};
async function scrapeStreams() {
  console.log('scrape started');
  try {
    console.log('try started');
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    await page.setDefaultNavigationTimeout(0);
    await page.goto(Object.keys(elements)[0], { waitUntil: 'networkidle2' });
    await page.evaluate(
      (properties, elements) => {
        for ([key, value] of Object.entries(elements)) {
          if ((key = Object.keys(elements)[0])) {
            value.forEach((element) => {
              if ((element = Object.values(elements)[0])) {
                el = document.querySelector(element);
                for (let val in el) {
                  datatype = val.src;
                  Object.values(properties)[0].push(datatype);
                }
              } else if ((element = Object.values(elements)[1])) {
                el = document.querySelector(element);
                for (let val in el) {
                  datatype = val.innerHTML;
                  Object.values(properties)[1].push(datatype);
                }
              } else if ((element = Object.values(elements)[2])) {
                el = document.querySelector(element);
                for (let val in el) {
                  datatype = val.src;
                  Object.values(properties)[2].push(datatype);
                }
              }
            });
          }
        }
        return properties;
      },
      properties,
      elements
    );
  } catch (error) {
    console.log('THIS IS THE ERROR: ' + error);
  }
}
async function log() {
  let properties = await scrapeStreams();
  console.log(properties);
}
log();
Variables inside and outside of the function passed to page.evaluate() are not the same: they are copied when transferred between the Node.js and browser contexts. So although you change properties inside page.evaluate(), the properties outside remains unchanged. And although you return properties from inside page.evaluate(), you do not save the returned value.
You also forgot to return the value from scrapeStreams().
There also seem to be other issues in your code (many nulls are returned), but those are better asked in a separate question.
// ...
    // FIXED:
    properties = await page.evaluate(
      // ...
    // FIXED:
    return properties;
  } catch (error) {
    console.log('THIS IS THE ERROR: ' + error);
  }
}
// ...
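The copy behavior behind both fixes can be seen in a small simulation. fakeEvaluate here is a hypothetical stand-in built on Node's vm module, assuming (as a simplification) that arguments go in as JSON and the return value comes back as JSON; it is not Puppeteer's implementation, and 'some_streamer' is a made-up value.

```javascript
const vm = require('vm');

// Hypothetical stand-in for page.evaluate(), NOT Puppeteer's implementation:
// arguments go in as JSON and the return value comes back as JSON, so both
// sides only ever see copies.
function fakeEvaluate(pageFunction, ...args) {
  const code = `JSON.stringify((${pageFunction})(...${JSON.stringify(args)}))`;
  const json = vm.runInNewContext(code);
  return json === undefined ? undefined : JSON.parse(json);
}

const properties = { icon: [], user: [] };

// The page function mutates and returns its own copy of `properties`.
const returned = fakeEvaluate((props) => {
  props.user.push('some_streamer'); // hypothetical value
  return props;
}, properties);

console.log(properties.user); // [] (the Node-side object is untouched)
console.log(returned.user);   // [ 'some_streamer' ] (the copy you need to keep)
```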
I am getting this error when I try to run the script (which is bundled with webpack):
Error: Evaluation failed: ReferenceError: _babel_runtime_helpers_toConsumableArray__WEBPACK_IMPORTED_MODULE_1___default is not defined at __puppeteer_evaluation_script__:2:27
But when I run the same code without webpack, I get the expected result.
Here is my function:
const getMeenaClickProducts = async (title) => {
  const url = `${MEENACLICK}/${title}`;
  console.log({ url });
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto(url);
  await page.waitForSelector('.ant-pagination-total-text');
  const products = await page.evaluate(() => {
    const cards = [...document.querySelectorAll('.card-thumb')];
    console.log({ cards });
    return cards.map((card) => {
      const productTitle = card.querySelector('.title').innerText;
      const priceElement = card.querySelector('.reg-price');
      const price = priceElement ? priceElement.innerText : '';
      const image = card.querySelector('.img').src;
      const link = card.querySelector('.main-link').href;
      return {
        title: productTitle,
        price,
        image,
        link,
      };
    });
  });
  await browser.close();
  const filteredProducts = products
    .filter((product) =>
      product.title.toLowerCase().includes(title.toLowerCase())
    )
    .filter((item) => item.price);
  return filteredProducts;
};
What could be the reason?
The problem is with Babel, and with this part:
const products = await page.evaluate(() => {
  const cards = [...document.querySelectorAll('.card-thumb')];
  console.log({ cards });
  return cards.map((card) => {
    const productTitle = card.querySelector('.title').innerText;
    const priceElement = card.querySelector('.reg-price');
    const price = priceElement ? priceElement.innerText : '';
    const image = card.querySelector('.img').src;
    const link = card.querySelector('.main-link').href;
    return {
      title: productTitle,
      price,
      image,
      link,
    };
  });
});
The function you pass to page.evaluate() is not the code that actually reaches the page instance, because Babel transforms it first; Puppeteer then sends the source of the transformed function to the page.
The array spread operator in this part:
const cards = [...document.querySelectorAll('.card-thumb')];
is most likely transformed in your build into a call to a helper named _babel_runtime_helpers_toConsumableArray__WEBPACK_IMPORTED_MODULE_1___default. That call is passed to the Puppeteer page context and executed there, but the helper is not defined in that context, which is why you get a ReferenceError.
Some options to fix it:
1. Don't use the spread operator with your current Babel config, so that the transformed build doesn't include a polyfill/replacement for it. Use a replacement with an equivalent effect, such as:
const cards = Array.from(document.querySelectorAll('.card-thumb'));
A traditional for loop or forEach() that builds up the array yourself will also get the job done.
2. Update your Babel config / target language level so that the spread operator is supported natively and left untranspiled.
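The failure mode can be reproduced without Babel or a browser. In this sketch toConsumableArrayHelper is a hypothetical stand-in for the injected helper, and Node's vm module plays the page: a function that calls a bundle-only helper by name throws a ReferenceError once only its source text is evaluated elsewhere, while the Array.from version needs nothing from the bundle.

```javascript
const vm = require('vm');

// Hypothetical stand-in for a bundler-injected helper; it exists in the Node bundle only.
const toConsumableArrayHelper = (iterable) => Array.from(iterable);

// Roughly the shape of a transpiled page function: it calls the helper by name.
const transpiledPageFunction = () => toConsumableArrayHelper(new Set([1, 2]));

// Array.from is built in, so this survives being sent as source text.
const helperFreePageFunction = () => Array.from(new Set([1, 2]));

// Evaluate only the source text in a fresh context, like a page would.
const runInFreshContext = (fn) => vm.runInNewContext(`(${fn})()`);

let helperError = null;
try {
  runInFreshContext(transpiledPageFunction);
} catch (err) {
  helperError = err;
}
console.log(helperError.name);                          // ReferenceError
console.log(runInFreshContext(helperFreePageFunction)); // [ 1, 2 ]
```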
Basically, I'm trying to create a Many-To-One relationship between 'Needs' and 'Pgroup'. However, in my use case, each create request contains all the data required to create one Pgroup together with all of its Needs, in one shot.
This is the code I have worked on so far
(A little background)
async new(data: PgroupEntity) {
  // const pgp = await this.pgrouprepository.create(
  // await this.pgrouprepository.save(pgp);
  const pp = await this.pgrouprepository.findOne({ where: { id: 'c682620d-9717-4d3c-bef9-20a31d743a99' } });
  // This is where the code starts
  for (let item in data.needs) {
    const need = await this.needrepository.create({ ...data.needs[item], pgroup: pp });
    await this.needrepository.save(need);
    return need;
  }
}
For some reason, this for loop doesn't work: it only iterates once. This code works:
const need = await this.needrepository.create({...data.needs[2], pgroup: pp});
await this.needrepository.save(need);
But I'm unable to save more than one need at a time.
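Try this. The original loop only runs once because return need; exits the whole method on the first iteration, so return after the loop instead. Awaiting save() inside the loop also makes sure every need is stored before the method returns:
async new(data: PgroupEntity) {
  // const pgp = await this.pgrouprepository.create(data);
  // await this.pgrouprepository.save(pgp);
  const pp = await this.pgrouprepository.findOne({ where: { id: 'bad6eb03-655b-4e29-8d70-7fd63a7fe7d7' } });
  for (const item of data.needs) {
    const need = this.needrepository.create({ ...item, pgroup: pp });
    await this.needrepository.save(need);
  }
  return data;
}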
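To see why only one need was saved, here is a sketch with a hypothetical in-memory repository standing in for TypeORM; it models only create() being synchronous and save() being async, nothing else about TypeORM. The loops use for...of, which yields the array items themselves, whereas the question's for...in yields the indices as strings, which is why it had to index data.needs[item]. TypeORM's save() also accepts an array of entities, which is another way to store them all at once.

```javascript
// Hypothetical in-memory stand-in for a repository: create() is synchronous,
// save() is async and records every entity it receives.
function makeFakeRepository() {
  const rows = [];
  return {
    rows,
    create: (fields) => ({ ...fields }),
    save: async (entity) => {
      rows.push(entity);
      return entity;
    },
  };
}

const needs = [{ name: 'water' }, { name: 'food' }, { name: 'shelter' }];
const pgroup = { id: 'hypothetical-id' };

// Original shape: `return` inside the loop leaves the function after the first item.
async function saveWithEarlyReturn(repo) {
  for (const item of needs) {
    const need = repo.create({ ...item, pgroup });
    await repo.save(need);
    return need;
  }
}

// Fixed shape: save every item, then return all of them after the loop.
async function saveAll(repo) {
  const saved = [];
  for (const item of needs) {
    saved.push(await repo.save(repo.create({ ...item, pgroup })));
  }
  return saved;
}

(async () => {
  const first = makeFakeRepository();
  await saveWithEarlyReturn(first);
  console.log(first.rows.length); // 1

  const second = makeFakeRepository();
  await saveAll(second);
  console.log(second.rows.length); // 3
})();
```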