I have a Node project using Puppeteer.
When I run the following code in the browser console, I get 170 results back:
circlenod = window.document.querySelectorAll('area[templateid="ucChart_pnlip"]');
for (var i = 0; i < circlenod.length; i++) {
console.log('circlenod --> : ', circlenod[i]);
}
However, when I try to use page.evaluate or the $$ method in Puppeteer, I get no results back:
let test;
let list = await page.$$('area[templateid="ucChart_pnlip"]');
console.log('list ===> ', list); // ===> this is empty
await page.evaluate(() => {
  console.log('coming in ??'); // ==> never see this
  test = Array.from(document.querySelectorAll('area[templateid="ucChart_pnlip"]'));
  for (var i = 0; i < test.length; i++) {
    console.log('circlenod --> : ', test[i]); // ==> never see this
  }
});
console.log('test', test); // ==> undefined
This is an example of the element. How can I loop through all the <area /> elements and extract the information in their fields attribute, e.g. fields="date|5/1/2020|14"?
<area shape="poly" coords="802,235,807,233,807,241,802,243" fields="date|5/1/2020|14" templateid="ucChart_pnlip" href="../../../../../../UserControls/History/Single/#" onclick="charttip_Show(this, event);return false" alt="">
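I found a solution for this, and I'm posting it here for whoever needs it. The callback passed to page.evaluate runs inside the browser page, not in Node, so its console.log output goes to the page's console, and assigning to test inside it does not change the test variable in the Node script. Instead, return the data from the callback. DOM elements can't be returned to Node as-is, so map each element to the attribute you want and return that: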
let test = await page.evaluate((sele) => {
  const elements = Array.from(document.querySelectorAll(sele));
  const links = elements.map((element) => {
    return element.getAttribute('fields');
  });
  return links;
}, 'area[templateid="ucChart_pnlip"]');
console.log(test);
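Puppeteer also provides page.$$eval(selector, pageFunction), which runs pageFunction inside the page with an array of all matching elements and resolves to its serializable return value. A minimal sketch combining that with a split of each fields value on "|" is below; the function names getChartFields and parseFields, and the label/date/value names for the three parts, are assumptions, since the question doesn't say what each part means.

```javascript
// Hypothetical helper: split one fields value such as "date|5/1/2020|14"
// into its three pipe-separated parts. The part names are assumptions.
function parseFields(fields) {
  const [label, date, value] = fields.split('|');
  return { label, date, value: Number(value) };
}

// Collect the fields attribute of every matching <area>. page.$$eval runs
// the callback inside the page with all matching elements and resolves to
// its (serializable) return value: here, an array of strings.
async function getChartFields(page) {
  const raw = await page.$$eval('area[templateid="ucChart_pnlip"]', (areas) =>
    areas.map((area) => area.getAttribute('fields'))
  );
  // Skip any <area> without a fields attribute (getAttribute returns null).
  return raw.filter(Boolean).map(parseFields);
}

// Usage with an existing Puppeteer page, after navigation:
// const rows = await getChartFields(page);
// For the sample element above, its entry would be
// { label: 'date', date: '5/1/2020', value: 14 }.
```

Returning plain strings from the page and parsing them in Node keeps the in-page callback small and avoids trying to send DOM elements back across the boundary.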
The function outputs correctly in online code editors, but I can't replicate the output in my browser. What's the correct way to output it to the browser page? I have tried numerous methods. Here is the function I want to output:
function countdown(i) {
console.log(i);
if (i <= 1) { // base case
return;
} else { // recursive case
countdown(i - 1);
}
}
countdown(5); // This is the initial call to the function.
Here is my most recent attempt at outputting it in my web browser:
function countDown(i) {
document.getElementById("recursiveFuncAttempt").innerHTML = i;
if (i <= 1) {
return;
} else {
cat = countDown(i - 1);
return document.getElementById("recursiveFuncAttempt").innerHTML = cat;
}
}
countDown(5);
<div>
countdown attempt
<button onclick="countDown()">click me</button>
<p id="recursiveFuncAttempt"></p>
</div>
Grouping your code and the comments together...
Your original code was correct, but instead of logging to the console you should add the value to the text content of a page element.
Logging the values to the console line by line gives an appearance of time passing, which updating the text content of a DOM element wouldn't give you. All you would see is the last number in the sequence, because the whole recursion runs synchronously and the browser doesn't repaint the page until it has finished.
Therefore a timeout is needed to schedule each subsequent call n milliseconds later, giving the browser a chance to repaint in between.
You can simplify the code a little by eliminating the else part of the condition.
// Cache the element
const div = document.querySelector('div');
// Add a default value to count if a value
// is not passed into the function
function countdown(count = 5) {
// If count is zero just return
if (count < 1) return;
// Otherwise update the text content
// of the cached element
div.textContent = count;
// Wait one second (1000ms), and call the function
// with a decremented count
setTimeout(countdown, 1000, --count);
}
countdown();
div { font-size: 5em; color: blue; font-weight: 700;}
<div></div>
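The same timeout-based approach can take the output step as a parameter, which makes it easy to append each number (so the whole sequence stays visible, as it does in the console) or to run it without a page. This is a sketch; countdownWith and its render, delayMs and onDone parameters are hypothetical names, not part of the answer above.

```javascript
// Timeout-based countdown with a pluggable output step.
// render(n) is called once per number; onDone() runs after the last one.
function countdownWith(count, render, delayMs = 1000, onDone = () => {}) {
  if (count < 1) {
    onDone();
    return;
  }
  render(count);
  // setTimeout forwards the extra arguments to countdownWith.
  setTimeout(countdownWith, delayMs, count - 1, render, delayMs, onDone);
}

// In a browser, appending keeps every number on screen, like console.log:
// const p = document.getElementById('recursiveFuncAttempt');
// countdownWith(5, (n) => { p.textContent += n + ' '; });
```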
Are you ready to refill your bottles?
Yes - I’m ready
Would you like to talk to the specialist?
No, I m all set!
const dynamic_question = props.oResponse.filter((item: any) => item.dynamic_question_status === true);
const CommonTrimming = (props: { value: string }) => {
let temp = props.value
.replace(/{*/g, '')
.replace(/}*/g, '')
.replaceAll(/'*/g, '')
.replace(/&amp;/g, '&')
.replaceAll('\\n', '<br/>')
.replaceAll(',', '<br/>');
console.warn("executing or not ? ", temp);
return <div dangerouslySetInnerHTML={{ __html: temp }}></div>;
};
I have used this to do common trimming.
In the second example, the single quote is missing.
It should appear in the downloaded PDF.
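One step in the chain above is .replaceAll(/'*/g, ''), which deletes every straight single quote ('), so an apostrophe in I'm cannot survive it; a typographic apostrophe (’) is not matched, which would likely explain why only one of the examples loses its quote. A minimal sketch of the same trimming as a plain string function, assuming the brace removal and the "\n"/comma conversions are still wanted, that the & replacement was meant to decode &amp;, and that only the quote-stripping step is dropped (the name trimForHtml is hypothetical):

```javascript
// Hypothetical plain-function version of CommonTrimming's string logic.
// Strips curly braces, decodes &amp;, and turns literal "\n" sequences and
// commas into <br/>, but leaves single quotes untouched.
function trimForHtml(value) {
  return value
    .replace(/[{}]/g, '')
    .replace(/&amp;/g, '&')
    .replaceAll('\\n', '<br/>')
    .replaceAll(',', '<br/>');
}

// Inside the component: const temp = trimForHtml(props.value);
```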
I am trying to write a bot to convert a bunch of HTML pages to markdown, in order to import them as Jekyll documents. For this, I use puppeteer to get the HTML document, and cheerio to manipulate it.
The source HTML is pretty complex, and polluted with Google Ads tags, external scripts, etc. What I need to do is get the HTML content of a predefined selector, and then remove the elements that match a predefined set of selectors from it, in order to get plain HTML with just the text and convert it to markdown.
Assume the source html is something like this:
<html>
<head />
<body>
<article class="post">
<h1>Title</h1>
<p>First paragraph.</p>
<script>That for some reason has been put here</script>
<p>Second paragraph.</p>
<ins>Google ADS</ins>
<p>Third paragraph.</p>
<div class="related">A block full of HTML and text</div>
<p>Fourth paragraph.</p>
</article>
</body>
</html>
What I want to achieve is something like
<h1>Title</h1>
<p>First paragraph.</p>
<p>Second paragraph.</p>
<p>Third paragraph.</p>
<p>Fourth paragraph.</p>
I defined an array of selectors that I want to strip from the source object:
stripFromText: ['.social-share', 'script', '.adv-in', '.postinfo', '.postauthor', '.widget', '.related', 'img', 'p:empty', 'div:empty', 'section:empty', 'ins'],
And wrote the following function:
const getHTMLContent = async ($, selector) => {
let value;
try {
let content = await $(selector);
for (const s of SELECTORS.stripFromText) {
// 1
content = await content.remove(s);
// 2
// await content.remove(s);
// 3
// content = await content.find(s).remove();
// 4
// await content.find(s).remove();
// 5
// const matches = await content.find(s);
// for (m of matches) {
// await m.remove();
// }
};
value = content.html();
} catch(e) {
console.log(`- [!] Unable to get ${selector}`);
}
console.log(value);
return value;
};
Where:
$ is the cheerio object, created with const $ = await cheerio.load(html);
selector is the DOM selector for the container (in the example above it would be .post)
What I am unable to do is use cheerio to remove() the objects. I tried all five versions I left commented in the code, but without success. Cheerio's documentation hasn't helped so far, and I found this link, but the proposed solution did not work for me.
I was wondering if someone more experienced with cheerio could point me in the right direction, or explain to me what I am missing here.
I found a classic newbie error in my code: I was missing an await before the .remove() call.
The working function now looks like this:
const getHTMLContent = async ($, selector) => {
let value;
try {
let content = await $(selector);
for (const s of SELECTORS.stripFromText) {
console.log(`--- Stripping ${s}`);
await content.find(s).remove();
};
value = await content.html();
} catch(e) {
console.log(`- [!] Unable to get ${selector}`);
}
return value;
};
You can also remove all the unwanted elements in one call by passing a comma-separated selector to $() and calling remove():
$('script,ins,div').remove()
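In cheerio, .remove(selector) only filters the elements already selected (here, the container itself), which is why content.remove(s) leaves the descendants in place; .find(s) is what selects the descendants, and .remove() then detaches them. Cheerio's traversal and manipulation methods are synchronous, so the await keywords are harmless but not required. A sketch that combines the whole stripFromText list into one selector, assuming $ comes from cheerio.load(html) (the function name stripHtml is hypothetical):

```javascript
// Remove every descendant of the container that matches any of the
// stripSelectors, then return the container's inner HTML.
// Cheerio calls are synchronous, so no await is needed.
function stripHtml($, containerSelector, stripSelectors) {
  const content = $(containerSelector);
  // 'script, ins, .related' matches any element that matches one of them.
  content.find(stripSelectors.join(', ')).remove();
  return content.html();
}

// Usage:
// const $ = cheerio.load(html);
// const value = stripHtml($, '.post', SELECTORS.stripFromText);
```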
I'm trying to click on a div that has a role="button" attribute.
Even though I'm not using that attribute, trying to get the div by its DOM path is not working either.
What I've tried:
try {
let Button = await this.page.$("div.class1.class2 > div");
await Button.click();
console.log("Clicked");
} catch (e) {
console.log(e);
console.log("No button");
}
The error I get is:
TypeError: Cannot read property '$' of undefined
I tried to get to the div through the div that contains it, which does have two classes I can rely on, but it doesn't seem to work.
Is there a way to get an array of all the divs with role="button" and click only on the one that has a span inside it with a specific text?
The error means this.page is undefined where this code runs. Remove the this keyword and use the page variable directly to fix the TypeError:
let Button = await page.$("div.class1.class2 > div");
To get an array of all the divs with role="button" and the text "Specific Text":
const buttons = await page.$x('//div[@role="button"][text()="Specific Text"]'); // returns: Promise<Array<ElementHandle>>
I would also recommend using the waitForXPath method to wait for the element to appear.
Full example:
try {
const button = await page.waitForXPath('//div[@role="button"][text()="Specific Text"]');
await button.click();
console.log("Clicked");
} catch (e) {
console.log("No button", e);
}
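For the other part of the question (clicking only the role="button" div that contains a span with specific text), a CSS-selector-based sketch is below: it collects the candidates with page.$$, checks each one's spans with ElementHandle.evaluate, and clicks the first match. The function name clickButtonWithSpanText and the trimmed exact-text comparison are assumptions.

```javascript
// Click the first div[role="button"] that contains a <span> whose trimmed
// text equals `text`. Resolves to true if a button was clicked.
async function clickButtonWithSpanText(page, text) {
  const buttons = await page.$$('div[role="button"]');
  for (const button of buttons) {
    // Runs inside the page; `wanted` is passed in as a serializable argument.
    const matches = await button.evaluate(
      (el, wanted) =>
        Array.from(el.querySelectorAll('span')).some(
          (span) => span.textContent.trim() === wanted
        ),
      text
    );
    if (matches) {
      await button.click();
      return true;
    }
  }
  return false;
}

// Usage: const clicked = await clickButtonWithSpanText(page, 'Specific Text');
```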
I've got an issue where I'm using template.render to render an array of items based on an HTML template. Each item in the array also contains another array, which I want to bind to another template inside the parent element for the area. I know I can use a grid layout for groups, but I'm trying to accomplish this another way, so please, no suggestions to use a different control; I'm just curious as to why the following doesn't work correctly.
//html templates
<div id="area-template" data-win-control="WinJS.Binding.Template">
<h1 class="area-title" data-win-bind="innerHTML:title"></h1>
<div class="items">
</div>
</div>
<div id="item-template" data-win-control="WinJS.Binding.Template">
<h2 class="item-title" data-win-bind="innerHTML:title"></h2>
</div>
// JS in ready event
var renderer = document.getElementsByTagName('section')[0];
var area_template = document.getElementById('area-template').winControl;
var item_template = document.getElementById('item-template').winControl;
for (var i = 0; i < areas.length; i++) {
var area = areas.getAt(i);
area_template.render(area, renderer).done(function (el) {
var item_renderer = el.querySelector('.items');
for (var j = 0; j < area.items.length; j++) {
var item = area.items[j];
item_template.render(item, item_renderer).done(function (item_el) {
});
}
});
}
So what should happen is that after it renders the area, the newly created element (el) gets passed to the "done" function, and I then find its ".items" div to append the items to. However, this appends all the items to the first div created. If it were the last div, it might make more sense due to closures, but the fact that it happens on the first one is really throwing me off!
What's interesting is that if I replace my template render function with document.createElement and el.appendChild, it does display correctly, e.g. (in the done of the area render):
area_template.render(area, renderer).done(function (el) {
    for (var j = 0; j < area.items.length; j++) {
        var item = area.items[j];
        var h2 = document.createElement('h2');
        h2.innerText = item.title;
        el.appendChild(h2);
    }
});
Although I've realised this appends to el itself, not to the actual .items div inside el.
I'm not quite sure what could be going on here. It appears the value of el is getting updated correctly, but el.querySelector is either always returning the wrong ".items" div or the result is being retained somewhere; however, debugging does show that el changes during the loop. Any insight would be greatly appreciated.
Thanks
I've worked out what is going on here. The "el" returned in the render promise is not the newly created element, as I thought. It's the renderer container, which holds the newly created HTML together with everything rendered into it before. Therefore el.querySelector('.items') always brings back the first '.items' it finds. I must have misread the docs, but hopefully someone else will find this information useful in case they make the same mistake.
I guess one way around this would be to use item_renderer = el.querySelectorAll('.items')[i], picking the '.items' div that matches the position in the loop,
e.g.:
for (var i = 0; i < areas.length; i++) {
var area = areas.getAt(i);
area_template.render(area, renderer).done(function (el) {
var item_renderer = el.querySelectorAll('.items')[i];
for (var j = 0; j < area.items.length; j++) {
var item = area.items[j];
var h2 = document.createElement('h2');
h2.innerText = item.title;
item_renderer.appendChild(h2);
}
});
}
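Another way around it, sketched here under the assumption that renderer, area_template, item_template and areas are the same objects as in the question and that areas supports forEach (a WinJS.Binding.List or a plain array), is to give each area its own container, so the el passed to done() only holds that area's markup. Iterating with forEach also gives each callback its own area; with var in a for loop, a done() callback that runs after the loop has finished would see the last values of i and area. The function name renderAreas is hypothetical.

```javascript
// Render each area into its own wrapper div inside `renderer`, then render
// that area's items into the .items div of that wrapper only.
function renderAreas(areas, renderer, areaTemplate, itemTemplate) {
  areas.forEach(function (area) {
    // A fresh container per area, so el.querySelector('.items') can only
    // find this area's .items div.
    var areaContainer = document.createElement('div');
    renderer.appendChild(areaContainer);
    areaTemplate.render(area, areaContainer).done(function (el) {
      var itemRenderer = el.querySelector('.items');
      area.items.forEach(function (item) {
        itemTemplate.render(item, itemRenderer);
      });
    });
  });
}

// In the ready event:
// renderAreas(areas, renderer, area_template, item_template);
```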