node js selenium-webdriver save page as

node js selenium-webdriver save page as - node.js

I am using the node.js package called selenium-webdriver and Firefox v52.9.0 on a Raspberry Pi (Raspbian Stretch).
At a certain point I would like to execute the equivalent of the Firefox GUI "Save Page As" function.
I found reference to something like this on this page:
# Write the output to output.txt
with open('output.txt', 'w') as file:
file.write(str(browser.page_source))
The issue here is this guide is using python instead of node. I am not sure how to implement the equivalent code.
I have located this in the documentation, but the docs don't include any example code and I am not sure how to implement the .write function. Can anyone explain (or point me to a resource for understanding) how to achieve "Save File" functionality in selenium-webdriver?
It is also worth mentioning that I need the file that is saved to include modifications made to the DOM by some AJAX and Javascript - that it won't be acceptable to simply save the original source of the html document, but it needs to be a representation of the current state of the page.
Here is some code for context:
const {Builder, By, Key, until} = require('selenium-webdriver');
(async function example() {
let driver = await new Builder().forBrowser('firefox').build();
try {
await driver.get('http://localhost/mypage.html');
await driver.sleep(10000);
/*SOMEHOW SAVE THE PAGE TO A FILE */
} finally {
await driver.quit();
}
})();

Just call driver.getPageSource()
try {
await driver.get('https://google.com');
await driver.sleep(1000);
const source = await driver.getPageSource();
fs.writeFileSync('source.html', source);
} finally {
await driver.quit();
}

Related

waitForSelector suddenly no longer working in puppeteer

I have a working puppeteer script that I'd like to make into an API but I'm having problems with waitForSelector.
Background:
I wrote a puppeteer script that successfully searches for and scrapes the result of a query I specify in the code e.g. let address = xyz;. Now I'd like to make it into an API so that a user can query something. I managed to code everything necessary for the local API (working with express) and everything works as well. By that I mean: I coded all the server side stuff: I can make a request, the scraper function is called, puppeteer starts up, carries out my search (I need to type in an address, choose from a dropdown and press enter).
The status:
The result of my query is a form (basically 3 columns and some rows) in an iFrame and I want to scrape all the rows (I modify them into a specific json later on). The way it works is I use waitForSelector on the form's selector and then I use frame.evaluate.
Problem:
When I run my normal scraper everything works well, but when I run the (slightly modified but essentially same) code within the API framework, waitForSelector suddenly always times out. I have tried all the usual workarounds: waitForNavigation, taking a screenshot and inspecting etc but nothing helped. I've been reading quite a bit and could it be that I'm screwing something up in terms of async/await when I call my scraper from within the context of the API? I'm still quite new to this so please bear with me. This is the code of the working script - I indicated the important part
const puppeteer = require("puppeteer");
const chalk = require("chalk");
const fs = require('fs');
const error = chalk.bold.red;
const success = chalk.keyword("green");
address = 'Gumpendorfer Straße 12, 1060 Wien';
(async () => {
try {
// open the headless browser
var browser = await puppeteer.launch();
// open a new page
var page = await browser.newPage();
// enter url in page
await page.goto(`https://mein.wien.gv.at/Meine-Amtswege/richtwert?subpage=/lagezuschlag/`, {waitUntil: 'networkidle2'});
// continue without newsletter
await page.click('#dss-modal-firstvisit-form > button.btn.btn-block.btn-light');
// let everyhting load
await page.waitFor(1000)
console.log('waiting for iframe with form to be ready.');
//wait until selector is available
await page.waitForSelector('iframe');
console.log('iframe is ready. Loading iframe content');
//choose the relevant iframe
const elementHandle = await page.$(
'iframe[src="/richtwertfrontend/lagezuschlag/"]',
);
//go into frame in order to input info
const frame = await elementHandle.contentFrame();
//enter address
console.log('filling form in iframe');
await frame.type('#input_adresse', address, { delay: 100});
//choose first option from dropdown
console.log('Choosing from dropdown');
await frame.click('#react-autowhatever-1--item-0');
console.log('pressing button');
//press button to search
await frame.click('#next-button');
// scraping data
console.log('scraping')
await frame.waitForSelector('#summary > div > div > br ~ div');//This keeps failing in the API
const res = await frame.evaluate(() => {
const rows = [...document.querySelectorAll('#summary > div > div > br ~ div')];
const cells = rows.map(
row => [...row.querySelectorAll('div')]
.map(cell => cell.innerText)
);
return cells;
});
await browser.close();
console.log(success("Browser Closed"));
const mapFields = (arr1, arr2) => {
const mappedArray = arr2.map((el) => {
const mappedArrayEl = {};
el.forEach((value, i) => {
if (arr1.length < (i+1)) return;
mappedArrayEl[arr1[i]] = value;
});
return mappedArrayEl;
});
return mappedArray;
}
const Arr1 = res[0];
const Arr2 = res.slice(1,3);
let dataObj = {};
dataObj[address] = [];
// dataObj['lagezuschlag'] = mapFields(Arr1, Arr2);
// dataObj['adresse'] = address;
dataObj[address] = mapFields(Arr1, Arr2);
console.log(dataObj);
} catch (err) {
// Catch and display errors
console.log(error(err));
await browser.close();
console.log(error("Browser Closed"));
}
})();
I just can't understand why it would work in the one case and not in the other, even though I barely changed something. For the API I basically changed the name of the async function to const search = async (address) => { such that I can call it with the query in my server side script.
Thanks in advance - I'm not attaching the API code cause I don't want to clutter the question. I can update it if it's necessary

I solved this myself. Turns out the problem wasn't as complicated as I thought and it was annoyingly simple to solve. The problem wasn't with the selector that was timing out but with the previous selectors, specifically the typing and choosing from dropdown selectors. Essentially, things were going too fast. Before the search query was typed in, the dropdown was already pressed and nonsense came out. How I solved it: I included a waitFor(1000) call before the dropdown is selected and everything went perfectly. An interesting realisation was that even though that one selector timed out, it wasn't actually the source of the problem. But like I said, annoyingly simple and I feel dumb for asking this :) but maybe someone will see this and learn from my mistake

Javascript trigger action for form fields created using PDFTron WebViewer FormBuilder UI

I am currently evaluating WebViewer version 5.2.8.
I need to set some javascript function/code as an action for triggers like calculate trigger, format trigger and keystroke trigger through the WebViewer UI.
Please help me on how to configure javascript code for a form field trigger in WebViewer UI.
Thanks in advance,
Syed

Sorry for the late response!
You will have to create the UI components yourself that will take in the JavaScript code. You can do something similar to what the FormBuilder demo does with just HTML and JavaScript. However, it may be better to clone the open source UI and add your own components.
As for setting the action, I would recommend trying out version 6.0 instead as there is better support for widgets and form fields in that version. However, we are investigating a bug with the field actions that will throw an error on downloading the document. You should be able to use this code to get it working first:
docViewer.on('annotationsLoaded', () => {
const annotations = annotManager.getAnnotationsList();
annotations.forEach(annot => {
const action = new instance.Actions.JavaScript({ javascript: 'alert("Hello World!")' });
// C cor Calculate, and F for Format
annot.addAction('K', action);
});
});
Once the bug has been dealt with, you should be able to download the document properly.
Otherwise, you will have to use the full API and that may be less than ideal. It would be a bit more complicated with the full API and I would not recommend it if the above feature will be fixed soon.
Let me know if this helps or if you need more information about using the full API to accomplish this!
EDIT
Here is the code to do it with the full API! Since the full API works at a low level and very closely to the PDF specification, it does take a lot more to make it work. You do still have to update the annotations with the code I provided before which I will include again.
docViewer.on('documentLoaded', async () => {
// This part requires the full API: https://www.pdftron.com/documentation/web/guides/full-api/setup/
const doc = docViewer.getDocument();
// Get document from worker
const pdfDoc = await doc.getPDFDoc();
const pageItr = await pdfDoc.getPageIterator();
while (await pageItr.hasNext()) {
const page = await pageItr.current();
// Note: this is a PDF array, not a JS array
const annots = await page.getAnnots();
const numAnnots = await page.getNumAnnots();
for (let i = 0; i < numAnnots; i++) {
const annot = await annots.getAt(i);
const subtypeDict = await annot.findObj('Subtype');
const subtype = await subtypeDict.getName();
const actions = await annot.findObj('AA');
// Check to make sure the annot is of type Widget
if (subtype === 'Widget') {
// Create the additional actions dictionary if it does not exist
if (!actions) {
actions = await annot.putDict('AA');
}
let calculate = await actions.findObj('C');
// Create the calculate action (C) if it does not exist
if (!calculate) {
calculate = await actions.putDict('C');
await Promise.all([calculate.putName('S', 'JavaScript'), calculate.putString('JS', 'app.alert("Hello World!")')]);
}
// Repeat for keystroke (K) and format (F)
}
}
pageItr.next();
}
});
docViewer.on('annotationsLoaded', () => {
const annotations = annotManager.getAnnotationsList();
annotations.forEach(annot => {
const action = new instance.Actions.JavaScript({ javascript: 'app.alert("Hello World!")' });
// K for Keystroke, and F for Format
annot.addAction('C', action);
});
});
You can probably put them together under the documentLoaded event but once the fix is ready, you can delete the part using the full API.

Generating PDF of a Web Page

I'm trying to generate a pdf file of a web page and want to save to local disk to email later.
I had tried this approach but the problem here is, its not working for pages like this. I'm able to generate the pdf but its not matching with web page content.
Its very clear that pdf is generated before document ready or might be something else. I'm unable to figure out the exact issue. I'm just looking for an approach where I can save web page output as pdf.
I hope generating pdf of a web page is more suitable in node then php? If any solution in php is available then it will be a big help or even node implementation is also fine.

Its very clear that pdf is generated before document ready
Very true, so it is necessary to wait until after scripts are loaded and executed.
You linked to an answer that uses phantom node module.
The module was upgraded since then and now supports async/await functions that make script much much more readable.
If I may suggest a solution that uses the async/await version (version 4.x, requires node 8+).
const phantom = require('phantom');
const timeout = ms => new Promise(resolve => setTimeout(resolve, ms));
(async function() {
const instance = await phantom.create();
const page = await instance.createPage();
await page.property('viewportSize', { width: 1920, height: 1024 });
const status = await page.open('http://www.chartjs.org/samples/latest/charts/pie.html');
// If a page has no set background color, it will have gray bg in PhantomJS
// so we'll set white background ourselves
await page.evaluate(function(){
document.querySelector('body').style.background = '#fff';
});
// Let's benchmark
console.time('wait');
// Wait until the script creates the canvas with the charts
while (0 == await page.evaluate(function(){ return document.querySelectorAll("canvas").length }) ) {
await timeout(250);
}
// Make sure animation of the chart has played
await timeout(500);
console.timeEnd('wait');
await page.render('screen.pdf');
await instance.exit();
})();
On my dev machine it takes 600ms to wait for the chart to be ready. Much better than to await timeout(3000) or any other arbitrary number of seconds.

I did something similiar using html-pdf package.
The code is simple, you can use like this:
pdf.create(html, options).toFile('./YourPDFName.pdf', function(err, res) {
if (err) {
console.log(err);
}
});
See more about it in the package page here.
Hope it help you.

Download files asynchrounsly and parse them synchronously with Node JS

I have a gulp task that downloads a few JSON files from GitHub, then prompts the user for values to replace in those files. For example, I have an .ftpconfig that gets download, and then the user is asked to enter hostname, username, password, and path.
Because the file first needs to be downloaded before it can be configured, and each file needs to be configured sequentially, I'm using quite a few nested callbacks. I'd like to change this "callback hell" system so that it utilizes async/await and/or promises instead, but I'm having a lot of difficulty understanding exactly why my code isn't working; it seems that promises fire their .then() functions asynchronously, which doesn't make sense to me.
My goals are as follows:
Download all config files asynchronously
Wait for all config files to finish downloading
Read existing settings from the config files
Prompt the user for changed settings in each config file synchronously
I've tried a number of approaches, none of which worked. I discarded the code I've used, but here's a rough recreation of the things I've tried:
Attempt #1:
return new Promise((resolve) => {
// download files...
}).then((resolve) => {
// configure first file...
}).then((resolve) => {
// configure second file...
}).then((resolve) => {
// configure thrid file...
});
Attempt #2:
const CONFIG_FILES = async () => {
const bs_download = await generate_config("browsersync");
const ftp_download = await generate_config("ftp");
const rsync_download = await generate_config("rsync");
return new Promise(() => {
configure_json("browsersync");
}).then(() => {
configure_json("ftp");
}).then(() => {
configure_json("rsync");
});
};
I'm sure I'm doing something very obviously wrong, but I'm not adapt enough at JavaScript to see the problem. Any help would be great appreciated.
My gulp task can be found here:
gulpfile.js
gulp-tasks/config.js

Thanks to #EricB, I was able to figure out what I was doing wrong. It was mostly a matter of making my functions return promises as well.
https://github.com/JacobDB/new-site/blob/d119b8b3c22aa7855791ab6b0ff3c2e33988b4b2/gulp-tasks/config.js

Refresh page while element is not presented with cucumber-nightwatch (selenium web driver)

I need to refresh page while element is not presented
i'm trying something like this, but it doesn't help
When(/^"([^"]*)" task status changed$/, taskName => {
let needRefresh = true;
do {
client.url(`${client.globals.env.url}${client.globals.env.index}/messaging/messages`)
.pause(10000)
.getTagName(`//div[contains(#class, "task-checkbox")]//*[contains(text(), "${taskName}")]`, res => {
client.equal(res.value, 'div')
}).pause(20000);
} while (!needRefresh)
});
how to do it correctly?

To be able to use complex asynchronous operations I suggest to use the new async functions. If your NodeJs version does not support it natively I suggest to use Babel. There is an example for that in the nightwatch-cucumber example folder. To be able to refresh the page until some condition you can use the following example.
When(/^"([^"]*)" task status changed$/, async (taskName) => {
let needRefresh = true;
do {
await client.refresh();
await client.pause(10000);
needRefresh = await client.getTagName(...
} while (!needRefresh)
});

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

node js selenium-webdriver save page as - node.js

Just call driver.getPageSource() try { await driver.get('https://google.com'); await driver.sleep(1000); const source = await driver.getPageSource(); fs.writeFileSync('source.html', source); } finally { await driver.quit(); }

Related

waitForSelector suddenly no longer working in puppeteer

Javascript trigger action for form fields created using PDFTron WebViewer FormBuilder UI

Generating PDF of a Web Page

Download files asynchrounsly and parse them synchronously with Node JS

Refresh page while element is not presented with cucumber-nightwatch (selenium web driver)

Categories

Resources