I've written a Node script that uses Puppeteer to fetch names and the links to their profiles from a webpage. The script fetches them correctly.
What I want to do now is write the data to a CSV file, but I can't figure out how. I have come across many tutorials on the subject, but most of them are either incomplete or rely on libraries that are no longer maintained.
This is what I've written so far:
const puppeteer = require('puppeteer');
const link = "https://www.ak-brandenburg.de/bauherren/architekten_architektinnen";
(async ()=> {
const browser = await puppeteer.launch()
const [page] = await browser.pages()
await page.goto(link)
const listItem = await page.evaluate(() =>
[...document.querySelectorAll('.views-table tr')].map(item => ({
name: item.querySelector('.views-field-title a').innerText.trim(),
profilelink: "https://www.ak-brandenburg.de" + item.querySelector('.views-field-title a').getAttribute("href"),
}))
);
console.log(listItem);
await browser.close();
})();
How can I write the data to a CSV file?
There is a far easier way to achieve this. With the json2csv library, you can write the data to a CSV file very easily.
Working script:
const fs = require('fs');
const Json2csv = require('json2csv').Parser;
const puppeteer = require('puppeteer');
const link = "https://www.ak-brandenburg.de/bauherren/architekten_architektinnen";
(async ()=> {
const browser = await puppeteer.launch()
const [page] = await browser.pages()
await page.goto(link)
const listItem = await page.evaluate(() =>
[...document.querySelectorAll('.views-table tbody tr')].map(item => ({
name: item.querySelector('.views-field-title a').innerText.trim(),
profilelink: "https://www.ak-brandenburg.de" + item.querySelector('.views-field-title a').getAttribute("href"),
}))
);
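// The Parser constructor expects an options object; the column list goes under `fields`.
const j2csv = new Json2csv({ fields: ['name', 'profilelink'] });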
const csv = j2csv.parse(listItem);
fs.writeFileSync('./output.csv',csv,'utf-8')
await browser.close();
})();
I haven't worked with Puppeteer, but I have created CSV files in my Node project.
Store your data in an array, e.g. csvData.
Then use fs.writeFile to save your CSV data.
fs.writeFile(`path/to/csv/${csvName}.csv`, csvData, 'utf8', function(err) {
if (err) {
console.log('error', err)
}
res.send({
url: `path/to/csv/${csvName}.csv`
})
})
Only use res.send if you want to send the CSV file from the server to the client.
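If you would rather not add a dependency, the array-to-CSV step can also be sketched by hand. This is only a minimal sketch, assuming every row is a flat object: `escapeCsvField` and `toCsv` are hypothetical helper names, the sample rows are made up, and the quoting follows the common CSV convention of wrapping fields that contain commas, quotes, or line breaks and doubling embedded quotes.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Quote a field only when it contains a comma, a double quote, or a line break.
function escapeCsvField(value) {
  const text = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Turn an array of flat objects into CSV text with a header row.
function toCsv(rows, fields) {
  const header = fields.map(escapeCsvField).join(',');
  const lines = rows.map(row => fields.map(f => escapeCsvField(row[f])).join(','));
  return [header, ...lines].join('\n');
}

// Hypothetical sample data standing in for the scraped list.
const sampleRows = [
  { name: 'Doe, Jane', profilelink: 'https://example.com/jane' },
  { name: 'John "JD" Doe', profilelink: 'https://example.com/john' },
];

const csvText = toCsv(sampleRows, ['name', 'profilelink']);
const outFile = path.join(os.tmpdir(), 'output-sketch.csv');
fs.writeFileSync(outFile, csvText, 'utf8');
```

In the scraping script, `sampleRows` would simply be replaced by `listItem`.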
It's that time again when I'm clueless and come humbly to ask for help!
I am trying to download 4,500 images at once, averaging 1 MB each. All the image files get created and the downloads start, but after about 2 GB (roughly half) has been downloaded, some images are complete, some are partial, and some are empty. Task Manager confirms that the download stops suddenly.
What could possibly be the issue? No matter how long I wait, nothing happens. If I at least got an error, I could try something else...
Please advise if possible, thank you!
//get all json files from a folder
const fs = require("fs");
const path = require("path");
const axios = require("axios");
let urlsArray = [];
const collection = "rebels";
const folder = collection + "_json";
const getFiles = (folder) => {
const directoryPath = path.join(__dirname, folder);
return fs.readdirSync(directoryPath);
};
const files = getFiles(folder);
//inside the folder there are json files with metadata
//for each json file parse it and get the image url
files.forEach((file) => {
const filePath = path.join(__dirname, folder, file);
const fileContent = fs.readFileSync(filePath, "utf8");
const parsedJson = JSON.parse(fileContent);
const imageFromMetadata = parsedJson.image;
const url = imageFromMetadata.replace("ipfs://", "https://ipfs.io/ipfs/");
let nr = file.replace(".json", "");
urlsArray.push({ url, nr });
});
//foreach url create a promise to download with axios
const downloadImage = (url, nr) => {
const writer = fs.createWriteStream(
process.cwd() + `/${collection}_images2/${nr}.png`
);
return axios({
url,
method: "GET",
responseType: "stream",
}).then((response) => {
return new Promise((resolve, reject) => {
response.data.pipe(writer);
writer.on("finish", resolve);
writer.on("error", reject);
});
});
};
const promiseAll = async () => {
const promises = urlsArray.map((data) => {
console.log(`trying to download image nr ${data.nr} from ${data.url}`);
return downloadImage(data.url, data.nr);
});
await Promise.allSettled(promises);
};
promiseAll();
//download all
Since Promise.allSettled() never rejects, nothing in your code will report on any rejected promises that it sees. So, I'd suggest you iterate its results and see if you have any rejected promises there.
You can do that like this:
const results = await Promise.allSettled(promises);
console.log(`results.length = ${results.length}`);
for (const r of results) {
if (r.status === "rejected") {
console.log(r.reason);
}
}
console.log("all done");
This confirms that execution got past Promise.allSettled(promises), shows whether the results array is non-empty, and logs the reason for every rejected promise.
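The same check can be wrapped in a small helper so that failures are reported together with the index of the item that failed. This is only a sketch: `summarizeSettled` is a hypothetical name, and the three demo promises are stand-ins for the real download promises.

```javascript
// Split Promise.allSettled() results into successes and failures,
// remembering each entry's position in the original array.
function summarizeSettled(results) {
  const fulfilled = [];
  const rejected = [];
  results.forEach((result, index) => {
    if (result.status === 'fulfilled') {
      fulfilled.push({ index, value: result.value });
    } else {
      rejected.push({ index, reason: result.reason });
    }
  });
  return { fulfilled, rejected };
}

// Hypothetical stand-ins for the download promises.
const demoPromises = [
  Promise.resolve('1.png'),
  Promise.reject(new Error('socket hang up')),
  Promise.resolve('3.png'),
];

const demoSummary = Promise.allSettled(demoPromises).then(summarizeSettled);

demoSummary.then(({ fulfilled, rejected }) => {
  console.log(`${fulfilled.length} ok, ${rejected.length} failed`);
  for (const { index, reason } of rejected) {
    console.log(`item ${index} failed: ${reason.message}`);
  }
});
```

Because the index lines up with urlsArray, the failed entries could then be retried on their own.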
I'm using TypeScript + Node.js + the pdfkit library to create PDFs and verify that they're consistent.
However, when just creating the most basic PDF, consistency already fails. Here's my test.
import {readFileSync, createWriteStream} from "fs";
const PDFDocument = require('pdfkit');
const assert = require('assert').strict;
const fileName = '/tmp/tmp.pdf'
async function makeSimplePDF() {
return new Promise(resolve => {
const stream = createWriteStream(fileName);
const doc = new PDFDocument();
doc.pipe(stream);
doc.end();
stream.on('finish', resolve);
})
}
describe('test that pdfs are consistent', () => {
it('simple pdf test.', async () => {
await makeSimplePDF();
const data: Buffer = readFileSync(fileName);
await makeSimplePDF(); // make PDF again
const data2: Buffer = readFileSync(fileName);
assert.deepStrictEqual(data, data2); // fails!
});
});
Most of the values in the two Buffers are identical but a few of them are not. What's happening here?
I believe the bytes differ slightly because the creation time is somehow factored into the Buffer. When I used mockdate (https://www.npmjs.com/package/mockdate) to fix 'now', I ended up getting consistent Buffers.
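The effect of pinning 'now' can be illustrated without any library. The sketch below is not how mockdate is implemented; it assumes the code under test reads the clock through the global Date, and `withFrozenDate` is a hypothetical helper name.

```javascript
// Run fn while `new Date()` and `Date.now()` report a fixed instant,
// then restore the real Date even if fn throws.
async function withFrozenDate(isoString, fn) {
  const RealDate = Date;
  const frozenMs = new RealDate(isoString).getTime();

  class FrozenDate extends RealDate {
    constructor(...args) {
      // No arguments means "now", so substitute the frozen instant.
      if (args.length === 0) {
        super(frozenMs);
      } else {
        super(...args);
      }
    }
    static now() {
      return frozenMs;
    }
  }

  globalThis.Date = FrozenDate;
  try {
    return await fn();
  } finally {
    globalThis.Date = RealDate;
  }
}

// Two runs under the same frozen clock observe the same timestamp.
const frozenRuns = (async () => {
  const first = await withFrozenDate('2020-01-01T00:00:00Z', () => new Date().toISOString());
  const second = await withFrozenDate('2020-01-01T00:00:00Z', () => new Date().toISOString());
  return [first, second];
})();
```

In the test from the question, both makeSimplePDF() calls would run inside such a frozen clock before the two Buffers are compared.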
I have a page at this link (https://master.d3tei1upkyr9mb.amplifyapp.com/report) with 3 export buttons.
These export buttons generate XLSX, CSV, PDF on the frontend, and hence there are no URLs for XLSX, CSV, PDF.
I need puppeteer to be able to download or get or intercept the blobs or buffers of these files in my node backend.
I have tried different ways to achieve this but still haven't figured it out.
It was possible with the Playwright library using the code below, but I need to be able to do it with Puppeteer.
const {chromium} = require('playwright');
const fs = require('fs');
(async () => {
const browser = await chromium.launch();
const context = await browser.newContext({acceptDownloads: true});
const page = await context.newPage();
await page.goto('http://localhost:3000/');
const [ download ] = await Promise.all([
page.waitForEvent('download'), // <-- start waiting for the download
page.click('button#expoXLSX') // <-- perform the action that directly or indirectly initiates it.
]);
const path = await download.path();
console.log(path);
const newFile = await fs.readFileSync(path);
console.log(newFile);
fs.writeFile("test.xlsx", newFile, "binary",function(err) {
if(err) {
console.log(err);
} else {
console.log("The file was saved!");
}
});
await browser.close()
})();
Is there any way?
Is there any reason not to simulate the click on the frontend and let Puppeteer download the file to a location of your choice? You can easily download the file this way with the code below.
Edit: You can determine when the file download completes by listening to the Page.downloadProgress event and checking for the completed state. Getting the actual filename saved to disk isn't 100% guaranteed with this method, but you can get what is termed the suggestedFilename from the Page.downloadWillBegin event, which in my tests so far (at least on the example page in the question) matches the filename persisted to disk.
const puppeteer = require('puppeteer');
const path = require('path');
const downloadPath = path.resolve('./download');
(async ()=> {
let fileName;
const browser = await puppeteer.launch({
headless: false
});
const page = await browser.newPage();
await page.goto(
'https://master.d3tei1upkyr9mb.amplifyapp.com/report',
{ waitUntil: 'networkidle2' }
);
await page._client.send('Page.setDownloadBehavior', {
behavior: 'allow',
downloadPath: downloadPath
});
await page._client.on('Page.downloadWillBegin', ({ url, suggestedFilename }) => {
console.log('download beginning,', url, suggestedFilename);
fileName = suggestedFilename;
});
await page._client.on('Page.downloadProgress', ({ state }) => {
if (state === 'completed') {
console.log('download completed. File location: ', downloadPath + '/' + fileName);
}
});
await page.click('button#expoPDF');
})();
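The event handling above can also be wrapped in a promise, so that the caller can await the finished download. This is only a sketch under stated assumptions: `waitForDownload` is a hypothetical helper, and `client` is assumed to be a DevTools protocol session exposing send/on/off. With Puppeteer, such a session can be obtained through the public `page.target().createCDPSession()` call instead of the private `page._client`. The `FakeClient` class exists purely to exercise the helper without a browser.

```javascript
const path = require('path');
const { EventEmitter } = require('events');

// Resolve with the expected file path once the session reports a completed download.
async function waitForDownload(client, downloadDir, trigger) {
  await client.send('Page.setDownloadBehavior', {
    behavior: 'allow',
    downloadPath: downloadDir,
  });

  const finished = new Promise((resolve, reject) => {
    let fileName;
    const onBegin = ({ suggestedFilename }) => {
      fileName = suggestedFilename;
    };
    const onProgress = ({ state }) => {
      if (state === 'inProgress') return; // keep waiting
      client.off('Page.downloadWillBegin', onBegin);
      client.off('Page.downloadProgress', onProgress);
      if (state === 'completed') {
        // The suggested name is not guaranteed to match the name on disk.
        resolve(path.join(downloadDir, fileName || ''));
      } else {
        reject(new Error(`download ${state}`));
      }
    };
    client.on('Page.downloadWillBegin', onBegin);
    client.on('Page.downloadProgress', onProgress);
  });

  await trigger(); // e.g. () => page.click('button#expoPDF')
  return finished;
}

// Fake session used only to demonstrate the flow without launching a browser.
class FakeClient extends EventEmitter {
  async send() {}
}

const fakeClient = new FakeClient();
const demoDownload = waitForDownload(fakeClient, '/tmp/download', async () => {
  fakeClient.emit('Page.downloadWillBegin', { suggestedFilename: 'report.pdf' });
  fakeClient.emit('Page.downloadProgress', { state: 'completed' });
});
```

Once the promise resolves, the file can be read from the returned path with fs.readFileSync to get its buffer in the Node backend.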
I am trying to download an invoice from a website using Puppeteer, which I have just started learning. I am using Node to create and run the code. I have managed to log in and navigate to the invoice page, but the invoice opens in a new tab, so my code does not detect it because it is not the active tab. This is the code I used:
const puppeteer = require('puppeteer')
const SECRET_EMAIL = 'emailid'
const SECRET_PASSWORD = 'password'
const main = async () => {
const browser = await puppeteer.launch({
headless: false,
})
const page = await browser.newPage()
await page.goto('https://my.apify.com/sign-in', { waitUntil: 'networkidle2' })
await page.waitForSelector('div.sign_shared__SignForm-sc-1jf30gt-2.kFKpB')
await page.type('input#email', SECRET_EMAIL)
await page.type('input#password', SECRET_PASSWORD)
await page.click('input[type="submit"]')
await page.waitForSelector('#logged-user')
await page.goto('https://my.apify.com/billing#/invoices', { waitUntil: 'networkidle2' })
await page.waitForSelector('#reactive-table-1')
await page.click('#reactive-table-1 > tbody > tr:nth-child(1) > td.number > a')
const newPagePromise = new Promise(x => browser.once('targetcreated', target => x(target.page())))
const page2 = await newPagePromise
await page2.bringToFront()
await page2.screenshot({ path: 'apify1.png' })
//await browser.close()
}
main()
In the above code I am just trying to take screenshot. Can anyone help me?
Here is an example of a workaround for the Chromium issue mentioned in the comments above. Adapt it to fit your specific needs and use case. Basically, you need to capture the new page (target) and then do whatever is needed to download the file. If nothing else works for you (including a direct request to the download location via fetch or, ideally, some request library on the back end), you can pass the file to Node as a buffer, as in the example below.
const [PDF_page] = await Promise.all([
browser
.waitForTarget(target => target.url().includes('my.apify.com/account/invoices/'))
.then(target => target.page()),
ATT_page.click('#reactive-table-1 > tbody > tr:nth-child(1) > td.number > a'),
]);
const asyncRes = PDF_page.waitForResponse(response =>
response
.request()
.url()
.includes('my.apify.com/account/invoices'));
await PDF_page.reload();
const res = await asyncRes;
const url = res.url();
const headers = res.headers();
if (!headers['content-type'].includes('application/pdf')) {
await PDF_page.close();
return null;
}
const options = {
// target request options
};
const pdfAb = await PDF_page.evaluate(
async (url, options) => {
function bufferToBase64(buffer) {
return btoa(
new Uint8Array(buffer).reduce((data, byte) => {
return data + String.fromCharCode(byte);
}, ''),
);
}
return await fetch(url, options)
.then(response => response.arrayBuffer())
.then(arrayBuffer => bufferToBase64(arrayBuffer));
},
url,
options,
);
const pdf = Buffer.from(pdfAb, 'base64');
await PDF_page.close();
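The base64 detour exists because values returned from page.evaluate have to be serializable, so the binary data is carried across as a string and decoded on the Node side. The round trip can be sketched on its own; the byte values and the output file name below are made up for illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical payload: the ASCII bytes of "%PDF-" followed by two raw bytes.
const originalBytes = Uint8Array.from([0x25, 0x50, 0x44, 0x46, 0x2d, 0x00, 0xff]);

// Bytes -> base64 string (what the page would hand back) ...
const asBase64 = Buffer.from(originalBytes).toString('base64');

// ... -> Buffer (what Node keeps), with every byte preserved.
const restored = Buffer.from(asBase64, 'base64');

// The decoded buffer can be written to disk like any other binary data.
const invoiceFile = path.join(os.tmpdir(), 'invoice-sketch.pdf');
fs.writeFileSync(invoiceFile, restored);
```

The `pdf` buffer from the example above could be saved in the same way.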
I have spent a lot of time experimenting and searching Google for a solution to this issue without any success, and I am hoping that someone will be able to provide some guidance. Here is my situation: I am trying to run JavaScript Mocha selenium-webdriver tests on my company's application using Docker containers. Everything seems to work except that I am unable to upload files. Before trying to run my tests in a Docker environment, I ran them on our local servers and could upload files with no issue using a method like this:
const companyImage = process.cwd()+ '/img/backgroundmario.jpg';
const companyImageElem = await driver.findElement(By.xpath("//div/input[@type='file']"));
await companyImageElem.sendKeys(companyImage);
However, I have not been able to have any success when using docker containers. I mounted my img folder to my selenium/node-chrome-debug container which includes a VNC viewer and I can see that the images are present (and I can manually upload the images via the VNC viewer). However, despite numerous variations of providing paths to the images I can not seem to get my images to upload. For some reason the working directory seems to be from my test container and not my node-chrome-debug container but even if I add the images to the test container and change the path to my test container directory with the images they do not upload either.
Here is a snippet of the code I am using for my test (it includes some things I wouldn't normally include, specifically the checks for process.cwd() and process.env.PWD, since I just wanted to see what the path was):
const {
Builder,
By,
Key,
until,
webdriver,
action
} = require('selenium-webdriver');
const mocha = require('mocha');
const chai = require("chai");
const chaiAsPromised = require("chai-as-promised");
const {
makeUtilityBelt
} = require('./util')
chai.use(chaiAsPromised);
const fs = require('fs');
const expect = require('chai').expect;
const ciPassword = require('./envData').ciPassword;
const campManagerMail = 'jdrzymala+companycreator@influential.co';
const campManagerName = 'companycreator';
const campManagerUsername = 'companycreator';
const legacy = "http://node-web-client";
const companyImage = '/opt/test/images/backgroundmario.jpg';
var currentDir = process.cwd();
var appFolder = process.env.PWD;
const {
createLegacyAdmin,
createLegacyResellerCompany,
createLegacyBrandCompany,
createLegacyAgencyCompany,
createLegacyCampManager,
createLegacyClient,
createLegacyInfluencer
} = require('./legacyCreationQueries');
const {
getEmailId,
getUserEmailId,
getRandom,
verifyRandom,
accountSetup
} = require('./sqlutil');
describe('Creates a Company of Each Type via the Legacy Dashboard', function () {
this.timeout(1200000);
this.slow(20000);
let driver;
let util;
before(async function () {
driver = new Builder().forBrowser('chrome').usingServer('http://selenium_hub:4444/wd/hub').build();
util = makeUtilityBelt(driver);
await createLegacyCampManager(campManagerName, campManagerUsername, campManagerMail);
});
afterEach(function () {
let testCaseName = this.currentTest.title;
let testCaseStatus = this.currentTest.state;
if (testCaseStatus === 'failed') {
driver.takeScreenshot().then((data) => {
let screenshotPath = `./results/${testCaseName}.png`;
console.log(`Saving Screenshot as: ${screenshotPath}`);
fs.writeFileSync(screenshotPath, data, 'base64');
});
}
});
after(function () {
driver.quit();
});
describe('Load Legacy Corporate Site and Login to Legacy Dashboard', function () {
it('Loads into the Legacy Dashboard Successfully', async function () {
await driver.get(legacy);
await driver.wait(until.elementLocated(By.xpath("//p[contains(text(),'Sign In')]"), 10000));
await driver.sleep(3000);
const emailElem = await driver.findElement(By.xpath("//input[@id='email']"));
await util.sendKeys(emailElem, campManagerMail);
const pwElem = await driver.findElement(By.xpath("//input[@id='password']"));
await util.sendKeys(pwElem, ciPassword);
await driver.findElement(By.xpath("//button[contains(text(),'Sign In')]")).click();
await driver.wait(until.elementLocated(By.xpath("//div/ul[contains(@class, 'campaign-search-list')]"), 10000));
await driver.wait(until.elementLocated(By.xpath("//ul[@class='menu']/li/a/span[contains(text(),'User Management')]"), 10000));
await driver.sleep(5000);
await console.log("Below is the Current Working Directory");
await console.log(currentDir);
await driver.sleep(3000);
await console.log(appFolder);
await driver.sleep(3000);
await console.log("The above is the app folder");
await driver.sleep(2000);
const loginSuccessElem = await driver.findElement(By.xpath("//ul[@class='menu']/li/a/span[contains(text(),'User Management')]"));
let loginSuccess = await loginSuccessElem.isDisplayed();
await driver.sleep(3000);
await expect(loginSuccess, 'Legacy Login Failed').to.be.true;
});
});
describe('Creates a Reseller Company', function(){
const companyName = 'Reseller Test Company';
it('Navigates to Company Management and Begins Company Creation Process', async function(){
await driver.wait(until.elementLocated(By.xpath("//ul[@class='menu']/li/a/span[contains(text(),'Company Management')]"), 10000));
await driver.findElement(By.xpath("//ul[@class='menu']/li/a/span[contains(text(),'Company Management')]")).click();
await driver.sleep(8000);
await driver.wait(until.elementLocated(By.xpath("//h3[contains(text(),'Search Companies')]"), 10000));
await driver.wait(until.elementLocated(By.xpath("//a[contains(text(),'+ Create Company')]"), 10000));
await driver.sleep(8000);
await driver.findElement(By.xpath("//a[contains(text(),'+ Create Company')]")).click();
await driver.wait(until.elementLocated(By.xpath("//h3[contains(text(),'Create Company')]"), 10000));
const companyCreationPageElem = await driver.findElement(By.xpath("//h3[contains(text(),'Create Company')]"));
let companyCreationPage = await companyCreationPageElem.isDisplayed();
await expect(companyCreationPage, 'Company Creation Page failed to Load').to.be.true;
});
it('Fills in the required fields and creates New Reseller Company', async function(){
const companyDescription = 'This is a test description for a random test company blah blah blah';
const companyAddress = '777 Lucky Lane';
const companyCity = 'Las Vegas';
const companyState = 'Nevada';
const companyZip = '89104';
const companyNameElem = await driver.findElement(By.xpath("//input[@label='Company Name']"));
await util.sendKeys(companyNameElem, companyName);
await driver.sleep(1000);
const companyDescriptionElem = await driver.findElement(By.xpath("//textarea[@label='Company Description']"));
await util.sendKeys(companyDescriptionElem, companyDescription);
await driver.sleep(1000);
const companyTypeElem = await driver.findElement(By.xpath("//select"));
await companyTypeElem.click();
await driver.wait(until.elementLocated(By.xpath("//select/option"), 10000));
const companyTypeSelectElem = await driver.findElement(By.xpath("//select/option[@value='1']"));
await companyTypeSelectElem.click();
await driver.sleep(1000);
const addressElem = await driver.findElement(By.xpath("//input[@label='Address']"));
await util.sendKeys(addressElem, companyAddress);
await driver.sleep(1000);
const cityElem = await driver.findElement(By.xpath("//input[@label='City']"));
await util.sendKeys(cityElem, companyCity);
await driver.sleep(1000);
const stateElem = await driver.findElement(By.xpath("//input[@label='State']"));
await util.sendKeys(stateElem, companyState);
await driver.sleep(1000);
const zipElem = await driver.findElement(By.xpath("//input[@label='Zip Code']"));
await util.sendKeys(zipElem, companyZip);
await driver.sleep(1000);
await driver.findElement(By.xpath("//input[@type='file']")).sendKeys(companyImage);
await driver.sleep(1000);
await driver.wait(until.elementLocated(By.xpath("//img[@class='image-preview']"), 10000));
await driver.sleep(1000);
const submitButtonElem = await driver.findElement(By.xpath("//button[contains(text(),'Submit')]"));
await submitButtonElem.click();
await driver.wait(until.elementLocated(By.xpath("//h3[contains(text(),'Company Actions')]"), 10000));
await driver.wait(until.elementLocated(By.xpath("//p[@class='company-name']"), 10000));
const companySuccessElem = await driver.findElement(By.xpath("//p[@class='company-name'][contains(text(),'"+companyName+"')]"));
let companySuccess = await companySuccessElem.isDisplayed();
await expect(companySuccess, 'Failed to Create New Company').to.be.true;
});
});
});
This is the last thing stopping me from integrating my large number of test files with our CI/CD process but a huge number of my tests involve uploading files so it is a major issue. I am extremely thankful for any guidance anyone could provide me. Thank you in advance!
Despite not receiving any guidance here, with a bit of additional research, the assistance of a coworker and some experimentation I was able to solve my problem.
There are several aspects that are important. First of all, you must make sure that the images you want to upload are mounted to the container with your browser (in my case, selenium/node-chrome-debug). Then you must make some additions to your Selenium test file.
You must add the following lines:
var path = require('path');
var remote = require('selenium-webdriver/remote');
You can use var or let although I've been told that let is a better standard practice.
Then, after the line
driver = new Builder().forBrowser('chrome').usingServer('http://selenium_hub:4444/wd/hub').build();
Add this line of code
driver.setFileDetector(new remote.FileDetector());
For the file you wish to upload, you must set the path to its location on the file system of the browser container (selenium/node-chrome-debug in my case). So your file variable would be something like:
const companyImage = process.cwd()+'/images/backgroundmario.jpg';
Then, when you want to upload the file, you find the respective element using whichever form of identification you like and add a little extra to the sendKeys function in comparison to how you would do it were you simply running the script on your local file system rather than a docker container. So the code would look like this:
await driver.findElement(By.xpath("//input[@type='file']")).sendKeys(path.resolve(__dirname, companyImage));
Maybe there is a slightly cleaner way to code it (for example, I generally declare the elements I am interacting with as variables) but the example I have provided will work. It took me a lot of time and effort to find this solution so I hope this eventually saves someone else the amount of pain I experienced trying to get this to work.
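One detail about the path.resolve call is worth spelling out: when its last argument is already an absolute path, the earlier arguments are ignored, so __dirname only matters if companyImage is relative. A minimal sketch, using path.posix so the result is the same on any host; the directory and file names are hypothetical:

```javascript
const path = require('path');

const testDir = '/opt/test'; // hypothetical stand-in for __dirname
const relativeImage = 'images/backgroundmario.jpg';
const absoluteImage = '/images/backgroundmario.jpg';

// A relative path is resolved against the directory that precedes it.
const fromRelative = path.posix.resolve(testDir, relativeImage);

// An absolute path wins outright; testDir is ignored.
const fromAbsolute = path.posix.resolve(testDir, absoluteImage);

console.log(fromRelative); // /opt/test/images/backgroundmario.jpg
console.log(fromAbsolute); // /images/backgroundmario.jpg
```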