I'm trying to fill a form inside an existing PDF using TypeScript. So far I've had no success with pdf-lib:
import axios from "axios";
import { PDFDocument } from "pdf-lib";

const formPdfBytes = await axios.get(formUrl, {
  responseType: "arraybuffer",
});
const pdfDoc = await PDFDocument.load(formPdfBytes.data);
const form = pdfDoc.getForm();
const nameField = form.getTextField("CharacterName 2");
nameField.setText("Mario");
const pdfBytes = await pdfDoc.save();
No errors are showing up, but when I open the PDF it is still empty.
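For reference, the snippet never writes pdfBytes anywhere, so reopening the original file will of course show it unchanged. A minimal sketch of persisting the result, assuming this runs in Node and filled.pdf is just an example name:

import * as fs from "fs";

// pdfDoc.save() resolves to a Uint8Array; write it out as a new file.
fs.writeFileSync("./filled.pdf", pdfBytes);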
I saw PDFescape mentioned in another post and gave it a shot so I could edit the form field names. Ran my program, and the form fields were indeed filled, but the rest of the content from my template file was missing.
I tried the Fill Form example file and pdf-lib couldn't find the form field names, so I loaded that one in PDFescape too and saved from there, with the same result: the rest of the template missing but the form fields filled.
I thought maybe PDFescape could be the issue, so I purchased the Adobe trial to be able to edit the form field names. But there again, pdf-lib doesn't find them (Error: PDFDocument has no form field with the name "prodCode"), even though they were definitely saved as such.
Where the heck am I going wrong?! Code for your reference:
const { PDFDocument } = require('pdf-lib');
const fs = require('fs');

(async () => {
  // Read the PDF decoded as UTF-8, then re-encode the string back to bytes
  const pdfUTF8 = fs.readFileSync('./test.pdf', 'utf8');
  const formPdfBytes = new TextEncoder().encode(pdfUTF8);
  // Load a PDF with form fields
  const pdfDoc = await PDFDocument.load(formPdfBytes);
  // Get the form containing all the fields
  const form = pdfDoc.getForm();
  // Get all fields in the PDF by their names
  const productCodeField = form.getTextField('prodCode');
  const certNumberField = form.getTextField('certNumber');
  productCodeField.setText('Product code here');
  certNumberField.setText('Cert number here');
  // Serialize the PDFDocument to bytes (a Uint8Array)
  const pdfBytes = await pdfDoc.save();
  fs.writeFileSync('./done.pdf', Buffer.from(pdfBytes));
})().catch(e => {
  console.log(e);
});
I read the documentation for over a dozen packages and implemented about five. The only one I could get to work with my requirements, being in Node.js and able to load a remote PDF file from my MongoDB and fill it, was pdfform.js.
const pdfform = require('pdfform.js');
const Mail = require('./tools/sendgrid/sendgrid');
const mongoose = require('mongoose');

mongoose.connect(process.env.MONGODB_URI, { useNewUrlParser: true, useUnifiedTopology: true });
const db = mongoose.connection;
db.on('error', (error) => console.error(error));
db.once('open', () => console.log('Connected to Database'));

db.collection('warrantycerts').findOne({ _id: 4 }, function (err, doc) {
  if (err) {
    console.error(err);
    return;
  }
  const pdf_buf = doc.bin.buffer;
  // List the form fields pdfform.js can see, to verify the names
  const pdfFieldsJson = pdfform().list_fields(pdf_buf);
  console.log(pdfFieldsJson);
  const fields = {
    'prodCode': ['prod code here'],
    'certNumber': ['cert here'],
    'model': ['model here'],
    'serial': ['serial here'],
    'date': ['11-10-21'],
  };
  // Fill the fields and get back the finished PDF as a buffer
  const out_buf = pdfform().transform(pdf_buf, fields);
  Mail.emailAttach(me, 'testing', 'testing', 'cert_test.pdf', Buffer.from(out_buf).toString('base64'));
});
FYI, an issue regarding pdf-lib was submitted but hasn't gained much traction yet.
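For what it's worth, one likely culprit in the code above is reading the PDF with a 'utf8' encoding and re-encoding it: round-tripping binary data through a JavaScript string is lossy, so the bytes pdf-lib receives may no longer contain an intact form dictionary. A minimal sketch of loading the raw bytes and listing the field names pdf-lib actually sees (getFields() and getName() are part of its API):

const { PDFDocument } = require('pdf-lib');
const fs = require('fs');

(async () => {
  // No encoding argument: readFileSync returns the untouched bytes as a Buffer.
  const formPdfBytes = fs.readFileSync('./test.pdf');
  const pdfDoc = await PDFDocument.load(formPdfBytes);
  const form = pdfDoc.getForm();
  // Print every field name the document really contains.
  console.log(form.getFields().map((f) => f.getName()));
})();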
I'm using Puppeteer to retrieve data online and am facing an issue.
Two functions have the same name and return a serialized object; the first one returns an empty object, but the second one contains the data I'm targeting.
My question is: how can I select the second occurrence of the function instead of the first one, which returns an empty object?
Thanks.
My code:
const puppeteer = require('puppeteer');
const cheerio = require('cheerio');
const Variants = require('./variants.js');
const Feedback = require('./feedback.js');

async function Scraper(productId, feedbackLimit) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  /** Scrape page for details */
  await page.goto(`${productId}`);
  // Trying to pull the data out of the inline script
  const matched = (await page.content()).match(/window.runParams = {"result/);
  const data = matched.items;

  await page.close();
  await browser.close();

  console.log(data);
  return data;
}

module.exports = Scraper;
Website source code:
window.runParams = {};
window.runParams = {"resultCount":19449,"seoFeaturedSnippet":};
Please try this; it should work:
const data = await page.content();
const regexp = /window.runParams/g;
const matches = data.matchAll(regexp);

for (const match of matches) {
  console.log(match);
  console.log(match.index);
}
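To pick out the second assignment specifically, a capture group around the object literal plus taking the last match works. A minimal sketch, assuming both window.runParams assignments sit in the page source as shown above and the second literal parses as JSON:

const content = await page.content();
// Capture everything between `window.runParams = ` and the closing `;`.
// The `s` flag lets `.` span newlines; `.*?` keeps each match as short as possible.
const assignments = [...content.matchAll(/window\.runParams = (\{.*?\});/gs)];

if (assignments.length > 0) {
  // The last assignment is the one that actually carries the data.
  const raw = assignments[assignments.length - 1][1];
  const data = JSON.parse(raw);
  console.log(data.resultCount);
}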
Is it possible to split a PDF file into as many files as it has pages and save those files in a folder using Node.js?
Using pdf-lib this should be fairly simple. This code should help you get started; it still needs some error handling, of course:
const fs = require('fs');
const PDFDocument = require('pdf-lib').PDFDocument;

async function splitPdf(pathToPdf) {
  const documentAsBytes = await fs.promises.readFile(pathToPdf);

  // Load your PDFDocument
  const pdfDoc = await PDFDocument.load(documentAsBytes);

  const numberOfPages = pdfDoc.getPages().length;

  for (let i = 0; i < numberOfPages; i++) {
    // Create a new "sub" document
    const subDocument = await PDFDocument.create();
    // Copy the page at the current index
    const [copiedPage] = await subDocument.copyPages(pdfDoc, [i]);
    subDocument.addPage(copiedPage);
    const pdfBytes = await subDocument.save();
    await writePdfBytesToFile(`file-${i + 1}.pdf`, pdfBytes);
  }
}

function writePdfBytesToFile(fileName, pdfBytes) {
  return fs.promises.writeFile(fileName, pdfBytes);
}

(async () => {
  await splitPdf("./path-to-your-file.pdf");
})();
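The question also asks for the pages to land in a folder; a small variation of the helper covers that (outputDir is a hypothetical extra parameter, using only Node's built-in fs and path):

const path = require('path');

async function writePdfBytesToFolder(outputDir, fileName, pdfBytes) {
  // Create the target folder first if it doesn't exist yet.
  await fs.promises.mkdir(outputDir, { recursive: true });
  return fs.promises.writeFile(path.join(outputDir, fileName), pdfBytes);
}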
I'm following a web scraping course that uses Cheerio. I'm practicing on a different website than the one used in the course, and now I run into the problem that all my scraped text ends up in one big object, while every title should end up in its own object. Can someone see what I did wrong? I've already bumped my head against this problem for two hours.
const request = require('request-promise');
const cheerio = require('cheerio');
const url = "https://huurgoed.nl/gehele-aanbod";
const scrapeResults = [];

async function scrapeHuurgoed() {
  try {
    const htmlResult = await request.get(url);
    const $ = await cheerio.load(htmlResult);
    $("div.aanbod").each((index, element) => {
      const result = $(element).children(".item");
      const title = result.find("h2").text().trim();
      const characteristics = result.find("h4").text();
      const scrapeResult = { title, characteristics };
      scrapeResults.push(scrapeResult);
    });
    console.log(scrapeResults);
  } catch (err) {
    console.error(err);
  }
}

scrapeHuurgoed();
This is the link to the repo: https://github.com/danielkroon/huurgoed-scraper/blob/master/index.js
Thanks!
That is because of the way you used the selectors: div.aanbod is the single wrapper around all the listings, so reading text from its children aggregated every title into one string, while iterating over each div.item yields one object per listing. I've modified your script to fetch the content as you expected. Currently the script collects titles and characteristics; feel free to add the rest within your script.
This is how you can get the required output:
const request = require('request-promise');
const cheerio = require('cheerio');
const url = "https://huurgoed.nl/gehele-aanbod";
const scrapeResults = [];

async function scrapeHuurgoed() {
  try {
    const htmlResult = await request.get(url);
    const $ = await cheerio.load(htmlResult);
    $("div.item").each((index, element) => {
      const title = $(element).find(".kenmerken > h2").text().trim();
      const characteristics = $(element).find("h4").text().trim();
      scrapeResults.push({ title, characteristics });
    });
    console.log(scrapeResults);
  } catch (err) {
    console.error(err);
  }
}

scrapeHuurgoed();
I've written a script in Node using Puppeteer to fetch different names and the links to their profiles from a webpage. The script fetches them correctly.
What I wish to do now is write the data to a CSV file, but I can't figure out how. I have come across many tutorials on writing CSVs, but most of them are either incomplete or use libraries that are no longer maintained.
This is what I've written so far:
const puppeteer = require('puppeteer');

const link = "https://www.ak-brandenburg.de/bauherren/architekten_architektinnen";

(async () => {
  const browser = await puppeteer.launch();
  const [page] = await browser.pages();

  await page.goto(link);
  const listItem = await page.evaluate(() =>
    [...document.querySelectorAll('.views-table tr')].map(item => ({
      name: item.querySelector('.views-field-title a').innerText.trim(),
      profilelink: "https://www.ak-brandenburg.de" + item.querySelector('.views-field-title a').getAttribute("href"),
    }))
  );
  console.log(listItem);
  await browser.close();
})();
How can I write the data to a CSV file?
There is a far easier way to achieve this. If you check out the json2csv library, you can write the data to a CSV file very easily.
Working script:
const fs = require('fs');
const Json2csv = require('json2csv').Parser;
const puppeteer = require('puppeteer');

const link = "https://www.ak-brandenburg.de/bauherren/architekten_architektinnen";

(async () => {
  const browser = await puppeteer.launch();
  const [page] = await browser.pages();

  await page.goto(link);
  const listItem = await page.evaluate(() =>
    [...document.querySelectorAll('.views-table tbody tr')].map(item => ({
      name: item.querySelector('.views-field-title a').innerText.trim(),
      profilelink: "https://www.ak-brandenburg.de" + item.querySelector('.views-field-title a').getAttribute("href"),
    }))
  );
  // json2csv's Parser takes an options object with a `fields` array
  const j2csv = new Json2csv({ fields: ['name', 'profilelink'] });
  const csv = j2csv.parse(listItem);
  fs.writeFileSync('./output.csv', csv, 'utf-8');
  await browser.close();
})();
I haven't worked with Puppeteer, but I have created CSV files in my Node project.
Store your data in an array, e.g. csvData.
Then use fs.writeFile to save your CSV data:
fs.writeFile(`path/to/csv/${csvName}.csv`, csvData, 'utf8', function (err) {
  if (err) {
    console.log('error', err);
  }
  res.send({
    url: `path/to/csv/${csvName}.csv`
  });
});
Only use res.send if you want to send the CSV file from the server to the client.
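As a closing note, fs.writeFile expects csvData to be a string (or a Buffer), so an array of scraped objects has to be flattened first. A minimal sketch using listItem from the Puppeteer script above, assuming none of the values contain commas or quotes:

// Header row plus one comma-joined line per scraped object.
const rows = listItem.map(item => `${item.name},${item.profilelink}`);
const csvData = ['name,profilelink', ...rows].join('\n');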