How to send scraped data to index html file to load - node.js

I am new to Node.js and Puppeteer, so I may not be asking this correctly. I was able to create a web scraper to grab data. What I want to do is take the data I received and write it to an HTML file that I can load locally on my machine, so I can see the results there instead of in my console.
Is there any way to do this?
Do I need to use another framework?
Any useful tips or resources would be great!

To create an HTML file, insert your variables into a template string and write the result to disk with fs.writeFileSync.
Example:
const fs = require('fs');
// ... crawling is done, variables are filled
const result1 = 'Some data';
const result2 = 'More crawled data';
const resultHtml = `<!DOCTYPE html>
<html lang="de">
<head><title>Crawling results</title></head>
<body>
<p>Result 1: ${result1}</p>
<p>Result 2: ${result2}</p>
<p>...</p>
</body>
</html>`;
fs.writeFileSync('results.html', resultHtml);
You can then open the results.html file locally in your browser to view the results.
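Scraped text often contains characters such as < or & that would break the page if inserted as-is. The following is a minimal sketch of the same approach for a list of results, assuming the scraper collected them in an array; the names escapeHtml, buildResultsPage, and results are hypothetical and not part of the original answer.

```javascript
const fs = require('fs');

// Hypothetical helper: escape characters that have special meaning in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical helper: turn an array of scraped strings into a full HTML page.
function buildResultsPage(results) {
  const items = results.map((r) => `<li>${escapeHtml(r)}</li>`).join('\n');
  return `<!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><title>Crawling results</title></head>
<body>
<ul>
${items}
</ul>
</body>
</html>`;
}

// Placeholder data standing in for whatever the scraper collected.
const results = ['Some data', 'Price < 10 & "in stock"'];
const page = buildResultsPage(results);
fs.writeFileSync('results.html', page);
```

Escaping the ampersand first matters here, because the later replacements introduce new ampersands that must not be escaped again.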

You can use a templating language like handlebars to render the data to an HTML template and write the result to an HTML file.
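To illustrate the idea of keeping the template separate from the data, here is a hand-rolled stand-in that only replaces {{name}} placeholders. It is not Handlebars and does not use its API; renderTemplate and the sample template are assumptions made for this sketch, and a real templating library would also handle escaping, loops, and conditionals.

```javascript
// Hand-rolled stand-in for a templating library (NOT Handlebars itself):
// replaces {{ name }} placeholders with values from a data object.
function renderTemplate(template, data) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, key) =>
    // Leave unknown placeholders untouched so missing data is easy to spot.
    Object.prototype.hasOwnProperty.call(data, key) ? String(data[key]) : match
  );
}

// Hypothetical template and data, for illustration only.
const template = '<h1>{{ title }}</h1>\n<p>Found {{count}} items.</p>';
const html = renderTemplate(template, { title: 'Crawling results', count: 3 });
console.log(html);
```

The rendered string can then be written to a file with fs.writeFileSync, as in the first answer.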

Related

How can get a normal array without quote after being placed in html dataset

I am using express js on my server and ejs as a template engine.
After getting an array from my database, I pass it to EJS and store it in a data attribute in the template. When I read the dataset back with vanilla JavaScript, I found that the array has been turned into a string. How can I get it back as an array?
//ejs
<div class="news_grid" data-kkk="<%=newsData%>">
let a = document.querySelector(".news_grid").dataset.kkk
Make sure the template variable is a properly formatted JSON string, e.g., when rendering:
newsData: JSON.stringify(newsData)
Then, in your frontend JS, parse the dataset into an object first:
const newsData = JSON.parse(document.querySelector(".news_grid").dataset.kkk);
// do stuff with newsData.a
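The round trip can be sketched in plain Node.js without a browser. This assumes the usual behavior that EJS's <%= %> tag HTML-escapes the value and that the browser un-escapes it again when reading the attribute; escapeAttribute and unescapeAttribute are hypothetical helpers that only simulate those two steps.

```javascript
// Simulates the HTML escaping that <%= %> applies inside an attribute value.
function escapeAttribute(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Simulates what the browser does when it parses the attribute.
function unescapeAttribute(text) {
  return text
    .replace(/&quot;/g, '"')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&');
}

// Placeholder data standing in for the database result.
const newsData = [{ id: 1, title: 'First "big" story' }, { id: 2, title: 'Second story' }];

// Server side: serialize to JSON, then escape for the attribute.
const attributeValue = escapeAttribute(JSON.stringify(newsData));
const markup = `<div class="news_grid" data-kkk="${attributeValue}"></div>`;

// Client side: dataset.kkk returns the un-escaped string, which JSON.parse restores.
const parsed = JSON.parse(unescapeAttribute(attributeValue));
console.log(Array.isArray(parsed), parsed[0].title);
```

Without JSON.stringify, an array of objects would be converted with its default string form, which cannot be parsed back into the original data.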

Why am I not able to load a module from within embedded JavaScript?

My objective is to store some data in a module and then be able to retrieve
that data for display using embedded javascript contained in an HTML document. My code for testing this is shown in the following 3 files:
File 1: The HTML (/var/www/html/modTest.html)
<html>
<head>
<title>Module Test</title>
<% var myModule = require("/var/www/cgi-bin/node_modules/modTest") %>
</head>
<body>
<p>Color: <%= myModule.color %>
</body>
</html>
File 2: The Module (/var/www/cgi-bin/node_modules/modTest.js)
var color = "Blue";
module.exports.color = color;
File 3: The CGI Script (/var/www/cgi-bin)
#!/bin/node
var fs = require("fs");
var ejs = require("ejs");
console.log("Content-type: text/html\n");
console.log(ejs.render(fs.readFileSync('/var/www/html/modTest.html','utf8')));
When I load the URL of the CGI script in my browser I get a blank page. There is an error message in the httpd error log complaining that "require" is not defined. Can anyone please tell me why this is and (more importantly) how to fix it? Thanks for any input.
... doug
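You need require.js or webpack/browserify to link or pack your JavaScript together for the client side and to load your modules with require. Otherwise you would need to include every JavaScript file separately with its own <script src="..."> tag.
Note that in the setup above the template is rendered by ejs on the server, and require is not in scope inside the template there either. In that case, one option is to call require in the CGI script and pass the module to ejs.render as data.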

Accessing Express.js local variables in client side JavaScript (nunjucks)

This is basically the same as this question, except that I am using nunjucks as a template engine.
I am passing a variable to a nunjucks template using express's render method:
res.render('template.html', {myObject:myObject})
I want to access it in my client-side javascript. So far, the only way I have figured out is to put it in an invisible HTML element and pull it into the javascript from there:
<span id='local-variable' style="display:none">{{ myObject.name }}</span>
<script>
var myObjectName = $('#local-variable').text();
</script>
Is there a better method?
Pipe the value through the dump and safe filters:
<script>
var myObjectName = {{ myObject.name | dump | safe }};
</script>
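The dump filter serializes the value as JSON and safe stops nunjucks from HTML-escaping that JSON, so the script receives a valid JavaScript literal. A rough plain-JavaScript sketch of what ends up in the page follows; toScriptLiteral is a hypothetical helper, and the extra escaping of "<" is an added precaution that is not part of the answer above.

```javascript
// Roughly what `{{ myObject.name | dump | safe }}` places in the page:
// the JSON text of the value, inserted without HTML escaping.
function toScriptLiteral(value) {
  // Assumption beyond the original answer: escape "<" so that a value
  // containing "</script>" cannot end the script block early.
  return JSON.stringify(value).replace(/</g, '\\u003c');
}

// Placeholder object standing in for the one passed to res.render.
const myObject = { name: 'Ada "the first" </script>' };
const scriptBody = `var myObjectName = ${toScriptLiteral(myObject.name)};`;
console.log(scriptBody);

// Evaluating the generated statement yields the original string again.
const roundTripped = new Function(`${scriptBody} return myObjectName;`)();
```

Because the output is a JSON literal, the same pattern works for whole objects and arrays, not just strings.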

Metalsmith static site pages are missing metadata

I've been attempting a tutorial to set up Metalsmith, and I've gotten to the end of part 1.
I've installed Node.js and the modules. The IDE is Visual Studio 2013 with Node.js tools installed. I've put a basic structure in place and I'm trying to get a single page to render with a template.
The instructions say to put the following into a file:
---
title: Home
template: home.hbt
---
This is your first page
With a template like:
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<title>{{ title }} | Metalsmith Page</title>
</head>
<body>
<div class="main-wrapper">
{{{ contents }}}
</div>
</body>
</html>
The tutorial says that it should render out into an HTML page, but the result I am getting is something like:
--- title: Home template: home.hbt --- This is your first page
When I use the markdown renderer it gives
<p>---
title: Home</p>
<h2 id="template-home-hbt">template: home.hbt</h2>
<p>This is your first page</p>
Debugging the code shows that when it gets to the renderer the YAML front-matter metadata is missing. This seems to be important as the plugins use the metadata to render the page.
The key to the solution lay in the three strange characters at the beginning of the markdown-rendered page

and the YAML front-matter warning:
UTF-8 Character Encoding Warning
If you use UTF-8 encoding, make sure that no BOM header characters exist in your files or very, very bad things will happen to Jekyll. This is especially relevant if you’re running Jekyll on Windows.
Looking at the buffer loaded in Node.js showed the utf8 BOM characters.
One solution would be to make the IDE stop saving files as UTF-8 with BOM, but for me that wasn't a viable option.
I created a workaround: a few small lines that have to run before any other Metalsmith plugin.
var stripBom = require('strip-bom');
var front = require('front-matter');
var extend = require('extend');
// **snip**
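.use(function __utf8BOM_workaround(files, metalsmith, done)
{
    // The loop below is synchronous, so done runs only after it has finished.
    setImmediate(done);
    Object.keys(files).forEach(function (file)
    {
        var data = files[file];
        // Strip the BOM, then re-parse the front matter that was missed.
        var parsed = front(stripBom(data.contents.toString()));
        data = extend({}, data, parsed.attributes);
        // Buffer.from replaces the deprecated new Buffer(...) constructor.
        data.contents = Buffer.from(parsed.body);
        files[file] = data;
    });
})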
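To check whether a file is affected, the BOM can be detected directly on the raw bytes. The sketch below uses only Node's Buffer; hasUtf8Bom and stripUtf8Bom are hypothetical helpers written for illustration and are not the strip-bom package used above.

```javascript
// The UTF-8 byte order mark is the three bytes EF BB BF at the start of a file.
function hasUtf8Bom(buffer) {
  return buffer.length >= 3 &&
    buffer[0] === 0xef && buffer[1] === 0xbb && buffer[2] === 0xbf;
}

// Return a view of the buffer without the BOM, or the buffer itself if none is present.
function stripUtf8Bom(buffer) {
  return hasUtf8Bom(buffer) ? buffer.subarray(3) : buffer;
}

// Simulated file contents: a BOM followed by the front matter from the question.
const withBom = Buffer.concat([
  Buffer.from([0xef, 0xbb, 0xbf]),
  Buffer.from('---\ntitle: Home\ntemplate: home.hbt\n---\nThis is your first page', 'utf8'),
]);

const cleaned = stripUtf8Bom(withBom);
console.log(hasUtf8Bom(withBom), cleaned.toString('utf8').startsWith('---'));
```

With the BOM in place the content does not start with "---", which is why the front-matter parser skipped the metadata block.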

Is it possible to use PhantomJS and Node to dynamically generate PDFs from templates?

Background / Need
I am working with a group on a web application using Node.js and Express. We need to generate reports that can be printed as hard copies, as well as hard-copy forms. Preferably, we'd like to dynamically generate PDFs on the server for both reports and handwritten forms. We're currently using EJS templates on the server.
Options
I was thinking that it would be convenient to be able to use templates to be able to build forms/reports and generate a PDF from the resulting HTML, however my options to do this appear limited as far as I can find. I have looked at two different possible solutions:
PhantomJS -- (npm node-phantom module)
PDFKit
EDIT: I found another Node.JS module which is able to generate PDFs from HTML called node-wkhtml which relies on wkhtmltopdf. I am now comparing using node-phantom and node-wkhtml. I have been able to generate PDFs on a Node server with both of these and they both appear to be capable of doing what I need.
I have seen some examples of using PhantomJS to render PDF documents from websites, but all of the examples I have seen use a URL and do not feed it a string of HTML. I am not sure if I could make this work with templates in order to dynamically generate PDF reports.
When a request for a report comes in, I was hoping to generate HTML from an EJS template and use that to generate a PDF. Is there any way for me to use Phantom to dynamically create a page completely on the server without making a request?
My other option is to use PDFkit which allows dynamic generation of PDFs, but it is a canvas-like API and doesn't really support any notion of templates as far as I can tell.
The Question
Does anyone know if I can use PhantomJS with Node to dynamically generate PDFs from HTML generated from a template? Or does anyone know of any other solutions I can use to generate and serve printable reports/forms from my Node/Express back end?
EJS seems to run fine in PhantomJS (after installing the path module). To load a page in PhantomJS given a string of HTML, do page.content = '<html><head>...';.
npm install ejs and npm install path, then:
var ejs = require('ejs'),
    page = require('webpage').create();
var html = ejs.render('<h1><%= title %></h1>', {
    title: 'wow'
});
page.content = html;
page.render('test.pdf');
phantom.exit();
(Run this script with phantomjs, not node.)
I am going to post an answer for anyone trying to do something similar with node-phantom. Because node-phantom controls the local installation of PhantomJS, it must use asynchronous methods for everything even when the corresponding PhantomJS operation is synchronous. When setting the content for a page to be rendered in PhantomJS you can simply do:
page.content = '<h1>Some Markup</h1>';
page.render('page.pdf');
However, when using the node-phantom module within Node you must use the page.set method. Below is the code I used.
'use strict';
var phantom = require('node-phantom');
var html = '<!DOCTYPE html><html><head><title>My Webpage</title></head>' +
    '<body><h1>My Webpage</h1><p>This is my webpage. I hope you like it' +
    '!</p></body></html>';
phantom.create(function (error, ph) {
    ph.createPage(function (error, page) {
        page.set('content', html, function (error) {
            if (error) {
                console.log('Error setting content: %s', error);
                ph.exit();
                return;
            }
            page.render('page.pdf', function (error) {
                if (error) console.log('Error rendering PDF: %s', error);
                // Exit only after rendering has finished.
                ph.exit();
            });
        });
    });
});
A really easy solution to this problem is the node-webshot module: you can pass raw HTML directly as an argument and it writes the PDF directly.

Resources