XML scraping using nodeJs - node.js

I have a very large XML file that I got by exporting all the data from Tally. I am trying to extract elements from it with cheerio, but I am having trouble with the formatting or something similar. Reading it with fs.readFileSync() works fine and console.log shows the complete XML file, but when I write the file using fs.writeFileSync it looks like this:
And my scraping code outputs an empty file:
const cheerio = require('cheerio');
const fs = require ('fs');
var xml = fs.readFileSync('Master.xml','utf8');
const htmlC = cheerio.load(xml);
var list = [];
list = htmlC('ENVELOPE').find('BODY>TALLYMESSAGE>STOCKITEM>LANGUAGENAME.LIST>NAME.LIST>NAME').each(function (index, element) {
list.push(htmlC(element).attr('data-prefix'));
})
console.log(list)
fs.writeFileSync("data.html",list,()=>{})

You might try checking to make sure that Cheerio isn't decoding all the HTML entities. Change:
const htmlC = cheerio.load(xml);
to:
const htmlC = cheerio.load(xml, { decodeEntities: false });
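Separately from the parsing step, the last line of the question's script passes an array and a callback to fs.writeFileSync, which takes string or buffer data plus an options argument and has no callback. A minimal sketch of the write step, assuming list is an array of strings; writeList, the sample items and the output path are hypothetical names used only for illustration:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: serialize an array of strings, one per line, and write it out.
function writeList(filePath, list) {
  // writeFileSync expects a string, Buffer, or typed array, so join the array first.
  const text = list.join('\n');
  // The third argument is an options value such as an encoding, not a callback.
  fs.writeFileSync(filePath, text, 'utf8');
  return text;
}

// Usage with placeholder data standing in for the scraped names.
const outFile = path.join(os.tmpdir(), 'data.txt');
writeList(outFile, ['Item A', 'Item B']);
console.log(fs.readFileSync(outFile, 'utf8'));
```

If the array is empty, the file will still be empty, which points back at the selector or parser options rather than at the write.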

Related

Node.js reads the file but does not write JSON in the HTML

I'm currently running Node.js through Browserify for my website.
It reads the JSON file and I get the message through MQTT.
But the problem is that writing the file does not seem to work.
(Running this as node test.js in the terminal works, by the way.)
What is wrong with my code?
Moreover, is this the best way to store user data?
Thank you so much in advance.
Here's part of my code:
var fs = require("fs");
var path = require("path");
let newFile = fs.readFileSync('/home/capstone/www/html/javascript/test.json');
function testT() { //THIS WORKS FINE
let student0 = JSON.parse(newFile);
var myJSON = JSON.stringify(student0);
client.publish("send2IR", myJSON);
response.end();
};
function write2JSON() { //PROBLEM OF THIS CODE
const content = 'Some content!'
fs.writeFileSync('/home/capstone/www/html/javascript/test.json', content)
};
document.getElementById("blink").addEventListener("click", publish);
document.getElementById("write").addEventListener("click", write2JSON);
You can't write to the file system directly from the browser, for security reasons. Instead, you can use a server as an API to do the file system tasks and have the client only trigger the events.
This post is closely related to your problem:
Is it possible to write data to file using only JavaScript?
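The server-as-API idea can be sketched with Node's built-in http module. This is only an illustration under assumptions: the /write route, the port, the file path in the usage comments, and the names saveJson and createWriteServer are all hypothetical and not part of the original code.

```javascript
const http = require('http');
const fs = require('fs');

// Hypothetical helper: check that the body is valid JSON, then write it to disk.
function saveJson(filePath, body) {
  const parsed = JSON.parse(body); // throws on invalid JSON
  fs.writeFileSync(filePath, JSON.stringify(parsed, null, 2), 'utf8');
  return parsed;
}

// Build a server that accepts POST /write and stores the request body in filePath.
function createWriteServer(filePath) {
  return http.createServer((req, res) => {
    if (req.method !== 'POST' || req.url !== '/write') {
      res.writeHead(404);
      res.end();
      return;
    }
    let body = '';
    req.setEncoding('utf8');
    req.on('data', (chunk) => { body += chunk; });
    req.on('end', () => {
      try {
        saveJson(filePath, body);
        res.writeHead(204);
      } catch (err) {
        // Respond 400 if the body is not valid JSON or the write fails.
        res.writeHead(400);
      }
      res.end();
    });
  });
}

// On the server: createWriteServer('/path/to/test.json').listen(3000);
// In the browser: fetch('/write', { method: 'POST', body: JSON.stringify({ blink: true }) });
```

The click handler in the page then only sends the request; the fs call runs in the Node process, where it is allowed.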

scraping data from a website node js

I am new to scraping data from websites. I would like to scrape the level number from https://fortnitetracker.com/profile/pc/Twitch.BadGuyBen. I have tried using cheerio and request for this task, but I'm not sure I'm using the right selector, so any tips on what I should do would help. This is my code:
var request = require('request');
var cheerio = require('cheerio');
var options = {
url: `https://fortnitetracker.com/profile/pc/Twitch.BadGuyBen`,
method: 'GET'
}
request(options, function (error, response, body) {
var $ = cheerio.load(body);
var level = "";
var xp = "";
$('.top-stats').filter(function(){
var data = $(this);
level = data.children().first().find('.value').text();
console.log(level);
})
});
Again, I am not sure if I have even selected the right class. Much appreciated.
EDIT:
Also, '.top-stats' is present further down the page.
[Screenshot: website open in Chrome dev tools]
[Screenshot: other .top-stats class]
You can't get the stats from the body that request returns, because request only fetches the initial HTML and the stats are rendered afterwards by JavaScript. You will have to use something like Puppeteer to load the page, let the JavaScript run, and then scrape the stats.

How to pass PDFKit readable stream into request's post method?

My app needs to create a PDF file and then upload it to another server. The upload happens down the line via the post method from the request NPM package. Everything works fine if I pass in an fs.createReadStream:
const fs = require('fs');
const params = {file: fs.createReadStream('test.pdf')};
api.uploadFile(params);
Since PDFKit instantiates a read stream as well, I'm trying to pass that directly into the post params like this:
const PDFDocument = require('pdfkit');
const doc = new PDFDocument();
doc.text('steam test');
doc.end();
const params = {file: doc};
api.uploadFile(params);
However, this produces an error:
TypeError: Path must be a string. Received [Function]
If I look at PDFKit source code I see (in coffeescript):
class PDFDocument extends stream.Readable
I'm new to streams and it's clear I'm not understanding the difference here. To me if they are both readable streams, they should both be able to be passed in the same way.

Passing objects between nodejs and jade

I have the following code in my server.js
var cddata = [];
body.rows.forEach(function(doc) {
cddata.push([{id: doc.id, name: doc.key, text:doc.value.Time, group: 1}]);
});
response.render('timeline', {cddata: JSON.stringify(cddata)});
and I have the following in my Jade view file
script(src='vis/dist/vis.js')
link(rel="stylesheet", href="vis/dist/vis.css", type="text/css")
script.
//alert(cddata);
var options = {};
var data = new vis.DataSet(cddata);
var container = document.getElementById('visualization');
new vis.Timeline(container, data, options);
However, nothing related to the chart is rendered. I presume the object is not passed correctly to the Jade file. Please help!
Also, is there a way to verify the incoming object in Jade? Alerts don't seem to work.
thanks
The <script> in your Jade file is a browser-side script, so it won't be able to access variables in the template's generation scope. You'll need to output your data as JSON and read it in using browser-side JavaScript, something like this:
script(src='vis/dist/vis.js')
link(rel="stylesheet", href="vis/dist/vis.css", type="text/css")
script.
var chartData = JSON.parse('#{cddata}')
var options = {};
var data = new vis.DataSet(chartData);
var container = document.getElementById('visualization');
new vis.Timeline(container, data, options);
After much deliberation, the following worked to pass an object from the Node server to the client-side script in the Jade file.
In server.js, where dbdata is an array of JSON objects:
response.render('timeline', {dbdata:dbdata});
In the Jade file:
script.
var chartData = !{JSON.stringify(dbdata)};
Thanks,
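One caveat when writing unescaped JSON into an inline script: a string value containing "</script>" would end the script block early. A minimal sketch of a server-side helper that guards against this; toInlineScriptJson and the sample rows are hypothetical and not part of either answer above:

```javascript
// Hypothetical helper: turn a value into text that is safe to place inside a <script> block.
function toInlineScriptJson(value) {
  return JSON.stringify(value)
    // Escape "<" so a string containing "</script>" cannot close the tag early.
    .replace(/</g, '\\u003c')
    // Escape line separators that older JavaScript engines reject inside string literals.
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

// Sample rows shaped like the objects pushed in server.js (values are made up).
const rows = [{ id: 1, name: 'a', text: '</script>', group: 1 }];
const inline = toInlineScriptJson(rows);
console.log(inline);
```

Under these assumptions the server would pass the string, for example response.render('timeline', {dbdataJson: toInlineScriptJson(dbdata)}), and the Jade script would read it with var chartData = !{dbdataJson};.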

Retrieving HTML from CouchBase into Node.js / Express 4 leaves it unrendered

I'm having a small issue with rendering HTML, stored in CouchBase, fetched by Node.js
In CouchBase I have several small HTML snippets. They contain text, tags such as <br /> and HTML entities such as &lt;. They are of course stored as an escaped string in JSON.
So far, so good. However when I pull it out and display on the page, it is rendered "as-is", without being interpreted as HTML.
For example:
[ some content ...]
<p>Lorem is > ipsum<br />And another line</p>
[rest of content ...]
From the controller in Express 4:
var express = require('express');
var router = express.Router();
var couchbase = require('couchbase');
var cluster = new couchbase.Cluster('couchbase://myserver');
var bucket = cluster.openBucket('someBucket', 'somePassword');
var Entities = require('html-entities').XmlEntities;
entities = new Entities();
var utf8 = require('utf8');
/* GET home page. */
router.get('/', function(req, res) {
bucket.get('my:thingie:44', function(err, result) {
if(err) throw err
console.log(result);
var html = utf8.decode(entities.decode(result.value.thingie.html));
// var html = utf8.encode(result.value.thingie.html);
// var html = utf8.decode(result.value.thingie.html);
res.render('index', { title: 'PageTitle', content: html });
});
});
It is then passed to the template (using hogan.js) for rendering.
When looking into this I found that it might have something to do with the encoding of the &lt;'s and &gt;'s preventing the content from being parsed. You can see my conversion attempts in the code; none of them gave the desired result, i.e. rendering the contents as HTML.
Using utf8.decode(): no difference.
Using utf8.encode(): no difference.
Using entities.decode(): it converts &lt; into < as expected, but the content is still not rendered as HTML even though &lt;div&gt; becomes <div>.
Any ideas?
I found the solution over here: Partials with Node.js + Express + Hogan.js
When putting HTML in a Hogan template, you have to use {{{var}}} instead of {{var}}.
And thus it renders beautifully, as intended :)
Wasn't encoding issues at all ;)
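To see why the braces matter, here is a small stand-in for what a double-brace tag does to its value before output. It is a hand-written illustration, not Hogan's own code, and the exact set of characters Hogan escapes is an assumption:

```javascript
// Stand-in for the escaping a double-brace tag applies (not Hogan's implementation).
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

const content = '<p>Lorem is &gt; ipsum<br />And another line</p>';

// Roughly what {{content}} emits: the browser shows the tags as literal text.
console.log(escapeHtml(content));
// Roughly what {{{content}}} emits: the markup is passed through untouched.
console.log(content);
```

Because the triple-brace form skips escaping, it should only be used for HTML that is trusted.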
