Showdown doesn't parse inside html block - node.js

I am using Showdown.
When I run this code:
const showdown = require("showdown")
const converter = new showdown.Converter()
const myMarkdownText = '## Some important text'
const myHtmlText = converter.makeHtml(myMarkdownText)
I get
<h2 id="someimportanttext">Some important text</h2>
which is the expected result.
But when I run this code:
const showdown = require("showdown")
const converter = new showdown.Converter()
const myMarkdownText = '<div markdown = "1"> ## Some important text </div>'
const myHtmlText = converter.makeHtml(myMarkdownText)
I get
<div markdown = "1"><p>## Some important text </p></div>
which means that Showdown didn't parse the content inside the HTML div.
Any help on how to make it work?

After reading the Showdown documentation (https://github.com/showdownjs/showdown#valid-options) my conclusion is that you should probably enable the backslashEscapesHTMLTags option and backslash the html tags.

A bit late but for future reference:
To enable parsing of markdown inside HTML tags, you have to put markdown="1" as an attribute on the HTML tag, like:
<div markdown="1"># I will be parsed</div>
There is more information in the documentation.

Related

Node JS: given a html string, how can I get the content inside of all <script> tags, manipulate and replace it?

Overview:
I am working on a project that has dozens of .Liquid (Shopify) snippets with <script> tags inside of them containing JS code.
They're similar to HTML, they look something like this:
{% assign variable = 'test' %}
<p>hey {{variable}}</p>
<script>console.log("hey")</script>
{% schema %}
{
...json stuff
}
{% endschema %}
Issue:
Basically what I wanna do is get the content inside <script>, manipulate it and replace with the new manipulated one.
I managed to do this using cheerio, but it ends up messing up the Liquid variables since it doesn't recognize them.
My previous code was looking something like this:
let html = cheerio.load(code, { _useHtmlParser2: true });
const { data: js } = html("script").get()[0].children[0];
html("script").get()[0].children[0].data = await minifyJS(js);
const result = html.html();
Expected Behavior:
I need to:
Find all script tags in a HTML string;
Get the code inside of the <script> tag;
Manipulate this code (minify, essentially);
Replace it with the now minified code.
I am trying to avoid using regex, but I can't foresee any other solutions.
Any suggestion is greatly appreciated.
Thank you!
To get the script tags out of the string, you can use a regular expression:
<script(.|\n)*?<\/script>
That is just the pattern. To apply it:
let str = code; // whatever string you want to extract the script tags from
let result = str.match(/<script(.|\n)*?<\/script>/g);
console.log(result);
result will be an array of the matched script elements, including the <script> and </script> tags themselves (or null if nothing matches).
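Building on that pattern, here is a rough sketch of the full find / minify / replace loop from the question, done on the raw string so the Liquid tags are never parsed. The minifyJS below is only a hypothetical stand-in for the asker's own async function, and SCRIPT_RE and minifyScripts are names made up for this example:

```javascript
// Hypothetical stand-in for the asker's async minifyJS(); swap in a real minifier.
async function minifyJS(js) {
  return js.replace(/\s+/g, " ").trim();
}

// Capture groups: 1 = opening tag, 2 = script body, 3 = closing tag.
const SCRIPT_RE = /(<script\b[^>]*>)([\s\S]*?)(<\/script>)/gi;

async function minifyScripts(source) {
  let result = "";
  let lastIndex = 0;
  for (const match of source.matchAll(SCRIPT_RE)) {
    const [whole, open, body, close] = match;
    // Copy everything before this <script> untouched (Liquid tags included).
    result += source.slice(lastIndex, match.index);
    // Replace only the body, keeping the original opening and closing tags.
    result += open + (await minifyJS(body)) + close;
    lastIndex = match.index + whole.length;
  }
  // Append whatever follows the last <script> element.
  return result + source.slice(lastIndex);
}
```

It would be called as const result = await minifyScripts(code);. Keep in mind this is still a regex: it will stop early at a literal "</script>" inside a JS string, and any Liquid tags inside the script body are handed to the minifier as-is.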

Cheerio - indent newly inserted element under sibling?

I am using cheerio to mutate an XML file in node. I am inserting the node/element <target> after <source>, which works with the insertAfter() API in oldEl.translation.insertAfter(msgSourceEl);.
However, I lose my indentation:
<trans-unit id="title" datatype="html">
<source>Login</source><target>iniciar sesión</target>
Is it possible, or is there a way, to indent the newly inserted <target>iniciar sesión</target> underneath the <source> element?
Just fix the final XML indentation:
It is possible to use xml-beautifier to achieve the human-readable indented XML
import beautify from 'xml-beautifier';
const HumanXML = beautify(XML);
console.log(HumanXML); // => will output correctly indented elements
(EXTRA) No need for Cheerio:
In the following example we will be using xml2js to manipulate the XML as a JSON and then build it back to the original XML format
var xml2js = require('xml2js');
var xml = "<trans-unit id=\"title\" datatype=\"html\"><source>Login</source></trans-unit>"
xml2js.parseString(xml, function (err, result) {
  if (err) throw err;
  // Add a <target> child next to the existing <source>
  result["trans-unit"].target = ["iniciar sesión"]
  var builder = new xml2js.Builder();
  var output = builder.buildObject(result);
  console.log(output)
});
Final Output:
<trans-unit id="title" datatype="html">
  <source>Login</source>
  <target>iniciar sesión</target>
</trans-unit>
I am sure you are doing this as part of a loop, so it shouldn't be hard to extrapolate the example to make it work. I suggest using underscore for the usual helpers (each, map, reduce, filter...).
First of all, since source is a void (empty) element in HTML, cheerio does not keep the closing tag of <source>Login</source>; it converts it to <source>Login.
So for demonstrating element indentation I will use a <source2> element.
As shown below, providing newlines and tabs as part of the given html solves the issue.
let $ = require('cheerio').load(`
<trans-unit id="title" datatype="html">
    <source2>Login</source2>
</trans-unit>`)
$(`
    <target>iniciar sesión</target>`).insertAfter($('source2'));
console.log($.html('trans-unit'))
output
<trans-unit id="title" datatype="html">
    <source2>Login</source2>
    <target>iniciar sesión</target>
</trans-unit>
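If you would rather not hard-code the whitespace, one option is to read the indentation off the sibling's own line and reuse it. The following is only a string-level sketch (no cheerio involved); indentBefore and the sample XML are made up for illustration, and it assumes the sibling starts its own line:

```javascript
// Return the whitespace that precedes the first occurrence of `tag` on its line,
// or "" if the tag is missing or is not the first thing on that line.
function indentBefore(xml, tag) {
  const at = xml.indexOf(tag);
  if (at === -1) return "";
  const lineStart = xml.lastIndexOf("\n", at) + 1;
  const prefix = xml.slice(lineStart, at);
  return /^\s*$/.test(prefix) ? prefix : "";
}

const xml =
  '<trans-unit id="title" datatype="html">\n' +
  '  <source>Login</source>\n' +
  '</trans-unit>';

// Newline plus the sibling's indentation, followed by the new element.
const fragment = "\n" + indentBefore(xml, "<source>") + "<target>iniciar sesión</target>";

// Plain string insertion right after the closing </source> tag.
const closeTag = "</source>";
const cut = xml.indexOf(closeTag) + closeTag.length;
const updated = xml.slice(0, cut) + fragment + xml.slice(cut);
console.log(updated);
```

The same fragment string (leading newline and indentation included) could presumably be passed to insertAfter() the way the answer above does with a hand-written one.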

Cheerio how to ignore elements of a certain tag

I am scraping the body of the webpage:
axios.get(url)
.then(function(response){
var $ = cheerio.load(response.data);
var body = $('body').text();
});
The problem is, I want to exclude contents from the <footer> tag. How do I do that?
cheerio creates a pseudo-DOM when it parses the HTML. You can manipulate that DOM similar to how you would manipulate the DOM in a browser. In your specific case, you could remove items from the DOM using any number of methods such as
.remove()
.replaceWith()
.empty()
.html()
So, the basic idea is that you would use a selector to find the footer element and then remove it as in:
$('footer').remove();
Then, fetch the text after you've removed those elements:
var body = $('body').text();

Gathering document fragments at rendering time using `pug`

I use pug to generate HTML email messages from a template:
doctype html
html
  head
    title Hello #{name}
  body
    ...
The title is the subject of the email.
Currently, I extract the title text content by parsing the HTML document rendered by pug, but that doesn't seem like a very efficient way of doing it.
Is there some feature or hook available in pug to collect part of the document while rendering it? I considered pug filters, but as far as I understand, those are not suitable since they are triggered at compile time, not while rendering the document.
I came to a solution using a mixin:
mixin collect(name)
  -
    // This is just an ugly hack to
    // capture the inner block rendered
    // text
    const savedHtml = pug_html;
    pug_html = "";
    if (block) block();
    const innerHtml = pug_html;
    self[name]=innerHtml;
    pug_html = savedHtml+innerHtml;
html
  head
    title
      +collect('title')
        | Hello #{self.name}
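const pug = require("pug");
const compiledFunction = pug.compileFile('template.pug', {debug:true,self:true});
// With self:true the locals object is exposed as `self` in the template,
// so the mixin's self[name] assignment writes back onto `out`.
const out = { name: 'Timothy' };
console.log(compiledFunction(out));
console.log(JSON.stringify(out));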
Displaying:
<html><head><title>Hello Timothy</title></head></html>
{"name":"Timothy","title":"Hello Timothy"}
The code of the collect() mixin is not particularly pretty because, as far as I know, there is no elegant way to capture the block() output, so I had to tap into the internal, undocumented pug_html variable.
Or is there a cleaner way to achieve that?

Cheerio itemprop attribute content selection

I am using Cheerio in nodejs to select text from a URL where an element contains the attribute itemprop="name".
At the moment I need to know the parent element in order to read the attribute and associated text. See below as an example.
However, what I would like to do is use a wildcard in place of the element (e.g. h2), so I can select any element that has the attribute itemprop="name". Is this possible?
var $ = cheerio.load(body);
var domElem = $("h2[itemprop = 'name']").get(0);
var content = $(domElem).text().trim();
ogTitle = content;
console.log(content);
It looks like you can do the following as a wildcard:
var $ = cheerio.load(body);
var domElem = $("*[itemprop = 'name']").get(0);
var content = $(domElem).text().trim();
ogTitle = content;
console.log(content);
The following also worked for me:
Html Code:
<a href="/someLine" itemscope="" itemprop="author" itemtype="http://schema.org/Person">
<span itemprop="name">Jane Author</span>
</a>
Used this to get Jane Author:
author = $("*[itemprop = 'author']").text().trim();
// Jane Author
