Cheerio itemprop attribute content selection - node.js

I am using Cheerio in Node.js to select text from a page where an element has the attribute itemprop="name".
At the moment I need to know the element's tag name (e.g. h2) in order to select it by attribute and read its text. See the example below.
What I would like to do instead is use a wildcard in place of the element name, so I can select any element with itemprop="name". Is this possible?
var $ = cheerio.load(body);
var domElem = $("h2[itemprop = 'name']").get(0);
var content = $(domElem).text().trim();
ogTitle = content;
console.log(content);

It looks like you can use the following as a wildcard:
var $ = cheerio.load(body);
var domElem = $("*[itemprop = 'name']").get(0);
var content = $(domElem).text().trim();
ogTitle = content;
console.log(content);

The following also worked for me:
Html Code:
<a href="/someLine" itemscope="" itemprop="author" itemtype="http://schema.org/Person">
<span itemprop="name">Jane Author</span>
</a>
I used this to get Jane Author (trim() strips the line breaks around the nested span):
author = $("*[itemprop = 'author']").text().trim();
// Jane Author


Showdown doesn't parse inside html block

I am using Showdown.
When I run this code:
const showdown = require("showdown")
converter = new showdown.Converter()
const myMarkdownText = '## Some important text'
const myHtmlText = converter.makeHtml(myMarkdownText)
I get
<h2 id="someimportanttext">Some important text</h2>
which is the expected result.
But when I run this code:
const showdown = require("showdown")
converter = new showdown.Converter()
const myMarkdownText = '<div markdown = "1"> ## Some important text </div>'
const myHtmlText = converter.makeHtml(myMarkdownText)
I get
<div markdown = "1"><p>## Some important text </p></div>
Which means that Showdown didn't parse the stuff inside the html div.
Any help on how to make it work?
After reading the Showdown documentation (https://github.com/showdownjs/showdown#valid-options) my conclusion is that you should probably enable the backslashEscapesHTMLTags option and backslash the html tags.
A bit late but for future reference:
To enable parsing of Markdown inside HTML tags, you have to put markdown="1" as an attribute on the HTML tag, like:
<div markdown="1"># I will be parsed</div>
There is more information in the documentation.

Nodejs xpath selector on xhtml not working

I have this simple but not very well formatted HTML page, with all its mistakes:
<HTML>
<head>
<title>Official game sheet</title>
</head>
<body class="sheet">
</BODY>
</HTML>
I tried to apply the XPath //title to the document parsed from this HTML.
const document = parse5.parse(xmlString);
const xhtml = xmlser.serializeToString(document);
const doc = new dom().parseFromString(xhtml);
const select = xpath.useNamespaces({
"x": "http://www.w3.org/1999/xhtml"
});
const nodes = select("//title", doc);
console.log(nodes);
I tried the solution from here without success. The returned node list is empty.
Here you can see the problem.
Here you go @neptune: you don't need parse5 or xmlser; all that is needed is xpath and xmldom.
var xpath = require('xpath');
var dom = require('xmldom').DOMParser;
var xmlString = `
<HTML>
<head>
<title>Official game sheet</title>
<custom>Here we are</custom>
<body class="sheet">
</BODY>
</HTML>`;
//const document = parse5.parse(xmlString);
//const xhtml = xmlser.serializeToString(document);
const doc = new dom().parseFromString(xmlString);
const nodes = xpath.select("//custom", doc);
//console.log(document);
console.log(nodes[0].localName + ": " + nodes[0].firstChild.data);
console.log("Node: " + nodes[0].toString());
To get the title, change the lines in the question to use the namespace prefix:
const nodes = select("//x:title//text()", doc);
console.log(nodes[0].data)

NodeJS Extracting input value from url

I have HTML stored in a variable:
var html = '<div class="RiP" style="text-align: left;"><div class="clr"></div><input name="extraMP" value="999" type="hidden"><div class="txta dropError">Slide to activate</div><div class="bgSlider"><div class="Slider ui-draggable"></div></div><div class="clr"></div><input name="randomValue" value="randomValue2" type="hidden"></div>'
I want to extract "randomValue" and "randomValue2".
Maybe I should use cheerio? I tried it but had a hard time getting it to work.
If cheerio is hard for you, you could use a regular expression to get the values.
For easier access, you could add a class attribute to the <input>, like:
<input class="className" name="randomValue" value="randomValue2" type="hidden">
Your regexp would then be:
const match = html.match(/<input\s*class="className"\s*name="(.+?)"\s*value="(.+?)"/m)
match[1] // randomValue
match[2] // randomValue2
With cheerio it will be:
const cheerio = require('cheerio');
const html = `<div class="RiP" style="text-align: left;"><div class="clr"></div><input name="extraMP" value="999" type="hidden"><div class="txta dropError">Slide to activate</div><div class="bgSlider"><div class="Slider ui-draggable"></div></div><div class="clr"></div><input class="myClass" name="randomValue" value="randomValue2" type="hidden"></div>`
const $ = cheerio.load(html);
$('.myClass').val(); // randomValue2
$('.myClass').attr('name'); // randomValue
What you might do with cheerio is find the last input inside the element with class RiP and read its name and value attributes:
var html = `<div class="RiP" style="text-align: left;"><div class="clr"></div><input name="extraMP" value="999" type="hidden"><div class="txta dropError">Slide to activate</div><div class="bgSlider"><div class="Slider ui-draggable"></div></div><div class="clr"></div><input name="randomValue" value="randomValue2" type="hidden"></div>`;
const cheerio = require('cheerio');
const $ = cheerio.load(html);
let input = $('.RiP input').last();
console.log(input.attr('name'));
console.log(input.val());
Result:
randomValue
randomValue2
Note that it is not advisable to parse HTML with regex.
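
To illustrate why a single pattern is brittle: the regex above only matches when the attributes appear in exactly the order class, name, value. The sketch below is a slightly more tolerant variant that reads each attribute separately, so order no longer matters. `readInputs` is a hypothetical helper written for this example; it only handles double-quoted values, would break on a ">" inside an attribute value, and is not a substitute for a real parser such as cheerio.

```javascript
// Hypothetical helper: collects the attributes of every <input> tag in a string.
// Only handles double-quoted attribute values; not a general HTML parser.
function readInputs(html) {
  const inputs = [];
  for (const tag of html.match(/<input\b[^>]*>/gi) || []) {
    const attrs = {};
    for (const [, name, value] of tag.matchAll(/([\w-]+)="([^"]*)"/g)) {
      attrs[name] = value;
    }
    inputs.push(attrs);
  }
  return inputs;
}

// Shortened sample input; the second <input> has its attributes reordered on purpose.
const html =
  '<div class="RiP"><input name="extraMP" value="999" type="hidden">' +
  '<input value="randomValue2" type="hidden" name="randomValue"></div>';

const last = readInputs(html).pop();
console.log(last.name, last.value);
```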

Best way to get URL from Nodejs results while only knowing link text

After getting HTML results from a Node.js GET request, what's the best way to retrieve the URL of a link when I only know the link text? Can I use cheerio? Regex? jQuery?
For example, how would I retrieve the URL for "second website"?
<a href='www.website1.com'>first website</a>
<a href='www.website2.com'>second website</a>
<a href='www.website3.com'>third website</a>
$('a:contains("second website")').attr('href')
I was hoping to use a selector rather than run through each hyperlink, but here's what worked for me:
var $ = cheerio.load(body);
links = $('a'); //get all hyperlinks
$(links).each(function(i, link){
var currentlink = $(link).text()
if(currentlink.includes('second')){
console.log($(link).text() + ':\n ' + $(link).attr('href'));
}
});

How to get the value of Liferay input editor in a javascript variable

In my application I am using liferay-ui:input-editor. I want to get the value of the input editor into a JavaScript variable. How can I achieve that? I have tried:
<liferay-ui:input-editor />
<input name="<portlet:namespace />htmlCodeFromEditorPlacedHere" type="hidden" value="" />
function createPopUp(){
var url ="<%=fetchCandidateByIdForPhoneURL%>";
var type= "fetchCandidateInfo";
var candidateId = $('#candidateID').val();
var jobId = $('#JobList').val();
var text1 = $('#text1').val();
var text2 = $('#text2').val();
var text3 = $('#text3').val();
var interviewVenue = $('#interviewVenue').val();
var interviewCC = $('#interviewCC').val();
var interviewBCC =$('#interviewBCC').val();
var startDate = $('#start-date').val();
var interviewType = $('#interviewType').val();
var x ;
function <portlet:namespace />initEditor() {
return '<font style="font-weight: bold">scott was here</font>';
}
function <portlet:namespace />extractCodeFromEditor() {
var x = document.<portlet:namespace />fm.<portlet:namespace />htmlCodeFromEditorPlacedHere.value = window.<portlet:namespace />editor.getHTML();
alert(x);
}
But it shows this error:
ReferenceError: _InterviewSchedule_WAR_InterviewScheduleportlet_initEditor is not defined
How can I resolve it and get the value in a JavaScript variable?
Given the information provided in the question, it seems that the JavaScript initialization function for <liferay-ui:input-editor /> is missing. As pointed out in the tutorial here, which the OP seems to be using (judging by the variable names):
By default, the editor will call "initEditor()" to try and pre-populate the body of the editor. In this example, when the editor loads, it will have the value of "scott was here" in bold.
(...)
function <portlet:namespace />initEditor() {
return '<font style="font-weight: bold">scott was here</font>';
}
By default, the ck editor that Liferay uses will try to call the initEditor() javascript method to try and pre-populate the contents of the editor.
Therefore, you should define such a method, even if you return a blank string.
An example is given below:
<aui:script>
function <portlet:namespace />initEditor() {
return "<%= content %>";
}
</aui:script>
Here, content is the string variable with the content you want to pass in when the editor is loaded. If you do not want to pass initial content, simply return a blank string.
