Error parsing xml when namespace directives are present - node.js

I need to read multiple XML files using Node.js. When the root node contains namespace declarations, my XPath query against the parsed document fails. When I remove the namespace declarations, everything works. Each of my files can have different declarations. How do I parse the XML while ignoring the namespace attributes? I need to use XPath to get some values.
I'm using ...
var fs = require('fs');
var xpath = require('xpath');
var dom = require('xmldom').DOMParser;
var xml = fs.readFileSync('/test.xml', 'utf8').toString();
var doc = new dom().parseFromString(xml);
var id = xpath.select("/export/asset/id", doc);
console.log(id[0].firstChild.data);
XML-file
<export xmlns="some url" xmlns:xsi="some url" format="archive" version="2.4" xsi:schemaLocation="some url.xsd">
<asset>
<id>1445254514291</id>
<name>test</name>
<displayName />
<origin>demo</origin>
</asset>
</export>

Generally speaking, taking into account namespaces is preferable, but if you have to deal with too many of these, one way to avoid them altogether is to use xpath and the somewhat convoluted, but effective, local-name() function.
So I would change
var id = xpath.select("/export/asset/id", doc);
to
var id = xpath.select("//*[local-name()='export']//*[local-name()='asset']//*[local-name()='id']", doc);
With the sample xml in the question, this should output:
"1445254514291"

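When many paths need this treatment, writing the local-name() predicates by hand gets tedious. Below is a minimal sketch of a helper that rewrites a simple slash-separated path into the namespace-agnostic form. The name toLocalNamePath is hypothetical (it is not part of the xpath package), and the helper assumes plain element steps only: no predicates, attributes, axes or wildcards.

```javascript
// Hypothetical helper: rewrite a simple element path such as "/export/asset/id"
// so that every step matches on local-name() and therefore ignores namespaces.
// Assumption: the path contains plain element names only.
function toLocalNamePath(path) {
  return path
    .split('/')
    .map(function (step) {
      // Empty steps come from a leading "/" or from "//"; keep them untouched.
      return step === '' ? '' : "*[local-name()='" + step + "']";
    })
    .join('/');
}

console.log(toLocalNamePath('/export/asset/id'));
// → /*[local-name()='export']/*[local-name()='asset']/*[local-name()='id']
```

The result can be passed to xpath.select in place of the original expression. Unlike the // form shown above, it keeps each parent/child step strict, which may or may not be what you want.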
Related

xpath query root element with the namespace

I'm trying to address the root element, which carries a namespace prefix, and pass a reference to it to the xml-crypto library.
I'm not specifying the path correctly; please advise. The objective is to sign the document so that the signature can be inserted right after the tag <samlp:Response
<samlp:Response xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" ID="efedb3b0-909f-4b39-b8c0-57427ee8dc83" Version="2.0" IssueInstant="2019-11-08T15:34:51.272Z">
<saml:Issuer xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">http://www.example.com</saml:Issuer>
</samlp:Response>
nodeJS code
var SignedXml = require('xml-crypto').SignedXml, fs = require('fs');
var sig = new SignedXml();
sig.addReference("//*[local-name(.)='samlp:Response']");
sig.signingKey = fs.readFileSync(__dirname + "/client.pem");
sig.computeSignature(xml);
fs.writeFileSync(__dirname + "/signed.xml", sig.getSignedXml());
Attempts
sig.addReference("//samlp:Response");
Error: Cannot resolve QName samlp
it worked fine at https://www.freeformatter.com/xpath-tester.html though
If you wish to defeat/bypass namespaces, then change
sig.addReference("//*[local-name(.)='samlp:Response']");
to
sig.addReference("//*[local-name()='Response']");
because the namespace prefix, samlp, is not part of the local name, Response.
For a comprehensive answer on namespaces in XPath, see How does XPath deal with XML namespaces?
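To make the distinction concrete: a qualified name such as samlp:Response splits into a prefix and a local name, and local-name() only ever sees the second part. The sketch below derives the predicate from a qualified name and can optionally pin the namespace URI with namespace-uri(), so the match stays precise without a prefix resolver. The helper names are hypothetical and not part of xml-crypto, and the second form assumes the XPath engine in use implements the XPath 1.0 namespace-uri() function.

```javascript
// Split "prefix:local" into its two parts; names without a colon have no prefix.
function splitQName(qname) {
  var i = qname.indexOf(':');
  return i === -1
    ? { prefix: null, localName: qname }
    : { prefix: qname.slice(0, i), localName: qname.slice(i + 1) };
}

// Build an XPath that finds the element by local name, optionally also
// requiring a specific namespace URI (hypothetical helper, for illustration).
function elementXPath(qname, namespaceUri) {
  var test = "local-name()='" + splitQName(qname).localName + "'";
  if (namespaceUri) {
    test += " and namespace-uri()='" + namespaceUri + "'";
  }
  return '//*[' + test + ']';
}

console.log(elementXPath('samlp:Response'));
// → //*[local-name()='Response']
console.log(elementXPath('samlp:Response', 'urn:oasis:names:tc:SAML:2.0:protocol'));
// → //*[local-name()='Response' and namespace-uri()='urn:oasis:names:tc:SAML:2.0:protocol']
```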

NodeJS xml2js - removes CDATA tag while converting from XML to JSON

In NodeJS by using xml2js module I am converting the XML string to JSON object and after some edit again converting that JSON object back into XML. All this is working well however the problem is that CDATA tags are missing in the converted XML. Can someone help me with this? I am giving the sample code below which has the same issue.
var xml2js = require('xml2js');
var parser = new xml2js.Parser();
parser.parseString("<myxml myattribute='value'><![CDATA[Hello again]]></myxml>", function (err, data) {
var builder = new xml2js.Builder({
cdata: true
});
var xml = builder.buildObject(data);
console.log(" ------------ "+xml);
});
Thanks
-kt
Please read https://github.com/Leonidas-from-XIV/node-xml2js/issues/218
Per the package author, per wikipedia:
A CDATA section is merely an alternative syntax for expressing
character data; there is no semantic difference between character data
that manifests as a CDATA section and character data that manifests as
in the usual syntax in which "<" and "&" would be represented by "&lt;"
and "&amp;", respectively.
The documentation states for the option cdata:
cdata (default: false): wrap text nodes in <![CDATA[ ... ]]> instead
of escaping when necessary. Does not add <![CDATA[ ... ]]> if it is
not required. Added in 0.4.5.
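In other words, with cdata: true the builder only emits a CDATA section when the text contains characters that would otherwise need escaping, which is why "Hello again" comes back as plain text. The sketch below illustrates that documented behaviour. It is not xml2js's actual implementation, and all function names are made up for the example.

```javascript
// Illustration of the documented "cdata" option; NOT xml2js's real code.

// Text only needs special treatment if it contains XML markup characters.
function needsEscaping(text) {
  return /[&<>]/.test(text);
}

// Usual syntax: replace markup characters with entity references.
function escapeText(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Alternative syntax: wrap in CDATA. "]]>" cannot appear inside a CDATA
// section, so it is split across two adjacent sections.
function wrapCdata(text) {
  return '<![CDATA[' + text.split(']]>').join(']]]]><![CDATA[>') + ']]>';
}

function serializeText(text, useCdata) {
  if (!needsEscaping(text)) return text; // nothing to protect, emit as-is
  return useCdata ? wrapCdata(text) : escapeText(text);
}

console.log(serializeText('Hello again', true)); // → Hello again
console.log(serializeText('1 < 2 & 3', true));   // → <![CDATA[1 < 2 & 3]]>
console.log(serializeText('1 < 2 & 3', false));  // → 1 &lt; 2 &amp; 3
```

Both of the last two outputs represent the same character data to any conforming XML parser, which is the point of the quoted passage.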

Including a pug filter which name is in a variable

In the Pug Language reference there is an easy example on how to use the markdown filter on a .md file and include it.
include:markdown-it myfile.md
However, I can't get it to work when reading from a file whose name is in a variable. I expected this syntax to work:
include:markdown-it ${article}
Here article is defined as var article = 'myfile.md'. But the code crashes, saying it can't find the file '${article}' in my views folder.
What is the correct way of doing it? Or is it just impossible?
Regards.
EDIT: As suggested here, it's not possible to do what I want. The solution I found was rendering the markdown file first,
function loadmd(md) {
var fs = require('fs');
var markdown = require('markdown-it')();
var mdfile = fs.readFileSync(__dirname + '/mds/' + md + '.md');
return markdown.render(mdfile.toString());
}
and then pass it to the pug renderer as a variable:
res.render('art', {md: loadmd(req.params.article)}, function renderDone(err, html) { ... });
Template:
doctype html
html(lang='es')
  head
    include htmlhead
    title !{pagetitle}
  body
    include pageheader
    include pagemenu
    section
      article !{md}
    footer
      include pagefooter

How to make the cache to refresh when the XML is changed?

I am using MvcSiteMapProvider 4.6.3, MVC 4. Using DI to config the Sitemap.
this.For<System.Runtime.Caching.ObjectCache>()
.Use(s => System.Runtime.Caching.MemoryCache.Default);
this.For(typeof (ICacheProvider<>)).Use(typeof (RuntimeCacheProvider<>));
var rootCacheDependency = this.For<ICacheDependency>().Use<RuntimeFileCacheDependency>()
.Ctor<string>("fileName").Is(rootFileName);
var rootCacheDetails = this.For<ICacheDetails>().Use<CacheDetails>()
.Ctor<TimeSpan>("absoluteCacheExpiration").Is(absoluteCacheExpiration)
.Ctor<TimeSpan>("slidingCacheExpiration").Is(TimeSpan.MinValue)
.Ctor<ICacheDependency>().Is(rootCacheDependency);
var cacheDetails = new List<SmartInstance<CacheDetails>>();
var xmlSources = new List<SmartInstance<FileXmlSource>>();
How to make it automatically update the cache when the Sitemap xml is updated?
I am upgrading MvcSitemapProvider from v3 to v4.
In version 3, it seems the sitemap is automatically refreshed.
I set the cache expiration time to 5 minutes. Is this causing the problem?
TimeSpan absoluteCacheExpiration = TimeSpan.FromMinutes(5);
var rootCacheDetails = this.For<ICacheDetails>().Use<CacheDetails>()
.Ctor<TimeSpan>("absoluteCacheExpiration").Is(absoluteCacheExpiration)
.Ctor<TimeSpan>("slidingCacheExpiration").Is(TimeSpan.MinValue)
.Ctor<ICacheDependency>().Is(rootCacheDependency);
UPDATE
When I change the sitemap XML file, the cache is not updated until the 5-minute cache expiration elapses.
I am using multiple sitemap xml files.
var sitmapPath = HostingEnvironment.MapPath("~/Sitemaps");
var sitemaps = new List<string>();
if (sitmapPath != null)
{
sitemaps.AddRange(Directory.GetFiles(sitmapPath, "*.sitemap"));
}
foreach (var sitemapFileName in sitemaps)
{
var cacheDependencie =
this.For<ICacheDependency>()
.Use<RuntimeFileCacheDependency>()
.Ctor<string>("fileName")
.Is(sitemapFileName);
cacheDetails.Add(this.For<ICacheDetails>().Use<CacheDetails>()
.Ctor<TimeSpan>("absoluteCacheExpiration").Is(absoluteCacheExpiration)
.Ctor<TimeSpan>("slidingCacheExpiration").Is(TimeSpan.MinValue)
.Ctor<ICacheDependency>().Is(cacheDependencie));
xmlSources.Add(this.For<IXmlSource>().Use<FileXmlSource>()
.Ctor<string>("fileName").Is(sitemapFileName));
}
Will this be the reason it's not working?
I don't see a problem with the code you posted. However, it is the RuntimeFileCacheDependency that will make it reload when the XML is changed.
The RuntimeFileCacheDependency expects the fileName argument to be an absolute path. So you must convert it using HostingEnvironment.MapPath before providing it to the RuntimeFileCacheDependency constructor.
var rootFileName = HostingEnvironment.MapPath("~/root.sitemap");
Response to Your Update
The purpose of the cacheDetails object is to specify the caching policy for a single SiteMapBuilderSet instance. If you look further down in the (original) DI module, notice that the variable is passed to the constructor of this class.
// Configure the builder sets
this.For<ISiteMapBuilderSetStrategy>().Use<SiteMapBuilderSetStrategy>()
.EnumerableOf<ISiteMapBuilderSet>().Contains(x =>
{
x.Type<SiteMapBuilderSet>()
.Ctor<string>("instanceName").Is("default")
.Ctor<bool>("securityTrimmingEnabled").Is(securityTrimmingEnabled)
.Ctor<bool>("enableLocalization").Is(enableLocalization)
.Ctor<bool>("visibilityAffectsDescendants").Is(visibilityAffectsDescendants)
.Ctor<bool>("useTitleIfDescriptionNotProvided").Is(useTitleIfDescriptionNotProvided)
.Ctor<ISiteMapBuilder>().Is(builder)
.Ctor<ICacheDetails>().Is(cacheDetails); // <- caching specified here explicitly.
});
This is what is used to expire the cache, but it is a completely separate mechanism from the part that specifies to use multiple files to build a SiteMap:
// Register the sitemap node providers
var siteMapNodeProvider = this.For<ISiteMapNodeProvider>().Use<CompositeSiteMapNodeProvider>()
.EnumerableOf<ISiteMapNodeProvider>().Contains(x =>
{
x.Type<XmlSiteMapNodeProvider>()
.Ctor<bool>("includeRootNode").Is(true)
.Ctor<bool>("useNestedDynamicNodeRecursion").Is(false)
.Ctor<IXmlSource>().Is(rootXmlSource);
// NOTE: Each additional XmlSiteMapNodeProvider instance for the same SiteMap instance must
// specify includeRootNode as "false"
x.Type<XmlSiteMapNodeProvider>()
.Ctor<bool>("includeRootNode").Is(false)
.Ctor<bool>("useNestedDynamicNodeRecursion").Is(false)
.Ctor<IXmlSource>().Is(childXmlSource1);
x.Type<XmlSiteMapNodeProvider>()
.Ctor<bool>("includeRootNode").Is(false)
.Ctor<bool>("useNestedDynamicNodeRecursion").Is(false)
.Ctor<IXmlSource>().Is(childXmlSource2);
// Add additional XmlSiteMapNodeProviders here (with includeRootNode as "false")...
// You only need this if you intend to use MvcSiteMapNodeAttribute in your application
x.Type<ReflectionSiteMapNodeProvider>()
.Ctor<IEnumerable<string>>("includeAssemblies").Is(includeAssembliesForScan)
.Ctor<IEnumerable<string>>("excludeAssemblies").Is(new string[0]);
});
// Register the sitemap builders
var builder = this.For<ISiteMapBuilder>().Use<SiteMapBuilder>()
.Ctor<ISiteMapNodeProvider>().Is(siteMapNodeProvider);
This is how to specify multiple XML files for a single SiteMap, but it is also possible to make each XML file into its own SiteMap instance by passing each instance of XmlSiteMapNodeProvider to a separate SiteMapBuilder and a separate SiteMapBuilderSet as described in Multiple SiteMaps in One Application.
IMPORTANT: For multiple XML files to work on a single SiteMap instance, you must specify the same key for the root node of each SiteMap as shown at the bottom of this answer. But you cannot specify a node representing the same controller action in more than one XML file (other than the root node).
If you need more flexibility than this, I would suggest implementing your own XmlSiteMapNodeProvider or abandoning the idea of using XML altogether, since using ISiteMapNodeProvider or IDynamicNodeProvider is much more flexible.
Now, back to the caching. If you are indeed using multiple XML files in the same SiteMap instance, you need to use a RuntimeCompositeCacheDependency so each of the files will be considered a dependency for the same cache, but you must use a single instance of CacheDetails.
var rootCacheDependency =
this.For<ICacheDependency>().Use<RuntimeFileCacheDependency>()
.Ctor<string>("fileName").Is(rootAbsoluteFileName);
var childCacheDependency1 =
this.For<ICacheDependency>().Use<RuntimeFileCacheDependency>()
.Ctor<string>("fileName").Is(childAbsoluteFileName1);
var childCacheDependency2 =
this.For<ICacheDependency>().Use<RuntimeFileCacheDependency>()
.Ctor<string>("fileName").Is(childAbsoluteFileName2);
var cacheDependency =
this.For<ICacheDependency>().Use<RuntimeCompositeCacheDependency>()
.Ctor<ICacheDependency[]>().Is(new ICacheDependency[]
{
(ICacheDependency)rootCacheDependency,
(ICacheDependency)childCacheDependency1,
(ICacheDependency)childCacheDependency2
});
var cacheDetails =
this.For<ICacheDetails>().Use<CacheDetails>()
.Ctor<TimeSpan>("absoluteCacheExpiration").Is(absoluteCacheExpiration)
.Ctor<TimeSpan>("slidingCacheExpiration").Is(TimeSpan.MinValue)
.Ctor<ICacheDependency>().Is(cacheDependency);

How to use Node.js to create modified versions of html documents?

I am trying to do this:
Read html document "myDocument.html" with Node
Insert contents of another html document named "foo.html" immediately after the open body tag of myDocument.html.
Insert contents of yet another html document named "bar.html" immediately before the close body tag of myDocument.html.
Save the modified version of "myDocument.html".
To do the above, I would need to search the DOM with Node to find the open and closing body tags.
How can this be done?
Very simply, you can use the native Filesystem module that comes with Node.JS. (var fs = require("fs")). This allows you to read and convert the HTML to a string, perform string replace functions, and finally save the file again by rewriting it.
The advantage is that this solution is completely native, and requires no external libraries. It is also completely faithful to the original HTML file.
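var fs = require('fs');
// Read each file as a UTF-8 string (without an encoding, readFile returns a Buffer).
fs.readFile('myDocument.html', 'utf8', function (err, myDocData) {
  if (err) throw err;
  fs.readFile('foo.html', 'utf8', function (err, fooData) { // reads foo file
    if (err) throw err;
    // replace() returns a new string, so the result must be assigned back.
    myDocData = myDocData.replace(/<body>/, "<body>" + fooData); // adds foo file to HTML
    fs.readFile('bar.html', 'utf8', function (err, barData) { // reads bar file
      if (err) throw err;
      myDocData = myDocData.replace(/<\/body>/, barData + "</body>"); // adds bar file to HTML
      fs.writeFile('myDocumentNew.html', myDocData, function (err) { // writes new file
        if (err) throw err;
      });
    });
  });
});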
In a simple but not accurate way, you can do this:
str = str.replace(/(<body.*?>)/i, "$1"+read('foo.html'));
str = str.replace(/(<\/body>)/i, read('bar.html')+'$1');
It will not work if the content of myDocument contains more than one '<body ...' or '</body>' (e.g. inside JavaScript), and foo.html and bar.html cannot contain '$1' or '$2', because replace() interprets those as replacement patterns.
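The '$1' restriction can be sidestepped by passing a function as the second argument to replace(): the function's return value is inserted literally, so dollar patterns in the inserted files are left alone. The sketch below shows this; injectIntoBody is a hypothetical name, and the other limitation (more than one body tag in the text) still applies.

```javascript
// Insert fooHtml right after the opening <body ...> tag and barHtml right
// before the closing </body> tag. Replacer functions are used so that
// "$1", "$&" etc. inside the inserted HTML are not treated as patterns.
function injectIntoBody(html, fooHtml, barHtml) {
  return html
    .replace(/<body[^>]*>/i, function (openTag) { return openTag + fooHtml; })
    .replace(/<\/body>/i, function (closeTag) { return barHtml + closeTag; });
}

var page = '<html><body class="x"><p>hi</p></body></html>';
console.log(injectIntoBody(page, '<h1>$1</h1>', '<footer>$&</footer>'));
// → <html><body class="x"><h1>$1</h1><p>hi</p><footer>$&</footer></body></html>
```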
If you can edit the content of myDocument, you can leave placeholders there (as HTML comments), like
<!--foo.html-->
Then it's easy: just replace each placeholder with the content of the corresponding file.
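A minimal sketch of the placeholder approach follows. It assumes the fragments have already been read into an object keyed by placeholder name; fillPlaceholders and that object shape are made up for the example. Comments that do not name a known fragment are left untouched.

```javascript
// Replace HTML comments such as <!--foo.html--> with the matching fragment.
// "fragments" maps placeholder names to the HTML that should replace them.
function fillPlaceholders(html, fragments) {
  return html.replace(/<!--\s*([\w.-]+)\s*-->/g, function (comment, name) {
    return Object.prototype.hasOwnProperty.call(fragments, name)
      ? fragments[name]
      : comment; // unknown comment: keep it as-is
  });
}

var template = '<body><!--foo.html--><p>x</p><!--bar.html--><!-- note --></body>';
console.log(fillPlaceholders(template, {
  'foo.html': '<nav></nav>',
  'bar.html': '<footer></footer>'
}));
// → <body><nav></nav><p>x</p><footer></footer><!-- note --></body>
```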
Use the cheerio library, which has a simplified jQuery-ish API.
var cheerio = require('cheerio');
var dom = cheerio.load(myDocumentHTMLString);
dom('body').prepend(fooHTMLString);
dom('body').append(barHTMLString);
var finalHTML = dom.html();
And just to be clear since the legions of pro-regex individuals are already appearing in droves, yes you need a real parser. No you cannot use a regular expression. Read Stackoverflow lead developer Jeff Atwood's post on parsing HTML the Cthulhu way.

Resources