Cannot read XML data using XPATH when there is no valid namespace name available [duplicate] - node.js

How does XPath deal with XML namespaces?
If I use
/IntuitResponse/QueryResponse/Bill/Id
to parse the XML document below I get 0 nodes back.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<IntuitResponse xmlns="http://schema.intuit.com/finance/v3"
time="2016-10-14T10:48:39.109-07:00">
<QueryResponse startPosition="1" maxResults="79" totalCount="79">
<Bill domain="QBO" sparse="false">
<Id>=1</Id>
</Bill>
</QueryResponse>
</IntuitResponse>
However, I'm not specifying the namespace in the XPath (i.e. http://schema.intuit.com/finance/v3 is not a prefix of each token of the path). How can XPath know which Id I want if I don't tell it explicitly? I suppose in this case (since there is only one namespace) XPath could get away with ignoring the xmlns entirely. But if there are multiple namespaces, things could get ugly.

XPath 1.0/2.0
Defining namespaces in XPath (recommended)
XPath itself doesn't have a way to bind a namespace prefix with a namespace. Such facilities are provided by the hosting library.
It is recommended that you use those facilities and define namespace prefixes that can then be used to qualify XML element and attribute names as necessary.
Here are some of the various mechanisms which XPath hosts provide for specifying namespace prefix bindings to namespace URIs.
(OP's original XPath, /IntuitResponse/QueryResponse/Bill/Id, has been elided to /IntuitResponse/QueryResponse.)
C#:
XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("i", "http://schema.intuit.com/finance/v3");
XmlNodeList nodes = el.SelectNodes(@"/i:IntuitResponse/i:QueryResponse", nsmgr);
Google Docs:
Unfortunately, IMPORTXML() does not provide a namespace prefix binding mechanism. See next section, Defeating namespaces in XPath, for how to use local-name() as a work-around.
Java (SAX):
NamespaceSupport support = new NamespaceSupport();
support.pushContext();
support.declarePrefix("i", "http://schema.intuit.com/finance/v3");
Java (XPath):
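xpath.setNamespaceContext(new NamespaceContext() {
public String getNamespaceURI(String prefix) {
switch (prefix) {
case "i": return "http://schema.intuit.com/finance/v3";
// ...
default: return XMLConstants.NULL_NS_URI;
}
}
// getPrefix() and getPrefixes() must also be implemented (omitted here)
});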
Remember to call
DocumentBuilderFactory.setNamespaceAware(true).
See also:
Java XPath: Queries with default namespace xmlns
JavaScript:
See Implementing a User Defined Namespace Resolver:
function nsResolver(prefix) {
var ns = {
'i' : 'http://schema.intuit.com/finance/v3'
};
return ns[prefix] || null;
}
document.evaluate( '/i:IntuitResponse/i:QueryResponse',
document, nsResolver, XPathResult.ANY_TYPE,
null );
Note that if the default namespace has an associated namespace prefix defined, using the nsResolver() returned by Document.createNSResolver() can obviate the need for a custom nsResolver().
Perl (LibXML):
my $xc = XML::LibXML::XPathContext->new($doc);
$xc->registerNs('i', 'http://schema.intuit.com/finance/v3');
my @nodes = $xc->findnodes('/i:IntuitResponse/i:QueryResponse');
Python (lxml):
from io import StringIO
from lxml import etree
f = StringIO('<IntuitResponse>...</IntuitResponse>')
doc = etree.parse(f)
r = doc.xpath('/i:IntuitResponse/i:QueryResponse',
namespaces={'i':'http://schema.intuit.com/finance/v3'})
Python (ElementTree):
namespaces = {'i': 'http://schema.intuit.com/finance/v3'}
# root is assumed to be the IntuitResponse element; ElementTree rejects absolute paths on an element
root.findall('./i:QueryResponse', namespaces)
Python (Scrapy):
response.selector.register_namespace('i', 'http://schema.intuit.com/finance/v3')
response.xpath('/i:IntuitResponse/i:QueryResponse').getall()
PHP:
Adapted from @Tomalak's answer using DOMDocument:
$result = new DOMDocument();
$result->loadXML($xml);
$xpath = new DOMXpath($result);
$xpath->registerNamespace("i", "http://schema.intuit.com/finance/v3");
$result = $xpath->query("/i:IntuitResponse/i:QueryResponse");
See also @IMSoP's canonical Q/A on PHP SimpleXML namespaces.
Ruby (Nokogiri):
puts doc.xpath('/i:IntuitResponse/i:QueryResponse',
'i' => "http://schema.intuit.com/finance/v3")
Note that Nokogiri supports removal of namespaces,
doc.remove_namespaces!
but see the below warnings discouraging the defeating of XML namespaces.
VBA:
xmlNS = "xmlns:i='http://schema.intuit.com/finance/v3'"
doc.setProperty "SelectionNamespaces", xmlNS
Set queryResponseElement = doc.SelectSingleNode("/i:IntuitResponse/i:QueryResponse")
VB.NET:
xmlDoc = New XmlDocument()
xmlDoc.Load("file.xml")
nsmgr = New XmlNamespaceManager(xmlDoc.NameTable)
nsmgr.AddNamespace("i", "http://schema.intuit.com/finance/v3")
nodes = xmlDoc.DocumentElement.SelectNodes("/i:IntuitResponse/i:QueryResponse",
nsmgr)
SoapUI (doc):
declare namespace i='http://schema.intuit.com/finance/v3';
/i:IntuitResponse/i:QueryResponse
xmlstarlet:
-N i="http://schema.intuit.com/finance/v3"
XSLT:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:i="http://schema.intuit.com/finance/v3">
...
Once you've declared a namespace prefix, your XPath can be written to use it:
/i:IntuitResponse/i:QueryResponse
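To make the original problem concrete, here is a minimal Python sketch using the standard library's ElementTree on the question's document (XML declaration omitted). It compares the unqualified path with the prefixed one. The variable names are illustrative, and ElementTree implements only a subset of XPath, so paths are written relative to the root element rather than absolute.

```python
import xml.etree.ElementTree as ET

# The document from the question, with its default namespace declaration.
XML = """<IntuitResponse xmlns="http://schema.intuit.com/finance/v3"
                time="2016-10-14T10:48:39.109-07:00">
  <QueryResponse startPosition="1" maxResults="79" totalCount="79">
    <Bill domain="QBO" sparse="false">
      <Id>=1</Id>
    </Bill>
  </QueryResponse>
</IntuitResponse>"""

# Prefix binding supplied by the host library; the prefix "i" is arbitrary.
NS = {"i": "http://schema.intuit.com/finance/v3"}

root = ET.fromstring(XML)  # root is the IntuitResponse element

# Unqualified names refer to elements in no namespace, so nothing matches.
unqualified = root.findall("./QueryResponse/Bill/Id")

# Prefixed names are resolved through NS and match the namespaced elements.
qualified = root.findall("./i:QueryResponse/i:Bill/i:Id", NS)
bill_ids = [e.text for e in qualified]

print(len(unqualified), bill_ids)
```

The prefix only has to agree between the path and the mapping; the document itself never declares "i".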
Defeating namespaces in XPath (not recommended)
An alternative is to write predicates that test against local-name():
/*[local-name()='IntuitResponse']/*[local-name()='QueryResponse']
Or, in XPath 2.0:
/*:IntuitResponse/*:QueryResponse
Skirting namespaces in this manner works but is not recommended because it
Under-specifies the full element/attribute name.
Fails to differentiate between element/attribute names in different
namespaces (the very purpose of namespaces). Note that this concern could be addressed by adding an additional predicate to check the namespace URI explicitly:
/*[ namespace-uri()='http://schema.intuit.com/finance/v3'
and local-name()='IntuitResponse']
/*[ namespace-uri()='http://schema.intuit.com/finance/v3'
and local-name()='QueryResponse']
Thanks to Daniel Haley for the namespace-uri() note.
Is excessively verbose.
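The failure to differentiate between namespaces can be shown with a small Python sketch. It assumes Python 3.8 or later, where ElementTree accepts a {*} namespace wildcard that behaves much like *:name. The second namespace (http://example.com/other) and its Id element are invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the x: namespace and its Id element are made up
# to show what happens when two namespaces share a local name.
XML = """<IntuitResponse xmlns="http://schema.intuit.com/finance/v3"
                xmlns:x="http://example.com/other">
  <QueryResponse>
    <Bill><Id>1</Id><x:Id>vendor-7</x:Id></Bill>
  </QueryResponse>
</IntuitResponse>"""

NS = {"i": "http://schema.intuit.com/finance/v3"}
root = ET.fromstring(XML)

# Namespace wildcard: matches every Id regardless of namespace.
any_ns = [e.text for e in root.findall(".//{*}Id")]

# Namespace-qualified: matches only the Intuit Id.
intuit_only = [e.text for e in root.findall(".//i:Id", NS)]

print(any_ns, intuit_only)
```

The wildcard query returns both elements, which is exactly the ambiguity that a bound prefix avoids.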
XPath 3.0/3.1
Libraries and tools that support modern XPath 3.0/3.1 allow the specification of a namespace URI directly in an XPath expression:
/Q{http://schema.intuit.com/finance/v3}IntuitResponse/Q{http://schema.intuit.com/finance/v3}QueryResponse
While Q{http://schema.intuit.com/finance/v3} is much more verbose than using an XML namespace prefix, it has the advantage of being independent of the namespace prefix binding mechanism of the hosting library. The Q{} notation is known as Clark Notation after its originator, James Clark. The W3C XPath 3.1 EBNF grammar calls it a BracedURILiteral.
Thanks to Michael Kay for the suggestion to cover XPath 3.0/3.1's BracedURILiteral.
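Python's ElementTree does not implement XPath 3.0/3.1, but it uses the same underlying idea: namespaced names are written as {uri}local, without the leading Q. The sketch below illustrates that notation under that assumption. It is not an example of BracedURILiteral support, and the sample document is trimmed from the question.

```python
import xml.etree.ElementTree as ET

XML = """<IntuitResponse xmlns="http://schema.intuit.com/finance/v3">
  <QueryResponse>
    <Bill><Id>=1</Id></Bill>
  </QueryResponse>
</IntuitResponse>"""

# Namespace URI wrapped in braces, ready to be prepended to local names.
V3 = "{http://schema.intuit.com/finance/v3}"

root = ET.fromstring(XML)

# ElementTree stores tags in this expanded form after parsing.
root_tag = root.tag

# No prefix mapping is needed when the URI is embedded in each step.
hits = root.findall(f"./{V3}QueryResponse/{V3}Bill/{V3}Id")
ids = [e.text for e in hits]

print(root_tag, ids)
```

As with Q{}, the path is longer but does not depend on any prefix binding supplied by the host.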

I use /*[name()='...'] in a google sheet to fetch some counts from Wikidata. I have a table like this
thes WD prop links items
NOM P7749 3925 3789
AAT P1014 21157 20224
and the formulas in cols links and items are
=IMPORTXML("https://query.wikidata.org/sparql?query=SELECT(COUNT(*)as?c){?item wdt:"&$B14&"[]}","//*[name()='literal']")
=IMPORTXML("https://query.wikidata.org/sparql?query=SELECT(COUNT(distinct?item)as?c){?item wdt:"&$B14&"[]}","//*[name()='literal']")
respectively. The SPARQL query happens not to have any spaces...
I saw name() used instead of local-name() in Xml Namespace breaking my xpath!, and for some reason //*:literal doesn't work.

Related

SOAP + Zeep + XSD extension

I am interacting with a SOAP service through Zeep and so far it's been going fine, except that I've hit a snag when passing values to anything related to an XSD extension.
I've tried multiple ways and am at my wits' end.
campaignClient = Client("https://platform.mediamind.com/Eyeblaster.MediaMind.API/V2/CampaignService.svc?wsdl")
listPaging = {"PageIndex":0,"PageSize":5}
fact=campaignClient.type_factory("ns1")
parentType = fact.CampaignIDFilter
subtype=dict(parentType.elements)["CampaignID"] = (123456,)
combined= parentType(CampaignID=subtype)
rawData = campaignClient.service.GetCampaigns(Paging=listPaging,CampaignsFilter=combined, ShowCampaignExtendedInfo=False,_soapheaders=token)
print(rawData)
The context is the following:
this service returns a list of items, and it's possible to apply a filter to it, which is a generic type. You can then implement any type of filter matching that type, here a CampaignIDFilter.
My other attempts failed with the service pinpointing an incorrect type or similar, but this approach, which I think is sound on paper, gets me a 'something went wrong'.
I'm literally implementing the solution found here: Creating XML sequences with zeep / python
Here's the service Doc http://platform.mediamind.com/Eyeblaster.MediaMind.API.Doc/?v=3
Cheers
Turns out the right way to get there was to hack around a bit to get the right structure and use of types. The code itself:
from zeep import xsd
objectType = campaignClient.get_type('ns1:CampaignIDFilter')
objectWrap = xsd.Element('CampaignServiceFilter',objectType)
objectValue = objectWrap(CampaignID=123456)
wrapperT = campaignClient.get_type('ns1:ArrayOfCampaignServiceFilter')
wrapper = xsd.Element("CampaignsFilter",wrapperT)
outercontent = wrapper(objectValue)
This ends up generating the following XML :
<soap-env:Body>
<ns0:GetCampaignsRequest xmlns:ns0="http://api.eyeblaster.com/message">
<ns0:CampaignsFilter>
<ns1:CampaignServiceFilter xmlns:ns1="http://api.eyeblaster.com/V1/DataContracts" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="ns1:CampaignIDFilter">
<ns1:CampaignID>123456</ns1:CampaignID>
</ns1:CampaignServiceFilter>
</ns0:CampaignsFilter>
<ns0:Paging>
<ns0:PageIndex>0</ns0:PageIndex>
<ns0:PageSize>5</ns0:PageSize>
</ns0:Paging>
<ns0:ShowCampaignExtendedInfo>false</ns0:ShowCampaignExtendedInfo>
</ns0:GetCampaignsRequest>
</soap-env:Body>
Much credit to the user here who gave me the boilerplate needed to get this Lovecraftian horror to work: how to specify xsi:type zeep python

How to set `invalidAttributeNamePrefix` value in Java?

Suppose I'm cleaning some html using HtmlCleaner (v2.18) and I want to set the property invalidAttributeNamePrefix (see section Cleaner parameters) to some value, i.e.: data-.
This way an attribute my-custom-attr="my-value" in the HTML will be transformed to data-my-custom-attr="my-value".
How can I do that? I wasn't able to find any example for the Java usage.
You can take as reference this piece of code:
HtmlCleaner cleaner = new HtmlCleaner();
CleanerProperties properties = cleaner.getProperties();
properties.setOmitComments(true);
// properties.setInvalidAttributeNamePrefix("data-"); there is no such method
// html is a declared variable which contains some html content
TagNode rootTagNode = cleaner.clean(html);
XmlSerializer xmlSerializer = new PrettyXmlSerializer(properties);
String cleanedHtml = xmlSerializer.getAsString(rootTagNode);
Upgrading to version 2.22 solves this.
Now it can be done
// ...
properties.setInvalidXmlAttributeNamePrefix("data-");
//...

Issues adding source code to a custom DSL file programmatically

I currently have an Xtext grammar that looks like the following:
Features:
'feature' name = ID
'{'(
('action' '{' action+=Actions (',' action+=Actions)* '}')? &
('dependencies' '{' dependencies = Dependencies '}')? &
('children' '{' children = Children '}')?
)'}'
;
What I want to do with this is add an action to an already existing source file programmatically. For that I am subclassing IUnitOfWork.Void for easier implementation; the meaningful part of it currently looks like this:
final XtextEditor editor = (XtextEditor)sourcepart;
final IXtextDocument document = editor.getDocument();
document.modify(new IUnitOfWork.Void<XtextResource>(){
public void process (XtextResource resource) throws Exception {
IParseResult parseResult = resource.getParseResult();
if(parseResult ==null)
return;
CompositeNode rootNode=(CompositeNode) parseResult.getRootNode();
LeafNode node = (LeafNode)NodeModelUtils.findLeafNodeAtOffset(rootNode, 0);
EObject object =NodeModelUtils.findActualSemanticObjectFor(node);
Through this I traverse the tree of the model and get to the Features object to which I want to add an action (this is done through a pop-up menu in a custom Tree View I'm implementing).
Here's my problem: whenever I add an action, it breaks the way the tags are placed in the source file. By that I mean that instead of:
action {
act1.set (foo),
act2.set (bar),
act3.set (baz),
act4.set (booze) //where this is the new action that I add
}
it will add it as
action {
act1.set (foo),
act2.set (bar),
act3.set (baz)
}
action {
act4.set(booze)
}
And this is illegal by the rules of my grammar, and I'm not allowed to change the way it should be written. (I am allowed to make small changes to the way the rules are implemented, but would really want to avoid it as it would mean a whole new amount of work to reimplement other things that depend on them)
I've tried :
adding it directly through Features.getAction().add(*the new action);
copying the items in the list into an array with the toArray() method so as to avoid referencing, adding my action to the array, clearing the list then adding all the elements again one by one
creating an entirely new Features object and setting everything in it to be the same as the currently edited one then replacing the feature with the new one
And I'm out of ideas after that. The frustrating part is that the 3rd method worked for a different kind of object in my grammar and had no errors there.
How could I make this work ?
This is a bug in Xtext. (Can you please file a ticket?)
As a workaround you may use the following:
Features:
'feature' name = ID
'{'(
('action' '{' actionList=ActionList '}')? &
('dependencies' '{' dependencies = Dependencies '}')? &
('children' '{' children = Children '}')?
)'}';
ActionList:
(action+=Actions (',' action+=Actions)*)
;

How to get a child node value based on another child in the same group

I'm developing in C#, and am processing xml from an external soap call.
I have loaded the xml response into an XElement.
Given the following xml stub
<record>
<node>
<a>My title</a>
<name>title_en</name>
</node>
<node>
<a>...</a>
<name>contact_name</name>
</node>
.....
</record>
Using XPath in C#, I'm trying to do the following with the method XPathSelectElement:
where
\record\node\name == 'title_en' select \record\node\a
If there is a better method to use or another suggestion on how to perform the query, I'm open to ideas.
Thanks in advance.
You need a predicate to constrain which node elements you need:
/record/node[name = 'title_en']/a
You read this expression as "find the record element, find all its child elements named node that have a name child with value "title_en", and for each of those find all a children"
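The same predicate can be tried outside C# with a short Python sketch. It relies on ElementTree's limited XPath subset, which supports the [child='text'] form; the path is written relative to the record root because ElementTree does not accept absolute paths on an element. The sample data mirrors the stub from the question.

```python
import xml.etree.ElementTree as ET

# The stub from the question; the second <a> value is left as "..." as given.
XML = """<record>
  <node>
    <a>My title</a>
    <name>title_en</name>
  </node>
  <node>
    <a>...</a>
    <name>contact_name</name>
  </node>
</record>"""

record = ET.fromstring(XML)

# Keep only the node elements whose name child has the text 'title_en',
# then step to their a children.
matches = record.findall("./node[name='title_en']/a")
titles = [a.text for a in matches]

print(titles)
```

Only the first node satisfies the predicate, so a single a element is returned.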
Use this:
var title = doc.Descendants("node")
.Where(x => (string)x.Element("name") == "title_en")
.Select(x => (string)x.Element("a"))
.FirstOrDefault();

Reading/Editing XLIFF using C#

I need to parse an XLIFF file using C#, but I'm having some trouble. These files are fairly complex, containing a huge number of nodes.
Basically, all I need to do is read the source node from each trans-unit node, do some processing on it, and insert the processed text into the corresponding target node (which will always be present, but empty).
An example of one of the nodes I need to parse would be (the whole file may contain 100s of these):
<trans-unit id="0000000002" datatype="text" restype="string">
<source>Windows Update is not installed</source>
<target/>
<iws:segment-metadata tm_score="0.00" ws_word_count="6" max_segment_length="0">
<iws:status target_content="placeholders_only"/>
</iws:segment-metadata>
<iws:boundary-seg sequence="bs20721"/>
<iws:markup-seg sequence="0000000001"/>
</trans-unit>
The trans-unit nodes can be buried deep in the files, and the header section contains a lot of data. I'd like to use LINQ to XML to read the data, but I'm not having any luck getting it to work. Here's my current code (just trying to read and output the source nodes from the file):
XDocument doc = XDocument.Load(path);
Console.WriteLine("Before loop");
foreach (var transUnitNode in doc.Descendants("trans-unit"))
{
Console.WriteLine("In loop");
XElement sourceNode = transUnitNode.Element("source");
XElement targetNode = transUnitNode.Element("target");
Console.WriteLine("Source: " + sourceNode.Value);
}
I never see 'In loop' and I don't know why, can someone tell me what I'm doing wrong here, or suggest a better way to achieve what I'm trying to do here?
Thanks.
Try
XNamespace df = doc.Root.Name.Namespace;
foreach (XElement transUnitNode in doc.Descendants(df + "trans-unit"))
{
XElement sourceNode = transUnitNode.Element(df + "source");
// and so on; use the df namespace object to qualify any element names
}
See also http://msdn.microsoft.com/en-us/library/bb387093.aspx.
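For comparison, here is a rough Python sketch of the same idea: take the namespace from the root element and use it to qualify every name. The XLIFF 1.2 namespace URI and the surrounding xliff/file/body wrapper are assumptions, since the question shows only a fragment. The iws: elements are left out because their namespace URI is not given, and upper-casing stands in for the unspecified processing step.

```python
import xml.etree.ElementTree as ET

# Assumed wrapper and namespace; the question only shows the trans-unit.
XML = """<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2">
  <file>
    <body>
      <trans-unit id="0000000002" datatype="text" restype="string">
        <source>Windows Update is not installed</source>
        <target/>
      </trans-unit>
    </body>
  </file>
</xliff>"""

root = ET.fromstring(XML)

# Equivalent of doc.Root.Name.Namespace: the "{uri}" part of the root tag.
df = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""

# Unqualified lookup finds nothing, which is why the loop never ran.
unqualified = list(root.iter("trans-unit"))

for unit in root.iter(df + "trans-unit"):
    source = unit.find(df + "source")
    target = unit.find(df + "target")
    # Placeholder processing: the real transformation is up to the caller.
    target.text = source.text.upper()

targets = [t.text for t in root.iter(df + "target")]
print(len(unqualified), targets)
```

Deriving the namespace from the root keeps the code working even if the file uses a different XLIFF version URI.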
