Avoid nesting of elements in RDF/XML in Apache Jena

I am trying to write an exporter utility to SKOS using Apache Jena. My issue is that the broader and narrower objects are getting nested. I expect the XML shown under Expected Output, but I get XML with nested elements. The tutorials have not helped. Is it just a formatting issue, or does it have to do with the way I am coding it?
Actual Output
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:SKOS="http://www.w3.org/2004/02/skos/core#">
<SKOS:Concept rdf:about="http://lexicon.ai/P011">
<SKOS:broader>
<SKOS:Concept>
<SKOS:narrower>
<SKOS:Concept>
<SKOS:scopeNote>testb</SKOS:scopeNote>
<SKOS:prefLabel>Disease</SKOS:prefLabel>
</SKOS:Concept>
</SKOS:narrower>
<SKOS:scopeNote>testb</SKOS:scopeNote>
<SKOS:prefLabel>Disease</SKOS:prefLabel>
</SKOS:Concept>
</SKOS:broader>
<SKOS:altLabel>alt2</SKOS:altLabel>
<SKOS:altLabel>alt1</SKOS:altLabel>
<SKOS:scopeNote>test</SKOS:scopeNote>
<SKOS:prefLabel>Disease</SKOS:prefLabel>
</SKOS:Concept>
</rdf:RDF>
Expected Output
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:SKOS="http://www.w3.org/2004/02/skos/core#">
<SKOS:Concept rdf:about="http://lexicon.ai/P011">
<SKOS:broader rdf:about="http://lexicon.ai/P012"/>
<SKOS:altLabel>alt2</SKOS:altLabel>
<SKOS:altLabel>alt1</SKOS:altLabel>
<SKOS:scopeNote>test</SKOS:scopeNote>
<SKOS:prefLabel>Disease</SKOS:prefLabel>
</SKOS:Concept>
<SKOS:Concept rdf:about="http://lexicon.ai/P012">
<SKOS:narrower rdf:about="http://lexicon.ai/P0121"/>
<SKOS:scopeNote>testb</SKOS:scopeNote>
<SKOS:prefLabel>Diseaseb</SKOS:prefLabel>
</SKOS:Concept>
<SKOS:Concept rdf:about="http://lexicon.ai/P0121">
<SKOS:scopeNote>testn</SKOS:scopeNote>
<SKOS:prefLabel>Diseasen</SKOS:prefLabel>
</SKOS:Concept>
</rdf:RDF>
The code is as follows
Model model = ModelFactory.createDefaultModel();
model.setNsPrefix("SKOS", SKOS.uri);
Model model2 = ModelFactory.createDefaultModel();
model2.setNsPrefix("SKOS", SKOS.uri);
final Resource Entity = model.createResource(personURI);
final Resource broader1 = model.createResource();
final Resource nt1 = model.createResource();
nt1.addProperty(RDF.type, SKOS.Concept);
nt1.addProperty(SKOS.prefLabel, "Diseasen");
nt1.addProperty(SKOS.scopeNote, "testn");
broader1.addProperty(RDF.type, SKOS.Concept);
broader1.addProperty(SKOS.prefLabel, "Diseaseb");
broader1.addProperty(SKOS.scopeNote, "testb");
broader1.addProperty(SKOS.narrower, nt1);
Entity.addProperty(RDF.type, SKOS.Concept);
Entity.addProperty(SKOS.prefLabel, "Disease");
Entity.addProperty(SKOS.scopeNote, "test");

"http://lexicon.ai/P011" does not appear in the code sample, and Entity does not appear to be connected to the other resources.
There are two calls to model.createResource() with no argument, which create two blank nodes.
The "actual" output shows one resource created with createResource("http://lexicon.ai/P011") and the others as blank nodes. A blank node has no URI, so the pretty-printing writer inlines it where it is referenced. That looks like the cause of the nesting.
To get nearer to the required output, you will need to use named resources (createResource with a URI for each concept), and you may be better off with the more basic writer: write with RDFDataMgr.write using RDFFormat.RDFXML_PLAIN.

Related

How to resolve: XML schema created in Excel contains denormalized data

Edit/Update: By removing the <GrpHdr> element completely, Excel was able to verify the XML Map as exportable. My original question still remains: how can I solve the "Denormalized Data" error with the <GrpHdr> included?
I am new to XML, and have been trying to import a source file (XML below) into Excel and create a schema/XML Map (I am unsure of the difference) that I can then drag and drop onto two different tables:
One table contains one row of data for the Group Header: <GrpHdr> (Occurs ONCE)
One table contains multiple rows of data for the various Payments: <PmtInf> (Occurs MULTIPLE times)
I am able to load the XML below into Excel using the Source button, and also to create an XML map from it (which then appears in an "XML Source" window, showing the parent and child elements).
The problem I am having is in verifying the XML Map for export. Excel says that the map contains "Denormalized Data". I have looked at various Microsoft resources and Stack Overflow questions, such as:
https://support.microsoft.com/en-us/office/issue-verifying-an-xml-map-for-export-fbfcdb77-c2d6-4040-b256-e584a71151b0
excel: Cannot save or export xml data. The xml map in this workbook are not exportable
Export denormalized data from excel to xml
Based on my research, I tried the following:
I have tried setting the minOccurs and maxOccurs attributes to "0" and "unbounded" respectively, since I believe both default to "1" and that Excel's "Denormalized Data" error comes from an element whose maxOccurs is "1".
I have also tried adding multiple <PmtInf> elements, so Excel knows (when creating a schema from the below sample file), that <PmtInf> is to occur multiple times.
Thanks!
<?xml version="1.0" encoding="utf-8"?>
<Document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.03">
<CstmrCdtTrfInitn>
<GrpHdr>
<MsgId>UNIQUE MESSAGE ID 35 AN</MsgId>
<CreDtTm>2016-05-26T10:07:00</CreDtTm>
<NbOfTxs>1</NbOfTxs>
<CtrlSum>0.01</CtrlSum>
<InitgPty>
<Id>
<OrgId>
<Othr>
<Id>ABC12345678</Id>
</Othr>
</OrgId>
</Id>
</InitgPty>
</GrpHdr>
<PmtInf>
<PmtInfId>ORIGINATOR REFERENCE 35AN</PmtInfId>
<PmtMtd>TRF</PmtMtd>
<PmtTpInf>
<SvcLvl>
<Cd>SEPA</Cd>
</SvcLvl>
</PmtTpInf>
<ReqdExctnDt>2016-05-26</ReqdExctnDt>
<Dbtr>
<Nm>DEBTOR NAME 70AN</Nm>
<PstlAdr>
<StrtNm>Street Name</StrtNm>
<BldgNb>Building Number</BldgNb>
<PstCd>Post Code</PstCd>
<TwnNm>Town Name</TwnNm>
<CtrySubDvsn>County/State/Region</CtrySubDvsn>
<Ctry>LU</Ctry>
</PstlAdr>
</Dbtr>
<DbtrAcct>
<Id>
<IBAN>NL39HSBC0123456789</IBAN>
</Id>
</DbtrAcct>
<DbtrAgt>
<FinInstnId>
<BIC>HSBCNL2A</BIC>
<PstlAdr>
<Ctry>IE</Ctry>
</PstlAdr>
</FinInstnId>
</DbtrAgt>
<ChrgBr>SLEV</ChrgBr>
<CdtTrfTxInf>
<PmtId>
<InstrId>PAYMENT ID 35AN</InstrId>
<EndToEndId>UNIQUE BENEFICIARY REFERENCE 35AN</EndToEndId>
</PmtId>
<Amt>
<InstdAmt Ccy="EUR">0.01</InstdAmt>
</Amt>
<CdtrAgt>
<FinInstnId>
<BIC>MIDLGB22</BIC>
<PstlAdr>
<Ctry>GB</Ctry>
</PstlAdr>
</FinInstnId>
</CdtrAgt>
<Cdtr>
<Nm>CREDITOR NAME 70AN</Nm>
<PstlAdr>
<StrtNm>Street Name</StrtNm>
<BldgNb>Building Number</BldgNb>
<PstCd>Post Code</PstCd>
<TwnNm>Town Name</TwnNm>
<CtrySubDvsn>County/State/Region</CtrySubDvsn>
<Ctry>GB</Ctry>
</PstlAdr>
</Cdtr>
<CdtrAcct>
<Id>
<IBAN>GB94MIDL40123487654321</IBAN>
</Id>
</CdtrAcct>
<RmtInf>
<Ustrd>Remittance Info up to 140AN</Ustrd>
</RmtInf>
</CdtTrfTxInf>
</PmtInf>
</CstmrCdtTrfInitn>
</Document>

SOAP + Zeep + XSD extension

I am interacting with a SOAP service through Zeep and so far it's been going fine, except I hit a snag when passing values for anything related to an XSD extension.
I've tried multiple ways and am at my wits' end.
campaignClient = Client("https://platform.mediamind.com/Eyeblaster.MediaMind.API/V2/CampaignService.svc?wsdl")
listPaging = {"PageIndex":0,"PageSize":5}
fact=campaignClient.type_factory("ns1")
parentType = fact.CampaignIDFilter
subtype=dict(parentType.elements)["CampaignID"] = (123456,)
combined= parentType(CampaignID=subtype)
rawData = campaignClient.service.GetCampaigns(Paging=listPaging,CampaignsFilter=combined, ShowCampaignExtendedInfo=False,_soapheaders=token)
print(rawData)
The context is the following:
This service returns a list of items, and it's possible to apply a filter to it, which is a generic type. You can then implement any type of filter matching that type, here a CampaignIDFilter.
My other attempts failed, and the service used to point out an incorrect type or similar. This approach, which I think is sound on paper, only gets me a 'something went wrong'.
I'm literally implementing the solution found here: Creating XML sequences with zeep / python
Here's the service documentation: http://platform.mediamind.com/Eyeblaster.MediaMind.API.Doc/?v=3
Cheers
Turns out the right way to get there was to hack around a bit to get the right structure and use of types. The code itself:
from zeep import xsd

# Look up the concrete filter type and wrap it in an element named after the base type
objectType = campaignClient.get_type('ns1:CampaignIDFilter')
objectWrap = xsd.Element('CampaignServiceFilter',objectType)
objectValue = objectWrap(CampaignID=123456)
# Wrap the filter in the array type expected by the CampaignsFilter parameter
wrapperT = campaignClient.get_type('ns1:ArrayOfCampaignServiceFilter')
wrapper = xsd.Element("CampaignsFilter",wrapperT)
outercontent = wrapper(objectValue)
This ends up generating the following XML :
<soap-env:Body>
<ns0:GetCampaignsRequest xmlns:ns0="http://api.eyeblaster.com/message">
<ns0:CampaignsFilter>
<ns1:CampaignServiceFilter xmlns:ns1="http://api.eyeblaster.com/V1/DataContracts" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="ns1:CampaignIDFilter">
<ns1:CampaignID>123456</ns1:CampaignID>
</ns1:CampaignServiceFilter>
</ns0:CampaignsFilter>
<ns0:Paging>
<ns0:PageIndex>0</ns0:PageIndex>
<ns0:PageSize>5</ns0:PageSize>
</ns0:Paging>
<ns0:ShowCampaignExtendedInfo>false</ns0:ShowCampaignExtendedInfo>
</ns0:GetCampaignsRequest>
</soap-env:Body>
Much credit to the user in this question, who gave me the boilerplate needed to get this Lovecraftian horror to work: how to specify xsi:type zeep python

How to access xml field with lxml?

Python 3.6, Lxml, Windows 10
I am going crazy. I want to access the item field, but I always get the error:
AttributeError: 'cython_function_or_method' object has no attribute 'item'
Everything else (address fields etc...) I can access without problems. How can I access the item fields (sku, amount etc...)?
I've used this code:
import requests
from lxml import objectify
url = "URL_TO_XML_FILE"
xml_content = requests.get(url).text.encode('utf-8')
xml = objectify.fromstring(xml_content)
for sale in xml.response.sales.sale:
    for item in sale.items.item:
        print(item.sku)
Here is the beginning of the xml:
<?xml version="1.0" encoding="ISO-8859-1"?>
<getnewsalesresult xmlns="https://pmcdn.priceminister.com/res/schema/getnewsales">
<request>
<version>2017-08-07</version>
<user>SELLER</user>
</request>
<response>
<lastversion>2017-08-07</lastversion>
<sellerid>95029358</sellerid>
<sales>
<sale>
<purchaseid>297453287592813953</purchaseid>
<purchasedate>15/12/2018-19:10</purchasedate>
<deliveryinformation>
<shippingtype>Normal</shippingtype>
<isfullrsl>N</isfullrsl>
<purchasebuyerlogin><![CDATA[LOGIN]]></purchasebuyerlogin>
<purchasebuyeremail>EMAIL</purchasebuyeremail>
<deliveryaddress>
<civility>Mme</civility>
<lastname><![CDATA[Lastname]]></lastname>
<firstname><![CDATA[Firstname]]></firstname>
<address1><![CDATA[STREET]]></address1>
<address2><![CDATA[]]></address2>
<zipcode>13570</zipcode>
<city><![CDATA[Paris]]></city>
<country><![CDATA[France]]></country>
<countryalpha2>FX</countryalpha2>
<phonenumber1></phonenumber1>
<phonenumber2>PHONENUMBER</phonenumber2>
</deliveryaddress>
</deliveryinformation>
<items>
<item>
<sku><![CDATA[SKU1]]></sku>
<advertid>411812243030</advertid>
<advertpricelisted>
<amount>15.99</amount>
<currency>EUR</currency>
</advertpricelisted>
<itemid>551131040</itemid>
<headline><![CDATA[HEADLINE]]></headline>
<itemstatus><![CDATA[REQUESTED]]></itemstatus>
<ispreorder>N</ispreorder>
<isnego>N</isnego>
<negotiationcomment></negotiationcomment>
<price>
<amount>15.99</amount>
<currency>EUR</currency>
</price>
<isrsl>N</isrsl>
<isbn></isbn>
<ean>4363745894373857474; </ean>
<paymentstatus><![CDATA[INCOMING]]></paymentstatus>
<sellerscore></sellerscore>
</item>
</items>
</sale>
<sale>
The problem is that items is actually a method of ObjectifiedElement, so the expression sale.items returns the method, which takes precedence over the child element of the same name.
To get the items element you want, you have to ask for the child explicitly instead of letting Python look for methods of the class first, which is its usual lookup order. __getattr__ is what Python falls back on behind the scenes when normal attribute lookup fails, and you can call it directly:
sale.__getattr__('items')
This will also work (it's a dictionary-like interface to the attributes of an object):
sale.__dict__['items']
The revised code:
import requests
from lxml import objectify
url = "URL_TO_XML_FILE"
xml_content = requests.get(url).text.encode('utf-8')
xml = objectify.fromstring(xml_content)
for sale in xml.response.sales.sale:
    for item in sale.__dict__['items'].item:
        print(item.sku)
Another way to deal with this is to avoid using the flaky attribute interface:
for sale in xml['response']['sales']['sale']:
    for item in sale['items']['item']:
        print(item['sku'])
Using the dict-like indexing interface, you never have to worry about certain attribute names (which include such common words as items, index, keys, remove, replace, tag, set, text, and values) returning surprising results.
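The lookup order described above can be reproduced without lxml. The following is a minimal sketch in plain Python; the Node class is a hypothetical stand-in written for illustration, not lxml's ObjectifiedElement, and it only assumes that children are reached through __getattr__ and __getitem__ in the way the answer describes.

```python
class Node:
    """Toy stand-in for an objectified element (hypothetical, not lxml's class)."""

    def __init__(self, **children):
        self._children = children

    def items(self):
        # A regular method, standing in for the method that shadows the child.
        return []

    def __getattr__(self, name):
        # Only called when normal lookup (instance, then class) finds nothing.
        try:
            return self._children[name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, name):
        # Index access goes straight to the children and never sees methods.
        return self._children[name]


sale = Node(items="<items element>", purchaseid="297453287592813953")

print(callable(sale.items))        # the method wins over the child named "items"
print(sale.purchaseid)             # no method of that name, so __getattr__ runs
print(sale["items"])               # index access reaches the child
print(sale.__getattr__("items"))   # calling the fallback directly also reaches it
```

Because the method is found on the class before __getattr__ is ever consulted, only the explicit forms reach a child whose name collides with a method.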

Berkeley XML DB "where" analog

I'm currently studying Berkeley XML DB and got an assignment to write a Python script using it. The problem I'm facing is how to select a specific node in a container. For example, we have a container with this information:
<root>
<lab>
<name>Lab1</name>
<state>Completed</state>
</lab>
<lab>
<name>Lab3</name>
<state>Not completed</state>
</lab>
</root>
How do I select the <lab> element with a specific <name>? In SQL I'd use WHERE Name='Lab1'. Is there any way to do something like that in XML BDB?
I think you are better off getting the old document, copying its data, removing the document, and adding a new one with the modified data.
from dbxml import XmlManager

mgr = XmlManager()
uc = mgr.createUpdateContext()
container = mgr.openContainer("labs.dbxml")  # use your own container name here
qc = mgr.createQueryContext()
document = container.getDocument("Lab11")
name = document.getName()
content = document.getContent()
# Change fields in content here using XPath
# Remove the old document, then store the modified content under the same name
container.deleteDocument(name, uc)
container.putDocument(name, content, uc)
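For the selection itself, the usual XML analog of SQL's WHERE is an XPath predicate. The sketch below uses Python's standard xml.etree.ElementTree on the sample data rather than the Berkeley DB XML bindings, so it only illustrates the predicate syntax; the equivalent path /root/lab[name='Lab1'] is standard XPath, but running it as a query against a container is not shown or verified here.

```python
import xml.etree.ElementTree as ET

xml_text = """
<root>
  <lab><name>Lab1</name><state>Completed</state></lab>
  <lab><name>Lab3</name><state>Not completed</state></lab>
</root>
"""

root = ET.fromstring(xml_text)

# The predicate [name='Lab1'] keeps only <lab> elements whose <name>
# child has exactly that text, much like WHERE Name='Lab1' in SQL.
matches = root.findall("lab[name='Lab1']")

for lab in matches:
    print(lab.findtext("name"), "-", lab.findtext("state"))
```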

Reading/Editing XLIFF using C#

I need to parse an XLIFF file using C#, but I'm having some trouble. These files are fairly complex, containing a huge number of nodes.
Basically, all I need to do is read the source node from each trans-unit node, do some processing on it, and insert the processed text into the corresponding target node (which will always be present, but empty).
An example of one of the nodes I need to parse would be (the whole file may contain 100s of these):
<trans-unit id="0000000002" datatype="text" restype="string">
<source>Windows Update is not installed</source>
<target/>
<iws:segment-metadata tm_score="0.00" ws_word_count="6" max_segment_length="0">
<iws:status target_content="placeholders_only"/>
</iws:segment-metadata>
<iws:boundary-seg sequence="bs20721"/>
<iws:markup-seg sequence="0000000001"/>
</trans-unit>
The trans-unit nodes can be buried deep in the files, and the header section contains a lot of data. I'd like to use LINQ to XML to read the data, but I'm not having any luck getting it to work. Here's my current code (just trying to read and output the source nodes from the file):
XDocument doc = XDocument.Load(path);
Console.WriteLine("Before loop");
foreach (var transUnitNode in doc.Descendants("trans-unit"))
{
    Console.WriteLine("In loop");
    XElement sourceNode = transUnitNode.Element("source");
    XElement targetNode = transUnitNode.Element("target");
    Console.WriteLine("Source: " + sourceNode.Value);
}
I never see 'In loop' and I don't know why. Can someone tell me what I'm doing wrong, or suggest a better way to achieve this?
Thanks.
The elements are in the document's default namespace, so unqualified names match nothing. Try
XNamespace df = doc.Root.Name.Namespace;
foreach (XElement transUnitNode in doc.Descendants(df + "trans-unit"))
{
    XElement sourceNode = transUnitNode.Element(df + "source");
    // and so on; use the df namespace object to qualify any element names
}
See also http://msdn.microsoft.com/en-us/library/bb387093.aspx.
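The same namespace effect can be seen outside of LINQ to XML. Here is a minimal sketch using Python's standard xml.etree.ElementTree; the sample document is cut down from the question, the namespace URI is assumed to be the XLIFF 1.2 one, and the upper-casing is only a placeholder for whatever processing the source text needs.

```python
import xml.etree.ElementTree as ET

# Assumption: the file declares the XLIFF 1.2 default namespace.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

xml_text = f"""<xliff version="1.2" xmlns="{XLIFF_NS}">
  <file original="sample" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="0000000002">
        <source>Windows Update is not installed</source>
        <target/>
      </trans-unit>
    </body>
  </file>
</xliff>"""

root = ET.fromstring(xml_text)

# Unqualified name: matches nothing, because every element is in the namespace.
unqualified = list(root.iter("trans-unit"))

# Qualified name, written in ElementTree's {uri}local form.
qualified = list(root.iter(f"{{{XLIFF_NS}}}trans-unit"))

for unit in qualified:
    source = unit.find(f"{{{XLIFF_NS}}}source")
    target = unit.find(f"{{{XLIFF_NS}}}target")
    target.text = source.text.upper()  # placeholder processing step

print(len(unqualified), len(qualified))
```

The empty result for the unqualified name corresponds to the loop in the question never running.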

Resources