Mapping excel to XML - Problem importing XML-fields - excel

I have a problem mapping XML elements to an existing Excel table.
I have a sample XML file from the Swedish tax authority (Skatteverket) that follows their XML schema:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Skatteverket xmlns="http://xmls.skatteverket.se/se/skatteverket/ai/instans/infoForBeskattning/4.0"
xmlns:gm="http://xmls.skatteverket.se/se/skatteverket/ai/gemensamt/infoForBeskattning/4.0"
xmlns:ku="http://xmls.skatteverket.se/se/skatteverket/ai/komponent/infoForBeskattning/4.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" omrade="Kontrolluppgifter"
xsi:schemaLocation="http://xmls.skatteverket.se/se/skatteverket/ai/instans/infoForBeskattning/4.0
http://xmls.skatteverket.se/se/skatteverket/ai/kontrolluppgift/instans/Kontrolluppgifter_4.0.xsd ">
<ku:Avsandare>
<ku:Programnamn>KUfilsprogrammet</ku:Programnamn>
<ku:Organisationsnummer>162234567895</ku:Organisationsnummer>
<ku:TekniskKontaktperson>
<ku:Namn>Bo Ek</ku:Namn>
<ku:Telefon>+46881234567</ku:Telefon>
<ku:Epostadress>bo.ek@elbolagetab.se</ku:Epostadress>
<ku:Utdelningsadress1>Strömgatan 11</ku:Utdelningsadress1>
<ku:Postnummer>62145</ku:Postnummer>
<ku:Postort>Strömby</ku:Postort>
</ku:TekniskKontaktperson>
<ku:Skapad>2015-06-07T21:32:52</ku:Skapad>
</ku:Avsandare>
<ku:Blankettgemensamt>
<ku:Uppgiftslamnare>
<ku:UppgiftslamnarePersOrgnr>165599990602</ku:UppgiftslamnarePersOrgnr>
<ku:Kontaktperson>
<ku:Namn>John Ström</ku:Namn>
<ku:Telefon>+46812345678</ku:Telefon>
<ku:Epostadress>siv.strom@elbolagetab.se</ku:Epostadress>
<ku:Sakomrade>Förnybar el</ku:Sakomrade>
</ku:Kontaktperson>
</ku:Uppgiftslamnare>
</ku:Blankettgemensamt>
<!-- Kontrolluppgift 1 -->
<ku:Blankett nummer="2350">
<ku:Arendeinformation>
<ku:Arendeagare>165599990602</ku:Arendeagare>
<ku:Period>2018</ku:Period>
</ku:Arendeinformation>
<ku:Blankettinnehall>
<ku:KU66>
<ku:UppgiftslamnareKU66>
<ku:UppgiftslamnarId faltkod="201">165599990602</ku:UppgiftslamnarId>
<ku:NamnUppgiftslamnare faltkod="202">Sonjas elhandel</ku:NamnUppgiftslamnare>
</ku:UppgiftslamnareKU66>
<ku:Inkomstar faltkod="203">2018</ku:Inkomstar>
<ku:KWhMatatsIn faltkod="270">3622</ku:KWhMatatsIn>
<ku:KWhTagitsUt faltkod="271">4822</ku:KWhTagitsUt>
<ku:AnlaggningsID faltkod="272">735999123456789012</ku:AnlaggningsID>
<ku:AndelIAnslPunkt faltkod="273">12.5</ku:AndelIAnslPunkt>
<ku:Specifikationsnummer faltkod="570">128</ku:Specifikationsnummer>
<ku:InkomsttagareKU66>
<ku:Inkomsttagare faltkod="215">193804139149</ku:Inkomsttagare>
</ku:InkomsttagareKU66>
</ku:KU66>
</ku:Blankettinnehall>
</ku:Blankett>
</Skatteverket>
When I use Excel (Developer tab -> XML -> Source) and add the file, I don't get the XML elements inside the tag
<ku:Blankettinnehall>
Is there any reason why Excel would skip these XML elements?
Here is some sample Excel table data that I would like to map to those XML fields:
AnlaggningsID       Inkomsttagare  Inkomstar  KWhMatatsIn  KWhTagitsUt  AndelIAnslPunkt  Specifikationsnummer
526009875445385000  190101019999   2018       50078,0      88462,0                       1
258655985101244000  190201019999   2018       75,0         4615,0                        2
112855269388863000  190301019999   2018       16687,0      19870,0      42               3
364615095294089000  190401019999   2018       16687,0      19870,0      58               4
534980084130649000  190501019999   2018       174,0        7009,0                        5

It looks like you're missing the actual data itself. The top half of the file describes the sender and contact details; the data section (Blankettinnehall) comes later.
So in your Excel sheet I would expect rows with columns for each of the header and sender details. This may be what's missing.
You can see this if you take a sample file from them and view it in Excel.
I struggled with KU52 last year and ended up writing a C# application to generate the XML file.
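If generating the file in code is an option, as it was for the C# application mentioned above, the data section can be built row by row. The following is only a minimal Python sketch, not Skatteverket's reference implementation: the element order and faltkod values are copied from the sample file in the question, while the helper names (ku, field, build_ku66), the rule that blank cells are omitted, and the assumption that decimal-comma values such as "50078,0" have already been converted are all mine.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the "ku" prefix in the sample file from the question.
KU_NS = "http://xmls.skatteverket.se/se/skatteverket/ai/komponent/infoForBeskattning/4.0"
ET.register_namespace("ku", KU_NS)

# (column name, faltkod) pairs in the element order used by the sample file.
KU66_FIELDS = [
    ("Inkomstar", "203"),
    ("KWhMatatsIn", "270"),
    ("KWhTagitsUt", "271"),
    ("AnlaggningsID", "272"),
    ("AndelIAnslPunkt", "273"),
    ("Specifikationsnummer", "570"),
]


def ku(tag):
    """Return the namespace-qualified name ElementTree expects."""
    return f"{{{KU_NS}}}{tag}"


def field(parent, tag, faltkod, value):
    """Append <ku:tag faltkod="...">value</ku:tag> to parent."""
    element = ET.SubElement(parent, ku(tag), faltkod=faltkod)
    element.text = str(value)


def build_ku66(row, reporter_id, reporter_name):
    """Build one <ku:KU66> element from a dict representing one table row."""
    ku66 = ET.Element(ku("KU66"))
    reporter = ET.SubElement(ku66, ku("UppgiftslamnareKU66"))
    field(reporter, "UppgiftslamnarId", "201", reporter_id)
    field(reporter, "NamnUppgiftslamnare", "202", reporter_name)
    for column, faltkod in KU66_FIELDS:
        value = row.get(column)
        if value not in (None, ""):  # assumption: blank cells are left out
            field(ku66, column, faltkod, value)
    recipient = ET.SubElement(ku66, ku("InkomsttagareKU66"))
    field(recipient, "Inkomsttagare", "215", row["Inkomsttagare"])
    return ku66


# First row of the table in the question, with kWh values already converted.
row = {
    "AnlaggningsID": "526009875445385000",
    "Inkomsttagare": "190101019999",
    "Inkomstar": "2018",
    "KWhMatatsIn": "50078",
    "KWhTagitsUt": "88462",
    "AndelIAnslPunkt": "",
    "Specifikationsnummer": "1",
}
xml_text = ET.tostring(
    build_ku66(row, "165599990602", "Sonjas elhandel"), encoding="unicode"
)
print(xml_text)
```

Each KU66 element would still need to be wrapped in Blankett and Blankettinnehall, with the Avsandare and Blankettgemensamt sections added once per file, following the sample.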

Related

How to create an XML file from an Excel file

Friends, I have not done this before and am looking for help from anyone who has. I have one Excel file with multiple rows and columns.
I want to export the data to XML as below:
Case Number  AER_No.    P_No.        PROD1    PROD2     EVENT1  EVENT2
004089652    202211-01  80000231204  TYLONEL  PREVNAR2  FEVER   RASH
Expected output:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ichicsr SYSTEM "http://eudravigilance.ema.europa.eu/dtd/icsr21xml.dtd">
<ichicsr lang="en">
<ichicsrmessageheader>
<CASENUMBER>004089652</CASENUMBER>
<AER_NO>202211-01</AER_NO>
<P_NO>80000231204</P_NO>
<PROD1>TYLONEL</PROD1>
<PROD2>PREVNAR2</PROD2>
<EVENT1>FEVER</EVENT1>
<EVENT2>RASH</EVENT2>
</ichicsrmessageheader>
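One possible approach, sketched in Python with only the standard library: save the worksheet as CSV, then turn each row into an ichicsrmessageheader element. This is a sketch under assumptions, not a validated ICSR export: the CSV step, the helper names header_to_tag and rows_to_xml, the rule for deriving tag names from column headers, and the choice of one ichicsrmessageheader per row are all guesses based on the expected output above.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# DOCTYPE line taken from the expected output; ElementTree cannot emit it,
# so it is prepended as plain text.
DOCTYPE = '<!DOCTYPE ichicsr SYSTEM "http://eudravigilance.ema.europa.eu/dtd/icsr21xml.dtd">'


def header_to_tag(header):
    """Turn a column header such as 'AER_No.' into a tag name such as 'AER_NO'."""
    return re.sub(r"[^A-Za-z0-9_]", "", header).upper()


def rows_to_xml(csv_text):
    """Build the XML document as a string from CSV text with a header row."""
    root = ET.Element("ichicsr", lang="en")
    for row in csv.DictReader(io.StringIO(csv_text)):
        message = ET.SubElement(root, "ichicsrmessageheader")
        for header, value in row.items():
            ET.SubElement(message, header_to_tag(header)).text = value
    body = ET.tostring(root, encoding="unicode")
    return f'<?xml version="1.0" encoding="UTF-8"?>\n{DOCTYPE}\n{body}'


# The sample row from the question, as it would look after "Save as CSV".
sample_csv = (
    "Case Number,AER_No.,P_No.,PROD1,PROD2,EVENT1,EVENT2\n"
    "004089652,202211-01,80000231204,TYLONEL,PREVNAR2,FEVER,RASH\n"
)
xml_text = rows_to_xml(sample_csv)
print(xml_text)
```

Reading the values as text keeps leading zeros such as 004089652 intact, which a numeric cell type would drop.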

Concatenate long strings from multiple records into one string

I have a situation where I need to concatenate long strings from multiple records in an Oracle database into a single string. These long strings are portions of a larger XML string, and my ultimate goal is to be able to convert this XML into something resembling query results and pull out specific values.
The data would look something like this, with the MSG_LINE_TEXT field being VARCHAR2(4000). So if the total message is less than 4000 characters, then there'd only be one record. In theory, there could be an infinite number of records for each message, although the highest I've seen so far is 14 records, which means I need to be able to handle strings that are at least 56000 characters long.
MESSAGE_ID  MSG_LINE_NUMBER  MSG_LINE_TEXT
----------  ---------------  --------------------------------
17415414    1                Some XML snippet here
17415414    2                Some XML snippet here
17415414    3                Some XML snippet here
17415414    4                Some XML snippet here
The total XML for one MESSAGE_ID might look something like this. There could be many App_Advice_Error tags, although this specific example only contains one.
<tXML>
<Header>
<Source>MANH_prod_wmsweb</Source>
<Action_Type />
<Sequence_Number />
<Company_ID>1</Company_ID>
<Msg_Locale />
<Version />
<Internal_Reference_ID>17415414</Internal_Reference_ID>
<Internal_Date_Time_Stamp>2021-02-09 13:45:22</Internal_Date_Time_Stamp>
<External_Reference_ID />
<External_Date_Time_Stamp />
<User_ID>ESBUSER</User_ID>
<Message_Type>RESPONSE</Message_Type>
</Header>
<Response>
<Persistent_State>0</Persistent_State>
<Error_Type>2</Error_Type>
<Resp_Code>501</Resp_Code>
<Response_Details>
<Application_Advice>
<Shipper_ID />
<Imported_Object_Type>ASN</Imported_Object_Type>
<Response_Type>Error</Response_Type>
<Transaction_Date>2/9/21 13:45</Transaction_Date>
<Application_Ackg_Code>TE</Application_Ackg_Code>
<Business_Unit></Business_Unit>
<Tran_Set_Identifier_Code></Tran_Set_Identifier_Code>
<Transaction_Purpose_Code>11</Transaction_Purpose_Code>
<Imported_Message_Id></Imported_Message_Id>
<Imported_Object_Id>Reference Number Here</Imported_Object_Id>
<Additional_References>
<Additional_Reference_Info>
<Reference_Type>BusinessPartner</Reference_Type>
<Reference_ID></Reference_ID>
</Additional_Reference_Info>
</Additional_References>
<App_Advice_Errors>
<App_Advice_Error>
<App_Error_Text>Some error text here</App_Error_Text>
<Error_Message_Tokens>
<Error_Message_Token>Object that errored out</Error_Message_Token>
</Error_Message_Tokens>
<App_Err_Cond_Code>6100234</App_Err_Cond_Code>
</App_Advice_Error>
</App_Advice_Errors>
<Imported_Data></Imported_Data>
</Application_Advice>
</Response_Details>
</Response>
</tXML>
The values that I'm most interested in pulling out are the App_Err_Cond_Code, Error_Message_Token, and App_Error_Text tags. I had tried using something like this:
extractvalue(xmltype(msg_line_text), '//XPath of Tag')
This works beautifully for stuff where the entire XML is less than 4000 characters, i.e. the entire XML is stored in a single record. The problem comes when there are multiple records, because each individual snippet of XML isn't a valid XML string on its own, and so XMLTYPE throws an error, hence the reason I'm trying to concatenate them all into a single string, which I can then use with the above method.
I've tried a variety of ways to do this - LISTAGG, XMLAGG, SYS_CONNECT_BY_PATH, as well as writing a custom function something like this:
with
    function get_messages(pTranLogID number) return string
    is
        xml varchar2;
    begin
        xml := '';
        for msg in (
            select r.msg_line_text
            from tran_log_response_message r, tran_log t
            where
                t.message_id = r.message_id
                and t.tran_log_id = pTranLogID
            order by r.msg_line_number
        )
        loop
            xml := xml || msg.msg_line_text;
        end loop;
        return 'test';
    end;
select
    tran_log_id, get_messages(tran_log_id)
from
    tran_log
where
    tran_log_id = '20633610';
/
The problem is that every one of these methods complained that the string was too long. Does anyone have any other ideas? Or maybe a better approach to this problem?
Thanks.
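If doing the work outside the database is acceptable, one alternative is to fetch the MSG_LINE_TEXT rows with whatever client you already use, join them in MSG_LINE_NUMBER order, and parse the result there, which sidesteps the VARCHAR2 length limit. The Python sketch below is only an illustration of that idea, not an Oracle-side solution: the function names assemble_message and extract_errors are hypothetical, the rows are hard-coded stand-ins for a query result, and the sample XML is trimmed to the elements of interest.

```python
import xml.etree.ElementTree as ET


def assemble_message(rows):
    """Join (msg_line_number, msg_line_text) pairs into one XML string."""
    ordered = sorted(rows, key=lambda pair: pair[0])
    return "".join(text for _, text in ordered)


def extract_errors(xml_text):
    """Return one dict per App_Advice_Error element in the message."""
    root = ET.fromstring(xml_text)
    errors = []
    for error in root.iter("App_Advice_Error"):
        errors.append(
            {
                "code": error.findtext("App_Err_Cond_Code"),
                "text": error.findtext("App_Error_Text"),
                "tokens": [t.text for t in error.iter("Error_Message_Token")],
            }
        )
    return errors


# Stand-in for the query result: chunks arrive out of order and are split
# in the middle of tags, so no single chunk is valid XML on its own.
rows = [
    (2, "xt>Some error text here</App_Error_Text><Error_Message_Tokens>"
        "<Error_Message_Token>Object that errored out</Error_Mes"),
    (1, "<tXML><Response><Response_Details><Application_Advice>"
        "<App_Advice_Errors><App_Advice_Error><App_Error_Te"),
    (3, "sage_Token></Error_Message_Tokens>"
        "<App_Err_Cond_Code>6100234</App_Err_Cond_Code></App_Advice_Error>"
        "</App_Advice_Errors></Application_Advice></Response_Details>"
        "</Response></tXML>"),
]
errors = extract_errors(assemble_message(rows))
print(errors)
```

Because the chunks are only parsed after being joined, it does not matter where the 4000-character boundaries fall.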

Reading CDATA with lxml, problem with end of line

Hello, I am parsing an XML document which contains a bunch of CDATA sections. It was working with no problems until now. I realised that when I read an element and get its text attribute, I get end-of-line characters at the beginning and at the end of the text.
The relevant piece of code is as follows:
for comments in self.xml.iter("Comments"):
    for comment in comments.iter("Comment"):
        description = comment.get('Description')
        if language == "Arab":
            tag = self.name + description
            text = comment.text
The problem is with the Comment element, which is written as follows:
<Comment>
<![CDATA[Usually made it with not reason]]>
</Comment>
When I get the text attribute, I get this:
\nUsually made it with not reason\n
I know that I could strip the text, but I would like to fix the root cause; maybe there is some option to set before parsing with ElementTree.
I parse the XML file like this:
tree = ET.parse(xml)
Minimal reproducible example:
import xml.etree.ElementTree as ET
filename = "test.xml"  # path to your test XML file
tree = ET.parse(filename)
root = tree.getroot()  # in the minimal file below, the root is <Description> itself
text = root.text
print(text)
Minimal xml file
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Description>
<![CDATA[Hello world]]>
</Description>
You're getting newline characters because there are newline characters:
<Comment>
<![CDATA[Usually made it with not reason]]>
</Comment>
Why else would <![CDATA and </Comment start on new lines?
If you don't want newline characters, remove them:
<Comment><![CDATA[Usually made it with not reason]]></Comment>
Everything inside an element counts towards its string value.
<![CDATA[...]]> is not an element, it's a parser flag. It changes how the XML parser is reading the enclosed characters. You can have multiple CDATA sections in the same element, switching between "regular mode" and "cdata mode" at will:
<Comment>normal text <![CDATA[
CDATA mode, this may contain <unescaped> Characters!
]]> now normal text again
<![CDATA[more special text]]> now normal text again
</Comment>
Any newlines before and after a CDATA section count towards the "normal text" section. When the parser reads this, it will create one long string consisting of the individual parts:
normal text
CDATA mode, this may contain <unescaped> Characters!
now normal text again
more special text now normal text again
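The merging behaviour described above can be checked with a short script. This is a small illustration using the standard library's xml.etree.ElementTree and the Description sample from the question; the variable names padded and compact are mine.

```python
import xml.etree.ElementTree as ET

# Same content, with and without line breaks around the CDATA section.
padded = ET.fromstring("<Description>\n<![CDATA[Hello world]]>\n</Description>")
compact = ET.fromstring("<Description><![CDATA[Hello world]]></Description>")

# The parser merges the CDATA content with the surrounding character data,
# so the line breaks become part of the element's text.
print(repr(padded.text))   # '\nHello world\n'
print(repr(compact.text))  # 'Hello world'

# Stripping after parsing gives the same value in both cases.
print(repr(padded.text.strip()))  # 'Hello world'
```

The parsed tree keeps no record of where the CDATA section began or ended, so the choice is between changing the source file and stripping after parsing.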
I thought that CDATA sections in XML always came with a line break before and after them, like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Description>
<![CDATA[Hello world]]>
</Description>
But they can also be written like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Description><![CDATA[Hello world]]></Description>
That is why you get end-of-line characters when parsing with the ElementTree library. It works correctly in both cases; you only have to decide whether to strip the text, depending on how you want to process the data.
If you want to remove both '\n' characters, just add the following code:
text = Description.text
text = text.strip('\n')

Get the index of xml element according to its attribute / Python

I need to find the index (position) of an XML element with a certain attribute and namespace. My XML contains several elements with the same name, so the only way to identify the right one is by its attribute.
This is a sample of my XML document:
<mets:mets LABEL="Moderní pedagogika, 2002" TYPE="Monograph"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:mets="http://www.loc.gov/METS/"
xmlns:mods="http://www.loc.gov/mods/v3"
xmlns:ns3="http://www.openarchives.org/OAI/2.0/oai_dc/"
xmlns:ns5="http://www.w3.org/1999/xlink"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/XMLSchema-instance http://www.w3.org/2001/XMLSchema.xsd http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-4.xsd http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd http://www.w3.org/1999/xlink http://www.w3.org/1999/xlink.xsd">
<mets:metsHdr CREATEDATE="2012-12-05T07:42:22" LASTMODDATE="2012-12-05T07:42:22">
<mets:agent ROLE="CREATOR" TYPE="ORGANIZATION">
<mets:name>ABA001</mets:name>
</mets:agent>
<mets:agent ROLE="ARCHIVIST" TYPE="ORGANIZATION">
<mets:name>ABA001</mets:name>
</mets:agent>
</mets:metsHdr>
<mets:dmdSec ID="MODSMD_VOLUME_0001">
.....
</mets:dmdSec>
<mets:dmdSec ID="DCMD_VOLUME_0001">
.....
</mets:dmdSec>
</mets:mets>
The desired index in this case is the index of the tag <mets:dmdSec ID="MODSMD_VOLUME_0001">.
I have tried solutions based on list(root).index(dmdSec), but without success, since I do not know how to specify the attribute and namespace there.
Could someone help me with this?
I'm assuming that you are using the lxml.etree library for XML parsing. If not, you may have to modify things a bit, but the principle is the same.
Simply use:
from lxml import etree
root = etree.parse(r'path\to\your\file.xml')
int(root.xpath('count(//*[@ID="MODSMD_VOLUME_0001"]/preceding-sibling::*)+1'))
Output:
2
Note that the position is 2 and not 1: XPath counts from 1 (unlike Python, which counts from 0). Your target is the second child of the root, after <mets:metsHdr>, even though it is the first <mets:dmdSec>.
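For a 0-based index with the standard library instead of lxml, the list(root).index(...) idea from the question can be made to work by first locating the element with a namespace-aware find. This is a minimal sketch, assuming the target is a direct child of the root as in the sample; the METS_NS mapping name and the trimmed sample document are mine.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping used only for the search expression.
METS_NS = {"mets": "http://www.loc.gov/METS/"}

# Trimmed version of the document from the question.
sample = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:metsHdr/>
  <mets:dmdSec ID="MODSMD_VOLUME_0001"/>
  <mets:dmdSec ID="DCMD_VOLUME_0001"/>
</mets:mets>"""

root = ET.fromstring(sample)

# Select the child by tag and attribute value, then look up its position
# among all children of the root.
target = root.find("mets:dmdSec[@ID='MODSMD_VOLUME_0001']", METS_NS)
index = list(root).index(target)
print(index)  # 1
```

The result is 1 rather than 2 because Python lists count from 0, while the XPath count above is 1-based.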

Swift String encoding and NSXMLParser parsing issues

My App is calling the free Weather Forecast web service found at this URL:
http://www.webservicex.net/globalweather.asmx/GetWeather?CityName=Boston&CountryName=United+States
I'm using the usual NSURLConnection and NSXMLParser delegate methods to parse the incoming data (I've done this a million times before), but quite strangely, the NSMutableData that is returned is not getting converted to a string correctly via NSUTF8StringEncoding. It's basically failing to convert the "<" and ">" characters of the opening and closing XML tags, giving me "&lt;" and "&gt;" instead.
The problem seems to be in the connectionDidFinishLoading function:
func connection(connection: NSURLConnection, didReceiveData data: NSData) {
    webServiceData!.appendData(data)
}
func connectionDidFinishLoading(connection: NSURLConnection) {
    let XMLResponseString = NSString(data: webServiceData!, encoding: NSUTF8StringEncoding)!
    println("XMLResponseString = \(XMLResponseString)")
}
The output I get from the println statement there is:
<?xml version="1.0" encoding="utf-8"?>
<string xmlns="http://www.webserviceX.NET">&lt;?xml version="1.0" encoding="utf-16"?&gt;
&lt;CurrentWeather&gt;
&lt;Location&gt;DALLAS EXECUTIVE AIRPORT, TX, United States (KRBD) 32-41N 096-52W 203M&lt;/Location&gt;
&lt;Time&gt;Dec 30, 2014 - 08:53 AM EST / 2014.12.30 1353 UTC&lt;/Time&gt;
&lt;Wind&gt; from the NE (050 degrees) at 12 MPH (10 KT):0&lt;/Wind&gt;
&lt;Visibility&gt; 9 mile(s):0&lt;/Visibility&gt;
&lt;SkyConditions&gt; overcast&lt;/SkyConditions&gt;
&lt;Temperature&gt; 39.9 F (4.4 C)&lt;/Temperature&gt;
&lt;DewPoint&gt; 34.0 F (1.1 C)&lt;/DewPoint&gt;
&lt;RelativeHumidity&gt; 79%&lt;/RelativeHumidity&gt;
&lt;Pressure&gt; 30.42 in. Hg (1030 hPa)&lt;/Pressure&gt;
&lt;Status&gt;Success&lt;/Status&gt;
&lt;/CurrentWeather&gt;</string>
So as you can see, I'm getting the first 2 tags correctly (the "<?xml ...?>" and "<string xmlns=...>" tags), but the rest are all showing up as "&lt;" and "&gt;".
What's really strange is that it says encoding="utf-8" for the first tag, but on the second line (towards the end) it says encoding="utf-16".
So I tried using NSUTF16StringEncoding:
let XMLResponseString = NSString(data: webServiceData!, encoding: NSUTF16StringEncoding)!
and that basically gave me Chinese-looking characters.
I also tried running the parser directly on the url instead of the NSMutableData that's returned, like so:
myXMLParser = NSXMLParser(contentsOfURL:theURL!)!
The original statement was this:
myXMLParser = NSXMLParser(data:webServiceData)
but neither of these worked.
So what's going on here? Any suggestions on how to get this to work properly?
This is actually the remote service being broken, rather than your code. Yes, the server really is sending XML in XML for no particularly good reason.
$ curl 'http://www.webservicex.net/globalweather.asmx/GetWeather?CityName=Boston&CountryName=United+States'
<?xml version="1.0" encoding="utf-8"?>
<string xmlns="http://www.webserviceX.NET">&lt;?xml version="1.0" encoding="utf-16"?&gt;
&lt;CurrentWeather&gt;
&lt;Location&gt;BOSTON LOGAN INTERNATIONAL, MA, United States (KBOS) 42-22N 071-01W 54M&lt;/Location&gt;
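Since the service wraps an escaped XML document inside a string element, one way to cope is to parse twice: once for the outer envelope, then again for the text it contains. The sketch below shows the idea in Python rather than Swift, purely as an illustration. The response literal is a shortened, hand-written stand-in for the real payload, and dropping the inner declaration before the second pass is my own precaution, because the unescaped text is already a decoded string and its utf-16 label no longer applies.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the service response: the inner document is
# entity-escaped text inside the outer <string> element.
response = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<string xmlns="http://www.webserviceX.NET">'
    '&lt;?xml version="1.0" encoding="utf-16"?&gt;\n'
    '&lt;CurrentWeather&gt;\n'
    '  &lt;Status&gt;Success&lt;/Status&gt;\n'
    '&lt;/CurrentWeather&gt;</string>'
)

# First pass: parse the envelope. The parser unescapes &lt; and &gt;,
# so the element's text is the inner XML document as a plain string.
outer = ET.fromstring(response.encode("utf-8"))
inner_xml = outer.text

# Drop the inner XML declaration before parsing the decoded string.
if inner_xml.lstrip().startswith("<?xml"):
    inner_xml = inner_xml.split("?>", 1)[1]

# Second pass: parse the unescaped inner document.
inner = ET.fromstring(inner_xml)
status = inner.findtext("Status")
print(status)  # Success
```

The same two-pass idea should carry over to NSXMLParser: collect the characters found inside the string element, then run a second parser over that collected text.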
