I am having trouble extracting "EXTRACT_THIS_PLEASE" from an XML file like the one below using xmllint --xpath. From some Googling, I understand that sed and awk should not be used for this. I also see that other XML parsers are usually recommended, but xmllint is the only one I seem to have on my RHEL system. I have tried various things and believe the issue has to do with whitespace.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<model-response-list xmlns="http://www.website.com/thing/link/linktothing/linklink" total-models="1" throttle="1" error="EndOfResults">
<model-responses>
<model mh="0x12345678">
<attribute id="0x12345">EXTRACT_THIS_PLEASE</attribute>
</model>
</model-responses>
</model-response-list>
EDIT: kjhughes and j_b, you guys are both wizards. Thank you so much. Could I also extract 0x12345678 from "<model mh="0x12345678">"? I am looking to do this 5000+ times and ultimately have a list of devices in rows or columns like this:
"0x12345678
EXTRACT_THIS_PLEASE
0x99999999
EXTRACT_THIS_PLEASE
0x11111111
NOTHING
0x33333333
EXTRACT_THIS_PLEASE
0x22222222
NOTHING"
This xmllint command line,
xmllint --xpath "//*[@id='0x12345']/text()" file.xml
will select
EXTRACT_THIS_PLEASE
as requested.
See also
Daniel Haley's answer showing how to use XML namespaces in xmllint.
Another option to extract the contents of the <attribute> element:
xmllint --xpath "//*[name()='attribute']/text()" x.xml
Output:
EXTRACT_THIS_PLEASE
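For the follow-up in the question's edit (pairing each model's mh value with its attribute text, or NOTHING when the attribute is absent), a single XPath text() query is awkward. The following is only a minimal sketch, assuming Python 3 with its standard library is available on the system; the function name model_rows, the two-model SAMPLE document, and the "NOTHING" fallback rule are illustrative assumptions, not part of the original answers.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the xmlns attribute of the sample document.
NS = {"m": "http://www.website.com/thing/link/linktothing/linklink"}

# Hypothetical input: the question's sample extended with a second model
# that lacks the attribute, to exercise the "NOTHING" case.
SAMPLE = """<model-response-list xmlns="http://www.website.com/thing/link/linktothing/linklink" total-models="2" throttle="2" error="EndOfResults">
<model-responses>
<model mh="0x12345678">
<attribute id="0x12345">EXTRACT_THIS_PLEASE</attribute>
</model>
<model mh="0x99999999">
</model>
</model-responses>
</model-response-list>"""


def model_rows(xml_text, attr_id="0x12345"):
    """Yield (mh, value) for every <model>; value is "NOTHING" if the attribute is missing or empty."""
    root = ET.fromstring(xml_text)
    for model in root.iterfind(".//m:model", NS):
        attr = model.find(f"m:attribute[@id='{attr_id}']", NS)
        value = attr.text if attr is not None and attr.text else "NOTHING"
        yield model.get("mh"), value


# Print the mh value and the extracted text on alternating lines.
for mh, value in model_rows(SAMPLE):
    print(mh)
    print(value)
```

Unlike the //* and name() expressions above, this sketch names the default namespace explicitly through the NS mapping, so it only matches elements in that namespace.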
I'm trying to export a spreadsheet that has some XML in some of the cells of the table.
ID (column A): 23455
FACT (column B) (this code is copied & pasted from a sample cell - they don't all have this simplicity or structure):
"<div class=""fact"">
<p><strong>FACT.</strong> The closest star to our solar system is Alpha Centauri.</p>
</div>
"
I'd like to have XML like the following:
<record>
<ID>23455</ID>
<FACT><div class="fact"><p><strong>FACT.</strong> The closest star to our solar system is Alpha Centauri.</p></div></FACT>
</record>
This is complex enough that I doubt Excel's native XML schema export will work (that thing is persnickety enough that I can't get it to work with even the simplest of data values).
My current thought is to write a Perl script, to read this as a CSV file and export XML. However, I've noticed that CSV does a poor job handling XML that's been "embedded" like this.
I'm hoping someone else might have a better suggestion for how to pull this information out.
Edit: Finally figured out the mistake I made with export. Can export and get the following:
<record>
<ID>23455</ID>
<FACT>&lt;div class="fact"&gt;&lt;p&gt;&lt;strong&gt;FACT.&lt;/strong&gt; The closest star to our solar system is Alpha Centauri.&lt;/p&gt;&lt;/div&gt;
</FACT>
</record>
I think I can work with this... some regex and it might be good enough (looking for every &lt; might put me at risk of clobbering a true less-than sign).
So I'm still open to suggestions.
Just posting this as the answer...
If you export the column as text you can get the following:
<record>
<ID>23455</ID>
<FACT>&lt;div class="fact"&gt;&lt;p&gt;&lt;strong&gt;FACT.&lt;/strong&gt; The closest star to our solar system is Alpha Centauri.&lt;/p&gt;&lt;/div&gt;
</FACT>
</record>
In an XML editor I did a find and replace to restore all the tags using the following regex: s/&lt;(\/?[\w\s="-_]+?)&gt;/<$1>/
It's a bit dangerous if there are actual less-than signs in the document, but to be caught, a real &lt; would have to be followed by text made up only of common tag characters (letters, whitespace, ="-_) and then a &gt;. That is possible, but most inequalities are of the form X < Y < Z. Our content doesn't use < and > all that much, so I can be fairly confident it won't hit the edge case.
I also "fixed" all the HTML (s/<b>/<b/>/ and s/<img (.*?)>/<img $1/>/) and checked parsing (theoretically an edge case would cause a parsing error).
And yes, I now have a doc in mixed DTD that will make all true XML peeps quake with horror, but I can work with it.
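As an alternative to post-processing the export with regexes, the CSV route mentioned in the question can work if a real CSV parser handles the doubled quotes and embedded newlines. This is only a sketch under stated assumptions: the column headers ID and FACT, the wrapper element name records, the function name rows_to_xml, and the fallback for cells that are not well-formed XML are all hypothetical choices made for illustration, and Python's standard library stands in for the Perl script the question had in mind.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical CSV export with two columns; embedded quotes are doubled,
# exactly as in the sample cell from the question.
CSV_TEXT = '''ID,FACT
23455,"<div class=""fact"">
<p><strong>FACT.</strong> The closest star to our solar system is Alpha Centauri.</p>
</div>
"
'''


def rows_to_xml(csv_text):
    """Build <records><record><ID/><FACT/></record>...</records> from CSV text."""
    root = ET.Element("records")  # hypothetical wrapper element
    # newline="" lets the csv module handle line breaks inside quoted cells.
    for row in csv.DictReader(io.StringIO(csv_text, newline="")):
        record = ET.SubElement(root, "record")
        ET.SubElement(record, "ID").text = row["ID"]
        fact = ET.SubElement(record, "FACT")
        try:
            # Parse the cell as markup so it becomes real child elements.
            fact.append(ET.fromstring(row["FACT"].strip()))
        except ET.ParseError:
            # Not well-formed XML: keep the cell as (escaped) text instead.
            fact.text = row["FACT"]
    return root


print(ET.tostring(rows_to_xml(CSV_TEXT), encoding="unicode"))
```

Because the cell is parsed rather than pattern-matched, a genuine less-than sign in a well-formed cell would have to be escaped already, and a malformed cell falls back to plain text instead of producing broken markup.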
I have multiple blocks matching the pattern below:
<APPLIANCE>
<ID>12233</ID>
<UUID>xxxx-xxxx-xxxx-xxxx-xxxxxxx</UUID>
<NAME>xxxxxxx</NAME>
<STATUS>Offline</STATUS>
</APPLIANCE>
<APPLIANCE>
<ID>12234</ID>
<UUID>xxxx-xxxx-xxxx-xxxx-xxxxxxx</UUID>
<NAME>yyyyy</NAME>
<STATUS>Offline</STATUS>
</APPLIANCE>
I want to extract the block with a particular ID and display its ID and name.
For example, the output should be:
<ID>12234</ID>
<NAME>yyyyy</NAME>
I would like to do this using grep, sed, or awk.
Thanks.
This sed should work for you:
sed -n '/<ID>12234/,/<NAME>/{//p}' file
But you would be better off using an XML parser such as xmllint or xmlstarlet to parse valid XML files.
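To illustrate the parser-based route, here is a minimal sketch in Python's standard library rather than xmllint or xmlstarlet (a swapped-in tool, chosen only because it is easy to show end to end). The name find_appliance, the FRAGMENT constant, and the synthetic ROOT wrapper are assumptions: the sample blocks have no common parent element, so one is added before parsing.

```python
import xml.etree.ElementTree as ET

# The two sample blocks from the question, with no enclosing root element.
FRAGMENT = """<APPLIANCE>
<ID>12233</ID>
<UUID>xxxx-xxxx-xxxx-xxxx-xxxxxxx</UUID>
<NAME>xxxxxxx</NAME>
<STATUS>Offline</STATUS>
</APPLIANCE>
<APPLIANCE>
<ID>12234</ID>
<UUID>xxxx-xxxx-xxxx-xxxx-xxxxxxx</UUID>
<NAME>yyyyy</NAME>
<STATUS>Offline</STATUS>
</APPLIANCE>"""


def find_appliance(fragment, wanted_id):
    """Return the serialized <ID> and <NAME> of the matching block, or None."""
    # Wrap the sibling blocks in a synthetic root so they parse as one document.
    root = ET.fromstring(f"<ROOT>{fragment}</ROOT>")
    for appliance in root.iter("APPLIANCE"):
        if appliance.findtext("ID") == wanted_id:
            wanted = (appliance.find("ID"), appliance.find("NAME"))
            # strip() drops the trailing newline kept in each element's tail.
            return [ET.tostring(el, encoding="unicode").strip() for el in wanted]
    return None


print("\n".join(find_appliance(FRAGMENT, "12234")))
```

Unlike the sed range above, this does not depend on <ID> and <NAME> appearing on separate lines or in a particular order within the block.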
I'm calling SchemaCrawler in the following way:
call java -classpath ../_schemacrawler/lib/*;lib/* schemacrawler.Main -server=mysql -database=db_db -host=localhost -user=user -password=pwd -infolevel=maximum -command=brief -portablenames=false -tabletypes=TABLE -routines=.*\.X.*.* -routines=.*\.X.*.* -outputformat=html -o=html.html %*
It generates nice HTML output, but I would also like to see the table COMMENT text. Comments appear for columns, but I cannot find a way to show them for tables.
I guess it's related to the -noremarks option, but I have already tried that without success.
How should I proceed?
I have been trying to make use of the dicttoxml module for some time now. I have many lists of dictionaries that I want to convert into XML format, and I want each list to essentially have its own 'table'. However, when I try doing something along the lines of:
xml = dicttoxml.dicttoxml(myList, root=False,
                          custom_root="MyName",
                          attr_type=False)
I get every dict displayed as an <item> element. Shouldn't this produce what the module's owner refers to as an "XML snippet" that is also identified by the custom_root name?
Essentially I want each list to have its own identifier but not be created as 'root', so that in something like the following each item number is associated with a certain list. Either encapsulating the whole list or each dict in the list would be suitable, I believe.
<root>
<item1>
#dict info
</item1>
<item2>
#dict info
</item2>
</root>
I fixed my problem by passing just the custom_root argument in my call and leaving root = True. Then I stripped the leading
b'<?xml version="1.0" encoding="UTF-8" ?>'
by calling
xml.partition(b'<?xml version="1.0" encoding="UTF-8" ?>')[2]
From there, I created a file with <root> </root> tags and appended the XML I had created between these tags.
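For comparison, the same per-list wrapping can be built without stripping an XML declaration at all. The sketch below deliberately does not use dicttoxml; it swaps in the standard library's ElementTree. The helper list_to_element, the sample lists, and the wrapper names are hypothetical, and it assumes flat dicts whose keys are valid XML element names.

```python
import xml.etree.ElementTree as ET


def list_to_element(name, rows):
    """Build <name><item>...</item>...</name> from a list of flat dicts.

    Assumes every dict key is a valid XML element name.
    """
    wrapper = ET.Element(name)
    for row in rows:
        item = ET.SubElement(wrapper, "item")
        for key, value in row.items():
            ET.SubElement(item, key).text = str(value)
    return wrapper


# One shared root; each list gets its own named wrapper element.
root = ET.Element("root")
root.append(list_to_element("MyName", [{"a": 1}, {"a": 2}]))
root.append(list_to_element("OtherName", [{"b": "x"}]))

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Serializing once from a single root avoids the manual concatenation between <root> tags, at the cost of giving up dicttoxml's handling of nested structures and type attributes.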
I have XML like that shown below:
<?xml version="1.0" encoding="UTF-8"?>
<schools>
<city>Marshall</city>
<state>Maryland</state>
<highschool>
<schoolname>Marshalls</schoolname>
<department id="1">
<deptCode seq="1">D1</deptCode>
<deptName seq="2">Chemistry</deptName>
<deptHead seq="3">Henry Carl</deptHead>
<deptRank seq="4">L</deptRank>
</department>
<department id="2">
..
..
..
</highschool>
</schools>
In XSL I am copying the contents of department, selected by deptCode, using
<xsl:copy-of select="*"/>
This produces a result that includes all the attributes on the copied elements.
Is it possible to ignore the attributes while using xsl:copy-of?
The desired result is shown below:
<deptCode>D1</deptCode>
<deptName>Chemistry</deptName>
<deptHead>Henry Carl</deptHead>
<deptRank>L</deptRank>
xsl:value-of is working as required, but I would like to know whether it can be done with xsl:copy-of. Note that in my case there are nearly 5 or 6 attributes on each element. Can someone please help? Thanks in advance.
regards
Udayakiran
xsl:value-of is working as required, but I would like to know whether it can be done with xsl:copy-of.
No. xsl:copy-of is a package deal, you cannot pick and choose. To avoid repetitive coding, use a template matching department/*.