Stripping whitespace from an XML element using ElementTree - python-3.x

I'm having difficulty removing leading and trailing whitespace, as well as excess whitespace between elements. For the sake of example, this is the XML document I'm currently running test cases on:
<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
<description>Liechtenstein has a lot of flowers. </description>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
<description>Singapore has a lot of street markets.</description>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
<description> Panama has a lot of great food.</description>
</country>
</data>
Notice the excess trailing whitespace in the description of the country named "Liechtenstein", the excess whitespace between neighbor and description in the second country element, and the excess leading whitespace in the description of the third country element.
Every time I run my code:
# Remove whitespace for each element in the tree
for elem in root.iter():
    elem.text = elem.text.strip()
    elem.tail = elem.tail.strip()
I end up with the following error:
AttributeError: 'NoneType' object has no attribute 'strip'

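The error occurs because elem.text and elem.tail are None for elements that have no text or no trailing text, so check for None before calling strip():
import xml.etree.ElementTree as ET
file = 'source.xml'
root = ET.parse(file)  # note: this is an ElementTree, not the root Element
for elem in root.iter():
    # only strip when there is actually a string to strip
    if elem.text is not None:
        elem.text = elem.text.strip()
    if elem.tail is not None:
        elem.tail = elem.tail.strip()
# print XML with stripped out whitespace
ET.dump(root)
# pretty print XML with stripped out whitespace (ET.indent requires Python 3.9+)
ET.indent(root, space="\t", level=0)
ET.dump(root)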
Output (stripped out whitespace):
<data><country name="Liechtenstein"><rank>1</rank><year>2008</year><gdppc>141100</gdppc><neighbor name="Austria" direction="E" /><neighbor name="Switzerland" direction="W" /><description>Liechtenstein has a lot of flowers.</description></country><country name="Singapore"><rank>4</rank><year>2011</year><gdppc>59900</gdppc><neighbor name="Malaysia" direction="N" /><description>Singapore has a lot of street markets.</description></country><country name="Panama"><rank>68</rank><year>2011</year><gdppc>13600</gdppc><neighbor name="Costa Rica" direction="W" /><neighbor name="Colombia" direction="E" /><description>Panama has a lot of great food.</description></country></data>
Output (pretty-printed with stripped out whitespace):
<data>
	<country name="Liechtenstein">
		<rank>1</rank>
		<year>2008</year>
		<gdppc>141100</gdppc>
		<neighbor name="Austria" direction="E" />
		<neighbor name="Switzerland" direction="W" />
		<description>Liechtenstein has a lot of flowers.</description>
	</country>
	<country name="Singapore">
		<rank>4</rank>
		<year>2011</year>
		<gdppc>59900</gdppc>
		<neighbor name="Malaysia" direction="N" />
		<description>Singapore has a lot of street markets.</description>
	</country>
	<country name="Panama">
		<rank>68</rank>
		<year>2011</year>
		<gdppc>13600</gdppc>
		<neighbor name="Costa Rica" direction="W" />
		<neighbor name="Colombia" direction="E" />
		<description>Panama has a lot of great food.</description>
	</country>
</data>
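The None checks above can also be folded into a small helper. The following is a minimal sketch, assuming the document is available as a string; the function name strip_whitespace and the shortened sample XML are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample with excess whitespace around the text
SAMPLE = """<data>
    <country name="Panama">
        <rank>68</rank>
        <description>   Panama has a lot of great food. </description>
    </country>
</data>"""


def strip_whitespace(root):
    """Strip leading/trailing whitespace from every text and tail, in place."""
    for elem in root.iter():
        # text and tail are None when absent, so guard before calling strip()
        if elem.text is not None:
            elem.text = elem.text.strip()
        if elem.tail is not None:
            elem.tail = elem.tail.strip()
    return root


root = strip_whitespace(ET.fromstring(SAMPLE))
result = ET.tostring(root, encoding="unicode")
print(result)
```

Because whitespace-only text and tails become empty strings, the serialized result ends up on a single line, as in the first output above.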

Related

Removing the same element across all the nodes of an XML tree

For example's sake, this is the XML file that I'm working with:
<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
<description>Liechtenstein has a lot of flowers.</description>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
<description>Singapore has a lot of street markets.</description>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
<description>Panama has a lot of great food.</description>
</country>
</data>
How would I write the code so that I could delete one child element (e.g. year or description) from each of the country nodes? For example, in the following code:
# To remove
# for country in root.findall('country'):
#     year = int(country.find('year').text)
#     if year > 2010:
#         root.remove(country)
# tree.write('sample.xml')
I can remove any country node whose year element has a value greater than 2010. But that removes the entire node, not just the year element. I know that I can remove a single child element of a node with the following:
# for country in root.findall('country'):
#     description_node = country.find('description')
#     if description_node.text == "Singapore has a lot of street markets.":
#         country.remove(description_node)
# tree.write('sample.xml')
But now I want to delete the description element, the year element, or the neighbor element from all of the country nodes present.
One option is the following, which uses .findall and .remove:
import xml.etree.ElementTree as ET
file = 'source.xml'
data = ET.parse(file)
for country in data.findall('country'):
    # findall returns a list, so removing children inside the loop is safe
    for neighbor in country.findall('neighbor'):
        country.remove(neighbor)
    for year in country.findall('year'):
        country.remove(year)
    for description in country.findall('description'):
        country.remove(description)
ET.dump(data)
Output:
python yourscript.py
<data>
<country name="Liechtenstein">
<rank>1</rank>
<gdppc>141100</gdppc>
</country>
<country name="Singapore">
<rank>4</rank>
<gdppc>59900</gdppc>
</country>
<country name="Panama">
<rank>68</rank>
<gdppc>13600</gdppc>
</country>
</data>
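The three loops in the answer follow the same pattern, so they can be collapsed into one pass over each country's children. This is only a sketch under assumptions: the helper name remove_children and the shortened sample document are hypothetical, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened version of the sample document
SAMPLE = """<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
<description>Liechtenstein has a lot of flowers.</description>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<neighbor name="Malaysia" direction="N"/>
</country>
</data>"""


def remove_children(root, parent_tag, child_tags):
    """Remove every direct child whose tag is in child_tags from each parent_tag element."""
    # Materialize the iterator first so the tree is not modified while it is being walked
    for parent in list(root.iter(parent_tag)):
        # list(parent) is a copy of the children, so removal inside the loop is safe
        for child in list(parent):
            if child.tag in child_tags:
                parent.remove(child)


root = ET.fromstring(SAMPLE)
remove_children(root, "country", {"year", "description", "neighbor"})
remaining = [[child.tag for child in country] for country in root.findall("country")]
print(remaining)
```

Passing the tags as a set keeps the choice of which elements to drop in one place.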
In XSLT 3.0 you can do, for example:
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0">
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="year[. > 2000]"/>
</xsl:transform>
The empty template rule causes elements that match the predicate to be removed; the xsl:mode instruction causes everything else to be retained.
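A rough ElementTree counterpart of that empty template rule, for readers staying in Python, might look like the sketch below. It assumes every year element holds an integer, and the two-country sample with names "A" and "B" is made up for illustration. ElementTree elements have no parent pointer, so the loop visits each potential parent and inspects its children.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one year at or below 2000 and one above it
SAMPLE = """<data>
<country name="A"><rank>1</rank><year>1999</year></country>
<country name="B"><rank>4</rank><year>2011</year></country>
</data>"""

root = ET.fromstring(SAMPLE)
# Walk every element as a potential parent; copy the lists before mutating the tree
for parent in list(root.iter()):
    for child in list(parent):
        if child.tag == "year" and int(child.text) > 2000:
            parent.remove(child)

years = [y.text for y in root.iter("year")]
print(years)
```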

placing child element of xml in variable

Hi, I'm new to XML and have never used it before. I'm trying to place two child elements in a variable each. Here's the XML data I'm using:
<?xml version="1.0"?>
<data>
<countries>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
</countries>
<cities>
<city id="1036323110">
<city>Katherine</city>
<country>Australia</country>
<capital>Australia</capital>
<population>1488</population>
</city>
</cities>
</data>
So I'm trying to get a variable that contains each child branch and this is what I've tried so far:
import xml.etree.ElementTree as ET
tree = ET.parse('data.xml')
root = tree.getroot()
country = root.find(".//countries")
city = root.find(".//cities")
Am I right to approach it this way? Thank you.
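As a hedged illustration of the approach in the question: find() returns the first matching element or None, so the two variables can be checked and then used like any other element. The sketch below assumes a well-formed document with a countries and a cities element directly under data; the inline sample is a shortened, hypothetical stand-in for data.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for data.xml
SAMPLE = """<data>
<countries>
<country name="Liechtenstein"><rank>1</rank></country>
</countries>
<cities>
<city id="1036323110"><population>1488</population></city>
</cities>
</data>"""

root = ET.fromstring(SAMPLE)
countries = root.find("countries")  # direct child of <data>; ".//countries" also matches
cities = root.find("cities")

country_names = []
city_ids = []
# find() returns None when nothing matches, so compare against None before use
if countries is not None and cities is not None:
    country_names = [c.get("name") for c in countries.findall("country")]
    city_ids = [c.get("id") for c in cities.findall("city")]
print(country_names, city_ids)
```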

Dash player for segment list with non-zero repeat count

<Period programDateTime='2021-03-17T07:15:26.239Z' duration='PT1M04.078S'>
<AdaptationSet contentType='video' mimeType='video/mp4' par='16:9' id='7'>
<SegmentList timescale='90000' presentationTimeOffset='4327131510'>
<SegmentTimeline>
<!-- Doesn't work -->
<S t='4327131510' d='2903310' r='1' />
<!-- Uncomment below and remove the above and manifest will work -->
<!-- <S t='4327131510' d='2903310'/>-->
<!-- <S t='4330034820' d='2903310'/>-->
</SegmentTimeline>
</SegmentList>
<Representation id='1280x720' codecs='avc1.4d0029' width='1280' height='720' bandwidth='100000'>
<BaseURL>http://localhost:8000/downloaded/</BaseURL>
<SegmentList>
<Initialization sourceURL='1.m4v' />
<SegmentURL media='1.m4v' />
<SegmentURL media='2.m4v' />
</SegmentList>
</Representation>
</AdaptationSet>
</Period>
The Dash player starts playing the manifest; however, it skips 2.m4v and shortens the video to just the first segment. If I remove the repeat count attribute and specify each segment explicitly in the SegmentTimeline, it works fine.
This manifest also works fine in Shaka player.
Your manifest doesn't look right. I don't think the first SegmentList with the timeline should be there. I would try:
<Period programDateTime='2021-03-17T07:15:26.239Z' duration='PT1M04.078S'>
<AdaptationSet contentType='video' mimeType='video/mp4' par='16:9' id='7'>
<Representation id='1280x720' codecs='avc1.4d0029' width='1280' height='720' bandwidth='100000'>
<BaseURL>http://localhost:8000/downloaded/</BaseURL>
<SegmentList timescale='90000' duration='2903310'>
<Initialization sourceURL='1.m4v' />
<SegmentURL media='1.m4v' />
<SegmentURL media='2.m4v' />
</SegmentList>
</Representation>
</AdaptationSet>
</Period>
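For context on the repeat count in the question: the r attribute of an S element counts additional consecutive segments of the same duration, so r='1' describes two segments. The sketch below expands such a timeline into explicit start times. It is an illustration only: expand_timeline is a hypothetical helper, it handles non-negative r values only, and it assumes an un-namespaced snippet like the one in the question.

```python
import xml.etree.ElementTree as ET

TIMELINE = "<SegmentTimeline><S t='4327131510' d='2903310' r='1'/></SegmentTimeline>"


def expand_timeline(timeline):
    """Return (start, duration) pairs, expanding non-negative repeat counts."""
    segments = []
    next_start = 0
    for s in timeline.findall("S"):
        # t is optional; when absent the segment continues from the previous one
        start = int(s.get("t", next_start))
        duration = int(s.get("d"))
        # r counts additional segments, so the entry covers r + 1 segments in total
        repeat = int(s.get("r", 0))
        for i in range(repeat + 1):
            segments.append((start + i * duration, duration))
        next_start = start + (repeat + 1) * duration
    return segments


segments = expand_timeline(ET.fromstring(TIMELINE))
print(segments)
```

The two start times produced here match the two explicit S entries that are commented out in the question's manifest.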

I would like to find whether a friends child element is present in a specific parent element like 'Liechtenstein'

import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()
ss = 'Liechtenstein'
tag_names = set(t.tag for t in root.findall(".//*[@name=ss]/friends"))
<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
<friends>
<frined name="arun" />
</friends>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>
I would like to find whether a friends child element is present in a specific parent element like 'Liechtenstein'.
Here, tag_names gives an empty set, but I am expecting tag_names=(friends).
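One likely cause is that the path string contains the literal characters ss rather than the value of the variable, because Python does not substitute variables inside an ordinary string. A minimal sketch of one way to interpolate and quote the value follows; has_child is a hypothetical helper, the sample is shortened, and the name is assumed to contain no single quote.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened version of the sample document
SAMPLE = """<data>
<country name="Liechtenstein">
<rank>1</rank>
<friends><friend name="arun"/></friends>
</country>
<country name="Singapore"><rank>4</rank></country>
</data>"""

root = ET.fromstring(SAMPLE)


def has_child(root, name, child_tag):
    """Return True if the element whose name attribute equals name has a child_tag child."""
    # The value is interpolated with an f-string and quoted inside the predicate
    path = f".//*[@name='{name}']/{child_tag}"
    return root.find(path) is not None


ss = "Liechtenstein"
tag_names = set(t.tag for t in root.findall(f".//*[@name='{ss}']/friends"))
print(tag_names)
print(has_child(root, "Liechtenstein", "friends"))
print(has_child(root, "Singapore", "friends"))
```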

Parsing xml data into nested list bash

I'm working on a Plex Geeklet, and I have a string of recently added TV shows.
SHOW_DATA=$(curl --silent "http://localhost:32400/library/sections/3/recentlyAdded?X-Plex-Container-Start=0&X-Plex-Container-Size=10")
This is an example of my data:
<?xml version="1.0" encoding="UTF-8"?>
<MediaContainer size="10" totalSize="50" allowSync="1" art="/:/resources/show-fanart.jpg" identifier="com.plexapp.plugins.library" librarySectionID="3" librarySectionTitle="TV Shows" librarySectionUUID="600cd0c5-fd4b-460a-846b-e4bad1ecdf4a" mediaTagPrefix="/system/bundle/media/flags/" mediaTagVersion="1402960845" mixedParents="1" nocache="1" offset="0" thumb="/:/resources/show.png" title1="TV Shows" title2="Recently Added" viewGroup="episode" viewMode="65592">
<Video ratingKey="588" key="/library/metadata/588" parentRatingKey="587" grandparentRatingKey="586" type="episode" title="Pilot" grandparentKey="/library/metadata/586" parentKey="/library/metadata/587" grandparentTitle="Community" contentRating="TV-PG" summary="Fast-talking lawyer Jeff Winger (Joel McHale) enrolls at Greendale Community College after the State Bar discovered his illegitimate degree and threatened to suspend his license. When Jeff pretends to be a Spanish tutor to get close to his classmate Britta (Gillian Jacobs), he winds up with an entire study group of students looking for his help. Pierce (Chevy Chase), Abed (Danny Pudi), Shirley (Yvette Nicole Brown), Annie (Alison Brie), Troy (Donald Glover), and Britta comprise the band of misfits that Jeff never asked for, but may end up needing when he realizes his connection to Greendale professor Ian Duncan (John Oliver) won&apos;t pay off like he hoped." index="1" parentIndex="1" rating="7.4000000953674299" year="2009" thumb="/library/metadata/588/thumb/1403755683" art="/library/metadata/586/art/1403755684" parentThumb="/library/metadata/587/thumb/1403755684" grandparentThumb="/library/metadata/586/thumb/1403755684" grandparentTheme="/library/metadata/586/theme/1403755684" duration="1525134" originallyAvailableAt="2009-09-17" addedAt="1403755618" updatedAt="1403755683">
<Media videoResolution="480" id="479" duration="1525134" bitrate="2509" width="854" height="480" aspectRatio="1.78" audioChannels="2" audioCodec="aac" videoCodec="h264" container="mp4" videoFrameRate="24p" optimizedForStreaming="0" has64bitOffsets="0">
<Part id="522" key="/library/parts/522/file.mp4" duration="1525134" file="/Users/joe/Videos/TV Shows/Community/Season 1/01 Pilot.mp4" size="478232014" container="mp4" has64bitOffsets="0" optimizedForStreaming="0" />
</Media>
<Writer tag="Dan Harmon" />
<Director tag="Anthony Russo" />
<Director tag="Joe Russo" />
</Video>
<Video ratingKey="589" key="/library/metadata/589" parentRatingKey="587" grandparentRatingKey="586" type="episode" title="Spanish 101" grandparentKey="/library/metadata/586" parentKey="/library/metadata/587" grandparentTitle="Community" contentRating="TV-PG" summary="Jeff&apos;s (Joel McHale) efforts to win over Britta (Gillian Jacobs) backfire, and he finds himself paired up with Pierce (Chevy Chase) for their Spanish class project. The two give teacher Señor Chang (Ken Jeong) the presentation of a lifetime. Meanwhile, inspired by Britta&apos;s awareness of social issues, Annie (Alison Brie) and Shirley (Yvette Nicole Brown) stage a protest on Greendale&apos;s campus." index="2" parentIndex="1" rating="7.4000000953674299" year="2009" thumb="/library/metadata/589/thumb/1403755684" art="/library/metadata/586/art/1403755684" parentThumb="/library/metadata/587/thumb/1403755684" grandparentThumb="/library/metadata/586/thumb/1403755684" grandparentTheme="/library/metadata/586/theme/1403755684" duration="1278352" originallyAvailableAt="2009-09-24" addedAt="1403755618" updatedAt="1403755684">
<Media videoResolution="480" id="480" duration="1278352" bitrate="2253" width="854" height="480" aspectRatio="1.78" audioChannels="2" audioCodec="aac" videoCodec="h264" container="mp4" videoFrameRate="24p" optimizedForStreaming="0" has64bitOffsets="0">
<Part id="523" key="/library/parts/523/file.mp4" duration="1278352" file="/Users/joe/Videos/TV Shows/Community/Season 1/02 Spanish 101.mp4" size="359953984" container="mp4" has64bitOffsets="0" optimizedForStreaming="0" />
</Media>
<Writer tag="Dan Harmon" />
<Director tag="Joe Russo" />
</Video>
<Video ratingKey="591" key="/library/metadata/591" parentRatingKey="587" grandparentRatingKey="586" type="episode" title="Introduction to Statistics" grandparentKey="/library/metadata/586" parentKey="/library/metadata/587" grandparentTitle="Community" contentRating="TV-PG" summary="It&apos;s Halloween at Greendale, and Jeff (Joel McHale) has the hots for one of his teachers (Lauren Stamile) and gets dating advice from Señor Chang (Ken Jeong). Meanwhile Annie (Alison Brie) throws a ""Dia de los Muertos" party for extra credit." index="7" parentIndex="1" rating="7.9000000953674299" year="2009" thumb="/library/metadata/591/thumb/1403755686" art="/library/metadata/586/art/1403755684" parentThumb="/library/metadata/587/thumb/1403755684" grandparentThumb="/library/metadata/586/thumb/1403755684" grandparentTheme="/library/metadata/586/theme/1403755684" duration="1276610" originallyAvailableAt="2009-10-29" addedAt="1403755618" updatedAt="1403755686">
<Media videoResolution="480" id="482" duration="1276610" bitrate="2258" width="854" height="480" aspectRatio="1.78" audioChannels="2" audioCodec="aac" videoCodec="h264" container="mp4" videoFrameRate="24p" optimizedForStreaming="0" has64bitOffsets="0">
<Part id="525" key="/library/parts/525/file.mp4" duration="1276610" file="/Users/joe/Videos/TV Shows/Community/Season 1/07 Introduction to Statistics.mp4" size="360268838" container="mp4" has64bitOffsets="0" optimizedForStreaming="0" />
</Media>
<Writer tag="Jon Pollack" />
<Writer tag="Tim Hobert" />
<Director tag="Justin Lin" />
</Video>
<Video ratingKey="592" key="/library/metadata/592" parentRatingKey="587" grandparentRatingKey="586" type="episode" title="Home Economics" grandparentKey="/library/metadata/586" parentKey="/library/metadata/587" grandparentTitle="Community" contentRating="TV-PG" summary="Britta (Gillian Jacobs) tries to rid Jeff (Joel McHale) of his materialistic ways. Meanwhile Pierce (Chevy Chase) joins a rock band on campus, and Annie (Alison Brie) grudgingly helps Troy (Donald Glover) plan a date with another girl." index="8" parentIndex="1" rating="7.5999999046325701" year="2009" thumb="/library/metadata/592/thumb/1403755686" art="/library/metadata/586/art/1403755684" parentThumb="/library/metadata/587/thumb/1403755684" grandparentThumb="/library/metadata/586/thumb/1403755684" grandparentTheme="/library/metadata/586/theme/1403755684" duration="1275844" originallyAvailableAt="2009-11-05" addedAt="1403755618" updatedAt="1403755686">
<Media videoResolution="480" id="483" duration="1275844" bitrate="2340" width="854" height="480" aspectRatio="1.78" audioChannels="2" audioCodec="aac" videoCodec="h264" container="mp4" videoFrameRate="24p" optimizedForStreaming="0" has64bitOffsets="0">
<Part id="526" key="/library/parts/526/file.mp4" duration="1275844" file="/Users/joe/Videos/TV Shows/Community/Season 1/08 Home Economics.mp4" size="373156573" container="mp4" has64bitOffsets="0" optimizedForStreaming="0" />
</Media>
<Writer tag="Lauren Pomerantz" />
<Director tag="Anthony Russo" />
</Video>
<Video ratingKey="593" key="/library/metadata/593" parentRatingKey="587" grandparentRatingKey="586" type="episode" title="Comparative Religion" grandparentKey="/library/metadata/586" parentKey="/library/metadata/587" grandparentTitle="Community" contentRating="TV-PG" summary="Shirley (Yvette Nicole Brown) tries to get everyone in the Christmas spirit, but Jeff (Joel McHale) threatens her holiday cheer when he decides to fight the school bully (guest star Anthony Michael Hall)." index="12" parentIndex="1" rating="7.8000001907348597" year="2009" thumb="/library/metadata/593/thumb/1403755688" art="/library/metadata/586/art/1403755684" parentThumb="/library/metadata/587/thumb/1403755684" grandparentThumb="/library/metadata/586/thumb/1403755684" grandparentTheme="/library/metadata/586/theme/1403755684" duration="1276355" originallyAvailableAt="2009-12-10" addedAt="1403755618" updatedAt="1403755688">
<Media videoResolution="480" id="484" duration="1276355" bitrate="2446" width="854" height="480" aspectRatio="1.78" audioChannels="2" audioCodec="aac" videoCodec="h264" container="mp4" videoFrameRate="24p" optimizedForStreaming="0" has64bitOffsets="0">
<Part id="527" key="/library/parts/527/file.mp4" duration="1276355" file="/Users/joe/Videos/TV Shows/Community/Season 1/12 Comparative Religion.mp4" size="390216047" container="mp4" has64bitOffsets="0" optimizedForStreaming="0" />
</Media>
<Writer tag="Liz Cackowski" />
<Director tag="Adam Davidson" />
</Video>
<Video ratingKey="594" key="/library/metadata/594" parentRatingKey="587" grandparentRatingKey="586" type="episode" title="Investigative Journalism" grandparentKey="/library/metadata/586" parentKey="/library/metadata/587" grandparentTitle="Community" contentRating="TV-PG" summary="Everyone&apos;s vibe is thrown off when an unwanted outsider tries to join the study group. Meanwhile, Jeff (Joel McHale) becomes the new editor of Greendale&apos;s school newspaper and appoints Annie (Alison Brie) as his ace reporter." index="13" parentIndex="1" rating="7.5" year="2010" thumb="/library/metadata/594/thumb/1403755689" art="/library/metadata/586/art/1403755684" parentThumb="/library/metadata/587/thumb/1403755684" grandparentThumb="/library/metadata/586/thumb/1403755684" grandparentTheme="/library/metadata/586/theme/1403755684" duration="1269923" originallyAvailableAt="2010-01-14" addedAt="1403755618" updatedAt="1403755689">
<Media videoResolution="480" id="485" duration="1269923" bitrate="1998" width="854" height="480" aspectRatio="1.78" audioChannels="2" audioCodec="aac" videoCodec="h264" container="mp4" videoFrameRate="24p" optimizedForStreaming="0" has64bitOffsets="0">
<Part id="528" key="/library/parts/528/file.mp4" duration="1269923" file="/Users/joe/Videos/TV Shows/Community/Season 1/13 Investigative Journalism.mp4" size="317146865" container="mp4" has64bitOffsets="0" optimizedForStreaming="0" />
</Media>
<Writer tag="Jon Pollack" />
<Writer tag="Tim Hobert" />
<Director tag="Joe Russo" />
</Video>
<Video ratingKey="595" key="/library/metadata/595" parentRatingKey="587" grandparentRatingKey="586" type="episode" title="Romantic Expressionism" grandparentKey="/library/metadata/586" parentKey="/library/metadata/587" grandparentTitle="Community" contentRating="TV-PG" summary="Britta (Gillian Jacobs) and Jeff (Joel McHale) stage an intervention when Annie (Alison Brie) gets cozy with Vaughn (Eric Christian Olsen). Meanwhile Pierce (Chevy Chase) struggles to prove his wit when he crashes Abed (Danny Pudi) and Troy’s (Donald Glover) movie night." index="15" parentIndex="1" rating="7.9000000953674299" year="2010" thumb="/library/metadata/595/thumb/1403755689" art="/library/metadata/586/art/1403755684" parentThumb="/library/metadata/587/thumb/1403755684" grandparentThumb="/library/metadata/586/thumb/1403755684" grandparentTheme="/library/metadata/586/theme/1403755684" duration="1274799" originallyAvailableAt="2010-02-04" addedAt="1403755618" updatedAt="1403755689">
<Media videoResolution="480" id="486" duration="1274799" bitrate="2059" width="854" height="480" aspectRatio="1.78" audioChannels="2" audioCodec="aac" videoCodec="h264" container="mp4" videoFrameRate="24p" optimizedForStreaming="0" has64bitOffsets="0">
<Part id="529" key="/library/parts/529/file.mp4" duration="1274799" file="/Users/joe/Videos/TV Shows/Community/Season 1/15 Romantic Expressionism.mp4" size="328027632" container="mp4" has64bitOffsets="0" optimizedForStreaming="0" />
</Media>
<Writer tag="Andrew Guest" />
<Director tag="Joe Russo" />
</Video>
<Video ratingKey="596" key="/library/metadata/596" parentRatingKey="587" grandparentRatingKey="586" type="episode" title="Communication Studies" grandparentKey="/library/metadata/586" parentKey="/library/metadata/587" grandparentTitle="Community" contentRating="TV-PG" summary="When Britta (Gillian Jacobs) drunk dials Jeff (Joel McHale) things get awkward between them and Jeff attempts to repair their relationship. Meanwhile, Annie (Alison Brie) and Shirley (Yvette Nicole Brown) conspire to humiliate Señor Chang (Ken Jeong)." index="16" parentIndex="1" rating="7.8000001907348597" year="2010" thumb="/library/metadata/596/thumb/1403755691" art="/library/metadata/586/art/1403755684" parentThumb="/library/metadata/587/thumb/1403755684" grandparentThumb="/library/metadata/586/thumb/1403755684" grandparentTheme="/library/metadata/586/theme/1403755684" duration="1267206" originallyAvailableAt="2010-02-11" addedAt="1403755618" updatedAt="1403755691">
<Media videoResolution="480" id="487" duration="1267206" bitrate="2278" width="854" height="480" aspectRatio="1.78" audioChannels="2" audioCodec="aac" videoCodec="h264" container="mp4" videoFrameRate="24p" optimizedForStreaming="0" has64bitOffsets="0">
<Part id="530" key="/library/parts/530/file.mp4" duration="1267206" file="/Users/joe/Videos/TV Shows/Community/Season 1/16 Communication Studies.mp4" size="360777073" container="mp4" has64bitOffsets="0" optimizedForStreaming="0" />
</Media>
<Writer tag="Chris McKenna" />
<Director tag="Adam Davidson" />
</Video>
<Video ratingKey="597" key="/library/metadata/597" parentRatingKey="587" grandparentRatingKey="586" type="episode" title="Modern Warfare" grandparentKey="/library/metadata/586" parentKey="/library/metadata/587" grandparentTitle="Community" contentRating="TV-PG" summary="JEFF AND BRITTA&apos;S SEXUAL TENSION HEATS UP ? The sexual tension between Jeff (Joel McHale) and Britta (Gillian Jacobs) becomes a hot topic among the study group. Meanwhile, what starts out as a simple contest for a chance at early class registration turns the peaceful campus of Greendale Community College into an all-out war zone. Friendships are tested, as only one student can be victorious." index="23" parentIndex="1" rating="8.5" year="2010" thumb="/library/metadata/597/thumb/1403755692" art="/library/metadata/586/art/1403755684" parentThumb="/library/metadata/587/thumb/1403755684" grandparentThumb="/library/metadata/586/thumb/1403755684" grandparentTheme="/library/metadata/586/theme/1403755684" duration="1260333" originallyAvailableAt="2010-05-06" addedAt="1403755618" updatedAt="1403755692">
<Media videoResolution="480" id="488" duration="1260333" bitrate="2233" width="854" height="480" aspectRatio="1.78" audioChannels="2" audioCodec="aac" videoCodec="h264" container="mp4" videoFrameRate="24p" optimizedForStreaming="0" has64bitOffsets="0">
<Part id="531" key="/library/parts/531/file.mp4" duration="1260333" file="/Users/joe/Videos/TV Shows/Community/Season 1/23 Modern Warfare.mp4" size="351822199" container="mp4" has64bitOffsets="0" optimizedForStreaming="0" />
</Media>
<Writer tag="Emily Cutler" />
<Director tag="Justin Lin" />
</Video>
<Video ratingKey="590" key="/library/metadata/590" parentRatingKey="587" grandparentRatingKey="586" type="episode" title="Advanced Criminal Law" grandparentKey="/library/metadata/586" parentKey="/library/metadata/587" grandparentTitle="Community" contentRating="TV-PG" summary="Señor Chang (Ken Jeong) invokes an inquisition and trial when one of the gang cheats on an exam. Annie (Alison Brie) enlists Pierce (Chevy Chase) to help her compose Greendale&apos;s new school song, and Troy (Donald Glover) educates Abed (Danny Pudi) on the art of joking." index="5" parentIndex="1" rating="7.8000001907348597" year="2009" thumb="/library/metadata/590/thumb/1403755683" art="/library/metadata/586/art/1403755684" parentThumb="/library/metadata/587/thumb/1403755684" grandparentThumb="/library/metadata/586/thumb/1403755684" grandparentTheme="/library/metadata/586/theme/1403755684" originallyAvailableAt="2009-10-15" addedAt="1403755618" updatedAt="1403755683">
<Media id="481" container="">
<Part id="524" key="/library/parts/524/file.mp4" file="/Users/joe/Videos/TV Shows/Community/Season 1/05 Advanced Criminal Law.mp4" size="48" />
</Media>
<Writer tag="Andrew Guest" />
<Director tag="Joe Russo" />
</Video>
</MediaContainer>
I'm trying to get this into an ordered list like this
[ [show_title_1, episode_title_1], [show_title_2, episode_title_2], ... [show_title_10, episode_title_10] ]
so I can eventually print it as (for example)
Community: Pilot
Arrested Development: My Mother, the Car
I've been able to get them into separate newline-delimited strings like so:
SHOW_NAMES=$(echo "$SHOW_DATA" | grep -o 'grandparentTitle="\([^"]*\)"' | sed -e 's/grandparentTitle="//' -e 's/"//' | perl -MHTML::Entities -ne 'print decode_entities($_)')
SHOW_TITLES=$(echo "$SHOW_DATA" | grep -o 'title="\([^"]*\)"' | sed -e 's/title="//' -e 's/"//' | perl -MHTML::Entities -ne 'print decode_entities($_)')
So is it possible to convert them both to lists, and use a loop to construct a new list containing nested lists of those values? I've really hit a wall here.
If I were you, I would get rid of all the grep and sed, etc., and just rely on pattern matching. I guessed at which exact fields you wanted, but you can easily update the code as needed. I used your data file to test with, so this will work. The script takes one input: the filename of the downloaded .xml file:
#!/bin/bash
test -r "$1" || { echo "error: invalid input, usage: ${0//*\//} filename.xml"; exit 1; }
let idx=0
while read line || test -n "$line"; do
    if test "${line:0:2}" == '<V'; then
        tmp=${line##* title=}
        title=${tmp%% grandparentKey*}
        tmp=${line##*grandparentTitle=}
        gptitle=${tmp%% contentRating*}
        if test "$idx" -lt "1" ; then
            let idx=1
            echo -n "[ [ $title, $gptitle ]"
        else
            echo -n ", [ $title, $gptitle ]"
        fi
    fi
done <"$1"
echo " ]"
exit 0
The following will read the values into arrays to allow later processing. You can add additional arrays as you like; you can even read the entire file into an array (though re-reading it from disk is simple enough as well). The output is the same as above (it is just for illustration):
let idx=0
declare -a title
declare -a gptitle
declare -a allvideo
while read line || test -n "$line"; do
    if test "${line:0:2}" == '<V'; then
        allvideo+=( "$line" )
        tmp=${line##* title=}
        title+=( "${tmp%% grandparentKey*}" )
        tmp=${line##*grandparentTitle=}
        gptitle+=( "${tmp%% contentRating*}" )
    fi
done <"$1"
# output the original 2 variables
for ((i=0; i<${#title[@]}; i++)); do
    if test "$i" -eq 0 ; then
        echo -n "[ [ ${title[$i]}, ${gptitle[$i]} ]"
    else
        echo -n ", [ ${title[$i]}, ${gptitle[$i]} ]"
    fi
done
echo " ]"
oldifs=$IFS
IFS=$'\n' # set Internal Field Separator to only break on newlines
# output the entire file with the allvideo array
for i in ${allvideo[@]}; do
    echo "$i"
done
IFS=$oldifs
output:
[ [ "Pilot", "Community" ], [ "Spanish 101", "Community" ], \
[ "Introduction to Statistics", "Community" ], [ "Home Economics", "Community" ], \
[ "Comparative Religion", "Community" ], [ "Investigative Journalism", "Community" ], \
[ "Romantic Expressionism", "Community" ], [ "Communication Studies", "Community" ], \
[ "Modern Warfare", "Community" ], [ "Advanced Criminal Law", "Community" ] ]
**the dump of the original file is omitted for brevity.
Let me know if you have any additional questions.
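If a scripting language is an option, an XML parser sidesteps the string slicing and decodes entities such as &apos; on its own. The following is a sketch of that alternative, not part of the answer above; it uses Python's ElementTree, and the two-episode sample (including the summary text) is a hypothetical stand-in for the Plex response.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for the Plex response
SAMPLE = """<MediaContainer size="2">
<Video title="Pilot" grandparentTitle="Community" summary="Jeff&apos;s first day."/>
<Video title="Spanish 101" grandparentTitle="Community"/>
</MediaContainer>"""

root = ET.fromstring(SAMPLE)
# One [show_title, episode_title] pair per Video element, in document order
pairs = [[v.get("grandparentTitle"), v.get("title")] for v in root.iter("Video")]
for show, episode in pairs:
    print(f"{show}: {episode}")
```

The nested list keeps show and episode together, so printing "Show: Episode" lines is a single loop.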

Resources