Comparing two strings in a Linux ksh script

I have two files: file1 and file2. Here is a sample of the files content:
<TG>
<entry name="KEYNAME" val="" type="string" />
<entry name="KEYTYPE" val="" type="string" />
<entry name="TIMEZONE_OFFSET" val="-240" type="INT16" />
...
</TG>
I want to check whether certain lines (those containing the string "entry name") in file1 exist in file2 and, if so, whether the two lines are identical.
I created file2 by copying file1 and then changing a few values. The problem is that comparing the two string variables does not give the expected result: when I display the two variables they look identical, yet the comparison reports that they are different. I am using ksh. Here is my code:
while read p; do
    if [[ $p == *"entry name"* ]]; then
        PARAM_NAME=$(echo $p | cut -d '"' -f2)
        echo $PARAM_NAME
        PARAM_OLD=$(grep $PARAM_NAME file2)
        if [[ $PARAM_OLD == *"entry name"* ]]; then
            echo $PARAM_OLD
            echo $p
            if [ "$PARAM_OLD" = "$p" ]; then
                echo 'Identical values'
            else
                echo 'Different values'
            fi
        else
            echo "$PARAM_NAME does not exist in previous version file. Using default value"
        fi
    fi
done <file1
I have tried all combinations of brackets, equal signs and quoting ([ ], [[ ]], =, ==, with and without double quotes, etc.).
Here is the output I am getting:
<entry name="KEYNAME" val="" type="string" />
KEYNAME
<entry name="KEYNAME" val="" type="string" />
<entry name="KEYNAME" val="" type="string" />
Different values
<entry name="KEYTYPE" val="" type="string" />
KEYTYPE
<entry name="KEYTYPE" val="" type="string" />
<entry name="KEYTYPE" val="" type="string" />
Different values
<entry name="TIMEZONE_OFFSET" val="-24" type="INT16" />
TIMEZONE_OFFSET
<entry name="TIMEZONE_OFFSET" val="-240" type="INT16" />
<entry name="TIMEZONE_OFFSET" val="-24" type="INT16" />
Different values
I am still getting that the strings are different! I would appreciate any explanation and help.

Most recent OSs have support for both plain ksh and ksh93.
With ksh93 we can use an associative array to limit ourselves to a single pass through each file.
First some sample data:
$ cat file1
<TG>
<entry name="KEYNAME" val="" type="string" />
<entry name="KEYTYPE" val="" type="string" />
<entry name="KEYATTRIB" val="" type="string" />
<entry name="TIMEZONE_OFFSET" val="-241" type="INT16" />
</TG>
$ cat file2
<TG>
<entry name="KEYNAME" val="" type="string" />
<entry name="KEYTYPE" val="" type="stringX" />
<entry name="TIMEZONE_OFFSET" val="-240" type="INT16" />
</TG>
The ksh93 script:
$ cat my_comp
#!/bin/ksh93

unset pline
typeset -A pline

# pull the unique list of 'entry name' lines from file2 and store them in our associative array pline[]:
egrep "entry name" file2 | sort -u | while read line
do
    # strip out the 'entry name' value
    x=${line#*\"}
    pname=${x%%\"*}
    # use the 'entry name' value as the index for our pline[] array
    pline[${pname}]=${line}
done

# for each unique 'entry name' line in file1, see if we have a match in file2 (aka our pline[] array):
egrep "entry name" file1 | sort -u | while read line
do
    # again, strip out the 'entry name' value
    x=${line#*\"}
    pname=${x%%\"*}

    # if pname does not exist in file2
    [ "${pline[${pname}]}" = '' ] && \
        echo "\npname = '${pname}' : Does not exist in file2. Using default value:" && \
        echo "file1: ${line}" && \
        continue

    # if pname exists in file2 and the lines are identical
    [ "${pline[${pname}]}" = "${line}" ] && \
        echo "\npname = '${pname}' : Identical values for pname" && \
        echo "file1: ${line}" && \
        echo "file2: ${pline[${pname}]}" && \
        continue

    # if pname exists in file2 but the lines differ
    [ "${pline[${pname}]}" != "${line}" ] && \
        echo "\npname = '${pname}' : Different values for pname" && \
        echo "file1: ${line}" && \
        echo "file2: ${pline[${pname}]}"
done
Running the script against the sample files:
$ my_comp
pname = 'KEYATTRIB' : Does not exist in file2. Using default value:
file1: <entry name="KEYATTRIB" val="" type="string" />
pname = 'KEYNAME' : Identical values for pname
file1: <entry name="KEYNAME" val="" type="string" />
file2: <entry name="KEYNAME" val="" type="string" />
pname = 'KEYTYPE' : Different values for pname
file1: <entry name="KEYTYPE" val="" type="string" />
file2: <entry name="KEYTYPE" val="" type="stringX" />
pname = 'TIMEZONE_OFFSET' : Different values for pname
file1: <entry name="TIMEZONE_OFFSET" val="-241" type="INT16" />
file2: <entry name="TIMEZONE_OFFSET" val="-240" type="INT16" />
Back to plain ol' basic ksh:
$ cat my_comp2
#!/bin/ksh

egrep "entry name" file1 | sort -u | while read line1
do
    x=${line1#*\"}
    pname=${x%%\"*}

    # see if we can find a matching line in file2; we need to strip off leading
    # spaces in order to match with line1
    unset line2
    line2=$( egrep "entry name.*${pname}" file2 | sed 's/^ *//g' )

    # if pname does not exist in file2
    [ "${line2}" = '' ] && \
        echo "\npname = '${pname}' : Does not exist in file2. Using default value:" && \
        echo "file1: ${line1}" && \
        continue

    # if pname exists in file2 and the lines are identical
    [ "${line2}" = "${line1}" ] && \
        echo "\npname = '${pname}' : Identical values for pname" && \
        echo "file1: ${line1}" && \
        echo "file2: ${line2}" && \
        continue

    # if pname exists in file2 but the lines differ
    [ "${line2}" != "${line1}" ] && \
        echo "\npname = '${pname}' : Different values for pname" && \
        echo "file1: ${line1}" && \
        echo "file2: ${line2}"
done
Running the script against the sample files:
$ my_comp2
pname = 'KEYATTRIB' : Does not exist in file2. Using default value:
file1: <entry name="KEYATTRIB" val="" type="string" />
pname = 'KEYNAME' : Identical values for pname
file1: <entry name="KEYNAME" val="" type="string" />
file2: <entry name="KEYNAME" val="" type="string" />
pname = 'KEYTYPE' : Different values for pname
file1: <entry name="KEYTYPE" val="" type="string" />
file2: <entry name="KEYTYPE" val="" type="stringX" />
pname = 'TIMEZONE_OFFSET' : Different values for pname
file1: <entry name="TIMEZONE_OFFSET" val="-241" type="INT16" />
file2: <entry name="TIMEZONE_OFFSET" val="-240" type="INT16" />

You have just one = in your if statement; change it to two:
if [ "$PARAM_OLD" = "$p" ]; then
to:
if [ "$PARAM_OLD" == "$p" ]; then
Also (this isn't the problem now, but it might be your next one), surround $PARAM_OLD with double quotes in the following line:
if [[ $PARAM_OLD == *"entry name"* ]]; then
so it becomes:
if [[ "$PARAM_OLD" == *"entry name"* ]]; then

Transforming from XML to Tab-Separated Key/Value Pairs
The following will transform your content into tab-separated key/value form:
xml_to_tsv() {
    xmlstarlet sel -t -m '//entry[@name]' -v ./@name -o $'\t' -v ./@value -n
}
Extracting Unique Lines From Each
Thus, if you want to compare your two streams, the following will emit only lines unique to the first file:
comm -23 <(xml_to_tsv <one.xml | sort) <(xml_to_tsv <two.xml | sort)
...and the following will emit only lines unique to the second:
comm -13 <(xml_to_tsv <one.xml | sort) <(xml_to_tsv <two.xml | sort)
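If you also want the entries whose name and value match in both files, the same pattern works with comm -12, which keeps only the lines common to both inputs (a small addition, not part of the original answer):
comm -12 <(xml_to_tsv <one.xml | sort) <(xml_to_tsv <two.xml | sort)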
Doing The Same, Without XMLStarlet
If you don't have XMLStarlet installed, you can generate an XSLT template to perform the same operation. Thus, if you have the following file as extract_entries.xslt:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:exslt="http://exslt.org/common" version="1.0" extension-element-prefixes="exslt">
<xsl:output omit-xml-declaration="yes" indent="no"/>
<xsl:template match="/">
<xsl:for-each select="//entry[#name]">
<xsl:call-template name="value-of-template">
<xsl:with-param name="select" select="./#name"/>
</xsl:call-template>
<xsl:text> </xsl:text>
<xsl:call-template name="value-of-template">
<xsl:with-param name="select" select="./#value"/>
</xsl:call-template>
<xsl:value-of select="'
'"/>
</xsl:for-each>
</xsl:template>
<xsl:template name="value-of-template">
<xsl:param name="select"/>
<xsl:value-of select="$select"/>
<xsl:for-each select="exslt:node-set($select)[position()>1]">
<xsl:value-of select="'
'"/>
<xsl:value-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
...you could use the following xml_to_tsv implementation on systems that don't have XMLStarlet at all:
xml_to_tsv() {
    xsltproc extract_entries.xslt -
}
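If both helper variants are available, a quick equivalence check is possible by giving them distinct names; xml_to_tsv_starlet and xml_to_tsv_xslt below are hypothetical names for the two definitions shown above:
diff <(xml_to_tsv_starlet <one.xml) <(xml_to_tsv_xslt <one.xml) && echo 'helpers agree'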

Using the following two data files and one transform file:
file1.xml:
<TG>
<entry name="common" value="foo"/>
<entry name="changed" value="bar"/>
<entry name="unique1" val="qux"/>
</TG>
file2.xml:
<TG>
<entry name="common" value="foo"/>
<entry name="changed" value="bar"/>
<entry name="unique2" val="quux"/>
</TG>
transform.xslt:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<!-- To be passed with xsltproc's stringparam argument -->
<xsl:param name="other"/>
<!-- Convenience aliases -->
<xsl:variable name="file1" select="/"/>
<xsl:variable name="file2" select="document($other)"/>
<xsl:template match="/">
<results>
<common_entries>
<xsl:for-each select="$file1/TG/entry[#name]">
<xsl:variable name="node1" select="."/>
<xsl:variable name="node2" select="$file2/TG/entry[#name=$node1/#name]"/>
<!-- xpath 1.0, the only version people use, lacks the deep-equal() function -->
<xsl:if test="$node1/#value = $node2/#value">
<xsl:apply-templates select="$node1"/>
</xsl:if>
</xsl:for-each>
</common_entries>
<changed_entries>
<xsl:for-each select="$file1/TG/entry[#name]">
<xsl:variable name="node1" select="."/>
<xsl:variable name="node2" select="$file2/TG/entry[#name=$node1/#name]"/>
<xsl:if test="$node1/#value != $node2/#value">
<diff>
<old>
<xsl:apply-templates select="$node1"/>
</old>
<new>
<xsl:apply-templates select="$node2"/>
</new>
</diff>
</xsl:if>
</xsl:for-each>
</changed_entries>
<unique1_entries>
<xsl:for-each select="$file1/TG/entry[not(#name=$file2/TG/entry/#name)]">
<xsl:apply-templates select="."/>
</xsl:for-each>
</unique1_entries>
<unique2_entries>
<xsl:for-each select="$file2/TG/entry[not(#name=$file1/TG/entry/#name)]">
<xsl:apply-templates select="."/>
</xsl:for-each>
</unique2_entries>
</results>
</xsl:template>
<!-- Standard identity transform -->
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Then running xsltproc --stringparam other file2.xml transform.xslt file1.xml will produce:
<?xml version="1.0"?>
<results>
<common_entries>
<entry name="common" value="foo"/>
</common_entries>
<changed_entries>
<diff>
<old>
<entry name="changed" value="bar"/>
</old>
<new>
<entry name="changed" value="baz"/>
</new>
</diff>
</changed_entries>
<unique1_entries>
<entry name="unique1" val="qux"/>
</unique1_entries>
<unique2_entries>
<entry name="unique2" val="quux"/>
</unique2_entries>
</results>

Related

How to get values from a file using sed, awk or grep in a Linux command or script?

I have file1 with the following content:
<action>
<row>
<column name="book" label="book">stick man (2020)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30/</column>
</row>
<row>
<column name="book" label="book">python easy (2019)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30/</column>
</row>
</action>
I want to get the contents of the file using a Linux command or script (sed, grep or awk). Example output:
stick man (2020) | http://172.22.215.234/Data/Book/Journal/2016_2020/1%/20Stick%20%282020%30
python easy (2019) | http://172.22.215.234/Data/Book/Journal/2016_2020/%2/20Buck%20%282019%30
My code:
grep -oP 'href="([^".]*)">([^</.]*)' file1
Please help, I am a newbie :)
$ awk -v RS='<[^>]+>' 'NF{printf "%s", $0 (++c%2?" |":ORS)}' file
stick man (2020)/ | http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30/
python easy (2019)/ | http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30/
Note that the trailing forward slashes are in your original data.
This requires multi-char RS support (GNU awk).
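The same command spread over several lines with comments (behaviour unchanged, still GNU awk):
awk -v RS='<[^>]+>' '       # treat every XML tag as a record separator
    NF {                    # skip the empty/whitespace-only text between tags
        printf "%s", $0 (++c % 2 ? " |" : ORS)   # join the book and URL chunks with " |"
    }' file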
This input does look like a fragment of an HTML/XML file. If you are allowed to install utilities on your system, I suggest giving hxselect a try; it is useful when you want to extract something you can describe with a CSS selector. For example, to get the content of all columns whose label is referensi from file.html:
cat file.html | hxselect -i -c -s '\n' column[label=referensi]
With awk you can try:
awk -F'>|/<' '{ORS= (NR == 3 || NR == 7) ? " |" : "\n"} $2 != "" {print $2}' file
stick man (2020) | http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30
python easy (2019) | http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30
Or shorter:
awk -F'>|/<' '{ORS= (NR%2) ? " |" : RS} $2 != "" {print $2}' file

Get a tag value in multi-line XML using a shell script [duplicate]

This question already has answers here:
How to parse XML in Bash?
(17 answers)
Extract data from xml file using shell commands
(4 answers)
Closed 3 years ago.
I have an XML file as follows:
<Module dataPath="/abc/def/xyz" handler="DataRegistry" id="id1" path="test.so"/>
<Module id="id2" path="/my/file/path">
<Config>
<Source cutoffpackage="1" dailyStart="20060819" dataPath="/abc/def/xyz" />
<Source cutoffpackage="1" dailyStart="20060819" dataPath="/abc/def/xyz" id="V2"/>
</Config>
</Module>
I just want to extract the value of dataPath for every module id.
I was using a command like
`grep 'id2' file | grep -ioPm1 "(?<=DataPath=)[^ ]+"`
which gives me the value from the first module id only, not from the second module id, because the second module spans multiple lines.
How can I do this using a shell script?
The desired output would be: if I want to get the dataPath of module id1, then I should get
/abc/def/xyz
Or, for the second module id, say id2, I should get the dataPath values separated by a comma:
/abc/def/xyz, /abc/def/xyz
My second approach to grep the dataPath would be to replace the newline characters between <Module and </Module> only; then I could use grep.
-m1 tells grep to exit after first matching line, that's why it prints only one line of output.
I wouldn't use a line-oriented tool for this, though. There are more convenient tools out there for parsing XML, such as xmlstarlet:
xml sel -t -m '//@dataPath' -v . -n file.xml
Firstly, my answer assumes that you have actual well-formed source XML. The example you've provided doesn't have a root element - but I'll assume there is one anyway.
Bash features by themselves are not very well suited parsing XML.
This renowned Bash FAQ states the following:
Do not attempt [to extract data from an XML file] with sed, awk, grep, and so on (it leads to undesired results)
If you must use a shell script then utilize an XML specific command line tool, such as XMLStarlet or xsltproc. Refer to the download info here for XML Starlet if you don't have it installed already.
Solution:
Given your source XML and your desired output consider utilizing the following xslt template to achieve this.
template.xsl
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:template match="node()|#*">
<xsl:apply-templates select="node()|#*"/>
</xsl:template>
<xsl:template match="Module">
<xsl:choose>
<xsl:when test="#dataPath and not(descendant::*/#dataPath)">
<xsl:value-of select="#dataPath"/>
<xsl:text>
</xsl:text>
</xsl:when>
<xsl:when test="not(#dataPath) and descendant::*/#dataPath">
<xsl:for-each select="descendant::*/#dataPath">
<xsl:value-of select="."/>
<xsl:if test="position()!=last()">
<xsl:text>, </xsl:text>
</xsl:if>
</xsl:for-each>
<xsl:text>
</xsl:text>
</xsl:when>
<xsl:when test="#dataPath and descendant::*/#dataPath">
<xsl:value-of select="#dataPath"/>
<xsl:text>, </xsl:text>
<xsl:for-each select="descendant::*/#dataPath">
<xsl:value-of select="."/>
<xsl:if test="position()!=last()">
<xsl:text>, </xsl:text>
</xsl:if>
</xsl:for-each>
<xsl:text>
</xsl:text>
</xsl:when>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
Then run either;
the following XML Starlet command:
$ xml tr /path/to/template.xsl /path/to/input.xml
Or the following xsltproc command:
$ xsltproc /path/to/template.xsl /path/to/input.xml
Note: The pathnames to template.xsl and input.xml in the aforementioned command(s) should be redefined to wherever those files reside.
Either of the commands above essentially transforms your input.xml file and prints the desired results.
Demo:
Using the following input.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<Module dataPath="/abc/def/1" handler="DataRegistry" id="id1" path="test.so"/>
<Module id="id2" path="/my/file/path">
<Config>
<Source cutoffpackage="1" dailyStart="20060819" dataPath="/abc/def/2" />
<Source cutoffpackage="1" dailyStart="20060819" dataPath="/abc/def/3" id="V2"/>
</Config>
</Module>
<Module id="id3" path="/my/file/path" dataPath="/abc/def/4">
<Config>
<Source cutoffpackage="1" dailyStart="20060819" dataPath="/abc/def/5" />
<Source cutoffpackage="1" dailyStart="20060819" dataPath="/abc/def/6" id="V2"/>
</Config>
</Module>
<Module id="id4" path="/my/file/path" dataPath="/abc/def/7"/>
<Module id="id5" path="/my/file/path" dataPath="/abc/def/8"/>
<!-- The following <Module>'s have no associated `dataPath` attribute -->
<Module id="id6">
<Config>
<Source cutoffpackage="1" dailyStart="20060819" id="V2"/>
</Config>
</Module>
<Module id="id7"/>
</root>
Then running either of the aforementioned commands prints the following result:
/abc/def/1
/abc/def/2, /abc/def/3
/abc/def/4, /abc/def/5, /abc/def/6
/abc/def/7
/abc/def/8
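If only a single module's values are needed, a scoped XMLStarlet query is a lighter-weight alternative (a sketch; id2 and input.xml are placeholders, and the values print one per line rather than comma separated):
xml sel -t -m '//Module[@id="id2"]/descendant-or-self::*/@dataPath' -v . -n input.xml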
Additional Note:
If you wanted to avoid the use of a separate .xsl file you could inline the aforementioned XSLT template in your shell script as follows:
script.sh
#!/usr/bin/env bash

xslt() {
cat <<EOX
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="node()|@*">
    <xsl:apply-templates select="node()|@*"/>
  </xsl:template>
  <xsl:template match="Module">
    <xsl:choose>
      <xsl:when test="@dataPath and not(descendant::*/@dataPath)">
        <xsl:value-of select="@dataPath"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:when>
      <xsl:when test="not(@dataPath) and descendant::*/@dataPath">
        <xsl:for-each select="descendant::*/@dataPath">
          <xsl:value-of select="."/>
          <xsl:if test="position()!=last()">
            <xsl:text>, </xsl:text>
          </xsl:if>
        </xsl:for-each>
        <xsl:text>&#10;</xsl:text>
      </xsl:when>
      <xsl:when test="@dataPath and descendant::*/@dataPath">
        <xsl:value-of select="@dataPath"/>
        <xsl:text>, </xsl:text>
        <xsl:for-each select="descendant::*/@dataPath">
          <xsl:value-of select="."/>
          <xsl:if test="position()!=last()">
            <xsl:text>, </xsl:text>
          </xsl:if>
        </xsl:for-each>
        <xsl:text>&#10;</xsl:text>
      </xsl:when>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
EOX
}

# 1. Using XML Starlet
xml tr <(xslt) /path/to/input.xml

# 2. Or using xsltproc
xsltproc <(xslt) - </path/to/input.xml
Note: The pathname to your input.xml, (i.e. the /path/to/input.xml part in script.sh above), should again be redefined to wherever that file resides.

Running a shell script with multithreading

I have a shell script that splits XML files, but there are one million XML files in the customer environment and the script runs slowly. Could it run in a multithreaded mode?
Thanks!
My shell script:
#!/bin/sh
File=/home/spark/PktLog
count=0
startLine=(`sed -n -e '/?xml version="1.0" encoding/=' $File`)
fileEnd=`sed -n '$=' $File`
endLine=(`echo ${startLine[*]} | awk -v a=$fileEnd '{for(i=2;i<=NF;i++) printf("%d ",$i-1);print a}'`)
let maxIndex=${#startLine[@]}-1
for n in `seq 0 $maxIndex`
do
    sed -n "${startLine[$n]},${endLine[$n]}p" $File >result_${n}.xml
done
echo ${startLine[@]}
Your method is very slow because it reads the input file many times.
Instead of trying to make it faster with multithreading, you should rewrite the script to only read the input file one time.
Here is an example input file:
$ cat testfile
<?xml version="1.0" encoding="UTF-8"?>
<test>
<some data />
</test>
<?xml version="1.0" encoding="UTF-8"?>
<test>
<more />
<data />
</test>
<?xml version="1.0" encoding="UTF-8"?>
<test>
<more type="data" />
</test>
Here is an awk command that reads the file one time, and writes each document to a separate file:
$ awk 'BEGIN { file="/dev/null"; n=0; }
/xml version="1.0" encoding/ {
close(file);
file="file" ++n ".xml";
}
{print > file;}' testfile
Here is the result:
$ cat file1.xml
<?xml version="1.0" encoding="UTF-8"?>
<test>
<some data />
</test>
$ cat file2.xml
<?xml version="1.0" encoding="UTF-8"?>
<test>
<more />
<data />
</test>
This is much faster:
$ grep -c 'xml version' PktLog
3000
$ time ./yourscript
real 0m9.791s
user 0m6.849s
sys 0m2.660s
$ time ./thisscript
real 0m0.248s
user 0m0.130s
sys 0m0.107s
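To address the original multithreading question for a large number of files, the single-pass splitter can simply be run over many inputs at once. A hedged sketch using GNU xargs -P (the ./input directory, the PktLog* name pattern and the parallelism of 4 are assumptions; output documents are numbered per input file):
find ./input -type f -name 'PktLog*' -print0 |
    xargs -0 -P 4 -n 1 awk '
        BEGIN { file = "/dev/null"; n = 0 }
        /xml version="1.0" encoding/ { close(file); file = FILENAME "_" ++n ".xml" }
        { print > file }
    '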

How to order strings according to some field

<scene name="scene_1_Overview" title="1 Overview" onstart="" thumburl="panos/1_Overview.tiles/thumb.jpg" lat="" lng="" heading="">
abc
</scene>
<scene name="scene_1_Overview" title="10 Overview" onstart="" thumburl="panos/1_Overview.tiles/thumb.jpg" lat="" lng="" heading="">
abc
</scene>
<scene name="scene_10_Room_Balcony_View" title="2 Room Balcony View" onstart="" thumburl="panos/10_Room_Balcony_View.tiles/thumb.jpg" lat="" lng="" heading="">
abc
def
</scene>
Say I have an XML file like the one above.
Now I need to put the three elements in order according to the numbers following title=, which are 1, 10 and 2.
I'm considering using a bash script to do this.
I can use something like awk '{print $3}' test | awk -F "\"" '{print $2}' to get the three numbers, but I don't know how to read the multiple lines from each <scene to </scene>, put them in order and overwrite the file.
I think doing this in awk is not the greatest idea, but I know what it's like being stuck on a box where you lack access to install anything. If you are stuck with it then something like the following awk script should get you in the ballpark.
awk -F"[\" ]" '$0~/title/{title=$6} {scene[title]=scene[title]$0"\n"} END{PROCINFO["sorted_in"]="#ind_num_asc"; for (title in scene) {print scene[title]}}' inFile
Here awk is:
Splitting each line on either a double quote or a space (-F"[\" ]")
If the line contains the word "title" ($0~/title/), setting the variable title to whatever it finds in field 6 (title=$6;). This might change if your "name" contains spaces, since we are splitting on them, so you might have to monkey with the delimiters.
Next it stores the contents of the line, followed by a linefeed, in the array scene at the index given by the number stored in title ({scene[title]=scene[title]$0"\n"})
Once it's done processing the file it sets PROCINFO["sorted_in"] to "@ind_num_asc", which tells awk to loop through the array by index while forcing the index to act as a number (END{PROCINFO["sorted_in"]="@ind_num_asc")
Then we loop through the array and print each element (for (title in scene) {print scene[title]})
Minimized a bit:
awk -F"[\" ]" '$0~/title/{t=$6}{s[t]=s[t]$0"\n"}END{PROCINFO["sorted_in"]="#ind_num_asc";for(t in s)print s[t]}' inFile
Using xsltproc
$ xsltproc sort.xslt scenes.xml
sort.xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" />
<xsl:strip-space elements="*" />
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()">
<xsl:sort select="scene/#title" />
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Otherwise, the following Perl one-liner:
perl -e '
    undef $/;                                # slurp the whole file at once
    print "$_\n" for sort {                  # sort the <scene>...</scene> blocks...
        ($a =~ /title="(.*?)"/ms)[0] cmp ($b =~ /title="(.*?)"/ms)[0]   # ...by their title attribute (string comparison)
    } <> =~ /<scene[ >].*?<\/scene>/gms      # extract every scene block from the input
' scenes.xml

Replace node via shell script

I have the below snippet of XML in my code base:
<property name="myData">
<map>
<entry key="/mycompany/abc">
<value>Mike</value>
</entry>
<entry key="/mycompany/pqr">
<value>John</value>
</entry>
<entry key="/mycompany/xyz">
<value>Sara</value>
</entry>
</map>
</property>
The above snippet is just a portion of the XML file. I have an existing shell script that replaces some of the data in the above file.
Now I need to modify my existing shell script to comment out the section as shown below:
<!-- entry key="/mycompany/abc">
<value>Mike</value>
</entry>
<entry key="/mycompany/pqr">
<value>John</value>
</entry -->
Is it possible to comment out the above 2 entries via a shell script? I can replace the opening <entry key="/mycompany/abc"> with <!-- entry key="/mycompany/abc"> since there is only one such unique occurrence, but I'm not able to replace the </entry> closing tag of the /mycompany/pqr node, since all occurrences would get replaced if I tried to replace it with </entry -->.
Any idea how to replace this closing tag in a shell script?
Thanks!
Using an xslt stylesheet like this:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output omit-xml-declaration="no"/>
<xsl:template match="node()|#*">
<xsl:copy>
<xsl:apply-templates select="node()|#*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/property/map/entry[#key='/mycompany/abc']"/>
<xsl:template match="/property/map/entry[#key='/mycompany/pqr']"/>
</xsl:stylesheet>
Then using the xsltproc xsl processor via the shell script:
$ xsltproc fix.xslt document.xml
which will give you:
<?xml version="1.0"?>
<property name="myData">
<map>
<entry key="/mycompany/xyz">
<value>Sara</value>
</entry>
</map>
</property>
If you really need those nodes commented out then my xslt-foo is not strong enough - you'll probably need <xsl:comment>.
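For the literal commenting shown in the question, one shell-level option is a sed range expression, which scopes the closing-tag replacement to the pqr block only, leaving the other </entry> tags untouched (a hedged sketch against the sample above; test it before applying it to the real file):
sed -e 's|<entry key="/mycompany/abc">|<!-- entry key="/mycompany/abc">|' \
    -e '\|/mycompany/pqr|,\|</entry>|s|</entry>|</entry -->|' document.xml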
EDIT: A solution with awk:
awk '/<entry key="\/mycompany\/(abc|pqr)">/,/<\/entry>/ {p=1}; /.*/{ if(p==0) {print;}; p=0 }' blah.xml
Result:
<property name="myData">
<map>
<entry key="/mycompany/xyz">
<value>Sara</value>
</entry>
</map>
</property>
Please note that the awk version will not work correctly with nested tags.
-nick
Disclaimer: I think using awk/sed/... on XML files is a bad idea; if the formatting changes, the line numbers between your tags differ and you end up with a bung XML file.
# GNU awk script (uses gensub)
BEGIN{
    count=-6                 # sentinel so that NR never equals count+5 before the first match
}
{
    # print every line untouched unless it opens the block to be commented out
    # or it is the closing tag five lines after that opening
    if( $0 !~ /\/mycompany\/abc/ && NR != count+5){
        print $0
        next
    }
    if( $0 ~ /\/mycompany\/abc/) {
        # remember where the block starts and open the comment: <entry ...> becomes <!-- entry ...>
        count=NR;
        print gensub( /(entry)/, "!-- \\1", "1" )
    }else{
        # five lines later (the pqr entry's closing tag in this layout): </entry> becomes </entry -->
        print gensub( /(entry)/, "\\1 --", "1" )
    }
}
Save as "something.awk", run like so:
awk -f something.awk your_file.xml
