Linux shell: sed replace value in XML

I have an XML file and want to replace the text inside the <jdbcurl> tag with another value, but there are two <jdbcurl> tags, each nested in a pool with a different id.
Could anyone show me how to do this with sed?
Thanks.
<?xml version="1.0" ?>
<WEBServer fileName="webdb.xml" name="Configuration and Security File">
<security>
<pool id="DEFAULT" jndiName="jdbc/webdb">
<dbschema></dbschema>
<userID>DBUSER</userID>
<password>passwd1</password>
<jdbcdriver>oracle.jdbc.driver.OracleDriver</jdbcdriver>
<jdbcurl>jdbc:oracle:thin:@db.server.com:1753/ORCSN</jdbcurl>
</pool>
<pool id="bi_id" jndiName="jdbc/bidb">
<dbschema></dbschema>
<userID>BIUSER</userID>
<password>passwd2</password>
<jdbcdriver>oracle.jdbc.driver.OracleDriver</jdbcdriver>
<jdbcurl>jdbc:oracle:thin:@db.server.com:1753/ORCSN</jdbcurl>
</pool>
</security>
</WEBServer>

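sed -E '/bi_id/,/pool/ s/jdbc:[^<]*/you will replace/g' filename
This replaces the jdbcurl value only in the pool with id="bi_id": the address range limits the substitution to the lines from the one matching bi_id through the next line matching pool, which is the closing </pool>.
sed -E '/DEFAULT/,/pool/ s/jdbc:[^<]*/you will replace/g' filename
This does the same for the DEFAULT pool's jdbcurl.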
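One caveat with the commands above: a real JDBC URL contains a slash, which clashes with / as the s delimiter. The following is a minimal sketch, not a definitive solution, that uses | as the delimiter and anchors the range on the closing </pool> tag. The replacement URL in new_url is a made-up value, the sample is trimmed to the relevant lines, and the approach assumes one element per line and a replacement without |, & or backslashes.

```shell
#!/bin/sh
# Hypothetical replacement URL; it must not contain '|', '&' or '\'.
new_url='jdbc:oracle:thin:@newdb.example.com:1521/NEWSN'

# Restrict the substitution to the lines from <pool id="bi_id" ...> through
# the next </pool>, and use '|' as the s delimiter because the URL has '/'.
result=$(sed -E '/<pool id="bi_id"/,/<\/pool>/ s|(<jdbcurl>)[^<]*|\1'"$new_url"'|' <<'EOF'
<pool id="DEFAULT" jndiName="jdbc/webdb">
<jdbcurl>jdbc:oracle:thin:@db.server.com:1753/ORCSN</jdbcurl>
</pool>
<pool id="bi_id" jndiName="jdbc/bidb">
<jdbcurl>jdbc:oracle:thin:@db.server.com:1753/ORCSN</jdbcurl>
</pool>
EOF
)
printf '%s\n' "$result"
```

Only the jdbcurl line inside the bi_id pool should change; the DEFAULT pool is outside the address range and is left alone.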

With xmlstarlet:
xmlstarlet edit --update '//WEBServer/security/pool[@id="DEFAULT"]/jdbcurl' --value 'XYZ' file.xml
Output:
<?xml version="1.0"?>
<WEBServer fileName="webdb.xml" name="Configuration and Security File">
<security>
<pool id="DEFAULT" jndiName="jdbc/webdb">
<dbschema/>
<userID>DBUSER</userID>
<password>passwd1</password>
<jdbcdriver>oracle.jdbc.driver.OracleDriver</jdbcdriver>
<jdbcurl>XYZ</jdbcurl>
</pool>
<pool id="bi_id" jndiName="jdbc/bidb">
<dbschema/>
<userID>BIUSER</userID>
<password>passwd2</password>
<jdbcdriver>oracle.jdbc.driver.OracleDriver</jdbcdriver>
<jdbcurl>jdbc:oracle:thin:@db.server.com:1753/ORCSN</jdbcurl>
</pool>
</security>
</WEBServer>

Related

Shell script Multithreading running

I have a shell script for splitting XML files, but there are one million XML files in the customer environment and the script runs slowly. Can it run in a multithreaded mode?
Thanks!
my shell script:
#!/bin/bash
File=/home/spark/PktLog
count=0
startLine=(`sed -n -e '/?xml version="1.0" encoding/=' $File`)
fileEnd=`sed -n '$=' $File`
endLine=(`echo ${startLine[*]} | awk -v a=$fileEnd '{for(i=2;i<=NF;i++) printf("%d ",$i-1);print a}'`)
let maxIndex=${#startLine[@]}-1
for n in `seq 0 $maxIndex`
do
sed -n "${startLine[$n]},${endLine[$n]}p" $File >result_${n}.xml
done
echo ${startLine[@]}
Your method is very slow because it reads the input file many times.
Instead of trying to make it faster with multithreading, you should rewrite the script to only read the input file one time.
Here is an example input file:
$ cat testfile
<?xml version="1.0" encoding="UTF-8"?>
<test>
<some data />
</test>
<?xml version="1.0" encoding="UTF-8"?>
<test>
<more />
<data />
</test>
<?xml version="1.0" encoding="UTF-8"?>
<test>
<more type="data" />
</test>
Here is an awk command that reads the file one time, and writes each document to a separate file:
$ awk 'BEGIN { file="/dev/null"; n=0; }
/xml version="1.0" encoding/ {
close(file);
file="file" ++n ".xml";
}
{print > file;}' testfile
Here is the result:
$ cat file1.xml
<?xml version="1.0" encoding="UTF-8"?>
<test>
<some data />
</test>
$ cat file2.xml
<?xml version="1.0" encoding="UTF-8"?>
<test>
<more />
<data />
</test>
This is much faster:
$ grep -c 'xml version' PktLog
3000
$ time ./yourscript
real 0m9.791s
user 0m6.849s
sys 0m2.660s
$ time ./thisscript
real 0m0.248s
user 0m0.130s
sys 0m0.107s

Groovy: Split one segment if condition of other segment matches

Could anyone please help me in the groovy code for this requirement. I have an XML input such as below:
<?xml version="1.0" encoding="UTF-8"?>
<result>
<records>
<dataProcessed>
<FieldName>Tesco</FieldName>
<Mode>As Is</Mode>
</dataProcessed>
<dataProcessed>
<FieldName>ASDA|Tesco|Walmart</FieldName>
<Mode>Split</Mode>
</dataProcessed>
</records>
<records>
<dataProcessed>
<FieldName>Orange|MTS</FieldName>
<Mode>Break</Mode>
</dataProcessed>
</records>
</result>
When the value of field Mode is either Split or Break, I need to split the segment on the pipe delimiter, and I need to change the value of field Mode to 1, 2, etc. according to the position of each piece. The expected output is:
<?xml version="1.0" encoding="UTF-8"?>
<result>
<records>
<dataProcessed>
<FieldName>Tesco</FieldName>
<Mode>As Is</Mode>
</dataProcessed>
<dataProcessed>
<FieldName>ASDA</FieldName>
<Mode>1</Mode>
</dataProcessed>
<dataProcessed>
<FieldName>Tesco</FieldName>
<Mode>2</Mode>
</dataProcessed>
<dataProcessed>
<FieldName>Walmart</FieldName>
<Mode>3</Mode>
</dataProcessed>
</records>
<records>
<dataProcessed>
<FieldName>Orange</FieldName>
<Mode>1</Mode>
</dataProcessed>
<dataProcessed>
<FieldName>MTS</FieldName>
<Mode>2</Mode>
</dataProcessed>
</records>
</result>
Loop through the dataProcessed nodes and then for each, check the value of Mode and act accordingly on the nodes.
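The loop described above can be sketched as follows. This is not Groovy and not a real XML parser: it is a line-oriented awk illustration of the same logic, and it assumes each tag sits on its own line as in the sample. The variable names (inside, name, mode, parts) and the helper emit are made up for the sketch, and the input is a shortened copy of the question's data.

```shell
#!/bin/sh
# Sketch only: buffer each dataProcessed block, then emit it once (As Is)
# or once per pipe-separated piece (Split / Break).
result=$(awk '
function emit(name, mode) {
    print "<dataProcessed>"
    print "<FieldName>" name "</FieldName>"
    print "<Mode>" mode "</Mode>"
    print "</dataProcessed>"
}
/<dataProcessed>/ { inside = 1; name = ""; mode = ""; next }
inside && /<FieldName>/ {
    name = $0
    sub(/.*<FieldName>/, "", name); sub(/<\/FieldName>.*/, "", name)
    next
}
inside && /<Mode>/ {
    mode = $0
    sub(/.*<Mode>/, "", mode); sub(/<\/Mode>.*/, "", mode)
    next
}
/<\/dataProcessed>/ {
    inside = 0
    if (mode == "Split" || mode == "Break") {
        # "[|]" is a bracket expression, so the pipe is taken literally.
        n = split(name, parts, "[|]")
        for (i = 1; i <= n; i++) emit(parts[i], i)
    } else {
        emit(name, mode)
    }
    next
}
{ print }
' <<'EOF'
<records>
<dataProcessed>
<FieldName>Tesco</FieldName>
<Mode>As Is</Mode>
</dataProcessed>
<dataProcessed>
<FieldName>ASDA|Tesco|Walmart</FieldName>
<Mode>Split</Mode>
</dataProcessed>
</records>
EOF
)
printf '%s\n' "$result"
```

In Groovy the same steps would operate on parsed nodes rather than on text lines, which is the more robust choice when the layout of the file can vary.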

how to delete line after specific pattern and extract something

UPDATE
This is my file:
<department name="/fighters" id="123879" group="channel" case="none" use="no">
<options index_name="index.html" listing="0" sum="no" allowed="no" />
<target prefix="ttp" suffix=".net" />
<type="effort">
<region="20491" readonly="fs1a" readwrite="fs1a" upload="yes" download="yes" repl="yes" hard="0" soft"0" prio="0" write="no" stage="yes" migrate="no" size="0" >
<read="content" readwrite="content" hard="215822106624" soft="237296943104" prio="5" write="yes" stage="yes" migrate="no" size="0" />
<overflow name="20491-set-writable" />
</replicate>
<region="20576" readonly="fs1a" readwrite="fs1a" upload="yes" download="yes" repl="yes" hard="0" soft"0" prio="0" write="no" stage="yes" migrate="no" size="0" >
<read="content" readwrite="content" hard="215822106624" soft="237296943104" prio="5" write="yes" stage="yes" migrate="no" size="0" />
<overflow name="20576-set-writable" />
</replicate>
</replication>
<user="T:106603" />
<user="T:123879" />
<user="test" />
<user="ele::123456" />
<user="company-temp" />
<user="companymw2" />
<user="bird" />
<user="coding11" />
<user="plazamedia" />
<allow go="123456=abcdefghijklmnopqrstuvwxyz" />
</department>
I wrote a bash command like:
awk < test.xml -Fuser= '{ print $2 }' | sed '/^$/d' | cut -d" " -f1
and result is something like:
"T:106603"
"T:123879"
"test"
"ele::123456"
"company-temp"
"companymw2"
"bird"
"coding11"
"plazamedia"
But imagine the result is:
"T:106603" />
"T:123879" />
"test" />
"ele::123456" />
"company-temp" />
"companymw2" />
"bird" />
"coding11" />
"plazamedia" />
First, how can I remove everything after the second "?
Second, how can I extract everything between the two "?
I would like to do it with sed or awk.
Thank you in advance.
Try this:
awk -F'"' '/<user=/{ print $2 }' file
Using only sed:
$ sed 's/^<user=\(.*"\).*/\1/' test.xml # With quotes
$ sed 's/^<user="\(.*\)".*/\1/' test.xml # Without quotes
Try this cut,
cut -d'"' -f 2 test.xml
Try this sed,
With quotes("):
sed 's/^.*\("[^"]\+"\).*/\1/g' test.xml
Without quotes("):
sed 's/^.*"\([^"]\+\)".*/\1/g' test.xml
UPDATE:
sed -e '/^<user/!{d}' -e '/^<user/s/^.*"\([^"]\+\)".*/\1/' test.xml
If you want to get rid of the sed and cut in the pipeline, there are many ways to do that, depending on what the corner cases are. The simplest to me would seem to be
awk -F'"' '/<user=/ { print "\"" $2 "\"" }' test.xml
As usual, here's the obligatory don't parse XML with regex link.
Slightly interesting corner cases would be if there can be quoted double quotes in the string (but usually XML would use entities instead) or if the elements can have multiple attributes. If there could be multiple <user=...> elements on a single line, this will quickly become more complex than the proper solution, which is to use XSLT.
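To illustrate the corner case of several <user=...> elements on one line, here is a small sketch that loops over every match with awk's match() instead of relying on field splitting. The input lines are invented for the example, and the approach still assumes values contain no embedded double quotes.

```shell
#!/bin/sh
# Sketch: print every quoted value that follows <user=, even when several
# such elements share one line.
result=$(awk '
{
    line = $0
    # match() sets RSTART and RLENGTH for the leftmost match in line.
    while (match(line, /<user="[^"]*"/)) {
        # Skip the 6 characters of <user= and keep the quoted value.
        print substr(line, RSTART + 6, RLENGTH - 6)
        # Continue searching in the rest of the line.
        line = substr(line, RSTART + RLENGTH)
    }
}' <<'EOF'
<user="test" /> <user="bird" />
<allow go="123456=abcdefghijklmnopqrstuvwxyz" />
<user="ele::123456" />
EOF
)
printf '%s\n' "$result"
```

Lines without a <user= element produce no output, so no separate filter is needed.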
Try :
$ awk '/<user=/ && gsub(/<user=|\/>/,x)' file
"T:106603"
"T:123879"
"test"
"ele::123456"
"company-temp"
"companymw2"
"bird"
"coding11"
"plazamedia"
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
Using gnu grep
grep -Po 'user=\K"[^"]*"' file

Grep, Sed... Awk to modify lines in a file

I've searched for this but I didn't find exactly what I'm looking for.
What I want to do is replace, in the same file, all the lines that contain a jar file with "PATTERN" in its name, so that I can add to those lines a new property pointing to the source code for those jar files. The source code for those jars is always in the same relative folder (i.e. path1/lib/a.jar -> path1/src/java)
An example of this. This is my original file:
<classpathentry path="/sde/ATG/ATG10.1.2/Momentum/Search/I18N/lib/MOM-Search-I18N-0.23-ECI.jar" />
<classpathentry path="/sde/ATG/ATG10.1.2/DAS/lib/axis-1.4.jar" />
<classpathentry path="/sde/ATG/ATG10.1.2/REST/lib/org.json.jar" />
<classpathentry path="/sde/ATG/ATG10.1.2/Momentum/StoreFront/lib/MOM-Search-I18N-Index-0.23-ECI.jar" />
And I want to get this:
<classpathentry path="/sde/ATG/ATG10.1.2/Momentum/Search/I18N/lib/MOM-Search-I18N-0.23-ECI.jar" sourcepath="/sde/ATG/ATG10.1.2/Momentum/Search/I18N/src/main/java" />
<classpathentry path="/sde/ATG/ATG10.1.2/DAS/lib/axis-1.4.jar" />
<classpathentry path="/sde/ATG/ATG10.1.2/REST/lib/org.json.jar" />
<classpathentry path="/sde/ATG/ATG10.1.2/Momentum/StoreFront/lib/MOM-Search-I18N-Index-0.23-ECI.jar" sourcepath="/sde/ATG/ATG10.1.2/Momentum/StoreFront/src/main/java"/>
I need to add a sourcepath attribute to the lines with my pattern and that sourcepath value should take the root of the path value.
As
grep -o 'path="[/-.0-9A-Za-z]*/lib/MOM[-.0-9A-Za-z]*.jar"' test.txt
gives me the lines that contain the jars I'm looking for, I thought that this would solve my problem:
cat test.txt | sed -r 's|path="[/-.0-9A-Za-z]*/lib/MOM[-.0-9A-Za-z]*.jar"|\1 sourcepath="\2/main/src/main/java"/>|'
But gives me this error: sed: -e expression #1, char 91: invalid reference \2 on `s' command's RHS
Any idea?
Thanks guys!
You could say:
sed 's| \(path=.*\)\(/lib\)\(/MOM[^ ]*\)| \1\2\3 source\1/src/main/java"|' inputfile
For your sample input, it'd produce:
<classpathentry path="/sde/ATG/ATG10.1.2/Momentum/Search/I18N/lib/MOM-Search-I18N-0.23-ECI.jar" sourcepath="/sde/ATG/ATG10.1.2/Momentum/Search/I18N/src/main/java" />
<classpathentry path="/sde/ATG/ATG10.1.2/DAS/lib/axis-1.4.jar" />
<classpathentry path="/sde/ATG/ATG10.1.2/REST/lib/org.json.jar" />
<classpathentry path="/sde/ATG/ATG10.1.2/Momentum/StoreFront/lib/MOM-Search-I18N-Index-0.23-ECI.jar" sourcepath="/sde/ATG/ATG10.1.2/Momentum/StoreFront/src/main/java" />
Use sed's basic search-replace functionality:
sed 's|\(PATTERN[^ ]*\)|\1 sourcepath="/sde/ATG/ATG10.1.2/path1/src/main/java"|'
$ cat injars.txt | sed -r 's%(^.* path="(.*)/lib/PATTERN-.*)/>%\1 sourcepath="\2/src/main/java"/>%'
Gives:
<classpathentry path="/sde/ATG/ATG10.1.2/path1/lib/jar1.jar" />
<classpathentry path="/sde/ATG/ATG10.1.2/path1/lib/PATTERN-jar1.jar" sourcepath="/sde/ATG/ATG10.1.2/path1/src/main/java"/>
<classpathentry path="/sde/ATG/ATG10.1.2/path2/lib/jar2.jar" />
<classpathentry path="/sde/ATG/ATG10.1.2/path2/lib/PATTERN-jar2.jar" sourcepath="/sde/ATG/ATG10.1.2/path2/src/main/java"/>
<classpathentry path="/sde/ATG/ATG10.1.2/path3/lib/PATTERN-jar3.jar" sourcepath="/sde/ATG/ATG10.1.2/path3/src/main/java"/>
<classpathentry path="/sde/ATG/ATG10.1.2/path3/lib/jar3.jar" />
<classpathentry path="/sde/ATG/ATG10.1.2/path4/lib/PATTERN-jar4.jar" sourcepath="/sde/ATG/ATG10.1.2/path4/src/main/java"/>
for injars.txt containing your input.
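As for the error in the question: sed reports an invalid reference when the replacement uses \1 or \2 but the pattern contains no matching parenthesised group. A minimal sketch of a grouped version for the MOM jars follows. It assumes the path attribute holds no embedded quotes, and it uses & to re-emit the whole match so only one group is needed. The two input lines are taken from the question.

```shell
#!/bin/sh
# Group 1 captures everything before /lib/; '&' stands for the entire match,
# so the original path attribute is kept and sourcepath is appended after it.
result=$(sed -E 's|path="([^"]*)/lib/MOM[^"/]*\.jar"|& sourcepath="\1/src/main/java"|' <<'EOF'
<classpathentry path="/sde/ATG/ATG10.1.2/Momentum/Search/I18N/lib/MOM-Search-I18N-0.23-ECI.jar" />
<classpathentry path="/sde/ATG/ATG10.1.2/DAS/lib/axis-1.4.jar" />
EOF
)
printf '%s\n' "$result"
```

Lines whose jar name does not start with MOM do not match and pass through unchanged.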

Replace node via shell script

I have the below snippet of XML from my code base:
<property name="myData">
<map>
<entry key="/mycompany/abc">
<value>Mike</value>
</entry>
<entry key="/mycompany/pqr">
<value>John</value>
</entry>
<entry key="/mycompany/xyz">
<value>Sara</value>
</entry>
</map>
</property>
The above snippet is just a portion of XML file. I've an existing shell script that replaces some of the data from the above file.
Now, I need to modify my existing shell script to comment the section as shown below:
<!-- entry key="/mycompany/abc">
<value>Mike</value>
</entry>
<entry key="/mycompany/pqr">
<value>John</value>
</entry -->
Is it possible to comment out the above 2 entries via shell script? I can replace the occurrence of <entry key="/mycompany/abc"> with <!-- entry key="/mycompany/abc"> since there is only one such occurrence, but I'm not able to replace the </entry> closing tag of the /mycompany/pqr node, since all occurrences get replaced if I try to replace it with </entry -->
Any idea on how to replace this closing node in shell script?
Thanks!
Using an xslt stylesheet like this:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output omit-xml-declaration="no"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/property/map/entry[@key='/mycompany/abc']"/>
<xsl:template match="/property/map/entry[@key='/mycompany/pqr']"/>
</xsl:stylesheet>
Then using the xsltproc xsl processor via the shell script:
$ xsltproc fix.xslt document.xml
which will give you:
<?xml version="1.0"?>
<property name="myData">
<map>
<entry key="/mycompany/xyz">
<value>Sara</value>
</entry>
</map>
</property>
If you really need those nodes commented out then my xslt-foo is not strong enough - you'll probably need <xsl:comment>.
EDIT: A solution with awk:
awk '/<entry key="\/mycompany\/(abc|pqr)">/,/<\/entry>/ {p=1}; /.*/{ if(p==0) {print;}; p=0 }' blah.xml
Result:
<property name="myData">
<map>
<entry key="/mycompany/xyz">
<value>Sara</value>
</entry>
</map>
</property>
Please note that the awk version will not work correctly with nested tags.
-nick
Disclaimer: I think of using awk/sed/... for XML files as a bad idea; if the formatting changes, the line-number between your tags differ, you end up with a bung XML file.
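# Requires GNU awk (gawk) because of gensub().
BEGIN{
count=-6   # negative, so NR never equals count+5 before the opening tag is seen
}
{
# Ordinary line: neither the first entry to comment out nor the line 5 below it.
if( $0 !~ /\/mycompany\/abc/ && NR != count+5){
print $0
next
}
if( $0 ~ /\/mycompany\/abc/) {
# Opening tag of the abc entry: remember its line number and open the comment.
count=NR;
print gensub( /(entry)/, "!-- \\1", "1" )
}else{
# Fifth line after it, the </entry> that closes pqr: close the comment.
print gensub( /(entry)/, "\\1 --", "1" )
}
}
Save as "something.awk", run like so:
gawk -f something.awk your_file.xml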
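The specific difficulty in the question, changing only the </entry> that closes the pqr node, can also be handled without counting lines by limiting the substitution to an address range. This is a sketch under the same line-oriented assumptions as the answers above (one tag per line, no nested entry elements), with the input shortened to the map from the question.

```shell
#!/bin/sh
# First expression: open the comment at the abc entry's start tag.
# Second expression: only between the pqr start tag and the next </entry>,
# turn that closing tag into the end of the comment.
result=$(sed -e 's|<entry key="/mycompany/abc">|<!-- entry key="/mycompany/abc">|' \
    -e '/<entry key="\/mycompany\/pqr">/,/<\/entry>/ s|</entry>|</entry -->|' <<'EOF'
<map>
<entry key="/mycompany/abc">
<value>Mike</value>
</entry>
<entry key="/mycompany/pqr">
<value>John</value>
</entry>
<entry key="/mycompany/xyz">
<value>Sara</value>
</entry>
</map>
EOF
)
printf '%s\n' "$result"
```

The closing tags of the abc and xyz entries fall outside the range, so they are left untouched.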
