Grep, Sed... Awk to modify lines in a file - string

I've searched for this but I didn't find exactly what I'm looking for.
What I want to do is replace, in the same file, all the lines that contain a jar file with "PATTERN" in its name, so that I can add to those lines a new property pointing to the source code for those jar files. The source code for those jars will always be in the same relative folder (i.e. path1/lib/a.jar -> path1/src/java)
An example of this. This is my original file:
<classpathentry path="/sde/ATG/ATG10.1.2/Momentum/Search/I18N/lib/MOM-Search-I18N-0.23-ECI.jar" />
<classpathentry path="/sde/ATG/ATG10.1.2/DAS/lib/axis-1.4.jar" />
<classpathentry path="/sde/ATG/ATG10.1.2/REST/lib/org.json.jar" />
<classpathentry path="/sde/ATG/ATG10.1.2/Momentum/StoreFront/lib/MOM-Search-I18N-Index-0.23-ECI.jar" />
And I want to get this:
<classpathentry path="/sde/ATG/ATG10.1.2/Momentum/Search/I18N/lib/MOM-Search-I18N-0.23-ECI.jar" sourcepath="/sde/ATG/ATG10.1.2/Momentum/Search/I18N/src/main/java" />
<classpathentry path="/sde/ATG/ATG10.1.2/DAS/lib/axis-1.4.jar" />
<classpathentry path="/sde/ATG/ATG10.1.2/REST/lib/org.json.jar" />
<classpathentry path="/sde/ATG/ATG10.1.2/Momentum/StoreFront/lib/MOM-Search-I18N-Index-0.23-ECI.jar" sourcepath="/sde/ATG/ATG10.1.2/Momentum/StoreFront/src/main/java"/>
I need to add a sourcepath attribute to the lines with my pattern and that sourcepath value should take the root of the path value.
As
grep -o 'path="[/-.0-9A-Za-z]*/lib/MOM[-.0-9A-Za-z]*.jar"' test.txt
gives me the lines that contain lines with the jars I'm looking for, I thought that this would solve my problem:
cat test.txt | sed -r 's|path="[/-.0-9A-Za-z]*/lib/MOM[-.0-9A-Za-z]*.jar"|\1 sourcepath="\2/main/src/main/java"/>|'
But it gives me this error: sed: -e expression #1, char 91: invalid reference \2 on `s' command's RHS
Any idea?
Thanks guys!

You could say:
sed 's| \(path=.*\)\(/lib\)\(/MOM[^ ]*\)| \1\2\3 source\1/src/main/java"|' inputfile
For your sample input, it'd produce:
<classpathentry path="/sde/ATG/ATG10.1.2/Momentum/Search/I18N/lib/MOM-Search-I18N-0.23-ECI.jar" sourcepath="/sde/ATG/ATG10.1.2/Momentum/Search/I18N/src/main/java" />
<classpathentry path="/sde/ATG/ATG10.1.2/DAS/lib/axis-1.4.jar" />
<classpathentry path="/sde/ATG/ATG10.1.2/REST/lib/org.json.jar" />
<classpathentry path="/sde/ATG/ATG10.1.2/Momentum/StoreFront/lib/MOM-Search-I18N-Index-0.23-ECI.jar" sourcepath="/sde/ATG/ATG10.1.2/Momentum/StoreFront/src/main/java" />
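The "invalid reference \2" error in the question comes from the expression defining no capture groups, so \1 and \2 have nothing to refer to. The command above works because it wraps the pieces it reuses in \( ... \). As a rough illustration of the same idea outside sed, here is a minimal Python sketch. The function name add_sourcepath and the group name "root" are made up for this example, and it assumes the "MOM" jar prefix and the src/main/java layout shown in the question.

```python
import re

# Assumed from the question: matching jars start with "MOM" and sit in a lib/ folder.
# The named group "root" (a hypothetical name) captures everything before /lib/.
ENTRY_RE = re.compile(r'path="(?P<root>[^"]*)/lib/MOM[^"/]*\.jar"')


def add_sourcepath(line):
    """Append a sourcepath attribute right after a matching path attribute."""
    def repl(match):
        # match.group(0) is the whole path="..." attribute, kept unchanged.
        return f'{match.group(0)} sourcepath="{match.group("root")}/src/main/java"'

    # Lines without a matching jar are returned untouched.
    return ENTRY_RE.sub(repl, line)
```

Calling add_sourcepath on each line of the file and writing the results back would give the in-place behaviour asked for; lines such as the axis-1.4.jar entry pass through unchanged.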

Use sed's basic search-replace functionality:
sed 's|\(PATTERN[^ ]*\)|\1 sourcepath="/sde/ATG/ATG10.1.2/path1/src/main/java"|'

$ cat injars.txt | sed -r 's%(^.* path="(.*)/lib/PATTERN-.*)/>%\1 sourcepath="\2/src/main/java"/>%'
Gives:
<classpathentry path="/sde/ATG/ATG10.1.2/path1/lib/jar1.jar" />
<classpathentry path="/sde/ATG/ATG10.1.2/path1/lib/PATTERN-jar1.jar" sourcepath="/sde/ATG/ATG10.1.2/path1/src/main/java"/>
<classpathentry path="/sde/ATG/ATG10.1.2/path2/lib/jar2.jar" />
<classpathentry path="/sde/ATG/ATG10.1.2/path2/lib/PATTERN-jar2.jar" sourcepath="/sde/ATG/ATG10.1.2/path2/src/main/java"/>
<classpathentry path="/sde/ATG/ATG10.1.2/path3/lib/PATTERN-jar3.jar" sourcepath="/sde/ATG/ATG10.1.2/path3/src/main/java"/>
<classpathentry path="/sde/ATG/ATG10.1.2/path3/lib/jar3.jar" />
<classpathentry path="/sde/ATG/ATG10.1.2/path4/lib/PATTERN-jar4.jar" sourcepath="/sde/ATG/ATG10.1.2/path4/src/main/java"/>
for injars.txt containing your input.

Related

Shell script Multithreading running

I have a shell script for splitting XML files, but there are one million XML files in the customer environment and the script runs slowly. Could it run in multithreaded mode?
Thanks!
my shell script:
#!/bin/bash
File=/home/spark/PktLog
count=0
startLine=(`sed -n -e '/?xml version="1.0" encoding/=' $File`)
fileEnd=`sed -n '$=' $File`
endLine=(`echo ${startLine[*]} | awk -v a=$fileEnd '{for(i=2;i<=NF;i++) printf("%d ",$i-1);print a}'`)
let maxIndex=${#startLine[@]}-1
for n in `seq 0 $maxIndex`
do
sed -n "${startLine[$n]},${endLine[$n]}p" $File >result_${n}.xml
done
echo ${startLine[@]}
Your method is very slow because it reads the input file many times.
Instead of trying to make it faster with multithreading, you should rewrite the script to only read the input file one time.
Here is an example input file:
$ cat testfile
<?xml version="1.0" encoding="UTF-8"?>
<test>
<some data />
</test>
<?xml version="1.0" encoding="UTF-8"?>
<test>
<more />
<data />
</test>
<?xml version="1.0" encoding="UTF-8"?>
<test>
<more type="data" />
</test>
Here is an awk command that reads the file one time, and writes each document to a separate file:
$ awk 'BEGIN { file="/dev/null"; n=0; }
/xml version="1.0" encoding/ {
close(file);
file="file" ++n ".xml";
}
{print > file;}' testfile
Here is the result:
$ cat file1.xml
<?xml version="1.0" encoding="UTF-8"?>
<test>
<some data />
</test>
$ cat file2.xml
<?xml version="1.0" encoding="UTF-8"?>
<test>
<more />
<data />
</test>
This is much faster:
$ grep -c 'xml version' PktLog
3000
$ time ./yourscript
real 0m9.791s
user 0m6.849s
sys 0m2.660s
$ time ./thisscript
real 0m0.248s
user 0m0.130s
sys 0m0.107s

linux shell: sed replace value in xml

I have an XML file, and I want to replace the text value in the <jdbcurl> tag with another value, but there are two tags named jdbcurl, nested in pools with different ids.
Can anyone do me a favor and solve it with sed?
Thanks.
<?xml version="1.0" ?>
<WEBServer fileName="webdb.xml" name="Configuration and Security File">
<security>
<pool id="DEFAULT" jndiName="jdbc/webdb">
<dbschema></dbschema>
<userID>DBUSER</userID>
<password>passwd1</password>
<jdbcdriver>oracle.jdbc.driver.OracleDriver</jdbcdriver>
<jdbcurl>jdbc:oracle:thin:@db.server.com:1753/ORCSN</jdbcurl>
</pool>
<pool id="bi_id" jndiName="jdbc/bidb">
<dbschema></dbschema>
<userID>BIUSER</userID>
<password>passwd2</password>
<jdbcdriver>oracle.jdbc.driver.OracleDriver</jdbcdriver>
<jdbcurl>jdbc:oracle:thin:@db.server.com:1753/ORCSN</jdbcurl>
</pool>
</security>
</WEBServer>
sed -E '/bi_id/,/pool/ s/jdbc:[^<]*/you will replace/g' filename
This one replaces the jdbcurl value in the pool with id="bi_id".
sed -E '/DEFAULT/,/pool/ s/jdbc:[^<]*/you will replace/g' filename
This one is for the DEFAULT pool's jdbcurl.
With xmlstarlet:
xmlstarlet edit --update '//WEBServer/security/pool[@id="DEFAULT"]/jdbcurl' --value 'XYZ' file.xml
Output:
<?xml version="1.0"?>
<WEBServer fileName="webdb.xml" name="Configuration and Security File">
<security>
<pool id="DEFAULT" jndiName="jdbc/webdb">
<dbschema/>
<userID>DBUSER</userID>
<password>passwd1</password>
<jdbcdriver>oracle.jdbc.driver.OracleDriver</jdbcdriver>
<jdbcurl>XYZ</jdbcurl>
</pool>
<pool id="bi_id" jndiName="jdbc/bidb">
<dbschema/>
<userID>BIUSER</userID>
<password>passwd2</password>
<jdbcdriver>oracle.jdbc.driver.OracleDriver</jdbcdriver>
<jdbcurl>jdbc:oracle:thin:@db.server.com:1753/ORCSN</jdbcurl>
</pool>
</security>
</WEBServer>
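If xmlstarlet is not installed, a similar parser-based update can be sketched with Python's standard library. This is a minimal sketch, not a drop-in replacement: the function name set_jdbcurl is hypothetical, and it assumes the security/pool/jdbcurl layout shown in the question.

```python
import xml.etree.ElementTree as ET


def set_jdbcurl(xml_text, pool_id, new_url):
    """Return the document with the jdbcurl of one pool replaced."""
    root = ET.fromstring(xml_text)
    # Assumed layout from the question: <WEBServer>/<security>/<pool id="...">.
    for pool in root.iterfind("./security/pool"):
        if pool.get("id") == pool_id:
            node = pool.find("jdbcurl")
            if node is not None:
                node.text = new_url
                return ET.tostring(root, encoding="unicode")
    raise ValueError(f"no jdbcurl found for pool id {pool_id!r}")
```

Because the pool is selected by its id attribute rather than by line position, the other pool's jdbcurl is left alone even if the two pools are reordered or reformatted.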

How to insert the content of a file into another file before a pattern

I have a file Afile :
<start>
<memory>
<hdd>10</hdd>
<hdc>40</hdc>
</memory>
<storage>
<disk>
<disk1>firstname</disk1>
</disk>
<disk>
<disk1>secondname</disk1>
</disk>
<map>
<code>1</code>
</map>
<map>
<code>2</code>
</map>
</storage>
</start>
I have the second file Bfile:
<disk>
<disk1>thirdname</disk1>
</disk>
How can I insert the content of Bfile into Afile using sed? In the end I need to have the following file:
<start>
<memory>
<hdd>10</hdd>
<hdc>40</hdc>
</memory>
<storage>
<disk>
<disk1>firstname</disk1>
</disk>
<disk>
<disk1>secondname</disk1>
</disk>
<disk>
<disk1>thirdname</disk1>
</disk>
<map>
<code>1</code>
</map>
<map>
<code>2</code>
</map>
</storage>
</start>
So it should be inserted after the last pattern. When I use the following command I get the following result:
sed -e '/disk>/rBfile' Afile
<start>
<memory>
<hdd>10</hdd>
<hdc>40</hdc>
</memory>
<storage>
<disk>
<disk1>firstname</disk1>
</disk>
<disk>
<disk1>thirdname</disk1>
</disk>
<disk>
<disk1>secondname</disk1>
</disk>
<disk>
<disk1>thirdname</disk1>
</disk>
<map>
<code>1</code>
</map>
<map>
<code>2</code>
</map>
</storage>
</start>
So it puts the content of Bfile after each occurrence of "disk>". I need just the last occurrence. How do I change the command?
XML (like structured data in general) shouldn't be handled with plain-text tools like awk and sed except in very special cases, because nobody expects tools that process XML to break when newlines move or spaces are inserted or removed in benign places.
Instead, I'd use Python, which has an XML parser in its standard library:
#!/usr/bin/python
import sys
import xml.etree.ElementTree as ET

# File names are taken from the command-line arguments.
target = ET.parse(sys.argv[1])
insert = ET.parse(sys.argv[2])

# Interesting part here:
target.getroot().find("./storage").append(insert.getroot())

# To write to a file, use target.write('output.xml')
ET.dump(target)
Call that as
python foobar.py fileA fileB
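Note that append() places the new element at the very end of <storage>, which is after the <map> elements in the edited version of the question. A minimal sketch of inserting right after the last <disk> child instead could look like this; the helper name insert_after_last is made up for illustration.

```python
import sys
import xml.etree.ElementTree as ET


def insert_after_last(parent, tag, new_child):
    """Insert new_child after the last direct child of parent named tag."""
    positions = [i for i, child in enumerate(parent) if child.tag == tag]
    if positions:
        parent.insert(positions[-1] + 1, new_child)
    else:
        # No such child yet: fall back to appending at the end.
        parent.append(new_child)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Same calling convention as the script above: fileA, then fileB.
    target = ET.parse(sys.argv[1])
    insert = ET.parse(sys.argv[2])
    insert_after_last(target.getroot().find("./storage"), "disk", insert.getroot())
    ET.dump(target)
```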
I didn't manage to do that in a single line, so I made a sed script. The problem is that the r command will not work if there are characters after the file name, so it needs to be on its own line.
#!/bin/sed -f
/<\/disk>/{
:a
n
s/disk/disk/
t a
h
r bbb
g
N
}
You can then call it like this :
sed -f sedscript Afile
if limited by storage (first sample given)
sed '\#</storage># {r Bfile
N;} ' Afile
if last disk in storage (like this edited version of the request)
sed '1;\#<storage>#{1h;1!H
\#<storage># {g
s#^\(.*\n</disk>\).*#\1#p
r Bfile
G;N
s/^\(.*\)\1\(.*\)/\2/
}
}' Afile
Normally a sed script moves on to the next line after an r action (and does not run the rest of the script for this line), but with an N after it, it continues AND keeps the line in the buffer for the following actions (in this case together with the next line).
So this only works IF there is a line after storage (a test could be added beforehand with an if/then/else action in this case).
Just to add some examples using AWK.
Assuming that we have:
afile:
<start>
<memory>
<hdd>10</hdd>
<hdc>40</hdc>
</memory>
<storage>
<disk>
<disk1>firstname</disk1>
</disk>
<disk>
<disk1>secondname</disk1>
</disk>
</storage>
</start>
and bfile:
<disk>
<disk1>thirdname</disk1>
</disk>
AWK using </storage> tag as reference:
awk '/^<\/storage>/{while(getline line<"bfile"){print line};print;next}1' afile
That will result in:
<start>
<memory>
<hdd>10</hdd>
<hdc>40</hdc>
</memory>
<storage>
<disk>
<disk1>firstname</disk1>
</disk>
<disk>
<disk1>secondname</disk1>
</disk>
<disk>
<disk1>thirdname</disk1>
</disk>
</storage>
</start>
But in case you REALLY need to look for </disk>, I would do something like:
awk -v n=4 '{print;}/<\/disk1>$/,/^<\/disk>/{m++}(m==n){n=0;while(getline l<"bfile"){print l}}' afile
In addition, you can also use xmllint to format the output for you:
awk -v n=4 '{print;}/<\/disk1>$/,/^<\/disk>/{m++}(m==n){n=0;while(getline l<"bfile"){print l}}' afile | xmllint --format --recover -
That will result in:
<start>
<memory>
<hdd>10</hdd>
<hdc>40</hdc>
</memory>
<storage>
<disk>
<disk1>firstname</disk1>
</disk>
<disk>
<disk1>secondname</disk1>
</disk>
<disk>
<disk1>thirdname</disk1>
</disk>
</storage>
</start>
If ed is an option (if the input file is not too big), it would be easier :
echo '/map/-1 r Bfile
wq' | ed Afile
This might work for you (GNU sed):
sed -e '/<disk>/,${/<disk>/,/<\/disk>/b;ecat fileb' -e ':a;n;ba}' filea
This restricts the sed commands to the lines from the first <disk> to the end of the file. Within this range, all complete <disk>...</disk> elements are printed as usual. The line that follows is where the file is to be inserted, and using the sed evaluate command (e) the file is inserted immediately (rather than using the r command, which inserts the file after the current pattern space). The rest of the file is then printed using a simple loop.
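The "only the last occurrence" requirement is easier to see when the whole file is held in memory. As a line-based sketch in Python (with the same fragility as any plain-text approach to XML), assuming the closing tag appears as the literal text </disk>; the function name insert_after_last_match is hypothetical:

```python
def insert_after_last_match(lines, extra, needle="</disk>"):
    """Return lines with extra inserted after the last line containing needle."""
    last = None
    for index, line in enumerate(lines):
        if needle in line:
            last = index  # keep overwriting so the final match wins
    if last is None:
        # Nothing matched: return the input unchanged.
        return list(lines)
    return list(lines[:last + 1]) + list(extra) + list(lines[last + 1:])
```

Reading Afile and Bfile with readlines() and passing the two lists to this function gives the merged line list, which can then be written back out.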

vim, how to search text and prepend new line before searched line

Input :
<Action name="Compile" />
<Action name="Debug" />
Output:
<Action name="Parse" />
<Action name="Compile" />
<Action name="Debug" />
Using vim, how can I search for a line that contains the word "Compile" and prepend another line before it?
I tried :%s but it is not replacing the entire line
I would do it the Prince Goulash's way but…
:%s/^.*Compile/<Action name="Parse" \/>\r&
^.*Compile " matches everything from the first char on the line
" up to and including 'Compile'
<Action name="Parse" \/>\r& " replaces the match with the new desired line,
" followed by a newline, followed by the match
or the very elegant…
:g/Compile/t-|s//Parse
:g/Compile/t- " copies the matching line above itself
:s//Parse " substitutes the last search pattern with 'Parse'
" on that new line
You can use the :g command to search for lines, and then supply a :normal! command on each matched line:
:g/Compile/normal! O<Action name="Parse" />
The ! ensures no user mappings are invoked in the :normal call.
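Outside the editor, the effect of the :g commands above (visit every matching line and put a new line before it) can be sketched in a few lines of Python. The function name prepend_before_matches is invented for this illustration.

```python
def prepend_before_matches(lines, needle, new_line):
    """Return lines with new_line placed before every line containing needle."""
    result = []
    for line in lines:
        if needle in line:
            result.append(new_line)
        result.append(line)
    return result
```

Like :g, this acts on every matching line, so a file with two "Compile" lines would get two new lines.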

how to delete line after specific pattern and extract something

UPDATE
This is my file:
<department name="/fighters" id="123879" group="channel" case="none" use="no">
<options index_name="index.html" listing="0" sum="no" allowed="no" />
<target prefix="ttp" suffix=".net" />
<type="effort">
<region="20491" readonly="fs1a" readwrite="fs1a" upload="yes" download="yes" repl="yes" hard="0" soft"0" prio="0" write="no" stage="yes" migrate="no" size="0" >
<read="content" readwrite="content" hard="215822106624" soft="237296943104" prio="5" write="yes" stage="yes" migrate="no" size="0" />
<overflow name="20491-set-writable" />
</replicate>
<region="20576" readonly="fs1a" readwrite="fs1a" upload="yes" download="yes" repl="yes" hard="0" soft"0" prio="0" write="no" stage="yes" migrate="no" size="0" >
<read="content" readwrite="content" hard="215822106624" soft="237296943104" prio="5" write="yes" stage="yes" migrate="no" size="0" />
<overflow name="20576-set-writable" />
</replicate>
</replication>
<user="T:106603" />
<user="T:123879" />
<user="test" />
<user="ele::123456" />
<user="company-temp" />
<user="companymw2" />
<user="bird" />
<user="coding11" />
<user="plazamedia" />
<allow go="123456=abcdefghijklmnopqrstuvwxyz" />
</department>
I wrote a bash command like:
awk < test.xml -Fuser= '{ print $2 }' | sed '/^$/d' | cut -d" " -f1
and result is something like:
"T:106603"
"T:123879"
"test"
"ele::123456"
"company-temp"
"companymw2"
"bird"
"coding11"
"plazamedia"
But imagine the result is:
"T:106603" />
"T:123879" />
"test" />
"ele::123456" />
"company-temp" />
"companymw2" />
"bird" />
"coding11" />
"plazamedia" />
First, how can I remove everything after the second "?
Second, how can I extract everything between the " "?
I'd like to do it with sed or awk.
Thank you in advance
Try this:
awk -F'"' '/<user=/{ print $2 }' file
Using only sed:
$ sed 's/^<user=\(.*"\).*/\1/' test.xml # With quotes
$ sed 's/^<user="\(.*\)".*/\1/' test.xml # Without quotes
Try this cut,
cut -d'"' -f 2 test.xml
Try this sed,
With quotes("):
sed 's/^.*\("[^"]\+"\).*/\1/g' test.xml
Without quotes("):
sed 's/^.*"\([^"]\+\)".*/\1/g' test.xml
UPDATE:
sed -e '/^<user/!{d}' -e '/^<user/s/^.*"\([^"]\+\)".*/\1/' test.xml
If you want to get rid of the sed and cut in the pipeline, there are many ways to do that, depending on what the corner cases are. The simplest to me would seem to be
awk -F'"' '/<user=/ { print "\"" $2 "\"" }' test.xml
As usual, here's the obligatory don't parse XML with regex link.
Slightly interesting corner cases would be if there can be quoted double quotes in the string (but usually XML would use entities instead) or if the elements can have multiple attributes. If there could be multiple <user=...> elements on a single line, this will quickly become more complex than the proper solution, which is to use XSLT.
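As a small illustration of the "several elements on one line" corner case, here is a hedged Python sketch that still relies on a regular expression, so the caveat above applies to it as well. The name extract_users and its keep_quotes parameter are hypothetical, and it assumes the <user="..."> form from the question with no escaped quotes inside the value.

```python
import re

# Matches the question's non-standard <user="..."> form; the value is
# everything up to the next double quote.
USER_RE = re.compile(r'<user="([^"]*)"')


def extract_users(text, keep_quotes=False):
    """Return every user value in text, in order of appearance."""
    names = USER_RE.findall(text)
    if keep_quotes:
        return [f'"{name}"' for name in names]
    return names
```

findall scans the whole text rather than working line by line, so two user elements on the same line both show up in the result.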
Try :
$ awk '/<user=/ && gsub(/<user=|\/>/,x)' file
"T:106603"
"T:123879"
"test"
"ele::123456"
"company-temp"
"companymw2"
"bird"
"coding11"
"plazamedia"
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk
Using GNU grep:
grep -Po 'user=\K"[^"]*"' file
