How to use sed to extract substring - linux

I have a file containing the following lines:
<parameter name="PortMappingEnabled" access="readWrite" type="xsd:boolean"></parameter>
<parameter name="PortMappingLeaseDuration" access="readWrite" activeNotify="canDeny" type="xsd:unsignedInt"></parameter>
<parameter name="RemoteHost" access="readWrite"></parameter>
<parameter name="ExternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
<parameter name="ExternalPortEndRange" access="readWrite" type="xsd:unsignedInt"></parameter>
<parameter name="InternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
<parameter name="PortMappingProtocol" access="readWrite"></parameter>
<parameter name="InternalClient" access="readWrite"></parameter>
<parameter name="PortMappingDescription" access="readWrite"></parameter>
I want to run a command on this file that extracts only the parameter names, as in the following output:
$sedcommand file.txt
PortMappingEnabled
PortMappingLeaseDuration
RemoteHost
ExternalPort
ExternalPortEndRange
InternalPort
PortMappingProtocol
InternalClient
PortMappingDescription
What could be this command?

grep was born to extract things (-P and \K need GNU grep built with PCRE support):
grep -Po 'name="\K[^"]*' file.txt
Test with your data:
kent$ echo '<parameter name="PortMappingEnabled" access="readWrite" type="xsd:boolean"></parameter>
<parameter name="PortMappingLeaseDuration" access="readWrite" activeNotify="canDeny" type="xsd:unsignedInt"></parameter>
<parameter name="RemoteHost" access="readWrite"></parameter>
<parameter name="ExternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
<parameter name="ExternalPortEndRange" access="readWrite" type="xsd:unsignedInt"></parameter>
<parameter name="InternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
<parameter name="PortMappingProtocol" access="readWrite"></parameter>
<parameter name="InternalClient" access="readWrite"></parameter>
<parameter name="PortMappingDescription" access="readWrite"></parameter>
'|grep -Po 'name="\K[^"]*'
PortMappingEnabled
PortMappingLeaseDuration
RemoteHost
ExternalPort
ExternalPortEndRange
InternalPort
PortMappingProtocol
InternalClient
PortMappingDescription

sed 's/[^"]*"\([^"]*\).*/\1/' file.txt
does the job.
Explanation of the part inside ' ':
s - tells sed to substitute.
/ - start of the regex to search for.
[^"]* - any character that is not ", any number of times (matches <parameter name=).
" - a literal ".
\([^"]*\) - anything inside \( \) is saved as a group that can be referenced later. The backslashes are there so that the parentheses are treated as grouping and not as literal characters to search for. [^"]* means the same as above (matches RemoteHost, for example).
.* - any character, any number of times (matches " access="readWrite"></parameter>).
/ - end of the search regex and start of the replacement string.
\1 - reference to the text captured by the group above.
/ - end of the replacement string.
Basically it is s/search for this/replace with this/, but we are telling sed to replace the whole line with just the piece of it captured earlier.
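To see the pieces working together, here is a small sketch that feeds two of the sample lines through the same substitution. The function name extract_first_quoted is made up for illustration.

```shell
#!/bin/sh
# extract_first_quoted is a hypothetical helper name, not part of sed.
# It replaces each input line with the text between its first pair of
# double quotes, using the same s/// command as above.
extract_first_quoted() {
    sed 's/[^"]*"\([^"]*\).*/\1/'
}

printf '%s\n' \
    '<parameter name="RemoteHost" access="readWrite"></parameter>' \
    '<parameter name="ExternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>' |
    extract_first_quoted
```

Lines that contain no double quote at all do not match the regex, so sed passes them through unchanged.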

You want awk.
This would be a quick and dirty hack:
awk -F "\"" '{print $2}' /tmp/file.txt
PortMappingEnabled
PortMappingLeaseDuration
RemoteHost
ExternalPort
ExternalPortEndRange
InternalPort
PortMappingProtocol
InternalClient
PortMappingDescription

You should not parse XML with tools like sed or awk. It's error-prone.
If the input changes, for example if a newline character appears before the name attribute instead of a space, the command will fail some day and produce unexpected results.
If you are really sure that your input will always be formatted this way, you can use cut.
It's faster than sed and awk:
cut -d'"' -f2 < input.txt
It is better to parse the XML first and extract only the name attribute of each parameter:
xpath -q -e '//@name' input.txt | cut -d'"' -f2
To learn more about XPath, see this tutorial: http://www.w3schools.com/xpath/
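A quick way to see the fragility described above: the sketch below runs the positional cut and a sed expression anchored on the name attribute against a line whose attributes are reordered. The sample line and both function names are invented for illustration. Anchoring on the attribute name is still not real XML parsing, only less brittle.

```shell
#!/bin/sh
# Hypothetical input: same element, but access= now comes before name=.
line='<parameter access="readWrite" name="RemoteHost"></parameter>'

# Positional: takes whatever sits between the first two double quotes.
by_position() { cut -d'"' -f2; }

# Anchored: takes the value of the name attribute, wherever it appears.
by_attribute() { sed -n 's/.* name="\([^"]*\)".*/\1/p'; }

printf '%s\n' "$line" | by_position    # prints: readWrite
printf '%s\n' "$line" | by_attribute   # prints: RemoteHost
```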

Explaining how you can use cut:
cat yourxmlfile | cut -d'"' -f2
It will 'cut' every line in the file on the " delimiter and take the 2nd field, which is what you wanted.

Related

how to get values from file using sed,awk or grep on linux command/scripting?

I have file1 with this content:
<action>
<row>
<column name="book" label="book">stick man (2020)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30/</column>
</row>
<row>
<column name="book" label="book">python easy (2019)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30/</column>
</row>
</action>
I want to get the contents of the file using Linux scripting or a command (sed, grep or awk). Example output:
stick man (2020) | http://172.22.215.234/Data/Book/Journal/2016_2020/1%/20Stick%20%282020%30
python easy (2019) | http://172.22.215.234/Data/Book/Journal/2016_2020/%2/20Buck%20%282019%30
my code:
grep -oP 'href="([^".]*)">([^</.]*)' file1
please help i am newbie :)
$ awk -v RS='<[^>]+>' 'NF{printf "%s", $0 (++c%2?" |":ORS)}' file
stick man (2020)/ | http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30/
python easy (2019)/ | http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30/
Note that the trailing forward slashes are in your original data.
This requires multi-character RS support (GNU awk).
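Where a regex RS is not available, a rough portable alternative is a sed, paste, sed pipeline. This is only a sketch under the assumptions stated in the comments, and pair_columns is a made-up name. Unlike the awk command above, it also drops the trailing slash, which matches the output asked for in the question.

```shell
#!/bin/sh
# pair_columns is a hypothetical helper name. It assumes, as in the
# sample, that every <column> element sits on its own line, that its
# text ends with a slash before </column>, and that columns come in pairs.
pair_columns() {
    sed -n 's/.*<column[^>]*> *\(.*\)\/<\/column>.*/\1/p' "$1" |
        paste -d'|' - - |
        sed 's/|/ | /'
}
```

It would be called as pair_columns file1. The first sed keeps only the text of each column, paste joins every two lines with a | character, and the last sed pads that separator with spaces.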
This
<action>
<row>
<column name="book" label="book">stick man (2020)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30/</column>
</row>
<row>
<column name="book" label="book">python easy (2019)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30/</column>
</row>
</action>
does look like a piece of an HTML file. If you are allowed to install utilities on your system, I suggest trying hxselect, which is useful when you want to extract something you can describe with a CSS selector. For example, to get the content of all columns whose label is referensi from file.html:
cat file.html | hxselect -i -c -s '\n' 'column[label=referensi]'
With awk you can try:
awk -F'>|/<' '{ORS= (NR == 3 || NR == 7) ? " |" : "\n"} $2 != "" {print $2}' file
stick man (2020) | http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30
python easy (2019) | http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30
Or shorter:
awk -F'>|/<' '{ORS= (NR%2) ? " |" : RS} $2 != "" {print $2}' file

Replace \n with <br /> in bash

[UPDATED QUESTION]
I've got a variable $CHANGED which stores the output of a subversion command like this: CHANGED="$(svnlook changed -r $REV $REPOS)".
Executing svnlook changed -r $REV $REPOS will output the following to the command line:
A /path/to/file
A /path/to/file2
A /path/to/file3
However, I need to store the output formatted as shown below in a variable $FILES:
A /path/to/file<br />A /path/to/file2<br />A /path/to/file3<br />
I need this for using $FILES in a command that generates an email message like this:
sendemail [some-options] $FILES
$FILES should expand to A /path/to/file<br />A /path/to/file2<br />A /path/to/file3<br /> so that the HTML break tags can be interpreted.
In bash:
echo "${VAR//$'\n'/<br />}"
See Parameter Expansion
The Parameter Expansion section of the man page is your friend.
Starting with
changed="
A /path/to/file
A /path/to/other/file
A /path/to/new/file
"
You can remove leading and trailing newlines using the # and % expansions:
files="${changed#$'\n'}"
files="${files%$'\n'}"
Then replace the other newlines with <br />:
files="${files//$'\n'/<br />}"
Demonstration:
printf '***%s***\n' "$files"
***A /path/to/file<br />A /path/to/other/file<br />A /path/to/new/file***
(Note that I've changed your all-uppercase variable names to lower case. Avoid uppercase names for your locals, as these tend to be used for communication via the environment.)
If you dislike writing newline as $'\n', you may of course store it in a variable:
nl=$'\n'
files="${changed#$nl}"
files="${files%$nl}"
files="${files//$nl/<br />}"
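Putting the steps above together, a minimal sketch could wrap them in a function. The name nl2br is made up for illustration, and the code needs bash because it relies on $'\n' and the ${var//pattern/replacement} expansion.

```shell
#!/bin/bash
# nl2br is a hypothetical helper name. It trims one leading and one
# trailing newline, then turns every remaining newline into <br />.
nl2br() {
    local nl=$'\n'
    local text=${1#"$nl"}
    text=${text%"$nl"}
    printf '%s' "${text//$nl/<br />}"
}

# Stand-in for the svnlook output from the question.
changed='A /path/to/file
A /path/to/file2
A /path/to/file3'

files=$(nl2br "$changed")
echo "$files"
```

Keeping the expansion inside double quotes matters: unquoted, the result would be word-split before it reaches the command that consumes it.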
You can modify hek2mgl's answer to strip out the first <br /> (if any):
CHANGED="
A /path/to/file
A /path/to/other/file
A /path/to/new/file
"
FILES="$(echo "${CHANGED//$'\n'/<br />}" | sed 's#^<br />##g')"
echo "$FILES"
Output:
A /path/to/file<br />A /path/to/other/file<br />A /path/to/new/file<br />
Another way (with only sed):
FILES="$(echo "$CHANGED" | sed ':a;N;$!ba;s#\n#<br />#g;s#^<br />##g')"

How do I replace every occurrence using an AWK statement

How can I adjust the following code to replace every occurrence of the value set for the element ThreadGroup.num_threads?
Here is the code I'm trying to make work.
awk ' BEGIN { FS = "[<|>]" }
{
if ($2 == "stringProp name=\"ThreadGroup.num_threads\"") {
$newValue
}
print
}
' Test1.jmx
Here is the XML snippet I'm parsing.
<ThreadGroup guiclass="ThreadGroupGui" testclass="ThreadGroup" testname="Thread Group" enabled="true">
<stringProp name="ThreadGroup.num_threads">3</stringProp>
</ThreadGroup>
<ThreadGroup guiclass="ThreadGroupGui" testclass="ThreadGroup" testname="Thread Group2" enabled="true">
<stringProp name="ThreadGroup.num_threads">3</stringProp>
</ThreadGroup>
newValue=999999
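In your code, the awk variable newValue is never defined. Moreover, you do not need $ in front of your own variables: in awk, $ is the field operator, so $newValue means the field whose number is stored in newValue.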
Here is my suggestion:
awk '$0 ~ /stringProp name="ThreadGroup.num_threads"/ {
sub(/<stringProp name="ThreadGroup.num_threads">[0-9]+/,
"<stringProp name=\"ThreadGroup.num_threads\">999999",
$0)}
{}1' inputFile
1st line: I check whether the current line contains the text stringProp name="ThreadGroup.num_threads"
2nd-4th line: If so, I replace the string <stringProp name="ThreadGroup.num_threads">, when it is followed by one or more digits, with the same string followed by 999999.
5th line: Finally, I output each line.
Of course you can define a variable:
awk 'BEGIN{newValue=999999}
$0 ~ /stringProp name="ThreadGroup.num_threads"/ {
sub(/<stringProp name="ThreadGroup.num_threads">[0-9]+/,
"<stringProp name=\"ThreadGroup.num_threads\">"newValue,
$0)}
{}1' inputFile
The output is:
<ThreadGroup guiclass="ThreadGroupGui" testclass="ThreadGroup" testname="Thread Group" enabled="true">
<stringProp name="ThreadGroup.num_threads">999999</stringProp>
</ThreadGroup>
<ThreadGroup guiclass="ThreadGroupGui" testclass="ThreadGroup" testname="Thread Group2" enabled="true">
<stringProp name="ThreadGroup.num_threads">999999</stringProp>
</ThreadGroup>
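The BEGIN block above hard-codes the value inside the awk program. If the value lives in a shell variable, as newValue=999999 in the question suggests, one common approach is to hand it over with awk's -v option. The following is a sketch of that idea; the function name set_num_threads is made up for illustration.

```shell
#!/bin/sh
# set_num_threads is a hypothetical helper name.
# $1 = new value, $2 = JMX file; the result is written to stdout.
set_num_threads() {
    awk -v newValue="$1" '
        /<stringProp name="ThreadGroup\.num_threads">/ {
            sub(/<stringProp name="ThreadGroup\.num_threads">[0-9]+/,
                "<stringProp name=\"ThreadGroup.num_threads\">" newValue)
        }
        { print }
    ' "$2"
}
```

It would be called as set_num_threads "$newValue" Test1.jmx > Test1.new.jmx. The sketch assumes the new value is a plain number; a value containing & or backslashes would need escaping first.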
Perhaps this is easier with sed:
$ sed -r 's/("ThreadGroup\.num_threads">)[0-9]+</\1999999</g' Test1.jmx

how to delete line after specific pattern and extract something

UPDATE
This is my file:
<department name="/fighters" id="123879" group="channel" case="none" use="no">
<options index_name="index.html" listing="0" sum="no" allowed="no" />
<target prefix="ttp" suffix=".net" />
<type="effort">
<region="20491" readonly="fs1a" readwrite="fs1a" upload="yes" download="yes" repl="yes" hard="0" soft"0" prio="0" write="no" stage="yes" migrate="no" size="0" >
<read="content" readwrite="content" hard="215822106624" soft="237296943104" prio="5" write="yes" stage="yes" migrate="no" size="0" />
<overflow name="20491-set-writable" />
</replicate>
<region="20576" readonly="fs1a" readwrite="fs1a" upload="yes" download="yes" repl="yes" hard="0" soft"0" prio="0" write="no" stage="yes" migrate="no" size="0" >
<read="content" readwrite="content" hard="215822106624" soft="237296943104" prio="5" write="yes" stage="yes" migrate="no" size="0" />
<overflow name="20576-set-writable" />
</replicate>
</replication>
<user="T:106603" />
<user="T:123879" />
<user="test" />
<user="ele::123456" />
<user="company-temp" />
<user="companymw2" />
<user="bird" />
<user="coding11" />
<user="plazamedia" />
<allow go="123456=abcdefghijklmnopqrstuvwxyz" />
</department>
I wrote a bash like:
awk < test.xml -Fuser= '{ print $2 }' | sed '/^$/d' | cut -d" " -f1
and result is something like:
"T:106603"
"T:123879"
"test"
"ele::123456"
"company-temp"
"companymw2"
"bird"
"coding11"
"plazamedia"
But imagine the result is:
"T:106603" />
"T:123879" />
"test" />
"ele::123456" />
"company-temp" />
"companymw2" />
"bird" />
"coding11" />
"plazamedia" />
First, how can I remove everything after the second "?
Second, how can I extract everything between the two "?
I would like to do it with sed or awk.
Thank you in advance.
Try this:
awk -F'"' '/<user=/{ print $2 }' file
Using only sed (-n together with the p flag prints only the lines where the substitution succeeded):
$ sed -n 's/^<user=\(.*"\).*/\1/p' test.xml # With quotes
$ sed -n 's/^<user="\(.*\)".*/\1/p' test.xml # Without quotes
Try this cut,
cut -d'"' -f 2 test.xml
Try this sed,
With quotes("):
sed 's/^.*\("[^"]\+"\).*/\1/g' test.xml
Without quotes("):
sed 's/^.*"\([^"]\+\)".*/\1/g' test.xml
UPDATE:
sed -e '/^<user/!{d}' -e '/^<user/s/^.*"\([^"]\+\)".*/\1/' test.xml
If you want to get rid of the sed and cut in the pipeline, there are many ways to do that, depending on what the corner cases are. The simplest to me would seem to be
awk -F'"' '/<user=/ { print "\"" $2 "\"" }' test.xml
As usual, here's the obligatory don't parse XML with regex link.
Slightly interesting corner cases would be if there can be quoted double quotes in the string (but usually XML would use entities instead) or if the elements can have multiple attributes. If there could be multiple <user=...> elements on a single line, this will quickly become more complex than the proper solution, which is to use XSLT.
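For the corner case of several <user=...> elements on one line, a small sketch with grep -o shows one way to cope without leaving the shell. The function name list_users is made up, and this is still pattern matching, not XML parsing.

```shell
#!/bin/sh
# list_users is a hypothetical helper name. grep -o prints every match
# on its own line, so several user elements on one input line are all
# reported. (-o is not in POSIX, but GNU and BSD grep both have it.)
list_users() {
    grep -o '<user="[^"]*"' "$1" | cut -d'"' -f2
}
```

Called as list_users test.xml, it prints the user values without the surrounding quotes.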
Try :
$ awk '/<user=/ && gsub(/<user=|\/>/,x)' file
"T:106603"
"T:123879"
"test"
"ele::123456"
"company-temp"
"companymw2"
"bird"
"coding11"
"plazamedia"
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk
Using gnu grep
grep -Po 'user=\K"[^"]*"' file

parsing and replacing some strings in two files

I want to run a shell script with this usage:
./run A.txt B.xml
A.txt contains some statistics:
Accesses = 1
Hits = 2
Misses = 3
Evictions = 4
Retries = 5
B.xml looks like:
<stat name="total_accesses" value="0"/>
<stat name="total_misses" value="0"/>
<stat name="conflicts" value="0"/>
I want to replace some stats in B.xml from A.txt. For example, I want to
1- find "Accesses" in A.txt
2- find "total_accesses" in B.xml
3- replace 0 with 1
1- find "Misses" in A.txt
2- find "total_misses" in B.xml
3- replace 0 with 3
So B.xml will look like:
<stat name="total_accesses" value="1"/>
<stat name="total_misses" value="3"/>
<stat name="conflicts" value="0"/>
I want to do that with the shell "sed" command. However, I find it quite complex, as the regexp is hard to understand.
Can "sed" help me with this problem, or do I have to find another way?
It might be a bit heavy-weight for such a simple case, but here's a Python script that does the job:
#!/usr/bin/env python3
import sys
import xml.etree.ElementTree as etree

# read A.txt; fill stats
stats = {}
with open(sys.argv[1]) as stats_file:
    for line in stats_file:
        if line.strip():
            name, _, count = line.partition('=')
            stats["total_" + name.lower().strip()] = count.strip()

# read B.xml; wrap it in a root element to make it valid XML; replace stat[@value]
with open(sys.argv[2]) as xml_file:
    root = etree.fromstring("<root>%s</root>" % xml_file.read())
for s in root:
    if s.get('name') in stats:
        s.set('value', stats[s.get('name')])
    # each element's tail already holds the newline that followed it
    print(etree.tostring(s, encoding="unicode"), end="")
Example
$ python fill-xml-template.py A.txt B.xml
<stat name="total_accesses" value="1" />
<stat name="total_misses" value="3" />
<stat name="conflicts" value="0" />
To process input files incrementally or to make changes in place, you could use the following:
#!/usr/bin/env python3
import fileinput
import sys
import xml.etree.ElementTree as etree

# make changes in place if the `-i` option is specified
try:
    sys.argv.remove('-i')
except ValueError:
    inplace = False
else:
    inplace = True

# read A.txt; fill stats
stats = {}
with open(sys.argv.pop(1)) as stats_file:
    for line in stats_file:
        if line.strip():
            name, _, count = line.partition('=')
            stats["total_" + name.lower().strip()] = count.strip()

# read input line by line; replace stat[@value]
for line in fileinput.input(inplace=inplace):
    s = etree.fromstring(line)
    if s.get('name') in stats:
        s.set('value', stats[s.get('name')])
    print(etree.tostring(s, encoding="unicode"))
Example
$ python fill-xml-template.py A.txt B.xml -i
It can read from stdin or process several files:
$ cat B.xml | python fill-xml-template.py A.txt
<stat name="total_accesses" value="1" />
<stat name="total_misses" value="3" />
<stat name="conflicts" value="0" />
Here is a shell script that does what you want:
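#!/bin/bash
# usage: ./A.sh A.txt B.xml
while read -r line
do
    key=$(echo "$line" | cut -d' ' -f1)
    value=$(echo "$line" | cut -d' ' -f3)
    # skip blank lines: an empty pattern would match every line
    [ -n "$key" ] || continue
    xmlLine=$(grep -i -- "$key" "$2")
    if [ -n "$xmlLine" ]; then
        # split the matching XML line on double quotes; field 4 is the old value
        for num in $(seq 5)
        do
            field[${num}]=$(echo "$xmlLine" | cut -d'"' -f"${num}")
        done
        echo "${field[1]}\"${field[2]}\"${field[3]}\"$value\"${field[5]}"
    fi
done < "$1"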
You can copy it to a file, say A.sh, make it executable (chmod +x A.sh), and then run:
./A.sh A.txt B.xml
Please note that this code is not suitable for production; for scripts like this, regular expressions are essential.
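The loop above only echoes the lines it rewrites. As a rough alternative, the sketch below does the whole update in a single awk pass and prints every line of B.xml. It assumes one <stat .../> element per line and the naming rule from the question (total_ plus the lower-cased key); the function name update_stats is made up.

```shell
#!/bin/sh
# update_stats is a hypothetical helper name.
# $1 = statistics file (A.txt), $2 = XML fragment (B.xml).
update_stats() {
    awk '
        # First file: remember each "Key = N" pair.
        NR == FNR {
            if (NF >= 3 && $2 == "=") stats["total_" tolower($1)] = $3
            next
        }
        # Second file: rewrite value when the name attribute is known.
        match($0, /name="[^"]*"/) {
            name = substr($0, RSTART + 6, RLENGTH - 7)
            if (name in stats)
                sub(/value="[^"]*"/, "value=\"" stats[name] "\"")
        }
        { print }
    ' "$1" "$2"
}
```

It would be called as update_stats A.txt B.xml > B.new.xml. Lines whose name has no counterpart in A.txt, such as conflicts, pass through unchanged.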
While you can hack this together on the command line, I'd recommend against it.
XML is far too fragile to be handled this way: use a proper XML library and parse the XML before manipulating it. Otherwise you could easily end up with broken XML. For example, write a script in Ruby, Python, or Perl and use an XML library.

Resources