UPDATE
This is my file:
<department name="/fighters" id="123879" group="channel" case="none" use="no">
<options index_name="index.html" listing="0" sum="no" allowed="no" />
<target prefix="ttp" suffix=".net" />
<type="effort">
<region="20491" readonly="fs1a" readwrite="fs1a" upload="yes" download="yes" repl="yes" hard="0" soft"0" prio="0" write="no" stage="yes" migrate="no" size="0" >
<read="content" readwrite="content" hard="215822106624" soft="237296943104" prio="5" write="yes" stage="yes" migrate="no" size="0" />
<overflow name="20491-set-writable" />
</replicate>
<region="20576" readonly="fs1a" readwrite="fs1a" upload="yes" download="yes" repl="yes" hard="0" soft"0" prio="0" write="no" stage="yes" migrate="no" size="0" >
<read="content" readwrite="content" hard="215822106624" soft="237296943104" prio="5" write="yes" stage="yes" migrate="no" size="0" />
<overflow name="20576-set-writable" />
</replicate>
</replication>
<user="T:106603" />
<user="T:123879" />
<user="test" />
<user="ele::123456" />
<user="company-temp" />
<user="companymw2" />
<user="bird" />
<user="coding11" />
<user="plazamedia" />
<allow go="123456=abcdefghijklmnopqrstuvwxyz" />
</department>
I wrote a bash like:
awk < test.xml -Fuser= '{ print $2 }' | sed '/^$/d' | cut -d" " -f1
and result is something like:
"T:106603"
"T:123879"
"test"
"ele::123456"
"company-temp"
"companymw2"
"bird"
"coding11"
"plazamedia"
But imagine the result is:
"T:106603" />
"T:123879" />
"test" />
"ele::123456" />
"company-temp" />
"companymw2" />
"bird" />
"coding11" />
"plazamedia" />
First, how can I remove everything after the second "?
Second, how can I extract everything between the two "?
I would like to do it with sed or awk.
Thank you in advance.
Try this:
awk -F'"' '/<user=/{ print $2 }' file
Using only sed:
$ sed -n 's/^<user=\(.*"\).*/\1/p' test.xml # With quotes
$ sed -n 's/^<user="\(.*\)".*/\1/p' test.xml # Without quotes
Try this cut,
cut -d'"' -f 2 test.xml
Try this sed,
With quotes("):
sed 's/^.*\("[^"]\+"\).*/\1/g' test.xml
Without quotes("):
sed 's/^.*"\([^"]\+\)".*/\1/g' test.xml
UPDATE:
sed -e '/^<user/!{d}' -e '/^<user/s/^.*"\([^"]\+\)".*/\1/' test.xml
If you want to get rid of the sed and cut in the pipeline, there are many ways to do that, depending on what the corner cases are. The simplest to me would seem to be
awk -F'"' '/<user=/ { print "\"" $2 "\"" }' test.xml
As usual, here's the obligatory don't parse XML with regex link.
Slightly interesting corner cases would be if there can be quoted double quotes in the string (but usually XML would use entities instead) or if the elements can have multiple attributes. If there could be multiple <user=...> elements on a single line, this will quickly become more complex than the proper solution, which is to use XSLT.
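As a rough illustration of the corner case mentioned above (several <user=...> elements on one line), here is a minimal Python sketch. It is not an XML parser, since the file is not well-formed XML. It only assumes that every user value is double-quoted and follows <user= directly; the name extract_users and the sample text are made up for this example.

```python
import re

# Assumption: each value is double-quoted, follows "<user=" directly,
# and contains no escaped double quotes.
USER_RE = re.compile(r'<user="([^"]*)"')


def extract_users(text):
    """Return every user value found in text, in order of appearance."""
    return USER_RE.findall(text)


sample = '''<user="T:106603" />
<user="test" /> <user="ele::123456" />
<allow go="123456=abcdefghijklmnopqrstuvwxyz" />
'''

# Print each value wrapped in double quotes, like the awk one-liner does.
for user in extract_users(sample):
    print('"%s"' % user)
```

Because findall scans the whole text, two elements on the same line are both reported, and lines such as <allow ...> are ignored.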
Try :
$ awk '/<user=/ && gsub(/<user=|\/>/,x)' file
"T:106603"
"T:123879"
"test"
"ele::123456"
"company-temp"
"companymw2"
"bird"
"coding11"
"plazamedia"
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
Using gnu grep
grep -Po 'user=\K"[^"]*"' file
I have file1 with the following content:
<action>
<row>
<column name="book" label="book">stick man (2020)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30/</column>
</row>
<row>
<column name="book" label="book">python easy (2019)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30/</column>
</row>
</action>
I want to extract the contents of the file using Linux scripting or a command (sed, grep, or awk). Example output:
stick man (2020) | http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30
python easy (2019) | http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30
My code:
grep -oP 'href="([^".]*)">([^</.]*)' file1
Please help, I am a newbie :)
$ awk -v RS='<[^>]+>' 'NF{printf "%s", $0 (++c%2?" |":ORS)}' file
stick man (2020)/ | http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30/
python easy (2019)/ | http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30/
Note that the trailing forward slashes are in your original data.
This requires multi-character RS support (e.g. GNU awk).
This
<action>
<row>
<column name="book" label="book">stick man (2020)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30/</column>
</row>
<row>
<column name="book" label="book">python easy (2019)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30/</column>
</row>
</action>
looks like a piece of an HTML file. If you are allowed to install utilities on your system, I suggest giving hxselect a try; it is useful when you want to extract something you can describe with a CSS selector. For example, to get the content of all columns whose label is referensi from file.html:
cat file.html | hxselect -i -c -s '\n' 'column[label=referensi]'
With awk you can try:
awk -F'>|/<' '{ORS= (NR == 3 || NR == 7) ? " |" : "\n"} $2 != "" {print $2}' file
stick man (2020) | http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30
python easy (2019) | http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30
Or shorter:
awk -F'>|/<' '{ORS= (NR%2) ? " |" : RS} $2 != "" {print $2}' file
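Since the sample file1 is well-formed XML, a parser-based version is also possible. The following is a minimal Python sketch using the standard xml.etree.ElementTree module. It assumes every row holds one column named book and one named referensi, and that the trailing "/" should be dropped as in the requested output; the function name rows_to_lines is made up for this example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<action>
<row>
<column name="book" label="book">stick man (2020)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30/</column>
</row>
</action>"""


def rows_to_lines(xml_text):
    """Return one 'book | referensi' string per <row> element."""
    lines = []
    for row in ET.fromstring(xml_text).iter("row"):
        # Map each column's name attribute to its text, without the
        # surrounding whitespace and the trailing "/".
        cols = {
            col.get("name"): (col.text or "").strip().rstrip("/")
            for col in row.iter("column")
        }
        lines.append("%s | %s" % (cols.get("book", ""), cols.get("referensi", "")))
    return lines


for line in rows_to_lines(SAMPLE):
    print(line)
```

Unlike the line-number based awk variants, this does not depend on each element sitting on its own line.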
[UPDATED QUESTION]
I've got a variable $CHANGED which stores the output of a subversion command like this: CHANGED="$(svnlook changed -r $REV $REPOS)".
Executing svnlook changed -r $REV $REPOS will output the following to the command line:
A /path/to/file
A /path/to/file2
A /path/to/file3
However, I need to store the output formatted as shown below in a variable $FILES:
A /path/to/file<br />A /path/to/file2<br />A /path/to/file3<br />
I need this to use $FILES in a command that generates an email message, like this:
sendemail [some-options] $FILES
It should replace $FILES with A /path/to/file<br />A /path/to/file2<br />A /path/to/file3<br /> so that the HTML break tags are interpreted.
In bash:
echo "${VAR//$'\n'/<br />}"
See Parameter Expansion
The Parameter Expansion section of the man page is your friend.
Starting with
changed="
A /path/to/file
A /path/to/other/file
A /path/to/new/file
"
You can remove leading and trailing newlines using the # and % expansions:
files="${changed#$'\n'}"
files="${files%$'\n'}"
Then replace the other newlines with <br />:
files="${files//$'\n'/<br />}"
Demonstration:
printf '***%s***\n' "$files"
***A /path/to/file<br />A /path/to/other/file<br />A /path/to/new/file***
(Note that I've changed your all-uppercase variable names to lower case. Avoid uppercase names for your locals, as these tend to be used for communication via the environment.)
If you dislike writing newline as $'\n', you may of course store it in a variable:
nl=$'\n'
files="${changed#$nl}"
files="${files%$nl}"
files="${files//$nl/<br />}"
You can modify hek2mgl's answer to strip out the first <br /> (if any):
CHANGED="
A /path/to/file
A /path/to/other/file
A /path/to/new/file
"
FILES="$(echo "${CHANGED//$'\n'/<br />}" | sed 's#^<br />##g')"
echo "$FILES"
Output:
A /path/to/file<br />A /path/to/other/file<br />A /path/to/new/file<br />
Another way (with only sed):
FILES="$(echo "$CHANGED" | sed ':a;N;$!ba;s#\n#<br />#g;s#^<br />##g')"
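For comparison, the same transformation can be sketched in Python. This assumes the svnlook output has already been captured as a string (for instance with subprocess.run) and that every non-empty line should be followed by <br />, as in the format requested in the question; the function name to_html_breaks is made up for this example.

```python
def to_html_breaks(changed):
    """Join the non-empty lines of changed, appending '<br />' to each."""
    lines = [line for line in changed.splitlines() if line.strip()]
    return "".join(line + "<br />" for line in lines)


# Sample input with a leading and a trailing newline.
changed = "\nA /path/to/file\nA /path/to/file2\nA /path/to/file3\n"
print(to_html_breaks(changed))
```

Dropping empty lines first means leading and trailing newlines never turn into stray <br /> tags, so no separate stripping step is needed.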
I have many HTML files. Each file contains the following line:
<img src="<BASE_HTTP_URL>bladf.gif" border="0" alt="" />
I need to extract first the HTML file name, and then the file name after BASE_HTTP_URL. In this case it is bladf.gif, but it can be any file name with any kind of extension.
I have tried to extract the name of the file using this awk command:
for f in *.html
do
awk -F'"' '/img src=/{print $4}' $f
done
But I get zero as a result. How can I print the HTML file name and, next to it, the file name that follows BASE_HTTP_URL?
Thanks.
awk -F'"' '/img src=/{match($2, "(.*/)(.*)", url); print $2, url[1], url[2]}'
if I correctly understand your need.
Here's the sample output:
alex@rhyme ~ $ echo '<img src="http://some/url/bladf.gif" border="0" alt="" />' | awk -F'"' '/img src=/{match($2, "(.*/)(.*)", url); print $2, url[1], url[2];}'
http://some/url/bladf.gif http://some/url/ bladf.gif
alex@rhyme ~ $ awk --version
GNU Awk 4.0.2
Copyright (C) 1989, 1991-2012 Free Software Foundation.
What is your awk version? The three-argument form of match() used above is a GNU awk extension.
Let's start with this:
$ cat file1.html
foo
<img src="<BASE_HTTP_URL>bladf.gif" border="0" alt="" />
bar
$ cat file2.html
foo
<img src="<BASE_HTTP_URL>whatever.gif" border="0" alt="" />
bar
$ awk -F'"' '/img src=/{print FILENAME, $2}' *.html
file1.html <BASE_HTTP_URL>bladf.gif
file2.html <BASE_HTTP_URL>whatever.gif
or:
$ awk -F'"' 'sub(/<img src="<BASE_HTTP_URL>/,""){print FILENAME, $1}' *.html
file1.html bladf.gif
file2.html whatever.gif
If none of that is what you wanted, update your question to clarify.
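The second awk command above can also be sketched in Python. This assumes the placeholder is the literal text <BASE_HTTP_URL> and that the image name runs up to the closing double quote; the function name image_names is made up, and file1.html is just the sample file name from this answer.

```python
import re

# Assumption: the src attribute starts with the literal placeholder
# <BASE_HTTP_URL>, and the image name ends at the next double quote.
IMG_RE = re.compile(r'<img src="<BASE_HTTP_URL>([^"]*)"')


def image_names(html_text):
    """Return the file names that follow <BASE_HTTP_URL> in img tags."""
    return IMG_RE.findall(html_text)


sample = 'foo\n<img src="<BASE_HTTP_URL>bladf.gif" border="0" alt="" />\nbar\n'

# Print the (sample) HTML file name followed by each image name found.
for name in image_names(sample):
    print("file1.html", name)
```

To process real files, read each path returned by glob.glob("*.html") and pass its contents to image_names.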
I have a file containing the following lines:
<parameter name="PortMappingEnabled" access="readWrite" type="xsd:boolean"></parameter>
<parameter name="PortMappingLeaseDuration" access="readWrite" activeNotify="canDeny" type="xsd:unsignedInt"></parameter>
<parameter name="RemoteHost" access="readWrite"></parameter>
<parameter name="ExternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
<parameter name="ExternalPortEndRange" access="readWrite" type="xsd:unsignedInt"></parameter>
<parameter name="InternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
<parameter name="PortMappingProtocol" access="readWrite"></parameter>
<parameter name="InternalClient" access="readWrite"></parameter>
<parameter name="PortMappingDescription" access="readWrite"></parameter>
I want to run a command on this file that extracts only the parameter names, as in the following output:
$sedcommand file.txt
PortMappingEnabled
PortMappingLeaseDuration
RemoteHost
ExternalPort
ExternalPortEndRange
InternalPort
PortMappingProtocol
InternalClient
PortMappingDescription
What could this command be?
grep was born to extract things:
grep -Po 'name="\K[^"]*'
test with your data:
kent$ echo '<parameter name="PortMappingEnabled" access="readWrite" type="xsd:boolean"></parameter>
<parameter name="PortMappingLeaseDuration" access="readWrite" activeNotify="canDeny" type="xsd:unsignedInt"></parameter>
<parameter name="RemoteHost" access="readWrite"></parameter>
<parameter name="ExternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
<parameter name="ExternalPortEndRange" access="readWrite" type="xsd:unsignedInt"></parameter>
<parameter name="InternalPort" access="readWrite" type="xsd:unsignedInt"></parameter>
<parameter name="PortMappingProtocol" access="readWrite"></parameter>
<parameter name="InternalClient" access="readWrite"></parameter>
<parameter name="PortMappingDescription" access="readWrite"></parameter>
'|grep -Po 'name="\K[^"]*'
PortMappingEnabled
PortMappingLeaseDuration
RemoteHost
ExternalPort
ExternalPortEndRange
InternalPort
PortMappingProtocol
InternalClient
PortMappingDescription
sed 's/[^"]*"\([^"]*\).*/\1/'
does the job.
Explanation of the part inside ' ':
s - tells sed to substitute
/ - start of the regex to search for
[^"]* - any character that is not ", any number of times (matching <parameter name=)
" - just a "
\([^"]*\) - anything inside \( \) is saved as a group to refer to later. The backslashes are there so that the parentheses are not treated as literal characters to search for. [^"]* means the same as above (matching RemoteHost, for example)
.* - any character, any number of times (matching " access="readWrite"></parameter>)
/ - end of the search regex and start of the replacement string
\1 - reference to the string captured by the group above
/ - end of the replacement string
Basically it is s/search for this/replace with this/, but we are telling sed to replace the whole line with just the piece of it that we captured earlier.
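The same substitution can be tried out in Python, which may make the individual pieces easier to experiment with. This is only a sketch mirroring the sed expression; Python's re module writes the group as ( ) rather than \( \), and the function name first_quoted is made up for this example.

```python
import re

# Same idea as the sed expression: match the whole line, keep only group 1.
#   [^"]*    everything before the first double quote
#   "        the opening quote itself
#   ([^"]*)  the captured value, up to the next double quote
#   .*       the rest of the line
PATTERN = re.compile(r'[^"]*"([^"]*).*')


def first_quoted(line):
    """Return the text between the first pair of double quotes in line."""
    return PATTERN.sub(r"\1", line, count=1)


line = '<parameter name="RemoteHost" access="readWrite"></parameter>'
print(first_quoted(line))
```

As with sed, a line that contains no double quote does not match and is returned unchanged.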
You want awk.
This would be a quick and dirty hack:
awk -F "\"" '{print $2}' /tmp/file.txt
PortMappingEnabled
PortMappingLeaseDuration
RemoteHost
ExternalPort
ExternalPortEndRange
InternalPort
PortMappingProtocol
InternalClient
PortMappingDescription
You should not parse XML using tools like sed or awk. It's error-prone.
If the input changes, for example so that a newline character instead of a space precedes the name attribute, it will fail some day and produce unexpected results.
If you are really sure that your input will always be formatted this way, you can use cut.
It's faster than sed and awk:
cut -d'"' -f2 < input.txt
It is better to parse the file first and extract only the name attribute of each parameter:
xpath -q -e '//@name' input.txt | cut -d'"' -f2
To learn more about xpath, see this tutorial: http://www.w3schools.com/xpath/
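If the xpath command-line tool is not available, the parse-first approach can be sketched with Python's standard xml.etree.ElementTree module. This assumes the file is a list of sibling <parameter> elements without a single root element, so the sketch wraps it in a throwaway <root>; the function name parameter_names is made up for this example.

```python
import xml.etree.ElementTree as ET


def parameter_names(fragment):
    """Return the name attribute of every <parameter> element in fragment."""
    # Assumption: the input has no single root element, so wrap it in a
    # throwaway <root> element to make it parseable.
    root = ET.fromstring("<root>%s</root>" % fragment)
    return [param.get("name") for param in root.iter("parameter")]


# The second element has a newline before the name attribute, which is the
# kind of change that breaks the cut/sed/awk one-liners.
fragment = '''<parameter name="PortMappingEnabled" access="readWrite" type="xsd:boolean"></parameter>
<parameter
  name="RemoteHost" access="readWrite"></parameter>
'''
print("\n".join(parameter_names(fragment)))
```

Because the parser works on elements rather than lines, the layout of the attributes does not matter.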
Explaining how you can use cut:
cat yourxmlfile | cut -d'"' -f2
It will 'cut' every line in the file at the " delimiter and take the 2nd field, which is what you wanted.
I want to run a shell script with this usage:
./run A.txt B.xml
A.txt contains some statistics:
Accesses = 1
Hits = 2
Misses = 3
Evictions = 4
Retries = 5
B.xml looks like:
<stat name="total_accesses" value="0"/>
<stat name="total_misses" value="0"/>
<stat name="conflicts" value="0"/>
I want to replace some stats in B.xml from A.txt. For example, I want to
1- find "Accesses" in A.txt
2- find "total_accesses" in B.xml
3- replace 0 with 1
1- find "Misses" in A.txt
2- find "total_misses" in B.xml
3- replace 0 with 3
So B.xml will look like:
<stat name="total_accesses" value="1"/>
<stat name="total_misses" value="3"/>
<stat name="conflicts" value="0"/>
I want to do that with the shell's sed command. However, I find it quite complex, as the regexp is hard to understand.
Can sed help me with this problem, or do I have to find another way?
It might be a bit heavy-weight for such a simple case, but here's a Python script that does the job:
#!/usr/bin/env python3
import sys
import xml.etree.ElementTree as etree
# read A.txt; fill stats
stats = {}
with open(sys.argv[1]) as f:
    for line in f:
        if line.strip():
            name, _, count = line.partition('=')
            stats["total_" + name.lower().strip()] = count.strip()
# read B.xml; wrap it in a root element to make it valid XML; replace stat[@value]
with open(sys.argv[2]) as f:
    root = etree.fromstring("<root>%s</root>" % f.read())
for s in root:
    if s.get('name') in stats:
        s.set('value', stats[s.get('name')])
    # the serialized element already ends with its trailing newline
    print(etree.tostring(s, encoding="unicode"), end="")
Example
$ python fill-xml-template.py A.txt B.xml
<stat name="total_accesses" value="1" />
<stat name="total_misses" value="3" />
<stat name="conflicts" value="0" />
To process input files incrementally or to make changes in place, you could use the following:
#!/usr/bin/env python3
import fileinput
import sys
import xml.etree.ElementTree as etree
try:
    sys.argv.remove('-i')
except ValueError:
    inplace = False
else:
    inplace = True  # make changes in place if the `-i` option is specified
# read A.txt; fill stats
stats = {}
with open(sys.argv.pop(1)) as f:
    for line in f:
        if line.strip():
            name, _, count = line.partition('=')
            stats["total_" + name.lower().strip()] = count.strip()
# read input line by line; replace stat[@value]
for line in fileinput.input(inplace=inplace):
    s = etree.fromstring(line)
    if s.get('name') in stats:
        s.set('value', stats[s.get('name')])
    print(etree.tostring(s, encoding="unicode"))
Example
$ python fill-xml-template.py A.txt B.xml -i
It can read from stdin or process several files:
$ cat B.xml | python fill-xml-template.py A.txt
<stat name="total_accesses" value="1" />
<stat name="total_misses" value="3" />
<stat name="conflicts" value="0" />
Here is a shell script that does what you want:
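#!/bin/bash
# Usage: ./A.sh A.txt B.xml
while read -r line
do
    # "Accesses = 1" -> key is field 1, value is field 3
    key=$(echo "$line" | cut -d' ' -f1)
    value=$(echo "$line" | cut -d' ' -f3)
    [ -n "$key" ] || continue   # skip empty lines
    xmlLine=$(grep -i -- "$key" "$2")
    if [ -n "$xmlLine" ]; then
        # split the matching XML line on double quotes
        for num in $(seq 5)
        do
            field[${num}]=$(echo "$xmlLine" | cut -d'"' -f"${num}")
        done
        echo "${field[1]}\"${field[2]}\"${field[3]}\"$value\"${field[5]}"
    fi
done < "$1"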
You can copy it to a file, say A.sh, make it executable (chmod +x A.sh), and then run:
./A.sh A.txt B.xml
Please keep in mind that this code is not suitable for production, and that regular expressions are essential for scripts like this.
While you can hack this together on the command line, I'd recommend not doing so.
XML is way too fragile to be handled this way. Use a proper XML library and parse the XML before manipulating it; otherwise you could easily end up with broken XML. For example, write a script in Ruby, Python, or Perl and use an XML library.