How to extract string between 2 xml tags? - linux

I have a string like this
<anytag>my message</anytag>
How I can extract the message between the tags with sed or awk?
So I get only "my message"

Using xmllint (from libxml2):
xmllint --xpath '//anytag/text()' <(echo "<anytag>my message</anytag>")

try:
awk -F'[><]' '{print $3}' Input_file
Making field separator as '[><]' and printing 3rd field.

sed 's/<.*>\(.*\)<\/.*>/\1/g' file

I do not want to install xml paser for a lite extract string, my xml
message is not complicated
For simple strings you may use the following sed approach:
s="<anytag>my message</anytag>"
sed 's~<[^<>]*>\([^<>]*\)</[^<>]*>~\1~' <<< $s
The output:
my message

You can use the following awk command if each line of your file is in the format you have shown.
awk -F "<[^<]+?>" '{print $2;}' <filename>
Input:
<anytag>my message</anytag>
<mytag>abc</mytag>
Output:
my message
abc

Related

How do you change column names to lowercase with linux and store the file as it is?

I am trying to change the column names to lowercase in a csv file. I found the code to do that online but I dont know how to replace the old column names(uppercase) with new column names(lowercase) in the original file. I did something like this:
$cat head -n1 xxx.csv | tr "[A-Z]" "[a-z]"
But it simply just prints out the column names in lowercase, which is not enough for me.
I tried to add sed -i but it did not do any good. Thanks!!
Using awk (readability winner) :
concise way:
awk 'NR==1{print tolower($0);next}1' file.csv
or using ternary operator:
awk '{print (NR==1) ? tolower($0): $0}' file.csv
or using if/else statements:
awk '{if (NR==1) {print tolower($0)} else {print $0}}' file.csv
To change the file for real:
awk 'NR==1{print tolower($0);next}1' file.csv | tee /tmp/temp
mv /tmp/temp file.csv
For your information, sed using the in place edit switch -i do the same: it use a temporary file under the hood.
You can check this by using :
strace -f -s 800 sed -i'' '...' file
Using perl:
perl -i -pe '$_=lc() if $.==1' file.csv
It replace the file on the fly with -i switch
You can use sed to tell it to replace the first line with all lower-case and then print the rest as-is:
sed '1s/.*/\L&/' ./xxx.csv
Redirect the output or use -i to do an in-place edit.
Proof of Concept
$ echo -e "COL1,COL2,COL3\nFoO,bAr,baZ" | sed '1s/.*/\L&/'
col1,col2,col3
FoO,bAr,baZ

Print between special characters with sed,grep

I need to print the string between these characters....
atob(' ')
I am using a = in the second part as an attempt to stop the code on an equal signs (which the base64 string I'm trying to get ends in.)
I use this script, but it prints the entire line containing the above characters. I need just the data in between.
sed -n '/atob/,${p;/==/q;}'
I appreciate any help. Thank you.
Does this work (tested for GNU sed 4.2.2)?
 sed -n -e "s/atop('\(.*\)')/\1/p" b.txt
where b.txt is
atop('safdasdfasf')
or you can try awk
awk -F\' '/atop/ {print $2}' b.txt
(tested for gnu awk 4.0.2 and added the suggestion by Jotne)
And another working sed:
echo "atop('safdasdfasf')" | sed -r "/atop/ s/^[^']+'([^']+)'.*/\1/"
safdasdfasf

Strip text from output in rhel

So im trying to strip away some of the text from this output using awk
This is my output ,
href="/warning:understand-how-this-works!/5HpHagT65TZzG1PH3CSu63k8DbpvD8s5ip4nEB3kEsrePxLM2Uo">+</a>
href="/warning:understand-how-this-works!/5HpHagT65TZzG1PH3CSu63k8DbpvD8s5ip4nEB3kEsrePxLM2Uo">+</a>
href="/warning:understand-how-this-works!/5HpHagT65TZzG1PH3CSu63k8DbpvD8s5ip4nEB3kEsrePxLM2Uo">+</a>
Basically, I am trying to take that info, from the output of a text file,Remove this part:
href="/warning:understand-how-this-works!/
and this part
">+</a>
So it only shows:
5HpHagT65TZzG1PH3CSu63k8DbpvD8s5ip4nEB3kEsrePxLM2Uo
or, outputs that.
Running on centos 6
Could you please try following and let me know if this helps you.
awk '{sub(/.*!\//,X,$0);sub(/\".*/,X,$0);print}' Input_file
You can use grep if you want:
grep -oP '!/\K.*?(?=")' inputfile
Or awk by playing around FS :
awk -F'!/|">' '{print $2}' input
Or use sed backrefrencing:
sed -r 's/(^.*\!\/)(.*?)(">.*)/\2/g' input

awk to print some parameters of a line

I have lines in a file in linux, and i am trying print the line without the | and without some parameters
$cat file
2013-07-15,Provider 1.99,3|30000055|2347|0,12222,1,3,0,0,0,19,aaa,bbb
2013-07-15,Provider 1.99,3|30000055|2347|0,12222,44,12,0,0,0,33,aaa,bbb
and i need the output like:
2013-07-15,Provider,2347,12222,1,3,0,0,0,19,aaa,bbb
2013-07-15,Provider,2347,12222,44,12,0,0,0,33,aaa,bbb
and i am trying with awk, but i have some problems.
If your lines have similar pattern you would to retain then you can do:
awk 'BEGIN{FS=OFS=","}{$2="Provider";$3=2347}1' file
If you don't know what the patterns are then here is a more generic one:
awk 'BEGIN{FS=OFS=","}{split($2,a,/ /);split($3,b,/\|/);$2=a[1];$3=b[3]}1' file
If it doesn't solve your problem, I am pretty sure it would help you guide to get one.
Using sed:
sed 's/ [^|]*|[^|]*|\([^|]*\)|[^,]/,\1/' input
and some shorter version:
sed 's/ .*|\([^|]*\)|[^,]*/,\1/' input
and even shorter:
sed 's/ .*|\(.*\)|[^,]*/,\1/' input
Use awk, and let blank or comma or pipe be the field separators:
awk -F '[[:blank:],|]' -v OFS=, '{
print $1,$2,$6,$8,$9,$10,$11,$12,$13,$14,$15,$16
}' file
2013-07-15,Provider,2347,12222,1,3,0,0,0,19,aaa,bbb
2013-07-15,Provider,2347,12222,44,12,0,0,0,33,aaa,bbb

How to reverse order of fields using AWK?

I have a file with the following layout:
123,01-08-2006
124,01-09-2007
125,01-10-2009
126,01-12-2010
How can I convert it into the following by using AWK?
123,2006-08-01
124,2007-09-01
125,2009-10-01
126,2009-12-01
Didn't read the question properly the first time. You need a field separator that can be either a dash or a comma. Once you have that you can use the dash as an output field separator (as it's the most common) and fake the comma using concatenation:
awk -F',|-' 'OFS="-" {print $1 "," $4,$3,$2}' file
Pure awk
awk -F"," '{ n=split($2,b,"-");$2=b[3]"-"b[2]"-"b[1];$i=$1","$2 } 1' file
sed
sed -r 's/(^.[^,]*,)([0-9]{2})-([0-9]{2})-([0-9]{4})/\1\4-\3-\2/' file
sed 's/\(^.[^,]*,\)\([0-9][0-9]\)-\([0-9][0-9]\)-\([0-9]\+\)/\1\4-\3-\2/' file
Bash
#!/bin/bash
while IFS="," read -r a b
do
IFS="-"
set -- $b
echo "$a,$3-$2-$1"
done <"file"
Unfortunately, I think standard awk only allows one field separator character so you'll have to pre-process the data. You can do this with tr but if you really want an awk-only solution, use:
pax> echo '123,01-08-2006
124,01-09-2007
125,01-10-2009
126,01-12-2010' | awk -F, '{print $1"-"$2}' | awk -F- '{print $1","$4"-"$3"-"$2}'
This outputs:
123,2006-08-01
124,2007-09-01
125,2009-10-01
126,2010-12-01
as desired.
The first awk changes the , characters to - so that you have four fields separated with the same character (this is the bit I'd usually use tr ',' '-' for).
The second awk prints them out in the order you specified, correcting the field separators at the same time.
If you're using an awk implementation that allows multiple FS characters, you can use something like:
gawk -F ',|-' '{print $1","$4"-"$3"-"$2}'
If it doesn't need to be awk, you could use Perl too:
$ perl -nle 'print "$1,$4-$3-$2" while (/(\d{3}),(\d{2})-(\d{2})-(\d{4})\s*/g)' < file.txt

Resources