How to manipulate text with awk? - text

How can I manipulate the output text of grep?
Right now I am using the command:
grep -i "<url>" $file >> ./txtFiles/$file.txt
This would output something like this:
<url>http://www.simplyrecipes.com/recipes/chicken_curry_salad/</url>
and then the next text will go to the next line.
How can I get rid of the <url> and </url> tags and stop the output from going to the next line at the end?

sed '/<\/*url>/!d;s///g'
<\/*url> matches both start and end tag
Delete lines that don't have this
Then remove all cases of this pattern
With your example, it might look like this
sed '/<\/*url>/!d;s///g' $file >> ./txtFiles/$file.txt
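A minimal sketch of that command on made-up input, to show both steps at work. The <title> line is an assumption added only so there is something to delete; the empty pattern in s///g reuses the regex from the address.

```shell
# Hypothetical sample input: only the second line carries a <url> element.
sample='<title>Chicken Curry Salad</title>
<url>http://www.simplyrecipes.com/recipes/chicken_curry_salad/</url>'

# 1) delete every line that has neither <url> nor </url>
# 2) on the remaining lines, remove every occurrence of either tag
urls=$(printf '%s\n' "$sample" | sed '/<\/*url>/!d;s///g')

printf '%s\n' "$urls"
```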

Single commands:
sed -n '/<url>/ { s|<url>\(.*\)</url>|\1| ; p ; }' INPUT > OUTPUT
Or with awk:
awk -F "</?url>" '/<url>/ { print $2 }' INPUT > OUTPUT
Note: both might give you unexpected output if more than one <url>...</url> pair occurs on a single line. The | delimiter in the sed version only matters inside the sed script itself, so pipe (|) characters in the data are not a problem.
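If several <url>...</url> pairs can share a line, one possible workaround is to let grep emit each element on its own line first. This is only a sketch: it assumes a grep that supports -o (GNU and BSD grep do, but POSIX does not require it) and URLs that contain no < character. The example.com addresses are made up.

```shell
# Hypothetical input with two url elements on a single line.
line='<url>http://a.example.com/</url><url>http://b.example.com/</url>'

# grep -o prints every match on its own line; sed then strips the tags.
urls=$(printf '%s\n' "$line" | grep -o '<url>[^<]*</url>' | sed 's/<[^>]*>//g')

printf '%s\n' "$urls"
```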

Related

Delete everything after pattern including pattern

I have a text file like
some
important
content
goes here
---from here--
some
unwanted content
I am trying to delete all lines after ---from here-- including ---from here--. That is, the desired output is
some
important
content
goes here
I tried sed '1,/---from here--/!d' input.txt but it's not removing the ---from here-- part. If I use sed '/---from here--.*/d' input.txt, it's only removing ---from here-- text.
How can I remove lines after a pattern including that pattern?
EDIT
I can achieve it by doing the first operation and piping its output to the second, like sed '1,/---from here--/!d' input.txt | sed '/---from here--.*/d' > output.txt.
Is there a single step solution?
Another approach with sed:
sed '/---from here--/,$d' file
The d (delete) command is applied to all lines from the first line containing ---from here-- up to the end of the file ($).
Another awk approach:
awk '/---from here--/{exit}1' file
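If you have GNU awk 4.1.0+, you can add -i inplace to change the file in-place.
Otherwise, write the output to a temporary file and move it over the original. Appending | tee file is not safe, because tee may truncate the file before awk has finished reading it.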
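A sketch of the temporary-file route for awk versions without -i inplace. The scratch directory and file names are made up for the demonstration, and mktemp -d is assumed to be available (it is common but not required by POSIX).

```shell
# Work in a scratch directory so no real file is touched.
dir=$(mktemp -d)
cat > "$dir/input.txt" <<'EOF'
some
important
content
goes here
---from here--
some
unwanted content
EOF

# Write the shortened copy next to the original, and replace the
# original only if awk succeeded.
awk '/---from here--/{exit}1' "$dir/input.txt" > "$dir/input.txt.tmp" &&
    mv "$dir/input.txt.tmp" "$dir/input.txt"

result=$(cat "$dir/input.txt")
rm -r "$dir"
printf '%s\n' "$result"
```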
I'm not positive, but I believe this will work:
sed -n '/---from here--/q; p' file
The q command tells sed to quit processing input lines after matching a given line.
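To check that the range-delete form and the quit form agree, both can be run on the sample from the question. A small sketch; the variable names are only for illustration.

```shell
input='some
important
content
goes here
---from here--
some
unwanted content'

# Delete from the first matching line through the last line.
by_range=$(printf '%s\n' "$input" | sed '/---from here--/,$d')

# Print lines until the match, then quit without printing it.
by_quit=$(printf '%s\n' "$input" | sed -n '/---from here--/q;p')

printf '%s\n' "$by_range"
```

The quit form stops reading as soon as the marker is seen, which can matter on very large inputs.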
Could you please try the following (in case you are OK with awk).
awk '/--from here--/{found_from=1} !found_from{print}' Input_file
You can try Perl
perl -ne ' $x++ if /---from here--/; print if !$x '
using your inputs..
$ cat johnykutty.txt
some
important
content
goes here
---from here--
some
unwanted content
$ perl -ne ' $x++ if /---from here--/; print if !$x ' johnykutty.txt
some
important
content
goes here
$

Linux: Append variable to end of line using line number as variable

I am new to shell scripting. I am using ksh.
I have this particular line in my script which I use to append text in a variable q to the end of a particular line given by the variable a containing the line number.
sed -i ''$a's#$#'"$q"'#' test.txt
Now the variable q can contain a large amount of text, with all sorts of special characters, such as !##$%^&*()_+:"<>.,/;'[]= etc etc, no exceptions. For now, I use a couple of sed commands in my script to remove any ' and " in this text (sed "s/'/ /g" | sed 's/"/ /g'), but still when I execute the above command I get the following error
sed: -e expression #1, char 168: unterminated `s' command
Any sed, awk, perl, suggestions are very much appreciated
The difficulty here is to quote (escape) the substitution separator characters # in the sed command:
sed -i ''$a's#$#'"$q"'#' test.txt
For example, if q contains # it will not work. The # will terminate the replacement pattern prematurely. Example: with q='a#b' and a=2, the script that sed receives is
sed -i '2s#$#a#b#' test.txt
which will not append a#b to the end of line 2: sed takes a as the replacement and then tries to parse b# as flags of the s command, which is an error.
This can be solved by escaping the # characters in q:
sed -i '2s#$#a\#b#' test.txt
However, this escaping could be cumbersome to do in shell.
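For a single-line q, the escaping can be sketched with one extra sed call. Besides the # delimiter, a backslash and & are also special in the replacement part, so all three are escaped here. This is an illustration under the assumption that q contains no newline; the sample value of q is made up.

```shell
a=2
q='a#b & c\d'    # hypothetical text with characters special to s#...#...#

# Put a backslash in front of every backslash, & and # in q.
esc=$(printf '%s\n' "$q" | sed 's/[\\&#]/\\&/g')

# Append the escaped text to line $a of a two-line sample.
result=$(printf '%s\n' 'one' 'two' | sed "${a}s#\$#$esc#")

printf '%s\n' "$result"
```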
Another approach is to use another level of indirection. Here is an example using a Perl one-liner. First, q is passed to the script as a quoted argument. Then, within the script, it is assigned to a new internal variable $q. With this approach there is no need to escape the substitution separator characters:
perl -pi -E 'BEGIN {$q = shift; $a = shift} s/$/$q/ if $. == $a' "$q" "$a" test.txt
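The same indirection can be sketched with awk by handing the text over through the environment, so that neither the shell nor awk reinterprets it (unlike awk -v, ENVIRON does not process backslash escapes). The environment variable name text, the sample file and the value of q are assumptions for the demonstration, and q is assumed to be a single line.

```shell
dir=$(mktemp -d)
printf '%s\n' 'line one' 'line two' 'line three' > "$dir/test.txt"

a=2
q='a#b & c/d \n "quoted"'    # hypothetical text full of special characters

# Pass q via the environment and append it to line number n.
text="$q" awk -v n="$a" 'NR == n { $0 = $0 ENVIRON["text"] } 1' \
    "$dir/test.txt" > "$dir/test.txt.tmp" &&
    mv "$dir/test.txt.tmp" "$dir/test.txt"

line2=$(sed -n "${a}p" "$dir/test.txt")
rm -r "$dir"
printf '%s\n' "$line2"
```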
Do not bother trying to sanitize the string. Just put it in a file, and use sed's r command to read it in:
echo "$q" > tmpfile
sed -i -e ${a}rtmpfile test.txt
Ah, but r places the text on a new line after line $a rather than at the end of it, which you don't want. You can join the two lines with:
sed -e ${a}rtmpfile test.txt | awk 'NR=='$a'{printf "%s", $0; next}1' > output
Another approach is to use the patch utility if present in your system.
patch test.txt <<-EOF
${a}c
$(sed "${a}q;d" test.txt)$q
.
EOF
${a}c will be replaced with the line number followed by c which means the operation is a change in line ${a}.
The second line is the replacement of the change. This is the concatenated value of the original text and the added text.
The lone . marks the end of the replacement text.

How to use grep and sed in order to replace the substring after searching some specific string?

I want to know how to use the two utilities 'grep' and 'sed', or something else, in order to replace a substring. I will explain what I want to do below.
We have the file 'test.txt' with the following string:
A1='AA1', A2='AA2', A3='AA3', A4='AA4', A5{ATTR}='AA5', A6='keyword_A'
After searching 'keyword_A' using grep, I want to replace the value of A5 with other string, for example, "NEW".
A1='AA1', A2='AA2', A3='AA3', A4='AA4', A5{ATTR}='NEW', A6='keyword_A'
I tried to use two commands like
grep keyword_A test.txt | sed -e 's/blabla/blabla/'
After trying everything I know, I gave up.
Please let me know the right solution.
First, you never need grep and sed. Sed has a full regular-expression search engine, so it is a superset of grep. This command will read test.txt, change the lines that you've indicated, and print the entire result on standard output:
sed "/keyword_A/s/A5{ATTR}='[A-Z0-9]*'/A5{ATTR}='NEW'/g" < test.txt
If you want to store the results back into the file test.txt, use the -i (in-place editing) switch to sed:
sed "/keyword_A/s/A5{ATTR}='[A-Z0-9]*'/A5{ATTR}='NEW'/g" -i.bak test.txt
If you want to select only the indicated lines, modify those, and print only those lines to standard out, use a combination of the p (print) command and the -n (no output) switch.
sed "/keyword_A/s/A5{ATTR}='[A-Z0-9]*'/A5{ATTR}='NEW'/gp" -n test.txt
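A quick check of the address-plus-substitution form on two lines, only one of which contains the keyword. The second line is made up to show that non-matching lines pass through unchanged.

```shell
match="A1='AA1', A2='AA2', A3='AA3', A4='AA4', A5{ATTR}='AA5', A6='keyword_A'"
other="A1='AA1', A5{ATTR}='AA5', A6='keyword_B'"    # hypothetical second line

# The /keyword_A/ address restricts the substitution to matching lines.
out=$(printf '%s\n' "$match" "$other" |
    sed "/keyword_A/s/A5{ATTR}='[A-Z0-9]*'/A5{ATTR}='NEW'/")

printf '%s\n' "$out"
```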
Using grep+sed is always the wrong approach. Here's one way to do it with GNU awk:
$ awk '/keyword_A/{ $0=gensub(/(A5({[^}]+})?=\047)[^\047]+/,"\\1NEW",1) } 1' file
A1='AA1', A2='AA2', A3='AA3', A4='AA4', A5{ATTR}='NEW', A6='keyword_A'
Using a couple variables you could define the keyword and replacement ( if they change at all ):
q="keyword_A"
r="NEW"
Then with sed:
sed -r "s/^(.+\{.+\}=')(.+)('.+"${q}".+)$/\1"${r}"\3/" file
Result:
A1='AA1', A2='AA2', A3='AA3', A4='AA4', A5{ATTR}='NEW', A6='keyword_A'
A5="NEW"
A6="keyword_A"
# with sed
sed "s/='[^']*\(',[[:blank:]]*A6='${A6}'\)/='${A5}\1/" YourFile
# with awk
awk -F "'" -v A5="${A5}" -v A6="${A6}" '
BEGIN { OFS="\047" }
$12 == A6 { $10 = A5; $0 = $0 }
7
' YourFile
The sed version anchors the change on the end of the string; the awk version uses ' as the field separator instead of the traditional whitespace.
The awk version assumes there is no ' inside a value (otherwise the escaping has to be handled).
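To see why the fields are numbered 10 and 12, it can help to print them. A small sketch using the sample line from the question; with ' as the separator, every quoted value lands in an even-numbered field.

```shell
line="A1='AA1', A2='AA2', A3='AA3', A4='AA4', A5{ATTR}='AA5', A6='keyword_A'"

# Field 10 is the value of A5, field 12 the value of A6.
field10=$(printf '%s\n' "$line" | awk -F"'" '{ print $10 }')
field12=$(printf '%s\n' "$line" | awk -F"'" '{ print $12 }')

# Replace field 10 only when field 12 is exactly the keyword.
out=$(printf '%s\n' "$line" |
    awk -F"'" -v OFS="'" '$12 == "keyword_A" { $10 = "NEW" } 1')

printf '%s\n' "$field10" "$field12" "$out"
```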
We can directly replace the fifth comma-separated column when the string keyword_A is found, as shown below (the leading space in the new value keeps the original spacing after the comma):
awk -F, 'BEGIN{OFS=",";}/keyword_A/{$5=" A5{ATTR}='"'"NEW"'"'"}1' filename
Couple of slight alternatives:
sed -r "/keyword_A/s/(A5[^']*')[^']*/\1NEW/"
awk -F"'" '/keyword_A/{$10 = "NEW"}1' OFS="'"
Of course, the drawback with awk is that afterwards you have to rename the new file.

Separate a text file with sed

I have the following sample file:
evtlog.161202.002609.debugevtlog.161201.162408.debugevtlog.161202.011046.debugevtlog.161202.002809.debugevtlog.161201.160035.debugevtlog.161201.155140.debugevtlog.161201.232156.debugevtlog.161201.145017.debugevtlog.161201.154816.debug
I want to separate the string and add a newline after matching "debug" like this:
evtlog.161202.002609.debug
evtlog.161201.162408.debug
So far I tried almost everything with sed, but it doesn't seem to do what I want.
sed 's/debug/{G}' latest_evtlogs.out
sed '/debug/i "SAD"' latest_evtlogs.out
etc...
sed 's/debug/\n/g' latest_evtlogs.out doesn't work when I add it as a pipe in the script , but it does when I run it manually.
Here's how I generate the file:
printf $(ls -l $EVTLOG_PATH/evtlog|tail -n 10|awk '{printf $8 , "%s\n\n"}'|sed 's/debug/\n/g') >> latest_evtlogs.out
Initially I wanted to just add newline with awk, but it doesn't work either.
Any ideas why I can't separate the string with a newline ?
I'm using :
Distributor ID: Debian
Description: Debian GNU/Linux 5.0.10 (lenny)
Release: 5.0.10
Codename: lenny
Just add a new line after debug:
sed 's/debug/&\n/g' file
Note that & in the replacement stands for the matched text, so it is a way to put "debug" back.
This returns:
evtlog.161202.002609.debug
evtlog.161201.162408.debug
evtlog.161202.011046.debug
evtlog.161202.002809.debug
evtlog.161201.160035.debug
evtlog.161201.155140.debug
evtlog.161201.232156.debug
evtlog.161201.145017.debug
evtlog.161201.154816.debug
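The effect of & is easy to see when the match is wrapped in visible markers instead of being followed by a newline. A tiny sketch on a shortened version of the sample string:

```shell
joined='evtlog.161202.002609.debugevtlog.161201.162408.debug'

# & is replaced by whatever the pattern matched, here the word debug.
bracketed=$(printf '%s\n' "$joined" | sed 's/debug/[&]/g')

printf '%s\n' "$bracketed"
```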
The problem is that you are using the output of sed in an unquoted command substitution. In this context the shell performs word splitting on the output, so the newlines act only as separators between words. printf therefore sees each line as a separate argument, interprets the first one as the format, and ignores the rest because there are no printf placeholders in the format.
It should work if you drop the outer printf $() from your command and just redirect the output from your pipeline to your file:
ls -l $EVTLOG_PATH/evtlog|tail -n 10|awk '{printf $8 , "%s\n\n"}'|sed 's/debug/\n/g' >> latest_evtlogs.out
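The word-splitting effect can be reproduced without any log files. In this sketch the three-line value stands in for the sed output; the unquoted expansion is deliberate, since it is the behaviour being demonstrated.

```shell
# Stand-in for multi-line command output.
lines=$(printf 'a\nb\nc\n')

# Unquoted: the shell splits the value into the words a, b and c, so
# printf receives "a" as its format and two extra arguments it never uses.
unquoted=$(printf $lines)

# Quoted, with an explicit format: the newlines survive.
quoted=$(printf '%s\n' "$lines")

printf '%s\n' "$unquoted"
printf '%s\n' "$quoted"
```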
Maybe Perl is "happier" than sed on your system:
perl -pe 's/debug/&\n/g' < YourLogFile
G (get) appends the contents of the hold space to the pattern space (usually just the current line read from the input file), so it cannot be used here.
i (insert) prints the specified text to standard output, so it cannot be used either.
What you want is to replace every debug with debug^J, where ^J is a newline. Depending on the sed version, you can do:
sed 's/debug/&\n/g' input_file
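But \n is - afaik - not strictly specified in POSIX sed. One can however use ANSI-C quoting ($'...') in shells that support it; the newline in the replacement still has to be preceded by a backslash:
sed 's/debug/&\'$'\n''/g' input_file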
Or a multi line string:
sed 's/debug/&\
/g' input_file
Thank you all for the answers. I finally did it like this:
echo $(ls -l $EVTLOG_PATH/evtlog|tail -n 10|awk '{printf $8 , "%s\n\n"}'|sed 's/debug/&\n/g') > temp.out
sed 's/ /\n/g' /share/sqa/dumps/5314577631/checks/temp.out > latest_evtlogs.out
It's not at all elegant, but it finally works.

UNIX: Grep a specific word and all the text following it

I have a variable in Unix, that stores multiple lines of alpha-numeric characters. I want to grep to a specific word and get all the text following it.
For example, $Variable contains:
Hello, User
Your files are:
File1 : Exists
File2 : None
Let us say I want to find File2, which is the last line, and I want to know whether it is Yes or None or whatever text is present after the colon, and save that to another variable.
Use sed instead
sed -n '/the word you are looking for/,$p' <file name>
or since you said it was in a variable something more like:
echo "$variable" | sed -n '/the word you are looking for/,$p'
sed -n suppresses the automatic printing of every line.
The address says: from the line matching "the word you are looking for" to $, which is the last line of input, run the p command, which prints :)
If you have to stop before the end of the file, replace $ with an end pattern.
If you just want to save the results to another variable:
new_variable=$(echo "$variable" | sed -n '/the word you are looking for/,$p')
Also note that if the string you are looking for has / in it, you must escape it with \, so it would look like
new_variable=$(echo "$variable" | sed -n '/the word you are\/ looking for/,$p')
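A short sketch of both range forms on the data from the question: one running to the end of the input, and one that stops at an end pattern. The variable names are only for illustration.

```shell
variable='Hello, User
Your files are:
File1 : Exists
File2 : None'

# From the first line matching File1 through the last line.
to_end=$(printf '%s\n' "$variable" | sed -n '/File1/,$p')

# From the line matching "Your files" up to and including the File1 line.
to_pattern=$(printf '%s\n' "$variable" | sed -n '/Your files/,/File1/p')

printf '%s\n' "$to_end"
printf '%s\n' "$to_pattern"
```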
So you have a variable defined as:
$ var="abc\ndef\nghi\njkl\nmn"
Then, to print the line containing "ghi" and everything after it:
$ echo -e $var | sed -n '/ghi/,$p'
grep is to Globally search for a Regular Expression and Print the matching string. That is not what you want to do, you want to take a Stream of input and EDit it to output part of it. Guess what tool does THAT in UNIX.
$ echo "$var"
Hello, User
Your files are:
File1 : Exists
File2 : None
$ var2=$(echo "$var" | sed -n 's/^File2 : //p')
$ echo "$var2"
None
Given:
variable="Hello, User
Your files are:
File1 : Exists
File2 : None"
You can get the information for File2 into another variable file2 using:
file2=$(echo "$variable" | sed -n '/File2/ s/File2 *: *//p')
The double quotes preserve newlines in the variable. The -n suppresses the default printing. The pattern matches the line containing File2 followed by any number of spaces, a colon and any number of additional spaces; it is replaced by nothing, and the remainder of the line is printed by sed and that is captured in the variable file2. If there can be spaces in front of File2 in the data, you can arrange to match and remove them too.
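If the lookup is needed for more than one name, it could be wrapped in a small function. This is a sketch using awk with the colon (and any surrounding spaces) as the field separator; get_value is a hypothetical helper name, and it assumes the name starts at the beginning of the line and contains no backslashes (awk -v interprets those).

```shell
variable='Hello, User
Your files are:
File1 : Exists
File2 : None'

# get_value TEXT NAME: print what follows "NAME :" in TEXT.
# Hypothetical helper; compares the name as a plain string, not a regex.
get_value() {
    printf '%s\n' "$1" | awk -F ' *: *' -v key="$2" '$1 == key { print $2 }'
}

file1=$(get_value "$variable" File1)
file2=$(get_value "$variable" File2)

printf '%s\n' "$file1" "$file2"
```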

Resources