I am trying to tie multiple commands using sed - linux

First, I am trying to print the lines of a file that contain the word 'Guess' and change the word 'Fall' to 'bar' in them.
This is what I have tried:
sed -n -e '/Guess/p' -e 's/Fall/bar/' data.txt
Each command works fine on its own; together, however, only the first part works.

In your attempt, -n suppresses automatic printing and the p command runs before the substitution, so each line is printed unmodified.
To print the lines containing 'Guess' and change the word 'Fall' in those lines,
I would try experimenting with this command:
sed -n '/Guess/{ s/Fall/bar/p }' data.txt
However, this prints nothing if the line with 'Guess' does not contain the word 'Fall' (both 'Guess' and 'Fall' are required).
If you want to print every line containing 'Guess', whether or not the substitution finds the word 'Fall', I suggest trying:
sed -n '/Guess/ { s/Fall/bar/;p }' data.txt
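A small reproduction may make the difference between the three commands easier to see. The sample lines below are made up (the asker's real data.txt is not shown), so treat this as a sketch rather than a definitive test:

```shell
#!/bin/bash
# Hypothetical sample data; the real data.txt is not shown in the question.
data=$(printf '%s\n' 'Guess the Fall date' 'Guess nothing' 'Other Fall line')

# Original attempt: p prints the line before s/Fall/bar/ modifies it.
original=$(printf '%s\n' "$data" | sed -n -e '/Guess/p' -e 's/Fall/bar/')

# Print only lines that match Guess AND had Fall replaced.
both=$(printf '%s\n' "$data" | sed -n '/Guess/{s/Fall/bar/p;}')

# Print every line that matches Guess, substituting where possible.
any=$(printf '%s\n' "$data" | sed -n '/Guess/{s/Fall/bar/;p;}')

printf 'original:\n%s\n\nboth:\n%s\n\nany:\n%s\n' "$original" "$both" "$any"
```

The first variant prints the two Guess lines unchanged, the second prints only the line where the substitution succeeded, and the third prints both Guess lines with 'Fall' replaced where it occurs.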

Related

UNIX: Grep a specific word and all the text following it

I have a variable in Unix, that stores multiple lines of alpha-numeric characters. I want to grep to a specific word and get all the text following it.
For example, $Variable contains:
Hello, User
Your files are:
File1 : Exists
File2 : None
Let us say I want to find File2, which is the last line, and I want to know whether it is Yes or None or whatever text is present after the colon, and save that text to another variable.
Use sed instead
sed -n '/the word you are looking for/,$p' <file name>
or since you said it was in a variable something more like:
echo "$variable" | sed -n '/the word you are looking for/,$p'
sed -n tells sed not to print lines by default.
The address says: from the line matching "the word you are looking for" to $, which is the last line of the input, run the p command, which prints :)
If you have to stop before the end of the file, replace $ with a pattern that matches the last line you want.
If you just want to save the results to another variable:
new_variable=$(echo "$variable" | sed -n '/the word you are looking for/,$p')
Also note that if the string you are looking for has a / in it, you must escape it with \ so it would look like
new_variable=$(echo "$variable" | sed -n '/the word you are\/ looking for/,$p')
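As an alternative to escaping, sed also accepts another delimiter for an address when the opening delimiter is written with a leading backslash (\#...# instead of /.../). The sketch below uses made-up variable contents to show both forms giving the same result:

```shell
#!/bin/bash
# Hypothetical input; "path/to" contains the slash that needs special handling.
variable=$(printf '%s\n' 'header' 'see path/to file' 'tail line')

# Escaping the slash inside a /.../ address.
escaped=$(printf '%s\n' "$variable" | sed -n '/path\/to/,$p')

# Using # as the address delimiter, so the slash needs no escape.
alt_delim=$(printf '%s\n' "$variable" | sed -n '\#path/to#,$p')

printf '%s\n' "$alt_delim"
```

Both variables end up holding the matching line and everything after it.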
So you have a variable defined as:
$ var="abc\ndef\nghi\njkl\nmn"
Then, if you want to print the line containing "ghi" and everything following it, do it this way:
$ echo -e "$var" | sed -n '/ghi/,$p'
grep is to Globally search for a Regular Expression and Print the matching string. That is not what you want to do, you want to take a Stream of input and EDit it to output part of it. Guess what tool does THAT in UNIX.
$ echo "$var"
Hello, User
Your files are:
File1 : Exists
File2 : None
$ var2=$(echo "$var" | sed -n 's/^File2 : //p')
$ echo "$var2"
None
Given:
variable="Hello, User
Your files are:
File1 : Exists
File2 : None"
You can get the information for File2 into another variable file2 using:
file2=$(echo "$variable" | sed -n '/File2/ s/File2 *: *//p')
The double quotes preserve newlines in the variable. The -n suppresses the default printing. The pattern matches the line containing File2 followed by any number of spaces, a colon and any number of additional spaces; it is replaced by nothing, and the remainder of the line is printed by sed and that is captured in the variable file2. If there can be spaces in front of File2 in the data, you can arrange to match and remove them too.
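For the case with spaces in front of File2, a minimal sketch could look like this. The indented input is an invented variant of the data above, not something from the question:

```shell
#!/bin/bash
# Hypothetical variant of the data in which the File2 line is indented.
variable='Hello, User
Your files are:
File1 : Exists
   File2 : None'

# "^ *" consumes optional leading spaces; the rest of the pattern is unchanged.
file2=$(printf '%s\n' "$variable" | sed -n 's/^ *File2 *: *//p')

echo "$file2"
```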

Delete lines from a file matching first 2 fields from a second file in shell script

Suppose I have setA.txt:
a|b|0.1
c|d|0.2
b|a|0.3
and I also have setB.txt:
c|d|200
a|b|100
Now I want to delete from setA.txt lines that have the same first 2 fields with setB.txt, so the output should be:
b|a|0.3
I tried:
comm -23 <(sort setA.txt) <(sort setB.txt)
But the equality is defined for whole line, so it won't work. How can I do this?
$ awk -F\| 'FNR==NR{seen[$1,$2]=1;next;} !seen[$1,$2]' setB.txt setA.txt
b|a|0.3
This reads through setB.txt just once, extracts the needed information from it, and then reads through setA.txt while deciding which lines to print.
How it works
-F\|
This sets the field separator to a vertical bar, |.
FNR==NR{seen[$1,$2]=1;next;}
FNR is the number of lines read so far from the current file and NR is the total number of lines read. Thus, when FNR==NR, we are reading the first file, setB.txt. If so, set the value of associative array seen to true, 1, for the key consisting of fields one and two. Lastly, skip the rest of the commands and start over on the next line.
!seen[$1,$2]
If we get to this command, we are working on the second file, setA.txt. Since ! means negation, the condition is true if seen[$1,$2] is false which means that this combination of fields one and two was not in setB.txt. If so, then the default action is performed which is to print the line.
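The same logic can be written out over several lines with comments, which may be easier to adapt. This sketch recreates the two sample files in a temporary directory and tests key membership with `in`, a small variation on the one-liner that avoids creating empty array entries for unseen keys:

```shell
#!/bin/bash
# Recreate the two sample files from the question in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'a|b|0.1' 'c|d|0.2' 'b|a|0.3' > "$tmpdir/setA.txt"
printf '%s\n' 'c|d|200' 'a|b|100' > "$tmpdir/setB.txt"

result=$(awk -F'|' '
    # First file (setB.txt): remember each pair of leading fields.
    FNR == NR { seen[$1, $2] = 1; next }
    # Second file (setA.txt): print lines whose pair was never seen.
    !(($1, $2) in seen)
' "$tmpdir/setB.txt" "$tmpdir/setA.txt")

echo "$result"
rm -r "$tmpdir"
```

One caveat with the FNR==NR idiom: if setB.txt is empty, the condition is also true for every line of setA.txt, so nothing is printed.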
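This should work (reading the script from standard input with -f- works in GNU sed; other implementations may need -f /dev/stdin):
sed -n 's#^\([^|]*|[^|]*\)|.*#/^\1|/d#p' setB.txt | sed -f- setA.txt
How this works:
sed -n 's#^\([^|]*|[^|]*\)|.*#/^\1|/d#p'
generates an output:
/^c|d|/d
/^a|b|/d
which is then used as a sed script for the next sed after the pipe and outputs:
b|a|0.3
The trailing | in each generated address keeps c|d from also matching a line such as c|dx|0.5. Fields containing regex metacharacters (for example . or *) would still need escaping.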
(IFS='|'; while read -r x y z; do grep -q -P "^\Q$x|$y|\E" setB.txt || echo "$x|$y|$z"; done < setA.txt)
Explanation: grep -q only tests whether grep can find the regexp, without printing anything; -P (a GNU grep option) means use Perl syntax, so that the | is matched literally thanks to the \Q..\E construct. The leading ^ anchors the match to the start of the line.
IFS='|' makes bash use | instead of whitespace (space, tab, newline) as the token separator.

print the duplicate lines using the sed command?

I am trying to print the duplicate lines in a file using the sed command.
In a file I have the following contents:
hi
hello
hi
how
hello
How can I print the duplicate lines in this file using the sed command?
example: the output should be:
hi
hello
Not sure why it has to be in sed when you can use the uniq binary. Anywho, the file needs to be sorted so we have to do that first.
Using uniq and my preferred way:
$ sort file | uniq -d
hello
hi
Using GNU sed:
$ sort file | sed '$!N; s/^\(.*\)\n\1$/\1/; t; D'
hello
hi
We read the next line from input with the N command which appends the next line to pattern space separated by "\n" character.
$! prevents N from running on the last line.
The substitution replaces two repeating strings with one.
The t command takes the script to the end where the current pattern space gets printed automatically.
If the substitution was not successful, D executes, deleting the non-repeated string.
The cycle continues, and this way each duplicated line gets printed once (a line that occurs four or more times can be printed more than once).
If you please, you can use process substitution to avoid the pipe by passing <(sort file) to sed as its input file.
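If sed is not a hard requirement, the asker's exact output order (hi before hello) can be had without sorting. The sketch below compares uniq -d with an awk one-liner; the awk approach is a different technique from the answers above and is offered only as an alternative:

```shell
#!/bin/bash
# The sample file from the question.
tmpfile=$(mktemp)
printf '%s\n' hi hello hi how hello > "$tmpfile"

# uniq -d prints one copy of each line that is repeated in sorted input.
with_uniq=$(sort "$tmpfile" | uniq -d)

# awk prints a line the second time it is seen, so no sort is required
# and the output follows the order in which the repeats appear in the file.
with_awk=$(awk 'seen[$0]++ == 1' "$tmpfile")

printf 'uniq -d:\n%s\n\nawk:\n%s\n' "$with_uniq" "$with_awk"
rm "$tmpfile"
```

Because awk keeps every distinct line in memory, this suits small to medium files; for very large inputs the sort-based pipelines are the safer choice.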
Try something like:
sort file.txt | uniq -d
This sorts the file and then prints the duplicate lines. If you wish to ignore case, use the -i option of uniq together with the -f option of sort, so that lines differing only in case end up adjacent.

How to delete 5 lines before and 6 lines after pattern match using Sed?

I want to search for a pattern "xxxx" in a file and delete 5 lines before this pattern and 6 lines after this match. How can I do this using sed?
This might work for you (GNU sed):
sed ':a;N;s/\n/&/5;Ta;/xxxx/!{P;D};:b;N;s/\n/&/11;Tb;d' file
Keep a rolling window of 5 lines and on encountering the specified string add 6 more (11 in total) and delete.
N.B. This is a barebones solution and will most probably need tailoring to your specific needs. Open questions include: what if the string occurs several times in the file? What if the string is within the first five lines, or several occurrences are within five lines of each other?
Here's one way you could do it using awk. I assume that you also want to delete the line itself and that the file is small enough to fit into memory:
awk '{a[NR]=$0}/xxxx/{f=NR}END{for(i=1;i<=NR;++i)if(i<f-5||i>f+6)print a[i]}' file
Store every line into the array a. When the pattern /xxxx/ is matched, save the line number. After the whole file has been processed, loop through the array, only printing the lines you want to keep.
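Alternatively, you can use grep to obtain the line number first (next stops the grep output itself from being printed, and FNR counts lines within file only):
grep -n 'xxxx' file | awk -F: 'NR==FNR{f=$1;next} FNR<f-5||FNR>f+6' - file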
In both cases, the lines deleted will be surrounding the last line where the pattern is matched.
A third option would be to use grep to obtain the line number then use sed to delete the lines:
line=$(grep -nm1 'xxxx' file | cut -d: -f1)
sed "$((line-5)),$((line+6))d" file
In this case I've also added the -m switch so grep exits after finding the first match.
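To check these options against each other, one could run them on a small generated file. The 20-line input with the pattern on line 12 is an assumption made for this sketch:

```shell
#!/bin/bash
# Hypothetical 20-line test file with the pattern on line 12.
tmpfile=$(mktemp)
seq 1 20 | sed '12s/.*/xxxx/' > "$tmpfile"

# Option 1: buffer the whole file in awk and skip the unwanted range.
opt1=$(awk '{a[NR]=$0} /xxxx/{f=NR}
            END{for(i=1;i<=NR;++i) if(i<f-5||i>f+6) print a[i]}' "$tmpfile")

# Option 3: find the first match with grep, then delete the range with sed.
line=$(grep -nm1 'xxxx' "$tmpfile" | cut -d: -f1)
opt3=$(sed "$((line-5)),$((line+6))d" "$tmpfile")

echo "$opt1"
rm "$tmpfile"
```

Both variants keep lines 1 to 6 and 19 to 20, that is, they drop the five lines before the match, the match itself, and the six lines after it.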
If you know the line number (which is not difficult to obtain), you can use something like this, where curr_line holds the number of the matching line:
filename="test"
start=$((curr_line - 5))
end=$((curr_line + 6))
sed "${start},${end}d" "$filename"
Add -i to sed to edit the file in place. Of course, you have to remember additional conditions: start should not be less than 1, and end should not be greater than the number of lines in the file.
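Those boundary conditions could be handled by clamping the range before calling sed. The helper name delete_around is made up for this sketch, and it assumes the file ends with a newline so that wc -l reports the true line count:

```shell
#!/bin/bash
# delete_around FILE LINE BEFORE AFTER  (hypothetical helper name)
# Prints FILE without the lines from LINE-BEFORE to LINE+AFTER,
# clamped to the valid range 1..total.
delete_around() {
    local file=$1 line=$2 before=$3 after=$4
    local total start end
    # The arithmetic expansion strips any padding that wc adds.
    total=$(( $(wc -l < "$file") ))
    start=$(( line - before ))
    end=$(( line + after ))
    if (( start < 1 )); then start=1; fi
    if (( end > total )); then end=$total; fi
    sed "${start},${end}d" "$file"
}

tmpfile=$(mktemp)
seq 1 10 > "$tmpfile"
near_top=$(delete_around "$tmpfile" 2 5 6)   # range becomes 1..8
near_end=$(delete_around "$tmpfile" 9 5 6)   # range becomes 4..10
printf 'near top:\n%s\nnear end:\n%s\n' "$near_top" "$near_end"
rm "$tmpfile"
```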
Another - maybe more easy to follow - solution would be to use grep to find the keyword and the corresponding line:
grep -n 'KEYWORD' <file>
then use sed to get the line number only like this:
grep -n 'KEYWORD' <file> | sed 's/:.*//'
Now that you have the line number simply use sed like this:
sed -i "${LINE_START},${LINE_END}d" <file>
to remove lines before and/or after! With only the -i you will overwrite the <file> (no backup).
A script example could be:
#!/bin/bash
KEYWORD=$1
LINES_BEFORE=$2
LINES_AFTER=$3
FILE=$4
LINE_NO=$(grep -n -- "$KEYWORD" "$FILE" | sed 's/:.*//')
echo "Keyword found in line: $LINE_NO"
LINE_START=$((LINE_NO - LINES_BEFORE))
LINE_END=$((LINE_NO + LINES_AFTER))
echo "Deleting lines $LINE_START to $LINE_END!"
sed -i "${LINE_START},${LINE_END}d" "$FILE"
Please note that this will work only if the keyword is found once! Adapt the script to your needs!
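One way to guard against that limitation is to count the matching lines before doing any arithmetic. The function name single_match is hypothetical, and the sample file is invented for the demonstration:

```shell
#!/bin/bash
# single_match KEYWORD FILE  (hypothetical helper name)
# Succeeds only when KEYWORD occurs on exactly one line of FILE.
single_match() {
    local keyword=$1 file=$2 count
    # grep -c prints 0 and exits non-zero when nothing matches; ignore the status.
    count=$(grep -c -- "$keyword" "$file" || true)
    [ "$count" -eq 1 ]
}

tmpfile=$(mktemp)
printf '%s\n' one KEY two KEY three > "$tmpfile"

if single_match KEY "$tmpfile"; then status_key=unique; else status_key=ambiguous; fi
if single_match two "$tmpfile"; then status_two=unique; else status_two=ambiguous; fi

echo "KEY: $status_key, two: $status_two"
rm "$tmpfile"
```

A script like the one above could call such a check right after reading its arguments and exit with a message when the keyword is missing or ambiguous.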

Error with a script in bash

I have a little error with a script I wrote in bash and I can't figure out what I'm doing wrong.
Note that I'm using this script for thousands of calculations and this error happened only a few times (like 20 or so), but it still happened.
What the script does is this: basically, it takes as input a web page that I got from a site with the utility w3m and counts all the occurrences of the words in it. Afterwards, it orders them from the most common to the ones that occur only once.
this is the code:
#!/bin/bash
# counts the numbers of words from specific sites #
# writes in a file the occurrences ordered from the most common #
touch check # file used to analyze the occurrences
touch distribution # final file ordered
page=$1 # the web page that needs to be analyzed
occurrences=$2 # temporary file for the occurrences
dictionary=$3 # dictionary used for another purpose (ignore this)
# write the words one by column
cat $page | tr -c [:alnum:] "\n" | sed '/^$/d' > check
# loop to analyze the words
cat check | while read words
do
word=${words}
strlen=${#word}
# ignores blacklisted words or small ones
if ! grep -Fxq $word .blacklist && [ $strlen -gt 2 ]
then
# if the word isn't in the file
if [ `egrep -c -i "^$word: " $occurrences` -eq 0 ]
then
echo "$word: 1" | cat >> $occurrences
# else if it is already in the file, it calculates the occurrences
else
old=`awk -v words=$word -F": " '$1==words { print $2 }' $occurrences`
### HERE IS THE ERROR, EITHER THE LET OR THE SED ###
let "new=old+1"
sed -i "s/^$word: $old$/$word: $new/g" $occurrences
fi
fi
done
# orders the words
awk -F": " '{print $2" "$1}' $occurrences | sort -rn | awk -F" " '{print $2": "$1}' > distribution
# ignore this, not important
grep -w "1" distribution | awk -F ":" '{print $1}' > temp_dictionary
for line in `cat temp_dictionary`
do
if ! grep -Fxq $line $dictionary
then
echo $line >> $dictionary
fi
done
rm check
rm temp_dictionary
this is the error: (I'm translating it, so it could be different in english)
./wordOccurrences line:30 let:x // where x is a number, usually 9 or 10 (but also 11, 13, etc)
1: syntax error in the expression (the error token is 1)
sed: expression -e #1, character y: command 's' not terminated // where y is another number (this one is also usually 9 or 10) with y being different from x
EDIT:
Talking with kev it looks like it's a newline problem
I added an echo between let and sed to print the sed and it worked perfectly for like 5 to 10 minutes until that error. Usually the sed without error looked like this:
s/^CONSULENTI: 6$/CONSULENTI: 7/g
but when I got the error it was like this:
s/^00145: 1
1$/00145: 4/g
how to fix this?
If you get a newline in $old, it means awk printed two lines, so there is a duplicate entry in $occurrences.
The script seems complicated for counting words, and it is not efficient, because it launches many processes and rescans the file inside a loop;
maybe you can do something similar with
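The effect is easy to reproduce. In this sketch the contents of the occurrences file are made up, with one word deliberately listed twice; awk then returns two lines for that word, which is the kind of value that breaks both let and sed:

```shell
#!/bin/bash
# Hypothetical occurrences file in which 00145 appears twice.
occurrences=$(mktemp)
printf '%s\n' 'CONSULENTI: 6' '00145: 1' '00145: 1' > "$occurrences"

word=00145
old=$(awk -v words="$word" -F': ' '$1 == words { print $2 }' "$occurrences")
lines_in_old=$(( $(printf '%s\n' "$old" | wc -l) ))

# List every word that occurs more than once in the file.
duplicates=$(awk -F': ' 'seen[$1]++ == 1 { print $1 }' "$occurrences")

echo "old spans $lines_in_old lines; duplicated words: $duplicates"
rm "$occurrences"
```

Running the second awk command against the real occurrences file would show which words have ended up with more than one entry.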
sort | uniq -c
You should also consider that your case-insensitivity is not consistent throughout the program. I created a page with just "foooo" in it and ran the program, then created one with "Foooo" in it and ran the program again. The 'old=`awk...' line sets 'old' to the empty string because awk is matching case sensitively. This results in the occurrences file not being updated. The subsequent sed and possibly some of the greps are also case sensitive.
This may not be the only error since it doesn't explain the error message you saw, but it is an indication that the same word with different capitalization will be handled erroneously by your script.
The following would separate the words, lowercase them, and then remove the ones smaller than three characters:
tr -cs '[:alnum:]' '\n' <foo | tr '[:upper:]' '[:lower:]' | egrep -v '^.{0,2}$'
Using this at the front of your script would mean that the rest of the script would not have to be case insensitive to be correct.
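Combining this tokenizing pipeline with the sort | uniq -c suggestion from the other answer, the whole counting loop could be replaced by something like the sketch below. The page text is invented, and the blacklist and dictionary steps of the original script are left out:

```shell
#!/bin/bash
# Hypothetical page text standing in for the w3m output.
page=$(mktemp)
printf '%s\n' 'Foooo bar foooo, baz!' 'Baz is ok; FOOOO baz' > "$page"

distribution=$(
    tr -cs '[:alnum:]' '\n' < "$page" |   # one word per line
    tr '[:upper:]' '[:lower:]' |          # fold case
    grep -Ev '^.{0,2}$' |                 # drop words under 3 characters
    sort | uniq -c |                      # count identical words
    sort -k1,1nr -k2 |                    # most common first, ties alphabetical
    awk '{ print $2 ": " $1 }'            # word: count, as in the original script
)

echo "$distribution"
rm "$page"
```

Each word is counted exactly once per run, so the duplicate-entry problem cannot arise, and the file is read a single time instead of once per word.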
