How to add characters to a word and replace it using the sed command in Linux

I have a requirement.
I have a text file named a.txt, which contains a list of words:
GOOGLE
FACEBOOK
I have another file named b.txt, which contains lines such as:
Company name is google.
Company name is facebook.
There are many such lines, each containing a different word.
Then I wrote this script:
FILENAME="a.txt"
SCHEMA=$(cat $FILENAME)
for L in $SCHEMA
do
  echo "${L,,}"
  sed -i -E "s/.+/\L&_/" b.txt
done
After running the script, the output I expect in b.txt is:
Company name is google_
Company name is facebook_
But the output I actually get is:
Company name is google.__
Company name is facebook.__
The output is saved back into b.txt, as I use sed -i in the command.
Note: in a.txt I have the list of words I want to replace, and in b.txt I have paragraphs of text containing words like google., facebook. and so on.
That is why I cannot write a direct sed command for the replacement.
I hope that explains my requirement.
Thanks in advance!

You can use the following GNU sed solution:
FILENAME="a.txt"
while IFS= read -r L; do
sed -i "s/\($L\)\./\1_/gI" b.txt
done < $FILENAME
Or, the same without a loop as a single line (as used in anubhava's answer):
sed -i -f <(printf 's/\\(%s\\)\\./\\1_/gI\n' $(<"$FILENAME")) b.txt
In the loop version:
while IFS= read -r L; do - reads the file line by line, each line being assigned to L.
sed -i "s/\($L\)\./\1_/gI" b.txt - replaces all occurrences of L (captured into Group 1 with the help of the capturing \(...\) parentheses) followed by a . (case-insensitively, due to the I flag) in b.txt with the value captured in Group 1 plus an appended _.
In the one-liner:
-f allows passing a list of commands to sed.
printf 's/\\(%s\\)\\./\\1_/gI\n' $(<"$FILENAME") creates one sed command per word; in this case, the generated script looks like:
s/\(GOOGLE\)\./\1_/gI
s/\(FACEBOOK\)\./\1_/gI
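Note that the words read from a.txt are used as regular-expression patterns. If they could contain regex metacharacters, here is a minimal sketch that escapes them first (the escaping expression is an addition of mine, not part of the original answer):
FILENAME="a.txt"
while IFS= read -r L; do
  # Escape BRE metacharacters so the word is matched literally (assumes GNU sed)
  esc=$(printf '%s' "$L" | sed 's|[][\.*^$/]|\\&|g')
  sed -i "s/\($esc\)\./\1_/gI" b.txt
done < "$FILENAME"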

Here is how you can do it in a single shell command, without any loop, using GNU sed with printf in a process substitution:
sed -i -E -f <(printf 's/\\b(%s)\\./\\1_/I\n' $(<a.txt)) b.txt
cat b.txt
Company name is google_
Company name is facebook_
This is far more efficient than running sed or awk in a loop, especially if the input files are big.
The printf command creates a sed script that looks like this:
s/\b(GOOGLE)\./\1_/I
s/\b(FACEBOOK)\./\1_/I
sed -f then runs that dynamically generated script.

With a single awk reading both input files, could you please try the following:
awk '
FNR==NR{
  a[tolower($0)]
  next
}
($(NF-1) in a){
  sub(/\.$/,"")
  print $0"_"
}
' a.txt FS="[ .]" b.txt
Explanation: a detailed explanation of the above solution.
awk ' ##Starting awk program from here.
FNR==NR{ ##Condition FNR==NR is TRUE while the first file, a.txt, is being read.
  a[tolower($0)] ##Creating array a, indexed by the current line of a.txt in lower case.
  next ##next skips all further statements for this line.
}
($(NF-1) in a){ ##If the 2nd-to-last field is present in array a, then do the following.
  sub(/\.$/,"") ##Remove the trailing DOT.
  print $0"_" ##Print the current line with _ appended.
}
' a.txt FS="[ .]" b.txt ##Passing a.txt first, then setting the field separator to space or . for b.txt.
2nd solution: one more awk variant:
awk '
FNR==NR{
  a[tolower($0)]
  next
}
{
  sub(/\.$/,"")
}
($NF in a){
  print $0"_"
}
' a.txt b.txt
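For reference, with the a.txt and b.txt from the question, both versions print:
Company name is google_
Company name is facebook_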

This might work for you (GNU sed):
sed 's#.*#s/(&)./\\1_/Ig#' a.txt | sed -i -Ef - b.txt
N.B. The match is case-insensitive because of the I flag on the substitution command; the replacement, however, keeps the text from the original file, i.e. if the original string is google, it matches GOOGLE case-insensitively and is replaced by google_.
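For reference, the first sed turns each word from a.txt into a substitution command, so the script fed to the second sed via -f - looks like:
s/(GOOGLE)./\1_/Ig
s/(FACEBOOK)./\1_/Ig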

Related

Capturing string between 2 specific letters/words using shell scripting

I am trying to capture the string between 2 specific letters/words using sed/awk. This is what I am trying to do:
The input is a file test.log containing
Owner: CN=abc.samplecerrt.com,o=IN,DC=com
Owner: CN=abc1.samplecerrt.com,o=IN,DC=com
I want to extract only "CN=abc.samplecerrt.com"
I tried
sed 's/.*CN=\(.*\),.*/\1/p' test.log >> result.log
But this returns "abc.samplecerrt.com,o=IN,DC=com"
How do I go about this?
test file:
$ cat logs.txt
CN=abc.samplecerrt.com,o=IN,DC=com Owner: CN=abc1.samplecerrt.com,o=IN,DC=com
command and output:
$ grep -oP 'CN=(?:(?!CN=).)*?.com' logs.txt
CN=abc.samplecerrt.com
CN=abc1.samplecerrt.com
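Note that the dots in the pattern are unescaped, so .com would also match, say, xcom. A slightly stricter variant of the same idea, with the final dot escaped:
grep -oP 'CN=(?:(?!CN=).)*?\.com' logs.txt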
This might work for you (GNU sed):
sed -n 's/.*\(CN=[^,]*\).*/\1/p' file
Or:
sed 's/.*\(CN=[^,]*\).*/\1/p;d' file
The first solution turns off implicit printing (with -n) so that sed acts like grep.
Matches and captures the string CN= followed by zero or more non-comma characters and prints the captured group \1 if a match is made.
The second solution is much the same except it deletes all lines and only prints the captured group as above.
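For reference, with the two-line test.log from the question, both commands print:
CN=abc.samplecerrt.com
CN=abc1.samplecerrt.com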
With awk you can extract the field that contains the string you need. To do that, set FS to ":|," (i.e. fields are separated by : or ,). Now if you run
awk -v FS=":|," '{print $2}' file
CN=abc.samplecerrt.com
CN=abc1.samplecerrt.com
you get the field. But you only want one, so
awk -v FS=":|," '$2 !~ /abc1/ {print $2}' file
CN=abc.samplecerrt.com

Number lines and hide the empty ones

I am trying to number the lines of a txt file and hide the empty ones. I use this code:
cat -n file.txt | grep . file.txt
But it doesn't work; it ignores the cat command. I want to display all the non-empty lines and number them (the txt file is not static; it is like a list that a user can type into).
Edit: given the great solutions below, I would also add that grep . file.txt | cat -n also worked.
I assume you want to number the lines that remain after the empty lines are removed.
Solution #1
Use sed '/^$/d' to delete the empty lines then pipe its output to cat -n to number them:
sed '/^$/d' file.txt | cat -n
The sed program contains only one command: d (delete the line). The sed commands can be prefixed by zero, one or two addresses that tell what lines the command applies to.
In this case there is only one address /^$/. It is a regex (enclosed in /) that selects the empty lines; the lines where start of the line (^) is followed by the end of the line ($).
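For example, a quick check with inline input:
$ printf 'one\n\ntwo\n' | sed '/^$/d' | cat -n
     1  one
     2  two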
Solution #2
You can also use grep -v '^$' to filter out the empty lines:
grep -v '^$' file.txt | cat -n
Again, ^$ is a regular expression that matches the empty lines. -v reverses the condition and tells grep to display the lines that do not match the regex.
The commands above do not modify the file. They read the content of file.txt, process it and display the result on screen.
Update
As @robc suggests in a comment, nl is even better than cat -n for numbering the lines. Thank you @robc, I didn't know about nl until now (I didn't know about cat -n either). It is never too late to learn new things.
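For example (note that nl numbers only non-empty lines by default, but it still prints the empty ones, so the filtering step is still needed):
grep -v '^$' file.txt | nl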
This can easily be done with awk. The following prints each line with its line number and ignores empty lines.
awk 'NF{print FNR,$0}' file.txt
Explanation: a detailed explanation of the above code.
awk ' ##Starting awk program from here.
NF{ ##If NF (the number of fields) is non-zero for the current line, then do the following.
  print FNR,$0 ##Print the current line number (FNR) followed by the current line.
}
' file.txt ##Mentioning the Input_file we are passing to the awk program.
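One caveat: FNR is the line number in the original file, so the printed numbers are not consecutive when blank lines are skipped. A quick check with inline input:
$ printf 'one\n\ntwo\n' | awk 'NF{print FNR,$0}'
1 one
3 two
If you want consecutive numbering instead, print a counter: awk 'NF{print ++c,$0}' file.txt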

how to show the third line of multiple files

I have a simple question. I am trying to check the 3rd line of multiple files in a folder, so I used this:
head -n 3 MiseqData/result2012/12* | tail -n 1
but this doesn't work, obviously, because it only shows the third line of the last file. What I actually want is the third line of every file in the result2012 folder.
Does anyone know how to do that?
Also, sorry, just another question: is it also possible to show which file each third line belongs to?
That is, before each third line is shown, can the filename it was extracted from be shown as well?
(When I use the head or tail command, the filename is also shown.)
Thank you.
With Awk, the variable FNR is the number of the "record" (line, by default) in the current file, so you can simply compare it to 3 to print the third line of each input file:
awk 'FNR == 3' MiseqData/result2012/12*
A more optimized version for long files would skip to the next file on match, since you know there's only that one line where the condition is true:
awk 'FNR == 3 { print; nextfile }' MiseqData/result2012/12*
However, not all Awks support nextfile (but it is also not exclusive to GNU Awk).
A more portable variant using your head and tail solution would be a loop in the shell:
for f in MiseqData/result2012/12*; do head -n 3 "$f" | tail -n 1; done
Or with sed (without GNU extensions, i.e., the -s argument):
for f in MiseqData/result2012/12*; do sed '3q;d' "$f"; done
(Here d deletes lines 1 and 2; on line 3, q prints the line and quits before d runs.)
edit: As for the additional question of how to print the name of each file, you need to explicitly print it for each file yourself, e.g.,
awk 'FNR == 3 { print FILENAME ": " $0; nextfile }' MiseqData/result2012/12*
for f in MiseqData/result2012/12*; do
echo -n `basename "$f"`': '
head -n 3 "$f" | tail -n 1
done
for f in MiseqData/result2012/12*; do
echo -n "$f: "
sed '3q;d' "$f"
done
With GNU sed:
sed -s -n '3p' MiseqData/result2012/12*
or shorter
sed -s '3!d' MiseqData/result2012/12*
From man sed:
-s: consider files as separate rather than as a single continuous long stream.
You can do this:
awk 'FNR==3' MiseqData/result2012/12*
If you like the file name as well:
awk 'FNR==3 {print FILENAME,$0}' MiseqData/result2012/12*
This might work for you (GNU sed & parallel):
parallel -k sed -n '3p\;3q' {} ::: file1 file2 file3
Parallel applies the sed command to each file and returns the results in order.
N.B. Each file will only be read up to its 3rd line.
Also,you may be tempted (as I was) to use:
sed -ns '3p;3q' file1 file2 file3
but this will only return the third line of the first file, because q terminates sed entirely rather than per file.
As we know, FNR holds the line number within the current file, so we can run this command to get the 3rd line of every file:
awk 'FNR==3' MiseqData/result2012/12*

Bash - delete rows from one file while iterating through rows from another file

I have two files.
file.txt and delete.txt
file.txt contains, for example:
/somedirectory/
/somedirectory2/somefile.txt
/anotherfile.txt
delete.txt contains:
/somedirectory/
/somedirectory2/somefile.txt
I need to delete the rows from file.txt that are contained within delete.txt
cat file.txt should result with:
/anotherfile.txt
So far I've tried this with no luck:
while read p; do
  line=$p;
  sed -i '/${line}/d' file.txt;
done < delete.txt
I don't receive any error, it just doesn't edit my file.txt file. I've tested the while loop with an echo ${line} and it works as expected in that scenario.
Any suggestions?
Note: the while loop above doesn't work as expected even when I remove the forward slashes from the files.
With a simple grep:
grep -vFxf delete.txt file.txt > temp.txt && mv temp.txt file.txt
With awk:
awk 'NR==FNR {a[$0]=1; next}; !a[$0]' delete.txt file.txt
NR==FNR {a[$0]=1; next} is only executed for the delete.txt file (first file argument); associative array a has the records as keys, and 1 as the value for every key
!a[$0] is executed for the second file argument i.e. file.txt; printing (default action) the record(s) that are not present in the array a as key(s)
Example:
% cat delete.txt
/somedirectory/
/somedirectory2/somefile.txt
% cat file.txt
/somedirectory/
/somedirectory2/somefile.txt
/anotherfile.txt
% awk 'NR==FNR {a[$0]=1; next}; !a[$0]' delete.txt file.txt
/anotherfile.txt
The ${line} inside the single quotes will not get expanded, so sed would be looking for the literal string ${line} in file.txt. Put that string into your sample file to see it get removed.
However, you'll still have problems, because the slashes inside delete.txt will be interpreted by sed, so the regular expression won't be properly delimited. You'd have to jump through some hoops to get every character of each input line treated as a literal in order to use sed.
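For illustration only, the quoting problem alone could be fixed with double quotes and a different address delimiter, though this still breaks if a line contains regex metacharacters or a % (a sketch, not a recommendation):
while read -r line; do
  sed -i "\%^${line}\$%d" file.txt
done < delete.txt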
Use comm instead:
comm -23 file.txt delete.txt
The input files must be sorted, though. If they are not:
comm -23 <(sort file.txt) <(sort delete.txt)
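With the sample files above:
$ comm -23 <(sort file.txt) <(sort delete.txt)
/anotherfile.txt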

variable assignment is not working in rhel6 linux

file1
ABY37499|ANK37528|DEL37508|SRILANKA|195203230000|445500759
ARJU7499|CHA38008|DEL37508|SRILANKA|195203230000|445500759
IB1704174|ANK37528|DEL37508|SRILANKA|195203230000|445500759
IB1704174|CHA38008|DEL37508|SRILANKA|195203230000|445500759
ABY37500|ANK37529|DEL37509|BRAZIL|195203240000|445500757
ARJU7500|CHA38009|DEL37509|BRAZIL|195203240000|445500757
IB1704175|ANK37529|DEL37509|BRAZIL|195203240000|445500757
I want to convert the date in the fifth column to another format. The script is below:
#!/bin/sh
dt="%Y-%m-%d %H:%M"
awk -F '|' '{print $5}' file1 | sed 's/.\{8\}/& /g'> f1.txt
aa=`(date -f f1.txt +"$dt")`
echo "$aa"
awk -F '|' '$5=$aa' file1
echo "$aa" got desired output but i cannot assign $aa to $5 please help me.
Thanks
I corrected my answer after the comment from Etan Reisner.
From the awk manual:
The input is read in units called records, and processed by the rules
of your program one record at a time. By default, each record is one
line. Each record is automatically split into chunks called fields.
This makes it more convenient for programs to work on the parts of a
record.
Fields are stored in variables $1, $2, ...
And
The contents of a field, as seen by awk, can be changed within an awk
program; this changes what awk perceives as the current input record.
see the man page
thus, this expression:
awk -F '|' '$5=$aa' file1
does not have the effect of substituting the fifth column of file1. (Also note that $aa is not expanded by the shell inside single quotes; to pass a shell variable to awk, use the -v option.)
You have to write the modified output to a second file.
Maybe this could help you, using sed:
echo 195203240000 | sed -n -e "s_\(....\)\(..\)\(..\)\(..\)\(..\)_\1-\2-\3 \4:\5_p"
1952-03-24 00:00
This awk script should do what you want.
It isn't exactly pretty, but it works, assuming the input format is consistent.
awk -F'|' -v OFS='|' '{$5=sprintf("%s-%s-%s %s:%s",
    substr($5,1,4), substr($5,5,2), substr($5,7,2),
    substr($5,9,2), substr($5,11,2))} 7' file1 > file1.new
It assigns the new value to field $5 (which makes awk rebuild the record using the | output separator) and then uses 7 (as a truthy value) to trigger the default awk {print} action on the modified line.
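With the sample file1 from the question, the first line of file1.new would be:
ABY37499|ANK37528|DEL37508|SRILANKA|1952-03-23 00:00|445500759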
