Find and replace lines contained in one file in a second file using a shell script - Linux

I'm trying to find a solution to the following problem:
I have two files: i.e. file1 and file2.
In file1 there are some lines with key words, and I want to find these lines in file2 by using the key words. Once a key word is found in file2, I would like to update that line with the content of the corresponding line in file1. This operation should be done for every line contained in file1.
Here is an example of what I have in mind, but I don't know exactly how to turn it into a shell script command.
file1:
key1=new_value1
key2=new_value2
key3=new_value3
etc....
file2:
key1=value1
key2=value2
key3=value3
key4=value4
key5=value5
key6=value6
etc....
Result:
key1=new_value1
key2=new_value2
key3=new_value3
key4=value4
key5=value5
key6=value6
etc....
I don't know how I can use 'sed' or something else in a shell script to accomplish this task.
Any help is welcomed.
Thank you

awk would be my first choice:
awk -F= -v OFS='=' '
NR==FNR {new[$1]=$2; next}   # first file: remember the new value for each key
$1 in new {$2=new[$1]}       # second file: swap in the new value if the key is known
{print}
' file1 file2
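Since the question asks about sed: below is a rough shell-loop sketch, assuming GNU sed's in-place flag (-i) is available and that the keys and values in file1 contain no characters special to sed (such as / or &).

while IFS== read -r key value; do
    # rewrite the whole "key=..." line in file2 with the new value
    sed -i "s/^$key=.*/$key=$value/" file2
done < file1

Note that this runs sed once per line of file1, whereas the awk version reads each file exactly once, so awk is the better choice for large files.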

Related

Rename file as third word on it (bash)

I have several autogenerated files (see the example first line below) and I want to rename them according to the 3rd word in the first line (in this case, the file would be renamed 42.txt).
First line:
ligand CC##HOc3ccccc3 42 P10000001
Is there a way to do it?
Say you have file.txt containing:
ligand CC##HOc3ccccc3 42 P10000001
and you want to rename file.txt to 42.txt based on the 3rd field in the file.
Using awk
The easiest way is simply to use mv with awk in a command substitution, e.g.:
mv file.txt "$(awk 'NR==1 {print $3; exit}' file.txt)".txt
Where the command substitution $(...) is just the awk expression awk 'NR==1 {print $3; exit}', which simply outputs the 3rd field (e.g. 42). Specifying NR==1 ensures only the first line is considered, and the exit at the end of that rule ensures no more lines are processed, which avoids wasted time if file.txt is a 100,000-line file.
Confirmation
file.txt is now renamed 42.txt, e.g.
$ cat 42.txt
ligand CC##HOc3ccccc3 42 P10000001
Using read
You can also use read to simply read the first line, take the 3rd word as the name, and then mv the file, e.g.
$ read -r a a name a <file.txt; mv file.txt "$name".txt
The temporary variable a above is just used to read and discard the other words in the first line of the file.
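Since the question mentions several autogenerated files, here is a minimal loop sketch that renames all of them, assuming they all match *.txt, share the same first-line layout, and yield distinct 3rd words that are safe as file names:

for f in *.txt; do
    read -r a a name a < "$f"        # grab the 3rd whitespace-separated word of line 1
    [ -n "$name" ] && mv -- "$f" "$name".txt
done

The [ -n "$name" ] guard skips any file whose first line has fewer than three words.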

Shell capture substring which is a line below searched substring

I am searching through a text file for a particular string and am looking for a number which is on the line below this string. The example below makes it clearer.
This is the content of the text file
2017-08-14 14:04:53,836 INFO - XML File FILE1 is created in /path/to/file
2017-08-14 14:10:04,696 INFO - #Instances Extracted: 32960
2017-08-14 14:17:52,248 INFO - XML File FILE2 is created in /path/to/file
2017-08-14 14:41:33,720 INFO - #Instances Extracted: 119534
In the text file I want to search for the string FILE1 and capture the number on the line below it, 32960.
What is the best method for this? I was considering searching for FILE1 and then searching for the first instance after it of "Instances Extracted" and capturing the number after that - is this the best solution?
Many thanks for any help you can provide.
You can use awk without getline():
awk 'p==1 {p=0; print $NF } /FILE1/ {p=1}' inputfile
The flag p is set on the line that matches FILE1; on the following line the flag is cleared and the last field (the number) is printed.
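With the sample log above saved as inputfile, this prints:

$ awk 'p==1 {p=0; print $NF } /FILE1/ {p=1}' inputfile
32960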
A quick and dirty solution:
grep -A1 FILE1 file.txt | sed "s/.*Instances Extracted: \([0-9]*\).*/\1/;tx;d;:x"
The grep -A1 pulls each matching line and the one after it into stdout, and the sed then extracts the number (and deletes any lines that don't match, thanks to the ;tx;d;:x at the end).

How to split a file into two files based on a pattern?

In a file in Linux I have the following
123_test
234_test
abc_rest
cde_rest
and so on
Now I want to get two files in Linux. One which contains only records like below
123_test
234_test
and 2nd file like below
abc_rest
cde_rest
I want to split the file based on what comes after the _, i.e. _test or _rest.
Edited:
123_test
234_test
abc_rest
cde_rest
456_test
fgh_rest
How can I achieve that in Linux?
Can we use the split function for this?
The split command divides a file by size or line count, not by content, so awk is the better fit here. You can use this single awk command for splitting:
awk '{ print > (/_test$/ ? "file1" : "file2") }' file
This awk command copies all lines ending in _test to file1 and the remaining lines to file2.
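With the edited sample above saved as file, a quick check of the result:

$ awk '{ print > (/_test$/ ? "file1" : "file2") }' file
$ cat file1
123_test
234_test
456_test
$ cat file2
abc_rest
cde_rest
fgh_rest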

Linux: remove duplicate lines

I have a txt file. I would like to remove all duplicate lines.
I tried these, but they did not work:
sort -ur file.txt
or
uniq -D -f 2 file.txt
file.txt
34.78.54.21 websrv1 nameweb
34.78.54.21 nameweb
I just need one line
From your input I assume you are referring to the first field (34.78.54.21) as the duplicate. If you just want to keep the first occurrence of each address then this works for you:
awk '!a[$1]++' file.txt
Output:
34.78.54.21 websrv1 nameweb
This command checks whether $1 already exists as a key in the array. If it does not, it is added to the array and the default action (print) happens. For any later line with the same $1, the expression evaluates to false and the line is not printed.
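If you instead want to remove only lines that are fully identical, the same idiom works on the whole line ($0) rather than the first field, and unlike sort -u it keeps the original line order:

awk '!a[$0]++' file.txt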

Generate a record of lines which have been removed by grep as a secondary function of the primary command

I asked a question here to remove unwanted lines which contained strings which matched a particular pattern:
Remove lines containing string followed by x number of numbers
anubhava provided a good line of code which met my needs perfectly. This code removes any line which contains the string vol followed by a space and three or more consecutive numbers:
grep -Ev '\bvol([[:blank:]]+[[:digit:]]+){2}' file > newfile
The command will be used on a fairly large csv file and will be initiated by crontab. For this reason, I would like to keep a record of the lines this command is removing, just so I can go back and check that the correct data is being removed - I guess it will be some sort of log containing the lines that did not make the final cut. How can I add this functionality?
Drop grep and use awk instead:
awk '/\<vol([[:blank:]]+[[:digit:]]+){2}/{print >> "deleted"; next} 1' file
The above uses GNU awk for word delimiters (\<) and will append every deleted line to a file named "deleted". Consider adding a timestamp too:
awk '/\<vol([[:blank:]]+[[:digit:]]+){2}/{print systime(), $0 >> "deleted"; next} 1' file
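To mirror the original grep command, which wrote the kept lines to newfile, the same redirection can be tacked on (shown here with the timestamped variant):

awk '/\<vol([[:blank:]]+[[:digit:]]+){2}/{print systime(), $0 >> "deleted"; next} 1' file > newfile

The trailing 1 is awk shorthand for "print the current line", so every line that is not diverted to "deleted" ends up in newfile, exactly as with grep -Ev.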
