Replace first few lines with first few lines from other file - linux

I am working on Linux. I have 2 files - file1.dat and file2.dat.
cat file1.dat
1
2
3
4
5
6
7
8
9
10
and for file2:
cat file2.dat
1a
2a
3a
4a
5a
6a
7a
8a
9a
10a
I want to replace the first 4 lines of file1.dat with the first 3 lines of file2.dat, so my output would be the following:
cat file1.dat
1a
2a
3a
5
6
7
8
9
10
I tried the following command:
sed -i.bak '1,4d;3r file2.dat' file1.dat
But with this command I get the following output:
5
6
7
8
9
10
How should I modify the command? I tried various combinations.

The following awk solutions may also help here; all are tested with GNU awk.
Solution 1:
awk 'FNR==NR && FNR<4{print;next} FNR>4 && FNR!=NR' file2.dat file1.dat
Solution 2:
awk 'FNR==NR && FNR==4{nextfile} FNR==NR{print;next} FNR>4 && FNR!=NR' file2.dat file1.dat
OR
awk 'FNR==NR{if(FNR==4){nextfile};print;next} FNR>4 && FNR!=NR' file2.dat file1.dat
Solution 3: Using awk together with the head and tail commands.
awk 'FNR==1{system("head -n3 file2.dat");next} 1' <(tail -n +4 file1.dat)
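To check solution 1 against the question's data, here is a small self-contained run (seq and a scratch directory are used only to rebuild the sample files):

```shell
# Recreate the sample files from the question in a scratch dir
cd "$(mktemp -d)"
seq 1 10 > file1.dat
seq 1 10 | sed 's/$/a/' > file2.dat

# First 3 lines of file2.dat, then lines 5-10 of file1.dat
awk 'FNR==NR && FNR<4{print;next} FNR>4 && FNR!=NR' file2.dat file1.dat
```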

Assuming GNU sed
$ sed '3q' f2 | sed -e '3r /dev/stdin' -e '1,4d' f1
1a
2a
3a
5
6
7
8
9
10
sed '3q' f2 gives the first three lines of the second file
-e '3r /dev/stdin' appends the stdin data after line 3
-e '1,4d' deletes the required lines
The order is important: first r, then d, because d ends the cycle immediately and any commands after it are skipped
For a small number of lines, you can also use
sed -e '3R f2' -e '3R f2' -e '3R f2' -e '1,4d' f1
The R command reads one line of its file per invocation, so the three 3R commands append the first three lines of f2 after line 3
With GNU coreutils, this would probably be better for most scenarios:
head -n3 f2; tail -n +5 f1
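Since redirecting straight to f1 would truncate it before tail reads it, writing the result back needs a temporary file. A minimal sketch (file names as in the answer, rebuilt here with seq):

```shell
cd "$(mktemp -d)"
seq 1 10 > f1                  # 1..10
seq 1 10 | sed 's/$/a/' > f2   # 1a..10a

# head takes the first 3 lines of f2, tail takes f1 from line 5 on;
# the group's combined output replaces f1 via a temp file
{ head -n3 f2; tail -n +5 f1; } > f1.tmp && mv f1.tmp f1
cat f1
```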

awk is your friend
Script
$ awk 'NR==FNR && FNR<=3 || NR>FNR && FNR>4' file2 file1
Output
1a
2a
3a
5
6
7
8
9
10
Tips
NR - total number of records processed so far, across all files
FNR - like NR, but resets to 1 each time a new file is read
When a condition evaluates to true and no action is given, awk just prints the line.
All good :-)
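The NR/FNR distinction is easy to see with two tiny throwaway files:

```shell
cd "$(mktemp -d)"
printf 'a\nb\n' > one.txt
printf 'c\nd\n' > two.txt

# NR keeps counting across files; FNR restarts at 1 for each new file
awk '{print FILENAME, NR, FNR}' one.txt two.txt
```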

Related

Changing contents of files through shell script

I have a requirement where I need to change the contents of a file say file.xyt. The file contains values like:
21 100 34 82
122 50 75 12
88 10 15 45
I need to check whether the fourth field on every line (82, 12, and 45 in this example) is less than 23, and if so, delete that line.
For this example, the result will be:
21 100 34 82
88 10 15 45
How can I achieve this using a shell script? Thanks in advance.
You can use awk:
awk '$4 >= 23 {print}' file
which can be shortened to (thanks @RomanPerekhrest):
awk '$4 >= 23' file
If you want to write the file in place, you can use a temporary file:
awk '$4 >= 23' file > tmp && mv tmp file
In case you have gawk 4.1.0 or later, you can use -i inplace to edit the file in place:
gawk -i inplace '$4 >= 23' file
Or using a Bash loop:
while read -r a b c d; do
    [[ $d -ge 23 ]] && echo "$a $b $c $d"
done < file
If each line of the file contains exactly 4 numbers, you may modify the file in place with the following sed approach, which deletes every line whose last number is between 0 and 22:
sed -Ei '/\<(1{,1}[0-9]|2[0-2])$/d' file.xyt
file.xyt contents:
21 100 34 82
88 10 15 45
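Putting the temp-file variant together with the question's sample data (a sketch; tmp is just a scratch name):

```shell
cd "$(mktemp -d)"
printf '21 100 34 82\n122 50 75 12\n88 10 15 45\n' > file.xyt

# Keep only lines whose 4th field is at least 23, then move the result back
awk '$4 >= 23' file.xyt > tmp && mv tmp file.xyt
cat file.xyt
```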

Insert a space after the second character followed by every three characters

I need to insert a space after the first two characters, then a space after every three characters.
Data:
97100101101102101
Expected Output:
97 100 101 101 102 101
Attempted Code:
sed 's/.\{2\}/& /3g'
In two steps:
$ sed -r -e 's/^.{2}/& /' -e 's/[^ ]{3}/& /g' <<< 97100101101102101
97 100 101 101 102 101
That is:
's/^.{2}/& /'
catch the first two chars in the line and print them back with a space after.
's/[^ ]{3}/& /g'
catch three consecutive non-space characters and print them back followed by a space.
With GNU awk:
$ echo '97100101101102101' | awk '{print substr($0,1,2) gensub(/.{3}/," &","g",substr($0,3))}'
97 100 101 101 102 101
Note that unlike the currently accepted sed solution this will not add a blank char to the end of the line, e.g. using _ instead of a blank to make the issue visible:
$ echo '97100101101102101' | sed -r -e 's/^.{2}/&_/' -e 's/[^_]{3}/&_/g'
97_100_101_101_102_101_
$ echo '97100101101102101' | awk '{print substr($0,1,2) gensub(/.{3}/,"_&","g",substr($0,3))}'
97_100_101_101_102_101
and it would work even if the input contained blank chars:
$ echo '971 0101101102101' | sed -r -e 's/^.{2}/& /' -e 's/[^ ]{3}/& /g'
97 1 010 110 110 210 1
$ echo '971 0101101102101' | awk '{print substr($0,1,2) gensub(/.{3}/," &","g",substr($0,3))}'
97 1 0 101 101 102 101

Compare two files and write the unmatched numbers in a new file

I have two files where ifile1.txt is a subset of ifile2.txt.
ifile1.txt:
2
23
43
51
76
81
100
ifile2.txt:
2
23
33
43
50
51
72
76
81
89
100
Desired output (ofile.txt):
33
50
72
89
I was trying with
diff ifile1.txt ifile2.txt > ofile.txt
but it gives the output in a different format.
Since your files are sorted, you can use the comm command for this:
comm -1 -3 ifile1.txt ifile2.txt > ofile.txt
-1 means omit the lines unique to the first file, and -3 means omit the lines that are in both files, so this shows just the lines that are unique to the second file.
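One caveat: comm compares lines lexicographically, and these numerically sorted files are not in lexicographic order (100 sorts before 81 as text), so comm may warn about unsorted input. A grep-based alternative that needs no sorting at all, sketched with the question's data:

```shell
cd "$(mktemp -d)"
printf '2\n23\n43\n51\n76\n81\n100\n' > ifile1.txt
printf '2\n23\n33\n43\n50\n51\n72\n76\n81\n89\n100\n' > ifile2.txt

# -v invert the match, -x match whole lines, -F fixed strings,
# -f read patterns from a file: keep lines of ifile2.txt absent from ifile1.txt
grep -vxFf ifile1.txt ifile2.txt
```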
This will also do the job, though it prints an empty line for every diff hunk header:
diff file1 file2 | awk '{print $2}'
You could try:
diff file1 file2 | awk '{print $2}' | grep -v '^$' > output.file
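The blank lines come from diff's hunk headers (2a3 and the like), whose second field is empty. Printing only the lines diff marks as added (those starting with >) avoids the extra grep; a sketch with the question's data:

```shell
cd "$(mktemp -d)"
printf '2\n23\n43\n51\n76\n81\n100\n' > ifile1.txt
printf '2\n23\n33\n43\n50\n51\n72\n76\n81\n89\n100\n' > ifile2.txt

# Lines present only in ifile2.txt are shown by diff with a leading ">"
diff ifile1.txt ifile2.txt | awk '/^>/{print $2}'
```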

keep groups of lines with specific keywords (bash)

I have a text file with plenty of lines in this format (the lines between every two #s are defined as a group):
# some str for test
hdfv 12 9 b
cgj 5 11 t
# another string to examine
kinj 58 96 f
dfg 7 26 u
fds 9 76 j
---
key.txt:
string to
---
output:
# another string to examine
kinj 58 96 f
dfg 7 26 u
fds 9 76 j
I need to search for some keywords (string, to) in the lines that start with #; if the keywords do not exist in key.txt (a file with two columns), I should remove that line and the following lines of its group. I've written this code without any result (the keywords appear together in the input file, as in the example):
cat input.txt | while IFS=$'#' read -r -a myarray
do
    a=${myarray[1]}
    b=${myarray[0]}
    unset IFS
    read -r a x y z <<< "$a"
    key=$(echo "$x $y")
    if grep "$key" key.txt > /dev/null
    then
        echo $key exists
    else
        grep -v -e "$a" -e "$b" input.txt > $$ && mv $$ input.txt
    fi
done
Can someone help me?
A simple way to get the correct block is to use awk with the right record separator (RS):
awk 'FNR==NR {a[$0];next} { RS="#";for (i in a) if ($0~i) print}' key.txt input.txt
another string to examine
kinj 58 96 f
dfg 7 26 u
fds 9 76 j
This should reinsert the # that was consumed and remove the extra empty line. There may be simpler ways to do this, but this works.
awk 'FNR==NR {a[$0];next} { RS="#";for (i in a) if ($0~i) {sub(/^ /,RS);sub(/\n$/,x);print}}' key.txt input.txt
#another string to examine
kinj 58 96 f
dfg 7 26 u
fds 9 76 j

AWK--Comparing the value of two variables in two different files

I have two text files, A.txt and B.txt. Each line of A.txt contains a single number.
A.txt
100
222
398
B.txt
1 2 103 2
4 5 1026 74
7 8 209 55
10 11 122 78
What I am looking for is something like this:
for each line of A
search B;
if (the value of third column in a line of B - the value of the variable in A > 10)
print that line of B;
Any awk for doing that?
How about something like this?
I had some trouble understanding your question, but maybe this will give you some pointers.
#!/bin/bash
# Read interesting values from file2 into an array,
for line in $(cat 2.txt | awk '{print $3}')
do
    arr+=($line)
done
# Line counter,
linenr=0
# Loop through every line in file 1,
for val in $(cat 1.txt)
do
    # Increment line counter,
    ((linenr++))
    # Loop through every element in the array (containing values from the 3rd column of file2)
    for el in "${!arr[@]}";
    do
        # If that value minus the value from file 1 is bigger than 10, print the line
        if [[ $((${arr[$el]} - $val )) -gt 10 ]]
        then
            sed -n "$(($el+1))p" 2.txt
            # echo "Value ${arr[$el]} (on line $(($el+1)) from 2.txt) - $val (on line $linenr from 1.txt) equals $((${arr[$el]} - $val )) and is hence bigger than 10"
        fi
    done
done
Note, this is a quick and dirty thing; there is room for improvement, but I think it'll do the job.
Use awk like this:
cat f1
1
4
9
16
cat f2
2 4 10 8
3 9 20 8
5 1 15 8
7 0 30 8
awk 'FNR==NR{a[NR]=$1;next} $3-a[FNR] < 10' f1 f2
2 4 10 8
5 1 15 8
UPDATE: Based on OP's edited question:
awk 'FNR==NR{a[NR]=$1;next} {for (i in a) if ($3-a[i] > 10) print}' f1 f2
and see how simple the awk-based solution is compared to the nested for loops.
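Note that print fires once for every value of A that satisfies the test, so a line of B can be printed several times (1026 beats all three A values); adding next after the print keeps each line at most once. A runnable sketch of that variant against the question's sample data:

```shell
cd "$(mktemp -d)"
printf '100\n222\n398\n' > A.txt
printf '1 2 103 2\n4 5 1026 74\n7 8 209 55\n10 11 122 78\n' > B.txt

# Print a line of B once if its 3rd column exceeds any A value by more than 10
awk 'FNR==NR{a[NR]=$1; next} {for (i in a) if ($3-a[i] > 10) {print; next}}' A.txt B.txt
```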
