Ignore asterix(*) while doing diff in Linux - linux

I am trying to use the standard diff command in Linux inorder to find differences in 2 files.The contents of the file are as follows:
File1
Jim
Jack
Tracy*
Michelle
File2
Jim
Jack
Tracy
Michael
diff File1 File2 gives me the following :
< Tracy*
< Michelle
---
> Tracy
> Michael
However,I want diff to ignore the asterix(*) and give the following output :
< Michelle
---
> Michael
Is it possible to do that ?

Try
diff -I '*$' FILE1 FILE2
-I RE --ignore-matching-lines=RE
Ignore changes whose lines all match RE
Note: this only works with line ending asterisks.

Using ShinTakezou's approach, but this time using sed:
diff <(sed 's/\*$//' file1) <(sed 's/\*$//' file2)

If you use a diff that has not the -I option you can grep away lines containing stars into temp files and then diff the temp files. If you are using bash you can use "two pipes", but if you have it likely you have the diff with the -I option too. Anyway it would be
sed 's/*$//' file1 >file1.temp
sed 's/*$//' file2 >file2.temp
diff file1.temp file2.temp
or
diff <(sed 's/*$//' file1) <(sed 's/*$//' file2)
(not tested, but it could work in other shells too)
note
the "star" is removed and from the diff point of view it has never existed.

Related

Append a file in the middle of another file in bash

I need to append a file in a specific location of another file.
I got the line number so, my file is:
file1.txt:
I
am
Cookie
While the second one is
file2.txt:
a
black
dog
named
So, after the solution, file1.txt should be like
I
am
a
black
dog
named
Cookie
The solution should be compatible with the presence of characters like " and / in both files.
Any tool is ok as long as it's native (I mean, no new software installation).
Another option apart from what RavinderSingh13 suggested using sed:
To add the text of file2.txt into file1.txt after a specific line:
sed -i '2 r file2.txt' file1.txt
Output:
I
am
a
black
dog
named
Cookie
Further to add the file after a matched pattern:
sed -i '/^YourPattern/ r file2.txt' file1.txt
Could you please try following and let me know if this helps you.
awk 'FNR==3{system("cat file2.txt")} 1' file1.txt
Output will be as follows.
I
am
a
black
dog
named
Cookie
Explanation: Checking here if line number is 3 while reading Input_file named file1.txt, if yes then using system utility of awk which will help us to call shell's commands, then I am printing the file2.txt with use of cat command. Then mentioning 1 will be printing all the lines from file1.txt. Thus we could concatenate lines from file2.txt into file1.txt.
How about
head -2 file1 && cat file2 && tail -1 file1
You can count the number of lines to decide head and tail parameters in file1 using
wc -l file1

Using Sed to extract the headers in multiple files

I used head -3 to extract headers from some files that I needed to show header data I did this:
head -3 file1 file2 file3
and head -3 * works also.
I thought sed 3 file1 file2 file3 would work but it only gives the first file's output and not the others. I then tried sed -n '1,2p' file1 file2 file3. Again only the first file produced any output. I also tried with a wildcard sed -n '1,2p' filename* same result only the first file's output.
Everything I read seems like it should work. sed *filesnames*.
Thanks in advance
Assuming GNU sed as question is tagged linux. From GNU sed manual
-s
--separate By default, sed will consider the files specified on the command line as a single continuous long stream. This GNU sed
extension allows the user to consider them as separate files: range
addresses (such as ‘/abc/,/def/’) are not allowed to span several
files, line numbers are relative to the start of each file, $ refers
to the last line of each file, and files invoked from the R commands
are rewound at the start of each file.
Example:
$ cat file1
foo
bar
$ cat file2
123
456
$ sed -n '1p' file1 file2
foo
$ sed -n '3p' file1 file2
123
$ sed -sn '1p' file1 file2
foo
123
When using -i, the -s option is implied
$ sed -i '1chello' file1 file2
$ cat file1
hello
bar
$ cat file2
hello
456

shell script to compare two files and write the difference to third file

I want to compare two files and redirect the difference between the two files to third one.
file1:
/opt/a/a.sql
/opt/b/b.sql
/opt/c/c.sql
In case any file has # before /opt/c/c.sql, it should skip #
file2:
/opt/c/c.sql
/opt/a/a.sql
I want to get the difference between the two files. In this case, /opt/b/b.sql should be stored in a different file. Can anyone help me to achieve the above scenarios?
file1
$ cat file1 #both file1 and file2 may contain spaces which are ignored
/opt/a/a.sql
/opt/b/b.sql
/opt/c/c.sql
/opt/h/m.sql
file2
$ cat file2
/opt/c/c.sql
/opt/a/a.sql
Do
awk 'NR==FNR{line[$1];next}
{if(!($1 in line)){if($0!=""){print}}}
' file2 file1 > file3
file3
$ cat file3
/opt/b/b.sql
/opt/h/m.sql
Notes:
The order of files passed to awk is important here, pass the file to check - file2 here - first followed by the master file -file1.
Check awk documentation to understand what is done here.
You can use some tools like cat, sed, sort and uniq.
The main observation is this: if the line is in both files then it is not unique in cat file1 file2.
Furthermore in cat file1 file2| sort, all doubles are in sequence. Using uniq -u we get unique lines and have this pipe:
cat file1 file2 | sort | uniq -u
Using sed to remove leading whitespace, empty and comment lines, we get this final pipe:
cat file1 file2 | sed -r 's/^[ \t]+//; /^#/ d; /^$/ d;' | sort | uniq -u > file3

Awk to print each file

I have about 50 files in a directory
Have
File1: 1|2|3
File2: 3|4|5
File3: A|B|C
WANT
File1: A|1|2|3
File2: A|3|4|5
File3: A|A|B|C
I'll appreciate if anyone can solve it with awk command. I'm open to other solutions in linux. Also, I want to run it once an perform edits on all files in a directory.
The solution (see below) I have will require me to run it on each file one at a time and I don't think that's efficient
awk '{print "A|"$0}' File1
Try the below sed command,
sed -i 's/^/A|/' file1 file2 file3
To make it work on all the files in the current directory,
sed -i 's/^/A|/' *
With GNU awk for -i inplace:
gawk -i inplace '{print "A|"$0}' file1 file2 file3

Comparing two files in linux terminal

There are two files called "a.txt" and "b.txt" both have a list of words. Now I want to check which words are extra in "a.txt" and are not in "b.txt".
I need a efficient algorithm as I need to compare two dictionaries.
if you have vim installed,try this:
vimdiff file1 file2
or
vim -d file1 file2
you will find it fantastic.
Sort them and use comm:
comm -23 <(sort a.txt) <(sort b.txt)
comm compares (sorted) input files and by default outputs three columns: lines that are unique to a, lines that are unique to b, and lines that are present in both. By specifying -1, -2 and/or -3 you can suppress the corresponding output. Therefore comm -23 a b lists only the entries that are unique to a. I use the <(...) syntax to sort the files on the fly, if they are already sorted you don't need this.
If you prefer the diff output style from git diff, you can use it with the --no-index flag to compare files not in a git repository:
git diff --no-index a.txt b.txt
Using a couple of files with around 200k file name strings in each, I benchmarked (with the built-in timecommand) this approach vs some of the other answers here:
git diff --no-index a.txt b.txt
# ~1.2s
comm -23 <(sort a.txt) <(sort b.txt)
# ~0.2s
diff a.txt b.txt
# ~2.6s
sdiff a.txt b.txt
# ~2.7s
vimdiff a.txt b.txt
# ~3.2s
comm seems to be the fastest by far, while git diff --no-index appears to be the fastest approach for diff-style output.
Update 2018-03-25 You can actually omit the --no-index flag unless you are inside a git repository and want to compare untracked files within that repository. From the man pages:
This form is to compare the given two paths on the filesystem. You can omit the --no-index option when running the command in a working tree controlled by Git and at least one of the paths points outside the working tree, or when running the command outside a working tree controlled by Git.
Try sdiff (man sdiff)
sdiff -s file1 file2
You can use diff tool in linux to compare two files. You can use --changed-group-format and --unchanged-group-format options to filter required data.
Following three options can use to select the relevant group for each option:
'%<' get lines from FILE1
'%>' get lines from FILE2
'' (empty string) for removing lines from both files.
E.g: diff --changed-group-format="%<" --unchanged-group-format="" file1.txt file2.txt
[root#vmoracle11 tmp]# cat file1.txt
test one
test two
test three
test four
test eight
[root#vmoracle11 tmp]# cat file2.txt
test one
test three
test nine
[root#vmoracle11 tmp]# diff --changed-group-format='%<' --unchanged-group-format='' file1.txt file2.txt
test two
test four
test eight
You can also use: colordiff: Displays the output of diff with colors.
About vimdiff: It allows you to compare files via SSH, for example :
vimdiff /var/log/secure scp://192.168.1.25/var/log/secure
Extracted from: http://www.sysadmit.com/2016/05/linux-diferencias-entre-dos-archivos.html
Also, do not forget about mcdiff - Internal diff viewer of GNU Midnight Commander.
For example:
mcdiff file1 file2
Enjoy!
Use comm -13 (requires sorted files):
$ cat file1
one
two
three
$ cat file2
one
two
three
four
$ comm -13 <(sort file1) <(sort file2)
four
You can also use:
sdiff file1 file2
To display differences side by side within your terminal!
diff a.txt b.txt | grep '<'
can then pipe to cut for a clean output
diff a.txt b.txt | grep '<' | cut -c 3
Here is my solution for this :
mkdir temp
mkdir results
cp /usr/share/dict/american-english ~/temp/american-english-dictionary
cp /usr/share/dict/british-english ~/temp/british-english-dictionary
cat ~/temp/american-english-dictionary | wc -l > ~/results/count-american-english-dictionary
cat ~/temp/british-english-dictionary | wc -l > ~/results/count-british-english-dictionary
grep -Fxf ~/temp/american-english-dictionary ~/temp/british-english-dictionary > ~/results/common-english
grep -Fxvf ~/results/common-english ~/temp/american-english-dictionary > ~/results/unique-american-english
grep -Fxvf ~/results/common-english ~/temp/british-english-dictionary > ~/results/unique-british-english
Using awk for it. Test files:
$ cat a.txt
one
two
three
four
four
$ cat b.txt
three
two
one
The awk:
$ awk '
NR==FNR { # process b.txt or the first file
seen[$0] # hash words to hash seen
next # next word in b.txt
} # process a.txt or all files after the first
!($0 in seen)' b.txt a.txt # if word is not hashed to seen, output it
Duplicates are outputed:
four
four
To avoid duplicates, add each newly met word in a.txt to seen hash:
$ awk '
NR==FNR {
seen[$0]
next
}
!($0 in seen) { # if word is not hashed to seen
seen[$0] # hash unseen a.txt words to seen to avoid duplicates
print # and output it
}' b.txt a.txt
Output:
four
If the word lists are comma-separated, like:
$ cat a.txt
four,four,three,three,two,one
five,six
$ cat b.txt
one,two,three
you have to do a couple of extra laps (forloops):
awk -F, ' # comma-separated input
NR==FNR {
for(i=1;i<=NF;i++) # loop all comma-separated fields
seen[$i]
next
}
{
for(i=1;i<=NF;i++)
if(!($i in seen)) {
seen[$i] # this time we buffer output (below):
buffer=buffer (buffer==""?"":",") $i
}
if(buffer!="") { # output unempty buffers after each record in a.txt
print buffer
buffer=""
}
}' b.txt a.txt
Output this time:
four
five,six

Resources