Parsing result of Diff in Shell Script

Parsing result of Diff in Shell Script - linux

I want to compare two files and see if they are the same or not in my shell script, my way is:
diff_output=`diff ${dest_file} ${source_file}`
if [ some_other_condition -o ${diff_output} -o some_other_condition2 ]
then
....
fi
Basically, if they are the same ${diff_output} should contain nothing and the above test would evaluate to true.
But when I run my script, it says
[: too many arguments
On the if [....] line.
Any ideas?

Do you care about what the actual differences are, or just whether the files are different? If it's the latter you don't need to parse the output; you can check the exit code instead.
if diff -q "$source_file" "$dest_file" > /dev/null; then
: # files are the same
else
: # files are different
fi
Or use cmp which is more efficient:
if cmp -s "$source_file" "$dest_file"; then
: # files are the same
else
: # files are different
fi

There's an option provided precisely for doing this: -q (or --quiet). It tells diff to just let the exit code indicate whether the files were identical. That way you can do this:
if diff -q "$dest_file" "$source_file"; then
# files are identical
else
# files differ
fi
or if you want to swap the two clauses:
if ! diff -q "$dest_file" "$source_file"; then
# files differ
else
# files are identical
fi
If you really wanted to do it your way (i.e. you need the output) you should do this:
if [ -n "$diff_output" -o ... ]; then
...
fi
-n means "test if the following string is non-empty. You also have to surround it with quotes, so that if it's empty, the test still has a string there - you're getting your error because your test evaluates to some_other_condition -o -o some_other_condition2, which as you can see isn't going to work.

diff is for comparing files line by line for processing the differences tool like patch . If you just want to check if they are different, you should use cmp:
cmp --quiet $source_file $dest_file || echo Different

diff $FILE $FILE2
if [ $? = 0 ]; then
echo “TWO FILES ARE SAME”
else
echo “TWO FILES ARE SOMEWHAT DIFFERENT”
fi

Check files for difference in bash
source_file=abc.txt
dest_file=xyz.txt
if [[ -z $(sdiff -s $dest_file $source_file) ]]; then
echo "Files are same"
else
echo "Files are different"
fi
Code tested on RHEL/CentOS Linux (6.X and 7.X)

Related

How can I test if a file could be marked as executable and run? [duplicate]

I am wondering what's the easiest way to check if a program is executable with bash, without executing it ? It should at least check whether the file has execute rights, and is of the same architecture (for example, not a windows executable or another unsupported architecture, not 64 bits if the system is 32 bits, ...) as the current system.

Take a look at the various test operators (this is for the test command itself, but the built-in BASH and TCSH tests are more or less the same).
You'll notice that -x FILE says FILE exists and execute (or search) permission is granted.
BASH, Bourne, Ksh, Zsh Script
if [[ -x "$file" ]]
then
echo "File '$file' is executable"
else
echo "File '$file' is not executable or found"
fi
TCSH or CSH Script:
if ( -x "$file" ) then
echo "File '$file' is executable"
else
echo "File '$file' is not executable or found"
endif
To determine the type of file it is, try the file command. You can parse the output to see exactly what type of file it is. Word 'o Warning: Sometimes file will return more than one line. Here's what happens on my Mac:
$ file /bin/ls
/bin/ls: Mach-O universal binary with 2 architectures
/bin/ls (for architecture x86_64): Mach-O 64-bit executable x86_64
/bin/ls (for architecture i386): Mach-O executable i386
The file command returns different output depending upon the OS. However, the word executable will be in executable programs, and usually the architecture will appear too.
Compare the above to what I get on my Linux box:
$ file /bin/ls
/bin/ls: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), stripped
And a Solaris box:
$ file /bin/ls
/bin/ls: ELF 32-bit MSB executable SPARC Version 1, dynamically linked, stripped
In all three, you'll see the word executable and the architecture (x86-64, i386, or SPARC with 32-bit).
Addendum
Thank you very much, that seems the way to go. Before I mark this as my answer, can you please guide me as to what kind of script shell check I would have to perform (ie, what kind of parsing) on 'file' in order to check whether I can execute a program ? If such a test is too difficult to make on a general basis, I would at least like to check whether it's a linux executable or osX (Mach-O)
Off the top of my head, you could do something like this in BASH:
if [ -x "$file" ] && file "$file" | grep -q "Mach-O"
then
echo "This is an executable Mac file"
elif [ -x "$file" ] && file "$file" | grep -q "GNU/Linux"
then
echo "This is an executable Linux File"
elif [ -x "$file" ] && file "$file" | grep q "shell script"
then
echo "This is an executable Shell Script"
elif [ -x "$file" ]
then
echo "This file is merely marked executable, but what type is a mystery"
else
echo "This file isn't even marked as being executable"
fi
Basically, I'm running the test, then if that is successful, I do a grep on the output of the file command. The grep -q means don't print any output, but use the exit code of grep to see if I found the string. If your system doesn't take grep -q, you can try grep "regex" > /dev/null 2>&1.
Again, the output of the file command may vary from system to system, so you'll have to verify that these will work on your system. Also, I'm checking the executable bit. If a file is a binary executable, but the executable bit isn't on, I'll say it's not executable. This may not be what you want.

Seems nobody noticed that -x operator does not differ file with directory.
So to precisely check an executable file, you may use
[[ -f SomeFile && -x SomeFile ]]

Testing files, directories and symlinks
The solutions given here fail on either directories or symlinks (or both). On Linux, you can test files, directories and symlinks with:
if [[ -f "$file" && -x $(realpath "$file") ]]; then .... fi
On OS X, you should be able to install coreutils with homebrew and use grealpath.
Defining an isexec function
You can define a function for convenience:
isexec() {
if [[ -f "$1" && -x $(realpath "$1") ]]; then
true;
else
false;
fi;
}
Or simply
isexec() { [[ -f "$1" && -x $(realpath "$1") ]]; }
Then you can test using:
if `isexec "$file"`; then ... fi

Also seems nobody noticed -x operator on symlinks. A symlink (chain) to a regular file (not classified as executable) fails the test.

First you need to remember that in Unix and Linux, everything is a file, even directories. For a file to have the rights to be executed as a command, it needs to satisfy 3 conditions:
It needs to be a regular file
It needs to have read-permissions
It needs to have execute-permissions
So this can be done simply with:
[ -f "${file}" ] && [ -r "${file}" ] && [ -x "${file}" ]
If your file is a symbolic link to a regular file, the test command will operate on the target and not the link-name. So the above command distinguishes if a file can be used as a command or not. So there is no need to pass the file first to realpath or readlink or any of those variants.
If the file can be executed on the current OS, that is a different question. Some answers above already pointed to some possibilities for that, so there is no need to repeat it here.

To test whether a file itself has ACL_EXECUTE bit set in any of permission sets (user, group, others) regardless of where it resides, i. e. even on a tmpfs with noexec option, use stat -c '%A' to get the permission string and then check if it contains at least a single “x” letter:
if [[ "$(stat -c '%A' 'my_exec_file')" == *'x'* ]] ; then
echo 'Has executable permission for someone'
fi
The right-hand part of comparison may be modified to fit more specific cases, such as *x*x*x* to check whether all kinds of users should be able to execute the file when it is placed on a volume mounted with exec option.

This might be not so obvious, but sometime is required to test the executable to appropriately call it without an external shell process:
function tkl_is_file_os_exec()
{
[[ ! -x "$1" ]] && return 255
local exec_header_bytes
case "$OSTYPE" in
cygwin* | msys* | mingw*)
# CAUTION:
# The bash version 3.2+ might require a file path together with the extension,
# otherwise will throw the error: `bash: ...: No such file or directory`.
# So we make a guess to avoid the error.
#
{
read -r -n 4 exec_header_bytes 2> /dev/null < "$1" ||
{
[[ -x "${1%.exe}.exe" ]] && read -r -n 4 exec_header_bytes 2> /dev/null < "${1%.exe}.exe"
} ||
{
[[ -x "${1%.com}.com" ]] && read -r -n 4 exec_header_bytes 2> /dev/null < "${1%.com}.com"
}
} &&
if [[ "${exec_header_bytes:0:3}" == $'MZ\x90' ]]; then
# $'MZ\x90\00' for bash version 3.2.42+
# $'MZ\x90\03' for bash version 4.0+
[[ "${exec_header_bytes:3:1}" == $'\x00' || "${exec_header_bytes:3:1}" == $'\x03' ]] && return 0
fi
;;
*)
read -r -n 4 exec_header_bytes < "$1"
[[ "$exec_header_bytes" == $'\x7fELF' ]] && return 0
;;
esac
return 1
}
# executes script in the shell process in case of a shell script, otherwise executes as usual
function tkl_exec_inproc()
{
if tkl_is_file_os_exec "$1"; then
"$#"
else
. "$#"
fi
return $?
}
myscript.sh:
#!/bin/bash
echo 123
return 123
In Cygwin:
> tkl_exec_inproc /cygdrive/c/Windows/system32/cmd.exe /c 'echo 123'
123
> tkl_exec_inproc /cygdrive/c/Windows/system32/chcp.com 65001
Active code page: 65001
> tkl_exec_inproc ./myscript.sh
123
> echo $?
123
In Linux:
> tkl_exec_inproc /bin/bash -c 'echo 123'
123
> tkl_exec_inproc ./myscript.sh
123
> echo $?
123

How to extract only file name return from diff command?

I am trying to prepare a bash script for sync 2 directories. But I am not able to file name return from diff. everytime it converts to array.
Here is my code :
#!/bin/bash
DIRS1=`diff -r /opt/lampp/htdocs/scripts/dev/ /opt/lampp/htdocs/scripts/www/ `
for DIR in $DIRS1
do
echo $DIR
done
And if I run this script I get out put something like this :
Only
in
/opt/lampp/htdocs/scripts/www/:
file1
diff
-r
"/opt/lampp/htdocs/scripts/dev/File
1.txt"
"/opt/lampp/htdocs/scripts/www/File
1.txt"
0a1
>
sa
das
Only
in
/opt/lampp/htdocs/scripts/www/:
File
1.txt~
Only
in
/opt/lampp/htdocs/scripts/www/:
file
2
-
second
Actually I just want to file name where I find the diffrence so I can take perticular action either copy/delete.
Thanks

I don't think diff produces output which can be parsed easily for your purposes. It's possible to solve your problem by iterating over the files in the two directories and running diff on them, using the return value from diff instead (and throwing the diff output away).
The code to do this is a bit long, but here it is:
DIR1=./one # set as required
DIR2=./two # set as required
# Process any files in $DIR1 only, or in both $DIR1 and $DIR2
find $DIR1 -type f -print0 | while read -d $'\0' -r file1; do
relative_path=${file1#${DIR1}/};
file2="$DIR2/$relative_path"
if [[ ! -f "$file2" ]]; then
echo "'$relative_path' in '$DIR1' only"
# Do more stuff here
elif diff -q "$file1" "$file2" >/dev/null; then
echo "'$relative_path' same in '$DIR1' and '$DIR2'"
# Do more stuff here
else
echo "'$relative_path' different between '$DIR1' and '$DIR2'"
# Do more stuff here
fi
done
# Process files in $DIR2 only
find $DIR2 -type f -print0 | while read -d $'\0' -r file2; do
relative_path=${file2#${DIR2}/};
file1="$DIR1/$relative_path"
if [[ ! -f "$file2" ]]; then
echo "'$relative_path' in '$DIR2 only'"
# Do more stuff here
fi
done
This code leverages some tricks to safely handle files which contain spaces, which would be very difficult to get working by parsing diff output. You can find more details on that topic here.
Of course this doesn't do anything regarding files which have the same contents but different names or are located in different directories.
I tested by populating two test directories as follows:
echo "dir one only" > "$DIR1/dir one only.txt"
echo "dir two only" > "$DIR2/dir two only.txt"
echo "in both, same" > $DIR1/"in both, same.txt"
echo "in both, same" > $DIR2/"in both, same.txt"
echo "in both, and different" > $DIR1/"in both, different.txt"
echo "in both, but different" > $DIR2/"in both, different.txt"
My output was:
'dir one only.txt' in './one' only
'in both, different.txt' different between './one' and './two'
'in both, same.txt' same in './one' and './two'

Use -q flag and avoid the for loop:
diff -rq /opt/lampp/htdocs/scripts/dev/ /opt/lampp/htdocs/scripts/www/
If you only want the files that differs:
diff -rq /opt/lampp/htdocs/scripts/dev/ /opt/lampp/htdocs/scripts/www/ |grep -Po '(?<=Files )\w+'|while read file; do
echo $file
done
-q --brief
Output only whether files differ.
But defitnitely you should check rsync: http://linux.die.net/man/1/rsync

What is the error in this shell script

I never used shell script, but now I have to , here is what I'm trying to do :
#!/bin/bash
echo running the program
./first
var = ($(ls FODLDER |wc -l)) #check how many files the folder contains
echo $var
if( ["$var" -gt "2"] #check if there are more the 2file
then ./second
fi
the scriopt crashes at the if statement. how may I solve this

Many:
var = ($(ls FODLDER |wc -l))
This is wrong, you cannot have those spaces around =.
if( ["$var" -gt "2"]
Your ( is not doing anything there, so it has to be deleted. Also, you need spaces around [ and ].
All together, this would make more sense:
#!/bin/bash
echo "running the program"
./first
var=$(find FOLDER -maxdepth 1 -type f|wc -l) # better find than ls
echo "$var"
if [ "$var" -gt "2" ]; then
./second
fi
Note:
quote whenever you echo, specially when handling variables.
see another way to look for files in a given path. Parsing ls is kind of evil.
indent your code for better readibility.

Edit your script.bash file as follow:
#!/bin/env bash
dir="$1"
echo "running the program"
./first
dir_list=( $dir/* ) # list files in directory
echo ${#dir_list[#]} # count files in array
if (( ${#dir_list[#]} > 2 )); then # test how many files
./second
fi
Usage
script.bash /tmp/
Explaination
You need to learn bash to avoid dangerous actions!
pass the directory to work with as first argument in the command line (/tmp/ → `$1)
use glob to create an array (dir_list) containing all file in given directory
count items in array (${#dir_list[#]})
test the number of item using arithmetic context.

Fastest way to tell if two files have the same contents in Unix/Linux?

I have a shell script in which I need to check whether two files contain the same data or not. I do this a for a lot of files, and in my script the diff command seems to be the performance bottleneck.
Here's the line:
diff -q $dst $new > /dev/null
if ($status) then ...
Could there be a faster way to compare the files, maybe a custom algorithm instead of the default diff?

I believe cmp will stop at the first byte difference:
cmp --silent $old $new || echo "files are different"

I like #Alex Howansky have used 'cmp --silent' for this. But I need both positive and negative response so I use:
cmp --silent file1 file2 && echo '### SUCCESS: Files Are Identical! ###' || echo '### WARNING: Files Are Different! ###'
I can then run this in the terminal or with a ssh to check files against a constant file.

To quickly and safely compare any two files:
if cmp --silent -- "$FILE1" "$FILE2"; then
echo "files contents are identical"
else
echo "files differ"
fi
It's readable, efficient, and works for any file names including "` $()

Because I suck and don't have enough reputation points I can't add this tidbit in as a comment.
But, if you are going to use the cmp command (and don't need/want to be verbose) you can just grab the exit status. Per the cmp man page:
If a FILE is '-' or missing, read standard input. Exit status is 0
if inputs are the same, 1 if different, 2 if trouble.
So, you could do something like:
STATUS="$(cmp --silent $FILE1 $FILE2; echo $?)" # "$?" gives exit status for each comparison
if [[ $STATUS -ne 0 ]]; then # if status isn't equal to 0, then execute code
DO A COMMAND ON $FILE1
else
DO SOMETHING ELSE
fi
EDIT: Thanks for the comments everyone! I updated the test syntax here. However, I would suggest you use Vasili's answer if you are looking for something similar to this answer in readability, style, and syntax.

You can compare by checksum algorithm like sha256
sha256sum oldFile > oldFile.sha256
echo "$(cat oldFile.sha256) newFile" | sha256sum --check
newFile: OK
if the files are distinct the result will be
newFile: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match

For files that are not different, any method will require having read both files entirely, even if the read was in the past.
There is no alternative. So creating hashes or checksums at some point in time requires reading the whole file. Big files take time.
File metadata retrieval is much faster than reading a large file.
So, is there any file metadata you can use to establish that the files are different?
File size ? or even results of the file command which does just read a small portion of the file?
File size example code fragment:
ls -l $1 $2 |
awk 'NR==1{a=$5} NR==2{b=$5}
END{val=(a==b)?0 :1; exit( val) }'
[ $? -eq 0 ] && echo 'same' || echo 'different'
If the files are the same size then you are stuck with full file reads.

Try also to use the cksum command:
chk1=`cksum <file1> | awk -F" " '{print $1}'`
chk2=`cksum <file2> | awk -F" " '{print $1}'`
if [ $chk1 -eq $chk2 ]
then
echo "File is identical"
else
echo "File is not identical"
fi
The cksum command will output the byte count of a file. See 'man cksum'.

Doing some testing with a Raspberry Pi 3B+ (I'm using an overlay file system, and need to sync periodically), I ran a comparison of my own for diff -q and cmp -s; note that this is a log from inside /dev/shm, so disk access speeds are a non-issue:
[root#mypi shm]# dd if=/dev/urandom of=test.file bs=1M count=100 ; time diff -q test.file test.copy && echo diff true || echo diff false ; time cmp -s test.file test.copy && echo cmp true || echo cmp false ; cp -a test.file test.copy ; time diff -q test.file test.copy && echo diff true || echo diff false; time cmp -s test.file test.copy && echo cmp true || echo cmp false
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 6.2564 s, 16.8 MB/s
Files test.file and test.copy differ
real 0m0.008s
user 0m0.008s
sys 0m0.000s
diff false
real 0m0.009s
user 0m0.007s
sys 0m0.001s
cmp false
cp: overwrite âtest.copyâ? y
real 0m0.966s
user 0m0.447s
sys 0m0.518s
diff true
real 0m0.785s
user 0m0.211s
sys 0m0.573s
cmp true
[root#mypi shm]# pico /root/rwbscripts/utils/squish.sh
I ran it a couple of times. cmp -s consistently had slightly shorter times on the test box I was using. So if you want to use cmp -s to do things between two files....
identical (){
echo "$1" and "$2" are the same.
echo This is a function, you can put whatever you want in here.
}
different () {
echo "$1" and "$2" are different.
echo This is a function, you can put whatever you want in here, too.
}
cmp -s "$FILEA" "$FILEB" && identical "$FILEA" "$FILEB" || different "$FILEA" "$FILEB"

If you are looking for more customizable diff for this, then git diff can be used.
if (git diff --no-index --quiet -- old.txt new.txt) then
echo "files contents are identical"
else
echo "files differ"
fi
--quiet
Disable all output of the program. Implies --exit-code.
--exit-code
Make the program exit with codes similar to diff(1). That is, it exits with 1 if there were differences and 0 means no differences.
Also, there are various algorithms and settings to choose from: [ref]
--diff-algorithm={patience|minimal|histogram|myers}
Choose a diff algorithm. The variants are as follows:
default, myers The basic greedy diff algorithm. Currently, this is the
default.
minimal Spend extra time to make sure the smallest possible diff is
produced.
patience Use "patience diff" algorithm when generating patches.
histogram This algorithm extends the patience algorithm to "support
low-occurrence common elements".

Bash: Create a file if it does not exist, otherwise check to see if it is writeable

I have a bash program that will write to an output file. This file may or may not exist, but the script must check permissions and fail early. I can't find an elegant way to make this happen. Here's what I have tried.
set +e
touch $file
set -e
if [ $? -ne 0 ]; then exit;fi
I keep set -e on for this script so it fails if there is ever an error on any line. Is there an easier way to do the above script?

Why complicate things?
file=exists_and_writeable
if [ ! -e "$file" ] ; then
touch "$file"
fi
if [ ! -w "$file" ] ; then
echo cannot write to $file
exit 1
fi
Or, more concisely,
( [ -e "$file" ] || touch "$file" ) && [ ! -w "$file" ] && echo cannot write to $file && exit 1

Rather than check $? on a different line, check the return value immediately like this:
touch file || exit
As long as your umask doesn't restrict the write bit from being set, you can just rely on the return value of touch

You can use -w to check if a file is writable (search for it in the bash man page).
if [[ ! -w $file ]]; then exit; fi

Why must the script fail early? By separating the writable test and the file open() you introduce a race condition. Instead, why not try to open (truncate/append) the file for writing, and deal with the error if it occurs? Something like:
$ echo foo > output.txt
$ if [ $? -ne 0 ]; then die("Couldn't echo foo")
As others mention, the "noclobber" option might be useful if you want to avoid overwriting existing files.

Open the file for writing. In the shell, this is done with an output redirection. You can redirect the shell's standard output by putting the redirection on the exec built-in with no argument.
set -e
exec >shell.out # exit if shell.out can't be opened
echo "This will appear in shell.out"
Make sure you haven't set the noclobber option (which is useful interactively but often unusable in scripts). Use > if you want to truncate the file if it exists, and >> if you want to append instead.
If you only want to test permissions, you can run : >foo.out to create the file (or truncate it if it exists).
If you only want some commands to write to the file, open it on some other descriptor, then redirect as needed.
set -e
exec 3>foo.out
echo "This will appear on the standard output"
echo >&3 "This will appear in foo.out"
echo "This will appear both on standard output and in foo.out" | tee /dev/fd/3
(/dev/fd is not supported everywhere; it's available at least on Linux, *BSD, Solaris and Cygwin.)

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string