how to filter out / ignore specific lines when comparing text files with diff


To further clarify what I am trying to do, I wrote the script below. I am attempting to audit some files between my QA and PRD environments and would like the final diff output to ignore hard-coded values such as SQL connections. I have about 6 different values to filter. I have tried several approaches, but so far I have not been able to get any of them to work as needed. I am open to doing this another way if anyone has any ideas. I am pretty new to script development, so I'm open to any ideas or information. Thanks :)
#!/bin/bash
#*********************************************************************
#
# Name: compareMD5.sh
# Date: 02/12/2018
# Script Location:
# Author: Maggie o
#
# Description: This script will pull absolute paths from a text file
# and compare the files via ssh between QA & PRD on md5sum
# output match or no match
# Then the file the non matching files will be imported to a
# tmp directory via scp
# Files will be compared locally and exclude whitespace,
# spaces, comments, and hard coded values
# NOTE: Script may take several minutes to run
#
# Usage: Auditing QA to PRD Pass 3
# nohup ./compareMD52.sh > /output/compareMD52.out 2> /error/compareMD52.err
# checking run ps -ef | grep compareMD52*
#**********************************************************************
rm /output/no_matchMD5.txt
rm /output/filesDiffer.txt
echo "Filename | Path" > /output/matchingMD5.txt
#Remove everything below the tmp directory recursively, as it was created by the previous script run
rm -rf /tmp/*
for i in $(cat /input/comp_list.txt) #list of files with absolute paths output by compare script
do
export filename=$(basename "$i") #Grab just the filename
export path=$(dirname "$i") #Just the Directory
qa_md5sum=$(md5sum "$i") #Get the md5sum
qa_md5="${qa_md5sum%% *}" #remove the appended path
export tmpdir=(/tmp"$path")
# if stat succeeds, the file exists on PRD, so run the comparison
if ssh oracle@Someconnection stat $path'$filename' \> /dev/null 2\>\&1
then
prd_md5sum=$(ssh oracle@Somelocation "cd $path; find -name '$filename' -exec md5sum {} \;")
prd_md5="${prd_md5sum%% *}" #remove the appended path
if [[ $qa_md5 == "$prd_md5" ]] #Compare the hashes as strings
then
echo $filename $path " QA Matches PRD">> /output/matchingMD5.txt
else
echo $i
echo $tmpdir
echo "Copying "$i" to "$tmpdir >> /output/no_matchMD5.txt
#Copy the file from PRD to a tmp dir in QA; keep the dir structure to avoid clashes when the same filename exists in different directories
mkdir -p $tmpdir # -p creates the directory only if it does not exist and produces no error if it does
scp oracle@Somelocation:$i $tmpdir # get the file from PRD, insert into tmp directory
fi
fi
done
for x in $(cat /output/no_matchMD5.txt) #do a local compare using diff
do
comp_filename=$(basename "$x")
#Ignore Comments, no white space, no blank lines, and only report if different but not How different
qa=(/tmp"$x")
#IN TEST
if diff -bBq -I '^#' $x $qa >/dev/null
# Fails to catch files if the Comment then the start of a line
then
echo $comp_filename " differs more than just white space, or comment"
echo $x >> /output/filesDiffer.txt
fi
done

You can pipe the output into grep -v
Like this:
diff -bBq TEST.sh TEST2.sh | grep -v "^#"

I was able to get this figured out using this method
if diff -bBqZ -I '^#' <(grep -vE '(thing1|thing2|thing3)' $x) <(grep -vE '(thing1|thing2|thing3)' $prdfile)
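For anyone who wants to reuse the approach above, here is a minimal sketch that wraps it in a function. The function name same_ignoring, the file names and the pattern in the usage comment are made up for illustration; bash is assumed because of the <( ) process substitution.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 when two files are equal once whitespace
# changes, comment-only hunks and lines matching an exclusion pattern
# are ignored; exit non-zero otherwise.
same_ignoring() {
    local pattern=$1 left=$2 right=$3
    # grep -vE drops every line matching the extended regex, so diff
    # only compares what is left. -b and -B ignore whitespace and
    # blank-line changes, -q suppresses the detailed listing, and
    # -I '^#' ignores hunks whose lines all start with '#'.
    diff -bBq -I '^#' \
        <(grep -vE "$pattern" "$left") \
        <(grep -vE "$pattern" "$right") >/dev/null
}

# Usage with made-up names:
# if same_ignoring 'jdbc:|DB_HOST=' qa/app.conf prd/app.conf; then
#     echo "only hard-coded values differ"
# fi
```

Keeping all the values to skip in one alternation pattern means the list only has to be maintained in one place.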

Related

how to move a file after grep command when there is no return result

I want to move a file after the grep command, but when I execute my script I notice that no results come back. Regardless of that, I want to move the file(s) to another directory.
this is what I've been doing:
for file in *.sup
do
grep -iq "$file" '' /desktop/list/varlogs.txt || mv "$file" /desktop/first;
done
but I am getting this error:
mv: 0653-401 Cannot rename first /desktop/first/first
suggestions would be very helpful

I am not sure what the two single quotes are for in between ..."$file" '' /desktop.... With them there, grep is looking also for $file in a file called '', so grep will throw the grep: : No such file or directory error with that there.
Also pay attention to the behavior change of adding the -q or --quiet flags, as it affects the returned value of grep and will impact whether the command to the || is run or not (see man grep for more).
I can't make out exactly what you are trying to do, but you can add a couple statements to help figure out what is going on. You could run your script with bash -x ./myscript.sh to display everything that runs as it runs, or add set -x before and set +x after the for loop in the script to show what is happening.
I added some debugging to your script and changed the || to an if/then statement to expose what is happening. Try this and see if you can find where things are going awry.
echo -e "============\nBEFORE:\n============"
echo -e "\n## The files in current dir '$(pwd)' are: ##\n$(ls)"
echo -e "\n## The files in '/desktop/first' are: ##\n$(ls /desktop/first)"
echo -e "\n## Looking for '.sup' files in '$(pwd)' ##"
for file in *.sup; do
echo -e "\n## == look for '${file}' in '/desktop/list/varlogs.txt' == ##"
# let's change this to an if/else
# the || means try the left command for success, or try the right one
# grep -iq "$file" '' /desktop/list/varlogs.txt || mv -v "$file" /desktop/first
# based on `man grep`: EXIT STATUS
# Normally the exit status is 0 if a line is selected,
# 1 if no lines were selected, and 2 if an error occurred.
# However, if the -q or --quiet or --silent is used and a line
# is selected, the exit status is 0 even if an error occurred.
# note that --ignore-case and --quiet are long versions of -i and -q/ -iq
if grep --ignore-case --quiet "${file}" '' /desktop/list/varlogs.txt; then
echo -e "\n'${file}' found in '/desktop/list/varlogs.txt'"
else
echo -e "\n'${file}' not found in '/desktop/list/varlogs.txt'"
echo -e "\nmove '${file}' to '/desktop/first'"
mv --verbose "${file}" /desktop/first
fi
done
echo -e "\n============\nAFTER:\n============"
echo -e "\n## The files in current dir '$(pwd)' are: ##\n$(ls)"
echo -e "\n## The files in '/desktop/first' are: ##\n$(ls /desktop/first)"
|| means try the first command, and if it is not successful (i.e. does not return 0), then do the next command. In your case, it appears you are looking in /desktop/list/varlogs.txt to see if any .sup files in the current directory match any in the varlogs file and if not, then move them to the /desktop/first/ directory. If matches were found, leave them in the current dir. (according to the logic you have currently)
mv --verbose explain what is being done
echo -e enables interpretation of backslash escapes
set -x shows the commands that are being run/ debugging
Please respond and clarify if anything is different. I am trying to raise in the ranks to be more helpful so I would appreciate comments, and upvotes if this was helpful.
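As a compact version of the same logic without the debugging output, here is a minimal sketch. The function names is_listed and move_unlisted are made up, and I am assuming the goal is to move every .sup file that is not mentioned in the list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if the string $1 occurs in file $2.
is_listed() {
    # -F treats the name as a literal string (dots are not wildcards),
    # -i ignores case, -q prints nothing and reports the result only
    # through the exit status.
    grep -F -i -q -- "$1" "$2"
}

# Move every .sup file in the current directory that is NOT mentioned
# in the list file $1 into the directory $2.
move_unlisted() {
    local list=$1 dest=$2 file
    for file in *.sup; do
        [ -e "$file" ] || continue   # the glob matched nothing
        is_listed "$file" "$list" || mv -- "$file" "$dest"/
    done
}

# Usage with the paths from the question:
# move_unlisted /desktop/list/varlogs.txt /desktop/first
```

The trailing slash on the destination makes mv fail instead of renaming the file when the directory does not exist.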

Suggesting to avoid repeated scans of /desktop/list/varlogs.txt, and to remove duplicates:
mv $(grep -o -F -f <(ls -1 *.sup) /desktop/list/varlogs.txt|sort|uniq) /desktop/first
Suggesting to test step 1 in the explanation below first, to list the files that will be moved.
Explanation
1. grep -o -F -f <(ls -1 *.sup) /desktop/list/varlogs.txt| sort| uniq
List all the files selected by ls -1 *.sup that are mentioned in /desktop/list/varlogs.txt, in a single scan.
-o list only the matched filenames.
-F treat each pattern as a fixed string, so the dots in filenames are not regex wildcards.
-f <(ls -1 *.sup) read the patterns from a file; the process substitution presents the output of ls -1 *.sup as that file.
|sort|uniq Then sort the list and remove duplicates (we can move each file only once).
2. mv <files-list-output-from-step-1> /desktop/first
Move all the files found in step 1 to directory /desktop/first
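Step 1 can be tried on its own with a small sketch like the one below. The function name listed_sup_files is made up, and it uses printf instead of ls to produce one file name per line; bash is assumed for the <( ) process substitution.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each .sup file of the current directory
# that is mentioned in the list file $1, once each.
listed_sup_files() {
    local list=$1
    # printf emits one file name per line; <( ) makes that output
    # readable as a file, so grep -f can use the names as patterns.
    # -F takes the names literally, -o prints only the matched name,
    # and sort -u removes duplicates.
    grep -o -F -f <(printf '%s\n' *.sup) -- "$list" | sort -u
}

# Usage with the path from the question:
# listed_sup_files /desktop/list/varlogs.txt
```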

Why is a part of the code inside a (False) if statement executed?

I wrote a small script which:
prints the content of a file (generated by another application) on paper with a matrix printer
prints the same line into a backup file
removes the original file.
The script is run every minute by a cron job and works fine as long as there are files to print. If there are no files to print, it prints an empty line on the matrix printer and in the backup file. I don't understand why this happens, as I implemented an if statement which checks whether there is a file to print before the print command is executed. This behaviour only happens if the script is executed by cron and not if I execute it manually with ./script.sh. What's the reason for this, and how can I solve it?
Something I noticed on the side is that if I place an echo "hi" command in the script, it's printed to the matrix printer and the backup file. I expected it to be printed to the console when it has no >> something behind it. How does this work?
The script:
#!/bin/bash
# Make sure the backup directory exists
if [ ! -d /home/user/backup_logprint ]
then
mkdir /home/user/backup_logprint
fi
# Print the records if there are any
date=`date +%Y-%m-%d`
filename='_logprint_backup'
printer_path="/dev/usb/lp0"
if [ `ls /tmp/ | grep logprint | wc -l` -gt 0 ]
then
for f in `ls /tmp | grep logprint`
do
echo `cat /tmp/$f` >> "/home/user/backup_logprint/$date$filename"
echo `cat /tmp/$f` >> $printer_path
rm "/tmp/$f"
done
fi

There's no need for ls or an if statement. Just use a proper glob in the for loop; with nullglob set, the loop is not entered when no files match.
#!/bin/bash
# Don't check first; just let mkdir decide if
# anything actually needs to be created.
d=/home/user/backup_logprint
mkdir -p "$d"
filename=$(date +"$d/%Y-%m-%d_logprint_backup")
printer_path="/dev/usb/lp0"
# Cause non-matching globs to expand to an empty
# sequence instead of being treated literally.
shopt -s nullglob
for f in /tmp/*logprint*; do
cat "$f" > "$printer_path" && mv "$f" "$d"
done
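To see what nullglob changes, here is a minimal sketch that counts loop iterations for the same glob. The function name count_matches is made up for the demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how many times a for loop over the glob
# "$1"/*logprint* runs its body.
count_matches() {
    local dir=$1 f n=0
    for f in "$dir"/*logprint*; do
        n=$((n + 1))
    done
    echo "$n"
}

# Usage, in a directory without any logprint files:
# shopt -u nullglob; count_matches /some/empty/dir   # body runs once, f is the literal pattern
# shopt -s nullglob; count_matches /some/empty/dir   # body does not run at all
```

Without nullglob the unmatched pattern is passed through literally, so the loop body runs once with a name that does not exist.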

Deleting all files except ones mentioned in config file

Situation:
I need a bash script that deletes all files in the current folder, except all the files mentioned in a file called ".rmignore". This file may contain addresses relative to the current folder, that might also contain asterisks(*). For example:
1.php
2/1.php
1/*.php
What I've tried:
I tried to use GLOBIGNORE but that didn't work well.
I also tried to use find with grep, like follows:
find . | grep -Fxv $(echo $(cat .rmignore) | tr ' ' "\n")

It is considered bad practice to pipe the output of find to another command. You can use -exec or -execdir followed by the command, with '{}' as a placeholder for the file and ';' to mark the end of the command. You can also end the command with '+' instead of ';' so that find passes many file names to a single invocation of the command.
In your case, you want to list all the content of a directory and remove files one by one.
#!/usr/bin/env bash
set -o nounset
set -o errexit
shopt -s nullglob # allows glob to expand to nothing if no match
shopt -s globstar # process recursively current directory
my:rm_all() {
local ignore_file=".rmignore"
local ignore_array=()
while read -r glob; # Generate files list
do
ignore_array+=(${glob});
done < "${ignore_file}"
echo "${ignore_array[@]}"
for file in **; # iterate over all the content of the current directory
do
if [ -f "${file}" ]; # file exists and is a regular file
then
local do_rmfile=true;
# Keep the file if it matches an entry of the ignore list
for ignore in "${ignore_array[@]}"; # Iterate over files to keep
do
[[ "${file}" == "${ignore}" ]] && do_rmfile=false; #rm ${file};
done
${do_rmfile} && echo "Removing ${file}"
fi
done
}
my:rm_all;
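Regarding the two ways of ending -exec: a minimal sketch that counts how often find starts the command in each form. The function names are made up; the sh -c 'echo run' command is only there to print one line per invocation.

```shell
#!/usr/bin/env bash
# "sh -c 'echo run' _" prints one line each time it is started, no
# matter how many file names find appends after the "_" placeholder.

# Form ending in \; : the command is started once per file found.
runs_per_file() {
    find "$1" -type f -exec sh -c 'echo run' _ {} \; | wc -l
}

# Form ending in + : find appends as many names as fit on one command
# line, so a handful of files results in a single invocation.
runs_batched() {
    find "$1" -type f -exec sh -c 'echo run' _ {} + | wc -l
}
```

For a command like rm, the + form is usually the cheaper choice because far fewer processes are started.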

If we assume that none of the files in .rmignore contain newlines in their name, the following might suffice:
# Gather our exclusions...
mapfile -t excl < .rmignore
# Reverse the array (put data in indexes)
declare -A arr=()
for file in "${excl[@]}"; do arr[$file]=1; done
# Walk through files, deleting anything that's not in the associative array.
shopt -s globstar
for file in **; do
[ -n "${arr[$file]}" ] && continue
echo rm -fv "$file"
done
Note: untested. :-) Also, associative arrays were introduced with Bash 4.
An alternate method might be to populate an array with the whole file list, then remove the exclusions. This might be impractical if you're dealing with hundreds of thousands of files.
shopt -s globstar
declare -A filelist=()
# Build a list of all files...
for file in **; do filelist[$file]=1; done
# Remove files to be ignored.
while read -r file; do unset filelist[$file]; done < .rmignore
# Annd .. delete.
echo rm -v "${!filelist[@]}"
Also untested.
Warning: rm at your own risk. May contain nuts. Keep backups.
I note that neither of these solutions will handle wildcards in your .rmignore file. For that, you might need some extra processing...
shopt -s globstar
declare -A filelist=()
# Build a list...
for file in **; do filelist[$file]=1; done
# Remove PATTERNS...
while read -r glob; do
for file in $glob; do
unset filelist[$file]
done
done < .rmignore
# And remove whatever's left.
echo rm -v "${!filelist[@]}"
And .. you guessed it. Untested. This depends on $glob expanding as a glob.
Lastly, if you want a heavier-weight solution, you can use find and grep:
find . -type f -not -exec sh -c 'grep -qxF -- "${1#./}" .rmignore' _ {} \; -delete
This runs a grep for EACH file being considered. And it's not a bash solution; it relies only on find, grep and sh, which are pretty universal.
Note that ALL of these solutions are at risk of errors if you have files that contain newlines.
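If expanding the patterns against the file system is not wanted, another option is to match each path against the patterns as strings. This is only a sketch under my own assumptions: the function name is_ignored is made up, and inside [[ ]] a * also matches across "/" characters, which may or may not be what you want.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if path $1 matches any pattern listed,
# one per line, in the ignore file $2.
is_ignored() {
    local path=$1 ignore_file=$2 pattern
    while IFS= read -r pattern; do
        [ -n "$pattern" ] || continue   # skip empty lines
        # The right-hand side is unquoted on purpose: [[ ]] then treats
        # it as a glob pattern instead of a literal string.
        [[ $path == $pattern ]] && return 0
    done < "$ignore_file"
    return 1
}

# Usage (still only echoing, like the snippets above):
# shopt -s globstar
# for file in **; do
#     [ -f "$file" ] || continue
#     is_ignored "$file" .rmignore || echo rm -v "$file"
# done
```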

This line does the job perfectly
find . -type f | grep -vFf .rmignore

If you have rsync, you might be able to copy an empty directory to the target one, with suitable rsync ignore files. Try it first with -n, to see what it will attempt, before running it for real!

This is another bash solution that seems to work ok in my tests:
while read -r line;do
exclude+=$(find . -type f -path "./$line")$'\n'
done <.rmignore
echo "ignored files:"
printf '%s\n' "$exclude"
echo "files to be deleted"
echo rm $(LC_ALL=C sort <(find . -type f) <(printf '%s\n' "$exclude") |uniq -u ) #intentionally non quoted to remove new lines

Alternatively, you may want to look at the simplest format:
rm $(ls -1 | grep -v .rmignore)

How can I batch rename multiple images with their path names and reordered sequences in bash?

My pictures are kept in the folder with the picture-date for folder name, for example the original path and file names:
.../Pics/2016_11_13/wedding/DSC0215.jpg
.../Pics/2016_11_13/afterparty/DSC0234.jpg
.../Pics/2016_11_13/afterparty/DSC0322.jpg
How do I rename the pictures into the format below, with continuous sequences and 4-digit padding?
.../Pics/2016_11_13_wedding.0001.jpg
.../Pics/2016_11_13_afterparty.0002.jpg
.../Pics/2016_11_13_afterparty.0003.jpg
I'm using Bash 4.1, so only mv command is available. Here is what I have now but it's not working
#!/bin/bash
p=0
for i in *.jpg;
do
mv "$i" "$dirname.%03d$p.JPG"
((p++))
done
exit 0

Let's say you have something like .../Pics/2016_11_13/wedding/XXXXXX.jpg; go into the directory .../Pics/2016_11_13. From there, you should have a bunch of subdirectories like wedding, afterparty, and so on. Launch this script (disclaimer: I didn't test it):
#!/bin/bash
for subdir in *; do # scan directory
[ ! -d "$subdir" ] && continue; # skip non-directory
prognum=0; # progressive number
for file in $(ls "$subdir"); do # scan subdirectory
(( prognum=$prognum+1 )) # increment progressive
newname=$(printf %4.4d $prognum) # format it
newname="$subdir.$newname.jpg" # compose the new name
if [ -f "$newname" ]; then # check to not overwrite anything
echo "error: $newname already exist."
exit
fi
# do the job, move or copy
cp "$subdir/$file" "$newname"
done
done
Please note that I skipped the "date" (2016_11_13) part - I am not sure about it. If you have a single date, then it is easy to add these digits in # compose the new name. If you have several dates, then you can add a nested for for scanning the "date" directories. One more reason I skipped this, is to let you develop something by yourself, something you can be proud of...

Using only mv and bash builtins:
#! /bin/bash
shopt -s globstar
cd Pics
p=1
# recursive glob for .jpg files
for i in **/*.jpg
do
# (date)/(event)/(filename).jpg
if [[ $i =~ (.*)/(.*)/(.*).jpg ]]
then
newname=$(printf "%s_%s.%04d.jpg" "${BASH_REMATCH[@]:1:2}" "$p")
echo mv "$i" "$newname"
((p++))
fi
done
globstar is a bash 4.0 feature, and regex matching is available even in OSX's antique bash.
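The name-building step can be checked in isolation with a small sketch like this one. The function name new_name is made up, and the regex is a slightly stricter variant of the one above: each group is limited to one path component and the dot before jpg is escaped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the target name for one picture.
# $1 = path relative to Pics (date/event/file.jpg), $2 = sequence number.
new_name() {
    local path=$1 seq=$2
    # Group 1 is the date directory, group 2 the event directory.
    # [^/]+ keeps each part inside a single path component.
    if [[ $path =~ ^([^/]+)/([^/]+)/[^/]+\.jpg$ ]]; then
        # %04d pads the sequence number with zeros to four digits.
        printf '%s_%s.%04d.jpg\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "$seq"
    else
        return 1   # the path does not look like date/event/file.jpg
    fi
}

# Usage:
# new_name 2016_11_13/wedding/DSC0215.jpg 1
```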

How to extract only file name return from diff command?

I am trying to write a bash script to sync 2 directories, but I am not able to get the file names returned by diff. Every time, the output gets split into separate words.
Here is my code :
#!/bin/bash
DIRS1=`diff -r /opt/lampp/htdocs/scripts/dev/ /opt/lampp/htdocs/scripts/www/ `
for DIR in $DIRS1
do
echo $DIR
done
And if I run this script I get out put something like this :
Only
in
/opt/lampp/htdocs/scripts/www/:
file1
diff
-r
"/opt/lampp/htdocs/scripts/dev/File
1.txt"
"/opt/lampp/htdocs/scripts/www/File
1.txt"
0a1
>
sa
das
Only
in
/opt/lampp/htdocs/scripts/www/:
File
1.txt~
Only
in
/opt/lampp/htdocs/scripts/www/:
file
2
-
second
Actually, I just want the names of the files where there is a difference, so I can take a particular action, either copy or delete.
Thanks

I don't think diff produces output which can be parsed easily for your purposes. It's possible to solve your problem by iterating over the files in the two directories and running diff on them, using the return value from diff instead (and throwing the diff output away).
The code to do this is a bit long, but here it is:
DIR1=./one # set as required
DIR2=./two # set as required
# Process any files in $DIR1 only, or in both $DIR1 and $DIR2
find "$DIR1" -type f -print0 | while IFS= read -r -d '' file1; do
relative_path=${file1#${DIR1}/};
file2="$DIR2/$relative_path"
if [[ ! -f "$file2" ]]; then
echo "'$relative_path' in '$DIR1' only"
# Do more stuff here
elif diff -q "$file1" "$file2" >/dev/null; then
echo "'$relative_path' same in '$DIR1' and '$DIR2'"
# Do more stuff here
else
echo "'$relative_path' different between '$DIR1' and '$DIR2'"
# Do more stuff here
fi
done
# Process files in $DIR2 only
find "$DIR2" -type f -print0 | while IFS= read -r -d '' file2; do
relative_path=${file2#${DIR2}/};
file1="$DIR1/$relative_path"
if [[ ! -f "$file1" ]]; then
echo "'$relative_path' in '$DIR2' only"
# Do more stuff here
fi
done
This code leverages some tricks to safely handle files which contain spaces, which would be very difficult to get working by parsing diff output. You can find more details on that topic here.
Of course this doesn't do anything regarding files which have the same contents but different names or are located in different directories.
I tested by populating two test directories as follows:
echo "dir one only" > "$DIR1/dir one only.txt"
echo "dir two only" > "$DIR2/dir two only.txt"
echo "in both, same" > $DIR1/"in both, same.txt"
echo "in both, same" > $DIR2/"in both, same.txt"
echo "in both, and different" > $DIR1/"in both, different.txt"
echo "in both, but different" > $DIR2/"in both, different.txt"
My output was:
'dir one only.txt' in './one' only
'in both, different.txt' different between './one' and './two'
'in both, same.txt' same in './one' and './two'

Use -q flag and avoid the for loop:
diff -rq /opt/lampp/htdocs/scripts/dev/ /opt/lampp/htdocs/scripts/www/
If you only want the files that differ:
diff -rq /opt/lampp/htdocs/scripts/dev/ /opt/lampp/htdocs/scripts/www/ |grep -Po '(?<=^Files ).*(?= and .* differ$)'|while read -r file; do
echo "$file"
done
-q --brief
Output only whether files differ.
But you should definitely check rsync: http://linux.die.net/man/1/rsync
