How to extract only the file names from the output of the diff command?

I am trying to write a bash script to sync two directories, but I am not able to get just the file names from diff's output. Every time, the output is split into separate words.
Here is my code:
#!/bin/bash
DIRS1=`diff -r /opt/lampp/htdocs/scripts/dev/ /opt/lampp/htdocs/scripts/www/ `
for DIR in $DIRS1
do
echo $DIR
done
And if I run this script I get output something like this:
Only
in
/opt/lampp/htdocs/scripts/www/:
file1
diff
-r
"/opt/lampp/htdocs/scripts/dev/File
1.txt"
"/opt/lampp/htdocs/scripts/www/File
1.txt"
0a1
>
sa
das
Only
in
/opt/lampp/htdocs/scripts/www/:
File
1.txt~
Only
in
/opt/lampp/htdocs/scripts/www/:
file
2
-
second
Actually I just want the names of the files that differ, so I can take a particular action on each one, either copy or delete.
Thanks

I don't think diff produces output which can be parsed easily for your purposes. It's possible to solve your problem by iterating over the files in the two directories and running diff on them, using the return value from diff instead (and throwing the diff output away).
The code to do this is a bit long, but here it is:
DIR1=./one # set as required
DIR2=./two # set as required
# Process any files in $DIR1 only, or in both $DIR1 and $DIR2
find "$DIR1" -type f -print0 | while IFS= read -r -d '' file1; do
    relative_path=${file1#"$DIR1"/}
    file2="$DIR2/$relative_path"
    if [[ ! -f "$file2" ]]; then
        echo "'$relative_path' in '$DIR1' only"
        # Do more stuff here
    elif diff -q "$file1" "$file2" >/dev/null; then
        echo "'$relative_path' same in '$DIR1' and '$DIR2'"
        # Do more stuff here
    else
        echo "'$relative_path' different between '$DIR1' and '$DIR2'"
        # Do more stuff here
    fi
done
# Process files in $DIR2 only
find "$DIR2" -type f -print0 | while IFS= read -r -d '' file2; do
    relative_path=${file2#"$DIR2"/}
    file1="$DIR1/$relative_path"
    # The file is in $DIR2 only if its counterpart in $DIR1 is missing
    if [[ ! -f "$file1" ]]; then
        echo "'$relative_path' in '$DIR2' only"
        # Do more stuff here
    fi
done
This code leverages some tricks to safely handle files which contain spaces, which would be very difficult to get working by parsing diff output. You can find more details on that topic here.
Of course this doesn't do anything regarding files which have the same contents but different names or are located in different directories.
I tested by populating two test directories as follows:
echo "dir one only" > "$DIR1/dir one only.txt"
echo "dir two only" > "$DIR2/dir two only.txt"
echo "in both, same" > "$DIR1/in both, same.txt"
echo "in both, same" > "$DIR2/in both, same.txt"
echo "in both, and different" > "$DIR1/in both, different.txt"
echo "in both, but different" > "$DIR2/in both, different.txt"
With these files the script prints (the order within each loop depends on find):
'dir one only.txt' in './one' only
'in both, different.txt' different between './one' and './two'
'in both, same.txt' same in './one' and './two'
'dir two only.txt' in './two' only

Use the -q flag and avoid the for loop:
diff -rq /opt/lampp/htdocs/scripts/dev/ /opt/lampp/htdocs/scripts/www/
If you only want the files that differ:
# Prints the first path of each "Files A and B differ" line (breaks if a path contains " and ")
diff -rq /opt/lampp/htdocs/scripts/dev/ /opt/lampp/htdocs/scripts/www/ | grep -Po '^Files \K.*?(?= and )' | while IFS= read -r file; do
    echo "$file"
done
-q --brief
Output only whether files differ.
But you should definitely check rsync: http://linux.die.net/man/1/rsync
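As a rough sketch of how the brief output of diff -rq could be turned into file names, the function below classifies each output line. The name classify_diff is hypothetical, the line shapes ("Only in DIR: NAME" and "Files A and B differ") are those printed by GNU diff in the C locale, and paths containing ": " or " and " would be split at the wrong place.

```shell
#!/bin/bash
# classify_diff DIR1 DIR2 (hypothetical helper)
# Prints one tab-separated record per difference:
#   only    <path>   file or directory present on one side only
#   differ  <path>   file present on both sides with different contents (DIR1 path)
classify_diff() {
    local line
    # LC_ALL=C keeps diff's messages in English so the patterns below match
    LC_ALL=C diff -rq "$1" "$2" | while IFS= read -r line; do
        case $line in
            "Only in "*)
                line=${line#Only in }
                # "DIR: NAME" becomes "DIR/NAME"
                printf 'only\t%s/%s\n' "${line%%: *}" "${line#*: }"
                ;;
            "Files "*" differ")
                line=${line#Files }
                # keep the first path, drop " and B differ"
                printf 'differ\t%s\n' "${line%% and *}"
                ;;
        esac
    done
}
```

For example, classify_diff /opt/lampp/htdocs/scripts/dev /opt/lampp/htdocs/scripts/www prints one record per difference, which a following while read loop could act on.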

Related

How can I remove the extension of files with a specific extension?

I'm trying to create a program that would remove the extensions of files with that specific extension in a directory.
So for instance there exists a directory d1, within that directory there are three files a.jpg, b.jpg and c.txt and the extension that I want to manipulate is .jpg.
After calling my program, my output should be a b c.txt since all files with .jpg now have jpg removed from them.
Here is my attempt to solve it so far:
#!/bin/bash
echo "Enter an extension"
read extension
echo "Enter a directory"
read directory
allfiles=$( ls -l $directory)
for x in $allfiles
do
ext=$( echo $x | sed 's:.*.::')
if [ $ext -eq $extension]
then
echo $( $x | cut -f 2 -d '.')
else
echo $x
fi
done
However, when I run this, I get an error saying
'-f' is not defined
'-f' is not defined
what should I change in my code?
You can solve your problem by piping the result of find to a while loop:
# First step - basic idea:
# Note: requires hardening
find . -type f | while read file; do
    echo "${file}"   # do some work with ${file}
done
Next, you can extract a filename without an extension with ${file%.*} and an extension itself with ${file##*.} (see Bash - Shell Parameter Expansion):
# Second step - work with file extension:
# Note: requires hardening
find . -type f | while read file; do
[[ "${file##*.}" == "jpg" ]] && echo "${file%.*}" || echo "${file}";
done
The final step is to introduce some kind of hardening. Filenames may contain "strange" characters, like a new line character or a backslash. We can force find to print the filename followed by a null character (instead of the newline character), and then tune read to be able to deal with it:
# Final step
find . -type f -print0 | while IFS= read -r -d '' file; do
[[ "${file##*.}" == "jpg" ]] && echo "${file%.*}" || echo "${file}";
done
What about using the mv command?
mv a.jpg a
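To actually rename the files rather than only print the shortened names, the mv idea can be combined with the parameter expansion shown earlier. This is a minimal sketch: strip_extension is a hypothetical name, it only looks at the top level of the directory, and mv -n is used so that an existing file is not overwritten.

```shell
#!/bin/bash
# strip_extension DIR EXT (hypothetical helper)
# Renames DIR/name.EXT to DIR/name for every regular file with that extension.
strip_extension() {
    local dir=$1 ext=$2 file
    for file in "$dir"/*."$ext"; do
        # Skip the unexpanded pattern when nothing matches, and skip directories
        [ -f "$file" ] || continue
        # -n: do not overwrite an existing target; --: names may start with "-"
        mv -n -- "$file" "${file%."$ext"}"
    done
}
```

Called as strip_extension d1 jpg on the directory from the question, this would leave a, b and c.txt.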

Shell - iterate over content of file but do something only the first x lines

So guys,
I need your help trying to identify the fastest and the most "fault" tolerant solution to my problem.
I have a shell script which executes some functions, based on a txt file, in which I have a list of files.
The list can contain from 1 file to X files.
What I would like to do is iterate over the content of the file and execute my scripts for only 4 items out of the file.
Once the functions have been executed for these 4 files, go over to the next 4 .... and keep on doing so until all the files from the list have been "processed".
My code so far is as follows.
#!/bin/bash
number_of_files_in_folder=$(cat list.txt | wc -l)
max_number_of_files_to_process=4
Translated_files=/home/german_translated_files/
while IFS= read -r files
do
while [[ $number_of_files_in_folder -gt 0 ]]; do
i=1
while [[ $i -le $max_number_of_files_to_process ]]; do
my_first_function "$files" & # I execute my translation function for each file, as it can only perform 1 file per execution
find /home/german_translator/ -name '*.logs' -exec mv {} $Translated_files \; # As there will be several files generated, I have them copied to another folder
sed -i "/$files/d" list.txt # We remove the processed file from within our list.txt file.
my_second_function # Without parameters as it will process all the files copied at step 2.
done
# here, I want to have all the files processed and don't stop after the first iteration
done
done < list.txt
Unfortunately, as I am not quite good at shell scripting, I do not know how to structure it so that it won't waste any resources and mostly, to make sure that it "processes" everything from that file.
Do you have any advice on how to achieve what I am trying to achieve?
only 4 items out of the file. Once the functions have been executed for these 4 files, go over to the next 4
Seems to be quite easy with xargs.
your_function() {
echo "Do something with $1 $2 $3 $4"
}
export -f your_function
xargs -d '\n' -n 4 bash -c 'your_function "$@"' _ < list.txt
xargs -d '\n' - treat each line as one argument
-n 4 - take four arguments at a time
bash .... - run this command with those 4 arguments
_ - the syntax is bash -c <script> $0 $1 $2 etc..., see man bash.
"$@" - forward arguments
export -f your_function - export your function to environment so child bash can pick it up.
I execute my translation function for each file
So you execute your translation function for each file, not for each 4 files. If the "translation function" is really for each file with no inter-file state, consider rather executing 4 processes in parallel with same code and just xargs -P 4.
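A minimal sketch of that xargs -P 4 variant follows. translate_one is a hypothetical stand-in for the per-file translation function, and run_parallel is likewise an illustrative name; -d and -P are GNU xargs options.

```shell
#!/bin/bash
# translate_one is a hypothetical placeholder for the real per-file function
translate_one() {
    printf 'translated %s\n' "$1"
}
export -f translate_one   # make the function visible to the child bash

# run_parallel LISTFILE (hypothetical helper)
# One file name per invocation (-n 1), at most four invocations at once (-P 4).
run_parallel() {
    xargs -d '\n' -n 1 -P 4 bash -c 'translate_one "$1"' _ < "$1"
}
```

run_parallel list.txt starts a new invocation as soon as one of the four slots is free, so the order of the output lines is not guaranteed.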
If you have GNU Parallel it looks something like this:
doit() {
my_first_function "$1"
my_first_function "$2"
my_first_function "$3"
my_first_function "$4"
my_second_function "$1" "$2" "$3" "$4"
}
export -f doit
cat list.txt | parallel -n4 doit

extracting files that doesn't have a dir with the same name

Sorry for the odd title. I didn't know how to word it the right way.
I'm trying to write a script that splits my wiki files into those that have a directory with the same name and those that don't. I'll elaborate further.
here is my file system:
What I need to do is print a list of the files that have a directory with the same name, and another list of those that don't.
So my ultimate goal is getting:
with dirs:
Docs
Eng
Python
RHEL
To_do_list
articals
without dirs:
orphan.txt
orphan2.txt
orphan3.txt
I managed to get those files with dirs. Here is my code:
getname () {
file=$( basename "$1" )
file2=${file%%.*}
echo $file2
}
for d in Mywiki/* ; do
if [[ -f $d ]]; then
file=$(getname $d)
for x in Mywiki/* ; do
dir=$(getname $x)
if [[ -d $x ]] && [ $dir == $file ]; then
echo $dir
fi
done
fi
done
but stuck with getting those without. if this is the wrong way of doing this please clarify the right one.
any help appreciated. Thanks.
Here's a quick attempt.
for file in Mywiki/*.txt; do
    nodir=${file##*/}
    if test -d "${file%.txt}"; then
        printf "%s\n" "$nodir"       # has a directory: standard output, file "with"
    else
        printf "%s\n" "$nodir" >&3   # orphan: descriptor 3, file "without"
    fi
done >with 3>without
This shamelessly uses standard output for the non-orphans. It may be more robust to open another separate file descriptor for that.
Also notice how everything needs to be quoted unless you specifically require the shell to do whitespace tokenization and wildcard expansion on the value of a token. Here's the scoop on that.
That may not be the most efficient way of doing it, but you could take all files, remove the extension, and then check whether a directory with that name exists.
Like this (untested code, using getname from the question):
for file in Mywiki/* ; do
    if [ -f "$file" ]; then
        dirname=$(getname "$file")
        if [ ! -d "Mywiki/$dirname" ]; then
            echo "$file"
        fi
    fi
done
To List all the files in current dir
list1=`ls -p | grep -v /`
To List all the files in current dir without extension
list2=`ls -p | grep -v / | sed 's/\.[a-z]*//g'`
To List all the directories in current dir
list3=`ls -d */ | sed -e "s/\///g"`
Now you can get the desired directory listing using intersection of list2 and list3. Intersection of two lists in Bash
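One way to sketch that intersection is with comm on two sorted lists. The helper names below are hypothetical, the lists are built with globs instead of parsing ls, and names containing newlines are assumed not to occur.

```shell
#!/bin/bash
# list_stems DIR: names of regular files in DIR with the extension removed
list_stems() {
    local f
    for f in "$1"/*; do
        if [ -f "$f" ]; then
            f=${f##*/}
            printf '%s\n' "${f%%.*}"
        fi
    done | sort -u
}

# list_dirs DIR: names of the directories in DIR
list_dirs() {
    local d
    for d in "$1"/*/; do
        if [ -d "$d" ]; then
            d=${d%/}
            printf '%s\n' "${d##*/}"
        fi
    done | sort -u
}

# comm -12 keeps lines present in both lists, comm -23 lines only in the first
with_dirs()    { comm -12 <(list_stems "$1") <(list_dirs "$1"); }
without_dirs() { comm -23 <(list_stems "$1") <(list_dirs "$1"); }
```

with_dirs Mywiki would print the names that exist both as a file and as a directory, and without_dirs Mywiki the file names (without extension) that have no directory.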

Renaming directories at multiple levels using find from bash

I'm looping over the results of find, and I'm changing every one of those folders, so my problem is that when I encounter:
/aaaa/logs/ and after that: /aaaa/logs/bbb/logs, when I try to mv /aaaa/logs/bbb/logs /aaaa/log/bbb/log it can't find the folder because it has already been renamed. That is, the output from find may report that the name is /aaaa/logs/bbb/logs, when the script previously moved output to /aaaa/log/bbb/.
Simple code:
#!/bin/bash
script_log="/myPath"
echo "Info" > $script_log
search_names_folders=`find /home/ -type d -name "logs*"`
while read -r line; do
mv $line ${line//logs/log} >>$script_log 2>&1
done <<< "$search_names_folders"
My Solution is:
#!/bin/bash
script_log="/myPath"
echo "Info" > $script_log
search_names_folders=`find /home/ -type d -name "logs*"`
while read -r line; do
number_of_occurrences=$(grep -o "logs" <<< "$line" | wc -l)
if [ "$number_of_occurrences" != "1" ]; then
real_path=${line//logs/log} ## get the full path, the suffix will be incorrect
real_path=${real_path%/*} ## get the prefix until the last /
suffix=${line##*/} ## get the real suffix
line=$real_path/$suffix ## add the full correct path to line
mv $line ${line//logs/log} >>$script_log 2>&1
fi
done <<< "$search_names_folders"
But it's a bad idea. Does anyone have another solution?
Thanks!
Use the -depth option to find. This makes it process directory contents before it processes the directory itself.
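A sketch of how that could look is below. It assumes the directories are named exactly logs (the question matches "logs*") and that no directory called log already exists next to them; rename_logs is a hypothetical name. Because the deepest directories are reported first, only the last path component is renamed each time, so the parent paths that find printed are still valid.

```shell
#!/bin/bash
# rename_logs ROOT (hypothetical helper)
# Renames every directory named "logs" under ROOT to "log", deepest first.
rename_logs() {
    local dir
    find "$1" -depth -type d -name 'logs' -print0 |
    while IFS= read -r -d '' dir; do
        # Change only the final component; parent directories are renamed later
        mv -- "$dir" "$(dirname -- "$dir")/log"
    done
}
```

With /aaaa/logs/bbb/logs, the inner directory is first moved to /aaaa/logs/bbb/log, and only afterwards is /aaaa/logs renamed to /aaaa/log.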

Looping through files in different directory given command line argument

I'm trying to extend a script that implements something like a recycling bin for files on Linux. I have the code that I'm extending at the bottom.
In my extension, when the script is presented with the command line argument -cleanup I want to loop through files that are in the /home/7/bearm/.garbage directory, and have the user decide whether they want to delete the file or not.
However, I don't know how to detect when the command line argument is there. The command line can have other parameters, I just want to loop through the files when -cleanup is used.
I also do not know how to loop through files that are in a different directory (/home/7/bearm/.garbage).
How would I go around doing these things?
set directory = '/home/7/bearm/.garbage/'
if(! -d "$directory") then
mkdir .garbage
mv .garbage /home/7/bearm/
endif
set n = 1
while ($n <= $#argv)
set file = $argv[$n]
if(-d $file) then
#do nothing
echo "Cannot trash directory $file"
else
mv $file /home/7/bearm/.garbage
echo "Trashed $file"
endif
@ n++
end
du -h /home/7/bearm/.garbage
To test whether the arguments contain -cleanup, you can do this (tested with ash on Minix3):
if echo "$@" | grep -- "-cleanup" >/dev/null 2>&1; then
    echo "-cleanup is present..."
fi
Moreover, if you want a proper solution to use long GNU style options, see http://www.sputnick-area.net/scripts/getopts_long_example.sh and http://www.sputnick-area.net/scripts/getopts_long.sh
A bash version of your pseudo script :
#!/bin/bash
directory='/home/7/bearm/.garbage/'
mkdir -p "$directory"
for arg; do
if [[ -d $arg ]]; then
#do nothing
echo "Cannot trash directory $arg" >&2
else
mv "$arg" "$directory"
echo "Trashed $arg"
fi
done
du -sh "$directory"
Feel free to improve it with -cleanup switch.
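One possible way to add the -cleanup switch is sketched below. has_flag and cleanup_garbage are hypothetical helper names; the flag is matched as a whole argument, and hidden files in the directory are not visited because the * glob skips them.

```shell
#!/bin/bash
# has_flag FLAG ARGS... (hypothetical helper)
# Succeeds if one of ARGS is exactly FLAG.
has_flag() {
    local flag=$1 arg
    shift
    for arg in "$@"; do
        [[ $arg == "$flag" ]] && return 0
    done
    return 1
}

# cleanup_garbage DIR (hypothetical helper)
# Asks about every regular file in DIR and deletes it on "y" or "Y".
cleanup_garbage() {
    local dir=$1 file answer
    for file in "$dir"/*; do
        [[ -f $file ]] || continue
        read -r -p "Delete $file? [y/N] " answer
        if [[ $answer == [yY] ]]; then
            rm -- "$file"
        fi
    done
}
```

In the script above this could be used as: if has_flag -cleanup "$@"; then cleanup_garbage "$directory"; fi. The trashing loop would then also need to skip the -cleanup argument itself.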

Resources