Selecting and looping over the differences between two directories in Linux

I have a bash script that loops through the files in the raw folder and puts converted copies into the audio folder. This works just fine.
#!/bin/bash
PATH_IN='/nas/data/customers/test2/raw/'
PATH_OUT='/nas/data/customers/test2/audio/'
mkdir -p "$PATH_OUT"
IFS=$'\n'
find "$PATH_IN" -type f -name '*.wav' -exec basename {} \; | while read -r file; do
    sox -S "${PATH_IN}${file}" -e signed-integer "${PATH_OUT}${file}"
done
My issue is that, as the folders grow, I do not want to run the script on files that have already been converted; I would like to loop over only the files that have not been converted yet, i.e. the files that are in raw but not in audio.
I found the command
diff audio raw
that can do just that, but I cannot find a good way to incorporate it into my bash script. Any help or nudges in the right direction would be highly appreciated.

You could do:
diff <(ls -1a $PATH_OUT) <(ls -1a $PATH_IN) | grep -E ">" | sed -E 's/> //'
The first part diffs the file listings of the two folders, the second part filters the output down to the additions only, and the third strips the diff symbols so you are left with just the names.
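Folded back into your original script, that might look something like this (a minimal sketch using the same PATH_IN and PATH_OUT as above; only the filenames reported as additions are handed to sox):
#!/bin/bash
PATH_IN='/nas/data/customers/test2/raw/'
PATH_OUT='/nas/data/customers/test2/audio/'
mkdir -p "$PATH_OUT"
# Convert only the files that exist in raw but not yet in audio.
diff <(ls -1a "$PATH_OUT") <(ls -1a "$PATH_IN") | grep -E '^>' | sed -E 's/^> //' | while read -r file; do
    sox -S "${PATH_IN}${file}" -e signed-integer "${PATH_OUT}${file}"
done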

Related

Need a shell script that bzip2's every single file recursively from a certain folder

I know that writing these scripts is really not my strong side.
I need a shell script that recursively packs every single file in its folder into the .bz2 format, because I have a lot of files and doing this manually takes me hours.
For example, here are a lot of files (many more than in this example):
/home/user/data/file1.yyy
/home/user/data/file2.xxx
/home/user/data/file3.zzz
/home/user/data/file4.txt
/home/user/data/file5.deb
/home/user/data/moredata/file1.xyz
/home/user/data/muchmoredata/file1.xyx
And I need them all compressed into .bz2 like this:
/home/user/data/file1.yyy.bz2
/home/user/data/file2.xxx.bz2
/home/user/data/file3.zzz.bz2
/home/user/data/file4.txt.bz2
/home/user/data/file5.deb.bz2
/home/user/data/moredata/file1.xyz.bz2
/home/user/data/muchmoredata/file1.xyx.bz2
Another thing that would be great: at the end, the script should run chown -R example:example /home/user/data once.
I hope you can help me.
bzip2 will accept multiple files as arguments on the command line. To solve your specific example, I would do
cd /home/user/
find . -type f | egrep -v '\.bz2' | xargs bzip2 -9 &
This will find all files under /home/user, exclude any already existing .bz2 files from processing, and then send the remaining list via xargs to bzip2. The -9 gives you maximum compression (but takes more time). There is no limit to the number or length of filenames that can be processed when using xargs to feed the command (in this case bzip2).
The & char means "run all of this in the background". The command prompt will return to you immediately and you can continue other work, but don't expect all the files to be compressed for a while. At some point you'll also get messages like 'Job %1 ... started' and, later, 'Job %1 ... finished'.
As you asked for a script, we can also do this
#!/bin/bash
if [[ ! -d "$1" ]] ; then
    echo "usage: b2zipper /path/to/dir/to/search" 1>&2
    exit 1
fi
find "$1" -type f | egrep -v '\.bz2' | xargs bzip2 -9 &
Save this as b2zipper, and then make it executable with
chmod +x b2zipper
IHTH
To build on the accepted answer, an alternative would be:
find /path/to/dir -type f -exec bzip2 {} \;
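If you also want the chown -R example:example /home/user/data step asked for in the question, a minimal sketch combining both might be:
#!/bin/bash
# Compress every file that is not already a .bz2, then fix ownership
# (the example:example owner is taken from the question).
find /home/user/data -type f ! -name '*.bz2' -exec bzip2 -9 {} +
chown -R example:example /home/user/data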

Move files to directories based on extension

I am new to Linux. I am trying to write a shell script that moves files to certain folders based on their extension; for example, my downloads folder contains files of mixed types. I have written the following script:
mv *.mp3 ../Music
mv *.ogg ../Music
mv *.wav ../Music
mv *.mp4 ../Videos
mv *.flv ../Videos
How can I make it run automatically when a file is added to this folder? At the moment I have to run the script manually each time.
One more question: is there any way of combining these two statements
mv *.mp3 ../../Music
mv *.ogg ../../Music
into a single statement? I tried using || (the C 'or' operator) and a comma, but they don't seem to work.
There is no trigger for when a file is added to a directory. If the file is uploaded via a webpage, you might be able to make the webpage do it.
You can put the script in crontab to run it periodically on Unix machines (or use Task Scheduler on Windows). Google crontab for a how-to.
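For example, a crontab entry like this would run the sorting script every five minutes (a sketch; the script path is an assumption):
# crontab -e, then add:
*/5 * * * * /home/user/bin/sort_downloads.sh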
As for combining your commands, use the following:
mv *.mp3 *.ogg ../../Music
You can include as many different "globs" (filenames with wildcards) as you like. The last thing should be the target directory.
Two ways:
find . -name '*mp3' -or -name '*ogg' -print | xargs -J% mv % ../../Music
find . -name '*mp3' -or -name '*ogg' -exec mv {} ../Music \;
The first uses a pipe and may run out of argument space, while the second may use too many forks and be slower. But both will work.
Another way is:
mv -v {*.mp3,*.ogg,*.wav} ../Music
mv -v {*.mp4,*.flv} ../Videos
PS: option -v shows what is going on (verbose).
I like this method:
#!/bin/bash
for filename in *; do
if [[ -f "$filename" ]]; then
base=${filename%.*}
ext=${filename#$base.}
mkdir -p "${ext}"
mv "$filename" "${ext}"
fi
done
incron will watch the filesystem and run commands upon certain events.
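A minimal incrontab entry might look roughly like this (a sketch; the watched directory and script path are assumptions):
# incrontab -e, then add a line of the form: <path> <event mask> <command>
/home/user/Downloads IN_CLOSE_WRITE /home/user/bin/sort_downloads.sh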
You can combine multiple commands on a single line by using a command separator. The unconditional serialized command separator is ;.
command1 ; command2
You can use a for loop to traverse the folders and subfolders inside the source folder.
The following code moves files in pairs from "/source/folder/path/" to "/destination/folder/path/": it moves files that share the same name but have different extensions.
for d in /source/folder/path/*; do
    ls -tr $d | grep txt | rev | cut -f 2 -d '.' | rev | uniq | head -n 4 | xargs -I % bash -c 'mv -v '$d'/%.{txt,csv} /destination/folder/path/'
    sleep 30
done

bash - redirect ls into custom script

For college I am writing a script to read and display ID3 tags in mp3 files. The arguments would be the files, e.g.
./id3.sh file1.mp3 file2.mp3 morefiles.mp3
I can read the arguments using $0, $1, etc. and get the number of args with $#. How can I get it to read the output from an ls command?
ls *.mp3 | ./id3.sh
Try this:
ls *.mp3 | xargs id3.sh
The ls *.mp3 > ./id3.sh command is going to overwrite your id3.sh script with the list of mp3's. You can try this:
./id3.sh `ls *.mp3`
EDIT: actually, what was I thinking? Is there a reason you just can't do this?
./id3.sh *.mp3
I would suggest using a pipe and xargs with the -n argument. In the example below, the id3.sh script will be called with at most 10 of the files listed by ls *.mp3. This is very important, especially if you can have hundreds or thousands of files in the list. If you omit the -n 10, your script will be called only once with the whole list. If the list is too long, your system may refuse to run your script. You can experiment with how many files each invocation of your script should process (e.g. what is more efficient in your case).
ls *.mp3 | xargs -n 10 id3.sh
then you can read the files in your id3.sh script like this
while [ "$1" != "" ]; do
#next file available in ${1}
shift
done
Any solution involving the expansion of *.mp3 risks failure if the number of .mp3 files is so large that the resultant expanded *.mp3 exceeds the shell's limit. The solutions above all have this problem:
ls *.mp3 | ...
for file in *.mp3; do ...
In fact, although ls *.mp3 | xargs ... is a good start, it has the same problem, because it still requires the shell to expand the *.mp3 list and pass that list as command-line arguments to the ls command.
One way to properly handle an arbitrary number of files is:
find . -maxdepth 1 -iname '*.mp3'|while read f; do
do_something_one_file_at_a_time.sh "$f"
done
OR:
find . -maxdepth 1 -iname '*.mp3' -print0|xargs -0 do_something.sh
(Both variants have the side benefit of properly handling filenames with spaces, e.g. "Raindrops Keep Falling On My Head.mp3".
Note that in do_something.sh, you need to do for file in "$@"; do ... and not just for file in $*; do ... or for file in $@; do ....
Note also that amit_g's solution breaks if there are filenames with spaces.)
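A minimal skeleton for such a do_something.sh along those lines (the script name comes from the examples above; the echo is just a placeholder):
#!/bin/bash
# Loop over all arguments; quoting "$@" preserves filenames containing spaces.
for file in "$@"; do
    echo "processing: $file"    # replace with the real ID3 handling
done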
What's wrong with ./id3.sh *.mp3? It's safer than any solution with ls, and provides exactly the same globbing features. There's no need for xargs here, unless you're using an old kernel and have enormous amounts of files.
./id3.sh *.mp3 # if the number of files is not too many
or
ls *.mp3 | xargs -n 10 ./id3.sh # if the number of files could be too many
then in the id3.sh
while [ "$1" != "" ]
do
filename=$1
#do whatever with $filename
shift
done

How can I calculate an MD5 checksum of a directory?

I need to calculate a summary MD5 checksum for all files of a particular type (*.py for example) placed under a directory and all sub-directories.
What is the best way to do that?
The proposed solutions are very nice, but this is not exactly what I need. I'm looking for a solution to get a single summary checksum which will uniquely identify the directory as a whole - including content of all its subdirectories.
Create a tar archive file on the fly and pipe that to md5sum:
tar c dir | md5sum
This produces a single MD5 hash value that should be unique to your file and sub-directory setup. No files are created on disk.
find /path/to/dir/ -type f -name "*.py" -exec md5sum {} + | awk '{print $1}' | sort | md5sum
The find command lists all the files that end in .py.
The MD5 hash value is computed for each .py file. AWK is used to pick off the MD5 hash values (ignoring the filenames, which may not be unique).
The MD5 hash values are sorted. The MD5 hash value of this sorted list is then returned.
I've tested this by copying a test directory:
rsync -a ~/pybin/ ~/pybin2/
I renamed some of the files in ~/pybin2.
The find...md5sum command returns the same output for both directories.
2bcf49a4d19ef9abd284311108d626f1 -
To take into account the file layout (paths), so the checksum changes if a file is renamed or moved, the command can be simplified:
find /path/to/dir/ -type f -name "*.py" -exec md5sum {} + | md5sum
On macOS with md5:
find /path/to/dir/ -type f -name "*.py" -exec md5 {} + | md5
ire_and_curses's suggestion of using tar c <dir> has some issues:
tar processes directory entries in the order in which they are stored in the filesystem, and there is no way to change this order. This can effectively yield completely different results if you have the "same" directory in different places, and I know of no way to fix this (tar cannot "sort" its input files into a particular order).
I usually care about whether the group ID and owner ID numbers are the same, not necessarily whether the string representations of the group/owner are the same. This is in line with what, for example, rsync -a --delete does: it synchronizes virtually everything (minus xattrs and ACLs), but it syncs owner and group based on their IDs, not on the string representation. So if you sync to a different system that doesn't necessarily have the same users/groups, you should add the --numeric-owner flag to tar.
tar will include the filename of the directory you're checking itself, just something to be aware of.
As long as there is no fix for the first problem (or unless you're sure it does not affect you), I would not use this approach.
The proposed find-based solutions are also no good, because they only include files, not directories, which becomes an issue if the checksum should take empty directories into account.
Finally, most suggested solutions don't sort consistently, because the collation might be different across systems.
This is the solution I came up with:
dir=<mydir>; (find "$dir" -type f -exec md5sum {} +; find "$dir" -type d) | LC_ALL=C sort | md5sum
Notes about this solution:
The LC_ALL=C is to ensure reliable sorting order across systems
This doesn't differentiate between a directory "named\nwithanewline" and two directories "named" and "withanewline", but the chance of that occurring seems very small. One usually fixes this with a -print0 flag for find, but since there's other stuff going on here, I can only see solutions that would make the command more complicated than it's worth.
PS: one of my systems uses a limited busybox find which supports neither the -exec nor the -print0 flag, and it also appends '/' to denote directories while findutils find doesn't seem to, so for this machine I need to run:
dir=<mydir>; (find "$dir" -type f | while read f; do md5sum "$f"; done; find "$dir" -type d | sed 's#/$##') | LC_ALL=C sort | md5sum
Luckily, I have no files/directories with newlines in their names, so this is not an issue on that system.
If you only care about files and not empty directories, this works nicely:
find /path -type f | sort -u | xargs cat | md5sum
A solution which worked best for me:
find "$path" -type f -print0 | sort -z | xargs -r0 md5sum | md5sum
Reasons why it worked best for me:
Handles file names containing spaces
Ignores filesystem meta-data
Detects if file has been renamed
Issues with other answers:
Filesystem meta-data is not ignored for:
tar c - "$path" | md5sum
Does not handle file names containing spaces nor detects if file has been renamed:
find /path -type f | sort -u | xargs cat | md5sum
For the sake of completeness, there's md5deep(1); it's not directly applicable due to the *.py filter requirement, but it should do fine together with find(1).
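A possible combination (a sketch; md5deep accepts filenames on the command line much like md5sum does):
# Hash only the .py files, then hash the sorted list of per-file hashes.
find /path/to/dir -type f -name '*.py' -exec md5deep {} + | sort | md5sum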
If you want one MD5 hash value spanning the whole directory, I would do something like
cat *.py | md5sum
Checksum all files, including both content and their filenames
grep -ar -e . /your/dir | md5sum | cut -c-32
Same as above, but only including *.py files
grep -ar -e . --include="*.py" /your/dir | md5sum | cut -c-32
You can also follow symlinks if you want
grep -aR -e . /your/dir | md5sum | cut -c-32
Other options you could consider using with grep
-s, --no-messages suppress error messages
-D, --devices=ACTION how to handle devices, FIFOs and sockets;
-Z, --null print 0 byte after FILE name
-U, --binary do not strip CR characters at EOL (MSDOS/Windows)
GNU find
find /path -type f -name "*.py" -exec md5sum "{}" +
Technically you only need to run ls -lR *.py | md5sum. Unless you are worried about someone modifying the files and touching them back to their original dates and never changing the files' sizes, the output from ls should tell you if the file has changed. My unix-foo is weak so you might need some more command line parameters to get the create time and modification time to print. ls will also tell you if permissions on the files have changed (and I'm sure there are switches to turn that off if you don't care about that).
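With GNU ls, for example, something like this might do (a sketch; --full-time prints the full-resolution modification timestamps):
ls -lR --full-time *.py | md5sum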
Using md5deep:
md5deep -r FOLDER | awk '{print $1}' | sort | md5sum
I want to add that if you are trying to do this for files/directories in a Git repository to track if they have changed, then this is the best approach:
git log -1 --format=format:%H --full-diff <file_or_dir_name>
And if it's not a Git directory/repository, then the answer by ire_and_curses is probably the best bet:
tar c <dir_name> | md5sum
However, please note that the tar command may produce a different hash if you run it on a different OS and so on. If you want to be immune to that, this is the best approach, even though it doesn't look very elegant at first sight:
find <dir_name> -type f -print0 | sort -z | xargs -0 md5sum | md5sum | awk '{ print $1 }'
md5sum worked fine for me, but I had issues with sort and sorting file names. So instead I sorted by md5sum result. I also needed to exclude some files in order to create comparable results.
find . -type f -print0 \
| xargs -r0 md5sum \
| grep -v ".env" \
| grep -v "vendor/autoload.php" \
| grep -v "vendor/composer/" \
| sort -d \
| md5sum
I had the same problem, so I came up with this script that just lists the MD5 hash values of the files in the directory; if it finds a subdirectory, it runs again from there. For this to happen, the script has to be run from the current directory, or a subdirectory can be passed as argument $1.
#!/bin/bash
if [ -z "$1" ] ; then
    # loop over the current dir
    ls | while read line; do
        ecriv=`pwd`"/"$line
        if [ -f "$ecriv" ] ; then
            md5sum "$ecriv"
        elif [ -d "$ecriv" ] ; then
            sh myScript "$line"   # call this script again on the subdirectory
        fi
    done
else   # if a directory is specified in argument $1
    ls "$1" | while read line; do
        ecriv=`pwd`"/$1/"$line
        if [ -f "$ecriv" ] ; then
            md5sum "$ecriv"
        elif [ -d "$ecriv" ] ; then
            sh myScript "$1/$line"   # recurse with the path relative to the starting dir
        fi
    done
fi
If you really want independence from the filesystem attributes and from the bit-level differences of some tar versions, you could use cpio:
cpio -i -e theDirname | md5sum
There are two more solutions:
Create:
du -csxb /path | md5sum > file
ls -alR -I dev -I run -I sys -I tmp -I proc /path | md5sum > /tmp/file
Check:
du -csxb /path | md5sum -c file
ls -alR -I dev -I run -I sys -I tmp -I proc /path | md5sum -c /tmp/file

How to copy a file to multiple directories using the gnu cp command

Is it possible to copy a single file to multiple directories using the cp command?
I tried the following, which did not work:
cp file1 /foo/ /bar/
cp file1 {/foo/,/bar}
I know it's possible using a for loop, or find. But is it possible using the gnu cp command?
You can't do this with cp alone but you can combine cp with xargs:
echo dir1 dir2 dir3 | xargs -n 1 cp file1
This will copy file1 to dir1, dir2, and dir3. xargs will call cp three times to do this; see the man page for xargs for details.
No, cp can copy multiple sources but will only copy to a single destination. You need to arrange to invoke cp multiple times - once per destination - for what you want to do; using, as you say, a loop or some other tool.
Wildcards also work with Robert's code:
echo ./fs*/* | xargs -n 1 cp test
I would use cat and tee based on the answers I saw at https://superuser.com/questions/32630/parallel-file-copy-from-single-source-to-multiple-targets instead of cp.
For example:
cat inputfile | tee outfile1 outfile2 > /dev/null
As far as I can see, you can use the following:
ls | xargs -n 1 cp -i file.dat
The -i option of the cp command means that you will be asked whether to overwrite a file in the current directory with file.dat. Though it is not a completely automatic solution, it worked out for me.
These answers all seem more complicated than the obvious:
for i in /foo /bar; do cp "$file1" "$i"; done
ls -db di*/subdir | xargs -n 1 cp File
The -b is there in case there is a space in a directory name; otherwise xargs would break it into separate items. I had this problem with the echo version.
Not using cp per se, but...
This came up for me in the context of copying lots of Gopro footage off of a (slow) SD card to three (slow) USB drives. I wanted to read the data only once, because it took forever. And I wanted it recursive.
$ tar cf - src | tee >( cd dest1 ; tar xf - ) >( cd dest2 ; tar xf - ) | ( cd dest3 ; tar xf - )
(And you can add more of those >() sections if you want more outputs.)
I haven't benchmarked that, but it's definitely a lot faster than cp-in-a-loop (or a bunch of parallel cp invocations).
If you want to do it without a forked command:
tee <inputfile file2 file3 file4 ... >/dev/null
To copy with xargs to directories matched by wildcards on macOS, the only solution that worked for me with spaces in the directory names is:
find ./fs*/* -type d -print0 | xargs -0 -n 1 cp test
where test is the file to copy
and ./fs*/* are the directories to copy to.
The problem is that xargs treats spaces as argument separators; the solutions that change the delimiter character using -d or -E unfortunately do not work properly on macOS.
Essentially equivalent to the xargs answer, but in case you want parallel execution:
parallel -q cp file1 ::: /foo/ /bar/
So, for example, to copy file1 into all subdirectories of the current folder (including recursion):
parallel -q cp file1 ::: `find -mindepth 1 -type d`
N.B.: This probably only yields noticeable speed gains for very specific use cases, e.g. if each target directory is on a distinct disk.
It is also functionally similar to the '-P' argument for xargs.
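A rough xargs equivalent of that parallelism might be (a sketch; -P 2 runs two cp processes at once and -n 1 passes one directory per invocation):
echo /foo/ /bar/ | xargs -n 1 -P 2 cp file1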
No - you cannot.
I've found on multiple occasions that I could use this functionality so I've made my own tool to do this for me.
http://github.com/ddavison/branch
pretty simple -
branch myfile dir1 dir2 dir3
ls -d */ | xargs -iA cp file.txt A
Suppose you want to copy fileName.txt to all sub-directories within the present working directory.
Get the names of all sub-directories through ls and save them to a temporary file, say allFolders.txt:
ls > allFolders.txt
Print the list and pass it to the xargs command:
cat allFolders.txt | xargs -n 1 cp fileName.txt
Another way is to use cat and tee as follows:
cat <source file> | tee <destination file 1> | tee <destination file 2> [...] > <last destination file>
I think this would be pretty inefficient though, since the job would be split among several processes (one per destination) and the hard drive would be writing several files at once over different parts of the platter. However if you wanted to write a file out to several different drives, this method would probably be pretty efficient (as all copies could happen concurrently).
Using a bash script
DESTINATIONPATH[0]="xxx/yyy"
DESTINATIONPATH[1]="aaa/bbb"
..
DESTINATIONPATH[5]="MainLine/USER"
NumberOfDestinations=6
for (( i=0; i<NumberOfDestinations; i++ ))
do
    cp SourcePath/fileName.ext "${DESTINATIONPATH[$i]}"
done
exit
If you want to copy multiple folders to multiple folders, you can do something like this:
echo dir1 dir2 dir3 | xargs -n 1 cp -r /path/toyourdir/{subdir1,subdir2,subdir3}
If all your target directories match a path expression — like they're all subdirectories of path/to — then just use find in combination with cp like this:
find ./path/to/* -type d -exec cp [file name] {} \;
That's it.
If you need to be specific about which folders to copy the file into, you can combine find with one or more greps. For example, to replace any occurrences of favicon.ico in any subfolder, you can use:
find . | grep favicon\.ico | xargs -n 1 cp -f /root/favicon.ico
This will copy to the immediate sub-directories; if you want to go deeper, adjust the -maxdepth parameter.
find . -mindepth 1 -maxdepth 1 -type d | xargs -n 1 cp -i index.html
If you don't want to copy to all directories, you can filter out the directories you are not interested in. For example, copying to all folders starting with a:
find . -mindepth 1 -maxdepth 1 -type d | grep \/a | xargs -n 1 cp -i index.html
If copying to an arbitrary/disjoint set of directories, you'll need Robert Gamble's suggestion.
I like to copy a file into multiple directories as such:
cp file1 /foo/; cp file1 /bar/; cp file1 /foo2/; cp file1 /bar2/
And copying a directory into other directories:
cp -r dir1/ /foo/; cp -r dir1/ /bar/; cp -r dir1/ /foo2/; cp -r dir1/ /bar2/
I know it's like issuing several commands, but it works well for me when I want to type 1 line and walk away for a while.
For example, if you are in the parent directory of your destination folders, you can do:
for i in $(ls); do cp sourcefile $i; done
