Need a shell script that bzip2's every single file recursively from a certain folder - linux

I know that writing scripts is really not my strong side.
I need a shell script which recursively compresses every single file under a folder into the .bz2 format, because I have a lot of files and doing this manually takes me hours.
For example, here are a lot of files (many more than in this example):
/home/user/data/file1.yyy
/home/user/data/file2.xxx
/home/user/data/file3.zzz
/home/user/data/file4.txt
/home/user/data/file5.deb
/home/user/data/moredata/file1.xyz
/home/user/data/muchmoredata/file1.xyx
And I need them all formatted into .bz2 like this:
/home/user/data/file1.yyy.bz2
/home/user/data/file2.xxx.bz2
/home/user/data/file3.zzz.bz2
/home/user/data/file4.txt.bz2
/home/user/data/file5.deb.bz2
/home/user/data/moredata/file1.xyz.bz2
/home/user/data/muchmoredata/file1.xyx.bz2
Another thing that would be great: at the end, the script should run chown -R example:example /home/user/data once.
I hope you can help me

bzip2 will accept multiple files as arguments on the command line. To solve your specific example, I would do
cd /home/user/
find . -type f | egrep -v '\.bz2$' | xargs bzip2 -9 &
This will find all files under /home/user, exclude any already existing .bz2 files from processing, and then send the remaining list via xargs to bzip2. The -9 gives you maximum compression (but takes more time). There is no limit to the number or length of filenames that can be processed when using xargs to feed the command (in this case bzip2).
The & character means "run all of this in the background": the command prompt returns to you immediately and you can continue other work, but don't expect all the files to be compressed for a while. At some point you'll also get job-control messages like '[1] 12345' when the job starts and '[1]+ Done' when it finishes.
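If any filenames contain spaces, the plain xargs pipeline above will mangle them. A null-delimited variant avoids that, and also skips already-compressed files without needing egrep. This is a sketch, assuming GNU find and bzip2 are available; the demo directory and files are throwaway values:

```shell
# Demo in a throwaway directory; bzip2 is assumed to be installed.
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf 'hello\n' > "$dir/file one.txt"
printf 'world\n' > "$dir/sub/file2.log"
# -print0/xargs -0 keeps filenames with spaces intact; ! -name skips existing .bz2
find "$dir" -type f ! -name '*.bz2' -print0 | xargs -0 bzip2 -9
```

After this runs, each original file is replaced by its .bz2 counterpart, just as bzip2 does when invoked directly.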
As you asked for a script, we can also do this
#!/bin/bash
if [[ ! -d "$1" ]] ; then
    echo "usage: b2zipper /path/to/dir/to/search" 1>&2
    exit 1
fi
find "$1" -type f | egrep -v '\.bz2$' | xargs bzip2 -9 &
Save this as b2zipper, and then make it executable with
chmod +x b2zipper
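The question also asked for a final chown -R. A hedged sketch of adding that step (TARGET and OWNER are demo stand-ins: the question used /home/user/data and example:example, which requires root, so here the owner is taken from the current user to keep the sketch runnable unprivileged):

```shell
#!/bin/bash
# Sketch: compress everything under $TARGET, then fix ownership once at the end.
# TARGET and OWNER are demo values standing in for /home/user/data and example:example.
TARGET=$(mktemp -d)
OWNER=$(id -un)
touch "$TARGET/a.txt" "$TARGET/b.log"
find "$TARGET" -type f ! -name '*.bz2' -print0 | xargs -0 bzip2 -9
# No trailing &, so chown only runs after all compression has finished
chown -R "$OWNER" "$TARGET"
```

Note the & is deliberately dropped here; if compression ran in the background, the chown could race ahead of it.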
IHTH

To build on the accepted answer, an alternative would be:
find /path/to/dir -type f -exec bzip2 {} \;

Related

Unix: How to do mkdir, cp, without mkdir?

So script2 is:
find /some_directory -type f -not -iname "*.pdf" -exec bash -c './script "{}"' \; -print > temp_file
while read line
do
mkdir -p result"$(dirname "$line")"
cp "$line" ~/result"$(dirname "$line")"/$(basename "$line" .txt).pdf
done < temp_file
rm temp_file
The ./script is: file "$1" | grep -q PDF
These two combined should find .txt files that are actually PDF files, copy them to some result directory, and rename them to .pdf. But the files should end up in result/their/original/directories/file.pdf (if the original was some_directory/their/original/directories/file.txt; also, some directories have spaces in them, hence all the quoting).
It is done, it works, but the question is: how to do it without mkdir?
I've tried many things but none seem to work, and every post I read says it can be done just by using mkdir. But because my professor demands it be done without mkdir, I'm quite sure it can be done. (I don't HAVE to do it, but I'd like to know how.)
Maybe it can be done in find with piping? (Some argument that would print directories first, so they would be copied one by one, and then the files?) I've spent a lot of time on this and it would be a shame to quit without the correct answer.
You can copy a file while creating the leading target directories with install:
install -D src/file dest/src/file
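Applied to the exercise above, a sketch (paths are demo values; the mkdir here only builds the sample input tree, not the result tree, which install -D creates on its own):

```shell
# Demo: copy a "PDF-in-disguise" .txt into result/.../file.pdf without mkdir,
# letting install -D create all leading directories of the destination.
src=$(mktemp -d)
mkdir -p "$src/their original/dirs"        # setup only: builds the sample input
printf '%%PDF-1.4\n' > "$src/their original/dirs/file.txt"
dest=$(mktemp -d)
# -D creates every missing leading component of the destination path
install -D "$src/their original/dirs/file.txt" \
    "$dest/result$src/their original/dirs/file.pdf"
```

The quoting also handles the directories-with-spaces case the question mentions.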

Selecting and looping over differences between two directories in Linux

I have a bash script that loops through files in the raw folder and puts them into the audio folder. This works just fine.
#!/bin/bash
PATH_IN=('/nas/data/customers/test2/raw/')
PATH_OUT=('/nas/data/customers/test2/audio/')
mkdir -p /nas/data/customers/test2/audio
IFS=$'\n'
find "$PATH_IN" -type f -name '*.wav' -exec basename {} \; | while read -r file; do
    sox -S "${PATH_IN}${file}" -e signed-integer "${PATH_OUT}${file}"
done
My issue is that, as the folders grow, I do not want to run the script on files that have already been converted, so I would like to loop over only the files that have not been converted yet, i.e. the files in raw but not in audio.
I found the command
diff audio raw
which can do just that, but I cannot find a good way to incorporate it into my bash script. Any help or nudges in the right direction would be highly appreciated.
You could do:
diff <(ls -1a $PATH_OUT) <(ls -1a $PATH_IN) | grep -E ">" | sed -E 's/> //'
The first part will diff the files on both folders, the second part will filter out to get only the additions, and the third one will clean the list from the diff symbols to get just the names.
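One hedged way to wire this into the script: build sorted name lists and use comm -23 (an alternative to parsing diff's ">" markers) to get only the unconverted files. The paths below are demo values, and cp stands in for the sox call from the question:

```shell
# Sketch: process only files present in raw/ but not yet in audio/.
# comm -23 prints lines unique to the first sorted list.
base=$(mktemp -d)
mkdir -p "$base/raw" "$base/audio"
touch "$base/raw/a.wav" "$base/raw/b.wav" "$base/audio/a.wav"
ls "$base/raw"   | sort > "$base/raw.list"
ls "$base/audio" | sort > "$base/audio.list"
comm -23 "$base/raw.list" "$base/audio.list" | while read -r file; do
    cp "$base/raw/$file" "$base/audio/$file"   # stand-in for: sox -S ... -e signed-integer ...
done
```

Here only b.wav gets processed, since a.wav already exists in audio/.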

Move files to directories based on extension

I am new to Linux. I am trying to write a shell script which will move files to certain folders based on their extension. For example, my Downloads folder contains files of mixed types, and I have written the following script:
mv *.mp3 ../Music
mv *.ogg ../Music
mv *.wav ../Music
mv *.mp4 ../Videos
mv *.flv ../Videos
How can I make it run automatically when a file is added to this folder? Now I have to manually run the script each time.
One more question, is there any way of combining these 2 statements
mv *.mp3 ../../Music
mv *.ogg ../../Music
into a single statement? I tried using || (C programming 'or' operator) and comma but they don't seem to work.
There is no built-in shell trigger for when a file is added to a directory. If the file is uploaded via a webpage, you might be able to make the webpage do it.
You can put the script in crontab to run it periodically on Unix machines (or Task Scheduler on Windows). Google crontab for a how-to.
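As a hedged example, a crontab entry along these lines (the script path is an assumption) would run the sorting script every five minutes:

```
# m h dom mon dow  command  (install with: crontab -e)
*/5 * * * * /home/user/Downloads/sort_downloads.sh
```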
As for combining your commands, use the following:
mv *.mp3 *.ogg ../../Music
You can include as many different "globs" (filenames with wildcards) as you like. The last thing should be the target directory.
Two ways:
find . -name '*mp3' -or -name '*ogg' -print | xargs -J% mv % ../../Music
find . -name '*mp3' -or -name '*ogg' -exec mv {} ../../Music \;
The first uses a pipe and batches arguments through xargs (note -J is BSD xargs syntax); the second forks one mv per file and may be slower. But both will work.
Another way is:
mv -v {*.mp3,*.ogg,*.wav} ../Music
mv -v {*.mp4,*.flv} ../Videos
PS: option -v shows what is going on (verbose).
I like this method:
#!/bin/bash
for filename in *; do
    if [[ -f "$filename" ]]; then
        base=${filename%.*}
        ext=${filename#$base.}
        mkdir -p "${ext}"
        mv "$filename" "${ext}"
    fi
done
incron will watch the filesystem and perform run commands upon certain events.
You can combine multiple commands on a single line by using a command separator. The unconditional serialized command separator is ;.
command1 ; command2
You can use a for loop to traverse the folders and subfolders inside the source folder.
The following code will move files in pairs from /source/folder/path/ to /destination/folder/path/, matching files by name across different extensions.
for d in /source/folder/path/*; do
    ls -tr "$d" | grep txt | rev | cut -f 2 -d '.' | rev | uniq | head -n 4 | xargs -I % bash -c 'mv -v '"$d"'/%.{txt,csv} /destination/folder/path/'
    sleep 30
done

What is the best and the fastest way to delete large directory containing thousands of files (in ubuntu)

As I know, commands like
find <dir> -type f -exec rm {} \;
are not the best option for removing a large number of files (total files, including subfolders). They work fine if you have a small number of files, but if you have 10+ million files in subfolders, they can hang a server.
Does anyone know any specific linux commands to solve this problem?
It may seem strange but:
$ rm -rf <dir>
Here's an example bash script:
#!/bin/bash
LOCKFILE=/tmp/rmHugeNumberOfFiles.lock
# this process gets ultra-low priority
ionice -c2 -n7 -p $$ > /dev/null
if [ $? -ne 0 ]; then
echo "Could not set disk IO priority. Exiting..."
exit
fi
renice +19 -p $$ > /dev/null
if [ $? -ne 0 ]; then
echo "Could not renice process. Exiting..."
exit
fi
# check if there's an instance running already. If so--exit
if [ -e ${LOCKFILE} ] && kill -0 `cat ${LOCKFILE}`; then
echo "An instance of this script is already running."
exit
fi
# make sure the lockfile is removed when we exit. Then: claim the lock
trap "command rm -f -- $LOCKFILE; exit" INT TERM EXIT
echo $$ > $LOCKFILE
# also create a tempfile, and make sure that's removed too upon exit
tmp=$(tempfile) || exit
trap "command rm -f -- '$tmp' $LOCKFILE; exit" INT TERM EXIT
# ----------------------------------------
# option 1
# ----------------------------------------
# find your specific files
find "$1" -type f [INSERT SPECIFIC SEARCH PATTERN HERE] > "$tmp"
xargs rm -- < "$tmp"
# ----------------------------------------
# option 2
# ----------------------------------------
command rm -r "$1"
# remove the lockfile, tempfile
command rm -f -- "$tmp" $LOCKFILE
This script starts by setting its own process priority and diskIO priority to very low values, to ensure other running processes are as unaffected as possible.
Then it makes sure that it is the ONLY such process running.
The core of the script is really up to your preference. You can use rm -r if you are sure that the whole dir can be deleted indiscriminately (option 2), or you can use find for more specific file deletion (option 1, possibly using command-line options "$2" and onward for convenience).
In the implementation above, Option 1 (find) first outputs everything to a tempfile, so that the rm function is only called once instead of after each file found by find. When the number of files is indeed huge, this can amount to a significant time saving. On the downside, the size of the tempfile may become an issue, but this is only likely if you're deleting literally billions of files, plus, because the diskIO has such low priority, using a tempfile followed by a single rm may in total be slower than using the find (...) -exec rm {} \; option. As always, you should experiment a bit to see what best fits your needs.
EDIT: As suggested by user946850, you can also skip the whole tempfile and use find (...) -print0 | xargs -0 rm. This has a larger memory footprint, since all full paths to all matching files will be inserted in RAM until the find command is completely finished. On the upside: there is no additional file IO due to writes to the tempfile. Which one to choose depends on your use-case.
The -r (recursive) switch removes everything below a directory, too -- including subdirectories. (Your command does not remove the directories, only the files.)
You can also speed up the find approach:
find -type f -print0 | xargs -0 rm
I tried every one of these commands, but the problem I had was that the deletion process was locking the disk, and since no other processes could access it, there was a big pileup of processes trying to access the disk, making the problem worse. Run "iotop" and see how much disk IO your process is using.
Here's the Python script that solved my problem. It deletes 500 files at a time, then takes a 2-second break to let the other processes do their business, then continues.
import os, os.path
import time

for root, dirs, files in os.walk('/dir/to/delete/files'):
    i = 0
    file_num = 0
    for f in files:
        fullpath = os.path.join(root, f)
        i = i + 1
        file_num = file_num + 1
        os.remove(fullpath)
        if i % 500 == 1:
            time.sleep(2)
    print("Deleted %i files" % file_num)
Hope this helps some people.
If you need to deal with a space-limit issue on a very large file tree (in my case many Perforce branches), where the find-and-delete process sometimes hangs:
Here's a script that I schedule daily to find all directories containing a specific file ("ChangesLog.txt"), sort all matching directories older than 2 days, and remove the first matched directory (each run there could be a new match):
bash -c "echo #echo Creating Cleanup_Branch.cmd on %COMPUTERNAME% - %~dp0 > Cleanup_Branch.cmd"
bash -c "echo -n 'bash -c \"find ' >> Cleanup_Branch.cmd"
rm -f dirToDelete.txt
rem cd. > dirToDelete.txt
bash -c "find .. -maxdepth 9 -regex ".+ChangesLog.txt" -exec echo {} >> dirToDelete.txt \; & pid=$!; sleep 100; kill $pid "
sed -e 's/\(.*\)\/.*/\1/' -e 's/^./"&/;s/.$/&" /' dirToDelete.txt | tr '\n' ' ' >> Cleanup_Branch.cmd
bash -c "echo -n '-maxdepth 0 -type d -mtime +2 | xargs -r ls -trd | head -n1 | xargs -t rm -Rf' >> Cleanup_Branch.cmd"
bash -c 'echo -n \" >> Cleanup_Branch.cmd'
call Cleanup_Branch.cmd
Note the requirements:
Deleting only those directories with "ChangesLog.txt", since other old directories should not be deleted.
Calling the OS commands via Cygwin directly, since otherwise Windows' default commands would be used.
Collecting the directories to delete into an external text file, in order to save the find results, since the find process sometimes hung.
Setting a timeout on the find process by running it in the background (&) and killing it after 100 seconds.
Sorting the directories oldest first, for the delete priority.
If you have a reasonably modern version of find (4.2.3 or greater) you can use the -delete flag.
find <dir> -type f -delete
If you have version 4.2.12 or greater you can take advantage of xargs-style command-line stacking via the + terminator to -exec. This way you don't run a separate copy of /bin/rm for every file.
find <dir> -type f -exec rm {} \+
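A quick throwaway-directory demonstration of -delete (GNU find assumed):

```shell
# Create a scratch directory with a couple of files, then delete them in place.
dir=$(mktemp -d)
touch "$dir/a" "$dir/b"
# -delete removes each matched file without forking any rm processes
find "$dir" -type f -delete
```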
The previous commands are good.
rm -rf directory/ also works faster for billions of files in one folder. I tried it.
If you would like delete tons of files as soon as possible, try this:
find . -type f -print0 | xargs -P 0 -0 rm -f
Note: the -P 0 option makes xargs use as many processes as possible.
mv large_folder /tmp/.
sudo reboot
The call to mv is fast, since it just relabels the directory entry (as long as /tmp is on the same filesystem). The system reboot will then clear the /tmp folder (remounting it?) in the fastest way possible.
You can create an empty directory and rsync it to the directory which you need to empty.
You will avoid timeout and out-of-memory issues.

bash - redirect ls into custom script

For college I am writing a script to read and display ID3 tags in mp3 files. The arguments would be the files, i.e.
./id3.sh file1.mp3 file2.mp3 morefiles.mp3
I can read the arguments using $0, $1, etc. and get the number of args with $#. How can I get it to read the output from an ls command?
ls *.mp3 | ./id3.sh
Try this:
ls *.mp3 | xargs id3.sh
If you redirect instead of piping, the ls *.mp3 > ./id3.sh command will overwrite your id3.sh script with the list of mp3's. You can try this instead:
./id3.sh `ls *.mp3`
EDIT: actually, what was I thinking? Is there a reason you just can't do this?
./id3.sh *.mp3
I would suggest using a pipe and xargs with the -n argument. In the example below, the id3.sh script will be called with at most 10 files listed by ls *.mp3. This is important, especially if you can have hundreds or thousands of files in the list. If you omit -n 10, your script will be called only once with the whole list; if the list is too long, your system may refuse to run your script. You can experiment with how many files to process in each invocation of your script (e.g. what is more efficient in your case).
ls *.mp3 | xargs -n 10 id3.sh
then you can read the files in your id3.sh script like this
while [ "$1" != "" ]; do
#next file available in ${1}
shift
done
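As a runnable sketch of that loop (count_args is a hypothetical stand-in for id3.sh's argument handling, just counting its arguments instead of reading tags):

```shell
# Demo of the shift-based argument loop: walk "$1" until the list is exhausted.
count_args() {
    n=0
    while [ "$1" != "" ]; do
        n=$((n + 1))
        shift
    done
    echo "$n"
}
count_args a.mp3 "b c.mp3" d.mp3   # prints 3
```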
Any solution involving the expansion of *.mp3 risks failure if the number of .mp3 files is so large that the resultant expanded *.mp3 exceeds the shell's limit. The solutions above all have this problem:
ls *.mp3 | ...
for file in *.mp3; do ...
In fact, even ls *.mp3 | xargs ... has the same problem, because it requires the shell to expand the *.mp3 list and pass it as command-line arguments to the ls command.
One way to properly handle an arbitrary number of files is:
find . -maxdepth 1 -iname '*.mp3' | while IFS= read -r f; do
do_something_one_file_at_a_time.sh "$f"
done
OR:
find . -maxdepth 1 -iname '*.mp3' -print0|xargs -0 do_something.sh
(Both variants have the side benefit of properly handling filenames with spaces, e.g. "Raindrops Keep Falling On My Head.mp3".
Note that in do_something.sh, you need to do for file in "$@"; do ... and not just for file in $*; do ... or for file in $@; do ....
Note also that amit_g's solution breaks if there are filenames with spaces.)
What's wrong with ./id3.sh *.mp3? It's safer than any solution with ls, and provides exactly the same globbing features. There's no need for xargs here, unless you're using an old kernel and have enormous amounts of files.
./id3.sh *.mp3 # if the number of files is not too many
or
ls *.mp3 | xargs -n 10 ./id3.sh # if the number of files could be too many
then in the id3.sh
while [ "$1" != "" ]
do
filename=$1
#do whatever with $filename
shift
done