Copy files in different folders in ther respective folder via shell - linux

I'm trying to copy files in different folders to their own folder:
/Test/Folder1/File1
/Test/Folder2/File2
/Test/Folder3/File3
I want to create a copy of each file in it's own folder:
/Test/Folder1/File1.Copy
/Test/Folder2/File2.Copy
/Test/Folder3/File3.Copy
I would try using
find /SapBackup/server*/SAPDBBAK/*_COM.dump -mmin 360 -type f -execdir cp . {}
but I don't know how to use the filename and folder of the found files as an operand.
I want to use a one-liner to add it to crontab, so a for-solution would not be suitable (AFAIK)
Thanks for your help

You are on the right track with -execdir action.
Just note you can use {} multiple times, for example:
find /some/path -mmin 360 -type f -execdir cp {} {}.copy \;
or even simpler, combine it with brace expansion1 in bash:
find /some/path -mmin 360 -type f -execdir cp {}{,.copy} \;
1 Brace expansion, as explained in the docs, is a shell expansion by which arbitrary strings may be generated. In fact, you might consider it to be a Cartesian product in bash.
For example: a{b,c} will expand to two words: ab and ac. Namely, set containing word a was "multiplied" with a set containing two words, b and c.
Similarly when used multiple times, e.g. {a,b}{c,d} expands to 4 words: ac, ad, bc and bd (to test, try: echo {a,b}{c,d}).
In cp command above, we used a zero-length word and .copy to (re-)produce the original word and the original word with .copy appended:
$ echo filename{,.copy}
filename filename.copy

If you put these lines on a bash script, you can call it from the crontab.
for i in $(ls /path/to/folder); do cp $i $i.copy; done
It's a simple code and you can change it so easy.
I hope that's usefull.

In order to copy directories you have to add -r flag to cp command. What's about a oneliner like following:
find /SapBackup/server*/SAPDBBAK/*_COM.dump -mmin 360 | xargs -I {} cp -r {} {}.copy

Related

Command Linux to copy files from a certain weekday

I am figuring out a command to copy files that are modified on a Saturday.
find -type f -printf '%Ta\t%p\n'
This way the line starts with the weekday.
When I combine this with a 'egrep' command using a regular expression (starts with "za") it shows only the files which start with "za".
find -type f -printf '%Ta\t%p\n' | egrep "^(za)"
("za" is a Dutch abbreviation for "zaterdag", which means Saturday,
This works just fine.
Now I want to copy the files with this command:
find -type f -printf '%Ta\t%p\n' -exec cp 'egrep "^(za)" *' /home/richard/test/ \;
Unfortunately it doesn't work.
Any suggestions?
The immediate problem is that -printf and -exec are independent of each other. You want to process the result of -printf to decide whether or not to actually run the -exec part. Also, of course, passing an expression in single quotes simply passes a static string, and does not evaluate the expression in any way.
The immediate fix to the evaluation problem is to use a command substitution instead of single quotes, but the problem that the -printf function's result is not available to the command substitution still remains (and anyway, the command substitution would happen before find runs, not while it runs).
A common workaround would be to pass a shell script snippet to -exec, but that still doesn't expose the -printf function to the -exec part.
find whatever -printf whatever -exec sh -c '
case $something in za*) cp "$1" "$0"; esac' "$DEST_DIR" {} \;
so we have to figure out a different way to pass the $something here.
(The above uses a cheap trick to pass the value of $DEST_DIR into the subshell so we don't have to export it. The first argument to sh -c ... ends up in $0.)
Here is a somewhat roundabout way to accomplish this. We create a format string which can be passed to sh for evaluation. In order to avoid pesky file names, we print the inode numbers of matching files, then pass those to a second instance of find for performing the actual copying.
find \( -false $(find -type f \
-printf 'case %Ta in za*) printf "%%s\\n" "-o -inum %i";; esac\n' |
sh) \) -exec cp -t "$DEST_DIR" \+
Using the inode number means any file name can be processed correctly (including one containing newlines, single or double quotes, etc) but may increase running time significantly, because we need two runs of find. If you have a large directory tree, you will probably want to refactor this for your particular scenario (maybe run only in the current directory, and create a wrapper to run it in every directory you want to examine ... thinking out loud here; not sure it helps actually).
This uses features of GNU find which are not available e.g. in *BSD (including OSX). If you are not on Linux, maybe consider installing the GNU tools.
What you can do is a shell expansion. Something like
cp $(find -type f -printf '%Ta\t%p\n' | egrep "^(za)") $DEST_DIR
Assuming that the result of your find and grep is just the filenames (and full paths, at that), this will copy all the files that match your criteria to whatever you set $DEST_DIR to.
EDIT As mentioned in the comments, this won't work if your filenames contain spaces. If that's the case, you can do something like this:
find -type f -printf '%Ta\t%p\n' | egrep "^(za)" | while read file; do cp "$file" $DEST_DIR; done

Embedding a bash command inside the mv command

I have a directory that contains a list of files having the following format:
240-timestamp1.ts
240-timestamp2.ts
...
360-timestamp1.ts
360-timestamp2.ts
Now, I want to implement a bash command which matches the files that start with '240' and renames them so that instead of '240-timestampX.ts' the files look like '240-human-readable-timestampX.ts'.
I have tried the following:
find . -maxdepth 1 -mmin +5 -type f -name "240*"
-exec mv $0 {$0/240-***and here I want to insert
either stat -c %y filename or date -d #timestampX***} '{}' \;
I stuck here because I don't know if I can embed a bash command inside the mv command. I know the task may look a bit confusing and over-complicated, but I would like to know if it is possible to do so. Of course I can create a bash script that would go through all the files in the directory and while loop them with changing their respective names, but somehow I think that a single command would be more efficient (even if less readable).
The OS is Linux Ubuntu 12.04.5
The shell is bash
Thank you both Kenavoz and Kurt Stutsman for the proposed solutions. Both your answers perform the task; however, I marked Kenavoz's answer as the accepted one because of the degree of similarity between my question and Kenavoz's answer. Even if it is indeed possible to do it in a cleaner way with omitting the find command, it is necessary in my case to use the respective command because I need to find files older than X units of time. So thank you both once again!
In case you want to keep your mmin option, your can use find and process found files with a bash command using xargs :
find . -maxdepth 1 -mmin +5 -type f -name "240*.ts" | xargs -L 1 bash -c 'mv "${1}" "240-$(stat -c %y ${1}).ts"' \;
In bash if all your files are in a single directory, you don't need to use find at all. You can do a for loop:
for file in 240-*; do
hr_timestamp=$(date -d $(echo "$file" | sed 's/.*-\([0-9]*\)\.ts/\1/'))
mv "$file" "240-$hr_timestamp.ts"
done

Copy specific files recursively

This problem has been discussed extensively but I couldn't find a solution that would help me.
I'm trying to selectively copy files from a directory tree into a specific folder. After reading some Q&A, here's what I tried:
cp `find . -name "*.pdf" -type f` ../collect/
I am in the right parent directory and there indeed is a collect directory a level above. Now I'm getting the error: cp: invalid option -- 'o'
What is going wrong?
To handle difficult file names:
find . -name "*.pdf" -type f -exec cp {} ../collect/ \;
By default, find will print the file names that it finds. If one uses the -exec option, it will instead pass the file names on to a command of your choosing, in this case a cp command which is written as:
cp {} ../collect/ \;
The {} tells find where to insert the file name. The end of the command given to -exec is marked by a semicolon. Normally, the shell would eat the semicolon. So, we escape the semicolon with a backslash so that it is passed as an argument to the find command.
Because find gives the file name to cp directly without interference from the shell, this approach works for even the most difficult file names.
More efficiency
The above runs cp on every file found. If there are many files, that would be a lot of processes started. If one has GNU tools, that can be avoided as follows:
find . -name '*.pdf' -type f -exec cp -t ../collect {} +
In this variant of the command, find will supply many file names for each single invocation of cp, potentially greatly reducing the number of processes that need to be started.

Argument list too long error for rm, cp, mv commands

I have several hundred PDFs under a directory in UNIX. The names of the PDFs are really long (approx. 60 chars).
When I try to delete all PDFs together using the following command:
rm -f *.pdf
I get the following error:
/bin/rm: cannot execute [Argument list too long]
What is the solution to this error?
Does this error occur for mv and cp commands as well? If yes, how to solve for these commands?
The reason this occurs is because bash actually expands the asterisk to every matching file, producing a very long command line.
Try this:
find . -name "*.pdf" -print0 | xargs -0 rm
Warning: this is a recursive search and will find (and delete) files in subdirectories as well. Tack on -f to the rm command only if you are sure you don't want confirmation.
You can do the following to make the command non-recursive:
find . -maxdepth 1 -name "*.pdf" -print0 | xargs -0 rm
Another option is to use find's -delete flag:
find . -name "*.pdf" -delete
tl;dr
It's a kernel limitation on the size of the command line argument. Use a for loop instead.
Origin of problem
This is a system issue, related to execve and ARG_MAX constant. There is plenty of documentation about that (see man execve, debian's wiki, ARG_MAX details).
Basically, the expansion produce a command (with its parameters) that exceeds the ARG_MAX limit.
On kernel 2.6.23, the limit was set at 128 kB. This constant has been increased and you can get its value by executing:
getconf ARG_MAX
# 2097152 # on 3.5.0-40-generic
Solution: Using for Loop
Use a for loop as it's recommended on BashFAQ/095 and there is no limit except for RAM/memory space:
Dry run to ascertain it will delete what you expect:
for f in *.pdf; do echo rm "$f"; done
And execute it:
for f in *.pdf; do rm "$f"; done
Also this is a portable approach as glob have strong and consistant behavior among shells (part of POSIX spec).
Note: As noted by several comments, this is indeed slower but more maintainable as it can adapt more complex scenarios, e.g. where one want to do more than just one action.
Solution: Using find
If you insist, you can use find but really don't use xargs as it "is dangerous (broken, exploitable, etc.) when reading non-NUL-delimited input":
find . -maxdepth 1 -name '*.pdf' -delete
Using -maxdepth 1 ... -delete instead of -exec rm {} + allows find to simply execute the required system calls itself without using an external process, hence faster (thanks to #chepner comment).
References
I'm getting "Argument list too long". How can I process a large list in chunks? # wooledge
execve(2) - Linux man page (search for ARG_MAX) ;
Error: Argument list too long # Debian's wiki ;
Why do I get “/bin/sh: Argument list too long” when passing quoted arguments? # SuperUser
find has a -delete action:
find . -maxdepth 1 -name '*.pdf' -delete
Another answer is to force xargs to process the commands in batches. For instance to delete the files 100 at a time, cd into the directory and run this:
echo *.pdf | xargs -n 100 rm
If you’re trying to delete a very large number of files at one time (I deleted a directory with 485,000+ today), you will probably run into this error:
/bin/rm: Argument list too long.
The problem is that when you type something like rm -rf *, the * is replaced with a list of every matching file, like “rm -rf file1 file2 file3 file4” and so on. There is a relatively small buffer of memory allocated to storing this list of arguments and if it is filled up, the shell will not execute the program.
To get around this problem, a lot of people will use the find command to find every file and pass them one-by-one to the “rm” command like this:
find . -type f -exec rm -v {} \;
My problem is that I needed to delete 500,000 files and it was taking way too long.
I stumbled upon a much faster way of deleting files – the “find” command has a “-delete” flag built right in! Here’s what I ended up using:
find . -type f -delete
Using this method, I was deleting files at a rate of about 2000 files/second – much faster!
You can also show the filenames as you’re deleting them:
find . -type f -print -delete
…or even show how many files will be deleted, then time how long it takes to delete them:
root#devel# ls -1 | wc -l && time find . -type f -delete
100000
real 0m3.660s
user 0m0.036s
sys 0m0.552s
Or you can try:
find . -name '*.pdf' -exec rm -f {} \;
you can try this:
for f in *.pdf
do
rm "$f"
done
EDIT:
ThiefMaster comment suggest me not to disclose such dangerous practice to young shell's jedis, so I'll add a more "safer" version (for the sake of preserving things when someone has a "-rf . ..pdf" file)
echo "# Whooooo" > /tmp/dummy.sh
for f in '*.pdf'
do
echo "rm -i \"$f\""
done >> /tmp/dummy.sh
After running the above, just open the /tmp/dummy.sh file in your favorite editor and check every single line for dangerous filenames, commenting them out if found.
Then copy the dummy.sh script in your working dir and run it.
All this for security reasons.
For somone who doesn't have time.
Run the following command on terminal.
ulimit -S -s unlimited
Then perform cp/mv/rm operation.
I'm surprised there are no ulimit answers here. Every time I have this problem I end up here or here. I understand this solution has limitations but ulimit -s 65536 seems to often do the trick for me.
You could use a bash array:
files=(*.pdf)
for((I=0;I<${#files[#]};I+=1000)); do
rm -f "${files[#]:I:1000}"
done
This way it will erase in batches of 1000 files per step.
you can use this commend
find -name "*.pdf" -delete
The rm command has a limitation of files which you can remove simultaneous.
One possibility you can remove them using multiple times the rm command bases on your file patterns, like:
rm -f A*.pdf
rm -f B*.pdf
rm -f C*.pdf
...
rm -f *.pdf
You can also remove them through the find command:
find . -name "*.pdf" -exec rm {} \;
If they are filenames with spaces or special characters, use:
find -name "*.pdf" -delete
For files in current directory only:
find -maxdepth 1 -name '*.pdf' -delete
This sentence search all files in the current directory (-maxdepth 1) with extension pdf (-name '*.pdf'), and then, delete.
i was facing same problem while copying form source directory to destination
source directory had files ~3 lakcs
i used cp with option -r and it's worked for me
cp -r abc/ def/
it will copy all files from abc to def without giving warning of Argument list too long
Try this also If you wanna delete above 30/90 days (+) or else below 30/90(-) days files/folders then you can use the below ex commands
Ex: For 90days excludes above after 90days files/folders deletes, it means 91,92....100 days
find <path> -type f -mtime +90 -exec rm -rf {} \;
Ex: For only latest 30days files that you wanna delete then use the below command (-)
find <path> -type f -mtime -30 -exec rm -rf {} \;
If you wanna giz the files for more than 2 days files
find <path> -type f -mtime +2 -exec gzip {} \;
If you wanna see the files/folders only from past one month .
Ex:
find <path> -type f -mtime -30 -exec ls -lrt {} \;
Above 30days more only then list the files/folders
Ex:
find <path> -type f -mtime +30 -exec ls -lrt {} \;
find /opt/app/logs -type f -mtime +30 -exec ls -lrt {} \;
And another one:
cd /path/to/pdf
printf "%s\0" *.[Pp][Dd][Ff] | xargs -0 rm
printf is a shell builtin, and as far as I know it's always been as such. Now given that printf is not a shell command (but a builtin), it's not subject to "argument list too long ..." fatal error.
So we can safely use it with shell globbing patterns such as *.[Pp][Dd][Ff], then we pipe its output to remove (rm) command, through xargs, which makes sure it fits enough file names in the command line so as not to fail the rm command, which is a shell command.
The \0 in printf serves as a null separator for the file names wich are then processed by xargs command, using it (-0) as a separator, so rm does not fail when there are white spaces or other special characters in the file names.
Argument list too long
As this question title for cp, mv and rm, but answer stand mostly for rm.
Un*x commands
Read carefully command's man page!
For cp and mv, there is a -t switch, for target:
find . -type f -name '*.pdf' -exec cp -ait "/path to target" {} +
and
find . -type f -name '*.pdf' -exec mv -t "/path to target" {} +
Script way
There is an overall workaroung used in bash script:
#!/bin/bash
folder=( "/path to folder" "/path to anther folder" )
if [ "$1" != "--run" ] ;then
exec find "${folder[#]}" -type f -name '*.pdf' -exec $0 --run {} +
exit 0;
fi
shift
for file ;do
printf "Doing something with '%s'.\n" "$file"
done
What about a shorter and more reliable one?
for i in **/*.pdf; do rm "$i"; done
I had the same problem with a folder full of temporary images that was growing day by day and this command helped me to clear the folder
find . -name "*.png" -mtime +50 -exec rm {} \;
The difference with the other commands is the mtime parameter that will take only the files older than X days (in the example 50 days)
Using that multiple times, decreasing on every execution the day range, I was able to remove all the unnecessary files
You can create a temp folder, move all the files and sub-folders you want to keep into the temp folder then delete the old folder and rename the temp folder to the old folder try this example until you are confident to do it live:
mkdir testit
cd testit
mkdir big_folder tmp_folder
touch big_folder/file1.pdf
touch big_folder/file2.pdf
mv big_folder/file1,pdf tmp_folder/
rm -r big_folder
mv tmp_folder big_folder
the rm -r big_folder will remove all files in the big_folder no matter how many. You just have to be super careful you first have all the files/folders you want to keep, in this case it was file1.pdf
To delete all *.pdf in a directory /path/to/dir_with_pdf_files/
mkdir empty_dir # Create temp empty dir
rsync -avh --delete --include '*.pdf' empty_dir/ /path/to/dir_with_pdf_files/
To delete specific files via rsync using wildcard is probably the fastest solution in case you've millions of files. And it will take care of error you're getting.
(Optional Step): DRY RUN. To check what will be deleted without deleting. `
rsync -avhn --delete --include '*.pdf' empty_dir/ /path/to/dir_with_pdf_files/
.
.
.
Click rsync tips and tricks for more rsync hacks
I found that for extremely large lists of files (>1e6), these answers were too slow. Here is a solution using parallel processing in python. I know, I know, this isn't linux... but nothing else here worked.
(This saved me hours)
# delete files
import os as os
import glob
import multiprocessing as mp
directory = r'your/directory'
os.chdir(directory)
files_names = [i for i in glob.glob('*.{}'.format('pdf'))]
# report errors from pool
def callback_error(result):
print('error', result)
# delete file using system command
def delete_files(file_name):
os.system('rm -rf ' + file_name)
pool = mp.Pool(12)
# or use pool = mp.Pool(mp.cpu_count())
if __name__ == '__main__':
for file_name in files_names:
print(file_name)
pool.apply_async(delete_files,[file_name], error_callback=callback_error)
If you want to remove both files and directories, you can use something like:
echo /path/* | xargs rm -rf
I only know a way around this.
The idea is to export that list of pdf files you have into a file. Then split that file into several parts. Then remove pdf files listed in each part.
ls | grep .pdf > list.txt
wc -l list.txt
wc -l is to count how many line the list.txt contains. When you have the idea of how long it is, you can decide to split it in half, forth or something. Using split -l command
For example, split it in 600 lines each.
split -l 600 list.txt
this will create a few file named xaa,xab,xac and so on depends on how you split it.
Now to "import" each list in those file into command rm, use this:
rm $(<xaa)
rm $(<xab)
rm $(<xac)
Sorry for my bad english.
I ran into this problem a few times. Many of the solutions will run the rm command for each individual file that needs to be deleted. This is very inefficient:
find . -name "*.pdf" -print0 | xargs -0 rm -rf
I ended up writing a python script to delete the files based on the first 4 characters in the file-name:
import os
filedir = '/tmp/' #The directory you wish to run rm on
filelist = (os.listdir(filedir)) #gets listing of all files in the specified dir
newlist = [] #Makes a blank list named newlist
for i in filelist:
if str((i)[:4]) not in newlist: #This makes sure that the elements are unique for newlist
newlist.append((i)[:4]) #This takes only the first 4 charcters of the folder/filename and appends it to newlist
for i in newlist:
if 'tmp' in i: #If statment to look for tmp in the filename/dirname
print ('Running command rm -rf '+str(filedir)+str(i)+'* : File Count: '+str(len(os.listdir(filedir)))) #Prints the command to be run and a total file count
os.system('rm -rf '+str(filedir)+str(i)+'*') #Actual shell command
print ('DONE')
This worked very well for me. I was able to clear out over 2 million temp files in a folder in about 15 minutes. I commented the tar out of the little bit of code so anyone with minimal to no python knowledge can manipulate this code.
I have faced a similar problem when there were millions of useless log files created by an application which filled up all inodes. I resorted to "locate", got all the files "located"d into a text file and then removed them one by one. Took a while but did the job!
I solved with for
I am on macOS with zsh
I moved thousands only jpg files. Within mv in one line command.
Be sure there are no spaces or special characters in the name of the files you are trying to move
for i in $(find ~/old -type f -name "*.jpg"); do mv $i ~/new; done
A bit safer version than using xargs, also not recursive:
ls -p | grep -v '/$' | grep '\.pdf$' | while read file; do rm "$file"; done
Filtering our directories here is a bit unnecessary as 'rm' won't delete it anyway, and it can be removed for simplicity, but why run something that will definitely return error?
Using GNU parallel (sudo apt install parallel) is super easy
It runs the commands multithreaded where '{}' is the argument passed
E.g.
ls /tmp/myfiles* | parallel 'rm {}'
For remove first 100 files:
rm -rf 'ls | head -100'

unix bash - save environment variable and loop

Let's say you have a first.sh file in a directory: "/home/userbob/scripts/foo/". Basically I would like to know how to loop through specific directories, each time going back up to a higher level directory and repeating.
The .sh file has something like this pseudocode:
#!/bin/bash
curdi={$PATH} #where the first.sh file sits on the server
FOLDERS="$curdi/waffles/inner/
$curdi/pancakes/inner/
$curdi/bagels/inner/"
for f in $FOLDERS
do
cd $f
cp innerofinner/* .
cd $curdi
done
The idea is to somehow copy all the contents of /home/userbob/scripts/foo/waffles/inner/innerofinner to /home/userbob/scripts/foo/waffles/inner/
(and basically repeating just with the path having pancakes, bagels.etc.)
Can't do it for all directories (*) under /home/userbob/scripts/foo/ because there are some that I don't want to copy.
This should do it:
for name in waffles pancakes bagels
do
cp "$curdi/$name/inner/innferofinner/"* "$curdi/waffles/inner"
done
Walking file trees? Sounds like a job for find!
#!/usr/local/bin/env bash
# only environment variables should be all-caps
dirs=({bagels,pancakes}/inner)
find "${dirs[#]}" -type d -maxdepth 1 -mindepth 1 -name innerofinner -execdir bash -c 'cp "$1"/* .' -- {} \;
I did a partial path and assumed a working directory of /home/userbob/scripts/foo. An absolute path would work, too, and would look like
dirs=(/home/userbob/scripts/foo/{bagels,pancakes}/inner)
This finds all directories exactly one level below the listed directory that are named "innerofinner" and, in their parent directories, executs bash and a simple cp script.
If you're wondering how this works, read below.
The dirs=() syntax creates an empty array named dirs. dirs+(a b) creates an array with a at index 0 and b at index 1. Any whitespace-delimited string will work, here. In a shell script {a,b,c} expands to a b c but A{a,b,c}B expands to AaB AbB AcB. So specifying {bagels,pancakes}/inner is just a way to say both bagels/inner and pancakes/inner without having to type as much.
A variable in bash can be expanded with $foo or with ${foo}; these are the same. An array in shell can be expanded to all of its elements with ${foo[#]} delimited by spaces (if you know perl or php this will make some sense) and quoting the expansion (always a good idea in shell!) prevents spaces innside the variable from being processed again by the shell. Thus, "${dir[#]}" becomes bagels/inner pancakes/inner.
Knowing this we see that the find command has become find bagels/inner pancakes/inner -maxdepth 1 -mindepth 1 -type d -name innerofinner and if you execute this it will return exactly two lines: both full paths to each innerofinner directory. All we want now is to do something for each one, which -execdir does nicely.
Use a recursive function or invoke the script recursively.
I am not sure if I understand your problem statement correctly. Your psuedo code seems good. But, I see a problem with the following line.
curdi={$PWD}
It does not give you the directory where the script resides but gives the directory you are in. If your script directory is in the path and you are running the script from your home directory then $curdi would point to your home directory and not the directory where your script resides. This will lead to undesired results.
Incidentally, if you really wanted to do it in the way that your pseudo-script attempts it, you'd do it like this
#!/usr/bin/env bash
for f in "$PWD"/{waffles,pancakes,bagels}/inner ; do
cd "$f"
cp innerofinner/* .
# if you know for sure that it's one level up
cd ..
done
Presuming that $PWD is a good enough indicator of "current" directory for you. Me, I'd pass it in to the script.
#!/usr/bin/env bash
base="${1-$PWD}"
for f in "$base"/{waffles,pancakes,bagels}/inner ; do
cd "$f"
cp innerofinner/* .
cd ..
done
at call it like
breakfast.sh /home/userbob/scripts/foo/
find . \( -iname '*waffles*innferofinner*' -o \
-iname '*pancakes*innferofinner*' -o \
-iname '*baggels*innferofinner*' \) \
-type f \
-exec cp {} "`echo {} | sed 's:\(.*\)/[^/]\+/[^/]\+:\1:'` \;
Should do. Finds every file in the desired subdirs, then copies it based on its name.
HTH

Resources