How to delete files that are X days old based on creation date, not last-modified time. Is it even possible? (Linux)

I run a server of sorts on my Linux machine and use a simple bash script to delete some files every 3 days and others every 7 days, using the find command. But my files are written to periodically, so their last-modification date is always the current day and they never get deleted. It only worked the first time, when the files actually met the conditions. I can't find a way to delete those files based on their creation date rather than their modification date.
Here's my simple script:
#!/bin/sh
while true
do
    java -server file.jar nogui
    echo ">$(tput setaf 3)STARTING REBOOT$(tput sgr0) $(tput setaf 7)(Ctrl+C To Stop!)$(tput sgr0)"
    find /folder/* -mtime +7 -exec rm -rf {} \;
    find /folder/* -mtime +3 -exec rm -rf {} \;
    find /logs/* -mtime +1 -exec rm -rf {} \;
    echo ">Rebooting in:"
    for i in 5 4 3 2 1
    do
        echo ">$i..."
        sleep 1
    done
done
If someone could help me with this, I would be really thankful!

Just an idea, don't shoot... :-)
If the files are not system files automatically generated by some process but are, let's say, server log files, you could echo the creation date into the file itself (at the beginning or end) and grep that value later to decide whether each file must be removed or kept.
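A minimal sketch of that idea, assuming you control how the log files are written (the CREATED= marker, the /folder path and the GNU date syntax are illustrative assumptions, not part of the original setup):
# when creating a log file, stamp the creation date on its first line
printf 'CREATED=%s\n' "$(date +%F)" > /folder/mylog.txt
# later, delete files whose stamp is older than 3 days (date -d is GNU-specific)
cutoff=$(date -d '3 days ago' +%F)
for f in /folder/*; do
    created=$(grep -m1 '^CREATED=' "$f" | cut -d= -f2)
    if [[ -n $created && $created < $cutoff ]]; then
        rm -- "$f"   # YYYY-MM-DD sorts lexicographically, so the string comparison works
    fi
done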

No, it is not possible. Standard Linux filesystems do not track creation time at all. (ctime, sometimes mistaken for creation time, is metadata change time -- as compared to mtime, which is data modification time).
That said, there are certainly ugly hacks available. For instance, if you have the following script invoked by incron (or, less efficiently, cron) to record file creation dates:
#!/bin/bash
# record a zero-length marker file the first time we see each file
mkdir -p .ctimes
for f in *; do
    if [[ -f $f && ! -e .ctimes/$f ]]; then
        touch ".ctimes/$f"
    fi
done
...then you can look for files in the .ctimes directory that are older than three days, and delete both the markers and the files they stand for:
#!/bin/bash
# delete markers older than three days, along with the files they stand for
find .ctimes -mtime +3 -type f -print0 | while IFS= read -r -d '' filename; do
    realname=${filename#.ctimes/}
    rm -f -- "$filename" "$realname"
done
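A hedged example of wiring this up; the script names and paths are placeholders. The marker script runs from cron every few minutes in the watched directory (with incron you would instead trigger it on file-creation events), and the cleanup script runs once a day:
# hypothetical crontab entries (edit with crontab -e)
*/10 * * * * cd /folder && /usr/local/bin/record-ctimes.sh
30 4 * * * cd /folder && /usr/local/bin/cleanup-ctimes.sh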

If you are on an ext4 filesystem, there is some hope: you can retrieve the creation time using the stat and debugfs utilities. ext4 stores the creation time in the inode table entry i_crtime, documented as 'File creation time, in seconds since the epoch'.
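A hedged sketch of reading crtime on an ext4 volume (the file path and device are placeholders; debugfs needs root):
# newer GNU coreutils can print the birth time directly ('-' if unknown)
stat -c '%w' /folder/somefile
# otherwise, ask debugfs for the inode's crtime
inode=$(stat -c '%i' /folder/somefile)
device=$(df --output=source /folder/somefile | tail -n 1)
sudo debugfs -R "stat <$inode>" "$device" 2>/dev/null | grep crtime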

Related

Delete files created more than 7 days ago, regardless of the modified time

I have to delete log files older than 7 days even if they were modified within the last 7 days. But the only solution I can find anywhere is based on the find command with the mtime option, as below:
find /path/to/files -mtime +7 -exec rm {} \;
What is a possible solution to this problem?
If you're using a filesystem that tracks file birth time and a current enough Linux kernel, glibc and GNU coreutils, you can do something like
weekago=$(date -d "last week" +%s)
for file in /path/to/files/*; do
    birthdate=$(stat -c %W "$file")
    if [[ $birthdate -gt 0 && $birthdate -lt $weekago ]]; then
        printf "%s\n" "$file"
        # Uncomment when you're sure you're getting just the files you want
        # rm -- "$file"
    fi
done
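If your tools are new enough (a recent GNU find with birth-time support via statx; older versions report that the birth time cannot be determined), find may be able to filter on birth time directly and skip the loop; a hedged sketch using the GNU date-style time spec:
# print files whose birth time is at least a week old; swap -print for -delete once verified
find /path/to/files -maxdepth 1 -type f ! -newerBt '1 week ago' -print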

How to delete older files but keep recent ones during backup?

I have a remote server that copies 30-some backup files to a local server every day and I want to remove the old backups if and only if a newer backup successfully copied.
With the different snippets I tried, I managed to erase older files, but I ran into the problem that if one new backup was found, ALL older ones were deleted.
I have something like (picture this with 20 virtual machines):
vm001-2019-08-01.bck
vm001-2019-07-28.bck
vm002-2019-08-01.bck
vm003-2019-07-29.bck
vm004-2019-08-01.bck
vm004-2019-07-31.bck
vm004-2019-07-30.bck
vm004-2019-07-29.bck
...
And I'd want to erase all but keep only the most recent ones.
i.e.: erase:
vm001-2019-07-28.bck
vm002-2019-07-29.bck
vm004-2019-07-31.bck
vm004-2019-07-30.bck
vm004-2019-07-29.bck
and keep only:
vm001-2019-08-01.bck
vm002-2019-08-01.bck
vm003-2019-07-29.bck
vm004-2019-08-01.bck
The problem I had is that if there is any recent backup of any machine, files like vm003-2019-07-29.bck get deleted because they are older, even though they belong to different machines.
I know there are several variants of this question on the site, but I can't quite get this to work.
I've been trying variants of this code:
#!/bin/bash
for i in ./*.bck
do
    echo "found" "$i"
    if [[ -n $(find "$i" -type f -mmin -1440) ]]
    then
        echo "$i"
        find "$i" -type f -mmin +1440 -exec rm -f "$i" {} +
    fi
done
(The echos are for debugging purposes only)
At this time, this code finds the newer and the older files, but doesn't delete anything. If I put find "$i" -type f -mmin +1440 -exec echo "$i" {} +, it never prints anything, as if find "$i" isn't finding anything; but when I run it as a standalone command in the terminal, it does (minus the -exec part).
I've tested this script by generating files with different timestamps using touch -d, but I had no success.
Unless you add the -name test before the filename, find is going to consider "$i" to be the name of a directory to search in. So your find command should be:
find -name "$i" -type f -mmin -1440
which will search in the current directory. Or
find /path/to/dir -name "$i" -type f -mmin -1440
which will search in a directory named "/path/to/dir".
But, based on BashFAQ/099, I would do this to delete all but the newest file for each VM (untested):
#!/bin/bash
declare -A newest # associative array to store the name of the newest file for each VM
for f in *
do
    vm=${f%%-*} # extracts the VM name from the filename (i.e. vm001 from vm001-2019-08-01.bck)
    if [[ -f $f && $f -nt ${newest["$vm"]} ]]
    then
        newest["$vm"]=$f
    fi
done

for f in *
do
    vm=${f%%-*}
    if [[ -f $f && $f != "${newest[$vm]}" ]]
    then
        rm "$f"
    fi
done
This is set up to run against files in the current directory. It assumes that the files are named as shown in the question (the VM name is separated from the rest of the file name by a hyphen). In order to use an associative array, Bash 4 or higher is required.
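If you want to preview what the second loop would remove before letting it delete anything, a simple dry-run variant (run it in place of the second loop, after the first loop has filled the newest array) is:
for f in *
do
    vm=${f%%-*}
    if [[ -f $f && $f != "${newest[$vm]}" ]]
    then
        echo "would delete: $f"
    fi
done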

linux script to move files in directories with data

I'm using freeradius with daloradius appliance.
Now there are lots of routers connected to this radius server and the database and log files are growing too much.
The log files from freeradius are saved on:
/radacct/xxx.xxx.xxx.xxx/detail-20150404
xxx.xxx.xxx.xxx are different client IPs, so there are lots of folders and lots of files inside these folders.
I can't simply add this directory to log rotation, because the detail-TODAY file must not be touched during the day and has to stay accessible the whole 24 hours.
So I'm asking for:
A script to move /radacct/xxx.xxx.xxx.xxx/detail-yyyymmdd to a new folder /radacct_old/xxx.xxx.xxx.xxx/detail-yyyymmdd.
We have to move all files except where yyyymmdd is the current date (date when the script is executed).
After this I can rotate the logs in radacct_old or just zip them into radacct_old_yyyymmdd.
I'm planning to run this job every week or so.
What approach would you suggest?
Try something like this:
function move {
    today=$(date +%Y%m%d)
    file="${1#/radacct/}"
    ip="${file%%/*}"
    name="${file##*/}"
    if [[ ! $name =~ detail-$today ]]; then
        dir="/radacct_old/$ip"
        [ -d "${dir}" ] || mkdir -p "${dir}"
        mv "${1}" "${dir}/${name}"
    fi
}
export -f move
find /radacct -mindepth 2 -maxdepth 2 -type f -name '*detail*' -exec bash -c 'move "$0"' {} \;
Beware this is untested, you will certainly be able to fill the gaps. I will test it out and debug later if you can't seem to make it work. Post when you have further questions.
Explanation: the script looks for all detail files of the required format and moves them (last two lines) by calling a function (defined at the beginning).
Move function
today=$(date +%Y%m%d) constructs the date in the required format.
file="${1#/radacct/}" removes the leading /radacct/ from the path found by find.
ip="${file%%/*}" extracts the IP address.
name="${file##*/}" extracts the file name.
if [[ ! $name =~ detail-$today ]]; then only proceeds if the file is not from today (today's file is left in place).
dir="/radacct_old/$ip" constructs the target directory.
[ -d "${dir}" ] || mkdir -p "${dir}" creates it if it doesn't exist.
mv "${1}" "${dir}/${name}" moves the file to the new location.
export -f move exports the function so it can be called in the subshell spawned by find.
Find command
find /radacct looks in the /radacct dir,
-mindepth 2 -maxdepth 2 -type f for regular files exactly two levels deep,
-name '*detail*' whose name contains the word detail,
-exec bash -c 'move "$0"' {} \; and executes the move function, supplying the path that was found as its argument.
Note that I will add more details and test it later today.
To perform this weekly, use a cron job.
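For example, a crontab entry like the following (the script path is a placeholder for whatever file you save the function and the find call in) runs it every Sunday at 03:00:
# crontab -e
0 3 * * 0 /usr/local/bin/move_radacct.sh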

What is the best and fastest way to delete a large directory containing thousands of files (in Ubuntu)?

As I understand it, commands like
find <dir> -type f -exec rm {} \;
are not the best option for removing a large number of files (counting all files, including those in subfolders). They work fine with a small number of files, but with 10+ million files in subfolders they can bring a server to its knees.
Does anyone know any specific Linux commands to solve this problem?
It may seem strange but:
$ rm -rf <dir>
Here's an example bash script:
#!/bin/bash
LOCKFILE=/tmp/rmHugeNumberOfFiles.lock

# this process gets ultra-low disk IO priority
ionice -c2 -n7 -p $$ > /dev/null
if [ $? -ne 0 ]; then
    echo "Could not set disk IO priority. Exiting..."
    exit 1
fi
# ...and ultra-low CPU priority
renice +19 -p $$ > /dev/null
if [ $? -ne 0 ]; then
    echo "Could not renice process. Exiting..."
    exit 1
fi

# check if there's an instance running already. If so: exit
if [ -e "${LOCKFILE}" ] && kill -0 "$(cat "${LOCKFILE}")" 2>/dev/null; then
    echo "An instance of this script is already running."
    exit 1
fi

# make sure the lockfile and tempfile are removed when we exit, then claim the lock
trap 'command rm -f -- "$LOCKFILE" "$tmp"; exit' INT TERM EXIT
echo $$ > "$LOCKFILE"

# also create a tempfile for the list of files to delete
tmp=$(tempfile) || exit

# ----------------------------------------
# option 1: find your specific files, then delete them in one batch
# ----------------------------------------
find "$1" -type f [INSERT SPECIFIC SEARCH PATTERN HERE] > "$tmp"
xargs -d '\n' rm -- < "$tmp"

# ----------------------------------------
# option 2: delete the whole directory tree indiscriminately
# ----------------------------------------
command rm -r "$1"

# remove the lockfile and tempfile
command rm -f -- "$tmp" "$LOCKFILE"
This script starts by setting its own process priority and disk IO priority to very low values, to ensure that other running processes are affected as little as possible.
Then it makes sure that it is the ONLY such process running.
The core of the script is really up to your preference. You can use rm -r if you are sure that the whole dir can be deleted indiscriminately (option 2), or you can use find for more specific file deletion (option 1, possibly using command-line options "$2" and onward for convenience).
In the implementation above, option 1 (find) first outputs everything to a tempfile, so that rm is only called once (or in a few batches) instead of once per file found by find. When the number of files is indeed huge, this can amount to a significant time saving. On the downside, the size of the tempfile may become an issue, but this is only likely if you're deleting literally billions of files. Also, because the disk IO has such low priority, using a tempfile followed by a single rm may in total be slower than using the find (...) -exec rm {} \; option. As always, you should experiment a bit to see what best fits your needs.
EDIT: As suggested by user946850, you can also skip the whole tempfile and use find (...) -print0 | xargs -0 rm. This has a larger memory footprint, since all full paths to all matching files will be inserted in RAM until the find command is completely finished. On the upside: there is no additional file IO due to writes to the tempfile. Which one to choose depends on your use-case.
The -r (recursive) switch removes everything below a directory, too -- including subdirectories. (Your command does not remove the directories, only the files.)
You can also speed up the find approach:
find -type f -print0 | xargs -0 rm
I tried every one of these commands, but the problem I had was that the deletion process was locking the disk, and since no other processes could access it, there was a big pileup of processes trying to access the disk, making the problem worse. Run iotop and see how much disk IO your process is using.
Here's the python script that solved my problem. It deletes 500 files at a time, then takes a 2 second break to let the other processes do their business, then continues.
import os, os.path
import time

for root, dirs, files in os.walk('/dir/to/delete/files'):
    i = 0
    file_num = 0
    for f in files:
        fullpath = os.path.join(root, f)
        i = i + 1
        file_num = file_num + 1
        os.remove(fullpath)
        if i % 500 == 1:
            time.sleep(2)
    print("Deleted %i files" % file_num)
Hope this helps some people.
If you need to deal with a space-limit issue on a very large file tree (in my case many Perforce branches), and the find-and-delete process sometimes hangs while running:
Here's a script that I schedule daily to find all directories containing a specific file ("ChangesLog.txt"),
then sort the directories found that are older than 2 days and remove the first matched directory (there could be a new match on each scheduled run):
bash -c "echo #echo Creating Cleanup_Branch.cmd on %COMPUTERNAME% - %~dp0 > Cleanup_Branch.cmd"
bash -c "echo -n 'bash -c \"find ' >> Cleanup_Branch.cmd"
rm -f dirToDelete.txt
rem cd. > dirToDelete.txt
bash -c "find .. -maxdepth 9 -regex ".+ChangesLog.txt" -exec echo {} >> dirToDelete.txt \; & pid=$!; sleep 100; kill $pid "
sed -e 's/\(.*\)\/.*/\1/' -e 's/^./"&/;s/.$/&" /' dirToDelete.txt | tr '\n' ' ' >> Cleanup_Branch.cmd
bash -c "echo -n '-maxdepth 0 -type d -mtime +2 | xargs -r ls -trd | head -n1 | xargs -t rm -Rf' >> Cleanup_Branch.cmd"
bash -c 'echo -n \" >> Cleanup_Branch.cmd'
call Cleanup_Branch.cmd
Note the requirements:
Deleting only those directories that contain "ChangesLog.txt", since other old directories should not be deleted.
Calling the OS commands through cygwin directly, since otherwise the Windows default commands would be used.
Collecting the directories to delete into an external text file, in order to save the find results, since the find process sometimes hangs.
Setting a timeout on the find process by running it as a background process (&) that is killed after 100 seconds.
Sorting the directories oldest first, to set the delete priority.
If you have a reasonably modern version of find (4.2.3 or greater) you can use the -delete flag.
find <dir> -type f -delete
If you have version 4.2.12 or greater you can take advantage of xargs-style command-line batching via the -exec ... {} + form. This way you don't run a separate copy of /bin/rm for every file.
find <dir> -type f -exec rm {} \+
The previous commands are good.
rm -rf directory/ also works fast for billions of files in one folder. I tried it.
If you would like to delete tons of files as soon as possible, try this:
find . -type f -print0 | xargs -P 0 -0 rm -f
Note: the -P 0 option makes xargs run as many processes in parallel as possible.
mv large_folder /tmp/.
sudo reboot
The call to mv is fast: it just renames the directory entry (provided /tmp is on the same filesystem). A system reboot will then clear the /tmp folder (mount it again?) in the fastest way possible.
You can create an empty directory and rsync it to the directory you need to empty.
You will avoid timeout and out-of-memory issues.
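A minimal sketch of that approach (directory names are placeholders):
mkdir /tmp/empty_dir
rsync -a --delete /tmp/empty_dir/ /path/to/huge_dir/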

Argument list too long error for rm, cp, mv commands

I have several hundred PDFs under a directory in UNIX. The names of the PDFs are really long (approx. 60 chars).
When I try to delete all PDFs together using the following command:
rm -f *.pdf
I get the following error:
/bin/rm: cannot execute [Argument list too long]
What is the solution to this error?
Does this error occur for mv and cp commands as well? If yes, how to solve for these commands?
The reason this occurs is because bash actually expands the asterisk to every matching file, producing a very long command line.
Try this:
find . -name "*.pdf" -print0 | xargs -0 rm
Warning: this is a recursive search and will find (and delete) files in subdirectories as well. Tack on -f to the rm command only if you are sure you don't want confirmation.
You can do the following to make the command non-recursive:
find . -maxdepth 1 -name "*.pdf" -print0 | xargs -0 rm
Another option is to use find's -delete flag:
find . -name "*.pdf" -delete
tl;dr
It's a kernel limitation on the size of the command line argument. Use a for loop instead.
Origin of problem
This is a system issue, related to execve and ARG_MAX constant. There is plenty of documentation about that (see man execve, debian's wiki, ARG_MAX details).
Basically, the expansion produces a command (with its parameters) that exceeds the ARG_MAX limit.
Up to kernel 2.6.22, the limit was fixed at 128 kB. It has since been increased (it now depends on the stack size), and you can get its value by executing:
getconf ARG_MAX
# 2097152 # on 3.5.0-40-generic
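To get a rough feel for whether a given glob would blow past that limit, you can measure the expansion yourself; a rough sketch (it ignores the environment, which also counts against ARG_MAX):
# approximate byte size of the expanded argument list
printf '%s\0' *.pdf | wc -c
# compare against the system limit
getconf ARG_MAX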
Solution: Using for Loop
Use a for loop, as recommended in BashFAQ/095; there is no limit except RAM/memory space:
Dry run to ascertain it will delete what you expect:
for f in *.pdf; do echo rm "$f"; done
And execute it:
for f in *.pdf; do rm "$f"; done
This is also a portable approach, as globbing has strong and consistent behavior among shells (it's part of the POSIX spec).
Note: As noted in several comments, this is indeed slower but more maintainable, as it can adapt to more complex scenarios, e.g. where one wants to do more than just one action.
Solution: Using find
If you insist, you can use find but really don't use xargs as it "is dangerous (broken, exploitable, etc.) when reading non-NUL-delimited input":
find . -maxdepth 1 -name '*.pdf' -delete
Using -maxdepth 1 ... -delete instead of -exec rm {} + allows find to simply execute the required system calls itself without using an external process, hence faster (thanks to @chepner's comment).
References
"I'm getting 'Argument list too long'. How can I process a large list in chunks?" (Greg's Wiki / wooledge)
execve(2) - Linux man page (search for ARG_MAX)
"Error: Argument list too long" (Debian wiki)
"Why do I get '/bin/sh: Argument list too long' when passing quoted arguments?" (Super User)
find has a -delete action:
find . -maxdepth 1 -name '*.pdf' -delete
Another option is to force xargs to process the commands in batches. For instance, to delete the files 100 at a time, cd into the directory and run this:
echo *.pdf | xargs -n 100 rm
If you’re trying to delete a very large number of files at one time (I deleted a directory with 485,000+ today), you will probably run into this error:
/bin/rm: Argument list too long.
The problem is that when you type something like rm -rf *, the * is replaced with a list of every matching file, like “rm -rf file1 file2 file3 file4” and so on. There is a relatively small buffer of memory allocated to storing this list of arguments and if it is filled up, the shell will not execute the program.
To get around this problem, a lot of people will use the find command to find every file and pass them one-by-one to the “rm” command like this:
find . -type f -exec rm -v {} \;
My problem is that I needed to delete 500,000 files and it was taking way too long.
I stumbled upon a much faster way of deleting files – the “find” command has a “-delete” flag built right in! Here’s what I ended up using:
find . -type f -delete
Using this method, I was deleting files at a rate of about 2000 files/second – much faster!
You can also show the filenames as you’re deleting them:
find . -type f -print -delete
…or even show how many files will be deleted, then time how long it takes to delete them:
root@devel# ls -1 | wc -l && time find . -type f -delete
100000
real 0m3.660s
user 0m0.036s
sys 0m0.552s
Or you can try:
find . -name '*.pdf' -exec rm -f {} \;
you can try this:
for f in *.pdf
do
rm "$f"
done
EDIT:
ThiefMaster's comment suggested that I not disclose such a dangerous practice to young shell jedis, so I'll add a "safer" version (for the sake of preserving things when someone has a file named "-rf . ..pdf"):
echo "# Whooooo" > /tmp/dummy.sh
for f in *.pdf
do
echo "rm -i \"$f\""
done >> /tmp/dummy.sh
After running the above, just open the /tmp/dummy.sh file in your favorite editor and check every single line for dangerous filenames, commenting them out if found.
Then copy the dummy.sh script in your working dir and run it.
All this for security reasons.
For someone who doesn't have time:
Run the following command in a terminal.
ulimit -S -s unlimited
Then perform cp/mv/rm operation.
I'm surprised there are no ulimit answers here. Every time I have this problem I end up digging up the same workarounds. I understand this solution has limitations, but ulimit -s 65536 seems to often do the trick for me.
You could use a bash array:
files=(*.pdf)
for (( I=0; I<${#files[@]}; I+=1000 )); do
    rm -f "${files[@]:I:1000}"
done
This way it will erase in batches of 1000 files per step.
You can use this command:
find -name "*.pdf" -delete
The rm command has a limit on the number of files you can remove at once.
One possibility is to run the rm command multiple times, based on your file patterns, for example:
rm -f A*.pdf
rm -f B*.pdf
rm -f C*.pdf
...
rm -f *.pdf
You can also remove them through the find command:
find . -name "*.pdf" -exec rm {} \;
If they are filenames with spaces or special characters, use:
find -name "*.pdf" -delete
For files in current directory only:
find -maxdepth 1 -name '*.pdf' -delete
This command searches all files in the current directory (-maxdepth 1) with the pdf extension (-name '*.pdf'), and then deletes them.
I was facing the same problem while copying from a source directory to a destination.
The source directory had roughly 300,000 (3 lakh) files.
I used cp with the -r option and it worked for me:
cp -r abc/ def/
It copies all files from abc to def without giving the "Argument list too long" warning.
You can also use this if you want to delete files/folders older than 30/90 days (+) or newer than 30/90 days (-).
Example: to delete files older than 90 days (i.e. 91, 92, ..., 100 days old):
find <path> -type f -mtime +90 -exec rm -rf {} \;
Example: to delete only the files from the last 30 days, use (-):
find <path> -type f -mtime -30 -exec rm -rf {} \;
If you want to gzip files older than 2 days:
find <path> -type f -mtime +2 -exec gzip {} \;
If you want to list only the files/folders from the past month:
find <path> -type f -mtime -30 -exec ls -lrt {} \;
To list only the files/folders older than 30 days:
find <path> -type f -mtime +30 -exec ls -lrt {} \;
find /opt/app/logs -type f -mtime +30 -exec ls -lrt {} \;
And another one:
cd /path/to/pdf
printf "%s\0" *.[Pp][Dd][Ff] | xargs -0 rm
printf is a shell builtin, and as far as I know it always has been. Since printf is not an external command (but a builtin), it is not subject to the "argument list too long ..." fatal error.
So we can safely use it with shell globbing patterns such as *.[Pp][Dd][Ff], then pipe its output to the remove (rm) command through xargs, which batches the file names so that each rm invocation (an external command) stays within the command-line limit.
The \0 in printf serves as a null separator for the file names, which are then processed by xargs using it (-0) as the separator, so rm does not fail when there are spaces or other special characters in the file names.
Argument list too long
The question title covers cp, mv and rm, but the answers are mostly about rm.
Un*x commands
Read the command's man page carefully!
For cp and mv, there is a -t switch, for target:
find . -type f -name '*.pdf' -exec cp -ait "/path to target" {} +
and
find . -type f -name '*.pdf' -exec mv -t "/path to target" {} +
Script way
There is an overall workaround used in bash scripts:
#!/bin/bash
folder=( "/path to folder" "/path to another folder" )

if [ "$1" != "--run" ]; then
    exec find "${folder[@]}" -type f -name '*.pdf' -exec "$0" --run {} +
    exit 0;
fi
shift

for file; do
    printf "Doing something with '%s'.\n" "$file"
done
What about a shorter and more reliable one? (In bash, the **/ glob requires shopt -s globstar.)
for i in **/*.pdf; do rm "$i"; done
I had the same problem with a folder full of temporary images that was growing day by day, and this command helped me clear the folder:
find . -name "*.png" -mtime +50 -exec rm {} \;
The difference from the other commands is the -mtime parameter, which selects only the files older than X days (50 days in the example).
Using it multiple times, decreasing the day range on each execution, I was able to remove all the unnecessary files.
You can create a temp folder, move all the files and sub-folders you want to keep into the temp folder, then delete the old folder and rename the temp folder to the old folder. Try this example until you are confident enough to do it live:
mkdir testit
cd testit
mkdir big_folder tmp_folder
touch big_folder/file1.pdf
touch big_folder/file2.pdf
mv big_folder/file1.pdf tmp_folder/
rm -r big_folder
mv tmp_folder big_folder
The rm -r big_folder will remove all files in big_folder, no matter how many there are. You just have to be super careful that you first move all the files/folders you want to keep; in this case it was file1.pdf.
To delete all *.pdf files in a directory /path/to/dir_with_pdf_files/:
mkdir empty_dir # create a temporary empty dir
rsync -avh --delete --include '*.pdf' --exclude '*' empty_dir/ /path/to/dir_with_pdf_files/
Deleting specific files via rsync with a wildcard is probably the fastest solution if you have millions of files, and it will take care of the error you're getting.
(Optional step) DRY RUN: check what would be deleted without deleting anything:
rsync -avhn --delete --include '*.pdf' --exclude '*' empty_dir/ /path/to/dir_with_pdf_files/
I found that for extremely large lists of files (>1e6), these answers were too slow. Here is a solution using parallel processing in python. I know, I know, this isn't linux... but nothing else here worked.
(This saved me hours)
# delete files
import os
import glob
import multiprocessing as mp

directory = r'your/directory'
os.chdir(directory)
files_names = [i for i in glob.glob('*.{}'.format('pdf'))]

# report errors from the pool
def callback_error(result):
    print('error', result)

# delete a file using a system command
def delete_files(file_name):
    os.system('rm -rf ' + file_name)

if __name__ == '__main__':
    pool = mp.Pool(12)  # or use pool = mp.Pool(mp.cpu_count())
    for file_name in files_names:
        print(file_name)
        pool.apply_async(delete_files, [file_name], error_callback=callback_error)
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for all deletions to finish
If you want to remove both files and directories, you can use something like:
echo /path/* | xargs rm -rf
I only know a way around this.
The idea is to export the list of pdf files you have into a file, then split that file into several parts, and then remove the pdf files listed in each part.
ls | grep .pdf > list.txt
wc -l list.txt
wc -l counts how many lines list.txt contains. When you have an idea of how long it is, you can decide to split it in half, in quarters, or so, using the split -l command.
For example, to split it into chunks of 600 lines each:
split -l 600 list.txt
This will create files named xaa, xab, xac and so on, depending on how you split it.
Now, to "import" each list in those files into the rm command, use this:
rm $(<xaa)
rm $(<xab)
rm $(<xac)
Sorry for my bad english.
I ran into this problem a few times. Many of the solutions will run the rm command for each individual file that needs to be deleted. This is very inefficient:
find . -name "*.pdf" -print0 | xargs -0 rm -rf
I ended up writing a python script to delete the files based on the first 4 characters in the file-name:
import os

filedir = '/tmp/'  # the directory you wish to run rm on
filelist = os.listdir(filedir)  # gets a listing of all files in the specified dir
newlist = []  # makes a blank list named newlist

for i in filelist:
    if i[:4] not in newlist:  # this makes sure the prefixes in newlist are unique
        newlist.append(i[:4])  # this takes only the first 4 characters of the folder/filename and appends it to newlist

for i in newlist:
    if 'tmp' in i:  # if statement to look for 'tmp' in the filename/dirname
        print('Running command rm -rf ' + str(filedir) + str(i) + '* : File Count: ' + str(len(os.listdir(filedir))))  # prints the command to be run and a total file count
        os.system('rm -rf ' + str(filedir) + str(i) + '*')  # actual shell command

print('DONE')
This worked very well for me. I was able to clear out over 2 million temp files in a folder in about 15 minutes. I commented the tar out of the little bit of code so anyone with minimal to no python knowledge can manipulate this code.
I faced a similar problem when there were millions of useless log files created by an application, which filled up all the inodes. I resorted to locate, dumped all the located files into a text file, and then removed them one by one. It took a while, but it did the job!
I solved it with a for loop.
I am on macOS with zsh.
I moved thousands of jpg files with mv in a one-line command.
Be sure there are no spaces or special characters in the names of the files you are trying to move:
for i in $(find ~/old -type f -name "*.jpg"); do mv $i ~/new; done
A bit safer version than using xargs, and also not recursive:
ls -p | grep -v '/$' | grep '\.pdf$' | while read file; do rm "$file"; done
Filtering out directories here is a bit unnecessary, as rm won't delete them anyway and it could be removed for simplicity, but why run something that will definitely return an error?
Using GNU parallel (sudo apt install parallel) is super easy
It runs the command in parallel, where {} is the argument passed.
E.g.
ls /tmp/myfiles* | parallel 'rm {}'
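If the file names may contain spaces or other odd characters, a null-delimited variant avoids parsing ls output (this assumes GNU parallel's -0/--null option):
find /tmp -maxdepth 1 -name 'myfiles*' -print0 | parallel -0 rm -- {}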
To remove the first 100 files:
rm -rf $(ls | head -100)
