Unix: How to delete files listed in a file - linux

I have a long text file with a list of file masks I want to delete.
Example:
/tmp/aaa.jpg
/var/www1/*
/var/www/qwerty.php
I need to delete them. I tried rm `cat 1.txt`, but it says the list is too long.
I found the command below, but when I check the folders from the list, some of them still contain files:
xargs rm <1.txt
A manual rm call does remove files from those folders, so it is not a permissions issue.

This is not very efficient, but will work if you need glob patterns (as in /var/www/*)
for f in $(cat 1.txt) ; do
rm "$f"
done
If you don't have any patterns and are sure your paths in the file do not contain whitespaces or other weird things, you can use xargs like so:
xargs rm < 1.txt

Assuming that the list of files is in the file 1.txt, then do:
xargs rm -r <1.txt
The -r option causes recursion into any directories named in 1.txt.
If any files are read-only, use the -f option to force the deletion:
xargs rm -rf <1.txt
Be cautious with input to any tool that does programmatic deletions. Make certain that the files named in the input file are really to be deleted. Be especially careful about seemingly simple typos. For example, if you enter a space between a file and its suffix, it will appear to be two separate file names:
file .txt
is actually two separate files: file and .txt.
This may not seem so dangerous, but if the typo is something like this:
myoldfiles *
Then instead of deleting all files that begin with myoldfiles, you'll end up deleting myoldfiles and all non-dot-files and directories in the current directory. Probably not what you wanted.

Use this:
while IFS= read -r file ; do rm -- "$file" ; done < delete.list
If you need glob expansion you can omit quoting $file:
IFS=""
while read -r file ; do rm -- $file ; done < delete.list
But be warned that file names can contain "problematic" content, so I would not use the unquoted version. Imagine these patterns in the file:
*
*/*
*/*/*
This would delete quite a lot from the current directory! I would encourage you to prepare the delete list in a way that glob patterns aren't required anymore, and then use quoting like in my first example.
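Before deleting by glob, it can help to preview what each pattern would actually match. The following is only a minimal sketch, assuming a POSIX sh; the scratch directory, the file names, and the matched counter are made up for illustration and are not part of the question:

```shell
#!/bin/sh
# Hypothetical scratch data, not from the question.
demo=$(mktemp -d)
touch "$demo/one.jpg" "$demo/two.jpg" "$demo/keep.txt"
printf '%s\n' "$demo/*.jpg" > "$demo/delete.list"

matched=0
old_ifs=$IFS
IFS=        # empty IFS: no word splitting, but globs still expand
while read -r pattern; do
    for path in $pattern; do          # unquoted on purpose
        [ -e "$path" ] || continue    # the pattern matched nothing
        echo "would remove: $path"
        matched=$((matched + 1))
    done
done < "$demo/delete.list"
IFS=$old_ifs

rm -rf "$demo"
```

Once the preview looks right, the echo line can be swapped for rm -- "$path".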

You could use -d '\n' to make the newline character the delimiter (this option is specific to GNU xargs):
xargs -d '\n' rm < 1.txt
Be careful with -rf, because it can delete things you did not intend if 1.txt contains paths with spaces. That's why the newline delimiter is a bit safer.
On BSD systems, where xargs may lack the -d option, you can convert the newlines to NUL characters and use the -0 option instead (-0 on its own expects NUL-delimited input, not newlines):
tr '\n' '\0' < 1.txt | xargs -0 rm
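To see why the delimiter matters, here is a small sketch that deletes a file whose name contains a space. It assumes an xargs that supports -0 (GNU and BSD both do); the scratch directory and file names are hypothetical:

```shell
#!/bin/sh
# Hypothetical demo: the scratch directory and file names are made up.
demo=$(mktemp -d)
touch "$demo/a b.txt" "$demo/c.txt"
printf '%s\n' "$demo/a b.txt" "$demo/c.txt" > "$demo/list"

# Convert newlines to NULs so that each line becomes exactly one argument.
tr '\n' '\0' < "$demo/list" | xargs -0 rm -f --

# Count what is left besides the list itself.
remaining=$(find "$demo" -type f ! -name list | wc -l | tr -d ' ')
echo "files remaining: $remaining"
rm -rf "$demo"
```

With plain xargs rm < list, the name "a b.txt" would be split into two arguments and the file would survive.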

xargs -I{} sh -c 'rm "{}"' < 1.txt should do what you want. Be careful with this command as one incorrect entry in that file could cause a lot of trouble.
This answer was edited after @tdavies pointed out that the original did not do shell expansion.

You can use this one-liner:
cat 1.txt | xargs echo rm | sh
Which does shell expansion but executes rm the minimum number of times.

Just to provide another way, you can also simply use the following command:
$ cat to_remove
/tmp/file1
/tmp/file2
/tmp/file3
$ rm $( cat to_remove )

In this particular case, due to the dangers cited in other answers, I would:
Edit the list in e.g. Vim and run :%s/\s/\\\0/g, escaping all whitespace characters with a backslash.
Then run :%s/^/rm -rf /, prepending the command. With -r you don't have to worry about directories being listed after the files they contain, and with -f it won't complain about missing files or duplicate entries.
Run all the commands: $ source 1.txt

cat 1.txt | xargs rm -f
This removes the listed files only.
cat 1.txt | xargs rm -rf
This also removes the listed directories recursively.

Here's another looping example. This one also contains an 'if-statement' as an example of checking to see if the entry is a 'file' (or a 'directory' for example):
for f in $(cat 1.txt); do if [ -f "$f" ]; then rm -- "$f"; fi; done

Here you can use a set of folders from deletelist.txt while keeping files that match some patterns (csh syntax):
foreach f (`cat deletelist.txt`)
rm -rf `ls | egrep -v "needthisfile|*.cpp|*.h"`
end

This will allow file names to have spaces (reproducible example).
# Select files of interest, here, only text files for ex.
find -type f -exec file {} \; > findresult.txt
grep ": ASCII text$" findresult.txt > textfiles.txt
# leave only the path to the file removing suffix and prefix
sed -i -e 's/:.*$//' textfiles.txt
sed -i -e 's/\.\///' textfiles.txt
#write a script that deletes the files in textfiles.txt
IFS_backup=$IFS
IFS=$(echo "\n\b")
for f in $(cat textfiles.txt);
do
rm "$f";
done
IFS=$IFS_backup
# save script as "some.sh" and run: sh some.sh

In case somebody prefers sed and removing without wildcard expansion:
sed -e "s/^\(.*\)$/rm -f -- \'\1\'/" deletelist.txt | /bin/sh
Reminder: use absolute pathnames in the file or make sure you are in the right directory.
And for completeness the same with awk:
awk '{printf "rm -f -- '\''%s'\''\n",$0}' deletelist.txt | /bin/sh
Wildcard expansion will work if the single quotes are removed, but this is dangerous if a filename contains spaces. In that case the quotes would have to go around everything except the wildcards.
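One gap in the sed variant is a file name that itself contains a single quote. A hedged sketch of one way to handle it, by rewriting each ' as '\'' before wrapping the line in quotes; the scratch directory and file names are hypothetical, and names containing newlines are still not covered:

```shell
#!/bin/sh
# Hypothetical scratch data; the names are made up for illustration.
demo=$(mktemp -d)
touch "$demo/it's here.txt" "$demo/plain.txt"
printf '%s\n' "$demo/it's here.txt" "$demo/plain.txt" > "$demo/deletelist.txt"

# 1) turn each ' into '\''  2) prepend rm and an opening quote  3) append a closing quote
sed -e "s/'/'\\\\''/g" -e "s/^/rm -f -- '/" -e "s/\$/'/" \
    "$demo/deletelist.txt" > "$demo/script.sh"

cat "$demo/script.sh"      # inspect the generated commands before running them
sh "$demo/script.sh"

remaining=$(find "$demo" -type f ! -name deletelist.txt ! -name script.sh | wc -l | tr -d ' ')
rm -rf "$demo"
```

Writing the commands to a file first, rather than piping straight into /bin/sh, leaves a chance to review them.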

Related

Generating a script to delete a list of files

I have a file containing a list of paths I want to delete.
Adding rm in front of each path (to generate a script that will run these deletions) seems like the obvious approach. How can I do this?
Changing a list of filenames into a shell script by prepending rm to the beginning of each line is dangerous practice: Filenames may not map to themselves when interpreted by a shell, and may even have side effects that include running arbitrary commands. Don't do that.
If you want to delete all files named in a file, just use xargs to directly invoke rm with the filenames passed:
xargs rm -f -- <input-file
Note that this will have xargs attempt to interpret escape characters, quotes, etc. inside the names; if you don't want this, and have GNU xargs:
xargs -d $'\n' rm -f -- <input-file
Similarly, if you had control over your input file's format, you should use a NUL-delimited stream of filenames rather than a newline-delimited list of names. (This is because POSIX filesystems allow newline literals inside filenames). If your input file is null-delimited, then you can use:
xargs -0 rm -f -- <null-delimited-input-file
If you really want to generate a shell script that will delete a listed set of names, by the way, you can do this in bash, like so:
while IFS= read -r filename; do
printf 'rm -f -- %q\n' "$filename"
done <input-list >output-script
Using printf %q escapes content in such a way that when reread by bash, it will be parsed as its literal contents (thus, putting backslashes before characters like * or $ which might otherwise be interpreted).
That said, because this invokes rm once per file, it will be less efficient than xargs (which passes multiple filenames to each rm invocation).
That said, there actually is a middle ground: you can have xargs invoke bash and generate a safely quoted list in the latter, with only a minimal number of invocations:
{
echo "#!/bin/bash"
xargs bash -c 'printf "rm -f -- "; printf "%q " "$@"; printf "\n"' _
} <input-file >output-script
You can use sed:
sed 's/^/rm /' foo.sh > foo2.sh
^ matches the beginning of a line, so "rm " is inserted at the start of each line.

Remove directory based on content of text file, Linux

I have a directory full of sub-directories that look like this:
Track_0000111
Track_0004444
Track_0022222
Track_0333333
Track_5555555
I would like to remove certain directories if they are contained within a list in the file "RemoveFromTop6000_reformatted.txt"
The contents of the text file look like this:
Track_0000111
Track_0022222
Track_0333333
I tried to write a small script to handle this, but it does not seem to work:
#!/bin/bash
for file in cat RemoveFromTop6000_reformatted.txt; do
rm -rfv $file
done
Unfortunately this simply removes the text files, rather than the directories. Any tips?
Thanks!
You forgot backquotes around your call to cat. Without them, rm will simply delete the files cat (which probably doesn't exist, but you might not notice because you're using rm -f) and RemoveFromTop6000_reformatted.txt
Try this:
#!/bin/bash
for file in `cat RemoveFromTop6000_reformatted.txt`; do
rm -rv "$file"
done
or, more simply,
rm -rv `cat RemoveFromTop6000_reformatted.txt`
(but this will only work if the directory names don't contain whitespace).
There is no need for a for loop; for something like this you can use while read ...; do ... done < file, like this:
#!/bin/bash
while IFS= read -r file; do
rm -rfv "$file"
done < RemoveFromTop6000_reformatted.txt
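A self-contained sketch of the same while read loop, run against a scratch directory so it can be tried safely. The directory layout mirrors the question, but the scratch path and the remove.txt name are assumptions made for the demo:

```shell
#!/bin/sh
# Hypothetical scratch setup mirroring the question's layout.
demo=$(mktemp -d)
mkdir "$demo/Track_0000111" "$demo/Track_0004444" "$demo/Track_0022222"
printf '%s\n' Track_0000111 Track_0022222 > "$demo/remove.txt"

while IFS= read -r dir; do
    [ -n "$dir" ] || continue              # skip blank lines
    if [ -d "$demo/$dir" ]; then           # only remove real directories
        rm -r -- "$demo/$dir"
    fi
done < "$demo/remove.txt"

left=$(cd "$demo" && ls -d Track_*)
echo "left: $left"
rm -rf "$demo"
```

The -d check is an extra guard: a stray line in the list then cannot delete a regular file by accident.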
you can try below command,
Command:
sed 's/^/"/g' sample.txt | sed 's/$/"/g' | xargs rm -rfv
Description:
The command removes the files as well as the directories listed in "sample.txt".
NOTE:
In your case, make sure that "RemoveFromTop6000_reformatted.txt"
contains only directory names.
The command will also work if the directory names contain whitespace.

Removing 10 Characters of Filename in Linux

I just downloaded about 600 files from my server and need to remove the last 11 characters from the filename (not including the extension). I use Ubuntu and I am searching for a command to achieve this.
Some examples are as follows:
aarondyne_kh2_13thstruggle_or_1250556383.mus should be renamed to aarondyne_kh2_13thstruggle_or.mus
aarondyne_kh2_darknessofunknow_1250556659.mp3 should be renamed to aarondyne_kh2_darknessofunknow.mp3
It seems that some duplicates might exist after I do this, but if the command fails to complete and tells me what the duplicates would be, I can always remove those manually.
Try using the rename command. It allows you to rename files based on a regular expression:
The following line should work out for you:
rename 's/_\d+(\.[a-z0-9A-Z]+)$/$1/' *
The following changes will occur:
aarondyne_kh2_13thstruggle_or_1250556383.mus renamed as aarondyne_kh2_13thstruggle_or.mus
aarondyne_kh2_darknessofunknow_1250556659.mp3 renamed as aarondyne_kh2_darknessofunknow.mp3
You can check the actions rename will do via specifying the -n flag, like this:
rename -n 's/_\d+(\.[a-z0-9A-Z]+)$/$1/' *
For more information on how to use rename simply open the manpage via: man rename
Not the prettiest, but very simple:
echo "$filename" | sed -e 's!\(.*\)...........\(\.[^.]*\)!\1\2!'
You'll still need to write the rest of the script, but it's pretty simple.
find . -type f -exec sh -c 'mv -- "$1" "$(printf %s "$1" | sed -E "s/[^/]{11}(\.[^./]+)?$/\1/")"' _ {} \;
one way to go:
you get a list of your files, one per line (by ls maybe) then:
ls....|awk '{o=$0;sub(/_[^_.]*\./,".",$0);print "mv "o" "$0}'
this will print the mv a b command
e.g.
kent$ echo "aarondyne_kh2_13thstruggle_or_1250556383.mus"|awk '{o=$0;sub(/_[^_.]*\./,".",$0);print "mv "o" "$0}'
mv aarondyne_kh2_13thstruggle_or_1250556383.mus aarondyne_kh2_13thstruggle_or.mus
to execute, just pipe it to |sh
I assume there is no space in your filename.
This script assumes each file has just one extension. It would, for instance, rename "foo.something.mus" to "foo.mus". To keep all extensions, remove one hash mark (#) from the first line of the loop body. It also assumes that the base of each filename has at least 12 characters, so that removing 11 doesn't leave you with an empty name.
for f in *; do
ext=${f##*.}
new_f=${f%???????????.$ext}.$ext
if [ -f "$new_f" ]; then
echo "Will not rename $f, $new_f already exists" >&2
else
mv "$f" "$new_f"
fi
done
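The parameter expansions can be tried without touching any files. A small sketch, assuming a POSIX sh; strip_suffix is a hypothetical helper name and only prints the new name:

```shell
#!/bin/sh
# strip_suffix is a hypothetical helper: it prints the new name and renames nothing.
strip_suffix() {
    name=$1
    ext=${name##*.}       # text after the last dot
    stem=${name%.*}       # text before the last dot
    # Each ? matches one character, so eleven of them drop the last 11 characters.
    printf '%s.%s\n' "${stem%???????????}" "$ext"
}

renamed=$(strip_suffix "aarondyne_kh2_13thstruggle_or_1250556383.mus")
echo "$renamed"
```

For the first example from the question this prints aarondyne_kh2_13thstruggle_or.mus.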

Remove all files of a certain type except for one type in linux terminal

On my computer running Ubuntu, I have a folder full of hundreds of files all named "index.html.n", where n starts at one and continues upwards. Some of those files are actual HTML files, some are image files (png and jpg), and some of them are zip files.
My goal is to permanently remove every single file except the zip archives. I assume it's some combination of rm and file, but I'm not sure of the exact syntax.
If it fits into your argument list and no filenames contain colon a simple pipe with xargs should do:
file * | grep -vi zip | cut -d: -f1 | tr '\n' '\0' | xargs -0 rm
First find locates the matching files, then file reports their types. sed drops the Zip archives and strips everything but the filenames from the output of file. Lastly, rm deletes what is left:
find . -name 'index.html.[0-9]*' | \
xargs file | \
sed -n '/Zip archive/!s/:.*//p' |
xargs rm
I would run:
for f in index.html.*
do
file "$f" | grep -qi zip
[ $? -ne 0 ] && rm -i "$f"
done
and remove -i option if you feel confident enough
Here's the approach I'd use; it's not entirely automated, but it's less error-prone than some other approaches.
file * > cleanup.sh
or
file index.html.* > cleanup.sh
This generates a list of all files (excluding dot files), or of all index.html.* files, in your current directory and writes the list to cleanup.sh.
Using your favorite text editor (mine happens to be vim), edit cleanup.sh:
Add #!/bin/sh as the first line
Delete all lines containing the string "Zip archive"
On each line, delete everything from the : to the end of the line (in vim, :%s/:.*$//)
Replace the beginning of each line with "rm" followed by a space
Exit your editor, updating the file.
chmod +x cleanup.sh
You should now have a shell script that will delete everything except zip files.
Carefully inspect the script before running it. Look out for typos, and for files whose names contain shell metacharacters. You might need to add quotation marks to the file names.
(Note that if you do this as a one-line shell command, you don't have the opportunity to inspect the list of files you're going to delete before you actually delete them.)
Once you're satisfied that your script is correct, run
./cleanup.sh
from your shell prompt.
for i in index.html.*
do
type=$(file -- "$i")
if [[ ! $type =~ Zip ]]
then
rm -- "$i"
fi
done
Change the rm to a ls for testing purposes.

recursively "normalize" filenames

i mean getting rid of special chars in filenames, etc.
i have made a script, that can recursively rename files [http://pastebin.com/raw.php?i=kXeHbDQw]:
e.g.: before:
THIS i.s my file (1).txt
after running the script:
This-i-s-my-file-1.txt
Ok. here it is:
But: when i wanted to test it "fully", with filenames like this:
¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÂÃÄÅÆÇÈÊËÌÎÏÐÑÒÔÕ×ØÙUÛUÝÞßàâãäåæçèêëìîïðñòôõ÷øùûýþÿ.txt
áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&'()*+,:;<=>?#[\]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£.txt
it fails [http://pastebin.com/raw.php?i=iu8Pwrnr]:
$ sh renamer.sh directorythathasthefiles
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?#[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?#[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?#[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?#[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?#[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?#[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?#[]^_`{|}~€‚ƒ„…†....and so on
$
so "mv" can't handle special chars.. :\
i worked on it for many hours..
does anyone have a working one? [one that can handle the chars [filenames] in those 2 lines too?]
mv handles special characters just fine. Your script doesn't.
In no particular order:
You are using find to find all directories, and ls each directory separately.
Why use for DEPTH in... if you can do exactly the same with one command?
find -maxdepth 100 -type d
Which makes the arbitrary depth limit unnecessary
find -type d
Don't ever parse the output of ls, especially if you can let find handle that, too
find -not -type d
Make sure it works in the worst possible case:
find -not -type d -print0 | while IFS= read -r -d '' FILENAME; do
This stops read from eating certain escapes and choking on filenames with new-line characters.
You are repeating the entire ls | replace cycle for every single character. Don't - it kills performance. Loop over all the files in each directory once, and just use multiple seds, or multiple replacements in one sed command.
sed 's/á/a/g; s/í/i/g; ...'
(I was going to suggest sed 'y/áí/ai/', but unfortunately that doesn't seem to work with Unicode. Perhaps perl -CS -Mutf8 -pe 'y/áí/ai/' would.)
You're still thinking in ASCII: "other special chars - ASCII Codes 33.. ..255". Don't.
These days, most systems use Unicode in UTF-8 encoding, which has a much wider range of "special" characters - so big that listing them out one by one becomes pointless. (It is even multibyte - "e" is one byte, "ė" is three bytes.)
True ASCII has 128 characters. What you currently have in mind are the ISO 8859 character sets (sometimes called "ANSI") - in particular, ISO 8859-1. But they go all the way up to 8859-16, and only the "ASCII" part stays the same.
echo -n $(command) is rather useless.
There are much easier ways to find the directory and basename given a path. For example, you can do
directory=$(dirname "$path")
oldname=$(basename "$path")
# filter $oldname
mv "$path" "$directory/$newname"
Do not use egrep to check for errors. Check the program's return code. (Like you already do with cd.)
And instead of filtering out other errors, do...
if [[ -e $directory/$newname ]]; then
echo "target already exists, skipping: $oldname -> $newname"
continue
else
mv "$path" "$directory/$newname"
fi
The ton of sed 's/------------/-/g' calls can be changed to a single regexp:
sed -r 's/-{2,}/-/g'
The [ ]s in tr [foo] [bar] are unnecessary. They just cause tr to replace [ to [, and ] to ].
Seriously?
echo "$FOLDERNAME" | sed "s/$/\//g"
How about this instead?
echo "$FOLDERNAME/"
And finally, use detox.
Try something like:
find . -type f -print0 | awk 'BEGIN {RS="\x00"} { printf "%s\x00", $0; gsub("[^[:alnum:]]", "-"); printf "%s\0", $0 }' | xargs -0 -n 2 mv
Using xargs(1) ensures that each filename is passed as exactly one parameter; awk(1) emits the new filename right after the old one. As written, gsub also replaces the slashes and dots in the path, so restrict it to the basename before running this for real.
One more trick: sed -E 's/-+/-/g' will replace runs of more than one "-" with exactly one. (Without -E, the + is taken literally.)
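The dash-collapsing trick can be combined with the character replacement in a single sed call. This is only a simplified sketch, assuming a sed that supports -E; normalize is a hypothetical helper, and unlike the asker's script it keeps dots and letter case:

```shell
#!/bin/sh
# normalize is a hypothetical helper: it prints a cleaned-up name and renames nothing.
normalize() {
    printf '%s\n' "$1" | sed -E '
        s/[^[:alnum:].]+/-/g   # every run of other characters becomes one dash
        s/-+/-/g               # collapse repeated dashes
        s/-\././g              # no dash directly before a dot
        s/^-//                 # no leading dash
        s/-$//                 # no trailing dash
    '
}

clean=$(normalize "THIS i.s my file (1).txt")
echo "$clean"
```

What counts as alphanumeric depends on the locale, so accented letters may either be kept or turned into dashes.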
Assuming the rest of your script is right, your problem is that you are using read but you should use read -r. Notice how the backslash disappeared:
áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&'()*+,:;<=>?#[\]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£.txt
áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?#[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£
Ugh...
Some tips to clean up your script:
** Use sed to do translation on multiple characters at once, that'll clean things up and make it easier to manage:
dev:~$ echo 'áàaieeé!.txt' | sed -e 's/[áàã]/a/g; s/[éè]/e/g'
aaaieee!.txt
** rather than renaming the file for each change, run all your filters then do one move
$ NEWNAME='áàaieeé!.txt'
$ NEWNAME="$(echo "$NEWNAME" | sed -e 's/[áàã]/a/g; s/[éè]/e/g')"
$ NEWNAME="$(echo "$NEWNAME" | sed -e 's/aa*/a/g')"
$ echo $NEWNAME
aieee!.txt
** rather than doing a ls | read ... loop, use:
for OLDNAME in "$DIR"/*; do
blah
blah
blah
done
** separate out your path traversal and renaming logic into two scripts. One script finds the files which need to be renamed, one script handles the normalization of a single file. Once you learn the 'find' command, you'll realize you can toss the first script :)

Resources