cronjob to remove files older than 99 days - linux

I have to make a cronjob to remove files older than 99 days in a particular directory but I'm not sure the file names were made by trustworthy Linux users. I must expect special characters, spaces, slash characters, and others.
Here is what I think could work:
find /path/to/files -mtime +99 -exec rm {} \;
But I suspect this will fail if there are special characters, or if it finds a file that's read-only (cron may not be run with superuser privileges). I need it to keep going when it encounters such files.

When you use -exec rm {} \;, you shouldn't have any problems with spaces, tabs, returns, or special characters because find calls the rm command directly and passes it the name of each file one at a time.
Directories won't be removed with that command because you aren't passing it the -r parameter, and you probably don't want to. That could end up being a bit dangerous. You might also want to include the -f parameter to force removal in case you don't have write permission. Run the cron script as root, and you should be fine.
The only thing I'd worry about is that you might end up hitting a file that you don't want to remove, but that has not been modified in the past 100 days. For example, the password to stop the autodestruct sequence at your work. Chances are that file hasn't been modified in the past 100 days, but once that autodestruct sequence starts, you wouldn't want to be the one blamed because the password was lost.
Okay, more reasonable might be applications that are used but rarely modified. Maybe someone's resume that hasn't been updated because they are holding a current job, etc.
So, be careful with your assumptions. Just because a file hasn't been modified in 100 days doesn't mean it isn't used. A better criterion (although still questionable) is whether the file has been accessed in the last 100 days. Maybe use this as a final command:
find /path/to/files -atime +99 -type f -exec rm -f {} \;
One more thing...
Some find commands have a -delete parameter which can be used instead of the -exec rm parameter:
find /path/to/files -atime +99 -delete
That will delete both matching files and matching directories, with the caveat that find can only remove a directory once it is empty (with -delete, it processes a directory's contents before the directory itself).
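If you only want plain files removed, restricting the match keeps -delete away from directories entirely; a minimal variant, using the same placeholder path as above:
find /path/to/files -atime +99 -type f -delete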
One more small recommendation: for the first week, write the files found to a log file instead of removing them, then examine the log file. That way you make sure you're not about to delete something important. Once you're satisfied that nothing in the log is something you want to keep, switch the find command back to doing the delete for you.
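A minimal sketch of that log-first approach; the log path is just an example:
# week 1: only record what would be removed
find /path/to/files -atime +99 -type f > /var/log/old-files.log
# after reviewing the log, switch to actually deleting
find /path/to/files -atime +99 -type f -exec rm -f -- {} \;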

If you run rm with the -f option, your file is going to be deleted regardless of whether you have write permission on the file or not (all that matters is the containing folder). So, either you can erase all the files in the folder, or none. Also add -r if you want to erase subfolders.
And I have to say it: be very careful! You're playing with fire ;) I suggest you debug with something less harmful, like the file command.
You can test this out by creating a bunch of files, e.g.:
touch {a,b,c,d,e,f}
And setting permissions as desired on each of them.
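A hedged sketch of such a test run; the backdating relies on GNU touch's -d option, and file stands in for rm as a harmless dry run:
mkdir /tmp/findtest && cd /tmp/findtest
touch {a,b,c,d,e,f}
chmod 444 c                          # one read-only file
touch -d '120 days ago' a c e        # backdate a few files (GNU touch)
find . -mtime +99 -exec file {} \;   # dry run: report the matches instead of removing them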

You should use -execdir instead of -exec. Even better, read the full Security considerations for find chapter in the findutils manual.
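As a sketch, the command above rewritten with -execdir (supported by GNU and BSD find, though not plain POSIX) would look like:
find /path/to/files -atime +99 -type f -execdir rm -f -- {} \;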

Please, always use rm [opts] -- [files]; this will save you from errors with files whose names look like options, such as a file named -rf. The -- marks the end of options, so everything after it is treated as a file name.
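A small illustration of the difference; the file names are made up for the example:
touch -- -rf notes.txt   # create a file literally named "-rf" plus an ordinary file
rm *                     # risky: if "-rf" lands first in the expansion, rm parses it as options
rm -f -- *               # safe: everything after -- is treated as a file name, "-rf" included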

Related

How to find the source of a copied file?

I have a file that I copied some time back, but I forgot its source. Is there a way to find the source of the copied file? I don't remember which terminal I used, so I can't check with Esc+P.
Command used: cp -rf $source/file $destination/file
Thanks in advance!
You could try history | grep your_filename.
A Linux system has many files (and if you think of /proc/, they can change at any moment). And some other process could be writing, creating, appending to, or truncating files (e.g. some crontab(1) job...)
Assume you do know some parent directory containing the source file. Suppose it is /home/foo.
Then, you might use find(1) and some hashing command like md5sum(1) to compute and collect the hash of every file.
Use the property that two files A and B with identical contents (the same sequence of bytes) have the same md5sum. Of course, the converse does not hold in general (different contents can collide), but in practice that is extremely unlikely.
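A quick illustration of that property (the file names are arbitrary):
printf 'same contents\n' > A
cp A B
md5sum A B    # both lines show the same hash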
So run first
find /home/foo -type f -exec md5sum '{}' \; > /tmp/foo-md5
then do seekingmd5=$(md5sum A | awk '{print $1}') to keep just the hash (md5sum also prints the file name, which would spoil the search)
then grep "$seekingmd5" /tmp/foo-md5 will list the files whose md5sum matches your original A.
Depending on your filesystem and hardware, this could take hours.
You could accelerate things slightly by writing a C program using nftw(3) with md5init etc...

bash: get path from current directory given sub-directory name

Trying to write a script to clean up environment files after a resource is deleted. The problem is that all the script is given as input is the name of the resource (this cannot be changed), with zero identifying information beyond that. How can I find the path of the directory the resource is sitting in?
The directory is set up a bit like the following, although much more extensive. All of these are directories, not files. There can be as many as 40+ directories to search, but the desired one is generally not more than 2-3 directories deep.
foo
    aaa
        aaa_green
        aaa_blue
    bbb
    ccc
        ccc_green
bar
    ddd
    eee
        eee_green
        eee_blue
    fff
        fff_green
        fff_blue
        fff_pink
I might be handed input like aaa_green or just ddd.
As an example, given eee_blue as input, I need to know eee_blue's path from the working directory so I can cd there and delete the directory. I.e., I would expect it to return bar/eee/eee_blue/ or bar/eee/; either is acceptable.
The "best" option I can see currently is to cd into the lowest level of each directory via multiple greps, get each's contents and look for a match, and when it does (eventually) match save that cd'ing as the path. This frankly sounds awful and inefficient.
The only other alternative method I could think of was a straight recursive grep, but I tested it and at 8 minutes it still hadn't finished running.
This script needs to run on both Mac and Linux, although in a desperate pinch I could go Linux-only.
The standard Unix tool for doing this sort of task is the find command. The GNU version of find has more extensive options than the POSIX specification (by quite a margin). The version on macOS Sierra (and Mac OS X) is similar to the GNU version. I found an online manual for OS X 10.9 at Apple find, but there's probably a better location somewhere.
It looks like you might want to run:
find . -name 'eee_blue'
which will print the names of matching files or directories, or perhaps:
find . -name 'eee_blue' -exec rm -fr {} +
which will run the rm -fr command on the names it finds. You can run a custom script of your own in place of rm -fr if you prefer; that's what I do when the logic gets complex.
Be extremely cautious before using rm -fr automatically!
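A minimal sketch of such a wrapper, assuming the resource name arrives as the script's first argument and that only directories are candidates; the script name and messages are placeholders:
#!/bin/sh
# cleanup.sh resource_name -- hypothetical helper, not part of the answer above
resource="$1"
target=$(find . -type d -name "$resource" -print | head -n 1)
if [ -n "$target" ]; then
    printf 'removing %s\n' "$target"
    rm -fr -- "$target"
else
    printf 'no directory named %s found\n' "$resource" >&2
fi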

Deleting large no of folders in one go

I have a list of 1M folders/directories which I need to delete on my system. What is the best possible way to do it?
I am looking for the best possible solution, one which will not consume a lot of time, as I have some processes which will be waiting for its completion.
PS: I can put all the folder names in a file if required, or do it in batches if we cannot do it in one go.
Use the xargs tool. It will read all folder names from the file and call a command - in this case rmdir.
xargs rmdir < list_of_folders
If you are sure you can delete non-empty folders, use rm -r instead of rmdir.
I think this is about the fastest you can get. rmdir will act as fast as it can (simple OS call), and using xargs ensures that you do not create 1M separate processes.
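One caveat worth hedging: the plain redirection above assumes one simple path per line, with no spaces or quotes in the names. If the names can contain such characters, a NUL-separated list with xargs -0 is safer; the file names below are illustrative:
xargs -0 rmdir < list_of_folders0                # list_of_folders0 is NUL-separated (e.g. produced with find -print0)
tr '\n' '\0' < list_of_folders | xargs -0 rmdir  # or convert a newline-separated list, assuming no newlines inside names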
You may exploit that there are no "nested" directories in your list.
That is, if you have three folders a/, a/b/ and a/c/, and b/ and c/ are the only entries in a/, then you can omit a/b/ and a/c/ and just call rm -r a/.
But it is probably not worth checking that with ls, as ls also costs time which you are unlikely to win back.
The rm command is perfectly capable of handling this. Just give it the list of folders you need to delete (shell expansions can save you some time for that matter), and don't forget the -r switch.
Example using some common expansions:
rm -r folder_a src/dir_* app_{logs,src,bin}

how to delete 600 GBs of small files?

This was an interview question; they did not give any information about the files (extension? hidden files? location, i.e. a single directory or a directory tree?), so my first reaction to this question was:
rm -fr *
oh no, wait, should be:
rm -fr -- *
Then I realized that the above command would not remove hidden files, and quite frankly directories like . and .. might interfere, so my second and final thought was a shell script that uses find.
find -depth -type f -delete
I'm not sure if this is the right way of doing it, I'm wondering if there is a better way of doing this task.
It's not as obvious as it seems:
http://linuxnote.net/jianingy/en/linux/a-fast-way-to-remove-huge-number-of-files.html
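Independent of the linked article, here is a hedged sketch along the lines of the find approach in the question, first removing the files and then the directories they leave empty (GNU/BSD find options):
find . -depth -type f -delete                       # remove the files first
find . -mindepth 1 -depth -type d -empty -delete    # then remove the now-empty directories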

How to directly overwrite with 'unexpand' (spaces-to-tabs conversion)?

I'm trying to use something along the lines of
unexpand -t 4 *.php
but am unsure how to write this command to do what I want.
Weirdly,
unexpand -t 4 file.php > file.php
gives me an empty file. (i.e. overwriting file.php with nothing)
I can specify multiple files okay, but don't know how to then overwrite each file.
I could use my IDE, but there are ~67000 instances to be replaced across 200 files, and this would take a while.
I expect that the answers to my question(s) will be standard unix fare, but I'm still learning...
You can very seldom use output redirection to replace the input: the shell truncates file.php to zero length before unexpand even opens it, which is why you ended up with an empty file. In-place replacement only works with commands that support it internally (since they then do the basic steps themselves). From the shell level, it's far better to work in two steps, like so:
Do the operation on foo, creating foo.tmp
Move (rename) foo.tmp to foo, overwriting the original
This will be fast. It will require a bit more disk space, but if you do both steps before continuing to the next file, you will only need as much extra space as the largest single file, so this should not be a problem.
Sketch script:
for a in *.php
do
unexpand -t 4 "$a" > "$a-notab"
mv "$a-notab" "$a"
done
You could do better (error-checking, and so on), but that is the basic outline.
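For instance, a hedged variant with a little error-checking bolted on (still the same two-step idea):
for a in *.php
do
    unexpand -t 4 "$a" > "$a-notab" && mv "$a-notab" "$a" || {
        printf 'skipping %s\n' "$a" >&2
        rm -f -- "$a-notab"
    }
done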
Here's the command I used:
for p in $(find . -iname "*.js")
do
unexpand -t 4 $(dirname $p)/"$(basename $p)" > $(dirname $p)/"$(basename $p)-tab"
mv $(dirname $p)/"$(basename $p)-tab" $(dirname $p)/"$(basename $p)"
done
This version changes all files within the directory hierarchy rooted at the current working directory.
In my case, I only wanted to make this change to .js files; you can omit the iname clause from find if you wish, or use different args to cast your net differently.
My version quotes the filename inside the loop, but the unquoted $(find ...) expansion is still word-split by the shell, so paths containing spaces or other 'interesting' characters will break.
To get it all on one line, add a semicolon after lines 1, 3, and 4.
This is potentially dangerous, so make a backup or use git before running the command. If you're using git, you can verify that only whitespace was changed with git diff -w.
