Shell script - Find files modified today, create directory, and move them there - linux

I was wondering if there is a simple and concise way to write a shell script that goes through a series of directories (i.e., one for each student in a class), determines whether any files in that directory were modified within the last day, and only in that case creates a subdirectory and copies those files into it. If a directory has no files modified in the last 24 hours, it should remain untouched. My initial thought was this:
#!/bin/sh
cd /path/people/ #this directory has multiple subdirectories
for i in `ls`
do
    if find ./$i -mtime -1 -type f; then
        mkdir ./$i/updated_files
        #code to copy the files to the newly created directory
    fi
done
However, that seems to create /updated_files for all subdirectories, not just the ones that have recently modified files.

Heavier use of find will probably make your job much easier. Something like
find /path/people -mtime -1 -type f -printf "mkdir --parents %h/updated_files\n" | sort | uniq | sh
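To copy the files in the same pass, a hedged variant lets find run a small shell per file instead of generating a script, which also keeps file names with spaces intact:
find /path/people -mtime -1 -type f \
    -exec sh -c 'mkdir -p "${1%/*}/updated_files" && cp "$1" "${1%/*}/updated_files/"' sh {} \;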

The problem is that you are assuming the find command will fail if it finds nothing. The exit code is zero (success) even if it finds nothing that matches.
Something like
UPDATEDFILES=`find ./$i -mtime -1 -type f`
[ -z "$UPDATEDFILES" ] && continue
mkdir ...
cp ...
...
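Putting it together, a hedged sketch of the complete loop (mkdir -p and a second find are used so that file names with spaces survive the copy):
#!/bin/sh
cd /path/people || exit 1
for i in */; do
    # skip directories with no files modified in the last 24 hours
    UPDATEDFILES=`find "./$i" -mtime -1 -type f`
    [ -z "$UPDATEDFILES" ] && continue
    mkdir -p "./${i}updated_files"
    # copy the recent files, skipping anything already inside updated_files
    find "./$i" -path "*/updated_files" -prune -o -mtime -1 -type f \
        -exec cp {} "./${i}updated_files/" \;
done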

Related

How to find/list the directories where a particular sub-directory is not present

I am writing a shell script that checks whether a bin directory is present under each user's directory under /home. The bin directory can be present directly under the user's directory or under a child directory of it.
I mean, let's say I have a user amit under /home. The bin directory could be present directly as /amit/bin, or it could be present as /amit/jash/bin.
Now my requirement is to get a list of user directories where the bin directory is not present, either directly under the user directory or under a child directory of it. I tried the command:
find /home -type d ! -exec test -e '{}/bin' \; -print
but it is not working: it prints far more than the top-level user directories. However, when I replace the bin directory with some file, the command works fine. It looks like this command is particularly for files. Is there any similar command for directories? Any help on this will be greatly appreciated.
You're on the right track. The catch is that your test of "does the following directory NOT exist in this target" can't be expressed within find's conditions in such a way as to return only the top-level directory. So you need to nest, one way or another.
One strategy would be to use a for loop in bash:
$ mkdir foo bar baz one two
$ mkdir bar/bin baz/bin
$ for d in */; do find "$d" -type d -name bin | grep -q . || echo "$d"; done
foo/
one/
two/
This uses pathname expansion (globbing) to generate the list of directories to test, and then checks each one for the existence of a "bin" directory somewhere inside it. If that check fails (i.e. find outputs nothing), the directory is printed. In your case the glob would be /home/*/. Note the trailing slash, which ensures that you will only be searching within directories, rather than files that might accidentally exist in /home.
Another possibility might be to use nested finds, if you don't want to depend on bash:
$ find /home/ -mindepth 1 -maxdepth 1 -type d -not -exec sh -c 'find "$1" -type d -name bin | grep -q .' sh {} \; -print
/home/foo
/home/one
/home/two
This roughly duplicates the effect of the bash for loop above, but by nesting find within find -exec. It uses grep -q . to convert the output of find into an exit status that can be used as a condition for the outer find.
Note that since you're looking for a bin directory, we want to use test -d rather than test -e (which would also check for a bin file, which probably does not matter to you.)
Another option is to use bash process substitution. On multiple lines for easier reading:
cd /home/
comm -3 \
    <(printf '%s\n' */ | sed 's|/.*||' | sort) \
    <(find */ -type d -name bin | cut -d/ -f1 | uniq)
This unfortunately requires you to change to the /home directory before running, because of the way it strips off subdirectories. You can of course collapse this into a big long one-liner if you feel so inclined.
This comm solution also has the risk of failing on directories with special characters in their names, like newlines.
One last option is bash-only but more than a one-liner. It involves subtracting the directories containing "bin" from the full list. It uses an associative array and globstar, so it depends on bash version 4.
#!/usr/bin/env bash
shopt -s globstar
# Go to our root
cd /home
# Declare an associative array
declare -A dirs=()
# Populate the array with our "full" list of home directories
for d in */; do dirs[${d%/}]=""; done
# Remove directories that contain a "bin" somewhere inside 'em
for d in **/bin; do unset "dirs[${d%%/*}]"; done
# Print the result in reproducible form
declare -p dirs
# Or print the result just as a list of words.
printf '%s\n' "${!dirs[@]}"
Note that we're storing directories in the array index, which (1) makes it easy for us to find and delete items, and (2) ensures unique entries, even if one user has multiple "bin" directories under their home.
cd /home
find . -maxdepth 1 -type d ! -name . | sort > a
find . -type d -name bin | cut -d/ -f1,2 | sort > b
comm -23 a b
Here, I'm making two sorted lists. The first contains all the home directories, and the second contains the top parent of any bin subdirectory. Finally I output any items from the first list not present in the second.
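For illustration, here is comm -23 on two tiny hand-made lists (hypothetical session):
$ printf '%s\n' ./amit ./jash ./ravi > a
$ printf '%s\n' ./jash > b
$ comm -23 a b
./amit
./ravi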

Find the name of subdirectories and process files in each

Let's say /tmp has subdirectories /test1, /test2, /test3 and so on,
and each has multiple files inside.
I have to run a while loop or for loop to find the name of the directories (in this case /test1, /test2, ...)
and run a command that processes all the files inside of each directory.
So, for example,
I have to get the directory names under /tmp which will be test1, test2, ...
For each subdirectory, I have to process the files inside of it.
How can I do this?
Clarification:
This is the command that I want to run:
find /PROD/140725_D0/ -name "*.json" -exec /tmp/test.py {} \;
where 140725_D0 is an example of one subdirectory to process - there are multiples, with different names.
So, by using a for or while loop, I want to find all subdirectories and run a command on the files in each.
The for or while loop should iteratively replace the hard-coded name 140725_D0 in the find command above.
You should be able to do it with a single find command with an embedded shell command:
find /PROD -type d -execdir sh -c 'for f in "$1"/*.json; do /tmp/test.py "$f"; done' sh {} \;
Note: -execdir is not POSIX-compliant, but the BSD (OSX) and GNU (Linux) versions of find support it; see below for a POSIX alternative.
The approach is to let find match directories and then, for each match, execute a shell (sh -c '<shellCmd>') that loops over the matched directory's *.json files; -execdir runs the shell from the directory containing the match, and the match itself is passed to the shell as $1 (via {}).
If not all subdirectories are guaranteed to have *.json files, change the shell command to for f in "$1"/*.json; do [ -f "$f" ] && /tmp/test.py "$f"; done
Update: Two more considerations; tip of the hat to kenorb's answer:
By default, find processes the entire subtree of the input directory. To limit matching to immediate subdirectories, use -maxdepth 1[1]:
find /PROD -maxdepth 1 -type d ...
As stated, -execdir - which runs the command passed to it in the directory containing the current match - is not POSIX compliant; you can work around this by using -exec instead and by having the shell command cd to the directory path at hand (passed via {}):
find /PROD -type d -exec sh -c 'cd "$1" && for f in *.json; do /tmp/test.py "$f"; done' sh {} \;
[1] Strictly speaking, you can place the -maxdepth option anywhere after the input file paths on the find command line - as an option, it is not positional. However, GNU find will issue a warning unless you place it before tests (such as -type) and actions (such as -exec).
Try the following usage of find:
find . -type d -exec sh -c 'cd "$1" && echo "Do some stuff for $1, files are: $(ls *.*)"' sh {} ';'
Use -maxdepth if you'd like to limit your directory levels.
You can do this using bash's subshell feature like so
for i in /tmp/test*; do
    # don't do anything if there's no /test directory in /tmp
    [ "$i" != "/tmp/test*" ] || continue
    for j in "$i"/*.json; do
        # don't do anything if there's nothing to run
        [ "$j" != "$i/*.json" ] || continue
        (cd "$i" && ./file_to_run)
    done
done
When you wrap a command in ( and ) it runs in a subshell: a child copy of the current shell. Here that matters because the cd happens inside the subshell and therefore doesn't change the working directory of the enclosing script.
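A quick illustration of that isolation (hypothetical session):
$ pwd
/home/userbob
$ (cd /tmp && pwd)
/tmp
$ pwd
/home/userbob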
You can also simply ask the shell to expand the directories/files you need, e.g. using the xargs command:
echo /PROD/*/*.json | xargs -n 1 /tmp/test.py
or even using your original find command:
find /PROD/* -name "*.json" -exec /tmp/test.py {} \;
Both commands will process all JSON files contained in any subdirectory of /PROD.
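Note that the shell splits these expansions on whitespace, so paths containing spaces would break; a hedged alternative is to have find emit NUL-delimited names (a GNU/BSD extension) and let xargs consume them:
find /PROD -name '*.json' -print0 | xargs -0 -n 1 /tmp/test.py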
Another solution is to slightly change the Python code inside your script so that it accepts and processes multiple files.
For example, if your script contains something like:
def process(fname):
    print('Processing file', fname)

if __name__ == '__main__':
    import sys
    process(sys.argv[1])
you could replace the last line with:
for fname in sys.argv[1:]:
    process(fname)
After this simple modification, you can call your script this way:
/tmp/test.py /PROD/*/*.json
and have it process all the desired JSON files.

Linux recursive copy files to its parent folder

I want to copy recursively files to its parent folder for a specific file extension. For example:
./folderA/folder1/*.txt to ./folderA/*.txt
./folderB/folder2/*.txt to ./folderB/*.txt
etc.
I checked cp and find commands but couldn't get it working.
I suspect that while you say copy, you actually mean to move the files up to their respective parent directories. It can be done easily using find:
$ find . -name '*.txt' -type f -execdir mv -n '{}' ../ \;
The above command recurses into the current directory . and then applies the following cascade of conditionals to each item found:
-name '*.txt' will filter out only files that have the .txt extension
-type f will filter out only regular files (e.g., not directories that – for whatever reason – happen to have a name ending in .txt)
-execdir mv -n '{}' ../ \; executes the command mv -n '{}' ../ in the containing directory where the {} is a placeholder for the matched file's name and the single quotes are needed to stop the shell from interpreting the curly braces. The ; terminates the command and again has to be escaped from the shell interpreting it.
I have passed the -n flag to the mv program to avoid accidentally overwriting an existing file.
The above command will transform the following file system tree
dir1/
    dir11/
        file3.txt
        file4.txt
    dir12/
        file2.txt
dir2/
    dir21/
        file6.dat
    dir22/
        dir221/
            file8.txt
        file7.txt
    file5.txt
dir3/
    file9.dat
file1.txt
into this one:
dir1/
    dir11/
    dir12/
    file2.txt
    file3.txt
    file4.txt
dir2/
    dir21/
        file6.dat
    dir22/
        dir221/
        file8.txt
    file7.txt
dir3/
    file9.dat
file5.txt
(file1.txt, which was already at the top level, gets moved into the parent of the starting directory and thus disappears from the tree.)
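If you want to preview what would be moved before running the command for real, one hedged trick is to prefix mv with echo so that the commands are printed rather than executed:
$ find . -name '*.txt' -type f -execdir echo mv -n '{}' ../ \;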
To get rid of the empty directories, run
$ find . -type d -empty -delete
Again, this command will traverse the current directory . and then apply the following:
-type d this time filters out only directories
-empty filters out only those that are empty
-delete deletes them.
Fine print: -execdir is not specified by POSIX, though major implementations (at least the GNU and BSD one) support it. If you need strict POSIX compliance, you'll have to make do with the less safe -exec which would need additional thought to be applied correctly in this case.
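One hedged POSIX-only sketch (note that plain mv, unlike mv -n, will overwrite a file of the same name in the destination):
find . -name '*.txt' -type f -exec sh -c '
    for f do
        # move each file into the parent of its containing directory
        mv "$f" "$(dirname -- "$f")/.."
    done' sh {} +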
Finally, please try your commands in a test directory with dummy files, not your actual data. Especially with the -delete option of find, you can lose all your data quicker than you might imagine. Read the man page and, if that is not enough, the reference manual of find. Never blindly copy shell commands posted on the internet by random strangers if you don't understand them.
Try this command:
$ cp ./folderA/folder1/*.txt ./folderA
Run something like this from the root(ish) directory:
#! /bin/bash
new_dir() {
    local LOC_DIR
    LOC_DIR=`pwd`
    for i in "${LOC_DIR}"/*; do
        [[ -f "${i}" ]] && cp "${i}" ../
        if [[ -d "${i}" ]]; then
            cd "${i}" && new_dir
            cd "${LOC_DIR}"
        fi
    done
    return 0
}
new_dir
This will search each directory. When a file is encountered, it copies the file up a directory. When a directory is found, it will move down into the directory and start the process over again. I think it'll work for you.
Good luck.

bash on Linux, delete files with certain file extension

I want to delete all files with a specific extension - ".fal" - in a folder and its subfolders, except the ones named "*Original.fal". The problem is that I also want to delete other files that contain ".fal" in their names:
*Original.fal.ds
*Original.fal.ds.erg
*Original.fal.ds.erg.neu
There are other ".fal"s that I want to delete as well, that don't have "Original" in them.
Names vary all the time, so I can't delete specific names. The *Original.fal doesn't vary.
I can only get up to here:
$find /disk_2/people/z183464/DOE-Wellen -name "*.fal" \! -name "*Original.fal" -type f -exec echo rm {} \;
It would be great if the command could delete only in the folder (and its subfolders) from which it is called (executed).
When I run the code it gives me an error:
/disk_2/people/z183464/DOE-Wellen: is a directory
If you do not want find to dive too deep, you can restrict it with -maxdepth.
You can use a simple for loop for that. This command shows all the files you might want to delete. Replace echo with rm to actually delete them.
cd /disk_2/people/z183464/DOE-Wellen && for I in `find . -name "*.fal" ! -name "*Original.fal"`; do echo $I; done
With "find ... | grep ..." you can use regex too, if you need more flexibility.

unix bash - save environment variable and loop

Let's say you have a first.sh file in a directory: "/home/userbob/scripts/foo/". Basically I would like to know how to loop through specific directories, each time going back up to a higher level directory and repeating.
The .sh file has something like this pseudocode:
#!/bin/bash
curdi={$PATH} #where the first.sh file sits on the server
FOLDERS="$curdi/waffles/inner/
$curdi/pancakes/inner/
$curdi/bagels/inner/"
for f in $FOLDERS
do
    cd $f
    cp innerofinner/* .
    cd $curdi
done
The idea is to somehow copy all the contents of /home/userbob/scripts/foo/waffles/inner/innerofinner to /home/userbob/scripts/foo/waffles/inner/
(and basically repeating just with the path having pancakes, bagels.etc.)
Can't do it for all directories (*) under /home/userbob/scripts/foo/ because there are some that I don't want to copy.
This should do it:
for name in waffles pancakes bagels
do
    cp "$curdi/$name/inner/innerofinner/"* "$curdi/$name/inner"
done
Walking file trees? Sounds like a job for find!
#!/usr/bin/env bash
# only environment variables should be all-caps
dirs=({bagels,pancakes}/inner)
find "${dirs[#]}" -type d -maxdepth 1 -mindepth 1 -name innerofinner -execdir bash -c 'cp "$1"/* .' -- {} \;
I did a partial path and assumed a working directory of /home/userbob/scripts/foo. An absolute path would work, too, and would look like
dirs=(/home/userbob/scripts/foo/{bagels,pancakes}/inner)
This finds all directories exactly one level below the listed directories that are named "innerofinner" and, in their parent directories, executes bash with a simple cp script.
If you're wondering how this works, read below.
The dirs=() syntax creates an empty array named dirs. dirs=(a b) creates an array with a at index 0 and b at index 1. Any whitespace-delimited string will work here. In a shell script {a,b,c} expands to a b c, but A{a,b,c}B expands to AaB AbB AcB. So specifying {bagels,pancakes}/inner is just a way to say both bagels/inner and pancakes/inner without having to type as much.
A variable in bash can be expanded with $foo or with ${foo}; these are the same. An array in shell can be expanded to all of its elements with ${foo[@]}, delimited by spaces (if you know perl or php this will make some sense), and quoting the expansion (always a good idea in shell!) prevents spaces inside the elements from being processed again by the shell. Thus, "${dirs[@]}" becomes bagels/inner pancakes/inner.
Knowing this we see that the find command has become find bagels/inner pancakes/inner -mindepth 1 -maxdepth 1 -type d -name innerofinner, and if you execute this it will return exactly two lines: the full path to each innerofinner directory. All we want now is to do something for each one, which -execdir does nicely.
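To see both expansions at work (hypothetical session):
$ dirs=({bagels,pancakes}/inner)
$ printf '%s\n' "${dirs[@]}"
bagels/inner
pancakes/inner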
Use a recursive function or invoke the script recursively.
I am not sure if I understand your problem statement correctly. Your pseudocode seems good. But I see a problem with the following line.
curdi={$PWD}
It does not give you the directory where the script resides but gives the directory you are in. If your script directory is in the path and you are running the script from your home directory then $curdi would point to your home directory and not the directory where your script resides. This will lead to undesired results.
Incidentally, if you really wanted to do it in the way that your pseudo-script attempts it, you'd do it like this
#!/usr/bin/env bash
for f in "$PWD"/{waffles,pancakes,bagels}/inner ; do
cd "$f"
cp innerofinner/* .
# if you know for sure that it's one level up
cd ..
done
Presuming that $PWD is a good enough indicator of "current" directory for you. Me, I'd pass it in to the script.
#!/usr/bin/env bash
base="${1-$PWD}"
for f in "$base"/{waffles,pancakes,bagels}/inner ; do
cd "$f"
cp innerofinner/* .
cd ..
done
and call it like
breakfast.sh /home/userbob/scripts/foo/
find . \( -iname '*waffles*innerofinner*' -o \
          -iname '*pancakes*innerofinner*' -o \
          -iname '*bagels*innerofinner*' \) \
       -type f \
       -exec sh -c 'cp "$1" "${1%/*/*}"' sh {} \;
Should do: it finds every file in the desired subdirectories, then copies it based on its path. The expansion ${1%/*/*} strips the last two path components, so each match is copied into the grandparent of its own directory (e.g. waffles/inner/innerofinner/a.txt lands in waffles/inner).
HTH
