Iterate through files in sub-folders inside a main folder in bash [duplicate] - linux

This question already has answers here:
Interacting with files from multiple directories via bash script
(2 answers)
Closed 3 years ago.
The question is really simple: I know how to do it in Python, but I want to do it in a Linux shell (bash).
I have a main folder Dataset containing multiple sub-folders, from Dataset_FinalFolder_0_10 all the way up to Dataset_FinalFolder_1090_1100, each with 10 files.
I want to run a program on each of those files. In Python I would do this with something like:
import os, subprocess

for folder in os.scandir('/path/to/folders'):
    for file in os.scandir(folder.path):
        subprocess.run(['program', file.path])
Is there any way to mimic this in shell / bash?
I have this code, which I have used for simpler, single-level iterations:
for i in /path/to/folder/*; do
    program "$i"
done
Thanks in advance

If you are sure that there are no files mixed in with the folders, and no folders mixed in with the files:
for folder in /path/to/Dataset/*; do
    for file in "$folder"/*; do
        program "$file"
    done
done
Alternatively, it is possible to give more than one *:
for file in /path/to/Dataset/*/*; do
    program "$file"
done
If you aren't sure about the folder contents, then find can help. This example selects files in just the first-level subdirectories of the given folder and xargs calls program for each one:
find /path/to/Dataset/ -mindepth 2 -maxdepth 2 -type f -print0 |
    xargs -0 -n1 program
The find method may also be useful if .../*/*/*/... could expand to a huge number of paths. On Linux, the command-line length limit is shown by:
getconf ARG_MAX
On my machine that is 2^21 (~2 million) characters, so the limit is high, but it is worth keeping in the back of your mind that there is one.
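A rough way to gauge how close a given glob gets to that limit (a sketch only; each argument also carries a trailing NUL byte, and the environment variables count against the same limit):
printf '%s ' /path/to/Dataset/*/* | wc -c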

From the Linux perspective, you have to watch out for properly escaping spaces, newlines, etc., which can get kind of funky. There are multiple references on why not to parse ls output - see
http://mywiki.wooledge.org/ParsingLs
And
https://unix.stackexchange.com/questions/128985/why-not-parse-ls-and-what-do-to-instead
That said...
You can always use the find command with the -exec option -
find /path/to/top/level -type f -exec /path/to/processing/program {} \;
The \; at the end is required to indicate the end of the -exec command.
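If the program can take several filenames per invocation, POSIX find also lets you terminate -exec with + instead of \;, which passes the files in batches (much like xargs does) rather than running the program once per file:
find /path/to/top/level -type f -exec /path/to/processing/program {} +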

You don't need nested loops in either Python or the shell unless you have so many files that you are running up against "argument list too long" errors.
for file in /path/to/folders/*/*; do
    program "$file"
done
This is equivalent to the Python code
from glob import glob
from subprocess import run

for file in glob('/path/to/folders/*/*'):
    run(['program', file])
Of course, if program is at all competently written, you can probably simply do
program /path/to/folders/*/*
This corresponds to
run(['program'] + glob('/path/to/folders/*/*'))
If program accepts a list of file name arguments, but you do need to break up the command line to avoid "argument list too long" errors, try
printf '%s\0' /path/to/folders/*/* |
xargs -r0 program
(The zero-terminator handling, -0, is a GNU xargs extension, as is the -r option.)

for dir in ./* ./**/*   # files in the current directory and in its subfolders
do
    python "$dir"
done
./* matches the files directly in the current directory and ./**/* matches files in subfolders (note that ** only recurses more than one level deep if shopt -s globstar is set; otherwise it behaves like a single *).
Make sure you have only Python files in these directories, because this will run every file it finds. A sketch with globstar enabled is shown below.
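If your bash is 4.0 or newer, a short sketch that enables globstar so ** really recurses, and that only picks up .py files (the extension filter is an assumption here):
shopt -s globstar
for f in ./**/*.py
do
    python "$f"
done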
Actually I have already answered it here
Iterate shell script over list of subdirectories

Related

BASH loop through multiple files replacing content and filename

I'm trying to find all files with a .pux extension 1 level below my parent directory. Once the files are identified, I want to add a carriage return to each line inside each file (if possible, only if that line doesn't already have the CR). Finally I want to rename the extension to .pun ensuring there is no .pux left behind.
I've been trying different methods with this, and my biggest problem is that I cannot develop or debug this code easily, as I cannot access the command line directly. I can't access the Linux server that the script will run on; I can only call it from my application on my Windows server (trust me, I'm thinking exactly what you are right now).
The Linux server is running BASH 3.2.57(2). I don't believe the Unix2Dos utility is installed, as I've tried using it in its most basic form with no success. I've confirmed my find command can successfully identify the files I need, as I have run it and checked my log file output.
#!/bin/bash
MYSCRIPTS=${0%/*}
PARENTDIR=/home/clnt/parent/
LOGFILE="$MYSCRIPTS"/PUX2PUN.log
find "$PARENTDIR" -mindepth 2 -maxdepth 2 -type f -name "*.pux" > "$LOGFILE"
Logfile output:
/home/clnt/parent/z3y/prz3y.pux
/home/clnt/parent/wsl/prwsl.pux
However, when I have tried to build on this code and pipe those results into a while read loop, it doesn't appear to do anything.
#!/bin/bash
MYSCRIPTS=${0%/*}
PARENTDIR=/home/clnt/parent/
LOGFILE="$MYSCRIPTS"/PUX2PUN.log
find "$PARENTDIR" -mindepth 2 -maxdepth 2 -type f -name "*.pux" -print0 | while IFS= read -r file; do
sed -i '/\r/! s/$/\r/' "${file}" &&
mv "${file}" "${file/%pux/pun}" >> "$LOGFILE"
done
I'm open to other methods if they are standard in my BASH version and safe. Below my parent directory there should be anywhere from 1 to 250 folders max, and each of those child folders can have up to one pr*.pux file (* will match the folder name, as shown in my example output earlier). So we're not dealing with a ton of files.
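A likely mismatch in the loop above: find -print0 emits NUL-terminated names, but the plain read expects newline-terminated input, so the loop body never sees usable filenames. A minimal sketch of a matching pair, keeping the question's paths and variables and staying within BASH 3.2 (the logging line is an illustrative addition):
find "$PARENTDIR" -mindepth 2 -maxdepth 2 -type f -name "*.pux" -print0 |
while IFS= read -r -d '' file; do
    sed -i '/\r/! s/$/\r/' "$file" &&      # add CR only to lines that lack one
    mv "$file" "${file/%pux/pun}" &&       # rename .pux -> .pun
    echo "$file" >> "$LOGFILE"             # log each converted file
done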

Recursively find a directory and rename it in Shell Script

I'm putting together a simple shell script to run on a Linux machine where I would:
1) Look for specific sub-directories within a main directory. These sub-dirs have a very specific naming convention (see below) and they are always at most 2 levels deep below the main directory.
2) Rename those sub-dirs to part of their original name.
For example,
The sub directories are named:
andrew-11111
andrew-11112
andrew-11113
andrew-11114
The path to get to these sub dirs would look something like this:
myphotos/sailing/photos/andrew-1111
myphotos/sailing/photos/andrew-1112
myphotos/biking/photos/andrew-1113
myphotos/hiking/photos/andrew-1114
I'd like to take out the 'andrew-' from each of these sub-dirs:
myphotos/sailing/photos/1111
myphotos/sailing/photos/1112
myphotos/biking/photos/1113
myphotos/hiking/photos/1114
I've gotten as far as "finding" the sub-dirs and listing them. I also understand how to copy and rename on the command line. But putting it together at my level of shell-scripting knowledge has been taking much more time than I can afford. Just a disclaimer: I am more than willing to learn, and have written a handful of shell scripts, but I'm still new to this. Any help or examples are much appreciated!
Use wildcards to match the files in the nested directories
You can use bash parameter expansion operators to manipulate the filenames.
for file in myphotos/*/photos/*; do
    name=${file##*/}      # remove everything up to the last /
    dir=${file%/*}        # remove everything from the last /
    newname=${name##*-}   # remove everything up to the last -
    mv "$file" "$dir/$newname"
done
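To see what those expansions produce, a quick worked example using one of the paths from the question:
file=myphotos/sailing/photos/andrew-1111
name=${file##*/}      # andrew-1111
dir=${file%/*}        # myphotos/sailing/photos
newname=${name##*-}   # 1111
echo "$dir/$newname"  # myphotos/sailing/photos/1111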
If you have the perl-based rename command, you can do:
rename 's#[^/]*-##' myphotos/*/photos/*
You can do it with this one-liner:
find -type d -name andrew\* -exec sh -c 'mv {} $(dirname {})/$(basename {} | cut -d"-" -f2)' \;
Explanation:
-type d find only directories
-name andrew\* self-explaining, you have to escape the * though
-exec sh -c '...' execute it in a subshell, so you can do the command substitution ($(...)) without problems
mv {} the {} holds whatever find finds
dirname gives you the directory part of a path, i.e. everything before the last /
basename gives you the last component of a given path
cut -d"-" -f2 uses cut to chop off "andrew-": set the delimiter to - and select field number 2

Copy only executable files (cross platform)

I want to copy executable files from one directory to another.
The source directory includes all sorts of files I don't care about (build artifacts). I want to grab only the executable files, using a bash script that works on both OS X and Linux.
By executable, I mean a file that has the executable permission, and would pass test -x $filename.
I know I can write some python script but then I would be introducing a dependency on python (just to copy files!) which is something I really want to avoid.
Note: I've seen a couple of similar questions, but the answers seem to only work on Linux (as the questions specifically asked about Linux). Please do not mark this as a duplicate unless the duplicate question is indeed about cross-platform copying of executable files only.
Your own answer is conceptually elegant, but slow, because it creates at least one child process for every input file (test), plus an additional one for each matching file (cp).
Here's a more efficient bash alternative that:
builds up an array of matching input files in shell code,
and then copies them using a single invocation of cp.
#!/usr/bin/env bash
exeFiles=()
for f in "$src_dir"/*; do [[ -x $f && -f $f ]] && exeFiles+=( "$f" ); done
cp "${exeFiles[#]}" "$dest_dir/"
exeFiles=() initializes the array in which to store the matching filenames.
for f in "$src_dir"/* loops over all files and directories located directly in $scr_dir; note how * must be unquoted for globbing (filename expansion) to occur.
[[ -x $f && -f $f ]] determines whether the item at hand is executable (-x) and a (regular) file -f; note that double-quoting variable references inside [[ ... ]] is (mostly) optional.
exeFiles+=( "$f" ) appends a new element to the array
"${exeFiles[#]}" refers to the resulting array as a whole and robustly expands to the array elements as individual arguments - see Bash arrays.
After some experimentation, this seems to work on both OS X and Ubuntu
find "$src_dir" -maxdepth 1 -type f -exec test -x {} \; -exec cp {} "$dest_dir/" \;
Note that the -maxdepth 1 is specific to my use case where I don't care about recursively going through all the directories.
-type f is necessary because directories also count as executables
I pass two -exec flags. The exec flag not only executes the command but also counts as a filter so that if the command returns a non-zero exit code, the file is filtered out.
The way to use -exec is to write out whatever command you want, use {} to supply the current file, and terminate with \;
The first -exec returns a success exit code only if the file is executable.
The second -exec performs the copy, but it's not executed if the first -exec fails.

Retrieving the sub-directory, which had most recently been modified, in a Linux shell script?

How can I retrieve the sub-directory, which had most recently been modified, in a directory?
I am using a shell script on a Linux distribution (Ubuntu).
Sounds like you want the ls options
-t sort by modification time, newest first
And to show only directories, use something like this answer suggests: Listing only directories using ls in bash: An examination
ls -d */
And if you want each directory listed on its own line (assuming your file/dir names contain no newlines or crazy characters), I'd add -1. So, all together, this should list the directories in the current directory, with the most recently modified at the top:
ls -1td */
And only the single newest directory:
ls -1td */ | head -n 1
Or, if you want to compare against a specific time, you can use find and its options like -cmin, -cnewer, -ctime, -mmin, and -mtime; find can also handle crazy names (newlines, spaces, etc.) with its null-terminated-name options like -print0.
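For example, a sketch using GNU find to pick out the single most recently modified subdirectory of the current directory (assuming directory names without embedded newlines):
find . -mindepth 1 -maxdepth 1 -type d -printf '%T@\t%p\n' | sort -rn | head -n 1 | cut -f2-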
How much the subdirectory is modified is irrelevant. Do you know the name of the subdirectory? Get its content like this:
files=$(ls subdir-name)
for file in ${files}; do
    echo "I see there is a file named ${file}"
done

sed not working as expected, but only for directory depth greater than 1

I am trying to find all instances of a string in all files on my system up to a specified directory depth. I then want to replace these with another string and I am using 'find' and 'sed' by piping one into the other.
This works when I use a base path such as cd /home/../.. or any other directory which isn't "/". It also only works if I select a directory depth of 1 (so /test.txt is changed, but /home/test.txt isn't). If I change nothing else and use, say, a depth of 2 or 3, neither /test.txt nor /home/test.txt is changed. In the former case no warnings appear, and in the latter I get the results below (and no strings are replaced in either file).
Worryingly, it did work once out of the blue, but I have no idea how and I can't recreate the results. I should say I know the risks of using these commands with root from base directory, and the specific use of the programs below is intentional so I am not looking for an alternative way, just a clue as to how this isn't working and perhaps a suggestion on how to fix it.
cd /;find . -maxdepth 3 -type f -print0 | xargs -0 sed -i 's/teststring123/itworked/gI'
sed: couldn't open temporary file ./sys/kernel/sedoPGqGB: No such file or directory
sed: couldn't open temporary file ./proc/878/sedtqayiq: No such file or directory
As you can see, there are warnings, but nevertheless I would expect it to work; the commands appear good. Anything I am missing, folks?
This should be:
find / -maxdepth 3 -type f -print -exec sed -i -e 's/teststring123/itworked/g' {} \;
Although changing all files below / strikes me as a very bad idea indeed (I hope you're not running as root!).
The "couldn't open temporary file ./[...]" errors are likely to be because sed, running as your user, doesn't have permission to create files in /.
My version runs from your current working directory, I assume your ${HOME}, where you'll be able to create the temporary file, but you're still unlikely to be able to replace those files vital to the continued running of your operating system.
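If you really must sweep from /, a sketch that prunes the pseudo-filesystems before find descends into them (GNU find, and still just as risky as noted above):
find / -maxdepth 3 \( -path /proc -o -path /sys \) -prune -o -type f -print0 |
    xargs -0 sed -i 's/teststring123/itworked/gI'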
