Get files in bash script that contain an underscore - linux

In a directory with a bunch of files that look like this:
./test1_November 08, 2014 AM.flv
./test2.flv
./script1.sh
./script2.sh
I want to process only files that have an .flv extension and no underscore. I'm trying to eliminate the files with underscores without much luck.
bash --version
GNU bash, version 4.2.37(1)-release (x86_64-pc-linux-gnu)
script:
#!/bin/bash
FILES=$(find . -mtime 0)
for f in "${FILES}"
do
if [[ "$f" != *_* ]]; then
echo "$f"
fi
done
This gives me no files. Changing the != to == gives me all files instead of just those with an underscore. Other answers on SO indicate this should work.
Am I missing something simple here?

You can use this extended glob pattern:
shopt -s extglob
echo +([^_]).flv
+([^_]) matches one or more non-underscore characters.
Testing:
ls -l *flv
-rw-r--r-- 1 user1 staff 0 Nov 8 12:44 test1_November 08, 2014 AM.flv
-rw-r--r-- 1 user1 staff 0 Nov 8 12:44 test2.flv
echo +([^_]).flv
test2.flv
To process these files in a loop use:
for f in +([^_]).flv; do
echo "Processing: $f"
done
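If nothing matches, Bash would run the loop once with the literal pattern as $f; enabling nullglob as well guards against that. A small sketch of the same loop with that guard:
shopt -s extglob nullglob
for f in +([^_]).flv; do
echo "Processing: $f"
done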
PS: Not sure why you're using -mtime 0 in your find, as my answer is for the requirement:
I want to process only files that have an .flv extension and no underscore

You can pass multiple patterns to find and combine them with -not:
find . -name "*.flv" -not -name "*_*"
You can loop over the results of find by piping them into a while loop
find -name "*.flv" -not -name "*_*" -print0 | while IFS= read -r -d '' filename; do
echo "$filename"
done
or you can forgo the loop completely and use xargs
find -name "*.flv" -not -name "*_*" -print0 | xargs -0 -n1 echo
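If the eventual goal is to run a real command on each file rather than echo it, one sketch (ffmpeg is only a stand-in here for whatever per-file processing you need):
find . -name "*.flv" -not -name "*_*" -print0 | while IFS= read -r -d '' f; do
ffmpeg -i "$f" "${f%.flv}.mp4"   # placeholder command; substitute your own
done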

The problem is this line:
for f in "${FILES}"
The quotes are preventing word splitting, so the entire list of filenames is being processed as a single item. What you want is:
IFS=$'\n'
for f in $FILES
The IFS setting makes it use newlines as the word delimiters, so you can have filenames with spaces in them.
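Putting that together, a corrected sketch of the original script (it still misbehaves on filenames containing newlines or glob characters, which are rare but legal):
#!/bin/bash
FILES=$(find . -mtime 0)
IFS=$'\n'        # split the captured output on newlines only
for f in $FILES  # deliberately unquoted so the splitting happens
do
if [[ "$f" != *_* ]]; then
echo "$f"
fi
done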
A better way to write loops like this is to avoid using the variable:
find ... | while read -r f
do
...
done
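Applied to the question's requirement (today's .flv files with no underscore), a sketch:
find . -mtime 0 -name "*.flv" ! -name "*_*" | while read -r f
do
echo "$f"
done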

Related

How to find a list of files that are of specific extension but do not contain certain characters in their file name?

I have a folder with files that have extensions, such as .txt, .sh and .out.
However, I want a list of files that have only .txt extension, with the file names not containing certain characters.
For example, the .txt files are named L-003_45.txt and so on, up to L-003_70.txt. Some files have the L-003 part changed to, say, L-004, creating duplicates of, say, file 45, so both L-003_45.txt and L-004_45.txt exist. I want a list of the text files that don't have 45 in their name.
How would I do that?
I tried with find and ls and succeeded but I would like to know how to do a for loop instead.
I tried:
for FILE in *.txt; do ls -I '*45.txt'; done but it failed.
Would be grateful for the help!
Or you can use Bash's extended globbing:
#!/usr/bin/env bash
# Enable extended globbing
shopt -s extglob
# Prevent iterating the literal pattern when nothing matches
shopt -s nullglob
# Iterate files that do not have 45 or 57 right before .txt
for file in !(*@(45|57)).txt; do
printf '%s\n' "$file"
done
I would advise you to use the find command to find all files with the required extensions, and later filter out the ones with the "strange" characters, e.g. for finding the file extensions:
find ./ -name "*.txt" -o -name "*.sh" -o -name "*.out"
... and now, for not showing the ones with "45" in the name, you can do:
find ./ -name "*.txt" -o -name "*.sh" -o -name "*.out" | grep -v "45"
... and if you don't want "45" nor "56", you can do:
find ./ -name "*.txt" -o -name "*.sh" -o -name "*.out" | grep -v "45" | grep -v "56"
Explanation:
-o stands for OR
grep -v stands for "--invert-match" (not showing those results)
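One caveat: -o binds more loosely than the implicit AND between tests, so as soon as you combine these -name checks with another test (such as -type f) or an explicit action, group them with escaped parentheses. A sketch:
find ./ \( -name "*.txt" -o -name "*.sh" -o -name "*.out" \) -type f | grep -v "45"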
Setup:
$ touch L-004_23.txt L-003_45.txt L-004_45.txt L-003_70.txt
$ ls -1 L*txt
L-003_45.txt
L-003_70.txt
L-004_23.txt
L-004_45.txt
One idea, using ! to negate a criterion:
$ find . -name "*.txt" ! -name "*_45.txt"
./L-003_70.txt
./L-004_23.txt
Feeding the find results to a while loop, e.g.:
while read -r file
do
echo "file: ${file}"
done < <(find . -name "*.txt" ! -name "*_45.txt")
This generates:
file: ./L-003_70.txt
file: ./L-004_23.txt
The proposed solution with extglob is a very good one. In case you need to exclude more than one pattern you can also test and continue. Example to exclude all *45.txt and *57.txt:
declare -a excludes=("45" "57")
for f in *.txt; do
for e in "${excludes[@]}"; do
# a match means $f is excluded; "continue 2" resumes the outer loop
[[ "$f" == *"$e.txt" ]] && continue 2
done
printf '%s\n' "$f"
done

LINUX Copy the name of the newest folder and paste it in a command [duplicate]

I would like to find the newest sub directory in a directory and save the result to variable in bash.
Something like this:
ls -t /backups | head -1 > $BACKUPDIR
Can anyone help?
BACKUPDIR=$(ls -td /backups/*/ | head -1)
$(...) evaluates the statement in a subshell and returns the output.
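For example, the same mechanism with a simpler command:
today=$(date +%Y-%m-%d)   # capture the command's stdout into a variable
echo "backup-$today"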
There is a simple solution to this using only ls:
BACKUPDIR=$(ls -td /backups/*/ | head -1)
-t orders by time (latest first)
-d lists the directories themselves rather than their contents
*/ only lists directories
head -1 returns the first item
I didn't know about */ until I found Listing only directories using ls in bash: An examination.
This is a pure Bash solution:
topdir=/backups
BACKUPDIR=
# Handle subdirectories beginning with '.', and empty $topdir
shopt -s dotglob nullglob
for file in "$topdir"/* ; do
[[ -L $file || ! -d $file ]] && continue
[[ -z $BACKUPDIR || $file -nt $BACKUPDIR ]] && BACKUPDIR=$file
done
printf 'BACKUPDIR=%q\n' "$BACKUPDIR"
It skips symlinks, including symlinks to directories, which may or may not be the right thing to do. It skips other non-directories. It handles directories whose names contain any characters, including newlines and leading dots.
Well, I think this solution is the most efficient:
path="/my/dir/structure/*"
backupdir=$(find $path -type d -prune | tail -n 1)
Explanation why this is a little better:
We do not need sub-shells (aside from the one for getting the result into the bash variable).
We do not need a useless -exec ls -d at the end of the find command; find already prints the matches itself.
We can easily alter this, e.g. to exclude certain patterns. For example, if you want the second newest directory, because backup files are first written to a tmp dir in the same path:
backupdir=$(find $path -type d -prune -not -name "*temp_dir" | tail -n 1)
The above solution doesn't take into account things like files being written and removed from the directory resulting in the upper directory being returned instead of the newest subdirectory.
The other issue is that this solution assumes that the directory only contains other directories and not files being written.
Let's say I create a file called "test.txt" and then run this command again:
echo "test" > test.txt
ls -t /backups | head -1
test.txt
The result is test.txt showing up instead of the last modified directory.
The proposed solution "works" but only in the best case scenario.
Assuming you have a maximum of 1 directory depth, a better solution is to use:
find /backups/* -type d -prune -exec ls -d {} \; | tail -1
Just swap the "/backups/" portion for your actual path.
If you want to avoid showing an absolute path in a bash script, you could always use something like this:
LOCALPATH=/backups
DIRECTORY=$(cd $LOCALPATH; find * -type d -prune -exec ls -d {} \; | tail -1)
With GNU find you can get list of directories with modification timestamps, sort that list and output the newest:
find . -mindepth 1 -maxdepth 1 -type d -printf "%T@\t%p\0" | sort -z -n | cut -z -f2- | tail -z -n1
or newline separated
find . -mindepth 1 -maxdepth 1 -type d -printf "%T@\t%p\n" | sort -n | cut -f2- | tail -n1
With POSIX find (that does not have -printf) you may, if you have it, run stat to get file modification timestamp:
find . -mindepth 1 -maxdepth 1 -type d -exec stat -c '%Y %n' {} \; | sort -n | cut -d' ' -f2- | tail -n1
Without stat, a pure shell solution may be used by replacing the [[ Bash extension with [ as in this answer.
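A minimal sketch of that pure-shell idea (note that -nt is a widely supported extension to the [ test rather than strict POSIX):
#!/bin/sh
newest=
for d in /backups/*/; do
[ -d "$d" ] || continue   # also skips the literal pattern when nothing matches
if [ -z "$newest" ] || [ "$d" -nt "$newest" ]; then
newest=$d
fi
done
printf '%s\n' "$newest"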
Your "something like this" was almost a hit:
BACKUPDIR=$(ls -t ./backups | head -1)
Combining what you wrote with what I have learned solved my problem too. Thank you for raising this question.
Note: I ran the line above from Git Bash in a Windows environment, in a file called ./something.bash.

Bash Script - iterating over output of find

I have a bash script in which I need to iterate over each line of the output of the find command, but it appears that I am iterating over each word (space-delimited) from the find command. My script looks like this so far:
folders=`find -maxdepth 1 -type d`
for i in $folders
do
echo $i
done
I would expect this to give output like:
./dir1 and foo
./dir2 and bar
./dir3 and baz
But I am instead getting output like this:
./dir1
and
foo
./dir2
and
bar
./dir3
and
baz
What am I doing wrong here?
folders=`foo`
is always wrong, because it assumes that your directories won't contain spaces, newlines (yes, they're valid!), glob characters, etc. One robust approach (which requires the GNU extension -print0) follows:
while IFS='' read -r -d '' filename; do
: # something with "$filename"
done < <(find . -maxdepth 1 -type d -print0)
Another safe and robust approach is to have find itself directly invoke your desired command:
find . -maxdepth 1 -type d -exec printf '%s\n' '{}' +
See the UsingFind wiki page for a complete treatment of the subject.
Since you aren't using any of the more advanced features of find, you can use a simple pattern to iterate over the subdirectories:
for i in ./*/; do
echo "$i"
done
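If the directory might contain no subdirectories at all, nullglob keeps the loop from running once on the literal pattern; a small sketch:
shopt -s nullglob
for i in ./*/; do
echo "$i"
done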
You can do something like this:
find -maxdepth 1 -type d | while read -r i
do
echo "$i"
done
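One caveat with the pipe form: the while loop runs in a subshell, so variables set inside it are lost once the loop ends. Feeding the loop with process substitution avoids that; a sketch:
count=0
while IFS= read -r i; do
count=$((count + 1))   # this change survives, unlike in the piped version
done < <(find . -mindepth 1 -maxdepth 1 -type d)
echo "$count directories"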

How to perform a for-each loop over all the files under a specified path?

The following command attempts to enumerate all *.txt files in the current directory and process them one by one:
for line in "find . -iname '*.txt'"; do
echo $line
ls -l $line;
done
Why do I get the following error?:
ls: invalid option -- 'e'
Try `ls --help' for more information.
Here is a better way to loop over files as it handles spaces and newlines in file names:
#!/bin/bash
find . -type f -iname "*.txt" -print0 | while IFS= read -r -d $'\0' line; do
echo "$line"
ls -l "$line"
done
Because of the double quotes, the for-loop runs exactly once, with the entire string as its single entry.
You never actually execute the find command; you pass it along as a literal string.
Instead of the double quotes use either backticks or $():
for line in $(find . -iname '*.txt'); do
echo "$line"
ls -l "$line"
done
Furthermore, if your file paths/names contain spaces this method fails (since the for-loop iterates over space-separated entries). Instead it is better to use the method described in dogbane's answer.
To clarify your error:
The loop body runs just once, with line set to the literal string find . -iname '*.txt'. The echo merely prints that string back, but in ls -l $line the unquoted expansion is word-split, so what actually runs is:
ls -l find . -iname '*.txt'
A lot of commands let you bundle single-character options, so ls parses -iname as -i -n -a -m -e. And voila: your invalid option -- 'e' error!
More compact version working with spaces and newlines in the file name:
find . -iname '*.txt' -exec sh -c 'echo "$1"; ls -l "$1"' sh {} \;
(Passing the name as a positional parameter, instead of splicing {} into the script, keeps quotes and other odd characters in file names safe.)
Use command substitution instead of quotes to execute find instead of passing the command as a string:
for line in $(find . -iname '*.txt'); do
echo $line
ls -l $line;
done

How to loop over directories in Linux?

I am writing a script in bash on Linux and need to go through all subdirectory names in a given directory. How can I loop through these directories (and skip regular files)?
For example:
the given directory is /tmp/
it has the following subdirectories: /tmp/A, /tmp/B, /tmp/C
I want to retrieve A, B, C.
All answers so far use find, so here's one with just the shell. No need for external tools in your case:
for dir in /tmp/*/ # list directories in the form "/tmp/dirname/"
do
dir=${dir%*/} # remove the trailing "/"
echo "${dir##*/}" # print everything after the final "/"
done
cd /tmp
find . -maxdepth 1 -mindepth 1 -type d -printf '%f\n'
A short explanation:
find finds files (quite obviously)
. is the current directory, which after the cd is /tmp (IMHO this is more flexible than having /tmp directly in the find command. You have only one place, the cd, to change, if you want more actions to take place in this folder)
-maxdepth 1 and -mindepth 1 make sure that find only looks in the current directory and doesn't include . itself in the result
-type d looks only for directories
-printf '%f\n' prints only the found folder's name (plus a newline) for each hit.
Et voilĂ !
You can loop through all directories, including hidden directories (beginning with a dot), with:
for file in */ .*/ ; do echo "$file is a directory"; done
Note: the list */ .*/ works in zsh only if at least one hidden directory exists in the folder. In Bash it will also show . and ..
Another possibility for bash to include hidden directories would be to use:
shopt -s dotglob;
for file in */ ; do echo "$file is a directory"; done
If you want to exclude symlinks:
for file in */ ; do
if [[ -d "$file" && ! -L "$file" ]]; then
echo "$file is a directory";
fi;
done
To output only the trailing directory name (A,B,C as questioned) in each solution use this within the loops:
file="${file%/}" # strip trailing slash
file="${file##*/}" # strip path and leading slash
echo "$file is the directoryname without slashes"
Example (this also works with directories which contains spaces):
mkdir /tmp/A /tmp/B /tmp/C "/tmp/ dir with spaces"
for file in /tmp/*/ ; do file="${file%/}"; echo "${file##*/}"; done
Works with directories that contain spaces
Inspired by Sorpigal
while IFS= read -d $'\0' -r file ; do
echo "$file"; ls "$file" ;
done < <(find /path/to/dir/ -mindepth 1 -maxdepth 1 -type d -print0)
Original post (Does not work with spaces)
Inspired by Boldewyn: Example of loop with find command.
for D in $(find /path/to/dir/ -mindepth 1 -maxdepth 1 -type d) ; do
echo $D ;
done
find . -mindepth 1 -maxdepth 1 -type d -printf "%P\n"
The technique I use most often is find | xargs. For example, if you want to make every file in this directory and all of its subdirectories world-readable, you can do:
find . -type f -print0 | xargs -0 chmod go+r
find . -type d -print0 | xargs -0 chmod go+rx
The -print0 option terminates each file name with a NUL character instead of a newline. The -0 option makes xargs split its input on NULs the same way. So this is the combination to use on files with spaces.
You can picture this chain of commands as taking every line output by find and sticking it on the end of a chmod command.
If the command you want to run needs its argument in the middle instead of at the end, you have to be a bit creative. For instance, I needed to change into every subdirectory and run the command latexmk -c. So I used (from Wikipedia):
find . -type d -depth 1 -print0 | \
xargs -0 sh -c 'for dir; do pushd "$dir" && latexmk -c && popd; done' fnord
This has the effect of for dir in $(subdirs); do stuff; done, but is safe for directories with spaces in their names. (The trailing fnord is just a throwaway value for $0 of the inline script, so the directory names land in the positional parameters that for dir iterates over.) Also, the separate calls to stuff are made in the same shell, which is why in my command we have to return to the current directory with popd.
A minimal Bash loop you can build on (based on ghostdog74's answer):
for dir in directory/*
do
echo "${dir}"
done
to zip a whole bunch of files by directory
for dir in directory/*
do
zip -r "${dir##*/}" "${dir}"
done
If you want to execute multiple commands in a for loop, you can save the result of find with mapfile (bash >= 4) in an array and go through it with "${dirlist[@]}". It also works with directories containing spaces.
The find command is based on the answer by Boldewyn. Further information about the find command can be found there.
IFS=""
mapfile -t dirlist < <( find . -maxdepth 1 -mindepth 1 -type d -printf '%f\n' )
for dir in "${dirlist[@]}"; do
echo ">${dir}<"
# more commands can go here ...
done
TL;DR:
(cd /tmp; for d in */; do echo "${d%/}"; done)
Explanation.
There's no need to use external programs. What you need is a shell globbing pattern. To avoid having to cd back out of /tmp afterward, I'm running the whole thing in a subshell, which may or may not be suitable for your purposes.
Shell globbing patterns in a nutshell:
* Matches any string, including the empty string.
? Matches exactly one character.
[...] Matches one character from between the brackets. You can also specify ranges ([a-z], [A-F0-9], etc.) or classes ([:digit:], [:alpha:], etc.).
[^...] Matches one character not between the brackets.
If no file names match the pattern, the shell leaves the pattern unchanged. Any character or string that is not one of the above represents itself.
Consequently, the pattern */ will match any file name that ends with a /. A trailing / in a file name unambiguously identifies a directory.
The last bit is removing the trailing slash, which is achieved with the variable substitution ${var%PATTERN}, which removes the shortest matching pattern from the end of the string contained in var, and where PATTERN is any valid globbing pattern. So we write ${d%/}, meaning we want to remove the trailing slash from the string represented by d.
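For instance:
d=projects/
printf '%s\n' "${d%/}"   # prints "projects"; the trailing slash is gone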
find . -maxdepth 1 -type d
In short, put the results of find into an array and iterate the array and do what you want. Not the quickest but more organized thinking.
#!/bin/bash
cd /tmp
declare -a results=(`find -type d`)
#Iterate the results
for path in "${results[@]}"
do
echo "Your path is $path"
#Do something with the path..
if [[ $path =~ "/A" ]]; then
echo $path | awk -F / '{print $NF}'
#prints A
elif [[ $path =~ "/B" ]]; then
echo $path | awk -F / '{print $NF}'
#Prints B
elif [[ $path =~ "/C" ]]; then
echo $path | awk -F / '{print $NF}'
#Prints C
fi
done
This can be reduced to:
find -type d | grep "/A" | awk -F / '{print $NF}'   # prints A
find -type d | grep "/B" | awk -F / '{print $NF}'   # prints B
find -type d | grep "/C" | awk -F / '{print $NF}'   # prints C
