find files based on extension but display name without extension no basename, sed, awk, grep or ; allowed - linux

I need to write a script that lists all the files with a .gif extension in the current directory and all its sub-directories BUT DO NOT use ANY of:
basename
grep
egrep
fgrep
rgrep
&&
||
;
sed
awk
AND still include hidden files.
I tried find . -type f -name '*.gif' -printf '%f\n', which successfully displays the .gif files, but still shows the extension. Here's the catch: if I try to use cut -d . -f 1 to remove the file extension, I also remove hidden files (which I don't want to), because their names start with ".".
Then I turned to tr -d '.gif', but some of the files have a 'g' or a '.' in their name (tr deletes every character in the set, not the literal string).
I also tried to use some of these answers BUT all of them include either basename, sed, awk or use some ";" in their script.
With so many restrictions I really don't know if it's even possible to achieve that but I'm supposed to.
How would you do it?

files/dirs structure:
$ tree -a
.
├── bar
├── bar.gif
├── base
│   └── foo.gif
├── foo
│   └── aaa.gif
└── .qux.gif
3 directories, 4 files
Code
find -type f -name '*.gif' -exec bash -c 'printf "%s\n" "${@%.gif}"' bash {} +
Output
./bar
./.qux
./foo/aaa
./base/foo
Explanations
Parameter Expansion expands parameters: $foo, $1. You can use it to perform string or array operations: "${file%.mp3}", "${0##*/}", "${files[@]: -4}". They should always be quoted. See: http://mywiki.wooledge.org/BashFAQ/073 and "Parameter Expansion" in man bash. Also see http://wiki.bash-hackers.org/syntax/pe.
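A quick self-contained illustration of the suffix- and prefix-stripping forms used above (the sample paths are made up):

```shell
#!/bin/sh
# Suffix stripping: %pattern removes the shortest matching suffix
f='./foo/aaa.gif'
printf '%s\n' "${f%.gif}"    # -> ./foo/aaa

# Prefix stripping: ##pattern removes the longest matching prefix
printf '%s\n' "${f##*/}"     # -> aaa.gif

# Hidden files keep their leading dot, since only the suffix is touched
h='./.qux.gif'
printf '%s\n' "${h%.gif}"    # -> ./.qux
```

Because only the trailing pattern is removed, dots elsewhere in the name (including a leading dot) are untouched, which is exactly what the question needs.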

Something like:
find . -name '*.gif' -type f -execdir bash -c 'printf "%s\n" "${@%.*}"' bash {} +

Using perl:
perl -MFile::Find::Rule -E '
    say s/\.gif$//r for File::Find::Rule
        ->file()
        ->name(qr/\.gif\z/)
        ->in(".")
'
Output:
bar
.qux
foo/aaa
base/foo

Related

Find directories where a text is found in a specific file

How can I find the directories where a text is found in a specific file? E.g. I want to get all the directories in "/var/www/" that contain the text "foo-bundle" in the composer.json file. I have a command that already does it:
find ./ -maxdepth 2 -type f -print | grep -i 'composer.json' | xargs grep -i '"foo-bundle"'
However I want to make an sh script that gets all those directories and do things with them. Any idea?
find
Your current command is almost there. Instead of using xargs with grep, let's:
Move the grep to an -exec
Use xargs to pass the result to dirname to show only the parent folder
find ./ -maxdepth 2 -type f -exec grep -l "foo-bundle" {} /dev/null \; | xargs dirname
If you only want to search for composer.json files, we can include the -iname option like so:
find ./ -maxdepth 2 -type f -iname '*composer.json' -exec grep -l "foo-bundle" {} /dev/null \; | xargs dirname
If the | xargs dirname doesn't give enough data, we can extend it so we can loop over the results of find using a while read like so:
find ./ -maxdepth 2 -type f -iname '*composer.json' -exec grep -l "foo-bundle" {} /dev/null \; | while read -r line ; do
parent="$(dirname "${line%%:*}")"
echo "$parent"
done
grep
We can use grep to search for all files containing a specific text.
After looping over each line, we can
Remove behind the : to get the filepath
Use dirname to get the parent folder path
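The stripping step can be sketched on its own (the sample grep output line is made up):

```shell
#!/bin/sh
# A typical grep output line: path, colon, matched text
line='./test/b/composer.json:    "foo-bundle": "^1.0"'

# %%:* removes everything from the first colon onward, leaving the path
path="${line%%:*}"
printf '%s\n' "$path"                 # ./test/b/composer.json

# dirname then yields the containing folder
printf '%s\n' "$(dirname "$path")"    # ./test/b
```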
Consider this file setup, where /tmp/test/b/composer.json contains foo-bundle:
➜ /tmp tree
.
├── test
│   ├── a
│   │   └── composer.json
│   └── b
│       └── composer.json
└── test.sh
When running the following test.sh:
#!/bin/bash
grep -rw '/tmp/test' --include '*composer.json' -e 'foo-bundle' | while read -r line ; do
parent="$(dirname "${line%:*}")"
echo "$parent"
done
The result is as expected, the path to folder b:
/tmp/test/b
In order to find all files, containing a particular piece of text, you can use:
find ./ -maxdepth 2 -type f -exec grep -l "composer.json" {} /dev/null \;
The result is a list of filenames. Now all you need to do is to get a way to launch the command dirname on all of them. (I tried using a simple pipe, but that would have been too easy :-) )
Thanks to @0stone0 for leading the way. I finally got it with:
#!/bin/sh
find /var/www -maxdepth 2 -type f -print | grep -i 'composer.json' | xargs grep -i 'foo-bundle' | while read -r line ; do
parent="$(dirname "${line%%:*}")"
echo "$parent"
done

Linux find files and folders based on name length but output full path

I have the following folder structure:
├── longdirectorywithsillylengththatyouwouldntnormallyhave
│   ├── asdasdads9ads9asd9asd89asdh9asd9asdh9asd
│   └── sinlf
└── shrtdir
    ├── nowthisisalongfile0000000000000000000000000
    └── sfile
I need to find files and folders whose names are longer than x characters. I have been able to achieve this with:
find . -exec basename '{}' ';' | egrep '^.{20,}$'
longdirectorywithsillylengththatyouwouldntnormallyhave
asdasdads9ads9asd9asd89asdh9asd9asdh9asd
nowthisisalongfile0000000000000000000000000
However, this only outputs the name of the file or folder in question. How can I output the full path of the resulting matches, like this:
/home/user/Desktop/longdirectorywithsillylengththatyouwouldntnormallyhave
/home/user/Desktop/longdirectorywithsillylengththatyouwouldntnormallyhave/asdasdads9ads9asd9asd89asdh9asd9asdh9asd
/home/user/Desktop/shrtdir/nowthisisalongfile0000000000000000000000000
If you use basename on your files, you lose the information about what file you are actually handling.
Therefore you have to change your regex to be able to recognize the length of the last path component.
The simplest way I could think of, would be:
find . | egrep '[^/]{20,}$' | xargs readlink -f
This makes use of the fact, that filenames cannot contain slashes.
As the result then contains paths relative to your current working directory, readlink can be used to give you the full path.
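The regex can be checked in isolation against a few made-up paths matching the question's tree; only paths whose final component has 20 or more characters survive the filter:

```shell
#!/bin/sh
# [^/] cannot match a slash, so the {20,} run is confined to the last
# path component; $ anchors it at the end of the line
printf '%s\n' \
    './shrtdir/sfile' \
    './shrtdir/nowthisisalongfile0000000000000000000000000' \
    './longdirectorywithsillylengththatyouwouldntnormallyhave' \
| grep -E '[^/]{20,}$'
```

This prints the two long entries and drops ./shrtdir/sfile.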
I can't test it right now, but this should do the job:
find $(pwd) -exec basename '{}' ';' | egrep '^.{20,}$'
find -name "????????????????????*" -printf "$PWD/%P\n"
The -printf option of find is very powerful. For %P, the man page says:
%P File's name with the name of the starting-point under which it was found removed. (%p starts with ./).
So we add $PWD/ in front.
/home/stefan/proj/mini/forum/tmp/Mo/shrtdir/nowthisisalongfile0000000000000000000000000
/home/stefan/proj/mini/forum/tmp/Mo/longdirectorywithsillylengththatyouwouldntnormallyhave
/home/stefan/proj/mini/forum/tmp/Mo/longdirectorywithsillylengththatyouwouldntnormallyhave/asdasdads9ads9asd9asd89asdh9asd9asdh9asd
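A disposable demo of %P versus %p with GNU find (the temp tree here is made up):

```shell
#!/bin/sh
# Build a tiny tree in a throwaway directory to compare the directives:
# %p includes the starting point, %P strips it off
tmp=$(mktemp -d)
mkdir -p "$tmp/sub"
touch "$tmp/sub/file"

find "$tmp" -type f -printf '%p\n'   # $tmp/sub/file
find "$tmp" -type f -printf '%P\n'   # sub/file

rm -r "$tmp"
```

Prefixing $PWD/ to %P therefore rebuilds an absolute path without calling readlink on every match.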
To prevent us from manually counting question marks, we use:
for i in {1..20}; do echo -n "?" ; done; echo
????????????????????

How to list directories and files in a Bash by script?

I would like to list a directory tree, but I have to write a script for it, and as a parameter the script should take the path to the base directory. Listing should start from this base directory.
The output should look like this:
Directory: ./a
File: ./a/A
Directory: ./a/aa
File: ./a/aa/AA
Directory: ./a/ab
File: ./a/ab/AB
So I need to print path from the base directory for every directory and file in this base directory.
UPDATED
Running the script, I should type this in the terminal: "./test.sh /home/usr/Desktop/myDirectory" or "./test.sh myDirectory" - since I run test.sh from the Desktop level.
And right now the script should be run from the level of /home/usr/Desktop/myDirectory.
I have the following command in my test.sh file:
find . | sed -e "s/[^-][^\/]*\// |/g"
But this is a plain command, not a script, and it prints output like this:
DIR: dir1
DIR: dir2
fileA
DIR: dir3
fileC
fileB
How to print the path from base directory for every dir or file from the base dir? Could someone help me to work it out?
It's not entirely clear what you want; maybe:
find . -type d -printf 'Directory: %p\n' -o -type f -printf 'File: %p\n'
However to see the subtree of a directory, I find more useful
find "$dirname" -type f
To answer the comment: it can also be done in pure bash (builtins only, no external commands), using a recursive function.
rec_find() {
    local f
    for f in "$1"/*; do
        [[ -d $f ]] && echo "Directory: $f" && rec_find "$f"
        [[ -f $f ]] && echo "File: $f"
    done
}
rec_find "$1"
You can use the tree command. The -L option sets the maximum display depth. Examples:
tree
.
├── 1
│   └── test
├── 2
│   └── test
└── 3
    └── test
3 directories, 3 files
Or
tree -L 1
.
├── 1
├── 2
└── 3
3 directories, 0 files
Create your test.sh with the code below. It reads the command-line parameter from the positional variable $1 and passes it to the find command.
#!/bin/bash
# the shebang above selects the shell that executes this script
find "$1" | sed -e "s/[^-][^\/]*\// |/g"
Now, how it works:
./test.sh /home/usr/Desktop/myDirectory #you execute this command
Here the command-line parameter is assigned to $1. For more than one parameter you can use $1 through $9; after that you have to use the shift command. (You can find more detailed information online.)
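A minimal sketch of positional parameters and shift (the function name is made up):

```shell
#!/bin/sh
# $1, $2, ... hold the arguments; shift discards $1 and renumbers
# the rest, so $2 becomes $1, $3 becomes $2, and so on
show_args() {
    printf 'first: %s\n' "$1"
    shift
    printf 'after shift, first: %s\n' "$1"
}

show_args alpha beta gamma
```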
So your command now becomes:
#!/bin/bash
# the shebang above selects the shell that executes this script
find /home/usr/Desktop/myDirectory | sed -e "s/[^-][^\/]*\// |/g"
Hope this will help you.

Bash script to rename file names with correct date format in all sub folders in Linux

I have a bunch of logs with names in the form "filename.logdate month year" (for example filename.log25 Aug 2015; note there are spaces between the date/month/year) and I'd like to change them to "filename.logdatemonthyear" (for example filename.log25Aug2015, with no spaces).
These files are in a bunch of sub folders which makes it more challenging.
Parent Folder
--- sub folder1
file1
file2
--- sub folder2
file3
file4
etc.
Can anyone suggest a bash script that can do this?
Thank you!
find and rename should do the trick
strawman example:
to go from
...
├── foo/
│   ├── file name with spaces
│   └── bar/
│       └── another file with spaces
...
you can use
find foo/ -type f -exec rename 's/ //g' '{}' \;
to get
...
├── foo/
│   ├── filenamewithspaces
│   └── bar/
│       └── anotherfilewithspaces
...
in your case:
it would be something like
find path/to/files/ -type f -exec rename 's/ //g' '{}' \;
but you can use fancier filters in your find command like
find path/to/files/ -type f -name '*.log*' -exec rename 's/ //g' '{}' \;
to select only .log files in case there are other file names with spaces you don't want to touch
heads up:
as pointed out in the comments there's the potential to overwrite files if their names only differ by space placement (e.g., a bc.log and ab c.log if carelessly renamed would end up with a single abc.log).
for your case, you have two things on your side:
rename will give you a heads up as long as you're not using its --force option
and will give you a helpful message like ./ab c.log not renamed: ./abc.log already exists
your files are named programmatically, and you're stripping the spaces in dates, so, assuming that's all you have in there, you shouldn't have any problems
regardless, it's good to be mindful of this sort of thing
This is a way to do it with just Bash (4+) and 'mv':
# Prevent breakages when nothing matches patterns
shopt -s nullglob
# Enable '**' matches (requires Bash 4)
shopt -s globstar
topdir=$PWD
for folder in **/ ; do
    # Work in the directory to avoid problems if its path has spaces
    cd -- "$folder"
    for file in *' '*' '* ; do
        # Use the '-i' option to prevent silent clobbering
        mv -i -- "$file" "${file// /}"
    done
    cd -- "$topdir"
done
If there is just one level of subfolders (as stated in the question), the requirement for Bash 4+ can be dropped: remove the shopt -s globstar line, and change the first line of the outer loop to for folder in */ ; do.
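The ${file// /} expansion used above replaces every space; a single slash would replace only the first. A tiny sketch:

```shell
#!/bin/bash
# "${var//pattern/replacement}" replaces all matches; with an empty
# replacement it simply deletes them. "${var/pattern/}" (single slash)
# deletes only the first match.
file='filename.log25 Aug 2015'
printf '%s\n' "${file// /}"   # filename.log25Aug2015
printf '%s\n' "${file/ /}"    # filename.log25Aug 2015
```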

Delete all files except the newest 3 in bash script

Question: How do you delete all files in a directory except the newest 3?
Finding the newest 3 files is simple:
ls -t | head -3
But I need to find all files except the newest 3 files. How do I do that, and how do I delete these files in the same line without making an unnecessary for loop for that?
I'm using Debian Wheezy and bash scripts for this.
This will list all files except the newest three:
ls -t | tail -n +4
This will delete those files:
ls -t | tail -n +4 | xargs rm --
This will also list dotfiles:
ls -At | tail -n +4
and delete with dotfiles:
ls -At | tail -n +4 | xargs rm --
But beware: parsing ls can be dangerous when the filenames contain funny characters like newlines or spaces. If you are certain that your filenames do not contain funny characters then parsing ls is quite safe, even more so if it is a one time only script.
If you are developing a script for repeated use then you should most certainly not parse the output of ls and use the methods described here: http://mywiki.wooledge.org/ParsingLs
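A quick demonstration of the failure mode, using a throwaway directory and a filename that contains a newline:

```shell
#!/bin/bash
# When ls output is piped, a filename with an embedded newline is
# printed raw and becomes two bogus entries to any line-based consumer
tmp=$(mktemp -d)
touch "$tmp/plain" "$tmp/bad
name"
ls "$tmp" | wc -l    # 3 lines, but only 2 files exist
rm -r "$tmp"
```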
Solution without problems with "ls" (strange named files)
This is a combination of ceving's and anubhava's answer.
Neither solution worked for me. Because I was looking for a script that runs every day to back up files into an archive, I wanted to avoid problems with ls (someone could have saved a strangely named file in my backup folder). So I modified the mentioned solutions to fit my needs.
My solution deletes all files, except the three newest files.
find . -type f -printf '%T@\t%p\n' |
sort -t $'\t' -g |
head -n -3 |
cut -d $'\t' -f 2- |
xargs rm
Some explanation:
find lists all files (not directories) in current folder. They are printed out with timestamps.
sort sorts the lines based on timestamp (oldest on top).
head -n -3 prints every line except the last 3.
cut removes the timestamps.
xargs runs rm for every selected file.
For you to verify my solution:
(
touch -d "6 days ago" test_6_days_old
touch -d "7 days ago" test_7_days_old
touch -d "8 days ago" test_8_days_old
touch -d "9 days ago" test_9_days_old
touch -d "10 days ago" test_10_days_old
)
This creates 5 files with different timestamps in the current folder. Run this script first and then the code for deleting old files.
The following looks a bit complicated, but is very cautious to be correct, even with unusual or intentionally malicious filenames. Unfortunately, it requires GNU tools:
count=0
while IFS= read -r -d ' ' && IFS= read -r -d '' filename; do
    (( ++count > 3 )) && printf '%s\0' "$filename"
done < <(find . -maxdepth 1 -type f -printf '%T@ %P\0' | sort -g -z) \
    | xargs -0 rm -f --
Explaining how this works:
Find emits <mtime> <filename><NUL> for each file in the current directory.
sort -g -z does a general (floating-point, as opposed to integer) numeric sort based on the first column (times) with the lines separated by NULs.
The first read in the while loop strips off the mtime (no longer needed after sort is done).
The second read in the while loop reads the filename (running until the NUL).
The loop increments, and then checks, a counter; if the counter's state indicates that we're past the initial skipping, then we print the filename, delimited by a NUL.
xargs -0 then appends that filename into the argv list it's collecting to invoke rm with.
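The paired reads can be tried in isolation with a hand-made record stream (the sample records are made up):

```shell
#!/bin/bash
# Each record is "<mtime> <filename><NUL>". The first read consumes up
# to the space (the mtime, discarded here), the second up to the NUL
# (the filename), so spaces and newlines in filenames survive intact.
printf '%s %s\0' 1000.5 'old file' 2000.5 'newer file' |
while IFS= read -r -d ' ' mtime && IFS= read -r -d '' filename; do
    printf '%s\n' "$filename"
done
```

This prints "old file" and then "newer file", each on its own line.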
ls -t | tail -n +4 | xargs -I {} rm {}
If you want a 1 liner
In zsh:
rm /files/to/delete/*(Om[1,-4])
If you want to include dotfiles, replace the parenthesized part with (Om[1,-4]D).
I think this works correctly with arbitrary chars in the filenames (just checked with newline).
Explanation: The parentheses contain Glob Qualifiers. O means "order by, descending", m means mtime (See man zshexpn for other sorting keys - large manpage; search for "be sorted"). [1,-4] returns only the matches at one-based index 1 to (last + 1 - 4) (note the -4 for deleting all but 3).
Don't use ls -t as it is unsafe for filenames that may contain whitespaces or special glob characters.
You can do this using GNU utilities to delete all but the 3 newest files in the current directory:
find . -maxdepth 1 -type f -printf '%T@\t%p\0' |
sort -z -nrk1 |
tail -z -n +4 |
cut -z -f2- |
xargs -0 rm -f --
ls -t | tail -n +4 | xargs -I {} rm {}
This is essentially Michael Ballent's answer, but
ls -t | tail -n +4 | xargs rm --
throws an error when there are fewer than 3 files, whereas the xargs -I {} form simply does nothing on empty input.
Recursive script with an arbitrary number of files to keep per directory
Also handles files/dirs with spaces, newlines and other odd characters
#!/bin/bash

if (( $# != 2 )); then
    echo "Usage: $0 </path/to/top-level/dir> <num files to keep per dir>"
    exit
fi

while IFS= read -r -d $'\0' dir; do
    # Find the nth oldest file
    nthOldest=$(find "$dir" -maxdepth 1 -type f -printf '%T@\0%p\n' | sort -t '\0' -rg \
        | awk -F '\0' -v num="$2" 'NR==num+1{print $2}')
    if [[ -f "$nthOldest" ]]; then
        find "$dir" -maxdepth 1 -type f ! -newer "$nthOldest" -exec rm {} +
    fi
done < <(find "$1" -type d -print0)
Proof of concept
$ tree test/
test/
├── sub1
│   ├── sub1_0_days_old.txt
│   ├── sub1_1_days_old.txt
│   ├── sub1_2_days_old.txt
│   ├── sub1_3_days_old.txt
│   └── sub1\ 4\ days\ old\ with\ spaces.txt
├── sub2\ with\ spaces
│   ├── sub2_0_days_old.txt
│   ├── sub2_1_days_old.txt
│   ├── sub2_2_days_old.txt
│   └── sub2\ 3\ days\ old\ with\ spaces.txt
└── tld_0_days_old.txt
2 directories, 10 files
$ ./keepNewest.sh test/ 2
$ tree test/
test/
├── sub1
│   ├── sub1_0_days_old.txt
│   └── sub1_1_days_old.txt
├── sub2\ with\ spaces
│   ├── sub2_0_days_old.txt
│   └── sub2_1_days_old.txt
└── tld_0_days_old.txt
2 directories, 5 files
As an extension to the answer by flohall. If you want to remove all folders except the newest three folders use the following:
find . -maxdepth 1 -mindepth 1 -type d -printf '%T@\t%p\n' |
sort -t $'\t' -g |
head -n -3 |
cut -d $'\t' -f 2- |
xargs rm -rf
The -mindepth 1 excludes the parent folder itself, and -maxdepth 1 excludes anything below the first level of subfolders.
This uses find instead of ls with a Schwartzian transform.
find . -type f -printf '%T@\t%p\n' |
sort -t $'\t' -g |
tail -3 |
cut -d $'\t' -f 2-
find searches the files and decorates each one with its timestamp, using a tab to separate the two values. sort splits the input on the tab and performs a general numeric sort (-g), which sorts floating-point numbers correctly. tail should be obvious, and cut undecorates.
The problem with decorations in general is finding a suitable delimiter that is not part of the input, the file names. This answer uses the tab character, which is safe for most file names, though unlike NUL it is not guaranteed to be absent.
Below worked for me (ll is usually an alias for ls -l, so it is spelled out here):
rm -rf $(ls -lt | tail -n +5 | awk '{ print $9 }')