Bash - finding files with spaces and rename with sed [duplicate] - linux

I have been trying to write a script to rename all files that contain a space and replace the space with a dash.
Example: "Hey Bob.txt" to "Hey-Bob.txt"
When I used a for-loop, the file name was split at the space, so "Hey Bob.txt" became two separate arguments, "Hey" and "Bob.txt".
I tried the following script but it keeps hanging on me.
#!/bin/bash
find / -name '* *' -exec mv {} $(echo {} | sed 's/ /-g')\;

Building off OP's idea:
find ${PATH_TO_FILES} -name '* *' -exec bash -c 'eval $(echo mv -v \"{}\" $(echo {} | sed "s/ /-/g"))' \;
NOTE: need to specify the PATH_TO_FILES variable
EDIT: BroSlow pointed out need to consider directory structure:
find ${PATH_TO_FILES} -name '* *' -exec bash -c 'DIR=$(dirname "{}" | sed "s/ /-/g" ); BASE=$(basename "{}"); echo mv -v \"$DIR/$BASE\" \"$DIR/$(echo $BASE | sed "s/ /-/g")\"' \; > rename-script.sh ; sh rename-script.sh
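A variation on the commands above that avoids eval and the intermediate script: each path is handed to the inline script as a positional parameter instead of being spliced into the script text via {}. This is only a minimal sketch, not part of the original answer. The function name rename_spaces is made up for illustration, and it assumes bash plus a find that supports -mindepth, -depth and -exec ... {} + (GNU and BSD find do).

```shell
#!/bin/bash
# Hypothetical helper: replace spaces with dashes in every name below "$1".
rename_spaces() {
    local root=$1
    # -depth visits a directory's contents before the directory itself,
    # so every path is still valid when its turn to be renamed comes.
    # -mindepth 1 leaves the starting directory itself alone.
    find "$root" -mindepth 1 -depth -name '* *' -exec bash -c '
        for path do
            dir=${path%/*}            # everything before the last /
            base=${path##*/}          # the last path component
            target=$dir/${base// /-}
            if [ -e "$target" ]; then
                echo "skipping $path: $target already exists" >&2
            else
                mv -- "$path" "$target"
            fi
        done' _ {} +
}
```

Called as rename_spaces /path/to/files, it renames files and directories alike and skips any name whose dashed version already exists.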

Another way:
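find . -name "* *" -type f | while IFS= read -r file
do
    base=${file##*/}                   # file name without the directory part
    new="${file%/*}/${base// /-}"      # replace every space in the name with a dash
    mv -- "$file" "$new"
done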

Not one line, but avoids sed and should work just as well if you're going to be using it for a script anyway. (replace the mv with an echo if you want to test)
In bash 4+
#!/bin/bash
shopt -s globstar
for file in **/*; do
filename="${file##*/}"
if [[ -f $file && $filename == *" "* ]]; then
onespace=$(echo $filename)
dir="${file%/*}"
[[ ! -f "$dir/${onespace// /-}" ]] && mv "$file" "$dir/${onespace// /-}" || echo "$dir/${onespace// /-} already exists, so not moving $file" 1>&2
fi
done
Older bash
#!/bin/bash
find . -type f -print0 | while read -r -d '' file; do
filename="${file##*/}"
if [[ -f $file && $filename == *" "* ]]; then
onespace=$(echo $filename)
dir="${file%/*}"
[[ ! -f "$dir/${onespace// /-}" ]] && mv "$file" "$dir/${onespace// /-}" || echo "$dir/${onespace// /-} already exists, so not moving $file" 1>&2
fi
done
Explanation of algorithm
**/* This recursively lists all files in the current directory (** technically does it but /* is added at the end so it doesn't list the directory itself)
${file##*/} Will search for the longest pattern of */ in file and remove it from the string. e.g. /foo/bar/test.txt gets printed as test.txt
$(echo $filename) Without quoting echo will truncate spaces to one, making them easier to replace with one - for any number of spaces
${file%/*} Remove everything after and including the last /, e.g. /foo/bar/test.txt prints /foo/bar
mv "$file" "$dir/${onespace// /-}" Replaces every space in the file name with - (we first check whether the hyphenated name already exists and, if it does, report the failure on stderr; note that && and || have equal precedence in bash and are evaluated left to right, so the echo also runs if mv itself fails)
find . -type f -print0 | while read -r -d '' file Reads NUL-delimited names so that file names containing spaces are not split; -r stops read from treating \ as an escape character
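The expansions explained above can be tried in isolation on a sample path. This is a small sketch; the function name dashed_path is made up for illustration and nothing is renamed on disk.

```shell
#!/bin/bash
# Hypothetical helper: print a path with the spaces in its file name replaced
# by dashes, leaving the directory part untouched.
dashed_path() {
    local file=$1
    local filename=${file##*/}     # strip the longest match of */ from the front
    local dir=${file%/*}           # strip the shortest match of /* from the end
    local onespace
    # Unquoted on purpose: word splitting collapses runs of spaces into one.
    # (This also means glob characters in the name would be expanded.)
    onespace=$(echo $filename)
    printf '%s/%s\n' "$dir" "${onespace// /-}"
}

dashed_path "/foo/bar/some   name with space1.pdf"
# prints /foo/bar/some-name-with-space1.pdf
```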
Sample Output
$ tree
.
├── bar
│   ├── some dir
│   │   ├── some-name-without-space1.pdf
│   │   ├── some name with space1.pdf
│   ├── some-name-without-space1.pdf
│   ├── some name with space1.pdf
│   └── some-name-with-space1.pdf
└── space.sh
$ ./space.sh
bar/some-name-with-space1.pdf already exists, so not moving bar/some name with space1.pdf
$ tree
.
├── bar
│   ├── some dir
│   │   ├── some-name-without-space1.pdf
│   │   ├── some-name-with-space1.pdf
│   ├── some-name-without-space1.pdf
│   ├── some name with space1.pdf
│   └── some-name-with-space1.pdf
└── space.sh

Related

How to get occurrences of word in all files? But with count of the words per directory instead of single number

I would like to count the occurrences of a given word across all files, but per directory rather than as a single total. I can get the count for one directory by changing into it and running grep foo error*.log | wc -l. I would like to get the count per directory when the directory structure looks like the one below.
Directory tree
.
├── dir1
│   └── error2.log
├── error1.log
├── dir2
│   ├── error_123.log
│   └── error_234.log
└── dir3
    ├── error_12345.log
    └── error_23554.log
Update: The following command can be used on AIX:
#!/bin/bash
for name in /path/to/folder/* ; do
if [ ! -d "${name}" ] ; then
continue
fi
# See: https://unix.stackexchange.com/a/398414/45365
count="$(cat "${name}"/error*.log | tr '[:space:]' '[\n*]' | grep -c 'SEARCH')"
printf "%s %s\n" "${name}" "${count}"
done
On GNU/Linux, with GNU findutils and GNU grep:
find /path/to/folder -maxdepth 1 -type d \
-printf "%p " -exec bash -c 'grep -ro 'SEARCH' {} | wc -l' \;
Replace SEARCH by the actual search term.
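The two commands above can also be combined into a small function that counts every occurrence (not just matching lines) per subdirectory. This is a sketch under assumptions: the name count_per_dir is invented here, the search term is treated as a fixed string, and grep must support -r and -o (GNU and BSD grep do).

```shell
#!/bin/bash
# Hypothetical helper: print "<directory> <count>" for each subdirectory of "$1",
# where <count> is the number of occurrences of the fixed string "$2".
count_per_dir() {
    local root=$1 term=$2 dir count
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue      # the glob matched nothing
        # -r recurse, -h omit file names, -o one output line per match, -F literal
        count=$(( $(grep -rhoF -- "$term" "$dir" | wc -l) ))
        printf '%s %s\n' "${dir%/}" "$count"
    done
}
```

For example, count_per_dir /path/to/folder SEARCH prints one line per subdirectory, including those with a count of 0.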

How to list directories and files in a Bash by script?

I would like to list a directory tree, but I have to write a script for it, and the script should take the path to the base directory as a parameter. The listing should start from this base directory.
The output should look like this:
Directory: ./a
File: ./a/A
Directory: ./a/aa
File: ./a/aa/AA
Directory: ./a/ab
File: ./a/ab/AB
So I need to print path from the base directory for every directory and file in this base directory.
UPDATED
To run the script I should type this in the terminal: "./test.sh /home/usr/Desktop/myDirectory" or "./test.sh myDirectory", since I run test.sh from the Desktop level.
Right now the script has to be run from inside /home/usr/Desktop/myDirectory.
I have the following command in my test.sh file:
find . | sed -e "s/[^-][^\/]*\// |/g"
But this is just a command, not a script, and it prints the output like this:
DIR: dir1
DIR: dir2
fileA
DIR: dir3
fileC
fileB
How to print the path from base directory for every dir or file from the base dir? Could someone help me to work it out?
It is not entirely clear what you want, but maybe:
find . -type d -printf 'Directory: %p\n' -o -type f -printf 'File: %p\n'
However to see the subtree of a directory, I find more useful
find "$dirname" -type f
To answer comment it can also be done in pure bash (builtin without external commands), using a recursive function.
rec_find() {
local f
for f in "$1"/*; do
[[ -d $f ]] && echo "Directory: $f" && rec_find "$f"
[[ -f $f ]] && echo "File: $f"
done
}
rec_find "$1"
You can use tree command. Key -L means max depth. Examples:
tree
.
├── 1
│   └── test
├── 2
│   └── test
└── 3
    └── test
3 directories, 3 files
Or
tree -L 1
.
├── 1
├── 2
└── 3
3 directories, 0 files
Create your test.sh with the code below. The first command-line parameter is available in the positional parameter $1, which is passed on to find.
#!/bin/bash
# The line above selects the shell that runs this script
find "$1" | sed -e "s/[^-][^\/]*\// |/g"
Here is how it works:
./test.sh /home/usr/Desktop/myDirectory # you execute this command
The command-line parameter is assigned to $1. Further parameters are available as $2 through $9; beyond that, use braces (${10}) or the shift command.
So the command that actually runs is:
find /home/usr/Desktop/myDirectory | sed -e "s/[^-][^\/]*\// |/g"
Hope this helps.
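To get the "Directory:" / "File:" labels from the question rather than the sed tree drawing, the same positional-parameter idea can be combined with a test on each path. This is a minimal sketch; the function name list_tree is illustrative, and it assumes no file name contains a newline.

```shell
#!/bin/bash
# Hypothetical helper: label every entry at or below the base directory "$1"
# (default: the current directory) as a directory or a file.
list_tree() {
    local base=${1:-.} path
    find "$base" | while IFS= read -r path; do
        if [ -d "$path" ]; then
            printf 'Directory: %s\n' "$path"
        elif [ -f "$path" ]; then
            printf 'File: %s\n' "$path"
        fi
    done
}
```

Saved in test.sh with a final line of list_tree "$@", it can be run as ./test.sh myDirectory.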

Bash find fails to return all matching files when called from a script

Running the same command from the command line and from a bash script produces different results on Ubuntu 16.04.
I have a folder with the following contents:
├── audio
│   └── delete_me.mp3
├── words
│   └── audio
│   └── delete_me.mp3
│   └── images
│   └── delete_me.jpg
└── keep_me.txt
I have a bash script named findKeepers.sh:
#!/usr/bin/env bash
findKeepers () {
local dir=$1
echo "$(find $dir -type f ! -name delete_me*)"
}
findKeepers /path/to/directory
I expect it to output the path to the keep_me.txt file. Instead, I get a blank line.
 
If I run what seems to me to be identical commands from the command line, I get what I expect:
dir=/path/to/directory; echo "$(find $dir -type f ! -name delete_me*)"
/path/to/directory/keep_me.txt
If I search instead for all files not called keep_me, the bash script ignores the audio folder. Here's another bash script called findUnwanted.sh:
#!/usr/bin/env bash
findUnwanted () {
local dir=$1
echo "$(find $dir -type f ! -name keep_me*)"
}
findUnwanted /path/to/directory
Here's the result:
$ ./findUnwanted.sh
/path/to/directory/words/audio/delete_me.mp3
/path/to/directory/words/images/delete_me.jpg
If I run the same thing from the command line, I get all three delete_me files:
$ dir=/path/to/directory; echo "$(find $dir -type f ! -name keep_me*)"
/path/to/directory/words/audio/delete_me.mp3
/path/to/directory/words/images/delete_me.jpg
/path/to/directory/audio/delete_me.mp3
It seems to me that the bash script starts by going deep into the words folder, and then does not come out again to search adjacent folders or files. Is there something special about the #!/usr/bin/env bash environment that makes it do this? Or is there some other difference that I'm not seeing?
CODA: I'm guessing it was pilot error, because after more modifications it started working for me again. For anyone who is interested, the final version of my function is shown below.
#!/usr/bin/env bash
# Returns 1 if the given directory contains only placeholder files, or
# 0 if the directory contains something worth keeping
checkForDeletion () {
local _dir=$1
local _temp=$(find "$_dir" -type f ! -regex '.*\(unused.txt\|delete_me.*\)')
if [ -z "$_temp" ]
then
return 1
fi
}
I use it like this:
parent=/path/to/parent/
for dir in $parent*/
do
checkForDeletion $dir
if [ $? = 1 ]
then
echo "DELETE? $dir" # rm -rf $dir
fi
done
I am guessing that your '!' is breaking the whole pipe. Try using '-not' instead, so your first code snippet would look like this:
echo "$(find $dir -type f -not -name delete_me*)"
I am not that good at explaining where you should escape special characters and where not, but the fact that things work differently when using that outside function suggests that escaping may be the issue.
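Separately from the guess above, one thing worth ruling out is the unquoted pattern: the shell expands delete_me* against the current directory before find ever runs, so the result can change with the directory the script is started from. A minimal sketch with both the directory and the pattern quoted (the function name find_keepers is only illustrative):

```shell
#!/bin/bash
# Hypothetical variant of findKeepers with the pattern protected from the shell.
find_keepers() {
    local dir=$1
    # The quotes make find, not the shell, interpret the * wildcard.
    find "$dir" -type f ! -name 'delete_me*'
}
```

With the quotes in place, the output no longer depends on whether a file matching delete_me* happens to exist in the current working directory.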

Delete all files except the newest 3 in bash script

Question: How do you delete all files in a directory except the newest 3?
Finding the newest 3 files is simple:
ls -t | head -3
But I need to find all files except the newest 3 files. How do I do that, and how do I delete these files in the same line without making an unnecessary for loop for that?
I'm using Debian Wheezy and bash scripts for this.
This will list all files except the newest three:
ls -t | tail -n +4
This will delete those files:
ls -t | tail -n +4 | xargs rm --
This will also list dotfiles:
ls -At | tail -n +4
and delete with dotfiles:
ls -At | tail -n +4 | xargs rm --
But beware: parsing ls can be dangerous when the filenames contain funny characters like newlines or spaces. If you are certain that your filenames do not contain funny characters then parsing ls is quite safe, even more so if it is a one time only script.
If you are developing a script for repeated use then you should most certainly not parse the output of ls and use the methods described here: http://mywiki.wooledge.org/ParsingLs
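A quick way to see the problem is to count the entries of a directory in two ways when one file name contains a newline. The helper names below are made up for this sketch.

```shell
#!/bin/bash
# Hypothetical helpers comparing two ways of counting directory entries.

# Treats every line of ls output as one file, which a newline in a name breaks.
count_with_ls() {
    echo $(( $(ls -- "$1" | wc -l) ))
}

# Lets the shell build the list, so each name stays one array element.
# The body runs in a subshell so nullglob does not leak to the caller.
count_with_glob() (
    shopt -s nullglob            # an empty directory yields an empty array
    entries=("$1"/*)
    echo "${#entries[@]}"
)
```

In a directory holding a single file whose name contains a newline, count_with_glob reports 1, while count_with_ls reports more than that when ls prints the name literally, as it does when writing to a pipe.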
Solution without problems with "ls" (strange named files)
This is a combination of ceving's and anubhava's answer.
Both solutions are not working for me. Because I was looking for a script that should run every day for backing up files in an archive, I wanted to avoid problems with ls (someone could have saved some funny named file in my backup folder). So I modified the mentioned solutions to fit my needs.
My solution deletes all files, except the three newest files.
find . -type f -printf '%T@\t%p\n' |
sort -t $'\t' -g |
head -n -3 |
cut -d $'\t' -f 2- |
xargs rm
Some explanation:
find lists all files (not directories) in current folder. They are printed out with timestamps.
sort sorts the lines based on timestamp (oldest on top).
head prints all lines except the last 3 (the 3 newest files).
cut removes the timestamps.
xargs runs rm for every selected file.
For you to verify my solution:
(
touch -d "6 days ago" test_6_days_old
touch -d "7 days ago" test_7_days_old
touch -d "8 days ago" test_8_days_old
touch -d "9 days ago" test_9_days_old
touch -d "10 days ago" test_10_days_old
)
This creates 5 files with different timestamps in the current folder. Run this script first and then the code for deleting old files.
The following looks a bit complicated, but is very cautious to be correct, even with unusual or intentionally malicious filenames. Unfortunately, it requires GNU tools:
count=0
while IFS= read -r -d ' ' && IFS= read -r -d '' filename; do
(( ++count > 3 )) && printf '%s\0' "$filename"
done < <(find . -maxdepth 1 -type f -printf '%T@ %P\0' | sort -g -z -r) \
| xargs -0 rm -f --
Explaining how this works:
Find emits <mtime> <filename><NUL> for each file in the current directory.
sort -g -z does a general (floating-point, as opposed to integer) numeric sort based on the first column (times) with the lines separated by NULs.
The first read in the while loop strips off the mtime (no longer needed after sort is done).
The second read in the while loop reads the filename (running until the NUL).
The loop increments, and then checks, a counter; if the counter's state indicates that we're past the initial skipping, then we print the filename, delimited by a NUL.
xargs -0 then appends that filename into the argv list it's collecting to invoke rm with.
ls -t | tail -n +4 | xargs -I {} rm {}
If you want a 1 liner
In zsh:
rm /files/to/delete/*(Om[1,-4])
If you want to include dotfiles, replace the parenthesized part with (Om[1,-4]D).
I think this works correctly with arbitrary chars in the filenames (just checked with newline).
Explanation: The parentheses contain Glob Qualifiers. O means "order by, descending", m means mtime (See man zshexpn for other sorting keys - large manpage; search for "be sorted"). [1,-4] returns only the matches at one-based index 1 to (last + 1 - 4) (note the -4 for deleting all but 3).
Don't use ls -t as it is unsafe for filenames that may contain whitespaces or special glob characters.
You can do this using all gnu based utilities to delete all but 3 newest files in the current directory:
find . -maxdepth 1 -type f -printf '%T@\t%p\0' |
sort -z -nrk1 |
tail -z -n +4 |
cut -z -f2- |
xargs -0 rm -f --
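Before wiring such a pipeline to rm, it can help to preview what it selects. The sketch below wraps the same idea in a function that only lists the candidates. The name list_all_but_newest is invented here, and it assumes GNU find plus a coreutils version whose tail and cut accept -z.

```shell
#!/bin/bash
# Hypothetical helper: print, NUL-delimited, every regular file directly in "$1"
# except the "$2" most recently modified ones. Nothing is deleted.
list_all_but_newest() {
    local dir=$1 keep=$2
    find "$dir" -maxdepth 1 -type f -printf '%T@\t%p\0' |
        sort -z -t $'\t' -k1,1nr |      # newest first, by the timestamp column
        tail -z -n "+$((keep + 1))" |   # skip the first $keep records
        cut -z -f2-                     # drop the timestamp column
}
```

Use list_all_but_newest . 3 | tr '\0' '\n' to preview, and replace the tr stage with xargs -0 rm -f -- once the list looks right.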
ls -t | tail -n +4 | xargs -I {} rm {}
Michael Ballent's answer works best, as
ls -t | tail -n +4 | xargs rm --
throws an error for me if I have fewer than 3 files.
Recursive script with arbitrary num of files to keep per-directory
Also handles files/dirs with spaces, newlines and other odd characters
#!/bin/bash
if (( $# != 2 )); then
echo "Usage: $0 </path/to/top-level/dir> <num files to keep per dir>"
exit
fi
while IFS= read -r -d $'\0' dir; do
# Find the nth oldest file
nthOldest=$(find "$dir" -maxdepth 1 -type f -printf '%T@\0%p\n' | sort -t '\0' -rg \
| awk -F '\0' -v num="$2" 'NR==num+1{print $2}')
if [[ -f "$nthOldest" ]]; then
find "$dir" -maxdepth 1 -type f ! -newer "$nthOldest" -exec rm {} +
fi
done < <(find "$1" -type d -print0)
Proof of concept
$ tree test/
test/
├── sub1
│   ├── sub1_0_days_old.txt
│   ├── sub1_1_days_old.txt
│   ├── sub1_2_days_old.txt
│   ├── sub1_3_days_old.txt
│   └── sub1\ 4\ days\ old\ with\ spaces.txt
├── sub2\ with\ spaces
│   ├── sub2_0_days_old.txt
│   ├── sub2_1_days_old.txt
│   ├── sub2_2_days_old.txt
│   └── sub2\ 3\ days\ old\ with\ spaces.txt
└── tld_0_days_old.txt
2 directories, 10 files
$ ./keepNewest.sh test/ 2
$ tree test/
test/
├── sub1
│   ├── sub1_0_days_old.txt
│   └── sub1_1_days_old.txt
├── sub2\ with\ spaces
│   ├── sub2_0_days_old.txt
│   └── sub2_1_days_old.txt
└── tld_0_days_old.txt
2 directories, 5 files
As an extension to the answer by flohall. If you want to remove all folders except the newest three folders use the following:
find . -maxdepth 1 -mindepth 1 -type d -printf '%T@\t%p\n' |
sort -t $'\t' -g |
head -n -3 |
cut -d $'\t' -f 2- |
xargs rm -rf
The -mindepth 1 will ignore the parent folder and -maxdepth 1 subfolders.
This uses find instead of ls with a Schwartzian transform.
find . -type f -printf '%T@\t%p\n' |
sort -t $'\t' -g |
tail -3 |
cut -d $'\t' -f 2-
find searches the files and decorates them with a time stamp and uses the tabulator to separate the two values. sort splits the input by the tabulator and performs a general numeric sort, which sorts floating point numbers correctly. tail should be obvious and cut undecorates.
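The problem with decorations in general is finding a suitable delimiter that cannot occur in the input, here the file names. This answer uses the tab character, so it assumes that no file name contains a tab or a newline; a NUL delimiter would lift that restriction.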
Below worked for me:
rm -rf $(ll -t | tail -n +5 | awk '{ print $9}')

Using shell to find directories that do not have a specified pattern of files in the folder

The directory tree is like this:
.
├── A_123
│   └── 123.txt
├── A_456
│   ├── tmp
│   └── tmp.log
└── A_789
└── 789.txt
There're 3 directories (A_123, A_456, A_789).
The pattern of a directory name is: A_{numbers} and the file I'm interested in is {numbers}.txt.
I was wondering whether there's a way to get the directories A_{numbers} that has no {numbers}.txt file in them. For example above, this script should return:
./A_456
as A_456 doesn't have 456.txt in its folder but A_123 and A_789 have their {numbers}.txt files in the relevant folder.
Anyone has ideas about this? Thanks!
Here's one approach:
for dir in *;do
if [ $(find "$dir" -maxdepth 1 -regex '.*/[0-9][0-9]*\.txt' | wc -l) = 0 ]; then
echo $dir
fi
done
Since the A_[0-9] directories are not nested, you can easily do this with a glob in a loop. This implementation is pure bash and does not spawn any external utilities:
shopt -s nullglob # a glob with no matches expands to nothing
for d in A_[0-9]*/; do # the trailing / causes only directories to be matched
    files=("$d"[0-9]*.txt) # populate an array with matching text files
    (( ${#files[@]} == 0 )) && echo "$d" # echo $d if the array is empty
done
There are some problems with this implementation. It will match a file such as "12ab.txt" and requires loading all the filenames for a directory into the array.
Here is another method in bash that does a more accurate filename matching:
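re='^[0-9]+[.]txt$'
for d in A_[0-9]*/; do
    found=0
    for f in "$d"*; do
        # match the regex against the file name only, not the whole path
        if [[ -f $f && ${f##*/} =~ $re ]]; then
            found=1
            break
        fi
    done
    (( found )) || echo "$d" # print only directories without a matching file
done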
A slight variation on a couple of other answers: use bash extended pattern matching:
shopt -s extglob nullglob
for dir in A_+([0-9]); do
files=($dir/+([0-9]).txt)
(( ${#files[@]} == 0 )) && echo $dir
done
for file in *; do
if [[ "$file" =~ ^A_([0-9]+)$ && ! -f "$file/${BASH_REMATCH[1]}.txt" ]]; then
echo $file;
fi
done
How it works:
Checks using regexp (note the =~ ) that the file/folder's name is "A_" followed by number.
At the same time, it captures the numbers (note the parentheses) and stores them in ${BASH_REMATCH[1]}
Next, check if the folder contains {number}.txt.
If it does not, echo the folder's name.
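The capture step can be seen on its own in the sketch below, which splits the check into two small functions. Both names (dir_number, missing_txt_dirs) are made up for illustration; the logic follows the steps listed above.

```shell
#!/bin/bash
# Hypothetical helper: print the numeric part of a name of the form A_<digits>,
# or return 1 when the name does not have that form.
dir_number() {
    local re='^A_([0-9]+)$'
    [[ $1 =~ $re ]] || return 1
    printf '%s\n' "${BASH_REMATCH[1]}"   # text matched by the first (...) group
}

# Hypothetical helper: list the A_<digits> directories directly inside "$1"
# that do not contain their own <digits>.txt file.
missing_txt_dirs() {
    local entry num
    for entry in "$1"/*; do
        if num=$(dir_number "${entry##*/}") && [[ -d $entry && ! -f $entry/$num.txt ]]; then
            printf '%s\n' "$entry"
        fi
    done
}
```

For the tree in the question, missing_txt_dirs . prints ./A_456.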
