How to get occurrences of word in all files? But with count of the words per directory instead of single number

I would like to count the occurrences of a given word in all files, but per directory instead of as a single total. Inside a specific directory I can get the count with a simple grep foo error*.log | wc -l. I would like the same count for each directory when the directory structure looks like the one below.
Directory tree
.
├── dir1
│   ├── error2.log
│   └── error1.log
├── dir2
│   ├── error_123.log
│   └── error_234.log
└── dir3
    ├── error_12345.log
    └── error_23554.log

Update: The following command can be used on AIX:
#!/bin/bash
for name in /path/to/folder/* ; do
    if [ ! -d "${name}" ] ; then
        continue
    fi
    # See: https://unix.stackexchange.com/a/398414/45365
    count="$(cat "${name}"/error*.log | tr '[:space:]' '[\n*]' | grep -c 'SEARCH')"
    printf "%s %s\n" "${name}" "${count}"
done
On GNU/Linux, with GNU findutils and GNU grep:
find /path/to/folder -maxdepth 1 -type d \
    -printf "%p " -exec bash -c 'grep -ro "SEARCH" "$1" | wc -l' _ {} \;
Replace SEARCH by the actual search term.
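A minimal, self-contained sketch of the same idea, using only a shell loop, grep -o and wc -l. The temporary tree, the file contents and the search term foo are assumptions made up for the demonstration, and count_per_dir is a hypothetical helper name:

```shell
# Hypothetical demo tree; directory names, contents and the term "foo" are assumptions.
root=$(mktemp -d)
mkdir -p "$root/dir1" "$root/dir2"
printf 'foo bar foo\nbaz\n' > "$root/dir1/error1.log"
printf 'foo\n' > "$root/dir1/error2.log"
printf 'nothing here\n' > "$root/dir2/error_123.log"

# Print "<directory> <count>" for each immediate subdirectory of $1.
count_per_dir() {
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue
        # grep -o prints every match on its own line, so wc -l counts occurrences
        n=$(cat "$dir"error*.log 2>/dev/null | grep -o -- "$2" | wc -l)
        printf '%s %s\n' "$(basename "$dir")" "$((n))"
    done
}

result=$(count_per_dir "$root" foo)
printf '%s\n' "$result"
rm -rf "$root"
```

Unlike grep -c, which counts matching lines, grep -o | wc -l counts every occurrence, so a line containing the word twice contributes two.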

Related

Find directories where a text is found in a specific file

How can I find the directories where a text is found in a specific file? E.g. I want to get all the directories in "/var/www/" that contain the text "foo-bundle" in the composer.json file. I have a command that already does it:
find ./ -maxdepth 2 -type f -print | grep -i 'composer.json' | xargs grep -i '"foo-bundle"'
However I want to make an sh script that gets all those directories and do things with them. Any idea?

find
Your current command is almost there. Instead of using xargs with grep, let's:
Move the grep to an -exec
Use xargs to pass the result to dirname to show only the parent folder
find ./ -maxdepth 2 -type f -exec grep -l "foo-bundle" {} /dev/null \; | xargs dirname
If you only want to search for composer.json files, we can include the -iname option like so:
find ./ -maxdepth 2 -type f -iname '*composer.json' -exec grep -l "foo-bundle" {} /dev/null \; | xargs dirname
If the | xargs dirname doesn't give enough data, we can extend it so we can loop over the results of find using a while read like so:
find ./ -maxdepth 2 -type f -iname '*composer.json' -exec grep -l "foo-bundle" {} /dev/null \; | while read -r line ; do
    parent="$(dirname "${line%%:*}")"
    echo "$parent"
done
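To try the loop without touching /var/www, here is a small sketch under stated assumptions: the temporary tree and the composer.json contents are invented for the demonstration, and the loop body only prints the directory where a real script would "do things" with it.

```shell
# Hypothetical demo tree; names and file contents are assumptions.
root=$(mktemp -d)
mkdir -p "$root/a" "$root/b with space"
printf '{ "require": {} }\n' > "$root/a/composer.json"
printf '{ "require": { "foo-bundle": "1.0" } }\n' > "$root/b with space/composer.json"

# grep -l prints only the names of matching files, one per line
dirs=$(find "$root" -maxdepth 2 -type f -name composer.json \
        -exec grep -l 'foo-bundle' {} + |
    while IFS= read -r file; do
        dir=$(dirname "$file")
        # "Do things" with the directory here; this sketch just prints it
        printf '%s\n' "$dir"
    done)
printf '%s\n' "$dirs"
rm -rf "$root"
```

Quoting "$file" keeps directory names with spaces intact. File names containing newlines would still break a line-based loop like this one.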
grep
We can use grep to search for all files containing a specific text.
After looping over each line, we can
Remove everything after the first : to get the file path
Use dirname to get the parent folder path
Consider this file setup, where /test/b/composer.json contains foo-bundle
➜ /tmp tree
.
├── test
│   ├── a
│   │   └── composer.json
│   └── b
│       └── composer.json
└── test.sh
When running the following test.sh:
#!/bin/bash
grep -rw '/tmp/test' --include '*composer.json' -e 'foo-bundle' | while read -r line ; do
    # %%:* cuts at the first colon, so colons inside the matched line do not matter
    parent="$(dirname "${line%%:*}")"
    echo "$parent"
done
The result is as expected, the path to folder b:
/tmp/test/b

In order to find all files, containing a particular piece of text, you can use:
find ./ -maxdepth 2 -type f -exec grep -l "composer.json" {} /dev/null \;
The result is a list of filenames. Now all you need to do is to get a way to launch the command dirname on all of them. (I tried using a simple pipe, but that would have been too easy :-) )
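One possible way to run dirname on each match without a pipe is to chain two -exec actions: find only evaluates the second one when the first succeeds. This is a sketch, not the answerer's method; the temporary tree and the search text foo-bundle are assumptions for the demonstration.

```shell
# Hypothetical demo tree; names and contents are assumptions.
root=$(mktemp -d)
mkdir -p "$root/a" "$root/b"
echo 'nothing' > "$root/a/composer.json"
echo '"foo-bundle": "1.0"' > "$root/b/composer.json"

# grep -q exits 0 only on a match, so dirname runs only for matching files
dirs=$(find "$root" -type f -name composer.json \
    -exec grep -q 'foo-bundle' {} \; -exec dirname {} \;)
printf '%s\n' "$dirs"
rm -rf "$root"
```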

Thanks to @0stone0 for leading the way. I finally got it with:
#!/bin/sh
find /var/www -maxdepth 2 -type f -print | grep -i 'composer.json' | xargs grep -i 'foo-bundle' | while read -r line ; do
    parent="$(dirname "${line%%:*}")"
    echo "$parent"
done

Give out parent folder name if not containing a certain file

I am looking for a Linux terminal command that prints the name of each parent folder that does not contain a certain file.
So far I use the following command:
find . -type d -exec test -e '{}'/recon-all.done \; -print| wc -l
This gives me the number of folders that contain the file.
The file recon-all.done would be in /subject/../../recon-all.done and I would need every single "subject" name which does not contain the recon-all.done file.

Loop through the directories, test for the existence of the file, and print the directory if the test fails.
for subject in */; do
    if ! [ -e "${subject}scripts/recon-all.done" ]
    then echo "$subject"
    fi
done

Your command:
find . -type d -exec test -e '{}'/recon-all.done \; -print| wc -l
almost does the job. We just need to:
Remove | wc -l so that the directory paths are shown instead of a count
Negate the -exec test by adding a ! like so:
find . -type d \! -exec test -e '{}'/recon-all.done \; -print
This way find prints each folder that does not contain the recon-all file.
Note: Based on your comment on Barmar's answer, the example below adds -maxdepth 1 to prevent deeper directories from being checked.
Small example from my local machine:
$ /tmp/test$ tree
.
├── a
│   └── test.xt
├── b
├── c
│   └── test.xt
└── x
    ├── a
    │   └── test.xt
    └── b
6 directories, 3 files
$ /tmp/test$ find . -maxdepth 1 -type d \! -exec test -e '{}/test.xt' \; -print
.
./b
./x
$ /tmp/test$
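The output above also lists . itself, because the starting directory has no test.xt either. A hedged sketch of one way to leave it out, assuming a find that supports -mindepth (GNU and BSD find do); the temporary tree is made up for the demonstration:

```shell
# Hypothetical demo tree mirroring the example above.
root=$(mktemp -d)
mkdir -p "$root/a" "$root/b" "$root/c"
touch "$root/a/test.xt" "$root/c/test.xt"

# -mindepth 1 skips the starting directory, -maxdepth 1 stops find from descending
missing=$(cd "$root" && find . -mindepth 1 -maxdepth 1 -type d \
    ! -exec test -e '{}/test.xt' \; -print | sort)
printf '%s\n' "$missing"
rm -rf "$root"
```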

How to pass directory name from find to grep through xargs?

.
├── AAA
│   └── 01.txt
├── AAA_X
│   └── 03.txt
├── BBB
│   └── 02.txt
└── BBB_X
    └── 04.txt
$ find . -not -name \*_X -type d -print0 | xargs -0 -n1 -I {} grep 'Hello' {}/\*.txt
grep: ./*.txt: No such file or directory
grep: ./AAA/*.txt: No such file or directory << Why failed here?
grep: ./BBB/*.txt: No such file or directory
$ grep 'Hello' AAA/*.txt
Hello
Question> How can I pass the directory names to grep from find with xargs?

The problem is that xargs doesn't execute the command through the shell.
You should use -name '*.txt' to get the files directly in the find command. To exclude the *_X directories, you can use -prune:
find . -type d -name '*_X' -prune -o -name '*.txt' -exec grep 'Hello' {} +
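If you do want to keep the find | xargs shape, one option is to let xargs start a shell explicitly so that the *.txt glob gets expanded. This is only a sketch: the temporary tree and file contents are assumptions, and -print0, xargs -0 and grep -H are extensions found in the GNU and BSD tools rather than POSIX.

```shell
# Hypothetical demo tree; names and contents are assumptions.
root=$(mktemp -d)
mkdir -p "$root/AAA" "$root/AAA_X" "$root/BBB"
echo Hello > "$root/AAA/01.txt"
echo Hello > "$root/AAA_X/03.txt"
echo Bye > "$root/BBB/02.txt"

# The directory is passed as $1, so the inner shell expands "$1"/*.txt itself
matches=$(cd "$root" && find . -mindepth 1 -type d ! -name '*_X' -print0 |
    xargs -0 -I {} sh -c 'grep -H Hello "$1"/*.txt' _ {})
printf '%s\n' "$matches"
rm -rf "$root"
```

Passing the directory as a positional parameter, instead of pasting {} into the sh -c string, avoids problems with directory names that contain quotes or other shell metacharacters.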

Why not use --exclude with grep?
grep --exclude=*_X/*.txt Hello */*.txt

Delete all files except the newest 3 in bash script

Question: How do you delete all files in a directory except the newest 3?
Finding the newest 3 files is simple:
ls -t | head -3
But I need to find all files except the newest 3 files. How do I do that, and how do I delete these files in the same line without making an unnecessary for loop for that?
I'm using Debian Wheezy and bash scripts for this.

This will list all files except the newest three:
ls -t | tail -n +4
This will delete those files:
ls -t | tail -n +4 | xargs rm --
This will also list dotfiles:
ls -At | tail -n +4
and delete with dotfiles:
ls -At | tail -n +4 | xargs rm --
But beware: parsing ls can be dangerous when the filenames contain funny characters like newlines or spaces. If you are certain that your filenames do not contain funny characters then parsing ls is quite safe, even more so if it is a one time only script.
If you are developing a script for repeated use then you should most certainly not parse the output of ls and use the methods described here: http://mywiki.wooledge.org/ParsingLs

Solution without problems with "ls" (strange named files)
This is a combination of ceving's and anubhava's answer.
Neither solution worked for me. Because I was looking for a script that should run every day to back up files into an archive, I wanted to avoid problems with ls (someone could have saved a strangely named file in my backup folder). So I modified the mentioned solutions to fit my needs.
My solution deletes all files, except the three newest files.
find . -type f -printf '%T@\t%p\n' |
    sort -t $'\t' -g |
    head -n -3 |
    cut -d $'\t' -f 2- |
    xargs rm
Some explanation:
find lists all files (not directories) in current folder. They are printed out with timestamps.
sort sorts the lines based on timestamp (oldest on top).
head prints out the top lines, up to the last 3 lines.
cut removes the timestamps.
xargs runs rm for every selected file.
For you to verify my solution:
(
touch -d "6 days ago" test_6_days_old
touch -d "7 days ago" test_7_days_old
touch -d "8 days ago" test_8_days_old
touch -d "9 days ago" test_9_days_old
touch -d "10 days ago" test_10_days_old
)
This creates 5 files with different timestamps in the current folder. Run this script first and then the code for deleting old files.
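Before deleting anything, the pipeline can be checked as a dry run by leaving off the final xargs rm. The sketch below assumes GNU find, head and touch; it builds its own temporary directory and prints only the base names (%f) of the files that would be removed:

```shell
# Hypothetical demo directory with five files of different ages.
dir=$(mktemp -d)
for age in 6 7 8 9 10; do
    touch -d "$age days ago" "$dir/test_${age}_days_old"
done

tab=$(printf '\t')
# Same pipeline as above without xargs rm: list everything except the newest three
old=$(find "$dir" -type f -printf '%T@\t%f\n' |
    sort -t "$tab" -g |
    head -n -3 |
    cut -d "$tab" -f 2-)
printf '%s\n' "$old"
rm -rf "$dir"
```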

The following looks a bit complicated, but is very cautious to be correct, even with unusual or intentionally malicious filenames. Unfortunately, it requires GNU tools:
count=0
while IFS= read -r -d ' ' && IFS= read -r -d '' filename; do
    (( ++count > 3 )) && printf '%s\0' "$filename"
done < <(find . -maxdepth 1 -type f -printf '%T@ %P\0' | sort -g -r -z) \
    | xargs -0 rm -f --
Explaining how this works:
Find emits <mtime> <filename><NUL> for each file in the current directory.
sort -g -r -z does a general (floating-point, as opposed to integer) numeric sort on the first column (times), newest first, with the records separated by NULs.
The first read in the while loop strips off the mtime (no longer needed after sort is done).
The second read in the while loop reads the filename (running until the NUL).
The loop increments, and then checks, a counter; if the counter's state indicates that we're past the initial skipping, then we print the filename, delimited by a NUL.
xargs -0 then appends that filename into the argv list it's collecting to invoke rm with.

ls -t | tail -n +4 | xargs -I {} rm {}
If you want a 1 liner

In zsh:
rm /files/to/delete/*(Om[1,-4])
If you want to include dotfiles, replace the parenthesized part with (Om[1,-4]D).
I think this works correctly with arbitrary chars in the filenames (just checked with newline).
Explanation: The parentheses contain Glob Qualifiers. O means "order by, descending", m means mtime (See man zshexpn for other sorting keys - large manpage; search for "be sorted"). [1,-4] returns only the matches at one-based index 1 to (last + 1 - 4) (note the -4 for deleting all but 3).

Don't use ls -t, as it is unsafe for filenames that may contain whitespace or special glob characters.
You can do this using only GNU utilities to delete all but the 3 newest files in the current directory:
find . -maxdepth 1 -type f -printf '%T@\t%p\0' |
    sort -z -nrk1 |
    tail -z -n +4 |
    cut -z -f2- |
    xargs -0 rm -f --
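A dry-run sketch of that NUL-delimited pipeline, assuming GNU find and a GNU coreutils version whose tail and cut accept -z. The temporary directory and the file names (which contain spaces on purpose) are assumptions; tr replaces xargs rm so the candidates are printed instead of deleted:

```shell
# Hypothetical demo directory; file names contain a space on purpose.
dir=$(mktemp -d)
for age in 1 2 3 4 5; do
    touch -d "$age days ago" "$dir/file $age"
done

# Newest first, skip the first three records, strip the timestamp column
doomed=$(find "$dir" -maxdepth 1 -type f -printf '%T@\t%f\0' |
    sort -z -nrk1 |
    tail -z -n +4 |
    cut -z -f2- |
    tr '\0' '\n')
printf '%s\n' "$doomed"
rm -rf "$dir"
```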

ls -t | tail -n +4 | xargs -I {} rm {}
Michael Ballent's answer works best for me, because
ls -t | tail -n +4 | xargs rm --
throws an error when there is nothing to delete (three or fewer files).
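Another possible way around that error, assuming an xargs that supports -r (--no-run-if-empty in GNU xargs), is to tell xargs not to run rm at all when its input is empty. The temporary directory below is made up for the demonstration:

```shell
# Hypothetical demo directory with fewer files than we want to keep.
dir=$(mktemp -d)
touch "$dir/one" "$dir/two"

# With -r, xargs skips rm entirely because tail produces no file names
(cd "$dir" && ls -t | tail -n +4 | xargs -r rm --)
status=$?
remaining=$(ls "$dir" | wc -l)
printf 'exit status: %s, files left: %s\n' "$status" "$((remaining))"
rm -rf "$dir"
```

The caveats about parsing ls output still apply to this variant.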

Recursive script with arbitrary num of files to keep per-directory
Also handles files/dirs with spaces, newlines and other odd characters
#!/bin/bash
if (( $# != 2 )); then
    echo "Usage: $0 </path/to/top-level/dir> <num files to keep per dir>"
    exit 1
fi
while IFS= read -r -d $'\0' dir; do
    # Find the newest file that should be deleted (position num+1, newest first)
    nthOldest=$(find "$dir" -maxdepth 1 -type f -printf '%T@\0%p\n' | sort -t '\0' -rg \
        | awk -F '\0' -v num="$2" 'NR==num+1{print $2}')
    if [[ -f "$nthOldest" ]]; then
        find "$dir" -maxdepth 1 -type f ! -newer "$nthOldest" -exec rm {} +
    fi
done < <(find "$1" -type d -print0)
Proof of concept
$ tree test/
test/
├── sub1
│   ├── sub1_0_days_old.txt
│   ├── sub1_1_days_old.txt
│   ├── sub1_2_days_old.txt
│   ├── sub1_3_days_old.txt
│   └── sub1\ 4\ days\ old\ with\ spaces.txt
├── sub2\ with\ spaces
│   ├── sub2_0_days_old.txt
│   ├── sub2_1_days_old.txt
│   ├── sub2_2_days_old.txt
│   └── sub2\ 3\ days\ old\ with\ spaces.txt
└── tld_0_days_old.txt
2 directories, 10 files
$ ./keepNewest.sh test/ 2
$ tree test/
test/
├── sub1
│   ├── sub1_0_days_old.txt
│   └── sub1_1_days_old.txt
├── sub2\ with\ spaces
│   ├── sub2_0_days_old.txt
│   └── sub2_1_days_old.txt
└── tld_0_days_old.txt
2 directories, 5 files

As an extension to the answer by flohall. If you want to remove all folders except the newest three folders use the following:
find . -maxdepth 1 -mindepth 1 -type d -printf '%T@\t%p\n' |
    sort -t $'\t' -g |
    head -n -3 |
    cut -d $'\t' -f 2- |
    xargs rm -rf
Here -mindepth 1 excludes the starting folder itself, and -maxdepth 1 keeps find from descending into subfolders.

This uses find instead of ls with a Schwartzian transform.
find . -type f -printf '%T@\t%p\n' |
    sort -t $'\t' -g |
    tail -3 |
    cut -d $'\t' -f 2-
find searches the files and decorates them with a time stamp and uses the tabulator to separate the two values. sort splits the input by the tabulator and performs a general numeric sort, which sorts floating point numbers correctly. tail should be obvious and cut undecorates.
The problem with decorations in general is finding a suitable delimiter that is not part of the input, here the file names. This answer uses the tab character.

Below worked for me:
rm -rf $(ll -t | tail -n +5 | awk '{ print $9}')

Bash - finding files with spaces and rename with sed [duplicate]

This question already has answers here:
Recursively rename files using find and sed
(20 answers)
Closed 9 years ago.
I have been trying to write a script to rename all files that contain a space and replace the space with a dash.
Example: "Hey Bob.txt" to "Hey-Bob.txt"
When I used a for-loop, it just split up the file name at the space, so "Hey Bob.txt" gave separate argument like "Hey" and "Bob.txt".
I tried the following script but it keeps hanging on me.
#!/bin/bash
find / -name '* *' -exec mv {} $(echo {} | sed 's/ /-g')\;

Building off OP's idea:
find ${PATH_TO_FILES} -name '* *' -exec bash -c 'eval $(echo mv -v \"{}\" $(echo {} | sed "s/ /-/g"))' \;
NOTE: need to specify the PATH_TO_FILES variable
EDIT: BroSlow pointed out need to consider directory structure:
find ${PATH_TO_FILES} -name '* *' -exec bash -c 'DIR=$(dirname "{}" | sed "s/ /-/g" ); BASE=$(basename "{}"); echo mv -v \"$DIR/$BASE\" \"$DIR/$(echo $BASE | sed "s/ /-/g")\"' \; > rename-script.sh ; sh rename-script.sh

Another way:
find . -name "* *" -type f | while IFS= read -r file
do
    new=${file// /-}
    mv "${file}" "$new"
done

Not one line, but avoids sed and should work just as well if you're going to be using it for a script anyway. (replace the mv with an echo if you want to test)
In bash 4+
#!/bin/bash
shopt -s globstar
for file in **/*; do
    filename="${file##*/}"
    if [[ -f $file && $filename == *" "* ]]; then
        onespace=$(echo $filename)
        dir="${file%/*}"
        [[ ! -f "$dir/${onespace// /-}" ]] && mv "$file" "$dir/${onespace// /-}" || echo "$dir/${onespace// /-} already exists, so not moving $file" 1>&2
    fi
done
Older bash
#!/bin/bash
find . -type f -print0 | while read -r -d '' file; do
    filename="${file##*/}"
    if [[ -f $file && $filename == *" "* ]]; then
        onespace=$(echo $filename)
        dir="${file%/*}"
        [[ ! -f "$dir/${onespace// /-}" ]] && mv "$file" "$dir/${onespace// /-}" || echo "$dir/${onespace// /-} already exists, so not moving $file" 1>&2
    fi
done
Explanation of algorithm
**/* This recursively lists all files in the current directory (** technically does it but /* is added at the end so it doesn't list the directory itself)
${file##*/} Will search for the longest pattern of */ in file and remove it from the string. e.g. /foo/bar/test.txt gets printed as test.txt
$(echo $filename) Without quoting echo will truncate spaces to one, making them easier to replace with one - for any number of spaces
${file%/*} Remove everything after and including the last /, e.g. /foo/bar/test.txt prints /foo/bar
mv "$file" ${onespace// /-} Replaces every space in the filename with - (we check beforehand whether the hyphenated version exists and, if it does, report the failure on stderr; note that && is processed before || in bash)
find . -type f -print0 | while read -r -d '' file This avoids breaking up file names that contain spaces by reading NUL-delimited names (-d '') and not interpreting backslashes (-r)
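The path-splitting expansions and the unquoted echo trick can be tried on a single string, as in the sketch below. The sample path is an assumption, and tr stands in for the bash-only ${onespace// /-} so that the snippet also runs in a plain POSIX sh:

```shell
# Hypothetical sample path with a run of several spaces in the file name.
file='bar/some dir/some   name with space1.pdf'

filename=${file##*/}   # strip the longest prefix matching */ (everything up to the last slash)
dir=${file%/*}         # strip the shortest suffix matching /* (the last slash and the file name)

# Unquoted on purpose: word splitting collapses each run of spaces into one
onespace=$(echo $filename)

# tr is used here instead of ${onespace// /-}
newname=$(printf '%s' "$onespace" | tr ' ' '-')
printf '%s\n' "$dir/$newname"
```

Only the file name is rewritten; the space in the directory part (some dir) is left alone, which is why the scripts above build the target as "$dir/..." rather than substituting over the whole path.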
Sample Output
$ tree
.
├── bar
│   ├── some dir
│   │   ├── some-name-without-space1.pdf
│   │   ├── some name with space1.pdf
│   ├── some-name-without-space1.pdf
│   ├── some name with space1.pdf
│   └── some-name-with-space1.pdf
└── space.sh
$ ./space.sh
bar/some-name-with-space1.pdf already exists, so not moving bar/some name with space1.pdf
$ tree
.
├── bar
│   ├── some dir
│   │   ├── some-name-without-space1.pdf
│   │   ├── some-name-with-space1.pdf
│   ├── some-name-without-space1.pdf
│   ├── some name with space1.pdf
│   └── some-name-with-space1.pdf
└── space.sh
