Create filename based on file output in shell - linux

I'm looking to create files whose names are based on the output of the previous command, i.e. if I run
find . -name '*.mp4' | wc -l > filename
the number of files of that type should become the name of the created file.

Here's a solution that renames the file after it has been created:
find . -name '*.mp4' | wc -l > filename && mv filename `tail -n 1 filename`
What is happening in this one-liner:
find . -name '*.mp4' | wc -l > filename: finds files with the .mp4 suffix, counts how many were found, and redirects that count to a file named filename.
tail -n 1 filename: outputs the last line of the file named filename. If you put backticks around it (`tail -n 1 filename`), the command is executed and replaced by its output.
mv filename `tail -n 1 filename`: renames the original file filename to the output of that command.
When you combine these with &&, the second statement only runs if the first was successful.
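If you don't need the intermediate file at all, a command substitution can create the file named after the count directly (a minimal sketch, not part of the original answer; assumes bash):
count=$(find . -name '*.mp4' | wc -l)   # number of matching files
touch "$count"                          # create an (empty) file named after the count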

Related

How to read out a file line by line and for every line do a search with find and copy the search result to destination?

I hope you can help me with the following problem:
The Situation
I need to find files in various folders and copy them to another folder. The file and folder names can contain whitespace and umlauts.
The filenames contain an ID and a string like:
"2022-01-11-02 super important file"
The filenames I need to find are collected in a text file named ids.txt. This file contains only the IDs, not the whole filenames.
What I want to achieve:
I want to read ids.txt line by line.
For every line in ids.txt I want to run a find search and cp the result to the destination.
So far I tried:
for n in $(cat ids.txt); do find /home/alex/testzone/ -name "$n" -exec cp {} /home/alex/testzone/output \; ; done
while read -r ids; do find /home/alex/testzone -name "$ids" -exec cp {} /home/alex/testzone/output \; ; done < ids.txt
The output folder remains empty. Leaving out -exec also gives no search results.
I suspect that -name "$ids" is the root cause here. My files contain the ID plus a string, so I should search for names containing the ID followed by a variable part (a star).
As the argument for -name I also tried "$ids *", "$ids"" *", and so on, with no luck.
Is there an argument I can use with find instead of the star in the -name pattern?
Do you have a solution to automate this in a bash script that reads the ids.txt file, searches for the filenames, and copies them to the specified folder?
In the end I would like to create a bash script that takes ids.txt and the search-folder and the output-folder as arguments like:
my-id-search.sh /home/alex/testzone/ids.txt /home/alex/testzone/ /home/alex/testzone/output
EDIT:
This is some example content of the ids.txt file where only ids are listed (not the whole filename):
2022-01-11-01
2022-01-11-02
2020-12-01-62
EDIT II:
Continuing with the solution from tripleee:
#!/bin/bash
grep . "$1" | while read -r id; do
    echo "The search term is: $id"; echo
    find /home/alex/testzone -name "$id*" -exec cp {} /home/alex/testzone/ausgabe \;
done
In case my ids.txt file contains empty lines, the -name "$id*" becomes -name "*", which in turn matches and copies all files.
Trying to prevent the empty lines from being read does not seem to work. They should be filtered out by grep . "$1". What am I doing wrong?
If your destination folder is always the same, the quickest and absolutely most elegant solution is to run a single find command to look for all of the files.
sed 's/.*/-o\n-name\n&*/' ids.txt |
xargs -I {} find -false {} -exec cp -t /home/alex/testzone/output {} +
The -false predicate is a bit of a hack to allow the list of actual predicates to start with -o (as in "or").
This could fail if ids.txt is too large to fit into a single xargs invocation, or if your sed does not understand \n to mean a literal newline.
(Here's a fix for the latter case:
xargs printf '-o\n-name\n%s*\n' <ids.txt |
...
Still the inherent problem with using xargs find like this is that xargs could split the list between -o and -name or between -name and the actual file name pattern if it needs to run more than one find command to process all the arguments.
A slightly hackish solution to that is to ensure that each pair is a single string, and then separately split them back out again:
xargs printf '-o_-name_%s*\n' <ids.txt |
xargs bash -c 'arr=("$@"); find \( -false ${arr[@]/-o_-name_/-o -name } \) -exec cp {} "$0" \;' /home/alex/testzone/ausgabe
where we temporarily hold the arguments in an array where each file name and its flags is a single item, and then replace the flags into separate tokens. This still won't work correctly if the file names you operate on contain literal shell metacharacters like * etc.)
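A plainer way to get the same single-find behaviour without the xargs splitting concerns is to build the predicate list in a bash array first (a sketch, not part of the original answer; it reuses the paths from the question):
args=()
while IFS= read -r id; do
    [ -n "$id" ] && args+=(-o -name "$id*")    # skip empty lines
done < ids.txt
find /home/alex/testzone \( -false "${args[@]}" \) -exec cp {} /home/alex/testzone/output \;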
A more mundane solution fixes your while read attempt by adding the missing wildcard in the -name argument. (I also took the liberty of renaming the variable, since read will only read one line at a time, so the variable name should be singular.)
while read -r id; do
find /home/alex/testzone -name "$id*" -exec cp {} /home/alex/testzone/output \;
done < ids.txt
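And for the three-argument interface described in the question, a minimal script sketch (the argument order is an assumption taken from the my-id-search.sh example):
#!/bin/bash
# usage: my-id-search.sh /path/to/ids.txt /search/folder /output/folder
ids_file=$1
search_dir=$2
output_dir=$3

while read -r id; do
    [ -n "$id" ] || continue                                   # skip empty lines
    find "$search_dir" -name "$id*" -exec cp {} "$output_dir" \;
done < "$ids_file"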
Please try the following bash script copier.sh
#!/bin/bash
IFS=$'\n'        # make newlines the only separator
set -f           # disable globbing
file="files.txt" # name of the file containing the filenames
finish="finish"  # destination directory
while read -r n ; do (
    du -a | awk '{for(i=2;i<=NF;++i)printf $i" " ; print " "}' | grep "$n" | sed 's/ *$//g' | xargs -I '{}' cp '{}' "$finish"
);
done < "$file"
which recursively copies all the files named in files.txt from . and its subdirectories to ./finish.
This version works even if there are spaces in the directory or file names.

using grep in single-line files to find the number of occurrences of a word/pattern

I have JSON files in the current directory and its subdirectories. All the files have a single line of content.
I want a list of all files that contain the word XYZ, and the number of times it occurs in each file.
I want to print the list according to the following format:
file_name pattern_occurrence_times
It should look something like:
.\x1\x2\file1.json 3
.\x1\file3.json 2
The problem is that grep counts the NUMBER of lines containing XYZ, not the number of occurrences.
Since the whole content of the files is always contained in a single line, the count is always 1 (if the pattern occurs in the file).
I used this command for that:
find . -type f -name "*.json" -exec grep --files-with-match -i 'xyz' {} \; -exec grep -wci 'xyz' {} \;
I wrote a Python script, and it works, but I would like to know whether there is a way to do this using find and grep or other command-line tools.
Thanks
The classical approach to this problem is the pipeline grep -o regex file | wc -l. However, to execute a pipeline from find's -exec you have to run a shell (e.g. sh -c ...). Even then, this only prints the number of matches, not the file names, and files with no matches still have to be filtered out.
Because of all of this I think a single awk command would be preferable:
find ... -type f -exec awk '{$0=tolower($0); c+=gsub(/xyz/,"")}
END {if(c>0) print FILENAME " " c}' {} \;
Here the tolower($0) emulates grep's -i option. Make sure to write your search pattern xyz only in lowercase.
If you want to combine this with subsequent filters in find you can add else exit 1 at the end of the last awk block to continue (inside find) only with the printed files.
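A usage sketch with the find part filled in (the *.json filter and the lowercase pattern xyz are taken from the question; adjust both as needed):
find . -type f -name '*.json' -exec awk \
    '{$0=tolower($0); c+=gsub(/xyz/,"")} END {if (c>0) print FILENAME " " c}' {} \;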
Use the -o option of grep in conjunction with wc, e.g.
find . -name "*.json" | while read -r f ; do
echo $f : $(grep -ow XYZ "$f" | wc -l)
done
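If some of the JSON files might have spaces in their names, a null-delimited variant of the same idea is safer (a sketch; it assumes GNU find and bash, and skips files with zero matches):
find . -type f -name '*.json' -print0 |
while IFS= read -r -d '' f; do
    n=$(grep -oiw 'XYZ' "$f" | wc -l)           # count occurrences, not matching lines
    [ "$n" -gt 0 ] && printf '%s %s\n' "$f" "$n"
done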

Find Files Containing Certain String and Copy To Directory Using Linux

I am trying to find files that contain a certain string in a current directory and make a copy of all of these files into a new directory.
The script I'm trying to use:
grep *Qtr_1_results*; cp /data/jobs/file/obj1
I am unable to copy and the output message is:
Usage: cp [-fhipHILPU][-d|-e] [-r|-R] [-E{force|ignore|warn}] [--] src target
or: cp [-fhipHILPU] [-d|-e] [-r|-R] [-E{force|ignore|warn}] [--] src1 ... srcN directory
Edit: After clearing things up (see comment)...
cp *Qtr_1_results* /data/jobs/file/obj1
What you're doing is just grepping for nothing. With ; you end the command, and cp prints the error message because you only provide the source, not the destination.
What you want to do is the following. First you want grep to give you the filenames, not the matching lines, and you need to provide the string you are searching for (which you didn't).
grep -l the_string_you_are_looking_for *Qtr_1_results*
The -l option gives you the filename, instead of the line where the_string_you_are_looking_for is found. In this case grep will search in all files where the filename contains Qtr_1_results.
Then you want to send the output of grep to a while loop to process it. You do this with a pipe (|). A semicolon ; just ends a command.
grep -l the_string_you_are_looking_for *Qtr_1_results* | while read -r filename; do cp "$filename" /path/to/your/destination/folder; done
In the while loop, read -r puts the output of grep into the variable filename. When you assign a value to a variable you just write the name of the variable. When you want the value of the variable, you put a $ in front of it.
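If the matched filenames can contain spaces or other unusual characters, a null-delimited variant is more robust (a sketch; it assumes GNU grep and GNU coreutils cp, which provide -Z, -0 and -t):
grep -lZ the_string_you_are_looking_for *Qtr_1_results* |
    xargs -0 cp -t /path/to/your/destination/folder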
You can use multiple -exec actions in find to do this task.
For example:
find . -type f -exec grep -lr "Qtr_1_results" {} \; -exec cp -r {} /data/jobs/file/obj1 \;
Details:
Find all files that contain the string; grep -l lists the matching files.
find . -type f -exec grep -lr "Qtr_1_results" {} \;
The result set from the first part is a list of files. Copy each file from the result to the destination.
-exec cp -r {} /data/jobs/file/obj1 \;

Linux piping find and md5sum not sending output

I'm trying to loop over every file, do some cutting, and extract the first 4 characters of the MD5.
Here's what I got so far:
find . -name '*.jpg' | cut -f4 -d/ | cut -f1 -d. | md5sum | head -c 4
The problem is, I don't see any more output at this point. How can I send the output to md5sum and keep the results coming?
md5sum reads everything from stdin until end of file (EOF) and outputs the MD5 sum of the whole input. You should split the input into lines and run md5sum per line, for example with a while read loop:
find . -name '*.jpg' | cut -f4 -d/ | cut -f1 -d. |
while read -r a; do
    echo -n "$a" | md5sum | head -c 4
done
The read bash builtin reads one line from input into the shell variable $a; the while loop runs the loop body (the commands between do and done) for every successful read, and $a holds the current line. The -r option of read prevents backslashes from being interpreted as escapes; the -n option of echo suppresses the trailing newline (if you want a newline, remove the -n).
This will be slow for thousands of files or more, as there are several forks/execs for every file inside the loop. A script in Perl, Python, Node.js, or any other language with built-in (or library) MD5 hashing will be faster.
You can do what you are attempting with a short "helper" script that you call from find. For example, you could create a short script that takes the basename of each file passed as an argument, removes the '.jpg' extension, and then provides the remaining name without the extension as input to md5sum on stdin to get the md5sum of the name itself. Call the script anything you like, say namemd5.sh. Example:
#!/bin/bash
[ -z "$1" ] && exit 1 ## validate single argument
fname=$(basename "$1") ## get the filename alone
fname="${fname%.jpg}" ## remove .jpg extension
fnsum=$(md5sum - <<<"$fname") ## get md5sum of name w/o .jpg
fnsum=${fnsum%% *} ## remove trailing ' -'
echo "$fnsum - $fname" ## output md5sum - name
## (remove ' - $fname' for md5sum alone)
(note: the name is provided as part of the output for example purposes, remove if you want the md5sum alone as shown in the comment above)
Example Files
$ find /home/david/img/wp/ -type f -name "*.jpg"
/home/david/img/wp/hacker_manifesto_1200x900.jpg
/home/david/img/wp/hacker_manifesto_by_otalicus.jpg
/home/david/img/wp/reflections-triple-1920x1200.jpg
/home/david/img/wp/hacker_wallpaper_1600x900.jpg
/home/david/img/wp/Zen.jpg
/home/david/img/wp/hacker_wallpaper_by_vanilla23-dot254.jpg
/home/david/img/wp/hacker_manifesto_1600x900.jpg
Example Use/Output
$ find /home/david/img/wp/ -type f -name "*.jpg" -exec ./namemd5.sh '{}' \;
0f7d2aac158eb9f7842215e14ff6573c - hacker_manifesto_1200x900
604bc695a0bb70b8db0352267caf226f - hacker_manifesto_by_otalicus
5decea0e306f185bf988ac9934ec0e2c - reflections-triple-1920x1200
82bd8e1ad3df588eb0e0848c5f764812 - hacker_wallpaper_1600x900
0f4daba431a22c03f28977f087e4c695 - Zen
0c55cd3ebd2a847e10c20d86e80e6ceb - hacker_wallpaper_by_vanilla23-dot254
e5c1da0c2db3827d2bf81c306633cc56 - hacker_manifesto_1600x900
You can also call the script with the -execdir version within find as well, e.g.
$ find /home/david/img/wp/ -type f -name "*.jpg" -execdir \
/full/path/to/namemd5.sh '{}' \;
(note: the use of the /full/path to your helper script above)
How to find all .jpg files, then run md5sum, then cut the first 4 characters:
find . -name '*.jpg' -exec md5sum {} \; | cut -b 1-4

Find all directories containing a file that contains a keyword in linux

In my hierarchy of directories I have many text files called STATUS.txt. These text files each contain one keyword such as COMPLETE, WAITING, FUTURE or OPEN. I wish to execute a shell command of the following form:
./mycommand OPEN
which will list all the directories that contain a file called STATUS.txt, where this file contains the text "OPEN"
In the future I will want to extend this script so that the directories returned are sorted. Sorting will be determined by a numeric value stored in the file PRIORITY.txt, which lives in the same directory as STATUS.txt. However, this can wait until my competence level improves. For the time being I am happy to list the directories in any order.
I have searched Stack Overflow for the following, but to no avail:
unix filter by file contents
linux filter by file contents
shell traverse directory file contents
bash traverse directory file contents
shell traverse directory find
bash traverse directory find
linux file contents directory
unix file contents directory
linux find name contents
unix find name contents
shell read file show directory
bash read file show directory
bash directory search
shell directory search
I have tried the following shell commands:
This helps me identify all the directories that contain STATUS.txt
$ find ./ -name STATUS.txt
This reads STATUS.txt for every directory that contains it
$ find ./ -name STATUS.txt | xargs -I{} cat {}
This doesn't return any text; I was hoping it would return the name of each directory:
$ find . -type d | while read d; do if [ -f STATUS.txt ]; then echo "${d}"; fi; done
... or the other way around:
find . -name "STATUS.txt" -exec grep -lF "OPEN" \{} +
If you want to wrap that in a script, a good starting point might be:
#!/bin/sh
[ $# -ne 1 ] && echo "One argument required" >&2 && exit 2
find . -name "STATUS.txt" -exec grep -lF "$1" \{} +
As pointed out by @BroSlow, if you are looking for the directories containing the matching STATUS.txt files, this might be more what you are looking for:
fgrep --include='STATUS.txt' -rl 'OPEN' | xargs -L 1 dirname
Or better
fgrep --include='STATUS.txt' -rl 'OPEN' |
sed -e 's|^[^/]*$|./&|' -e 's|/[^/]*$||'
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# simulate `xargs -L 1 dirname` using `sed`
# (no trailing `\`; returns `.` for path without dir part)
Maybe you can try this:
grep -rl "OPEN" . --include='STATUS.txt'| sed 's/STATUS.txt//'
where -r means recursive, -l means list only the matching files, and '.' is the directory to search. You can pipe the result to sed to remove the file name.
You can then wrap this in a bash script that takes keywords such as 'OPEN' or 'FUTURE' as an argument.
#!/bin/bash
grep -rl "$1" . --include='STATUS.txt'| sed 's/STATUS.txt//'
Try something like this
find -type f -name "STATUS.txt" -exec grep -q "OPEN" {} \; -exec dirname {} \;
or in a script
#!/bin/bash
(($#==1)) || { echo "Usage: $0 <pattern>" && exit 1; }
find -type f -name "STATUS.txt" -exec grep -q "$1" {} \; -exec dirname {} \;
You could use grep and awk instead of find:
grep -r OPEN * | awk '{split($1, path, ":"); print path[1]}' | xargs -I{} dirname {}
The above grep will recursively list all files containing "OPEN" inside your directory structure. The result will be something like:
dir_1/subdir_1/STATUS.txt:OPEN
dir_2/subdir_2/STATUS.txt:OPEN
dir_2/subdir_3/STATUS.txt:OPEN
Then the awk script splits this output at the colon and prints the first part (the file path).
dir_1/subdir_1/STATUS.txt
dir_2/subdir_2/STATUS.txt
dir_2/subdir_3/STATUS.txt
dirname will then return only the directory path, not the file name, which I suppose is what you want.
I'd consider using Perl or Python if you want to evolve this further, though, as it might get messier if you want to add priorities and sorting.
Building on the accepted answer: it does not output a sorted, unique directory list. At the end of the find command, add:
| sort -u
or:
| sort | uniq
to get the unique list of the directories.
Credits go to Get unique list of all directories which contain a file whose name contains a string.
IMHO you should write a Python script which:
Examines your directory structure and finds all files named STATUS.txt.
For each found file:
reads the file and executes mycommand depending on what the file contains.
If you want to extend the script later with sorting, you can find all the interesting files first, save them to a list, sort the list and execute the commands on the sorted list.
Hint: http://pythonadventures.wordpress.com/2011/03/26/traversing-a-directory-recursively/
