Sorting files based on content using command line - linux

I have a database of files in a folder. I wish to sort the files containing *C: into one folder and the files containing *c: into another folder. How can this be achieved?
I can use *.krn to access every file.

$ grep --help | grep with-matches
-l, --files-with-matches print only names of FILEs containing matches
What to do next depends on how many files there are and how paranoid you must be about their names. From the simplest:
mv $(grep -l pattern files) target
to the most robust:
grep -l -Z pattern files | xargs -0 mv -t target-directory --
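Applied to the question, a minimal sketch; the target folder names upper/ and lower/ are my own, and -F keeps the * in the pattern literal:
# files whose content contains the literal string "*C:" go to upper/
grep -lZF '*C:' *.krn | xargs -r0 mv -t upper/ --
# files containing "*c:" go to lower/; a file containing both will already have been moved
grep -lZF '*c:' *.krn | xargs -r0 mv -t lower/ --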

Related

How to copy files filtered with grep

I need to find and copy files in /usr/share/man.
In particular I need man7-8 and everything that has "et" in the name.
I tried this:
ls man7 man8 | grep "et"
This works perfectly.
Then I want to copy those files with cp, but I don't know how to format it properly:
ls man7 man8 | grep "et" | xargs -I '{}' cp '{}' /home/marty/homework
But this is not working.
It's not working because ls directory just outputs the filenames, without the directory prefixes. So cp doesn't know what directory to copy the file from.
But there's no need for ls or grep here; just use a wildcard:
cp man7/*et* man8/*et* /home/marty/homework
Your code would also fail for any filenames containing whitespace, since xargs treats that as a delimiter by default.
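If you do need a pipeline that survives whitespace in names, a NUL-delimited sketch along these lines should work (GNU find, xargs, and cp assumed):
# -print0/-0 keep file names intact regardless of spaces or newlines
find man7 man8 -maxdepth 1 -type f -name '*et*' -print0 | xargs -r0 cp -t /home/marty/homework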

Use a text file (containing file names) to copy files from current directory to new directory

I have created a file (search.txt) containing file names of .fasta files I want to copy from the current directory (which also contains many unwanted .fasta files). Is it possible to use this text file to find and copy the matching files in the current directory to a new location?
The search.txt file contains a list of names like this:
name_1
name_2
name_3
I tried to build the search term using find and grep, like this:
find . *.fasta | grep -f search.txt
which is returning output like this for each matching file:
./name_1.fasta
./name_2.fasta
./name_3.fasta
name_1.fasta
name_2.fasta
name_3.fasta
It's finding the correct files, but I'm not sure if this output is useful / can be used to copy these files?
To get only matching filenames from search.txt I would do this:
find . -type f -name '*.fasta' -print0 | grep -zf search.txt | xargs -r0 cp -t target-dir/
It will find all files with the extension .fasta, keep only the ones matching the patterns in search.txt, and bulk-copy them to target-dir/. Each filename is terminated with a null byte in case filenames contain newlines.
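One caveat: grep -f treats each line of search.txt as a regular expression matched anywhere in the path. Since the names here are plain strings, adding -F keeps dots and other metacharacters literal; a hedged variant:
# -F matches the names from search.txt as fixed strings rather than regexes
find . -type f -name '*.fasta' -print0 | grep -zFf search.txt | xargs -r0 cp -t target-dir/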
Using Bash, you can read all the files from the list into an array:
$ mapfile -t files < search.txt
$ declare -p files
declare -a files=([0]="name_1" [1]="name_2" [2]="name_3")
Then, you can append the desired file extension to all array elements:
$ files=("${files[#]/%/.fasta}")
$ declare -p files
declare -a files=([0]="name_1.fasta" [1]="name_2.fasta" [2]="name_3.fasta")
And finally, move them to the desired location:
$ mv "${files[#]}" path/to/new/location
You don't actually need the intermediate step:
mapfile -t files < search.txt
mv "${files[#]/%/.fasta}" path/to/new/location

Search filenames for a list of patterns and copy to destination

I have a list of patterns in patterns.txt, and I want to search a folder for filenames containing those patterns.
patterns.txt:
254b
0284ee
001ty
288qa
I want to search a folder for filenames containing any of these patterns and copy all found files to a destination directory.
So far I have found a way to list the matching files:
set -f; find ./ -type f \( $(printf -- ' -o -iname *%s*' $(cat patterns.txt) | cut -b4-) \); set +f
I can find all files based on the patterns in my patterns.txt file, but how do I copy them to a new folder?
Assuming the target folder does not need to maintain the original hierarchy (or that the input directory has no subdirectories), using find, grep, and xargs should work:
find . -type f -print0 |
grep -z -i -F -f patterns.txt |
xargs -0 -s1000 cp -t /new/folder
This sequence has the advantage of bulking the copy, which is efficient for a large number of files. Using NUL to separate file names allows any special character in the names.
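To preview the copy before running it for real, cp can be replaced with echo (a dry run; my suggestion):
# prints the cp commands instead of executing them
find . -type f -print0 |
grep -z -i -F -f patterns.txt |
xargs -0 -s1000 echo cp -t /new/folder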

Run recursive grep using two patterns

How can I use this grep pattern to recursively search a directory? I need both of these strings to be on the same line in the file. I keep getting the message back that this is a directory. How can I make it search recursively through all files with the extension .cfc?
"<cffunction" and "inject="
grep -insR "<cffunction" | grep "inject=" /c/mydirectory/
Use find and exec:
find your_dir -name "*.cfc" -type f -exec grep -insE 'inject=.*<cffunction|<cffunction.*inject=' /dev/null {} +
find finds your *.cfc files recursively, picking only regular files (-type f), and hands them to grep
inject=.*<cffunction|<cffunction.*inject= catches lines that contain both patterns, in either order
{} + ensures each invocation of grep gets as many file names as fit within ARG_MAX
the /dev/null argument to grep ensures that the output is prefixed with the file name even when grep is invoked with a single *.cfc file (see the GNU variant below)
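With GNU grep, the -H option forces the file-name prefix and can replace the /dev/null trick; a GNU-only variant:
find your_dir -name "*.cfc" -type f -exec grep -insEH 'inject=.*<cffunction|<cffunction.*inject=' {} +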
You've got it backwards; you should pipe your file search into the second command, like:
grep -nisr "inject=" /c/mydirectory | grep "<cffunction"
Edit: to exclude some directories and search only in *.cfc files, use:
grep -nisr --exclude-dir={/abs/dir/path,rel/dir/path} --include \*.cfc "inject=" /c/mydirectory | grep "<cffunction"

Find all directories containing a file that contains a keyword in linux

In my hierarchy of directories I have many text files called STATUS.txt. These text files each contain one keyword such as COMPLETE, WAITING, FUTURE or OPEN. I wish to execute a shell command of the following form:
./mycommand OPEN
which will list all the directories that contain a file called STATUS.txt, where this file contains the text "OPEN".
In the future I will want to extend this script so that the directories returned are sorted. Sorting will be determined by a numeric value stored in the file PRIORITY.txt, which lives in the same directory as STATUS.txt. However, this can wait until my competence level improves. For the time being I am happy to list the directories in any order.
I have searched Stack Overflow for the following, but to no avail:
unix filter by file contents
linux filter by file contents
shell traverse directory file contents
bash traverse directory file contents
shell traverse directory find
bash traverse directory find
linux file contents directory
unix file contents directory
linux find name contents
unix find name contents
shell read file show directory
bash read file show directory
bash directory search
shell directory search
I have tried the following shell commands:
This helps me identify all the directories that contain STATUS.txt
$ find ./ -name STATUS.txt
This reads STATUS.txt for every directory that contains it
$ find ./ -name STATUS.txt | xargs -I{} cat {}
This doesn't return any text; I was hoping it would return the name of each directory:
$ find . -type d | while read d; do if [ -f STATUS.txt ]; then echo "${d}"; fi; done
... or the other way around:
find . -name "STATUS.txt" -exec grep -lF "OPEN" \{} +
If you want to wrap that in a script, a good starting point might be:
#!/bin/sh
[ $# -ne 1 ] && echo "One argument required" >&2 && exit 2
find . -name "STATUS.txt" -exec grep -lF "$1" \{} +
As pointed out by @BroSlow, if you are looking for the directories containing the matching STATUS.txt files, this might be more what you are looking for:
fgrep --include='STATUS.txt' -rl 'OPEN' | xargs -L 1 dirname
Or better
fgrep --include='STATUS.txt' -rl 'OPEN' |
sed -e 's|^[^/]*$|./&|' -e 's|/[^/]*$||'
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# simulate `xargs -L 1 dirname` using `sed`
# (no trailing `/`; returns `.` for a path without a dir part)
Maybe you can try this:
grep -rl "OPEN" . --include='STATUS.txt'| sed 's/STATUS.txt//'
where grep -r means recursive, -l means list only the files that match, and '.' is the directory location. You can pipe it to sed to remove the file name.
You can then wrap this in a bash script file where you can pass in keywords such as 'OPEN', 'FUTURE' as an argument.
#!/bin/bash
grep -rl "$1" . --include='STATUS.txt'| sed 's/STATUS.txt//'
Try something like this (the second -exec runs only for files where the first one, grep -q, finds a match):
find -type f -name "STATUS.txt" -exec grep -q "OPEN" {} \; -exec dirname {} \;
or in a script
#!/bin/bash
(($#==1)) || { echo "Usage: $0 <pattern>" && exit 1; }
find -type f -name "STATUS.txt" -exec grep -q "$1" {} \; -exec dirname {} \;
You could use grep and awk instead of find:
grep -r OPEN * | awk '{split($1, path, ":"); print path[1]}' | xargs -I{} dirname {}
The above grep will list every line containing "OPEN" recursively inside your directory structure, prefixed with the file path. The result will be something like:
dir_1/subdir_1/STATUS.txt:OPEN
dir_2/subdir_2/STATUS.txt:OPEN
dir_2/subdir_3/STATUS.txt:OPEN
Then the awk script splits this output at the colon and prints the first part of it (the file path):
dir_1/subdir_1/STATUS.txt
dir_2/subdir_2/STATUS.txt
dir_2/subdir_3/STATUS.txt
The dirname will then return only the directory path, not the file name, which I suppose is what you want.
I'd consider using Perl or Python if you want to evolve this further, though, as it might get messier if you want to add priorities and sorting.
Taking up the accepted answer: it does not output a sorted and unique directory list. At the end of the find command, add:
| sort -u
or:
| sort | uniq
to get the unique list of the directories.
Credits go to Get unique list of all directories which contain a file whose name contains a string.
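For instance, combined with the earlier find-based command (assuming that is the accepted one):
# xargs -L 1 dirname as above; sort -u removes duplicate directories
find . -name "STATUS.txt" -exec grep -lF "OPEN" {} + | xargs -L 1 dirname | sort -u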
IMHO you should write a Python script which:
Examines your directory structure and finds all files named STATUS.txt.
For each found file:
reads the file and executes mycommand depending on what the file contains.
If you want to extend the script later with sorting, you can find all the interesting files first, save them to a list, sort the list and execute the commands on the sorted list.
Hint: http://pythonadventures.wordpress.com/2011/03/26/traversing-a-directory-recursively/
