How to pass hundreds of files from multiple subdirectories to cat? - linux

I have a directory full of subdirectories, and each subdirectory has some text files inside it (i.e. the depth is 1).
I'd like to cat all these files (in no particular order) into one file:
cat file1 file2.... fileN >new.txt
Is there a bash shell one-liner that could list all the files inside of these directories and pass them to cat?

How about this?
find . -name '*.txt' -exec cat {} \; > concatenated.txt
Granted it calls cat a bunch of times rather than just once, but the effect is the same.
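If the repeated cat invocations bother you, find can also batch file names by ending -exec with + instead of \;, so cat is called as few times as possible. A small variation on the command above (standard POSIX find syntax):
find . -name '*.txt' -exec cat {} + > concatenated.txt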

find . -type f -print0 | xargs -0 cat
find will recursively search for files (-type f) and print their names as null-terminated strings (-print0).
xargs will read null-terminated strings (-0) from stdin and pass them as arguments to cat.
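Putting that together with the redirection from the question, the whole one-liner could look like this (new.txt is just the output name from the question):
find . -type f -print0 | xargs -0 cat > new.txt
If new.txt is created inside the same tree, you may want to exclude it from the search (e.g. with ! -name new.txt) so cat does not end up reading its own output file.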

Related

"find" specific contents [linux]

I would like to go through all the files in the current directory (or its sub-directories) and have it echo back the names of the files, but only if they contain certain words.
More detail:
find -type f -name "*hello*" will give me all file names that have "hello" in their names. But instead of that, I want to search through the files, and if a file's content contains "hello", then print out the name of the file.
Is there a way to approach this?
You can use GNU find and GNU grep as
find /path -type f -exec grep -Hi 'hello' {} +
This is efficient in that it doesn't invoke a separate grep instance for each file returned by find; instead it passes the files to grep in batches. It works under the assumption that find actually returns some files for grep to search. If you are unsure whether any files will match at all, a fool-proof alternative is to use xargs with the -r flag, in which case the command following xargs is executed only if it receives any input:
find /path -type f -print0 | xargs -r0 grep -Hi 'hello'
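If you only want the file names rather than each matching line, grep's -l option lists every matching file just once, for example:
find /path -type f -exec grep -li 'hello' {} +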

How to recursively remove different files in two directories

I have two directory trees; one contains 200 .txt files and the other contains 210 .txt files. I need a script to find the file names that differ between the two and remove those files from the directory.
There are probably better ways, but here is one approach:
find directory1 directory2 -name \*.txt -printf '%f\n' |
sort | uniq -u |
xargs -I{} find directory1 directory2 -name {} -delete
find directory1 directory2 -name \*.txt -printf '%f\n':
print basename of each file matching the glob *.txt
sort | uniq -u:
only print lines that appear exactly once (if you wanted the duplicated names instead, it would have been uniq -d)
xargs -I{} find directory1 directory2 -name {} -delete:
remove them (re-specify the path to narrow the search and avoid deleting files outside the initial search path)
Notes
Thanks to @KlausPrinoth for all the suggestions.
Obviously I'm assuming a GNU userland, I suppose people running with the tools providing bare minimum POSIX compatibility will be able to adapt it.
Yet another way is to use diff, which is more than capable of finding differences between files and directories. For instance, if you have d1 and d2 containing your 200 and 210 files respectively (with the first 200 files being the same), you can use diff and process substitution to provide the names to remove to a while loop:
( while read -r line; do printf "rm %s\n" ${line##*: }; done < <(diff -q d1 d2) )
Output (of d1 with 10 files, d2 with 12 files)
rm file11.txt
rm file12.txt
diff will not fit all circumstances, but it does a great job of finding directory differences and is quite flexible.
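Building on that, if you want something that actually removes the extra files rather than just printing rm commands, a minimal sketch (assuming, as above, that the only differences diff reports are "Only in ..." lines, i.e. files present in one directory but missing from the other) could be:
while IFS= read -r line; do
    case $line in
        "Only in "*)
            dir=${line#"Only in "}   # e.g. "d2: file11.txt"
            dir=${dir%%:*}           # -> "d2"
            file=${line##*: }        # -> "file11.txt"
            echo rm -- "$dir/$file"  # drop the echo once the output looks right
            ;;
    esac
done < <(diff -q d1 d2)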

find -exec doesn't recognize argument

I'm trying to count the total lines in the files within a directory. To do this I am trying to use a combination of find and wc. However, when I run find . -exec wc -l {}\;, I receive the error find: missing argument to -exec. I can't see any apparent issues, any ideas?
You simply need a space between {} and \;
find . -exec wc -l {} \;
Note that if there are any sub-directories below the current location, wc will generate an error message for each of them that looks something like this:
wc: ./subdir: Is a directory
To avoid that problem, you may want to tell find to restrict the search to files:
find . -type f -exec wc -l {} \;
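If you would also like a grand total, letting find pass many files to a single wc by ending -exec with + makes wc print a total line as well (though with a very large number of files it may still run wc in several batches, each with its own total):
find . -type f -exec wc -l {} +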
Another note: using the -exec option is a good idea. Too often, people pipe commands together thinking they will get the same result, for instance here it would be:
find . -type f | xargs wc -l
The problem with piping commands in such a manner is that it breaks as soon as a file name contains spaces. For instance, if a file were named "a b", wc would receive "a" and then "b" as separate arguments, and you would get two error messages: a: no such file and b: no such file.
Unless you know for a fact that your file names never have any spaces in them (or non-printable characters), if you do need to pipe commands together, you need to tell all the tools you are piping together to use the NULL character (\0) as a separator instead of a space. So the previous command would become:
find . -type f -print0 | xargs -0 wc -l
With version 4.0 or later of bash, you don't need your find command at all:
shopt -s globstar
wc -l **/*
There's no simple way to skip directories (which, as Gui Rava pointed out, you might want to do), unless you can differentiate files and directories by name alone. For example, maybe directories never have . in their name, while all the files have at least one extension:
wc -l **/*.*
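If you do need to skip directories while staying with globstar, one workaround (just a sketch, using a loop and a -f test to keep regular files only) is:
shopt -s globstar
files=()
for f in **/*; do
    [[ -f $f ]] && files+=("$f")   # keep regular files, skip directories
done
wc -l "${files[@]}"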

Find all directories containing a file that contains a keyword in linux

In my hierarchy of directories I have many text files called STATUS.txt. These text files each contain one keyword such as COMPLETE, WAITING, FUTURE or OPEN. I wish to execute a shell command of the following form:
./mycommand OPEN
which will list all the directories that contain a file called STATUS.txt, where this file contains the text "OPEN"
In the future I will want to extend this script so that the directories returned are sorted. Sorting will be determined by a numeric value stored in the file PRIORITY.txt, which lives in the same directory as STATUS.txt. However, this can wait until my competence level improves. For the time being I am happy to list the directories in any order.
I have searched Stack Overflow for the following, but to no avail:
unix filter by file contents
linux filter by file contents
shell traverse directory file contents
bash traverse directory file contents
shell traverse directory find
bash traverse directory find
linux file contents directory
unix file contents directory
linux find name contents
unix find name contents
shell read file show directory
bash read file show directory
bash directory search
shell directory search
I have tried the following shell commands:
This helps me identify all the directories that contain STATUS.txt
$ find ./ -name STATUS.txt
This reads STATUS.txt for every directory that contains it
$ find ./ -name STATUS.txt | xargs -I{} cat {}
This doesn't return any text, I was hoping it would return the name of each directory
$ find . -type d | while read d; do if [ -f STATUS.txt ]; then echo "${d}"; fi; done
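(For what it's worth, the last attempt prints nothing because [ -f STATUS.txt ] tests for a STATUS.txt in the current working directory, not inside $d. Testing "$d/STATUS.txt" would at least list the directories, though it still would not check their contents.)
find . -type d | while read -r d; do if [ -f "$d/STATUS.txt" ]; then echo "$d"; fi; done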
Or, going at it the other way around: find the files first and let grep filter them by content:
find . -name "STATUS.txt" -exec grep -lF "OPEN" \{} +
If you want to wrap that in a script, a good starting point might be:
#!/bin/sh
[ $# -ne 1 ] && echo "One argument required" >&2 && exit 2
find . -name "STATUS.txt" -exec grep -lF "$1" \{} +
As pointed out by @BroSlow, if you are looking for the directories containing the matching STATUS.txt files, this might be more what you are looking for:
fgrep --include='STATUS.txt' -rl 'OPEN' | xargs -L 1 dirname
Or better
fgrep --include='STATUS.txt' -rl 'OPEN' |
sed -e 's|^[^/]*$|./&|' -e 's|/[^/]*$||'
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# simulate `xargs -L 1 dirname` using `sed`
# (no trailing `\`; returns `.` for path without dir part)
Maybe you can try this:
grep -rl "OPEN" . --include='STATUS.txt'| sed 's/STATUS.txt//'
where grep -r means recursive, -l means only list the matching files, and '.' is the directory to search. You can then pipe it to sed to remove the file name.
You can then wrap this in a bash script file where you can pass in keywords such as 'OPEN', 'FUTURE' as an argument.
#!/bin/bash
grep -rl "$1" . --include='STATUS.txt'| sed 's/STATUS.txt//'
Try something like this
find -type f -name "STATUS.txt" -exec grep -q "OPEN" {} \; -exec dirname {} \;
or in a script
#!/bin/bash
(($#==1)) || { echo "Usage: $0 <pattern>" && exit 1; }
find -type f -name "STATUS.txt" -exec grep -q "$1" {} \; -exec dirname {} \;
You could use grep and awk instead of find:
grep -r OPEN * | awk '{split($1, path, ":"); print path[1]}' | xargs -I{} dirname {}
The above grep will list all files containing "OPEN" recursively inside your directory structure. The result will be something like:
dir_1/subdir_1/STATUS.txt:OPEN
dir_2/subdir_2/STATUS.txt:OPEN
dir_2/subdir_3/STATUS.txt:OPEN
The awk script will then split this output at the colon and print the first part of it (the file path).
dir_1/subdir_1/STATUS.txt
dir_2/subdir_2/STATUS.txt
dir_2/subdir_3/STATUS.txt
The dirname will then return only the directory path, not the file name, which I suppose is what you want.
I'd consider using Perl or Python if you want to evolve this further, though, as it might get messier if you want to add priorities and sorting.
Taking up the accepted answer: it does not output a sorted and unique directory list. At the end of the find command, add:
| sort -u
or:
| sort | uniq
to get the unique list of the directories.
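For example, applied to the find command from the accepted answer above, that would be:
find -type f -name "STATUS.txt" -exec grep -q "OPEN" {} \; -exec dirname {} \; | sort -u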
Credits go to Get unique list of all directories which contain a file whose name contains a string.
IMHO you should write a Python script which:
Examines your directory structure and finds all files named STATUS.txt.
For each found file:
reads the file and executes mycommand depending on what the file contains.
If you want to extend the script later with sorting, you can find all the interesting files first, save them to a list, sort the list and execute the commands on the sorted list.
Hint: http://pythonadventures.wordpress.com/2011/03/26/traversing-a-directory-recursively/

how do i script hunpos through text files?

my purpose is to parse several text files using the POS tagger HunPos http://code.google.com/p/hunpos/wiki/UserManual
Is there a way to bash script hunpos through a bunch of text files?
Typical mechanisms look like:
for f in glob; do command $f ; done
I often run commands like: for f in *; do echo -n "$f " ; cat $f ; done to see the contents of all the files in a directory. (Especially nice with /proc/sys/kernel/-style directories, where all the files have very short contents.)
or
find . -type f -exec command {} \;
or
find . -type f -print0 | xargs -0 command parameters
Something like find . -type f -exec file {} \; or find . -type f -print0 | xargs -0 file (only works if the command accepts multiple filenames during input).
Of course, if the program accepts multiple filename arguments (like cat or more or similar Unix shell tools) and all the files are in a single directory, you can very easily run: cat * (show contents of all files in the directory) or cat *.* (show contents of all files with a period in the filename).
If you frequently want "all files in all [sub]*directories", the zsh **/ option can be handy: ls -l **/*.c would show you foo/bar/baz.c and /blort/bleet/boop.c at once. Neat tool, but I usually don't mind writing the find command equivalent, I just don't need it that often. (And zsh isn't installed everywhere, so relying on its features could be frustrating in the future.)
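Applying the for-loop pattern from above to the question itself, a rough sketch might look like this (the hunpos-tag invocation, the model file name, and the assumption that it reads tokenized text on stdin are all guesses; check the HunPos user manual for the exact interface):
for f in *.txt; do
    # hypothetical: hunpos-tag <model> reads tokens on stdin, writes tagged output on stdout
    hunpos-tag english.model < "$f" > "${f%.txt}.tagged"
done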
