How to print the output of find command with tab delimited at beginning - linux

I have tried the command below to print the output of a find command with a tab at the beginning of each line.
echo -e "\t"; find /usr/live/class/$client_abbr -name "$line.cls" -exec grep '^#include' {} \;
If the output contains n lines, only the first line is printed with a tab; it is not applied to the rest of the lines. How can I modify the above command so that every line starts with a tab?

You will likely find piping to xargs more efficient than using -exec. The extra quotes, -type f and -print0 are respectively for safety, for specifying that you need a file (not a directory) and for enabling file names with embedded white space. With the grep output piped to sed (attribution to Fischer's comment), you get what you need.
find "/usr/live/class/$client_abbr" -type f -name "$line.cls" -print0 |
xargs -0 grep '^#include' |
sed 's/^/\t/'
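The \t escape in the sed replacement is understood by GNU sed but is not guaranteed elsewhere. A minimal sketch of a more portable variant follows; the function name indent_with_tab is made up for illustration, and the tab is produced by printf instead of by sed:

```shell
# Prefix every line read from standard input with one tab character.
# The literal tab comes from printf, so the sed script does not depend
# on sed understanding the \t escape.
indent_with_tab() {
    sed "s/^/$(printf '\t')/"
}

# Example: two lines in, two tab-prefixed lines out.
printf '#include <a.h>\n#include <b.h>\n' | indent_with_tab
```

In the pipeline above, the final sed stage could be swapped for this function.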

Related

Format xargs output to grep

I have a script that I'm trying to optimize with xargs. The current version uses find with -exec to call the command:
find -type f -iname "*.mp4" -print0 -printf '\n' -exec getfattr -d --absolute-names {} \;
after which I can pipe to grep with something like:
grep -z -P user\.md5\=\"$input_search_hash\"
to filter the results while keeping the whole output with -z.
I need the whole output returned from getfattr to be "preserved", per file, because I need the filename for which there is a matching extended attribute, which is then passed to sed to extract it. There are also cases where I have multiple grep commands in sequence if I need to search for files with multiple matches in the extended attributes. The problem is that the output of:
find -type f -iname "*.mp4" -print0 | xargs -0 getfattr -d --absolute-names
is not formatted in such a way that grep will filter in this way. This does work with the -exec method. Can I pass an additional option to xargs, or pipe in some additional command that will format the output so that grep properly replicates the behaviour of -exec? I'm guessing I need some sort of line break before feeding to grep, like what -printf '\n' does in the -exec method. I would just use getfattr to "search" the extended attributes instead of needing to grep the output at all, but it has no way to do this by supplying an xattr name and value.
Example
The input comes from the find command, which is a list of video files in an arbitrary directory structure. The output of each getfattr command, for each file is such:
# file: /path/to/file/test.mp4
user.md5="0e29a7f555af518872771689e28d998d"
user.quality="10"
user.sha256="d49ba58e3b30f4ef8c81d19ce960edcf6552977bb8adb79b5b9a677ba9a54b2b"
user.size="1645645"
If I attempt to grep the output of find using the + method, say for a value of "10" on the quality, I will get results like this:
# file: /path/to/file/test.mp4
user.md5="8cf97b888e6fdbed27b02233cd6779f5"
user.quality="12"
user.sha256="613d16b2a0270e2e5f81cfd58b1eacf710a65b82ce2dab49a1e415275440f429"
user.size="1645645"
# file: /path/to/file/test1.mp4
user.md5="3c5a39f1ceefce1e124bcd6786a99155"
user.quality="10"
user.sha256="0d7128a7642d24ea879bbfb3de812b7939b618d8af639f07d5104c954c8049c3"
user.size="5674567"
# file: /path/to/file/test2.mp4
user.md5="0e29a7f555af518872771689e28d998d"
user.quality="6"
user.sha256="d49ba58e3b30f4ef8c81d19ce960edcf6552977bb8adb79b5b9a677ba9a54b2b"
user.size="15645"
All files that find locates are returned and the string to be searched from grep, in this example user.quality="10", is highlighted, but the other files test.mp4 and test2.mp4 still have the output printed post-grep. In other words, find may locate 1000 mp4 files of which maybe 20 have a user.quality="10" entry, but even applying grep to search for that string still returns 1000 filenames (after sed).
This does not happen when using \;. The only thing I would get out from grep would be:
# file: /path/to/file/test1.mp4
user.md5="3c5a39f1ceefce1e124bcd6786a99155"
user.quality="10"
user.sha256="0d7128a7642d24ea879bbfb3de812b7939b618d8af639f07d5104c954c8049c3"
user.size="5674567"
This is the expected behaviour.
xargs vs find -exec
To me it seems like you want to use xargs instead of find -exec {} \; to speed things up.
Yes, xargs is faster than find -exec {} \;, not because it does the same work more efficiently, but because it does different work!
find -exec {} \; calls once for each file (getfattr file1, then getfattr file2, and so on).
xargs crams as many files into one call as possible (getfattr file1 file2 file3 ...).
The same behavior (and even more speedup) can be achieved with find -exec {} + -- no need to use xargs for that.
With xargs and find -exec {} + you lose control over the output format. There is only one call of getfattr, so that program decides what to print between file1, file2 and so on. getfattr has no option to customize its output format.
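The difference between the two terminators can be made visible with a small experiment. This is only a sketch: it uses a throwaway directory from mktemp and lets sh report how many file arguments each invocation received, in place of getfattr.

```shell
# Throwaway directory with three dummy files (names are illustrative).
demo_dir=$(mktemp -d)
touch "$demo_dir/a.mp4" "$demo_dir/b.mp4" "$demo_dir/c.mp4"

# \; runs the command once per file, so each call sees one argument.
per_file=$(find "$demo_dir" -type f -exec sh -c 'echo "$#"' sh {} \;)

# + batches the files, so a single call sees all three arguments.
batched=$(find "$demo_dir" -type f -exec sh -c 'echo "$#"' sh {} +)

printf 'per file:\n%s\nbatched:\n%s\n' "$per_file" "$batched"
rm -r "$demo_dir"
```

The per-file variant reports one argument three times, while the batched variant reports three arguments once.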
No problem! You can ...
Parse getfattr's output
... pretty easily.
For starters, we assume that all path names are pretty normal. Spaces, *, and ? are ok though. For really unusual path names containing backslashes and linebreaks see the last section.
If you output only the relevant attribute using -n user.md5 instead of -d, then you know that the output (if any) for each file is always of the form
# file: path in a single line
user.md5=encoded value of the attribute
Files without the attribute user.md5 are not printed at all. They cause a warning on stderr which can be suppressed by 2> /dev/null.
Now, grep for matching attributes. Use grep -B1 to print the line above each match (i.e. the path) too. Then use sed -n or grep -o to extract the filenames.
find -type f -iname '*.mp4' -exec getfattr -n user.md5 --absolute-names {} + 2> /dev/null |
grep -B1 -Fx "user.md5=\"$input_search_hash\"" |
sed -n 's/^# file: //p'
The above command prints the paths of all mp4 files having the attribute user.md5 with value $input_search_hash.
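To try the filtering stage without any files carrying extended attributes, the grep and sed part can be fed canned text. The sketch below assumes the two-line-per-file layout described above; the helper name paths_with_attr and the sample paths and hashes are invented for the example.

```shell
# Read getfattr -n style output on stdin and print the paths whose
# attribute line is exactly "$1" (for example: user.md5="aaa").
paths_with_attr() {
    grep -B1 -Fx -- "$1" | sed -n 's/^# file: //p'
}

# Hypothetical sample in the layout getfattr prints for one attribute.
sample='# file: /videos/one.mp4
user.md5="aaa"

# file: /videos/two.mp4
user.md5="bbb"

# file: /videos/three.mp4
user.md5="aaa"
'

printf '%s' "$sample" | paths_with_attr 'user.md5="aaa"'
```

grep -B1 inserts a -- separator between groups of matches, which sed -n silently drops because only the "# file:" lines are printed.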
Handling Unusual Filenames
At least my version (getfattr 2.4.48 by Andreas Gruenbacher) on Debian 10 always prints the file name in a single line. Linebreaks are encoded using \012 and backslashes are encoded using \134. Therefore, safe processing of those files is possible.
The above command works, but prints only the encoded file names. To get the actual filenames you have to extend the sed command or add another command to interpret octal escape sequences. For me, getfattr only escapes \n, \r and \\, thus sed 's:\\012:\n:g;s:\\015:\r:g;s:\\134:\\:g' should be sufficient for printing. For further processing, you may want to use tr \\n \\0 | sed -z ... instead, so that filenames are separated by null bytes.
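The decoding step can be checked on its own with hand-written input. This sketch wraps the sed expression from above in a function whose name, decode_getfattr_path, is made up here; it assumes GNU sed, since \n and \r in the replacement are not portable, and it only covers the three escapes mentioned above.

```shell
# Turn the octal escapes \012, \015 and \134 back into newline,
# carriage return and backslash. Assumes GNU sed. \134 is decoded last
# so that a decoded backslash is never mistaken for a new escape.
decode_getfattr_path() {
    sed 's:\\012:\n:g;s:\\015:\r:g;s:\\134:\\:g'
}

# An encoded name containing a backslash, then one containing a newline.
printf '%s\n' 'dir/a\134b.mp4' | decode_getfattr_path
printf '%s\n' 'dir/a\012b.mp4' | decode_getfattr_path
```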
To test which characters are escaped for you, create a filename containing all allowed bytes and let getfattr print its name:
f=$(printf $(printf '\\%o' $(seq 1 255)) | tr -d /)
touch "$f"
setfattr -n user.md5 -v 123 "$f"
getfattr -n user.md5 "$f"
rm "$f"

using grep in single-line files to find the number of occurrences of a word/pattern

I have json files in the current directory, and subdirectories. All the files have a single line of content.
I want a list of all files that contain the word XYZ, and the number of times it occurs in each file.
I want to print the list according to the following format:
file_name pattern_occurence_times
It should look something like:
.\x1\x2\file1.json 3
.\x1\file3.json 2
The problem is that grep counts the NUMBER of lines containing XYZ, not the number of occurrences.
Since the whole content of the files is always contained in a single line, the count is always 1 (if the pattern occurs in the file).
I used this command for that:
find . -type f -name "*.json" -exec grep --files-with-match -i 'xyz' {} \; -exec grep -wci 'xyz' {} \;
I wrote a python code, and it works, but I would like to know if there is any way of doing that using find and grep or any other command line tools.
Thanks
The classical approach to this problem is the pipeline grep -o regex file | wc -l. However, to execute a pipeline in find's -exec you have to run a shell (e.g. sh -c ... ). But all these things together will only print the number of matches, not the file names. Also, files with no matches have to be filtered out.
Because of all of this I think a single awk command would be preferable:
find ... -type f -exec awk '{$0=tolower($0); c+=gsub(/xyz/,"")}
END {if(c>0) print FILENAME " " c}' {} \;
Here the tolower($0) emulates grep's -i option. Make sure to write your search pattern xyz only in lowercase.
If you want to combine this with subsequent filters in find you can add else exit 1 at the end of the last awk block to continue (inside find) only with the printed files.
Use the -o option of grep, e.g. in conjunction with wc, e.g.
find . -name "*.json" | while IFS= read -r f ; do
    # quote "$f" so names with spaces or glob characters survive
    echo "$f : $(grep -ow XYZ "$f" | wc -l)"
done
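The grep -o | wc -l idea can also be run inside find by starting a shell, as mentioned earlier. The following is a sketch under assumptions: the directory and file contents are invented test data, the search is case-insensitive, and files without a match are skipped.

```shell
# Throwaway test data: one file with three matches, one with none.
demo=$(mktemp -d)
printf '{"a":"XYZ","b":"xyz","c":"XYZ"}\n' > "$demo/file1.json"
printf '{"a":"none"}\n' > "$demo/file2.json"

# For every file, count the individual matches (not the matching
# lines) and print "path count" only when the count is positive.
counts=$(find "$demo" -type f -name '*.json' -exec sh -c '
    for f do
        n=$(grep -oi "xyz" "$f" | wc -l)
        n=$((n))    # strip padding that some wc versions add
        if [ "$n" -gt 0 ]; then printf "%s %s\n" "$f" "$n"; fi
    done' sh {} +)

printf '%s\n' "$counts"
rm -r "$demo"
```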

Grep regular files in a linux File System and show their content

How do I display the content of regular files matched with the grep command? For example, I grep a directory listing in order to see the regular files it has. I used the following line to see the regular files only:
ls -lR | grep ^-
Then I would like to display the content of the files found there. How do I do it?
I would do something like:
$ cat `ls -lR | egrep "^-" | rev | cut -d ' ' -f 1 | rev`
Use ls to find the files
grep finds your pattern
reverse the whole result
cut out the first space-separated field to get the file name (file names containing spaces are problematic)
reverse the file name back to normal direction
Backticks will execute that and return the list of file names to cat.
or the way I would probably do it is use vim to look at each file.
$ vim `ls -lR | egrep "^-" | rev | cut -d ' ' -f 1 | rev`
It feels like you are trying to find only the files recursively. This is what I do in those cases:
$ vim `find . -type f -print`
There are multiple ways of doing it. I will try to give you a few easy and clean ways here. The first two handle filenames with spaces; the backtick variants further down do not.
$ find . -type f -print0 | xargs -0 cat
-print0 terminates each file name with a null character ('\0'), and xargs -0 tells xargs to split its input on that null delimiter. If you don't do that, whitespace in the filenames creates problems.
e.g. without -print0 the filenames abc 123.txt and 1.inc would be read as three separate files: abc, 123.txt and 1.inc.
With -print0 this becomes abc 123.txt'\0' and 1.inc'\0', which is read as the two files abc 123.txt and 1.inc.
As for xargs, it reads items from its standard input and passes them as arguments to a command. command1 | xargs command2 means command2 is run with the output of command1 as its arguments.
cat displays the content of the file.
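The splitting problem can be observed directly by letting a shell report how many arguments xargs hands over. This is a sketch with invented file names; it assumes the temporary directory path itself contains no whitespace or quotes.

```shell
# Two files, one of which has a space in its name.
demo=$(mktemp -d)
printf 'one\n' > "$demo/abc 123.txt"
printf 'two\n' > "$demo/1.inc"

# Newline-separated names: xargs splits on the space as well,
# so the command receives three arguments.
split=$(find "$demo" -type f | xargs sh -c 'echo "$#"' sh)

# Null-separated names: each file arrives as exactly one argument.
intact=$(find "$demo" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

printf 'without -print0: %s arguments\nwith -print0: %s arguments\n' "$split" "$intact"
rm -r "$demo"
```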
$ find . -type f -exec echo {} \; -exec cat {} \;
This is just using the find command. It finds all the files (type f), calls echo to output the filename, then calls cat to display its content.
If you don't want the filename, omit -exec echo {} \;
Alternatively you can use cat command and pass the output of find.
$ cat `find . -type f -print`
If you want to scroll through the content of multiple files one by one, you can use:
$ less `find . -type f -print`
When using less, you can navigate with :n and :p to the next and previous file respectively. Press q to quit less.

How can I search for files in directories that contain spaces in names, using "find"?

How can I search for files in directories that contain spaces in names, using find?
I use this script:
#!/bin/bash
for i in `find "/tmp/1/" -iname "*.txt" | sed 's/[0-9A-Za-z]*\.txt//g'`
do
for j in `ls "$i" | grep sh | sed 's/\.txt//g'`
do
find "/tmp/2/" -iname "$j.sh" -exec cp {} "$i" \;
done
done
But files and directories that contain spaces in their names are not processed.
This will grab all the files that have spaces in them
$ls
more space nospace stillnospace this is space
$find -type f -name "* *"
./this is space
./more space
I don't know how to achieve your goal. But given your actual solution, the problem is not really with find but with the for loops, since spaces are taken as delimiters between items.
find has a useful option for those cases:
from man find:
-print0
True; print the full file name on the standard output, followed by a null character
(instead of the newline character that -print uses). This allows file names
that contain newlines or other types of white space to be correctly interpreted
by programs that process the find output. This option corresponds to the -0
option of xargs.
As the man page says, this matches the -0 option of xargs. Several other standard tools have an equivalent option. You will probably have to rewrite your complex pipeline around those tools in order to cleanly process file names containing spaces.
In addition, see bash "for in" looping on null delimited string variable to learn how to use for loop with 0-terminated arguments.
Do it like this
find . -type f -name "* *"
Instead of . you can specify your path, where you want to find files with your criteria
Your first for loop is:
for i in `find "/tmp/1" -iname "*.txt" | sed 's/[0-9A-Za-z]*\.txt//g'`
If I understand it correctly, it is looking for all text files in the /tmp/1 directory, and then attempting to remove the file name with the sed command right? This would cause a single directory with multiple .txt files to be processed by the inner for loop more than once. Is that what you want?
Instead of using sed to get rid of the filename, you can use dirname instead. Also, later on, you use sed to get rid of the extension. You can use basename for that.
for i in `find "/tmp/1" -iname "*.txt"` ; do
    path=$(dirname "$i")
    for j in `ls "$path" | grep POD` ; do
        file=$(basename "$j" .txt)
        # Do whatever you want with the file
    done
done
This doesn't solve the problem of having a single directory processed multiple times, but if it is an issue for you, you can use the for loop above to store the file name in an array instead and then remove duplicates with sort and uniq.
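One possible way to visit each directory only once is to collect the directory names first and de-duplicate them with sort -u. This is a sketch with an invented directory layout; like the loop above, it is line-based and therefore not safe for names containing newlines.

```shell
# Throwaway layout: directory a holds two .txt files, b holds one.
demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b"
touch "$demo/a/1.txt" "$demo/a/2.txt" "$demo/b/3.txt"

# dirname is run for every match; sort -u keeps each directory once.
dirs=$(find "$demo" -iname '*.txt' -exec dirname {} \; | sort -u)

printf '%s\n' "$dirs"
rm -r "$demo"
```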
Use while read loop with null-delimited pathname output from find:
#!/bin/bash
while IFS= read -rd '' i; do
while IFS= read -rd '' j; do
find "/tmp/2/" -iname "$j.sh" -exec echo cp '{}' "$i" \;
done < <(exec find "$i" -maxdepth 1 -mindepth 1 -name '*POD*' -not -name '*.txt' -printf '%f\0')
done < <(exec find /tmp/1 -iname '*.txt' -not -iname '[0-9A-Za-z]*.txt' -print0)
Never use for i in $(find...) or similar, as it will fail for file names containing white space, as you saw.
Use find ... | while IFS= read -r i instead.
It's hard to say without sample input and expected output but something like this might be what you need:
find "/tmp/1/" -iname "*.txt" |
while IFS= read -r i
do
    i="${i%/*}"                 # directory that contains the .txt file
    for j in "$i"/*sh*
    do
        j="${j##*/}"            # keep only the file name
        j="${j%.txt}"           # drop the .txt extension
        find "/tmp/2/" -iname "$j.sh" -exec cp {} "$i" \;
    done
done
The above will still fail for file names that contain newlines. If you have that situation and can't fix the file names then look into the -print0 option for find, and piping it to xargs -0.
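Another newline-safe option, not used in the answers here, is to let find hand the paths straight to an inline shell with -exec sh -c '...' sh {} +, so the names never pass through a line-based pipe. The sketch below uses invented file names and only counts the paths to show the difference.

```shell
# Three files, one of which has a newline in its name.
demo=$(mktemp -d)
touch "$demo/plain.txt" "$demo/with space.txt" "$demo/with
newline.txt"

# Each path arrives as one positional parameter, whatever it contains.
seen=$(find "$demo" -type f -name '*.txt' -exec sh -c '
    n=0
    for f do
        n=$((n + 1))
    done
    echo "$n"' sh {} +)

# A line-based reader sees one line too many for the same files.
line_count=$(( $(find "$demo" -type f -name '*.txt' | wc -l) ))

printf 'files seen by the inline shell: %s\nlines seen by a pipe: %s\n' "$seen" "$line_count"
rm -r "$demo"
```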

find and copy all images in directory using terminal linux mint, trying to understand syntax

OS Linux Mint
As the title says, I would like to find and copy all images in a directory.
I found:
find all jpg (or JPG) files in a directory and copy them into the folder /home/joachim/neu2:
find . -iname \*.jpg -print0 | xargs -I{} -0 cp -v {} /home/joachim/neu2
and
find all image files in a directory:
find . -name '*' -exec file {} \; | grep -o -P '^.+: \w+ image'
My problem is, first of all, that I don't really understand the syntax. Could someone explain the code?
And secondly, can someone combine the two commands into one that does what I want? ;)
Greetings and thanks in advance!
First, understand that the pipe "|" links commands by feeding the standard output of the first into the standard input of the second. Your two command lines both pipe the output of the find command into other commands (xargs and grep). Let's look at those commands one after another:
First command: find
find is a program to "search for files in a directory hierarchy" (that is the explanation from find's man page). The syntax is (in this case)
find <search directory> <search pattern> <action>
In both cases the search directory is . (that is the current directory). Note that it does not just search the current directory but all its subdirectories as well (the directory hierarchy).
The search pattern accepts options -name (meaning it searches for files the name of which matches the pattern given as an argument to this option) or -iname (same as name but case insensitive) among others.
The action may be -print0 (print the path of each matching file, starting from the given search directory, followed by a null character instead of a newline) or -exec (execute the given command on the file(s); the command is to be ended with ";" and every instance of "{}" is replaced by the filename).
That is, the first shell code (first part, left of the pipe)
find . -iname \*.jpg -print0
searches for all files ending in ".jpg" (in any letter case) in the current directory hierarchy and prints their paths. The second one (first part)
find . -name '*' -exec file {} \;
finds all files in the current directory hierarchy and executes
file <filename>
on them. file is another command that determines and prints the file type (have a look at the man page for details, man file).
Second command: xargs
xargs is a command that "builds and executes command lines from standard input" (man xargs), i.e. from the find output that is piped into xargs. The command that it builds and executes is in this case
cp -v {} /home/joachim/neu2
Option -I{} defines the replacement string, i.e. every instance of {} in the command is replaced by an input item (here, a filename coming from find). Option -0 specifies that input items are not terminated (separated) by whitespace or newlines but only by a null character. It is needed here because find was run with -print0, and the pair is the standard way to pass find output to xargs.
The command that is built and executed is then of course the copy command with option -v (verbose) and it copies each of the filenames it gets from find to the directory.
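To see which command lines xargs builds without copying anything, the cp can be prefixed with echo. This is a sketch with made-up, null-terminated file names and a placeholder destination /dest; it assumes a printf that emits a NUL byte for \0, which common shells provide.

```shell
# Two null-terminated names, one containing a space. With -I{} xargs
# runs the command once per item; echo only prints what would be run.
preview=$(printf 'a.jpg\0b c.jpg\0' | xargs -0 -I{} echo cp -v {} /dest)

printf '%s\n' "$preview"
```

Each input item, including the one with a space, ends up in its own cp command line.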
Third command: grep
grep filters its input, giving only those lines or strings that match a given pattern. Option -o tells grep to print only the matching string, not the entire line (see man grep), and -P tells it to interpret the following pattern as a Perl regular expression. In Perl regex, ^ is the start of the line and .+ is any non-empty string; this string should then be followed by a colon, a space, one or more word characters (in Perl regex denoted \w+), a space and the string "image". Essentially this grep command filters the file output so that only the entries for image files remain. (Read about Perl regexes for instance here: http://www.comp.leeds.ac.uk/Perl/matching.html )
The command you actually wanted
Now what you want to do is (1) take the output of the second shell command (which lists the image files), (2) bring it into the appropriate form and (3) pipe it into the xargs command from the first shell command line (which then builds and executes the copy command you wanted). So this time we have a three (actually four) stage shell command with two pipes. Not a problem. We already have stages (1) and (3) (though in stage (3) we need to leave out the -0 option because the input is not find output any more; we need it to treat newlines as item separators).
Stage (2) is still missing. I suggest using the cut command for this. cut splits each line into fields (separated by a delimiter character in the original string) and prints only the selected ones. I will choose ":" as the delimiter character (this ends the filename in the grep output, option -d':') and tell it to give us just the first field (option -f1, essentially: print only the filename, not the part that comes after the ":"), i.e. stage (2) would then be
cut -d':' -f1
And the entire command you wanted will then be:
find . -name '*' -exec file {} \; | grep -o -P '^.+: \w+ image' | cut -d':' -f1 | xargs -I{} cp -v {} /home/joachim/neu2
Note that you can find all the man pages for instance here: http://www.linuxmanpages.com
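The filtering stages (grep and cut) can be tried on canned text before running them over a real directory. In this sketch the helper name image_paths and the three sample lines are invented; the lines imitate the "name: description" layout that file prints, and grep -P requires a grep built with PCRE support.

```shell
# Keep only entries that file reported as images, then cut the path
# off at the first colon.
image_paths() {
    grep -o -P '^.+: \w+ image' | cut -d':' -f1
}

# Hypothetical output of: find . -name '*' -exec file {} \;
printf '%s\n' \
    './a.jpg: JPEG image data, JFIF standard' \
    './notes.txt: ASCII text' \
    './b.png: PNG image data, 16 x 16' |
    image_paths
```

Because cut splits at the first colon, a path that itself contains a colon would be truncated.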
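I figured out a command using only find and awk that does the job as well:
find . -name '*' -exec file {} \; |
awk '{
    if ($3 == "image") {
        # $1 is the file name followed by a colon; strip the colon
        f = substr($1, 1, length($1) - 1)
        print f
        system("cp \"" f "\" /home/joachim/neu2")
    }
}'
The substr($1, 1, length($1)-1) call is needed because file prints the name with a trailing colon in the first column. awk strings are indexed from 1, and this only works for file names without whitespace, since awk splits fields on it.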
The above answer is really good, but it could take a long time on a huge directory.
Here is a shorter version, if you already know your file extension:
find . -name \*.jpg | cut -d':' -f1 | xargs -I{} cp --parents -v {} ~/testimage/
Here's another one which works like a charm.
It adds the EPOCH time to prevent overwriting files with the same name.
cd /media/myhome/'Local station'/
find . -path ./jpg -prune -o -type f -iname '*.jpg' -exec sh -c '
for file do
newname="${file##*/}"
newname="${newname%.jpg}"
mv -T -- "$file" "/media/myhome/Local station/jpg/$newname-$(date +%s).jpg"
done
' find-sh {} +
cd ~/
It's been designed by Kamil in this post here.
Find a specific type file from a directory:
find /home/user/find/data/ -name '*' -exec file {} \; | grep -o -P '^.+: \w+ image'
Copy specific type of file from one directory to another directory:
find /home/user/find/data/ -name '*' -exec file {} \; | grep -o -P '^.+: \w+ image' | cut -d':' -f1 | xargs -I{} cp -v {} /home/user/copy/data/
