linux grep match a subset of files from a previous match


I need to pipe this result:
grep -R "extends Some_Critical_Class" *
to another grep:
grep "function init("
ie. "files that extend Some_Critical_Class that also have function init()"
If there is a way to do it in one operation in grep, that would be great, but I'd also like to see how the piping is done, to improve my programming in *nix (which is rudimentary right now). Thanks.

To be clear, you want the list of files that contain both strings. Not only do you need two greps for this, you also need the -l (a.k.a. --files-with-matches) option.
Here is one way of doing this:
grep -F -R -l -Z "extends Some_Critical_Class" . \
| xargs -0 grep -F -l "function init("
We first obtain a (NUL-delimited) list of files that contain your first string, and then we use xargs to pass this list of files to the second grep.
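The same "contains both strings" test can also be sketched without a pipeline, by chaining two -exec grep calls inside one find. This is only an illustration under stated assumptions: the function name files_with_both and its argument order are made up here, and it runs grep once or twice per file, so it is likely slower than the xargs version on large trees. It uses only POSIX options of find and grep.

```shell
# files_with_both DIR STR1 STR2  (hypothetical helper, not from the answer above)
# Print every regular file under DIR that contains both fixed strings.
files_with_both() {
    # The first -exec acts as a test: grep -q exits 0 only on a match,
    # so the second grep runs only for files that already contain STR1.
    # grep -l then prints the file name if STR2 is present as well.
    find "$1" -type f \
        -exec grep -qF -e "$2" {} \; \
        -exec grep -lF -e "$3" {} \;
}
```

It could be called as: files_with_both . "extends Some_Critical_Class" "function init("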

Don't use grep (g/re/p) to find files. Adding that functionality to GNU grep was just a bad idea, since there's already a perfectly good tool for finding files, with an extremely obvious name.
You didn't say what your expected output was but maybe this does what you want:
find . -type f -exec \
awk '
/extends Some_Critical_Class/ { x=1 }
/function init\(/ { y=1 }
END { if (x && y) print FILENAME }
' {} \;
The above will work on any Unix box, not just one with GNU tools, and can be trivially modified to add more regexps or strings to search for, various "and" and "or" combinations, etc.

Related

Using cut in Linux Mint Terminal more precisely

In the directory /usr/lib on Linux Mint there are files, among other things, that goes by the name of xxx.so.d where xxx is their name, and d being a number. The assignment is to find all files with .so file ending and write out their name, xxx. The code I got so far is
ls | grep "\.so\." | cut -d "." -f 1
The problem now is that cut cuts off some filenames short. As an example, there is a file called libgimp-2.0.so.0, where the wanted output would be libgimp-2.0, since that part is in front of .so
Is there any way to make cut cut at ".so" instead of the first "."?

The answer given by pacholik can give you wrong files (ie: 'xyz.socket' will appear on your list). To correct his script:
for i in *.so.*; do echo "${i%%.so*}"; done
Another way to do this (easier to read in my opinion) is to use a little Perl:
ls | grep "\.so\." | perl -n0e "print ((split(/\.so/))[0], \"\n\")"
Sorry, I don't think there is a way to use only "cut" as you asked.

for i in *.so*; do echo "${i%.so*}"; done
just a bash parameter substitution
http://www.tldp.org/LDP/abs/html/parameter-substitution.html
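As a small sketch of how the two suffix-removal forms differ: % strips the shortest suffix matching the pattern, %% strips the longest. The helper name lib_stem and the sample file names are made up for illustration; the pattern .so.* (with a trailing dot) is an assumption chosen so that a name like xyz.socket is left alone.

```shell
# lib_stem NAME  (hypothetical helper for illustration)
# Print NAME with everything from the first ".so." onward removed.
lib_stem() {
    # %% removes the longest suffix matching the pattern ".so.*";
    # a single % would remove the shortest matching suffix instead.
    printf '%s\n' "${1%%.so.*}"
}

# Only names that really contain ".so." are processed, so xyz.socket is skipped.
for name in libgimp-2.0.so.0 libfoo.so.1.2.3 xyz.socket; do
    case $name in
        *.so.*) lib_stem "$name" ;;
    esac
done
```

For the two library names in the loop this prints libgimp-2.0 and libfoo.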

Just use sed instead:
ls | grep -v "\.socket" | grep "\.so" | sed "s/\.so.*//"
This deletes everything from the first .so onward in each file name, so files named xxx.so.so would work too. The dots are escaped so that they match a literal "." rather than any character.

Depending on the size of the directory, using find is probably the best option. As a starting point, give this a try:
find . -iname "*.so.*" -exec basename {} \; | cut -d "." -f 1
Besides cut there are many other tools, such as sed and awk, that can in some cases achieve the same result faster.

listing file in unix and saving the output in a variable(Oldest File fetching for a particular extension)

This might be a very simple thing for a shell scripting programmer, but I am pretty new to it. I was trying to execute the command below in a shell script and save the output into a variable:
inputfile=$(ls -ltr *.{PDF,pdf} | head -1 | awk '{print $9}')
The command works fine when I run it from the terminal but fails when executed through a shell script (sh). Why does the command fail? Does it mean that the shell script doesn't support the command, or am I doing it wrong? Also, how do I know whether a command will work in a shell script or not?
Just to give you a glimpse of my requirement, I was trying to get the oldest file from a particular directory (I also want to make sure upper case and lower case extensions are handled). Is there any other way to do this?

The above command will work correctly only if BOTH *.pdf and *.PDF files are present in the current directory.
If you would like to run it in a directory that has only one of those, consider using e.g.:
inputfiles=$(find . -maxdepth 1 -type f \( -name "*.pdf" -or -name "*.PDF" \) | xargs ls -1tr | head -1 )
NOTE: The above command doesn't work with file names containing newlines, or with a very long list of found files.

Parsing ls is always a bad idea. You need another strategy.
How about making a function that gives you the oldest file among the ones given as arguments? The following works in Bash (adapt to your needs):
get_oldest_file() {
# get oldest file among files given as parameters
# return is in variable get_oldest_file_ret
local oldest f
for f do
[[ -e $f ]] && [[ ! $oldest || $f -ot $oldest ]] && oldest=$f
done
get_oldest_file_ret=$oldest
}
Then just call as:
get_oldest_file *.{PDF,pdf}
echo "oldest file is: $get_oldest_file_ret"
Now, you probably don't want to use brace expansions like this at all. In fact, you very likely want to use the shell options nocaseglob and nullglob:
shopt -s nocaseglob nullglob
get_oldest_file *.pdf
echo "oldest file is: $get_oldest_file_ret"
If you're using a POSIX shell, it's going to be a bit trickier to have the equivalent of nullglob and nocaseglob.
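One possible POSIX sh sketch, under stated assumptions: the function name oldest_pdf is made up here, only the two spellings .pdf and .PDF are handled (mixed case such as .Pdf is not), and the age comparison is delegated to find's -newer test instead of Bash's -ot. POSIX sh has no local, so the variables oldest and f are global.

```shell
# oldest_pdf  (hypothetical POSIX sh sketch; handles only .pdf and .PDF)
# Print the oldest matching file in the current directory, or nothing.
oldest_pdf() {
    oldest=
    for f in ./*.pdf ./*.PDF; do
        # Without nullglob an unmatched pattern stays literal, so skip it.
        [ -e "$f" ] || continue
        # find prints "$f" only if it is not newer than the current oldest.
        if [ -z "$oldest" ] ||
           [ -n "$(find "$f" -prune ! -newer "$oldest")" ]; then
            oldest=$f
        fi
    done
    if [ -n "$oldest" ]; then
        printf '%s\n' "$oldest"
    fi
}
```

This runs one find per candidate file, which is fine for a handful of files but slow for thousands.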

Is perl an option? It's ubiquitous on Unix.
I would suggest:
perl -e 'print ((sort { -M $b <=> -M $a } glob ( "*.{pdf,PDF}" ))[0]);';
Which:
uses glob to fetch all files matching the pattern.
sort, using -M which is relative modification time. (in days).
fetches the first element ([0]) off the sort.
Prints that.

As @gniourf_gniourf says, parsing ls is a bad idea, as are leaving globs unquoted and generally not accounting for funny characters in file names.
find is your friend:
#!/bin/sh
get_oldest_pdf() {
#
# echo path of oldest *.pdf (case-insensitive) file in current directory
#
find . -maxdepth 1 -mindepth 1 -iname "*.pdf" -printf '%T@ %p\n' \
| sort -n \
| head -n 1 \
| cut -d' ' -f2-
}
whatever=$(get_oldest_pdf)
Notes:
find has numerous ways of formatting the output, including
things like access time and/or write time. I used '%T@ %p\n',
where %T@ is the last write time in UNIX time format, including the fractional part.
This will never contain a space, so it's safe to use a space as the separator.
Numeric sort and head get the first item, i.e. the one with the smallest (oldest) timestamp;
cut removes the time from the output.
I used pipe notation split across lines with the help of \, which is IMO much easier to read/maintain.
The shell code itself should run on any POSIX shell, but -printf is a GNU find extension.
You could easily adjust the function to parametrize the pattern,
time used (access/write), control the search depth or starting dir.

Easy replace with/without regex in multiple files

A hundred times a day I need to search for patterns in files, and sometimes I have to replace these patterns with something else. Most of the time they are simple patterns like a word or a short sentence, but sometimes I have to look for a more complex regexp. I don't really like sed (at least the sed version I have, because it is not very compatible with the PCRE engine), so I prefer using perl -pi -e.
However, Perl pie is not very attractive on Cygwin because of the mandatory -i.bak temp files. I need to find a way to automatically remove the .bak files after processing. Moreover, if I want to replace recursively in a project I have to list all the files first:
find . | xargs -n1 perl -pi -e 's/foo/bar/'
This command is quite long to type, especially if you use it a thousand times a month. So I decided to write a more useful tool that works the same way as the great silver searcher, ag.
ag 'foo\d{3}[^\w]' # Search for a pattern
# Oh yes this one should be renamed!
replace 's/(foo)\d{3}[^\w]/\U$1\E_bar/g'
I wrote this very primitive bash function
function replace
{
EXTENSION=.perlpie_tmp
perl -p -i$EXTENSION -e $1 ${*:2}
for file in ${*:2}; do
rm "$file$EXTENSION";
done;
}
But I am not satisfied at all, because it doesn't automatically search all files recursively when only one argument is given. I could either modify this function and add find . if the number of arguments is 1, or write a much more complex program in Perl that supports command line options, pretty output, smart case search or even plain text search.
What is the most suitable option for this problem, and is there any advanced search/replace tool in the Linux world? If not, I may try to write my own rip tool, standing for replace-in-place, which can support all the options that I need.
Before that I need some advice...
EDIT
Actually I think to fork https://github.com/petdance/ack2 to add a replacement feature... This may or may not be a good idea...

Here's an alternative to your function (edited to use the suggestion provided by gniourf_gniourf, thanks):
find . -type f -exec sh -c 'perl -pi.bak -e "s/foo/bar/" "$0" && rm -f "$0".bak' {} \;
Using this approach, you remove each backup file as you go.

I think you can use
grep -Hrn -e "string" .
to find a pattern, and
find . -type f -exec sed -i "s#string1#string2#g" {} \;
to replace a pattern

I would slightly modify your existing function:
function replace {
local perl_code=$1 EXTENSION=.perlpie_tmp file
shift
for file; do
perl -p -i$EXTENSION -e "$perl_code" "$file" && rm "$file$EXTENSION"
done;
}
This will slightly worsen the performance as you're now calling perl multiple times, but I suspect you won't notice.

How to tell how many files match description with * in unix

Pretty simple question: say I have a set of files:
a1.txt
a2.txt
a3.txt
b1.txt
And I use the following command:
ls a*.txt
It will return:
a1.txt a2.txt a3.txt
Is there a way in a bash script to tell how many results will be returned when using the * pattern? In the above example, if I were to use a*.txt the answer should be 3, and if I used *1.txt the answer should be 2.

Comment on using ls:
I see all the other answers attempt this by parsing the output of
ls. This is very unpredictable because this breaks when you have
file names with "unusual characters" (e.g. spaces).
Another pitfall would be, it is ls implementation dependent. A
particular implementation might format output differently.
There is a very nice discussion on the pitfalls of parsing ls output on the bash wiki maintained by Greg Wooledge.
Solution using bash arrays
For the above reasons, using bash syntax would be the more reliable option. You can use a glob to populate a bash array with all the matching file names. Then you can ask bash the length of the array to get the number of matches. The following snippet should work.
files=(a*.txt) && echo "${#files[@]}"
To save the number of matches in a variable, you can do:
files=(a*.txt)
count="${#files[@]}"
One more advantage of this method is you now also have the matching files in an array which you can iterate over.
Note: Although I keep repeating bash syntax above, arrays are not part of POSIX sh, so this solution needs bash or another shell with arrays (such as ksh or zsh).

You can't know ahead of time, but you can count how many results are returned. I.e.
ls -l *.txt | wc -l
ls -l will display the directory entries matching the specified wildcard, wc -l will give you the count.
You can save the value of this command in a shell variable with either
num=$(ls -l *.txt | wc -l)
or
num=`ls -l *.txt | wc -l`
and then use $num to access it. The first form is preferred.

You can use ls in combination with wc:
ls a*.txt | wc -l
The ls command lists the matching files one per line, and wc -l counts the number of lines.

I like suvayu's answer, but there's no need to use an array:
count() { echo $#; }
count *
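One caveat with counting arguments this way: when nothing matches (and nullglob is not set), the shell passes the unexpanded pattern itself, so the count comes out as 1 instead of 0. A minimal sketch that guards against this, assuming a hypothetical helper name count_matches:

```shell
# count_matches FILE...  (hypothetical helper)
# Call it with an unquoted glob, e.g.  count_matches a*.txt
count_matches() {
    # When nothing matches, the shell passes the pattern itself as the
    # single argument; that "file" does not exist, so report 0.
    if [ "$#" -eq 1 ] && [ ! -e "$1" ] && [ ! -L "$1" ]; then
        echo 0
    else
        echo "$#"
    fi
}
```

With the four files from the question, count_matches a*.txt gives 3, count_matches *1.txt gives 2, and a pattern that matches nothing gives 0.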

In order to count files that might have unpredictable names, e.g. containing new-lines, non-printable characters etc., I would use the -print0 option of find and awk with RS='\0':
num=$(find . -maxdepth 1 -print0 | awk -v RS='\0' 'END { print NR }')
Adjust the options to find to refine the count, e.g. if the criteria is files starting with a lower-case a with .txt extension in the current directory, use:
find . -type f -name 'a*.txt' -maxdepth 1 -print0

How can I use xargs to copy files that have spaces and quotes in their names?

I'm trying to copy a bunch of files below a directory and a number of the files have spaces and single-quotes in their names. When I try to string together find and grep with xargs, I get the following error:
find .|grep "FooBar"|xargs -I{} cp "{}" ~/foo/bar
xargs: unterminated quote
Any suggestions for a more robust usage of xargs?
This is on Mac OS X 10.5.3 (Leopard) with BSD xargs.

You can combine all of that into a single find command:
find . -iname "*foobar*" -exec cp -- "{}" ~/foo/bar \;
This will handle filenames and directories with spaces in them. You can use -name to get case-sensitive results.
Note: The -- flag passed to cp prevents it from processing files starting with - as options.

find . -print0 | grep --null 'FooBar' | xargs -0 ...
I don't know whether grep supports --null, or whether xargs supports -0, on Leopard, but on GNU it's all good.

The easiest way to do what the original poster wants is to change the delimiter from any whitespace to just the end-of-line character like this:
find whatever ... | xargs -d "\n" cp -t /var/tmp

This is more efficient as it does not run "cp" multiple times:
find -name '*FooBar*' -print0 | xargs -0 cp -t ~/foo/bar

I ran into the same problem. Here's how I solved it:
find . -name '*FooBar*' | sed 's/.*/"&"/' | xargs -I{} cp {} ~/foo/bar
I used sed to substitute each line of input with the same line, but surrounded by double quotes. From the sed man page, "...An ampersand (``&'') appearing in the replacement is replaced by the string matching the RE..." -- in this case, .*, the entire line.
This solves the xargs: unterminated quote error.

This method works on Mac OS X v10.7.5 (Lion):
find . | grep FooBar | xargs -I{} cp {} ~/foo/bar
I also tested the exact syntax you posted. That also worked fine on 10.7.5.

Just don't use xargs. It is a neat program but it doesn't go well with find when faced with non trivial cases.
Here is a portable (POSIX) solution, i.e. one that doesn't require find, xargs or cp GNU specific extensions:
find . -name "*FooBar*" -exec sh -c 'cp -- "$@" ~/foo/bar' sh {} +
Note the ending + instead of the more usual ;.
This solution:
correctly handles files and directories with embedded spaces, newlines or whatever exotic characters.
works on any Unix and Linux system, even those not providing the GNU toolkit.
doesn't use xargs which is a nice and useful program, but requires too much tweaking and non standard features to properly handle find output.
is also more efficient (read faster) than the accepted and most if not all of the other answers.
Note also that, despite what is stated in some other replies or comments, quoting {} is useless (unless you are using the exotic fish shell).
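A small sketch of the same find ... -exec sh -c ... {} + technique wrapped in a function, so that the destination is passed in as an argument instead of being hard-coded. The name copy_matching and its argument order are assumptions for illustration; unlike the command above it also adds -type f, so directories are skipped rather than handed to cp.

```shell
# copy_matching SRC PATTERN DEST  (hypothetical wrapper around the command above)
# Copy every regular file under SRC whose name matches PATTERN into DEST.
copy_matching() {
    # DEST is handed to the inner shell as its first argument ($1);
    # find appends the matched paths after it, batching them like xargs.
    find "$1" -type f -name "$2" -exec sh -c '
        dest=$1
        shift
        cp -- "$@" "$dest"
    ' sh "$3" {} +
}
```

It could be called as: copy_matching . '*FooBar*' ~/foo/bar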

Look into using the --null commandline option for xargs with the -print0 option in find.

For those who rely on commands other than find, e.g. ls:
find . | grep "FooBar" | tr \\n \\0 | xargs -0 -I{} cp "{}" ~/foo/bar

find | perl -lne 'print quotemeta' | xargs ls -d
I believe that this will work reliably for any character except line-feed (and I suspect that if you've got line-feeds in your filenames, then you've got worse problems than this). It doesn't require GNU findutils, just Perl, so it should work pretty-much anywhere.

I have found that the following syntax works well for me.
find /usr/pcapps/ -mount -type f -size +1000000c | perl -lpe ' s{ }{\\ }g ' | xargs ls -l | sort -k 5,5nr | head -200
In this example, I am looking for the largest 200 files over 1,000,000 bytes in the filesystem mounted at "/usr/pcapps".
The Perl one-liner between "find" and "xargs" escapes/quotes each blank so "xargs" passes any filename with embedded blanks to "ls" as a single argument.

Frame challenge — you're asking how to use xargs. The answer is: you don't use xargs, because you don't need it.
The comment by user80168 describes a way to do this directly with cp, without calling cp for every file:
find . -name '*FooBar*' -exec cp -t /tmp -- {} +
This works because:
The cp -t flag lets you give the target directory near the beginning of the cp command, rather than at the end. From man cp:
-t, --target-directory=DIRECTORY
copy all SOURCE arguments into DIRECTORY
The -- flag tells cp to interpret everything after as a filename, not a flag, so files starting with - or -- do not confuse cp; you still need this because the -/-- characters are interpreted by cp, whereas any other special characters are interpreted by the shell.
The find -exec command {} + variant essentially does the same as xargs. From man find:
-exec command {} +
This variant of the -exec action runs the specified command on
the selected files, but the command line is built by appending
each selected file name at the end; the total number of invocations
of the command will be much less than the number of
matched files. The command line is built in much the same way
that xargs builds its command lines. Only one instance of `{}'
is allowed within the command, and (when find is being invoked
from a shell) it should be quoted (for example, '{}') to protect
it from interpretation by shells. The command is executed in
the starting directory. If any invocation returns a non-zero
value as exit status, then find returns a non-zero exit status.
If find encounters an error, this can sometimes cause an immediate
exit, so some pending commands may not be run at all. This
variant of -exec always returns true.
By using this in find directly, you avoid the need for a pipe or a shell invocation, so you don't need to worry about any nasty characters in filenames.

With Bash (not POSIX) you can use process substitution to get the current line inside a variable. This enables you to use quotes to escape special characters:
while IFS= read -r line ; do cp -- "$line" ~/bar ; done < <(find . | grep foo)

Be aware that most of the options discussed in other answers are not standard on platforms that do not use the GNU utilities (Solaris, AIX, HP-UX, for instance). See the POSIX specification for 'standard' xargs behaviour.
I also find the behaviour of xargs whereby it runs the command at least once, even with no input, to be a nuisance.
I wrote my own private version of xargs (xargl) to deal with the problems of spaces in names (only newlines separate), though the 'find ... -print0' and 'xargs -0' combination is pretty neat given that file names cannot contain ASCII NUL '\0' characters. My xargl isn't as complete as it would need to be to be worth publishing, especially since GNU has facilities that are at least as good.

For me, I was trying to do something a little different. I wanted to copy my .txt files into my tmp folder. The .txt filenames contain spaces and apostrophe characters. This worked on my Mac.
$ find . -type f -name '*.txt' | sed 's/'"'"'/\'"'"'/g' | sed 's/.*/"&"/' | xargs -I{} cp -v {} ./tmp/

If the find and xargs versions on your system don't support the -print0 and -0 switches (for example AIX find and xargs), you can use this terrible-looking code:
find . -name "*foo*" | sed -e "s/'/\\\'/g" -e 's/"/\\"/g' -e 's/ /\\ /g' | xargs -I{} cp {} /your/dest
Here sed will take care of escaping the spaces and quotes for xargs.
Tested on AIX 5.3

I created a small portable wrapper script called "xargsL" around "xargs" which addresses most of the problems.
Contrary to xargs, xargsL accepts one pathname per line. The pathnames may contain any character except (obviously) newline or NUL bytes.
No quoting is allowed or supported in the file list - your file names may contain all sorts of whitespace, backslashes, backticks, shell wildcard characters and the like - xargsL will process them as literal characters, no harm done.
As an added bonus feature, xargsL will not run the command once if there is no input!
Note the difference:
$ true | xargs echo no data
no data
$ true | xargsL echo no data # No output
Any arguments given to xargsL will be passed through to xargs.
Here is the "xargsL" POSIX shell script:
#! /bin/sh
# Line-based version of "xargs" (one pathname per line which may contain any
# amount of whitespace except for newlines) with the added bonus feature that
# it will not execute the command if the input file is empty.
#
# Version 2018.76.3
#
# Copyright (c) 2018 Guenther Brunthaler. All rights reserved.
#
# This script is free software.
# Distribution is permitted under the terms of the GPLv3.
set -e
trap 'test $? = 0 || echo "$0 failed!" >& 2' 0
if IFS= read -r first
then
{
printf '%s\n' "$first"
cat
} | sed 's/./\\&/g' | xargs ${1+"$@"}
fi
Put the script into some directory in your $PATH and don't forget to
$ chmod +x xargsL
the script there to make it executable.

bill_starr's Perl version won't work well for embedded newlines (only copes with spaces). For those on e.g. Solaris where you don't have the GNU tools, a more complete version might be (using sed)...
find . -type f | sed 's/./\\&/g' | xargs grep string_to_find
adjust the find and grep arguments or other commands as you require, but the sed will fix your embedded newlines/spaces/tabs.

I used Bill Star's answer slightly modified on Solaris:
find . -mtime +2 | perl -pe 's{^}{\"};s{$}{\"}' > ~/output.file
This will put quotes around each line. I didn't use the '-l' option although it probably would help.
The file list I was going through might have '-', but not newlines. I haven't used the output file with any other commands, as I want to review what was found before I just start massively deleting them via xargs.

I played with this a little, started contemplating modifying xargs, and realised that for the kind of use case we're talking about here, a simple reimplementation in Python is a better idea.
For one thing, having ~80 lines of code for the whole thing means it is easy to figure out what is going on, and if different behaviour is required, you can just hack it into a new script in less time than it takes to get a reply on somewhere like Stack Overflow.
See https://github.com/johnallsup/jda-misc-scripts/blob/master/yargs and https://github.com/johnallsup/jda-misc-scripts/blob/master/zargs.py.
With yargs as written (and Python 3 installed) you can type:
find .|grep "FooBar"|yargs -l 203 cp --after ~/foo/bar
to do the copying 203 files at a time. (Here 203 is just a placeholder, of course, and using a strange number like 203 makes it clear that this number has no other significance.)
If you really want something faster and without the need for Python, take zargs and yargs as prototypes and rewrite in C++ or C.

You might need to filter on the FooBar directory with grep, like:
find . -name "file.ext" | grep "FooBar" | xargs -I{} cp -p "{}" .

If you are using Bash, you can convert stdout to an array of lines by mapfile:
find . | grep "FooBar" | (mapfile -t; cp "${MAPFILE[@]}" ~/foobar)
The benefits are:
It's built-in, so it's faster.
It executes the command with all file names at once, so it's faster.
You can append other arguments to the file names. For cp, you can also:
find . -name '*FooBar*' -exec cp -t ~/foobar -- {} +
however, some commands don't have such feature.
The disadvantages:
It may not scale well if there are too many file names. (The limit? I don't know, but I tested with a 10 MB list file containing 10000+ file names with no problem, under Debian.)
Well... who knows if Bash is available on OS X?
