I'm trying to grep for multiple arguments in a shell script.
I call the script like ./script arg1 arg2 .. argN
and I want it to behave like
egrep -i "arg1" mydata | egrep -i "arg2" | ... egrep -i "argN" | awk -f display.awk
so that the patterns are combined with AND.
What's wrong with my approach?
Is it even correct to write
egrep -i "arg1" mydata | egrep -i "arg2" | ... egrep -i "argN" | awk -f display.awk
to combine multiple patterns with AND?
if [ $# -eq 0 ]
then
    echo "Usage:phone searchfor [...searchfor]"
    echo "(You didn't tell me what you want to search for.)"
    exit 0
else
    for arg in $*
    do
        if [ $arg -eq $1 ]
        then
            egrep -i "arg" mydata |
        else
            egrep -i "arg" |
        fi
    done
    awk -f display.awk
fi
If my data contains
'happy sunny bunny',
'sleepy bunny',
and 'happy sunny',
then running ./script happy sunny bunny should print only
'happy sunny bunny',
and running ./script bunny should print both
'happy sunny bunny'
'sleepy bunny'.
The immediate fix is to move the pipe character to after the done.
Also, you should loop over "$@" to preserve the quoting of your arguments, and generally quote your variables.
if [ $# -eq 0 ]
then
    # print diagnostics to stderr
    echo "Usage: phone searchfor [...searchfor]" >&2
    echo "(You didn't tell me what you want to search for.)" >&2
    exit 0
fi
for arg in "$#"
do
# Add missing dash before eq
if [ "$arg " -eq "$1" ]
then
# Surely you want "$arg" here, not the static string "arg"?
grep -E -i "$arg" mydata
else
grep -E -i "$arg"
fi
done |
awk -f display.awk
The overall logic still seems flawed: only the first pattern is ever applied to mydata; each later grep reads the script's standard input, and the outputs are simply concatenated rather than ANDed. Perhaps you want to add an option to allow the user to specify an input file name, with - to specify standard input? Then all the regular arguments will be search strings, like the usage message suggests (see the sketch after the also function below).
If indeed the intent is to loop over all the arguments to produce a logical AND, try this:
also () {
    local what
    what=$1
    shift
    if [ $# -gt 0 ]; then
        grep -E -i "$what" | also "$@"
    else
        grep -E -i "$what"
    fi
}

also "$@" <mydata | awk -f display.awk
... though a better implementation might be to build a simple Awk or sed script from the arguments:
script='1'
for arg in "$@"; do
    script="$script && tolower(\$0) ~ tolower(\"$arg\")"
done
awk "$script" mydata | awk -f display.awk
This breaks down if the search phrases could contain regex specials, though (which of course is true for the grep -E version as well; but then you could easily switch to grep -F).
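A regex-free variant of the Awk approach is to test with index() instead, which does literal substring matching; this is only a sketch and still assumes the search strings contain no double quotes or backslashes:

script='1'
for arg in "$@"; do
    # index() returns a position greater than 0 only when the literal substring is present
    script="$script && index(tolower(\$0), tolower(\"$arg\"))"
done
awk "$script" mydata | awk -f display.awk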
Merging the two Awk scripts into one should probably not be hard either, though without seeing display.awk, this is speculative.
You can solve it recursively:
#! /bin/bash
if (( $# == 0 )); then
    exec cat
else
    arg=$1; shift
    egrep "$arg" | "$0" "$@"
fi
The recursion ends if the script is called with no arguments. In this case it behaves like cat. In your example you can put your awk there. If the script is called with one or more arguments, it calls egrep with the first argument ($1) and passes the remaining arguments ($@ after shift) to itself ($0).
Example:
$ ./recursive-egrep sys < /etc/passwd
sys:x:3:3:sys:/dev:/usr/sbin/nologin
systemd-timesync:x:100:102:systemd Time Synchronization,,,:/run/systemd:/bin/false
systemd-network:x:101:103:systemd Network Management,,,:/run/systemd/netif:/bin/false
systemd-resolve:x:102:104:systemd Resolver,,,:/run/systemd/resolve:/bin/false
systemd-bus-proxy:x:103:105:systemd Bus Proxy,,,:/run/systemd:/bin/false
$ ./recursive-egrep sys no < /etc/passwd
sys:x:3:3:sys:/dev:/usr/sbin/nologin
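For the question's use case, the awk step goes in the base case; a sketch reusing display.awk and mydata from the question:

#! /bin/bash
if (( $# == 0 )); then
    # no patterns left: format the accumulated matches
    exec awk -f display.awk
else
    arg=$1; shift
    # filter by the first pattern, then recurse with the remaining ones
    egrep "$arg" | "$0" "$@"
fi

which would be invoked as ./recursive-egrep happy sunny bunny < mydata.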
Use G from https://gitlab.com/ole.tange/tangetools/tree/master/G which does this (except for the awk part).
SYNOPSIS
G [[grep options] string] [[grep options] string] ...
DESCRIPTION
G is a shorthand of writing (search for single lines matching expressions):
grep --option string | grep --option2 string2
or with -g (search full files matching expressions):
find . -type f | xargs grep -l string1 | xargs grep -l string2
Related
I have multiple fasta files, where the first line always contains a > with multiple words, for example:
File_1.fasta:
>KY620313.1 Hepatitis C virus isolate sP171215 polyprotein gene, complete cds
File_2.fasta:
>KY620314.1 Hepatitis C virus isolate sP131957 polyprotein gene, complete cds
File_3.fasta:
>KY620315.1 Hepatitis C virus isolate sP127952 polyprotein gene, complete cds
I would like to take the word starting with sP* from each file and rename each file to this string (for example: File_1.fasta to sP171215.fasta).
So far I have this:
$ for match in "$(grep -ro '>')";do
fname=$("echo $match|awk '{print $6}'")
echo mv "$match" "$fname"
done
But it doesn't work, I always get the error:
grep: warning: recursive search of stdin
I hope you can help me!
you can use something like this:
grep '>' *.fasta | while read -r line; do
    new_name="$(echo "$line" | cut -d' ' -f 6)"
    old_name="$(echo "$line" | cut -d':' -f 1)"
    mv "$old_name" "$new_name.fasta"
done
It searches the *.fasta files and handles every matching line:
it splits each grep result by spaces and takes the 6th element as the new name
it splits each grep result by ':' and takes the first element as the old name
it moves/renames the old filename to the new filename
There are several things going on with this code.
For a start, I actually don't get this particular error; this might be due to different versions.
It might come down to grep interpreting '>' the same as > if the shell expansion goes wrong. I would suggest maybe going for "\>".
Secondly:
fname=$("echo $match|awk '{print $6}'")
The quotes inside serve no useful purpose. Your code should look like this, if anything:
fname="$(echo $match|awk '{print $6}')"
Lastly, to properly retrieve your data, this should be your final code:
for match in "$(grep -Hr "\>")"; do
fname="$(echo "$match" | cut -d: -f1)"
new_fname="$(echo "$match" | grep -o "sP[^ ]*")".fasta
echo mv "$fname" "$new_fname"
done
Explanations:
grep -H -> you want your grep to explicitly use "Include Filename", just in case other shell environments decide to alias grep to grep -h (no filenames)
you don't want to be doing grep -o on your file search, as you want to have both the filename and the "new filename" in one data entry.
Although, I don't see why you would search for '>' rather than directly for 'sP', as such:
for match in "$(grep -Hro "sP[0-9]*")"
This is not the exact same behaviour, and has different edge cases, but it just might work for you.
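A complete version of that idea might look like the following sketch; it assumes the file names contain no colons or whitespace, and keeps echo for a dry run:

grep -Hro "sP[0-9]*" *.fasta | while IFS=: read -r fname word; do
    echo mv "$fname" "$word.fasta"
done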
Quite straightforward in (g)awk:
create a file "script.awk":
FNR == 1 {
    for (i=1; i<=NF; i++) {
        if (index($i, "sP")==1) {
            print "mv", FILENAME, $i ".fasta"
            nextfile
        }
    }
}
use it:
awk -f script.awk *.fasta > cmmd.txt
check the content of the output.
mv File_1.fasta sP171215.fasta
mv File_2.fasta sP131957.fasta
if it looks OK, launch the renames with: . cmmd.txt
For each fasta file in the directory, search its first line for the first word starting with sP and rename the file using that word as the basename.
Using a bash array:
for f in *.fasta; do
    arr=( $(head -1 "$f") )
    for word in "${arr[@]}"; do
        [[ "$word" =~ ^sP ]] && echo mv "$f" "${word}.fasta" && break
    done
done
or using grep:
for f in *.fasta; do
    word=$(head -1 "$f" | grep -o "\bsP\w*")
    [ -z "$word" ] || echo mv "$f" "${word}.fasta"
done
Note: remove echo after you are ok with testing.
I want to find all functions in a C file and print them, but I do not know the correct expression to grep for in the variable $i.
find=c
for i in *; do
    if [ "${i}" != "${i%.${find}}" ]
    then
        echo "$i"
        grep "^int|^void" "${1}-${i}" | sed 's/{//g'
    else
        echo "unable to find any functions"
    fi
done
I agree with the commenters; the tool they speak of is ctags, and you might already have it on your system. To get the list of functions:
ctags -o - yourFile.c | awk '$4=="f"{print $1}'
My first parameter is a file that contains the given words; the rest are the directories in which I'm searching for files that contain at least 3 of the words from the first parameter.
I can successfully print out the number of matching words, but when testing whether it's greater than 3 it gives me the error: test: too many arguments
Here's my code:
#!/bin/bash
file=$1
shift 1
for i in $*
do
    for j in `find $i`
    do
        if test -f "$j"
        then
            if test grep -o -w "`cat $file`" $j | wc -w -ge 3
            then
                echo $j
            fi
        fi
    done
done
You first need to execute the grep | wc, and then compare that output with 3. You need to change your if statement for that. Since you are already using the backquotes, you cannot nest them, so you can use the other syntax $(command), which is equivalent to `command`:
if [ $(grep -o -w "`cat $file`" $j | wc -w) -ge 3 ]
then
    echo $j
fi
I believe your problem is that you are trying to get the result of grep -o -w "`cat $file`" $j | wc -w to see if it's greater than or equal to three, but your syntax is incorrect. Try this instead:
if test $(grep -o -w "`cat $file`" $j | wc -w) -ge 3
By putting the grep & wc commands inside the $(), the shell executes those commands and uses the output rather than the text of the commands themselves. Consider this:
> cat words
western
found
better
remember
> echo "cat words | wc -w"
cat words | wc -w
> echo $(cat words | wc -w)
4
> echo "cat words | wc -w gives you $(cat words | wc -w)"
cat words | wc -w gives you 4
>
Note that the $() syntax is equivalent to the backtick notation you're already using for the cat $file command.
Hope this helps!
Your code can be refactored and corrected in a few places.
Have it this way:
#!/bin/bash
input="$1"
shift
for dir; do
    while IFS= read -r -d '' file; do
        if [[ $(grep -woFf "$input" "$file" | sort -u | wc -l) -ge 3 ]]; then
            echo "$file"
        fi
    done < <(find "$dir" -type f -print0)
done
for dir loops through all the arguments
Use of sort -u is to remove duplicate words from output of grep.
Use wc -l instead of wc -w since grep -o prints matching words on separate lines.
find ... -print0 is to take care of file that may have whitespaces.
find ... -type f is to retrieve only files and avoid checking for -f later.
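Assuming the script is saved as, say, find3.sh (a made-up name) and the word file lists one word per line, which is the format grep -f expects, it would be invoked like:

# words.txt: one search word per line; dir1, dir2: directories to scan
./find3.sh words.txt dir1 dir2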
I tried to create a script from this suggestion like this:
#!/bin/bash
if [ $# -eq 0 ]; then
    tail -f /var/log/mylog.log
fi
if [ $# -eq 1 ]; then
    tail -f /var/log/mylog.log | perl -pe 's/.*$1.*/\e[1;31m$&\e[0m/g'
fi
It shows the plain, uncolored tail of the file when I pass no arguments to the script, but every line is red when I pass an argument. I would like it to color only the lines which contain the word passed to the script.
For example, this should color lines containing the word "info":
./color_lines.sh info
How can I change the script to work with one argument?
Do not put the argument variable inside the single quotes (so that the shell can expand it):
tail -f input | perl -pe 's/.*'$1'.*/\e[1;31m$&\e[0m/g'
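If the search word may contain characters that are special to the shell, a somewhat safer variant (a sketch, not the original answer) is to pass it to Perl through the environment; note that Perl still treats the value as a regular expression:

tail -f /var/log/mylog.log | pattern="$1" perl -pe 's/.*$ENV{pattern}.*/\e[1;31m$&\e[0m/g'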
You can also use grep for this:
tail -f input | grep -e $1 -e '' --color=always
and to color the whole line with grep:
tail -f input | grep -e ".*$1.*" -e '' --color=always
I have an exercise in which I have to print all the file names in the current folder which contain at least one character from each of [a-k], [m-p], and [1-9].
I probably have to use ls (glob-style).
If order is important then you can use globbing:
$ ls *[a-k]*[m-p]*[1-9]*
ajunk404 am1 cn5
Else just grep for each group separately:
ls | grep "[a-k]" | grep "[m-p]" | grep "[1-9]"
1ma
ajunk404
am1
cn5
m1a
Note: ls will also show directories; if you really only want files, use find instead:
find . -maxdepth 1 -type f | grep "[a-k]" | grep "[m-p]" | grep "[1-9]"
A 100% pure bash (and funny!) possibility:
#!/bin/bash
shopt -s nullglob
a=( *[a-k]* )
b=(); for i in "${a[@]}"; do [[ "$i" = *[m-p]* ]] && b+=( "$i" ); done
c=(); for i in "${b[@]}"; do [[ "$i" = *[1-9]* ]] && c+=( "$i" ); done
printf "%s\n" "${c[@]}"
No external processes whatsoever! No pipes! Only pure bash! 100% safe regarding files with funny symbols in their name (e.g., newlines) (and that's not the case with other methods using ls). And if you want to actually see the funny symbols in the file names and have them properly quoted, so as to reuse the output, use
printf "%q\n" "${c[#]}"
in place of the last printf statement.
Note. The patterns [a-k], [m-p] are locale-dependent. You might want to set LC_ALL=C to be sure that [a-k] really means [abcdefghijk] and not something else, e.g., [aAbBcCdDeEfFgGhHiIjJk].
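A minimal way to do that (a sketch) is to force the locale at the top of the script:

#!/bin/bash
# force the C locale so that [a-k] means exactly abcdefghijk and [m-p] exactly mnop
export LC_ALL=C
shopt -s nullglob
a=( *[a-k]* )
b=(); for i in "${a[@]}"; do [[ "$i" = *[m-p]* ]] && b+=( "$i" ); done
c=(); for i in "${b[@]}"; do [[ "$i" = *[1-9]* ]] && c+=( "$i" ); done
printf "%s\n" "${c[@]}"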
Hope this helps!
If order isn't important, and the letters appear once or more, you can use chained greps.
ls | egrep "[a-k]" | egrep "[m-p]" | egrep "[1-9]"
If order matters, then just use a glob pattern
ls *[a-k]*[m-p]*[1-9]*
To handle any order with globs alone, you need to search all the combinations:
ls *[a-k]*[m-p]*[1-9]* *[a-k]*[1-9]*[m-p]* \
*[m-p]*[a-k]*[1-9]* *[m-p]*[1-9]*[a-k]* \
*[1-9]*[m-p]*[a-k]* *[1-9]*[a-k]*[m-p]*