I'm looking to count occurrences of a URL pattern in an access log, like:
action.php?show_page=next&offset=1&xyzzzzz
Note that I need all URLs where the offset value is between 1 and 9. Examples:
action.php?show_page=next&offset=1&xyzzzzz
action.php?show_page=next&offset=2&xyzzzzz
action.php?show_page=next&offset=3&xyzzzzz
.............
action.php?show_page=next&offset=9&xyzzzzz
This is what I tried:
grep "action.php?show_page=next" access.log.2 | grep "offset=[1-9]&"| wc -l
One way using grep:
grep -oc "action.php?show_page=next&offset=[1-9]&xyzzzzz" file.txt
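You should escape the "?" if you use extended regular expressions (grep -E or egrep). In grep's default basic mode "?" is already literal, and GNU grep treats "\?" there as a quantifier.
With grep -E, try the regex: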
action.php\?show_page=next&offset=[1-9]
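A minimal sketch of the count on a throwaway sample, assuming a hypothetical log built with printf (the sample lines and the variable names log and count are made up for illustration). Note that grep -c counts matching lines, not individual matches, so combining it with -o does not change the result.

```shell
# Build a small sample log (hypothetical data, not from a real access log).
log=$(mktemp)
printf '%s\n' \
  'GET /action.php?show_page=next&offset=1&xyzzzzz' \
  'GET /action.php?show_page=next&offset=9&xyzzzzz' \
  'GET /action.php?show_page=next&offset=10&xyzzzzz' \
  'GET /action.php?show_page=prev&offset=3&xyzzzzz' > "$log"

# Basic regex: "?" is literal here and "\." matches a literal dot.
# "[1-9]&" requires exactly one digit before "&", so offset=10 is excluded.
count=$(grep -c 'action\.php?show_page=next&offset=[1-9]&' "$log")
echo "$count"
rm -f "$log"
```

Doing it in a single grep avoids the second pipeline stage and the separate wc -l.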
Please accept my apologies if this question has been asked before. I am new and do not know how to do this. I have a file containing data like this:
name=1|surname=2|phone=3|email=4
phone=5|surname=6|name=7|email=8
surname=9|phone=10|email=11|name=12
phone=13|email=14|name=15|surname=6
I would like to have a file like this:
name=1
name=7
name=12
name=15
Thanks in advance!
Say names.txt is your file; then use something like:
cat names.txt | tr "|" "\n" | grep "^name="
tr transforms | to newlines
grep filters for the lines with name
And here is a one command solution with GNU awk:
awk -v RS="[|\n]" '/^name=/' names.txt
The -v RS="[|\n]" sets the record separator to | or newline.
The /^name=/ filters for records starting with name= (and implicitly prints them).
I would go for the solution of @Lars, but I wanted to test this with "lookbehind".
With grep you can get the matches only with grep -o, but the following line will also find surname:
grep -o "name=[0-9]*" names.txt
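You can fix this a little by looking for the character before name (start of line with ^ or |). This needs extended regular expressions (-E), so that ( ), | and \| mean grouping, alternation and a literal pipe:
grep -oE "(^|\|)name=[0-9]*" names.txt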
What a fix! Now you get the right names, but sometimes with an extra |.
With \K (and grep option -P) you can tell grep to use something for the matching but skip it during output.
grep -oP "(^|\|)\Kname=[0-9]*" names.txt
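The -P option is only available in grep builds with PCRE support. Where it is missing, a sketch like the following gives the same output by splitting on "|" in awk. The temporary file and the variable names names and result are assumptions for illustration; the sample data is the file from the question.

```shell
# Recreate the sample file from the question in a temporary location.
names=$(mktemp)
printf '%s\n' \
  'name=1|surname=2|phone=3|email=4' \
  'phone=5|surname=6|name=7|email=8' \
  'surname=9|phone=10|email=11|name=12' \
  'phone=13|email=14|name=15|surname=6' > "$names"

# Split each line on "|" and print only the fields that start with "name=".
# Anchoring with ^ keeps "surname=" from matching.
result=$(awk -F'[|]' '{ for (i = 1; i <= NF; i++) if ($i ~ /^name=/) print $i }' "$names")
echo "$result"
rm -f "$names"
```

Because the match is anchored to the start of a field, no "|" ever ends up in the output.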
I am using grep to get a list of files that I want to use for another grep search (and not simply piping it).
For example I got as an output:
file1.h:XXX: linecontent
file2.h:XXX: linecontent
file3.h:XXX: linecontent
file4.h:XXX: linecontent
and I want to grep only file1.h, file2.h ...
I'm assuming you want to search for files that contain two different patterns. If so this is what you want:
grep 'your pattern 2' `grep -l 'your pattern 1' *`
The contents of the back quotes will be executed first and the output substituted into the command line. Use of the -l flag will restrict the output of the grep command to just the file names.
If a very large number of files match your pattern 1, this could fail because the command line becomes too long. The solution for that is to use xargs:
grep -l 'your pattern 1' * | xargs grep 'your pattern 2'
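Plain xargs splits its input on whitespace, so file names containing spaces break the pipeline above. A hedged sketch of a more robust variant, assuming a grep that supports --null and an xargs that supports -0 (both are extensions found in GNU and BSD tools); the directory, file names and patterns are made up for illustration:

```shell
# Set up a throwaway directory with one file name that contains a space.
dir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$dir/file one.h"
printf 'alpha\n' > "$dir/file2.h"
printf 'beta\n' > "$dir/file3.h"

# -l lists matching file names; --null terminates each name with a NUL byte.
# xargs -0 reads NUL-separated names, so "file one.h" stays one argument.
# -H forces the file name prefix even if only one file is passed on.
result=$(grep -l --null 'alpha' "$dir"/*.h | xargs -0 grep -H 'beta')
echo "$result"
rm -rf "$dir"
```

Only "file one.h" contains both patterns, so it is the only file reported.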
Assuming what you want is the names of files that contain 'lineofcontent', you could use:
grep -l 'lineofcontent' file*.h
I am trying to read/grep a particular word or content that is before a period (.).
e.g. file1 has abinaya.ashok and I want to grep whatever is before the period (.) without hardcoding anything.
if I try
grep \.\ file1
it gives abinaya.ashok.
I've tried: grep\*\.\ file1
it doesn't give anything. Can we find it using grep, or should we use awk instead? Any thoughts?
Using GNU grep for PCRE regex (for non-greedy and positive look-ahead), you can do:
echo 'abinaya.ashok' | grep -oP '.*?(?=\.)'
abinaya
Using awk:
echo 'abinaya.ashok' | awk -F\. '{print $1}'
abinaya
Check the following simple examples.
Including the dot:
$ echo abinaya.ashok | grep -o '.*[.]'
abinaya.
Without the dot:
$ echo abinaya.ashok | grep -o '^[^.]\+'
abinaya
Hope I understand you correctly:
sed -n 's/\..*//p' file1 | grep whatever
The sed expression prints only the part before the first dot (lines without a dot are not printed).
Then use grep to search for what you need.
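Two further options that need neither a regex engine nor awk, sketched under the assumption that the value is available as a line of input or in a shell variable (the variable names line, with_cut and with_expansion are hypothetical):

```shell
line='abinaya.ashok'

# cut splits on "." and prints the first field.
# Lines without a dot are printed unchanged unless -s is added.
with_cut=$(printf '%s\n' "$line" | cut -d. -f1)

# POSIX parameter expansion: remove the longest suffix that matches ".*",
# which leaves everything before the first dot.
with_expansion=${line%%.*}

echo "$with_cut"
echo "$with_expansion"
```

The parameter expansion runs inside the shell itself, which can matter when it is applied in a loop over many values.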
askljasklj.
twuikso lsliosus.
sakjsiua .sfdds* askhkjash.
Here I want to grep for lines containing the literal pattern .* (a dot followed by an asterisk).
The command cat file | grep ".\*" outputs the line "sakjsiua .sfdds* askhkjash.", but this line should not be output, or am I using the wrong command?
Can anyone help here?
I assume that you have the words in a file named foo. In that case the command:
cat foo | grep ".*"
would print everything, because .* matches any sequence of characters. If you just want the literal .* pattern, use the command below:
cat foo | grep "\.\*"
Putting \ in front of . and * makes each of them behave as a normal text character.
You should escape them both, as below:
grep "\.\*" yourfile
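As an alternative to escaping, grep -F treats the whole pattern as a fixed string. A small sketch, assuming the three lines from the question plus one hypothetical fourth line that really contains ".*" (the file and variable names are made up):

```shell
# Sample file: the question's three lines plus one hypothetical matching line.
foo=$(mktemp)
printf '%s\n' \
  'askljasklj.' \
  'twuikso lsliosus.' \
  'sakjsiua .sfdds* askhkjash.' \
  'literal .* here' > "$foo"

# -F disables regex interpretation, so "." and "*" need no escaping.
fixed=$(grep -F '.*' "$foo")
# Equivalent basic regex with both characters escaped.
escaped=$(grep '\.\*' "$foo")
echo "$fixed"
rm -f "$foo"
```

Both commands select only the line where the dot is immediately followed by the asterisk.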
I'm using the following command to extract distinct URLs that contain the .com extension and may also contain .us or another country extension.
grep '\.com' source.txt -m 700 | uniq | sed -e 's/www.//' > dest.txt
The problem is that it extracts several URLs from the same domain, which I don't want. Example:
abc.yahoo.com
efg.yahoo.com
I only need yahoo.com. How can I extract only the distinct domain names, using grep or any other command?
Maybe something like this?
egrep -io '[a-z0-9\-]+\.[a-z]{2,3}(\.[a-z]{2})?' source.txt
Have you tried using awk instead of sed, specifying "." as the delimiter and printing only the last two fields?
awk -F "." '{ print $(NF-1)"."$NF }'
Perhaps something like this should help:
egrep -o '[^.]*\.com' file
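None of the commands above removes duplicates on its own, and uniq only drops adjacent repeats, so the input has to be sorted first. A minimal sketch combining the awk idea with sort -u, assuming every line is a bare host name ending in a single-label extension such as .com (the sample data and variable names are hypothetical):

```shell
# Hypothetical sample input: one host name per line.
src=$(mktemp)
printf '%s\n' 'abc.yahoo.com' 'efg.yahoo.com' 'www.example.com' 'example.com' > "$src"

# Keep the last two dot-separated labels, then sort and drop duplicates.
# Assumption: names like example.com.us would need three labels instead.
domains=$(awk -F. 'NF >= 2 { print $(NF-1) "." $NF }' "$src" | sort -u)
echo "$domains"
rm -f "$src"
```

With this approach the separate sed step that strips "www." is no longer needed, since only the last two labels are kept.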