grep url pattern matching - linux

I'm looking to count URL patterns in an access log, like:
action.php?show_page=next&offset=1&xyzzzzz
Note that I need all URLs where the offset value is between 1 and 9. Examples:
action.php?show_page=next&offset=1&xyzzzzz
action.php?show_page=next&offset=2&xyzzzzz
action.php?show_page=next&offset=3&xyzzzzz
.............
action.php?show_page=next&offset=9&xyzzzzz
This is what I tried:
grep "action.php?show_page=next" access.log.2 | grep "offset=[1-9]&"| wc -l

One way using grep:
grep -oc "action.php?show_page=next&offset=[1-9]&xyzzzzz" file.txt

You should escape the "?" in the first grep.
Try with the regex:
action.php\?show_page=next&offset=[1-9]
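Putting that together, a minimal sketch (assuming the access.log.2 file from the question) that lets grep do the counting with -c:
grep -Ec 'action\.php\?show_page=next&offset=[1-9]&' access.log.2
With -E the escaped dot and question mark match literally, and -c counts the matching lines, replacing the wc -l step.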

Related

How to replace Pipe with a new line in Linux?

Please accept my apologies if this question was asked before. I am new and do not know how to check whether it was. I have a file containing data like this:
name=1|surname=2|phone=3|email=4
phone=5|surname=6|name=7|email=8
surname=9|phone=10|email=11|name=12
phone=13|email=14|name=15|surname=6
I would like to have a file like this:
name=1
name=7
name=12
name=15
Thanks in advance!
Say names.txt is your file, then use something like:
cat names.txt | tr "|" "\n" | grep "^name="
tr transforms | to newlines
grep filters for the lines with name
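Since the goal is a new file, you can also drop the cat and redirect the result; names_only.txt here is just a hypothetical output name:
tr "|" "\n" < names.txt | grep "^name=" > names_only.txt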
And here is a one command solution with GNU awk:
awk -v RS="[|\n]" '/^name=/' names.txt
the -v RS="[|\n]" sets the record separator to | or newline
the /^name=/ filters for records starting with name= (and implicitly prints them)
I would go for the solution of #Lars, but I wanted to test this with "lookbehind".
With grep you can print just the matches using grep -o, but the following line will also match inside surname:
grep -o "name=[0-9]*" names.txt
You can fix this a little by looking for the character before name (start of line with ^ or |).
grep -o "(^|\|)name=[0-9]*" names.txt
What a fix! Now you get the right names, but sometimes with an extra |.
With \K (and grep option -P) you can tell grep to use something for the matching but skip it during output.
grep -oP "(^|\|)\Kname=[0-9]*" names.txt
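To see the difference concretely, on just the first sample line the unanchored pattern also picks up the tail of surname=2, while the \K version does not:
$ head -1 names.txt | grep -o "name=[0-9]*"
name=1
name=2
$ head -1 names.txt | grep -oP "(^|\|)\Kname=[0-9]*"
name=1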

How do I grep in a list of files targeted by a previous grep?

I am using grep to get a list of files that I want to use for another grep search (and not simply piping it).
For example, I got this as output:
file1.h:XXX: linecontent
file2.h:XXX: linecontent
file3.h:XXX: linecontent
file4.h:XXX: linecontent
and I want to grep only file1.h, file2.h ...
I'm assuming you want to search for files that contain two different patterns. If so this is what you want:
grep 'your pattern 2' `grep -l 'your pattern 1' *`
The contents of the back quotes will be executed first and the output substituted into the command line. Use of the -l flag will restrict the output of the grep command to just the file names.
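The same idea in the $(...) command-substitution form, with hypothetical patterns and limited to the .h files from the question:
grep 'your pattern 2' $(grep -l 'your pattern 1' *.h)
$(...) nests more cleanly than back quotes but has the same limits when very many files match.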
If a very large number of files match your pattern 1, this could fail. The solution for that is to use xargs:
grep -l 'your pattern 1' * | xargs grep 'your pattern 2'
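If file names may contain spaces, GNU grep can emit them NUL-separated for xargs; a variant of the same pipeline:
grep -lZ 'your pattern 1' * | xargs -0 grep 'your pattern 2'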
Assuming what you want is the names of files that contain 'lineofcontent', you could use:
grep -l 'lineofcontent' file*.h

grep a particular content before a period

I am trying to read/grep a particular word or content that is before a period (.).
e.g. file1 has abinaya.ashok and I want to grep whatever is before the period (.) without hardcoding anything.
If I try
grep \.\ file1
it gives abinaya.ashok.
I've tried: grep \*\.\ file1
It doesn't give anything. Can we find it using grep commands, or should we do it only using awk? Any thoughts?
Using GNU grep for PCRE regex (for non-greedy and positive look-ahead), you can do:
echo 'abinaya.ashok' | grep -oP '.*?(?=\.)'
abinaya
Using awk:
echo 'abinaya.ashok' | awk -F\. '{print $1}'
abinaya
Check the following simple examples.
Including the dot:
$ echo abinaya.ashok | grep -o '.*[.]'
abinaya.
Without the dot:
$ echo abinaya.ashok | grep -o '^[^.]\+'
abinaya
Hope I understand you correctly:
sed -n 's/\..*//p' file1 | grep whatever
The sed expression prints only the part before the dot (lines without a dot are not printed).
Now use grep to search what you need.
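For the file1 from the question, the sed stage alone already yields the part before the period:
$ sed -n 's/\..*//p' file1
abinaya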

grep command in searching wildcards

askljasklj.
twuikso lsliosus.
sakjsiua .sfdds* askhkjash.
Here I want to grep lines containing the .* pattern.
Using the command cat file | grep ".\*" gives the output sakjsiua .sfdds* askhkjash, but this should not be the output, or I am using the wrong command.
Can anyone help here?
I assume that you have the words in a file foo. In that case, the command:
cat foo | grep ".*"
would print everything. If you just want the literal .* pattern, then use the command below:
cat foo | grep "\.\*"
By putting \ in front of them, we make . and * behave as normal text characters.
You should escape them both, as below:
grep "\.\*" yourfile

How to extract distinct part of a string from a file in linux

I'm using the following command to extract distinct URLs that contain the .com extension and may also contain .us or some other country extension.
grep '\.com' source.txt -m 700 | uniq | sed -e 's/www.//' > dest.txt
The problem is that it extracts URLs in the same domain, which I don't want. Ex:
abc.yahoo.com
efg.yahoo.com
I only need yahoo.com. How can I, using grep or any other command, extract distinct domain names only?
Maybe something like this?
egrep -io '[a-z0-9\-]+\.[a-z]{2,3}(\.[a-z]{2})?' source.txt
Have you tried using awk instead of sed, specifying "." as the delimiter and printing only the last two fields?
awk -F "." '{ print $(NF-1)"."$NF }'
Perhaps something like this should help:
egrep -o '[^.]*\.com' file
