Search for text with line breaks recursively in a directory? - linux

I have many large log files that look like this:
DATETIME ["2015-03-03 21:52"]
SERVER [{json_with_$_SERVER-Output}]
GET ["GET_JSON","AAA"]
POST ["POST_JSON","BBB","TEST1"]
DATETIME ["2015-03-03 21:53"]
SERVER [{json_with_$_SERVER-Output}]
GET ["GET_JSON","CCC"]
POST ["POST_JSON","DDD","TEST2"]
DATETIME ["2015-03-03 21:54"]
SERVER [{json_with_$_SERVER-Output}]
GET ["GET_JSON","AAA"]
POST ["POST_JSON","BBB","TEST3"]
DATETIME ["2015-03-03 21:55"]
SERVER [{json_with_$_SERVER-Output}]
GET ["GET_JSON","AAA"]
POST ["POST_JSON","EEE","TEST4"]
I want to search for two keywords that are separated by line breaks: one specific word in the GET line and one specific word in the POST line.
I need something like:
grep "GET(.*)AAA(.*)POST(.*)BBB"
What I'm searching for: AAA (in the GET line) && BBB (in the POST line).
The expected result:
POST ["POST_JSON","BBB","TEST1"]
POST ["POST_JSON","BBB","TEST3"]
What simple methods can do this?

Using GNU awk for the 3rd arg to match() (RS= enables paragraph mode, so this assumes the records are separated by blank lines):
$ find . -type f |
xargs gawk -v RS= 'match($0,/\nGET.*AAA.*\n(POST.*BBB.*)/,a){print a[1]}'
POST ["POST_JSON","BBB","TEST1"]
POST ["POST_JSON","BBB","TEST3"]
Add -v ORS='\n\n' if you really want a blank line between output lines.

grep is the command you are searching for
grep -rHn "GET.*KEYWORD_A" -A1 /path/to/files | grep "POST.*KEYWORD_B"
I would first grep for lines containing KEYWORD_A and print one extra line after each match, since the POST comes right after the GET in your log files. Then search that output for KEYWORD_B.
-r greps recursively in a directory
-H prints the file name
-n prints the line number
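To try the two-step grep idea without touching real logs, here is a minimal sketch. The temporary directory, the file name sample.log and the shortened log lines are made up for the demo, and -h is used instead of -H so that only the POST lines themselves are printed.

```shell
# Build a small sample log in a temporary directory (hypothetical data).
demo_dir=$(mktemp -d)
cat > "$demo_dir/sample.log" <<'EOF'
DATETIME ["2015-03-03 21:52"]
GET ["GET_JSON","AAA"]
POST ["POST_JSON","BBB","TEST1"]
DATETIME ["2015-03-03 21:53"]
GET ["GET_JSON","CCC"]
POST ["POST_JSON","BBB","TEST2"]
EOF

# -r: recurse into the directory, -h: omit file names,
# -A1: also print the line that follows each match.
# The second grep keeps only the POST lines that contain BBB.
result=$(grep -rh -A1 'GET.*AAA' "$demo_dir" | grep 'POST.*BBB')
printf '%s\n' "$result"

rm -rf "$demo_dir"
```

The second POST line is not printed even though it contains BBB, because the GET line before it does not contain AAA.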

I solved this with grep -P, which gives me regular expressions as I know them from PHP, and in particular with -A to get the next n lines. Then I piped the result ("|") into grep -P again to filter it.

Related

Can grep show output only if the line contain another search string? [duplicate]

I am trying to extract text from a file between a < and a >, but only on a line starting with another specific pattern.
So in a file that looks like:
XXX Something here
XXX Something more here
XXX <\Lines like this are a problem>
ZZZ something <\This is the text I need>
XXX Don't need any of this
I would like to print only the <\This is the text I need>.
If I do
sed -n '/^ZZZ/p' FILENAME
it pulls the correct lines I need to look at, but obviously prints the whole line.
sed -n '/<\/,/>/p' FILENAME prints way too much.
I have looked into grouping and tried
sed -n '/^ZZZ/{/<\/,/>/} FILENAME
but this doesn't seem to work at all.
Any suggestions? They will be much appreciated.
(Apologies for formatting, never posted on here before)
sed -n '/^ZZZ/ { s/^.*\(<.*>\).*$/\1/p }' FILENAME
If it does not have to be sed and you have a fairly recent grep, you can use grep's -o option, as in
grep '^ZZZ' FILENAME | grep -o '<[^>]*>'
An awk version
awk -F"<|>" '/^ZZZ/ {print "<"$2">"}' file
<\This is the text I need>
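A quick way to check the grep variant is to run it against a copy of the sample from the question. This is only a sketch; the temporary file stands in for FILENAME.

```shell
# Recreate part of the sample input in a temporary file (stand-in for FILENAME).
sample=$(mktemp)
cat > "$sample" <<'EOF'
XXX Something here
XXX <\Lines like this are a problem>
ZZZ something <\This is the text I need>
XXX Don't need any of this
EOF

# First keep only the lines that start with ZZZ,
# then let -o print just the <...> part of each of those lines.
extracted=$(grep '^ZZZ' "$sample" | grep -o '<[^>]*>')
printf '%s\n' "$extracted"

rm -f "$sample"
```

The XXX line with angle brackets is dropped by the first grep, so the second one never sees it.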

Extract substring with sed/awk/grep from .gff file

I have a file containing multiple lines like this:
NODE_1_length Prodigal:2.6 CDS 11 274 . + 0 ID=PROKKA_00001;inference=ab initio prediction:Prodigal:2.6;locus_tag=PROKKA_00001;product=hypothetical protein
And I want to extract the ID=PROKKA_[whatever number] and everything that comes after 'product=' to obtain an output like this:
ID=PROKKA_00001 product=hypothetical protein
I am not very skilled in using sed, so I tried to adapt some solutions I found here and elsewhere, but didn't manage to get anywhere. It is also fine if the solution comes in two steps (one for the ID, one for the product); then I can merge the two results into a single file.
I would be grateful if you could include an explanation of the regex used.
So far I tried to split the problem in two (starting from the ID) and tried:
grep -o 'ID=PROKKA_[0-9]{1,5}*'
sed 's/^ID=PROKKA[0-9]*;//g/
grep -Po 'ID="K[^"]*'
but of course none of them worked.
Thanks for helping!
You may use grep -oE:
grep -oE 'ID=PROKKA_[0-9]+|product=[^;:]+' file
ID=PROKKA_00001
product=hypothetical protein
If you want the result on the same line, use grep + paste:
grep -oE 'ID=PROKKA_[0-9]+|product=[^;:]+' file | paste -s
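Since the question asked for an explanation of the regex: [0-9]+ matches one or more digits after ID=PROKKA_, and [^;:]+ matches the run of characters after product= up to the next ; or :. Note that paste -s joins all of its input into one tab-separated line, so for a file with several records it may be easier to pair the lines two at a time. The sketch below assumes every record contains exactly one ID and one product, in that order; the second record is invented for the demo.

```shell
# Two sample records; the second one is hypothetical.
gff=$(mktemp)
cat > "$gff" <<'EOF'
NODE_1_length Prodigal:2.6 CDS 11 274 . + 0 ID=PROKKA_00001;inference=ab initio prediction:Prodigal:2.6;locus_tag=PROKKA_00001;product=hypothetical protein
NODE_1_length Prodigal:2.6 CDS 300 500 . + 0 ID=PROKKA_00002;inference=ab initio prediction:Prodigal:2.6;locus_tag=PROKKA_00002;product=some enzyme
EOF

# grep -o prints each match on its own line (ID, product, ID, product, ...).
# paste - - reads two lines at a time and joins them with the -d delimiter.
pairs=$(grep -oE 'ID=PROKKA_[0-9]+|product=[^;:]+' "$gff" | paste -d ' ' - -)
printf '%s\n' "$pairs"

rm -f "$gff"
```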

How to grep for a matching word, not the surrounding line, with a wildcard?

Maybe an odd question, but I'm attempting to grep the output of a command to select just the matching word and not the line. This word also has a wildcard in it.
git log --format=%aD <file> | tail -1 | grep -oh 201
The first and second parts of the command check the git log for a file and grab the line with the date and time of creation. I'm attempting to write a bash script that does something with the year the file was created, so I need to grab just that one word (the year).
Looking at the grep documentation, -o specifically prints the matching word (and -h suppresses filenames). I can't find anything that allows for matching the rest of the word that it's matching, though (I could just be spacing).
So the output of that previous command is:
201
And I need it to be (as an example):
2017
Help would be much appreciated!
You can use . as a wildcard character:
$ echo 'before2017after' | grep -o '201.'
2017
Or, better yet, specify that the fourth character be a digit:
$ echo 'before2017after' | grep -o '201[[:digit:]]'
2017
Notes:
Since you are getting input from stdin, there are no filenames. Consequently, in this case, -h changes nothing.
[[:digit:]] is a unicode-safe way of specifying a digit.
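For use in a script, the match can be captured in a variable. A minimal sketch: the date string below is made up, but has the same shape as the RFC 2822 style output of git log --format=%aD.

```shell
# Hypothetical date line, shaped like the output of: git log --format=%aD
date_line='Tue, 3 Jan 2017 12:34:56 +0100'

# -o prints only the part of the line that matched the pattern.
year=$(printf '%s\n' "$date_line" | grep -o '201[[:digit:]]')
printf '%s\n' "$year"
```

Keep in mind that the pattern 201[[:digit:]] only covers the years 2010 to 2019.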

Output grep results to text file, need cleaner output

When using the grep command to find a search string in a set of files, how do I dump the results to a text file?
Also, is there a switch for the grep command that provides cleaner results for better readability, such as a line feed between each entry or a way to justify file names and search results?
For instance, a way to change...
./file/path: first result
./another/file/path: second result
./a/third/file/path/here: third result
to
./file/path: first result
./another/file/path: second result
./a/third/file/path/here: third result
grep -n "YOUR SEARCH STRING" * > output-file
The -n will print the line number and the > will redirect grep-results to the output-file.
If you want to "clean" the results you can filter them using pipe | for example:
grep -n "test" * | grep -v "mytest" > output-file
This matches all lines that contain the string "test" except those that also match the string "mytest" (that is what the -v switch does) and redirects the result to an output file.
A few good grep-tips can be found in this post
Redirection of program output is performed by the shell.
grep ... > output.txt
grep has no mechanism for adding blank lines between each match, but does provide options such as context around the matched line and colorization of the match itself. See the grep(1) man page for details, specifically the -C and --color options.
To add a blank line between lines of text in grep output to make it easier to read, pipe (|) it through sed:
grep text-to-search-for file-to-grep | sed G
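The pieces from the answers above (filtering with -v, spacing with sed G, and redirection by the shell) can be combined in one pipeline. This is just a sketch; the file names a.txt, b.txt and results.txt and their contents are invented for the demo.

```shell
# Two small sample files in a temporary directory (hypothetical data).
work=$(mktemp -d)
printf 'first test line\nunrelated\n' > "$work/a.txt"
printf 'a mytest line\nsecond test line\n' > "$work/b.txt"

(
  cd "$work" || exit 1
  # Search both files, drop the "mytest" hits, append a blank line after
  # each remaining result, and redirect the final output into a file.
  grep -n 'test' a.txt b.txt | grep -v 'mytest' | sed G > results.txt
)

spaced=$(cat "$work/results.txt")
printf '%s\n' "$spaced"

rm -rf "$work"
```

The redirection only applies to the last command in the pipeline, so results.txt receives the filtered and spaced output.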

How to find the line of text in the file "data.txt" that occurs only once

The line I seek is stored in the file data.txt and is the only line of text that occurs only once.
How do I go about finding that particular line using Linux?
This is a little bit old, but I think you are looking for this...
cat data.txt | sort | uniq -u
This will show the values that occur only once in the file. I assume you are familiar with OverTheWire if you are asking? If so, this is what you are looking for.
To provide some context (I need more rep to comment) this is a question that features in an online "wargame" called Bandit that involves using the command line to discover passwords on an online Linux server to advance up the levels.
For those who would like to see data.txt in full, I've Pastebin'd it here; in short, it looks like this:
NN4e37KW2tkIb3dC9ZHyOPdq1FqZwq9h
jpEYciZvDIs6MLPhYoOGWQHNIoQZzE5q
3rpovhi1CyT7RUTunW30goGek5Q5Fu66
JOaWd4uAPii4Jc19AP2McmBNRzBYDAkO
JOaWd4uAPii4Jc19AP2McmBNRzBYDAkO
9WV67QT4uZZK7JHwmOH0jnhurJMwoGZU
a2GjmWtTe3tTM0ARl7TQwraPGXgfkH4f
7yJ8imXc7NNiovDuAl1ZC6xb0O0mMBx1
UsvVyFSfZZWbi6wgC7dAFyFuR6jQQUhR
FcOJhZkHlnwqcD8QbvjRyn886rCrnWZ7
E3ugYDa6Wh2y8C8xQev7vOS8O3OgG1Hw
E3ugYDa6Wh2y8C8xQev7vOS8O3OgG1Hw
ME7nnzbId4W3dajsl6Xtviyl5uhmMenv
J5lN3Qe4s7ktiwvcCj9ZHWrAJcUWEhUq
aouHvjzagN8QT2BCMB6e9rlN4ffqZ0Qq
ZRF5dlSuwuVV9TLhHKvPvRDrQ2L5ODfD
9ZjR3NTHue4YR6n4DgG5e0qMQcJjTaiM
QT8Bw9ofH4x3MeRvYAVbYvV1e1zq3Xim
i6A6TL6nqvjCAPvOdXZWjlYgyvqxmB7k
tx7tQ6kgeJnC446CHbiJY7fyRwrwuhrs
One way to do it is to use:
sort data.txt | uniq -u
The sort command is like cat in that it displays the contents of the file; however, it sorts the file lexicographically by lines (it reorders them alphabetically so that matching ones are together).
The | is a pipe that redirects the output from one command into another.
The uniq command reports or omits repeated lines and by passing it the -u argument we tell it to report only unique lines.
Used together like this, the command will sort data.txt lexicographically by each line, find the unique line and print it back in the terminal for you.
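To see why the sort step matters, here is a small sketch with made-up data. uniq only compares adjacent lines, so on unsorted input it treats every line as unique.

```shell
# Hypothetical data: "ccc" is the only line that occurs exactly once.
data=$(mktemp)
printf '%s\n' bbb aaa ccc aaa bbb > "$data"

# Sorting puts identical lines next to each other, so uniq -u can drop them.
unique_line=$(sort "$data" | uniq -u)
printf '%s\n' "$unique_line"

# Without sorting, no two adjacent lines are equal, so nothing is dropped.
unsorted_count=$(uniq -u "$data" | wc -l)
printf '%s\n' "$unsorted_count"

rm -f "$data"
```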
sort -u data.txt | while IFS= read -r line; do if [ "$(grep -cxF -e "$line" data.txt)" -eq 1 ]; then printf '%s\n' "$line"; fi; done
was my solution, until I saw the easier one here:
sort data.txt | uniq -u
Add more information to your post.
What does data.txt look like?
Like this:
11111111
11111111
pass1111
11111111
Or like this
afawfdgd
password
somethin
gelse...
And do you know the password that is in the file, or are you searching for a string that is not repeated?
If you know the password, use something like this:
cat data.txt | grep 'password'
If you don't know the password, and the password is the only unique line in the file, you need to write a script.
For example, in Python:
# Print every line of data.txt that contains the search string.
with open("data.txt", "r") as file:
    for line in file:
        if 'pass' in line:
            print(line, end="")
Of course, replace 'pass' with something else.
For example, some slice of the line.
And a solution that uses only one tool, awk:
awk '{a[$1]++}END{for(i in a){if(a[i] == 1){print i} }}' data.txt
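One caveat: $1 is only the first whitespace-separated field, which is fine for data.txt because its lines contain no spaces. For lines that do contain spaces, counting on $0 (the whole line) is probably what you want. A minimal sketch with made-up input:

```shell
# Hypothetical input where all lines share the same first field.
awk_unique=$(printf '%s\n' 'x y' 'x z' 'x y' |
  awk '{ count[$0]++ }                # count each complete line
       END {
         for (line in count)          # print the lines seen exactly once
           if (count[line] == 1) print line
       }')
printf '%s\n' "$awk_unique"
```

With $1 as the key, all three lines would be counted under "x" and nothing would be printed.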
sort data.txt | uniq -c | grep '^ *1 '
This prints only the text that occurs exactly once.
Do not forget the space after the 1.
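sort data.txt | uniq -c | grep 1
This matches every line whose count or text contains a 1, so the line that occurs once is among the results; anchor the pattern (for example grep '^ *1 ') to get only that line.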
