Filter out some input with GREP - linux

Echo "Hello everybody!"
I need to check whether the input argument of a Linux script complies with my security needs. It should contain only a-z characters, 0-9 digits, spaces and the "+" sign, e.g. "for 3 minutes do r51+r11".
This didn't work for me:
if grep -v '[0123456789abcdefghijklmnopqrstuvwxyz+ ]' /tmp/input; then
echo "THIS DOES NOT COMPLY!";
fi
Any clues?

You are telling grep:
Show me every line that does not contain any of [0123456789abcdefghijklmnopqrstuvwxyz+ ]
That only shows you lines that contain none of the characters above. So a line containing only other characters, like (), would match, but asdf() would not.
Instead, have grep show you every line that contains a character not in the list above:
if grep '[^0-9a-z+ ]' file; then
That is, if grep finds anything that is not a digit, a lowercase letter, a plus or a space, then the input does not comply.
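A minimal sketch of that negated-class check, wrapped in a function. The name is_clean is hypothetical, the class is restricted to lowercase a-z as the question asks, and LC_ALL=C is an assumption added so that the a-z range is not affected by the locale's collation order.

```shell
#!/bin/sh
# is_clean (hypothetical helper): succeeds only if the argument contains
# no character outside 0-9, a-z, "+" and space.
is_clean() {
    # grep -q exits 0 when it finds a line with at least one disallowed character.
    # LC_ALL=C keeps the a-z range limited to ASCII lowercase letters.
    if printf '%s\n' "$1" | LC_ALL=C grep -q '[^0-9a-z+ ]'; then
        return 1
    fi
    return 0
}

if is_clean 'for 3 minutes do r51+r11'; then
    echo "complies"
fi
if ! is_clean 'asdf()'; then
    echo "THIS DOES NOT COMPLY!"
fi
```

Passing the value through printf instead of a file in /tmp avoids leaving the untrusted input on disk, but reading from a file as in the question works the same way.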

You want to test the entire row (assuming there is only one row in /tmp/input), not just whether a single character anywhere matches, so you need to anchor it to the start and end of the row. Try this regexp:
^[0123456789abcdefghijklmnopqrstuvwxyz+ ]*$
Note that you can shorten this using ranges:
^[0-9a-z+ ]*$
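Here is a rough sketch of how the anchored pattern could be used in a script. The function name complies is made up for illustration, and LC_ALL=C is an assumption so that a-z means ASCII lowercase only.

```shell
#!/bin/sh
# complies (hypothetical helper): succeeds if the whole line consists only of
# digits, lowercase letters, "+" and spaces.
complies() {
    # ^ and $ force the character class to cover the entire line
    printf '%s\n' "$1" | LC_ALL=C grep -q '^[0-9a-z+ ]*$'
}

if complies 'for 3 minutes do r51+r11'; then
    echo "complies"
fi
if ! complies 'for 3 minutes; reboot'; then
    echo "THIS DOES NOT COMPLY!"
fi
```

With multi-line input this succeeds as soon as one line complies, so it relies on the single-row assumption mentioned above.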

Related

How to print lines of file that start with d and end with number

I have a file named korad and I need to print all lines of this file that start with "d" and end with a number. This is what I tried:
grep "^d*[0-9]$" korad
What about this:
grep "^d" korad | grep "[0-9]$"
This first keeps the lines starting with the letter "d", and then filters those results down to the lines ending with a number. That way, you don't need to worry about what is present between the first and the last character.
In case you don't understand the vertical bar, it's called a pipe: it sends the output of the first command to the input of the second.
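grep -P '^d(?:.*)\d$' korad
This gets all lines starting with d and ending with a digit. Note that (?:...) and \d are Perl-style syntax, so this needs grep -P (where available) rather than grep -E.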
Suggesting:
grep "^d.*[[:digit:]]$" korad
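To see why the original attempt misses lines, here is a small sketch that runs the three variants side by side. The sample lines and the variable names are invented for illustration; they are not the contents of the real korad file.

```shell
#!/bin/sh
# korad_sample is a hypothetical stand-in for the korad file
korad_sample=$(mktemp)
printf '%s\n' 'data 1' 'd9' 'dog' 'x d5' 'ddd7' > "$korad_sample"

# Original attempt: "d*" means "zero or more d", so only runs of d
# followed directly by a single final digit match
attempt=$(grep '^d*[0-9]$' "$korad_sample")

# One d at the start, anything in between, a digit at the end
fixed=$(grep '^d.*[0-9]$' "$korad_sample")

# The two-stage pipe gives the same lines
piped=$(grep '^d' "$korad_sample" | grep '[0-9]$')

printf 'attempt:\n%s\nfixed:\n%s\npiped:\n%s\n' "$attempt" "$fixed" "$piped"
rm -f "$korad_sample"
```

On this sample the attempt only finds "d9" and "ddd7", while the other two also find "data 1".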

grep obtains pattern from a file but printing not only the whole match word

I've got file.txt to extract lines containing the exact words listed in check.txt file.
# file.txt
CA1C 2637 green
CA1C-S1 2561 green
CA1C-S2 2371 green
# check.txt
CA1C
I tried
grep -wFf check.txt file.txt
but I'm not getting the desired output: all three lines are printed.
Instead, I'd like to get only the first line,
CA1C 2637 green
I searched and found a relevant post; it's easy to do when matching only one word. But how can I improve my command so that grep takes its patterns from the check.txt file and prints only the lines where the whole word matches?
A lot of thanks!
The man page for grep says the following about the -w switch:
-w, --word-regexp
Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.
In your case, all three lines start with "CA1C" followed by either a space or a hyphen, which meets the conditions of being at the beginning of the line and being followed by a non-word constituent character.
I would do this with a loop, reading lines manually from check.txt:
while IFS= read -r line; do grep "^$line " file.txt; done < check.txt
CA1C 2637 green
This loop reads the lines from check.txt, and searches for each one at the start of a line in file.txt, with a following space.
There may be a better way to do this, but I couldn't get -f to actually consider whitespace at the end of a line of the input file.
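For what it's worth, here is a self-contained sketch of the loop on the sample data, next to one possible alternative. The awk variant is my own assumption, not part of the answer above: it compares the first whitespace-separated field exactly instead of building a regex. The temporary directory and variable names are hypothetical.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf '%s\n' 'CA1C 2637 green' 'CA1C-S1 2561 green' 'CA1C-S2 2371 green' > "$workdir/file.txt"
printf '%s\n' 'CA1C' > "$workdir/check.txt"

# Loop from the answer: the word must start the line and be followed by a space.
# Note that each line of check.txt is still interpreted as a regex here.
loop_out=$(while IFS= read -r line; do
    grep "^$line " "$workdir/file.txt"
done < "$workdir/check.txt")

# Alternative sketch: remember every word in check.txt, then print the lines
# of file.txt whose first field is one of them (plain string comparison)
awk_out=$(awk 'NR == FNR { want[$1]; next } $1 in want' "$workdir/check.txt" "$workdir/file.txt")

echo "$loop_out"
echo "$awk_out"
rm -rf "$workdir"
```

Both commands print only the "CA1C 2637 green" line for this input. The awk form reads file.txt once no matter how many words check.txt contains.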

Removing number of dots with grep using regular expression

How can I remove lines that contain more or fewer than 5 dots (simply put: keep only lines with exactly 5 dots)?
How can I write a regex that will detect it in bash using grep?
INPUT:
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
EXPECTED OUTPUT:
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
Tried:
grep -P '[.]{5}' stuff.txt
grep -P '[\.]{5}' stuff.txt
grep -P '([\.]{5})' stuff.txt
grep -P '\.{5}' stuff.txt
grep -E '([\.]{5}' stuff.txt
You can display only the lines that contain exactly 5 dots as follows:
grep '^[^.]*\.[^.]*\.[^.]*\.[^.]*\.[^.]*\.[^.]*$' stuff.txt
or, if you want to factor it:
grep -E '^([^.]*\.){5}[^.]*$' stuff.txt
Using -E (ERE) in the second one avoids having to escape the parentheses and braces as \(\) and \{\}; in the first one, grep's default BRE regex flavour is sufficient.
^ and $ are anchors representing respectively the start and end of the line that make sure we match the whole line and not just a part of it that contains 5 dots.
[^.] is a negated character class that will match anything but a dot.
They are quantified with * so that any number of non-dot characters can happen between each dot (you might want to change that to + if consecutive dots shouldn't be matched).
\. matches a literal dot (rather than any character, which the meta-character . outside of a character class would).
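A quick way to try the factored pattern is to run it on the two input lines from the question. This is only a sketch; the temporary file stands in for stuff.txt and the variable names are made up.

```shell
#!/bin/sh
stuff=$(mktemp)   # hypothetical stand-in for stuff.txt
cat > "$stuff" <<'EOF'
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
EOF

# Lines with exactly five dots (three in the address, two in the URL)
five_dots=$(grep -E '^([^.]*\.){5}[^.]*$' "$stuff")

# -v inverts the match: lines with any other number of dots
other=$(grep -vE '^([^.]*\.){5}[^.]*$' "$stuff")

printf 'kept:\n%s\nremoved:\n%s\n' "$five_dots" "$other"
rm -f "$stuff"
```

The first input line has eight dots in total, so it ends up under "removed"; the second has five and is kept.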
To detect specifically the bad IP address
Can you be certain that the IP address is always surrounded by commas and does not contain spaces - i.e. is never the first or last field?
Then, you might get away with:
grep -E ',\w+((\.\w+){2,3}|(\.\w+){5,}),'
If not, it is quite difficult to distinguish between a broken IP form with spaces and an ordinary sentence, so you might have to specify the column.
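If you do go the "specify the column" route, something along these lines might work. It is a sketch under assumptions: the address is taken to be the second comma-separated field, and looks_like_ipv4 is a hypothetical helper that checks only the shape (four dot-separated groups of one to three digits), not that each group is at most 255.

```shell
#!/bin/sh
# looks_like_ipv4 (hypothetical helper): shape check only, no range check
looks_like_ipv4() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
}

line='yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json'

# Assumption: the address is always the second comma-separated field
field=$(printf '%s\n' "$line" | cut -d, -f2)

if looks_like_ipv4 "$field"; then
    echo "ok: $field"
else
    echo "bad: $field"
fi
```

Checking a single field sidesteps the dots in the URL entirely, so the pattern no longer has to count dots across the whole line.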
Using a Perl one-liner to print only if the number of "." exceeds 5
> cat five_dots.txt
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
> perl -ne '{ while(/\./g){$count++} print if $count > 5; $count=0 } ' five_dots.txt
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
>
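The same count can be sketched without Perl. This swaps in a different technique from the one-liner above: awk with the dot as field separator, so the number of dots on a line is NF - 1. The temporary file and variable names are hypothetical.

```shell
#!/bin/sh
dots_file=$(mktemp)   # hypothetical stand-in for five_dots.txt
cat > "$dots_file" <<'EOF'
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
EOF

# Splitting on a literal dot gives NF fields, hence NF - 1 dots
more_than_five=$(awk -F'[.]' 'NF - 1 > 5' "$dots_file")
exactly_five=$(awk -F'[.]' 'NF - 1 == 5' "$dots_file")

printf 'more than five:\n%s\nexactly five:\n%s\n' "$more_than_five" "$exactly_five"
rm -f "$dots_file"
```

Changing the comparison is all it takes to switch between "more than 5" and "exactly 5", which the question's expected output seems to want.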

Do not print unmatched text with sed

I want to print only matched lines and strip unmatched ones, but with the following:
$ echo test12 test | sed -n 's/^.*12/**/p'
I always get:
** test
instead of:
**
What am I doing wrong?
[edit1]
Here is more information about what I need, which I should have started with. I have a command that produces lots of lines of output, and I want to grab only the matching parts of the lines and strip the rest. In the example above, 12 was meant to mark the end of the matched part of the line, and instead of ** I should have put &, which represents the matched string. So the full example is:
echo test12 test | sed -n 's/^.*12/&/p'
which produces exactly the same output as input:
test12 test
the expected output is:
test12
As suggested I started to find a grep alternative and the following looks promising:
$ echo test12 test | grep -Eo "^.*12"
but I don't see how to format the matched part; this only strips the unmatched text.
EDIT: In some cases, the -E flag might be needed for sed, in which case the parentheses no longer need to be escaped. Check your sed's man page.
I think what you are looking for is this:
echo test12 test | sed -n 's/^\(.*12\).*$/\1/p'
If you want to discard the rest of the line, you have to match it as well, but not include it in the output. The \( and \) denote a group that is then referenced by \1.
Good luck :)
Additional information on sed:
sed works on lines, and the ampersand character represents the part of the line that was matched by the given regular expression, not the whole line. If a regex is "open" at the end (i.e. doesn't end with the end-of-line anchor $), the match stops where the pattern stops and the rest of the line is left untouched, which is why the unmatched text still shows up in the output.
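A small sketch comparing & with a capture group on the example line may make this clearer. The variable names and the "match=[...]" formatting are invented for illustration.

```shell
#!/bin/sh
line='test12 test'

# & stands for the matched text only, so the unmatched tail " test" survives
with_amp=$(printf '%s\n' "$line" | sed -n 's/^.*12/&/p')

# Matching the tail with .*$ and leaving it out of the replacement drops it
with_group=$(printf '%s\n' "$line" | sed -n 's/^\(.*12\).*$/\1/p')

# The captured group can be formatted freely in the replacement
formatted=$(printf '%s\n' "$line" | sed -n 's/^\(.*12\).*$/match=[\1]/p')

printf '%s\n' "$with_amp" "$with_group" "$formatted"
```

This prints "test12 test", then "test12", then "match=[test12]". Lines that do not contain 12 produce no output at all, because -n suppresses the default print and the p flag only fires on a successful substitution.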
Try:
echo test12 test | sed -n 's/^.*/**/p'
Here .* already matches the whole line, including the 12, so you don't need to mention 12 to replace everything. Note that this pattern matches every line, so it no longer filters on 12.
Your regular expression matches everything from the beginning of the line up to and including '12'. Only that matched part is replaced with '**', which is why you get '** test'. If you only want the match, I recommend using grep.

Grep words containing 'n' number of letters given user input

I am trying to create a script (bash) that will take input (an integer) from a user and grep all words containing that number of letters. I am okay with how grep basically works, but I am unsure how to use input from the user to determine the output.
Here is what I started:
#!/bin/sh
echo " Content type: text/html"
echo
x=`expr $1`
I'm pretty sure the grep command would be as simple as grep ^...integer from user$. I just don't know how to use the user input. Thanks!
EDIT: I should have mentioned that "user input" would be entered as an argument (./script 6)
Run this script as ./script 6 and it will select all 6-letter words from the file text and display them:
#!/bin/sh
grep -Eo "\<[[:alpha:]]{$1}\>" text
Key parts of the regex:
\< signifies the start of a word.
[[:alpha:]]{$1} signifies $1 alphabetical characters. If you want an apostrophe, such as in don't, to be considered a valid word character, then add it inside the outer square brackets like this: [[:alpha:]']{$1}
\> signifies the end of a word.
There are some limitations to grep's ability to understand human-language. For example, in the string don't, it considers the apostrophe to be a word boundary.
Example
I ran this script against the text of the question:
$ ./script.sh 9
basically
determine
mentioned
$ ./script.sh 10
containing
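Since $1 is pasted straight into the regex, it may be worth checking that it really is a number first. Below is a sketch of that idea; the function name words_of_length, the usage message and the sample sentence are my own assumptions, and \< and \> are used as in the script above (they work in GNU grep).

```shell
#!/bin/sh
# words_of_length (hypothetical wrapper): print the words of exactly N letters
# found in a file; the file name defaults to "text" as in the script above.
words_of_length() {
    case "$1" in
        # Reject an empty argument or anything containing a non-digit
        ''|*[!0-9]*) echo "usage: words_of_length N [file]" >&2; return 2 ;;
    esac
    grep -Eo "\<[[:alpha:]]{$1}\>" "${2:-text}"
}

sample=$(mktemp)
printf '%s\n' 'I am okay with how grep basically works' > "$sample"

four=$(words_of_length 4 "$sample")
echo "$four"
rm -f "$sample"
```

On the sample sentence this prints okay, with and grep, one per line. An argument such as "4}|.*|x{1" is rejected before it can change the meaning of the pattern.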
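You can use read to accept input from the user, or pass it as positional parameters:
#!/bin/sh
echo "$1" | grep ".\{$2\}"
Now if you call the script as ./script hello 5, the positional parameter $1 will be hello and $2 will be 5.
Here .\{m\} matches any character exactly m times. Because the pattern is not anchored, it matches any line with at least m characters; use ^.\{m\}$ to match lines of exactly length m.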
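To illustrate what the interval does with and without anchors, here is a small sketch. The helper names has_at_least and has_exactly are hypothetical.

```shell
#!/bin/sh
# has_at_least (hypothetical): the unanchored interval matches any run of N
# characters somewhere in the line, so longer lines match too
has_at_least() {
    printf '%s\n' "$1" | grep -q ".\{$2\}"
}

# has_exactly (hypothetical): ^ and $ pin the N characters to the whole line
has_exactly() {
    printf '%s\n' "$1" | grep -q "^.\{$2\}\$"
}

for word in hi hello 'hello world'; do
    if has_at_least "$word" 5; then echo "$word: at least 5"; fi
    if has_exactly "$word" 5; then echo "$word: exactly 5"; fi
done
```

Here "hi" prints nothing, "hello" prints both messages, and "hello world" prints only the "at least 5" message.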

Resources