Grep words containing 'n' number of letters given user input - linux

I am trying to create a script (bash) that will take input (integer) from a user and grep all words containing that number of letters. I am okay with how grep basically works, but I am unsure how to use input from the user to determine the output.
Here is what I started:
#!/bin/sh
echo " Content type: text/html"
echo
x=`expr $1`
I'm pretty sure the grep command would be as simple as grep ^...integer from user$. I just don't know how to use the user input. Thanks!
EDIT: I should have mentioned that "user input" would be entered as an argument (./script 6)

Run this script as ./script 6 and it will select all 6-letter words from the file text and display them:
#!/bin/sh
grep -Eo "\<[[:alpha:]]{$1}\>" text
Key parts of the regex:
\< signifies the start of a word.
[[:alpha:]]{$1} signifies $1 alphabetical characters. If you want an apostrophe, such as in don't, to be considered a valid word character, then add it inside the outer square brackets like this: [[:alpha:]']{$1}
\> signifies the end of a word.
There are some limitations to grep's ability to understand human language. For example, in the string don't, it considers the apostrophe to be a word boundary.
Example
I ran this script against the text of the question:
$ ./script.sh 9
basically
determine
mentioned
$ ./script.sh 10
containing
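Because $1 is pasted straight into the regex, it may be worth rejecting anything that is not a plain number first. A minimal sketch, assuming a grep that supports \< and \> (as GNU grep does); the function name words_of_length is hypothetical:

```shell
#!/bin/sh
# words_of_length N [FILE...]  (hypothetical helper name)
# Print every word made of exactly N letters, one per line.
# Reads standard input when no FILE is given.
words_of_length() {
    n=$1
    # Reject empty or non-numeric input before it reaches the regex.
    case $n in
        ''|*[!0-9]*)
            echo "usage: words_of_length N [FILE...]" >&2
            return 2
            ;;
    esac
    shift
    # \< and \> are word-boundary extensions (assumed available, as in GNU grep).
    grep -Eo "\<[[:alpha:]]{$n}\>" "$@"
}
```

It would be called as words_of_length 6 text, or fed from a pipe.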

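You can use read to accept input from the user, or take the values as positional parameters:
#!/bin/sh
echo "$1" | grep ".\{$2\}"
Now if you call the script as ./script hello 5, the positional parameter $1 will be hello and $2 will be 5.
Here . matches any character and \{m\} repeats it exactly m times. Because the pattern is not anchored, it matches any line containing at least m characters; add ^ and $ (or use grep -x) to match only lines of exactly m characters.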
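An exact-length check can be sketched with grep -x, which requires the pattern to match the whole line. The function name has_length is hypothetical:

```shell
#!/bin/sh
# has_length STRING N  (hypothetical helper name)
# Succeed only if STRING is exactly N characters long.
has_length() {
    # -x makes the pattern match the whole line, -q suppresses output.
    printf '%s\n' "$1" | grep -qx ".\{$2\}"
}
```

For example, has_length hello 5 succeeds, while has_length hello 4 fails.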

Related

Removing number of dots with grep using regular expression

How can I remove lines that contain more or fewer than 5 dots (simply put: keep only lines with exactly 5 dots)?
How can I write a regex that will detect it in bash using grep?
INPUT:
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
EXPECTED OUTPUT:
yGEtfWYBCBKtvxTbHxwK,176.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
Tried:
grep -P '[.]{5}' stuff.txt
grep -P '[\.]{5}' stuff.txt
grep -P '([\.]{5})' stuff.txt
grep -P '\.{5}' stuff.txt
grep -E '([\.]{5}' stuff.txt
You can display only the lines that contain exactly 5 dots as follows:
grep '^[^.]*\.[^.]*\.[^.]*\.[^.]*\.[^.]*\.[^.]*$' stuff.txt
or, if you want to factor it:
grep -E '^([^.]*\.){5}[^.]*$' stuff.txt
Using -E (ERE) in the second one avoids having to escape the group and quantifier as \(\) and \{\}; in the first one, grep's default BRE flavour is sufficient.
^ and $ are anchors representing respectively the start and end of the line that make sure we match the whole line and not just a part of it that contains 5 dots.
[^.] is a negated character class that will match anything but a dot.
They are quantified with * so that any number of non-dot characters can happen between each dot (you might want to change that to + if consecutive dots shouldn't be matched).
\. matches a literal dot (rather than any character, which the meta-character . outside of a character class would).
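The factored pattern generalises to any number of dots. A small sketch, with the hypothetical function name lines_with_dots, that builds the same regex from a count:

```shell
#!/bin/sh
# lines_with_dots N [FILE...]  (hypothetical helper name)
# Print the lines that contain exactly N literal dots.
lines_with_dots() {
    n=$1
    shift
    # N repetitions of "non-dots then a dot", then trailing non-dots only.
    grep -E "^([^.]*\.){$n}[^.]*\$" "$@"
}
```

With the question's data this would be lines_with_dots 5 stuff.txt.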
To detect specifically the bad IP address
Can you be certain that the IP address is always surrounded by commas and does not contain spaces - i.e. is never the first or last field?
Then, you might get away with:
grep -E ',\w+((\.\w+){2,3}|(\.\w+){5,}),'
If not, it is quite difficult to distinguish between a broken IP form with spaces and an ordinary sentence, so you might have to specify the column.
Using a Perl one-liner to print a line only if its number of "." exceeds 5:
> cat five_dots.txt
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
> perl -ne '{ while(/\./g){$count++} print if $count > 5; $count=0 } ' five_dots.txt
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
>

linux script to find specific words in file names

I need help writing a script to do the following stated below in part a.
The following code will output all of the words found in $filename, each word on a separate line.
for word in `cat $filename`
do
echo $word
done
a. Write a new script which receives two parameters. The first is a file’s name ($1 instead of $filename) and the second is a word you want to search for ($2). Inside the for loop, instead of echo $word, use an if statement to compare $2 to $word. If they are equal, add one to a variable called COUNT. Before the for loop, initialize COUNT to 0 and after the for loop, output a message that tells the user how many times $2 appeared in $1. That is, output $COUNT, $2 and $1 in an echo statement but make sure you have some literal words in here so that the output actually makes sense to the user. HINTS: to compare two strings, use the notation [ $string1 == $string2 ]. To add one to a variable, use the notation X=$((X+1)). If every instruction is on a separate line, you do not need any semicolons. Test your script on /etc/fstab with the word defaults (7 occurrences should be found)
This is what I got so far, but it does not work right. It says it finds 0 occurrences of the word "defaults" in /etc/fstab. I am sure my code is wrong but can't figure out the problem. Help is appreciated.
count=0
echo “what word do you want to search for?: “
read two
for word in “cat $1”
do
if [ “$two” == “$word” ]; then
count=$((count+1))
fi
done
echo $two appeared $count times in $1
You need to use command substitution. As written, you were looping over the literal string cat first_parameter rather than over the contents of the file:
for word in $(cat "$1")
A better way to do this is with grep, paraphrasing "How do I count the number of occurrences of a word in a text file with the command line?":
grep -o "\<$two\>" "$1" | wc -l
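Putting the loop fix together, a sketch of the exercise as a function (count_word is a hypothetical name; it takes the word as a second argument instead of prompting with read). Note that the loop compares whole whitespace-separated words, so an entry such as defaults,noatime is not counted, whereas the grep version with \< and \> would count it:

```shell
#!/bin/sh
# count_word FILE WORD  (hypothetical helper name)
# Count how many whitespace-separated words in FILE are exactly WORD.
# The body runs in a subshell ( ... ) so that set -f stays local.
count_word() (
    set -f                       # keep words like * from expanding to file names
    count=0
    for word in $(cat "$1"); do  # command substitution: loop over the file's words
        if [ "$word" = "$2" ]; then
            count=$((count + 1))
        fi
    done
    echo "$2 appeared $count times in $1"
)
```

It would be called as count_word /etc/fstab defaults.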

Bash: How to extract numbers preceded by _ and followed by

I have the following format for filenames: filename_1234.svg
How can I retrieve the numbers preceded by an underscore and followed by a dot? There can be between one and four digits before the .svg.
I have tried:
width=${fileName//[^0-9]/}
but if the fileName contains a number as well, it will return all numbers in the filename, e.g.
file6name_1234.svg
I found solutions for two underscores (and splitting it into an array), but I am looking for a way to check for the underscore as well as the dot.
You can use simple parameter expansion with substring removal to simply trim from the right up to, and including, the '.', then trim from the left up to, and including, the '_', leaving the number you desire, e.g.
$ width=filename_1234.svg; val="${width%.*}"; val="${val##*_}"; echo $val
1234
note: # trims from left to first-occurrence while ## trims to last-occurrence. % and %% work the same way from the right.
Explained:
width=filename_1234.svg - width holds your filename
val="${width%.*}" - val holds filename_1234
val="${val##*_}" - finally val holds 1234
Of course, there is no need to use a temporary value like val if your intent is that width should hold the width. I just used a temp to protect against changing the original contents of width. If you want the resulting number in width, just replace val with width everywhere above and operate directly on width.
note 2: using shell capabilities like parameter expansion prevents creating a separate subshell and spawning a separate process that occurs when using a utility like sed, grep or awk (or anything that isn't part of the shell for that matter).
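The two expansions can be wrapped in a small function that also checks that what remains is really a number. A sketch only; the name number_from_name is hypothetical:

```shell
#!/bin/sh
# number_from_name FILENAME  (hypothetical helper name)
# Print the digits between the last underscore and the last dot;
# fail if that part is empty or contains anything but digits.
number_from_name() {
    val=${1%.*}      # drop the shortest suffix starting at a dot: the extension
    val=${val##*_}   # drop the longest prefix ending in an underscore
    case $val in
        ''|*[!0-9]*) return 1 ;;
    esac
    printf '%s\n' "$val"
}
```

For file6name_1234.svg this prints 1234, and the 6 in the name is ignored because everything up to the last underscore is removed.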
Try the following code :
filename="filename_6_1234.svg"
if [[ "$filename" =~ ^(.*)_([^.]*)\..*$ ]];
then
echo "${BASH_REMATCH[0]}" #will display 'filename_6_1234.svg'
echo "${BASH_REMATCH[1]}" #will display 'filename_6'
echo "${BASH_REMATCH[2]}" #will display '1234'
fi
Explanation :
=~ : bash operator for regex comparison
^(.*)_([^.]*)\..*$ : we look for any characters, followed by an underscore, followed by any characters other than a dot, followed by a dot and an extension. We create 2 capture groups, one for everything before the last underscore, one for what follows it up to the dot
BASH_REMATCH : array containing the captured groups
Some more ways:
[akshay@localhost tmp]$ filename=file1b2aname_1234.svg
[akshay@localhost tmp]$ after=${filename##*_}
[akshay@localhost tmp]$ echo ${after//[^0-9]}
1234
Using awk:
[akshay@localhost tmp]$ awk -F'[_.]' '{print $2}' <<< "$filename"
1234
I would use
sed 's!_! !g' | awk '{print "_" $NF}'
to get from filename_1234.svg to _1234.svg, then
sed 's!\.svg$!!'
to get rid of the extension, including its dot.
If you set IFS, you can use Bash's built-in read.
This splits the filename at underscores and dots and stores the result in the array a.
IFS='_.' read -a a <<<'file1b2aname_1234.svg'
And this takes the second-to-last element from the array.
echo ${a[-2]}
There's a solution using cut:
name="file6name_1234.svg"
num=$(echo "$name" | cut -d '_' -f 2 | cut -d '.' -f 1)
echo "$num"
-d is for specifying a delimiter.
-f refers to the desired field.
I don't know anything about performance but it's simple to understand and simple to maintain.

filtering output of who with grep and cut

I have this exercise:
Create a bash script that checks whether the user passed as a parameter is connected and, if so, displays when they connected. Hints: use the command who, the grep filter and the command cut.
But I am having trouble solving it.
#!/bin/bash
who>who.txt;
then
grep $1 who.txt
for a in who.txt
do
echo "$a"
done
else
echo "$1 isnt connected"
fi
So first of all I want to keep only the lines of a .txt file in which the user appears, and then loop over them and cut each one so that only the date is left. The problem is that I don't know how to cut here, because the fields are separated by multiple spaces.
I'm really stuck and don't see where to go from here. I'm a beginner with bash.
If I understand correctly, you simply want to check whether a user is logged in, and that is what the users command is for. If you want to wrap it in a short script, you could do something like the following:
#!/bin/bash
[ -z "$1" ] && { ## validate 1 argument given on command line
printf "error: insufficient input, usage: %s username.\n" "${0##*/}" >&2
exit 1
}
## check if that argument is among the logged in users
if users | grep -qw -- "$1" ; then
printf " user: %s is logged in.\n" "$1"
else
printf " user: %s is NOT logged in.\n" "$1"
fi
Example/Use
$ bash chkuser.sh dog
user: dog is NOT logged in.
$ bash chkuser.sh david
user: david is logged in.
cut is a rather awkward tool for parsing who's output, unless you use fixed column positions. In delimiter mode, with -d ' ', each space makes a separate empty field. It's not like awk where fields are separated by a run of spaces.
who(1) output looks like this (and GNU who has no option to cut it down to just the username/time):
$ who
peter tty1 2015-11-13 18:53
john pts/13 2015-11-12 08:44 (10.0.0.1)
john pts/14 2015-11-12 08:44 (10.0.0.1)
john pts/15 2015-11-12 08:44 (10.0.0.1)
john pts/16 2015-11-12 08:44 (10.0.0.1)
peter pts/9 2015-11-14 16:09 (:0)
I didn't check what happens with very long usernames, whether they're truncated or whether they shift the rest of the line over. Parsing it with awk '{print $3, $4}' would feel much safer, since it would still work regardless of exact column position.
But since you need to use cut, let's assume that those exact column positions (time starting at column 23 and running through column 38) are constant across all systems where we want this script to work, and across all terminal widths. who doesn't appear to vary its output for $COLUMNS or for the tty driver's column width (the columns field in stty -a output).
Putting all that together:
#!/bin/sh
who | grep "^$1 " | cut -c 23-38
The regex on the grep command line will only match at the beginning of the line, and has to match a space following the username (to avoid substring matches). Then those lines that match are filtered through cut, to extract only the columns containing the timestamp.
With an empty command-line argument, the pattern becomes ^ followed by a space, which only matches lines that begin with a space, so nothing from who's output is printed. More generally, if the pattern doesn't match anything, the output will be empty. To explicitly detect this and print something else, capture the pipeline output with var=$(pipeline), and test whether it's the empty string.
This will print a time for every separate login from the same user. You could use grep's count limit arg (see the man page) to stop after one match, but it might not be the most recent time. You might use sort -n | head -1 or something.
If you don't have to write a loop in the shell, don't. It's much better to write a pipeline that makes one pass over the data. The shell itself is slow, but as long as it doesn't have to parse every line of what you're dealing with, that doesn't matter.
Also note how I quoted the expansion of $1 with double quotes, to avoid the shell applying word splitting and glob expansion to it.
For more shell stuff, see the Wooledge Bash FAQ and guide. That's a good place to get started learning idioms that don't suck (i.e. don't break when you have filenames and directories with spaces in them, or filenames containing a ?, or lines with trailing spaces that you want to not munge...).
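The suggestion to capture the pipeline output and test it for emptiness could look like the sketch below. The function name login_times is hypothetical, and it reads who-style text on standard input (rather than calling who itself) so that it can be tried on sample data. It keeps the same assumption that the timestamp sits in columns 23-38:

```shell
#!/bin/sh
# login_times USER  (hypothetical helper name)
# Reads who-style output on standard input and prints the login time(s) of USER.
# Assumes the timestamp occupies columns 23-38, as discussed above.
login_times() {
    # grep reads the function's standard input; the trailing space in the
    # pattern avoids matching a longer user name that starts with USER.
    times=$(grep "^$1 " | cut -c 23-38)
    if [ -n "$times" ]; then
        printf '%s\n' "$times"
    else
        printf '%s is not connected\n' "$1"
        return 1
    fi
}
```

In a script it would be used as who | login_times "$1".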

Filter out some input with GREP

Echo "Hello everybody!"
I need to check whether the input argument of a linux script does comply with my security needs. It should contain only a-z characters, 0-9 digits, some spaces and the "+" sign. Eg.: "for 3 minutes do r51+r11"
This didn't work for me:
if grep -v '[0123456789abcdefghijklmnopqrstuvwxyz+ ]' /tmp/input; then
echo "THIS DOES NOT COMPLY!";
fi
Any clues?
You are telling grep:
Show me every line that does not contain [0123456789abcdefghijklmnopqrstuvwxyz+ ]
This only shows lines that contain none of the characters above. So a line containing only other characters, like (), would match, but asdf() would not.
Try instead to have grep show you every line that contains a character not in the list above:
if grep '[^0-9A-Za-z+ ]' file; then
That is: if grep finds anything that's not a number, a letter, a plus or a space, then the input does not comply.
You want to test the entire row (assuming there is only one row in /tmp/input), not just whether a single character anywhere matches, so you need to anchor the pattern to the start and end of the row. Try this regexp:
^[0123456789abcdefghijklmnopqrstuvwxyz+ ]*$
Note that you can shorten this using ranges:
^[0-9a-z+ ]*$
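As a sketch, the check can be wrapped in a function that tests a string instead of a file. The name is_allowed is hypothetical, and the sketch assumes the input is a single line:

```shell
#!/bin/sh
# is_allowed STRING  (hypothetical helper name)
# Succeed if STRING holds only a-z, 0-9, "+" and spaces.
is_allowed() {
    # LC_ALL=C keeps the a-z range from matching other letters in some
    # locales. grep treats a newline as a line separator, so this sketch
    # assumes single-line input.
    if printf '%s\n' "$1" | LC_ALL=C grep -q '[^0-9a-z+ ]'; then
        return 1    # found a character outside the allowed set
    fi
    return 0
}
```

For example, is_allowed "for 3 minutes do r51+r11" succeeds, while is_allowed "rm -rf /" fails because of the - and / characters.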

Resources