How to print the longest word in a file by using a combination of grep and wc - linux

I am trying to find the longest word in a text file.
I found the number of characters in the longest word in the file
by using the command
wc -L
I need to print the longest word by using this number and the grep command.

If you must use the two commands given, I'd suggest:
grep -E ".{$(wc -L < test.txt)}" test.txt
The command substitution is used to build the correct brace expression to match the line(s) with exactly the given number of characters. -E is needed to enable extended regular expression support; otherwise, the braces need to be escaped: grep ".\{...\}" test.txt.
Using an awk command that makes a single pass through the file may be faster.
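As a rough sketch of the single-pass awk idea: wc -L (a GNU extension) reports the length of the longest line, so the grep approach finds the longest word only when each word sits on its own line. The loop below instead compares every whitespace-separated field. The sample file and its contents are made up for illustration.

```shell
# Hypothetical sample file; substitute your own path.
sample=$(mktemp)
printf '%s\n' 'a quick brown fox' 'jumps over extraordinary dogs' > "$sample"

# Visit every whitespace-separated field once and remember the longest one.
# On a tie, the first word of that length is kept.
longest=$(awk '{
    for (i = 1; i <= NF; i++)
        if (length($i) > max) { max = length($i); word = $i }
}
END { print word }' "$sample")

echo "$longest"
rm -f "$sample"
```

Punctuation attached to a word counts toward its length here; strip it first if that matters for your data.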

Related

Bash script - Get part of a line of text from another file

I'm quite new to bash scripting. I have a script where I want to extract part of the value of a particular line in a separate config file and then use that value as a variable in the script.
For example:
Line 75 in a file named config.cfg
"ssl_cert_location=/etc/ssl/certs/thecert.cer"
I want just the value at the end of "thecert.cer" to then use in the script. I've tried awk and various uses of grep but I can't quite get just the name of the certificate.
Any help would be appreciated. Thanks
These are some examples of the commands I ran:
awk -F "/" '{print $4}' config.cfg
grep -o *.cer config.cfg
Is it possible to extract the value on that line and then edit the output so it just contains the name of the certificate file?
This is a pure Bash version of the basic functionality of basename:
cert=${line##*/}
which removes everything up to and including the final slash. It presupposes that you've already read the line.
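One way to get the line into a variable first is a plain read loop. This is only a sketch: the temporary file stands in for config.cfg, and its contents are assumed from the example in the question.

```shell
# Hypothetical config file standing in for config.cfg.
cfg=$(mktemp)
printf '%s\n' 'other_setting=1' 'ssl_cert_location=/etc/ssl/certs/thecert.cer' > "$cfg"

cert=
while IFS= read -r line; do
    case $line in
        ssl_cert_location=*)
            cert=${line##*/}   # strip everything up to and including the last slash
            break
            ;;
    esac
done < "$cfg"

echo "$cert"
rm -f "$cfg"
```

No external command is started, which can matter if the lookup runs many times in a loop.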
Or, using sed:
cert=$(sed -n '75s/^.*\///p' filename)
or
cert=$(sed -n '/^ssl_cert_location=/s/^.*\///p' filename)
This gets the specified line based on the line number or the setting name and replaces everything up to and including the final slash with nothing. It ignores all other lines in the file (unless the setting is repeated in the case of the text match version). The text match version is better because it works no matter what line number the setting is on.
grep uses regular expressions (as does sed). The grep command in your question appears to use a glob expression, which won't work. One way to use grep (GNU grep) is to use the PCRE feature (Perl Compatible Regular Expressions):
cert=$(grep -Po '^ssl_cert_location=.*/\K.*' filename)
This works similarly to the sed command.
I have anchored the regular expressions to the beginning of the line. If there may be leading white spaces (the line may be indented), change the regex so it looks something like this:
^[[:space:]]*ssl_cert_location=
which works for both indented and unindented lines.
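The same anchoring can be applied to the sed variant, which avoids depending on GNU grep's -P option. A small sketch, using a made-up config file with an indented setting:

```shell
# Hypothetical config file with an indented setting line.
cfg=$(mktemp)
printf '%s\n' '# sample config' '    ssl_cert_location=/etc/ssl/certs/thecert.cer' > "$cfg"

# Allow optional leading whitespace before the setting name,
# then delete everything up to and including the last slash.
cert=$(sed -n '/^[[:space:]]*ssl_cert_location=/s/^.*\///p' "$cfg")

echo "$cert"
rm -f "$cfg"
```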
There are many variants, but a simple one that comes to mind with grep is first getting the line, then matching only non-slashes at the end of the line:
<config.cfg grep '^ssl_cert_location=' | grep -o '[^/]*$'
Why didn't your grep command (grep -o *.cer config.cfg) work? Because *.cer is a shell glob pattern, and the shell expands it to matching file names before the grep process is even started. If there are no matching files, it is passed verbatim, but * in regular expressions is a quantifier which needs a preceding expression, and . in regex means "match any single character". So what you wanted is probably grep -o '.*\.cer', but .* matches anything, including slashes.
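To make the difference concrete, here is a small sketch comparing the greedy pattern with one restricted to non-slash characters. The temporary file is a stand-in for config.cfg, and the -o option is assumed to be available (it is in GNU and BSD grep).

```shell
# Hypothetical config file standing in for config.cfg.
cfg=$(mktemp)
printf '%s\n' 'ssl_cert_location=/etc/ssl/certs/thecert.cer' > "$cfg"

# Quoted, so the shell hands the regex to grep untouched.
# .* is greedy and happily crosses slashes, so the whole line comes back.
greedy=$(grep -o '.*\.cer' "$cfg")

# [^/]* cannot cross a slash, so only the file name matches.
name=$(grep -o '[^/]*\.cer' "$cfg")

echo "$greedy"
echo "$name"
rm -f "$cfg"
```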
An awk solution would look like the following:
awk -F/ '/^ssl_cert_location=/{print $NF}' config.cfg
It uses "/" as the field separator, selects only lines starting with "ssl_cert_location=", and prints the last field ($NF) of each such line.
Or an equivalent sed solution, which matches the same line and then deletes everything up to and including the last slash:
sed -n '/^ssl_cert_location=/s#^.*/##p' config.cfg
To store the output of any command in a variable, use command substitution:
var="$(command with arguments)"
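Putting the pieces together might look like the sketch below. The file contents and the message text are invented for illustration; the empty-string check is there because command substitution yields an empty value when the setting is missing.

```shell
# Hypothetical config file standing in for config.cfg.
cfg=$(mktemp)
printf '%s\n' 'ssl_cert_location=/etc/ssl/certs/thecert.cer' > "$cfg"

# Capture the awk output in a variable.
cert="$(awk -F/ '/^ssl_cert_location=/{print $NF}' "$cfg")"

# Guard against a missing setting before using the value.
if [ -n "$cert" ]; then
    message="Using certificate: $cert"
else
    message="ssl_cert_location not found"
fi

echo "$message"
rm -f "$cfg"
```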

Linux counting words in random characters

I have generated a file of random characters from A-Z and a-z. The file comes in different sizes, for example 10000 or 1000000 characters. I would like to count how many times the word 'cat' or 'dog' appears in it. Could someone provide a Linux command (grep ... | wc ..., or anything else) that can handle this task?
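grep has a -c option that counts the lines containing a match. Note that it counts matching lines, not individual matches, so a file consisting of one long line yields at most 1.
So
grep -c "cat\|dog" <file name>
Add -i if you want a case-insensitive count.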
You can use grep with the flag -o. For example:
grep -o "dog\|cat" <filename> | wc -l
About the flag -o, according to man grep: «Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.»
This solution will work in several situations: multiple lines, a single line, the word surrounded with whitespaces or other characters, etc.
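The two counts differ as soon as a line contains more than one match. A minimal sketch on made-up data, using grep -E with | (the POSIX spelling of alternation; \| in a basic pattern is a GNU extension):

```shell
# Hypothetical sample data: three matches on line 1, one on line 2, none on line 3.
sample=$(mktemp)
printf '%s\n' 'xxcatyydogzzcat' 'qqdogqq' 'nothinghere' > "$sample"

# -c counts lines that contain at least one match.
lines=$(grep -cE 'cat|dog' "$sample")

# -o prints each match on its own line, so wc -l counts individual matches.
# tr strips the padding some wc implementations add.
matches=$(grep -oE 'cat|dog' "$sample" | wc -l | tr -d ' ')

echo "matching lines: $lines"
echo "matches: $matches"
rm -f "$sample"
```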

Filtering file-list using grep

I am trying to list the files in a specific directory whose names do not match a certain pattern.
For example, list all files not ending with abc.yml.
For this I am using the command:
ls | grep -v "*abc.yml"
However I still see the files ending with abc.yml, what am I doing wrong here?
The asterisk has a different meaning in regular expressions: it repeats the preceding item. At the very start of a basic regular expression it has nothing to repeat, so it matches a literal asterisk. You can remove it, because grep matches the expression anywhere on the line rather than against the whole line. To anchor the match at the end of the line, add $. Also, . matches any character; use \. to match a dot literally:
ls | grep -v 'abc\.yml$'
In some shells, you can use extended globbing to list the files without the need to pipe to grep. For example, in bash:
shopt -s extglob
ls !(*abc.yml)
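Where extended globbing is not available, a plain glob loop with a case statement is one possible alternative. This is a sketch of a different technique than the extglob one above, run against a throwaway directory with invented file names:

```shell
# Hypothetical directory with a few sample files.
dir=$(mktemp -d)
touch "$dir/one.yml" "$dir/two_abc.yml" "$dir/notes.txt"

kept=
for path in "$dir"/*; do
    name=${path##*/}                      # drop the directory part
    case $name in
        *abc.yml) ;;                      # skip names ending in abc.yml
        *) kept="${kept:+$kept }$name" ;; # collect everything else
    esac
done

echo "$kept"
rm -rf "$dir"
```

Unlike parsing ls output, the loop handles file names containing newlines, although the space-joined summary string used here does not.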

How to count exact match of certain patterns in a text file using linux shell command?

I want to count the occurrences of a certain pattern in a text file that also contains many other, mixed patterns, using a Linux shell command.
I have a text file which contains the patterns below:
[--------------]
[+--------------+]
[+----------+------------+--------------------+]
[+---------------------+---------------------+]
How do I find the exact count of only the first pattern, [--------------]?
Note: Don't treat the square brackets as part of the pattern; only the special characters inside them are the pattern.
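sed -e 's/]/]\n/g' ./file | grep -c '\[--------------\]'
sed reads the file and replaces every ] with ] followed by a newline, so each bracketed pattern ends up on its own line; the g flag is needed when a line holds several patterns, and \n in the replacement is a GNU sed feature
grep -c searches every line for the expression and prints the number of matching lines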
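If GNU sed is not at hand, a possible alternative is to let grep -o emit one line per match and count those. The sketch below uses -F so the brackets and dashes are taken literally; the sample data is made up, with two occurrences on the first line to show that several matches per line are counted.

```shell
# The exact text to count, matched literally.
p='[--------------]'

# Hypothetical sample data containing the pattern three times.
sample=$(mktemp)
printf '%s\n' \
    "$p [+--------------+] $p" \
    '[+----------+------------+--------------------+]' \
    "$p" > "$sample"

# -o prints every match on its own line; wc -l then counts them.
count=$(grep -oF -- "$p" "$sample" | wc -l | tr -d ' ')

echo "$count"
rm -f "$sample"
```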

Some help needed on grep

I am trying to find alphanumeric strings, which may also include the two characters "/" and "+", that are at least 30 characters long.
I have written this code:
grep "[a-zA-Z0-9\/\+]{30,}" tmp.txt
cat tmp.txt
> array('rWmyiJgKT8sFXCmMr639U4nWxcSvVFEur9hNOOvQwF/tpYRqTk9yWV2xPFBAZwAPRVs/s
ddd73ZEjfy+airfy8DtqIqKI9+dd 6hdd7soJ9iG0sGs/ld5f2GHzockoYHfh
+pAzx/t17Crf0T/2+8+reo+MU39lqCr02sAkcC1k/LzyBvSDEtu9N/9NHicr jA3SvDqg5s44DFlaNZ/8BW37fGEf2rk13S/q68OVVyzac7IT7yE7PIL9XZ/6LsmrY
KEsAmN4i/+ym8be3wwn KWGYaIB908+7W98pI6qao3iaZB
3mh7Y/nZm52hyLa37978f+PyOCqUh0Wfx2PL3vglofi0l
QVrOM1pg+mFLEIC88B706UzL4Pss7ouEo+EsrES+/qJq9Y1e/UGvwefOWSL2TJdt
This does not work. Mainly, I want the minimum length of the string to be 30.
In grep's default (basic) regular expression syntax, the repetition braces need to be backslashed.
grep -o '[a-zA-Z0-9/+]\{30,\}' file
If you want to constrain the match to lines containing only matches to this pattern, add line-start and line-ending anchors:
grep '^[a-zA-Z0-9/+]\{30,\}$' file
The -o option in the first command line causes grep to only print the matching part, not the entire matching line.
In Basic Regular Expression syntax, unescaped braces are literal characters rather than a repetition operator. Use grep -E to enable Extended Regular Expression syntax, or backslash the braces.
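The two spellings can be compared side by side. A minimal sketch on a made-up three-line file, where only the second line contains a run of 30 or more allowed characters:

```shell
# Hypothetical sample data; only the middle line has a long enough run.
sample=$(mktemp)
printf '%s\n' \
    'short+token/123' \
    '3mh7Y/nZm52hyLa37978f+PyOCqUh0Wfx2PL3vglofi0l' \
    'KEsAmN4i/+ym8be3wwn KWGYaIB908+7W98pI6qao3iaZB' > "$sample"

# Basic syntax: the braces must be backslashed.
bre=$(grep -o '[a-zA-Z0-9/+]\{30,\}' "$sample")

# Extended syntax (-E): the braces are written bare.
ere=$(grep -oE '[a-zA-Z0-9/+]{30,}' "$sample")

echo "$bre"
echo "$ere"
rm -f "$sample"
```

The third line is rejected because the space splits it into two runs that are each shorter than 30 characters.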
You can use the following command, shown here with its output on the sample file:
grep -e "^[a-zA-Z0-9/+]\{30,\}" tmp.txt
+pAzx/t17Crf0T/2+8+reo+MU39lqCr02sAkcC1k/LzyBvSDEtu9N/9NHicr jA3SvDqg5s44DFlaNZ/8BW37fGEf2rk13S/q68OVVyzac7IT7yE7PIL9XZ/6LsmrY
3mh7Y/nZm52hyLa37978f+PyOCqUh0Wfx2PL3vglofi0l
QVrOM1pg+mFLEIC88B706UzL4Pss7ouEo+EsrES+/qJq9Y1e/UGvwefOWSL2TJdt
man grep
Read up on the difference between basic and extended regular expressions. With the unescaped braces in your command, you need the -E option.

Resources