Find the words that do not appear in the dictionary on a Linux platform - linux

Here's a text file that may have some words, one word per line, and I accept it as a command-line argument. For example, the text file a.txt looks like:
about
catb
west
eastren
And what I want to do is find the words that are not in the dictionary; if a word is a dictionary word, delete it from the text file.
I use the following commands:
word=$1
grep "$1$" /usr/share/dict/linux.words -q
for word in $(<a.txt)
do
    if [ $word -eq 0 ]
    then
        sed '/$word/d'
    fi
done
Nothing happened.

grep alone is enough from what I understand
$ grep -xvFf /usr/share/dict/linux.words a.txt
catb
eastren
catb and eastren are words not found in /usr/share/dict/linux.words. The options used are
-x, --line-regexp
Select only those matches that exactly match the whole line. For a regular expression pattern, this is
like parenthesizing the pattern and then surrounding it with ^ and $.
-v, --invert-match
Invert the sense of matching, to select non-matching lines.
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings (instead of regular expressions), separated by newlines,
any of which is to be matched.
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. If this option is used multiple times or is combined with the
-e (--regexp) option, search for all patterns given. The empty file contains zero patterns, and
therefore matches nothing.
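The question also asks to delete the dictionary words from the text file itself; since the command above already outputs only the non-dictionary words, one way to apply it is to write the result back through a temporary file (a minimal sketch, assuming you want to overwrite a.txt in place):
# Keep only the words not found in the dictionary, then replace the original file.
grep -xvFf /usr/share/dict/linux.words a.txt > a.txt.tmp && mv a.txt.tmp a.txt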

An alternative would be to use a spell checker like hunspell, so you can apply it to any text, not only a pre-formatted file with only one word per line. You can also specify several dictionaries, and it shows only words which are in none of them.
For example, after copy/pasting the content of this page into test.txt
lang1=en_US
lang2=fr_FR
hunspell -d $lang1,$lang2 -l test.txt | sort -u
produces a list of 46 words:
'
Apps
Arqade
catb
'cba'
Cersei
ceving
Drupal
eastren
...
WordPress
Worldbuilding
xvFf
xz
Yodeya
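Applied to the one-word-per-line a.txt from the original question, the same approach with a single dictionary would be (a sketch, assuming the en_US dictionary is installed):
# -l lists the misspelled words; sort -u removes duplicates.
hunspell -d en_US -l a.txt | sort -u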

Related

How to count exact match of certain patterns in a text file using linux shell command?

I want to find the count of a certain pattern in a text file that also contains a lot of mixed patterns, using a Linux shell command.
I have a text file which contains the patterns below:
[--------------]
[+--------------+]
[+----------+------------+--------------------+]
[+---------------------+---------------------+]
How do I find the exact count of only the first pattern, [--------------]?
Note: don't include the square brackets as part of the pattern; only the special characters inside the square brackets are the pattern.
cat ./file | sed -e 's/\]/\]\n/g' | grep -c "\[--------------\]"
cat reads the file
sed replaces every ] with ]\n, so each bracketed pattern ends up on its own line
grep searches every line for the expression and prints the number of matching lines (-c)
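Note that grep -c counts matching lines, not individual matches. If occurrences rather than lines are what you need, grep -o can count them directly without the sed step (an alternative sketch, assuming GNU grep):
# -o prints each match on its own line; wc -l then counts the matches.
grep -o '\[--------------\]' ./file | wc -l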

grep the files that contain specific word and don't contain those words

I need to use the grep command to get all the files that contain the word [form]
and at the same time don't contain the words [head] or [cfGenerate].
GNU grep with PCRE:
grep -P '^(?!.*head)(?!.*cfGenerate).*form'
Negative lookaheads can be combined to make the match fail if any unwanted pattern occurs.
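The command above filters line by line. If the "contains"/"doesn't contain" conditions should apply to whole files instead, one possible approach is to list the files that contain form with -l and then drop the ones that also match an unwanted word with -L (a sketch, assuming GNU grep and a recursive search from the current directory):
# First list files containing "form" (NUL-separated to survive spaces in names),
# then keep only those with no "head" or "cfGenerate" match anywhere.
grep -rlZ 'form' . | xargs -0 grep -LE 'head|cfGenerate'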

Read one file to search another file and print out missing lines

I am following the example in this post, finding contents of one file into another file in unix shell script, but want to print the output differently.
Basically file "a.txt", with the following lines:
alpha
0891234
beta
Now, the file "b.txt", with the lines:
Alpha
0808080
0891234
gamma
I would like the output of the command to be:
alpha
beta
The first one is "incorrect case" and the second one is "missing from b.txt". The 0808080 doesn't matter and it can be there.
This is different from using grep -f "a.txt" "b.txt", which would print out 0891234 only.
Is there an elegant way to do this?
Thanks.
Use grep with the following options:
grep -Fvf b.txt a.txt
The key is to use -v:
-v, --invert-match
Invert the sense of matching, to select non-matching lines.
When reading patterns from a file, I recommend using the -F option unless you explicitly want the patterns to be treated as regular expressions.
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings (instead of regular expressions), separated by newlines, any of which
is to be matched.
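Since both files hold one complete entry per line, adding -x makes each pattern have to match a whole line, so an entry in a.txt is not dropped just because it happens to be a substring of some longer line in b.txt (an optional refinement on the same command):
# -x: a pattern only counts as a match if it equals the entire line.
grep -Fxvf b.txt a.txt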

How to print the longest word in a file by using combination of grep and wc

I am trying to find the longest word in a text file.
I tried it and found the number of characters in the longest word in the file by using the command
wc -L
I need to print the longest word by using this number and the grep command.
If you must use the two commands given, I'd suggest:
grep -E ".{$(wc -L < test.txt)}" test.txt
The command substitution is used to build the correct brace expression to match the line(s) with exactly the given number of characters. -E is needed to enable extended regular expression support; otherwise, the braces need to be escaped: grep ".\{...\}" test.txt.
Using an awk command that makes a single pass through the file may be faster.
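For instance, a possible single-pass awk version, assuming the goal is the longest whitespace-separated word anywhere in the file (not just the longest line):
# Remember the longest field seen so far and print it at the end.
awk '{ for (i = 1; i <= NF; i++) if (length($i) > length(longest)) longest = $i } END { print longest }' test.txt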

Highlight text similar to grep, but don't filter out text [duplicate]

This question already has answers here:
Colorized grep -- viewing the entire file with highlighted matches
When using grep, it will highlight any text in a line with a match to your regular expression.
What if I want this behaviour, but have grep print out all lines as well? I came up empty after a quick look through the grep man page.
Use ack. Check out its --passthru option here: ack. It has the added benefit of allowing full Perl regular expressions.
$ ack --passthru 'pattern1' file_name
$ command_here | ack --passthru 'pattern1'
You can also do it using grep like this:
$ grep --color -E '^|pattern1|pattern2' file_name
$ command_here | grep --color -E '^|pattern1|pattern2'
This will match all lines and highlight the patterns. The ^ matches every start of line, but won't get printed/highlighted since it's not a character.
(Note that most of the setups will use --color by default. You may not need that flag).
You can make sure that all lines match, but that there is nothing to highlight on the irrelevant matches:
egrep --color 'apple|' test.txt
Notes:
egrep may also be spelled grep -E
--color is usually default in most distributions
some variants of grep will "optimize" the empty match, so you might want to use "apple|$" instead (see: https://stackoverflow.com/a/13979036/939457)
EDIT:
This works with OS X Mountain Lion's grep:
grep --color -E 'pattern1|pattern2|$'
This is better than '^|pattern1|pattern2' because the ^ part of the alternation matches at the beginning of the line whereas the $ matches at the end of the line. Some regular expression engines won't highlight pattern1 or pattern2 because ^ already matched and the engine is eager.
Something similar happens for 'pattern1|pattern2|' because the regex engine notices the empty alternation at the end of the pattern string matches the beginning of the subject string.
FIRST EDIT:
I ended up using perl:
perl -pe 's:pattern:\033[31;1m$&\033[30;0m:g'
This assumes you have an ANSI-compatible terminal.
ORIGINAL ANSWER:
If you're stuck with a strange grep, this might work:
grep -E --color=always -A500 -B500 'pattern1|pattern2' | grep -v '^--'
Adjust the numbers to get all the lines you want.
The second grep just removes extraneous -- lines inserted by the BSD-style grep on Mac OS X Mountain Lion, even when the context of consecutive matches overlap.
I thought GNU grep omitted the -- lines when context overlaps, but it's been a while, so maybe I remember wrong.
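If your grep is GNU grep, it may also support --no-group-separator, which suppresses those -- lines directly instead of filtering them out afterwards (an assumption about your grep version; check grep --help):
grep -E --color=always --no-group-separator -A500 -B500 'pattern1|pattern2'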
You can use my highlight script from https://github.com/kepkin/dev-shell-essentials
It's better than grep because you can highlight each match with its own color.
$ command_here | highlight green "input" | highlight red "output"
Since you want matches highlighted, this is probably for human consumption (as opposed to piping to another program for instance), so a nice solution would be to use:
less -p <your-pattern> <your-file>
And if you don't care about case sensitivity:
less -i -p <your-pattern> <your-file>
This also has the advantage of having pages, which is nice when you have to go through a long output.
You can do it using only grep by:
reading the file line by line
matching a pattern in each line and highlighting the pattern with grep
if there is no match, echoing the line as is
which gives you the following:
while IFS= read -r line ; do (echo "$line" | grep PATTERN) || echo "$line" ; done < inputfile
If you want to print "all" lines, there is a simple working solution:
grep "test" -A 9999999 -B 9999999
A => After
B => Before
If you are doing this because you want more context in your search, you can do this:
cat BIG_FILE.txt | less
Doing a search in less should highlight your search terms.
Or pipe the output to your favorite editor. One example:
cat BIG_FILE.txt | vim -
Then search/highlight/replace.
If you are looking for a pattern in a directory recursively, you can first save the listing to a file:
ls -1R ./ > list-of-files.txt
And then grep that, or pipe it to the grep search:
ls -1R | grep --color -E '[A-Z]|'
This will list all files, but colour the ones containing uppercase letters. If you remove the last | you will only see the matches.
I use this to find images named badly with upper case, for example; plain recursive grep does not group files by directory, but ls -R prints each directory once as a header, so this way I can see each match in context.
Maybe this is an XY problem, and what you are really trying to do is to highlight occurrences of words as they appear in your shell. If so, you may be able to use your terminal emulator for this. For instance, in Konsole, start Find (ctrl+shift+F) and type your word. The word will then be highlighted whenever it occurs in new or existing output until you cancel the function.
