grep -w is not unique if dash is in string - linux

If the string contains a dash, "grep -w" no longer gives a unique match. How can I solve this?
Example:
File1:
football01
football01test
# grep -iw ^football01
football01
File2:
football01
football01-test
# grep -iw ^football01
football01
football01-test

This is the expected and documented behaviour:
-w, --word-regexp
Select only those lines containing matches that form whole words. The test is that the
matching substring must either be at the beginning of the line, or preceded by a non-word
constituent character. Similarly, it must be either at the end of the line or followed
by a non-word constituent character. Word-constituent characters are letters, digits,
and the underscore.
If you add a dash, it ends your first word, because a dash is a "non-word constituent character". If you write the words together, a word-regexp grep treats them as one word and does not match it.
What exactly is it that you want to do?
If you only want to know if your line is football01 and nothing else, you can do it as
grep -i "^football01$"
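A quick way to see the difference is to pipe the File2 lines from the question into grep. This is a minimal sketch assuming a POSIX-compatible grep; -x (--line-regexp) is shown as an equivalent alternative to anchoring with ^ and $.

```shell
# Lines from File2 in the question.
file2='football01
football01-test'

# -w: "-" is a non-word character, so "football01-test" also matches.
printf '%s\n' "$file2" | grep -iw '^football01'

# Anchoring both ends matches only the exact line "football01".
printf '%s\n' "$file2" | grep -i '^football01$'

# -x requires the pattern to match the whole line, same effect as ^...$.
printf '%s\n' "$file2" | grep -ix 'football01'
```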
If you want to achieve something else, could you please explain what it is.

The -w switch is for word regex. In file1, football01test is a word and in file2 football01 and test are two words separated by a hyphen.
man grep says this about -w:
Select only those lines containing matches that form whole
words. The test is that the matching substring must either be at the
beginning of the line, or preceded by a non-word constituent
character. Similarly, it must be either at the end of the
line or followed by a non-word constituent character.
Word-constituent characters are letters, digits, and the underscore.
Since football01 doesn't match football01test as a whole word, grep does not return that line.
If you run grep -i ^football01 file1.txt (without -w), you get both lines.

Related

sed remove specific key with varying value from string

Say I have a string:
ap=test:::bc=exam:::dc=comic:::mp=calc:::
This is read on a Linux box. I need to remove, say, bc=exam: the key is always the same, but the value can be anything (letters or digits), and the key-value pair can appear anywhere in the string.
I've got as far as
sed -e 's/:::bc=\(.*:::\)*/\1/'
which only removes the key and a delimiter.
or
sed -e 's/:::bc=.*\(:::\)*/\1/'
which is removing everything from the key on.
Thanks in advance.
Since your values do not contain colons, you can match them with a negated bracket expression, [^:]*:
sed 's/:::bc=[^:]*//' file
The pattern :::bc=[^:]* matches :::bc= and then zero or more characters other than a colon, so the whole key-value pair and its leading delimiter are deleted.
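A minimal sketch running the command on the sample string. The second -e expression is an assumption added for the case where bc= is the very first pair (there is then no leading ::: for the first pattern to match); it is not part of the answer above.

```shell
s='ap=test:::bc=exam:::dc=comic:::mp=calc:::'

# Delete ":::bc=" plus its value, up to (not including) the next ":".
printf '%s\n' "$s" | sed 's/:::bc=[^:]*//'

# Assumed extension: also handle "bc=..." at the start of the string,
# where the pair is followed (not preceded) by the ":::" delimiter.
printf '%s\n' 'bc=42:::ap=test:::' | sed -e 's/:::bc=[^:]*//' -e 's/^bc=[^:]*::://'
```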

Is there any way to find special characters in a column of a file using grep?

I would also like the line number where each special character appears.
In general, if we treat every character that is not alphanumeric, a space, or an underscore as special, then:
grep '[^[:alnum:] _]' file.txt
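or, for a more specific set of characters (a bracket expression, single-quoted so the shell leaves $ and ! alone):
grep -o '[!#$%^&]' file.txt
You can extend the bracket expression as needed.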
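The question also asks for line numbers and a specific column. A minimal sketch, assuming whitespace-separated columns: grep -n prefixes each matching line with its number, and awk can restrict the test to one field (column 2 here). The sample data is made up for illustration; A-Za-z0-9_ is used in awk because some older awk implementations lack [:alnum:].

```shell
# Hypothetical sample: three columns, line 2 has "#" in column 2,
# line 3 has "&" in column 3.
data='a1 b2 c3
d4 e#5 f6
g7 h8 i&9'

# Whole line: -n adds "LINE:" in front of every match.
printf '%s\n' "$data" | grep -n '[^[:alnum:] _]'

# Column 2 only: print the line number and the offending field.
printf '%s\n' "$data" | awk '$2 ~ /[^A-Za-z0-9_]/ { print NR ": " $2 }'
```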

How to change the strings in a specific column using bash/shell?

I have a .txt file that looks like this (about 400 rows):
lettuceFMnode_1240 J_C7R5_99354_KNKSR3_Oligomycin 81.52
lettuceFMnode_3755 H_C1R3_99940_KNKSF2_Tubulysin 70
lettuceFMnode_17813 G_C4R5_80184_KNKS113774F_Tetronasin 79.57
lettuceFMnode_69469 J_C11R7_99276_KNKSF2_Nystatin 87.27
I want to edit the names in the entire 2nd column so that only the last part stays, i.e. delete everything up to and including the last _ and keep what comes after it.
I looked into different solutions using a combination of cut and sed, but couldn't understand how the code should be built.
Would appreciate any tips and help!
Thank you!
Here's one way:
perl -pe 's/^\S+\s+\K\S+_//'
For every line of input (-p) we execute some code (-e ...).
The code performs a substitution (s/PATTERN/REPLACEMENT/).
The pattern matches as follows:
^ beginning of string
\S+ 1 or more non-whitespace characters (the first column)
\s+ 1 or more whitespace characters (the space after the first column)
\K do not treat the text matched so far as part of the final match
\S+ 1 or more non-whitespace characters (the second column)
_ an underscore
Because + is greedy (it matches as many characters as possible), \S+_ will match everything up to the last _ in the second column.
Because we used \K, only the rest of the pattern (i.e. the part of the match that lies in the second column) gets replaced.
The replacement string is empty, so the match is effectively removed.
With sed:
sed 's/ [^ ]*_/ /' file
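Replace the first space, followed by non-space characters ([^ ]*) and then _, with a single space. Because [^ ]* cannot cross a space and is greedy, it stops at the last _ of the second column.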
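A minimal check that the Perl and sed one-liners agree on two of the sample rows. It assumes single spaces between columns (the sed version relies on that) and Perl 5.10 or later for \K.

```shell
input='lettuceFMnode_1240 J_C7R5_99354_KNKSR3_Oligomycin 81.52
lettuceFMnode_3755 H_C1R3_99940_KNKSF2_Tubulysin 70'

# Perl: \K keeps column 1 and the separator; \S+_ eats up to the last "_".
printf '%s\n' "$input" | perl -pe 's/^\S+\s+\K\S+_//'

# sed: replace " <col2 up to last _>" with a single space.
printf '%s\n' "$input" | sed 's/ [^ ]*_/ /'
```

Both commands should print lettuceFMnode_1240 Oligomycin 81.52 and lettuceFMnode_3755 Tubulysin 70.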

Get lines that end with "$" in a text file

I have an output like this:
a/foo bar /
b/c/foo sth /xyz
cc/bar ghj /axz/byz
What I want is just this line:
a/foo bar /
To be clearer, I want the lines ending with a specific string: lines whose last character is a /.
You can use $ like this:
$ grep '/$' file
a/foo bar /
As $ stands for end of line, /$ matches those lines whose last character is a /.
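A minimal sketch reproducing this with the sample output from the question piped into grep:

```shell
# Only the first line ends in "/", so only it is printed.
printf '%s\n' 'a/foo bar /' 'b/c/foo sth /xyz' 'cc/bar ghj /axz/byz' | grep '/$'
```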
grep '/$'
A slash is not a special character for grep, and $ anchors the expression to the end of the line.
You can also grep for lines whose last column consists of only a slash (as long as it is not the only column on the line).
I assume the last column of a line is a string preceded by one or more whitespace characters, with nothing after it. This assumption breaks when a line has only one column, because that column has no whitespace in front of it.
With Perl-compatible regular expressions enabled (-P):
grep -P '\s+/$'
\s matches any whitespace character (space, tab, newline)
+ matches the preceding element one or more times
$ matches the end of the string
OR use POSIX character classes (see "Character Classes and Bracket Expressions" in the grep manual):
grep '[[:space:]]\+/$'
OR
grep '[[:blank:]]\+/$'
‘[:blank:]’ Blank characters: space and tab.
‘[:space:]’ Space characters: in the ‘C’ locale, this is tab, newline, vertical tab, form feed, carriage return, and space. It is a synonym for '\s'.
As @fedorqui pointed out, the backslash after ]] is needed because in grep's basic regular expressions a bare + is a literal plus sign; \+ means "one or more". Thanks for the explanation.
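A minimal sketch of the edge case described above, assuming GNU grep (which accepts \+ in basic regular expressions). The lone "/" line is a hypothetical input added to show the difference.

```shell
# Sample lines; the last one is a lone "/" with no column before it.
lines='a/foo bar /
b/c/foo sth /xyz
/'

# Plain end-of-line anchor: matches "a/foo bar /" and the lone "/".
printf '%s\n' "$lines" | grep '/$'

# Require blanks before the slash: only "a/foo bar /" matches.
printf '%s\n' "$lines" | grep '[[:blank:]]\+/$'
```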
Apologies if the Perl-regex part is wrong; I have never really used Perl regular expressions. I hope this helps you find the slash in the last column. These pages have more information on matching whitespace followed by a slash at the end of a line:
grep with regexp: whitespace doesn't match unless I add an assertion
Regular expressions in Perl

How do I grep for all words that are less than 4 characters?

I have a dictionary with words separated by line breaks.
You can just do:
grep -Ex '.{1,3}' myfile
This also skips blank lines, which are technically not words. Unfortunately, the above regex counts apostrophes in contractions and hyphens in hyphenated compound words as letters. Hyphenated compound words are not a problem at such a low letter count, but I am not sure whether you want to count apostrophes in contractions, which can occur in short words (e.g., I'm). You can try a regex such as:
grep -Ex '\w{1,3}' myfile
This, however, matches only word characters (letters, digits, and the underscore), so contractions and hyphenated compound words are not matched at all.
Like this (it prints every line with fewer than four characters, including empty lines):
grep -v "^...." my_file
Try this regular expression:
grep -E '^.{1,3}$' your_dictionary
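A minimal sketch comparing the three approaches on a small hypothetical word list. It assumes GNU grep, since \w inside -E patterns is a GNU extension.

```shell
# Hypothetical dictionary: includes a contraction, a hyphenated word
# and an empty line.
sample() { printf '%s\n' a cat "I'm" dogs well-known '' ox; }

sample | grep -Ex '.{1,3}'    # a, cat, I'm, ox (blank line skipped)
sample | grep -Ex '\w{1,3}'   # a, cat, ox (no apostrophes allowed)
sample | grep -v '^....'      # a, cat, I'm, the empty line, ox
```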
