How to use "grep -v" (or something similar) on the entire text except the first column?

I am trying to manipulate a file, let's say:
76ers23 Philadelphia 76ers announced today that
76ers24 Lakers announced today
76ers25 blazers plays today
76ers26 celics announced today that
76ers27 Bonston has Day off
76ers28 Philadelphia 76ers announced today that
76ers29 the blazzers announced today that
76ers30 76ers Training day
76ers31 Philadelphia 76ers has a day off today
76ers32 Philadelphia 76ers humiliate Lakers
76ers33 celics announced today that
I want to remove every line that contains the term 76ers anywhere after the first column, so as to obtain:
76ers24 Lakers announced today
76ers25 blazers plays today
76ers26 celics announced today that
76ers27 Bonston has Day off
76ers29 the blazzers announced today that
76ers33 celics announced today that
My issue is that if I use grep -v "76ers", it returns nothing, because every line contains 76ers in its first column.
I am looking to apply grep (or another command) from the second column onward only.
I found this complicated way, which does pretty much what I want, but I get an _ at the beginning of the second column:
cat file|awk '{print $1}' >file1
cat file|awk '{$1="";print $0}'|tr -s ' ' | tr ' ' '_' >file2
paste file1 file2 |grep -v "_76ers"
I'm not a bash expert, so I guess there is an easier way to do this.
Thank you in advance!

Use a regular expression that skips over the first column.
grep -v '^[^ ]* .*76ers' file
[^ ]* matches everything up to the first space.
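Below is a minimal sketch of the same idea wrapped in a helper. The name `drop_if_after_first_col` is hypothetical and only for illustration. The sketch assumes the search string contains no BRE metacharacters. It uses `[[:space:]]` instead of a literal space so that a first column followed by a tab is also skipped over.

```shell
#!/bin/sh
# Hypothetical helper: print only the lines whose text AFTER the first
# column does not contain the given string.
# Assumption: "$1" is a plain string with no BRE metacharacters.
drop_if_after_first_col() {
    # ^[^[:space:]]*   the first column (no whitespace in it)
    # [[:space:]]      the separator right after it
    # .*$1             the string anywhere in the rest of the line
    grep -v -- "^[^[:space:]]*[[:space:]].*$1"
}

printf '%s\n' \
    '76ers23 Philadelphia 76ers announced today that' \
    '76ers24 Lakers announced today' \
    '76ers30 76ers Training day' \
    '76ers33 celics announced today that' |
    drop_if_after_first_col 76ers
```

Only the 76ers24 and 76ers33 lines should survive. The first column can still match the pattern, because the regex forces the search to start after the first whitespace.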

Using awk:
awk '{ found=0;for(i=2;i<=NF;i++) { if (match($i,"76ers")) { found=1 } } if (found==0) { print $0 } }' file
Loop from the second space-separated field to the last field and use match() to check whether that field contains 76ers. If it does, set a found flag. For every line, print it only if found is still 0 after looping through all of its fields.
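Below is a slightly shorter sketch of the same field loop. It uses awk's `index()`, so "76ers" is treated as a literal string rather than a regex. It also uses `next` to skip a line as soon as a later field contains the string, so no `found` flag is needed. The helper name `keep_lines_without_word` and the awk variable `word` are illustrative choices, not part of the answer above.

```shell
#!/bin/sh
# Hypothetical helper: print lines where none of fields 2..NF
# contains the literal string "$1".
keep_lines_without_word() {
    awk -v word="$1" '
    {
        for (i = 2; i <= NF; i++)
            if (index($i, word) > 0)
                next        # found in a later field: skip this line
        print               # no later field matched: keep the line
    }'
}

printf '%s\n' \
    '76ers23 Philadelphia 76ers announced today that' \
    '76ers24 Lakers announced today' \
    '76ers30 76ers Training day' |
    keep_lines_without_word 76ers
```

Using `index()` instead of `match()` means characters such as `.` in the search string are not interpreted as regex syntax.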

You can create an Extended Regular Expression that ignores the first column. Not knowing exactly what "flavor" of OS you have, I'll give you two different formats.
grep -E is the same as egrep
[[:digit:]] is the same as [0-9]
[[:space:]] matches any whitespace character (space, tab, etc.)
First option: look for 76ers with whitespace after it:
grep -Ev '76ers[[:space:]]' <file>
Second option: look for 76ers, followed by one or more digits, then anything, then a second 76ers:
grep -Ev '76ers[[:digit:]][[:digit:]]*.*76ers' <filename>
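In ERE syntax, `[[:digit:]][[:digit:]]*` can also be written as `[[:digit:]]+`. The sketch below runs both options on a few sample lines so their results can be compared. The last sample line (76ers34) is a hypothetical extra input. It shows that the first option misses a 76ers at the very end of a line, because that option requires whitespace after the match. The function names are illustrative.

```shell
#!/bin/sh
# Option 1: drop lines with "76ers" followed by whitespace.
filter_ws() {
    grep -Ev '76ers[[:space:]]'
}
# Option 2: drop lines with "76ers<digits>" followed later by a second "76ers".
# [[:digit:]]+ is the ERE shorthand for [[:digit:]][[:digit:]]*.
filter_twice() {
    grep -Ev '76ers[[:digit:]]+.*76ers'
}

sample='76ers23 Philadelphia 76ers announced today that
76ers24 Lakers announced today
76ers32 Philadelphia 76ers humiliate Lakers
76ers34 Lakers beat the 76ers'

echo '--- option 1 ---'; printf '%s\n' "$sample" | filter_ws
echo '--- option 2 ---'; printf '%s\n' "$sample" | filter_twice
```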

With GNU grep, requiring that the match is "whole word" with the -w/--word-regexp option:
grep -vw '76ers' infile
From the manual:
-w
--word-regexp
Select only those lines containing matches that form whole words. The
test is that the matching substring must either be at the beginning of
the line, or preceded by a non-word constituent character. Similarly,
it must be either at the end of the line or followed by a non-word
constituent character. Word constituent characters are letters,
digits, and the underscore. This option has no effect if -x is also
specified.
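Here is a small check of why `-w` works for this data, assuming GNU grep. In `76ers23`, the match is followed by the digit `2`, which is a word-constituent character, so it is not a whole-word match. A standalone `76ers` is a whole-word match. The caveat is that a later `76ers` glued to letters, digits or an underscore is not caught either. The `76ers_fans` line below is a hypothetical input that shows this.

```shell
#!/bin/sh
# -o prints each whole-word match that grep -w finds.
printf '%s\n' '76ers23 Lakers announced today' | grep -ow '76ers'   # prints nothing
printf '%s\n' '76ers30 76ers Training day'     | grep -ow '76ers'   # prints 76ers
printf '%s\n' '76ers35 76ers_fans cheer'       | grep -ow '76ers'   # prints nothing
```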

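Here is an alternative approach using awk. Similar to the idea of Balmer, make sure the first column is never tested against the ERE: strip the first field from a copy of the line, and print the line only if what remains does not match.
$ awk -v ere='76ers' '{ rest = $0; sub(/^[^ ]* /, "", rest) } rest !~ ere' file
This prints every record/line whose text after the first column does not match the regular expression ere (rest !~ ere). The first column itself is never checked, so values like 76ers23 do not cause a line to be dropped.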

$ grep -v ' .*76ers' file
76ers24 Lakers announced today
76ers25 blazers plays today
76ers26 celics announced today that
76ers27 Bonston has Day off
76ers29 the blazzers announced today that
76ers33 celics announced today that

Related

Adding a space at the end of multiple strings

I have a list of strings, and I want to modify them by adding a space at the end of each name prefix.
First of all, I want to uppercase the names. I was able to do that, but if there's a way to combine both sed commands, that would be awesome.
sed '
s/Hugh:/HUGH:/g ;
s/Lory:/LORY:/g;
s/Melody:/MELODY:/g;
s/Tifany:/TIFANY:/g;
s/Henry:/HENRY:/g;
s/Jack:/JACK:/g;
' | sed '
s/HUGH:/HUGH: /g ;
s/LORY:/LORY: /g;
s/MELODY:/MELODY: /g;
s/TIFANY:/TIFANY: /g;
s/HENRY:/HENRY: /g;
s/JACK:/JACK: /g;
'
The initial input:
Hugh:IS MISSING
Lory:Is Doing well
Tifany:Is sick
Melody:Is back
Henry:is Dead
Jack:is sleeping
The result at the moment is
Hugh:IS MISSING
LORY:Is Doing well
TIFANY:Is sick
MELODY:Is back
HENRY:is working
JACK:is sleeping
What I want is
Hugh: IS MISSING
LORY: Is Doing well
TIFANY: Is sick
MELODY: Is back
HENRY: is working
JACK: is sleeping
I want to add a space but couldn't figure out how to do it. If possible, I would also like to combine the first and the second sed into a single sed or awk command. That would be awesome.
You can do so with sed quite easily with a couple of capture groups and a couple of backreferences along with \U (for uppercase) and \L (for lowercase), e.g.
sed 's/\(^\w*\):\(.*$\)/\U\1: \L\2/' file
Above, the first capture group is anchored to the beginning of the line with '^' and captures all word characters (\w). Then comes the ':', and the second capture group takes everything up to the end of the line.
The replacement converts everything to uppercase with \U, adds a space after ':' and then converts the rest to lowercase with \L.
Example Use/Output
$ sed 's/\(^\w*\):\(.*$\)/\U\1: \L\2/' file
HUGH: is missing
LORY: is doing well
TIFANY: is sick
MELODY: is back
HENRY: is dead
JACK: is sleeping
Using Extended Regex
The roughly equivalent, but slightly more robust command using Extended Regular Expressions would be:
sed -E 's/(^\w+):(.*$)/\U\1: \L\2/'
Here the '+' repetition requires one or more word characters to match, rather than the '*' (zero or more) used with Basic Regular Expressions, and you do not have to escape the capture groups (...). The downside is that not every sed supports ERE, but most do with either the -E or the -r option.
(note: no, I don't know what happened to "Hugh" in your example -- should he be deleted?)
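The \U/\L answer above also lowercases the message text. The question asks for the message to stay as it is. Below is a sketch of a variation that uppercases only the name. It assumes GNU sed, since `\U` and `\E` are GNU extensions. It also assumes names consist only of letters, digits and underscores. The helper name `upcase_names` is hypothetical.

```shell
#!/bin/sh
# Uppercase the leading name, add a space after the colon,
# and leave the rest of the line untouched (GNU sed: \U ... \E).
upcase_names() {
    sed 's/^\([[:alnum:]_]*\):/\U\1\E: /'
}

printf '%s\n' 'Hugh:IS MISSING' 'Lory:Is Doing well' 'Henry:is Dead' | upcase_names
```

Only the name and its colon are part of the match, so the rest of the line is copied through unchanged.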
GNU awk:
awk '
BEGIN{FS=":";OFS=": "} # Split on colons and rejoin the fields with colon-space
$1=toupper($1) # Uppercase the name; the non-empty result is true, so the line is printed
' <file>
Result:
HUGH: IS MISSING
LORY: Is Doing well
TIFANY: Is sick
MELODY: Is back
HENRY: is Dead
JACK: is sleeping
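One caveat with the FS/OFS approach: assigning to `$1` makes awk rebuild the whole record with `OFS`, so any later `:` in the message also becomes `: `. Below is a sketch that touches only the first colon. It uses `index()` and `substr()`, which are POSIX awk, so it should not need GNU awk. The helper name `upcase_first_field` is hypothetical. The `Jack:is sleeping: zzz` line is a hypothetical input added to show the difference.

```shell
#!/bin/sh
# Uppercase the text before the first ':' and add a space after that colon.
# Lines without a colon are printed unchanged.
upcase_first_field() {
    awk '{
        i = index($0, ":")        # position of the first colon (0 if none)
        if (i > 0)
            $0 = toupper(substr($0, 1, i - 1)) ": " substr($0, i + 1)
        print
    }'
}

printf '%s\n' 'Tifany:Is sick' 'Jack:is sleeping: zzz' | upcase_first_field
```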
This might work for you (GNU sed):
sed -E 's/(hugh|lory|melody|tifany|henry|jack):/\U& /Ig' file
or, more loosely:
sed 's/^\S*:/\U& /' file

Replacing characters in each line of a file in Linux

I have a file with a different word on each line.
My goal is to capitalize the first character and replace the 3rd character with "#".
For example, football will be changed to Foo#ball.
I tried using awk and sed. It didn't help, since (to my knowledge) sed needs an exact character as input, and awk can print the desired character but not change it.
With GNU sed and two s commands:
echo 'football' | sed -E 's/(.)/\U\1/; s/(...)./\1#/'
Output:
Foo#ball
See: 3.3 The s Command, 5.7 Back-references and Subexpressions and 5.9.2 Upper/Lower case conversion
This might work for you (GNU sed):
sed 's/\(...\)./\u\1#/' file
With bash you can use parameter expansions alone to accomplish the task. For example, if you read each line into the variable line, you can do:
line="${line^}" # change football to Football (capitalize 1st char)
line="${line:0:3}#${line:4}" # make 4th character '#'
Example Input File
$ cat file
football
soccer
baseball
Example Use/Output
$ while read -r line; do line="${line^}"; echo "${line:0:3}#${line:4}"; done < file
Foo#ball
Soc#er
Bas#ball
While shell is typically slower, when use is limited to builtins, it doesn't fall too far behind.
(note: your question says 3rd character, but your example replaces the 4th character with '#')
With GNU awk for the 3rd arg to match():
$ echo 'football' | awk 'match($0,/(.)(..).(.*)/,a){$0=toupper(a[1]) a[2] "#" a[3]} 1'
Foo#ball
Cyrus' or Potong's answers are the preferred ones. (For Linux or systems with GNU sed because of \U or \u.)
This is just an additional solution with awk, because you mentioned it and also used the awk tag:
$ echo 'football'|awk '{a=substr($0,1,1);b=substr($0,2,2);c=substr($0,5);print toupper(a)b"#"c}'
Foo#ball
This is a simple solution without a regex. It also works on non-GNU awk.
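The question's text says 3rd character, but its example replaces the 4th, so a sketch with the position as a parameter may help. The helper name `mask_char` and its argument are hypothetical. The sketch uses only `substr()` and `toupper()`, so it should work in any POSIX awk. It assumes a position of 2 or more.

```shell
#!/bin/sh
# mask_char N: capitalize the first character of each line and replace the
# N-th character with '#'. Assumes N >= 2.
mask_char() {
    awk -v n="$1" '{
        print toupper(substr($0, 1, 1)) substr($0, 2, n - 2) "#" substr($0, n + 1)
    }'
}

printf '%s\n' football soccer | mask_char 4    # matches the question's example
printf '%s\n' football | mask_char 3            # matches the question's wording
```

With N=4 this gives Foo#ball and Soc#er, the same as the answers above. With N=3 it gives Fo#tball.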
This should work with any version of awk:
awk '{
for(i=1;i<=NF;i++){
# Note that string indexes start at 1 in awk !
# Capitalize the 1st char, keep the 2nd, replace the 3rd with "#", keep the rest
$i=toupper(substr($i,1,1)) substr($i,2,1) "#" substr($i,4)
}
print
}' file
Note: if a word is shorter than 3 characters, like it, it will be printed as It#
If your data is in the file d, this works with GNU sed:
sed -E 's/^(\w)(\w\w)\w/\U\1\E\2#/' d

Insert a CR character in a file using a shell script

I have a huge single block of data which I want to split into lines by inserting a carriage return before some identified patterns.
(At this stage, I don't want to use the Linux split command.)
So I am looking at:
how to identify the pattern in the data block
how to insert the CR right before the pattern's starting position.
Example:
The block is 1234abcde56785abcde53453FEFDabcde
The result should look like this inside the file:
1234
abcde56785
abcde53453FEFD
abcde
Thanks, community!
Your pattern was not easy to understand at all, so next time please try to add some more information.
You can use the following sed command:
echo "1234abcde5678abcde53453FEFDabcde" | sed -E 's/(abcde[0-9]*[A-Z]*)/\n\1/g'
1234
abcde5678
abcde53453FEFD
abcde
If you need Windows line endings, change it to:
sed -E 's/(abcde[0-9]*[A-Z]*)/\r\n\1/g'
For explanations about sed:
-E enables extended regex support; otherwise you need to escape characters such as (, ), +, {, }
s/PATTERN/REPLACEMENT/g command to find and replace in global mode
For explanations about the regex:
() for grouping and backreference
abcde[0-9]*[A-Z]* matches abcde followed by optional digits and then optional uppercase letters
Regex starting point: http://www.rexegg.com/regex-quickstart.html
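The question asks specifically for a carriage return. Below is a minimal sketch that assumes GNU sed, which accepts `\n` and `\r` in the replacement. It also assumes the marker is always the literal string `abcde`. Because `&` in the replacement stands for the whole match, a capture group is not needed when you only insert text before the pattern. The helper names are hypothetical.

```shell
#!/bin/sh
# Insert a newline before every "abcde" (Unix line endings).
split_lf() {
    sed 's/abcde/\n&/g'
}
# Insert CR+LF before every "abcde" (Windows line endings).
split_crlf() {
    sed 's/abcde/\r\n&/g'
}

echo '1234abcde56785abcde53453FEFDabcde' | split_lf
# Make the CR characters visible with od:
echo '1234abcde' | split_crlf | od -c
```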
I'm not completely sure about your requirement based on the output you showed, but could you please try the following and let me know if it helps?
awk '{gsub(/abcde/,"\n&")} 1' Input_file
Or, in case the abcde string is not always the same and could be any lowercase word:
awk '{gsub(/[a-z]+/,"\n&")} 1' Input_file

How To Delete All Words Before X Characters

I'm using code from the question "How To Delete All Words After X Characters", and I'm having trouble keeping (not deleting) all the words after 30 characters.
Original code:
awk 'BEGIN{FS=OFS="" } length>30{i=30; while($i~/\w/) i++; NF=i-1; }1'
My attempt:
awk 'BEGIN{FS=OFS="" } length>30{i=30; while($i~/\w/) i++; NF=i+1; }1'
Basically, I understand I need to change NF, which was NF=i-1, so I tried changing it to NF=i+1, but obviously I'm only getting one field. How can I set NF to print the rest of the line?
Sample data:
StackOverflow Users Are Brilliant And Hard Working
#character 30 ---------------^
Desired output:
And Hard Working
If you could please help me keep the rest of the line by using NF, I would really appreciate your positive input and support.
It is much easier using gnu grep:
grep -oP '^.{30}\w*\W*\K.*' file
And Hard Working
Here \K is used to reset the matched information.
RegEx Breakup:
^: Start
.{30}: Match first 30 characters
\w*: followed by 0 or more word characters
\W*: followed by 0 or more non-word characters
\K: reset matched information so far
.*: Match anything after this position
Using awk you can use this solution:
awk '{sub(/^.{30}[_[:alnum:]]*[[:blank:]]*/, "")} 1' file
And Hard Working
Finally a sed solution:
sed -E 's/^.{30}[_[:alnum:]]*[[:blank:]]*//' file
And Hard Working
Another awk:
awk '{print substr($0, index(substr($0,30),FS)+30)}' file
Find the index of the first delimiter at or after the 30th character, then take the substring that starts right after it.
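The same index/substr idea can take the character count as a parameter. `keep_after` below is a hypothetical name. The sketch assumes words are separated by single spaces. As with the one-liner above, a line with no space at or after position N is printed from position N onward.

```shell
#!/bin/sh
# keep_after N: drop everything up to and including the first space found
# at or after character N, and print the rest of the line.
keep_after() {
    awk -v n="$1" '{
        p = index(substr($0, n), " ")   # offset of that space, counted from position n
        print substr($0, n + p)         # the space sits at n + p - 1, so start one past it
    }'
}

echo 'StackOverflow Users Are Brilliant And Hard Working' | keep_after 30
```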
I can't imagine why you're considering anything related to NF for this, since you're not doing anything with fields; you're just splitting each line at a blank character. It sounds like this is all you need for both questions, using GNU awk for gensub():
$ awk '{print gensub(/(.{30}\S*)\s+(.*)/,"\\1",1)}' file
StackOverflow Users Are Brilliant
$ awk '{print gensub(/(.{30}\S*)\s+(.*)/,"\\2",1)}' file
And Hard Working
or it's briefer using GNU sed:
$ sed -E 's/(.{30}\S*)\s+(.*)/\1/' file
StackOverflow Users Are Brilliant
$ sed -E 's/(.{30}\S*)\s+(.*)/\2/' file
And Hard Working
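`\S` and `\s` are GNU extensions. Below is a sketch of the same split that uses only POSIX bracket classes, so it may also work with other `sed -E` implementations (an assumption worth checking on your platform). The helper `split_at_30` is hypothetical. Its argument selects which half to print.

```shell
#!/bin/sh
# split_at_30 1 -> text up to the end of the word that spans character 30
# split_at_30 2 -> the words after it
# "$1" picks the capture group; [^[:space:]] and [[:space:]] replace \S and \s.
split_at_30() {
    sed -E "s/^(.{30}[^[:space:]]*)[[:space:]]+(.*)\$/\\$1/"
}

line='StackOverflow Users Are Brilliant And Hard Working'
echo "$line" | split_at_30 1
echo "$line" | split_at_30 2
```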
With the use of NF, you can try:
awk '{a=0; b=""; for(i=1;i<=NF;i++){a+=length($i)+1; if(a>30){for(j=i+1;j<=NF;j++) b=b $j" "; print b; break}}}' file
cut -c30- file | cut -d' ' -f2-
This keeps only the words that start after the 30th character (index >= 31).

How would I change a person's birthday with sed if I'm provided with just the name?

The line I want to modify is Popeye's birthday,
and we have to do it assuming that we don't know his birthday.
Here's what I did but it doesn't work.
sed '/Popeye/s/[0-9]\/[0-9][0-9]\/[0-9][0-9]/[1][1]\/[1][4]\/[4][6]' DDdatebook
Input:
Popeye Sailor:156-454-3322:945 Bluto Street, Anywhere, USA 29358:3/19/35:22350
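sed '/Popeye/ s#:[0-9]\{1,2\}/[0-9]\{1,2\}/\([0-9]\{2\}\)\{1,2\}:#:11/14/46:#' YourFile
You forgot the last delimiter.
I suggest using a delimiter other than / because / appears in the pattern; I selected # in this case, so the internal / does not have to be escaped.
Also, brackets such as [1] only have a special meaning in the pattern; in the replacement they are copied literally, so the new date is written as plain text.
I added a broader pattern for the numbers: it allows 1 or 2 digits for the day and month, and 2 or 4 digits for the year (the year seems to be 2 digits only, but the second option is there just in case; remove it if not needed).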
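Below is a sketch that finds the birthday by position (the 4th colon-separated field) instead of by its date format, which mirrors the awk answer but uses sed. The helper `set_birthday` is hypothetical. It assumes NAME and DATE contain no `#`, `&`, `\` or other sed-special characters. sed backreferences are single digits, so `\111/14/46` in the resulting script is read as `\1` followed by `11/14/46`.

```shell
#!/bin/sh
# set_birthday NAME DATE: replace the 4th ':'-separated field on lines that
# start with NAME.
set_birthday() {
    # \(\([^:]*:\)\{3\}\) captures the first three fields and their colons;
    # [^:]* is the old birthday, whatever its format.
    sed "/^$1/ s#^\\(\\([^:]*:\\)\\{3\\}\\)[^:]*#\\1$2#"
}

echo 'Popeye Sailor:156-454-3322:945 Bluto Street, Anywhere, USA 29358:3/19/35:22350' |
    set_birthday Popeye 11/14/46
```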
Here is how you can change it using awk
awk -F: '/Popeye/ {$4="01/01/01"}1' OFS=: file
Popeye Sailor:156-454-3322:945 Bluto Street, Anywhere, USA 29358:01/01/01:22350
If you want another date, just change the "01/01/01" part.
To write it back to original file:
awk -F: '/Popeye/ {$4="01/01/01"}1' OFS=: file > tmp && mv tmp file
