Swapping the first word with itself 3 times only if there are 4 words only using sed - linux

Hi, I'm trying to solve a problem using only sed commands, without pipelines. I am, however, allowed to redirect the output of a sed command to a file or to read from a file.
EX:
sed s/dog/cat/ >| tmp
or
sed s/dog/cat/ < tmp
Anyway, let's say I have a file F1 with these contents:
Hello hi 123
if a equals b
you
one abc two three four
dany uri four 123
The output should be:
if if if a equals b
dany dany dany uri four 123
Explanation: the program must only print lines that have exactly 4 words and when it prints them it must print the first word of the line 3 times.
I've tried doing commands like this:
sed '/[^ ]*.[^ ]*.[^ ]*/s/[^ ]\+/& & &/' F1
or
sed 's/[^ ]\+/& & &/' F1
but I can't figure out how I can check with sed that there are exactly 4 words in a line.
Any help will be appreciated.

$ sed -En 's/^([^[:space:]]+)([[:space:]]+[^[:space:]]+){3}$/\1 \1 &/p' file
if if if a equals b
dany dany dany uri four 123
The above requires a sed that supports EREs via the -E option (e.g. GNU and OSX seds).
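As a quick check, the command above can be run against the question's sample lines without a pipeline by feeding it a here-document. This is a minimal sketch: the variable name result is only illustrative, and it assumes a sed that accepts -E.

```shell
# Feed the question's sample lines to sed through a here-document (no pipe needed).
# The regex anchors the whole line: one word, then exactly three
# "separator + word" groups, so only 4-word lines match and get printed.
result=$(sed -En 's/^([^[:space:]]+)([[:space:]]+[^[:space:]]+){3}$/\1 \1 &/p' <<'EOF'
Hello hi 123
if a equals b
you
one abc two three four
dany uri four 123
EOF
)

printf '%s\n' "$result"
```

Because the pattern is anchored with ^ and $, the 3-word and 5-word lines cannot match and are never printed.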

If the fields are separated by a single blank (space or tab) each:
sed 'h;s/[^[:blank:]]//g;s/[[:blank:]]\{3\}//;/^$/!d;x;s/\([^[:blank:]]*[[:blank:]]\)/\1\1\1/' infile

Related

sed replacing first occurrence of characters in each line of file only if they are first 2 characters

Is it possible using sed to replace the first occurrence of a character or substring in line of file only if it is the first 2 characters in the line?
For example we have this text file:
15 hello
15 h15llo
1 hello
1 h15loo
Using the following command: sed -i 's/15/0/' file.txt
Will give this output
0 hello
0 h15llo
1 hello
1 h0loo
What I am trying to avoid is it considering the characters past the first 2.
Is this possible?
Desired output:
0 hello
0 h15llo
1 hello
1 h15loo
You can use
sed -i 's/^15 /0 /' file.txt
sed -i 's/^15\([[:space:]]\)/0\1/' file.txt
sed -i 's/^15\(\s\)/0\1/' file.txt
Here, ^ matches the start of the line, 15 matches the literal substring 15, and the space matches a space.
The second and third solutions are equivalent to each other: instead of a literal space, they capture a whitespace character into Group 1 and put it back into the result with the \1 placeholder. Note that \s is a GNU extension, while [[:space:]] is POSIX.
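The effect of the ^ anchor can be checked on the question's sample data. This is a small sketch that reads the lines from a here-document instead of editing file.txt in place; the variable name fixed is illustrative.

```shell
# Only a leading "15" followed by a space is rewritten;
# the "15" inside "h15llo" and "h15loo" is left alone.
fixed=$(sed 's/^15 /0 /' <<'EOF'
15 hello
15 h15llo
1 hello
1 h15loo
EOF
)

printf '%s\n' "$fixed"
```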

grep lines that contain 1 character followed by another character

I'm working on my assignment and I've been stuck on this question, and I've tried looking for a solution online and my textbook.
The question is:
List all the lines in the f3.txt file that contain words with a character b not followed by a character e.
I'm aware you can do grep -i 'b' to find the lines that contain the letter b, but how can I make it so that it only shows the lines that contain b but not followed by the character e?
This will find a "b" that is not followed by "e":
$ echo "one be
two
bring
brought" | egrep 'b[^e]'
Or if perl is available but egrep is not:
$ echo "one be
two
bring
brought" | perl -ne 'print if /b[^e]/;'
To find lines that have "b" followed by another word character (the Perl \w metacharacter, which skips words ending in b) while rejecting any line that contains "be":
$ echo "lab
bribe
two
bring
brought" | perl -ne 'print if /b\w/ && ! /be/'
So the final call would be:
$ perl -ne 'print if /b\w/ && ! /be/' f3.txt
Handling "edge" words that may exist and break the exercise, like lab, bribe and bob:
$ a="one
two
lab
bake
bob
aberon
bee
bell
bribe
bright
eee"
$ echo "$a" |grep -v 'be' |grep 'b.'
bake
bob
bright
You can go for the following two solutions:
grep -ie 'b[^e]' input_file.txt
or
grep -ie 'b.' input_file.txt | grep -vi 'be'
The first one uses a single regex:
'b[^e]' means b followed by any character that is not e
-i ignores case, so lines containing B or b not directly followed by e or E are accepted
The second solution calls grep twice:
the first call selects the lines that contain b followed by any character
the resulting lines are filtered by the second grep, using -v to reject lines containing be
both grep calls ignore case by using -i
If b must be followed by another character, use b. (regex for b followed by any character). If you also want to accept lines where b is the last character on the line, use plain b in the first grep call instead:
grep -ie 'b' input_file.txt | grep -vi 'be'
input:
BEBE
bebe
toto
abc
bobo
result:
abc
bobo
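The two approaches are not equivalent, which a small comparison makes visible. The sketch below uses a made-up word list (not from f3.txt): the single regex accepts a line as soon as one b is not followed by e, while the two-step filter rejects any line that contains be anywhere.

```shell
# Illustrative word list; "bribe" contains both "br" and "be".
words='lab
bribe
bring
be'

# Single regex: one "b + non-e character" anywhere on the line is enough.
per_match=$(printf '%s\n' "$words" | grep 'b[^e]')

# Two-step filter: keep lines with "b + any character", then drop lines containing "be".
per_line=$(printf '%s\n' "$words" | grep 'b.' | grep -v 'be')

printf 'single regex:\n%s\n' "$per_match"
printf 'two-step filter:\n%s\n' "$per_line"
```

Here bribe passes the single regex because of its first b, but the two-step filter drops it. Which behavior is correct depends on how the assignment is read.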

How can I remove double line breaks with sed?

I tried:
sed -i 's/\n+/\n/' file
but it's not working.
I still want single line breaks.
Input:
abc
def
ghi
jkl
Desired output:
abc
def
ghi
jkl
This might work for you (GNU sed):
sed '/^$/{:a;N;s/\n$//;ta}' file
This replaces multiple blank lines by a single blank line.
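A quick way to try this is to generate a sample with runs of blank lines. This is a sketch with made-up data, assuming GNU sed as stated above; the variable names are illustrative.

```shell
# Sample with a run of two and a run of three blank lines.
sample=$(printf '%s\n' abc '' '' def '' '' '' ghi)

# Each run of blank lines collapses to a single blank line.
squeezed=$(printf '%s\n' "$sample" | sed '/^$/{:a;N;s/\n$//;ta}')

printf '%s\n' "$squeezed"
```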
However if you want to place a blank line after each non-blank line then:
sed '/^$/d;G' file
This deletes all blank lines and appends a single blank line after each non-blank line.
Sed isn't very good at tasks that examine multiple lines programmatically. Here is the closest I could get:
$ sed '/^$/{n;/^$/d}' file
abc
def
ghi
jkl
The logic of this: if you find a blank line, look at the next line. If that next line is also blank, delete that next line.
This does not collapse longer runs completely: each pair of consecutive blank lines is reduced to one, so a run of three blank lines still leaves two.
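The behavior on longer runs is easy to check. In this sketch (made-up input, illustrative variable names), a run of two blank lines becomes one, but a run of three still leaves two, because the third blank line starts a fresh cycle.

```shell
# Two consecutive blank lines: the second one is deleted.
pair=$(printf '%s\n' abc '' '' def | sed '/^$/{n;/^$/d}')

# Three consecutive blank lines: the first is printed, the second deleted,
# and the third is printed again by a new cycle.
triple=$(printf '%s\n' abc '' '' '' def | sed '/^$/{n;/^$/d}')

printf 'pair:\n%s\n' "$pair"
printf 'triple:\n%s\n' "$triple"
```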
To do it in basic awk:
$ awk 'NF > 0 {blank=0} NF == 0 {blank++} blank < 2' file
abc
def
ghi
jkl
This uses a variable called blank, which is reset to zero when the number of fields (NF) is nonzero and incremented when NF is zero (a blank line). Awk's default action, printing, is performed when the number of consecutive blank lines is less than two.
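The counter logic can be exercised on a sample with runs of different lengths. This is a sketch with made-up input; the variable name squeezed is illustrative.

```shell
# A run of three blank lines and a single blank line.
# Only the first blank line of each run has blank == 1 and is printed.
squeezed=$(printf '%s\n' abc '' '' '' def '' ghi |
  awk 'NF > 0 {blank=0} NF == 0 {blank++} blank < 2')

printf '%s\n' "$squeezed"
```

Note that NF == 0 also treats whitespace-only lines as blank, unlike the /^$/ tests in the sed answers.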
Using awk (gnu or BSD) you can do:
awk -v RS= -v ORS='\n\n' '1' file
abc
def
ghi
jkl
Also using perl:
perl -pe '$/=""; s/(\n)+/$1$1/' file
abc
def
ghi
jkl
Found at "That's What I Sed" (slower than the sed '/^$/{:a;N;s/\n$//;ta}' solution above):
sed '/^$/N;/\n$/D' file
The sed script can be read as follows:
If the next line is empty, delete the current line.
And can be translated into the following pseudo-code (for the reader already familiar with sed, buffer refers to the pattern space):
 1 | # sed '/^$/N;/\n$/D' file
 2 | while not end of file :
 3 |     buffer = next line
 4 |     # /^$/N
 5 |     if buffer is empty :                          # /^$/
 6 |         buffer += "\n" + next line                # N
 7 |     end if
 8 |     # /\n$/D
 9 |     if buffer ends with "\n" :                    # /\n$/
10 |         delete first line in buffer and go to 5   # D
11 |     end if
12 |     print buffer
13 | end while
In the regular expression /^$/, the ^ and $ signs mean "beginning of the buffer" and "end of the buffer" respectively. They refer to the edges of the buffer, not to the content of the buffer.
The D command performs the following tasks: if the buffer contains newlines, delete the text of the buffer up to the first newline, and restart the program cycle (go back to the first command of the script, line 5 of the pseudo-code) without processing the rest of the commands, without printing the buffer, and without reading a new line of input.
Finally, keep in mind that sed removes the trailing newline before processing the line and that the print command adds it back. So, in the above code, if the next line to be processed is Hello World!\n, then next line implicitly refers to Hello World!.
More details at https://www.gnu.org/software/sed/manual/sed.html.
You are now ready to apply the algorithm to the following file:
a\n
b\n
\n
\n
\n
c\n
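To compare a hand trace with what sed actually does, the same six lines can be generated with printf and passed through the script. This is a minimal sketch; the variable name collapsed is illustrative.

```shell
# Lines: a, b, three empty lines, c.
# N joins an empty line with its successor; D drops the first of the two
# whenever the successor is empty as well, so only one empty line survives.
collapsed=$(printf '%s\n' a b '' '' '' c | sed '/^$/N;/\n$/D')

printf '%s\n' "$collapsed"
```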
Now let's see why the other solution is faster.
The sed script /^$/{:a;N;s/\n$//;ta} can be read as follows:
If the current line matches /^$/, then do {:a;N;s/\n$//;ta}.
Since there is nothing between ^ and $ we can rephrase like this:
If the current line is empty, then do {:a;N;s/\n$//;ta}.
It means that sed executes the following commands for each empty line:
Step | Command | Description
1 | :a | Declare a label named "a".
2 | N | Append the next line preceded by a newline (\n) to the current line.
3 | s/\n$// | Substitute (s) any trailing newline (/\n$/) with nothing (//).
4 | ta | Return to label "a" (to step 1) if a substitution was performed (at step 3), otherwise print the result and move on to the next line.
Non-empty lines are just printed as is. Knowing all this, we can describe the entire procedure with the following pseudo-code:
 1 | # sed '/^$/{:a;N;s/\n$//;ta}' file
 2 | while not end of file :
 3 |     buffer = next line
 4 |     # /^$/{:a;N;s/\n$//;ta}
 5 |     if buffer is empty :                        # /^$/
 6 |         :a                                      # :a
 7 |         buffer += "\n" + next line              # N
 8 |         if buffer ends with "\n" :              # /\n$/
 9 |             remove last "\n" from buffer        # s/\n$//
10 |             go to :a (at 6)                     # ta
11 |         end if
12 |     end if
13 |     print buffer
14 | end while
As you can see, the two sed scripts are very similar. Indeed, s/\n$//;ta is almost the same as /\n$/D. However, when the second script loops it jumps straight back to the label and does not repeat the emptiness test at line 5, so it is potentially faster than the first script. Let's time both scripts on ~10 MB of empty lines:
$ yes '' | head -10000000 > file
$ /usr/bin/time -f%U sed '/^$/N;/\n$/D' file > /dev/null
3.61
$ /usr/bin/time -f%U sed '/^$/{:a;N;s/\n$//;ta}' file > /dev/null
2.37
Second script wins.
perl -00 -pe 1 filename
That splits the input file into "paragraphs" separated by 2 or more newlines, and then prints the paragraphs separated by a single blank line:
perl -00 -pe 1 <<END
abc
def
ghi
jkl
END
abc
def
ghi
jkl
This gives you what you want using only sed:
sed '/^$/d' txt | sed -e $'s/$/\\\n/'
The first sed command removes all empty lines, denoted as "^$".
The second sed command inserts one newline character at the end of each line.
Why not just get rid of all your blank lines, then add a single blank line after each line? For an input file tmp as you specified,
sed '/^$/d' tmp|sed '0~1 a\ '
abc
def
ghi
jkl
If white space (spaces and tabs) counts as a "blank" line for you, then use sed '/^\s*$/d' tmp|sed '0~1 a\ ' instead.
Note that these solutions leave a trailing blank line at the end, as I wasn't sure whether that was desired; it is easily removed.
I wouldn't use sed for this but cat with the -s flag.
As the manual states:
-s, --squeeze-blank suppress repeated empty output lines
So all that is needed to get the desired output is:
cat -s file

How to use Linux command(sed?) to delete specific lines in a file?

I have a file that contains a matrix. For example, I have:
1 a 2 b
2 b 5 b
3 d 4 b
4 b 7 b
I know it's easy to use sed command to delete specific lines with specific strings. But what if I only want to delete those lines where the second field's value is b (i.e., second line and fourth line)?
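You can use a regex in sed (GNU sed shown; -E enables extended regexes, so that + means "one or more"):
sed -i -E 's/^[0-9]+\s+b(\s.*)?$//' xxx_file
or
sed -i -E '/^[0-9]+\s+b(\s|$)/d' xxx_file
The first form only empties the matching lines and leaves blank lines behind; the second deletes them.
The -i option modifies the file in place. Drop -i to write the result to standard output, which you can redirect to another file.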
Awk works fine for this:
awk '{if ($2 != "b") print $0;}' file
For more on awk usage, see man awk.
awk:
cat yourfile.txt | awk '{if($2!="b"){print;}}'
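Both approaches can be compared on the matrix from the question. This is a sketch under assumptions: the variable names are illustrative, and the sed pattern is one possible way to test that the second whitespace-separated field is exactly b (it needs a sed with -E).

```shell
matrix='1 a 2 b
2 b 5 b
3 d 4 b
4 b 7 b'

# awk: a bare condition prints the lines for which it is true.
kept_awk=$(printf '%s\n' "$matrix" | awk '$2 != "b"')

# sed: delete lines whose second field is exactly "b"
# (first field, separator, then b followed by a separator or end of line).
kept_sed=$(printf '%s\n' "$matrix" |
  sed -E '/^[^[:space:]]+[[:space:]]+b([[:space:]]|$)/d')

printf '%s\n' "$kept_awk"
```

The awk version is shorter because awk splits fields itself; the sed version has to spell out the field boundaries in the regex.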

Extract certain text from each line of text file using UNIX or perl

I have a text file with lines like this:
Sequences (1:4) Aligned. Score: 4
Sequences (100:3011) Aligned. Score: 77
Sequences (12:345) Aligned. Score: 100
...
I want to be able to extract the values into a new tab delimited text file:
1 4 4
100 3011 77
12 345 100
(like this but with tabs instead of spaces)
Can anyone suggest anything? Some combination of sed or cut maybe?
You can use Perl:
cat data.txt | perl -pe 's/.*?(\d+):(\d+).*?(\d+)/$1\t$2\t$3/'
Or, to save to file:
cat data.txt | perl -pe 's/.*?(\d+):(\d+).*?(\d+)/$1\t$2\t$3/' > data2.txt
Little explanation:
Regex here is in the form:
s/RULES_HOW_TO_MATCH/HOW_TO_REPLACE/
How to match = .*?(\d+):(\d+).*?(\d+)
How to replace = $1\t$2\t$3
In our case, we used the following tokens to declare how we want to match the string:
.*? - match any character ('.') zero or more times ('*'), but as few times as possible ('?'), stopping as soon as the next token in the regex (\d in our case) can match.
\d+:\d+ - match at least one digit followed by colon and another number
.*? - same as above
\d+ - match at least one digit
Additionally, if some token in the regex is in parentheses, it means "save it so I can reference it later". The first pair of parentheses will be known as '$1', the second as '$2', etc. In our case:
.*?(\d+):(\d+).*?(\d+)
    $1    $2      $3
Finally, we're taking $1, $2, $3 and printing them out separated by tab (\t):
$1\t$2\t$3
You could use sed:
sed 's/[^0-9]*\([0-9]*\)/\1\t/g' infile
Here's a BSD sed compatible version:
sed 's/[^0-9]*\([0-9]*\)/\1'$'\t''/g' infile
The above solutions leave a trailing tab in the output; append s/\t$// or s/'$'\t''$// respectively to remove it.
If you know there will always be 3 numbers per line, you could go with grep:
<infile grep -o '[0-9]\+' | paste - - -
Output in all cases:
1 4 4
100 3011 77
12 345 100
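To confirm that the separators really are tabs, the grep and paste variant can be run on the sample lines and compared against a tab-built string. This is a sketch; the variable names are illustrative, and \+ in the grep pattern is the same GNU-style syntax used above.

```shell
data='Sequences (1:4) Aligned. Score: 4
Sequences (100:3011) Aligned. Score: 77
Sequences (12:345) Aligned. Score: 100'

# grep -o prints every number on its own line;
# paste - - - joins each group of three lines with tabs.
tsv=$(printf '%s\n' "$data" | grep -o '[0-9]\+' | paste - - -)

printf '%s\n' "$tsv"
```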
My solution using sed (GNU sed, for \t in the replacement):
sed 's/[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\).*/\1\t\2\t\3/' file.txt

Resources