I came across an example of sed simulating cat -s, which replaces two or more consecutive empty lines with a single empty line.
The command is below:
echo -e "\n-------------\nline1\n\nline2\nline3\n\n\nline4\n\n\n\nlast line\n-------------" | sed '1s/^$//p;/./,/^$/!d'
I understand that the sed program has two parts. The first one, '1s/^$//p', applies only to the first line and, if that line is empty, prints it (as an empty line). OK, I get that part.
Now, the second part, '/./,/^$/!d', seems to delete a line if it matches neither /./ (any single character) nor /^$/ (an empty line). Doesn't that cover pretty much everything? How come an empty line that follows another empty line gets deleted?
The sed manual says this:
Appending the '!' character to the end of an address specification
(before the command letter) negates the sense of the match.
The comma in /./,/^$/ does not mean "or". It is a range address, which starts at a line containing at least one character and ends at the next empty line, inclusive. Adding ! negates it, so the sed command /./,/^$/!d means "delete every line that is not inside such a range".
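Numbered, the input produced by the echo command looks like this (the leading \n makes line 1 empty):
1
2 -------------
3 line1
4
5 line2
6 line3
7
8
9 line4
10
11
12
13 last line
14 -------------
Line 1 is empty, so it never starts a range, and !d deletes it. However, 1s/^$//p has already printed it, which is why that first command is there.
The first range is lines 2-4.
The second range is lines 5-7.
The next range is lines 9-10.
The last range starts at line 13. No empty line follows it, so it runs to the end of the input (line 14).
Lines 8 and 11-12 do not fall into any of the ranges, so the negated d (!d) deletes them.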
I can think of ways to do this in other programming languages that would be clearer, but if all you've got is sed, this is an effective way to reduce redundant blank lines.
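To check the explanation above, you can wrap the sed program in a function and compare its output with cat -s on the question's sample input. This is a minimal POSIX shell sketch. The function name squeeze_blank is hypothetical, and printf stands in for echo -e, which not every sh supports.

```shell
#!/bin/sh
# squeeze_blank is a hypothetical helper name; the sed program is the
# one from the question.
squeeze_blank() {
  # 1s/^$//p   : if line 1 is empty, print it right away, because no range
  #              starts on an empty line, so !d below deletes it
  # /./,/^$/!d : keep each run from a non-empty line up to and including
  #              the next empty line; delete every other line
  sed '1s/^$//p;/./,/^$/!d' "$@"
}

# The question's sample input (printf is used instead of echo -e for portability)
sample='\n-------------\nline1\n\nline2\nline3\n\n\nline4\n\n\n\nlast line\n-------------\n'

sed_out=$(printf "$sample" | squeeze_blank)
cat_out=$(printf "$sample" | cat -s)
[ "$sed_out" = "$cat_out" ] && echo "sed output matches cat -s"
printf '%s\n' "$sed_out"
```

If the leading empty line were not printed by the first command, the sed output would start at "-------------", while cat -s keeps a single leading blank line.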
I want to do something like this:
sed "/^[^+]/ s/\(.*$1|$2.*$\)/+\ \1/" -i file
where two specific string parameters are checked in a file, and in those lines where BOTH parameters ($1 and $2) occur, a + is added at the beginning of the line if there isn't one already.
I've tried different variations so far. I end up either checking for both but then modifying every line that contains just one of the two strings, or getting errors.
I'd be thankful for any clarification regarding slash and backslash escaping (and single vs. double quotes); I guess that's where my problem lies.
Edit: Desired outcome (a folder containing a bunch of text files, one of which has the following two lines):
sudo bash MyScript.sh 01234567 Wanted
Before:
Some Random Text And A Number 01234567 and i'm Wanted.
Another Random Text with Diff Number 09812387 and i'm still Wanted.
Expected:
+ Some Random Text And A Number 01234567 and i'm Wanted.
Another Random Text with Diff Number 09812387 and i'm still Wanted.
For an input file that looks as follows:
$ cat infile
Some Random Text And A Number 01234567 and i'm Wanted.
Another Random Text with Diff Number 09812387 and i'm still Wanted.
and setting $1 and $2 to 01234567 and Wanted (in a script, these are just the first two positional parameters and don't have to be set):
$ set -- 01234567 Wanted
the following command would work:
$ sed '/^+/b; /'"$1"'/!b; /'"$2"'/s/^/+ /' infile
+ Some Random Text And A Number 01234567 and i'm Wanted.
Another Random Text with Diff Number 09812387 and i'm still Wanted.
This is how it works:
sed '
  # Skip lines that already start with "+"
  /^+/b
  # Skip lines that do not contain the first parameter
  /'"$1"'/!b
  # Prepend "+ " if the second parameter also matches
  /'"$2"'/s/^/+ /
' infile
b is the "branch" command. When used on its own (without a label to jump to), it branches to the end of the script and skips all remaining commands; the line is still printed as usual.
The first two commands skip lines that start with + or that don't contain the first parameter. If we reach the s command, we already know that the current line doesn't start with + and contains the first parameter. If it also contains the second parameter, we prepend "+ ".
For quoting, I have single quoted the whole command except for where the parameters are included:
'single quoted'"$parameter"'single quoted'
so I don't have to escape anything unusual. This assumes that the variables in the double-quoted parts don't contain any characters that are special to sed, such as regex metacharacters or /.
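The same program can be wrapped in a function so that the two strings are passed explicitly. This is a sketch under some assumptions. The name mark_lines is hypothetical. The script is written one command per line because POSIX does not allow a ; after a b command, and some seds treat the rest of the line as the label. As noted above, both arguments are used as regexes, so they must not contain / or regex metacharacters.

```shell
#!/bin/sh
# mark_lines PATTERN1 PATTERN2 [FILE...]
# Hypothetical wrapper around the answer's sed program: prefix "+ " to lines
# that match both patterns and do not already start with "+".
mark_lines() {
  local first="$1" second="$2"
  shift 2
  # One command per line: POSIX does not allow ";" after a b command
  sed '
/^+/b
/'"$first"'/!b
/'"$second"'/s/^/+ /
' "$@"
}

printf '%s\n' \
  "Some Random Text And A Number 01234567 and i'm Wanted." \
  "Another Random Text with Diff Number 09812387 and i'm still Wanted." |
  mark_lines 01234567 Wanted
```

With the two sample lines, this reproduces the expected output from the question. To edit files in place, GNU sed's -i option can be added to the sed call.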
I have a large file containing lines such as those below. Is there a sed or awk command in Unix I can use to delete each line where the length of the data in the second column is < 60? In this example, I would be left with just the first 3 lines.
I've tried, unsuccessfully so far.
Query1 1041 SVTQLTNDLFQTYLRKILS*MFKVIGCSDLLGNPLTLATN*IDGVLDLVQEPWSNS*KLS 862
Query1 1707 TTSNLTWLMQKNYMRQGILQFYKVIGSSDLLGNPIGLIDKLGSGVLEFFSEPYKGLLKPG 1767
Query1 2131 TIQTLSNLIIKNYVRQGILQFYKILGSSDILGNPIGLIDNLGTGVVEFFSEPYKGMLKPG 2191
Query1 1 VFEFFNEPAKGLLKPK 17
The following AWK script would do.
/[0-9A-Za-z]+\s+[0-9]+\s+[0-9A-Za-z*]{60,}\s+[0-9]+/ { print($0); }
The regular expression matches the lines you want to keep. If a line is matched, it is printed out. You may have to tweak the regex to match your input format more precisely. I simply took what pattern I could infer from the few examples you've shown.
The regular expression explained:
[0-9A-Za-z]+ one or more alphanumeric characters
\s+ one or more whitespace characters
[0-9]+ one or more digits
\s+ one or more whitespace characters
[0-9A-Za-z*]{60,} sixty or more alphanumeric characters and asterisks
\s+ one or more whitespace characters
[0-9]+ one or more digits
Note that \s is a GNU awk extension, and not every awk supports {n,} interval expressions, so this version is best run with gawk.
Another option would be to use
/./ { if (length($3) >= 60) print($0); }
which assumes that all lines are in the given column format. It matches any non-empty line and then prints it only if the third column is at least 60 characters wide.
In AWK, $N refers to the N-th column of the current line and $0 to the entire line. By default, columns are split at white-space.
As fedorqui points out in a comment, the more terse syntax
length($3) >= 60
may be used to achieve the same effect, since AWK's default action is to print the current line when the pattern is true. I can never remember all the shortcuts one can take in AWK…
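For reuse in a script, the terse form can take the minimum length as an awk variable. This is a POSIX shell sketch that assumes the sequence is always the third whitespace-separated field, as in the sample. The name keep_long_seqs is hypothetical; -v var=value is standard awk.

```shell
#!/bin/sh
# keep_long_seqs MIN [FILE...]
# Hypothetical helper: print only lines whose third field has at least MIN characters.
keep_long_seqs() {
  local min="$1"
  shift
  # A pattern with no action prints the line when the pattern is true;
  # "+ 0" forces a numeric comparison
  awk -v min="$min" 'length($3) >= min + 0' "$@"
}

printf '%s\n' \
  'Query1 1707 TTSNLTWLMQKNYMRQGILQFYKVIGSSDLLGNPIGLIDKLGSGVLEFFSEPYKGLLKPG 1767' \
  'Query1 1 VFEFFNEPAKGLLKPK 17' |
  keep_long_seqs 60
```

Unlike the regex version, this avoids \s and interval expressions, so it does not depend on GNU awk.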
Perl solution:
perl -ane 'print unless 60 > length $F[2]' file
-n reads the input line by line
-a splits each line into the @F array on whitespace
I’d like to merge two blocks of lines in Vim, i.e., take lines m through n and append them to lines k through l. If you prefer a pseudocode explanation: [line[k+i] + line[m+i] for i in range(min(l-k, n-m)+1)].
For example,
abc
def
...
123
45
...
should become
abc123
def45
Is there a nice way to do this without copying and pasting manually line by line?
You can certainly do all this with a single copy/paste (using block-mode selection), but I'm guessing that's not what you want.
If you want to do this with just Ex commands,
:5,8del | let l=split(@") | 1,4s/$/\=remove(l,0)/
will transform
work it
make it
do it
makes us
harder
better
faster
stronger
~
into
work it harder
make it better
do it faster
makes us stronger
~
UPDATE: An answer with this many upvotes deserves a more thorough explanation.
In Vim, you can use the pipe character (|) to chain multiple Ex commands, so the above is equivalent to
:5,8del
:let l=split(@")
:1,4s/$/\=remove(l,0)/
Many Ex commands accept a range of lines as a prefix argument - in the above case the 5,8 before the del and the 1,4 before the s/// specify which lines the commands operate on.
del deletes the given lines. It can take a register argument, but when one is not given, it dumps the lines to the unnamed register, @", just like deleting in normal mode does. let l=split(@") then splits the deleted lines into a list, using the default delimiter: whitespace. To work properly on input that has whitespace within the deleted lines, like:
more than
hour
our
never
ever
after
work is
over
~
we'd need to specify a different delimiter to prevent "work is" from being split into two list elements: let l=split(@","\n").
Finally, in the substitution s/$/\=remove(l,0)/, we replace the end of each line ($) with the value of the expression remove(l,0). remove(l,0) alters the list l, deleting and returning its first element. This lets us replace the deleted lines in the order in which we read them. We could instead replace the deleted lines in reverse order by using remove(l,-1).
An elegant and concise Ex command solving the issue can be obtained by
combining the :global, :move, and :join commands. Assuming that
the first block of lines starts on the first line of the buffer, and
that the cursor is located on the line immediately preceding the first
line of the second block, the command is as follows.
:1,g/^/''+m.|-j!
For a detailed explanation of this technique, see my answer to
an essentially identical question, “How to achieve the “paste -d '␣'”
behavior out of the box in Vim?”.
To join the blocks of lines, do the following:
Go to the third line: jj
Enter visual block mode: CTRL-v
Anchor the cursor to the end of the line (important for lines of differing length): $
Go to the end: CTRL-END
Cut the block: x
Go to the end of the first line: kk$
Paste the block here: p
The movements are not the most efficient (I'm not an expert), but this works as you wanted. Hopefully someone will post a shorter version.
Here are the prerequisites for this technique to work well:
All lines of the starting block (in the example in the question, abc and def) have the same length, XOR
the first line of the starting block is the longest, and you don't care about the additional spaces in between, XOR
the first line of the starting block is not the longest, and you first add spaces to its end so that it becomes the longest.
Here's how I'd do it (with the cursor on the first line):
qama:5<CR>y$'a$p:5<CR>dd'ajq2@a
You need to know two things:
The line number on which the first line of the second group starts (5 in my case), and
the number of lines in each group (3 in my example).
Here's what's going on:
qa records everything up to the next q into register a.
ma creates a mark on the current line.
:5<CR> goes to the first line of the next group.
y$ yanks the rest of the line.
'a returns to the mark set earlier.
$p pastes at the end of the line.
:5<CR> returns to the second group's first line.
dd deletes it.
'a returns to the mark.
jq goes down one line and stops recording.
2@a repeats the action for the remaining lines. The group has 3 lines, and the first one was already merged while recording.
As mentioned elsewhere, block selection is the way to go. But you can also use any variant of:
:!tail -n -6 % | paste -d '\0' % - | head -n 5
This method relies on the UNIX command line. The paste utility was created to handle this sort of line merging.
PASTE(1) BSD General Commands Manual PASTE(1)
NAME
paste -- merge corresponding or subsequent lines of files
SYNOPSIS
paste [-s] [-d list] file ...
DESCRIPTION
The paste utility concatenates the corresponding lines of the given input files, replacing all but the last file's newline characters with a single tab character,
and writes the resulting lines to standard output. If end-of-file is reached on an input file while other input files still contain data, the file is treated as if
it were an endless source of empty lines.
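Outside Vim, the question's pseudocode can be sketched directly with sed -n ranges and paste -d '\0', where \0 in the delimiter list stands for an empty delimiter. This is a POSIX shell sketch. merge_blocks and its K L M N FILE arguments are hypothetical and mirror the question's notation. Both blocks are cut to min(l-k, n-m)+1 lines, so paste never pads the shorter block with empty lines as described in the man page excerpt above.

```shell
#!/bin/sh
# merge_blocks K L M N FILE
# Hypothetical helper: for i = 0 .. min(L-K, N-M), print line K+i of FILE
# immediately followed by line M+i (the question's pseudocode).
merge_blocks() {
  local k="$1" l="$2" m="$3" n="$4" file="$5"
  local count=$(( (l - k < n - m ? l - k : n - m) + 1 ))
  local first
  first=$(mktemp)
  sed -n "${k},$(( k + count - 1 ))p" "$file" > "$first"
  # "-" makes paste read the second block from standard input;
  # -d '\0' joins corresponding lines with an empty delimiter
  sed -n "${m},$(( m + count - 1 ))p" "$file" | paste -d '\0' "$first" -
  rm -f "$first"
}

# Demo with the lines from the Ex-command answer (blocks 1-4 and 5-8)
demo=$(mktemp)
printf '%s\n' 'work it ' 'make it ' 'do it ' 'makes us ' \
  harder better faster stronger > "$demo"
merge_blocks 1 4 5 8 "$demo"
rm -f "$demo"
```

The result goes to standard output. Inside Vim, a filter such as :%! would be needed to replace the buffer with it.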
Sample data is the same as rampion's.
:1,4s/$/\=getline(line('.')+4)/ | 5,8d
I wouldn't make it too complicated.
I would just turn virtualedit on
(:set virtualedit=all)
Select the block starting at 123, including all the lines below it.
Put it after the first column:
abc 123
def 45
... ...
and then collapse the multiple spaces in between into a single space:
:%s/\s\{2,}/ /g
I would use complex repeats :)
Given this (with an empty line between ccc and AAA, which the } motion relies on):
aaa
bbb
ccc
AAA
BBB
CCC
With the cursor on the first line, press the following:
qa}jdd''pkJxjq
and then press @a (after which you can use @@) as many times as needed.
You should end up with:
aaaAAA
bbbBBB
cccCCC
(Plus a newline.)
Explanation:
qa starts recording a complex repeat in a
} jumps to the next empty line
jdd deletes the next line
'' goes back to the position before the last jump
p paste the deleted line under the current one
kJ append the current line to the end of the previous one
x delete the space that J adds between the combined lines; you can omit this if you want the space
j go to the next line
q end the complex repeat recording
After that, you'd use @a to run the complex repeat stored in a, and then you can use @@ to rerun the last complex repeat that was run.
There are many ways to accomplish this. I will merge the two blocks of text using either of the following two methods.
Suppose the first block starts at line 1 and the second block starts at line 10, with the cursor initially on line 1.
(\n means pressing the Enter key.)
1. abc
def
ghi
10. 123
456
789
With a macro using the commands copy, paste and join:
qaqqa:+9y\npkJjq2@a10G3dd
With a macro using the commands move (the line at line number n) and join:
qcqqc:10m .\nkJjq2@c
Given a long text file like this one (that we will call file.txt):
EDITED
1 AA
2 ab
3 azd
4 ab
5 AA
6 aslmdkfj
7 AA
How can I delete, in bash, the lines that appear more than once in the same file, keeping only the first occurrence? This is the result I want:
1 AA
2 ab
3 azd
6 aslmdkfj
I do not want any line to appear twice in a given text file. Could you show me the command, please?
Assuming whitespace is significant, the typical solution is:
awk '!x[$0]++' file.txt
(E.g., the line "ab " is not considered the same as "ab". If you want to treat whitespace differently, it is probably simplest to pre-process the data.)
--EDIT--
Given the modified question, which I'll interpret as only wanting to check uniqueness after a given column, try something like:
awk '!x[ substr( $0, 2 )]++' file.txt
This will only compare characters 2 through the end of the line, ignoring the first character (in your example, the single-digit line number). This is a typical awk idiom. We are simply building an array named x (one-letter variable names are a terrible idea in a script, but are reasonable for a one-liner on the command line) which holds the number of times a given string has been seen. Since x[key]++ evaluates to the count before incrementing it, !x[key]++ is true only the first time a string is seen, so the line is printed then and never again. In the first case, we use the entire input line contained in $0. In the second case, we only use the substring from the 2nd character onward.
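Both variants can be folded into one function by passing the character position where the comparison key starts. This is a POSIX shell sketch; dedupe_from is a hypothetical name. It relies on the behavior described above: the first time a key is seen, !x[key]++ is true and awk's default action prints the line.

```shell
#!/bin/sh
# dedupe_from START [FILE...]
# Hypothetical helper: print each line only the first time the text from
# character START to the end of the line is seen (START=1 compares whole lines).
dedupe_from() {
  local start="$1"
  shift
  # x[key]++ returns the old count (0 on first sight), so !x[key]++ is
  # true only for the first line with a given key; the default action prints it
  awk -v start="$start" '!x[substr($0, start)]++' "$@"
}

# The question's edited sample: compare from character 2, skipping the line number
printf '%s\n' '1 AA' '2 ab' '3 azd' '4 ab' '5 AA' '6 aslmdkfj' '7 AA' |
  dedupe_from 2
```

As with substr($0, 2) itself, starting at a fixed character only skips the line number while it has a single digit. For longer numbers, a field-based key would be needed.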
Try this simple script:
cat file.txt | sort | uniq
cat will output the contents of the file,
sort will put duplicate entries next to each other (note that this also changes the original order of the lines),
uniq will remove adjacent duplicate entries.
Hope this helps!
The uniq command will do what you want.
But make sure the file is sorted first, since uniq only removes duplicates that are on consecutive lines.
Like this:
sort file.txt | uniq
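This small POSIX shell demo shows why the sort step matters and what it costs. The lines function is a hypothetical stand-in for file.txt. It uses the question's values without the line-number column, since with that column no two lines would be identical. sort -u is the standard single-command form of sort | uniq.

```shell
#!/bin/sh
# Hypothetical sample: the question's values without the line-number column
# (with that column, no two lines would be identical).
lines() { printf '%s\n' AA ab azd ab AA aslmdkfj AA; }

echo "uniq alone (non-adjacent duplicates survive):"
lines | uniq
echo "sort | uniq (duplicates removed, original order lost):"
lines | sort | uniq
echo "sort -u (same lines in one command):"
lines | sort -u
```

If the original order must be kept, the awk '!x[$0]++' idiom from the previous answer avoids sorting altogether.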