Vim - Reformat text to one-line paragraphs

I have a text file like the following:
--------
FOX&DOGS
The quick brown. Fox
jumped.
Over the lazy dogs.
-------------------
I want to change it as follows:
--------
FOX&DOGS
The quick brown. Fox jumped.
Over the lazy dogs.
-------------------
So in general:
preserve empty lines
break lines only after a period followed by a newline (".\n"), which marks the end of a paragraph. In the example above I don’t want to break the line after "brown.", for instance: that period is not followed by a newline, so it isn’t the end of a paragraph and the text must stay on the same line.
My solution:
%s/\n\n/#\r#\r/ | %s/\.\n/\.#\r/ | %j | s/# /\r/g | $$d
The idea is a bit crude:
mark all ends of paragraph and empty lines (I have chosen "#" as marker)
join all lines in a long single one
replace the marker "# " (note the space after #) with "\r", which inserts a newline in the replacement part of a substitute
delete last empty line created during this procedure
It seemed to work, so I also defined a user command in my vimrc:
command Par %s/\n\n/#\r#\r/ | %s/\.\n/\.#\r/ | %j | s/# /\r/g | $$d
The problem:
If there aren’t any empty lines, the first substitution fails with the error "pattern not found" and nothing is changed. It seems some sort of conditional is needed (if the pattern is found, substitute it; otherwise don't stop, and continue with the other commands).
Any idea how to solve this in a simple way?

Maybe I found a solution:
add a blank line after the last one, so that the pattern “\n\n” is always found even if it isn’t present in the original file, and the error can’t block next commands.
in the end we will have to remove 2 blank lines at the bottom created by the substitution “s/# /\r/g”
So the command I tried is:
$ | put _ | %s/\n\n/#\r#\r/ | %s/\.\n/\.#\r/ | %j | s/# /\r/g | $$d | $$d
$ go to the last line
append a blank line
mark newlines involving blank lines (also last blank line added) with # character
mark newlines involving period (the last line can’t end with period due to the marker # added at the previous step)
join all lines in a long one
replace the markers “# ” with a newline (this creates two more blank lines at the bottom, which have to be removed)
remove the two last blank lines added
Limitations:
if a paragraph ends with a punctuation mark other than “period”, it doesn’t work at all.
Any idea to improve my raw oneliner is welcome!

Related

Vi: delete only the non-first (second, third, …) occurrences of repeated lines

How do I delete every occurrence of repeated lines in a text file except the first one? (This question might be related.)
I need to keep the order, otherwise I would have used :sort u.
Example:
hsdf
asdf
csdf
csdf
hsdf
dsdf
jsdf
asdf
results in
hsdf
asdf
csdf
dsdf
jsdf
instead of
asdf
csdf
dsdf
hsdf
jsdf
Sometimes you don't need to think much. Just take that big hammer and make a bang.
So what's the plan? Loop over all lines; store them into associative array; if it's already there then delete it from buffer. Looks dumb enough to get things working by the first attempt:
:let foo = {}
:g/./if foo->has_key(getline(".")) | delete | else | let foo[getline(".")] = 1 | endif
:unlet foo
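Outside Vim, the same "remember what was already seen" idea is commonly written as an awk one-liner. The following is only a sketch, assuming a POSIX awk is available; the function name keep_first is made up for this illustration. Unlike the :g/./ version above, it also deduplicates empty lines.

```shell
#!/bin/sh
# keep_first (hypothetical helper): print each distinct line once, in
# order of first appearance. seen[$0]++ evaluates to 0 the first time a
# line is met, so !seen[$0]++ is true only for first occurrences.
keep_first() {
    awk '!seen[$0]++' "$@"
}

# Usage with the sample data from the question.
printf '%s\n' hsdf asdf csdf csdf hsdf dsdf jsdf asdf | keep_first
```

With the sample data this prints hsdf, asdf, csdf, dsdf, jsdf, which is the order-preserving result asked for.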

Using sed to delete specific lines after LAST occurrence of pattern

I have a file that looks like:
this: name
this: age
Remove these lines and space above.
Remove here too and space below
Keep everything below here.
I don't want to hardcode 2, as the number of lines containing "this" can change. How can I delete the 4 lines after the last occurrence of the string? I am trying sed -e '/this: /{n;N;N;N;N;d}', but it deletes after the first occurrence of the string.
Could you please try the following?
awk '
FNR==NR{
if($0~/this/){
line=FNR
}
next
}
FNR<=line || FNR>(line+4)
' Input_file Input_file
Output will be as follows with shown samples.
this: name
this: age
Keep everything below here.
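The two-pass idea above can be wrapped in a small shell function so that the pattern and the line count are not hardcoded. This is a sketch only: the name drop_after_last and its argument order are invented here, and the sample file assumes a blank line above and below the two "Remove" lines, as the question's wording suggests. It also adds one guard that the command above does not have: if nothing matches, the file is printed unchanged.

```shell
#!/bin/sh
# drop_after_last (hypothetical helper): delete the N lines that follow
# the last line matching PATTERN. Usage: drop_after_last PATTERN N FILE
drop_after_last() {
    # The file is named twice, so awk reads it twice.
    awk -v pat="$1" -v n="$2" '
        FNR == NR { if ($0 ~ pat) last = FNR; next }   # pass 1: remember last match
        !last || FNR <= last || FNR > last + n          # pass 2: print survivors
    ' "$3" "$3"
}

# Assumed sample layout (blank lines are an assumption, see above).
tmp=$(mktemp)
printf '%s\n' 'this: name' 'this: age' '' \
    'Remove these lines and space above.' \
    'Remove here too and space below' '' \
    'Keep everything below here.' > "$tmp"
drop_after_last '^this:' 4 "$tmp"
rm -f "$tmp"
```

Because the file has to be read twice, this form needs a regular file and cannot be fed from a pipe.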
You can also use this minor change to make your original sed command work.
sed '/^this:/ { :k ; n ; // b k ; N ; N ; N ; d }' input_file
It uses a loop which prints the current line and reads the next one (n) while it keeps matching the regex (the empty regex // recalls the latest one evaluated, i.e. /^this:/, and the command b k goes back to the label k on a match). Then you can append the next 3 lines and delete the whole pattern space as you did.
Another possibility, more concise, using GNU sed could be this.
sed '/^this:/ b ; /^/,$ { //,+3 d }' input_file
This one prints any line beginning with this: (b without label goes directly to the next line cycle after the default print action).
On the first line not matching this:, two nested ranges are triggered. The outer range is "one-shot": it is triggered right away by /^/, which matches any line, and it stays active up to the last line ($). The inner range is a "toggle" range. It is also triggered right away, because // recalls /^/ on this line (and only on this line, hence the one-shot outer range), and it stays triggered for 3 additional lines (the end address +3 is a GNU extension). After that, /^/ is no longer evaluated, so the inner range cannot trigger again: // now recalls /^this:/, and lines matching that are short-circuited early by the b command.
This might work for you (GNU sed):
sed -E ':a;/this/n;//ba;$!N;$!ba;s/^([^\n]*\n?){4}//;/./!d' file
If the pattern space (PS) contains this, print the PS and fetch the next line.
If the following line contains this, repeat.
If the current line is not the last line, append the next line and repeat.
Otherwise, remove the first four lines of the PS and print the remainder.
Unless the PS is empty in which case delete the PS entirely.
N.B. This only reads the file once. Also, the OP asks "How can I delete 4 lines after the last occurrence of the string", yet the example would seem to expect 5 lines to be deleted.

How to insert original line number in g/pattern/move

vim: insert original line number in g/pattern/move $
I'm debugging the order of events in a log and would like to compare two sets of event sequences by their line numbers in the log. Usually I use g/pattern/move $ to collect the interesting lines, but I cannot find a way to insert their original line numbers. Please help.
I tried:
g/pattern/move $; printf("%d",line("."))
but it does not work.
Can't help thinking of something very straightforward, for example:
g/pattern/call append(line('$'), line('.') . ' ' . getline('.'))
A slightly different approach: I have the following mapping in my _vimrc
nnoremap <F3> :redir! @f<cr>:silent g//<cr>:redir! END<cr>:enew!<cr>:put! f<cr>:let @f=@/<cr>:g/^$/d<cr>:let @/=@f<cr>gg
It opens a new buffer with all your search matches, including the line numbers where the matches occurred.
I have figured out a way to first insert the line number on the lines that contain the pattern and then move those lines to the end of the file:
:%s,\v^\ze.*pattern,\=line('.') . ' ' ,g | g/pattern/m$
We have two commands:
:%s,\v^\ze.*pattern,\=line('.') . ' ' ,g
, ....................... we are using comma as delimiter
\v ...................... very magic substitution
^ ....................... Regular expression for beginning of line
\ze ..................... indicates that all after it will not be substituted
\=line('.') ............. gets the line number
. ' ' .................. concatenates one space after the number
The second command is separated with |
g/pattern/m$
m$ ....................... moves each matching line to the end of the file
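For a log that is processed outside Vim, the same "number the matching lines, then move them to the end" result can be sketched with awk. This swaps in a different tool from the Vim commands above; the function name move_numbered is hypothetical.

```shell
#!/bin/sh
# move_numbered (hypothetical helper): move lines matching PATTERN to the
# end of the output, each prefixed with its original line number.
# Usage: move_numbered PATTERN [FILE...]
move_numbered() {
    pat=$1
    shift
    awk -v pat="$pat" '
        $0 ~ pat { held[++n] = NR " " $0; next }   # hold numbered matches
        { print }                                   # other lines pass through
        END { for (i = 1; i <= n; i++) print held[i] }
    ' "$@"
}

# Usage: lines 2 and 4 match and end up last, carrying their numbers.
printf '%s\n' 'start' 'ERR disk' 'ok' 'ERR net' | move_numbered ERR
```

Note that the pattern here is an awk (extended) regular expression, not a Vim pattern.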

Linux - Remove line feed

Is there a way to use a Linux command to remove the LFs shown below?
Each row should begin with the string 'F|'. Unfortunately, multiple rows in my Oracle DB are stored with an embedded LF (hex 0a), which causes line breaks when spooling.
Thanks
$grep -nvB 1 '^F|' File.txt
4720156-F|29|204380|A|16060|Telephone Updated by DCA|99996319 ,
4720157: |manual|
--
6005453-F|29|121389|A|16060|Telephone Updated by DCA|96844599 ,
6005454: |new|
--
6354243-F|29|366910|A|16060|Telephone Updated by DCA|
6354244: |new|
--
13318314-F|29|397713|A|16060|Telephone Updated by DCA|97597079 ,
13318315: ,52094436|new|
--
13471591-F|29|17945|A|16060|Telephone Updated by DCA|47990248,94291610,
13471592: |new|
--
13471607-F|29|152501|A|16060|Telephone Updated by DCA|
13471608: ,90290027,38297606|new|
--
13944867-F|29|322564|A|16060|Telephone Updated by DCA|
13944868: |new|
User@db01.test processed$
So, you want the lines which do not begin with F| to be joined to the line before (which does). A solution with sed:
sed -n '/^F|/{x;2,$p;be};x;G;s/\n//;h;:e;${g;p}' File.txt
/^F|/ If line begins with F|:
x Exchange the contents of the hold and pattern spaces
2,$p If not the first line: print the (previously held) line
be Branch to label e
Otherwise (line doesn't begin with F|):
x Exchange the contents of the hold and pattern spaces
G Append hold space to pattern space (lines joined, but still LF embedded)
s/\n// Remove the LF
h Copy pattern space (joined line) to hold space
:e Label e (both cases above get here):
$ If the last line:
g Copy hold space to pattern space
p Print the (last) line
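To see the steps above in action, the command can be tried on a few made-up rows. This is a sketch assuming GNU sed (other sed implementations may not accept `be}` and `:e;` written on one line); the function name join_continuations and the shortened sample rows are invented for the illustration.

```shell
#!/bin/sh
# join_continuations (hypothetical helper): glue every line that does not
# start with "F|" onto the record before it. Assumes GNU sed.
join_continuations() {
    sed -n '/^F|/{x;2,$p;be};x;G;s/\n//;h;:e;${g;p}' "$@"
}

# Two records, each broken across two lines by an embedded LF.
printf '%s\n' 'F|29|204380|A|99996319 ,' ' |manual|' \
    'F|29|121389|A|' ' |new|' | join_continuations
```

Each record comes out on a single line again, for example "F|29|204380|A|99996319 , |manual|". A record with several continuation lines is handled too, because the joined text is copied back to the hold space after every append.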

How to remove lines with duplicate pair of words?

I have a file with multiple columns like
abc cvn bla..bla..n_columns
xnt yuk m_columns
abc cvn xxxx
vbh ast
sth rty
xnt yuk
I want to create a new file in which lines whose word pair in the first two columns has already appeared are dropped.
The final file will look like
abc cvn bla..bla..n_columns
xnt yuk m_columns
vbh ast
sth rty
All you need is:
awk '!seen[$1,$2]++' file
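As a quick check, the one-liner can be run on the sample from the question. This is a minimal sketch; the wrapper name unique_pairs is made up for the illustration.

```shell
#!/bin/sh
# unique_pairs (hypothetical helper): keep only the first line seen for
# each (column 1, column 2) pair; later lines with the same pair are dropped.
unique_pairs() {
    awk '!seen[$1,$2]++' "$@"
}

# Sample rows from the question.
printf '%s\n' 'abc cvn bla..bla..n_columns' 'xnt yuk m_columns' \
    'abc cvn xxxx' 'vbh ast' 'sth rty' 'xnt yuk' | unique_pairs
```

The array key is built from the first two fields only, so the remaining columns play no part in deciding whether a line is a duplicate, and the original order is preserved.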
If abc cvn xxxx appears before abc cvn bla..bla..n_columns, I just want
to keep one of the lines. It does not matter to me which line is
kept. Either line will be okay.
If the output sequence doesn't matter, you can use sort
sort -u -k1,2 file
otherwise you should use awk as suggested by devnull
sed -n 'H
$ {x
s/$/\
/
: again
s/\(\n\)\([^ ]\{1,\} \{1,\}[^ [:cntrl:]]\{1,\}\)\(.*\)\1\2[^[:cntrl:]]*\n/\1\2\3\1/
t again
s/\n\(.*\)\n/\1/
p
}' YourFile
This works on any repeated pair of values in the whole text (a pair is two runs of non-space, non-control characters separated by spaces), looping for as long as a duplicate pair is found and removed.
Principle:
H appends each line from the working buffer (sed works line by line in the working buffer) to the hold buffer (there is a working buffer and a hold buffer)
$ on the last line:
x swaps the working and hold buffers, so the whole file is now in the working buffer, starting with a newline (because of the append action)
s/... adds a newline at the end (a delimiter for the later substitution)
: again sets a label (for a later branch)
s/...// is the core of the process. It searches for a pair of words at the start of a line (right after a newline) and a later line starting with the same pair; if found, it replaces the whole block with the part from the start of the block up to, but not including, the line holding the second pair
t again branches back to the label again if the previous substitution was made
s/.../ removes the newlines added at the start and end
p prints the result
sed always tries to match as much of a pattern as it can, so if a pair occurs more than twice, it removes the last occurrence first and works backwards until only one remains

Resources