Linux - Remove line feed

Is there a way to use a Linux command to remove the LFs shown below?
Each row should begin with the string 'F|'. Unfortunately, several rows in my Oracle DB are stored with a hex 0a (LF) character, which causes line breaks when the data is spooled.
Thanks
$ grep -nvB 1 '^F|' File.txt
4720156-F|29|204380|A|16060|Telephone Updated by DCA|99996319 ,
4720157: |manual|
--
6005453-F|29|121389|A|16060|Telephone Updated by DCA|96844599 ,
6005454: |new|
--
6354243-F|29|366910|A|16060|Telephone Updated by DCA|
6354244: |new|
--
13318314-F|29|397713|A|16060|Telephone Updated by DCA|97597079 ,
13318315: ,52094436|new|
--
13471591-F|29|17945|A|16060|Telephone Updated by DCA|47990248,94291610,
13471592: |new|
--
13471607-F|29|152501|A|16060|Telephone Updated by DCA|
13471608: ,90290027,38297606|new|
--
13944867-F|29|322564|A|16060|Telephone Updated by DCA|
13944868: |new|

So, you want the lines which do not begin with F| to be joined to the line before (which does). A solution with sed:
sed -n '/^F|/{x;2,$p;be};x;G;s/\n//;h;:e;${g;p}' File.txt
/^F|/ If line begins with F|:
x Exchange the contents of the hold and pattern spaces
2,$p If not the first line: print the (previously held) line
be Branch to label e
Otherwise (line doesn't begin with F|):
x Exchange the contents of the hold and pattern spaces
G Append hold space to pattern space (lines joined, but still LF embedded)
s/\n// Remove the LF
h Copy pattern space (joined line) to hold space
:e Label e (both cases above get here):
$ If the last line:
g Copy hold space to pattern space
p Print the (last) line
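For comparison, here is a rough POSIX awk equivalent of the sed script above. It is a sketch, not part of the original answer: it assumes, as the question says, that every real row starts with F| and that any other line continues the row before it. The function name join_lf_rows is hypothetical. Instead of the hold space it keeps the current row in a variable and prints that row only when the next F| line (or the end of input) shows it is complete.

```shell
# join_lf_rows (hypothetical helper): glue every line that does not start
# with "F|" onto the preceding line, i.e. drop the embedded LF.
# Reads the named files, or stdin if none are given.
join_lf_rows() {
  awk '
    /^F[|]/ {                      # start of a new row
      if (NR > 1) print row        # flush the previous (now complete) row
      row = $0
      next
    }
    { row = row $0 }               # continuation line: append without a newline
    END { if (NR > 0) print row }  # flush the last row
  ' "$@"
}

# Demo on two of the rows from the question
printf 'F|29|204380|A|16060|Telephone Updated by DCA|99996319 ,\n |manual|\nF|29|121389|A|16060|Telephone Updated by DCA|96844599 ,\n |new|\n' | join_lf_rows
```

To fix the spool file, run something like join_lf_rows File.txt > File.joined.txt.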

Related

How to append lines that match two patterns to the previous line in a file?

I have a CSV file in which what is supposed to be a single line is split across several lines. I need help finding a way to join the split lines. Also, the number of fields (separated by ,) is not fixed.
A correct line has the following pattern:
X,X,X,"()",Y,H where X stands for any number of fields. However, the end of the string ("()",Y,H) is fixed. Y and H are both one word.
The issue is that this line can appear as (or any variant of this):
X,X,
X, "()"
,Y,H
What I need is a way (awk, sed) of appending the lines that don't have 24 or more commas and do not end with ",Y,H, to the previous line.
Please bear in mind that it's a large file, although I have 256 GB of RAM.
Example
Correct lines
a, b, c, "()", h, k
a, b, c, d, "()", h, k
Same lines in the file
First line
a, b, c,
"()", h, k
Second line
a, b, c, d, "()"
, h
, k
So far I've tried this (not working):
awk '/"[:space:]*,[:space:]*[:alpha:]+[:space:]*,[:space:]*[:alpha:]+$/{print}' check.csv
to try to find the lines ending with ", X, Y where X and Y are words.
Also, as the minimum number of correct fields is 24, I've used:
awk 'NF<24{print}' check.csv
to filter out lines with fewer than 24 fields.
My idea is to detect lines that match both regular expressions and append them to the previous line.
Thank you!
This might work for you (GNU sed):
sed '/"()", *[^,]\+, *[^,]\+$/b;:a;N;s/\n//;/"()", *[^,]\+, *[^,]\+$/!ba;P;D' file
Do not process a correct line, just bail out.
Otherwise, append the next line, remove the introduced newline and try to match again.
Repeat until a match, then print/delete the first line and repeat.
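The append-until-complete loop of the sed answer can also be sketched in POSIX awk, which avoids the GNU-only \+. This is not from either answer; like the sed version, it assumes a record is complete once it ends with "()" followed by two more comma-separated fields, and it concatenates the pieces without adding any separator. join_split_csv is a hypothetical name.

```shell
# join_split_csv (hypothetical helper): concatenate physical lines until the
# buffer ends with "()", <field>, <field>, then print it as one record.
join_split_csv() {
  awk '
    { buf = buf $0 }                         # append this piece, dropping its newline
    buf ~ /"[(][)]", *[^,]+, *[^,]+$/ {      # is the record complete now?
      print buf
      buf = ""
    }
    END { if (buf != "") print buf }         # leftover incomplete record, if any
  ' "$@"
}

printf 'a, b, c,\n"()", h, k\na, b, c, d, "()"\n, h\n, k\n' | join_split_csv
```

Only the record being assembled is held in memory, so the size of the file does not matter. As with the sed command, the first sample comes out as a, b, c,"()", h, k, because no space is inserted at the join.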
perl -lanF, -e 'push @L, grep length, @F; if ($L[-3] eq q/"()"/) { print join ",", @L; @L=() }' file
use -l -n -e to loop over input lines w/o printing, append linebreaks to output
use -a -F, to create @F array by splitting input on commas
push @L, grep length, @F push nonempty fields onto @L
if ($L[-3] eq q/"()"/) - if the 3rd to last accumulated field is the magic marker:
print join ",", @L print all of @L joined with commas
@L=() reset @L

Using sed to delete specific lines after LAST occurrence of pattern

I have a file that looks like:
this: name
this: age
Remove these lines and space above.
Remove here too and space below
Keep everything below here.
I don't want to hardcode 2, as the number of lines containing "this" can change. How can I delete the 4 lines after the last occurrence of the string? I am trying sed -e '/this: /{n;N;N;N;N;d}' but it deletes the lines after the first occurrence of the string.
Could you please try the following.
awk '
FNR==NR{
if($0~/this/){
line=FNR
}
next
}
FNR<=line || FNR>(line+4)
' Input_file Input_file
With the shown samples, the output is as follows.
this: name
this: age
Keep everything below here.
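The two-pass idea can be wrapped in a small reusable function. This sketch (delete_after_last is a hypothetical name) takes the pattern and the number of lines as parameters and adds one guard that is not in the answer: if nothing matches, line stays unset (compared as 0), and the original condition FNR<=line || FNR>(line+4) would then drop the first 4 lines of the file; here the file is printed unchanged instead. The sample assumes the question's file had a blank line above and below the two "Remove" lines.

```shell
# delete_after_last (hypothetical helper): print FILE without the N lines that
# follow the last line matching the ERE PATTERN. FILE is read twice, so it must
# be a regular file, not a pipe.
# Usage: delete_after_last PATTERN N FILE
delete_after_last() {
  awk -v pat="$1" -v n="$2" '
    FNR == NR { if ($0 ~ pat) last = FNR; next }   # pass 1: line number of last match
    last == 0 || FNR <= last || FNR > last + n     # pass 2: print all but the N lines after it
  ' "$3" "$3"
}

sample=$(mktemp)
printf 'this: name\nthis: age\n\nRemove these lines and space above.\nRemove here too and space below\n\nKeep everything below here.\n' > "$sample"
delete_after_last '^this:' 4 "$sample"
rm -f "$sample"
```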
You can also use this minor change to make your original sed command work.
sed '/^this:/ { :k ; n ; // b k ; N ; N ; N ; d }' input_file
It uses a loop which prints the current line and reads the next one (n) while it keeps matching the regex (the empty regex // recalls the latest one evaluated, i.e. /^this:/, and the command b k goes back to the label k on a match). Then you can append the next 3 lines and delete the whole pattern space as you did.
Another, more concise possibility using GNU sed:
sed '/^this:/ b ; /^/,$ { //,+3 d }' input_file
This one prints any line beginning with this: (b without a label branches to the end of the script, so the default print action runs and the next cycle starts).
On the first line not matching this:, two nested ranges are triggered. The outer range is "one-shot": it triggers immediately because /^/ matches any line, and then stays active up to the last line ($). The inner range is a "toggle" range. It also triggers immediately, because // recalls /^/ on this line (and only on this line, hence the one-shot outer range), and then stays triggered for 3 additional lines (the end address +3 is a GNU extension). After that, /^/ is no longer evaluated, so the inner range cannot trigger again: // now recalls /^this:/, and any line that would match it has already been branched away by the first command.
This might work for you (GNU sed):
sed -E ':a;/this/n;//ba;$!N;$!ba;s/^([^\n]*\n?){4}//;/./!d' file
If the pattern space (PS) contains this, print the PS and fetch the next line.
If the following line contains this, repeat.
If the current line is not the last line, append the next line and repeat.
Otherwise, remove the first four lines of the PS and print the remainder,
unless the PS is then empty, in which case delete it entirely.
N.B. This only reads the file once. Also the OP says
How can I delete 4 lines after the last occurrence of the string
However, the example seems to expect 5 lines to be deleted.
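The last sed answer notes that it reads the file only once. A similar single-pass approach in awk works on a pipe as well. In this sketch (not from the answers; delete_after_last_stream is a hypothetical name), lines after a match are held back; a later match proves they do not follow the last occurrence, so they are flushed, and at the end of input the first N held lines are dropped. Memory use grows with the amount of text after the most recent match.

```shell
# delete_after_last_stream (hypothetical helper): single-pass version; reads
# stdin and drops the N lines after the last line matching the ERE PATTERN.
# Usage: ... | delete_after_last_stream PATTERN N
delete_after_last_stream() {
  awk -v pat="$1" -v n="$2" '
    $0 ~ pat {                           # a new match: held lines are not after the LAST one
      for (i = 1; i <= cnt; i++) print held[i]
      cnt = 0
      seen = 1
      print
      next
    }
    seen { held[++cnt] = $0; next }      # after a match: hold until we know more
    { print }                            # before the first match: pass straight through
    END { for (i = n + 1; i <= cnt; i++) print held[i] }  # skip the first N held lines
  '
}

printf 'this: name\nthis: age\n\nRemove these lines and space above.\nRemove here too and space below\n\nKeep everything below here.\n' |
  delete_after_last_stream '^this:' 4
```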

delete a line after a pattern only if it is blank using sed or awk

Using sed or awk, I want to delete a blank line only if it directly follows a line matching my pattern.
For example, if I have
G

O TO P999-ERREUR
END-IF.
The pattern in this case is G.
I want this output:
G
O TO P999-ERREUR
END-IF.
This will do the trick:
$ awk -v n=-2 'NR==n+1 && !NF{next} /G/ {n=NR}1' file
G
O TO P999-ERREUR
END-IF.
Explanation:
-v n=-2 # Set n=-2 before the script runs; with the default n=0, a blank first line would wrongly be skipped
NR == n+1 # If the current line number is equal to the matching line + 1
&& !NF # And the line is empty
{next} # Skip the line (don't print it)
/G/ # The regular expression to match
{n = NR} # Save the current line number in the variable n
1 # Always-true pattern, used as shorthand to print every (non-skipped) line
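The same rule can be sketched with a flag that records whether the previous line matched, instead of saving its line number, which removes the need for the n=-2 initialization. This is an alternative formulation, not the answer's code; delete_blank_after is a hypothetical name and the pattern is passed with -v. Like the answer, it treats a line containing only spaces or tabs as blank (!NF), and it removes at most one blank line per match.

```shell
# delete_blank_after (hypothetical helper): drop a blank line only when it
# directly follows a line matching the ERE PATTERN. Reads stdin.
delete_blank_after() {
  awk -v pat="$1" '
    prev && !NF { prev = 0; next }   # blank line right after a match: skip it
    { prev = ($0 ~ pat); print }     # remember whether this line matched
  '
}

printf 'G\n\nO TO P999-ERREUR\nEND-IF.\n' | delete_blank_after 'G'
```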
Using sed
sed '/GG/{N;s/\n$//}' file
If it sees GG, it reads the next line and removes the newline between them if that line is empty.
Note that this removes only one blank line after the match, and that line must be completely empty, i.e. it must not contain spaces or tabs.
This might work for you (GNU sed):
sed -r 'N;s/(G.*)\n\s*$/\1/;P;D' file
Keep a moving window of two lines throughout the length of the file and remove a newline (and any whitespace) if it follows the intended pattern.
Using ex (edit in-place):
ex +'/G/j' -cwq foo.txt
or print to the standard output (from file or stdin):
ex -s +'/GG/j|%p|q!' file_or_/dev/stdin
where:
/GG/j - joins the next line when the pattern is found
%p - prints the buffer
q! - quits
For conditional checking (if there is a blank line), try:
ex -s +'%s/^\(G\)\n/\1/' +'%p|q!' file_or_/dev/stdin

How to remove lines with duplicate pairs of words?

I have a file with multiple columns like
abc cvn bla..bla..n_columns
xnt yuk m_columns
abc cvn xxxx
vbh ast
sth rty
xnt yuk
I want to create a new file by comparing the word pairs in the first two columns and removing the lines that repeat a pair.
The final file will look like
abc cvn bla..bla..n_columns
xnt yuk m_columns
vbh ast
sth rty
All you need is:
awk '!seen[$1,$2]++' file
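The one-liner relies on a common awk idiom: seen[$1,$2]++ evaluates to the count before incrementing, so !seen[$1,$2]++ is true only the first time a pair appears, and a true pattern with no action prints the line. A long-hand sketch of the same logic (dedupe_pairs is a hypothetical name):

```shell
# dedupe_pairs (hypothetical helper): print each line whose first two fields
# have not been seen before; later lines with the same pair are dropped.
dedupe_pairs() {
  awk '
    {
      key = $1 SUBSEP $2          # the same key awk builds for seen[$1,$2]
      if (!(key in seen)) {       # first occurrence of this pair
        seen[key] = 1
        print
      }
    }
  ' "$@"
}

printf 'abc cvn bla..bla..n_columns\nxnt yuk m_columns\nabc cvn xxxx\nvbh ast\nsth rty\nxnt yuk\n' | dedupe_pairs
```

The first occurrence of each pair wins and the input order is preserved; memory use is one array entry per distinct pair.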
If abc cvn xxxx appears before abc cvn bla..bla..n_columns, I just want
to keep one of the lines. It does not matter to me which line is
kept. Either line will be okay.
If the output order doesn't matter, you can use sort:
sort -u -k1,2 file
Otherwise, use awk as suggested by devnull.
sed -n 'H
$ {x
s/$/\
/
: again
s/\(\n\)\([^ ]\{1,\} \{1,\}[^ [:cntrl:]]\{1,\}\)\(.*\)\1\2[^[:cntrl:]]*\n/\1\2\3\1/
t again
s/\n\(.*\)\n/\1/
p
}' YourFile
This works on the whole text at once: it loops as long as it finds a repeated pair (a pair being two runs of non-space characters separated by spaces) and removes the repeat.
Principle:
H Append each line (sed works line by line in the pattern space) to the hold space (sed has two buffers: the pattern space and the hold space)
$ On the last line:
x Swap the pattern and hold spaces, so the whole file is now in the pattern space, starting with a newline (added by the H appends)
s/... Add a newline at the end (used as a delimiter by the later substitution)
: again Set a label (target for a later jump)
s/...// The core of the process. Search for a pair of words at the start of a line (after a newline) and a later line starting with the same pair; if found, replace the whole block (from the first pair up to the newline ending the line that starts with the second pair) with the part from the start of the block up to, but not including, the second pair.
t again If the previous substitution was made, jump back to label again
s/.../ Remove the newlines added at the start and end
p Print the result
Because sed regex matching is greedy (it always takes the longest match), if a pair occurs more than twice, the last occurrence is removed first, and the loop repeats until only one remains.

Can you explain this sed one-liner?

The following one liner prints out the content of the file in reverse
$ sed -n '1!G;h;$p' test.txt
How is it possible when sed reads the file line by line? Can you explain the meaning of
n flag
1!
G
h
and $p
in this command?
This does the same job as tac, i.e. it reverses the order of the lines.
Rewriting the sed script to pseudocode, it means:
$line_number = 1;
foreach ($input in $input_lines) {
    // current input line is in $input
    if ($line_number != 1)                  // 1!
        $input = $input + '\n' + $hold;     // G
    $hold = $input;                         // h
    $line_number++;
}
print $input;                               // $p
As you can see, the sed language is very expressive :-) The 1! and $ are so-called addresses, which restrict when a command is run: 1! means "not on the first line" and $ means "on the last line". Sed has one auxiliary buffer, called the hold space.
For more information, type info sed at a Linux console (this is the best documentation).
-n disables the automatic printing of $input that sed would otherwise do at the end of each loop iteration.
The terms pattern space and hold space are equivalents of the variables $input and $hold (respectively) in this example.
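The pseudocode translates almost line for line into awk, with a variable playing the role of the hold space. This is only an illustration of what the sed script computes, not how sed works internally; reverse_lines is a hypothetical name.

```shell
# reverse_lines (hypothetical helper): awk transcription of sed -n '1!G;h;$p'
reverse_lines() {
  awk '
    {
      line = $0                              # pattern space = current input line
      if (NR != 1) line = line "\n" hold     # 1!G : append newline + hold space
      hold = line                            # h   : copy pattern space to hold space
    }
    END { if (NR > 0) print hold }           # $p  : print once, after the last line
  ' "$@"
}

printf 'one\ntwo\nthree\n' | reverse_lines   # same output as: sed -n '1!G;h;$p'
```

As with the sed version, the whole file accumulates in memory (here in hold), so this is practical only for inputs that fit in RAM.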
n flag -> Disable auto-printing.
1! -> Any line except the first one.
G -> Append a newline and content of 'hold space' to 'pattern space'
h -> Replace content of 'hold space' with content of 'pattern space'
$ -> Last line.
p -> print
So, it means: Reverse the content of your file, as I understand it.
EDIT to add some explanation (thanks to potong, see his comment for the original one):
Addresses such as 1 and $ are bound to the command that follows them: either a single command or a group enclosed in {...}. So in this case 1! applies to G and $ applies to p, whereas h has no address and runs on every line. That is, $!G and $!{G} are the same.
