Remove new line character by checking the expression, using sed - linux

I have to write a script that updates a file as follows.
raw file:
<?blah blah blah?>
<pen>
<?pineapple?>
<apple>
<pen>
Final file:
<?blah blah blah?><pen>
<?pineapple?><apple><pen>
Wherever a newline character in the file is not followed by
<?
the newline has to be removed, so that the line is appended to the end of the previous line.
Also it will be really helpful if you explain how your sed works.

Perl solution:
perl -pe 'chomp; substr $_, 0, 0, "\n" if $. > 1 && /^<\?/'
-p reads the input line by line and prints each line after the changes.
chomp removes the final newline.
substr with 4 arguments modifies the input string; here it prepends a newline if the line is not the first one ($. is the input line number) and starts with <?.
Because every newline is chomped, the output does not end with a newline.

Sed solution:
sed ':a;N;$!ba;s/\n\(<[^?]\)/\1/g' file > newfile
The basic idea is to replace every
\n followed by < not followed by ?
with what you matched except the \n.
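Since the question asks how the sed command works, here is a sketch that runs it on the sample input with each part annotated. It assumes GNU sed (not every sed accepts the one-line `:a;N;$!ba` form); the temporary file and the variable name `joined` exist only for illustration.

```shell
# Sample input from the question, written to a temporary file.
input=$(mktemp)
printf '%s\n' '<?blah blah blah?>' '<pen>' '<?pineapple?>' '<apple>' '<pen>' > "$input"

# :a     define a label named "a"
# N      append the next input line to the pattern space
# $!ba   unless this is the last line, branch back to "a"
#        (so the whole file ends up in the pattern space)
# s///g  drop every newline that is followed by "<" and a non-"?" character
joined=$(sed ':a;N;$!ba;s/\n\(<[^?]\)/\1/g' "$input")

printf '%s\n' "$joined"
rm -f "$input"
```

Because the whole file is held in memory at once, this approach is best suited to files of moderate size.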

If you are happy with a solution that puts every <? at the start of a line, you can combine tr with sed.
tr -d '\n' < inputfile | sed 's/<?/\n&/g;$s/$/\n/'
Explanation:
I use tr ... < inputfile rather than cat inputfile | tr ..., which avoids an extra call to cat.
The sed command has 2 parts.
In s/<?/\n&/g, \n inserts a newline and & inserts the matched string (in this case always <?, so it only saves one character).
With $s/$/\n/ a newline is appended at the end of the last line. Note that a file starting with <? produces an empty first line in the output.
EDIT: If you only want newlines before <? where they were already present,
you can use awk:
awk 'NR > 1 && /^<\?/ {print ""} {printf("%s",$0)} END {print ""}'
Explanation:
Consider the newline as the start of the line, not the end. Then your question turns into "write a newline when the line starts with <?". You must escape the ? and use ^ for the start of the line. A bare print would output the whole line, so print "" is used to emit only the newline, and NR > 1 keeps the output from starting with an empty line.
awk 'NR > 1 && /^<\?/ {print ""}'
Next, print the line you read without a newline character (printf).
Finally, you want a newline at the end (print "" in the END block).
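A sketch of the newline-as-prefix idea, run on the question's sample. It emits the bare newline with `print ""` and skips the first line with `NR > 1`; the temporary file and the variable name `result` are illustrative only.

```shell
input=$(mktemp)
printf '%s\n' '<?blah blah blah?>' '<pen>' '<?pineapple?>' '<apple>' '<pen>' > "$input"

# Rule 1: before every line starting with "<?" (except the first), emit a newline.
# Rule 2: print every line without its trailing newline.
# END:    terminate the last output line.
result=$(awk 'NR > 1 && /^<\?/ { print "" }
              { printf "%s", $0 }
              END { print "" }' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

Unlike the `tr` pipeline, this only breaks lines where the input already had a newline before `<?`.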

Related

Create newline before line matching a pattern, if there is no newline yet

I have a text file with a few lines in it. What I am trying to do is find all lines matching a pattern and, if the line before them is not empty, insert an empty line there.
Something like this, but it is not working properly:
sed -i '/[a-zA-Z0-9]/{N;/PATTERN/{s/PATTERN/\nPATTERN/}}' FILENAME
I know it could probably be done more easily and nicely in awk or perl/bash, but I would prefer a one-line/one-step solution.
Sample input file:
LINE1
LINE2
PATTERN
LINE3
PATTERN
LINE4
Expected output:
LINE1
LINE2

PATTERN
LINE3

PATTERN
LINE4
I'm not very good at sed but here's how I'd do it in awk:
awk 'prev != "" && /PATTERN/ { print "" } { prev = $0; print }' file
If prev (the previous line) is not empty and the current line matches /PATTERN/ then print a blank line. Unconditionally save the current line for comparison with the next, and print the current line.
To achieve an "in-place" edit (like sed -i), just redirect the command to a temporary file and then overwrite the original:
awk 'prev != "" && /PATTERN/ { print "" } { prev = $0; print }' file > tmp && mv tmp file
Note that since prev is initially unset, this won't print a newline at the start of the output, even if the first line matches /PATTERN/. To get around this, you can change the condition to:
(NR == 1 || prev != "") && /PATTERN/
You can also achieve the in-place edit with GNU awk, using the -i inplace option.
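A small sketch of both conditions in action. The input here is an assumption for illustration: it already has an empty line before the second PATTERN, to show that no second empty line is added. The variable names `spaced` and `first` are hypothetical.

```shell
input=$(mktemp)
# Assumed sample: the second PATTERN is already preceded by an empty line.
printf '%s\n' LINE1 LINE2 PATTERN LINE3 '' PATTERN LINE4 > "$input"

# Insert an empty line only where the previous line is non-empty.
spaced=$(awk 'prev != "" && /PATTERN/ { print "" } { prev = $0; print }' "$input")

# Variant that also inserts an empty line when the very first line matches.
first=$(printf '%s\n' PATTERN LINE1 |
        awk '(NR == 1 || prev != "") && /PATTERN/ { print "" } { prev = $0; print }')

printf '%s\n' "$spaced"
rm -f "$input"
```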
Take a look at this GNU sed (note that awk is a better tool for the job):
sed -i '/PATTERN/{x;/^$/!i\

x};h' input
The line after i\ is intentionally empty: it is the text that the i command inserts.
h is a command that saves the contents of the pattern space into the hold buffer. It saves the line at the end of each cycle so that it can be used as the "previous" line in the next cycle.
x exchanges the contents of the hold and pattern spaces. Whenever the current line matches your /PATTERN/, the previously saved line is put into the pattern space. If the previous line is NOT empty (/^$/!), a newline is inserted with the i command. The current line is then put back into the pattern space with the second x command.
If you want to add a newline even if the first line matches /PATTERN/, use:
sed -i '/PATTERN/{1h;x;/^$/! ...
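The hold-space mechanics can be tried out with the sketch below. It is a variant, not the command from the answer: instead of `i` it prepends the newline with `s/^/\n/`, which keeps everything on one line but assumes GNU sed. The sample input (with an empty line already before the second PATTERN) and the variable name `spaced` are assumptions for illustration.

```shell
input=$(mktemp)
printf '%s\n' LINE1 LINE2 PATTERN LINE3 '' PATTERN LINE4 > "$input"

# On a PATTERN line:
#   x            bring the previous line into the pattern space
#   /^$/!{...}   if it is not empty: swap back, prepend a newline, swap again
#   x            restore the current line
# On every line:
#   h            remember the current line as "previous" for the next cycle
spaced=$(sed '/PATTERN/{x;/^$/!{x;s/^/\n/;x};x};h' "$input")

printf '%s\n' "$spaced"
rm -f "$input"
```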
Further reading:
GNU sed: Less Frequently-Used Commands
grymoire.com sed tutorial

replace string in a file with a string from within the same file

I have a file like this (tens of variables) :
PLAY="play"
APPS="/opt/play/apps"
LD_FILER="/data/mysql"
DATA_LOG="/data/log"
I need a script that will output the variables into another file like this (with space between them):
PLAY=${PLAY} APPS=${APPS} LD_FILER=${LD_FILER}
Is it possible ?
I would say:
$ awk -F= '{printf "%s=${%s} ", $1,$1} END {print ""}' file
PLAY=${PLAY} APPS=${APPS} LD_FILER=${LD_FILER} DATA_LOG=${DATA_LOG}
This loops through the file and prints the content before = in a format var=${var} together with a space. At the end, it prints a new line.
Note this leaves a trailing space at the end of the line. If this matters, we can check how to improve it.
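One possible way to avoid the trailing space is to print the separator before each item rather than after it. This is only a sketch; the variable names `sep` and `line` and the temporary file are illustrative.

```shell
vars=$(mktemp)
cat > "$vars" <<'EOF'
PLAY="play"
APPS="/opt/play/apps"
LD_FILER="/data/mysql"
DATA_LOG="/data/log"
EOF

# sep is empty for the first record and a single space afterwards,
# so no space is printed after the last item.
line=$(awk -F= '{ printf "%s%s=${%s}", sep, $1, $1; sep = " " } END { print "" }' "$vars")

printf '%s\n' "$line"
rm -f "$vars"
```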
< input sed -e 's/\(.*\)=.*/\1=${\1}/' | tr \\n \ ; echo
sed 's/\([^=]*\)=.*/\1=${\1}/;H;$!d
x;y/\n/ /;s/.//' YourFile
Your sample output excludes the last line, so if that is important:
sed '/DATA_LOG=/!{
s/\([^=]*\)=.*/\1=${\1}/;H
}
$!d
x;y/\n/ /;s/.//' YourFile

replace word in line only if line number start with + csv file

I use the following sed command to replace a string in a CSV line
(the condition for replacing the string is that the number at the beginning of the CSV line matches):
SERIAL_NUM=1
sed "/$SERIAL_NUM/ s//OK/g" file.csv
The problem is that I want to match only a number at the beginning of the line,
but sed also matches other lines that contain this number.
Example:
In this example I want to replace the word STATUS with OK, but only in the line that starts with 1 (before the "," separator),
so I do this:
SERIAL_NUM=1
more file.csv
1,14556,43634,266,242,def,45,STATUS
2,4345,1,43,57,24,657,SD,STATUS
3,1,WQ,435,676,90,3,44f,STATUS
sed -i "/$SERIAL_NUM/ s/STATUS/OK/g" file.csv
more file.csv
1,14556,43634,266,242,def,45,OK
2,4345,1,43,57,24,657,SD,OK
3,1,WQ,435,676,90,3,44f,OK
But sed also replaces STATUS with OK in line 2 and line 3 (because those lines contain the number 1).
Please advise how to change the sed syntax so that it matches only the number that starts the line, before the "," separator.
Remark: the solution can also be a Perl one-liner or awk.
You can use the anchor ^ to make sure $SERIAL_NUM only matches at the start of the line, and put a , after it to make sure the number is immediately followed by a comma:
sed "/^$SERIAL_NUM,/s/STATUS/OK/g" file.csv
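A quick check of the anchored address on the sample data. This sketch assumes SERIAL_NUM contains digits only (any regex metacharacters in it would be interpreted by sed); the temporary file and the variable name `updated` are illustrative.

```shell
csv=$(mktemp)
cat > "$csv" <<'EOF'
1,14556,43634,266,242,def,45,STATUS
2,4345,1,43,57,24,657,SD,STATUS
3,1,WQ,435,676,90,3,44f,STATUS
EOF

SERIAL_NUM=1
# ^ anchors the number to the start of the line; the comma ends the first field,
# so 1 does not match 10, 11, ... or a 1 in a later field.
updated=$(sed "/^$SERIAL_NUM,/s/STATUS/OK/g" "$csv")

printf '%s\n' "$updated"
rm -f "$csv"
```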
Since this answer was ranked fifth in the Stackoverflow perl report but had no perl content, I thought it would be useful to add the following - instead of removing the perl tag :-)
#!/usr/bin/env perl
use strict;
use warnings;
while(<DATA>){
s/STATUS/OK/g if /^1\,/;
print ;
}
__DATA__
1,14556,43634,266,242,def,45,STATUS
2,4345,1,43,57,24,657,SD,STATUS
3,1,WQ,435,676,90,3,44f,STATUS
or as one line (-p instead of -n, so that every line is printed after the substitution):
perl -pe 's/STATUS/OK/g if /^1\,/;' file.csv

sed to insert on first match only

UPDATED:
Using sed, how can I insert (NOT SUBSTITUTE) a new line on only the first match of keyword for each file.
Currently I have the following but this inserts for every line containing Matched Keyword and I want it to only insert the New Inserted Line for only the first match found in the file:
sed -ie '/Matched Keyword/ i\New Inserted Line' *.*
For example:
Myfile.txt:
Line 1
Line 2
Line 3
This line contains the Matched Keyword and other stuff
Line 4
This line contains the Matched Keyword and other stuff
Line 6
changed to:
Line 1
Line 2
Line 3
New Inserted Line
This line contains the Matched Keyword and other stuff
Line 4
This line contains the Matched Keyword and other stuff
Line 6
You can sort of do this in GNU sed:
sed '0,/Matched Keyword/{/Matched Keyword/s/^/New Inserted Line\n/}'
But it's not portable. Since portability is good, here it is in awk:
awk '/Matched Keyword/ && !x {print "Text line to insert"; x=1} 1' inputFile
Or, if you want to pass a variable to print:
awk -v "var=$var" '/Matched Keyword/ && !x {print var; x=1} 1' inputFile
These both insert the text line before the first occurrence of the keyword, on a line by itself, per your example.
Remember that with both sed and awk, the matched keyword is a regular expression, not just a keyword.
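If the keyword should be treated as a plain string rather than a regular expression, one option is awk's `index()` function. The sketch below assumes that approach; the variable names `kw`, `new`, `done` and `out` are hypothetical, and note that `awk -v` still interprets backslash escapes in the assigned values.

```shell
sample=$(mktemp)
printf '%s\n' 'Line 1' 'Line 2' 'Line 3' \
  'This line contains the Matched Keyword and other stuff' 'Line 4' \
  'This line contains the Matched Keyword and other stuff' 'Line 6' > "$sample"

# index($0, kw) is non-zero when kw occurs literally in the line;
# the done flag makes sure only the first such line triggers the insert.
out=$(awk -v kw='Matched Keyword' -v new='New Inserted Line' \
      '!done && index($0, kw) { print new; done = 1 } 1' "$sample")

printf '%s\n' "$out"
rm -f "$sample"
```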
UPDATE:
Since this question is also tagged bash, here's a simple solution that is pure bash and doesn't require sed:
#!/bin/bash
n=0
while IFS= read -r line; do
  if [[ "$line" =~ 'Matched Keyword' && $n = 0 ]]; then
    echo "New Inserted Line"
    n=1
  fi
  echo "$line"
done
As it stands, this works as a filter in a pipe. You can easily wrap it in something that acts on files instead.
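One way such a wrapper might look is sketched below. The function name `insert_before_first` and its argument order are made up for illustration, and the match uses a `case` glob (a literal substring test) instead of `[[ =~ ]]` so that the function also runs under a plain POSIX shell.

```shell
# Usage: insert_before_first KEYWORD NEW_LINE FILE
# Prints FILE with NEW_LINE inserted before the first line containing KEYWORD.
insert_before_first() {
  keyword=$1 new_line=$2 file=$3
  inserted=0
  # "|| [ -n "$line" ]" also handles a last line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    if [ "$inserted" -eq 0 ]; then
      case $line in
        *"$keyword"*) printf '%s\n' "$new_line"; inserted=1 ;;
      esac
    fi
    printf '%s\n' "$line"
  done < "$file"
}

sample=$(mktemp)
printf '%s\n' 'Line 1' 'has the Matched Keyword' 'Line 3' 'has the Matched Keyword' > "$sample"
out=$(insert_before_first 'Matched Keyword' 'New Inserted Line' "$sample")
printf '%s\n' "$out"
rm -f "$sample"
```

To edit a file in place, the output can be redirected to a temporary file that is then moved over the original.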
If you want one with sed*:
sed '0,/Matched Keyword/s/.*Matched Keyword/New Inserted Line\n&/' myfile.txt
*only works with GNU sed
This might work for you:
sed -i -e '/Matched Keyword/{i\New Inserted Line' -e ':a;n;ba}' file
You're nearly there! Just create a loop to read from the Matched Keyword to the end of the file.
After inserting a line, the remainder of the file can be printed out by:
Introducing a loop placeholder :a (here a is an arbitrary name).
Printing the current line and fetching the next into the pattern space with the n command.
Redirecting control back using the ba command, which is essentially a goto to the a placeholder. The end-of-file condition is naturally taken care of by the n command, which terminates any further sed commands if it tries to read past the end-of-file.
With a little help from bash, a true one liner can be achieved:
sed $'/Matched Keyword/{iNew Inserted Line\n:a;n;ba}' file
Alternative:
sed 'x;/./{x;b};x;/Matched Keyword/h;//iNew Inserted Line' file
This uses the Matched Keyword as a flag in the hold space and once it has been set any processing is curtailed by bailing out immediately.
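The label / `n` / `ba` loop described above can also be written one command per line, which avoids depending on how a particular sed parses `i` text and labels inside a one-liner. This is a sketch on a shortened, made-up sample; the variable name `out` is illustrative.

```shell
sample=$(mktemp)
printf '%s\n' 'Line 1' 'has the Matched Keyword' 'Line 3' 'has the Matched Keyword' 'Line 5' > "$sample"

# At the first matching line: queue the new line for output (i),
# then loop: n prints the current line and reads the next one,
# ba jumps back to the label. Because control never returns to the
# top of the script, later matches are not examined again.
out=$(sed '/Matched Keyword/{
i\
New Inserted Line
:a
n
ba
}' "$sample")

printf '%s\n' "$out"
rm -f "$sample"
```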
If you want to append a line after the first match only, use awk instead of sed, as below:
awk '{print} /Matched Keyword/ && !n {print "New Inserted Line"; n++}' myfile.txt
Output:
Line 1
Line 2
Line 3
This line contains the Matched Keyword and other stuff
New Inserted Line
Line 4
This line contains the Matched Keyword and other stuff
Line 6

Extract K-th Line from Chunks Using Sed/AWK/Perl

I have some data that looks like this. It comes in chunks of four lines. Each chunk starts with a # character.
#SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
AAAAAAAAAAAAAAAAAAAAAAAAAAA
+SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
::::::::::::::::::::::::;;8
#SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
TATAACCAGAAAGTTACAAGTAAACAC
+SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
88888888888888888888888888
What I want to do is to extract last line of each chunk. Yielding:
::::::::::::::::::::::::;;8
888888888888888888888888888
Note that the last line of the chunk may contain any standard ASCII character
including #.
Is there an effective one-liner to do it?
The following sed command will print the 3rd line after the pattern:
sed -n '/^#/{n;n;n;p}' file.txt
If there are no blank lines:
perl -ne 'print if $. % 4 == 0' file
$ awk 'BEGIN{RS="#";FS="\n"}{print $4 } ' file
::::::::::::::::::::::::;;8
88888888888888888888888888
If you always have those 4 lines in a chunk, some other ways
$ ruby -ne 'print if $.%4==0' file
::::::::::::::::::::::::;;8
88888888888888888888888888
$ awk 'NR%4==0' file
::::::::::::::::::::::::;;8
88888888888888888888888888
It also seems like your line always comes after the line that starts with "+", so
$ awk '/^\+/{getline;print}' file
::::::::::::::::::::::::;;8
88888888888888888888888888
$ ruby -ne 'gets && print if /^\+/' file
::::::::::::::::::::::::;;8
88888888888888888888888888
This prints the line before each line that starts with #, and also the last line. It can work with non-uniformly sized chunks, but assumes that only a chunk's leading line starts with #.
sed -ne '1d;$p;/^#/!{x;d};/^#/{x;p}' file
Some explanation is in order:
First, you don't need the first line, so delete it: 1d
Next, you always need the last line, so print it: $p
If the line does not match, swap it into the hold buffer and delete it: x;d
If the line does match, swap the previous line out of the hold buffer and print it: x;p
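The four steps can be checked with the sketch below. The records are shortened, made-up data in the same four-line layout as the question (the second quality line contains a `#`, but not at the start); the variable name `last_lines` is illustrative, and GNU sed is assumed for the one-line `{x;d}` syntax.

```shell
reads=$(mktemp)
cat > "$reads" <<'EOF'
#read1 length=5
AAAAA
+read1 length=5
::;;8
#read2 length=5
TATAA
+read2 length=5
8888#
EOF

# 1d           skip the very first header
# $p           always print the last line of the file
# /^#/!{x;d}   non-header line: stash it in the hold space, print nothing
# /^#/{x;p}    header line: the stashed line is the previous chunk's last line
last_lines=$(sed -ne '1d;$p;/^#/!{x;d};/^#/{x;p}' "$reads")

printf '%s\n' "$last_lines"
rm -f "$reads"
```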
This works similarly to dogbane's answer
awk '/^#/ {mark = NR} NR == mark + 3 {print}' inputfile
And, like that answer, will work regardless of the number of lines in each chunk (as long as there are at least 4).
The direct analog to that answer, however, would be:
awk '/^#/ {getline; getline; getline; print}' inputfile
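For comparison, here is a sketch that runs the line-counter approach next to a `getline`-based one on shortened, made-up records. In awk, `next` abandons the current record and restarts the rules, so the skip-three-lines analog of sed's `n;n;n` is written with `getline` here. The variable names `by_mark` and `by_getline` are illustrative.

```shell
reads=$(mktemp)
cat > "$reads" <<'EOF'
#read1 length=5
AAAAA
+read1 length=5
::;;8
#read2 length=5
TATAA
+read2 length=5
8888#
EOF

# Remember the header's line number and print the line three below it.
by_mark=$(awk '/^#/ { mark = NR } NR == mark + 3 { print }' "$reads")

# On a header line, read three more lines into $0 and print the last one.
by_getline=$(awk '/^#/ { getline; getline; getline; print }' "$reads")

printf '%s\n' "$by_mark"
rm -f "$reads"
```

The `getline` form never tests the skipped lines against `/^#/`, so a quality line that happens to start with `#` does not disturb it, as long as every chunk has exactly four lines.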
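This can also be done with grep, by matching the + line, printing one line of context after it, and filtering out the + lines and the -- group separators again (this assumes no quality line starts with + or consists of just --):
grep -A 1 '^+' ./infile | grep -v -e '^+' -e '^--$'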
This might work for you (GNU sed):
sed '/^#/,+2d' file

Resources