Replacing newline character [duplicate]

This question already has answers here:
How can I replace each newline (\n) with a space using sed?
(43 answers)
Closed 8 years ago.
I have an XML file which has occasional lines that are split in two, with the first line ending in &#13;. I want to join any such pair of lines and remove the &#13;, perhaps replacing it with a space.
e.g.
<message>hi I am&#13;
here </message>
needs to become
<message>hi I am here </message>
I've tried:
sed -i 's/&#13;\/n/ /g' filename
with no luck.
Any help is much appreciated!

Here is a GNU sed version:
sed ':a;$bc;N;ba;:c;s/&#13;\n/ /g' file
Explanation:
sed '
:a # Create a label a
$bc # If end of file then branch to label c
N # Append the next line to pattern space
ba # branch back to label a to repeat until end of file
:c # Another label c
s/&#13;\n/ /g # When end of file is reached perform this substitution
' file
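As a quick check of the loop above, the following sketch feeds two sample lines through the same GNU sed program. It assumes that the marker at the end of the split line is the literal text &#13; (as in the question) and that GNU sed is installed; the function name join_cr_lines is made up for illustration.

```shell
# Hypothetical helper: join each line that ends in the literal text "&#13;"
# with the line that follows it. Assumes GNU sed (one-line label syntax).
join_cr_lines() {
    sed ':a;$bc;N;ba;:c;s/&#13;\n/ /g' "$@"
}

# Sample input taken from the question; reads from stdin when no file is given.
printf '%s\n' '<message>hi I am&#13;' 'here </message>' | join_cr_lines
```

Because the whole file is collected in the pattern space before the substitution runs, this approach is probably best kept for files that fit comfortably in memory.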

give this gawk one-liner a try:
awk -v RS="" 'gsub(/&#13;\n/," ")+7' file
tested here with your example:
kent$ echo "<message>hi I am&#13;
here </message>"|awk -v RS="" 'gsub(/&#13;\n/," ")+7'
<message>hi I am here </message>

You can use this awk:
awk -F"&#13;" '/&#13;$/ {a=$1; next} a{print a, $0; a=""; next} 1' file
Explanation
-F"&#13;" sets &#13; as the delimiter, so that the first field will always be the desired part of the string.
/&#13;$/ {a=$1; next} if the line ends with &#13;, store its first field in a and jump to the next line.
a{print a, $0; a=""; next} if a is set, print it together with current line. Then unset a for future loops. Finally jump to next line.
1 as true, prints current line.
Sample
$ cat a
yeah
<message>hi I am&#13;
here </message>
hello
bye
$ awk -F"&#13;" '/&#13;$/ {a=$1; next} a{print a, $0; a=""; next} 1' a
yeah
<message>hi I am here </message>
hello
bye

This will work for you:
sed -i '{:q;N;s/&.*\n/ /g;t q}' <filename>
However replacing newline with sed is always a bash(read bad) idea. Chances of making an error are high.
So another but simpler solution:
tr -s '\&\#13\;\n' ' ' < <filename>
tr replaces every character listed in the set with a space, so without -s it would have printed
<message>hi I am      here </message>
-s from man page:
-s, --squeeze-repeats
replace each input sequence of a repeated character that is listed in SET1 with a single occurrence of that character.
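Because tr works on sets of single characters rather than on strings, the set above also matches any stray &, #, 1, 3 or ; elsewhere in the file. The following sketch shows that effect on a made-up input (the sample text is an assumption, not from the question), next to a sed substitution that matches the whole &#13; sequence instead.

```shell
# Made-up sample: the digits 13 in the first line are ordinary data.
sample='id 13 &#13;
end'

# tr treats its first argument as a set of characters, so 1 and 3 are
# replaced too, and the final newline becomes a space as well.
printf '%s\n' "$sample" | tr -s '&#13;\n' ' '
echo

# sed matches the literal sequence "&#13;" at end of line and leaves "13" alone.
# This only strips the marker; joining the lines still needs N or similar.
printf '%s\n' "$sample" | sed 's/&#13;$//'
```

If the data can contain any of those characters legitimately, a string-based tool such as sed or awk is likely the safer choice.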

Related

search for a string and after getting result cut that word and store result in variable

I have a file named abc.lst whose path I have stored in a variable. It contains a string of 3 words. I want to grep the second word and, from it, cut out the part from expdp to .dmp and store that in a variable.
example:-
REFLIST_OP=/tmp/abc.lst
cat $REFLIST_OP
34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM
Desired Output:-
expdp_TEST_P119_*_18112017.dmp
I have tried the command below:
FULL_DMP_NAME=`cat $REFLIST_OP|grep /orabackup|awk '{print $2}'`
echo $FULL_DMP_NAME
/data/abc/GOon/expdp_TEST_P119_*_18112017.dmp

REFLIST_OP=/tmp/abc.lst
awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP"
Test Results:
$ REFLIST_OP=/tmp/abc.lst
$ cat "$REFLIST_OP"
34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM
$ awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP"
expdp_TEST_P119_*_18112017.dmp
To save in variable
myvar=$( awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP" )

The following awk may help here.
awk -F'/| ' '{print $6}' Input_file
OR
awk -F'/| ' '{print $6}' "$REFLIST_OP"
Explanation: Make space and / the field separators (as per your shown Input_file) and then print the 6th field of the line, which is what the OP requires.
To see the field number and field's value you could use following command too:
awk -F'/| ' '{for(i=1;i<=NF;i++){print i,$i}}' "$REFLIST_OP"

Using sed with one of these regexes:
sed -e 's/.*\/\([^[:space:]]*\).*/\1/' abc.lst captures the non-space characters after the last /, printing only the captured part.
sed -re 's|.*/([^[:space:]]*).*|\1|' abc.lst Same as above, but using a different separator, thus avoiding the need to escape the /. -r allows an unescaped ( and ).
sed -e 's|.*/||' -e 's|[[:space:]].*||' abc.lst In two steps: remove everything up to the last /, then remove everything from the first space to the end. (Possibly the easiest to read and understand.)
If you want to avoid an external command (sed):
myvar=$(<abc.lst); myvar=${myvar##*/}; myvar=${myvar%% *}; echo "$myvar"
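The parameter-expansion variant can be tried without the file by starting from the sample line in the question. This is a minimal sketch; the variable names line and name and the single-line input are assumptions for illustration. The expansions used are POSIX, so it should behave the same in sh and bash.

```shell
# Sample record copied from the question.
line='34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM'

name=${line##*/}    # drop the longest prefix ending in "/"
name=${name%% *}    # drop the longest suffix starting at the first space

# Quote the expansion so the "*" in the name is not glob-expanded.
printf '%s\n' "$name"
```

Quoting matters here because the file name itself contains a *, which an unquoted echo could expand against files in the current directory.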

Split or join lines in Linux using sed

I have file that contains below information
$ cat test.txt
Studentename:Ram
rollno:12
subjects:6
Highest:95
Lowest:65
Studentename:Krish
rollno:13
subjects:6
Highest:90
Lowest:45
Studentename:Sam
rollno:14
subjects:6
Highest:75
Lowest:65
I am trying to place the info for each student on a single line,
i.e. my output should be
Studentename:Ram rollno:12 subjects:6 Highest:95 Lowest:65
Studentename:Krish rollno:13 subjects:6 Highest:90 Lowest:45
Studentename:Sam rollno:14 subjects:6 Highest:75 Lowest:65
Below is the command I wrote
cat test.txt | tr "\n" " " | sed 's/Lowest:[0-9]\+/Lowest:[0:9]\n/g'
The above command breaks the line at the regex Lowest:[0-9]\+, but it doesn't print the matched text. Instead it prints the literal Lowest:[0:9].
Please help

Try:
$ sed '/^Studente/{:a; N; /Lowest/!ba; s/\n/ /g}' test.txt
Studentename:Ram rollno:12 subjects:6 Highest:95 Lowest:65
Studentename:Krish rollno:13 subjects:6 Highest:90 Lowest:45
Studentename:Sam rollno:14 subjects:6 Highest:75 Lowest:65
How it works
/^Studente/{...} tells sed to perform the commands inside the curly braces only on lines that start with Studente. Those commands are:
:a
This defines a label a.
N
This reads in the next line and appends it to the pattern space.
/Lowest/!ba
If the current pattern space does not contain Lowest, this tells sed to branch back to label a.
In more detail, /Lowest/ is true if the line contains Lowest. In sed, ! is negation so /Lowest/! is true if the line does not contain Lowest. In ba, the b stands for the branch command and a is the label to branch to.
s/\n/ /g
This tells sed to replace all newlines with spaces.

Try this using awk :
awk '{if ($1 !~ /^Lowest/) {printf "%s ", $0} else {print}}' file.txt
Or shorter but more obfuscated :
awk '$1!~/^Lowest/{printf"%s ",$0;next}1' file.txt
Or correcting your command :
tr "\n" " " < file.txt | sed 's/Lowest:[0-9]\+/&\n/g'
Explanation: & stands for whatever the left-hand side of the substitution matched.
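To see what & does in isolation, here is a small sketch on a made-up one-line input (the sample text and the angle brackets are assumptions chosen only to make the match visible).

```shell
# "&" in the replacement stands for whatever the pattern matched.
out=$(printf '%s\n' 'Highest:95 Lowest:65 next' |
    sed 's/Lowest:[0-9][0-9]*/<&>/')

printf '%s\n' "$out"
```

The pattern uses [0-9][0-9]* rather than [0-9]\+ so that it does not depend on the GNU \+ extension.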

Another possible GNU sed that doesn't assume Lowest is the last item:
sed ':a; N; /\nStudent/{P; D}; s/\n/ /; ba' test.txt

This might work for you (GNU sed):
sed '/^Studentename:/{:a;x;s/\n/ /gp;d};H;$ba;d' file
Use the hold space to gather up the fields and then remove the newlines to produce a record.

Extract substring of string if position is known

First, I need to extract a substring at a known position in file.txt using bash, but counting from the start of the second line:
>header
cgatgcgctctgtgcgtgcgtgcg
so let's assume I want position 10 from the second line, the output should be:
c
second, I want to include the surrounding ±5 characters, resulting in
gcgctctgtgc

{ read -r; read -r; echo "${REPLY:9:1}"; echo "${REPLY:4:11}"; } < file.txt
Output:
c
gcgctctgtgc
The ${parameter:offset:length} syntax for substrings is explained in https://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion.
The read command is explained in https://www.gnu.org/software/bash/manual/bashref.html#index-read.
Input redirection: https://www.gnu.org/software/bash/manual/bashref.html#Redirections.

With awk:
To get the character at position 10, 1-indexed:
awk 'NR==2 {print substr($0, 10, 1)}'
NR==2 checks whether the current record is the second one; if so, the statements inside {} are executed.
substr($0, 10, 1) extracts 1 character starting from position 10 of $0 (the whole record), i.e. only the 10th character. The format for substr() is substr(string, offset, length).
Similarly, to get ±5 characters around the 10th:
awk 'NR==2 {print substr($0, (10-5), 11)}'
(10-5) is written out instead of 5 just to show how the start position is derived.
Example:
% cat file.txt
>header
cgatgcgctctgtgcgtgcgtgcg
% awk 'NR==2 {print substr($0, 10, 1)}' file.txt
c
% awk 'NR==2 {print substr($0, (10-5), 11)}' file.txt
gcgctctgtgc
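The same idea can be parameterised. The sketch below passes the position and the flank width in as awk variables and clamps the start at 1, so that positions near the beginning of the line do not produce a zero or negative offset. The function name window and the variables p and f are hypothetical; the sample data is the file from the question, recreated in a temporary file.

```shell
# Hypothetical helper: print the character at position $1 (1-indexed) of
# line 2 of file $3, together with up to $2 characters on either side.
window() {
    awk -v p="$1" -v f="$2" 'NR == 2 {
        start = p - f
        if (start < 1) start = 1          # clamp at the start of the line
        print substr($0, start, p + f - start + 1)
        exit
    }' "$3"
}

tmp=$(mktemp)
printf '%s\n' '>header' 'cgatgcgctctgtgcgtgcgtgcg' > "$tmp"
window 10 5 "$tmp"     # positions 5 to 15
window 3 5 "$tmp"      # clamped: positions 1 to 8
rm -f "$tmp"
```

The clamp is a design choice rather than something the question asked for; without it, the result for a start below 1 can differ between awk implementations.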

Use sed and cut:
sed -n '2p' file|cut -c 5-15
sed selects the 2nd line and cut prints the desired characters.

Delete the first character of certain lines in a file in a shell script

Here I want to delete the first character of certain lines of a file. For example:
>cat file1.txt
10081551
10081599
10082234
10082259
20081134
20081159
30082232
10087721
From the 3rd line to the 7th line, delete the first character using sed or any other command, so that the output will be:
>cat file1.txt
10081551
10081599
0082234
0082259
0081134
0081159
0082232
10087721

sed -i '3,7s/.//' file1.txt
sed -i.bak '3,7s/.//' file1.txt # to keep backup
From 3rd to 7th line, replace the first character with nothing.

This is simple in either sed:
sed -i '3,7 s/^.//' file1.txt
or Perl:
perl -i -pe 's/^.// if $. >= 3 && $. <= 7' file1.txt

The sed program can do this with:
pax$ sed '3,7s/.//' file1.txt
10081551
10081599
0082234
0082259
0081134
0081159
0082232
10087721
This replaces the first match of . (any single character), which is always the first character on the line, with nothing.
I'll also provide an awk solution. It's a little more complex but it's worth learning since it allows for much more complex operations than sed.
pax$ awk 'NR>=3&&NR<=7{sub("^.","",$0)}{print}' file1.txt
10081551
10081599
0082234
0082259
0081134
0081159
0082232
10087721
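If the line range is not fixed, the same awk test can take its bounds from shell variables. A minimal sketch, where the function name strip_first and the awk variables first and last are assumptions for illustration:

```shell
# Hypothetical helper: delete the first character of lines $1..$2 read
# from standard input; all other lines pass through unchanged.
strip_first() {
    awk -v first="$1" -v last="$2" \
        'NR >= first && NR <= last { sub(/^./, "") } { print }'
}

# First four lines of the sample file, trimming lines 3 and 4 only.
printf '%s\n' 10081551 10081599 10082234 10082259 | strip_first 3 4
```

Passing the bounds with -v keeps the awk program in single quotes, which avoids mixing shell and awk quoting.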

For your 2nd question:
if the ending quote is on the last line of the file:
sed '$i\
/home/neeraj/yocto/poky/meta-ti \\
' text
to match the end of the continued lines (this one feels fragile)
sed '
/BBLAYERS.*"/ {
:a
/\\$/ {N; ba}
s#"$#/home/neeraj/yocto/poky/meta-ti \\\n"#
}
' text

Another variation of the awk
awk 'NR~/^[3-7]$/{sub(".","")}1' file
10081551
10081599
0082234
0082259
0081134
0081159
0082232
10087721

Extract text between two strings repeatedly using sed or awk? [duplicate]

This question already has answers here:
How to use sed/grep to extract text between two words?
(14 answers)
Closed 4 years ago.
I have a file called 'plainlinks' that looks like this:
13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz
13081. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94094-2012.gz
13082. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94096-2012.gz
13083. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94097-2012.gz
13084. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94098-2012.gz
13085. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94644-2012.gz
13086. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94645-2012.gz
13087. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94995-2012.gz
13088. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94996-2012.gz
13089. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-96404-2012.gz
I need to produce output that looks like this:
999999-94092
999999-94094
999999-94096
999999-94097
999999-94098
999999-94644
999999-94645
999999-94995
999999-94996
999999-96404

Using sed:
sed -E 's/.*\/(.*)-.*/\1/' plainlinks
Output:
999999-94092
999999-94094
999999-94096
999999-94097
999999-94098
999999-94644
999999-94645
999999-94995
999999-94996
999999-96404
To save the changes to the file use the -i option:
sed -Ei 's/.*\/(.*)-.*/\1/' plainlinks
Or to save to a new file then redirect:
sed -E 's/.*\/(.*)-.*/\1/' plainlinks > newfile.txt
Explanation:
s/ # substitution
.* # match anything
\/ # up to the last forward slash (escaped so as not to confuse sed)
(.*) # anything after the last forward slash (captured in brackets)
- # up to the last hyphen (the capture is greedy)
.* # anything else left on the line
/ # end match; start replace
\1 # the value captured in the first (only) set of brackets
/ # end
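Running the expression on a single line from the question makes the greedy matching easier to follow. This is only a sketch on one sample line; the variable names url_line and id are made up.

```shell
url_line='13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz'

# The first .* is greedy, so \/ lands on the last slash; the group's .* is
# greedy too, so the literal - after it lands on the last hyphen.
id=$(printf '%s\n' "$url_line" | sed -E 's/.*\/(.*)-.*/\1/')

printf '%s\n' "$id"
```

One consequence of the greedy capture is that a file name with more hyphens would keep everything up to the last one, so the pattern relies on the format staying as shown.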

Just for fun.
awk -F\/ '{print substr($7,1,12)}' plainlinks
or with grep
grep -Eo '[0-9]{6}-[0-9]{5}' plainlinks

Assuming the format stays consistent as you have described, you can do it with awk:
awk 'BEGIN{FS="[/-]"; OFS="-"} {print $7, $8}' plainlinks > output_file
Output:
999999-94092
999999-94094
999999-94096
999999-94097
999999-94098
999999-94644
999999-94645
999999-94995
999999-94996
999999-96404
Explanation:
awk reads your input file one line at a time, breaking each line into "fields"
'BEGIN{FS="[/-]"; OFS="-"} specifies that the delimiter used on the input lines should be either / or -; it also specifies that the output should be delimited by -
{print $7, $8}' tells awk to print the 7th and 8th fields of each line, in this case 999999 and 9xxxx
plainlinks is the name of the input file
> output_file redirects output to a file named output_file

Just with the shell's parameter expansion:
while IFS= read -r line; do
tmp=${line##*noaa/}
echo "${tmp%-????.gz}"
done < plainlinks

If the format stays the same, no need for sed or awk:
cat your_file | cut -d "/" -f 7- | cut -d "-" -f 1,2
