Replacing newline character [duplicate] - linux

I have an XML file which has occasional lines that are split in two, with the first line ending in &#13;. I want to join any such lines and remove the &#13;, perhaps replacing it with a space.
e.g.
<message>hi I am&#13;
here </message>
needs to become
<message>hi I am here </message>
I've tried:
sed -i 's/&#13;\/n/ /g' filename
with no luck.
Any help is much appreciated!

Here is a GNU sed version:
sed ':a;$bc;N;ba;:c;s/&#13;\n/ /g' file
Explanation:
sed '
:a # Create a label a
$bc # If end of file then branch to label c
N # Append the next line to pattern space
ba # branch back to label a to repeat until end of file
:c # Another label c
s/&#13;\n/ /g # When end of file is reached perform this substitution
' file
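To try the loop above without touching a real file, here is a minimal sketch. It assumes GNU sed and assumes the broken lines end with the literal text &#13;; the sample content and the temporary file are made up for illustration.

```shell
# Build a small sample file. The literal "&#13;" at the end of a line is an
# assumption about what the broken lines look like.
tmp=$(mktemp)
printf '%s\n' 'yeah' '<message>hi I am&#13;' 'here </message>' 'bye' > "$tmp"

# Slurp the whole file into the pattern space, then join the split lines.
joined=$(sed ':a;$bc;N;ba;:c;s/&#13;\n/ /g' "$tmp")
printf '%s\n' "$joined"

rm -f "$tmp"
```

Only the line that ends with the marker is joined to its successor; the other lines keep their own newlines.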

Give this gawk one-liner a try:
awk -v RS="" 'gsub(/&#13;\n/," ")+7' file
tested here with your example:
kent$ echo "<message>hi I am&#13;
here </message>"|awk -v RS="" 'gsub(/&#13;\n/," ")+7'
<message>hi I am here </message>

You can use this awk:
awk -F"&#13;" '/&#13;$/ {a=$1; next} a{print a, $0; a=""; next} 1' file
Explanation
-F"&#13;" sets &#13; as the field delimiter, so that the first field is always the desired part of the string.
/&#13;$/ {a=$1; next} if the line ends with &#13;, store the first field in a and jump to the next line.
a{print a, $0; a=""; next} if a is set, print it together with current line. Then unset a for future loops. Finally jump to next line.
1 as true, prints current line.
Sample
$ cat a
yeah
<message>hi I am&#13;
here </message>
hello
bye
$ awk -F"&#13;" '/&#13;$/ {a=$1; next} a{print a, $0; a=""; next} 1' a
yeah
<message>hi I am here </message>
hello
bye

This will work for you:
sed -i '{:q;N;s/&.*\n/ /g;t q}' <filename>
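However, replacing newlines with sed is always a bad idea; the chances of making an error are high.
So here is another, simpler solution:
tr -s '\&\#13\;\n' ' ' < <filename>
tr replaces every character in the set with a space, so without -s it would have printed one space per replaced character:
<message>hi I am      here </message>
Note that tr works on single characters rather than on the string &#13;, so any other &, #, 1, 3 or ; in the file is replaced as well.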
-s from man page:
-s, --squeeze-repeats
replace each input sequence of a repeated character that is listed in SET1 with a single occurrence of that character.
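One caveat about the tr approach is worth checking on your own data: tr translates single characters, not strings. The sketch below uses a made-up input line that contains the digits 1 and 3 to show the side effect.

```shell
# tr maps single characters, so the set '&#13;\n' means
# "&, #, 1, 3, ; or newline", not the string "&#13;".
out=$(printf 'item 13 &#13;\nnext\n' | tr -s '&#13;\n' ' ')

# The brackets make the trailing space visible.
printf '[%s]\n' "$out"   # prints: [item next ]
```

The number 13 in the text is gone along with the marker, so this is only safe when none of those characters occur elsewhere in the file.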

Related

search for a string and after getting result cut that word and store result in variable

I have a file named abc.lst whose path is stored in a variable. Each line contains several whitespace-separated fields. I want to take the second field, cut out the part from expdp to .dmp, and store that in a variable.
example:-
REFLIST_OP=/tmp/abc.lst
cat $REFLIST_OP
34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM
Desired Output:-
expdp_TEST_P119_*_18112017.dmp
I Have tried below command :-
FULL_DMP_NAME=`cat $REFLIST_OP|grep /orabackup|awk '{print $2}'`
echo $FULL_DMP_NAME
/data/abc/GOon/expdp_TEST_P119_*_18112017.dmp
REFLIST_OP=/tmp/abc.lst
awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP"
Test Results:
$ REFLIST_OP=/tmp/abc.lst
$ cat "$REFLIST_OP"
34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM
$ awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP"
expdp_TEST_P119_*_18112017.dmp
To save in variable
myvar=$( awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP" )
The following awk may help with the same task.
awk -F'/| ' '{print $6}' Input_file
OR
awk -F'/| ' '{print $6}' "$REFLIST_OP"
Explanation: Simply making space and / as a field separator(as per your shown Input_file) and then printing 6th field of the line which is required by OP.
To see the field number and field's value you could use following command too:
awk -F'/| ' '{for(i=1;i<=NF;i++){print i,$i}}' "$REFLIST_OP"
Using sed with one of these regex
sed -e 's/.*\/\([^[:space:]]*\).*/\1/' abc.lst capture non space characters after /, printing only the captured part.
sed -re 's|.*/([^[:space:]]*).*|\1|' abc.lst Same as above, but using a different separator, which avoids escaping the /. -r enables extended regular expressions, so the grouping parentheses need no backslashes.
sed -e 's|.*/||' -e 's|[[:space:]].*||' abc.lst in two steps, remove up to last /, remove from space to end. (May be easiest to read/understand)
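myvar=$(<abc.lst); myvar=${myvar##*/}; myvar=${myvar%% *}; echo "$myvar"
If you want to avoid an external command (sed), use shell parameter expansion as in the line above; the quotes around "$myvar" keep the * in the file name from being expanded as a glob.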
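As a minimal sketch of the parameter-expansion approach, here it is applied to the sample line from the question held in a variable. The variable names line and name are only illustrative.

```shell
# Sample line from the question.
line='34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM'

name=${line##*/}   # drop everything up to and including the last "/"
name=${name%% *}   # drop everything from the first space onwards

# Quote the variable so the "*" is printed literally instead of being globbed.
printf '%s\n' "$name"   # prints: expdp_TEST_P119_*_18112017.dmp
```

This relies on the path being the only field that contains a slash, as in the sample input.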

Split or join lines in Linux using sed

I have file that contains below information
$ cat test.txt
Studentename:Ram
rollno:12
subjects:6
Highest:95
Lowest:65
Studentename:Krish
rollno:13
subjects:6
Highest:90
Lowest:45
Studentename:Sam
rollno:14
subjects:6
Highest:75
Lowest:65
I am trying to place the information for each student on a single line,
i.e. my output should be
Studentename:Ram rollno:12 subjects:6 Highest:95 Lowest:65
Studentename:Krish rollno:13 subjects:6 Highest:90 Lowest:45
Studentename:Sam rollno:14 subjects:6 Highest:75 Lowest:65.
Below is the command I wrote
cat test.txt | tr "\n" " " | sed 's/Lowest:[0-9]\+/Lowest:[0:9]\n/g'
The above command breaks the line at the regex Lowest:[0-9]\+, but it doesn't print the matched text. Instead it prints the literal Lowest:[0:9].
Please help
Try:
$ sed '/^Studente/{:a; N; /Lowest/!ba; s/\n/ /g}' test.txt
Studentename:Ram rollno:12 subjects:6 Highest:95 Lowest:65
Studentename:Krish rollno:13 subjects:6 Highest:90 Lowest:45
Studentename:Sam rollno:14 subjects:6 Highest:75 Lowest:65
How it works
/^Studente/{...} tells sed to perform the commands inside the curly braces only on lines that start with Studente. Those commands are:
:a
This defines a label a.
N
This reads in the next line and appends it to the pattern space.
/Lowest/!ba
If the current pattern space does not contain Lowest, this tells sed to branch back to label a.
In more detail, /Lowest/ is true if the line contains Lowest. In sed, ! is negation, so /Lowest/! is true if the line does not contain Lowest. In ba, the b stands for the branch command and a is the label to branch to.
s/\n/ /g
This tells sed to replace all newlines with spaces.
Try this using awk :
awk '{if ($1 !~ /^Lowest/) {printf "%s ", $0} else {print}}' file.txt
Or shorter but more obfuscated :
awk '$1!~/^Lowest/{printf"%s ",$0;next}1' file.txt
Or correcting your command :
tr "\n" " " < file.txt | sed 's/Lowest:[0-9]\+/&\n/g'
Explanation: in the replacement, & stands for the text matched by the left-hand side of the substitution.
Another possible GNU sed that doesn't assume Lowest is the last item:
sed ':a; N; /\nStudent/{P; D}; s/\n/ /; ba' test.txt
This might work for you (GNU sed):
sed '/^Studentename:/{:a;x;s/\n/ /gp;d};H;$ba;d' file
Use the hold space to gather up the fields and then remove the newlines to produce a record.
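The same record-gathering idea can also be sketched in awk, for readers who find the hold space hard to follow. This is only an illustration on a shortened version of the sample data; rec is a made-up variable name, and like the last two sed versions it starts a new record at each Studentename: line instead of assuming Lowest comes last.

```shell
records=$(printf '%s\n' \
    'Studentename:Ram' 'rollno:12' 'Lowest:65' \
    'Studentename:Krish' 'rollno:13' 'Lowest:45' |
  awk '
    # A new student starts: flush the record collected so far.
    /^Studentename:/ && rec != "" { print rec; rec = "" }
    # Append the current line to the record, separated by a space.
    { rec = (rec == "" ? $0 : rec " " $0) }
    # Flush the last record at end of input.
    END { if (rec != "") print rec }
  ')
printf '%s\n' "$records"
```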

Extract substring of string if position is known

First, I need to extract a substring at a known position from file.txt in bash, taking it from the second line of the file:
>header
cgatgcgctctgtgcgtgcgtgcg
so let's assume I want position 10 from the second line, the output should be:
c
second, I want to include the surrounding ±5 characters, resulting in
gcgctctgtgc
{ read -r; read -r; echo "${REPLY:9:1}"; echo "${REPLY:4:11}"; } < file.txt
Output:
c
gcgctctgtgc
The ${parameter:offset:length} syntax for substrings is explained in https://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion.
The read command is explained in https://www.gnu.org/software/bash/manual/bashref.html#index-read.
Input redirection: https://www.gnu.org/software/bash/manual/bashref.html#Redirections.
With awk:
To get the character at position 10, 1-indexed:
awk 'NR==2 {print substr($0, 10, 1)}'
NR==2 checks whether the current record is the second one; if so, the statements inside {} are executed.
substr($0, 10, 1) extracts 1 character starting at position 10 of $0 (the whole record), i.e. only the 10th character. The format for substr() is substr(string, offset, length).
Similarly, to get ±5 characters around 10-th:
awk 'NR==2 {print substr($0, (10-5), 11)}'
(10-5) is written out instead of 5 just to show where the offset comes from.
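If the position and the window size change often, the arithmetic can be moved into awk variables. The following is a sketch under that assumption; pos, flank and start are illustrative names, and the start is clamped to 1 so a position near the beginning of the line still works.

```shell
result=$(printf '%s\n' '>header' 'cgatgcgctctgtgcgtgcgtgcg' |
  awk -v pos=10 -v flank=5 'NR == 2 {
      start = pos - flank
      if (start < 1) start = 1                          # do not run off the left edge
      print substr($0, pos, 1)                          # the single character at pos
      print substr($0, start, pos + flank - start + 1)  # the surrounding window
  }')
printf '%s\n' "$result"
```

With pos=10 and flank=5 this prints c and gcgctctgtgc, the same as the two fixed commands above.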
Example:
% cat file.txt
>header
cgatgcgctctgtgcgtgcgtgcg
% awk 'NR==2 {print substr($0, 10, 1)}' file.txt
c
% awk 'NR==2 {print substr($0, (10-5), 11)}' file.txt
gcgctctgtgc
use sed and cut:
sed -n '2p' file|cut -c 5-15
sed selects the 2nd line and cut prints the desired characters.

Delete the first character of certain lines in a file in a shell script

I want to delete the first character of certain lines of a file. For example:
>cat file1.txt
10081551
10081599
10082234
10082259
20081134
20081159
30082232
10087721
From the 3rd line to the 7th line, the first character should be deleted, using sed or anything else, so that the output will be:
>cat file1.txt
10081551
10081599
0082234
0082259
0081134
0081159
0082232
10087721
sed -i '3,7s/.//' file1.txt
sed -i.bak '3,7s/.//' file1.txt # to keep backup
From 3rd to 7th line, replace the first character with nothing.
This is simple in either sed:
sed -i '3,7 s/^.//' file1.txt
or Perl:
perl -i -pe 's/^.// if $. >= 3 && $. <= 7' file1.txt
The sed program can do this with:
pax$ sed '3,7s/.//' file1.txt
10081551
10081599
0082234
0082259
0081134
0081159
0082232
10087721
This replaces the first match of . (any character) on each of lines 3 to 7 with nothing, and the first match is always the first character on the line.
I'll also provide an awk solution. It's a little more complex but it's worth learning since it allows for much more complex operations than sed.
pax$ awk 'NR>=3&&NR<=7{sub("^.","",$0)}{print}' file1.txt
10081551
10081599
0082234
0082259
0081134
0081159
0082232
10087721
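If the line range is not fixed, it can be passed in as awk variables. This is a small sketch along those lines; from and to are hypothetical names, not part of the original answer.

```shell
# Sample data from the question.
input=$(printf '%s\n' 10081551 10081599 10082234 10082259 20081134 20081159 30082232 10087721)

# Delete the first character on lines from..to; the trailing 1 prints every line.
trimmed=$(printf '%s\n' "$input" |
  awk -v from=3 -v to=7 'NR >= from && NR <= to { sub(/^./, "") } 1')
printf '%s\n' "$trimmed"
```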
For your 2nd question:
if the ending quote is on the last line of the file:
sed '$i\
/home/neeraj/yocto/poky/meta-ti \\
' text
to match the end of the continued lines (this one feels fragile)
sed '
/BBLAYERS.*"/ {
:a
/\\$/ {N; ba}
s#"$#/home/neeraj/yocto/poky/meta-ti \\\n"#
}
' text
Another variation of the awk
awk 'NR~/^[3-7]$/{sub(".","")}1' file
10081551
10081599
0082234
0082259
0081134
0081159
0082232
10087721

Extract text between two strings repeatedly using sed or awk? [duplicate]

I have a file called 'plainlinks' that looks like this:
13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz
13081. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94094-2012.gz
13082. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94096-2012.gz
13083. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94097-2012.gz
13084. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94098-2012.gz
13085. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94644-2012.gz
13086. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94645-2012.gz
13087. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94995-2012.gz
13088. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94996-2012.gz
13089. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-96404-2012.gz
I need to produce output that looks like this:
999999-94092
999999-94094
999999-94096
999999-94097
999999-94098
999999-94644
999999-94645
999999-94995
999999-94996
999999-96404
Using sed:
sed -E 's/.*\/(.*)-.*/\1/' plainlinks
Output:
999999-94092
999999-94094
999999-94096
999999-94097
999999-94098
999999-94644
999999-94645
999999-94995
999999-94996
999999-96404
To save the changes to the file use the -i option:
sed -Ei 's/.*\/(.*)-.*/\1/' plainlinks
Or to save to a new file then redirect:
sed -E 's/.*\/(.*)-.*/\1/' plainlinks > newfile.txt
Explanation:
s/ # substitution
.* # match anything
\/ # up to the last forward slash (escaped so sed does not treat it as the delimiter)
(.*) # anything after the last forward slash (captured in brackets)
- # up to the last hyphen
.* # anything else left on the line
/ # end match; start replacement
\1 # the value captured in the first (only) set of brackets
/ # end
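To see the greedy matching in action on a single line, here is a minimal sketch using the first entry of the sample file. The variable names line and id are only for illustration.

```shell
line='13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz'

# The first .* is greedy, so \/ matches the LAST slash; the captured (.*)
# is greedy too, so the "-" after it matches the LAST hyphen.
id=$(printf '%s\n' "$line" | sed -E 's/.*\/(.*)-.*/\1/')
printf '%s\n' "$id"   # prints: 999999-94092
```

Because both wildcards are greedy, the hyphen inside 999999-94092 stays in the captured group and only -2012.gz is dropped.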
Just for fun.
awk -F\/ '{print substr($7,1,12)}' plainlinks
or with grep
grep -Eo '[0-9]{6}-[0-9]{5}' plainlinks
Assuming the format stays consistent as you have described, you can do it with awk:
awk 'BEGIN{FS="[/-]"; OFS="-"} {print $7, $8}' plainlinks > output_file
Output:
999999-94092
999999-94094
999999-94096
999999-94097
999999-94098
999999-94644
999999-94645
999999-94995
999999-94996
999999-96404
Explanation:
awk reads your input file one line at a time, breaking each line into "fields"
'BEGIN{FS="[/-]"; OFS="-"} specifies that the delimiter used on the input lines should be either / or -; it also specifies that the output should be delimited by -
{print $7, $8}' tells awk to print the 7th and 8th fields of each line, in this case 999999 and 9xxxx
plainlinks is where the name of the input file goes
> output_file redirects output to a file named output_file
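To check which field numbers you actually need, it can help to print every field with its index first. The following is a sketch using one line of the sample input; the output format "index: value" is just a choice made here.

```shell
line='13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz'

# Split on "/" or "-" and list each field with its number.
fields=$(printf '%s\n' "$line" |
  awk 'BEGIN { FS = "[/-]" } { for (i = 1; i <= NF; i++) print i ": " $i }')
printf '%s\n' "$fields"
```

Field 2 is empty because of the two consecutive slashes in ftp://, which is why the wanted values end up in fields 7 and 8.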
Just with the shell's parameter expansion:
while IFS= read -r line; do
tmp=${line##*noaa/}
echo "${tmp%-????.gz}"
done < plainlinks
If the format stays the same, no need for sed or awk:
cat your_file | cut -d "/" -f 7- | cut -d "-" -f 1,2

Resources