Split or join lines in Linux using sed

I have a file that contains the following information:
$ cat test.txt
Studentename:Ram
rollno:12
subjects:6
Highest:95
Lowest:65
Studentename:Krish
rollno:13
subjects:6
Highest:90
Lowest:45
Studentename:Sam
rollno:14
subjects:6
Highest:75
Lowest:65
I am trying to place the information for each student on a single line, i.e. my output should be:
Studentename:Ram rollno:12 subjects:6 Highest:95 Lowest:65
Studentename:Krish rollno:13 subjects:6 Highest:90 Lowest:45
Studentename:Sam rollno:14 subjects:6 Highest:75 Lowest:65
Below is the command I wrote:
cat test.txt | tr "\n" " " | sed 's/Lowest:[0-9]\+/Lowest:[0:9]\n/g'
The command above breaks the line after each Lowest:<number> match, but it doesn't print the matched text; instead, it prints the literal text of the replacement.
Please help

Try:
$ sed '/^Studente/{:a; N; /Lowest/!ba; s/\n/ /g}' test.txt
Studentename:Ram rollno:12 subjects:6 Highest:95 Lowest:65
Studentename:Krish rollno:13 subjects:6 Highest:90 Lowest:45
Studentename:Sam rollno:14 subjects:6 Highest:75 Lowest:65
How it works
/^Studente/{...} tells sed to perform the commands inside the curly braces only on lines that start with Studente. Those commands are:
:a
This defines a label a.
N
This reads in the next line and appends it to the pattern space.
/Lowest/!ba
If the current pattern space does not contain Lowest, this tells sed to branch back to label a.
In more detail, /Lowest/ is true if the pattern space contains Lowest. In sed, ! is negation, so /Lowest/! is true if the pattern space does not contain Lowest. In ba, the b stands for the branch command and a is the label to branch to.
s/\n/ /g
This tells sed to replace all newlines with spaces.
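The same "keep reading until the terminator, then join" idea can be sketched in plain POSIX sh, which may make the control flow of the N / ba loop easier to follow. This is only an illustration: join_records is a hypothetical name, and it assumes (like the sed script) that a record starts at a line beginning with Studente and ends at the first line containing Lowest.

```shell
# Hypothetical helper: join each record onto one line, where a record starts
# at a line beginning with "Studente" and ends at the next line containing
# "Lowest" (the same assumptions the sed script makes).
join_records() {
  record=""
  while IFS= read -r line || [ -n "$line" ]; do
    if [ -z "$record" ]; then
      case $line in
        Studente*) record=$line ;;             # start a record (like /^Studente/)
        *) printf '%s\n' "$line"; continue ;;  # outside a record: pass through
      esac
    else
      record="$record $line"                   # like N followed by s/\n/ /
    fi
    case $line in
      *Lowest*)                                # terminator reached: stop looping
        printf '%s\n' "$record"
        record="" ;;
    esac
  done
  # Print a record left unfinished at end of input, if any
  if [ -n "$record" ]; then printf '%s\n' "$record"; fi
}
```

Usage: join_records < test.txt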

Try this using awk:
awk '{if ($1 !~ /^Lowest/) {printf "%s ", $0} else {print}}' file.txt
Or shorter but more obfuscated:
awk '$1!~/^Lowest/{printf"%s ",$0;next}1' file.txt
Or correcting your command:
tr "\n" " " < file.txt | sed 's/Lowest:[0-9]\+/&\n/g'
Explanation: in the replacement, & stands for whatever the regex on the left-hand side matched. Because tr also turns the newline after each Lowest line into a space, every record after the first will begin with a space.
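A quick way to see what & does is to wrap each match in brackets. This small check uses only POSIX sed features ([0-9][0-9]* instead of GNU's \+); bracket_lowest is a hypothetical name and the input line is made up for illustration.

```shell
# Hypothetical helper: wrap every "Lowest:<digits>" match in brackets.
# In the replacement part of s///, & stands for the text the regex matched.
bracket_lowest() {
  sed 's/Lowest:[0-9][0-9]*/[&]/g'
}

printf '%s\n' 'Highest:95 Lowest:65 Studentename:Krish' | bracket_lowest
# prints: Highest:95 [Lowest:65] Studentename:Krish
```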

Another possible GNU sed solution, which doesn't assume Lowest is the last item (it starts a new record at every line beginning with Student):
sed ':a; N; /\nStudent/{P; D}; s/\n/ /; ba' test.txt

This might work for you (GNU sed):
sed '/^Studentename:/{:a;x;s/\n/ /gp;d};H;$ba;d' file
Use the hold space to gather up the fields and then remove the newlines to produce a record.
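The hold-space pattern (buffer lines, flush the buffer whenever a new header line arrives, and flush once more at end of input) can be sketched in plain POSIX sh. This is an illustration under the same assumption as the sed command, namely that every record begins with a line starting with Studentename:; gather_records is a hypothetical name. Unlike the Lowest-based answers, it does not care which field comes last.

```shell
# Hypothetical helper: buffer lines (the "hold space") and print the buffer,
# space-joined, each time a new "Studentename:" header arrives and at EOF.
gather_records() {
  held=""
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      Studentename:*)
        # New record header: flush what was held, then start over
        if [ -n "$held" ]; then printf '%s\n' "$held"; fi
        held=$line ;;
      *)
        # Any other line: append it to the held record (like H)
        if [ -n "$held" ]; then held="$held $line"; else held=$line; fi ;;
    esac
  done
  # End of input (like $ba): flush the last record
  if [ -n "$held" ]; then printf '%s\n' "$held"; fi
}
```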

Related

How to convert to title case a specific column

I have come up with this code:
cut -d';' -f4 columns.csv | sed 's/.*/\L&/; s/[a-z]*/\u&/g'
which actually does the job for the fourth column, but in the process I have lost the other columns.
I have unsuccessfully tried:
cut -d';' -f4 columns.csv | sed -i 's/.*/\L&/; s/[a-z]*/\u&/g'
So, how could I apply the change to that specific column in the file and keep other columns as they are?
Let's say that the content of columns.csv is:
TEXT;more text;SoMe MoRe TeXt;THE FOURTH COLUMN;something else
Then the expected output should be:
TEXT;more text;SoMe MoRe TeXt;The Fourth Column;something else
GNU sed:
sed -ri 's/;/&\r/3;:1;s/\r([^; ]+\s*)/\L\u\1\r/;t1;s/\r//' columns.csv
update:
sed -i 's/; */&\n/3;:1;s/\n\([^; ]\+ *\)/\L\u\1\n/;t1;s/\n//' columns.csv
Place an anchor (\r, or \n in the updated version) at the beginning of field 4. Each pass edits one whole word and moves the anchor to the beginning of the next one. The t1 command jumps back to label :1 as long as the substitution keeps finding a match. Finally, the anchor is removed.
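To make the loop easier to follow, the updated command can be spread over several lines with a comment before each step. This is a sketch: title_case_field4 is a hypothetical wrapper that reads standard input instead of editing columns.csv in place, and, like the one-liner, it needs GNU sed (\n in the replacement, \+, and the \L and \u case conversions).

```shell
# Hypothetical wrapper around the "update" command above (GNU sed only).
# Reads ;-separated lines on stdin and title-cases field 4.
title_case_field4() {
  sed '
# Put a newline anchor right after the 3rd semicolon (start of field 4)
s/; */&\n/3
:1
# Title-case the word after the anchor and move the anchor past it
s/\n\([^; ]\+ *\)/\L\u\1\n/
# Loop back to :1 while the substitution above keeps succeeding
t1
# Remove the anchor
s/\n//
'
}
```

Usage: title_case_field4 < columns.csv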
Not a short simple awk, but should work:
awk -F";" '{t=split($4,a," ");$4="";for(i=1;i<=t;i++) {a[i]=toupper(substr(a[i],1,1)) tolower(substr(a[i],2));$4=$4 sprintf("%s ",a[i])}$4=substr($4,1,length($4)-1)}1' OFS=";" file
TEXT;more text;SoMe MoRe TeXt;The Fourth Column;something else
A somewhat shorter version:
awk -F";" '{t=split($4,a," ");$4="";for(i=1;i<=t;i++) {a[i]=toupper(substr(a[i],1,1)) tolower(substr(a[i],2));$4=$4 a[i](t==i?"":" ")}}1' OFS=";" file
With perl:
$ perl -F';' -lane '$F[3] =~ s/[a-z]+/\L\u$&/gi; print join ";", @F' columns.csv
TEXT;more text;SoMe MoRe TeXt;The Fourth Column;something else
-F';' use ; to split the input line
$F[3] =~ s/[a-z]+/\L\u$&/gi change case only for the 4th column
print join ";", @F print the modified fields
Unicode version:
perl -Mopen=locale -Mutf8 -F';' -lane '$F[3]=~s/\p{L}+/\L\u$&/gi; print join ";", @F'
Using any awk in any shell on every Unix box:
$ cat tst.awk
BEGIN { FS=OFS=";" }
{
title = ""
numWords = split($4,words,/ /)
for (wordNr=1; wordNr<=numWords; wordNr++) {
word = words[wordNr]
word = toupper(substr(word,1,1)) tolower(substr(word,2))
title = (wordNr>1 ? title " " : "") word
}
$4 = title
print
}
$ awk -f tst.awk file
TEXT;more text;SoMe MoRe TeXt;The Fourth Column;something else
True capitalization in a title is much more complicated than that though.
This might work for you (GNU sed):
sed -E 's/[^;]*/\n&\n/4;h;s/\S*/\L\u&/g;H;g;s/\n.*\n(.*)\n.*\n(.*)\n.*/\2\1/' file
Delimit the fourth field by newlines and make a copy.
Title-case every word in the working copy: \L\u uppercases the first character of each word and lowercases the rest.
Append the amended line to the original.
Using pattern matching, replace the original fourth field by the amended one.

Extract field after colon for lines where field before colon matches pattern

I have a file file1 which looks as below:
tool1v1:1.4.4
tool1v2:1.5.3
tool2v1:1.5.2.c8.5.2.r1981122221118
tool2v2:32.5.0.abc.r20123433554
I want to extract the values of tool2v1 and tool2v2.
My output should be 1.5.2.c8.5.2.r1981122221118 and 32.5.0.abc.r20123433554.
I have written the following awk commands, but they are not giving the correct result:
awk -F: '/^tool2v1/ {print $2}' file1
awk -F: '/^tool2v2/ {print $2}' file1
grep -E can also do the job:
grep -E "tool2v[12]" file1 |sed 's/^.*://'
If you have a grep that supports Perl compatible regular expressions such as GNU grep, you can use a variable-sized look-behind:
$ grep -Po '^tool2v[12]:\K.*' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
The -o option retains just the match instead of the whole matching line; \K means "the line must match the things to the left, but don't include them in the match".
You could also use a normal look-behind:
$ grep -Po '(?<=^tool2v[12]:).*' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
And finally, to fix your awk which was almost correct (and as pointed out in a comment):
$ awk -F: '/^tool2v[12]/ { print $2 }' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
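If grep -P is not available, a similar effect (require the tool2v1: or tool2v2: prefix, but print only what follows it) can be had from POSIX sed: delete the prefix with s and print only the lines where that substitution succeeded. This is a swapped-in technique, not a grep feature; extract_tool2 is a hypothetical name.

```shell
# Hypothetical helper: for lines starting with tool2v1: or tool2v2:, strip
# that prefix and print the rest; -n suppresses every other line.
extract_tool2() {
  sed -n 's/^tool2v[12]://p' "$@"
}
```

Usage: extract_tool2 file1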
You can filter with grep:
grep '\(tool2v1\|tool2v2\)'
And then remove the part before the : with sed:
sed 's/^.*://'
This sed operation means:
^ - match from beginning of string
.* - all characters
: - up to and including the (last) :
... and replace this matched content with nothing.
The format is sed 's/<MATCH>/<REPLACE>/'
Whole command:
grep '\(tool2v1\|tool2v2\)' file1|sed 's/^.*://'
Result:
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
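One detail worth knowing about s/^.*://: .* is greedy, so it removes everything up to the last : on the line, not the first. That makes no difference for the values in this question, but if a value could itself contain a colon, matching non-colon characters instead keeps the value intact. The two helper names and the input value below are made up for illustration.

```shell
# Greedy: ^.* runs to the LAST colon on the line
strip_to_last_colon()  { sed 's/^.*://'; }
# [^:]* cannot cross a colon, so only the key and the first colon are removed
strip_to_first_colon() { sed 's/^[^:]*://'; }

printf '%s\n' 'tool2v1:1.5:beta' | strip_to_last_colon    # prints: beta
printf '%s\n' 'tool2v1:1.5:beta' | strip_to_first_colon   # prints: 1.5:beta
```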
The question has already been answered, but you can also use pure bash to achieve the desired result:
#!/usr/bin/env bash
while IFS= read -r line; do
if [[ "$line" =~ ^tool2v[12]: ]]; then
echo "${line#*:}"
fi
done < ./file1
The while loop reads every line of file1; =~ does a regex match to check whether $line starts with tool2v1: or tool2v2:, and if it does, ${line#*:} removes everything up to and including the first :.
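[[ ... =~ ... ]] is a bash feature. For a plain POSIX sh, a sketch of the same loop can test the prefix with a case pattern and still use ${line#*:} to drop everything up to the first colon. print_tool2_values is a hypothetical name.

```shell
# Hypothetical helper: print the value part of tool2v1/tool2v2 lines.
print_tool2_values() {
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      tool2v[12]:*) printf '%s\n' "${line#*:}" ;;  # strip the "key:" prefix
    esac
  done
}
```

Usage: print_tool2_values < file1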

Delete the first character of certain lines in a file in a shell script

I want to delete the first character of certain lines of a file. For example:
>cat file1.txt
10081551
10081599
10082234
10082259
20081134
20081159
30082232
10087721
From the 3rd line to the 7th line, delete the first character with sed or any other command, so that the output will be:
>cat file1.txt
10081551
10081599
0082234
0082259
0081134
0081159
0082232
10087721
sed -i '3,7s/.//' file1.txt
sed -i.bak '3,7s/.//' file1.txt # to keep backup
From 3rd to 7th line, replace the first character with nothing.
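What the 3,7 address does can be spelled out in plain sh: count lines and, only for lines 3 through 7, drop the first character with ${line#?}. This is a sketch for illustration (strip_first_char_in_range and its two arguments are hypothetical); the sed command remains the practical choice.

```shell
# Hypothetical helper: strip_first_char_in_range FIRST LAST < input
# Removes the first character of lines FIRST..LAST (1-based), like sed 'FIRST,LASTs/.//'.
strip_first_char_in_range() {
  first=$1 last=$2 n=0
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    if [ "$n" -ge "$first" ] && [ "$n" -le "$last" ]; then
      line=${line#?}   # ? matches exactly one character
    fi
    printf '%s\n' "$line"
  done
}
```

Usage: strip_first_char_in_range 3 7 < file1.txt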
This is simple in either sed:
sed -i '3,7 s/^.//'
or Perl:
perl -i -pe 's/^.// if $. >= 3 && $. <= 7'
The sed program can do this with:
pax$ sed '3,7s/.//' file1.txt
10081551
10081599
0082234
0082259
0081134
0081159
0082232
10087721
This substitutes the first character of each line in that range (the first match of . is always the line's first character) with nothing.
I'll also provide an awk solution. It's a little more complex but it's worth learning since it allows for much more complex operations than sed.
pax$ awk 'NR>=3&&NR<=7{sub("^.","",$0)}{print}' file1.txt
10081551
10081599
0082234
0082259
0081134
0081159
0082232
10087721
For your 2nd question:
if the ending quote is on the last line of the file:
sed '$i\
/home/neeraj/yocto/poky/meta-ti \\
' text
to match the end of the continued lines (this one feels fragile)
sed '
/BBLAYERS.*"/ {
:a
/\\$/ {N; ba}
s#"$#/home/neeraj/yocto/poky/meta-ti \\\n"#
}
' text
Another variation of the awk solution:
awk 'NR~/^[3-7]$/{sub(".","")}1' file
10081551
10081599
0082234
0082259
0081134
0081159
0082232
10087721

Replacing newline character [duplicate]

This question already has answers here:
How can I replace each newline (\n) with a space using sed?
(43 answers)
Closed 8 years ago.
I have an XML file which has occasional lines that are split into 2: the first line ending with &#13;. I want to concatenate any such lines and remove the &#13;, perhaps replacing it with a space.
e.g.
<message>hi I am&#13;
here </message>
needs to become
<message>hi I am here </message>
I've tried:
sed -i 's/&#13;\/n/ /g' filename
with no luck.
Any help is much appreciated!
Here is a GNU sed version:
sed ':a;$bc;N;ba;:c;s/&#13;\n/ /g' file
Explanation:
sed '
:a # Create a label a
$bc # If end of file then branch to label c
N # Append the next line to pattern space
ba # branch back to label a to repeat until end of file
:c # Another label c
s/&#13;\n/ /g # When end of file is reached perform this substitution
' file
Give this gawk one-liner a try:
awk -v RS="" 'gsub(/&#13;\n/," ")+7' file
Tested here with your example:
kent$ echo "<message>hi I am&#13;
here </message>"|awk -v RS="" 'gsub(/&#13;\n/," ")+7'
<message>hi I am here </message>
You can use this awk:
awk -F"&#13;" '/&#13;$/ {a=$1; next} a{print a, $0; a=""; next} 1' file
Explanation
-F"&#13;" sets &#13; as the delimiter, so that the first field will always be the desired part of the string.
/&#13;$/ {a=$1; next} if the line ends with &#13;, store its first field in a and jump to the next line.
a{print a, $0; a=""; next} if a is set, print it together with current line. Then unset a for future loops. Finally jump to next line.
1 as true, prints current line.
Sample
$ cat a
yeah
<message>hi I am&#13;
here </message>
hello
bye
$ awk -F"&#13;" '/&#13;$/ {a=$1; next} a{print a, $0; a=""; next} 1' a
yeah
<message>hi I am here </message>
hello
bye
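The logic of this awk answer (remember the part before a trailing &#13;, then print it together with the next line) can also be sketched in plain POSIX sh. join_cr_lines is a hypothetical name; it assumes the marker is the literal text &#13; at the very end of a line. Unlike the awk version, it also prints a marker line that has no following line.

```shell
# Hypothetical helper: join a line ending in the literal text &#13; with the
# line that follows it, separated by one space.
join_cr_lines() {
  marker='&#13;'
  pending=""
  has_pending=0
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      *"$marker")
        pending=${line%"$marker"}   # keep the part before the marker
        has_pending=1
        continue ;;
    esac
    if [ "$has_pending" -eq 1 ]; then
      printf '%s %s\n' "$pending" "$line"
      has_pending=0
    else
      printf '%s\n' "$line"
    fi
  done
  # A marker line at the very end has nothing to join with: print its text
  if [ "$has_pending" -eq 1 ]; then printf '%s\n' "$pending"; fi
}
```

Usage: join_cr_lines < a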
This will work for you:
sed -i '{:q;N;s/&.*\n/ /g;t q}' <filename>
However, replacing newlines with sed is always a bad idea; the chances of making an error are high.
So another but simpler solution:
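tr -s '\&\#13\;\n' ' ' < <filename>
tr replaces every character listed in the first set with a space, so without -s it would have printed one space per replaced character:
<message>hi I am      here </message>
-s from man page:
-s, --squeeze-repeats
replace each input sequence of a repeated character that is listed in SET1 with a single occurrence of that character.
Keep in mind that tr works on single characters, not strings: every &, #, 1, 3, ; and newline anywhere in the file becomes a space, so other occurrences of those characters are mangled too, and the whole file ends up on one line.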

Strings extraction from text file with sed command

I have a text file which contains some lines as the following:
ASDASD2W 3ASGDD12 SDADFDFDDFDD W11 ACC=PNO23 DFSAEFEA EAEDEWRESAD ASSDRE
AERREEW2 3122312 SDADDSADADAD W12 ACC=HH34 23SAEFEA EAEDEWRESAD ASEEWEE
A15ECCCW 3XCXXF12 SDSGTRERRECC W43 ACC=P11 XXFSAEFEA EAEDEWRESAD ASWWWW
ASDASD2W 3122312 SDAFFFDEEEEE SD3 ACC=PNI22 ABCEFEA EAEDEWRESAD ASWEDSSAD
...
I have to extract the substring between the '=' character and the following blank space for each line, i.e.
PNO23
HH34
P11
PNI22
I've been using the sed command but cannot figure out how to ignore all characters following the blank space.
Any help?
Use the right tool for the job.
$ awk -F '[= ]+' '{ print $6 }' input.txt
PNO23
HH34
P11
PNI22
Sorry, but I have to add another one, because I feel the existing answers are just too complicated:
sed 's/.*=//; s/ .*//;' inputfile
This might work for you:
sed -n 's/.*=\([^ ]*\).*/\1/p' file
or, if you prefer:
sed 's/.*=\([^ ]*\).*/\1/p;d' file
Put the string you want to capture in a backreference:
sed 's/.*=\([^ =]*\) .*/\1/'
or do the substitution piecemeal:
sed -e 's/.*=//' -e 's/ .*//'
sed 's/[^=]*=\([^ ]*\) .*/\1/' inputfile
Match all the non-equal-sign characters and an equal sign. Capture a sequence of non-space characters. Match a space and the rest of the line. Substitute the captured string.
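The same "after the first =, up to the next space" rule can be sketched with shell parameter expansion alone, without sed. Like the [^=]*= version above, it splits at the first =; extract_after_equals is a hypothetical name.

```shell
# Hypothetical helper: print the text between the first "=" and the next
# space on each line, using only shell parameter expansion.
extract_after_equals() {
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      *=*) ;;            # only lines that contain "="
      *) continue ;;
    esac
    rest=${line#*=}      # drop everything up to and including the first "="
    printf '%s\n' "${rest%% *}"   # drop the first space and everything after it
  done
}
```

Usage: extract_after_equals < inputfile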
A chain of grep can do the trick.
grep -o '[=][a-zA-Z0-9]*' file | grep -o '[a-zA-Z0-9]*'
