Strings extraction from text file with sed command

I have a text file which contains some lines as the following:
ASDASD2W 3ASGDD12 SDADFDFDDFDD W11 ACC=PNO23 DFSAEFEA EAEDEWRESAD ASSDRE
AERREEW2 3122312 SDADDSADADAD W12 ACC=HH34 23SAEFEA EAEDEWRESAD ASEEWEE
A15ECCCW 3XCXXF12 SDSGTRERRECC W43 ACC=P11 XXFSAEFEA EAEDEWRESAD ASWWWW
ASDASD2W 3122312 SDAFFFDEEEEE SD3 ACC=PNI22 ABCEFEA EAEDEWRESAD ASWEDSSAD
...
I have to extract the substring between the '=' character and the following blank space on each line, i.e.
PNO23
HH34
P11
PNI22
I've been using the sed command but cannot figure out how to ignore all characters following the blank space.
Any help?

Use the right tool for the job.
$ awk -F '[= ]+' '{ print $6 }' input.txt
PNO23
HH34
P11
PNI22

Sorry, but I have to add another one because I feel the existing answers are just too complicated:
sed 's/.*=//; s/ .*//;' inputfile

This might work for you:
sed -n 's/.*=\([^ ]*\).*/\1/p' file
or, if you prefer:
sed 's/.*=\([^ ]*\).*/\1/p;d' file

Put the string you want to capture in a backreference:
sed 's/.*=\([^ =]*\) .*/\1/'
or do the substitution piecemeal:
sed -e 's/.*=//' -e 's/ .*//'

sed 's/[^=]*=\([^ ]*\) .*/\1/' inputfile
Match all the non-equal-sign characters and an equal sign. Capture a sequence of non-space characters. Match a space and the rest of the line. Substitute the captured string.
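The capture-group approach described above can be tried end to end as follows. This is only a sketch: the function name extract_acc is hypothetical, the sample lines are taken from the question, and the trailing space was dropped from the pattern so that a value at the very end of a line is also extracted (an assumption about what is wanted). It uses only POSIX basic regular expressions.

```shell
#!/bin/sh
# extract_acc (hypothetical helper): print the text between the first '='
# and the next space, for every input line that contains an '='.
extract_acc() {
    # [^=]*=      everything up to and including the first '='
    # \([^ ]*\)   capture the run of non-space characters that follows
    # .*          the rest of the line, which is discarded
    # -n with the p flag prints only lines where the substitution happened
    sed -n 's/[^=]*=\([^ ]*\).*/\1/p'
}

printf '%s\n' \
    'ASDASD2W 3ASGDD12 SDADFDFDDFDD W11 ACC=PNO23 DFSAEFEA EAEDEWRESAD ASSDRE' \
    'AERREEW2 3122312 SDADDSADADAD W12 ACC=HH34 23SAEFEA EAEDEWRESAD ASEEWEE' |
    extract_acc
```

Lines without an '=' produce no output at all, which is usually what you want when the file contains unrelated lines.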

A chain of grep can do the trick.
grep -o '[=][a-zA-Z0-9]*' file | grep -o '[a-zA-Z0-9]*'

Related

Split or join lines in Linux using sed

I have file that contains below information
$ cat test.txt
Studentename:Ram
rollno:12
subjects:6
Highest:95
Lowest:65
Studentename:Krish
rollno:13
subjects:6
Highest:90
Lowest:45
Studentename:Sam
rollno:14
subjects:6
Highest:75
Lowest:65
I am trying to place each student's information on a single line,
i.e. my output should be:
Studentename:Ram rollno:12 subjects:6 Highest:95 Lowest:65
Studentename:Krish rollno:13 subjects:6 Highest:90 Lowest:45
Studentename:Sam rollno:14 subjects:6 Highest:75 Lowest:65
Below is the command I wrote
cat test.txt | tr "\n" " " | sed 's/Lowest:[0-9]\+/Lowest:[0:9]\n/g'
The command above breaks the line at the regex Lowest:[0-9]\+, but it doesn't print the text that was matched. Instead it prints the replacement text literally (Lowest:[0:9]).
Please help.

Try:
$ sed '/^Studente/{:a; N; /Lowest/!ba; s/\n/ /g}' test.txt
Studentename:Ram rollno:12 subjects:6 Highest:95 Lowest:65
Studentename:Krish rollno:13 subjects:6 Highest:90 Lowest:45
Studentename:Sam rollno:14 subjects:6 Highest:75 Lowest:65
How it works
/^Studente/{...} tells sed to perform the commands inside the curly braces only on lines that start with Studente. Those commands are:
:a
This defines a label a.
N
This reads in the next line and appends it to the pattern space.
/Lowest/!ba
If the current pattern space does not contain Lowest, this tells sed to branch back to label a.
In more detail, /Lowest/ is true if the line contains Lowest. In sed, ! is negation, so /Lowest/! is true if the line does not contain Lowest. In ba, the b stands for the branch command and a is the label to branch to.
s/\n/ /g
This tells sed to replace all newlines with spaces.
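The same program can be written one command per line, which makes the loop easier to follow. This is a sketch under the same assumptions as the one-liner (every record starts with a Studente line and ends with a Lowest line); the function name join_records is hypothetical.

```shell
#!/bin/sh
# join_records (hypothetical helper): join each block that starts with a
# "Studente..." line and ends with a "Lowest" line into a single line.
join_records() {
    sed '/^Studente/{
        # label to loop back to
        :a
        # append the next input line to the pattern space
        N
        # no "Lowest" collected yet: go back and read another line
        /Lowest/!ba
        # the record is complete: turn the embedded newlines into spaces
        s/\n/ /g
    }'
}

printf '%s\n' Studentename:Ram rollno:12 subjects:6 Highest:95 Lowest:65 \
    Studentename:Krish rollno:13 subjects:6 Highest:90 Lowest:45 |
    join_records
```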

Try this using awk:
awk '{if ($1 !~ /^Lowest/) {printf "%s ", $0} else {print}}' file.txt
Or shorter but more obfuscated:
awk '$1!~/^Lowest/{printf"%s ",$0;next}1' file.txt
Or correcting your command:
tr "\n" " " < file.txt | sed 's/Lowest:[0-9]\+/&\n/g'
Explanation: & stands for whatever the left-hand side of the substitution matched.
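To see the difference between & and a literal replacement in isolation, here is a small sketch. The helper names wrap_match and literal_rhs are hypothetical, and angle brackets are used as a visible marker instead of a newline so that the example does not rely on GNU-specific escapes.

```shell
#!/bin/sh
# wrap_match (hypothetical helper): wrap every "Lowest:<digits>" in angle
# brackets. On the right-hand side, & is replaced by the matched text.
wrap_match() {
    sed 's/Lowest:[0-9][0-9]*/<&>/g'
}

# literal_rhs (hypothetical helper): same pattern, but the replacement
# repeats the bracket expression, which sed copies out verbatim because
# the right-hand side is plain text, not a pattern.
literal_rhs() {
    sed 's/Lowest:[0-9][0-9]*/<Lowest:[0-9]>/g'
}

printf 'Highest:95 Lowest:65\n' | wrap_match    # Highest:95 <Lowest:65>
printf 'Highest:95 Lowest:65\n' | literal_rhs   # Highest:95 <Lowest:[0-9]>
```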

Another possible GNU sed solution, which doesn't assume Lowest is the last item:
sed ':a; N; /\nStudent/{P; D}; s/\n/ /; ba' test.txt

This might work for you (GNU sed):
sed '/^Studentename:/{:a;x;s/\n/ /gp;d};H;$ba;d' file
Use the hold space to gather up the fields and then remove the newlines to produce a record.

Add comma after each word

I have a variable (called $document_keywords) with the following text in it:
Latex document starter CrypoServer
I want to add a comma after each word, but not after the last word, so the output becomes:
Latex, document, starter, CrypoServer
Can anybody help me achieve the above output?
regards,
Ankit

In order to preserve whitespace as it is given, I would use sed like this:
echo "$document_keywords" | sed 's/\>/,/g;s/,$//'
This works as follows:
s/\>/,/g # replace all ending word boundaries with a comma -- that is,
# append a comma to every word
s/,$// # then remove the last, unwanted one at the end.
Then:
$ echo 'Latex document starter CrypoServer' | sed 's/\>/,/g;s/,$//'
Latex, document, starter, CrypoServer
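Note that \> is a GNU sed extension. If that is a concern, a similar effect can be sketched with POSIX character classes only: put a comma between a non-blank character and the blank that follows it. The function name add_commas is hypothetical, and the sketch assumes the input has no trailing blanks (a trailing blank would leave a comma after the last word).

```shell
#!/bin/sh
# add_commas (hypothetical helper): append a comma to every word that is
# followed by a blank, leaving the original spacing untouched.
add_commas() {
    # \([^[:blank:]]\)  last character of a word
    # \([[:blank:]]\)   the blank right after it
    sed 's/\([^[:blank:]]\)\([[:blank:]]\)/\1,\2/g'
}

document_keywords='Latex document starter CrypoServer'
printf '%s\n' "$document_keywords" | add_commas
```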

A plain sed substitution gave me the expected output:
sed 's/ /, /g' filename

You can use awk for this purpose. Loop over the fields with for and append a , to every field except the last one (the loop stops before i == NF).
$ echo "$document_keywords" | awk '{for(i=1;i<NF;i++){$i=$i","}}1'

Using BASH string substitution:
document_keywords='Latex document starter CrypoServer'
echo "${document_keywords//[[:blank:]]/,}"
Latex,document,starter,CrypoServer
Or sed:
echo "$document_keywords" | sed 's/[[:blank:]]/,/g'
Latex,document,starter,CrypoServer
Or tr:
echo "$document_keywords" | tr '[:blank:]' ','
Latex,document,starter,CrypoServer

Delete the first character of certain lines in a file in a shell script

Here I want to delete the first character of certain lines of a file. For example:
>cat file1.txt
10081551
10081599
10082234
10082259
20081134
20081159
30082232
10087721
From the 3rd line to the 7th line, delete the first character, using sed or anything else, so that the output will be:
>cat file1.txt
10081551
10081599
0082234
0082259
0081134
0081159
0082232
10087721

sed -i '3,7s/.//' file1.txt
sed -i.bak '3,7s/.//' file1.txt # to keep backup
From 3rd to 7th line, replace the first character with nothing.
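If the line range is not fixed, the same address form can be built from shell variables. A minimal sketch, assuming both arguments are positive integers; the function name strip_first_char is hypothetical, and it filters standard input instead of editing a file in place.

```shell
#!/bin/sh
# strip_first_char (hypothetical helper): delete the first character of
# every line from line $1 through line $2 of standard input.
strip_first_char() {
    # Double quotes let the shell expand $1 and $2 into the address "M,N".
    sed "${1},${2}s/^.//"
}

printf '%s\n' 10081551 10081599 10082234 10082259 10087721 |
    strip_first_char 3 4
```

Here lines 3 and 4 lose their leading 1, while the other lines pass through unchanged.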

This is simple in either sed:
sed -i '3,7 s/^.//' file1.txt
or Perl:
perl -i -pe 's/^.// if $. >= 3 && $. <= 7' file1.txt

The sed program can do this with:
pax$ sed '3,7s/.//' file1.txt
10081551
10081599
0082234
0082259
0081134
0081159
0082232
10087721
This substitutes an empty string for the first match of . (any character), which is the first character on the line.
I'll also provide an awk solution. It's a little more complex but it's worth learning since it allows for much more complex operations than sed.
pax$ awk 'NR>=3&&NR<=7{sub("^.","",$0)}{print}' file1.txt
10081551
10081599
0082234
0082259
0081134
0081159
0082232
10087721

For your 2nd question:
if the ending quote is on the last line of the file:
sed '$i\
/home/neeraj/yocto/poky/meta-ti \\
' text
to match the end of the continued lines (this one feels fragile)
sed '
/BBLAYERS.*"/ {
:a
/\\$/ {N; ba}
s#"$#/home/neeraj/yocto/poky/meta-ti \\\n"#
}
' text

Another variation using awk:
awk 'NR~/^[3-7]$/{sub(".","")}1' file
10081551
10081599
0082234
0082259
0081134
0081159
0082232
10087721

bash script to strip out some characters

Bash scripting: how can I get a simple while loop to go through a file with the content below and strip out everything from the T onward (including the T), using sed?
"2012-05-04T10:16:04Z"
"2012-04-05T15:27:40Z"
"2012-03-05T14:58:27Z"
"2011-11-29T15:04:09Z"
"2011-11-16T12:12:00Z"
Thanks

A simple awk command to do this:
awk -F '["T]' '{print $2}' file
2012-05-04
2012-04-05
2012-03-05
2011-11-29
2011-11-16

Through sed,
sed 's/"\|T.*//g' file
Here " matches a double quote, \| means "or" (a GNU sed extension in basic regular expressions), and T.* matches from the first T through the end of the line. Replacing everything that matched with an empty string gives the desired output.
Example:
$ echo '"2012-05-04T10:16:04Z"' | sed 's/"\|T.*//g'
2012-05-04
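Because \| is not available in every sed, the two alternatives can also be applied as two separate substitutions. This is a sketch with a hypothetical function name, date_only, and it assumes the T never appears before the date part.

```shell
#!/bin/sh
# date_only (hypothetical helper): keep only the date part of quoted
# ISO 8601 timestamps read from standard input.
date_only() {
    # First cut everything from the first T to the end of the line,
    # then delete any remaining double quotes.
    sed -e 's/T.*//' -e 's/"//g'
}

printf '%s\n' '"2012-05-04T10:16:04Z"' '"2011-11-16T12:12:00Z"' | date_only
```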

With bash builtins:
while IFS='T"' read -r a a b; do echo "$a"; done < filename
Output:
2012-05-04
2012-04-05
2012-03-05
2011-11-29
2011-11-16
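Since the question asks for a while loop, here is another sketch that keeps the loop but uses parameter expansion instead of IFS splitting. The function name strip_time is hypothetical; the loop reads standard input, so a file would be supplied with a redirection.

```shell
#!/bin/sh
# strip_time (hypothetical helper): print the date part of each quoted
# timestamp read from standard input.
strip_time() {
    while IFS= read -r line; do
        line=${line#\"}                # drop one leading double quote, if any
        printf '%s\n' "${line%%T*}"    # drop the first T and everything after it
    done
}

printf '%s\n' '"2012-04-05T15:27:40Z"' '"2012-03-05T14:58:27Z"' | strip_time
```

No external process is started per line, which is the main reason to prefer expansions over calling sed inside a loop.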

unix sed command matching a word

I am trying to match a line and use the sed command to substitute it. Something like:
aaa = 10
aaa =10
aaa=10
My sed regular expression should match all of those patterns and replace them with something like bbb=5. I tried:
sed -i '/ *aaa *= */bbb=5'
But this does not work properly for all the patterns. Any help would be really appreciated.

sed -i 's/\s*aaa\s*=\s*[0-9]*/bbb=5/' input_file

cat a | sed -e '1s/aaa =10/bbb=10/' -e '2s/ aaa =10/bbb=10/' -e '3s/aaa=10/bbb=10/'

cat myfile | sed 's/\s*aaa\s*=\s*\(.*\)/bbb = \1/'
The \s escape (a GNU sed extension) matches any whitespace character, including both tab and space.
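Where \s is not available, the POSIX class [[:space:]] can stand in for it. The sketch below makes two assumptions that are not in the question: the setting starts at the beginning of the line (hence the ^ anchor), and the value is a run of digits. The function name rename_setting is hypothetical.

```shell
#!/bin/sh
# rename_setting (hypothetical helper): replace "aaa = <digits>", with any
# amount of whitespace around the '=', by the fixed text bbb=5.
rename_setting() {
    sed 's/^[[:space:]]*aaa[[:space:]]*=[[:space:]]*[0-9]*/bbb=5/'
}

printf '%s\n' 'aaa = 10' 'aaa =10' 'aaa=10' | rename_setting
```

All three input lines come out as bbb=5; lines that do not start with aaa are passed through unchanged.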
