How to extract email headers extending on multiple lines from file


I am trying to extract the To header from an email file using sed on linux.
The problem is that the To header could be on multiple lines.
e.g:
To: name1#mydomain.org, name2#mydomain.org,
name3#mydomain.org, name4#mydomain.org,
name5#mydomain.org
Message-ID: <46608700.369886.1549009227948#domain.org>
I tried the following:
sed -n -e '/^[Tt]o: / { N; p; }' _message_file_ |
awk '{$1=$1;printf("%s ",$0)};NR%2==0{print ""}'
The sed command extracts the line starting with To and next line.
I pipe the output to awk to put everything on a single line.
The full command outputs in one line:
To: name1#mydomain.org, name2#mydomain.org, name3#mydomain.org, name4#mydomain.org
I don't know how to keep going and test if the next line starts with whitespace and add it to the result.
What I want is all the addresses
To: name1#mydomain.org, name2#mydomain.org, name3#mydomain.org, name4#mydomain.org, name5#mydomain.org
Any help will be appreciated.

formail is a good solution but here's how to do it with sed:
sed -e '/^$/q;/^To:/!d;n;:c;/^\s/!d;n;bc' message_file
/^$/q; - (optional) quit if we run out of headers
/^To:/!d; - if not a To: header, stop processing this line
n; - otherwise, implicitly print it, and load next line
:c; - c is a label we can branch to
/^\s/!d; - if not a continuation, stop processing this line
n; - otherwise, implicitly print it, and load next line
bc - branch back to label c (ie. loop)

Both formail and reformail have a -c option to do exactly that.
From man reformail:
-c Concatenate multi-line headers. Headers split on multiple lines
are combined into a single line.
So you don't need to pipe the output to awk, and can just do
reformail -c -X To: < "$your_message_file"
However, emails normally use CRLF line endings, and the output on screen may be garbled because of the CR characters. To remove them, you can use Perl's generic \R line ending in a regex on the output:
reformail -c -X To: < "$your_message_file" | perl -pe 's/\R/\n/g'
or do it on the input if you prefer:
perl -pe 's/\R/\n/g' "$your_message_file" | reformail -c -X To:
On Debian and derived systems like Ubuntu, you can install them with
apt install maildrop for reformail, which is part of Courier's maildrop
or apt install procmail for formail (but procmail seems to be abandoned now).

I did it like this:
cat _message_file | formail -X To: | awk '{$1=$1;printf("%s ",$0)};NR%2==0{print ""}'
Or:
formail -X To: < _message_file | awk '{$1=$1;printf("%s ",$0)};NR%2==0{print ""}'

This might work for you (GNU sed):
sed -n '/^To:/{:a;N;/^ /Ms/\s*\n\s*/ /;ta;P}' file
Turn off implicit printing by using the -n option. Gather up the lines starting with white space, removing white space either side of the newline and replace it by a single space, starting from the line that begins To:. When matching fails, print the first line in the pattern space.
To print addresses as is, use:
sed '/^\S/h;G;/^To:/MP;d' file

It could be as straightforward as this:
sed -n '/^To:/{
:a
p
n
/^[[:space:]]/ba
}' message_file
Be silent, but starting from the To: header, print the text line by line while it is still part of that header.
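The continuation-line loop used in the sed answers above can also be sketched in awk, which is convenient when the result should end up on a single line. This is only a sketch under a few assumptions: the function name unfold_to and the sample addresses are made up for illustration, continuation lines are taken to start with a space or tab, and CRLF line endings are not handled.

```shell
# unfold_to FILE: print the To: header of FILE joined onto one line.
# (Hypothetical helper name; a sketch, not a full RFC 5322 parser.)
unfold_to() {
  awk '
    /^$/ { exit }                                # blank line: end of headers
    /^[Tt][Oo]:/ { found = 1; line = $0; next }  # start of the To: header
    found && /^[ \t]/ {                          # continuation line
      sub(/^[ \t]+/, "")                         # drop leading whitespace
      line = line " " $0                         # join with a single space
      next
    }
    found { exit }                               # next header: stop reading
    END { if (found) print line }
  ' "$1"
}

# Usage with an illustrative message file:
msg=$(mktemp)
printf '%s\n' \
  'From: sender#domain.org' \
  'To: name1#mydomain.org, name2#mydomain.org,' \
  ' name3#mydomain.org, name4#mydomain.org,' \
  ' name5#mydomain.org' \
  'Message-ID: <46608700.369886.1549009227948#domain.org>' \
  '' \
  'To: this body line must be ignored' > "$msg"
unfold_to "$msg"
rm -f "$msg"
```

Stopping at the first blank line keeps a body line that happens to start with To: from being picked up.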

Related

Linux: Append variable to end of line using line number as variable

I am new to shell scripting. I am using ksh.
I have this particular line in my script which I use to append text in a variable q to the end of a particular line given by the variable a
containing the line number .
sed -i ''$a's#$#'"$q"'#' test.txt
Now the variable q can contain a large amount of text, with all sorts of special characters, such as !##$%^&*()_+:"<>.,/;'[]= etc etc, no exceptions. For now, I use a couple of sed commands in my script to remove any ' and " in this text (sed "s/'/ /g" | sed 's/"/ /g'), but still when I execute the above command I get the following error
sed: -e expression #1, char 168: unterminated `s' command
Any sed, awk, perl, suggestions are very much appreciated

The difficulty here is to quote (escape) the substitution separator characters # in the sed command:
sed -i ''$a's#$#'"$q"'#' test.txt
For example, if q contains # it will not work. The # will terminate the replacement pattern prematurely. Example: q='a#b', a=2, and the command expands to
sed -i 2s#$#a#b# test.txt
which will not append a#b to the end of line 2, but rather a#.
This can be solved by escaping the # characters in q:
sed -i 2s#$#a\#b# test.txt
However, this escaping could be cumbersome to do in shell.
Another approach is to add a level of indirection. Here is an example using a Perl one-liner. First, q is passed to the script as a quoted argument. Then, within the script, the argument is assigned to an internal variable $q. With this approach there is no need to escape the substitution separator characters:
perl -pi -E 'BEGIN {$q = shift; $a = shift} s/$/$q/ if $. == $a' "$q" "$a" test.txt
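The same indirection idea can be sketched with awk, passing the text through the environment so that neither the shell nor awk reinterprets any character in it. The function name append_to_line and the variable name APPEND_TEXT are made up for this sketch. It writes to stdout rather than editing in place, so redirect to a temporary file and move it over the original if needed.

```shell
# append_to_line N TEXT FILE: print FILE with TEXT appended to line N.
# TEXT travels via the environment, so characters such as #, /, & and \
# are taken literally (awk -v would process backslash escapes).
append_to_line() {
  APPEND_TEXT="$2" awk -v n="$1" '
    NR == n { $0 = $0 ENVIRON["APPEND_TEXT"] }   # append on the target line
    { print }
  ' "$3"
}

# Usage with illustrative data:
f=$(mktemp)
printf '%s\n' one two three > "$f"
q='a#b/&\n"$%'
append_to_line 2 "$q" "$f"
rm -f "$f"
```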

Do not bother trying to sanitize the string. Just put it in a file, and use sed's r command to read it in:
printf '%s\n' "$q" > tmpfile
sed -i -e "${a}rtmpfile" test.txt
Ah, but that creates an extra newline that you don't want. You can remove it with:
sed -e "${a}rtmpfile" test.txt | awk -v n="$a" 'NR==n{printf "%s", $0; next}1' > output

Another approach is to use the patch utility if present in your system.
patch test.txt <<-EOF
${a}c
$(sed "${a}q;d" test.txt)$q
.
EOF
${a}c will be replaced with the line number followed by c which means the operation is a change in line ${a}.
The second line is the replacement of the change. This is the concatenated value of the original text and the added text.
The lone . ends the replacement text, as in an ed script.

sed: replace block of text between markers INCLUDING the markers themselves

I have the following sed commands that replace a block of text with the contents of a file between the start & end markers /**##+ and **##-* respectively:
sed -i -ne '/**##+/ {p; r block.txt' -e ':a; n; /**##-*/ {p; b}; ba}; p' -e '/**##+/d' test.txt && sed -i -e '/**##+/d' -e '/**##-*/d' test.txt
(Besides replacing text, the command also converts line endings.)
As it is, this leaves the start and end markers intact, but I want to get rid of those as well. My using the p command means that I can't have a d command in the same execution unit. I work around the problem by introducing a second set of commands that delete those markers, but I would like to have it all in one single sed command, if possible.
test.txt
start of file
/**##+
* the start marker is above
*/
this should get replaced
/**##-*/
end marker is above
block.txt
THIS IS THE REPLACEMENT
Results
Running the command should change test.txt like so:
start of file
THIS IS THE REPLACEMENT
end marker is above
I am looking for the shortest, single-line solution in sed.

This might work for you (GNU sed):
sed '/^\/\*\*##+/,/^\/\*\*##-\*/cThis is the replacement' file
This changes the lines between the range to the required string.
To replace a range with contents of a file use:
sed -e '/^\/\*\*##+/!b;:a;N;/^\/\*\*##-\*/M!ba;r replacementFile' -e 'd' file
On encountering the start of the range set up a loop to gather up the range in the pattern space, then read the replacement file into the standard output and delete the contents of the pattern space.

Your start and end tags contain regex meta characters and /. sed only searches an input by regex and you need to escape / and all of those meta-characters in sed.
It is much easier to handle this in awk as awk allows non-regex plain text search also:
awk -v st='/**##+' -v et='/**##-*/' -v repl="$(<block.txt)" '
$0 == st{del=1} $0 == et{$0 = repl; del=0} !del' file
start of file
THIS IS THE REPLACEMENT
end marker is above
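A variant of the plain-text comparison above, as a sketch only: it passes the markers and the replacement file name through the environment (so backslashes in them are not processed as they would be with -v) and streams the replacement file instead of holding it in a variable. The function name replace_block and the BLOCK_* variable names are made up for illustration.

```shell
# replace_block START END REPLFILE INPUT: print INPUT with everything from
# the START line through the END line (inclusive) replaced by REPLFILE.
replace_block() {
  BLOCK_START="$1" BLOCK_END="$2" BLOCK_FILE="$3" awk '
    BEGIN { file = ENVIRON["BLOCK_FILE"] }
    $0 == ENVIRON["BLOCK_START"] { skipping = 1 }     # exact string match
    skipping && $0 == ENVIRON["BLOCK_END"] {
      while ((getline line < file) > 0) print line    # emit the replacement
      close(file)
      skipping = 0
      next                                            # drop the end marker
    }
    !skipping { print }
  ' "$4"
}

# Usage with the files from the question:
#   replace_block '/**##+' '/**##-*/' block.txt test.txt > test.txt.new
```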

Separate a text file with sed

I have the following sample file:
evtlog.161202.002609.debugevtlog.161201.162408.debugevtlog.161202.011046.debugevtlog.161202.002809.debugevtlog.161201.160035.debugevtlog.161201.155140.debugevtlog.161201.232156.debugevtlog.161201.145017.debugevtlog.161201.154816.debug
I want to separate the string and add a newline after matching "debug" like this:
evtlog.161202.002609.debug
evtlog.161201.162408.debug
So far I tried almost everything with sed, but it doesn't seem to do what I want.
sed 's/debug/{G}' latest_evtlogs.out
sed '/debug/i "SAD"' latest_evtlogs.out
etc...
sed 's/debug/\n/g' latest_evtlogs.out doesn't work when I add it as a pipe in the script, but it does when I run it manually.
Here's how I generate the file:
printf $(ls -l $EVTLOG_PATH/evtlog|tail -n 10|awk '{printf $8 , "%s\n\n"}'|sed 's/debug/\n/g') >> latest_evtlogs.out
Initially I wanted to just add newline with awk, but it doesn't work either.
Any ideas why I can't separate the string with a newline?
I'm using:
Distributor ID: Debian
Description: Debian GNU/Linux 5.0.10 (lenny)
Release: 5.0.10
Codename: lenny

Just add a new line after debug:
sed 's/debug/&\n/g' file
Note & prints back the matched text, so it is a way to print "debug" back.
This returns:
evtlog.161202.002609.debug
evtlog.161201.162408.debug
evtlog.161202.011046.debug
evtlog.161202.002809.debug
evtlog.161201.160035.debug
evtlog.161201.155140.debug
evtlog.161201.232156.debug
evtlog.161201.145017.debug
evtlog.161201.154816.debug

The problem is that you are using the output of sed in an unquoted command substitution. In this context the shell performs word splitting on whitespace, including newlines, so printf sees each line as a separate argument. It interprets the first line as the format argument and ignores the rest, as there are no printf placeholders in the format.
It should work if you drop the outer printf $() from your command and just redirect the output from your pipeline to your file:
ls -l $EVTLOG_PATH/evtlog|tail -n 10|awk '{printf $8 , "%s\n\n"}'|sed 's/debug/\n/g' >> latest_evtlogs.out
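The effect of word splitting on an unquoted expansion can be seen in a small sketch (assuming a POSIX-style shell such as sh, Bash, or Ksh; the file names are illustrative):

```shell
# Sample text with one name per line.
lines=$(printf '%s\n' evtlog.1.debug evtlog.2.debug)

# Unquoted: the shell splits the value on whitespace, so echo receives two
# separate words and joins them with a single space.
unquoted=$(echo $lines)

# Quoted: the value is passed as one word and the newline survives.
quoted=$(echo "$lines")

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```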

Maybe Perl is "happier" than sed on your system:
perl -pe 's/debug/&\n/g' < YourLogFile

G (get) appends the contents of the hold space to the pattern space (usually just the current line read from the input file), so it cannot be used here.
i (insert) prints the specified text to standard output, so it cannot be used either.
What you want is to replace every debug with debug^J, where ^J is a newline. Depending on the sed version, you can do:
sed 's/debug/&\n/g' input_file
But \n in the replacement is, as far as I know, not specified by POSIX sed. You can, however, use the shell's ANSI-C quoting ($'...', available in Bash, Ksh, and Zsh) to insert a backslash followed by a literal newline:
sed 's/debug/&\'$'\n''/g' input_file
Or a multi line string:
sed 's/debug/&\
/g' input_file
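As a quick check of the backslash-newline form, here is a small sketch. The function name split_after_debug and the sample input are made up for illustration. Because every debug gets a newline appended, a match at the very end of the input line leaves an empty line behind, so the second sed drops empty lines.

```shell
# split_after_debug: read stdin, start a new line after every "debug".
split_after_debug() {
  sed 's/debug/&\
/g' | sed '/^$/d'    # second sed removes the empty line left at the end
}

# Usage with illustrative input:
printf '%s\n' 'evtlog.1.debugevtlog.2.debugevtlog.3.debug' | split_after_debug
```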

Thank you all for the answers. I finally did it like this:
echo $(ls -l $EVTLOG_PATH/evtlog|tail -n 10|awk '{printf $8 , "%s\n\n"}'|sed 's/debug/&\n/g') > temp.out
sed 's/ /\n/g' /share/sqa/dumps/5314577631/checks/temp.out > latest_evtlogs.out
It's not at all elegant, but it finally works.

How to remove line breaks generated by sed

I have a file called sms:
gsm versi jadul
29 sender: +62896666666
date: 15/02/14,03:55:12
reboot router
when I type in:
sed -n '6p' sms > /tmp/result
The /tmp/result always looks like this:
Notice the line break there, I want to get rid of the line break on the second line, so the final result will be like this:
How do I do that?

You could trim it off with tr like this:
sed -n '6p' sms | tr -d '\n' > /tmp/result

You can use awk instead of sed:
awk 'NR==6 {printf "%s", $0}' sms > result
NR==6 selects the line number
printf "%s", $0 prints that line without a trailing \n (passing $0 as the format string would misbehave if the line contained a %)

There's nothing wrong with your sed command, your input file contains trailing control-Ms. Remove them with dos2unix or similar before running sed.

A correct implementation of the POSIX sed command does not add such a blank line. 6p should print the sixth line. I cannot reproduce the issue on, for example, Ubuntu 12 Linux. You have some line ending problem or some such issue.
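If the stray line break does come from carriage returns, as suggested above, the following sketch shows one way to confirm and remove them without dos2unix. The sample file is generated here with CRLF endings purely for illustration.

```shell
# Build an illustrative file with DOS (CRLF) line endings.
sms=$(mktemp)
printf 'gsm versi jadul\r\n29 sender: +62896666666\r\n' > "$sms"

# od -c makes the \r visible at the end of the selected line.
sed -n '2p' "$sms" | od -c

# tr -d removes both the carriage return and the newline.
line2=$(sed -n '2p' "$sms" | tr -d '\r\n')
printf '%s\n' "$line2"
rm -f "$sms"
```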

How can I remove the last character of a file in unix?

Say I have some arbitrary multi-line text file:
sometext
moretext
lastline
How can I remove only the last character (the e, not the newline or null) of the file without making the text file invalid?

A simpler approach (outputs to stdout, doesn't update the input file):
sed '$ s/.$//' somefile
$ is a Sed address that matches the last input line only, thus causing the following function call (s/.$//) to be executed on the last line only.
s/.$// replaces the last character on the (in this case last) line with an empty string; i.e., effectively removes the last char. (before the newline) on the line.
. matches any character on the line, and following it with $ anchors the match to the end of the line; note how the use of $ in this regular expression is conceptually related, but technically distinct from the previous use of $ as a Sed address.
Example with stdin input (assumes Bash, Ksh, or Zsh):
$ sed '$ s/.$//' <<< $'line one\nline two'
line one
line tw
To update the input file too (do not use if the input file is a symlink):
sed -i '$ s/.$//' somefile
Note:
On macOS, you'd have to use -i '' instead of just -i; for an overview of the pitfalls associated with -i, see the bottom half of this answer.
If you need to process very large input files and/or performance / disk usage are a concern and you're using GNU utilities (Linux), see ImHere's helpful answer.

truncate
truncate -s-1 file
Removes one (-1) character from the end of the same file. Exactly as a >> will append to the same file.
The problem with this approach is that it doesn't retain a trailing newline if it existed.
The solution is:
if [ -n "$(tail -c1 file)" ]   # the file does not end with a newline
then
    truncate -s-1 file         # remove one character, as the question requests
else
    truncate -s-2 file         # remove the newline and the character before it
    echo "" >> file            # add the trailing newline back
fi
This works because tail takes the last byte (not char).
It takes almost no time even with big files.
Why not sed
The problem with a sed solution like sed '$ s/.$//' file is that it reads the whole file first (taking a long time with large files), then you need a temporary file (of the same size as the original):
sed '$ s/.$//' file > tempfile
rm file; mv tempfile file
And then move the tempfile to replace the file.

Here's another using ex, which I find not as cryptic as the sed solution:
printf '%s\n' '$' 's/.$//' wq | ex somefile
The $ goes to the last line, the s deletes the last character, and wq is the well known (to vi users) write+quit.

After a whole bunch of playing around with different strategies (and avoiding sed -i or perl), the best way I found to do this was with:
sed '$! { P; D; }; s/.$//' somefile

If the goal is to remove the last character in the last line, this awk should do:
awk '{a[NR]=$0} END {for (i=1;i<NR;i++) print a[i];sub(/.$/,"",a[NR]);print a[NR]}' file
sometext
moretext
lastlin
It stores all the lines in an array, then prints them out, removing the last character from the last line.

Just a remark: sed -i will temporarily remove the file.
So if you are tailing the file, you'll get a "No such file or directory" warning until you reissue the tail command.

EDITED ANSWER
I put your text in a test file on my Desktop, saved as "old_file.txt":
sometext
moretext
lastline
Afterwards I wrote a small script to take the old file and remove the last character of the last line:
#!/bin/bash
# wc -l counts newline characters; the +1 assumes the last line has no trailing newline.
no_of_new_line_characters=$(wc -l < '/root/Desktop/old_file.txt')
no_of_lines=$((no_of_new_line_characters + 1))
sed -n 1,"$no_of_new_line_characters"p '/root/Desktop/old_file.txt' > '/root/Desktop/my_new_file'
sed -n "$no_of_lines","$no_of_lines"p '/root/Desktop/old_file.txt'|sed 's/.$//g' >> '/root/Desktop/my_new_file'
Opening the new file I created showed the following output:
sometext
moretext
lastlin
I apologize for my previous answer (wasn't reading carefully)

sed '$ s/.$//' filename | tee newFilename
This should do your job.

A couple perl solutions, for comparison/reference:
(echo 1a; echo 2b) | perl -e '$_=join("",<>); s/.$//; print'
(echo 1a; echo 2b) | perl -e 'while(<>){ if(eof) {s/.$//}; print }'
I find the first read-whole-file-into-memory approach can be generally quite useful (less so for this particular problem). You can now do regex's which span multiple lines, for example to combine every 3 lines of a certain format into 1 summary line.
For this problem, truncate would be faster and the sed version is shorter to type. Note that truncate requires a file to operate on, not a stream. Normally I find sed to lack the power of perl and I much prefer the extended-regex / perl-regex syntax. But this problem has a nice sed solution.
