Base64-encoded e-mail attachments in a public records request from a city government

In response to a public records request to a city government, I received a number of records that are .txt files of e-mails whose attachments appear to be base64. The attachments are JPEG, PDF, PNG, or DOC files encoded in base64; a shorter example is below. The government official claims: "The records we released to you are in the form that was available, including the file you noted. From what I have been told, it is likely garbled computer coding. We have no other version of those records."
Questions:
Does someone have to intentionally work at saving the e-mail and attachments in this way so that they are unreadable, thus making the information not public (hiding it)?
Or is this something that can plausibly happen when saving, producing "garbled computer coding"?
If it is plausible that a computer does it, how?
Is there a way of decoding it?
I have tried a number of online decoders with various settings and have been unsuccessful.
I have done a number of public record requests from this city and department in the past and have never gotten such .txt documents. The public record request is around a city contract that is problematic.
From: "Steinberg, David (DPW)" <david.steinberg@sfdpw.org>
To: "Goldberg, Jonathan (DPW)" <jonathan.goldberg@sfdpw.org>
Sent: Fri, 17 May 2019 20:40:36 +0000
Subject: Re: SOTF - Education, Outreach and Training Committee, May 21, 2019 hearing
----boundary-LibPST-iamunique-566105023_-_-
Content-Type: image/jpeg
Content-Transfer-Encoding: base64
Content-ID: <image002.jpg@01D50CB4.2B093D10>
Content-Disposition: attachment;
filename*=utf-8''image002.jpg;
filename="image002.jpg"
/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAoHBwkHBgoJCAkLCwoMDxkQDw4ODx4WFxIZJCAmJSMg
IyIoLTkwKCo2KyIjMkQyNjs9QEBAJjBGS0U+Sjk/QD3/2wBDAQsLCw8NDx0QEB09KSMpPT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT3/wAARCABJAFEDASIA
AhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQA
AAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3
ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWm
p6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEA
AwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSEx
BhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElK
U1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3
uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwDKooqa
FY47qIXiSiHcPMVRhtuecZ716ZgRojSOqIpZ2OAo6k0+G2muBIYYZJBEu+Taudi9yfSum1bwlDa6
ENa0q7mlh4lVXGGVD0ORzkcVf8ExwaxHqPnb4bsxeVLLE20SI38RHTcMHnvWTqrl5kUo62OO07T5
9VvY7S0VWmkztDNgcDJ5qB0aN2RhhlJUj3Fdp4N0LT/7YZ5bx/t9nK4FtwOBwGz3GCK5/WmhsfFl
20FvE0UNxkQtyjY6g+xOaaqXk0hWsjJorutb0XS9R8OzaxZReVdpGssiQjCklQeVPbBzkVwtVCak
gasFFKFJViMHb1Hf649KSrEFFFFAE9jMba/t5gMlJFOAAc8+/FaPiO7a5uLdXl8xUQhcyb3UFj8p
OAcgY45rKiDtKgiBMhYbQPWtb+3tRe9ubh4rWedwryu8CvgLjGOwA9qzktboaOsuNXNtbaroscAE
NtpweBm5aQbRkkdB16exrN+H979lt9UXaD8isDjvhv8ACuZe7vvtD30hcvMjAu3IKNlSOe3OPal0
++vNMhmmtSqpIuxmbB6EHgHr1/I1n7P3Wu4+bU6HwPe/a/FiMyhMWhjRe+Bjqe5965vUZHutYu3d
gXedyT0HU80+zm1DSLuNrUtFM5CqBg7j6fqOPzqNGujb3ESR5EkgEz7fmzydpPYZBOPUVajaV0K+
h1eka2n9iw2RCRpeySWZduvESqhPtkj6Zrm9F0oahrFvaXLNDC0pieQDgMATtB6ZOMfjUZurloLP
zI4mggfMSbVXcSRnOOTnHWrdtr2raf5ksMixxGd3aMouwuSM8dTjj6UlFxvbqO99zvNYtLPQ9I8j
SbexivJQY4VlUbpePmAJ6nHr1OBXljI0Z2OrKw4IYYIrS1jUNQ1q+SS+SPziuFVFAwPfnPvzVCcy
mU/aGYyYHLNk4xxz6YopQcVqEncjooorYkltYpZ7uKK3jMkzMAiDua0prTV7RZ7p7UKhA81l2uAM
Y5wScc8n9aj8Nf8AIy6d/wBdhT7ObTtJeWe2uJriZoniWPyPLX5hjLHPIGelZSbvZDRTD3Taa+Fz
aqyxlsfdbkgfU/0pmLhtObAb7Ksm1iDxvI6H8BVyAf8AFI3mOQt7CT7DYwoi48J3BPRr6MA+uI2z
/Onf8wJY7PWLwQ3aQbwQxicsgyDxwCarJJfx6k8AiK3ckmxomTBLEYxj8fz5q5fLpp07R/t5ut/2
T/liExt8x/XvVmYs/jTTJsgxTNbvCec+XwF3Z/i45qFIdjAeR1Cxui74TtDEfMMHp6dalZrk2Ekp
j/0eaUqZP9v7xA54/L8ahu2H2245H+ufv/tGtF/+RPj/AOv9v/RYrTohDLW0v9QY3VvbIVyVeRiE
RyRyCWIBP0qrfQ3FvcmK6h8mRVACY4C44x7e9XfEHXT1X/j1FnGYB26fP+O7Oays5A/SiOuoMKKK
KsRZ037UNSt/sH/H1vHldPvfjxSrpl42pf2etuxu9xTysjOev06VBDM9tPHPEcSRMHU+4Oa9EuYo
7bU7jxVGoMJ04Sx+hlPAH5Y/Ospz5WUlc4rT49UtNVeys4z9rcmKSBgrB8ckEHg9M1M1rrWu3TWq
WxkNqShiiVUjiPfp8ueK68Qpbajd+Kto8l9PWWMjp5jDBH6D86xNRnmtfh9phtHZEuZWa6kQ4LOS
eCfr/IVCnd6IdjPvX1vQoIIL62SOKNSsTSwRyDGScBsH1JxUo0HxNcXceo/ZJZJsrIkhdO33cDPT
2qzo80t54H16O9dpLWFA0LOc7XxnAJ98fnWvrmnxajcaLE2s/YJntkVIwrZc8cgggfnSc7OwWuc4
JtdvbybT/skRuirGSM2sSNjuckD161QtLbUL+xlt7aMyW1sxnk5UBDjGSx9h0ruLW8W9+IrRKrqb
Wza3Z5B8zkEfN+tZus29uPB8kOg3HmW1pcEX2BzKf7xPcA/h+VCnrawWMjRLXXruyK6dai4tAxIW
ZEZA3fbu7/Ss/V1v0vyuqIyXAUDaygYXtgDjH0rcsta0q98PWmlandXWnvbE7JoSdjdeTj61meJd
KuNJ1JEuLo3ayxh4pyTll6DP0q4v3tRPYyKKKK2JCtSTxDeyeH00dtn2ZCCDg7sA5xnPTNZdFJpP
cDUfxDeyeH00djH9mQ5BAO4jOcE56Zo0nxFeaRFJBGIp7WQ5eCddyE+vtWXRS5I2tYd2a2reI7zV
rVbRkhtrRTkQW6bVJ9T61DqWt3WqS2kkwRHtUCRmMEcA5BPPXis+ihQS6Cuzb/4Su9GtnVRDbC5M
XlNhTtYepGetVNK1q50h7gwLHIlyhSWOUEqw+mfc/nWfRS5I7WHdm5YeKrixso7RrKxuIYs+WJos
lec9aoarq11rV59pvGUuF2qqjCovoBVKimoRTuhXYUUUVQH/2Q==
----boundary-LibPST-iamunique-566105023_-_---

I recovered the image by copying your base64 data to a file and running
base64 -d file > image.jpeg
on my Debian Linux system.
RFC 2045, section 6, specifies how binary data is encoded into US-ASCII characters for transport,
so it is quite normal for images to be base64-encoded, although a GUI e-mail reader
would not show you the raw data.
Seeing raw data like this suggests that somebody may have copied and pasted the e-mails blindly, although that seems less likely. (Your example is clearly one part of a multipart message, not the complete message.)
Online base64 decoding services are also available.
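The same decoding can be sketched in Python for readers who do not have the base64 command-line tool. This is only a minimal sketch, not a general MIME parser: the function name decode_attachment and the heuristic of keeping only lines made up entirely of base64 characters are assumptions of mine, and the sample contains just the first line of the attachment from the question.

```python
import base64
import re

# A line counts as payload if it holds only base64 characters, optionally padded.
BASE64_LINE = re.compile(r"^[A-Za-z0-9+/]+={0,2}$")


def decode_attachment(part_text: str) -> bytes:
    """Decode the base64 payload of one MIME part pasted from a .txt record."""
    lines = [line.strip() for line in part_text.splitlines()]
    # Header and boundary lines contain ':', '"', '-' or spaces, so they are skipped.
    payload = "".join(line for line in lines if BASE64_LINE.match(line))
    return base64.b64decode(payload)


sample = """Content-Type: image/jpeg
Content-Transfer-Encoding: base64
filename="image002.jpg"
/9j/4AAQSkZJRgAB
"""

data = decode_attachment(sample)
print(data[:3].hex(), data[6:10])  # JPEG start marker and the JFIF tag
```

With the complete attachment text, the returned bytes can be written out with open("image.jpeg", "wb") and opened in any image viewer.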

Related

sendmail add a second attachment to an email

I have a program that issues the following command at the Linux level:
EXE1= "SH -c '/usr/lib/sendmail ":EMAIL<1,X>:' < "/thisdata/level1/VRE/&HOLD&/':PC.FILE:'.CSV':'"':"'"
Is it possible to attach a second PC.File in this instruction?
The best answer to this would require knowledge of your OS and any mail server restrictions you might have, but I have been working on an old routine of ours that may provide some insight.
We used to use uuencode, as @RedCabbage suggested, but some servers rejected the messages. Since we have no control over other people's servers, and because uuencode is almost as old as dirt, we updated our solution to use MIME instead. See "Not As Easy Way".
Easy way
On our Linux system we use mailx (which in our case is linked to /usr/bin/mail) to add multiple attachments.
echo "It's Wednessday, my dudes." |mail -s "foobar" -a foo.txt -a bar.txt dudes@wednessday.com
This results in an email with two attachments and a message body, provided the file paths are correct and a real address is used.
Not As Easy Way
Create your own MIME (multipart/mixed) message. We create a record similar to this.
To: foo@foo.bar
From: bar@foo.bar
Subject: Not Rocket Surgery
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="SomelongBoundryMarkerYouMakeUpDoNotUseThis"

This is a multipart message in MIME format.

--SomelongBoundryMarkerYouMakeUpDoNotUseThis
Content-Type: text/plain

The body of that message goes here.

--SomelongBoundryMarkerYouMakeUpDoNotUseThis
Content-Type: application/pdf
Content-Transfer-Encoding: Base64
Content-Disposition: attachment; filename="YourFileName.pdf"
Content-Base: http://www.IputOurDomainHereDunnoWhyDoNotUseThis.com

##BASE64 encoded string goes here.

--SomelongBoundryMarkerYouMakeUpDoNotUseThis
Rinse and repeat with more files; newlines are important here.
For the encoding part, assuming that your file is in /foo/bar.pdf
ENCODED = ""
FILE.LOC = "/foo/bar.pdf"
TEST = ENCODE("Base64",1,FILE.LOC,2,ENCODED,1)
IF TEST EQ 0 THEN
;*Put the text in ENCODED where it says ##BASE64 encoded string go here
END
This is much more tedious because you have to figure out all the MIME types and make sure everything is formatted just right.
Good Luck
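If Python is available on the box, its standard email package can assemble the same kind of multipart/mixed record, including boundaries and base64 encoding. The following is a minimal sketch under that assumption; build_message, the example.com addresses and the placeholder file contents are hypothetical and not part of our routine.

```python
from email.message import EmailMessage


def build_message(sender, recipient, subject, body, attachments):
    """Return a multipart/mixed message.

    attachments is a list of (filename, data_bytes, maintype, subtype) tuples.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)  # becomes the text/plain part
    for filename, data, maintype, subtype in attachments:
        # Bytes are base64-encoded and marked Content-Disposition: attachment.
        msg.add_attachment(data, maintype=maintype, subtype=subtype,
                           filename=filename)
    return msg


# Placeholder addresses and file contents, for illustration only.
msg = build_message(
    "bar@example.com", "foo@example.com", "Two attachments",
    "Both files are attached.\n",
    [("first.csv", b"a,b\n1,2\n", "text", "csv"),
     ("second.pdf", b"%PDF-1.4 placeholder", "application", "pdf")],
)
print(msg.as_string())
```

The printed text could be written to a file and redirected into sendmail in the same way as the CSV in the question, though whether your sendmail accepts it unchanged depends on how it is invoked.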
This is how we have managed multiple attachments. For example, take /home/jbloggs/attachment1.doc and /home/jbloggs/attachment2.pdf.
You will (as you have done) have to build the commands and execute them with SH -c 'blah' (as you have done already.)
Create a text file containing the address, subject and body, for example /home/jbloggs/tempemail.txt:
To: recipient.email@gmail.com
Subject: Your subject line here

This is the main body text
Cycle through the attachments and uuencode them to an incrementing name (ENCODEDn here):
uuencode /home/jbloggs/attachment1.doc attachment1.doc > ENCODED1
uuencode /home/jbloggs/attachment2.pdf attachment2.pdf > ENCODED2
Use cat to join all the files together to a single one:
cat /home/jbloggs/tempemail.txt ENCODED1 ENCODED2 > COMBOFILE
Use sendmail on the combo:
sendmail recipient.email@gmail.com < COMBOFILE
You can loop through as many ENCODEDn files as you like.

javamail BASE64DecoderStream decode issue

I found an issue with BASE64DecoderStream in JavaMail. Some of the email content I receive looks like this:
Content-Type: text/plain; charset=3D"utf-8"
Content-Transfer-Encoding: base64
QmFzZTY0IGlzIGEgZ2VuZXJpYyB0ZXJtIGZvciBhIG51bWJlciBvZiBzaW1pbGFyIGVuY29kaW5=
n
IHNjaGVtZXMgdGhhdCBlbmNvZGUgYmluYXJ5IGRhdGEgYnkgdHJlYXRpbmcgaXQgbnVtZXJpY2F=
s
bHkgYW5kIHRyYW5zbGF0aW5nIGl0IGludG8gYSBiYXNlIDY0IHJlcHJlc2VudGF0aW9uLiBUaGU=
g
QmFzZTY0IHRlcm0gb3JpZ2luYXRlcyBmcm9tIGEgc3BlY2lmaWMgTUlNRSBjb250ZW50IHRyYW5=
z
ZmVyIGVuY29kaW5nLg==
Ideally, the = sign would have been replaced by the single character on the following line, but G Suite (Gmail) sometimes produces output like this. This causes BASE64DecoderStream to corrupt the message. However, Outlook and many popular online base64 decoders handle this content correctly. Can this be fixed?
Additional detail was provided privately, which allowed me to determine that the problem is that the message includes an attachment of MIME type message/rfc822 (the original message), and that attachment uses a Content-Transfer-Encoding of quoted-printable. The MIME spec does not allow the use of that encoding for MIME content of that type. This is a violation of the MIME spec that Google really needs to fix. Please provide them this additional information if they haven't figured it out themselves.
RFC 2046, section 5.2.1, says:
No encoding other than "7bit", "8bit", or "binary" is permitted for
the body of a "message/rfc822" entity.
In the meantime, you can set the JavaMail System property mail.mime.allowencodedmessages to "true" to work around this bug in G Suite.
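The shape of the content in the question is consistent with base64 text that has been wrapped a second time in quoted-printable, where an "=" at the end of a line is a soft line break. As an illustration only (Python rather than JavaMail, with a hypothetical function name), undoing the quoted-printable layer first and then decoding the base64 recovers the text. The sample is the first wrapped line from the question.

```python
import base64
import quopri

# A 76-character base64 line that a quoted-printable encoder split with a
# soft line break ("=" at the end of the line), pushing "n" to the next line.
wrapped = (
    b"QmFzZTY0IGlzIGEgZ2VuZXJpYyB0ZXJtIGZvciBhIG51bWJlciBvZiBzaW1pbGFyIGVuY29kaW5=\n"
    b"n\n"
)


def decode_qp_then_base64(data: bytes) -> bytes:
    """Undo the quoted-printable layer, then decode the base64 underneath."""
    unwrapped = quopri.decodestring(data)  # rejoins "...kaW5" and "n"
    return base64.b64decode(unwrapped)     # newlines are ignored by default


decoded = decode_qp_then_base64(wrapped)
print(decoded.decode("ascii"))
```

A decoder that treats the "=" as base64 padding instead stops or misreads the data at the end of the first line, which would explain the corruption described in the question.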

How to create an email file?

After fetchmail fetches mail, the new messages are stored in a file like /var/mail/user. We can open that file with a text editor such as vim.
How can I create such text-based email files? Say I want to send an email with these contents:
From: sender <sender@xx.com>
To: receiver <receiver@xx.com>
Subject: test subject
Contents: ...
Attached: file1.txt, file2.png, file3.pdf
The problem is how to turn these into a properly formatted text-based email.
Also, if I have such an email file, how can I extract its parts (subject, contents, attached files, etc.) with command line tools? I know I can open it with a program like mutt. Can this be done using a command line utility?
There is a bunch of standards you need to understand, but email is fundamentally text.
The file format in /var/spool/mail or /var/mail/user etc is typically Berkeley mbox. This is not formally defined anywhere, but consists of a sequence of RFC5322 (née RFC822) email messages, each preceded by a From_ line, the format of which is basically From %s  %C where %s is the sender's email address (what you also see in Return-Path:) and %C is the date when the message arrived. Notice the two spaces between the format strings!
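As a rough illustration of that layout, Python's standard mailbox module can write and read mbox files, generating the From_ separator line itself. This is a sketch under assumptions: the temporary path stands in for /var/mail/user, and the addresses are placeholders.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "receiver@example.com"
msg["Subject"] = "test subject"
msg.set_content("Hello from an mbox file.\n")

# Temporary file standing in for /var/mail/user.
path = os.path.join(tempfile.mkdtemp(), "user")
box = mailbox.mbox(path)
box.add(msg)   # the module writes the "From " separator line itself
box.close()    # flushes the mailbox to disk

with open(path, "rb") as f:
    first_line = f.readline()
print(first_line)  # the From_ line that precedes the message

subjects = [m["Subject"] for m in mailbox.mbox(path)]
print(subjects)
```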
The toplevel email message is RFC5322 but on top of that, you need to understand MIME.
mbox: RFC 4155
Email message format: RFC5322
MIME: RFC2045 RFC2046 RFC2047 RFC2048
You will also stumble over (E)SMTP RFC5321 which is only tangential to your question, but good to know. Notice how 821 and 822 (and later 2821 and 2822, and now 5321 and 5322) have adjacent RFC numbers.
Furthermore, there is a wild, wild West of non-standard headers, some of which are nonetheless significant. Dan Bernstein's reference http://cr.yp.to/immhf.html is a lifesaver. As a general guideline, what spammers typically do is copy/paste headers without understanding them; therefore, an essential practice for deliverability is "don't do that". In other words, if you don't know what a header is for, don't use it.
Any modern programming language will come with libraries to create and manipulate RFC5322 and MIME, and probably mbox too. For creating a message you can send somewhere, you don't need mbox anyway, just something along the lines of (pseudocode)
message = new MIME({'Subject': 'hello', 'From': 'me@example.net',
'To': 'My Friend <you@example.com>'});
message.addbodypart('text/plain', 'Hi Fred.\nHow are you?');
message.addbodypart('image/png', {'file': '/home/you/logo.png'});
smtp = new SMTP('mail.example.net', 587, {'user': 'me', 'pass': 'xyzzy'});
smtp.send(message);
A multipart message looks something like what you describe in your question, except there is no specific header to identify "attachments" and actually conceptually no "attachments", just "body parts". Here is a simple MIME message to show what the message in your question would properly look something like.
From: sender <sender@example.com>
To: receiver <receiver@example.com>
Subject: test subject
MIME-Version: 1.0
Content-type: multipart/mixed; boundary="so_long_eFlop"

This is a MIME multipart message. Nobody actually sees what it says here.

--so_long_eFlop
Content-type: text/plain; charset="utf-8"
Content-disposition: inline
Content-transfer-encoding: 7bit

Many mail clients will display this as the "main part" but MIME does not
define any particular hierarchy. Many mail clients will generate a
text/plain rendering and a text/html rendering of the message you type in,
and the recipient's mail client will decide -- based on user preferences
-- which one to display. Anyway, I will not create an example of that
here. This is just "a text message with a picture attached", or, more
precisely, a MIME message with two body parts.

Oh, the content-disposition: inline is usually just implied for a
text/plain part. Some clients will override or ignore the disposition
set by the sender anyway.

--so_long_eFlop
Content-type: image/png
Content-disposition: attachment
Content-transfer-encoding: base64

Iam+not/attaching+a/real00picture+here/just/a/bunch0of/binary/goo===

--so_long_eFlop--
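For the second half of the question (pulling the subject, body and attached files back out), here is one possible sketch using Python's standard email package. The message text is a shortened stand-in for the example above; the file name file1.bin and its base64 payload are made up for illustration.

```python
from email import message_from_string
from email.policy import default

raw = """\
From: sender <sender@example.com>
To: receiver <receiver@example.com>
Subject: test subject
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="so_long_eFlop"

This is a MIME multipart message.
--so_long_eFlop
Content-Type: text/plain; charset="utf-8"

Hi Fred.
--so_long_eFlop
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="file1.bin"
Content-Transfer-Encoding: base64

aGVsbG8=
--so_long_eFlop--
"""

msg = message_from_string(raw, policy=default)

subject = msg["Subject"]
# The first text/plain part that is not marked as an attachment.
body = msg.get_body(preferencelist=("plain",)).get_content()
# Map each attachment's file name to its decoded bytes.
attachments = {part.get_filename(): part.get_content()
               for part in msg.iter_attachments()}

print(subject)
print(body)
print(attachments)
```

A script like this could be run from the command line with the message read from a file or from standard input instead of a string literal.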
The file format is called "mbox". There's a good article on Wikipedia (http://en.wikipedia.org/wiki/Mbox), as well as all over the Internet. Like RFC 4155. :)
telnet your.mail.server 25
helo localhost.localdomain
mail from:<sender@address.com>
rcpt to:<recipient@address.com>
data
From:Me
Subject:This is an email via Telnet

Hi,
The first line connects to the server on port 25. Replace "your.mail.server" with the name or address of the MX server for the domain.
Most servers expect the second "HELO" line to begin the session. I have seen servers that don't care, but in general they should throw an error.
You must have a "MAIL FROM:" line with the address you expect a reply to come to.
The mail is going nowhere if you don't specify the "RCPT TO:" address.
The message body begins with "DATA" line. This will usually be met with instruction on how to end the message - a single "." on a line by itself.
The "From:" and "Subject:" headers above are optional. You can add any additional headers here.
.
quit

binary protocols v. text protocols

Does anyone have a good definition of what a binary protocol is? And what is a text protocol, actually? How do the two compare in terms of bits sent on the wire?
Here's what Wikipedia says about binary protocols:
A binary protocol is a protocol which is intended or expected to be read by a machine rather than a human being (http://en.wikipedia.org/wiki/Binary_protocol)
Oh come on!
To be more clear: if I have a jpg file, how would it be sent through a binary protocol, and how through a text one? In terms of bits/bytes sent on the wire, of course.
At the end of the day, a string is itself an array of bytes, so the distinction between the two kinds of protocol should rest on what actual data is sent on the wire; in other words, on how the initial data (the jpg file) is encoded before being sent.
Binary protocol versus text protocol isn't really about how binary blobs are encoded. The difference is really whether the protocol is oriented around data structures or around text strings. Let me give an example: HTTP. HTTP is a text protocol, even though when it sends a jpeg image, it just sends the raw bytes, not a text encoding of them.
But what makes HTTP a text protocol is that the exchange to get the jpg looks like this:
Request:
GET /files/image.jpg HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/4.01 [en] (Win95; I)
Host: hal.etc.com.au
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Accept-Language: en
Accept-Charset: iso-8859-1,*,utf-8
Response:
HTTP/1.1 200 OK
Date: Mon, 19 Jan 1998 03:52:51 GMT
Server: Apache/1.2.4
Last-Modified: Wed, 08 Oct 1997 04:15:24 GMT
ETag: "61a85-17c3-343b08dc"
Content-Length: 60830
Accept-Ranges: bytes
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: image/jpeg
<binary data goes here>
Note that this could very easily have been packed much more tightly into a structure that would look (in C) something like
Request:
struct request {
    int requestType;
    int protocolVersion;
    char path[1024];
    char user_agent[1024];
    char host[1024];
    long int accept_bitmask;
    long int language_bitmask;
    long int charset_bitmask;
};
Response:
struct response {
    int responseType;
    int protocolVersion;
    time_t date;
    char host[1024];
    time_t modification_date;
    char etag[1024];
    size_t content_length;
    int keepalive_timeout;
    int keepalive_max;
    int connection_type;
    char content_type[1024];
    char data[];
};
Where the field names would not have to be transmitted at all, and where, for example, the responseType in the response structure is an int with the value 200 instead of three characters '2' '0' '0'. That's what a text based protocol is: one that is designed to be communicated as a flat stream of (usually human-readable) lines of text, rather than as structured data of many different types.
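To make the 200-versus-'2' '0' '0' point concrete, here is a small Python sketch comparing the bytes each representation puts on the wire. The choice of a 32-bit unsigned integer in network byte order is an assumption for illustration; a C int sent as-is would use the host's size and byte order.

```python
import struct

status = 200

# Text encoding: the three ASCII characters '2', '0', '0'.
as_text = str(status).encode("ascii")

# Binary encoding: one 32-bit unsigned integer in network (big-endian) order.
as_binary = struct.pack("!I", status)

print(as_text.hex())    # bytes of the text form
print(as_binary.hex())  # bytes of the binary form

# The receiver of the binary form must already know the field layout.
decoded_status = struct.unpack("!I", as_binary)[0]
print(decoded_status)
```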
Here's a kind-of cop-out definition:
You'll know it when you see it.
This is one of those cases where it is very hard to find a concise definition that covers all corner cases. But it is also one of those cases where the corner cases are completely irrelevant, because they simply do not occur in real life.
Pretty much all protocols that you will encounter in real life will either look like this:
> fg,m4wr76389b zhjsfg gsidf7t5e89wriuotu nbsdfgizs89567sfghlkf
> b9er t8ß03q+459tw4t3490ß´5´3w459t srt üßodfasdfäasefsadfaüdfzjhzuk78987342
< mvclkdsfu93q45324äö53q4lötüpq34tasä#etr0 awe+s byf eart
[Imagine a ton of other non-printable crap there. One of the challenges in conveying the difference between text and binary is that you have to do the conveying in text :-)]
Or like this:
< HELLO server.example.com
> HELLO client.example.com
< GO
> GETFILE /foo.jpg
< Length: 3726
< Type: image/jpeg
< READY?
> GO
< ... server sends 3726 bytes of binary data ...
> ACK
> BYE
[I just made this up on the spot.]
There's simply not that much ambiguity there.
Another definition that I have sometimes heard is
a text protocol is one that you can debug using telnet
Maybe I am showing my nerdiness here, but I have actually written and read e-mails via SMTP and POP3, read usenet articles via NNTP and viewed web pages via HTTP using telnet, for no other reason than to see whether it would actually work.
Actually, while writing this, I kinda caught the fever again:
bash-4.0$ telnet smtp.googlemail.com 25
Trying 74.125.77.16...
Connected to googlemail-smtp.l.google.com.
Escape character is '^]'.
< 220 googlemail-smtp.l.google.com ESMTP Thu, 15 Apr 2010 19:19:39 +0200
> HELO
< 501 Syntactically invalid HELO argument(s)
> HELO client.example.com
< 250 googlemail-smtp.l.google.com Hello client.example.com [666.666.666.666]
> RCPT TO:Me <Me@Example.Com>
< 503 sender not yet given
> SENDER:Me <Me@Example.Com>
< 500 unrecognized command
> RCPT FROM:Me <Me@Example.Com>
< 500 unrecognized command
> FROM:Me <Me@Example.Com>
< 500-unrecognized command
> HELP
< 214-Commands supported:
< 214 AUTH HELO EHLO MAIL RCPT DATA NOOP QUIT RSET HELP ETRN
> MAIL FROM:Me <Me@Example.Com>
< 250 OK
> RCPT TO:You <You@SomewhereElse.Example.Com>
< 250 Accepted
> DATA
< 354 Enter message, ending with "." on a line by itself
> From: Me <Me@Example.Com>
> To: You <You@SomewhereElse.Example.Com>
> Subject: Testmail
>
> This is a test.
> .
< 250 OK id=1O2Sjq-0000c4-Qv
> QUIT
< 221 googlemail-smtp.l.google.com closing connection
Connection closed by foreign host.
Damn, it's been quite a while since I've done this. Quite a few errors in there :-)
Examples of binary protocols: RTP, TCP, IP.
Examples of text protocols: SMTP, HTTP, SIP.
This should allow you to generalise to a reasonable definition of binary vs text protocols.
Hint: just skip to the example sections, or the diagrams. They serve to illustrate Tyler's rocking answer.
As most of you suggested, we can't tell whether a protocol is binary or text simply by looking at the content on the wire.
AFAIK:
Binary protocol - fields are delimited at bit boundaries
Order is very critical
E.g., RTP
First two bits are the version
Next bit is the padding bit (the marker bit comes later, at the start of the second byte)
Text protocol - delimiters specific to the protocol
Order of the fields is not important
E.g., SIP
One more difference: in a binary protocol we can split a byte, i.e., a single bit might have its own meaning, while in a text protocol the minimum meaningful unit is a byte. You can't split a byte.
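To show what splitting a byte looks like in practice, here is a short Python sketch that pulls the bit fields out of the first two bytes of an RTP header, following the fixed header layout in RFC 3550 (version, padding, extension and CSRC count in the first byte; marker and payload type in the second). The function name and the sample bytes are made up for illustration.

```python
def parse_rtp_first_bytes(header: bytes) -> dict:
    """Split the first two bytes of an RTP header into their bit fields."""
    b0, b1 = header[0], header[1]
    return {
        "version": b0 >> 6,              # top two bits of byte 0
        "padding": (b0 >> 5) & 0x1,      # next bit
        "extension": (b0 >> 4) & 0x1,    # next bit
        "csrc_count": b0 & 0x0F,         # low four bits of byte 0
        "marker": b1 >> 7,               # top bit of byte 1
        "payload_type": b1 & 0x7F,       # low seven bits of byte 1
    }


# Sample bytes: version 2, no padding or extension, marker set, payload type 96.
fields = parse_rtp_first_bytes(bytes([0x80, 0xE0]))
print(fields)
```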
The two use different character sets: a text protocol uses a reduced character set, while a binary protocol uses every value it can, not only "letters" and "numbers" (that's why Wikipedia says "human being").
> to be more clear, if I have jpg file how would that be sent through a binary protocol and how through a text one? in terms of bits/bytes sent on the wire of course.
You should read about Base64.
> any comments are appreciated, I am trying to get to the essence of things here.
I think the essence of narrowing the character set is to reduce complexity and to achieve portability and compatibility. It is harder to get many parties to agree on and respect a wide character set (or a wide anything). The Latin alphabet and the Arabic numerals are known worldwide. (There are of course other reasons to reduce the code, but that is the main one.)
Let's say that in binary protocols the "contract" between the parties is about bits: the first bit means this, the second means that, and so on. It can even be about bytes, but with the freedom to use the full range of values without thinking about portability, for example in private closed systems or near-hardware standards. If you design an open system, however, you have to take into account how your codes will be represented in a wide range of situations, for example on a machine on the other side of the world. That is where text protocols come in, with a contract that is as standard as possible. I have designed both, and those were the reasons: binary for very custom solutions and text for open and/or portable systems.
Consider how an image file is sent in SOAP.
The binary data is attached as such [ATTACHMENT] and a reference to it is stored in the SOAP message.
So the protocol is text-based, and the data [image] is a binary attachment whose encoding is not relevant.
Thus, SOAP is a text protocol because of the way its headers are specified, not because of how the data it carries is encoded.
I think you got it wrong.
It's not the protocol that determines how data looks on the "wire", but it's the data type that determine which protocol to use to transmit it.
Take a TCP socket, for instance: a jpeg file will be sent and received with a binary protocol because it's binary data (not human-readable, with bytes outside the 32-126 ASCII range), but you can send and receive a text file with either kind of protocol and you wouldn't notice the difference.
A text protocol can be self-explanatory and extensible.
It is self-explanatory because the message itself includes the field names. With a binary protocol, you cannot tell what a value in the message means without referring to the protocol specification.
It is extensible in the sense that HTTP, as a text protocol, defines only simple rules, yet you can extend the data structure by freely adding new headers or by changing the content type to transport different payloads. The headers are metadata, and they enable negotiation and automatic adaptation.

What is the encoding of an .eml file from IIS's SMTP server?

I need to write a program that reads the .eml files from IIS's mail drop box, but I can't find a definitive source that tells me the encoding of the .eml files. Is there a specification somewhere that tells me the encoding of the files, or do I just have to guess/assume one?
You need to read the Content-Transfer-Encoding header. This value will tell you how the email is encoded. The most common are 7-Bit (no encoding), Quoted-Printable (where you see a lot of =HEX pairs), and base64 (which is base 64 encoding).
Based upon that header value, you decode the following body part using the specified routine.
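As a sketch of that dispatch (in Python, with a hypothetical decode_body helper; a real reader would use a MIME library that also handles headers, charsets and multipart boundaries):

```python
import base64
import quopri


def decode_body(body: bytes, transfer_encoding: str) -> bytes:
    """Decode one body part according to its Content-Transfer-Encoding value."""
    encoding = transfer_encoding.strip().lower()
    if encoding in ("7bit", "8bit", "binary"):
        return body  # no transfer encoding was applied
    if encoding == "quoted-printable":
        return quopri.decodestring(body)  # turns =HEX pairs back into bytes
    if encoding == "base64":
        return base64.b64decode(body)
    raise ValueError(f"unsupported Content-Transfer-Encoding: {transfer_encoding}")


print(decode_body(b"caf=C3=A9", "quoted-printable").decode("utf-8"))
print(decode_body(b"aGVsbG8=", "Base64"))
```

The header value is compared case-insensitively here because messages in the wild write it as base64, Base64 or BASE64.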
I found my answer at en.wikipedia.org/wiki/MIME: "The basic Internet e-mail transmission protocol, SMTP, supports only 7-bit ASCII characters... "
It may be late to answer, but the .eml file format is nothing more than a plain-text MIME (RFC 822) format for storing e-mails.

Resources