Does anyone have a good definition of what a binary protocol is? And what is a text protocol, actually? How do these compare to each other in terms of bits sent on the wire?
Here's what Wikipedia says about binary protocols:
A binary protocol is a protocol which is intended or expected to be read by a machine rather than a human being (http://en.wikipedia.org/wiki/Binary_protocol)
Oh, come on!
To be more clear: if I have a JPG file, how would it be sent through a binary protocol, and how through a text one? In terms of bits/bytes sent on the wire, of course.
At the end of the day, if you look at a string, it is itself an array of bytes, so the distinction between the two protocols should rest on what actual data is being sent on the wire; in other words, on how the initial data (the JPG file) is encoded before being sent.
Binary protocol versus text protocol isn't really about how binary blobs are encoded. The difference is really whether the protocol is oriented around data structures or around text strings. Let me give an example: HTTP. HTTP is a text protocol, even though when it sends a jpeg image, it just sends the raw bytes, not a text encoding of them.
But what makes HTTP a text protocol is that the exchange to get the jpg looks like this:
Request:
GET /files/image.jpg HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/4.01 [en] (Win95; I)
Host: hal.etc.com.au
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Accept-Language: en
Accept-Charset: iso-8859-1,*,utf-8
Response:
HTTP/1.1 200 OK
Date: Mon, 19 Jan 1998 03:52:51 GMT
Server: Apache/1.2.4
Last-Modified: Wed, 08 Oct 1997 04:15:24 GMT
ETag: "61a85-17c3-343b08dc"
Content-Length: 60830
Accept-Ranges: bytes
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: image/jpeg
<binary data goes here>
Note that this could very easily have been packed much more tightly into a structure that would look (in C) something like this:
#include <stddef.h>  /* size_t */
#include <time.h>    /* time_t */

Request:
struct request {
    int requestType;
    int protocolVersion;
    char path[1024];
    char user_agent[1024];
    char host[1024];
    long int accept_bitmask;    /* one bit per acceptable content type */
    long int language_bitmask;
    long int charset_bitmask;
};
Response:
struct response {
    int responseType;           /* e.g. 200, stored as a binary int */
    int protocolVersion;
    time_t date;
    char server[1024];          /* the Server: header */
    time_t modification_date;
    char etag[1024];
    size_t content_length;
    int keepalive_timeout;
    int keepalive_max;
    int connection_type;
    char content_type[1024];
    char data[];                /* flexible array member: the raw image bytes follow */
};
Here the field names would not have to be transmitted at all, and, for example, the responseType in the response structure would be an int with the value 200 instead of the three characters '2' '0' '0'. That's what a text-based protocol is: one that is designed to be communicated as a flat stream of (usually human-readable) lines of text, rather than as structured data of many different types.
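To make the size difference concrete, here is a small Python sketch comparing a text status line with a packed binary encoding of the same information. The binary layout (one byte each for the major and minor version, then a 16-bit big-endian status code) is a hypothetical format invented for illustration; HTTP/1.x defines nothing like it.

```python
import struct

# Text form: the status code travels as the ASCII characters '2', '0', '0',
# surrounded by delimiters (spaces and CRLF).
text_status = b"HTTP/1.1 200 OK\r\n"

# Hypothetical binary form (layout invented for illustration):
# uint8 major version, uint8 minor version, uint16 status code, big-endian.
BINARY_STATUS = struct.Struct("!BBH")
binary_status = BINARY_STATUS.pack(1, 1, 200)

print(len(text_status), text_status)             # 17 b'HTTP/1.1 200 OK\r\n'
print(len(binary_status), binary_status.hex())   # 4 010100c8

# The binary form can only be decoded if the receiver already knows the layout.
major, minor, status = BINARY_STATUS.unpack(binary_status)
print(major, minor, status)                      # 1 1 200
```

The trade-off is visible even at this scale: 4 bytes instead of 17, but nothing in the 4 bytes tells a human (or a program without the spec) which byte is the status code.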
Here's a kind-of cop-out definition:
You'll know it when you see it.
This is one of those cases where it is very hard to find a concise definition that covers all corner cases. But it is also one of those cases where the corner cases are completely irrelevant, because they simply do not occur in real life.
Pretty much all protocols that you will encounter in real life will either look like this:
> fg,m4wr76389b zhjsfg gsidf7t5e89wriuotu nbsdfgizs89567sfghlkf
> b9er t8ß03q+459tw4t3490ß´5´3w459t srt üßodfasdfäasefsadfaüdfzjhzuk78987342
< mvclkdsfu93q45324äö53q4lötüpq34tasä#etr0 awe+s byf eart
[Imagine a ton of other non-printable crap there. One of the challenges in conveying the difference between text and binary is that you have to do the conveying in text :-)]
Or like this:
< HELLO server.example.com
> HELLO client.example.com
< GO
> GETFILE /foo.jpg
< Length: 3726
< Type: image/jpeg
< READY?
> GO
< ... server sends 3726 bytes of binary data ...
> ACK
> BYE
[I just made this up on the spot.]
There's simply not that much ambiguity there.
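The second style is easy to handle with ordinary line-reading code, even when a binary payload follows. Below is a minimal Python sketch of the client side of the made-up exchange above, assuming the server's lines are CRLF-terminated ASCII and that the client's "GO" has already been sent; read_file_reply is a hypothetical helper, not part of any real protocol library.

```python
import io

def read_file_reply(stream):
    """Read the server's header lines from the made-up protocol above, then
    exactly Length bytes of raw binary payload."""
    headers = {}
    while True:
        raw = stream.readline()
        if not raw:
            raise EOFError("connection closed while reading headers")
        line = raw.rstrip(b"\r\n").decode("ascii")
        if line == "READY?":            # end of the text part
            break
        name, _, value = line.partition(": ")
        headers[name] = value
    length = int(headers["Length"])
    payload = stream.read(length)       # raw bytes, no text encoding applied
    if len(payload) != length:
        raise EOFError("connection closed before the payload was complete")
    return headers, payload

# Simulated server output: text header lines, then 4 bytes of binary data.
wire = b"Length: 4\r\nType: image/jpeg\r\nREADY?\r\n\xff\xd8\xff\xe0"
headers, payload = read_file_reply(io.BytesIO(wire))
print(headers)        # {'Length': '4', 'Type': 'image/jpeg'}
print(payload.hex())  # ffd8ffe0
```

Note that the payload is read by count, not by lines: once the text part has announced the length, the binary bytes are passed through untouched, which is exactly how HTTP carries the JPEG in the first answer.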
Another definition that I have sometimes heard is:
a text protocol is one that you can debug using telnet.
Maybe I am showing my nerdiness here, but I have actually written and read e-mails via SMTP and POP3, read Usenet articles via NNTP, and viewed web pages via HTTP using telnet, for no other reason than to see whether it would actually work.
Actually, while writing this, I kinda caught the fever again:
bash-4.0$ telnet smtp.googlemail.com 25
Trying 74.125.77.16...
Connected to googlemail-smtp.l.google.com.
Escape character is '^]'.
< 220 googlemail-smtp.l.google.com ESMTP Thu, 15 Apr 2010 19:19:39 +0200
> HELO
< 501 Syntactically invalid HELO argument(s)
> HELO client.example.com
< 250 googlemail-smtp.l.google.com Hello client.example.com [666.666.666.666]
> RCPT TO:Me <Me#Example.Com>
< 503 sender not yet given
> SENDER:Me <Me#Example.Com>
< 500 unrecognized command
> RCPT FROM:Me <Me#Example.Com>
< 500 unrecognized command
> FROM:Me <Me#Example.Com>
< 500-unrecognized command
> HELP
< 214-Commands supported:
< 214 AUTH HELO EHLO MAIL RCPT DATA NOOP QUIT RSET HELP ETRN
> MAIL FROM:Me <Me#Example.Com>
< 250 OK
> RCPT TO:You <You#SomewhereElse.Example.Com>
< 250 Accepted
> DATA
< 354 Enter message, ending with "." on a line by itself
> From: Me <Me#Example.Com>
> To: You <You#SomewhereElse.Example.Com>
> Subject: Testmail
>
> This is a test.
> .
< 250 OK id=1O2Sjq-0000c4-Qv
> QUIT
< 221 googlemail-smtp.l.google.com closing connection
Connection closed by foreign host.
Damn, it's been quite a while since I've done this. Quite a few errors in there :-)
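The DATA phase in the transcript shows how a text protocol frames a message: the body ends with a line containing only ".". Here is a minimal Python sketch of preparing a message for that phase, including the dot-stuffing rule from RFC 5321 (section 4.5.2) for lines that already begin with "."; smtp_data_payload is a hypothetical helper name, and the sketch assumes a plain 7-bit ASCII message.

```python
def smtp_data_payload(body: str) -> bytes:
    """Turn a message (headers, blank line, body) into the bytes sent after
    the DATA command: CRLF line endings, dot-stuffing, and the final "."."""
    text = body.replace("\r\n", "\n")
    if text.endswith("\n"):
        text = text[:-1]                  # avoid an extra empty line before "."
    lines = text.split("\n")
    # A line that already starts with "." gets one more, so the server does
    # not mistake it for the end-of-data marker (the server strips it again).
    stuffed = ["." + line if line.startswith(".") else line for line in lines]
    # Assumes 7-bit ASCII content; non-ASCII text would raise here.
    return ("\r\n".join(stuffed) + "\r\n.\r\n").encode("ascii")

print(smtp_data_payload("Subject: Testmail\n\nThis is a test.\n.hidden\n"))
# b'Subject: Testmail\r\n\r\nThis is a test.\r\n..hidden\r\n.\r\n'
```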
Examples of binary protocols: RTP, TCP, IP.
Examples of text protocols: SMTP, HTTP, SIP.
This should allow you to generalise to a reasonable definition of binary vs text protocols.
Hint: just skip to the example sections, or the diagrams. They serve to illustrate Tyler's rocking answer.
As most of you suggested, we can't tell whether a protocol is binary or text simply by looking at the content on the wire.
AFAIK:
Binary protocol: field boundaries are defined in bits
Order is very critical
E.g., RTP
First two bits are the version
Next bit is the padding bit (the marker bit is the first bit of the second byte)
Text protocol: delimiters specific to the protocol
Order of the header fields is generally not important
E.g., SIP
One more difference: in a binary protocol we can split a byte, i.e., a single bit might have its own individual meaning, while in a text protocol the minimum meaningful unit is the byte. You can't split a byte.
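To see what "bits are the boundaries" means in practice, here is a minimal Python sketch that decodes the 12-byte fixed RTP header defined in RFC 3550, section 5.1. The function name and the sample packet values are illustrative.

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header (RFC 3550, section 5.1).

    Fields are identified purely by bit position: the first byte alone
    carries four separate fields.
    """
    if len(packet) < 12:
        raise ValueError("the RTP fixed header is 12 bytes")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,            # first 2 bits
        "padding": (b0 >> 5) & 0x1,    # next bit
        "extension": (b0 >> 4) & 0x1,  # next bit
        "csrc_count": b0 & 0x0F,       # low 4 bits of byte 0
        "marker": b1 >> 7,             # first bit of byte 1
        "payload_type": b1 & 0x7F,     # remaining 7 bits of byte 1
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

# Sample packet: version 2, marker set, payload type 96, seq 1, ts 160.
header = bytes([0x80, 0xE0]) + struct.pack("!HII", 1, 160, 0x12345678)
h = parse_rtp_header(header)
print(h["version"], h["marker"], h["payload_type"])  # 2 1 96
```

Contrast this with SIP, where a receiver finds fields by looking for header names and line delimiters rather than by counting bits.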
They use different character sets: the text one uses a reduced character set, while the binary one uses every value it can, not only "letters" and "numbers" (that's why Wikipedia says "human being").
> To be more clear, if I have a jpg file, how would that be sent through a binary protocol and how through a text one? In terms of bits/bytes sent on the wire, of course.
You should read about Base64.
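Base64 is how binary data gets squeezed into a text-only channel: every 3 input bytes become 4 ASCII characters. A small Python sketch using the standard library's base64 module on the first 10 bytes of a typical JPEG/JFIF file (the start-of-image marker followed by the beginning of the APP0 segment):

```python
import base64
import math

# First 10 bytes of a typical JPEG/JFIF file: FF D8 (start of image),
# FF E0 (APP0 marker), 00 10 (segment length), then the ASCII bytes "JFIF".
jpeg_start = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10, 0x4A, 0x46, 0x49, 0x46])

encoded = base64.b64encode(jpeg_start)
print(encoded)  # b'/9j/4AAQSkZJRg=='

# Base64 output length is 4 * ceil(n / 3) characters, including padding.
assert len(encoded) == 4 * math.ceil(len(jpeg_start) / 3)  # 16 for 10 bytes

# Decoding restores the original bytes exactly.
assert base64.b64decode(encoded) == jpeg_start
```

This is why a base64-encoded JPEG starts with "/9j/", as the attachment in the public-records example further down does, and why base64 costs about a third more bytes on the wire than sending the raw file.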
> Any comments are appreciated; I am trying to get to the essence of things here.
I think the essence of narrowing the character set is narrowing the complexity, and achieving portability and compatibility. It's harder to get many parties to agree on and respect a wide character set (or a wide anything). The Latin/Roman alphabet and the Arabic numerals are known worldwide. (There are of course other reasons to reduce the code, but that's a main one.)
Let's say that in binary protocols the "contract" between the parties is about bits: the first bit means this, the second means that, etc., or even about bytes (but with the freedom to use the whole character set without worrying about portability), for example in a private closed system or in near-hardware standards. However, if you design an open system, you have to take into account how your codes will be represented in a wide set of situations; for example, how will they be represented on a machine on the other side of the world? So here come the text protocols, where the contract is as standard as possible. I have designed both, and those were the reasons: binary for very custom solutions, and text for open and/or portable systems.
How can we send an image file in SOAP? See the example linked here.
It shows that the binary data is attached as-is [ATTACHMENT] and only a reference to it is saved in the SOAP message.
So the protocol is text-based, and the data (the image) is a binary attachment whose encoding is not relevant.
Thus SOAP is a text protocol because of the way SOAP headers are specified, not because of the actual data carried in it.
I think you got it wrong.
It's not the protocol that determines how the data looks on the "wire"; rather, it's the data type that determines which protocol to use to transmit it.
Take a TCP socket, for instance: a JPEG file will be sent and received with a binary protocol because it's binary data (not human-readable: it contains bytes outside the printable ASCII range 32-126), but you can send/receive a text file with both protocols and you wouldn't notice the difference.
A text protocol can be self-explanatory and extensible.
It's self-explanatory because the field names are included in the message itself. With a binary protocol, you cannot tell what a value in the message means without referring to the protocol specification.
Being extensible means that HTTP, as a text protocol, defines only simple rules, but you can extend the data structure by freely adding new headers, or by changing the content type to transport different payloads. The headers carry the metadata, and they support negotiation and automatic adaptation.
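A minimal Python sketch of the extensibility point, using the standard library's http.client.parse_headers: the header names travel with the values, so a client written before a new header existed keeps working. The X-Example-Extension header and the old_client function are hypothetical.

```python
import io
from http.client import parse_headers

# A response header block that includes a newer, hypothetical header.
raw = (b"Content-Type: image/jpeg\r\n"
       b"Content-Length: 60830\r\n"
       b"X-Example-Extension: added-later\r\n"
       b"\r\n")
headers = parse_headers(io.BytesIO(raw))

def old_client(msg):
    """Written before X-Example-Extension existed: it looks fields up by
    name, so the extra header is simply ignored."""
    return msg["Content-Type"], int(msg["Content-Length"])

print(old_client(headers))             # ('image/jpeg', 60830)
print(headers["X-Example-Extension"])  # added-later
```

With a packed binary struct, by contrast, inserting a field shifts the offset of every field after it, so an old decoder would misread the message unless the format carried a version number.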
I'm using Node.js and iconv-lite to create an HTTP response file in XML with charset windows-1252, but the file -i command cannot identify it as windows-1252.
Server side:
r.header('Content-Disposition', 'attachment; filename=teste.xml');
r.header('Content-Type', 'text/xml; charset=iso8859-1');
r.write(ICONVLITE.encode(`<?xml version="1.0" encoding="windows-1252"?><x>€Àáção</x>`, "win1252")); // euro symbol and Portuguese accented vowels
r.end();
The browser downloads the file, and then I check it on Ubuntu 20.04 LTS:
file -i teste.xml
/tmp/teste.xml: text/xml; charset=unknown-8bit
When I open it with gedit, the accented vowels appear fine, but the euro symbol does not (all characters from 128 to 159 get messed up).
I checked in a Windows 10 VM and everything works there. In both the Windows and Linux web browsers, it also displays fine.
So, is it a problem with the file command? How can I check the correct charset of a file in Linux?
Thank you
EDIT
The resulting file can be downloaded here.
2nd EDIT
I found one error! The code line:
r.header('Content-Type', 'text/xml; charset=iso8859-1');
must be:
r.header('Content-Type', 'text/xml; charset=Windows-1252');
It's important to understand what a character encoding is and isn't.
A text file is actually just a stream of bits; or, since we've mostly agreed that there are 8 bits in a byte, a stream of bytes. A character encoding is a lookup table (and sometimes a more complicated algorithm) for deciding what characters to show to a human for that stream of bytes.
For instance, the character "€" encoded in Windows-1252 is the string of bits 10000000. That same string of bits will mean other things in other encodings - most encodings assign some meaning to all 256 possible bytes.
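A small Python check of that claim: the same single byte 10000000 (0x80) decoded with three different lookup tables. The choice of cp437 (the original IBM PC code page) as the third encoding is just an illustrative pick.

```python
raw = bytes([0b10000000])  # the single byte 0x80

print(repr(raw.decode("cp1252")))   # '€'
print(repr(raw.decode("latin-1")))  # '\x80' (an invisible C1 control character)
print(repr(raw.decode("cp437")))    # 'Ç'

# Going the other way: "€" only becomes the byte 0x80 under Windows-1252.
print("€".encode("cp1252"))  # b'\x80'
print("€".encode("utf-8"))   # b'\xe2\x82\xac'
```

This also shows why the charset=iso8859-1 label in the question's first version did not describe this file: in ISO-8859-1, the byte 0x80 is not the euro sign.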
If a piece of software knows that the file is supposed to be read as Windows-1252, it can look up a mapping for that encoding and show you a "€". This is how browsers are displaying the right thing: you've told them in the Content-Type header to use the Windows-1252 lookup table.
Once you save the file to disk, that "Windows-1252" label from the Content-Type header isn't stored anywhere. So any program looking at that file can see that it contains the string of bits 10000000, but it doesn't know which mapping table to look that up in. Nothing you do in the HTTP headers is going to change that; none of them affect how the file is saved on disk.
In this particular case the "file" command could look at the "encoding" marker inside the XML document, and find the "windows-1252" there. My guess is that it simply doesn't have that functionality. So instead it uses its general logic for guessing an encoding: it's probably something ASCII-compatible, because it starts with the bytes that spell <?xml in ASCII; but it's not ASCII itself, because it has bytes outside the range 00000000 to 01111111; anything beyond that is hard to guess, so output "unknown-8bit".
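A rough Python imitation of the guessing logic described above, plus the extra step of reading the XML declaration that file apparently skips. This illustrates the reasoning only; it is not file's actual algorithm, both helper names are hypothetical, and the regular expression handles only the simple declaration form used in the question.

```python
import re

def guess_charset(data: bytes) -> str:
    """Rough imitation of the guessing described above (not file's real code)."""
    if all(b < 0x80 for b in data):     # only bytes 00000000..01111111
        return "us-ascii"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown-8bit"           # 8-bit, but no way to tell which table

XML_DECL = re.compile(rb"""<\?xml[^>]*\bencoding=["']([A-Za-z0-9._-]+)["']""")

def xml_declared_encoding(data: bytes):
    """Return the encoding named in a leading XML declaration, or None."""
    m = XML_DECL.match(data)
    return m.group(1).decode("ascii") if m else None

text = '<?xml version="1.0" encoding="windows-1252"?><x>€Àáção</x>'
doc = text.encode("cp1252")
print(guess_charset(doc))           # unknown-8bit
declared = xml_declared_encoding(doc)
print(declared)                     # windows-1252
print(doc.decode(declared))         # the original text, euro sign included
```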
In response to a public records request to a city government, I received a number of records in .txt format: e-mails with attachments that appear to be base64. The e-mail attachments are JPEGs, PDFs, PNGs, or DOCs in base64; a shorter example is below. The government official claims: "The records we released to you are in the form that was available, including the file you noted. From what I have been told, it is likely garbled computer coding. We have no other version of those records."
Questions:
Does someone have to intentionally work at saving the e-mail and attachments in this way so that they are unreadable, thus making the information not public (hiding it)?
Or is this something that can plausibly happen when saving, producing "garbled computer coding"?
If it is plausible that a computer does it, how?
Is there a way of decoding it?
I have tried a number of online decoding with various settings and have been unsuccessful.
I have made a number of public records requests to this city and department in the past and have never received such .txt documents. This public records request concerns a city contract that is problematic.
From: "Steinberg, David (DPW)" <david.steinberg#sfdpw.org>
To: "Goldberg, Jonathan (DPW)" <jonathan.goldberg#sfdpw.org>
Sent: Fri, 17 May 2019 20:40:36 +0000
Subject: Re: SOTF - Education, Outreach and Training Committee, May 21, 2019 hearing
----boundary-LibPST-iamunique-566105023_-_-
Content-Type: image/jpeg
Content-Transfer-Encoding: base64
Content-ID: <image002.jpg#01D50CB4.2B093D10>
Content-Disposition: attachment;
filename*=utf-8''image002.jpg;
filename="image002.jpg"
/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAoHBwkHBgoJCAkLCwoMDxkQDw4ODx4WFxIZJCAmJSMg
IyIoLTkwKCo2KyIjMkQyNjs9QEBAJjBGS0U+Sjk/QD3/2wBDAQsLCw8NDx0QEB09KSMpPT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT3/wAARCABJAFEDASIA
AhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQA
AAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3
ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWm
p6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEA
AwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSEx
BhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElK
U1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3
uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwDKooqa
FY47qIXiSiHcPMVRhtuecZ716ZgRojSOqIpZ2OAo6k0+G2muBIYYZJBEu+Taudi9yfSum1bwlDa6
ENa0q7mlh4lVXGGVD0ORzkcVf8ExwaxHqPnb4bsxeVLLE20SI38RHTcMHnvWTqrl5kUo62OO07T5
9VvY7S0VWmkztDNgcDJ5qB0aN2RhhlJUj3Fdp4N0LT/7YZ5bx/t9nK4FtwOBwGz3GCK5/WmhsfFl
20FvE0UNxkQtyjY6g+xOaaqXk0hWsjJorutb0XS9R8OzaxZReVdpGssiQjCklQeVPbBzkVwtVCak
gasFFKFJViMHb1Hf649KSrEFFFFAE9jMba/t5gMlJFOAAc8+/FaPiO7a5uLdXl8xUQhcyb3UFj8p
OAcgY45rKiDtKgiBMhYbQPWtb+3tRe9ubh4rWedwryu8CvgLjGOwA9qzktboaOsuNXNtbaroscAE
NtpweBm5aQbRkkdB16exrN+H979lt9UXaD8isDjvhv8ACuZe7vvtD30hcvMjAu3IKNlSOe3OPal0
++vNMhmmtSqpIuxmbB6EHgHr1/I1n7P3Wu4+bU6HwPe/a/FiMyhMWhjRe+Bjqe5965vUZHutYu3d
gXedyT0HU80+zm1DSLuNrUtFM5CqBg7j6fqOPzqNGujb3ESR5EkgEz7fmzydpPYZBOPUVajaV0K+
h1eka2n9iw2RCRpeySWZduvESqhPtkj6Zrm9F0oahrFvaXLNDC0pieQDgMATtB6ZOMfjUZurloLP
zI4mggfMSbVXcSRnOOTnHWrdtr2raf5ksMixxGd3aMouwuSM8dTjj6UlFxvbqO99zvNYtLPQ9I8j
SbexivJQY4VlUbpePmAJ6nHr1OBXljI0Z2OrKw4IYYIrS1jUNQ1q+SS+SPziuFVFAwPfnPvzVCcy
mU/aGYyYHLNk4xxz6YopQcVqEncjooorYkltYpZ7uKK3jMkzMAiDua0prTV7RZ7p7UKhA81l2uAM
Y5wScc8n9aj8Nf8AIy6d/wBdhT7ObTtJeWe2uJriZoniWPyPLX5hjLHPIGelZSbvZDRTD3Taa+Fz
aqyxlsfdbkgfU/0pmLhtObAb7Ksm1iDxvI6H8BVyAf8AFI3mOQt7CT7DYwoi48J3BPRr6MA+uI2z
/Onf8wJY7PWLwQ3aQbwQxicsgyDxwCarJJfx6k8AiK3ckmxomTBLEYxj8fz5q5fLpp07R/t5ut/2
T/liExt8x/XvVmYs/jTTJsgxTNbvCec+XwF3Z/i45qFIdjAeR1Cxui74TtDEfMMHp6dalZrk2Ekp
j/0eaUqZP9v7xA54/L8ahu2H2245H+ufv/tGtF/+RPj/AOv9v/RYrTohDLW0v9QY3VvbIVyVeRiE
RyRyCWIBP0qrfQ3FvcmK6h8mRVACY4C44x7e9XfEHXT1X/j1FnGYB26fP+O7Oays5A/SiOuoMKKK
KsRZ037UNSt/sH/H1vHldPvfjxSrpl42pf2etuxu9xTysjOev06VBDM9tPHPEcSRMHU+4Oa9EuYo
7bU7jxVGoMJ04Sx+hlPAH5Y/Ospz5WUlc4rT49UtNVeys4z9rcmKSBgrB8ckEHg9M1M1rrWu3TWq
WxkNqShiiVUjiPfp8ueK68Qpbajd+Kto8l9PWWMjp5jDBH6D86xNRnmtfh9phtHZEuZWa6kQ4LOS
eCfr/IVCnd6IdjPvX1vQoIIL62SOKNSsTSwRyDGScBsH1JxUo0HxNcXceo/ZJZJsrIkhdO33cDPT
2qzo80t54H16O9dpLWFA0LOc7XxnAJ98fnWvrmnxajcaLE2s/YJntkVIwrZc8cgggfnSc7OwWuc4
JtdvbybT/skRuirGSM2sSNjuckD161QtLbUL+xlt7aMyW1sxnk5UBDjGSx9h0ruLW8W9+IrRKrqb
Wza3Z5B8zkEfN+tZus29uPB8kOg3HmW1pcEX2BzKf7xPcA/h+VCnrawWMjRLXXruyK6dai4tAxIW
ZEZA3fbu7/Ss/V1v0vyuqIyXAUDaygYXtgDjH0rcsta0q98PWmlandXWnvbE7JoSdjdeTj61meJd
KuNJ1JEuLo3ayxh4pyTll6DP0q4v3tRPYyKKKK2JCtSTxDeyeH00dtn2ZCCDg7sA5xnPTNZdFJpP
cDUfxDeyeH00djH9mQ5BAO4jOcE56Zo0nxFeaRFJBGIp7WQ5eCddyE+vtWXRS5I2tYd2a2reI7zV
rVbRkhtrRTkQW6bVJ9T61DqWt3WqS2kkwRHtUCRmMEcA5BPPXis+ihQS6Cuzb/4Su9GtnVRDbC5M
XlNhTtYepGetVNK1q50h7gwLHIlyhSWOUEqw+mfc/nWfRS5I7WHdm5YeKrixso7RrKxuIYs+WJos
lec9aoarq11rV59pvGUuF2qqjCovoBVKimoRTuhXYUUUVQH/2Q==
----boundary-LibPST-iamunique-566105023_-_---
I got the image by copying your base64 data to a file and running
base64 -d file > image.jpeg
on my Debian Linux machine.
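The same extraction can be scripted. A minimal Python sketch, assuming the dump contains a single base64-encoded part (as in the example above) and that header and boundary lines contain characters such as ':', spaces, quotes, or '-' that are not in the base64 alphabet; decode_attachment is a hypothetical helper name.

```python
import base64
import re

# A line made only of base64 characters, with optional '=' padding at the end.
B64_LINE = re.compile(r"[A-Za-z0-9+/]+={0,2}")

def decode_attachment(dump: str) -> bytes:
    """Decode the single base64 part contained in a pasted e-mail dump.

    Header lines and boundary lines never fully match B64_LINE, so only
    the encoded body lines are kept and joined.
    """
    b64 = "".join(line.strip() for line in dump.splitlines()
                  if B64_LINE.fullmatch(line.strip()))
    return base64.b64decode(b64)

sample = """Content-Type: image/jpeg
Content-Transfer-Encoding: base64
/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAoHBwkHBgoJCAkLCwoMDxkQDw4ODx4WFxIZJCAmJSMg
----boundary-LibPST-iamunique-566105023_-_---"""

data = decode_attachment(sample)
print(data[:4].hex())  # ffd8ffe0: JPEG start-of-image and APP0 markers
```

To process the real file, read the whole .txt dump into a string, pass it to decode_attachment, and write the returned bytes to a file opened in binary mode; the result is the same as with base64 -d. A dump with several attachments would need to be split at the boundary lines first.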
RFC 2045, section 6, says that binary data are to be encoded in US-ASCII characters (for example with base64), so it's quite normal for images to be base64-encoded, although a GUI email reader would not show you the raw data.
When you see such raw data, it may mean that somebody blindly copied and pasted the raw source of other emails, though that is fairly unusual. (Your example is obviously part of a multipart message, but not a complete one.)
Online decoding services are also available, for example here.
After fetchmail fetches mail, the new messages are stored in a file like /var/mail/user. We can open the file user with a text editor like vim.
How can I create such text-based email files? Say, I want to send an email with contents:
From: sender <sender#xx.com>
To: receiver <receiver#xx.com>
Subject: test subject
Contents: ...
Attached: file1.txt, file2.png, file3.pdf
The problem is how to turn this into a properly formatted text-based email.
Also, if I have such an email file, how can I extract its parts (say, the subject, contents, attached files, etc.) with command-line tools? I know I can open it with a program like mutt. Can this be done using a command-line utility?
There is a bunch of standards you need to understand, but email is fundamentally text.
The file format in /var/spool/mail or /var/mail/user etc. is typically Berkeley mbox. This is not formally defined anywhere, but consists of a sequence of RFC5322 (née RFC822) email messages, each preceded by a From_ line, the format of which is basically From %s  %C where %s is the sender's email address (what you also see in Return-Path:) and %C is the date when the message arrived. Notice the two spaces between the format strings!
The top-level email message is RFC5322, but on top of that, you need to understand MIME.
mbox: RFC 4155
Email message format: RFC5322
MIME: RFC2045 RFC2046 RFC2047 RFC2048
You will also stumble over (E)SMTP RFC5321 which is only tangential to your question, but good to know. Notice how 821 and 822 (and later 2821 and 2822, and now 5321 and 5322) have adjacent RFC numbers.
Furthermore, there is a wild, wild West of non-standard headers, some of which are nonetheless significant. Dan Bernstein's reference http://cr.yp.to/immhf.html is a lifesaver. As a general guideline, what spammers typically do is copy/paste headers without understanding them; therefore, an essential practice for deliverability is "don't do that". In other words, if you don't know what a header is for, don't use it.
Any modern programming language will come with libraries to create and manipulate RFC5322 and MIME, and probably mbox too. For creating a message you can send somewhere, you don't need mbox anyway, just something along the lines of (pseudocode):
message = new MIME({'Subject': 'hello', 'From': 'me#example.net',
'To': 'My Friend <you#example.com>'});
message.addbodypart('text/plain', 'Hi Fred.\nHow are you?');
message.addbodypart('image/png', {'file': '/home/you/logo.png'});
smtp = new SMTP('mail.example.net', 587, {'user': 'me', 'pass': 'xyzzy'});
smtp.send(message);
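As one concrete instance of that pseudocode, here is a sketch using Python's standard email package (EmailMessage). It only builds the message; the sending step is left as comments because it needs a real server, and the host, port, and credentials there are the placeholders from the pseudocode. The PNG bytes are a stand-in rather than a real logo file.

```python
from email.message import EmailMessage

message = EmailMessage()
message["Subject"] = "hello"
message["From"] = "me@example.net"
message["To"] = "My Friend <you@example.com>"
message.set_content("Hi Fred.\nHow are you?")   # text/plain body part

# Stand-in bytes instead of reading /home/you/logo.png from disk.
png_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
message.add_attachment(png_bytes, maintype="image", subtype="png",
                       filename="logo.png")     # becomes a base64 body part

print(message.as_string())  # the resulting multipart/mixed MIME text

# Sending (not run here; placeholders from the pseudocode above):
# import smtplib
# with smtplib.SMTP("mail.example.net", 587) as smtp:
#     smtp.starttls()
#     smtp.login("me", "xyzzy")
#     smtp.send_message(message)
```

Printing the message shows essentially the same structure as the hand-written MIME example that follows: a multipart/mixed container with a text/plain part and a base64-encoded image/png part.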
A multipart message looks something like what you describe in your question, except there is no specific header to identify "attachments", and conceptually there are actually no "attachments", just "body parts". Here is a simple MIME message showing what the message in your question would properly look like.
From: sender <sender#example.com>
To: receiver <receiver#example.com>
Subject: test subject
MIME-Version: 1.0
Content-type: multipart/mixed; boundary="so_long_eFlop"

This is a MIME multipart message. Nobody actually sees what it says here.

--so_long_eFlop
Content-type: text/plain; charset="utf-8"
Content-disposition: inline
Content-transfer-encoding: 7bit

Many mail clients will display this as the "main part" but MIME does not
define any particular hierarchy. Many mail clients will generate a
text/plain rendering and a text/html rendering of the message you type in,
and the recipient's mail client will decide -- based on user preferences
-- which one to display. Anyway, I will not create an example of that
here. This is just "a text message with a picture attached", or, more
precisely, a MIME message with two body parts.

Oh, the content-disposition: inline is usually just implied for a
text/plain part. Some clients will override or ignore the disposition
set by the sender anyway.

--so_long_eFlop
Content-type: image/png
Content-disposition: attachment
Content-transfer-encoding: base64

Iam+not/attaching+a/real00picture+here/just/a/bunch0of/binary/goo===
--so_long_eFlop--
The file format is called "mbox". There's a good article on Wikipedia (http://en.wikipedia.org/wiki/Mbox), as well as all over the Internet. Like RFC 4155. :)
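For the extraction half of the question, Python's standard mailbox module reads mbox files directly. A minimal sketch, assuming a local mbox file such as /var/mail/user; summarize_mbox is a hypothetical helper, and the demo writes a tiny made-up mbox into a temporary directory instead of touching a real mail spool.

```python
import mailbox
import os
import tempfile

def summarize_mbox(path):
    """Return [(subject, [(filename, content_type, size_in_bytes), ...]), ...]
    for the messages in the mbox file at `path`."""
    box = mailbox.mbox(path, create=False)
    try:
        result = []
        for msg in box:
            attachments = []
            for part in msg.walk():
                if part.get_content_disposition() == "attachment":
                    data = part.get_payload(decode=True) or b""  # undoes base64/QP
                    attachments.append(
                        (part.get_filename(), part.get_content_type(), len(data)))
            result.append((msg["Subject"], attachments))
        return result
    finally:
        box.close()

# A tiny made-up mbox: one From_ line, one message with one attachment.
SAMPLE = (
    "From sender@example.com Thu Apr 15 19:19:39 2010\n"
    "From: sender <sender@example.com>\n"
    "Subject: test subject\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hello\n"
    "--XYZ\n"
    "Content-Type: application/octet-stream\n"
    'Content-Disposition: attachment; filename="file1.bin"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "AAECAw==\n"
    "--XYZ--\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "user")
    with open(path, "w") as f:
        f.write(SAMPLE)
    results = summarize_mbox(path)

print(results)
# [('test subject', [('file1.bin', 'application/octet-stream', 4)])]
```

Saving an attachment instead of measuring it is a matter of writing the decoded bytes to a file opened in binary mode.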
telnet your.mail.server 25
helo localhost.localdomain
mail from:<sender#address.com>
rcpt to:<recipient#address.com>
data
From:Me
Subject:This is an email via Telnet
Hi,
The first line connects to the server on port 25. Replace "your.mail.server" with the name or address of the MX server for the domain.
Most servers expect the second "HELO" line to begin the session. I have seen servers that don't care, but in general they should throw an error.
You must have a "MAIL FROM:" line with the address you expect a reply to come to.
The mail is going nowhere if you don't specify the "RCPT TO:" address.
The message body begins after the "DATA" line. This will usually be met with instructions on how to end the message: a single "." on a line by itself.
The "From:" and "Subject:" headers above are optional. You can add any additional headers here.
.
quit
I need to write a program that reads the .eml files from IIS's mail drop box, but I can't find a definitive source that tells me the encoding of the .eml files. Is there a specification somewhere that tells me the encoding of the files, or do I just have to guess/assume one?
You need to read the Content-Transfer-Encoding header. This value will tell you how the email is encoded. The most common are 7bit (no encoding), quoted-printable (where you see a lot of =HEX pairs), and base64.
Based upon that header value, you decode the following body part using the specified routine.
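A minimal Python sketch of that dispatch, using the standard library's base64 and quopri modules; decode_body is a hypothetical name. In practice, Python's email package already does this for you through part.get_payload(decode=True), so the sketch mainly shows what that call does.

```python
import base64
import quopri

def decode_body(body: bytes, cte: str) -> bytes:
    """Undo the Content-Transfer-Encoding named by `cte`."""
    cte = cte.strip().lower()
    if cte == "base64":
        return base64.b64decode(body)
    if cte == "quoted-printable":
        return quopri.decodestring(body)     # turns =C3=A9 into the byte pair
    if cte in ("7bit", "8bit", "binary", ""):
        return body                          # identity encodings
    raise ValueError(f"unknown Content-Transfer-Encoding: {cte}")

print(decode_body(b"caf=C3=A9", "Quoted-Printable").decode("utf-8"))  # café
print(decode_body(b"aGVsbG8=", "base64"))                             # b'hello'
```

Note that the result is still bytes: turning it into text additionally requires the charset parameter from the part's Content-Type header.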
I found my answer at en.wikipedia.org/wiki/MIME: "The basic Internet e-mail transmission protocol, SMTP, supports only 7-bit ASCII characters... "
It's late to answer, but the .eml file format is nothing but a plain-text MIME (RFC 822) format for storing emails.
I want to send data in hex format (0x01 or 0xff) to the port where the RS-232 cable is connected. I am using the MSCOMM control in VC++ 6.0. Is this the correct way to send the hex data? This is the code I have:
CString output;
UCHAR data1,data2;
data1=0x01;
output.Format("%x",data1);
m_mscom.SetOutput(COleVariant(output));
data2=0xff;
output.Format("%x",data2);
m_mscom.SetOutput(COleVariant(output));
If it is not correct, please tell me how to send hex-format data to the port.
Thanks in advance.
If your data is simply a hex value that you want to send out (as opposed to a hex-formatted string), you probably want "%c" rather than "%x". Given a data value of (e.g.) 0x65, "%x" will produce two bytes on the serial port: '6' (ASCII value 54) and '5' (ASCII value 53). "%c" will produce a single byte on the serial port: 'e' (ASCII value 101, or 0x65).
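The difference is easy to see by comparing the bytes each approach produces. A Python sketch with hypothetical helper names; as_hex_text mimics what formatting with "%x" puts on the wire, and as_raw_byte is the single byte a device usually expects (with pyserial, for instance, bytes like these would be passed to Serial.write).

```python
def as_hex_text(value: int) -> bytes:
    """Bytes produced by formatting with "%x": ASCII hex digits."""
    return ("%x" % value).encode("ascii")

def as_raw_byte(value: int) -> bytes:
    """The single byte with that value."""
    return bytes([value])

for v in (0x01, 0x65, 0xFF):
    print(hex(v), as_hex_text(v), as_raw_byte(v))
# 0x1 b'1' b'\x01'
# 0x65 b'65' b'e'
# 0xff b'ff' b'\xff'
```

Note that "%x" for 0x01 yields just the character '1', so the device would receive the byte 0x31, and for 0xff it receives two bytes, 0x66 0x66.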
As far as sending data on a serial port in C, have a look at CSerial or CSerialPort: they may simplify things for you a bit (note that I've not used them, I tend to do serial port apps in python with the pyserial module or in wxWidgets with the ctb library).
Edit: The other one that's quite good (I have used it before, but couldn't find the link when I wrote the original post) is CSerialFile, part of the WFC libraries. As I said, I tend to use wxWidgets now, but if you're using the Microsoft Foundation Classes, CSerialFile does make serial access very easy.
I'm not familiar with MSCOMM, but it seems like this won't work: Format will reformat the data as an ASCII string representation instead.
Alternatively, you can just use any serial port as a 'file' in Windows. Look at the Windows API for opening files and you will see that you can address certain devices as files by using a filename like 'COM1:' instead.
Then, you can read/write from it like a file.