Short Version
A Content-ID header on an attachment must be of the form local-part "@" domain. Gmail's Content-IDs do not have an @ in them. Is this a real bug, or am I misreading the spec?
Long Version
I noticed this problem when I tried to resend an email sent from Gmail that has an inline image attached. My mailer (SwiftMailer) claimed that the Content-ID was invalid.
Here's the email I'm working with. I created this by inserting an image inline in Gmail and emailing it to myself.
Here are the relevant parts of the spec (as far as I can tell):
RFC 2045
Content-ID Header Field
In constructing a high-level user agent, it may be desirable to allow
one body to make reference to another. Accordingly, bodies may be
labelled using the "Content-ID" header field, which is syntactically
identical to the "Message-ID" header field:
id := "Content-ID" ":" msg-id
RFC 822
msg-id = "<" addr-spec ">" ; Unique message id
addr-spec = local-part "@" domain ; global address
What am I missing here? Is Gmail not following the spec, or is it OK not to have an @ in the Content-ID?
Seeing as no one's posted a better answer…
My interpretation of the RFCs fits with yours. I'd say Gmail's doing the wrong thing here, by the book. However, what Gmail does is de facto valid by definition. Gmail is too popular for other software not to accept however it does things, which opens the door for more software to violate the spec in the same way until it's standard practice.
Unfortunately that means there's currently no exact spec that matches reality. Luckily this question comes up in Google results now.
The original email in the question is gone, so here's another example. This is just the encoded image portion of the multipart message. Note the Content-ID header.
--089e0153807e5a346d04f1ae7c38
Content-Type: image/gif; name="blank.gif"
Content-Transfer-Encoding: base64
Content-ID: <ii_14403b4fa16783bf>
X-Attachment-Id: ii_14403b4fa16783bf
R0lGODlhAQABAIAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==
--089e0153807e5a346d04f1ae7c38--
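The mismatch described above can be checked mechanically. The following is a minimal sketch, assuming a deliberately loose pattern for the RFC 822 shape "<" local-part "@" domain ">" rather than the full addr-spec grammar; the function name and the pattern are illustrative, not part of any library.

```python
import re
from email import message_from_string

# Loose approximation of "<" local-part "@" domain ">".
# Assumption: this is NOT the full RFC 822 addr-spec grammar.
MSG_ID_SHAPE = re.compile(r"^<[^<>@\s]+@[^<>@\s]+>$")


def content_id_has_addr_spec_shape(part_text):
    """Return True if the part's Content-ID looks like <local-part@domain>."""
    part = message_from_string(part_text)
    content_id = part.get("Content-ID", "")
    return bool(MSG_ID_SHAPE.match(content_id.strip()))


# The Gmail-generated part from the example above (headers, blank line, body).
gmail_part = (
    'Content-Type: image/gif; name="blank.gif"\n'
    "Content-Transfer-Encoding: base64\n"
    "Content-ID: <ii_14403b4fa16783bf>\n"
    "X-Attachment-Id: ii_14403b4fa16783bf\n"
    "\n"
    "R0lGODlhAQABAIAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==\n"
)

print(content_id_has_addr_spec_shape(gmail_part))  # False: no "@" in the id
```

A strict mailer library would reject this part, while a lenient one would accept any bracketed token; which behavior is preferable depends on how much Gmail-originated mail has to be handled.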
Related
I am sending an email to new users asking them to verify.
This is my email
But when I test the email on mail-tester.com, I get this HTML_IMAGE_ONLY_20 message. I do not have any image with text in it, or am I misunderstanding the error?
I assume that’s a SpamAssassin rule. It is a rule that says that your message is composed primarily of image and not text. In your example you have a logo and not much text, so I’d assume it’s that. You could look up the source code for the SA rule and see what it’s actually looking for.
I know how to use python3's email.mime.multipart.MIMEMultipart class to create various kinds of multipart email messages.
I have a need to do "email surgery" on certain existing multipart messages by taking the body (which contains the various MIME parts) and to remove and add some new parts without changing the rest of the email's headers. In the headers, there is the following sample header:
Content-Type: multipart/mixed;
boundary="ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
... where the ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ boundary is used to separate the existing message parts.
I want to use the email.mime.multipart.MIMEMultipart tools to create a new body with modified parts, and I want to use the same ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ boundary between these parts, so that I don't have to go back and change the Content-Type header that already exists among the message's original headers.
It would be ideal if I could specify the exact text of the boundary that I want to use when creating this multipart body, but in python3, it seems like the email.mime.multipart.MIMEMultipart package always creates its own, randomly generated boundary string.
In python3, is there any way to tell the email.mime.multipart.MIMEMultipart software to use a boundary string that I supply?
If not, I can work around this by taking the newly generated message body, extracting its new boundary string (which I'll call "newboundary"), and then replacing it with the original boundary (which I'll call "originalboundary"), as follows:
msgbodytext = msgbodytext.replace(newboundary, originalboundary)
But I prefer not to do this, if possible.
Any ideas about how to tell email.mime.multipart.MIMEMultipart to use a boundary string that I specify? Thank you.
I originally misread the docs, and now that I re-read them, I see that this is trivially easy.
To set my own boundary to the value of the originalboundary variable, I just have to do this:
msg = email.mime.multipart.MIMEMultipart(boundary=originalboundary)
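For illustration, here is a slightly fuller sketch of the same call. The boundary value and the attached text part are placeholders; only the `boundary` keyword argument and `get_boundary()` are taken from the standard library.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Placeholder boundary, standing in for the one in the original headers.
originalboundary = "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"

# Build the replacement body with the caller-supplied boundary.
msg = MIMEMultipart("mixed", boundary=originalboundary)
msg.attach(MIMEText("replacement part", "plain"))

# The supplied boundary is used both in the Content-Type header
# and as the separator between the serialized parts.
print(msg.get_boundary())
body_text = msg.as_string()
print(("--" + originalboundary) in body_text)
```

Note that `as_string()` also emits the new message's own headers, so the part separators still have to be cut out of it before splicing them under the original headers.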
My app is downloading emails from a POP3 server, but it receives "strange" subject lines. I can see there is an encoding pattern in there, but what exactly is that encoding? What do I google for if I want to handle the parsing of lines like this?
=?utf-8?Q?HMS=20will=20no=20longer=20provide=20Netbiter=20SIM=2Dcards?=
The eMail header is:
Subject: =?utf-8?Q?HMS=20will=20no=20longer=20provide=20Netbiter=20SIM=2Dcards?=
So I know it's coded by the sender. Also this is just one of many header lines, which all seem to be valid. The subject is obviously HMS will no longer provide Netbiter SIM-cards after you replace "=20" with spaces and all the other stuff. The eMail is just a notification from a supplier.
Found it myself: RFC 2047 (MIME encoded-words in message headers).
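As a minimal sketch of decoding such a header in Python, the standard library's `email.header` module handles RFC 2047 encoded-words. The variable names are illustrative; the subject string is the one from the question.

```python
from email.header import decode_header, make_header

raw_subject = (
    "=?utf-8?Q?HMS=20will=20no=20longer=20provide=20Netbiter=20SIM=2Dcards?="
)

# decode_header splits the value into (bytes, charset) chunks;
# make_header reassembles them into a Header that str() renders as text.
subject = str(make_header(decode_header(raw_subject)))
print(subject)  # HMS will no longer provide Netbiter SIM-cards
```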
I don't remember exactly, but I have experienced an issue like this before. If the subject is longer than a certain number of characters (75, I think), the subject will not be parsed properly.
This used to be a problem in CodeIgniter (I don't use it now, so I don't know the current status of this issue).
To resolve this issue you might have to change your email config.
There might be other reasons for this issue, but I think it happens because your subject has too many characters.
Recently I started deciphering Hotmail inbox/junk folder placement. After a long search I found out that Hotmail/Outlook uses the X-Message-Delivery tag it inserts into each mail to determine a lot of things. If you view the source of an e-mail you can see an X-Message-Delivery value similar to this:
Vj0xLjE7dXM9MDtsPTE7YT0wO0Q9MjtHRD0yO1NDTD00
which, when Base64-decoded, gives
V=1.1;us=0;l=1;a=0;D=2;GD=2;SCL=4
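The decoding step can be reproduced with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and the key=value;key=value layout is assumed from the single sample above, not from any Microsoft documentation.

```python
import base64


def parse_x_message_delivery(value):
    """Base64-decode the header value and split it into key/value pairs.

    Assumption: the decoded text is ASCII in the form "k=v;k=v;...".
    """
    decoded = base64.b64decode(value).decode("ascii")
    return dict(pair.split("=", 1) for pair in decoded.split(";") if pair)


fields = parse_x_message_delivery("Vj0xLjE7dXM9MDtsPTE7YT0wO0Q9MjtHRD0yO1NDTD00")
print(fields)
# {'V': '1.1', 'us': '0', 'l': '1', 'a': '0', 'D': '2', 'GD': '2', 'SCL': '4'}
```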
Based on my research, I found that the following things cause a message to go to the junk folder:
If "D=x" is higher than 1
If "GD=x" is higher than 1
If "SCL=x" is higher than 0
Other than that, I found that V=1.1 is always the same and does not change in any e-mail I received (or at least the ones I checked).
I think "a=x" indicates whether the sender is in the receiver's address book, or is at least known to the receiver. If it is 0, it is a new sender and therefore more likely to go to the junk folder; a=1 helps the message get into the inbox.
I think l=1 or l=0 has something to do with whether images are blocked for this sender when the message is opened, although I did not find this very consistent.
Do any of you know what these tags stand for?
D=
GD=
us=
If you do, do you also know what would change them from 0 to 1 or 2?
Thank u!
SCL refers to Spam Confidence Level (SCL)
SCL ratings of 2, 3, 4, 7, and 8 are not used by the service. An SCL
rating of 5 or 6 is considered suspected spam, which is less certain
to be spam than an SCL rating of 9, which is considered certain spam.
Different actions for spam and high confidence spam can be configured
via your content filter policies in the Exchange admin center. For
more information, see Configure Content Filter Policies. You can also
set the SCL rating for messages that match specific conditions by
using Transport rules, as described in Create a transport rule to
identify mail as spam or not spam by setting the spam confidence level
(SCL).
http://technet.microsoft.com/en-us/library/dn798345(v=exchg.150).aspx
http://technet.microsoft.com/en-us/library/jj200686(v=exchg.150).aspx
GD: after an exhaustive search, I believe it refers to GD2, the graphics library associated with PHP, but I have no references to substantiate that. Alternatively, I found an obscure mention of a term referred to as 'guarantee delivery', but it was a very subtle mention and I could not find any further reference to it. It does possibly make sense, however.
Email senders that have given Hotmail a reach around and gotten special treatment as being flagged 'not spam no matter what.. guarantee delivery'. Seems reasonable..
The only reference I could find in any RFC for GD was RFC 6035, which defines GD as GapDuration:
"GD" EQUAL (1*7DIGIT) ;0-3,600,000 -- milliseconds
BLD=0 BD=0 GLD=2.0 GD=500 GMIN=16
GDuration gd = (GDuration) table.get(context);
http://media.proquest.com/media/pq/classic/doc/1397920421/fmt/ai/rep/NPDF?_s=PRwJ1WMP1Dw7%2F7ZDFW5hXUNyfPo%3D
but I don't really think they're related.
I doubt that it means Generate-Delivery-Report as Manuel guessed. Reason being, Generate-Delivery-Report is already a defined header field so it's not necessary to create another field.
The X- headers are defined originally in RFC822 Section 4
At that point I gave up on my search for GD.
Whatever these fields mean, I'm betting the meaning was forgotten so long ago that probably nobody at Hotmail even knows it anymore, and it is even less likely that there is any actionable way for you to change them. I'm kinda still in shock that I couldn't find ANYTHING. I'm usually really awesome at researching and digging up obscure data. ;p
Further Resources:
RFC2076 : Common Internet Message Headers
RFC2045 : Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies
RFC4021 : Registration of Mail and MIME Header Fields
BCP178
The following parameters from "X-Message-Delivery" header field:
V=1.1;us=0;l=0;a=0;D=2;GD=2;SCL=4
do not affect whether an email is delivered to the inbox; they neither cause junk placement nor indicate it.
After fighting with Microsoft online/delisting support, messages with the same scores now go to the inbox instead of junk.
There is some unclear "something else" on Microsoft's side that decides whether to put a message in Junk or Inbox.
This is what I wanted to mention above
We'd like to double-check our http headers for security before we send them out. Obviously we can't allow '\r' or '\n' to appear, as that would allow content injection.
I see just two options here:
Truncate the value at the newline character.
Strip the invalid character from the header value.
Also, from reading RFC2616, it seems that only ASCII-printable characters are valid in HTTP header values. Should I also follow the same policy for the other 154 possible invalid bytes?
Or, is there any authoritative prior art on this subject?
This attack is called "header splitting" or "response splitting".
That OWASP link points out that removing CRLF is not sufficient. \n can be just as dangerous.
To mount a successful exploit, the application must allow input that contains CR (carriage return, also given by 0x0D or \r) and LF (line feed, also given by 0x0A or \n) characters into the header.
(I do not know why OWASP (and other pages) list \n as a vulnerability or whether that only applies to query fragments pre-decode.)
Serving a 500 on any attempt to set a header that contains a character not allowed by the spec in a header key or value is perfectly reasonable, and will allow you to identify offensive requests in your logs. Failing fast when you know your filters are failing is a fine policy.
If the language you're working in allows it, you could wrap your HTTP response object in one that raises an exception when a bad header is seen, or you could change the response object to enter an invalid state, set the response code to 500, and close the response body stream.
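A minimal sketch of the fail-fast idea in Python follows. The exception class, the function name, and the exact allowed range (printable ASCII plus horizontal tab) are assumptions made for illustration; they are not taken from any framework or from the RFC text.

```python
class InvalidHeaderError(ValueError):
    """Raised when a header value contains a disallowed character."""


def check_header_value(name, value):
    """Return value unchanged, or raise if it is unsafe to emit.

    Assumption: only printable ASCII (0x20-0x7E) and horizontal tab are
    accepted, so CR, LF, other control characters, and all non-ASCII
    characters are rejected rather than stripped or truncated.
    """
    for position, ch in enumerate(value):
        code = ord(ch)
        if ch != "\t" and not (0x20 <= code <= 0x7E):
            raise InvalidHeaderError(
                f"header {name!r}: disallowed character U+{code:04X} "
                f"at position {position}"
            )
    return value


# A benign value passes through unchanged.
print(check_header_value("Location", "/next?page=2"))

# An injection attempt is rejected instead of being silently rewritten.
try:
    check_header_value("Location", "/next\r\nSet-Cookie: session=evil")
except InvalidHeaderError as exc:
    print("rejected:", exc)
```

A response wrapper could call this from its header-setting method and translate the exception into a 500, which keeps the offending request visible in the logs.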
EDIT:
Should I strip non-ASCII inputs?
I prefer to do that kind of normalization in the layer that receives trusted input unless, as in the case of entity-escaping to convert plain-text to HTML escaping, there is a clear type conversion. If it's a type conversion, I do it when the output type is required, but if it is not a type-conversion, I do it as early as possible so that all consumers of data of that type see a consistent value. I find this approach makes debugging and documentation easier since layers below input handling never have to worry about unnormalized inputs.
When implementing the HTTP response wrapper, I would make it fail on all non-ASCII characters (including non-ASCII newlines like U+85, U+2028, U+2029) and then make sure my application tests include a test for each third-party URL input to make sure that any Location headers are properly %-encoded before the Location reaches setHeader, and similarly for other inputs that might reach the request headers.
If your cookies include things like a user-id or email address, I would make sure the dummy accounts for tests include a dummy account with a user-id or email address containing a non-ASCII letter.
The simple removal of new lines \n will prevent HTTP Response Splitting. Even though a CRLF is used as a delimiter in the RFC, the new line alone is recognized by all browsers.
You still have to worry about user content within a Set-Cookie or Content-Type header. Attributes within these headers are delimited by a ;, so it may be possible for an attacker to change the content type to UTF-7 and bypass your XSS protection for IE users (and only IE users). It may also be possible for an attacker to create a new cookie, which introduces the possibility of Session Fixation.
Non-ASCII characters are allowed in header fields, although the spec doesn't really clearly say what they mean; so it's up to sender and recipient to agree on their semantics.
What made you think otherwise?