Explicitly specifying the boundary in MIME multipart messages? - python-3.x

I know how to use python3's email.mime.multipart.MIMEMultipart class to create various kinds of multipart email messages.
I have a need to do "email surgery" on certain existing multipart messages by taking the body (which contains the various MIME parts) and to remove and add some new parts without changing the rest of the email's headers. In the headers, there is the following sample header:
Content-Type: multipart/mixed;
boundary="ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
... where the ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ boundary is used to separate the existing message parts.
I want to use the email.mime.multipart.MIMEMultipart tools to create a new body with modified parts, and I want to use the same ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ boundary between these parts, so that I don't have to go back and change the Content-Type header that already exists among the message's original headers.
It would be ideal if I could specify the exact text of the boundary that I want to use when creating this multipart body, but in python3, it seems like the email.mime.multipart.MIMEMultipart package always creates its own, randomly generated boundary string.
In python3, is there any way to tell the email.mime.multipart.MIMEMultipart software to use a boundary string that I supply?
If not, I can work around this by taking the newly generated message body, extracting its new boundary string (which I'll call "newboundary"), and then replacing it with the original boundary (which I'll call "originalboundary"), as follows:
msgbodytext = msgbodytext.replace(newboundary, originalboundary)
But I prefer not to do this, if possible.
Any ideas about how to tell email.mime.multipart.MIMEMultipart to use a boundary string that I specify? Thank you.

I originally misread the docs, and now that I re-read them, I see that this is trivially easy.
To set my own boundary to the value of the originalboundary variable, I just have to do this:
msg = email.mime.multipart.MIMEMultipart(boundary=originalboundary)
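As a minimal sketch of how this fits together, the snippet below builds a multipart body with a caller-supplied boundary and confirms that the serialized text uses it. The boundary value and the attached text part are placeholders made up for illustration, not values from the original message.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Placeholder boundary; in practice this would be read from the
# existing Content-Type header of the message being edited.
originalboundary = "ZZZZ-example-boundary"

# The boundary keyword is part of the MIMEMultipart constructor.
msg = MIMEMultipart("mixed", boundary=originalboundary)
msg.attach(MIMEText("replacement part", "plain"))

msgbodytext = msg.as_string()

# The boundary reported by the message and the one in the serialized
# text are both the supplied string.
print(msg.get_boundary())
print(("--" + originalboundary + "--") in msgbodytext)
```

Note that as_string() also emits the multipart container's own headers, so the part of the text before the first blank line would still need to be dropped if only the body is wanted.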

Related

What should I use as an endpoint to serve downloadable documents using REST API?

Right now I have an endpoint that serves a file to the user (json, csv, excel or pdf).
My question here is, which type of route should I use to serve it, path variables or query parameters (considering best practices and for developer comprehension):
baseURl/api/v1/resource/xlsx
or
baseURl/api/v1/resource?format=xlsx
Thank you in advance.
So long as you are consistent with the production rules of the http URI scheme, any spelling conventions you choose are fine.
Choosing spellings that match the capabilities of URI templates will make it easier to construct/deconstruct resource identifiers in a "common URI space", which is often convenient both for clients and servers.
Choosing between path segments and the query is purely a matter of trade-offs. Using application/x-www-form-urlencoded key-value pairs in the query part means that you can implement your URI template as an HTML form. Using path segments means that you can use dot segments to describe other identifiers in the common URI space.
If you don't care about either of those, it just comes down to which spellings you like best in an access log, or in your documents, or in a browser history, or when you paste them into an email message, or ....
It is best practice in HTTP to use headers to indicate which formats the client can understand. You should use a GET route and include the Accept header for the format.
Header key: Accept
Header Value: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept
HTML Input="file" Accept Attribute File Type (CSV)
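To make the Accept-header approach concrete, here is a rough sketch of server-side selection in Python. The SUPPORTED mapping and the pick_format function are hypothetical names invented for this example, and the parsing is deliberately simplified: it honours q values but ignores wildcards such as */*.

```python
# Hypothetical mapping from media types to the export formats in the question.
SUPPORTED = {
    "application/json": "json",
    "text/csv": "csv",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "xlsx",
    "application/pdf": "pdf",
}


def pick_format(accept_header, default="json"):
    """Return the export format for the highest-q supported media type."""
    candidates = []
    for position, item in enumerate(accept_header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media_type in SUPPORTED and q > 0:
            # Sort by descending q, then by order of appearance.
            candidates.append((-q, position, SUPPORTED[media_type]))
    if not candidates:
        return default
    return min(candidates)[2]


print(pick_format("text/csv;q=0.5, application/pdf"))
```

With this design the route stays the same for every format (for example baseURl/api/v1/resource) and only the request header changes.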

How do you extract data associated with an image that is an attachment to a Notes RichTextItem?

I've seen things that skirt around this question, but nothing that answers it directly.
I have a RichTextItem in a document that contains an attached image. If I look at the document properties for the field, it says:
Data Type: MIME Part
Data Length: 7615 bytes
...
"Content-Transfer-Encoding: binary
Content-Type: image/jpeg
then a bit of binary data. How can I extract that data in server-side javascript so that I can use it in the value of an image control? In other words, I want the data corresponding to that image to appear in the following so that it renders in the web browser:
<xp:image><xp:this.value><![CDATA[#{javascript:"data:image/jpeg;base64,<DATA HERE>
Can this be done? I've tried all sorts of things but to no avail.
Thanks,
Reid
There are several approaches you can play with.
The "cheat" way: use ....nsf/0/unid/RTItemName?OpenField in a dojo panel as its source (see here). It would open the whole RichText.
Eventually you need OpenElement instead, which can directly address an attachment.
Last but not least, since your field isn't actually RichText but MIME, you can use the Notes MIME classes to get to the content and render it as base64. The MIME classes let you get the data as a stream and also provide methods to encode it, so you don't need an extra encoder class.
Hope that helps
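Whichever route is used to get at the bytes, the target is the same data URI shape shown in the question. The sketch below illustrates that final step in Python rather than server-side JavaScript; it does not use any Notes API, and to_data_uri is a hypothetical helper name.

```python
import base64


def to_data_uri(raw_bytes, content_type="image/jpeg"):
    """Build a data: URI from raw image bytes."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return f"data:{content_type};base64,{encoded}"


# The first three bytes of a typical JPEG file, used as stand-in data.
print(to_data_uri(b"\xff\xd8\xff"))
```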

Pure Node.js file upload (multipart POST) without using a framework

The third-party libraries "node-formidable" and "express" come with the ability to handle multipart POST requests (e.g. with a file upload form), but I don't want to use any third-party code. How do I make the file upload process in pure JavaScript on Node.js?
There are very few resources in this regard. How can this be done? Thank you, love is.
Just to clarify because it seems some people are angry that the other answer didn't help much: There is no simple way of doing this without relying on a library doing it for you.
First, here's an answer to another question trying to clarify what happens on a POST file upload: https://stackoverflow.com/a/8660740/2071242
To summarize, to parse such an upload, you'll first need to check for a Content-Type header containing "multipart/form-data" and, if one exists, read the boundary attribute within the header.
After this, the content comes in multiple parts, each starting with the boundary string, including some additional headers and then the data itself after a blank line. The browser can select the boundary string pretty freely as long as such byte sequence doesn't exist in the uploaded data (see the spec at https://www.rfc-editor.org/rfc/rfc1867 for details). You can read in the data by registering a callback function for the request object's data event: request.on('data', callback);
For example, with boundary "QweRTy", an upload might look something like this:
POST /upload HTTP/1.1
(some standard HTTP headers)
Content-Type: multipart/form-data; boundary=QweRTy
--QweRTy
Content-Disposition: form-data; name="upload"; filename="my_file.txt"
Content-Type: text/plain
(The contents of the file)
--QweRTy--
Note how after the initial headers two dashes are added to the beginning of each boundary string and two dashes are added to the end of the last one.
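The structure described above can be sketched in Python for the simple case where the whole body has already been buffered. This is only an illustration of the format, not a complete parser: get_boundary and split_parts are hypothetical names, and the code assumes CRLF line endings and a boundary that never appears inside the file data.

```python
import re


def get_boundary(content_type):
    """Extract the boundary from a multipart/form-data Content-Type value."""
    if not content_type.lower().startswith("multipart/form-data"):
        return None
    match = re.search(r'boundary="?([^";]+)"?', content_type, re.IGNORECASE)
    return match.group(1).encode("ascii") if match else None


def split_parts(body, boundary):
    """Split a fully buffered body into (headers, data) tuples."""
    delimiter = b"--" + boundary
    parts = []
    # Everything before the first delimiter is preamble and is skipped.
    for section in body.split(delimiter)[1:]:
        if section.startswith(b"--"):
            break  # closing delimiter: boundary followed by two dashes
        if section.startswith(b"\r\n"):
            section = section[2:]
        if section.endswith(b"\r\n"):
            section = section[:-2]
        # A blank line separates the part headers from the part data.
        raw_headers, _, data = section.partition(b"\r\n\r\n")
        headers = {}
        for line in raw_headers.decode("latin-1").split("\r\n"):
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        parts.append((headers, data))
    return parts


boundary = get_boundary("multipart/form-data; boundary=QweRTy")
body = (
    b"--QweRTy\r\n"
    b'Content-Disposition: form-data; name="upload"; filename="my_file.txt"\r\n'
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"hello\r\n"
    b"--QweRTy--\r\n"
)
print(split_parts(body, boundary))
```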
Now, what makes this challenging is that you might need to read the incoming data (within the callback function mentioned above) in several chunks, and there are no guarantees that the boundary will be contained within one chunk. So you'll either need to buffer all the data (not necessarily a good idea) or implement a state machine parser that goes through the data byte by byte. This is actually exactly what the formidable library is doing.
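To illustrate the chunking problem, here is a small Python sketch that finds boundary delimiters even when one is split across two chunks. It is not how formidable is implemented; BoundaryScanner is a hypothetical class that simply carries over the last few bytes of each chunk so that a partial delimiter can be completed by the next one.

```python
class BoundaryScanner:
    """Counts delimiter occurrences in a stream, even across chunk edges."""

    def __init__(self, boundary):
        self.delimiter = b"--" + boundary
        self.tail = b""   # unmatched bytes carried over from the last chunk
        self.count = 0

    def feed(self, chunk):
        data = self.tail + chunk
        start = 0
        while True:
            index = data.find(self.delimiter, start)
            if index == -1:
                break
            self.count += 1
            start = index + len(self.delimiter)
        # Keep at most len(delimiter) - 1 bytes: enough to hold any
        # partial delimiter, too short to hold a complete one.
        keep = len(self.delimiter) - 1
        self.tail = data[start:][-keep:]


scanner = BoundaryScanner(b"QweRTy")
for chunk in (b"--Qwe", b"RTy\r\nabc\r\n--QweR", b"Ty--\r\n"):
    scanner.feed(chunk)
print(scanner.count)
```

A real parser would also need to hand the bytes between delimiters to the application as they arrive, which is where the state machine comes in.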
So after having similar considerations, what I personally decided to do is to use the library. Re-implementing such a parser is pretty error-prone and in my opinion not worth the effort. But if you really want to avoid any libraries, checking the code of formidable might be a good start.
This is a somewhat old question, but still quite relevant.
I had been looking for a similar solution with no luck, so I decided to write my own, which might come in handy for other users.
GIST: https://gist.github.com/patrikbego/6b80c6cfaf4f4e6c119560e919409bb2
Nodejs itself recommends (as seen here) formidable, but I think that such a basic functionality should be provided by Nodejs out of the box.
I think you need to parse the form yourself if you really don't want to use any modules. When uploading a file, the form will be in multipart/form-data format, which means your request content will be divided by a string that is generated randomly by your browser. You need to read this string at the beginning of the form, load the data, find each occurrence of the string, and then parse the parts one by one.
For more information about multipart/form-data you can refer to http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.2
I think the best solution is to use formidable. It handles various scenarios and, in my experience, works perfectly.

Does Gmail set an invalid Content-ID header for inline attachments?

Short Version
A Content-ID header on an attachment must be of the form local-part "@" domain. Gmail's Content-IDs do not have an @ in them. Is this a real bug, or am I misreading the spec?
Long Version
I noticed this problem when I tried to resend an email sent from Gmail that has an inline image attached. My mailer (SwiftMailer) claimed that the Content-ID was invalid.
Here's the email I'm working with. I created this by inserting an image inline in Gmail and emailing it to myself.
Here are the relevant parts of the spec (as far as I can tell):
RFC 2045
Content-ID Header Field
In constructing a high-level user agent, it may be desirable to allow
one body to make reference to another. Accordingly, bodies may be
labelled using the "Content-ID" header field, which is syntactically
identical to the "Message-ID" header field:
id := "Content-ID" ":" msg-id
RFC 822 here and here
msg-id = "<" addr-spec ">" ; Unique message id
addr-spec = local-part "@" domain ; global address
What am I missing here? Is Gmail not following the spec, or is it ok not to have an @ in the Content-ID?
Seeing as no one's posted a better answer…
My interpretation of the RFCs fits with yours. I'd say Gmail's doing the wrong thing here, by the book. However, what Gmail does is de facto valid by definition. Gmail is too popular for other software not to accept however it does things, which opens the door for more software to violate the spec in the same way until it's standard practice.
Unfortunately that means there's currently no exact spec that matches reality. Luckily this question comes up in Google results now.
The original email in the question is gone, so here's another example. This is just the encoded image portion of the multipart message. Note the Content-ID header.
--089e0153807e5a346d04f1ae7c38
Content-Type: image/gif; name="blank.gif"
Content-Transfer-Encoding: base64
Content-ID: <ii_14403b4fa16783bf>
X-Attachment-Id: ii_14403b4fa16783bf
R0lGODlhAQABAIAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==
--089e0153807e5a346d04f1ae7c38--
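For anyone who wants to check this programmatically, here is a rough Python sketch that tests a Content-ID value against the msg-id shape from RFC 822, where the separator between local-part and domain is "@". The regular expression is a loose approximation of the grammar rather than a full implementation, looks_like_msg_id is a hypothetical name, and the second sample value is invented for contrast.

```python
import re

# Loose approximation of RFC 822 msg-id: "<" local-part "@" domain ">"
MSG_ID = re.compile(r"^<[^<>@\s]+@[^<>@\s]+>$")


def looks_like_msg_id(value):
    """Return True if value has the rough shape <local-part@domain>."""
    return bool(MSG_ID.match(value.strip()))


# The Gmail value from the example above has no "@".
print(looks_like_msg_id("<ii_14403b4fa16783bf>"))
# An invented value in the form the RFC describes.
print(looks_like_msg_id("<ii_14403b4fa16783bf@mail.example.com>"))
```

A strict mailer would reject the first value, which matches the behaviour reported for SwiftMailer in the question; a tolerant one would need to accept it.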

security for http headers

We'd like to double-check our http headers for security before we send them out. Obviously we can't allow '\r' or '\n' to appear, as that would allow content injection.
I see just two options here:
Truncate the value at the newline character.
Strip the invalid character from the header value.
Also, from reading RFC2616, it seems that only ascii-printable characters are valid for http header values. Should I also follow the same policy for the other 154 possible invalid bytes?
Or, is there any authoritative prior art on this subject?
This attack is called "header splitting" or "response splitting".
That OWASP link points out that removing CRLF is not sufficient. \n can be just as dangerous.
To mount a successful exploit, the application must allow input that contains CR (carriage return, also given by 0x0D or \r) and LF (line feed, also given by 0x0A or \n) characters into the header.
(I do not know why OWASP (and other pages) list \n as a vulnerability or whether that only applies to query fragments pre-decode.)
Serving a 500 on any attempt to set a header that contains a character not allowed by the spec in a header key or value is perfectly reasonable, and will allow you to identify offensive requests in your logs. Failing fast when you know your filters are failing is a fine policy.
If the language you're working in allows it, you could wrap your HTTP response object in one that raises an exception when a bad header is seen, or you could change the response object to enter an invalid state, set the response code to 500, and close the response body stream.
EDIT:
Should I strip non-ASCII inputs?
I prefer to do that kind of normalization in the layer that receives trusted input unless, as in the case of entity-escaping to convert plain-text to HTML escaping, there is a clear type conversion. If it's a type conversion, I do it when the output type is required, but if it is not a type-conversion, I do it as early as possible so that all consumers of data of that type see a consistent value. I find this approach makes debugging and documentation easier since layers below input handling never have to worry about unnormalized inputs.
When implementing the HTTP response wrapper, I would make it fail on all non-ASCII characters (including non-ASCII newlines like U+85, U+2028, U+2029) and then make sure my application tests include a test for each third-party URL input to make sure that any Location headers are properly %-encoded before the Location reaches setHeader, and similarly for other inputs that might reach the response headers.
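As a sketch of those two steps in Python: percent-encode a URL before it is used as a Location value, and have the header layer refuse anything that is still non-ASCII. Both function names are hypothetical. The example assumes the URL is already split into valid components, and it does not handle non-ASCII host names, which need IDNA encoding rather than %-encoding.

```python
from urllib.parse import quote


def encode_location(url):
    """Percent-encode non-ASCII and control characters in a URL."""
    # Keep URL delimiters and existing % escapes; everything else that is
    # not an unreserved character is encoded as UTF-8 percent escapes.
    return quote(url, safe=":/?#[]@!$&'()*+,;=%")


def require_ascii_header(value):
    """Fail on any non-ASCII character, such as U+0085, U+2028 or U+2029."""
    if not value.isascii():
        raise ValueError("non-ASCII character in header value")
    return value


location = encode_location("https://example.com/caf\u00e9?q=1")
print(require_ascii_header(location))
```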
If your cookies include things like a user-id or email address, I would make sure the dummy accounts for tests include a dummy account with a user-id or email address containing a non-ASCII letter.
The simple removal of new lines \n will prevent HTTP Response Splitting. Even though a CRLF is used as a delimiter in the RFC, the new line alone is recognized by all browsers.
You still have to worry about user content within a set-cookie or content-type header. Attributes within these headers are delimited using a ;, so it may be possible for an attacker to change the content type to UTF-7 and bypass your XSS protection for IE users (and only IE users). It may also be possible for an attacker to create a new cookie, which introduces the possibility of Session Fixation.
Non-ASCII characters are allowed in header fields, although the spec doesn't really clearly say what they mean; so it's up to sender and recipient to agree on their semantics.
What made you think otherwise?
