Python requests: chunked post request - python-3.x

I am trying to send a POST request through the requests module with headers["Transfer-encoding"] = "chunked", but I am getting back:
<BODY><h2>Bad Request - Invalid Content Length</h2><hr><p>HTTP Error 400. There is an invalid content length or chunk length in the request.</p>
I am sending a json string. headers["Content-Type"] = "application/json" is also given.
Does anybody know if I am missing some setting? Maybe I should set the chunk-size somewhere?
Analysing the headers of the request attached to the response I actually get a content-length header different from zero.
I also tried to create a custom generator from the JSON string and pass it to the post method as data=, but it seems to simply hang there (even past the given timeout=).

Your error says you didn't create the request properly (it's a 4xx error, not a 5xx, which would indicate a server issue).
Transfer-Encoding: chunked is for sending data in chunks: the body of your message consists of an unspecified number of chunks, and you send them as a stream. I would suggest reading this.
Each chunk should have its size (in hexadecimal) in front of the data, and a zero-length chunk ends the body. For instance:
HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked
9\r\n
Some data\r\n
6\r\n
Python\r\n
0\r\n
\r\n
If you want to send chunked requests with the Python requests module, you probably need a generator for that. Please see this. With so little information I can't help you more.
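The framing described above can be sketched in a few lines of plain Python. This is only an illustration of the wire format, not what requests does internally. The helper names iter_json_chunks and encode_chunked, and the chunk size of 8, are assumptions made for the example.

```python
def iter_json_chunks(payload: bytes, chunk_size: int = 8):
    """Yield the payload in pieces of at most chunk_size bytes."""
    for start in range(0, len(payload), chunk_size):
        yield payload[start:start + chunk_size]


def encode_chunked(chunks) -> bytes:
    """Frame an iterable of byte chunks as an HTTP/1.1 chunked body."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # a zero-length chunk would terminate the body early
        # Size in hexadecimal, CRLF, the data itself, CRLF.
        out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # terminating chunk, no trailer fields
    return bytes(out)


framed = encode_chunked([b"Some data", b"Python"])
print(framed)
```

According to the requests documentation, passing a generator (an iterator without a length) as data= makes the library send the body chunk-encoded and do this framing itself. In that case a generator such as iter_json_chunks would yield raw bytes, and the Transfer-Encoding header would not be set by hand.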

Python Client Rest API Invocation - Invalid character found in method name [{}POST]. HTTP method names must be tokens

Client
Python Version - 3.9,
Python Requests module version - 2.25
Server
Java 13,
Tomcat 9.
I have a Tomcat+Java based server exposing REST APIs, and I am writing a client in Python to consume them. Everything is fine until I send an empty body in a POST request, which is a valid use case for us. If I send an empty body I get a 400 Bad Request error: Invalid character found in method name [{}POST]. HTTP method names must be tokens. If I send an empty request from Postman, Java, or curl it works fine; the problem occurs only when I use Python as the client.
The Python snippet follows:
json_object={}
header = {'alias': 'A', 'Content-Type' : 'application/json', 'Content-Length' : '0'}
resp = requests.post(url, auth=(username, password), headers=header, json=json_object)
I tried using the data parameter instead of json to send the payload, without much success.
I captured Wireshark dumps to understand it further and found that the request Tomcat received does not conform to RFC 2616 (https://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html), especially this part:
Request-Line = Method SP Request-URI SP HTTP-Version CRLF
In the Wireshark dumps, the request line instead looked like this: {}POST MY-APP-URI HTTP/1.1
As we can see, the empty body ({}) ends up in front of the HTTP method, hence Tomcat reports an error.
I then looked at the code of Python's http library, client.py. The relevant details follow:
File - client.py
Method - _send_output (starting at line 1001). It first sends the headers at line 1010 and then the body further down in the code. I thought (I could be wrong here) that in this case the headers (310 bytes) are much longer than the body (2 bytes), so the body is pushed before the complete headers are on the wire, and the TCP frames end up ordered in such a way that the body appears first. To corroborate this, I added a delay of 1 second just after sending the headers (line 1011) and, bingo, the error disappeared and it started working fine. I am not sure this analysis is completely correct; can someone in the know confirm it or tell me how to fix this?
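To make the Tomcat error message concrete, here is a minimal sketch of the Request-Line check quoted above. It is not Tomcat's code. The function name parse_request_line and the path /my-app are placeholders, and the set of allowed method characters is the token definition from RFC 7230.

```python
import re

# Characters allowed in a token (RFC 7230, section 3.2.6).
# Braces are not among them, so "{}POST" is not a valid method.
TOKEN = re.compile(r"^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$")


def parse_request_line(line: str):
    """Split 'Method SP Request-URI SP HTTP-Version' and validate the method."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3:
        raise ValueError("expected three space-separated fields")
    method, target, version = parts
    if not TOKEN.match(method):
        raise ValueError(f"invalid character found in method name {method!r}")
    return method, target, version


print(parse_request_line("POST /my-app HTTP/1.1\r\n"))

try:
    parse_request_line("{}POST /my-app HTTP/1.1\r\n")
except ValueError as exc:
    print("rejected:", exc)
```

A server that parses requests back to back on one connection would see stray body bytes such as {} as the start of the next request line, which is consistent with the error text, although the thread does not confirm where those bytes come from.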

Adding message to gmail error "Payload parts count different from expected"

I am adding a message to a gmail folder using this (example) URL:
https://www.googleapis.com/gmail/v1/users/user@domain.com/messages/import?uploadType=multipart
The body of the request looks like this:
--test_abc123
Content-Type: application/json; charset=UTF-8
{
"labelIds": [ "Label_525" ],
"raw": "RnJvbTogIlNlY3RpZ28gQ2VydGlmaWNh..."
}
--test_abc123--
The raw data is a base64 encoded standard MIME message that looks normal to me. The result of this POST is http error 400 with the error response "Payload parts count different from expected 2. Request payload parts count: 1".
I can supply the original MIME text if that is helpful, but let me emphasize that I have been running this code for several years without problem. I've tried different messages to test this out, but it appears that Google has changed something to break my software.
Is Google objecting to my raw data, or something about the MIME encoding? Any ideas what the problem could be?
---- Addendum ----
I have gotten a few messages to work, they seem to all have image or data attachments. However I really don't see any problem with the messages that are failing - I can import them into Office 365 or Thunderbird or anything else and they render just fine. As a test, I tried importing the message below, which was taken from the MIME RFC. It fails with the same error. I think that Google has changed something to make their MIME parser very fussy, but I don't see how I can fix my input data.
From: Nathaniel Borenstein <nsb@bellcore.com>
To: Ned Freed <ned@innosoft.com>
Subject: Sample message
MIME-Version: 1.0
Content-type: multipart/mixed; boundary="simple boundary"
This is the preamble. It is to be ignored, though it
is a handy place for mail composers to include an
explanatory note to non-MIME compliant readers.
--simple boundary
This is implicitly typed plain ASCII text.
It does NOT end with a linebreak.
--simple boundary
Content-type: text/plain; charset=us-ascii
This is explicitly typed plain ASCII text.
It DOES end with a linebreak.
--simple boundary--
This is the epilogue. It is also to be ignored.
Addendum 2: I tried a simple upload (using content-type header message/rfc822) and it worked, except the message was unlabeled. How would I specify what label I want applied to a message? I was originally trying to follow the documentation here (link), which tells me to create the JSON body that I gave above. This allows me to specify the label. But I cannot seem to use this body in a simple upload. The content type is either invalid, or what Gmail imports is just literally the JSON body; it does not parse out the raw data. If you could point me to a specific example showing the URI, message body, and HTTP headers (not Java code), that would be very useful to me.
OK never mind, I got it working by adding an empty message/rfc822 part to the body of the multipart upload. That satisfies Google, and the empty part is ignored in favor of the raw data.
You are doing a multipart upload; see here:
The body of the request is formatted as a multipart/related content
type [RFC2387] and contains exactly two parts. The parts are
identified by a boundary string, and the final boundary string is
followed by two hyphens.
This is why it works only for your messages with images or attachments, since your message
--test_abc123
is only one part.
In the past there was no check for this condition, so you may have gotten away with using multipart for one-part messages.
But that is no longer possible, so if you have a single-part message, you should use simple upload.
If you do not know in advance how many parts your message has, you can try the multipart upload first inside a try...catch statement and fall back to a simple upload request in the catch block on failure.
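As an illustration of a body with exactly two parts, the sketch below builds a multipart/related payload by hand: a JSON metadata part followed by a message/rfc822 part. The function names and the choice to pass the message as the second part are assumptions for this example. The thread itself only reports that an empty message/rfc822 part next to the JSON "raw" field was accepted.

```python
import json


def build_multipart_related(metadata: dict, message: bytes, boundary: str) -> bytes:
    """Build a two-part multipart/related body (metadata + message)."""
    delim = b"--" + boundary.encode("ascii")
    return b"\r\n".join([
        delim,
        b"Content-Type: application/json; charset=UTF-8",
        b"",                                   # blank line ends the part headers
        json.dumps(metadata).encode("utf-8"),
        delim,
        b"Content-Type: message/rfc822",
        b"",
        message,                               # may be empty, as in the thread
        delim + b"--",                         # closing boundary
        b"",
    ])


def count_parts(body: bytes, boundary: str) -> int:
    """Count the parts between the first delimiter and the closing one."""
    pieces = body.split(b"--" + boundary.encode("ascii"))
    return len(pieces) - 2  # drop the preamble and the closing "--" piece


body = build_multipart_related({"labelIds": ["Label_525"]}, b"", "test_abc123")
print(count_parts(body, "test_abc123"))
```

The request would also need a Content-Type header of multipart/related with the same boundary value. The boundary must not occur inside either part.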

What does "content-type" mean in headers of python requests library and if the value is text/html;charset=UTF-8?

I want to do some operations with the response from the Python requests library. After I make the call below:
response = requests.get(f'{AUTHORIZE_URL}?client_id={CLIENT_ID}&response_type=code&state={STATE}&redirect_uri={REDIRECT_URI}')
I need to get a URL something like this in return:
http://127.0.0.1:8000/products/auth/?state=2b33fdd45jbevd6nam&code=MGY1MTMyNWY0YjQ0MzEwNmMxMjY2ZjcwMWE2MWY5ZDE5MzJlMjA1YjdkNWExNGRhYjIzOGI5NzQ5OWZkNTA5NA
While doing it, it will be easier to use JSON in order to get state and code values from URL but I cannot use it because I think the content type does not allow this.
See this for Content-Type explanation: Content-Type
In short, the "content-type" in the headers of a response obtained with requests.get tells you what kind of content the server sent. In your case you've got a response in the form of HTML (like an .html document), and you can read it with response.text. If the "content-type" is "application/json", then you can read it as JSON with response.json().
I see that you use a local server. It should send "Content-Type": "application/json" in the headers, and then you should be able to read JSON from the response like this (the server needs to send JSON, not HTML or text):
import requests
targetURL = 'http://127.0.0.1:8000/products/auth/?state=2b33fdd45jbevd6nam&code=MGY1MTMyNWY0YjQ0MzEwNmMxMjY2ZjcwMWE2MWY5ZDE5MzJlMjA1YjdkNWExNGRhYjIzOGI5NzQ5OWZkNTA5NA'
requests.get(targetURL).json()
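If the goal is only to read state and code out of the redirect URL, JSON is not needed: the values are in the query string and can be parsed with the standard library. The sketch below assumes the URL is already available as a string (with requests, response.url holds the final URL after any redirects). The variable names are made up for the example.

```python
from urllib.parse import parse_qs, urlsplit

redirect_url = (
    "http://127.0.0.1:8000/products/auth/"
    "?state=2b33fdd45jbevd6nam"
    "&code=MGY1MTMyNWY0YjQ0MzEwNmMxMjY2ZjcwMWE2MWY5ZDE5MzJlMjA1YjdkNWExNGRhYjIzOGI5NzQ5OWZkNTA5NA"
)

# parse_qs maps each parameter name to a list of values.
params = parse_qs(urlsplit(redirect_url).query)
state = params["state"][0]
code = params["code"][0]

print(state)
print(code)
```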

What makes conditional GETs "conditional" if the resource is obtained in the initial request?

Breaking down what makes a conditional GET:
RFC 2616 states that the semantics of the GET method change to a "conditional GET" if the request message includes an If-* (If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range) header field.
It then states:
A conditional GET method requests that the entity be transferred ONLY under the circumstances described by the conditional header field(s).
From my understanding, this says the server will only return the requested data if the condition in the "If-*" header is met in subsequent requests. For example, if a GET request returns a response with an ETag header, then the next request must include If-None-Match with the ETag value for the server to transfer the requested resource back to the client.
However, if a client has to send an initial request before getting the returned ETag header (to send back with If-None-Match), then it already has the requested resource. Thus, future requests that send the If-None-Match header with the ETag value only determine how the requested value is returned: 200 OK (if the client does not send If-None-Match with the ETag value from the initial request) or 304 Not Modified (if it does), which helps the client and server by caching the resource.
My Question:
Why does it state that the entity (the resource from a request) will "be transferred ONLY" if the If-* condition is met (as in my example, where the client sends the ETag value in If-None-Match in order to cache the requested resource), if the resource or "entity" is returned with or without an "If-*" header being sent? It doesn't return a resource "only under the circumstances described by the conditional header", because it returns the resource regardless, with 200 OK or 304 Not Modified depending on whether an "If-*" header is sent. What am I misunderstanding about this?
Full conditional GET reference from RFC 2616:
The semantics of the GET method change to a "conditional GET" if the request message includes an If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range header field. A conditional GET method requests that the entity be transferred only under the circumstances described by the conditional header field(s). The conditional GET method is intended to reduce unnecessary network usage by allowing cached entities to be refreshed without requiring multiple requests or transferring data already held by the client.
First of all, please note that RFC 2616 is obsolete, and you should refer instead to RFC 7232.
It's hard to see what exactly is confusing you. So let me just illustrate with examples instead.
Scenario 1
Client A: I need http://example.com/foo/bar.
GET /foo/bar HTTP/1.1
Host: example.com
Server: Here you go.
HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 12
ETag: "2ac07d4"
Hello world!
(some time passes)
Client A: I need http://example.com/foo/bar again. But I already have the "2ac07d4" version in my cache. Maybe that will do?
GET /foo/bar HTTP/1.1
Host: example.com
If-None-Match: "2ac07d4"
Server: Yeah, "2ac07d4" is fine. Just take it from your cache, I'm not sending it to you.
HTTP/1.1 304 Not Modified
Scenario 2
Client A: I need http://example.com/foo/bar.
GET /foo/bar HTTP/1.1
Host: example.com
Server: Here you go.
HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 12
ETag: "2ac07d4"
Hello world!
(some time passes)
Client B: I want to upload a new version of http://example.com/foo/bar.
PUT /foo/bar HTTP/1.1
Content-Type: text/plain
Content-Length: 17
Hello dear world!
Server: This looks good, I'm saving it. I will call this version "f6049b9".
HTTP/1.1 204 No Content
ETag: "f6049b9"
(more time passes)
Client A: I need http://example.com/foo/bar again. But I already have the "2ac07d4" version in my cache. Maybe that will do?
GET /foo/bar HTTP/1.1
Host: example.com
If-None-Match: "2ac07d4"
Server: I'm sorry, but "2ac07d4" is out of date. We have a new version now, it's called "f6049b9". Here, let me send it to you.
HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 17
ETag: "f6049b9"
Hello dear world!
Analysis
A conditional GET method requests that the entity be transferred ONLY under the circumstances described by the conditional header field(s).
Consider Client A's second request (in both scenarios).
The conditional header field is: If-None-Match: "2ac07d4".
The circumstances described by it are: "a selected representation of the resource does not match entity-tag "2ac07d4"".
Scenario 1: the circumstances do not hold, because the selected representation of the resource (the one containing Hello world!) does indeed match entity-tag "2ac07d4". Therefore, in accordance with the protocol, the server does not transfer the entity in its response.
Scenario 2: the circumstances do hold: the selected representation of the resource (the one containing Hello dear world!) doesn't match entity-tag "2ac07d4" (it matches "f6049b9" instead). Therefore, in accordance with the protocol, the server does transfer the entity in its response.
How does the server come up with these "2ac07d4" and "f6049b9", anyway? Of course, this depends on the application, but one straightforward way to do it is to compute a hash (such as SHA-1) of the entity body--a value that changes dramatically when even small changes are introduced.
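The two scenarios can be replayed with a small sketch of the server-side decision. It assumes the hash-based ETag idea mentioned above, truncated to 7 hex digits for readability. The function names are invented for the example, and the comparison is simplified to a single exact tag (a real If-None-Match header may list several tags or "*").

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted entity-tag from a SHA-1 hash of the body."""
    return '"' + hashlib.sha1(body).hexdigest()[:7] + '"'


def conditional_get(current_body: bytes, if_none_match=None):
    """Return (status, headers, body) for a GET with an optional If-None-Match."""
    etag = make_etag(current_body)
    if if_none_match is not None and if_none_match == etag:
        # The client's cached copy is current: do not transfer the entity.
        return 304, {"ETag": etag}, b""
    headers = {"ETag": etag, "Content-Length": str(len(current_body))}
    return 200, headers, current_body


# Initial request: no condition, so the entity is transferred.
status, headers, body = conditional_get(b"Hello world!")
cached_etag = headers["ETag"]

# Scenario 1: the resource is unchanged.
print(conditional_get(b"Hello world!", cached_etag)[0])

# Scenario 2: another client has replaced the resource in the meantime.
print(conditional_get(b"Hello dear world!", cached_etag)[0])
```

The first GET has no condition, which is why the entity is always transferred there. The "only under the circumstances" wording applies to the later requests that carry If-None-Match.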

How to POST a multipart/form-data using Power Query's Web.Contents

In Power Query, I can download data from the web using the Web.Contents function, but there's an API that requires the request to contain multipart/form-data in the following format:
"__rdxml"=<*Some data*>
So how do you do this using the Web.Contents function?
I tried, doing
...
PostContent = "__rdxml=<*Some data*>",
Source = Web.Contents(url,Content=Text.ToBinary(PostContent))
...
But the server responds with 400 Bad Request.
I checked the raw request with Fiddler; it seems the request is not sent with the content-type=multipart/form-data header.
I tried manually adding the content-type header with content-type=multipart/form-data, but that doesn't work either: the same 400 Bad Request in the response.
Any idea?
multipart/form-data is a fairly complicated encoding, requiring a bunch of MIME-specific headers. I would first try to see if you can use application/x-www-form-urlencoded instead:
let
actualUrl = "http://some.url",
record = [__rdxml="some data"],
body = Text.ToBinary(Uri.BuildQueryString(record)),
options = [Headers =[#"Content-type"="application/x-www-form-urlencoded"], Content=body],
result = Web.Contents(actualUrl, options)
in
result
EDIT: I've come up with an example of using multipart/form-data with Power Query. It's at https://gist.github.com/CurtHagenlocher/b21ce9cddf54e3807317
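To show why the two encodings differ in complexity, here is a sketch in Python (not Power Query M) of what each body looks like on the wire for the single __rdxml field. The function names and the boundary value b1 are assumptions made for the example.

```python
from urllib.parse import urlencode


def encode_urlencoded(fields: dict) -> bytes:
    """Body for Content-Type: application/x-www-form-urlencoded."""
    return urlencode(fields).encode("ascii")


def encode_multipart_form(fields: dict, boundary: str) -> bytes:
    """Body for Content-Type: multipart/form-data; boundary=<boundary>."""
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}",
            f'Content-Disposition: form-data; name="{name}"',
            "",          # blank line separates part headers from the value
            value,
        ]
    lines += [f"--{boundary}--", ""]  # closing boundary
    return "\r\n".join(lines).encode("utf-8")


fields = {"__rdxml": "some data"}
print(encode_urlencoded(fields))
print(encode_multipart_form(fields, "b1"))
```

With multipart/form-data, the Content-Type header has to carry the same boundary as the body (here multipart/form-data; boundary=b1). A header without a boundary, or a body that is not framed this way, is one plausible reason for a 400 response.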
