Bottle: HEAD always falls back to GET, thus functions always executed twice? - python-3.x

I'm using Bottle to implement a web interface for a simple database system. As documented, Bottle handles HTTP HEAD requests by falling back to the corresponding GET route and cutting off the response body. However, in my experience, it means that the function attached to the GET route is executed both times in response to a GET request. This can be problematic if that function performs an operation that side-effects, such as a database operation.
Is there a way to prevent this double execution from happening? Or should I define a fake HEAD route for every GET route?

Update: It sounds like Bottle is working as designed (calling the function only once per request). Your browser is the apparent source of the HEAD requests.
On HEAD requests, Bottle calls the method once, not twice. Can you demonstrate some code that shows the behaviour you're describing? When I run the following code, I see the "Called" line only once:
from bottle import Bottle, request
app = Bottle()
#app.get("/")
def home():
print(f"Called: {request.method}")
return "Some text\n"
app.run()
Output:
$ curl --head http://127.0.0.1:8080/
Called: HEAD
HTTP/1.0 200 OK
127.0.0.1 - - [13/Jan/2021 08:28:02] "HEAD / HTTP/1.1" 200 0
Date: Wed, 13 Jan 2021 13:28:02 GMT
Server: WSGIServer/0.2 CPython/3.8.6
Content-Length: 10
Content-Type: text/html; charset=UTF-8

Related

Python requests: chunked post request

I am trying to send a post request through the request module with headers["Transfer-encoding"] = "chunked", but I am getting back:
<BODY><h2>Bad Request - Invalid Content Length</h2><hr><p>HTTP Error 400. There is an invalid content length or chunk length in the request.</p>
I am sending a json string. headers["Content-Type"] = "application/json" is also given.
Does anybody know if I am missing some setting? Maybe I should set the chunk-size somewhere?
Analysing the headers of the request attached to the response I actually get a content-length header different from zero.
I also tried to create a custom generator from the json string, and pass it to the post method as data=, but it it seems to simply hang there (also above the given timeout=).
Your error says you didn't create the request properly (it's 4xx error, not 5xx which would indicate server issue).
Transfer-Encoding: chunked serves for sending data in chunks. When the body of your message consists of unspecified number of chunks and you send them in lets say - stream. I would suggest reading this.
Each chunk should have it's size in front of the data. For instance:
HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked
9\r\n
Some data\r\n
6\r\n
Python\r\n
If you want to send chunked requests with python requests module. You probably need a generator method for that. Please see this. With such few information I can't help you more.

Python3 requests.get is too slow

EDIT Adding info:
requests version: 2.21.0
Server info: a Windows python implementation which includes 10 instances of threading.Thread, each creating HTTPServer with a handler based on BaseHTTPRequestHandler. My do_GET looks like this:
def do_GET(self):
rc = 'some response'
self.send_response(200)
self.send_header('Content-type', 'text/html')
self.send_header('Access-Control-Allow-Origin', '*')
self.end_headers()
self.wfile.write(rc.encode('utf-8'))
I'm getting a strange behaviour.
Using the curl command line, the GET command is finished quickly:
curl "http://localhost:3020/pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=1&tmr=1"
However, using requests.get() of python takes too much time. I was isolated it up to
python -c "import requests; requests.get('http://localhost:3020/pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=1&tmr=1')"
I scanned through many other questions here and have tried many things, without success.
Here are some of my findings:
If I'm adding timeout=0.2, the call is ending quickly without any error.
However, adding timeout=5 or timeout=(5,5)` doesn't make it take longer. It always seem to be waiting a full one second before returning with results.
Working with a session wrapper, and cancelling keep-alive, didn't improve. I mean for this:
with requests.Session() as session:
session.headers.update({'Connection': 'close'})
url = "http://localhost:3020/pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=%d&tmr=0" % i
session.get(url, timeout=2)
Enabling full debug, I'm getting the following output:
url=http://localhost:3020/pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=1&tmr=0
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost:3020
send: b'GET /pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=1&tmr=0 HTTP/1.1\r\nHost: localhost:3020\r\nUser-Agent: python-requests/2.21.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.0 200 OK\r\n'
header: Server: BaseHTTP/0.6 Python/3.7.2
header: Date: Wed, 01 May 2019 15:28:29 GMT
header: Content-type: text/html
header: Access-Control-Allow-Origin: *
DEBUG:urllib3.connectionpool:http://localhost:3020 "GET /pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=1&tmr=0 HTTP/1.1" 200 None
url=http://localhost:3020/pbio/powermtr?cmd=read-power-density
DEBUG:urllib3.connectionpool:Resetting dropped connection: localhost
slight pause here
send: b'GET /pbio/powermtr?cmd=read-power-density HTTP/1.1\r\nHost: localhost:3020\r\nUser-Agent: python-requests/2.21.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.0 200 OK\r\n'
header: Server: BaseHTTP/0.6 Python/3.7.2
header: Date: Wed, 01 May 2019 15:28:30 GMT
header: Content-type: text/html
header: Access-Control-Allow-Origin: *
DEBUG:urllib3.connectionpool:http://localhost:3020 "GET /pbio/powermtr?cmd=read-power-density HTTP/1.1" 200 None
6.710,i=4
url=http://localhost:3020/pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=4&tmr=0
DEBUG:urllib3.connectionpool:Resetting dropped connection: localhost
slight pause here
...
From the docs:
timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds). If no timeout is specified explicitly, requests do not time out.
It took me 3 years to find an answer.
I still do not understand why, but at least I can suggest a working solution.
According to these docs, the timeout can be specified as a tuple, like this:
(timeout for connection, timeout for interval without data)
Although I do not understand why requests is waiting for [timeout] before issuing the connection, I can tell it to wait very little for the connection, and specify another timeout for the data.
So what I'm doing now, is giving a timeout of let's say (0.01, 4). Now the connection is immediate, and if the data has a deadtime of 4 seconds, it will generate a timeout exception.
Some interesting reading can be found here.
Hoping this info will help others!

How to print an empty date_time_string in BaseHTTPRequestHandler?

I have built a tiny httpd using BaseHTTPRequestHandler in Python3. At the moment, it logs every HTTP GET request in systemd journal like this:
Oct 18 23:41:51 ubuntu httpd.py[19414]: 192.168.0.17 - - [18/Oct/2018 23:41:51] "GET / HTTP/1.1" 200 -
I would like to avoid printing the timestamp printed by my httpd. Handler for HTTP GET requests is following:
def do_GET(self):
self.send_response(200)
self.send_header('Content-type','text/html')
self.date_time_string()
self.end_headers()
Is it possible to disable printing the timestamp with date_time_string()? I tried with self.date_time_string(None), self.date_time_string(timestamp=None), but this didn't change anything.
To achieve desired log format, you need to write your own implementation of log_message method. You can go ahead and redefine log_error and log_request if you want to.

Check if a large file exists without downloading it

Not sure if this is possible, but I would like to check the status code of an HTTP request to a large file without downloading it; I just want to check if it's present on the server.
Is it possible to do this with Python's requests? I already know how to check the status code but I can only do that after the file has been downloaded.
I guess what I'm asking is can you issue a GET request and stop it as soon as you've receive the response headers?
Use requests.head(). This only returns the header of requests, not all content — in other words, it will not return the body of a message, but you can get all the information from the header.
The HEAD method is identical to GET except that the server MUST NOT
return a message-body in the response. The metainformation contained
in the HTTP headers in response to a HEAD request SHOULD be identical
to the information sent in response to a GET request. This method can
be used for obtaining metainformation about the entity implied by the
request without transferring the entity-body itself. This method is
often used for testing hypertext links for validity, accessibility,
and recent modification.
For example:
import requests
url = 'http://lmsotfy.com/so.png'
r = requests.head(url)
r.headers
Output:
{'Content-Type': 'image/png', 'Content-Length': '6347', 'ETag': '"18cb-4f7c2f94011da"', 'Accept-Ranges': 'bytes', 'Date': 'Mon, 09 Jan 2017 11:23:53 GMT', 'Last-Modified': 'Thu, 24 Apr 2014 05:18:04 GMT', 'Server': 'Apache', 'Keep-Alive': 'timeout=2, max=100', 'Connection': 'Keep-Alive'}
This code does not download the picture, but returns the header of the picture message, which contains the size, type and date. If the picture does not exist, there will be no such information.
Use HEAD method.
For example urllib
import urllib.request
response = urllib.request.urlopen(url)
if response.getcode() == 200:
print(response.headers['content-length'])
In your case with requests
import requests
response = requests.head(url)
if response.status_code == 200:
print(response.headers['content-length'])
Normally, you use HEAD method instead of GET for such sort of things. If you query some random server on the web, then be prepared that it may be configured to return inconsistent results (this is typical for servers requiring registration). In such cases you may want to use GET request with Range header to download only small number of bytes.

What makes conditional GETs "conditional" if the resource is obtained in the initial request?

Breaking down what makes a conditional GET:
In RFC 2616 it states that the GET method change to a "conditional GET" if the request message includes an If-* (If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range) header field.
It then states:
A conditional GET method requests that the entity be transferred ONLY under the circumstances described by the conditional header field(s).
From my understanding this is saying it will only return the data being requested if the condition is met with the "If-*" in any new subsequent requests. For example, if a GET request returns a response with a Etag header then the next request must include the If-None-Match with the ETag value to transfer the client back the requested resource.
However, If a client has to send an initial request before getting the returned "ETag" header (to return with If-None-Match) then they already have the requested resource. Thus, any future requests that return the If-None-Match header with the ETag value only dictate the return of the requested value, returning 200 OK (if the client does not return the If-None-Matchand ETag value from initial request) or 304 Not Modified (if they do), where this helps the client and server by caching the resource.
My Question:
Why does it state the entity (the resource from a request) will "be transferred ONLY" if the If-* condition is met (like in my example where the client returns the ETag value with anIf-None-Match in order to cache the requested resource) if the resource or "entity" is being returned with or without a "If-*" being returned? It doesn't return a resource "only under the circumstances described by the conditional header" because it returns the resource despiteless returning 200 OK or 304 Not Modified depending on if a "If-*" header is returned. What am I misunderstanding about this?
Full conditional GET reference from RFC 2616:
The semantics of the GET method change to a "conditional GET" if the request message includes an If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range header field. A conditional GET method requests that the entity be transferred only under the circumstances described by the conditional header field(s). The conditional GET method is intended to reduce unnecessary network usage by allowing cached entities to be refreshed without requiring multiple requests or transferring data already held by the client.
First of all, please note that RFC 2616 is obsolete, and you should refer instead to RFC 7232.
It's hard to see what exactly is confusing you. So let me just illustrate with examples instead.
Scenario 1
Client A: I need http://example.com/foo/bar.
GET /foo/bar HTTP/1.1
Host: example.com
Server: Here you go.
HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 12
ETag: "2ac07d4"
Hello world!
(some time passes)
Client A: I need http://example.com/foo/bar again. But I already have the "2ac07d4" version in my cache. Maybe that will do?
GET /foo/bar HTTP/1.1
Host: example.com
If-None-Match: "2ac07d4"
Server: Yeah, "2ac07d4" is fine. Just take it from your cache, I'm not sending it to you.
HTTP/1.1 304 Not Modified
Scenario 2
Client A: I need http://example.com/foo/bar.
GET /foo/bar HTTP/1.1
Host: example.com
Server: Here you go.
HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 12
ETag: "2ac07d4"
Hello world!
(some time passes)
Client B: I want to upload a new version of http://example.com/foo/bar.
PUT /foo/bar HTTP/1.1
Content-Type: text/plain
Content-Length: 17
Hello dear world!
Server: This looks good, I'm saving it. I will call this version "f6049b9".
HTTP/1.1 204 No Content
ETag: "f6049b9"
(more time passes)
Client A: I need http://example.com/foo/bar again. But I already have the "2ac07d4" version in my cache. Maybe that will do?
GET /foo/bar HTTP/1.1
Host: example.com
If-None-Match: "2ac07d4"
Server: I'm sorry, but "2ac07d4" is out of date. We have a new version now, it's called "f6049b9". Here, let me send it to you.
HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 17
ETag: "f6049b9"
Hello dear world!
Analysis
A conditional GET method requests that the entity be transferred ONLY under the circumstances described by the conditional header field(s).
Consider Client A's second request (in both scenarios).
The conditional header field is: If-None-Match: "2ac07d4".
The circumstances described by it are: "a selected representation of the resource does not match entity-tag "2ac07d4"".
Scenario 1: the circumstances do not hold, because the selected representation of the resource (the one containing Hello world!) does indeed match entity-tag "2ac07d4". Therefore, in accordance with the protocol, the server does not transfer the entity in its response.
Scenario 2: the circumstances do hold: the selected representation of the resource (the one containing Hello dear world!) doesn't match entity-tag "2ac07d4" (it matches "f6049b9" instead). Therefore, in accordance with the protocol, the server does transfer the entity in its response.
How does the server come up with these "2ac07d4" and "f6049b9", anyway? Of course, this depends on the application, but one straightforward way to do it is to compute a hash (such as SHA-1) of the entity body--a value that changes dramatically when even small changes are introduced.

Resources