How to print an empty date_time_string in BaseHTTPRequestHandler? - python-3.x

I have built a tiny httpd using BaseHTTPRequestHandler in Python 3. At the moment, it logs every HTTP GET request to the systemd journal like this:
Oct 18 23:41:51 ubuntu httpd.py[19414]: 192.168.0.17 - - [18/Oct/2018 23:41:51] "GET / HTTP/1.1" 200 -
I would like to avoid the timestamp printed by my httpd. The handler for HTTP GET requests is as follows:
def do_GET(self):
    self.send_response(200)
    self.send_header('Content-type', 'text/html')
    self.date_time_string()
    self.end_headers()
Is it possible to disable printing the timestamp with date_time_string()? I tried self.date_time_string(None) and self.date_time_string(timestamp=None), but neither changed anything.

To achieve the desired log format, you need to write your own implementation of the log_message method. You can also redefine log_error and log_request if you want to.
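For illustration, here is a minimal sketch of such an override (the handler name and port are made up for the example; log_message(self, format, *args) is the hook BaseHTTPRequestHandler calls for every logged request):

import sys
from http.server import BaseHTTPRequestHandler, HTTPServer

class QuietHandler(BaseHTTPRequestHandler):
    def log_message(self, format, *args):
        # Same shape as the default implementation, but without the
        # [18/Oct/2018 23:41:51]-style timestamp block.
        sys.stderr.write("%s - - %s\n" % (self.address_string(), format % args))

    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.end_headers()

HTTPServer(('', 8000), QuietHandler).serve_forever()

With this in place, the journal line keeps the client address and request summary but drops the timestamp part.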

Related

Bottle: HEAD always falls back to GET, thus functions always executed twice?

I'm using Bottle to implement a web interface for a simple database system. As documented, Bottle handles HTTP HEAD requests by falling back to the corresponding GET route and cutting off the response body. However, in my experience, this means the function attached to the GET route is executed twice for a single request. This can be problematic if that function performs an operation with side effects, such as a database write.
Is there a way to prevent this double execution from happening? Or should I define a fake HEAD route for every GET route?
Update: It sounds like Bottle is working as designed (calling the function only once per request). Your browser is the apparent source of the HEAD requests.
On HEAD requests, Bottle calls the method once, not twice. Can you demonstrate some code that shows the behaviour you're describing? When I run the following code, I see the "Called" line only once:
from bottle import Bottle, request

app = Bottle()

@app.get("/")
def home():
    print(f"Called: {request.method}")
    return "Some text\n"

app.run()
Output:
$ curl --head http://127.0.0.1:8080/
Called: HEAD
HTTP/1.0 200 OK
127.0.0.1 - - [13/Jan/2021 08:28:02] "HEAD / HTTP/1.1" 200 0
Date: Wed, 13 Jan 2021 13:28:02 GMT
Server: WSGIServer/0.2 CPython/3.8.6
Content-Length: 10
Content-Type: text/html; charset=UTF-8
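As an aside on the fallback idea from the question: if a GET handler really does have side effects you want to skip for HEAD, one option (a sketch, not part of the original answer; do_expensive_side_effect is a hypothetical stand-in) is to register an explicit HEAD route, which Bottle matches in preference to the GET fallback:

from bottle import Bottle

app = Bottle()

@app.route("/", method="HEAD")
def home_head():
    # Matched before the GET fallback, so the handler below never runs for HEAD.
    return ""

@app.get("/")
def home():
    do_expensive_side_effect()  # hypothetical side-effecting operation
    return "Some text\n"

app.run()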

Privnote sockets server did not understand request

import socket
import ssl

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("privnote.com", 80))
#s = ssl.wrap_socket(s, keyfile=None, certfile=None, server_side=False, cert_reqs=ssl.CERT_NONE, ssl_version=ssl.PROTOCOL_SSLv23)

def claim_note(note_url):
    s.sendall(b'DELETE /' + note_url.encode() + b'HTTP/1.1\r\nX-Requested-With: XMLHttpRequest\r\nHost: privnote.com\r\n')
    print(s.recv(4096))
This is my code. Let me start by saying that I have tried many different things apart from this: both the HTTPS port (443) and the HTTP port (80), and commenting and uncommenting the statement that wraps the socket with SSL, all with the same outcome: either the API returns absolutely nothing, or it tells me the request couldn't be understood by the server. I was looking at a GitHub repo where only one header was used, X-Requested-With, because it was for an Ajax call. I tried adding User-Agent and Content-Type, and now I'm just using Host and X-Requested-With. It's a DELETE request, and the note URL is the first 8 characters after the link. I've also tried adding \r\n\r\n at the end, and even tried Content-Length. I don't know what else to do; I want to know why the server is saying that.
There are multiple problems with your code. If you actually print out the request you are trying to send, it will look like this:
b'DELETE /node_urlHTTP/1.1\r\nX-Requested-With: XMLHttpRequest\r\nHost: privnote.com\r\n'
There are two problems with this line: a missing space between /node_url and HTTP/1.1, and a missing final \r\n as end-of-header marker at the end. Once these are fixed, you get a successful response, a 302 redirect to the HTTPS version:
b'HTTP/1.1 302 Found\r\nDate:...\r\nLocation: https://privnote.com/node_url ...
When repeating the request with HTTPS and a valid node_url (with an invalid node_url you get an error that DELETE is not an allowed method):
s.connect(("privnote.com", 443))
s = ssl.wrap_socket(s)
...
b'HTTP/1.1 200 OK\r\n ...
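Putting both fixes together, a corrected version might look like the sketch below. This is an assumption-level sketch, not code from the answer: it uses ssl.create_default_context() rather than the deprecated ssl.wrap_socket, and opens a fresh connection per call:

import socket
import ssl

def claim_note(note_url):
    # Fresh TLS connection per request; server_hostname enables SNI and
    # certificate verification.
    raw = socket.create_connection(("privnote.com", 443))
    s = ssl.create_default_context().wrap_socket(raw, server_hostname="privnote.com")
    # Note the space before HTTP/1.1 and the blank line (\r\n\r\n) that
    # terminates the header block.
    s.sendall(b'DELETE /' + note_url.encode() + b' HTTP/1.1\r\n'
              b'Host: privnote.com\r\n'
              b'X-Requested-With: XMLHttpRequest\r\n'
              b'Connection: close\r\n'
              b'\r\n')
    print(s.recv(4096))
    s.close()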

Python3 requests.get is too slow

EDIT Adding info:
requests version: 2.21.0
Server info: a Windows Python implementation that includes 10 instances of threading.Thread, each creating an HTTPServer with a handler based on BaseHTTPRequestHandler. My do_GET looks like this:
def do_GET(self):
    rc = 'some response'
    self.send_response(200)
    self.send_header('Content-type', 'text/html')
    self.send_header('Access-Control-Allow-Origin', '*')
    self.end_headers()
    self.wfile.write(rc.encode('utf-8'))
I'm getting a strange behaviour.
Using the curl command line, the GET command is finished quickly:
curl "http://localhost:3020/pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=1&tmr=1"
However, using Python's requests.get() takes too much time. I isolated it down to:
python -c "import requests; requests.get('http://localhost:3020/pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=1&tmr=1')"
I scanned through many other questions here and have tried many things, without success.
Here are some of my findings:
If I add timeout=0.2, the call ends quickly without any error.
However, adding timeout=5 or timeout=(5,5) doesn't make it take longer. It always seems to wait a full second before returning results.
Working with a session wrapper and cancelling keep-alive didn't improve things. I mean this:
with requests.Session() as session:
    session.headers.update({'Connection': 'close'})
    url = "http://localhost:3020/pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=%d&tmr=0" % i
    session.get(url, timeout=2)
Enabling full debug, I'm getting the following output:
url=http://localhost:3020/pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=1&tmr=0
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost:3020
send: b'GET /pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=1&tmr=0 HTTP/1.1\r\nHost: localhost:3020\r\nUser-Agent: python-requests/2.21.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.0 200 OK\r\n'
header: Server: BaseHTTP/0.6 Python/3.7.2
header: Date: Wed, 01 May 2019 15:28:29 GMT
header: Content-type: text/html
header: Access-Control-Allow-Origin: *
DEBUG:urllib3.connectionpool:http://localhost:3020 "GET /pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=1&tmr=0 HTTP/1.1" 200 None
url=http://localhost:3020/pbio/powermtr?cmd=read-power-density
DEBUG:urllib3.connectionpool:Resetting dropped connection: localhost
slight pause here
send: b'GET /pbio/powermtr?cmd=read-power-density HTTP/1.1\r\nHost: localhost:3020\r\nUser-Agent: python-requests/2.21.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.0 200 OK\r\n'
header: Server: BaseHTTP/0.6 Python/3.7.2
header: Date: Wed, 01 May 2019 15:28:30 GMT
header: Content-type: text/html
header: Access-Control-Allow-Origin: *
DEBUG:urllib3.connectionpool:http://localhost:3020 "GET /pbio/powermtr?cmd=read-power-density HTTP/1.1" 200 None
6.710,i=4
url=http://localhost:3020/pbio/button2?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=4&tmr=0
DEBUG:urllib3.connectionpool:Resetting dropped connection: localhost
slight pause here
...
From the docs:
timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds). If no timeout is specified explicitly, requests do not time out.
It took me 3 years to find an answer.
I still do not understand why, but at least I can suggest a working solution.
According to these docs, the timeout can be specified as a tuple, like this:
(timeout for connection, timeout for interval without data)
Although I do not understand why requests is waiting for [timeout] before issuing the connection, I can tell it to wait very little for the connection, and specify another timeout for the data.
So what I'm doing now is giving a timeout of, say, (0.01, 4). Now the connection is immediate, and if no data arrives for 4 seconds, it raises a timeout exception.
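As a concrete sketch (using the URL from the question; the numbers are just the ones discussed above):

import requests

url = ("http://localhost:3020/pbio/button2"
       "?cmd=uz-crosslink-leds&g1=0&g2=0&g3=0&g4=1&tmr=1")
# (connect timeout, read timeout): fail the connect almost immediately,
# but allow up to 4 seconds between received bytes.
response = requests.get(url, timeout=(0.01, 4))
print(response.status_code, response.text)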
Some interesting reading can be found here.
Hoping this info will help others!

Truncating logging of Post Request in RobotFramework

I am using the RequestsLibrary of Robot Framework to upload files to a server. The file RequestsKeywords.py has this line:
logger.info('Post Request using : alias=%s, uri=%s, data=%s, headers=%s, files=%s, allow_redirects=%s '
            % (alias, uri, dataStr, headers, files, redir))
This prints the whole contents of my upload file inside the request in my log file. Now, I could get rid of this log by changing the log level; however, my goal is to still see the log line, just truncated to 80 characters, so I am not browsing through lines of hex values. Any idea how this could be done?
A solution would be to create a wrapper method that temporarily disables the logging and enables it back once completed.
The flow is: get an instance of the RequestsLibrary, call RF's Set Log Level with the argument "ERROR" (so at least an error gets through, if needed), call the original keyword, set the log level back to what it was, and return the result.
Here's how it looks in Python:
from robot.libraries.BuiltIn import BuiltIn

def post_request_no_log(*args, **kwargs):
    req_lib = BuiltIn().get_library_instance('RequestsLibrary')
    current_level = BuiltIn().set_log_level('ERROR')  # returns the previous level
    try:
        result = req_lib.post_request(*args, **kwargs)
    finally:
        BuiltIn().set_log_level(current_level)
    return result
And the same, in Robot Framework syntax:

Post Request With No Logging
    [Documentation]    Runs RequestsLibrary's Post Request, with its logging suppressed
    [Arguments]    @{args}    &{kwargs}
    ${current level}=    Set Log Level    ERROR
    ${result}=    Post Request    @{args}    &{kwargs}
    [Return]    ${result}
    [Teardown]    Set Log Level    ${current level}
The Python version is bound to be milliseconds faster, as there's no need to parse and match the text in the RF syntax, which with heavy usage may add up.
Perhaps not the answer you're looking for, but after having looked at the source of the RequestsLibrary I think this is indeed undesirable and should be corrected. It makes sense to have the file contents when running in a debug or trace setting, but not during regular operation.
As I consider this a bug, I'd recommend registering an issue with the GitHub project page or correcting it yourself and providing a pull request. In my opinion the code should be refactored to send the file name under the info setting and the file contents under the trace/debug setting:
logger.info('Post Request using : alias=%s, uri=%s, data=%s, headers=%s, allow_redirects=%s' % ...
logger.trace('Post Request files : files=%s' % ...
In the meantime you have two options. As you correctly said, temporarily reduce the log level settings in Robot code. If you can't change the script, then using a Robot Framework listener can help with that. Granted, it would be more work than making the change in the RequestsLibrary yourself.
A temporary alternative could be to use the RequestsLibrary Post keyword, which is deprecated but still present.
If you look at the method in the RequestsKeywords library, it's only calling self._body_request() at the end. What we ended up doing is writing another keyword that was identical to the original except for the part where it called logger.info(). We modified it to log files=%.80s, which truncates the file content to 80 characters.
def post_request_truncated_logs(
        self,
        alias,
        uri,
        data=None,
        params=None,
        headers=None,
        files=None,
        allow_redirects=None,
        timeout=None):
    session = self._cache.switch(alias)
    if not files:
        data = self._format_data_according_to_header(session, data, headers)
    redir = True if allow_redirects is None else allow_redirects
    response = self._body_request(
        "post",
        session,
        uri,
        data,
        params,
        files,
        headers,
        redir,
        timeout)
    dataStr = self._format_data_to_log_string_according_to_header(data, headers)
    logger.info('Post Request using : alias=%s, uri=%s, data=%s, headers=%s, files=%.80s, allow_redirects=%s '
                % (alias, uri, dataStr, headers, files, redir))
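The truncation itself is just printf-style precision on a string conversion, nothing RequestsLibrary-specific; a quick illustration:

# '%.80s' formats at most 80 characters of the value's string form.
files = {'file': 'A' * 500}  # stand-in for a large upload payload
print('files=%.80s' % files)  # prints only the first 80 characters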

How to display the age of an nginx cached file in headers

I've set up a caching server for a site through nginx 1.6.3 on CentOS 7, and it's configured to add HTTP headers to served files to show whether said files came from the caching server (HIT, MISS, or BYPASS), like so:
add_header X-Cached $upstream_cache_status;
However, I'd like to see if there's a way to add a header to display the age of the cached file, as my configuration has proxy_cache_valid 200 60m; set, and I'd like to check that it's respecting that setting.
So what I'm looking for would be something like:
add_header Cache-Age $upstream_cache_age;
I'm unable to find anything of the sort though. Can you help?
Thanks
The nginx documentation is quite exhaustive: there is no variable with the direct relative age of the cached file.
The best way would be to use the $upstream_http_ variable class to get the absolute age of the resource, by picking up its Date header through $upstream_http_date.
add_header X-Cache-Date $upstream_http_date;
For the semantic meaning of the Date header field in HTTP/1.1, refer to rfc7231#section-7.1.1.2, which describes it as the time of the HTTP response generation, so, basically, this should accomplish exactly what you want (especially if the backend runs with the same timecounter).
I spent some time attempting to solve this with the nginx Perl module, which does not seem to have access to the $upstream_http_NAME variables, so it cannot calculate the current age from a timestamp header that your proxied application created at render time.
Alternatively, you could use a different caching layer architecture like Varnish Cache, which does indeed provide the Age HTTP response header:
http://book.varnish-software.com/3.0/HTTP.html#age
I made a solution that works for this, with the Lua module, in this question: Nginx: Add “Age” header, with Lua. Is this a good solution?
I'm going to post the code here; for any suggestions, it would be better to discuss them at the other link, where I explain it in more detail.
map $upstream_http_Date $mapdate {
    default    $upstream_http_Date;
    ''         'Sat, 21 Dec 2019 00:00:00 GMT';
}
Inside the location block:
header_filter_by_lua_block {
    ngx.header["Age"] = ngx.time() - ngx.parse_http_time(ngx.var.mapdate);
}
