How to get client host name in bottle python?

from bottle import Bottle, request
import socket

app = Bottle()
my_ip = socket.gethostbyname(socket.gethostname())

@app.route('/hello')
def hello():
    client_ip = request.environ.get('REMOTE_ADDR')
    print("Client IP is ", client_ip)
    # Code to read client hostname or machine name
    return ['Your IP is: {}\n'.format(client_ip)]

app.run(host=my_ip, port=8080)
I am using Bottle (a WSGI micro web-framework for Python). I know how to get the IP of the client who requested the service, but I also want to know the client's hostname (machine name).
I tried reading the hostname with Windows commands like nbtstat and ping, but they are not 100% reliable. Are there any other options available?

A couple of things:
You're probably better off not doing this at all. Instead, consider logging all the IP addresses to a file (better yet, do nothing and simply use an existing access_log) and then resolving them, offline, in batch (see the sketch after the sample code below).
If you do insist on resolving the IP addresses inline, you don't need to call out to Windows commands to do it. It'll be a lot faster, simpler, and more robust to resolve the addresses in-process. I've included some sample code for you below.
Finally, I'd like to address your comment:
I tried... but they are not 100% reliable
That is a problem with your expectation, not with your DNS resolver. Reverse DNS lookups will inherently yield far less than 100% matches.
Here's the sample code to do the reverse lookup in Python. Good luck!
from socket import gethostbyaddr, herror

def resolve_address(addr):
    '''
    Resolve the IP address string ``addr`` and return its DNS name. If no
    name is found, return None.

    Raises any exceptions thrown by gethostbyaddr, except for NOTFOUND
    (which is handled by returning None).

    NOTE: Illustrative purposes only, not for production use.
    '''
    try:
        record = gethostbyaddr(addr)
    except herror as exc:
        if exc.errno == 1:
            print(f'no name found for address "{addr}"')
            return None
        else:
            print(f'an error occurred while resolving {addr}: {exc}')
            raise
    print(f'the name of "{addr}" is: {record[0]}')
    return record[0]

assert resolve_address('151.236.216.85') == 'li560-85.members.linode.com'
assert resolve_address('10.1.1.1') is None
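And for the batch approach suggested in the first point, a minimal sketch (assuming a plain access log whose first whitespace-separated field is the client IP; adjust for your actual log format):

from collections import Counter

def resolve_access_log(log_path):
    # Count hits per client IP, then resolve each unique address once,
    # reusing resolve_address() from above.
    hits = Counter()
    with open(log_path) as log:
        for line in log:
            fields = line.split()
            if fields:
                hits[fields[0]] += 1
    return {ip: resolve_address(ip) for ip in hits}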


Boto3 client in multiprocessing pool fails with "botocore.exceptions.NoCredentialsError: Unable to locate credentials"

I'm using boto3 to connect to s3, download objects and do some processing. I'm using a multiprocessing pool to do the above.
Following is a synopsis of the code I'm using:
import json
import multiprocessing as mp

import boto3

session = None

def set_global_session():
    global session
    if not session:
        session = boto3.Session(region_name='us-east-1')

def function_to_be_sent_to_mp_pool():
    s3 = session.client('s3', region_name='us-east-1')
    list_of_b_n_o = list_of_buckets_and_objects  # placeholder: (bucket, key) pairs
    for bucket, key in list_of_b_n_o:
        content = s3.get_object(Bucket=bucket, Key=key)
        data = json.loads(content['Body'].read().decode('utf-8'))
        write_processed_data_to_a_location()

def main():
    pool = mp.Pool(initializer=set_global_session, processes=40)
    pool.starmap(function_to_be_sent_to_mp_pool, list_of_b_n_o_i)
Now, when processes=40, everything works fine. When processes=64, it is still fine.
However, when I increase to processes=128, I get the following error:
botocore.exceptions.NoCredentialsError: Unable to locate credentials
Our machine has the required IAM roles for accessing S3. Moreover, the weird thing is that for some processes it works fine, whereas for others it throws the credentials error. Why is this happening, and how can I resolve it?
Another weird thing is that I'm able to trigger two jobs in two separate terminal tabs (each of which has a separate ssh login shell to the machine). Each job spawns 64 processes, and that works fine as well, which means 128 processes are running simultaneously. But 80 processes in one login shell fails.
Follow up:
I tried creating separate sessions for separate processes in one approach. In another, I directly created the s3-client using boto3.client. However, both of them throw the same error with 80 processes.
I also created separate clients with the following extra config:
Config(retries=dict(max_attempts=40), max_pool_connections=800)
This allowed me to use 80 processes at once, but anything > 80 fails with the same error.
Post follow up:
Can someone confirm if they've been able to use boto3 in multiprocessing with 128 processes?
This is actually a race condition on fetching the credentials. I'm not sure how fetching credentials works under the hood, but I saw this question on Stack Overflow and this ticket on GitHub.
I was able to resolve this by keeping a random wait time for each of the processes. The following is the updated code which works for me:
client_config = Config(retries=dict(max_attempts=400), max_pool_connections=800)
time.sleep(random.randint(0, num_processes*10)/1000) # random sleep time in milliseconds
s3 = boto3.client('s3', region_name='us-east-1', config=client_config)
I tried keeping the range for the sleep time less than num_processes*10, but that failed again with the same issue.
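For context, here is a sketch of how this slots into the pool initializer from the question (the jitter bound is the one that worked above; num_processes is assumed to be defined in the enclosing scope):

import random
import time

import boto3
from botocore.config import Config

client_config = Config(retries=dict(max_attempts=400), max_pool_connections=800)
s3 = None

def set_global_session():
    global s3
    # Stagger client creation so the workers don't all hit the
    # credential endpoint at the same instant.
    time.sleep(random.randint(0, num_processes * 10) / 1000)
    s3 = boto3.client('s3', region_name='us-east-1', config=client_config)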
@DenisDmitriev, since you are getting the credentials and storing them explicitly, I think that resolves the race condition and hence the issue is resolved.
PS: the values for max_attempts and max_pool_connections aren't based on any particular logic; I plugged in several values until the race condition was figured out.
I suspect that AWS recently reduced throttling limits for metadata requests because I suddenly started running into the same issue. The solution that appears to work is to query credentials once before creating the pool and have the processes in the pool use them explicitly instead of making them query credentials again.
I am using fsspec with s3fs, and here's what my code for this looks like:
import asyncio
import logging
import random
import time

import s3fs

log = logging.getLogger(__name__)

def get_aws_credentials():
    '''
    Retrieve current AWS credentials.
    '''
    fs = s3fs.S3FileSystem()
    # Try getting credentials
    num_attempts = 5
    for attempt in range(num_attempts):
        credentials = asyncio.run(fs.session.get_credentials())
        if credentials is not None:
            if attempt > 0:
                log.info('received credentials on attempt %s', 1 + attempt)
            return asyncio.run(credentials.get_frozen_credentials())
        time.sleep(15 * (random.random() + 0.5))
    raise RuntimeError('failed to request AWS credentials '
                       'after %d attempts' % num_attempts)
def process_parallel(fn_d, max_processes):
    # [...]
    c = get_aws_credentials()

    # Cache credentials
    import fsspec.config
    prev_s3_cfg = fsspec.config.conf.get('s3', {})
    try:
        fsspec.config.conf['s3'] = dict(prev_s3_cfg,
                                        key=c.access_key,
                                        secret=c.secret_key)
        num_processes = min(len(fn_d), max_processes)
        from concurrent.futures import ProcessPoolExecutor
        with ProcessPoolExecutor(max_workers=num_processes) as pool:
            for data in pool.map(process_file, fn_d, chunksize=10):
                yield data
    finally:
        fsspec.config.conf['s3'] = prev_s3_cfg
Raw boto3 code will look essentially the same, except instead of the whole fs.session and asyncio.run() song and dance, you'll work with boto3.Session itself and call its get_credentials() and get_frozen_credentials() methods directly.
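For example, a minimal sketch of that boto3 variant (the worker wiring in the trailing comment is illustrative, not prescriptive):

import boto3

def get_aws_credentials_boto3():
    # Fetch credentials once in the parent process and freeze them.
    session = boto3.Session()
    credentials = session.get_credentials()  # may query the metadata service
    if credentials is None:
        raise RuntimeError('failed to locate AWS credentials')
    return credentials.get_frozen_credentials()

# Then hand the frozen values to each worker explicitly, e.g.:
# c = get_aws_credentials_boto3()
# boto3.client('s3', aws_access_key_id=c.access_key,
#              aws_secret_access_key=c.secret_key,
#              aws_session_token=c.token)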
I get the same problem in a multiprocessing situation. I guess there is a client initialization problem when you use multiple processes, so I suggest using a getter function to obtain the S3 client. It works for me.
g_s3_cli = None

def get_s3_client(refresh=False):
    global g_s3_cli
    if not g_s3_cli or refresh:
        g_s3_cli = boto3.client('s3')
    return g_s3_cli
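A worker would then call get_s3_client() lazily at the start of its task, rather than sharing a client created before the fork; for example, s3 = get_s3_client() as the first line of the worker function.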

Python HTTPServer [WinError 10053] When there are too many requests?

Running a super simple webserver that, for the most part, just needs to strip out the query info from requests from a 3rd party script and return some files:
def run(server_class=HTTPServer, handler_class=MyHandler, port=xxx):
    server_address = ('', port)
    httpd = server_class(server_address, handler_class)
    print('Starting httpd...')
    httpd.serve_forever()
One object that the server must return is an HTML page that then loads some javascript, which then loads three.js, which then proceeds to load a ton of objects from this same server.
Which works, but after reloading 1-5 times, it would usually result in WinError[10053] and the server locking up! After that, connections would be rejected or time out. Not sure if this is due to too many requests, or something to do with three.js's load function's connections.
This took hours and I couldn't find a specific solution, so I'll post an answer down below. Feel free to chime in with other answers.
By multi-threading the server/handler, I've been able to handle at least one user. It may still be a throughput problem, but this has been enough for now:
class ThreadingHTTPServer(socketserver.ThreadingMixIn, HTTPServer):
    pass

def run(server_class=ThreadingHTTPServer, handler_class=MyHandler, port=xxx):
    server_address = ('', port)
    httpd = server_class(server_address, handler_class)
    print('Starting httpd...')
    httpd.serve_forever()
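Note: on Python 3.7+ the standard library already ships an equivalent http.server.ThreadingHTTPServer, so the mix-in subclass above is only needed on older versions.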
Some other things I tried that failed:
Editing self.path and calling SimpleHTTPRequestHandler.do_GET(self) (couldn't get path changing to affect the simple handler)
allow_reuse_address = True/False (no effect)
Manually setting close_connection = True (no effect)
Some header play (no effect)
Hope this saves someone!

How to get resource path in flask-RESTPlus?

I am fairly new to working with Flask and flask-RESTPlus. I have the following, and it is not clear how I can determine which path was used in the GET request:
ns = api.namespace('sample', description='get stuff')

@ns.route(
    '/resource-settings/<string:address>',
    '/unit-settings/<string:address>',
    '/resource-proposals/<string:address>',
    '/unit-proposals/<string:address>')
@ns.param('address', 'The address to decode')
class Decode(Resource):
    @ns.doc(id='Get the decoded result of a block address')
    def get(self, address):
        # How do I know what get path was called?
        pass
A better solution would be to use the request context. To get the full path, you can do:

from flask import request

def get(self, address):
    # How do I know what get path was called?
    print(request.full_path)
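If you need to know which of the several registered routes matched (not just the raw path), the matched rule is also available on the request context; a hypothetical sketch:

from flask import request

def get(self, address):
    # str(request.url_rule) is the matched rule pattern, e.g.
    # '/sample/unit-settings/<string:address>'
    rule = str(request.url_rule)
    if 'unit-settings' in rule:
        pass  # handle the unit-settings route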
Through lots of digging I found url_for in the flask package.
Still feels a bit wonky, but I can create a fully qualified link with:
result = api.base_url + url_for('resource-settings', address=id)
So this works and I get the desired results.

Requests: Check that the proxy has been used to make the HTTP Request

I've been scratching my head the whole day yesterday about this and, to my surprise, can't seem to find an easy way to check it.
I am using Python's Requests library and pass my proxies like so:
import requests
from requests.adapters import HTTPAdapter

def make_request(url):
    with requests.Session() as s:
        s.mount("http://", HTTPAdapter(max_retries=3))
        s.mount("https://", HTTPAdapter(max_retries=3))
        page = None
        # d is a rotating container (e.g. a collections.deque) of proxy
        # dicts; rotating it cycles through the proxies every time
        # make_request is called.
        d.rotate(-1)
        s.proxies = d[0]
        page = s.get(url, timeout=3)
        print('proxy used: ' + str(d[0]))
        return page.content
The problem is, I can't seem to make the request fail when the proxy is not expected to work. It seems there is always a fallback to my own internet IP if the proxy is not working.
For example: I tried passing a random proxy IP like 101.101.101.101:8800, or removing the IP authentication that is needed on my proxies; the request still goes through, even though it shouldn't.
I thought adding the timeout parameter when making the request would do the trick, but obviously it didn't.
So
Why does this happen?
How can I check from which ip a request is being made?
From what I have seen so far, you should use the form
s.get(url, proxies=d)
This should use the proxies in the dict d to make the connection.
This form allowed me to check the status_code with both working and non-working proxies (note that status_code lives on the response object, not the session):
response = s.get(url, proxies=d)
print(response.status_code)
I will update once I find out whether it just cycles over the proxies in the dict to find a working one, or whether one can actually select which proxy is used.
[UPDATE]
Tried to work around the dict in proxies to use a different proxy if I wanted to. However, proxies must be a dict to work, so I used a dict in the form of:
d = {"https" : 'https://' + str(proxy_ips[n].strip('\n'))}
This seems to work and allows me to use the IP I want to. Although it seems quite dull, I hope someone might come along and help!
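One direct way to verify which IP a request actually egresses from is to ask an IP-echo service (a sketch; httpbin.org/ip is just one such endpoint, and the proxy address below is made up):

import requests

def egress_ip(proxies=None):
    # Ask an echo service which address our request appears to come from.
    r = requests.get('https://httpbin.org/ip', proxies=proxies, timeout=5)
    r.raise_for_status()
    return r.json()['origin']

print(egress_ip())                                   # your own IP
print(egress_ip({'https': 'https://1.2.3.4:8800'}))  # shows the proxy address if it is really used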
The proxies used can be seen through:
requests.utils.getproxies()
or
requests.utils.get_environ_proxies(url)
I hope that helps, obviously quite old question, but still!

Disable fingerprint in Twisted Web

I ran nmap against my server and saw the fingerprint; how can I disable it?
443/tcp open ssl/http TwistedWeb httpd 9.0.0
The "fingerprint" is how server identifies itself at the start of http session. Thus we should look at what implements Web server in twisted and where does it keep its identification.
Now if we look at http://twistedmatrix.com/trac/browser/tags/releases/twisted-12.2.0/twisted/web/server.py line 498 states
version = "TwistedWeb/%s" % copyright.version
This variable then gets used by the Request.process() method:
class Request(pb.Copyable, http.Request, components.Componentized):
    ...

    def process(self):
        "Process a request."

        # get site from channel
        self.site = self.channel.site

        # set various default headers
        self.setHeader('server', version)
        self.setHeader('date', http.datetimeToString())

        # Resource Identification
        self.prepath = []
        self.postpath = map(unquote, string.split(self.path[1:], '/'))
        try:
            resrc = self.site.getResourceFor(self)
            self.render(resrc)
        except:
            self.processingFailed(failure.Failure())
So you could easily subclass Request and override the process method to do what you like.
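A hypothetical sketch of a lighter variant (QuietRequest and QuietSite are made-up names, and the banner string is arbitrary): instead of rewriting process() wholesale, intercept the Server header it sets, assuming Site's requestFactory hook:

from twisted.web import server

class QuietRequest(server.Request):
    def setHeader(self, name, value):
        # Swap out the automatic Server banner; pass other headers through.
        if name in ('server', b'server'):
            value = 'webserver'
        server.Request.setHeader(self, name, value)

class QuietSite(server.Site):
    requestFactory = QuietRequest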
Or you in theory could do something like this in your application code:
from twisted.web import server
server.version = "COBOL HTTPD SERVICE"
overriding the version value in the imported module.
