Somehow I can't download files through a proxy server, and I don't know what I've done wrong; I just get a timeout. Any advice?
import urllib.request
urllib.request.ProxyHandler({"http" : "myproxy:123"})
urllib.request.urlretrieve("http://myfile", "file.file")
You need to use your proxy object, not just instantiate it (you created an object, but didn't assign it to a variable, so it is never used). Try this pattern:
# create the object and assign it to a variable
proxy = urllib.request.ProxyHandler({'http': '127.0.0.1'})
# construct a new opener using your proxy settings
opener = urllib.request.build_opener(proxy)
# install the opener at module level
urllib.request.install_opener(opener)
# make a request
urllib.request.urlretrieve('http://www.google.com')
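Alternatively, if you'd rather not change module-level state with install_opener, you can use the opener object directly; a minimal sketch, reusing the myproxy:123 placeholder address from the question:

import urllib.request

proxy = urllib.request.ProxyHandler({'http': 'http://myproxy:123'})
opener = urllib.request.build_opener(proxy)

# opener.open returns a file-like response; write it out manually
with opener.open('http://myfile') as response, open('file.file', 'wb') as out:
    out.write(response.read())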
Or, if you do not need to rely on the standard library alone, use requests (this code is from the official documentation):
import requests
proxies = {"http": "http://10.10.1.10:3128",
"https": "http://10.10.1.10:1080"}
requests.get("http://example.org", proxies=proxies)
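If your proxy requires HTTP Basic authentication, requests also accepts credentials embedded in the proxy URL (this syntax is from the requests documentation; user and pass are placeholders):

import requests

# user:pass are placeholder credentials for the proxy
proxies = {"http": "http://user:pass@10.10.1.10:3128"}
requests.get("http://example.org", proxies=proxies)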
urllib reads proxy settings from the process environment.
According to the code snippet in urllib/request.py, you just need to set the http_proxy and https_proxy environment variables.
This behaviour is also documented here: https://www.cmi.ac.in/~madhavan/courses/prog2-2015/docs/python-3.4.2-docs-html/howto/urllib2.html#proxies
# Proxy handling
def getproxies_environment():
    """Return a dictionary of scheme -> proxy server URL mappings.

    Scan the environment for variables named <scheme>_proxy;
    this seems to be the standard convention. If you need a
    different way, you can pass a proxies dictionary to the
    [Fancy]URLopener constructor.
    """
    proxies = {}
    # in order to prefer lowercase variables, process environment in
    # two passes: first matches any, second pass matches lowercase only
    for name, value in os.environ.items():
        name = name.lower()
        if value and name[-6:] == '_proxy':
            proxies[name[:-6]] = value
    # CVE-2016-1000110 - If we are running as CGI script, forget HTTP_PROXY
    # (non-all-lowercase) as it may be set from the web server by a "Proxy:"
    # header from the client
    # If "proxy" is lowercase, it will still be used thanks to the next block
    if 'REQUEST_METHOD' in os.environ:
        proxies.pop('http', None)
    for name, value in os.environ.items():
        if name[-6:] == '_proxy':
            name = name.lower()
            if value:
                proxies[name[:-6]] = value
            else:
                proxies.pop(name[:-6], None)
    return proxies
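For example, to route the download from the question through the proxy this way, something like the following should work (myproxy:123 is the placeholder address from the question):

import os
import urllib.request

# set the variables before the first request, so that
# getproxies_environment() picks them up
os.environ['http_proxy'] = 'http://myproxy:123'
os.environ['https_proxy'] = 'http://myproxy:123'

urllib.request.urlretrieve('http://myfile', 'file.file')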
If you have to use a SOCKS5 proxy, here's the solution:
import socks  # provided by the PySocks package (pip install PySocks)
import socket
import urllib.request

proxy_ip = "127.0.0.1"
proxy_port = 1080

# route every new socket through the SOCKS5 proxy by monkey-patching
# the socket module before urllib opens any connections
socks.set_default_proxy(socks.PROXY_TYPE_SOCKS5, proxy_ip, proxy_port)
socket.socket = socks.socksocket
url = 'https://example.com/foo/bar.jpg'
urllib.request.urlretrieve(url, 'bar.png')
More info:
This works very well, but if we want to use ProxyHandler instead, it errors out for SOCKS proxies, even though the documentation suggests it should be supported:
proxy = urllib.request.ProxyHandler({'socks': 'socks://127.0.0.1:1080'})
opener = urllib.request.build_opener(proxy)
urllib.request.install_opener(opener)
urllib.request.urlretrieve(url, 'bar.png')
class urllib.request.ProxyHandler(proxies=None)
Cause requests to go through a proxy. If proxies is given, it must be a dictionary mapping protocol names to URLs of proxies. The default is to read the list of proxies from the environment variables <protocol>_proxy. If no proxy environment variables are set, then in a Windows environment proxy settings are obtained from the registry's Internet Settings section, and in a macOS environment proxy information is retrieved from the System Configuration Framework.
When a SOCKS5 proxy is globally set on my Windows OS, I get this:
>>> urllib.request.getproxies()
{'socks': 'socks://127.0.0.1:1080'}
But it still fails.
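As an aside, if switching libraries is an option, requests supports SOCKS proxies directly once the socks extra is installed (pip install requests[socks]); a sketch, reusing the proxy address from above:

import requests

# socks5h:// resolves hostnames through the proxy as well;
# plain socks5:// resolves them locally
proxies = {
    "http": "socks5h://127.0.0.1:1080",
    "https": "socks5h://127.0.0.1:1080",
}
requests.get("https://example.com", proxies=proxies)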
Related
I am attempting to pass a config environment variable on the pytest command line. I already use one for the browser type, e.g. --browser chrome.
What I would like is to pass in --env and then either Staging or Live to run my tests against.
This will determine which URL I use for my Selenium tests, so I need a method for the base URL: if it's Staging, use the Staging URL, and if it's Live, the Live URL. I've added the config side in conftest:
def pytest_addoption(parser):
    parser.addoption("--browser")
    parser.addoption("--env", help="staging or live")
@pytest.fixture(scope="session")
def env(request):
    return request.config.getoption("--env")
The first part I am unsure about is how to handle the base-URL method. Currently this is set as one URL in the WebdriverFactory class, which grabs the driver and URL. I need a way of storing the env config and checking which one is set before passing that back.
The second part is that I want tests that load the relevant base URL and then log in using either the Staging or Live user credentials,
so:
def test_validAdminLogin(self):
    self.lp.clearUserAndPasswordFields()
    self.lp.login("AdminUser")
So this test will pass the role type (AdminUser) to the page object, which would in turn check whether the env variable passed in is set to Live or Staging, and then grab the relevant user details to enter on the page here:
def login(self, username="", password=""):
    self.enter_username(username)
    self.enter_password(password)
    self.click_login_button()
I just need a pointer on how to map all this out, really.
For the base URL, try adding a base_url global variable before pytest_addoption. Make it a format template rather than an empty string, otherwise there is nothing for the environment name to be substituted into (the URL below is just a placeholder):

base_url = 'https://{}.example.com'

Then after pytest_addoption add:

def pytest_configure(config):
    global base_url
    env = config.getoption('--env')
    base_url = base_url.format(env)
Then you create a fixture and pass it to your tests, e.g.:

@pytest.fixture()
def fixture_base_url():
    global base_url
    return base_url
In your test:
def test_function(fixture_base_url):
    browser.open(fixture_base_url)
For the second part, you want to create an open_browser fixture. The fixture opens the browser and yields it to the test function,
e.g.:

@pytest.fixture()
def open_browser(fixture_base_url):
    # code to open the browser and navigate to fixture_base_url
    yield browser
    browser.close()  # after all tests using this fixture finish, control resumes past `yield` and the browser is closed
updated test function:

def test_function(open_browser):
    # code to navigate the browser
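Putting the pieces together, a minimal conftest.py sketch might look like this; the URLs, role names and credentials are all placeholders for your own values:

import pytest

# placeholder environment -> URL / credential maps
URLS = {
    "staging": "https://staging.example.com",
    "live": "https://www.example.com",
}
USERS = {
    "staging": {"AdminUser": ("admin-staging", "s3cret")},
    "live": {"AdminUser": ("admin-live", "s3cret")},
}

def pytest_addoption(parser):
    parser.addoption("--browser")
    parser.addoption("--env", default="staging", help="staging or live")

@pytest.fixture(scope="session")
def env(request):
    return request.config.getoption("--env")

@pytest.fixture(scope="session")
def base_url(env):
    return URLS[env]

@pytest.fixture(scope="session")
def credentials(env):
    # role -> (username, password) for the chosen environment;
    # the page object can look up "AdminUser" here
    return USERS[env]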
from bottle import Bottle, request
import socket
app = Bottle()
my_ip = socket.gethostbyname(socket.gethostname())
@app.route('/hello')
def hello():
    client_ip = request.environ.get('REMOTE_ADDR')
    print("Client IP is ", client_ip)
    # code to read the client hostname or machine name would go here
    return ['Your IP is: {}\n'.format(client_ip)]
app.run(host=my_ip, port=8080)
I am using Bottle (a WSGI micro web-framework for Python). I know how to get the IP of the client who requested the service, but I also want to know the client's hostname (machine name).
I tried reading the hostname with Windows commands like nbtstat and ping, but they are not 100% reliable. Are there any other options available?
A couple of things:
You're probably better off not doing this at all. Instead, consider logging all the IP addresses to a file (better yet, do nothing and simply use an existing access_log) and then resolving them offline, in batch.
If you do insist on resolving the IP addresses inline, you don't need to call out to Windows commands to do it. It'll be a lot faster, simpler, and more robust to resolve the addresses in-process. I've included some sample code for you below.
Finally, I'd like to address your comment:
I tried... but they are not 100% reliable
That is a problem with your expectations, not with your DNS resolver: reverse DNS lookups will inherently yield far fewer than 100% matches.
Here's the sample code to do the reverse lookup in Python. Good luck!
from socket import gethostbyaddr, herror

def resolve_address(addr):
    '''
    Resolve the ip address string ``addr`` and return its DNS name. If no name
    is found, return None.

    Raises any exceptions thrown by gethostbyaddr, except for NOTFOUND (which
    is handled by returning None).

    NOTE: Illustrative purposes only, not for production use.
    '''
    try:
        record = gethostbyaddr(addr)
    except herror as exc:
        if exc.errno == 1:
            print(f'no name found for address "{addr}"')
            return None
        else:
            print(f'an error occurred while resolving {addr}: {exc}')
            raise
    print(f'the name of "{addr}" is: {record[0]}')
    return record[0]

assert resolve_address('151.236.216.85') == 'li560-85.members.linode.com'
assert resolve_address('10.1.1.1') is None
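To wire this into the Bottle handler from the question, something like the following sketch should do (resolve_address is the helper above):

from bottle import Bottle, request

app = Bottle()

@app.route('/hello')
def hello():
    client_ip = request.environ.get('REMOTE_ADDR')
    client_name = resolve_address(client_ip)  # may be None if no PTR record exists
    return 'Your IP is: {}, your name is: {}\n'.format(client_ip, client_name)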
My google-fu is failing me, perhaps because it is late.
Given the code below, I need to take everything after the domain.tld/ and parse it. It will then be looked up in a database (in the example I'm using a dict, but in reality I'll be making a call to MySQL; will that change anything?).
Then, it will serve a redirect header with that URL.
from http.server import BaseHTTPRequestHandler, HTTPServer

class HTTPServer_RequestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(301)
        self.send_header('Location', url)
        self.end_headers()
        return

array = {}  # stand-in for the database
array["stuff"] = "http://google.com"
array["thing"] = "http://yahoo.com"
url = array[page]  # 'page' is the part after domain.tld/ that I don't know how to get

server_address = ('0.0.0.0', 80)
httpd = HTTPServer(server_address, HTTPServer_RequestHandler)
httpd.serve_forever()
It may be a bit late, but according to the documentation, self.path holds the request path, i.e. everything after the host portion of the URL.
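A minimal sketch of how that lookup could sit inside the handler; the dict stands in for the eventual MySQL query:

from http.server import BaseHTTPRequestHandler, HTTPServer

redirects = {"stuff": "http://google.com", "thing": "http://yahoo.com"}

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        page = self.path.lstrip('/')   # "/stuff" -> "stuff"
        url = redirects.get(page)      # the database lookup would go here
        if url is None:
            self.send_error(404)
            return
        self.send_response(301)
        self.send_header('Location', url)
        self.end_headers()

HTTPServer(('0.0.0.0', 80), RedirectHandler).serve_forever()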
I've been scratching my head about this all day and, to my surprise, can't seem to find an easy way to check it.
I am using Python's Requests library to pass my proxy, like so:
def make_request(url):
    with requests.Session() as s:
        s.mount("http://", HTTPAdapter(max_retries=3))
        s.mount("https://", HTTPAdapter(max_retries=3))
        page = None
        d.rotate(-1)  # d is a deque of proxy dicts; this rotates to the next proxy every time make_request is called
        s.proxies = d[0]
        page = s.get(url, timeout=3)
        print('proxy used: ' + str(d[0]))
        return page.content
The problem is, I can't make the request fail when the proxy is not expected to work. It seems there is always a fallback to my own internet IP when the proxy is not working.
For example: I tried passing a random proxy IP like 101.101.101.101:8800, or removing the IP authentication that is needed on my proxies, and the request still goes through, even though it shouldn't.
I thought adding the timeout parameter when making the request would do the trick, but obviously it didn't.
So:
Why does this happen?
How can I check which IP a request is being made from?
From what I have seen so far, you should use the form
s.get(url, proxies=d)
This should use the proxies in the dict d to make the connection.
This form allowed me to check the status code with both working and non-working proxies (note the status code lives on the response object, not the session):
r = s.get(url, proxies=d)
print(r.status_code)
I will update once I find out whether it just cycles over the proxies in the dict to find a working one, or whether you can actually select which one is used.
[UPDATE]
I tried to work around passing a dict as proxies, so I could use a different proxy if I wanted to. However, proxies must be a dict to work, so I used a dict in the form of:
d = {"https": 'https://' + str(proxy_ips[n].strip('\n'))}
This works and lets me pick the IP I want to use. It seems quite crude, so I hope someone might come along and improve on it!
The proxies used can be seen through:
requests.utils.getproxies()
or
requests.utils.get_environ_proxies(url)
I hope that helps; obviously this is quite an old question, but still!
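To actually verify which IP a request leaves from, one option is to ask an echo service for the origin address, with and without the proxy; a sketch using httpbin.org (the proxy address is a placeholder):

import requests

proxies = {"https": "https://10.10.1.10:1080"}  # placeholder proxy

# https://httpbin.org/ip returns the caller's IP as JSON
print(requests.get("https://httpbin.org/ip", timeout=3).json())                    # your own IP
print(requests.get("https://httpbin.org/ip", proxies=proxies, timeout=3).json())  # the proxy's IP, or an exception if the proxy is dead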
I ran nmap against my server and looked at the fingerprint; how can I disable it?
443/tcp open ssl/http TwistedWeb httpd 9.0.0
The "fingerprint" is how server identifies itself at the start of http session. Thus we should look at what implements Web server in twisted and where does it keep its identification.
Now if we look at http://twistedmatrix.com/trac/browser/tags/releases/twisted-12.2.0/twisted/web/server.py line 498 states
version = "TwistedWeb/%s" % copyright.version
This variable then gets used by the Request.process() method:
class Request(pb.Copyable, http.Request, components.Componentized):
    ....
    def process(self):
        "Process a request."

        # get site from channel
        self.site = self.channel.site

        # set various default headers
        self.setHeader('server', version)
        self.setHeader('date', http.datetimeToString())

        # Resource Identification
        self.prepath = []
        self.postpath = map(unquote, string.split(self.path[1:], '/'))
        try:
            resrc = self.site.getResourceFor(self)
            self.render(resrc)
        except:
            self.processingFailed(failure.Failure())
So you could easily subclass Request and override the process method to do what you like.
Or, in theory, you could do something like this in your application code:

from twisted.web import server
server.version = "COBOL HTTPD SERVICE"

overriding the version value in the imported module.
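A minimal runnable sketch of the monkey-patch approach (assuming a recent Twisted on Python 3, where the value is bytes; the banner nmap reports comes from the Server response header):

from twisted.internet import reactor
from twisted.web import resource, server

server.version = b"COBOL HTTPD SERVICE"  # replaces the default TwistedWeb/<version> banner

class Root(resource.Resource):
    isLeaf = True

    def render_GET(self, request):
        return b"hello"

reactor.listenTCP(8080, server.Site(Root()))
reactor.run()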