I am trying to use stem to have a small script run through Tor. I can't seem to get stem to work. Here is my code:
import urllib.request
import re
from stem.connection import connect_port
from stem import Signal
from stem.control import Controller

controller = connect_port(port=9151)

def change():
    controller.authenticate()
    controller.signal(Signal.NEWNYM)

def getIp():
    print(urllib.request.urlopen("http://my-ip.heroku.com").read(30).decode('utf-8'))

def connectTor():
    controller = connect_port(port=9151)
    controller.connect()
    getIp()
    if not controller:
        sys.exit(1)
        print("nope")

def disconnect():
    controller.close()

if __name__ == '__main__':
    connectTor()
    getIP()
    change()
    getIp()
    disconnect()
Basically, all of the IPs that are displayed are the same, when in theory they should all be different. What can I do to make this code work?
To use Tor you need to direct traffic through its SocksPort (Tor acts as a local socks proxy). In your code above you don't have anything attempting to make urllib go through Tor.
For examples see Stem's client usage tutorials. I'm not sure offhand if SocksiPy or PycURL have Python 3.x counterparts. If not then you'll need to find an alternative.
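For what it's worth, here is a minimal sketch (not from the original answer) of one way to do this in Python 3 using the PySocks package: it monkey-patches the socket module so that urllib's connections go through Tor's SocksPort. Port 9050 is Tor's default SocksPort; the Tor Browser bundle listens on 9150 instead, and the check URL is only an illustration.

# A sketch, assuming PySocks is installed (pip install PySocks) and Tor is
# listening on its default SocksPort 9050 (the Tor Browser uses 9150).
import socket
import urllib.request

import socks  # PySocks

socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9050)
socket.socket = socks.socksocket  # route all new sockets, and thus urllib, through Tor

# Fetch a page through Tor; check.torproject.org reports whether Tor is being used.
print(urllib.request.urlopen("https://check.torproject.org").read(300).decode("utf-8"))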
I'm trying to run a Selenium test in Python using a Device Farm desktop browser session, but between the lack of resources (official or otherwise) and my own lack of knowledge, I can't figure it out.
I used these documentations:
https://docs.aws.amazon.com/devicefarm/latest/testgrid/getting-started-migration.html
https://selenium-python.readthedocs.io/getting-started.html#simple-usage
I installed the GeckoDriver, and ran the following code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("http://www.python.org")
assert "Python" in driver.title
elem = driver.find_element_by_name("q")
elem.clear()
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)
assert "No results found." not in driver.page_source
driver.close()
I saw a web browser appear for about a second.
I then decided to use Device Farm. I set up my AWS env vars, tested the connectivity, and ran the following code:
import boto3
import pytest
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

class test_url:
    def setup_method(self, method):
        devicefarm_client = boto3.client("devicefarm", region_name="eu-west-1")
        testgrid_url_response = devicefarm_client.create_test_grid_url(
            projectArn="arn:aws:devicefarm:us-west-2:1234567890:testgrid-project:some-id-string",
            expiresInSeconds=300)
        self.driver = webdriver.Remote(
            "http://www.python.org", webdriver.DesiredCapabilities.FIREFOX)

    # later, make sure to end your WebDriver session:
    def teardown_method(self, method):
        self.driver.quit()
Here's the result:
$ pytest -s
====================================================================================== test session starts =======================================================================================
platform linux -- Python 3.8.2, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
rootdir: /home/eric/nuage/devicefarm-poc
collected 0 items
===================================================================================== no tests ran in 0.07s ======================================================================================
I saw nothing happen in the AWS Management Console.
Why did no test run? Shouldn't this code perform a URL test? Shouldn't something happen in the AWS Management Console when I run this?
There appear to be a few issues with your code.
According to the pytest documentation, tests are collected from files whose names start with test, from classes whose names start with Test, and from functions or methods whose names start with test. This is why none of your code is executing.
The line driver = webdriver.Firefox() tries to create a local Firefox driver. What you want is a remote driver using the URL that AWS Device Farm provides (which you attempt at the line self.driver = webdriver.Remote("http://www.python.org", webdriver.DesiredCapabilities.FIREFOX)).
The line self.driver = webdriver.Remote("http://www.python.org", webdriver.DesiredCapabilities.FIREFOX) is incorrect. The first argument is supposed to be the URL of the remote endpoint used to execute your tests; in this case, it's AWS Device Farm's endpoint, which is returned in the CreateTestGridUrl API response. Selenium is basically just a REST service, so it performs actions via REST calls to an endpoint that tells the driver which actions to perform.
AWS Device Farm is currently only available in us-west-2, so region_name="eu-west-1" will not work.
I suggest you go through the pytest, Selenium, and AWS docs again to understand how it all works together. It's not too complex, but it may get confusing if you do not know how all the working parts interact with each other.
Here's a "minimal" example with pytest to get you started.
import logging

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.keys import Keys

import boto3
import pytest

PROJECT_ARN = ""  # Your project ARN

# Currently, AWS Device Farm is only in us-west-2
devicefarm = boto3.client('devicefarm', region_name='us-west-2')
remote_url = devicefarm.create_test_grid_url(
    projectArn=PROJECT_ARN,
    expiresInSeconds=600  # 10 minutes. Increase to longer if needed
)['url']

@pytest.fixture(scope="module")  # Specify "module" to reuse the same session
def firefox_driver(request):
    # Start fixture setup
    logging.info("Creating a new session with remote URL: " + remote_url)
    remote_web_driver = webdriver.Remote(command_executor=remote_url, desired_capabilities=DesiredCapabilities.FIREFOX)
    logging.info("Created the remote webdriver session: " + remote_web_driver.session_id)
    yield remote_web_driver  # Returns driver fixture and waits for tests to run
    logging.info("Teardown the remote webdriver session: " + remote_web_driver.session_id)
    remote_web_driver.quit()
    logging.info("Done tearing down")

@pytest.mark.usefixtures("firefox_driver")
def test_search_in_python_org(firefox_driver):
    driver = firefox_driver
    driver.get("http://www.python.org")
    assert "Python" in driver.title
    elem = driver.find_element_by_name("q")
    elem.clear()
    elem.send_keys("pycon")
    elem.send_keys(Keys.RETURN)
    assert "No results found." not in driver.page_source
    # driver.close() is done in the fixture instead of here now

@pytest.mark.usefixtures("firefox_driver")
def test_aws_console_title(firefox_driver):
    driver = firefox_driver
    driver.get("https://aws.amazon.com/")
    assert "Amazon Web Services" in driver.title

if __name__ == '__main__':
    pytest.main([__file__])  # run this file's tests directly through pytest
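With the example saved in a file whose name starts with test (test_devicefarm.py is just a hypothetical name for illustration), pytest collects and runs both test functions automatically:
$ pytest -s test_devicefarm.py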
I'm trying to have a Python RTC client use a global variable so that I can reuse it across multiple functions.
I'm using this for an RTC project I've been working on. I have a functioning JS client, but its functions work differently from Python.
The functions on the server and JS client side are my own and do not have parameters, and I hope to avoid having to use parameters on the Python client I'm making.
I've been using the aiortc cli.py from their GitHub as a basis for how my Python client should work, but I don't run it asynchronously, because I am trying to learn and control when events happen.
The source code can be found here; I am referring to the code at lines 71-72:
https://github.com/aiortc/aiortc/blob/master/examples/datachannel-cli/cli.py
This is the code I'm trying to run properly; I've only included the code relevant to my current issue:
import argparse
import asyncio
import logging
import time

from aiortc import RTCIceCandidate, RTCPeerConnection, RTCSessionDescription
from aiortc.contrib.signaling import add_signaling_arguments, create_signaling

pc = None
channel = None

def createRTCPeer():
    print("starting RTC Peer")
    pc = RTCPeerConnection()
    print("created Peer", pc)
    return pc

def pythonCreateDataChannel():
    print("creating datachannel")
    channel = pc.CreateDataChannel("chat")
The createRTCPeer function works as intended, creating an RTC object, but pythonCreateDataChannel reports an error if I have pc set to None beforehand:
AttributeError: 'NoneType' object has no attribute 'CreateDataChannel'
and it will report
NameError: name 'channel' is not defined
The same goes for pc if I don't set it in the global scope beforehand.
Have you tried this:
import argparse
import asyncio
import logging
import time

from aiortc import RTCIceCandidate, RTCPeerConnection, RTCSessionDescription
from aiortc.contrib.signaling import add_signaling_arguments, create_signaling

pc = None
channel = None

def createRTCPeer():
    print("starting RTC Peer")
    global pc
    pc = RTCPeerConnection()
    print("created Peer", pc)

def pythonCreateDataChannel():
    print("creating datachannel")
    global channel
    channel = pc.createDataChannel("chat")  # note: aiortc's method is createDataChannel, not CreateDataChannel
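As a quick illustration of why the global statement matters here (this snippet is not part of the original answer): assigning to a name inside a function otherwise creates a new local variable and leaves the module-level one untouched.

pc = None

def without_global():
    pc = "local"       # rebinds a local name only; module-level pc is still None

def with_global():
    global pc
    pc = "module"      # rebinds the module-level name

without_global()
print(pc)  # None
with_global()
print(pc)  # module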
I was developing a simple application which reads notifications from D-Bus and does some stuff upon receiving them.
This turned out to be quite a headache so I am sharing my code with you all.
import gi.repository.GLib
import dbus
from dbus.mainloop.glib import DBusGMainLoop

def notifications(bus, message):
    # do your magic
    pass

DBusGMainLoop(set_as_default=True)

bus = dbus.SessionBus()
bus.add_match_string_non_blocking("eavesdrop=true, interface='org.freedesktop.Notifications', member='Notify'")
bus.add_message_filter(notifications)

mainloop = gi.repository.GLib.MainLoop()
mainloop.run()
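In case it helps, here is a sketch (not part of the original snippet) of what the filter callback might do. A Notify method call carries (app_name, replaces_id, app_icon, summary, body, actions, hints, expire_timeout) as its arguments, so the interesting fields can be pulled out with get_args_list():

def notifications(bus, message):
    # Ignore anything other than the Notify calls we matched on
    if message.get_member() != "Notify":
        return
    args = message.get_args_list()
    # Notify(app_name, replaces_id, app_icon, summary, body, actions, hints, expire_timeout)
    app_name, summary, body = args[0], args[3], args[4]
    print("%s: %s - %s" % (app_name, summary, body))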
When running Scrapy from my own script that loads URLs from a database and follows all internal links on those websites, I run into a problem. I need to know which start_url is currently being used, as I have to maintain consistency with a database (SQL DB). But when Scrapy reads the built-in list start_urls to receive the links to follow and those websites have an immediate redirect, the problem occurs: once Scrapy has started from the start_urls and the crawler follows all internal links found there, I can later only determine the currently visited URL, not the start_url where Scrapy started out.
Other answers from the web are wrong, intended for other use cases, or deprecated, as there seems to have been a change in Scrapy's code last year.
MWE:
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy.crawler import CrawlerProcess

class CustomerSpider(CrawlSpider):
    name = "my_crawler"
    rules = [Rule(LinkExtractor(unique=True), callback="parse_obj", ), ]

    def parse_obj(self, response):
        print(response.url)  # find current start_url and do something

a = CustomerSpider
a.start_urls = ["https://upb.de", "https://spiegel.de"]  # I want to re-identify upb.de in the crawling process in process.crawl(a), but it is redirected immediately
# I have to hand over the start_urls this way, as I use the class CustomerSpider in another class
a.allowed_domains = ["upb.de", "spiegel.de"]

process = CrawlerProcess()
process.crawl(a)
process.start()
Here, I provide an MWE where Scrapy (my crawler) receives a list of URLs just like I have to pass them in. An example redirection URL is https://upb.de, which redirects to https://uni-paderborn.de.
I am searching for an elegant way of handling this as I want to make use of Scrapy's numerous features such as parallel crawling etc. Thus, I do not want to use something like the requests-library additionally. I want to find the Scrapy start_url which is currently used internally (in the Scrapy library).
I appreciate your help.
Ideally, you would set a meta property on the original request, and reference it later in the callback. Unfortunately, CrawlSpider doesn't support passing meta through a Rule (see #929).
You're best off building your own spider instead of subclassing CrawlSpider. Start by passing your start URLs in as a parameter to process.crawl (root_urls in the example below), which makes them available as a property on the instance. Within the start_requests method, yield a new Request for each url, including the database key as a meta value.
When parse receives the response from loading your url, run a LinkExtractor on it, and yield a request for each one to scrape it individually. Here, you can again pass meta, propagating your original database key down the chain.
The code looks like this:
from scrapy.spiders import Spider
from scrapy import Request
from scrapy.linkextractors import LinkExtractor
from scrapy.crawler import CrawlerProcess

class CustomerSpider(Spider):
    name = 'my_crawler'

    def start_requests(self):
        for url in self.root_urls:
            yield Request(url, meta={'root_url': url})

    def parse(self, response):
        links = LinkExtractor(unique=True).extract_links(response)
        for link in links:
            yield Request(
                link.url, callback=self.process_link, meta=response.meta)

    def process_link(self, response):
        print({
            'root_url': response.meta['root_url'],
            'resolved_url': response.url
        })

a = CustomerSpider
a.allowed_domains = ['upb.de', 'spiegel.de']

process = CrawlerProcess()
process.crawl(a, root_urls=['https://upb.de', 'https://spiegel.de'])
process.start()

# {'root_url': 'https://spiegel.de', 'resolved_url': 'http://www.spiegel.de/video/'}
# {'root_url': 'https://spiegel.de', 'resolved_url': 'http://www.spiegel.de/netzwelt/netzpolitik/'}
# {'root_url': 'https://spiegel.de', 'resolved_url': 'http://www.spiegel.de/thema/buchrezensionen/'}
I've started to explore HTTP servers in Python and I wanted to do something very simple. The idea is that if a client connects to "my ip"/admin, the page shows their headers; if not, it just uses the default do_GET() behaviour.
My code:
#!/usr/bin/env python3
import http.server
import socketserver

class HttpHandler(http.server.SimpleHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/admin":
            self.wfile.write("This page is for administrators only".encode())
            self.wfile.write(str(self.headers).encode())
        else:
            http.server.SimpleHTTPRequestHandler.do_GET(self)

http_server = socketserver.TCPServer(("", 10002), HttpHandler)
http_server.serve_forever()
For some reason I can't see the headers in the browser (unless I print self.headers to show them in the terminal). It isn't even throwing any errors, so I'm kind of lost here.
Thanks for the help
OK, so the solution I came up with (for anyone interested, or anyone who might find this thread with the same doubts) is this: the original handler wrote the body to wfile without first sending a status line and headers, so the browser never received a proper HTTP response. Calling send_response(200), sending a Content-type header, and calling end_headers() before writing the body fixes it:
#!/usr/bin/env python3
import http.server
import socketserver

class HttpHandler(http.server.SimpleHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/admin":
            self.send_response(200)
            self.send_header("Content-type", "text/html")
            self.end_headers()
            self.wfile.write(bytes("<html><head><title> Testing </title></head><body><p> This page is only for administrators</p>" + str(self.headers) + "</body></html>", "UTF-8"))
        else:
            http.server.SimpleHTTPRequestHandler.do_GET(self)

http_server = socketserver.TCPServer(("", 10001), HttpHandler)
http_server.serve_forever()