How to import firefox cookies to python requests - python-3.x

I'm logged in on a page in Firefox and I want to take the cookie and try to browse the page with python-requests. The problem is that after importing the cookie into the requests session nothing happens (as if there were no cookie at all). The structure of the cookie created by requests also differs from the one from Firefox.
Is it possible to load a Firefox cookie and use it in a requests session?
My code so far:
import sys
import sqlite3
import http.cookiejar as cookielib
import requests
from requests.utils import dict_from_cookiejar

def get_cookies(final_cookie, firefox_cookies):
    con = sqlite3.connect(firefox_cookies)
    cur = con.cursor()
    cur.execute("SELECT host, path, isSecure, expiry, name, value FROM moz_cookies")
    for item in cur.fetchall():
        if item[0].find("mydomain.com") == -1:
            continue
        c = cookielib.Cookie(0, item[4], item[5],
                             None, False,
                             item[0], item[0].startswith('.'), item[0].startswith('.'),
                             item[1], False,
                             item[2],
                             item[3], item[3] == "",
                             None, None, {})
        final_cookie.set_cookie(c)

cookie = cookielib.CookieJar()
input_file = "~/.mozilla/firefox/myprofile.default/cookies.sqlite"
get_cookies(cookie, input_file)

# print the cookies loaded from firefox
cookies = dict_from_cookiejar(cookie)
for key, value in cookies.items():
    print(key, value)

s = requests.Session()
payload = {
    "lang": "en",
    'destination': '/auth',
    'credential_0': sys.argv[1],
    'credential_1': sys.argv[2],
    'credential_2': '86400',
}
r = s.get("mydomain.com/login", data=payload)

# print the cookies set by requests
cookies = dict_from_cookiejar(s.cookies)
for key, value in cookies.items():
    print(key, value)
The structure of the cookies from Firefox is:
_gid GA1.3.2145214.241324
_ga GA1.3.125598754.422212
_gat_is4u 1
The structure of the cookies from requests is:
UISTestAuth tesskMpA8JJ23V43a%2FoFtdesrtsszpw
In the end, when trying to assign the Firefox cookies to session.cookies, requests behaves as if I had imported nothing.

It looks like there are two types of cookies in Firefox - request and response. This can be seen in Page Inspector > Network > login (POST) > Cookies:
Response cookies:
UISAuth
httpOnly true
path /
secure true
value tesskMpA8JJ23V43a%2FoFtdesrtsszpw
Request cookies:
_ga GA1.3.125598754.422212
_gat_is4u 1
_gid GA1.3.2145214.241324
The request cookies are stored in the cookies.sqlite file in
~/.mozilla/firefox/*.default/cookies.sqlite
and can be loaded into a Python object in several ways, for example:
import sqlite3
import http.cookiejar as cookielib

def get_cookies(cj, ff_cookies):
    con = sqlite3.connect(ff_cookies)
    cur = con.cursor()
    cur.execute("SELECT host, path, isSecure, expiry, name, value FROM moz_cookies")
    for item in cur.fetchall():
        c = cookielib.Cookie(0, item[4], item[5],
                             None, False,
                             item[0], item[0].startswith('.'), item[0].startswith('.'),
                             item[1], False,
                             item[2],
                             item[3], item[3] == "",
                             None, None, {})
        print(c)
        cj.set_cookie(c)
where cj is a CookieJar object and ff_cookies is the path to the Firefox cookies.sqlite. Taken from this page.
The whole code to load the cookies and import them into python requests using a session would look like:
import sys
import http.cookiejar as cookielib
import requests

cj = cookielib.CookieJar()
ff_cookies = sys.argv[1]  # pass the path to cookies.sqlite as an argument to the script
get_cookies(cj, ff_cookies)

s = requests.Session()
s.cookies = cj
The response cookie is basically a session ID, which usually expires at the end of the session (or after some timeout), so it is not stored.
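So after attaching the imported jar, the session cookie still has to be obtained by logging in through requests itself. A minimal sketch, reusing get_cookies() from above and the endpoint/field names from the question (the credentials are placeholders, and I assume the login form is POSTed):
import sys
import http.cookiejar as cookielib
import requests
from requests.utils import dict_from_cookiejar

cj = cookielib.CookieJar()
get_cookies(cj, sys.argv[1])  # path to cookies.sqlite, loaded as shown above

s = requests.Session()
s.cookies = cj  # carry the imported request cookies

# Log in so the server issues the session cookie (e.g. UISAuth) for this session.
payload = {
    'lang': 'en',
    'destination': '/auth',
    'credential_0': 'username',  # placeholder credentials
    'credential_1': 'password',
    'credential_2': '86400',
}
r = s.post('https://mydomain.com/login', data=payload)
print(dict_from_cookiejar(s.cookies))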

There is a package on PyPi, browser-cookie3, which does exactly this.
import browser_cookie3
import requests
cookiejar = browser_cookie3.firefox(domain_name='signed-in-website.tld')
resp = requests.get('https://signed-in-website.tld/path/', cookies=cookiejar)
print(resp.content)
browser_cookie3.firefox() retrieves the Firefox cookies as a cookiejar, with the domain name as an optional argument.
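If you need to make several requests, a small variation (an untested sketch) is to copy the jar into a Session once, so every request reuses the signed-in cookies:
import browser_cookie3
import requests

cj = browser_cookie3.firefox(domain_name='signed-in-website.tld')

s = requests.Session()
s.cookies.update(cj)  # RequestsCookieJar.update() accepts another cookie jar
resp = s.get('https://signed-in-website.tld/path/')
print(resp.status_code)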

Related

How do I pass an API Token into Cookie via Python Request?

I'm having issues authenticating to the pkgs.org API. A token was produced, but support mentioned it needs to be passed as a cookie. I've never worked with cookies before.
import requests
import json
import base64
import urllib3
import sys
import re
import os
token=('super-secret')
#s = requests.Session()
head = {'Accept':'application/json'}
r = requests.get('https://api.pkgs.org/v1/distributions', auth=(token), headers=head)
print(r)
print(r.connection)
print(r.cookies)
I tried to use the requests.Session method to handle the cookie, but I honestly don't know the syntax for how to even create a cookie, let alone pass one.
If I read the API documentation correctly, you should set the access_token cookie:
import requests

token = "super-secret"
cookies = {"access_token": token}
headers = {"Accept": "application/json"}
r = requests.get(
    "https://api.pkgs.org/v1/distributions", cookies=cookies, headers=headers
)
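A hedged variation, if you plan to make more than one call: set the cookie on a Session once, so it is sent automatically with every request.
import requests

token = "super-secret"

s = requests.Session()
s.cookies.set("access_token", token)
s.headers.update({"Accept": "application/json"})

r = s.get("https://api.pkgs.org/v1/distributions")
print(r.status_code)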

Get cookie using aiohttp

I am trying to get cookies from the browser using aiohttp. From the docs and from googling I have only found articles about setting cookies in aiohttp.
In Flask I would get the cookies as simply as:
cookie = request.cookies.get('name_of_cookie')
# do something with cookie
Is there a simple way to fetch the cookie from browser using aiohttp?
Is there a simple way to fetch the cookie from the browser using aiohttp?
Not sure whether this counts as simple, but there is a way:
import asyncio
import aiohttp

async def main():
    urls = [
        'http://httpbin.org/cookies/set?test=ok',
    ]
    for url in urls:
        async with aiohttp.ClientSession(cookie_jar=aiohttp.CookieJar()) as s:
            async with s.get(url) as r:
                print('JSON', await r.json())
            cookies = s.cookie_jar.filter_cookies('http://httpbin.org')
            for key, cookie in cookies.items():
                print('Key: "%s", Value: "%s"' % (cookie.key, cookie.value))

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
The program generates the following output:
JSON: {'cookies': {'test': 'ok'}}
Key: "test", Value: "ok"
Example adapted from https://aiohttp.readthedocs.io/en/stable/client_advanced.html#custom-cookies + https://docs.aiohttp.org/en/stable/client_advanced.html#cookie-jar
Now if you want to do a request using a previously set cookie:
import asyncio
import aiohttp
url = 'http://example.com'
# Filtering for the cookie, saving it into a varibale
async with aiohttp.ClientSession(cookie_jar=aiohttp.CookieJar()) as s:
cookies = s.cookie_jar.filter_cookies('http://example.com')
for key, cookie in cookies.items():
if key == 'test':
cookie_value = cookie.value
# Using the cookie value to do anything you want:
# e.g. sending a weird request containing the cookie in the header instead.
headers = {"Authorization": "Basic f'{cookie_value}'"}
async with s.get(url, headers=headers) as r:
print(await r.json())
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
For testing urls containing a host part made up by an IP address use aiohttp.ClientSession(cookie_jar=aiohttp.CookieJar(unsafe=True)), according to https://github.com/aio-libs/aiohttp/issues/1183#issuecomment-247788489
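A minimal sketch of that unsafe-jar case, assuming a local test server reachable over a bare IP (the URL and endpoint are illustrative):
import asyncio
import aiohttp

async def main():
    # Cookies set by an IP-address host are only stored when unsafe=True.
    jar = aiohttp.CookieJar(unsafe=True)
    async with aiohttp.ClientSession(cookie_jar=jar) as s:
        async with s.get('http://127.0.0.1:8080/set-cookie') as r:  # hypothetical endpoint
            await r.read()
        print(jar.filter_cookies('http://127.0.0.1:8080'))

asyncio.run(main())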
Yes, the cookies are stored in request.cookies as a dict, just like in flask, so request.cookies.get('name_of_cookie') works the same.
In the examples section of the aiohttp repository there is a file, web_cookies.py, that shows how to retrieve, set, and delete a cookie. Here's the section from that script that reads the cookies and returns them to the template as a preformatted string:
from pprint import pformat
from aiohttp import web

tmpl = '''\
<html>
    <body>
        Login<br/>
        Logout<br/>
        <pre>{}</pre>
    </body>
</html>'''

async def root(request):
    resp = web.Response(content_type='text/html')
    resp.text = tmpl.format(pformat(request.cookies))
    return resp
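For completeness, the set/delete side that the same example covers looks roughly like this (a sketch based on that script; the route logic and cookie name are illustrative):
from aiohttp import web

async def login(request):
    resp = web.HTTPFound(location='/')
    resp.set_cookie('AUTH_TOKEN', 'some-value')  # set a cookie on the redirect response
    return resp

async def logout(request):
    resp = web.HTTPFound(location='/')
    resp.del_cookie('AUTH_TOKEN')  # ask the browser to drop it
    return resp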
You can get the cookie value, domain, path, etc. without having to loop through all the cookies.
s.cookie_jar._cookies
gives you all the cookies in a defaultdict with the domains as keys and their respective cookies as values. aiohttp uses SimpleCookie
So, to get the value of a cookie:
s.cookie_jar._cookies.get("https://httpbin.org")["cookie_name"].value
and for the domain and path:
s.cookie_jar._cookies.get("https://httpbin.org")["cookie_name"]["domain"]
s.cookie_jar._cookies.get("https://httpbin.org")["cookie_name"]["path"]
more info can be found here: https://docs.python.org/3/library/http.cookies.html
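If you prefer not to rely on the private _cookies attribute, the value at least can also be read through the public filter_cookies() API (a sketch; "cookie_name" is a placeholder):
morsels = s.cookie_jar.filter_cookies("https://httpbin.org")
if "cookie_name" in morsels:
    print(morsels["cookie_name"].value)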

Flask JWT Extended cookie name overide Flask Session Cookie Name

I am using the Flask JWT Extended extension for flask and have built a login application successfully using JWT. I have gone through the tutorial on JWT in Cookies on the JWT extended documentation site using CSRF protection and everything.
What I can't seem to figure out is that, when using the set_access_cookies() and set_refresh_cookies() methods, the JWTs are not saved in an httponly cookie named using the JWT Extended default configuration settings:
app.config.setdefault('JWT_ACCESS_COOKIE_NAME', 'access_token_cookie')
app.config.setdefault('JWT_REFRESH_COOKIE_NAME', 'refresh_token_cookie')
Instead, when I debug the return from the auth call, the cookies are saved under the base Flask default configuration:
'SESSION_COOKIE_NAME': 'session',
Shouldn't the set_access_cookies() and set_refresh_cookies() methods override the base Flask defaults, as long as I make sure to register my app with JWTManager()?
uscc_login_app = Flask(__name__)
jwt = JWTManager(uscc_login_app)
Or is there something else I missed in the Flask JWT Extended documentation to ensure that its configuration defaults are used when appropriate?
Updated code per request.
The code is pretty spread out, but here is my best shot at including what I think will help.
In __init__.py:
from flask import Flask, url_for
from flask_restful import Api
from flask_jwt_extended import JWTManager
from resources.auth import Authenticate
from resources.refresh import Refresh
from temp_app.temp import TempApp
from uscc_login.uscc_app_login import *
uscc_login_app = Flask(__name__)
uscc_login_app.config.from_object(os.environ.get('FLASK_ENV'))
jwt = JWTManager(uscc_login_app)
api = Api(uscc_login_app, prefix='/v1')
# Add resources via the add_resource method
api.add_resource(Authenticate, '/login')
api.add_resource(Refresh, '/refresh_token')
login_view = Login.as_view(name='uscc_login')
uscc_login_app.add_url_rule('/login', view_func=login_view, methods=['POST', 'GET'])
In my app.py:
from uscc_login import uscc_login_app
if __name__ == '__main__':
    uscc_login_app.run(debug=uscc_login_app.config.get('DEBUG'), threaded=uscc_login_app.config.get('THREADED'),
                       port=uscc_login_app.config.get('PORT'), host=uscc_login_app.config.get('HOST'))
In my config.py, since I am using Flask's config.from_object:
import os
import datetime

uscc_login_app_dir = os.path.abspath(os.path.dirname(__file__))

class BaseConfig:
    SECRET_KEY = os.environ.get('USCC_SECRET_KEY') or 'you-will-never-guess'
    JWT_SECRET_KEY = os.environ.get('USCC_JWT_KEY') or 'super-secret'
    JWT_TOKEN_LOCATION = ['cookies']
    JWT_COOKIE_CSRF_PROTECT = True
    JWT_HEADER_TYPE = 'JWT'
    PROPAGATE_EXCEPTIONS = True
    THREADED = True

class DevelopmentConfig(BaseConfig):
    DEBUG = True
    PORT = 5000 if os.environ.get("PORT") is None else int(os.environ.get("PORT"))
    HOST = os.environ.get('HOST') or 'localhost'
    if os.environ.get('access_token_expiration') is not None:
        JWT_ACCESS_TOKEN_EXPIRES = datetime.timedelta(seconds=int(os.environ.get('access_token_expiration')))
    if os.environ.get('refresh_token_expiration') is not None:
        JWT_REFRESH_TOKEN_EXPIRES = datetime.timedelta(seconds=int(os.environ.get('refresh_token_expiration')))
So then in my Flask MethodView that contains my login authorization POST I have the following:
auth.py
import sys
import os

from flask import jsonify, request
from flask_restful import Resource
from flask_jwt_extended import create_access_token, create_refresh_token, jwt_refresh_token_required, get_jwt_identity, \
    set_access_cookies, set_refresh_cookies

from utilities import Common

class Authenticate(Resource):
    @staticmethod
    def post():
        """
        :return:
        """
        api_cred_path = os.environ.get('api_cred_path')
        if api_cred_path is None:
            response = jsonify({"msg": "Environment Variable 'api_cred_path' is not set."})
            response.status_code = 500
            return response
        if not request.is_json:
            response = jsonify({'msg': 'Missing JSON in request'})
            response.status_code = 400
            return response
        params = request.get_json()
        user_name = params.get('username')
        user_password = params.get('password')
        if not user_name:
            response = jsonify({'msg': 'Missing username parameter'})
            response.status_code = 400
            return response
        if not user_password:
            response = jsonify({'msg': 'Missing password parameter'})
            response.status_code = 400
            return response
        if Common.check_path_exists(api_cred_path):
            with open(api_cred_path) as afh:
                for line in afh:
                    file_userid, file_password = line.split('=')
                    if file_userid == user_name and file_password.strip('\n') == user_password:
                        access_token = create_access_token(identity=user_name)
                        refresh_token = create_refresh_token(identity=user_name)
                        response = jsonify({'login': True})
                        set_access_cookies(response, access_token)
                        set_refresh_cookies(response, refresh_token)
                        # # Identity can be any data that is json serializable
                        # art = {
                        #     'access_token': create_access_token(identity=user_name),
                        #     'refresh_token': create_refresh_token(identity=user_name)}
                        # response = jsonify(art)
                        response.status_code = 200
                        return response
        else:
            response = jsonify({"msg": "api_cred_path invalid."})
            response.status_code = 500
            return response
        response = jsonify({'msg': 'Bad username or password'})
        response.status_code = 401
        return response
Could you provide some code to duplicate what you are seeing? When I run the JWT-in-cookies example (https://github.com/vimalloc/flask-jwt-extended/blob/master/examples/jwt_in_cookie.py) I see the expected cookie values when I log in:
$ http :5000/token/auth username=test password=test
...
Set-Cookie: access_token_cookie=<jwt>; HttpOnly; Path=/api/
Set-Cookie: refresh_token_cookie=<jwt>; HttpOnly; Path=/token/refresh
...
So I realized my mistake in this. I was trying to get the access_token_cookie to be set from my auth.py, which serves as the RESTful microservice that my login app calls to do the authorization. I realized it won't be available after redirecting back to the caller from the login app's POST method, since the cookie was related to the login app's UI frontend. So I just pass the access and refresh tokens back from the auth.py POST method to the login POST method and let it set the cookies, so they are available to the end client.
This was more of a design problem than a code problem.
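A rough sketch of that design, in case it helps (the service URL and token field names are assumptions based on the commented-out code above, and the login app still needs the JWT cookie configuration shown earlier):
import requests
from flask import Flask, jsonify, request
from flask_jwt_extended import JWTManager, set_access_cookies, set_refresh_cookies

login_app = Flask(__name__)
login_app.config['JWT_TOKEN_LOCATION'] = ['cookies']
login_app.config['JWT_SECRET_KEY'] = 'super-secret'
jwt = JWTManager(login_app)

@login_app.route('/login', methods=['POST'])
def login():
    # Ask the auth microservice for the tokens instead of letting it set cookies.
    auth_resp = requests.post('http://auth-service/v1/login', json=request.get_json())
    tokens = auth_resp.json()  # assumed to contain 'access_token' and 'refresh_token'
    # Set the cookies here, on the response the browser actually receives.
    resp = jsonify({'login': True})
    set_access_cookies(resp, tokens['access_token'])
    set_refresh_cookies(resp, tokens['refresh_token'])
    return resp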

Changing Scrapy/Splash user agent

How can I set the user agent for Scrapy with Splash in an equivalent way like below:
import requests
from bs4 import BeautifulSoup
ua = {"User-Agent":"Mozilla/5.0"}
url = "http://www.example.com"
page = requests.get(url, headers=ua)
soup = BeautifulSoup(page.text, "lxml")
My spider would look similar to this:
import scrapy
from scrapy_splash import SplashRequest

class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ["https://www.example.com/"]

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(
                url,
                self.parse,
                args={'wait': 0.5}
            )
You need to set the user_agent attribute to override the default user agent:
class ExampleSpider(scrapy.Spider):
    name = 'example'
    user_agent = 'Mozilla/5.0'
In this case UserAgentMiddleware (which is enabled by default) will override the USER_AGENT setting value with 'Mozilla/5.0'.
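If you'd rather set it project-wide than per spider, the same value can also go into the project's settings.py (handled by the same middleware):
# settings.py
USER_AGENT = 'Mozilla/5.0'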
You can also override headers per request:
scrapy_splash.SplashRequest(url, headers={'User-Agent': custom_user_agent})
The proper way is to alter the Splash script to include it... no harm in adding it to the spider though, if that works as well.
http://splash.readthedocs.io/en/stable/scripting-ref.html?highlight=agent
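A sketch of what that might look like with scrapy-splash's 'execute' endpoint, using splash:set_user_agent from the scripting reference above (untested; adapt it to your spider):
import scrapy
from scrapy_splash import SplashRequest

lua_script = """
function main(splash, args)
    splash:set_user_agent('Mozilla/5.0')
    assert(splash:go(args.url))
    assert(splash:wait(0.5))
    return {html = splash:html()}
end
"""

class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ["https://www.example.com/"]

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(
                url,
                self.parse,
                endpoint='execute',
                args={'lua_source': lua_script},
            )

    def parse(self, response):
        self.log(response.text[:200])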
If you use pure Splash (not the scrapy-splash package), you can just pass a headers param with a 'User-Agent' key, and all the requests on that page will use this user agent.
https://splash.readthedocs.io/en/stable/api.html?highlight=User-Agent
Here is an example:
import requests
import json

headers = {
    'User-Agent': 'Mozilla/5.0',
}
param = {
    'url': your_aim_url,
    'headers': headers,
    'html': 1,
    'har': 1,
    'response_body': 1,
}
session = requests.Session()
session.headers.update({'Content-Type': 'application/json'})
response = session.post(url='http://127.0.0.1:8050/render.json', json=param)
response_json = json.loads(response.text)
print(response_json.get('html'))  # page html
print(response_json.get('har'))   # har with response body; if you do not want the response body, set 'response_body' to 0
You can check the request header in har to see if the user-agent is correct.

How to write these 3 sentences with requests instead of urllib2? [duplicate]

Just a short, simple one about the excellent Requests module for Python.
I can't seem to find in the documentation what the variable 'proxies' should contain. When I send it a dict with a standard "IP:PORT" value it rejected it asking for 2 values.
So, I guess (because this doesn't seem to be covered in the docs) that the first value is the ip and the second the port?
The docs mention this only:
proxies – (optional) Dictionary mapping protocol to the URL of the proxy.
So I tried this... what should I be doing?
proxy = { ip: port}
and should I convert these to some type before putting them in the dict?
r = requests.get(url,headers=headers,proxies=proxy)
The proxies dict syntax is {"protocol": "scheme://ip:port", ...}. With it you can specify different (or the same) proxies for requests made over the http, https, and ftp protocols:
http_proxy = "http://10.10.1.10:3128"
https_proxy = "https://10.10.1.11:1080"
ftp_proxy = "ftp://10.10.1.10:3128"
proxies = {
"http" : http_proxy,
"https" : https_proxy,
"ftp" : ftp_proxy
}
r = requests.get(url, headers=headers, proxies=proxies)
Deduced from the requests documentation:
Parameters:
method – method for the new Request object.
url – URL for the new Request object.
...
proxies – (optional) Dictionary mapping protocol to the URL of the proxy.
...
On Linux you can also do this via the HTTP_PROXY, HTTPS_PROXY, and FTP_PROXY environment variables:
export HTTP_PROXY=10.10.1.10:3128
export HTTPS_PROXY=10.10.1.11:1080
export FTP_PROXY=10.10.1.10:3128
On Windows:
set http_proxy=10.10.1.10:3128
set https_proxy=10.10.1.11:1080
set ftp_proxy=10.10.1.10:3128
You can refer to the proxy documentation here.
If you need to use a proxy, you can configure individual requests with the proxies argument to any request method:
import requests

proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "https://10.10.1.10:1080",
}

requests.get("http://example.org", proxies=proxies)
To use HTTP Basic Auth with your proxy, use the http://user:password@host.com/ syntax:
proxies = {
    "http": "http://user:pass@10.10.1.10:3128/"
}
I have found that urllib has some really good code to pick up the system's proxy settings, and they happen to be in the correct form to use directly. You can use it like this:
import urllib
...
r = requests.get('http://example.org', proxies=urllib.request.getproxies())
It works really well and urllib knows about getting Mac OS X and Windows settings as well.
The accepted answer was a good start for me, but I kept getting the following error:
AssertionError: Not supported proxy scheme None
The fix was to specify the http:// in the proxy URL, thus:
http_proxy = "http://194.62.145.248:8080"
https_proxy = "https://194.62.145.248:8080"
ftp_proxy = "10.10.1.10:3128"
proxyDict = {
"http" : http_proxy,
"https" : https_proxy,
"ftp" : ftp_proxy
}
I'd be interested as to why the original works for some people but not me.
Edit: I see the main answer is now updated to reflect this :)
If you'd like to persist cookies and session data, you'd best do it like this:
import requests

proxies = {
    'http': 'http://user:pass@10.10.1.0:3128',
    'https': 'https://user:pass@10.10.1.0:3128',
}

# Create the session and set the proxies.
s = requests.Session()
s.proxies = proxies

# Make the HTTP request through the session.
r = s.get('http://www.showmemyip.com/')
8 years late. But I like:
import os
import requests
os.environ['HTTP_PROXY'] = os.environ['http_proxy'] = 'http://http-connect-proxy:3128/'
os.environ['HTTPS_PROXY'] = os.environ['https_proxy'] = 'http://http-connect-proxy:3128/'
os.environ['NO_PROXY'] = os.environ['no_proxy'] = '127.0.0.1,localhost,.local'
r = requests.get('https://example.com') # , verify=False
The documentation gives a very clear example of proxies usage:
import requests

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}

requests.get('http://example.org', proxies=proxies)
What isn't documented, however, is the fact that you can even configure proxies for individual URLs, even if the scheme is the same!
This comes in handy when you want to use different proxies for the different websites you wish to scrape.
proxies = {
    'http://example.org': 'http://10.10.1.10:3128',
    'http://something.test': 'http://10.10.1.10:1080',
}

requests.get('http://something.test/some/url', proxies=proxies)
Additionally, requests.get essentially uses a requests.Session under the hood, so if you need more control, use it directly:
import requests

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}

session = requests.Session()
session.proxies.update(proxies)

session.get('http://example.org')
I use it to set a fallback (a default proxy) that handles all traffic that doesn't match the schemes/URLs specified in the dictionary:
import requests

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}

session = requests.Session()
session.proxies.setdefault('http', 'http://127.0.0.1:9009')
session.proxies.update(proxies)

session.get('http://example.org')
I just made a proxy grabber that can also connect with the grabbed proxy without any further input. Here it is:
# Import modules
from termcolor import colored
from selenium import webdriver
import requests
import os
import sys
import time

# Proxy grab
options = webdriver.ChromeOptions()
options.add_argument('headless')
driver = webdriver.Chrome(chrome_options=options)
driver.get("https://www.sslproxies.org/")
tbody = driver.find_element_by_tag_name("tbody")
cell = tbody.find_elements_by_tag_name("tr")
for column in cell:
    column = column.text.split(" ")
    print(colored(column[0] + ":" + column[1], 'yellow'))
driver.quit()
print("")

os.system('clear')
os.system('cls')

# Proxy connection
print(colored('Getting proxies from the grabber...', 'green'))
time.sleep(2)
os.system('clear')
os.system('cls')
proxy = {"http": "http://" + column[0] + ":" + column[1]}
url = 'https://mobile.facebook.com/login'
r = requests.get(url, proxies=proxy)
print("")
print(colored('Connecting using proxy', 'green'))
print("")
sts = r.status_code
Here is my basic class in Python for the requests module with some proxy configs and a stopwatch!
import requests
import time

class BaseCheck():
    def __init__(self, url):
        self.http_proxy = "http://user:pw@proxy:8080"
        self.https_proxy = "http://user:pw@proxy:8080"
        self.ftp_proxy = "http://user:pw@proxy:8080"
        self.proxyDict = {
            "http": self.http_proxy,
            "https": self.https_proxy,
            "ftp": self.ftp_proxy
        }
        self.url = url

        def makearr(tsteps):
            global stemps
            global steps
            stemps = {}
            for step in tsteps:
                stemps[step] = {'start': 0, 'end': 0}
            steps = tsteps
        makearr(['init', 'check'])

        def starttime(typ=""):
            for stemp in stemps:
                if typ == "":
                    stemps[stemp]['start'] = time.time()
                else:
                    stemps[stemp][typ] = time.time()
        starttime()

    def __str__(self):
        return str(self.url)

    def getrequests(self):
        g = requests.get(self.url, proxies=self.proxyDict)
        print(g.status_code)
        print(g.content)
        print(self.url)
        stemps['init']['end'] = time.time()
        # print(stemps['init']['end'] - stemps['init']['start'])
        x = stemps['init']['end'] - stemps['init']['start']
        print(x)

test = BaseCheck(url='http://google.com')
test.getrequests()
It’s a bit late but here is a wrapper class that simplifies scraping proxies and then making an http POST or GET:
ProxyRequests
https://github.com/rootVIII/proxy_requests
Already tested, the following code works. You need to use HTTPProxyAuth.
import requests
from requests.auth import HTTPProxyAuth

USE_PROXY = True
proxy_user = "aaa"
proxy_password = "bbb"
http_proxy = "http://your_proxy_server:8080"
https_proxy = "http://your_proxy_server:8080"
proxies = {
    "http": http_proxy,
    "https": https_proxy
}

def test(name):
    print(f'Hi, {name}')  # Press Ctrl+F8 to toggle the breakpoint.
    # Create the session and set the proxies.
    session = requests.Session()
    if USE_PROXY:
        session.trust_env = False
        session.proxies = proxies
        session.auth = HTTPProxyAuth(proxy_user, proxy_password)
    r = session.get('https://www.stackoverflow.com')
    print(r.status_code)

if __name__ == '__main__':
    test('aaa')
I'll share some code for fetching proxies from the site https://free-proxy-list.net/ and storing them in a file compatible with tools like "Elite Proxy Switcher" (format IP:PORT):
## PROXY_UPDATER - get free proxies from https://free-proxy-list.net/
from lxml.html import fromstring
import requests
from itertools import cycle
import traceback
import re

###################### FIND PROXIES ########################################
def get_proxies():
    url = 'https://free-proxy-list.net/'
    response = requests.get(url)
    parser = fromstring(response.text)
    proxies = set()
    for i in parser.xpath('//tbody/tr')[:299]:  # 299 proxies max
        proxy = ":".join([i.xpath('.//td[1]/text()')[0],
                          i.xpath('.//td[2]/text()')[0]])
        proxies.add(proxy)
    return proxies

###################### write to file in format IP:PORT #####################
try:
    proxies = get_proxies()
    f = open('proxy_list.txt', 'w')
    for proxy in proxies:
        f.write(proxy + '\n')
    f.close()
    print("DONE")
except:
    print("MAJOR ERROR")
