The code below needs to return 200, but an error occurs for some domains.
import requests
url1 = 'https://www.pontofrio.com.br/'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
'AppleWebKit/537.11 (KHTML, like Gecko) '
'Chrome/23.0.1271.64 Safari/537.11',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
'Accept-Encoding': 'none',
'Accept-Language': 'en-US,en;q=0.8',
'Connection': 'keep-alive'}
response = requests.get(url1, headers, timeout=10)
print(response.status_code)
Return:
Traceback (most recent call last):
File "C:\Python34\lib\site-packages\urllib3\connectionpool.py", line 384, in _make_request
six.raise_from(e, None)
File "<string>", line 2, in raise_from
File "C:\Python34\lib\site-packages\urllib3\connectionpool.py", line 380, in _make_request
httplib_response = conn.getresponse()
File "C:\Python34\lib\http\client.py", line 1148, in getresponse
response.begin()
File "C:\Python34\lib\http\client.py", line 352, in begin
version, status, reason = self._read_status()
File "C:\Python34\lib\http\client.py", line 314, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "C:\Python34\lib\socket.py", line 371, in readinto
return self._sock.recv_into(b)
File "C:\Python34\lib\site-packages\urllib3\contrib\pyopenssl.py", line 309, in recv_into
return self.recv_into(*args, **kwargs)
File "C:\Python34\lib\site-packages\urllib3\contrib\pyopenssl.py", line 307, in recv_into
raise timeout('The read operation timed out')
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Python34\lib\site-packages\requests\adapters.py", line 449, in send
timeout=timeout
File "C:\Python34\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "C:\Python34\lib\site-packages\urllib3\util\retry.py", line 367, in increment
raise six.reraise(type(error), error, _stacktrace)
File "C:\Python34\lib\site-packages\urllib3\packages\six.py", line 686, in reraise
raise value
File "C:\Python34\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
chunked=chunked)
File "C:\Python34\lib\site-packages\urllib3\connectionpool.py", line 386, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "C:\Python34\lib\site-packages\urllib3\connectionpool.py", line 306, in _raise_timeout
raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='www.pontofrio.com.br', port=443): Read timed out. (read timeout=10)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:/teste.py", line 219, in <module>
url = montaurl(dominio)
File "c:/teste.py", line 81, in montaurl
response = requests.get(url1, headers, timeout=10)
File "C:\Python34\lib\site-packages\requests\api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "C:\Python34\lib\site-packages\requests\api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Python34\lib\site-packages\requests\sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "C:\Python34\lib\site-packages\requests\sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "C:\Python34\lib\site-packages\requests\adapters.py", line 529, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.pontofrio.com.br', port=443): Read timed out. (read timeout=10)
Domain that works:
https://www.pichau.com.br/
Domains that don't work:
casasbahia.com.br
extra.com.br
boticario.com.br
I believe it is some kind of block on the pontofrio server. How can I get around this?
There seemed to be a couple of issues, the first being how the headers were set. The call below doesn't actually pass the custom headers to requests.get; the second positional parameter of requests.get is params, so the dict ends up being sent as query-string parameters instead of headers.
response = requests.get(url1, headers, timeout=10)
This can be tested against httpbin:
import requests
url1 = 'https://httpbin.org/headers'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
'AppleWebKit/537.11 (KHTML, like Gecko) '
'Chrome/23.0.1271.64 Safari/537.11',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
'Accept-Encoding': 'none',
'Accept-Language': 'en-US,en;q=0.8',
'Connection': 'keep-alive'
}
response = requests.get(url1, headers, timeout=10)
print(response.text)
print(response.status_code)
Which outputs:
{
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.25.1",
"X-Amzn-Trace-Id": "Root=1-608a0391-3f1cfa79444ac04865ad9111"
}
}
200
To properly set the headers argument:
response = requests.get(url1, headers=headers, timeout=10)
Let's test:
import requests
url1 = 'https://httpbin.org/headers'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
'AppleWebKit/537.11 (KHTML, like Gecko) '
'Chrome/23.0.1271.64 Safari/537.11',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
'Accept-Encoding': 'none',
'Accept-Language': 'en-US,en;q=0.8',
'Connection': 'keep-alive'
}
response = requests.get(url1, headers=headers, timeout=10)
print(response.text)
print(response.status_code)
Here's the output:
{
"headers": {
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.3",
"Accept-Encoding": "none",
"Accept-Language": "en-US,en;q=0.8",
"Host": "httpbin.org",
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
"X-Amzn-Trace-Id": "Root=1-608a0533-40c8281f5faa85d1050c6b6a"
}
}
200
Finally, the order of the headers, and the 'Connection': 'keep-alive' header in particular, were causing problems. Once I reordered the headers and removed the Connection header, it started working on all of the URLs.
Here's the code I used to test:
import requests
urls = ['https://www.pontofrio.com.br/',
'https://www.casasbahia.com.br',
'https://www.extra.com.br',
'https://www.boticario.com.br']
headers = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4491.0 Safari/537.36'}
for url1 in urls:
print("Trying url: %s" % url1)
response = requests.get(url1, headers=headers, timeout=10)
print(response.status_code)
And the output:
Trying url: https://www.pontofrio.com.br/
200
Trying url: https://www.casasbahia.com.br
200
Trying url: https://www.extra.com.br
200
Trying url: https://www.boticario.com.br
200
I tried accessing the page with wget, but without success. The problem seems to be that the server only responds to HTTP/2 requests.
Test with curl:
This times out:
$ curl --http1.1 -A "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/81.0" "https://www.pontofrio.com.br/"
# times out
This succeeds (note the --http2 parameter):
$ curl --http2 -A "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/81.0" "https://www.pontofrio.com.br/"
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:og="http://ogp.me/ns#" xmlns:fb="http://www.facebook.com/2008/fbml">
...
Unfortunately, the requests module doesn't support HTTP/2. You can, however, use the httpx module, which has experimental HTTP/2 support:
import httpx
import asyncio
headers = {
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0",
}
async def get_text(url):
async with httpx.AsyncClient(http2=True, headers=headers) as client:
r = await client.get(url)
return r.text
txt = asyncio.run(get_text("https://www.pontofrio.com.br/"))
print(txt)
Prints:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:og="http://ogp.me/ns#" xmlns:fb="http://www.facebook.com/2008/fbml">
...
To install the httpx module with HTTP/2 support, use for example: pip install httpx[http2]
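If you don't need async, httpx also offers a synchronous Client that accepts the same http2 flag. A minimal sketch of the equivalent blocking request:
import httpx

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0",
}

# Same request as above, but blocking; http2=True enables the HTTP/2
# transport that this server appears to require.
with httpx.Client(http2=True, headers=headers) as client:
    r = client.get("https://www.pontofrio.com.br/")
    print(r.http_version)  # expect "HTTP/2"
    print(r.status_code)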
Hello, I am attempting to reach https://api.louisvuitton.com/api/eng-us/catalog/availability/M80016 through a session using requests in Python. Currently I am unable to reach it and get an error of Remote end closed connection without response.
I have been trying to debug but haven't been successful. Below is my code and the output.
Code:
import requests
from requests.auth import HTTPBasicAuth
import json
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)'}
s = requests.Session()
r = s.get("https://us.louisvuitton.com/eng-us/products/pocket-organizer-damier-graphite-nvprod2630093v#N60432",headers=headers)
if r:
print("Requested Successfully")
else:
print("Request Failed ==> " + str(r))
exit()
url2 = "https://api.qubit.com/graphql"
payload = json.dumps({
"query": "query ($trackingId: String!, $contextId: String!) {\n property(trackingId: $trackingId) {\n visitor(contextId: $contextId) {\n ipAddress\n ipLocation: location {\n city\n cityCode\n country\n countryCode\n latitude\n longitude\n area\n areaCode\n region\n regionCode\n }\n segment: segments {\n state\n }\n history {\n conversionCycleNumber: conversionCycle\n conversionNumber: conversions\n entranceNumber: entrances\n firstConversionTs: firstConversion\n firstViewTs: firstView\n lastConversionTs: lastConversion\n lastViewTs: lastView\n lifetimeValue\n sessionNumber: sessions\n viewNumber: views\n }\n }\n }\n}",
"variables": {
"trackingId": "louisvuitton_prod",
"contextId": "o6vfrf9jm4g-0k999shdp-fiadwa4"
}})
headers2 = {
'Content-Type': 'application/json'
}
x = s.post(url2,headers=headers2, data=payload)
if x:
print("Post Successfully")
else:
print("Post Failed ==> " + str(x))
exit()
headers3 = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)',
'Accept': "*/*",
'Cache-Control': "no-cache",
'Host': "api.louisvuitton.com",
'Accept-Encoding': "gzip, deflate",
'Connection': "keep-alive",
'cache-control': "no-cache",
'Content-Type': 'application/json'
}
cookies = s.cookies
t = s.get("https://api.louisvuitton.com/api/eng-us/catalog/availability/M80016",headers=headers3,cookies=cookies)
if t:
print("Get Successfully")
else:
print("Get Failed ==> " + str(t))
exit()
Output
Requested Successfully
Post Successfully
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/urllib3-1.25.10-py3.8.egg/urllib3/connectionpool.py", line 670, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/site-packages/urllib3-1.25.10-py3.8.egg/urllib3/connectionpool.py", line 426, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.8/site-packages/urllib3-1.25.10-py3.8.egg/urllib3/connectionpool.py", line 421, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1347, in getresponse
response.begin()
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 307, in begin
version, status, reason = self._read_status()
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 276, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
Does anyone have a clue or an idea how to resolve this issue? I would appreciate any help.
If you inspect the cookies on the webpage in Chrome via Inspect Element -> Application -> Storage -> Cookies -> https://us.louisvuitton.com/, you see about 40 cookies. However, if you add import pprint to your code and, at line 50, pprint.pprint(s.cookies.get_dict()), you see only 4 cookies. So you are missing many cookies.
The response you get is actually an Access Denied message, as you can see if you use Inspect Element -> Network -> Copy as cURL on the https://api.louisvuitton.com/api/eng-us/catalog/availability/nvprod... URL and remove all the cookies except your 4 and run it; if you run it with all the cookies, it works fine.
Since there are many XHR requests that can set cookies, I suggest you either go through all the requests, decoding them if needed, and read all the JavaScript files to see if they set cookies, or, as a much easier solution, use Selenium, requests-html (https://pypi.org/project/requests-html/), or PyQt.
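For example, here is a minimal Selenium-based sketch (assuming Chrome with a matching chromedriver is installed; the header values are just illustrative) that loads the product page in a real browser and copies its cookies into the requests session before calling the API:
import requests
from selenium import webdriver

# Load the product page in a real browser so that every JavaScript-set
# cookie exists, then copy them all into the requests session.
driver = webdriver.Chrome()  # assumes chromedriver is on PATH
driver.get("https://us.louisvuitton.com/eng-us/products/pocket-organizer-damier-graphite-nvprod2630093v#N60432")

s = requests.Session()
for c in driver.get_cookies():  # all ~40 cookies, not just the 4 requests saw
    s.cookies.set(c["name"], c["value"], domain=c.get("domain"))
driver.quit()

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)',
           'Accept': '*/*'}
t = s.get("https://api.louisvuitton.com/api/eng-us/catalog/availability/M80016", headers=headers)
print(t.status_code)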
I am trying to log in to Instagram through Tor (I am using Python 3 on a Linux machine, if this helps!).
Here is the code:
import json
import requests
import os
from colorama import Fore
from stem import Signal
from stem.control import Controller
def tor_session():
session = requests.session()
session.proxies['http'] = 'socks5h://localhost:9050'
session.proxies['https'] = 'socks5h://localhost:9050'
return session
def login(username, password):
# params:
# [string]username- the username of the instagram account to log in to
# [string]password- the password to use in the log in process
# description:
# logs in to the account with the specified username and with the specified password
# session setup
sess = tor_session()
sess.cookies.update({
'sessionid': '',
'mid': '',
'ig_pr': '1',
'ig_vw': '1920',
'csrftoken': '',
's_network': '',
'ds_user_id': ''
})
sess.headers.update({
'UserAgent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
'x-instagram-ajax': '1',
'X-Requested-With': 'XMLHttpRequest',
'origin': 'https://www.instagram.com',
'ContentType': 'application/x-www-form-urlencoded',
'Connection': 'keep-alive',
'Accept': '*/*',
'Referer': 'https://www.instagram.com',
'authority': 'www.instagram.com',
'Host': 'www.instagram.com',
'Accept-Language': 'ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4',
'Accept-Encoding': 'gzip, deflate'
})
# get csrftoken and the instagram main page
r = sess.get('https://www.instagram.com/')
sess.headers.update({'X-CSRFToken': r.cookies.get_dict()['csrftoken']})
# log in
data = {'username': username, 'password': password}
r = sess.post('https://www.instagram.com/accounts/login/ajax/', data=data, allow_redirects=True)
token = r.cookies.get_dict()['csrftoken']
sess.headers.update({'X-CSRFToken': token})
# parse the response from the log in
data = json.loads(r.text)
print(data)
if data['status'] == 'fail':
return None
if data['authenticated']:
return True
else:
return False
login("username", "password")
The problem is that almost every time I have tried to run this, it didn't work and threw an exception:
Traceback (most recent call last):
File "main.py", line 156, in <module>
main()
File "main.py", line 152, in main
brute_force(username, pass_file_path)
File "main.py", line 114, in brute_force
logged_in = login(username, password)
File "main.py", line 81, in login
sess.headers.update({'X-CSRFToken': r.cookies.get_dict()['csrftoken']})
KeyError: 'csrftoken'
and sometimes it threw this exception:
File "main.py", line 94, in login
if data['authenticated']:
KeyError: 'authenticated'
How can I fix this? I tried restarting Tor and changing its configs, but nothing works. Please help if you can!
It appears that Instagram doesn't set cookies for Tor users:
>>> s = your_setup_code_for_session()
>>> r = s.get('https://www.instagram.com')
>>> r.cookies.get_dict()
{}
I also tested this using the Tor Browser and got the same results.
It looks like you'll need to use a VPN or a Tor + VPN combination.
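In the meantime you can at least fail cleanly instead of crashing. A minimal sketch, reusing the sess object from the question:
# Guard against the missing cookie instead of crashing later.
r = sess.get('https://www.instagram.com/')
token = r.cookies.get_dict().get('csrftoken')
if token is None:
    # Instagram returned no cookies at all -- this Tor exit is being
    # refused, so stop here rather than raising KeyError: 'csrftoken'.
    raise RuntimeError('no csrftoken cookie; Instagram is blocking this exit node')
sess.headers.update({'X-CSRFToken': token})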
I want to scrape some of the data from here, which is served over websockets. After inspecting Chrome DevTools for the wss address, the request headers, and the negotiation message, I wrote:
from websocket import create_connection
headers = {
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9,fa;q=0.8',
'Cache-Control': 'no-cache',
'Connection': 'Upgrade',
'Host': 'stream179.forexpros.com',
'Origin': 'https://www.investing.com',
'Pragma': 'no-cache',
'Sec-WebSocket-Extensions': 'client_max_window_bits',
'Sec-WebSocket-Key': 'ldcvnZNquzPkSNvpSdI09g==',
'Sec-WebSocket-Version': '13',
'Upgrade': 'websocket',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36'
}
ws = create_connection('wss://stream179.forexpros.com/echo/894/l27e2ja8/websocket', header=headers)
nego_message = '''["{\"_event\":\"bulk-subscribe\",\"tzID\":8,\"message\":\"pid-1:%%pid-8839:%%pid-166:%%pid-20:%%pid-169:%%pid-170:%%pid-44336:%%pid-27:%%pid-172:%%pid-2:%%pid-3:%%pid-5:%%pid-7:%%pid-9:%%pid-10:%%pid-945629:%%pid-11:%%pid-16:%%pid-68:%%pidTechSumm-1:%%pidTechSumm-2:%%pidTechSumm-3:%%pidTechSumm-5:%%pidTechSumm-7:%%pidTechSumm-9:%%pidTechSumm-10:%%pidExt-1:%%event-393634:%%event-393633:%%event-393636:%%event-393638:%%event-394479:%%event-394518:%%event-394514:%%event-394516:%%event-394515:%%event-394517:%%event-393654:%%event-394467:%%event-393653:%%event-394468:%%event-394545:%%event-394549:%%event-394548:%%event-394547:%%event-394550:%%event-394546:%%event-394551:%%event-394553:%%event-394552:%%event-394743:%%event-394744:%%event-393661:%%event-394469:%%event-394470:%%event-393680:%%event-393682:%%event-393681:%%event-393687:%%event-393694:%%event-393685:%%event-393689:%%event-393688:%%event-393695:%%event-393698:%%event-393704:%%event-393705:%%event-393724:%%event-393723:%%event-393725:%%event-393726:%%event-394591:%%event-393736:%%event-393733:%%event-393734:%%event-393740:%%event-393731:%%event-393732:%%event-393730:%%event-394617:%%event-394616:%%event-393737:%%event-378304:%%event-393645:%%event-394619:%%event-393755:%%event-393757:%%event-393760:%%event-393756:%%event-393758:%%event-393759:%%event-393761:%%event-393762:%%event-394481:%%event-394625:%%event-393754:%%event-394483:%%event-393775:%%event-394621:%%event-394622:%%event-376710:%%event-394623:%%event-394484:%%event-394624:%%isOpenExch-1:%%isOpenExch-2:%%isOpenExch-13:%%isOpenExch-3:%%isOpenExch-4:%%isOpenPair-1:%%isOpenPair-8839:%%isOpenPair-44336:%%cmt-1-5-1:%%domain-1:\"}"]'''
ws.send(nego_message)
while True:
print(ws.recv())
but I'm getting:
o
Traceback (most recent call last):
File "test.py", line 647, in <module>
print(ws.recv())
File "C:\Users\me\AppData\Local\Programs\Python\Python37\lib\site-packages\websocket\_core.py", line 313, in recv
opcode, data = self.recv_data()
File "C:\Users\me\AppData\Local\Programs\Python\Python37\lib\site-packages\websocket\_core.py", line 330, in recv_data
opcode, frame = self.recv_data_frame(control_frame)
File "C:\Users\me\AppData\Local\Programs\Python\Python37\lib\site-packages\websocket\_core.py", line 343, in recv_data_frame
frame = self.recv_frame()
File "C:\Users\me\AppData\Local\Programs\Python\Python37\lib\site-packages\websocket\_core.py", line 377, in recv_frame
return self.frame_buffer.recv_frame()
File "C:\Users\me\AppData\Local\Programs\Python\Python37\lib\site-packages\websocket\_abnf.py", line 361, in recv_frame
self.recv_header()
File "C:\Users\me\AppData\Local\Programs\Python\Python37\lib\site-packages\websocket\_abnf.py", line 309, in recv_header
header = self.recv_strict(2)
File "C:\Users\me\AppData\Local\Programs\Python\Python37\lib\site-packages\websocket\_abnf.py", line 396, in recv_strict
bytes_ = self.recv(min(16384, shortage))
File "C:\Users\me\AppData\Local\Programs\Python\Python37\lib\site-packages\websocket\_core.py", line 452, in _recv
return recv(self.sock, bufsize)
File "C:\Users\me\AppData\Local\Programs\Python\Python37\lib\site-packages\websocket\_socket.py", line 115, in recv
"Connection is already closed.")
websocket._exceptions.WebSocketConnectionClosedException: Connection is already closed.
[Finished in 1.9s]
What am I missing here?
Update 1: updated the code to use WebSocketApp:
def on_message(ws, message):
print("message:", message)
def on_error(ws, error):
print("error:", error)
def on_close(ws):
print("closed.")
def on_open(ws):
print("opened")
time.sleep(1)
ws.send(nego_message)
ws = websocket.WebSocketApp(
"wss://stream179.forexpros.com/echo/894/l27e2ja8/websocket",
on_open = on_open,
on_message = on_message,
on_error = on_error,
on_close = on_close,
header = headers
)
websocket.enableTrace(True)
ws.run_forever()
but still no success:
--- request header ---
GET /echo/894/l27e2ja8/websocket HTTP/1.1
Upgrade: websocket
Connection: Upgrade
Host: stream179.forexpros.com
Origin: http://stream179.forexpros.com
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9,fa;q=0.8
Cache-Control: no-cache
Connection: Upgrade
Host: stream179.forexpros.com
Origin: https://www.investing.com
Pragma: no-cache
Sec-WebSocket-Extensions: client_max_window_bits
Sec-WebSocket-Key: ldcvnZNquzPkSNvpSdI09g==
Sec-WebSocket-Version: 13
Upgrade: websocket
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36
-----------------------
--- response header ---
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: XPKKpUMZLpSYx/1z8Q0499hcobs=
-----------------------
opened
send: b'\x81\xfe\x06{_\xda7\xd2\x04\xf8L\xf0\x00\xbfA\xb71\xae\x15\xe8}\xb8B\xbe4\xf7D\xa7=\xa9T\xa06\xb8R\xf0s\xf8C\xa8\x16\x9e\x15\xe8g\xf6\x15\xbf:\xa9D\xb38\xbf\x15\xe8}\xaa^\xb6r\xeb\r\xf7z\xaa^\xb6r\xe2\x0f\xe1f\xe0\x12\xf7/\xb3S\xffn\xec\x01\xe8z\xffG\xbb;\xf7\x05\xe2e\xff\x12\xa26\xbe\x1a\xe3i\xe3\r\xf7z\xaa^\xb6r\xeb\x00\xe2e\xff\x12\xa26\xbe\x1a\xe6k\xe9\x04\xe4e\xff\x12\xa26\xbe\x1a\xe0h\xe0\x12\xf7/\xb3S\xffn\xed\x05\xe8z\xffG\xbb;\xf7\x05\xe8z\xffG\xbb;\xf7\x04\xe8z\xffG\xbb;\xf7\x02\xe8z\xffG\xbb;\xf7\x00\xe8z\xffG\xbb;\xf7\x0e\xe8z\xffG\xbb;\xf7\x06\xe2e\xff\x12\xa26\xbe\x1a\xebk\xef\x01\xe0f\xe0\x12\xf7/\xb3S\xffn\xeb\r\xf7z\xaa^\xb6r\xeb\x01\xe8z\xffG\xbb;\xf7\x01\xeae\xff\x12\xa26\xbec\xb7<\xb2d\xa72\xb7\x1a\xe3e\xff\x12\xa26\xbec\xb7<\xb2d\xa72\xb7\x1a\xe0e\xff\x12\xa26\xbec\xb7<\xb2d\xa72\xb7\x1a\xe1e\xff\x12\xa26\xbec\xb7<\xb2d\xa72\xb7\x1a\xe7e\xff\x12\xa26\xbec\xb7<\xb2d\xa72\xb7\x1a\xe5e\xff\x12\xa26\xbec\xb7<\xb2d\xa72\xb7\x1a\xebe\xff\x12\xa26\xbec\xb7<\xb2d\xa72\xb7\x1a\xe3o\xe0\x12\xf7/\xb3S\x97\'\xae\x1a\xe3e\xff\x12\xb7)\xbfY\xa6r\xe9\x0e\xe1i\xe9\x03\xe8z\xffR\xa4:\xb4C\xffl\xe3\x04\xe4l\xe9\r\xf7z\xbfA\xb71\xae\x1a\xe1f\xe9\x01\xe1i\xe0\x12\xf7:\xacR\xbc+\xf7\x04\xebl\xec\x04\xeae\xff\x12\xb7)\xbfY\xa6r\xe9\x0e\xe6k\xed\x0e\xe8z\xffR\xa4:\xb4C\xffl\xe3\x03\xe7n\xe2\r\xf7z\xbfA\xb71\xae\x1a\xe1f\xee\x02\xe3k\xe0\x12\xf7:\xacR\xbc+\xf7\x04\xebk\xef\x06\xe4e\xff\x12\xb7)\xbfY\xa6r\xe9\x0e\xe6j\xeb\x02\xe8z\xffR\xa4:\xb4C\xffl\xe3\x03\xe7n\xed\r\xf7z\xbfA\xb71\xae\x1a\xe1f\xe9\x01\xe7k\xe0\x12\xf7:\xacR\xbc+\xf7\x04\xebk\xee\x01\xe5e\xff\x12\xb7)\xbfY\xa6r\xe9\x0e\xe1i\xef\x04\xe8z\xffR\xa4:\xb4C\xffl\xe3\x03\xe6i\xe2\r\xf7z\xbfA\xb71\xae\x1a\xe1f\xee\x02\xe6j\xe0\x12\xf7:\xacR\xbc+\xf7\x04\xebk\xef\x03\xebe\xff\x12\xb7)\xbfY\xa6r\xe9\x0e\xe6j\xee\x0f\xe8z\xffR\xa4:\xb4C\xffl\xe3\x03\xe7k\xed\r\xf7z\xbfA\xb71\xae\x1a\xe1f\xee\x02\xe7o\xe0\x12\xf7:\xacR\xbc+\xf7\x04\xebk\xef\x03\xe4e\xff\x12\xb7)\xbfY\xa6r\xe9\x0e\xe6j\xef\x06\xe8z\xffR\xa4:\xb4C\xffl\xe3\x03\xe7j\xe9\r\xf7z\xbfA\xb71\xae\x1a\xe1f\xee\x02\xe7m\xe0\x12\xf7:\xacR\xbc+\xf7\x04\xebk\xed\x03\xe1e\xff\x12\xb7)\xbfY\xa6r\xe9\x0e\xe6h\xee\x03\xe8z\xffR\xa4:\xb4C\xffl\xe3\x04\xe4i\xeb\r\xf7z\xbfA\xb71\xae\x1a\xe1f\xee\x03\xe4f\xe0\x12\xf7:\xacR\xbc+\xf7\x04\xebk\xee\x00\xe2e\xff\x12\xb7)\xbfY\xa6r\xe9\x0e\xe1i\xe2\x07\xe8z\xffR\xa4:\xb4C\xffl\xe3\x04\xe4g\xe8\r\xf7z\xbfA\xb71\xae\x1a\xe1f\xe9\x01\xean\xe0\x12\xf7:\xacR\xbc+\xf7\x04\xebl\xec\x0f\xe5e\xff\x12\xb7)\xbfY\xa6r\xe9\x0e\xe1i\xe3\x03\xe8z\xffR\xa4:\xb4C\xffl\xe3\x04\xe4g\xef\r\xf7z\xbfA\xb71\xae\x1a\xe1f\xe9\x01\xeaf\xe0\x12\xf7:\xacR\xbc+\xf7\x04\xebl\xec\x0f\xeae\xff\x12\xb7)\xbfY\xa6r\xe9\x0e\xe1i\xe3\x02\xe8z\xffR\xa4:\xb4C\xffl\xe3\x04\xe4f\xe2\r\xf7z\xbfA\xb71\xae\x1a\xe1f\xe9\x00\xe2k\xe0\x12\xf7:\xacR\xbc+\xf7\x04\xebl\xed\x07\xe7e\xff\x12\xb7)\xbfY\xa6r\xe9\x0e\xe1h\xe8\x03\xe8z\xffR\xa4:\xb4C\xffl\xe3\x04\xe5m\xe9\r\xf7z\xbfA\xb71\xae\x1a\xe1f\xe9\x00\xe0j\xe0\x12\xf7:\xacR\xbc+\xf7\x04\xebl\xed\x05\xe4e\xff\x12\xb7)\xbfY\xa6r\xe9\x0e\xe6j\xe3\x06\xe8z\xffR\xa4:\xb4C\xffl\xe3\x04\xe5l\xec\r\xf7z\xbfA\xb71\xae\x1a\xe1f\xe9\x00\xe1l\xe0\x12\xf7:\xacR\xbc+\xf7\x04\xebl\xed\x04\xe6e\xff\x12\xb7)\xbfY\xa6r\xe9\x0e\xe1h\xee\x07\xe8z\xffR\xa4:\xb4C\xffl\xe3\x04\xe5l\xeb\r\xf7z\xbfA\xb71\xae\x1a\xe1f\xe9\x00\xe1m\xe0\x12\xf7:\xacR\xbc+\xf7\x04\xebl\xed\x04\xe2e\xff\x12\xb7)\xbfY\xa6r\xe9\x0e\xe6i\xeb\x00\xe8z\xffR\xa4:\xb4C\xffl\xe3\x03\xe4n\xec\r\xf7z\xbfA\xb71\xae\x1a\xe1f\xe9\x00\xe1h\xe0\x12\xf7:\xacR\xbc+\xf7\x04\xe5g\xe9\x07\xe6e\xff\x12\x
b7)\xbfY\xa6r\xe9\x0e\xe1i\xee\x02\xe8z\xffR\xa4:\xb4C\xffl\xe3\x03\xe4n\xe3\r\xf7z\xbfA\xb71\xae\x1a\xe1f\xe9\x00\xe7j\xe0\x12\xf7:\xacR\xbc+\xf7\x04\xebl\xed\x02\xe5e\xff\x12\xb7)\xbfY\xa6r\xe9\x0e\xe1h\xec\x07\xe8z\xffR\xa4:\xb4C\xffl\xe3\x04\xe5j\xec\r\xf7z\xbfA\xb71\xae\x1a\xe1f\xe9\x00\xe7g\xe0\x12\xf7:\xacR\xbc+\xf7\x04\xebl\xed\x02\xebe\xff\x12\xb7)\xbfY\xa6r\xe9\x0e\xe1h\xec\x06\xe8z\xffR\xa4:\xb4C\xffl\xe3\x04\xe5i\xe8\r\xf7z\xbfA\xb71\xae\x1a\xe1f\xee\x03\xean\xe0\x12\xf7:\xacR\xbc+\xf7\x04\xebk\xec\x05\xe7e\xff\x12\xb7)\xbfY\xa6r\xe9\x0e\xe1h\xef\x03\xe8z\xffR\xa4:\xb4C\xffl\xe3\x03\xe6g\xe9\r\xf7z\xbfA\xb71\xae\x1a\xe1f\xe9\x00\xe5j\xe0\x12\xf7:\xacR\xbc+\xf7\x04\xebk\xec\x05\xe3e\xff\x12\xb7)\xbfY\xa6r\xe9\x0e\xe6i\xe8\x05\xe8z\xffR\xa4:\xb4C\xffl\xed\x01\xe5n\xea\r\xf7z\xbfA\xb71\xae\x1a\xe1f\xee\x01\xe0l\xe0\x12\xf7:\xacR\xbc+\xf7\x04\xebk\xee\x0f\xe6e\xff\x12\xb7)\xbfY\xa6r\xe9\x0e\xe6i\xe8\x03\xe8z\xff^\xa1\x10\xaaR\xbc\x1a\xa2T\xbar\xeb\r\xf7z\xb3D\x9d/\xbfY\x97\'\xb9_\xffm\xe0\x12\xf76\xa9x\xa2:\xb4r\xaa<\xb2\x1a\xe3l\xe0\x12\xf76\xa9x\xa2:\xb4r\xaa<\xb2\x1a\xe1e\xff\x12\xbb,\x95G\xb71\x9fO\xb17\xf7\x03\xe8z\xff^\xa1\x10\xaaR\xbc\x0f\xbb^\xa0r\xeb\r\xf7z\xb3D\x9d/\xbfY\x82>\xb3E\xffg\xe2\x04\xebe\xff\x12\xbb,\x95G\xb71\x8aV\xbb-\xf7\x03\xe6l\xe9\x01\xe8z\xffT\xbf+\xf7\x06\xffj\xf7\x06\xe8z\xffS\xbd2\xbb^\xbcr\xeb\r\xf0"\xf8j'
message: o
send: b'\x88\x82!\xdd\x07\xcf"5'
closed.
[Finished in 2.3s]
I tried removing all the escape characters (backslashes) from the message being sent, and eventually it worked:
nego_message = '{"_event":"bulk-subscribe","tzID":8,"message":"pid-0:%%isOpenExch-1:%%pid-8849:%%isOpenExch-1004:%%pid-8833:%%pid-8862:%%pid-8830:%%pid-8836:%%pid-8831:%%pid-8916:%%pid-8832:%%pid-169:%%pid-20:%%isOpenExch-2:%%pid-166:%%pid-172:%%isOpenExch-4:%%pid-27:%%isOpenExch-3:%%pid-167:%%isOpenExch-9:%%pid-178:%%isOpenExch-20:%%pid-6408:%%pid-6369:%%pid-13994:%%pid-6435:%%pid-13063:%%pid-26490:%%pid-243:%%pid-1:%%isOpenExch-1002:%%pid-2:%%pid-3:%%pid-5:%%pid-7:%%pid-9:%%pid-10:%%pid-23705:%%pid-23706:%%pid-23703:%%pid-23698:%%pid-8880:%%isOpenExch-118:%%pid-8895:%%pid-1141794:%%pid-1175152:%%isOpenExch-152:%%pid-1175153:%%pid-14958:%%pid-44336:%%isOpenExch-97:%%pid-8827:%%pid-6497:%%pid-941155:%%pid-104395:%%pid-1013048:%%pid-1055979:%%pid-1177973:%%pid-1142416:%%pidExt-1:%%cmt-1-5-1:%%pid-252:%%pid-1031244:%%isOpenExch-125:"}'
ws.send(nego_message)
while True:
print(ws.recv())
Outputs:
a["{\"message\":\"pid-3::{\\\"pid\\\":\\\"3\\\",\\\"last_dir\\\":\\\"greenBg\\\",\\\"last_numeric\\\":149.19,\\\"last\\\":\\\"149.19\\\",\\\"bid\\\":\\\"149.18\\\",\\\"ask\\\":\\\"149.19\\\",\\\"high\\\":\\\"149.29\\\",\\\"low\\\":\\\"149.12\\\",\\\"last_close\\\":\\\"149.26\\\",\\\"pc\\\":\\\"-0.07\\\",\\\"pcp\\\":\\\"-0.05%\\\",\\\"pc_col\\\":\\\"redFont\\\",\\\"turnover\\\":\\\"18.13K\\\",\\\"turnover_numeric\\\":\\\"18126\\\",\\\"time\\\":\\\"0:39:09\\\",\\\"timestamp\\\":1666139948}\"}"]
The while loop is calling ws.recv() again after the server has already closed the connection. If you simply do:
print(ws.recv())
it will not attempt to call .recv() on a closed connection. The o printed before the stack trace is the output of the first, successful recv.
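If you do want to keep reading in a loop, you can catch the exception that websocket-client raises on a closed socket and stop cleanly. A minimal sketch:
from websocket import WebSocketConnectionClosedException

while True:
    try:
        print(ws.recv())
    except WebSocketConnectionClosedException:
        # The server closed the stream; stop reading instead of raising.
        break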
As an aside, it seems like you might want a longer running connection using websocket.WebSocketApp (example) for a scrape.
I'm currently trying to get a new IP via Python.
Here is the source:
import urllib.request
from stem import Signal
from stem.control import Controller
import socks, socket, time, random
proxy_support = urllib.request.ProxyHandler({"http" : "127.0.0.1:8118"})
opener = urllib.request.build_opener(proxy_support)
UA = [
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.73.11 (KHTML, like Gecko) Version/7.0.1 Safari/537.73.11',
'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.76 Safari/537.36',
'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0',
'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36',
'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.102 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.102 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:26.0) Gecko/20100101 Firefox/26.0',
'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0',
'Mozilla/5.0 (Windows NT 6.1; rv:26.0) Gecko/20100101 Firefox/26.0',
'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.76 Safari/537.36'
]
def newI():
controller = Controller.from_port(port = 9051)
try:
controller.authenticate()
controller.signal(Signal.NEWNYM)
bytes_read = controller.get_info("traffic/read")
bytes_written = controller.get_info("traffic/written")
print (bytes_read)
print (bytes_written)
finally:
controller.close()
if __name__ == '__main__':
params = 'site:google.com admin'
page = 0
for i in range(100):
url = 'http://www.google.co.kr/search?hl=ko&q=%s&start=%d' %(urllib.parse.quote(params), page)
proxy_support = urllib.request.ProxyHandler({"http" : "127.0.0.1:8118"})
urllib.request.install_opener(opener)
user_agent = random.choice(UA)
headers = {'User-Agent' : user_agent}
random_interval = random.randrange(1, 5, 1)
time.sleep(random_interval)
req = urllib.request.Request(url, headers = headers)
res = urllib.request.urlopen(req)
html = res.read()
print (len(html))
page = page + 10
newI()
I have Vidalia and Privoxy running, and my settings are correctly set:
Web Proxy (HTTP): 127.0.0.1:8118, and the same for HTTPS
In my Privoxy config file I have this line:
forward-socks5 / 127.0.0.1:9050 .
These are my Vidalia settings:
1. settings > Sharing > Run as client only
2. settings > Advanced > 127.0.0.1 : 9051
Though still, when I run the code it gets stuck on case 1 and I can't get an IP. Here is the traceback:
Traceback (most recent call last):
File "C:/Users/kwon/PycharmProjects/google_search/test.py", line 50, in <module>
res = urllib.request.urlopen(req)
File "C:\Python33\lib\urllib\request.py", line 156, in urlopen
return opener.open(url, data, timeout)
File "C:\Python33\lib\urllib\request.py", line 475, in open
response = meth(req, response)
File "C:\Python33\lib\urllib\request.py", line 587, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python33\lib\urllib\request.py", line 507, in error
result = self._call_chain(*args)
File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
result = func(*args)
File "C:\Python33\lib\urllib\request.py", line 692, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "C:\Python33\lib\urllib\request.py", line 475, in open
response = meth(req, response)
File "C:\Python33\lib\urllib\request.py", line 587, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python33\lib\urllib\request.py", line 513, in error
return self._call_chain(*args)
File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
result = func(*args)
File "C:\Python33\lib\urllib\request.py", line 595, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 503: Service Unavailable
What am I doing wrong?
Google prevents automated requests; take a look at this other post: Tor blocked by Google.
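If you still want to try over Tor, one common mitigation is to request a new circuit and back off whenever Google returns 503. A sketch reusing the Controller setup from the question (this may only reduce the blocking, not eliminate it):
import time
import urllib.error
import urllib.request

from stem import Signal
from stem.control import Controller

def new_identity():
    # Ask Tor for a fresh circuit; requires the ControlPort (9051) to be enabled.
    with Controller.from_port(port=9051) as controller:
        controller.authenticate()
        controller.signal(Signal.NEWNYM)

def fetch(req, retries=3):
    for _ in range(retries):
        try:
            return urllib.request.urlopen(req)
        except urllib.error.HTTPError as e:
            if e.code != 503:
                raise
            new_identity()
            time.sleep(10)  # Tor rate-limits NEWNYM signals; give it time
    raise RuntimeError("still blocked after %d retries" % retries)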