I have the following CSP report:
"csp-report": {
"effective-directive": "script-src",
"referrer": "",
"status-code": 200,
"original-policy": "img-src 'self' data: https://redacted-development.s3.amazonaws.com https://s3.eu-west-2.amazonaws.com https://app.appzi.io https://cdn.ywxi.net https://randomuser.me;object-src 'none';form-action 'self';frame-ancestors 'none';base-uri 'self';report-uri /report-csp;script-src 'strict-dynamic' 'unsafe-inline' 'nonce-M2EyZTVhMzItNDY5My00YTI5LWE3MzEtM2NjMjdjMjc0ZmQ0' 'nonce-M2JiODg4NWQtODJjNy00MTZjLTkyYzMtZjY1MDIyMDQwYzgw' 'nonce-M2Y3MDQ1YWUtNThiZi00MWI3LTg1NzQtYjg2NDAxMmE1YjZl' 'nonce-MjMyMjUxZGUtZTQ1MS00OGZlLTk2NGYtZGM0NzQwZDBlOGQx' 'nonce-Mjg2M2U1ZTgtZmYyNS00YzllLWI1ZDItODY1NWUxNjIxMzQx' 'nonce-MmMyMmQyNWYtNWU4OC00NjRhLWEzNDYtYjc1NDg4ZTMzOGUy' 'nonce-MzZjZTE4MGItMWQyZi00YzRhLWFhMmQtMjlhMjg1ZTQzZDdl' 'nonce-NDExZTg5MjYtODQ1ZC00ZTE5LThjYmEtYmU3NmY5ZDg2MjI0' 'nonce-NDhiNmU5YjktYzEyYS00NjFjLWJmMWItNzU0MzI2NTlkOGNh' 'nonce-NWI2Yzg1YzktN2JkZC00OGY5LWFhODktZTFhN2MxZTUxNTNj' 'nonce-NzFjNTUzN2YtMWQ3MC00ODY5LWJhYmUtOGYxYjBiZjc0Y2Yx' 'nonce-NzgzNjI3ZDctNWU0ZC00ZWI0LThiN2UtODk5NWFhODNjY2Zj' 'nonce-OTUwNzMyM2EtZmExMS00NjA1LThjNGMtZjQzYTFiZTM4NmQx' 'nonce-OWIxZDNlZGMtZWQxZS00ZjRlLTg4OWYtY2RkOTdiYzFmMDFh' 'nonce-Y2ExZDg4OWEtM2ExOS00NzE0LTk2NjEtZWYzNmQyNzkxZDE2' 'nonce-ZDRkNDc2ZmYtMDQ4Yi00MDY4LWFjOWQtMTZkMmMzYmFhNWQw' 'nonce-ZTU4ZTIxNGItNmZiYy00ODM4LTljZDQtMzhhY2RkZTMxMWE2' 'nonce-ZmYyMzg3ZjgtNjY0Zi00ZDEyLWE0NTMtYWNhMzYzNGE2YmI2'",
"document-uri": "https://redacted.com/",
"violated-directive": "script-src 'strict-dynamic' 'unsafe-inline' 'nonce-M2EyZTVhMzItNDY5My00YTI5LWE3MzEtM2NjMjdjMjc0ZmQ0' 'nonce-M2JiODg4NWQtODJjNy00MTZjLTkyYzMtZjY1MDIyMDQwYzgw' 'nonce-M2Y3MDQ1YWUtNThiZi00MWI3LTg1NzQtYjg2NDAxMmE1YjZl' 'nonce-MjMyMjUxZGUtZTQ1MS00OGZlLTk2NGYtZGM0NzQwZDBlOGQx' 'nonce-Mjg2M2U1ZTgtZmYyNS00YzllLWI1ZDItODY1NWUxNjIxMzQx' 'nonce-MmMyMmQyNWYtNWU4OC00NjRhLWEzNDYtYjc1NDg4ZTMzOGUy' 'nonce-MzZjZTE4MGItMWQyZi00YzRhLWFhMmQtMjlhMjg1ZTQzZDdl' 'nonce-NDExZTg5MjYtODQ1ZC00ZTE5LThjYmEtYmU3NmY5ZDg2MjI0' 'nonce-NDhiNmU5YjktYzEyYS00NjFjLWJmMWItNzU0MzI2NTlkOGNh' 'nonce-NWI2Yzg1YzktN2JkZC00OGY5LWFhODktZTFhN2MxZTUxNTNj' 'nonce-NzFjNTUzN2YtMWQ3MC00ODY5LWJhYmUtOGYxYjBiZjc0Y2Yx' 'nonce-NzgzNjI3ZDctNWU0ZC00ZWI0LThiN2UtODk5NWFhODNjY2Zj' 'nonce-OTUwNzMyM2EtZmExMS00NjA1LThjNGMtZjQzYTFiZTM4NmQx' 'nonce-OWIxZDNlZGMtZWQxZS00ZjRlLTg4OWYtY2RkOTdiYzFmMDFh' 'nonce-Y2ExZDg4OWEtM2ExOS00NzE0LTk2NjEtZWYzNmQyNzkxZDE2' 'nonce-ZDRkNDc2ZmYtMDQ4Yi00MDY4LWFjOWQtMTZkMmMzYmFhNWQw' 'nonce-ZTU4ZTIxNGItNmZiYy00ODM4LTljZDQtMzhhY2RkZTMxMWE2' 'nonce-ZmYyMzg3ZjgtNjY0Zi00ZDEyLWE0NTMtYWNhMzYzNGE2YmI2'",
"blocked-uri": "https://redacted.com/static/js/browser.polyfill.min.js?etag=dp1dNqwV"
}
The user agent for this report is
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/18.17763
This parses as Edge 18.0 on Windows 10.
It is not clear to me what a status code of 200 means in the context of a CSP report, and this seems not to occur on other browsers.
n.b. The real web address has been redacted.
"status-code" is the HTTP status code of the resource on which the object was instantiated. In your case the status code 200 is output because the request was executed correctly and without errors.
It seems to me that this only occurs in IE; I don't see this status code 200 in reports from either Chrome or Safari.
(I was debugging my CSP using the Report URI website, and it showed this behaviour only with IE.)
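If you want to check for yourself which browsers include the status-code field, a small endpoint that logs each report together with the requesting User-Agent makes this easy to see. Below is a minimal sketch assuming Flask; only the /report-csp path is taken from the policy above, everything else is illustrative.
import json
from flask import Flask, request

app = Flask(__name__)

@app.route("/report-csp", methods=["POST"])
def report_csp():
    # Browsers POST the report as JSON wrapped in a "csp-report" object.
    report = json.loads(request.data).get("csp-report", {})
    app.logger.warning(
        "CSP violation: blocked-uri=%s status-code=%s ua=%s",
        report.get("blocked-uri"),
        report.get("status-code"),  # HTTP status of the protected document, when reported
        request.headers.get("User-Agent"),
    )
    return "", 204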
Related
I have a large number of short URLs and I want to expand them. I found the following code somewhere online (I've lost the source):
import requests

short_url = "https://t.co/NHBbLlfCaa"
r = requests.get(short_url)
if r.status_code == 200:
    print("Actual url: %s" % r.url)
It works perfectly, but I get this error when I hit the same server many times:
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='www.fatlossadvice.pw', port=80): Max retries exceeded with url: /TIPS/KILLED-THAT-TREADMILL-WORKOUT-WORD-TO-TIMMY-GACQUIN.ASP (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
I tried many of the solutions suggested in Max retries exceeded with URL in requests, but nothing worked.
I was thinking about another approach: passing a User-Agent with the request and changing it randomly each time (picking from a large list of user agents):
import numpy as np

user_agent_list = [
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:25.0) Gecko/20100101 Firefox/25.0',
    'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0',
    'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36',
]
r = requests.get(short_url, headers={'User-Agent': user_agent_list[np.random.randint(0, len(user_agent_list))]})
if r.status_code == 200:
    print("Actual url: %s" % r.url)
My problem is that r.url always returns the short URL instead of the long (expanded) one.
What am I missing?
You can prevent the error by passing allow_redirects=False to requests.get(), so that requests does not follow the redirect to a page that no longer exists (which is what raises the error). You then have to examine the Location header sent by the server yourself (replace XXXX with https and remove the spaces):
import requests

short_url = ["XXXX t.co /namDL4YHYu",
             'XXXX t.co /MjvmV',
             'XXXX t.co /JSjtxfaxRJ',
             'XXXX t.co /xxGSANSE8K',
             'XXXX t.co /ZRhf5gWNQg']

for url in short_url:
    r = requests.get(url, allow_redirects=False)
    try:
        print(url, r.headers['location'])
    except KeyError:
        print(url, "Page doesn't exist!")
Prints:
XXXX t.co/namDL4YHYu http://gottimechillinaround.tumblr.com/post/133931725110/tip-672
XXXX t.co/MjvmV Page doesn't exist!
XXXX t.co/JSjtxfaxRJ http://www.youtube.com/watch?v=rE693eNyyss
XXXX t.co/xxGSANSE8K http://www.losefattips.pw/Tips/My-stretch-before-and-after-my-workout-is-just-as-important-to-me-as-my-workout.asp
XXXX t.co/ZRhf5gWNQg http://www.youtube.com/watch?v=3OK1P9GzDPM
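If you would rather get the final expanded URL directly, you can also let requests follow the redirects and read r.url afterwards (r.history holds the intermediate hops). This is a minimal sketch assuming the shortener answers with ordinary HTTP redirects:
import requests

r = requests.get("https://t.co/namDL4YHYu", allow_redirects=True, timeout=10)
for hop in r.history:          # each intermediate redirect response
    print(hop.status_code, hop.url, "->", hop.headers.get("location"))
print("Final url:", r.url)     # where the redirect chain ended up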
I'm trying to log in to Plus500 in Python. Everything looks OK: status code 200 and a response from the server, but the server will not accept my request.
I did every single step that the web browser does, with the same headers as the browser. Always the same result.
url = "https://trade.plus500.com/AppInitiatedImm/WebTrader2/?webvisitid=" + self.tokesession+ "&page=login&isInTradeContext=false"
header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0",
"Host": "trade.plus500.com",
"Connection": "keep-alive",
"Referer": "https://app.plus500.com/trade",
"Cookie":"webvisitid=" + self.cookiesession + ";"+\
"IP="+self.hasip}
param = "ClientType=WebTrader2&machineID=33b5db48501c9b0e5552ea135722b2c6&PrimaryMachineId=33b5db48501c9b0e5552ea135722b2c6&hl=en&cl=en-GB&AppVersion=87858&refurl=https%3A%2F%2Fwww.plus500.co.uk%2F&SessionID=0&SubSessionID=0"
response = self.session.request(method="POST",
url=url,
params=param,
headers=header,stream=True)
The code above is the initialization of the web app; after that I do the login. But it always comes back with the JSON reply AppSessionRquired. I think I have already tried everything I can think of. If someone has an idea, please share it.
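One thing worth checking is whether the cookies the server sets on the initialization response actually reach the follow-up login request. Building the Cookie header by hand bypasses the session's cookie jar, so a server-issued session cookie can get lost, which would explain an "AppSessionRquired"-style reply. The sketch below only illustrates that idea with requests.Session; the placeholder values and trimmed query string are assumptions for illustration, not Plus500-specific.
import requests

webvisitid = "REDACTED"  # placeholder; in the real script this comes from an earlier step
init_url = ("https://trade.plus500.com/AppInitiatedImm/WebTrader2/"
            "?webvisitid=" + webvisitid + "&page=login&isInTradeContext=false")
param = "ClientType=WebTrader2&hl=en&cl=en-GB"  # trimmed-down query string, for illustration only

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0",
    "Referer": "https://app.plus500.com/trade",
})

# Let the Session store whatever cookies the server sets on this response,
# instead of assembling the Cookie header manually.
init = session.post(init_url, params=param)
print(init.status_code, session.cookies.get_dict())  # confirm a session cookie was issued

# Any follow-up login request made with the same Session object will send
# those cookies back automatically.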
I want to scrape data from each block and move through the pages, but I am not able to do that. Can someone help me crack this?
I tried to crawl the data using headers and form data, but failed.
Below is my code.
from bs4 import BeautifulSoup
import requests

url = 'http://www.msmemart.com/msme/listings/company-list/agriculture-product-stocks/1/585/Supplier'
headers = {
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Cookie": "unique_visitor=49.35.36.33; __utma=189548087.1864205671.1549441624.1553842230.1553856136.3; __utmc=189548087; __utmz=189548087.1553856136.3.3.utmcsr=nsic.co.in|utmccn=(referral)|utmcmd=referral|utmcct=/; csrf=d665df0941bbf3bce09d1ee4bd2b079e; ci_session=ac6adsb1eb2lcoogn58qsvbjkfa1skhv; __utmt=1; __utmb=189548087.26.10.1553856136",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest",
    "Accept": "application/json, text/javascript, */*; q=0.01",
}
data = {
    'catalog': 'Supplier',
    'category': '1',
    'subcategory': '585',
    'type': 'company-list',
    'csrf': '0db0757db9473e8e5169031b7164f2a4'
}
r = requests.get(url, data=data, headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')
div = soup.find('div', {"id": "listings_result"})
for prod in div.find_all('b', string='Products/Services:'):
    print(prod.next_sibling)
getting "ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host" 2-3 times, i want to crawl all text details in a block.plz someone help me to found this.
I am using the Python requests library to make HTTP requests. For this website: https://www.epi.org/resources/budget/ I am unable to read the HTML response because it is not human readable; it looks like the site is behind Cloudflare DDoS protection. Here is my simple code below.
import requests

headers = {'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
           'accept-encoding': 'gzip, deflate, br',
           'accept-language': 'en-US,en;q=0.9,pt;q=0.8',
           'cache-control': 'max-age=0',
           'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36',
           'upgrade-insecure-requests': '1'}

s = requests.Session()
a = s.get('https://www.epi.org/resources/budget/', headers=headers)
print(a.text)
The response HTML looks like this : https://justpaste.it/6ie73
The reason you got unreadable content is the Accept-Encoding header. You advertise br (Brotli), and unlike a browser, requests will not decode a Brotli response unless a Brotli module is installed, so the body comes back as raw compressed bytes. Either drop 'br' from Accept-Encoding (for example send 'gzip, deflate', which requests decodes automatically) or install the brotli package. By the way, if you need the fully rendered page, JavaScript rendering is still necessary.
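Concretely, the change is just to stop advertising Brotli in the request headers. A short sketch of that option against the same URL:
import requests

headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate',  # no 'br', so requests can decode the body itself
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36',
}

r = requests.get('https://www.epi.org/resources/budget/', headers=headers)
print(r.text[:500])  # plain HTML now, instead of compressed bytes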
I am writing a simple script to get the public profile that is visible without logging in on LinkedIn.
Below is my code to fetch the page for BeautifulSoup. I am using public proxies as well.
import urllib.request, urllib.error
from bs4 import BeautifulSoup

url = "https://www.linkedin.com/company/amazon"

proxy = urllib.request.ProxyHandler({'https': proxy, })  # 'proxy' here stands for the public proxy address
opener = urllib.request.build_opener(proxy)
urllib.request.install_opener(opener)

hdr = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3218.0 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9,hi;q=0.8',
    'Connection': 'keep-alive'}

req = urllib.request.Request(url, headers=hdr)
page = urllib.request.urlopen(req, timeout=20)
soup = BeautifulSoup(page.read(), "lxml")
But it raises an "HTTPError 999 - Request Denied" error. This is only for testing purposes until I get access via the partnership program.
What am I doing wrong? Please help.
You did not do anything wrong; LinkedIn blacklists the IP addresses of cloud servers to prevent "stealing" of its data. A questionable practice, but that is how it is.