I am trying to access investing.com from Excel 2016 using Power Query, and I am getting a permission error. Is there a way to get around it? It worked until yesterday, but now I keep getting this error even though I pass the cookie and header information with the request. The error and code are below.
Error: [Permission Error] The credentials provided for the web source are invalid. Please update the credentials through a refresh or in the Data Source settings dialog to continue. Any guidance is much appreciated.
let
    Source = () => Web.Contents(
        "https://tvc6.investing.com/de339cb1628af015d591805dfffe9b13/1663439222/1/1/8/quotes?symbols=NSE%20%3ANSEI",
        [Headers = [
            #"accept-encoding" = "gzip, deflate",
            #"cookie" = "jLeqXJArEhTZVUQpwiP1gSPVlhn19b3fsB7uRcSUbyg-1666154846-0-AR2gelOI/3ZaH0SKRFuxhsgOGr/SOXpnbQ7e90mYM2iDmpTf8XcvvIB/FvGtgkB50nccnbWB1rFawcO37vJpSBU=",
            #"accept-language" = "en-US,en;q=0.9",
            #"user-agent" = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36 Edg/106.0.1370.47"
        ]]
    )
in
    Source
In TrackJS, some user agents are parsed as normal browsers, e.g.:
Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36 (compatible; Google-Read-Aloud; +https://support.google.com/webmasters/answer/1061943)
TrackJS parses this as Chrome Mobile 59.0.3071.
I tried to handle it with ignore rules in the settings, but that doesn't work, so I need to filter errors by a token in the user agent.
Is it possible to do this without JS?
More similar user agents: https://developers.google.com/search/docs/advanced/crawling/overview-google-crawlers
The TrackJS UI doesn't allow you to create Ignore Rules against the raw UserAgent, only the parsed browser and operating system. Instead, use the client-side ignore capability with the onError function.
Build your function to detect the tokens you want to exclude, and return false from the function if you don't want it to be sent.
I'm using an email open-tracking mechanism that adds a hidden image URL, https://example.com/tracking/open/SOME_UNIQUE_ID, to each email.
Gmail rewrites the image URL to the following:
<img src="https://ci3.googleusercontent.com/proxy/LP0uwO5fHA2LPxEfKkef1e9imTurKBU5wawN6p8SArM9l6CRtsT_dmRtTqfZDVpmWRlhgnRqr0uA9QO7w85wlGOl5DUl2G4rZ-0JQI4pXmlzjGho6yWUCA03oRRfwDOvd5HeGokeHMpHFQ=s0-d-e1-ft#https://example.com/tracking/open/SOME_UNIQUE_ID" width="0" height="0" border="0" alt="" role="presentation" class="CToWUd">
The problem is that I can't detect the real user agent or IP, because the request always comes from Google IPs with the user agent 'Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko Firefox/11.0 (via ggpht.com GoogleImageProxy)'.
I have seen people get the correct IP and location for Gmail open/click tracking.
I checked all the request headers, but there is nothing useful about the real user; it's all related to Google.
Any suggestions?
Thanks.
For me, taking the user agent value from the request header helped. If the user agent equals "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246 Mozilla/5.0", then it's a Google bot.
private static bool IsGoogleBot(HttpRequest req)
{
    // Gmail's image proxy reports this fixed user agent when it fetches the tracking pixel
    var userAgent = req.Headers["User-Agent"];
    return userAgent == "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246 Mozilla/5.0";
}
The IP address will still point to Google (on Gmail, I'm sure of that), but with this method you can detect a real email opening.
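For illustration, here is a minimal sketch of the same check as a Python tracking endpoint; Flask and the route path are assumptions for the example, not part of the original setup.
from flask import Flask, request

app = Flask(__name__)

# Fixed user agent that Gmail's image proxy sends when it fetches the tracking pixel
GOOGLE_PROXY_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246 Mozilla/5.0")

@app.route("/tracking/open/<unique_id>")
def track_open(unique_id):
    user_agent = request.headers.get("User-Agent", "")
    if user_agent == GOOGLE_PROXY_UA or "GoogleImageProxy" in user_agent:
        # The request came through Gmail's proxy: record the open,
        # but don't trust request.remote_addr for geolocation.
        print("Open recorded for %s via GoogleImageProxy" % unique_id)
    else:
        print("Open recorded for %s from %s" % (unique_id, request.remote_addr))
    # Return an empty response; a real implementation would serve a 1x1 GIF here
    return "", 204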
I am trying to request the HTML data from the web site shown below, but it raises the following error:
'Connection aborted.', OSError("(54, 'ECONNRESET')"
I have also tried adding the certificate, but then it raises this error:
Error: [('x509 certificate routines', 'X509_load_cert_crl_file', 'no certificate or crl found')]
The certificate is exported from Chrome.
Python Code:
import requests
from bs4 import BeautifulSoup
url ='https://www.openrice.com/zh/hongkong/restaurants/type/%E5%BF%AB%E9%A4%90%E5%BA%97?page=1'
html=requests.get(url, verify=False)
#html=requests.get(url, verify="/Users/xxx/Documents/Python/Go Daddy Root Certificate Authority - G2.cer")
Can you try this?
First of all, I couldn't reproduce your environment exactly. I tried to access the site from my PC and it didn't work well either, so I added a user-agent header and then it worked fine.
But I don't know if it will work on your PC.
import requests

url = 'https://www.openrice.com/zh/hongkong/restaurants/type/%E5%BF%AB%E9%A4%90%E5%BA%97?page=1'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'
}
html = requests.get(url, headers=headers)
print(html.text)
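Since the question already imports BeautifulSoup, here is a short follow-up for parsing the response once it comes back; this is only a sanity check, not anything specific to OpenRice's markup.
from bs4 import BeautifulSoup

soup = BeautifulSoup(html.text, 'html.parser')
print(soup.title.get_text())      # confirms real page content was returned
print(len(soup.find_all('a')))    # e.g. count the links found on the page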
I am trying to upload a large file (2 GB) to SharePoint, and it throws this error:
System.Net.WebException: The request was aborted: The request was canceled.
Can anyone let me know what I can modify to overcome this? My code is below; it runs in an SSIS script task.
WebClient client = new WebClient();
client.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.33 Safari/537.36");
//Adding FedAuth cookies to Header O365
if (!String.IsNullOrEmpty(SPToken))
client.Headers.Add(HttpRequestHeader.Cookie, SPToken);
client.Credentials = new NetworkCredential(userName, password);
You can use WebDAV instead; see https://msdn.microsoft.com/en-us/library/cc250097.aspx.
You can also upload manually over WebDAV from Internet Explorer: open the library, choose "Open with Explorer", and copy the file in there.
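As a rough illustration of the WebDAV route, here is a minimal sketch using Python requests to PUT the file. The library URL and cookie value are placeholders, and it assumes a FedAuth cookie like the one in the question is accepted for this request; streaming from an open file handle avoids loading 2 GB into memory.
import requests

# Hypothetical target path in the SharePoint document library
target_url = "https://contoso.sharepoint.com/sites/team/Shared%20Documents/bigfile.zip"
sp_token = "FedAuth=...; rtFa=..."  # placeholder for the cookie value used in the question

with open("bigfile.zip", "rb") as f:
    # Passing the open file streams the upload instead of buffering it all in memory
    response = requests.put(
        target_url,
        data=f,
        headers={"Cookie": sp_token},
        timeout=600,
    )
print(response.status_code)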
I have code for proxy IP rotation and user agent spoofing to use for scraping. But because the code was provided as an example, I don't know whether it really works once I add it to my own code.
I am a beginner in Python. I just added it to my .py file (after the code that does the scraping). When I add it and start scraping, it runs and gets all the data, but I don't know whether the rotation is actually happening.
Do I have to create a separate file for this code (user agent spoofing and IP rotation)?
And how can I tell whether they are working while I scrape?
Does it matter that they have their own URLs defined?
Proxy Rotation:
from lxml.html import fromstring   # fromstring and traceback are only needed by get_proxies() in the original example
import requests
from itertools import cycle
import traceback

# Hard-coded proxy list; cycle() hands out one proxy per request, wrapping around at the end
proxies = ['121.129.127.209:80', '124.41.215.238:45169', '185.93.3.123:8080', '194.182.64.67:3128', '106.0.38.174:8080', '163.172.175.210:3128', '13.92.196.150:8080']
# proxies = get_proxies()  # the original example fetched a fresh list here, but get_proxies() is not defined in this snippet
proxy_pool = cycle(proxies)

url = 'https://httpbin.org/ip'
for i in range(1, 11):
    proxy = next(proxy_pool)
    print("Request #%d" % i)
    try:
        response = requests.get(url, proxies={"http": proxy, "https": proxy})
        print(response.json())
    except Exception:
        print("Skipping. Connection error")
User Agent Spoofing:
import requests
import random

user_agent_list = [
    # Chrome
    'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
    # Internet Explorer
    'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)',
    'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)',
    'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)'
]

url = 'https://httpbin.org/user-agent'

# Let's make 5 requests and see what user agents are used
for i in range(1, 6):
    # Pick a random user agent and set it as the request header
    user_agent = random.choice(user_agent_list)
    headers = {'User-Agent': user_agent}
    # Make the request
    response = requests.get(url, headers=headers)
    print("Request #%d\nUser-Agent Sent: %s\nUser-Agent Received by HTTPBin:" % (i, user_agent))
    print(response.content)
    print("-------------------\n\n")
If you want to check whether your proxy and user agent are rotating, go to a request-bin website, create an endpoint, and use that endpoint in your Python code in place of the URL previously requested.
Then examine the request bin and read the user agent and IP address recorded for the GET requests listed after you run your Python code.
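If you'd rather not set up a request bin, here is a sketch along the same lines using the httpbin endpoints already in the question, which echo back what was actually received; the proxy list is the same placeholder list as above and those proxies may well be dead by now.
import random
import requests
from itertools import cycle

proxies = ['121.129.127.209:80', '124.41.215.238:45169', '185.93.3.123:8080']
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
    'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)',
]
proxy_pool = cycle(proxies)

for i in range(1, 6):
    proxy = next(proxy_pool)
    headers = {'User-Agent': random.choice(user_agents)}
    try:
        # httpbin.org/headers echoes the headers it received,
        # so the response shows which user agent actually went out
        r = requests.get('https://httpbin.org/headers',
                         headers=headers,
                         proxies={'http': proxy, 'https': proxy},
                         timeout=10)
        print("Request #%d via %s" % (i, proxy))
        print(r.json()['headers'].get('User-Agent'))
    except Exception as exc:
        print("Request #%d via %s failed: %s" % (i, proxy, exc))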
I would suggest running a large number of requests and then visualizing the distribution of IPs you're getting. You can easily do this in your console with a for loop and a background curl command; see
https://weautomate.org/articles/load-testing-ip-rotation-proxy/
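The same idea in Python, for anyone not using curl: fire many requests through the rotating proxies and tally which exit IPs come back. This is only a sketch reusing the placeholder proxy list from above; a healthy rotation should show several different origin IPs rather than just one.
from collections import Counter
from itertools import cycle
import requests

proxies = ['121.129.127.209:80', '185.93.3.123:8080', '194.182.64.67:3128']
proxy_pool = cycle(proxies)
seen_ips = Counter()

for _ in range(50):
    proxy = next(proxy_pool)
    try:
        r = requests.get('https://httpbin.org/ip',
                         proxies={'http': proxy, 'https': proxy},
                         timeout=10)
        # httpbin.org/ip reports the IP the request appeared to come from
        seen_ips[r.json()['origin']] += 1
    except Exception:
        seen_ips['failed'] += 1

for ip, count in seen_ips.most_common():
    print("%s: %d" % (ip, count))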