According to the WebdriverIO documentation, I can set the browser's user agent for Chrome as below:
desiredCapabilities: {
    browserName: 'chrome',
    chromeOptions: {
        args: ['user-agent=Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X; en-us) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53']
    }
}
However, I could not find a way to override the user agent for Microsoft Edge using the desiredCapabilities config in WebDriver.
Based on the information available on the official EdgeDriver page, I do not think MS Edge provides an option to change the user agent: https://learn.microsoft.com/en-us/microsoft-edge/webdriver#w3c-webdriver
Also, here is the Javadoc for EdgeOptions: https://seleniumhq.github.io/selenium/docs/api/java/org/openqa/selenium/edge/EdgeOptions.html which lists the supported Edge options.
Solution for PHP
$options = new ChromeOptions();
$options->addArguments(['--user-agent=' . $userAgent]);
$seleniumServerCapabilities = DesiredCapabilities::microsoftEdge();
$seleniumServerCapabilities->setCapability('ms:edgeOptions', $options);
For more detail, check out the source code of the WebDriver: https://github.com/SeleniumHQ/selenium/blob/trunk/java/src/org/openqa/selenium/edge/EdgeOptions.java
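For comparison, here is a minimal Python sketch of the same idea expressed as a raw capabilities payload. It assumes a Chromium-based Edge driver, which reads Chrome-style arguments from the ms:edgeOptions capability; the legacy EdgeHTML driver is not expected to honor it. The helper name edge_capabilities is made up for illustration.

```python
def edge_capabilities(user_agent):
    """Build W3C-style capabilities asking Edge to use a custom user agent.

    Assumption: the driver is Chromium-based Edge, which accepts
    Chrome-style switches under "ms:edgeOptions".
    """
    return {
        "browserName": "MicrosoftEdge",
        "ms:edgeOptions": {
            # No extra quotes around the value: the argument is passed as-is.
            "args": ["--user-agent=" + user_agent],
        },
    }


caps = edge_capabilities("Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X)")
print(caps["ms:edgeOptions"]["args"][0])
```

The resulting dictionary is what a remote WebDriver session request would carry; how it is handed to a particular client library depends on that library.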
A quick Google search gave me these user agent formats for Edge:
For desktop:
Mozilla/5.0 (Windows NT 10.0; <64-bit tags>) AppleWebKit/<WebKit Rev> (KHTML, like Gecko) Chrome/<Chrome Rev> Safari/<WebKit Rev> Edge/<EdgeHTML Rev>.<Windows Build>
For mobile:
Mozilla/5.0 (WM 10.0; Android <Android Version>; <Device Manufacturer>; <Device Model>) AppleWebKit/<WebKit Rev> (KHTML, like Gecko) Chrome/<Chrome Rev> Mobile Safari/<WebKit Rev> Edge/<EdgeHTML Rev>.<Windows Build>
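As a rough illustration of how the Edge/<EdgeHTML Rev>.<Windows Build> token in these templates can be read back out of a user agent string, here is a small Python sketch. The function name parse_legacy_edge and the returned keys are assumptions for this example, and the "Mobile Safari/" check is only a heuristic based on the mobile template above.

```python
import re

# Matches the trailing "Edge/<EdgeHTML Rev>.<Windows Build>" token.
# Chromium-based Edge uses "Edg/" instead, which this pattern does not match.
EDGE_TOKEN = re.compile(r"\bEdge/(\d+)\.(\d+)\b")


def parse_legacy_edge(user_agent):
    """Return EdgeHTML revision and Windows build, or None if absent."""
    match = EDGE_TOKEN.search(user_agent)
    if match is None:
        return None
    return {
        "edgehtml": int(match.group(1)),
        "build": int(match.group(2)),
        # Heuristic: the mobile template contains "Mobile Safari/".
        "mobile": "Mobile Safari/" in user_agent,
    }


sample = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246")
print(parse_legacy_edge(sample))
```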
I've got an app that sends emails with links to allow end users to triage leads (mark them dead, sold, disqualified, etc.). The links go to a landing page on the server where the end user submits a POST form to register the update.
Looking at the server-side logs, when the end user clicks the link in Gmail (web client or an app), I can see the request from their browser to the server. Frequently, I'll see additional GET requests from Google (specifically cache.google.com) following the end user's click. These are probably OK, just Google doing some form of spam/malicious link checking.
Under some undefined circumstances, after the Google GETs, there is occasionally a POST. This seems plain wrong, or anti-social at least. It is problematic for me because it either registers an errant status update or submits a form that needs the end user to enter information.
Here's a log snippet showing an instance:
{ my IP redacted } test.com - [03/May/2022:16:51:27 -0400] "GET /app/?m=l&t=3Nj6QOWbhvmde35zPqBsuyl71YMTlEur&s=19 HTTP/1.1" 200 7543 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.4 Safari/605.1.15"
167.142.232.4 test.com - [03/May/2022:17:23:17 -0400] "GET /app/?m=l&t=3Nj6QOWbhvmde35zPqBsuyl71YMTlEur&s=19 HTTP/1.1" 200 7543 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36"
167.142.232.4 test.com - [03/May/2022:17:23:18 -0400] "GET /app/js/Common/emailValidation.js HTTP/1.1" 200 413 "https://test.com/app/?m=l&t=3Nj6QOWbhvmde35zPqBsuyl71YMTlEur&s=19" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36"
167.142.232.4 test.com - [03/May/2022:17:23:18 -0400] "GET /app/js/Leads/landing.js?4 HTTP/1.1" 200 7214 "https://test.com/app/?m=l&t=3Nj6QOWbhvmde35zPqBsuyl71YMTlEur&s=19" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36"
167.142.232.4 test.com - [03/May/2022:17:23:18 -0400] "GET /app/js/Leads/leadDetailValidators.js HTTP/1.1" 200 2812 "https://test.com/app/?m=l&t=3Nj6QOWbhvmde35zPqBsuyl71YMTlEur&s=19" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36"
167.142.232.4 test.com - [03/May/2022:17:23:19 -0400] "GET /app/images/email/logo.png HTTP/1.1" 200 4796 "https://test.com/app/?m=l&t=3Nj6QOWbhvmde35zPqBsuyl71YMTlEur&s=19" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36"
167.142.232.4 test.com - [03/May/2022:17:23:20 -0400] "POST /app/index.cgi HTTP/1.1" 302 - "https://test.com/app/?m=l&t=3Nj6QOWbhvmde35zPqBsuyl71YMTlEur&s=19" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36"
167.142.232.4 test.com - [03/May/2022:17:23:21 -0400] "GET /app/index.cgi?m=l&t=3Nj6QOWbhvmde35zPqBsuyl71YMTlEur HTTP/1.1" 200 4828 "https://test.com/app/?m=l&t=3Nj6QOWbhvmde35zPqBsuyl71YMTlEur&s=19" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36"
Details:
Line 1 - end user clicks
Lines 2-6 - google loading the landing page and resources.
Line 7 - google posting to the form on landing page
Line 8 - redirect to form submission results
Other Details Noted:
The landing page has some JavaScript that changes some of the form data in the POST. I can see that Google is running the JavaScript on the page.
I removed the JavaScript from the page and simplified the form to a bare minimum (no CSS, no JS, minimal markup, no submit button in the form), and the form still gets POSTed by Google.
Questions:
Has anyone else seen this behavior?
What triggers this behavior? (we have other applications that do similar operations with links and don't see extra GETs or POSTs when clicking through from gmail)
What are some other ways to debug the situation?
It may be due to the preview functionality of the Gmail application. If your users long-press the link in the email, a preview window pops up and navigates to the URL (at least on iOS). I assume this preview window doesn't navigate directly to the URL but goes through a reverse proxy that Google provides, and things get complicated from that point on when your users try to submit a form inside the preview window.
I'm not sure of this, but you can easily test the case to see whether it is the problem.
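To test this, one option is to scan the access log for POSTs that arrive from an address other than the user's own. Below is a rough Python sketch, assuming the log format shown in the question; the function name unexpected_posts, the shortened user agents, and the sample user IP 203.0.113.7 are placeholders, not values from the real logs.

```python
import re

# One access-log line in the format from the question:
# ip host - [time] "METHOD path PROTO" status size "referer" "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) (?P<host>\S+) \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def unexpected_posts(lines, user_ip):
    """Return parsed POST requests that did not come from user_ip."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        if match.group("method") == "POST" and match.group("ip") != user_ip:
            hits.append(match.groupdict())
    return hits


sample_log = [
    '203.0.113.7 test.com - [03/May/2022:16:51:27 -0400] '
    '"GET /app/?m=l&s=19 HTTP/1.1" 200 7543 "-" "Safari"',
    '167.142.232.4 test.com - [03/May/2022:17:23:20 -0400] '
    '"POST /app/index.cgi HTTP/1.1" 302 - "https://test.com/app/?m=l&s=19" "Chrome"',
]
for hit in unexpected_posts(sample_log, user_ip="203.0.113.7"):
    print(hit["time"], hit["ip"], hit["path"], hit["agent"])
```

Comparing the timestamps and user agents of the flagged POSTs against what the user reports doing (click versus long-press preview) should show whether the preview is involved.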
I'm using an email open-tracking mechanism that adds a hidden image with the URL https://example.com/tracking/open/SOME_UNIQUE_ID
Gmail converts the image URL to the following:
<img src="https://ci3.googleusercontent.com/proxy/LP0uwO5fHA2LPxEfKkef1e9imTurKBU5wawN6p8SArM9l6CRtsT_dmRtTqfZDVpmWRlhgnRqr0uA9QO7w85wlGOl5DUl2G4rZ-0JQI4pXmlzjGho6yWUCA03oRRfwDOvd5HeGokeHMpHFQ=s0-d-e1-ft#https://example.com/tracking/open/SOME_UNIQUE_ID" width="0" height="0" border="0" alt="" role="presentation" class="CToWUd">
The problem here is that I can't detect the real user agent or IP, because the request always comes from Google IPs with the user agent 'Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko Firefox/11.0 (via ggpht.com GoogleImageProxy)'.
I see that some people get the correct IP and location for Gmail open/click tracking.
I checked all the request headers but found nothing useful about the real user; everything relates to Google.
Any suggestions for this?
Thanks.
For me, taking the user agent value from the header helped. If the user agent is equal to "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246 Mozilla/5.0" then it's a Google bot.
private static bool IsGoogleBot(HttpRequest req)
{
    // Compare the incoming User-Agent header against the string seen from Google.
    var userAgent = req.Headers["User-Agent"];
    return userAgent == "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246 Mozilla/5.0";
}
The IP address will point to Google (on Gmail, I'm sure), but with this method you can detect a real email opening.
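The same kind of check can be applied to the image proxy user agent quoted in the question. Here is a minimal Python sketch, assuming the proxy keeps identifying itself with the GoogleImageProxy marker shown there; the function name is made up for illustration. Note that it only tells you the proxy fetched the image, not who the real user is.

```python
def is_google_image_proxy(user_agent):
    """Return True if the user agent carries the GoogleImageProxy marker.

    Assumption: the marker text is taken from the user agent quoted in
    the question and may change over time.
    """
    return "GoogleImageProxy" in (user_agent or "")


ua = ("Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko Firefox/11.0 "
      "(via ggpht.com GoogleImageProxy)")
print(is_google_image_proxy(ua))
```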
I need to open Instagram in mobile mode as it shows Messages tab only in mobile view. I want to know what Chrome does so I can replicate it in my Chrome extension.
I've already noticed Google Chrome updates the User Agent. I have however tried this and replaced the headers with "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Mobile Safari/537.36" but this doesn't seem enough for Instagram to show mobile view. I suspect Google Chrome does something extra or Instagram has some safety measure?
let mobileAgent = "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Mobile Safari/537.36";
chrome.webRequest.onBeforeSendHeaders.addListener(
    function(details) {
        // Replace the outgoing User-Agent header with the mobile one.
        for (var i = 0; i < details.requestHeaders.length; ++i) {
            if (details.requestHeaders[i].name === 'User-Agent') {
                details.requestHeaders[i].value = mobileAgent;
                break;
            }
        }
        return {requestHeaders: details.requestHeaders};
    },
    {urls: ["*://*.instagram.com/*"]},
    ["blocking", "requestHeaders"]);
With the above code, I checked the headers and sure enough they use the mobile agent, but Instagram still doesn't display the Messages tab.
Some webpages I encounter have links that are generated by JavaScript code, and I can only access them with PhantomJS as per the code below.
import contextlib
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = "Mozilla/5.0 (Windows Phone 10.0; Android 4.2.1; Microsoft; Lumia 640 XL LTE) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Mobile Safari/537.36 Edge/12.10166"
driverpjs = webdriver.PhantomJS("/Users/xx/Downloads/phantomjs-2.1.1-macosx/bin/phantomjs",desired_capabilities=dcap)
with contextlib.closing(driverpjs) as browser:
    browser.get(link)
    links = browser.find_elements_by_xpath('.//a')
How do I do this with Chrome? Right now I am trying the below:
options = webdriver.ChromeOptions()
options.add_argument("headless")
options.add_argument('--user-agent="Mozilla/5.0 (Windows Phone 10.0; Android 4.2.1; Microsoft; Lumia 640 XL LTE) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Mobile Safari/537.36 Edge/12.10166"')
driver = webdriver.Chrome(executable_path="/usr/local/bin/chromedriver", chrome_options=options)
with contextlib.closing(driver) as browser:
    browser.get(link)
    # GET ALL LINKS
    #links = browser.find_elements_by_css_selector("a")
    links = browser.find_elements_by_xpath('.//a')
To get all the links on a page with Chrome, emulating what you did with PhantomJS and contextlib, you can use the following solution:
Code Block:
from contextlib import closing
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("headless")
options.add_argument('--user-agent="Mozilla/5.0 (Windows Phone 10.0; Android 4.2.1; Microsoft; Lumia 640 XL LTE) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Mobile Safari/537.36 Edge/12.10166"')
driver = webdriver.Chrome(executable_path=r'C:\WebDrivers\chromedriver.exe', chrome_options=options)
with closing(driver) as browser:
    browser.get("https://www.google.com/")
    # get all the elements with name as q
    print(browser.find_elements_by_name('q'))
Console Output:
[<selenium.webdriver.remote.webelement.WebElement (session="ab581b3b679b521ffa5bf2220f801fcf", element="0.39081088826075705-1")>]
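The example above looks up elements by name; to collect the actual link targets, as the question asks, something like the following sketch could be used. It assumes a Selenium 4 style driver, where the find_elements_by_* helpers are gone and find_elements(by, value) is used instead (the string "xpath" is the value of By.XPATH). The function name collect_links is made up for illustration.

```python
def collect_links(driver):
    """Return the href of every anchor on the current page.

    Anchors without an href attribute are skipped. The driver only needs
    to offer find_elements(by, value), as Selenium 4 drivers do.
    """
    hrefs = []
    for anchor in driver.find_elements("xpath", ".//a"):
        href = anchor.get_attribute("href")
        if href:
            hrefs.append(href)
    return hrefs
```

With a real session this would be called as collect_links(browser) inside the with block, after browser.get(...).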
I have code for proxy IP rotation and user agent spoofing that I want to use for scraping. But because the code was provided as an example, I don't know whether it really works when I add it to my own code.
I am a beginner in Python. I just added it to my .py file (after the code that does the scraping). When I run it, the scraping works and gets all the data, but I don't know whether the rotation and spoofing are actually taking effect.
Do I have to create another file for this code (user agent spoofing and IP rotation)?
And how can I tell whether they are working while I scrape?
Does it matter that they have their own URLs defined?
Proxy Rotation:
from itertools import cycle
import requests

# If you are copy-pasting proxy IPs, put them in the list below.
proxies = ['121.129.127.209:80', '124.41.215.238:45169', '185.93.3.123:8080', '194.182.64.67:3128', '106.0.38.174:8080', '163.172.175.210:3128', '13.92.196.150:8080']
# get_proxies() is not defined in this snippet, so the call is commented out
# and the hard-coded list above is used instead.
# proxies = get_proxies()
proxy_pool = cycle(proxies)
url = 'https://httpbin.org/ip'
for i in range(1, 11):
    # Get the next proxy from the pool
    proxy = next(proxy_pool)
    print("Request #%d" % i)
    try:
        response = requests.get(url, proxies={"http": proxy, "https": proxy})
        print(response.json())
    except Exception:
        # Free proxies fail often; skip this one and move on to the next.
        print("Skipping. Connection error")
User Agent Spoofing:
import requests
import random

user_agent_list = [
    # Chrome
    'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
    # Internet Explorer
    'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)',
    'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)',
    'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)'
]
url = 'https://httpbin.org/user-agent'
# Let's make 5 requests and see what user agents are used
for i in range(1, 6):
    # Pick a random user agent
    user_agent = random.choice(user_agent_list)
    # Set the headers
    headers = {'User-Agent': user_agent}
    # Make the request
    response = requests.get(url, headers=headers)
    print("Request #%d\nUser-Agent Sent:%s\nUser Agent Received by HTTPBin:" % (i, user_agent))
    print(response.content)
    print("-------------------\n\n")
If you want to check whether your proxy and user agent are rotating, go to a request bin website, activate an endpoint, and use that endpoint in your Python code in place of the URL that was previously requested.
You would then open the request bin and read the user agent and IP address recorded for each GET request listed after running your Python code.
I would suggest running a large number of requests, then trying to visualize the distribution of IPs you're getting. You can easily do this in your console with a for loop and a background curl command; see
https://weautomate.org/articles/load-testing-ip-rotation-proxy/
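The same tally can also be done from Python. Here is a minimal sketch: it takes any callable that performs one request and returns the value you want to count (the reported IP or user agent). The helper name tally and the fake addresses in the demo are placeholders, not part of the original code.

```python
from collections import Counter
from itertools import cycle


def tally(fetch, attempts):
    """Call fetch() `attempts` times and count each distinct result.

    Failed calls are counted under "<failed>" so that dead proxies
    show up in the distribution instead of stopping the loop.
    """
    counts = Counter()
    for _ in range(attempts):
        try:
            counts[fetch()] += 1
        except Exception:
            counts["<failed>"] += 1
    return counts


if __name__ == "__main__":
    # Stand-in for real requests: alternates between two fake addresses.
    fake_ips = cycle(["198.51.100.1", "198.51.100.2"])
    print(tally(lambda: next(fake_ips), 6))
```

In a real run, fetch would wrap the request from the rotation code, for example returning requests.get('https://httpbin.org/ip', proxies=...).json()['origin']. If rotation works, the counts are spread across several addresses rather than piling up on your own IP.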