Why is there a different `User-Agent` for the same browser? - node.js

I have created a local Node server, and I'm printing the User-Agent from the `req` of a GET request, like this:
router.get('**', function (req, res, next) {
    if (req.header('User-Agent')) {
        console.log('user-agent = ', req.header('User-Agent'));
        res.end(req.header('User-Agent'));
    } else {
        res.send('Hello World!!!');
    }
});
Then it prints a different User-Agent for the `/` path and the `/favicon.ico` path on my OnePlus device.
Result:
/ = Mozilla/5.0 (Linux; Android 8.0.0; ONEPLUS A3003 Build/OPR6.170623.013) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.109 Mobile Safari/537.36
/favicon.ico = Mozilla/5.0 (Linux; Android 8.0.0; Build/OPR6.170623.013) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.109 Mobile Safari/537.36
Why are these two different User-Agent strings coming from the same browser?
I'm testing this in the Chrome browser.
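Comparing the two logged strings, the only difference is the device-model segment inside the first parentheses: the `/favicon.ico` request leaves out `ONEPLUS A3003`. Below is a minimal sketch for spotting this kind of difference in logs, assuming an Android Chrome-style User-Agent whose first `(...)` group looks like `Linux; Android <version>; <model> Build/<id>`. `parseAndroidUserAgent` is a hypothetical helper, not an Express or Node API, and it only shows *what* differs, not why Chrome sends the shorter string.

```javascript
// Hypothetical helper: extract the Android version, device model and build ID
// from the first "(...)" group of an Android Chrome-style User-Agent string.
function parseAndroidUserAgent(ua) {
  // e.g. "Linux; Android 8.0.0; ONEPLUS A3003 Build/OPR6.170623.013"
  const group = /\(([^)]*)\)/.exec(ua);
  if (!group) return null;

  const parts = group[1].split(';').map((s) => s.trim());
  const android = parts.find((p) => p.startsWith('Android '));
  const buildPart = parts.find((p) => p.includes('Build/'));

  // The model is whatever precedes "Build/" in that segment (may be empty).
  const m = buildPart ? /^(.*?)\s*Build\/(\S+)$/.exec(buildPart) : null;

  return {
    androidVersion: android ? android.slice('Android '.length) : null,
    model: m ? m[1] : null,
    build: m ? m[2] : null,
  };
}

// The two strings logged in the question:
const pageUA = 'Mozilla/5.0 (Linux; Android 8.0.0; ONEPLUS A3003 Build/OPR6.170623.013) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.109 Mobile Safari/537.36';
const faviconUA = 'Mozilla/5.0 (Linux; Android 8.0.0; Build/OPR6.170623.013) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.109 Mobile Safari/537.36';

console.log(parseAndroidUserAgent(pageUA).model);    // ONEPLUS A3003
console.log(parseAndroidUserAgent(faviconUA).model); // (empty string)
```

Logging the parsed fields next to `req.path` makes it easy to see which requests arrive without the model name.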

Related

Getting a 444 response code while trying to web scrape in Python using the requests library and Scrapy

I am trying to make a request to "https://www.walmart.com/search/?page=1&query=" using the requests library or the Scrapy module, but I get response code 444.
Here is my snippet:
import requests
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36',
    'Accept': 'application/json', 'Content-Type': 'application/json'
}
res = requests.get('https://www.walmart.com/', headers=headers)
cookie = res.cookies
res1 = requests.get('https://www.walmart.com/search/?page=1&query=', headers=res.headers, cookies=cookie)
But `res1.status_code` is 444. I would appreciate any help here.
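This is how you should reuse request elements: set the headers once on a `requests.Session` and make both requests through it, so the cookies from the first response are sent with the second. (The code in the question passes `headers=res.headers`, i.e. the first *response's* headers, as the request headers of the second call.)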
import requests
with requests.Session() as connection:
    connection.headers.update(
        {
            "Accept": "application/json",
            "Content-Type": "application/json",
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                          "AppleWebKit/537.36 (KHTML, like Gecko) "
                          "Chrome/89.0.4389.86 YaBrowser/21.3.0.740 "
                          "Yowser/2.5 Safari/537.36",
        }
    )
    _ = connection.get("https://www.walmart.com/")
    response = connection.get('https://www.walmart.com/search/?page=1&query=')
    print(response.status_code)
Output (status code):
200
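`requests.Session` helps because it keeps two things between calls: the default request headers and the cookies set by earlier responses. Below is a minimal JavaScript sketch of that idea with no networking and heavily simplified cookie handling (attributes such as `Path`, `Domain` and `Expires` are ignored). `MiniSession`, `absorbSetCookie` and `headersFor` are hypothetical names, not part of any library. Note that it builds *request* headers from its own defaults and never copies a previous response's headers.

```javascript
// Hypothetical minimal "session": default headers plus cookies remembered
// between requests, similar in spirit to Python's requests.Session.
class MiniSession {
  constructor(defaultHeaders = {}) {
    this.defaultHeaders = { ...defaultHeaders };
    this.cookies = new Map(); // cookie name -> value
  }

  // Store cookies from the Set-Cookie values of a response.
  // Only the leading "name=value" pair is kept; attributes are ignored.
  absorbSetCookie(setCookieValues) {
    for (const line of setCookieValues) {
      const pair = line.split(';')[0];
      const eq = pair.indexOf('=');
      if (eq > 0) {
        this.cookies.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
      }
    }
  }

  // Build the request headers for the next call: session defaults,
  // per-request extras, and a Cookie header with every stored cookie.
  headersFor(extra = {}) {
    const headers = { ...this.defaultHeaders, ...extra };
    if (this.cookies.size > 0) {
      headers.Cookie = [...this.cookies]
        .map(([name, value]) => `${name}=${value}`)
        .join('; ');
    }
    return headers;
  }
}

const session = new MiniSession({ 'User-Agent': 'Mozilla/5.0 (example)', Accept: 'application/json' });
session.absorbSetCookie(['sessionid=abc123; Path=/; HttpOnly']); // e.g. from the first response
console.log(session.headersFor().Cookie); // sessionid=abc123
```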

Node.js 504 error on server, but works on localhost

I am simply trying to get the locations of photos in my route and render the product.ejs file:
//Show individual product info
router.get('/product/:id', async function(req, res, next) {
    let filesFromFolder;
    Promise.all([
        database.retreaveImage(req.params.id)
    ]).then(resultArr => {
        filesFromFolder = resultArr[0];
        res.render('product.ejs', {
            productName: req.params.id,
            data: filesFromFolder
        });
    });
});
It works on localhost, but after I imported my route.js file on the real server, opening a product returns a 504 error.
I tried to follow these instructions, but they didn't help:
Getting 504 GATEWAY_TIMEOUT NodeJs
grep -i "504" /var/log/nginx/access.log
82.135.208.60 - - [16/Sep/2019:07:52:25 +0000] "GET /product/line_fan_pool HTTP/1.1" 504 594 "http://13.58.120.242:3000/horizontal" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36"
82.135.208.60 - - [16/Sep/2019:08:03:15 +0000] "GET /product/line_pool HTTP/1.1" 504 594 "http://13.58.120.242:3000/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36"
82.135.208.60 - - [16/Sep/2019:08:10:19 +0000] "GET /product/line_pool HTTP/1.1" 504 594 "http://13.58.120.242:3000/" "Mozilla/5.0 (Linux; Android 6.0; Nexus 5Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Mobile Safari/537.36"
The problem was that the Amazon web server did not support the old version of MySQL, so I had to update it. That solved my problem. I found the error by watching `pm2 monit`.
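The log lines above come from nginx, which answers 504 when the upstream Node process does not respond in time. This is not the fix used here (that was upgrading MySQL), but one way to make a stuck database call surface as an error instead of a silent timeout is to race it against a timer. Below is a minimal sketch; `withTimeout`, the 5000 ms value and the commented route are illustrative assumptions, with `database.retreaveImage` taken from the question.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms`
// milliseconds, so the route can answer with an error itself instead of
// leaving nginx to give up with a 504.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Whichever promise settles first wins; the timer is always cleared.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Possible use in the route from the question (Express and `database`
// are assumed to exist as in the original code; 5000 ms is arbitrary):
// router.get('/product/:id', function (req, res, next) {
//   withTimeout(database.retreaveImage(req.params.id), 5000, 'retreaveImage')
//     .then((files) => res.render('product.ejs', { productName: req.params.id, data: files }))
//     .catch(next);
// });
```

The timeout should be shorter than the proxy's own read timeout, otherwise nginx still gives up first.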
I got a 504 in my Node/Express app when something broke inside my http-proxy-middleware.
It was an uncaught Promise rejection, so the server never sent a response and the request silently timed out into the 504 instead of reporting the error.
So never forget to catch your Promises when they fail.
(Is this what happened here?)
//Show individual product info
router.get('/product/:id', async function(req, res, next) {
    let filesFromFolder;
    Promise.all([
        database.retreaveImage(req.params.id)
    ]).then(resultArr => {
        //...
    }).catch(next); // forward the error to Express instead of leaving the request hanging
});
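The `.catch(next)` pattern can be factored into a small wrapper so every async route forwards its errors. Below is a minimal sketch, assuming Express 4, which does not forward rejected promises from route handlers on its own (Express 5 does). `asyncHandler` is a hypothetical name; the commented route reuses `database.retreaveImage` from the question.

```javascript
// Hypothetical wrapper: run a (possibly async) route handler and pass any
// error, thrown synchronously or via a rejected promise, to Express's `next`,
// so the error handler responds instead of the request hanging into a 504.
function asyncHandler(fn) {
  return function (req, res, next) {
    Promise.resolve()
      .then(() => fn(req, res, next))
      .catch(next);
  };
}

// Possible use with the route from the question:
// router.get('/product/:id', asyncHandler(async (req, res) => {
//   const files = await database.retreaveImage(req.params.id);
//   res.render('product.ejs', { productName: req.params.id, data: files });
// }));
```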

Cookie jars: obtaining all cookies in Node.js using request-promise

I am struggling to make a successful request with the request-promise npm package to a site that requires a cookie to view it or for the request to succeed.
So I have looked into cookie jars in order to store all the cookies that are set in the response after the request has completed.
const rp = require("request-promise")
var cookieJar = rp.jar()
function grabcfToken(){
    let token = ""
    let options = {
        url : 'https://www.off---white.com/en/GB',
        method: "GET",
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
        resolveWithFullResponse : true
    }
    rp(options)
        .then((response)=>{
            console.log(response)
        })
        .catch((error)=>{
            console.log(error)
        })
}
Can someone tell me why the request isn't going through successfully? How do I apply the cookies that I initially get before being timed out?
const rp = require("request-promise")
var cookieJar = rp.jar()
function grabcfToken(){
    let token = ""
    let options = {
        url : 'https://www.off---white.com/en/GB',
        method: "GET",
        // request reads custom request headers from the `headers` option
        headers: {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
        },
        resolveWithFullResponse : true,
        // stores cookies from the response and sends them on later requests that use the same jar
        jar: cookieJar
    }
    rp(options)
        .then((response)=>{
            console.log(response)
        })
        .catch((error)=>{
            console.log(error)
        })
}
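If you're asking how to send the jar that you filled with the cookies from the first request, you have to add `jar: cookieJar` as part of your options object before sending the request, and pass the same jar to every later request. Also note that request reads custom request headers from the `headers` option, so the User-Agent belongs in `headers: { 'User-Agent': ... }` rather than in a top-level `'user-agent'` key.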
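Putting the answer together: below is a minimal sketch that builds the options for each call and reuses one jar. `makeOptions` is a hypothetical helper; the option names `url`, `method`, `headers`, `jar` and `resolveWithFullResponse` are the ones used with request-promise above. The runnable part only builds plain objects; the commented usage needs the `request` and `request-promise` packages installed.

```javascript
// Hypothetical helper that builds request-promise options sharing one jar.
const USER_AGENT =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36';

function makeOptions(url, jar) {
  return {
    url,
    method: 'GET',
    // Custom request headers go inside `headers`.
    headers: { 'User-Agent': USER_AGENT },
    // The same jar on every call: cookies set by one response are
    // sent back with the next request.
    jar,
    resolveWithFullResponse: true,
  };
}

// Usage (needs `npm install request request-promise`):
// const rp = require('request-promise');
// const jar = rp.jar();
// rp(makeOptions('https://www.off---white.com/en/GB', jar))
//   .then(() => rp(makeOptions('https://www.off---white.com/en/GB', jar)))
//   .then((response) => console.log(response.statusCode))
//   .catch(console.error);
```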

Using PhantomJS w/ Node and Setting Custom UserAgent

I am having trouble setting a custom user-agent for my phantom page. I have searched for possible solutions but I seem to be missing some fundamental part of how this should be working because when I try to set my settings, my phantom just hangs and doesn't complete the request or move into the page.open method. Here is my code:
phantom.create().then(function(ph) {
    ph.createPage().then(function(page) {
        page.set('settings.userAgent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.256');
        page.open(req.cookies.website).then(function(status) {
            page.property('content').then(function(content) {
                res.send(content);
                page.close();
                ph.exit();
            });
        });
    });
});
In case anyone is wondering, I solved it... I just needed to look more closely at the phantom npm documentation. Here is the solution if anyone else has the same problem:
phantom.create().then(function(ph) {
    ph.createPage().then(function(page) {
        page.setting('userAgent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.256');
        page.open(req.cookies.website).then(function(status) {
            page.property('content').then(function(content) {
                res.send(content);
                page.close();
                ph.exit();
            });
        });
    });
});

How to set the user agent string in the phantom module?

var phantom = require('phantom');
console.dir(phantom);
phantom.create(function(browser){
    browser.createPage(function(page){
        page.customHeaders={
            "HTTP_USER_AGENT": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36",
        };
        console.dir(page.settings);
        //undefined
        page.settings={};
        page.settings.userAgent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36';
        page.settings.HTTP_USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36';
        console.dir(page.settings);
        page.open('http://example.com/req.php', function() {
            setTimeout(function() {
                var output = page.evaluate(function() {
                    return document;
                });
                console.dir(output);
                //undefined
            }, 1000);
        });
    });
});
When I use PhantomJS, I try to set the userAgent header in three different ways, but when I visit the page and save the PHP $_SERVER array to a text file, I still see PhantomJS:
HTTP_USER_AGENT: Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.0.1-development Safari/538.1
Not only that, but the output of the page is also undefined.
It seems that the docs have changed, or I can't find the correct ones. I am looking at:
http://phantomjs.org/api/webpage/property/settings.html
https://www.npmjs.com/package/phantom
How is this used correctly?
According to the Functional Details in the docs, you have to set the user agent through page.set():
page.set('settings.userAgent', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36');
It has to be done this way because the bridge has to communicate with the PhantomJS process asynchronously; assigning properties on the local page object never reaches PhantomJS. This could probably have been implemented with Object.defineProperty.
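Here is a toy model of that point: the `page` object in Node is only a proxy, and the real settings live in the PhantomJS process. Assigning `page.settings` locally changes nothing there; only a call that goes through the bridge does. `FakePhantomProcess` and `PageProxy` are hypothetical stand-ins (no real IPC, no phantom package). The method name `setting` mirrors `page.setting()` from the newer phantom versions.

```javascript
// Toy stand-in for the separate PhantomJS process (not the real phantom code).
class FakePhantomProcess {
  constructor() {
    this.settings = { userAgent: 'PhantomJS default' };
  }
  // Simulates a message arriving asynchronously over the bridge.
  handle(message) {
    return new Promise((resolve) => {
      setImmediate(() => {
        if (message.type === 'setting') this.settings[message.key] = message.value;
        resolve(this.settings[message.key]);
      });
    });
  }
}

// Toy stand-in for the Node-side page object.
class PageProxy {
  constructor(processHandle) {
    this.process = processHandle; // plays the role of the bridge
  }
  // Goes through the bridge, so the remote settings actually change.
  setting(key, value) {
    return this.process.handle({ type: 'setting', key, value });
  }
}

const remote = new FakePhantomProcess();
const page = new PageProxy(remote);

page.settings = { userAgent: 'Mozilla/5.0 (X11; Linux x86_64)' }; // local object only
console.log(remote.settings.userAgent); // PhantomJS default

page.setting('userAgent', 'Mozilla/5.0 (X11; Linux x86_64)').then(() => {
  console.log(remote.settings.userAgent); // Mozilla/5.0 (X11; Linux x86_64)
});
```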
If you want to set multiple settings at once, you can do (ref):
page.set('settings', {
    userAgent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11",
    javascriptEnabled: false,
    loadImages: false
});
You can find a list of settings that you can set in page.settings.
Currently (27.01.2018), with these dependency versions:
phantom: ^4.0.12,
webpage: ^0.3.0
I use this method to set this property:
page.setting(key, value);
I checked it with PHP via the $_SERVER array. It works correctly.
The complete code looks like this:
const phantom = require('phantom');
(async function() {
    const instance = await phantom.create();
    const page = await instance.createPage();
    page.setting('userAgent', "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11");
    await page.on('onResourceRequested', function(requestData) {
        //Dump request settings to view result of our changes:
        console.info('Requesting', requestData);
    });
    const status = await page.open('https://stackoverflow.com');
    const content = await page.property('content');
    //console.log(content);
    await instance.exit();
})();
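The `onResourceRequested` listener is a convenient place to confirm that the custom User-Agent is actually sent. Below is a minimal sketch of a lookup helper, assuming `requestData.headers` is an array of `{ name, value }` objects as described in the PhantomJS `onResourceRequested` documentation. `findHeader` and the `sample` object are hypothetical; the sample is hand-written, not real PhantomJS output.

```javascript
// Hypothetical helper: case-insensitive header lookup in a requestData
// object, assumed to look like { url, headers: [{ name, value }, ...] }.
function findHeader(requestData, headerName) {
  const wanted = headerName.toLowerCase();
  const match = (requestData.headers || []).find(
    (h) => String(h.name).toLowerCase() === wanted
  );
  return match ? match.value : undefined;
}

// Hand-written example object (not captured from PhantomJS):
const sample = {
  url: 'https://stackoverflow.com/',
  headers: [
    { name: 'User-Agent', value: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11' },
    { name: 'Accept', value: 'text/html' },
  ],
};

console.log(findHeader(sample, 'user-agent'));
```

Inside the listener you could then log `findHeader(requestData, 'User-Agent')` instead of dumping the whole object.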
