Website access denied using puppeteer on cloud functions - node.js

I am trying to scrape this URL https://www.myntra.com/laptop-bag/chumbak/chumbak-unisex-brown-geo-bird--printed-laptop-bag/6795882/buy using Puppeteer.
It works when I use { headless: false }, but fails in headless mode.
I then compared the responses in both cases using this:
const resp = await page.goto(url);
console.log(resp);
Then I figured out that a user agent has to be set when using headless mode, so I added this:
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36');
Now it works in both cases locally. But when I deploy to a Cloud Function, it still fails.
This is the screenshot taken using Puppeteer.
This is part of the response log:
_headers:
{ status: '403',
server: 'AkamaiGHost',
'mime-version': '1.0',
'content-type': 'text/html',
'content-length': '395',
expires: 'Thu, 09 Jul 2020 12:16:30 GMT',
date: 'Thu, 09 Jul 2020 12:16:30 GMT',
'set-cookie': 'AKA_A2=A; expires=Thu, 09-Jul-2020 13:16:30 GMT........
Am I missing anything?
Thanks.
Update:
I have used the Puppeteer stealth plugin along with IP rotation. Here is the code:
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth')
puppeteer.use(StealthPlugin())
const AdblockerPlugin = require('puppeteer-extra-plugin-adblocker')
puppeteer.use(AdblockerPlugin({ blockTrackers: true }))
And for IP rotation:
var browser = await puppeteer.launch({
headless: true,
args: ['--proxy-server=abcd-efg.proxymesh.com:12345']
});
var page = await browser.newPage();
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36');
await page.authenticate({
username: 'myusername',
password: 'mypassword'
});
IP rotation works locally, but the request is still blocked on the Cloud Function.

Using residential proxies fixed the issue.
Initially I deployed to a Cloud Function and to AWS Lambda with IP rotation through the ProxyMesh service, but it provides data center proxies only, and the requests were still blocked. I then tried residential proxies from another service, and that worked.
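To notice this kind of block early, the status and headers from the response log above can be checked in code. This is only a rough sketch: looksBlocked is a hypothetical helper (not part of Puppeteer), and matching on the server header is an assumption based on the single 403 response shown in the question.

```javascript
// Hypothetical helper, not a Puppeteer API: returns true when a response
// resembles the block page from the log above (HTTP 403 served by Akamai).
function looksBlocked(status, headers) {
  const server = String(headers.server || '').toLowerCase();
  return Number(status) === 403 && server.includes('akamai');
}

// Usage with Puppeteer (assumes `page` and `url` already exist):
// const resp = await page.goto(url);
// if (looksBlocked(resp.status(), resp.headers())) {
//   // e.g. retry through a different (residential) proxy
// }
```

A check like this does not bypass anything; it only makes it visible when the current proxy is being rejected.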

Related

Heroku. Should I use a Web or Worker Process?

I'm very new to Heroku and I don't know when to use web dynos or workers. My code makes HTTP requests and downloads archives from an external site. What I want to know is whether it should be a worker or a web dyno.
const https = require('https');
const fs = require("fs");
const tiktok = require("tiktok-scraper");
var link
(async () => {
try {
const posts = await tiktok.user('doarda', { number: 100 });
link = posts.collector[0].videoUrl
const optionsRequest = {
headers: {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"Accept-Language": "pt-BR,en-US;q=0.7,en;q=0.3",
"Accept-Encoding": "gzip, deflate, br",
"Connection": "keep-alive",
"Referer": 'https://www.tiktok.com/',
"Upgrade-Insecure-Requests": "1"
}
}
const file = fs.createWriteStream(posts.collector[0].id +".mp4");
const request = await https.get(link,optionsRequest, function(response) {
response.pipe(file)
});
} catch (error) {
console.log(error);
}
})();
You need a web dyno if your application is going to accept incoming HTTP requests. It binds to a given port (the $PORT environment variable) when it starts, and the URL will be something like myapp.herokuapp.com.
A worker, on the other hand, does not need to accept incoming connections: it is typically a backend process that performs some logic. Note that you can still initiate outgoing connections from a worker (i.e. connect to cloud services or web sites).
Also note that web processes are given their requests through Heroku's dyno manager, meaning that your web process (in the free tier) will run only when it has active requests.
Worker processes, on the other hand, run until you stop them (either with the CLI or the website).
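For comparison, here is a minimal sketch of what a web process looks like, using only Node's built-in http module. The names resolvePort and createWebServer, and the 3000 fallback for local runs, are assumptions made for illustration.

```javascript
const http = require('http');

// Heroku tells a web dyno which port to bind through the PORT variable.
// The fallback of 3000 is an assumption for running locally.
function resolvePort(env) {
  const port = Number.parseInt(env.PORT, 10);
  return Number.isInteger(port) && port > 0 ? port : 3000;
}

// A web process exists to answer incoming HTTP requests.
function createWebServer() {
  return http.createServer((req, res) => {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('ok\n');
  });
}

// Started only in the web process, for example from server.js:
// createWebServer().listen(resolvePort(process.env));
```

A script like the one in the question only makes outgoing requests and never listens on a port, so it fits the worker description.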

How to add custom headers in Playwright

headers["user-agent"] = fakeUa();
console.log(fakeUa())
let firstReq = true;
page.route('**/*', route => {
const request = route.request()
//console.log(request.url(), JSON.stringify(request.headers()));
if("x-j3popqvx-a" in request.headers()){
headers = request.headers();
//console.log(headers);
console.log("exiting");
return;
}
else {
console.log("in");
return route.continue({headers: headers});
}
});
let pageRes = await page.goto(url, {waitUntil: 'load', timeout: 0});
I want to add a fake user agent when sending the request to the URL, but it is not applied; the request goes out with the default one instead.
While in puppeteer it was possible with the page.setUserAgent() method to apply a custom UA and page.setExtraHTTPHeaders() to set any custom headers, in playwright you can set custom user agent (userAgent) and headers (extraHTTPHeaders) as options of browser.newPage() or browser.newContext() like:
const page = await browser.newPage({ userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36' })
const page = await browser.newPage({
extraHTTPHeaders: {
'Cache-Control': 'no-cache'
}
})
Edit: In case you are using it with newContext() usage looks like this (make sure to set userAgent in the settings of newContext and not in newPage!):
const context = await browser.newContext({ userAgent: 'hello' })
const page = await context.newPage()
// to check the UA:
console.log(await page.evaluate(() => navigator.userAgent))
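If the headers really have to be changed per request in a route handler, as in the question, one possible approach is to merge the overrides into the request's own headers before continuing. This is a sketch only: mergeHeaders is a hypothetical helper, and the user agent string in the usage comment is a placeholder.

```javascript
// Hypothetical helper: returns a copy of `original` with `overrides` applied.
// Names are lower-cased so 'User-Agent' and 'user-agent' cannot both end up
// in the result (Playwright reports request header names in lower case).
function mergeHeaders(original, overrides) {
  const merged = {};
  for (const [name, value] of Object.entries(original)) {
    merged[name.toLowerCase()] = value;
  }
  for (const [name, value] of Object.entries(overrides)) {
    merged[name.toLowerCase()] = value;
  }
  return merged;
}

// Usage inside a route handler (assumes a Playwright `page` exists):
// await page.route('**/*', route =>
//   route.continue({
//     headers: mergeHeaders(route.request().headers(), { 'User-Agent': 'my-agent/1.0' }),
//   })
// );
```

Setting userAgent on the context is still simpler, since it also changes navigator.userAgent inside the page, which a route handler does not.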
If you're using @playwright/test, you can set a user agent as follows:
import {expect, test} from "@playwright/test"; // ^1.30.0
const userAgent =
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36";
test.describe("with user agent", () => {
test.use({userAgent});
test("does stuff", async ({page}) => {
await page.goto("https://example.com/");
await expect(page.locator("h1")).toHaveText("Example Domain");
});
});

CookieJars obtaining all cookies, nodeJS using request-promise

I am struggling to make a successful request using the request-promise npm package on a site that requires a cookie for the request to succeed.
Therefore, I have looked into cookie jars in order to store the cookies that are set in the response once the request has been made.
const rp = require("request-promise")
var cookieJar = rp.jar()
function grabcfToken(){
let token = ""
let options = {
url : 'https://www.off---white.com/en/GB',
method: "GET",
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
resolveWithFullResponse : true
}
rp(options)
.then((response)=>{
console.log(response)
})
.catch((error)=>{
console.log(error)
})
}
Can someone tell me why the request isn't going through? How do I apply the cookies that I initially get before being timed out?
const rp = require("request-promise")
var cookieJar = rp.jar()
function grabcfToken(){
let token = ""
let options = {
url : 'https://www.off---white.com/en/GB',
method: "GET",
headers: { 'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36' }, // request reads headers from the `headers` option
resolveWithFullResponse : true,
jar: cookieJar
}
rp(options)
.then((response)=>{
console.log(response)
})
.catch((error)=>{
console.log(error)
})
}
If you are asking how to send the cookies collected in your jar along with the request, you have to add jar: cookieJar to your options object before sending it.
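To make it clearer what the jar does for you, here is a deliberately simplified sketch of the idea: cookies from Set-Cookie response headers are stored, then sent back in a Cookie request header. This is not how request-promise implements it; storeCookies and cookieHeader are hypothetical names, and the real jar also tracks domain, path, and expiry, which this sketch ignores.

```javascript
// Store the "name=value" part of each Set-Cookie header in a Map.
// Attributes such as Path, Expires, or HttpOnly are ignored in this sketch.
function storeCookies(jar, setCookieHeaders) {
  for (const header of setCookieHeaders) {
    const pair = header.split(';')[0]; // "name=value" always comes first
    const eq = pair.indexOf('=');
    if (eq > 0) {
      jar.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
    }
  }
  return jar;
}

// Build the value of the Cookie header for the next request.
function cookieHeader(jar) {
  return [...jar].map(([name, value]) => `${name}=${value}`).join('; ');
}

const exampleJar = storeCookies(new Map(), [
  'session=abc123; Path=/; HttpOnly',
  'theme=dark; Max-Age=3600',
]);
console.log(cookieHeader(exampleJar));
```

Passing the same jar object to every request is what lets cookies from the first response reach the second request.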

node js request proxy

I send a request through a proxy and always receive such a response
tunneling socket could not be established, cause=read ECONNRESET
or
tunneling socket could not be established, cause= socket hang up
My code
let settings = {
url: `url`,
headers: {
'Connection': 'keep-alive',
'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36"
},
method: 'POST',
proxy: `http://${ip}:${port}`,
strictSSL: false
}
request.request(settings, (err, response, body) => {
// err here
})
What am I doing wrong?
Now I get this error: Error: Tunnel creation failed. Socket error: Error: read ECONNRESET
My code:
const request = require('request');
const proxyingAgent = require('proxying-agent');
let settings = {
url: url,
headers: {
'Connection': 'keep-alive',
'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36"
},
method: 'POST',
// proxy: `http://${obj.proxy[obj.proxyIdx]}`,
agent: proxyingAgent.create(`http://${obj.proxy[obj.proxyIdx]}`, url),
}
As for your code, the problem possibly lies in your settings object.
You need to use syntax like this:
let settings = {
url,
headers: {
'Connection': 'keep-alive',
'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36"
},
method: 'POST',
proxy: `http://${ip}:${port}`,
strictSSL: false
}
Here we use the ES6 shorthand property syntax to make the object shorter.
You can also establish the proxy connection with the npm package proxying-agent.
Your code should look something like this:
const proxyingAgent = require('proxying-agent');
const fetch = require('node-fetch');
const host = '<your host>';
const port = '<proxy port>'; // declared once; a second `const port` would be a SyntaxError
const creds = {
login: 'username',
password: 'pass'
};
const buildProxy = (url) => {
return {
agent: proxyingAgent.create(`http://${creds.login}:${creds.password}@${host}:${port}`, url)
};
};
//If you don't have credentials for proxy, you can rewrite function
const buildProxyWithoutCreds = (url) => {
return {
agent: proxyingAgent.create(`http://${host}:${port}`, url)
};
};
And then you can use it with your URL and credentials. We'll use the node-fetch package.
const proxyGetData = async (url, type) => {
try {
const proxyData = buildProxyWithoutCreds(url);
// Make request with proxy. Here we use promise based library node-fetch
let req = await fetch(url, proxyData);
if (req.status === 200) {
return await req[type]();
}
return false;
} catch (e) {
throw new Error(`Error during request: ${e.message}`);
}
};
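One detail worth a sketch is the proxy URL itself, which has the form http://user:password@host:port. If the credentials contain characters such as '@' or ':', they can break that format unless they are percent-encoded. buildProxyUrl below is a hypothetical helper and the host is a placeholder; whether a particular proxy library decodes the encoded credentials is an assumption you should verify.

```javascript
// Hypothetical helper: builds "http://user:pass@host:port", or
// "http://host:port" when no username is given.
function buildProxyUrl({ host, port, username, password }) {
  const auth = username
    ? `${encodeURIComponent(username)}:${encodeURIComponent(password || '')}@`
    : '';
  return `http://${auth}${host}:${port}`;
}

// Without credentials
console.log(buildProxyUrl({ host: 'proxy.example.com', port: 8080 }));

// With credentials containing reserved characters
console.log(buildProxyUrl({
  host: 'proxy.example.com',
  port: 8080,
  username: 'user',
  password: 'p@ss:word',
}));
```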

Not able to call a webservice from node js

I am trying to call a SOAP web service from Node.js, but the server does not show me any output, neither success nor error. I am using an HTTP server for this purpose.
My output is always:
[Thu Dec 15 2016 15:12:16 GMT+0530 (India Standard Time)] "GET /views/bookNGo.html?empId=EMP2&empName=ABCD&mobile=3312&alternateMobile=678&address=hjhjh&landMark=hjgjhgj&deptId=hjhj&projectId=hjgh&approvingManager=kjhkj&pickupPoint=jkhjkh&dropPoint=jkhjk" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"
var soap = require('soap');
console.log('One');
var url = 'https://ap2.salesforce.com/services/wsdl/class/InsertEmployee?WSDL';
var args = {"employeeId": "EMP2","employeeName": "Chandra","phoneNumber":"12345","alternatePhoneNumber": "98765","address": "Boduppal", "landMark": "Temple","departmentID": "001","projectID": "001","approvingManager": "Prashanth","pikupPoint": "Uppal","dropPoint":"Hitec"};
soap.createClient(url, function(err, client) {
client.createEmployee(args, function(err, result) {
console.log("Webservice called");
console.log(result);
});
});
I have tried running the code in https://runkit.com/npm/soap and it gives me an error saying "TypeError: Cannot read property 'InsertEmployeeService' of undefined".
