Heroku. Should I use a Web or Worker Process? - node.js

I'm very new to Heroku and I don't know when to use web dynos and when to use workers. My code makes HTTP requests and downloads files from an external site. What I want to know is whether it should run as a worker or as a web dyno.
const https = require('https');
const fs = require("fs");
const tiktok = require("tiktok-scraper");

var link;

(async () => {
    try {
        const posts = await tiktok.user('doarda', { number: 100 });
        link = posts.collector[0].videoUrl;
        const optionsRequest = {
            headers: {
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36",
                "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
                "Accept-Language": "pt-BR,en-US;q=0.7,en;q=0.3",
                "Accept-Encoding": "gzip, deflate, br",
                "Connection": "keep-alive",
                "Referer": 'https://www.tiktok.com/',
                "Upgrade-Insecure-Requests": "1"
            }
        };
        const file = fs.createWriteStream(posts.collector[0].id + ".mp4");
        const request = await https.get(link, optionsRequest, function (response) {
            response.pipe(file);
        });
    } catch (error) {
        console.log(error);
    }
})();

You need a web dyno if your application is going to accept incoming HTTP requests. It binds to the port Heroku assigns (the $PORT env variable) when it starts, and it gets a URL like myapp.herokuapp.com.
A worker, on the other hand, does not require inbound connectivity: it is typically a backend process that performs some logic. Note that a worker can still open outgoing connections (e.g. to a cloud service or to external web sites).
Also note that web dynos receive their requests through Heroku's router, and on the free tier a web dyno sleeps when it has no incoming traffic, so it effectively only runs while it is serving requests.
Worker processes, by contrast, keep running until you stop or scale them down (either with the CLI or from the dashboard).
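Since the script above only makes outgoing requests and never needs to accept traffic, a worker dyno is the natural fit. A minimal Procfile sketch, assuming the script is saved as scraper.js (a hypothetical filename):

worker: node scraper.js

You would then start it with heroku ps:scale worker=1; there is no need to declare a web process type at all for this app.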

Related

Web scraping using fetch - promise doesn't resolve

I am trying to fetch a particular website, and I have already mimicked all the request headers that Chrome sends, but I still get a pending promise that never resolves.
Here is my current code and headers:
const fetch = require('node-fetch');

(async () => {
    console.log('Starting fetch');
    const fetchResponse = await fetch('https://www.g2a.com/rocket-league-pc-steam-key-global-i10000003107015', {
        method: 'GET',
        headers: {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
            'Accept-Language': 'en-US;q=0.7,en;q=0.3',
            'Accept-Encoding': 'gzip, deflate, br'
        }
    });
    console.log('I never see this console.log: ', fetchResponse);
    if (fetchResponse.ok) {
        console.log('ok');
    } else {
        console.log('not ok');
    }
    console.log('Leaving...');
})();
These are the console logs I get:
Starting fetch
This is a pending promise: Promise { <pending> }
not ok
Leaving...
Is there something I can do here? I noticed on similar questions that, for this specific website, I only need the Accept-Language header. I already tried that, but the promise still never resolves.
I also read on another question that they have protection against Node.js requests; maybe I need to use another language?
You'll have a better time using async functions and await instead of then here.
I'm assuming your Node.js doesn't support top-level await, hence the last .then.
const fetch = require("node-fetch");
const headers = {
"User-Agent":
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36",
"Accept-Language": "en-US;q=0.7,en;q=0.3",
"Accept-Encoding": "gzip, deflate, br",
};
async function doFetch(url) {
console.log("Starting fetch");
const fetchResponse = await fetch(url, {
method: "GET",
headers,
});
console.log(fetchResponse);
if (!fetchResponse.ok) {
throw new Error("Response not OK");
}
const data = await fetchResponse.json();
return data;
}
doFetch("https://www.g2a.com/rocket-league-pc-steam-key-global-i10000003107015").then((data) => {
console.log("All done", data);
});
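One caveat on this sketch: fetchResponse.json() assumes the endpoint returns JSON. If the page actually serves HTML (as a product page usually does), json() will reject, and fetchResponse.text() would be the call to use instead.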

Website access denied using puppeteer on cloud functions

I am trying to scrape this URL https://www.myntra.com/laptop-bag/chumbak/chumbak-unisex-brown-geo-bird--printed-laptop-bag/6795882/buy using Puppeteer.
It works when I use { headless: false }, but fails in headless mode.
I then compared the response in both cases using this:
const resp = await page.goto(url);
console.log(resp);
I then figured out that we need to set a userAgent when running headless, so I added this:
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36');
Now it works in both cases locally, but when I deploy it to a Cloud Function it still fails.
This is the screenshot taken with Puppeteer.
This is part of the response log:
_headers: {
    status: '403',
    server: 'AkamaiGHost',
    'mime-version': '1.0',
    'content-type': 'text/html',
    'content-length': '395',
    expires: 'Thu, 09 Jul 2020 12:16:30 GMT',
    date: 'Thu, 09 Jul 2020 12:16:30 GMT',
    'set-cookie': 'AKA_A2=A; expires=Thu, 09-Jul-2020 13:16:30 GMT........
Am I missing anything?
Thanks.
Update:
I have used the Puppeteer stealth plugin along with IP rotation. Here is the code:
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth')
puppeteer.use(StealthPlugin())
const AdblockerPlugin = require('puppeteer-extra-plugin-adblocker')
puppeteer.use(AdblockerPlugin({ blockTrackers: true }))
And for IP rotation:
var browser = await puppeteer.launch({
    headless: true,
    args: ['--proxy-server=abcd-efg.proxymesh.com:12345']
});
var page = await browser.newPage();
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36');
await page.authenticate({
    username: 'myusername',
    password: 'mypassword'
});
IP rotation works locally, but it is still blocked on the Cloud Function.
Using residential proxies fixed the issue.
Initially I deployed to a Cloud Function and to AWS Lambda with IP rotation, using the proxymesh service, but it only provides data-center proxies and that failed. Then I tried residential proxies from another service, and it worked.
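The source doesn't show the working launch code, but the change amounts to pointing --proxy-server at the residential provider. A sketch of what that might look like (the endpoint and credentials below are hypothetical placeholders):

var browser = await puppeteer.launch({
    headless: true,
    args: ['--proxy-server=proxy.residential-provider.example:8000'] // hypothetical residential endpoint
});
var page = await browser.newPage();
await page.authenticate({
    username: 'myusername', // placeholder credentials
    password: 'mypassword'
});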

Node JS https request socket hang up (mikeal/request module)

I'm brand new to Node.js (v10.9.0) and wanted to make a simple web scraping tool that gets statistics and ranks for players on this page. No matter what I try, I can't make it work with this website. I tried multiple request methods, including http.request and https.request, and got every method working with 'http://www.google.com'. However, every attempt against this specific website gives me either a 301 error or a socket hang up error. The location the 301 error gives me is the same link but with a '/' on the end, and requesting that results in a socket hang up. I know the site runs on port 443. Do some sites just block Node.js? Why are browsers able to connect but not something like this?
Please don't link me to other threads; I've seen every single one and none of them helped.
var request = require('request');

var options = {
    method: "GET",
    uri: 'https://www.smashboards.com',
    rejectUnauthorized: false,
    port: '443'
};

request(options, function (error, response, body) {
    console.log('error:', error); // Print the error if one occurred
    console.log('statusCode:', response && response.statusCode); // Print the response status code if a response was received
    console.log('body:', body); // Print the HTML for the page
});
Error:
error: { Error: socket hang up
    at createHangUpError (_http_client.js:322:15)
    at TLSSocket.socketOnEnd (_http_client.js:425:23)
    at TLSSocket.emit (events.js:187:15)
    at endReadableNT (_stream_readable.js:1085:12)
    at process._tickCallback (internal/process/next_tick.js:63:19) code: 'ECONNRESET' }
EDIT:
Adding this to my options object fixed my problem
headers: {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
}
OP Here
All I did was add:
headers: {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
}
to my options object, and it's working perfectly.
New code:
var request = require('request');

var options = {
    method: 'GET',
    uri: 'https://www.smashboards.com',
    rejectUnauthorized: false,
    headers: {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
    }
};

request(options, function (error, response, body) {
    console.log('error:', error); // Print the error if one occurred
    console.log('statusCode:', response && response.statusCode); // Print the response status code if a response was received
    console.log('body:', body); // Print the HTML for the page
});
That's 12+ hours I'm never getting back.

How to do a get request and get the same results that a browser would with Nodejs?

I'm trying to do a GET request for an image search, and I'm not getting the same result that I get in my browser. Is there a way to get the same result using Node.js?
Here's the code I'm using:
const https = require('https');

var keyword = "Photographie";
keyword = keyword.replace(/[^a-zA-Z0-9éàèùâêîôûçëïü]/g, "+");

var httpOptions = {
    hostname: 'yandex.com',
    path: '/images/search?text=' + keyword, // path does not accept spaces or dashes
    headers: { 'Content-Type': 'application/x-www-form-urlencoded', 'user-agent': 'Mozilla/5.0' }
};

console.log(httpOptions.hostname + httpOptions.path);

https.get(httpOptions, (httpResponse) => {
    console.log(`STATUS: ${httpResponse.statusCode}`);
    httpResponse.setEncoding('utf8');
    httpResponse.on('data', (htmlBody) => {
        console.log(`BODY: ${htmlBody}`);
    });
});
By switching to the request-promise library and using the proper capitalization of the User-Agent header name and an actual user agent string from the Chrome browser, this code works for me:
const rp = require('request-promise');

let keyword = "Photographie";

let options = {
    url: 'http://yandex.com/images/search?text=' + keyword,
    headers: {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
};

rp(options).then(response => {
    console.log(response);
}).catch(err => {
    console.log(err);
});
When I try to run your actual code, I get a 302 redirect and a cookie set. I'm guessing that they expect you to follow the redirect and retain the cookie. But you can apparently just switch to the above code, and it appears to work for me. I don't know exactly what makes my code work, but it could be that it has a more recognizable user agent.
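If the redirect-and-cookie theory matters for other pages, request-promise can also keep cookies across redirects via its jar option; a small sketch of that variant (everything else unchanged from the code above):

const rp = require('request-promise');

let options = {
    url: 'http://yandex.com/images/search?text=Photographie',
    jar: true, // keep cookies between the redirect and the follow-up request
    headers: {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
};

// GET redirects are followed by default, so no extra option is needed for that
rp(options).then(response => {
    console.log(response.length);
}).catch(err => {
    console.log(err);
});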

Error 503 trying to delete in mongodb using node and hosted in heroku

I'm using Mongoose to connect to MongoLab, hosted on Heroku. GET, POST and PUT work perfectly, but DELETE is the "problem".
When I try to delete, this is what I get at first:
Request URL:https://---------.herokuapp.com/------/54b413c2647bec02001efdd0
Request Headers
Provisional headers are shown
Accept:application/json, text/plain, */*
Origin:null
User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36
After 30 seconds, more or less, I get this:
Remote Address:150.100.2.200:8080
Request URL:https://---------.herokuapp.com/------/54b413c2647bec02001efdd0
Request Method:DELETE
Status Code:503 Service Unavailable
Request Headers
Accept:application/json, text/plain, */*
Accept-Encoding:gzip, deflate, sdch
Accept-Language:en-US,en;q=0.8,es-419;q=0.6,es;q=0.4,fr;q=0.2
Host:-----.herokuapp.com
Origin:null
User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36
Response Headers
Cache-Control:no-cache, no-store
Connection:keep-alive
Content-Length:484
Content-Type:text/html; charset=utf-8
Date:Mon, 12 Jan 2015 20:36:33 GMT
Server:Cowboy
This is what I get: a 503 error, but if I check on the MongoLab web page the item is already gone. When I tested it on my local machine everything worked fine, including the DELETE method, but on Heroku I get this problem.
- CORS is enabled.
- These are 3 different ways I tried, all giving the same result:
exports.deleteNotificacion = function(req, res) {
    var id = req.params.id;
    console.log(id);
    Todo.findById(id, function(err, notificacion) {
        notificacion.remove();
        notificacion.save();
    });
};

exports.deleteNotificacion = function(req, res) {
    var id = req.params.id;
    console.log(id);
    Todo.findById(id, function(err, notificacion) {
        notificacion.remove(function(err) {
            if (!err) {
                res.send('');
                console.log('Removed');
            } else {
                console.log('ERROR: ' + err);
            }
        });
    });
};

exports.deleteNotificacion = function(req, res) {
    var id = req.params.id;
    console.log(id);
    Todo.findByIdAndRemove(id, function(err) {
        if (err) { console.log("ERROR " + err); }
        // res.send("eliminado");
    });
};
The code samples you posted are difficult to read, but it seems that you simply don't send anything back to the client when the request succeeds. After 30 seconds of waiting, the Heroku router times out and your client gets a 503 page, as described in this blog post.
One of the correct responses to a DELETE request is to send an HTTP 204 status code with no body:
res.status(204).end();
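Applied to your third sample, a minimal sketch of a handler that always answers the client (so the Heroku router never has to time out) might look like this:

exports.deleteNotificacion = function(req, res) {
    var id = req.params.id;
    Todo.findByIdAndRemove(id, function(err) {
        if (err) {
            console.log('ERROR ' + err);
            return res.status(500).end(); // report the failure instead of leaving the request hanging
        }
        res.status(204).end(); // deleted: respond with 204 No Content
    });
};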
