Request blocked if it is sent by Node.js axios - node.js

I am using axios with an API (the CoWIN API, https://apisetu.gov.in/public/marketplace/api/cowin/cowin-public-v2) that has fairly strict protection against automated web requests.
When I got a 403 error on my dev machine (Windows), I solved it just by adding a 'User-Agent' header.
Now that I have deployed it to Heroku, I am getting the same error again.
const { data } = await axios.get(url, {
  headers: {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36',
  },
})

Using a fake user-agent in your headers can help with this problem, but there are other variables you may want to consider.
For example, if you are making multiple HTTP requests, you may want to keep several fake user-agents and pick one at random for every request. This can help limit the chances of your scraper being detected.
If that still doesn't work, you may want to optimize your headers further. Besides sending HTTP requests with a randomized user-agent, you can imitate a browser's request headers more closely by adding more headers than just "user-agent", and then making sure the selected user-agent is consistent with the information sent in the rest of the headers.
You can check out here for more information.
The site not only explains how to keep your headers consistent with the user-agent, but also offers further solutions in case the above is still unsuccessful.
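As a minimal sketch of the idea of randomizing the user-agent while keeping the other headers consistent with it, the headers can be grouped into whole profiles and one profile picked per request. The names `HEADER_PROFILES` and `pickHeaders` are hypothetical, and the `Accept` and `Accept-Language` values are illustrative assumptions rather than values known to satisfy any particular site.

```javascript
// Hypothetical header profiles: each one pairs a user-agent with other
// headers that are meant to be consistent with it. The Accept and
// Accept-Language values are illustrative assumptions.
const HEADER_PROFILES = [
  {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'en-US,en;q=0.9',
  },
  {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'en-GB,en;q=0.9',
  },
];

// Pick one whole profile at random, so the user-agent never gets mixed
// with headers from another profile. `random` is injectable for testing.
function pickHeaders(profiles = HEADER_PROFILES, random = Math.random) {
  const index = Math.floor(random() * profiles.length);
  return { ...profiles[index] }; // copy, so callers cannot mutate the profile
}
```

With axios, the result would be passed as the `headers` option, for example `axios.get(url, { headers: pickHeaders() })`.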
In my situation, I had to bypass Cloudflare. You can tell whether this is your situation too by logging the error to the terminal and checking whether the "server" key says "cloudflare". If it does, you can use this documentation for further assistance.
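The "server" check can also be done in code instead of by reading the log. This is a minimal sketch, assuming the error has the usual axios shape (`error.response.headers` with lower-cased header names); the function name `isCloudflareBlock` is hypothetical.

```javascript
// Returns true when an axios-style error carries a response whose
// "server" header mentions Cloudflare. Errors without a response
// (for example network failures) simply return false.
function isCloudflareBlock(error) {
  const headers = (error && error.response && error.response.headers) || {};
  const server = String(headers.server || '').toLowerCase();
  return server.includes('cloudflare');
}
```

It would typically be called in the `catch` block around the `axios.get` call, to decide whether changing headers is likely to be enough.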

Related

Using proxy to make request results in bad request (400) error code

I'm using node-fetch and https-proxy-agent to make a request through a proxy. However, I get a 400 error code from the site I'm scraping only when I pass the agent; without it, everything works fine.
import fetch from 'node-fetch';
import Proxy from 'https-proxy-agent';
const ip = PROXIES[Math.floor(Math.random() * PROXIES.length)]; // PROXIES is a list of ips
const proxyAgent = Proxy(`http://${ip}`);
fetch(url, {
  agent: proxyAgent,
  headers: {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.72 Safari/537.36'
  }
}).then(res => res.text()).then(console.log)
This results in a 400 error code like so:
I have absolutely no idea why this is happening. If you want to reproduce the issue, I'm scraping https://azlyrics.com. Please let me know what is wrong.
The issue has been fixed. I did not notice I was making a request to an HTTPS site through an HTTP proxy. The site uses the HTTPS protocol, but the proxies were HTTP only. Changing to HTTPS proxies works. Thank you.
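One way to avoid this kind of mismatch is to filter the proxy list before picking one. The sketch below assumes each proxy entry carries a hypothetical `https` flag saying whether it can handle HTTPS targets; neither that flag nor the `pickProxy` helper is part of node-fetch or https-proxy-agent, and the addresses are documentation-range placeholders.

```javascript
// Pick a random proxy that can handle the target URL's protocol.
// Each entry is assumed to look like { address: 'host:port', https: boolean },
// where `https` is a hypothetical flag maintained by whoever built the list.
function pickProxy(proxies, targetUrl, random = Math.random) {
  const needsHttps = new URL(targetUrl).protocol === 'https:';
  const usable = proxies.filter((proxy) => !needsHttps || proxy.https);
  if (usable.length === 0) {
    throw new Error('No proxy in the list can handle ' + targetUrl);
  }
  return usable[Math.floor(random() * usable.length)];
}
```

Failing early with a clear message is a deliberate choice here: it surfaces the protocol mismatch before the request is made, instead of as an opaque 400 from the remote side.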

Getting 403 forbidden status through python requests

I am trying to scrape a website's content and am getting a 403 Forbidden status. I have tried solutions like using sessions for cookies and mimicking a browser through a 'User-Agent' header. Here is the code I have been using:
import requests

session = requests.Session()
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
}
page = session.get('https://www.sizeofficial.nl/product/zwart-new-balance-992/343646_sizenl/', headers=headers)
Note that this approach works on other websites; it is just this one that does not seem to work. I have even tried sending the other headers that my browser sends, and that does not work either. Another approach I have tried is to first create a session cookie and then pass that cookie to session.get, which still doesn't work for me. Is scraping this website not allowed, or am I still missing something?
I am using requests on Python 3.8 for this.

Google reCAPTCHA cannot be solved in Electron BrowserWindow

In my Electron app I try to open an external website (e.g. BrowserWindow.loadURL('www.abc.xyz')) that is protected by Google's reCAPTCHA. The browser window with the page is open, so the user could solve the captcha, and it does not act like a bot.
But somehow, the only response to the reCAPTCHA validation request is
)]}'
["rresp",null,null,null,null,null,1]
Also, no reCAPTCHA popup for "street sign" or "crossing" selection appears.
Additionally, I get a warning in the console:
A cookie associated with a cross-site resource at http://google.com/ was set without the `SameSite` attribute.
A future release of Chrome will only deliver cookies with cross-site requests if they are set with `SameSite=None` and `Secure`.
You can review cookies in developer tools under Application>Storage>Cookies and see more details at https://www.chromestatus.com/feature/5088147346030592 and https://www.chromestatus.com/feature/5633521622188032.
I solved the problem by adding the user agent to every request separately.
const userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36';
newSession.webRequest.onBeforeSendHeaders((details, callback: (beforeSendResponse) => void) => {
  // Overwrite the standard User-Agent request header
  details.requestHeaders['User-Agent'] = userAgent;
  callback({cancel: false, requestHeaders: details.requestHeaders});
})
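A minimal sketch of the header rewrite as a pure function, assuming the intent is to replace the standard `User-Agent` header whatever casing the existing key has. The helper name `withUserAgent` is hypothetical and is not part of Electron.

```javascript
// Return a copy of the request headers with exactly one User-Agent entry.
// Any existing variant of the key (user-agent, USER-AGENT, ...) is dropped
// first, so the request never carries two conflicting values.
function withUserAgent(requestHeaders, userAgent) {
  const result = {};
  for (const [name, value] of Object.entries(requestHeaders)) {
    if (name.toLowerCase() !== 'user-agent') {
      result[name] = value;
    }
  }
  result['User-Agent'] = userAgent;
  return result;
}
```

Inside the `onBeforeSendHeaders` handler this would be used as `callback({ cancel: false, requestHeaders: withUserAgent(details.requestHeaders, userAgent) })`.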

Chrome doesn't send cookies after redirect

In Node.js (using the Hapi framework) I'm creating a link that lets the user allow my app to read their account. Google handles that request and asks for permissions. Then Google redirects to my server with a GET parameter as a response code, and here I have an issue.
Google Chrome isn't sending the cookie with the session ID.
If I mark that cookie as a session cookie in a cookie-editing extension, it is sent. PHP behaves the same way, but PHP marks the cookie as a session cookie when creating the session, so it isn't a problem there. I'm using the hapi-auth-cookie plugin, which creates the session and handles everything about it. I also marked that cookie as non-HttpOnly in the hapi-auth-cookie settings, because that was the first difference I noticed when comparing the PHP session cookie with mine in Node.js. I get a 401 "missing authentication" response on each redirect. If I place the cursor in the address bar and hit Enter, everything works fine, so it is an issue with the redirect.
My question is basically: what may be causing this behavior? I should also mention that Firefox sends the cookie after each request without any issues.
Headers after redirect (no cookie with session):
{
"host": "localhost:3000",
"connection": "keep-alive",
"cache-control": "max-age=0",
"upgrade-insecure-requests": "1",
"user-agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36",
"x-client-data": "CJS2eQHIprbJAQjEtskECKmdygE=",
"x-chrome-connected": "id=110052060380026604986,mode=0,enable_account_consistency=false",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"accept-encoding": "gzip, deflate, sdch, br",
"accept-language": "pl-PL,pl;q=0.8,en-US;q=0.6,en;q=0.4"
}
Headers after hitting Enter in the address bar (which works fine):
{
"host": "localhost:3000",
"connection": "keep-alive",
"cache-control": "max-age=0",
"upgrade-insecure-requests": "1",
"user-agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36",
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
"accept-encoding": "gzip, deflate, sdch, br",
"accept-language": "pl-PL,pl;q=0.8,en-US;q=0.6,en;q=0.4",
"cookie": "SESSID=very_long_string"
}
Strict cookies (SameSite=Strict) are not sent by the browser if the referrer is a different site. This is what happens when the request is a redirect from a different site. Using Lax gets around this issue, or you can make your site cope with not having access to strict cookies on the first request.
I came across this issue recently and wrote more detail on strict cookies, referrers and redirects.
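The rule described in this answer can be written down as a small decision function. This is a simplified model, not the browser's actual implementation: the function name `isCookieSent` and the shape of the `request` object are hypothetical, and details such as redirect chains and browser-specific defaults are left out.

```javascript
// Simplified model of whether a browser attaches a cookie to a request,
// depending on the cookie's SameSite mode.
//   request.crossSite          - the request was initiated from another site
//   request.topLevelNavigation - the request changes the page in the address bar
//   request.method             - the HTTP method
const SAFE_METHODS = ['GET', 'HEAD', 'OPTIONS', 'TRACE'];

function isCookieSent(sameSite, request) {
  if (!request.crossSite) {
    return true; // same-site requests always carry the cookie
  }
  switch (sameSite) {
    case 'Strict':
      return false; // never sent on cross-site requests
    case 'Lax':
      // sent only on cross-site top-level navigations with a safe method
      return Boolean(request.topLevelNavigation) && SAFE_METHODS.includes(request.method);
    case 'None':
      return true; // sent everywhere (assumes the cookie is also Secure)
    default:
      throw new Error('Unknown SameSite value: ' + sameSite);
  }
}
```

In this model, the redirect back from the authentication site is `{ crossSite: true, topLevelNavigation: true, method: 'GET' }`, which gives `false` for `'Strict'` and `true` for `'Lax'`; pressing Enter in the address bar is not cross-site, so the cookie is sent either way.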
This issue is caused by hapi-auth-cookie not yet handling isSameSite (a new feature of Hapi). We can set it manually, e.g.:
const server = new Hapi.Server({
  connections: {
    state: {
      isSameSite: 'Lax'
    }
  }
});
But please keep in mind that the default is 'Strict', and in many cases you may not want to change that value.
A recent version of Chrome was displaying this warning in the console:
A cookie associated with a cross-site resource at was set
without the SameSite attribute. A future release of Chrome will only
deliver cookies with cross-site requests if they are set with
SameSite=None and Secure.
My server redirects a user to an authentication server if they don't have a valid cookie. Upon authentication, the user is redirected back to my server with a validation code. If the code is verified, the user is redirected again into the website with a valid cookie.
I added the SameSite=Secure option to the cookie but Chrome ignored the cookie after a redirect from the authentication server. Removing that option fixed the problem, but the warning still appears.
A standalone demo of this issue: https://gist.github.com/isaacs/8d957edab609b4d122811ee945fd92fd
It's a bug in Chrome.

How to use cookies with request (request, tough-cookie, node.js)

I'd like to know how to use cookies with request (https://github.com/mikeal/request).
I need to set a cookie on the request side that is sent to every subdomain,
something like
*.examples.com
with a path that covers every page, something like
/
so that the server side can read the data from the cookie correctly, something like
test=1234
I found that cookies set by a response work fine.
I added a custom jar to save the cookies, something like
var theJar = request.jar();
var theRequest = request.defaults({
  headers: {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.116 Safari/537.36'
  },
  jar: theJar
});
but the cookies I set from the request side can only be read on the same domain,
and I can't find a way to set a cookie with more options.
For now, if I want one cookie that can be read on three subdomains,
I have to set it up this way:
theJar.setCookie('test=1234', 'http://www.examples.com/', {"ignoreError":true});
theJar.setCookie('test=1234', 'http://member.examples.com/', {"ignoreError":true});
theJar.setCookie('test=1234', 'http://api.examples.com/', {"ignoreError":true});
Is there a better way to set a cookie from request
so that it can be read on every subdomain?
I just found the solution:
theJar.setCookie('test=1234; path=/; domain=examples.com', 'http://examples.com/');
Hm... I have to say, the documentation for request is not so good, lol
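The reason the `domain=examples.com` attribute helps is the cookie domain-matching rule: a cookie without a Domain attribute is host-only, while a cookie with one is also sent to subdomains of that domain. The following is a simplified sketch of that rule, assuming plain host names (no IP addresses, no public-suffix checks); `domainMatches` is a hypothetical helper, not a function of request or tough-cookie.

```javascript
// Simplified cookie domain matching: the host matches when it is equal to
// the cookie domain or is a subdomain of it. A leading dot on the cookie
// domain is ignored.
function domainMatches(host, cookieDomain) {
  const normalizedHost = host.toLowerCase();
  const normalizedDomain = cookieDomain.toLowerCase().replace(/^\./, '');
  return (
    normalizedHost === normalizedDomain ||
    normalizedHost.endsWith('.' + normalizedDomain)
  );
}
```

With `domain=examples.com`, the hosts `www.examples.com`, `member.examples.com` and `api.examples.com` all match, which is why the single `setCookie` call replaces the three separate ones.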

Resources