This might be a really dumb question, but is there a way to fetch without entering the server address? I'm wondering if I can just use "/init" instead of "http://localhost:3000/init"
try {
  const result = await fetch("http://localhost:3001/init", {
    method: "GET",
    headers: {
      "content-type": "application/json"
    }
  });
  const response = await result.json();
} catch (e) {
  console.log(e);
}
Is there a way to fetch without entering the server address?
No.
In node.js, node-fetch requires a fully qualified URL. There is no "default" target domain or path that it could substitute like there is inside a browser web page with the browser version of fetch().
From the node-fetch documentation:
fetch(url[, options])
url should be an absolute url, such as https://example.com/.
A path-relative URL (/file/under/root) or protocol-relative URL
(//can-be-http-or-https.com/) will result in a rejected Promise.
If the problem you're really trying to solve here is to be able to write code that will work with different hosts (run locally and in a hosting environment), then you can set some sort of configuration variable with the hostname and then construct your URL using the host name in the configuration variable.
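For example, a minimal sketch (the BASE_URL environment variable name and its localhost fallback are assumptions, not an established convention):
const fetch = require('node-fetch');

// Read the host from configuration; fall back to localhost for development
const BASE_URL = process.env.BASE_URL || 'http://localhost:3001';

async function init() {
  // Only the configured host changes between environments; the path stays the same
  const result = await fetch(`${BASE_URL}/init`, {
    method: 'GET',
    headers: { 'content-type': 'application/json' }
  });
  return result.json();
}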
I am trying to fetch a site: link here. If you click the link, it shows this JSON: {"error":"Socket Error"}. I want to fetch that URL and return the error.
However, I get a 403 Forbidden error instead. Is there a reason for this? I turned CORS off, but I don't think it did anything. Here is an example of what I have tried:
async function b() {
  const error = await fetch('https://matchmaker.krunker.io/seek-game?hostname=krunker.io&region=us-ca-sv&game=SV%3A4jve9&autoChangeGame=false&validationToken=QR6beUGVKUKkzwIsKhbKXyaJaZtKmPN8Rwgykea5l5FkES04b6h1RHuBkaUMFnu%2B&dataQuery=%7B%7D', { mode: 'no-cors' }).then(res => res.json());
  console.log(JSON.stringify(error));
}
b();
Why doesn't anything seem to work?
Please comment if there is anything I need to add, this is my first Stack Overflow post so I am still slightly confused by what makes a good question. Thanks for helping!!
NOTE: My environment is Node.js (testing on Repl.it, which I think uses the latest Node version).
This particular host is protected with Cloudflare's anti-DDoS protection. The server doesn't accept requests made by fetch, but it does accept requests from curl. God knows why.
$ curl 'https://matchmaker.krunker.io/seek-game?hostname=krunker.io&region=us-ca-sv&game=SV%3A4jve9&autoChangeGame=false&validationToken=QR6beUGVKUKkzwIsKhbKXyaJaZtKmPN8Rwgykea5l5FkES04b6h1RHuBkaUMFnu%2B&dataQuery=%7B%7D'
// => {"error":"Socket Error"}
You can use curl in Node.js with the node-libcurl package.
const { curly } = require('node-libcurl')

const url = 'https://matchmaker.krunker.io/seek-game?hostname=krunker.io&region=us-ca-sv&game=SV%3A4jve9&autoChangeGame=false&validationToken=QR6beUGVKUKkzwIsKhbKXyaJaZtKmPN8Rwgykea5l5FkES04b6h1RHuBkaUMFnu%2B&dataQuery=%7B%7D'

curly.get(url)
  .then(({ statusCode, data }) => console.log(statusCode, data))
  // => 400 { error: 'Socket Error' }
Works as expected :-)
You can use a proxy such as allorigins.win, a CORS proxy that retrieves the data from a URL and returns it as JSON. You can fetch from this URL: https://api.allorigins.win/raw?url=https://matchmaker.krunker.io/game-list?hostname=krunker.io
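A minimal sketch with node-fetch (untested against this endpoint; since the target URL is passed as a query parameter, it should generally be percent-encoded first):
const fetch = require('node-fetch');

// Encode the target URL before handing it to the proxy
const target = encodeURIComponent('https://matchmaker.krunker.io/game-list?hostname=krunker.io');

fetch(`https://api.allorigins.win/raw?url=${target}`)
  .then(res => res.json())
  .then(data => console.log(data))
  .catch(err => console.error(err));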
Is there a way to check what the protocol of an external site is, using Node.js?
For example, for the purposes of URL shortening, people can provide a url, if they omit http or https, I'd check which it should be and add it.
I know I can just redirect users without the protocol, but I'm just curious if there is a way to check it.
Sure can. First install request-promise and its dependency, request:
npm install request request-promise
Now we can write an async function to take a URL that might be missing its protocol and, if necessary, add it:
const rq = require('request-promise');

async function completeProtocol(url) {
  if (url.match(/^https?:/)) {
    // fine the way it is
    return url;
  }
  // https is preferred
  try {
    await rq(`https://${url}`, { method: 'HEAD' });
    // We got it, that's all we need to know
    return `https://${url}`;
  } catch (e) {
    return `http://${url}`;
  }
}
Bear in mind that making requests like this could tie up resources on your server, particularly if someone spams a lot of them. You can mitigate that by passing timeout: 2000 as an option when calling rq.
Also consider requesting only the home page of the site, stripping off the rest of the URL, to reduce the risk that this gets abused in some way; the protocol should be the same for the entire site. Both mitigations are combined in the sketch below.
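A minimal sketch combining both mitigations (the 2-second timeout and the naive host extraction are assumptions, not part of the original answer):
async function completeProtocolSafely(url) {
  if (url.match(/^https?:/)) {
    return url;
  }
  // Probe only the host's root page, not the full path the user supplied
  const host = url.split('/')[0];
  try {
    // Give up after 2 seconds so spammed lookups can't pile up
    await rq(`https://${host}`, { method: 'HEAD', timeout: 2000 });
    return `https://${url}`;
  } catch (e) {
    return `http://${url}`;
  }
}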
So I'm learning web scraping with Node 8. I followed this and ran:
npm install --save request-promise cheerio puppeteer
The code is simple
const rp = require('request-promise');
const url = 'https://www.examples.com'; //good

rp(url).then((html) => {
  console.log(html);
}).catch((e) => {
  console.log(e);
});
Now if the url is examples.com, I can see the plain HTML output, great.
Q1: With yahoo.com, it outputs binary data, e.g.
�i��,a��g�Z.~�Ż�ڔ+�<ٵ�A�y�+�c�n1O>Vr�K�#,bc���8�����|����U>��p4U>mś0��Z�M�Xg"6�lS�2B�+�Y�Ɣ���? ��*
why is this ?
Q2: Then with nasdaq.com,
const url = 'https://www.nasdaq.com/earnings/report/msft';
the above code just won't finish; it seems to hang there.
Why is this please ?
I'm not sure about Q2, but I can answer Q1.
It seems like Yahoo is detecting you as a bot and preventing you from scraping the page! The most common method sites use to detect bots is the User-Agent header. When you make a request using request-promise (which uses the request library internally), it does not set this header at all. Websites can therefore infer your request came from a program instead of a web browser, because there is no User-Agent header. They will then treat you like a bot and send you back gibberish or never serve you content.
You can work around this by manually setting a User-Agent header to mimic a browser. Note this seems to work for Yahoo, but might not work for all websites. Other websites might use more advanced techniques to detect bots.
const rp = require('request-promise');
const url = 'https://www.yahoo.com';

const options = {
  url,
  headers: {
    'User-Agent': 'Mozilla/5.0 (Android 4.4; Mobile; rv:41.0) Gecko/41.0 Firefox/41.0'
  }
};

rp(options).then((html) => {
  console.log(html);
}).catch((e) => {
  console.log(e);
});
Q2 might be related to this, but the above code does not solve it. Nasdaq might be running more sophisticated bot detection, such as checking for various other headers; one heavier-weight alternative is sketched below.
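Since the tutorial's install step already includes puppeteer, driving a real headless browser is one option worth trying for Q2. A minimal, untested sketch (the waitUntil setting is an assumption):
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // A real browser sends a full set of headers and executes JavaScript,
  // which defeats most header-based bot checks
  await page.goto('https://www.nasdaq.com/earnings/report/msft', { waitUntil: 'networkidle2' });
  console.log(await page.content());
  await browser.close();
})();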
To learn Node.js I'm creating a small app that gets some RSS feeds stored in MongoDB, processes them, and creates a single feed (ordered by date) from them.
It parses a list of ~50 RSS feeds with ~1000 blog items, so parsing the whole set takes quite a while. That's why I added req.connection.setTimeout(60*1000); to get a long enough timeout to fetch and parse all the feeds.
Everything runs quite fine, but the request is called twice. (I checked with Wireshark; I don't think it's about the favicon here.)
I really don't get it.
You can test yourself here : http://mighty-springs-9162.herokuapp.com/feed/mde/20 (it should create a rss feed with the last 20 articles about "mde").
The code is here: https://github.com/xseignard/rss-unify
And if we focus on the interesting bits:
I have a route defined like this: app.get('/feed/:name/:size?', topics.getFeed);
And topics.getFeed looks like this:
function getFeed(req, res) {
  // 1 minute timeout to get enough time for the request to be processed
  req.connection.setTimeout(60 * 1000);

  var name = req.params.name;

  var callback = function(err, topic) {
    // if the topic has been found
    if (topic) {
      // aggregate the corresponding feeds
      rssAggregator.aggregate(topic, function(err, rssFeed) {
        if (err) {
          res.status(500).send({error: 'Error while creating feed'});
        }
        else {
          res.send(rssFeed);
        }
      },
      req);
    }
    else {
      res.status(404).send({error: 'Topic not found'});
    }
  };

  // look for the topic in the db
  findTopicByName(name, callback);
}
So nothing fancy, but still, this getFeed function is called twice.
What's wrong there? Any idea?
This annoyed me for a long time. It's most likely the Firebug extension which is sending a duplicate of each GET request in the background. Try turning off Firebug to make sure that's not the issue.
I faced the same issue while using the Google Cloud Functions Framework (which uses Express to handle requests) on my local machine. Each fetch request (in the browser console and within the web page) resulted in two requests to the server. The issue was related to CORS (because I was using different ports): Chrome made an OPTIONS method call before the actual call. Since the OPTIONS method was not necessary in my code, I used an if-statement to return an empty response.
if (req.method === "OPTIONS") {
  res.set('Access-Control-Allow-Origin', '*');
  res.set('Access-Control-Allow-Headers', 'Content-Type');
  res.status(204).send('');
  return; // don't fall through to the real handler for preflight calls
}
Spent nearly 3 hrs banging my head. Thanks to user105279's answer for hinting at this.
If you have a favicon on your site, remove it and try again. If that resolves your problem, fix your favicon URL.
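Alternatively, a minimal sketch of short-circuiting the favicon request in Express so it never reaches your real handlers (the 204 status is an assumption; any empty response would do):
// Answer favicon requests with "No Content" before any other route runs
app.get('/favicon.ico', (req, res) => res.status(204).end());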
I'm doing more or less the same thing now, and noticed the same thing.
I'm testing my server by entering the API address in Chrome like this:
http://127.0.0.1:1337/links/1
my Node.js server is then responding with a json object depending on the id.
I set up a console log in the GET handler and noticed that when I change the id in the address bar, Chrome already sends a request before I hit enter to actually submit it, and the server accepts another request after I actually hit enter. This happens with and without the Chrome dev console open.
IE 11 doesn't seem to work in the same way but I don't have Firefox installed right now.
Hope that helps someone even if this was a kind of old thread :)
/J
I managed to fix this with listen.setTimeout and axios.defaults.timeout = 36000000.
Node.js
var timeout = require('connect-timeout'); // express v4

// in cors, set the options success code to 200 and preflightContinue to false
app.use(cors({ preflightContinue: false, optionsSuccessStatus: 200 }));

// put this middleware after all the other middleware
app.use(timeout(36000000)); // 10 hours
app.use((req, res, next) => {
  if (!req.timedout) next();
});

var listen = app.listen(3333, () => console.log('running'));
listen.setTimeout(36000000); // 10 hours
React
import axios from 'axios';
axios.defaults.timeout = 36000000; // 10 hours
After 2 days of trying.
You might have to increase the timeout even more. I haven't looked at the Express source, but it sounds like it retries on timeout.
Ensure you call res.send(). The axios call expects a value back from the server, and it sends the request again after 120 seconds if none arrives.
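A minimal sketch of the failure mode (the route and the doSlowWork helper are hypothetical): a handler that never responds on one code path looks like a timeout to the client, which may then retry.
app.get('/slow', async (req, res) => {
  const result = await doSlowWork(); // hypothetical long-running job
  if (result) {
    return res.send(result);
  }
  // Without this line, requests where result is falsy never get an answer,
  // so the client times out and sends the request again
  res.status(204).end();
});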
I had the same issue doing this with Express 4. I believe it has to do with how it resolves request params. The solution is to ensure your params are resolved, for example by checking them in an if block:
app.get('/:conversation', (req, res) => {
  let url = req.params.conversation;
  // Only handle the request when params have resolved
  if (url) {
    res.redirect(301, 'http://' + url + '.com');
  }
});
In my case, my Axios POST requests were received twice by Express: the first one without a body, the second one with the correct payload. The same request sent from Postman was only received once, correctly. It turned out that Express was running on a different port, so my requests were cross-origin. This caused Chrome to send a preflight OPTIONS request to the same URL (the POST url), and my app.all routing in Express processed that one too.
app.all('/api/:cmd', require('./api.js'));
Separating POST from OPTIONS solved the issue:
app.post('/api/:cmd', require('./api.js'));
app.options('/', (req, res) => res.send());
I met the same problem. Adding a bare return didn't work, but it works when I use return res.redirect('/path');
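A minimal sketch of why the return matters (the route and condition are hypothetical): without it, execution continues past the redirect and Express may try to send a second response.
app.get('/old-path', (req, res) => {
  if (req.query.legacy) {
    return res.redirect('/path'); // stop here; nothing below runs for this request
  }
  res.send('no redirect needed');
});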
I had the same problem. Then I opened the Chrome dev tools and found out that the favicon.ico was being requested from my Express.js application. I needed to fix how I registered the middleware.
Screenshot of Chrome dev tools
I also had double requests. In my case it was the forwarding from http to https. You can check whether that's the case by inspecting
req.headers['x-forwarded-proto']
It will be either 'http' or 'https'.
I could fix my issue simply by adjusting the order in which my middlewares trigger.
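A minimal sketch of a logging middleware to spot this (the log format is an assumption); registered first, it makes the http and https passes show up as two distinct entries:
// Register before any redirect middleware so both passes are visible
app.use((req, res, next) => {
  console.log(req.method, req.originalUrl, req.headers['x-forwarded-proto']);
  next();
});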