What is the best way to keep the sitemap.xml updated - node.js

We have a job portal website built on the MEAN stack, with over 100,000 active jobs and 1,000 active company profile pages.
This data is completely dynamic: every day, dozens of new job postings and companies are registered.
The problem is that we need to keep the sitemap.xml file(s) up to date with the latest active links at all times.
Does anybody know the best practice for keeping sitemap.xml up to date on such a dynamic website?

In your Express app you can add a sitemap.xml route like so:
const xml = require('xml');
... // other routes
app.get('/sitemap.xml', function (req, res) {
  // get routes from the database or other sources into `jobs`
  res.set('Content-Type', 'text/xml');
  res.send(xml(jobs));
});
Each job entry could include its URL and last-edited date, so the sitemap can be very accurate.
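A minimal sketch of the XML such a route could send, built by hand instead of with the xml package so it has no dependencies. It assumes each job record has hypothetical slug and updatedAt fields (adapt them to your schema) and emits the sitemaps.org urlset format with a lastmod date per job:

```javascript
// Escape the five characters that are special in XML text content.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Build a sitemaps.org <urlset> document from job records.
// `slug` and `updatedAt` are assumed field names, not from the original answer.
function buildSitemap(baseUrl, jobs) {
  const entries = jobs.map(function (job) {
    const loc = baseUrl + '/jobs/' + encodeURIComponent(job.slug);
    // The date part (YYYY-MM-DD) of an ISO timestamp is a valid <lastmod> value
    const lastmod = new Date(job.updatedAt).toISOString().slice(0, 10);
    return '  <url>\n' +
      '    <loc>' + escapeXml(loc) + '</loc>\n' +
      '    <lastmod>' + lastmod + '</lastmod>\n' +
      '  </url>';
  });
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    entries.join('\n') + '\n' +
    '</urlset>\n';
}
```

In the route, res.type('application/xml').send(buildSitemap('https://example.com', jobs)) would then serve the result, so the sitemap always reflects the current database contents.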
Alternatively, you can use a package such as express-sitemap-xml, which takes URLs loaded from a database and generates sitemap.xml from them, for example:
const express = require('express')
const expressSitemapXml = require('express-sitemap-xml')
const app = express()
app.use(expressSitemapXml(getUrls, 'https://bitmidi.com'))
async function getUrls () {
  return await getUrlsFromDatabase() // your own function that returns the job URLs loaded from the database
}
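With over 100,000 active jobs, a single file would exceed the sitemaps.org limit of 50,000 URLs (and 50 MB uncompressed) per sitemap, so the URLs have to be split across several sitemap files listed in a sitemap index. A minimal sketch of that split; the /sitemaps/jobs-N.xml file naming is a hypothetical choice:

```javascript
const MAX_URLS_PER_SITEMAP = 50000; // per-file limit from the sitemaps.org protocol

// Split a flat list of URLs into chunks that each fit in one sitemap file.
function chunkUrls(urls, size) {
  const chunks = [];
  for (let i = 0; i < urls.length; i += size) {
    chunks.push(urls.slice(i, i + size));
  }
  return chunks;
}

// Build a <sitemapindex> that points at one sitemap file per chunk.
// The /sitemaps/jobs-N.xml naming scheme is an assumption for this sketch.
function buildSitemapIndex(baseUrl, chunkCount, lastmod) {
  let body = '';
  for (let n = 1; n <= chunkCount; n++) {
    body += '  <sitemap>\n' +
      '    <loc>' + baseUrl + '/sitemaps/jobs-' + n + '.xml</loc>\n' +
      '    <lastmod>' + lastmod + '</lastmod>\n' +
      '  </sitemap>\n';
  }
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    body +
    '</sitemapindex>\n';
}
```

Because the index only depends on the number of chunks, each chunk file can be served on request from a paginated database query, or regenerated on a schedule, while search engines are pointed at the single index URL.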

Related

What is the difference between "app.get/post/put/delete()" and "router.get/post/put/delete()"? [duplicate]

I'm starting with NodeJS and Express 4, and I'm a bit confused. I've been reading the Express website, but can't see when to use a route handler or when to use express.Router.
As far as I can see, if I want to show a page or something when the user hits /show, for example, I should use:
var express = require('express')
var app = express()
app.get("/show", someFunction)
At first I thought this was the old way (Express 3). Is that right, or is this the way for Express 4 too?
If this is the way to do it in Express 4, what is express.Router used for?
I read almost the same example as above but using express.Router:
var express = require('express');
var router = express.Router();
router.get("/show", someFunction)
So, what's the difference between both examples?
Which one should I use if I just want to do a simple testing website?
app.js
var express = require('express'),
dogs = require('./routes/dogs'),
cats = require('./routes/cats'),
birds = require('./routes/birds');
var app = express();
app.use('/dogs', dogs);
app.use('/cats', cats);
app.use('/birds', birds);
app.listen(3000);
dogs.js
var express = require('express');
var router = express.Router();
router.get('/', function(req, res) {
res.send('GET handler for /dogs route.');
});
router.post('/', function(req, res) {
res.send('POST handler for /dogs route.');
});
module.exports = router;
When var app = express() is called, an app object is returned. Think of this as the main app.
When var router = express.Router() is called, a slightly different mini app is returned. The idea behind the mini app is that each route in your app can become quite complicated, and you'd benefit from moving all that code into a separate file. Each file's router becomes a mini app, which has a very similar structure to the main app.
In the example above, the code for the /dogs route has been moved into its own file so it doesn't clutter up the main app. The code for /cats and /birds would be structured similarly in their own files. By separating this code into three mini apps, you can work on the logic for each one in isolation, and not worry about how it will affect the other two.
If you have code (middleware) that pertains to all three routes, you can put it in the main app, before the app.use(...) calls. If you have code (middleware) that pertains to just one of those routes, you can put it in the file for that route only.
Express 4.0 comes with the new Router. As mentioned on the site:
The express.Router class can be used to create modular mountable route
handlers. A Router instance is a complete middleware and routing
system; for this reason it is often referred to as a “mini-app”.
There is a good article at https://scotch.io/tutorials/learn-to-use-the-new-router-in-expressjs-4 which describes the differences and what can be done with routers.
To summarize
With routers you can modularize your code more easily. You can use routers for:
Basic routes: Home, About
Route middleware to log requests to the console
Routes with parameters
Route middleware for parameters, to validate a specific parameter passed to a certain route
Note:
The app.router object, which was removed in Express 4, has made a comeback in Express 5. In the new version, it is just a reference to the base Express router, unlike in Express 3, where an app had to explicitly load it.
How they are different
Everyone, including the documentation, tends to emphasize how similar they are without mentioning any differences. Well, they are, in fact, different.
var bigApp = express();
var miniApp = express.Router();
listen()
The most obvious difference is that the bigApp gives you listen(), which is just a rather confusing way to do what is otherwise simple and obvious with Node's http or https module:
var server = require('http').createServer(bigApp);
server.listen(8080, function () {
  console.info(server.address());
});
I consider this an anti-pattern because it abstracts away and obscures something that wasn't complicated or difficult in the first place, and it makes things harder for people using WebSockets and other middleware that need the raw http server.
Internal State
The big difference, which is really important, is that all bigApps have separate internal state.
bigApp.enable('trust proxy');
bigApp.enabled('trust proxy');
// true
var bigApp2 = express();
bigApp2.enabled('trust proxy');
// false
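bigApp.use('/bunnies', bigApp2);
// WRONG! bigApp2 keeps its own settings: a mounted sub-app does not inherit
// settings that have a default value. (The Express 4 docs list 'trust proxy'
// as an exception that sub-apps do inherit, for backward compatibility.)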
A miniApp mounted on a bigApp, however, is run by the bigApp itself, so the bigApp's internal state and this-context are preserved and those routes behave accordingly.
bigApp.enable('trust proxy');
bigApp.enabled('trust proxy');
// true
var miniApp = express.Router();
bigApp.use('/bunnies', miniApp);
// CORRECT! All state and such are preserved
This can be a big deal because Express does a lot of (sometimes tricky) things to the http.IncomingMessage and http.ServerResponse objects, such as modifying (or hijacking) req.url, req.originalUrl and various other properties you've been using without realizing it, and you probably don't want that duplicated and separated.
Smaller API
A Router has a smaller, more well-defined set of methods:
.use(mount, fn)
.all(mount, fn)
.options(mount, fn)
.head(mount, fn)
.get(mount, fn)
.post(mount, fn)
.patch(mount, fn)
.put(mount, fn)
.delete(mount, fn)
.route(mount).XXXX
.param(name, cb).XXXX
There are a few other methods as well, but you won't find set(), enable(), or other methods that change the larger app state.
app.route('/book')
.get(function (req, res) {
res.send('Get a random book')
})
.post(function (req, res) {
res.send('Post a random book')
})
As the example above shows, route() lets you add handlers for different HTTP request methods under a single route.
Let's say your application is a little complex. The first thing we do is divide the application into multiple modules, so that changes in one module don't clutter the others and you can keep working on individual modules. At the end of the day, though, you need to integrate everything into one, since you are building a single application. It is as if we have one main application and a few child applications whose parent is the main application.
So we create the parent application using:
const express = require("express");
const parent = express();
Next, we need to bring the child applications into this parent application. Since the child applications are not completely separate applications (they run in the same context, to borrow a Java term), Express provides a way to do this through its Router function. This is what we do in each child module file; let's call one such child module aboutme:
const express = require("express");
const router = express.Router();
module.exports = router;
By assigning the router to module.exports, we make this module available for others to consume. Since we have modularized things, we need to make the module files available to the parent application through Node's require function, just like any third-party module, and the parent file looks something like this:
const express = require("express");
const parent = express();
const child = require("./aboutme");
After making this child module available to the parent, we need to tell the parent application when to use this child application. Let's say that when a user hits the path /aboutme, we want the aboutme child application to handle the request; we do this with Express's use method:
parent.use("/aboutme", child);
Put together, the parent file looks like this:
const express = require("express");
const parent = express();
const child = require("./aboutme");
parent.use("/aboutme", child);
Above all, the parent can start a server, whereas the child cannot. Hope this clarifies things. For more information you can always read the Express source code; it takes some time, but it tells you a lot.
Routes written directly in app.js apply to every request that reaches the app, since app.js is loaded when the application starts. Putting routes in express.Router() mini apps, however, lets you protect and restrict access to just that group of routes by attaching middleware, such as an authentication check, to the router.
In a word, express.Router can do more things than app.get(), such as having its own middleware; moreover, you can define more than one router object with express.Router().
express.Router has several useful options and features:
caseSensitive: makes the /show route different from /Show (disabled by default)
strict: makes the /show/ route different from /show (also disabled by default)
specific middleware can be added to specific routes
In one of the questions in the quiz, this was asked: "express.Router() creates an object that behaves similar to the app object."
The correct answer is 'True'. I know that we can create routes using either of the two, but is it safe to say that they are not the same in all cases? If my understanding is correct, the object returned by express() can do more things, like start a server, while the other one cannot.
In a complicated application, an app is a module (for example article or user), and a router is a controller or action within a module (for example article create and list).
E.g. the URL https://example.com/article/create is resolved to the article module and its create router.
Also, apps and routers can be nested level within level.

Node and Express – Conditionally Displaying Pages

What I want to do seems elementary, but I am running into some roadblocks.
All I want to do is display pages based on a condition.
var express = require("express");
var app = express();
app.get('/', function(req, res) {
if (userIsLoggedIn()) {
res.sendFile(__dirname + '/public/index.html');
} else {
res.sendFile(__dirname + '/public/accessDenied.html');
}
});
I am looking to grab information from the browser – I want to call a function from another browserify-ed file, and use the return value to determine which page is displayed to the user.
I can't run the server from app.js because it needs to be browserify-ed since it requires Web3. And since the function relies on state, I am not sure how to access this state from the server file.
You have to send data from the browser to the server, either with a form or with a query parameter.
For example:
web3.shh.post(object [, callback])
https://web3js.readthedocs.io/en/1.0/web3-shh.html#post
On the server side you need to extract those values and reply based on what was posted.
How to process POST data in Node.js?
Unfortunately your use case is not clear, but in general you might want to look into how to handle HTTP API communication.

Register new route at runtime in NodeJs/ExpressJs

I want to extend this open topic: Add Routes at Runtime (ExpressJs), which sadly didn't help me enough.
I'm working on an application that allows the creation of different APIs that run on Node.js. In the UI, a piece of code contains two endpoints (GET, POST), and as soon as I press "Save", it creates a .js file in a path where the Node.js application looks for its endpoints (e.g. myProject\dynamicRoutes\rule_test.js).
The problem is that, because the Node.js server keeps running while I'm developing the code, I can't invoke these new endpoints unless I restart the server (so that Express detects the file).
Is there a way to register new routes while Node.js (Express) is running?
I tried the following things with no luck:
app.js
This works if the server is restarted. I also tried the express-dynamic-router library, but it doesn't work at runtime either:
//this is dynamic routing function
function handleDynamicRoutes(req, res, next) {
  var path = req.path; //http://localhost:8080/api/rule_test
  //LoadModules(path)
  var controllerPath = path.replace("/api/", "./dynamicRoutes/");
  var dynamicController = require(controllerPath);
  dynamicRouter.index(dynamicController[req.method]).register(app);
  dynamicController[req.method] = function(req, res) {
    //invocation
  }
  next();
}
app.all('*', handleDynamicRoutes);
Finally, I read this article (#NodeJS / #ExpressJS: Adding routes dynamically at runtime), but I couldn't figure out how it can help me.
I believe this should be possible somehow, but I feel a bit lost. Does anyone know how I can achieve this? I get a "Cannot GET" error after each file is created.
Disclaimer: please note that allowing users, or even administrators, to inject executable code via web forms is considered bad design in terms of stability and security. Treat this thread as an academic discussion and don't use this code in production!
Look at this simple example, which adds a new route at runtime:
app.get('/subpage', (req, res) => res.send('Hello subpage'))
So basically a new route is registered whenever app.get is called; there is no need to walk through a routes directory.
All you need to do is load your newly created module and pass your app to the function it exports via module.exports, so that it registers the new routes. I guess this one-liner should work just fine (not tested):
require('path/to/new/module')(app)
Is req.params enough for you?
app.get('/basepath/:path', (req, res) => {
  const content = require('./content/' + req.params.path);
  res.send(content);
});
So the user can enter anything after /basepath, for example
http://www.mywebsite.com/basepath/bergur
The router would then try to load the file content/bergur.js
and send its contents.

Node.js middleware or not

I'm new to Node.js and I'm migrating a simple site of mine to Node.js mostly as a learning experience.
In all my sites, I like to keep the most relevant information about the site in a "sitemeta" object. It is queried from Redis on each request; if that fails (which only happens after sitemeta has been updated), it is read from MySQL instead and saved to Redis, so that the next request gets it from Redis again, as Redis is much faster than MySQL.
So, in PHP I would simply add the call for sitemeta in a settings.php file that I always include at the top of each file, so that information such as home_url or site_mode (which are part of the sitemeta object) is always available.
However, now in Node.js I was wondering whether this is really the way to go, or whether there's a better way to make this happen as a middleware rather than doing the following at the top of every controller file (router files, really):
//in index.js
var express = require('express');
var router = express.Router();
var site = require('./../lib/sitemeta');
...
//using it for something.
var siteMeta = site.meta();
res.render('index', { title: siteMeta.title });
Or could I even instantiate sitemeta in app.js so that it's queried only once (when node app.js starts), unless it needs to be updated, in which case a refresh could be triggered somehow?
Thanks.
I think the best approach is through a middleware. Wherever you store your metadata (memory, Redis, etc.), a middleware lets you inject your siteMeta into every request, with the ability to adapt your siteMeta based on the received request (locale, etc.). We usually use lots of small, easily testable middleware functions to inject different data elements for another middleware further down the pipeline to process and produce the response.
// with promises
app.use(function (req, res, next) {
  req.siteMeta = loadSiteMeta(); // returns a promise
  next();
});
app.get('/endpoint', function (req, res, next) {
  // wait for the promise to be fulfilled
  req.siteMeta.then(function (siteMeta) {
    res.render('my-view', { title: siteMeta.title });
  }).catch(next); // pass loading errors to Express
});
// without promises
app.use(function (req, res, next) {
  loadSiteMeta(function (err, siteMeta) {
    if (err) return next(err);
    req.siteMeta = siteMeta;
    next();
  });
});
app.get('/endpoint', function (req, res) {
  // siteMeta will be populated
  res.render('my-view', { title: req.siteMeta.title });
});
The actual implementation of loadSiteMeta will depend on your chosen storage. The benefit of using promises instead of classic callbacks here is that, if you have multiple middleware functions loading different data elements before reaching your final processing function, the loads run in parallel instead of sequentially. You can use Promise.all() to wait for all the promises you need in your final function.
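A minimal sketch combining the ideas in this thread: a small in-memory cache, so sitemeta is loaded once and reused until it expires or is invalidated (as the question asks), and a middleware that starts several loads in parallel, awaited together with Promise.all in the final handler. The loaders, the menu data and the one-minute TTL are hypothetical stand-ins for your Redis/MySQL queries:

```javascript
// Wrap a promise-returning loader so it runs at most once per ttlMs.
function createCachedLoader(load, ttlMs) {
  let cached = null; // pending or resolved promise
  let loadedAt = 0;
  return {
    get: function () {
      if (!cached || Date.now() - loadedAt > ttlMs) {
        loadedAt = Date.now();
        // The executor runs synchronously, so load() starts right away.
        const attempt = new Promise(function (resolve) { resolve(load()); });
        cached = attempt;
        // Forget a failed load so the next request retries it.
        attempt.catch(function () { if (cached === attempt) cached = null; });
      }
      return cached;
    },
    // Call after sitemeta changes to force a reload on the next request.
    invalidate: function () { cached = null; }
  };
}

// Hypothetical loaders standing in for Redis/MySQL queries.
const siteMeta = createCachedLoader(function () {
  return Promise.resolve({ title: 'My site' });
}, 60 * 1000);
const menu = createCachedLoader(function () {
  return Promise.resolve(['Home', 'Jobs']);
}, 60 * 1000);

// Middleware: start both loads; they run in parallel and nothing is awaited here.
function loadPageData(req, res, next) {
  req.siteMeta = siteMeta.get();
  req.menu = menu.get();
  next();
}

// Final handler: wait for everything it needs at once.
function renderIndex(req, res, next) {
  Promise.all([req.siteMeta, req.menu])
    .then(function (results) {
      res.render('index', { title: results[0].title, menu: results[1] });
    })
    .catch(next); // pass load errors to Express's error handling
}
```

Wire it up with app.use(loadPageData) and app.get('/', renderIndex), and call siteMeta.invalidate() wherever sitemeta is updated so the next request reloads it.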
