I have a job page whose URL is /jobpage/:categoryname/:companyname/:jobtitle/:jobid. The parameters are generated dynamically, and I want all such dynamically generated links in the sitemap. I am using the express-sitemap package; the code is below:
var sitemap = require('express-sitemap');

sitemap({
  sitemap: 'sitemap.xml', // path for .XMLtoFile
  robots: 'robots.txt', // path for .TXTtoFile
  generate: app, // option or function, is the same
  sitemapSubmission: '/sitemap.xml', // path of sitemap into robots
  url: 'xxxx',
  map: {
    '/jobpage': ['get'],
    '/college': ['get'],
  },
  route: { // specific option for some route
    '/jobpage': {
      lastmod: '2016-04-25',
      changefreq: 'weekly',
      priority: 1.0,
    },
  },
}).toFile(); // write sitemap.xml and robots.txt
The sitemap is generated with this link:
<url>
<loc>xxxx/jobpage/:categoryname/:companyname/:jobtitle/:jobid</loc>
</url>
How do I generate dynamic links? Any leads will be highly appreciated.
In my case, I did it as follows.
I created a separate file, sitemap_generator.js, which reads all the database models that lead to pages.
It then generates the XML, writes it to the web folder, and keeps updating it at a fixed interval.
It starts creating the sitemap when the Node server starts. I did this manually because every automated solution I found came with limitations.
Most of the time your business logic will not fit into a library, because a library can't know what your dynamic pages will be, as you have already found.
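The approach above could be sketched roughly as follows. This is only a minimal illustration, not the answerer's actual sitemap_generator.js: the function names (escapeXml, jobUrl, buildSitemapXml), the shape of the job records, and the example.com base URL are all assumptions made for the example.

```javascript
// Hypothetical sketch of a hand-rolled sitemap generator.
// Each job record is assumed to carry the four route parameters.

// Escape characters that are not allowed verbatim in XML text.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Fill /jobpage/:categoryname/:companyname/:jobtitle/:jobid with real values.
function jobUrl(baseUrl, job) {
  var parts = [job.categoryname, job.companyname, job.jobtitle, job.jobid]
    .map(function (part) { return encodeURIComponent(part); });
  return baseUrl + '/jobpage/' + parts.join('/');
}

// Build the complete sitemap document as a string.
function buildSitemapXml(baseUrl, jobs) {
  var entries = jobs.map(function (job) {
    return '  <url>\n' +
      '    <loc>' + escapeXml(jobUrl(baseUrl, job)) + '</loc>\n' +
      '    <changefreq>weekly</changefreq>\n' +
      '  </url>';
  });
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    entries.join('\n') + '\n' +
    '</urlset>\n';
}

// Usage sketch (loadJobs is a hypothetical function that reads your database):
//   var fs = require('fs');
//   fs.writeFileSync('public/sitemap.xml',
//     buildSitemapXml('https://example.com', loadJobs()));
```

Calling the write step once at server start and again from a timer would give the periodic refresh described above.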
https://www.npmjs.com/package/express-sitemap
I'm trying to create dynamic pages based on a database that grows by the minute. Therefore, using createPage and rebuilding several times a day isn't an option.
I'm using onCreatePage here to create pages which works fine for my first route, but when I try to make an English route somehow it doesn't work.
gatsby-node.js:
exports.onCreatePage = async ({ page, actions: { createPage } }) => {
  if (page.path.match(/^\/listing/)) {
    page.matchPath = '/listing/:id'
    createPage(page)
  }
  if (page.path.match(/^\/en\/listing/)) {
    page.matchPath = '/en/listing/:id'
    createPage(page)
  }
}
What I'm trying to achieve here is getting 2 dynamic routes like:
localhost:8000/listing/123 (this one works)
localhost:8000/en/listing/123 (this one doesn't work)
My pages folder looks like this:
pages
---listing.tsx
---en/
------listing.tsx
Can anyone see what I'm doing wrong here?
--
P.S. I want to use SSR (available since Gatsby v4) by using the getServerData() in the templates for these pages. Will that work together with pages created dynamically with onCreatePage or is there a better approach?
As we discussed in the comments, the /en/ path is never created, so the code never enters the following condition:
  if (page.path.match(/^\/en\/listing/)) {
    page.matchPath = '/en/listing/:id'
    createPage(page)
  }
This makes me think that the issue is in your createPages API rather than in onCreatePage, which means that your English page is not even created.
Keep in mind that the onCreatePage API is a callback called when a page is created, so it's triggered after createPages.
If you add a console.log(page.path), you shouldn't see the English page in the console output, so try debugging how you are creating the /en/ route; onCreatePage itself doesn't seem to have any problem.
I'm developing a web crawler in Node.js. I've built a unique list of the URLs found in the crawled page body, but some of them have extensions like jpg, mp3, mpeg... I want to avoid crawling those. Is there a simple way to do that?
Two options stick out.
1) Use path to check every URL
As stated in comments, you can use path.extname to check for a file extension. Thus, this:
var path = require('path');
var test = "http://example.com/images/banner.jpg";
path.extname(test); // '.jpg'
This would work, but this feels like you'll wind up having to create a list of file types you can crawl or you must avoid. That's work.
Side note -- be careful using path. Typically, url is your best tool for parsing links because path is aimed at files/directories, not urls. On some systems (Windows), using path to manipulate a url can result in drama because of the slashes involved. Fair warning!
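One way to combine the two points above is to let the URL parser isolate the pathname and only then ask for the extension. The sketch below assumes absolute links and a skip list of extensions; the shouldCrawl name and the contents of SKIPPED_EXTENSIONS are made up for illustration.

```javascript
var path = require('path');

// Extensions to skip. This list is an assumption; extend it as needed.
var SKIPPED_EXTENSIONS = ['.jpg', '.jpeg', '.png', '.gif', '.mp3', '.mpeg', '.mp4'];

// Hypothetical helper: true when the link does not end in a skipped extension.
function shouldCrawl(link) {
  var pathname;
  try {
    // URL strips the query string and fragment, leaving only the path.
    pathname = new URL(link).pathname;
  } catch (err) {
    return false; // not an absolute URL, so nothing to crawl
  }
  // path.posix always uses forward slashes, whatever the host OS is.
  var ext = path.posix.extname(pathname).toLowerCase();
  return SKIPPED_EXTENSIONS.indexOf(ext) === -1;
}
```

Using path.posix keeps the behaviour the same on Windows, and parsing the URL first means a query string such as ?size=large does not hide the extension.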
2) Get the HEAD for each link & see if content-type is set to text/html
You may have reasons to avoid making more network calls. If so, this isn't an option. But if it is OK to make additional calls, you could grab the HEAD for each link and check the MIME type stored in content-type.
Something like this:
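var http = require('http');

var headersOptions = {
  method: "HEAD",
  host: "example.com", // host name only, without the protocol
  path: "/articles/content.html"
};
var req = http.request(headersOptions, function (res) {
  // you will probably need to also do things like check
  // HTTP status codes so you handle 404s, 301s, and so on
  var contentType = res.headers['content-type'] || ''; // header may be absent
  if (contentType.indexOf("text/html") > -1) {
    // do something like queue the link up to be crawled
    // or parse the link or put it in a database or whatever
  }
});
req.on('error', function (err) {
  // network failures (DNS, refused connection) end up here
  console.error(err.message);
});
req.end();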
One benefit is that you only grab the HEAD, so even if the file is a gigantic video or something, it won't clog things up. You get the HEAD, see the content-type is a video or whatever, then move along because you aren't interested in that type.
Second, you don't have to keep track of file names because you're using a standard MIME type to differentiate html from other data formats.
I'm having an issue getting the text! plugin to work in my requirejs site. It's always including lib/ in the request url, however all of the other files (not using text!) are being successfully found and loaded. Here is my directory structure:
WebContent/
|--backbone/
|  |--Bunch of folders and files
|--config/
|  |--config.js
|--lib/
|  |--jquery.js
|  |--text.js
|  |--require.js
|--index.html
my index.html file is:
<body>
<div id="siteLayoutContainer"></div>
<script data-main='config/config' src="lib/require.js"></script>
</body>
The config file is:
requirejs.config({
  baseUrl: './',
  paths: {
    jquery: 'lib/jquery.js',
    backbone: 'lib/backbone.js',
    text: 'lib/text',
    application: 'backbone/application'
  },
  text: {
    env: 'xhr'
  }
});

require(['application'], function(App) {
  App.start();
});
I'm using the text! plugin like so:
define([
  'jquery',
  'text!backbone/templates/SomeTemplate.html'
], function(jQuery, NotFoundHtml) {
  //Some code here
});
So, in the above script, the url being used for the template is:
http://localhost/lib/backbone/templates/SomeTemplate.html
and I am expecting it to be:
http://localhost/backbone/templates/SomeTemplate.html
I've tried the following:
Moving the text.js and require.js files out into the WebContent directory, but I get the same results. Interestingly, if I put a space after text! and before the path, it works fine and doesn't include the lib/ directory in the request for the HTML template. However, the optimizer includes the space and can't find the template.
Not defining a baseUrl: same results.
Moving the require config.js content into index.html in its own script tag that runs before the require.js script tag: same results.
Getting rid of the text options in the config file.
Oh yeah, I forgot: I've also tried 'text!../backbone/templates/SomeTemplate.html': same results.
So I'm stuck and can't figure out what I'm missing. I'm obviously not understanding how the text! plugin uses the baseUrl or how it determines the url it's going to use to fetch the defined file.
After your edits to your question, it now contains all the information to diagnose the problem. As you guessed in one of your comments, the issue is indeed that this path:
backbone: 'lib/backbone.js',
is throwing off the resolution of the template you give to the text plugin. When the text plugin loads what you give it, it takes the path after the ! symbol, treats it as if it were a module name, and runs it through the module resolution process. Module resolution checks whether the name starts with a prefix that matches any of the keys in paths and, if so, replaces that prefix with the value associated with the key, which gives the result you obtained. One way to fix the issue would be to add this to your paths configuration:
"backbone/templates": "backbone/templates"
This will make it so that anything you request under backbone/templates won't get messed up by the backbone path.
Note: it is preferable to avoid putting extensions in module names, so you should remove them from the values you have for jQuery and Backbone.
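To make the prefix substitution concrete, here is a simplified model of it. This is not RequireJS's actual implementation; resolveWithPaths is a hypothetical function written for this explanation, and the 'lib/backbone' value (without the extension) is assumed for illustration. It looks for the longest leading run of path segments that appears as a key in paths and swaps it for the mapped value.

```javascript
// Simplified, hypothetical model of prefix substitution via "paths".
// The longest matching prefix wins, so a more specific key overrides
// a shorter one.
function resolveWithPaths(moduleName, paths) {
  var segments = moduleName.split('/');
  for (var i = segments.length; i > 0; i--) {
    var prefix = segments.slice(0, i).join('/');
    if (Object.prototype.hasOwnProperty.call(paths, prefix)) {
      // Replace the matched prefix and keep the remaining segments.
      return [paths[prefix]].concat(segments.slice(i)).join('/');
    }
  }
  return moduleName; // no key matched, name is used as-is
}

var template = 'backbone/templates/SomeTemplate.html';

// With only the "backbone" key, the template path is rewritten.
var before = resolveWithPaths(template, { backbone: 'lib/backbone' });

// Adding the more specific key leaves the template path alone.
var after = resolveWithPaths(template, {
  backbone: 'lib/backbone',
  'backbone/templates': 'backbone/templates'
});
```

In this model, before ends up under lib/ while after keeps the path you asked for, which mirrors the behaviour described in the question and the effect of the extra paths entry.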
I'm looking to use Ghost to host both a blog and a static website, so the structure might look something like this:
/: the landing page (not the blog landing page, doesn't need access to posts)
/blog/: the blog landing page (needs access to posts that index.hbs typically has access to)
/page1/, etc: static pages which will use page.hbs or page-page1.hbs as needed
/blog-post-whatever/, etc: blog posts which will use post.hbs
The only thing I foresee being an issue is that only index.hbs (as far as I know) is passed the posts template variable (see code on GitHub here).
Before I go submit a pull request, it'd be nice to know whether:
Is there an existing way to get access to the posts variable in page.hbs?
If not, is it worthwhile to submit a pull request for this?
If yes, would we really want to send posts to all the pages? or should the pull request split apart page.hbs and only send it to those? or is there a better way to do this?
If you don't mind hacking the Ghost core files then here is how you can do it for the current version of Ghost (0.7.4). This hack will require recreation if upgrading to a new Ghost version.
First create the template files (that will not change if you upgrade):
Create the home page template in:
content/themes/theme-name/home.hbs
home.hbs now supersedes index.hbs and will be rendered instead of it.
Also create the blog template file in:
content/themes/theme-name/blog.hbs
The handlebars element that adds the paged posts is
{{> "loop"}}
so this should be in the blog.hbs file.
Again, the above files do not change if you upgrade to a new version of Ghost.
Now edit the following files in the core/server directory:
I have added a few lines before and after the sections of code that you need to add so that you can more easily find the location of where the new code needs to be added.
/core/server/routes/frontend.js:
Before:
indexRouter.route('/').get(frontend.index);
indexRouter.route('/' + routeKeywords.page + '/:page/').get(frontend.index);
After:
indexRouter.route('/').get(frontend.index);
indexRouter.route('/blog/').get(frontend.blog);
indexRouter.route('/' + routeKeywords.page + '/:page/').get(frontend.index);
This calls the frontend controller that will render the blog page with the same data level as 'index' and 'home' (the default is to load the first page of recent posts), thus enabling us to use the "loop" partial in the /blog/ page.
/core/server/controllers/frontend/index.js
Before:
frontendControllers = {
index: renderChannel('index'),
tag: renderChannel('tag'),
After:
frontendControllers = {
index: renderChannel('index'),
blog: renderChannel('blog'),
tag: renderChannel('tag'),
/core/server/controllers/frontend/channel-config.js
Before:
getConfig = function getConfig(name) {
var defaults = {
index: {
name: 'index',
route: '/',
frontPageTemplate: 'home'
},
tag: {
After:
getConfig = function getConfig(name) {
var defaults = {
index: {
name: 'index',
route: '/',
frontPageTemplate: 'home'
},
blog: {
name: 'blog',
route: '/blog/',
frontPageTemplate: 'blog'
},
tag: {
/core/server/controllers/frontend/channel-config.js
Before:
indexPattern = new RegExp('^\\/' + config.routeKeywords.page + '\\/'),
rssPattern = new RegExp('^\\/rss\\/'),
homePattern = new RegExp('^\\/$');
After:
indexPattern = new RegExp('^\\/' + config.routeKeywords.page + '\\/'),
rssPattern = new RegExp('^\\/rss\\/'),
blogPattern = new RegExp('^\\/blog\\/'),
homePattern = new RegExp('^\\/$');
and
Before:
if (indexPattern.test(res.locals.relativeUrl)) {
res.locals.context.push('index');
} else if (homePattern.test(res.locals.relativeUrl)) {
res.locals.context.push('home');
res.locals.context.push('index');
} else if (rssPattern.test(res.locals.relativeUrl)) {
res.locals.context.push('rss');
} else if (privatePattern.test(res.locals.relativeUrl)) {
res.locals.context.push('private');
After:
if (indexPattern.test(res.locals.relativeUrl)) {
res.locals.context.push('index');
} else if (homePattern.test(res.locals.relativeUrl)) {
res.locals.context.push('home');
res.locals.context.push('index');
} else if (blogPattern.test(res.locals.relativeUrl)) {
res.locals.context.push('blog');
} else if (rssPattern.test(res.locals.relativeUrl)) {
res.locals.context.push('rss');
} else if (privatePattern.test(res.locals.relativeUrl)) {
res.locals.context.push('private');
Restart the server and you should see the new /blog/ page come up with the list of recent blog posts.
Here's a solution that I am currently using. I have an off-canvas nav that I want to use to display links to my latest posts. On the home page, this works great: I iterate over posts and render some links. On the other pages, I don't have the posts variable at my disposal.
My solution is this: wrap the pertinent post links on the homepage in a div with an id of "posts", then I make an ajax request for that specific content (using jQuery's load) and inject it into my nav on all other pages except the home page. Here's a link to jQuery's load docs.
Code:
index.hbs
<div id='posts'>
{{#foreach posts}}
<li>
{{{title}}}
</li>
{{/foreach}}
</div>
app.js
var $latest = $('#posts');
if ( location.pathname !== '/' )
$latest.load('/ #posts li');
There is no way currently (Ghost v0.5.8) to access posts within a page template.
I would think it's probably not worth submitting the pull request. The Ghost devs seem to have their own plans for this and keep saying they'll get around to this functionality. Hopefully it's soon, because it is basic functionality.
The best way to go about this would be to hack the core yourself. Eventually the better way to do this would be with a hook. It looks like the Ghost API will eventually open up to the point where you can hook into core functions for plugins pretty much the same way WordPress does it. https://github.com/TryGhost/Ghost/wiki/Apps-Getting-Started-for-Ghost-Devs
If this is a theme others will be using, I would recommend working within the current limitations of Ghost. It's super annoying, I know, but in the long run it's best for your users and your reputation.
If this is only for you, then I would hack the core to expose a list of posts or pages as locals in each route. If you're familiar with Express then this shouldn't be very difficult.
I think the way you've done it is pretty creative and there's a part of me that likes it but it really is a seriously ugly hack. If you find yourself hacking these kinds of solutions together a lot then Ghost might not be the tool you want to be using.
A better solution than briangonzalez's is to get the post info from the RSS feed instead of the home page.
See this gist for how it can be done.
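The gist itself is not reproduced here, but the idea can be sketched as below. This is only an illustration under several assumptions: the feed is standard RSS 2.0 with item, title and link elements, the helper names (tagText, extractRssItems) are hypothetical, and regular expressions are used purely for brevity; a real XML parser would be more robust.

```javascript
// Hypothetical helper: returns the text of the first <tag>...</tag> inside
// a block of XML, unwrapping a CDATA section if one is present.
function tagText(block, tag) {
  var match = new RegExp('<' + tag + '>([\\s\\S]*?)</' + tag + '>').exec(block);
  if (!match) return '';
  return match[1].trim()
    .replace(/^<!\[CDATA\[/, '')
    .replace(/\]\]>$/, '')
    .trim();
}

// Hypothetical helper: pulls { title, link } pairs out of an RSS 2.0 string.
function extractRssItems(xml) {
  var items = [];
  var itemPattern = /<item>([\s\S]*?)<\/item>/g;
  var match;
  while ((match = itemPattern.exec(xml)) !== null) {
    items.push({
      title: tagText(match[1], 'title'),
      link: tagText(match[1], 'link')
    });
  }
  return items;
}
```

In a theme's app.js, the feed text could be fetched with something like $.get('/rss/', callback, 'text') and the resulting items rendered into the nav, so the home page markup no longer has to carry a hidden list of posts.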
You can now use the ghost-url-api. It's currently in beta, but you can activate it in the administration (Settings > Labs).
For example, the {{#get}} helper can be used like this in a static page:
{{#get "posts" limit="3" include="author,tags"}}
{{#foreach posts}}
... call the loop
{{/foreach}}
{{/get}}
More information:
http://themes.ghost.org/docs/ghost-url-api
As of Ghost v0.9.0, the Channels API is still under development. However, achieving this is much simpler now. It still requires modification of core files, but I'm planning on submitting some pull requests soon. Currently, one downside of the following method is that your sitemap-pages.xml will not contain the /blog/ URL.
Thanks to @Yuval's answer for kicking this off.
Create a template file for your index page with the path content/themes/theme-name/index.hbs. This can contain whatever you would like for your "static" homepage.
Create a template file for your blog index page with the path content/themes/theme-name/blog.hbs. This simply needs to contain:
{{> "loop"}}
In /core/server/controllers/frontend/channel-config.js:
Edit the var defaults object to include:
blog: {
name: 'blog',
route: '/blog/'
}
I have an app running on the MEAN stack. I am using jade for templates and was wondering where to put page specific javascript. Right now my directory looks like:
app/
|- public
| |- js
| |- css
|- views
|- routes
|- schemas
One of my views, signup.jade, I need to include some javascript:
$(function() {
  $.validator.addMethod("passwordStrength", function( value, element ) {
    console.log("here")
    var result = this.optional(element) ||
      /^[a-zA-Z0-9- ]*$/.test(value) &&
      /\d/.test(value) &&
      /[a-z]/i.test(value);
    if (!result) {
      var validator = this;
    }
    return result;
  }, "Your password must contain at least one number and one special character.");
  $('#signup').validate({
    rules: {
      email: {
        required: true
      },
      password: {
        required: true,
        passwordStrength: true,
        minlength: 6
      },
      "repeat-password": {
        required: true,
        passwordStrength: true,
        minlength: 6
      }
    }
  });
});
Where is the best place to put this? Do I create a javascript file for each page inside of app/public/js?
If anyone has any good articles on MEAN file structure best practices as a whole those would be appreciated as well, thanks!
In my experience, it is completely fine to keep a script in the corresponding Jade view if the script is used only once.
Alternatively, you can create a directory for helpers, move the script there as a plain JS file, and then add it to the page by setting a variable to the file's contents. This looks a bit cleaner (and lets you run a JS linter on the helpers, etc.) but requires a bit more work.
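As a rough sketch of the second option, the password rule could live in its own file so it can be linted and tested without a browser. The file name public/js/signup.js and the isStrongPassword function are assumptions for this example; the checks mirror the ones in the question (allowed characters, at least one digit, at least one letter), and the message is reworded to match what those checks actually enforce.

```javascript
// public/js/signup.js (hypothetical file name)

// Same checks as the validator in the question, as a plain function.
function isStrongPassword(value) {
  return /^[a-zA-Z0-9- ]*$/.test(value) && // only letters, digits, hyphen, space
    /\d/.test(value) &&                    // at least one digit
    /[a-z]/i.test(value);                  // at least one letter
}

// Register the rule only when jQuery Validation is present (in the browser).
if (typeof $ !== 'undefined' && $.validator) {
  $.validator.addMethod('passwordStrength', function (value, element) {
    return this.optional(element) || isStrongPassword(value);
  }, 'Your password must contain at least one letter and one number.');
}
```

Assuming the public directory is served statically, signup.jade could then pull the file in with a line such as script(src='/js/signup.js').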
Here is how my code is organized. We are using the angular-fullstack generator from Yeoman. In the image I provided, home.html is the partial view whose controller is home.js. I suggest you create a separate file for every HTML partial you create. In fact, a good Angular page should have user-defined directives, with the respective controllers hooking scope into the directives and the directives managing the scope they are given. This keeps things neat and simple. If you get a chance and can afford it, buy ng-book; otherwise, Angular's own guide is also very good.