Node with express - crawl and generate sitemap automatically - node.js

My website changes very frequently, and I need a way to dynamically generate a new site map every day.
I tried to use sitemap.js but it requires me to give it specific urls for my site.
I am wondering if there's a way to have it crawl the site and generate a site map based on the urls it finds dynamically.
If not, is there any other server-side script that I can use to dynamically generate site maps?
Thanks

Does your website have any backend, or any data storage? Or is it plain HTML and nothing more? If it's the first or second option, you can simply extract the URLs from there. Otherwise you can:
1. Fetch your homepage and extract all URLs.
2. Omit those that point to other domains.
3. Repeat for the links you've stored.
4. Do not store duplicates.
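A minimal sketch of that approach, assuming the axios and cheerio npm packages are installed and using http://www.example.com as a placeholder for your own hostname:

// Rough sketch: breadth-first crawl of same-domain links, then write a basic sitemap.xml.
const axios = require('axios');
const cheerio = require('cheerio');
const fs = require('fs');

const BASE = 'http://www.example.com'; // placeholder for your site's root URL

async function crawl() {
  const seen = new Set([BASE]);
  const queue = [BASE];
  while (queue.length > 0) {
    const current = queue.shift();
    let html;
    try {
      html = (await axios.get(current)).data;
    } catch (err) {
      continue; // skip pages that fail to load
    }
    const $ = cheerio.load(html);
    $('a[href]').each((i, el) => {
      try {
        const link = new URL($(el).attr('href'), current); // resolve relative links
        link.hash = '';
        const url = link.toString();
        // keep only same-domain URLs and never store duplicates
        if (link.hostname === new URL(BASE).hostname && !seen.has(url)) {
          seen.add(url);
          queue.push(url);
        }
      } catch (e) { /* ignore malformed hrefs */ }
    });
  }
  return [...seen];
}

function toSitemapXml(urls) {
  const entries = urls.map((u) => '  <url><loc>' + u + '</loc></url>').join('\n');
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' + entries + '\n</urlset>\n';
}

crawl().then((urls) => fs.writeFileSync('sitemap.xml', toSitemapXml(urls)));

Run something like this from a daily cron job (or a scheduled task inside your Express app) to regenerate sitemap.xml each day; you could also feed the collected URL list into sitemap.js instead of building the XML by hand.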

Related

Accessing a file in a theme folder with two parameters in WordPress

In WordPress I want to show data from a custom table. For this I created a custom template and fetch the data via the global $wpdb. I don't want any WordPress categories or posts; I want to use WordPress only for static pages. I created one plugin and am storing data through it, so I need to show this data on the front end.
So far I have created only one page from the backend. My problem is that I need links like these:
http://localhost/application/medical/parameter1/parameter2
Here I created only the one "medical" page from the backend and added the custom template. If I run this URL: http://localhost/application/medical/parameter1/parameter2,
it redirects to the 404 page because the parameters are not passed through the URL. Here the 2nd URI segment is medical, the 3rd segment is parameter1 and the 4th segment is parameter2.
Based on these segments I need to get data from the database. I am assuming this is an .htaccess problem.
How do I set up .htaccess to get this working?
Thanks in advance.

URL Rewrite IIS and search engine

I've configured my IIS (asp.net site) to use URL Rewrite.
In particular this is my rule (a dynamic one): any URL in the format number/string is redirected to a special .aspx page.
So any URL of the form mysite/id/Name is rewritten to showprof.aspx?id=id&title=Name. This works perfectly.
My question is about search engines. I don't have any "fixed" page containing links like mysite/id/Name that a spider can scan, so I'm trying to figure out how search engines could index my dynamic pages. Should I create a sitemap.xml? If so, in which way? Or should I create a "hidden" page that contains every link to all my dynamic content, like mysite/id1/Name1, mysite/id2/Name2 and so on?
thank you
A starting point is definitely a sitemap.xml. You could try, for example, the IIS SEO Toolkit and see whether it is able to index any of your pages: http://www.iis.net/downloads/microsoft/search-engine-optimization-toolkit
It also has functionality to generate a sitemap.xml, although I'm guessing that in your case you have dynamic content, so a better approach would be to have a "handler" that generates it dynamically on demand (and maybe caches it for performance reasons).
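To illustrate the "generate it on demand" idea, here is a minimal sketch written as a Node/Express route (only because that is the stack in the main question above; the same pattern applies to an ASP.NET handler). The loadPages() function and the URL shape are placeholders for your own data source:

// Serve /sitemap.xml generated on demand from the same data that backs the dynamic pages.
const express = require('express');
const app = express();

// Stand-in for a real database query returning the id/Name pairs of your dynamic pages.
async function loadPages() {
  return [{ id: 1, name: 'Name1' }, { id: 2, name: 'Name2' }];
}

let cached = null;
let cachedAt = 0;
const CACHE_MS = 60 * 60 * 1000; // regenerate at most once per hour

app.get('/sitemap.xml', async (req, res) => {
  if (!cached || Date.now() - cachedAt > CACHE_MS) {
    const pages = await loadPages();
    const entries = pages
      .map((p) => '  <url><loc>http://mysite/' + p.id + '/' + encodeURIComponent(p.name) + '</loc></url>')
      .join('\n');
    cached = '<?xml version="1.0" encoding="UTF-8"?>\n' +
      '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' + entries + '\n</urlset>\n';
    cachedAt = Date.now();
  }
  res.type('application/xml').send(cached);
});

app.listen(3000);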
I would also recommend having some pages that are actually reachable through normal links. For example, your home page could link to a "site map" page (not sitemap.xml) where you render the set of links you want indexed (at least the ones that are most important to you); that will make them easy to discover.

Updating an existing website

I've been asked by a family friend to completely overhaul the website for their business. I've designed my own website, so I know some of the basics of web design and development.
To work on their website from my own home, I know I'll need to FTP into their server, and therefore I'll need their FTP credentials, as well as their CMS credentials. I'm meeting with them in a couple of days and I don't want to look like a moron! Is there anything else I need to ask them for during our first meeting (aside from what they want in their new site, etc.) before I start digging into it?
Thanks!
From an SEO point of view, you should be concerned with 301 redirects, as (I suppose) some or all URL addresses will change (take a different name, be removed, etc.).
So, after you've created the new version of the site, and before you put it online, you should list all "old site" URLs and decide, preferably for each one, its new status (unchanged, or redirected and if so to what URL).
Mind that even if some content will not reappear on the new site, you still have to redirect its URL (say, to the home page) to keep link juice and SERP rankings.
Also, for larger sites (especially dynamic sites), try looking for URL patterns for bulk redirects. For example, if you see that Google indexes 1,000 index.php?search=[some-key-word] pages, you don't need to redirect each one individually; these are probably just search result pages that can be grouped with a regex and redirected to the main search results page.
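Purely to illustrate that "group with a regex" idea, here is a hypothetical Express middleware (if the new site happens to run on Node; on Apache or IIS the equivalent would be a single rewrite/redirect rule, and the /search target is a placeholder for the new site's main search results page):

// One pattern-based 301 covers every old search-result URL instead of one rule per keyword.
const express = require('express');
const app = express();

app.use((req, res, next) => {
  // matches /index.php?search=some-key-word for any keyword
  if (/^\/index\.php$/.test(req.path) && req.query.search) {
    return res.redirect(301, '/search'); // placeholder for the new search results page
  }
  next();
});

app.listen(3000);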
To build the list of "old site" URLs you can:
a. run site:domainname.com in Google (then set the SERP to 100 results and scrape manually or with XPath);
b. use Xenu or another site crawler (some like Screaming Frog) to get a list of all URLs;
c. combine the lists in Excel and remove all duplicates.
If you need help with 301 redirects you can start with this link:
http://www.webconfs.com/how-to-redirect-a-webpage.php/
If the website is static, knowing HTML, CSS and JavaScript, along with the FTP credentials, is enough for you to get started. However, if the site is dynamic, interactive and database driven, you may need to ask whether they want to use PHP; in that case you might end up building the site in WordPress.
If you are going to design the website from scratch, also keep this point in mind: your friend has probably hosted the website somewhere (i.e. with a hosting provider). You should get the hosting control panel details as well, which will help you manage the website (including database, email, FTP, etc.).

Cookieless Domain Help

I've been reading a bit about serving my images, JavaScript and CSS from a separate domain.
I have set up a domain, not a subdomain but a new account on my server.
My site is for example: http://www.site.com
I have set up a new account for http://s1.site.com
To cut it short, I have about 40GB of images, so rather than moving them (which would also require me to update a few of my scripts, as I have scrapers that grab images) I was wondering if there was a way to point my new sub-domain to my other content.
Basically, I want to create a .htaccess file on s1.site.com and get it to pull the info from www.site.com, for example:
http://www.site.com/images/picture.jpg
becomes
http://s1.site.com/images/picture.jpg
But the image doesn't really exist on s1.site.com; we are just 'mirroring' it using .htaccess to save the hassle of copying everything over to the static domain.
Please let me know how this is possible, as it would save me a great deal of time and would work wonders.
I basically just want to make anything on
http://s1.site.com/* pull from /home/originalsite/public_html/(images/js/css folders)

Nutch crawling with seed URLs in a range

Some sites have a URL pattern like www.___.com/id=1 to www.___.com/id=1000. How can I crawl such a site using Nutch? Is there any way to provide seeds for fetching in a range?
I think the easiest way would be to have a script to generate your initial list of urls.
No, you have to inject them manually or with a script.
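For example, a tiny Node script along these lines could generate the seed list (the domain, range and output path are placeholders; Nutch reads seed URLs one per line from text files in the seed directory, e.g. urls/seed.txt):

// Write one seed URL per line covering the id range, ready for "nutch inject".
const fs = require('fs');

const urls = [];
for (let id = 1; id <= 1000; id++) {
  urls.push('http://www.example.com/id=' + id);
}

fs.mkdirSync('urls', { recursive: true });
fs.writeFileSync('urls/seed.txt', urls.join('\n') + '\n');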
