How to properly split a site?

Suppose I have a new version of a website:
http://www.mywebsite.com
and I would like to keep the old site in a sub-directory and treat it separately:
http://www.mywebsite.com/old/
My new site has a link to the old one on the main page, but not vice-versa.
1) Should I create 2 sitemaps? One for the new and one for the old?
2) When my site gets crawled, how can I limit the path of the crawler? In other words, since the new site has a link to the old one, the crawler will reach the old site. If I do the following in my robots.txt:
User-agent: *
Disallow: /old/
I'm worried that it won't crawl the old site (using the 2nd sitemap) since it's blocked. Is that correct?

1) You could include all URLs in one file, or you could create separate files. A sitemap is commonly understood to be "per (web) site"; see http://www.sitemaps.org/:
In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL
Since you now have two sites, you may create two sitemaps. But again, I don't think that it is strictly defined that way.
2) Well, if you block the URLs in robots.txt, those URLs won't be visited by conforming bots. It doesn't mean that these URLs will never be indexed by search engines, but the pages themselves (i.e., the content) will not be.
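For illustration, a minimal sketch of what the robots.txt could look like if you decide to keep /old/ crawlable and submit both sitemaps (the sitemap file names here are just placeholders):

User-agent: *
Disallow:

Sitemap: http://www.mywebsite.com/sitemap-new.xml
Sitemap: http://www.mywebsite.com/old/sitemap-old.xml

The Sitemap lines point crawlers at both files independently of any Disallow rules. Keep in mind that, per the sitemaps.org protocol, a sitemap placed under /old/ can only list URLs that live under /old/.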

Related

URL Rewrite IIS and search engine

I've configured my IIS (asp.net site) to use URL Rewrite.
In particular, this is my (dynamic) rule: any URL in the format number/string is redirected to a special .aspx page.
So any URL of the form mysite/id/Name is redirected to showprof.aspx?id=id&title=Name. This works perfectly.
My question is about search engines. I don't have any "fixed" page containing links like mysite/id/Name that a spider could scan, so I'm trying to figure out how search engines could index my dynamic pages. Should I create a sitemap.xml? If yes, in which way? Or should I create a "hidden" page that contains a link to every piece of my dynamic content, like mysite/id1/Name1, mysite/id2/Name2, and so on?
thank you
A starting point is definitely a Sitemap.xml. You could try, for example, the IIS SEO Toolkit and see if it is able to index any of your pages: http://www.iis.net/downloads/microsoft/search-engine-optimization-toolkit
It also has functionality to generate a sitemap.xml, although I'm guessing in your case you probably have some dynamic content, so a better approach would be to have a "handler" that generates it dynamically on demand (maybe cache it for performance reasons).
I would also recommend having some pages that are actually reachable through normal links. For example, the home page of the site could link to a "site map" page (not sitemap.xml) that renders the set of links you want indexed (at least the ones that are most important to you); that will make them easy to discover.
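As a rough sketch (using the id/Name pattern from your question; the host name and values are placeholders), the dynamically generated sitemap.xml would simply list the rewritten URLs:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://mysite/1/First-Name</loc>
  </url>
  <url>
    <loc>http://mysite/2/Second-Name</loc>
  </url>
</urlset>

Your handler would query the same data source that showprof.aspx uses and emit one <url> entry per record.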

"A description for this result is not available because of this site's robots.txt – learn more" for mobile version

I created a website, www.example.com, and a mobile version of it on the subdomain www.m.example.com. I used an .htaccess file to redirect smartphones to the mobile version. I put my mobile website's files in a folder named "mobile". I put a robots.txt file in the main root folder to prevent the mobile URLs from being indexed in search engine results.
My robots.txt file looks like this:
User-agent: *
Disallow: /mobile/
I also put a robots.txt file in the folder named "mobile":
User-agent: *
Disallow: /
My problem is this: in the desktop version, all results and snippets are correct, but when I search on mobile, the result snippet shows this:
A description for this result is not available because of this site's robots.txt – learn more
How to solve this?
By using this robots.txt on www.m.example.com
User-agent: *
Disallow: /
you are forbidding bots to crawl any resource on www.m.example.com.
If bots are not allowed to crawl, they can’t access your meta-description.
So everything is working as intended.
If you want your pages to get crawled (and indexed), you have to allow it in your robots.txt (or remove it altogether).
If your concern is that the crawled mobile pages would count as duplicates of the desktop pages, use link relations instead of blocking: by using the canonical link type, you can denote that two (or more) pages are the same, or that they only have trivial differences (e.g., different HTML structure, a table sorted differently, etc.), or that one is a superset of the other.
By using the alternate link type, you can denote that it’s an alternate representation of essentially the same content.
(You can see examples in my answer on Webmasters SE.)
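As a rough sketch of that commonly recommended pattern for separate mobile URLs (using the hosts from your question; the page path is a placeholder):

<!-- on the desktop page: http://www.example.com/page.html -->
<link rel="alternate" media="only screen and (max-width: 640px)" href="http://www.m.example.com/page.html">

<!-- on the mobile page: http://www.m.example.com/page.html -->
<link rel="canonical" href="http://www.example.com/page.html">

With this bidirectional annotation in place, the mobile pages can stay crawlable without being treated as independent duplicate content.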

Block Bots from crawling one of my sites on a multistore multidomain prestashop

Hello, I have a multistore multidomain Prestashop installation with main domain example.com, and I want to block all bots from crawling a subdomain site, subdomain.example.com, made for resellers where they can buy at lower prices, because its content is duplicated from the original site. I am not exactly sure how to do it. Usually, if I want to block bots for a site, I would use
User-agent: *
Disallow: /
But how do I use it without hurting the whole store? And is it possible to block the bots from the .htaccess too?
Regarding your first question:
If you don't want search engines to access the subdomain, a robots.txt file served on the subdomain itself (sub.example.com/robots.txt) is the way to go. Don't put those rules in your regular domain's file (example.com/robots.txt) - see the Robots.txt reference guide.
Additionally, I would verify both domains in Google Search Console. There you can monitor and control the indexation of the subdomain and main domain.
Regarding your second question:
I've found a SO thread here which explains what you want to know: Block all bots/crawlers/spiders for a special directory with htaccess.
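Since a Prestashop multistore typically serves all domains from the same document root, one hedged sketch is to serve a different robots.txt for the reseller subdomain via .htaccess (the file name robots_subdomain.txt is made up for this example):

# serve a separate robots.txt for the reseller subdomain only
RewriteEngine On
RewriteCond %{HTTP_HOST} ^subdomain\.example\.com$ [NC]
RewriteRule ^robots\.txt$ robots_subdomain.txt [L]

robots_subdomain.txt would then contain the User-agent: * / Disallow: / rules, while the main store keeps its normal robots.txt.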
We use a canonical URL to tell the search engines where to find the original content.
https://yoast.com/rel-canonical/
A canonical URL allows you to tell search engines that certain similar URLs are actually one and the same. Sometimes you have products or content that is accessible under multiple URLs, or even on multiple websites. Using a canonical URL (an HTML link tag with attribute rel=canonical) these can exist without harming your rankings.
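For instance, a reseller product page on the subdomain could point back to the original page in its <head> (URLs are placeholders):

<link rel="canonical" href="http://example.com/product/123-original-product" />

This keeps the reseller pages available to customers while telling search engines that the main-store URL is the one to rank.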

How to redirect one page to a different URL than the rest of the HTTP redirect IIS8

First off, apologies for not knowing the nomenclature for what I'm looking for, I'm not typically a Windows web admin.
I have a SharePoint website which contains several subsites. We also have several alternate URLs that point to specific pages, and some of those alternate URLs have friendly URLs which also redirect to other specific pages. We're in the process of migrating from a SharePoint 2007 site to this one, and in the process, I'm trying to remove our reliance on our registrar for handling some of this redirection, because it is apparently not a free service.
Currently our registrar does the following redirects:
http://alias1.tld/* redirects to http://subsite1.ca/page1
http://alias1.tld/friendly redirects to http://subsite1.ca/page2
http://alias2.tld/ redirects to http://subsite1.ca/page3
I know I can accomplish the first and third by setting the sites up in IIS and using the HTTP Redirect function, but I'm not sure how I can do the second one. In Apache this would be easy, but I'm not sure what I'm looking for here.
Is this something that should be handled within SharePoint, and have that take care of redirecting alias1.tld/friendly to the specific page, or is this something I need to setup in IIS? Is this what URL rewrite is for, or is there a different IIS way to do this?
I'm not sure that this is the best way to do it, but I got things working how I wanted them. Here's what I ended up doing:
1) Create a new subsite on subsite1 to give me the URL subsite1.ca/subsubsite.
2) Create a redirect from alias1.tld to subsite1.ca/subsubsite.
3) Create two pages for the new subsite: one for the default page and one used to redirect to Page2. Both pages are redirects; Default points to Page1, the second points to Page2.
4) Set the subsite to use Managed Navigation for global and current navigation through Site Settings > Navigation, and create a default term set by selecting the new subsite in the list, clicking Create Term Set, and then clicking OK.
5) Create a term for the one page that needs to be handled differently by going to Site Settings > Term Store Management. Click on the term set created in the previous step, then select New Term. On the Term-Driven Pages tab, enter the friendly URL, select the target page (the redirect page created in step 3), and click Save.
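If you would rather handle this in IIS itself, this is indeed what the URL Rewrite module is for. A hedged sketch of web.config rules for alias1.tld (assuming the module is installed and alias1.tld is bound to the site; rule names and the redirect type are just examples):

<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <!-- alias1.tld/friendly goes to page2 -->
        <rule name="alias1 friendly" stopProcessing="true">
          <match url="^friendly/?$" />
          <conditions>
            <add input="{HTTP_HOST}" pattern="^alias1\.tld$" />
          </conditions>
          <action type="Redirect" url="http://subsite1.ca/page2" redirectType="Permanent" />
        </rule>
        <!-- everything else on alias1.tld goes to page1 -->
        <rule name="alias1 catch-all" stopProcessing="true">
          <match url=".*" />
          <conditions>
            <add input="{HTTP_HOST}" pattern="^alias1\.tld$" />
          </conditions>
          <action type="Redirect" url="http://subsite1.ca/page1" redirectType="Permanent" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>

Rule order matters here: the friendly-URL rule must come before the catch-all, and stopProcessing ensures the first match wins.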

Updating an existing website

I've been asked by a family friend to completely overhaul the website for their business. I've designed my own website, so I know some of the basics of web design and development.
To work on their website from my own home, I know I'll need to FTP into their server, and therefore I'll need their FTP credentials, as well as their CMS credentials. I'm meeting with them in a couple of days and I don't want to look like a moron! Is there anything else I need to ask them for during our first meeting (aside from what they want in their new site, etc.) before I start digging into it?
Thanks!
From an SEO point of view, you should be concerned with 301 redirects, as (I suppose) some or all URL addresses will change (take a different name, be removed, etc.).
So, after you've created the new version of the site - and before you put it online - you should list all "old site" URLs and decide, preferably for each one, its new status (unchanged, or redirected and if so, to what URL).
Mind that even if some content will not reappear on the new site, you still have to redirect its URL (say, to the home page) to keep link juice and SERP rankings.
Also, for larger sites (especially dynamic ones), try looking for URL patterns for bulk redirects. For example, if you see that Google indexes 1,000 index.php?search=[some-key-word] pages, you don't need to redirect each one individually; these are probably just search result pages that can be grouped with a regex and redirected to the main search results page.
To index "old site" URLs you should:
a. site:domainname.com in Google (then set the SERP to 100 results and scaped manually of with Xpath)
b. Xenu or other site crawler (some like screamingfrog) to get a list of all URLs.
c. combine the lists in excel and remove all duplicates.
If you need help with 301 redirects you can start with this link:
http://www.webconfs.com/how-to-redirect-a-webpage.php/
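As a hedged sketch (assuming the old site runs on Apache; the paths and domain are placeholders), individual and bulk 301 redirects in an .htaccess file could look like this:

# one-to-one redirect for a page that moved
Redirect 301 /old-about.html http://www.example.com/about/

# bulk redirect: send all old index.php?search=... pages to the new search page
RewriteEngine On
RewriteCond %{QUERY_STRING} ^search=.+$
RewriteRule ^index\.php$ http://www.example.com/search/? [R=301,L]

The trailing ? in the RewriteRule target drops the old query string from the redirect.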
If the website is static, knowing HTML, CSS, and JavaScript, along with the FTP credentials, is enough for you to get started. However, if the site is dynamic, interactive, and database driven, you may need to ask whether they want to use PHP; in that case you might end up building the site in WordPress.
If you are going to design the website from scratch, also keep this point in mind: your friend's website is hosted somewhere (i.e., with a hosting provider). You should get the hosting control panel details as well, which will help you manage the website (including database, email, FTP, etc.).
