Add .html when rewriting URL in htaccess? - .htaccess

I'm in the process of rewriting all the URLs on my site that end with .php and/or have dynamic URLs so that they're static and more search engine friendly.
I'm trying to decide if I should rewrite file names as simple strings of words, or if I should add .html to the end of everything. For example, is it better to have a URL like
www.example.com/view-profiles
or
www.example.com/view-profiles.html
???
Does anyone know if the search engines favor doing it one way or another? I've looked all over Stack Overflow (and several other resources) but can't find an answer to this specific question.
Thanks!

SEO-optimized URLs should follow this logic (listed in order of priority):
unique (1 URL == 1 resource)
permanent (they do not change)
manageable (1 logic per site section, no complicated exceptions)
easily scalable logic
short
with a targeted keyword phrase
Based on this,
www.example.com/view-profiles
would be the better choice.
That said:
Google has something I call "dust crawling prevention" (see the paper "Do Not Crawl in the DUST" by this Google researcher: http://research.google.com/pubs/author6593.html), so when Google discovers a URL it must decide whether that specific page is worth crawling.
Google gives URLs ending in .html a "bonus" credit of trust: "this is an HTML page, I probably want to crawl it".
That said, if your site mostly consists of HTML pages that have actual textual content, this "bonus" is not needed.
I personally add .html only to HTML sitemap pages that consist solely of long lists, and only if I have a few million of them, as I have seen a slightly better crawl rate on these pages. For all other pages I strictly keep the Franzsche URL logic mentioned above.
br
franz, austria, vienna
P.S.: please see https://webmasters.stackexchange.com/ for SEO questions that are not programming related.

Related

RewriteRule - redirect multi variable URL to multi variable URL

Our old website has a search URL structure like this:
example.com/Country/United States/Region/California/Area/Southern California/City/San Diego/Suburb/South Park/Type/House/Bedrooms/4/Bathrooms/3/
This is currently rewritten to point to the physical page:
/search/index.aspx
The parameters in the URL can be mixed up in different orders, and the URL can include one or more parameters.
We want to 301 redirect these old URLs to a new structure that is ordered in a logical way and more concise:
example.com/united-states/california/southern-california/san-diego/south-park/?type=house&bedrooms=4&bathrooms=3
example.com/united-states/california/?type=house&bedrooms=4&bathrooms=3
Is there a way with URL rewriting to inspect the old URL, work out which parameters are present, and then write out the new URL structure?
Even if we can limit it to just the Country, Region, Area, City and Suburb, that may be good enough to at least return some results even if it's not perfect.
Also, spaces should be turned into hyphens and all text made lowercase.
I already have the RewriteRule that maps the new URL structure to the physical page. It's just transforming the old URL into the new URL that I need help with. I've googled endlessly and it's just beyond me!
Can anyone help? Thanks.
Since you already have the old search page, with rewrite rules set up for it, and it is capable of parsing all the parameters you need, the easiest and most appropriate solution I see is to issue the redirect from the old search page's own code. Put code in that page that composes the new URL from the parsed parameters and issues the 301 redirect; this should be a lot easier than trying to parse all these parameters in .htaccess and combine them into the new format.
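To make the idea concrete, here is a minimal sketch of the URL-composing step. It is written in Python purely for illustration (the original page is /search/index.aspx, so the real code would live in that page's own language). The key lists PATH_KEYS and QUERY_KEYS and both function names are assumptions taken from the example URLs in the question, not an existing API.

```python
from urllib.parse import unquote, urlencode

# Assumed order of the location segments in the new URL structure.
PATH_KEYS = ["country", "region", "area", "city", "suburb"]
# Assumed parameters that move to the query string, in output order.
QUERY_KEYS = ["type", "bedrooms", "bathrooms"]


def slugify(value: str) -> str:
    """Lowercase the value and turn runs of whitespace into single hyphens."""
    return "-".join(value.lower().split())


def convert_old_path(old_path: str) -> str:
    """Turn an old /Key/Value/Key/Value/ path into the new ordered URL."""
    parts = [unquote(p) for p in old_path.strip("/").split("/") if p]
    # Pair up alternating key/value segments; a trailing unpaired key is dropped.
    params = {k.lower(): slugify(v) for k, v in zip(parts[0::2], parts[1::2])}

    # Emit location segments in the fixed order, skipping any that are absent.
    path_segments = [params[k] for k in PATH_KEYS if k in params]
    query = urlencode([(k, params[k]) for k in QUERY_KEYS if k in params])

    new_path = "/" + "/".join(path_segments) + "/" if path_segments else "/"
    return new_path + ("?" + query if query else "")
```

Because the parameters are collected into a dictionary first, the order in which they appear in the old URL does not matter. For example, convert_old_path("/Region/California/Type/House/Country/United States/") gives "/united-states/california/?type=house", which the old page would then send as the Location of a 301 response.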

How match any * in Orchard CMS auto route

I want a Stack Overflow-like URL pattern in an Orchard blog. How can I achieve this with an Autoroute pattern?
For example, I want to have a pattern like
/myblog/Pages/4453/what-ever-title
Here, regardless of the trailing page name (what-ever-title), I want the URL to always point to item 4453. I have tried the following patterns, but they failed.
{Content.Container.Path}/Pages/{Content.Id}
{Content.Container.Path}/Pages/{Content.Id}/*
{Content.Container.Path}/Pages/{Content.Id}/{Content.Slug}
The reason I want this is so that I can change a page's final URL without affecting the links already built up through SEO efforts.
For instance, the Stack Overflow URL for this question is
/questions/24145078/how-match-any-in-orchard-cms-auto-route
Regardless of what I use for the trailing part, as long as the number 24145078 is there, the URL works fine.
This is not how Autoroute works. Autoroute is not routing; it generates unique paths for content items, based on token-driven rules. If you want a wildcard route, write a wildcard route.
But for this specific application, I'm afraid that's still not what you should do. The standard way of dealing with resources that move to a new address is to establish a permanent redirect from the old URL to the new one. This is most efficiently done using the URL rewriting feature of IIS.
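For readers who want to see how the two ideas fit together, here is a rough, framework-neutral sketch in Python. It is not Orchard or IIS code. The CURRENT_SLUGS table, the ROUTE pattern, and the resolve function are all hypothetical names made up for this illustration. It looks the item up by ID only and answers with a permanent redirect whenever the trailing slug is missing or out of date.

```python
import re
from typing import Optional, Tuple

# Hypothetical store mapping item IDs to their current slugs.
CURRENT_SLUGS = {4453: "what-ever-title"}

# Wildcard-style route: the ID is required, the trailing slug is optional.
ROUTE = re.compile(r"^/myblog/Pages/(\d+)(?:/([^/]*))?/?$")


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, location): 200 with no location, 301 plus the canonical URL, or 404."""
    match = ROUTE.match(path)
    if not match:
        return 404, None
    item_id, slug = int(match.group(1)), match.group(2)
    current = CURRENT_SLUGS.get(item_id)
    if current is None:
        return 404, None
    if slug != current:
        # Old or missing slug: send a permanent redirect to the canonical URL.
        return 301, f"/myblog/Pages/{item_id}/{current}"
    return 200, None
```

With this shape, old links keep working after a title change, and search engines are still pointed at a single canonical URL per item.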

Find URL structure in htaccess

I would like to recreate old pages that were online 2 years ago. Unfortunately, all I have is the old htaccess file. Is there a way to find all used URLs?
Thanks in advance!
.htaccess does not store any information about pages or their URLs. The only information you can get from it is the redirection mechanism and/or some hints about the folder structure, if one was used. If you paste the source of your file here, someone could help you with this. In general, in the most optimistic scenario, you will just get a set of rules in the form of
regular expression --> server query
(unless the author enumerated all the URLs instead of using regular expressions).
Check http://archive.org/web/web.php for cached versions of your website.
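If you want to pull those rules out of the file quickly, something like the following Python sketch could help. It is an assumption-laden helper, not a full Apache parser: list_url_patterns is a made-up name, it only looks at the RewriteRule and Redirect family of directives, and it splits on whitespace, so quoted patterns containing spaces are not handled.

```python
# Optional status keywords accepted by Redirect and RedirectMatch.
STATUS_WORDS = {"permanent", "temp", "seeother", "gone"}
# Directives whose first argument is a URL pattern or path.
DIRECTIVES = {
    "rewriterule", "redirect", "redirectmatch",
    "redirectpermanent", "redirecttemp",
}


def list_url_patterns(htaccess_text: str) -> list:
    """Return (directive, pattern) pairs found in the text of an .htaccess file."""
    found = []
    for raw in htaccess_text.splitlines():
        tokens = raw.split()
        # Skip blank lines, comments, and lines with no argument.
        if len(tokens) < 2 or tokens[0].startswith("#"):
            continue
        name = tokens[0].lower()
        if name not in DIRECTIVES:
            continue
        args = tokens[1:]
        # Redirect and RedirectMatch may carry a status before the path.
        if name in {"redirect", "redirectmatch"} and len(args) > 1 and (
            args[0].lower() in STATUS_WORDS or args[0].isdigit()
        ):
            args = args[1:]
        found.append((tokens[0], args[0]))
    return found
```

The output is only the list of patterns. Where the author used regular expressions, you still have to guess which concrete URLs once matched them, which is where the archived copies come in.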

CakePHP nice urls - how to prevent normal urls from working

I have a website that's written using CakePHP. I've added some rewrite rules in the .htaccess file to change the default URLs to different ones (instead of /controller1/action1/parameter I have /some-string-about-controller-and-action/parameter, for example).
The problem is that now both the normal URL and the nice one are available, and Google seems to be indexing both, which is a problem. I'd like to keep only the nice one. What is the proper way to handle this so that it affects the Google results as little as possible?
I don't know why you don't want to use Cake's own routing (if you are having trouble doing what you want, you can accomplish it with a custom route class). Either way, make sure that you redirect all relevant URLs in your .htaccess file to the desired URL using a MOVED PERMANENTLY (301) redirect.
This way Google will index the target URL instead of the undesirable one. You are right to be concerned about this: double indexing is a great way to harm your SEO rankings.

How realize SEF pagination (Ditto) - MODx?

Is it possible to implement SEF (search-engine-friendly) pagination with Ditto in MODx? How would I do it: in scripts, in the nginx configuration, or does anyone know an .htaccess solution?
What I mean:
These are pages with lists of articles:
/articles.php
/articles.php?start=10
....
So the result should be
/articles.php/start/10
or something like it,
with correct redirects to the SEF URLs.
Thanks very much in advance.
If you're going to use Ditto, you won't be able to do this without modifying the snippet code to set up paging as you require.
However, I would argue that /articles.php/start/10 is no more search friendly than the original as both 'start' and '10' are in no way related to the content on those pages. Google and the other search engines are certainly capable of distinguishing between pages with different url parameters in this case, however if this remains an issue you might be better off exploring a different way to create your listings.
Have a look here for some useful insights:
http://googlewebmastercentral.blogspot.com/2011/09/pagination-with-relnext-and-relprev.html
http://www.seomoz.org/blog/pagination-best-practices-for-seo-user-experience
