Should friendly URLs have the ID at the beginning or at the end? - friendly-url

I am planning to have friendly URLs for my site, where each page has its own ID. I found two common ways people do this:
http://www.example.com/page/123/slug-for-the-page
http://www.example.com/page/slug-for-the-page/123
Which one is preferred? What are the arguments for and against each? I see that when typing the URL the second form is easier, because it is easier for the browser to autocomplete from history or bookmarks. But to me it is really strange to have the ID at the end.
I will use redirects to the canonical URL for any slug which does not match the current one. But do people also redirect from a URL that has the slug but no ID, when the slug is unique? For example, redirecting http://www.example.com/page/slug-for-the-page to http://www.example.com/page/slug-for-the-page/123.

First, some history:
The original reason for these URLs was that people wanted their database to be indexed by ID only. Indexing by slug (and enforcing uniqueness) was extra work for both the computer and the programmer. Rails made this style popular (/page/1).
But this style is fairly opaque to search engines. So people added the slug after the ID (/page/1-slug-for-page) to help the search engine (and the user). But the text at the end was irrelevant -- only the number mattered. The slug might not even be stored in the database (sometimes it was auto-generated), and it wasn't indexed, to save space.
Which one is preferred?
For putting the ID in the URL, either style is OK, but the "number then slug" style is much more popular.
I will use redirects to canonical URL for any slug which does not match the current one. But do people also do redirects from URL having slug, but no ID, when the slug is unique?
Just pick ID or slug and run with it.
1) If you look up by ID, you should ignore the slug (slugs are only for search engines). You can easily redirect to the 'real' slug if you have the ID, to correct any truncated URLs, etc. (see the sketch after this list).
2) If you look up by slug, your slugs need to be unique. (Most sites scope by date to make this easier.) Once you're doing a database lookup on the slug, it's not clear why you would even need the ID in the URL. (IDs look ugly and are quite arbitrary, so it's better to drop them.) What you are proposing is extra work (for you and the computer), and I'm not sure it's useful.
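To make option 1 concrete, here is a minimal sketch in PHP. It assumes a PDO connection $pdo and a pages table with id and slug columns; all names are illustrative, not a definitive implementation:
function showPage(PDO $pdo, $id, $slug) {
    // Look up by ID only; the slug in the URL is purely cosmetic.
    $stmt = $pdo->prepare('SELECT id, slug, body FROM pages WHERE id = ?');
    $stmt->execute([$id]);
    $page = $stmt->fetch(PDO::FETCH_ASSOC);
    if (!$page) {
        http_response_code(404);
        return;
    }
    // Wrong or truncated slug: 301 to the canonical URL.
    if ($slug !== $page['slug']) {
        header('Location: /page/' . $page['id'] . '/' . $page['slug'], true, 301);
        return;
    }
    echo $page['body'];
}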
Only high-traffic sites need to worry about the index size/speed tradeoffs of indexing on slugs vs. IDs. Even then, the difference can be minimized with tricks (e.g. don't index the slug itself; look up the first 8 characters of the MD5 of the slug instead).
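A sketch of that hash trick, assuming an indexed CHAR(8) slug_hash column kept in sync with the slug (again, the table and column names are made up):
// Narrow the search with the short indexed hash, then confirm with the
// full (unindexed) slug to rule out hash-prefix collisions.
$hash = substr(md5($slug), 0, 8);
$stmt = $pdo->prepare('SELECT * FROM pages WHERE slug_hash = ? AND slug = ?');
$stmt->execute([$hash, $slug]);
$page = $stmt->fetch(PDO::FETCH_ASSOC);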

One advantage of having the ID at the beginning:
In some contexts long URLs are cut off (e.g. by line breaks), so that only the first part of the URL is clickable. If the ID is at the end (and therefore in the second, unclickable part), clicking the URL will lead to a 404. If the ID is at the beginning (and therefore in the first part), the link will usually still work.
An example: plain-text mails from Stack Exchange. In those mails the question URLs contain a line break, but the clickable part of the URL still contains the ID, so the links still work.

Related

Searching Single Pages with Dynamic Content

I have a slight problem I have been trying to address for a client I have been working with. We have 4 sets of single pages that load content from a database using PHP, based on a GET string that is provided. The generated pages are well optimized for SEO, with alt tags for images, and they contain content that we need to be able to search with a search feature.
Now I had assumed (and everyone knows what assuming gets you) that these pages would by default be searchable by Concrete5's built-in search feature. But it doesn't work: if I search for a word that I know is definitely on one of these pages, even multiple times, no results are found.
How can I make Concrete5 search these pages? If it's not doable by default or by a plugin, then can someone please offer some advice on how to fix this? This is an important feature and must be completed.
EDIT: See my comment below. I still need some help or direction here, as a CSE isn't much of an option.
EDIT2: It may be viable for me to install a crawler and a custom search engine to address my problems. I was thinking of a spider. Any other suggestions on that, or other options, are much appreciated!
Unfortunately C5 doesn't provide a way to do this -- the only way to tap into the search index is with blocks. And even if you created a phony block just to pass content from the single_page through to the search index, there's no way to say that some content is from one URL while other content is from another URL (which you'd need, since your single_page controller is handling many different URLs).
I don't know of a way to achieve what you want to do (and it appears that nobody else does either -- http://www.concrete5.org/community/forums/customizing_c5/make-content-in-single-pages-searchable/ ), other than building your own internal search engine.
EDIT: I just did some digging, and thought that perhaps you could manually insert records into the PageSearchIndex table and specify the searchable content and the desired path there -- but this won't work because it relies on one cID (collection id, a.k.a. page id) per entry -- so you'd only be able to insert one record for the top-level single_page path.
I think the simplest solution here would be to create your own searching infrastructure for your single_pages (like some kind of function in the controller that returns an array of page paths and the searchable content for each one), then override the search block, perform an additional search of your single_pages, and combine the results on the search results page. Or just use Google Site Search for your site, which will actually crawl the pages and hence find your various single_page URLs: https://www.google.com/cse/
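A rough sketch of that "function in the controller" idea. This is not a real Concrete5 API; the table, URL scheme, and helper name are invented for illustration:
// Hypothetical helper: return path => searchable text for every dynamic
// URL the single_page controller can serve, so an overridden search
// block can match against it and merge the hits into its results.
function getSearchableSinglePages(PDO $pdo) {
    $results = array();
    foreach ($pdo->query('SELECT id, title, body FROM records') as $row) {
        $path = '/detail?id=' . $row['id']; // illustrative URL scheme
        $results[$path] = $row['title'] . ' ' . strip_tags($row['body']);
    }
    return $results;
}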
Best of luck.
I have not tested this, but maybe you can put a function getSearchableContent() in the single page's controller, like you do for blocks. This would return the string to be searched. It would look something like this:
function getSearchableContent() {
    // Compose the search string from the queried content, e.g. fetch the
    // record for the requested ID and concatenate its text fields.
    $searchstring = ''; // build from your dynamic content here
    return $searchstring;
}
But I don't know if this works for dynamic content. If not, I'd look into C5's search index core classes and try to extend them for your project.

Friendly URLs when using a Record ID for dynamic content

I've read a bit on the matter of friendly URLs and I'm a little unsure as to what is better.
I currently have my website using a structure of http://www.domain.com/page.php?id=2
I am using the record ID to determine the content of the page. My record IDs are numeric and increment as new pages are added. The content of existing pages can change completely over time but still uses the same record ID (this is a CMS, so the client may do this).
The way I understand it, I have two options for friendly URLs:
http://www.domain.com/page/2
http://www.domain.com/some-text-describing-the-page
Now, because I identify the content by the record ID, I would assume the first option makes more sense.
My client seems to want option two.
After some reading I found two conflicting points.
Tim Berners-Lee (the architect of the WWW) states that you want a URI which has the potential to remain the same 2 months, 2 years, 200 years from now. So you DO NOT want to use a page title or something similar for your pages. If you change the page's content, you are either forced to change the content and leave the URI alone, or to change the URI and be stuck with dangling links. You can read his article here: http://www.w3.org/Provider/Style/URI
However, a number of other people on the internet (with no known authority, to me) clearly state that you need a descriptive yet short URI for the best SEO value. From what I read, this is mostly for the purpose of backlinks and having keywords in the anchor text, since people often use the link itself as the anchor text; having keywords in the link helps search engines know what the link is about without a custom title.
It seems to me the difference has to do with long term vs. short term.
Am I grasping this correctly?
If I am to use a slug-style URI as defined by the user, do I just have to let my user type whatever they want into a field and check against the current database to see if it exists? If so, am I supposed to anticipate static links by running a query for the known record ID and then using the result to generate the URL, which would just be rewritten back to the format http://www.domain.com/page.php?id=2?
It seems to me that would be a lot of extra overhead.
I would suggest something in between those two:
http://www.domain.com/page/2/some-text-describing-the-page
or without page:
http://www.domain.com/2/some-text-describing-the-page
You can still get the page ID from the URL, and there is a title as well! And, even more important, you're still able to serve the correct content even when the page title changes later.
So think about a situation like this: a user creates a page, it receives Id=3, and its title is My great title. From that information the URL is generated, e.g. http://www.domain.com/page/3/my-great-title. After 2 months the user changes the title to This title is better than the last one!. The URL changes as well, to http://www.domain.com/page/3/this-title-is-better-than-the-last-one. However, the 3 is still within the URL, so you're able to show the right content! You can also check whether the rest of the URL is current, and redirect (a 301 would be best) to the new one, to let search engines know that the URL changed.
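A small sketch of the slug generation that example assumes (plain PHP, not tied to any framework):
// "This title is better than the last one!" becomes
// "this-title-is-better-than-the-last-one".
function slugify($title) {
    $slug = strtolower($title);
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug); // runs of non-alphanumerics to single dashes
    return trim($slug, '-');                          // drop leading/trailing dashes
}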

What's the best way to use multiple languages on a website?

I was wondering what would be the best way to build a multi-language, template-based website. Say I want to offer my website in English and German; there are several different methods. My interest is mainly SEO, so which would be the best way for search engines?
The first way I often see is using different directories for each language, for example www.example.com for English and www.example.com/de/ for the German translation. The disadvantage of this is that when a file changes, it has to be changed in every directory manually. And wouldn't the two directories be considered duplicate content by search engines?
The second way I know is just using a GET value like www.example.com?lang=de and then setting a cookie. But this way search engines probably won't even find the different languages.
So is there another way, or which one is the best?
I worked on internationalised websites until this year. The advice we always had from SEO gurus was to discriminate language based on URL - so, www.example.com/en and www.example.com/de.
I think this is also better for users; if I bookmark a page in German, then when I come back to it, I get a page in German even if my cookies have expired. Similarly, I can do things like post the URL on Facebook, and have my German-speaking friends click on it and get the site in German.
Note that if your site serves multiple countries, you should handle those along with language - so, you might have example.com/de-DE, example.com/en-GB, example.com/en-IE, etc.
However, this should not involve duplication. Instead, you should set your application up to process the URL, extract the locale information, and then forward the request internally to a locale-independent page. So, a request for example.com/de-DE/info and a request for example.com/en-IE/info should both be passed to /info.jsp (or, I'm guessing, info.php in your case). That page should then be coded to emit text in the appropriate language, using a page-level localisation mechanism.
Things are a bit trickier if you want the URLs themselves to be localised (e.g. example.org/de-DE/anmelden vs example.org/en-IE/sign-in). However, the same principle applies: extract the locale, then forward to a common page. The difference is that there must be more sophistication in figuring out which page the URL refers to; you will need a mapping from the natural language in the URL to the page filename.
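A sketch of that routing in PHP (since the question mentions PHP); the page map, paths, and LOCALE constant are invented for illustration:
// Front controller: extract the locale from the first path segment,
// then forward internally to one locale-independent page.
$path = trim(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH), '/');
list($locale, $page) = array_pad(explode('/', $path, 2), 2, 'index');

// Map (possibly localised) URL segments to internal pages.
$pages = array(
    'info'     => 'info.php',
    'sign-in'  => 'signin.php',
    'anmelden' => 'signin.php', // localised URL, same internal page
    'index'    => 'index.php',
);

if (isset($pages[$page])) {
    define('LOCALE', $locale); // the included page reads this to pick its strings
    include $pages[$page];
} else {
    http_response_code(404);
}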

How can I make clean search URLs?

If I have a search that has a lot of different options, the URL becomes very long and looks very bad. Is there any way to make the URLs look better? Using POST for searches would keep the URLs clean, but then people couldn't share search URLs.
Try doing an advanced search with many options on Google: the URL is long and not especially human-readable. I really don't think that's a problem; I don't think many people read URLs often. If you expect people to share search results, then show a button on the search results page that will generate a tinyURL-style shortened URL for that particular query.
A POST is meant for something that changes server state (e.g. a database update) and really shouldn't be used for a search.
You can encode all of your search criteria into something like a hash and then have a single parameter in your query string that carries that value:
http://www.mysearch.com/?query=2esd32d2csg3fasfdlkjSDDFdskjsEWFsDFFR39fdf
I'm not sure exactly how you'd encode everything, but it wouldn't be too difficult.
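One hedged way to do it: rather than a true (one-way) hash, serialize the criteria to JSON and base64url-encode them into a single reversible parameter. A sketch, with the parameter name and criteria invented:
// Pack many search options into one opaque, shareable parameter.
function encodeQuery(array $criteria) {
    return rtrim(strtr(base64_encode(json_encode($criteria)), '+/', '-_'), '=');
}

function decodeQuery($token) {
    return json_decode(base64_decode(strtr($token, '-_', '+/')), true);
}

// e.g. http://www.mysearch.com/?query=eyJ0ZXJtIjoiamF6eiIsInllYXIiOjE5NTl9
$url = 'http://www.mysearch.com/?query='
     . encodeQuery(array('term' => 'jazz', 'year' => 1959));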
Do the different options actually need to be in the URL? For example, a quick search from my Firefox search window gives a URL like:
http://www.google.com/search?q=search&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a
If I'm sending the link to anyone, I habitually cut off everything after q=search. Why not have the URL be the bare minimum you need in order to send the link to someone (or bookmark it), and make the rest invisible POST variables?

How would I best make this SEO_able?

I have a search engine that searches albums.
For each music album, I have a page.
So, the workflow goes like this:
People search for music titles
The search engine displays a list of albums.
People click on an album to go to a details page.
I want Google to index my front page and the details pages. I want the details pages to be highly ranked. How can I build a sitemap for this?
By the way, I have about 5 million albums (but I want the top 1000 to be highly ranked on Google).
You would not use a sitemap for that many results. You would want each album to appear as a page with a unique URI that references it, so that search engines can crawl your site by following links, since search bots cannot submit form data. Each of those URIs should be simple, meaning limited to this part of the URI syntax:
scheme://authority_segment/path
Program your web application to remove and throw away any extraneous data, such as query strings or parameters. If you do this, you have to be sure you are watching for URI poisoning or SQL injection, even through means of character encoding.
How can I build a sitemap for this?
By pulling the addresses out of your database and creating an XML file with a high priority for some selected pages. Somehow I think that isn't your real question…
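A minimal sketch of that in PHP, assuming a PDO connection $pdo and an albums table with a popularity column to pick the top pages (all names invented):
// Write a sitemap for the top 1000 albums, giving the best 100 a higher
// priority. Table, column, and URL scheme are illustrative.
$xml  = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
$xml .= "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n";
$rows = $pdo->query('SELECT id, slug FROM albums ORDER BY popularity DESC LIMIT 1000');
$i = 0;
foreach ($rows as $row) {
    $xml .= "  <url>\n";
    $xml .= '    <loc>http://www.example.com/album/' . $row['id'] . '/' . $row['slug'] . "</loc>\n";
    $xml .= '    <priority>' . ($i++ < 100 ? '1.0' : '0.5') . "</priority>\n";
    $xml .= "  </url>\n";
}
$xml .= "</urlset>\n";
file_put_contents('sitemap.xml', $xml);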
If I wanted to automate building a sitemap for a site like this, I'd use Python. I'd pretty much write everything from the ground up (except the data store access). The format is quite simple.
I'm not sure I quite understand your question...
