Friendly URLs when using a Record ID for dynamic content

I've read a bit on the matter of friendly URLs and I'm a little unsure as to which is better.
I currently have my website using a structure of http://www.domain.com/page.php?id=2
I am using the record ID to determine the content of the page. My record IDs are numeric and increment as new pages are added. The content of an existing page can change completely over time, but it still uses the same record ID (this is a CMS, so the client may do this).
The way I understand it, I have two options for friendly URLs:
http://www.domain.com/page/2
http://www.domain.com/some-text-describing-the-page
Now, because I identify the content by the record ID, I would assume the first option makes more sense.
My client seems to want option two.
After some reading I found two conflicting points.
Tim Berners-Lee (the architect of the WWW) states that you want a URI which has the potential to remain the same 2 months, 2 years, 200 years from now. So you DO NOT want to use a page title or something similar for your URIs. If your page's content changes, you are either forced to leave the now-misleading URI alone, or to change the URI and be stuck with dangling links. You can read his article here: http://www.w3.org/Provider/Style/URI
However, a number of other people on the internet (with no authority known to me) clearly state that you need a descriptive yet short URI for the best SEO value. From what I read, this is mostly for the purpose of backlinks and having keywords in the anchor text, since people often use the link itself as the anchor text. So having keywords in the link itself helps search engines know what the link is about without a custom title.
It seems to me the difference has to do with long term vs. short term.
Am I grasping this correctly?
If I am to use a slug-style URI as defined by the user, do I have to just allow my user to type whatever they want into a field and check it against the current database to see if it exists? If so, am I supposed to anticipate static links by running a query for the known record ID and then using the result to generate the URL, which would just be rewritten back to the format http://www.domain.com/page.php?id=2?
It seems to me that would be a lot of extra overhead.

I would suggest something in between those two:
http://www.domain.com/page/2/some-text-describing-the-page
or without page:
http://www.domain.com/2/some-text-describing-the-page
You can still get the page ID from the URL, and there is a title as well! And, even more important, you're still able to serve the correct content even when the page title changes later.
So think about a situation like this: a user creates a page, it receives Id=4, and its title is My great title. From that information the URL is generated, e.g. http://www.domain.com/page/4/my-great-title. After 2 months the user changes the title to This title is better than the last one!. The URL changes as well, to http://www.domain.com/page/4/this-title-is-better-than-the-last-one. However, there is still 4 within the URL, so you're able to show the right content! You can also check whether the rest of the URL is current, and redirect (a 301 would be the best) to the new one to let search engines know that the URL changed.
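A minimal sketch of that lookup-and-redirect logic in PHP; the pages table, its columns, and the connection details are assumptions for illustration:

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass'); // placeholder credentials

// Expected URL shape: /page/{id}/{slug}
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
if (!preg_match('#^/page/(\d+)(?:/([a-z0-9-]*))?$#', $path, $m)) {
    http_response_code(404);
    exit;
}
$id   = (int) $m[1];
$slug = $m[2] ?? '';

// Look the page up by ID only; the slug is purely decorative.
$stmt = $pdo->prepare('SELECT title, body, slug FROM pages WHERE id = ?');
$stmt->execute([$id]);
$page = $stmt->fetch(PDO::FETCH_ASSOC);
if (!$page) {
    http_response_code(404);
    exit;
}

// If the slug in the URL is stale, 301 to the canonical URL.
if ($slug !== $page['slug']) {
    header("Location: /page/{$id}/{$page['slug']}", true, 301);
    exit;
}

echo htmlspecialchars($page['title']); // ...render the rest of the page
```

Because the lookup keys on the ID alone, old links with stale slugs never break; they just get redirected to the current canonical URL.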


How to force a Live Url (AbsoluteUrl) to update on a Kentico page type

I'm looking for information about how the Live Url (Absolute Url on the back end) regenerates and what triggers it to update.
Using Kentico 12 SP MVC, I have a pretty normal NewsArticle page type that uses a custom URL pattern of "/news/{% UrlSlug %}" to route to an article. It was previously using AliasPath, but because the content editors wanted the ability to create slugs longer than the 50-character limit, we created a custom field for it.
On any page that I create from scratch, and on many newer pages that I've edited, this works out just fine, and changing the UrlSlug to the desired (very long) slug updates the URL. On a huge number of older articles, though, it appears that changing the UrlSlug has no effect on the Live Url. On many, the URL has changed to just "/news/", and on others it's still showing as the old URL (based on NodeAlias). I can still route to the page by hand-typing the UrlSlug-based URL, but I've been using TreeNode.AbsoluteUrl, which is based on the Live Url (afaik), to generate menus and sitemap items, and those are all still refusing to update on a large portion of our articles.
Hopefully someone knows how to force them to all regenerate or at least has a clue why some would be working and others not.
The "Live URL" displayed on the Page "General" tab is sourced from CMS.DocumentEngine.DocumentURLProvider.GetAbsoluteLiveSiteURL(TreeNode node);
Eventually that calls out to DocumentURLProvider.GetUrlInternal(TreeNode node);
You can override this by registering a custom DocumentURLProvider.
That would let you call base.GetUrlInternal(node) and see what it is returning.
One conditional that is checked in the original DocumentURLProvider is NodeIsContentOnly, which is in the CMS_Tree table.
So I would check and make sure that all the pages with issues have this set to true (1 in the db column), otherwise the traditional Portal Engine Live URL generation takes effect.
At no point is there any 'regeneration' of Live URLs. What is displayed is coming from the values of the Page Type configuration (URL Pattern), the Node in the db, and the value populating the Macro Expression in your URL Pattern.

Should a friendly URL have the ID at the beginning or at the end?

I am planning to have friendly URLs for my site, where each page has its own ID. I found two common ways people do this:
http://www.example.com/page/123/slug-for-the-page
http://www.example.com/page/slug-for-the-page/123
Which one is preferred? What are the arguments for and against each? I see that when typing the URL the second form is easier, because it is easier for the browser to autocomplete from history or bookmarks. But to me it is really strange to have the ID at the end.
I will use redirects to the canonical URL for any slug which does not match the current one. But do people also do redirects from a URL having the slug but no ID, when the slug is unique? For example, redirecting http://www.example.com/page/slug-for-the-page to http://www.example.com/page/slug-for-the-page/123.
First, some history:
The original reason for these URLs was that people wanted their database to be indexed by ID only. Indexing by slug (and enforcing uniqueness) was extra work for both the computer and the programmer. Rails made this style popular (/page/1).
But this style is fairly opaque to search engines. So people added the slug after the ID (/page/1-slug-for-page) to help the search engine (and the user). But the text at the end was irrelevant -- only the number mattered. The slug might not even be stored in the database (sometimes it's auto-generated), and it wasn't indexed, to save space.
Which one is preferred?
For putting the ID in the URL, either style is OK, but the "number then slug" style is much more popular.
I will use redirects to the canonical URL for any slug which does not match the current one. But do people also do redirects from a URL having the slug but no ID, when the slug is unique?
Just pick ID or slug and run with it.
1) If you look up by ID, you should ignore the slug (slugs are only for search engines). You can easily redirect to the 'real' slug if you have the ID (to correct any truncated URLs, etc.).
2) If you look up by slug, your slugs need to be unique. (Most sites scope by date to make this easier.) Once you're doing a database lookup on the slug, it's not clear why you would even need the ID in the URL. (IDs look ugly and are quite arbitrary, so it's better to drop them.) What you are proposing is extra work (for you and the computer), and I'm not sure it's useful.
Only high-traffic sites need to worry about the index size/speed trade-offs of indexing on slugs vs. IDs. Even then, the difference can be minimized with tricks (e.g. don't look up the slug; look up the first 8 characters of the MD5 of the slug instead).
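A sketch of that hash trick in PHP, assuming a pages table with a short, indexed slug_hash column (all names and the connection are illustrative):

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass'); // placeholder credentials

// When saving a page, store substr(md5($slug), 0, 8) in an indexed
// CHAR(8) column. Lookups then hit the small hash index instead of
// a full index on the (long) slug column.
$slug = $_GET['slug'] ?? '';
$hash = substr(md5($slug), 0, 8);

$stmt = $pdo->prepare(
    'SELECT id, title FROM pages WHERE slug_hash = ? AND slug = ?'
);
// Comparing the full slug as well weeds out the rare hash collision.
$stmt->execute([$hash, $slug]);
$page = $stmt->fetch(PDO::FETCH_ASSOC);
```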
One advantage of having the ID at the beginning:
In some contexts long URLs are cut off (e.g. by line breaks), so that only the first part of the URL is clickable. Now, if the ID is at the end (and therefore in the second part), clicking on the URL will lead to a 404. If the ID is at the beginning (and therefore in the first part), the link will usually still work.
An example: plain-text mails from Stack Exchange. The question URLs in those mails contain a line break, but the clickable part of the URL still contains the ID, so the links still work.

Can you have "variables" in text in Google Sites?

Sorry, this is a bad question; I don't even know what the title should be. I'm a total noob at making websites, so this might be easy to find, but I just don't know the terminology to search for. I cannot find anything about how to do this...
What I want to do is have something like references/variables that I can use in a block of text, which automatically get replaced with whatever value should be there. The best way I can think of to describe it: if I was using the site as a design doc for a game or something, I would be able to type [Title] or something similar on any page, and when it loads that text would be replaced with whatever my Title is. That way, if I ever change titles, names, classes, races, places, items, etc., they would only have to be changed in one place and the change would be reflected everywhere.
I notice that if I add a link to a page, it will automatically use the Title of that page as the text of the link. That is almost exactly what I want, except that when I change the Title of the other page, the text of the link remains as the original text. It doesn't get updated to the new Title, and that is not at all what I want.
Also, I want to do this in Google Sites and as simply as possible. I don't really want to use a database. I was hoping Google Sites would have some kind of functionality for this.
I don't believe this is possible (on Google Sites), and you likely need to consider a hosted solution.
Quoting the answer from this relevant post:
You should consider hosting your solution using Google's App Engine instead of Google Sites. You can set it up so it uses PHP (see link below), you can configure it to use your domain name, and you get enough CPU, disk and bandwidth allowance to serve around five million page views for free each month; if you are serving more than that, their prices are extremely competitive.
Google App Engine: http://code.google.com/appengine/docs/whatisgoogleappengine.html
How to set up PHP using Google App Engine: http://blog.caucho.com/?p=187
Also, I'm not sure how your PHP skills are, but if you're unfamiliar with PHP then this should help to get you started.
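For what it's worth, once you are on a hosted PHP setup, the kind of substitution you describe is only a few lines. A minimal sketch (the variable names and tokens are illustrative):

```php
<?php
// Central place for terms; change a value once and it is
// reflected everywhere the token appears.
$vars = [
    'Title' => 'My Great Game',
    'Hero'  => 'Aria',
];

// Replace [Title], [Hero], ... tokens in a block of text.
function fill_vars(string $text, array $vars): string {
    return preg_replace_callback('/\[(\w+)\]/', function ($m) use ($vars) {
        return $vars[$m[1]] ?? $m[0]; // leave unknown tokens untouched
    }, $text);
}

echo fill_vars('Welcome to [Title], home of [Hero]!', $vars);
// Output: Welcome to My Great Game, home of Aria!
```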

What's the best way to use multiple languages on a website?

I was wondering what would be the best way to achieve a multi-language, template-based website. Say I want to offer my website in English and German; there are several different methods. My interest is mainly in SEO, so which would be the best way for search engines?
The first way that I often see is using different directories for each language, for example www.example.com for English and www.example.com/de/ for the German translation. The disadvantage of this is that when changing a file, it has to be changed in every directory manually. And wouldn't search engines consider the two directories duplicate content?
The second way I know is just using some GET value like www.example.com?lang=de and then setting a cookie. But this way search engines probably won't even find the different languages.
So is there another way, or which one is the best?
I worked on internationalised websites until this year. The advice we always had from SEO gurus was to discriminate language based on URL - so, www.example.com/en and www.example.com/de.
I think this is also better for users: if I bookmark a page in German, then when I come back to it, I get a page in German even if my cookies have expired. Similarly, I can do things like post the URL on Facebook, and have my German-speaking friends click on it and get the site in German.
Note that if your site serves multiple countries, you should handle those along with language - so, you might have example.com/de-DE, example.com/en-GB, example.com/en-IE, etc.
However, this should not involve duplication. Instead, you should set your application up to process the URL, extract the locale information, and then forward the request internally to a locale-independent page. So, a request for example.com/de-DE/info and a request for example.com/en-IE/info should both be passed to /info.jsp (or, I'm guessing, info.php in your case). That page should then be coded to emit text in the appropriate language, using a page-level localisation mechanism.
Things are a bit trickier if you want the URLs themselves to be localised (e.g. example.org/de-DE/anmelden vs. example.org/en-IE/sign-in). However, the same principle applies: extract the locale, then forward to a common page. The difference is that figuring out which page the URL refers to takes more sophistication; you will need a mapping from the natural-language segment in the URL to the page filename.
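A rough front-controller sketch of that idea in PHP (the lang/ files, pages/ layout, and route map are all assumptions for illustration):

```php
<?php
// Route /de-DE/info and /en-IE/info to the same locale-independent page.
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

$locale = 'en';
$page   = 'index';
if (preg_match('#^/([a-z]{2}(?:-[A-Z]{2})?)/(.*)$#', $path, $m)) {
    $locale = $m[1];
    $page   = $m[2] !== '' ? $m[2] : 'index';
}

// Map URL segments (possibly localised) to actual page scripts.
$routes = [
    'index'    => 'index.php',
    'info'     => 'info.php',
    'sign-in'  => 'signin.php',
    'anmelden' => 'signin.php', // localised alias for the same page
];
if (!isset($routes[$page])) {
    http_response_code(404);
    exit;
}

// Load the translation table, then render the shared page.
$strings = parse_ini_file(__DIR__ . "/lang/{$locale}.ini"); // e.g. lang/de-DE.ini
include __DIR__ . '/pages/' . $routes[$page];
```

The whitelist in $routes doubles as the URL-to-filename mapping and keeps user input out of the include path.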

How would I best make this SEO-able?

I have a search engine that searches albums.
For each music album, I have a page.
So, the workflow goes like this:
People search for music titles
The search engine displays a list of albums.
People click on an album to go to a details page.
I want google to index my front page and the details page. I want the details page to be highly ranked. How can I build a sitemap for this?
By the way, I have about 5 million albums (but I want the top 1000 to be highly ranked on Google).
You would not use a sitemap for that many results. You would want each album to appear as a page with a unique URI referencing that page. That way a search engine can crawl your site by following links, since search bots cannot submit form data. Each of those URIs should be simple, meaning limited to this part of the URI syntax:
scheme://authority_segment/path
Program your web application to remove and throw away any extraneous data, such as query strings or parameters. If you do this, you have to be sure that you are watching for URI poisoning or SQL injection, even through character encoding.
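A minimal sketch of that stripping step in PHP (treating any query string as extraneous; whether that holds for your application is an assumption):

```php
<?php
// Canonicalise: 301 any request carrying a query string to the bare path,
// e.g. /album/123/abbey-road?utm_source=x  ->  /album/123/abbey-road
if (!empty($_SERVER['QUERY_STRING'])) {
    $path = strtok($_SERVER['REQUEST_URI'], '?'); // path portion only
    header('Location: ' . $path, true, 301);
    exit;
}
```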
How can I build a sitemap for this?
By pulling the addresses out of your database and creating an XML file with a high priority for some selected pages. Somehow I think that isn’t your real question …
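If you do want that file, a sketch of generating it in PHP (the albums table, its popularity column, the URL shape, and the connection are all assumptions):

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass'); // placeholder credentials

// Emit <urlset> entries for the top 1000 albums, flagged with a high priority.
header('Content-Type: application/xml; charset=utf-8');

$stmt = $pdo->query(
    'SELECT id, slug FROM albums ORDER BY popularity DESC LIMIT 1000'
);

echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
foreach ($stmt as $row) {
    $loc = "https://www.example.com/album/{$row['id']}/{$row['slug']}";
    echo '  <url><loc>' . htmlspecialchars($loc) . '</loc>'
       . '<priority>0.8</priority></url>' . "\n";
}
echo '</urlset>' . "\n";
```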
If I wanted to automate building a site map for a site like this, I'd employ Python. I'd pretty much write everything from the ground up (except the data store access). The format is quite simple.
I'm not sure I quite understand your question...
