There is some precedent for search-engine-ranking-related questions on StackOverflow, so please don't close this question. It's programming-related to the extent that HTML META tags can be called "programming".
Here's the problem:
We make FogBugz, the software project planning and bug tracking suite.
Either we did a great job with our old documentation or a crummy job with our new documentation, but for most of the popular searches on FogBugz terms, documentation for our old versions comes up.
Here's an example. For context, our current FogBugz version is FogBugz 7. The top two results for that search are for FogBugz 5, which is positively ancient.
As best I can tell, there are several options for getting these results out of the top slots, but each has problems:
A NOINDEX tag, but what happens if someone is actually searching for help on an old version?
Finding the incoming links to the old documentation and placing a NOFOLLOW on them to deprive the old docs of PageRank. Problem here is that it's really fiddly to find the links to the content, rather than changing the content itself.
The unavailable_after tag, which is just a time-delayed NOINDEX, with the same problem of removal rather than demotion.
I just want these old documentation versions to stop competing with our current versions, without being completely unavailable.
An approach I used in the past (3 years ago)
Change the URL to your old documentation, and change your own links to point to the new url. e.g. abc.com/docs/fogzbugz/v5/xyz becomes abc.com/docs/fogzbugz/ancient/v5/xyz
Using the old URLs, implement a 301 redirection to your new v7 content. e.g. a request to abc.com/docs/fogzbugz/v5/GettingStarted.html is redirected to abc.com/docs/fogzbugz/v7/GettingStarted.html
In this way, existing links from external sites will take browsers to the latest documentation, and inform indexing robots that the page has moved.
Google will find the new links to your old documentation by indexing your site, but there will be no external links, thus reducing page rank.
Google will also find the new links to your new documentation, and as more sites link to it, its page rank will increase and so take priority.
This worked for me on a small scale (100 or so pages) site, and visitor attempts to view the old content rapidly dropped off.
If a user does land on a v5 page, how about the MSDN approach of explicitly stating the version that the page describes, and providing links to the equivalent topic in the v6 and v7 docs?
I would suggest that external links to older versions get redirected to the latest version - with some sort of note that if you really needed version 5 the link is here.
I think a lot of the problem deals with the fact that search engines give something a high rank if a lot of people are linking to a specific page. Unless you can get all the people linking to your old documentation, to link to your new documentation, then you are going to have a problem with the older documents being rated artificially high. In order to overcome this, you might need to change the way you handle documentation pages. One good way would be to always show the newest information on a particular topic, and then only by clicking on a link on the page, do you get to the older versions. Optimally, this would be the same page, with a different parameter, to state which version you want to get documentation for.
What about trying the MSDN approach? You assign a version tag to your pages. When this page is displayed, its version number is displayed as well. Users will be able to see immediately that this information is deprecated.
You may need to write some stubs for new version pages like "This problem has been resolved in the current version" so that the users don't have to think you didn't do anything in 5 years. Some writing work, some interlinking but it's doable for a limited number of problematic pages.
Related
I would like your support in order to get some help in customizing the search component in Liferay DXP 7.0 Enterprise.
I have reviewed all the available documentation but although I have found many articles about the issue, the step by step is not so clear for me.
I need to customize the native search component:
Change the input component to give suggestions while the user is typing the search terms
Change the search result page look and feel.
Anyone ever implemented anything like this?
I know this is an old thread, but StackOverflow keeps showing it as the first open question when I am navigation this particular tag...So here are some pointers, as this is a pretty broad topic...
The search is really confusing for adding customization. Mainly you have to provide some of them as contributors, using the asset framework. following the regular steps to build an asset for the asset publisher you will hit the best place to find documentation about the search contributors.
About the search page, best is to create a new page, besides the default one for extra freedom. As long you use the friendly URL /search it will a basic replacement. In this page you can add everything you need, except for translations for the friendly URL - not great. Another option is to keep the default page (which will not be visible in the build area 7.1.x, but you can edit after you search something and fall inside it).
I don't understand the security patch from last week: https://typo3.org/teams/security/security-bulletins/typo3-core/typo3-core-sa-2016-022/ . I have an old TYPO3 6.2 installation. I have truncated all cf_* tables and opened the pages with UID 2-6. No cHash. As a result I see 13 cf_cache_hash-entries.
Now I have opened a detail page from a listing page in frontend. I see some parameters in URL like action, controller, the UID of the current displayed record and of cause a cHash.
Then I have copied these parameters (excluding id=x) to the URL of my pages 2-6. In cf_cache_hash I have still 13 records. So, there is no cache flooding.
Or how I have to interprete this quote:
Links with a valid cHash argument lead to newly generated page cache
entries. Because the cHash is not bound to a specific page, attackers
could use valid cHash arguments for multiple pages, leading to
additional useless page cache entries.
Next problem:
If extensions like realurl are used, it is required to flush their
caches (and TYPO3 caches as well)
Can you please tell me WHICH tables I/we should clear?
tx_realurl_urldecodecache
tx_realurl_urlencodecache
are maybe OK. But what about tx_realurl_pathcache? Of cause, I can clear that, but what about older entries for earlier realurl configuration? If I truncate that table, these old entries are not valid anymore and they were not builded again. So, old Search Engine Results are invalid.
Question from one of our customers: Is it enough to clear system cache in backend or should he click on Clear all Cache in Installtool? Nice. IMO, it is not enough and the tables have to be truncated on DB directly. Right.
Next one:
This means if such URLs are indexed by a search engine, visitors from
this search engine will end up on a not properly working page.
Hey cool. And now? What is the solution? Keep it as it is? IMO it depends on an InstallTool setting called: pageNotFoundOnCHashError. Right?
Please tell us what to do and please add some more details how to handle that.
Stefan
For me it boils down to (after installing the updated TYPO3 version):
If you don't use realurl: enable
$GLOBALS['TYPO3_CONF_VARS']['FE']['cHashIncludePageId'] = true;
& and you are probably "done". Of course all old google hits will be done for, but on a "public" site it's quite probable you never cared about google anyway if you didn't run realurl (or similar)
If you use realurl 1.X on a 6.2
Don't enable the config (there'll probably never be a proper patch)
Two options:
take the risk of a DDOS
use the 1.x version from https://github.com/mogic-le/typo3-realurl
If I understand it correctly it will set TYPO3 to no_cache mode if there is no hit on the caching table; While that is a performance issue, it will prevent cache table entries being made (as a side effect)
If you run 7.6+ and realurl 2
Wait for realurl 2.1 (and take the risk?)
Change the caching
framework to something like memcached (it's somewhat suggested
between the lines: If you have a caching backend that cannot be used
for a DDOS, you don't really have to care)
Use the fork from
helhum (though I think that won't help you one bit regarding old
links)
Realurl >= 2.1.0 supports this core option. But you are recommended to update to at least 2.1.4 because that fixes various other cHash issues.
Sorry, this is a bad question. I don't even know what the title should be. I'm a total noob at making websites so this might be easy to find but I just don't know the terminology to search for. I cannot find anything about how to do this...
What I want to do is have something like references/variables that I can use in a block of text and it will automatically get replaced with whatever value should be there. Best way I can think of to describe it would be if I was using the site as a design doc for a game or something, I would be able to type in [Title] or something similar on any page and when it loads that text would be replaced with whatever my Title is. That way If I ever change titles, names, classes, races, places, items, etc... they would only have to be changed in 1 place and the change would be reflected everywhere.
I notice if I add a link to a page it will automatically use the Title of that page as the text of the link. That is almost exactly what I want. Except when I change the Title of the other page the text of the link remains as the original text. It doesn't get updated to the new Title and that is not at all what I want.
Also, I want to do this in Google Sites and as simply as possible. I don't really want to use a database. I was hoping Google Sites would have some kind of funcionality for this.
I don't believe this is possible (on Google Sites) and likely you need to consider a hosted solution.
Quoting the answer from this relevant post:
You should consider hosting your solution using Google's App Engine
instead of Google Sites. You can set it up so it uses PHP (see link
below), you can configure it to use your domain name and you get
enough CPU, disk and bandwidth allowance to serve around five million
page views for free each month, if you are serving more than that,
their prices are extremely competitive.
Google App Engine:
http://code.google.com/appengine/docs/whatisgoogleappengine.html How
to setup PHP using Google App Engine: http://blog.caucho.com/?p=187
Also I'm not sure how your PHP skills are but if you're unfamiliar with it then this should help to get you started.
Is there any sort of documentation on exactly what parameters you can put in the url of Google viewer?
Originally, I thought it was just url,embedded,chrome, but I've recently come accross other funny ones like a,pagenumber, and a few others for authentication etc.
Any clues?
One I know is "chrome"
If you've got https://docs.google.com/viewer?........;chrome=true
then you see a fairly heavy UI version of that doc, however with "chrome=false" you get a compact version.
But indeed, I'd like a complete list myself!
I know this question is very old and perhaps you already solved your issue, but for anyone on the internet who might be looking for an answer...
I have been looking for this recently, following a guide I found on GitHub Gist
https://gist.github.com/tzmartin/1cf85dc3d975f94cfddc04bc0dd399be
More specifically, the option to embed a certain page of pdf using
<iframe src="https://docs.google.com/viewer?srcid=[put your file id here]&pid=explorer&efh=false&a=v&chrome=false&embedded=true" width="580px" height="480px"></iframe>
The best I could fing was this article (I suppose from a long time now)
https://weekly-geekly.github.io/articles/111647/index.html
HOWEVER, I tried modifying the attributes and the result was simply a redirect to
https://drive.google.com/file/d/[ID]/edit
https://drive.google.com/file/d/[ID]/preview or
https://drive.google.com/file/d/[ID]/view
AS OF MAY 2020, THIS SOLUTION PROBABLY DOESN'T WORK
I'm also on a quest to discover some of the parameters of the viewer.
the "chrome" parameter doesn't seem to do anything, though. Is this
supposed to be the same as embedded=true?
Parameters I know of:
url= (obviously)
embedded= (obviously)
hl= set language of UI (tooltips)
#:0.page.1 = jump to page 2 (page 1 is numbered 0) - this is unreliable and often requires a refresh after the first load,
defeating the purpose.
That said, when I use the Google Docs viewer on my site, "fit page to
screen" is the default view without any parameters. So maybe I'm
misunderstanding your question.
Source: For convenience, this is a full quote of the sole answer (it is from user k3david) to the crosspost of this question #Doc has posted to the Google support forum in 2011.
You can pass q=whatever to pass a search query to the viewer.
I have a list of URLs and am trying to collect their "descriptions." By description I mean what comes up, for example, if you Googled the link. For example, http://stackoverflow.com">Google: http://stackoverflow.com shows the description as
A language-independent collaboratively
edited question and answer site for
programmers. Questions and answers
displayed by user votes and tags.
This the data I'm trying to accumulate for the URLs I have.
I tried parsing the URL's meta-descriptions, however most of them are lacking a meta-description (yet Google and other search engines manage to get a description somehow).
Any ideas? Should I just "google" each link and scrape the data? I have a feeling Google wouldn't like this...
Thanks guys.
Different search engines have different algorithms to get the description out of the page if/when they are lacking the description meta tag. Some ignore the tag even it it's there.
If you want the description Google has, the most accurate way to get it would be to scrape it. Otherwise, you could write your own or look around on the web for code that does it.
These are called snippets.
Google use proprietary (and possibly patented) methods to garner this information, so there is no simple answer.
As you suggest, they will use meta-description information if it is there. (How to set the meta-information to help Google.)
They will also honour requests from the page authors to NOT include snippets. (How to prevent Google from displaying snippets) You should probably respect this too (as well as robots.txt, of course.)
You may have some luck with existing auto-summary packages, such as OTS.
You may want to check AboutUs.org (i.e. http://www.aboutus.org/StackOverflow.com).
But, there's little chance that the site will have an aboutus page and not have a meta description.
Some info that might explain how google does this:
Webmasters/Site owners Help
Adding a URL to google
I am not familiar with Google APIs, but perhaps there is an official way to get such information.
Interesting. some sources are better than others.
For "audiotuts.com" google has a worse description than AboutUs.com.
Google
Nov 18th in General by Joel Falconer ·
1. Recently, an AUDIOTUTS reader asked me about creative process. While this
is a topic that can’t be made into a
...
AboutUs.com:
AUDIOTUTS is a blog/tutorial site for
musicians, producers and audio
junkies! It is the sister site of the
popular PSDTUTS, VECTORTUTS and
NETTUTS.
I hate problems like these... they should be trivial but they aren't!
If you can assume English content, you can first look for Meta Description, and if that doesn't work, you can look for the first two or three sentence-like word sequences.
A product I worked on looked for the first P or DIV that contained more than one sequence of > n "words" delimited by periods. It would use the two or three sentence-like sequences, up to x total words, as a summary paragraph. It wasn't 100% accurate, but good enough for the average case. The number of words was adjusted a few times to eliminate things like navigation elements.