SEO: Google fetch returns blank page (but rendered HTML seems correct) - .htaccess

I usually find answers to all my questions, but this time I could not find any. This is actually the first time I have posted on Stack Overflow!
Here is my problem.
The root of my website: "www.example.com" returns a blank (not empty) page when I use Google Webmaster Tools to fetch my website. When I look at the rendered HTML, it is exactly what it should be, but the preview of the page is just blank.
All the other pages of my website, like "www.example.com/sample_page.html", seem to give the proper preview though. I have even tried to set up a redirect (.htaccess) from the root domain "www.example.com" to "www.example.com/sample_page.html", but it also gives a blank preview.
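For reference, the redirect I tried was along these lines (just a minimal mod_rewrite sketch; my actual rule may differ):

# Sketch: send requests for the bare root URL to a specific page
# (assumes mod_rewrite is enabled and .htaccess overrides are allowed)
RewriteEngine On
RewriteRule ^$ /sample_page.html [R=302,L]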
I serve cached HTML files, so this has nothing to do with whether JavaScript is enabled or anything like that. Furthermore, as I said, the rendered HTML looks fine; it is just the preview that does not show anything.
Any hint is greatly appreciated, as I have been trying to find a solution for a few weeks now.

Related

TYPO3 v. 11.5.21 Search field is not working properly

As expected, more and more questions are coming up as I work with TYPO3. I'm very new to TYPO3 and am working with the bootstrap package, so I don't understand a lot of things yet.
The task is actually very simple (I guess). I just want to include a search field on my page using indexed search. I installed the extension via the extension manager.
So far so good: the search field is shown and it is able to find content on my page, but when I select one of the search results, this page is shown:
[Screenshot: error message]
Does anyone understand what happened here?
Many thanks in advance for your hints!
cheers,
expikx

How can I get a website report showing the links from each page

I want to get a report that specifies which links are present on each page of the website. I tried using different software tools, but the problem is that they just give me all the links without showing exactly which links are on which page. Also, the website I am trying to report on is very unstructured, so it is not possible to classify links based on the forward slashes in the URL. For example, looking at links starting with https://example.com/blog will not give me all the links inside the https://example.com/blog page, because links inside the https://example.com/blog page can be links that do not start with https://example.com/blog/.
What can I do about this?
Thanks.
In Google Analytics there is no such concept as the "next page"; it only knows the previous page, due to the disconnected nature of the web. You can, however, use the previous page to trace back and get the data you want: instead of looking for all links inside https://example.com/blog, you would look for all pages where the previous page is https://example.com/blog.
More detailed explanation
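As a rough illustration of what such a query could look like, the Core Reporting API exposes this as the ga:previousPagePath dimension. The sketch below assumes API v3, a valid OAuth access token, a placeholder view ID, and PHP with allow_url_fopen/openssl; adapt as needed:

<?php
// Sketch: list pages viewed immediately after /blog, using the
// Google Analytics Core Reporting API (v3). ACCESS_TOKEN and the
// ga:XXXXXXX view ID are placeholders you must supply yourself.
$params = array(
    'ids'          => 'ga:XXXXXXX',                  // your GA view (profile) ID
    'start-date'   => '2023-01-01',
    'end-date'     => '2023-01-31',
    'metrics'      => 'ga:pageviews',
    'dimensions'   => 'ga:previousPagePath,ga:pagePath',
    'filters'      => 'ga:previousPagePath==/blog',  // only hits coming from /blog
    'access_token' => 'ACCESS_TOKEN',
);
$url  = 'https://www.googleapis.com/analytics/v3/data/ga?' . http_build_query($params);
$data = json_decode(file_get_contents($url), true);

if (!empty($data['rows'])) {
    foreach ($data['rows'] as $row) {
        // $row = array(previousPagePath, pagePath, pageviews)
        echo $row[1] . ' (' . $row[2] . " pageviews)\n";
    }
}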

Search result: How to show only pages, not different content items?

We are using Liferay as a classic CMS meaning that we compose pages using web content articles. There is an issue with Liferay's internal search I could not yet find a proper answer for:
Because web content articles are pretty much only building blocks for pages we don't want the search to show them as distinct items. The user should only get a list of pages that contain their search keywords, including all the articles put onto this page.
At the moment we can see two different approaches and both come with certain problems we could not solve yet:
Idea 1
We modify the journal indexer and try to obtain the URLs of all the pages (how?) on which the article has been placed. Then we add them to the document to be indexed. In the search result we can then access the URLs and collect them. In the end we make sure every URL is only shown once.
Idea 2
At some point Liferay renders the entire page before sending it to the browser. If we somehow could put an indexer there, we could index the entire page. We then could limit the search to the special "page documents". Getting the fully rendered page would be the main issue here, because either we would have to run a crawler to frequently trigger this indexing or we would need to find a way to trigger page rendering from within an indexer or something like that.
I have been carrying this problem around for quite a while now and still could not find an idea good enough to spend time trying it out. If anyone of you has some input on those two ideas or maybe an entirely different approach, I would be extremely grateful.
I'll just answer myself, because by now we found a suitable solution to solve our problem:
In addition to the default search portlet there is also a "Web Content Search Portlet" shipped with Liferay. It seems to have been part of Liferay for quite a while now, but it's somewhat hard to find, because there is hardly any documentation for it (I only found the Liferay wiki page, which isn't really anything at all). It searches only within web content articles and shows links to the pages rather than just a link to an isolated view of the article. It has far fewer configuration options than the default search portlet, however. Pretty much all it allows you to change is whether articles actually have to be placed on at least one page to show up in the results.
So there is no need for any kind of custom indexer or any other "hack"...all we need to do is use the correct portlet. We will only need to write a hook that changes the appearance of the result page.
What you ask is interesting, but your ideas are headed in the wrong direction.
Idea 2 in particular is wrong, because you cannot do indexing work while a page is being rendered; think about the performance alone.
In Liferay, pages and assets are not directly linked: pages have portlets, and portlets display assets (web content and more).
Liferay indexing refers to and scans asset content, not the rendered output of the assets. Think about permissions: the same page can display different content depending on the user who is viewing it.
bye

Google search results in iframe alternative

I don't think it's possible from what I've read, but I wanted to see if anyone else was in a similar situation and found a more elegant solution to this problem.
Basically I have a site I am building, nothing fancy, which consists of a header section, and then one big iframe to display the content of the page in.
I know, I know, iframes are generally looked upon with displeasure, but for my needs, it works wonders.
My issue, is that in the header of the page, I have a simple google search box (basically just an html form), and have set the target as my iframe.
Obviously when searching for anything, the results should show up in the iframe; however, all I get is a message saying the content can't be displayed in an iframe. This makes sense, and I'm sure it is by Google's design not to allow this kind of practice.
For me, this would be the ideal situation, and I was wondering if anyone knew of a way to display the search results within my iframe?
I have also looked at possibly displaying a lightbox, or similar popup box, with an AJAX request to display the Google page, but have thus far been unsuccessful.
You won't be able to use any kind of frame any more, as Google has obviously put an end to that by blocking frames altogether. Your only solution is to use the Custom Search API and then parse and display the results yourself.
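If it helps, here is a minimal sketch of calling the Custom Search JSON API and rendering the results yourself (YOUR_API_KEY and YOUR_CX are placeholders for your own key and search engine ID; assumes PHP with allow_url_fopen):

<?php
// Sketch: fetch results from the Google Custom Search JSON API and print them
// as plain links, instead of framing google.com.
$query = 'example search';
$url = 'https://www.googleapis.com/customsearch/v1?' . http_build_query(array(
    'key' => 'YOUR_API_KEY',
    'cx'  => 'YOUR_CX',
    'q'   => $query,
));
$results = json_decode(file_get_contents($url), true);

if (!empty($results['items'])) {
    foreach ($results['items'] as $item) {
        // Each item carries a title, a link and a text snippet.
        echo '<a href="' . htmlspecialchars($item['link']) . '">'
           . htmlspecialchars($item['title']) . '</a><br>'
           . htmlspecialchars($item['snippet']) . '<br><br>';
    }
}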

MediaWiki / Excel: Hyperlink from Excel to a non-existent wiki page gives a 404 - how can I fix or work around this?

I suspect this could be something faulty with Excel (although I keep an open mind), but I wondered if anyone knew how I could get around this apparent bug:
I wish to create Excel spreadsheets which link to pages in a local wiki (running MW 1.14.0, full details below) where those pages don't yet all exist.
The idea is that over time we will fill in details of the pages, but we would like to create the links now (because copies of the Excel files will get sent out to various internal users and it will not be feasible to go track them down and add links later once the pages are created)
The problem is that when I create such a hyperlink in Excel and then go to follow the hyperlink, I get a message back indicating that the page does not exist. The full text of the message is:
"Unable to open http://. The Internet site reports that the item you requested could not be found. (HTTP/1.0 404)"
This happens on our site, or in fact if you link to a non-existent page on Wikipedia (e.g. http://en.wikipedia/wiki/Swed53rf), whereas if you put such a link into a browser you get the correct response (which is to be taken to a page indicating that there is no such page, but that you can create it by following the usual link).
Is there some setting on Apache that I might need to configure / override to make sure it returns a valid server response to Excel?
Creating links to existing pages works fine. I appreciate that in theory we could go around creating all the pages that are required, but some of the people involved in the project (creating the initial Excel files) do not / cannot use our wiki and it would be better if this just worked as it would appear it should rather than having to try to add steps to work around it in this way.
I also wondered whether it has anything to do with the short URL rewriting. Our wiki, like Wikipedia, has short URLs, e.g.:
http://server/w/index.php?title=User:Joe_Blogs/Sandbox
can be reached from
http://server/wiki/User:Joe_Blogs/Sandbox
but including hyperlinks to the full name versions of the pages does not resolve the issue.
The version of Excel being used is Excel 2003 (SP3)
I have discovered that this also happens with Word 2003 (I imagine they are using the same code). However, the desired behaviour occurs with Lotus Notes (a miracle, as it's rubbish in so many other ways!).
I have not done any significant development on Apache, but I could consider some form of custom page that redirects to the non-existent wiki page if MediaWiki changes were deemed too complex/tricky (although I'm not particularly sure where I'd start with this idea; I'm guessing some sort of URL parameter to accept the destination page name might be a possible approach).
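For the sake of illustration, something like this hypothetical goto.php is what I have in mind (an untested sketch; the page name is passed as a URL parameter, and "server" stands in for our hostname as in the URLs above):

<?php
// Hypothetical helper page: accepts ?title=Some_Page and always answers with a
// 302 redirect to the wiki, so Excel never sees MediaWiki's 404 for missing pages.
$title = isset($_GET['title']) ? $_GET['title'] : 'Main_Page';
header('HTTP/1.1 302 Found');
header('Location: http://server/w/index.php?title=' . rawurlencode($title));
exit;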
Any helpful suggestions gratefully received!!
[FYI: I have posted a question on MWUsers forum (www.mwusers.com) too after Googling this to no avail! I'll update the forum response there if I get an answer here or vice versa]
Many thanks,
Neil
Running on Ubuntu Server 8.10
Product Version:
MediaWiki 1.14.0
PHP 5.2.4-2ubuntu5.6 (apache2handler)
MySQL 5.0.51a-3ubuntu5.4
Installed extensions:
CategoryTree (Version r44056)
Renameuser
ImageMap (Version r35980)
ParserFunctions (Version 1.1.1)
StringFunctions (Version 2.0.2)
Not sure how to get Excel to let you go to a page which turns out to be a 404, but as a temporary workaround, you can hack out MediaWiki's 404 reporting on missing pages...
In MediaWiki 1.14 or 1.15 releases this will be in Article::view() in includes/Article.php:
if( $return404 ) {
    // Removing or commenting out this header() call makes missing pages come back
    // as a normal 200 response instead of the 404 status that trips up Excel.
    $wgRequest->response()->header( "HTTP/1.x 404 Not Found" );
}
Note that the latest dev code is a little different, but you can find it where it sends the same header in the same file. :)
Wikipedia returns a 404 with a redirect which gets you to the page you want; my guess would be that Excel's rendering engine is not following the redirect.
You could try capturing the conversation in Wireshark, both with a browser and with Excel. That might show you what's happening differently.
Surely once you roll out the new pages, the links would start working, though?
