Recently, all of our websites' URLs have appeared as https:// in Google's search results. However, https:// isn't even valid for our websites; it should be http:// for all of them. We're not sure what caused Google to decide https:// is correct. If you click any of these links, none of them work correctly.
So if you search for our website in Google, the results come back like this:
https://ourdomain.com
https://ourdomain.com/site2
When in reality they should be:
http://ourdomain.com
http://ourdomain.com/site2
Is this a certificate error, or is something in IIS causing the issue? We do use a certificate on one specific website, but on none of the others.
I have:
a simple static website;
hosted on a shared server;
with SSL;
which I have recently redesigned.
Google tells me there were two URL crawl errors for my website:
apple-app-site-association;
.well-known/apple-app-site-association
For reference, here is the error report for the first (the second is the same):
Not found
URL:
https://mywebsite.com/apple-app-site-association
Error details
Last crawled: 5/5/16
First detected: 5/5/16
Googlebot couldn't crawl this URL because it points to a non-existent page. Generally, 404s don't harm your site's performance in search, but you can use them to help improve the user experience. Learn more
From looking around here, these appear to be related to associating an Apple app with a related website.
I have never tried to implement any sort of "apple app / site association" - at least not intentionally.
I can't for the life of me figure out where these links are coming from.
I will be removing these URLs but am concerned the error may arise again.
I have looked at several related questions here, but they seem to be for errors from people trying to make that verification - which I haven't - or from people querying why their server logs show requests to these urls.
Can anyone shed any light on why this is happening?
So the answer is that Googlebot now requests these URLs when it crawls your site, as part of Google's effort to map associations between sites and their related apps. See: https://firebase.google.com/docs/app-indexing/ios/app
It seems that Googlebot hasn't (at this time) been told not to report a crawl error if the URL/file is not there.
Here is a link to an answer to a very similar (but slightly different) question that gives more detail if you are so inclined: https://stackoverflow.com/a/35812024/4806086
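If the crawl-error reports bother you, one option (a sketch, assuming you have no iOS app and will never serve this file) is to disallow the two paths in robots.txt so Googlebot stops requesting them:

    User-agent: Googlebot
    Disallow: /apple-app-site-association
    Disallow: /.well-known/apple-app-site-association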
I am having some issues with visitors arriving at my site through spam links and hitting 404 errors.
My site was hacked: a hidden spam-links folder on public_html redirected users to pornographic sites, and those links were plastered across the internet. I have since remedied the malware issue, but several hundred visitors are still hitting a 404 page because the links no longer exist, messing up all my analytics accounts, using bandwidth, etc.
I have searched for a way to block anyone who tries to access these URL paths (so that they never hit my website), but I can't possibly redirect every single link (there were over 2000) without a wildcard or something similar. My search led me here: Block Spam Referrer Traffic, and it is not quite the solution I need.
The searches go to pages like this: www.mywebsite.com/spampage/morespam/ (which have been deleted and now return 404 errors).
There are several iterations of the /spampage/ and /morespam/ URLs.
The referrer is generally a Google search, so I can't block by referrer using .htaccess. I'd like to somehow block www.mywebsite.com/spampage/*/ and all iterations.
Apologies, I am by no means a programmer. I do appreciate any help that can be offered.
Update#1:
It seems that perhaps the best way is to block these links/directories using the robots.txt file. I have done so and will report back if I have success!
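For anyone following along, here is a minimal sketch of that kind of block, assuming the spam directories all sit under paths like the /spampage/ example above:

    User-agent: *
    Disallow: /spampage/

A Disallow on the parent directory also covers nested paths such as /spampage/morespam/, so the individual links don't need to be listed one by one.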
Update#2:
Reporting back: I am new to all this, so I was going about the solution wrong in my original question. Essentially, I found that I needed all of the links de-indexed, since they were generating all the traffic by being indexed by Google. I was able to request de-indexing of the directories in question manually through my Google Webmaster Tools account. One requirement for de-indexing was to have the site's robots.txt block the directories in question from being crawled. Once I did that, I submitted the request to remove the directories from the Google index. Google took those pages off in about 3 hours (thanks, Google!), so it was pretty quick once I found the proper way to go about it. No .htaccess editing needed. Once the pages were no longer indexed, traffic went back down to normal levels, and my keywords, etc., will be back to normal.
I was recently asked to do the following:
We have a website with Drupal running in IIS.
On that site is a URL redirect to a website hosted externally, with a name that is obviously completely unrelated to the name of our company.
The question now is the following:
They want to change the URL to a subdomain of our website, for example from "www.external-site.com" to "www.sub.internal.com" (while still showing the content of the external website).
They want the current page of that website to be reflected in the URL bar. So it wouldn't just say "www.sub.internal.com"; it would say "www.sub.internal.com/solutions/page1.html" (instead of "www.external-site.com/solutions/page1.html").
It's possible that I've forgotten another 'condition', but these are the ones that have been mentioned to me so far.
So, if someone reaches the external website through our URL redirect, it needs to show our subdomain instead of their domain in the URL, AND it needs to keep reflecting the current page as people browse, while still using our subdomain in the URL.
Now, I checked the external website, and it seems that most of the links there are relative links (in case this is useful information).
Currently, the external website is hosted externally and will remain so for the next few years. (I believe we bought the company.)
I have been asking around and reading up, and the best option seems to be domain forwarding, but even that doesn't seem to cover everything they asked of me.
I am but a 'simple' .NET programmer, held responsible for supporting anything involving the websites, and I can't say I have extensive knowledge of infrastructure. (But I can ask people to do this for me.)
Is there anything that could solve this?
Thanks so much!
IIS's URL Rewrite and Application Request Routing (ARR) combination can help you achieve what you want. Here are a few links which may guide you in configuring ARR. Please note that these links don't show the exact solution to your problem, but you can take clues from them and adapt them to your situation.
http://www.iis.net/learn/extensions/url-rewrite-module/reverse-proxy-with-url-rewrite-v2-and-application-request-routing
http://www.iis.net/learn/extensions/url-rewrite-module/reverse-proxy-rule-template
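To give a rough idea of what such a configuration looks like, here is a sketch of a URL Rewrite rule in web.config, assuming ARR is installed with proxying enabled and using the placeholder names from your question (www.sub.internal.com and www.external-site.com):

    <system.webServer>
      <rewrite>
        <rules>
          <!-- Proxy any request arriving on the subdomain to the external site -->
          <rule name="ReverseProxyToExternal" stopProcessing="true">
            <match url="(.*)" />
            <conditions>
              <add input="{HTTP_HOST}" pattern="^www\.sub\.internal\.com$" />
            </conditions>
            <action type="Rewrite" url="http://www.external-site.com/{R:1}" />
          </rule>
        </rules>
      </rewrite>
    </system.webServer>

Because the rewrite happens server-side, the visitor's URL bar keeps showing www.sub.internal.com/... as they browse from page to page; the fact that the external site mostly uses relative links helps here, since those links resolve against your subdomain automatically.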
It sounds like you'll want to use a full-page iframe: do not redirect, but show a page with an "inner page" instead, where that inner page is the external web site. That way, users do not see the external site in their URL bar.
http://webdesign.about.com/od/iframes/a/aaiframe.htm
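A minimal sketch of that approach, assuming the frame should fill the whole viewport:

    <!-- full-page iframe pointing at the external site -->
    <iframe src="http://www.external-site.com/"
            style="position:fixed; top:0; left:0; width:100%; height:100%; border:0;">
    </iframe>

Note that with this approach the URL bar shows only the framing page's address; it will not update as the visitor browses pages inside the frame.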
You need to configure the IIS equivalent of an Apache virtual host with a reverse proxy.
See these answers:
https://serverfault.com/a/271030
and
https://stackoverflow.com/a/10003306/2131693
Google Webmaster Tools is reporting 403 errors for some folders on the website's server, for example:
http://www.philaletheians.co.uk/Study%20notes/
The folder isn't forbidden, so I don't understand why Google's crawler would get 403 errors for it.
Why is the Google crawler trying to browse the actual folders rather than just going straight to the files in those folders? Is this something to do with robots.txt?
Make sure there is actually a page or document to serve if someone requests that URL. I've browsed through your site and could not find a link that points to http://www.philaletheians.co.uk/Study%20notes/
Also, it seems all the study notes are inside this "Study%20notes" directory, so that link will not work anyway. Check the Google Webmaster Tools report to find where this broken link originates and fix it.
Have you set the default document correctly in your web server? In Apache, this comes from the DirectoryIndex setting (and defaults to index.html). Also, in general it is better to strip spaces and similar characters from your traversable directory names (the %20 you are seeing between "Study" and "notes" is a URL-encoded space character), so as to keep your URLs clean for your visitors and for search engine bots.
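For reference, a one-line sketch of that Apache directive, assuming index.html is the intended default document:

    # Serve index.html when a bare directory URL like /Study%20notes/ is requested
    DirectoryIndex index.html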
I have an interesting issue with HTTPS requests not being handled properly. It is a relatively small issue, and I bet it is pretty simple to solve; I am just not thinking of it.
We have a website served with IIS 6, www.mylongdomain.com. We have a secure portal which is handled via https://www.mylongdomain.com. Now, we have several vanity and marketing URLs that we use over the phone, like www.shortname.com, etc. I have two websites set up: one that handles all requests with the host header www.mylongdomain.com, which actually serves the website; the other accepts any traffic and permanently redirects it to www.mylongdomain.com. This way, if we ever add any more domains, they will all end up at the one site; it also redirects mylongdomain.com to www.mylongdomain.com.
Everything here works fine. The issue now is that when I google "shortname.com," the first result returned is the same as if I had googled "mylongdomain"; however, Google has been able to crawl the other pages via https://shortname.com and index them that way. We don't have SSL certificates for these other domains, so when you click through, you get a nasty untrusted-certificate error.
This really wouldn't be an issue if we didn't use these URLs over the phone, and you all know how many people don't know the difference between the URL bar and a search box.
Any suggestions or tips?
I'd set up a redirect so that https://shortname.com is sent to http://shortname.com with a 301 (permanent) redirect. This will put an end to the nasty untrusted error immediately. Furthermore, this will also cause Google to slowly but surely update their index.
There are multiple ways to do this. If you're using IIS7 you can use the URL Rewrite Module and write a redirect rule to take care of it.
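A sketch of such a rule, assuming the URL Rewrite Module is installed (the rule name here is arbitrary):

    <rule name="RedirectHttpsToHttp" stopProcessing="true">
      <match url="(.*)" />
      <conditions>
        <add input="{HTTPS}" pattern="^ON$" />
      </conditions>
      <action type="Redirect" url="http://{HTTP_HOST}/{R:1}" redirectType="Permanent" />
    </rule>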
Or if you're not on IIS7, it may be perfectly acceptable to write some code to accomplish this. I wrote some ASP.NET code I've used plenty of times to take care of this HTTP/HTTPS redirection. In your particular case, you could simply take my code and call SetSSL(False) in the Application_BeginRequest function of your global.asax.
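For illustration, here is a sketch of what such a helper might look like in C#; this is a reconstruction under the same SetSSL name, not the original code from that answer:

    // In your Global.asax code-behind (illustrative reconstruction).
    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        SetSSL(false); // this site should always be served over plain HTTP
    }

    private void SetSSL(bool useSsl)
    {
        // Nothing to do if the request already uses the desired scheme.
        if (Request.IsSecureConnection == useSsl)
            return;

        // Rebuild the URL with the other scheme and issue a permanent redirect.
        UriBuilder target = new UriBuilder(Request.Url);
        target.Scheme = useSsl ? Uri.UriSchemeHttps : Uri.UriSchemeHttp;
        target.Port = useSsl ? 443 : 80;

        Response.StatusCode = 301; // permanent, so Google updates its index
        Response.AddHeader("Location", target.Uri.AbsoluteUri);
        Response.End();
    }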