Multilingual Umbraco Website cannot be scraped?

I have created a multilingual Umbraco website which has 3 domain names pointing to it for each language. The site has gone live and people are starting to share links to it on LinkedIn and other social media. I have metadata in the website which should be picked up when these links are shared. On LinkedIn, a shared link shows 'coming soon' as the strap-line, which is what was on the holding page months ago, suggesting the site isn't being re-scraped.
I used the Facebook link debugging tool and it returned a run-time error with a 500 response code.
My co-worker insists that there is nothing wrong with the DNS and that there aren't any errors in the code of the website, so I am wondering if anyone has any ideas why the website cannot be scraped?
It also has another issue, which may be related: one of the domains sometimes doesn't redirect to its www. version despite having a redirect on the DNS.
Is there some specific Umbraco configuration that I may have missed? Or a bug within Umbraco that may cause this?
Aside from this issue the website is working fine; it is just that these scrapers seem to be unable to hit the website successfully.

Do you have metadata set for encoding? See https://www.w3.org/International/questions/qa-html-language-declarations. Probably a long shot.
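One way to narrow this down might be to request the page the way a scraper does and look at the status code and the Open Graph tags that come back. The following is only a rough sketch, not a confirmed diagnosis: the `facebookexternalhit/1.1` User-Agent value is an assumed example of a crawler identifier, and the function names are hypothetical helpers, not part of Umbraco or any scraper's API.

```python
from html.parser import HTMLParser
import urllib.error
import urllib.request

# Assumption: an example crawler-style User-Agent; real scrapers may send other values.
SCRAPER_UA = "facebookexternalhit/1.1"


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        if prop.startswith("og:") and attr.get("content") is not None:
            self.tags[prop] = attr["content"]


def extract_og_tags(html):
    """Return a dict of Open Graph properties found in the given HTML string."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.tags


def fetch_as_scraper(url, user_agent=SCRAPER_UA):
    """Return (status_code, og_tags) for url requested with a crawler-like User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, extract_og_tags(body)
    except urllib.error.HTTPError as err:
        # A 500 here would match what the Facebook debugging tool reported.
        return err.code, {}
```

Calling `fetch_as_scraper` against each of the three domains (with and without www.) and comparing the results with a normal browser request may show whether the 500 only appears for certain hosts or User-Agents.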

Related

Website hosting service problem with loading

I have a website that I have hosted for a while now... It seemed to work fine, but now it just shows a blank page when I visit it.
Could this be a hosting error? If not what else?

Your question really needs more details for anyone to be able to give you any help.
Is this page just a static HTML file, or is it a database-driven site (WordPress or Drupal, for example)?
Have you checked the error logs for your hosting provider, and have you tried contacting your hosting provider to see what they say?

ASP.NET website developed using Kentico CMS causing "www" prefix issue

We are facing a weird issue specific to only one website, even though we are hosting more than 150 websites on the same server using Kentico CMS 11.
We have turned on the "www" prefix setting from Kentico -> Urls & SEO for all the websites, and all these sites are working without any issue.
But there is one specific website on which all requests are redirected to the home page if we try to access it with the "www" prefix.
I have tried checking at code level and could not find any issue. If it were a code issue, it should appear on the other websites as well.
We are using Azure App Service for hosting our application, so I have checked in Application Insights as well and could not get any lead, as all the requests logged for this specific website are for the home page.
Please help me understand whether I need to look into this issue somewhere else. I have checked with the client's IS team and they could not find any difference in DNS settings compared to the other websites, which are working fine.

Please check the web.config file of this site. It may contain a wrong URL redirect rule associated with "www", so that the redirect strips the whole path from the URL.
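To confirm that kind of rule is at fault, it may help to compare the requested URL with the `Location` header the server sends back. This is a small sketch under stated assumptions: `redirect_drops_path` is a hypothetical helper, and the `example.com` URLs in the usage note are placeholders, not the affected site.

```python
from urllib.parse import urlsplit


def redirect_drops_path(requested_url, location):
    """Return True when the redirect target loses the requested path or query string."""
    requested = urlsplit(requested_url)
    target = urlsplit(location)
    # Ignore a trailing slash so "/about" and "/about/" count as the same path.
    same_path = requested.path.rstrip("/") == target.path.rstrip("/")
    same_query = requested.query == target.query
    return not (same_path and same_query)
```

For example, `redirect_drops_path("http://www.example.com/products/list?page=2", "https://example.com/")` returns `True`, which is the symptom described in the question, while a target of `"https://example.com/products/list?page=2"` returns `False`.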

Fetch as Google - Temporarily Unreachable ONLY on Mobile

I have created a new website, www.bucketshowers.com, and I tried to index it using Google Webmaster Tools. Fetch as Google for desktop worked just fine, but doing the same for mobile shows the error "Temporarily unreachable". It's been a few days and the website REALLY is not available on mobile. It's driving me nuts. Here is some information and the things I have already tried:
Website is made with WP
I have disabled all SEO/meta tags plugins and I added a very basic robots.txt http://bucketshowers.com/robots.txt
I tried waiting 15min between fetching the root page on mobile
I have checked source code for the homepage to make sure there are no meta tags with nofollow or noindex attributes
I am baffled by this issue and I would gladly take any advice or pointers on what else can be done. Thank you.

The crazy thing was that it was caused by the WP Statistics plugin, which is probably the most popular of its kind, with 500k downloads. After I deactivated it, everything was fine: Google fetches the mobile version and the website is available. Incredible! I'm still searching for the actual problem within that plugin.

Google servers see website differently

I Googled one of our sites today (gamestyling.com) and saw that the results were in Chinese. It looks like our site was hacked, but I see no traces of that. When opening the site, everything looks normal (no Chinese).
On further inspection it seems that Google doesn't see the website correctly:
I cannot verify in Google search console. When I use the meta tag it shows me it detected a completely different tag.
When running pagespeed insight the preview does show Chinese: https://developers.google.com/speed/pagespeed/insights/?url=gamestyling.com
Also, when running the site through a proxy it looks completely normal.
Any idea how I can get Google to see my site correctly or what is causing this issue?
UPDATE
I now have access to Google search console and found that someone already had access to the property (2nd user):
I cannot remove the user because it uses a meta tag that Google thinks is still in the header but that doesn't appear in my code. So I'm still not sure whether someone is playing tricks on Google or whether we've actually been hacked. Note: nothing has changed on the server itself.
UPDATE2
This article describes exactly what's going on: https://blog.sucuri.net/2015/09/malicious-google-search-console-verifications.html. I must say that's an amazing safety fault on Google's part...

I experienced this issue on one of my sites and resubmitted the website for review in Google Webmasters. The search results in Google were corrected in a couple of days.

Google links open wrong pages

Our website was recently hacked (Joomla 1.5, hosted on a VPS). The attacker added a few PHP scripts that were redirecting to some ad sites. We have cleaned everything (or at least we think we did), and now everything works as it should.
However, links on Google (or Yahoo) that point to our website are still trying to include these PHP scripts (and return a 404, as these are deleted now). Direct links from the browser work as they should.
We cleaned the site 10 days ago, so I do not think that something is cached on Google's servers. Re-indexing should be done by now.
To reproduce this behavior:
Go to www.google.com
type in "anitex socks"
click any php link that starts with "anitexsocks.com"
You will get "The requested URL /wp-includes/client.php was not found on this server" + 404 error
Refresh page and everything works without issues
Why are only Google links causing trouble?
Any help is welcome. Thanks!

As for the reason why this is happening, I installed a Firefox add-on which blocks my browser's Referrer header and then followed a Google link to your site, and it worked fine. Then I disabled the add-on and the problem started occurring again.
This shows that there is still some malicious code running on your website which checks all HTTP requests to see if they come from Google (based on the HTTP Referrer header) and redirects them to /wp-includes/client.php if they do.
To try to determine where this code may lie, try performing a recursive grep through all the www files on your server as well as your www configuration files. Somewhere in there, there must still be a reference to that client.php script; hopefully you can find and eliminate it.
That said, if it were my site and I knew a hacker had had free rein over my server to do whatever they wanted to it, I would not mess around with trying to undo the damage and would instead restore the most recent backup from before the site was hacked. You only have to miss one back door the hacker left in place and they can re-enter your site. After restoring backups, you should also upgrade/reconfigure the software they used to gain access in the first place so they can't simply rehack it in the same manner again.
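The recursive search suggested above could be sketched roughly as follows, as an alternative to grep. This is a minimal sketch, assuming the web root is readable by the user running it; `find_references` is a hypothetical helper and the `/var/www` path in the usage note is a placeholder. Injected code is often obfuscated, so a plain-text search like this may miss it.

```python
import os


def find_references(root, needle="client.php"):
    """Yield (path, line_number, line) for every line under root that contains needle."""
    needle_bytes = needle.encode("utf-8")
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:  # includes hidden files such as .htaccess
            path = os.path.join(dirpath, name)
            try:
                # Read as bytes so binary or oddly encoded files do not raise errors.
                with open(path, "rb") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle_bytes in line:
                            text = line.decode("utf-8", errors="replace").strip()
                            yield path, number, text
            except OSError:
                continue  # unreadable file; skip it
```

A call such as `for hit in find_references("/var/www"): print(hit)` would list each file and line that still mentions the script, including server configuration files stored under that directory.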
