Googlebot and Googlebot Mobile crawl errors after domain IP and DNS change (site not reachable) - dns

In the past week I had to move my server to the cloud (a DigitalOcean Droplet). I was on shared hosting, but concurrent users kept hitting the PHP execution limit (30). I shifted the entire site over and it is up and running successfully; moreover, Yandex and Bing are able to crawl the website, but it is Google that I need.
I have around 100K errors in the Search Console dashboard and the count keeps rising, and the Google Ads bot isn't able to crawl my pages either. I have checked the following and found no errors in any of them:
.htaccess and redirections.
SSL
DNS records (I moved the name servers to DigitalOcean and then back to the registrar to see whether DNS was the problem), but it doesn't seem to be.
I double-checked robots.txt; it passes the Google robots.txt validator and the other search engines' checkers.
Similar setups are running on other servers with no changes at all, and they are fine.
UFW: I am new to it, but I don't think it is the reason; I disabled it and checked, and it makes no difference.
I haven't blocked anything in Apache, so it should be fine too.
The error that appears is shown in the attached screenshots.
Please help me out; instead of scaling up, I am going downhill fast.

I repointed the DNS through another service; it took its time, but the issue is now resolved. I wasn't sure about the cause of the error before, but now I am: it was an improper or partial DNS resolution issue.
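
If you run into something similar, a quick way to check whether the change has propagated consistently is to ask several public resolvers for the record and compare their answers. A minimal sketch in Python, assuming the third-party dnspython package is installed; the domain and resolver list are placeholders:

    # Compare A-record answers from several public resolvers to spot partial or stale DNS.
    # Requires: pip install dnspython
    import dns.resolver

    DOMAIN = "example.com"  # placeholder: your site's hostname
    RESOLVERS = {"Google": "8.8.8.8", "Cloudflare": "1.1.1.1", "Quad9": "9.9.9.9"}

    for name, ip in RESOLVERS.items():
        resolver = dns.resolver.Resolver(configure=False)
        resolver.nameservers = [ip]
        try:
            answer = resolver.resolve(DOMAIN, "A")
            print(f"{name} ({ip}): {sorted(rr.address for rr in answer)}")
        except Exception as exc:  # NXDOMAIN, timeout, SERVFAIL, ...
            print(f"{name} ({ip}): lookup failed: {exc}")

If some resolvers still return the old server's address (or nothing at all) while others return the new one, crawlers can easily hit the "site not reachable" state described above.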

Related

Domain name forwarding works only partially

For many years I had a successful website at https://www.lunarium.co.uk built on top of Google App Engine, Java version. Some time ago, GAE deprecated the technology they initially recommended for storage, so I decided to re-create the site on a new, less cumbersome platform. Eventually, I re-created it with Django, hosted on Pythonanywhere.com at the domain name https://www.lunarium.co.
When the new version was ready, I forwarded the domain name lunarium.co.uk (hosted with GoDaddy) to lunarium.co (301, no masking). I also changed the www CNAME on lunarium.co.uk to point to the naked domain name, lunarium.co.uk. This was done at the beginning of April, but the stats keep showing that many people are still going to the old version of the website. On some days, many more people visit the old website than the new one. This is one part of the problem: why is that happening? (Right now I've also added forwarding from www.lunarium.co.uk to www.lunarium.co but was unable to delete the www CNAME.)
Also, I had some pages on the old website that were very popular, for example this one: https://www.lunarium.co.uk/moonsign/calculator.jsp. I made sure that if someone comes looking for such a page on the new website (like https://www.lunarium.co/moonsign/calculator.jsp) they are redirected to the appropriate new page. However, when trying to navigate to that popular old page, I get a strange error message, Not Found 404.0, and I'm not sure where it is coming from.
Previously, when navigating to the home page of the old website, I used to be correctly redirected to the home page of the new website. (I just tried it again and it didn't work, but maybe that is a temporary glitch.) However, specific pages within the website are never properly redirected. Is there a way to make sure that they are?
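
On the Django side, one way the old .jsp paths can be mapped to their new pages is with explicit redirect entries in the URL configuration. A minimal sketch; the target path "/moonsign/calculator/" is a placeholder for wherever the new page actually lives:

    # urls.py of the new Django site (sketch)
    from django.urls import path
    from django.views.generic.base import RedirectView

    urlpatterns = [
        # Permanently (301) redirect the popular legacy URL to its new location.
        path(
            "moonsign/calculator.jsp",
            RedirectView.as_view(url="/moonsign/calculator/", permanent=True),
        ),
        # ... the rest of the site's URL patterns ...
    ]

Note that this only covers requests that actually reach lunarium.co; whether deep links on lunarium.co.uk arrive with their path intact depends on how the registrar's forwarding is configured.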

HTTP works, but HTTPS doesn't work (using Cloudflare, Github Pages, and Namecheap)

I want to host my portfolio for as little money as I can.
So I bought my domain from Namecheap, I'm hosting my website on GitHub Pages, and I'm using Cloudflare to get a free SSL certificate and have an HTTPS connection available on my website.
When I try http://sitemeer.com/#https://josipmuzic.com, it shows that the site is only partially available.
But when I try http://sitemeer.com/#http://josipmuzic.com, it shows that it is available everywhere.
This came to my attention when I asked a friend in a different country to check my website. After a bit of digging, we confirmed that the reason wasn't that he was using a VPN, but rather the country he was in.
Does anyone have any suggestions as to what I could do?
I have been googling for a while now, but I'm not exactly sure what I should be googling for; I can't seem to find anyone else having this problem.
Error: you can see the error yourself by opening the HTTPS page at https://josipmuzic.com, but I'll also provide it here:
Fastly error: unknown domain: josipmuzic.com. Please check that this domain has been added to a service
Details cache-vie6323-VIE
Note: you likely can't open the HTTP page either, because I set Cloudflare to always redirect from HTTP to HTTPS.
For me, the error disappeared after I added AAAA records in the DNS management. According to this documentation, you should add these IP addresses:
2606:50c0:8000::153
2606:50c0:8001::153
2606:50c0:8002::153
2606:50c0:8003::153
each as the Value of a separate record, with your domain (e.g. example.com) as the Name and AAAA as the Type.
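
To confirm that both record types are in place after the change, you can look up the apex domain's IPv4 and IPv6 answers. A small sketch using only Python's standard library, with example.com as a placeholder for your domain:

    # Check that the apex domain answers for both A (IPv4) and AAAA (IPv6) records.
    import socket

    DOMAIN = "example.com"  # placeholder: your apex domain

    def addresses(family):
        try:
            infos = socket.getaddrinfo(DOMAIN, 443, family=family, type=socket.SOCK_STREAM)
            return sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            return [f"lookup failed: {exc}"]

    print("A    records:", addresses(socket.AF_INET))
    print("AAAA records:", addresses(socket.AF_INET6))

If the AAAA lookup fails while the A lookup succeeds, IPv6-preferring networks (which vary by country) may be the ones that can't reach the site.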

How to prevent issues with our server company

We are trying to use Stormcrawler to grab the index page of every site whose domain we know, politely, and skipping any where robots.txt tells us not to. We have a database of domains, around 250 million of them, and we are using that as a start. The idea is that we will crawl these once a week.
We have had a number of warnings from our server provider.
Currently our crawls attempt to go to a domain name, e.g. abc123.com; when we do this and the domain does not resolve, this gets 'flagged'. Obviously there are MANY domains that don't resolve or that point to the same IP address, and we think that trying to access a large number of domains that don't work is what causes our provider to send us an alert.
Our plan is that after the first crawl we will identify the domains that do not work and only crawl those on a monthly basis to see if they have become live. Apologies for being a bit naive; any help or guidance would be appreciated.
The alerts from your server provider are probably triggered during DNS resolution. Which DNS servers are used on your machines? They are probably your provider's; have you tried using different ones, e.g. OpenDNS or Google's? They might even be faster than the ones you are currently using. I'd also recommend running a DNS cache on your servers.
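
One way to keep those failed lookups off the provider's resolvers is to pre-check (and cache) resolution against a public resolver before handing domains to the crawler. A rough sketch using the dnspython package; the resolver address, thread count, and domain list are placeholders, and this supplements rather than replaces the crawler's own DNS settings:

    # Pre-filter a domain list: keep only domains that currently resolve.
    # Requires: pip install dnspython
    from concurrent.futures import ThreadPoolExecutor
    import dns.resolver

    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = ["8.8.8.8"]     # placeholder: Google, OpenDNS, or your own caching resolver
    resolver.cache = dns.resolver.Cache()  # simple in-process cache to avoid repeated lookups
    resolver.lifetime = 5.0                # give up after 5 seconds per lookup

    def resolves(domain):
        try:
            resolver.resolve(domain, "A")
            return domain, True
        except Exception:                  # NXDOMAIN, timeout, SERVFAIL, ...
            return domain, False

    domains = ["example.com", "abc123.com"]  # placeholder: read these from your database instead
    with ThreadPoolExecutor(max_workers=50) as pool:
        results = dict(pool.map(resolves, domains))

    live = [d for d, ok in results.items() if ok]
    dead = [d for d, ok in results.items() if not ok]
    print(f"{len(live)} live, {len(dead)} to retry monthly")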

Azure Traffic Manager, Priority Mode: Browser refresh won't go to secondary node when primary goes down

We are testing out Traffic Manager to see if it is a viable solution for failover. If our primary Azure region becomes unavailable for any reason, we want end users to be directed to a secondary location where they can continue using the site.
I have followed the documentation for setting this up and have three simple API pages as endpoints in three different regions that simply report which one you are hitting. I have them prioritized 1, 2, and 3.
When hitting the .trafficmanager.net URL, the primary is displayed as it should be. All three show "online" in the Traffic Manager profile. If I stop the primary site and then refresh my browser, I get a 403 error stating that the site has stopped.
I set the TTL in the traffic manager profile configuration to 60 seconds. However, after 15+ minutes, the browser still displays the 403. The only way I seem to be able to get the secondary site to pull up is by starting a new browser session. It's like there is some sort of caching and/or TTL issue with the browser session that prevents it from trying the secondary site.
This obviously wouldn't be acceptable in a live, production environment. There has to be a way around this, right? Has anyone else dealt with this issue?
The browser might be using Keep-Alive
Keep in mind that Azure Traffic Manager works at the DNS level, so rather than using a browser to get a repro, try to reproduce the behaviour with DNS tools like dig, nslookup, etc.
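
For example, resolving the profile's DNS name directly shows which endpoint Traffic Manager is currently handing out and how long the answer may be cached. A small sketch with the dnspython package; the profile name is a placeholder:

    # Watch which endpoint the Traffic Manager profile resolves to, independent of any browser.
    # Requires: pip install dnspython
    import time
    import dns.resolver

    NAME = "myprofile.trafficmanager.net"  # placeholder: your Traffic Manager DNS name

    while True:  # stop with Ctrl+C
        try:
            answer = dns.resolver.resolve(NAME, "A")
            ips = ", ".join(rr.address for rr in answer)
            print(f"{time.strftime('%H:%M:%S')}  {NAME} -> {ips}  (TTL {answer.rrset.ttl}s)")
        except Exception as exc:
            print(f"{time.strftime('%H:%M:%S')}  lookup failed: {exc}")
        time.sleep(30)

If the answer switches to the secondary endpoint's IP while the browser still shows the 403, the problem is likely a cached DNS answer or a kept-alive connection on the client side rather than Traffic Manager itself.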
This isn't just a browser setting. Your IIS server may be configured to use keep-alive to reduce strain on itself, leaving open connections that completely bypass Traffic Manager's DNS rules. I had these exact same symptoms and was able to alleviate them by following the steps I posted here. Whether it will prove useful in a real-world scenario remains to be seen, but I'm hoping this will help you get further.

What file is causing this URL redirection in AWS?

I'm migrating over a test site to AWS from another company. They've been nothing but unhelpful in giving up the necessary credentials and info to make this a seamless transition.
My test site now has everything it needs to be a perfect test site: it looks exactly like the current live site and has all the databases and the necessary bells and whistles. The only issue is that my AWS public DNS address is redirecting to the live server.
I've tried removing all .htaccess files from the EC2 instance and the S3 buckets. I've searched for any and all files that could cause this redirect. The live server has nothing on it that would cause this either.
The client's IT department only knew that some code had been injected into some file to redirect every URL the client owns to the same site. I'm at my wits' end with uncooperative dev shops and don't want to spend more time digging through endless files for a few lines of code.
Am I forgetting, missing, or overlooking something here, before I go crazy?
What do you mean by unhelpful in giving credentials and information? AWS is an IaaS company; you are responsible for the setup and configuration of your servers. They do offer a paid support plan if you would like to purchase it, but it's pretty straightforward to get your access keys when you create an EC2 or RDS instance.
Why don't you fix your problem at the DNS level? Simply create a subdomain where you host the temporary test site on the testing server, see whether everything works, and then change the DNS configuration for the live server.
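
Before digging through more files, it can also help to trace the redirect chain from outside and see which hop actually issues it; the Server and Location headers usually hint at whether Apache, the application, or something in front of it is responsible. A minimal sketch with Python's requests library; the URL is a placeholder for the test instance's public DNS name:

    # Print every hop in the redirect chain so you can see which server issues the redirect.
    # Requires: pip install requests
    import requests

    URL = "http://your-instance.compute.amazonaws.com/"  # placeholder: the test site's public DNS name

    response = requests.get(URL, allow_redirects=True, timeout=10)
    for hop in response.history + [response]:
        print(hop.status_code, hop.url)
        print("   Server:  ", hop.headers.get("Server", "-"))
        print("   Location:", hop.headers.get("Location", "-"))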
