How to prevent issues with our server company - stormcrawler

We are trying to use StormCrawler to grab the index page of every site whose domain we know, politely and skipping any where robots.txt tells us not to. We have a database of domains - around 250m of them - and we are using that as a starting point. The idea is that we will crawl these once a week.
We have had a number of warnings from our server provider.
Currently our crawls attempt to go to a domain name, e.g. abc123.com. When we do this and the domain does not resolve, this gets 'flagged'. Obviously there are MANY domains that don't resolve and point to the same IP address, so we think that trying to access a large number of domains that don't work is what causes our provider to send us an alert.
Our plan is that after the first crawl we will identify the domains that do not work and crawl these only on a monthly basis to see if they have become live. Apologies for being a bit naive; any help or guidance would be appreciated.

The alerts from your server provider are probably triggered during DNS resolution. What DNS servers are used on your machines? They are probably the ones from your provider. Have you tried using different ones, e.g. OpenDNS or Google's? They might even be faster than the ones you are currently using. I'd also recommend using a DNS cache on your servers.
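To illustrate what a DNS cache buys you here, the following is a minimal in-process sketch, not a replacement for a proper local caching resolver on the server. The class name `CachingResolver`, the TTL values, and the idea of remembering failed lookups for longer (negative caching) are assumptions for illustration; the only real library call is `socket.gethostbyname`.

```python
import socket
import time
from typing import Callable, Dict, Optional, Tuple


class CachingResolver:
    """Tiny in-process DNS cache with negative caching (illustrative only)."""

    def __init__(
        self,
        lookup: Callable[[str], str] = socket.gethostbyname,
        ttl: float = 3600.0,            # assumed lifetime for successful lookups
        negative_ttl: float = 86400.0,  # assumed lifetime for failed lookups
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup
        self._ttl = ttl
        self._negative_ttl = negative_ttl
        self._clock = clock
        # domain -> (expiry time, IP address or None if it did not resolve)
        self._cache: Dict[str, Tuple[float, Optional[str]]] = {}

    def resolve(self, domain: str) -> Optional[str]:
        """Return the cached or freshly resolved IP, or None if unresolvable."""
        now = self._clock()
        hit = self._cache.get(domain)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: no query leaves the machine
        try:
            ip: Optional[str] = self._lookup(domain)
            expiry = now + self._ttl
        except OSError:  # socket.gaierror is a subclass of OSError
            ip = None
            expiry = now + self._negative_ttl
        self._cache[domain] = (expiry, ip)
        return ip
```

With something like this in front of the crawler, a domain that failed to resolve is not queried again until its negative entry expires, which cuts down the volume of failing lookups the provider sees.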

Related

Googlebot and Google Mobile bot Crawl Error after Domain IP and Domain DNS changed (Site not Reachable)

In the past week I had to move my server to the cloud (a DigitalOcean Droplet). I was using a shared service, but the number of concurrent users reached the PHP execution limit (30). I moved the entire site and it is up and running successfully. Yandex and Bing are able to crawl my website, but it is Google that I want.
I have around 100K errors in the console dashboard and rising, and the Google Ads bot isn't able to crawl my pages either. I have checked the following and found no errors:
.htaccess and redirections.
SSL
DNS records (I moved the name servers to DO and then back to the registrar to see if DNS was the problem, but it doesn't seem to be).
I double-checked robots.txt; it passes the Google robots.txt validator and works for other search engines.
Similar setups are running on other servers with no changes at all, and they are fine.
UFW: I am new to it, but due to its temporary nature I don't think it is the reason. I disabled it and checked; it makes no difference.
I haven't blocked anything on Apache so it should be good too.
The error that appears is shown in the attached screenshots.
Help me out as instead of scaling, I am going down bad.
I repointed the DNS through another service. It took its time, but the problem is resolved. I wasn't sure about the cause before; now I am: it was an improper or partial DNS resolution issue.

What is the best way to move a large and complex site (which will need extensive testing) to a new server?

We need to move a large and complex site to a new server, and we need to ensure there is minimal downtime / disruption for our members.
This is what we have in mind:
Create a new domain (e.g. oursite.club) on the new server, for testing purposes. (We're planning on doing this because the temporary URL that works with some hosts / sites does not appear to work on ours)
Upload all existing code and import all databases to the new server.
Make any changes to allow for the new server path / domain name / different database names, where necessary.
Thoroughly test the .club site.
Add the current live domain (e.g. oursite.com) to the new server as an alias of the .club domain.
Once we're confident the new site works, change the DNS of the .com to point to the new server.
Also, add htaccess redirects on the old .com site to point to the same page on the .club domain, so that the site can still be accessed by those members for whom nameservers are not yet propagated.
After everything is working and all nameservers should have propagated, we want to make the .com the "real" domain and optionally retain the .club as an alias (i.e. swap the two domains on the new server).
This is what we have in mind:
Delete the add-on domain from the new server (i.e. the .club) and the alias domain (i.e. the .com) - but leave all the code / data where it is.
Re-add the .com, but point that at the server path where the .club code was uploaded.
Re-add the .club, but as an alias of the .com (or use URL forwarding at our domain registrar).
Make sure everything still works.
Update the htaccess redirects on the old server to send .com traffic to the .com domain on the new server, just in case there are any nameservers still pointing to the old server.
Since the nameservers for the two domains are already pointing to the new server, and that won't change, we thought this would offer the fastest way of swapping these two domains around.
We've done some testing with spare domains and a default WP blog, and this seems to work, but we wanted to call on your collective experience here to ask:
A. Will this work?
B. If not, what have we forgotten to take into account?
C. If so, good, but is there a better / easier way to accomplish this (especially the bit where we flip-flop the two domain names on the new server so that the current .com domain remains as the primary one to use)? For example, would it work if we add the .com to the new server (instead of creating a .club) and then edit our local hosts file during testing? Or can we use a VirtualHost directive to help with the domain swapping?
Although we've been building / running sites for many years now, this is the first time we've had to move a membership site that is business-critical and where any excessive downtime / issues will cause us and/or our members real problems.
Many thanks!
I'm not sure if I'm missing something significant here (I don't entirely understand your need to flip flop the domains at the end), so I'll quite likely change this answer depending on comments etc (or delete it if I've completely missed a point!)
My approach to this would be:
Set your DNS TTL for www.example.com to a very low number, 600 seconds perhaps. Don't plan the move until your current TTL has expired (counted from the moment you make the change).
Create the website on the new server, and configure the site under both your example.club domain and example.com.
Configure the hosts file on your testing machines with the IP address of the new server and www.example.com. This way your testing machines are going to actually test the full structure of example.com.
Test against example.club if you are concerned about broken links to the new domain, but since that is only temporary, it shouldn't really matter.
When it is tested and you are confident it is working, switch the DNS over.
Once you are comfortable that everything is working as expected reset your DNS TTL to a more sane figure.
As long as your database is in sync, there would be minimal (if any) downtime from this and at the end you can simply delete the example.club reference without any impact.
Depending on the value of the site and the impact of potential downtime, it might be worth putting a load balancer between both sites and gradually increasing the traffic that is sent to the new site so you can monitor progress. (You could use a load balancer from Azure, which has no restrictions on being used against external sites.)
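The TTL timing in the first step is simple arithmetic, sketched here under the assumption that resolvers honour TTLs. The function name and the 86400-second previous TTL in the usage note are made up for illustration.

```python
from datetime import datetime, timedelta


def earliest_cutover(ttl_lowered_at: datetime, previous_ttl_seconds: int) -> datetime:
    """Earliest time by which caches holding the old record should have expired.

    A resolver that fetched the record just before the TTL was lowered may
    keep it for the full previous TTL, so wait at least that long.
    """
    return ttl_lowered_at + timedelta(seconds=previous_ttl_seconds)
```

For example, if the TTL was lowered at 09:00 on 1 January and the previous TTL was 86400 seconds, `earliest_cutover(datetime(2024, 1, 1, 9, 0), 86400)` gives 09:00 on 2 January. After the switch, a well-behaved resolver should serve the old address for at most the new TTL (600 seconds in the suggestion above).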

Enabling SSL for a subdomain in IIS

I recently bought SSL for my website and want to create a section within the site in the form of https://secure.example.com/member/upgrade.aspx. However, I am having a hard time solving this issue since currently my website URL rewrite prohibits any subdomain and the user is logged out if he or she gets transferred to the above link.
I have searched online and found some good information, such as dynamically creating the URL without actually creating a subdomain in IIS.
Questions:
What steps are needed to achieve the objective above?
Should I have bought the wildcard certificate instead of one for a specific subdomain?
Thank you.
One option would be ignoring that URL pattern for rewrite purposes, or ignoring the URL if the protocol is HTTPS. That said, I would take a slightly different approach here and just put the entire site behind SSL. Rewriting all requests to HTTPS works, and Google is now giving ranking bumps to HTTPS, so there are good business reasons to make the switch. You are already taking the pain of getting SSL involved at all. The dedicated IP and certificate cost the same whether you use them on a single page or on all the pages, so you might as well take advantage of it and ease your management burden in the same motion.

Is there a way to find all existing subdomains of one main domain?

I work for Johns Hopkins University, and our web culture here has been an unruled wilderness for many years. We're trying to get a handle on the enormous number of registered subdomains across our part of the web-universe, and even our IT department is having some trouble tracking down the unabridged list.
Is there a tool or a script that would do this quickly and semi-easily? I'm a developer and would write something but I want to find out if this wheel has been created already.
Alternatively, is there a fancy way to google search, more than just *.jhu.edu or site: .jhu.edu, because those searches turn up tons of sites that use "jhu.edu" in the end of their urls (ex. www.keywordspy.com/organic/domain.aspx?q=cer.jhu.edu)
Thanks for your thoughts on this one!
The Google search site:*.jhu.edu seems to work well for me.
That said, you can also use Wolfram Alpha. Using this search, in the third box click "Subdomains" and then in the new subdomains section that is created click "More".
As @Mark B alluded to in his comment, the only way a domain name (sub or otherwise) has any real value is if a DNS service maps it to a server so that a browser can send it a request. The only way to track down all of the sub-domains is to track down their DNS entries. Thankfully, DNS servers are fairly easy to find, depending on the level of access you have to the network infrastructure and the authoritative DNS server for the parent domain.
If you are able to, you can pull DNS traffic from firewall logs in and around your network. That will let you find DNS servers that are being sent requests for your sub-domains.
Easier though would be to simply follow the DNS trail. The authoritative DNS server for your domain (jhu.edu) will have pointers to the other DNS servers that are authoritative for sub-domains (if your main one is not authoritative already).
If you have access to the domain registrar and have the proper authorization, you should be able to contact technical support and request the zone file or even export it yourself depending on the provider.
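Whichever source you end up with (names pulled from DNS logs, or owner names from an exported zone file), the last step is filtering them down to real subdomains of the parent. A minimal sketch, assuming you already have the names as fully qualified strings; the function name `subdomains_of` is hypothetical and how the names are collected is out of scope here.

```python
from typing import Iterable, Set


def subdomains_of(names: Iterable[str], parent: str) -> Set[str]:
    """Return the unique hostnames in `names` that sit under `parent`.

    Matching is done on the whole-label suffix, so a host such as
    www.keywordspy.com is not picked up just because jhu.edu appears
    somewhere in a URL that mentions it.
    """
    parent = parent.lower().strip(".")
    suffix = "." + parent
    found: Set[str] = set()
    for name in names:
        host = name.lower().strip().rstrip(".")  # normalise case and trailing dot
        if host.endswith(suffix):
            found.add(host)
    return found
```

Passing the result through `sorted()` gives a stable list to hand to whoever is auditing the sites.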

Subdomains and DNS

I currently have my own domain name and dedicated server and I offer different packages to my clients. What I want to be able to do is have them sign up with my website and create a package automatically that they can access via their username as a subdomain e.g.
http://yourusername.mywebsite.com
I currently have DNS entries set up for various subdomains with real information for my website e.g.
Name Type IP Address
@ A 1.2.3.4
bugs A 1.2.3.4
support A 1.2.3.4
However, if a new customer signs up at the moment I have to go and manually create an entry for them with their username in it.
I'm sure I've seen websites that manage to do this automatically, does anyone have any ideas how, or any other methods that I should be using?
Thanks,
Mark
Since you apparently do not control the name servers, your choices are quite limited. One possibility is to use a wildcard DNS record:
* A 192.0.2.1
where the star matches every name. This is not ideal, since names that do not correspond to any customer will also resolve.
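With a wildcard record, the job of deciding which names are real moves to the web application, which can read the Host header of each request. A minimal sketch of that check follows; the function name, the `known_users` set, and the single-label rule are assumptions for illustration, not part of any particular framework.

```python
from typing import Optional, Set


def username_from_host(host: str, base_domain: str, known_users: Set[str]) -> Optional[str]:
    """Map a Host header such as 'mark.mywebsite.com:80' to a known username.

    Returns None for the bare domain, for nested names, and for names that
    do not belong to any registered user (the wildcard resolves those too).
    """
    hostname = host.split(":", 1)[0].lower().rstrip(".")  # drop port, normalise
    suffix = "." + base_domain.lower()
    if not hostname.endswith(suffix):
        return None
    label = hostname[: -len(suffix)]
    if "." in label or label not in known_users:
        return None
    return label
```

A request whose host maps to `None` can then be answered with a 404 or redirected to the main site, so the unwanted names never show a customer page.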
The details depend on which DNS server you're using.
One approach is to have some code that opens the DNS zone file and adds the desired records. On Linux with BIND, you will then need to signal the server to get it to re-read the zone file.
With Simple DNS Plus, you can easily add such a DNS record through the included HTTP API. For example:
http://127.0.0.1:8053/updatehost?host=yourusername.mywebsite.com&data=1.2.3.4
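From a signup handler, the URL above could be assembled as in the sketch below. The endpoint and the `host` and `data` parameter names are copied from the example URL and not checked against the Simple DNS Plus documentation; the function name and defaults are hypothetical, and actually sending the request is left out.

```python
from urllib.parse import urlencode


def update_url(
    username: str,
    ip: str,
    base_domain: str = "mywebsite.com",
    api_base: str = "http://127.0.0.1:8053/updatehost",  # taken from the example above
) -> str:
    """Build the update URL for one customer's subdomain record."""
    query = urlencode({"host": f"{username}.{base_domain}", "data": ip})
    return f"{api_base}?{query}"
```

Using `urlencode` rather than string concatenation keeps the URL valid if a value ever contains characters that need escaping, although usernames should still be validated as DNS labels before being used this way.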
Since you apparently do not control the name servers, your choices are quite limited. Nevertheless, every serious DNS hoster provides you with an API (see for instance Slicehost's API). So, you may use this API and write a small program to update the DNS data.
(Footnote: handling paying customers when you do not even control the name servers seems... bad)
