How can I set up a CoreOS cluster across multiple data centers?

I thought it would be as simple as using the public IP instead of the private IP so that the machines can see each other, but that is not the case.
Here is my cloud-config file, which is very basic.
#cloud-config
coreos:
  etcd:
    # generate a new token for each unique cluster from https://discovery.etcd.io/new
    discovery: https://discovery.etcd.io/<token>
    # use $public_ipv4 if your datacenter of choice does not support private networking
    addr: $public_ipv4:4001
    peer-addr: $public_ipv4:7001
  fleet:
    public-ip: $public_ipv4   # used for fleetctl ssh command
  units:
    - name: etcd.service
      command: start
    - name: fleet.service
      command: start
What do I need for the cluster to span multiple data centres? For example, on DigitalOcean I would provision a machine in Singapore, another in New York, and another in Amsterdam.
My secondary question: since I cannot easily find material on this use case, I wonder whether this is simply not a recommended way to use CoreOS, and if it is not, what is the preferred way to distribute my services across multiple data centres?

You can do this, but it requires tuning your etcd cluster. The distances you are talking about are fairly large, so expect to use some long timeouts. More docs here: https://coreos.com/etcd/docs/2.0.8/tuning.html
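For illustration, a rough sketch of what that tuning could look like in the cloud-config above, assuming the cloud-config keys mirror the -heartbeat-interval / -election-timeout flags from the linked guide (the exact key names depend on your etcd release, and the values are placeholders sized for WAN round trips of a few hundred milliseconds, not recommendations):

coreos:
  etcd:
    discovery: https://discovery.etcd.io/<token>
    addr: $public_ipv4:4001
    peer-addr: $public_ipv4:7001
    # assumed key names; check the flag names for your etcd version
    heartbeat-interval: 300   # roughly the worst-case round trip between regions, in ms
    election-timeout: 3000    # the guide suggests 5-10x the heartbeat interval

The guide's general rule is to set the heartbeat close to the highest round-trip time between members and the election timeout to several times that, otherwise distant members keep triggering spurious leader elections.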

Related

Nginx: a pair of nginx balancers

At the moment there is an nginx balancer (CentOS 7, a virtual machine with a public address) proxying to a large number of backend Apache servers. I need to implement a failover cluster of two nginx balancers. Fault tolerance is currently implemented trivially using a virtual IP address (keepalived). Can you suggest what to read about running a pair of nginx balancers, or how it could be implemented, so that all requests arriving at their shared virtual IP address are distributed evenly between the two of them, but if one of them fails the remaining one takes everything on itself?
At the moment it turns out that there are two identical balancers and the benefit of the second is only insurance: while the main one (master) is fully working, the second (backup) sits uselessly idle.
What you are describing is active-active HA. You can find some material for nginx+ on Google, but from a brief look I don't see it as true active/active in the sense of a single virtual (floating) IP. Instead, active/active is achieved by using two floating IPs (two VRRP groups, with one VIP active on each nginx) and then a round-robin DNS A record containing both addresses.
As far as I know keepalived uses the VRRP protocol, which in some implementations can provide 'true' active/active; in any case, I'm not sure keepalived supports this. Based on the information I've been able to find, it's not possible.
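For reference, the two-floating-IP variant described above is typically built with two complementary VRRP instances in keepalived, each balancer being MASTER for one VIP and BACKUP for the other. A rough keepalived.conf sketch for the first balancer (the interface name and VIPs are made-up placeholders; the second balancer swaps the MASTER/BACKUP states and priorities):

vrrp_instance VIP_1 {
    state MASTER              # this balancer owns the first floating IP
    interface eth0            # assumed NIC name
    virtual_router_id 51
    priority 150
    advert_int 1
    virtual_ipaddress {
        203.0.113.10          # first floating IP (placeholder)
    }
}

vrrp_instance VIP_2 {
    state BACKUP              # the other balancer is MASTER for this one
    interface eth0
    virtual_router_id 52
    priority 100
    advert_int 1
    virtual_ipaddress {
        203.0.113.11          # second floating IP (placeholder)
    }
}

A round-robin DNS A record listing both 203.0.113.10 and 203.0.113.11 then spreads clients across the two balancers, and if one node fails keepalived moves its VIP to the survivor, so the remaining balancer takes all the traffic.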

Bootstrap AKS agent nodes through terraform

I am currently using Terraform to create a k8s cluster, which is working perfectly fine. Once the nodes are provisioned, I want to run a few bash commands on any one of the nodes. So far, null_resource seems like an option, since it is a cluster and we don't know the node names/IPs in advance. However, I am unable to determine what the value of the connection block should be, since azurerm_kubernetes_cluster does not export the IP address of the load balancer or the VM names. The question mark below needs the correct value:
resource "null_resource" "cluster" {
triggers = { "${join(",", azurerm_kubernetes_cluster.k8s.id)}" }
connection = { type = ssh
user = <user>
password = <password>
host = <?>
host_key = <pub_key>
}
}
Any help is appreciated!
AKS does not expose its nodes to the Internet; you can only reach the nodes through the cluster itself. If you want to run a few bash commands on the nodes, you can use an SSH connection that uses a pod as a helper to reach them; see the steps for SSH node access.
Alternatively, you can add NAT rules for the nodes to the Load Balancer and then SSH to the nodes through the Load Balancer's public IP. But that is not a secure approach, so I do not suggest it.
I would recommend just running a DaemonSet that performs the bash commands on the nodes, since any scale or upgrade operation will remove, or simply not include, the config changes you are making on the nodes.
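A minimal sketch of that approach, assuming the commands can run from a privileged pod on each node (the name, image and command are placeholders):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-bootstrap            # placeholder name
spec:
  selector:
    matchLabels:
      app: node-bootstrap
  template:
    metadata:
      labels:
        app: node-bootstrap
    spec:
      hostPID: true               # let the pod see the node's processes
      containers:
      - name: bootstrap
        image: busybox            # placeholder image
        securityContext:
          privileged: true
        # run the one-off commands, then idle so the pod stays Running
        command: ["sh", "-c", "echo 'node setup commands go here' && while true; do sleep 3600; done"]

Because it is a DaemonSet, the same pod (and therefore the same setup) lands on every node, including nodes added later by scale or upgrade operations.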
There was no straightforward solution for this one. A static IP was not the right way to do it, so I ended up writing a wrapper around Terraform. I did not want to run my init scripts on every node that comes up, only on one of them. So essentially the wrapper now tells Terraform to first deploy only one node, which executes cloud-init. After that, it calls Terraform again to scale up and bring up the rest of the desired number of instances. In the cloud-init script I check kubectl get no, and if it reports more than one node I simply skip the cloud-init commands.

Docker creates two bridges that corrupt my internet access

I'm facing a pretty strange issue:
Here is my config:
docker 17-ce
ubuntu 16.04.
I work from two different places with different internet providers.
At the first place everything works just fine: I can run Docker out of the box and access the internet without any problems.
But at the second place I cannot access the internet while Docker is running, or more precisely while the two virtual bridges created by Docker are up.
At this place the internet connection behaves very strangely: I can ping the Google DNS at 8.8.8.8, but nearly all DNS requests fail, and most of the time the connection goes down completely after a few seconds.
(The only difference between the first and the second place is the internet provider.)
At first I thought I could fix this by changing the IP of the default network bridge, but that did not solve the problem at all.
The point is that the --bip option of the Docker daemon changes the IP of the default Docker bridge docker0, but Docker also creates another bridge called br-1a0208f108d9 which does not reflect the settings passed to the --bip option.
I guess that this second bridge is causing trouble for my network because it overlaps my wifi adapter's configuration.
I'm having a hard time trying to diagnose this.
My questions are:
How can I be sure that my assumptions are right and that this second bridge is in conflict with my wifi adapter?
What is this second bridge? It's easy to find documentation about the docker0 bridge, but I cannot find anything related to this second bridge br-1a0208f108d9.
How can the exact same setup work in one place and not the other?
With this problem I feel like I'm pretty close to levelling up my Docker knowledge, but before that I have to improve my network administration knowledge.
Hope you can help.
I managed to solve this issue after reading this:
https://success.docker.com/Architecture/Docker_Reference_Architecture%3A_Designing_Scalable%2C_Portable_Docker_Container_Networks
The second Docker bridge br-1a0208f108d9 was created by Docker because I was using a docker-compose file that involves the creation of another custom network.
This network was using a fixed IP range:
networks:
  my_network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.16.0.0/16
          gateway: 172.16.0.1
At my home, the physical wifi network adapter was automatically assigned the address 192.168.0.x via DHCP.
But at the other place the same wifi adapter gets an address in 172.16.0.x, which collides with the custom Docker network.
The solution was simply to change the IP range of the custom Docker network.
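For example, keeping the compose file the same but moving the network to a range that neither location hands out via DHCP (the replacement subnet is just an illustrative choice):

networks:
  my_network:
    driver: bridge
    ipam:
      config:
        - subnet: 192.168.240.0/24   # example range clear of 172.16.0.0/16 and 192.168.0.x
          gateway: 192.168.240.1

After changing the range, recreate the compose network (docker-compose down and up again) so the br-* bridge is rebuilt with the new subnet.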
You have to tell Docker to use a different subnet. Edit /etc/docker/daemon.json and use something like this:
{
  "bip": "198.18.251.1/24",
  "default-address-pools": [
    {
      "base": "198.18.252.0/22",
      "size": 26
    }
  ]
}
Information is a bit hard to come by, but it looks like the bip option controls the IP and subnet assigned to the docker0 interface, while default-address-pools controls the addresses used for the br-* interfaces. You can omit bip in which case it will grab an allocation from the pool, and bip doesn't have to reside in the pool, as shown above.
The size is how big of a subnet to allocate to each Docker network. For example if your base is a /24 and you also set size to 24, then you'll be able to create exactly one Docker network, and probably you'll only be able to run one Docker container. If you try to start another you'll get the message could not find an available, non-overlapping IPv4 address pool among the defaults to assign to the network, which means you've run out of IP addresses in the pool.
In the above example I have allocated a /22 (1024 addresses) with each network/container taking a /26 (64 addresses) from that pool. 1024 ÷ 64 = 16, so you can run up to 16 Docker networks with this config (so max 16 containers running at the same time, or more if some of them share the same network). Since I rarely have more than two or three running containers at any one time this is fine for me.
In my example I'm using part of the 198.18.0.0/15 subnet as listed in RFC 3330 (but fully documented in RFC 2544) which is reserved for performance testing. It is unlikely that these addresses will appear on the real Internet, and no professional network provider will use these subnets in their private network either, so in my opinion they are a good choice for use with Docker as conflicts are very unlikely. But technically this is a misuse of this IP range so just be aware of potential future conflicts if you also choose to use these subnets.
The defaults listed in the documentation are:
{
  "bip": "",
  "default-address-pools": [
    {"base": "172.80.0.0/16", "size": 24},
    {"base": "172.90.0.0/16", "size": 24}
  ]
}
As mentioned above, the default empty bip means it will just grab an allocation from the pool, like any other network/container will.
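If it helps, after editing /etc/docker/daemon.json you can restart the daemon and check which subnets the bridges actually received (assuming a systemd host):

sudo systemctl restart docker        # pick up the new daemon.json
ip -4 addr show docker0              # should now show the bip address
docker network ls                    # lists the default and compose-created networks
docker network inspect bridge        # the IPAM section shows the subnet in use

Note that networks created before the change keep their old subnets; remove and recreate them (for example with docker-compose down and up) to get an allocation from the new pool.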
In my case I would not apply Clement's solution, because I have the network conflict only on my dev PC, while the container is delivered to many servers that are not affected.
This problem in my opinion should be resolved as suggested here.
I tried this workaround:
I stopped the container with "docker-compose down", which destroys the bridge.
I started the container while on the "bad" network, so the container uses another network.
Since then, if I restart the container on any network it doesn't try to use the "bad" one; it normally keeps the last one used.

RabbitMQ Cluster on EC2: Hostname Issues

I want to set up a 3-node Rabbit cluster on EC2 (Amazon Linux). We'd like to have recovery implemented, so that if we lose a server it can be replaced by a new server automagically. We can easily set the cluster up manually using the default hostname (ip-xx-xx-xx-xx), so that the broker id is rabbit@ip-xx-xx-xx-xx. This works because the hostname is resolvable over the network.
The problem is that this hostname will change if we lose/reboot a server, invalidating the cluster. We haven't had luck setting a custom static hostname because such hostnames are not resolvable by the other machines in the cluster; that's the only part of that article that doesn't make sense.
Has anyone accomplished a RabbitMQ Cluster on EC2 with a recovery implementation? Any advice is appreciated.
You could create three A records in an external DNS service for the three boxes and use them in the config. E.g., rabbit1.alph486.com, rabbit2.alph486.com and rabbit3.alph486.com. These could even be the ec2 private IP addresses. If all of the boxes are in the same region it'll be faster and cheaper. If you lose a box, just update the DNS record.
Additionally, you could assign Elastic IPs to the three boxes. Then, when you lose a box, all you'd need to do is assign the Elastic IP to its replacement.
Of course, if you have a small number of clients, you could just add entries into the /etc/hosts file on each box and update as needed.
From:
http://www.rabbitmq.com/ec2.html
Issues with hostname
RabbitMQ names the database directory using the current hostname of the system. If the hostname changes, a new empty database is created. To avoid data loss it's crucial to set up a fixed and resolvable hostname. For example:
sudo -s # become root
echo "rabbit" > /etc/hostname
echo "127.0.0.1 rabbit" >> /etc/hosts
hostname -F /etc/hostname
@Chrskly gave good answers that reflect the general consensus of the Rabbit community:
Init scripts that handle DNS or identification of the other servers are mostly what I hear about.
We could not get Elastic IPs to work without the aid of DNS or hostname aliases, because the internal IP/DNS on Amazon still rotates, and the public IP/DNS names that stay static cannot be used as the hostname for Rabbit unless aliased properly.
Hosts-file manipulation via a script is also an option. It needs to be accompanied by a script that can identify the DNS names of the other servers at launch, so it doesn't save much work in terms of making the configuration more "solid state".
What I'm doing:
Due to some limitations on the DNS front, I am opting to use bootstrap scripts to initialize each machine and cluster it with any other available machines, using the default internal DNS name assigned at launch. If we lose a machine, a new one will come up, prepare Rabbit and look up the DNS names of machines to cluster with. It will then remove the dead node from the cluster for housekeeping.
I'm using some homebrew init scripts in Python. However, this could easily be done with something like Chef/Puppet.
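The clustering step itself comes down to a few rabbitmqctl calls once a peer's hostname has been discovered; a rough sketch of what such a bootstrap script might run (the peer and dead-node names are placeholders filled in by the DNS lookup logic, and all nodes must share the same Erlang cookie):

#!/bin/bash
PEER=rabbit@ip-10-0-1-23        # placeholder: an existing cluster member found via DNS
DEAD_NODE=rabbit@ip-10-0-1-99   # placeholder: the node this instance replaces

rabbitmqctl stop_app            # stop the broker app, keep the Erlang VM running
rabbitmqctl join_cluster "$PEER"
rabbitmqctl start_app

# housekeeping: drop the node that was lost
rabbitmqctl forget_cluster_node "$DEAD_NODE"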

Is there a standard tool similar to DNS, but for mapping names to hostname/port number combos?

I have a number of services running on various machines which need to communicate over arbitrary ports. Right now port discovery happens by pushing a config file to each machine which contains mappings of a service-name to a hostname/port combo.
For all the same reasons that DNS works better than manually maintaining an /etc/hosts on each machine, I'd like to have a centralized system to register and lookup these hostname/port combos.
Yes, building a simple version of this system wouldn't take long at all (it's just a key-value store), but ideally the service would be fast, redundant, auto-updating and have fail-over, which would obviously take a bit more time to build from scratch.
I can't imagine I'm the first to need such a tool, but so far my Google-fu has failed me. Is there something out there built for this purpose? Or should I just set up Kyoto Tycoon or ZooKeeper and write a bit of caching/lookup/failover logic myself?
DNS supports SRV records, which are designed for exactly this (service location).
SRV records are of the following form (courtesy Wikipedia):
_service._proto.name TTL class SRV priority weight port target
service: the symbolic name of the desired service.
proto: the transport protocol of the desired service; this is usually either TCP or UDP.
name: the domain name for which this record is valid.
TTL: standard DNS time to live field.
class: standard DNS class field (this is always IN).
priority: the priority of the target host, lower value means more preferred.
weight: A relative weight for records with the same priority.
port: the TCP or UDP port on which the service is to be found.
target: the canonical hostname of the machine providing the service.
Most modern DNS servers support SRV records.
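As a concrete illustration, a record for a hypothetical service on port 8443 might look like this in the example.com zone (all names and numbers are made up):

_myservice._tcp.example.com. 3600 IN SRV 10 5 8443 app1.example.com.

Clients can then resolve it with an ordinary SRV query, e.g. dig +short SRV _myservice._tcp.example.com, and get back the priority, weight, port and target host in one answer.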
Avahi (a.k.a. Apple's Bonjour) advertises the services (by port) that each machine offers.
Not sure if it's exactly what you're looking for, but it's definitely in this vein.
The concept is that each machine would announce what services it is running on each port.
But this is limited to a LAN implementation, which I'm not sure fits your requirements.
To add a little more meat to this answer, here is an example service file for Avahi advertising a webpage:
<?xml version="1.0" standalone='no'?><!--*-nxml-*-->
<!DOCTYPE service-group SYSTEM "avahi-service.dtd">
<service-group>
  <name replace-wildcards="yes">%h Web Server</name>
  <service>
    <type>_http._tcp</type>
    <port>80</port>
  </service>
</service-group>
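Assuming avahi-utils is installed, other machines on the LAN can then discover that service with something like:

avahi-browse --resolve --terminate _http._tcp

which prints the advertising host, its address and the port for each instance found.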
I personally think ZooKeeper is a great fit for this use case. Ephemeral nodes mean that registration cleanup is not a problem, freeing you to use dynamic port allocation on the server side, and watches will help with rebalancing client-to-server mappings. That said, using ZooKeeper for server-side registration and DNS SRV records for client-side lookup (via a ZooKeeper-to-DNS bridge) would work well for most use cases.
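As a quick sketch of the registration side using the zkCli.sh shell that ships with ZooKeeper (the paths, ensemble address and host:port payload are placeholders):

$ bin/zkCli.sh -server zk1.example.com:2181           # connect to the ensemble
create /services ""
create /services/web ""
create -e -s /services/web/member- "10.0.0.5:8443"    # ephemeral + sequential registration
ls /services/web                                      # clients list children to find live instances

Because the registration znode is ephemeral, it disappears as soon as the registering session ends, so a crashed server drops out of the listing without any cleanup; a real service would hold the session open from inside the server process (typically via a client library) rather than from zkCli.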
