
Setting Up Etcd Cluster With DNS Discovery


Setting up an Etcd cluster with DNS discovery can be challenging, not least because it involves several building blocks:

  • Etcd – a distributed key-value store
  • Amazon EC2 – a cloud computing service
  • Cloudflare – a DNS provider
  • Chef – a configuration management tool for setting up individual nodes

Each of them has its pitfalls, but we will guide you through the whole process.

DNS Discovery

Any clustered system needs a way to maintain the list of its nodes. Usually, you specify all cluster members when starting a node; this is how ZooKeeper and Consul work. Effectively, the configuration is redundant: every node stores its own copy of the node list. The copies must stay consistent, which is hard to guarantee, especially for a long-lived cluster: old nodes break and get removed, new nodes get added, and the cluster may grow or shrink over time. All of that makes maintaining the cluster configuration burdensome and error-prone.

DNS discovery is Etcd’s killer feature. It means that you keep the list of cluster nodes in one place, and all interested parties (cluster nodes, clients, monitoring agents) get it from DNS. Because there is a single copy of the list, there are no per-node copies that can drift apart and produce an inconsistent view of the cluster. The Etcd team should advertise this feature in capital letters on their website.

Process Overview

Three nodes will form the cluster. This is the smallest cluster that can tolerate the failure of one node, whether planned or not.
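Etcd keeps its data consistent with the Raft consensus protocol, so the cluster stays available only while a majority (quorum) of its members is up. The arithmetic behind the choice of three nodes is small enough to spell out in a short Python sketch:

```python
def quorum(size: int) -> int:
    """Smallest majority of `size` voting members."""
    return size // 2 + 1


def fault_tolerance(size: int) -> int:
    """How many members may fail while a majority is still available."""
    return size - quorum(size)


# Print the quorum and failure tolerance for clusters of 1 to 5 nodes.
for n in range(1, 6):
    print(f"{n} nodes: quorum {quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Running it prints one line per cluster size: one or two nodes tolerate no failures at all, three tolerate one, and a fourth node still tolerates only one. That is why odd cluster sizes are the usual choice.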

For each node, we start an Amazon EC2 instance and create its DNS records.

When all instances are ready and the DNS records are in place, we start Etcd on all three nodes simultaneously.

Figure: Etcd cluster setup algorithm

Starting Amazon EC2 Instance

We will illustrate each step with excerpts from real code. Unimportant parts are skipped, so copy & paste won’t work. 🙂

The function setup_new_cluster() starts a cluster of a given size.
It calls launch_ec2_instance() in a loop; each call starts one EC2 instance and waits until it is reachable via SSH.
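A minimal sketch of these two functions, assuming the boto3 SDK and instances that receive a public IP address. Only the function names come from the text. The parameters (a boto3 EC2 client, AMI ID, instance type, key pair name), the injected `launch` callable and the helper `wait_for_ssh()` are hypothetical, and the real TwinDB code likely differs. Note that `wait_for_ssh()` only checks that port 22 accepts TCP connections, not that an SSH login succeeds.

```python
import socket
import time


def wait_for_ssh(host, port=22, timeout=300.0, interval=5.0):
    """Poll until a TCP connection to host:port succeeds; return False on timeout.

    This only proves that something listens on the SSH port,
    not that an SSH login works.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def launch_ec2_instance(ec2, image_id, instance_type, key_name):
    """Start one instance with a boto3 EC2 client; return its public IP once SSH is up."""
    response = ec2.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        KeyName=key_name,
        MinCount=1,
        MaxCount=1,
    )
    instance_id = response["Instances"][0]["InstanceId"]
    # Block until EC2 reports the instance as "running".
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    described = ec2.describe_instances(InstanceIds=[instance_id])
    public_ip = described["Reservations"][0]["Instances"][0]["PublicIpAddress"]
    if not wait_for_ssh(public_ip):
        raise RuntimeError(f"{instance_id} at {public_ip} is not reachable via SSH")
    return public_ip


def setup_new_cluster(size, launch):
    """Call the zero-argument `launch` callable `size` times, one instance after another."""
    return [launch() for _ in range(size)]
```

With boto3 installed, a call might look like setup_new_cluster(3, lambda: launch_ec2_instance(boto3.client("ec2"), image_id, "t2.micro", key_name)). Here, image_id, the instance type and key_name are placeholders. Passing `launch` in as a callable keeps the loop independent of AWS, which also makes it easy to test.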

Creating DNS Records

Three DNS records must be created for each node:

  1. An A record that resolves the host name to an IP address.
  2. An SRV record that tells which TCP port the node uses to talk to other cluster nodes.
  3. An SRV record that tells which TCP port the node uses to serve client requests.
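To make the list concrete, here is a small sketch that renders the three records per node in zone-file syntax. The SRV names `_etcd-server._tcp` and `_etcd-client._tcp` and the default ports 2380 (peers) and 2379 (clients) are the ones etcd’s DNS discovery uses for plain-HTTP clusters. TLS clusters use `_etcd-server-ssl` and `_etcd-client-ssl` instead. The domain, host names, IPs and TTL below are made up.

```python
from dataclasses import dataclass

ETCD_PEER_PORT = 2380    # etcd's default port for peer (server-to-server) traffic
ETCD_CLIENT_PORT = 2379  # etcd's default port for client requests


@dataclass(frozen=True)
class Node:
    hostname: str  # short host name, e.g. "etcd-0" (hypothetical naming scheme)
    ip: str        # IP address of the EC2 instance


def etcd_dns_records(domain, nodes, ttl=300):
    """Return zone-file lines: one A record and two SRV records per node."""
    records = []
    for node in nodes:
        fqdn = f"{node.hostname}.{domain}."
        records.append(f"{fqdn} {ttl} IN A {node.ip}")
        # SRV data is "priority weight port target"; equal priority and
        # weight make all members equivalent.
        records.append(
            f"_etcd-server._tcp.{domain}. {ttl} IN SRV 0 0 {ETCD_PEER_PORT} {fqdn}"
        )
        records.append(
            f"_etcd-client._tcp.{domain}. {ttl} IN SRV 0 0 {ETCD_CLIENT_PORT} {fqdn}"
        )
    return records


nodes = [Node(f"etcd-{i}", f"10.0.0.{10 + i}") for i in range(3)]
for line in etcd_dns_records("example.com", nodes):
    print(line)
```

For three nodes this yields nine records. The two SRV record sets each hold three entries pointing at the same host names and differ only in the port.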

Eventually, we should get the following DNS records:

A cluster node requests the _etcd-server._tcp SRV records when it needs to know who its peers are and which ports they listen on.

Clients request the _etcd-client._tcp SRV records to learn which host names and ports serve client requests.

And finally, the A records resolve those host names to IP addresses.

At TwinDB, we use CloudFlare to host the twindb.com zone. CloudFlare provides an API, which we are going to use.
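The snippet below is a rough sketch of creating one DNS record through Cloudflare’s v4 REST API using only the standard library. It builds the request and leaves sending it to a separate function. The endpoint `POST /zones/{zone_id}/dns_records` and bearer-token authentication follow Cloudflare’s public API documentation. The zone ID, token and record values are placeholders. The SRV field layout (a record name like `_etcd-server._tcp.example.com` plus a `data` object with `priority`, `weight`, `port` and `target`) is an assumption to check against the current API reference.

```python
import json
import urllib.request

CF_API = "https://api.cloudflare.com/client/v4"


def create_record_request(zone_id, token, record):
    """Build (but do not send) a POST request that creates `record` in the zone."""
    return urllib.request.Request(
        f"{CF_API}/zones/{zone_id}/dns_records",
        data=json.dumps(record).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def a_record(name, ip, ttl=300):
    """Payload for an A record."""
    return {"type": "A", "name": name, "content": ip, "ttl": ttl}


def srv_record(service, domain, port, target, ttl=300):
    """Payload for an SRV record (assumed layout, see the lead-in)."""
    return {
        "type": "SRV",
        "name": f"{service}._tcp.{domain}",
        "ttl": ttl,
        "data": {"priority": 0, "weight": 0, "port": port, "target": target},
    }


def send(request):
    """Send the request and return Cloudflare's decoded JSON reply."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.loads(resp.read())
```

Keeping payload construction separate from `send()` makes it possible to inspect or log every record before anything touches the live zone.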

For reference, this is the code that works with the CloudFlare API:

It takes time before the DNS changes we made propagate and become visible on a node, so we have to wait until DNS is ready:
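Below is a minimal sketch of such a wait loop. A generic `wait_until()` polls a check function. The A-record check uses the standard library resolver. The SRV check relies on the third-party dnspython package (version 2.0 or later, imported lazily), because the standard library cannot query SRV records. All function names are hypothetical. The checks see DNS through the resolver of whatever machine runs them, so to verify what a node sees, run them on that node.

```python
import socket
import time


def wait_until(check, timeout=600.0, interval=10.0):
    """Call `check()` until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def a_record_ready(fqdn, expected_ip):
    """True if the local resolver already returns the expected IPv4 address."""
    try:
        return socket.gethostbyname(fqdn) == expected_ip
    except OSError:  # socket.gaierror: the name does not resolve (yet)
        return False


def srv_ready(name, expected):
    """True once the SRV set for `name` equals `expected` (target, port) pairs."""
    import dns.exception
    import dns.resolver  # third-party dnspython >= 2.0

    try:
        answer = dns.resolver.resolve(name, "SRV")
    except dns.exception.DNSException:  # NXDOMAIN, NoAnswer, timeouts, ...
        return False
    found = sorted((rr.target.to_text(), rr.port) for rr in answer)
    return found == sorted(expected)
```

A call might look like wait_until(lambda: srv_ready("_etcd-server._tcp.example.com", expected_pairs)), where expected_pairs lists fully qualified targets with a trailing dot, e.g. ("etcd-0.example.com.", 2380).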

At this point, we should have three Amazon EC2 instances up and running and the DNS records in place.

Bootstrapping Etcd Node

We use the etcd Chef recipe to set up the cluster. There are two gotchas with the recipe:

  1. By default, it installs an ancient Etcd version, 2.2.5, which is buggy.
  2. The recipe installs an init script that fails when you start the first node (see Bug#63 for details). By the way, as of the time of writing I have received no feedback from the Chef team, but they did not forget to send me a bunch of sales cold calls and spam. Look up to the Etcd team: they are extremely responsive, even on weekends.

Etcd Recipe Attributes

We need to specify only one attribute – the domain name.
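One attribute is enough because, with DNS discovery, each node derives its cluster configuration from the SRV records of that domain plus its own host name. The sketch below builds the etcd command line a node might end up running. The flags are etcd’s own documented options, and the pattern follows the DNS discovery example in etcd’s clustering guide. How the Chef recipe actually renders them, the host naming scheme and the cluster token value are assumptions.

```python
def etcd_args(domain, hostname, token="etcd-cluster-1"):
    """Build etcd flags for a DNS-discovery bootstrap of one member.

    `hostname` is this node's short name (e.g. "etcd-0"); its FQDN must
    match a target of the _etcd-server._tcp SRV records. The token value
    is hypothetical.
    """
    fqdn = f"{hostname}.{domain}"
    return [
        "etcd",
        "--name", hostname,
        # Peers are found via SRV records instead of a static --initial-cluster.
        "--discovery-srv", domain,
        "--initial-advertise-peer-urls", f"http://{fqdn}:2380",
        "--initial-cluster-token", token,
        "--initial-cluster-state", "new",
        "--advertise-client-urls", f"http://{fqdn}:2379",
        "--listen-client-urls", "http://0.0.0.0:2379",
        "--listen-peer-urls", "http://0.0.0.0:2380",
    ]


print(" ".join(etcd_args("example.com", "etcd-0")))
```

Nothing in the argument list names the other members. The domain alone tells etcd where to look them up.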

Etcd Recipe

Once the recipe is ready (we use our own Chef server), we can bootstrap the three cluster nodes. Remember, they need to be started simultaneously.
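As we understand it, simultaneous starts matter because a new member cannot finish bootstrapping until enough peers are up to form a quorum. A tool that waits for etcd on one node before moving on to the next can therefore stall on the first node. A minimal sketch of starting all nodes at once with a thread pool follows. `bootstrap_node` stands for whatever bootstraps a single node (for example, a wrapper around a Chef bootstrap command) and is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def bootstrap_all(hosts, bootstrap_node):
    """Run `bootstrap_node(host)` for every host concurrently.

    Returns a dict host -> exception (or None on success), so one failed
    node does not hide the outcome of the others.
    """
    results = {}
    # One worker per host, so no bootstrap waits in a queue behind another.
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        futures = {host: pool.submit(bootstrap_node, host) for host in hosts}
        for host, future in futures.items():
            results[host] = future.exception()  # blocks until this bootstrap ends
    return results
```

Collecting exceptions per host, instead of failing on the first one, makes it easy to see whether a single node or the whole cluster failed to come up.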

Code to bootstrap one node:

Checking Health of the Etcd Cluster

Now we can talk to the Etcd cluster from any host where etcdctl is installed:
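For scripted monitoring, a similar check can be done over HTTP. The sketch below assumes each member serves etcd’s `/health` endpoint on its client URL and answers with JSON such as `{"health": "true"}` (a string, not a JSON boolean). The client URLs could be taken from the `_etcd-client._tcp` SRV records. The function names are hypothetical.

```python
import json
import urllib.request


def parse_health(body):
    """Interpret the JSON body returned by etcd's /health endpoint."""
    # Extra fields that some etcd releases add (e.g. "reason") are ignored.
    return json.loads(body).get("health") == "true"


def member_healthy(client_url, timeout=5.0):
    """True if the member at `client_url` (e.g. http://host:2379) reports healthy."""
    try:
        with urllib.request.urlopen(f"{client_url}/health", timeout=timeout) as resp:
            return parse_health(resp.read())
    except (OSError, ValueError):
        # Unreachable host, HTTP error status (URLError/HTTPError are OSErrors)
        # or a malformed body all count as unhealthy.
        return False


def cluster_health(client_urls):
    """Map each client URL to its health status."""
    return {url: member_healthy(url) for url in client_urls}
```

Treating every error as "unhealthy" keeps the result a plain yes/no per member, which is what a monitoring agent usually needs.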

Happy service discovery!
