Setting Up Etcd Cluster With DNS Discovery

Setting up an Etcd cluster with DNS discovery can be challenging. There are several building blocks:

  • Etcd – a distributed key-value store
  • Amazon EC2 – a cloud computing provider
  • Cloudflare – a DNS provider
  • Chef – for configuring individual nodes

Each of them has its pitfalls, but we will guide you through the whole process.

DNS Discovery

Any clustered system needs a way to maintain the list of nodes in a cluster. Usually, you have to specify all cluster members when starting a node; this is how ZooKeeper and Consul work. As a result, the configuration is redundant: every node stores its own copy of the member list. The list must be consistent, and that is hard to guarantee, especially if the cluster lives for a long time. Old nodes break and get removed, new nodes get added, and the cluster may grow or shrink over time. All of that makes maintaining the cluster configuration burdensome and error-prone.

DNS discovery is Etcd’s killer feature. It means that you keep the list of cluster nodes in one place, DNS. All interested parties (cluster nodes, clients, monitoring agents) get the list from DNS. Since there is a single copy of the list, there is no chance of an inconsistent view of the cluster. The Etcd team should advertise this feature in capital letters on their website.

Process Overview

Three nodes will form the cluster. This is the minimum number of nodes for a cluster that tolerates one failure, whether planned or not.
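Why three: Etcd needs a majority (quorum) of members to be available, so a cluster of n members can lose n - (n // 2 + 1) of them. A small sketch of that arithmetic:

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` nodes."""
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """How many members can fail while a majority is still available."""
    return members - quorum(members)


# Print cluster size, quorum and tolerated failures for 1..5 members
for n in range(1, 6):
    print(n, quorum(n), failure_tolerance(n))
```

One or two members tolerate no failures, three tolerate one, and a fourth member adds no extra tolerance (quorum grows to three), which is why odd cluster sizes are usual.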

For each node, we start an Amazon EC2 instance and create a DNS record.

When all instances are ready and DNS is prepared, we start Etcd on all three nodes simultaneously.

Figure: Etcd cluster setup algorithm

Starting Amazon EC2 Instance

We will illustrate each step with excerpts from our real code. Unimportant parts are skipped, so copy and paste won’t work. 🙂

INSTANCE_TYPE = "t2.micro"
KEYPAIR_NAME = "deployer"
SUBNET_ID = "subnet-12c7b638"   # private, infra
SECURITY_GROUP = "sg-f525808e"  # default VPC security group
ZONE_NAME = "twindb.com"
SSH_OPTIONS = "-o StrictHostKeyChecking=no"

Function setup_new_cluster() starts a cluster of a given size.
It calls launch_ec2_instance() in a loop; each call starts one EC2 instance and waits until it is reachable over SSH.

import uuid
from tempfile import NamedTemporaryFile


def setup_new_cluster(size=CLUSTER_SIZE):

    Log.info("Initiating cluster with %d nodes" % size)

    nodes = []
    # Temporary file that holds the deployer's private SSH key
    # (bufsize=0 is the Python 2 spelling; Python 3 uses buffering=0)
    deployer_key_file = NamedTemporaryFile(bufsize=0)

    for i in range(size):
        node_id = uuid.uuid1().hex
        Log.info("Configuring node %s" % node_id)

        node_name = "etcd-%s" % node_id

        Log.info("Starting instance for node %s" % node_id)
        instance_id = launch_ec2_instance(AWS_DEFAULT_AMI, INSTANCE_TYPE, KEYPAIR_NAME, SECURITY_GROUP, SUBNET_ID,
                                          ROOT_VOLUME_SIZE, DATA_VOLUME_SIZE,
                                          get_instance_username_by_ami(AWS_DEFAULT_AMI), deployer_key_file.name,
                                          public=False, name=node_name)
        if not instance_id:
            Log.error("Failed to launch EC2 instance")
            return False

        Log.info("Launched instance %s" % instance_id)
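launch_ec2_instance() itself is not shown in the article. Its "wait until reachable over SSH" step could look roughly like the sketch below, which only checks that TCP port 22 accepts connections. The name wait_for_port and the timeout values are our own assumptions, not the original code.

```python
import socket
import time


def wait_for_port(host, port=22, timeout=300.0, interval=5.0):
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Returns True when the port accepts connections, False on timeout.
    A successful connect only means sshd is listening; it does not
    prove that key-based login already works.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable or timed out: the instance is not ready yet
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A plain TCP probe keeps the sketch dependency-free; a stricter check would run a trivial command over ssh once the port is open.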

Creating DNS Records

Three DNS records must be created for each node:

  1. An A record that resolves the node’s host name into its IP address.
  2. An SRV record that tells other cluster nodes which TCP port the node uses for peer traffic (2380).
  3. An SRV record that tells clients which TCP port the node uses for client requests (2379).
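To make the list concrete, the sketch below renders the three records for one node in zone-file notation. The helper name node_records, the TTL and the example IP address are illustrative assumptions; the ports and record names match the ones used in this article.

```python
PEER_PORT = 2380    # port other cluster members connect to
CLIENT_PORT = 2379  # port clients connect to


def node_records(domain, host, ip, ttl=300):
    """Return the A and SRV records one Etcd node needs, zone-file style.

    SRV data is "priority weight port target"; every node gets
    priority 0 and weight 0, as in the records shown in this article.
    """
    fqdn = "%s.%s." % (host, domain)
    return [
        "%s %d IN A %s" % (fqdn, ttl, ip),
        "_etcd-server._tcp.%s. %d IN SRV 0 0 %d %s" % (domain, ttl, PEER_PORT, fqdn),
        "_etcd-client._tcp.%s. %d IN SRV 0 0 %d %s" % (domain, ttl, CLIENT_PORT, fqdn),
    ]


# Hypothetical host name and private IP, for illustration only
for line in node_records("twindb.com", "etcd-example", "10.0.0.5"):
    print(line)
```

Note that the two SRV names are shared by all nodes; each node adds one more answer to them.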

Eventually, we should get the following DNS records:

A cluster node looks up these SRV records to learn which peers exist and which ports they listen on.

$ dig +noall +answer SRV _etcd-server._tcp.twindb.com
_etcd-server._tcp.twindb.com. 299 IN SRV 0 0 2380 etcd-1e1650524ba511e68a9b12cb523caae1.twindb.com.
_etcd-server._tcp.twindb.com. 299 IN SRV 0 0 2380 etcd-6794c99e4ba411e68a9b12cb523caae1.twindb.com.
_etcd-server._tcp.twindb.com. 299 IN SRV 0 0 2380 etcd-c3e9dea04ba411e68a9b12cb523caae1.twindb.com.
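Roughly speaking, each SRV answer becomes one peer URL (target host plus port). The sketch below does that conversion for dig-style output. It is a simplified illustration (plain http, priority and weight ignored), not Etcd’s actual bootstrapping code; peer_urls is our own name.

```python
def peer_urls(dig_answer, scheme="http"):
    """Convert `dig +noall +answer SRV ...` lines into sorted peer URLs.

    Each answer line has 8 fields:
      NAME TTL IN SRV PRIORITY WEIGHT PORT TARGET
    """
    urls = []
    for line in dig_answer.strip().splitlines():
        fields = line.split()
        if len(fields) != 8 or fields[3] != "SRV":
            continue  # skip anything that is not an SRV answer
        port, target = int(fields[6]), fields[7].rstrip(".")
        urls.append("%s://%s:%d" % (scheme, target, port))
    return sorted(urls)


answer = """
_etcd-server._tcp.twindb.com. 299 IN SRV 0 0 2380 etcd-1e1650524ba511e68a9b12cb523caae1.twindb.com.
_etcd-server._tcp.twindb.com. 299 IN SRV 0 0 2380 etcd-6794c99e4ba411e68a9b12cb523caae1.twindb.com.
_etcd-server._tcp.twindb.com. 299 IN SRV 0 0 2380 etcd-c3e9dea04ba411e68a9b12cb523caae1.twindb.com.
"""
for url in peer_urls(answer):
    print(url)
```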

Clients look up these SRV records to find out which host names and ports to connect to.

$ dig +noall +answer SRV _etcd-client._tcp.twindb.com
_etcd-client._tcp.twindb.com. 299 IN    SRV 0 0 2379 etcd-6794c99e4ba411e68a9b12cb523caae1.twindb.com.
_etcd-client._tcp.twindb.com. 299 IN    SRV 0 0 2379 etcd-1e1650524ba511e68a9b12cb523caae1.twindb.com.
_etcd-client._tcp.twindb.com. 299 IN    SRV 0 0 2379 etcd-c3e9dea04ba411e68a9b12cb523caae1.twindb.com.

Finally, the A records resolve the host names.

$ dig +noall +answer etcd-6794c99e4ba411e68a9b12cb523caae1.twindb.com. etcd-1e1650524ba511e68a9b12cb523caae1.twindb.com. etcd-c3e9dea04ba411e68a9b12cb523caae1.twindb.com.
etcd-6794c99e4ba411e68a9b12cb523caae1.twindb.com. 299 IN A
etcd-1e1650524ba511e68a9b12cb523caae1.twindb.com. 299 IN A
etcd-c3e9dea04ba411e68a9b12cb523caae1.twindb.com. 299 IN A

At TwinDB, we use CloudFlare to host the twindb.com zone. CloudFlare provides an API, which we use to create the records.

def setup_new_cluster(size=CLUSTER_SIZE):
        # ... still inside the "for i in range(size):" loop ...
        dns_record_name = node_name + "." + DISCOVERY_SRV_DOMAIN
        private_ip = get_instance_private_ip(instance_id)
        if not create_dns_record(dns_record_name, ZONE_NAME, private_ip):
            Log.error("Failed to create an A DNS record for %s" % dns_record_name)
            return False

        if not create_dns_record("_etcd-server._tcp." + DISCOVERY_SRV_DOMAIN,  # "_etcd-server._tcp.twindb.com"
                                 ZONE_NAME,
                                 "0\t2380\t%s" % dns_record_name,
                                 data={
                                     "name": DISCOVERY_SRV_DOMAIN,
                                     "port": 2380,
                                     "priority": 0,
                                     "proto": "_tcp",
                                     "service": "_etcd-server",
                                     "target": dns_record_name,
                                     "weight": 0
                                 },
                                 record_type="SRV"):
            Log.error("Failed to create an SRV record for %s" % dns_record_name)
            Log.error("Trying to delete DNS record for %s" % dns_record_name)
            delete_dns_record(dns_record_name, ZONE_NAME)
            Log.error("Trying to terminate instance %s" % instance_id)
            # (instance termination code omitted)
            return False

        if not create_dns_record("_etcd-client._tcp." + DISCOVERY_SRV_DOMAIN,  # "_etcd-client._tcp.twindb.com"
                                 ZONE_NAME,
                                 "0\t2379\t%s" % dns_record_name,
                                 data={
                                     "name": DISCOVERY_SRV_DOMAIN,
                                     "port": 2379,
                                     "priority": 0,
                                     "proto": "_tcp",
                                     "service": "_etcd-client",
                                     "target": dns_record_name,
                                     "weight": 0
                                 },
                                 record_type="SRV"):
            Log.error("Failed to create an SRV record for %s" % dns_record_name)
            Log.error("Trying to delete DNS record for %s" % dns_record_name)
            delete_dns_record(dns_record_name, ZONE_NAME)
            Log.error("Trying to terminate instance %s" % instance_id)
            # (instance termination code omitted)
            return False

For reference, this is the code that works with the CloudFlare API:

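import json
from subprocess import Popen, PIPE


def cf_api_call(url, method="GET", data=None):

    cmd = ["curl", "--silent", "-X", method,
           "https://api.cloudflare.com/client/v4%s" % url,
           "-H", "X-Auth-Email: %s" % CLOUDFLARE_EMAIL,
           "-H", "X-Auth-Key: %s" % CLOUDFLARE_AUTH_KEY,
           "-H", "Content-Type: application/json"]
    if data:
        # Request body (a JSON string) for POST requests
        cmd += ["--data", data]
    try:
        Log.debug("Executing: %r" % cmd)
        cf_process = Popen(cmd, stdout=PIPE, stderr=PIPE)
        cout, cerr = cf_process.communicate()

        if cf_process.returncode != 0:
            Log.error("curl failed: %s" % cerr)
            return None

        try:
            return json.loads(cout)

        except ValueError as err:
            Log.error("Failed to parse CloudFlare response: %s" % err)
            return None

    except OSError as err:
        Log.error("Failed to execute curl: %s" % err)
        return None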
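Shelling out to curl works, but the same request can be built with Python’s standard library. A sketch, assuming Python 3 and the same X-Auth-Email/X-Auth-Key authentication used above; build_cf_request and send_cf_request are hypothetical helpers, not part of the original code.

```python
import json
import urllib.request

CF_API_BASE = "https://api.cloudflare.com/client/v4"


def build_cf_request(url, email, auth_key, method="GET", data=None):
    """Build (but do not send) a CloudFlare API v4 request."""
    body = json.dumps(data).encode("utf-8") if data is not None else None
    return urllib.request.Request(
        CF_API_BASE + url,
        data=body,
        method=method,
        headers={
            "X-Auth-Email": email,
            "X-Auth-Key": auth_key,
            "Content-Type": "application/json",
        },
    )


def send_cf_request(request, timeout=30):
    """Send the request and decode the JSON response.

    urlopen() raises urllib.error.HTTPError for 4xx/5xx replies; CloudFlare
    puts the error details in the body of that reply.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Passing the body as a dict and serializing it in one place avoids building shell command lines and keeps the API key out of the process list.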

def create_dns_record(name, zone, content, data=None, record_type="A", ttl=1):

    zone_id = get_zone_id(zone)

    url = "/zones/%s/dns_records" % zone_id
    request = {
        "name": name,
        "content": content,
        "type": record_type,
        "ttl": ttl  # 1 means "automatic" TTL in CloudFlare
    }

    if data:
        request["data"] = data

    response = cf_api_call(url, method="POST", data=json.dumps(request))

    # cf_api_call() returns None if curl fails or the reply is not JSON
    if not response:
        return False

    if not response["success"]:
        for error in response["errors"]:
            Log.error("Error(%d): %s" % (error["code"], error["message"]))

    return bool(response["success"])
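get_zone_id() is not shown in the article. One way to write it, assuming CloudFlare’s List Zones endpoint (GET /zones filtered with ?name=) and the same "success"/"result" response envelope that create_dns_record() relies on. Unlike the original one-argument call, this hypothetical version takes the API function as a parameter so it can be tried without network access.

```python
from urllib.parse import urlencode


def get_zone_id(zone, api_call):
    """Return the CloudFlare zone ID for `zone`, or None if not found.

    `api_call(url)` should behave like cf_api_call(): return the decoded
    JSON response, or None on failure.
    """
    response = api_call("/zones?" + urlencode({"name": zone}))
    if not response or not response.get("success"):
        return None
    for item in response.get("result", []):
        if item.get("name") == zone:
            return item.get("id")
    return None


def fake_api_call(url):
    """Stand-in for cf_api_call() with a canned reply (hypothetical zone ID)."""
    if url == "/zones?name=twindb.com":
        return {"success": True, "result": [{"id": "0123abcd", "name": "twindb.com"}]}
    return {"success": True, "result": []}


print(get_zone_id("twindb.com", fake_api_call))
```

Since the zone ID never changes, caching it after the first lookup would save one API call per record.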

It takes time for our DNS changes to propagate and become visible on a node, so we wait until DNS is ready:

import socket
import time


def setup_new_cluster(size=CLUSTER_SIZE):
        # wait till dns_record_name resolves into private_ip
        Log.info("Waiting till DNS changes are propagated")
        while True:
            try:
                ip = socket.gethostbyname(dns_record_name)
                if ip == private_ip:
                    break
                Log.error("%s resolved into unexpected %s" % (dns_record_name, ip))
            except socket.gaierror:
                # The name does not resolve yet
                pass
            time.sleep(1)  # pause between attempts instead of spinning

        # Save node in a list. We will need it later
        nodes.append({
            'key': deployer_key_file.name,
            'ip': private_ip,
            'name': node_name
        })
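The loop above waits forever if the record never appears. A variant with a deadline, sketched under our own assumptions (the name wait_for_dns and the timeout and interval values are not from the original code); the resolver is a parameter so the logic can be tested without real DNS.

```python
import socket
import time


def wait_for_dns(name, expected_ip, timeout=600.0, interval=5.0,
                 resolve=socket.gethostbyname):
    """Poll until `name` resolves to `expected_ip`.

    Returns True on success, False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if resolve(name) == expected_ip:
                return True
        except socket.gaierror:
            pass  # the name does not resolve yet (a negative answer may be cached)
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

On timeout, the caller can clean up the DNS records and the instance, just like the error paths in the record-creation code.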

At this point, we should have three Amazon EC2 instances up and running and the DNS records ready.

Bootstrapping Etcd Node

We use a Chef recipe to set up the Etcd cluster. There are two gotchas with it:

  1. By default, it installs Etcd 2.2.5, an ancient and buggy version.
  2. The recipe installs an init script that fails when you start the first node (see Bug #63 for details). By the way, as of the time of writing, I have had no feedback from the Chef team, but they didn’t forget to send me a bunch of sales cold calls and spam. Take a cue from the Etcd team: they are extremely responsive, even on weekends.

Etcd Recipe Attributes

We need to specify only one attribute – the domain name.

default['etcd-server']['discovery_srv'] = 'twindb.com'

Etcd Recipe

# Pin a recent Etcd release instead of the cookbook's default 2.2.5
etcd_installation 'default' do
  version '3.0.2'
  action :create
end

etcd_service node.default['chef_client']['config']['node_name'] do
  discovery_srv node.default['etcd-server']['discovery_srv']

  initial_advertise_peer_urls 'http://' + node.default['chef_client']['config']['node_name'] + '.twindb.com:2380'
  advertise_client_urls 'http://' + node.default['chef_client']['config']['node_name'] + '.twindb.com:2379'

  initial_cluster_token 'etcd-cluster-1'
  initial_cluster_state 'new'

  listen_client_urls ''
  listen_peer_urls ''
  data_dir '/var/lib/etcd'
  action :start
end
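For readers who do not use Chef, the attributes above roughly correspond to Etcd command-line flags. The sketch below builds that flag list; the mapping is our reading of the recipe, not the cookbook’s generated command, we assume the resource name becomes the member’s --name, and the listen URLs are left out.

```python
def etcd_flags(node_name, domain, data_dir="/var/lib/etcd",
               token="etcd-cluster-1", state="new"):
    """Build Etcd command-line flags mirroring the Chef recipe attributes."""
    fqdn = "%s.%s" % (node_name, domain)
    return [
        "--name=%s" % node_name,
        "--discovery-srv=%s" % domain,                          # discovery_srv
        "--initial-advertise-peer-urls=http://%s:2380" % fqdn,  # peer SRV port
        "--advertise-client-urls=http://%s:2379" % fqdn,        # client SRV port
        "--initial-cluster-token=%s" % token,
        "--initial-cluster-state=%s" % state,
        "--data-dir=%s" % data_dir,
    ]


# Hypothetical node name, for illustration only
print(" ".join(["etcd"] + etcd_flags("etcd-example", "twindb.com")))
```

The advertised peer URL must use the same host name and port as the node’s _etcd-server SRV record, otherwise the member cannot find itself in the DNS-derived cluster list.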

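When the recipe is ready (we use our own Chef server), we can bootstrap the three cluster nodes. Remember, they must be started simultaneously: a new Etcd cluster cannot elect a leader until a majority of its members are running, so a node started alone would wait for peers that are not there yet.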

from multiprocessing import Pool


def setup_new_cluster(size=CLUSTER_SIZE):
    # Bootstrap all nodes in parallel, one worker process per node
    pool = Pool(processes=size)
    pool.map(bootstrap_node, nodes)
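pool.map() returns the value of bootstrap_node() for every node, so failures can be reported instead of ignored. A sketch that swaps in a thread pool for the process pool (the real work happens in the knife subprocess, so threads are enough); bootstrap_all and the stand-in bootstrap functions in the tests are hypothetical.

```python
from multiprocessing.pool import ThreadPool


def bootstrap_all(nodes, bootstrap_fn):
    """Run bootstrap_fn on all nodes at once; return names of failed nodes.

    All nodes start together because a new cluster cannot elect a
    leader until a majority of its members is up.
    """
    if not nodes:
        return []
    pool = ThreadPool(processes=len(nodes))
    try:
        results = pool.map(bootstrap_fn, nodes)  # results keep input order
    finally:
        pool.close()
        pool.join()
    return [node["name"] for node, ok in zip(nodes, results) if not ok]
```

With the article’s code this would be called as bootstrap_all(nodes, bootstrap_node).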

Code to bootstrap one node:

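import os
from os.path import isfile
from subprocess import CalledProcessError


def bootstrap(key, ip, node_name):

    try:
        username = get_instance_username_by_ami(AWS_DEFAULT_AMI)

        # Forget any old host key for this IP: a new instance may reuse
        # the private IP of a terminated one
        hosts_file = os.environ['HOME'] + "/.ssh/known_hosts"
        if isfile(hosts_file):
            run_command("ssh-keygen -f " + hosts_file + " -R " + ip)

        cmd = "knife bootstrap " + ip \
              + " --ssh-user " + username \
              + " --sudo --identity-file " + key \
              + " --node-name " + node_name \
              + " --yes" \
              + " --run-list 'recipe[etcd-server]'"
        # run_command() raises CalledProcessError if knife fails
        run_command(cmd)

    except CalledProcessError as err:
        Log.error("Failed to bootstrap %s: %s" % (ip, err))
        return False

    return True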
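Building the knife command by string concatenation breaks if an argument contains spaces or shell metacharacters. A sketch of the same invocation as an argument list that can run without a shell; knife_bootstrap_args and run_knife_bootstrap are hypothetical helpers, and the flags are exactly those used above.

```python
import subprocess


def knife_bootstrap_args(ip, username, key, node_name,
                         run_list="recipe[etcd-server]"):
    """Return the `knife bootstrap` command as a list of arguments."""
    return [
        "knife", "bootstrap", ip,
        "--ssh-user", username,
        "--sudo",
        "--identity-file", key,
        "--node-name", node_name,
        "--yes",
        "--run-list", run_list,  # no quoting needed: no shell is involved
    ]


def run_knife_bootstrap(ip, username, key, node_name):
    """Run knife; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.check_call(knife_bootstrap_args(ip, username, key, node_name))
```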

def bootstrap_node(node):
    """
    Bootstrap node
    :param node: dictionary with node parameters. Dictionary must contain keys:
        key - path to SSH private key
        ip - IP address of the node
        name - node hostname
    :return: True if success or False otherwise
    """
    try:
        return bootstrap(node['key'], node['ip'], node['name'])
    except KeyError as err:
        Log.error("Node dictionary is missing key %s" % err)
        return False

Checking Health of the Etcd Cluster

Now we can talk to the Etcd cluster from any host that has etcdctl installed:

$ etcdctl --discovery-srv twindb.com cluster-health
member 83062705e5ba24af is healthy: got healthy result from http://etcd-c3e9dea04ba411e68a9b12cb523caae1.twindb.com:2379
member 9fca41c9f65e3e96 is healthy: got healthy result from http://etcd-1e1650524ba511e68a9b12cb523caae1.twindb.com:2379
member b8dfb16b4af1fd49 is healthy: got healthy result from http://etcd-6794c99e4ba411e68a9b12cb523caae1.twindb.com:2379
cluster is healthy
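In a script (a monitoring check, for example), the same output can be parsed. A sketch that relies only on the line format shown above; parse_cluster_health is our own name, and other etcdctl versions may print different text.

```python
import re

MEMBER_RE = re.compile(r"^member (\w+) is (\w+)")


def parse_cluster_health(output):
    """Parse `etcdctl cluster-health` output.

    Returns (members, cluster_ok): members maps member ID to True if that
    member reported healthy; cluster_ok is True only if the final
    'cluster is healthy' line is present.
    """
    members = {}
    cluster_ok = False
    for raw in output.splitlines():
        line = raw.strip()
        match = MEMBER_RE.match(line)
        if match:
            members[match.group(1)] = match.group(2) == "healthy"
        elif line == "cluster is healthy":
            cluster_ok = True
    return members, cluster_ok
```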

Enjoy your service discovery!

