2018-09-02

Replacing the AWS ELB - The Design

This is Part 3 in the Replacing the AWS ELB series.
  1. Replacing the AWS ELB - The Problem
  2. Replacing the AWS ELB - The Challenges
  3. Replacing the AWS ELB - The Design (this post)
  4. Replacing the AWS ELB - The Network Deep Dive
  5. Replacing the AWS ELB - Automation
  6. Replacing the AWS ELB - Final Thoughts

    So how do you go about using an IP address in a VPC and allow it to jump between availability zones?

    The solution to this problem was mentioned briefly on a slide in a re:Invent session - which, for the life of me, I could not find again (when I do, I will post the link).

    The idea is to create an "overlay" network within the VPC - which allows you to manage IP addresses even though they don't really exist in the VPC.

    A simple diagram of such a solution would look something like this:

    standard_haproxy

    Each instance would be configured with an additional virtual interface - with an IP address that was not part of the CIDR block of the VPC - that way it would not be a problem to move it from one subnet to another.
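
    As a quick sanity check, the condition "not part of the CIDR block of the VPC" can be tested with Python's standard ipaddress module. This is only a sketch: the VPC range 10.0.0.0/16 and both addresses are hypothetical values picked for illustration.

```python
import ipaddress


def is_outside_vpc(vip: str, vpc_cidr: str) -> bool:
    """Return True if the virtual IP does NOT fall inside the VPC CIDR block."""
    return ipaddress.ip_address(vip) not in ipaddress.ip_network(vpc_cidr)


# Hypothetical values: a VPC using 10.0.0.0/16 and a candidate virtual IP.
VPC_CIDR = "10.0.0.0/16"

print(is_outside_vpc("172.16.1.100", VPC_CIDR))  # True  - usable as a virtual IP
print(is_outside_vpc("10.0.3.25", VPC_CIDR))     # False - belongs to the VPC itself
```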

    If the IP address does not actually exist inside the VPC - how do you get traffic to go to it?

    That is actually a simple one to solve - by creating a specific route on each of the subnets - that routes traffic to a specific ENI (yes it is possible).

    add_route
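
    A rough sketch of what "a specific route on each of the subnets" could look like when driven from Python. It only builds the AWS CLI commands (aws ec2 create-route with --network-interface-id) and does not run them. The route table IDs, the ENI ID and the /32 host route for the virtual IP are all made-up assumptions, not values from the actual setup.

```python
from typing import List


def create_vip_route_cmds(route_table_ids: List[str], vip_cidr: str, eni_id: str) -> List[List[str]]:
    """Build one 'aws ec2 create-route' command per route table, all pointing at the same ENI."""
    return [
        [
            "aws", "ec2", "create-route",
            "--route-table-id", route_table_id,
            "--destination-cidr-block", vip_cidr,
            "--network-interface-id", eni_id,
        ]
        for route_table_id in route_table_ids
    ]


# Hypothetical IDs: one route table per subnet, and the ENI of the active haproxy instance.
commands = create_vip_route_cmds(
    ["rtb-aaaa1111", "rtb-bbbb2222"],
    "172.16.1.100/32",
    "eni-0123abcd",
)
for command in commands:
    print(" ".join(command))
```

    Each command could then be handed to subprocess.run(command, check=True) on a machine with the AWS CLI and suitable credentials.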

    The process would be something like this:

    start

    An instance tries to access the virtual IP - the traffic goes to the route table of the subnet and, because of the specific entry, it is routed to a specific instance.

    The last piece of the puzzle is how to get the route to jump from one haproxy instance to the other. This would be the initial state:

    initial

    haproxya fails or the AZ goes down

    haproxya_fail

    haproxyb recognizes this failure

    recognize_failure

    And then makes a call to the AWS API to move the route to a different ENI located on haproxyb

    move_to_haproxyb
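
    The sequence above can be modelled in a few lines. This is a toy simulation, not the real implementation: the route table is a plain dictionary, the ENI IDs are invented, and the health flag stands in for whatever failure detection is actually used.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Proxy:
    name: str
    eni_id: str
    healthy: bool = True


def fail_over(route_table: Dict[str, str], vip_cidr: str, active: Proxy, standby: Proxy) -> str:
    """Point the virtual IP route at the standby ENI if the active proxy is down.

    Returns the ENI that owns the route after the check.
    """
    if not active.healthy and standby.healthy:
        route_table[vip_cidr] = standby.eni_id
    return route_table[vip_cidr]


# Hypothetical names and IDs.
haproxya = Proxy("haproxya", "eni-aaaa1111")
haproxyb = Proxy("haproxyb", "eni-bbbb2222")
routes = {"172.16.1.100/32": haproxya.eni_id}          # initial state

haproxya.healthy = False                                # haproxya fails or the AZ goes down
owner = fail_over(routes, "172.16.1.100/32", haproxya, haproxyb)
print(owner)                                            # eni-bbbb2222
```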

    In the next post - we will go into a bit more detail on how the network is actually built and how the failover works.

    2018-08-29

    Replacing the AWS ELB - The Network Deep Dive

    This is Part 4 in the Replacing the AWS ELB series.

  1. Replacing the AWS ELB - The Problem
  2. Replacing the AWS ELB - The Challenges
  3. Replacing the AWS ELB - The Design
  4. Replacing the AWS ELB - The Network Deep Dive (this post)
  5. Replacing the AWS ELB - Automation
  6. Replacing the AWS ELB - Final Thoughts

    Why does this whole thing with the network actually work? Networking in AWS is not that complicated (sometimes it can be, but it is usually pretty simple) - so why do you need to add an additional IP address into the loop - and one that is not even really part of the VPC?

    To answer that question - we need to understand the basic construct of the route table in an AWS VPC. Think of the route table as a road sign - which tells you where you should go.

    directions
    Maybe not such a clear sign after all
    (Source: https://www.flickr.com/photos/nnecapa)


    Here is what a standard route table (RT) would look like

    route

    The first line says that all traffic destined for an address in your VPC stays local - i.e. it is routed inside your VPC - and the second line says that all other traffic, which does not belong in the VPC, will be sent to another device (in this case a NAT Gateway).
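
    To make the road sign a little more concrete, here is a small sketch of how a destination is matched against a route table: the most specific (longest prefix) matching route wins. The VPC range, the target names and the /32 entry for a virtual IP are assumptions for illustration only.

```python
import ipaddress
from typing import Dict


def lookup(route_table: Dict[str, str], destination: str) -> str:
    """Return the target of the most specific route that matches the destination."""
    addr = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if addr in ipaddress.ip_network(cidr)
    ]
    if not candidates:
        raise LookupError(f"no route for {destination}")
    # Longest prefix wins, e.g. a /32 beats a /16, which beats 0.0.0.0/0.
    _network, target = max(candidates, key=lambda pair: pair[0].prefixlen)
    return target


# Hypothetical route table: local VPC traffic, a default route, and a virtual IP route.
routes = {
    "10.0.0.0/16": "local",
    "0.0.0.0/0": "nat-gateway",
    "172.16.1.100/32": "eni-haproxya",
}

print(lookup(routes, "10.0.1.15"))     # local
print(lookup(routes, "8.8.8.8"))       # nat-gateway
print(lookup(routes, "172.16.1.100"))  # eni-haproxya
```

    Without the last entry, traffic for 172.16.1.100 would simply follow the default route out of the VPC.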

    You are the master of your RT - which means you can route traffic destined for any address you would like - to any destination you would like. Of course - you cannot have duplicate entries in the RT or you will receive an error.

    route_error1

    And you cannot carve out a smaller subset of the VPC's own address range and route it to a different location - the larger local route already covers it.

    route_error2
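
    The two restrictions can be written down as a small check. Note that this only models the rules as described in this post - it is not AWS's actual validation logic, and the CIDR blocks are hypothetical.

```python
import ipaddress
from typing import Dict


def check_new_route(route_table: Dict[str, str], vpc_cidr: str, new_cidr: str) -> None:
    """Raise ValueError if the new route breaks one of the two rules described in the post."""
    new_net = ipaddress.ip_network(new_cidr)
    # Rule 1: no duplicate destinations in the route table.
    for existing in route_table:
        if ipaddress.ip_network(existing) == new_net:
            raise ValueError(f"route for {new_cidr} already exists")
    # Rule 2: no route for a range that sits inside the VPC CIDR (covered by the local route).
    if new_net.subnet_of(ipaddress.ip_network(vpc_cidr)):
        raise ValueError(f"{new_cidr} is inside the VPC range {vpc_cidr}")


def is_allowed(route_table: Dict[str, str], vpc_cidr: str, new_cidr: str) -> bool:
    """Convenience wrapper that returns a boolean instead of raising."""
    try:
        check_new_route(route_table, vpc_cidr, new_cidr)
    except ValueError:
        return False
    return True


routes = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-gateway"}
print(is_allowed(routes, "10.0.0.0/16", "0.0.0.0/0"))        # False - duplicate entry
print(is_allowed(routes, "10.0.0.0/16", "10.0.1.0/24"))      # False - inside the VPC range
print(is_allowed(routes, "10.0.0.0/16", "172.16.1.100/32"))  # True  - outside the VPC range
```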

    But otherwise you can really do what you would like.
    So defining an additional interface on an instance is straightforward.

    For example, on a CentOS/RHEL instance you create a new file in /etc/sysconfig/network-scripts/ (named ifcfg-eth0:1 to match the device below):
    DEVICE="eth0:1"
    BOOTPROTO="none"
    MTU="1500"
    ONBOOT="yes"
    TYPE="Ethernet"
    NETMASK=255.255.255.0
    IPADDR=172.16.1.100
    USERCTL=no

    This will create a second interface (an alias on eth0) on your instance.

    ip

    Now of course no entity in the subnet knows that the IP exists on the network - except the instance itself.
    That is why you can assign the same IP address to more than a single instance.

    network_4


    Transferring the VIP to another instance

    In the previous post, the last graphic showed that in the case of a failure, haproxyb sends an API request that transfers the route to the new instance.

    keepalived has the option to run a script on a state change, such as when its pair fails - this is called a notify.

    vrrp_instance haproxy {
      [...]
      notify /etc/keepalived/master.sh
    }


    That is a regular bash script - and a bash script can do whatever you would like. Luckily, that allows you to manipulate the routes through the AWS CLI.

    aws ec2 replace-route --route-table-id <ROUTE_TABLE> --destination-cidr-block <CIDR_BLOCK> --instance-id <INSTANCE_ID>

    The parameters you would need to know are:

    • The ID of the route table that holds the entry you need to change
    • The destination network (CIDR block) of the route you want to change
    • The ID of the instance that the route should now point to
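
    Putting the notify hook and the three parameters together, a handler could look roughly like the sketch below. It is written in Python purely for illustration (the script referenced above is bash) and it only builds the replace-route command rather than running it. keepalived passes the type, the name and the new state as the first three arguments to a notify script. The IDs and the CIDR block here are placeholders.

```python
from typing import List, Optional


def replace_route_cmd(route_table_id: str, destination_cidr: str, instance_id: str) -> List[str]:
    """Build the 'aws ec2 replace-route' command from the three parameters."""
    return [
        "aws", "ec2", "replace-route",
        "--route-table-id", route_table_id,
        "--destination-cidr-block", destination_cidr,
        "--instance-id", instance_id,
    ]


def on_notify(argv: List[str], route_table_id: str, destination_cidr: str,
              instance_id: str) -> Optional[List[str]]:
    """Return the command to run when this node becomes MASTER, otherwise None.

    argv mirrors the notify arguments: TYPE, NAME, STATE (for example INSTANCE haproxy MASTER).
    """
    if len(argv) < 3 or argv[2] != "MASTER":
        return None
    return replace_route_cmd(route_table_id, destination_cidr, instance_id)


# Placeholder values, not taken from a real environment.
command = on_notify(["INSTANCE", "haproxy", "MASTER"],
                    "rtb-aaaa1111", "172.16.1.100/32", "i-0123456789abcdef0")
print(" ".join(command))
```

    On a transition to BACKUP or FAULT the function returns None, so the route is only touched by the node that has just taken over.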

    Now of course there are a lot of moving parts that need to fall into place for all of this to work - and doing it manually would be a nightmare - especially at scale - that is why automation is crucial.

    In the next post - I will explain how you can achieve this with a set of Ansible playbooks.