2018-07-09

Comparing CloudFormation, Terraform and Ansible - Part #2

The feedback I received from the first comparison was great – thank you all.

Obviously the example I used was not really something that you would use in the real world, because no-one actually creates only a VPC and then puts nothing inside it. That would be pretty futile.

So let’s go to the next example.

The scenario is to create a VPC with a public presence and a private presence, deployed across two availability zones. Public subnets should be able to route to the internet through an Internet Gateway, and private subnets should be able to access the internet through a NAT Gateway.

This is slightly more complicated than just creating a simple VPC with a one-liner.

So to summarize - the end state I expect to have is:

  • 1x VPC (192.168.90.0/24)
  • 4x Subnets
    • 2x Public
      • 192.168.90.0/26 (AZ1)
      • 192.168.90.64/26 (AZ2)
    • 2x Private
      • 192.168.90.128/26 (AZ1)
      • 192.168.90.192/26 (AZ2)
  • 1x Internet Gateway
  • 2x NAT Gateway (I could really make do with one, but since the subnets and resources are supposed to be deployed in more than one AZ there will be two, which reduces the impact on the service if a single AZ fails)
  • 1x Public Route Table
  • 2x Private Route Table (1 for each AZ)

And all of these should have simple tags to identify them.

(The code for all of these scenarios is located here: https://github.com/maishsk/automation-standoff/tree/master/intermediate)
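As a quick sanity check of the address plan above, here is a minimal Python sketch (not part of the repository) that splits the VPC range into four /26 subnets using the standard `ipaddress` module. The labels such as `public-az1` are names I made up for illustration only.

```python
import ipaddress

# The VPC range from the end-state list above.
vpc = ipaddress.ip_network("192.168.90.0/24")

# Splitting a /24 into /26 blocks yields exactly four subnets, in address order.
subnets = list(vpc.subnets(new_prefix=26))

# Hypothetical labels, assigned in the same order as the list above.
labels = ["public-az1", "public-az2", "private-az1", "private-az2"]
layout = dict(zip(labels, (str(subnet) for subnet in subnets)))

for name, cidr in layout.items():
    print(f"{name}: {cidr}")
```

Each /26 holds 64 addresses, so the four subnets use up the whole /24 and leave no room for a fifth.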

First, let's have a look at CloudFormation.


So this is a bit more complicated than the previous example. I still used the native resources in CloudFormation, and set defaults for my parameters. You will see some built-in functions that are available in CloudFormation, namely !Ref, which looks up the value of a parameter or resource that is defined elsewhere in the template, and !Sub, which substitutes variables in a string with values such as parameters, resource references or pseudo parameters (not environment variables).

So there are a few nifty things going on here.

  1. You do not have to remember resource names. CloudFormation keeps all the references in check and allows you to address them by name in other places in the template.
  2. CloudFormation works out the order in which the resources are created and takes care of all of that for you.

    For example, the route table for the private subnets will only be created after the NAT gateways have been created.
  3. More importantly, when you tear everything down CloudFormation takes care of the ordering for you. For example, you cannot delete a VPC while the NAT gateways and the Internet Gateway are still there, so those have to be deleted first, and only then can everything else be ripped out.
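To illustrate the kind of ordering described in the list above, here is a minimal Python sketch using a topological sort. It is not how CloudFormation is implemented internally. The dependency graph is my own simplified assumption about this scenario, and the resource names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumed, simplified dependencies: each resource maps to what must exist first.
dependencies = {
    "vpc": set(),
    "internet_gateway": {"vpc"},
    "public_subnet": {"vpc"},
    "private_subnet": {"vpc"},
    "nat_gateway": {"public_subnet", "internet_gateway"},
    "public_route_table": {"vpc", "internet_gateway"},
    "private_route_table": {"vpc", "nat_gateway"},
}

# A valid creation order puts every resource after the things it depends on.
create_order = list(TopologicalSorter(dependencies).static_order())

# Teardown is simply the creation order reversed, so dependents go first.
teardown_order = list(reversed(create_order))

print("create:  ", " -> ".join(create_order))
print("teardown:", " -> ".join(teardown_order))
```

More than one ordering can be valid, so the exact sequence may differ, but the VPC always comes first on creation and last on teardown.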


Let's look at Ansible. There are built-in modules for this: ec2_vpc_net, ec2_vpc_subnet, ec2_vpc_igw, ec2_vpc_nat_gateway and ec2_vpc_route_table.


As you can see, this is a bit more complicated than the previous example, because the subnets have to be assigned to the correct availability zones.

There are a few extra variables that needed to be defined in order for this to work.


Last but not least, Terraform.

And a new set of variables.



First Score - # lines of Code (Including all nested files)

Terraform – 164

CloudFormation - 172

Ansible – 204

(Interesting to see here how the order has changed)
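If you want to reproduce a count like this yourself, here is a minimal Python sketch. It is an assumption about how such a count could be done, not the exact method used for the numbers above. The function name `count_lines` and the list of file suffixes are made up for illustration.

```python
from pathlib import Path


def count_lines(root, suffixes=(".yml", ".yaml", ".tf")):
    """Count every line in matching files under root, including nested folders."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            with path.open(encoding="utf-8") as handle:
                # Blank lines and comments are counted too.
                total += sum(1 for _ in handle)
    return total
```

Calling `count_lines("terraform")` on a checkout of the repository would give the total for that folder, assuming a folder of that name exists. Counting blank lines and comments is a choice, and a different choice would shift the numbers slightly.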

Second Score - Ease of deployment / teardown.

I will not give a numerical score here - just to mention a basic difference between the three options.

Each of the tools uses a simple command line syntax to deploy:

  1. CloudFormation

    aws cloudformation create-stack --stack-name testvpc --template-body file://vpc_cloudformation_template.yml

  2. Ansible

    ansible-playbook create-vpc.yml

  3. Terraform

    terraform apply -auto-approve

The teardown is a bit different:

  1. CloudFormation stores the information as a stack, and all you need to do to remove the stack and all of its resources is to run a single command:

    aws cloudformation delete-stack --stack-name <STACKNAME>

  2. Ansible - you will need to create an additional playbook for tearing down the environment, because it does not store the state locally. This is a significant drawback: you have to make sure that you have the order correct, otherwise the teardown will fail. This means you also need to understand exactly how the resources are created.

    ansible-playbook remove-vpc.yml

  3. Terraform - stores the state of the deployment, so a single command will destroy all the resources:

    terraform destroy -auto-approve

You will see below that the runs take much longer than in the previous example. The main reason is that creating a NAT gateway takes a long time, really long (at least 1 minute per NAT GW), because AWS does a lot of grunt work in the background to provision this “magical” resource for you.

You can find the full output of the runs here. The timings are summarized below:

Results

Terraform:
create: 2m33s
destroy: 1m24s

Ansible:
create: 3m56s
destroy: 2m12s

CloudFormation:
create: 3m26s
destroy: 2m14s
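To make the timings above easier to compare, here is a small Python sketch that converts them to seconds and adds up create plus destroy per tool. The helper name `to_seconds` is made up for illustration, and the figures are copied from the results above.

```python
import re


def to_seconds(duration):
    """Convert a string such as '2m33s' into a number of seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(?:(\d+)s)?", duration)
    if not duration or not match:
        raise ValueError(f"unrecognised duration: {duration!r}")
    minutes, seconds = (int(group) if group else 0 for group in match.groups())
    return minutes * 60 + seconds


# Timings copied from the results above.
results = {
    "Terraform": {"create": "2m33s", "destroy": "1m24s"},
    "Ansible": {"create": "3m56s", "destroy": "2m12s"},
    "CloudFormation": {"create": "3m26s", "destroy": "2m14s"},
}

totals = {
    tool: sum(to_seconds(value) for value in runs.values())
    for tool, runs in results.items()
}

# Print the tools from fastest to slowest round trip.
for tool, total in sorted(totals.items(), key=lambda item: item[1]):
    print(f"{tool}: {total}s")
```

These are single runs, so the totals describe this one test rather than the tools in general.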

Some interesting observations. Terraform was the fastest of the three, at least in this run.

  1. The times are all over the place, and I cannot say that one of the tools is faster than the others, because the provisioning happens in the background and you have to wait for it to complete. So I am not sure how reliable the timings are.
  2. The code for the Ansible playbook is by far the largest, mainly because tearing everything down means going through the deployed pieces and ripping them out, which requires a complete set of code of its own.
  3. I decided to compare how much more code was added from the previous create step to this one (you could read the increase in the amount of code as an increase in complexity):

    Ansible: 14 -> 117 (~8x increase)
    CloudFormation: 24 -> 172 (~7x increase)
    Terraform: 7 -> 105 (~15x increase)
  4. It is clear to me that allowing the provisioning tool to manage the dependencies on its own is a lot simpler to handle, especially for large and complex environments.
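The growth factors in point 3 can be checked with a few lines of Python. This is only the arithmetic behind the list above, using the line counts quoted there.

```python
# (lines in the previous create step, lines in this create step), from point 3.
line_counts = {
    "Ansible": (14, 117),
    "CloudFormation": (24, 172),
    "Terraform": (7, 105),
}

# Growth factor = new line count divided by the old one.
growth = {tool: after / before for tool, (before, after) in line_counts.items()}

for tool, factor in growth.items():
    print(f"{tool}: ~{factor:.1f}x")
```

Terraform grows the most in relative terms only because it started from the smallest base, 7 lines. In absolute terms its create code is still the shortest of the three.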


This is by no means a recommendation to use one tool or the other - or to say that one tool is better than the other - just a simple side by side comparison between the three options that I have used in the past.

Thoughts and comments are always welcome, please feel free to leave them below.