2018-07-09

Comparing CloudFormation, Terraform and Ansible - Part #2

The feedback I received from the first comparison was great – thank you all.

Obviously the example I used was not really something that you would use in the real world - because no-one actually creates only a VPC without creating anything inside it; that would be pretty futile.

So let’s go to the next example.

The scenario is to create a VPC with a public presence and a private presence. This will be deployed across two availability zones. Public subnets should be able to route to the internet through an Internet Gateway, and private subnets should be able to access the internet through a NAT Gateway.

This is slightly more complicated than just creating a simple VPC with a one-liner.

So to summarize - the end state I expect to have is:

  • 1x VPC (192.168.90.0/24)
  • 4x Subnets
    • 2x Public
      • 192.168.90.0/26 (AZ1)
      • 192.168.90.64/26 (AZ2)
    • 2x Private
      • 192.168.90.128/26 (AZ1)
      • 192.168.90.192/26 (AZ2)
  • 1x Internet Gateway
  • 2x NAT Gateway (I really could do with one, but since the subnets and resources are supposed to be deployed in more than a single AZ there will be two, which minimizes the impact of a loss of service if a single AZ fails)
  • 1x Public Route Table
  • 2x Private Route Table (1 for each AZ)
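
As a quick sanity check on the addressing plan above, here is a minimal sketch (not part of the original repository) that splits the VPC CIDR into the four /26 subnets using Python's standard ipaddress module. The labels such as public-az1 are hypothetical names chosen for illustration only.

```python
import ipaddress

# The VPC CIDR comes from the plan above; the labels below are hypothetical.
vpc = ipaddress.ip_network("192.168.90.0/24")

# Split the /24 into four equal /26 blocks, in address order.
blocks = list(vpc.subnets(new_prefix=26))

# Assumed labels, matching the order of the subnet list above.
labels = ["public-az1", "public-az2", "private-az1", "private-az2"]
plan = dict(zip(labels, blocks))

for name, cidr in plan.items():
    print(f"{name}: {cidr} ({cidr.num_addresses} addresses)")
```

Each /26 holds 64 addresses, so the four subnets use up the /24 exactly, with no room left for a fifth.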

And all of these should have simple tags to identify them.

(The code for all of these scenarios is located here: https://github.com/maishsk/automation-standoff/tree/master/intermediate)

First let’s have a look at CloudFormation.


So this is a bit more complicated than the previous example. I still used the native resources in CloudFormation, and set defaults for my parameters. You will see some built-in functions that are available in CloudFormation, namely !Ref, which returns the value of a parameter or resource defined elsewhere in the template, and !Sub, which substitutes variables in a string with values such as parameters, resource references or pseudo parameters.

So there are a few nifty things that are going on here.

  1. You do not have to remember resource names - CloudFormation keeps all the references in check and allows you to address them by name in other places in the template.
  2. CloudFormation manages the order in which the resources are created and takes care of all of that for you.

    For example - the routes for the private subnets will only be created after the NAT gateways have been created.
  3. More importantly - when you tear everything down, CloudFormation takes care of the ordering for you, i.e. you cannot tear down a VPC while the NAT gateways and Internet gateway are still there, so those have to be deleted first, and only then can everything else be ripped up.
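
To illustrate the idea of dependency-driven ordering, here is a small sketch using Python's standard graphlib module (Python 3.9+). It is not how CloudFormation is implemented internally, and the resource names and dependency edges are simplified assumptions made up for this example. The point is only that once the dependencies are known, a creation order falls out of a topological sort, and the teardown order is simply the reverse.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical, simplified resource graph: each key lists what it depends on.
depends_on = {
    "vpc": set(),
    "internet_gateway": {"vpc"},
    "public_subnet": {"vpc"},
    "private_subnet": {"vpc"},
    "nat_gateway": {"public_subnet", "internet_gateway"},
    "public_route": {"internet_gateway"},
    "private_route": {"nat_gateway", "private_subnet"},
}

# Dependencies come out before the resources that need them.
create_order = list(TopologicalSorter(depends_on).static_order())

# Teardown walks the same order backwards, so the VPC goes last.
delete_order = list(reversed(create_order))

print("create:", " -> ".join(create_order))
print("delete:", " -> ".join(delete_order))
```

The exact order between independent resources (for example the two subnets) may vary, but the VPC is always created first and deleted last, and the NAT gateway always exists before the private route that points at it.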


Let’s look at Ansible. There are built-in modules for this: ec2_vpc_net, ec2_vpc_subnet, ec2_vpc_igw, ec2_vpc_nat_gateway and ec2_vpc_route_table.


As you can see this is a bit more complicated than the previous example - because the subnets have to be assigned to the correct availability zones.

There are a few extra variables that needed to be defined in order for this to work.


Last but not least – Terraform.

And a new set of variables



First Score - # lines of Code (Including all nested files)

Terraform – 164

CloudFormation - 172

Ansible – 204

(Interesting to see here how the order has changed)

Second Score - Ease of deployment / teardown.

I will not give a numerical score here - just to mention a basic difference between the three options.

Each of the tools uses a simple command line syntax to deploy:

  1. CloudFormation

    aws cloudformation create-stack --stack-name testvpc --template-body file://vpc_cloudformation_template.yml

  2. Ansible

    ansible-playbook create-vpc.yml

  3. Terraform

    terraform apply -auto-approve

The teardown is a bit different:

  1. CloudFormation stores the information as a stack - and all you need to do to remove the stack and all of its resources is to run a simple command of:

    aws cloudformation delete-stack --stack-name <STACKNAME>

  2. Ansible - you will need to create an additional playbook for tearing down the environment, because Ansible does not store the state of the deployment. This is a significant drawback: you have to make sure that you have the order correct, otherwise the teardown will fail. This also means you need to understand exactly how the resources were created.

    ansible-playbook remove-vpc.yml

  3. Terraform - stores the state of the deployment - so a simple run will destroy all the resources

    terraform destroy -auto-approve

You will see below that the duration of the runs is much longer than in the previous example - the main reason being that the amount of time it takes to create a NAT gateway is long, really long (at least 1 minute per NAT GW), because AWS does a lot of grunt work in the background to provision this “magical” resource for you.

The full output of the runs below can be found here:

Results

Terraform:
create: 2m33s
destroy: 1m24s

Ansible:
create: 3m56s
destroy: 2m12s

CloudFormation:
create: 3m26s
destroy: 2m14s

Some interesting observations. It seems that Terraform was the fastest one of the three, at least in this case.

  1. The times are all over the place - and I cannot say one of the tools is faster than the other, because the process is something that happens in the background and you have to wait for it to complete. So I am not sure how reliable the timings are.
  2. The code for the Ansible playbook is by far the largest – mainly because in order to tear everything down – it requires going through the deployed pieces and ripping them out – which requires a complete set of code.
  3. I decided to compare how much more code (you could compare increase in the amount of code to increased complexity) was added from the previous create step to this one

    Ansible: 14 -> 117 (~8x increase)
    CloudFormation: 24 -> 172 (~7x increase)
    Terraform: 7 -> 105 (~15x increase)
  4. It is clear to me that allowing the provisioning tool to manage the dependencies on its own – is a lot simpler to handle – especially for large and complex environments.
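
The growth factors in point 3 are easy to double check. The snippet below is just the arithmetic on the line counts quoted above, sorted from smallest to largest increase; nothing in it comes from the repository itself.

```python
# Line counts for the create step as quoted above: (previous example, this example).
counts = {
    "Ansible": (14, 117),
    "CloudFormation": (24, 172),
    "Terraform": (7, 105),
}

# Growth factor = lines now / lines before.
growth = {tool: after / before for tool, (before, after) in counts.items()}

for tool, factor in sorted(growth.items(), key=lambda item: item[1]):
    print(f"{tool}: {factor:.1f}x")
```

This prints 7.2x for CloudFormation, 8.4x for Ansible and 15.0x for Terraform, which matches the rounded figures in the list.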


This is by no means a recommendation to use one tool or the other - or to say that one tool is better than the other - just a simple side by side comparison between the three options that I have used in the past.

Thoughts and comments are always welcome, please feel free to leave them below.

2018-07-05

Getting Hit by a Boat - Defensive Design

In a group discussion last week – I heard a story (I could not find the origin – if you know where it comes from – please let me know) – which I would like to share with you.
John was floating out in the ocean, on his back, with his shades, just enjoying the sun, the quiet, the time to himself, not a care in the world.
When all of a sudden he got bumped on the head (not hard enough to cause any serious damage) with a small rowing boat.
John was pissed…. All sorts of thoughts running through his head.
  • Who gave the driver their license?
  • Why are they not more careful?
  • I could have been killed?
  • Why are they sailing out here – this is not even a place for boats.
And with all that anger and emotion he pulled himself over the side of the boat, ready to give the owner/driver one hell of a mouthful.
When he pulls himself over the side, he sees an empty boat. No-one there, no-one to scream at.
And at that moment all the anger and rage that was building up inside – slowly went away.
We encounter things every day – many of them we think are directly aimed at us – deliberately or not – but we immediately become all defensive, build up a bias against the other and are ready to go ballistic. Until we understand that there is no-one to direct all this emotion and energy at.
And then we understand that sometimes things just happen, things beyond our control, and we cannot or should not put our fate into someone else’s hands.
That was the original story – which I really can relate to.
(Image source: Flickr – Steenaire)
But before I heard the last part of the story – my mind took this to a totally different place – which is (of course) architecture related.
John was enjoying a great day in the sun – and all of a sudden he got hit in the head by a boat.
Where did that boat come from?
No-one knows. I assume the owner had tied it up properly on the dock.
  • Maybe the rope was cut.
  • Maybe someone stole it and dumped it when they were done.
  • Maybe there was a storm that set the boat loose.
  • Or maybe there was a bloopers company that was following the boat all along to see who would get hit in the head.
There are endless options as to how the boat got there. But they all have something in common. The boat was never supposed to end up hitting John in the head. John expected to be able to bake nicely in the sun and not be hit in the head by a boat.
But what if John had taken additional precautionary measures?
  • Set up a fence / guardrail around where he was floating
  • Put someone as a lookout to warn him about floating boats
  • Have a drone above his head hooked into a heads-up-display in his sunglasses so that he can see what is going on around him
There are endless possibilities and you can really let your imagination take you to where you want to go as to how John could have prevented this accident.
What does this have to do with Defensive Design?
When we design an application – we think that we are going to be ok – because we expect to be able to do what we want to do without interference.
For example.
My web server is supposed to serve web requests of a certain type. I did not plan for someone crafting a specific request that would crash my server, or bombarding the web server with such an influx of traffic that it would bring the application to its knees.
But then something unexpected happens.
When you design your application you will never be able to predict every possibility of attack or the esoteric ways people are going to use your software. There is always something new that comes up, or someone thinks of a way to use your idea that you did not even think of.
What you can do, is put some basic guardrails into your software that will protect you from what you do know or think can happen.
  • Throttling the number of connections or requests – to prevent DDoS attacks.
  • Introducing circuit breakers to prevent cascading failures
  • Open only specific ports / sockets
  • Sufficient authentication to verify that you are allowed to do what you are trying to do
  • Monitoring for weird or suspicious behavior.
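
To make the throttling guardrail a bit more concrete, here is a minimal sketch of one common approach, a token bucket. It is an illustration only, not a drop-in defence against a real DDoS; the class name, parameters and the injectable clock are all assumptions made for this example.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock            # injectable so the behaviour can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        # Refill in proportion to the elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical usage: at most 5 requests per second, bursts of up to 10.
bucket = TokenBucket(rate=5, capacity=10)
if not bucket.allow():
    print("too many requests, try again later")
```

In a real service you would typically keep one bucket per client (for example per IP address or API key) and answer rejected requests with an error rather than doing the work.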
Again the options are practically endless. And you will not think of it all. You should address the issues as they happen, iterate, rinse, repeat.
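
The circuit breaker guardrail mentioned above can be sketched in a few lines as well. This is a deliberately simplified version under my own assumptions (the names CircuitBreaker, CircuitOpenError, max_failures and reset_after are invented for this example): after a number of consecutive failures it stops calling the failing dependency for a cool-down period, then lets a single trial call through.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable so the behaviour can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: allow one trial call. The failure count is
            # kept, so a failed trial re-opens the circuit immediately.
            self.opened_at = None
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0                 # a success closes the circuit fully
        return result
```

Wrapping calls to a downstream service in breaker.call(...) means that when that service is down, callers fail fast instead of piling up and dragging the rest of the application down with them.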
That was a 4 minute read into the things that I think about during the day.
What kind of things do you think about during your daily work? I would be interested in hearing. Please feel free to leave comments down below.