2018-07-09

Comparing CloudFormation, Terraform and Ansible - Part #2

The feedback I received from the first comparison was great – thank you all.

Obviously the example I used was not really something that you would use in the real world – because no-one actually creates a only a VPC – and does not create anything inside it, that is pretty futile.

So let’s go to the next example.

The scenario is to create a VPC, with a public presences and a private presence. This will be deployed across two availability zones. Public subnets should be able to route to the internet through an Internet Gateway, private subnets should be able to access the internet through a NAT Gateway.

This is slightly more complicated than just creating a simple VPC with a one-liner

So to summarize - the end state I expect to have is:

  • 1x VPC (192.168.90.0/24)
  • 4x Subnets
    • 2x Public
      • 192.168.90.0/26 (AZ1)
      • 192.168.90.64/26 (AZ2)
    • 2x Private
      • 192.168.90.128/26 (AZ1)
      • 192.168.90.192/26 (AZ2)
  • 1x Internet Gateway
  • 2x NAT Gateway (I really could do with one – but since the subnets and resources are supposed to be deployed in more than a single AZ – there will be two – and here I minimize the risk impact of loss of service if a single AZ fails)
  • 1x Public Route Table
  • 2x Private Route Table (1 for each AZ)

And all of these should have simple tags to identify them.

(The code for all of these scenarios is located here  https://github.com/maishsk/automation-standoff/tree/master/intermediate)

First lets have a look at CloudFormation


So this is a bit more complicated than the previous example. I still used the native resources in CloudFormation, and set defaults for the my parameters. You will see some built in functions that are available in CloudFormation – namely !Ref which is a reference function to lookup a value that has previously been created/defined in the template and !Sub that will substitute a value in the template with an environment variable.

So there are a few nifty things that are going here.

  1. You do not have remember resource names – CloudFormation keeps all the references in check and allows you to address them by name in other places in the template.
  2. CloudFormation manages the order in which the resources are created and takes of care of all of that for – and it will take care of the order what resources are created.

    For example – the route table for the private subnets will only be created after the NAT gateways have been created.
  3. More importantly – when you tear everything down – then CloudFormation takes care of the ordering for you, i.e. you cannot tear down a VPC – while the NAT gateways and Internet gateway are still there – so you need to delete those first and then you can go ahead and rip the everything else up.


Lets look at Ansible. There are built-in modules for this ec2_vpc_net, ec2_vpc_subnet, ec2_vpc_igw, ec2_vpc_nat_gateway, ec2_vpc_route_table.


As you can see this is bit more complicated than the previous example – because the subnets have to be assigned to the correct availability zones.

There are are a few extra variables that needed to be defined in order for this to work.


Last but not least – Terraform.

And a new set of variables



First Score - # lines of Code (Including all nested files)

Terraform – 164

CloudFormation - 172

Ansible – 204

(Interesting to see here how the order has changed)

Second Score - Easy of deployment / teardown.

I will not give a numerical score here - just to mention a basic difference between the three options.

Each of the tools use a  simple command line syntax to deploy

  1. CloudFormation

    aws cloudformation create-stack --stack-name testvpc --template-body file://vpc_cloudformation_template.yml

  2. Ansible

    ansible-playbook create-vpc.yml

  3. Terraform

    terraform apply -auto-approve

The teardown is a bit different

  1. CloudFormation stores the information as a stack - and all you need to do to remove the stack and all of its resources is to run a simple command of:

    aws cloudformation delete-stack --stack-name <STACKNAME>

  2. Ansible - you will need to create an additional playbook for tearing down the environment - it does not store the state locally. This is a significant drawback – you have to make sure that you have the order correct – otherwise the teardown will fail. this means you need to understand as well how exactly the resources are created.

    ansible-playbook remove-vpc.yml

  3. Terraform - stores the state of the deployment - so a simple run will destroy all the resources

    terraform destroy -auto-approve

You will see below that the duration of the runs are much longer than the previous example – the main reason being that the amount of time it takes to create a NAT gateway is long – really long (at least 1 minute per NAT GW) because AWS does a lot of grunt work in the background to provision this “magical” resource for you.

You can find the full output here of the runs below:

Results

Terraform
create: 2m33s
destroy: 1m24s

Ansible:
create: 3m56s
destroy: 2m12s

CloudFormation:
create: 3m26s
destroy: 2m14s

Some interesting observations. It seems that terraform was the fastest one of the three – at least in this case.

  1. The times are all over the place – and I cannot say one of the tools is faster than the other because the process is something that happens in the background and you have to wait for it complete. SO I am not sure how reliable the timings are.
  2. The code for the Ansible playbook is by far the largest – mainly because in order to tear everything down – it requires going through the deployed pieces and ripping them out – which requires a complete set of code.
  3. I decided to compare how much more code (you could compare increase in the amount of code to increased complexity) was added from the previous create step to this one

    Ansible: 14 –> 117 (~8x increase)
    CloudFormation: 24 –> 172 (~x7 Increase)
    Terraform: 7 –> 105 (~x15 increase)
  4. It is clear to me that allowing the provisioning tool to manage the dependencies on its own – is a lot simpler to handle – especially for large and complex environments.


This is by no means a recommendation to use one tool or the other - or to say that one tool is better than the other - just a simple side by side comparison between the three options that I have used in the past.

Thoughts and comments are always welcome, please feel free to leave them below.

2018-07-05

Getting Hit by a Boat - Defensive Design

In a group discussion last week – I heard a story (I could not find the origin – if you know where it comes from – please let me know) – which I would like to share with you.
John was floating out in the ocean, on his back, with his shades, just enjoying the sun, the quiet, the time to himself, not a care in the world.
When all of a sudden he got bumped on the head (not hard enough to cause any serious damage) with a small rowing boat.
John was pissed…. All sorts of thoughts running through his head.
  • Who gave the driver their license?
  • Why are they not more careful?
  • I could have been killed?
  • Why are they sailing out here – this is not even a place for boats.
And with all that anger and emotion he pulled himself over the side of the boat, ready to give the owner/driver one hell of a mouthful.
When he pulls himself over the side, he sees an empty boat. No–one there, no-one to scream at.
And at that moment all the anger and rage that was building up inside – slowly went away.
We encounter things every day – many of them we think are directly aimed at us – deliberately or not – but we immediately become all defensive, build up a bias against the other and are ready to go ballistic. Until we understand that there is no-one to direct all this emotion and energy at.
And then we understand that sometimes thing just happen, things beyond our control and we cannot or should not put our fate into some else’s hands.
That was the original story – which I really can relate to.
14221418411_385101705b_z
(Source: Flickr – Steenaire)
But before I heard the last part of the story – my mind took this to a totally different place – which is (of course) architecture related.
John was enjoying a great day in the sun – and all of a sudden he got hit in the head by a boat.
Where did that boat come from?
No-one knows.. I assume the owner had tied it up properly on the dock.
  • Maybe the rope was cut.
  • Maybe someone stole it and dumped it when they were done.
  • Maybe there was a storm that set the boat loose.
  • Or maybe there was a bloopers company that was following the boat all along to see who would get hit in the head.
There are endless options as to how the boat got there. But they all have something in common. The boat was never supposed to end up hitting John in the head.. John expected to be able to bake nicely in the sun and not be hit in the head by a boat
But what if John had taken additional precautionary measures?
  • Set up a fence / guardrail around where he was floating
  • Put someone as a lookout to warn him about floating boats
  • Have a drone above his head hooked into a heads-up-display in his sunglasses that he can see what is going around him
There are endless possibilities and you can really let your imagination take you to where you want to go as to how John could have prevented this accident.
What does this have to do with Defensive Design?
When we design an application – we think that we are going to be ok – because we expect to be able to do what we want to do without interference.
For example.
My web server is suppose to serve web requests of a certain type. I did not plan for someone crafting a specific request that would crash my server or bombarding the webserver with such an influx of traffic that would bring the application to its knees.
But then something unexpected happens.
When you design your application you will never be able to predict every possibility of attack or some esoteric ways  people are going to use your software. There is always something new that comes up – or someone thinks of a different way to use your idea that you did not even think of.
What you can do, is put some basic guardrails into your software that will protect you from what you do know or think can happen.
  • Throttling the number of connections or request – to prevent DDOS attacks.
  • Introducing circuit breakers to prevent cascading failures
  • Open only specific ports / sockets
  • Sufficient authentication to verify that you should be doing what you are supposed to
  • Monitoring for weird or suspicious behavior.
Again the options are practically endless. And you will not think of it all. You should address the issues as they happen, iterate, rinse, repeat.
That was a 4 minute read into thing that I think about during the day.
What kind of things do you think about when during your daily work? I would be interested in hearing. Please feel free to leave comments down below.

2018-07-03

Encounters in the Cloud - Interview

This is a translation of an interview I gave to IsraelClouds (a meet the architect session).

Hello, my name is Maish Saidel-Keesing. I am a Cloud and DevOps architect at CyberArk in Petach Tikva. I have over 19 years experience in the compute industry. In the past I was a system administrator, managing Active Directory, Exchange and Windows servers. I have a lot of past experience with VMware systems - I wrote the first version of VMware vSphere Design and I have extensive knowledge of OpenStack (where I also participated in the OpenStack Architecture Design Guide). In recent years I have been working in the public cloud area (AWS and Azure) and I am also in the process of writing another book called “The Cloud Walkabout”, which was written following my experience with AWS.

What was the catalyst that made you interested in cloud computing?

My interest in technology has been ingrained in me since I was a child. I am always interested in trying new things all the time, and the cloud was for me a tool that enabled me as an IT infrastructure professional to push the organization to run faster and bring value to the entire company. 

The pace at which the company wanted to run with the standard ("old fashioned") tools was not fast enough and we headed toward the cloud (private and public) to help us meet our goals.

What difficulties did you encounter when you wanted to learn about cloud computing?

First of all organizational buy-in. At first I encountered difficulties when I tried to explain to upper management why the cloud is important to the organization, it was not obvious and required a lot of persuasion and date to back up the statements.

Second, the level of local education(courses, lecturers) was not very high at the time, which required a lot of hours of self-study and practical experience to learn one topic or another. I have never done a frontal course here in Israel - only self-study at my own pace, including 5 AWS certifications and additional certifications.

What do you predict for the future in cloud computing vs. on-prem?

I predict that the day is near where the number of Workloads run on-prem will be minimal, and the vast majority or our software will run in the public cloud. There will always be some applications that are not financially viable to be moved to the cloud or because of security restrictions cannot live in the cloud, so we will have to live in a hybrid world for many years. The world has become a cloud world, but the distance is so long that we can seamlessly transfer our applications between cloud and cloud.

‍Did the solutions you were looking for have alternatives among the various public cloud providers you worked with? If so, what were the considerations in choosing the supplier? What support did you receive from the cloud provider?

Similar to the situation today, the market leader was AWS. However, Google Cloud and Microsoft Azure have narrowed a huge gap in recent years. When I started the journey to the cloud, I worked only with AWS - they helped us with both individual and technical advice - about the existing ways to move the applications to the cloud, optimization and improvement, and in addition to the business aspect of streamlining and reducing costs. 

What are the conclusions after your transition to the cloud compared to on-premises?

It is clear to me that it is impossible to compete with a public cloud. The many possibilities offered by each of the cloud providers are a difference of heaven and earth compared to the capabilities in your datacenter. Building services that can be consumed from the cloud by simply calling a rich API can take thousands of hours, and even then, as private organizations, we can not keep up with the tremendous pace of development in the industry.

In the public cloud we have no "limitations" of resources. (Of course there are certain limitations - but no organization has a chance to match the scale that cloud providers are working with)

How does an organization know that it is ready for the cloud? What are the important points (time, staff) that will indicate readiness?

When you start to see that business processes are moving too slowly, competitors are faster than you and take too much time to respond to business requests. If the services you are asked to provide are not within your grasp, or to meet the load and demands - it will take you months or years. In these situations you need to recognize that you have reached your limit, and now it is time to ask for the "help of the cloud" and move on to the next stage in the growth of the company.

It is important to switch to the approach of becoming an enabler, a partner of the organization and accompany the business on the journey. Work together and think about how to advance the organizational agenda and not be the obstacle. If you insist on being gatekeepers and the "no, you cannot" person, you find yourself irrelevant to your organization and customers.

Finally, is there something important that you would like to see happening (meetings, studies or anything else) in the cloud computing community and / or the cloud computing portal?

Today, cloud vendors are trying to sell a story of success in every customer transition to the cloud. It is clear to me that this is not always a reflection of reality. For every success story I am sure there is at least the same amount - and even more - of failures.

I would like all of us to learn from these failures - they are an invaluable resource and can serve us in our attempts to go through the same process. I wish we were sharing many more of these stories. It's true that it's embarrassing, it's true that it does not bring the glamor and glory of a successful story - but it is very important that we learn from each other's mistakes.‍

Thank you for coming to Israel's cloud computing portal, we were very happy to host you.