Replacing the AWS ELB - Final Thoughts

This is the last part in the Replacing the AWS ELB series.
  1. Replacing the AWS ELB - The Problem
  2. Replacing the AWS ELB - The Challenges
  3. Replacing the AWS ELB - The Design
  4. Replacing the AWS ELB - The Network Deep Dive
  5. Replacing the AWS ELB - Automation
  6. Replacing the AWS ELB - Final Thoughts (this post)

    If you haven't already read the previous posts in the series - please take the time to go through them.

    So here are some additional thoughts and ideas about the whole journey.

    First and foremost - none of this would have been possible without the group effort of the team that worked on this.
    Udi, Mark, and Mike - thank you all for your input, help and hard work that went into this.

    Was it all worth it?

    Yes, yes and hell yes!! Refactoring the applications to work the way the AWS ELB works was not financially viable and would have taken far too long. There was no way we could make our delivery dates and also have every application change the way it worked.

    So not only was it worth it - it was a necessity. Without this, the project was a non-starter.

    What was the hardest part of the solution?

    Definitely the automation. We had the solution white-boarded after an hour or two, and brought up a PoC within another hour or two.

    As I said elsewhere in the series - if this was a one-off then it would not have been worthwhile - but we needed about 10 pairs of haproxy instances in each deployment, and there were 10-15 deployments (roughly 100-150 pairs in total), so manual was not going to work here. There was a learning curve that we needed to get over, and that took some time.

    This can't be all you were doing with haproxy..

    Of course not.. The configurations in the examples are really basic and simple. The actual haproxy.cfg was a lot more complicated and was generated on the fly using Consul and consul-template. This opens up some very interesting and wonderful possibilities. The instances were what could be considered pets, because they were rarely re-provisioned, but the configuration was constantly changing based on the environment.
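
To give a rough feel for what "generated on the fly" means, here is a minimal sketch in Python. It is not the consul-template setup we used (consul-template works with its own template files and reads the service list from Consul); the function name, the backend name and the addresses are all made up for illustration. It only shows the idea: take the current list of healthy servers and render the matching haproxy backend stanza.

```python
def render_backend(name, servers):
    """Render an haproxy 'backend' stanza from (host, port) pairs.

    Hypothetical helper: in the real deployment this job was done by
    consul-template, with the server list coming from Consul.
    """
    lines = [f"backend {name}", "    balance roundrobin"]
    # Sort so the output is stable and the file only changes when membership does.
    for index, (host, port) in enumerate(sorted(servers), start=1):
        lines.append(f"    server {name}-{index} {host}:{port} check")
    return "\n".join(lines) + "\n"
```

Every time the list of servers changes, the stanza is rendered again and haproxy is reloaded with the new file, so the instance itself stays the same while its configuration keeps moving.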

    So did you save money?

    No! This was more expensive than provisioning an ELB from AWS. The constraints dictated that this was the chosen solution - not cost. In a way these are wasted resources, because there are instances sitting idle most of the time without actually doing anything. The master-slave model is not a cost-effective solution, because you are spending money to address a scenario that only matters when (and if) you lose a node.

    Does this scale? How?

    We played around with this a bit and also created a prototype that provisioned an auto scaling group that would work active-active-active with multiple haproxy instances - but this required some changes in the way we did our service discovery. This happened a good number of months after we went live, as part of the optimization stage. Ideally, this is the approach we would have chosen if we could do it over again.

    For this example the only way to scale is to scale up the instance sizes - not to scale out.

    So to answer the question above - in the published form - no, it does not.

    Any additional benefits to rolling your own solution?

    This could be ported to any and every cloud - or deployment you would like. All you need to do is change the modules and the parts that interact directly with AWS so that they target the cloud of your choice - and it would probably work. It is not a simple rip and replace - but the method would work - it would just take a bit of extra time and coding.

    What about external facing load balancers - will this work?

    Yes, all you will need to do is replace the routes with an Elastic IP, and have the keepalived script switch the EIP from one instance to another. I should really post about that as well.
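
Until that post exists, here is a minimal sketch of what the EIP switch could look like, written in Python with boto3. This is an assumption on my part and not the script from the series: the function name and the allocation and instance IDs are placeholders. It relies on keepalived passing the new state (MASTER, BACKUP or FAULT) as the third argument to a notify script, and on the EC2 `associate_address` call with `AllowReassociation=True` to pull the address away from the other node.

```python
def handle_transition(state, ec2_client, allocation_id, instance_id):
    """Move the Elastic IP to this instance when keepalived promotes it.

    state         -- the state keepalived reports: MASTER, BACKUP or FAULT
    ec2_client    -- an EC2 client, e.g. boto3.client("ec2")
    allocation_id -- placeholder for the EIP allocation ID (eipalloc-...)
    instance_id   -- placeholder for this instance's ID (i-...)
    """
    if state != "MASTER":
        # On BACKUP or FAULT the other node takes the address; nothing to do here.
        return None
    response = ec2_client.associate_address(
        AllocationId=allocation_id,
        InstanceId=instance_id,
        AllowReassociation=True,  # take the EIP even if the peer still holds it
    )
    return response.get("AssociationId")
```

A wrapper script called from keepalived's `notify` hook would read the state from `sys.argv[3]`, create the client with `boto3.client("ec2")`, and call `handle_transition` with the real IDs. The instance also needs IAM permissions that allow it to associate the address.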

    So why did you not use an EIP in the first place?

    Because this was internal traffic. If I was to use an external facing load balancer, the traffic would essentially go out to the internet and come back in - for two instances that were in the same subnet in the same AZ. This does not make sense from either a financial or a security perspective.

    Can I contact you if I have any specific questions on the implementation?

    Please feel free to do so. You can either leave a comment on any of the posts in the series, ping me on Twitter (@maishsk), or use the Contact Me link at the top.