2016-03-24

We Are All OpenStack! Are We Really????

Background

OpenStack has a vast community, globally distributed, spanning multiple time zones and all sorts and kinds. The community has a culture - a charter, a way of doing things. It is something that you have to learn how to get used to - because the lay of the land is not always as straight forward as you would expect, not what you are accustomed to, and sometimes it might see downright weird. But as the saying goes, "When in Rome, do as the Romans".

A thread on the OpenStack mailing list as of late week (actually two of them) caught my eye, and attention - so much that I feel the need to write this post.

Ever since I have started to get involved with OpenStack, I have noticed that there is a very obvious divide between the Developers (those who write and contribute code to OpenStack) and the User community (this is split into two parts - those who actually consume OpenStack - the end users, and those who deploy and maintain OpenStack - the operators). But before the thread - let me explain current situation in OpenStack for those who might not be acquainted with the way things work.

Anyone who wants can sign up as an OpenStack foundation member. All you need to do is to sign up on the site, fill in a form, accept an agreement - and you are a full fledged OpenStack foundation member.

The 'benefits' that come with this are not many - and the obligations are not that demanding either.

2.2 Individual Members.
(a) Individual Members must be natural persons. Individual Members may be any natural person who has an interest in the purpose of the Foundation and may be employed by Platinum Members or Gold Members.
(b) The application, admission, withdrawal and termination of persons as Individual Members are set forth in the membership policy attached as Appendix 1 (“Individual Member Policy”).

As an foundation member - you can run as a candidate for the Individual Member Director elections and of course you can vote in the above elections. You are invited to participate in the User surveys that are published on a regular basis. And of course you consider yourself a member of the OpenStack community. You are also eligible to run as a candidate for the Technical Committee elections (but you are not allowed to vote – see below).

Next come the Active Technical Contributors (ATC). These are the people that actually write the code (well at least that was the intention - I will explain later). The criteria for becoming an ATC are as follows:

(i) An Individual Member is an ATC who has had a contribution approved for inclusion in any of the official OpenStack projects during one of the two prior release cycles of the Core OpenStack Project (“Approved Contribution”). Such Individual Member shall remain an ATC for three hundred and sixty five days after the date of acceptance of such Approved Contribution.
(ii) An Individual Member who has made only other technical contributions to the OpenStack Core Project (such as bug triagers and technical documentation writers) can apply to the chair of the Technical Committee to become an ATC. The final approval of such application shall be approved by a vote of the Technical Committee. The term shall be for three hundred and sixty five days after the date of approval of the application by the Technical Committee.

This comes with obligations and benefits as well. The obligations are that you are expected to contribute code according to the guidelines and rules of the projects, code quality must be upheld, and etc. etc.

The benefits of being an ATC are as follows.

  • You get a free pass to the OpenStack summit ($600 value).
  • You are eligible to vote in the bi-annual Technical Committee Elections.
  • You are also eligible to vote for the Project Team Lead Elections - which occur every six months - providing you contributed to that specific project over the last cycle.
  • You are considered an integral member of the OpenStack community. Up until last summit - you were also granted access to the OpenStack Design summit - something that was closed for only ATC's (but this has changed since last summit - or two summits ago - I cannot remember).

There are two other groups who are partially recognized by the OpenStack community. There are people that are contributors to the community. They submit bugs, review patches, but do not actually submit code to the projects. I assume that the idea is that once they start getting into the code - then they actually will contribute code back into the projects in the long run and will become ATC's.

And then there are the end-users, and then the Operators. They are the people who are trying to deploy, upgrade and maintain what we all call OpenStack.

  • They are the ones who are trying to get things working in OpenStack.
  • They are the ones who have to provide creative ways to supply solutions to their clients - because OpenStack does not do what it was supposed to - does not do it well enough, does it really badly - or has not got there yet.
    Two clear examples (for me) would be - bad telemetry for all instances (which is now being re-written) or Load balancing that is not enterprise ready.
  • They participate in an Ops meetups (see the outcome from the last one in Manchester a month ago)
  • They contribute code - to the Ops repositories - which include scripts, tools, and much more.
    They participate in working groups - such as Enterprise Working group, Operator Tags, Product marketing - there are many. The technical (a.k.a. Developer) community see these participants as part of the user community. They do not see these contributions as part of the OpenStack product, they are deemed Upstream, or Downstream - but not part of the OpenStack products.

 

The Problem

The thread I am talking about - was titled "Recognizing Ops contributions". It has been voiced many a time (and not only by me - even though I have been very vocal about it) that Operators should be recognized as part of OpenStack. I have actually run for the Technical Committee elections on the basis of this stand - but I was not at all successful.

OpenStack works on the 4 Opens (and this is an official OpenStack resolution!)

I would like to bring your attention to the following points in that resolution

The community controls the design process. You can help make this software meet your needs.

The technical governance of the project is a community meritocracy with contributors electing technical leads and members of the Technical Committee.

The obvious question though is - what is considered a contributor?

If the four opens are to be upheld - then I would expect that the all contributors are equal.

Today that is not the case.

There have been many a time where I personally and many other operators as well have asked for things to change in OpenStack, have asked to be included in the community, have asked to provide feedback, and have been constantly told that everything should be done the way that developers have been doing it. Sometimes with success, sometimes with less, but I think that OpenStack has started on the right track - but still there is a whole lot of resistance to allow Operators a full and equal say in what happens in the development process.

When I see comments like this (from Thierry Carrez - a long time TC member), I come across what has been happening for many years already.

Yeah, we can't really overload ATC because this is defined and used in governance. I'd rather call all of us "contributors".
<..snip..>
Upstream contributors are represented by the Technical Committee and vote for it. Downstream contributors are represented by the User Committee and (imho) should vote for it.

Yes Thierry continues to say that all of the contributors should be equal - but keep the two separate - only people who contribute Upstream (which I understand as those who write code) should be part of the TC, and all the others - part of the User Committee.

What is this User Committee? Looking on the OpenStack site there is a clear definition of its mission.

As the number of production OpenStack deployments increase and more ecosystem partners add support for OpenStack clouds, it becomes increasingly important that the communities building services around OpenStack guide and influence the product evolution. The OpenStack User Committee mission is to:

  • Consolidate user requirements and present these to the management board and technical committee.
  • Provide guidance for the development teams where user feedback is requested.
  • Track OpenStack deployments and usage, helping to share user stories and experiences.
  • Work with the user groups worldwide to keep the OpenStack community vibrant and informed.

To me it seems that the user committee is a mantlepiece body which has absolutely no influence on what happens in OpenStack and was formed to say, "Hey we have a group of people that represent our users. They can't really do anything - but at least we can say they have representation."

And how do I know that the User committee - has no teeth? I looked at the bylaws (4.14).

4.14 User Committee. The User Committee shall be an advisory committee to the Board of Directors and shall be comprised of at least three (3) Individual Members, one (1) appointed by the Technical Committee, one (1) appointed by the Board of Directors, and one (1) appointed by the appointees of the Technical Committee and Board of Directors. The User Committee shall organize its meetings and may, on approval of the Board of Directors, create elected seats to be filled by a vote of the Individual Members. On request of the User Committee, the Board of Directors shall invite the User Committee to attend each regular quarterly meeting and shall allocate at least one (1) hour during the regular quarterly meeting following the annual election to hear the report and recommendations of the User Committee. On request of the User Committee, the Technical Committee shall invite the User Committee to attend a regular meeting and shall allocate at least one (1) hour during such meeting up to four times each calendar year to hear the report and recommendations of the User Committee.

The UC (User Committee) has to request to be invited to attend the BOD (Board of Directors) meeting. The UC has to request to attend the TC (Technical Committee) regular meetings and will be given at maximum 4 hours a year to give recommendations and report.

There is not a single word about what the UC actually can do besides participate in meeting and pass recommendations. Do these recommendations have to be accepted? Can they change anything? That is at the discretion of the TC and the Board.

What does the TC do - or what can they do? First the TC Charter:

The Technical Committee (“TC”) is tasked with providing the technical leadership for OpenStack as a whole (all official projects, as defined below). It enforces OpenStack ideals (Openness, Transparency, Commonality, Integration, Quality...), decides on issues affecting multiple projects, forms an ultimate appeals board for technical decisions, and generally has technical oversight over all of OpenStack.

Again back to the bylaws (4.13)

4.13 Technical Committee.
(a) The Technical Committee shall be selected as provided in the Technical Committee Member Policy in Appendix 4.
(b) (i) The Technical Committee shall have the authority to manage the OpenStack Project, including the authority to determine the scope of the OpenStack Technical Committee Approved Release subject to the procedures set forth below. No changes to the OpenStack Technical Committee Approved Release which deletes all or part of the then current Trademark Designated OpenStack Software. shall be approved by the Technical Committee without approval as provided in the Coordination Procedures. After such approval, the Secretary shall post such description to the Foundation’s website.

Do you notice the difference in tone? This is the committee that decides what happens within the OpenStack projects, who is added, who is not, who is removed, what directions are taken, etc. etc.

Sorry for jumping around, but back to the thread.

Right, this brings up the other important point I meant to make. The purpose of the "ATC" designation is to figure out who gets to vote for the Technical Committee, as a form of self-governance. That's all, but it's very important (in my opinion, far, far, far more important than some look-at-me status on a conference badge or a hand-out on free admission to an event). Granting votes for the upstream technical governing body to people who aren't involved directly in upstream technology decisions makes little sense, or at least causes it to cease being self-governance (as much as letting all of OpenStack's software developers decide who should run the User Committee would make it no longer well represent downstream users).

Again - I read this is as - let the developers write their code and keep Operators far away from having anything to do with how OpenStack projects are managed. Because they have no business here.

The Solution?

Which led me to write this answer. I would like to re-iterate what I said on that thread.

Operator contributions to OpenStack are no less important or no less equal than that of anyone writing code or translating UI's or writing documentation.
By saying that someone who contributes to OpenStack - but doing so by not writing code are not entitled to any technical say in what directions OpenStack should pursue or how OpenStack should be governed, is IMHO a weird (to put it nicely) perception of equality.

So I see two options.

Ops Contributors are considered Active Technical Contributors - just the same as anyone writing code - or fixing a spelling mistake in documentation (and yes submitting a patch to correct a typo in a document - does give you ATC status). Their contributions are just as important to the success of the community as anyone else.
or
Give Ops contributors a different status (whatever the name may be) - and change the governance laws to allow these people with this status a voting right in the Technical committee. They have as much right as any other contributor to cast their opinion on how the TC should govern and what direction it should choose.

By alienating Operators (and yes it is harsh word - but that is the feeling that many Operators - me included - have at the moment) from having a say in - how OpenStack should run, what release cycles should be - what the other side of the fence is experiencing each and every day due to problems in OpenStack's past and possible potential trouble with the future - reminds me of a time in the not so far back history where not all men/women were equal.

Where some were allowed to vote, and others not - they were told that others could decide for them - because those others knew what was best.

OpenStack's 12th release - Mitaka - is coming up very soon. That means that OpenStack will soon be 6 years old.
I think it is time for OpenStack - including the developer community - to accept that they are no longer alone in this - there is a whole lot of information, knowledge and experience that can be reaped from all those who are actually using the products that the community produces and it is now time to accepts all contributors - in whatever way they may choose to do so - as equal members - in every way.

Yes there may be disadvantages, there also could be many advantages as well. But it time to stop treating everyone that does not write code as a "second degree citizen". Because at the moment OpenStack Technical community flaunt that everyone is part of OpenStack - but de-facto - they are most certainly not.

There will be a new working group formed (Non-ATC Recognition Working Group) in the very near future. I plan to be a very active member of this group - with my personal end goal of finally getting all contributors to the OpenStack community - and not only those who actually contribute code - as equals - with equal say - in all aspects of OpenStack - yes - including Technical leadership and decisions.

(I consider myself an Operator - my view could be biased)

2016-02-01

There is no Root Cause, Only Contributing Factors

I participated a week or two ago in the DevOpsJRS meetup in Cisco Jerusalem.  Our guest  speaker was Avishai Ish-Shalom. I always enjoy Avishai's talks, he is a great speaker, a down to earth guy, and I have had the opportunity and pleasure to work with him several times in the past.

One of the slides that he posted included the following:

I am currently involved in an Scrum product team, where we (try and) do retrospectives after each sprint.

For those of you who are not familiar with the Agile methodologies, a short overview and my view on the process.

Making long term plans is quite difficult, and sometimes even impossible in our ever changing world. Things are moving so fast, at such a pace, ever changing.  Scrum groups work in sprints. A sprint is a short burst of work, which can be defined by the team, but usually we are talking about 1-2 week bursts.

The team plans the work for each sprint and concentrates only those tasks at hand for that specific sprint. They produce in small increments but continously produce something that adds value.

After the sprint there is a retrospective. The team looks at what went well, what was bad, and how to improve. There is a huge amount of trust needed within in the team in order for this to be productive, and one of the things that are very important is that these are conducted in a blameless manner.

The point of such an exercise is to learn and to improve and not to point fingers.

Back to the root cause. In my previous IT positions whenever there was an outage, we did a root cause analysis to see what caused the problem. We always wanted to pinpoint that one thing that caused the problem.

2310295343_462278ae01_z

I completely agree with what Avishai said. There is no such a thing as a Root cause, there are only contributing factors. But this seems to be completely against what you might know and have been accustomed to.

Let try and demonstrate with an example.

A critical application stopped responding.
The outage caused downtime for 1 hour in your organization.

In a regular post mortem and root cause analysis, you would have gone through the motions until you think that you found was that the reason the app went down for a hour.

Why did go down for an hour?
Because the host it was running on was disconnected from the network.

Why was it disconnected?
Because John disconnected the wrong cable when working in the datacenter.

There we found the root cause. It was John's fault.

If we are looking only for a root cause, that would be it.
But remember, there is no root cause, only contributing factors.

Digging down a little deeper will uncover a lot more.

Why did John disconnect the wrong cable?
Because he was already at work for more than 24 hours fighting fires and running from crisis to crisis.
He was tired (contributing factor).
And the cables were not marked correctly. (another factor)

So it was not John's fault. There were contributing factors.

The idea of this exercise is to improve and to understand the possible things that we can learn from this event so that it does not occur again.

Possible answers could be:

Make sure that all cables are marked clearly. It would have helped here.
John was tired, over worked. Why? Because he had too much on his plate, he was overloaded.
Perhaps increase automated processes that will free up more time for John and the team.
Invest in more staff, better equipment, additional training so that John would have a better balance and have time to invest in improvement.

We must embrace outages, because they are they best learning opportunities, and the best way to improve.

I would highly recommend using this method in your next retrospective or post-mortem. I can guarantee you, that this will improve your team, yourself and the way you work.