I feel a bit of a rant coming on. This is about software developer mindsets, but it's also about "processes", which can be a bit of a scary term to many developers.
I got quite lucky in my career, in two particular ways.
First, in my first job I struck up a pretty good relationship with our CTO. He and I had quite a few conversations about the why and wherefore of what we were doing, from a business rather than a development perspective. That has left me with an interest in the big picture.
Second, I got sucked pretty much from the get-go into ops. We called it system administration then.
For $reasons, I used to be one of the few developers who got root access to production machines in a bunch of companies.
This has helped me get a fairly good understanding of what operations require from developers. It's quite different from what developers require from each other.
In some ways, it's the exact opposite.
The way these two things fit together is through the nebulous concept of "processes". Most people seem to immediately associate the term with some ISO standard and certifications, followed by endless amounts of documentation, and conclude that processes must stand in the way of developers doing what they should be doing, which is programming.
I used to be like that.
So what does operations require from developers?
Developers often think of that in terms of "make things easy for ops", such as giving them a single binary to run instead of having them wade through configuration files. And to a degree, that's a fair enough goal to have for devs.
But the truth is that ops tend to have some coding ability, and there are a ton of configuration management tools out there and in active use.
Complex setups that can be scripted aren't all that complex.
In fact, what operations require *more* from software is adaptability. The way devs tend to run software is usually *not* the way ops need to run it. Simplicity can be a bit of an own goal.
Or, to put it differently, while ops are end-users, they're not to be confused with grandpa trying to figure out this new-fangled electronic mail thingimajig.
What ops require above anything else from devs is boring.
No, really, that's it. Software should be boring.
That is, it should be predictable.
Configuration file formats changing, interfaces changing, setup procedures changing, requirements changing - all of these things shouldn't happen.
Of course they will, and they will have to. That's where scripting helps ops out again. But that means that any change in what ops can reasonably expect must be documented and the documentation communicated in a manner that ops can deal with.
You cannot communicate this the day before a new release should go live.
Which brings me back to processes.
A process is nothing more than putting this communication in place.
The way this manifests is most typically through checklists. Follow steps a-z in your release from dev to ops.
But what these checklists *encapsulate* is the organisation's understanding of the needs of ops from devs (and vice versa, e.g. for bug reporting procedures).
QA is usually in the middle of this process, adding the required check mark on one of the boxes on the list.
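To make the checklist idea a bit more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the class names, the item descriptions, the owners and the version number are assumptions, not anything a particular organisation uses. The only point is that each box has an owner, and the release is not ready until every owner has ticked theirs.

```python
from dataclasses import dataclass, field


@dataclass
class ChecklistItem:
    description: str
    owner: str          # hypothetical role that signs this off: "dev", "qa", "ops"
    done: bool = False


@dataclass
class ReleaseChecklist:
    release: str
    items: list[ChecklistItem] = field(default_factory=list)

    def sign_off(self, description: str, owner: str) -> None:
        """Tick a box, but only if the party responsible for it does the ticking."""
        for item in self.items:
            if item.description == description:
                if item.owner != owner:
                    raise PermissionError(f"{owner} cannot sign off {description!r}")
                item.done = True
                return
        raise KeyError(description)

    def open_items(self) -> list[ChecklistItem]:
        """Everything that still blocks the hand-over."""
        return [item for item in self.items if not item.done]

    def ready(self) -> bool:
        return not self.open_items()


# Hypothetical release with one box per party.
checklist = ReleaseChecklist("2.4.0", [
    ChecklistItem("config format changes documented", "dev"),
    ChecklistItem("regression tests passed", "qa"),
    ChecklistItem("rollback procedure rehearsed", "ops"),
])
checklist.sign_off("config format changes documented", "dev")
checklist.sign_off("regression tests passed", "qa")

print(checklist.ready())                                   # ops has not signed yet
print([item.description for item in checklist.open_items()])
```

The list of open items is the useful part: it tells everyone what is still missing, and from whom, well before release day.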
Every organisation has a process for this. Every single one of them. Even if it's "oh I just upload that JS file to the production server", that's a process.
It's a shitty one, but it is one.
What is often missing are the checklists. And some structured discussions around how to get to checklists. All of which is shorthand for communication.
The discussions tend to lead to more or less the same place: you either need a gate, or you need recovery.
A gate is a kind of barrier in the process that is somewhat cumbersome to pass, but if you have the right key, passing isn't all that hard.
QA is such a gate. If you cannot pass QA's requirements - which are usually in the form of regression tests and new feature tests - then you cannot go to production.
Recovery refers to the set of procedures you have for rolling back to the last known state in case of a production failure.
Recovery tends to be good in orgs with weak gates and vice versa.
I'm a bit traditionally minded here, and personally prefer my gates to be strong. The industry as a whole tends to prefer strong recovery. There is almost always a mixture of the two employed.
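A rough sketch of how the two combine, in Python. The `Deployer` class, the `gate` and `healthy` callbacks and the version strings are all hypothetical; a real setup would run test suites and health checks instead of comparing strings. The gate decides whether a version may go live at all, and recovery falls back to the last known good version when a live version turns out to be unhealthy.

```python
from typing import Callable, Optional


class Deployer:
    def __init__(self, gate: Callable[[str], bool], healthy: Callable[[str], bool]):
        self.gate = gate                # e.g. "did QA pass this version?"
        self.healthy = healthy          # e.g. "does production look fine with it?"
        self.last_known_good: Optional[str] = None
        self.live: Optional[str] = None

    def deploy(self, version: str) -> str:
        # Gate: versions that fail it never reach production.
        if not self.gate(version):
            return "rejected"
        self.live = version
        if self.healthy(version):
            self.last_known_good = version
            return "deployed"
        # Recovery: roll back to the last known good state.
        self.live = self.last_known_good
        return "rolled back"


# Stand-in checks for illustration only.
deployer = Deployer(
    gate=lambda v: v != "1.1-untested",
    healthy=lambda v: v != "1.2-broken",
)

print(deployer.deploy("1.0"))           # passes the gate and stays healthy
print(deployer.deploy("1.1-untested"))  # stopped at the gate
print(deployer.deploy("1.2-broken"))    # passes the gate, fails in production
print(deployer.live)                    # production is back on the last good version
```

A strong gate makes the `healthy` branch rarely matter; strong recovery makes a leaky gate survivable. Most setups lean on one and keep a bit of the other.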
In the end, it barely matters. What matters is that the business goals are met reliably, and the delivery team is not unreasonably stressed out by their combined efforts.
What the whole thing requires regardless of the balance struck here is that developers take some ownership.
This isn't to say that developers "own" that everything goes smoothly. That would be an exaggeration.
But they cannot reasonably have the mindset that throwing code "over the wall" is their job. Their job is to provide to the next down the line - QA, ops, both - exactly what those parties require to do *their* respective jobs.
"Worked on my computer" is a symptom of this being broken, same as "no errors in the CI" or whatever form it takes.
I sometimes exaggerate this and say "your job is only done when the end user - aka grandpa - successfully uses it to solve their problems". This is more for illustration purposes than as a real assignment of responsibilities. Exaggeration is a bit double-edged; people can take it too seriously. But often enough it carries a general point across fairly well.
FOSS developers have a particularly hard time here. They are not part of an overall organization that provides the kind of feedback they need to optimize their output.
What I mean is, even if FOSS developers work towards their company internal goals and just publish software, the "organization" never includes all the users they end up factually delivering to, and communication with those users often relies on the users initiating it.
I don't know who uses my FOSS software, but they sure do.
As a FOSS developer, you have a few methods at your disposal to help you along here.
One is *prolific* documentation. But it doesn't just have to cover a lot of ground, it has to have entry points tailored to needs you may reasonably expect. The ops perspective is one such entry point.
Another is to double down on predictability. Infamously, Debian lags behind on up-to-date software and has slow "stable" release cycles. But it does just that, and it has generally earned them love from ops.
A third method is actually quite difficult to balance, but is still crucial: you have to bring the barriers to contribution way, way down.
Where this is often difficult is in bug reporting software. I see a lot of popular software using bots to classify bug reports according to developer needs, sometimes closing them automatically when the bot deems the report to be a duplicate or whatever.
There is limited developer bandwidth for dealing with reports, so they need screening of this kind.
Unfortunately, there is a negative result here for would-be contributors.
If people run into such bots, or excessive formalism in the report templates, or contributor agreements they need to sign before sending a patch, etc, etc, the most likely result is that they end up contributing less overall.
I can deal with all of that if it solves a problem I can't circumvent. But if it's easier to switch software than to contribute, well...
Lowering barriers may be better long-term.
If nothing else it signals "yes I want your contribution" even if one cannot deal with it immediately, as opposed to sending the signal that contribution is only appreciated conditionally.
This feeds into community management.
I believe - I have no data, sorry - that FOSS projects benefit strongly from human community managers that take care of this screening. And community management can in itself be a form of contribution.
See I started this thread on developers and ops interfacing, but...
... this goes right back to that beginning: FOSS developers that do not visibly invest effort into interfacing with the "next" down the chain, which in this case are random people picking your software up, do not do their job.
They throw code over the wall. They say "it works on my computer", or "in my use-case".
I sometimes hear - heard the other day - that it's the developer's spare time, so one can't make such demands on them.
There's some truth there, absolutely.
So let's say I volunteer for the local fire brigade. We have a lot of volunteer fire response in Germany in smaller towns, which is supplemented by professionals from neighbouring larger towns. Response time often trumps equipment and experience.
Let's say I volunteer, but I don't respond to alarms. Or I do, but I always turn up late.
On the one hand, I'm volunteering my time, so one cannot make demands of me, right?
In practice, there will always be reasons for not always being able to meet responsibilities, and users should be adequately accepting of those. Practice is always full of complications, on a case-by-case basis.
It's the mindset that matters.
@jens in my previous job, I had more or less paved the way to a form of devops, and was responsible for maintaining the build and deployment system and the packaging for customer deployments, and for educating the product devs on them.
Needless to say I had more than a few words with that one team leader who privately prided himself on not abiding by the process, and on being a rockstar "break stuff" type of diva dev (he was also ingraining that mentality into his team mates). That includes the time I had to educate him publicly on the all-devs mailing list about how to properly use the version control system so as not to hamper other teams' progress.
He also had his mind set on not reconciling conflicts regularly with his branch, and at one time was several *months* behind the main branch, which was of course against the agreed practices; I got vindicated when it took his team *two full weeks* to painfully merge their feature branch into the mainline, which I had (easily) predicted.
@jens I think the biggest problem is that most devs never work in ops.
A single coworker and I were ops in the past, and we both have a radically different perspective from our other dev peers.
We write things to make life easier for ops, and have ended up setting pro-ops policy for the rest of the team, especially since at least one of us is involved in every code review.
Of course, when you have a dev team with experience in ops, it quickly goes in the devops direction.
@urusan @jens As someone who's worked in ops for their entire career, and who has done development while doing that because things were thrown at us that didn't meet the needs of operations or customers, but "had to be deployed" because "I have no freaking idea at all, logic and common sense isn't something people do when choosing processes for operations or customers apparently"...
The number of developers who think that operations and customers are annoyances is far too high, and the number who wish to push problems that will be seen immediately by operations the second something hits 'production' to 'sometime in the future, not important' is far too high.
Devops just pushes that to production faster, meaning that if you're in devops, now the protections you had against some levels of stupidity and rushing headfirst over the cliff are removed, and you get to solve them with even less sleep.
All things in moderation. Devops + limits is ok.
I've read things from some people like "not allowing developers on production machines is evil!"
Those developers insisting on that NEVER answer the fscking phone when the customer can't get their things for their thing that will cost them all your business and theirs if it doesn't work, and why was it changed without notifying someone?
We used to say, when I was in the trucking industry, that regaining a lost customer costs you 7 times more than keeping them.
I've seen more customers leave due to developers pushing shit into production "because we have to fail fast!" (the mentality of 'getting investor cash' - not 'keeping a customer.')
And then the investor money is gone, and your customers don't trust you because you keep screwing up production, and suddenly no one gets paid, and the fail fast person left to get a higher paying job and you stuck around to keep the place running.
Fsck those devs.
The main plus side of devops is that it aligns the incentives of devs and ops, because we have to directly deal with any problems we make for ourselves. If anything, the gates are stronger because I (or the other devops guy) can smack down bad ideas earlier, at the very latest in code review.
It relies on a relatively rare combination of human resources: multiple developers with substantial ops experience.
It puts a heavy burden on the devops people (if they're doing their job correctly), and of course we aren't getting paid more. Ops work is also distracting, which breaks development flow, which is why they split in the first place.
A well designed system running on well designed infrastructure has minimal administrative overhead these days.
The problems arise from poorly designed systems and/or poorly designed or non-existent infrastructure, which is still the norm.
There'll always be a need for first responders as well.
Ops/infrastructure never get the attention they deserve, since they're seen by management as a cost center to be cut, when really they're the bedrock for the whole operation.
@urusan @Truck I find the point of view most helpful that sees DevOps as ops for devs. Which means it's an ops-driven thing, that just also takes dev needs into account. One example of that is actually containerisation; in itself it *complicates* ops.
But since container images can be deployed *and* given to devs, it's possible for devs to run more production-like systems in which to integrate their software.
That may be a simplified description, but works.
I immediately fixed this upon realizing it, but it snuck up on me as I hadn't been doing anything of any real importance.
@urusan @jens Yes. Devops CAN work, so long as people have a mindset that is operations _and_ development, and start the project with that, and stick through to the end and continue to work with it after deployment.
It becomes more difficult when any of those items are not the case. More than one, and we're talking exponentially harder.