Second, I got sucked pretty much from the get-go into ops. We called it system administration then.
For $reasons, I used to be one of the few developers who got root access to production machines in a bunch of companies.
This has helped me get a fairly good understanding of what operations require from developers. It's quite different from what developers require from each other.
In some ways, it's the exact opposite.
The way these two things fit together is through the nebulous concept of "processes". Most people seem to immediately associate that with some ISO standard and certifications, followed by an endless amount of documentation, which must lead to the conclusion that processes stand in the way of developers doing what they should be doing, which is programming.
I used to be like that.
So what does operations require from developers?
Developers often think of that in terms of "make things easy for ops", such as giving them a single binary to run instead of having them wade through configuration files. And to a degree, that's a fair enough goal to have for devs.
But the truth is that ops tend to have some coding ability, and there are a ton of configuration management tools out there and in active use.
Complex setups that can be scripted aren't all that complex.
In fact, what operations require *more* from software is adaptability. The way devs tend to run software is usually *not* the way ops need to run it. Simplicity can be a bit of an own goal.
Or, to put it differently, while ops are end-users, they're not to be confused with grandpa trying to figure out this new-fangled electronic mail thingimajig.
What ops require above anything else from devs is boring.
No, really, that's it. Software should be boring.
That is, it should be predictable.
Configuration file formats changing, interfaces changing, setup procedures changing, requirements changing - all of these things shouldn't happen.
Of course they will, and they will have to. That's where scripting helps ops out again. But that means that any change in what ops can reasonably expect must be documented and the documentation communicated in a manner that ops can deal with.
You cannot communicate this the day before a new release should go live.
Which brings me back to processes.
A process is nothing more than putting this communication in place.
The way this manifests is most typically through checklists. Follow steps a-z in your release from dev to ops.
But what these checklists *encapsulate* is the organisation's understanding of the needs of ops from devs (and vice versa, e.g. for bug reporting procedures).
QA is usually in the middle of this process, adding the required check mark on one of the boxes on the list.
Every organisation has a process for this. Every single one of them. Even if it's "oh I just upload that JS file to the production server", that's a process.
It's a shitty one, but it is one.
What is often missing are the checklists. And some structured discussions around how to get to checklists. All of which is shorthand for communication.
The discussions tend to lead to more or less the same place: you either need a gate, or you need recovery.
A gate is a kind of barrier in the process that is somewhat cumbersome to pass, but if you have the right key, passing isn't all that hard.
QA is such a gate. If you cannot pass QA's requirements - which usually take the form of regression tests and new feature tests - then you cannot go to production.
Recovery refers to the set of procedures you have for rolling back to the last known state in case of a production failure.
Recovery tends to be good in orgs with weak gates and vice versa.
I'm a bit traditionally minded here, and personally prefer my gates to be strong. The industry as a whole tends to prefer strong recovery. There is almost always a mixture of the two employed.
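To make the mixture a bit more concrete, here is a minimal sketch of a release step that has both a gate and a recovery path. Everything in it is hypothetical and only for illustration: the `Release` type, the `passes_gate` and `deploy` functions, and the `healthy` callback are made-up names, not any real tool's API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Release:
    """A candidate release plus the checklist items it must pass (hypothetical)."""
    version: str
    checks: List[Callable[[], bool]] = field(default_factory=list)


def passes_gate(release: Release) -> bool:
    # The gate: every item on the checklist must be ticked off.
    return all(check() for check in release.checks)


def deploy(release: Release, current: str, healthy: Callable[[str], bool]) -> str:
    """Return the version that is running after the deployment attempt."""
    if not passes_gate(release):
        # Gate closed: production stays on the current version.
        return current
    if not healthy(release.version):
        # Recovery: roll back to the last known good version.
        return current
    return release.version
```

A strong gate means a long `checks` list and a rarely used rollback branch; strong recovery means few checks and a well-exercised rollback branch. Either way, production ends up on a known version.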
In the end, it barely matters. What matters is that the business goals are met reliably, and the delivery team is not unreasonably stressed out by their combined efforts.
What the whole thing requires regardless of the balance struck here is that developers take some ownership.
This isn't to say that developers "own" that everything goes smoothly. That would be an exaggeration.
But they cannot reasonably have the mindset that throwing code "over the wall" is their job. Their job is to provide the next party down the line - QA, ops, both - with exactly what that party requires to do *its* job.
"Worked on my computer" is a symptom of this being broken, same as "no errors in the CI" or whatever form it takes.
I sometimes exaggerate this and say "your job is only done when the end user - aka grandpa - successfully uses it to solve their problems". This is more for illustration purposes than as a real assignment of responsibilities. Exaggeration is a bit double-edged: people can take it too seriously. But often enough it carries a general point across fairly well.
FOSS developers have a particularly hard time here. They are not part of an overall organization that provides the kind of feedback they need to optimize their output.
What I mean is, even if FOSS developers work towards their company-internal goals and just publish software, the "organization" never includes all the users they factually end up delivering to, and communication with those users often relies on the users initiating it.
I don't know who uses my FOSS software, but they sure do.
As a FOSS developer, you have a few methods at your disposal to help you along here.
One is *prolific* documentation. But it doesn't just have to cover a lot of ground, it has to have entry points tailored to needs you may reasonably expect. The ops perspective is one such entry point.
Another is to double down on predictability. Infamously, Debian lags behind on up-to-date software and has slow "stable" release cycles. But it delivers just that predictability, and that has generally earned it love from ops.
A third method is actually quite difficult to balance, but is still crucial: you have to bring the barriers to contribution way, way down.
Where this is often difficult is in bug reporting software. I see a lot of popular software using bots to classify bug reports according to developer needs, sometimes closing them automatically when the bot deems the report to be a duplicate or whatever.
There is limited developer bandwidth for dealing with reports, so they need screening of this kind.
Unfortunately, there is a negative result here for would-be contributors.
If would-be contributors run into such bots, or excessive formalism in the report templates, or contributor agreements they need to sign before sending a patch, and so on, the most likely result is that people end up contributing less overall.
I can deal with all of that if it solves a problem I can't circumvent. But if it's easier to switch software than to contribute, well...
Lowering barriers may be better long-term.
If nothing else it signals "yes I want your contribution" even if one cannot deal with it immediately, as opposed to sending the signal that contribution is only appreciated conditionally.
This feeds into community management.
I believe - I have no data, sorry - that FOSS projects benefit strongly from human community managers that take care of this screening. And community management can in itself be a form of contribution.
See I started this thread on developers and ops interfacing, but...
... this goes right back to that beginning: FOSS developers that do not visibly invest effort into interfacing with the "next" down the chain, which in this case are random people picking your software up, do not do their job.
They throw code over the wall. They say "it works on my computer", or "in my use-case".
I sometimes hear - heard the other day - that it's the developer's spare time, so one can't make such demands on them.
There's some truth there, absolutely.
So let's say I volunteer for the local fire brigade. We have a lot of volunteer fire response in Germany in smaller towns, which is supplemented by professionals from neighbouring larger towns. Response time often trumps equipment and experience.
Let's say I volunteer, but I don't respond to alarms. Or I do, but I always turn up late.
On the one hand, I'm volunteering my time, so one cannot make demands of me, right?
But the kicker is, I am not actually volunteering for turning up in a red suit with a fire hose. That may be what it looks like, sure.
What I'm volunteering for is a job, one that comprises both rights and responsibilities. If I ignore the responsibilities, people will rightly be upset.
So it is with the role of FOSS developer. Nobody is asking for your coding time. Everyone is asking for your ability to solve user needs. Ignoring the user needs is not doing your *voluntary* job.
@jens what i often see, to overstretch your metaphors, is people being paid to work for the fire brigade in the big city next to your little village and they come in with demands to your little volunteer fire brigade
i co-built a community of volunteer maintainers of puppet modules and i had someone at a conference come up to me and brag that he used us to train his junior engineers
I'm a savvy user, but know nothing about programming. Let me give you my take on this, OK?
Your job is only done when your software allows me to do more in less time than not having your software allows.
Most software does not pass this test. Indeed I think in my entire life of using ever-increasingly computerized systems, I can count on one hand the cases where computers have improved things.
Yes. It is that stark.
As an example, at work we have a contact management system. Contact management is absolutely **KEY** to our business. If we screw this up, our business dies.
The contact management system, installed and configured by conslutants (sic) of the worst kind, is so unreliable that literally everybody who is client-facing prints off key clients into special notebook pages so that we can use those instead of this system.
While this system is the most egregious of the lot, even a lot of COTS software we use is utter crap. I have a rhythm of automatically hitting Ctrl+S while working every five minutes or so, no matter what software I'm using (except that crap management system which manages to screw even "save" up…) because out of nowhere the computer might suddenly decide that it's had enough and will throw a tantrum, losing any unsaved work.
I tend to think of this as more of a joint responsibility for the organizations that produce your software, while the developer's role is more limited. They have to do their part of the job to enable everyone else to, ultimately, fulfill this at the organizational level.
But as a shorthand, it absolutely is that stark.
@zdl Yes, but software can crash on particular inputs or on particular systems. It's unrealistic to expect developer-as-person to cover all of those, but developer-as-organization should at least reach a high percentage.
I'm looking more at developer-as-person here. Your requirement can be met by developer-as-organization without much involvement from developer-as-person.
Weird terminology, but maybe that makes sense?
@jens I guess? I've no idea what goes into making this stuff (no matter how much SO tries to explain—he's an electrical engineer and works with this—it goes in one ear and out the other, likely slipping out while I yawn ;)).
@zdl TL;DR is, there's an entire supply chain of specialized jobs involved.
It used to be so much simpler :)
Yet, cars still crash...
because they're on roads, and roads are unpredictable.
Put a car near a tree, tree gets hit by lightning, falls on car. Car has been involved in a crash.
Now, why didn't the designers have a fix for this? I paid good money for that car! Why can't car people DO THEIR JOBS?
Because *no one* can predict everything that will happen at all moments.
You can work very hard and try to predict everything... but as mentioned in parables...
build an idiot proof thing, they build a better idiot.
Cars crash when in unusual circumstances and/or when the driver is taking unusual actions.
Computers crash sitting there doing nothing.
There is absolutely no comparison. Any car that was as unreliable as the **best** of the software that I use every day would be, in Canada, returnable under the so-called "lemon laws".
@zdl This is Off-Topic but - the deliberate use of the slur "slut" here seems unnecessary and offensive. If you didn't like the consultants I'm sure there's a way to communicate that, without adding to an ancient culture of violence against women.
@jens i wrote this a long time ago, https://igalic.co/thoughts/2015-12-03-software-vs-automation.html i think it still mostly holds up