I love Ansible. But its main use is in commissioning new systems, and commissioning new systems was never really the problem. We already had many ways to do it: cloning, golden images and so on.
Ansible gives the impression that you can translate your infrastructure to a code base, and that tickles developers' fancies. But maintenance is the hard bit, as it always was. It isn't appropriate to release an Ansible playbook all over your live, mission-critical server. The consequences of even a small bug are just too great. Even if Ansible was used to build the system in the first place, the server has since been subject to 12 months of entropy, and is now a stranger to the parent Ansible code.
Roll out is comparatively simple, and always was. What little complexity it contains is easily managed. Maintenance is more tricky.
Entropy is not the same as "uncontrolled changes". Entropy is also the build up of natural changes over time, eg. as files naturally grow and change, software ages, passwords change, and so on.
Eg. I need a new route adding to my critical server, which has 7 interfaces. Immutable infrastructure says I just roll out a new identical server but with the added route. However, that approach makes the large assumption no entropy has occurred which could impact the change. It also requires an interruption of service while a switchover is made to the new system. It also requires careful management of potential issues, eg. identical IP addresses, changing MAC addresses, changing system UUIDs, and much other stuff. In other words, it also requires a great deal more time, people and expense than the single command which could be used to add a route in the conventional way.
The moment a system is booted, it starts to change. Immutable?
The next buzzword?
I need a new route adding to my critical server, which has 7 interfaces…
Shame on you for wanting to do things with real hardware! DevOps wants you to virtualise everything and provision a single server for every service because the management is easier and who cares about silly things like interprocess communication?
@Charlie Clark Who said my 7 interface server was real ? It's virtual, and so are the interfaces. Same issues apply.
Whaaat? I love ansible too. I'm not sure you're using it right. I've *never* used it to provision systems, and I'm not positive that this is even one of its intended uses.
There should be no 'entropy' in the system state. You run ansible (or chef, or puppet, or your homegrown shell scripts) every X number of minutes out of cron (or whatever your scheduler of choice is), and let it pick up/trigger updates to pkgs, configs or data files from your CMDB's API for the specific host or hostgroup resources you've defined there. It's great for one-off adhoc runs of commands as well, but really no different than shell+ssh that we've been using for decades to do the same work.
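To make the "run it every X minutes and let it converge" idea concrete, here is a minimal sketch in Python. It is not how Ansible, Chef or Puppet work internally; the `desired` dictionary stands in for whatever a CMDB would return, and every key and value in it is made up for illustration. The scheduled job only touches settings that have drifted, and anything it doesn't manage is left alone.

```python
def plan_changes(desired: dict, actual: dict) -> dict:
    """Return only the managed settings whose actual value differs from the desired one."""
    return {key: value for key, value in desired.items() if actual.get(key) != value}


def converge(desired: dict, actual: dict) -> dict:
    """Apply the planned changes to `actual` and report what was changed."""
    changes = plan_changes(desired, actual)
    actual.update(changes)
    return changes


def demo():
    # Hypothetical desired state, standing in for a CMDB lookup for one host.
    desired = {"ntp_server": "ntp1.example.org", "sshd_permit_root_login": "no"}
    # Hypothetical live state: one managed setting has drifted, one key is unmanaged.
    actual = {
        "ntp_server": "ntp1.example.org",
        "sshd_permit_root_login": "yes",
        "motd": "hello",
    }
    first = converge(desired, actual)   # corrects the drifted setting
    second = converge(desired, actual)  # nothing left to do
    return first, second, actual


if __name__ == "__main__":
    print(demo())
```

Note that `motd` is never touched because it isn't in the desired state. A scheduled run only removes drift in the things you've actually declared.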
Don't get me wrong, it's not nearly as efficient or quick as working directly with shell and non-interactive ssh, issuing the one-off command or two that you're trying to execute, but seeing how most sysadmins I work with these days look at me cross-eyed with a "fuck you" grin on their face when I tell them to read the manual (*any* manual), I imagine it's somewhat of a necessity *these-days*. Let's be honest, the state of the art is in a bad way.
Where have you been? This has been a buzz-phrase for at least 5-6 years.
I will add, having infrastructure as code does add the one nice-to-have function of being able to diff your *complete infrastructure* and (hopefully if you do it right) be able to track it in your source control history. This is beautiful.
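As a rough illustration of what "diffing your infrastructure" buys you, here is a small Python sketch using the standard library's `difflib`. The file name and the route entries are invented for the example. In practice the diff would come from your source control tool against the real repo, not from a helper like this.

```python
import difflib


def infra_diff(old: str, new: str, name: str = "host_vars/db01.yml") -> list:
    """Return a unified diff between two versions of a host definition."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="a/" + name,
            tofile="b/" + name,
            lineterm="",
        )
    )


# Hypothetical "before" and "after" versions of one host's definition.
OLD = "routes:\n  - 10.0.0.0/8 via 192.0.2.1\n"
NEW = "routes:\n  - 10.0.0.0/8 via 192.0.2.1\n  - 172.16.0.0/12 via 192.0.2.1\n"

if __name__ == "__main__":
    print("\n".join(infra_diff(OLD, NEW)))
```

The point is that the change (one added route) shows up as a single reviewable line, with history, instead of living only on the box.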
Well, one of the ways you're "meant to do it" is to use Ansible for *all* changes. This is more or less how we run the Fedora infrastructure, for instance:
if you want to change the configuration of a server, you commit your change to that repo and run the playbook. Always. You don't go in and start poking around interactively.
But really, that's just one pattern. I wouldn't say Ansible has a "main use"; different people use it for lots of different things. It's a tool, and upstream's pretty agnostic about how it's supposed to be used.
Tasks in ansible are idempotent. You should be able to run them repeatedly without altering state unless your config item has changed locally or in your repo.
I've had multiple ansible playbooks run thousands of times against hosts without a single change.
That said, I don't yet have the confidence or balls to do that on a core DB server, uhuh.
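For anyone unsure what "idempotent" means in practice, a minimal sketch in Python (this is not Ansible's own code, and `ensure_line` is a made-up helper): the task checks the current state first and only writes when something is missing, so repeat runs report no change.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file; return True only if the file changed."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already in the desired state, so do nothing
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


def demo() -> list:
    """Run the same task three times against a scratch file and collect the 'changed' flags."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "resolv.conf"  # scratch file, not the real one
        return [ensure_line(target, "nameserver 192.0.2.53") for _ in range(3)]


if __name__ == "__main__":
    print(demo())  # [True, False, False]
```

Only the first run changes anything. That is the property that makes repeated runs safe in principle. It says nothing about whether the declared state is the right one for a host that has drifted.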
"When you automate, you accelerate."
Fine, unless there's a brick wall or cliff ahead of you.
While I agree the single largest issue for most SA's *was* provisioning, I'll note that the idea of 'immutable infrastructure' is *not* that the (system never changes). It is an alteration in the way one thinks of infrastructure. If one thinks of the (Physical host/VM OS, baseline services i.e. DNS, auth, mail, logging, ntpd, network, storage, FS layout) as infrastructure, they *can* become immutable.
You put the (client/guest/application/db/service) into its *own* box, separate from the infra box, then automation starts to be massively effective. -- no, one does not roll things out across one's prod systems without testing, but at least once one starts thinking in the context of managing 8K or more systems with only 32 people (and I'm talking Director, managers, PA's OS, platform, tools network/storage team), you learn to box things up nicely, separate what is test/dev/qa/prod, what can be handled in a herd and what needs to be snowflaked.
While it has been a while, we flipped the switch on DNS servers, migrating from an old set of hosts to a new fancy shmancy cluster of application servers for DNS management, as a single change to just over 5,900 *servers* and some 12k desktops. In about an hour, including final validations. Automation made the system changes *and* provided the validation results.
The *testing* rollouts, executed 5 or 6 times prior, on limited targets, gave us the comfort to do that. And automation made backing out the *tests* just as easy. Total time? About 18 hours, 11 of which were spent on the paperwork and conf calls to get the change approved.
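The pattern described above (change a limited set of targets, validate, back out if validation fails, only then go wide) can be sketched roughly as follows. This is an illustration in Python, not the tooling we actually used. The host names, the `apply`/`validate`/`rollback` callables and the two-batch split are all assumptions made for the example.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    hosts: Sequence[str],
    apply: Callable[[str], None],
    validate: Callable[[str], bool],
    rollback: Callable[[str], None],
    first_batch: int = 2,
) -> List[str]:
    """Change a small batch first, then the rest; back out a whole batch if any host fails validation.

    Returns the hosts that were left in the changed state.
    """
    done: List[str] = []
    for batch in (hosts[:first_batch], hosts[first_batch:]):
        for host in batch:
            apply(host)
        if any(not validate(host) for host in batch):
            for host in batch:
                rollback(host)  # automation backs the batch out again
            break  # stop here and investigate before going any wider
        done.extend(batch)
    return done


def demo():
    # Hypothetical fleet: every host starts on the old resolver.
    resolver = {h: "old-dns" for h in ("test01", "test02", "prod01", "prod02")}
    broken = {"prod02"}  # pretend this host fails validation after the change

    def apply(host: str) -> None:
        resolver[host] = "new-dns"

    def validate(host: str) -> bool:
        return host not in broken and resolver[host] == "new-dns"

    def rollback(host: str) -> None:
        resolver[host] = "old-dns"

    changed = staged_rollout(list(resolver), apply, validate, rollback)
    return changed, resolver


if __name__ == "__main__":
    print(demo())
```

In the demo the test batch goes through and stays changed, while the second batch is backed out as a unit because one host fails validation.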
I'll guess that you've never had the opportunity as an SA to point out that a vendor is using *default dumb* in installation instructions for their application. I've found *VERY* few application vendors that have *default intelligent* installations. To make your environment manageable, as an SA you have to be an utter prick and override the "Install to root" "Run as Root" "well we have to do this because we don't know" crap. And willing to take the time to fix it so that it does work once in the box.
"While I agree the single largest issue for most SA's *was* provisioning,..."
Eh? I have just written 2 posts arguing the opposite. Who are you agreeing with?
Ansible considered harmful?
What I don't get about Ansible is that they encode Turing-complete logic in a system that REALLY does not want to support testing. ???
Death to CM!
Out of all the config management stuff Ansible is certainly one of the best, if not the best.
Other systems like Puppet and Chef suffered from bloat and complexity (and stupid agent-based nonsense). When Ansible arrived it was a breath of fresh air - a relatively simple approach to config management and one that encouraged you to keep things simple.
Unfortunately Ansible's gone the way of other CM systems - more modules are being piled in, increasing the complexity. A rapid development cycle often breaks compatibility between one release and another either intentionally or down to bugs.
The documentation on the site is pretty poor too; this is quite common with a lot of modern 'devop' tools. (Hashicorp is a particular offender).
The basic fact is that configuration management is, and always will be, difficult when run against long-lived instances. As someone above mentioned, you'll inevitably encounter a time when running Ansible/Chef/whatever against a server will break something.
You'll also find that your CM code becomes more convoluted and harder to maintain over time.
This sort of thing leads to the scenario where you need to make a change to your infrastructure; you _could_ do it in your CM system, but that might take a day or two with updating, debugging and refactoring your CM code (often to find it doesn't work due to months-old bugs).
Then you're a bit scared to run the stuff against your systems because you can't truly be sure of the outcome.
Or you could do it manually in less than an hour with much less risk of things breaking.
Personally I'm fully embracing containerisation; it still leaves the infrastructure provisioning, but with things like Docker for AWS that becomes much less of a headache.
Why the focus on speed?
It seems to me that the major benefit of these automated management systems is not speed of action but a combination of consistency and concentration of expertise.
I think the virtue of consistency speaks for itself but whilst these management systems do not have any inbuilt expertise they allow the expertise of those techs who do have it to be projected beyond their physical presence, as it were; the people who do know how to do things properly don't have to do everything personally and individually. Thus, the expertise is put in to the management system and the management system then distributes it.
Burying the lede
It was kinda funny to find this way down the bottom:
"And Red Hat Ansible Tower 3.2, coming later this month, was announced along with the open source AWX project, the upstream version of Ansible Tower."
when I know that, to the Ansible folks, that was by far the biggest hairy deal in this whole event (note: I work for RH, in a different department). Yup, that basically means we open sourced Tower, like we said we were going to. Here it is, have fun:
Re: Burying the lede
Do they (RH) think folks actually *like* to use Tower? The open source simple toolset is good enough for most Open-source-heavy shops.
Re: Burying the lede
Depends on the user, really. Again I'm not in that department, but AIUI, Tower is a very big deal for some customers. And then again, as you say, some users are fine with just the open source Ansible proper (this is all we use in Fedora). Depends on your scale and particular setup, I guess.
I have two primary use cases for Ansible - 1) provisioning of (generally virtual) infrastructure, and 2) (re)deployment of applications. While the former is invaluable to me (I intend future dev environments to be Ansible provisions rather than Vagrant, which is proving increasingly problematic), application deployment takes the lion's share of usage. I can tag and deploy an application, building it from Git if necessary, with a single command. The beauty is that by just changing the configuration in the inventory I can do this from a dev environment, on to staging, and thenceforth production. In a microservices environment blue/green deployments are easily facilitated.
The real win though is that the playbooks become the infrastructure documentation, so there's no longer any need to keep a project wiki, if anything even more vulnerable to entropy, up to date. Anybody else on the team can just as easily and quickly deploy the application as me.
In your devops team, I am sure there are qualified software engineers. But how many team members have years of experience as real, full-time systems administrators?
Right. There aren't as many sysadmins in the world as there are devs. Devops teams tend to be lopsided. A better description would be devdevdevdevdevops. So you end up with a polished and elegant Atlassian infrastructure, but everyone thinks that "Dirty Cow" is some James Herriot thing. Get more oppy. Your infrastructure will thank you.
I'm not sure what the solution is. Giving me a rate rise would be a good start.
Huh. I'm finding the opposite. Most places saying devops seem to want a sysadmin that knows Chef. As a SW guy, I view devops as creating SW tools based around the expertise of the sysadmin. By population, I expect to see more dev than ops. But those ops guys are the ones whose brains have the data that we are pushing into code.