Aleksey Tsalolikhin, 9 July 2010
On Thursday, June 24th, the USENIX 2010 conference in Boston hosted the first Configuration Management Summit on automating system administration using open source configuration management tools. The summit brought together developers, power users and new adopters.
There are over a dozen different CM tools actively used in production.
So many choices can bewilder a sysadmin searching for a CM tool.
The summit included representatives for 4 tools: Bcfg2 (pronounced "bee-config 2"), Cfengine, Chef and Puppet.
The summit had three parts: 4 presentations; a panel session; and a mini BarCamp with 6 presentations. The panel session was quite lively.
I will attempt to compare and contrast the 4 tools; however, using any robust configuration management tool, with discipline, is better than administering systems manually.
Bcfg2
Bcfg2 was represented by its creator, Sr. Systems Administrator Narayan Desai.
Some notes on the name: pronounced "bee-config two". The "B" in Bcfg2 originally stood for "bundle"; now it doesn’t stand for anything. The 2 indicates version 2.
Narayan thinks of configuration management as an API for programming your configuration.
Goals of Bcfg2’s design:
- Efficient representation of diverse configuration.
- Scalability to thousands of nodes.
- Programmability (ability to insert code, not just declarative descriptions of desired configuration).
- Model configuration in simple, unambiguous terms. (No way to end up with two different models of the same configuration.) "Closing the loops between goals and reality" is very important in Bcfg2. To enable this, there is a lot of reporting built in; the client reports its state upstream to the server to enable "gap analysis". This is a very popular feature.
- Support extensive configuration debugging. Help the sysadmin get to the bottom of things quickly. Bcfg2 has full system introspection capability (why Bcfg2 is making the decisions that it is). Many people think the debugger is the coolest feature of Bcfg2.
- The Bcfg2 client can be run in dry-run (no changes), interactive (are you sure you want to do this?) and non-interactive modes, which facilitates learning it.
- Composition of information from a number of sources. (For example, put together different policies from different organizational sources such as organizational policy (FTP should not be running) and departmental policy (all user home directories must be NFS-mounted from fileserver HAPPYHOME).)
- Expose plugin API to all aspects of the configuration process to enable handling of corner/edge cases. (Flexibility is an important part of Bcfg2’s design.)
Because Bcfg2 can capture the entire configuration of a system into a model you can diff the goals model (the desired state) and the current model and find out your degree of deviation, or you can diff two current models (Server A and Server B) and find out what’s different about their configuration.
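The diffing idea can be illustrated with a rough Python sketch. It assumes a configuration model can be reduced to a flat mapping of entry names to values; the function name `diff_models` and the sample entries are hypothetical and are not part of Bcfg2.

```python
def diff_models(goal, current):
    """Compare two configuration models given as flat dicts.

    Returns a dict mapping each differing entry to a (goal, current) pair;
    None means the entry is absent from that model.
    """
    deviations = {}
    for key in sorted(set(goal) | set(current)):
        wanted, found = goal.get(key), current.get(key)
        if wanted != found:
            deviations[key] = (wanted, found)
    return deviations


# Hypothetical models: entry name -> desired or observed value.
goal = {"Package:postfix": "installed", "Service:ftp": "off"}
current = {"Package:postfix": "installed", "Service:ftp": "on",
           "Package:telnet": "installed"}

for entry, (wanted, found) in diff_models(goal, current).items():
    print(f"{entry}: goal={wanted!r} current={found!r}")
```

The same function works for comparing two current models (Server A against Server B): pass the two observed mappings instead of a goal and an observation.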
Bcfg2 is very flexible: for example, you can designate a node as an exemplar, and the Bcfg2 server will copy its config to other nodes.
<Base>
  <Group name='Mail-server'>
    <Package name='postfix'/>
  </Group>
</Base>

<Base>
  <Group name='Web-server'>
    <Package name='apache2'/>
    <Package name='apache2-mod_php'/>
    <Package name='php5'/>
  </Group>
</Base>
Generate /etc/motd (message of the day) file that describes the system in terms of its Bcfg2 metadata and probe responses.
Here is the template (stored on the server):
------------------------------------------------------------------------
GOALS FOR SERVER MANAGED BY BCFG2
------------------------------------------------------------------------
Hostname is ${metadata.hostname}
Groups:
{% for group in metadata.groups %}\
* ${group}
{% end %}\
{% if metadata.categories %}\
Categories:
{% for category in metadata.categories %}\
* ${category}
{% end %}\
{% end %}\
{% if metadata.Probes %}\
Probes:
{% for probe, value in metadata.Probes.iteritems() %}\
* ${probe} \
${value}
{% end %}\
{% end %}\
-------------------------------------------------------------------------
ITOPS MOTD
-------------------------------------------------------------------------
Please create a Ticket for any system level changes you need from IT.
This template gets the hostname, the group membership of the host, the categories of the host (if any), and the results of probes on the host (if any). The template wraps this in a header and footer that make it more visually appealing.
One possible output of this template would be:
------------------------------------------------------------------------
GOALS FOR SERVER MANAGED BY BCFG2
------------------------------------------------------------------------
Hostname is cobra.example.com
Groups:
* oracle-server
* centos5-5.2
* centos5
* redhat
* x86_64
* sys-vmware
Categories:
* os-variant
* os
* database-server
* os-version
Probes:
* arch x86_64
* network intranet_network
* diskspace Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
18G 2.1G 15G 13% /
/dev/sda1 99M 13M 82M 13% /boot
tmpfs 3.8G 0 3.8G 0% /dev/shm
/dev/mapper/mhcdbo-clear
1.5T 198M 1.5T 1% /mnt/san-oracle
* virtual vmware
-------------------------------------------------------------------------
ITOPS MOTD
-------------------------------------------------------------------------
Please create a Ticket for any system level changes you need from IT.
Screenshots of Reporting System
Cfengine
Mark Burgess, the author of Cfengine, started his presentation by re-focusing the projector. The image was not blurry to start with, just not completely in focus, and I had no trouble reading the prior presenter’s slides; but it was very crisp after the adjustment!
Such attention to detail inspired my confidence.
Cfengine is the granddaddy of open source configuration management tools, dating back to 1993, when Mark worked as a part-time sysadmin at the University of Oslo, struggling with handling many different kinds of Unix and Unix-like systems.
Mark describes Cfengine as an agent-based change management system with "convergent" or "self-healing" behavior. (Cfengine will continuously return a system to the configured state or keep it there if it’s already there. Put another way, regardless of where you start from, you can always get to the defined state.)
The Cfengine language is a largely declarative language for describing desired or "promised" states. Like Bcfg2, the language is a pragmatic mix of declarative and procedural.
Cfengine includes a self-learning monitoring framework (to deal with an unknown environment) and a knowledge management framework (to help handle complexity of system configuration). Cfengine introduced the idea of "classes", which are patterns in space and time and implicit if/then tests.
You can use Boolean logic with classes to select systems for a configuration promise. For example: Linux servers on x86 platform with -dev in the name should have their OS updated on Sunday at 3 AM.
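As a rough illustration of class-based selection, here is a Python sketch (not Cfengine syntax). It assumes the agent has already worked out which classes are defined for the host at this moment; the helper `applies` and the class name `dev_host` (standing in for "-dev in the name") are hypothetical.

```python
def applies(defined, all_of=(), none_of=()):
    """Return True when every class in all_of is defined and none in none_of is."""
    return set(all_of) <= defined and not (set(none_of) & defined)


# Classes combine "space" (what the host is) and "time" (when the agent runs).
wanted = ["linux", "x86_64", "dev_host", "Sunday", "Hr03"]

sunday_3am = {"linux", "x86_64", "dev_host", "Sunday", "Hr03"}
monday_noon = {"linux", "x86_64", "dev_host", "Monday", "Hr12"}

print(applies(sunday_3am, all_of=wanted))
print(applies(monday_noon, all_of=wanted))
```

The promise to update the OS would only be acted on for the first set of classes; on Monday at noon the same host simply skips it.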
Cfengine is model-based in the sense that you describe the model of the end state that you want.
Cfengine is self-documenting because you are using a declarative language.
Cfengine is lightweight (1.9 MB footprint). It has very few prerequisites (Berkeley DB library, crypto library and optional PCRE library). Today, Cfengine runs on everything from unmanned underwater vehicles to Nokia handheld phones to supercomputing clusters.
Because it’s a C binary with very few prerequisites, it has the largest span of systems it runs on out of all the open source CM tools.
At first, Cfengine was modeled as a computer immune system, helping a system stay healthy in an uncertain, changing, and possibly hostile environment.
The current philosophy of Cfengine is "promise theory", where the defined state is promised by different system components (such as files, packages, processes, etc.), and Cfengine is a "promise engine" — an engine for keeping promises.
- Voluntary cooperation, local autonomy. Cfengine allows local control of policy in anticipation of consensus building amongst human administrators. Voluntary cooperation is expected, so Cfengine always pulls policy and never pushes it. A policy push is indistinguishable from an attack. (Owing to this principle, Cfengine has had only 3 security vulnerabilities in 17 years.)
- Pragmatism. Work with what you’ve got: allow shell commands.
- Resilience. Expect the unpredictable (hence convergence back to the promised state). The design allows for a single point of control without a single point of failure. (If a policy server goes away, the Cfengine agents on nodes will keep running using cached policy.)
- Allow freedom. For example, allow use of package systems. Cfengine is about "constraint", not "control". The philosophy behind this is, "You do not control environments, you participate in environments."
- Convergence. Run Cfengine many times and the system should always get better and never get worse. Always move closer to the promised state, or stay there. Stay there by always trying to move closer to it. (This counteracts the natural force of entropy which would result in system state drift over time.)
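Convergence can be sketched in a few lines of Python. This is only an illustration of the idea of a repeatable, self-correcting operation, not how Cfengine is implemented; the helper `ensure_line` and the sample file are hypothetical.

```python
import os
import tempfile


def ensure_line(path, line):
    """Make sure `line` is present in the file at `path`.

    Returns True if a repair was made, False if the file already complied.
    """
    existing = []
    if os.path.exists(path):
        with open(path) as handle:
            existing = handle.read().splitlines()
    if line in existing:
        return False  # already in the promised state: do nothing
    with open(path, "a") as handle:
        handle.write(line + "\n")
    return True


path = os.path.join(tempfile.mkdtemp(), "sshd_config")
first_run = ensure_line(path, "PermitRootLogin no")   # repairs the file
second_run = ensure_line(path, "PermitRootLogin no")  # finds nothing to do
print(first_run, second_run)
```

However many times the operation runs, the file ends up in the same state, and a run that finds the state already correct changes nothing.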
Promise Theory is based on the key principles of convergence and autonomy.
Everything is a promise in the Cfengine language. Files promise to be there (and are created or copied by Cfengine if they are not); packages promise to be installed; processes promise to be running.
Cfengine configuration is composed of promises and patterns. A class is an example of a pattern; a list of packages to be installed is another (see example below).
Another practical pattern in Cfengine is abstraction of promise details so you can see at a glance what is promised, and can still drill down if necessary to get the promise details.
For example:
copy_from => my_secure_cp("myfile","myserver")

body copy_from my_secure_cp(file,server)
{
  source => "$(file)";
  servers => { "$(server)" };
  compare => "digest";
  encrypt => "true";
  verify => "true";
  force_ipv4 => "false";
  collapse_destination_dir => "false";
  copy_size => irange("0","50000");
  findertype => "MacOSX";
  # etc etc
}
packages:
  "postfix"
    package_policy => "add",
    package_method => yum;

First, create a variable of type "list of strings" (slist) named match_package.
Second, use an implicit loop over each element of the list (as in Perl, @var is an array and $var is a scalar/string), and promise that each package is added using YUM.
Loops are implicit in Cfengine; this is a powerful abstraction.

vars:
  "match_package" slist => {
    "apache2",
    "apache2-mod_php5",
    "php5"
  };
packages:
  "$(match_package)"
    package_policy => "add",
    package_method => yum;
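To make the implicit loop concrete, here is a small Python sketch of the expansion. It is an illustration only and assumes a promise can be modelled as a promiser string plus a dict of attributes; `expand_promise` is a hypothetical helper, not a Cfengine function.

```python
def expand_promise(promiser, attributes, variables):
    """Expand a "$(name)" list reference into one promise per list element."""
    if promiser.startswith("$(") and promiser.endswith(")"):
        values = variables[promiser[2:-1]]  # look up the list by name
    else:
        values = [promiser]                 # plain string: a single promise
    return [dict(attributes, promiser=value) for value in values]


variables = {"match_package": ["apache2", "apache2-mod_php5", "php5"]}
promises = expand_promise(
    "$(match_package)",
    {"package_policy": "add", "package_method": "yum"},
    variables,
)
for promise in promises:
    print(promise)
```

One written promise becomes three concrete promises, one per package, without any explicit loop in the policy itself.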
Chef
Aaron Peterson, a seasoned systems engineer and Technical Evangelist at Opscode, Inc., presented. Chef is primarily a configuration management library system and a system integration platform (helping to integrate new systems into existing platforms).
Chef grew out of power user dissatisfaction with aspects of Puppet. Made available in 2009, Chef is in beta (version 0.9.x.x) and is settling down now.
- The cloud is the future; be able to operate in the cloud. (For example, automated provisioning of new server instances.)
- Fully automated infrastructure is hard. Make it easier through configuration sharing and re-use. Chef is a library for CM (or a CM system built on that library).
- Facilitate integration of new servers into an existing platform.
- Idempotence. Describe the end state, and Chef will get you there and keep you there. Chef only takes action when the current state differs from the described end state.
- Reasonability (easy to think about).
- Sane defaults (yet easily changed).
- Hackability (easy to extend).
- TMTOWTDI (There Is More Than One Way To Do It).
- Pragmatism. Chef’s language is Ruby with a DSL on top, so you can include regular programming.
- Enable infrastructure as code to benefit from software engineering practices such as agile methodologies, code sharing through GitHub, release management, etc.
- Enable you to solve your problem.
- Data-driven. Configuration is just data.
- Cultivate a culture of quality and reusability through Chef cookbooks. Example freely available cookbooks: Postgresql, tomcat6, Apache2, Kickstart, OpenSSL, etc.
- Chef exposes data and behavior over HTTP to enable integration with external tools.
- Recipes are run in order. Nodes have a run list: what roles or recipes to apply, in order.
"run_list": [ "role[webserver]", "role[database_master]", "role[development]" ]
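The ordering can be illustrated with a Python sketch that flattens a run list into recipes. It is a simplified model, not Chef's implementation: the role contents in `ROLES` are invented for the example, and the rule that a recipe reached twice is only kept at its first position is my assumption about the intended behavior.

```python
import re

ROLES = {  # hypothetical role definitions: role name -> ordered run list
    "webserver": ["recipe[apache2]", "recipe[php5]"],
    "database_master": ["recipe[postgresql]"],
    "development": ["role[webserver]", "recipe[git]"],
}


def expand_run_list(run_list, roles):
    """Flatten roles into an ordered list of recipe names, skipping repeats."""
    recipes = []
    for item in run_list:
        kind, name = re.fullmatch(r"(role|recipe)\[(.+)\]", item).groups()
        expanded = expand_run_list(roles[name], roles) if kind == "role" else [name]
        for recipe in expanded:
            if recipe not in recipes:  # keep only the first occurrence
                recipes.append(recipe)
    return recipes


run_list = ["role[webserver]", "role[database_master]", "role[development]"]
print(expand_run_list(run_list, ROLES))
```

Because the result is an ordered list, moving a role earlier or later in the run list changes when its recipes are applied.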
Chef: Infrastructure is Code
package "postfix" do
  action :install
end

package "apache2" do
  action :install
end
package "apache2-mod_php5" do
  action :install
end
package "php5" do
  action :install
end

package "sudo" do
  action :upgrade
end
template "/etc/sudoers" do
  source "sudoers.erb"
  mode 0440
  owner "root"
  group "root"
  variables(
    :sudoers_groups => node[:authorization][:sudo][:groups],
    :sudoers_users => node[:authorization][:sudo][:users]
  )
end
Manage configuration as resources.
Put them together in recipes.
Track it like source code.
Configure your servers.
The above example has two parts: a package resource and an accompanying file template resource.
Under the hood, a Chef Provider will handle the required action to bring the resource into the described state. Example provider for the above: Chef::Provider::Package::Yum.
Recipes are lists of resources (files, packages, processes, filesystems, users, etc.).
Cookbooks are packages of recipes.
default[:nginx][:gzip] = "on"
default[:nginx][:gzip_http_version] = "1.0"
default[:nginx][:gzip_comp_level] = "2"
default[:nginx][:gzip_proxied] = "any"
default[:nginx][:keepalive] = "on"
default[:nginx][:keepalive_timeout] = 65
default[:nginx][:worker_processes] = cpu[:total]
default[:nginx][:worker_connections] = 4096
Chef is known to run on the following platforms:
* Ubuntu (6.06, 8.04-9.10) and Debian (5.0)
* RHEL and CentOS (5.x)
* Gentoo (1.12.11.1)
* FreeBSD (7.1)
* OpenBSD (4.4)
* MacOS X (10.4, 10.5)
* OpenSolaris (2008.11)
* preliminary support of Windows
There are three cloud providers supported by "knife", the Chef command-line tool: Rackspace, Amazon EC2, and Terremark. This support comes from the open source fog project; a provider is made visible in "knife" as fog’s support for that provider’s API matures.
knife rackspace server create 'role[rails]'
knife ec2 server create 'role[rails]'
knife terremark server create 'role[rails]'
There is an optional Web UI. Anything that can be done through it (such as adding users or roles) can be done on the CLI with "knife" or interactively via "shef", the Chef shell.
The commercial platform has some additional Web UI features, but again, the primary use of Chef is expected to be via libraries, following the "infrastructure as code" paradigm.
Puppet
Michael DeHaan of Puppet Labs, and formerly of Red Hat engineering, presented.
- Puppet is centralized.
- Puppet’s internal logic is graph based. It uses decision trees and reports on what it was able to do and what failed (and everything after it). Manual ordering is very important, as decision trees will be based on it. Ordering is very fine-grained.
- The Puppet language is a data center modelling language representing the desired state. It is designed to be very simple and human readable. This prevents you from inserting (Ruby) code but it also makes it safer (prevents you from shooting yourself in the foot). However, you can still call external (shell) scripts. Also, an upcoming version (2.6) will support programming in a Ruby DSL.
- Portability. Works anywhere Ruby works.
- Pluggability. Puppet does not allow arbitrary code in its manifests; however, there is pluggability: server-side functions can interact with an external data source (e.g. query a database or read a text file). There is a feature called "external_nodes" which you can enable on the Puppet server (the puppetmaster) and which will kick in whenever a Puppet client (puppetd) connects. Instead of having the node name and its class membership and attributes stored in your Puppet config, you can have them stored in an external database, and "external_nodes" will fetch that info.
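A sketch of what an "external_nodes" lookup might look like, written in Python. As I understand the mechanism, the puppetmaster runs an external program with the node name and reads back a YAML document with `classes` and `parameters` keys; treat that output shape as an assumption. The `NODES` table stands in for the external database, and the node names, class names and fallback entry are all invented.

```python
NODES = {  # hypothetical stand-in for an external database
    "web01.example.com": {
        "classes": ["apache2", "php5"],
        "parameters": {"environment": "production"},
    },
}
FALLBACK = {"classes": ["base"], "parameters": {"environment": "unknown"}}


def classify(node_name):
    """Look up the classes and parameters recorded for a node."""
    return NODES.get(node_name, FALLBACK)


def to_yaml(info):
    """Render the lookup result as a small block-style YAML document."""
    lines = ["---", "classes:"]
    lines += [f"  - {name}" for name in info["classes"]]
    lines.append("parameters:")
    lines += [f"  {key}: {value}" for key, value in info["parameters"].items()]
    return "\n".join(lines)


print(to_yaml(classify("web01.example.com")))
```

In a real deployment the script would take the node name from its command-line arguments and query the actual data source instead of a dict.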
Puppet only performs actions that are necessary. The basic formula for Puppet’s operation is: the server gathers information from the client, decides what needs to be done, and tells the client what to do. In detail:
The server gets the client to tell the server about itself. These are facts in Puppet. The configuration policies are the manifests.
The server compares the facts (what is) to the manifests (what should be), and, if necessary, creates instructions to the clients on the managed nodes for moving from what is to what should be. These instructions are encoded as a JSON catalog.
The JSON catalog contains a declarative description about desired state, and the client then runs that catalog to achieve the desired state.
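The facts-to-catalog flow can be sketched in Python as follows. This is a toy model under stated assumptions, not Puppet's compiler or its real catalog format: the `MANIFEST` structure, the fact names and the shape of the JSON document are all invented for illustration.

```python
import json

MANIFEST = [  # hypothetical policy: a condition on facts plus a desired resource
    {"when": {"osfamily": "RedHat"},
     "resource": {"type": "package", "title": "postfix", "ensure": "installed"}},
    {"when": {"osfamily": "Debian"},
     "resource": {"type": "package", "title": "exim4", "ensure": "installed"}},
    {"when": {},
     "resource": {"type": "file", "title": "/etc/passwd", "owner": "root"}},
]


def compile_catalog(facts, manifest):
    """Select the resources whose conditions match the node's facts."""
    resources = [
        rule["resource"] for rule in manifest
        if all(facts.get(key) == value for key, value in rule["when"].items())
    ]
    return json.dumps({"node": facts["hostname"], "resources": resources},
                      sort_keys=True)


facts = {"hostname": "cobra.example.com", "osfamily": "RedHat"}
catalog = json.loads(compile_catalog(facts, MANIFEST))
for resource in catalog["resources"]:
    print(resource["type"], resource["title"])
```

The client never sees the policy itself, only the list of desired resources that applies to it.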
If a service subscribes to a file, and the file changes, the service will know it automatically needs to restart. For example:
service { 'sshd':
  ensure => running,
  subscribe => File['/etc/ssh/sshd_config'],
}
file { '/etc/ssh/sshd_config':
  ensure => present,
  source => "puppet:///sshd/sshd_config",
  owner => root,
  group => root,
}
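The subscribe relationship can be modelled in a few lines of Python. This is a simplified sketch, not Puppet's engine: resources are plain dicts, the file system is a dict called `disk`, and `apply_catalog` is a hypothetical helper that only reports which services would be restarted.

```python
def apply_catalog(resources, files_on_disk):
    """Apply file resources, then report which subscribed services to restart."""
    changed = set()
    for res in resources:
        if res["type"] == "file" and files_on_disk.get(res["title"]) != res["content"]:
            files_on_disk[res["title"]] = res["content"]  # bring the file into line
            changed.add(res["title"])
    return [
        res["title"] for res in resources
        if res["type"] == "service" and res.get("subscribe") in changed
    ]


resources = [
    {"type": "service", "title": "sshd", "subscribe": "/etc/ssh/sshd_config"},
    {"type": "file", "title": "/etc/ssh/sshd_config",
     "content": "PermitRootLogin no\n"},
]
disk = {}
first = apply_catalog(resources, disk)   # file is written, sshd needs a restart
second = apply_catalog(resources, disk)  # nothing changed, nothing to restart
print(first, second)
```

The service is restarted only on the run in which the file actually changes, which matches the "only performs actions that are necessary" rule.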
Resource Types are the building blocks of Puppet configuration. Here is a simple example:
file { "/etc/passwd":
  owner => root,
  group => root,
  mode => 644
}
This is the "file" resource type. It controls ownership and access permissions to the named file.
Providers are what make a resource type an actuality: a provider is the part of Puppet that actually executes the configuration, the interface between the resource description and the OS; the "doer".
There can be multiple providers for a resource. For example, you might specify that the mod-php package be installed, and it could be installed by package providers for dpkg, rpm, yum, openbsd, and so on. The most appropriate provider will be picked automatically; or you can specify certain features in the resource type, and then the providers will be probed for what features they support.
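One simple way to picture automatic provider selection is the Python sketch below. It is not how Puppet chooses providers internally; it assumes, for illustration only, that a provider is suitable when its command is found on the system. The preference order in `CANDIDATES` and the helper `pick_provider` are hypothetical.

```python
import shutil

# Hypothetical preference order: (provider name, command it depends on).
CANDIDATES = [("yum", "yum"), ("apt", "apt-get"), ("rpm", "rpm"), ("dpkg", "dpkg")]


def pick_provider(candidates, which=shutil.which):
    """Return the first provider whose command is available, else None."""
    for provider, command in candidates:
        if which(command):
            return provider
    return None


print(pick_provider(CANDIDATES))
```

The `which` parameter is injectable so the selection logic can be exercised without depending on what is installed on the machine running it.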
There is an advanced and experimental feature "exported resources" that allows one host to configure another host (in Puppet terms, it allows resources to move between hosts) — this allows inter-node orchestration.
Puppet, of course, can export reports.
What Lies Ahead? What Are the Challenges in Configuration Management?
Narayan: "Configuration meta-programming" or "multi-node orchestration". For example: "NTP clients should talk to our NTP servers", or "the ssh_known_hosts file should contain entries for all machines", or "the load balancer should direct traffic to all production Web servers".
Mark Burgess: Including network devices in configuration management; manipulating mechanical devices (such as controlling satellite position in Earth orbit); most importantly, knowledge management (tracking state, understanding intentions, aligning with business goals). Mark is working on tying Cfengine in with ISO 13250 Topic Maps.