Inflexible configuration will cause problems. Chris Oldwood demonstrates how to support multiple configurations flexibly.
If you look at a system’s production configuration settings you could be fooled into thinking that only a simple configuration mechanism supporting a single configuration file is needed. In production things are often easier because they have settled down – the correct hardware has been provisioned, security accounts created, monitoring services installed, etc. But it is during development and testing that the flexibility of your configuration mechanism really comes into play.
I work on distributed systems, which naturally have quite a few moving parts, and one of the biggest hurdles to development and maintenance in the past has been that the various components could not be independently configured, so you could not cherry-pick which services to run locally and which to draw from your integration/system test environment. Local (as in ‘on your desktop’) integration testing puts the biggest strain on your configuration mechanism because you can probably only afford to run a few of the services you might need, unless your company provides fairly meaty developer workstations.
In the past I’ve found the need to override component settings using a variety of criteria and the following article is definitely not exhaustive, but gives the most common reasons I have encountered for needing to configure something differently. It also goes into a little more detail about how you might support such multiple configurations whilst minimising the potential for duplication.
Per-environment
The most obvious candidate is environmental, as there is usually a need to have multiple copies of the system running for different reasons. I would hazard a guess that most teams have separate DEV, TEST & PROD environments to cover each aspect of the classic software lifecycle. For small systems, or systems with a top-notch build pipeline and test coverage, the DEV & TEST environments may serve the same purpose. Conversely, I have worked on a team that had 7 DEV environments (one per development stream [Oldwood14]), a couple of TEST environments and a number of other special environments used for regulatory purposes, all in addition to the single production instance.
What often distinguishes these environments are the instances of the external services that you will depend on. It is common for all production environments to be ring-fenced so that you only have PROD talking to PROD to ensure isolation. In some cases you may be lucky enough to have UAT talking to some read-only PROD services, perhaps to support parallel running. But DEV environments are often in a sorry state and highly distrusted so are ring-fenced for the same reason as PROD, but this time for the stability of everyone else’s systems.
Where possible I prefer the non-production environments to be a true mirror of the production one, with the minimum changes required to work around environmental differences. Ideally we’d have infinite hardware so that we could deploy every continuous build to multiple environments configured for different purposes, such as stress testing, fault injection, DR failover etc. But we don’t. So we often have to settle for continuous deployment to DEV to run through some basic scenarios, followed by promotion to UAT to provide some stability testing, and thence to PROD.
Where sharing of production input sources is possible, our inputs are often the same as for production, but naturally our outputs have to be different. You don’t want to have to configure each output folder separately though, so you need some variable-based mechanism to keep things manageable, where most settings are derived: only the root folder name changes, while the relative child structure stays the same and therefore does not require explicit configuration.
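As a minimal sketch of that idea (the OutputRoot setting name and the folder layout here are hypothetical), only the root differs per environment and everything beneath it is derived:

using System.Collections.Generic;
using System.IO;

class DerivedPaths
{
    static void Main()
    {
        // The single per-environment setting; everything else derives from it.
        var settings = new Dictionary<string, string>
        {
            ["OutputRoot"] = @"\\Server\DEV\Exports"
        };

        // The relative child structure never changes, so it needs no
        // explicit configuration of its own.
        string reportsFolder = Path.Combine(settings["OutputRoot"], "Reports");
        string archiveFolder = Path.Combine(settings["OutputRoot"], "Archive");
    }
}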
The Disaster Recovery (DR) environment is an interesting special case because it should look and smell just like production. A common technique for minimising configuration changes during a failover is to use DNS Common Names (CNAMEs) for the important servers, but that isn’t always foolproof. Whilst it means that in theory you should be able to switch to DR solely through network infrastructure re-configuration, in practice you will find that not every system you depend on is quite so diligent.
Per-machine
Next up are machine-specific settings. Even in a homogeneous Windows environment you often have a mix of 64-bit and 32-bit hardware, slightly different hard disk partitioning, amounts of memory, CPUs, etc. or different performance characteristics for different services. Big corporations love their ‘standard builds’, which largely helps minimise the impact, but even those change over time as the hardware and OS change – just look at where user data has been stored in Windows over the various releases. The ever-changing security landscape also means that best practices change and these will, on occasion, have a knock-on effect on your system’s set-up.
By far the biggest use for per-machine overrides I’ve found, though, is during development, i.e. when running on a developer’s workstation. While unit testing makes a significant contribution to the overall testing process, you still need the ability to easily cobble together a local sandbox in which you can do some integration testing. I’ve discovered the hard way what happens when the DEV environment becomes a free-for-all – it gets broken and then left to fester. I’ve found that treating it with almost the same respect as production pays dividends because, if the DEV environment is stable (and running the latest code), you can often reduce the setup time for your local integration testing sandbox by drawing on the DEV services instead of having to run them all locally.
Per-process-type
Virtually all processes in the system will probably share the same basic configuration, but certain processes will have specific tasks to do and so they may need to be reconfigured to work around transient problems. One of the reasons for using lots of processes (that share logic via libraries) is exactly to make configuration easier because you can use the process name as a ‘configuration variable’.
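For example, here is one possible sketch of using the process name to locate per-process overrides; the file naming anticipates the hierarchical file chain described later:

using System.Diagnostics;

class ProcessOverrides
{
    static void Main()
    {
        // The process name acts as a 'configuration variable': each executable
        // automatically picks up its own override file, if one exists.
        string processName = Process.GetCurrentProcess().ProcessName;
        string overrideFile = "System." + processName + ".config";
        // ... merge overrideFile on top of the shared settings, if present.
    }
}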
The command line is probably the default mechanism most people think of when they want to control the behaviour of a process, but I’ve found it useful to distinguish between task-specific parameters, which you’ll likely always be providing, and background parameters that remain largely static. This means that when you use the --help switch you are not inundated with pages of options. For example, a process that always needs an input file will likely take that on the command line, as it might an (optional) output folder; but the database that provides all the background data could well be defaulted using, say, an .ini file.
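A sketch of that split, assuming a hypothetical Tool.ini file holding a Database setting, while the task-specific input file stays on the command line:

using System.Collections.Generic;
using System.IO;
using System.Linq;

class Tool
{
    static void Main(string[] args)
    {
        // Task-specific parameter: always supplied on the command line.
        string inputFile = args[0];

        // Background parameter: defaulted from a simple key=value .ini-style
        // file so that it doesn't clutter the command line or --help output.
        var defaults = File.ReadLines("Tool.ini")
                           .Where(line => line.Contains("="))
                           .Select(line => line.Split(new[] { '=' }, 2))
                           .ToDictionary(p => p[0].Trim(), p => p[1].Trim());
        string database = defaults["Database"];
    }
}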
Per-user
The final category is down to the user (or more commonly the service account) under which the process runs. I’m not talking about client-side behaviour which could well be entirely dynamic, but server-side where you often run all your services under one or more special accounts. There is often an element of crossover here with the environment as there may be separate DEV, TEST and PROD service accounts to help with isolation. Support is another scenario where the user account can come into play as I may want to enable/disable certain features to help avoid tainting the environment I’m inspecting, such as using a different logging configuration.
Getting permissions granted is one of those tasks that often gets forgotten until the last minute (unless DEV is treated like PROD, which drives the requirement out early). Before you know it you switch from DEV (where everyone has way too many rights) to UAT and you suddenly find things don’t work. A number of times in the past I’ve worked on systems where a developer’s account has been temporarily used to run a process in DEV or UAT to keep things moving whilst the underlying change requests bounce around the organisation. Naturally security is taken pretty seriously and so permissions changes always seem to need three times as many signatures as other requests; in the meantime, though, we are expected to keep development and testing moving along.
Hierarchical configuration
Although most configuration differences I’ve encountered tend to fall into one specific category per setting, there are occasions where I’ve needed to override the same setting based on two categories, say, environment and machine (or user and process). However, because the hardware and software is itself naturally partitioned (i.e. by environment/user), it has usually amounted to only needing to override on the latter (i.e. machine/process). For example, if a few UAT and PROD servers had half the RAM of the others, the override could be applied at machine level on just those boxes, because the servers are physically separated (the environment) – UAT and PROD services are never installed on the same host.
What this has all naturally led to is a hierarchical configuration mechanism, something like what .Net provides, but where <machine> does not necessarily mean all software on that host, just my system’s components. It may also take in multiple configuration providers, such as a database, .ini files, the registry, the command line, etc. With something like a database the chicken-and-egg problem rears its head: it can’t be a source for bootstrapping settings because you need somewhere to configure the connection string used to access it.
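A sketch of the provider idea – the interface is illustrative rather than any real library’s API – where each source is consulted in order of precedence:

using System.Collections.Generic;

// Illustrative only: chain several configuration sources so that more
// specific providers (e.g. the command line) win over broader ones
// (e.g. an .ini file or the registry).
interface ISettingsProvider
{
    bool TryGetValue(string key, out string value);
}

class SettingsChain
{
    private readonly IEnumerable<ISettingsProvider> _providers;

    // Providers are listed from most specific to least specific.
    public SettingsChain(IEnumerable<ISettingsProvider> providers)
    {
        _providers = providers;
    }

    public string GetValue(string key, string defaultValue)
    {
        foreach (var provider in _providers)
        {
            if (provider.TryGetValue(key, out var value))
                return value;
        }

        return defaultValue; // sensible defaults live in code, not in the files
    }
}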
As an aside, environment variables are one technique I wouldn’t use by default to configure services on Windows. This is because they are inherited from the parent process – Services.exe – and so any change to the system environment variables requires it to be restarted, which essentially means a reboot [KB].
Hierarchical files
The default file-based configuration mechanism that .Net uses has only two levels of .config file, but it’s possible to leverage the underlying technology and create your own chain of configuration files. In the past I have exploited this mechanism so that on start-up each process will go looking for these files in the assembly folder in the following order:
System.Global.config
System.<environment>.config
System.<machine>.config
System.<process>.config
System.<user>.config
Yes, this means that every process will hit the file-system looking for up to 5 .config files, but in the grand scheme of things the hit is minimal. In the past I have also allowed config settings and the bootstrapping config filename to be overridden on the command line by using a hierarchical command line handler that can process common settings. This has been invaluable when you want to run the same process side-by-side during support or debugging and you need slightly different configurations, such as forcing them to write to different output folders.
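A simplified sketch of resolving that chain follows; it merges plain key=value pairs rather than real .Net XML .config files, but the precedence is the same, with later files overriding earlier ones:

using System;
using System.Collections.Generic;
using System.IO;

class ConfigChain
{
    static Dictionary<string, string> LoadSettings(
        string folder, string environment, string process)
    {
        // The scopes mirror the file list above, from broadest to most specific.
        var scopes = new[]
        {
            "Global",
            environment,              // e.g. "DEV", "UAT" or "PROD"
            Environment.MachineName,
            process,
            Environment.UserName,
        };

        var settings = new Dictionary<string, string>();

        foreach (var scope in scopes)
        {
            var path = Path.Combine(folder, "System." + scope + ".config");
            if (!File.Exists(path))
                continue; // every file in the chain is optional

            foreach (var line in File.ReadLines(path))
            {
                var parts = line.Split(new[] { '=' }, 2);
                if (parts.Length == 2)
                    settings[parts[0].Trim()] = parts[1].Trim(); // later file wins
            }
        }

        return settings;
    }
}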
Use sensible defaults
It might appear from this article that I’m a configuration nut. On the contrary, I like the ability to override settings when it’s appropriate, but I don’t want to be forced to provide settings that have an obvious default. I see little point in large configuration files full of defaulted settings just because someone may need to tweak one, one day – that’s what the source code and documentation are for.
I once worked on a system where all configuration settings were explicit. This was intentional according to the lead developer because you then knew what settings were being used without having to rummage around the source code or find some (probably out-of-date) documentation. I understand this desire but it made testing so much harder as there was a single massive configuration object to bootstrap before any testable code could run. It became a burden needing to provide a valid setting for some obscure business rule when all I was trying to test were changes to the low-level messaging layer.
Configuration file formats
I have a preference for simple string key/value pairs for configuration settings – the old-fashioned Windows .ini file format still provides one of the simplest. Yes, XML may be more flexible, but it’s also considerably more verbose. Also, once you get into hierarchical configurations (such as .Net XML style .config files), the behaviour becomes unintuitive as you begin to question whether blocks of settings are merged at the section level, or as individual entries within each section. These little things just add to the burden of any integration/system testing.
I mentioned configuration variables earlier and they make a big difference during testing. You could specify, say, all your input folders individually as absolute paths, but when they’re related that’s a pain when it comes to environmental changes (see Listing 1 for an example).
[Feeds]
SystemX=\\Server\PROD\Imports\SystemX
SystemY=\\Server\PROD\Imports\SystemY
SystemZ=\\Server\PROD\Imports\SystemZ

Listing 1
One option would be to generate the final configuration files from some sort of template, such as with a tool like SlowCheetah [SlowCheetah], which could be done at compile time, package time or deployment time. The source files could then be hierarchical in nature but flattened down to a single deployable file.
When there are many defaults and only a few overrides, following the hierarchical nature right through to deployment makes it easier to see what is overridden, because each file only lists the exceptions. You can then use variables in the core settings and define them in the list of exceptions (for an example, see Listing 2).
<System.Global.config>
[Variables]
FeedsRoot=%SharedData%\Imports
[Feeds]
SystemX=%FeedsRoot%\SystemX
SystemY=%FeedsRoot%\SystemY
SystemZ=%FeedsRoot%\SystemZ

<System.PROD.config>
[Variables]
SharedData=\\Server\PROD

Listing 2
The variables don’t just have to be custom ones; you can also chain onto the underlying environment variable collection so that you can use standard paths such as %TEMP% and %ProgramFiles% when necessary.
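Here is a sketch of how that expansion might look, with unresolved references left untouched:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class VariableExpander
{
    static string Expand(string value, IDictionary<string, string> variables)
    {
        return Regex.Replace(value, "%([^%]+)%", match =>
        {
            string name = match.Groups[1].Value;

            // Custom variables win, and may themselves contain references,
            // e.g. FeedsRoot=%SharedData%\Imports (no cycle detection here).
            if (variables.TryGetValue(name, out var custom))
                return Expand(custom, variables);

            // Otherwise chain onto the environment block (%TEMP%, etc.).
            return Environment.GetEnvironmentVariable(name) ?? match.Value;
        });
    }
}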
Summary
This article took a look at the differences between configuring a complex system for use in production and the many other environments in which it needs to operate, such as development and testing. We identified a number of patterns that help describe why we might need to configure the system in different ways and formulated a hierarchy that can be used to refine settings in a consistent manner. Finally we looked at how variables can be used to exploit the commonality across settings to further reduce the points of configuration to a bare minimum.
Acknowledgements
Thanks as always goes to the Overload advisors for watching my back.
References
[Oldwood14] ‘Branching Strategies’, Chris Oldwood, Overload 121
[KB] http://support2.microsoft.com/kb/821761
[SlowCheetah] https://www.nuget.org/packages/SlowCheetah/