
Simplifying systems by adding proper abstractions

Arseni Mourzenko
Founder and lead developer, specializing in developer productivity and code quality
April 25, 2015

The original draft of the Solange project defined the notions of profile and instance. According to today's documentation (revision 1234):

A profile is a generic description of a machine, i.e. a set of configuration items and operations which define a precise type of machines. For example, a profile may define a proxy server—any proxy server. Another profile may define a DHCP server.

while:

An instance is a configuration of a specific machine. Not any machine, but a precise, IP-bound, machine on a network. For example, DHCP failover 1 and DHCP failover 2 are two different machines, and so they correspond to two instances, but they share the same profile.

The difference was so essential that even the project itself had two directories: one for profiles, with a settings.json file, an initialization script and other profile-related files, and another for instances, with a structurally different settings.json file, a different initialization script and other instance-related files.

Every machine necessarily had exactly one instance, which declared its corresponding profile.
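
To make the original model concrete, here is a minimal sketch in Python. It is not Solange's actual code: the class names, the field names and the rule that instance settings override profile settings are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Generic description of a type of machine (hypothetical structure)."""
    name: str
    settings: dict = field(default_factory=dict)


@dataclass
class Instance:
    """One precise, IP-bound machine; it declares exactly one profile."""
    name: str
    ip: str
    profile: Profile
    settings: dict = field(default_factory=dict)

    def effective_settings(self) -> dict:
        # Assumed precedence: instance values override profile values.
        return {**self.profile.settings, **self.settings, "ip": self.ip}


# Two machines sharing the same profile, as in the DHCP failover example.
dhcp = Profile("dhcp-server", {"service": "dhcp", "lease_hours": 12})
failover_1 = Instance("dhcp-failover-1", "10.0.0.11", dhcp)
failover_2 = Instance("dhcp-failover-2", "10.0.0.12", dhcp, {"lease_hours": 24})
```

With this shape, two nearly identical profiles have no way to share their common part, which is where the copy-pasting comes from.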

As happens with any rigid structure, problems started to appear, the major one being code duplication. Surprisingly, I never had any difficulty deciding whether a specific file, initialization command or configuration option belonged to an instance or a profile; I expected to run into ambiguous cases more than once, but I didn't. On the other hand, the fact that a machine could have only one instance and only one profile was really annoying. Practically identical profiles ended up copy-pasted, which was obviously terrible for later maintenance.

So it was decided to implement a hierarchy; this is how the Trust project was born. Instances would remain the same, but profiles would have a parent-child relationship, where a profile has at most one parent. If done, this would partially solve the problem of code duplication, but could also make the system non-intuitive to work with.
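
A single-parent hierarchy of this kind can be resolved by walking from a profile up to its root and merging settings from the root downwards. The sketch below is only an illustration of that idea: the "parent" and "settings" keys and the function name are hypothetical and are not taken from Solange or Trust.

```python
def resolve_profile(name: str, profiles: dict) -> dict:
    """Merge the settings of a profile with those of all its ancestors.

    `profiles` maps a profile name to a dict with an optional "parent"
    (a profile name) and optional "settings". Child values win.
    """
    chain = []
    seen = set()
    current = name
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in profile hierarchy at {current!r}")
        seen.add(current)
        chain.append(profiles[current])
        current = profiles[current].get("parent")

    merged = {}
    for profile in reversed(chain):  # root first, requested profile last
        merged.update(profile.get("settings", {}))
    return merged


profiles = {
    "server": {"settings": {"os": "debian", "ntp": True}},
    "proxy": {"parent": "server", "settings": {"service": "proxy"}},
    "caching-proxy": {"parent": "proxy", "settings": {"cache_gb": 50}},
}
caching_proxy_settings = resolve_profile("caching-proxy", profiles)
```

The common settings now live in one place, but a reader has to follow the whole chain to know what a machine actually gets, which is the non-intuitive part.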

In parallel came the idea of removing the distinction between profiles and instances. Every instance would simply inherit from a parent, or remain standalone and provide every needed piece of information itself. This simplified the structure a bit, but still, given my hatred of tree structures, there was no way I would implement such a structure at the core of the most important project of my life.

This is where Trust came in handy, bringing two strong points:

  • Although it is a hierarchical database, Trust doesn't enforce a strict tree structure: while the data is inherently tree-based, nodes can reference other nodes located anywhere in Trust, making it not a tree, but a cross-referenced set of elements.

  • More importantly, Trust provides uniform access to the hierarchical data without exposing how the data is actually stored. This is both disruptive compared to the file-based JSON extractor originally used by Solange, and helpful in terms of the abstraction it establishes: Solange should care about the final result, that is, the JSON document it needs to build a virtual machine, but shouldn't care about the way the data is stored. Whether it is a single huge JSON file describing the whole infrastructure as a tree, or multiple JSON files in multiple directories with cross-references everywhere, Solange will still get consistent results from Trust.
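
The two points above can be sketched as a small interface with two interchangeable storage layouts. This is only an illustration under stated assumptions: the ConfigSource interface, both classes and the "$ref" marker are hypothetical and do not describe Trust's real API or storage format.

```python
from typing import Protocol


class ConfigSource(Protocol):
    """What the consumer depends on: a path in, a plain document out."""
    def document(self, path: str) -> dict: ...


class SingleTreeSource:
    """The whole infrastructure stored as one nested tree."""
    def __init__(self, tree: dict):
        self._tree = tree

    def document(self, path: str) -> dict:
        node = self._tree
        for part in path.split("/"):
            node = node[part]
        return node


class CrossReferencedSource:
    """Several documents; {"$ref": "<path>"} points at another document."""
    def __init__(self, documents: dict):
        self._documents = documents

    def document(self, path: str) -> dict:
        return self._expand(self._documents[path], frozenset({path}))

    def _expand(self, node, visiting):
        if isinstance(node, dict):
            if set(node) == {"$ref"}:
                target = node["$ref"]
                if target in visiting:
                    raise ValueError(f"circular reference to {target!r}")
                return self._expand(self._documents[target], visiting | {target})
            return {key: self._expand(value, visiting) for key, value in node.items()}
        if isinstance(node, list):
            return [self._expand(value, visiting) for value in node]
        return node


def build_machine_document(source: ConfigSource, path: str) -> dict:
    # The consumer never learns how the data is laid out.
    return source.document(path)


tree = SingleTreeSource(
    {"machines": {"proxy-1": {"os": "debian", "dns": {"servers": ["10.0.0.2"]}}}}
)
split = CrossReferencedSource({
    "machines/proxy-1": {"os": "debian", "dns": {"$ref": "shared/dns"}},
    "shared/dns": {"servers": ["10.0.0.2"]},
})
```

Both sources yield the same document for "machines/proxy-1", so the code that builds the virtual machine stays the same whichever layout an organization prefers.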

What's interesting about this is that by setting up proper abstractions, one can simplify a system substantially. With profiles and instances, Solange was difficult to explain to people who had never used it before, and was frankly not much simpler than Chef or Puppet: it had its own terminology and its own way of structuring data, things which are not really needed but were there anyway, adding complexity. With the abstraction brought by Trust, Solange becomes much more elegant and easier to understand. It also becomes more flexible: a small company with a few dozen virtual machines and a single system administrator won't organize its data the same way as a company with hundreds of thousands of virtual machines maintained by several departments. And if this is not enough, the change will also make Solange's source code shorter and simpler.