Naming conventions for servers, two years later
A few years ago, I wrote an article explaining how I name servers. Since then, little has changed, and the conventions used in companies are still insane. The mentality hasn't changed either: it's still considered professional to use arbitrary, meaningless names nobody can remember.
The benefits of explicit, meaningful host names are, however, high:
An obvious reason is that it reduces errors. I have lost count of the times when, working on customers' systems, I connected remotely to the wrong server and noticed my mistake only half an hour later. Those mistakes have a very concrete cost, not only in wasted time, but also in outages caused by operations performed on the wrong machine.
Conventions which make it difficult to differentiate between environments are particularly harmful. And by differentiation, I don't mean having a “p” for production and an “s” for staging in a name such as “stxp4508-g2p-lac,” because, frankly, most people are totally unable to notice a one-letter difference in total garbage. There should be no need to explain the consequences of running commands on a production server while believing that you're on a development machine.
Besides ssh-ing to the wrong machine, which one shouldn't be doing anyway in a company with a decently mature infrastructure, errors can also creep in when (1) changing configuration or (2) studying logs. Those are exactly the two situations where you absolutely need to avoid any distraction caused by unclear host names, and to prevent any confusion between machines. Spending an hour wondering why the bug report sent by a user doesn't match the actual logs, only to discover that you were looking at the logs of a different machine, isn't a nice situation.
A lesser reason is that it saves time even when no human errors are made. When names are clear, if I want to target a specific machine, I just do it. When names are cryptic, I need to copy-paste the name from somewhere. If it's a web page or an Excel spreadsheet, this means that I need to open one. This also means that somebody, somewhere, needed to create this page or spreadsheet, and needs to maintain it.
For some conventions, a change in the purpose of a server or a change of ownership may require renaming it. Renaming a server isn't the easiest thing to do, given the dependencies (monitoring tools, apps). This usually leads to the decision to postpone the operation, and, years later, the server still has its old name. If people actually rely on the convention, the stale name becomes a source of errors in itself.
In the case of one customer, the names of machines contained the administrative names of the divisions owning the servers. When a server changed ownership, nobody could afford to spend several days cleaning up the mess after renaming it, so its original name was preserved. When the server performed a DoS attack on one of the internal services, the team handling this service sent an alert to the team which, according to the name, should have been the owner of the server. However, this team no longer owned the server and couldn't respond, and nobody knew which team was now responsible. It was only a week later that someone from the team concerned noticed something strange happening.
Finally, having the name of the application in the host name makes it very easy to identify all the servers used by that application, simply by grep-ing the list of servers.
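As a rough illustration of that last point, here is a minimal sketch in Python. The inventory is hypothetical: some of the names appear in this article, the others are invented for the example. Unlike a plain grep, it matches the application name against whole dash-separated segments, so that “log” does not match “blog”.

```python
# Hypothetical inventory. Names marked "invented" do not come from the article.
SERVERS = [
    "http-blog-failover",
    "http-bookshelf-www-failover",
    "http-blog-primary",      # invented for illustration
    "db-bookshelf-primary",   # invented for illustration
    "dyn-updater",
    "pxe",
]


def servers_for(app: str, servers: list[str]) -> list[str]:
    """Return the hosts whose dash-separated name contains `app` as a whole segment."""
    return [name for name in servers if app in name.split("-")]


print(servers_for("bookshelf", SERVERS))
```

With cryptic names, the same question would require a separate mapping from applications to machines, which is exactly the spreadsheet mentioned earlier.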
There are, however, a few limitations as well.
While I believe that this approach scales well, I have used it in practice on only a tiny infrastructure: only seventy-five machines, and only one person to control their naming. While using the same approach in a company with thousands of machines doesn't scare me much, the human factor could, indeed, be a serious threat. It is technically possible to identify the naming patterns and enforce them, but not for every machine. For instance, my “http-” machines and the machines which host databases have very uniform names, but some just can't follow the rules. An example is a machine which handles dynamic IP updates and is called dyn-updater. Another case is the machine named pxe, which serves as a PXE server to rebuild physical machines. Large companies will have more of those very specific cases, and naming and tracking those servers could easily become a mess.
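One way to enforce the patterns while tolerating the odd cases is to validate names against a regular expression and keep an explicit list of exceptions. The sketch below is only an assumption about what such a check could look like: the role prefixes “http” and “db” and the segment rules are illustrative guesses, not the rules I actually use.

```python
import re

# Assumed pattern: a known role prefix followed by one or more
# lowercase alphanumeric segments. The roles are illustrative guesses.
PATTERN = re.compile(r"(http|db)(-[a-z0-9]+)+")

# Machines that legitimately cannot follow the pattern, listed explicitly
# so that every exception is a deliberate, reviewed decision.
EXCEPTIONS = {"dyn-updater", "pxe"}


def violations(names):
    """Return the names that neither match the pattern nor are listed exceptions."""
    return [n for n in names if n not in EXCEPTIONS and not PATTERN.fullmatch(n)]


print(violations(["http-blog-failover", "dyn-updater", "pxe", "stxp4508-g2p-lac"]))
```

The weak spot is the exception list itself: in a large company it would grow, and someone would have to decide what is allowed in it.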
Changing the name of an application would require renaming a lot of machines. One solution would be to use a codename for the project, and a commercial name for the public. This clearly doesn't work in my case, where most projects are open sourced anyway. Moreover, I believe that this double naming creates unnecessary complexity for dubious benefits.
This naming convention and Windows are mutually exclusive (should I put that as a benefit, rather than a limitation?). The fifteen-character limit is exceeded by all but the most basic machine names. For example, one of the instances hosting this blog, http-blog-failover, contains eighteen characters, that is, three too many. I cannot see any way to keep the names sane while squeezing them into the NetBIOS limit. They would necessarily become unreadable and inexpressive.
The longest machine name so far, http-bookshelf-www-failover, contains twenty-seven characters. This is far from the sixty-three-character limit on Unix, but with the possible addition of a domain, this would become pelicandd-http-bookshelf-www-failover, which gives thirty-seven characters. With an additional staging environment, this could get us to pelicandd-http-staging-bookshelf-www-failover, that is, forty-five characters. The names of applications should remain short enough to leave room for potential suffixes or prefixes.
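The arithmetic above is easy to check mechanically. The following sketch assumes that a name is composed as domain, role, environment, then the rest, which is what the examples suggest; the helper build_name and the two constants are hypothetical names introduced for illustration.

```python
NETBIOS_LIMIT = 15  # the fifteen-character Windows (NetBIOS) limit
LABEL_LIMIT = 63    # the sixty-three-character limit cited above


def build_name(base, domain=None, environment=None):
    """Compose <domain>-<role>-<environment>-<rest>, skipping the absent parts."""
    role, _, rest = base.partition("-")
    parts = [domain, role, environment, rest]
    return "-".join(p for p in parts if p)


base = "http-bookshelf-www-failover"
candidates = [
    "http-blog-failover",
    base,
    build_name(base, domain="pelicandd"),
    build_name(base, domain="pelicandd", environment="staging"),
]

for name in candidates:
    fits_netbios = len(name) <= NETBIOS_LIMIT
    fits_label = len(name) <= LABEL_LIMIT
    print(f"{len(name):2d} chars  NetBIOS: {fits_netbios}  label: {fits_label}  {name}")
```

The four names come out at 18, 27, 37, and 45 characters: all of them fit in sixty-three characters, and none fits in fifteen.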
To conclude, if you have an infrastructure managed by a small team where everyone is responsible enough to care about readable server names, and if you use Unix-based machines only, there are few reasons not to use this naming pattern. It makes your life easier, and makes it unnecessary to have a dedicated mapping system between human-readable information and the cryptic identifiers which are mistakenly used as machine names. If you do use Windows, then the scheme is unfortunately not for you; but, well, if you do use Windows, you have other, more crucial problems to solve, right?