Documenting types used in services

Arseni Mourzenko

Founder and lead developer

October 16, 2019

The SOA and microservices trends triggered a frenzy in which everyone started to expose everything as a service. Large and small companies wanted to replace big systems of components, joined by relatively unclear interfaces, with even bigger systems of services that were supposed to have clear, easy-to-use interfaces. Unfortunately, for most, the result in practice was a bit... unexpected.

For developers who have to consume in-house services, it should come as no surprise that the quality of those services ranges from poor to terrifying. Made by people who were never taught to create services and who, it seems, never even consumed the ones created by Twilio, Amazon or Google, those monsters are not a huge step forward compared to the old assemblies of components. There is one particular aspect where they are even worse: their level of haziness. The cause: the developers don't think much about the types being used, because they assume everything will magically work.

For example, imagine that an HTTP service asks for a number as one of its inputs, or specifies that its output contains a number. The developers didn't write any documentation, since they published a Swagger definition, and in their heads, the presence of Swagger somehow frees them from the necessity of documenting anything. So Swagger tells you that there is a number you should send, or a number that will be sent back. Great. Does it tell you:

What is the minimum and maximum value?
What would be the precision, including the maximum precision?
What are the rounding rules being applied, if applicable?
Is the format which uses exponents, such as 2.5e6, supported?
Is there a way to specify that the value doesn't exist?
Is NaN a valid entry?
Is negative zero supported?

These are very basic and simple questions, yet depending on the answers, there may be important consequences. Regarding the precision, for instance, it would be useful to know whether 1.234567890123456789, 1.23456789012345678 and 1.23456789012345679 would be considered the same number or not, because depending on that, I should be particularly careful when writing the code which uses the service. It is important when I'm using the same language as the service, but it becomes even more important when I'm interacting through a text-based protocol with a service which may be developed in a very different language from my consumer application, and which could follow slightly different rules regarding the representation of a number.
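To make this concrete, here is a minimal sketch in Python, used purely for illustration. It assumes a consumer that parses numbers as IEEE 754 doubles, which is what Python's `float` and its standard `json` module do by default; the variable names are made up for the example, and a service written in another language may well behave differently.

```python
import json
import math
from decimal import Decimal

# The three literals discussed above, kept as text exactly as they
# would travel over a text-based protocol.
samples = [
    "1.234567890123456789",
    "1.23456789012345678",
    "1.23456789012345679",
]

# Parsed as IEEE 754 doubles, the literals differ only beyond the
# precision a double can hold.
as_floats = {float(s) for s in samples}

# Parsed as arbitrary-precision decimals, every digit is preserved.
as_decimals = {Decimal(s) for s in samples}

# Exponent notation is part of the JSON grammar and is accepted here.
exponent_value = json.loads("2.5e6")

# Negative zero survives parsing, but compares equal to positive zero,
# so only its sign bit reveals it.
negative_zero = json.loads("-0.0")
negative_zero_sign = math.copysign(1.0, negative_zero)

# NaN is not part of the JSON grammar. Python's json module still emits
# it by default, and only refuses when allow_nan=False is passed.
lenient_nan = json.dumps(float("nan"))
try:
    json.dumps(float("nan"), allow_nan=False)
    strict_nan_rejected = False
except ValueError:
    strict_nan_rejected = True

print(len(as_floats), len(as_decimals))
print(exponent_value, negative_zero_sign, lenient_nan, strict_nan_rejected)
```

With doubles, the three values collapse into a single one; with decimals, they remain three distinct numbers. Neither behavior is wrong in itself, but the consumer can only know which one applies if the service says so.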

Or take strings. Wouldn't it be useful to know:

The maximum allowed length?
The encoding?
The list of denied characters, if any?
The possibility of using \x00 inside the string?
The representation of the absence of a value?
The consequences of starting or ending the string with a space?
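The questions above are less theoretical than they may look. The following sketch, again in Python and again only an illustration with made-up variable names, shows that even "maximum length" has several possible meanings, and that a NUL character, surrounding spaces and a missing value each need an explicit rule. It assumes JSON as the transport format, which the service in question may or may not use.

```python
import json

# "naïve" followed by a space and a thumbs-up emoji, written with
# escapes so the example does not depend on the file's encoding.
text = "na\u00efve \U0001F44D"

# The same string has three different "lengths" depending on the unit.
code_points = len(text)
utf8_bytes = len(text.encode("utf-8"))
utf16_units = len(text.encode("utf-16-le")) // 2

# A NUL character can be carried by JSON as an escape sequence;
# whether the service behind it copes with it is another matter.
with_nul = "a\x00b"
encoded_nul = json.dumps(with_nul)
round_tripped_nul = json.loads(encoded_nul)

# Leading and trailing spaces: kept, trimmed or rejected?
padded = " abc "
trimmed = padded.strip()

# Three candidate representations of "no value".
explicit_null = json.loads('{"name": null}')
empty_string = json.loads('{"name": ""}')
missing_key = json.loads("{}")

print(code_points, utf8_bytes, utf16_units)
print(encoded_nul, padded == trimmed)
print(explicit_null["name"], repr(empty_string["name"]), "name" in missing_key)
```

A limit of "10 characters" would accept or reject this string depending on whether the service counts code points, UTF-8 bytes or UTF-16 code units, which is exactly the kind of detail the documentation should settle.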

When developers don't think about those details, the services and applications they create are fragile. In fact, documenting types allows both the producer of a service and its consumers to think about the edge cases that should be tested. If those cases are not documented, it often indicates that nobody thought about them in the first place when creating the service, and a string a bit too long or a NaN as input could easily break it. When consumers accept using those services, it often means they didn't test those cases either, and are hoping for the best, which is hardly a sound approach to making reliable software.
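One way to picture the link between documentation and tests is to write the documented constraints down as code. The sketch below is hypothetical from start to finish: `NumberSpec`, its fields, the `rejects` helper and the `quantity` example are invented for illustration and don't correspond to any particular library or service. The point is only that each documented answer becomes a rule, and each rule suggests an edge case to test.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NumberSpec:
    """Hypothetical, explicit description of a numeric field."""

    minimum: float
    maximum: float
    decimal_places: int
    allow_nan: bool = False

    def validate(self, value: float) -> float:
        """Return the value rounded as documented, or raise ValueError."""
        if math.isnan(value):
            if self.allow_nan:
                return value
            raise ValueError("NaN is not accepted")
        # Infinities fall outside any finite range and are rejected here.
        if not self.minimum <= value <= self.maximum:
            raise ValueError(
                f"{value} is outside [{self.minimum}, {self.maximum}]"
            )
        # Rounding rule of this sketch: Python's built-in round().
        return round(value, self.decimal_places)


def rejects(spec: NumberSpec, value: float) -> bool:
    """Tell whether the spec refuses the given value."""
    try:
        spec.validate(value)
    except ValueError:
        return True
    return False


# A hypothetical "quantity" field: 0 to 100, two decimal places, no NaN.
quantity = NumberSpec(minimum=0.0, maximum=100.0, decimal_places=2)

# Each documented rule translates into an edge case worth testing.
edge_cases = [float("nan"), float("inf"), -0.01, 100.0, 100.01]
for candidate in edge_cases:
    print(candidate, "rejected" if rejects(quantity, candidate) else "accepted")
```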

Why are these details so often neglected? Essentially, for two reasons:

The unwillingness of developers to think about the edge cases. Understandable, although it's what they are paid for, isn't it?
The incompetence of the developers. In fact, most simply don't know the language well enough to understand the caveats of a given type and the mechanisms at play when receiving inputs or serializing the response. Unfortunately for them, if they actively avoid thinking about the edge cases, they will never learn the specifics of the language they use in the first place.