The complexity of API keys
When it comes to handling API keys, two situations are classically problematic: sharing a key publicly and man-in-the-middle exposure. The approaches used by Amazon, Google and other big companies strike me as both overly complex and not very flexible, so I want to suggest a very simple approach to a not so simple problem.
Key distribution
As soon as keys are used “outside the box,” that is, outside a restricted environment where only a limited number of highly trusted people (system administrators and IT operations staff) can access them, those keys are subject to reuse and misuse. This is the case for every AJAX-based API where the key is simply there, in plain text, viewable by anyone, but it also covers situations where one key is shared between multiple applications that all access the same API.
The problem is essentially one of controlling which entities have (or had in the past) access to the API key. If there is only one key shared among all consumers, the authority has no control over it: there is no way to tell who is using the key, and the only way to revoke access is to change the key itself, which forces a renewal of the key for every legitimate consumer.
The solution is to have a two-part key. The first part is always the same: it is generated by the central authority, or CA (the service in charge of all keys), and issued to the intermediary authority, or IA. The IA now holds a key which, as is, cannot be used publicly. The IA can, however, append a second part that it generates itself (possibly with the help of the CA, to avoid implementation mistakes related to the cryptographically secure pseudo-random number generator). This second part is unique for every consumer, while the first one remains the same.
The magic behind it is that when a consumer uses a key, the CA receives the complete key: both the first and the second part. The CA validates the first part, which is the only one that carries information of value to the CA. The second part, on the other hand, has two roles:
- Blacklisted (or not whitelisted) keys lead to rejection.
- Other keys are audited so that the IA can gather usage statistics for each key.
The benefit of this approach is that the IA, without taking on the role of handling keys and granting access (that remains the role of the CA), can still put specific restrictions in place to keep a key from being misused. The fact that all the keys start with the same first part also makes it very easy to identify the source of the keys and to associate them with a specific IA.
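The two-part scheme can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the description above: the class and method names, the `.` separator, the use of `secrets.token_hex`, and the choice to keep the blacklist on the CA side (filled in at the IA's request) are all hypothetical.

```python
import secrets

SEPARATOR = "."  # hypothetical delimiter between the two parts of a key


class CentralAuthority:
    """Issues the shared first part and validates complete keys."""

    def __init__(self):
        self.shared_parts = {}  # first part -> name of the IA it was issued to
        self.blacklists = {}    # first part -> set of revoked second parts
        self.audit_log = []     # (IA name, second part) for every accepted call

    def issue_shared_part(self, ia_name):
        shared = secrets.token_hex(16)
        self.shared_parts[shared] = ia_name
        self.blacklists[shared] = set()
        return shared

    def revoke(self, shared, consumer_part):
        # Called on behalf of the IA: only this one consumer loses access.
        self.blacklists[shared].add(consumer_part)

    def validate(self, full_key):
        shared, sep, consumer_part = full_key.partition(SEPARATOR)
        # The bare first part is not usable on its own.
        if not sep or not consumer_part or shared not in self.shared_parts:
            return False
        if consumer_part in self.blacklists[shared]:
            return False
        # Accepted keys are audited so the IA can see per-consumer usage.
        self.audit_log.append((self.shared_parts[shared], consumer_part))
        return True


class IntermediaryAuthority:
    """Holds the shared part and derives one full key per consumer."""

    def __init__(self, shared_part):
        self.shared_part = shared_part

    def issue_consumer_key(self):
        return self.shared_part + SEPARATOR + secrets.token_hex(16)
```

In this sketch, revoking one consumer means adding its second part to the blacklist; every other consumer of the same IA keeps its key unchanged.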
Man in the middle
Dealing with keys which are not shared with the public, but kept on servers and accessed only by trusted system administrators, is very straightforward when a single service accesses another one. It gets pretty hairy in a microservices infrastructure, where keys could be passed to services which can hardly be trusted.
A basic scenario is this one. Imagine a storage service which issues a key to each of its consumers. Now, an image sharing service relies on the storage service to store images. The image sharing service has its own consumers, which should somehow be reflected in the storage service: a consumer who stored an image through the image sharing service should also be able to find this image when using the storage service directly.
This situation makes it impractical for the image sharing service to be an actual customer of the storage service, since that would require creating a sort of super-user account with high privileges, such as the ability to set the owner of a file. This sucks in terms of security, and adds tremendous complexity.
On the other hand, passing the actual storage service API keys from the end customer through the image sharing service down to the storage service is not an option either. What if the image sharing service is compromised? What if a disgruntled employee who manages the image sharing service decides to store the API keys which pass through?
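The solution is simple: the service in charge of key validation uses a public-private key pair. Consumers encrypt their API keys with the public key, so the keys which pass through the intermediaries are encrypted and only the validating service, which holds the private key, can read them.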
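To make the data flow concrete, here is a toy sketch of that encryption step. It uses textbook RSA with tiny primes and no padding, which is not secure and is not what any of the services mentioned here are known to use; it only shows who holds which half of the pair. The function names and the numbers are hypothetical, and a real deployment would rely on a vetted cryptography library with a padded scheme such as RSA-OAEP.

```python
# Toy textbook RSA with tiny primes: for illustration only, NOT secure.
P, Q = 61, 53
N = P * Q                  # modulus, part of both keys
PHI = (P - 1) * (Q - 1)
E = 17                     # public exponent, published by the validating service
D = pow(E, -1, PHI)        # private exponent, never leaves the validating service


def encrypt_key(api_key: str, public=(N, E)) -> list[int]:
    """Consumer side: encrypt each byte of the API key with the public key."""
    n, e = public
    return [pow(byte, e, n) for byte in api_key.encode("utf-8")]


def decrypt_key(blob: list[int], private=(N, D)) -> str:
    """Validating service side: recover the API key with the private key."""
    n, d = private
    return bytes(pow(value, d, n) for value in blob).decode("utf-8")
```

An intermediary that relays the output of `encrypt_key` sees only a list of numbers; turning it back into the API key requires `D`, which stays on the validating service.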
Now is a good opportunity to introduce the concept of a passive intermediary. A service which acts as a passive intermediary relieves itself of the verification of API keys. It doesn't care about the keys: it merely transmits them to the underlying service. This doesn't mean that it should be trusted; it simply means that the consumer of this service doesn't need a dedicated API key for it, but may need a key for the underlying one. In the example above, the image sharing service would just delegate the checking of the API key to the storage service: everyone who can use the storage service can automatically use the image sharing service as well.
This doesn't mean that the passive intermediary cannot rely on the consumer name. It may trust the response from the underlying service in order to know whether the consumer is a legitimate one or not.
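A passive intermediary could look roughly like the sketch below. Everything in it is an assumption made for illustration: the class names, the shape of the response, and above all `standin_encrypt` and `standin_decrypt`, which merely reverse a string so that the example stays self-contained. They stand in for the public-key encryption and are not encryption at all.

```python
def standin_encrypt(api_key: str) -> str:
    # Placeholder for encryption with the storage service's public key.
    return api_key[::-1]


def standin_decrypt(blob: str) -> str:
    # Placeholder for decryption with the storage service's private key.
    return blob[::-1]


class StorageService:
    """Underlying service: the only place where keys are read and checked."""

    def __init__(self, decrypt, known_keys):
        self.decrypt = decrypt        # private-key operation
        self.known_keys = known_keys  # API key -> consumer name
        self.files = {}               # consumer name -> stored file names

    def store(self, encrypted_key, file_name):
        consumer = self.known_keys.get(self.decrypt(encrypted_key))
        if consumer is None:
            return {"ok": False, "consumer": None}
        self.files.setdefault(consumer, []).append(file_name)
        return {"ok": True, "consumer": consumer}


class ImageSharingService:
    """Passive intermediary: forwards the encrypted key without reading it."""

    def __init__(self, storage):
        self.storage = storage

    def upload(self, encrypted_key, image_name):
        response = self.storage.store(encrypted_key, image_name)
        if not response["ok"]:
            raise PermissionError("rejected by the storage service")
        # The consumer name is taken from the underlying service's response.
        return response["consumer"]
```

Because the file is stored under the consumer's own name, the same consumer finds it again when talking to the storage service directly, and the image sharing service never needs a super-user account.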
There are two benefits of being a passive intermediary.
For the service itself, this means that there are no interactions with the central authority, and no keys to validate and manage. This leads to simpler code.
For the consumers, this means fewer keys to manage. If there is a chain of ten interdependent services, managing ten keys is not particularly convenient.
Sometimes, the passive intermediary (or any intermediary in general) has to be authenticated by the underlying service as well. Some other services may accept anonymous intermediaries, since the encrypted API keys of the consumers are already a good protection against abuse and identity theft. It's up to the service to decide whether the additional protection is needed, given that the choice doesn't affect the end consumers, since such authentication happens between two services only.
Overall picture
While it might look as though the model described in Key distribution and the one described in Man in the middle are different, they are just two sides of a single system based on three rules:
- Any service may generate its own consumer identification keys, which are recognized by the central authority when appended to the shared key.
- When services are combined, a request may carry several keys, at most one per service. Every service is in charge of contacting the central authority in order to validate the keys that concern it.
- The keys can be passed through other services, which may act as passive intermediaries.