Workflows, ETLs, and pure magic

Arseni Mourzenko
Founder and lead developer, specializing in developer productivity and code quality
December 11, 2020

Five years ago, I wrote a rather opinionated and very critical article about Nintex Workflows, a perfectly useless product which makes your life miserable while you're paying for it. Nintex is sold as a solution that lets non-technical persons change business logic in a given context; however, it is so poorly designed, and its way of showing the logic and letting you modify it is so convoluted, that it prevents not only non-technical persons from dealing with business logic, but IT professionals as well.

A bit of context. Back then, I found myself at the head of a team who had to deal with the following situation: a customer had a workflow so complex that nobody could understand it. Not that the logic behind it was complex—it was rather simple and straightforward—but Nintex Workflows made it completely impossible to understand. The workflow occasionally and randomly crashed (without reporting any error), and my team was asked to solve that. Obviously, nobody had a freaking idea how to approach this beast.

In the article back then, I gave a daring suggestion: use Python instead. More specifically, train non-technical persons to write code in Python, create an API around it, connect it to a message queue service (MQS), and enjoy.

Since then, two things have happened.

First and foremost, I met a manager who did exactly that. He has a team with a few programmers and a few business analysts who do not consider themselves programmers, many of them with absolutely no prior experience in anything related to programming or scripting (if such a distinction can be considered relevant here). Those business analysts work on data processing, and instead of using an ETL, they actually code the thing themselves in Python, while relying on the API created by the developers. I interacted a lot with the manager and his team, and gathered enough information to provide some more details about the approach, its benefits, and its drawbacks.

Second, about half a dozen persons reacted to my earlier article, some wondering if it's really possible to train a non-technical person to use Python, others asking how to deal with the mess that a non-technical person, who doesn't necessarily know the rules of clean code, will create over time. My recent experience helps me answer those two questions.

Note that while the context of the team I was talking about is a bit different (they are extracting, transforming, and loading data from and to specific data sources, not handling business workflows), the idea is exactly the same. Just as there are existing workflow applications for business rules, there is a market for ETLs as well. In both cases, there are situations where an existing tool can be valuable (in terms of workflows, I'm obviously talking about things such as Windows Workflow Foundation, not Nintex Workflows, which is perfectly and completely useless), and there are cases where Python can provide much more value.

For instance, imagine an application which handles the administrative part of hiring an employee. Depending on the circumstances, the company may decide to change the workflow, and if the software is not flexible enough, this may not be possible. However, if the software product includes user-manageable workflows, it gives the users tremendous flexibility, and so adds value to the product itself.

Similarly, a team may be aggregating a specific type of data from different data sets to put it into the enterprise data warehouse (EDW), while keeping an eye on the regular changes in the structure of the incoming data. When a source changes its structure, or another source should be added, it should be relatively easy with an ETL to just remap the fields, or add new mappings.

Those are the cases where ETLs or workflow tools (except Nintex!) can really provide value: a narrow field with strict boundaries.

Now, imagine you need to do some complex data processing over the data sets coming from different data sources, REST APIs, and SOAP services, while caching the requests and also saving a bunch of CSV files for historical use. Here, an ETL can quickly become a limitation and a burden, rather than a tool which empowers the team.

Enter Python. In 2015, I wrote:

It's extremely simple. Take a message queue service (MQS), an API and a Python script. Yes, that's all you need. The Python script interacts with both the API and the MQS.

In retrospect, I made a small mistake: it's not an API which is needed, but a framework. This was exactly my approach when developing Grape: the framework handles all the complexity of the communication through an MQS and all the idiosyncrasies of specific hardware devices, hides that complexity, and leaves the user with the ability to define the business rules of the orchestrator very easily (at least, without bothering about such exciting things as the subtleties of the binary protocol used to communicate with Arduino devices): “Ye shall trigger an alarm when an intruder comes into my flat,” or “Make the red LED flash until the build is fixed,” or “Start the fan if the room temperature reaches a threshold, defined as a complex algorithm based on the outside weather and the interior humidity.”
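To make the idea of user-defined rules concrete, here is a minimal sketch in Python. It is not Grape's actual API: the `when` decorator, the `dispatch` function, the reading keys (`temperature_c`, `motion`, `owner_home`), the command strings, and the 26 °C threshold are all hypothetical, chosen only for illustration. The point is that the rule author writes a condition and an action, while the framework (reduced here to `dispatch`) would sit between the message queue and the devices.

```python
from typing import Any, Callable

Reading = dict[str, Any]
Rule = tuple[Callable[[Reading], bool], Callable[[Reading], str]]

# Hypothetical rule registry; a real framework would own this.
_rules: list[Rule] = []


def when(condition: Callable[[Reading], bool]):
    """Register the decorated function as the action for `condition`."""
    def register(action: Callable[[Reading], str]):
        _rules.append((condition, action))
        return action
    return register


def dispatch(reading: Reading) -> list[str]:
    """Framework side: run every rule whose condition matches the reading.

    In a real setup this would be called for each message taken off the
    queue, and the returned commands would be sent on to the devices.
    """
    return [action(reading) for condition, action in _rules if condition(reading)]


# Everything below is what the non-programmer would write.

@when(lambda r: r.get("temperature_c", 0) >= 26)
def start_fan(reading: Reading) -> str:
    return "fan:on"


@when(lambda r: bool(r.get("motion")) and not r.get("owner_home"))
def trigger_alarm(reading: Reading) -> str:
    return "alarm:on"
```

With these two rules registered, `dispatch({"temperature_c": 30})` returns `["fan:on"]`, and a reading with motion while the owner is away returns `["alarm:on"]`. Nothing in the rule code mentions queues, serial ports, or protocols.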

In fact, writing a Python script is difficult. It is for a programmer, and therefore it is for a non-programmer as well. One needs to know what stdout is, and how much memory is used, and what happens if a file is removed at the exact moment the script starts to read it, or if an infinite loop suddenly terminates (every time I make this joke, I lose half of my readers; come on, it's a funny one!). When, on the other hand, a script is surrounded by a framework which handles logging and streaming and memory management and faults and hundreds of bad things which can happen out there in the wild, it becomes much easier to focus on the actual business need and write a simple piece of code which corresponds to it. This, essentially, solves a major part of the difficulties that non-technical users encounter when they start using Python. The experience here is comparable to when IT specialists start to learn a new technology. You really want to learn interesting stuff, but instead, you're stuck with this stupid error telling you that one of the configuration files cannot be parsed, and that sucks, because you don't want to spend the next few hours dealing with this low-level stuff.

This concept of a framework is, in a way, similar to AWS Lambda. You don't want to deal with infrastructure and versions of Linux and dependencies and firewalls and disk space. All you want is to receive input, be able to process it the way you want, and create the output. A Python framework for non-programmers should look exactly the same (heck, it could be based on AWS Lambda!), with the script taking parameters in, flushing output to stdout, and not being concerned about its environment. Once those environment considerations are out of the way, non-technical persons can do a great job.
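As a rough illustration of that "parameters in, stdout out" contract, here is a minimal sketch of the framework side. The names `run`, `RunResult`, and `total_sales` are hypothetical, and a real framework would add much more (resource limits, logging, queue integration, isolation). It only shows that capturing output, timing, and fault handling can live outside the analyst's script.

```python
import io
import time
import traceback
from contextlib import redirect_stdout
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class RunResult:
    ok: bool
    output: str           # everything the script printed
    error: Optional[str]  # formatted traceback if the script failed
    seconds: float        # wall-clock duration


def run(handler: Callable[[dict[str, Any]], None], params: dict[str, Any]) -> RunResult:
    """Call `handler(params)`, capturing stdout, failures, and duration."""
    captured = io.StringIO()
    error = None
    started = time.perf_counter()
    try:
        with redirect_stdout(captured):
            handler(params)
    except Exception:
        # The framework, not the script author, decides how faults are reported.
        error = traceback.format_exc()
    return RunResult(
        ok=error is None,
        output=captured.getvalue(),
        error=error,
        seconds=time.perf_counter() - started,
    )


# What the business analyst writes: input in, printed output out.
def total_sales(params):
    print(sum(params["amounts"]))
```

Calling `run(total_sales, {"amounts": [1, 2, 3]})` yields a result whose `output` is the printed total. Calling it with a missing key yields `ok=False` and a traceback in `error`, without the analyst having written a single `try`.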

Now, obviously, there is no framework which could force someone to write clean code. There are, however, two ways to avoid the mess.

First, the framework itself can provide useful information about the script it runs. For instance, it can highlight parts of the script which take a while. I don't expect a non-technical person to be able to use a profiler (at least not if presented in a way so familiar to developers), but something visual which shows that, well, a 15-second script spent 14.5 seconds on line 67 may be more than enough to point the person in the right direction. And I don't expect a non-technical person to be able to explain the difference between a smoke test and a system test, but I'm pretty sure anyone can, when invited by a framework, map some expected outputs to a series of inputs and see if the script matches the expectations.
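Both kinds of feedback can be sketched with the standard library alone. The helper names `time_lines` and `check_expectations`, and the sample `normalize_name` script, are hypothetical. The timing part relies on CPython's `sys.settrace` hook, which slows the script down noticeably, so treat this as an illustration of the idea rather than a production profiler.

```python
import sys
import time
from collections import defaultdict
from typing import Any, Callable


def time_lines(func: Callable[..., Any], *args: Any) -> tuple[Any, dict[int, float]]:
    """Run `func(*args)`; return its result plus seconds spent per line number.

    Time spent inside functions that `func` calls is charged to the calling line.
    """
    totals: dict[int, float] = defaultdict(float)
    current_line = None
    line_started = 0.0

    def trace_line(frame, event, arg):
        nonlocal current_line, line_started
        now = time.perf_counter()
        if current_line is not None:
            totals[current_line] += now - line_started
        if event == "line":
            current_line = frame.f_lineno
        elif event == "return":
            current_line = None
        line_started = now
        return trace_line

    def trace_call(frame, event, arg):
        # Only trace the body of `func` itself, not what it calls.
        return trace_line if frame.f_code is func.__code__ else None

    previous = sys.gettrace()
    sys.settrace(trace_call)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)
    return result, dict(totals)


def check_expectations(func: Callable[[Any], Any], cases: list[tuple[Any, Any]]) -> list[tuple[Any, Any, Any]]:
    """Return (input, expected, actual) for every case that does not match."""
    failures = []
    for given, expected in cases:
        actual = func(given)
        if actual != expected:
            failures.append((given, expected, actual))
    return failures


# Sample analyst script with one deliberately slow line.
def normalize_name(raw: str) -> str:
    time.sleep(0.05)  # stand-in for a slow step, e.g. a remote lookup
    return raw.strip().title()
```

Given the per-line totals, `max(totals, key=totals.get)` is the line number a framework could highlight, and an empty list from `check_expectations` means the script matches every input/output pair the analyst entered.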

Second, the non-technical persons should be able to ask a programmer for help (this is actually the first thing I suggested to the manager when I learned that non-technical persons write code in his team). This help may take different forms: the programmer may review pull requests, run training sessions, do occasional pair programming, or just give a hand with a particularly daunting task. The fact is, it actually saves money for the company. Recently, for instance, one programmer on the team helped a business analyst with a script which took too long to execute. As a result, a script which used to take several hours now runs in just a few minutes, thanks to a few changes in the structure of the code: a game changer for everyone.

Languages such as Python are designed in a way that makes it very difficult to write bad code: badly formatted, badly written, misleading. Coupled with some help from a programmer, and a solid framework which provides useful insight as to how the script runs and what it really does, those languages can be a very valuable tool which empowers users when it comes to solving a problem which would be too complex for an ETL or a workflow tool. Use them; they are great.