Technical Debt in AI systems

Posted by Venkatesh Subramanian on March 22, 2024 · 7 mins read

Let’s start with a definition of technical debt and a few examples from software engineering. Technical debt is the future cost you take on when you choose a quick or convenient solution today over one that is easier to maintain.
You may write code that is clever yet hard for another programmer to understand. The code does its job, but there is a cost whenever someone else has to comprehend and extend it.

In today’s world of cloud migrations, your on-prem systems may be running fine, yet you know they lack dynamic scaling, so sooner or later cloud is the way to go. However, the code has to be refactored to follow certain patterns, such as stateless design, before it can be migrated to the cloud. Delaying these refactorings introduces a cost. Or it could be that unit test coverage needs to improve, yet the pressure to add new features always takes precedence.

In all these examples, the additional effort needed to fix the problem goes up over time and often compounds, akin to paying interest on a financial debt. So architects in traditional software engineering keep a dedicated backlog of improvements that ensure the system’s long-term sustainability, even though they are not visible to end users.

Now, how does technical debt change with respect to AI?

Well, all the standard software engineering issues still remain, as AI is part of the system architecture stack. On top of these, some additional aspects need to be considered, as follows.

AI models depend on large amounts of data for training. As the data landscape changes, models become less effective and can even become biased, because the data they learned from no longer reflects reality. So the models need to be adapted to the new data distributions and tested for fairness and inclusivity. This is the data dependency technical debt.
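One way to notice this debt building up is to compare the data a model was trained on with the data it sees today. Here is a minimal sketch using the Population Stability Index (PSI), a common drift measure. The function name, the bin count, and the sample values are all assumptions made for illustration, not part of any particular library.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Return the PSI between a training-time sample and a recent sample."""
    lo, hi = min(expected), max(expected)
    # Fall back to a width of 1.0 if every training value is identical.
    width = (hi - lo) / bins or 1.0

    def bucket_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the training range land in the edge buckets.
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty buckets from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e = bucket_shares(expected)
    a = bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical data: one feature at training time versus in production.
training = [i / 100 for i in range(100)]
live = [0.5 + i / 200 for i in range(100)]
drift_score = population_stability_index(training, live)
```

A frequently quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the right threshold depends on the feature and the use case, so treat it as a starting point rather than a standard.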

Algorithms, techniques, and frameworks to train AI systems are also evolving rapidly. The delay in upgrading to these techniques or platforms introduces the algorithm update technical debt.

In traditional code, adding comments and encapsulating logic in a modular fashion is recommended for decreasing the debt. However, this breaks down in AI systems, as models are built from complex architectures with non-linear activation functions, and their behavior lives in learned weights rather than in readable logic. So growing model complexity and its black-box nature can increase the technical debt, and a move towards interpretable or explainable AI may be a way to reduce this debt.

For example, an LLM-based system can explain in verbose mode how it is choosing agents, retrieving context, and then generating content. Likewise, a predictive AI model can report which features carried more weight in a given decision.
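To make the feature-weight idea concrete, here is a minimal sketch of permutation importance, one common model-agnostic technique: shuffle one feature at a time and see how much the error grows. The toy model, the data, and every name below are assumptions chosen for illustration. Real projects would more likely use an established library implementation.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def permutation_importance(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    targets: Sequence[float],
    n_features: int,
    seed: int = 0,
) -> List[float]:
    """Return the increase in mean squared error when each feature is shuffled."""
    rng = random.Random(seed)

    def mse(data: Sequence[Row]) -> float:
        return sum((predict(r) - t) ** 2 for r, t in zip(data, targets)) / len(targets)

    baseline = mse(rows)
    scores = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        # Rebuild the rows with only feature j scrambled.
        shuffled = [r[:j] + [column[i]] + r[j + 1:] for i, r in enumerate(rows)]
        scores.append(mse(shuffled) - baseline)
    return scores


def toy_model(row: Row) -> float:
    # Hypothetical model that relies on feature 0 and ignores feature 1.
    return 3.0 * row[0]


rows = [[float(i), float(i % 3)] for i in range(20)]
targets = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, targets, n_features=2)
```

In this toy setup the first score is positive and the second is zero, which matches the fact that the model only looks at the first feature.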

Silicon innovations aimed at improving performance per watt, such as the Graviton, Trainium, and Inferentia processors from AWS or the Ampere Arm processors available on Google Cloud, are steadily growing in number. Adopting them usually takes some extra effort, such as code changes or migration cost. However, if it is not done early, systems keep consuming more resources than they need to, and there is a larger system to migrate at a later point.

With Gen AI, there is now a plethora of machine-generated content on the web. As we train new models, they in turn ingest machine-generated text and then generate more outputs. This can result in an echo chamber where errors cascade and multiply if not verified and controlled in a timely manner. Many tools are also emerging to detect whether a piece of content was generated by a model or written by a human expert. Delay in using such tools will lead to model error cascade debt.

ML systems generating outputs may need to track who is consuming this data (systems, other ML models, and human users) and may need access mechanisms, tracking, and policies around it. If the consumers grow without tracking and something later changes in the model outputs, the effect can compound downstream, within your organization or even with external customers.
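The tracking described above does not have to be elaborate to be useful. Below is a minimal sketch of a consumer registry that records who reads a model's outputs and which output version each consumer was built against. The class names, fields, and version strings are hypothetical, and a real system would likely back this with a catalog or database rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Consumer:
    name: str
    kind: str            # "system", "model", or "human"
    pinned_version: str  # output version this consumer was built against


@dataclass
class OutputRegistry:
    consumers: Dict[str, Consumer] = field(default_factory=dict)

    def register(self, consumer: Consumer) -> None:
        self.consumers[consumer.name] = consumer

    def impacted_by(self, new_version: str) -> List[str]:
        """List consumers still pinned to a version other than the new one."""
        return sorted(
            c.name for c in self.consumers.values() if c.pinned_version != new_version
        )


registry = OutputRegistry()
registry.register(Consumer("billing-service", "system", "v1"))
registry.register(Consumer("churn-model", "model", "v2"))
registry.register(Consumer("ops-dashboard", "human", "v1"))

# Before rolling out v2 outputs, find out who needs to be told.
to_notify = registry.impacted_by("v2")
```

Even a simple list like this turns an unknown blast radius into a concrete set of teams to notify before a model's outputs change.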

A lot of AI systems start off prematurely, without a solid data foundation. In this case developers slap together ETL glue code for every stream of data they use. Over time this ETL glue turns into spaghetti: good enough for PoCs and demos, yet the cause of many maintenance issues in production.

AI systems also start off with lots of iterative experiments to decide the right combination of data features to use for training, the choice of prompts in the case of Generative AI, and the level of fine-tuning needed to make models better over time. The experiment code that data scientists write in Jupyter notebooks often gets moved into production systems with the intent of speeding up the release. However, this introduces a huge technical debt if the code is not refactored to AI engineering and MLOps-grade quality.

Similarly, AI systems can have a lot of configuration parameters, which end up scattered all over the place and introduce debt if they are not maintained in a central configuration system or a JSON file.
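As a rough illustration of the JSON file option, the sketch below loads all model settings from one JSON document into a single typed object and rejects unknown or missing keys. The field names and values are assumptions made up for this example, not a recommended schema.

```python
import json
from dataclasses import dataclass, fields
from typing import Tuple


@dataclass(frozen=True)
class ModelConfig:
    model_name: str
    temperature: float
    max_tokens: int
    feature_columns: Tuple[str, ...]


def load_config(raw_json: str) -> ModelConfig:
    """Parse one JSON document into a validated, read-only config object."""
    data = json.loads(raw_json)
    expected = {f.name for f in fields(ModelConfig)}
    unknown = set(data) - expected
    missing = expected - set(data)
    if unknown or missing:
        # Fail loudly on typos instead of silently falling back to defaults.
        raise ValueError(
            f"unknown keys: {sorted(unknown)}, missing keys: {sorted(missing)}"
        )
    data["feature_columns"] = tuple(data["feature_columns"])
    return ModelConfig(**data)


# Hypothetical contents of a central config file.
raw = """
{
  "model_name": "churn-classifier",
  "temperature": 0.2,
  "max_tokens": 512,
  "feature_columns": ["tenure", "plan", "usage"]
}
"""
config = load_config(raw)
```

Keeping the object frozen means code elsewhere cannot quietly change a parameter at runtime, so the file stays the single source of truth.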

Another type of debt is caused by the mushrooming of tools, platforms, and startups in this space in a compressed timeframe. A few years from now, many of these may not exist, and the AI systems built on them will have to be refactored.
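One common way to limit the refactoring is to keep vendor-specific calls behind a small interface of your own, so that only one adapter changes when a tool disappears. The sketch below assumes a hypothetical text-generation use case. The interface, both adapter classes, and their behavior are invented for illustration and do not correspond to any real vendor SDK.

```python
from typing import Protocol


class TextGenerator(Protocol):
    """The only surface the rest of the application is allowed to depend on."""

    def generate(self, prompt: str) -> str: ...


class VendorAAdapter:
    """Hypothetical adapter; a real one would wrap vendor A's SDK call here."""

    def generate(self, prompt: str) -> str:
        return f"[vendor-a] {prompt}"


class VendorBAdapter:
    """Hypothetical replacement that can be swapped in if vendor A goes away."""

    def generate(self, prompt: str) -> str:
        return f"[vendor-b] {prompt}"


def summarize(generator: TextGenerator, text: str) -> str:
    # Application code knows the interface, not the vendor.
    return generator.generate(f"Summarize: {text}")


before = summarize(VendorAAdapter(), "quarterly report")
after = summarize(VendorBAdapter(), "quarterly report")
```

The adapter does not remove the migration work, but it confines it to one class instead of every call site.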

There are also regulations such as the EU AI Act to ensure compliance and governance of AI systems. Although it’s early days for the Act, developers who delay understanding these regulations and updating their AI systems will amass a lot of interest to pay later, including reworking their solutions to get them right and even paying fines for non-compliance.

AI tools such as GitHub Copilot are now able to generate code based on prompts. However, some of this code can be difficult to understand and debug when things don’t work. Since these models have been trained on code written by humans, the flaws in that code will inevitably show up in what they generate. Hence, indiscriminately generating code and pushing it through high-velocity CI/CD is very risky, as it can add unprecedented technical debt to the systems.

Finally, it is important to see that AI is part of the full systems stack, and as it gets integrated it creates dependencies. So the role of the systems architect will become highly valuable. Code reviews, integration reviews, data governance, responsible AI, robustness testing of Gen AI systems, and architecture quality assurance will go a long way in managing the technical debt and delivering sustainable, maintainable, AI-rich software systems.

