Requirements Engineering for modern AI systems

Posted by Venkatesh Subramanian on November 05, 2023 · 7 mins read

Modern AI/ML systems are now regularly integrated into full-stack software applications. However, traditional SDLC activities such as requirements engineering must adapt to meet the complexities of such systems.
In AI-based systems the criterion of quality is how well the system performs: the accuracy, precision, and recall of its predictions. While customers of traditional applications know what outputs should come from which parts of the application, in AI systems they struggle to specify the outcome parameters of a prediction.
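To help customers specify these outcome parameters concretely, it can help to show what the measures actually compute. Below is a minimal sketch in plain Python; the labels are invented for illustration and do not come from any real system.

```python
# Illustrative definitions of the three prediction-quality measures
# a requirements document might set targets for.

def accuracy(y_true, y_pred):
    # Fraction of all predictions that were correct.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred, positive=1):
    # Of everything predicted positive, how much really was positive?
    tp = sum(t == p == positive for t, p in zip(y_true, y_pred))
    predicted_pos = sum(p == positive for p in y_pred)
    return tp / predicted_pos if predicted_pos else 0.0

def recall(y_true, y_pred, positive=1):
    # Of everything truly positive, how much did we catch?
    tp = sum(t == p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return tp / actual_pos if actual_pos else 0.0

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # made-up ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # made-up model output

print(accuracy(y_true, y_pred))   # 0.75
print(precision(y_true, y_pred))  # 0.75
print(recall(y_true, y_pred))     # 0.75
```

A requirement can then read "recall on the positive class shall be at least 0.9" rather than a vague "the model should be accurate".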
In AI systems, how the application works is hard to explain even for developers, because behavior is determined not by source code but by the training data and the architecture of the neural network.
Explainability is essential for requirements traceability and for unambiguous customer acceptance, so requirements engineering needs to elicit explainability needs.
In traditional systems, biases towards specific user segments are detectable through code reviews or runtime assessments. In AI systems, such biases are implicit and hard to trace back to their source. Consequently, requirements engineers must scrutinize the data and models for attributes that could contribute to discriminatory behavior. Ensuring diverse representation in the training data is a non-negotiable requirement that safeguards minorities from being excluded from the advantages of modern AI. Outliers in the data then become classes in their own right, which is desirable from a societal standpoint.

AI Training Data and Requirements Engineering
Data requirements may in fact play a bigger role than the functional requirements of the classic world. Training data must now be subjected to the same rigor as source code in traditional systems.
Garbage in, garbage out: there need to be clear requirements around both data quality and quantity. In AI systems that leverage data from diverse sources, alterations in a source schema often disrupt consumer data processing. Consequently, requirements for reconciliation mechanisms between data sources and destinations become crucial.
In fact, there is an interesting concept called “Data Contracts”: agreements between data producers and consumers on schema, semantics, and distribution policies to ensure frictionless data supply chains. These must be captured as part of requirements too.
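A data contract can be as simple as an agreed schema that the consumer checks on every record. The sketch below is hypothetical; the field names and types are invented for illustration.

```python
# A hypothetical data contract: the consumer's expected schema,
# checked record-by-record at the pipeline boundary.

CONTRACT = {
    "customer_id": int,
    "signup_date": str,
    "lifetime_value": float,
}

def violates_contract(record, contract=CONTRACT):
    """Return a list of field-level violations for one record."""
    problems = []
    for field, expected_type in contract.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"bad type for {field}: {type(record[field]).__name__}")
    return problems

good = {"customer_id": 42, "signup_date": "2023-11-05", "lifetime_value": 99.5}
bad = {"customer_id": "42", "signup_date": "2023-11-05"}

print(violates_contract(good))  # []
print(violates_contract(bad))   # bad type for customer_id, missing lifetime_value
```

Real deployments would use a schema registry or a validation library rather than hand-rolled checks, but the requirement being captured is the same: producer-side schema changes must be caught before they corrupt downstream training data.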
Data augmentation is usually part of the development lifecycle, and requirements should include how this augmentation needs to happen.
Data in certain domains also has a temporal dimension: certain facets only reveal themselves over time. For example, if the AI needs to predict a person’s credit score, it would need at least a couple of years of transaction data on that individual. Requirements need to validate this aspect based on the use case and domain.
There is also the concern of sensitive data ending up in AI training sets.
Traditional database systems have a “delete” command to remove sensitive data captured by mistake. The ML model has no such “delete”! Once data has been used to train a model, there is no way to edit the model; you have to retrain a new one. So regulations such as GDPR must be thoroughly understood by the requirements gatherer to ensure compliance of these systems. This also means that data lineage becomes a very important requirement, to track and ensure that no illegal changes have happened while data was in transit through the pipeline.
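One lightweight way a lineage requirement can be made testable is content fingerprinting: each pipeline stage records a hash of the batch it received, so any change in transit is detectable. This is a sketch under assumptions; the record fields are invented.

```python
# A sketch of tamper-evident data lineage via content hashing.
import hashlib
import json

def fingerprint(records):
    # Canonical JSON (sorted keys) so logically equal batches hash equally.
    payload = json.dumps(records, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

batch = [{"id": 1, "amount": 120.0}, {"id": 2, "amount": 75.5}]
expected = fingerprint(batch)   # recorded by the producer stage

# ... batch travels through the pipeline ...
received = fingerprint(batch)   # recomputed by the consumer stage
print(received == expected)     # True: no change in transit

tampered = [{"id": 1, "amount": 999.0}, {"id": 2, "amount": 75.5}]
print(fingerprint(tampered) == expected)  # False: change detected
```

Production lineage tooling tracks much more (source, transformations, timestamps), but the requirement reduces to the same check: what arrived must provably be what was sent.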

Models also need explicit requirements around observability for things like “drift”, where the statistical properties of the data used to train the model have changed, or the data itself has changed due to events such as adverse climate or a pandemic.
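Such an observability requirement can be stated quantitatively. Below is a minimal sketch of one possible drift check, flagging drift when a feature's live mean moves more than a threshold number of training-time standard deviations; the numbers and the threshold are illustrative assumptions, not a recommended policy.

```python
# A minimal drift check: has the live mean of a feature shifted
# more than `threshold` training-time standard deviations?
import statistics

def drifted(train_values, live_values, threshold=3.0):
    mu = statistics.mean(train_values)
    sigma = statistics.stdev(train_values)
    live_mu = statistics.mean(live_values)
    return abs(live_mu - mu) > threshold * sigma

train = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8]  # training-time feature values
stable = [10.1, 10.4, 9.9]                   # live data, similar distribution
shifted = [19.0, 20.5, 21.0]                 # live data after a regime change

print(drifted(train, stable))   # False
print(drifted(train, shifted))  # True
```

Real monitoring would use proper distribution tests (e.g., Kolmogorov–Smirnov or population stability index) over windows of data, but the requirement itself is the same shape: a defined statistic, a defined threshold, and a defined alerting action.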
As part of requirements elicitation, the engineer must collaborate with data scientists, data engineers, and compliance teams to ensure that data collection, storage, pipeline, and sensitivity requirements are captured clearly.
The requirements engineer also needs to translate technical-sounding AI/ML measures of performance into language that business users can understand.
Now if we add the world of Large Language Models (LLMs) into this mix, the job of requirements engineering goes up a few more notches, as discussed next.

LLMs and Requirements Engineering
Requirements must capture the nuances of the types of natural-language queries and context that the system must comprehend and respond to. Note that the UX is now fluid and dynamic, driven by natural conversation, unlike the traditional user interfaces we have known in the past.
Most LLM applications will do some adaptation using prompt engineering or fine-tuning of a foundation model. Requirements should specify the customization needs, including domain-specific vocabulary, context, and desired output formats.
Requirements must also take a stand on the balance between creativity and precision in the generated output for various types of LLM interaction - the temperature setting of the model’s responses.
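What the temperature knob does can be shown with the standard temperature-scaled softmax over next-token scores. The scores below are invented for illustration; real models work over vocabulary-sized logit vectors.

```python
# How temperature reshapes a model's next-token distribution.
import math

def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # illustrative scores for three candidate tokens

cautious = softmax_with_temperature(logits, 0.5)  # low temp: precision
creative = softmax_with_temperature(logits, 2.0)  # high temp: creativity

# Low temperature concentrates probability on the top-scoring token;
# high temperature flattens the distribution across alternatives.
print(cautious[0] > creative[0])  # True
```

A requirement might then say, for instance, that factual lookup interactions run at a low temperature while brainstorming interactions run at a higher one, with the exact values decided during tuning.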
Given that these are unstructured natural-language interactions, there must be requirements on context disambiguation and on asking the user for clarification when language is ambiguous.
LLMs may process sensitive user data. Requirements must emphasize robust privacy measures, including data encryption, anonymization techniques, and compliance with data protection regulations. User consent mechanisms and transparent data usage policies should also be clearly defined.
LLM-based apps might incorporate multi-modal interactions, integrating text with images, videos, or other forms of media. Requirements should specify how the application processes and responds to multi-modal inputs, ensuring a seamless user experience across different interaction modes.
LLMs are computationally intensive. Requirements should address scalability challenges, outlining the expected user load, response time benchmarks, and strategies for optimizing performance. Scalable infrastructure and efficient model serving mechanisms are vital considerations.
LLM-based apps can benefit from continuous learning based on user interactions. Requirements should support mechanisms for collecting user feedback, updating the model, and adapting to evolving user needs, ensuring the application remains relevant and effective over time.

In the ever-changing landscape of modern AI, data, and systems, requirements engineering stands as the linchpin for innovation and responsible technology deployment. Its rapid evolution is not merely a necessity but a vital imperative, ensuring seamless integration of cutting-edge technologies, ethical practices, and user-centric design.

