Scientific mental models for Data & AI engineers

Posted by Venkatesh Subramanian on March 03, 2024 · 4 mins read

In this post I take one idea each from Physics, Chemistry, Biology, and Mathematics and relate it to the work of Software engineering, including Data and AI.

While software engineers create models of intelligent behavior and build full-stack applications, it also helps to look at models from pure science and relate them to their solution design work.

  1. Physics - Uncertainty Principle:

    In 1927 the German physicist Werner Heisenberg proposed the ‘Uncertainty principle’, which states that we cannot simultaneously know both the position and the momentum of a particle, such as a photon or an electron, with perfect accuracy. The more precisely we pin down the position, the less we know about the momentum, and vice versa.

    In Data engineering and AI there is a loosely similar tradeoff between precision and explainability. You can make models more complex with several layers of non-linear activation functions to squeeze out more precision; however, this also reduces the explainability of how the model produces its results. So Data engineers need to strike a balance between model complexity and interpretability based on their use case, the model performance required, the need for explainability, and the compute resources available.

  2. Chemistry - Activation Energy:

    Activation energy is the minimum amount of energy required to start a chemical reaction. Often a catalyst lowers the activation energy needed, thereby accelerating the reaction.

    In Data and AI work, debugging and optimization tasks present a parallel situation. Engineers often face inertia when starting a task, resolving a bug, or running trials with multiple model hyperparameters. Once they overcome the initial barrier and dive into the problem, progress becomes much easier. Automated machine learning (AutoML) tools can act as catalysts that lower this activation energy for AI engineers by automating the end-to-end process of building Machine learning models.
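
    As a toy illustration of how automating hyperparameter trials lowers that barrier, here is a minimal random-search sketch. It is not any AutoML library's API: `validation_loss` is a made-up stand-in for training and scoring a real model, and the parameter names and ranges are assumptions chosen for the example.

```python
import random


def validation_loss(learning_rate, num_layers):
    # Hypothetical stand-in for "train a model and return its validation loss".
    # A made-up bowl-shaped function, lowest at learning_rate=0.01, num_layers=4.
    return (learning_rate - 0.01) ** 2 * 1e4 + (num_layers - 4) ** 2 * 0.1


def random_search(n_trials, seed=0):
    """Try n_trials random configurations and return (best_loss, best_params)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    best = None
    for _ in range(n_trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-4, -1),  # log-uniform sample
            "num_layers": rng.randint(1, 8),
        }
        loss = validation_loss(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


best_loss, best_params = random_search(n_trials=50)
print(best_loss, best_params)
```

    The engineer only states the search space and the budget; the loop does the repetitive trials. Real AutoML systems typically go much further (model selection, feature processing, smarter search), so treat this only as the smallest possible version of the idea.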

  3. Biology - Evolutionary Adaptation:

    Evolutionary adaptation is the process by which organisms evolve traits that increase their fitness in a particular environment over successive generations.

    In Data and AI engineering, algorithms can be thought of as evolving and adapting over time. Just as organisms evolve to survive in changing environments, AI algorithms can be subjected to evolutionary strategies or meta-learning techniques, allowing them to adapt to different datasets or environments. This concept is particularly relevant in areas like reinforcement learning, where agents learn through trial and error, adapting their strategies based on feedback from the environment or from humans in the loop.
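
    The mutate-and-select loop behind evolutionary strategies can be sketched in a few lines. This is a minimal elitist variant under stated assumptions: the `fitness` function is a made-up one-dimensional objective with its peak at 3, and the population size, mutation scale, and generation count are arbitrary illustrative values.

```python
import random


def fitness(x):
    # Made-up objective: higher is better, with the peak at x = 3.
    return -(x - 3.0) ** 2


def evolve(generations=100, offspring=10, sigma=0.5, seed=0):
    """Return the fittest candidate found after the given number of generations."""
    rng = random.Random(seed)
    parent = rng.uniform(-10, 10)  # random starting "organism"
    for _ in range(generations):
        # Mutation: each child is the parent plus Gaussian noise.
        children = [parent + rng.gauss(0, sigma) for _ in range(offspring)]
        best_child = max(children, key=fitness)
        # Selection: the parent survives unless a child is fitter.
        if fitness(best_child) > fitness(parent):
            parent = best_child
    return parent


best = evolve()
print(best)
```

    Because the parent is only ever replaced by a fitter child, fitness never decreases from one generation to the next, which is the "adaptation over successive generations" idea in its simplest form.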

  4. Mathematics - Graph Theory:

    Graph theory deals with the study of graphs, which are mathematical structures representing pairwise relationships between objects.

    Graph theory has numerous applications in modern data engineering and software development. For instance, in data engineering, understanding the relationships between different data entities (nodes) and their connections (edges) can help in designing efficient database schemas, optimizing data querying, and building recommendation systems. Similarly, in software development, graph algorithms can be used for tasks like dependency resolution, social network analysis, and routing optimization in networks.
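
    Dependency resolution is perhaps the easiest of these to show. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+) to order the steps of a pipeline; the task names are hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task (node) maps to the set of tasks it depends on,
# so every dependency is an edge in a directed graph.
dependencies = {
    "load_raw": set(),
    "clean": {"load_raw"},
    "features": {"clean"},
    "train": {"features"},
    "report": {"train", "clean"},
}

# static_order() yields the tasks so that every dependency comes before
# the task that needs it.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

    If the dependencies contained a cycle, `static_order()` would raise `graphlib.CycleError`, which is how a build tool or scheduler can detect an unresolvable setup.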

From these examples it is clear that many models from Physics, Chemistry, Biology, and Mathematics offer a useful lens on modern Data and AI software engineering.

More important is the scientific temperament of hypothesis-driven work. Scientists formulate hypotheses to guide their experiments. Similarly, Data and AI engineers can develop system or architecture hypotheses to address system requirements and constraints. They can then test their solutions using techniques such as A/B testing to validate model selection and design decisions, uncover tradeoffs, and iteratively update the solution to align with project goals.
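
One common way to check an A/B hypothesis is a two-proportion z-test. The sketch below is a minimal version using only the Python standard library; it assumes a binary success metric (for example, a click or a correct prediction) and large enough samples for the normal approximation. The counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for the difference in success rates B - A."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis that A and B perform the same.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up counts: variant A succeeds 120 times in 1000, variant B 150 in 1000.
z, p_value = two_proportion_z_test(successes_a=120, n_a=1000,
                                   successes_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value suggests the difference between the two variants is unlikely to be noise alone; whether the difference is large enough to matter is still a judgment to make against the project goals.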

