Created 2024/11/10 at 03:05PM

Last Modified 2024/12/02 at 05:21AM

Testing is one of the cornerstones of software development, but what happens when we take it too far?

We'll explore the pitfalls of over-engineering test design in the name of flexibility and extensibility. By the end of this post, you’ll understand why sometimes, less is more.

```python
# utils.py
class CubeFn:
    def solve(self, x):
        return x ** 3


class SquareFn:
    def solve(self, x):
        return x ** 2


class CircleAreaFn:
    def solve(self, r):
        return 3.14 * r * r


class BoxAreaFn:
    def solve(self, h, w):
        return h * w


class TrapezoidAreaFn:
    def solve(self, a, b, h):
        return (a + b) * h / 2
```

```python
# test_utils.py
from typing import Tuple

from utils import *


def idfunc(val):
    if isinstance(val, (list, tuple)):
        return f"({','.join(map(str, val))})"
    return f"{val}"


class LinearFnTest:
    # Tuples of (input, expected output), supplied by each concrete test class.
    DATA_POINTS: Tuple[Tuple[float, float], ...] | None = None

    def pytest_generate_tests(self, metafunc):
        # pytest invokes this hook once per test function in the class.
        metafunc.parametrize("input_, output_", self.DATA_POINTS, ids=idfunc)

    def solve(self, x):
        # Overridden by the *Fn class mixed in ahead of this base.
        raise NotImplementedError

    def test_func(self, input_, output_):
        assert self.solve(input_) == output_


class TestCube(CubeFn, LinearFnTest):
    DATA_POINTS = (
        (2, 8),
    )


class TestSquare(SquareFn, LinearFnTest):
    DATA_POINTS = (
        (2, 4),
        (3, 9),
    )


class AreaFnTest:
    # Tuples of ((arguments), expected output).
    DATA_POINTS: Tuple[Tuple[Tuple[float, ...], float], ...] | None = None

    def pytest_generate_tests(self, metafunc):
        metafunc.parametrize("input_, output_", self.DATA_POINTS, ids=idfunc)

    def solve(self, *args):
        raise NotImplementedError

    def test_func(self, input_, output_):
        assert self.solve(*input_) == output_


class TestCircleArea(CircleAreaFn, AreaFnTest):
    DATA_POINTS = (
        ((10,), 314),
    )


class TestBoxArea(BoxAreaFn, AreaFnTest):
    DATA_POINTS = (
        ((3, 5), 15),
        ((10, 10), 100),
    )


class TestTrapezoidArea(TrapezoidAreaFn, AreaFnTest):
    DATA_POINTS = (
        ((4, 5, 10), 45),
    )
```

Observations

  1. Both LinearFnTest and AreaFnTest are base test classes designed to parameterize tests for different mathematical functions. Each defines a pytest_generate_tests method, the hook pytest calls to generate the parameters for every test function in the class.

  2. In both LinearFnTest and AreaFnTest, solve is a stub that raises NotImplementedError. The real implementations live in the function classes (CubeFn, SquareFn, CircleAreaFn, and so on); each concrete test class mixes one of them in ahead of the test base, so the mixin's solve wins in the method resolution order and supplies the actual computation.

  3. For each test class, pytest_generate_tests creates a separate test case for every input/output pair in the DATA_POINTS tuple, roughly the expansion sketched below.
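
To make that expansion concrete, here is a minimal sketch, using plain @pytest.mark.parametrize, of what TestSquare effectively boils down to (the hand-written ids approximate what idfunc would generate):

```python
import pytest

from utils import SquareFn


# Hand-written equivalent of the cases pytest_generate_tests
# derives from TestSquare.DATA_POINTS.
@pytest.mark.parametrize("input_, output_", [(2, 4), (3, 9)], ids=["2-4", "3-9"])
def test_square(input_, output_):
    assert SquareFn().solve(input_) == output_
```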

```
$ pytest -q
.......                                                  [100%]
7 passed in 0.03s
```

Perfect! The system is extensible: you can easily add (or can you?) more tests for other mathematical functions by defining new classes and implementing the corresponding logic in their solve methods, as sketched below.
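
For instance, supporting a new shape means one more class in each file. A hypothetical TriangleAreaFn (my name, not part of the code above) would slot in like this:

```python
# utils.py (hypothetical addition)
class TriangleAreaFn:
    def solve(self, b, h):
        return b * h / 2


# test_utils.py (hypothetical addition)
class TestTriangleArea(TriangleAreaFn, AreaFnTest):
    DATA_POINTS = (
        ((6, 4), 12),
    )
```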

So, what's wrong with it?

Everything. It looks neat and extensible, but a design like this will almost always shoot you in the foot in a large-scale codebase.

  1. There's a deep flaw in building inheritance hierarchies between tests. The abstraction (LinearFnTest, AreaFnTest) assumes a commonality between tests that won't always hold. Over time, as new requirements and corner cases arise, the misalignment grows, and when you need to unwind the hierarchy or slightly alter the base behaviour you end up in all sorts of sadness. Debugging becomes harder too, because it's unclear whether a failure lies in the test data, the base logic, or the test implementation itself.

  2. Tests become so abstracted that they lose their expressiveness. The intent of a specific test is buried under layers of abstraction, making it hard to figure out what exactly is being tested and why.

  3. Tests become harder to adapt to evolving requirements. As new features or edge cases emerge, extending the parameterized structure becomes cumbersome. Adding a new data point may require changing both the test logic and the shared infrastructure, increasing the risk of regressions in unrelated tests.

  4. Tests parameterized in a highly generalized way lack descriptive identifiers. When a test fails, the report may only show raw values (input_, output_), and you have to work backwards to figure out what the case represents in real-world terms. Compare the flat alternative sketched below.
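
For contrast, here is a minimal sketch of the flat alternative: the same circle and box cases written as plain parametrized functions, with hand-picked ids (the id strings are my own, for illustration). No base classes, no hooks, and nothing shared to unwind:

```python
import pytest

from utils import BoxAreaFn, CircleAreaFn


@pytest.mark.parametrize(
    "r, expected",
    [pytest.param(10, 314, id="radius-10")],
)
def test_circle_area(r, expected):
    assert CircleAreaFn().solve(r) == expected


@pytest.mark.parametrize(
    "h, w, expected",
    [
        pytest.param(3, 5, 15, id="3x5-box"),
        pytest.param(10, 10, 100, id="square-box"),
    ],
)
def test_box_area(h, w, expected):
    assert BoxAreaFn().solve(h, w) == expected
```

A failure now reports as, say, test_box_area[3x5-box] rather than a bundle of positional values, so the intent of the case survives without any shared machinery.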