Data-Engineering

Unit Testing: Software Development

Contents

Introduction
Unit testing
References

Introduction

Unit testing is a software testing technique where individual units or components of a software application are tested in isolation. The main purpose is to validate that each unit of the software performs as designed. In data science and analytics, unit testing can be applied to functions, modules, or algorithms to ensure they produce the expected output for a given set of inputs. It helps identify and fix bugs early in the development process, improving the overall reliability and maintainability of the code.

Reference: For more details, please check out my GitHub repo: Unit testing.
Note: For the differences between unittest and pytest, please see the README file in the repo: Unittesting and pytest differences.

Unit testing vs integration testing

While unit testing checks that each unit of code works correctly on its own, integration testing checks that the units work together. Integration tests focus on real-life use cases and often rely on external resources such as databases or web servers. A unit test, on the other hand, only needs data created specifically for the test, which makes it much easier to implement.

|  | Unit testing | Integration testing |
|---|---|---|
| Scope | Focuses on testing individual units or components of a software application in isolation. Units can be functions, methods, or small modules. | Involves testing the interactions and interfaces between multiple components or systems. Ensures that integrated components work together as expected. |
| Isolation | Performed in isolation from the rest of the application. Dependencies are often replaced with mocks or stubs to focus on the specific unit being tested. | Requires multiple components to be integrated and tested as a group. Tests how well different components work together. |
| Purpose | Identifies and fixes bugs in individual units early in the development process. Helps maintain code quality and allows for easier refactoring. | Verifies that integrated components function correctly as a whole. Detects issues arising from the interactions between components. |
| Execution time | Typically faster to execute, as it involves testing smaller units of code. | May take longer to execute due to the need to integrate and test larger portions of the application. |
| Dependencies | Dependencies are often mocked or stubbed to isolate the unit under test. | Requires actual dependencies to be present for testing the integrated system. |
| Automation | Easily automated; it is common to run unit tests frequently during development. | Automation is also common but may involve more complex setups and configurations. |
| Debugging | Easier to pinpoint issues to specific units or components. | May require more effort to identify the specific cause of failures due to interactions between components. |
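To make the "Dependencies" row concrete, here is a minimal sketch of replacing a dependency with a mock using Python's built-in unittest.mock module. The function average_order_value and the repository object with a load_orders() method are hypothetical, standing in for a unit that would normally read its data from a database.

```python
from unittest.mock import Mock


def average_order_value(repository):
    """Hypothetical unit under test: averages the 'amount' of orders loaded from a repository."""
    orders = repository.load_orders()
    if not orders:
        return 0.0
    return sum(order["amount"] for order in orders) / len(orders)


def test_average_order_value_uses_mocked_repository():
    # A Mock stands in for the real (hypothetical) database-backed repository,
    # so the test needs no database and exercises only this unit.
    repository = Mock()
    repository.load_orders.return_value = [{"amount": 10.0}, {"amount": 30.0}]

    assert average_order_value(repository) == 20.0
    # The mock also records how it was used.
    repository.load_orders.assert_called_once_with()


def test_average_order_value_with_no_orders():
    repository = Mock()
    repository.load_orders.return_value = []
    assert average_order_value(repository) == 0.0
```

An integration test of the same function would instead pass a real repository connected to a test database.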

Advantages of Unit testing

Here is a non-exhaustive list of the advantages of unit testing that make it a vital asset in the toolbox of a good programmer:

Time saving: Some basic errors can become quite difficult to track down during the integration testing phase because of the many layers of code that accumulate. Unit tests detect these errors simply, quickly, and early in the development of the code.
Easier code changes: If you modify your code (e.g. change the regression method), you can quickly verify that the function still works as expected by rerunning its unit test.
Improved code quality: A good approach is to write the unit tests before writing the units themselves. This forces you to think about all the cases the unit might face, which tends to make the unit simpler and more robust. This approach is known as test-driven development (TDD).
Better understanding of the code: Developers also use unit tests as explanatory documentation for each part of the code. Reading the associated unit test is often the quickest way to understand the expected behaviour of a function.
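As a small illustration of the test-first approach, the sketch below writes the test before the implementation. The function clean_column_name and its expected behaviour are hypothetical, chosen only to show how the test also documents what the function should do.

```python
# Step 1 (TDD): write the test first. It states the expected behaviour of
# clean_column_name, a hypothetical helper, before the helper is written.
def test_clean_column_name():
    assert clean_column_name("  Total Sales ") == "total_sales"
    assert clean_column_name("Order-ID") == "order_id"


# Step 2: write the simplest implementation that makes the test pass.
def clean_column_name(name):
    """Trim whitespace, lower-case, and replace spaces and hyphens with underscores."""
    return name.strip().lower().replace(" ", "_").replace("-", "_")
```

Running the test before step 2 fails with a NameError, since clean_column_name does not exist yet; after step 2 it passes.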

Weaknesses of Unit testing

However, it is impossible to test the infinite variety of situations a unit might face, so passing the unit tests is not a complete guarantee that the code works correctly.
Moreover, by construction, unit tests cannot test the interaction between units.
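A minimal, hypothetical sketch of this limitation: two units that each pass their own tests but disagree on the unit of measure when combined. The names price_in_cents and format_price are invented for the example.

```python
def price_in_cents(item):
    """Unit 1: returns the price as an integer number of cents."""
    return item["price_cents"]


def format_price(dollars):
    """Unit 2: formats a price given in dollars."""
    return f"${dollars:.2f}"


def test_price_in_cents():
    assert price_in_cents({"price_cents": 250}) == 250


def test_format_price():
    assert format_price(2.5) == "$2.50"


# Both unit tests pass, yet chaining the units gives a wrong result:
# format_price(price_in_cents({"price_cents": 250})) returns "$250.00"
# instead of "$2.50". Only a test of the combination (an integration test)
# would catch this mismatch.
```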

Automated testing

What is automated testing? Frameworks make it much easier to automate unit tests. The developer defines the criteria for the tests they want to perform, and the framework runs the tests automatically and provides detailed error reports. The main options in Python are:
- unittest: The basic framework for automated testing in Python is unittest, which is built into the standard library. It follows the xUnit style and provides classes and methods for creating and running tests.
- pytest: pytest is another popular, third-party testing framework. It simplifies test discovery and execution and is known for its concise syntax and powerful features (fixtures, parametrization, plugins).
- nose: nose is another third-party testing framework that extends unittest with additional functionality, notably test discovery and running tests in parallel. Note that nose is no longer maintained; nose2 is its successor.
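For comparison with the pytest tests in Example-1, here is a sketch of the same kind of test in the xUnit style used by unittest: tests are methods of a unittest.TestCase subclass and use assertion methods such as assertEqual. The add function is inlined so the file runs on its own, and the class name TestAdd is arbitrary.

```python
import unittest


def add(a, b):
    # Same add() as in calculator.py (Example-1), inlined for a self-contained file.
    return a + b


class TestAdd(unittest.TestCase):
    # xUnit style: each test is a method whose name starts with "test".
    def test_add_positive_numbers(self):
        self.assertEqual(add(2, 3), 5)

    def test_add_negative_numbers(self):
        self.assertEqual(add(-2, -3), -5)

    def test_add_mixed_numbers(self):
        self.assertEqual(add(1, -5), -4)
```

Saved as, say, test_calculator_unittest.py, this can be run with `python -m unittest` (discovery looks for files matching test*.py by default). pytest also collects unittest.TestCase classes, so `pytest test_calculator_unittest.py` works too.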

Example-1

Create a calculator.py file containing an add function:


            def add(a, b):
                return a + b

Create a test_calculator.py file and add the code:


              import pytest
              from calculator import add

              def test_add_positive_numbers():
                  result = add(2, 3)
                  assert result == 5

              def test_add_negative_numbers():
                  result = add(-2, -3)
                  assert result == -5

              def test_add_mixed_numbers():
                  result = add(1, -5)
                  assert result == -4

This file contains test functions prefixed with test_, which is how pytest discovers them. Instead of the assertion methods provided by unittest (such as assertEqual), pytest uses plain assert statements.

Now run the tests using:
```
pytest test_calculator.py
```
Pytest will discover and run the tests, providing detailed output.

If we change one of the tests so that its expectation is wrong, pytest reports a failure and tells us which test does not pass. For example, if we change the assertion in test_add_positive_numbers to assert result == 6, the green dot for that test becomes a red F because the check no longer holds. The error returned is an AssertionError, and pytest shows exactly where it occurred.

Example-2: Let's create a Wallet class with a method for adding money (add_cash) and a method for withdrawing money (spend_cash). In a wallet.py file, create a Wallet class that:

Accepts an initial amount of money and stores it in the balance attribute (0 if no initial amount is specified).
Has a method for adding money, add_cash.
Has a method for withdrawing money, spend_cash. This method first checks that the balance is sufficient and raises an InsufficientAmount exception if it is not.

In another Python file, wallet_test.py, we will write our unit tests. To do this we need to import the class and the exception we want to test, as well as the pytest module (to test the InsufficientAmount exception).
Solution: The wallet.py file contains:


          class Wallet(object):

              def __init__(self, initial_amount=0):
                  self.balance = initial_amount

              def spend_cash(self, amount):
                  # Refuse the withdrawal if the balance is too low
                  if self.balance < amount:
                      raise InsufficientAmount('Not enough available to spend {}'.format(amount))
                  self.balance -= amount

              def add_cash(self, amount):
                  self.balance += amount


          class InsufficientAmount(Exception):
              pass

Our tests will check the following behaviour:
A newly created wallet has a balance of 0 by default.
A newly created wallet with an initial balance of 100 has a balance of 100.
A wallet created with an initial balance of 10 to which 90 is added has a balance of 100.
A wallet created with an initial balance of 20 from which 10 is removed has a balance of 10.
A wallet that tries to spend more than its balance raises an InsufficientAmount exception.

The wallet_test.py file contains:


          from wallet import Wallet, InsufficientAmount
          import pytest
          def test_default_initial_amount():
              wallet = Wallet()
              assert wallet.balance == 0
      
          def test_setting_initial_amount():
              wallet = Wallet(100)
              assert wallet.balance == 100
          
          def test_wallet_add_cash():
              wallet = Wallet(10)
              wallet.add_cash(90)
              assert wallet.balance == 100
          
          def test_wallet_spend_cash():
              wallet = Wallet(20)
              wallet.spend_cash(10)
              assert wallet.balance == 10
          
          def test_wallet_spend_cash_raises_exception_on_insufficient_amount():
              wallet = Wallet()
              with pytest.raises(InsufficientAmount):
                  wallet.spend_cash(100)

Example-3: There are three ways to write tests for a function that computes the sum of two numbers:

Without Parameterization or Fixture
Only With Parameterization
Only With Fixture

Each approach suits a different purpose.

Without Any Parameterization or Fixture: In this approach, each test case is written as an individual function, and there is no use of parameterization or fixtures. Each test explicitly defines its input values and expected outcomes. This approach is straightforward and may be suitable for simpler test cases or scenarios where explicitness is preferred. However, it can lead to code duplication if many test cases share a similar structure.


              # test_example.py

              def add(a, b):
                  return a + b
              
              def test_addition_case1():
                  result = add(1, 2)
                  assert result == 3
              
              def test_addition_case2():
                  result = add(0, 0)
                  assert result == 0
              
              def test_addition_case3():
                  result = add(-1, 1)
                  assert result == 0
              
              def test_addition_case4():
                  result = add(10, -5)
                  assert result == 5

Only With Fixture: In this approach, a fixture provides parameterized input data to a test function. The @pytest.fixture decorator defines a fixture that can be reused across multiple test functions; because of the params argument, every test that requests input_data runs once for each tuple, which the fixture reads from request.param. This enhances code modularity and reusability by separating the test setup from the test logic.


              import pytest


              def add(a, b):
                  # Same function as in test_example.py
                  return a + b


              @pytest.fixture(params=[
                  (1, 2, 3),
                  (0, 0, 0),
                  (-1, 1, 0),
                  (10, -5, 5),
              ])
              def input_data(request):
                  # request.param holds the current tuple from params
                  return request.param

              def test_addition(input_data):
                  input_a, input_b, expected_output = input_data
                  result = add(input_a, input_b)
                  assert result == expected_output
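Fixtures are more commonly used for setup code shared by several tests than for input data. A minimal sketch, assuming a hypothetical fixture named sample_numbers: each test that lists it as an argument receives a freshly built value.

```python
import pytest


@pytest.fixture
def sample_numbers():
    # Setup shared by several tests; with the default function scope,
    # pytest calls this once per test, so each test gets a fresh list.
    return [3, 1, 2]


def test_sum(sample_numbers):
    assert sum(sample_numbers) == 6


def test_sorted(sample_numbers):
    assert sorted(sample_numbers) == [1, 2, 3]
```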

Only With Parameterization: In this approach, the @pytest.mark.parametrize decorator is used to parametrize a single test function. The decorator allows you to run the same test function with different sets of input parameters. This results in more concise code, especially when dealing with similar test cases, and helps reduce code duplication.


              import pytest


              def add(a, b):
                  # Same function as in test_example.py
                  return a + b


              @pytest.mark.parametrize("input_a, input_b, expected_output", [
                  (1, 2, 3),
                  (0, 0, 0),
                  (-1, 1, 0),
                  (10, -5, 5),
              ])
              def test_addition(input_a, input_b, expected_output):
                  result = add(input_a, input_b)
                  assert result == expected_output
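When many cases are parameterized, the optional ids argument of @pytest.mark.parametrize gives each case a readable name in the test report. A sketch reusing the same cases; the id strings are arbitrary.

```python
import pytest


def add(a, b):
    return a + b


@pytest.mark.parametrize(
    "input_a, input_b, expected_output",
    [(1, 2, 3), (0, 0, 0), (-1, 1, 0), (10, -5, 5)],
    # One label per tuple, used in the test IDs shown by pytest -v
    ids=["positive", "zeros", "opposites", "mixed-signs"],
)
def test_addition(input_a, input_b, expected_output):
    assert add(input_a, input_b) == expected_output
```

With `pytest -v`, the cases are then reported as test_addition[positive], test_addition[zeros], and so on, instead of auto-generated IDs such as test_addition[1-2-3].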

Side effects of unit testing

Unit tests can have both positive and negative side effects, depending on how they are implemented and maintained. Here are some of the common side effects:

Positive Side Effects:
- Improved Code Quality:
  - Writing unit tests encourages developers to create modular and well-structured code.
  - Forces developers to think about how to make functions and modules easily testable.
- Early Bug Detection:
  - Identifying and fixing bugs early in the development process is one of the primary benefits of unit testing.
  - Helps catch issues before they escalate into larger problems.
- Regression Prevention:
  - Unit tests act as a safety net to ensure that new code changes do not break existing functionality (regression testing).
- Documentation:
  - Unit tests serve as living documentation, showcasing the expected behavior of the code.
  - New developers can use tests to understand the intended functionality of different parts of the codebase.
- Facilitates Refactoring:
  - Developers can confidently refactor code, knowing that if the tests pass, the changes haven't introduced regressions.
- Supports Continuous Integration:
  - Unit tests are crucial for setting up continuous integration (CI) pipelines, ensuring that tests are automatically run whenever changes are made.
Negative Side Effects:
- Time-Consuming:
  - Writing and maintaining unit tests can be time-consuming, especially for complex systems.
  - Balancing test coverage with development speed is crucial.
- False Sense of Security:
  - High test coverage does not guarantee bug-free code. It's possible to have well-tested code that still has logical errors or edge cases that are not covered.
- Maintenance Overhead:
  - As the codebase evolves, unit tests may need to be updated or rewritten to accommodate changes.
  - Frequent changes in requirements might result in a constant need for test updates.
- Overemphasis on Code Coverage:
  - Focusing solely on achieving high code coverage may lead to tests that don't adequately cover critical scenarios.
  - Quality of tests is more important than sheer quantity.
- Dependency on Implementation Details:
  - Tests that are too tightly coupled with the implementation details of the code may become fragile and break easily with minor changes.
- Resistance to Change:
  - In some cases, developers may resist making changes to the code due to concerns about breaking existing tests.

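Side effects in the programming sense (such as writing files, modifying global variables, or calling external services) are a separate concern: a unit with many side effects is hard to unit test in isolation. Writing unit tests is therefore a good way to check that your units respect the Single Responsibility Principle.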
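One common way to keep side effects from getting in the way of unit tests is to move the logic into a pure function and keep the I/O in a thin wrapper. A sketch with hypothetical names (summarize_sales, summarize_sales_file), assuming a JSON input file holding a list of rows with an "amount" field:

```python
import json


def summarize_sales(rows):
    """Pure function: no I/O, and the same input always gives the same output."""
    total = sum(row["amount"] for row in rows)
    return {"count": len(rows), "total": total}


def summarize_sales_file(input_path, output_path):
    """Thin wrapper: all side effects (reading and writing files) live here."""
    with open(input_path) as f:
        rows = json.load(f)
    with open(output_path, "w") as f:
        json.dump(summarize_sales(rows), f)


def test_summarize_sales():
    # Only the pure part needs a unit test; no files are touched.
    rows = [{"amount": 10}, {"amount": 5}]
    assert summarize_sales(rows) == {"count": 2, "total": 15}
```

The wrapper is then small enough to be covered by an integration test that uses real files.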

Error messages

Error messages and their handling vary with the nature of the functions being tested. Here are some common exception types and how they can be checked in unit tests:

ValueError: Invalid Input:


            def divide(a, b):
                if b == 0:
                    raise ValueError("Cannot divide by zero.")
                return a / b

Handling:


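            def test_divide():
                assert divide(10, 2) == 5
                try:
                    divide(5, 0)
                except ValueError as e:
                    assert str(e) == "Cannot divide by zero."
                else:
                    # Without this branch the test would pass silently
                    # if no exception were raised.
                    raise AssertionError("ValueError was not raised")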

TypeError: Incorrect Argument Type:


              def calculate_square(n):
                  if not isinstance(n, (int, float)):
                      raise TypeError("Input must be a number.")
                  return n ** 2


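Handling:


              def test_calculate_square():
                  assert calculate_square(3) == 9
                  try:
                      calculate_square("four")
                  except TypeError as e:
                      assert str(e) == "Input must be a number."
                  else:
                      # Fail explicitly if no exception was raised
                      raise AssertionError("TypeError was not raised")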

AssertionError: Unexpected Output:


              def add(a, b):
                  return a * b  # Incorrect implementation: multiplies instead of adding

Handling:


              def test_add():
                  # With the incorrect implementation, add(2, 3) returns 6, so this
                  # assertion fails. pytest reports it as an AssertionError and shows
                  # the compared values (roughly "assert 6 == 5").
                  assert add(2, 3) == 5

IndexError: Out of Range:


              def get_element_by_index(lst, index):
                  if index < 0 or index >= len(lst):
                      raise IndexError("Index out of range.")
                  return lst[index]

Handling:


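              def test_get_element_by_index():
                  my_list = [1, 2, 3]
                  assert get_element_by_index(my_list, 1) == 2
                  try:
                      get_element_by_index(my_list, 5)
                  except IndexError as e:
                      assert str(e) == "Index out of range."
                  else:
                      # Fail explicitly if no exception was raised
                      raise AssertionError("IndexError was not raised")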

CustomError: Specific Application Error:


              class CustomError(Exception):
                  pass


              def custom_function():
                  raise CustomError("Something went wrong.")

Handling:


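              def test_custom_function():
                  try:
                      custom_function()
                  except CustomError as e:
                      assert str(e) == "Something went wrong."
                  else:
                      # Fail explicitly if no exception was raised
                      raise AssertionError("CustomError was not raised")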
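With the try/except pattern, a test only fails when no exception is raised if an else branch (or a similar check) is added. pytest provides pytest.raises, which fails the test automatically in that case; its match argument is a regular expression searched for in the string form of the exception. A sketch using the divide function from the ValueError example:

```python
import pytest


def divide(a, b):
    if b == 0:
        raise ValueError("Cannot divide by zero.")
    return a / b


def test_divide_by_zero():
    # Fails automatically if no ValueError is raised. match= is a regular
    # expression searched in str(exception), so the "." is escaped.
    with pytest.raises(ValueError, match=r"Cannot divide by zero\."):
        divide(5, 0)


def test_divide_by_zero_inspect_exception():
    # The context manager also exposes the exception for further checks.
    with pytest.raises(ValueError) as exc_info:
        divide(5, 0)
    assert str(exc_info.value) == "Cannot divide by zero."
```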

References

Unit testing

Some other interesting things to know:

Visit my website on For Data, Big Data, Data-modeling, Datawarehouse, SQL, cloud-compute.
Visit my website on Data engineering