Introduction
Unit testing is a software testing technique in which individual units or components of an application are tested in isolation. The main purpose is to validate that each unit performs as designed. In data science and analytics, unit testing can be applied to functions, modules, or algorithms to ensure they produce the expected output for a given set of inputs. It helps identify and fix bugs early in the development process, improving the overall reliability and maintainability of the code.
- Reference: For more details, please check out my GitHub repo: Unit testing.
- Note: For the difference between `unittest` and `pytest`, please visit the repo and the readme file in Unittesting and pytest differences.
Unit testing vs integration testing
While unit testing ensures that each unit of code works properly on its own, integration testing ensures that the units work together. Integration tests focus on real-life use cases. They often rely on external data such as databases or web servers. A unit test, on the other hand, only needs data that is created exclusively for the test. It is therefore much easier to implement.

Aspect | Unit testing | Integration testing |
---|---|---|
Scope | | |
Isolation | | |
Purpose | | |
Execution Time | Typically faster to execute as it involves testing smaller units of code. | May take longer to execute due to the need to integrate and test larger portions of the application. |
Dependencies | Dependencies are often mocked or stubbed to isolate the unit under test. | Requires actual dependencies to be present for testing the integrated system. |
Automation | Easily automated, and it's common to run unit tests frequently during development. | Automation is also common but may involve more complex setups and configurations. |
Debugging | Easier to pinpoint issues to specific units or components. | May require more effort to identify the specific cause of failures due to interactions between components. |
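
To make the "Dependencies" row more concrete, here is a minimal sketch of a unit test that replaces an external dependency with a stub from the standard library's `unittest.mock`. The names `average_order_value` and `fetch_orders` are hypothetical and exist only for this illustration.

```python
from unittest.mock import Mock


def average_order_value(client):
    """Return the mean order amount reported by `client` (hypothetical unit)."""
    orders = client.fetch_orders()  # would query a database in production
    if not orders:
        return 0.0
    return sum(order["amount"] for order in orders) / len(orders)


def test_average_order_value_with_stubbed_client():
    # The stub stands in for the real database client, so the test
    # needs no external data and runs in isolation.
    client = Mock()
    client.fetch_orders.return_value = [{"amount": 10.0}, {"amount": 30.0}]

    assert average_order_value(client) == 20.0
    client.fetch_orders.assert_called_once()
```

An integration test of the same function would instead pass a real client connected to a test database.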
Advantages of Unit testing
Here is a non-exhaustive list of the advantages of unit testing that make it a vital asset in the toolbox of a good programmer:
- Time saving: Some very basic errors can become quite difficult to identify during the integration testing phase because of the many layers of code that accumulate. Unit tests detect these errors simply, quickly, and early in the building of the code.
- Easier code changes: If you want to modify your code (e.g. change the regression method), you can easily verify that the function still works as expected by running its unit test.
- Improved code quality: A good approach is to write the unit tests before writing the units themselves. This compels you to think about all the contingencies that the unit might face. Thinking up front about how to code the unit makes it simpler and more robust later on. This approach is known as test-driven development (TDD).
- Aid to the understanding of the code: Developers also use unit tests as explanatory documentation of each part of the code. It is very easy to understand the expected behaviour of a function by reading the associated unit test beforehand.
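
As a rough illustration of the test-first idea, the sketch below writes the tests for a hypothetical `normalize` function before the function itself. The function name and its behaviour on constant input are assumptions made for this example only.

```python
def test_normalize_scales_to_unit_range():
    assert normalize([2, 4, 6]) == [0.0, 0.5, 1.0]


def test_normalize_handles_constant_input():
    # Writing this test first forces a decision about the edge case
    # where max == min (here: return all zeros).
    assert normalize([3, 3, 3]) == [0.0, 0.0, 0.0]


def normalize(values):
    """Rescale values linearly to [0, 1]; written after the tests above."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]
```

The tests also serve as documentation: a reader can see the intended behaviour for the edge case without reading the implementation.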
Weaknesses of unit testing
- It is impossible to test the infinite variety of contingencies that a unit might face. Passing the unit tests without a hitch is therefore not a total guarantee of correct operation.
- By construction, unit tests cannot test the interaction between units.
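
The second weakness can be sketched with two hypothetical functions that each pass their own unit test but disagree on the scale of the value they exchange. All names below are invented for this illustration.

```python
def discount_rate(percent):
    """Return a discount as a percentage, e.g. 50 for 50 %."""
    return percent


def apply_discount(price, rate):
    """Apply a discount given as a fraction, e.g. 0.5 for 50 %."""
    return price * (1 - rate)


def test_discount_rate():
    assert discount_rate(50) == 50


def test_apply_discount():
    assert apply_discount(100, 0.5) == 50


def combined_price(price, percent):
    # Both unit tests pass, yet the two units disagree on the scale
    # of the rate, so the combination is wrong.
    return apply_discount(price, discount_rate(percent))
```

Here `combined_price(100, 50)` returns -4900 rather than 50, a defect that only a test exercising both units together would reveal.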
Automated testing
- What is automated testing?
  Frameworks greatly simplify the automation of unit tests. The developer sets the criteria for the tests to perform, and the framework then runs the tests automatically and provides detailed error reports.
  - unittest: The basic framework for automated testing in Python is `unittest`. It is built into Python, follows the xUnit style, and provides classes and methods for creating and running tests.
  - pytest: `pytest` is a popular third-party testing framework. It simplifies test discovery and execution and is known for its concise syntax and powerful features.
  - nose: Similarly, `nose` is another third-party testing framework that extends `unittest` and provides additional functionality. It is particularly useful for test discovery and running tests in parallel.
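
To give a feel for the xUnit style mentioned above, here is a minimal sketch of a `unittest` test case. The `add` function is a placeholder defined inline so that the block is self-contained.

```python
import unittest


def add(a, b):
    return a + b


class TestAdd(unittest.TestCase):
    # Each method whose name starts with "test" is collected as one test.
    def test_add_positive_numbers(self):
        self.assertEqual(add(2, 3), 5)

    def test_add_negative_numbers(self):
        self.assertEqual(add(-2, -3), -5)
```

Saved in a file, such a test case can be run with `python -m unittest`; pytest is also able to collect and run `unittest.TestCase` classes.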
Example-1
- Create a `calculator.py` file containing an `add` function:

      def add(a, b):
          return a + b

- Create a `test_calculator.py` file and add the code:

      import pytest
      from calculator import add

      def test_add_positive_numbers():
          result = add(2, 3)
          assert result == 5

      def test_add_negative_numbers():
          result = add(-2, -3)
          assert result == -5

      def test_add_mixed_numbers():
          result = add(1, -5)
          assert result == -4

  This file contains test functions prefixed with `test_`. With the `pytest` framework, checks are written as plain `assert` statements instead of the built-in `unittest` assertion methods.
- Now run the tests using:

      pytest test_calculator.py

  Pytest will discover and run the tests, providing detailed output.

If an assertion is wrong, for example `assert add(2, 3) == 6`, pytest reports an error.
Example-2: Let's create a class `Wallet` that has a method for adding money, `add_cash`, and a method for withdrawing money, `spend_cash`.
In a `wallet.py` file, create a `Wallet` class that:
- Accepts an initial contribution of money and stores it in the `balance` attribute (= 0 if the initial contribution is not specified).
- Has a method for adding money, `add_cash`.
- Has a method for withdrawing money, `spend_cash`. This method first checks that the balance is sufficient and raises an `InsufficientAmount` exception if it is not.
Solution: The `wallet.py` file then contains:

    class Wallet(object):

        def __init__(self, initial_amount=0):
            self.balance = initial_amount

        def spend_cash(self, amount):
            # Refuse to spend more than the current balance
            if self.balance < amount:
                raise InsufficientAmount('Not enough available to spend {}'.format(amount))
            self.balance -= amount

        def add_cash(self, amount):
            self.balance += amount


    class InsufficientAmount(Exception):
        pass

In another Python file, `wallet_test.py`, we will now write our unit tests. To do this we need to import the classes we want to test as well as the `pytest` module (to test the `InsufficientAmount` exception).
Now write 5 unit tests that check different properties:
- a newly created wallet has a balance of 0 by default.
- a newly created wallet with an initial balance of 100 has a balance of 100.
- a wallet created with an initial balance of 10 to which 90 is added has a balance of 100.
- a wallet created with an initial balance of 20 from which 10 is removed has a balance of 10.
- a wallet that tries to spend more than its balance raises an `InsufficientAmount` exception.

`wallet_test.py` contains:

    from wallet import Wallet, InsufficientAmount
    import pytest

    def test_default_initial_amount():
        wallet = Wallet()
        assert wallet.balance == 0

    def test_setting_initial_amount():
        wallet = Wallet(100)
        assert wallet.balance == 100

    def test_wallet_add_cash():
        wallet = Wallet(10)
        wallet.add_cash(90)
        assert wallet.balance == 100

    def test_wallet_spend_cash():
        wallet = Wallet(20)
        wallet.spend_cash(10)
        assert wallet.balance == 10

    def test_wallet_spend_cash_raises_exception_on_insufficient_amount():
        wallet = Wallet()
        # The test fails if spend_cash does not raise InsufficientAmount
        with pytest.raises(InsufficientAmount):
            wallet.spend_cash(100)

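
One further property that may be worth checking is that a refused withdrawal leaves the balance untouched. The sketch below is an addition to the five tests listed above, not part of them. It repeats the `Wallet` class so that the block is self-contained and uses a plain `try`/`except`/`else` so that it runs without pytest; in `wallet_test.py`, `pytest.raises` would be the more natural choice.

```python
class InsufficientAmount(Exception):
    pass


class Wallet:
    def __init__(self, initial_amount=0):
        self.balance = initial_amount

    def spend_cash(self, amount):
        if self.balance < amount:
            raise InsufficientAmount('Not enough available to spend {}'.format(amount))
        self.balance -= amount

    def add_cash(self, amount):
        self.balance += amount


def test_failed_spend_leaves_balance_unchanged():
    wallet = Wallet(20)
    try:
        wallet.spend_cash(50)
    except InsufficientAmount:
        pass  # expected: the withdrawal is refused
    else:
        raise AssertionError("expected InsufficientAmount")
    # The refused withdrawal must not have changed the balance.
    assert wallet.balance == 20
```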
Example-4: We can test the sum of two numbers in the following three ways:
- Without Parameterization or Fixture
- Only With Fixture
- Only With Parameterization
- Without Any Parameterization or Fixture:
  In this approach, each test case is written as an individual function, and there is no use of parameterization or fixtures. Each test explicitly defines its input values and expected outcomes. This approach is straightforward and may be suitable for simpler test cases or scenarios where explicitness is preferred. However, it can lead to code duplication if many test cases share a similar structure.

      # test_example.py
      def add(a, b):
          return a + b

      def test_addition_case1():
          result = add(1, 2)
          assert result == 3

      def test_addition_case2():
          result = add(0, 0)
          assert result == 0

      def test_addition_case3():
          result = add(-1, 1)
          assert result == 0

      def test_addition_case4():
          result = add(10, -5)
          assert result == 5

- Only With Fixture:
  In this approach, a fixture is used to provide parameterized input data for a test function. The `@pytest.fixture` decorator defines a fixture that can be reused across multiple test functions. This enhances code modularity and reusability by separating the test setup from the test logic.

      import pytest

      # add() is the function defined in test_example.py above

      @pytest.fixture(params=[
          (1, 2, 3),
          (0, 0, 0),
          (-1, 1, 0),
          (10, -5, 5),
      ])
      def input_data(request):
          # request.param holds one tuple from params per test run
          return request.param

      def test_addition(input_data):
          input_a, input_b, expected_output = input_data
          result = add(input_a, input_b)
          assert result == expected_output

- Only With Parameterization:
  In this approach, the `@pytest.mark.parametrize` decorator is used to parametrize a single test function. The decorator allows you to run the same test function with different sets of input parameters. This results in more concise code, especially when dealing with similar test cases, and helps reduce code duplication.

      import pytest

      # add() is the function defined in test_example.py above

      @pytest.mark.parametrize("input_a, input_b, expected_output", [
          (1, 2, 3),
          (0, 0, 0),
          (-1, 1, 0),
          (10, -5, 5),
      ])
      def test_addition(input_a, input_b, expected_output):
          result = add(input_a, input_b)
          assert result == expected_output

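
For comparison, the standard library's `unittest` offers a similar way to run one test body over several inputs through `subTest`. The following is a minimal sketch using the same cases as above; the class name `TestAddition` is chosen only for this illustration.

```python
import unittest


def add(a, b):
    return a + b


class TestAddition(unittest.TestCase):
    def test_addition(self):
        cases = [(1, 2, 3), (0, 0, 0), (-1, 1, 0), (10, -5, 5)]
        for input_a, input_b, expected_output in cases:
            # Each subTest is reported separately if it fails,
            # and the loop continues with the remaining cases.
            with self.subTest(input_a=input_a, input_b=input_b):
                self.assertEqual(add(input_a, input_b), expected_output)
```

Unlike `@pytest.mark.parametrize`, this counts as a single test in the report, with failing sub-cases listed individually.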
Side effects:
Unit tests can have both positive and negative side effects, depending on how they are implemented and maintained. Here are some of the common side effects:
- Positive Side Effects:
  - Improved Code Quality:
    - Writing unit tests encourages developers to create modular and well-structured code.
    - Forces developers to think about how to make functions and modules easily testable.
  - Early Bug Detection:
    - Identifying and fixing bugs early in the development process is one of the primary benefits of unit testing.
    - Helps catch issues before they escalate into larger problems.
  - Regression Prevention:
    - Unit tests act as a safety net to ensure that new code changes do not break existing functionality (regression testing).
  - Documentation:
    - Unit tests serve as living documentation, showcasing the expected behavior of the code.
    - New developers can use tests to understand the intended functionality of different parts of the codebase.
  - Facilitates Refactoring:
    - Developers can confidently refactor code, knowing that if the tests pass, the changes haven't introduced regressions.
  - Supports Continuous Integration:
    - Unit tests are crucial for setting up continuous integration (CI) pipelines, ensuring that tests are automatically run whenever changes are made.
- Negative Side Effects:
  - Time-Consuming:
    - Writing and maintaining unit tests can be time-consuming, especially for complex systems.
    - Balancing test coverage with development speed is crucial.
  - False Sense of Security:
    - High test coverage does not guarantee bug-free code. It's possible to have well-tested code that still has logical errors or edge cases that are not covered.
  - Maintenance Overhead:
    - As the codebase evolves, unit tests may need to be updated or rewritten to accommodate changes.
    - Frequent changes in requirements might result in a constant need for test updates.
  - Overemphasis on Code Coverage:
    - Focusing solely on achieving high code coverage may lead to tests that don't adequately cover critical scenarios.
    - Quality of tests is more important than sheer quantity.
  - Dependency on Implementation Details:
    - Tests that are too tightly coupled with the implementation details of the code may become fragile and break easily with minor changes.
  - Resistance to Change:
    - In some cases, developers may resist making changes to the code due to concerns about breaking existing tests.
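
The "Dependency on Implementation Details" point can be sketched as follows. `Stats` is a hypothetical class invented for this example: the first test inspects its private storage, while the second checks only its public behaviour.

```python
class Stats:
    """Hypothetical class that computes the mean of the values added."""

    def __init__(self):
        self._values = []  # internal detail, free to change

    def add(self, x):
        self._values.append(x)

    def mean(self):
        return sum(self._values) / len(self._values)


def test_fragile_checks_internal_storage():
    stats = Stats()
    stats.add(2)
    stats.add(4)
    # Breaks if the class later keeps a running sum instead of a list.
    assert stats._values == [2, 4]


def test_robust_checks_public_behaviour():
    stats = Stats()
    stats.add(2)
    stats.add(4)
    # Survives any refactoring that keeps mean() correct.
    assert stats.mean() == 3
```

Both tests pass today, but only the second one would keep passing after a harmless change of the internal representation.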
Error messages
Error messages and their handling can vary based on the nature of the functions being tested. Here are some common types of errors and how they might be handled in unit tests:
- ValueError: Invalid Input:

      def divide(a, b):
          if b == 0:
              raise ValueError("Cannot divide by zero.")
          return a / b

  Handling:

      import pytest

      def test_divide():
          assert divide(10, 2) == 5
          # The test fails if no ValueError is raised
          with pytest.raises(ValueError) as excinfo:
              divide(5, 0)
          assert str(excinfo.value) == "Cannot divide by zero."

- TypeError: Incorrect Argument Type:

      def calculate_square(n):
          if not isinstance(n, (int, float)):
              raise TypeError("Input must be a number.")
          return n ** 2

  Handling:

      import pytest

      def test_calculate_square():
          assert calculate_square(3) == 9
          with pytest.raises(TypeError) as excinfo:
              calculate_square("four")
          assert str(excinfo.value) == "Input must be a number."

- AssertionError: Unexpected Output:

      def add(a, b):
          return a * b  # Incorrect implementation

  Handling:

      import pytest

      def test_add():
          # add(2, 3) returns 6, so the check against 5 raises AssertionError
          with pytest.raises(AssertionError):
              assert add(2, 3) == 5

- IndexError: Out of Range:

      def get_element_by_index(lst, index):
          if index < 0 or index >= len(lst):
              raise IndexError("Index out of range.")
          return lst[index]

  Handling:

      import pytest

      def test_get_element_by_index():
          my_list = [1, 2, 3]
          assert get_element_by_index(my_list, 1) == 2
          with pytest.raises(IndexError) as excinfo:
              get_element_by_index(my_list, 5)
          assert str(excinfo.value) == "Index out of range."

- CustomError: Specific Application Error:

      class CustomError(Exception):
          pass

      def custom_function():
          raise CustomError("Something went wrong.")

  Handling:

      import pytest

      def test_custom_function():
          with pytest.raises(CustomError) as excinfo:
              custom_function()
          assert str(excinfo.value) == "Something went wrong."

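
One pitfall when checking exceptions with a bare `try`/`except` is that the test passes silently if the exception is never raised. The sketch below illustrates this with a deliberately buggy `divide`; the names `weak_test_divide` and `strict_test_divide` are hypothetical and intentionally not prefixed with `test_`, so that pytest does not collect them.

```python
def divide(a, b):
    # Deliberate bug for the illustration: no ValueError is raised.
    return a / b if b != 0 else float("inf")


def weak_test_divide():
    try:
        divide(5, 0)
    except ValueError as e:
        assert str(e) == "Cannot divide by zero."
    # No exception was raised, so nothing was checked and the test "passes".


def strict_test_divide():
    try:
        divide(5, 0)
    except ValueError as e:
        assert str(e) == "Cannot divide by zero."
    else:
        # Reached only when no exception was raised: fail explicitly.
        raise AssertionError("divide(5, 0) did not raise ValueError")
```

With pytest, `pytest.raises` gives the strict behaviour directly, since it fails the test when the expected exception does not occur.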