How a Generalized Validation Testing Approach Improves Efficiency, Boosts Outcomes and Streamlines Debugging

In two recent blog posts from the CrowdStrike Software Development Engineers in Test (SDET) team, we explored how end-to-end validation testing and modular testing design could increase the speed and accuracy of the testing lifecycle. 

In this latest post, we conclude our SDET series with a deep dive on how our generalized validation testing component improves efficiency, enhances product functionality and streamlines troubleshooting.

Generalized Validation Testing as a Best Practice

In a traditional approach, validation testing relies on plain “assert” statements. This can be tedious and inefficient: the team has to write much of the code by hand and create new assert validators for each specific use case, which is both time-consuming and error-prone.
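
For illustration, a hand-written validator in this traditional style might look like the sketch below. The Host structure and its fields are hypothetical and only serve as an example; the point is that every new structure needs its own function of this kind.

```python
from typing import List, NamedTuple, Optional


class Host(NamedTuple):
    # Hypothetical structure, used only for illustration
    host_id: str
    platform: str
    tags: Optional[List[str]]


def validate_host(host: Host, expected_platform: str, expected_tags: List[str]) -> None:
    # Each check is written by hand and has to be repeated for every new structure
    assert isinstance(host.host_id, str), "host_id must be a string"
    assert host.host_id, "host_id must not be empty"
    assert host.platform == expected_platform, f"unexpected platform {host.platform}"
    # Order-insensitive comparison that treats None as an empty list
    assert sorted(host.tags or []) == sorted(expected_tags), f"unexpected tags {host.tags}"


validate_host(Host("abc123", "Linux", ["b", "a"]), "Linux", ["a", "b"])
```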

To make this process more efficient, our team developed a generalized way of validating products that limits the amount of code we need to write manually and automates the validation itself. In short, this approach gives our team a flexible, powerful standard model that can validate most Python data structures by comparing the data under test against a similar data structure containing the expected criteria.

Our approach consists of two main components: 

  1. Hamcrest, a framework for writing matcher objects (used here through its Python port, PyHamcrest), which provides a comprehensive set of built-in matchers that serve as the foundation for every testing need.
  2. A custom-built validation algorithm, or validation map, that validates the data under test against the expected data.

The Hamcrest Advantage

Hamcrest provides a wide range of predefined matcher objects, such as object matchers (equal_to, instance_of), number matchers (greater_than), text matchers (contains_string), sequence matchers (contains_inanyorder) and dictionary matchers (has_entries). assert_that is the Hamcrest function for executing a test assertion. See the PyHamcrest documentation for all available matchers and their definitions.

For example, assert_that([1,2,3], only_contains(1,2,3,4,5)) tests that a given sequence [1,2,3] only contains elements from a predefined set (1,2,3,4,5).

Another major advantage of using hamcrest is that matchers can be composed of multiple matchers for a greater level of flexibility. For example, assert_that([1,2,3], only_contains(all_of(less_than(10), greater_than(0)))) will check that all elements of the given sequence are greater than 0 and less than 10.

The following image demonstrates how a validation map is used to validate certain data structures. Inputs to this validation map include:

  • Actual data: Data to be tested. This can include the response data to a specific request from a service endpoint, serialized as data structures (e.g. Python’s namedtuples).
  • Expected data: A sequence of serialized data structures (as it can be built from multiple data sources) containing the expected values related to the data under test. The expected data values can be either hamcrest matchers or exact values.

In the below implementation, the structure under test, actual data, and the expected data are converted into dictionaries. Keys are flattened if they contain nested structures.

from typing import Dict, List, NamedTuple, Optional, Sequence, Tuple

from hamcrest import any_of, contains_inanyorder, empty, equal_to, is_


def create_validation_map_from_namedtuple(actual: NamedTuple, expected: Sequence[NamedTuple],
                                          augmented_expected_dict: Optional[Dict] = None) \
        -> Tuple[Dict, Dict, List]:
    """
    Create a pyhamcrest validation map to be used for has_entries({'foo':equal_to(1), 'bar':equal_to(2)}) assert_that
    on actual data based on expected structures
    :param actual: structure to be validated
    :param expected: sequence of structures that contain validation data
    :param augmented_expected_dict: override dict for certain keys from validation data structs. Keys need to use
    the flattened path
    :return: actual_dict, expected_dict, missing_keys
    """
    # namedtuple_to_flatten_dict is a helper (not shown here) that flattens nested structures into a single-level dict
    actual_dict = namedtuple_to_flatten_dict(actual)
    expected_dict = {}
    for struct in expected:
        if struct is not None:
            expected_dict.update(
                {k: v for k, v in namedtuple_to_flatten_dict(struct).items() if k in actual_dict.keys()})
    if augmented_expected_dict:
        expected_dict.update(
            {k: v for k, v in namedtuple_to_flatten_dict(augmented_expected_dict).items() if k in actual_dict.keys()})
    missing_keys = list(set(actual_dict.keys()) - set(expected_dict.keys()))
    for key, value in expected_dict.items():
        if value is None or value == []:
            expected_dict[key] = any_of(is_(empty()), equal_to(None))
        elif isinstance(value, list):
            expected_dict[key] = contains_inanyorder(*value)
    return actual_dict, expected_dict, missing_keys
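
The namedtuple_to_flatten_dict helper used above is not shown in this post. A minimal sketch of what such a helper could look like follows; the function name flatten_structure, the dot-separated key paths and the sample structures are assumptions made for illustration, not the actual implementation.

```python
from typing import Any, Dict, NamedTuple


def flatten_structure(value: Any, prefix: str = "") -> Dict[str, Any]:
    # Namedtuples expose _asdict(); plain dicts are used as they are
    if hasattr(value, "_asdict"):
        value = value._asdict()
    flat: Dict[str, Any] = {}
    for key, item in value.items():
        # Assumed convention: nested keys are joined with a dot
        path = f"{prefix}.{key}" if prefix else key
        if hasattr(item, "_asdict") or isinstance(item, dict):
            flat.update(flatten_structure(item, path))
        else:
            flat[path] = item
    return flat


class Cve(NamedTuple):
    id: str
    severity: str


class Finding(NamedTuple):
    status: str
    cve: Cve


flat = flatten_structure(Finding(status="OPEN", cve=Cve(id="CVE-2019-15903", severity="HIGH")))
```

With this convention, the nested cve structure ends up under the keys cve.id and cve.severity, next to the top-level status key.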

When the expected data structure does not include all the data needed for the intended validation, the validation data can be extended through an augmentation dictionary called augmented_expected_dict. Its values add to or overwrite those of the expected data, and its keys need to use the same flattened paths as the rest of the validation map.

Keys in the expected data that are not present in the actual data are excluded from the validation map.

In this example, the created validation map will contain three elements:

  • The actual dict: data to be tested
  • The expected dict: validation criteria (can contain hamcrest matchers)
  • Missing keys list: sequence with keys from the actual data not present in the expected dict
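
To make these three elements concrete, the sketch below reproduces the filtering and missing-key logic with plain dictionaries. The dotted keys (such as cve.id) and the sample values are assumptions for illustration; the real key format depends on namedtuple_to_flatten_dict.

```python
# Flattened data under test (hypothetical values)
actual_dict = {
    "vulnerability_id": "1122aabb_3344ccdd",
    "status": "OPEN",
    "cve.id": "CVE-2019-15903",
}

# Flattened expected data, which may carry extra fields the actual data lacks
expected_source = {
    "status": "OPEN",
    "cve.id": "CVE-2019-15903",
    "cve.cvss_base_score": "7.5",
}

# Expected keys that do not exist in the actual data are dropped
expected_dict = {k: v for k, v in expected_source.items() if k in actual_dict}

# Actual keys without any validation criteria are reported as missing
missing_keys = sorted(set(actual_dict) - set(expected_dict))
```

Here cve.cvss_base_score is dropped from the expected dict, and vulnerability_id is the only missing key.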

Validation Design

The validation map will be executed by using the has_entries hamcrest matcher against the actual and expected dict. has_entries matches if a dictionary contains entries satisfying a dictionary of keys and corresponding value matchers.

If a field is missing from a structure under test (for example, in a corner case scenario or because of a known issue that the tester wants to ignore) and we want to skip that check in order to continue with the validation, we can specify that field key in the exclude_keys argument. The validator will then ignore it.

By default, if there are missing keys that are not excluded, the validation fails, because the validation criteria should cover all the data fields of the data structure under test. We can skip failing on missing keys by passing fail_on_missing_keys=False.

from typing import Dict, NamedTuple, Optional, Sequence

from hamcrest import assert_that, empty, has_entries, is_


def assert_on_validation_map(assert_structure: NamedTuple, actual: Dict, expected: Dict,
                             missing_keys: Optional[Sequence] = None, exclude_keys: Optional[Sequence] = None,
                             fail_on_missing_keys: bool = True) -> None:
    if exclude_keys:
        # Excluded keys are dropped from the missing-keys list and from both dicts (which are modified in place)
        if missing_keys:
            missing_keys = [key for key in missing_keys if key not in exclude_keys]
        for key in exclude_keys:
            actual.pop(key, None)
            expected.pop(key, None)
    # Every expected entry must be satisfied by the corresponding actual entry
    assert_that(actual, has_entries(expected), f"Didn't match expected data for structure {assert_structure} with "
                                               f"validation map {actual}")
    # Actual fields without validation criteria fail the check unless explicitly allowed
    if missing_keys and fail_on_missing_keys:
        assert_that(missing_keys, is_(empty()), f"Found missing keys from response in our expected data for "
                                                f"structure {assert_structure}")

Taken together, Hamcrest and the validation map help our team run more powerful validations than we could otherwise write with the same efficiency.

A Real-world Example of Validation Map Testing

To illustrate how our team leverages this concept in practice, below we share a business use case for a product that identifies software vulnerabilities for applications based on data points coming from agents deployed on clients’ computers. The approach used to ingest test data is thoroughly described here.

Once test data has been ingested into the system, we can proceed to verify that vulnerabilities have been correctly processed.

Let’s take a closer look at a simplified example of a vulnerability: a weakness or flaw affecting a specific application on a host that a malicious actor can exploit. Below we see how we can validate a vulnerability using this approach. A few notes on the script:

  • A vulnerability_id is what uniquely identifies the vulnerability instance
  • A status can be “OPEN” if the vulnerability has not been addressed by installing a patch, or “CLOSED” 
  • cve stands for Common Vulnerabilities and Exposures and identifies a publicly disclosed vulnerability
  • The app is the application that the given cve affects 
  • labels provide the means to enhance a vulnerability with specific custom tags or information
  • groups allow different vulnerabilities to be mapped together

# Vulnerability to test

Vulnerability(
  vulnerability_id='1122aabb_3344ccdd',
  status='OPEN',
  cve=Cve(id='CVE-2019-15903', severity='HIGH'),
  app=Application(vendor='CentOS', app_name='Firefox', app_version='60.8.0'),
  labels=['label_A', 'label_B'],
  groups=None)

# Expected data structure

ExpectedVulnerability(
  cve=ExpectedCve(
         id='CVE-2019-15903',
         severity='HIGH',
         cvss_base_score='7.5'),
  app=Application(vendor='CentOS', app_name='Firefox', app_version='60.8.0'),
  status='OPEN',
  labels=['label_B', 'label_A'],
  groups=[])

In order to verify the vulnerability, we build an “ExpectedVulnerability” data structure based on several sources such as the ingested test data or specific downstream services with more enriched data points.

Now that we have the expected data structure in place, we can add one last step to further enhance our validation by leveraging the “augment” feature. Because the vulnerability id cannot be predetermined, we can’t check its exact value, but we can still check that it is a non-empty string, as shown in the code snippet below:

augmented_expected_dict = {
    "vulnerability_id": all_of(instance_of(str), is_not(empty()))
}

We can now proceed to execute the actual verification:

actual, expected, missing = create_validation_map_from_namedtuple(actual=vulnerability,
                                                                  expected=[expected_vulnerability],
                                                                  augmented_expected_dict=augmented_expected_dict)

assert_on_validation_map(vulnerability, actual, expected, missing)

Two notes on the above example:

  1. The vulnerability data structure contains a field called “labels” whose value is [‘label_A’, ‘label_B’]. The order of the elements may not be guaranteed. In fact, the expected data structure from which the labels field is sourced contains the values in a different order: [‘label_B’, ‘label_A’]. By default, validation of values that are sequences ignores the order of the elements, so the labels field validation will pass. This is handled in the background when the validation map is created (create_validation_map_from_namedtuple), which applies the contains_inanyorder Hamcrest matcher to field values that are lists.
  2. An expected value of None or an empty list matches any type of empty data structure as well as a None value. In our example, ‘groups’ has an expected value of [], but in the vulnerability under test it has the value None. This validation will pass seamlessly.

Benefits of Generalized Validation Testing

Our generalized approach to validation testing offers CrowdStrike a variety of important benefits, including: 

  • Efficiency. Working this way saves engineers and the development team significant time by limiting the amount of code that needs to be written for each individual validation.
  • Better outcomes. This approach allows the team to test code more thoroughly, which ultimately leads to better product quality and performance.
  • Enhanced logging and debugging. Leveraging Hamcrest matchers and their logging capabilities gives the team a detailed, timestamped log that is instrumental in improving the speed and accuracy of the debugging process. Again, because the process is part of an existing model, this capability does not need to be built manually by developers.

Conclusion 

Validation is a crucial function of every SDET team. Given the speed and scale at which CrowdStrike engineers work, our generalized approach is an absolute necessity for our engineering organization – and yet another example of how our SDET team is helping to fulfill the CrowdStrike mission to stop breaches.

Have questions or comments about generalized validation testing? Sound off on social media @CrowdStrike and be sure to check out our other two posts from this series: Testing Data Flows Using Python and Remote Functions and End-to-end Testing: How a Modular Testing Model Increases Efficiency and Scalability.
