docs: add instructions for writing new checks
Signed-off-by: behnazh-w <behnaz.hassanshahi@oracle.com>
behnazh-w committed Feb 1, 2024
1 parent 7f92fe2 commit 9aedac6
Showing 4 changed files with 204 additions and 86 deletions.
158 changes: 157 additions & 1 deletion docs/source/pages/developers_guide/index.rst
@@ -1,4 +1,4 @@
.. Copyright (c) 2023 - 2023, Oracle and/or its affiliates. All rights reserved.
.. Copyright (c) 2023 - 2024, Oracle and/or its affiliates. All rights reserved.
.. Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.
=========================
@@ -11,6 +11,162 @@ To follow the project's code style, see the :doc:`Macaron Style Guide </pages/de

For API reference, see the :doc:`API Reference </pages/developers_guide/apidoc/index>` page.

-------------------
Writing a New Check
-------------------

As a contributor to Macaron, you will very likely need to write a new check or modify an existing one at some point. In this
section, we explain how Macaron checks work and what you need to do to develop one.

+++++++++++++++++
High-level Design
+++++++++++++++++

Before jumping into coding, it is useful to understand how Macaron as a framework works. Macaron is an extensible
framework designed to make writing new supply chain security analyses easy. It provides an interface
that you can leverage to access existing models and abstractions instead of implementing everything from scratch. For
instance, many security checks need to traverse the code in GitHub Actions configurations. Normally,
you would need to find the right repository and commit, clone it, find the workflows, and parse them. With Macaron,
you don't need to do any of that and can simply write your security check by using the parsed shell scripts that are
triggered in the CI.

Another important aspect of our design is that all check results are automatically mapped and stored in a local database.
This mapping makes it possible to enforce flexible policies on the results of the checks. While Macaron's backend stores
the check results in the database automatically, the developer needs to add a brief specification
to make that possible, as we will see later.

+++++++++++++++++++
The Check Interface
+++++++++++++++++++

Each check needs to be implemented as a Python class in a Python module under ``src/macaron/slsa_analyzer/checks``.
A check class should subclass the ``BaseCheck`` class in :ref:`base_check module <pages/developers_guide/apidoc/macaron\.slsa_analyzer\.checks:macaron.slsa\\_analyzer.checks.base\\_check module>`.

You need to set the name, description, and other details of your new check in the ``__init__`` method. After implementing
the initializer, you need to implement the ``run_check`` abstract method. This method provides the context object
:ref:`AnalyzeContext <pages/developers_guide/apidoc/macaron\.slsa_analyzer:macaron.slsa\\_analyzer.analyze\\_context module>`, which contains various
intermediate representations and models. The ``dynamic_data`` property is particularly useful as it contains
data about the CI service, artifact registry, and build tool used for building the software component.

``component`` is another useful attribute in the :ref:`AnalyzeContext <pages/developers_guide/apidoc/macaron\.slsa_analyzer:macaron.slsa\\_analyzer.analyze\\_context module>` object
that you should know about. This attribute contains the information about a software component, such
as its corresponding ``repository`` and ``dependencies``. Note that ``component`` will also be stored in the database and its attributes
such as ``repository`` are established as database relationships. You can see the existing tables and their
relationships in our :ref:`data model <pages/developers_guide/apidoc/macaron.database:macaron.database.table\\_definitions module>`.
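The shape of a check can be sketched with plain-Python stand-ins. Everything below is hypothetical scaffolding for illustration: ``SketchBaseCheck``, ``SketchContext``, ``SketchComponent`` and ``HasRepoCheck`` are not Macaron classes, and the boolean return value is a simplification. Only the overall pattern (set the check details in ``__init__``, then implement ``run_check`` against a context object) comes from the text above.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchComponent:
    """Hypothetical stand-in for the ``component`` attribute of the context."""

    purl: str
    repository: Optional[str] = None
    dependencies: list = field(default_factory=list)


@dataclass
class SketchContext:
    """Hypothetical stand-in for ``AnalyzeContext``."""

    component: SketchComponent


class SketchBaseCheck(ABC):
    """Hypothetical stand-in for ``BaseCheck``: stores details and requires ``run_check``."""

    def __init__(self, check_id: str, description: str) -> None:
        self.check_id = check_id
        self.description = description

    @abstractmethod
    def run_check(self, ctx: SketchContext) -> bool:
        """Return True if the check passes for the given context."""


class HasRepoCheck(SketchBaseCheck):
    """A toy check that passes when the component has a repository."""

    def __init__(self) -> None:
        super().__init__(check_id="sketch_repo_exists", description="Component has a repository.")

    def run_check(self, ctx: SketchContext) -> bool:
        return ctx.component.repository is not None


with_repo = SketchContext(SketchComponent(purl="pkg:github/example/project", repository="github.com/example/project"))
without_repo = SketchContext(SketchComponent(purl="pkg:github/example/other"))
print(HasRepoCheck().run_check(with_repo), HasRepoCheck().run_check(without_repo))
```

Because ``run_check`` is abstract in the stand-in base class, a subclass that forgets to implement it cannot be instantiated, which is the main benefit of modelling the interface this way.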

Once you implement the logic of your check in the ``run_check`` method, you need to add a class to help
Macaron handle your check's output:

* Add a class that subclasses ``CheckFacts`` to map your outputs to a table in the database. The class name should follow the ``<MyCheck>Facts`` pattern.
* Specify the table name in the ``__tablename__ = "_my_check"`` class variable. Note that the table name should start with ``_`` and it should not have been used by other checks.
* Add the ``id`` column as the primary key where the foreign key is ``_check_facts.id``.
* Add columns for the check outputs that you would like to store in the database. If a column needs to appear as a justification in the HTML/JSON report, pass ``info={"justification": JustificationType.<TEXT or HREF>}`` to the column mapper.
* Add the ``__mapper_args__`` class variable and set its ``"polymorphic_identity"`` key to the table name.
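The bullets above describe a joined-table layout: each check gets its own table whose ``id`` is both its primary key and a foreign key into ``_check_facts``. The sketch below reproduces that layout with the standard-library ``sqlite3`` module rather than Macaron's ORM classes. The ``check_type`` column, the ``col_foo`` column and the sample row are assumptions made for illustration, and the real ``_check_facts`` table may have a different set of columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Parent table shared by all checks (simplified; the column set is an assumption).
conn.execute("CREATE TABLE _check_facts (id INTEGER PRIMARY KEY, check_type TEXT NOT NULL)")

# Per-check table: its primary key is also a foreign key to _check_facts.id.
conn.execute(
    """
    CREATE TABLE _my_check (
        id INTEGER PRIMARY KEY REFERENCES _check_facts (id),
        col_foo TEXT
    )
    """
)

# Storing one fact means one row in each table, sharing the same id.
cursor = conn.execute("INSERT INTO _check_facts (check_type) VALUES (?)", ("_my_check",))
fact_id = cursor.lastrowid
conn.execute("INSERT INTO _my_check (id, col_foo) VALUES (?, ?)", (fact_id, "foo-value"))

# A join on the shared id recovers the full record for this check.
row = conn.execute(
    "SELECT f.check_type, m.col_foo FROM _check_facts AS f JOIN _my_check AS m ON m.id = f.id"
).fetchone()
print(row)
```

In this sketch the ``check_type`` value plays the role that ``"polymorphic_identity"`` plays in the ORM mapping: it records which per-check table holds the rest of the row.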

Next, you need to create a ``result_tables`` list and append check facts as part of the ``run_check`` implementation.
You should also specify a :ref:`Confidence <pages/developers_guide/apidoc/macaron\.slsa_analyzer\.checks:macaron.slsa\\_analyzer.checks.check\\_result module>`
score by choosing one of the ``Confidence`` enum values, e.g., ``Confidence.HIGH``, and pass it via the keyword
argument ``confidence``. You should choose a suitable confidence score based on the accuracy
of your check analysis.

.. code-block:: python

    result_tables.append(MyCheckFacts(col_foo=foo, col_bar=bar, confidence=Confidence.HIGH))

This list as well as the check result status should be stored in a :ref:`CheckResultData <pages/developers_guide/apidoc/macaron\.slsa_analyzer\.checks:macaron.slsa\\_analyzer.checks.check\\_result module>`
object and returned by ``run_check``.
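To make the data flow concrete, here is a minimal sketch of a ``run_check``-style function that builds ``result_tables`` and returns it together with a status. ``SketchConfidence``, ``SketchResultType``, ``SketchFacts``, ``SketchResultData`` and the numeric confidence values are hypothetical stand-ins chosen for this example, as is the rule used to pick a confidence level. Consult the ``check_result`` module for the real definitions.

```python
from dataclasses import dataclass, field
from enum import Enum


class SketchConfidence(Enum):
    """Hypothetical confidence levels; the numeric values are assumptions."""

    HIGH = 1.0
    MEDIUM = 0.7
    LOW = 0.4


class SketchResultType(Enum):
    PASSED = "PASSED"
    FAILED = "FAILED"


@dataclass
class SketchFacts:
    """Hypothetical stand-in for a ``<MyCheck>Facts`` row."""

    col_foo: str
    confidence: SketchConfidence


@dataclass
class SketchResultData:
    """Hypothetical stand-in for ``CheckResultData``."""

    result_tables: list = field(default_factory=list)
    result_type: SketchResultType = SketchResultType.FAILED


def run_check_sketch(foo_values: list) -> SketchResultData:
    """Collect one fact per finding and derive the overall status."""
    result_tables = []
    for foo in foo_values:
        # Illustrative rule: an exact match is treated as stronger evidence.
        confidence = SketchConfidence.HIGH if foo == "exact" else SketchConfidence.LOW
        result_tables.append(SketchFacts(col_foo=foo, confidence=confidence))
    result_type = SketchResultType.PASSED if result_tables else SketchResultType.FAILED
    return SketchResultData(result_tables=result_tables, result_type=result_type)


passed = run_check_sketch(["exact", "partial"])
failed = run_check_sketch([])
print(passed.result_type, failed.result_type)
```

The point of the sketch is that the confidence is attached to each fact rather than to the check as a whole, so a single run can report findings of differing reliability.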

Finally, you need to register your check by adding it to the :ref:`registry module <pages/developers_guide/apidoc/macaron\.slsa_analyzer:macaron.slsa\\_analyzer.registry module>`:

.. code-block:: python

    registry.register(MyCheck())

And of course, make sure to add tests for your check by adding a module under ``tests/slsa_analyzer/checks/``.
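Returning to the registration step: conceptually, registering a check adds an instance of it to a shared collection that the analyzer later iterates over. The sketch below is a simplified stand-in, not Macaron's ``registry`` module. The ``SketchRegistry`` class, its duplicate-ID rule and the ``SketchCheck`` type are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchCheck:
    """Hypothetical check object identified by its ID."""

    check_id: str


class SketchRegistry:
    """Hypothetical registry that keeps one check instance per ID."""

    def __init__(self) -> None:
        self._checks = {}

    def register(self, check: SketchCheck) -> None:
        # Assumed rule: registering the same ID twice is an error.
        if check.check_id in self._checks:
            raise ValueError(f"Check {check.check_id!r} is already registered.")
        self._checks[check.check_id] = check

    def check_ids(self) -> list:
        return list(self._checks)


sketch_registry = SketchRegistry()
sketch_registry.register(SketchCheck("sketch_my_check_1"))
print(sketch_registry.check_ids())
```

Placing the ``register`` call at module level, as the documentation's snippet does, means a check becomes available as soon as its module is imported.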

+++++++
Example
+++++++

In this example, we show how to add a check to determine if a software component has a source-code repository.
Feel free to explore other existing checks under ``src/macaron/slsa_analyzer/checks`` for more examples.

1. First create a module called ``repo_check.py`` under ``src/macaron/slsa_analyzer/checks``.

2. Add a class and specify the columns that you want to store for the check outputs to the database.

.. code-block:: python

    # Add this line at the top of the file to create the logger object if you plan to use it.
    logger: logging.Logger = logging.getLogger(__name__)


    class RepoCheckFacts(CheckFacts):
        """The ORM mapping for justifications in the check repository check."""

        __tablename__ = "_repo_check"

        #: The primary key.
        id: Mapped[int] = mapped_column(ForeignKey("_check_facts.id"), primary_key=True)

        #: The Git repository path.
        git_repo: Mapped[str] = mapped_column(String, nullable=True, info={"justification": JustificationType.HREF})

        # The polymorphic identity must match the table name above.
        __mapper_args__ = {
            "polymorphic_identity": "_repo_check",
        }

3. Add a class for your check, provide the check details in the initializer method, and implement the logic of the check in ``run_check``.

.. code-block:: python

    class RepoCheck(BaseCheck):
        """This Check checks whether the target software component has a source-code repository."""

        def __init__(self) -> None:
            """Initialize instance."""
            check_id = "mcn_repo_exists_1"
            description = "Check whether the target software component has a source-code repository."
            depends_on: list[tuple[str, CheckResultType]] = []  # This check doesn't depend on any other checks.
            eval_reqs = [
                ReqName.VCS
            ]  # Choose a SLSA requirement that roughly matches this check from the ReqName enum class.
            super().__init__(check_id=check_id, description=description, depends_on=depends_on, eval_reqs=eval_reqs)

        def run_check(self, ctx: AnalyzeContext) -> CheckResultData:
            """Implement the check in this method.

            Parameters
            ----------
            ctx : AnalyzeContext
                The object containing processed data for the target software component.

            Returns
            -------
            CheckResultData
                The result of the check.
            """
            if not ctx.component.repository:
                logger.info("Unable to find a Git repository for %s", ctx.component.purl)
                # We do not store any results in the database if a check fails. So, just leave result_tables empty.
                return CheckResultData(result_tables=[], result_type=CheckResultType.FAILED)

            return CheckResultData(
                result_tables=[RepoCheckFacts(git_repo=ctx.component.repository.remote_path, confidence=Confidence.HIGH)],
                result_type=CheckResultType.PASSED,
            )

4. Register your check.

.. code-block:: python

    registry.register(RepoCheck())

Finally, you can add tests for your check by adding a ``tests/slsa_analyzer/checks/test_repo_check.py`` module. Macaron
uses `pytest <https://docs.pytest.org>`_ and `hypothesis <https://hypothesis.readthedocs.io>`_ for testing. Take a look
at other tests for inspiration!

.. toctree::
:maxdepth: 1

21 changes: 20 additions & 1 deletion src/macaron/slsa_analyzer/analyze_context.py
@@ -82,7 +82,8 @@ def __init__(
self.check_results: dict[str, CheckResult] = {}

# Add the data computed at runtime to the dynamic_data attribute.
self.dynamic_data: ChecksOutputs = ChecksOutputs(
# This attribute should be accessed via the `dynamic_data` property.
self._dynamic_data: ChecksOutputs = ChecksOutputs(
git_service=NoneGitService(),
build_spec=BuildSpec(tools=[]),
ci_services=[],
@@ -91,6 +92,24 @@ def __init__(
expectation=None,
)

    @property
    def dynamic_data(self) -> ChecksOutputs:
        """Return the `dynamic_data` object that contains various intermediate representations.

        This object is used to pass various models and intermediate representations from the backend
        in Macaron to checks. A check can also store intermediate results in this object to be used
        by checks that depend on it. However, please avoid adding arbitrary attributes to this object!

        We recommend taking a look at the attributes in this object before writing a new check. Chances
        are that what you try to implement is already implemented and the results are available in the
        `dynamic_data` object.

        Returns
        -------
        ChecksOutputs
        """
        return self._dynamic_data

@property
def provenances(self) -> dict[str, list[InTotoV01Statement | InTotoV1Statement]]:
"""Return the provenances data as a dictionary.
40 changes: 2 additions & 38 deletions src/macaron/slsa_analyzer/checks/README.md
@@ -1,42 +1,6 @@
# Defining Checks

The checks defined in this directory are automatically loaded during the startup of Macaron and used during the analysis. This `README.md` shows how a Check can be created.
The checks defined in this directory are automatically loaded during the startup of Macaron and used during the analysis. For detailed instructions to write a new check, see our [website](https://oracle.github.io/macaron/pages/developers_guide/index.html).

## Base Check
The `BaseCheck` class (located at [base_check.py](./base_check.py)) is the abstract class to be inherited by other concrete checks.
Please see [base_check.py](./base_check.py) for the attributes of a `BaseCheck` instance.

## Writing a Macaron Check
These are the steps for creating a Check in Macaron:
1. Create a module with the name `<name>_check.py`. Note that Macaron **only** loads check modules that have this name format.
2. Create a class that inherits `BaseCheck` and initiates the attributes of a `BaseCheck` instance.
3. Register the newly created Check class to the Registry ([registry.py](../registry.py)). This will make the Check available to Macaron. For example:
```python
from macaron.slsa_analyzer.registry import registry

# Check class is defined here
# class ExampleCheck(BaseCheck):
# ...

registry.register(ExampleCheck())
```
4. Add an ORM mapped class for the check facts so that the policy engine can reason about the properties. To provide the mapped class, all you need to do is to add a class that inherits from `CheckFacts` class and add the following attributes (rename the `MyCheckFacts` check name and `__tablename__` as appropriate).

```python
class MyCheckFacts(CheckFacts):
"""The ORM mapping for justifications in my check."""

__tablename__ = "_my_check"

#: The primary key.
id: Mapped[int] = mapped_column(ForeignKey("_check_facts.id"), primary_key=True) # noqa: A003

#: The name of the column (property) that becomes available to policy engine.
my_column_name: Mapped[str] = mapped_column(String, nullable=False)

__mapper_args__ = {
"polymorphic_identity": "_my_check",
}
```

For more examples, please see the existing Checks in [checks/](./).
You can also have a look at the existing Checks in [this](./) directory for inspiration.
71 changes: 25 additions & 46 deletions tests/slsa_analyzer/checks/test_vcs_check.py
@@ -1,62 +1,41 @@
# Copyright (c) 2022 - 2023, Oracle and/or its affiliates. All rights reserved.
# Copyright (c) 2022 - 2024, Oracle and/or its affiliates. All rights reserved.
# Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.

"""This module contains tests for the VCS check."""

import os
from pathlib import Path

from macaron.database.table_definitions import Analysis, Component, Repository
from macaron.slsa_analyzer.analyze_context import AnalyzeContext, ChecksOutputs
from macaron.slsa_analyzer.checks.check_result import CheckResultType
from macaron.slsa_analyzer.checks.vcs_check import VCSCheck
from macaron.slsa_analyzer.git_service.base_git_service import NoneGitService
from macaron.slsa_analyzer.slsa_req import SLSALevels
from macaron.slsa_analyzer.specs.build_spec import BuildSpec
from tests.conftest import MockAnalyzeContext

from ...macaron_testcase import MacaronTestCase
from ..mock_git_utils import initiate_repo

BASE_DIR = os.path.dirname(os.path.abspath(__file__))
REPO_DIR = os.path.join(BASE_DIR, "mock_repos", "vcs_check_repo/sample_repo")


# pylint: disable=super-init-not-called
class MockAnalyzeContext(AnalyzeContext):
"""This class can be initiated without a git obj."""

def __init__(self) -> None:
# Make the VCS Check fails.
self.component = Component(purl="pkg:invalid/invalid", analysis=Analysis(), repository=None)
self.ctx_data: dict = {}
self.slsa_level = SLSALevels.LEVEL0
self.is_full_reach = False
self.dynamic_data: ChecksOutputs = ChecksOutputs(
git_service=NoneGitService(),
build_spec=BuildSpec(tools=[]),
ci_services=[],
is_inferred_prov=True,
expectation=None,
package_registries=[],
)
self.wrapper_path = ""
self.output_dir = ""


class TestVCSCheck(MacaronTestCase):
"""Test the vcs check."""

def test_vcs_check(self) -> None:
"""Test the vcs check."""
check = VCSCheck()
initiate_repo(REPO_DIR)

component = Component(
purl="pkg:github/package-url/purl-spec@244fd47e07d1004f0aed9c",
analysis=Analysis(),
repository=Repository(complete_name="github.com/package-url/purl-spec"),
)
use_git_repo = AnalyzeContext(component=component, macaron_path=REPO_DIR, output_dir="")
assert check.run_check(use_git_repo).result_type == CheckResultType.PASSED

no_git_repo = MockAnalyzeContext()
assert check.run_check(no_git_repo).result_type == CheckResultType.FAILED
def test_vcs_check_valid_repo(macaron_path: Path) -> None:
    """Test the vcs check for a valid repo."""
    check = VCSCheck()
    initiate_repo(REPO_DIR)
    use_git_repo = MockAnalyzeContext(macaron_path=macaron_path, output_dir="")
    use_git_repo.component = Component(
        purl="pkg:github/package-url/purl-spec@244fd47e07d1004f0aed9c",
        analysis=Analysis(),
        repository=Repository(complete_name="github.com/package-url/purl-spec"),
    )
    assert check.run_check(use_git_repo).result_type == CheckResultType.PASSED


def test_vcs_check_invalid_repo(macaron_path: Path) -> None:
    """Test the vcs check for an invalid repo."""
    check = VCSCheck()
    initiate_repo(REPO_DIR)
    no_git_repo = MockAnalyzeContext(macaron_path=macaron_path, output_dir="")
    no_git_repo.component = Component(
        purl="pkg:github/package-url/purl-spec@244fd47e07d1004f0aed9c", analysis=Analysis(), repository=None
    )
    assert check.run_check(no_git_repo).result_type == CheckResultType.FAILED
