.. _workflows:

=========
Workflows
=========

Workflows are server-side logic that can schedule and combine server
tasks and worker tasks to automate complex operations.

Workflows are created from a workflow template chosen from a set maintained by
the server administrators, plus data coming from user input.

See :ref:`explanation-workflows` for an overview, and below for
technical details.

.. _workflow-noop:

Workflow ``noop``
=================

This is a workflow that does nothing, and is mainly used in tests.

* ``task_data``: empty


.. _workflow-sbuild:

Workflow ``sbuild``
===================

This workflow takes a source package and creates sbuild work requests (see
:ref:`task-sbuild`) to build it for a set of architectures.

* ``task_data``:

  * ``input`` (required): see :ref:`package-build-task`
  * ``target_distribution`` (required string): ``vendor:codename`` to specify
    the environment to use for building. It will be used to determine
    ``distribution`` or ``environment``, depending on ``backend``.
  * ``backend`` (optional string): see :ref:`package-build-task`
  * ``architectures`` (required list of strings): list of architectures to
    build. It can include ``all`` to build a binary for ``Architecture: all``
  * ``build_logs_collection`` (:ref:`lookup-single` with default category
    ``debian:package-build-logs``, optional): collection where build logs
    should be retained; if unset, build logs are not added to any collection
  * ``environment_variant`` (optional string): variant of the
    environment we want to build on, e.g. ``buildd``; appended during
    environment :ref:`lookup <lookup-syntax>` for
    ``target_distribution`` above.
  * ``build_profiles`` (optional, default unset): select a build profile, see
    :ref:`package-build-task`.
  * ``binnmu`` (optional, default unset): build a binNMU, see
    :ref:`package-build-task`.
  * ``retry_delays`` (optional list): a list of delays to apply to each
    successive retry; each item is an integer suffixed with ``m`` for
    minutes, ``h`` for hours, ``d`` for days, or ``w`` for weeks.

The source package will be built on the intersection of the provided list of
architectures and the architectures listed in the ``Architecture:`` field of
the source package.  ``Architecture: all`` packages are built in an ``amd64``
environment.

The workflow may also apply a denylist of architectures if it finds a
``debian:suite`` collection corresponding to the build
distribution/environment, and that suite provides one.
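
The selection rules above can be sketched as follows.  This is a minimal
illustration, not the workflow's actual code: the function names are
hypothetical, and the sketch assumes that the source package's
``Architecture:`` field has already been expanded into concrete
architecture names (wildcards such as ``any`` are not handled).

.. code-block:: python

    def architectures_to_build(requested, source_architectures, denylist=()):
        """Return the architectures to build, sorted for stable output.

        ``requested`` is the workflow's ``architectures`` list,
        ``source_architectures`` is what the source package supports, and
        ``denylist`` is the optional list provided by the suite.
        """
        selected = set(requested) & set(source_architectures)
        return sorted(selected - set(denylist))


    def build_environment_architecture(architecture, all_host="amd64"):
        """Return the architecture of the environment used for a build."""
        # "all" is not a real machine architecture, so it needs a host.
        return all_host if architecture == "all" else architecture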

If ``build_logs_collection`` exists, then the workflow adds
:ref:`action-update-collection-with-data` and
:ref:`action-update-collection-with-artifacts` event reactions to each
sbuild work request to record their build logs there.  See
:ref:`collection-package-build-logs`.

If ``retry_delays`` is set, then the workflow adds a corresponding
``on_failure`` :ref:`action-retry-with-delays` action to each of the sbuild
work requests it creates.  This provides a simplistic way to retry
dependency-wait failures.  Note that this currently retries any failure, not
just dependency-waits; this may change in future.


.. _workflow-update-environments:

Workflow ``update_environments``
================================

This workflow schedules work requests to build :ref:`tarballs
<artifact-system-tarball>` and :ref:`images <artifact-system-image>`, and
adds them to a :ref:`debian:environments collection
<collection-environments>`.

* ``task_data``:

  * ``vendor`` (required): the name of the distribution vendor, used to look
    up the target ``debian:environments`` collection
  * ``targets`` (required): a list of dictionaries as follows:

    * ``codenames`` (required): the codename of an environment to build, or
      a list of such codenames
    * ``codename_aliases`` (optional): a mapping from build codenames to
      lists of other codenames; if given, add the output to the target
      collection under the aliases in addition to the build codenames.  For
      example, ``trixie: [testing]``
    * ``variants`` (optional): an identifier to use as the variant name when
      adding the resulting artifacts to the target collection, or a list of
      such identifiers; if not given, the default is not to set a variant
      name
    * ``backends`` (optional): the name of the debusine backend to use when
      adding the resulting artifacts to the target collection, or a list of
      such names; if not given, the default is not to set a backend name
    * ``architectures`` (required): a list of architecture names of
      environments to build for this codename
    * ``mmdebstrap_template`` (optional): a template to use to construct
      data for the :ref:`task-mmdebstrap`
    * ``simplesystemimagebuild_template`` (optional): a template to use to
      construct data for the :ref:`task-simplesystemimagebuild`

For each codename in each target, the workflow creates a :ref:`group
<workflow-group>`.  Then, for each architecture in that target, it fills in
whichever of ``mmdebstrap_template`` and ``simplesystemimagebuild_template``
are present and uses them to construct child work requests.  In each
one, ``bootstrap_options.architecture`` is set to the target architecture,
and ``bootstrap_repositories[].suite`` is set to the codename if it is not
already set.

The workflow adds one event reaction to each child work request as follows
for each combination of the codename (including any matching entries from
``codename_aliases``), variant (``variants``, or ``[null]`` if
missing/empty), and backend (``backends``, or ``[null]`` if missing/empty).
``{vendor}`` is the ``vendor`` from the workflow's task data, and
``{category}`` is ``debian:system-tarball`` for ``mmdebstrap`` tasks and
``debian:system-image`` for ``simplesystemimagebuild`` tasks:

.. code-block:: yaml

  on_success:
    - action: "update-collection-with-artifacts"
      artifact_filters:
        category: "{category}"
      collection: "{vendor}@debian:environments"
      variables:
        codename: "{codename}"
        variant: "{variant}"  # omit if null
        backend: "{backend}"  # omit if null
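
The expansion into one event reaction per combination can be sketched as
follows.  This is an illustration under stated assumptions rather than the
workflow's implementation: the function name is hypothetical, ``variants``
and ``backends`` are assumed to have been normalized to lists already, and
``variables`` is built as a mapping, following the ``dict`` type documented
for :ref:`action-update-collection-with-artifacts`.

.. code-block:: python

    from itertools import product


    def environment_reactions(
        vendor, category, codename, aliases=None, variants=None, backends=None
    ):
        """Build the ``on_success`` reactions for one child work request."""
        # The build codename comes first, followed by any aliases for it.
        codenames = [codename, *(aliases or {}).get(codename, [])]
        reactions = []
        # A missing or empty list behaves like [None]: one combination
        # that leaves the corresponding variable unset.
        for name, variant, backend in product(
            codenames, variants or [None], backends or [None]
        ):
            variables = {"codename": name}
            if variant is not None:
                variables["variant"] = variant
            if backend is not None:
                variables["backend"] = backend
            reactions.append(
                {
                    "action": "update-collection-with-artifacts",
                    "artifact_filters": {"category": category},
                    "collection": f"{vendor}@debian:environments",
                    "variables": variables,
                }
            )
        return reactions

With one alias and two variants, a single child work request would receive
four reactions under this sketch.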


.. _workflow-autopkgtest:

Workflow ``autopkgtest``
========================

This workflow schedules autopkgtests for a single source package on a set of
architectures.

* ``task_data``:

  * ``prefix`` (string, optional): prefix this string to the item names
    provided in the internal collection

  * ``source_artifact`` (:ref:`lookup-single`, required): see
    :ref:`task-autopkgtest`
  * ``binary_artifacts`` (:ref:`lookup-multiple`, required): see
    :ref:`task-autopkgtest`
  * ``context_artifacts`` (:ref:`lookup-multiple`, optional): see
    :ref:`task-autopkgtest`

  * ``vendor`` (string, required): the distribution vendor on which to run
    tests
  * ``codename`` (string, required): the distribution codename on which to
    run tests
  * ``backend`` (string, optional): see :ref:`task-autopkgtest`
  * ``architectures`` (list of strings, optional): if set, only run on any
    of these architecture names

  * ``include_tests``, ``exclude_tests``, ``debug_level``,
    ``extra_environment``, ``needs_internet``, ``fail_on``, ``timeout``: see
    :ref:`task-autopkgtest`

Tests will be run on the intersection of the provided list of architectures
(if any) and the architectures provided in ``binary_artifacts``.  If only
``Architecture: all`` binary packages are provided in ``binary_artifacts``,
then tests are run on ``amd64``.

The workflow creates an :ref:`task-autopkgtest` for each concrete
architecture, with task data:

* ``input.source_artifact``: ``{source_artifact}``
* ``input.binary_artifacts``: the subset of ``{binary_artifacts}`` that are
  for the concrete architecture or ``all``
* ``input.context_artifacts``: the subset of ``{context_artifacts}`` that
  are for the concrete architecture or ``all``
* ``host_architecture``: the concrete architecture
* ``environment``: ``{vendor}/match:codename={codename}``
* ``backend``: ``{backend}``
* ``include_tests``, ``exclude_tests``, ``debug_level``,
  ``extra_environment``, ``needs_internet``, ``fail_on``, ``timeout``:
  copied from workflow task data parameters of the same names

Any of the lookups in ``input.source_artifact``, ``input.binary_artifacts``,
or ``input.context_artifacts`` may result in :ref:`promises
<bare-data-promise>`, and in that case the workflow adds corresponding
dependencies.  Binary promises must include an ``architecture`` field in
their data.

Each work request provides its ``debian:autopkgtest`` artifact as output in
the internal collection, using the item name
``{prefix}autopkgtest-{architecture}``.
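
The architecture selection and item naming described above can be sketched
as follows.  The function names are hypothetical, and the source does not
say whether the ``architectures`` filter also applies to the ``amd64``
fallback; this sketch assumes that it does.

.. code-block:: python

    def autopkgtest_architectures(
        binary_architectures, requested=None, fallback="amd64"
    ):
        """Return the concrete architectures on which to run tests."""
        available = set(binary_architectures)
        concrete = available - {"all"}
        # Only "Architecture: all" binaries: run on the fallback host.
        if not concrete and "all" in available:
            concrete = {fallback}
        # Assumption: an explicit list filters the fallback as well.
        if requested:
            concrete &= set(requested)
        return sorted(concrete)


    def autopkgtest_item_name(architecture, prefix=""):
        """Return the internal collection item name for one work request."""
        return f"{prefix}autopkgtest-{architecture}"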

.. todo::

    The selection of the host architecture for architecture-independent
    binary packages should be controlled by pipeline instructions.  A
    similar mechanism might also control multiarch tests, such as testing
    i386 packages on an amd64 testbed.


.. _workflow-event-reactions:

Event reactions
===============

The ``event_reactions`` field on a workflow is a dictionary mapping events
to a list of actions.  Each action is described by a dictionary in which the
``action`` key names the action to perform and the remaining keys define the
specifics of that action.  See the list of supported actions below for
details.  The supported events are the following:

* ``on_creation``: event triggered when the work request is created
* ``on_unblock``: event triggered when the work request is unblocked
* ``on_success``: event triggered when the work request completes
  successfully
* ``on_failure``: event triggered when the work request fails or errors
  out

Supported actions
~~~~~~~~~~~~~~~~~

.. _action-send-notification:

``send-notification``
^^^^^^^^^^^^^^^^^^^^^

Sends a notification of the event using an existing notification channel.

* ``channel``: name of the notification channel to use
* ``data``: parameters for the notification method

.. _action-update-collection-with-artifacts:

``update-collection-with-artifacts``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Adds or replaces artifact-based collection items with artifacts generated
by the current work request.

* ``collection`` (:ref:`lookup-single`, required): collection to update
* ``name_template`` (string, optional): template used to generate the name
  for the collection item associated with a given artifact.  Uses the
  ``str.format`` templating syntax (with variables inside curly braces).
* ``variables`` (dict, optional): definition of the variables used to
  compute the name for the collection item.  Keys and values in this
  dictionary are interpreted as follows:

  * Keys beginning with ``$`` are handled using `JSON paths
    <https://pypi.org/project/jsonpath-rw/>`_.  The part of the key after
    the ``$`` is the name of the variable, and the value is a JSON path
    query to execute against the ``data`` dictionary of the target artifact
    in order to compute the value of the variable.

  * Keys that do not begin with ``$`` simply set the variable named by the
    key to the value, which is a constant string.

  * It is an error to specify keys for the same variable name both with and
    without an initial ``$``.

* ``artifact_filters`` (dict, required): this parameter makes it possible
  to identify a subset of generated artifacts to add to the collection.
  Each key-value pair is a Django ORM filter lookup against the
  ``Artifact`` model, so that one can run
  ``work_request.artifact_set.filter(**artifact_filters)`` to
  identify the desired set of artifacts.

.. note::

   When the ``name_template`` key is not provided, it is expected that
   the collection will compute the name for the new artifact-based
   collection item.  Some collection categories might not even allow you to
   override the name.  In this case, after any JSON path expansion, the
   ``variables`` field is passed to the collection manager's
   ``add_artifact``, so it may use those expanded variables to compute its
   own item names or per-item data.

As an example, you could register all the binary packages having
``Section: python`` and a dependency on libpython3.12 out of a ``sbuild``
task with names like ``$PACKAGE_$VERSION`` by using this action::

    action: 'update-collection-with-artifacts'
    artifact_filters:
      category: 'debian:binary-package'
      data__deb_fields__Section: 'python'
      data__deb_fields__Depends__contains: 'libpython3.12'
    collection: 'internal@collections'
    name_template: '{package}_{version}'
    variables:
      $package: 'deb_fields.Package'
      $version: 'deb_fields.Version'
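
The rules for ``variables`` and ``name_template`` can be sketched as
follows.  This is a simplified illustration, not the action's
implementation: instead of a real JSON path library it uses a hypothetical
``resolve_path`` helper that only follows dotted keys through nested
dictionaries, which is enough for paths such as ``deb_fields.Package``.

.. code-block:: python

    def resolve_path(data, path):
        """Follow a dotted path through nested dictionaries."""
        value = data
        for key in path.split("."):
            value = value[key]
        return value


    def expand_variables(variables, artifact_data):
        """Expand ``variables`` against an artifact's ``data`` dictionary."""
        expanded = {}
        for key, value in variables.items():
            is_query = key.startswith("$")
            name = key[1:] if is_query else key
            if name in expanded:
                # Same variable given both with and without a leading "$".
                raise ValueError(f"variable {name!r} specified twice")
            expanded[name] = (
                resolve_path(artifact_data, value) if is_query else value
            )
        return expanded


    def collection_item_name(name_template, variables, artifact_data):
        """Compute an item name using ``str.format`` substitution."""
        return name_template.format(**expand_variables(variables, artifact_data))

Under this sketch, a key written without the leading ``$`` would be
substituted as the literal string given as its value rather than looked up
in the artifact data.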

.. _action-update-collection-with-data:

``update-collection-with-data``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Adds or replaces a bare collection item based on the current work request.

This is similar to :ref:`action-update-collection-with-artifacts`, except
that of course it does not refer to artifacts.  This can be used in
situations where no artifact is available, such as in ``on_creation``
events.

* ``collection`` (:ref:`lookup-single`, required): collection to update
* ``category`` (string, required): the category of the item to add
* ``name_template`` (string, optional): template used to generate the name
  for the collection item.  Uses the ``str.format`` templating syntax (with
  variables inside curly braces, referring to keys in ``data``).
* ``data`` (dict, optional): data for the collection item.  This may also be
  used to compute the name for the item, either via substitution into
  ``name_template`` or by rules defined by the collection manager.

.. note::

   When the ``name_template`` key is not provided, it is expected that the
   collection will compute the name for the new bare collection item.  Some
   collection categories might not even allow you to override the name.

.. _action-retry-with-delays:

``retry-with-delays``
^^^^^^^^^^^^^^^^^^^^^

This action is used in ``on_failure`` event reactions.  It causes the work
request to be retried automatically with various parameters, adding a
dependency on a newly-created :ref:`task-delay`.

The current delay scheme is limited and simplistic, but we expect that more
complex schemes can be added as variations on the parameters to this action.

* ``delays`` (list, required): a list of delays to apply to each successive
  retry; each item is an integer suffixed with ``m`` for minutes, ``h`` for
  hours, ``d`` for days, or ``w`` for weeks.

The workflow data model for work requests gains a ``retry_count`` field,
defaulting to 0 and incrementing on each successive retry.  When this action
runs, it creates a :ref:`task-delay` with its ``delay_until`` field set to
the current time plus the item from ``delays`` corresponding to the current
retry count, adds a dependency from its work request to that, and marks its
work request as blocked on that dependency.  If the retry count is greater
than the number of items in ``delays``, then the action does nothing.
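
The delay format described above can be parsed as in the following sketch.
The ``parse_delay`` name is hypothetical, and the sketch deliberately covers
only the string format, not the selection of an item by retry count.

.. code-block:: python

    import re
    from datetime import timedelta

    # Map each documented suffix to the matching timedelta keyword.
    _UNITS = {"m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


    def parse_delay(text):
        """Convert a delay such as ``30m`` or ``2w`` into a timedelta."""
        match = re.fullmatch(r"(\d+)([mhdw])", text)
        if match is None:
            raise ValueError(f"invalid delay: {text!r}")
        amount, unit = match.groups()
        return timedelta(**{_UNITS[unit]: int(amount)})

Adding the result to the current time would give a value suitable for the
``delay_until`` field mentioned above.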

Workflow implementation
=======================

On the Python side, a workflow is orchestrated by a subclass of
``Workflow``, which derives from ``BaseTask`` and has its own subclass
hierarchy.

When instantiating a ``Workflow``, a new ``WorkRequest`` is created with:

* ``task_type`` set to ``"workflow"``
* ``task_name`` pointing to the ``Workflow`` subclass used to orchestrate
* ``task_data`` set to the workflow parameters instantiated from the
  template (or from the parent workflow)

This ``WorkRequest`` acts as the root of the ``WorkRequest`` hierarchy
for the running workflow.

The ``Workflow`` class runs on the server with full database access
and is in charge of:

* on instantiation, laying out an execution plan in the form of a
  directed acyclic graph of newly created ``WorkRequest`` instances
* analyzing the results of any completed ``WorkRequest`` in the graph
* possibly extending/modifying the graph after this analysis

``WorkRequest`` elements in a Workflow can only depend on each other, and
cannot have dependencies on ``WorkRequest`` elements outside the workflow.
They may depend on work requests in other sub-workflows that are part of the
same root workflow.

All the child work requests start in the ``blocked`` status using the ``deps``
unblock strategy. When the Workflow ``WorkRequest`` is ready to run, all the
child ``WorkRequest`` elements that don't have any further dependencies can
immediately start.
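
The effect of the ``deps`` unblock strategy can be illustrated with the
following sketch.  It models the graph as a plain dictionary rather than
database rows, the function name is hypothetical, and it treats a
dependency as satisfied once it has completed; how failed dependencies are
handled is outside its scope.

.. code-block:: python

    def runnable(dependencies, completed):
        """Return the work requests whose dependencies have all completed.

        ``dependencies`` maps each work request name to the set of names
        it depends on; ``completed`` is the set of finished names.
        """
        done = set(completed)
        return sorted(
            name
            for name, deps in dependencies.items()
            if name not in done and set(deps) <= done
        )

Children with an empty dependency set are returned on the first call, which
matches the statement that they can start immediately.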

WorkflowTemplate
================

The ``WorkflowTemplate`` model has (at least) the following fields:

* ``name``: a unique name given to the workflow within the workspace
* ``workspace``: a foreign key to the workspace containing the workflow
* ``task_name``: a name that refers back to the ``Workflow`` class to
  use to manage the execution of the workflow
* ``task_data``: JSON dict field representing a subset of the parameters
  needed by the workflow that cannot be overridden when instantiating the root
  ``WorkRequest``

The root ``WorkRequest`` of the workflow copies the following fields from
``WorkflowTemplate``:

* ``workspace``
* ``task_name``
* ``task_data`` (combining the user-supplied data and the
  ``WorkflowTemplate``-imposed data)
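
Combining the two sources of ``task_data`` can be sketched as follows.  The
source only states that template-imposed parameters cannot be overridden;
rejecting conflicting keys with an error, and the ``combine_task_data`` name
itself, are assumptions made for this illustration.

.. code-block:: python

    def combine_task_data(template_data, user_data):
        """Merge user-supplied data with template-imposed data."""
        conflicts = sorted(set(template_data) & set(user_data))
        if conflicts:
            # Assumption: an attempted override is reported, not ignored.
            raise ValueError(f"cannot override template keys: {conflicts}")
        # Keys are disjoint here, so the merge order does not matter.
        return {**user_data, **template_data}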

.. _workflow-group:

Group of work requests
======================

When a workflow generates a large number of related/similar work requests,
it might want to hide all those work requests behind a group that would
appear as a single step in the visual representation of the workflow.  This is
implemented by a ``group`` key in the ``workflow_data`` dictionary of each
task.

Advanced workflows / sub-workflows
==================================

Advanced workflows can be created by combining multiple limited-purpose
workflows.

Sub-workflows are integrated in the general graph of their parent workflow
as WorkRequests of type ``workflow``.

From a user interface perspective, sub-workflows are typically hidden as a
single step in the visual representation of the parent workflow.

Cooperation between workflows is defined at the level of workflows.
Individual work requests should not concern themselves with this; they
are designed to take inputs using lookups and produce output artifacts
that are linked to the work request.

Sub-workflow coordination takes place through the workflow's internal
collection (which is shared among all sub-workflows of the same root
workflow), providing a mechanism for some work requests to declare that they
will provide certain kinds of artifacts which may then be required by work
requests in other sub-workflows.

On the providing side, workflows use the
:ref:`action-update-collection-with-artifacts` event reaction to add
relevant output artifacts from work requests to the internal collection, and
create :ref:`promises <bare-data-promise>` to indicate to other workflows
that they have done so.  Providing workflows choose item names in the
internal collection; it is the responsibility of workflow designers to
ensure that they do not clash, and workflows that provide output artifacts
have an optional ``prefix`` field in their task data to allow multiple
instances of the same workflow to cooperate under the same root workflow.

On the requiring side, workflows look up the names of artifacts they require
in the internal collection; each of those lookups may return nothing, or a
promise including a work request ID, or an artifact that already exists, and
they may use that to determine which child work requests they create.  They
use :ref:`lookups <lookup-syntax>` in their child work requests to refer to
items in the internal collection (e.g.
``internal@collections/name:build-amd64``), and add corresponding
dependencies on work requests that promise to provide those items.

Sub-workflows may depend on other steps within the root workflow while
still being fully populated in advance of being able to run.  A
workflow that needs more information before being able to populate
child work requests should use :ref:`workflow callbacks
<workflow-callback>` to run the workflow orchestrator again when it is
ready.  (For example, a workflow that creates a source package and
then builds it may not know which work requests it needs to create
until it has created the source package and can look at its
``Architecture`` field.)
