.. _ontology:

========
Ontology
========

-------------
For Artifacts
-------------

The "type" categorization of artifacts does not enforce anything on the
structure of the associated files and key-value data. However, some
consistency and rules are needed to make meaningful use of the system.

This document presents the various types that we use to manage a
Debian-based distribution. For each type, we explain:

* the expected structure of the "slug" (string identifier)
* what associated files you can find
* what key-value data you can expect
* what relationships with other artifacts are likely to exist

Type ``source-package``
=======================

This artifact represents a set of files that can be extracted in some
way to provide a file hierarchy containing source code that can be built
into ``binary-packages`` artifact(s).

* Slug: *name* _ *version*
* Data:

  * name: the name of the source package
  * version: the version of the source package
  * type: the type of the source package

    * ``dpkg`` for a source package that can be extracted with ``dpkg-source -x`` on the ``.dsc`` file

  * dsc-fields: a parsed version of the fields available in the ``.dsc`` file

* Files: for the ``dpkg`` type, a ``.dsc`` file and all the files
  referenced in that file
* Relationships: none
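
As a minimal sketch, a ``source-package`` artifact following the rules above
could be described as shown below. The top-level dictionary layout
(``type``, ``slug``, ``data``, ``files``), the helper name
``source_package_slug`` and the ``hello`` package values are assumptions made
for illustration. The sketch also assumes that the slug joins the name and
the version with a single underscore.

.. code-block:: python

    def source_package_slug(name: str, version: str) -> str:
        """Build the slug as the name, an underscore, then the version."""
        return f"{name}_{version}"


    # Hypothetical in-memory representation of the artifact; only the keys
    # under "data" come from the type description, the rest is illustrative.
    source_package = {
        "type": "source-package",
        "slug": source_package_slug("hello", "2.10-3"),
        "data": {
            "name": "hello",
            "version": "2.10-3",
            "type": "dpkg",
            # Parsed fields of the .dsc file (truncated example values).
            "dsc-fields": {"Source": "hello", "Version": "2.10-3"},
        },
        # The .dsc file and the files it references.
        "files": [
            "hello_2.10-3.dsc",
            "hello_2.10.orig.tar.gz",
            "hello_2.10-3.debian.tar.xz",
        ],
        "relationships": [],
    }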

Type ``binary-packages``
========================

This artifact represents the set of binary packages (``.deb`` files and
similar) produced during the build of a source package for a given
architecture.

If the build of a source-package produces binaries of more than one
architecture, one ``binary-packages`` artifact is created for each
architecture, listing only the binary packages for that architecture.

* Slug: *srcpkg-name* _ *version* _ *architecture*
* Data:

  * srcpkg-name: the name of the source package
  * srcpkg-version: the version of the source package
  * version: the version used for the build (can be different from the
    source version in case of binary-only rebuilds)
  * architecture: the architecture that the packages have been built for.
    Can be any real Debian architecture or ``all``.
  * packages: the list of binary packages that are part of the build
    for this architecture.

* Files:
* Relationships:

  * built-using: the corresponding ``source-package``
  * built-using: other ``binary-packages`` (for example in the case of
    signed packages duplicating the content of an unsigned package)
  * built-using: other ``source-package`` (general case of Debian's
    ``Built-Using`` field)
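
The "one artifact per architecture" rule can be sketched as follows. This is
an illustration under stated assumptions, not the actual implementation: the
function name ``binary_packages_artifacts``, the returned dictionary layout
and the idea of deriving the architecture from the usual
``package_version_arch.deb`` file naming are all hypothetical.

.. code-block:: python

    from collections import defaultdict


    def binary_packages_artifacts(srcpkg_name, srcpkg_version, deb_filenames,
                                  version=None):
        """Group binary packages by architecture, one artifact per group."""
        # The build version only differs from the source version for
        # binary-only rebuilds.
        version = version or srcpkg_version

        by_arch = defaultdict(list)
        for filename in deb_filenames:
            stem = filename.rsplit(".", 1)[0]  # drop ".deb" / ".udeb"
            package, _pkg_version, arch = stem.split("_")
            by_arch[arch].append(package)

        return [
            {
                "type": "binary-packages",
                "slug": f"{srcpkg_name}_{version}_{arch}",
                "data": {
                    "srcpkg-name": srcpkg_name,
                    "srcpkg-version": srcpkg_version,
                    "version": version,
                    "architecture": arch,
                    "packages": packages,
                },
                "relationships": [
                    {"type": "built-using",
                     "target": f"{srcpkg_name}_{srcpkg_version}"},
                ],
            }
            for arch, packages in sorted(by_arch.items())
        ]


    artifacts = binary_packages_artifacts(
        "hello",
        "2.10-3",
        [
            "hello_2.10-3_amd64.deb",
            "hello-doc_2.10-3_all.deb",
            "hello-dbgsym_2.10-3_amd64.deb",
        ],
    )

Here the build produces two artifacts: one for ``all`` and one for ``amd64``.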

Type ``source-upload``
======================

* Slug: random uuid
* Data:

  * type: the type of the source upload

    * ``dpkg``: for an upload generated out of a ``.changes`` file created
      by ``dpkg-buildpackage``

  * repository: the target repository (corresponds to the ``Distribution`` field
    from the usual ``.changes`` file)
  * changes-fields: a parsed version of the fields available in the
    ``.changes`` file

* Files:

  * a ``.changes`` file

* Relationships:

  * extends: one ``source-package``
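
A ``source-upload`` artifact could be assembled along these lines. This is
only a sketch: the function name ``source_upload_artifact``, the dictionary
layout and the assumption that the ``.changes`` fields have already been
parsed into a plain dictionary are all hypothetical.

.. code-block:: python

    import uuid


    def source_upload_artifact(changes_fields, changes_filename,
                               source_package_slug):
        """Describe an upload built from an already parsed .changes file."""
        return {
            "type": "source-upload",
            # The slug is a random UUID.
            "slug": str(uuid.uuid4()),
            "data": {
                "type": "dpkg",
                # The target repository comes from the Distribution field.
                "repository": changes_fields["Distribution"],
                "changes-fields": dict(changes_fields),
            },
            "files": [changes_filename],
            "relationships": [
                {"type": "extends", "target": source_package_slug},
            ],
        }


    upload = source_upload_artifact(
        {"Source": "hello", "Version": "2.10-3", "Distribution": "unstable"},
        "hello_2.10-3_source.changes",
        "hello_2.10-3",
    )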

Type ``binary-upload``
======================

* Slug: random uuid
* Data:

  * type: the type of the binary upload

    * ``dpkg``: for an upload generated out of a ``.changes`` file created
      by ``dpkg-buildpackage``

  * repository: the target repository (corresponds to the ``Distribution`` field
    from the usual ``.changes`` file)
  * changes-fields: a parsed version of the fields available in the
    ``.changes`` file

* Files:

  * a ``.changes`` file

* Relationships:

  * extends: one (or more) ``binary-packages``

Type ``repository``
===================

Represents an APT repository.

* Slug: *codename*, or *origin* : *codename* if we have to handle multiple vendors
* Data:

  * base-url: the base URL of a repository (e.g. "http://deb.debian.org/debian")
  * codename: the codename of the repository (e.g. "sid")
  * possibly, other data extracted from the ``InRelease`` file:

    * components: list of components available
    * architectures: list of architectures available
    * origin: string identifier for the repository owner

* Files:

  * Multiple JSON-encoded files making it easy to browse the content of
    the repository. Exact format to be determined.

* Relationships:

  * extends: (optional) another ``repository`` (e.g. "experimental"
    extends "unstable")
  * includes: many ``binary-packages`` and ``source-package``
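
The two slug forms can be illustrated with a small helper. The function name
``repository_slug`` is hypothetical, and the sketch assumes that the origin
and the codename are joined with a colon and no surrounding spaces.

.. code-block:: python

    def repository_slug(codename, origin=None):
        """Return "codename", or "origin:codename" when an origin is given."""
        return f"{origin}:{codename}" if origin else codename


    # Hypothetical description of a repository that extends another one.
    experimental = {
        "type": "repository",
        "slug": repository_slug("experimental"),
        "data": {
            "base-url": "http://deb.debian.org/debian",
            "codename": "experimental",
        },
        "relationships": [
            {"type": "extends", "target": repository_slug("unstable")},
        ],
    }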

---------
For Tasks
---------

While tasks are unique in theory, we can have different tasks sharing
some commonalities. In the Debian context in particular, we have different
ways to build Debian packages with different helper programs (sbuild,
pbuilder, etc.) and we want those tasks to reuse the same set of
parameters so that they can be called interchangeably.

This public interface is materialized by a generic task that can be
scheduled by the users and that will run one of the available
implementations that can run on one of the available workers.

This section documents those generic tasks and their interface.

Task ``PackageBuild``
=====================

A generic task to represent a package build, i.e. the act of transforming
a source package (.dsc) into binary packages (.deb).

The ``task_data`` associated with this task can contain the following keys:

* ``input`` (required): a dictionary of values describing the input data

  * ``source_package_url`` (required): a URL pointing to a source package
    (``.dsc`` file), used to retrieve the source package to build. It can
    be a publicly accessible URL or a debusine artifact (which might be
    private but accessible with a token).
  * ``checksums`` (optional): a dictionary of checksum data.

    * ``sha256sum``: SHA256 checksum of the file at ``source_package_url``.

* ``distribution`` (required): name of the target distribution
* ``extra_repositories`` (optional): a list of extra repositories to enable.
  Each repository is described by a dictionary with the following
  possible keys:

  * ``sources_list``: a single line for an APT ``sources.list`` file
  * ``authentication_key`` (optional): the ascii-armored public key used to
    authenticate the repository

* ``host_architecture`` (required): the architecture that we want to build
  for. It defines the architecture of the resulting architecture-specific
  .deb files (if any).
* ``build_architecture`` (optional, defaults to the host architecture):
  the architecture on which we want to build the package (implies
  cross-compilation if different from the host architecture). Can be
  explicitly set to the undefined value (Python's ``None`` or JavaScript's
  ``null``) if we want to allow cross-compilation with any build architecture.
* ``build_components`` (optional, defaults to ``any``): list that can contain
  the following 3 words (cf ``dpkg-buildpackage --build=any,all,source``):

  * ``any``: enables build of architecture-specific .deb
  * ``all``: enables build of architecture-independent .deb
  * ``source``: enables build of the source package (.dsc)

* ``build_profiles``: list of build profiles to enable during package build (cf
  ``dpkg-buildpackage --build-profiles``)

* ``build_options``: value of ``DEB_BUILD_OPTIONS`` during build
* ``build_path`` (optional, default unset): forces the build to happen
  through a path named according to the passed value. When this value
  is not set, there's no restriction on the name of the path.
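
To make the interface more concrete, here is a minimal sketch of a
``task_data`` dictionary together with a helper that checks the required keys
and applies the documented defaults. The helper names
``normalize_package_build`` and ``checksum_matches``, the example URL and the
``bookworm`` distribution name are assumptions made for illustration; they
are not part of the documented interface.

.. code-block:: python

    import hashlib

    ALLOWED_COMPONENTS = {"any", "all", "source"}


    def normalize_package_build(task_data):
        """Check required keys and fill in the documented defaults."""
        for key in ("input", "distribution", "host_architecture"):
            if key not in task_data:
                raise ValueError(f"missing required key: {key}")
        if "source_package_url" not in task_data["input"]:
            raise ValueError("missing required key: input.source_package_url")

        normalized = dict(task_data)
        # Defaults to the host architecture. An explicit None is preserved:
        # it means that any build architecture is acceptable.
        normalized.setdefault("build_architecture",
                              task_data["host_architecture"])
        normalized.setdefault("build_components", ["any"])

        unknown = set(normalized["build_components"]) - ALLOWED_COMPONENTS
        if unknown:
            raise ValueError(f"unknown build components: {sorted(unknown)}")
        return normalized


    def checksum_matches(content, checksums):
        """Compare downloaded bytes against the optional sha256sum entry."""
        expected = checksums.get("sha256sum")
        if expected is None:
            return True  # nothing to verify
        return hashlib.sha256(content).hexdigest() == expected


    task_data = {
        "input": {
            # Hypothetical URL, for illustration only.
            "source_package_url": "https://example.org/hello_2.10-3.dsc",
        },
        "distribution": "bookworm",
        "host_architecture": "amd64",
        "build_components": ["any", "all"],
    }
    normalized = normalize_package_build(task_data)

With this input, ``normalized["build_architecture"]`` falls back to the host
architecture because the key was not provided.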
