Bootstrapping

Current
- Setup for CI
  - e.g., for GitHub Actions
  - Download and eval shell script
    - Sets several environment variables. TODO Need to check if these variables are script specific or used within the bootstrapping process.
- Setup for Docker
  - Dockerfile
  - Packages needed to fetch repos (VCS): https://github.com/orbital-transfer/launch-site/blob/master/maint/docker-debian-packages
- Bootstrap
  - https://github.com/orbital-transfer/p5-Orbital-Launch/blob/master/lib/Orbital/Launch/Command/Bootstrap.pm
- config:
  - Install directory
    - global install: no directory specified (so installs to perl -V:sitelib)
    - CI install: in ~/.orbital.
      - Only when $ENV{CI} is true
    - default: relative to launch-site repo
- steps:
  1. Install cpm (from vendor submodules) if not available
  2. Install cpanm using cpm.
  3. Install App::scan_prereqs_cpanfile using cpanm.
  4. Generate a cpanfile for every orbital-transfer submodule.
  5. Install dependencies from cpanfile for every orbital-transfer submodule using cpm then cpanm.
Future
- config:
  - Install to directory, upload to GitHub
    - upload to GitHub for cacheing
  - Install to directory at bootstrap time
    - determine the directory to install to using a class
- steps: same as before
  - Perhaps need to generate cpanfile in each repo, only install and run App::scan_prereqs_cpanfile if cpanfile missing.
- decreasing deps:
  - Need to outline a set of modules that are pure-Perl or can be translated to pure-Perl equivalents.
  - Some of these can be installed as the base dependencies of all the Orbital::Payload::* modules.
  - Others need to be set up to be installed based on context:
    - e.g., if loading on GitHub actions, then load everything needed there
  - Create a way of specifying features that require different sets of modules in a distribution.
  - Use a loader function to look up the features needed by a module and install if necessary.
  - Need to have a context for installation and loading which holds the directories for bootstrapping.

Workflow

Current

Diagram

Outline

Have two tasks:
- Install
- Test
What do these tasks do:
- Determine the platform
  - Platforms:
    - Debian + apt,
    - macOS + Homebrew,
    - Windows + MSYS2.
- Set up environment for everything (task: Install, Test).
  - Not great... this is specific to project-renard. It downloads the test data and sets an environment variable.
- Each platform has a _install method (task: Install).
  - Debian + apt: Install pkg:deb/debian/xvfb, pkg:deb/debian/xauth.
    - pkg:deb/debian/xauth is only needed for the xvfb-run script. xvfb-run seems to be distribution-specific (i.e., the script is not provided by the upstream xorg-server package but is packaged by some Linux distributions such as Debian and Fedora).
  - macOS + Homebrew:
    - Update Homebrew if needed (in non-CI environments). This is necessary because Homebrew can be slow to update and this slows down the CI. This might not be needed with newer versions of Homebrew.
    - Remove old Python: python@2 to clean up a pre-installed package. This is (was?) specific to GitHub CI and was causing a conflict with Python 3.
    - [SKIPPED] Installing xquartz. This might have been for an older version of gtk+3. Note that this requires adding a tap to install a cask. Good thing it isn't needed now but being able to use a Homebrew tap is still important.
    - Install pkg-config. Often needed for finding the libraries.
    - Install openssl.
  - Windows + MSYS2:
    - Prepare MSYS2:
      - Disable CheckSpace option for pacman. Checking the disk space takes time in the CI. This speed up is also used by the GitHub action https://github.com/msys2/setup-msys2.
      - Update mirror list. This was needed at some point to fix the update because the mirror was down. Need to look more into this as this was a hot fix at the time.
      - Run pacman update, then
        
        Update also requires killing processes that use msys-2.0.dll. This is the same as what installers of tools like Chocolatey and the aforementioned GitHub Action do.
        
        Install pacman-mirrors.
        
        Install git. Needed for below when fetching repos.
        
        There was a (now disabled) work around to update GCC9.
    - Install build tools using pacman.
    - Install openssl package using pacman.
    - Copy gcc to cc and mingw32-make to gmake. Not sure if this is needed anymore. This makes Perl XS builds happy.
    - Create C:\tmp for Data::UUID. Not fixed yet!
    - Install Perl. Fixes pl2bat because pl2bat.bat isn't there. Installs cpanm, cpm, and some basic Perl modules. Includes my own patches to cpm to help it work in parallel (not ready for upstream). Not great as this is Perl specific and really should be part of non-platform code. Who did this?! Oh, right.
- Each platform has a _pre_run method (task: Install, Test)
  - Debian + apt:
    - Start the Xvfb sever. Note that the reason why Xvfb is also needed for the Install task is that running dzil listdeps or dzil build needs a display because with project-renard/curie, these commands load the code. It runs Gtk3's init on loading the module! This is a bad design, but worth working around just to prove that it can be worked around. Run DISPLAY= perl -MDevel::Hide=Gtk3 -S dzil listdeps to see the issues. This is also why errors while building occur when these modules have not yet been installed in the CI environment. A very unsensible arrangement.
  - macOS + Homebrew: no-op.
  - Windows + MSYS2: no-op.
- Each platform has an environment (task: Install, Test)
  - Debian + apt: sets DISPLAY to the Xvfb started above.
  - macOS + Homebrew:
    - This environment depends on the Homebrew prefix.
    - Adds to PKG_CONFIG_PATH: openssl, libffi
    - Adds to PATH: openssl, gettext.
    - Sets ARCHFLAGS: This has something to do with using a macOS system Perl. Might not be needed with Homebrew. Bad idea to use the macOS system Perl anyway.
  - Windows + MSYS2:
    - Set MSYSTEM: default is MINGW64.
    - Set PATH: to the default paths for that MSYSTEM.
    - Set MSYS2_FC_CACHE_SKIP: skip font cache for fontconfig package.
    - Add hack that modifies Perl linking for EUMM: yikes! This isn't even in the same project! Boo this code!!!
      - Unsure what a proper fix for this would be. Each distribution links to different libraries and will need transformations, but these transformations are not so much of a special case that they can't be given a name (e.g., use :nosearch + :search around a particular -lfoo flag).
    - Set ALIEN_BUILD_PKG_CONFIG to prefer PkgConfig::CommandLine: This should probably be upstream.
- For each repo to install via depth-first walk (task: Install):
  - Get the repos using git. Note that git is already installed in most environments (except MSYS2 in the CI so that is why the _install for MSYS2 step installs git.
  - Install all the "native" dependencies of the repos first, then install the repos.
  - Each platform has an install_packages method to install the "native" packages (task: Install)
    - Debian + apt:
      - Check if all needed packages are already installed.
      - Otherwise apt-get update.
      - Then apt-get install.
      - Includes a hack to install meson via pip3. This is because the version of Meson in the specific Debian container used is too old. I would like to support this somehow for flexibility purposes. But not here. Not like this... not like this. This will actually break with newer systems due to how pip3 is now set up using an externally-managed-environment under PEP 668.
    - macOS + Homebrew:
      - Disable auto update. See above in _install for why this was slow on older Homebrew's.
      - Check if fontconfig is a dependency. If it is, use a hack to skip the font cache generation post-install step. Terrible. But the hack is still needed.
      - brew install any packages that are not already installed. Ignore errors by using || true. Alas!
    - Windows + MSYS2:
      - Uses Chocolatey and Pacman! Wait, I thought this was just Pacman for MSYS2! The horror! I believe this is for one specific thing which was to get testing of Anki working on Windows (because there is no way Anki is going to be installed via an MSYS2 package as the focus is on developer tools).
  - Install the repo itself (task: Install)
    - Right now this is only for Perl distributions.
      - via Orbital::Payload::Env::Perl->apply_roles_to_repo.
      - More coupling! Cut the knot!
    - There is some caching here.
    - $repo->install.
- Test the main repo (task: Test).
  - $repo->run_test.

Future

The tooling runs on a System
- Host System is where it all starts!
- Target System is what you want the output to run on.
  - Targets can be nested.
  - e.g., you need to run inside of a container inside of VM on another machine. For example, you have a build box with Windows VM and you need to use specific Windows Docker containers.
For host, target
- Need to identify the kernel, OS, architecture.
  - Note: some OSes are multi-arch or have a compatibility mode (x86_64 can run x86 code, some Windows could run 16-bit using NTVDM)
    - e.g., Debian supports multi-arch apt package settings.
  - Process:
    - Note: uname is from POSIX https://en.wikipedia.org/wiki/Uname
    - Kernel discovery process:
      - On Linux
        
        uname -s: name
        
        uname -r: release
        
        uname -v : version
    - OS discovery process:
      - On Linux
        
        uname -o
      - On DOS, Windows
        
        ver
    - Kernel Architecture
      - On Linux
        
        uname -m
      - Note: kernel is compiled for a particular architecture
      - This might not be what the CPU could potentially support.
      - e.g., a 32-bit kernel running on a 64-bit CPU machine
        
        this may limit what you can run
    - CPU architecture
      - CPU info
- Need to identify runtime environment (default: host and target are identical and local).
  - Identify which tools are available
    - Debian: apt, most likely GNU coreutils
    - MSYS2: pacman
    - Alpine: apk, Busybox (as opposed to GNU coreutils)
  - Process:
    - Depends on OS.
    - In some cases the runtime environment tools you want are not installed such as macOS and Windows that have multiple package managers.
- Scenario 1:
  - Goal: as discussed on the homepage, I would like to test on FreeBSD.
  - Step 1: Get the name of the Vagrant box to use.
  - Note: https://app.vagrantup.com/freebsd are official.
    - More info on release announcements like https://www.freebsd.org/releases/13.0R/announce/.
    - Official VM images can be part of the database.
For configuration
- Package dependencies
  - Have a way to say what you depend on and how to install it.
  - Examples:
    - Install Perl using:
      - Debian : apt-get install perl (pkg:deb/debian/perl)
      - Unix-likes : perlbrew (options for version and all compile flags), requires a C compiler
      - Windows : berrybrew (options for version and architecture)
      - Windows : chocolatey install strawberryperl (has options for version and architecture)
      - Linux | macoOS : https://github.com/skaji/relocatable-perl
      - Unix-likes : plenv
        
        extension for relocatable-perl: https://github.com/skaji/plenv-download
      - Inside of GitHub Actions: https://github.com/shogo82148/actions-setup-perl
        
        various options (threads, etc.)
      - Version info:
        
        https://metacpan.org/pod/CPAN::Perl::Releases
        
        https://metacpan.org/pod/CPAN::Perl::Releases::MetaCPAN
    - Install Ruby using:
      - Unix-likes: https://github.com/rbenv/rbenv
      - Unix-likes: https://github.com/rvm/rvm
      - https://phusion.github.io/traveling-ruby/ (Windows have to be made on Linux | macOS)
      - Windows : chocolatey
    - Install MSYS2 (Windows only)
      - Using chocolatey (options include path)
      - Inside of GitHub Actions: https://github.com/msys2/setup-msys2
  - This is necessary for different kinds of scenarios:
    - Scenario 1:
      - Goal: make sure that code works with the Perl interpreter and Perl modules packaged in Debian.
      - Given a project that depends on Perl modules (e.g., pkg:cpan/module/Foo)
      - Step 1: Map the Perl modules to Debian packages (e.g., pkg:deb/debian/libfoo-perl)
      - Step 2: Install the ones that map via Debian packages
      - Step 3: Install the rest using CPAN client.
      - Note: This implies the Perl being used is the /usr/bin/perl via Debian pkg:deb/debian/perl.
      - Example: used in the BioPerl CI.
      - Process:
        
        Need to have a way to enumerate Perl module dependencies.
        
        Enumerating dependencies can be done via different strategies (that need priority):
        
        cpanfile
        
        META.json
        
        Makefile.PL
        
        These are part of what CPAN clients can provide.
      - This mapping process can be done in a way that is tied to a particular platform version and stored as configuration data that can be automatically updated. This is useful when using a CI or Docker so that the dependencies are "baked in".
    - Scenario 2:
      - Goal: test different build options for Alien::Build.
        
        Most important: ALIEN_INSTALL_TYPE
        
        Others are listed in the documentation under the ENVIRONMENT heading.
      - Given a Perl Alien project that uses Alien::Build that is used to provide a library bar.so and headers bar.h.
      - Alternative 1.1: On Debian, Alien install type "system" needs pkg:deb/debian/libbar-dev to work.
      - Alternative 1.2: On macOS, Alien install type "system" needs pkg:brew/formula/bar to work.
      - Alternative 1.3: Alien install type "share" only needs compiler. This is tested on multiple OSes.
      - Example: many different Alien modules
      - Process:
        
        Need to have a way to define dependencies for different options / variants.
    - Scenario 3:
      - Goal: Select which CPAN mirror to download from.
      - Given a Perl project that has local patches to dists that it needs.
      - These patched dists are stored in a local private mirror.
      - Set the mirror used for particular packages.
      - Pin to a particular version.
      - Aside: List of all public CPAN mirrors http://mirrors.cpan.org/ (but this is no longer needed as CPAN uses a CDN now)
      - Aside: Important for security: also needed to set HTTPS (see cpanm docs for --mirror).
    - Scenario 4:
      - Goal: Faster builds using pre-configured images.
      - Given a project with dependencies A, B, C which are installable via the system package manager.
      - Step 1: Create a Docker image that install A, B, C if the image does not exist.
        
        May need to specify level of cacheing
        
        Cache by package names
        
        Cache by package names + available package versions.
        
        May want to use https://snapshot.debian.org/.
      - Step 2: Now run the rest of the installation inside of the Docker image.
      - Note: This will run faster on subsequent installations.
      - Note: May need to indicate what to do about checking for newer versions of A, B, C (update + upgrade)
  - When describing a dependency, there need to be different phases as mentioned on the front page. These dependencies can be static or dynamic.
  - A given software project can have multiple components that describe features or some subset of the project files. These can be mapped to different system-level packages (e.g., gtk+3 has documentation and introspection files; also gtk+3 has both libgtk and libgdk; libgdk itself supports multiple backends such as broadway).
  - An example of components could be documentation:
    - Debian's apt packages often have a separate -doc suffix package with optional documentation.
    - Language package managers have flags that can be used to turn off installing documentation.
    - It might be possible to also strip documentation from code, but this could remove copyright notices.
- Service dependencies
  - Scenario 1: Tests need X11
    - X11 can be run under various servers
      - If DISPLAY is set, have option
        
        Use current DISPLAY
        
        Use Xvfb (pkg:deb/debian/xvfb)
        
        Use Xephyr (pkg:deb/debian/xserver-xephyr) (may need a WM to run inside)
      - If DISPLAY is not set (headless environment)
        
        Set DISPLAY
        
        Use Xvfb (pkg:deb/debian/xvfb)
  - Scenario 2: Tests need PostgreSQL
    - Alternative 1: Use existing PostgreSQL on system.
    - Alternative 2: Use Docker https://hub.docker.com/_/postgres/.
      - Note: similar to official VMs, there can be a concept of official Docker images.
      - Docker images are usually controlled via environment variables.
        
        Need to model these and how they relate to configs.
        
        e.g., setting the DB password or a DB configuration setting.
    - Alternative 3: If already using Docker for running the tests separately, use Docker Compose.
    - Requires: connection string (host+port, db, auth)
      - PostgreSQL clients can use environment variables for this by default.
Testing of Orb database
- Need to be able to detect updates (e.g., new version of Debian, FreeBSD, etc. released)
- Self-tests (e.g., a configuration that uses that particular Orb).
- Need to have an archive of test reports for particular configurations.
  - Because doing integration tests on these could be heavy (time consuming, lots of compute resources).
  - Used to track stability across versions.
  - Check these into either the same repo (on a branch) or another one. Commit messages should contain metadata about which database version generated the report (this can be done with git notes).

Outline

Fetch repos:
- current repo is used as is
- Fetching dependencies have have several strategies
  - if have git
    - Have metadata to repo elsewhere on FS
      - if repo is git and have git, can use git worktree
      - Trick that I use: to do processing on a separate FS, check out a fresh copy of the repository to a git worktree.
    - else git clone
  - else
    - fetch over HTTP using host specific tarball?

Definitions

Platform
- Note that the definition of platform used here is different from the general definition. A more specific term could be "distribution platform" which defines a minimal set needed to describe a system along with its operating system, runtime (usually libc), userspace, and package manager. Some platforms do not have the userspace nor the package manager.
- Examples
  - Alpine Linux: Linux + musl libc runtime + busybox userspace
- NOTE: package names can change from one version of a platform to another so there needs to be a concept of ranges.
- Platforms are themselves are software projects.
- Platforms can change their components from one version to another (but less likely):
  - Android's userspace changed from BusyBox to Toybox.
  - macOS changed its default shell from bash to zsh.
  - A more likely change that should be modeled is that specific releases of a Linux distribution (such as RHEL) use a particular version of glibc. This is often used as a target for long-term support for binary packages.
- Stacking / relating platforms:
  - Multi-OS package managers: Homebrew runs on both macOS and Linux so this could be interesting in the sense that some platforms could be stacked on another one.
  - Ubuntu and other platforms derive from Debian. These share many packages and package names. This can be indicated with a inheritsPackagesFrom relationship.
  - Not all packages are inherited the same way: for example, Ubuntu's firefox package for recent versions of Ubuntu are transitional packages that install Firefox via Snap (other packages like this have "snap" in their version name).
- Some package names have a suffix referring to ISO language codes for installing locale data. There should be a way to indicate this declaratively.
Task
- Something that can be run.
  - exec() calls
  - code
- Can succeed or fail.
  - A future?
- Has dependencies.
- Has inputs / outputs
  - Input:
    - stream: stdin
    - Can be variables (e.g., parameters), directories, files
  - Outputs:
    - streams: stdout/stderr
    - derived from input
- Dependency graph:
  - e.g., Task B depends on A.output0
- Tasks can have a mergeable role.
- Tasks can have a retryable role.
- Tasks can have subtasks?
  - A graph of subtasks?
- Can have specialised tasks for CIs
  - e.g., on GitHub Actions, this would indicate which OS the task is running on
- Model generic tasks such as
  - Fetch HTTPS URI and output on stdout:
    - curl -fsSL URI
  - Fetch a Git repo from GitHub given a SHA1 while ignoring the git history.
    - Can be done via HTTPS without a git client.
    - NOTE: This does not include submodules! Most repos don't use submodules, but this makes the tarball useless for those that do.
- Start by modeling tasks as simple commands
  - These do not have the full semantic modeling described later.
Resources
- lockable / mutex
  - Some resources can only be accessed for writing by one task at a time, e.g., installing CPAN modules to a library directory, running apt-get.
- More examples: files, directories, devices, ports, etc.
Scheduling
- https://en.wikipedia.org/wiki/Coffman–Graham_algorithm
- https://metacpan.org/pod/Bencher::Scenario::GraphTopologicalSortModules
  - https://metacpan.org/pod/Data::Graph::Util
  - https://metacpan.org/pod/Algorithm::Dependency
Workflow graph
- DAG
  - Maybe in the future use strongly connected components to do condensation of any directed graph?
Task::Apt::Install
- Definition of Task::Apt::Install
  - apt-get install [-y] [package-list]
    - where -y option is boolean and optional.
    - where package-list option is array and required.
- Given two tasks of type Task::Apt::Install,
  - Mutually exclusive: can not run multiple instances of Task::Apt::Install at the same time since they all require access to the dpkg lock.
    - note: the reason they are mutually exclusive should not be limited to just because they are the same type — we should look at the resources they are trying to lock
  - Mergeable: if the options other than package-list are the same, then they can be merged by merging the arrays of package-list.
Package (or Pkg)
- An installable unit.
- What gets installed varies by package type (different packaging management systems).
  - e.g., libfoo on one system might be equivalent to separate libfoo and libfoo-dev packages on another
  - perhaps be able to identify categories of files installed
    - work with pkg-config to find -I and -L
- Has dependencies.
- Dependencies should be resolvable.
- Can have different enabled features
- define data declaratively
  - it can be updated
  - it can be modeled
  - it does not have code
  - Q: how to map on to particular operations
    - Perhaps name a particular task and have different strategies to implement it?
  - Should be equally or more descriptive than the package managers that it wraps.
- bug trackers / submission
  - preferred bug tracker and non-preferred alternatives (e.g., RT, GitHub, GitLab, mailing lists)
Data
- RDF model?
  - provides graph
  - easy serialisation
  - existing ontologies
    - DOAP
      - https://en.wikipedia.org/wiki/DOAP
      - https://github.com/ewilderj/doap
    - SPDX
    - CycloneDX
      - https://cyclonedx.org/
  - link up with security advisories
  - have modules provide RDF representation?
    - see also the old Moose + Semantic Web work, LinkML + dataclasses codegen work
  - can track provenance of data
    - which graph it came from
    - was it entered via a tool or manually
    - maybe tie data changes to a commit?
    - PROV (Provenance) - Wikipedia
- Need to model command line tools
  - See PaSh https://binpa.sh/
    - https://nikos.vasilak.is/p/pash:osdi:2022.pdf
    - https://arxiv.org/pdf/2012.15422.pdf
    - Interesting possibility to know what can be parallelised.
  - can this be used to provide completion?
    - sub-commands
    - type-aware (getopt types, paths, etc.)
    - some completion is interdependent on other parts of the partial command
    - some completion depends on state (e.g., git needs to access the current repo's remotes and tracked file state) so completion can not just be static enumerations.
    - see zsh, bash completion
    - PowerShell ISE
    - Which flags show a dry-run of the action and how to parse that?
  - Possible to extract from --help, from man(1) page, from calls to getopt(3)?
  - Modeling should include version numbers and different specs (input, output, etc.) for different versions of a software.
    - what if different packages provide a conflicting command name?
    - example: GNU Parallel and moreutils both provide a parallel. In Debian, this is fixed by renaming one to parallel.moreutils.
  - Testing of these models can be automated via tracing.
    - e.g., a particular parameter refers to a resource (path, port). This will also be used in a syscall by the program.
- Need to model environment variables:
  - Each language has different ways of access environment variables:
    - C: getenv(3);
    - Perl: %ENV, Env.pm;
    - Python: os.environ[], os.getenv().
  - Documentation usually has an ENVIRONMENT section.
- Need to model build options:
  - autoconf can be parsed.
  - See how tools like Spack, Gentoo do it?
Some projects need external data
- Perhaps these can also be modeled as "packages"?
- "Data packages" as opposed to "Software packages"
- There are several solutions to this including the concept of version control for data (e.g., DVC, LakeFS).
- Trick: using GNU Stow to manage symlink to data directory structure while keeping the data as read-only. This can be used for host-specific data (e.g., .env deployment configurations by $HOSTNAME) and to avoid accidentally deleting data when cleaning project directory (e.g., make clean, git clean).
- ```
              cd "$(git rev-parse --show-toplevel)" \
              && stow --target . --dir ../merge .
```

Bootstrapping

Workflow

Current

Diagram

Outline

Future

Outline

Definitions

Organisation of project

CI of project modules

Roadmap