What is Orbital Transfer?
Orbital Transfer is a set of tools made to help developers manage their execution environments. This is similar to what an IDE does when setting up a project, but it goes further by helping to declaratively specify the project's relationship to other parts of the system, both under the developer's control and not.
Goals
Relationship to other code
A distribution of code does not exist in a standalone environment. It has dependencies on other aspects of the system it runs on, such as the architecture, operating system, paths to libraries, and configuration settings.
Poly-repo versus mono-repo
Goal: Declare the relationship between source code repositories.
Usually, a large project is broken into modules that can be used and tested independently of other parts of the project. To keep track of changes to these modules, developers decide whether to manage changes to them together (mono-repo architecture) or apart (poly-repo architecture) (see more).
The reason why a mono-repo may be used is that it makes it easier to advance changes that cut across modules at a single time. When modules depend on each other, such as `B -(depends on)⟶ A`, making a change in `A` would also mean possibly updating `B`. Keeping these changes together means that advancing versions is easier, especially if there are further modules `C, D, E -(depends on)⟶ B` which would also possibly need to be updated. Though one could say that if a change in one module causes so many changes downstream, the modules are too highly coupled.
However, a poly-repo is not that much harder to work with than a mono-repo if one has the right tools. One could use branches and declare that a particular branch can only work with changes before/past a certain point (e.g., version, tag) on a dependency. Applying this to multiple repositories at once just requires specifying at which point Δ the change occurs, then specifying that everything before the change must be < Δ and everything after must be ≥ Δ. To automate this, the tool needs access to the dependency graph between the various repos that make up the poly-repo architecture.
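As a rough sketch of how that automation might work, here is a minimal fixpoint walk over a dependency graph; the repo names and graph structure are purely illustrative:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical dependency graph for a poly-repo project:
# each repo lists the repos it depends on.
my %depends_on = (
    B => ['A'],
    C => ['B'],
    D => ['B'],
    E => ['B'],
);

# Given a change point Δ in a repo, find every repo that transitively
# depends on it; each of those needs a branch constrained to ≥ Δ.
sub dependents_of {
    my ($target) = @_;
    my %found;
    my $changed = 1;
    while ($changed) {
        $changed = 0;
        for my $repo (keys %depends_on) {
            next if $found{$repo};
            for my $dep (@{ $depends_on{$repo} }) {
                if ($dep eq $target || $found{$dep}) {
                    $found{$repo} = $changed = 1;
                    last;
                }
            }
        }
    }
    return sort keys %found;
}

print "Repos needing a >= delta branch: ",
      join(', ', dependents_of('A')), "\n";
# Repos needing a >= delta branch: B, C, D, E
```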
Of course, this technical difference does not account for the communication dynamics of the organization developing the software (see Conway's law).
Dependencies in other packages
Packages also depend on code outside of the developer's direct control. These can apply to the entire system or just to the language environment that the code is being developed in.
System/distribution package management
Goal: Declare system packages.
Often IDEs and language tooling only know about language-level dependencies and help the developer resolve those; however, system-level dependencies are an important part of testing and deployment.
Often these dependencies are things that are compiled for the entire system to use, such as GUI toolkits, libraries, or services that need to be accessed by the code (network servers, databases). Installing these often requires administrator permissions, system restarts, and system-specific commands/paths, so knowing the specifics of the system you are working with is necessary.
This system-specificity also applies to the package names themselves: for example, the GTK+3.0 library is known as `libgtk-3-0` on Debian, `gtk3` on Fedora, etc. (see more), and there are further packages built from the same source that contain features such as development files (headers, debugging builds, type maps) or documentation.
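A minimal sketch of the name-mapping metadata such a tool would need; the entries and schema here are illustrative, not a complete mapping:

```perl
use strict;
use warnings;

# Illustrative name-mapping metadata: a distribution-neutral name on
# the left, distribution-specific package names on the right, split
# by the kind of files each package provides.
my %package_map = (
    gtk3 => {
        debian => { runtime => 'libgtk-3-0', devel => 'libgtk-3-dev' },
        fedora => { runtime => 'gtk3',       devel => 'gtk3-devel'   },
    },
);

my ($distro, $kind) = ('debian', 'devel');
print "Install: $package_map{gtk3}{$distro}{$kind}\n";   # Install: libgtk-3-dev
```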
Language package management
Goal: Standardize how language packages are installed and declared.
Tooling for language package management is well known to developers, and it can be used in many ways, such as installing at the system level, per-user, or per-project, each with benefits and drawbacks.
System packages are more for end-user use than language packages are. An application could be distributed through a language package manager, but it is more likely that language packages are installed to something like a `vendor/` directory.
Developers tend to modify and extend language packages more than system packages, which means that tasks such as version pinning, custom builds, and patching are more likely to be used there.
Furthermore, language packages can themselves have system packages that they depend on. If my application ultimately depends on a language package that is a binding for something provided by a system package, it would be best to declare `<binding-package> -(depends on)⟶ <system-package>` instead of `<application> -(depends on)⟶ <system-package>`. For example, an application that uses Gtk+3 through the language package Gtk3.pm would have the following dependencies:
```
perl:Gtk3 ⟶ perl:Glib::Object::Introspection (*)
perl:Gtk3 ⟶ debian:gir1.2-gtk-3.0 (runtime)
perl:Glib::Object::Introspection ⟶ perl:Glib (*)
perl:Glib::Object::Introspection ⟶ debian:libgirepository1.0-dev (build time)
perl:Glib::Object::Introspection ⟶ debian:libgirepository-1.0-1 (runtime)
perl:Glib ⟶ debian:libglib2.0-dev (build time)
perl:Glib ⟶ debian:libglib2.0-0 (runtime)
```
Instead of listing these in each application that might use `perl:Gtk3`, it would be better to have a package metadata repository to make these dependencies reusable.
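A sketch of what one reusable entry in such a metadata repository might look like, built from the dependencies above; the schema and key names are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical schema: each language package records its dependencies
# on other language packages and on system packages, keyed by the
# distribution and the phase in which they are needed.
my %metadata = (
    'perl:Gtk3' => {
        language_deps => [ 'perl:Glib::Object::Introspection' ],
        system_deps   => {
            debian => { runtime => [ 'gir1.2-gtk-3.0' ] },
        },
    },
    'perl:Glib::Object::Introspection' => {
        language_deps => [ 'perl:Glib' ],
        system_deps   => {
            debian => {
                build   => [ 'libgirepository1.0-dev' ],
                runtime => [ 'libgirepository-1.0-1' ],
            },
        },
    },
);
```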
Cross-language dependencies
Goal: Declare dependencies across languages for polyglot projects.
Sometimes you are not using a single language in a project and need to be able to prepare a development environment that contains modules from multiple language ecosystems. Language package managers usually only know about the specific language/runtime they are written for, and developers might not be able to use system packages either, because those are outdated or do not even exist.
Being able to specify dependencies on other language ecosystems and install those modules under a project environment allows them to be distributed together for deployment.
This can be made easier than remembering the flags and settings for each of the language package managers (e.g., Maven, NPM, CPAN, pip, gem, etc.) and language-specific environment managers (e.g., `rbenv`, `pyenv`, etc.). Instead of having to write a shell-specific script that combines these tools in an ad hoc fashion for each project that needs cross-language dependencies, one could generate a configuration file (which can be read by any language) that specifies where to find the installation path for each component.
For me, this was motivated by trying to use packages for natural language processing in Java and Python. Some of these are used mostly as libraries and others can be used as servers. As opposed to working with C/C++ libraries which can be installed to a prefix and are often self-contained, packaging something like Java requires also packaging dependencies and having access to the JVM.
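Continuing that example, here is a minimal sketch of generating such a language-neutral configuration file; the `.project-env.json` name, its layout, and the component names are all hypothetical:

```perl
use strict;
use warnings;
use JSON::PP;   # core module

# Hypothetical layout: record where each component was installed so
# that any language with a JSON parser can locate it.
my %env = (
    'java:corenlp' => { prefix => 'vendor/java/corenlp' },
    'python:spacy' => { prefix => 'vendor/python'       },
);

open my $fh, '>', '.project-env.json' or die "open: $!";
print {$fh} JSON::PP->new->pretty->canonical->encode(\%env);
close $fh;
```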
Why not IDEs or configuration management?
An IDE could be configured to do many of the same tasks that I have outlined, but then that information would likely be in the IDE's format and not easily shared with others who do not use that particular IDE. This information is also useful for running tasks in an environment where there is no IDE, such as a continuous integration server.
Some of what I'm proposing could be done with configuration management (e.g., CFEngine, Chef, Puppet, Ansible), but those are usually for whole systems so the recipes aren't written in quite the way that would make sense for a development environment. These configuration management tools often do not understand the underlying build environments or languages and are mostly used to deploy a set of pre-built packages.
I'm trying to fit into a niche inside of DevOps, but one that is more dev-centric. I am trying to do the tasks an IDE would do if it knew more about the ecosystem of the current project. Furthermore, IDE/editor plugins could certainly be created.
Reproducible builds
Goal: Help with creating packages/installers for deployment.
While some of DevOps has focused on creating VM images/containers that contain the final environment for a service/application, I believe there is still room for tools that help with creating packages.
I could use a particular cross-platform package manager (e.g., Spack, Conda, Nix, Homebrew, RPM (e.g., via alien)) to do some of what I have outlined in terms of dependency management, but using those tools is similar to creating your own system packages. This limits you to the platforms that the package manager supports. One could certainly use the dependency metadata to help create system packages for those package managers, but creating system packages makes more sense later in the development pipeline when deploying, not during day-to-day development.
A combinatorial explosion
Goal: Simplify complex sequences and combinations of development tasks by making them declarative.
When doing a development task, there is often some setup involved before it can be run. This setup of an environment (paths, environment variables) is often placed in a script that is sourced prior to starting work. Such scripts tend to be specific to the system they were written for, with hard-coded paths, so they cannot be shared when working in a different configuration.
A declarative approach could help with setting up the appropriate environment before running a task.
For example, maybe I want to run `git-bisect` on some code on FreeBSD, but I'm on a Linux host. What if the tool knew how to download and start a virtual machine with FreeBSD, install everything the project needs for FreeBSD, and then run `git-bisect` inside the virtual machine? I could just let it run, come back in an hour, and have the results.
Instead of having to write a bunch of task-specific configuration code to glue something together, I could have a configuration such as:

```
Host: Linux x86_64
Target: FreeBSD 12.0 x86_64
Package-Override:
  package: libfoo
  version: 1.23p24
Repo: foobar.git
Build-Type: Debug
Build-Options:
  - ASAN: on
  - Optimize-Level: 0
Task: git-bisect
Good version: git SHA
Bad version: git SHA
Test: prove
Strategy: *
```
and the tool could figure out a strategy to fulfill that by using Vagrant with a FreeBSD image while retrieving all the project-specific variations by analyzing the repository's files and repo-specific metadata. This task is not difficult to do manually, it's just mindlessly repetitive and could be replaced by a template.
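A rough sketch of the glue such a strategy might generate, assuming a Vagrantfile with a FreeBSD box already exists and the repository is synced to `/vagrant` (Vagrant's default synced folder):

```perl
use strict;
use warnings;

# Illustrative glue: bring up the FreeBSD guest and run git-bisect
# inside it. A real strategy would also install the project's
# FreeBSD dependencies before the run.
my ($good, $bad) = @ARGV;   # known-good and known-bad git SHAs

system(qw(vagrant up)) == 0 or die "vagrant up failed\n";

my $script = join ' && ',
    'cd /vagrant',
    "git bisect start $bad $good",     # bad SHA first, then good
    'git bisect run prove -l t/';      # prove's exit code drives bisect

system('vagrant', 'ssh', '-c', $script) == 0
    or warn "bisect run exited nonzero\n";
```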
I don't want to have to repeat the same configuration information multiple times for different kinds of environments and each aspect of the environment adds to the combination that needs to be supported:
- Runtime environment: local system (`chroot`), virtual machines (Vagrant), emulators (QEMU), containers (Docker, LXC), compatibility layers (Wine, NetBSD Binary Emulation), WSL1/WSL2
- Architectures: ARM 32/64, x86_64
- Targets: WebAssembly, cross-compilation
- CI/CD pipelines: Jenkins, Travis CI, AppVeyor, GitHub Actions, GitLab CI/CD, Azure Pipelines
- IDEs: local IDEs, cloud IDEs (Gitpod, Eclipse Che / Codenvy), notebooks (Project Jupyter, Binder)
- Operating systems: GNU/Linux (Debian, Fedora, SuSE), macOS (Homebrew, MacPorts, Fink), Windows (MSYS2, vcpkg), *BSDs
- Toolchains: `gcc`, `clang`, Visual Studio Build Tools, Oracle Developer Studio (Sun Studio), Open Watcom C/C++, Embarcadero C++Builder, Intel C++
- Wrappers: `ccache`, `distcc`
- Build systems: CMake, Gradle
- Remote transport: SSH, WinRM
- Servers: HTTP, DBs, X11, VNC
- Configuration: files, environment variables (e.g., application-specific such as `PORT`, libc/linker-specific such as `LD_PRELOAD` and `LD_DEBUG`)
This is not hopelessly complex, as each aspect has a straightforward way of being used. One just has to figure out how to combine each of these straightforward pieces.
One example of what I would like to do is generate configurations for cloud IDEs. Several of these allow for configuring a project by using a YAML configuration; by placing that configuration in a repository, you can start writing and testing that project in your browser. It would be useful to only specify configuration metadata once and output service-specific configurations. It is also possible to support weird combinations that only make sense in a cloud IDE context such as using VNC for X11 application output and running a JavaScript VNC client — allowing one to develop graphical desktop applications in a cloud IDE.
Refactoring tools
I am still thinking about how I want to do refactoring tools, but I have written a tool that lets me rename and move code between repositories. It makes use of Git branches and separates groups of repository moves and groups of renames into separate commits so that the changes can be tracked.
Martin Fowler writes that when he refactors, he does not always have automated tools and that he does not always need them.
Kent Beck writes that he tells people to "stop thinking" when refactoring. He also mentions separating "vertical refactoring" from "horizontal refactoring".
Design
Bootstrapping
One of the first questions to answer is: how can Orbital Transfer install itself on a fresh system? This question is important when provisioning virtual machines, as everything needed to run the software is not necessarily available at first.
Right now the code "launches" Orbital Transfer by checking which package manager the system has and using it to install the base interpreter, then scanning the Orbital Transfer code itself for dependencies and installing those.
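As a sketch of the dependency-scanning half of that step, assuming the code lives under `lib/` and `cpanm` is available:

```perl
use strict;
use warnings;
use File::Find;

# Collect `use Some::Module;` statements from the project's own source,
# then hand the list to cpanm, which skips anything already installed.
my %modules;
find(sub {
    return unless /\.p[lm]\z/;
    open my $fh, '<', $_ or return;
    while (my $line = <$fh>) {
        # Module names start with an uppercase letter, which also
        # skips pragmas such as `use strict;`.
        $modules{$1} = 1 if $line =~ /^\s*use\s+([A-Z][\w:]*)/;
    }
}, 'lib');

system('cpanm', '--quiet', sort keys %modules) if %modules;
```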
Phases
Some dependencies are only needed in certain phases of development. Furthermore, those phases may need certain environments for running.
Right now I have defined the phases:
- Author: Needed to generate a source code distribution from code in the repository. Not all code necessarily has an authoring phase.
- Configuration/Build: Needed to build the code from the source code distribution and place it in its final install location.
- Test: Needed for testing the code before it has been installed.
- Runtime: Needed for when the installed code is run.
For dependencies, it may be necessary to remove Build dependencies before running the Test phase, to verify that the runtime is clean and that dependencies are fully specified. This separation is useful when people build on one machine and then deploy on another. One way of doing this is to use one container to build and another to run the tests, with shared build artifacts.
Another feature of these phases is that they may each need a different environment. For example, installing system packages usually requires root access, while running tests is usually done as a non-privileged user. A test may require certain environment variables to be set, such as flags enabling code coverage, flags for a system service, API keys, or paths to test data. Another project may have tests that are only available when some other service is running before the tests start.
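A sketch of how phases and their environment requirements might be declared; the schema, user names, and variables here are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical phase declarations: who runs each phase, what it
# depends on, and what must be in the environment first.
my %phases = (
    build => {
        user => 'root',   # installing system packages needs privileges
        deps => [ 'debian:libgirepository1.0-dev' ],
    },
    test => {
        user     => 'nobody',   # run unprivileged
        env      => {
            HARNESS_OPTIONS => 'j4',   # parallel test jobs
            COVERAGE        => 1,      # hypothetical coverage flag
        },
        services => [ 'postgresql' ],  # must be running before tests
    },
);
```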
Features / software variability / Software Product Lines
One of the ideas I want to implement is a way of modeling features in software that can be turned on and off. There should be a way to say that `B:feat2 -(depends on)⟶ A:feat1` so that the granularity of dependencies can go down to the level of features.
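A minimal sketch of feature-level dependencies with transitive enabling; the declaration format is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical feature-level dependency declarations.
my %feature_deps = (
    'B:feat2' => [ 'A:feat1' ],
);

# Enabling a feature transitively enables everything it depends on.
sub enable {
    my ($feature, $enabled) = @_;
    return if $enabled->{$feature}++;
    enable($_, $enabled) for @{ $feature_deps{$feature} || [] };
}

my %enabled;
enable('B:feat2', \%enabled);
print join(', ', sort keys %enabled), "\n";   # A:feat1, B:feat2
```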
Carrying your environment with you
When running commands, the tool should combine the command with the environment that the command needs, such as administrator privileges, environment variables, and the current working directory. This helps with keeping state between commands, making builds reproducible and easier to debug.
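A minimal sketch of bundling a command with its environment; the step record and its fields are hypothetical:

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

# Hypothetical step record: the command plus everything it needs.
my %step = (
    cwd     => 'build',
    env     => { CC => 'ccache gcc', VERBOSE => '1' },
    command => [ 'make', 'test' ],
);

sub run_step {
    my (%step) = @_;
    local %ENV = (%ENV, %{ $step{env} });   # scoped; not leaked to caller
    my $old = getcwd();
    chdir $step{cwd} or die "chdir $step{cwd}: $!";
    my $ok = system(@{ $step{command} }) == 0;
    chdir $old or die "chdir $old: $!";
    return $ok;
}

run_step(%step) or die "step failed\n";
```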
Front-end
- Command-line front-end, REPL (e.g., a list of commands for interacting with different services that users can choose from interactively by searching and tab-completing — kind of like what a command-line version of an IDE would be)
The developer loop
Some affordances to help developers do the next step automatically:
- Scanning built software for message strings like "%s can not be installed due to conflict with %s needed by %s", then, when running the software produces an error message, automatically searching for that message online.
- Intercepting calls to `pkg-config` and installing the corresponding package automatically (see the sketch after this list).
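One way to do the `pkg-config` interception is a shim placed earlier in `PATH` than the real binary; the lookup table and the apt-based install step here are illustrative:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# A shim named `pkg-config`: if the real pkg-config fails, look up
# the missing module in an illustrative table, install it, and retry.
my %module_to_debian = ( 'gtk+-3.0' => 'libgtk-3-dev' );
my $real = '/usr/bin/pkg-config';

exit 0 if system($real, @ARGV) == 0;

my ($module) = grep { !/^-/ } @ARGV;   # first non-flag argument
if (my $pkg = $module_to_debian{ $module // '' }) {
    system('apt-get', 'install', '-y', $pkg) == 0
        or die "could not install $pkg\n";
    exec $real, @ARGV;                 # retry with the real tool
}
exit 1;
```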
Mirroring and caching
Reducing build times
Part of the development process is waiting for your code to build and test in order to get feedback on your changes. Caching certain aspects of the build can help speed it up. There should be an easy way to enable/disable caching for a project, and a way to share those caches with downstream dependencies if they opt in (for example, if you are building something from the Author phase onwards).
How caching works depends on which type of caching you are doing and where the caches will reside. Caching works differently for each language's tooling. Some caches are small and some are large. Some caching locations need API keys, and depending on where the build is running, those are supplied in different ways.
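A sketch of what a per-project cache declaration might look like; the schema, directory layout, and the `PIP_INDEX_TOKEN` variable are all hypothetical:

```perl
use strict;
use warnings;

# Hypothetical per-project cache declaration: which caches to use,
# where they live, and whether downstream builds may reuse them.
my %caches = (
    ccache => {
        enabled          => 1,
        dir              => '.orbital/cache/ccache',
        share_downstream => 1,
    },
    pip => {
        enabled => 1,
        dir     => '.orbital/cache/pip',
        # the API key is supplied differently per environment; the
        # variable name here is made up
        auth    => { env => 'PIP_INDEX_TOKEN' },
    },
);
```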
Cache me outside!: when you have no Internet
Orbital Transfer should let people offline their environments for development and installation. I'd like to sit in a green field and write greenfield code.
This can take several forms:
- Mirroring system/language packages so that they can be installed in a virtual environment without having to download from the Internet.
- Mirroring documentation/examples for reading and indexing so that an Internet connection is not needed to look up how to use code.