Conda recipe generator for the LSST Software Stack
This repository contains conda-lsst, a utility that generates
Conda recipes for LSST stack packages. The
recipes are generated using the information stored in
EUPS, the package manager
that LSST uses internally.
conda-lsst also provides a convenient mechanism to build the generated
recipes in proper order, and to upload the resulting binaries to a remote
web server from where they will be installable using conda install.
You only need this code if you wish to create and distribute your own build
of the LSST stack; conda-lsst is not needed to just use the LSST
codes. If all you want is to install the LSST stack, skip to the
Installing section.
This code is beta quality; it is expected to work on OS X and Linux.
You need at least Miniconda with the
conda-build, jinja2, requests, and sqlalchemy packages installed,
as well as the requests_file python module (install
with pip install requests_file). You will also need to build and
install the git-lfs and lsst-git-lfs-config conda packages using
the recipes found in the etc/recipes directory. For your convenience,
there's a script, ./bin/bootstrap.sh, that when run:
bash ./bin/bootstrap.sh
will install all of these for you. Miniconda will be installed into a
subdirectory named miniconda
.
If you have docker, and want conda package builds to happen within a
CentOS 5 docker container (recommended, for maximum binary compatibility),
run:
cd docker
make
to create the necessary docker image.
Once make is run, it will print out the instructions on how to set
the REBUILD_RECIPES_IN_CONTAINER environment variable and make the new
container known to rebuild.sh.
The rebuild.sh script that conda lsst make-recipes generates checks for the contents
of the $REBUILD_RECIPES_IN_CONTAINER variable and uses it as a driver for conda build
(if present). It's best to export this variable from your .bashrc.
To generate all packages and upload them to a remote service, run something like the following:
# Add conda-lsst to PATH. This will allow you to call it as `conda lsst`
export PATH="$PWD/bin:$PATH"
# Tell conda where the channel we'll be uploading to is
conda config --add channels http://eupsforge.net/conda/dev
# Build conda packages for LSST codes (the recipes will be stored in the `recipes` subdirectory)
conda lsst make-recipes build:b1852 lsst_distrib lsst_sims --build
# Upload to the 'dev' channel
conda lsst upload
Note: conda-lsst
is smart about not rebuilding packages
that have already been built.
Build logs are stored in recipes/<packagename>/_build.log
.
Failed builds can be debugged by changing into the source directory (usually
.../conda-bld/work) and running ./_build.sh <eupspkg_verb>
where the verb
is typically build
.
The first parameter passed to conda lsst make-recipes is an lsst-build
generated manifest. You may be familiar
with these as the manifest.txt files that lsstsw generates in its build
directory. For "official" builds, they're also stored in the
canonical versiondb by the lsstsw's rebuild script when run on lsst-dev
as lsstsw.
To find out which build manifests (as stored in the canonical versiondb)
contain a particular product, use the what-builds
utility. For example:
$ what-builds lsst_apps | tail
lsst_apps b2005
lsst_apps b2007
lsst_apps b2010
lsst_apps b2012
lsst_apps b2014
lsst_apps b2015
lsst_apps b2017
lsst_apps b2018
lsst_apps b2020
lsst_apps b2021
See this gist or the video tutorial.
The binaries are currently being built for 64 bit Linux (any variant newer than RHEL 6) and OS X (10.9 or later). See here for more on binary compatibility.
The LSST code base consists of more than a hundred packages (largely written in
C++ and Python) that depend on each other at build and run time. LSST
natively uses EUPS as the
package manager (EUPS prefers the term product). EUPS provides a way to
record dependencies (through so-called "table files", stored in the ups/
subdirectory of each package), as well as a uniform mechanism to build code
that the packages contain irrespective of their internal build system. The
latter is known as eupspkg and is described in some detail in the
docstring of the eupspkg.py
EUPS module. EUPS also provides a facility to distribute code in source
form, but it does not have a facility to distribute binaries.
Conda is a package manager written for the
Anaconda Python distribution. It has support for building and distribution
of binaries: given a recipe -- a shell script that contains commands
needed to build the binary, plus some metadata -- the conda-build
utility
will build the source and tarball it into a "conda package" (physically, a
.tar.bz2
file). When uploaded to a specially
formatted
web-accessible directory (or a hosted service such as anaconda.org), these
binary packages can then be installed by the user using the conda install
command.
The way we "marry" EUPS and Conda is by a) building EUPS itself as a Conda
package, and b) distributing pre-built EUPS products as Conda packages
themselves. We therefore primarily utilize Conda as a convenient binary
packaging and distribution mechanism; the products themselves still need to
be setup
-ed using the usual EUPS commands, but the eups distrib install
step (which would've built them from source) is now replaced by a simple
conda install
(which simply unpacks the binary). We also make EUPS itself
a dependency of all products: running conda install lsst-distrib
will now
install both the LSST stack and EUPS itself (thus obviating the need for a
separate newinstall.sh
script).
The conda-lsst script uses the package dependency information extracted
from EUPS, and the eupspkg build system, to generate conda recipes that
build LSST codes for a specific release. It also generates a script
to build those recipes (in dependency-sorted order), and provides
a convenient upload utility to upload the built binaries to a package
repository.
This section describes the technical details and design considerations that went into the code. The basic instructions on how to run it are above.
- conda lsst make-recipes reads the list of all products, their versions, and their dependencies from the manifest file (the first argument). The remaining arguments list the EUPS names of the products to be turned into conda packages. Only "top level" products need to be specified; conda-lsst will traverse the dependency tree and process all dependencies as necessary.
- The list of products will be topologically sorted and a recipe will be created in the recipes subdirectory for each one that needs to be built.
- Of all channels known to conda build, those matching our_channel_regex in config.yaml will be searched for packages with an identical recipe. If one is found, the package has already been built and doesn't need to be rebuilt (in which case a '.done' file will be placed into the particular product's recipe directory). N.b.: the actual lookup is sped up by hashing and using a database (see below).
- If --build is given on the command line, conda lsst make-recipes will run conda build on each recipe, to build the packages. The results are stored in the $CONDA_ROOT/conda-bld/<platform>/ directory (where platform is typically osx-64 or linux-64, depending on your machine).
- The resulting packages can be uploaded to a remote repository using the conda lsst upload command. It uses either scp or rsync to upload the results to the destination server.
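The traversal and ordering steps above can be sketched in a few lines of Python. This is only an illustration, not the code conda-lsst uses: the function name build_order and the example dependency data are hypothetical, and graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_order(deps, top_level):
    """Return every product reachable from top_level, dependencies first.

    deps maps a product name to the list of products it depends on.
    """
    needed = {}
    stack = list(top_level)
    while stack:
        product = stack.pop()
        if product in needed:
            continue  # already visited
        needed[product] = list(deps.get(product, []))
        stack.extend(needed[product])
    # TopologicalSorter takes a {node: predecessors} mapping and yields
    # predecessors (here: dependencies) before the nodes that need them.
    return list(TopologicalSorter(needed).static_order())


# Hypothetical dependency data, for illustration only.
example_deps = {
    "lsst_distrib": ["afw", "utils"],
    "afw": ["utils", "base"],
    "utils": ["base"],
    "base": [],
    "unrelated": ["base"],
}
order = build_order(example_deps, ["lsst_distrib"])
```

Only products reachable from the requested top-level names end up in the result, which mirrors why a short product list on the command line is enough.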
We don't use EUPS directly to extract the dependency information (though we
could!), but instead use the manifest.txt
files generated by
lsstsw
(the backend machinery for LSST's
CI system).
A manifest.txt
file lists all products, EUPS versions, dependencies, and
git SHA1s for all products in a "build" (one run of buildbot). Using a
manifest.txt
file as input allows us to generate binary Conda packages for
any given build.
Subsets of the build can also be generated by providing a list of top-level packages on the command line (e.g., very useful when debugging, as one doesn't need to repeat a whole build for a quick test).
For convenience, given a build tag, one can pull the manifest.txt
directly from
versiondb directory
where they are kept. That lets you do things such as:
conda lsst make-recipes build:b1497 wcslib
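As a rough sketch of what reading such a manifest involves, the Python below parses manifest text into a per-product table. The file layout is an assumption made for illustration (an optional BUILD=<tag> line, then whitespace-separated columns of product, git SHA1, EUPS version, and a comma-separated dependency list); check a real manifest.txt before relying on it. The function name and the sample data are hypothetical.

```python
def parse_manifest(text):
    """Parse manifest text into (build_id, {product: info}).

    ASSUMED layout: '#' comments, an optional 'BUILD=<tag>' line, then
    'product sha1 version [dep1,dep2,...]' on each remaining line.
    """
    build_id = None
    products = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("BUILD="):
            build_id = line[len("BUILD="):]
            continue
        fields = line.split()
        name, sha1, version = fields[:3]
        deps = fields[3].split(",") if len(fields) > 3 else []
        products[name] = {"sha1": sha1, "version": version, "deps": deps}
    return build_id, products


# Made-up sample; the SHA1s and versions are placeholders, not real data.
sample = """\
# hypothetical manifest
BUILD=b0000
base   aaaaaaa  10.1+1
utils  bbbbbbb  10.1+2  base
afw    ccccccc  10.1+3  base,utils
"""
build_id, products = parse_manifest(sample)
```

The resulting dictionary carries everything the text says a manifest records: versions, SHA1s, and the dependency lists needed for sorting.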
Recipe generation is controlled by entries in config.yaml
file.
They control virtually all aspects of recipe generation, from injecting
missing system dependencies, to defining the output directories and
default destination servers to upload to. Refer to comments in config.yaml
for more.
The settings from config.yaml can be overridden by keys in ~/.condalsstrc. For
example, here's what I (mjuric) have in my ~/.condalsstrc:
$ cat ~/.condalsstrc
our_channel_regex: '^(?:https?://conda.lsst.mjuric.org/)(.+?)/?$'
upload:
  server: 'centos@conda.lsst.mjuric.org'
  dir_base: '/var/www/html'
  conda: 'conda'
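To make the override behaviour concrete, here is a minimal Python sketch. It assumes that the keys under upload are nested beneath it and that overrides are merged key by key (the text above does not say whether the real merge is deep or shallow). The defaults dictionary is a hypothetical stand-in for config.yaml, and merge_config is not a function from conda-lsst.

```python
import re


def merge_config(defaults, overrides):
    """Return defaults updated with overrides, merging nested dicts."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical stand-in for the shipped config.yaml.
defaults = {
    "our_channel_regex": r"^(?:https?://example.org/conda/)(.+?)/?$",
    "patchdir": "etc/patches",
    "upload": {"server": "example.org", "dir_base": "public_html/conda"},
}
# The ~/.condalsstrc contents shown above, written as a Python dict.
overrides = {
    "our_channel_regex": r"^(?:https?://conda.lsst.mjuric.org/)(.+?)/?$",
    "upload": {
        "server": "centos@conda.lsst.mjuric.org",
        "dir_base": "/var/www/html",
        "conda": "conda",
    },
}
config = merge_config(defaults, overrides)

# Only channels matching our_channel_regex count as "ours".
channels = [
    "http://eupsforge.net/conda/dev",
    "http://conda.lsst.mjuric.org/dev/",
]
ours = [c for c in channels if re.match(config["our_channel_regex"], c)]
```

Keys that the override file does not mention (patchdir here) keep their default values.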
Conda-packaged EUPS, when unpacked by conda install
, is placed into:
$ROOT/opt/eups/
and is configured to store its database (ups_db
) information in
$ROOT/var/opt/eups/
. $ROOT
above refers to the root of the Conda
environment you're installing into.
We've chosen not to install the EUPS binary (eups
) into the global
$ROOT/bin/
directory (that is on the users' path). This is because EUPS
needs to be correctly initialized before use, and it's likely users would
forget to do this. Instead, we install a dummy eups
and setup
scripts
there that remind the users of the need to initialize EUPS before continuing.
The initialization is typically done by source
-ing the
setups.sh
script from its bin/
directory (this is the majority of what
loadLSST.sh
does). The conda package links those scripts into the global
$ROOT/bin/
directory, but prefixed with eups-
. Therefore, to initialize
EUPS, you need to run:
source eups-setups.sh
(or .csh
, .zsh
, etc., depending on your shell). Note that it's not
necessary to specify the full path to eups-setups.sh
-- scripts to be
sourced are looked up on $PATH
.
For details, see the recipe in etc/recipes/eups
.
Conda-packaged EUPS is fully functional. You can use it to install
additional EUPS products (for example, with eups distrib install
), add
tags, etc. We only recommend against using it to remove existing products
installed through conda
.
The binaries produced by generated recipes unpack into:
$ROOT/opt/lsst/<product>/<version>/
directory, where product
is the EUPS product name and the version
is the
EUPS version (note: these can be different from the Conda names and
versions).
Their EUPS information (.version
and .chain
files) unpacks itself into
$ROOT/var/opt/eups/ups_db
. Therefore, EUPS becomes aware of the new
product as soon as it's installed.
For details, see the section on the build system and
the files in the etc/templates
subdirectory.
All packages come tagged with two EUPS tags: current
and conda
. As Conda
doesn't support having more than one version of a package installed in the
same environment, having everything tagged as current is the right thing to
do.
It should be easy to extend 'conda-lsst' to take arbitrary additional tags
-- all they do is generate extra .chain
files. The only complication is
that all tags EUPS knows about must be declared in
$ROOT/var/opt/eups/ups_db/global.tags
file; the code already automatically
handles that using a pre-link.sh
script (see
etc/templates/pre-link.sh.template
).
Note that once the package is installed, you can use EUPS to declare additional tags, as you wish.
conda lsst make-recipes
generates the build recipes by filling out the missing
information in .template
files found in the etc/templates
directory. The generation is completely automatic.
The build maximally reuses the eupspkg.sh
build system that all LSST EUPS
packages use; in essence, the generated build.sh
script does the
following:
source eups-setups.sh
eupspkg prep
eupspkg config
eupspkg build
eupspkg install
python -m compileall ...
eupspkg decl
eups declare -t <tags> <product> <version>
This is essentially the same sequence performed by EUPS when building
packages with eups distrib install
.
All the added complexity here (compared to what you'd find in
eupspkg.py)
comes from the need to handle various corner cases, relocatability, and to
inject correct build flags so as to make the binaries redistributable across
multiple versions of the operating system (e.g.
MACOSX_DEPLOYMENT_TARGET). Note: the build.sh.template script also needs
some more work and cleaning up
-- it's likely possible to substantially simplify it!
This build script is executed by conda build
to build and tarball the
binary package (with the metadata about the package such as the name,
version, and dependencies coming from meta.yaml
, which is also generated
by filling out a template).
Some products that are distributed with EUPS (either as full packages or
stubs) already exist in default Conda repositories. Examples include
numpy
, swig
, scons
, twisted
, etc. We don't want to unnecessarily
duplicate functionality by building our own versions.
However, we can't just skip these. Because of the way the current
sconsUtils
build system works, those
packages still need to be declared to EUPS and their .cfg
files need to be
present. Otherwise their dependents won't know how to build themselves.
The solution to this problem is not to perform the full build of packages
that Conda can supply to us, but to only copy the contents of their ups/
directories and declare them to EUPS. To make it clear these are packages
that only carry the EUPS config files, we name them
lsst-PRODUCTNAME-eups-configs
. Secondly, whenever a product depends on
one of these packages, we inject the dependency on both the conda package,
as well as the relevant -eups-configs
package. For example, the afw
EUPS product depends on numpy
and therefore the conda package lsst-afw
will depend on lsst-numpy-eups-configs
and numpy
.
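The injection rule can be sketched as follows. This is an illustration under simplifying assumptions: the function name is hypothetical, the conda name of an internal product is taken to equal its EUPS name, and all other products get the plain lsst- prefix with underscores turned into dashes.

```python
def conda_requirements(eups_deps, internal_products):
    """Map a product's EUPS dependencies to conda requirement names."""
    reqs = set()
    for dep in eups_deps:
        if dep in internal_products:
            # Conda supplies the code; we supply only the EUPS config files.
            reqs.add(dep)
            reqs.add("lsst-%s-eups-configs" % dep)
        else:
            reqs.add("lsst-" + dep.replace("_", "-"))
    return sorted(reqs)


# Hypothetical inputs: afw depending on numpy (internal) and utils (ours).
internal = {"numpy", "swig", "scons"}
afw_reqs = conda_requirements(["numpy", "utils"], internal)
```

For the afw example in the text this yields both numpy and lsst-numpy-eups-configs alongside the regular LSST dependency.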
The products to be considered internal (provided by conda) should be listed
by hand in the internal_products
variable in config.yaml
.
Note that this is really a workaround; the long-term solution is not to
depend on .cfg
files for builds and instead use something more standard,
like pkg-config
, as well as make it possible to run conda binaries w/o
EUPS.
The code currently skips over a few (optional) products, most notably
afw-data (which is very large). This is defined in the skip_products
variable in config.yaml.
Any conda-specific patches needed to build the products should be placed in
etc/patches/<product>/
directory (the location can be changed by
setting patchdir
in config.yaml
), with a .patch
extension. They should
apply with patch -p0
. See the patches currently there for examples.
Hint: if you're generating the patches with git diff
(as you probably
should), use something like:
git diff --no-prefix master > mypatch.patch
to have the output at the -p0
level.
We recommend maintaining conda-lsst patches on branches in the package's
repository (e.g. a conda-patches branch). To make it easy to create a
patch from a branch in a git repo, use the make-patch utility as:
make-patch patch-name.patch master conda-patches
This creates a file patch-name.patch
with a diff between
master..conda-patches
in the etc/patches/<product>/
directory, where
<product>
is automatically inferred from the name of the .table
file in
the git repo's ups/
directory. If you omit the third argument to
make-patch
, HEAD
is assumed. If you omit the second argument, master
is assumed. Therefore, assuming you're on conda-patches
branch, running:
make-patch patch-name.patch
is equivalent to the invocation above.
Note: there is currently no way to declare a patch should only be applied for certain commits (or commits coming before a certain commit).
Some (external, wrapped) EUPS products, most notably sncosmo and pymssql,
are distutils-packaged Python products that transparently use easy_install
to install additional dependencies. Conda does not allow this (and rightly
so, as it makes it impossible to guarantee offline installs).
For these packages to build, we need to:
- Manually create recipes for all of their dependencies, and place them in etc/recipes. The easiest way to do this is by using conda skeleton.
- Declare their dependencies in config.yaml by adding them to the dependencies list, with their names prefixed by recipe/.
Here is an example of the entry for pymssql
:
dependencies:
  pymssql:
    run: [ cython, recipe/setuptools-git ]
    build: [ cython, recipe/setuptools-git ]
Note that some EUPS packages have undeclared dependencies on conda
packages as well (e.g., pymssql
above depends on cython
).
A conda package can be installed into any directory on the end-user's system. It therefore needs to be relocatable.
Conda already has the mechanisms that greatly (and largely transparently) help make this happen (see the entry about relocatable packages in conda documentation).
This mechanism (among other things) ensures that our codes built against
libraries supplied by conda (those in $ROOT/lib) will be dynamically linked
against them on the end-user's system (this is typically acute for
libssl.so, where the system versions vary wildly compared to the Conda
ones).
That said, we did have to undertake one additional step. On OS X, we needed
to inject the -headerpad_max_install_names
option (e.g., see
here)
to the linker command line for install_name_tool
to work reliably.
Note that we don't need (or want) the paths to our libraries to be
hardcoded into the libraries they depend on, as EUPS will handle this
through LD_LIBRARY_PATH
. In fact, hardcoding it (e.g., in a
RPATH entry) would make it impossible
to mix-and-match EUPS versions of different packages (for those inclined to
do so) as the RPATH
entry takes precedence over LD_LIBRARY_PATH
.
That said, it may be nice to encode the path to other libraries in
RUNPATH
, to have a default fallback
when LD_LIBRARY_PATH
is not set.
Binary compatibility is largely determined by the presence (or absence) on the end-user's system of system libraries and frameworks that the code has been built against.
On OS X, we build with MACOSX_DEPLOYMENT_TARGET=10.9
set, which should
ensure that the built binaries work on OS X 10.9 (Mavericks) and later.
Older systems are unsupported because they utilise a different
implementation of the C++ standard library (libstdc++
vs libc++
).
On Linux, we build on a RHEL5-compatible system. Running on any newer distribution is expected to work.
Conda follows different conventions and imposes additional restrictions on package naming and version format:
- Conda package names must be all lower case. They are allowed to contain the '-' sign, however.
- Conda versions consist of three parts:
  - The version, preferably in PEP-386 format. This is the "real" version of the package. '-' signs are not allowed in version strings. Letters are also undesirable (as they're interpreted -- see the PEP above).
  - The build number. This number should be incremented every time the conda recipe changes and a new binary is rebuilt (while the actual upstream source has not changed).
  - The build string. This is an arbitrary string that isn't used anywhere when comparing versions. It can be used to record some meaningful information about the build itself (e.g., the SHA1 of the source). By convention, the build string is usually of the form <something>_<buildnumber> or (in case there is no prefix), just <buildnumber>.
- For more details, see here.
By convention, canonical Conda package names are written out as
<package_name>-<version>-<buildstring>
.
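As a small illustration of these conventions, the following Python sketch assembles a build string and a canonical package name. The helper names are hypothetical and exist only for this example.

```python
def conda_build_string(build_number, prefix=None):
    """Return '<prefix>_<buildnumber>', or just '<buildnumber>' without a prefix."""
    if prefix:
        return "%s_%d" % (prefix, build_number)
    return "%d" % build_number


def canonical_name(name, version, build_number, prefix=None):
    """Return '<package_name>-<version>-<buildstring>'."""
    return "%s-%s-%s" % (name, version, conda_build_string(build_number, prefix))


plain = canonical_name("lsst-boost", "1.55.0.1.2.0003", 1)
with_sha = canonical_name("lsst-obs-test", "0.10.1.4.0049", 1, prefix="461b62d")
```

Here the prefix slot is used for a short git SHA1, which is one of the uses of the build string mentioned above.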
Given those constraints, converting EUPS names+versions to Conda names+versions is done as follows:
- The EUPS product name is transformed to the conda package name by replacing all the underscores '_' by dashes, '-'. Dashes as word separators seem to be a generally accepted convention in the Conda world.
- We also prepend an lsst- to it, except in cases where it would be silly (e.g., lsst-lsst-distrib).
- Some products are given different names by fiat (see the eups_to_conda_map variable in config.yaml).
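The naming rules can be sketched in Python as below. This is an approximation: "would be silly" is interpreted here as "the name already starts with lsst-", lower-casing is added because conda names must be lower case, and the function name and the example override entry are hypothetical.

```python
def sketch_conda_name(product, name_map=None):
    """Approximate the EUPS-to-conda name rules described above."""
    name_map = name_map or {}
    if product in name_map:
        return name_map[product]  # renamed by fiat
    name = product.lower().replace("_", "-")
    if not name.startswith("lsst-"):
        name = "lsst-" + name
    return name


# Hypothetical override map; the real entries live in config.yaml.
example_map = {"some_product": "renamed-product"}
names = [sketch_conda_name(p, example_map)
         for p in ("lsst_distrib", "obs_test", "afw", "some_product")]
```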
Converting versions is tough -- for the exact heuristics see the
eups_to_conda_version() function in the code. That said, here's roughly
what happens:
- Any versions of the form X.Y.Z.W.... are left as is.
- Any versions that end in xxx.lsstN are converted to xxx.N
- Any versions of the form xxx-N are converted to xxx.N
- If a version has a SHA1 embedded, it's moved or copied to the build string portion of the Conda version.
- Any version of the form X.Y.Z... where X is between 10 and 20 is guessed to be an LSST package (as opposed to some external code that we distribute, for example gsl), and a 0 is prepended to it. It is highly unusual in the Conda world to have version 10.x of something that's really alpha-quality code; this brings the version number to something more consistent with the state of the codebase.
- Finally, the +N suffixes are converted to .000N suffixes (i.e., formatted as %04d).
- The conda build number always starts at zero, and is incremented every time the conda recipe used to build the source changes (see below).
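The rules above can be approximated with a few regular expressions. The sketch below is not the real eups_to_conda_version(); it is a simplified reading of the list, assuming the embedded SHA1 appears as a trailing -g<hex> component and that "between 10 and 20" is inclusive. The function name is hypothetical.

```python
import re


def sketch_conda_version(eups_version):
    """Return (conda_version, sha_or_None) following the rough rules above."""
    # Split off a trailing +N suffix, if present.
    m = re.match(r"^(.*?)(?:\+(\d+))?$", eups_version)
    base, plus = m.group(1), m.group(2)
    # Move an embedded git SHA1 (ASSUMED to look like '-g461b62d') out of
    # the version; it belongs in the build string instead.
    sha = None
    m = re.search(r"-g([0-9a-f]{7,40})$", base)
    if m:
        sha = m.group(1)
        base = base[:m.start()]
    base = re.sub(r"\.lsst(\d+)", r".\1", base)  # xxx.lsstN -> xxx.N
    base = re.sub(r"-(\d+)", r".\1", base)       # xxx-N     -> xxx.N
    # A leading 10..20 is guessed to be an LSST version: prepend a 0.
    first = base.split(".")[0]
    if first.isdigit() and 10 <= int(first) <= 20:
        base = "0." + base
    if plus is not None:
        base += ".%04d" % int(plus)              # +N -> .000N
    return base, sha


v_skypix = sketch_conda_version("10.0+235")
v_obs_test = sketch_conda_version("10.1-4-g461b62d+49")
v_boost = sketch_conda_version("1.55.0.lsst1-2+3")
```

The sketch returns the SHA1 separately so that a caller can place it in the build string.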
Some examples:
skypix-10.0+235 --> skypix-0.10.0.0235_1
obs_test-10.1-4-g461b62d+49 --> obs_test-0.10.1.4.0049-461b62d_1
boost-1.55.0.lsst1-2+3 --> lsst-boost-1.55.0.1.2.0003-1
conda lsst
does its best to tell you which recipes have already been
built, so that you don't have to rebuild them (such recipes have a
.done file added to their directory, which the generated rebuild.sh
script reads and honors).
It does this by comparing the generated recipe to the recipes of already
built packages (on channels that match the our_channel_regex
pattern
from config.yaml
), that have the
same name and version (but not the build number or build string). If the
recipes are the same (modulo build number/string; see below), the builds
would result in identical results and therefore no new build is necessary.
A rebuild is when the same source code (i.e., having the same git SHA),
including the dependencies and EUPS versions, is rebuilt using a different
recipe (e.g., the recipe may have been modified to change a compiler flag,
or add a new conda-specific patch, etc.). This may happen when the
recipe templates in etc/templates are changed. A rebuild will keep the
same version, but the build number (and therefore the build string --
by convention, build strings are some <prefix>_<buildnum> or just
<buildnum>) will increment to reflect this is a rebuild (for example,
lsst-boost-1.55.0.1.2.0003-0 and lsst-boost-1.55.0.1.2.0003-1).
While conceptually simple, this comparison of recipes would be extremely
inefficient if implemented naively: conda lsst
would need to download
every built package, from every channel, extract the recipes, and compare
them to the recipe(s) of interest (modulo build number/string) until a match
is found. Since built packages are expected to grow to hundreds of GB, the
naive implementation is infeasible.
Instead, we cache the hashes of recipes (minus the lines in meta.yaml
that define the build number/string) in a local sqlite database (in
pkginfo-cache/<platform>/cache-db.sqlite
). When a new recipe is generated,
it is hashed and compared to the hashes in the database.
conda lsst
refreshes this cache every time it is run, unless
--no-cache-refresh
option is given; if a new package is detected in any of
the channels, it's downloaded, the recipe hashed and cached. Similarly, any
packages that are removed are purged from the cache.
This is truly a cache -- it is safe to delete; conda lsst
will
transparently recover if it's not present.
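The idea behind the cache can be sketched with hashlib and sqlite3 from the Python standard library. This is a simplified illustration, not the actual implementation: the hash algorithm (SHA-256), the table layout, the in-memory database, and the function names are all assumptions, and the build number/string lines are recognised with a deliberately crude pattern.

```python
import hashlib
import re
import sqlite3

# Crude match for the 'number:' and 'string:' lines of meta.yaml.
BUILD_LINE = re.compile(r"^\s*(number|string)\s*:")


def recipe_hash(files):
    """Hash a recipe ({file name: text}) ignoring build number/string lines."""
    digest = hashlib.sha256()
    for name in sorted(files):
        digest.update(name.encode("utf-8"))
        for line in files[name].splitlines():
            if name == "meta.yaml" and BUILD_LINE.match(line):
                continue
            digest.update(line.encode("utf-8") + b"\n")
    return digest.hexdigest()


def remember(db, files, package):
    """Record that a package was built from this recipe."""
    db.execute("INSERT OR REPLACE INTO recipes VALUES (?, ?)",
               (recipe_hash(files), package))


def already_built(db, files):
    """Return the cached package built from an identical recipe, or None."""
    row = db.execute("SELECT package FROM recipes WHERE hash = ?",
                     (recipe_hash(files),)).fetchone()
    return row[0] if row else None


db = sqlite3.connect(":memory:")  # the real cache lives in a file on disk
db.execute("CREATE TABLE recipes (hash TEXT PRIMARY KEY, package TEXT)")

meta = "package:\n  name: lsst-boost\nbuild:\n  number: %d\n  string: '%d'\n"
built = {"meta.yaml": meta % (0, 0), "build.sh": "eupspkg build\n"}
remember(db, built, "lsst-boost-1.55.0.1.2.0003-0")

same_recipe = {"meta.yaml": meta % (1, 1), "build.sh": "eupspkg build\n"}
new_recipe = {"meta.yaml": meta % (0, 0), "build.sh": "eupspkg build -j4\n"}
```

A recipe that differs only in its build number or build string hashes to the same value and is found in the cache, while any other change produces a miss and therefore a rebuild.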
To allow the user to use the binaries, they need to be uploaded to a "channel" of a conda repository. This repository can either be on the anaconda.org hosted service, or an HTTP-accessible directory on a remote server to which you can SCP the files.
conda lsst upload
makes it easy to upload to the latter of the two. The
defaults in config.yaml
are set up so that just running conda lsst upload
will upload to a subdirectory of ~mjuric/public_html/conda
at
lsst-dev.ncsa.illinois.edu
.
For example, assuming I have http://eupsforge.net/conda/dev
in my
~/.condarc
, running:
conda lsst upload
will upload the built packages to
lsst-dev.ncsa.illinois.edu:~mjuric/public_html/conda/dev
. Note that to do
this, you need to have permissions to write to this directory.
This directory is exposed to the web as http://eupsforge.net/conda/dev; for a user to install from this channel, they'd run:
conda config --add channels http://eupsforge.net/conda/dev
after which the usual conda install
command will find the packages
available there.
More options are available; see conda lsst upload -h
for a more complete
summary.
Conda repositories follow a simple package repository format. They have to be initialized before being used; for example:
cd path/to/my/channel
mkdir osx-64 linux-64
conda index osx-64
conda index linux-64
conda index
will create the repodata.json
(and repodata.json.bz2
)
files that conda
client uses to search for packages in the channel. conda lsst upload
automatically runs conda index
after every upload.