CEP 26 - Identifying Packages and Channels in the conda Ecosystem
Title | CEP 26 - Identifying Packages and Channels in the conda Ecosystem |
Status | Approved |
Author(s) | Jaime Rodríguez-Guerra <jaime.rogue@gmail.com>, Matthew R. Becker <becker.mr@gmail.com>, Cheng H. Lee <clee@anaconda.com> |
Created | Mar 11, 2025 |
Updated | Apr 17, 2025 |
Discussion | https://github.com/conda/ceps/pull/116 |
Implementation | N/A |
Abstract
This CEP aims to standardize names and other strings used to identify packages, artifacts and channels in the conda ecosystem.
Specification
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119 when, and only when, they appear in all capitals, as shown here.
More specifically, violations of a MUST or MUST NOT rule MUST result in an error. Violations of the rules specified by any of the other all-capital terms MAY result in a warning, at the discretion of the implementation.
Identifying package artifacts
The conda ecosystem distinguishes between two types of packages:
- Distributable package names: represented by a concrete, downloadable, extractable conda artifact.
- Virtual package names: not backed by any concrete artifact. They only exist on the client side.
Package names
A distributable package name MUST only consist of lowercase ASCII letters, numbers, hyphens, periods and underscores. It MUST start with a letter, a number, or a single underscore. It MUST NOT include two consecutive separators (hyphen, period, underscore).
Virtual package names MUST only consist of lowercase ASCII letters, numbers, hyphens, periods and underscores. They MUST NOT use two consecutive separators, with one exception: they MUST start with two underscores.
Distributable package names MUST match the following case-insensitive regex:
^(([a-z0-9])|([a-z0-9_](?!_)))[._-]?([a-z0-9]+(\.|-|_|$))*$
Virtual package names MUST match the following regex:
^__[a-z0-9][._-]?([a-z0-9]+(\.|-|_|$))*$
In all cases, the maximum length of a package name MUST NOT exceed 64 characters.
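The two regexes and the length limit can be combined into a single check. The following is a minimal sketch, not a reference implementation: the function name `is_valid_package_name` and its `virtual` flag are hypothetical, and the `re.IGNORECASE` flag is applied only to the distributable pattern because the text calls only that one case-insensitive.

```python
import re

# Patterns copied from the text above.
DISTRIBUTABLE_NAME = re.compile(
    r"^(([a-z0-9])|([a-z0-9_](?!_)))[._-]?([a-z0-9]+(\.|-|_|$))*$",
    re.IGNORECASE,  # the text describes this regex as case-insensitive
)
VIRTUAL_NAME = re.compile(r"^__[a-z0-9][._-]?([a-z0-9]+(\.|-|_|$))*$")
MAX_NAME_LENGTH = 64


def is_valid_package_name(name: str, virtual: bool = False) -> bool:
    """Hypothetical helper: check a name against the regex and length limit."""
    if len(name) > MAX_NAME_LENGTH:
        return False
    pattern = VIRTUAL_NAME if virtual else DISTRIBUTABLE_NAME
    # fullmatch avoids Python's "$" also matching before a trailing newline.
    return pattern.fullmatch(name) is not None


print(is_valid_package_name("_libgcc_mutex"))          # distributable name
print(is_valid_package_name("__glibc", virtual=True))  # virtual name
```

Under this sketch a name such as `__glibc` is rejected as a distributable name, because the lookahead forbids a second leading underscore, and accepted only when checked as a virtual package.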
Version strings
Version strings MUST only consist of digits, periods, lowercase ASCII letters, underscores, plus symbols, and exclamation marks. Additional rules apply but are out of scope in this CEP and will be discussed separately.
The maximum length of a version string MUST NOT exceed 64 characters.
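As an illustration, the character set and length limit above can be checked as follows. This sketch covers only the two rules stated in this section; it does not implement the additional version rules that the CEP leaves out of scope. The name `has_valid_version_characters` is hypothetical.

```python
import re

# Digits, periods, lowercase ASCII letters, underscores, plus signs, exclamation marks.
VERSION_CHARS = re.compile(r"[0-9a-z._+!]+")
MAX_VERSION_LENGTH = 64


def has_valid_version_characters(version: str) -> bool:
    """Hypothetical helper: character-set and length check only."""
    if len(version) > MAX_VERSION_LENGTH:
        return False
    return VERSION_CHARS.fullmatch(version) is not None


print(has_valid_version_characters("1!2.0.post1+local"))
print(has_valid_version_characters("1.2-3"))  # hyphens are not allowed
```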
Build strings
Build strings MUST only consist of ASCII letters, numbers, periods, plus symbols, and underscores.
They MUST match this regex:
^[a-zA-Z0-9_\.+]+$
The maximum length of a build string MUST NOT exceed 64 characters.
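A short sketch of the build string check, using the regex from the text; the helper name `is_valid_build_string` is an assumption made for illustration.

```python
import re

BUILD_STRING = re.compile(r"^[a-zA-Z0-9_\.+]+$")  # regex copied from the text
MAX_BUILD_LENGTH = 64


def is_valid_build_string(build: str) -> bool:
    """Hypothetical helper: regex and length check for build strings."""
    return len(build) <= MAX_BUILD_LENGTH and BUILD_STRING.fullmatch(build) is not None


print(is_valid_build_string("py312h1234567_0"))
print(is_valid_build_string("h1234567-0"))  # hyphens are not allowed
```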
Artifact extensions
Artifact extensions MUST only consist of lowercase ASCII letters, numbers and periods. They MUST start and end with a letter or a number. They MUST NOT include two consecutive periods. They MUST match this regex:
^[a-z0-9](\.?[a-z0-9])*$
The maximum length of a file extension MUST NOT exceed 16 characters.
The conda ecosystem currently recognizes two artifact extensions: tar.bz2 and conda, versioned v1 and v2 respectively.
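The extension rules can be sketched as below. The mapping `RECOGNIZED_EXTENSIONS` simply restates the two extensions and versions named above; the helper name `is_valid_extension` is hypothetical.

```python
import re

EXTENSION = re.compile(r"^[a-z0-9](\.?[a-z0-9])*$")  # regex copied from the text
MAX_EXTENSION_LENGTH = 16

# Extensions currently recognized, mapped to their format version.
RECOGNIZED_EXTENSIONS = {"tar.bz2": "v1", "conda": "v2"}


def is_valid_extension(extension: str) -> bool:
    """Hypothetical helper: syntax and length check (no leading period)."""
    if len(extension) > MAX_EXTENSION_LENGTH:
        return False
    return EXTENSION.fullmatch(extension) is not None


for ext, version in RECOGNIZED_EXTENSIONS.items():
    print(ext, version, is_valid_extension(ext))
```

Note that the regex describes the extension without its leading period, so `.conda` is rejected while `conda` is accepted.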
Distribution strings
A "distribution string" MAY be used to identify a package artifact, without specifying the extension or channel. It MUST match the following syntax:
[<subdir>/]<package name>-<version string>-<build string>
Distribution strings apply to distributable packages. They are used as the name of the directories where artifacts are extracted in the package cache, for example.
Virtual packages MAY also be identified by a distribution string, but in those cases a subdir MUST NOT be present.
Note: Despite the similarity, distribution strings are not MatchSpec-like specifiers and MUST NOT be used as such.
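One way to split a distribution string into its parts is to split from the right: per the character sets above, neither version strings nor build strings may contain a hyphen, while package names may. The sketch below relies on that observation; the `Distribution` type and `parse_distribution` function are hypothetical names, and the individual fields are not validated here.

```python
from typing import NamedTuple, Optional


class Distribution(NamedTuple):
    subdir: Optional[str]
    name: str
    version: str
    build: str


def parse_distribution(dist: str) -> Distribution:
    """Hypothetical helper: split [<subdir>/]<name>-<version>-<build>."""
    subdir, slash, rest = dist.rpartition("/")
    # Split on the last two hyphens only, since the name may contain hyphens.
    parts = rest.rsplit("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a distribution string: {dist!r}")
    name, version, build = parts
    return Distribution(subdir if slash else None, name, version, build)


print(parse_distribution("linux-64/python-dateutil-2.9.0-pyhd8ed1ab_0"))
print(parse_distribution("__glibc-2.35-0"))
```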
Filenames
The filename of distributable conda artifacts is obtained by adding the artifact extension to its distribution string (without the subdir, if present). It MUST match this syntax:
<package name>-<version string>-<build string>.<extension>
The maximum length of a filename MUST NOT exceed 211 characters.
Virtual conda packages do not exist on disk and SHOULD NOT need filename standardization.
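The filename syntax can be illustrated with a small sketch. The function names are hypothetical. As a side observation that is not stated in the text, the 211-character limit equals the sum of the maximum component lengths (64 + 64 + 64 + 16) plus the three separator characters.

```python
MAX_NAME = MAX_VERSION = MAX_BUILD = 64
MAX_EXTENSION = 16
MAX_FILENAME = 211

# Three components, one extension, two hyphens and one period.
assert MAX_NAME + MAX_VERSION + MAX_BUILD + MAX_EXTENSION + 3 == MAX_FILENAME


def artifact_filename(name: str, version: str, build: str, extension: str) -> str:
    """Hypothetical helper: build <name>-<version>-<build>.<extension>."""
    filename = f"{name}-{version}-{build}.{extension}"
    if len(filename) > MAX_FILENAME:
        raise ValueError(f"filename exceeds {MAX_FILENAME} characters")
    return filename


def split_filename(filename: str, extensions=("tar.bz2", "conda")):
    """Hypothetical helper: return (distribution string, extension).

    Versions may contain periods, so the extension is matched against a
    known list rather than found by splitting on the first period.
    """
    for extension in extensions:
        suffix = "." + extension
        if filename.endswith(suffix):
            return filename[: -len(suffix)], extension
    raise ValueError(f"unrecognized artifact extension: {filename!r}")


print(artifact_filename("numpy", "1.26.4", "py312h_0", "conda"))
print(split_filename("numpy-1.26.4-py312h_0.tar.bz2"))
```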
Identifying channels
A conda channel is defined as a URL where one can find one or more repodata.json files arranged in one subdirectory (subdir) each. noarch/repodata.json MUST be present to consider the parent location a channel.
Channel base URLs
The base URL for the arbitrary location of a repodata file is defined as:
<scheme>://[<authority>][/<path>/][/label/<label name>]/<subdir>/repodata.json
with <scheme>, <authority> and <path> defined by RFC 3986.
Given the channel definition above, the base URL without trailing slashes is thus:
<scheme>://[<authority>][/<path>/][/label/<label name>]
For example, given https://conda.anaconda.org/conda-forge/noarch/repodata.json, the part leading to noarch/repodata.json, and thus the channel base URL, is https://conda.anaconda.org/conda-forge. For local repodata such as file:///home/username/channel/noarch/repodata.json, the channel base URL is file:///home/username/channel.
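The derivation in the examples above amounts to dropping the last two path segments of a repodata URL. A minimal sketch, assuming the URL ends in `<subdir>/repodata.json`; the function name `channel_base_url` is hypothetical.

```python
def channel_base_url(repodata_url: str) -> str:
    """Hypothetical helper: strip the trailing <subdir>/repodata.json."""
    parts = repodata_url.rsplit("/", 2)
    if len(parts) != 3 or parts[2] != "repodata.json":
        raise ValueError(f"not a repodata URL: {repodata_url!r}")
    base, _subdir, _filename = parts
    return base


print(channel_base_url("https://conda.anaconda.org/conda-forge/noarch/repodata.json"))
print(channel_base_url("file:///home/username/channel/noarch/repodata.json"))
```

Because the label, when present, precedes the subdir, the same function keeps a `/label/<label name>` suffix as part of the base URL.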
When present, each path component MUST only contain lowercase ASCII letters, numbers, underscores, periods, and dashes. They MUST NOT start with a period or a dash. They SHOULD start and end with a letter or a number. If present, each path component MUST match this regex:
^[a-z0-9_][a-z0-9_.-]*$
For file://-based channel URLs, the path component rules MAY be understood as recommendations only.
The maximum length of an individual path component in a channel base URL MUST NOT exceed 128 characters. The maximum length of a channel base URL SHOULD NOT exceed 256 characters.
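The path component rules can be sketched with the standard library URL parser. This example makes several assumptions: the function name `invalid_path_components` is hypothetical, the input is a base URL without a `/label/<label name>` part, and `file://` URLs are skipped entirely since the rules may be treated as recommendations for them. The 256-character SHOULD NOT limit on the whole URL is not checked.

```python
import re
from urllib.parse import urlsplit

PATH_COMPONENT = re.compile(r"^[a-z0-9_][a-z0-9_.-]*$")  # regex copied from the text
MAX_COMPONENT_LENGTH = 128


def invalid_path_components(base_url: str) -> list:
    """Hypothetical helper: list path components that break the MUST rules."""
    parts = urlsplit(base_url)
    if parts.scheme == "file":
        return []  # assumption: treat the rules as recommendations and skip them
    return [
        component
        for component in parts.path.split("/")
        if component
        and (
            len(component) > MAX_COMPONENT_LENGTH
            or PATH_COMPONENT.fullmatch(component) is None
        )
    ]


print(invalid_path_components("https://conda.anaconda.org/conda-forge"))
print(invalid_path_components("https://example.com/.hidden/Channel"))
```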
To avoid ambiguous MatchSpec grammar, the last path component of a channel base URL SHOULD NOT match any subdir identifiers. If it does, the behavior in this ambiguous case is undefined and implementation-dependent.
Channel names
For convenience, the channel name is defined as the concatenation of the scheme, authority and path components of a channel URL. At least one of authority or path SHOULD be present. In their absence, the channel name MUST be considered empty, regardless of the scheme. Empty channel names SHOULD NOT be used.
When the scheme and authority fields are missing, the full URL can be inferred with these rules:
- If the channel name matches the regex ^\.{0,2}[/\\].*$, or if it matches the regex ^[A-Z]:([\\/].*)?$ (for Windows drives), it SHOULD be understood as the path component of a file:// URL.
- Otherwise, the tool SHOULD provide a user-configurable mechanism to use a default scheme and authority, with the provided channel name taken as the rest of the path component. At the time of this CEP's writing, most tools assume the default URL scheme and authority to be https://conda.anaconda.org.
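The two inference rules can be sketched as follows. The function `resolve_channel_name`, its return shape, and the `default_base` parameter are hypothetical. Converting a local path into a complete `file://` URL (resolving relative paths, handling drive letters) is deliberately left out, so local names are returned unchanged and only tagged as local.

```python
import re

LOCAL_PATH = re.compile(r"^\.{0,2}[/\\].*$")        # regex copied from the text
WINDOWS_DRIVE = re.compile(r"^[A-Z]:([\\/].*)?$")  # regex copied from the text
DEFAULT_BASE = "https://conda.anaconda.org"        # default most tools assume


def resolve_channel_name(name: str, default_base: str = DEFAULT_BASE):
    """Hypothetical helper: return ("file", path) or ("remote", url)."""
    if LOCAL_PATH.match(name) or WINDOWS_DRIVE.match(name):
        # The name is the path component of a file:// URL.
        return ("file", name)
    # Otherwise the name is appended to the configurable default base.
    return ("remote", f"{default_base.rstrip('/')}/{name}")


print(resolve_channel_name("conda-forge"))
print(resolve_channel_name("./my-channel"))
print(resolve_channel_name("conda-forge", default_base="https://mirror.example.com"))
```

Passing `default_base` as a parameter stands in for the user-configurable mechanism the second rule asks for; the mirror URL in the last call is a placeholder.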
Subdir names
Channel subdir names MUST either be the literal noarch or a string following the syntax {os}-{arch}, where {os} and {arch} MUST only consist of lowercase ASCII letters and numbers. Non-noarch subdirs MUST match this regex:
^[a-z0-9]+-[a-z0-9]+$
The maximum length of a subdir name MUST NOT exceed 32 characters.
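A brief sketch of the subdir rules; the helper name `is_valid_subdir` is an assumption.

```python
import re

PLATFORM_SUBDIR = re.compile(r"^[a-z0-9]+-[a-z0-9]+$")  # regex copied from the text
MAX_SUBDIR_LENGTH = 32


def is_valid_subdir(subdir: str) -> bool:
    """Hypothetical helper: accept "noarch" or an {os}-{arch} string."""
    if len(subdir) > MAX_SUBDIR_LENGTH:
        return False
    return subdir == "noarch" or PLATFORM_SUBDIR.fullmatch(subdir) is not None


for candidate in ("noarch", "linux-64", "osx-arm64", "linux_64"):
    print(candidate, is_valid_subdir(candidate))
```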
Label names
Channel label names MUST only consist of ASCII letters, digits, underscores, hyphens, forward slashes, periods, and whitespace. They MUST start with a letter. They MUST match this regex:
^[a-zA-Z][0-9a-zA-Z_\-\./]*$
The last /-delimited component of a label SHOULD NOT match any subdir identifier. If it does, the behavior in this ambiguous case is undefined and implementation-dependent.
The label nolabel is reserved and MUST only be used for conda packages which have no other labels. In other words, in the space of labels, the empty set is represented by the label nolabel.
A URL for a package, repodata, etc. without a label component MUST be assumed to have the default label main.
The maximum length of a label name MUST NOT exceed 128 characters.
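The label rules, including the default label, can be sketched as below. Both function names are hypothetical. The sketch assumes the channel path itself contains no component named `label`, and it follows the regex exactly as printed above, which does not accept whitespace.

```python
import re
from urllib.parse import urlsplit

LABEL = re.compile(r"^[a-zA-Z][0-9a-zA-Z_\-\./]*$")  # regex copied from the text
MAX_LABEL_LENGTH = 128
DEFAULT_LABEL = "main"


def is_valid_label(label: str) -> bool:
    """Hypothetical helper: regex and length check for label names."""
    return len(label) <= MAX_LABEL_LENGTH and LABEL.fullmatch(label) is not None


def label_from_base_url(base_url: str) -> str:
    """Hypothetical helper: extract the label from a channel base URL."""
    path = urlsplit(base_url).path
    # Everything after the first "/label/" is the label, which may contain "/".
    _, marker, label = path.partition("/label/")
    return label if marker and label else DEFAULT_LABEL


print(label_from_base_url("https://conda.anaconda.org/conda-forge"))
print(label_from_base_url("https://conda.anaconda.org/conda-forge/label/dev"))
print(is_valid_label("rc/2025.1"))
```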
Backwards compatibility
The conda subdir and package name regexes are backwards compatible with the current conda implementation (25.3) and all existing packages on the defaults and conda-forge channels, except for the __anaconda_core_depends package on the defaults channel, which was deprecated in April 2025. See this comment.
The regex for labels was pulled from an anaconda.org error message describing the set of valid labels.
As of 2025-03-12T19:00Z, of the ~1.9M channel names on anaconda.org:
- 7,219 violate the regex ^[a-z0-9]+((-|_|.)[a-z0-9]+)*$;
- 98 violate the regex ^[a-z0-9][a-z0-9_.-]*$ (allowing channel names to end with _, ., or -); and
- 6 violate ^[a-z0-9_][a-z0-9_.-]*$ (allowing channel names to start with _). Of those six, five start with ., and the other starts with ~.
See this comment for more details.
The authors have excluded the channel names in the last case that start with . or ~ given possible security implications. A low percentage, ~0.4%, of channels do not match the recommendations for channel names above, but are allowed.
The maximum lengths allowed for the different fields have been chosen so the resulting path components (directory names, filenames) comfortably fit in the 255-char maximum limit some filesystems impose. As of 2025-03-01T13:00Z, there are no violations of these limits in any of the packages published for conda-forge, bioconda and defaults. See this comment and this comment for more details.
Copyright
All CEPs are explicitly CC0 1.0 Universal.