
CEP 26 - Identifying Packages and Channels in the conda Ecosystem

Title: CEP 26 - Identifying Packages and Channels in the conda Ecosystem
Status: Approved
Author(s): Jaime Rodríguez-Guerra <jaime.rogue@gmail.com>, Matthew R. Becker <becker.mr@gmail.com>, Cheng H. Lee <clee@anaconda.com>
Created: Mar 11, 2025
Updated: Apr 17, 2025
Discussion: https://github.com/conda/ceps/pull/116
Implementation: N/A

Abstract

This CEP aims to standardize names and other strings used to identify packages, artifacts and channels in the conda ecosystem.

Specification

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119 when, and only when, they appear in all capitals, as shown here.

More specifically, violations of a MUST or MUST NOT rule MUST result in an error. Violations of the rules specified by any of the other all-capital terms MAY result in a warning, at the discretion of the implementation.

Identifying package artifacts

The conda ecosystem distinguishes between two types of packages:

  • Distributable packages: backed by a concrete, downloadable, extractable conda artifact.
  • Virtual packages: not backed by any concrete artifact. They only exist on the client side.

Package names

A distributable package name MUST only consist of lowercase ASCII letters, numbers, hyphens, periods and underscores. It MUST start with a letter, a number, or a single underscore. It MUST NOT include two consecutive separators (hyphen, period, underscore).

Virtual package names MUST only consist of lowercase ASCII letters, numbers, hyphens, periods and underscores. They MUST NOT use two consecutive separators, with one exception: they MUST start with two underscores.

Distributable package names MUST match the following case-insensitive regex: ^(([a-z0-9])|([a-z0-9_](?!_)))[._-]?([a-z0-9]+(\.|-|_|$))*$.

Virtual package names MUST follow this regex: ^__[a-z0-9][._-]?([a-z0-9]+(\.|-|_|$))*$.

In all cases, the maximum length of a package name MUST NOT exceed 64 characters.
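The naming rules above can be checked mechanically. The following is a minimal sketch, not a reference implementation: the two patterns are copied from the text, while the helper name `classify_package_name` and the explicit lowercase check are assumptions made for illustration. The lowercase check is included because a case-insensitive regex alone would not enforce the "lowercase ASCII letters" rule stated in the prose.

```python
import re

# Patterns copied from the specification text above.
DISTRIBUTABLE_RE = re.compile(
    r"^(([a-z0-9])|([a-z0-9_](?!_)))[._-]?([a-z0-9]+(\.|-|_|$))*$",
    re.IGNORECASE,  # the text describes this regex as case-insensitive
)
VIRTUAL_RE = re.compile(r"^__[a-z0-9][._-]?([a-z0-9]+(\.|-|_|$))*$")
MAX_NAME_LENGTH = 64


def classify_package_name(name: str) -> str:
    """Return "distributable", "virtual", or "invalid" (hypothetical helper)."""
    if len(name) > MAX_NAME_LENGTH:
        return "invalid"
    # Assumption: enforce the prose rule on lowercase letters explicitly.
    if name != name.lower():
        return "invalid"
    if VIRTUAL_RE.fullmatch(name):
        return "virtual"
    if DISTRIBUTABLE_RE.fullmatch(name):
        return "distributable"
    return "invalid"
```

Under these assumptions, `classify_package_name("_libgcc_mutex")` reports a distributable name, while `classify_package_name("__glibc")` reports a virtual one.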

Version strings

Version strings MUST only consist of digits, periods, lowercase ASCII letters, underscores, plus symbols, and exclamation marks. Additional rules apply but are out of scope in this CEP and will be discussed separately.

The maximum length of a version string MUST NOT exceed 64 characters.
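Since this CEP gives no regex for version strings, a validator can only check the character set and the length. The sketch below does exactly that; the character class is derived from the prose above and is an assumption, as is the helper name `has_valid_version_characters`. It does not attempt to check the additional version rules that are out of scope here.

```python
import re

# Assumption: character class derived from the prose; the CEP gives no regex.
VERSION_CHARS_RE = re.compile(r"[0-9a-z._+!]+")
MAX_VERSION_LENGTH = 64


def has_valid_version_characters(version: str) -> bool:
    """Check only the allowed characters and the maximum length."""
    if len(version) > MAX_VERSION_LENGTH:
        return False
    return VERSION_CHARS_RE.fullmatch(version) is not None
```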

Build strings

Build strings MUST only consist of ASCII letters, numbers, periods, plus symbols, and underscores. They MUST match this regex: ^[a-zA-Z0-9_\.+]+$.

The maximum length of a build string MUST NOT exceed 64 characters.
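As an illustration, the build string rules reduce to one regex match and one length check. This is a sketch only; the pattern is the one given above, and the helper name `is_valid_build_string` is hypothetical.

```python
import re

# Pattern copied from the specification text above.
BUILD_RE = re.compile(r"^[a-zA-Z0-9_\.+]+$")
MAX_BUILD_LENGTH = 64


def is_valid_build_string(build: str) -> bool:
    """Check a build string against the regex and the length limit."""
    return len(build) <= MAX_BUILD_LENGTH and BUILD_RE.fullmatch(build) is not None
```

Note that hyphens are rejected, which is what allows a distribution string to be split unambiguously on its last two hyphens.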

Artifact extensions

Artifact extensions MUST only consist of lowercase ASCII letters, numbers and periods. They MUST start and end with a letter or a number. They MUST NOT include two consecutive periods. They MUST match this regex: ^[a-z0-9](\.?[a-z0-9])*$.

The maximum length of a file extension MUST NOT exceed 16 characters.

The conda ecosystem currently recognizes two artifact extensions: tar.bz2 and conda, versioned v1 and v2 respectively.
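A short sketch of the extension rules follows. The pattern and the two recognized extensions come from the text above; the names `is_valid_extension` and `RECOGNIZED_EXTENSIONS` are assumptions chosen for this example.

```python
import re

# Pattern copied from the specification text above.
EXTENSION_RE = re.compile(r"^[a-z0-9](\.?[a-z0-9])*$")
MAX_EXTENSION_LENGTH = 16

# The two extensions currently recognized, mapped to their format version.
RECOGNIZED_EXTENSIONS = {"tar.bz2": "v1", "conda": "v2"}


def is_valid_extension(extension: str) -> bool:
    """Check syntax and length only; the extension need not be a recognized one."""
    if len(extension) > MAX_EXTENSION_LENGTH:
        return False
    return EXTENSION_RE.fullmatch(extension) is not None
```

The extension is written without a leading period, so `is_valid_extension(".conda")` is false while `is_valid_extension("conda")` is true.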

Distribution strings

A "distribution string" MAY be used to identify a package artifact, without specifying the extension or channel. It MUST match the following syntax:

[<subdir>/]<package name>-<version string>-<build string>

Distribution strings apply to distributable packages. For example, they are used as the names of the directories where artifacts are extracted in the package cache.

Virtual packages MAY also be identified by a distribution string, but in those cases a subdir MUST NOT be present.

Note: Despite the similarity, distribution strings are not MatchSpec-like specifiers and MUST NOT be used as such.
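Because version and build strings cannot contain hyphens while package names can, a distribution string can be split from the right. The sketch below illustrates this; the `Dist` tuple and the function `parse_distribution_string` are hypothetical names, and the individual fields are not validated further.

```python
from typing import NamedTuple, Optional


class Dist(NamedTuple):
    subdir: Optional[str]
    name: str
    version: str
    build: str


def parse_distribution_string(dist: str) -> Dist:
    """Split "[<subdir>/]<name>-<version>-<build>" into its fields."""
    subdir, slash, rest = dist.rpartition("/")
    # Names may contain hyphens; versions and builds may not, so split
    # on the last two hyphens only.
    parts = rest.rsplit("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a distribution string: {dist!r}")
    name, version, build = parts
    if slash and name.startswith("__"):
        raise ValueError("virtual packages must not carry a subdir")
    return Dist(subdir if slash else None, name, version, build)
```

For instance, `parse_distribution_string("linux-64/python-dateutil-2.9.0-pyhd8ed1ab_0")` keeps `python-dateutil` intact as the package name.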

Filenames

The filename of distributable conda artifacts is obtained by adding the artifact extension to its distribution string (without the subdir, if present). It MUST match this syntax:

<package name>-<version string>-<build string>.<extension>

The maximum length of a filename MUST NOT exceed 211 characters.

Virtual conda packages do not exist on disk and SHOULD NOT need filename standardization.
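The 211-character limit matches the sum of the per-field maxima (64 + 64 + 64 + 16) plus the three separator characters (two hyphens and one period). The following sketch makes that arithmetic explicit and assembles a filename from a distribution string; the function name `artifact_filename` is an assumption for illustration.

```python
MAX_NAME = MAX_VERSION = MAX_BUILD = 64
MAX_EXTENSION = 16
# Two hyphens and one period join the four fields.
MAX_FILENAME = MAX_NAME + 1 + MAX_VERSION + 1 + MAX_BUILD + 1 + MAX_EXTENSION


def artifact_filename(dist: str, extension: str) -> str:
    """Build "<name>-<version>-<build>.<extension>" from a distribution string."""
    # Drop the optional "<subdir>/" prefix before appending the extension.
    basename = dist.rsplit("/", 1)[-1]
    filename = f"{basename}.{extension}"
    if len(filename) > MAX_FILENAME:
        raise ValueError(f"filename exceeds {MAX_FILENAME} characters")
    return filename
```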

Identifying channels

A conda channel is defined as a URL where one can find one or more repodata.json files arranged in one subdirectory (subdir) each. noarch/repodata.json MUST be present to consider the parent location a channel.

Channel base URLs

The base URL for the arbitrary location of a repodata file is defined as:

<scheme>://[<authority>][/<path>/][/label/<label name>]/<subdir>/repodata.json

with <scheme>, <authority> and <path> defined by RFC 3986.

Given the channel definition above, the base URL without trailing slashes is thus:

<scheme>://[<authority>][/<path>/][/label/<label name>]

For example, given https://conda.anaconda.org/conda-forge/noarch/repodata.json, the part leading to noarch/repodata.json, and thus the base URL, is https://conda.anaconda.org/conda-forge. For local repodata such as file:///home/username/channel/noarch/repodata.json, the channel base URL is file:///home/username/channel.
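The derivation in these examples amounts to stripping the trailing `<subdir>/repodata.json` from the URL. A minimal sketch, assuming the input is already a well-formed repodata URL (the function name `channel_base_url` is hypothetical):

```python
def channel_base_url(repodata_url: str) -> str:
    """Strip the trailing "/<subdir>/repodata.json" from a repodata URL."""
    suffix = "/repodata.json"
    if not repodata_url.endswith(suffix):
        raise ValueError("URL does not point to a repodata.json file")
    without_file = repodata_url[: -len(suffix)]
    # The last remaining path segment is the subdir; everything before it,
    # including any "/label/<label name>" part, is the base URL.
    base, slash, subdir = without_file.rpartition("/")
    if not slash or not subdir:
        raise ValueError("URL has no subdir component")
    return base
```

Applied to the two URLs above, this returns `https://conda.anaconda.org/conda-forge` and `file:///home/username/channel` respectively.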

When present, each path component MUST only contain lowercase ASCII letters, numbers, underscores, periods, and dashes. They MUST NOT start with a period or a dash. They SHOULD start and end with a letter or a number. If present, each path component MUST match this regex:

^[a-z0-9_][a-z0-9_.-]*$

For file://-based channel URLs, the path component rules MAY be understood as recommendations only.

The maximum length of an individual path component in a channel base URL MUST NOT exceed 128 characters. The maximum length of a channel base URL SHOULD NOT exceed 256 characters.
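The path component rules mix MUST and SHOULD requirements, which map naturally onto errors and warnings. The sketch below separates the two for a single path component; the helper name `check_path_component` and the message wording are assumptions. It does not implement the relaxation for file:// URLs, which a caller could apply by treating the returned errors as warnings.

```python
import re
import string

# Pattern copied from the specification text above.
PATH_COMPONENT_RE = re.compile(r"^[a-z0-9_][a-z0-9_.-]*$")
ALPHANUMERIC = set(string.ascii_lowercase + string.digits)
MAX_COMPONENT_LENGTH = 128


def check_path_component(component: str) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for one path component of a base URL."""
    errors: list[str] = []
    warnings: list[str] = []
    if len(component) > MAX_COMPONENT_LENGTH:
        errors.append("longer than 128 characters")
    if not PATH_COMPONENT_RE.fullmatch(component):
        errors.append("does not match the path component regex")
    elif component[0] not in ALPHANUMERIC or component[-1] not in ALPHANUMERIC:
        # SHOULD-level rule: reported as a warning, not an error.
        warnings.append("should start and end with a letter or a number")
    return errors, warnings
```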

To avoid ambiguous MatchSpec grammar, the last path component of a channel base URL SHOULD NOT match any subdir identifiers. If it does, the behavior in this ambiguous case is not defined and implementation dependent.

Channel names

For convenience, the channel name is defined as the concatenation of scheme, authority and path components of a channel URL. At least one of authority or path SHOULD be present. In their absence, the channel name MUST be considered empty, regardless of the scheme. Empty channel names SHOULD NOT be used.

When the scheme and authority fields are missing, the full URL can be inferred with these rules:

  • If the channel name matches the regex ^\.{0,2}[/\\].*$, or if it matches the regex ^[A-Z]:([\\/].*)?$ (for Windows drives), it SHOULD be understood as the path component of a file:// URL.
  • Otherwise, the tool SHOULD provide a user-configurable mechanism to use a default scheme and authority, with the provided channel name taken as the rest of the path component. At the time of this CEP's writing, most tools assume the default URL scheme and authority to be https://conda.anaconda.org.
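The two inference rules above can be sketched as follows. The regexes are copied from the text; the function name `resolve_channel_name` and its return convention are assumptions. Converting a local path into an absolute file:// URL is platform specific, so this sketch only reports that the name is a path and leaves the conversion to the caller.

```python
import re

# Patterns copied from the specification text above.
LOCAL_PATH_RE = re.compile(r"^\.{0,2}[/\\].*$")
WINDOWS_DRIVE_RE = re.compile(r"^[A-Z]:([\\/].*)?$")
# Default reported in the text; tools should make this user-configurable.
DEFAULT_BASE = "https://conda.anaconda.org"


def resolve_channel_name(name: str, default_base: str = DEFAULT_BASE) -> tuple[str, str]:
    """Return ("path", name) for local paths, else ("url", full_url)."""
    if LOCAL_PATH_RE.match(name) or WINDOWS_DRIVE_RE.match(name):
        return ("path", name)
    return ("url", f"{default_base.rstrip('/')}/{name.strip('/')}")
```

With the default settings, `resolve_channel_name("conda-forge")` yields the URL `https://conda.anaconda.org/conda-forge`, whereas `./my-channel` is reported as a local path.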

Subdir names

Channel subdir names MUST either be the literal noarch or a string following the syntax {os}-{arch}, where {os} and {arch} MUST only consist of lowercase ASCII letters and numbers. Non-noarch subdirs MUST match this regex: ^[a-z0-9]+-[a-z0-9]+$.

The maximum length of a subdir name MUST NOT exceed 32 characters.
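A subdir check needs only the literal `noarch`, the regex above, and the length limit. A minimal sketch, with the hypothetical helper name `is_valid_subdir`:

```python
import re

# Pattern copied from the specification text above.
SUBDIR_RE = re.compile(r"^[a-z0-9]+-[a-z0-9]+$")
MAX_SUBDIR_LENGTH = 32


def is_valid_subdir(subdir: str) -> bool:
    """Accept "noarch" or an "{os}-{arch}" string within the length limit."""
    if len(subdir) > MAX_SUBDIR_LENGTH:
        return False
    return subdir == "noarch" or SUBDIR_RE.fullmatch(subdir) is not None
```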

Label names

Channel label names MUST only consist of ASCII letters, digits, underscores, hyphens, forward slashes, periods, and whitespace. They MUST start with a letter. They MUST match this regex: ^[a-zA-Z][0-9a-zA-Z_\-\./]*$. The last /-delimited component of a label SHOULD NOT match any subdir identifier. If it does, the behavior in this ambiguous case is undefined and implementation dependent.

The label nolabel is reserved and MUST only be used for conda packages which have no other labels. In other words, in the space of labels, the empty set is represented by the label nolabel.

A URL for a package, repodata, etc. without a label component MUST be assumed to have the default label main.

The maximum length of a label name MUST NOT exceed 128 characters.
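The label rules can be illustrated with two small helpers: one that validates a label and one that extracts the label from a channel base URL, falling back to `main`. This is a sketch under stated assumptions: the pattern is copied verbatim from the text (as written it contains no whitespace class, so labels with spaces are rejected here), both function names are hypothetical, and the extraction assumes `/label/` appears at most once in the URL.

```python
import re

# Pattern copied verbatim from the specification text above.
LABEL_RE = re.compile(r"^[a-zA-Z][0-9a-zA-Z_\-\./]*$")
MAX_LABEL_LENGTH = 128
DEFAULT_LABEL = "main"


def is_valid_label(label: str) -> bool:
    """Check a label against the regex and the length limit."""
    return len(label) <= MAX_LABEL_LENGTH and LABEL_RE.fullmatch(label) is not None


def label_from_base_url(base_url: str) -> str:
    """Return the label in a channel base URL, or "main" when absent."""
    _, marker, label = base_url.partition("/label/")
    return label if marker and label else DEFAULT_LABEL
```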

Backwards compatibility

The conda subdir and package name regexes are backwards compatible with the current conda implementation (25.3) and all existing packages on the defaults and conda-forge channels, except for the __anaconda_core_depends package on the defaults channel, which was deprecated in April 2025. See this comment.

The regex for labels was pulled from an anaconda.org error message describing the set of valid labels.

As of 2025-03-12T19:00Z, of the ~1.9M channel names on anaconda.org:

  • 7,219 violate the regex ^[a-z0-9]+((-|_|.)[a-z0-9]+)*$;
  • 98 violate the regex ^[a-z0-9][a-z0-9_.-]*$ (allowing channel names to end with _, ., or -); and
  • 6 violate ^[a-z0-9_][a-z0-9_.-]*$ (allowing channel names to start with _). Of those six, five start with ., and the other starts with ~.

See this comment for more details. The authors have excluded the channel names in the last case that start with . or ~ given possible security implications. A low percentage, ~0.4%, of channels do not match the recommendations for channel names above, but are allowed.

The maximum lengths allowed for the different fields have been chosen so the resulting path components (directory names, filenames) comfortably fit within the 255-char maximum limit some filesystems impose. As of 2025-03-01T13:00Z, there are no violations of these limits in any of the packages published for conda-forge, bioconda and defaults. See this comment and this comment for more details.

All CEPs are explicitly CC0 1.0 Universal.