
condastats is back

Jannis Leidel
Steering council member

condastats is a command-line tool and Python library for querying download statistics of conda packages from the Anaconda public dataset. The project hadn't seen a release since August 2022, so we spent some time updating it to work with current Python and pandas versions, cleaning up the codebase, rewriting the documentation, and adding an interactive browser demo. The result is condastats 0.4.2 -- here's what's new and how to use it.

Why this matters

Ever wondered how many times a conda package has been downloaded? Which platforms are popular? Whether conda-forge or defaults drives more installs? condastats answers these questions with a single command:

condastats overall pandas --month 2024-01

Or from Python:

from condastats import overall
overall("pandas", month="2024-01")

The tool queries the Anaconda public dataset, a collection of monthly Parquet files containing download counts for conda packages since January 2017. Think of it as pypistats for the conda ecosystem.

The last release (0.2.1) shipped in August 2022. Since then, pandas 2.x changed its categorical handling, newer Python versions broke dask imports, and the build tooling around setup.py and versioneer became outdated. These releases catch up with all of that -- and then some.

Try it in the browser

You can query conda download statistics right in the browser -- no installation required. In the interactive demo, enter a package name, pick a date range, and hit Query.

Open the full demo in a new tab for 15 curated example queries, shareable URLs, and more.

This runs Pyodide (CPython compiled to WebAssembly) directly in your browser with the real condastats package. It uses the pure-pandas query API introduced in 0.4.0, which doesn't require dask or s3fs -- making it lightweight enough to run in WebAssembly. Expect about 20 MB for Pyodide plus 1-15 MB of Parquet data per queried month.
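To make those numbers concrete, here is a back-of-the-envelope sketch of the total download for a query spanning several months. It uses only the figures quoted above; the helper name estimate_download_mb is made up for illustration, and it assumes each month's file is fetched once with nothing already cached.

```python
# Rough download-size estimate for the browser demo, using the figures quoted
# in the text: about 20 MB for Pyodide plus 1-15 MB of Parquet data per month.
PYODIDE_MB = 20
PARQUET_MB_PER_MONTH = (1, 15)  # (low, high) bounds per queried month


def estimate_download_mb(months: int) -> tuple[int, int]:
    """Return (low, high) total megabytes for a query spanning `months` months.

    Assumes every month is downloaded exactly once (no caching).
    """
    if months < 1:
        raise ValueError("months must be at least 1")
    low, high = PARQUET_MB_PER_MONTH
    return PYODIDE_MB + months * low, PYODIDE_MB + months * high


print(estimate_download_mb(1))  # a single month
print(estimate_download_mb(6))  # e.g. 2024-01 through 2024-06
```

Under these assumptions, a single-month query lands between 21 and 35 MB, and a six-month range between 26 and 110 MB.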

What changed: 0.3.0 through 0.4.2

Polished browser demo (0.4.2)

The browser demo ships a full Python REPL (jQuery Terminal + Prism.js) with syntax highlighting, tab completion, and command history. It includes 15 curated example query cards, shareable URLs, and quick date range presets.

Pure-pandas query API (0.4.0)

Three new functions -- query_overall(), query_grouped(), and top_packages() -- operate on any pandas DataFrame without requiring dask or s3fs. As a result, "import condastats" now works in environments where dask isn't available (like Pyodide in the browser), while the existing S3-backed functions continue to work as before.
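To give a feel for what "operate on any pandas DataFrame" means, here is a minimal plain-pandas sketch of this kind of aggregation. It does not call condastats at all: the helper sum_downloads is hypothetical, the column names pkg_name, pkg_platform, and counts are an assumption about the dataset schema, and the numbers are made up. For the real function signatures, check the API reference.

```python
import pandas as pd

# Tiny stand-in for one month of download data. Column names are an assumed
# schema and the counts are invented for illustration.
df = pd.DataFrame(
    {
        "pkg_name": ["pandas", "pandas", "numpy", "scipy"],
        "pkg_platform": ["linux-64", "osx-arm64", "linux-64", "linux-64"],
        "counts": [100, 40, 250, 75],
    }
)


def sum_downloads(frame, packages, by=None):
    """Hypothetical helper: total `counts` per package, optionally split by `by`."""
    # Keep only the rows for the requested packages.
    subset = frame[frame["pkg_name"].isin(packages)]
    # Group by package, plus one extra column if requested.
    keys = ["pkg_name"] if by is None else ["pkg_name", by]
    return subset.groupby(keys)["counts"].sum()


totals = sum_downloads(df, ["pandas", "numpy"])
by_platform = sum_downloads(df, ["pandas"], by="pkg_platform")
print(totals)
print(by_platform)
```

Because nothing here touches S3 or dask, the same pattern runs anywhere pandas does, which is the property the browser demo relies on.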

Dask is now lazily imported only when S3 functions are called, so the library starts up faster even in traditional environments.

Refactored codebase (0.3.0)

The business logic has been extracted from cli.py into a dedicated _core.py module. All public functions now have type hints and NumPy-style docstrings. The CLI gained a --version flag, proper argument validation, and better error messages.

The build system migrated from setup.py + versioneer to pyproject.toml + setuptools_scm. Travis CI has been replaced by GitHub Actions, and Dependabot keeps dependencies up to date. The project now has 31 tests covering all CLI subcommands and the Python API.

Critical bug fixes (0.3.0)

Several issues that made condastats unusable on recent Python and pandas versions have been resolved:

  • TypeError: descriptor '__call__' on Python 3.11+ caused by an older dask version
  • ValueError: Not all columns are categoricals (#19) when reading Parquet files
  • ArrowStringArray requires PyArrow array of string type (#17, #24) from pandas/pyarrow version mismatches

If you tried condastats in the last couple of years and hit errors, these releases fix them.

Modern Python support (0.3.0)

condastats now supports Python 3.10 through 3.14 and requires updated dependencies:

  • numpy>=1.20.0
  • pandas>=2.0.0
  • dask>=2024.5.2
  • pyarrow>=10.0.0

Python 3.8 and 3.9 are no longer supported.

Install and try it

pixi global install condastats
condastats overall numpy --month 2024-06

Or run it without installing, for example with pixi:

pixi x condastats overall numpy --month 2024-06

Quick examples

Compare multiple packages for a specific month:

condastats overall pandas numpy scipy --month 2024-01

Break down downloads by platform:

condastats pkg_platform pandas --month 2024-01

See monthly trends over a range:

condastats overall pandas --start_month 2024-01 --end_month 2024-06 --monthly

Group by conda channel (anaconda vs. conda-forge):

condastats data_source pandas --month 2024-01

All of these work as Python functions too:

from condastats import overall, pkg_platform, data_source

overall(["pandas", "numpy"], month="2024-01")
pkg_platform("pandas", month="2024-01")
data_source("pandas", start_month="2024-01", end_month="2024-06", monthly=True)
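Since the dataset is stored as one Parquet file per month, a range query has to cover every month between its two endpoints. The sketch below shows how a start_month/end_month pair expands into individual months. months_between is a hypothetical helper, not part of condastats, and treating both endpoints as inclusive is an assumption.

```python
from datetime import date


def months_between(start_month: str, end_month: str) -> list[str]:
    """Expand "YYYY-MM" endpoints into every month in the range (inclusive)."""
    # Parsing via fromisoformat also validates the "YYYY-MM" input.
    start = date.fromisoformat(start_month + "-01")
    end = date.fromisoformat(end_month + "-01")
    if start > end:
        raise ValueError("start_month must not be after end_month")
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(f"{year:04d}-{month:02d}")
        # Advance one month, rolling over into the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


print(months_between("2024-01", "2024-06"))
print(months_between("2023-11", "2024-02"))
```

Each month in the resulting list corresponds to one file that a range query would need to read, which is why wide ranges take longer and download more data.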

New documentation

The documentation at condastats.readthedocs.io has been completely rewritten using the Diataxis framework. Its four content types map onto the sections below, with reference material split into API and CLI pages:

Section         What you'll find
Tutorial        Step-by-step guide from your first query to comparing packages
How-to guides   Recipes for filtering, grouping, Jupyter usage, running without installing
API reference   Full autodoc for all Python functions with parameter descriptions
CLI reference   Every subcommand, option, and exit code
Explanation     How the Anaconda dataset works, query pipeline, performance tips

Get involved

condastats is a conda-incubator project. Contributions are welcome.

Originally created by Sophia Man Yang, condastats is now maintained by the conda-incubator community.

Further reading