condastats is back

condastats is a command-line tool and Python library for querying download statistics of conda packages from the Anaconda public dataset. The project hadn't seen a release since August 2022, so we spent some time updating it to work with current Python and pandas versions, cleaning up the codebase, rewriting the documentation, and adding an interactive browser demo. The result is condastats 0.4.2 -- here's what's new and how to use it.
Why this matters
Ever wondered how many times a conda package has been downloaded? Which platforms are popular? Whether conda-forge or defaults drives more installs? condastats answers these questions with a single command:
condastats overall pandas --month 2024-01
Or from Python:
from condastats import overall
overall("pandas", month="2024-01")
The tool queries the Anaconda public dataset, a collection of monthly Parquet files containing download counts for conda packages since January 2017. Think of it as pypistats for the conda ecosystem.
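To make the shape of that data concrete, here is a minimal sketch that builds a tiny stand-in for one monthly file and totals the downloads of one package. The column names (time, data_source, pkg_name, pkg_platform, counts) are an assumption about the dataset's schema, the numbers are invented, and total_downloads is a hypothetical helper rather than part of condastats.

```python
import pandas as pd

# Tiny, made-up stand-in for one monthly Parquet file.
# Column names are an assumption about the dataset schema; counts are invented.
month = pd.DataFrame(
    {
        "time": ["2024-01"] * 4,
        "data_source": ["conda-forge", "anaconda", "conda-forge", "conda-forge"],
        "pkg_name": ["pandas", "pandas", "numpy", "pandas"],
        "pkg_platform": ["linux-64", "win-64", "linux-64", "osx-arm64"],
        "counts": [120, 30, 210, 50],
    }
)


def total_downloads(df: pd.DataFrame, package: str) -> int:
    """Sum the download counts of every row that belongs to one package."""
    return int(df.loc[df["pkg_name"] == package, "counts"].sum())


print(total_downloads(month, "pandas"))  # 120 + 30 + 50 = 200
```

The real files are much larger, but the idea is the same: filter rows by package name, then sum the counts column.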
The last release (0.2.1) shipped in August 2022. Since then, pandas 2.x changed its categorical handling, newer Python versions broke dask imports, and the build tooling around setup.py and versioneer became outdated. These releases catch up with all of that -- and then some.
Try it in the browser
You can query conda download statistics right here -- no installation required. Enter a package name, pick a date range, and hit Query:
Open the full demo in a new tab for 15 curated example queries, shareable URLs, and more.
This runs Pyodide (CPython compiled to WebAssembly) directly in your browser with the real condastats package. It uses the pure-pandas query API introduced in 0.4.0, which doesn't require dask or s3fs -- making it lightweight enough to run in WebAssembly. Expect about 20 MB for Pyodide plus 1-15 MB of Parquet data per queried month.
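As a rough back-of-the-envelope check on those numbers, the sketch below estimates the total download for a multi-month query. It assumes the cost grows linearly with the number of months and that nothing is cached, which may not match how the demo actually behaves; estimate_download_mb is a hypothetical helper.

```python
PYODIDE_MB = 20             # approximate runtime size quoted above
PARQUET_MB_RANGE = (1, 15)  # approximate per-month data size quoted above


def estimate_download_mb(months: int) -> tuple[int, int]:
    """Return a (low, high) estimate in MB for querying `months` months."""
    low, high = PARQUET_MB_RANGE
    return PYODIDE_MB + months * low, PYODIDE_MB + months * high


print(estimate_download_mb(6))  # (26, 110)
```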
What changed: 0.3.0 through 0.4.2
Polished browser demo (0.4.2)
The browser demo ships a full Python REPL (jQuery Terminal + Prism.js) with syntax highlighting, tab completion, and command history. It includes 15 curated example query cards, shareable URLs, and quick date range presets.
Pure-pandas query API (0.4.0)
A new set of functions -- query_overall(), query_grouped(), and top_packages() -- operate on any pandas DataFrame without requiring dask or s3fs. This makes import condastats work in environments where dask isn't available (like Pyodide in the browser), while the existing S3-backed functions continue to work as before.
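The exact signatures of these functions are in the API reference, so the sketch below deliberately does not call them. It only illustrates, in plain pandas on an in-memory DataFrame, the kind of grouping and ranking such an API can offer. The column names are assumed, the data is invented, and downloads_by and top_n are hypothetical helpers.

```python
import pandas as pd

# Invented sample rows; column names are an assumption about the dataset schema.
df = pd.DataFrame(
    {
        "pkg_name": ["pandas", "pandas", "numpy", "numpy", "scipy"],
        "pkg_platform": ["linux-64", "win-64", "linux-64", "osx-arm64", "linux-64"],
        "counts": [120, 30, 200, 10, 40],
    }
)


def downloads_by(df: pd.DataFrame, package: str, column: str) -> pd.Series:
    """Total downloads of one package, split by the values of `column`."""
    rows = df[df["pkg_name"] == package]
    return rows.groupby(column, observed=True)["counts"].sum()


def top_n(df: pd.DataFrame, n: int) -> pd.Series:
    """The n packages with the highest total download counts."""
    return df.groupby("pkg_name", observed=True)["counts"].sum().nlargest(n)


print(downloads_by(df, "pandas", "pkg_platform"))
print(top_n(df, 2))
```

Because nothing here touches S3 or dask, the same code runs anywhere pandas does, including Pyodide.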
Dask is now lazily imported only when S3 functions are called, so the library starts up faster even in traditional environments.
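The general pattern behind a lazy import looks roughly like this. It is a sketch of the technique, not condastats' actual code: require and read_remote are hypothetical names, and the only dask call used is dask.dataframe.read_parquet.

```python
import importlib
from types import ModuleType


def require(module_name: str) -> ModuleType:
    """Import a module only at the moment it is first needed."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is needed for this feature but is not installed"
        ) from exc


def read_remote(path: str):
    """Hypothetical S3-backed reader: dask is imported here, not at module import."""
    dd = require("dask.dataframe")
    return dd.read_parquet(path)
```

Importing a module that defines read_remote costs nothing dask-related; the import only happens when read_remote is called, and a missing dependency produces a clear error at that point.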
Refactored codebase (0.3.0)
The business logic has been extracted from cli.py into a dedicated _core.py module. All public functions now have type hints and NumPy-style docstrings. The CLI gained a --version flag, proper argument validation, and better error messages.
The build system migrated from setup.py + versioneer to pyproject.toml + setuptools_scm. Travis CI has been replaced by GitHub Actions, and Dependabot keeps dependencies up to date. The project now has 31 tests covering all CLI subcommands and the Python API.
Critical bug fixes (0.3.0)
Several issues that made condastats unusable on recent Python and pandas versions have been resolved:
- TypeError: descriptor '__call__' on Python 3.11+, caused by an older dask version
- ValueError: Not all columns are categoricals (#19) when reading Parquet files
- ArrowStringArray requires PyArrow array of string type (#17, #24) from pandas/pyarrow version mismatches
If you tried condastats in the last couple of years and hit errors, these releases fix them.
Modern Python support (0.3.0)
condastats now supports Python 3.10 through 3.14 and requires updated dependencies:
- numpy>=1.20.0
- pandas>=2.0.0
- dask>=2024.5.2
- pyarrow>=10.0.0
Python 3.8 and 3.9 are no longer supported.
Install and try it
With pixi:
pixi global install condastats
condastats overall numpy --month 2024-06
With conda / mamba:
conda install --channel conda-forge condastats
condastats overall numpy --month 2024-06
With uv / pip:
uv pip install condastats
condastats overall numpy --month 2024-06
With pipx:
pipx install condastats
condastats overall numpy --month 2024-06
Or run it without installing -- just pick your tool:
With pixi x:
pixi x condastats overall numpy --month 2024-06
With uvx:
uvx condastats overall numpy --month 2024-06
With pipx run:
pipx run condastats overall numpy --month 2024-06
Quick examples
Compare multiple packages for a specific month:
condastats overall pandas numpy scipy --month 2024-01
Break down downloads by platform:
condastats pkg_platform pandas --month 2024-01
See monthly trends over a range:
condastats overall pandas --start_month 2024-01 --end_month 2024-06 --monthly
Group by conda channel (anaconda vs. conda-forge):
condastats data_source pandas --month 2024-01
All of these work as Python functions too:
from condastats import overall, pkg_platform, data_source
overall(["pandas", "numpy"], month="2024-01")
pkg_platform("pandas", month="2024-01")
data_source("pandas", start_month="2024-01", end_month="2024-06", monthly=True)
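For intuition about what a monthly range query involves, here is a minimal pandas sketch that enumerates the months in a range and then totals one package's downloads per month. It runs on invented data with assumed column names (time, pkg_name, counts); months_between and monthly_totals are hypothetical helpers, not condastats functions.

```python
import pandas as pd


def months_between(start_month: str, end_month: str) -> list[str]:
    """List every month from start to end, inclusive, as 'YYYY-MM' strings."""
    return [str(p) for p in pd.period_range(start_month, end_month, freq="M")]


def monthly_totals(df: pd.DataFrame, package: str) -> pd.Series:
    """Total downloads of one package for each month present in the data."""
    rows = df[df["pkg_name"] == package]
    return rows.groupby("time")["counts"].sum()


# Invented sample rows covering two months.
df = pd.DataFrame(
    {
        "time": ["2024-01", "2024-01", "2024-02", "2024-02"],
        "pkg_name": ["pandas", "pandas", "pandas", "numpy"],
        "counts": [100, 20, 90, 300],
    }
)

print(months_between("2024-01", "2024-06"))
print(monthly_totals(df, "pandas"))
```

A range from 2024-01 to 2024-06 spans six monthly files, which is why longer ranges take noticeably longer to load.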
New documentation
The documentation at condastats.readthedocs.io has been completely rewritten using the Diataxis framework, organized into the following sections:
| Section | What you'll find |
|---|---|
| Tutorial | Step-by-step guide from your first query to comparing packages |
| How-to guides | Recipes for filtering, grouping, Jupyter usage, running without installing |
| API reference | Full autodoc for all Python functions with parameter descriptions |
| CLI reference | Every subcommand, option, and exit code |
| Explanation | How the Anaconda dataset works, query pipeline, performance tips |
Get involved
condastats is a conda-incubator project. Contributions are welcome:
- Try the browser demo: condastats.readthedocs.io
- Report bugs or request features: GitHub Issues
- Install it: conda install --channel conda-forge condastats, or pip install condastats
Originally created by Sophia Man Yang, condastats is now maintained by the conda-incubator community.
