scirpy.datasets.iggytop

scirpy.datasets.iggytop(*, deduplicated=True, tag='latest')

Return the IggyTop database as an AnnData object.

IggyTop (Immunological Graph Yielding Top receptor-epitope pairings) is a harmonized database of immunoreceptor-epitope pairings integrating data from multiple sources: IEDB, VDJdb, McPAS-TCR, CEDAR, ITRAP, TRAIT, TCR3d, and NeoTCR. V(D)J genes are normalized to IMGT standards and CDR3 sequences are harmonized following AIRR standards. Pre-built datasets are released bimonthly.

By default, a deduplicated version of the dataset is returned. Use this version if you’d like to work with the integrated resource combining data from all source datasets. If you prefer to work with a single resource, set deduplicated=False and filter for the resource of interest via .obs["source"].
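The single-resource workflow described above could look roughly like the following sketch. The helper names select_source and load_iggytop_source are hypothetical and not part of scirpy, and the label "VDJdb" in the usage comment is an assumption; inspect adata.obs["source"].unique() to see the labels actually used in your release.

```python
def select_source(adata, source):
    """Keep only the records whose obs["source"] equals `source`.

    Hypothetical helper, not part of scirpy.
    """
    # Boolean mask over observations, one entry per receptor-epitope record
    mask = adata.obs["source"] == source
    # .copy() turns the AnnData view into an independent object
    return adata[mask].copy()


def load_iggytop_source(source, *, tag="latest"):
    """Download the full merged dataset and restrict it to one source.

    Hypothetical helper, not part of scirpy.
    """
    import scirpy as ir  # imported lazily; the call below may trigger a download

    # deduplicated=False keeps all source records, as required for per-source filtering
    adata = ir.datasets.iggytop(deduplicated=False, tag=tag)
    return select_source(adata, source)


# Usage (assumes "VDJdb" is a valid label in obs["source"]):
# vdjdb_only = load_iggytop_source("VDJdb")
```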

Note

Scirpy datasets are managed through Pooch.

By default, the dataset will be downloaded into your operating system’s default cache directory (see pooch.os_cache() for more details). If it has already been downloaded, it will be retrieved from the cache.

You can override the default cache dir by setting the SCIRPY_DATA_DIR environment variable to a path of your preference.
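As a minimal sketch, the variable can also be set from within Python. The helper set_scirpy_data_dir and the example path are hypothetical. When exactly scirpy reads the variable is not stated here, so setting it before importing scirpy is the cautious choice.

```python
import os
from pathlib import Path


def set_scirpy_data_dir(path):
    """Point SCIRPY_DATA_DIR at `path`, creating the directory if needed.

    Hypothetical helper, not part of scirpy.
    """
    cache_dir = Path(path).expanduser().resolve()
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Only affects the current process and its children
    os.environ["SCIRPY_DATA_DIR"] = str(cache_dir)
    return cache_dir


# Usage (the path is an arbitrary example):
# set_scirpy_data_dir("~/data/scirpy_cache")
# import scirpy as ir
# adata = ir.datasets.iggytop()
```

Exporting the variable in your shell profile achieves the same effect for all sessions.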

Parameters:
  • deduplicated (bool (default: True)) – If True, return the deduplicated and 10X-filtered dataset. If False, return the full merged dataset including all source records.

  • tag (str (default: 'latest')) – The IggyTop release tag to use. Defaults to "latest", which always fetches the most recent release. For reproducibility, pin a specific release tag (e.g. "data-2026.04.25.075304").
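For analyses that need to be reproducible, one possible pattern is a thin wrapper that refuses the floating "latest" tag. The wrapper load_iggytop_pinned is hypothetical and not part of scirpy; the tag in the usage comment is the example given above and may not correspond to a release you have access to.

```python
def load_iggytop_pinned(tag, *, deduplicated=True):
    """Load IggyTop for an explicit release tag, rejecting "latest".

    Hypothetical wrapper, not part of scirpy.
    """
    if tag == "latest":
        raise ValueError(
            'Pin a specific release tag instead of "latest" for reproducible results.'
        )

    import scirpy as ir  # imported lazily; the call below may trigger a download

    return ir.datasets.iggytop(deduplicated=deduplicated, tag=tag)


# Usage (tag taken from the parameter description):
# adata = load_iggytop_pinned("data-2026.04.25.075304")
```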

Return type:

AnnData

Returns:

An AnnData object containing immunoreceptor-epitope pairings from IggyTop in obsm["airr"]. Each entry is represented as if it were a cell, but without gene expression data.