Import scirpy together with scanpy as

import scanpy as sc
import scirpy as ir

For consistency, the scirpy API tries to follow the scanpy API as closely as possible.

Input/Output: io#


scirpy’s data structure has been updated in v0.13.0.

Previously, receptor data was expanded into columns of adata.obs. It is now stored as an awkward array in adata.obsm["airr"]. In addition, scirpy now uses MuData to handle paired transcriptomics and AIRR data.

AnnData objects created with older versions of scirpy can be upgraded with scirpy.io.upgrade_schema() to be compatible with the latest version of scirpy.

Please check out


io.upgrade_schema(adata)

Update older versions of a scirpy anndata object to the latest schema.

The following functions import V(D)J information from various formats.

io.read_h5mu(filename[, backed])

Read MuData object from HDF5 file

io.read_h5ad(filename[, backed, as_sparse, ...])

Read .h5ad-formatted hdf5 file.

io.read_10x_vdj(path[, filtered, include_fields])

Read AIRR data from 10x Genomics cell-ranger output.

io.read_tracer(path, **kwargs)

Read data from TraCeR ([SLonnbergP+16]).

io.read_bracer(path, **kwargs)

Read data from BraCeR ([LEM+18]).

io.read_bd_rhapsody(path, **kwargs)

Read IR data from the BD Rhapsody Analysis Pipeline.

io.read_airr(path[, use_umi_count_col, ...])

Read data from AIRR rearrangement format.

io.from_dandelion(dandelion[, transfer, ...])

Import data from Dandelion ([SRB+21]).

Scirpy can export data to the following formats:

io.write_airr(adata, filename, **kwargs)

Export IR data to AIRR Rearrangement tsv format.


Export data to Dandelion ([SRB+21]).

To convert a custom format into the scirpy data structure (see Storing AIRR rearrangement data in AnnData), we recommend first building a list of AirrCell objects and then converting them into an AnnData object using from_airr_cells(). For more details, see the Data loading tutorial.

io.AirrCell(cell_id[, ...])

Data structure for a Cell with immune receptors.

io.from_airr_cells(airr_cells[, key_added])

Convert a collection of AirrCell objects to AnnData.

io.to_airr_cells(adata, *[, airr_mod, airr_key])

Convert an adata object with IR information back to a list of AirrCell objects.
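The recommended route for custom formats (build AirrCell objects, then call from_airr_cells()) could look roughly like the sketch below. The input record keys (barcode, chain_type, cdr3, v_gene, j_gene) and both helper functions are hypothetical and only illustrate the idea; the assumption that every chain is productive is also ours. The Data loading tutorial remains the authoritative reference.

```python
from typing import Iterable, Mapping


def record_to_chain_fields(record: Mapping) -> dict:
    """Map one record of a hypothetical custom format onto AIRR field names."""
    return {
        "locus": record["chain_type"],  # e.g. "TRA" or "TRB"
        "junction_aa": record["cdr3"],
        "v_call": record.get("v_gene"),
        "j_call": record.get("j_gene"),
        "productive": True,  # assumption: the custom format lists productive chains only
    }


def records_to_anndata(records: Iterable[Mapping]):
    """Group records by barcode into AirrCell objects and convert them to AnnData."""
    import scirpy as ir  # imported here so the mapping helper works without scirpy

    cells = {}
    for record in records:
        barcode = record["barcode"]
        if barcode not in cells:
            cells[barcode] = ir.io.AirrCell(cell_id=barcode)
        chain = ir.io.AirrCell.empty_chain_dict()  # start from an empty chain template
        chain.update(record_to_chain_fields(record))
        cells[barcode].add_chain(chain)
    return ir.io.from_airr_cells(list(cells.values()))
```

Keeping the field mapping in its own function makes it easy to check the translation of a custom format before any scirpy object is created.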

Preprocessing: pp#

pp.index_chains(adata, *[, filter, ...])

Selects primary/secondary VJ/VDJ chains per cell according to the Immune receptor (IR) model.

pp.merge_airr(adata, adata2, *[, airr_mod, ...])

Merge two AnnData objects with IR information (e.g. BCR with TCR).

pp.ir_dist(adata[, reference, metric, ...])

Computes a sequence-distance metric between all unique VJ CDR3 sequences and between all unique VDJ CDR3 sequences.

Get: get#

The get module allows retrieving AIRR data stored in adata.obsm["airr"] as a per-cell DataFrame or Series.

get.airr(adata, airr_variable[, chain, ...])

Retrieve AIRR variables for each cell, given a specific chain.

get.obs_context(data, temp_cols)

Contextmanager that temporarily adds columns to obs.

get.airr_context(data, airr_variable[, ...])

Contextmanager that temporarily adds AIRR information to obs.
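The "temporarily add columns" pattern behind the two context managers can be illustrated with a small stand-in. The sketch below is not scirpy's implementation: it uses a plain dict of column name to values in place of obs, and the function name temp_columns is made up for illustration.

```python
from contextlib import contextmanager


@contextmanager
def temp_columns(table: dict, temp_cols: dict):
    """Temporarily add columns to `table` (a dict of column name -> list of values)."""
    clashes = set(table) & set(temp_cols)
    if clashes:
        raise ValueError(f"Columns already exist: {sorted(clashes)}")
    table.update(temp_cols)
    try:
        yield table
    finally:
        # Remove the temporary columns again, even if the body raised an error.
        for name in temp_cols:
            table.pop(name, None)


obs = {"sample": ["A", "A", "B"]}
with temp_columns(obs, {"junction_aa": ["CASSL", "CAVRD", "CASSL"]}) as tmp:
    columns_inside = sorted(tmp)
columns_after = sorted(obs)
```

Inside the with block the extra column is available for grouping or plotting; afterwards the table is back to its original columns.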

Tools: tl#

Tools add an interpretable annotation to the AnnData object, which can usually be visualized by a corresponding plotting function.


tl.group_abundance(adata, groupby[, ...])

Summarizes the number/fraction of cells of a certain category by a certain group.
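To make "number/fraction of cells of a certain category by a certain group" concrete, here is a minimal pure-Python sketch. It is an illustration only, not scirpy's implementation; the function name count_by_group and its arguments are hypothetical, and fractions are normalized by group size as an assumption.

```python
from collections import Counter, defaultdict


def count_by_group(groups, categories, fraction=False):
    """Count cells per category within each group; optionally return fractions."""
    counts = defaultdict(Counter)
    for group, category in zip(groups, categories):
        counts[group][category] += 1
    if not fraction:
        return {group: dict(c) for group, c in counts.items()}
    # Assumption: fractions are relative to the number of cells in each group.
    return {
        group: {cat: n / sum(c.values()) for cat, n in c.items()}
        for group, c in counts.items()
    }


samples = ["s1", "s1", "s1", "s2"]
cell_types = ["CD8", "CD8", "CD4", "CD4"]
abundance = count_by_group(samples, cell_types)
fractions = count_by_group(samples, cell_types, fraction=True)
```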

Quality control#

tl.chain_qc(adata, *[, airr_mod, airr_key, ...])

Perform quality control based on the receptor-chain pairing configuration.

Define and visualize clonotypes#

tl.define_clonotypes(adata, *[, key_added, ...])

Define clonotypes based on CDR3 nucleic acid sequence identity.

tl.define_clonotype_clusters(adata, *[, ...])

Define clonotype clusters.

tl.clonotype_convergence(adata, *, ...[, ...])

Finds evidence for Convergent evolution of clonotypes.

tl.clonotype_network(adata, *[, sequence, ...])

Computes the layout of the clonotype network.

tl.clonotype_network_igraph(adata[, basis, ...])

Get an igraph object representing the clonotype network.
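The idea of defining clonotypes by CDR3 nucleic acid sequence identity can be sketched in a few lines. This is a simplified illustration, not what define_clonotypes() does internally: it assumes exactly one VJ and one VDJ sequence per cell, and the function name assign_clonotype_ids is hypothetical.

```python
def assign_clonotype_ids(cells: dict) -> dict:
    """Give cells with identical (VJ, VDJ) CDR3 nucleotide sequences the same ID.

    `cells` maps a cell ID to a tuple (vj_cdr3_nt, vdj_cdr3_nt).
    """
    id_of_receptor = {}
    clone_ids = {}
    for cell_id, receptor in cells.items():
        if receptor not in id_of_receptor:
            # First time this receptor is seen: open a new clonotype.
            id_of_receptor[receptor] = str(len(id_of_receptor))
        clone_ids[cell_id] = id_of_receptor[receptor]
    return clone_ids


clone_ids = assign_clonotype_ids(
    {
        "cell1": ("TGTGCC", "TGTGCCAGC"),
        "cell2": ("TGTGCC", "TGTGCCAGC"),
        "cell3": ("TGTGCT", "TGTGCCAGC"),
    }
)
```

Cells 1 and 2 share both sequences and end up in the same clonotype, while cell 3 differs in its VJ sequence and forms its own.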

Analyse clonal diversity#

tl.clonal_expansion(adata, *[, target_col, ...])

Adds a column to obs recording which clonotypes are expanded.

tl.summarize_clonal_expansion(adata, groupby, *)

Summarizes clonal expansion by a grouping variable.

tl.alpha_diversity(adata, groupby, *[, ...])

Computes the alpha diversity of clonotypes within a group.

tl.repertoire_overlap(adata, groupby, *[, ...])

Compute distance between cell groups based on clonotype overlap.

tl.clonotype_modularity(adata[, target_col, ...])

Identifies clonotypes or clonotype clusters consisting of cells that are more transcriptionally related than expected by chance by computing the Clonotype modularity.

tl.clonotype_imbalance(adata, replicate_col, ...)

Aims to find clonotypes that are the most enriched or depleted in a category.
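As an illustration of alpha diversity within a group, the sketch below computes a normalized Shannon entropy over clonotype counts. Choosing this particular metric is an assumption made for the example, and the function is a stand-alone sketch rather than scirpy's code.

```python
import math
from collections import Counter


def normalized_shannon_entropy(clone_ids) -> float:
    """Shannon entropy of clonotype frequencies, scaled to the range [0, 1]."""
    counts = Counter(clone_ids)
    if len(counts) <= 1:
        return 0.0  # a single clonotype (or no cells) carries no diversity
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return entropy / math.log(len(counts))


even = normalized_shannon_entropy(["a", "b", "c", "d"])
skewed = normalized_shannon_entropy(["a", "a", "a", "b"])
single = normalized_shannon_entropy(["a", "a", "a"])
```

A group in which every cell has its own clonotype scores close to 1, while a group dominated by one expanded clonotype scores closer to 0.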

Query reference databases#

tl.ir_query(adata, reference, *[, sequence, ...])

Query a reference database for matching immune cell receptors.

tl.ir_query_annotate(adata, reference, *[, ...])

Annotate cells based on the result of ir_query().

tl.ir_query_annotate_df(adata, reference, *)

Returns the inner join of adata.obs with matching entries from reference.obs based on the result of ir_query().

V(D)J gene usage#

tl.spectratype(adata[, chain, cdr3_col, ...])

Summarizes the distribution of CDR3 region lengths.
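A spectratype is essentially a histogram of CDR3 lengths. The following pure-Python sketch shows the underlying computation on a list of sequences; it is illustrative only, and the function name cdr3_length_distribution is hypothetical.

```python
from collections import Counter


def cdr3_length_distribution(sequences) -> dict:
    """Count how many CDR3 sequences have each length, skipping missing values."""
    lengths = Counter(len(seq) for seq in sequences if seq)
    return dict(sorted(lengths.items()))


distribution = cdr3_length_distribution(["CASSL", "CAVRD", "CASSLGT", None, ""])
```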

Plotting: pl#


pl.embedding(adata, basis, *[, color, ...])

A customized wrapper to the scanpy.pl.embedding() function.


Each of these plotting functions has a corresponding tool in the scirpy.tl section. Depending on the computational load, tools are either invoked on the fly when the plotting function is called, or must be precomputed and stored in AnnData beforehand.

pl.alpha_diversity(adata, groupby, *[, ...])

Plot the alpha diversity per group.

pl.clonal_expansion(adata, groupby, *[, ...])

Visualize clonal expansion.

pl.group_abundance(adata, groupby[, ...])

Plots the number of cells per group, split up by a categorical variable.

pl.spectratype(adata[, chain, cdr3_col, ...])

Show the distribution of CDR3 region lengths.

pl.vdj_usage(adata, *[, vdj_cols, ...])

Creates a ribbon plot of the most abundant VDJ combinations.

pl.repertoire_overlap(adata, groupby, *[, ...])

Visualizes overlap between a pair of samples on a scatter plot or

pl.clonotype_modularity(adata[, ax, ...])

Plots the Clonotype modularity score against the associated log10 p-value.

pl.clonotype_network(adata, *[, color, ...])

Plot the Clonotype network.

pl.clonotype_imbalance(adata, replicate_col, ...)

Aims to find clonotypes that are the most enriched or depleted in a category.
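The split between precomputed tools and plots that summarize on the fly might be used as in the sketch below. It assumes that the default arguments of each function are sufficient and that groupby names an existing categorical column; the wrapper function itself is hypothetical. Please check the individual function pages for the exact parameters.

```python
def plot_clonal_expansion_by(adata, groupby: str):
    """Precompute clonotypes, then plot clonal expansion per `groupby` category."""
    import scirpy as ir  # imported lazily; requires scirpy to be installed

    # Precomputed steps: their results are stored in the data object.
    ir.pp.index_chains(adata)
    ir.pp.ir_dist(adata)
    ir.tl.define_clonotypes(adata)

    # Plotting step: assumed to summarize the stored clonotypes when called.
    return ir.pl.clonal_expansion(adata, groupby=groupby)
```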

Base plotting functions: pl.base#

pl.base.bar(data, *[, ax, stacked, style, ...])

Basic plotting function built on top of bar plot in Pandas.

pl.base.line(data, *[, ax, style, ...])

Basic plotting function built on top of line plot in Pandas.

pl.base.barh(data, *[, ax, style, ...])

Basic plotting function built on top of horizontal bar plot in Pandas.

pl.base.curve(data, *[, ax, curve_layout, ...])

Basic plotting function for drawing KDE-smoothed curves.

Plot styling: pl.styling#

pl.styling.apply_style_to_axes(ax, style, ...)

Apply a predefined style to an axis object.

pl.styling.style_axes(ax[, title, ...])

Style an axes object.

Datasets: datasets#

Example datasets#


Return the dataset from [WMdA+20] as MuData object.


Return the dataset from [WMdA+20] as AnnData object, downsampled to 3000 TCR-containing cells.


Return the dataset from [MMR+20] as AnnData object.

Reference databases#

datasets.vdjdb([cached, cache_path])

Download VDJdb and process it into an AnnData object.

datasets.iedb([cached, cache_path])

Download IEDB v3 and process it into an AnnData object.

A reference database is also just a Scirpy-formatted AnnData object. This means you can follow the instructions in the data loading tutorial to build a custom reference database.

Utility functions: util#

util.DataHandler(data[, airr_mod, airr_key, ...])

Transparent access to airr modality in both AnnData and MuData objects.

util.graph.layout_components(graph[, ...])

Compute a graph layout by laying out all connected components individually.

util.graph.layout_fr_size_aware(graph, *[, ...])

Compute the Fruchterman-Reingold layout respecting node sizes.

util.graph.igraph_from_sparse_matrix(matrix, *)

Get an igraph object from an adjacency or distance matrix.

IR distance utilities: ir_dist#

ir_dist.sequence_dist(seqs[, seqs2, metric, ...])

Calculate a sequence x sequence distance matrix.

distance metrics#


Abstract base class for a CDR3-sequence distance calculator.


Abstract base class for a DistanceCalculator that computes distances in parallel.


Calculates the Identity-distance between CDR3 sequences.


Calculates the Levenshtein edit-distance between sequences.


Calculates the Hamming distance between sequences of identical length.


Calculates distance between sequences based on pairwise sequence alignment.


Calculates distance between sequences based on pairwise sequence alignment.


Computes pairwise distances between TCR CDR3 sequences based on the "tcrdist" distance metric.
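For readers unfamiliar with the Hamming and Levenshtein metrics named above, here are minimal reference implementations on plain strings. They only illustrate the definitions; they are not scirpy's distance calculators, which operate on whole sets of sequences.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of identical length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,  # delete x
                    current[j - 1] + 1,  # insert y
                    previous[j - 1] + (x != y),  # substitute, or match at no cost
                )
            )
        previous = current
    return previous[-1]


d_hamming = hamming("CASSL", "CASTL")
d_levenshtein = levenshtein("CASSL", "CASSLGT")
```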