API#

Import scirpy together with scanpy as

import scanpy as sc
import scirpy as ir

For consistency, the scirpy API tries to follow the scanpy API as closely as possible.

Input/Output: io#

Note

scirpy’s data structure was updated in v0.13.0.

Previously, receptor data was expanded into columns of adata.obs; it is now stored as an awkward array in adata.obsm["airr"]. Moreover, we now use MuData to handle paired transcriptomics and AIRR data.

AnnData objects created with older versions of scirpy can be upgraded with scirpy.io.upgrade_schema() to be compatible with the latest version of scirpy.

Please check out

io.upgrade_schema(adata)

Update older versions of a scirpy anndata object to the latest schema.

The following functions import V(D)J information from various formats.

io.read_h5mu(filename[, backed])

Read MuData object from HDF5 file

io.read_h5ad(filename[, backed, as_sparse, ...])

Read .h5ad-formatted hdf5 file.

io.read_10x_vdj(path[, filtered, include_fields])

Read AIRR data from 10x Genomics cell-ranger output.

io.read_tracer(path, **kwargs)

Read data from TraCeR ([SLonnbergP+16]).

io.read_bracer(path, **kwargs)

Read data from BraCeR ([LEM+18]).

io.read_bd_rhapsody(path, **kwargs)

Read IR data from the BD Rhapsody Analysis Pipeline.

io.read_airr(path[, use_umi_count_col, ...])

Read data from AIRR rearrangement format.

io.from_dandelion(dandelion[, transfer, ...])

Import data from Dandelion ([SRB+21]).

Scirpy can export data to the following formats:

io.write_airr(adata, filename, **kwargs)

Export IR data to AIRR Rearrangement tsv format.

io.to_dandelion(adata)

Export data to Dandelion ([SRB+21]).

To convert custom formats into the scirpy data structure (see Storing AIRR rearrangement data in AnnData), we recommend first building a list of AirrCell objects and then converting them into an AnnData object using from_airr_cells(). For more details, check the Data loading tutorial.

io.AirrCell(cell_id[, ...])

Data structure for a Cell with immune receptors.

io.from_airr_cells(airr_cells[, key_added])

Convert a collection of AirrCell objects to AnnData.

io.to_airr_cells(adata, *[, airr_mod, airr_key])

Convert an adata object with IR information back to a list of AirrCell objects.

Preprocessing: pp#

pp.index_chains(adata, *[, filter, ...])

Selects primary/secondary VJ/VDJ chains per cell according to the Immune receptor (IR) model.

pp.merge_airr(adata, adata2, *[, airr_mod, ...])

Merge two AnnData objects with IR information (e.g. BCR with TCR).

pp.ir_dist(adata[, reference, metric, ...])

Computes a sequence-distance metric between all unique VJ CDR3 sequences and between all unique VDJ CDR3 sequences.

Get: get#

The get module allows retrieving AIRR data stored in adata.obsm["airr"] as a per-cell DataFrame or Series.

get.airr(adata, airr_variable[, chain, ...])

Retrieve AIRR variables for each cell, given a specific chain.

get.obs_context(data, temp_cols)

Contextmanager that temporarily adds columns to obs.
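The "temporarily add columns" pattern behind these context managers can be sketched in plain Python. This is a minimal illustration and not scirpy's implementation: the function name `temporary_columns` is hypothetical, a plain dict stands in for the obs table, and raising on a name clash is a choice made for this sketch only.

```python
from contextlib import contextmanager


@contextmanager
def temporary_columns(table, temp_cols):
    """Add `temp_cols` to `table` for the duration of a `with` block.

    `table` is a dict mapping column names to lists (a stand-in for obs).
    """
    clashes = [name for name in temp_cols if name in table]
    if clashes:
        # Assumption of this sketch: refuse to overwrite existing columns.
        raise ValueError(f"columns already exist: {clashes}")
    table.update(temp_cols)
    try:
        yield table
    finally:
        # Always remove the temporary columns, even if the block raised.
        for name in temp_cols:
            table.pop(name, None)


obs = {"sample": ["A", "A", "B"]}
with temporary_columns(obs, {"v_call": ["TRBV1", "TRBV2", "TRBV1"]}) as tmp:
    columns_inside = sorted(tmp)
columns_after = sorted(obs)
```

The `try`/`finally` is what makes the addition temporary: the table is restored whether or not the block completes normally.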

get.airr_context(data, airr_variable[, ...])

Contextmanager that temporarily adds AIRR information to obs.

Tools: tl#

Tools add an interpretable annotation to the AnnData object which usually can be visualized by a corresponding plotting function.

Generic#

tl.group_abundance(adata, groupby[, ...])

Summarizes the number/fraction of cells of a certain category by a certain group.

Quality control#

tl.chain_qc(adata, *[, airr_mod, airr_key, ...])

Perform quality control based on the receptor-chain pairing configuration.

Define and visualize clonotypes#

tl.define_clonotypes(adata, *[, key_added, ...])

Define clonotypes based on CDR3 nucleic acid sequence identity.

tl.define_clonotype_clusters(adata, *[, ...])

Define clonotype clusters.

tl.clonotype_convergence(adata, *, ...[, ...])

Finds evidence for Convergent evolution of clonotypes.

tl.clonotype_network(adata, *[, sequence, ...])

Computes the layout of the clonotype network.

tl.clonotype_network_igraph(adata[, basis, ...])

Get an igraph object representing the clonotype network.

Analyse clonal diversity#

tl.clonal_expansion(adata, *[, target_col, ...])

Adds a column to obs recording which clonotypes are expanded.
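To make the idea concrete, the following standalone sketch labels each cell by the size class of its clonotype. It does not call scirpy; the function name, the `breakpoints` argument, and the label strings are assumptions chosen for illustration.

```python
from collections import Counter


def expansion_labels(clone_ids, breakpoints=(1, 2)):
    """Return one size-class label per cell.

    With breakpoints (1, 2) the classes are "<= 1", "<= 2" and "> 2".
    """
    sizes = Counter(clone_ids)  # number of cells per clonotype
    labels = []
    for cid in clone_ids:
        n = sizes[cid]
        for bp in breakpoints:
            if n <= bp:
                labels.append(f"<= {bp}")
                break
        else:
            # Larger than every breakpoint.
            labels.append(f"> {breakpoints[-1]}")
    return labels


labels = expansion_labels(["a", "a", "a", "b", "c", "c"])
```

Here clonotype "a" has three cells, "b" has one, and "c" has two, so each cell receives the label of its clonotype's size class.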

tl.summarize_clonal_expansion(adata, groupby, *)

Summarizes clonal expansion by a grouping variable.

tl.alpha_diversity(adata, groupby, *[, ...])

Computes the alpha diversity of clonotypes within a group.
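As an illustration of what an alpha diversity score per group looks like, the sketch below computes normalized Shannon entropy from clonotype labels. It is a plain-Python approximation under stated assumptions, not scirpy's code: the choice of metric, the helper names, and the `clone_id` key are all assumed here.

```python
import math
from collections import Counter, defaultdict


def normalized_shannon_entropy(clone_ids):
    """Shannon entropy of clonotype frequencies, scaled to the range [0, 1]."""
    counts = Counter(clone_ids)
    if len(counts) <= 1:
        return 0.0  # a single clonotype (or no cells) carries no diversity
    n = sum(counts.values())
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(len(counts))


def diversity_by_group(cells, groupby, target_col="clone_id"):
    """Compute the score separately for each value of `groupby`."""
    by_group = defaultdict(list)
    for cell in cells:
        by_group[cell[groupby]].append(cell[target_col])
    return {g: normalized_shannon_entropy(ids) for g, ids in by_group.items()}


cells = [
    {"sample": "A", "clone_id": "c1"},
    {"sample": "A", "clone_id": "c2"},
    {"sample": "B", "clone_id": "c1"},
    {"sample": "B", "clone_id": "c1"},
]
scores = diversity_by_group(cells, "sample")
```

Sample A has two equally frequent clonotypes and scores close to 1, while sample B consists of a single clonotype and scores 0.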

tl.repertoire_overlap(adata, groupby, *[, ...])

Compute distance between cell groups based on clonotype overlap.
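One way to turn clonotype overlap into a distance is the Jaccard distance between the sets of clonotypes observed in two groups. The sketch below assumes that metric for illustration and does not reproduce scirpy's implementation; the function name is hypothetical.

```python
def jaccard_distance(clonotypes_a, clonotypes_b):
    """1 - |A ∩ B| / |A ∪ B| for two collections of clonotype labels."""
    a, b = set(clonotypes_a), set(clonotypes_b)
    union = a | b
    if not union:
        return 0.0  # two empty repertoires are treated as identical here
    return 1.0 - len(a & b) / len(union)


dist = jaccard_distance(["c1", "c2", "c3"], ["c2", "c3", "c4"])
```

The two groups share two of four distinct clonotypes, so the distance is 0.5. Note that this variant ignores clonotype sizes; abundance-aware metrics would weight each clonotype by its cell count.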

tl.clonotype_modularity(adata[, target_col, ...])

Identifies clonotypes or clonotype clusters consisting of cells that are more transcriptionally related than expected by chance by computing the Clonotype modularity.

tl.clonotype_imbalance(adata, replicate_col, ...)

Aims to find clonotypes that are the most enriched or depleted in a category.

Query reference databases#

tl.ir_query(adata, reference, *[, sequence, ...])

Query a reference database for matching immune cell receptors.

tl.ir_query_annotate(adata, reference, *[, ...])

Annotate cells based on the result of ir_query().

tl.ir_query_annotate_df(adata, reference, *)

Returns the inner join of adata.obs with matching entries from reference.obs based on the result of ir_query().

V(D)J gene usage#

tl.spectratype(adata[, chain, cdr3_col, ...])

Summarizes the distribution of CDR3 region lengths.
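The underlying summary is a histogram of sequence lengths. A minimal standalone sketch, with a hypothetical function name and made-up example sequences, could look like this:

```python
from collections import Counter


def cdr3_length_distribution(cdr3_seqs):
    """Count how many sequences have each length, skipping missing values."""
    lengths = Counter(len(seq) for seq in cdr3_seqs if seq)
    return dict(sorted(lengths.items()))


# Made-up sequences for illustration; None marks a cell without this chain.
distribution = cdr3_length_distribution(["CASSL", "CASSF", "CASSLGQ", None])
```

Two sequences have length 5 and one has length 7; the missing value is ignored.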

Plotting: pl#

Generic#

pl.embedding(adata, basis, *[, color, ...])

A customized wrapper to the scanpy.pl.embedding() function.

Tools#

Each of these plotting functions has a corresponding tool in the scirpy.tl section. Depending on the computational load, tools are either invoked on the fly when calling the plotting function or need to be precomputed and stored in AnnData beforehand.

pl.alpha_diversity(adata, groupby, *[, ...])

Plot the alpha diversity per group.

pl.clonal_expansion(adata, groupby, *[, ...])

Visualize clonal expansion.

pl.group_abundance(adata, groupby[, ...])

Plots the number of cells per group, split up by a categorical variable.

pl.spectratype(adata[, chain, cdr3_col, ...])

Show the distribution of CDR3 region lengths.

pl.vdj_usage(adata, *[, vdj_cols, ...])

Creates a ribbon plot of the most abundant VDJ combinations.

pl.repertoire_overlap(adata, groupby, *[, ...])

Visualizes overlap between a pair of samples on a scatter plot or

pl.clonotype_modularity(adata[, ax, ...])

Plots the Clonotype modularity score against the associated log10 p-value.

pl.clonotype_network(adata, *[, color, ...])

Plot the Clonotype network.

pl.clonotype_imbalance(adata, replicate_col, ...)

Aims to find clonotypes that are the most enriched or depleted in a category.

Base plotting functions: pl.base#

pl.base.bar(data, *[, ax, stacked, style, ...])

Basic plotting function built on top of bar plot in Pandas.

pl.base.line(data, *[, ax, style, ...])

Basic plotting function built on top of line plot in Pandas.

pl.base.barh(data, *[, ax, style, ...])

Basic plotting function built on top of bar plot in Pandas.

pl.base.curve(data, *[, ax, curve_layout, ...])

Basic plotting function for drawing KDE-smoothed curves.

Plot styling: pl.styling#

pl.styling.apply_style_to_axes(ax, style, ...)

Apply a predefined style to an axis object.

pl.styling.style_axes(ax[, title, ...])

Style an axes object.

Datasets: datasets#

Example datasets#

datasets.wu2020()

Return the dataset from [WMdA+20] as MuData object.

datasets.wu2020_3k()

Return the dataset from [WMdA+20] as AnnData object, downsampled to 3000 TCR-containing cells.

datasets.maynard2020()

Return the dataset from [MMR+20] as AnnData object.

Reference databases#

datasets.vdjdb([cached, cache_path])

Download VDJdb and process it into an AnnData object.

datasets.iedb([cached, cache_path])

Download IEDB v3 and process it into an AnnData object.

A reference database is also just a Scirpy-formatted AnnData object. This means you can follow the instructions in the data loading tutorial to build a custom reference database.

Utility functions: util#

util.DataHandler(data[, airr_mod, airr_key, ...])

Transparent access to airr modality in both AnnData and MuData objects.

util.graph.layout_components(graph[, ...])

Compute a graph layout by laying out each connected component individually.

util.graph.layout_fr_size_aware(graph, *[, ...])

Compute the Fruchterman-Reingold layout respecting node sizes.

util.graph.igraph_from_sparse_matrix(matrix, *)

Get an igraph object from an adjacency or distance matrix.

IR distance utilities: ir_dist#

ir_dist.sequence_dist(seqs[, seqs2, metric, ...])

Calculate a sequence x sequence distance matrix.

Distance metrics#

ir_dist.metrics.DistanceCalculator(cutoff)

Abstract base class for a CDR3-sequence distance calculator.

ir_dist.metrics.ParallelDistanceCalculator(...)

Abstract base class for a DistanceCalculator that computes distances in parallel.

ir_dist.metrics.IdentityDistanceCalculator([...])

Calculates the Identity-distance between CDR3 sequences.

ir_dist.metrics.LevenshteinDistanceCalculator([...])

Calculates the Levenshtein edit-distance between sequences.
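For readers unfamiliar with the metric, the Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one sequence into another. The following is a textbook dynamic-programming sketch, not the calculator's actual implementation, and it ignores any distance cutoff.

```python
def levenshtein(a, b):
    """Edit distance between strings `a` and `b`, keeping one row in memory."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep the shorter row
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,  # deletion
                    current[j - 1] + 1,  # insertion
                    previous[j - 1] + (char_a != char_b),  # substitution
                )
            )
        previous = current
    return previous[-1]


example = levenshtein("kitten", "sitting")
```

Unlike the Hamming distance, this metric is defined for sequences of different lengths.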

ir_dist.metrics.HammingDistanceCalculator([...])

Computes pairwise distances between gene sequences based on the "hamming" distance metric.
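The Hamming distance counts the positions at which two equal-length sequences differ. A minimal sketch follows; returning None for sequences of unequal length is an assumption made for this example and may not match how the calculator treats such pairs.

```python
def hamming(a, b):
    """Number of mismatching positions, or None if the lengths differ."""
    if len(a) != len(b):
        return None  # assumption: unequal lengths are not comparable
    return sum(x != y for x, y in zip(a, b))


one_mismatch = hamming("CASSL", "CASSF")
```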

ir_dist.metrics.AlignmentDistanceCalculator(...)

Calculates distance between sequences based on pairwise sequence alignment.

ir_dist.metrics.FastAlignmentDistanceCalculator([...])

Calculates distance between sequences based on pairwise sequence alignment.

ir_dist.metrics.TCRdistDistanceCalculator([...])

Computes pairwise distances between TCR CDR3 sequences based on the "tcrdist" distance metric.