API#

Import scirpy together with scanpy as

import scanpy as sc
import scirpy as ir

For consistency, the scirpy API tries to follow the scanpy API as closely as possible.

Input/Output: `io`#

Note

scirpy’s data structure has been updated in v0.13.0.

Previously, receptor data was expanded into columns of adata.obs; it is now stored as an awkward array in adata.obsm["airr"]. Moreover, we now use MuData to handle paired transcriptomics and AIRR data.

AnnData objects created with older versions of scirpy can be upgraded with scirpy.io.upgrade_schema() to be compatible with the latest version of scirpy.

Please check out the release notes for details about the changes and the documentation about Scirpy’s data structure.
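
To make the difference between the two layouts concrete, here is a minimal plain-Python sketch. It does not use scirpy, awkward arrays, or AnnData; the records, the `to_fixed_columns` helper, and the `n_slots` parameter are hypothetical and only illustrate why a ragged per-cell layout avoids the padding that fixed columns need.

```python
# Ragged layout: one list of chain records per cell (lengths differ per cell).
# All sequences below are made-up illustration data.
ragged = [
    [
        {"locus": "TRA", "junction_aa": "CAVRDSNYQLIW"},
        {"locus": "TRB", "junction_aa": "CASSLGQAYEQYF"},
    ],
    [{"locus": "TRB", "junction_aa": "CASSPGTGELFF"}],
    [],  # a cell without any detected receptor chain
]


def to_fixed_columns(cells, n_slots=2):
    """Expand ragged chain lists into fixed-width rows, padding with None."""
    rows = []
    for chains in cells:
        seqs = [chain["junction_aa"] for chain in chains[:n_slots]]
        seqs += [None] * (n_slots - len(seqs))
        rows.append(seqs)
    return rows


chains_per_cell = [len(chains) for chains in ragged]
flat = to_fixed_columns(ragged)
```

In the fixed-column form every cell occupies `n_slots` entries regardless of how many chains it has, and chains beyond `n_slots` are dropped; the ragged form keeps exactly what was observed.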

`io.upgrade_schema`(adata)	Update older versions of a scirpy anndata object to the latest schema.

The following functions allow importing V(D)J information from various formats.

`io.read_h5mu`(filename[, backed])	Read MuData object from HDF5 file
`io.read_h5ad`(filename[, backed, as_sparse, ...])	Read `.h5ad`-formatted hdf5 file.
`io.read_10x_vdj`(path[, filtered, include_fields])	Read AIRR data from 10x Genomics cell-ranger output.
`io.read_tracer`(path, **kwargs)	Read data from TraCeR ([SLonnbergP+16]).
`io.read_bracer`(path, **kwargs)	Read data from BraCeR ([LEM+18]).
`io.read_bd_rhapsody`(path, **kwargs)	Read IR data from the BD Rhapsody Analysis Pipeline.
`io.read_airr`(path[, use_umi_count_col, ...])	Read data from AIRR rearrangement format.
`io.from_dandelion`(dandelion[, transfer, ...])	Import data from Dandelion ([SRB+21]).

Scirpy can export data to the following formats:

`io.write_airr`(adata, filename, **kwargs)	Export IR data to AIRR Rearrangement tsv format.
`io.to_dandelion`(adata)	Export data to Dandelion ([SRB+21]).

To convert your own formats into the scirpy data structure (see Storing AIRR rearrangement data in AnnData), we recommend building a list of AirrCell objects first and then converting them into an AnnData object using from_airr_cells(). For more details, check the Data loading tutorial.

`io.AirrCell`(cell_id[, ...])	Data structure for a Cell with immune receptors.
`io.from_airr_cells`(airr_cells[, key_added])	Convert a collection of `AirrCell` objects to `AnnData`.
`io.to_airr_cells`(adata, *[, airr_mod, airr_key])	Convert an adata object with IR information back to a list of `AirrCell` objects.
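
The first step of such a conversion, grouping chain records by cell, can be sketched without scirpy. The input format, its column names (`barcode`, `chain`, `cdr3`, `umis`), and the `rows_to_cells` helper are hypothetical; the output keys are chosen to resemble AIRR field names, but this is an assumption of the sketch and not a statement about what `AirrCell` requires.

```python
import csv
import io
from collections import defaultdict

# Hypothetical custom export: one tab-separated row per receptor chain.
RAW = (
    "barcode\tchain\tcdr3\tumis\n"
    "AAAC-1\tTRA\tCAVRDSNYQLIW\t5\n"
    "AAAC-1\tTRB\tCASSLGQAYEQYF\t12\n"
    "TTTG-1\tTRB\tCASSPGTGELFF\t7\n"
)


def rows_to_cells(text):
    """Group chain rows by cell barcode and rename columns to AIRR-style keys."""
    cells = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        cells[row["barcode"]].append(
            {
                "locus": row["chain"],
                "junction_aa": row["cdr3"],
                "umi_count": int(row["umis"]),  # illustrative key name
            }
        )
    return dict(cells)


cells = rows_to_cells(RAW)
```

Each barcode with its list of chain records corresponds to what would become one `AirrCell`; the Data loading tutorial describes the actual methods for populating those objects.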

Preprocessing: `pp`#

`pp.index_chains`(adata, *[, filter, ...])	Selects primary/secondary VJ/VDJ chains per cell according to the Immune receptor (IR) model.
`pp.merge_airr`(adata, adata2, *[, airr_mod, ...])	Merge two AnnData objects with IR information (e.g. BCR with TCR).
`pp.ir_dist`(adata[, reference, metric, ...])	Computes a sequence-distance metric between all unique VJ CDR3 sequences and between all unique VDJ CDR3 sequences.
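
The idea of computing distances only between unique sequences can be illustrated as follows. This is a plain-Python sketch, not scirpy's implementation: the `levenshtein` and `unique_distance_matrix` helpers are hypothetical, Levenshtein distance is picked as one example metric, and the result is a dense nested list rather than whatever storage scirpy uses.

```python
def levenshtein(a, b):
    """Edit distance computed with the classic two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(
                min(
                    prev[j] + 1,  # deletion
                    curr[j - 1] + 1,  # insertion
                    prev[j - 1] + (ca != cb),  # substitution (0 if equal)
                )
            )
        prev = curr
    return prev[-1]


def unique_distance_matrix(seqs, dist=levenshtein):
    """Deduplicate sequences, then compute all pairwise distances."""
    uniq = sorted(set(seqs))
    matrix = [[dist(a, b) for b in uniq] for a in uniq]
    return uniq, matrix


# Made-up sequences; the duplicate is only computed once.
seqs = ["CASSF", "CASSF", "CASTF", "CASSLF"]
uniq, matrix = unique_distance_matrix(seqs)
```

Deduplicating first keeps the matrix at (number of unique sequences) squared entries, which matters when many cells share the same CDR3 sequence.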

Get: `get`#

The get module allows retrieving AIRR data stored in adata.obsm["airr"] as a per-cell DataFrame or Series.

`get.airr`(adata, airr_variable[, chain, ...])	Retrieve AIRR variables for each cell, given a specific chain.
`get.obs_context`(data, temp_cols)	Contextmanager that temporarily adds columns to obs.
`get.airr_context`(data, airr_variable[, ...])	Contextmanager that temporarily adds AIRR information to obs.
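
The context-manager pattern behind temporarily adding columns can be sketched in plain Python. The `temp_columns` function below is hypothetical and works on a dict of lists instead of `adata.obs`; it only shows the add-on-enter, remove-on-exit behaviour and makes no claim about how scirpy handles name clashes.

```python
from contextlib import contextmanager


@contextmanager
def temp_columns(table, new_cols):
    """Temporarily add columns to a dict-of-lists table; remove them on exit."""
    clash = set(new_cols) & set(table)
    if clash:
        # Refusing to overwrite is a choice of this sketch.
        raise ValueError(f"columns already exist: {sorted(clash)}")
    table.update(new_cols)
    try:
        yield table
    finally:
        # Runs even if the body of the with-block raises.
        for name in new_cols:
            table.pop(name, None)


obs = {"sample": ["A", "A", "B"]}
with temp_columns(obs, {"junction_aa": ["CASSF", None, "CASTF"]}) as tmp:
    inside = sorted(tmp)  # column names visible inside the block
after = sorted(obs)  # column names once the block has exited
```

The `try`/`finally` guarantees that the temporary columns disappear again, so code inside the block can treat them like ordinary columns without leaving clutter behind.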

Tools: `tl`#

Tools add an interpretable annotation to the AnnData object which usually can be visualized by a corresponding plotting function.

Generic#

`tl.group_abundance`(adata, groupby[, ...])	Summarizes the number/fraction of cells of a certain category by a certain group.
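
As a rough illustration of what such a summary contains, the following plain-Python sketch counts cells per category within each group and optionally converts the counts to within-group fractions. The `group_counts` helper, its `fraction` flag, and the labels are hypothetical and do not mirror the signature of `tl.group_abundance`.

```python
from collections import Counter, defaultdict


def group_counts(groups, categories, fraction=False):
    """Count cells per category within each group, optionally as fractions."""
    counts = defaultdict(Counter)
    for group, category in zip(groups, categories):
        counts[group][category] += 1
    if not fraction:
        return {g: dict(c) for g, c in counts.items()}
    return {
        g: {cat: n / sum(c.values()) for cat, n in c.items()}
        for g, c in counts.items()
    }


# Made-up per-cell annotations.
groups = ["P1", "P1", "P1", "P2"]
categories = ["paired", "paired", "orphan", "paired"]

absolute = group_counts(groups, categories)
relative = group_counts(groups, categories, fraction=True)
```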

Quality control#

`tl.chain_qc`(adata, *[, airr_mod, airr_key, ...])	Perform quality control based on the receptor-chain pairing configuration.

Define and visualize clonotypes#

`tl.define_clonotypes`(adata, *[, key_added, ...])	Define clonotypes based on CDR3 nucleic acid sequence identity.
`tl.define_clonotype_clusters`(adata, *[, ...])	Define clonotype clusters.
`tl.clonotype_convergence`(adata, *, ...[, ...])	Finds evidence for Convergent evolution of clonotypes.
`tl.clonotype_network`(adata, *[, sequence, ...])	Computes the layout of the clonotype network.
`tl.clonotype_network_igraph`(adata[, basis, ...])	Get an `igraph` object representing the clonotype network.
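
Defining clonotypes by sequence identity amounts to grouping cells whose sequences are exactly equal. A minimal sketch, assuming each cell is reduced to one VJ and one VDJ CDR3 nucleotide sequence: the `assign_clonotypes` helper is hypothetical and ignores the options the real function offers for handling multiple chains per cell.

```python
def assign_clonotypes(cells):
    """Give cells with identical (VJ, VDJ) CDR3 sequences the same integer id.

    Cells without any sequence receive None instead of an id.
    """
    ids = {}
    labels = []
    for vj, vdj in cells:
        if vj is None and vdj is None:
            labels.append(None)
            continue
        key = (vj, vdj)
        if key not in ids:
            ids[key] = len(ids)  # next unused id, in order of first appearance
        labels.append(ids[key])
    return labels


# Made-up nucleotide sequences: (VJ CDR3, VDJ CDR3) per cell.
cells = [
    ("TGTGCT", "TGTGCC"),
    ("TGTGCT", "TGTGCC"),
    ("TGTGCA", "TGTGCC"),
    (None, None),
]
labels = assign_clonotypes(cells)
```

Clonotype clusters relax the exact-equality test to a distance threshold, which is why they build on a precomputed sequence-distance metric.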

Analyse clonal diversity#

`tl.clonal_expansion`(adata, *[, target_col, ...])	Adds a column to `obs` recording which clonotypes are expanded.
`tl.summarize_clonal_expansion`(adata, groupby, *)	Summarizes clonal expansion by a grouping variable.
`tl.alpha_diversity`(adata, groupby, *[, ...])	Computes the alpha diversity of clonotypes within a group.
`tl.repertoire_overlap`(adata, groupby, *[, ...])	Compute distance between cell groups based on clonotype overlap.
`tl.clonotype_modularity`(adata[, target_col, ...])	Identifies clonotypes or clonotype clusters consisting of cells that are more transcriptionally related than expected by chance by computing the Clonotype modularity.
`tl.clonotype_imbalance`(adata, replicate_col, ...)	Aims to find clonotypes that are the most enriched or depleted in a category.
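
Two of the quantities above can be sketched in plain Python: an expansion category per cell and an alpha diversity value per group. Both helpers are hypothetical; the `clip_at` threshold and the category labels are assumptions of this sketch, and normalized Shannon entropy is used as one common alpha diversity measure without implying it is the only one available.

```python
import math
from collections import Counter


def expansion_category(clonotypes, clip_at=3):
    """Label each cell with the size of its clonotype, clipped at `clip_at`."""
    sizes = Counter(clonotypes)
    labels = []
    for clonotype in clonotypes:
        n = sizes[clonotype]
        labels.append(f">= {clip_at}" if n >= clip_at else str(n))
    return labels


def normalized_shannon_entropy(clonotypes):
    """Shannon entropy of clonotype frequencies, scaled to the range [0, 1]."""
    counts = Counter(clonotypes)
    if len(counts) <= 1:
        return 0.0  # a single clonotype carries no diversity
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return entropy / math.log(len(counts))


# Made-up clonotype assignments, one entry per cell.
clonotypes = ["c1", "c1", "c1", "c2", "c2", "c3"]
categories = expansion_category(clonotypes)
diversity = normalized_shannon_entropy(clonotypes)
```

A value near 1 means the cells are spread evenly over the clonotypes, while a value near 0 means one clonotype dominates the group.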

Query reference databases#

`tl.ir_query`(adata, reference, *[, sequence, ...])	Query a reference database for matching immune cell receptors.
`tl.ir_query_annotate`(adata, reference, *[, ...])	Annotate cells based on the result of `ir_query()`.
`tl.ir_query_annotate_df`(adata, reference, *)	Returns the inner join of `adata.obs` with matching entries from `reference.obs` based on the result of `ir_query()`.
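
The inner-join behaviour can be illustrated with a small plain-Python sketch, assuming plain sequence identity as the matching criterion. The `annotate_by_identity` helper, the data, and the annotation labels are hypothetical; the real functions operate on AnnData objects and on the match results stored by `ir_query()`.

```python
def annotate_by_identity(query, reference):
    """Inner join: one output row per (cell, matching reference entry)."""
    index = {}
    for seq, label in reference:
        index.setdefault(seq, []).append(label)
    return [
        (cell, seq, label)
        for cell, seq in query
        for label in index.get(seq, [])  # cells without a match yield no rows
    ]


# Made-up query cells and reference entries.
query = [("cell_1", "CASSF"), ("cell_2", "CASTF"), ("cell_3", "CASSLF")]
reference = [("CASSF", "antigen A"), ("CASSF", "antigen B"), ("CASSLF", "antigen C")]

matches = annotate_by_identity(query, reference)
```

Because it is an inner join, a cell can appear several times (once per matching reference entry) or not at all.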

V(D)J gene usage#

`tl.spectratype`(adata[, chain, cdr3_col, ...])	Summarizes the distribution of CDR3 region lengths.
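
At its core, a spectratype is a histogram of sequence lengths. A minimal plain-Python sketch, in which the `cdr3_length_distribution` helper is hypothetical and missing sequences are simply skipped:

```python
from collections import Counter


def cdr3_length_distribution(seqs):
    """Map each CDR3 length to the number of sequences with that length."""
    lengths = Counter(len(seq) for seq in seqs if seq)
    return dict(sorted(lengths.items()))


# Made-up sequences; None stands for a cell without the chain of interest.
seqs = ["CASSF", "CASTF", "CASSLGF", None]
distribution = cdr3_length_distribution(seqs)
```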

Plotting: `pl`#

Generic#

`pl.embedding`(adata, basis, *[, color, ...])	A customized wrapper to the scanpy.pl.embedding() function.

Tools#

Each of these plotting functions has a corresponding tool in the scirpy.tl section. Depending on the computational load, tools are either invoked on the fly when calling the plotting function or need to be precomputed and stored in AnnData beforehand.

`pl.alpha_diversity`(adata, groupby, *[, ...])	Plot the alpha diversity per group.
`pl.clonal_expansion`(adata, groupby, *[, ...])	Visualize clonal expansion.
`pl.group_abundance`(adata, groupby[, ...])	Plots the number of cells per group, split up by a categorical variable.
`pl.spectratype`(adata[, chain, cdr3_col, ...])	Show the distribution of CDR3 region lengths.
`pl.vdj_usage`(adata, *[, vdj_cols, ...])	Creates a ribbon plot of the most abundant VDJ combinations.
`pl.repertoire_overlap`(adata, groupby, *[, ...])	Visualizes overlap between a pair of samples on a scatter plot or
`pl.clonotype_modularity`(adata[, ax, ...])	Plots the Clonotype modularity score against the associated log10 p-value.
`pl.clonotype_network`(adata, *[, color, ...])	Plot the Clonotype network.
`pl.clonotype_imbalance`(adata, replicate_col, ...)	Aims to find clonotypes that are the most enriched or depleted in a category.

Base plotting functions: `pl.base`#

`pl.base.bar`(data, *[, ax, stacked, style, ...])	Basic plotting function built on top of bar plot in Pandas.
`pl.base.line`(data, *[, ax, style, ...])	Basic plotting function built on top of line plot in Pandas.
`pl.base.barh`(data, *[, ax, style, ...])	Basic plotting function built on top of bar plot in Pandas.
`pl.base.curve`(data, *[, ax, curve_layout, ...])	Basic plotting function for drawing KDE-smoothed curves.

Plot styling: `pl.styling`#

`pl.styling.apply_style_to_axes`(ax, style, ...)	Apply a predefined style to an axis object.
`pl.styling.style_axes`(ax[, title, ...])	Style an axes object.

Datasets: `datasets`#

Example datasets#

`datasets.wu2020`()	Return the dataset from [WMdA+20] as MuData object.
`datasets.wu2020_3k`()	Return the dataset from [WMdA+20] as AnnData object, downsampled to 3000 TCR-containing cells.
`datasets.maynard2020`()	Return the dataset from [MMR+20] as AnnData object.

Reference databases#

`datasets.vdjdb`([cached, cache_path])	Download VDJdb and process it into an AnnData object.
`datasets.iedb`([cached, cache_path])	Download IEDB v3 and process it into an AnnData object.

A reference database is also just a Scirpy-formatted AnnData object. This means you can follow the instructions in the data loading tutorial to build a custom reference database.

Utility functions: `util`#

`util.DataHandler`(data[, airr_mod, airr_key, ...])	Transparent access to airr modality in both AnnData and MuData objects.
`util.graph.layout_components`(graph[, ...])	Compute a graph layout by layouting all connected components individually.
`util.graph.layout_fr_size_aware`(graph, *[, ...])	Compute the Fruchterman-Reingold layout respecting node sizes.
`util.graph.igraph_from_sparse_matrix`(matrix, *)	Get an igraph object from an adjacency or distance matrix.

IR distance utilities: `ir_dist`#

`ir_dist.sequence_dist`(seqs[, seqs2, metric, ...])	Calculate a sequence x sequence distance matrix.

Distance metrics#

`ir_dist.metrics.DistanceCalculator`(cutoff)	Abstract base class for a CDR3-sequence distance calculator.
`ir_dist.metrics.ParallelDistanceCalculator`(...)	Abstract base class for a DistanceCalculator that computes distances in parallel.
`ir_dist.metrics.IdentityDistanceCalculator`([...])	Calculates the Identity-distance between CDR3 sequences.
`ir_dist.metrics.LevenshteinDistanceCalculator`([...])	Calculates the Levenshtein edit-distance between sequences.
`ir_dist.metrics.HammingDistanceCalculator`([...])	Calculates the Hamming distance between sequences of identical length.
`ir_dist.metrics.AlignmentDistanceCalculator`(...)	Calculates distance between sequences based on pairwise sequence alignment.
`ir_dist.metrics.FastAlignmentDistanceCalculator`([...])	Calculates distance between sequences based on pairwise sequence alignment.
`ir_dist.metrics.TCRdistDistanceCalculator`([...])	Computes pairwise distances between TCR CDR3 sequences based on the "tcrdist" distance metric.
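
The general shape of such a class hierarchy, an abstract base class holding a cutoff plus concrete metrics, can be sketched in plain Python. All class and method names below (`SketchDistance`, `distance`, `pairs_within_cutoff`, and the two subclasses) are hypothetical and do not reproduce scirpy's interfaces or its result format; the sketch only shows how a cutoff can discard distant pairs.

```python
from abc import ABC, abstractmethod


class SketchDistance(ABC):
    """Hypothetical base class: pairs farther apart than `cutoff` are dropped."""

    def __init__(self, cutoff):
        self.cutoff = cutoff

    @abstractmethod
    def distance(self, a, b):
        """Return a number, or None if the two sequences are not comparable."""

    def pairs_within_cutoff(self, seqs):
        """Return {(i, j): d} for all i < j with distance d <= cutoff."""
        result = {}
        for i, a in enumerate(seqs):
            for j in range(i + 1, len(seqs)):
                d = self.distance(a, seqs[j])
                if d is not None and d <= self.cutoff:
                    result[(i, j)] = d
        return result


class SketchIdentity(SketchDistance):
    """Distance 0 for identical sequences, 1 otherwise; the cutoff is fixed at 0."""

    def __init__(self):
        super().__init__(cutoff=0)

    def distance(self, a, b):
        return 0 if a == b else 1


class SketchHamming(SketchDistance):
    """Number of mismatching positions; only defined for equal lengths."""

    def distance(self, a, b):
        if len(a) != len(b):
            return None
        return sum(x != y for x, y in zip(a, b))


# Made-up sequences.
seqs = ["CASSF", "CASTF", "CASSF", "CASSLF"]
identical = SketchIdentity().pairs_within_cutoff(seqs)
close = SketchHamming(cutoff=1).pairs_within_cutoff(seqs)
```

Keeping only pairs within the cutoff means most entries are never stored, which is what makes a sparse result practical for large repertoires.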
