BCR analysis in Scirpy

BCR analysis in Scirpy#

%load_ext autoreload
%autoreload 2
# not part of the tutorial, but temporarily surpresses distracting warnings
import warnings
from numba import NumbaDeprecationWarning

warnings.simplefilter(action="ignore", category=(FutureWarning, NumbaDeprecationWarning))

# import all necessary python packages for this tutorial
import muon as mu
import scanpy as sc
import scirpy as ir
import numpy as np
import matplotlib.pyplot as plt

sc.set_figure_params(
    figsize=(5, 5),
)
sc.settings.verbosity = 2  # verbosity: errors (0), warnings (1), info (2), hints (3)

In this tutorial, we guide the user through single cell BCR analysis with Scirpy. The functionality in this notebook requires Scirpy v0.19 or later. Additionally, we leverage the interoperability with Dandelion [SPD+23] for certain preprocessing steps.

This tutorial uses a dataset from Stephenson et al. [SRB+21]. The original dataset consists of 143 samples and > 60k B cells from patients with COVID-19 in different degrees of severity and three control groups. To simplify and speed up the computation, we included cells from the five most abundant COVID-19 positive samples per status category and randomly subsampled down to 5k.

sc.logging.print_header()

scanpy==1.10.4 anndata==0.11.1 umap==0.5.7 numpy==2.0.2 scipy==1.14.1 pandas==2.2.3 scikit-learn==1.5.2 statsmodels==0.14.4 igraph==0.11.8 pynndescent==0.5.13

Pre-processing/Re-annotation#

Unlike TCR data, it is not recommended to directly use the output of Cell Ranger while analysing scBCR-data. Best practice is to reannotate Cell Ranger gene annotation with IgBlast or IMGT/HighV-QUEST, because Cell Ranger currently does not support the IMGT unique numbering scheme [LPommieR+03], which is heavily used by many downstream analysis tools e.g. phylogenetic analysis and mutation inference.

Our data used in this notebook is already reannotated and pre-processed and cells have already passed transcriptome quality control as described by the original authors. Reannotation of BCR data is outside the scope of Scirpy. Instead we refer the reader to one of the following workflows which all rely on the Immcantation suite.

Dandelion [SPD+23] is a Python library for single-cell BCR analysis that also supports re-annotation as showcased in their tutorial. Dandelion has some overlap with Scirpy functionality and is interoperable with Scirpy by using the scirpy.io.from_dandelion()and scirpy.io.to_dandelion() functions that allow to easily switch to Scirpy after the reannotation step.
nf-core/airrflow [GMB+24] is a Nextflow pipeline that can process both bulk and single-cell TCR and BCR-seq data that generates a wide range of outputs, including QC metrics and analysis reports. It may be convenient to first check the outputs of airrflow to get an overview of the data, and then follow up with Scirpy for more custom analyses. The pipeline produces reannotated BCR data in AIRR format that can be imported into Scirpy using scirpy.io.read_airr().
Alternatively, the Immcantation suite can be used directly as showcased in their tutorial. Similar to airrflow, reannotation will result in an AIRR-compilant .tsv file, which can be loaded into Scirpy using scirpy.io.read_airr().

Germline reconstruction#

Germline sequences, primarily required for calculating mutations, can enrich your pre-processed data. Although Scirpy does not currently support germline sequence inference, it’s recommended to leverage its interoperability with Dandelion for this purpose, as detailed in this tutorial on their website. This tutorial will later cover mutation calculation. Note that germline sequences can also be useful for other applications, like phylogenetic analysis, which are beyond the current scope of Scirpy.

Importing data#

Data import into Scirpy from various formats is straightforward, as described in the dedicated tutorial on data loading. In this case, we directly load the demo dataset:

mdata = ir.datasets.stephenson2021_5k()

mdata is a MuData object, which is a framework for multimodal data analysis with a strong focus on multi-omics. It essentially builds on top of AnnData and its general usage is outlined in Scirpy’s data structure.

mdata

MuData object with n_obs × n_vars = 5000 × 24929
  2 modalities
    gex:	5000 x 24929
      obs:	'sample_id', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'full_clustering', 'initial_clustering', 'Resample', 'Collection_Day', 'Sex', 'Age_interval', 'Swab_result', 'Status', 'Smoker', 'Status_on_day_collection', 'Status_on_day_collection_summary', 'Days_from_onset', 'Site', 'time_after_LPS', 'Worst_Clinical_Status', 'Outcome', 'patient_id'
      var:	'feature_types'
      obsm:	'X_pca', 'X_pca_harmony', 'X_umap'
      layers:	'raw'
    airr:	5000 x 0
      obs:	'receptor_type', 'receptor_subtype', 'chain_pairing'
      uns:	'chain_indices', 'scirpy_version'
      obsm:	'airr', 'chain_indices'

BCR Quality Control#

Although many AIRR analysis tools recommend to only include single paired AIR sequences, there is increasing evidence that allelically and isotypically included B cells are quite common [ZPWY23]. During the QC process, the user can decide to filter or keep these cases as appropriate for their analysis.

The demo dataset used here is already pre-filtered, and filtering steps are shown for the sake of completeness below. Typically, we suggest to remove cells flagged as orphan, because they cannot be utilized for downstream analyses that require full AIR sequence information. Similarly, cells flagged as multichain are likely undetected doublets and should be excluded.

Note

Standard Scirpy preprocessing workflow

scirpy.pp.index_chains(): create chain indices for the receptor data
scirpy.tl.chain_qc(): add a summary about the immune cell-receptor compositions to .obs
filter as desired

For more information regarding preprocessing and filtering, especially for transcriptomics data, please refer to the single cell best practice book. If you want to gain a better understanding regarding the function calls shown below, please have a look on our dedicated T cell tutorial

ir.pp.index_chains(mdata, filter=["productive", "require_junction_aa"])
ir.tl.chain_qc(mdata)
mu.pp.filter_obs(
    mdata, "airr:chain_pairing", lambda x: ~np.isin(x, ["orphan VDJ", "orphan VJ", "ambiguous", "multichain"])
)

Filtering chains...
Indexing VJ chains...
Indexing VDJ chains...
build result array
Stored result in `mdata.obs["airr:receptor_type"]`.
Stored result in `mdata.obs["airr:receptor_subtype"]`.
Stored result in `mdata.obs["airr:chain_pairing"]`.

Preprocessing Transcriptomics Data#

Certain analyses can combine transcriptomics and AIRR information. Gene expression (GEX) data needs to be filtered and preprocessed as with any other single-cell dataset. Please refer for a real dataset to the scverse tutorials and the single-cell best practice book for advice. The transcriptomics data in this demo dataset has already been prepared appropriately.

Define Clonotype Cluster for BCR#

The general workflow of defining BCR/TCR clonotype clusters is depicted below:

Analysis steps on IR data#
scirpy function	objective
`scirpy.pp.ir_dist()`	Compute sequence-based distance matrices for all VJ and VDJ sequences.
`scirpy.tl.define_clonotype_clusters()`	Cluster cells by the similarity of their CDR3-sequences.
`scirpy.tl.clonotype_network()`	Compute layout of the clonotype network.
`scirpy.pl.clonotype_network()`	Plot clonotype network colored by different parameters.

In Scirpy we generally distinguish between clonotypes (group of cells that have the same anscestor and identical receptors) and clonotype clusters (group of cells that have similar receotors). Because of somatic hypermutation the above definition of clonotypes wouldn’t correctly represent clonal relationship in the case of BCR, and therefore we have to define certain criteria to cluster similar sequences into clonotype clusters.

Currently, there is no commonly agreed best practice on how to define clonotype cluster for BCR sequences, and it remains an active research field [YK15]. In Scirpy, we implemented a simple, yet effective, network-based approach to define BCR clonotype clusters [BvanSchaikS+24] leveraging the same underlying functionality we had already in place for TCRs.

With scirpy.pp.ir_dist() we can calculate the full pairwise-distance matrix between each junction sequence for each light and heavy chain in the dataset. For BCRs, we recommend the normalized_hamming distance which calculates the hamming distance between junction sequences normalized to their respective length applied to nucleotides (sequence = nt) rather than amino acids (sequence = aa), because SHM acts on the nucleotide level [YK15].

To choose a distance cutoff, we can either use a generic threshold of e.g. 85% junction similarity (cutoff = 15), or choose one based on the empirical distribution of distances. To this end, setting min_dist_histogram = True will display the distance-to-nearest distribution of the junction sequences between VJ-chains (light) and VDJ-chains (heavy), respectively. A cutoff is commonly selected such that it separates the bimodality between sequences with closely related sequences from sequences without closely related sequences [GAB+17].

However, as it is apparent by the following histogram this process can be subjective, and sometimes no clear bimodality can be observed. It is generally true that distance-to-nearest plots are bimodal, but it can be difficult to identify one distinct threshold, especially for small datasets[GAB+17]. In such cases, we suggest to use conservative fixed thresholds to minimize human bias.

ir.pp.ir_dist(mdata, metric="normalized_hamming", cutoff=15, sequence="nt", histogram=True)

Computing sequence x sequence distance matrix for VJ sequences.
Computing sequence x sequence distance matrix for VDJ sequences.

../_images/b01692c292902b10cfc245f7414c843e807f9b0dd42a6c5496260f499ee6923f.png

../_images/535362abc28c48bb1a5f938b04749ba5bd23b8a48b784b04e92933e30d913fa1.png

Next up, calling scirpy.tl.define_clonotype_clusters() will actually define the clonotype clusters based on the distances of the VJ and VDJ CDR3-sequences (calculated by scirpy.pp.ir_dist()) and further parameters. To identify cells that originate from the same ancestor, we require them to have the same V-gene (same_v_gene = True) and J-gene (same_j_gene = True) segments [YK15]. Using receptor_arms = "all" and dual_ir = "any" ensures that both heavy and light chains need to match [DID+13], and is lenient about some cells having a secondary chains where others might have none. Setting within_group to an .obs column will ensure cells with different values do not cluster together (e.g. use patient_id to ensure there are no “public” clonotypes). Dealing with allelically/isotypically included B cells in sc-data is still in its upcoming may require some experimentation with the dual_ir parameter.

Note

Re-annotation with IgBlast/IMGT-HighV-QUEST causes at the moment some issues in Scirpy as it commonly integrates allele information as part of the gene annotation (after the asterisk(*)) and will include ambiguous gene annotations (more than one annotation per gene). At the time, manually manipulating both v_call and j_call columns of the respective files prior to import into Scirpy seems to be an acceptable workaround.

These restrictions are used to construct a network where each node represents a unique junction sequence (clonotypes) and the size of the node represents how many identical sequences this node contains. Nodes that pass all specifications of scirpy.tl.define_clonotype_clusters() will be connected by an edge.

Lastly, partitions will specify the algorithm that is used for community detection in igraph. partitions = "fastgreedy" will initiate a greedy hierarchical clustering algorithm, which might lead to some smaller subclusters, especially if cluster are weakly connected [CNM04]. In practice, this makes little difference in scirpy, since edges representing a distance exceeding the cutoff threshold will automatically be elimitated to improve computational efficiency, already leading to disconnected partitions.

scirpy.tl.define_clonotype_clusters() will add a clone_id and a clone_id_sizecolumn to .obs, which can be customized with key_added.

ir.tl.define_clonotype_clusters(
    mdata,
    sequence="nt",
    metric="normalized_hamming",
    receptor_arms="all",
    dual_ir="any",
    same_v_gene=True,
    same_j_gene=True,
    # within_group = "gex:patient_id",  # disallow "public" clonotypes
    partitions="fastgreedy",
    key_added="clone_id_85_similarity",
)

Initializing lookup tables.
Computing clonotype x clonotype distances.
Stored result in `mdata.obs["airr:clone_id_85_similarity"]`.
Stored result in `mdata.obs["airr:clone_id_85_similarity_size"]`.

Calling scirpy.tl.clonotype_network() will compute the layout for the clonotype network needed for visualization. Setting min_cells to an appropriate value is especially important for big datasets to avoid overcrowding the plot.

ir.tl.clonotype_network(
    mdata, sequence="nt", metric="normalized_hamming", min_cells=3, clonotype_key="clone_id_85_similarity"
)

scirpy.pl.clonotype_network() will display a customizable clonotype network. color can be set to any .obs column. Here we see that most expanded clonotype clusters belong to either Moderate and Severe patients. IWe also observe that no clonotype cluster is shared across patients, which we could show systematically by checking the patient IDs stored inside mdata.obs["gex:patient_id]

_ = ir.pl.clonotype_network(
    mdata,
    color="gex:Status_on_day_collection_summary",
    title="Clonotype Network BCR by patient status",
    label_fontsize=9,
    base_size=20,
)

../_images/779794f66b90e5531c9778f13e976ec44875a051efafadffb34dc849fa2730ae.png

Repertoire Analysis#

Clonotype Expansion#

Upon infection, some precursor lymphocytes that recognize an invading antigen proliferate into cells with various phenotypes that establish an appropriate immune response in a process called clonal expansion (also referred to as clonotype expansion) [AGS20]. Clonotype expansion is one way to characterize the immune response.

scirpy.tl.clonal_expansion() adds a label (inside .obs) showing if there are other cells with same clonotype id or not. We can identify different “levels of expansion” by adjusting the breakpoints parameter. This new column is useful to access information about clonal expansion among different groups.

ir.tl.clonal_expansion(mdata, target_col="clone_id_85_similarity", breakpoints=(1, 2))
expanded = np.sum(mdata.obs["airr:clonal_expansion"] != "<= 1") / len(mdata)
print(f"{expanded:.2%}% of cells belong to an expanded (size >=2) clonotype cluster in this dataset.")

Stored result in `mdata.obs["airr:clonal_expansion"]`.
6.04%% of cells belong to an expanded (size >=2) clonotype cluster in this dataset.

This newly generated column can be visualized using scirpy.pl.group_abundance(), which is designed to quickly visualize categorical data. Here it appears that expanded clonotypes are preferentially found in cells labelled as plasma cells and plasmablast. This makes sense as these cell types are primarily involved in the humoral immune response upon infection and proliferate rapidly [NHTC15].

_ = ir.pl.group_abundance(
    mdata,
    target_col="gex:full_clustering",
    groupby="airr:clonal_expansion",
    sort=["<= 1", "<= 2", "> 2"],
    normalize=True,
)

../_images/79b119bd191bca2453b8f7e28ac565b16d201ad3fe52528fa7d95393de75aaa1.png

Alternatively, clonal expansion can be visualized using scirpy.pl.clonal_expansion(). Each bar will represent one group defined groupby and each colour will identify one clonal expansion breakpoint defined by breakpoints.

_ = ir.pl.clonal_expansion(
    mdata,
    target_col="clone_id_85_similarity",
    breakpoints=(1, 2),
    groupby="gex:full_clustering",
    normalize=True,
)

../_images/d1c4b4bd4bace000c89c08ebf3ad5d97c4b058420ebb0fed7f7fbdec4b3ff859.png

Clonotype Diversity#

TODO

Clonotype Abundance#

Sometimes it might be desired to investigate the biggest clonotype cluster, which can be accomplished with calling scirpy.pl.group_abundance(). It is worth mentioning that bars may be normalized to any variable from .obs by setting the normalize parameter accordingly.

Here, we can see that each of the 10 largest clonotype cluster span across at least two cell type annotations.

_ = ir.pl.group_abundance(
    mdata,
    groupby="airr:clone_id_85_similarity",
    target_col="gex:full_clustering",
    max_cols=10,
    # normalize = "gex:patient_id"  # => this would normalise column height based on how many cells per patient were analysed
)

../_images/295d6f1df0f0e64abba96a29c47a8c831c83ce5f361590ba11754b8f8b1758ad.png

Gene usage#

Gene annotation can be utilized to investigate gene usage for various categories like patient related information (e.g. status, disease, treatment) or cell type annotation. By using scirpy.get.airr_context() as a context manager any AIRR variable stored inside the .obsm["airr"] awkward array can be temporarily added to mdata.obs.

Below, this was applied to visualize gene usage per patient-status. We observe that the majority of the most used genes belong to gene family IGHV3 and IGHV4.

with ir.get.airr_context(mdata, "v_call"):
    ax_Status = ir.pl.group_abundance(
        mdata,
        groupby="VDJ_1_v_call",
        target_col="gex:Status_on_day_collection_summary",
        normalize="gex:patient_id",
        max_cols=20,
        figsize=(4, 4),
    )
_ = ax_Status.set_xticklabels(ax_Status.get_xticklabels(), rotation=50, ha="right")

../_images/a7dd5eebdd0fcb468988d01063a93442c56002ae5bb4f0ffece74d20c8559996.png

To quickly get an overview of the exact combination of V(D)J genes scirpy.pl.vdj_usage() can be used. It will plot all gene combinations of the dataset as a Sankey-plot.

ir.pl.vdj_usage(
    mdata,
    vdj_cols=("VJ_1_v_call", "VJ_1_j_call", "VDJ_1_v_call", "VDJ_1_d_call", "VDJ_1_j_call"),
    full_combination=False,
    max_segments=None,
    max_ribbons=30,
    fig_kws={"figsize": (12, 8)},
)

<Axes: >

../_images/7b65b8afafa4eeca370f505d11819a0dbba0200150d6c7cd621ed61f0c91067d.png

A lot of this tutorial was focused on the primary heavy chain, as it is considered to be more diverse compared to the light chain and thus be potentially more interesting for analysis. Either way, analysing the light chain works similarly and may also yield insightful information. Further, it is possible to analyse Alellically included B-cells with a secondary heavy and/or light chain contig as well.

Spectratype analysis#

Spectratype analysis is another way to describe heterogeneity of the V(D)J sequences. The V(D)J-recombination mechanism brings together different gene segments and might introduce insertions and/or deletions of nucleotides between genes segments, which leads to a broad spectrum of lengths of the resulting transcript [MW17]. Scirpy offers scirpy.pl.spectratype() that plots a distribution by summarizing the length of the junction (CDR3) region for all cells and chains.

Using scirpy.pl.spectratype() we can show that while the spectratype distribution of the whole dataset is almost a normal distribution with a peak at 16 amino acids, there appears to be a preferential usage of spectratypes with length 20 for expanded clonotype clusters. At the same time, spectratypes of length 16 in expanded cells sppear to be far less abundant than expected from the overall spectratype distribution.

with ir.get.airr_context(mdata, "v_call"):
    ir.pl.spectratype(mdata, chain=["VDJ_1"], color="VDJ_1_v_call", viztype="bar", fig_kws={"dpi": 120}, figsize=[6, 6])

../_images/65b6542806eee6945695390f6fcb9e48c9d9f762472dba7fecb393166a2838ad.png

with ir.get.airr_context(mdata, "v_call"):
    ir.pl.spectratype(
        mdata[
            mdata.obs["airr:clonal_expansion"].isin(["<= 2", "> 2"]),
            :,
        ],
        chain=["VDJ_1"],
        cdr3_col="junction_aa",
        color="VDJ_1_v_call",
        viztype="bar",
        fig_kws={"dpi": 120},
        figsize=[6, 6],
    )

../_images/f1707143995addbbed4a06dbdbf5cb7d18e229a02bc07a92822e6c8ef52f2671.png

To visualize different spectratype distributions across different categories (like cells/patients) we can set viztype = "curve" and the curve_layout = "shifted" to “shifted”. This will visualize all distinct categories as a “ridge-plot”. The plot below suggests that plasmablasts could contribute towards the observed peak at 20 aa for expanded clonotype clusters as the junction sequence distribution of this particular cell type seems to be skewed towards longer spectratypes and plasmablasts were preferentially expanded.

ir.pl.spectratype(
    mdata,
    cdr3_col="junction_aa",
    chain="VDJ_1",
    color="gex:full_clustering",
    viztype="curve",
    curve_layout="shifted",
    fig_kws={"figsize": [6, 6]},
    kde_kws={"kde_norm": False},
)

<Axes: title={'center': 'Spectratype of junction_aa by gex:full_clustering'}, xlabel='junction_aa length'>

../_images/bc3333dab63c33cef0e92592667e404d187b8d3095801b945dd0e7ff95892390.png

Thanks to the seamless integration of scverse tools, we can also leverage Scanpy funtionctionalities for plotting. So instead of analysing/visualizing categorical data via “ridge”-plots as shown above we could also, e.g. create a violin plot using scanpy.pl.violin().
The context manager scirpy.get.obs_context() allows to temporary add AIRR data into the .obs column for plotting.

Below, we use a violin plot to visualize potential differences between patient-status groups.

with ir.get.obs_context(
    mdata["gex"], {"junction_len": [len(a) for a in ir.get.airr(mdata, "junction_aa", "VDJ_1")]}
) as m:
    sc.pl.violin(
        m,
        "junction_len",
        groupby="Status_on_day_collection_summary",
        stripplot=False,
        inner="box",
        rotation=45,
        order=["Asymptomatic", "Mild", "Moderate", "Severe", "Critical"],
    )

../_images/bd139c3ca3a206260a955d9faca6381275a01ece222b32839aa9dd2f54752636.png

Logoplots#

Motif sequence analysis is a crucial analysis step and is often executed through generation of logoplots with respective tools. Here, we represent a Scirpy native wrapper function scirpy.pl.logoplot_cdr3_motif to a well established python package called logomaker. With to_type we can specify, how the logoplot is calculated and plot_defaultoffers a convenient “pre-styling” that can be easily adapted. As logomaker offers a lot of further customization opportunities and scirpy.pl.logoplot_cdr3_motif, will allow any input that is accepted by logomaker we suggest to head to their documentation to maximize the use of this amazing package.

Note

The input to scirpy.pl.logoplot_cdr3_motif is expected to include only sequences that have the same junction length, because it is not possible to perform multiple sequence alignment with Scirpy. Therefore, the user has to manually filter the MuData object prior or while the function call.

The logoplot below shows the likelihood of any amino acid to occur among expanded clonotype clusters with a spectratype of 20.

with ir.get.obs_context(
    mdata["airr"], {"junction_len": [len(a) for a in ir.get.airr(mdata, "junction_aa", "VDJ_1")]}
) as m:
    ir.pl.logoplot_cdr3_motif(
        m[(m.obs["junction_len"] == 20) & (m.obs["clonal_expansion"].isin(["<= 2", "> 2"]))],
        chains=["VDJ_1"],
        to_type="information",
    )

../_images/7285d36c1fb7d2ba3f2b1eeb3ceef49c3f35e8a3a5fa98d5f57cc740d4bf4d64.png

The next logoplot is similar to the previous one, but should demonstrate that it is very easy to generate logoplots of certain clonotype clusters, as they always have the same junction length due to the nature of the used hamming distance. As expected there is only small variation inside the same clonotype cluster, but interestingly position 10 and 12 of the VDJ chain differs the most. It is easy to show multiple chains next to each other using plt.subplots().

fix, axs = plt.subplots(1, 2, figsize=(12, 2))
for chain, ax in zip(["VJ_1", "VDJ_1"], axs, strict=False):
    ir.pl.logoplot_cdr3_motif(
        mdata[mdata.obs["airr:clone_id_85_similarity"] == "2707"],
        chains=chain,
        to_type="information",
        ax=ax,
    )

../_images/3d799167cf85f34ae006c23902a2f2a50e250b274cc50c193655a76b397cb799.png

Analysing somatic hypermutation#

While V(D)J-recombination is quite similar between TCR and BCR sequences, somatic hypermuation (SHM) only takes place in activated B cells as part of affinity maturation.

Scirpy allows investigating mutations based on differences between observed sequence and inferred germline sequence. This simplistic approach is powerful, widely used, and benefits from the recent improvements of germline inference tools. Calling scirpy.tl.mutational_load() requires both a column containing sequence alignment and germline alignment information that are IMGT-gapped ([LPommieR+03]). It calculates both the absolute and relative mutational load for the entire receptor sequence, the v segment, and each subregion (FWR1, CDR1, FWR2, CDR2, FWR3, CDR3, FWR4) individually and stores the results in the awkward array that holds all the chain-level information.

ir.tl.mutational_load(mdata)

We can again make use of the :func:scirpy.get.airr_context function to temporarily add the information to adata.obs for plotting.

Below, a violinplot shows the mutation in the heavy chain V-Region grouped by cell cluster annotation. For example, cells labelled as immature or naive B cells experience on average only very little mutation.

with ir.get.airr_context(mdata, "v_mutation_freq"):
    sc.pl.violin(
        mdata, ["VDJ_1_v_mutation_freq"], groupby="gex:full_clustering", stripplot=False, inner="box", rotation=80
    )

../_images/3c9df6cb219242a944098af709e078c5f8032870b46d2df92ceaaeba1541f4ec.png

The next violin plots should only serve to highlight the usage of setting region to “subregion” as FWR1-4 and CDR1-3 can be accessed independently. Here, we exemplarily show FWR1 vs. CDR1. It appears that CDR are more mutated than FWR, which is consistent with the literature.

with ir.get.airr_context(mdata, ["fwr1_mutation_freq", "cdr1_mutation_freq"]):
    sc.pl.violin(
        mdata,
        ["VJ_1_fwr1_mutation_freq", "VJ_1_cdr1_mutation_freq"],
        groupby="gex:Status_on_day_collection_summary",
        stripplot=False,
        inner="box",
        rotation=45,
        order=["Asymptomatic", "Mild", "Moderate", "Severe", "Critical"],
    )

../_images/57206af4299163c7bcec25daa20c7291d4e36b445e656f2070337c118e0f0616.png

BCR analysis in Scirpy

Contents

BCR analysis in Scirpy#

Pre-processing/Re-annotation#

Germline reconstruction#

Importing data#

BCR Quality Control#

Preprocessing Transcriptomics Data#

Define Clonotype Cluster for BCR#

Repertoire Analysis#

Clonotype Expansion#

Clonotype Diversity#

Clonotype Abundance#

Gene usage#

Spectratype analysis#

Logoplots#

Analysing somatic hypermutation#