scirpy.tl.mutational_load

scirpy.tl.mutational_load(adata, *, regions=('full', 'v', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3'), airr_mod='airr', airr_key='airr', chain_idx_key='chain_indices', sequence_key='sequence_alignment', germline_key='germline_alignment_d_mask', junction_key='junction', ignore_chars=('.', 'N'))

Calculates absolute and relative mutational load of receptor sequences based on germline alignment.
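As a rough illustration of what "absolute" and "relative" load mean, the following minimal pure-Python sketch compares an aligned sequence with its germline position by position. It assumes that the absolute load is the number of mismatching positions and that the relative load divides that count by the number of positions not skipped because of ignore_chars. That denominator is an assumption, and `mutational_load_sketch` is a hypothetical helper, not part of scirpy.

```python
from typing import Sequence


def mutational_load_sketch(
    sequence: str,
    germline: str,
    ignore_chars: Sequence[str] = (".", "N"),
) -> tuple[int, float]:
    """Hypothetical helper returning (mutation_count, mutation_freq).

    Assumes both strings are IMGT-aligned and therefore have the same length.
    """
    if len(sequence) != len(germline):
        raise ValueError("sequence and germline alignment must have the same length")
    ignored = set(ignore_chars)
    compared = 0  # positions where neither string holds an ignored character
    count = 0  # compared positions that differ from the germline
    for base, germ in zip(sequence, germline):
        if base in ignored or germ in ignored:
            continue  # skip IMGT gaps ('.') and masked nucleotides ('N')
        compared += 1
        if base != germ:
            count += 1
    # Assumption: relative load = mismatches / compared positions (NaN if none).
    freq = count / compared if compared else float("nan")
    return count, freq


print(mutational_load_sketch("ACGT..", "ACCTGG"))  # (1, 0.25)
```

In the example, the two IMGT gaps are skipped, leaving four compared positions, one of which (G vs. C) is a mismatch.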

Receptor sequences MUST be IMGT-aligned, and the corresponding germline sequence MUST be available (see the sequence_key and germline_key parameters).

IMGT alignments can be obtained through scirpy's interoperability with Dandelion.

Region boundaries are implemented as described in the shazam documentation, which follows the IMGT unique numbering scheme.
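The region boundaries listed under the regions parameter are 1-based, inclusive IMGT positions. The sketch below transcribes those documented positions literally and converts them into 0-based Python slices, so that position p maps to index p - 1. `region_slices` is a hypothetical helper, not scirpy's implementation, and `junction_length` stands for the length of the nucleotide junction read from junction_key.

```python
def region_slices(junction_length: int) -> dict[str, slice]:
    """Hypothetical helper mapping region names to 0-based Python slices.

    Uses the 1-based, inclusive IMGT positions documented for `regions`.
    """
    cdr3_end = 313 + junction_length - 6  # last CDR3 position (1-based, inclusive)
    # (start, end) as 1-based inclusive positions; end=None means "to the end".
    bounds = {
        "full": (1, None),
        "v": (1, 312),
        "fwr1": (1, 78),
        "cdr1": (79, 114),
        "fwr2": (115, 165),
        "cdr2": (166, 195),
        "fwr3": (196, 312),
        "cdr3": (313, cdr3_end),
        "fwr4": (cdr3_end + 1, None),
    }
    # An inclusive 1-based end equals an exclusive 0-based end, so only start shifts.
    return {name: slice(start - 1, end) for name, (start, end) in bounds.items()}


slices = region_slices(junction_length=45)
print(slices["cdr3"], slices["fwr4"])  # slice(312, 352, None) slice(352, None, None)
```

The same slice would be applied to both the aligned sequence and the germline, so each region is compared on identical coordinates.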

Parameters:

adata (Union[AnnData, MuData, DataHandler]) – AnnData or MuData object that contains AIRR information.
regions (Sequence[Literal['full', 'v', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3']] (default: ('full', 'v', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3'))) –
Specify for which regions to calculate the mutational load. By default, calculate it for all regions. The segments follow the definition described in the shazam documentation.
- full: the full sequence without any sub-regions/divisions
- v: Only V_segment (Nucleotides 1 to 312)
- fwr1: Positions 1 to 78.
- cdr1: Positions 79 to 114.
- fwr2: Positions 115 to 165.
- cdr2: Positions 166 to 195.
- fwr3: Positions 196 to 312.
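- cdr3: Positions 313 to (313 + juncLength - 6), where juncLength is the length of the junction sequence (see junction_key). The 6 nucleotides are subtracted because the junction sequence includes the last codon of FWR3 (on the left) and the first codon of FWR4 (on the right).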
- fwr4: Positions (313 + juncLength - 6 + 1) to the end of the sequence.
airr_mod (default: 'airr') – Name of the modality in the MuData object under which the AIRR information is stored. If an AnnData object is passed to the function, this parameter is ignored.
airr_key (default: 'airr') – Key under which the AIRR information is stored in adata.obsm as an awkward array.
chain_idx_key (default: 'chain_indices') – Key in adata.obsm used to select the chain indices.
sequence_key (str (default: 'sequence_alignment')) – Awkward array key to access sequence alignment information. The sequence must be IMGT-aligned.
germline_key (str (default: 'germline_alignment_d_mask')) – Awkward array key to access germline alignment information. This must be the IMGT germline reference. It is recommended to mask the D-segment with Ns (see Yaari et al. (2015)).
junction_key (str (default: 'junction')) – Awkward array key to access the nucleotide junction sequence. This information is required to obtain the junction length required to calculate the coordinates of the cdr3 and fwr4 regions.
ignore_chars (Sequence[str] (default: ('.', 'N'))) –
A list of characters to ignore while calculating differences. The default is to ignore the following:
- "N": masked or degraded nucleotide. For instance, it is recommended to mask the D-segment because of its lower sequence quality.
- ".": “IMGT gaps”, distinct from “normal gaps” (‘-’). It is beneficial to ignore these because sequence alignments are sometimes “clipped” at the beginning, which would otherwise inflate the mutation count.

Return type:

None

Returns:

A value for each chain is stored in the awkward array used as input (typically adata.obsm["airr"]) under the keys "{region}_mutation_count" and "{region}_mutation_freq" for each region specified in the regions parameter. The mutational load for the "full" region is stored in "mutation_count" and "mutation_freq", respectively (i.e. without the "{region}_" prefix). Use scirpy.get.airr() to retrieve the values as a DataFrame.
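The naming rule for the output fields can be expressed as a small helper. `result_keys` is hypothetical and only mirrors the rule stated above: every region gets a "{region}_" prefix except "full", which is stored without one. The resulting names are the field names one would then look up with scirpy.get.airr().

```python
from typing import Sequence


def result_keys(regions: Sequence[str]) -> list[str]:
    """Hypothetical helper listing the awkward-array fields written per region."""
    keys = []
    for region in regions:
        # The "full" region is stored without the "{region}_" prefix.
        prefix = "" if region == "full" else f"{region}_"
        keys.extend([f"{prefix}mutation_count", f"{prefix}mutation_freq"])
    return keys


print(result_keys(["full", "cdr3"]))
# ['mutation_count', 'mutation_freq', 'cdr3_mutation_count', 'cdr3_mutation_freq']
```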
