scirpy.ir_dist.metrics.TCRdistDistanceCalculator
- class scirpy.ir_dist.metrics.TCRdistDistanceCalculator(cutoff=20, *, dist_weight=3, gap_penalty=4, ntrim=3, ctrim=2, fixed_gappos=True, n_jobs=-1, n_blocks=1, histogram=False, base_matrix='blosum62', distance_cap='default', chain_type=None)
Computes pairwise distances between TCR CDR3 sequences based on the “tcrdist” distance metric.
The code of this class is heavily based on pwseqdist. Reused under MIT license, Copyright (c) 2020 Andrew Fiore-Gartland.
Using the default values of dist_weight, gap_penalty, ntrim and ctrim reproduces the original distance published in [DFGH+17].
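To make the roles of dist_weight, gap_penalty, ntrim, ctrim and fixed_gappos concrete, here is a minimal pure-Python sketch of a tcrdist-style comparison of two CDR3 sequences. It is loosely modeled on the pwseqdist approach and is not scirpy's implementation: the function name `tcrdist_pair`, the `mismatch_dist` callable, the toy distance, and the exact gap-position rules (a fixed position of `min(6, 3 + (L_short - 5) // 2)`, or a search window when `fixed_gappos=False`) are assumptions made for illustration.

```python
from typing import Callable


def tcrdist_pair(
    seq1: str,
    seq2: str,
    mismatch_dist: Callable[[str, str], int],
    dist_weight: int = 3,
    gap_penalty: int = 4,
    ntrim: int = 3,
    ctrim: int = 2,
    fixed_gappos: bool = True,
) -> int:
    """tcrdist-style distance between two CDR3 sequences (illustrative sketch)."""
    len1, len2 = len(seq1), len(seq2)
    if len1 == len2:
        # Equal lengths: compare position by position, skipping the trimmed ends.
        d = sum(mismatch_dist(seq1[i], seq2[i]) for i in range(ntrim, len1 - ctrim))
        return dist_weight * d

    short_len = min(len1, len2)
    if fixed_gappos:
        # Assumed rule: a single fixed gap position, 6 for typical CDR3 lengths.
        min_gappos = max_gappos = min(6, 3 + (short_len - 5) // 2)
    else:
        # Assumed rule: try every gap position in a window and keep the best.
        min_gappos, max_gappos = 5, short_len - 5
        while min_gappos > max_gappos:
            min_gappos -= 1
            max_gappos += 1

    best = None
    for gappos in range(min_gappos, max_gappos + 1):
        d = 0
        # Residues before the gap are aligned from the N-terminus ...
        for i in range(ntrim, gappos):
            d += mismatch_dist(seq1[i], seq2[i])
        # ... and residues after the gap are aligned from the C-terminus.
        for i in range(ctrim, short_len - gappos):
            d += mismatch_dist(seq1[len1 - 1 - i], seq2[len2 - 1 - i])
        if best is None or d < best:
            best = d
    # Mismatches are weighted; each residue of length difference costs gap_penalty.
    return dist_weight * best + gap_penalty * abs(len1 - len2)


def toy_dist(a: str, b: str) -> int:
    """Toy mismatch distance (hypothetical): 0 for identical residues, 4 otherwise."""
    return 0 if a == b else 4


print(tcrdist_pair("CASSLGQAYEQYF", "CASSLGRAYEQYF", toy_dist))  # 12
print(tcrdist_pair("CASSLGQAYEQYF", "CASSLGAYEQYF", toy_dist))  # 4
```

With the toy distance, a single substitution inside the untrimmed region costs 3 × 4 = 12, while a one-residue length difference that the gap fully explains costs only the gap penalty of 4.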
- Parameters:
  - dist_weight (int (default: 3)) – Weight applied to the mismatch distances before summing with the gap penalties.
  - gap_penalty (int (default: 4)) – Distance penalty for the difference in length between the two sequences.
  - ntrim / ctrim (int (default: 3 / 2)) – Number of positions trimmed off the N-terminal (position 0) and C-terminal (position L-1) ends of the peptide sequence. These residues are ignored in the distance calculation.
  - fixed_gappos (bool (default: True)) – If True, insert gaps at a fixed position after the cysteine residue starting the CDR3 (typically position 6). If False, find the "optimal" position for inserting the gaps that make up the difference in length.
  - cutoff (int (default: 20)) – Distances > cutoff are discarded to make efficient use of sparse matrices.
  - n_jobs (int (default: -1)) – Number of numba parallel threads to use for the pairwise distance calculation.
  - n_blocks (int (default: 1)) – Number of joblib delayed objects (blocks to compute) passed to joblib.Parallel.
  - histogram (bool (default: False)) – Whether to create a nearest-neighbor histogram.
  - base_matrix (Literal['blosum62', 'tcrblosum'] (default: 'blosum62')) – Amino acid substitution matrix used by TCRdist. "blosum62" uses the original BLOSUM62 substitution matrix, while "tcrblosum" uses the TCRBLOSUM substitution matrices ([PVML24]). Depending on chain_type, either the TCRBLOSUM alpha- or beta-chain matrix is used.
  - distance_cap (Union[int, None, Literal['default']] (default: 'default')) – Maximum distance assigned to a mismatch after converting substitution scores to distances. The default value, "default", keeps the original behavior: BLOSUM62 uses a cap of 4, while TCRBLOSUM distances are uncapped. Set to an integer to choose a cap explicitly, or to None for uncapped distances.
  - chain_type (Optional[Literal['VJ', 'VDJ']] (default: None)) – Required when base_matrix="tcrblosum". "VJ" selects the alpha-chain matrix and "VDJ" selects the beta-chain matrix. When the calculator is called via ir_dist, this value is set automatically and should not be provided.
Methods table
| calc_dist_mat(seqs[, seqs2]) | Calculates the pairwise distances between two vectors of gene sequences based on the distance metric of the derived class and returns a CSR distance matrix. |
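The cutoff and histogram options can be illustrated with NumPy alone. The sketch below, built around the hypothetical helpers `to_csr_with_cutoff` and `row_min_histogram`, drops distances above the cutoff while building the three CSR arrays (data, indices, indptr) and then histograms the smallest retained distance per row. Storing d + 1 so that identical sequences (d = 0) are not confused with entries removed by the cutoff is an assumption of this sketch, as is skipping rows with no entry within the cutoff.

```python
import numpy as np


def to_csr_with_cutoff(dist: np.ndarray, cutoff: int = 20):
    """Keep distances <= cutoff and return CSR arrays (data, indices, indptr).

    Assumption: values are stored as d + 1 so that d = 0 is not an implicit zero.
    """
    data, indices, indptr = [], [], [0]
    for row in dist:
        cols = np.flatnonzero(row <= cutoff)  # columns that survive the cutoff
        data.extend((row[cols] + 1).tolist())
        indices.extend(cols.tolist())
        indptr.append(len(indices))  # running count marks where each row ends
    return np.array(data), np.array(indices), np.array(indptr)


def row_min_histogram(data, indptr, bins):
    """Histogram of the smallest stored distance per row (empty rows are skipped)."""
    mins = [data[s:e].min() - 1 for s, e in zip(indptr[:-1], indptr[1:]) if e > s]
    return np.histogram(mins, bins=bins)


# 2 query sequences x 3 reference sequences (made-up distances).
dist = np.array([[0, 12, 30], [25, 4, 40]])
data, indices, indptr = to_csr_with_cutoff(dist, cutoff=20)
counts, edges = row_min_histogram(data, indptr, bins=[0, 2, 5, 21])
print(counts)  # [1 1 0]
```

If a SciPy matrix is needed, the arrays can be passed to `scipy.sparse.csr_matrix((data, indices, indptr), shape=dist.shape)`. For a self-comparison (seqs2=None) the diagonal is always 0, so a nearest-neighbor histogram would presumably exclude it; how calc_dist_mat handles this is not stated here.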
Attributes
- TCRdistDistanceCalculator.blosum62_substitution_matrix = array([[ 4, -1, -2, -2, 0, -1, -1, 0, -2, -1, -1, -1, -1, -2, -1, 1, 0, -3, -2, 0], [-1, 5, 0, -2, -3, 1, 0, -2, 0, -3, -2, 2, -1, -3, -2, -1, -1, -3, -2, -3], [-2, 0, 6, 1, -3, 0, 0, 0, 1, -3, -3, 0, -2, -3, -2, 1, 0, -4, -2, -3], [-2, -2, 1, 6, -3, 0, 2, -1, -1, -3, -4, -1, -3, -3, -1, 0, -1, -4, -3, -3], [ 0, -3, -3, -3, 9, -3, -4, -3, -3, -1, -1, -3, -1, -2, -3, -1, -1, -2, -2, -1], [-1, 1, 0, 0, -3, 5, 2, -2, 0, -3, -2, 1, 0, -3, -1, 0, -1, -2, -1, -2], [-1, 0, 0, 2, -4, 2, 5, -2, 0, -3, -3, 1, -2, -3, -1, 0, -1, -3, -2, -2], [ 0, -2, 0, -1, -3, -2, -2, 6, -2, -4, -4, -2, -3, -3, -2, 0, -2, -2, -3, -3], [-2, 0, 1, -1, -3, 0, 0, -2, 8, -3, -3, -1, -2, -1, -2, -1, -2, -2, 2, -3], [-1, -3, -3, -3, -1, -3, -3, -4, -3, 4, 2, -3, 1, 0, -3, -2, -1, -3, -1, 3], [-1, -2, -3, -4, -1, -2, -3, -4, -3, 2, 4, -2, 2, 0, -3, -2, -1, -2, -1, 1], [-1, 2, 0, -1, -3, 1, 1, -2, -1, -3, -2, 5, -1, -3, -1, 0, -1, -3, -2, -2], [-1, -1, -2, -3, -1, 0, -2, -3, -2, 1, 2, -1, 5, 0, -2, -1, -1, -1, -1, 1], [-2, -3, -3, -3, -2, -3, -3, -3, -1, 0, 0, -3, 0, 6, -4, -2, -2, 1, 3, -1], [-1, -2, -2, -1, -3, -1, -1, -2, -2, -3, -3, -1, -2, -4, 7, -1, -1, -4, -3, -2], [ 1, -1, 1, 0, -1, 0, 0, 0, -1, -2, -2, 0, -1, -2, -1, 4, 1, -3, -2, -2], [ 0, -1, 0, -1, -1, -1, -1, -2, -2, -1, -1, -1, -1, -2, -1, 1, 5, -2, -2, 0], [-3, -3, -4, -4, -2, -2, -3, -2, -2, -3, -2, -3, -1, 1, -4, -3, -2, 11, 2, -3], [-2, -2, -2, -3, -2, -1, -2, -3, 2, -1, -1, -2, -1, 3, -3, -2, -2, 2, 7, -1], [ 0, -3, -3, -3, -1, -2, -2, -3, -3, 3, 1, -2, 1, -1, -2, -2, 0, -3, -1, 4]], dtype=int32)
- TCRdistDistanceCalculator.matrix_alphabet = 'ARNDCQEGHILKMFPSTWYV'
- TCRdistDistanceCalculator.parasail_aa_alphabet = 'ARNDCQEGHILKMFPSTWYVBZX'
- TCRdistDistanceCalculator.parasail_aa_alphabet_with_unknown = 'ARNDCQEGHILKMFPSTWYVBZX*'
- TCRdistDistanceCalculator.tcrblosum_alpha_substitution_matrix = array([[ 2, -1, -1, -1, 0, 0, 0, 0, 0, -1, -1, -1, -1, -1, 0, 0, -1, 0, -1, 0], [-1, 1, 0, 0, 1, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, -1], [-1, 0, 1, 0, 0, 0, 0, 0, 0, -1, -2, 1, 0, 0, 0, 0, 0, 0, 0, -2], [-1, 0, 0, 1, -5, 0, 0, 0, 0, -1, -2, 0, 0, 0, 0, 0, 0, 0, 0, -1], [ 0, 1, 0, -5, 2, -4, -4, 0, -2, -5, 0, -5, -4, -4, -4, 0, -6, -2, -5, 0], [ 0, 0, 0, 0, -4, 2, 0, 0, 0, -1, -2, 1, 0, 0, 0, 0, 0, 0, 0, -2], [ 0, 0, 0, 0, -4, 0, 1, 0, 1, -1, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 1, 0, -2, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, -2, 0, 1, 0, 2, 0, 0, -1, 0, 0, 1, 0, 0, 1, 0, 0], [-1, 0, -1, -1, -5, -1, -1, -2, 0, 3, 0, -1, 0, 0, 0, 0, 1, -1, 0, 0], [-1, -1, -2, -2, 0, -2, 0, -1, 0, 0, 2, -4, 0, 1, 0, -1, -1, -1, 0, 0], [-1, 0, 1, 0, -5, 1, -1, -1, -1, -1, -4, 3, 0, -3, 0, -2, -1, -2, -4, -3], [-1, 0, 0, 0, -4, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, -1, 0], [-1, 0, 0, 0, -4, 0, 0, 0, 0, 0, 1, -3, 0, 1, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, -4, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -2, 0, 0, 0, 1, 0, 0, 0, -1], [-1, 0, 0, 0, -6, 0, 0, 0, 0, 1, -1, -1, 0, 0, 0, 0, 1, 0, 0, 0], [ 0, 0, 0, 0, -2, 0, 0, 0, 1, -1, -1, -2, 0, 0, 0, 0, 0, 2, 0, -1], [-1, 0, 0, 0, -5, 0, 0, 0, 0, 0, 0, -4, -1, 0, 0, 0, 0, 0, 1, -1], [ 0, -1, -2, -1, 0, -2, 0, 0, 0, 0, 0, -3, 0, 0, 0, -1, 0, -1, -1, 1]], dtype=int32)
- TCRdistDistanceCalculator.tcrblosum_beta_substitution_matrix = array([[ 0, 0, 0, 0, -5, 0, -1, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, -1, 0], [ 0, 2, 0, 0, -4, -1, -1, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, -1, 0], [ 0, 0, 1, 1, -4, 0, 0, 0, 0, 0, -1, 0, 0, -1, 0, -1, 0, 0, 0, 0], [ 0, 0, 1, 1, -4, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, -1, 0, 0, 0, 0], [-5, -4, -4, -4, 2, -6, -5, 0, -3, -3, -5, -2, -1, -5, -4, 0, -5, -2, -5, -4], [ 0, -1, 0, 0, -6, 2, -1, -1, -1, 0, 1, -1, 0, -2, -1, -2, -1, 0, 0, -1], [-1, -1, 0, 0, -5, -1, 2, 0, -1, 0, -1, 1, 0, -2, 0, -2, 1, 0, -1, 0], [ 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, -1, 0], [ 0, 0, 0, 0, -3, -1, -1, 0, 2, 0, 0, -1, 0, 2, 0, -1, 0, 0, 1, 0], [ 0, 0, 0, 0, -3, 0, 0, 0, 0, 2, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, -1, 0, -5, 1, -1, 0, 0, 0, 1, 0, 0, 0, 0, -1, 0, 0, 0, 0], [ 0, 0, 0, 0, -2, -1, 1, 0, -1, 0, 0, 1, 0, -1, 0, 0, 0, 0, -1, 0], [ 0, 0, 0, 0, -1, 0, 0, 0, 0, 2, 0, 0, 2, 0, 0, 0, 0, 0, -1, 0], [-1, -1, -1, -1, -5, -2, -2, -1, 2, 0, 0, -1, 0, 2, 0, -2, 0, 0, 2, -1], [ 0, 0, 0, 0, -4, -1, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, -1, 0], [ 0, 0, -1, -1, 0, -2, -2, 0, -1, 0, -1, 0, 0, -2, -1, 1, 0, 0, -2, 0], [ 0, 0, 0, 0, -5, -1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 0, 0, 0, -2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0], [-1, -1, 0, 0, -5, 0, -1, -1, 1, 0, 0, -1, -1, 2, -1, -2, 0, 0, 2, -1], [ 0, 0, 0, 0, -4, -1, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0, 0, -1, 0]], dtype=int32)
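The substitution matrices above are indexed in the order of matrix_alphabet. The sketch below shows one way to turn sequences into matrix indices and to convert substitution scores into mismatch distances. The conversion distance = 4 - score (then capped, with 0 on the diagonal) is assumed to follow the convention of the original tcrdist for BLOSUM62; the documentation above only states that scores are converted to distances and then capped, and the transformation scirpy applies to the TCRBLOSUM matrices may differ. The helper names `encode` and `scores_to_distances` are hypothetical.

```python
from typing import Optional

import numpy as np

# Residue order of the substitution matrices (matrix_alphabet above).
MATRIX_ALPHABET = "ARNDCQEGHILKMFPSTWYV"


def encode(seq: str) -> np.ndarray:
    """Map each residue to its row/column index; unknown letters raise ValueError."""
    return np.array([MATRIX_ALPHABET.index(aa) for aa in seq], dtype=np.int64)


def scores_to_distances(scores: np.ndarray, cap: Optional[int] = 4) -> np.ndarray:
    """Turn substitution scores into mismatch distances.

    Assumed conversion: 4 - score, capped at `cap` (None = uncapped),
    with 0 for identical residues on the diagonal.
    """
    dist = 4 - np.asarray(scores, dtype=np.int64)  # new array, input is untouched
    if cap is not None:
        dist = np.minimum(dist, cap)
    np.fill_diagonal(dist, 0)
    return dist


# BLOSUM62 scores for I, L and V, copied from blosum62_substitution_matrix.
ilv_scores = np.array([[4, 2, 3], [2, 4, 1], [3, 1, 4]])
print(encode("ILV").tolist())  # [9, 10, 19]
print(scores_to_distances(ilv_scores, cap=None).tolist())  # [[0, 2, 1], [2, 0, 3], [1, 3, 0]]
```

For I, L and V (BLOSUM62 scores I/L = 2, I/V = 3, L/V = 1) this yields mismatch distances of 2, 1 and 3. Letters that appear only in the parasail alphabets (B, Z, X, *) are not in matrix_alphabet, so `encode` rejects them in this sketch.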
Methods
- TCRdistDistanceCalculator.calc_dist_mat(seqs, seqs2=None)
Calculates the pairwise distances between two vectors of gene sequences based on the distance metric of the derived class and returns a CSR distance matrix. Also creates a histogram based on the minimum value per row of the distance matrix if histogram is set to True.
- Return type: