scirpy.ir_dist.metrics.LevenshteinDistanceCalculator
- class scirpy.ir_dist.metrics.LevenshteinDistanceCalculator(cutoff=None, **kwargs)
Calculates the Levenshtein edit-distance between sequences.
The edit distance is the minimum number of single-character deletions, additions and substitutions needed to turn one sequence into the other.
This class relies on Python-levenshtein to calculate the distances.
- Choosing a cutoff:
Each modification counted by the distance is a single deletion, addition or substitution. Although empirical data are lacking, it seems unlikely that CDR3 sequences that differ by more than two modifications still recognize the same antigen.
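To make the cutoff concrete, the following is a minimal pure-Python sketch of the edit distance and of filtering by a cutoff of 2. It is not the implementation used by this class, which relies on Python-levenshtein; the function `levenshtein` and the CDR3-like sequences are made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the classic two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,               # deletion
                    current[j - 1] + 1,            # addition
                    previous[j - 1] + (ca != cb),  # substitution (free on a match)
                )
            )
        previous = current
    return previous[-1]


# Hypothetical CDR3 amino-acid sequences, invented for this example.
reference = "CASSLGTDTQYF"
candidates = ["CASSLGADTQYF", "CASSPGADTQYF", "CASRPGAETQYF", "CASSLGTDTQY"]

cutoff = 2
distances = {seq: levenshtein(reference, seq) for seq in candidates}
# Keep only pairs whose distance does not exceed the cutoff.
within_cutoff = {seq: d for seq, d in distances.items() if d <= cutoff}
```

Here the third candidate differs from the reference by four substitutions and is dropped, while the other three are kept.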
- Parameters:
cutoff (Optional[int] (default: None)) – Eliminates distances > cutoff to make efficient use of sparse matrices. The default cutoff is 2.
n_jobs – Number of jobs to use for the pairwise distance calculation. If None, use all jobs (only for ParallelDistanceCalculators).
block_size – The width of a block of the matrix that will be delegated to a worker process. The block contains block_size ** 2 elements.
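The relation between block_size and the number of elements per block can be sketched as follows. This is only an illustration of square tiling under the assumption that every block is block_size wide and high; the helper `iter_blocks` is hypothetical, and the way the library actually schedules blocks across workers may differ.

```python
from itertools import product


def iter_blocks(n_rows: int, n_cols: int, block_size: int):
    """Yield (row_slice, col_slice) pairs that tile an n_rows x n_cols matrix."""
    row_starts = range(0, n_rows, block_size)
    col_starts = range(0, n_cols, block_size)
    for r, c in product(row_starts, col_starts):
        # Blocks at the right and bottom edges may be smaller than block_size.
        yield (
            slice(r, min(r + block_size, n_rows)),
            slice(c, min(c + block_size, n_cols)),
        )


def block_elements(block) -> int:
    """Number of matrix elements covered by one block."""
    rows, cols = block
    return (rows.stop - rows.start) * (cols.stop - cols.start)


blocks = list(iter_blocks(5, 5, block_size=2))
sizes = [block_elements(b) for b in blocks]
```

A full block covers block_size ** 2 elements (4 in this example), and the blocks together cover every element of the matrix exactly once.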
Attributes table
DTYPE | The sparse matrix dtype.
Methods table
calc_dist_mat(seqs, seqs2=None) | Calculate the distance matrix.
squarify(triangular_matrix) | Mirror a triangular matrix at the diagonal to make it a square matrix.
Attributes
- LevenshteinDistanceCalculator.DTYPE = 'uint8'
The sparse matrix dtype. Defaults to uint8, constraining the max distance to 255.
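The limit of 255 follows from the 8-bit width of the type. The snippet below illustrates this with the standard-library `array` module as a stand-in for an unsigned 8-bit buffer; it does not use the library's sparse matrices, and other array libraries may react differently to an out-of-range value (for instance by wrapping around instead of raising).

```python
from array import array

MAX_UINT8 = 2**8 - 1  # largest value an unsigned 8-bit integer can hold

buf = array("B")  # typecode "B": unsigned 8-bit integers
buf.append(MAX_UINT8)  # 255 still fits

try:
    buf.append(MAX_UINT8 + 1)  # 256 does not fit
    overflowed = False
except OverflowError:
    overflowed = True
```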
Methods
- LevenshteinDistanceCalculator.calc_dist_mat(seqs, seqs2=None)
Calculate the distance matrix.
See DistanceCalculator.calc_dist_mat().
- Return type:
- static LevenshteinDistanceCalculator.squarify(triangular_matrix)
Mirror a triangular matrix at the diagonal to make it a square matrix.
The input matrix must be upper triangular to begin with, otherwise the results will be incorrect. No guard rails!
- Return type:
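The mirroring step, and why the input has to be upper triangular, can be illustrated with a small dense sketch on nested lists. This is not the library's code, which operates on sparse matrices; `squarify_dense` is a hypothetical helper that assumes the common approach of adding the matrix to its transpose and subtracting the doubled diagonal.

```python
def squarify_dense(upper):
    """Mirror an upper-triangular square matrix (list of lists) at its diagonal."""
    n = len(upper)
    assert all(len(row) == n for row in upper), "needs to be a square matrix"
    # U + U^T - diag(U): without the subtraction the diagonal is counted twice.
    return [
        [upper[i][j] + upper[j][i] - (upper[i][i] if i == j else 0) for j in range(n)]
        for i in range(n)
    ]


upper = [
    [1, 2, 3],
    [0, 1, 2],
    [0, 0, 1],
]
square = squarify_dense(upper)

# No guard rails: a matrix with entries below the diagonal is accepted,
# but the mirrored values get added to the existing ones.
not_triangular = [
    [1, 2],
    [5, 1],
]
wrong = squarify_dense(not_triangular)
```

With the upper-triangular input the result is symmetric and preserves the original entries; with the second input the off-diagonal entries become 7, which matches neither of the original values.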