scirpy.ir_dist.metrics.LevenshteinDistanceCalculator

class scirpy.ir_dist.metrics.LevenshteinDistanceCalculator(cutoff=None, **kwargs)

Calculates the Levenshtein edit-distance between sequences.

The edit distance is the minimum number of deletion, addition and modification events needed to transform one sequence into the other.
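As an illustration of the definition above, here is a minimal pure-Python sketch of the classic dynamic-programming approach. It is not the code this class runs (the class delegates to Python-levenshtein); the function name `levenshtein` and the example sequences are made up for this sketch.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character deletions, additions and
    modifications needed to turn `a` into `b`."""
    # previous[j] is the distance between the already processed prefix
    # of `a` and the prefix b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # add char_b
                    previous[j - 1] + (char_a != char_b),  # modify, or keep a match
                )
            )
        previous = current
    return previous[-1]


# Two hypothetical CDR3 sequences that differ by a single modification.
print(levenshtein("CASSLGQAYEQYF", "CASSLGQGYEQYF"))  # 1
```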

This class relies on Python-levenshtein to calculate the distances.

Choosing a cutoff:

Each unit of edit distance corresponds to one deletion, addition or modification event. Although empirical data are lacking, it seems unlikely that CDR3 sequences differing by more than two such events still recognize the same antigen.

Parameters:
  • cutoff (Optional[int] (default: None)) – Distances > cutoff are eliminated to make efficient use of sparse matrices. If None, the default cutoff of 2 is used.

  • n_jobs – Number of jobs to use for the pairwise distance calculation. If None, use all jobs (only for ParallelDistanceCalculators).

  • block_size – The width of a block of the matrix that will be delegated to a worker process. The block contains block_size ** 2 elements.
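To make the effect of cutoff concrete, the sketch below keeps only the pairs whose distance does not exceed the cutoff. It is a stand-in under stated assumptions, not the library's implementation: a plain dict plays the role of the sparse matrix, the helper names `levenshtein` and `sparse_distances` are hypothetical, and the computation is serial (no n_jobs or block_size).

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance (illustrative helper)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,
                    current[j - 1] + 1,
                    previous[j - 1] + (char_a != char_b),
                )
            )
        previous = current
    return previous[-1]


def sparse_distances(seqs, cutoff=2):
    """Return {(i, j): distance} for i < j, dropping distances > cutoff."""
    kept = {}
    for (i, a), (j, b) in combinations(enumerate(seqs), 2):
        distance = levenshtein(a, b)
        if distance <= cutoff:  # anything larger is simply not stored
            kept[(i, j)] = distance
    return kept


# Hypothetical sequences; the last one is far from all others.
seqs = ["CASSL", "CASSF", "CATSF", "AAAAAAAA"]
print(sparse_distances(seqs, cutoff=2))
```

With a small cutoff, most pairs in a diverse repertoire are dropped, which is what keeps the stored result sparse.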

Attributes table

DTYPE

The sparse matrix dtype.

Methods table

calc_dist_mat(seqs[, seqs2])

Calculate the distance matrix.

squarify(triangular_matrix)

Mirror a triangular matrix at the diagonal to make it a square matrix.

Attributes

LevenshteinDistanceCalculator.DTYPE = 'uint8'

The sparse matrix dtype. Defaults to uint8, constraining the max distance to 255.
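The 255 limit follows from the width of an unsigned 8-bit integer. The sketch below shows that limit using the standard-library `array` module as a stand-in; it says nothing about how the sparse matrix itself reacts to larger values.

```python
from array import array

# Typecode "B" is an unsigned 8-bit integer, the same width as uint8.
distances = array("B", [0, 2, 255])

try:
    distances.append(256)  # one more than an unsigned 8-bit integer can hold
    overflowed = False
except OverflowError:
    overflowed = True

print(distances.itemsize, max(distances), overflowed)
```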

Methods

LevenshteinDistanceCalculator.calc_dist_mat(seqs, seqs2=None)

Calculate the distance matrix.

See DistanceCalculator.calc_dist_mat().

Return type:

csr_matrix

static LevenshteinDistanceCalculator.squarify(triangular_matrix)

Mirror a triangular matrix at the diagonal to make it a square matrix.

The input matrix must be upper triangular to begin with, otherwise the results will be incorrect. No guard rails!

Return type:

csr_matrix
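The mirroring performed by squarify can be pictured with the following sketch, which works on nested lists rather than a csr_matrix. It is one plausible way to mirror a matrix (add the transpose, then subtract the diagonal so it is not counted twice) and is not taken from the library source; the name `mirror_upper` is hypothetical.

```python
def mirror_upper(upper):
    """Mirror an upper triangular square matrix (list of lists) at its diagonal."""
    n = len(upper)
    return [
        [
            # matrix + transpose, with the diagonal counted only once
            upper[i][j] + upper[j][i] - (upper[i][i] if i == j else 0)
            for j in range(n)
        ]
        for i in range(n)
    ]


upper = [
    [1, 2, 3],
    [0, 1, 4],
    [0, 0, 1],
]
print(mirror_upper(upper))  # [[1, 2, 3], [2, 1, 4], [3, 4, 1]]

# Input that is not upper triangular is not rejected; the values are just wrong.
not_upper = [
    [1, 2],
    [5, 1],
]
print(mirror_upper(not_upper))  # [[1, 7], [7, 1]]
```

The second call shows why the input has to be upper triangular: in this sketch, entries below the diagonal are silently added to their mirrored counterparts instead of raising an error.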