scirpy.ir_dist.metrics.LevenshteinDistanceCalculator#

class scirpy.ir_dist.metrics.LevenshteinDistanceCalculator(cutoff=2, **kwargs)#

Calculates the Levenshtein edit-distance between sequences.

The edit distance is the total number of deletion, addition and modification events.

This class relies on Python-levenshtein to calculate the distances.

Choosing a cutoff: Each unit of distance stands for one deletion, addition or modification event. Although empirical data is lacking, it seems unlikely that CDR3 sequences that differ by more than two such events still recognize the same antigen.
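To make the notion of an edit event concrete, the following is a minimal pure-Python sketch of the Levenshtein distance. It is not the implementation used by this class, which relies on Python-levenshtein; the function name `levenshtein` and the CDR3-like sequences are made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the insertions, deletions and substitutions needed to turn a into b."""
    # previous[j] is the distance between the prefix of `a` processed so far
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + (char_a != char_b),  # substitute, free if equal
                )
            )
        previous = current
    return previous[-1]


# Hypothetical CDR3-like sequences, chosen only for illustration.
print(levenshtein("CASSLGTDTQYF", "CASSLGTDTQYF"))   # 0: identical
print(levenshtein("CASSLGTDTQYF", "CASSLGADTQYF"))   # 1: one substitution
print(levenshtein("CASSLGTDTQYF", "CASSGTDTQYF"))    # 1: one deletion
print(levenshtein("CASSLGTDTQYF", "CASRLGADTQYFG"))  # 3: above the default cutoff of 2
```

With the default cutoff of 2, the first three pairs would be kept and the last pair would be treated as too distant.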

Parameters:

cutoff (int (default: 2)) – Distances > cutoff are eliminated to make efficient use of sparse matrices.
n_jobs – Number of jobs to use for the pairwise distance calculation, passed to joblib.Parallel. If -1, use all CPUs (only for ParallelDistanceCalculators). Via the joblib.parallel_config context manager, another backend (e.g. dask) can be selected.
block_size – Deprecated. This is now set in calc_dist_mat.
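The effect of `cutoff` on storage can be sketched without any library. The example below assumes a small, made-up table of pairwise distances and keeps only the entries that do not exceed the cutoff. The helper `sparsify` and the dictionary representation are hypothetical; how the class encodes the result inside its sparse matrix is not described here.

```python
def sparsify(dense, cutoff=2):
    """Keep only entries with distance <= cutoff, keyed by (row, column)."""
    return {
        (i, j): dist
        for i, row in enumerate(dense)
        for j, dist in enumerate(row)
        if dist <= cutoff
    }


# Made-up pairwise distances between four sequences.
dense = [
    [0, 1, 5, 7],
    [1, 0, 6, 2],
    [5, 6, 0, 9],
    [7, 2, 9, 0],
]

sparse = sparsify(dense, cutoff=2)
print(len(sparse), "of", len(dense) ** 2, "entries kept")  # 8 of 16 entries kept
```

The fewer pairs fall within the cutoff, the fewer entries have to be stored, which is why a small cutoff suits sparse matrices.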

Attributes table#

`DTYPE`	The sparse matrix dtype.

Methods table#

`calc_dist_mat`(seqs[, seqs2, block_size])	Calculate the distance matrix.
`squarify`(triangular_matrix)	Mirror a triangular matrix at the diagonal to make it a square matrix.

LevenshteinDistanceCalculator.DTYPE = 'uint8'#: The sparse matrix dtype. Defaults to uint8, constraining the max distance to 255.

LevenshteinDistanceCalculator.calc_dist_mat(seqs, seqs2=None, *, block_size=None)#

Calculate the distance matrix.

Parameters:

seqs (Sequence[str]) – array containing CDR3 sequences. Must not contain duplicates.
seqs2 (Optional[Sequence[str]] (default: None)) – second array containing CDR3 sequences. Must not contain duplicates either.
block_size (Optional[int] (default: None)) – The width of a block that’s sent to a worker. A block contains block_size ** 2 elements. If None, the block size is determined automatically based on the problem size.
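As a rough illustration of what `block_size` controls, the sketch below splits an n × n comparison into square tiles of at most `block_size ** 2` elements. It assumes the symmetric case (no `seqs2`) and that only the upper triangle needs to be covered; the helper `iter_blocks` is hypothetical and does not reflect how the library schedules its workers.

```python
def iter_blocks(n, block_size):
    """Yield (row_start, row_stop, col_start, col_stop) tiles covering the upper triangle."""
    for row_start in range(0, n, block_size):
        # Start the columns at the current row block so that tiles lying
        # entirely below the diagonal are skipped.
        for col_start in range(row_start, n, block_size):
            yield (
                row_start,
                min(row_start + block_size, n),
                col_start,
                min(col_start + block_size, n),
            )


blocks = list(iter_blocks(n=5, block_size=2))
for block in blocks:
    print(block)
print(len(blocks))  # 6 tiles, each holding at most 2 ** 2 = 4 elements
```

Larger blocks mean fewer tasks with more work each; smaller blocks mean more tasks and more scheduling overhead.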

Return type:

csr_matrix

Returns:

Sparse pairwise distance matrix.

static LevenshteinDistanceCalculator.squarify(triangular_matrix)#

Mirror a triangular matrix at the diagonal to make it a square matrix.

The input matrix must be upper triangular to begin with, otherwise the results will be incorrect. No guard rails!
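The mirroring step, and why the input has to be upper triangular, can be sketched on nested lists. The helper `squarify_dense` is a hypothetical dense stand-in and assumes the mirror is formed by adding the matrix to its transpose while counting the diagonal only once; the actual method operates on sparse matrices.

```python
def squarify_dense(upper):
    """Mirror an upper triangular matrix (nested lists) at the diagonal."""
    n = len(upper)
    assert all(len(row) == n for row in upper), "needs to be a square matrix"
    return [
        # Off-diagonal: add the transposed entry. Diagonal: keep the value once.
        [upper[i][j] if i == j else upper[i][j] + upper[j][i] for j in range(n)]
        for i in range(n)
    ]


upper = [
    [0, 1, 2],
    [0, 0, 3],
    [0, 0, 0],
]
print(squarify_dense(upper))  # [[0, 1, 2], [1, 0, 3], [2, 3, 0]]

# Input that is not upper triangular is accepted silently, and the entries
# on both sides of the diagonal are summed.
not_triangular = [
    [0, 1],
    [4, 0],
]
print(squarify_dense(not_triangular))  # [[0, 5], [5, 0]]
```

The second call shows the failure mode the warning refers to: nothing is raised, but the distances are wrong.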