scirpy.ir_dist.metrics.HammingDistanceCalculator#

class scirpy.ir_dist.metrics.HammingDistanceCalculator(cutoff=2, **kwargs)#

Calculates the Hamming distance between sequences of identical length.

The Hamming distance is the number of positions at which two sequences differ, i.e. the total number of substitution events. Sequences of different lengths are treated as though they exceeded the distance cutoff: they receive a distance of 0 in the sparse distance matrix and will not be connected by an edge in the graph.
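The rule described above can be sketched in plain Python. This is a minimal illustration and not scirpy's implementation: `hamming_with_cutoff` is a hypothetical helper, and it returns `None` for dropped pairs rather than writing into a sparse matrix.

```python
from typing import Optional


def hamming_with_cutoff(s1: str, s2: str, cutoff: int = 2) -> Optional[int]:
    """Return the number of mismatched positions, or None if the pair is dropped.

    Hypothetical helper for illustration only; it is not part of scirpy.
    """
    # Sequences of different length are treated as exceeding the cutoff.
    if len(s1) != len(s2):
        return None
    # Count the positions at which the two sequences differ (substitutions).
    dist = sum(a != b for a, b in zip(s1, s2))
    # Distances above the cutoff are dropped as well.
    return dist if dist <= cutoff else None


# Two CDR3-like sequences of equal length that differ at a single position.
print(hamming_with_cutoff("CASSLGTDTQYF", "CASSLGQDTQYF"))
# Sequences of different length are never compared position by position.
print(hamming_with_cutoff("CASSLGTDTQYF", "CASSLG"))
```

Returning `None` keeps "no entry" distinct from a genuine distance of zero between identical sequences in this sketch.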

This class relies on python-Levenshtein to calculate the distances.

Choosing a cutoff:

Each modification stands for a substitution event. Although empirical data are lacking, it seems unlikely that CDR3 sequences with more than two modifications still recognize the same antigen.

Parameters:
  • cutoff (int (default: 2)) – Distances > cutoff are eliminated to make efficient use of sparse matrices. The default cutoff is 2.

  • n_jobs – Number of jobs to use for the pairwise distance calculation, passed to joblib.Parallel. If -1, use all CPUs (only for ParallelDistanceCalculators). Via the joblib.parallel_config context manager, another backend (e.g. dask) can be selected.

  • block_size – Deprecated. This is now set in calc_dist_mat.

Attributes table#

DTYPE

The sparse matrix dtype.

Methods table#

calc_dist_mat(seqs[, seqs2, block_size])

Calculate the distance matrix.

squarify(triangular_matrix)

Mirror a triangular matrix at the diagonal to make it a square matrix.

Attributes#

HammingDistanceCalculator.DTYPE = 'uint8'#

The sparse matrix dtype. Defaults to uint8, constraining the max distance to 255.

Methods#

HammingDistanceCalculator.calc_dist_mat(seqs, seqs2=None, *, block_size=None)#

Calculate the distance matrix.

See DistanceCalculator.calc_dist_mat().

Parameters:
  • seqs (Sequence[str]) – array containing CDR3 sequences. Must not contain duplicates.

  • seqs2 (Optional[Sequence[str]] (default: None)) – second array containing CDR3 sequences. Must not contain duplicates either.

  • block_size (Optional[int] (default: None)) – The width of a block that’s sent to a worker. A block contains block_size ** 2 elements. If None, the block size is determined automatically based on the problem size.

Return type:

csr_matrix

Returns:

Sparse pairwise distance matrix.
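To make the block_size parameter concrete, the sketch below tiles a pairwise comparison into square blocks of at most block_size ** 2 elements. It only illustrates the idea; `iter_blocks` is a hypothetical helper, and how scirpy actually partitions the work and dispatches it to workers may differ.

```python
from typing import Iterator, Tuple


def iter_blocks(
    n_rows: int, n_cols: int, block_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (row_indices, col_indices) tiles covering an n_rows x n_cols matrix.

    Hypothetical helper for illustration only; it is not part of scirpy.
    """
    if block_size < 1:
        raise ValueError("block_size must be a positive integer")
    for row_start in range(0, n_rows, block_size):
        for col_start in range(0, n_cols, block_size):
            # Tiles at the right and bottom edges may be smaller than block_size.
            rows = range(row_start, min(row_start + block_size, n_rows))
            cols = range(col_start, min(col_start + block_size, n_cols))
            yield rows, cols


# A 5 x 5 comparison split into tiles of width 2.
blocks = list(iter_blocks(5, 5, block_size=2))
print(len(blocks))
print(max(len(rows) * len(cols) for rows, cols in blocks))
```

Larger blocks mean fewer tasks with more work each, while smaller blocks spread the work more evenly at the cost of more scheduling overhead.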

static HammingDistanceCalculator.squarify(triangular_matrix)#

Mirror a triangular matrix at the diagonal to make it a square matrix.

The input matrix must be upper triangular to begin with, otherwise the results will be incorrect. No guard rails!

Return type:

csr_matrix
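The mirroring step can be illustrated with a small sketch. It assumes a plain dictionary keyed by (row, column) in place of a csr_matrix, and `mirror_upper_triangle` is a hypothetical name. Unlike the method documented above, this sketch rejects entries below the diagonal instead of silently producing incorrect results.

```python
from typing import Dict, Tuple

# Sparse matrix stand-in: maps (row, column) to a stored value.
SparseDict = Dict[Tuple[int, int], int]


def mirror_upper_triangle(upper: SparseDict) -> SparseDict:
    """Copy every entry (i, j) of an upper triangular matrix to (j, i).

    Hypothetical helper for illustration only; it is not part of scirpy.
    """
    square = dict(upper)
    for (i, j), value in upper.items():
        if i > j:
            # Guard added for the sketch; the documented method has none.
            raise ValueError("input must be upper triangular")
        # Diagonal entries map onto themselves and are left unchanged.
        square[(j, i)] = value
    return square


upper = {(0, 0): 1, (0, 1): 2, (1, 2): 3}
print(mirror_upper_triangle(upper))
```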