scirpy.ir_dist.metrics.DistanceCalculator

class scirpy.ir_dist.metrics.DistanceCalculator(cutoff)

Abstract base class for a CDR3-sequence distance calculator.

Parameters:

cutoff (Optional[int]) – Distances > cutoff are dropped from the result so that it can be stored efficiently as a sparse matrix. If None, the default cutoff is used.

Attributes table

DTYPE

The sparse matrix dtype.

Methods table

calc_dist_mat(seqs[, seqs2])

Calculate pairwise distance matrix of all sequences in seqs and seqs2.

squarify(triangular_matrix)

Mirror a triangular matrix at the diagonal to make it a square matrix.

Attributes

DistanceCalculator.DTYPE = 'uint8'

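The sparse matrix dtype. Defaults to uint8, which limits stored values to 255. Because calc_dist_mat stores distances offset by 1, the largest distance that can be represented is 254.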

Methods

abstract DistanceCalculator.calc_dist_mat(seqs, seqs2=None)

Calculate pairwise distance matrix of all sequences in seqs and seqs2.

When seqs2 is omitted, the pairwise distances of seqs against itself are computed. In either case, the full pairwise distance matrix is calculated.

Important

  • Distances are offset by 1 to allow efficient use of sparse matrices (\(d' = d+1\)).

  • That is, a distance > cutoff is represented as 0, a distance of 0 as 1, a distance of 1 as 2, and so on.

  • Only returns distances <= cutoff. Larger distances are eliminated from the sparse matrix.

  • Distances are non-negative.

Parameters:
  • seqs (Sequence[str]) – array containing CDR3 sequences. Must not contain duplicates.

  • seqs2 (Optional[Sequence[str]] (default: None)) – second array containing CDR3 sequences. Must not contain duplicates either.

Return type:

csr_matrix

Returns:

Sparse pairwise distance matrix.
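
The offset encoding described under "Important" can be illustrated with a minimal pure-Python sketch. It is not scirpy's implementation: the Hamming distance is only a stand-in metric, a plain dict takes the place of the csr_matrix, and the names hamming, encode and sparse_dist are hypothetical helpers introduced for this example.

```python
def hamming(a: str, b: str) -> int:
    """Stand-in metric: number of mismatching positions (equal-length strings only)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def encode(distance: int, cutoff: int) -> int:
    """Return the stored value: d + 1 if d <= cutoff, otherwise 0 (not stored)."""
    if distance < 0:
        raise ValueError("distances are non-negative")
    return distance + 1 if distance <= cutoff else 0


def sparse_dist(seqs, cutoff):
    """Dict-of-keys stand-in for a sparse matrix: only nonzero entries are kept."""
    entries = {}
    for i, a in enumerate(seqs):
        for j, b in enumerate(seqs):
            stored = encode(hamming(a, b), cutoff)
            if stored:  # 0 means "distance > cutoff", so nothing is stored
                entries[(i, j)] = stored
    return entries


seqs = ["CASSL", "CASSF", "CATTF"]  # hypothetical CDR3-like strings, no duplicates
mat = sparse_dist(seqs, cutoff=1)

# Decoding: a missing entry means "> cutoff"; otherwise subtract the offset.
for key in [(0, 0), (0, 1), (0, 2)]:
    stored = mat.get(key, 0)
    print(key, "distance > cutoff" if stored == 0 else stored - 1)
```

Identical sequences (distance 0) still occupy an entry with the stored value 1, which is what lets an absent entry unambiguously mean "further apart than cutoff".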

static DistanceCalculator.squarify(triangular_matrix)

Mirror a triangular matrix at the diagonal to make it a square matrix.

The input matrix must already be upper triangular; otherwise the result will be incorrect. The input is not validated.

Return type:

csr_matrix
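
To make the mirroring concrete, here is a small sketch using scipy.sparse. It shows one possible way to mirror an upper triangular matrix at the diagonal and is not necessarily how squarify is implemented; squarify_sketch is a hypothetical name, and the example values are made up.

```python
import numpy as np
from scipy.sparse import csr_matrix, triu


def squarify_sketch(upper: csr_matrix) -> csr_matrix:
    """Mirror an upper triangular sparse matrix at the diagonal.

    Assumes `upper` really is upper triangular; like the documented method,
    this sketch does not check that.
    """
    # k=1 selects the part strictly above the diagonal, so the diagonal
    # itself is not added a second time.
    strict_upper = triu(upper, k=1)
    return csr_matrix(upper + strict_upper.T)


upper = csr_matrix(
    np.array(
        [
            [1, 2, 0],
            [0, 1, 3],
            [0, 0, 1],
        ],
        dtype=np.uint8,
    )
)
square = squarify_sketch(upper)
print(square.toarray())
```

In this sketch, any entries already present below the diagonal would be summed with the mirrored ones, which illustrates why a non-triangular input leads to incorrect results.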