scirpy.pp.index_chains

scirpy.pp.index_chains(adata, *, filter=('productive', 'require_junction_aa'), sort_chains_by=mappingproxy({'umi_count': 0, 'duplicate_count': 0, 'consensus_count': 0, 'junction': '', 'junction_aa': ''}), airr_mod='airr', airr_key='airr', key_added='chain_indices')

Selects primary/secondary VJ/VDJ chains per cell according to the Immune receptor (IR) model.

This function iterates through all chains stored in the awkward array in adata.obsm[airr_key] and

  • labels chains as primary/secondary VJ/VDJ chains

  • labels cells as multichain cells

based on the expression level of the chains and the specified filtering option. By default, non-productive chains and chains without a valid CDR3 amino acid sequence are filtered out.

Additionally, chains without a valid IMGT locus are always filtered out.

For more details, please refer to the documentation of the immune receptor (IR) model and of the AIRR data structure.
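
The selection logic described above can be illustrated with a minimal pure-Python sketch for a single cell. It is not scirpy's implementation, only an illustration under these assumptions: chains are plain dicts instead of records of an awkward array, the function name `index_cell_chains` is hypothetical, the chains are assumed to be already filtered, a cell is flagged as multichain when it has more than two VJ or more than two VDJ chains, and a missing sort field is replaced by the value given for that key in `sort_chains_by`.

```python
from types import MappingProxyType
from typing import Any, Mapping

# IMGT loci of the VJ arm and of the VDJ arm
VJ_LOCI = {"TRA", "TRG", "IGK", "IGL"}
VDJ_LOCI = {"TRB", "TRD", "IGH"}

# Same keys and values as the documented default of `sort_chains_by`
DEFAULT_SORT = MappingProxyType(
    {"umi_count": 0, "duplicate_count": 0, "consensus_count": 0, "junction": "", "junction_aa": ""}
)


def index_cell_chains(
    chains: list[dict[str, Any]], sort_chains_by: Mapping[str, Any] = DEFAULT_SORT
) -> dict[str, Any]:
    """Hypothetical sketch: pick primary/secondary VJ and VDJ chains of one cell.

    Assumes `chains` is already filtered. A missing (None) sort field is replaced
    by the mapping's value for that key (an assumption of this sketch).
    """

    def sort_key(i: int) -> tuple:
        chain = chains[i]
        return tuple(
            default if chain.get(field) is None else chain[field]
            for field, default in sort_chains_by.items()
        )

    # Chain indices ordered from the highest to the lowest sort-key tuple
    order = sorted(range(len(chains)), key=sort_key, reverse=True)
    # Chains with a missing or invalid locus end up in neither list
    vj = [i for i in order if chains[i].get("locus") in VJ_LOCI]
    vdj = [i for i in order if chains[i].get("locus") in VDJ_LOCI]
    return {
        "VJ": vj[:2],  # [primary, secondary] VJ chain indices
        "VDJ": vdj[:2],  # [primary, secondary] VDJ chain indices
        "multichain": len(vj) > 2 or len(vdj) > 2,
    }


cell = [
    {"locus": "TRA", "umi_count": 2, "junction_aa": "CAVR"},
    {"locus": "TRB", "umi_count": 7, "junction_aa": "CASS"},
    {"locus": "TRA", "umi_count": 9, "junction_aa": "CAVS"},
    {"locus": "XYZ", "umi_count": 50, "junction_aa": "CAAA"},  # invalid locus
]
print(index_cell_chains(cell))  # {'VJ': [2, 0], 'VDJ': [1], 'multichain': False}
```

In scirpy itself, the whole dataset is processed with a single call such as `ir.pp.index_chains(adata)` (after `import scirpy as ir`), which reads the chains from `adata.obsm[airr_key]`.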

Parameters:
  • adata (Union[AnnData, MuData, DataHandler]) – AnnData or MuData object that contains AIRR information.

  • filter (Callable[[Array], bool] | Sequence[str | Callable[[Array], bool]] (default: ('productive', 'require_junction_aa'))) –

    Option to filter chains. Can be either
    • a callback function that takes the full awkward array of AIRR chains as input and returns another awkward array that is a boolean mask which can be used to index the former (True to keep, False to discard).

    • a list of “filtering presets”. Possible values are "productive" and "require_junction_aa". "productive" removes non-productive chains and "require_junction_aa" removes chains that don’t have a CDR3 amino acid sequence.

    • a list with a combination of both.

    Multiple presets and functions are combined using a logical AND. Filtered chains do not count towards calling “multichain” cells.

  • sort_chains_by (Mapping[str, Any] (default: mappingproxy({'umi_count': 0, 'duplicate_count': 0, 'consensus_count': 0, 'junction': '', 'junction_aa': ''}))) – A mapping whose keys are the sort keys used to determine an ordering of chains. Chains are compared by the tuple of their values for these keys, in order: the chain with the highest tuple becomes the primary chain, the one with the second-highest tuple the secondary chain. If there are more chains, they are not indexed, and the cell receives the “multichain” flag.

  • airr_mod (str (default: 'airr')) – Name of the modality in which the AIRR information is stored in the MuData object. If an AnnData object is passed to the function, this parameter is ignored.

  • airr_key (str (default: 'airr')) – Key under which the AIRR information is stored in adata.obsm as an awkward array.

  • key_added (str (default: 'chain_indices')) – Key under which the chain indices will be stored in adata.obsm and the metadata will be stored in adata.uns.
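
How the filter option combines presets and callbacks can be sketched in plain Python. This is an illustration under stated assumptions, not scirpy's code: chains are a list of dicts rather than an awkward array, callbacks return a list of booleans instead of an awkward boolean mask, the names `PRESETS`, `combined_mask` and `min_umi` are hypothetical, and the two presets are approximated as "the productive field is true" and "the junction_aa field is non-empty".

```python
from typing import Any, Callable, Sequence, Union

Chain = dict[str, Any]
# A filter callback: takes all chains, returns one keep (True) / discard (False) flag per chain
MaskFn = Callable[[list[Chain]], list[bool]]

# Hypothetical plain-list approximations of the two filtering presets
PRESETS: dict[str, MaskFn] = {
    # keep only chains flagged as productive
    "productive": lambda chains: [bool(c.get("productive")) for c in chains],
    # keep only chains with a non-empty CDR3 amino acid sequence
    "require_junction_aa": lambda chains: [bool(c.get("junction_aa")) for c in chains],
}


def combined_mask(
    chains: list[Chain], filter_spec: Union[MaskFn, Sequence[Union[str, MaskFn]]]
) -> list[bool]:
    """Combine presets and/or callbacks with a logical AND (True = keep).

    `filter_spec` mirrors the `filter` parameter: a single callback, or a
    sequence mixing preset names and callbacks.
    """
    filters = [filter_spec] if callable(filter_spec) else list(filter_spec)
    mask = [True] * len(chains)
    for f in filters:
        fn = PRESETS[f] if isinstance(f, str) else f
        mask = [keep and ok for keep, ok in zip(mask, fn(chains))]
    return mask


def min_umi(chains: list[Chain]) -> list[bool]:
    """Hypothetical custom callback: keep chains with at least 3 UMIs."""
    return [c.get("umi_count", 0) >= 3 for c in chains]


chains = [
    {"productive": True, "junction_aa": "CASSL", "umi_count": 5},
    {"productive": False, "junction_aa": "CAVR", "umi_count": 8},
    {"productive": True, "junction_aa": None, "umi_count": 4},
    {"productive": True, "junction_aa": "CAW", "umi_count": 1},
]
print(combined_mask(chains, ["productive", "require_junction_aa"]))  # [True, False, False, True]
print(combined_mask(chains, ["productive", min_umi]))  # [True, False, True, False]
```

Here min_umi stands in for a user-supplied callback; with scirpy itself, such a callback would receive the awkward array of chains and return an awkward boolean mask instead of lists.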

Return type:

None

Returns:

Nothing, but adds the chain indices to adata.obsm[key_added] and the corresponding metadata to adata.uns[key_added].