scirpy.pp.index_chains
- scirpy.pp.index_chains(adata, *, filter=('productive', 'require_junction_aa'), sort_chains_by=mappingproxy({'umi_count': 0, 'duplicate_count': 0, 'consensus_count': 0, 'junction': '', 'junction_aa': ''}), airr_mod='airr', airr_key='airr', key_added='chain_indices')
Selects primary/secondary VJ/VDJ chains per cell according to the Immune receptor (IR) model.
This function iterates through all chains stored in the awkward array in adata.obsm[airr_key] and
- labels chains as primary/secondary VJ/VDJ chains
- labels cells as multichain cells

based on the expression level of the chains and the specified filtering option. By default, non-productive chains and chains without a valid CDR3 amino acid sequence are filtered out.
Additionally, chains without a valid IMGT locus are always filtered out.
For more details, please refer to the Immune receptor (IR) model and the data structure.
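As a rough illustration of the selection rule described above, the following plain-Python sketch ranks the chains of one receptor arm (VJ or VDJ) of a single cell. It is not scirpy's implementation: chains are modelled as dictionaries instead of an awkward array, `pick_chains` is a hypothetical helper, the filter hard-codes the two default presets, and treating the values of the sort mapping as fallbacks for missing fields is an assumption.

```python
from typing import Any, Mapping

# Default sort keys, copied from the signature above. Treating the values as
# fallbacks for missing fields is an assumption made for this sketch.
SORT_CHAINS_BY: Mapping[str, Any] = {
    "umi_count": 0,
    "duplicate_count": 0,
    "consensus_count": 0,
    "junction": "",
    "junction_aa": "",
}


def pick_chains(chains, sort_chains_by=SORT_CHAINS_BY):
    """Hypothetical helper: return ([primary, secondary] indices, multichain flag).

    `chains` is a list of dicts describing the chains of ONE receptor arm
    (VJ or VDJ) of one cell.
    """

    def keep(chain):
        # Stand-in for the default presets: productive and with a CDR3 aa sequence.
        return bool(chain.get("productive")) and bool(chain.get("junction_aa"))

    def sort_key(chain):
        # Build the tuple of sort values; missing or None fields use the fallback.
        return tuple(
            chain[k] if chain.get(k) is not None else default
            for k, default in sort_chains_by.items()
        )

    kept = [i for i, chain in enumerate(chains) if keep(chain)]
    ranked = sorted(kept, key=lambda i: sort_key(chains[i]), reverse=True)
    # Only chains that survive filtering count towards the multichain flag.
    return ranked[:2], len(ranked) > 2


cell = [
    {"productive": True, "junction_aa": "CASSL", "umi_count": 3},
    {"productive": True, "junction_aa": "CASRG", "umi_count": 10},
    {"productive": False, "junction_aa": "CAS*L", "umi_count": 50},
    {"productive": True, "junction_aa": "CASTT", "umi_count": 1},
]
indices, multichain = pick_chains(cell)
print(indices, multichain)  # [1, 0] True
```

The non-productive chain is discarded despite its high UMI count, and the three remaining chains make this arm exceed the two indexed slots, so the cell would be flagged as multichain.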
- Parameters:
adata (Union[AnnData, MuData, DataHandler]) – AnnData or MuData object that contains AIRR information.

filter (Callable[[Array], bool] | Sequence[str | Callable[[Array], bool]] (default: ('productive', 'require_junction_aa'))) – Option to filter chains. Can be either
- a callback function that takes the full awkward array with AIRR chains as input and returns another awkward array that is a boolean mask which can be used to index the former (True to keep, False to discard),
- a list of “filtering presets”. Possible values are "productive" and "require_junction_aa". "productive" removes non-productive chains and "require_junction_aa" removes chains that don’t have a CDR3 amino acid sequence,
- a list with a combination of both.

Multiple presets/functions are combined using and. Filtered chains do not count towards calling “multichain” cells.

sort_chains_by (Mapping[str, Any] (default: mappingproxy({'umi_count': 0, 'duplicate_count': 0, 'consensus_count': 0, 'junction': '', 'junction_aa': ''}))) – A mapping whose keys are the sort keys used to determine an ordering of chains. The chain with the highest value of the resulting tuple will be the primary chain, the second-highest the secondary chain. If there are more chains, they will not be indexed, and the cell receives the “multichain” flag.

airr_mod (str (default: 'airr')) – Name of the modality in which the AIRR information is stored in the MuData object. If an AnnData object is passed to the function, this parameter is ignored.

airr_key (str (default: 'airr')) – Key under which the AIRR information is stored in adata.obsm as an awkward array.

key_added (str (default: 'chain_indices')) – Key under which the chain indices will be stored in adata.obsm and metadata will be stored in adata.uns.
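The semantics of the filter parameter (True keeps a chain, and the masks of several presets or callbacks are combined with a logical and) can be sketched without awkward arrays. In this illustration nested Python lists stand in for the jagged array, and `PRESETS`, `combined_mask` and `at_least_two_umis` are hypothetical names; a real callback would receive and return awkward arrays instead.

```python
def _productive(cells):
    # Stand-in for the "productive" preset.
    return [[bool(chain.get("productive")) for chain in cell] for cell in cells]


def _require_junction_aa(cells):
    # Stand-in for the "require_junction_aa" preset.
    return [[bool(chain.get("junction_aa")) for chain in cell] for cell in cells]


PRESETS = {"productive": _productive, "require_junction_aa": _require_junction_aa}


def combined_mask(cells, filters):
    """Hypothetical helper: AND together presets and callbacks (True = keep)."""
    if callable(filters):
        filters = [filters]
    # Resolve preset names to functions, then compute one mask per filter.
    masks = [(PRESETS[f] if isinstance(f, str) else f)(cells) for f in filters]
    if not masks:
        # No filter given: keep every chain.
        return [[True] * len(cell) for cell in cells]
    # A chain is kept only if every filter keeps it.
    return [[all(flags) for flags in zip(*per_cell)] for per_cell in zip(*masks)]


def at_least_two_umis(cells):
    # Example of a custom callback: one boolean per chain, same nesting as the input.
    return [[chain.get("umi_count", 0) >= 2 for chain in cell] for cell in cells]


cells = [
    [
        {"productive": True, "junction_aa": "CASSL", "umi_count": 5},
        {"productive": True, "junction_aa": "CASRG", "umi_count": 1},
    ],
    [{"productive": False, "junction_aa": "CAS*L", "umi_count": 9}],
    [],  # a cell without any chains
]
mask = combined_mask(cells, ["productive", "require_junction_aa", at_least_two_umis])
print(mask)  # [[True, False], [False], []]
```

The mask keeps the nesting of the input (one inner list per cell, one boolean per chain), which is what allows it to index the original array.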
- Return type:
None
- Returns:
Nothing, but adds a dataframe to adata.obsm[key_added] ('chain_indices' by default).
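A minimal usage sketch, assuming `adata` is an AnnData object that already holds AIRR data under adata.obsm['airr'] (for a MuData object the results would presumably be found on the AIRR modality instead). The wrapper `index_and_inspect` is hypothetical; only scirpy.pp.index_chains and its keyword arguments are taken from the signature above.

```python
def index_and_inspect(adata, key_added="chain_indices"):
    """Hypothetical wrapper: index chains with the default presets, return the results."""
    # Imported inside the function so that defining the helper does not require scirpy.
    import scirpy as ir

    ir.pp.index_chains(
        adata,
        filter=("productive", "require_junction_aa"),  # the documented defaults
        key_added=key_added,
    )
    # The function returns nothing; results are written to the object in place.
    # Chain indices go to .obsm, the accompanying metadata to .uns.
    return adata.obsm[key_added], adata.uns[key_added]
```

Passing a different key_added keeps several indexing runs (for example with different filters) side by side on the same object.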