scirpy.tl.ir_query_annotate
- scirpy.tl.ir_query_annotate(adata, reference, *, sequence='aa', metric='identity', strategy='unique-only', include_ref_cols=None, query_key=None, suffix='', inplace=True, airr_mod='airr', airr_mod_ref='airr')
Annotate cells based on the result of ir_query().

Warning
This is an experimental function that may change in the future.

Multiple entries from the reference can match a single cell in the query dataset. To reduce the matching entries to a single value that can be added to adata.obs and used for plotting and other downstream analyses, you’ll need to choose a strategy to deal with duplicates:

- unique-only: Only annotate those cells that have a unique result. Cells with multiple inconsistent matches will receive the predicate “ambiguous”.
- most-frequent: If there are multiple matches, assign the match that is most frequent. If there are ties, the cell will receive the predicate “ambiguous”.
- json: Store multiple values and their counts as a JSON string.

NA values are ignored in all strategies (e.g. if an entry matches "foo" and nan, "foo" is considered unique).

Alternatively, you can use scirpy.tl.ir_query_annotate_df() to obtain a data frame mapping all cells to their matching entries from reference.obs.

- Parameters:
  adata (Union[AnnData, MuData, DataHandler]) – Query dataset.
  reference (Union[AnnData, MuData, DataHandler]) – Reference dataset in anndata format. Must be the same used to run query_reference.
  sequence (Literal['aa', 'nt'] (default: 'aa')) – The sequence parameter used when running scirpy.pp.ir_dist().
  metric (Union[Literal['alignment', 'identity', 'levenshtein', 'hamming'], DistanceCalculator] (default: 'identity')) – The metric parameter used when running scirpy.pp.ir_dist().
  strategy (Literal['json', 'unique-only', 'most-frequent'] (default: 'unique-only')) – Strategy to deal with non-unique values (see above).
  include_ref_cols (Optional[Sequence[str]] (default: None)) – Subset the reference database to these columns. Default: include all.
  query_key (Optional[str] (default: None)) – Use the distance matrix stored under this key in adata.uns. If set to None, the key is automatically inferred based on reference, sequence, and metric. Additional arguments are passed to the last join.
  suffix (str (default: '')) – Removed in v0.13. Has no effect.
  inplace (default: True) – If True, a column with the result will be stored in obs. Otherwise the result will be returned.
  airr_mod (str (default: 'airr')) – Name of the modality in which AIRR information is stored in the MuData object. If an AnnData object is passed to the function, this parameter is ignored.
  airr_mod_ref (str (default: 'airr')) – Like airr_mod, but for reference.
- Return type:
- Returns:
  If inplace is True, modifies adata.obs inplace. Otherwise returns a data frame with one column for each column in reference.obs, aligned to adata.obs_names.
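
The duplicate-handling strategies described above can be illustrated with a small standard-library sketch. This is not scirpy's implementation: `reduce_matches` and `_is_na` are hypothetical helpers, and both the `None` result for cells without any non-NA match and the exact layout of the JSON string are assumptions made for the example.

```python
import json
import math
from collections import Counter


def _is_na(value):
    """Return True for None or a float NaN (hypothetical helper)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def reduce_matches(values, strategy="unique-only"):
    """Collapse the reference values matched by one cell into a single value.

    Hypothetical helper for illustration only; not part of scirpy.
    """
    # NA values are ignored in all strategies.
    counts = Counter(v for v in values if not _is_na(v))
    if not counts:
        return None  # assumption: no usable match means no annotation
    if strategy == "json":
        # Assumption: a JSON object mapping each value to its count.
        return json.dumps(dict(counts), sort_keys=True)
    if strategy == "unique-only":
        # Annotate only if exactly one distinct value remains.
        return next(iter(counts)) if len(counts) == 1 else "ambiguous"
    if strategy == "most-frequent":
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            return "ambiguous"  # tie between the two most frequent values
        return ranked[0][0]
    raise ValueError(f"unknown strategy: {strategy!r}")
```

With this sketch, a cell matching `"foo"` and `nan` resolves to `"foo"` under `unique-only`, while a cell matching `"foo"` and `"bar"` once each is reported as `"ambiguous"` under both `unique-only` and `most-frequent`.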