scirpy.tl.ir_query_annotate
- scirpy.tl.ir_query_annotate(adata, reference, *, sequence='aa', metric='identity', strategy='unique-only', include_ref_cols=None, query_key=None, suffix='', inplace=True, airr_mod='airr', airr_mod_ref='airr')
Annotate cells based on the result of ir_query().
Warning: This is an experimental function that may change in the future.
Multiple entries from the reference can match a single cell in the query dataset. To reduce the matching entries to a single value that can be added to adata.obs and used for plotting and other downstream analyses, you’ll need to choose a strategy to deal with duplicates:
- unique-only: Only annotate those cells that have a unique result. Cells with multiple inconsistent matches will receive the predicate “ambiguous”.
- most-frequent: If there are multiple matches, assign the match that is most frequent. If there are ties, the cell will receive the predicate “ambiguous”.
- json: Store multiple values and their counts as a JSON string.
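The three strategies can be illustrated with a small standard-library sketch. It is not scirpy's implementation: `is_na`, `reduce_matches`, and the toy `matches` mapping are hypothetical names introduced here, and the exact JSON layout scirpy produces may differ. The sketch drops NA values before reducing, in line with the documented behaviour that NA values are ignored in all strategies.

```python
import json
import math
from collections import Counter


def is_na(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def reduce_matches(values, strategy="unique-only"):
    """Collapse all reference values matched by one cell into a single entry."""
    # NA values are dropped before any strategy is applied.
    counts = Counter(v for v in values if not is_na(v))
    if not counts:
        return None  # the cell has no usable match
    if strategy == "json":
        # Store every value together with its count.
        return json.dumps(dict(counts), sort_keys=True)
    ranked = counts.most_common()
    if strategy == "unique-only":
        # Annotate only if all matches agree on a single value.
        return ranked[0][0] if len(ranked) == 1 else "ambiguous"
    if strategy == "most-frequent":
        # A tie for first place cannot be resolved.
        tie = len(ranked) > 1 and ranked[0][1] == ranked[1][1]
        return "ambiguous" if tie else ranked[0][0]
    raise ValueError(f"unknown strategy: {strategy!r}")


# Toy data: each cell maps to the reference values it matched.
matches = {
    "c1": ["foo", float("nan")],
    "c2": ["foo", "foo", "bar"],
    "c3": ["foo", "bar"],
}
for strategy in ("unique-only", "most-frequent", "json"):
    print(strategy, {cell: reduce_matches(v, strategy) for cell, v in matches.items()})
```

In this toy data, `c1` is annotated under every strategy because its only non-NA match is `"foo"`, `c2` is resolved only by `most-frequent`, and `c3` stays ambiguous unless the `json` strategy is used.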
NA values are ignored in all strategies (e.g. if an entry matches "foo" and nan, "foo" is considered unique).
Alternatively, you can use scirpy.tl.ir_query_annotate_df() to obtain a data frame mapping all cells to their matching entries from reference.obs.
- Parameters:
adata (Union[AnnData, MuData, DataHandler]) – Query dataset.
reference (Union[AnnData, MuData, DataHandler]) – Reference dataset in anndata format. Must be the same one used to run query_reference.
sequence (Literal['aa', 'nt'] (default: 'aa')) – The sequence parameter used when running scirpy.pp.ir_dist().
metric (Union[Literal['alignment', 'fastalignment', 'identity', 'levenshtein', 'hamming'], DistanceCalculator] (default: 'identity')) – The metric parameter used when running scirpy.pp.ir_dist().
strategy (Literal['json', 'unique-only', 'most-frequent'] (default: 'unique-only')) – Strategy to deal with non-unique values (see above).
include_ref_cols (Optional[Sequence[str]] (default: None)) – Subset the reference database to these columns. Default: include all.
query_key (Optional[str] (default: None)) – Use the distance matrix stored under this key in adata.uns. If set to None, the key is automatically inferred based on reference, sequence, and metric. Additional arguments are passed to the last join.
suffix (str (default: '')) – Removed in v0.13. Has no effect.
inplace (default: True) – If True, a column with the result will be stored in obs. Otherwise the result will be returned.
airr_mod (str (default: 'airr')) – Name of the modality in which AIRR information is stored in the MuData object. If an AnnData object is passed to the function, this parameter is ignored.
airr_mod_ref (str (default: 'airr')) – Like airr_mod, but for reference.
- Return type:
- Returns:
If inplace is True, modifies adata.obs inplace. Otherwise returns a data-frame with one column for each column in reference.obs, aligned to adata.obs_names.
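A typical call sequence might look like the sketch below. The helper `annotate_with_reference` and its arguments are hypothetical; it assumes scirpy is installed, that `adata` and `reference` already contain immune receptor data, and that `scirpy.pp.ir_dist()` and `scirpy.tl.ir_query()` are run with the same `sequence` and `metric` values before annotating. Other options of those two functions are left at their defaults.

```python
def annotate_with_reference(adata, reference, ref_cols, strategy="most-frequent"):
    """Hypothetical helper: query a reference and return the annotation.

    `ref_cols` is a list of column names from `reference.obs` to carry over.
    """
    # Imported lazily so the sketch can be read and loaded without scirpy.
    import scirpy as ir

    # 1. Compute sequence distances between the query and the reference.
    ir.pp.ir_dist(adata, reference, metric="identity", sequence="aa")

    # 2. Match query cells to reference entries using those distances.
    ir.tl.ir_query(adata, reference, metric="identity", sequence="aa")

    # 3. Reduce the matches to one value per cell. With inplace=False the
    #    result is returned as a data frame instead of written to adata.obs.
    return ir.tl.ir_query_annotate(
        adata,
        reference,
        metric="identity",
        sequence="aa",
        strategy=strategy,
        include_ref_cols=ref_cols,
        inplace=False,
    )
```

Passing `inplace=False` keeps `adata.obs` untouched, which can be convenient for inspecting the annotation before deciding which columns to keep.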