scirpy.tl.ir_query_annotate_df
scirpy.tl.ir_query_annotate_df(adata, reference, *, sequence='aa', metric='identity', include_ref_cols=None, include_query_cols=(), query_key=None, suffix='', airr_mod='airr', airr_mod_ref='airr')
Returns the inner join of adata.obs with matching entries from reference.obs based on the result of ir_query().

Warning: This is an experimental function that may change in the future.

The function first creates a two-column dataframe mapping cell indices of adata to cell indices of reference. It then performs an inner join with reference.obs, and finally performs another join with the obs table of the query dataset (adata.obs).

This function requires that scirpy.tl.ir_query() has been executed on adata with the same reference and the same parameters for sequence and metric.

This function returns all matching entries in the reference database, which can be none for some cells, but many for others. If you want to add a single column to adata.obs for plotting, please refer to scirpy.tl.ir_query_annotate().
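The join sequence described above can be illustrated with plain pandas. This is a minimal sketch on toy data, not scirpy's actual implementation: the tables query_obs and reference_obs, the mapping frame, and the column names query_idx, reference_idx, sample and antigen are all hypothetical stand-ins chosen for illustration.

```python
import pandas as pd

# Hypothetical obs tables for a query and a reference dataset (toy data).
query_obs = pd.DataFrame(
    {"sample": ["s1", "s1", "s2"]},
    index=pd.Index(["q1", "q2", "q3"], name="query_idx"),
)
reference_obs = pd.DataFrame(
    {"antigen": ["CMV", "EBV", "InfluenzaA"]},
    index=pd.Index(["r1", "r2", "r3"], name="reference_idx"),
)

# Step 1: two-column mapping of query cells to reference cells.
# Here q1 matches two reference entries, q2 matches one, q3 matches none.
mapping = pd.DataFrame(
    {"query_idx": ["q1", "q1", "q2"], "reference_idx": ["r1", "r2", "r3"]}
)

# Step 2: inner join of the mapping with the reference annotations.
with_ref = mapping.merge(reference_obs.reset_index(), on="reference_idx", how="inner")

# Step 3: join the result with the query annotations.
result = with_ref.merge(
    query_obs.reset_index(), on="query_idx", how="inner"
).set_index("query_idx")

print(result)
```

In this toy result, q1 appears in two rows, q2 in one, and q3 is absent, which mirrors the "none for some cells, but many for others" behavior described above.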
Parameters:
adata (Union[AnnData, MuData, DataHandler]) – query dataset
reference (Union[AnnData, MuData, DataHandler]) – reference dataset
sequence (Literal['aa', 'nt'] (default: 'aa')) – The sequence parameter used when running scirpy.pp.ir_dist()
metric (Union[Literal['alignment', 'fastalignment', 'identity', 'levenshtein', 'hamming', 'normalized_hamming', 'tcrdist'], DistanceCalculator] (default: 'identity')) – The metric parameter used when running scirpy.pp.ir_dist()
include_ref_cols (Optional[Sequence[str]] (default: None)) – Subset the reference database to these columns. Default: include all.
include_query_cols (Sequence[str] (default: ())) – Subset adata.obs to these columns. Default: include all.
query_key (Optional[str] (default: None)) – Use the distance matrix stored under this key in adata.uns. If set to None, the key is automatically inferred based on reference, sequence, and metric. Additional arguments are passed to the last join.
suffix (str (default: '')) – Suffix appended to columns from reference.obs in case their names conflict with those in adata.obs.
airr_mod (str (default: 'airr')) – Name of the modality in which AIRR information is stored in the MuData object. If an AnnData object is passed to the function, this parameter is ignored.
airr_mod_ref (str (default: 'airr')) – Like airr_mod, but for reference.
Return type: DataFrame
Returns: DataFrame with matching entries from reference.obs.
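
Because the returned DataFrame can hold zero, one, or many rows per query cell, it is often useful to summarize it per cell before inspecting it. The following is a hedged sketch in plain pandas: the matches frame is a toy stand-in for the return value, and the index name cell_id and the column antigen are assumptions made for illustration, not names guaranteed by scirpy. It is also not a reimplementation of scirpy.tl.ir_query_annotate().

```python
import pandas as pd

# Toy stand-in for the DataFrame returned by the function (hypothetical columns).
matches = pd.DataFrame(
    {"antigen": ["CMV", "EBV", "InfluenzaA"]},
    index=pd.Index(["q1", "q1", "q2"], name="cell_id"),
)

# All cells of the query dataset, including those without any match.
all_cells = pd.Index(["q1", "q2", "q3"], name="cell_id")

# Number of reference matches per cell; cells without a match get 0.
n_matches = matches.groupby(level=0).size().reindex(all_cells, fill_value=0)

# Unique matched antigens per cell, joined into one string; NaN if no match.
antigens = (
    matches.groupby(level=0)["antigen"]
    .agg(lambda s: ";".join(sorted(set(s))))
    .reindex(all_cells)
)

print(n_matches)
print(antigens)
```

Reindexing against the full list of query cells makes the cells without any match explicit, which the inner join alone would silently drop.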