scirpy.tl.ir_query_annotate_df

scirpy.tl.ir_query_annotate_df(adata, reference, *, sequence='aa', metric='identity', include_ref_cols=None, include_query_cols=(), query_key=None, suffix='', airr_mod='airr', airr_mod_ref='airr')

Returns the inner join of adata.obs with matching entries from reference.obs based on the result of ir_query().

Warning

This is an experimental function that may change in the future.

The function first creates a two-column DataFrame that maps cell indices of adata to the matching cell indices of reference. It then performs an inner join with reference.obs and, finally, another join with adata.obs (the query).
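The three steps can be sketched with plain Python dictionaries standing in for the DataFrames. This is a minimal illustration under stated assumptions, not scirpy's implementation: the tables, the cell identifiers, the hard-coded mapping (which the real function derives from the ir_query() result), and the annotate helper are all hypothetical.

```python
# Sketch of the three-step join using dicts in place of DataFrames.
# All data and the annotate() helper are hypothetical.

# Hypothetical adata.obs (query) and reference.obs, keyed by cell index.
query_obs = {
    "q1": {"sample": "A"},
    "q2": {"sample": "A"},
    "q3": {"sample": "B"},
}
reference_obs = {
    "r1": {"antigen": "CMV"},
    "r2": {"antigen": "EBV"},
}

# Step 1: two-column mapping (query cell, reference cell). Hard-coded here;
# q1 matches two reference cells, q3 matches none.
mapping = [("q1", "r1"), ("q1", "r2"), ("q2", "r2")]


def annotate(mapping, query_obs, reference_obs):
    """Return one row per (query cell, reference cell) pair."""
    rows = []
    for query_idx, ref_idx in mapping:
        # Step 2: inner join with reference_obs; pairs without a
        # reference entry are dropped.
        if ref_idx not in reference_obs:
            continue
        row = {"cell_id": query_idx, "reference_id": ref_idx}
        row.update(reference_obs[ref_idx])
        # Step 3: join with the query obs to attach its columns.
        row.update(query_obs[query_idx])
        rows.append(row)
    return rows


result = annotate(mapping, query_obs, reference_obs)
for row in result:
    print(row)
```

Cell q3 has no match and drops out of the inner join, while q1 appears twice because it matches two reference cells.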

This function requires that scirpy.tl.ir_query() has been executed on adata with the same reference and the same parameters for sequence and metric.

This function returns all matching entries in the reference database: some cells may have no match, others may have many. If you want to add a single column to adata.obs for plotting, use scirpy.tl.ir_query_annotate() instead.
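To see why the result can contain zero, one, or many rows per cell, and what reducing it to a single per-cell column involves, the following sketch groups a hypothetical long-format result by cell and collapses it. The rule used here (keep a value only if all matches agree, otherwise mark the cell "ambiguous") is an assumption chosen for illustration; it is not necessarily the strategy implemented by scirpy.tl.ir_query_annotate().

```python
from collections import defaultdict

# Hypothetical long-format result: one row per (query cell, reference match).
matches = [
    {"cell_id": "q1", "antigen": "CMV"},
    {"cell_id": "q1", "antigen": "EBV"},
    {"cell_id": "q2", "antigen": "EBV"},
    {"cell_id": "q2", "antigen": "EBV"},
]
all_cells = ["q1", "q2", "q3"]  # q3 has no match, so it never appears above

# Group the matching antigens by query cell.
antigens_per_cell = defaultdict(list)
for row in matches:
    antigens_per_cell[row["cell_id"]].append(row["antigen"])


def collapse(values):
    """Reduce a cell's matches to one value (illustrative rule only)."""
    if not values:
        return None  # no match in the reference
    unique = set(values)
    return unique.pop() if len(unique) == 1 else "ambiguous"


# .get() avoids inserting empty lists into the defaultdict for unmatched cells.
per_cell = {cell: collapse(antigens_per_cell.get(cell, [])) for cell in all_cells}
print(per_cell)
```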

Parameters:
  • adata (Union[AnnData, MuData, DataHandler]) – query dataset

  • reference (Union[AnnData, MuData, DataHandler]) – reference dataset

  • sequence (Literal['aa', 'nt'] (default: 'aa')) – The sequence parameter used when running scirpy.pp.ir_dist()

  • metric (Union[Literal['alignment', 'fastalignment', 'identity', 'levenshtein', 'hamming'], DistanceCalculator] (default: 'identity')) – The metric parameter used when running scirpy.pp.ir_dist()

  • include_ref_cols (Optional[Sequence[str]] (default: None)) – Subset the reference database to these columns. Default: include all.

  • include_query_cols (Sequence[str] (default: ())) – Subset adata.obs to these columns. Default: include all.

  • query_key (Optional[str] (default: None)) – Use the distance matrix stored under this key in adata.uns. If set to None, the key is automatically inferred based on reference, sequence, and metric. Additional arguments are passed to the last join.

  • suffix (str (default: '')) – Suffix appended to columns from reference.obs whose names conflict with those in adata.obs.

  • airr_mod (str (default: 'airr')) – Name of the modality in which AIRR information is stored in the MuData object. If an AnnData object is passed to the function, this parameter is ignored.

  • airr_mod_ref (str (default: 'airr')) – Like airr_mod, but for reference.
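The column-selection parameters and suffix interact when both tables share a column name. The following stdlib-only sketch uses hypothetical data and hypothetical helpers (subset, merge_row) to show one plausible behaviour: an empty or None column selection keeps every column, matching the documented defaults, and a clashing reference column gets the suffix appended. Raising an error when the suffix is empty and a clash occurs is an assumption of this sketch, mirroring how pandas' DataFrame.join refuses overlapping column names without a suffix; it is not a statement about scirpy's implementation.

```python
# Hypothetical obs tables, stored as {cell_id: {column: value}}.
query_obs = {"q1": {"sample": "A", "antigen": "unknown", "batch": 1}}
reference_obs = {"r1": {"antigen": "CMV", "species": "human"}}


def subset(obs, cols):
    """Keep only `cols`; None or an empty selection keeps every column."""
    if not cols:
        return obs
    return {idx: {c: row[c] for c in cols} for idx, row in obs.items()}


def merge_row(query_row, ref_row, suffix):
    """Combine one query row with one reference row.

    Reference columns whose names clash with a query column get `suffix`
    appended. An empty suffix combined with a clash raises ValueError
    (an assumption of this sketch).
    """
    merged = dict(query_row)
    for col, value in ref_row.items():
        if col in query_row:
            if not suffix:
                raise ValueError(f"column {col!r} conflicts; set a suffix")
            col = col + suffix
        merged[col] = value
    return merged


q = subset(query_obs, ["sample", "antigen"])["q1"]  # drops "batch"
r = subset(reference_obs, None)["r1"]               # keeps all columns
row = merge_row(q, r, suffix="_ref")
print(row)
```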

Return type:

DataFrame

Returns:

DataFrame with matching entries from reference.obs.