scirpy.io.read_airr


scirpy.io.read_airr(path, use_umi_count_col='auto', infer_locus=True, cell_attributes='is_cell', include_fields=None, **kwargs)

Read data from AIRR rearrangement format.

Even though data without these fields can be imported, the following columns are required by scirpy for a meaningful analysis:

  • cell_id

  • productive

  • locus containing a valid IMGT locus name

  • at least one of consensus_count, duplicate_count, or umi_count

  • at least one of junction_aa or junction.
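The column requirements above can be checked before import. The following is a minimal sketch using only the Python standard library; the helper name missing_airr_columns is hypothetical and not part of scirpy. It only inspects the header of a tab-separated file and does not verify that locus contains valid IMGT locus names.

```python
import csv
import io

# Columns the list above names as needed for a meaningful analysis.
REQUIRED = ("cell_id", "productive", "locus")
COUNT_COLUMNS = ("consensus_count", "duplicate_count", "umi_count")
JUNCTION_COLUMNS = ("junction_aa", "junction")


def missing_airr_columns(tsv_file):
    """Return a list of human-readable problems found in the TSV header.

    Hypothetical helper for illustration; not a scirpy function.
    """
    header = next(csv.reader(tsv_file, delimiter="\t"), [])
    columns = set(header)
    problems = [f"missing column: {name}" for name in REQUIRED if name not in columns]
    if not columns.intersection(COUNT_COLUMNS):
        problems.append("need at least one of: " + ", ".join(COUNT_COLUMNS))
    if not columns.intersection(JUNCTION_COLUMNS):
        problems.append("need at least one of: " + ", ".join(JUNCTION_COLUMNS))
    return problems


# Usage with an in-memory header; pass an open file handle for real data.
example = io.StringIO("cell_id\tproductive\tlocus\tumi_count\tjunction_aa\n")
print(missing_airr_columns(example))
```

An empty list means all recommended columns are present. Data that fails this check can still be imported, but downstream analysis may be limited.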

Note

Since scirpy v0.13, there are no restrictions on the AIRR data that can be stored in the scirpy data structure, except that each receptor chain needs to be associated with a cell.

The scirpy immune receptor (IR) model is now applied in a later step using the index_chains() function.

For more information, see Storing AIRR rearrangement data in AnnData.

Parameters:
  • path (Union[str, Sequence[str], Path, Sequence[Path], DataFrame, Sequence[DataFrame]]) – Path to the AIRR rearrangement tsv file. If different chains are split up into multiple files, these can be specified as a List, e.g. ["path/to/tcr_alpha.tsv", "path/to/tcr_beta.tsv"]. Alternatively, this can be a pandas data frame.

  • use_umi_count_col (Union[bool, Literal['auto']] (default: 'auto')) – Whether to add UMI counts from the non-standard (but common) umi_count column. When this column is used, the UMI counts are moved over to the standard duplicate_count column. Default: Use umi_count if there is no duplicate_count column present.

  • infer_locus (bool (default: True)) – Try to infer the locus column from gene names, in case it is not specified.

  • cell_attributes (Collection[str] (default: 'is_cell')) – Fields in the rearrangement schema that are specific to a cell rather than a chain. The values must be identical across all records belonging to a cell. This defaults to 'is_cell'.

  • include_fields (Optional[Any] (default: None)) – Deprecated. Does not have any effect as of v0.13.

  • **kwargs – are passed to from_airr_cells().
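Two of the parameter descriptions above state rules that may be easier to follow as code. The sketch below illustrates the documented 'auto' behaviour of use_umi_count_col and the consistency requirement for cell_attributes. Both function names are hypothetical, the records are plain dictionaries rather than scirpy objects, and the handling of use_umi_count_col=True when no umi_count column exists is an assumption. This is not scirpy's implementation.

```python
def use_umi_count(columns, use_umi_count_col="auto"):
    """Decide whether umi_count would be moved into duplicate_count.

    Hypothetical helper mirroring the documented default only.
    """
    if use_umi_count_col == "auto":
        # Documented default: use umi_count if there is no duplicate_count column.
        return "umi_count" in columns and "duplicate_count" not in columns
    # Assumption: an explicit True only has an effect if umi_count exists.
    return bool(use_umi_count_col) and "umi_count" in columns


def inconsistent_cell_attributes(records, cell_attributes=("is_cell",)):
    """Return {cell_id: [attribute, ...]} for attributes that differ within a cell."""
    seen = {}
    conflicts = {}
    for record in records:
        cell_id = record["cell_id"]
        for attr in cell_attributes:
            if attr not in record:
                continue
            key = (cell_id, attr)
            if key in seen and seen[key] != record[attr]:
                bad = conflicts.setdefault(cell_id, [])
                if attr not in bad:
                    bad.append(attr)
            # Remember the first value observed for this cell and attribute.
            seen.setdefault(key, record[attr])
    return conflicts


# Two chains of cell "c1" disagree on is_cell, so "c1" is reported.
records = [
    {"cell_id": "c1", "is_cell": "T"},
    {"cell_id": "c1", "is_cell": "F"},
    {"cell_id": "c2", "is_cell": "T"},
]
print(use_umi_count(["cell_id", "umi_count"]))
print(inconsistent_cell_attributes(records))
```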

Return type:

AnnData

Returns:

AnnData object with AIRR data in obsm["airr"] for each cell. For more details see Storing AIRR rearrangement data in AnnData.
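A typical call might look like the sketch below. It assumes scirpy v0.13 or later is installed and that index_chains() is available as scirpy.pp.index_chains; the wrapper name load_airr is hypothetical, and the file paths are the placeholders from the path parameter description.

```python
def load_airr(paths):
    """Read AIRR rearrangement data and apply the receptor model afterwards.

    Hypothetical wrapper for illustration; not part of scirpy.
    """
    # Imported lazily so the sketch can be defined without scirpy installed.
    import scirpy as ir

    # Accepts a single path, a list of paths, or pandas data frames.
    adata = ir.io.read_airr(paths)
    # Assumption: the IR model is applied in this separate, later step.
    ir.pp.index_chains(adata)
    return adata


# Example call (requires scirpy and real input files):
# adata = load_airr(["path/to/tcr_alpha.tsv", "path/to/tcr_beta.tsv"])
# adata.obsm["airr"]  # per-cell AIRR data, as described under "Returns"
```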