scirpy.io.read_airr

scirpy.io.read_airr(path, use_umi_count_col=None, infer_locus=True, cell_attributes='is_cell', include_fields=None, **kwargs)

Read data from AIRR rearrangement format.

Even though data without these fields can be imported, the following columns are required by scirpy for a meaningful analysis:

  • cell_id

  • productive

  • locus containing a valid IMGT locus name

  • at least one of consensus_count, duplicate_count, or umi_count

  • at least one of junction_aa or junction.
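As a quick pre-flight check, the column requirements listed above can be tested against the header of a TSV file before importing it. The following is a minimal sketch using only the Python standard library; the helper name `missing_recommended_columns` is hypothetical and not part of scirpy, and the check does not verify that `locus` holds valid IMGT locus names.

```python
import csv
import io

# Columns the list above names as needed for a meaningful analysis.
REQUIRED = ("cell_id", "productive", "locus")
# Groups of columns of which at least one should be present.
ANY_OF = (
    ("consensus_count", "duplicate_count", "umi_count"),
    ("junction_aa", "junction"),
)


def missing_recommended_columns(tsv_handle):
    """Return a list of human-readable problems found in the TSV header.

    Hypothetical helper for illustration; not a scirpy function.
    """
    header = next(csv.reader(tsv_handle, delimiter="\t"), [])
    present = set(header)
    problems = [f"missing column: {name}" for name in REQUIRED if name not in present]
    for group in ANY_OF:
        if not present.intersection(group):
            problems.append("need at least one of: " + ", ".join(group))
    return problems


# Illustrative header only, not real data.
example = io.StringIO("cell_id\tproductive\tlocus\tumi_count\tjunction_aa\n")
print(missing_recommended_columns(example))
```

An empty list means every recommended column, or at least one member of each alternative group, appears in the header.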

Note

Since scirpy v0.13, there are no restrictions on the AIRR data that can be stored in the scirpy data structure, except that each receptor chain needs to be associated with a cell.

The scirpy immune receptor (IR) model is now applied in a later step using the index_chains() function.

For more information, see Storing AIRR rearrangement data in AnnData.

Parameters:
  • path (Union[str, Sequence[str], Path, Sequence[Path], DataFrame, Sequence[DataFrame]]) – Path to the AIRR rearrangement TSV file. If different chains are split across multiple files, these can be specified as a list, e.g. ["path/to/tcr_alpha.tsv", "path/to/tcr_beta.tsv"]. Alternatively, this can be a pandas data frame.

  • use_umi_count_col (None (default: None)) – Deprecated, has no effect as of v0.16. Since v1.4 of the AIRR standard, umi_count is an official field in the Rearrangement schema and preferred over duplicate_count. umi_count now always takes precedence over duplicate_count.

  • infer_locus (bool (default: True)) – Try to infer the locus column from gene names, in case it is not specified.

  • cell_attributes (Collection[str] (default: 'is_cell')) – Fields in the rearrangement schema that are specific to a cell rather than a chain. The values must be identical across all records belonging to a cell. This defaults to 'is_cell'.

  • include_fields (Optional[Any] (default: None)) – Deprecated. Does not have any effect as of v0.13.

  • **kwargs – Additional keyword arguments, passed to from_airr_cells().
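The requirement that cell_attributes values be identical across all records of a cell can be checked up front. The sketch below is a standalone illustration, not scirpy's internal validation: the helper `inconsistent_cell_attributes` is hypothetical, and the records are made up and represented as plain dictionaries keyed by AIRR field name.

```python
from collections import defaultdict


def inconsistent_cell_attributes(records, cell_attributes=("is_cell",)):
    """Map cell_id -> sorted attribute names whose values differ between records.

    Hypothetical helper for illustration; not a scirpy function.
    """
    # cell_id -> attribute name -> set of distinct values observed
    seen = defaultdict(lambda: defaultdict(set))
    for record in records:
        for attr in cell_attributes:
            if attr in record:
                seen[record["cell_id"]][attr].add(record[attr])
    return {
        cell_id: sorted(attr for attr, values in attrs.items() if len(values) > 1)
        for cell_id, attrs in seen.items()
        if any(len(values) > 1 for values in attrs.values())
    }


# Made-up records: cell "c2" carries conflicting is_cell values.
records = [
    {"cell_id": "c1", "locus": "TRA", "is_cell": "T"},
    {"cell_id": "c1", "locus": "TRB", "is_cell": "T"},
    {"cell_id": "c2", "locus": "TRA", "is_cell": "T"},
    {"cell_id": "c2", "locus": "TRB", "is_cell": "F"},
]
print(inconsistent_cell_attributes(records))
```

Cells reported by such a check would need their cell-level fields reconciled before import.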

Return type:

AnnData

Returns:

AnnData object with AIRR data in obsm["airr"] for each cell. For more details see Storing AIRR rearrangement data in AnnData.
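A typical call might look like the following sketch. The wrapper `load_airr_chains` is a hypothetical convenience function, not part of scirpy. It assumes scirpy v0.13 or later, where index_chains() is available as `scirpy.pp.index_chains`, and it imports scirpy lazily so that the function can be defined without the package installed.

```python
def load_airr_chains(paths):
    """Read AIRR rearrangement data and index the receptor chains.

    Hypothetical wrapper for illustration; `paths` is anything accepted by
    the `path` parameter of scirpy.io.read_airr.
    """
    # Lazy import: scirpy is only needed when the function is called.
    import scirpy as ir

    # A list of paths is accepted when chains are split across files.
    adata = ir.io.read_airr(paths)
    # Apply the immune receptor model as a separate step.
    ir.pp.index_chains(adata)
    return adata
```

For example, `load_airr_chains(["path/to/tcr_alpha.tsv", "path/to/tcr_beta.tsv"])` would return an AnnData object whose per-cell AIRR data is available under `obsm["airr"]`.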