scirpy.io.read_airr
- scirpy.io.read_airr(path, use_umi_count_col='auto', infer_locus=True, cell_attributes='is_cell', include_fields=None, **kwargs)
Read data from AIRR rearrangement format.
Even though data without these fields can be imported, the following columns are required by scirpy for a meaningful analysis:
- cell_id
- productive
- locus, containing a valid IMGT locus name
- at least one of consensus_count, duplicate_count, or umi_count
- at least one of junction_aa or junction
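As a rough illustration of the requirements listed above, the following sketch checks a set of column names before import. The helper missing_airr_requirements is hypothetical and not part of scirpy. It only tests whether the columns are present, not whether their values are valid (for example, it does not validate IMGT locus names).

```python
# Hypothetical helper, not part of scirpy: checks column presence only.
REQUIRED = ("cell_id", "productive", "locus")
ANY_OF = (
    ("consensus_count", "duplicate_count", "umi_count"),
    ("junction_aa", "junction"),
)


def missing_airr_requirements(columns):
    """Return human-readable descriptions of unmet column requirements."""
    present = set(columns)
    problems = [f"missing column: {name}" for name in REQUIRED if name not in present]
    for group in ANY_OF:
        # Each group is satisfied by any one of its columns
        if not present.intersection(group):
            problems.append("need at least one of: " + ", ".join(group))
    return problems


# Example: a header that lacks every count column
header = ["cell_id", "productive", "locus", "junction_aa"]
print(missing_airr_requirements(header))
```

An empty result means that all columns needed for a meaningful analysis are present. Data that fails the check can still be imported.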
Note
Since scirpy v0.13, there are no restrictions on the AIRR data that can be stored in the scirpy data structure, except that each receptor chain needs to be associated with a cell.
The scirpy Immune receptor (IR) model is now applied in a later step using the index_chains() function. For more information, see Storing AIRR rearrangement data in AnnData.
- Parameters:
path (Union[str, Sequence[str], Path, Sequence[Path], DataFrame, Sequence[DataFrame]]) – Path to the AIRR rearrangement tsv file. If different chains are split up into multiple files, these can be specified as a list, e.g. ["path/to/tcr_alpha.tsv", "path/to/tcr_beta.tsv"]. Alternatively, this can be a pandas data frame.
use_umi_count_col (Union[bool, Literal['auto']] (default: 'auto')) – Whether to add UMI counts from the non-standard (but common) umi_count column. When this column is used, the UMI counts are moved over to the standard duplicate_count column. Default: Use umi_count if there is no duplicate_count column present.
infer_locus (bool (default: True)) – Try to infer the locus column from gene names, in case it is not specified.
cell_attributes (Collection[str] (default: 'is_cell')) – Fields in the rearrangement schema that are specific for a cell rather than a chain. The values must be identical over all records belonging to a cell. This defaults to 'is_cell'.
include_fields (Optional[Any] (default: None)) – Deprecated. Does not have any effect as of v0.13.
**kwargs – are passed to from_airr_cells().
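The 'auto' rule for use_umi_count_col can be pictured with a small sketch on plain dictionaries. This illustrates the documented behaviour and is not scirpy's implementation. The name resolve_umi_counts is hypothetical, the rule is applied per record rather than per file for brevity, and overwriting an existing duplicate_count when True is passed is an assumption.

```python
def resolve_umi_counts(record, use_umi_count_col="auto"):
    """Return a copy of one rearrangement record with UMI counts resolved.

    Hypothetical illustration of the documented rule, not scirpy's code.
    """
    record = dict(record)  # do not modify the caller's record
    if use_umi_count_col == "auto":
        # 'auto': use umi_count only if there is no duplicate_count column
        use_umi_count_col = "umi_count" in record and "duplicate_count" not in record
    if use_umi_count_col and "umi_count" in record:
        # Assumption: the moved value replaces any existing duplicate_count
        record["duplicate_count"] = record.pop("umi_count")
    return record


print(resolve_umi_counts({"cell_id": "c1", "umi_count": 5}))
print(resolve_umi_counts({"cell_id": "c1", "umi_count": 5, "duplicate_count": 3}))
```

In the first call the count is moved to duplicate_count. In the second call the record is left unchanged, because duplicate_count is already present.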
- Return type:
AnnData
- Returns:
AnnData object with AIRR data in obsm["airr"] for each cell. For more details see Storing AIRR rearrangement data in AnnData.
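A minimal usage sketch, assuming scirpy v0.13 or later is installed and that the files exist. The wrapper load_airr is a hypothetical name, and the paths in the commented call are the placeholders from the parameter description. index_chains() is called through the scirpy.pp module.

```python
from pathlib import Path


def load_airr(paths):
    """Read AIRR rearrangement TSV files and index the receptor chains.

    Hypothetical wrapper for illustration. scirpy is imported lazily so that
    this module can be imported without scirpy installed.
    """
    import scirpy as ir

    # Multiple files (e.g. one per chain) can be passed as a list of paths
    adata = ir.io.read_airr([Path(p) for p in paths])
    # The receptor model is applied in a separate step after reading
    ir.pp.index_chains(adata)
    return adata


# Example call (placeholder paths, not real files):
# adata = load_airr(["path/to/tcr_alpha.tsv", "path/to/tcr_beta.tsv"])
# adata.obsm["airr"]  # AIRR data for each cell
```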