scirpy.datasets.stephenson2021_5k


scirpy.datasets.stephenson2021_5k()

Return the dataset from [SRB+21] as a MuData object, downsampled to 5000 BCR-containing cells.

The original study sequenced 1,141,860 cells from 143 PBMC samples collected from control groups and from patients with COVID-19 of varying severity. Gene expression, TCR-enriched, and BCR-enriched libraries were prepared for each sample according to the 10x Genomics protocol and sequenced on a NovaSeq 6000.

A preprocessed dataset for the transcriptome library was obtained from ArrayExpress. A preprocessed dataset for the BCR-enriched library was obtained from clatworthylab’s GitHub. Both datasets had already passed quality control, and all cells that did not express a BCR were discarded.

To reduce computation time, we included only 5 samples from each of the COVID-19-positive groups and randomly subsampled the cells down to a total of 5,000.
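The two-step reduction described above (restrict to a few samples per group, then draw cells at random) can be sketched in plain Python. This is an illustration only, not the code used to build the dataset: the function name `subsample_cells`, the group labels, the sample IDs, and the seed are all hypothetical, the cell table is synthetic, and choosing the five samples at random is an assumption, since the text does not say how they were picked.

```python
import random
from collections import defaultdict


def subsample_cells(cells, positive_groups, samples_per_group=5, n_cells=5000, seed=0):
    """Hypothetical helper. `cells` is a list of (cell_id, sample_id, group) tuples."""
    rng = random.Random(seed)

    # Collect the sample IDs observed in each of the selected groups.
    samples_by_group = defaultdict(set)
    for _, sample_id, group in cells:
        if group in positive_groups:
            samples_by_group[group].add(sample_id)

    # Pick up to `samples_per_group` samples per group (random choice is an assumption).
    kept_samples = set()
    for group in sorted(samples_by_group):
        candidates = sorted(samples_by_group[group])
        k = min(samples_per_group, len(candidates))
        kept_samples.update(rng.sample(candidates, k))

    # Keep cells from the chosen samples, then draw cells without replacement.
    pool = [cell for cell in cells if cell[1] in kept_samples]
    return rng.sample(pool, min(n_cells, len(pool)))


# Synthetic cell table: 3 made-up groups x 8 samples x 600 cells each.
cells = [
    (f"{group}_s{s}_c{c}", f"{group}_s{s}", group)
    for group in ("Healthy", "Mild", "Severe")
    for s in range(8)
    for c in range(600)
]

subset = subsample_cells(cells, positive_groups={"Mild", "Severe"})
```

With these synthetic inputs the pool after sample selection holds 6,000 cells (2 groups, 5 samples each, 600 cells per sample), so the final draw returns exactly 5,000 of them.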

Return type:

MuData
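A minimal usage sketch, assuming scirpy is installed and imported under the conventional alias `ir`. The helper name `summarize_stephenson2021_5k` is hypothetical and exists only for this example; it relies on the standard MuData `.mod` mapping of modality names to AnnData objects and makes no assumption about which modality names the dataset contains.

```python
def summarize_stephenson2021_5k():
    """Hypothetical helper: load the dataset and count the cells in each modality."""
    # Imported inside the function so that defining it does not require scirpy.
    import scirpy as ir

    # Calling the loader may download the data the first time it is used.
    mdata = ir.datasets.stephenson2021_5k()

    # MuData keeps one AnnData object per modality in the `.mod` mapping.
    return {name: adata.n_obs for name, adata in mdata.mod.items()}
```

Calling `summarize_stephenson2021_5k()` returns a dictionary that maps each modality name to its number of cells.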