scirpy.datasets.wu2020_3k
- scirpy.datasets.wu2020_3k()
Return the dataset from [WMdA+20] as a MuData object, downsampled to 3000 TCR-containing cells.
Return type: MuData
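A minimal usage sketch, assuming scirpy is installed and imported as `ir`. The helper names `set_scirpy_cache_dir` and `load_wu2020_3k` are hypothetical and not part of scirpy; only `scirpy.datasets.wu2020_3k()` and the `SCIRPY_DATA_DIR` environment variable come from this page. Setting the variable before scirpy is imported is a conservative assumption; this page does not say whether a later change would still be picked up.

```python
import os
import tempfile
from pathlib import Path


def set_scirpy_cache_dir(path):
    """Point SCIRPY_DATA_DIR at `path` (hypothetical helper, not part of scirpy)."""
    cache_dir = Path(path).expanduser().resolve()
    cache_dir.mkdir(parents=True, exist_ok=True)
    os.environ["SCIRPY_DATA_DIR"] = str(cache_dir)
    return cache_dir


def load_wu2020_3k():
    """Return the example dataset, downloading it on first use (hypothetical wrapper)."""
    # Imported lazily so that the environment variable is already set
    # when scirpy is loaded (assumption, see the text above).
    import scirpy as ir

    return ir.datasets.wu2020_3k()


# Use a directory below the system temp dir as an example cache location.
cache_dir = set_scirpy_cache_dir(Path(tempfile.gettempdir()) / "scirpy_data")
print(cache_dir)

# Requires scirpy and, on the first call, network access:
# mdata = load_wu2020_3k()
```

Omitting the call to `set_scirpy_cache_dir` leaves the default cache location in place.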
Note
Scirpy example datasets are managed through Pooch.
By default, the dataset is downloaded into your operating system's default cache directory (see pooch.os_cache() for more details). If it has already been downloaded, it is retrieved from the cache.
You can override the default cache directory by setting the SCIRPY_DATA_DIR environment variable to a path of your preference.
This is how the dataset was processed:
# ---
# jupyter:
#   jupytext:
#     cell_metadata_filter: -all
#     notebook_metadata_filter: -kernelspec
#     text_representation:
#       extension: .py
#       format_name: light
#       format_version: '1.5'
#       jupytext_version: 1.14.4
# ---

import muon as mu
import pandas as pd

# Use this list of 3k barcodes for consistency with previous versions
barcodes = pd.read_csv("./3k_barcodes.csv", header=None)[0].values
barcodes = pd.Series(barcodes).str.replace("-\\d+$", "", regex=True).values

mdata = mu.read_h5mu("wu2020.h5mu")

assert mdata.obs_names.is_unique

mdata = mdata[barcodes, :].copy()

mdata

mdata.write_h5mu("wu2020_3k.h5mu", compression="lzf")
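The barcode clean-up step in the script above removes a trailing "-<digits>" suffix from each barcode before subsetting. The sketch below applies the same pattern with the standard library's `re` module instead of pandas. The barcodes are made up for illustration, and `strip_barcode_suffix` is a hypothetical helper name. The page does not say why the suffix is removed; presumably it makes the barcodes match the observation names in `wu2020.h5mu`.

```python
import re

# Same pattern as in the processing script: a trailing "-<digits>" suffix.
SUFFIX = re.compile(r"-\d+$")


def strip_barcode_suffix(barcode):
    """Remove a trailing "-<digits>" suffix, if present (hypothetical helper)."""
    return SUFFIX.sub("", barcode)


# Made-up barcodes, for illustration only.
examples = [
    "AAACCTGAGAAACCAT-1",            # suffix is removed
    "AAACCTGAGAAACCAT",              # no suffix, left unchanged
    "SAMPLE-1_AAACCTGTCACGATGT-12",  # only the trailing "-12" is removed
]
stripped = [strip_barcode_suffix(b) for b in examples]
print(stripped)
```

Because the pattern is anchored with `$`, a "-<digits>" group in the middle of a barcode is left untouched.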