QC pipeline
run_qc_pipeline.Rd
This function implements all the analysis steps for performing QC. These include:
1. Reading all sample information from the metadata object/file and generating one Seurat object per sample.
2. Running SoupX (ambient RNA removal) and Scrublet (doublet detection) if the user sets the corresponding parameters.
3. Filtering the Seurat objects according to the QC criteria.
4. Generating the corresponding QC plots.
Usage
run_qc_pipeline(
  data_dir,
  sample_meta,
  sample_meta_filename = NULL,
  nfeat_thresh = 500,
  mito_thresh = 5,
  meta_colnames = c("donor", "condition", "pass_qc"),
  out_dir = NULL,
  qc_to_plot = c("nFeature_RNA", "nCount_RNA", "percent.mito"),
  use_scrublet = TRUE,
  use_soupx = FALSE,
  tenx_dir = "premrna_outs",
  tenx_counts_dir = "filtered_feature_bc_matrix",
  obj_filename = "seu_qc",
  expected_doublet_rate = 0.06,
  force_reanalysis = TRUE,
  min.cells = 10,
  min.features = 100,
  ...
)
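A minimal call might look like the sketch below. The project path, sample names and metadata values are hypothetical (including the assumption that 'pass_qc' holds logical values), and the call is guarded so the snippet also runs when the package providing run_qc_pipeline is not attached or the placeholder directory does not exist.

```r
# Hypothetical sample metadata; the column names are those required by
# 'sample_meta'. Logical 'pass_qc' values are an assumption.
sample_meta <- data.frame(
  sample    = c("s1", "s2"),
  donor     = c("d1", "d2"),
  condition = c("control", "treated"),
  pass_qc   = c(TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Check that the required columns are present before starting.
required_cols <- c("sample", "donor", "condition", "pass_qc")
missing_cols <- setdiff(required_cols, colnames(sample_meta))
stopifnot(length(missing_cols) == 0)

# "path/to/project" is a placeholder for the directory holding the 10x outputs.
data_dir <- "path/to/project"

# Only run when the function is available and the placeholder path exists.
if (exists("run_qc_pipeline") && dir.exists(data_dir)) {
  seu <- run_qc_pipeline(
    data_dir     = data_dir,
    sample_meta  = sample_meta,
    nfeat_thresh = 500,
    mito_thresh  = 5,
    use_scrublet = TRUE,
    use_soupx    = FALSE
  )
}
```

All other arguments are left at the defaults shown in Usage.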
Arguments
- data_dir
Parent directory where all sample 10x files are stored. Think of it as project directory.
- sample_meta
Sample metadata information in a data.frame-like object. Columns should contain at least 'sample', 'donor', 'condition' and 'pass_qc'.
- sample_meta_filename
Filename of the sample metadata information, with the same content as the 'sample_meta' parameter above. The user should provide one of 'sample_meta' or 'sample_meta_filename'.
- nfeat_thresh
Filter out cells that have fewer than 'nfeat_thresh' features (genes) detected.
- mito_thresh
Filter out cells with more than 'mito_thresh'% mitochondrial counts.
- meta_colnames
Sample metadata column names to store in Seurat metadata.
- out_dir
Output directory for storing analysis results.
- qc_to_plot
Vector of features in metadata to plot.
- use_scrublet
Logical, whether to use Scrublet for doublet detection.
- use_soupx
Logical, whether to use SoupX for ambient RNA removal.
- tenx_dir
Name of the 10x base directory, e.g. the one holding the outputs of a cellranger run. Default 'premrna_outs', i.e. assumes single-nucleus RNA-seq.
- tenx_counts_dir
Name of 10x directory where count matrices are stored. Default 'filtered_feature_bc_matrix'
- obj_filename
Filename of the stored Seurat object, default 'seu_qc'.
- expected_doublet_rate
The expected fraction of transcriptomes that are doublets, typically 0.05 - 0.1
- force_reanalysis
Logical. If the intermediate file 'seu_preqc.rds' (containing the created Seurat object) exists and 'force_reanalysis = FALSE', the object is read from that file instead of re-running the whole analysis with SoupX. This saves computing time; the intermediate object is created only when 'use_soupx = TRUE'.
- min.cells
Include features/genes detected in at least this many cells.
- min.features
Include cells where at least this many features/genes are detected.
- ...
Additional named parameters passed to Seurat, Scrublet or SoupX.
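The two levels of filtering described above ('min.cells' / 'min.features' when the object is created, then 'nfeat_thresh' / 'mito_thresh' as QC criteria) can be sketched in base R on toy data. This is an illustration only, not the package's implementation: the toy values, the order of the two creation-time filters and the handling of cells exactly at a threshold are assumptions.

```r
# Toy count matrix: 4 genes (rows) x 5 cells (columns); values are made up.
counts <- matrix(
  c(3, 0, 1, 0, 2,
    0, 0, 4, 0, 1,
    5, 1, 0, 0, 2,
    0, 0, 0, 0, 7),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:5))
)

# Small thresholds chosen to suit the toy data, not the function defaults.
min.features <- 2
min.cells <- 2

# Keep cells in which at least 'min.features' genes are detected.
features_per_cell <- colSums(counts > 0)
counts <- counts[, features_per_cell >= min.features, drop = FALSE]

# Keep genes detected in at least 'min.cells' of the remaining cells.
cells_per_gene <- rowSums(counts > 0)
counts <- counts[cells_per_gene >= min.cells, , drop = FALSE]

# Toy per-cell QC metrics, named after the default 'qc_to_plot' features.
qc <- data.frame(
  nFeature_RNA = c(1200, 350, 800, 2500),
  percent.mito = c(2.1, 1.0, 9.5, 5.0),
  row.names = paste0("cell", 1:4)
)

nfeat_thresh <- 500
mito_thresh <- 5

# Drop cells with fewer than 'nfeat_thresh' detected features or with
# more than 'mito_thresh' percent mitochondrial counts.
keep <- qc$nFeature_RNA >= nfeat_thresh & qc$percent.mito <= mito_thresh
kept_cells <- rownames(qc)[keep]
```

In this toy example the count matrix shrinks to three genes by three cells, and two of the four cells pass the QC thresholds.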
Value
A list of Seurat objects, one per sample in the sample metadata. If there is only a single sample, a Seurat object is returned instead of a list.
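Because the return type depends on the number of samples, downstream code may want to normalise it before iterating. The helper below is a hypothetical sketch, not part of the package; it uses empty stand-in objects with class "Seurat" so that it runs without Seurat installed.

```r
# Hypothetical helper: wrap a single object in a list so that downstream
# code can always iterate over samples.
as_object_list <- function(x) {
  if (is.list(x) && !inherits(x, "Seurat")) x else list(x)
}

# Stand-ins for the return value; real Seurat objects are not needed here.
single <- structure(list(), class = "Seurat")
several <- list(s1 = single, s2 = single)

n_single <- length(as_object_list(single))
n_several <- length(as_object_list(several))
```

With this wrapper, a loop such as `for (obj in as_object_list(res))` behaves the same for one sample and for many.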
Author
C.A.Kapourani C.A.Kapourani@ed.ac.uk