
This function implements all the analysis steps for performing QC. These include:
1. Reading all sample information from the metadata object/file and generating one Seurat object per sample.
2. Running SoupX (ambient RNA removal) and Scrublet (doublet detection) if the user sets the corresponding parameters.
3. Filtering the Seurat objects according to the QC criteria.
4. Generating the corresponding QC plots.

Usage

run_qc_pipeline(
  data_dir,
  sample_meta,
  sample_meta_filename = NULL,
  nfeat_thresh = 500,
  mito_thresh = 5,
  meta_colnames = c("donor", "condition", "pass_qc"),
  out_dir = NULL,
  qc_to_plot = c("nFeature_RNA", "nCount_RNA", "percent.mito"),
  use_scrublet = TRUE,
  use_soupx = FALSE,
  tenx_dir = "premrna_outs",
  tenx_counts_dir = "filtered_feature_bc_matrix",
  obj_filename = "seu_qc",
  expected_doublet_rate = 0.06,
  force_reanalysis = TRUE,
  min.cells = 10,
  min.features = 100,
  ...
)

Arguments

data_dir

Parent directory where the 10x files of all samples are stored. Think of it as the project directory.

sample_meta

Sample metadata information in a data.frame-like object. Columns should contain at least 'sample', 'donor', 'condition' and 'pass_qc'.
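
As an illustration, a minimal sketch of such a metadata object in base R is shown below. The sample names and values are made up, and storing 'pass_qc' as a logical is an assumption rather than something this page specifies. The check at the end only confirms that the four columns named above are present.

```r
# Hypothetical sample metadata; all names and values are invented.
sample_meta <- data.frame(
  sample = c("sample_A", "sample_B", "sample_C"),
  donor = c("d1", "d2", "d3"),
  condition = c("control", "treated", "treated"),
  pass_qc = c(TRUE, TRUE, FALSE),  # assumed to be logical
  stringsAsFactors = FALSE
)

# Columns the documentation lists as the minimum requirement.
required_cols <- c("sample", "donor", "condition", "pass_qc")
missing_cols <- setdiff(required_cols, colnames(sample_meta))
if (length(missing_cols) > 0) {
  stop("Missing metadata columns: ", paste(missing_cols, collapse = ", "))
}
```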

sample_meta_filename

Filename of the sample metadata information, with the same content as the 'sample_meta' parameter above. The user should provide one of 'sample_meta' or 'sample_meta_filename'.

nfeat_thresh

Remove cells that have fewer than 'nfeat_thresh' features (genes) detected.

mito_thresh

Remove cells with more than 'mito_thresh'% mitochondrial counts.
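
The two thresholds can be illustrated on a toy table of per-cell QC metrics. This is a sketch in base R, not the package's internal code. It assumes that 'nfeat_thresh' is compared against 'nFeature_RNA' and 'mito_thresh' against 'percent.mito' (the metric names used in the 'qc_to_plot' default), and that cells exactly at a threshold are kept.

```r
nfeat_thresh <- 500
mito_thresh <- 5

# Toy per-cell QC metrics (made-up values).
cell_qc <- data.frame(
  cell = c("c1", "c2", "c3", "c4"),
  nFeature_RNA = c(1200, 450, 800, 3000),
  percent.mito = c(2.1, 1.0, 7.5, 4.9),
  stringsAsFactors = FALSE
)

# Keep cells that are removed by neither rule:
# enough detected features and not too many mitochondrial counts.
keep <- cell_qc$nFeature_RNA >= nfeat_thresh &
  cell_qc$percent.mito <= mito_thresh
kept_cells <- cell_qc$cell[keep]
```

Here 'c2' falls below the feature threshold and 'c3' exceeds the mitochondrial threshold, so only 'c1' and 'c4' remain.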

meta_colnames

Sample metadata column names to store in Seurat metadata.

out_dir

Output directory for storing analysis results.

qc_to_plot

Vector of features in metadata to plot.

use_scrublet

Logical, whether to use Scrublet for doublet detection.

use_soupx

Logical, whether to use SoupX for ambient RNA removal.

tenx_dir

Name of the 10x base directory, e.g. the one holding the outputs of a cellranger run. Default 'premrna_outs', i.e. single-nucleus RNA-seq is assumed.

tenx_counts_dir

Name of the 10x directory where count matrices are stored. Default 'filtered_feature_bc_matrix'.
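
To show how the directory arguments might fit together, the sketch below builds a path to one sample's count matrices. The layout 'data_dir/sample/tenx_dir/tenx_counts_dir' is an assumption for illustration; this page does not spell out the exact layout the function expects. The project directory and sample name are hypothetical.

```r
data_dir <- "project_dir"   # hypothetical project directory
sample <- "sample_A"        # hypothetical sample name
tenx_dir <- "premrna_outs"
tenx_counts_dir <- "filtered_feature_bc_matrix"

# Assumed layout: one sub-directory per sample under data_dir.
counts_path <- file.path(data_dir, sample, tenx_dir, tenx_counts_dir)
```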

obj_filename

Filename of the stored Seurat object, default 'seu_qc'.

expected_doublet_rate

The expected fraction of transcriptomes that are doublets, typically 0.05 to 0.1.

force_reanalysis

Logical. If the intermediate file 'seu_preqc.rds' (containing the created Seurat object) exists and 'force_reanalysis = FALSE', the object is read from that file instead of re-running the whole analysis with SoupX. This option exists to save computing time; the intermediate object is created only when 'use_soupx = TRUE'.

min.cells

Include features/genes detected in at least this many cells.

min.features

Include cells where at least this many features/genes are detected.
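
The effect of 'min.cells' and 'min.features' can be sketched on a toy count matrix with genes in rows and cells in columns. This is a base R illustration with made-up counts, not the code the pipeline runs. Applying the cell filter first and the gene filter second is an assumption made here, and the small thresholds are chosen only to suit the toy data.

```r
# Toy count matrix: 4 genes (rows) x 4 cells (columns), made-up values.
counts <- matrix(
  c(3, 0, 1, 0,
    0, 0, 2, 0,
    5, 1, 4, 0,
    0, 0, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:4))
)
min.features <- 2  # illustrative value, not the function default
min.cells <- 2     # illustrative value, not the function default

# Keep cells in which at least min.features genes are detected.
n_features <- colSums(counts > 0)
counts <- counts[, n_features >= min.features, drop = FALSE]

# Keep genes detected in at least min.cells of the remaining cells.
n_cells <- rowSums(counts > 0)
counts <- counts[n_cells >= min.cells, , drop = FALSE]
```

In this toy case 'cell2' and 'cell4' are dropped first, after which only 'gene1' and 'gene3' are detected in both remaining cells.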

...

Additional named parameters passed to Seurat, Scrublet or SoupX.

Value

A list of Seurat objects, one per sample in the sample metadata. If there is only a single sample, a Seurat object is returned instead of a list.

Author

C.A.Kapourani C.A.Kapourani@ed.ac.uk