Use this function to create and save a directory of custom reference data that can be used with cancereffectsizeR instead of supplied refsets like ces.refset.hg19. All arguments are required except default_exome/exome_interval_padding, which are recommended.

Usage

create_refset(
  output_dir,
  refcds_output,
  species_name,
  genome_build_name,
  BSgenome_name,
  supported_chr = c(1:22, "X", "Y"),
  default_exome = NULL,
  exome_interval_padding = 0
)

Arguments

output_dir

Name/path of an existing, writable output directory where all data will be saved. The name of this directory will serve as the name of the custom refset.
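Because the directory must already exist and be writable, it can help to check both conditions before calling create_refset(). The following is a minimal base-R sketch; the directory name "my_refset" and the temporary location are hypothetical, chosen only for illustration.

```r
# Hypothetical output location; the directory's name becomes the refset name.
output_dir <- file.path(tempdir(), "my_refset")

# create_refset() expects the directory to exist already, so create it first.
dir.create(output_dir, showWarnings = FALSE)

# file.access() returns 0 when the requested permission (mode 2 = write) is granted.
is_ready <- dir.exists(output_dir) && file.access(output_dir, mode = 2) == 0

# The refset would be named after the last path component.
refset_name <- basename(output_dir)
```

If is_ready is FALSE, fix the path or its permissions before building the refset.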

refcds_output

Transcript information in the two-item list (consisting of RefCDS and gr_genes) that is output by build_RefCDS.

species_name

Name of the species, primarily for display (e.g., "human").

genome_build_name

Name of the genome build, primarily for display (e.g., "hg19").

BSgenome_name

The name of the BSgenome package to use (e.g., "hg19"); will be used by cancereffectsizeR to load the reference genome via BSgenome::getBSgenome().

supported_chr

Character vector of supported chromosomes. Note that cancereffectsizeR uses NCBI-style chromosome names, which means no chr prefixes ("X", not "chrX"). Mitochondrial contigs shouldn't be included since they would require special handling that hasn't been implemented.
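If your genome uses UCSC-style contig names, one way to build a suitable supported_chr vector is to strip the chr prefix and drop mitochondrial contigs. This is a base-R sketch under assumptions: the input names below are made up for illustration, and "M"/"MT" are assumed to be the only mitochondrial labels present.

```r
# Hypothetical contig names as they might appear in a UCSC-style genome.
ucsc_names <- c(paste0("chr", 1:22), "chrX", "chrY", "chrM")

# Strip a leading "chr" prefix to get NCBI-style names ("chrX" -> "X").
ncbi_names <- sub("^chr", "", ucsc_names)

# Exclude mitochondrial contigs (assumed here to be labeled "M" or "MT").
supported_chr <- ncbi_names[!ncbi_names %in% c("M", "MT")]
```

For a human genome, the result matches the default value shown in Usage.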

default_exome

A BED file or GRanges object that defines coding regions in the genome as might be used by an exome capture kit. This file (or GRanges) might be acquired or generated from exome capture kit documentation or, alternatively, derived from coding regions defined in a GTF file (or from the GRanges output by build_RefCDS()).

exome_interval_padding

Number of bases to pad the start/end of each covered interval, to allow for some variants to be called just outside of targeted regions, where sequencing coverage may still be adequate.
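To illustrate what padding means, the sketch below widens a few intervals by a fixed number of bases on each side. It is a conceptual example only, not the package's internal implementation: pad_intervals() is a hypothetical helper, the coordinates are made up and treated as 1-based inclusive, and clamping starts at position 1 is an assumption of this sketch.

```r
# Hypothetical helper: widen each interval by `padding` bases on both sides.
pad_intervals <- function(intervals, padding) {
  stopifnot(padding >= 0)
  padded <- intervals
  # Clamp at 1 so padded starts never fall before the first base (an assumption).
  padded$start <- pmax(1L, intervals$start - padding)
  padded$end <- intervals$end + padding
  padded
}

# Made-up covered intervals (1-based, inclusive coordinates).
intervals <- data.frame(
  chr = c("1", "1", "X"),
  start = c(50L, 1000L, 20L),
  end = c(150L, 1200L, 80L)
)

padded <- pad_intervals(intervals, 100L)
```

With a padding of 100, the interval 1000-1200 becomes 900-1300, while intervals starting within 100 bases of the contig start are clamped to position 1. This sketch does not handle chromosome ends or merge intervals that overlap after padding.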

Details

To run this function, you'll need to have output from build_RefCDS().