Divide batches of variants into a CompoundVariantSet
Source: R/compound_variants.R
define_compound_variants.Rd
A CompoundVariantSet is a collection of "compound variants". A compound variant is an arbitrary group of variants that have sequencing coverage across some set of samples. Any of these samples with one or more of the constituent SNVs "has the compound variant"; samples with coverage at only some of the sites are not considered. The compound variants within a CompoundVariantSet are always disjoint: no individual variant appears in more than one compound variant. After collecting variants of interest into a table using select_variants(), and further subsetting or annotating the table as desired, use this function to produce a CompoundVariantSet that combines variants into distinct compound variants based on your criteria.
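The coverage rule in the description can be illustrated with a small base R sketch. The sample names, site names, and list layout here are hypothetical toy data, not structures used by the package; the sketch only shows the logic of which samples are considered and which of them have the compound variant.

```r
# Hypothetical compound variant made of three constituent sites.
sites <- c("v1", "v2", "v3")

# Hypothetical coverage: the sites each sample has sequencing coverage at.
covered <- list(
  s1 = c("v1", "v2", "v3"),
  s2 = c("v1", "v2"),        # covers only some of the sites
  s3 = c("v1", "v2", "v3")
)

# Hypothetical variant calls: the constituent variants each sample carries.
observed <- list(
  s1 = "v2",
  s2 = "v1",
  s3 = character(0)
)

# Only samples with coverage at every constituent site are considered.
covers_all <- vapply(covered, function(x) all(sites %in% x), logical(1))
eligible <- names(covered)[covers_all]

# A considered sample has the compound variant if it carries one or more
# of the constituent variants.
has_compound <- vapply(
  eligible,
  function(s) any(observed[[s]] %in% sites),
  logical(1)
)
```

In this toy example, s2 carries v1 but is excluded from consideration because it lacks coverage at v3.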
Arguments
- cesa
CESAnalysis
- variant_table
Data table of variants, in the style generated by select_variants().
- by
One or more column names to use for initial splitting of the input table into variant groups. Each distinct group will then be further divided into compound variants based on merge_distance.
- merge_distance
Maximum genomic distance between a given variant and the nearest variant in a compound variant for the variant to be merged into the compound variant (as opposed to being assigned to its own compound variant).
Details
This function works first by splitting the input table by the columns given in by. For example, splitting on "gene" will split the table into gene-specific subtables. Then, each subtable is divided into compound variants based on merge_distance. All variants in each subtable within the specified genomic distance of each other will be merged into a candidate compound variant, and then compound variants will be repeatedly merged until the nearest two variants in each pair of compound variants are not within merge_distance. Note that overlapping variants will always be merged unless you use by to separate them into different subtables (for example, by splitting on alt or aa_alt). If you use by to split variants by some functional annotation, you can set merge_distance very high to merge all same-chromosome sites (e.g., 1e9 on the human genome). To merge sites across chromosomes, set merge_distance = Inf.
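The split-then-merge behavior described above can be sketched in base R on a toy table. This is an illustrative reading of the description, not the package's implementation: the helper names (group_by_distance, sketch_compound_groups), the toy column names (variant_id, gene, chr, start, end), and the gap calculation are all assumptions made for the example.

```r
# Assign group indices so that variants linked by a chain of gaps
# <= merge_distance end up in the same group (hypothetical helper).
group_by_distance <- function(chr, start, end, merge_distance) {
  n <- length(start)
  label <- seq_len(n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (j <= i) next
      # Gap between two intervals; overlapping variants have gap 0.
      gap <- max(0, max(start[i], start[j]) - min(end[i], end[j]))
      # Assumption: only merge_distance = Inf links different chromosomes.
      linked <- is.infinite(merge_distance) ||
        (chr[i] == chr[j] && gap <= merge_distance)
      if (linked) {
        old <- label[j]
        new <- label[i]
        label[label == old] <- new  # merge the two candidate groups
      }
    }
  }
  match(label, unique(label))
}

# Split on the `by` columns, then group each subtable by distance.
sketch_compound_groups <- function(variant_table, by, merge_distance) {
  subtables <- split(variant_table, variant_table[by], drop = TRUE)
  out <- lapply(names(subtables), function(key) {
    sub <- subtables[[key]]
    idx <- group_by_distance(sub$chr, sub$start, sub$end, merge_distance)
    sub$compound <- paste(key, idx, sep = "_")
    sub
  })
  do.call(rbind, out)
}

# Toy input (hypothetical values).
variants <- data.frame(
  variant_id = c("a", "b", "c", "d", "e", "f"),
  gene       = c("KRAS", "KRAS", "KRAS", "KRAS", "BRAF", "BRAF"),
  chr        = c("12", "12", "12", "12", "7", "7"),
  start      = c(100, 103, 106, 500, 100, 100),
  end        = c(100, 103, 106, 500, 100, 100)
)

res <- sketch_compound_groups(variants, by = "gene", merge_distance = 3)
res_inf <- sketch_compound_groups(variants, by = "gene", merge_distance = Inf)
```

With merge_distance = 3, variants a, b, and c share a group even though a and c are 6 bases apart, because each is within 3 bases of b; d stays on its own, and the overlapping variants e and f are merged. With merge_distance = Inf, each gene-specific subtable collapses into a single group.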