vivaldi

vivaldi (Viral Variant Location and Diversity) is an R package for analyzing intrahost viral variation from Illumina sequencing data. The package is built around variant call format (VCF) files and provides tools to:

import and tidy viral variant calls from one or many VCF files
merge technical replicates
filter variants by coverage and allele frequency thresholds
parse SnpEff annotations
summarize shared variants and mutation spectra
quantify diversity using metrics such as Shannon entropy and dN/dS
generate publication-ready visualizations of variant positions, frequencies, and genome-wide patterns

The package was developed for viral diversity analyses such as those described in the final published paper:

Roder AE, Johnson KEE, Knoll M, Khalfan M, Wang B, Schultz-Cherry S, Banakis S, Kreitman A, Mederos C, Wang W, Ruchnewitz D, Samanovic MI, Mulligan MJ, Lassig M, Łuksza M, Das S, Gresham D, Ghedin E. Optimized Quantification of Intrahost Viral Diversity in SARS-CoV-2 and Influenza Virus Sequence Data. mBio. 2023. https://doi.org/10.1128/mbio.01046-23

Installation

Install from CRAN

install.packages("vivaldi")

Install the development version from GitHub

# install.packages("remotes")
remotes::install_github("GreshamLab/vivaldi")

Package dependencies

vivaldi depends on widely used R packages for variant parsing, data wrangling, and visualization:

vcfR for reading VCF files
dplyr, tidyr, and magrittr for data manipulation
ggplot2 and plotly for plotting
seqinr for reading reference FASTA files
glue for string handling

Input data

The main input is a directory of .vcf files. Depending on the workflow, you may also provide:

a reference FASTA file used for alignment
a table of segment or genome sizes
a replicate mapping table for technical replicate merging
SnpEff-annotated VCF files for annotation-aware downstream analyses

The package ships with example data accessible via system.file("extdata", package = "vivaldi"), including:

H1N1.fa - example influenza reference FASTA
SegmentSize.csv - segment size metadata
reps.csv - replicate metadata
vcfs/ - example annotated VCF files

An example processed dataset is also included as example_filtered_SNV_df.

Typical workflow

A common vivaldi analysis looks like this:

Load VCF files with arrange_data()
Merge technical replicates with merge_replicates() when replicate sequencing is available
Filter low-confidence variants with filter_variants()
Expand annotation fields with prepare_annotations()
Add metadata such as segment sizes with add_metadata()
Summarize diversity with functions such as tstv_ratio(), shannon_entropy(), and dNdS_segment()
Visualize results with functions such as af_distribution(), snv_location(), snv_genome(), snv_segment(), plot_shannon(), and shared_snv_plot()

Quick start

library(vivaldi)

vardir <- system.file("extdata", "vcfs", package = "vivaldi")
reference_fasta <- system.file("extdata", "H1N1.fa", package = "vivaldi")
seg_sizes <- system.file("extdata", "SegmentSize.csv", package = "vivaldi")
rep_info <- system.file("extdata", "reps.csv", package = "vivaldi")

vcf_df <- arrange_data(vardir, reference_fasta, annotated = "yes")
replicates <- read.csv(rep_info)
sizes <- read.csv(seg_sizes)

merged_df <- merge_replicates(
  vcf_df,
  replicates,
  "rep1",
  "rep2",
  c("sample", "CHROM", "POS", "REF", "ALT", "ANN", "ALT_TYPE", "major", "minor")
)

filtered_df <- filter_variants(
  merged_df,
  coverage_cutoff = 0,
  frequency_cutoff = 0.01
)

annotated_df <- prepare_annotations(filtered_df)
annotated_df <- add_metadata(annotated_df, sizes, c("CHROM"), c("segment"))
annotated_df <- shannon_entropy(annotated_df, genome_size = 13133)

plot_shannon(annotated_df)

Core functions

Import and preparation

arrange_data() - read VCF files and combine them into a single dataframe
read_reference_fasta_dna() - extract chromosome or segment sizes from a FASTA file
prepare_annotations() - split SnpEff annotations into separate columns
snpeff_info() - parse annotation information from VCF INFO fields
add_metadata() - join external metadata to the variant dataframe

Filtering and replicate handling

filter_variants() - apply coverage and allele frequency thresholds
merge_replicates() - keep shared variants across sequencing replicates and compute summary frequencies

Summary statistics

tally_it() - count variants over user-defined groups
tstv_ratio() - calculate transition/transversion ratios
shannon_entropy() - calculate per-position, per-segment, and genome-wide diversity metrics
dNdS_segment() - estimate dN/dS summaries by coding feature or segment

Visualization and exploration

af_distribution() - plot minor allele frequency distributions
position_allele_freq() - inspect the allele frequencies of a single site across samples
shared_snv_plot() and shared_snv_table() - identify and summarize shared variants
snv_location() - visualize SNV positions across samples and segments
snv_genome() - summarize SNVs across genomes
snv_segment() - summarize SNVs by genome segment
plot_shannon() - visualize Shannon entropy summaries
tstv_plot() - visualize transition/transversion summaries

Documentation

For a longer worked example, see the package vignette:

browseVignettes("vivaldi")

Source documentation and examples are also available in the package help pages:

help(package = "vivaldi")

Testing and development

The repository uses standard R package tooling, including testthat tests under tests/testthat and GitHub Actions R-CMD-check workflows.

Authors

Marissa Knoll, Katherine Johnson, Megan Hockman, Eric Borenstein, Mohammed Khalfan, Elodie Ghedin, and David Gresham

Maintainer: David Gresham dg107@nyu.edu

Citation

If you use vivaldi in published research, please cite the published article rather than the preprint:

Roder AE, Johnson KEE, Knoll M, Khalfan M, Wang B, Schultz-Cherry S, Banakis S, Kreitman A, Mederos C, Wang W, Ruchnewitz D, Samanovic MI, Mulligan MJ, Lassig M, Łuksza M, Das S, Gresham D, Ghedin E. Optimized Quantification of Intrahost Viral Diversity in SARS-CoV-2 and Influenza Virus Sequence Data. mBio. 2023. https://doi.org/10.1128/mbio.01046-23

Name		Name	Last commit message	Last commit date
Latest commit History 378 Commits
.github		.github
R		R
data		data
inst		inst
man		man
tests		tests
vignettes		vignettes
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
NAMESPACE		NAMESPACE
README.md		README.md
vivaldi.Rproj		vivaldi.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

vivaldi

Installation

Install from CRAN

Install the development version from GitHub

Package dependencies

Input data

Typical workflow

Quick start

Core functions

Import and preparation

Filtering and replicate handling

Summary statistics

Visualization and exploration

Documentation

Testing and development

Authors

Citation

About

Uh oh!

Releases 2

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

vivaldi

Installation

Install from CRAN

Install the development version from GitHub

Package dependencies

Input data

Typical workflow

Quick start

Core functions

Import and preparation

Filtering and replicate handling

Summary statistics

Visualization and exploration

Documentation

Testing and development

Authors

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages