BIG multivariate data Plotted INTeractively.
Welcome to the bigPint package website! For users who would like to try out the package immediately in a hands-on fashion, we recommend consulting our example pipeline. This pipeline uses reproducible code and sample data that come with the bigPint package, so you can smoothly follow along with each line of example code. Please note that bigPint allows users to input either data and dataMetrics objects or SummarizedExperiment objects to produce graphics. In each article (including the pipeline), example code is first provided for the data and dataMetrics object format, followed by the same example code in the SummarizedExperiment object format.
Whether or not you have already tried the example pipeline above, you can become more familiar with all aspects of the bigPint package by reading from the Get Started tab at the top of this website. It contains nine short vignette articles that we recommend reading in order. These short articles consist of reproducible code and sample data that provide:
bigPint plots and how to interpret them
bigPint package
bigPint plots (i.e. data and dataMetrics or SummarizedExperiment)
bigPint plots
bigPint plots
bigPint functions
The bigPint software aims to “Make BIG data pint-sized”. You can easily create modern and effective plots for your large multivariate datasets. These plots allow you to quickly examine the variability between all samples in your dataset, assess the variability between treatment groups versus between replicate groups, check for normalization issues, and discover outliers in your dataset. They also allow you to superimpose a subset of observations onto your full dataset to better understand how data subsets relate to your whole dataset. Both static and interactive plots are available.
The bigPint package can be useful for examining any large multivariate dataset. However, we note that the example datasets and example code in this package consider RNA-sequencing datasets. If you are using this software for RNA-sequencing data, then it can help you confirm that the variability between your treatment groups is larger than that between your replicates and determine how various normalization techniques in popular RNA-sequencing analysis packages (such as edgeR, DESeq2, and limma) affect your dataset. Moreover, you can easily superimpose lists of differentially expressed genes (DEGs) onto your dataset to check that they show the expected patterns (large variability between treatment groups and small variability between replicates).
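To make the "treatment versus replicate variability" check concrete, here is a minimal numeric sketch in base R. It does not use any bigPint function. The simulated values, the column names (A.1, A.2, ... for treatment and replicate), and the mean-absolute-difference measure are all assumptions chosen for illustration; bigPint addresses the same question visually rather than with this calculation.

```r
# Hypothetical data: 1000 genes, two treatments (A, B), three replicates each.
set.seed(1)
n_genes  <- 1000
baseline <- rnorm(n_genes, mean = 6, sd = 1.5)  # per-gene expression level
effect   <- rnorm(n_genes, mean = 0, sd = 1)    # per-gene treatment shift
noise    <- function() rnorm(n_genes, sd = 0.3) # replicate-level noise

counts <- data.frame(
  A.1 = baseline + noise(),
  A.2 = baseline + noise(),
  A.3 = baseline + noise(),
  B.1 = baseline + effect + noise(),
  B.2 = baseline + effect + noise(),
  B.3 = baseline + effect + noise()
)

# Treatment label is the part of the column name before the dot.
group <- sub("\\..*$", "", colnames(counts))
names(group) <- colnames(counts)

# Mean absolute difference across genes for every pair of samples.
sample_pairs  <- combn(colnames(counts), 2)
mean_abs_diff <- apply(sample_pairs, 2, function(p) {
  mean(abs(counts[[p[1]]] - counts[[p[2]]]))
})

# Split pairs into replicate pairs (same treatment) and treatment pairs.
same_group <- group[sample_pairs[1, ]] == group[sample_pairs[2, ]]
within_replicates  <- mean(mean_abs_diff[same_group])
between_treatments <- mean(mean_abs_diff[!same_group])

cat("Mean difference between replicates:", within_replicates, "\n")
cat("Mean difference between treatments:", between_treatments, "\n")
```

In a dataset that behaves as expected, the between-treatment value should be clearly larger than the between-replicate value. A single summary number like this can hide structure that plots of all samples reveal.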
Large multivariate datasets are common across numerous disciplinary fields. The best approaches for looking at quantitative multivariate data are scatterplot matrices, parallel coordinate plots, and replicate line plots. Each of these plots enables assessment of the association between multiple variables. With effective plotting tools, analysts can improve modeling; they can iterate between visualizations and modeling to enhance the models based on feedback from the visuals.
However, these plots become ineffective with large quantities of data: overplotting can obscure important structure, and the plots can be slow to render if every observation is mapped to a graphical element. In this package, we developed more useful visualization techniques for large multivariate datasets by incorporating appropriate summaries and using interactivity.
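As a rough illustration of how a summary reduces the number of graphical elements, the base R sketch below bins simulated points into a grid and counts the observations per cell. The data, the 40 x 40 grid, and the rectangular binning are assumptions made for this example only; this is not bigPint's own implementation.

```r
# Hypothetical data: two noisy replicate measurements for many observations.
set.seed(42)
n_obs <- 100000
rep1  <- rnorm(n_obs, mean = 8, sd = 2)
rep2  <- rep1 + rnorm(n_obs, mean = 0, sd = 0.5)

# Assign each observation to a cell of a 40 x 40 grid.
n_bins <- 40
x_bin  <- cut(rep1, breaks = n_bins, labels = FALSE)
y_bin  <- cut(rep2, breaks = n_bins, labels = FALSE)

# Count observations per cell; only occupied cells need to be drawn.
cell_counts <- table(x_bin, y_bin)
n_occupied  <- sum(cell_counts > 0)

cat("Observations:  ", n_obs, "\n")
cat("Occupied cells:", n_occupied, "\n")

# Draw one coloured tile per cell instead of one point per observation.
# log1p() keeps sparse cells visible next to very dense ones.
image(log1p(unclass(cell_counts)),
      xlab = "Replicate 1 (binned)", ylab = "Replicate 2 (binned)")
```

The plot now draws at most 1,600 tiles regardless of how many observations the dataset contains, and the colour of each tile shows the density that overplotted points would hide.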