BIG multivariate data Plotted INTeractively.
Welcome to the bigPint package website! If you would like to try the package hands-on right away, we recommend consulting our example pipeline. This pipeline uses reproducible code and sample data that come with the bigPint package, so you can follow along with each line of example code. Please note that bigPint accepts either dataMetrics objects or SummarizedExperiment objects as input to produce graphics. In each article (including the pipeline), example code is first provided for the dataMetrics object format, followed by the same example code for the SummarizedExperiment object format.
Whether or not you have already tried the example pipeline above, you can become more familiar with all aspects of the bigPint package by reading the articles under the Get Started tab at the top of this website. The tab contains nine short vignette articles that we recommend reading in order. These short articles consist of reproducible code and sample data that provide:
bigPint plots and how to interpret them
The bigPint software aims to “Make BIG data pint-sized”. You can easily create modern and effective plots for your large multivariate datasets. These plots allow you to quickly examine the variability between all samples in your dataset, compare the variability between treatment groups with the variability between replicates, check for normalization issues, and discover outliers in your dataset. They also allow you to superimpose a subset of observations onto your full dataset to better understand how that subset relates to the whole. Both static and interactive plots are available.
The bigPint package can be useful for examining any large multivariate dataset. However, we note that the example datasets and example code in this package use RNA-sequencing data. If you are using this software for RNA-sequencing data, it can help you confirm that the variability between your treatment groups is larger than that between your replicates, and determine how various normalization techniques in popular RNA-sequencing analysis packages (such as edgeR, DESeq2, and limma) affect your dataset. Moreover, you can easily superimpose lists of differentially expressed genes (DEGs) onto your dataset to check that they show the expected patterns (large variability between treatment groups and small variability between replicates).
Large multivariate datasets are common across numerous disciplines. The best approaches for looking at quantitative multivariate data are scatterplot matrices, parallel coordinate plots, and replicate line plots. Each of these plots enables analysts to assess the association between multiple variables. With effective plotting tools, analysts can improve modeling: they can iterate between visualization and modeling, using feedback from the visuals to enhance the models.
However, these plots become ineffective with large quantities of data: overplotting can obscure important structure, and the plots can be slow to render if every observation is mapped to its own graphical element. In this package, we developed more useful visualization techniques for large multivariate datasets by incorporating appropriate summaries and using interactivity.
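To make the idea of an "appropriate summary" concrete, the following is a minimal sketch of one common approach: binning the observations and drawing one graphical element per non-empty bin instead of one per observation. It is written in plain Python for illustration only and does not use or reproduce the bigPint API. The function name `bin_points`, the square grid, and the simulated data are all assumptions made for this example.

```python
import random
from collections import Counter


def bin_points(points, n_bins, lo, hi):
    """Count points in each cell of an n_bins x n_bins grid over [lo, hi]^2.

    Hypothetical helper for illustration; assumes every coordinate
    lies within [lo, hi].
    """
    width = (hi - lo) / n_bins
    counts = Counter()
    for x, y in points:
        # Clamp the index so a value exactly equal to `hi` lands in the last bin.
        col = min(int((x - lo) / width), n_bins - 1)
        row = min(int((y - lo) / width), n_bins - 1)
        counts[(row, col)] += 1
    return counts


random.seed(0)

# Simulated data (an assumption): two strongly correlated variables,
# loosely resembling a pair of replicate samples.
points = []
for _ in range(100_000):
    x = random.uniform(0, 10)
    y = min(max(x + random.gauss(0, 0.5), 0), 10)
    points.append((x, y))

counts = bin_points(points, n_bins=20, lo=0, hi=10)

# A plot of the bins needs at most n_bins * n_bins elements,
# no matter how many observations there are.
print(len(points), "observations summarized as", len(counts), "non-empty bins")
```

Shading each bin by its count keeps dense regions readable where individual points would overlap, and the rendering cost depends on the number of bins rather than on the number of observations.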