BIG multivariate data Plotted INTeractively.

Getting Started

Welcome to the bigPint package website! If you are a new user, please begin with the Get Started tab at the top of this website. There are ten short vignette articles in that tab, and we recommend reading them in order. These short vignette articles consist of reproducible code that provides:

  • An introduction to bigPint plots and how to interpret them
  • A guide to installing the bigPint package
  • The expected formats of the two input objects used by most bigPint functions
  • How to produce static bigPint plots
  • How to produce interactive bigPint plots
  • How to perform hierarchical clustering and use the clusters in bigPint functions
  • A recommended RNA-seq visualization pipeline with example code for you to follow

In a nutshell

The bigPint software aims to “Make BIG data pint-sized”. You can easily create modern and effective plots for your large multivariate datasets. These plots allow you to quickly examine the variability between all samples in your dataset, compare the variability between treatment groups with the variability between replicates, check for normalization issues, and discover outliers in your dataset. They also allow you to superimpose a subset of observations onto your full dataset to better understand how that subset relates to the whole dataset. Both static and interactive plots are available.


RNA-sequencing visualization

The bigPint package can be useful for examining any large multivariate dataset. However, we note that the example datasets and example code in this package consider RNA-sequencing datasets. If you are using this software for RNA-sequencing data, then it can help you confirm that the variability between your treatment groups is larger than that between your replicates and determine how various normalization techniques in popular RNA-sequencing analysis packages (such as edgeR, DESeq2, and limma) affect your dataset. Moreover, you can easily superimpose lists of differentially expressed genes (DEGs) onto your dataset to check that they show the expected patterns (large variability between treatment groups and small variability between replicates).


Motivation

Large multivariate datasets are common across many disciplines. Scatterplot matrices, parallel coordinate plots, and replicate line plots are among the most effective ways to look at quantitative multivariate data. Each of these plots enables analysts to assess the association between multiple variables. With effective plotting tools, analysts can improve modeling: they can iterate between visualizations and modeling, using feedback from the visuals to enhance the models.

However, these plots become ineffective with large quantities of data: overplotting can obscure important structure, and the plots can be slow to render if every observation is mapped to its own graphical element. In this package, we developed more useful visualization techniques for large multivariate datasets by incorporating appropriate summaries and using interactivity.
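
To make the idea of a summary concrete, here is a minimal sketch of one common way to tame overplotting: count observations per grid cell and draw one element per occupied cell instead of one per observation. This is a general illustration written in Python, not bigPint code. The function name `bin_points`, the square grid, and the simulated data are all assumptions made for the example, and the summaries that bigPint actually uses may differ.

```python
from collections import Counter
import random


def bin_points(xs, ys, n_bins, lo, hi):
    """Count points in each cell of an n_bins x n_bins grid over [lo, hi].

    Hypothetical helper for illustration only; not part of bigPint.
    """
    width = (hi - lo) / n_bins
    counts = Counter()
    for x, y in zip(xs, ys):
        # Clamp indices so values on or outside the edges land in a border cell.
        col = max(0, min(int((x - lo) / width), n_bins - 1))
        row = max(0, min(int((y - lo) / width), n_bins - 1))
        counts[(col, row)] += 1
    return counts


# Simulated data (an assumption): two correlated "samples" with many observations.
random.seed(1)
n = 100_000
xs = [random.uniform(0, 10) for _ in range(n)]
ys = [x + random.gauss(0, 0.5) for x in xs]

counts = bin_points(xs, ys, n_bins=20, lo=0, hi=10)

# Each occupied cell becomes one graphical element, shaded by its count.
print(len(counts), "cells summarize", sum(counts.values()), "points")
```

With a 20 by 20 grid, a plot needs at most 400 shaded cells however many observations the dataset contains, so dense regions remain readable and rendering stays fast.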