PeopleSeq Ongoing...

PeopleSeq is a longitudinal study of ostensibly healthy adults who plan to receive, or have already received, their own genomic sequence information. The study is the collaborative effort of a growing consortium of commercial and research organizations. It will collect empirical data on the medical, behavioral and economic impact of predispositional personal genome sequencing in ostensibly healthy adults.

PeopleSeq is described in the following publications:

Linderman MD, Nielsen DE, Green RC. Personal Genome Sequencing in Ostensibly Healthy Individuals and the PeopleSeq Consortium. J Pers Med. 2016;6(2):14.

HealthSeq Ongoing...

HealthSeq is a longitudinal cohort study in which unselected ostensibly healthy participants received a variety of health and non-health-related genetic results from whole genome sequencing. The primary aim of HealthSeq is to improve our understanding of participants’ motivations, expectations, concerns and preferences, and of the impacts of receiving personal genome sequencing in a predispositional setting.

HealthSeq results are described in the following publications:

Suckiel SA, Linderman MD, Sanderson SC, Diaz GA, Melissa Kasarskia AW, Schadt EE, et al. Impact of genomic counseling on informed decision-making among ostensibly healthy individuals seeking personal genome sequencing: The HealthSeq project. J Genet Couns. 2016;1–10.

Sanderson SC, Linderman MD, Suckiel SA, Diaz GA, Zinberg RE, Ferryman K, et al. Motivations, concerns and preferences of personal genome sequencing research participants: Baseline findings from the HealthSeq project. Eur J Hum Genet. 2015 Jun 3.

Genome Analysis Pipeline Ongoing...

My group develops and maintains the Icahn Institute’s Genome Analysis Pipeline (GAP), which is validated for clinical use in New York State and has been successfully applied to identify causal mutations in multiple patients. The GAP is one part of a larger genome analysis infrastructure we are developing to integrate electronic health records, bio-repositories, public databases and our own active sequencing program into an ever more comprehensive understanding of genomic medicine.

The GAP is used in multiple small and large research projects, totalling many hundreds of whole genomes and many thousands of whole exomes and targeted panels.

The GAP is described in the following publications:

Linderman MD, Brandt T, Edelmann L, Jabado O, Kasai Y, Kornreich R, Mahajan M, Shah H, Kasarskis A, Schadt EE. Analytical Validation of Whole Exome and Whole Genome Sequencing for Clinical Applications. BMC Medical Genomics. 2014 Apr;7:20.

Large-scale Co-expression Analysis Completed...

Weighted Gene Co-expression Network Analysis (WGCNA) is a methodology for describing the correlation patterns among genes across microarray samples. Analysis of tens of thousands of probes, however, can take hours and require hundreds of gigabytes of memory, putting this method out of reach for all but a few organizations and applications. Substantial reductions in execution time and memory footprint are needed. Those reductions will enable researchers to apply WGCNA in many new contexts.
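To illustrate why the cost grows so quickly, the core of a weighted co-expression network is a probe-by-probe matrix of correlations raised to a soft-thresholding power. The following is a minimal pure-Python sketch of that idea, not the WGCNA package itself. The function names, the power `beta=6`, and the toy expression data are assumptions chosen for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # treat constant profiles as uncorrelated
    return cov / math.sqrt(var_a * var_b)

def adjacency(expr, beta=6):
    """Unsigned weighted adjacency: adj[i][j] = |cor(i, j)| ** beta."""
    p = len(expr)
    adj = [[0.0] * p for _ in range(p)]
    for i in range(p):
        adj[i][i] = 1.0
        for j in range(i + 1, p):
            w = abs(pearson(expr[i], expr[j])) ** beta
            adj[i][j] = adj[j][i] = w
    return adj

def dense_matrix_gib(num_probes, bytes_per_value=8):
    """Memory for one dense num_probes x num_probes matrix, in GiB."""
    return num_probes ** 2 * bytes_per_value / 2 ** 30

# Toy data (hypothetical): one row per probe, one column per sample.
expr = [
    [1.0, 2.0, 3.0, 4.0],  # probe 0
    [2.0, 4.0, 6.0, 8.0],  # probe 1: perfectly correlated with probe 0
    [4.0, 3.0, 2.0, 1.0],  # probe 2: perfectly anti-correlated with probe 0
    [1.0, 3.0, 2.0, 4.0],  # probe 3: partially correlated with probe 0
]
adj = adjacency(expr)
```

Both the work and the storage scale with the square of the number of probes. Under these assumptions a single dense double-precision matrix for 50,000 probes needs roughly 18.6 GiB, and an analysis may hold several such matrices (for example correlation, adjacency and topological overlap) at once.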

CytoSPADE Completed...

Recent advances in flow cytometry enable simultaneous single-cell measurement of 30+ surface and intracellular proteins. In a single experiment we can now measure enough markers to identify and compare functional immune activities across nearly all cell types in the human hematopoietic lineage. However, practical approaches to analyze and visualize data at this scale are only now becoming available. SPADE, described in Qiu et al., Nature Biotechnology 2011 and first used in Bendall et al. Science 2011, is a novel algorithm that organizes cells into hierarchies of related phenotypes, or “trees”, that facilitate the visualization of developmental lineages, identification of rare cell types, and comparison of functional markers across stimuli.

CytoSPADE is a robust, modular and performant implementation of Qiu et al.’s SPADE algorithm, including a rich GUI implemented as a plugin for the Cytoscape Network Visualization platform. CytoSPADE is 12- to 19-fold faster than the SPADE prototype, so users can run complex analyses on their laptops in seconds or minutes. More information is available at cytospade.org and on the software page.

SPADE tree annotated with CD34 expression in healthy human bone marrow (PBMC) samples analyzed via mass cytometry. Adapted from Bendall et al. Science 2011.

CytoSPADE is described in the following publications:

Linderman MD, Bjornson Z, Simonds EF, Qiu P, Bruggner R, Sheode K, Meng TH, Plevritis SK, Nolan GP. CytoSPADE: High-Performance Analysis and Visualization of High-Dimensional Cytometry Data. Bioinformatics. 2012;28(18):2400-1.

Qiu P, Simonds EF, Bendall SC, Gibbs KD, Jr., Bruggner RV, Linderman MD, Sachs K, Nolan GP, Plevritis SK. Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE. Nat Biotechnol. 2011;29(10):886-91.

Bayesian Structure Learning Completed...

Aberrant intracellular signaling plays an important role in many diseases. The causal structure of signal transduction networks can be modeled as Bayesian Networks (BNs), and computationally learned from experimental data. However, learning the structure of BNs is an NP-hard problem that, even with fast heuristics, is too time-consuming for large, clinically important networks (20-50 nodes). I developed a novel graphics processing unit (GPU)-accelerated implementation of a Markov chain Monte Carlo (MCMC)-based algorithm for learning BNs that is up to 7.5-fold faster than already heavily optimized general-purpose processor (GPP)-based implementations.

The GPU-based implementation is just one of several variants within the larger application, each optimized for a different input or machine configuration. I concurrently enhanced the Merge framework to enable efficient integration, testing and intelligent selection among the different potential implementations targeting multicore GPPs, GPUs and distributed compute clusters.

The GPU-accelerated Bayesian Network learning implementation is described in the following publication:

Linderman MD, Bruggner R, Athalye V, Meng TH, Asadi NB, Nolan GP. High-throughput Bayesian network learning using heterogeneous multicore computers. Proc ACM Intl Conf on Supercomputing (ICS); 2010. p. 95-104.
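For readers unfamiliar with MCMC-based structure learning, the general shape of such a sampler can be sketched as a Metropolis-Hastings walk over node orderings. This is a generic illustration under stated assumptions, not the published GPU implementation: the swap proposal, the function names and the toy scoring function (which merely rewards orderings consistent with a made-up set of edges) are all hypothetical. In a real application the score would be computed from experimental data, and that scoring step is the expensive part.

```python
import math
import random

def order_mcmc(nodes, log_score, steps, seed=0):
    """Metropolis-Hastings over node orderings with a symmetric swap proposal.

    log_score(order) is a caller-supplied (hypothetical) function returning
    the log of an unnormalized score for that ordering.
    """
    rng = random.Random(seed)
    order = list(nodes)
    current = log_score(order)
    best_order, best = list(order), current
    for _ in range(steps):
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]      # propose a swap
        proposed = log_score(order)
        # Accept with probability min(1, exp(proposed - current)).
        if proposed >= current or rng.random() < math.exp(proposed - current):
            current = proposed
            if current > best:
                best_order, best = list(order), current
        else:
            order[i], order[j] = order[j], order[i]  # reject: undo the swap
    return best_order, best

# Toy score (an assumption, not a real likelihood): reward orderings in
# which each parent in a made-up edge list precedes its child.
TRUE_EDGES = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "E")]

def toy_log_score(order):
    position = {node: k for k, node in enumerate(order)}
    return 3.0 * sum(position[p] < position[c] for p, c in TRUE_EDGES)

best_order, best = order_mcmc(["E", "D", "C", "B", "A"], toy_log_score, steps=2000)
```

Because the swap proposal is symmetric, the acceptance rule needs only the difference in log scores. Each step calls the scoring function once, so the cost of scoring dominates the running time of the walk.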

The performance of the BN learning application is sensitive to the performance of an accumulation of log-space probabilities in the innermost loop:

acc += log(1 + exp(x))

For x far from the origin, this computation can be approximated as 0 (for large negative x) or as the identity function (for large positive x). The choice of those boundaries creates a performance-precision trade-off. I concurrently developed a tool, Gappa++, for analyzing the numerical behavior of this and other computations. Using Gappa++ I was able to improve performance by an additional 10-15%. Gappa++ is described in the following publication and on the software page:

Linderman MD, Ho M, Dill DL, Meng TH, Nolan GP. Towards program optimization through automated analysis of numerical precision. Proc IEEE/ACM Intl Symp on Code Generation and Optimization (CGO); 2010. p. 230-7.
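The trade-off in approximating log(1 + exp(x)) can be made concrete with a small sketch. The cutoffs below (`low`, `high`) are illustrative assumptions, not the boundaries chosen in the published work, and the error is measured here by brute-force sampling rather than by the formal analysis that a tool like Gappa++ performs.

```python
import math

def softplus_exact(x):
    """Numerically stable log(1 + exp(x))."""
    # For large positive x, exp(x) overflows, so use x + log1p(exp(-x)).
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))

def softplus_approx(x, low=-20.0, high=20.0):
    """Piecewise approximation; `low` and `high` are illustrative cutoffs."""
    if x <= low:
        return 0.0   # exp(x) is negligible, so log(1 + exp(x)) is close to 0
    if x >= high:
        return x     # the 1 is negligible, so log(1 + exp(x)) is close to x
    return math.log1p(math.exp(x))

def max_abs_error(low, high, samples=2001, span=50.0):
    """Largest absolute error of the approximation sampled on [-span, span]."""
    worst = 0.0
    for k in range(samples):
        x = -span + 2.0 * span * k / (samples - 1)
        err = abs(softplus_approx(x, low, high) - softplus_exact(x))
        worst = max(worst, err)
    return worst

wide = max_abs_error(-20.0, 20.0)    # conservative cutoffs, small error
narrow = max_abs_error(-5.0, 5.0)    # aggressive cutoffs, larger error
```

Narrower cutoffs let more inputs skip the `exp` and `log` calls but increase the worst-case error, which occurs at the cutoffs themselves. Choosing them therefore requires knowing how much error the surrounding computation can tolerate.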

Merge Completed...

Computer systems are undergoing significant change: to improve performance and efficiency, architects are exposing more microarchitectural details directly to programmers. Software that exploits specialized accelerators, such as GPUs, and specialized processor features, such as software-controlled memory, exposes limitations in existing compiler and OS infrastructure.

Sketch of the Merge framework

Merge is a programming model for building applications that will tolerate changing hardware. Merge allows programmers to leverage different processor-specific or domain-specific toolchains to create software modules specialized for different hardware configurations, and it provides language mechanisms to enable the automatic mapping of the application to these processor-specific modules. I showed that this approach can be used to manage computing resources in complex heterogeneous processors and to enable aggressive compiler optimizations.

Merge was used extensively in the complex and computationally intensive Bayesian structure learning application described above. Using Merge we were able to deploy a single application binary that could deliver the best possible performance across a range of problem sizes and hardware configurations (including multicore processors, GPUs and clusters with MPI). For any given problem size and hardware configuration, the Merge-enabled application automatically and dynamically selects the appropriate implementation (based on predicates supplied by the original implementors).
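The idea of predicate-based selection among implementations can be sketched in a few lines. This is a simplified illustration, not the Merge API: the `Variant` type, the `select` and `dispatch` functions, the hardware dictionary and the size thresholds are all hypothetical, and every variant simply calls Python's built-in `sum` as a stand-in for a specialized implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Variant:
    name: str
    predicate: Callable[[int, dict], bool]  # (problem_size, hardware) -> usable?
    run: Callable[[List[float]], float]

def select(variants, problem_size, hardware):
    """Return the first registered variant whose predicate holds."""
    for variant in variants:
        if variant.predicate(problem_size, hardware):
            return variant
    raise LookupError("no applicable variant")

# Hypothetical variants of one function (summing a list), listed most
# specialized first with a generic fallback last. Each uses the built-in
# sum as a placeholder for a GPU, multicore or sequential implementation.
VARIANTS = [
    Variant("gpu", lambda n, hw: hw.get("gpu", False) and n >= 10_000, sum),
    Variant("multicore", lambda n, hw: hw.get("cores", 1) > 1 and n >= 1_000, sum),
    Variant("sequential", lambda n, hw: True, sum),
]

def dispatch(data, hardware):
    """Pick a variant for this input and machine, then run it."""
    variant = select(VARIANTS, len(data), hardware)
    return variant.name, variant.run(data)

name, total = dispatch([1.0] * 20_000, {"gpu": True, "cores": 8})
```

In this sketch the caller never names a specific implementation. Adding support for new hardware means registering another variant with its own predicate, while the calling code stays unchanged.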

Merge is described in the following publications:

Linderman MD, Balfour J, Meng TH, Dally WJ. Embracing heterogeneity: parallel programming for changing hardware. Proc USENIX Conf on Hot Topics in Parallelism (HOTPAR); 2009.

Linderman MD, Collins JD, Wang H, Meng TH. Merge: a programming model for heterogeneous multi-core systems. Proc Intl Conf on Architectural Support for Programming Languages and Operating Systems (ASPLOS); 2008. p. 287-96.